CN113191445A - Large-scale image retrieval method based on a self-supervised adversarial hashing algorithm


Info

Publication number
CN113191445A
CN113191445A
Authority
CN
China
Prior art keywords
image
hash
generator
hash code
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110531130.8A
Other languages
Chinese (zh)
Other versions
CN113191445B (en)
Inventor
曹媛
刘峻玮
桂杰
许晓伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202110531130.8A
Publication of CN113191445A
Application granted
Publication of CN113191445B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a large-scale image retrieval method based on a self-supervised adversarial hashing algorithm. The invention proposes a new hash learning framework, called self-supervised adversarial hashing. The framework learns discriminative hash codes mainly by means of an image-rotation-based self-supervised similarity metric and a generative adversarial network. The neural network model comprises an encoder for obtaining the hash codes, a generator for producing pseudo images, and a discriminator for distinguishing real images from generated ones. A loss function composed of an approximate semantic similarity loss, a feature loss, and an adversarial loss is designed to preserve the similarity between the images and the hash codes. Self-supervised features are added to the whole model, ignoring low-level semantic information while retaining high-level semantic information; in particular for short hash codes, the high-level semantic information of the image is preserved better. Experimental results show that, compared with existing retrieval methods, the proposed image retrieval method achieves better retrieval performance.

Description

Large-scale image retrieval method based on a self-supervised adversarial hashing algorithm
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a large-scale image data retrieval method based on a self-supervised adversarial hashing algorithm.
Background
Hash algorithms have attracted increasing attention for large-scale image retrieval because of their low storage requirements and high search efficiency. Depending on whether image labels are used, they can be divided into supervised hashing and unsupervised hashing; in general, supervised hashing methods perform better than unsupervised ones. In most cases, however, a data set contains no label information useful for the images, and manual annotation requires a great deal of manpower. Many researchers have attempted to address this problem. For example, Gidaris et al. proposed an image-rotation-based self-supervision method; however, it can produce different feature representations for an image before and after rotation. Misra et al. solved this problem, but they did not map the similarity matrix of similar images in the original space into the feature space.
With the rise of deep learning, learning algorithms can be divided into two categories: supervised learning and unsupervised learning. Supervised learning algorithms are favored for their high accuracy, but manually annotated labels are not readily available and require significant human resources. In recent years, unsupervised learning algorithms have therefore received increasing attention. Self-supervised learning is a popular choice within unsupervised learning, and its popularity is no accident: once the mainstream supervised learning tasks mature, data becomes the most important bottleneck. Learning effective information from unlabeled data is an important research subject, and self-supervised learning offers very rich possibilities.
Disclosure of Invention
The invention aims to provide a large-scale image retrieval method based on a self-supervised adversarial hashing algorithm to remedy the deficiencies of the prior art.
In order to achieve the above purpose, the invention adopts the following specific technical scheme:
A large-scale image retrieval method based on a self-supervised adversarial hashing algorithm comprises the following steps:
S1: acquiring image data comprising a training set and a test set;
S2: optimizing the encoder using the training set;
S3: rotating the test-set images and inputting them into the encoder optimized in S2 to obtain hash codes;
S4: computing the Hamming distances between the hash codes obtained in S3 and the training-set hash codes from S2, sorting the distances from small to large, and outputting the first k retrieval results to complete the retrieval (a minimal sketch of this step is given below).
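For illustration, the following is a minimal sketch of steps S3 and S4, assuming the hash codes are stored as ±1 vectors; the function and variable names are illustrative and not part of the original disclosure.

    import numpy as np

    def hamming_retrieve(query_codes, db_codes, k=10):
        """Rank database items by Hamming distance to each query (S3-S4).

        query_codes: (Q, L) array of query hash codes in {-1, +1}
        db_codes:    (N, L) array of database hash codes in {-1, +1}
        Returns the indices of the k nearest database items per query.
        """
        L = query_codes.shape[1]
        # For +/-1 codes, Hamming distance = (L - inner product) / 2.
        dists = (L - query_codes @ db_codes.T) / 2
        return np.argsort(dists, axis=1)[:, :k]  # sorted small to large

Sorting the Hamming distances in ascending order and returning the first k indices corresponds exactly to the ranking described in S4.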
Further, in S2: the encoder uses a structure similar to VGG19, comprising five convolutional layers, two fully connected layers, and one hash layer; for feature comparison, a fully connected layer is added at the end. Using the relationship between image neighborhood structures, i.e., the relationship between the hash codes B and the semantic similarity matrix S, the following objective function l_s is proposed so that the learned hash codes approximate the original data distribution in the projection space as closely as possible:

min_E  l_s = ||(1/L)·B·B^T − S||²

where L is the length of the hash code, S is the similarity matrix, and E indicates that the objective function optimizes the encoder, which generates the hash codes B. Optimizing l_s makes images that are similar in the original space have similar hash codes when mapped into the hash space (an implementation sketch follows below).
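As a sketch of how this objective can be implemented, the following PyTorch snippet assumes the form l_s = ||(1/L)·B·B^T − S||² reconstructed above, with tanh-relaxed codes; all names are illustrative.

    import torch

    def similarity_loss(b, s):
        """l_s: align pairwise code agreement with the similarity matrix.

        b: (n, L) tensor of relaxed hash codes in [-1, 1] (e.g. tanh outputs)
        s: (n, n) semantic similarity matrix with entries in {-1, +1}
        """
        L = b.shape[1]
        inner = b @ b.t() / L          # pairwise agreement, in [-1, 1]
        return ((inner - s) ** 2).mean()

With binary codes b_i in {−1, +1}^L, the quantity b_i·b_j / L equals 1 for identical codes and −1 for opposite codes, so driving it toward S_ij gives similar images similar hash codes.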
Further, the encoder optimization in S2 specifically comprises:
S2-1: obtaining feature vectors of the training-set images, computing the pairwise cosine distances, and sorting them to obtain a similarity ranking;
S2-2: analyzing the similarity ranking and setting a threshold to obtain a similarity matrix;
S2-3: rotating the training-set images and inputting them into the encoder to obtain hash codes;
S2-4: inputting the hash codes into the generator to obtain pseudo images;
S2-5: inputting the pseudo images and the real images into the discriminator simultaneously for adversarial training;
S2-6: optimizing the encoder, the generator, and the discriminator according to the objective function; the optimized encoder, generator, and discriminator constitute the self-supervised adversarial hashing model.
Further, S2-1 is specifically: for the database points X = {x_i}, i = 1, …, N, feature vectors are extracted from the pool5 layer of a VGG model; the cosine distances between them are then computed and sorted from small to large with a k-nearest-neighbor (KNN) search to obtain the similarity ranking (a feature-extraction sketch is given below).
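A sketch of the pool5 feature extraction and cosine-distance ranking follows, assuming torchvision ≥ 0.13 and ImageNet-normalized inputs; the helper names are illustrative.

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
    pool5 = vgg.features.eval()            # conv1_1 ... pool5

    @torch.no_grad()
    def extract_pool5(images):
        """images: (N, 3, 224, 224) -> (N, 25088) pool5 feature vectors."""
        return pool5(images).flatten(1)

    def cosine_ranking(feats):
        """Each row: image indices sorted by ascending cosine distance."""
        f = F.normalize(feats, dim=1)
        dist = 1.0 - f @ f.t()             # cosine distance in [0, 2]
        return dist.argsort(dim=1)         # nearest first (index 0 is self)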
Further, in S2-2: according to the cosine similarity of each image, its K1 nearest neighbors are taken as its neighborhood, and an initial matrix S1 is obtained, computed as follows:

S1_ij = 1 if x_j ∈ K1-NN(x_i), and S1_ij = 0 otherwise

where x_i and x_j are the feature vectors of the images and K1-NN(x_i) denotes the K1 nearest neighbors of x_i. On the basis of S1, the corresponding columns s_i of S1 are compared and used to construct S2, as follows:

S2_ij = 1 if s_j ∈ K2-NN(s_i), and S2_ij = 0 otherwise

where K2-NN(s_i) denotes the K2 nearest neighbors of s_i. Finally, these two matrices are combined into S, computed as follows:

S_ij = 1 if S1_ij = 1 or S2_ij = 1, and S_ij = −1 otherwise

(a construction sketch is given after this paragraph).
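The following NumPy sketch builds S1, S2, and S from a pairwise cosine-distance matrix, under the value convention assumed in the reconstruction above (neighbors mapped to 1, final non-neighbors to −1); all names are illustrative.

    import numpy as np

    def build_similarity_matrix(dist, k1=20, k2=30):
        """dist: (n, n) pairwise cosine distances -> S with entries {-1, +1}."""
        n = dist.shape[0]
        rows = np.arange(n)[:, None]

        s1 = np.zeros((n, n))
        s1[rows, dist.argsort(axis=1)[:, :k1]] = 1.0   # K1 nearest neighbors

        # Compare rows/columns of S1: images with overlapping neighborhoods
        # are treated as close in the second-order sense.
        norms = np.linalg.norm(s1, axis=1, keepdims=True) + 1e-12
        row_dist = 1.0 - (s1 @ s1.T) / (norms * norms.T)
        s2 = np.zeros((n, n))
        s2[rows, row_dist.argsort(axis=1)[:, :k2]] = 1.0  # K2 nearest rows

        return np.where((s1 == 1) | (s2 == 1), 1.0, -1.0)

With K1 = 20 and K2 = 30 (the values used in the experiments), each image ends up marked similar to roughly the union of its first- and second-order neighborhoods and dissimilar to everything else.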
further, in the S2-4: the generator consists of a full-connection layer and four deconvolution layers, and a hash code is input into the generator to be used as noise to generate a new image; specifically, the hash code is input into a fully concatenated layer of size 8 × 8 × 256, and then 3 deconvolution layers of 5 × 5 and 1 × 1 are used, the number of kernels being 256, 128, 32, and 3, respectively; for the image I generated by the generatorGAnd an original image I, and an objective function l is provided between the feature vectorsf1(ii) a The objective function is defined as follows:
Figure BDA0003067929210000034
where Ψ (-) denotes the convolution-activated feature vector, w and h denote the sizes of the corresponding features, and D denotes the adjustment of the parameters in the arbiter; the generator generates a new image by using the hash code of the rotated image, so that a larger difference exists between the feature vector of the new image and the original image in consideration of low-level semantic information in the image; based on this problem, the image I after rotationRAn objective function is arranged between the feature vectors of the original image I and the feature vectors of the original image I, the objective function being to ensure that the feature vectors of the same image are as similar as possible irrespective of rotation, thereby reducing the new image IRAnd a rotated image I obtained from the encoderGThe feature vector of the original image is obtained from the discriminator. Therefore, we use this loss function to optimize the encoder and the discriminator; so the finally set objective function lf2The following were used:
Figure BDA0003067929210000035
wherein, IRThe image is rotated, and I is an original image; the aim of the method is to omit the lower layersSo that the feature loss of the lower layer is not considered. Wherein lf=lf1+γlf2And gamma is a weight parameter.
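A PyTorch sketch of the generator and the feature loss l_f = l_f1 + γ·l_f2 follows; the strides, paddings, and 32 × 32 output size are assumptions chosen to match the 8 × 8 × 256 input layer and CIFAR-sized images, and all names are illustrative.

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        """One fully connected layer reshaped to 8x8x256, then
        deconvolutions with 256, 128, 32 and 3 kernels (assumed strides)."""

        def __init__(self, code_length=12):
            super().__init__()
            self.fc = nn.Linear(code_length, 8 * 8 * 256)
            self.deconv = nn.Sequential(
                nn.ConvTranspose2d(256, 256, 5, 2, 2, output_padding=1),
                nn.ReLU(),                                   # 8 -> 16
                nn.ConvTranspose2d(256, 128, 5, 2, 2, output_padding=1),
                nn.ReLU(),                                   # 16 -> 32
                nn.ConvTranspose2d(128, 32, 5, 1, 2), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 1), nn.Tanh(),     # 1x1 -> RGB
            )

        def forward(self, b):
            return self.deconv(self.fc(b).view(-1, 256, 8, 8))

    def feature_loss(feat_fake, feat_real, feat_rot, gamma=3.0):
        """l_f = l_f1 + gamma * l_f2 on discriminator feature maps."""
        wh = feat_real.shape[2] * feat_real.shape[3]
        l_f1 = ((feat_fake - feat_real) ** 2).sum((1, 2, 3)).mean() / wh
        l_f2 = ((feat_rot - feat_real) ** 2).sum((1, 2, 3)).mean() / wh
        return l_f1 + gamma * l_f2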
Further, in S2-5: following the structure of adversarial learning, a discriminator D is set up to judge whether an image is real or fake, thereby optimizing the generator G; optimizing a generative adversarial network is a minimax game. The discriminator D consists of four convolutional layers with 32, 128, 256, and 256 kernels, followed by a fully connected layer of size 1024, with ELU as the activation function. The generative model is essentially a maximum-likelihood estimate used to model data with a specific distribution: its function is to capture the distribution of the sample data and, through the transformation of the parameters in the maximum-likelihood estimation, convert the distribution of the original input information into samples with the specified distribution. The standard objective function of a generative adversarial network is used:

l_d' = min_G max_D E_{I~p_data}[log D(I)] + E_b[log(1 − D(G(b)))]

where G denotes adjusting the parameters in the generator, D(·) denotes the output of the last layer of the discriminator, and G(·) denotes the output of the last layer of the generator. To make the effect of the generative adversarial network more pronounced, random noise is also input into the generator, and the generator and the discriminator are optimized with the following loss function:

l_d'' = min_G max_D E_z[log(1 − D(G(z)))]

where z is random noise, and l_d = l_d' + l_d'' (a loss sketch follows below).
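A sketch of the adversarial loss l_d = l_d' + l_d'' in a trainable binary-cross-entropy form follows; splitting the minimax objective into a discriminator side and a generator side is a standard reformulation, and all names are illustrative.

    import torch
    import torch.nn.functional as F

    def d_loss(d_real, d_fake_code, d_fake_noise):
        """Discriminator side: real images -> 1, both kinds of fakes -> 0."""
        ones = torch.ones_like(d_real)
        zeros = torch.zeros_like(d_fake_code)
        return (F.binary_cross_entropy_with_logits(d_real, ones)
                + F.binary_cross_entropy_with_logits(d_fake_code, zeros)
                + F.binary_cross_entropy_with_logits(d_fake_noise, zeros))

    def g_loss(d_fake_code, d_fake_noise):
        """Generator side: make the discriminator score the fakes as real."""
        ones = torch.ones_like(d_fake_code)
        return (F.binary_cross_entropy_with_logits(d_fake_code, ones)
                + F.binary_cross_entropy_with_logits(d_fake_noise, ones))

Here d_fake_code are discriminator logits for images generated from hash codes (the l_d' term) and d_fake_noise for images generated from random noise z (the l_d'' term).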
Further, in S2-6: the overall loss function, i.e., the objective function, is the sum of the loss terms above, weighted by α and β:

L = l_s + α·l_f + β·l_d   (9)

Network optimization is performed using (9), and a hash layer is added after the last fully connected layer to obtain the hash codes; the parameters W, θ, η, and ξ must be learned, and the calculation proceeds as follows:

b_i = sgn(E(I_i^R; W, θ))   (10)

where I_i^R denotes the input rotated picture, and W and θ denote parameters in the encoder network;

I_G = G(b_i; η)   (11)

where η denotes a parameter in the generator;

o = D(I, I_G; ξ)   (12)

where ξ denotes a parameter in the discriminator; the objective function is optimized using back-propagation and stochastic gradient descent (a single-step training sketch is given below).
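The following is a simplified single-optimizer training-step sketch for L = l_s + α·l_f + β·l_d, reusing the helper functions sketched above; it assumes the discriminator returns both its logits and an intermediate feature map, and in practice the generator and discriminator would usually be updated alternately. All names are illustrative.

    import torch

    def train_step(encoder, generator, discriminator, opt, imgs, imgs_rot, s,
                   alpha=1.0, beta=1.0):
        b = torch.tanh(encoder(imgs_rot))      # relaxed sgn(.), cf. eq. (10)
        fake = generator(b)                    # I_G = G(b; eta), eq. (11)
        fake_z = generator(torch.randn_like(b))

        d_real, feat_real = discriminator(imgs)
        d_fake, feat_fake = discriminator(fake)
        d_fake_z, _ = discriminator(fake_z)
        _, feat_rot = discriminator(imgs_rot)

        loss = (similarity_loss(b, s)
                + alpha * feature_loss(feat_fake, feat_real, feat_rot)
                + beta * (d_loss(d_real, d_fake.detach(), d_fake_z.detach())
                          + g_loss(d_fake, d_fake_z)))
        opt.zero_grad()
        loss.backward()                        # back-propagation
        opt.step()                             # (stochastic) gradient descent
        return loss.item()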
The invention has the following advantages and technical effects:
The invention proposes a new hash learning framework, called self-supervised adversarial hashing. The framework learns discriminative hash codes mainly by means of an image-rotation-based self-supervised similarity metric and a generative adversarial network (GAN). The neural network model comprises an encoder for obtaining the hash codes, a generator for producing pseudo images, and a discriminator for distinguishing real images from generated ones. A loss function composed of an approximate semantic similarity loss, a feature loss, and an adversarial loss is designed to preserve the similarity between the images and the hash codes. Self-supervised features are added to the whole model, ignoring low-level semantic information while retaining high-level semantic information; in particular for short hash codes, the high-level semantic information of the image is preserved better.
Experimental results show that, compared with existing retrieval methods, the proposed image retrieval method achieves better retrieval performance.
Drawings
FIG. 1 is a diagram illustrating the process of the self-supervised adversarial hashing of the present invention.
FIG. 2 is an effect diagram of the similarity matrix S generated by the present invention.
FIG. 3 compares the effect of the weight parameter γ on the mean average precision (mAP) for different bit lengths.
FIG. 4 compares the results with and without the pixel-level loss in the loss function of the present invention.
Detailed Description
The invention will be further explained and illustrated by means of specific embodiments and with reference to the drawings.
Example 1:
A large-scale image retrieval method based on a self-supervised adversarial hashing algorithm comprises the following steps (as shown in FIG. 1):
Step 1: first, extract a semantic similarity matrix S from the data set (see the semantic similarity matrix part of FIG. 1 and FIG. 2);
Step 2: rotate the image by a certain angle and input it into the encoder to obtain the hash code (see the Encoder part of FIG. 1);
Step 3: input the hash code into the generator to obtain a new image (see the Generator part of FIG. 1);
Step 4: input the original image and the new image into the discriminator for adversarial discrimination (see the Discriminator part of FIG. 1);
Step 5: optimize the network according to the objective function. The performance (Table 1) and training time (Table 2) of the self-supervised adversarial hashing algorithm (SHGan) and several hashing algorithms (iterative quantization (ITQ), locality-sensitive hashing (LSH), spectral hashing (SH), spherical hashing, DeepBit, deep hashing (DH), and binary generative adversarial hashing (BGAN)) on the CIFAR-10 data set are shown below:
TABLE 1 Mean average precision (mAP) results on the CIFAR-10 data set after rotating the images by 90 degrees
[Table 1 is provided as an image in the original publication; its values are not recoverable here.]
TABLE 2 Training and testing times for deep hashing (DH), binary generative adversarial hashing (BGAN), and self-supervised adversarial hashing (SHGan)
[Table 2 is provided as an image in the original publication; its values are not recoverable here.]
TABLE 3 Mean average precision of 12-bit hash codes obtained with 90-degree and 180-degree rotations in self-supervised adversarial hashing

Rotation angle    mAP
90                0.495
180               0.510
Example 2:
the embodiment 1 comprises the following concrete steps:
Step 1: for the database points X = {x_i}, i = 1, …, N, feature vectors are extracted from the pool5 layer of a VGG model, the cosine distances between them are computed, and they are sorted from small to large with a k-nearest-neighbor (KNN) search. According to the cosine similarity of each image, its K1 nearest neighbors are taken as its neighborhood, and an initial matrix S1 is obtained, computed as follows:

S1_ij = 1 if x_j ∈ K1-NN(x_i), and S1_ij = 0 otherwise

where x_i and x_j are the feature vectors of the images and K1-NN(x_i) denotes the K1 nearest neighbors of x_i. On the basis of S1, the corresponding columns s_i of S1 are compared and used to construct S2, as follows:

S2_ij = 1 if s_j ∈ K2-NN(s_i), and S2_ij = 0 otherwise

where K2-NN(s_i) denotes the K2 nearest neighbors of s_i. Finally, these two matrices are combined into S, computed as follows:

S_ij = 1 if S1_ij = 1 or S2_ij = 1, and S_ij = −1 otherwise
Step 2: the picture is rotated by a certain angle and then input into the encoder E, which uses a structure similar to VGG19 and comprises five convolutional layers, two fully connected layers, and one hash layer. For feature comparison, a fully connected layer is added at the end. Using the relationship between image neighborhood structures, i.e., the relationship between the hash codes B and the semantic similarity matrix S, the following objective function is proposed so that the learned hash codes approximate the original data distribution in the projection space as closely as possible:

min_E  l_s = ||(1/L)·B·B^T − S||²

where L is the length of the hash code, S is the similarity matrix, and E indicates that the objective function optimizes the encoder, which generates the hash codes B; optimizing l_s makes images that are similar in the original space have similar hash codes when mapped into the hash space.
Step 3: in the generator G, the hash code B is input as "noise" to generate a new image. Specifically, the hash code B is input into a fully connected layer of size 8 × 8 × 256, followed by three 5 × 5 deconvolution layers and one 1 × 1 deconvolution layer, with 256, 128, 32, and 3 kernels, respectively. Between the feature vectors of the image I_G generated by the generator and the original image I, an objective function is proposed, defined as follows:

l_f1 = (1/(w·h)) · ||Ψ_D(I_G) − Ψ_D(I)||²

where Ψ(·) denotes the convolution-activated feature vector, w and h are the sizes of the corresponding features, and the subscript D indicates that the features come from the discriminator. However, the generator generates the new image from the hash code of the rotated image, so if low-level semantic information in the image is considered, a large gap exists between the feature vectors of the new image and the original image. To address this, an objective function is set between the feature vectors of the rotated image I_R and the original image I; its purpose is to make the feature vectors of the same image as similar as possible regardless of rotation, thereby also reducing the gap between the original image and the image I_G generated from the rotated image's hash code obtained from the encoder. The feature vectors are obtained from the discriminator, so this loss function is used to optimize the encoder and the discriminator. The set objective function is as follows:

l_f2 = ||Ψ_D(I_R) − Ψ_D(I)||²

where I_R is the rotated image and I is the original image; the purpose of this design is to ignore low-level semantic information, so the feature loss of the low layers is not considered. The total feature loss is l_f = l_f1 + γ·l_f2, where γ is a weight parameter.
Step 4: the original picture I and the pseudo picture I_G are input into the discriminator. Following the structure of adversarial learning, a discriminator D is set up to judge whether a picture is real or fake, thereby optimizing the generator; optimizing a generative adversarial network is a minimax game. The discriminator D consists of four convolutional layers with 32, 128, 256, and 256 kernels, followed by a fully connected layer of size 1024, with ELU as the activation function. The generative model is essentially a maximum-likelihood estimate used to model data with a specific distribution: its function is to capture the distribution of the sample data and, through the transformation of the parameters in the maximum-likelihood estimation, convert the distribution of the original input information into samples with the specified distribution. The standard objective function of a generative adversarial network is used:

l_d' = min_G max_D E_{I~p_data}[log D(I)] + E_b[log(1 − D(G(b)))]

where G denotes adjusting the parameters in the generator, D(·) denotes the output of the last layer of the discriminator, and G(·) denotes the output of the last layer of the generator. To make the effect of the generative adversarial network more pronounced, random noise is also input into the generator, and the generator and the discriminator are optimized with the following loss function:

l_d'' = min_G max_D E_z[log(1 − D(G(z)))]

where z is random noise, and l_d = l_d' + l_d''.
The overall loss function is the sum of the loss terms above, weighted by α and β:

L = l_s + α·l_f + β·l_d   (9)
Step 5: the network is optimized using (9) above, and a hash layer is added after the last fully connected layer to obtain the hash codes; the parameters W, θ, η, and ξ must be learned, and the calculation is as follows:
b_i = sgn(E(I_i^R; W, θ))   (10)

where I_i^R denotes the input rotated picture, and W and θ denote parameters in the encoder network.

I_G = G(b_i; η)   (11)

where η denotes a parameter in the generator G.

o = D(I, I_G; ξ)   (12)

where ξ denotes a parameter in the discriminator D; the objective function is optimized using back-propagation and stochastic gradient descent.
Example 3:
experiments were performed on Cifar-10. Cifar-10 is a dataset compiled by Alex krizhevsky and Ilya sutskver. It contains 60000 images (32 × 32)10 categories of 6000 pictures each. In Cifar-10, 1000 pictures are randomly drawn at each class as a training set and 100 pictures are drawn as a test set.
The evaluation metrics mean average precision (mAP) and average precision (AP) are used to evaluate the method. For each query, the average precision is computed over the top k results, and the mAP is the mean of the average precision over all queries. The average precision is calculated as follows:

AP = (1/N) · Σ_{k=1}^{K} P(k) · δ(k)

where N is the number of instances in the database that are relevant to the query's ground truth, P(k) is the precision over the first k instances, and δ(k) = 1 when the k-th instance is relevant to the query, δ(k) = 0 otherwise (a computation sketch follows below).
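A small sketch of the AP/mAP computation follows; the names are illustrative, and N defaults to the number of relevant items within the top k when the database-wide count is not supplied.

    import numpy as np

    def average_precision(relevant, k, n_relevant=None):
        """AP = (1/N) * sum_k P(k) * delta(k) over the top-k ranked results.

        relevant: boolean mask over one query's ranking, True where the
        result shares the query's ground-truth label.
        """
        rel = np.asarray(relevant[:k], dtype=float)
        n = n_relevant if n_relevant is not None else rel.sum()
        if n == 0:
            return 0.0
        precision_at_k = np.cumsum(rel) / (np.arange(k) + 1)   # P(k)
        return float((precision_at_k * rel).sum() / n)

    def mean_average_precision(relevance_lists, k):
        """mAP: mean of the average precision over all queries."""
        return float(np.mean([average_precision(r, k) for r in relevance_lists]))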
Results on the cifar-10 dataset:
first, let K1 be 20 and K2 be 30 to obtain the semantic similarity matrix S. Fig. 2 shows a part of the data in the matrix. Then, the minimum batch size was set to 256, and the learning rate was set to 0.0001. Setting alpha-1, beta-1 and gamma-3. The rotation angle is 90 degrees.
Table 1 shows the mAP results for 12, 24, 32, and 48 bits. The results show that the mAP for 12, 24, and 48 bits improves by 9.4%, 9.8%, and 5.2%, respectively. This indicates that the method performs better at fewer bits and that the proposed hash codes better represent the high-level semantic information of an image, verifying the above inference. To further verify this idea, the images were rotated by 180 degrees and the experiment repeated; as shown in Table 3, the result for the 12-bit hash code improves by 10.9%.
The training and testing times of deep hashing (DH), binary generative adversarial hashing (BGAN), and self-supervised adversarial hashing (SHGan) were further compared, as shown in Table 2. BGAN and SHGan have more parameters, so both their training and testing times are longer than those of DH. However, owing to the advantages of the hashing algorithm, BGAN and SHGan generate hash codes very quickly.
The influence of the parameter γ on the experimental results was also investigated. As shown in FIG. 3, γ has little influence on the results.
Following the pixel-level loss function in binary generative adversarial hashing, a pixel-level loss was added to the self-supervised adversarial hashing:

l_p = Σ_{i,j} (I_ij − Î_ij)²

where I_ij and Î_ij denote the pixels of the original image and of the generated pseudo image, respectively. Since a higher number of bits represents the pixel information of an image better, the experimental results of the 32-bit and 48-bit hash codes are compared; the results are shown in FIG. 4. The mAP decreases significantly when the pixel-level loss is included, which further shows that the invention enables the neural network to learn the high-level semantic information of images.
In summary, the present invention provides a self-supervised hashing algorithm based on a generative adversarial network, called self-supervised adversarial hashing. It consists of an encoder, a generator, and a discriminator. A loss function composed of an approximate semantic similarity loss, a feature loss, and an adversarial loss is designed to preserve the similarity between the images and the hash codes. The learned hash codes better represent the high-level semantic information of images, thereby improving the accuracy of image retrieval. Experimental results on the CIFAR-10 data set show that the proposed self-supervised adversarial hashing achieves higher performance. The invention provides a self-supervised learning method that designs the objective function using self-supervision information based on rotation or other transformations; generative adversarial networks are among the most promising self-supervised learning methods, as they can effectively generate synthetic data from the latent space that resembles the training data.

Claims (8)

1. A large-scale image retrieval method based on a self-supervised adversarial hashing algorithm, characterized by comprising the following steps:
S1: acquiring image data comprising a training set and a test set;
S2: optimizing the encoder using the training set;
S3: rotating the test-set images and inputting them into the encoder optimized in S2 to obtain hash codes;
S4: computing the Hamming distances between the hash codes obtained in S3 and the training-set hash codes from S2, sorting the distances from small to large, and outputting the first k retrieval results to complete the retrieval.
2. The large-scale image retrieval method according to claim 1, wherein in S2: the encoder uses a structure similar to VGG19, comprising five convolutional layers, two fully connected layers, and one hash layer; a fully connected layer is added at the end; using the relationship between image neighborhood structures, i.e., the relationship between the hash codes and the semantic similarity matrix, the following objective function is proposed so that the learned hash codes approximate the original data distribution in the projection space:

min_E  l_s = ||(1/L)·B·B^T − S||²

where L is the length of the hash code and the encoder generates the hash codes B; optimizing l_s makes images that are similar in the original space have similar hash codes when mapped into the hash space.
3. The large-scale image retrieval method of claim 1, wherein the encoder optimization in S2 specifically comprises:
S2-1: obtaining feature vectors of the training-set images, computing the pairwise cosine distances, and sorting them to obtain a similarity ranking;
S2-2: analyzing the similarity ranking and setting a threshold to obtain a similarity matrix;
S2-3: rotating the training-set images and inputting them into the encoder to obtain hash codes;
S2-4: inputting the hash codes into the generator to obtain pseudo images;
S2-5: inputting the pseudo images and the real images into the discriminator simultaneously for adversarial training;
S2-6: optimizing the encoder, the generator, and the discriminator according to the objective function; the optimized encoder, generator, and discriminator constitute the self-supervised adversarial hashing model.
4. The large-scale image retrieval method according to claim 3, wherein S2-1 is specifically: for the database points X = {x_i}, i = 1, …, N, feature vectors are extracted from the pool5 layer of a VGG model; the cosine distances between them are then computed and sorted from small to large with a k-nearest-neighbor (KNN) search to obtain the similarity ranking.
5. The large-scale image retrieval method according to claim 3, wherein in S2-2: according to the cosine similarity of each image, its K1 nearest neighbors are taken as its neighborhood, and an initial matrix S1 is obtained, computed as follows:

S1_ij = 1 if x_j ∈ K1-NN(x_i), and S1_ij = 0 otherwise

where x_i and x_j are the feature vectors of the images and K1-NN(x_i) denotes the K1 nearest neighbors of x_i; on the basis of S1, the corresponding columns s_i of S1 are compared and used to construct S2, as follows:

S2_ij = 1 if s_j ∈ K2-NN(s_i), and S2_ij = 0 otherwise

where K2-NN(s_i) denotes the K2 nearest neighbors of s_i; finally, these two matrices are combined into S, computed as follows:

S_ij = 1 if S1_ij = 1 or S2_ij = 1, and S_ij = −1 otherwise
6. The large-scale image retrieval method according to claim 3, wherein in S2-4: the generator consists of one fully connected layer and four deconvolution layers, and the hash code is input into the generator as noise to generate a new image; specifically, the hash code is fed into a fully connected layer of size 8 × 8 × 256, followed by three 5 × 5 deconvolution layers and one 1 × 1 deconvolution layer, with 256, 128, 32, and 3 kernels, respectively; between the feature vectors of the image I_G generated by the generator and the original image I, an objective function is proposed, defined as follows:

l_f1 = (1/(w·h)) · ||Ψ_D(I_G) − Ψ_D(I)||²

where Ψ(·) denotes the convolution-activated feature vector and w and h are the sizes of the corresponding features; since the generator generates the new image from the hash code of the rotated image, an objective function is set between the feature vectors of the rotated image I_R and the original image I, the feature vectors being obtained from the discriminator; this loss function is used to optimize the encoder and the discriminator; the finally set objective function is therefore:

l_f2 = ||Ψ_D(I_R) − Ψ_D(I)||²

where l_f = l_f1 + γ·l_f2 and γ is a weight parameter.
7. The large-scale image retrieval method according to claim 3, wherein in S2-5: following the structure of adversarial learning, a discriminator D is set up to judge whether an image is real or fake, thereby optimizing the generator G; the discriminator D consists of four convolutional layers with 32, 128, 256, and 256 kernels, followed by a fully connected layer of size 1024, with ELU as the activation function; the standard objective function of a generative adversarial network is used:

l_d' = min_G max_D E_{I~p_data}[log D(I)] + E_b[log(1 − D(G(b)))]

to make the effect of the generative adversarial network more pronounced, random noise is input into the generator, and the generator and the discriminator are optimized with the following loss function:

l_d'' = min_G max_D E_z[log(1 − D(G(z)))]

where z is random noise and l_d = l_d' + l_d''.
8. The large-scale image retrieval method according to claim 3, wherein in S2-6: the overall loss function, i.e., the objective function, is the sum of the loss terms above, weighted by α and β:

L = l_s + α·l_f + β·l_d   (9)

network optimization is performed using (9), a hash layer is added after the last fully connected layer to obtain the hash codes, and the parameters W, θ, η, and ξ must be learned; the calculation is as follows:

b_i = sgn(E(I_i^R; W, θ))   (10)

where I_i^R denotes the input rotated picture, and W and θ denote parameters in the encoder network;

I_G = G(b_i; η)   (11)

where η denotes a parameter in the generator;

o = D(I, I_G; ξ)   (12)

where ξ denotes a parameter in the discriminator; the objective function is optimized using back-propagation and stochastic gradient descent.
CN202110531130.8A 2021-05-16 2021-05-16 Large-scale image retrieval method based on a self-supervised adversarial hashing algorithm Active CN113191445B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110531130.8A CN113191445B (en) 2021-05-16 2021-05-16 Large-scale image retrieval method based on a self-supervised adversarial hashing algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110531130.8A CN113191445B (en) 2021-05-16 2021-05-16 Large-scale image retrieval method based on a self-supervised adversarial hashing algorithm

Publications (2)

Publication Number Publication Date
CN113191445A 2021-07-30
CN113191445B CN113191445B (en) 2022-07-19

Family

ID=76981846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110531130.8A Active 2021-05-16 Large-scale image retrieval method based on a self-supervised adversarial hashing algorithm

Country Status (1)

Country Link
CN (1) CN113191445B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326390A (en) * 2021-08-03 2021-08-31 中国海洋大学 Image retrieval method based on depth feature consistent Hash algorithm
CN113946710A (en) * 2021-10-12 2022-01-18 浙江大学 Video retrieval method based on multi-mode and self-supervision characterization learning
CN115841589A (en) * 2022-11-08 2023-03-24 河南大学 Unsupervised image translation method based on generation type self-attention mechanism

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063112A (en) * 2018-07-30 2018-12-21 成都快眼科技有限公司 A kind of fast image retrieval method based on multi-task learning deep semantic Hash, model and model building method
CN109918528A (en) * 2019-01-14 2019-06-21 北京工商大学 A kind of compact Hash code learning method based on semanteme protection
CN109960737A (en) * 2019-03-15 2019-07-02 西安电子科技大学 Remote Sensing Images search method of the semi-supervised depth confrontation from coding Hash study
CN110110128A (en) * 2019-05-06 2019-08-09 西南大学 The discrete hashing image searching system of quickly supervision for distributed structure/architecture
CN110516095A (en) * 2019-08-12 2019-11-29 山东师范大学 Weakly supervised depth Hash social activity image search method and system based on semanteme migration
CN111597298A (en) * 2020-03-26 2020-08-28 浙江工业大学 Cross-modal retrieval method and device based on deep confrontation discrete hash learning
CN112199520A (en) * 2020-09-19 2021-01-08 复旦大学 Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix
CN112214623A (en) * 2020-09-09 2021-01-12 鲁东大学 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method
CN112214570A (en) * 2020-09-23 2021-01-12 浙江工业大学 Cross-modal retrieval method and device based on counterprojection learning hash

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063112A (en) * 2018-07-30 2018-12-21 成都快眼科技有限公司 A kind of fast image retrieval method based on multi-task learning deep semantic Hash, model and model building method
CN109918528A (en) * 2019-01-14 2019-06-21 北京工商大学 A kind of compact Hash code learning method based on semanteme protection
CN109960737A (en) * 2019-03-15 2019-07-02 西安电子科技大学 Remote Sensing Images search method of the semi-supervised depth confrontation from coding Hash study
CN110110128A (en) * 2019-05-06 2019-08-09 西南大学 The discrete hashing image searching system of quickly supervision for distributed structure/architecture
CN110516095A (en) * 2019-08-12 2019-11-29 山东师范大学 Weakly supervised depth Hash social activity image search method and system based on semanteme migration
CN111597298A (en) * 2020-03-26 2020-08-28 浙江工业大学 Cross-modal retrieval method and device based on deep confrontation discrete hash learning
CN112214623A (en) * 2020-09-09 2021-01-12 鲁东大学 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method
CN112199520A (en) * 2020-09-19 2021-01-08 复旦大学 Cross-modal Hash retrieval algorithm based on fine-grained similarity matrix
CN112214570A (en) * 2020-09-23 2021-01-12 浙江工业大学 Cross-modal retrieval method and device based on counterprojection learning hash

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUAN CAO ET AL.: "Learning to Hash with Dimension Analysis based Quantizer for Image Retrieval", IEEE *
SHI Hongyuan et al.: "Reinforced Adversarial Generative Hashing Method for Image Retrieval", Journal of Chinese Computer Systems (《小型微型计算机系统》) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326390A (en) * 2021-08-03 2021-08-31 中国海洋大学 Image retrieval method based on depth feature consistent Hash algorithm
CN113326390B (en) * 2021-08-03 2021-11-02 中国海洋大学 Image retrieval method based on depth feature consistent Hash algorithm
CN113946710A (en) * 2021-10-12 2022-01-18 浙江大学 Video retrieval method based on multi-mode and self-supervision characterization learning
CN113946710B (en) * 2021-10-12 2024-06-11 浙江大学 Video retrieval method based on multi-mode and self-supervision characterization learning
CN115841589A (en) * 2022-11-08 2023-03-24 河南大学 Unsupervised image translation method based on generation type self-attention mechanism

Also Published As

Publication number Publication date
CN113191445B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
CN113190699B (en) Remote sensing image retrieval method and device based on category-level semantic hash
CN107480261B (en) Fine-grained face image fast retrieval method based on deep learning
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN107122809B (en) Neural network feature learning method based on image self-coding
CN113191445B (en) Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
CN111898689A (en) Image classification method based on neural network architecture search
Islam et al. InceptB: a CNN based classification approach for recognizing traditional bengali games
CN111125411A (en) Large-scale image retrieval method for deep strong correlation hash learning
CN108984642A (en) A kind of PRINTED FABRIC image search method based on Hash coding
CN113657561A (en) Semi-supervised night image classification method based on multi-task decoupling learning
CN109960732B (en) Deep discrete hash cross-modal retrieval method and system based on robust supervision
CN112686376A (en) Node representation method based on timing diagram neural network and incremental learning method
CN114118369A (en) Image classification convolution neural network design method based on group intelligent optimization
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
Bai et al. Learning high-level image representation for image retrieval via multi-task dnn using clickthrough data
CN111079840B (en) Complete image semantic annotation method based on convolutional neural network and concept lattice
CN116977725A (en) Abnormal behavior identification method and device based on improved convolutional neural network
CN111507472A (en) Precision estimation parameter searching method based on importance pruning
CN114168782B (en) Deep hash image retrieval method based on triplet network
Hao et al. Architecture self-attention mechanism: Nonlinear optimization for neural architecture search
CN113283530B (en) Image classification system based on cascade characteristic blocks
CN112446432B (en) Handwriting picture classification method based on quantum self-learning self-training network
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph
CN113887653A (en) Positioning method and system for tightly-coupled weak supervised learning based on ternary network
CN114170426A (en) Algorithm model for classifying rare tumor category small samples based on cost sensitivity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant