CN111832706A - Hash center-based continuous learning method

Info

Publication number: CN111832706A
Application number: CN202010649331.3A
Authority: CN
Inventors: 郭宝龙; 陈志杰; 廖楠楠; 李�诚; 莫文强; 张素婷
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2020-10-27

Abstract

The invention relates to a hash center-based continuous learning method. Similar data pairs and non-similar data pairs in an image data set are input into a convolutional neural network for feature learning. The result of the feature learning passes through a hash layer comprising three fully connected layers, each of which comprises a row of neurons and an activation function. Under the central similarity constraint, the hash codes of the similar data pairs converge to a common hash center and the hash codes of the dissimilar data pairs converge to different hash centers, and the hash layer converts the continuous deep representation into a K-dimensional representation. After the hash layer outputs a real-valued vector, an activation function binarizes the K-dimensional representation into a K-bit binary hash code. The method learns richer image features, loses less detailed information of the data, generates binary hash codes with higher accuracy and strong distinguishability, and can improve the hash retrieval performance for images.

Description

Hash center-based continuous learning method

(I) technical field:

the invention relates to a machine learning method, in particular to a continuous learning method based on a hash center.

(II) background art:

Image retrieval means that, given a query image, similar images are quickly and accurately returned from a specified database as a sequence ranked by a similarity measure. According to how the image content is described, image retrieval falls into two categories: Text-Based Image Retrieval (KBIR) and Content-Based Image Retrieval (CBIR).

Text-based image retrieval systems rely on keyword expressions to match query images to database images and then return the results that best match the keyword content. Such a system is relatively simple to implement and relatively fast at retrieval, but its retrieval quality depends directly on the quality of the labels attached to the images.

The image retrieval system based on the content finds the nearest neighbor image of the query image from a given database by using the extracted image visual features and a preset similarity measurement method, and the research content of the image retrieval system comprises visual content representation, feature compression, data indexing, nearest neighbor searching and other aspects.

The task of large-scale image retrieval is an important research branch in the field of computer vision, and the hash retrieval method has received more and more attention in order to ensure retrieval quality and calculation efficiency. The idea of the hash retrieval method is to convert high dimensional data into compact binary hash codes and generate similar binary hash codes for similar data items. At present, the deep hash method combining deep learning and hash coding obtains a prominent effect in an actual retrieval task by virtue of strong image feature learning and hash learning capabilities.
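The idea of comparing compact binary codes can be illustrated with a minimal sketch. The 8-bit codes, the function names and the use of NumPy below are assumptions made for illustration only; they are not part of the described method.

```python
import numpy as np

def hamming_distance(query, database):
    """Number of differing bits between one K-bit code and each database code."""
    return np.count_nonzero(database != query, axis=1)

def rank_by_hamming(query, database):
    """Database indices sorted from nearest to farthest (stable for ties)."""
    return np.argsort(hamming_distance(query, database), kind="stable")

# Hypothetical 8-bit hash codes: rows 0 and 1 stand for similar images.
database = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 1, 0, 1],
])
query = np.array([1, 0, 1, 1, 0, 0, 1, 0])

distances = hamming_distance(query, database)
ranking = rank_by_hamming(query, database)
```

Because the distance is a bit count, similar items with similar codes appear first in the ranking, which is what makes hash retrieval cheap on large databases.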

Due to the explosive growth of multimedia data, KBIR technology based on manual labeling can no longer meet the retrieval requirements of massive-data scenarios. CBIR technology has developed rapidly over the past decade, but as the volume of image data increases sharply and the dimensionality of image features keeps growing, conventional CBIR technology also faces problems. For example, linear scanning gives high precision but is unsuitable for retrieval tasks with large data volumes; traditional tree-based index structures retrieve quickly on low-dimensional data but suffer from the curse of dimensionality as the data dimension grows.

The defects and shortcomings of the existing image retrieval technology based on Hash coding mainly comprise:

(1) A semantic gap exists between image hash codes and high-level semantics

In image retrieval, judging the similarity of two images by low-level visual features can give a different result from judging it by semantic features. For example, two images may be judged similar according to information such as color, shape and texture, yet judged non-similar according to semantic information. This is because the understanding of low-level visual features differs from that of high-level semantics during learning.

(2) Validity and timeliness of hash codes

In large-scale image retrieval, validity and timeliness are important indexes for evaluating hash codes. Validity means that the learned binary codes distinguish images of different semantic categories well; timeliness means that the learned binary codes can distinguish valid data in real time. In actual image retrieval, validity and timeliness conflict with each other and need to be balanced appropriately.

(3) Discrete optimization problem in hash learning

The purpose of hash learning is to map image features to a binary space of a given dimension, and the quality of the hash mapping is determined by the discrete optimization problem of hash learning. Image hash retrieval performance can be improved by reducing the quantization error and preserving the semantic similarity of the binary codes. The performance of large-scale hash image retrieval can therefore be improved through large-scale discrete optimization. However, the general problem is NP-hard (non-deterministic polynomial-time hard), so the discrete optimization problem cannot be solved efficiently for large-scale image data.

Existing deep hash learning methods mainly learn a continuous hash representation from the similarity of data pairs, but the global distribution of large-scale data cannot be learned effectively from the local information of data pairs alone. Meanwhile, because of the ill-posed gradient encountered when optimizing the sign activation function, most methods must first learn a continuous representation and then generate binary hash codes in a separate binarization step, so that original feature details of the data are seriously lost and retrieval performance is reduced.

(III) the invention content:

the technical problem to be solved by the invention is to provide a hash center-based continuous learning method that learns richer image features and loses less detailed information of the data, so that the generated binary hash codes have higher accuracy and strong distinguishability and the performance of image hash retrieval is improved.

The technical scheme of the invention is as follows:

a continuous learning method based on a hash center comprises the following steps:

step 1, preparing an image data set (such as common image data sets disclosed by CIFAR-10, MS COCO, NUS-WIDE and the like) in the image retrieval field, wherein the image data set contains similar data pairs and non-similar data pairs;

step 2, inputting the similar data pairs and the non-similar data pairs in the image data set into a Convolutional Neural Network (CNN) for feature learning;

step 3, passing the result of the feature learning through the hash layer (fch), which converts the continuous deep representation into a K-dimensional representation;

step 4, after the hash layer outputs a real-valued vector, using an activation function to binarize the K-dimensional representation into a K-bit binary hash code.

In step 2, the Convolutional Neural Network (CNN) is AlexNet or ResNet.

In step 3, after passing through the hash layer, the hash codes of the similar data pairs converge to a common hash center under the central similarity constraint, the hash codes of the dissimilar data pairs converge to different hash centers under the central similarity constraint, and the convergence targets are hash centers in the Hamming space; central similarity learning can store global similarity information between data pairs in the hash function, the hash centers being a set of points that are spread out in the Hamming space with sufficient distance from each other.
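The text does not give the formula of the central similarity constraint. As a minimal sketch, one common way to pull a relaxed code toward a 0/1 hash center is a bitwise binary cross-entropy; that choice, the function name and the sample values below are assumptions rather than the method's stated loss.

```python
import numpy as np

def central_similarity_loss(h, center, eps=1e-7):
    """Binary cross-entropy between a 0/1 hash center and a relaxed code.

    h      : (K,) real vector in (-1, 1), for example a tanh output
    center : (K,) vector of 0/1 bits
    """
    p = np.clip((h + 1.0) / 2.0, eps, 1.0 - eps)   # map (-1, 1) to (0, 1)
    return float(-np.mean(center * np.log(p) + (1 - center) * np.log(1 - p)))

center = np.array([1, 0, 1, 0])
near = np.array([0.9, -0.9, 0.8, -0.7])   # agrees with the center in every bit
far = -near                               # disagrees with the center in every bit

loss_near = central_similarity_loss(near, center)
loss_far = central_similarity_loss(far, center)
```

Under this assumed loss, codes of images that share a center are driven toward the same point, and codes assigned to different centers are driven apart because the centers themselves are far apart.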

In step 3, the hash layer comprises three fully-connected layers, each fully-connected layer comprises a row of neurons and an activation function, the activation functions of the first two fully-connected layers are ReLU functions which can rapidly perform gradient descent, and the activation function of the third fully-connected layer is tanh function which limits the output to (-1, 1).

In step 4, the activation function is an sgn function.
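The hash layer and the binarization of step 4 can be sketched as a plain forward pass. The layer widths (4096 inputs, 512 hidden units), the random weights and the convention of mapping zero to +1 are assumptions for illustration; only the ReLU, ReLU, tanh ordering and the sgn binarization come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_hash_layer(in_dim, hidden_dim, k):
    """Random weights for three fully connected layers (sizes are assumptions)."""
    dims = [in_dim, hidden_dim, hidden_dim, k]
    return [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def hash_layer(features, params):
    """Map CNN features to a K-dimensional real vector in (-1, 1)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    x = np.maximum(features @ w1 + b1, 0.0)   # fully connected layer 1 + ReLU
    x = np.maximum(x @ w2 + b2, 0.0)          # fully connected layer 2 + ReLU
    return np.tanh(x @ w3 + b3)               # fully connected layer 3 + tanh

def binarize(h):
    """sgn binarization; zero is mapped to +1 so that every bit is defined."""
    return np.where(h >= 0, 1, -1)

params = init_hash_layer(in_dim=4096, hidden_dim=512, k=64)
features = rng.normal(size=(2, 4096))   # stand-in for CNN features of two images
h = hash_layer(features, params)
codes = binarize(h)
```

The tanh output keeps every coordinate close to the range of the final code, so the sgn step changes the representation less than it would after an unbounded layer.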

The method for generating the hash center comprises the following specific steps:

s1, inputting the number m of hash centers to be generated and the dimension K of a Hamming space;

s2, if K = 2ⁿ, constructing a K × K Hadamard matrix H_K,

wherein n is a positive integer greater than 1 and the Hadamard matrix is a binary matrix whose elements are -1 or 1;

then constructing a second matrix from H_K according to the properties of the Hadamard matrix, wherein T denotes the matrix transpose;

(several existing methods for constructing Hadamard matrices, such as the Kronecker product, Paley, circulant matrix and Walsh transform methods, are limited to K = 2ⁿ, whereas in hash code-based image retrieval the hash code length is not always chosen to be a power of 2; therefore, when the hash code length is, for example, 12 bits or 48 bits, hash centers can no longer be generated from such a Hadamard matrix;)

if K = 2ⁿ·3, performing the procedure for fast generation of a 2ⁿ·3 Hadamard matrix;

s3, iterating over i from 1 to m:

if m ≤ K, then

directly taking any row from the matrix as the hash center c_i;

if K < m ≤ 2K, then

reconstructing the hash center c_i;

if neither of the above conditions is met, randomly setting half of the bits of c_i to 0 and the other half to 1; ending the iteration;

s4, replacing every -1 in the generated hash centers with 0;

s5, outputting the hash centers.
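Steps s1 to s5 can be sketched as follows for the power-of-two case. The formula of the second matrix is not reproduced in the text, so stacking H_K on top of -H_K is an assumption, as are the Sylvester construction, the function names and the choice of the first m rows. The 2ⁿ·3 branch is not implemented here; other values of K fall through to the random half-ones fallback, which does not guarantee distinct centers.

```python
import numpy as np

def sylvester_hadamard(k):
    """K x K Hadamard matrix for K a power of two (Sylvester construction)."""
    if k < 1 or (k & (k - 1)) != 0:
        raise ValueError("K must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < k:
        h = np.block([[h, h], [h, -h]])
    return h

def generate_hash_centers(m, k, seed=0):
    """Return an (m, K) array of 0/1 hash centers following s1 to s5."""
    power_of_two = k >= 1 and (k & (k - 1)) == 0
    if power_of_two and m <= 2 * k:
        h = sylvester_hadamard(k)
        stacked = np.vstack([h, -h])   # assumed 2K-row matrix: H_K on top of -H_K
        centers = stacked[:m]          # first m rows; the text allows any rows
    else:
        # Fallback of s3: each center has a random half of its bits set to 1.
        rng = np.random.default_rng(seed)
        centers = -np.ones((m, k), dtype=int)
        for row in centers:
            row[rng.choice(k, size=k // 2, replace=False)] = 1
    return np.where(centers == -1, 0, centers)   # s4: replace -1 with 0

centers = generate_hash_centers(m=10, k=8)
# Pairwise Hamming distances between all distinct centers.
distances = [int(np.count_nonzero(a != b))
             for i, a in enumerate(centers) for b in centers[i + 1:]]
```

Any two distinct rows drawn from H_K and -H_K differ in at least K/2 positions, which is the "sufficient distance" property that the hash centers are required to have.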

The procedure for fast generation of a 2ⁿ·3 Hadamard matrix is as follows:

s2.1, inputting K, and initializing the first four rows x1, x2, x3 and x4 of the matrix;

s2.2, computing m = log₂(K/3); returning if m is not an integer greater than 1, otherwise setting m ← m - 1;

s2.3, iterating over l from m down to 1:

recursively and randomly choosing xi to satisfy d_row(Y, xi, j) = 6;

exiting and finishing the iteration;

s2.4, outputting H_K.

The invention has the beneficial effects that:

1. The method continuously optimizes the activation function and learns central similarity, so that richer image features are learned and less detailed information of the data is lost; the resulting binary hash codes are more accurate, the accuracy at each bit is stably higher than that of other methods, and the method is fast.

2. The invention generates hash codes by central similarity learning, so that the hash codes of similar data converge to a common hash center while dissimilar data stay far from that hash center, which visibly yields stronger distinguishability.

3. The continuous learning method addresses the ill-posed gradient problem in the non-convex optimization of deep networks with a non-smooth sign activation: the original function is smoothed and converted into a different function, and the amount of smoothing is then gradually reduced during training, so that the network converges quickly.

4. The invention introduces a quantization loss L_Q to refine the generated hash codes, so that the generated binary hash codes converge to the hash centers; this overcomes the problem in the prior art that the generated hash codes cannot fully converge to the hash centers, and improves image hash retrieval performance.

5. The invention rapidly generates 2ⁿ·3 Hadamard matrices by means of the best offset matrix, which increases the flexibility in choosing the hash code length and broadens the range of application of the invention.
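Items 3 and 4 above can be illustrated numerically. The text gives neither the smoothing function nor the form of L_Q, so the use of tanh(beta·x) as the smoothed sign function, the doubling schedule for beta and the squared form of the quantization loss are all assumptions in this sketch.

```python
import numpy as np

def smoothed_sign(x, beta):
    """tanh(beta * x): a differentiable stand-in for sgn(x) (assumed smoothing)."""
    return np.tanh(beta * x)

def quantization_loss(h):
    """Assumed form of L_Q: mean squared gap between |h| and 1."""
    return float(np.mean((np.abs(h) - 1.0) ** 2))

x = np.array([-1.5, -0.2, 0.3, 2.0])        # pre-activation values
betas = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]    # assumed schedule: less smoothing per stage

# Largest gap to the true sign function, and the assumed L_Q, at each stage.
gaps = [float(np.max(np.abs(smoothed_sign(x, b) - np.sign(x)))) for b in betas]
losses = [quantization_loss(smoothed_sign(x, b)) for b in betas]
```

As beta grows, the smooth activation approaches the sign function and the assumed quantization loss shrinks toward zero, while the gradient stays defined at every stage of training.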

(IV) description of the drawings:

FIG. 1 is a block diagram of the hash center-based continuous learning method;

FIG. 2 is a graph of detection accuracy for each bit on the NUS-WIDE data set according to the present invention and other methods;

FIG. 3 is a graph of detection accuracy for each bit on the MS COCO data set according to the present invention and other methods;

FIG. 4 is a graph of detection accuracy for each bit on the CIFAR-10 data set according to the present invention and other methods;

FIG. 5 is an accuracy-recall curve on the NUS-WIDE data set for the present invention and other methods;

FIG. 6 is an accuracy-recall curve on the MS COCO data set for the present invention and other methods;

FIG. 7 is a graph of the accuracy of the present invention and other methods in returning TOP-K on the NUS-WIDE data set (64 bits);

FIG. 8 is a graph of the accuracy of the present invention and other methods in returning TOP-K on the MS COCO data set (64 bits);

FIG. 9 is a graph of loss error and MAP for the present invention and the HCN method when trained on the CIFAR-10 data set;

FIG. 10 is a two-dimensional visualization result of the hash codes of the present invention;

FIG. 11 is a two-dimensional visualization result of HashNet hash codes;

FIG. 12 is a visual comparison of the query results of the present invention and the HCN method on both the NUS-WIDE and MS COCO data sets;

in the figure, 1 is an image similarity data pair; 2 is image non-similar data pair; 3 is a hash layer; and 4 is Hamming space.

(V) detailed embodiment:

the hash center-based continuous learning method comprises the following steps (shown in fig. 1):

step 1, preparing an image data set (adopting a NUS-WIDE image data set) in the image retrieval field, wherein the image data set contains a similar data pair 1 and a non-similar data pair 2;

step 2, inputting the similar data pair 1 and the dissimilar data pair 2 in the image data set into a Convolutional Neural Network (CNN) for feature learning;

step 3, passing the result of the feature learning through the hash layer 3 (fch), which converts the continuous deep representation into a K-dimensional representation;

step 4, after the hash layer 3 outputs a real-valued vector, using an activation function to binarize the K-dimensional representation into a K-bit binary hash code.

In step 2, the Convolutional Neural Network (CNN) is AlexNet.

In step 3, after passing through the hash layer 3, the hash codes of the similar data pairs 1 converge to a common hash center after passing through the central similarity constraint, the hash codes of the dissimilar data pairs 2 converge to different hash centers after passing through the central similarity constraint, and the convergence targets are the hash centers in the hamming space 4 (C1 and C2 in fig. 1); the center similarity learning may save global similarity information between pairs of data into a hash function, whereas the hash center is a set of points spread out in the hamming space 4 and having a sufficient distance between each other.

In step 3, the hash layer 3 contains three fully-connected layers, each fully-connected layer contains a row of neurons and an activation function, the activation functions of the first two fully-connected layers are ReLU functions, which can perform gradient descent quickly, and the activation function of the third fully-connected layer is tanh function, which limits the output to (-1, 1).

In step 4, the activation function is an sgn function.

s1, inputting the number m of hash centers to be generated and the dimension K of the Hamming space 4;

s2, if K = 2ⁿ, constructing a K × K Hadamard matrix H_K,

wherein n is a positive integer greater than 1;

then constructing a second matrix from H_K according to the properties of the Hadamard matrix, wherein T denotes the matrix transpose;

(several existing methods for constructing Hadamard matrices, such as the Kronecker product, Paley, circulant matrix and Walsh transform methods, are limited to K = 2ⁿ, whereas in hash code-based image retrieval the hash code length is not always equal to a power of 2; therefore, when the hash code length is, for example, 12 bits or 48 bits, hash centers can no longer be generated from such a Hadamard matrix;)

if K = 2ⁿ·3, performing the procedure for fast generation of a 2ⁿ·3 Hadamard matrix;

s3, iterating over i from 1 to m:

if m ≤ K, then

directly taking any row from the matrix as the hash center c_i;

if K < m ≤ 2K, then

reconstructing the hash center c_i;

s4, replacing every -1 in the generated hash centers with 0;

s5, outputting the hash centers.

The procedure for fast generation of a 2ⁿ·3 Hadamard matrix is as follows:

s2.3, iterating over l from m down to 1:

recursively and randomly choosing xi to satisfy d_row(Y, xi, j) = 6;

exiting and finishing the iteration;

s2.4, outputting H_K.

Experimental results and analysis:

in order to verify the effectiveness of the invention, a comparison experiment with the current classical hash retrieval method is designed, and the selected hash method mainly comprises a data independent hash method, an unsupervised hash method, a supervised hash method and a deep hash learning method according to different classifications.

1. Experimental Environment

The experimental development environment is shown in table 1:

TABLE 1 Experimental Environment

2. Experimental data set

To verify the performance of the invention, public image datasets commonly used in the field of image retrieval will be used for comparison: CIFAR-10, MS COCO and NUS-WIDE. Each data set is partitioned into a training set and a query set.

The CIFAR-10 data set contains 60,000 RGB images of 32 × 32 pixels, manually labeled and divided into 10 categories of 6,000 images each. Because CIFAR-10 is a single-label data set, a hash center is generated directly for each class in the experiment; 50,000 images are then randomly selected as the training set and 10,000 images as the query set.

MS COCO is an image recognition and segmentation data set whose current version contains 82,783 training images and 40,504 validation images, each labeled with some of 80 classes. After pruning the images without category information, 112,217 images were obtained in this experiment by combining the training and validation images; 5,000 images were randomly selected as the query set, the remaining images were used as the database, and 10,000 images were randomly drawn from the database as the training set. In addition, since MS COCO is a multi-label image data set, 80 hash centers are generated for all classes using the Hadamard matrix, and for each image with multiple labels the centroid of the corresponding centers is computed as its semantic hash center.
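The text does not say how the centroid of several binary centers is turned back into a binary center. A minimal sketch, assuming a bitwise majority vote with random tie-breaking (both assumptions, as are the function name and the toy 8-bit centers), is:

```python
import numpy as np

def semantic_center(class_centers, labels, seed=0):
    """Bitwise majority vote over the 0/1 centers of an image's labels."""
    rng = np.random.default_rng(seed)
    selected = class_centers[labels]            # shape (n_labels, K)
    ones = selected.sum(axis=0)
    zeros = len(labels) - ones
    center = (ones > zeros).astype(int)
    ties = ones == zeros
    # Assumed tie-break: draw a random bit wherever the vote is split evenly.
    center[ties] = rng.integers(0, 2, size=int(ties.sum()))
    return center

# Hypothetical 8-bit centers for three classes.
class_centers = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
])
center = semantic_center(class_centers, labels=[0, 1, 2])
```

With an odd number of labels no tie can occur, so the result is deterministic; with an even number of labels the tie-breaking rule matters and would need to be fixed for reproducibility.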

NUS-WIDE is a data set created by the Lab for Media Search (LMS) of the National University of Singapore, containing 269,648 images downloaded from Flickr. Each image is manually annotated with some of 81 basic categories. In this experiment, images from the 21 most frequent categories were selected for evaluation; 5,000 of them were randomly selected as the query set, the remaining images were used as the database, and 10,000 images were randomly selected from the database as the training set. NUS-WIDE is likewise a multi-label image data set, so 21 hash centers are generated for all classes and the semantic hash center is then computed for each image.

3. Results and analysis of the experiments

To verify the effectiveness of the present invention, experiments were performed on three benchmark image data sets: MS COCO, NUS-WIDE and CIFAR-10. Eight representative image retrieval methods were selected for comparison: LSH, SH, ITQ, SDH, CNNH, DNNH, DHN and HashNet. In addition, comparative experiments between the invention and HCN were designed using both AlexNet and ResNet.

TABLE 2 Experimental results on NUS-WIDE (MAP@5000)

TABLE 3 Experimental results on MS COCO (MAP@5000)

TABLE 4 Experimental results on CIFAR-10 (MAP@1000)

Tables 2, 3 and 4 show the MAP comparison between the present invention and other methods on the NUS-WIDE, MS COCO and CIFAR-10 data sets, respectively. Three hash code lengths are compared in the experiments: 16 bits, 32 bits and 64 bits.
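The text does not spell out how MAP@k is computed. A minimal sketch under common conventions, assuming Hamming ranking, single-label relevance (an item is relevant if it shares the query's label) and averaging over the relevant items found in the top k, is given below; the function names and toy codes are illustrative.

```python
import numpy as np

def average_precision_at_k(relevant, k):
    """AP over the top-k of one ranked 0/1 relevance list."""
    top = np.asarray(relevant[:k], dtype=float)
    if top.sum() == 0:
        return 0.0
    precision_at_i = np.cumsum(top) / (np.arange(len(top)) + 1)
    return float((precision_at_i * top).sum() / top.sum())

def map_at_k(query_codes, query_labels, db_codes, db_labels, k):
    """Mean AP@k with Hamming ranking; items sharing the label are relevant."""
    aps = []
    for code, label in zip(query_codes, query_labels):
        dist = np.count_nonzero(db_codes != code, axis=1)
        order = np.argsort(dist, kind="stable")
        relevant = (db_labels[order] == label).astype(int)
        aps.append(average_precision_at_k(relevant, k))
    return float(np.mean(aps))

# Toy 4-bit database with two classes.
db_codes = np.array([[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [1, 1, 1, 0]])
db_labels = np.array([0, 0, 1, 1])
query_codes = np.array([[0, 0, 0, 0], [1, 1, 1, 1]])
query_labels = np.array([0, 1])
score = map_at_k(query_codes, query_labels, db_codes, db_labels, k=4)
```

For the multi-label data sets, relevance would instead be defined as sharing at least one label, which changes only the line that builds `relevant`.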

The comparison results demonstrate the effectiveness of the invention; the analysis also shows that the longer the hash code, the richer the retained information and the higher the precision.

FIGS. 2, 3 and 4 compare the retrieval accuracy of the present invention with that of other methods on the NUS-WIDE, MS COCO and CIFAR-10 data sets. FIGS. 2, 3 and 4 show that the accuracy of the invention at each bit is stably higher than that of other methods on all three data sets, because the activation function is continuously optimized and central similarity learning is performed, which yields more accurate binary hash codes. In addition, the supervised hashing method SDH is more accurate than the unsupervised hashing methods LSH, SH and ITQ because it makes effective use of label information. The deep hashing methods obtain higher retrieval accuracy by exploiting the advantages of deep learning. In particular, DHN controls the quantization error while preserving pairwise similarity through end-to-end learning, whereas HashNet improves on DHN by balancing the positive and negative pairs in the training data and by using a continuation technique to reduce the quantization error. More importantly, the invention performs better and more stably, and its performance is essentially unaffected when the code length changes.

FIGS. 5 and 6 compare the accuracy-recall curves of the present invention and other methods on the NUS-WIDE and MS COCO data sets. FIGS. 5 and 6 show that, when the generated hash codes are 64 bits long, the unsupervised SH and ITQ methods have similar curves and their accuracy drops rapidly as the recall rate increases. The deep learning methods are superior to these two methods, and among them DNNH performs more stably than CNNH thanks to its end-to-end learning. The invention is superior to the other methods on both data sets and performs stably, particularly for recall rates in the range 0.4 to 0.8.

FIGS. 7 and 8 compare the accuracy of the present invention and other methods on the returned TOP-K result sets for the NUS-WIDE and MS COCO data sets. FIGS. 7 and 8 show that, when the hash code length is 64 bits, the invention outperforms the other methods at every TOP-K value, further demonstrating its effectiveness. In addition, the accuracy decreases slightly as TOP-K increases, which is caused by the ambiguity of Hamming ranking. Meanwhile, the invention performs better on the MS COCO data set than on NUS-WIDE, because NUS-WIDE is multi-label data and existing methods do not measure multi-label data effectively, which makes retrieval more difficult.

FIG. 9 shows the loss error and MAP versus Epoch for the present invention and HCN methods when trained on the CIFAR-10 dataset. As can be seen from fig. 9, as Epoch increases, the present invention converges to smaller loss error faster than HCN, and the MAP value of the present invention is better and more stable than HCN.

Tables 5 and 6 compare the MAP obtained by the present invention and the HCN method using the AlexNet and ResNet networks on the MS COCO and CIFAR-10 data sets, respectively. Tables 5 and 6 show that, with the AlexNet network, the MAP of the invention is better than that of HCN on both data sets. With the ResNet network, the MAP of the invention is slightly lower than that of HCN at 16 bits on the MS COCO data set, and is otherwise stably better than that of HCN. Because the invention continuously optimizes the sign activation function, less detailed information of the data is lost, richer image features are learned, and more accurate binary hash codes are obtained.

TABLE 5 MAP for HCCN and HCN on MS COCO Using AlexNet and ResNet

TABLE 6 MAP for HCCN and HCN on CIFAR-10 using AlexNet and ResNet

FIGS. 10 and 11 show two-dimensional t-SNE visualizations of the hash codes of the invention and of HashNet. In the experiment, 1,000 samples from 10 classes were randomly drawn for testing. FIGS. 10 and 11 show that, in the hash codes generated by the invention, samples of the same class cluster together and dissimilar samples lie far apart, so the codes are visibly more distinguishable than those of HashNet. This is because the invention adopts central similarity learning, so that data of the same class converge to a common hash center and dissimilar data stay far from that hash center.

FIG. 12 shows an experiment on the NUS-WIDE and MS COCO data sets for the present invention and the HCN method, visualizing the retrieval results returned for TOP-10. As can be seen from FIG. 12, for the same query image the invention returns results with higher retrieval accuracy than HCN on both data sets.

The results of experimental comparison clearly show that the hash code generated by the invention has higher accuracy and stronger distinguishability, and each index on AlexNet is higher than that of the original HCN method.

Claims

1. A continuous learning method based on a Hash center is characterized in that: comprises the following steps:

step 1, preparing an image data set in the field of image retrieval, wherein the image data set contains similar data pairs and non-similar data pairs;

step 2, inputting the similar data pairs and the non-similar data pairs in the image data set into a convolutional neural network for feature learning;

step 3, the result of the feature learning passes through a Hash layer, and the Hash layer converts continuous depth representation into K-dimensional representation;

step 4, after the hash layer outputs a real number vector, using an activation function to binarize the K-dimensional representation into a K-bit binary hash code.

2. The hash center-based continuous learning method according to claim 1, wherein: in the step 2, the convolutional neural network is AlexNet or ResNet.

3. The hash center-based continuous learning method according to claim 1, wherein: in the step 3, after the hash layer is passed, the hash codes of the similar data pairs are converged to a common hash center after being constrained by the center similarity, the hash codes of the dissimilar data pairs are converged to different hash centers after being constrained by the center similarity, and the convergence target is the hash center in the hamming space.

4. The hash center-based continuous learning method according to claim 1, wherein: in step 3, the hash layer includes three fully-connected layers, each fully-connected layer includes a row of neurons and an activation function, the activation functions of the first two fully-connected layers are ReLU functions, and the activation function of the third fully-connected layer is tanh function.

5. The hash center-based continuous learning method according to claim 1, wherein: in step 4, the activation function is an sgn function.