CN110059206A

CN110059206A - A kind of extensive hashing image search method based on depth representative learning

Info

Publication number: CN110059206A
Application number: CN201910249642.8A
Authority: CN
Inventors: 王祥丰; 胡慷; 孔桦桦; 田伟; 陈寅峰; 李鑫
Original assignee: Enjoyor Co Ltd
Current assignee: Enjoyor Co Ltd
Priority date: 2019-03-29
Filing date: 2019-03-29
Publication date: 2019-07-26

Abstract

A kind of extensive hashing image search method based on depth representative learning, the post-processing of Optimization Learning and search result including data prediction, based on semi-supervised convolutional neural networks layer building, based on pairs of Hash loss function design step loss function.The present invention carries out the extraction of characteristics of image and the study of hash function using depth convolutional neural networks layer and full articulamentum, devise the assembling loss function for intersecting entropy loss item, triple loss item and pseudo label loss item and constituting of tape label, it is optimized using the stochastic gradient descent method with momentum, computational efficiency with higher finally realizes the unified image retrieval performance of accuracy and speed.The present invention carries out picture search problem to carry out High Efficiency Modeling under the basis based on image data structure and label under the status of picture search field fast development, the model accuracy and more optimized inquiry velocity effectively improved.

Description

A kind of extensive hashing image search method based on depth representative learning

Technical field

The invention belongs to field of image search, are related to a kind of large-scale image search method.

Background technique

With the rapid development of the technologies such as big data technology, development of Mobile Internet technology, technology of Internet of things, image, video etc. Acquisition, convergence and the storage of multimedia resource are more and more convenient.In practical applications, 80% or so data be all with document, The unstructured data of the forms such as image, video, audio storage, and same such unstructured data all presses index every year Increase about 3/5ths.In unstructured data, image be again it is most important, occupy enormous proportions, but at the same time Important information is contained.Computer vision is primarily upon the research of image data, therefore also naturally enough becomes machine learning One of with the most popular research point of artificial intelligence technology.

Application problem in face of various complexity and the amount of images growth rate that is exceedingly fast promote traditional images analysis to handle Technology encounters very big challenge, especially huge resource, superelevation dimension, mass memory, slowly in terms of challenge. To solve the problems, such as this some column, a variety of approximate algorithms with low time complexity are suggested, wherein to be based on hash algorithm The algorithm of representative obtains a large amount of concerns, because it has many advantages, such as constant enquiry time and a small amount of compression binary code storage.But A large amount of image data is all not no label, and especially in fields such as picture search, the image data of only only a few is had Label information, this is also to consume what a large amount of manpower and material resources obtained.How to merge has flag data and a large amount of nothings using a small amount of The image search method of flag data is the method with Great significance and actual demand.Design targetedly efficient needle Derivation algorithm to large-scale image search is vital.

Hash algorithm is typical nearest neighbor search method, is mainly used for solving memory space recited above greatly and retrieves The problem of time length.In hash algorithm, common target is one mapping function W of study, and sample is expressed as a string of fixations The binary-coding of length, encoding usually using {+1, -1 } or {+1,0 } indicates bit each, so that semantically similar Sample can obtain similar coding, we are usually using Hamming distance from the similitude between measurement binary-coding.

In initial work, local sensitivity hash algorithm (LSH) is a kind of popular algorithm, the key of LSH algorithm Part is to divide space using the hyperplane being randomly generated, and it is every to determine to fall in the position of hyperplane according to sample One value, LSH require similar sample to fall among the same Hash bucket with greater probability value, without similar sample with Minimum probability value is fallen among the same Hash bucket, and by using bit multiple, final LSH can reach satisfactory inspection Suo Xiaoguo, LSH method have stringent theoretical proof to guarantee its performance.But in order to obtain shorter more compact coding And better retrieval performance, researcher consider the label information of image from different objective function is constructed, and choose different excellent Change method must further improve the performance of Hash retrieval using linearly or nonlinearly model etc..

The appearance of deep neural network causes huge repercussion in computer vision field, and researcher is unnecessary to go again The complicated inefficient expression feature of engineer, but neural network oneself is allowed to go to learn the characterization of image.The present invention is exactly to combine Deep neural network designs a kind of semi-supervised Hash searching algorithm, the Hash searching algorithm of the remote ultra-traditional of retrieval rate.

Summary of the invention

In order to overcome the lower deficiency of accuracy rate of existing large-scale image search method, the present invention provides a kind of accurate The higher extensive hashing image search method based on depth representative learning of rate.

The technical solution adopted by the present invention to solve the technical problems is:

A kind of extensive hashing image search method based on depth representative learning, comprising the following steps:

Step 1: data prediction；

Step 2: deep neural network constructs, using pretreated labeling in data prediction step or without label Data carry out deep neural network model building；

Step 3: Hash loss function designs, the Hash loss function is by softmax cross entropy loss function, ternary Group loss function, pseudo label loss function three parts composition.

Step 4: deep neural network training and model optimization derive letter for the Hash loss function of step 3 Several gradients carries out the update of parameter until convergence to entire depth neural network model using the gradient descent method with momentum；

Step 5, the deep neural network model after input picture to training, is calculated binary system Hash codes to data Image in library calculates Hamming distances, returns to candidate result of the most like image as picture search.

Preferably, the step 5 further includes, the candidate result is the result of coarseness；Select layer mind second from the bottom Output through network is further calculated the Euclidean distance of these characterizations in candidate result, is arranged again with this as characterization Sequence, the final result are exactly fine-grained search result.

Preferably, the data prediction step includes but is not limited to the standardization of image normalizing, drop in the step 1 It makes an uproar, deblurring, image data augmentation preliminary treatment, further includes whether having class declaration according to data to be divided into tape label data With without label data, for there is the data of label, it is also necessary to construct triple structure.

Preferably, the output of neural network the last layer includes two parts in the step 3: Hash codes and classification Value；The output of neural network the last layer includes two parts: Hash codes and class label；The class label intersects as softmax The input of entropy loss function is converted into the class probability between [0,1] by softmax function, then passes through cross entropy loss function Obtain penalty values to the end；The Hash codes are the inputs of triple loss function and pseudo label loss function.

The softmax cross entropy loss function:

J_S(x_i)=- y_ilogP(x_i)

The wherein index of i representative sample, x represent data sample, and y represents label, P (x_i) represent i-th of sample and belong to y's Probability value；

The triple loss function:

Wherein, i is sample index, and x represents data sample,Indicate x^LSample with same label,Represent and x^LSample with different labels, h (*) represent the corresponding output of network, m_tFor threshold parameter；

The pseudo label loss function:

Wherein, i, j are similarly sample index, for the data of not tape label, assign the maximum label of probability value to its work For pseudo label, m_pFor threshold parameter, the maximum distance of two different classes of samples is represented, A (i, j)=1 indicates that sample i, j belong to Same class, A (i, j)=0 indicate that sample i, j belong to inhomogeneity, and h (*) represents the corresponding output of network.The Hash loss function Are as follows:

Wherein J_S(x_i) it is softmax cross entropy loss function,It is triple loss function,It is pseudo label loss function, p is pseudo label, and α, β and γ are weight parameters.

Preferably, for softmax cross entropy loss function, passing through deep learning lower portion in the step 4 Automatic derivation；For triple loss function and pseudo label loss function, derivation is re-started using back-propagation algorithm.

Preferably, in an iterative process, being trained optimization to tape label data first, first not in the step 4 Using pseudo label loss function；When loss drops to given threshold, unlabeled exemplars data are added, are lost in conjunction with pseudo label Function is iterated optimization to Hash loss function until convergence.

Preferably, the deep neural network uses basic neural network framework in the step 2, pass through stacking Convolutional layer, pond layer and full articulamentum form core network.In the neural network framework, 2 layers are added to connect entirely using 5 layers of convolutional layer The mode of layer is connect, and distinguishes Hash coding layer and classification layer simultaneously in the last layer；Network parameter is set as, first layer convolution Layer: convolution kernel size is 11x11, stride 4, port number 64, activation primitive Relu；Pool layers of Max: core size is 3x3, stride 2；Second layer convolutional layer: convolution kernel size is 5x5, and stride 1, port number 192, activation primitive is Relu.Pool layers of Max: core size is 3x3, stride 2；Third layer convolutional layer: convolution kernel size be 3x3, stride 1, Port number is 384, activation primitive Relu；4th layer of convolutional layer: convolution kernel size is 5x5, stride 1, and port number is 256, activation primitive Relu；Layer 5 convolutional layer: convolution kernel size is 3x3, stride 1, port number 256, activation letter Number is Relu；The full articulamentum neuron nodal point number of first layer is 2048；The full articulamentum neuron nodal point number of the second layer is Hash codes The sum of digit and classification number.

Beneficial effects of the present invention are mainly manifested in:

1, in the network model building based on depth, the network architecture of similar AlexNet is selected, directly by original graph As the input as model, advantage is that neural network can acquire the characterization of better image data, has stronger extensive Performance；

2, ingenious to combine triple loss in the design of Hash loss function, softmax intersects entropy loss and pseudo label damage Item is lost, the study of model multi-angle can be conducive to, to obtain higher robustness.

3, in the optimization of model and the update of parameter, the derived function of loss function and the training side of model are given Method can reduce manually flag data in the case where guaranteeing test accuracy rate in conjunction with this semi-supervised mode of learning The required time, to save human cost.

4, in the step of image retrieval, we have proposed multi-level sort methods from thick to thin, can allow in the time In the case where, further increase the accuracy rate of retrieval.

Detailed description of the invention

Fig. 1 is the flow chart of the extensive hashing image search method the present invention is based on depth representative learning.

Fig. 2 is the network of the extensive hashing image search method the present invention is based on depth representative learning.

Specific embodiment

The invention will be further described below in conjunction with the accompanying drawings.

Referring to Figures 1 and 2, a kind of extensive hashing image search method based on depth representative learning, by image data Structure and label condition driving effectively modeling proposed a kind of based on depthmeter on the basis of existing picture search algorithm The extensive hashing image search method of study is levied, its advantages are that of obtaining high accuracy and quick search speed and rule Mould scalability and less memory space.

The extensive hashing image search method the following steps are included:

Step 1: data prediction

In data prediction part, first have to class declaration whether is had according to the data in data set, for example, class name or Person one-hot coding, is divided into tape label data and without label data, for there is the data of label, it is also necessary to construct ternary Group structure, that is, for sample x^L, choose another sample for having same labelAnother is chosen with different marks The sample of labelThe data for forming a triple structure, the input as network.It is also desirable to all data sets Standardization processing is carried out, the operation for subtracting mean value divided by standard deviation is carried out, controls the numeric distribution of sample.In the situation of sample deficiency Under, need to carry out data augmentation operation, including rotated, translated to picture, fuzzy noise etc., and the width of image is high Resize to 224.

Step 2: deep neural network constructs

Deep neural network model is carried out using pretreated labeling in data prediction step or without label data Building, herein, we still take basic neural network framework, by stacking convolutional layer, pond layer and full articulamentum group At core network.In network model building, the network structure of AlexNet is copied, adds 2 layers of full articulamentum using 5 layers of convolutional layer Mode, and the last layer simultaneously distinguish Hash coding layer and classification layer.Specific network parameter is set as, first layer convolution Layer: convolution kernel size is 11x11, stride 4, port number 64, activation primitive Relu.Pool layers of Max: core size is 3x3, stride 2.Second layer convolutional layer: convolution kernel size is 5x5, and stride 1, port number 192, activation primitive is Relu.Pool layers of Max: core size is 3x3, stride 2.Third layer convolutional layer: convolution kernel size be 3x3, stride 1, Port number is 384, activation primitive Relu.4th layer of convolutional layer: convolution kernel size is 5x5, stride 1, and port number is 256, activation primitive Relu.Layer 5 convolutional layer: convolution kernel size is 3x3, stride 1, port number 256, activation letter Number is Relu.The full articulamentum neuron nodal point number of first layer is 2048.The full articulamentum neuron nodal point number of the second layer is Hash codes The sum of digit and classification number, such as Hash code bit number are 48, share 10 classes, then the number of nodes of full articulamentum is 58.Specific net Network structure chart is as shown in Figure 2.

Step 3: Hash loss function designs

In the design of loss function, it is divided into three parts, the softmax cross entropy loss function of tape label, triple Loss function, pairs of pseudo label loss function.

The output of neural network the last layer includes two parts: Hash codes and class label.

Input of the class label as softmax will be converted into the class probability between [0,1] by softmax function, then be led to It crosses cross entropy objective function and obtains penalty values to the end, softmax cross entropy loss function is most common Classification Loss function.

Hash codes part is the input of triple loss function item and pairs of pseudo label loss item, and triple loss is not It is same as another loss function that softmax intersects entropy loss, it is characterized in that identical two samples of the meaning of one's words is promoted to measure Under space from as far as possible closely, and different two samples of the meaning of one's words under metric space from must as far as possible far.It is pairs of to be Tag entry loss item can regard certain a part of triple loss item in the case where the meaning of one's words is identical or different as.

First part is that commonly used softmax intersects entropy loss in image classification, for each sample, our all mistakes Softmax obtains the probability value P of classification, the calculating target function in a manner of cross entropy.It is as follows:

J_S(x_i)=- y_ilog P(x_i)

The wherein index of i representative sample, x representative sample, y represent label, P (x_i) represent the probability that i-th of sample belongs to y Value；

Second part is triple loss item, for the triple data of the tape label constructed in step 1, Wo Menshe The distance that a mode measures them is counted.Each sample x can have output valve h (x), for identical exemplar x^LWithThe distance between they are minimized, are calculated with Euclidean distance.Conversely, then maximizing the distance between they.It is as follows:

Wherein, i is sample index, and x represents data sample,Indicate x^LSample with same label,Representative and x^L Sample with different labels, h (*) represent the corresponding output of network, m_tFor threshold parameter；

Part III is pseudo label loss item, and for unlabelled sample, we can obtain institute by neural network output Belong to the probability value of each classification, the pseudo label that we assign the classification where maximum value as it constructs pseudo- as second part The distance between label measurement.It is as follows:

Wherein, i, j are similarly sample index, m_pFor threshold parameter, the maximum distance of two different classes of samples, A are represented (i, j)=1 indicates that sample i, j belong to same class, and A (i, j)=0 indicates that sample i, j belong to inhomogeneity, and it is corresponding that h (*) represents network Output.

Hash loss function specifically:

Wherein J_S(x_i) be tape label softmax cross entropy loss function,It is triple damage Function is lost,It is pseudo label loss function, p is pseudo label.Three loss functions It can be regulated and controled by weight between, wherein α, β and γ be as weight parameter, we enable α=1 herein, β=1, γ= 0.5。

Step 4: deep neural network training and model optimization

For the loss function of step 3, the gradient of derived function loss function utilizes the gradient descent method pair with momentum Entire model carries out the update of parameter until convergence.

Wherein, we are trained convergence to the sample of tape label first, that is, loss decline about to 0.2 with Under, it then is added unlabeled exemplars, alternately be trained.This semi-supervised alternating training method can be very big The time required for marker samples is reduced in degree, saves cost of human resources.The side SGD with momentum is selected in terms of optimizer Method, learning rate are set as 0.001, and momentum is set as 0.9.Common gradient declines updating method are as follows:

Wherein W_tFor the parameter of t moment neural network, ▽ J is the derivative of loss function J.

Momentum gradient descent method is as follows:

Wherein θ is momentum parameter, replaces former gradient to carry out parameter update using gradient after exponent-weighted average.Because every The information of gradient before gradient after a exponent-weighted average contains.

For softmax cross entropy loss function, pass through the automatic derivation of deep learning lower portion；Triple is lost Function and pseudo label loss function, re-start derivation using back-propagation algorithm；

For triple loss function derivation:

Wherein I represents indicator function, when C is True, then for 1 on the contrary be then -1；

For pseudo label loss function derivation:

As A (i, j)=1:

As A (i, j)=0:

Optimization is iterated to objective function until convergence.

Step 5: image retrieval

After the completion of model training, we keep the Hash coding of training data.Give a test query picture, mould Type can export the binary-coding that it is predicted.The Hash of the binary-coding of prediction and training data is encoded and carries out hamming by we The measurement of distance returns to candidate result of the most like image as picture search, is waited according to apart from size given threshold Selected works prepare.

If it is required that higher inquiry precision, can carry out finer sequence, to the inverse of Candidate Set and training sample Second layer feature carries out Euclidean distance comparison, to resequence, obtains more accurately retrieval effectiveness.It is looked into for example, we give Ask image I_q, candidate pool P has been obtained after hamming sorts, we are that the higher front layer of dimension is used to connect output entirely as candidate Collect the detailed characterizations of P, we use V_qAnd V_i ^PRespectively represent query image I_qWith the feature vector of candidate pool P, we pass through Europe Formula distance measure P in i-th of sample and query image similarity S, as follows: where S is smaller, represent two samples it Between similarity it is higher, we can reorder to candidate pool by S, obtain more preferably query result.

s_i=| | V_q-V_i ^P||

Wherein, S is smaller, and the similarity represented between two samples is higher, we can reset candidate pool by S Sequence obtains more preferably query result.

The scheme of the present embodiment, using depth convolutional neural networks layer and full articulamentum carry out characteristics of image extraction and The study of hash function devises the intersection entropy loss item, triple loss item and pseudo label damage of tape label on this basis The assembling loss function that item is constituted is lost, is optimized using the stochastic gradient descent method with momentum, calculating with higher Efficiency, finally, the searching method that the present invention is combined using coarseness and fine granularity, realizes the unified image of accuracy and speed with this Retrieval performance.The present invention carries out based on image data picture search problem under the status of picture search field fast development High Efficiency Modeling, the model accuracy effectively improved and more optimized inquiry velocity are carried out under the basis of structure and label.The present invention The algorithm proposed has better accuracy and robustness than traditional several image retrieval algorithms, and the present invention can be used for Large-scale image retrieval scene, serves intelligent searching system, such as: search engine to scheme to search figure function, electric business website Similar commercial articles searching, the individualized content of social platform recommends etc..

Protection content of the invention is not limited to above embodiments.Without departing from the spirit and scope of the invention, originally Field technical staff it is conceivable that variation and advantage be all included in the present invention, and with appended claims be protect Protect range.

Claims

1. a kind of extensive hashing image search method based on depth representative learning, which is characterized in that the method includes with Lower step:

Step 1: pre-processing image data；

Step 3: Hash loss function designs, the Hash loss function is damaged by softmax cross entropy loss function, triple Lose function, pseudo label loss function three parts composition.

Step 4: deep neural network training and model optimization, for the Hash loss function of step 3, derived function loss The gradient of function carries out the update of parameter to entire depth neural network model until receiving using the gradient descent method with momentum It holds back.

Step 5, the deep neural network model after input picture to training, is calculated binary system Hash codes in database Image calculate Hamming distances, return to candidate result of the most like image as picture search.

2. a kind of extensive hashing image search method based on depth representative learning as described in claim 1, feature exist In the step 5 further includes that the candidate result is the result of coarseness；The output of layer neural network second from the bottom is selected to make For characterization, the Euclidean distance of these characterizations is further calculated in candidate result, is resequenced with this, the knot finally obtained Fruit is exactly fine-grained search result.

3. a kind of extensive hashing image search method based on depth representative learning as claimed in claim 1 or 2, feature It is, in the step 1, the data prediction step includes but is not limited to the standardization of image normalizing, noise reduction, deblurring, figure It further include whether having class declaration according to data to be divided into tape label data and without number of tags as the preliminary treatment of data augmentation According to for there is the data of label, it is also necessary to construct triple structure.

4. a kind of extensive hashing image search method based on depth representative learning as claimed in claim 3, feature exist In in the step 3, the output of neural network the last layer includes two parts: Hash codes and class label；The class label is made For the input of softmax cross entropy loss function, the class probability between [0,1] is converted by softmax function, then passes through friendship Pitch the penalty values of entropy loss function call to the end；The Hash codes are the defeated of triple loss function and pseudo label loss function Enter.

5. a kind of extensive hashing image search method based on depth representative learning as claimed in claim 4, feature exist In the softmax cross entropy loss function:

J_S(x_i)=- y_ilogP(x_i)

The wherein index of i representative sample, x represent data sample, and y represents label, P (x_i) represent the probability that i-th of sample belongs to y Value；

The triple loss function:

Wherein, i is sample index, and x represents data sample,Indicate x^LSample with same label,Representative and x^LIt has The sample of different labels, h (*) represent the corresponding output of network, m_tFor threshold parameter；

The pseudo label loss function:

Wherein, i, j are similarly sample index, for the data of not tape label, assign the maximum label of probability value to it as puppet Label, m_pFor threshold parameter, the maximum distance of two different classes of samples is represented, A (i, j)=1 indicates sample i, and j belongs to same Class, A (i, j)=0 indicate that sample i, j belong to inhomogeneity, and h (*) represents the corresponding output of network.

6. a kind of extensive hashing image search method based on depth representative learning as claimed in claim 5, feature exist In the Hash loss function are as follows:

7. a kind of extensive hashing image search method based on depth representative learning as claimed in claim 1 or 2, feature It is, in the step 4, for softmax cross entropy loss function, passes through the automatic derivation of deep learning lower portion；For Triple loss function and pseudo label loss function, re-start derivation using back-propagation algorithm.

8. a kind of extensive hashing image search method based on depth representative learning as claimed in claim 1 or 2, feature It is, in the step 4, in an iterative process, optimization is trained to tape label data first, does not first use pseudo label to damage Lose function；When loss drops to given threshold, unlabeled exemplars data are added, in conjunction with pseudo label loss function, to Hash Loss function is iterated optimization until convergence.

9. a kind of extensive hashing image search method based on depth representative learning as claimed in claim 1 or 2, feature It is, in the step 2, the deep neural network uses basic neural network framework, by stacking convolutional layer, Chi Hua Layer and full articulamentum form core network.

10. a kind of extensive hashing image search method based on depth representative learning as claimed in claim 9, feature exist In the neural network framework distinguishes Hash in such a way that 5 layers of convolutional layer add 2 layers of full articulamentum, and in the last layer simultaneously Coding layer and classification layer；Network parameter is set as, first layer convolutional layer: convolution kernel size is 11x11, stride 4, port number It is 64, activation primitive Relu；Pool layers of Max: core size is 3x3, stride 2；Second layer convolutional layer: convolution kernel size For 5x5, stride 1, port number 192, pool layers of activation primitive Relu, Max: core size is 3x3, stride 2； Third layer convolutional layer: convolution kernel size is 3x3, stride 1, port number 384, activation primitive Relu；4th layer of convolution Layer: convolution kernel size is 5x5, stride 1, port number 256, activation primitive Relu；Layer 5 convolutional layer: convolution kernel is big Small is 3x3, stride 1, port number 256, activation primitive Relu；The full articulamentum neuron nodal point number of first layer is 2048；The full articulamentum neuron nodal point number of the second layer is the sum of Hash code bit number and classification number.