CN106649665A - Object-level depth feature aggregation method for image retrieval - Google Patents


Info

Publication number
CN106649665A
CN106649665A (application CN201611152148.2A)
Authority
CN
China
Prior art keywords
image
candidate region
feature
convolutional neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611152148.2A
Other languages
Chinese (zh)
Inventor
李豪杰
暴雨
樊鑫
罗钟铉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201611152148.2A priority Critical patent/CN106649665A/en
Publication of CN106649665A publication Critical patent/CN106649665A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of digital media and provides an object-level deep feature aggregation method for image retrieval. First, an unsupervised method is used to generate candidate regions that may contain objects; then the corresponding convolutional neural network features are extracted; finally, the region features are aggregated to obtain an image feature representation that is highly robust to image transformations, for use in image retrieval applications. The invention addresses the lack of geometric-transformation and spatial-layout invariance in existing models by adopting an object-based approach. The image features generated by the method are highly robust to geometric transformations and changes of spatial arrangement, so the accuracy of image retrieval is increased; moreover, the obtained image feature is compact and concise, which reduces the complexity of similarity computation between images and increases retrieval efficiency.

Description

An object-level deep feature aggregation method for image retrieval
Technical field
The invention belongs to the field of digital media and relates to an object-level deep feature aggregation method for image retrieval.
Background technology
Content-based image retrieval (CBIR), an important research problem in computer vision, has received wide attention from scholars at home and abroad over the past decade. CBIR refers to finding, in an image database, the images similar to a query image. Because of differences in shooting angle, distance, environment and other factors, the same or similar objects can vary greatly across images, e.g. in scale, viewpoint and layout. Generating an image feature that is highly robust to such variations is therefore the key to solving the image retrieval problem.
Compared with traditional hand-crafted image features, learning-based methods, especially convolutional neural networks, have shown a powerful capability in image feature extraction and have achieved great success in computer vision tasks such as image classification and object detection. For the image retrieval problem, two kinds of convolutional neural network feature representations currently exist: global and local.
Global methods directly extract the feature of the entire image with a convolutional neural network and use it as the final image feature. However, because the convolutional neural network mainly encodes global spatial information, the resulting feature lacks invariance to geometric transformations such as scale, rotation and translation and to changes of spatial layout, which limits its robustness when retrieving highly variable images.
Local methods extract features of local image regions with a convolutional neural network and then aggregate these region features into the final image feature. Although such methods take the local information of the image into account, so that the feature is more robust to various changes than in global methods, they still have some defects. For example, methods that obtain image regions with sliding windows (see Yunchao Gong, Liwei Wang, Ruiqi Guo, Svetlana Lazebnik, "Multi-scale orderless pooling of deep convolutional activation features", European Conference on Computer Vision, 2014, pp. 392-407) do not consider visual content such as color, texture and edges, and therefore produce a large number of regions without semantic meaning, which introduces redundancy and noise into the subsequent aggregation. In addition, region features are commonly merged with max pooling (see Konda Reddy Mopuri, R. Venkatesh Babu, "Object level deep feature pooling for compact image representation", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 62-70); because only the maximum response is kept and the associations between features are ignored, a large amount of information is lost, which reduces the discriminability of the final image feature.
The present invention solves the above problems with an object-based approach. When generating image regions, a content-based unsupervised object proposal method is used, i.e. image regions are generated by clustering visual information such as color, texture and edges. Since parts of the same semantic object share a certain visual similarity within an image, the regions thus obtained are very likely to contain an object or part of an object. Meanwhile, a scene image is typically composed of several objects, and parsing these objects is the key to understanding the scene. The content-based image regions therefore contain more semantically meaningful visual information than simple sliding windows, and their feature descriptions are more discriminative; at the same time, fusing object-level features makes the final feature robust to changes of the spatial layout of objects in the scene. In the feature aggregation stage, the VLAD (Vector of Locally Aggregated Descriptors) algorithm is used: the region features are first clustered, and then, for each image, the accumulated residuals of all region features with respect to their nearby cluster centers represent the final image feature. Compared with max pooling, this method considers the associations between region features and describes the local information of the image more finely, so the final image feature is more robust to various image transformations.
The content of the invention
In view of the deficiencies of the prior art, the present invention provides an object-level deep feature aggregation method for image retrieval, which generates image features that are highly robust to geometric transformations and to changes of object spatial layout, for use in image retrieval applications.
The technical scheme of the invention is as follows:
An object-level deep feature aggregation method for image retrieval, comprising the following steps:
Step 1: extract candidate regions from each image in the database with the Selective Search algorithm, generating image candidate regions that are likely to contain objects. Selective Search (Selective Search for Object Recognition) is an image segmentation algorithm that merges regions hierarchically using visual information; it can generate class-independent, high-quality, multi-scale candidate regions. Compared with sliding windows, the feature descriptions of candidate regions containing objects are more discriminative, and the object-based scheme also improves the robustness of the fused feature to spatial layout changes.
Step 2: select a widely adopted convolutional neural network architecture and pre-train the network on a public database.
Step 3: extract the features of all image candidate regions with the trained convolutional neural network:
3.1) scale and pad each image candidate region to a fixed size, and use it as the input of the convolutional neural network;
3.2) use the output of the fully connected layer FC7 of the convolutional neural network as the descriptive feature of the image candidate region.
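The scale-and-pad preprocessing of step 3.1 can be sketched as follows. The patent does not specify the interpolation or padding scheme, so nearest-neighbor resizing, zero padding and centering are assumptions here:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of an H x W x C image (pure NumPy)."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h // out_h).clip(0, h - 1)
    cols = (np.arange(out_w) * w // out_w).clip(0, w - 1)
    return img[rows][:, cols]

def to_fixed_size(region, size=224):
    """Scale the longer side to `size`, then zero-pad to size x size."""
    h, w = region.shape[:2]
    scale = size / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    resized = resize_nearest(region, nh, nw)
    canvas = np.zeros((size, size, region.shape[2]), dtype=region.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

region = np.random.rand(120, 60, 3)   # a tall candidate region crop
inp = to_fixed_size(region)
print(inp.shape)  # (224, 224, 3)
```

The 224*224 target matches the network input size named later in the embodiment; any real pipeline would likely use a library resizer (e.g. bilinear) instead of this minimal nearest-neighbor version.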
Step 4: reduce the dimensionality of the candidate region features obtained in step 3 to N dimensions with principal component analysis, obtaining low-dimensional candidate region features. The dimensionality reduction lowers the complexity of subsequent computation and improves efficiency.
Step 5: apply unsupervised K-means clustering to the low-dimensional candidate region features obtained in step 4, producing K cluster centers.
Step 6: aggregate the low-dimensional candidate region features belonging to each image (from step 4) with the K cluster centers (from step 5) using the VLAD algorithm; each image yields an N*K-dimensional VLAD feature. VLAD (Vector of Locally Aggregated Descriptors) is a statistics-based fusion method: it represents the final image feature by the accumulated residuals of the region features with respect to their nearby cluster centers. Compared with simple pooling, it describes the image content in more detail, and the generated feature is more robust to image transformations.
Step 7: reduce the dimensionality of the VLAD features obtained in step 6 to D dimensions with principal component analysis, generating a compact image feature. The dimensionality reduction lowers the complexity and noise of similarity computation, where the similarity between images is measured by the Euclidean distance between their image features.
The beneficial effect of the present invention is that the generated image feature is highly robust to geometric transformations and spatial layout changes, which greatly improves the accuracy of image retrieval; moreover, the obtained image feature is very compact, which reduces the complexity of similarity computation between images.
Description of the drawings
Fig. 1 is the flow chart of the deep feature aggregation method of the present invention.
Fig. 2 is a schematic diagram of image retrieval results: the leftmost image is the query image, and the remaining images are the retrieved similar images, sorted from left to right in descending order of similarity.
Specific embodiment
A specific embodiment of the present invention is described in detail below in combination with the technical scheme and the accompanying drawings.
Embodiment 1: retrieval of similar images
1. Fig. 1 is the flow chart of the invention. First, candidate regions are extracted from all database images with the fast mode of the Selective Search algorithm; on average, about 2000 candidate regions of different sizes are obtained per image.
2. The invention uses the AlexNet convolutional neural network architecture of Krizhevsky et al., whose input is a 224*224 RGB image and which comprises five convolutional layers, three max-pooling layers and three fully connected layers. The network is trained with the Caffe framework; the training data is the 1000-class classification dataset of the ILSVRC12 competition.
3. After network training is complete, each candidate region obtained in step 1 is padded and scaled to the fixed size 224*224 and used as the network input; the output of the fully connected layer fc7 is extracted as the feature of the corresponding candidate region, with a size of 4096 dimensions.
4. The features of all candidate regions are reduced with principal component analysis, where the corresponding projection dictionary has size 512*4096; the feature dimensionality of all candidate regions is reduced from 4096 to 512, yielding the low-dimensional candidate region features.
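The PCA reduction of this step can be sketched with a plain SVD; the "dictionary" corresponds to the learned projection matrix. Dimensions are scaled down here for illustration (the patent reduces 4096-dimensional fc7 features to 512):

```python
import numpy as np

def pca_fit(X, n_components):
    """Learn a PCA projection ("dictionary") from row-vector features X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]        # components: (n_components, dim)

def pca_transform(X, mean, components):
    """Project centered features onto the principal directions."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))        # stand-in for 4096-D fc7 features
mean, comps = pca_fit(feats, 8)           # patent: 4096 -> 512
low = pca_transform(feats, mean, comps)
print(low.shape)  # (500, 8)
```

The same fitted `mean` and `comps` are reused for query features later (step 8), which is why the embodiment speaks of a trained PCA dictionary.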
5. Unsupervised K-means clustering is applied to the low-dimensional candidate region features, producing 256 cluster centers {c1, c2, ..., c256}.
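The clustering of this step can be sketched with plain Lloyd's K-means; K and the data sizes are scaled down here for illustration (the patent uses K = 256 on 512-dimensional features):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means on row vectors X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center (squared Euclidean distance)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned points
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

rng = np.random.default_rng(1)
feats = rng.normal(size=(300, 16))        # pooled low-dimensional region features
centers, labels = kmeans(feats, 8)        # patent uses K = 256
print(centers.shape)  # (8, 16)
```

In practice a library implementation with multiple restarts would be preferable; this sketch only shows the mechanics.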
6. The low-dimensional candidate region features of each image are encoded into a VLAD feature with the VLAD algorithm. First, each low-dimensional candidate region feature p_j in the image is assigned to its 5 nearest cluster centers rNN(p_j); then the weighted residuals of all low-dimensional candidate region features with respect to their assigned cluster centers are accumulated, obtaining the VLAD feature x of the image:

x = [ Σ_{j: c_1 ∈ rNN(p_j)} w_j1 (p_j - c_1), ..., Σ_{j: c_K ∈ rNN(p_j)} w_jK (p_j - c_K) ]

where j is the index of a candidate region within an image; p_j is the low-dimensional feature of candidate region j; c_1 and c_k are the first and the k-th cluster centers; rNN(p_j) is the set of the 5 cluster centers nearest to p_j; and w_j1, w_jk are the Gaussian kernel similarities between p_j and c_1, c_k respectively, representing the weights of the corresponding cluster centers. For each candidate region, the weights over its 5 nearest cluster centers are normalized to sum to 1. Finally, each image obtains a corresponding VLAD feature of size 512*256 = 131072 dimensions.
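The soft-assignment VLAD encoding described above can be sketched as follows. The Gaussian kernel bandwidth `sigma` is an assumption (the patent does not specify it), and the dimensions are scaled down for illustration:

```python
import numpy as np

def soft_vlad(feats, centers, r=5, sigma=1.0):
    """VLAD with soft assignment to the r nearest cluster centers.

    Each feature contributes its residual to its r nearest centers,
    weighted by a Gaussian kernel similarity normalized to sum to 1
    over those r centers, then the per-center sums are concatenated.
    """
    K, dim = centers.shape
    x = np.zeros((K, dim))
    for p in feats:
        d2 = ((centers - p) ** 2).sum(1)          # squared distances to all centers
        near = np.argsort(d2)[:r]                 # rNN(p): the r nearest centers
        w = np.exp(-d2[near] / (2 * sigma ** 2))  # Gaussian kernel similarities
        w /= w.sum()                              # normalize weights to sum to 1
        for wk, k in zip(w, near):
            x[k] += wk * (p - centers[k])         # accumulate weighted residual
    return x.ravel()                              # K * dim dimensional VLAD

rng = np.random.default_rng(2)
feats = rng.normal(size=(40, 16))                 # one image's region features
centers = rng.normal(size=(8, 16))                # patent: 256 centers, 512-D
v = soft_vlad(feats, centers)
print(v.shape)  # (128,)
```

With the patent's sizes (512-dimensional features, 256 centers) the output would be 512*256 = 131072-dimensional, matching the text.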
7. The VLAD features obtained in step 6 are reduced with principal component analysis, where the corresponding projection dictionary has size 512*131072; the feature dimensionality is reduced from 131072 to 512, yielding the compact image feature.
8. For a query image, candidate regions are generated as in step 1 and their features are extracted as in step 3; then the PCA dictionary and cluster centers trained in steps 4 and 5 are applied, and the corresponding VLAD feature is obtained as in step 6; finally, dimensionality reduction with the PCA dictionary trained in step 7 yields the compact 512-dimensional image feature.
9. The Euclidean distances between the query image feature and the database image features are computed and sorted: a smaller distance indicates a higher similarity between the images. Fig. 2 shows a schematic diagram of the retrieval results.
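The final ranking step can be sketched directly with NumPy: database images are sorted by the Euclidean distance of their features to the query feature, smallest first:

```python
import numpy as np

def rank_by_distance(query, gallery):
    """Return gallery indices sorted by Euclidean distance to the query
    (smaller distance = more similar), plus the distances themselves."""
    d = np.linalg.norm(gallery - query, axis=1)
    return np.argsort(d), d

# toy 2-D features standing in for 512-D compact image features
gallery = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])
order, dists = rank_by_distance(np.array([0.0, 0.0]), gallery)
print(order.tolist())  # [0, 2, 1]
```

For a large database, the same ranking would typically be served by an approximate nearest-neighbor index rather than a brute-force scan.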

Claims (1)

1. An object-level deep feature aggregation method for image retrieval, characterized by the following steps:
Step 1: extract candidate regions from each image in the database with the Selective Search algorithm, generating image candidate regions;
Step 2: select a convolutional neural network architecture and pre-train the network on a public database;
Step 3: extract the features of all image candidate regions with the trained convolutional neural network:
3.1) scale and pad each image candidate region to a fixed size, and use it as the input of the convolutional neural network;
3.2) use the output of the fully connected layer FC7 of the convolutional neural network as the descriptive feature of the image candidate region;
Step 4: reduce the dimensionality of the candidate region features obtained in step 3 to N dimensions with principal component analysis, obtaining low-dimensional candidate region features;
Step 5: apply unsupervised K-means clustering to the low-dimensional candidate region features obtained in step 4, producing K cluster centers;
Step 6: aggregate the low-dimensional candidate region features belonging to each image (from step 4) with the K cluster centers (from step 5) using the VLAD algorithm; each image yields an N*K-dimensional VLAD feature;
Step 7: reduce the dimensionality of the VLAD features obtained in step 6 to D dimensions with principal component analysis, generating a compact image feature.
CN201611152148.2A 2016-12-14 2016-12-14 Object-level depth feature aggregation method for image retrieval Pending CN106649665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611152148.2A CN106649665A (en) 2016-12-14 2016-12-14 Object-level depth feature aggregation method for image retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611152148.2A CN106649665A (en) 2016-12-14 2016-12-14 Object-level depth feature aggregation method for image retrieval

Publications (1)

Publication Number Publication Date
CN106649665A (en) 2017-05-10

Family

ID=58822514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611152148.2A Pending CN106649665A (en) 2016-12-14 2016-12-14 Object-level depth feature aggregation method for image retrieval

Country Status (1)

Country Link
CN (1) CN106649665A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239535A (en) * 2017-05-31 2017-10-10 北京小米移动软件有限公司 Similar pictures search method and device
CN108205580A (en) * 2017-09-27 2018-06-26 深圳市商汤科技有限公司 A kind of image search method, device and computer readable storage medium
CN108416290A (en) * 2018-03-06 2018-08-17 中国船舶重工集团公司第七二四研究所 Radar signal feature method based on residual error deep learning
CN108596163A (en) * 2018-07-10 2018-09-28 中国矿业大学(北京) A kind of Coal-rock identification method based on CNN and VLAD
CN108874889A (en) * 2018-05-15 2018-11-23 中国科学院自动化研究所 Objective body search method, system and device based on objective body image
CN109948666A (en) * 2019-03-01 2019-06-28 广州杰赛科技股份有限公司 Image similarity recognition methods, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060193538A1 (en) * 2005-02-28 2006-08-31 Microsoft Corporation Graphical user interface system and process for navigating a set of images
CN102112984A (en) * 2008-07-29 2011-06-29 皇家飞利浦电子股份有限公司 Method and apparatus for generating image collection
CN105930382A (en) * 2016-04-14 2016-09-07 严进龙 Method for searching for 3D model with 2D pictures

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060193538A1 (en) * 2005-02-28 2006-08-31 Microsoft Corporation Graphical user interface system and process for navigating a set of images
CN102112984A (en) * 2008-07-29 2011-06-29 皇家飞利浦电子股份有限公司 Method and apparatus for generating image collection
CN105930382A (en) * 2016-04-14 2016-09-07 严进龙 Method for searching for 3D model with 2D pictures

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HERVÉ JÉGOU ET AL.: "Aggregating local descriptors into a compact image representation", 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition *
KONDA REDDY MOPURI ET AL.: "Object Level Deep Feature Pooling for Compact Image Representation", 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops *
YUNCHAO GONG ET AL.: "Multi-Scale Orderless Pooling of Deep Convolutional Activation Features", European Conference on Computer Vision 2014 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239535A (en) * 2017-05-31 2017-10-10 北京小米移动软件有限公司 Similar pictures search method and device
CN108205580A (en) * 2017-09-27 2018-06-26 深圳市商汤科技有限公司 A kind of image search method, device and computer readable storage medium
WO2019062534A1 (en) * 2017-09-27 2019-04-04 深圳市商汤科技有限公司 Image retrieval method, apparatus, device and readable storage medium
KR20200011988A (en) * 2017-09-27 2020-02-04 선전 센스타임 테크놀로지 컴퍼니 리미티드 Image retrieval methods, devices, devices, and readable storage media
KR102363811B1 (en) * 2017-09-27 2022-02-16 선전 센스타임 테크놀로지 컴퍼니 리미티드 Image retrieval methods, devices, instruments and readable storage media
US11256737B2 (en) * 2017-09-27 2022-02-22 Shenzhen Sensetime Technology Co., Ltd. Image retrieval methods and apparatuses, devices, and readable storage media
CN108416290A (en) * 2018-03-06 2018-08-17 中国船舶重工集团公司第七二四研究所 Radar signal feature method based on residual error deep learning
CN108874889A (en) * 2018-05-15 2018-11-23 中国科学院自动化研究所 Objective body search method, system and device based on objective body image
CN108596163A (en) * 2018-07-10 2018-09-28 中国矿业大学(北京) A kind of Coal-rock identification method based on CNN and VLAD
CN109948666A (en) * 2019-03-01 2019-06-28 广州杰赛科技股份有限公司 Image similarity recognition methods, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Wang et al. Enhancing sketch-based image retrieval by cnn semantic re-ranking
CN106649665A (en) Object-level depth feature aggregation method for image retrieval
Saavedra et al. Sketch based Image Retrieval using Learned KeyShapes (LKS).
CN108038122B (en) Trademark image retrieval method
CN108280187B (en) Hierarchical image retrieval method based on depth features of convolutional neural network
CN104036012B (en) Dictionary learning, vision bag of words feature extracting method and searching system
CN106126581A (en) Cartographical sketching image search method based on degree of depth study
Sarkhel et al. Deterministic routing between layout abstractions for multi-scale classification of visually rich documents
CN110175249A (en) A kind of search method and system of similar pictures
CN107908646A (en) A kind of image search method based on layering convolutional neural networks
CN109086405A (en) Remote sensing image retrieval method and system based on conspicuousness and convolutional neural networks
CN105760875B (en) The similar implementation method of differentiation binary picture feature based on random forests algorithm
Sasithradevi et al. Video classification and retrieval through spatio-temporal Radon features
CN112036511B (en) Image retrieval method based on attention mechanism graph convolution neural network
Singh et al. Image corpus representative summarization
CN113920516A (en) Calligraphy character skeleton matching method and system based on twin neural network
Lin et al. Scene recognition using multiple representation network
Qi et al. Object-based image retrieval with kernel on adjacency matrix and local combined features
Zhao et al. Content-based image retrieval using optimal feature combination and relevance feedback
Song et al. Srrm: Semantic region relation model for indoor scene recognition
Ding et al. An efficient 3D model retrieval method based on convolutional neural network
CN105975643B (en) A kind of realtime graphic search method based on text index
Farhangi et al. Informative visual words construction to improve bag of words image representation
Ahmad et al. SSH: Salient structures histogram for content based image retrieval
Xu et al. Sketch-based shape retrieval via multi-view attention and generalized similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170510