CN103235955A - Extraction method of visual word in image retrieval - Google Patents
- Publication number: CN103235955A
- Authority: CN (China)
- Prior art keywords: binarization, local feature, vision, image
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Classifications: Information Retrieval, Db Structures And Fs Structures Therefor (AREA); Image Analysis (AREA)
Abstract
The invention discloses a method for extracting visual words in image retrieval, belonging to the field of intelligent information processing, which covers multimedia information retrieval and pattern recognition. The method binarizes the set of local features of an image library to obtain binary local features that preserve feature distinctiveness and information content. This improves the space utilization of features in the vector space and thereby the distinctiveness of the visual words, while fast Hamming-distance computation on the binary features increases the speed and reduces the memory cost of subsequent retrieval or classification applications.
Description
Technical field
The invention belongs to the field of information retrieval, including multimedia information retrieval, data mining, and pattern recognition, and relates specifically to a method for extracting visual words in image retrieval.
Background art
Content-based image retrieval (CBIR) analyzes image features such as color, texture, and shape, so that the retrieval results visually reflect their correlation with the query image. Visual features of an image can be divided into global features and local features. Global features describe overall statistics of the whole image, such as color histograms, texture distributions, or region shape features, and are rather sensitive to the position and scale of objects in the image. Local features describe the statistics of all pixels in an image block around a feature point or region with rich texture, parameterized by position, orientation, scale, and so on. Commonly used local features include the gradient-histogram-based descriptors SIFT (Scale-Invariant Feature Transform) and GLOH (Gradient Location and Orientation Histogram). They are highly discriminative and can distinguish different image contents, while tolerating a certain degree of image noise and feature-detection error.
Most state-of-the-art image retrieval and classification systems based on local features use the bag-of-visual-words model to achieve scalability. The bag-of-visual-words model first builds a "visual vocabulary" from the local features of training images, then quantizes the local features of an image with this vocabulary, approximating similar local features by their cluster centre, the "visual word". An image is thus represented as a set of "visual words". The visual words of images are then stored in an inverted index, and images are retrieved with the TF-IDF model from text retrieval.
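As a loose illustration of the quantization step described above (not the patent's own code; the toy 2-D features, vocabulary, and function name are invented for this sketch), assigning each local feature to its nearest cluster centre might look like:

```python
def quantize(features, vocabulary):
    """Assign each local feature to the index of its nearest cluster
    centre (its visual word) by squared Euclidean distance."""
    words = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, v)) for v in vocabulary]
        words.append(dists.index(min(dists)))
    return words

# Toy vocabulary of 3 visual words in a 2-D feature space.
vocab = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
feats = [(0.5, 0.2), (9.1, 0.3), (0.1, 9.8), (9.9, 1.0)]
print(quantize(feats, vocab))  # [0, 1, 2, 1]
```

The image is then represented by the resulting multiset of word indices, which an inverted index can store for TF-IDF retrieval.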
It can be seen that, in local-feature-based image retrieval, the quantization into visual words has a great influence on the final retrieval results.
A common way to build the visual-word representation is to cluster the feature sample training set with the k-means algorithm; each cluster centre corresponds to one visual word, and all visual words form the visual dictionary. Jurie et al. combine online clustering and Mean-Shift to produce more uniform visual words; Nister et al. construct a visual word tree with hierarchical k-means, making it possible to use far more visual words in image representation; Moosmann et al. apply the random forest algorithm, which effectively improves the efficiency of building the visual dictionary.
The local features of an image are high-dimensional, and similarity comparison between vectors suffers from the curse of dimensionality: as the dimensionality increases, the distribution of local feature vectors becomes sparse, and most pairs of vectors lie at similarly large distances. This reduces the discriminability and generality of the visual words.
Binarized local features improve the space utilization of local feature vectors while preserving their stability and information content. However, previous research has not extracted visual words from binarized local features.
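The speed advantage of binary features claimed in this document comes from the fact that the Hamming distance of two packed binary vectors is a single XOR plus a population count. A minimal sketch (the 8-bit toy values are hypothetical):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary features packed into integers:
    XOR the words, then count the set bits."""
    return bin(a ^ b).count("1")

# Two toy 8-bit binarized features.
x = 0b10110010
y = 0b10011010
print(hamming(x, y))  # 2
```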
List of references
1. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
2. J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. Proc. Ninth Int'l Conf. Computer Vision, 2003, pp. 1470-1478.
Summary of the invention
The objective of the invention is to propose a method for extracting visual words from binarized local features. By binarizing the set of local features in the image library, binary local features are obtained that preserve feature distinctiveness and information content; this improves the space utilization of features in the vector space and hence the distinctiveness of the visual words, while fast Hamming-distance computation on the binary features increases computation speed and reduces storage cost in subsequent retrieval or classification applications.
The overall idea of the invention is as follows: first extract the local features of all images in the image library and sample them to obtain a set of local feature vectors; statistically analyze the features to obtain the median of each dimension; save the medians and binarize all local features using the medians as thresholds; then cluster the set of local features of the image library and take the cluster centres as the visual vocabulary. Using the previously saved per-dimension medians as thresholds, binarize the local feature vectors corresponding to the visual vocabulary. When extracting the visual words of an image, first binarize the set of local feature vectors of that image with the dimension medians, then look up the nearest neighbour of each binary local feature vector in the binary visual vocabulary, and take the visual word of that nearest neighbour as the visual word of the feature.
Concrete innovation: the method uses binarized features to improve the utilization of the vector space and thereby the distinctiveness of local feature vectors, preserving their information content and uniqueness; this in turn improves the distinctiveness of the visual words while also improving the computational and storage efficiency of the features.
The concrete steps of the method of the invention are:
1. Extract the local features of all images in the image library to obtain the feature sample training set F = {f_1, f_2, ..., f_M}, where M is the number of images and f_i is the set of local features of image i; f_i can be expressed as f_i = {t_i1, t_i2, ..., t_im}, where m is the number of local features of image i and t_im is the m-th feature of image i;
2. Compute statistics over each dimension of the local features in the feature sample training set to obtain the median of each dimension, B = {b_1, b_2, ..., b_n}, where n is the dimensionality of the local feature and b_i is the median of dimension i;
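The per-dimension median computation of step 2 can be sketched as follows (a toy example with 2-dimensional integer features; the function name is invented for illustration, and the patent does not specify a tie-breaking convention for an even number of samples, so the upper median is one plausible choice):

```python
def dimension_medians(features):
    """Compute the median of each dimension over the whole training set."""
    n = len(features[0])
    medians = []
    for d in range(n):
        col = sorted(f[d] for f in features)
        medians.append(col[len(col) // 2])  # upper median for even-sized sets
    return medians

train = [(3, 200), (7, 10), (5, 90), (9, 60)]
print(dimension_medians(train))  # [7, 90]
```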
3. Cluster the feature sample training set and take the cluster centres as the "visual vocabulary" V = {v_1, v_2, ..., v_k}, where k is the number of clusters, i.e. the size of the visual vocabulary, and v_i is the local feature representation of visual word i;
4. Using the dimension medians obtained in step 2 as thresholds, binarize the vectors in the visual vocabulary V to obtain the binarized visual vocabulary V_b;
5. For a single image, extract its local feature set f and, using the dimension medians of step 2 as thresholds, binarize each feature vector to obtain the binarized feature set f_b;
6. Take each binarized feature vector in the binarized feature set f_b of the image and compare it with the binarized vectors in the binarized visual vocabulary of step 4; the visual word of the nearest (most similar) binarized vector is taken as the visual word of this binarized feature.
In the above method, the image local features of step 1 include SIFT, SURF, GLOH, MSER, and corner features, all of which describe locally salient regions of an image.
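Steps 4-6 can be sketched end to end as follows (a minimal illustration with hypothetical 3-dimensional features, medians, and cluster centres; not the patent's reference implementation):

```python
def binarize(vec, medians):
    """Steps 4/5: bit d is 1 iff the component on dimension d is at least
    the median of that dimension."""
    return tuple(1 if t >= b else 0 for t, b in zip(vec, medians))

def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def assign_words(image_feats, vocab, medians):
    """Step 6: binarize vocabulary and image features, then assign each
    feature to the visual word with the smallest Hamming distance."""
    bvocab = [binarize(v, medians) for v in vocab]
    words = []
    for f in image_feats:
        bf = binarize(f, medians)
        dists = [hamming(bf, bv) for bv in bvocab]
        words.append(dists.index(min(dists)))
    return words

medians = [5, 100, 50]                      # per-dimension thresholds (step 2)
vocab = [(2, 40, 20), (8, 150, 90)]         # two cluster centres (step 3)
feats = [(1, 30, 10), (9, 160, 70)]         # local features of one image
print(assign_words(feats, vocab, medians))  # [0, 1]
```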
Description of drawings
The accompanying drawing is a flow chart of the visual word extraction process for an image.
Embodiment
The technical scheme of this embodiment is as follows. As shown in the drawing, first extract the local features of all images in the image library, for example the SIFT features, to obtain the feature sample training set. Then compute statistics over each dimension of the feature vectors in the training set to obtain the per-dimension medians. A SIFT feature has 128 dimensions and each component takes values in 0, 1, ..., 255; the median b_i of each dimension i is computed, giving the dimension medians B = {b_1, b_2, ..., b_n}.
Afterwards, perform k-means clustering on the feature sample training set to obtain k cluster centres. Each cluster centre corresponds to a unique visual word, and these visual words constitute the visual vocabulary V = {v_1, v_2, ..., v_k}.
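The k-means clustering of this step can be sketched with a minimal pure-Python implementation (naive initialization, toy 2-D points; a real system would use an optimized library and 128-dimensional descriptors):

```python
def kmeans(points, k, iters=10):
    """Minimal k-means: the final centres serve as the visual vocabulary."""
    centres = list(points[:k])  # naive init: first k samples
    for _ in range(iters):
        # Assign every point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - c) ** 2 for a, c in zip(p, cen)) for cen in centres]
            groups[d.index(min(d))].append(p)
        # Recompute each centre as the mean of its group (keep empty groups).
        centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(sorted(kmeans(pts, 2)))  # roughly [(0.33, 0.33), (10.33, 10.33)]
```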
The feature vector corresponding to each visual word in the vocabulary, i.e. each cluster-centre feature vector, is then binarized using the previously saved dimension medians as thresholds, converting the vocabulary into the binarized visual vocabulary V_b.
The conversion formula is:
t_i^b = 1 if t_i >= b_i, and t_i^b = 0 otherwise (Formula 1)
where b_i is the value of the dimension median on dimension i, i.e. the binarization threshold of dimension i; t_i is the component of the local feature vector on dimension i; and t_i^b is the binarized value of the local feature vector on dimension i.
For a single image, extract its local feature set f and, using the dimension medians B as thresholds, binarize each feature vector with Formula 1 to obtain the binarized feature set f_b.
Take each binarized feature vector in the binarized feature set f_b of the image and compare it with the binarized vectors in the binarized visual vocabulary V_b; the visual word of the nearest (most similar) binarized vector is taken as the visual word of this binarized feature, thereby obtaining the set of visual words of the image.
It should be understood that the above description of the embodiment is rather specific and should not be construed as limiting the scope of patent protection of the invention; the scope of patent protection of the invention is defined by the claims.
Claims (2)
1. An extraction method of visual words in image retrieval, characterized by comprising the following steps:
1.1 extracting the local features of all images in the image library to obtain a feature sample training set F = {f_1, f_2, ..., f_M}, where M is the number of images and f_i is the set of local features of image i; f_i can be expressed as f_i = {t_i1, t_i2, ..., t_im}, where m is the number of local features of image i and t_im is the m-th feature of image i;
1.2 computing statistics over each dimension of the local features in the feature sample training set to obtain the median of each dimension, B = {b_1, b_2, ..., b_n}, where n is the dimensionality of the local feature and b_i is the median of dimension i;
1.3 clustering the feature sample training set and taking the cluster centres as the "visual vocabulary" V = {v_1, v_2, ..., v_k}, where k is the number of clusters, i.e. the size of the visual vocabulary, and v_i is the local feature representation of visual word i;
1.4 using the dimension medians obtained in 1.2 as thresholds, binarizing the vectors in the visual vocabulary V to obtain the binarized visual vocabulary V_b;
1.5 for a single image, extracting its local feature set f and, using the dimension medians of 1.2 as thresholds, binarizing each feature vector to obtain the binarized feature set f_b;
1.6 taking each binarized feature vector in the binarized feature set f_b of the image and comparing it with the binarized vectors in the binarized visual vocabulary of 1.4, the visual word of the nearest (most similar) binarized vector being taken as the visual word of this binarized feature.
2. The extraction method of visual words in image retrieval according to claim 1, characterized in that: the image local features of step 1.1 include SIFT, SURF, GLOH, MSER, and corner features, all of which describe locally salient regions of an image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013101591837A CN103235955A (en) | 2013-05-03 | 2013-05-03 | Extraction method of visual word in image retrieval |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103235955A true CN103235955A (en) | 2013-08-07 |
Family
ID=48883994
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095884A (en) * | 2015-08-31 | 2015-11-25 | 桂林电子科技大学 | Pedestrian recognition system and pedestrian recognition processing method based on random forest support vector machine |
CN105760875A (en) * | 2016-03-10 | 2016-07-13 | 西安交通大学 | Binary image feature similarity discrimination method based on random forest algorithm |
CN105975643A (en) * | 2016-07-22 | 2016-09-28 | 南京维睛视空信息科技有限公司 | Real-time image retrieval method based on text index |
CN109711250A (en) * | 2018-11-13 | 2019-05-03 | 深圳市深网视界科技有限公司 | Feature vector binaryzation, similarity evaluation, search method, equipment and medium |
CN111373393A (en) * | 2017-11-24 | 2020-07-03 | 华为技术有限公司 | Image retrieval method and device and image library generation method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521618A (en) * | 2011-11-11 | 2012-06-27 | 北京大学 | Extracting method for local descriptor, image searching method and image matching method |
US20120237096A1 (en) * | 2006-05-03 | 2012-09-20 | University Of Tennessee Research Foundation | Method and system for the diagnosis of disease using retinal image content and an archive of diagnosed human patient data |
Non-Patent Citations (2)
Title |
---|
WENGANG ZHOU et al.: "ICIMCS'12 Proceedings of the 4th International Conference on Internet Multimedia Computing and Service", 9 September 2012 |
SUN Mengke et al.: "Design and Implementation of an Image Retrieval *** Based on the Bag of Words Model", Computer Knowledge and Technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20130807 |