CN103235955A - Extraction method of visual word in image retrieval - Google Patents
- Publication number: CN103235955A
- Authority: CN (China)
- Prior art keywords: binarization, local feature, vision, image
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Classifications: Information Retrieval, Db Structures And Fs Structures Therefor (AREA); Image Analysis (AREA)
Abstract
The invention discloses a method for extracting visual words in image retrieval, belonging to the field of intelligent information processing, which covers multimedia information retrieval and pattern recognition. The method binarizes the set of local features of an image library to obtain binary local features that preserve feature distinctiveness and information content. This improves the space utilization of features in the vector space and thereby the distinctiveness of the visual words, while fast Hamming-distance computation on the binary features increases the speed and reduces the memory cost of subsequent retrieval or classification applications.
Description
Technical field
The invention belongs to the field of information retrieval, including multimedia information retrieval, data mining, and pattern recognition, and relates specifically to a method for extracting visual words in image retrieval.
Background art
Content-based image retrieval (CBIR) analyzes image features such as color, texture, and shape, so that the retrieval results visually reflect their correlation with the query image. Visual features of an image can be divided into global features and local features. Global features describe overall statistics of the whole image, such as color histograms, texture distributions, or region shape features, and are rather sensitive to the position and scale of objects in the image. Local features describe the statistics of all pixels in an image block around a feature point or region with rich texture, parameterized by position, orientation, scale, and so on. Commonly used local features include the gradient-histogram-based descriptors SIFT (Scale-Invariant Feature Transform) and GLOH (Gradient Location and Orientation Histogram). They are highly discriminative and can distinguish different image contents, while tolerating a certain degree of image noise and feature-detection error.
Most state-of-the-art image retrieval and classification systems based on local features use the bag-of-visual-words model to achieve scalability. The bag-of-visual-words model first builds a "visual vocabulary" from the local features of training images, then quantizes the local features of an image with this vocabulary, approximating similar local features by their cluster centre, the "visual word". An image is thus represented as a set of "visual words". The visual words of images are then stored in an inverted index, and images are retrieved with the TF-IDF model from text retrieval.
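As a loose illustration of the quantization step described above (not the patent's own code; the toy 2-D features, vocabulary, and function name are invented for this sketch), assigning each local feature to its nearest cluster centre might look like:

```python
def quantize(features, vocabulary):
    """Assign each local feature to the index of its nearest cluster
    centre (its visual word) by squared Euclidean distance."""
    words = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, v)) for v in vocabulary]
        words.append(dists.index(min(dists)))
    return words

# Toy vocabulary of 3 visual words in a 2-D feature space.
vocab = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
feats = [(0.5, 0.2), (9.1, 0.3), (0.1, 9.8), (9.9, 1.0)]
print(quantize(feats, vocab))  # [0, 1, 2, 1]
```

The image is then represented by the resulting multiset of word indices, which an inverted index can store for TF-IDF retrieval.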
It can be seen that, in local-feature-based image retrieval, the quantization into visual words has a great influence on the final retrieval results.
A common way to build the visual-word representation is to cluster the feature sample training set with the k-means algorithm; each cluster centre corresponds to one visual word, and all visual words form the visual dictionary. Jurie et al. combine online clustering and Mean-Shift to produce more uniform visual words; Nister et al. construct a visual word tree with hierarchical k-means, making it possible to use far more visual words in image representation; Moosmann et al. apply the random forest algorithm, which effectively improves the efficiency of building the visual dictionary.
The local features of an image are high-dimensional, and similarity comparison between vectors suffers from the curse of dimensionality: as the dimensionality increases, the distribution of local feature vectors becomes sparse, and most pairs of vectors lie at similarly large distances. This reduces the discriminability and generality of the visual words.
Binarized local features improve the space utilization of local feature vectors while preserving their stability and information content. However, previous research has not extracted visual words from binarized local features.
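The speed advantage of binary features claimed in this document comes from the fact that the Hamming distance of two packed binary vectors is a single XOR plus a population count. A minimal sketch (the 8-bit toy values are hypothetical):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary features packed into integers:
    XOR the words, then count the set bits."""
    return bin(a ^ b).count("1")

# Two toy 8-bit binarized features.
x = 0b10110010
y = 0b10011010
print(hamming(x, y))  # 2
```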
List of references
1. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
2. J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. Proc. Ninth Int'l Conf. Computer Vision, 2003, pp. 1470-1478.
Summary of the invention
The objective of the invention is to propose a method for extracting visual words from binarized local features. By binarizing the set of local features in the image library, binary local features are obtained that preserve feature distinctiveness and information content; this improves the space utilization of features in the vector space and hence the distinctiveness of the visual words, while fast Hamming-distance computation on the binary features increases computation speed and reduces storage cost in subsequent retrieval or classification applications.
The overall idea of the invention is as follows: first extract the local features of all images in the image library and sample them to obtain a set of local feature vectors; statistically analyze the features to obtain the median of each dimension; save the medians and binarize all local features using the medians as thresholds; then cluster the set of local features of the image library and take the cluster centres as the visual vocabulary. Using the previously saved per-dimension medians as thresholds, binarize the local feature vectors corresponding to the visual vocabulary. When extracting the visual words of an image, first binarize the set of local feature vectors of that image with the dimension medians, then look up the nearest neighbour of each binary local feature vector in the binary visual vocabulary, and take the visual word of that nearest neighbour as the visual word of the feature.
Concrete innovation: the method uses binarized features to improve the utilization of the vector space and thereby the distinctiveness of local feature vectors, preserving their information content and uniqueness; this in turn improves the distinctiveness of the visual words while also improving the computational and storage efficiency of the features.
The concrete steps of the method of the invention are:
1. Extract the local features of all images in the image library to obtain the feature sample training set F = {f_1, f_2, ..., f_M}, where M is the number of images and f_i is the set of local features of image i; f_i can be expressed as f_i = {t_i1, t_i2, ..., t_im}, where m is the number of local features of image i and t_im is the m-th feature of image i;
2. Compute statistics over each dimension of the local features in the feature sample training set to obtain the median of each dimension, B = {b_1, b_2, ..., b_n}, where n is the dimensionality of the local feature and b_i is the median of dimension i;
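The per-dimension median computation of step 2 can be sketched as follows (a toy example with 2-dimensional integer features; the function name is invented for illustration, and the patent does not specify a tie-breaking convention for an even number of samples, so the upper median is one plausible choice):

```python
def dimension_medians(features):
    """Compute the median of each dimension over the whole training set."""
    n = len(features[0])
    medians = []
    for d in range(n):
        col = sorted(f[d] for f in features)
        medians.append(col[len(col) // 2])  # upper median for even-sized sets
    return medians

train = [(3, 200), (7, 10), (5, 90), (9, 60)]
print(dimension_medians(train))  # [7, 90]
```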
3. Cluster the feature sample training set and take the cluster centres as the "visual vocabulary" V = {v_1, v_2, ..., v_k}, where k is the number of clusters, i.e. the size of the visual vocabulary, and v_i is the local feature representation of visual word i;
4. Using the dimension medians obtained in step 2 as thresholds, binarize the vectors in the visual vocabulary V to obtain the binarized visual vocabulary V_b;
5. For a single image, extract its local feature set f and, using the dimension medians of step 2 as thresholds, binarize each feature vector to obtain the binarized feature set f_b;
6. Take each binarized feature vector in the binarized feature set f_b of the image and compare it with the binarized vectors in the binarized visual vocabulary of step 4; the visual word of the nearest (most similar) binarized vector is taken as the visual word of this binarized feature.
In the above method, the image local features of step 1 include SIFT, SURF, GLOH, MSER, and corner features, all of which describe locally salient regions of an image.
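Steps 4-6 can be sketched end to end as follows (a minimal illustration with hypothetical 3-dimensional features, medians, and cluster centres; not the patent's reference implementation):

```python
def binarize(vec, medians):
    """Steps 4/5: bit d is 1 iff the component on dimension d is at least
    the median of that dimension."""
    return tuple(1 if t >= b else 0 for t, b in zip(vec, medians))

def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def assign_words(image_feats, vocab, medians):
    """Step 6: binarize vocabulary and image features, then assign each
    feature to the visual word with the smallest Hamming distance."""
    bvocab = [binarize(v, medians) for v in vocab]
    words = []
    for f in image_feats:
        bf = binarize(f, medians)
        dists = [hamming(bf, bv) for bv in bvocab]
        words.append(dists.index(min(dists)))
    return words

medians = [5, 100, 50]                      # per-dimension thresholds (step 2)
vocab = [(2, 40, 20), (8, 150, 90)]         # two cluster centres (step 3)
feats = [(1, 30, 10), (9, 160, 70)]         # local features of one image
print(assign_words(feats, vocab, medians))  # [0, 1]
```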
Description of drawings
The accompanying drawing is a flow chart of the visual word extraction process for an image.
Embodiment
The technical scheme of this embodiment is as follows. As shown in the drawing, first extract the local features of all images in the image library, for example the SIFT features, to obtain the feature sample training set. Then compute statistics over each dimension of the feature vectors in the training set to obtain the per-dimension medians. A SIFT feature has 128 dimensions and each component takes values in 0, 1, ..., 255; the median b_i of each dimension i is computed, giving the dimension medians B = {b_1, b_2, ..., b_n}.
Afterwards, perform k-means clustering on the feature sample training set to obtain k cluster centres. Each cluster centre corresponds to a unique visual word, and these visual words constitute the visual vocabulary V = {v_1, v_2, ..., v_k}.
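The k-means clustering of this step can be sketched with a minimal pure-Python implementation (naive initialization, toy 2-D points; a real system would use an optimized library and 128-dimensional descriptors):

```python
def kmeans(points, k, iters=10):
    """Minimal k-means: the final centres serve as the visual vocabulary."""
    centres = list(points[:k])  # naive init: first k samples
    for _ in range(iters):
        # Assign every point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - c) ** 2 for a, c in zip(p, cen)) for cen in centres]
            groups[d.index(min(d))].append(p)
        # Recompute each centre as the mean of its group (keep empty groups).
        centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(sorted(kmeans(pts, 2)))  # roughly [(0.33, 0.33), (10.33, 10.33)]
```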
The feature vector corresponding to each visual word in the vocabulary, i.e. each cluster-centre feature vector, is then binarized using the previously saved dimension medians as thresholds, converting the vocabulary into the binarized visual vocabulary V_b.
The conversion formula is:
t_i^b = 1 if t_i >= b_i, and t_i^b = 0 otherwise (Formula 1)
where b_i is the value of the dimension median on dimension i, i.e. the binarization threshold of dimension i; t_i is the component of the local feature vector on dimension i; and t_i^b is the binarized value of the local feature vector on dimension i.
For a single image, extract its local feature set f and, using the dimension medians B as thresholds, binarize each feature vector with Formula 1 to obtain the binarized feature set f_b.
Take each binarized feature vector in the binarized feature set f_b of the image and compare it with the binarized vectors in the binarized visual vocabulary V_b; the visual word of the nearest (most similar) binarized vector is taken as the visual word of this binarized feature, thereby obtaining the set of visual words of the image.
It should be understood that the above description of the embodiment is rather specific and should not be construed as limiting the scope of patent protection of the invention; the scope of patent protection of the invention is defined by the claims.
Claims (2)
1. An extraction method of visual words in image retrieval, characterized by comprising the following steps:
1.1 extracting the local features of all images in the image library to obtain a feature sample training set F = {f_1, f_2, ..., f_M}, where M is the number of images and f_i is the set of local features of image i; f_i can be expressed as f_i = {t_i1, t_i2, ..., t_im}, where m is the number of local features of image i and t_im is the m-th feature of image i;
1.2 computing statistics over each dimension of the local features in the feature sample training set to obtain the median of each dimension, B = {b_1, b_2, ..., b_n}, where n is the dimensionality of the local feature and b_i is the median of dimension i;
1.3 clustering the feature sample training set and taking the cluster centres as the "visual vocabulary" V = {v_1, v_2, ..., v_k}, where k is the number of clusters, i.e. the size of the visual vocabulary, and v_i is the local feature representation of visual word i;
1.4 using the dimension medians obtained in 1.2 as thresholds, binarizing the vectors in the visual vocabulary V to obtain the binarized visual vocabulary V_b;
1.5 for a single image, extracting its local feature set f and, using the dimension medians of 1.2 as thresholds, binarizing each feature vector to obtain the binarized feature set f_b;
1.6 taking each binarized feature vector in the binarized feature set f_b of the image and comparing it with the binarized vectors in the binarized visual vocabulary of 1.4, the visual word of the nearest (most similar) binarized vector being taken as the visual word of this binarized feature.
2. The extraction method of visual words in image retrieval according to claim 1, characterized in that: the image local features of step 1.1 include SIFT, SURF, GLOH, MSER, and corner features, all of which describe locally salient regions of an image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013101591837A CN103235955A (en) | 2013-05-03 | 2013-05-03 | Extraction method of visual word in image retrieval |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103235955A true CN103235955A (en) | 2013-08-07 |
Family
ID=48883994
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095884A (en) * | 2015-08-31 | 2015-11-25 | 桂林电子科技大学 | Pedestrian recognition system and pedestrian recognition processing method based on random forest support vector machine |
CN105760875A (en) * | 2016-03-10 | 2016-07-13 | 西安交通大学 | Binary image feature similarity discrimination method based on random forest algorithm |
CN105975643A (en) * | 2016-07-22 | 2016-09-28 | 南京维睛视空信息科技有限公司 | Real-time image retrieval method based on text index |
CN109711250A (en) * | 2018-11-13 | 2019-05-03 | 深圳市深网视界科技有限公司 | Feature vector binaryzation, similarity evaluation, search method, equipment and medium |
CN111373393A (en) * | 2017-11-24 | 2020-07-03 | 华为技术有限公司 | Image retrieval method and device and image library generation method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521618A (en) * | 2011-11-11 | 2012-06-27 | 北京大学 | Extracting method for local descriptor, image searching method and image matching method |
US20120237096A1 (en) * | 2006-05-03 | 2012-09-20 | University Of Tennessee Research Foundation | Method and system for the diagnosis of disease using retinal image content and an archive of diagnosed human patient data |
Non-Patent Citations (2)
Title |
---|
WENGANG ZHOU et al.: "ICIMCS'12 Proceedings of the 4th International Conference on Internet Multimedia Computing and Service", 9 September 2012 |
SUN Mengke et al.: "Design and Implementation of an Image Retrieval *** Based on the Bag of Words Model", Computer Knowledge and Technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20130807 |