CN106557779A - An object recognition method based on a salient-region bag-of-words model - Google Patents

An object recognition method based on a salient-region bag-of-words model

Info

Publication number
CN106557779A
CN106557779A (application CN201610921396.2A)
Authority
CN
China
Prior art keywords
image
bag of words
salient region
object recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610921396.2A
Other languages
Chinese (zh)
Inventor
袁家政
刘宏哲
郭燕飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Union University
Original Assignee
Beijing Union University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Union University
Priority to CN201610921396.2A
Publication of CN106557779A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with a fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an object recognition method based on a salient-region bag-of-words model, comprising the following steps: corner detection; locating the salient region of the image; SIFT feature extraction; and image-region feature similarity comparison. Because the method extracts local features only within the target region, it avoids complex image segmentation techniques on the one hand, and greatly reduces the number of feature points unrelated to the object on the other.

Description

An object recognition method based on a salient-region bag-of-words model
Technical field
The present invention relates to the technical field of digital image processing, and in particular to an object recognition method based on a salient-region bag-of-words model.
Background art
With the rapid development of machine learning and pattern recognition, computer vision techniques have steadily improved, making it possible to use computers to imitate human cognitive abilities and thereby relieve or assist people in routine work. Object recognition has become a particularly important research direction in pattern recognition, with broad demand and application in both military and civilian fields, for example intelligent video surveillance, autonomous-driving navigation, human-computer interaction, and content-based retrieval of massive image collections on the Internet.
Accurately and effectively recognizing objects, so as to meet the growing demands of computer vision, psychology, and practical applications, remains a challenging task. Research shows that about 60% of the information humans obtain from the outside world comes through vision. In computer vision, visual information mainly takes the form of images and video, with images as its primary carrier. Analyzing images, simulating the human visual system, cognitively recognizing the myriad variety of objects, deciding which features to extract, building effective object representations, and establishing reasonably simple object models that can better distinguish one object from another: these are all key problems in recognizing objects.
Object recognition is one of the most active research topics in computer vision, and recognizing objects quickly and accurately is an important research direction. Object recognition is often disturbed by factors such as viewpoint, scale, occlusion, and background clutter. To cope with these challenges, many researchers have proposed building feature descriptions of images from local features. The bag-of-words (BOW) representation based on local keypoints has shown good performance in a variety of visual classification tasks. Traditional recognition methods based on the bag-of-words model are generally implemented with SIFT descriptors, k-means clustering, and a classifier. One defect of the bag-of-words model is that foreground and background are not separated, so some of the visual words that form the BOW representation are extracted from background parts. SIFT descriptors yield a huge number of feature points and resist the effects of scale change and rotation well, and this quantity effectively guarantees the adequacy and robustness of the image feature representation. As the representative local feature descriptor, SIFT also has its shortcomings: keypoints are extracted over the entire image, so many of the detected interest points come from the background. Feature points detected by generic interest-point operators, on the other hand, suffer from insufficient numbers, making the image feature representation not rich enough, although the detected points do tend to concentrate on the target.
The patent document of publication No. CN105654122A discloses a spatial-pyramid object recognition method based on kernel-function matching. It comprises the following steps: extract ED-SIFT (Efficient Dense Scale-Invariant Feature Transform) descriptors of the object images; cluster the ED-SIFT descriptors of the training samples with the k-means++ clustering algorithm to obtain a visual dictionary; introduce a spatial pyramid and use a kernel function to match the visual-word histograms of the training and test samples; and complete the training and the test-sample recognition with an SVM classifier. The ED-SIFT descriptors proposed by that method extract keypoints over the entire image, so many of the detected interest points come from background parts, leaving foreground and background unseparated. In addition, k-means++, as an unsupervised learning algorithm, is sensitive to abnormal data: once outliers appear in the data set, they have a non-negligible impact on the experimental results. It also relies on the choice of the value of k, which must be determined in advance, and the choice of k is crucial to classification: its suitability directly determines the quality of the classification.
Content of the invention
To solve the above technical problems, in view of the fact that the local features conventionally used to build visual dictionaries are unstable, unreliable, or unrelated to the object, the present invention proposes an object recognition method based on a salient-region bag-of-words model. First, the method uses a strong-corner detector to determine the salient region of the image. Local features are then extracted from the salient region and modeled as a bag of words, and finally a nearest-neighbor classifier gives the recognition result.
The present invention provides an object recognition method based on a salient-region bag-of-words model, comprising the following steps:
Step 1: corner detection;
Step 2: locating the salient region of the image;
Step 3: SIFT feature extraction;
Step 4: image-region feature similarity comparison.
Preferably, the corners are Shi-Tomasi corners, computed from the rate of change of the gradient direction.
In any of the above schemes, preferably, the Shi-Tomasi corners are points where the image brightness changes sharply or the curvature is very large.
In any of the above schemes, preferably, step 2 converts locating the key region of the image into locating the region over which the corners are distributed.
In any of the above schemes, preferably, the localization method is: divide the image into m × n blocks, count the corners in each block, and record the corner count of each block in an m × n matrix. If the number of corners in a block is ≥ q, the contiguous concentrated area where those corners lie is considered the key region of the image; here q is a threshold on the per-block corner count, used to filter out background areas containing only isolated or few corners.
In any of the above schemes, preferably, step 3 comprises DoG extreme-point extraction and feature-vector formation.
In any of the above schemes, preferably, the DoG extreme-point extraction method is to scale the original image, obtain a multi-scale image-space representation sequence, and perform feature extraction at the different resolutions.
In any of the above schemes, preferably, the scales include at least one of a large scale and a small scale.
In any of the above schemes, preferably, the large scale (low resolution) captures the overall appearance of the object.
In any of the above schemes, preferably, the small scale (high resolution) captures the fine details of the object.
In any of the above schemes, preferably, the scale space L(x, y, σ) of an image is defined as the convolution of the original image I(x, y) with a variable-scale 2-D Gaussian function G(x, y, σ), where σ denotes the scale. The Gaussian function is defined as:

G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

Convolving the Gaussian kernel with the original image gives the scale space, defined as:

L(x, y, σ) = G(x, y, σ) * I(x, y)

To find stable keypoints in multi-scale space, an image pyramid must first be built; adjacent layers of the pyramid are then differenced to construct the difference-of-Gaussian (DoG) scale space:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)
In any of the above schemes, preferably, the gradient-histogram statistics method is used: with the extreme point as origin, the contributions that the pixels in a neighborhood make to the orientation of the extreme point are accumulated. The histogram has 36 bins, one per 10 degrees; the magnitude of each point in the neighborhood is added to the bin corresponding to its angle, and the height of a bin represents the accumulated gradient magnitude.
In any of the above schemes, preferably, the feature-vector formation method is: the extreme points found in the DoG scale space possess scale invariance; using the gradient-orientation distribution of the pixels in each extreme point's neighborhood, the dominant orientation of the extreme point is determined by histogram statistics and assigned to it as an orientation parameter, thereby obtaining rotation invariance. The gradient magnitude m(x, y) and gradient direction θ(x, y) of each pixel are computed as:

m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)

θ(x, y) = tan⁻¹((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))

In any of the above schemes, preferably, before the keypoint descriptor vector is built, the coordinate axes are first rotated to the dominant orientation of the keypoint; then, centered on the keypoint, the gradient-orientation histograms of 4 × 4 subregions are computed, where each subregion is an image block of 4 × 4 pixels, so the overall window is 16 × 16 pixels.
In any of the above schemes, preferably, step 4 subjects the images in the image library to a training process and a test process.
In any of the above schemes, preferably, the training process comprises the following steps:
Step a1: read in a training object picture and determine the salient region of the image;
Step a2: extract the SIFT features of the training samples within the salient region; if there are i training pictures in total and the numbers of SIFT feature points of the individual images are n1, n2, ..., ni, the total number of extracted SIFT features is (n1 + n2 + … + ni);
Step a3: store the SIFT features of all samples in an original training matrix of size (n1 + n2 + … + ni) × 128, and create the visual dictionary required by the BOW model with the k-means clustering algorithm; k is the size of the visual dictionary, i.e. the dimension of the BOW histogram;
Step a4: map onto the visual dictionary and compute the BOW histogram of each training picture; each image is represented by a k-dimensional vector, and all training pictures can be stored in a new i × k feature matrix.
In any of the above schemes, preferably, the test process comprises the following steps:
Step b1: read in a test object picture and determine the salient region of the image;
Step b2: extract the SIFT features of the test samples within the salient region; if there are i test pictures in total and the numbers of SIFT feature points of the individual images are n1, n2, ..., ni, the total number of extracted SIFT features is (n1 + n2 + … + ni);
Step b3: extract SIFT features in the target region and project the extracted features onto the visual dictionary to form the BOW representation of the test image; the visual dictionary onto which the test process projects is the one established by the training process;
Step b4: map onto the visual dictionary and compute the BOW histogram of each test picture; each image is represented by a k-dimensional vector, and all test pictures can be stored in a new i × k feature matrix.
In any of the above schemes, preferably, in the BOW model an image is represented by a vector computed from the frequencies with which its feature words map onto the visual dictionary. For example, image j can be expressed as:

dj = (nj,0, nj,1, ..., nj,k−1)

where k is the size of the visual dictionary and nj,i (i = 0, 1, ..., k−1) is the frequency with which image j maps to the i-th visual word; these frequencies are also called code words.
In any of the above schemes, preferably, the tf-idf weighting method from information retrieval is used to generate weighted BOW features.
In any of the above schemes, preferably, tf refers to the idea that if a keyword occurs frequently in one article and rarely in other articles, the word has high discrimination and contributes strongly to classifying the article.
In any of the above schemes, preferably, idf refers to the idea that if few files in the document database contain the word, its idf weight is larger, indicating that the word discriminates well.
In any of the above schemes, preferably, the tf-idf weight of word i in image j is:

wj,i = wtf,j,i × widf,j,i (i = 0, 1, ..., k−1)

where wtf,j,i is the contribution weight of the i-th word to classifying image j, and widf,j,i is the contribution weight of the i-th word, over the document database, to classifying image j.
In any of the above schemes, preferably, these weights are specifically computed as:

wtf,j,i = nj,i / nj

where nj,i is the number of times the i-th word occurs in image j, and nj is the total word count of image j, i.e. nj = nj,0 + nj,1 + … + nj,k−1; and

widf,j,i = log(N / (nd + 1))

where N is the number of images in the training library and nd is the number of images containing word d. A word's discrimination in the library is inversely proportional to the frequency with which it appears across different images. To prevent division by zero, the denominator takes nd + 1.
In any of the above schemes, preferably, the weighted BOW feature of image j is denoted:

bofj = wj,i × dj (i = 0, 1, ..., k−1)

In the bag-of-words model, images containing unequal numbers of features can all be expressed as vectors of a fixed dimension, so object recognition can be achieved by measuring the similarity between the test image and the images in the training library.
In any of the above schemes, preferably, the test vector is:

dt = (nt,0, nt,1, ..., nt,k−1)

The weighted BOW feature of a test image is denoted boft and is computed in the same way as for training images. Euclidean distance computes the absolute distance between two feature vectors and is directly tied to their concrete values; cosine distance judges similarity by the angle between two feature vectors. Compared with Euclidean distance, cosine distance emphasizes the difference of two feature vectors in direction rather than in distance or length. Cosine distance is usually adopted to measure differences between high-dimensional vectors, but it disregards the feature magnitudes, and this insensitivity can introduce error. Therefore, an adjusted cosine distance dis is used here to measure similarity: the mean of the feature values is subtracted from every dimension of the feature vector. The larger the adjusted cosine distance dis, the smaller the angle between the feature vectors, and the more similar the two images.
The method proposed by the present invention extracts local features within the target region, which on the one hand avoids complex image segmentation techniques and on the other hand greatly reduces the number of feature points unrelated to the object. It has the following features: few parameters, simple computation, fast processing, and good image recognition performance.
Description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the object recognition method based on the salient-region bag-of-words model according to the present invention.
Fig. 2 is a flow chart of building the information library in a preferred embodiment of the object recognition method based on the salient-region bag-of-words model according to the present invention.
Fig. 3 is the DoG scale-space construction diagram of a preferred embodiment of the object recognition method based on the salient-region bag-of-words model according to the present invention.
Fig. 4 is the DoG scale-space extreme-point detection diagram of a preferred embodiment of the object recognition method based on the salient-region bag-of-words model according to the present invention.
Fig. 5 is the orientation-histogram generation diagram of a preferred embodiment of the object recognition method based on the salient-region bag-of-words model according to the present invention.
Fig. 6 is the keypoint feature-vector formation diagram of a preferred embodiment of the object recognition method based on the salient-region bag-of-words model according to the present invention.
Specific embodiment
The present invention is further elaborated below with specific embodiments, in conjunction with the accompanying drawings.
Embodiment one
The object recognition method based on the salient-region bag-of-words model comprises three parts: salient-region localization, feature extraction and description, and image-region feature similarity comparison.
Step 100 is executed: corner detection.
Shi-Tomasi corners are computed from the rate of change of the gradient direction: they are exactly the points where the image brightness changes sharply or the curvature is very large. The main idea is to determine the local structure of the image signal with the auto-correlation matrix. Assume the image signal at point x is I(x). Convolving the image signal with the derivatives of the Gaussian function G(x, σD) gives the first-order derivatives:

Lu(x, σD) = I(x) * Gu(x, σD) (1)

Lv(x, σD) = I(x) * Gv(x, σD) (2)

LuLv(x, σD) = Lu(x, σD) · Lv(x, σD) (3)

where σD is the differentiation scale. Using formulas (1)-(3), the auto-correlation matrix is obtained:

C(x, σI, σD) = σD² · G(σI) * [ Lu²(x, σD)   LuLv(x, σD)
                               LuLv(x, σD)  Lv²(x, σD) ]

where σI is the integration scale and G is the Gaussian operator.
The eigenvalues λ1, λ2 of the correlation matrix C are computed and min(λ1, λ2) is retained, giving the eigenvalue image of the picture. During Shi-Tomasi corner extraction with a given threshold λ: if a point in the image satisfies min(λ1, λ2) > λ, the point is considered a strong corner.
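The following Python sketch illustrates this step with OpenCV's Shi-Tomasi detector (cv2.goodFeaturesToTrack), which keeps the points whose smaller auto-correlation eigenvalue min(λ1, λ2) exceeds a quality threshold. It is a minimal sketch only; the file name and parameter values are illustrative assumptions, not values fixed by the invention.

    import cv2
    import numpy as np

    img = cv2.imread("object.jpg")                 # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # qualityLevel plays the role of the threshold on min(lambda1, lambda2),
    # expressed as a fraction of the strongest corner response in the image.
    corners = cv2.goodFeaturesToTrack(
        gray,
        maxCorners=500,      # upper bound on the number of strong corners
        qualityLevel=0.01,   # relative threshold on the smaller eigenvalue
        minDistance=5,       # minimum spacing between corners, in pixels
    )
    corners = corners.reshape(-1, 2).astype(int)   # (x, y) integer positions
    print(f"{len(corners)} strong corners detected")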
Step 110 is executed: locate the salient region of the image.
Locating the key region of the image is converted into locating the region over which the corners are distributed. The localization method is: divide the image into m × n blocks, count the corners in each block, and record the corner count of each block in an m × n matrix. If the number of corners in a block is ≥ q, the contiguous concentrated area where those corners lie is considered the key region of the image; here q is a threshold on the per-block corner count, used to filter out background areas containing only isolated or few corners.
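A minimal NumPy sketch of this localization step follows, under assumed values of m, n, and q; the bounding box of the dense blocks is used as a simple stand-in for the contiguous concentrated area. `corners` is the array from the corner-detection sketch above.

    import numpy as np

    def locate_salient_region(corners, img_shape, m=8, n=8, q=5):
        h, w = img_shape[:2]
        counts = np.zeros((m, n), dtype=int)         # m x n corner-count matrix
        for x, y in corners:
            counts[min(y * m // h, m - 1), min(x * n // w, n - 1)] += 1
        dense = counts >= q                          # blocks with >= q corners
        if not dense.any():                          # fall back to the whole image
            return 0, 0, w, h
        rows, cols = np.nonzero(dense)
        # Bounding box of the dense blocks, mapped back to pixel coordinates.
        x0, x1 = cols.min() * w // n, (cols.max() + 1) * w // n
        y0, y1 = rows.min() * h // m, (rows.max() + 1) * h // m
        return x0, y0, x1, y1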
Step 120 is executed: SIFT feature extraction, which consists of the following two parts:
(1) DoG extreme-point extraction.
The sparse set of local feature points produced by a feature detector is the basis for building the BOW. To represent an object accurately, it must be described at an appropriate scale. The main idea of scale-space theory is to scale the original image so as to obtain a multi-scale image-space representation sequence, and to perform feature extraction at the different resolutions. A large scale (low resolution) captures the overall appearance of an object; a small scale (high resolution) captures its fine details. The scale space L(x, y, σ) of an image is defined as the convolution of the original image I(x, y) with a variable-scale 2-D Gaussian function G(x, y, σ), where σ denotes the scale. The Gaussian function is defined as:

G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
Convolving the Gaussian kernel with the original image gives the scale space, defined as:

L(x, y, σ) = G(x, y, σ) * I(x, y) (6)
To find stable keypoints in multi-scale space, an image pyramid must first be built; adjacent layers of the pyramid are then differenced to construct the difference-of-Gaussian (DoG) scale space:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ) (7)
The local extreme points detected in the DoG space serve as keypoints. To find the extreme points, each pixel is compared with its 8 neighbors at the same scale and the 9 × 2 corresponding points at the two adjacent scales, 26 points in total, to ensure that extreme points are detected in both scale space and the 2-D image space. A pixel that is larger than all 26 points, or smaller than all of them, is identified as a keypoint.
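The following NumPy/OpenCV sketch illustrates one octave of this construction and the 26-neighbor extremum test of formulas (6)-(7); sigma and the number of levels are illustrative values, not parameters prescribed by the invention, and `gray` is the image from the earlier sketch.

    import cv2
    import numpy as np

    def dog_octave(gray, sigma=1.6, levels=5):
        # levels - 3 usable intervals per octave, so the scale step is:
        k = 2 ** (1.0 / (levels - 3))
        gaussians = [
            cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma * k ** i)
            for i in range(levels)
        ]
        # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)
        return [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]

    def is_extremum(dogs, s, y, x):
        # Compare a pixel with its 26 neighbours: the 3 x 3 x 3 cube spanning
        # its own DoG level and the two adjacent levels (the cube includes
        # the pixel itself, so max/min here means ">= / <= all neighbours").
        cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])
        v = dogs[s][y, x]
        return v == cube.max() or v == cube.min()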
(2) Feature-vector formation.
The extreme points obtained in the DoG scale space possess scale invariance. Using the gradient-orientation distribution of the pixels in each extreme point's neighborhood, the dominant orientation is determined by histogram statistics and assigned to the point as an orientation parameter, yielding rotation invariance. The gradient magnitude m(x, y) and gradient direction θ(x, y) of each pixel are computed with formulas (4) and (5):

m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²) (4)

θ(x, y) = tan⁻¹((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))) (5)

The gradient-histogram statistics method is used: with the extreme point as origin, the contributions that the pixels in a neighborhood make to the orientation of the extreme point are accumulated. The histogram has 36 bins, one per 10 degrees; the magnitude of each point in the neighborhood is added to the bin corresponding to its angle, and the height of a bin represents the accumulated gradient magnitude.
Before the keypoint descriptor vector is built, the coordinate axes are first rotated to the dominant orientation of the keypoint; then, centered on the keypoint, the gradient-orientation histograms of 4 × 4 subregions are computed, where each subregion is an image block of 4 × 4 pixels, so the overall window is 16 × 16 pixels.
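In practice the whole of step 120 can be approximated with OpenCV's built-in SIFT, which performs the DoG extremum detection, orientation assignment, and 128-dimensional descriptor formation described above. The minimal sketch below restricts it to the salient region with a mask and reuses `gray`, `corners`, and `locate_salient_region` from the earlier sketches.

    import cv2
    import numpy as np

    # Restrict SIFT to the salient region found earlier by passing a mask.
    x0, y0, x1, y1 = locate_salient_region(corners, gray.shape)
    mask = np.zeros_like(gray, dtype=np.uint8)
    mask[y0:y1, x0:x1] = 255

    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, mask)
    print(descriptors.shape)        # (number of keypoints, 128)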
Step 130 is executed: image-region feature similarity comparison.
The images in the image library go through a training process and a test process, finally giving the recognized object.
Embodiment two
As shown in Fig. 2, image-region feature similarity comparison comprises a training process and a test process over the images in the image library.
(1) training process
Step 200 is executed: a training object picture is read in and the salient region of the image is determined.
Step 210 is executed: the SIFT features of the training samples are extracted within the salient region. If there are i training pictures in total and the numbers of SIFT feature points of the individual images are n1, n2, ..., ni, the total number of extracted SIFT features is (n1 + n2 + … + ni).
Step 211 is executed: an original training matrix of size (n1 + n2 + … + ni) × 128 stores the SIFT features of all samples, and the visual dictionary required by the BOW model is created with the k-means clustering algorithm; k is the size of the visual dictionary, i.e. the dimension of the BOW histogram.
Step 230 is executed: the features are mapped onto the visual dictionary and the BOW histogram of each training picture is computed; each image is represented by a k-dimensional vector, and all training pictures can be stored in a new i × k feature matrix.
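A minimal sketch of steps 211 and 230 with scikit-learn's KMeans follows; `train_descriptors` is assumed to be a list of per-image descriptor arrays produced by the extraction step, and k = 200 is an illustrative dictionary size.

    import numpy as np
    from sklearn.cluster import KMeans

    k = 200                                   # illustrative visual-dictionary size
    all_desc = np.vstack(train_descriptors)   # (n1 + ... + ni) x 128 training matrix
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_desc)

    def bow_histogram(desc, kmeans, k):
        words = kmeans.predict(desc)          # nearest visual word per descriptor
        return np.bincount(words, minlength=k)

    # i x k matrix: one k-dimensional BOW histogram per training image.
    train_bow = np.array([bow_histogram(d, kmeans, k) for d in train_descriptors])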
(2) test process
Step 200 is executed: a test object picture is read in and the salient region of the image is determined.
Step 220 is executed: the SIFT features of the test samples are extracted within the salient region. If there are i test pictures in total and the numbers of SIFT feature points of the individual images are n1, n2, ..., ni, the total number of extracted SIFT features is (n1 + n2 + … + ni).
Step 221 is executed: SIFT features are extracted in the target region, and the extracted features are projected onto the visual dictionary to form the BOW representation of the test image.
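Continuing the training sketch above, step 221 reduces to assigning the test image's descriptors to the nearest visual words of the same `kmeans` model built during training:

    # `test_descriptors` is assumed to be the descriptor array extracted from
    # the salient region of the test image, as in the extraction sketch above.
    test_bow = bow_histogram(test_descriptors, kmeans, k)   # k-dimensional BOW vector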
In the BOW model, an image is represented by a vector computed from the frequencies with which its feature words map onto the visual dictionary. For example, image j can be expressed as:

dj = (nj,0, nj,1, ..., nj,k−1) (6)

where k is the size of the visual dictionary and nj,i (i = 0, 1, ..., k−1) is the frequency with which image j maps to the i-th visual word; these frequencies are also called code words.
To emphasize the feature words that contribute most to the image representation, the tf-idf weighting method from information retrieval is used here to generate weighted BOW features. The idea of tf is: if a keyword occurs frequently in one article and rarely in other articles, the word has high discrimination and contributes strongly to classifying the article. The idea of idf is: if few files in the document database contain the word, its idf weight is larger, indicating that the word discriminates well. The tf-idf weight of word i in image j is:

wj,i = wtf,j,i × widf,j,i (i = 0, 1, ..., k−1) (7)
In formula (7), wtf,j,i is the contribution weight of the i-th word to classifying image j, and widf,j,i is the contribution weight of the i-th word, over the document database, to classifying image j. They are specifically computed as:

wtf,j,i = nj,i / nj (8)

In formula (8), nj,i is the number of times the i-th word occurs in image j, and nj is the total word count of image j, i.e.:

nj = nj,0 + nj,1 + … + nj,k−1 (9)

widf,j,i = log(N / (nd + 1)) (10)

In formula (10), N is the number of images in the training library and nd is the number of images containing word d. A word's discrimination in the library is inversely proportional to the frequency with which it appears across different images. To prevent division by zero, the denominator takes nd + 1.
The weighted BOW feature of image j is denoted:

bofj = wj,i × dj (i = 0, 1, ..., k−1) (11)
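A minimal NumPy sketch of the weighting in formulas (7)-(11) follows. Note that, following formula (11), the weight wj,i multiplies the raw frequency vector dj, and that N and nd are statistics of the training library (applied unchanged when weighting a test image). `train_bow` is the i × k matrix from the training sketch.

    import numpy as np

    def tfidf_bow(bow, n_docs_per_word, n_images):
        tf = bow / max(bow.sum(), 1)                    # w_tf: n_ji / n_j (formula 8)
        idf = np.log(n_images / (n_docs_per_word + 1))  # w_idf: log(N / (n_d + 1))
        return tf * idf * bow                           # bof_j = w_ji * d_j (formula 11)

    n_images = train_bow.shape[0]                       # N: images in the training library
    n_docs_per_word = (train_bow > 0).sum(axis=0)       # n_d per visual word
    train_bof = np.array([tfidf_bow(b, n_docs_per_word, n_images) for b in train_bow])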
In the bag-of-words model, images containing unequal numbers of features can all be expressed as vectors of a fixed dimension, so object recognition can be achieved by measuring the similarity between the test image and the images in the training library. The test vector is:

dt = (nt,0, nt,1, ..., nt,k−1)

The weighted BOW feature of a test image is denoted boft and is computed in the same way as for training images. Euclidean distance computes the absolute distance between two feature vectors and is directly tied to their concrete values; cosine distance judges similarity by the angle between two feature vectors. Compared with Euclidean distance, cosine distance emphasizes the difference of two feature vectors in direction rather than in distance or length. Cosine distance is usually adopted to measure differences between high-dimensional vectors, but it disregards the feature magnitudes, and this insensitivity can introduce error. Therefore, an adjusted cosine distance dis is used here to measure similarity: the mean of the feature values is subtracted from every dimension of the feature vector. The larger the adjusted cosine distance dis, the smaller the angle between the feature vectors, and the more similar the two images.
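A minimal sketch of the adjusted cosine similarity and the nearest-neighbor decision follows; `train_bof` and `train_labels` are assumed to come from the training sketches above.

    import numpy as np

    def adjusted_cosine(a, b):
        a = a - a.mean()                    # subtract the mean of the feature values
        b = b - b.mean()
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def recognize(test_bof, train_bof, train_labels):
        # The training image with the largest adjusted cosine dis is the match.
        scores = [adjusted_cosine(test_bof, t) for t in train_bof]
        return train_labels[int(np.argmax(scores))]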
The visual dictionary onto which the test process projects is the one established by the training process.
Embodiment three
As shown in Fig. 3, to find stable keypoints in multi-scale space, adjacent layers of the pyramid are first differenced to construct the difference-of-Gaussian (DoG) scale space:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)

The DoG difference pyramid is obtained from the Gaussian pyramid.
As shown in Fig. 4, the local extreme points detected in the DoG space serve as keypoints. To find the extreme points, each pixel is compared with its 8 neighbors at the same scale and the 9 × 2 corresponding points at the two adjacent scales, 26 points in total, to ensure that extreme points are detected in both scale space and the 2-D image space. A pixel that is larger than all 26 points, or smaller than all of them, is identified as a keypoint.
Embodiment four
As shown in Fig. 5, the gradient-histogram statistics method is used: with the extreme point as origin, the contributions that the pixels in a neighborhood make to the orientation of the extreme point are accumulated. The histogram has 36 bins, one per 10 degrees; the magnitude of each point in the neighborhood is added to the bin corresponding to its angle, and the height of a bin represents the accumulated gradient magnitude. The figure is a schematic of an extreme point's dominant orientation drawn with a 7-bin histogram; the dominant orientation of the neighborhood gradient histogram is the peak of the histogram.
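A minimal NumPy sketch of the 36-bin dominant-orientation computation follows; the neighborhood radius is an illustrative value, and the point (x, y) is assumed to lie at least radius + 1 pixels from the image border. `L` is a 2-D array of the scale-space level containing the extreme point.

    import numpy as np

    def dominant_orientation(L, y, x, radius=8):
        patch = L[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(float)
        dx = patch[1:-1, 2:] - patch[1:-1, :-2]        # L(x+1, y) - L(x-1, y)
        dy = patch[2:, 1:-1] - patch[:-2, 1:-1]        # L(x, y+1) - L(x, y-1)
        mag = np.sqrt(dx ** 2 + dy ** 2)               # gradient magnitude m(x, y)
        ang = np.degrees(np.arctan2(dy, dx)) % 360     # gradient direction theta(x, y)
        hist = np.zeros(36)                            # 36 bins, one per 10 degrees
        np.add.at(hist, (ang // 10).astype(int), mag)  # magnitude-weighted votes
        return np.argmax(hist) * 10                    # histogram peak -> dominant angle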
Embodiment five
As shown in Fig. 6, the left side shows an example SIFT descriptor formed from the 2 × 2 subregions of an 8 × 8 pixel neighborhood. Each small cell represents one pixel of the scale space around the keypoint; the arrow direction represents the pixel's gradient direction and the arrow length its magnitude. An 8-direction gradient-orientation histogram is computed in each 4 × 4 pixel window, and the accumulated value of each gradient direction forms one seed point. The right side shows a keypoint formed from 4 seed points, each produced from the orientation histogram of 4 × 4 pixels, giving a feature vector of 2 × 2 × 8 = 32 dimensions. In practice a 16 × 16 pixel neighborhood is used, forming 4 × 4 = 16 seed points and a feature vector of 4 × 4 × 8 = 128 dimensions.
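A minimal NumPy sketch of this descriptor geometry follows; `mag` and `ang` are assumed to be 16 × 16 arrays of gradient magnitudes and directions (in degrees) for the window around the keypoint, already rotated to its dominant orientation.

    import numpy as np

    def sift_descriptor(mag, ang):
        desc = np.zeros((4, 4, 8))                     # 4 x 4 cells, 8 bins each
        for cy in range(4):
            for cx in range(4):
                m = mag[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4]
                a = ang[4 * cy:4 * cy + 4, 4 * cx:4 * cx + 4]
                bins = (a % 360 // 45).astype(int)     # 8 direction bins per cell
                np.add.at(desc[cy, cx], bins.ravel(), m.ravel())
        return desc.ravel()                            # 4 x 4 x 8 = 128 dimensions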
For a better understanding of the present invention, it has been described in detail above in conjunction with specific embodiments, but this is not a limitation of the invention. Any simple modification made to the above embodiments according to the technical essence of the present invention still falls within the scope of the technical solution of the present invention. Each embodiment in this specification emphasizes its differences from the other embodiments; the same or similar parts of the embodiments can be understood with reference to one another. As for the system embodiments, since they essentially correspond to the method embodiments, their description is relatively brief, and the relevant parts may refer to the description of the method embodiments.
The method, device, and system of the present invention may be implemented in many ways, for example in software, hardware, firmware, or any combination of software, hardware, and firmware. The order described above for the steps of the method is illustrative only; the steps of the method of the present invention are not limited to that specific order unless otherwise stated. Furthermore, in some embodiments the present invention may be embodied as a program recorded on a recording medium, comprising machine-readable instructions for implementing the method according to the invention. Thus, the present invention also covers a recording medium storing a program for performing the method according to the invention.
The description of the present invention is given for the sake of example and explanation; it is not exhaustive, nor does it limit the invention to the forms disclosed. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments were selected and described to better illustrate the principles and practical applications of the invention, and to enable those of ordinary skill in the art to understand it and thus design various embodiments, with various modifications, suited to particular uses.

Claims (10)

1. An object recognition method based on a salient-region bag-of-words model, comprising the following steps:
Step 1: corner detection;
Step 2: locating the salient region of the image;
Step 3: SIFT feature extraction;
Step 4: image-region feature similarity comparison.
2. The object recognition method based on a salient-region bag-of-words model as claimed in claim 1, characterized in that the corners are Shi-Tomasi corners, computed from the rate of change of the gradient direction.
3. The object recognition method based on a salient-region bag-of-words model as claimed in claim 2, characterized in that the Shi-Tomasi corners are points where the image brightness changes sharply or the curvature is very large.
4. The object recognition method based on a salient-region bag-of-words model as claimed in claim 1, characterized in that step 2 converts locating the key region of the image into locating the region over which the corners are distributed.
5. The object recognition method based on a salient-region bag-of-words model as claimed in claim 4, characterized in that the localization method is: divide the image into m × n blocks, count the corners in each block, and record the corner count of each block in an m × n matrix; if the number of corners in a block is ≥ q, the contiguous concentrated area where those corners lie is considered the key region of the image, where q is a threshold on the per-block corner count used to filter out background areas containing only isolated or few corners.
6. The object recognition method based on a salient-region bag-of-words model as claimed in claim 1, characterized in that step 3 comprises DoG extreme-point extraction and feature-vector formation.
7. The object recognition method based on a salient-region bag-of-words model as claimed in claim 6, characterized in that the DoG extreme-point extraction method is to scale the original image, obtain a multi-scale image-space representation sequence, and perform feature extraction at the different resolutions.
8. The object recognition method based on a salient-region bag-of-words model as claimed in claim 7, characterized in that the scales include at least one of a large scale and a small scale.
9. The object recognition method based on a salient-region bag-of-words model as claimed in claim 8, characterized in that the large scale (low resolution) captures the overall appearance of the object.
10. The object recognition method based on a salient-region bag-of-words model as claimed in claim 9, characterized in that the small scale (high resolution) captures the fine details of the object.
CN201610921396.2A 2016-10-21 2016-10-21 Object recognition method based on a salient-region bag-of-words model Pending CN106557779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610921396.2A CN106557779A (en) 2016-10-21 2016-10-21 Object recognition method based on a salient-region bag-of-words model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610921396.2A CN106557779A (en) 2016-10-21 2016-10-21 Object recognition method based on a salient-region bag-of-words model

Publications (1)

Publication Number Publication Date
CN106557779A true CN106557779A (en) 2017-04-05

Family

ID=58443857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610921396.2A Pending CN106557779A (en) Object recognition method based on a salient-region bag-of-words model

Country Status (1)

Country Link
CN (1) CN106557779A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100169576A1 (en) * 2008-12-31 2010-07-01 Yurong Chen System and method for sift implementation and optimization
CN101840507A (en) * 2010-04-09 2010-09-22 江苏东大金智建筑智能化***工程有限公司 Target tracking method based on character feature invariant and graph theory clustering
CN102043960A (en) * 2010-12-03 2011-05-04 杭州淘淘搜科技有限公司 Image grey scale and gradient combining improved sift characteristic extracting method
CN102156888A (en) * 2011-04-27 2011-08-17 西安电子科技大学 Image sorting method based on local colors and distribution characteristics of characteristic points
CN102682132A (en) * 2012-05-18 2012-09-19 合一网络技术(北京)有限公司 Method and system for searching information based on word frequency, play amount and creation time
US20120301014A1 (en) * 2011-05-27 2012-11-29 Microsoft Corporation Learning to rank local interest points
CN102865859A (en) * 2012-09-21 2013-01-09 西北工业大学 Aviation sequence image position estimating method based on SURF (Speeded Up Robust Features)
CN103077512A (en) * 2012-10-18 2013-05-01 北京工业大学 Feature extraction and matching method and device for digital image based on PCA (principal component analysis)
CN103295240A (en) * 2013-06-26 2013-09-11 山东农业大学 Method for evaluating similarity of free-form surfaces
CN103530603A (en) * 2013-09-24 2014-01-22 杭州电子科技大学 Video abnormality detection method based on causal loop diagram model
CN104166853A (en) * 2014-04-24 2014-11-26 中国人民解放军海军航空工程学院 Method for quickly extracting regularized ship section from high resolution remote sensing image
CN105631892A (en) * 2016-02-23 2016-06-01 武汉大学 Aviation image building damage detection method based on shadow and texture characteristics
CN105631471A (en) * 2015-12-23 2016-06-01 西安电子科技大学 Aurora sequence classification method with fusion of single frame feature and dynamic texture model
CN105856230A (en) * 2016-05-06 2016-08-17 简燕梅 ORB key frame closed-loop detection SLAM method capable of improving consistency of position and pose of robot

Similar Documents

Publication Publication Date Title
Sun et al. PBNet: Part-based convolutional neural network for complex composite object detection in remote sensing imagery
Liao et al. Rotation-sensitive regression for oriented scene text detection
Mu et al. Discriminative local binary patterns for human detection in personal album
CN108319964B (en) Fire image recognition method based on mixed features and manifold learning
CN101976258B (en) Video semantic extraction method by combining object segmentation and feature weighing
CN103077512B (en) Based on the feature extracting and matching method of the digital picture that major component is analysed
JP5604256B2 (en) Human motion detection device and program thereof
CN101714254A (en) Registering control point extracting method combining multi-scale SIFT and area invariant moment features
CN102622607A (en) Remote sensing image classification method based on multi-feature fusion
CN110263712A (en) A kind of coarse-fine pedestrian detection method based on region candidate
CN103390164A (en) Object detection method based on depth image and implementing device thereof
CN104182973A (en) Image copying and pasting detection method based on circular description operator CSIFT (Colored scale invariant feature transform)
CN104574401A (en) Image registration method based on parallel line matching
CN102945374B (en) Method for automatically detecting civil aircraft in high-resolution remote sensing image
CN106682641A (en) Pedestrian identification method based on image with FHOG- LBPH feature
Li et al. Place recognition based on deep feature and adaptive weighting of similarity matrix
CN104881671A (en) High resolution remote sensing image local feature extraction method based on 2D-Gabor
CN103632149A (en) Face recognition method based on image feature analysis
CN110826575A (en) Underwater target identification method based on machine learning
Qi et al. Exploring illumination robust descriptors for human epithelial type 2 cell classification
CN107784263A (en) Based on the method for improving the Plane Rotation Face datection for accelerating robust features
CN105760828A (en) Visual sense based static gesture identification method
Su et al. Object detection in aerial images using a multiscale keypoint detection network
CN110097067B (en) Weak supervision fine-grained image classification method based on layer-feed feature transformation
CN105139013A (en) Object recognition method integrating shape features and interest points

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170405