CN101493891A - SIFT-based feature extraction and description method with mirror-flip invariance - Google Patents

SIFT-based feature extraction and description method with mirror-flip invariance

Info

Publication number
CN101493891A
CN101493891A
Authority
CN
China
Prior art keywords
image
gradient
bin
point
sift
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2009100679878A
Other languages
Chinese (zh)
Other versions
CN101493891B (en)
Inventor
操晓春
郭晓杰
张钢
曲彦龄
武琳
张炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN2009100679878A priority Critical patent/CN101493891B/en
Publication of CN101493891A publication Critical patent/CN101493891A/en
Application granted granted Critical
Publication of CN101493891B publication Critical patent/CN101493891B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/242Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing and relates to a SIFT-based feature extraction and description method with mirror-flip invariance. The method comprises the following steps: (1) a Gaussian kernel convolution is applied to the input image; (2) a difference-of-Gaussians process is applied to the image to detect its extreme points; (3) feature points are screened; (4) the feature point positions are accurately located; (5) a direction parameter is determined for each feature point; (6) the gradient magnitudes on the two sides bounded by the principal direction are summed separately; and (7) the pixel cells in a Gaussian weight window are organized, encoded, and normalized to generate the description data of the image. The method increases the robustness of feature extraction and description methods to mirror images and extends the applicable scope of computer vision.

Description

SIFT-based feature extraction and description method with mirror-flip invariance
Technical field
The invention belongs to the technical field of image processing and relates to an image feature extraction method.
Background technology
Computer technology is developing rapidly, and the applications of computer vision and image retrieval are spreading ever more widely while demonstrating their importance. Popular topics such as three-dimensional reconstruction, object recognition, camera calibration, and robot binocular navigation are all built on computer vision, so solving its open problems, or improving its imperfect parts, reasonably and effectively can give an enormous push to computing and to science in general. Computer vision is built on the idea of making computers simulate human (mammalian) vision to an intelligent degree; like image retrieval, it requires extracting and analyzing image features, so the definition of image features and the extraction scheme play a very important role.
There are now many solutions. Common gradient-based feature extraction and description methods include the Harris corner detector [1], SIFT [2], SURF [3], HOG [4], and GLOH [5]. Besides gradient-based methods there are others, such as contour-based extraction; since the MIFT proposed by the present invention is a gradient-based method, the others are not described further. The Harris corner detector extracts, at the image's own scale, feature points that are invariant to rotation and illumination. In fact it extracts not only corners, as its name suggests, but all feature points with significant gradients in multiple directions. Its limitation is considerable, however, because it is very sensitive to changes of image scale. To remove or weaken the influence of scale changes, Lowe proposed SIFT (Scale-Invariant Feature Transform), which solves the scaling problem and also guarantees rotation invariance, and even tolerates illumination changes, affine transformations, and occlusion to a certain degree. SURF is simply an accelerated version of SIFT; both adopt a support-region strategy, that is, a region centered at the feature point is designated, and the pixels in this region jointly determine the description of the feature point. The difference is that SIFT weights the pixels in the region according to their differing contributions to the feature point, whereas SURF adopts equal weights based on the integral image. HOG provides a gradient-based human detection method combined with an SVM (support vector machine). GLOH is another variant of SIFT: it organizes the support region with a circular strategy in order to strengthen the robustness and distinctiveness of the feature; the initial GLOH descriptor has 272 dimensions, but a PCA (principal component analysis) operation reduces the 272 dimensions to the same 128 as SIFT, improving matching efficiency without losing key information. All the above methods handle rotation, scale changes, and even illumination changes and image distortions such as affine transformations well, but almost none of them consider the mirror-image case, which is very common in real life, for example water-surface reflections, mirror images, or different viewing angles of symmetric objects.
References:
[1] C. Harris and M. J. Stephens, "A combined corner and edge detector", Alvey Vision Conference, vol. 20, pp. 147-152, 1988.
[2] D. G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, vol. 60, pp. 91-110, 2004.
[3] H. Bay, T. Tuytelaars and L. Van Gool, "SURF: Speeded Up Robust Features", European Conference on Computer Vision, pp. 404-417, 2006.
[4] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.
[5] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1615-1630, 2005.
Summary of the invention
The object of the present invention is to provide a method that solves the failure of the SIFT feature extraction method under the mirror-flipping phenomenon while retaining all of SIFT's advantages and performance; in other words, a feature extraction and description method that keeps the same description form before and after flipping, i.e., a feature extraction and description method with mirror-flip invariance. To this end, the present invention adopts the following technical scheme:
A SIFT-based feature extraction and description method with mirror-flip invariance, comprising the following steps:
Step 1: Convolve the input image I(x, y) with a Gaussian kernel, L(x, y, σ) = G(x, y, σ) * I(x, y), to obtain the multi-scale space representation L(x, y, σ), where G(x, y, σ) = (1/(2πσ²))·e^(−(x²+y²)/(2σ²)) and σ is the variance of the Gaussian normal distribution;
Step 2: Apply the difference-of-Gaussians process D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ) to detect the extreme points of the multi-scale representation L(x, y, σ);
Step 3: Screen the feature points using a threshold method and the Hessian matrix method;
Step 4: Accurately locate the feature point positions by fitting a three-dimensional quadratic function;
Step 5: Determine the direction parameter of each feature point from the gradient directions θ(x, y) and magnitudes m(x, y) of the pixels in its support neighborhood, where:
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²),
θ(x, y) = tan⁻¹((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))),
then accumulate the gradient magnitude and direction of every pixel in the whole neighborhood into a histogram, weighted by a Gaussian window centered at the feature point, and determine the principal direction;
Step 6: Sum the gradient magnitudes on the two sides bounded by the principal direction:
m_r = Σ_{k=1..(N_bin−2)/2} L_{(n_d − k + N_bin) % N_bin} and m_l = Σ_{k=1..(N_bin−2)/2} L_{(n_d + k + N_bin) % N_bin},
where N_bin is the total number of directions, n_d is the index of the principal direction, L_i is the gradient magnitude in direction i, and % denotes the modulo operation;
Step 7: If m_r > m_l, organize the pixel cells in the Gaussian weight window from top to bottom and from right to left, and encode the gradients in each cell clockwise starting from 0 degrees relative to the principal direction; otherwise, organize the pixel cells from top to bottom and from left to right, and encode the gradients in each cell counterclockwise starting from 0 degrees relative to the principal direction; then perform the normalization operation, thereby generating the description data of the image.
The present invention increases the robustness of feature extraction and description methods to the mirror-image problem and extends the applicable scope of computer vision. Existing feature extraction and description methods in computer vision do not consider or handle the mirror-image case: although methods such as the Harris corner detector, SIFT, SURF, and GLOH can remain stable to a certain degree under rotation, scale, and illumination changes, they are simply helpless under mirror imaging. The method proposed here for this situation successfully solves the mirror-image case while achieving performance close to SIFT for non-mirrored images. The feature extraction and description method proposed by the present invention is referred to here as MIFT; its comparison with SIFT under mirror imaging is shown in Fig. 1, where, with the threshold set to 0.60 in both cases, MIFT matches 258 feature point pairs and SIFT matches 12. Fig. 2 compares the matching results on several more complex images.
Description of drawings
Fig. 1: Matching results under mirror imaging. Fig. 1(a) is the MIFT matching result; Fig. 1(b) is the SIFT matching result.
Fig. 2: Comparison of MIFT and SIFT matching results. (a) MIFT matching result without mirror imaging. (b) SIFT matching result without mirror imaging. (c) MIFT matching result under mirror imaging. (d) SIFT matching result under mirror imaging.
Fig. 3: Representation of an image at three adjacent scales.
Fig. 4: Organization of the feature descriptor before and after image flipping. (a) A feature point and its support region in the unflipped image. (b) The same feature point and its support region in the flipped image. (c) The 8 gradients of the 14th cell in (b). (d) The 8 gradients of the 14th cell in (a). (e) The SIFT and MIFT feature descriptor in case (a). (f) The SIFT feature descriptor in case (b). (g) The MIFT feature descriptor in case (b).
Fig. 5: Feature point gradient information. In the figure, n_d is the index of the principal direction and % denotes the modulo operation.
Fig. 6: Flowchart of feature detection and feature description.
Fig. 7: Flowchart of feature matching.
Embodiment
The present invention first subjects the input image to scale changes, accomplished by Gaussian convolution, and searches the series of differently scaled images for extrema of the pixel grey value at every pixel. Not all extrema qualify as feature points, however, since a feature point must be sufficiently distinctive and robust; by setting appropriate thresholds on the difference of Gaussians (Difference of Gaussians) and on the Hessian matrix, candidate points with low contrast and with edge response are screened out, respectively. The extrema left after these two screening steps are the desired feature points; their precise coordinates and scale information are then obtained by three-dimensional quadratic fitting. The coordinates, scale, and related information of each feature point are retained to provide usable information for the later matching stage.
The part above can be regarded as the feature point detector; that is, it mainly finds the feature points of the image. What follows is how to organize the feature point information into a usable feature description serving the higher-level application, feature matching. The feature description part divides into assigning the direction parameter from gradient statistics and constructing the feature descriptor.
For the mirror-image problem, images divide into four classes: the original image, the horizontally flipped image (mirror image), the vertically flipped image (upside-down image), and the fully flipped image (flipped both horizontally and vertically). It is easy to verify that the fully flipped image equals the original image with its coordinate system rotated by 180 degrees, and likewise the vertically flipped and horizontally flipped images stand in the same relation as the original and the fully flipped image. Moreover, since the direction parameter assigned before feature description makes the description rotation invariant, the four cases reduce to two: the original image and the horizontally flipped image. A fixed relation exists between these two, and it can be divided into two levels. Every feature point in SIFT is influenced by the pixels of a region, a strategy that reduces the influence of noise and improves the robustness of the feature point. For the region corresponding to the same feature point, the column order of the region's pixel cells is reversed between the original and the horizontally flipped image while the row order is unchanged; this is the higher of the two levels. The other level is relatively microscopic: within each small pixel cell, by the correspondence between the two images, the 8 gradient directions in each cell satisfy (in the notation of step 5, with x′ denoting the mirrored column):
m_hor(x, y) = m_ori(x′, y),
θ_hor(x, y) = 180° − θ_ori(x′, y) (mod 360°),
where the subscript ori denotes the original image and the subscript hor denotes the horizontally flipped image. The coding strategy makes full use of the relations at both levels, thereby producing a mirror-flip invariant feature descriptor.
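The gradient relation above can be checked numerically; the following short NumPy sketch (an illustration, not part of the patent) confirms that horizontal flipping negates the x-gradient, preserves the y-gradient, and maps the orientation to 180° − θ (mod 360°):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((16, 16))
flp = img[:, ::-1]                         # horizontally flipped image
y, x = 8, 5
xf = img.shape[1] - 1 - x                  # mirrored column x'

gx,  gy  = img[y, x + 1] - img[y, x - 1], img[y + 1, x] - img[y - 1, x]
gxf, gyf = flp[y, xf + 1] - flp[y, xf - 1], flp[y + 1, xf] - flp[y - 1, xf]
assert np.isclose(gxf, -gx) and np.isclose(gyf, gy)   # x-gradient flips sign

theta  = np.degrees(np.arctan2(gy,  gx))  % 360
thetaf = np.degrees(np.arctan2(gyf, gxf)) % 360
assert np.isclose(thetaf, (180 - theta) % 360)        # orientation mirrors
```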
The present invention is described further below in conjunction with the drawings and an embodiment.
Step 1: Multi-scale space representation of the input image
Koenderink and Lindeberg proved that the Gaussian convolution kernel is the only linear convolution kernel that realizes scale changes. The two-dimensional Gaussian function is G(x, y, σ) = (1/(2πσ²))·e^(−(x²+y²)/(2σ²)), where σ is the variance of the Gaussian normal distribution. The representations of a two-dimensional image at different scales are then obtained by convolving the image with the Gaussian kernel: L(x, y, σ) = G(x, y, σ) * I(x, y).
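As a minimal sketch of this step (assuming NumPy/SciPy; the function name and the values of sigma0, k, and the number of scales are illustrative, not taken from the patent), the multi-scale representation can be built as follows:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_scale_space(img, sigma0=1.6, k=2 ** 0.5, n_scales=6):
    """Return L(x, y, sigma) for a geometric series of scales sigma0 * k^i."""
    img = img.astype(np.float64)
    # gaussian_filter performs the Gaussian-kernel convolution G(x, y, sigma) * I
    return [gaussian_filter(img, sigma0 * k ** i) for i in range(n_scales)]
```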
Step 2: Detect scale-space extrema
In the multi-scale images obtained in step 1, extrema are found by computing and comparing pixel by pixel over the whole scale space. Besides the corresponding pixels on the adjacent scales, the adjacent pixels within the same two-dimensional image are also compared, so that extrema are detected comprehensively. The present invention uses the DoG (Difference-of-Gaussians) operator to approximate the LoG (Laplacian-of-Gaussians) when detecting extrema; although the accuracy of DoG is slightly inferior to that of LoG, its computation speed exceeds it. The DoG operator is defined as:
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ).
Fig. 3 shows an image represented at three adjacent scales: the black cross is the pixel currently being examined, and the grey dots are all the pixels it must be compared with, 2 × 9 + 8 = 26 points in total.
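A sketch of the extremum test under the same assumptions: adjacent scale-space images are subtracted to form the DoG stack, and a pixel qualifies when it beats all 26 neighbours of Fig. 3 (8 in its own layer, 9 in each adjacent layer):

```python
import numpy as np

def dog_stack(scale_space):
    # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)
    return [b - a for a, b in zip(scale_space, scale_space[1:])]

def is_extremum(dog, s, y, x):
    # 3x3x3 cube around (scale s, row y, col x); interior points only
    cube = np.stack([layer[y - 1:y + 2, x - 1:x + 2] for layer in dog[s - 1:s + 2]])
    centre = dog[s][y, x]
    others = np.delete(cube.ravel(), 13)   # drop the centre itself
    return centre > others.max() or centre < others.min()
```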
Step 3: Screen feature points from the extrema
The extrema obtained from the two steps above form the candidate set of feature points, from which the feature points are screened; that is, not all extrema satisfy the requirements of a feature point. The set still contains low-contrast points and points with edge response, whose distinctiveness as features and stability in the image are not prominent enough, so two different strategies are adopted to weed these two classes out. First, in computing the difference of Gaussians (DoG), an appropriate threshold is set to effectively eliminate low-contrast extrema. Second, because the difference of Gaussians itself produces edge responses, the Hessian matrix method is adopted to filter out points with edge response (saddle points).
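A sketch of the two screening tests; the contrast threshold and edge ratio below are the values commonly used with SIFT, assumed here for illustration, since the patent only states that a DoG threshold and the Hessian matrix method are used:

```python
import numpy as np

def keep_keypoint(d, y, x, contrast_thr=0.03, edge_ratio=10.0):
    # d is one DoG layer; reject low-contrast extrema outright
    if abs(d[y, x]) < contrast_thr:
        return False
    # 2x2 spatial Hessian from finite differences
    dxx = d[y, x + 1] - 2 * d[y, x] + d[y, x - 1]
    dyy = d[y + 1, x] - 2 * d[y, x] + d[y - 1, x]
    dxy = (d[y + 1, x + 1] - d[y + 1, x - 1]
           - d[y - 1, x + 1] + d[y - 1, x - 1]) / 4.0
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    # saddle points (det <= 0) and edge responses fail the trace/det ratio test
    return det > 0 and tr * tr / det < (edge_ratio + 1) ** 2 / edge_ratio
```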
Step 4: Accurately locate the feature point positions
After the operations above the feature points are determined, but the scale conversions and the finite size of a pixel cell may introduce deviations into the feature point coordinates and scale information. To guarantee accurate feature point information, a three-dimensional quadratic fit is performed to approximate the information and obtain more precise coordinates and image scale information.
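A sketch of the quadratic refinement under the usual SIFT convention (an assumption; the patent does not spell out the fitting): a 3D quadratic is fitted around the extremum via finite differences, and its stationary point gives the sub-pixel/sub-scale offset, offset = −H⁻¹g:

```python
import numpy as np

def refine_offset(dog, s, y, x):
    # Finite-difference gradient g and Hessian H of the DoG stack at (s, y, x)
    D = lambda ds, dy, dx: dog[s + ds][y + dy, x + dx]
    g = 0.5 * np.array([D(1, 0, 0) - D(-1, 0, 0),
                        D(0, 1, 0) - D(0, -1, 0),
                        D(0, 0, 1) - D(0, 0, -1)])
    dss = D(1, 0, 0) - 2 * D(0, 0, 0) + D(-1, 0, 0)
    dyy = D(0, 1, 0) - 2 * D(0, 0, 0) + D(0, -1, 0)
    dxx = D(0, 0, 1) - 2 * D(0, 0, 0) + D(0, 0, -1)
    dsy = 0.25 * (D(1, 1, 0) - D(1, -1, 0) - D(-1, 1, 0) + D(-1, -1, 0))
    dsx = 0.25 * (D(1, 0, 1) - D(1, 0, -1) - D(-1, 0, 1) + D(-1, 0, -1))
    dyx = 0.25 * (D(0, 1, 1) - D(0, 1, -1) - D(0, -1, 1) + D(0, -1, -1))
    H = np.array([[dss, dsy, dsx], [dsy, dyy, dyx], [dsx, dyx, dxx]])
    return np.linalg.solve(H, -g)          # (dscale, dy, dx) sub-pixel offset
```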
Step 5: Determine the direction parameter
To give each feature point rotation invariance, a direction parameter is assigned to it according to the gradient directions and magnitudes of the pixels in its support neighborhood. The gradient magnitude m(x, y) and orientation angle θ(x, y) are computed from differences between pixels:
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²),
θ(x, y) = tan⁻¹((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))).
Then the statistics are collected in histogram form: the histogram covers 0-360 degrees with 10 degrees per bin, and the gradient magnitude and direction of every pixel in the whole neighborhood are accumulated, weighted by a Gaussian weight window (Gaussian weighted window) centered at the feature point. Among the 36 directions of the histogram, the one with the greatest strength is the principal direction. Because of the influence of noise or distortion, the image may change or warp slightly, and such changes may bias a feature point's principal direction; to alleviate or avoid the influence of these deviations on the direction parameter, the present invention, like SIFT, uses auxiliary directions. An auxiliary direction is defined as a direction whose strength exceeds 80% of the principal direction's strength, and there can be several. In fact, when the feature descriptors are generated, each auxiliary direction is regarded as equally important as the principal direction and given its own, independent descriptor.
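A sketch of the orientation assignment (the window radius and Gaussian weighting below are illustrative assumptions): gradients in a window around the feature point are accumulated into a 36-bin histogram, and the principal and auxiliary directions are read off:

```python
import numpy as np

def orientation_histogram(L, cy, cx, radius=8, sigma=4.0, n_bin=36):
    # The window must lie inside the image; 10-degree bins, Gaussian weights.
    hist = np.zeros(n_bin)
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            gx = L[y, x + 1] - L[y, x - 1]
            gy = L[y + 1, x] - L[y - 1, x]
            m = np.hypot(gx, gy)
            theta = np.degrees(np.arctan2(gy, gx)) % 360.0
            w = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
            hist[int(theta // (360.0 / n_bin)) % n_bin] += w * m
    principal = int(hist.argmax())
    # any bin stronger than 80% of the principal bin yields an extra descriptor
    auxiliary = [b for b in range(n_bin)
                 if b != principal and hist[b] > 0.8 * hist[principal]]
    return hist, principal, auxiliary
```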
Step 6: Encode the feature point descriptor
The feature descriptor adopted by the present invention is represented by a vector containing the gradient information of all pixels in the Gaussian weight window. The vector has 4 × 4 × 8 = 128 dimensions, a dimensionality proven successful by Lowe; MIFT likewise represents its descriptor by a 128-dimensional vector, although the dimensionality is not in principle fixed. Taking Fig. 4 as an example, the two levels are analyzed in turn: (a) shows a feature point and its support region in the unflipped image, (b) shows the same feature point and its support region in the image after flipping, and in both cases the direction parameter has been assigned.
First, the relatively macroscopic level is analyzed: the order of the 16 cells in the feature descriptor. After the principal direction is fixed, SIFT uses a fixed order, organizing the 16 cells column-major (row-major could equally be adopted); as shown in Fig. 4, the SIFT feature descriptor vector is then as in (e). For the image after flipping, corresponding to case (b), the row order of the 16 cells is unchanged but the column order is reversed, which gives the SIFT descriptor the organization shown in (f). That SIFT has good stability and robustness to rotation, scale, illumination, affine transformation, and so on is beyond doubt, but it is equally undeniable that SIFT is powerless against the mirror-image class of situations. The present invention therefore proposes a coding method that obtains a descriptor of a unique form before and after image flipping. Under the two different situations there are only two coding modes with column-priority coding: one proceeds from right to left, the other in the reverse order. Intuitively, the gradient magnitudes pointing left and right in the figure (dash-dot lines) could serve as the basis for choosing between the two modes, but using only the left- and right-pointing gradient information is rather sensitive to noise and similar influences, so the present invention instead adopts the strategy of summing all gradient magnitudes pointing to the same side. As shown in Fig. 5, this is abstracted into the formulas:
m_r = Σ_{k=1..(N_bin−2)/2} L_{(n_d − k + N_bin) % N_bin},
m_l = Σ_{k=1..(N_bin−2)/2} L_{(n_d + k + N_bin) % N_bin},
where N_bin is the total number of directions, here N_bin = 36, n_d is the index of the principal direction, L_i is the gradient magnitude in direction i, and % denotes the modulo operation. m_r and m_l are the sums over the lower-right dotted arrows and the upper-left dash-dot arrows in Fig. 5, respectively. Accordingly, the coding strategy is changed from the original fixed-order coding to one decided by the comparison of m_r and m_l. In theory, the method proposed by the present invention obtains the same descriptor before and after flipping, as shown in Fig. 4(g). Analogously to the principal/auxiliary direction principle, in order to reduce the influence of various noises and of changing illumination conditions and to strengthen the robustness of MIFT, if min{m_r, m_l} > τ·max{m_r, m_l} (where τ is a threshold, set here to 0.70), another descriptor is generated as well.
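A small sketch of the two side sums, taking `hist` to be the 36-bin orientation histogram and `n_d` the principal-direction index (variable names are illustrative):

```python
def side_sums(hist, n_d):
    # m_r and m_l per the formulas above, with k = 1 .. (N_bin - 2) / 2
    n_bin = len(hist)                      # N_bin = 36 in the patent
    m_r = sum(hist[(n_d - k + n_bin) % n_bin] for k in range(1, (n_bin - 2) // 2 + 1))
    m_l = sum(hist[(n_d + k + n_bin) % n_bin] for k in range(1, (n_bin - 2) // 2 + 1))
    return m_r, m_l

def needs_second_descriptor(m_r, m_l, tau=0.70):
    # a second descriptor is generated when the two sides are nearly balanced
    return min(m_r, m_l) > tau * max(m_r, m_l)
```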
The second level is relatively microscopic, being the relation between the gradients of each direction within every cell. As shown in Figs. 4(c) and (d), which correspond to the same part of the image after and before flipping respectively, the relation between the gradient directions is as specifically described above for (c) and (d). Using this special relation together with the analysis of the first level, the final feature descriptor is generated by the following strategy: compute and compare m_r and m_l, and take the side with the larger sum as determining the coding order. Referring to Fig. 4: in the case shown in (a), m_r < m_l, so the coding order of the 16 cells is from top to bottom and from left to right, and the 8 gradient directions in each cell, as shown in (d), are coded counterclockwise starting from A; in the case shown in (b), m_r > m_l, so the coding order of the 16 cells is from top to bottom and from right to left, and the 8 gradient directions in each cell, as shown in (c), are coded clockwise starting from A. The feature descriptors obtained in the two cases are identical; that is to say, this feature description solves the mirror-image class of problems. Of course, to increase the descriptor's stability under illumination changes, a final normalization operation is indispensable. The flow of feature detection and feature description is shown in Fig. 6.
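A sketch of the resulting encoding rule, assuming the 16 cells are held as a 4×4×8 array `cells[row][col][bin]` with bin 0 at 0 degrees relative to the principal direction (this layout is an assumption for illustration):

```python
import numpy as np

def encode_descriptor(cells, m_r, m_l):
    cells = np.asarray(cells, dtype=np.float64)   # shape (4, 4, 8)
    if m_r > m_l:
        # mirrored case: reverse the column order of the 16 cells, then walk
        # the 8 bins in the opposite rotational sense, keeping bin 0 first
        cells = cells[:, ::-1, :]
        cells = np.concatenate([cells[..., :1], cells[..., :0:-1]], axis=-1)
    vec = cells.reshape(-1)                       # 128-dimensional vector
    return vec / (np.linalg.norm(vec) + 1e-12)    # final normalization
```

With this rule an image and its mirror yield the same 128-dimensional vector, since the column flip and bin reversal exactly cancel the two-level reordering described above.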
After the input image has been processed and the feature points detected and described, the feature matching stage is indispensable, and it too must be designed carefully so as to match as many correct feature point pairs as possible while reducing false matches. The matching method adopted by SIFT takes the description vector of the current feature point to be matched in image 2, computes its inner product with the description vectors of all feature points in image 1, and obtains a group of values (the angles between the vectors) sorted in ascending order; the feature point pair corresponding to the smallest value, ranked first, is the candidate whose match is to be judged. If the ratio between the smallest value and the second smallest is below a certain threshold, the two feature points corresponding to the minimum are considered matched; otherwise they are not. Compared with the usual idea of setting a global threshold, this has a certain stability and rationality, because the various distortions and noise of an image all influence these values, so a fixed index cannot effectively and reasonably decide whether two points match.
Likewise, MIFT adopts a matching strategy similar to SIFT's, but improves on it; the improvement reduces, or even avoids, mistakenly discarding point pairs that should match. The inner product of two unit vectors is the cosine of the angle between them: the smaller the angle, the closer the two vectors and the higher the degree of matching. Because the same feature point may generate several feature descriptors, owing to the principal/auxiliary directions and to the similarity of m_r and m_l, the likelihood that several descriptors are highly similar increases. To reduce the influence of this situation on the matching result, the present invention improves SIFT's matching procedure by checking the descriptors' auxiliary information: if the two feature point pairs corresponding to the two values being compared have identical coordinates, scale, and other information, the pair corresponding to the larger value is skipped and the comparison proceeds with the pair corresponding to the next value, until the auxiliary information of the two pairs differs; then, if the ratio of the two values is below a certain threshold, the feature point pair corresponding to the minimum is matched, otherwise it is not. The feature matching flow is shown in Fig. 7.
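A sketch of the improved matching rule, assuming unit-length descriptor rows in `descs` and a parallel list `keypoints` of (x, y, scale) tuples (the names and the angle-based ratio test are illustrative):

```python
import numpy as np

def match(desc_q, descs, keypoints, ratio_thr=0.6):
    # angle = arccos(inner product); smaller angle = closer descriptors
    angles = np.arccos(np.clip(descs @ desc_q, -1.0, 1.0))
    order = np.argsort(angles)
    best = order[0]
    for cand in order[1:]:
        if keypoints[cand] == keypoints[best]:
            continue                       # extra descriptor of the same point
        if angles[best] < ratio_thr * angles[cand]:
            return best                    # index of the matched feature point
        return None                        # ratio test failed: no match
    return None
```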

Claims (1)

1. A SIFT-based feature extraction and description method with mirror-flip invariance, characterized in that it comprises the following steps:
Step 1: Convolve the input image I(x, y) with a Gaussian kernel, L(x, y, σ) = G(x, y, σ) * I(x, y), to obtain the multi-scale space representation L(x, y, σ), where G(x, y, σ) = (1/(2πσ²))·e^(−(x²+y²)/(2σ²)) and σ is the variance of the Gaussian normal distribution;
Step 2: Apply the difference-of-Gaussians process D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ) to detect the extreme points of the multi-scale representation L(x, y, σ);
Step 3: Screen the feature points using a threshold method and the Hessian matrix method;
Step 4: Accurately locate the feature point positions by fitting a three-dimensional quadratic function;
Step 5: Determine the direction parameter of each feature point from the gradient directions θ(x, y) and magnitudes m(x, y) of the pixels in its support neighborhood, where:
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²),
θ(x, y) = tan⁻¹((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))),
then accumulate the gradient magnitude and direction of every pixel in the whole neighborhood into a histogram, weighted by a Gaussian window centered at the feature point, and determine the principal direction;
Step 6: Sum the gradient magnitudes on the two sides bounded by the principal direction: m_r = Σ_{k=1..(N_bin−2)/2} L_{(n_d − k + N_bin) % N_bin} and m_l = Σ_{k=1..(N_bin−2)/2} L_{(n_d + k + N_bin) % N_bin}, where N_bin is the total number of directions, n_d is the index of the principal direction, L_i is the gradient magnitude in direction i, and % denotes the modulo operation;
Step 7: If m_r > m_l, organize the pixel cells in the Gaussian weight window from top to bottom and from right to left, and encode the gradients in each cell clockwise starting from 0 degrees relative to the principal direction; otherwise, organize the pixel cells from top to bottom and from left to right, and encode the gradients in each cell counterclockwise starting from 0 degrees relative to the principal direction; then perform the normalization operation, thereby generating the description data of the image.
CN2009100679878A 2009-02-27 2009-02-27 SIFT-based feature extraction and description method with mirror-flip invariance Expired - Fee Related CN101493891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100679878A CN101493891B (en) 2009-02-27 2009-02-27 SIFT-based feature extraction and description method with mirror-flip invariance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100679878A CN101493891B (en) 2009-02-27 2009-02-27 SIFT-based feature extraction and description method with mirror-flip invariance

Publications (2)

Publication Number Publication Date
CN101493891A true CN101493891A (en) 2009-07-29
CN101493891B CN101493891B (en) 2011-08-31

Family

ID=40924483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100679878A Expired - Fee Related CN101493891B (en) 2009-02-27 2009-02-27 SIFT-based feature extraction and description method with mirror-flip invariance

Country Status (1)

Country Link
CN (1) CN101493891B (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102054269B (en) * 2009-10-27 2012-09-05 华为技术有限公司 Method and device for detecting feature point of image
CN101794395A (en) * 2010-03-11 2010-08-04 合肥金诺数码科技股份有限公司 Image matching positioning method based on Sift algorithm
CN101937506A (en) * 2010-05-06 2011-01-05 复旦大学 Similar copying video detection method
CN101937506B (en) * 2010-05-06 2012-10-17 复旦大学 Similar copying video detection method
CN102375984A (en) * 2010-08-06 2012-03-14 夏普株式会社 Characteristic quantity calculating device, image connecting device, image retrieving device and characteristic quantity calculating method
CN102375984B (en) * 2010-08-06 2014-02-26 夏普株式会社 Characteristic quantity calculating device, image connecting device, image retrieving device and characteristic quantity calculating method
CN102004921A (en) * 2010-11-24 2011-04-06 上海电机学院 Target identification method based on image characteristic analysis
CN102043960A (en) * 2010-12-03 2011-05-04 杭州淘淘搜科技有限公司 Image grey scale and gradient combining improved sift characteristic extracting method
CN102043960B (en) * 2010-12-03 2012-12-26 杭州淘淘搜科技有限公司 Image grey scale and gradient combining improved sift characteristic extracting method
CN102074015A (en) * 2011-02-24 2011-05-25 哈尔滨工业大学 Two-dimensional image sequence based three-dimensional reconstruction method of target
US8576098B2 (en) 2011-03-07 2013-11-05 Industrial Technology Research Institute Device and method for compressing feature descriptor
CN102663762A (en) * 2012-04-25 2012-09-12 天津大学 Segmentation method of symmetrical organs in medical image
CN102663762B (en) * 2012-04-25 2015-12-09 天津大学 The dividing method of symmetrical organ in medical image
CN108520265B (en) * 2012-07-09 2021-09-07 新运环球有限公司 Method for converting image descriptors and related image processing device
CN108520265A (en) * 2012-07-09 2018-09-11 西斯维尔科技有限公司 Method for converting image descriptor and associated picture processing equipment
CN103034860A (en) * 2012-12-14 2013-04-10 南京思创信息技术有限公司 Scale-invariant feature transform (SIFT) based illegal building detection method
CN103208000B (en) * 2012-12-28 2015-10-21 青岛科技大学 Based on the Feature Points Extraction of local extremum fast search
CN103208000A (en) * 2012-12-28 2013-07-17 青岛科技大学 Method for extracting characteristic points based on fast searching of local extrema
CN106778771A (en) * 2016-11-22 2017-05-31 上海师范大学 A kind of new two-value SIFT descriptions and its image matching method
CN111309956B (en) * 2017-02-13 2022-06-24 哈尔滨理工大学 Image retrieval-oriented extraction method
CN111309956A (en) * 2017-02-13 2020-06-19 哈尔滨理工大学 Image retrieval-oriented extraction method
CN107169458A (en) * 2017-05-18 2017-09-15 深圳云天励飞技术有限公司 Data processing method, device and storage medium
CN109047026A (en) * 2018-08-02 2018-12-21 重庆科技学院 A kind of ore screening system and method
CN111429394A (en) * 2019-01-08 2020-07-17 阿里巴巴集团控股有限公司 Image-based detection method and device, electronic equipment and storage medium
CN111429394B (en) * 2019-01-08 2024-03-01 阿里巴巴集团控股有限公司 Image-based detection method and device, electronic equipment and storage medium
CN110969145B (en) * 2019-12-19 2020-08-28 珠海大横琴科技发展有限公司 Remote sensing image matching optimization method and device, electronic equipment and storage medium
CN110969145A (en) * 2019-12-19 2020-04-07 珠海大横琴科技发展有限公司 Remote sensing image matching optimization method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN101493891B (en) 2011-08-31

Similar Documents

Publication Publication Date Title
CN101493891B (en) SIFT-based feature extraction and description method with mirror-flip invariance
Xue et al. A fast detection method via region‐based fully convolutional neural networks for shield tunnel lining defects
Zalpour et al. A new approach for oil tank detection using deep learning features with control false alarm rate in high-resolution satellite imagery
Chen et al. Vehicle detection in satellite images by hybrid deep convolutional neural networks
Hassaballah et al. Image features detection, description and matching
CN101661618A (en) Method for extracting and describing image characteristics with turnover invariance
CN101630365B (en) Method for extracting and describing DAISY-based feature with mirror face turning invariance
CN102414720B (en) Characteristic quantity calculation element, characteristic quantity calculating method
Liu et al. Visual defect inspection of metal part surface via deformable convolution and concatenate feature pyramid neural networks
Xu et al. Fast vehicle and pedestrian detection using improved Mask R‐CNN
CN111695522A (en) In-plane rotation invariant face detection method and device and storage medium
CN102592281B (en) Image matching method
Li et al. Place recognition based on deep feature and adaptive weighting of similarity matrix
CN104809731A (en) Gradient binaryzation based rotation-invariant and scale-invariant scene matching method
Sahoo et al. Design and simulation of various edge detection techniques using Matlab Simulink
Cho et al. Semantic segmentation with low light images by modified CycleGAN-based image enhancement
CN105405138A (en) Water surface target tracking method based on saliency detection
Qi et al. Exploring illumination robust descriptors for human epithelial type 2 cell classification
CN103839066A (en) Feature extraction method based on biological vision
Liu et al. CAFFNet: channel attention and feature fusion network for multi-target traffic sign detection
CN105160305A (en) Finger multi-mode characteristic fusion method
Thakur et al. Hybrid deep learning and machine learning approach for passive image forensic
Wang et al. A target corner detection algorithm based on the fusion of FAST and harris
CN103336964A (en) SIFT image matching method based on module value difference mirror image invariant property
Ballan et al. Recognizing human actions by fusing spatio-temporal appearance and motion descriptors

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110831

Termination date: 20210227