CN105184320A - Non-negative sparse coding image classification method based on structural similarity - Google Patents

Non-negative sparse coding image classification method based on structural similarity

Info

Publication number
CN105184320A
CN105184320A (application CN201510566662.XA; granted as CN105184320B)
Authority
CN
China
Prior art keywords
matrix
code book
image
sparse coding
sift feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510566662.XA
Other languages
Chinese (zh)
Other versions
CN105184320B (en)
Inventor
石伟伟
王进军
龚怡宏
张世周
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201510566662.XA priority Critical patent/CN105184320B/en
Publication of CN105184320A publication Critical patent/CN105184320A/en
Application granted granted Critical
Publication of CN105184320B publication Critical patent/CN105184320B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a non-negative sparse coding image classification method based on structural similarity. The method comprises the following steps: dense SIFT features are extracted from all images in the image data set to be processed; a number of SIFT features are randomly selected for learning the codebook of the data set; a non-negative sparse coding model based on structural similarity is established; the codebook of the data set is solved from the randomly selected SIFT features; with the codebook fixed, all SIFT features are encoded; the codes of each image are integrated by spatial pyramid max pooling to obtain that image's feature vector; the data set is divided into a training set and a test set, and a classifier is trained on the pooled feature vectors of the training images and their corresponding labels; finally, for any image, its pooled feature vector is input into the trained classifier to obtain the predicted category of the image.

Description

Image classification method based on non-negative sparse coding with structural similarity
Technical field:
The invention belongs to the field of computer vision image classification, and specifically relates to an image classification method based on non-negative sparse coding with structural similarity.
Background technology:
A key function of the early stages of the biological visual system is to remove, as far as possible, the statistical redundancy of input stimuli. The responses of the primary visual cortex to external stimuli are sparse: only a minority of neurons are activated at a time, and the corresponding encoding is called sparse coding. In general terms, sparse coding expresses a signal as a combination of a set of basis vectors such that only a small number of bases are needed to reconstruct the signal. Sparse coding has been widely applied in computer vision and image signal processing, for example in signal reconstruction, signal denoising, image feature extraction and classification.
The structural similarity (SSIM) index defines structural information as the attributes that reflect the structure of a signal independently of brightness and contrast, and models distortion as a combination of three different factors: brightness, contrast and structure. The mean is used as the estimate of brightness, the standard deviation as the estimate of contrast, and the covariance as the measure of structural similarity.
Traditional sparse coding methods are based on reconstruction in the minimum mean square error sense: the sum of squared reconstruction errors is made as small as possible while the corresponding code is made as sparse as possible, i.e. as many elements of the code vector as possible are zero. Most current sparse-coding-based image classification methods use encoding models that minimize the sum of squared reconstruction errors, but the sum of squared errors as a distortion criterion does not match the visual characteristics of the human eye. Recent research shows that a major function of the human visual system is to extract structural information from the images and videos in the visual field; since the sum of squared errors does not adequately account for the visual characteristics of the human eye, traditional sparse coding reconstruction cannot properly evaluate the structural similarity between the reconstructed image and the original image.
Summary of the invention:
The object of the invention is to address the above deficiencies of the prior art by providing an image classification method based on non-negative sparse coding with structural similarity.
To achieve this object, the present invention adopts the following technical scheme:
An image classification method based on non-negative sparse coding with structural similarity, comprising the following steps:
1) densely extract SIFT features from every image in the image data set to be processed;
2) after SIFT features have been extracted from all images, randomly select 50,000 to 500,000 SIFT features for learning the codebook of the data set;
3) establish the non-negative sparse coding model based on structural similarity;
4) according to steps 2) and 3), solve the codebook of the data set from the randomly selected SIFT features;
5) once the codebook has been solved, fix it and encode all SIFT features;
6) integrate the codes of each image by spatial pyramid max pooling to obtain each image's feature vector;
7) divide the data set into a training set and a test set, and train a classifier with the pooled feature vectors of the training images and the corresponding image labels;
8) for any image, input its pooled feature vector into the trained classifier to obtain the predicted category of the image.
The present invention is further improved in that, in step 1), dense SIFT features are extracted from every image of the data set using pixel blocks of 16 to 32 pixels and a sliding step of 6 to 10 pixels.
The present invention is further improved in that, in step 3), the codebook is set as $A=[a_1,a_2,\ldots,a_k]$, where each column of A is a basis vector and the number of columns of the codebook is k; the sparse code of SIFT feature vector $x_i$ under codebook A is $s_i$, and the code matrix is defined as $S=[s_1,s_2,\ldots,s_n]$, so that each column of S is the structural-similarity sparse code of the corresponding SIFT feature. The objective function of the non-negative sparse coding model is:

$$\min_{S\ge 0,\;\|a_i\|=1\ \forall i} F(A,S)=\sum_{i=1}^{n}\bigl(1-\mathrm{SSIM}(x_i,As_i)\bigr)+\lambda\sum_{i=1}^{n}\sum_{j=1}^{k}s_{ji}$$

where $i=1,2,\ldots,n$, n is the number of randomly chosen SIFT features; $j=1,2,\ldots,k$, k is the number of columns of codebook A; and $s_{ji}$ is the j-th component of $s_i$.

Written in matrix form:

$$\min_{S\ge 0,\;\|a_i\|=1\ \forall i} F(A,S)=\sum_{i=1}^{n}\bigl(1-\mathrm{SSIM}(x_i,As_i)\bigr)+\lambda\|S\|_{m_1}$$

where $\|a_i\|=1$ means that the modulus (L2 norm) of each column of the codebook is 1; $\|S\|_{m_1}$ denotes the $m_1$ norm of matrix S, which equals the sum of the absolute values of all its elements;

$S\ge 0$ means that every element of the code matrix S is non-negative;

λ is the weight coefficient balancing structural distortion against code sparsity; the larger λ is, the sparser the corresponding code, and $0.05\le\lambda\le 0.5$;

SSIM(·) is the structural similarity objective function.
The present invention is further improved in that step 4) is implemented as follows:
401) take the 50,000 to 500,000 SIFT features chosen randomly in step 2), and fix the value of the weight coefficient λ;
402) initialize the codebook A: randomly initialize $A^{(0)}$, normalize the modulus of each column of A to 1, and set t = 1; $A^{(t)}$ denotes the t-th iterate of the codebook and $A^{(0)}$ its initial value;
403) randomly initialize the corresponding code matrix $S^{(0)}$, each element of the matrix being a random number between 0 and 1; $S^{(t)}$ denotes the t-th iterate of the code matrix S;
404) update the code matrix S by the multiplicative rule, in which: the symbol ← denotes assignment of the value on the right to the variable on the left; ⊙ and ⊘ denote the Hadamard (element-wise) product and division of matrices; the square-root function sqrt(·), applied to a matrix, takes the square root of each element; $B^{+}=\max(B,0)$ and $B^{-}=-\min(B,0)$, where max(·) and min(·) compare each element of a matrix with 0 and take the larger or smaller value accordingly; H is the all-ones matrix of the same order as S;
405) update the codebook A as follows:
$$A^{(t)}\leftarrow A^{(t-1)}-\sigma\,\nabla_A E$$
where σ is the gradient-descent step size for optimizing the codebook, $\sigma=s\cdot\max\{|A_{ij}|\}/\mathrm{mean}\{|(\nabla_A E)_{ij}|\}$ with s in the range 0.01 to 0.1; the numerator $\max\{|A_{ij}|\}$ is the largest absolute value of the elements of codebook A, and the denominator is the mean absolute value of the elements of the gradient matrix $\nabla_A E$; the superscript T denotes the transpose of a matrix or vector, ρ denotes the penalty coefficient, and F is the objective function of the non-negative sparse coding model;
406) if the relative change of the objective value between two successive iterations is less than $10^{-6}$, or a predetermined number of iterations has been reached, stop and output the codebook matrix A; otherwise set t = t + 1 and go to step 403).
The present invention is further improved in that $\nabla_A E=\frac{\partial F}{\partial A}+4\rho A(D-I)$, where $D=\mathrm{diag}(a_1^{\mathrm T}a_1,a_2^{\mathrm T}a_2,\ldots,a_k^{\mathrm T}a_k)$, i.e. D is a diagonal matrix whose diagonal elements are $a_j^{\mathrm T}a_j$, and I is the identity matrix of the same order as D.
The present invention is further improved in that, in step 5), the codebook in the objective function is fixed, and the structural-similarity non-negative sparse code s of a SIFT feature x is calculated according to the model.
Compared with the prior art, the present invention has the following advantages:
By incorporating the structural similarity index into the sparse coding method, the present invention changes the traditional coding scheme and improves the quality of the codes, making the coding better match the human visual system. The codes are then integrated by spatial pyramid max pooling to obtain the feature vector of each image, and the pooled feature vectors are applied to image classification.
The present invention removes the reconstruction distortion metric based on minimum mean square error used in traditional sparse coding and instead uses structural similarity as the measure of reconstruction distortion, proposing a non-negative sparse coding model based on structural similarity. The method can encode any vectorized feature and is a coding method that conforms to the visual characteristics of the human eye.
The present invention densely extracts SIFT features from each image, applies structural-similarity non-negative sparse coding to the SIFT features, obtains the feature vector of the whole image by spatial pyramid max pooling of the codes, and uses the resulting feature vector for image classification.
Brief description of the drawings:
Fig. 1 is the flow chart of the image classification method based on non-negative sparse coding with structural similarity of the present invention.
Fig. 2 is a schematic diagram of spatial pyramid max pooling.
Detailed description of the embodiments:
The present invention is described in further detail below with reference to the drawings and embodiments.
The present invention seeks the corresponding sparse code from the perspective of structural similarity: structural similarity is introduced as an important measure of how well information is preserved, and a non-negativity constraint is added to the sparse code, giving the non-negative sparse coding model based on structural similarity. Non-negativity is required of the codes because non-negative codes have better stability in applications.
Given signals x and y, $x,y\in R^N$, structural similarity is defined as follows:

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{x,y}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}$$

where $x_i$ is the i-th component of signal x and $y_i$ is the i-th component of signal y; $\mu_x$ and $\mu_y$ are the means of x and y; $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y, $\sigma_x=\bigl(\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_x)^2\bigr)^{1/2}$, $\sigma_y=\bigl(\frac{1}{N-1}\sum_{i=1}^{N}(y_i-\mu_y)^2\bigr)^{1/2}$; $\sigma_{x,y}=\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)$ is the covariance of x and y; and $0<C_1,C_2\ll 1$ are two very small positive constants introduced to keep the denominator away from zero. The closer the structural similarity is to 1, the closer the two features are. Therefore $1-\mathrm{SSIM}(x,y)$ can be used as a measure of feature distortion.
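As a hedged illustration, the SSIM definition above can be computed for vector signals as follows; the constants C1 and C2 are only required to be very small positive numbers, so the values used here are placeholder assumptions:

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=1e-4):
    """Structural similarity of two signals x, y in R^N.
    c1, c2 are placeholder values; the text only requires 0 < C1, C2 << 1."""
    n = x.size
    mx, my = x.mean(), y.mean()
    # Unbiased 1/(N-1) estimates, matching the definitions above.
    vx = ((x - mx) ** 2).sum() / (n - 1)
    vy = ((y - my) ** 2).sum() / (n - 1)
    cov = ((x - mx) * (y - my)).sum() / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

With this, 1 - ssim(x, y) serves as the feature distortion measure; a signal compared with itself gives a similarity of exactly 1.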
As shown in Fig. 1, the image classification method based on non-negative sparse coding with structural similarity of the present invention comprises the following steps:
1): For each picture of the data set, densely extract SIFT features according to a pixel block of a certain size (for example image blocks of 16 × 16 pixels) and a predetermined horizontal and vertical sliding step (for example a sliding step of 6 pixels).
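A minimal sketch of the dense sampling grid implied by step 1), assuming the example block size of 16 pixels and step of 6 pixels; the SIFT descriptor itself is not re-implemented here, only the enumeration of patch positions:

```python
def dense_grid(height, width, patch=16, step=6):
    """Enumerate (y, x) top-left corners of the dense patch grid from
    which SIFT descriptors would be extracted."""
    ys = range(0, height - patch + 1, step)
    xs = range(0, width - patch + 1, step)
    return [(y, x) for y in ys for x in xs]
```

For a 32 × 32 image this yields a 3 × 3 grid of overlapping 16-pixel blocks.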
2): From all extracted SIFT features, randomly choose n features (for example n = 200,000); the selected features form a matrix, denoted $X=[x_1,x_2,\ldots,x_n]$, where each column $x_i\in R^{p\times 1}$ ($i=1,2,\ldots,n$) is a SIFT feature vector and p is the dimension of the extracted SIFT features (for a 128-dimensional SIFT feature vector, p = 128). The SIFT features selected here are used to learn the codebook of the data set.
3): Let the codebook be $A=[a_1,a_2,\ldots,a_k]$, where each column of A is a basis vector and the number of columns of the codebook is k; the sparse code of SIFT feature vector $x_i$ ($i=1,2,\ldots,n$) under codebook A is $s_i$ ($i=1,2,\ldots,n$), and the code matrix is defined as $S=[s_1,s_2,\ldots,s_n]$. The objective function of the non-negative sparse coding model is:

$$\min_{S\ge 0,\;\|a_i\|=1\ \forall i} F(A,S)=\sum_{i=1}^{n}\bigl(1-\mathrm{SSIM}(x_i,As_i)\bigr)+\lambda\sum_{i=1}^{n}\sum_{j=1}^{k}s_{ji}$$

where $i=1,2,\ldots,n$, n is the number of randomly chosen SIFT features; $j=1,2,\ldots,k$, k is the number of columns of the codebook; and $s_{ji}$ is the j-th component of $s_i$.

Written in matrix form:

$$\min_{S\ge 0,\;\|a_i\|=1\ \forall i} F(A,S)=\sum_{i=1}^{n}\bigl(1-\mathrm{SSIM}(x_i,As_i)\bigr)+\lambda\|S\|_{m_1}$$

where $\|a_i\|=1$ means that the modulus (L2 norm) of each column of the codebook is 1 (the L2 norm of a vector equals the square root of the sum of the squares of its elements); requiring each basis in the codebook to have modulus 1 prevents trivial solutions. $\|S\|_{m_1}$ denotes the $m_1$ norm of matrix S, which equals the sum of the absolute values of all its elements, and $S\ge 0$ means that every element of S is non-negative. The first term of the objective ensures that the reconstructed feature is as structurally similar as possible to the original feature: it is the measure of reconstruction distortion. In traditional sparse coding models, reconstruction distortion is measured by the sum of squared errors, whereas the present invention measures it by the structural similarity index. The second term of the objective ensures the sparsity of the code. λ balances the weight of the two terms: the larger λ is, the sparser the corresponding code; during implementation λ is a positive constant whose value can be tuned for different data sets. The present invention seeks the optimal codebook and the corresponding structural-similarity sparse codes that minimize this objective function.
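A sketch of evaluating the objective F(A, S) for given codebook and code matrices might look as follows; the SSIM constants C1, C2 are placeholder assumptions (the text only requires them to be tiny positive numbers), and this is illustrative rather than the patent's implementation:

```python
import numpy as np

def objective(A, S, X, lam, c1=1e-4, c2=1e-4):
    """F(A, S) = sum_i (1 - SSIM(x_i, A s_i)) + lam * ||S||_m1."""
    def ssim(x, y):
        n = x.size
        mx, my = x.mean(), y.mean()
        vx = ((x - mx) ** 2).sum() / (n - 1)
        vy = ((y - my) ** 2).sum() / (n - 1)
        cov = ((x - mx) * (y - my)).sum() / (n - 1)
        return ((2 * mx * my + c1) * (2 * cov + c2)) / \
               ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    recon = A @ S                    # column i is the reconstruction A s_i
    distortion = sum(1.0 - ssim(X[:, i], recon[:, i])
                     for i in range(X.shape[1]))
    return distortion + lam * np.abs(S).sum()   # m1 norm = sum of |elements|
```

When a feature is reconstructed exactly, its distortion term vanishes and only the sparsity penalty remains.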
4): Using the SIFT features selected in step 2) and the objective function of step 3), solve for the optimal codebook of the image data set to be processed. The objective of the structural-similarity non-negative sparse coding model is optimized by an alternating optimization strategy, as follows:
First step: form the matrix X from the selected SIFT features; fix the value of the weight coefficient λ.
Second step: initialize the codebook A.
Third step: fix A and optimize S, using a convex non-negative matrix optimization method.
Fourth step: fix S and optimize A, using gradient descent.
Repeat the third and fourth steps until the algorithm converges.
When the objective function converges, the resulting codebook is the codebook of the data set. Convergence can be judged by either of two criteria: (a) the relative change of the objective value between two successive iterations is less than $10^{-6}$; (b) the relative change of the F-norm of the difference between two successive codebooks is less than $10^{-6}$. On convergence, A is the required codebook. Here the F-norm of a matrix equals the square root of the sum of the squares of all its elements.
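The two stopping criteria can be sketched as follows; reading criterion (b) as the Frobenius norm of the codebook difference relative to the previous codebook's norm is an assumption about the intended normalization:

```python
import numpy as np

def converged(f_prev, f_curr, A_prev, A_curr, tol=1e-6):
    """(a) relative change of the objective value, or
    (b) relative Frobenius-norm change of the codebook; either suffices."""
    rel_f = abs(f_curr - f_prev) / max(abs(f_prev), 1e-12)
    rel_a = np.linalg.norm(A_curr - A_prev) / max(np.linalg.norm(A_prev), 1e-12)
    return rel_f < tol or rel_a < tol
```

In the alternating loop this check would run after each pair of S- and A-updates.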
The concrete analysis and solution process is as follows:

From $F(A,S)=\sum_{i=1}^{n}\bigl(1-\mathrm{SSIM}(x_i,As_i)\bigr)+\lambda\|S\|_{m_1}$ one obtains

$$\frac{\partial F}{\partial S}=\nabla_S F=-B+\lambda H$$

where H is the all-ones matrix of the same order as S (i.e. with the same number of rows and columns).

Because the constraint of the objective function requires the modulus of each column of the codebook to be 1, the constrained problem in the codebook is converted into an unconstrained one by the penalty function method. That is, let

$$E(A,S)=F(A,S)+\rho\sum_{j=1}^{k}\bigl(a_j^{\mathrm T}a_j-1\bigr)^2$$

where the superscript T denotes the transpose of a matrix or vector, and ρ denotes the penalty coefficient, ρ = 1 to 1000. Then

$$\frac{\partial E}{\partial A}=\nabla_A E=\frac{\partial F}{\partial A}+4\rho A(D-I)$$

where $D=\mathrm{diag}(a_1^{\mathrm T}a_1,a_2^{\mathrm T}a_2,\ldots,a_k^{\mathrm T}a_k)$, i.e. D is a diagonal matrix whose diagonal elements are $a_j^{\mathrm T}a_j$, and I is the identity matrix of the same order as D.

Let

$$\sigma=s\,\frac{\max\{|A_{ij}|\}}{\mathrm{mean}\{|(\nabla_A E)_{ij}|\}}$$

be the gradient-descent step size for optimizing the codebook, where s is in the range 0.01 to 0.1, generally s = 0.05; the numerator $\max\{|A_{ij}|\}$ is the largest absolute value of the elements of the codebook, and the denominator is the mean absolute value of the elements of the gradient matrix $\nabla_A E$.
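The penalty-gradient term and the step-size rule above can be sketched directly; the function names are illustrative:

```python
import numpy as np

def penalty_grad(A, rho):
    """4*rho*A(D - I): gradient of the penalty rho * sum_j (a_j^T a_j - 1)^2."""
    D = np.diag((A * A).sum(axis=0))        # D = diag(a_j^T a_j)
    return 4.0 * rho * A @ (D - np.eye(A.shape[1]))

def step_size(A, grad_E, s=0.05):
    """sigma = s * max|A_ij| / mean|(grad_A E)_ij|, with s in [0.01, 0.1]."""
    return s * np.abs(A).max() / np.abs(grad_E).mean()
```

Note that when every column of A already has unit norm, D = I and the penalty gradient vanishes, as the constraint intends.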
Optimization algorithm:
Step 1: randomly initialize the codebook $A^{(0)}$, normalize the modulus of each column of the codebook to 1, and set t = 1; $A^{(t)}$ denotes the t-th iterate of the codebook and $A^{(0)}$ its initial value.
Step 2: randomly initialize the corresponding code matrix $S^{(0)}$ (each element of the matrix is a random number between 0 and 1); $S^{(t)}$ denotes the t-th iterate of S.
Step 3: update the code matrix S by the multiplicative rule.
Step 4: update the codebook A:
$$A^{(t)}\leftarrow A^{(t-1)}-\sigma\,\nabla_A E$$
Step 5: if the algorithm has converged, or a predetermined number of iterations has been reached, stop and output A and S; otherwise set t = t + 1 and go to Step 3.
Note: here the symbol ← denotes assignment of the value on the right to the variable on the left; ⊙ and ⊘ denote the Hadamard (element-wise) product and division of matrices (each element is operated on separately); the square-root function sqrt(·), applied to a matrix, takes the square root of each element; for any matrix M, $M^{+}=\max(M,0)$ and $M^{-}=-\min(M,0)$, where max(·) and min(·) compare each element of the matrix with 0 and take the larger or smaller value accordingly.
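The elementwise notation of the note above (M+, M-, and the gradient step of Step 4) can be sketched as:

```python
import numpy as np

def split_pos_neg(M):
    """M+ = max(M, 0) and M- = -min(M, 0), so that M = M+ - M-
    and both parts are element-wise non-negative."""
    return np.maximum(M, 0.0), -np.minimum(M, 0.0)

def codebook_step(A, grad_E, sigma):
    """Step 4 of the algorithm: A <- A - sigma * grad_A E."""
    return A - sigma * grad_E
```

The positive/negative split is what lets a multiplicative update keep the code matrix non-negative while still using a signed gradient.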
5): Fix the codebook, then compute the corresponding sparse code of every SIFT feature of every image in the data set according to the structural-similarity non-negative sparse coding model. That is, with the codebook fixed in the objective function, the structural-similarity non-negative sparse code s of a SIFT feature x is calculated by the same method used in step 4) to optimize the code matrix S with the codebook A fixed.
6): For every image of the image data set to be processed, integrate the sparse codes of all its SIFT features by spatial pyramid max pooling (SPM max pooling) to obtain a high-dimensional feature vector; this high-dimensional vector is the feature vector of the image used for classification and other visual tasks.
As shown in Fig. 2, spatial pyramid max pooling divides the original picture into grids, generally of 1 × 1, 2 × 2 and 4 × 4 cells; each cell can be regarded as a larger image block. Within each image block, the sparse codes of all its SIFT features are max-pooled dimension by dimension: in each dimension, the pooling result is the largest absolute value of that dimension over all the codes. Pooling each cell yields the feature vector of that image block; splicing together the pooled vectors of all cells gives the high-dimensional feature vector of the whole picture. This procedure is called spatial pyramid max pooling, and its schematic diagram is shown in Fig. 2 of the description.
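The pooling procedure of step 6) can be sketched as follows, assuming feature locations are given as (y, x) coordinates; with the 1 × 1, 2 × 2 and 4 × 4 grids named above, a k-dimensional code yields a 21k-dimensional image vector:

```python
import numpy as np

def spm_max_pool(codes, positions, height, width, levels=(1, 2, 4)):
    """Spatial pyramid max pooling: partition the image into g x g grids,
    max-pool |code| per dimension inside each cell, and concatenate.
    codes: (k, m) array, one column per local feature;
    positions: (m, 2) array of (y, x) feature locations."""
    k, m = codes.shape
    pooled = []
    for g in levels:
        cell = np.zeros((k, g * g))
        for i in range(m):
            y, x = positions[i]
            r = min(int(y * g / height), g - 1)     # row of the g x g grid
            c = min(int(x * g / width), g - 1)      # column of the grid
            idx = r * g + c
            cell[:, idx] = np.maximum(cell[:, idx], np.abs(codes[:, i]))
        pooled.append(cell.ravel())
    return np.concatenate(pooled)
```

Each finer level localizes the codes more precisely, so the concatenated vector encodes both what appears in the image and roughly where.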
7): In each class of the image data set to be processed, randomly select several pictures, or designate some images in advance, as the training set, and train a classifier with the spatial-pyramid-max-pooled features of the training set images and their corresponding labels. Once the classifier has been trained, its parameters are determined; generally an SVM classifier is used.
8): Input the spatial-pyramid-max-pooled feature vector of a test-set picture into the classifier to obtain the corresponding predicted class label.
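The text names an SVM as the usual classifier; as a self-contained stand-in, a one-vs-rest ridge-regression linear classifier over the pooled feature vectors can be sketched (this substitutes for, and is not, the patent's SVM):

```python
import numpy as np

def train_linear(X, y, n_classes, reg=1e-3):
    """One-vs-rest linear classifier fitted by ridge regression to +/-1
    targets. X: (n_samples, d) pooled feature vectors; y: integer labels."""
    n, d = X.shape
    T = -np.ones((n, n_classes))
    T[np.arange(n), y] = 1.0                      # +/-1 one-vs-rest targets
    W = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ T)
    return W

def predict(W, X):
    """Predicted class = argmax of the linear scores."""
    return np.argmax(X @ W, axis=1)
```

A linear SVM trained on the same pooled vectors would replace `train_linear` without changing the surrounding pipeline.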

Claims (6)

1. An image classification method based on non-negative sparse coding with structural similarity, characterized by comprising the following steps:
1) densely extracting SIFT features from every image in the image data set to be processed;
2) after SIFT features have been extracted from all images, randomly selecting 50,000 to 500,000 SIFT features for learning the codebook of the data set;
3) establishing the non-negative sparse coding model based on structural similarity;
4) according to steps 2) and 3), solving the codebook of the data set from the randomly selected SIFT features;
5) once the codebook has been solved, fixing it and encoding all SIFT features;
6) integrating the codes of each image by spatial pyramid max pooling to obtain each image's feature vector;
7) dividing the data set into a training set and a test set, and training a classifier with the pooled feature vectors of the training images and the corresponding image labels;
8) for any image, inputting its pooled feature vector into the trained classifier to obtain the predicted category of the image.
2. The image classification method based on non-negative sparse coding with structural similarity according to claim 1, characterized in that, in step 1), dense SIFT features are extracted from every image of the data set using pixel blocks of 16 to 32 pixels and a sliding step of 6 to 10 pixels.
3. The image classification method based on non-negative sparse coding with structural similarity according to claim 1, characterized in that, in step 3), the codebook is $A=[a_1,a_2,\ldots,a_k]$, where each column of A is a basis vector and the number of columns of the codebook is k; the sparse code of SIFT feature vector $x_i$ under codebook A is $s_i$; the code matrix is defined as $S=[s_1,s_2,\ldots,s_n]$, so that each column of S is the structural-similarity sparse code of the corresponding SIFT feature; the objective function of the non-negative sparse coding model is:

$$\min_{S\ge 0,\;\|a_i\|=1\ \forall i} F(A,S)=\sum_{i=1}^{n}\bigl(1-\mathrm{SSIM}(x_i,As_i)\bigr)+\lambda\sum_{i=1}^{n}\sum_{j=1}^{k}s_{ji}$$

where $i=1,2,\ldots,n$, n is the number of randomly chosen SIFT features; $j=1,2,\ldots,k$, k is the number of columns of codebook A; and $s_{ji}$ is the j-th component of $s_i$;

written in matrix form:

$$\min_{S\ge 0,\;\|a_i\|=1\ \forall i} F(A,S)=\sum_{i=1}^{n}\bigl(1-\mathrm{SSIM}(x_i,As_i)\bigr)+\lambda\|S\|_{m_1}$$

where $\|a_i\|=1$ means that the modulus (L2 norm) of each column of the codebook is 1; $\|S\|_{m_1}$ denotes the $m_1$ norm of matrix S, which equals the sum of the absolute values of all its elements;

$S\ge 0$ means that every element of the code matrix S is non-negative;

λ is the weight coefficient balancing structural distortion against code sparsity; the larger λ is, the sparser the corresponding code, and $0.05\le\lambda\le 0.5$;

SSIM(·) is the structural similarity objective function.
4. The image classification method based on non-negative sparse coding with structural similarity according to claim 3, characterized in that step 4) is implemented as follows:
401) take the 50,000 to 500,000 SIFT features chosen randomly in step 2), and fix the value of the weight coefficient λ;
402) initialize the codebook A: randomly initialize $A^{(0)}$, normalize the modulus of each column of A to 1, and set t = 1; $A^{(t)}$ denotes the t-th iterate of the codebook and $A^{(0)}$ its initial value;
403) randomly initialize the corresponding code matrix $S^{(0)}$, each element of the matrix being a random number between 0 and 1; $S^{(t)}$ denotes the t-th iterate of the code matrix S;
404) update the code matrix S by the multiplicative rule, in which: the symbol ← denotes assignment of the value on the right to the variable on the left; ⊙ and ⊘ denote the Hadamard (element-wise) product and division of matrices; the square-root function sqrt(·), applied to a matrix, takes the square root of each element; $B^{+}=\max(B,0)$ and $B^{-}=-\min(B,0)$, where max(·) and min(·) compare each element of a matrix with 0 and take the larger or smaller value accordingly; $B=[b_1,b_2,\ldots,b_n]$; H is the all-ones matrix of the same order as S;
405) update the codebook A as follows:
$$A^{(t)}\leftarrow A^{(t-1)}-\sigma\,\nabla_A E$$
where σ is the gradient-descent step size for optimizing the codebook, $\sigma=s\cdot\max\{|A_{ij}|\}/\mathrm{mean}\{|(\nabla_A E)_{ij}|\}$ with s in the range 0.01 to 0.1; the numerator $\max\{|A_{ij}|\}$ is the largest absolute value of the elements of codebook A, and the denominator is the mean absolute value of the elements of the gradient matrix $\nabla_A E$; the superscript T denotes the transpose of a matrix or vector, ρ denotes the penalty coefficient, and F is the objective function of the non-negative sparse coding model;
406) if the relative change of the objective value between two successive iterations is less than $10^{-6}$, or a predetermined number of iterations has been reached, stop and output the codebook matrix A; otherwise set t = t + 1 and go to step 403).
5. The image classification method based on non-negative sparse coding with structural similarity according to claim 4, characterized in that $\nabla_A E=\frac{\partial F}{\partial A}+4\rho A(D-I)$, where $D=\mathrm{diag}(a_1^{\mathrm T}a_1,a_2^{\mathrm T}a_2,\ldots,a_k^{\mathrm T}a_k)$, i.e. D is a diagonal matrix whose diagonal elements are $a_j^{\mathrm T}a_j$, and I is the identity matrix of the same order as D.
6. The image classification method based on non-negative sparse coding with structural similarity according to claim 4, characterized in that, in step 5), the codebook in the objective function is fixed, and the structural-similarity non-negative sparse code s of a SIFT feature x is calculated according to the model.
CN201510566662.XA 2015-09-08 2015-09-08 The image classification method of non-negative sparse coding based on structural similarity Active CN105184320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510566662.XA CN105184320B (en) 2015-09-08 2015-09-08 The image classification method of non-negative sparse coding based on structural similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510566662.XA CN105184320B (en) 2015-09-08 2015-09-08 The image classification method of non-negative sparse coding based on structural similarity

Publications (2)

Publication Number Publication Date
CN105184320A true CN105184320A (en) 2015-12-23
CN105184320B CN105184320B (en) 2019-01-15

Family

ID=54906384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510566662.XA Active CN105184320B (en) 2015-09-08 2015-09-08 The image classification method of non-negative sparse coding based on structural similarity

Country Status (1)

Country Link
CN (1) CN105184320B (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020647A (en) * 2013-01-08 2013-04-03 Xidian University Image classification method based on hierarchical SIFT (scale-invariant feature transform) features and sparse coding


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIANCHAO YANG ET AL: "Linear spatial pyramid matching using sparse coding for image classification", Computer Vision and Pattern Recognition *
LI ZHIQING: "Sparse coding model based on structural similarity", Journal of Software *
LI QIANQIAN: "Research on image classification based on improved non-negative sparse coding", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096658A (en) * 2016-06-16 2016-11-09 North China University of Science and Technology Aerial image classification method based on unsupervised deep spatial feature coding
CN106096658B (en) * 2016-06-16 2019-05-24 North China University of Science and Technology Aerial image classification method based on unsupervised deep spatial feature coding
CN106408018A (en) * 2016-09-13 2017-02-15 Dalian University of Technology Image classification method based on amplitude-frequency feature sparse filtering
CN106408018B (en) * 2016-09-13 2019-05-14 Dalian University of Technology Image classification method based on amplitude-frequency feature sparse filtering
CN116842030A (en) * 2023-09-01 2023-10-03 广州尚航信息科技股份有限公司 Data synchronous updating method and system for a server
CN116842030B (en) * 2023-09-01 2023-11-17 广州尚航信息科技股份有限公司 Data synchronous updating method and system for a server

Also Published As

Publication number Publication date
CN105184320B (en) 2019-01-15

Similar Documents

Publication Publication Date Title
CN110414377B (en) Remote sensing image scene classification method based on scale attention network
Yang et al. Learning face age progression: A pyramid architecture of gans
Wang et al. Modality and component aware feature fusion for RGB-D scene classification
JP6244059B2 (en) Face image verification method and face image verification system based on reference image
CN109840560B (en) Image classification method based on clustering in capsule network
CN107977932A (en) It is a kind of based on can differentiate attribute constraint generation confrontation network face image super-resolution reconstruction method
CN104463209A (en) Method for recognizing digital code on PCB based on BP neural network
CN109670576B (en) Multi-scale visual attention image description method
CN106203487A (en) A kind of image classification method based on Multiple Kernel Learning Multiple Classifier Fusion and device
CN106920243A (en) The ceramic material part method for sequence image segmentation of improved full convolutional neural networks
CN108764195A (en) Handwriting model training method, hand-written character recognizing method, device, equipment and medium
CN106529447A (en) Small-sample face recognition method
CN106485259B (en) A kind of image classification method based on high constraint high dispersive principal component analysis network
CN105701502A (en) Image automatic marking method based on Monte Carlo data balance
CN105574548A (en) Hyperspectral data dimensionality-reduction method based on sparse and low-rank representation graph
CN104484886B (en) A kind of dividing method and device of MR images
CN106250811B (en) Unconstrained face identification method based on HOG feature rarefaction representation
CN107729312A (en) More granularity segmenting methods and system based on sequence labelling modeling
CN106529586A (en) Image classification method based on supplemented text characteristic
CN108446589A (en) Face identification method based on low-rank decomposition and auxiliary dictionary under complex environment
CN105631469A (en) Bird image recognition method by multilayer sparse coding features
CN105868711B (en) Sparse low-rank-based human behavior identification method
CN109978074A (en) Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning
CN105184320A (en) Non-negative sparse coding image classification method based on structural similarity
CN105260736A (en) Fast image feature representing method based on normalized nonnegative sparse encoder

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant