CN108596195A - Scene recognition method based on sparse coding feature extraction - Google Patents

Scene recognition method based on sparse coding feature extraction

Info

Publication number
CN108596195A
CN108596195A (application CN201810435125.5A)
Authority
CN
China
Prior art keywords
sample image set
scene
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810435125.5A
Other languages
Chinese (zh)
Other versions
CN108596195B (en)
Inventor
曾伟波
苏江文
郑耀松
吕君玉
林吓强
陈铠
Current Assignee
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd and Fujian Yirong Information Technology Co Ltd
Priority to CN201810435125.5A
Publication of CN108596195A
Application granted
Publication of CN108596195B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/513Sparse representations


Abstract

The present invention relates to the technical field of image recognition, and in particular to a scene recognition method based on sparse coding feature extraction. The method includes the steps of: preprocessing a sample image set collected in advance for training; extracting the feature expression vectors of the sample image set; feeding the feature expression vectors and their corresponding class labels into a linear classifier to build a linear scene classifier; preprocessing the sample image set to be recognized; extracting the feature expression vectors of the sample image set to be recognized; and feeding these feature expression vectors into the linear scene classifier for recognition, obtaining the class labels of the scenes to which the sample images belong. Sparse coding preserves the main information of an image while reducing its dimensionality, and is strongly robust to noise and occlusion.

Description

Scene recognition method based on sparse coding feature extraction
Technical field
The present invention relates to the technical field of image recognition, and in particular to a scene recognition method based on sparse coding feature extraction.
Background technology
Scene recognition refers to identifying the scene depicted in a picture from content shared by similar scene images, such as common color features. Its purpose is to mine the scene characteristics of an image by imitating human perception, so as to automatically identify the scene to which the image belongs. During scene recognition the whole image is judged as a single entity; no specific target is involved, because specific targets can serve only as one basis for judging the scene class and are not necessarily perfectly correlated with it. Scene recognition is a fundamental preprocessing step in computer vision and robotics, and plays an important role in intelligent computing fields such as content-based image retrieval, pattern recognition and machine learning.
In recent years, scene recognition research has made considerable progress, and many methods of modeling scene classes have emerged. According to how the scene class is modeled, existing scene recognition methods fall into four classes:
(1) scene recognition method based on global characteristics
Scene recognition methods based on global features mostly describe the scene through global visual features of the image, such as color, texture and shape, and have been applied successfully to outdoor scene recognition. By comparison, color features give better recognition results under changes of scale, viewing angle and image rotation, while texture and shape features correspond to the structural and directional information of the image, to which the human visual system is especially sensitive; texture and shape features therefore agree better with human visual perception. However, global-feature methods usually need to scan every pixel of the image and do not account for the spatial relationships between pixels, so their real-time performance and generality are poor.
(2) scene recognition method based on target
A locality can be accurately identified by a series of highly representative targets around it. Based on this principle, most target-based scene recognition methods pick out the scene corresponding to an image from the results of target recognition within it. Such methods therefore have to go through stages such as image segmentation, multi-feature combination and target recognition. When the target to be recognized is far from the viewpoint, it is likely to be hidden among background information of little segmentation value and discarded already in the segmentation stage, so that target recognition fails. In addition, to reduce the complexity of a concrete scene, a group of targets that can represent the scene must be chosen, and the reliable, stable selection of such representative targets becomes another bottleneck restricting target-based scene recognition.
(3) scene recognition method based on region
In view of the limitations of target-based scene recognition, some researchers use segmented regions instead of representative targets, and combine features according to the structural relations between these regions to form a scene signature. The key to such methods is obtaining a reliable region segmentation algorithm. Many feature representations of the region information are possible: local and global cues can be combined by extracting global statistical features inside each region; a region can be characterized by its local invariant features; or the region information can be characterized with a bag-of-words model.
(4) scene recognition method based on bionical feature
Regarding the real-time performance and efficiency of scene recognition, there remains a gap between the best current computer vision systems and the visual systems of humans and other animals that cannot yet be bridged. Inspired by the superior scene recognition ability of humans and animals, scene recognition methods based on bionic features have emerged, which achieve recognition by simulating the processing mechanism of the biological visual cortex. The basic idea is to study a particular biological vision mechanism, or a class of biological vision characteristics, and through careful analysis establish an effective computational model that yields satisfactory results. For example, methods based on the human visual attention selection mechanism treat the image regions that readily attract attention as priority processing objects; this selective mechanism can greatly improve the efficiency with which a scene recognition method processes, analyzes and recognizes visual information.
Existing scene recognition faces several difficulties: the same scene changes dynamically; pictures of the same scene vary among themselves; images of different classes may share many similarities; images of different scenes may overlap; and classification performance depends heavily on the accuracy of the class labels of the training scene images. All of these lower the accuracy of scene classification and recognition.
Invention content
For this reason, a scene recognition method based on sparse coding feature extraction needs to be provided, to solve the problem of low accuracy in scene classification and recognition.
To achieve the above object, the inventors provide a scene recognition method based on sparse coding feature extraction, with the following specific technical scheme:
A scene recognition method based on sparse coding feature extraction, including the steps of: preprocessing a sample image set collected in advance for training; extracting the feature expression vectors of the preprocessed sample image set; feeding the feature expression vectors of the sample image set and their corresponding class labels into a linear classifier, performing parameter learning on the linear classifier to obtain its optimal parameters, and building a linear scene classifier from the optimal parameters; preprocessing the sample image set to be recognized; extracting the feature expression vectors of the preprocessed sample image set to be recognized; and feeding these feature expression vectors into the linear scene classifier for recognition, obtaining the class labels of the scenes to which the sample images belong.
Further, the preprocessing includes: image contrast normalization and Gamma correction.
Further, "extracting the feature expression vectors of the preprocessed sample image set" includes: extracting the low-level features of the preprocessed sample image set by multi-scale SIFT feature fusion, i.e. using neighborhoods of several scale sizes around each pixel and extracting the SIFT key points of the image within each neighborhood; solving the sparse representations of the SIFT key points; and forming the feature expression vector of the preprocessed sample image set using a spatial pyramid strategy and max-pooling.
Further, the step of "solving the sparse representations of the SIFT key points" includes: solving the sparse representations of the SIFT key points by locality-constrained linear coding.
Further, the step of "forming the feature expression vector of the preprocessed sample image set using a spatial pyramid strategy and max-pooling" includes: dividing the image into 1 × 1, 1 × 4 and 4 × 1 local regions; within each local region, using max-pooling to compute the histogram of the coded features, forming the feature expression of that region; and concatenating the feature expressions of all regions to form the feature expression vector of the preprocessed sample image set.
Further, the step of "performing parameter learning on the linear classifier to obtain its optimal parameters" includes: computing the weight parameters of the linear classifier by the least-squares method, and obtaining the optimal parameters of the linear classifier by cross-validation.
Further, "image contrast normalization" includes the steps of: converting the image from the RGB color space to the YUV color space, and applying global and local contrast normalization in the YUV color space. The global and local contrast normalization operates only on the Y channel; the other two channels remain unchanged. Global normalization brings the image pixel values close to the image pixel mean, and local normalization reinforces edges.
Further, "extracting the feature expression vectors of the preprocessed sample image set to be recognized" includes: extracting the low-level features of the preprocessed sample image set to be recognized by multi-scale SIFT feature fusion, i.e. using neighborhoods of several scale sizes around each pixel and extracting the SIFT key points of the image within each neighborhood; solving the sparse representations of the SIFT key points; and forming the feature expression vector of the preprocessed sample image set using a spatial pyramid strategy and max-pooling.
The beneficial effects of the invention are as follows:
1. The method of the invention performs scene recognition based on global features: the entire scene image is judged as a whole, and no specific target is involved. When extracting the low-level features of the sample image set, multi-scale SIFT feature fusion increases the number of SIFT key points and also captures more of the local detail of the image.
2. Sparse coding preserves the main information of the image while reducing its dimensionality, and is strongly robust to noise and occlusion. Combining the low-level sparse coding feature expression with max-pooling reduces the complexity of the upper-layer classifier model and speeds up classifier training. Sparse coding is also a nonlinear feature mapping, and using such a mapping effectively improves the subsequent classification performance.
3. The preprocessing combines contrast normalization with Gamma correction, which significantly mitigates the influence of local shadows and illumination changes in the image.
4. The sparse coding uses locality-constrained linear coding, whose sparse representation of the signal can be obtained analytically: a closed-form solution is obtained directly, without iteration, which improves the efficiency of solving the sparse code.
5. Using a linear classifier as the scene classifier reduces the model complexity, speeds up classifier training, and at the same time lowers the risk of over-fitting.
Description of the drawings
Fig. 1 is a flow chart of the scene recognition method based on sparse coding feature extraction described in the detailed embodiment;
Fig. 2 is a schematic diagram of extracting the feature expression vector of the preprocessed sample image set described in the detailed embodiment;
Fig. 3 is a schematic diagram of solving the sparse representations of the SIFT key points with sparse coding described in the detailed embodiment;
Fig. 4 is a schematic diagram of the computation of the sparse representation described in the detailed embodiment;
Fig. 5 is a schematic diagram of dividing the image into several local regions with the spatial pyramid strategy described in the detailed embodiment.
Specific implementation mode
To explain in detail the technical content, structural features, objects and effects of the technical solution, a detailed description is given below in conjunction with specific embodiments and the accompanying drawings.
First, some terms used in this embodiment are explained:
SIFT: the Scale-Invariant Feature Transform (Scale-invariant feature transform, SIFT), a descriptor used in the field of image processing. The descriptor is scale-invariant and can detect key points in an image; it is a local feature descriptor.
Sparse coding (Sparse Coding): an artificial neural network method that simulates the receptive fields of simple cells in the primary visual cortex (V1) of the mammalian visual system. The method has locality in space, directionality, and band selectivity in the frequency domain, and is an adaptive method for image statistics.
Referring to Fig. 1, in the present embodiment the sample image set used for training should satisfy at least the following conditions: 1. the training images of the same scene class should include as many different modes as possible; 2. the training images of different scene classes should be as balanced in number as possible. The purpose is to better learn the parameters of the subsequent scene classifier and thus improve the accuracy of scene classification and recognition.
Step S101: preprocess the sample image set collected in advance for training. The following approach can be used. The preprocessing includes image contrast normalization and Gamma correction. "Image contrast normalization" includes the steps of: converting the image from the RGB color space to the YUV color space, and applying global and local contrast normalization in the YUV color space; the normalization operates only on the Y channel, the other two channels remaining unchanged. Global normalization brings the pixel values close to the image pixel mean, and local normalization reinforces edges. Combining contrast normalization with Gamma correction significantly mitigates the influence of local shadows and illumination changes in the image.
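As an illustration only, the preprocessing described above (YUV conversion, global and local contrast normalization on the Y channel, then Gamma correction) could be sketched as follows. The BT.601 conversion weights, the box-filter radius standing in for a Gaussian, and the gamma value are assumptions, not parameters specified by the patent:

```python
import numpy as np

def rgb_to_yuv(img):
    # BT.601 RGB -> YUV conversion; img is H x W x 3, float values
    m = np.array([[ 0.299,    0.587,    0.114  ],
                  [-0.14713, -0.28886,  0.436  ],
                  [ 0.615,   -0.51499, -0.10001]])
    return img @ m.T

def box_blur(x, r):
    # separable box filter (a cheap stand-in for a Gaussian) via an integral image
    k = 2 * r + 1
    p = np.pad(x, r, mode="edge")
    s = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    return (s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]) / k ** 2

def preprocess(img_rgb, gamma=0.5, r=3, eps=1e-3):
    yuv = rgb_to_yuv(img_rgb.astype(float))
    y = yuv[..., 0]
    y = (y - y.mean()) / (y.std() + eps)           # global contrast normalization
    y = y - box_blur(y, r)                         # local: subtract the local mean ...
    y = y / (np.sqrt(box_blur(y ** 2, r)) + eps)   # ... and divide by local energy (edge emphasis)
    y = (y - y.min()) / (y.max() - y.min() + eps)  # rescale to [0, 1]
    yuv[..., 0] = y ** gamma                       # Gamma correction on the Y channel only
    return yuv                                     # U and V channels untouched
```

A usage sketch would be `preprocess(image)` on a float RGB array; only the Y plane of the result differs from a plain color-space conversion.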
Step S102: extract the feature expression vectors of the preprocessed sample image set. The following approach can be used: extract the low-level features of the preprocessed sample image set by multi-scale SIFT feature fusion, with scale factors [4, 6, 8, 9, 10]; that is, use neighborhoods of several scales, such as 4×4 and 6×6, around each pixel of the sample image, and extract the SIFT key points of the image within each neighborhood. When extracting the low-level features of the sample image set, multi-scale SIFT feature fusion increases the number of SIFT key points, yields more image information, and also captures more of the local detail of the image.
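A minimal sketch of the multi-scale extraction idea — one gradient-orientation histogram per neighborhood scale at each grid point — is given below. This is a simplified, SIFT-like descriptor for illustration only (real SIFT uses a 4×4 grid of 8-bin histograms per key point); the grid step, scale list and bin count are assumptions:

```python
import numpy as np

def multiscale_descriptors(y, step=8, scales=(4, 6, 8), nbins=8):
    # y: preprocessed single-channel image (H x W)
    gx = np.gradient(y, axis=1)
    gy = np.gradient(y, axis=0)
    mag = np.hypot(gx, gy)                                  # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)             # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * nbins).astype(int), nbins - 1)
    H, W = y.shape
    margin = max(scales)
    keypoints, descs = [], []
    for i in range(margin, H - margin, step):
        for j in range(margin, W - margin, step):
            for s in scales:                                # one descriptor per scale
                b = bins[i - s:i + s, j - s:j + s].ravel()
                m = mag[i - s:i + s, j - s:j + s].ravel()
                h = np.bincount(b, weights=m, minlength=nbins)
                keypoints.append((i, j, s))
                descs.append(h / (np.linalg.norm(h) + 1e-9))
    return np.array(keypoints), np.array(descs)
```

Using several scales at the same grid point multiplies the number of key points, which is exactly the effect the description attributes to multi-scale fusion.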
Further, after the SIFT key points of the different regions are obtained, their sparse representations are solved; in this embodiment, the sparse representations of the SIFT key points are solved by locality-constrained linear coding. With locality-constrained linear coding the sparse representation of the signal can be obtained analytically — a closed-form solution is obtained directly, without iteration — which improves the efficiency of solving the sparse code. Constructing the feature expression of the image with sparse coding retains the main information of the image to the greatest extent while effectively reducing its complexity, and is strongly robust to noise and occlusion. Sparse coding is a nonlinear feature mapping, and using such a mapping effectively improves the subsequent classification performance.
After the sparse representations of the SIFT key points have been solved, i.e. after the sparse coding of the SIFT key points is complete, the image is divided into 1 × 1, 1 × 4 and 4 × 1 local regions. Within each local region, max-pooling is used to compute the histogram of the SIFT key-point codes, forming the feature expression of that region; concatenating the feature expressions of all regions forms the feature expression vector of the preprocessed sample image set.
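The 1 × 1, 1 × 4 and 4 × 1 pooling just described could be sketched as follows (nine regions in total, giving a vector of dimension 9m). Whether the four-way splits run horizontally or vertically is not stated in the text, so the orientation chosen below is an assumption:

```python
import numpy as np

def pyramid_max_pool(keypoints, codes, h, w):
    # keypoints: K x 2 array of (row, col) positions; codes: K x m sparse codes
    m = codes.shape[1]
    rows, cols = keypoints[:, 0], keypoints[:, 1]

    def pool(mask):
        # max-pooled "histogram" of the codes falling inside one region
        return codes[mask].max(axis=0) if mask.any() else np.zeros(m)

    feats = [pool(np.ones(len(codes), dtype=bool))]            # 1 x 1: whole image
    for k in range(4):                                         # 1 x 4: vertical strips
        feats.append(pool((cols >= k * w / 4) & (cols < (k + 1) * w / 4)))
    for k in range(4):                                         # 4 x 1: horizontal strips
        feats.append(pool((rows >= k * h / 4) & (rows < (k + 1) * h / 4)))
    return np.concatenate(feats)                               # Z in R^{9m}
```

The first m entries of the result are the image-wide max over all codes; the remaining eight blocks add coarse spatial layout.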
Step S103: feed the feature expression vectors of the sample image set and their corresponding class labels into a linear classifier, perform parameter learning on the linear classifier to obtain its optimal parameters, and build the linear scene classifier from the optimal parameters. The following approach can be used: compute the weight parameters of the linear classifier by the least-squares method, obtain the optimal parameters of the linear classifier by cross-validation, and build the linear scene classifier from them. Using a linear classifier as the scene classifier reduces the model complexity, speeds up classifier training, and lowers the risk of over-fitting.
Step S104: preprocess the sample image set to be recognized. The following approach can be used. The preprocessing includes image contrast normalization and Gamma correction. "Image contrast normalization" includes the steps of: converting the image from the RGB color space to the YUV color space, and applying global and local contrast normalization in the YUV color space; the normalization operates only on the Y channel, the other two channels remaining unchanged. Global normalization brings the pixel values close to the image pixel mean, and local normalization reinforces edges. Combining contrast normalization with Gamma correction significantly mitigates the influence of local shadows and illumination changes in the image.
Step S105: extract the feature expression vectors of the preprocessed sample image set to be recognized. The following approach can be used: extract the low-level features of the preprocessed sample image set by multi-scale SIFT feature fusion, with scale factors [4, 6, 8, 9, 10]; that is, use neighborhoods of several scales, such as 4×4 and 6×6, around each pixel of the sample image, and extract the SIFT key points of the image within each neighborhood. Multi-scale SIFT feature fusion increases the number of SIFT key points, yields more image information, and also captures more of the local detail of the image.
Further, after the SIFT key points of the different regions are obtained, their sparse representations are solved; in this embodiment, the sparse representations of the SIFT key points are solved by locality-constrained linear coding. With locality-constrained linear coding the sparse representation of the signal can be obtained analytically — a closed-form solution is obtained directly, without iteration — which improves the efficiency of solving the sparse code. Constructing the feature expression of the image with sparse coding retains the main information of the image while effectively reducing its complexity, and is strongly robust to noise and occlusion. Sparse coding is a nonlinear feature mapping, and using such a mapping effectively improves the subsequent classification performance.
After the sparse representations of the SIFT key points have been solved, i.e. after the sparse coding of the SIFT key points is complete, the image is divided into 1 × 1, 1 × 4 and 4 × 1 local regions. Within each local region, max-pooling is used to compute the histogram of the key-point codes, forming the feature expression of that region; concatenating the feature expressions of all regions forms the feature expression vector of the preprocessed sample image set.
Step S106: feed the feature expression vectors of the preprocessed sample image set to be recognized into the linear scene classifier for recognition, obtaining the class labels of the scenes to which the sample images belong. That is: the feature expression vector of each current sample image to be recognized is input into the trained linear scene classifier model, and the class of the sample is determined from the classifier outputs; the class corresponds to the output node with the highest value.
The present invention performs scene recognition based on global features: the entire scene image is judged as a whole, with no specific target involved. When extracting the low-level features of the sample image set, multi-scale SIFT feature fusion increases the number of SIFT key points and also captures more local detail. Sparse coding preserves the main information of the image while reducing its dimensionality, and is strongly robust to noise and occlusion. Combining the low-level sparse coding feature expression with max-pooling reduces the complexity of the upper-layer classifier model and speeds up classifier training. Sparse coding is a nonlinear feature mapping that effectively improves the subsequent classification performance. The locality-constrained linear coding used here yields the sparse representation of the signal analytically — a closed-form solution is obtained directly, without iteration — improving the efficiency of sparse coding. Finally, using a linear classifier as the scene classifier reduces model complexity, speeds up classifier training, and lowers the risk of over-fitting.
Referring to Fig. 2 to Fig. 5, steps S102 and S105 are implemented as follows.
The sparse representation of a SIFT key point is solved by sparse coding. Let x ∈ R^n be the input signal (i.e. a SIFT key point) and B = [b_1, b_2, ..., b_m] ∈ R^(n×m) be the dictionary; sparse coding then solves the following L1-norm problem

    min_c ||x − Bc||_2^2 + λ ||c||_1

to obtain the sparse representation c ∈ R^m of the input signal.
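For contrast with the closed-form coding the embodiment adopts, this generic L1 problem is usually solved iteratively. A minimal iterative shrinkage-thresholding (ISTA) sketch, with an assumed step size and iteration count, might look like:

```python
import numpy as np

def sparse_code_ista(x, B, lam=0.1, n_iter=300):
    # Solves  min_c ||x - B c||_2^2 + lam * ||c||_1  by iterative
    # shrinkage-thresholding (ISTA); B is n x m with atoms as columns.
    L = 2.0 * np.linalg.norm(B, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(B.shape[1])
    for _ in range(n_iter):
        g = 2.0 * B.T @ (B @ c - x)              # gradient of the quadratic term
        u = c - g / L                            # gradient step
        c = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)  # soft-threshold
    return c
```

Each step costs a full matrix-vector product and many such steps are needed — exactly the iteration cost that the analytic locality-constrained coding described in the text avoids.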
Further, the sparse representation is computed as follows.
A locality constraint is used to project each input signal onto its local coordinate system. For an input vector x = {x_1, x_2, ..., x_n}, the K nearest atoms of x are found within a local range, and x is then reconstructed from these K nearest atoms. Assigning a weight to each atom achieves the selection of the K nearest neighbors, giving the objective function

    min_c ||x − Bc||_2^2 + λ ||d ⊙ c||_2^2,  subject to 1^T c = 1,

where λ is the regularization coefficient, ⊙ denotes element-wise multiplication, d is the vector of weights assigned to the dictionary atoms, and 1 is the all-ones vector. Let

    d = exp(dist(x, B) / σ),

where dist(x, B) = [dist(x, b_1), ..., dist(x, b_m)]^T, dist(x, b_j) denotes the Euclidean distance between the signal x and atom b_j, j = 1, 2, ..., m, and the parameter σ controls the decay rate of the weights. The objective function yields the code c analytically:

    c̃ = (C + λ diag(d)) \ 1,   C = (B − x 1^T)^T (B − x 1^T),

where C denotes the covariance matrix of the data; the code c̃ is then normalized, c = c̃ / (1^T c̃), to obtain the final code c.
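Under the column-atom convention used above (B ∈ R^(n×m)), the analytic solution could be sketched as follows; the σ and λ values, and the max-shift used to keep the exponential numerically stable, are assumptions:

```python
import numpy as np

def llc_code(x, B, lam=1e-4, sigma=1.0):
    # Locality-constrained linear coding: closed form, no iteration.
    # x: signal in R^n; B: n x m dictionary, atoms as columns.
    m = B.shape[1]
    dist = np.linalg.norm(B - x[:, None], axis=0)     # Euclidean distance to each atom
    d = np.exp((dist - dist.max()) / sigma)           # locality adaptor (shifted for stability)
    Bc = B - x[:, None]                               # shifted dictionary  B - x 1^T
    C = Bc.T @ Bc                                     # m x m "covariance" matrix
    c = np.linalg.solve(C + lam * np.diag(d), np.ones(m))
    return c / c.sum()                                # enforce 1^T c = 1
```

A single linear solve replaces the iterative loop of generic L1 coding, which is the efficiency gain the description claims for locality-constrained linear coding.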
The image is divided into several local regions using the spatial pyramid strategy, i.e. into regions of size 1 × 1, 1 × 4 and 4 × 1 (this region division allows all key points within a region to be pooled together, so as to obtain local information about the image). Within each local region, max-pooling is used to compute the histogram of the key-point codes in that region as the region's feature expression. Suppose region i contains n_i key points, and let the coding matrix of all its key points be C ∈ R^(n_i×m), each row of which is the sparse representation of one key point. The feature expression z_i ∈ R^m of the region is then given by

    z_ij = max{c_1j, c_2j, ..., c_{n_i j}},

where z_ij is the j-th element of z_i and c_kj is the element in row k, column j of C.
Concatenating the feature expressions of all regions forms the final feature expression vector of the image, i.e. Z = [z_1, z_2, ..., z_9] ∈ R^(9m).
Further, step S103 is implemented as follows.
For a series of training input pairs (z_i, t_i), i = 1, 2, ..., N (where t_i is the ground-truth label of training sample i), the objective function of the linear classifier is

    min_W ||W||^2 + C Σ_{i=1}^{N} ||ξ_i||^2,  subject to: W z_i = t_i − ξ_i, i = 1, ..., N.

Using the method of Lagrange multipliers, the optimal model weights are

    W = T Z^T (Z Z^T + I / C)^(−1),

where Z = [z_1, ..., z_N] and T = [t_1, ..., t_N] stack the features and labels column-wise, and C is the regularization coefficient, tuned to its optimal value by cross-validation.
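The closed-form least-squares training and the resulting classifier could be sketched as follows; the one-hot target encoding and the value of C are assumptions for illustration:

```python
import numpy as np

def train_linear_classifier(Z, T, C=100.0):
    # Z: d x N feature vectors (columns); T: k x N one-hot targets (columns).
    # Closed-form regularized least squares: W = T Z^T (Z Z^T + I/C)^(-1)
    d = Z.shape[0]
    return T @ Z.T @ np.linalg.inv(Z @ Z.T + np.eye(d) / C)

def predict(W, z):
    # predicted class = index of the largest classifier output
    return int(np.argmax(W @ z))
```

In practice C would be chosen by cross-validation, as the text describes, rather than fixed.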
It should be noted that although the various embodiments have been described above, they are not intended to limit the scope of patent protection of the present invention. Therefore, changes and modifications made to the embodiments described herein based on the innovative idea of the present invention, or equivalent structures or equivalent process transformations made using the contents of the description and drawings of the present invention, applied directly or indirectly in other related technical fields, all fall within the scope of patent protection of the present invention.

Claims (8)

1. A scene recognition method based on sparse coding feature extraction, characterized by including the steps of:
preprocessing a sample image set collected in advance for training;
extracting the feature expression vectors of the preprocessed sample image set;
feeding the feature expression vectors of the sample image set and their corresponding class labels into a linear classifier, performing parameter learning on the linear classifier to obtain its optimal parameters, and building a linear scene classifier from the optimal parameters;
preprocessing the sample image set to be recognized;
extracting the feature expression vectors of the preprocessed sample image set to be recognized;
feeding the feature expression vectors of the preprocessed sample image set to be recognized into the linear scene classifier for recognition, obtaining the class labels of the scenes to which the sample images belong.
2. The scene recognition method based on sparse coding feature extraction according to claim 1, characterized in that
the preprocessing operation comprises: image contrast normalization and Gamma correction.
3. The scene recognition method based on sparse coding feature extraction according to claim 1, characterized in that
the step of "extracting the feature representation vector of the sample image set after the preprocessing operation" comprises: extracting the low-level features of the sample image set by a multi-scale SIFT feature fusion method, i.e., using neighborhoods of multiple scale sizes for each pixel and extracting the SIFT key points of the image within each neighborhood;
solving the sparse representation of the SIFT key points, and forming the feature representation vector of the preprocessed sample image set using a spatial pyramid strategy and max-pooling.
4. The scene recognition method based on sparse coding feature extraction according to claim 3, characterized in that
the step of "solving the sparse representation of the SIFT key points" comprises: solving the sparse representation of the SIFT key points by locality-constrained linear coding.
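A hedged sketch of the locality-constrained linear coding referred to in this claim, using the common K-nearest-atom approximation (keep the K closest codebook atoms and solve a small least-squares problem with a sum-to-one constraint); the codebook B, the descriptor x, and the choice K = 5 are illustrative assumptions:

```python
import numpy as np

def llc_code(x, B, K=5, eps=1e-6):
    """Approximate LLC code of descriptor x (d,) over codebook B (M, d):
    keep the K nearest atoms, solve min ||x - w^T B_K||^2 s.t. sum(w) = 1,
    and scatter the local weights back into a length-M sparse code."""
    d2 = ((B - x) ** 2).sum(axis=1)      # squared distances to all atoms
    idx = np.argsort(d2)[:K]             # indices of the K nearest atoms
    Bk = B[idx] - x                      # atoms shifted to the descriptor
    G = Bk @ Bk.T + eps * np.eye(K)      # local covariance, regularized
    w = np.linalg.solve(G, np.ones(K))
    w /= w.sum()                         # enforce the sum-to-one constraint
    c = np.zeros(B.shape[0])
    c[idx] = w
    return c

rng = np.random.default_rng(2)
B = rng.standard_normal((64, 128))       # M = 64 atoms, d = 128 (SIFT length)
x = rng.standard_normal(128)
c = llc_code(x, B)
print(np.count_nonzero(c), c.sum())      # at most K non-zeros, sums to 1
```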
5. The scene recognition method based on sparse coding feature extraction according to claim 3, characterized in that
the step of "forming the feature representation vector of the preprocessed sample image set using a spatial pyramid strategy and max-pooling" comprises:
dividing the image into 1x1, 1x4 and 4x1 local regions using the spatial pyramid strategy; within each local region, using max-pooling to aggregate the histogram of the SIFT key-point codes into the feature representation of that region; and concatenating the feature representations of all regions to form the feature representation vector of the sample image set after the preprocessing operation.
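The 1x1 + 1x4 + 4x1 pyramid above yields nine regions in total. A small sketch of assigning key points to those regions by their coordinates (the orientation of the 1x4 and 4x1 grids is an assumption; the claim does not fix it):

```python
import numpy as np

def pyramid_region_ids(xs, ys, w, h):
    """Assign each key point (x, y) in a w-by-h image to the 9 regions of a
    1x1 + 1x4 + 4x1 spatial pyramid.  Returns an (n, 3) array giving the
    global region id at each level: whole image, 4 columns, 4 rows."""
    xs, ys = np.asarray(xs), np.asarray(ys)
    col = np.clip((xs * 4) // w, 0, 3).astype(int)   # 1x4: vertical strips
    row = np.clip((ys * 4) // h, 0, 3).astype(int)   # 4x1: horizontal strips
    whole = np.zeros_like(col)                       # 1x1: single region
    return np.stack([whole, 1 + col, 5 + row], axis=1)  # global ids 0..8

ids = pyramid_region_ids([10, 90, 50], [5, 70, 40], w=100, h=80)
print(ids)
```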
6. The scene recognition method based on sparse coding feature extraction according to claim 1, characterized in that
the step of "performing parameter learning on the linear classifier to obtain the optimal parameters of the linear classifier" comprises:
computing the weight parameters of the linear classifier by the least squares method, and obtaining the optimal parameter of the linear classifier by cross-validation.
7. The scene recognition method based on sparse coding feature extraction according to claim 2, characterized in that
the step of "image contrast normalization" comprises: converting the image from the RGB color space to the YUV color space, and performing global and local contrast normalization in the YUV color space;
the global and local contrast normalization operates only on the Y channel, leaving the other two channels unchanged; the global normalization normalizes the image pixel values around the image pixel mean, and the local normalization enhances edges.
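A minimal sketch of this contrast-normalization step, assuming BT.601 RGB-to-YUV conversion coefficients (the claim does not fix the exact constants) and showing only the global normalization of the Y channel; the local edge-enhancing normalization is omitted:

```python
import numpy as np

def rgb_to_yuv(img):
    """BT.601 RGB -> YUV on a float image of shape (H, W, 3)."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    return img @ m.T

def normalize_y_global(yuv, eps=1e-8):
    """Global contrast normalization of the Y channel only: centre on the
    mean and scale by the standard deviation; U and V are left unchanged."""
    out = yuv.copy()
    y = out[..., 0]
    out[..., 0] = (y - y.mean()) / (y.std() + eps)
    return out

rng = np.random.default_rng(3)
img = rng.random((8, 8, 3))
yuv = normalize_y_global(rgb_to_yuv(img))
print(abs(yuv[..., 0].mean()) < 1e-6)  # Y channel is now zero-mean
```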
8. The scene recognition method based on sparse coding feature extraction according to claim 1, characterized in that
the step of "extracting the feature representation vector of the sample image set to be recognized after the preprocessing operation" comprises: extracting the low-level features of the sample image set by a multi-scale SIFT feature fusion method, i.e., using neighborhoods of multiple scale sizes for each pixel and extracting the SIFT key points of the image within each neighborhood;
solving the sparse representation of the SIFT key points, and forming the feature representation vector of the preprocessed sample image set using a spatial pyramid strategy and max-pooling.
CN201810435125.5A 2018-05-09 2018-05-09 Scene recognition method based on sparse coding feature extraction Active CN108596195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810435125.5A CN108596195B (en) 2018-05-09 2018-05-09 Scene recognition method based on sparse coding feature extraction

Publications (2)

Publication Number Publication Date
CN108596195A true CN108596195A (en) 2018-09-28
CN108596195B CN108596195B (en) 2022-08-19

Family

ID=63635982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810435125.5A Active CN108596195B (en) 2018-05-09 2018-05-09 Scene recognition method based on sparse coding feature extraction

Country Status (1)

Country Link
CN (1) CN108596195B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109616104A (en) * 2019-01-31 2019-04-12 Tianjin University Environment sound identification method based on key point coding and multi-pulse learning
CN109657569A (en) * 2018-11-30 2019-04-19 Guizhou Power Grid Co., Ltd. Method for rapidly extracting hidden danger points in power transmission corridors of multi-vegetation areas based on cloud analysis
CN110852206A (en) * 2019-10-28 2020-02-28 Beijing Moviebook Technology Co., Ltd. Scene recognition method and device combining global features and local features
CN111225231A (en) * 2020-02-25 2020-06-02 Guangzhou Huaduo Network Technology Co., Ltd. Virtual gift display method, device, equipment and storage medium
CN112086197A (en) * 2020-09-04 2020-12-15 Xiang'an Hospital of Xiamen University Breast nodule detection method and system based on medical ultrasound

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793600A (en) * 2014-01-16 2014-05-14 Xidian University Cancer prediction method combining independent component analysis and linear discriminant analysis
CN104778476A (en) * 2015-04-10 2015-07-15 University of Electronic Science and Technology of China Image classification method
CN105069481A (en) * 2015-08-19 2015-11-18 Xidian University Multi-label natural scene classification method based on spatial pyramid and sparse coding
CN105678278A (en) * 2016-02-01 2016-06-15 State Grid Corporation of China Scene recognition method based on a single-hidden-layer neural network
CN106919920A (en) * 2017-03-06 2017-07-04 Chongqing University of Posts and Telecommunications Scene recognition method based on convolutional features and a spatial visual bag-of-words model
CN107451596A (en) * 2016-05-30 2017-12-08 Tsinghua University Node classification method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JIANCHAO YANG et al.: "Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification", 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009) *
JINJUN WANG et al.: "Locality-constrained Linear Coding for Image Classification", 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition *
ZHANG DAN: "Multi-label Scene Classification Based on SIFT Feature Representation and Sparse Coding", China Masters' Theses Full-text Database, Information Science and Technology Series *
DONG SUHUI et al.: "Zero-shot Learning Based on the Sparse Coding Spatial Pyramid Model", Journal of Jiangsu University (Natural Science Edition) *
GAO QUNXIA: "Research on EEG-based Sleep Staging and Sleep Assessment Methods", China Masters' Theses Full-text Database, Medicine and Health Sciences Series *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant