CN110348283A - Fine-grained vehicle model recognition method based on bilinear-model multi-layer feature fusion - Google Patents
Fine-grained vehicle model recognition method based on bilinear-model multi-layer feature fusion
- Publication number
- CN110348283A (application CN201910391455.3A)
- Authority
- CN
- China
- Prior art keywords
- layer
- feature
- bilinearity
- correlation information
- auto
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
Abstract
The present invention discloses a fine-grained vehicle model recognition method based on bilinear-model multi-layer feature fusion, comprising the following steps: (10) first intra-layer auto-correlation extraction: from the convolutional features of one convolutional layer of the bilinear model, extract the intra-layer auto-correlation bilinear feature, decomposing the parameter matrix into two first-order vectors; (20) second intra-layer auto-correlation extraction: from the convolutional features of a second convolutional layer, extract the intra-layer auto-correlation bilinear feature, again decomposing the parameter matrix into two first-order vectors; (30) inter-layer cross-correlation extraction: from the convolutional features of different layers, extract the inter-layer cross-correlation bilinear features between the convolutional layers, decomposing the parameter matrix into two first-order vectors; (40) multi-layer feature fusion: fuse the multi-layer features to obtain the fine-grained vehicle feature. The fine-grained vehicle model recognition method of the invention has low computational complexity and high accuracy.
Description
Technical field
The invention belongs to the field of intelligent image recognition, and in particular concerns a fine-grained vehicle model recognition method based on bilinear-model multi-layer feature fusion.
Background art
Fine-grained vehicle model recognition is the finer classification of different sub-classes of vehicles within the same major class. Its purpose is to determine specific information such as the manufacturer and model of a vehicle from an image taken at any angle and in any scene. In the field of autonomous driving, fine-grained vehicle recognition helps to analyse road conditions better, and it also lets traffic police check violating vehicles more conveniently. Beyond this, fine-grained vehicle recognition has wide application in many other fields, such as road traffic-flow monitoring and analysis, vehicle-marketing analysis, and road video surveillance.
Fine-grained vehicle recognition methods mainly involve two aspects: detecting the vehicle, and classifying the detected target. Components such as the vehicle front and the headlights are first located by an object-detection algorithm, features are then extracted from each component separately, and finally the features of all components are combined to classify the target vehicle. Depending on whether hand-crafted features or convolutional-neural-network features are used, fine-grained vehicle recognition methods can be divided into two major classes: those based on manual features and those based on deep convolutional features.
Because the targets of fine-grained image analysis are visually very similar, the discriminative information is often found in small local regions. Many early methods therefore first performed object detection on the input vehicle image using manually designed features, and only afterwards extracted features for classification. In general, however, hand-designed features are not necessarily optimal for the final classification, and it is difficult to detect vehicles accurately in complex scenes, so the generalisation ability of these methods is weak.
With the recent improvement of hardware computing power and the rapid development of deep-learning technology, scholars at home and abroad have begun to exploit the excellent feature-extraction ability of convolutional neural networks for fine-grained vehicle recognition. Methods based on deep convolutional features have achieved remarkable results on fine-grained vehicle recognition tasks, but convolutional neural networks are very time-consuming for component detection, and the computational complexity of such models is very high, which is fatal for large-scale image-analysis tasks.
The problem with the prior art is therefore that fine-grained vehicle recognition has high computational complexity and low accuracy.
Summary of the invention
The purpose of the present invention is to provide a fine-grained vehicle model recognition method based on bilinear-model multi-layer feature fusion that has low computational complexity and high accuracy.
The technical solution realising this purpose is as follows:
A fine-grained vehicle model recognition method based on bilinear-model multi-layer feature fusion comprises the following steps:
(10) first intra-layer auto-correlation extraction: from the convolutional features of one convolutional layer of the bilinear model, extract the intra-layer auto-correlation bilinear feature, using decomposed bilinear pooling to factorise the parameter matrix into two first-order vectors;
(20) second intra-layer auto-correlation extraction: from the convolutional features of a second convolutional layer, extract the intra-layer auto-correlation bilinear feature, likewise factorising the parameter matrix into two first-order vectors;
(30) inter-layer cross-correlation extraction: from the convolutional features of different layers, extract the inter-layer cross-correlation bilinear features between the convolutional layers, factorising the parameter matrix into two first-order vectors;
(40) multi-layer feature fusion: fuse the multi-layer features using pairwise cross-layer bilinear pooling and single cross-layer bilinear pooling to obtain the fine-grained vehicle feature.
Compared with the prior art, the remarkable advantages of the invention are:
1. Low computational complexity: the invention performs classification end-to-end with a B-CNN model, avoiding explicit detection of local regions; this simplifies the network model and reduces the computational complexity.
2. High accuracy: the invention extracts local detail features with a B-CNN and, through a cross-layer bilinear pooling method, fuses the inter-layer interaction information and intra-layer auto-correlation information of each convolutional layer. The intermediate convolutional activations are thus fully exploited, yielding a more robust feature representation and improving the accuracy of fine-grained vehicle recognition.
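To make the first advantage concrete, the following sketch compares the parameter count of a full bilinear projection tensor with that of its decomposed form. The sizes are illustrative assumptions (c = 512 channels as in VGG-16 conv5, o = 196 classes as in Stanford Cars-196, d = 8192 as chosen later in the experiments), not values fixed by the claims:

```python
# Parameter count of a full bilinear tensor W in R^{c x c x o}
# versus the decomposed form A in R^{c x d} plus P in R^{d x o}.
# The sizes below are illustrative assumptions, not values fixed by the patent.
c, o, d = 512, 196, 8192

full_bilinear = c * c * o        # learn every W_i directly: 51,380,224 parameters
decomposed = c * d + d * o       # shared factors A and P:    5,799,936 parameters

print(full_bilinear, decomposed, round(full_bilinear / decomposed, 1))
```

Even at the large embedding dimension d = 8192, the decomposed form needs roughly an order of magnitude fewer parameters, and a smaller d shrinks it further.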
Brief description of the drawings
Fig. 1 is the main flow chart of the fine-grained vehicle model recognition method based on bilinear-model multi-layer feature fusion of the present invention.
Fig. 2 is a flow chart of the first intra-layer auto-correlation extraction step in Fig. 1.
Fig. 3 is a flow chart of the second intra-layer auto-correlation extraction step in Fig. 1.
Fig. 4 is a flow chart of the inter-layer cross-correlation extraction step in Fig. 1.
Fig. 5 is a flow chart of the multi-layer feature fusion step in Fig. 1.
Fig. 6 is the network structure of bilinear fusion on VGG-16.
Fig. 7 compares the influence of different embedding dimensions on precision.
Detailed description of the embodiments
As shown in Fig. 1, the fine-grained vehicle model recognition method based on bilinear-model multi-layer feature fusion of the present invention comprises the following steps:
(10) first intra-layer auto-correlation extraction: from the convolutional features of one convolutional layer of the bilinear model, extract the intra-layer auto-correlation bilinear feature, using decomposed bilinear pooling to factorise the parameter matrix into two first-order vectors.
As shown in Fig. 2, the first intra-layer auto-correlation extraction step (10) comprises:
(11) first intra-layer bilinear feature representation: the C × C-dimensional intra-layer auto-correlation bilinear feature of the first layer is expressed as
bilinear(l, I, f_A, f_A) = f_A(l, I)^T f_A(l, I) = X^T X    (1)
where X is the feature extracted by the first convolutional layer, of dimension C × M, and x is the feature component of X at a given spatial position, i.e. a C-dimensional feature vector of the convolutional neural network;
(12) first intra-layer bilinear representation output: the complete bilinear model is
z_i = x^T W_i x    (2)
where W_i ∈ R^{c×c} is a projection matrix and z_i is the bilinear representation output of the B-CNN at that position;
(13) projection-matrix decomposition: the projection matrix W_i is decomposed into two first-order vectors,
W_i = A_i A_i^T    (3)
where A_i ∈ R^c;
(14) intra-layer auto-correlation output: learning the tensor W = [W_1, W_2, ..., W_o] ∈ R^{c×c×o} yields the intra-layer auto-correlation of the first convolutional layer as an o-dimensional output z ∈ R^o,
z = P^T (A^T x ∘ A^T x)    (4)
where A ∈ R^{c×d} is a projection matrix, P ∈ R^{d×o} is the classification matrix, ∘ denotes the Hadamard product, and d is the hyper-parameter that determines the joint embedding dimension.
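A minimal NumPy sketch (toy sizes, hypothetical values) of the factorised intra-layer pooling in equations (2) to (4). With the classification matrix P folded into the rank-one factors, each output component z_i equals x^T (A diag(P[:, i]) A^T) x; this folded form is our reading of eqs. (3) and (4), not code from the patent, and the sketch verifies it numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
c, d, o = 512, 64, 10                    # channels, embedding dim, output dim (toy)

A = 0.01 * rng.standard_normal((c, d))   # projection matrix A in R^{c x d}
P = 0.01 * rng.standard_normal((d, o))   # classification matrix P in R^{d x o}
x = rng.standard_normal(c)               # feature vector at one spatial position

# eq. (4): z = P^T (A^T x ∘ A^T x), with ∘ the Hadamard product
Ax = A.T @ x
z = P.T @ (Ax * Ax)

# cross-check one component against the unfactorised bilinear form
# z_i = x^T W_i x, with W_i = A diag(P[:, i]) A^T (P folded into eq. (3))
i = 3
W_i = A @ np.diag(P[:, i]) @ A.T
assert np.allclose(z[i], x @ W_i @ x)
print(z.shape)   # (10,)
```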
(20) second intra-layer auto-correlation extraction: from the convolutional features of a second convolutional layer of the bilinear model, extract the intra-layer auto-correlation bilinear feature, using decomposed bilinear pooling to factorise the parameter matrix into two first-order vectors.
As shown in Fig. 3, the second intra-layer auto-correlation extraction step (20) comprises:
(21) second intra-layer bilinear feature representation: the C × C-dimensional intra-layer auto-correlation bilinear feature of the second layer is expressed as
bilinear(l, I, f_A, f_A) = f_A(l, I)^T f_A(l, I) = Y^T Y    (5)
where the feature extracted by the second convolutional layer is denoted Y, of dimension C × N, and y is the feature component of Y at a given spatial position;
(22) second intra-layer bilinear representation output: the complete bilinear model is
z_i = y^T W_i y    (6)
where W_i ∈ R^{c×c} is a projection matrix and z_i is the bilinear representation output of the B-CNN at that position;
(23) second-layer projection-matrix decomposition: the projection matrix W_i is decomposed into two first-order vectors,
W_i = B_i B_i^T    (7)
where B_i ∈ R^c;
(24) second intra-layer auto-correlation output: learning the tensor W = [W_1, W_2, ..., W_o] ∈ R^{c×c×o} yields the intra-layer auto-correlation of the second convolutional layer as an o-dimensional output z ∈ R^o,
z = P^T (B^T y ∘ B^T y)    (8)
where B ∈ R^{c×d} is a projection matrix, P ∈ R^{d×o} is the classification matrix, ∘ denotes the Hadamard product, and d is the hyper-parameter that determines the joint embedding dimension.
(30) inter-layer cross-correlation extraction: from the convolutional features of different layers of the bilinear model, extract the inter-layer cross-correlation bilinear features between different convolutional layers, using decomposed bilinear pooling to factorise the parameter matrix into two first-order vectors.
As shown in Fig. 4, the inter-layer cross-correlation extraction step (30) comprises:
(31) inter-layer cross-correlation acquisition: the inter-layer cross-correlation information X^T Y and Y^T X is obtained by taking the outer product of the two features extracted from different convolutional layers,
bilinear(l, I, f_A, f_B) = f_A(l, I)^T f_B(l, I) = X^T Y    (9)
bilinear(l, I, f_B, f_A) = f_B(l, I)^T f_A(l, I) = Y^T X    (10)
where the features extracted by the two adjacent convolutional layers f_A and f_B are denoted X and Y, of dimensions C × M and C × N respectively, and x and y are the feature components of X and Y at the same spatial position;
(32) inter-layer bilinear model output: the complete bilinear models are, respectively,
z_i = x^T W_i y    (11)
z_i = y^T W_i x    (12)
where W_i ∈ R^{c×c} is a projection matrix and z_i is the bilinear representation output of the B-CNN at that position;
(33) inter-layer projection-matrix decomposition: the projection matrix W_i is decomposed into two first-order vectors,
W_i = A_i B_i^T    (13)
W_i = B_i A_i^T    (14)
where A_i ∈ R^c and B_i ∈ R^c;
(34) inter-layer cross-correlation output: learning the tensor W = [W_1, W_2, ..., W_o] ∈ R^{c×c×o} yields the o-dimensional outputs z ∈ R^o of the inter-layer cross-correlation information X^T Y and Y^T X,
z = P^T (A^T x ∘ B^T y)    (15)
z = P^T (B^T y ∘ A^T x)    (16)
where A, B ∈ R^{c×d} are projection matrices, P ∈ R^{d×o} is the classification matrix, ∘ denotes the Hadamard product, and d is the hyper-parameter that determines the joint embedding dimension.
(40) multi-layer feature fusion: fuse the multi-layer features using pairwise cross-layer bilinear pooling and single cross-layer bilinear pooling to obtain the fine-grained vehicle feature.
As shown in Fig. 5, the multi-layer feature fusion step (40) comprises:
(41) bilinear pooling based on pairwise cross-layer interaction: the cross-layer bilinear representation Y^T X is superimposed on the bilinear pooling based on single cross-layer interaction, and another fused feature vector Z is formed after splicing,
Z = X^T X + Y^T Y + X^T Y + Y^T X    (17)
whose back-propagation gradients follow from this expression by the chain rule;
(42) bilinear pooling based on single cross-layer interaction: the intra-layer bilinear representations X^T X and Y^T Y of the two convolutional layers and the cross-layer bilinear representation X^T Y are superimposed, and a fused feature vector Z is formed after splicing,
Z = X^T X + Y^T Y + X^T Y    (20)
whose back-propagation gradients likewise follow by the chain rule.
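A NumPy sketch of the single-cross-layer fusion (42). The patent's "splicing" is read here as concatenation of the three pooled descriptors, and a sum over spatial positions stands in for the B-CNN pooling; both readings, and all sizes, are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
c, m, d, o = 512, 49, 64, 10   # channels, positions, embedding dim, output dim

X = 0.1 * rng.standard_normal((c, m))    # features of one convolutional layer
Y = 0.1 * rng.standard_normal((c, m))    # features of another convolutional layer

A = 0.01 * rng.standard_normal((c, d))   # factor matrix for X
B = 0.01 * rng.standard_normal((c, d))   # factor matrix for Y
P = 0.01 * rng.standard_normal((d, o))   # classification matrix

def pooled(U, F, V, G):
    """Factorised bilinear descriptor P^T sum_over_positions (U^T f ∘ V^T g)."""
    return P.T @ ((U.T @ F) * (V.T @ G)).sum(axis=1)

z_xx = pooled(A, X, A, X)   # intra-layer term for X^T X
z_yy = pooled(B, Y, B, Y)   # intra-layer term for Y^T Y
z_xy = pooled(A, X, B, Y)   # single cross-layer term for X^T Y

Z = np.concatenate([z_xx, z_yy, z_xy])   # spliced fusion descriptor
print(Z.shape)   # (30,)
```

The pairwise variant (41) would concatenate a fourth descriptor for Y^T X as well; as the experiments below note, that term carries information similar to X^T Y.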
To verify the validity of the proposed method, the inventors carried out the following confirmatory experiments.
The confirmatory experiments use two data sets, Stanford Cars-196 and Stanford BMW-10, which are widely used fine-grained vehicle recognition benchmarks because of their scale and complexity. Stanford Cars-196 contains 196 vehicle classes and 16,185 car images in total, divided by manufacturer, model and production year (e.g. 2011 Audi A4). Each sub-class contains between 48 and 136 images, of which 24 to 68 are used for training and the rest form the test set. The Stanford BMW-10 data set contains pictures of 10 BMW car series taken from different angles, with about 25 training pictures per class. To allow comparison with other methods under the same conditions, the public VGG-16 is used as the base network model, and only horizontal flipping is used to enlarge the number of training samples. During testing, the classification results of the original image and its flipped copy are averaged to give the final classification.
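The flip-averaging used at test time can be sketched as follows; `model` is any callable returning a score vector, and the toy stand-in below is purely illustrative:

```python
import numpy as np

def predict_with_flip(model, image):
    """Average the class scores of an image and its horizontally flipped copy."""
    flipped = image[:, ::-1, :]                 # flip the width axis of H x W x C
    return (model(image) + model(flipped)) / 2.0

# toy stand-in for a classifier: per-channel mean intensity as "scores"
toy_model = lambda img: img.mean(axis=(0, 1))

img = np.arange(24, dtype=float).reshape(2, 4, 3)
scores = predict_with_flip(toy_model, img)
```

Because the toy model's per-channel mean is flip-invariant, `scores` equals `toy_model(img)` here; for a real CNN the two passes differ, and the average is what stabilises the final classification.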
Experimental hardware environment: Ubuntu 16.04, GTX 1080 Ti with 12 GB of video memory, Core(TM) i7 processor at 3.4 GHz, 16 GB of RAM.
Code running environment: MatConvNet, Python 2.7, Matlab 2014a.
1. Model-validity experiments
The feasibility of the two feature-fusion schemes is analysed first. In VGG-16, the layers relu5_1, relu5_2 and relu5_3 contain more local semantic information than shallower features, so with VGG-D as the base model, the three fusion schemes relu5_3+relu5_2, relu5_3+relu5_1 and relu5_3+relu5_2 are compared experimentally on Stanford Cars-196. The experimental results are shown in Table 1 (bold figures indicate the highest precision).
Although DCL-Fusion introduces additional inter-layer interaction information compared with SCL-Fusion, the experimental results on Stanford Cars-196 show no performance gain for DCL-Fusion; instead it increases the computation cost and slows down training. Careful analysis suggests a likely cause: X^T Y and Y^T X are transposes of each other and carry similar information, so adding Y^T X makes the proportion of X-Y cross-correlation information excessive and does not help the SCL-Fusion features. In the later experiments, therefore, Y^T X is not added and only the first fusion scheme, SCL-Fusion, is used; its network structure on VGG-16 is shown in Fig. 6.
Table 1: model-efficiency analysis of relu5_3+relu5_2, relu5_3+relu5_1 and relu5_3+relu5_2
2. Experiments on tuning the parameter d
d is the hyper-parameter that determines the joint embedding dimension in the decomposed bilinear pooling model. To study the influence of d on the results of SCL-Fusion, experiments were carried out on the Stanford Cars-196 data set, comparing SCL-Fusion with general decomposed bilinear pooling; the results are shown in Fig. 7.
As can be seen from Table 1, among the three layer-fusion schemes the two-layer fusion of relu5_3 and relu5_1 gives the better classification precision, so this experiment fuses the features of the two convolutional layers relu5_1 and relu5_3, capturing the auto-correlation and cross-correlation information of these two layers. General decomposed bilinear pooling extracts only the auto-correlation information of the last convolutional layer, relu5_3. The experiments show that at the same d, the results of SCL-Fusion are clearly better than those of general decomposed bilinear pooling, indicating that the inter-layer interaction of features extracts richer information and enhances the discriminative ability of the model.
Fig. 7 also shows that as the hyper-parameter d grows from 512 to 8192, the classification precision of both general decomposed bilinear pooling and SCL-Fusion rises gradually. At d = 8192, SCL-Fusion achieves its best result of 93.05% on Stanford Cars-196, so d = 8192 is used in the following experiments.
3. Feature-fusion effect analysis
The preceding experiments show that fusing convolutional-layer features clearly improves the classification precision, so quantitative experiments were carried out on the Stanford Cars-196 data set to analyse which convolutional layers SCL-Fusion should fuse to obtain the highest precision. Following the results in Fig. 7, the embedding dimension was set to d = 8192 and different combinations of convolutional layers were fused; the results are shown in Table 2 (bold figures indicate the highest precision).
The best effect is obtained when the features of the three convolutional layers relu5_3, relu5_2 and relu5_1 are fused: the classification precision improves by 0.4% over the scheme that fuses only relu5_1+relu5_3, showing that the intermediate convolutional activations are useful for the fine-grained classification task. Information is lost during the forward and backward propagation of a CNN, and distinguishing details that are vital for fine-grained recognition may already be lost as they propagate through the intermediate convolutional layers. Compared with the general decomposed bilinear model, SCL-Fusion considers the interaction features of the intermediate convolutional layers, including their auto-correlation and cross-correlation information, and is therefore more robust. The subsequent experiments fuse the auto-correlation and cross-correlation information of relu5_3, relu5_2 and relu5_1.
Table 2: comparison of results for fusing different convolutional layers
4. Comparison with previous work
To further verify the validity of the proposed fine-grained image-classification algorithm based on bilinear-model multi-layer feature fusion, its performance is compared with that of mainstream algorithms on the Stanford Cars-196 data set; the experimental results are shown in Table 3 (bold figures indicate the highest precision).
Table 3: comparison of experimental results on Stanford Cars-196
The proposed method achieves a recognition accuracy of 93.45% under weakly supervised conditions, 0.95% higher than the BoT algorithm and 0.45% lower than the HSnet algorithm; however, both BoT and HSnet use additional annotation information in the training stage and are strongly supervised algorithms. Compared with the part-based unsupervised methods RA-CNN and MA-CNN, the classification precision of the proposed algorithm improves by 1.05% and 0.95% respectively. The algorithm also outperforms Improved B-CNN and LRBP, but falls below the hierarchical bilinear pooling algorithm HBP because, compared with the proposed method, HBP not only supports inter-layer feature interaction but also learns the fine-grained representation in a mutually enhancing manner. The DLA algorithm summarises and abstracts the current mainstream network structures (VGG, ResNet, ResNeXt, DenseNet, etc.) and proposes a markedly more effective way of building networks; it currently achieves the highest recognition precision on Stanford Cars-196.
The recognition results of the proposed method and of typical fine-grained recognition methods on the BMW-10 data set are compared in Table 4. Because the differences between the sub-classes in fine-grained vehicle classification are very subtle, earlier recognition methods based on manual features classify poorly: the SPM algorithm reaches only 66.1% classification accuracy on BMW-10. To strengthen feature extraction from local regions, the BB algorithm manually selects image regions with discriminative power and reaches a recognition accuracy of 69.3%, showing that effective localisation of local regions clearly improves the accuracy of fine-grained vehicle recognition. BB-3D-G improves on this basis, lifting the BB method into 3D space to eliminate the influence of viewing angle and raising the recognition accuracy by 6.7%. BoxCars uses 3D bounding-box information as the CNN input and achieves a fine-grained vehicle recognition accuracy of 77.2%. The method proposed by the present invention, without relying on additional bounding-box annotation, takes the convolutional features of the three layers relu5_3, relu5_2 and relu5_1, computes their auto-correlation bilinear features and the inter-layer cross-correlation information (relu5_3*relu5_2 + relu5_2*relu5_1 + relu5_3*relu5_1), and uses decomposed bilinear pooling to factorise the projection matrix into two one-dimensional vectors to reduce the computation. It achieves a vehicle recognition accuracy of 79.27%, 2.95% higher than BB-3D-G, showing that the proposed fine-grained vehicle model recognition method effectively improves the accuracy of fine-grained vehicle recognition.
Table 4: comparison of experimental results on Stanford BMW-10
Claims (5)
1. A fine-grained vehicle model recognition method based on bilinear-model multi-layer feature fusion, characterised by comprising the following steps:
(10) first intra-layer auto-correlation extraction: from the convolutional features of one convolutional layer of the bilinear model, extracting the intra-layer auto-correlation bilinear feature while using decomposed bilinear pooling to factorise the parameter matrix into two first-order vectors;
(20) second intra-layer auto-correlation extraction: from the convolutional features of a second convolutional layer of the bilinear model, extracting the intra-layer auto-correlation bilinear feature while using decomposed bilinear pooling to factorise the parameter matrix into two first-order vectors;
(30) inter-layer cross-correlation extraction: from the convolutional features of different layers of the bilinear model, extracting the inter-layer cross-correlation bilinear features between different convolutional layers while using decomposed bilinear pooling to factorise the parameter matrix into two first-order vectors;
(40) multi-layer feature fusion: fusing the multi-layer features using pairwise cross-layer bilinear pooling and single cross-layer bilinear pooling to obtain the fine-grained vehicle feature.
2. The fine-grained vehicle model recognition method according to claim 1, characterised in that the first intra-layer auto-correlation extraction step (10) comprises:
(11) first intra-layer bilinear feature representation: the C × C-dimensional intra-layer auto-correlation bilinear feature of the first layer is expressed as
bilinear(l, I, f_A, f_A) = f_A(l, I)^T f_A(l, I) = X^T X    (1)
where X is the feature extracted by the first convolutional layer, of dimension C × M, and x is the feature component of X at a given spatial position, i.e. a C-dimensional feature vector of the convolutional neural network;
(12) first intra-layer bilinear representation output: the complete bilinear model is
z_i = x^T W_i x    (2)
where W_i ∈ R^{c×c} is a projection matrix and z_i is the bilinear representation output of the B-CNN at that position;
(13) projection-matrix decomposition: the projection matrix W_i is decomposed into two first-order vectors,
W_i = A_i A_i^T    (3)
where A_i ∈ R^c;
(14) intra-layer auto-correlation output: learning the tensor W = [W_1, W_2, ..., W_o] ∈ R^{c×c×o} yields the intra-layer auto-correlation of the first convolutional layer as an o-dimensional output z ∈ R^o,
z = P^T (A^T x ∘ A^T x)    (4)
where A ∈ R^{c×d} is a projection matrix, P ∈ R^{d×o} is the classification matrix, ∘ denotes the Hadamard product, and d is the hyper-parameter that determines the joint embedding dimension.
3. The fine-grained vehicle model recognition method according to claim 1, characterised in that the second intra-layer auto-correlation extraction step (20) comprises:
(21) second intra-layer bilinear feature representation: the C × C-dimensional intra-layer auto-correlation bilinear feature of the second layer is expressed as
bilinear(l, I, f_A, f_A) = f_A(l, I)^T f_A(l, I) = Y^T Y    (5)
where the feature extracted by the second convolutional layer is denoted Y, of dimension C × N, and y is the feature component of Y at a given spatial position;
(22) second intra-layer bilinear representation output: the complete bilinear model is
z_i = y^T W_i y    (6)
where W_i ∈ R^{c×c} is a projection matrix and z_i is the bilinear representation output of the B-CNN at that position;
(23) second-layer projection-matrix decomposition: the projection matrix W_i is decomposed into two first-order vectors,
W_i = B_i B_i^T    (7)
where B_i ∈ R^c;
(24) second intra-layer auto-correlation output: learning the tensor W = [W_1, W_2, ..., W_o] ∈ R^{c×c×o} yields the intra-layer auto-correlation of the second convolutional layer as an o-dimensional output z ∈ R^o,
z = P^T (B^T y ∘ B^T y)    (8)
where B ∈ R^{c×d} is a projection matrix, P ∈ R^{d×o} is the classification matrix, ∘ denotes the Hadamard product, and d is the hyper-parameter that determines the joint embedding dimension.
4. The fine-grained vehicle model recognition method according to claim 1, characterised in that the inter-layer cross-correlation extraction step (30) comprises:
(31) inter-layer cross-correlation acquisition: the inter-layer cross-correlation information X^T Y and Y^T X is obtained by taking the outer product of the two features extracted from different convolutional layers,
bilinear(l, I, f_A, f_B) = f_A(l, I)^T f_B(l, I) = X^T Y    (9)
bilinear(l, I, f_B, f_A) = f_B(l, I)^T f_A(l, I) = Y^T X    (10)
where the features extracted by the two adjacent convolutional layers f_A and f_B are denoted X and Y, of dimensions C × M and C × N respectively, and x and y are the feature components of X and Y at the same spatial position;
(32) inter-layer bilinear model output: the complete bilinear models are, respectively,
z_i = x^T W_i y    (11)
z_i = y^T W_i x    (12)
where W_i ∈ R^{c×c} is a projection matrix and z_i is the bilinear representation output of the B-CNN at that position;
(33) inter-layer projection-matrix decomposition: the projection matrix W_i is decomposed into two first-order vectors,
W_i = A_i B_i^T    (13)
W_i = B_i A_i^T    (14)
where A_i ∈ R^c and B_i ∈ R^c;
(34) inter-layer cross-correlation output: learning the tensor W = [W_1, W_2, ..., W_o] ∈ R^{c×c×o} yields the o-dimensional outputs z ∈ R^o of the inter-layer cross-correlation information X^T Y and Y^T X,
z = P^T (A^T x ∘ B^T y)    (15)
z = P^T (B^T y ∘ A^T x)    (16)
where A, B ∈ R^{c×d} are projection matrices, P ∈ R^{d×o} is the classification matrix, ∘ denotes the Hadamard product, and d is the hyper-parameter that determines the joint embedding dimension.
5. The fine-grained vehicle model recognition method according to claim 1, wherein the multilayer feature fusion step (40) comprises:
(41) cross-layer bilinear pooling based on pairwise cross-layer interaction: the bilinear representation Y^T X is superimposed on the bilinear pooling based on single cross-layer interaction, and another fused feature vector Z is formed after concatenation:
Z = X^T X + Y^T Y + X^T Y + Y^T X (17)
the corresponding back-propagation calculation formulas being given by equations (18) and (19);
(42) bilinear pooling based on single cross-layer interaction: the bilinear representations X^T X and Y^T Y of the two convolutional layers are superimposed on the cross-layer bilinear representation X^T Y, and a fused feature vector Z is formed after concatenation:
Z = X^T X + Y^T Y + X^T Y (20)
the corresponding back-propagation calculation formulas being given by equations (21) and (22).
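A brief NumPy sketch of the two fusion variants of step (40) (dimensions and seeds are hypothetical; the summed forms of Eqs. (17) and (20) are shown, and equal spatial sizes are assumed so the bilinear matrices can be added):

```python
import numpy as np

rng = np.random.default_rng(1)
c, s = 16, 6                     # hypothetical: c channels, s spatial positions
X = rng.standard_normal((c, s))  # layer-A features
Y = rng.standard_normal((c, s))  # layer-B features (same spatial size)

# Eq. (20): fusion with a single cross-layer interaction term X^T Y.
Z_single = X.T @ X + Y.T @ Y + X.T @ Y

# Eq. (17): pairwise cross-layer interaction adds the Y^T X term as well.
Z_pair = X.T @ X + Y.T @ Y + X.T @ Y + Y.T @ X

# Because (X^T Y)^T = Y^T X, the pairwise fusion matrix is symmetric.
assert np.allclose(Z_pair, Z_pair.T)
```

The extra Y^T X term is what makes Eq. (17) symmetric in the two layers, whereas the single-interaction fusion of Eq. (20) generally is not.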
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910391455.3A CN110348283A (en) | 2019-05-13 | 2019-05-13 | Fine granularity model recognizing method based on the fusion of bilinear model multilayer feature |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110348283A (en) | 2019-10-18 |
Family
ID=68174251
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910391455.3A Pending CN110348283A (en) | 2019-05-13 | 2019-05-13 | Fine granularity model recognizing method based on the fusion of bilinear model multilayer feature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110348283A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108280490A (en) * | 2018-02-28 | 2018-07-13 | 北京邮电大学 | A kind of fine granularity model recognizing method based on convolutional neural networks |
CN109359684A (en) * | 2018-10-17 | 2019-02-19 | 苏州大学 | Fine granularity model recognizing method based on Weakly supervised positioning and subclass similarity measurement |
CN109685115A (en) * | 2018-11-30 | 2019-04-26 | 西北大学 | A kind of the fine granularity conceptual model and learning method of bilinearity Fusion Features |
Non-Patent Citations (3)
Title |
---|
Chaojian Yu et al.: "Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition", arXiv * |
Sijia Cai et al.: "Higher-order Integration of Hierarchical Convolutional Activations for Fine-grained Visual Categorization", 2017 IEEE International Conference on Computer Vision (ICCV) * |
Wang Haoran: "Research on a multi-task object detection system based on higher-order fusion of multilayer convolutional features", China Masters' Theses Full-text Database, Information Science and Technology * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183602A (en) * | 2020-09-22 | 2021-01-05 | 天津大学 | Multi-layer feature fusion fine-grained image classification method with parallel rolling blocks |
CN112183602B (en) * | 2020-09-22 | 2022-08-26 | 天津大学 | Multi-layer feature fusion fine-grained image classification method with parallel rolling blocks |
CN113011362A (en) * | 2021-03-29 | 2021-06-22 | 吉林大学 | Fine-grained fundus image grading algorithm based on bilinear pooling and attention mechanism |
CN114998964A (en) * | 2022-06-02 | 2022-09-02 | 天津道简智创信息科技有限公司 | Novel license quality detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110852316B | Image tampering detection and localization method using a densely structured convolutional network | |
Chen et al. | A serial image copy-move forgery localization scheme with source/target distinguishment | |
CN106897714B | Video motion detection method based on convolutional neural networks | |
Kong et al. | Detect and locate: Exposing face manipulation by semantic- and noise-level telltales | |
CN105550701B | Real-time image extraction and recognition method and device | |
CN111311563A | Image tampering detection method based on multi-domain feature fusion | |
CN110348283A | Fine-grained vehicle model recognition method based on bilinear model multilayer feature fusion | |
CN104616664B | Audio recognition method based on spectrogram saliency detection | |
CN111080629A | Method for detecting image splicing tampering | |
CN107368787A | Traffic sign recognition algorithm for deep intelligent driving applications | |
CN107767405A | Kernelized correlation filter target tracking method fused with convolutional neural networks | |
CN109815867A | Crowd density estimation and pedestrian flow statistics method | |
CN109657551B | Face detection method based on context information enhancement | |
CN112150450B | Image tampering detection method and device based on a dual-channel U-Net model | |
CN113762138B | Identification method, device, computer equipment and storage medium for fake face pictures | |
CN107909081A | Fast acquisition and fast calibration method for image datasets in deep learning | |
CN101364263A | Method and system for detecting skin texture in an image | |
CN107305691A | Foreground segmentation method and device based on image matching | |
Chen et al. | ASF-Net: Adaptive screening feature network for building footprint extraction from remote-sensing images | |
CN112990282B | Classification method and device for fine-grained small-sample images | |
Li et al. | Image manipulation localization using attentional cross-domain CNN features | |
CN113221770B | Cross-domain pedestrian re-recognition method and system based on multi-feature hybrid learning | |
CN111160356A | Image segmentation and classification method and device | |
CN104978569A | Sparse-representation-based incremental face recognition method | |
CN104200202B | Upper-body detection method based on cumulative perceptron |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191018 |