CN111461181A - Vehicle fine-grained classification method and device - Google Patents

Vehicle fine-grained classification method and device

Info

Publication number: CN111461181A (application CN202010183058.XA)
Authority: CN (China)
Prior art keywords: feature map, global, classification, classification method, neural network
Legal status: Granted (the legal status is an assumption and not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN111461181B (en)
Inventors: 傅慧源 (Fu Huiyuan), 马华东 (Ma Huadong), 王川铭 (Wang Chuanming)
Current and original assignee: Beijing University of Posts and Telecommunications (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Beijing University of Posts and Telecommunications
Priority: CN202010183058.XA
Publication of CN111461181A; application granted; publication of CN111461181B
Legal status: Active

Classifications

    • G06F18/2415 (Physics; computing; electric digital data processing; pattern recognition; classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate)
    • G06N3/045 (Computing arrangements based on biological models; neural networks; architecture; combinations of networks)
    • G06N3/084 (Neural-network learning methods; backpropagation, e.g. using gradient descent)
    • G06V2201/08 (Image or video recognition or understanding; detecting or categorising vehicles)


Abstract

The invention provides a vehicle fine-grained classification method and device. The classification method comprises the following steps: extracting a basic feature map of an input picture through a convolutional neural network; adaptively constructing a global structure graph from the basic feature map; performing association reasoning on the global structure graph with a graph convolutional neural network to form global guidance information; applying the global guidance information to the basic feature map to obtain an enhanced feature map; and inputting the enhanced feature map into a classifier for classification. By reasoning about the associations among different vehicle parts and applying the reasoning result to the basic feature map, the method produces feature maps with stronger expressive power and thus improves the accuracy of fine-grained vehicle image classification. In addition, the method requires no key-region annotation, extracts features quickly, applies to a wide range of scenes, and has low computational cost.

Description

Vehicle fine-grained classification method and device
Technical Field
The invention relates to the technical field of vehicle classification, and in particular to a vehicle fine-grained classification method and device.
Background
Vehicles are an important means of transportation and are closely tied to people's daily lives, and modern vehicle production technology has brought more and more car models into everyday use. Vehicles generally share similar appearance structures, with different models differing only slightly in the positions of lamps, doors and the like, so fine-grained vehicle identification often requires domain experts.
Fine-grained vehicle classification is one of the important tasks in computer vision. It plays an important role in other vision tasks such as vehicle re-identification and vehicle tracking, and can also support city-brain and intelligent-transportation construction. At the same time it is a very challenging problem, because identifying a model requires capturing subtle visual differences between classes or instances that are easily masked by other factors (e.g., viewpoint, lighting, or scene).
The basic flow of existing vehicle fine-grained classification methods can be summarized as follows:
Basic feature extraction: convolutional neural networks (CNNs) are usually used to extract a feature map from the input picture data.
Key-region localization: the positions corresponding to key vehicle regions are found on the feature map by a supervised or unsupervised method. In the supervised setting, every image containing a vehicle to be recognized is annotated with the positions of its key regions, usually four coordinate values giving the top-left and bottom-right corner points of each region; in the unsupervised setting, key-region positions are obtained through an attention mechanism or other unsupervised learning methods.
Fine-grained feature extraction: for each key region found, a neural network is used to extract finer features.
Feature concatenation: the independent fine-grained features of the key regions are concatenated, usually as arrays or numerical matrices, to obtain the feature representation of the vehicle.
Classification: the concatenated vehicle feature representation is fed to a classifier to obtain the final classification result.
However, existing classification methods have the following problems:
When locating key regions, supervised methods usually need extra annotations marking the positions of the lamps, windows, doors and so on before the corresponding regions can be found on the feature map. This greatly increases the cost of labeling data, limits the amount of usable data, and restricts the scenes to which a model can be applied.
In addition, existing methods must extract features for each key region separately, and these repeated computations multiply the computational cost.
Disclosure of Invention
In view of the above, the present invention provides a vehicle fine-grained classification method to solve the problems that existing classification methods require key-region annotations, treat key regions in isolation without modeling their associations, and incur high computational cost.
Based on the above object, the present invention provides a vehicle fine-grained classification method, comprising:
Step 1: extracting a basic feature map of an input picture through a convolutional neural network;
Step 2: adaptively constructing a global structure graph from the basic feature map;
Step 3: performing association reasoning on the global structure graph with a graph convolutional neural network to form global guidance information;
Step 4: applying the global guidance information to the basic feature map to obtain an enhanced feature map;
Step 5: inputting the enhanced feature map into a classifier for classification.
Optionally, the basic feature map is extracted as follows:
The input picture is fed into a residual neural network, and the output of the network's layer3 stage is taken as the basic feature map. Before being input, the picture is resized to the test size by linear interpolation, e.g. to 448 × 448. The depth of the residual network can be chosen according to actual needs, e.g. 34, 50 or 101 layers, with the 101-layer network preferred.
Optionally, constructing the global structure graph comprises constructing its node information and edge information;
the node information is constructed as follows:
the basic feature map is partitioned along its height and width dimensions using predefined region sizes, yielding multi-scale regions; the features of each region are then aggregated by a pooling operation into a one-dimensional vector whose length equals the channel dimension of the basic feature map, giving the node information;
the edge information is constructed as follows:
the associations between the node vectors are computed by matrix multiplication to obtain the edge information;
the association is given by formula I:
V = η(A, Wη) * φ(A, Wφ)    Formula I;
in formula I, V is the edge information, A is the node information, η and φ are two convolutional layers, W denotes the parameters of a convolutional layer, and * denotes matrix multiplication.
Optionally, performing association reasoning on the global structure graph with the graph convolutional neural network comprises:
inputting the global structure graph into a graph convolutional neural network with one graph-convolution layer, and using the edge information to reason about the associations among the features of different regions.
Optionally, the association reasoning follows formula II:
G = V * (A * Wg)    Formula II;
in formula II, G is the global guidance information, V is the edge information, A is the node information, Wg is the parameter of the graph convolutional neural network, and * denotes matrix multiplication.
Optionally, applying the global guidance information to the basic feature map comprises applying it along both the channel dimension and the spatial dimension of the basic feature map;
applying the global guidance information along the channel dimension comprises:
pooling the global structure graph along the spatial direction into a one-dimensional vector, then multiplying this vector with the basic feature map dimension by dimension;
applying the global guidance information along the spatial dimension comprises:
generating a local attention map from the basic feature map through a convolutional layer;
pooling the global guidance information along the channel direction into a single-channel global feature map;
multiplying the basic feature map pixel-wise by the local attention map and the global feature map.
Optionally, inputting the enhanced feature map into a classifier for classification comprises:
aggregating the enhanced feature map through a spatial pooling layer into a feature vector whose length equals the channel dimension of the basic feature map;
and inputting the feature vector into one fully-connected layer for classification.
Optionally, the classification method further comprises:
taking the Stanford Cars Dataset as the training set and processing its images according to steps 1 to 5 to obtain predicted classification results; computing the loss between the predicted results and the ground-truth labels given in the training set with the cross-entropy loss; then propagating the computed loss back to the parameters to be learned by backpropagation and updating them by stochastic gradient descent, yielding the parameters of the final model.
Optionally, the loss is computed according to formula III:
L = -(1/N) * Σ_i Σ_c yic * log(Pic)    Formula III;
in formula III, L is the loss, N is the total number of images, M is the number of classes, Pic is the predicted probability that image i belongs to class c, and yic indicates whether class c is the ground-truth class of image i in the training set: it equals 1 if so and 0 otherwise.
A second aspect of the embodiments of the present invention provides a vehicle fine-grained classification device, comprising a memory, a processor connected to the memory, and a computer program stored on the memory and runnable on the processor; when the processor executes the computer program, it carries out the classification method described above.
From the above, it can be seen that the vehicle fine-grained classification method and device provided by the invention achieve at least the following technical effects:
by reasoning about the associations among different vehicle parts and applying the reasoning result to the basic feature map, the method produces feature maps with stronger expressive power and thus improves the accuracy of fine-grained vehicle image classification; in addition, the method requires no key-region annotation, extracts features quickly, applies to a wide range of scenes, and has low computational cost.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a fine-grained classification method for vehicles according to an embodiment of the present invention;
FIG. 2 is a flow chart of a construction of a global structure diagram according to an embodiment of the present invention;
FIG. 3 is a flowchart of applying the global guidance information to the basic feature map according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the change of vehicle classification accuracy during training according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined.
Fine-grained vehicle classification is one of the important tasks in computer vision. It plays an important role in other vision tasks such as vehicle re-identification and vehicle tracking, and can also support city-brain and intelligent-transportation construction. At the same time it is a very challenging problem, because identifying a model requires capturing subtle visual differences between classes or instances that are easily masked by other factors (e.g., viewpoint, lighting, or scene).
The basic flow of existing vehicle fine-grained classification methods can be summarized as: extracting basic features; finding the positions corresponding to key vehicle regions on the feature map by a supervised or unsupervised method; further extracting fine-grained features of the key regions; concatenating the independent fine-grained features of each key region to obtain the feature representation of the vehicle; and classifying the concatenated vehicle feature representation with a classifier to obtain the final classification result.
However, existing classification methods have the following problems:
when locating key regions, supervised methods usually need extra annotations marking the positions of the lamps, windows, doors and so on before the corresponding regions can be found on the feature map, which greatly increases the cost of labeling data, limits the amount of usable data, and restricts the scenes to which a model can be applied; in addition, existing methods must extract features for each key region separately, and these repeated computations multiply the computational cost.
In view of the above technical problems, the present invention provides a vehicle fine-grained classification method. As shown in Fig. 1, the method comprises:
Step 1: extracting a basic feature map of an input picture through a convolutional neural network, where the basic feature map is in practice a multi-dimensional array produced from the picture data by the convolutional neural network;
Step 2: adaptively constructing a global structure graph from the basic feature map;
Step 3: performing association reasoning on the global structure graph with a graph convolutional neural network to form global guidance information;
Step 4: applying the global guidance information to the basic feature map to obtain an enhanced feature map;
Step 5: inputting the enhanced feature map into a classifier for classification.
By performing association reasoning, the classification method takes the associations among different vehicle parts into account and applies the reasoning result to the basic feature map, so the resulting feature map has stronger expressive power and the accuracy of fine-grained vehicle image classification improves.
The method of extracting the basic feature map is not strictly limited in the invention. For example, in one embodiment a residual neural network with 50 layers may be used: the input picture is fed into the network and the output of its layer3 stage is taken as the basic feature map. In one embodiment, before being input into the 50-layer residual network, the picture may be resized to 448 × 448 by linear interpolation; with a 448 × 448 input, the basic feature map is 28 × 28, and each pixel is a vector of 1024 values.
Fig. 2 is a flow chart of the construction of the global structure graph of the present invention. As shown in Fig. 2, constructing the global structure graph comprises constructing its node information and edge information;
the node information is constructed as follows:
the basic feature map is partitioned along its height and width dimensions using predefined region sizes, yielding multi-scale regions; the features of each region are then aggregated by a pooling operation into a one-dimensional vector whose length equals the channel dimension of the basic feature map, giving the node information;
the edge information is constructed as follows:
the associations between the node vectors are computed by matrix multiplication to obtain the edge information.
The association is given by formula I:
V = η(A, Wη) * φ(A, Wφ)    Formula I;
in formula I, V is the edge information, A is the node information, η and φ are two convolutional layers, W denotes the parameters of a convolutional layer, and * denotes matrix multiplication.
By computing the associations among the node vectors, the edge information of the global structure graph is constructed adaptively; because of this edge information, the constructed graph adapts itself to each input picture, and can therefore respond more robustly to changes in factors such as vehicle pose in the image.
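The graph construction above can be sketched in code. The following NumPy illustration is only a shape-level sketch, not the patented implementation: the region grid sizes, the use of average pooling, and the random matrices standing in for the learned convolutional layers η and φ of formula I are all assumptions made for illustration.

```python
import numpy as np

def build_graph(feature_map, region_sizes=((2, 2), (4, 4)), rng=None):
    """Sketch of global-structure-graph construction.

    feature_map: (C, H, W) basic feature map.
    Returns node matrix A of shape (N, C) and edge matrix V of shape (N, N).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    C, H, W = feature_map.shape
    nodes = []
    for rh, rw in region_sizes:                    # multi-scale region grids
        for i in range(rh):
            for j in range(rw):
                region = feature_map[:, i*H//rh:(i+1)*H//rh,
                                        j*W//rw:(j+1)*W//rw]
                nodes.append(region.mean(axis=(1, 2)))  # pool region -> length-C vector
    A = np.stack(nodes)                            # (N, C) node information
    # random matrices stand in for the two learned layers eta and phi of formula I
    W_eta = rng.standard_normal((C, C)) / np.sqrt(C)
    W_phi = rng.standard_normal((C, C)) / np.sqrt(C)
    V = (A @ W_eta) @ (A @ W_phi).T                # (N, N) edge information
    return A, V
```

With the default 2 × 2 and 4 × 4 grids, a (C, H, W) feature map yields N = 4 + 16 = 20 nodes, so the edge matrix is 20 × 20.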
The invention does not strictly limit how the graph convolutional neural network performs association reasoning on the global structure graph. For example, in one embodiment it may proceed as follows:
the global structure graph is input into a graph convolutional neural network with one graph-convolution layer, and the edge information of the graph is used to reason about the associations among the features of different regions;
this association reasoning uses the formula shown in formula II:
G = V * (A * Wg)    Formula II;
in formula II, G is the global guidance information, V is the edge information, A is the node information, Wg is the parameter of the graph convolutional neural network, and * denotes matrix multiplication.
The node vectors in the global structure graph obtained so far are still isolated: although they are connected through the edge information, no information has yet been exchanged between nodes and edges. Performing association reasoning on the graph with a graph convolutional neural network therefore improves the expressive power of the vehicle's global structure graph.
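Formula II itself is a single matrix product chain; a minimal NumPy sketch follows (matrix shapes are assumptions, and a real layer would typically also apply a nonlinearity):

```python
import numpy as np

def graph_reason(A, V, W_g):
    """One graph-convolution step per formula II: G = V * (A * Wg).

    A:   (N, C) node information
    V:   (N, N) edge information
    W_g: (C, C) learnable parameters
    Returns G of shape (N, C), the global guidance information.
    """
    return V @ (A @ W_g)
```

Each output row mixes the transformed features of every node, weighted by the edge information, which is precisely the node-edge exchange that the isolated node vectors lacked.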
Fig. 3 is a flow chart of applying the global guidance information to the basic feature map in the present invention. As shown in Fig. 3, this comprises applying the global guidance information along both the channel dimension and the spatial dimension of the basic feature map (abbreviated below as the feature map);
specifically, applying the global guidance information along the channel dimension comprises:
pooling the global structure graph along the spatial direction into a one-dimensional vector (gc), then multiplying this vector with the basic feature map dimension by dimension, the multiplication following array broadcasting rules;
applying the global guidance information along the spatial dimension comprises:
generating a local attention map (gs1) from the basic feature map through a convolutional layer;
pooling the global guidance information along the channel direction into a single-channel global feature map (gs2);
multiplying the basic feature map pixel-wise by the local attention map and the global feature map.
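A shape-level NumPy sketch of this two-branch enhancement follows. It simplifies the patent by assuming the global guidance has already been mapped back to the same (C, H, W) layout as the feature map, and it uses a plain weight vector as a stand-in for the 1 × 1 convolution that produces the local attention map gs1 — both are assumptions for illustration, not the patented design.

```python
import numpy as np

def apply_guidance(feature_map, guidance, conv_w):
    """Sketch of step 4: enhance the basic feature map with global guidance.

    feature_map: (C, H, W) basic feature map
    guidance:    (C, H, W) global guidance (assumed already spatially aligned)
    conv_w:      (C,) stand-in for the 1x1 conv producing the attention map
    Returns an enhanced feature map of shape (C, H, W).
    """
    # channel dimension: spatial pooling -> one-dimensional vector g_c
    g_c = guidance.mean(axis=(1, 2))                    # (C,)
    enhanced = feature_map * g_c[:, None, None]         # per-channel scaling
    # spatial dimension: local attention map from a 1x1-conv stand-in ...
    g_s1 = np.einsum('c,chw->hw', conv_w, feature_map)  # (H, W)
    # ... and a single-channel global map pooled along the channel direction
    g_s2 = guidance.mean(axis=0)                        # (H, W)
    return enhanced * g_s1[None] * g_s2[None]           # pixel-wise products
```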
The classifier is not strictly limited in the present invention; for example, any conventional classification method in the art can be used according to actual needs. In one embodiment, the enhanced feature map may be aggregated through a spatial pooling layer into a feature vector whose length equals the channel dimension of the basic feature map,
and this feature vector is then input into a fully-connected layer for classification.
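Sketched in NumPy (the softmax at the end is an assumption — the text only specifies spatial pooling followed by one fully-connected layer):

```python
import numpy as np

def classify(enhanced, W_fc, b_fc):
    """Sketch of step 5: spatial pooling then one fully-connected layer.

    enhanced: (C, H, W) enhanced feature map
    W_fc:     (M, C) fully-connected weights; b_fc: (M,) biases
    Returns class probabilities of length M.
    """
    feat = enhanced.mean(axis=(1, 2))        # spatial pooling -> (C,) vector
    logits = W_fc @ feat + b_fc              # fully-connected layer
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()
```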
In one embodiment, the classification method further comprises:
taking the Stanford Cars Dataset as the training set and processing its images according to steps 1 to 5 to obtain predicted classification results; computing the loss between the predicted results and the ground-truth labels given in the training set with the cross-entropy loss; then propagating the computed loss back to the parameters to be learned by backpropagation and updating them by stochastic gradient descent, yielding the parameters of the final model.
Specifically, in one embodiment, the images in the Stanford Cars Dataset are normalized to a size of 448 × 448 and passed through steps 1 to 5 above to obtain predicted classification results; the loss between the predicted and ground-truth results is computed according to formula III:
L = -(1/N) * Σ_i Σ_c yic * log(Pic)    Formula III;
in formula III, L is the loss, N is the total number of images, M is the number of classes, Pic is the predicted probability that image i belongs to class c, and yic indicates whether class c is the ground-truth class of image i in the training set: it equals 1 if so and 0 otherwise;
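Formula III is the standard averaged cross-entropy; a direct NumPy transcription (probabilities P and one-hot labels Y as inputs are an assumed interface):

```python
import numpy as np

def cross_entropy_loss(P, Y):
    """Formula III: L = -(1/N) * sum_i sum_c y_ic * log(P_ic).

    P: (N, M) predicted class probabilities (rows sum to 1)
    Y: (N, M) one-hot ground-truth labels
    """
    N = P.shape[0]
    return -(Y * np.log(P)).sum() / N
```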
Then stochastic gradient descent is used as the optimizer, and the parameters are updated according to formula IV:
θ ← θ - α * ∂L/∂θ    Formula IV;
in formula IV, θ is the parameter to be updated and α is the learning rate;
the learning rate is set to 0.1 and divided by 10 after every 30 training rounds; the final parameters are obtained after 100 rounds. The change of accuracy during training is shown in Fig. 4, where the ordinate is accuracy and the abscissa is the number of training rounds.
As Fig. 4 shows, the accuracy of this vehicle fine-grained classification method reaches 92.4%.
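The training schedule described here — plain SGD per formula IV with a base rate of 0.1 divided by 10 every 30 rounds over 100 rounds — can be sketched as follows (momentum and weight decay are not specified in the text and are therefore omitted):

```python
def learning_rate(epoch, base_lr=0.1, decay_every=30):
    """Step-decay schedule from the embodiment: divide the learning
    rate by 10 after every 30 training rounds, starting from 0.1."""
    return base_lr * (0.1 ** (epoch // decay_every))

def sgd_step(theta, grad, lr):
    """Formula IV, plain SGD: theta <- theta - lr * dL/dtheta."""
    return theta - lr * grad
```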
The invention also provides a vehicle fine-grained classification device, comprising a memory, a processor connected to the memory, and a computer program stored on the memory and runnable on the processor; when the processor executes the computer program, it carries out the classification method described above.
By reasoning about the associations among different vehicle parts and applying the reasoning result to the basic feature map, the classification method produces feature maps with stronger expressive power and thus improves the accuracy of fine-grained vehicle image classification; in addition, it requires no key-region annotation, extracts features quickly, applies to a wide range of scenes, and has low computational cost.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, features of the above embodiments or of different embodiments may also be combined, steps may be implemented in any order, and many other variations of the different aspects of the invention exist as described above, which are not provided in detail for the sake of brevity.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A vehicle fine-grained classification method is characterized by comprising the following steps:
Step 1: extracting a basic feature map of an input picture through a convolutional neural network;
Step 2: adaptively constructing a global structure graph from the basic feature map;
Step 3: performing association reasoning on the global structure graph with a graph convolutional neural network to form global guidance information;
Step 4: applying the global guidance information to the basic feature map to obtain an enhanced feature map;
Step 5: inputting the enhanced feature map into a classifier for classification.
2. The classification method according to claim 1, wherein the basic feature map is extracted by:
inputting the input picture into a residual neural network and taking the output of the network's layer3 stage as the basic feature map.
3. The classification method according to claim 1, wherein constructing the global structure graph comprises constructing node information and edge information of the global structure graph;
the construction of the node information comprises:
segmenting the basic feature map into regions along the height and width dimensions using predefined region sizes to obtain multi-scale regions, then aggregating the features of each region through a pooling operation in the neural network into a one-dimensional vector whose length equals the channel dimension of the basic feature map, thereby obtaining the node information;
the construction of the edge information comprises:
computing the association relation between the node information by matrix multiplication to obtain the edge information;
the association relation is shown in formula I:
V = η(A, Wη) * φ(A, Wφ)   (formula I)
in formula I, V is the edge information, A is the node information, η and φ are two convolution layers, Wη and Wφ are the parameters of the convolution layers, and * denotes matrix multiplication.
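Claim 3's node and edge construction can be sketched as follows. The region grid sizes and channel count are toy values, the symbol `phi` for the second convolution layer and the transpose in the edge product are assumptions made so that the shapes compose, and the 1×1 convolutions acting on node vectors reduce to plain matrix products:

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W = 32, 8, 8
F = rng.standard_normal((C, H, W))          # basic feature map

def region_nodes(F, grid):
    """Average-pool a grid x grid partition of F into C-dim node vectors."""
    C, H, W = F.shape
    h, w = H // grid, W // grid
    return np.stack([F[:, i*h:(i+1)*h, j*w:(j+1)*w].mean(axis=(1, 2))
                     for i in range(grid) for j in range(grid)])

# Multi-scale regions: a 2x2 and a 4x4 partition -> 4 + 16 = 20 nodes.
A = np.concatenate([region_nodes(F, 2), region_nodes(F, 4)])    # (20, C)

# Formula I: V = eta(A, W_eta) * phi(A, W_phi); each 1x1 conv is a C x C map.
W_eta = rng.standard_normal((C, C))
W_phi = rng.standard_normal((C, C))
V = (A @ W_eta) @ (A @ W_phi).T                                 # (20, 20)
print(A.shape, V.shape)
```

Each entry V[i, j] then scores the pairwise association between regions i and j across both scales.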
4. The classification method according to claim 3, wherein performing association relation reasoning on the global structure graph with the graph convolutional neural network comprises:
inputting the global structure graph into a graph convolutional neural network with one graph convolution layer and performing association relation reasoning on the features of different regions using the edge information.
5. The classification method according to claim 4, wherein the association relation reasoning is shown in formula II:
G = V * (A * Wg)   (formula II)
in formula II, G is the global guidance information, V is the edge information, A is the node information, Wg is a parameter of the graph convolutional neural network, and * denotes matrix multiplication.
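With V and A from claim 3, the single graph-convolution step of formula II is one line of linear algebra. The sketch below uses random toy data in place of the real node and edge information:

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, C = 20, 32
A = rng.standard_normal((n_nodes, C))        # node information
V = rng.standard_normal((n_nodes, n_nodes))  # edge information
W_g = rng.standard_normal((C, C))            # graph-convolution parameters

# Formula II: G = V * (A * Wg) -- both '*' are matrix multiplications.
G = V @ (A @ W_g)                            # (20, 32): global guidance information
print(G.shape)
```

Each row of G is a region feature re-expressed as an edge-weighted mixture of all (transformed) region features, which is what lets the guidance carry global structure.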
6. The classification method according to claim 1, wherein applying the global guidance information to the basic feature map comprises applying the global guidance information along the channel dimension and the spatial dimension of the basic feature map;
applying the global guidance information along the channel dimension of the basic feature map comprises:
pooling the global structure graph along the spatial direction into a one-dimensional vector, and then multiplying the one-dimensional vector with the basic feature map channel by channel;
applying the global guidance information along the spatial dimension of the basic feature map comprises:
generating a local attention map from the basic feature map through a convolution layer;
pooling the global guidance information along the channel direction into a global feature map with a single channel;
multiplying the basic feature map with the local attention map and the global feature map pixel by pixel.
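Claim 6's two branches can be sketched as below. The shapes are illustrative assumptions: the guidance is laid out on the same spatial grid as the feature map so it can be pooled both spatially (channel branch) and channel-wise (spatial branch), and the convolution layer that produces the local attention map is approximated by a 1×1 projection followed by a sigmoid:

```python
import numpy as np

rng = np.random.default_rng(3)
C, H, W = 32, 4, 4
F = rng.standard_normal((C, H, W))      # basic feature map
G = rng.standard_normal((C, H, W))      # guidance, assumed on the same grid

# Channel branch: pool G over the spatial directions into a C-vector,
# then multiply it with F channel by channel.
g_ch = G.mean(axis=(1, 2))              # (C,)
F_ch = F * g_ch[:, None, None]

# Spatial branch: a 1x1-conv-style local attention map from F, plus
# G pooled along the channel direction into a single-channel map.
attn = 1 / (1 + np.exp(-np.tensordot(rng.standard_normal(C), F, axes=1)))  # (H, W)
g_sp = G.mean(axis=0)                   # (H, W)

# Pixel-by-pixel multiplication yields the enhanced feature map.
F_enh = F_ch * attn * g_sp
print(F_enh.shape)
```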
7. The classification method according to claim 1, wherein inputting the enhanced feature map into a classifier for classification comprises:
aggregating the enhanced feature map through a spatial pooling layer into a feature vector whose length equals the channel dimension of the basic feature map;
inputting the feature vector into a single fully-connected layer for classification.
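The classification head of claim 7 is global spatial pooling followed by one fully-connected layer. A minimal sketch with hypothetical shapes (196 is the class count of the Stanford Cars Dataset used in claim 8):

```python
import numpy as np

rng = np.random.default_rng(4)
C, H, W, n_classes = 32, 4, 4, 196

F_enh = rng.standard_normal((C, H, W))        # enhanced feature map

vec = F_enh.mean(axis=(1, 2))                 # spatial pooling -> (C,)
W_fc = rng.standard_normal((n_classes, C))    # fully-connected layer weights
b_fc = np.zeros(n_classes)
logits = W_fc @ vec + b_fc
pred = int(np.argmax(logits))                 # predicted class index
print(logits.shape, pred)
```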
8. The classification method according to claim 1, further comprising:
taking the Stanford Cars Dataset as the training dataset, processing the images in the training dataset according to steps one through five of claim 1 to obtain a predicted classification result, computing the loss between the predicted classification result and the ground-truth result given in the training dataset by the cross-entropy loss method, propagating the computed loss back to the corresponding positions of the parameters to be learned by back-propagation, and updating the parameters by stochastic gradient descent to obtain the parameters of the final model.
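One iteration of claim 8's training loop (forward pass, cross-entropy loss, back-propagation, SGD update) can be sketched on the final linear layer alone. The softmax/cross-entropy gradient used here is the standard closed form, and all shapes and the learning rate are toy values:

```python
import numpy as np

rng = np.random.default_rng(5)
C, M, lr = 32, 5, 0.1                 # feature dim, classes, learning rate
W_fc = rng.standard_normal((M, C)) * 0.01

x = rng.standard_normal(C)            # pooled feature of one training image
y = 2                                 # ground-truth class from the dataset

# Forward: softmax probabilities and cross-entropy loss.
z = W_fc @ x
p = np.exp(z - z.max()); p /= p.sum()
loss_before = -np.log(p[y])

# Backward: d(loss)/dz = p - onehot(y); SGD update of the parameters.
grad_z = p.copy(); grad_z[y] -= 1.0
W_fc -= lr * np.outer(grad_z, x)

# Forward again to see the effect of one step on this sample.
z = W_fc @ x
p = np.exp(z - z.max()); p /= p.sum()
loss_after = -np.log(p[y])
print(loss_after < loss_before)  # True: one SGD step reduces the loss here
```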
9. The classification method according to claim 8, wherein the loss is computed as shown in formula III:
L = -(1/N) Σ_{i=1}^{N} Σ_{c=1}^{M} y_ic · log(P_ic)   (formula III)
in formula III, L denotes the loss, N denotes the total number of images, M denotes the number of classes, P_ic denotes the predicted probability that image i belongs to class c, and y_ic indicates whether the class of image i given in the training dataset is class c: y_ic is 1 when it is and 0 otherwise.
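Formula III is the standard averaged cross-entropy; a direct NumPy transcription on a toy batch (the probabilities and labels are hypothetical):

```python
import numpy as np

# Toy batch: N = 2 images, M = 3 classes.
P = np.array([[0.7, 0.2, 0.1],     # P[i, c]: predicted probability of class c
              [0.1, 0.1, 0.8]])
y = np.array([0, 2])               # ground-truth class of each image

N, M = P.shape
Y = np.eye(M)[y]                   # y_ic: 1 iff image i belongs to class c

# Formula III: L = -(1/N) * sum_i sum_c y_ic * log(P_ic)
L = -np.sum(Y * np.log(P)) / N
print(round(L, 4))  # 0.2899
```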
10. A vehicle fine-grained classification apparatus, characterized by comprising: a memory, a processor connected to the memory, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to perform the classification method according to any one of claims 1 to 9.
CN202010183058.XA 2020-03-16 2020-03-16 Vehicle fine-grained classification method and device Active CN111461181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010183058.XA CN111461181B (en) 2020-03-16 2020-03-16 Vehicle fine-grained classification method and device


Publications (2)

Publication Number Publication Date
CN111461181A true CN111461181A (en) 2020-07-28
CN111461181B CN111461181B (en) 2021-09-07

Family

ID=71684316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010183058.XA Active CN111461181B (en) 2020-03-16 2020-03-16 Vehicle fine-grained classification method and device

Country Status (1)

Country Link
CN (1) CN111461181B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814920A (en) * 2020-09-04 2020-10-23 中国科学院自动化研究所 Fine classification method and system for multi-granularity feature learning based on graph network
CN112150821A (en) * 2020-10-14 2020-12-29 清华大学 Lightweight vehicle detection model construction method, system and device
CN114373080A (en) * 2022-03-22 2022-04-19 中国石油大学(华东) Hyperspectral classification method of lightweight hybrid convolution model based on global reasoning

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160307072A1 (en) * 2015-04-17 2016-10-20 Nec Laboratories America, Inc. Fine-grained Image Classification by Exploring Bipartite-Graph Labels
US20180096457A1 (en) * 2016-09-08 2018-04-05 Carnegie Mellon University Methods and Software For Detecting Objects in Images Using a Multiscale Fast Region-Based Convolutional Neural Network
CN108830823A (en) * 2018-03-14 2018-11-16 西安理工大学 The full-reference image quality evaluating method of frequency-domain analysis is combined based on airspace
CN109359696A (en) * 2018-10-29 2019-02-19 重庆中科云丛科技有限公司 A kind of vehicle money recognition methods, system and storage medium
US20190122111A1 (en) * 2017-10-24 2019-04-25 Nec Laboratories America, Inc. Adaptive Convolutional Neural Knowledge Graph Learning System Leveraging Entity Descriptions
CN110111337A (en) * 2019-04-16 2019-08-09 中山大学 A kind of general human body analytical framework and its analytic method based on figure transfer learning
CN110309732A (en) * 2019-06-13 2019-10-08 浙江大学 Activity recognition method based on skeleton video
CN110443261A (en) * 2019-08-15 2019-11-12 南京邮电大学 A kind of more figure matching process restored based on low-rank tensor
CN110619369A (en) * 2019-09-23 2019-12-27 常熟理工学院 Fine-grained image classification method based on feature pyramid and global average pooling
CN110751212A (en) * 2019-10-21 2020-02-04 南京大学 Efficient fine-grained image identification method on mobile equipment


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
C. WANG et al.: "Global Structure Graph Guided Fine-Grained Vehicle Recognition", ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
J. KRAUSE et al.: "Fine-grained recognition without part annotations", 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
TIANSHUI CHEN et al.: "Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition", arXiv:1807.00505v1 *
YIN LI et al.: "Beyond grids: learning graph representations for visual recognition", NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems *
FENG Yushan et al.: "Fine-grained image classification via top-down attention map segmentation", Journal of Image and Graphics *



Similar Documents

Publication Publication Date Title
Sakkos et al. End-to-end video background subtraction with 3d convolutional neural networks
Li et al. Deep neural network for structural prediction and lane detection in traffic scene
WO2021093435A1 (en) Semantic segmentation network structure generation method and apparatus, device, and storage medium
CN106845487B (en) End-to-end license plate identification method
CN111461181B (en) Vehicle fine-grained classification method and device
CN110119148B (en) Six-degree-of-freedom attitude estimation method and device and computer readable storage medium
CN110163286B (en) Hybrid pooling-based domain adaptive image classification method
CN110633632A (en) Weak supervision combined target detection and semantic segmentation method based on loop guidance
CN113468978B (en) Fine granularity car body color classification method, device and equipment based on deep learning
CN113869138A (en) Multi-scale target detection method and device and computer readable storage medium
Grigorev et al. Depth estimation from single monocular images using deep hybrid network
CN112070174A (en) Text detection method in natural scene based on deep learning
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN113435370B (en) Method and device for acquiring vehicle queuing length based on image feature fusion
Panda et al. Modified ResNet-152 network with hybrid pyramidal pooling for local change detection
Du et al. Adaptive visual interaction based multi-target future state prediction for autonomous driving vehicles
CN114596548A (en) Target detection method, target detection device, computer equipment and computer-readable storage medium
Nguyen et al. Robust stereo data cost with a learning strategy
CN112991239A (en) Image reverse recovery method based on deep learning
CN116452472A (en) Low-illumination image enhancement method based on semantic knowledge guidance
CN115240163A (en) Traffic sign detection method and system based on one-stage detection network
Xu et al. Deep Neural Network‐Based Sports Marketing Video Detection Research
CN111539420A (en) Panoramic image saliency prediction method and system based on attention perception features
Jain et al. A Quarter Century Journey: Evolution of Object Detection Methods
CN114998601B (en) On-line update target tracking method and system based on Transformer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant