CN113723356A - Vehicle re-identification method and device with complementary heterogeneous feature relations - Google Patents

Publication number
CN113723356A
CN113723356A (application CN202111078976.7A; granted as CN113723356B)
Authority
CN
China
Prior art keywords
features
heterogeneous
complementary
layer
cross
Prior art date
Legal status
Granted
Application number
CN202111078976.7A
Other languages
Chinese (zh)
Other versions
CN113723356B (en)
Inventor
李甲
赵佳健
赵一凡
郭鑫
赵沁平
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority claimed from CN202111078976.7A
Publication of CN113723356A
Application granted; publication of CN113723356B
Status: Active

Classifications

    • G06F18/253: Fusion techniques of extracted features (G06F18/00 Pattern recognition; G06F18/20 Analysing; G06F18/25 Fusion techniques)
    • G06N3/045: Combinations of networks (G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08: Learning methods (G06N3/02 Neural networks)


Abstract

The invention discloses a vehicle re-identification method and device with complementary heterogeneous feature relations, comprising the following steps: acquiring a vehicle image, inputting it into a convolutional neural network, and extracting a plurality of heterogeneous features at different levels; constructing a graph relation complementation module and using it to fuse, based on their relations, the heterogeneous features of different levels from low to high, obtaining cross-layer complementary features; extracting local features of the vehicle image through a progressive central pooling operation, and performing heterogeneous relation fusion of the local features with the highest-level complementary feature among the cross-layer complementary features via the graph relation complementation module, obtaining heterogeneous complementary features; and concatenating the cross-layer complementary features and the heterogeneous complementary features to obtain vehicle image representation features comprising multi-level semantic information and multi-level local region information. The invention can be widely applied in computer vision systems in fields such as urban traffic, public safety, and autonomous driving.

Description

Vehicle re-identification method and device with complementary heterogeneous feature relations
Technical Field
The invention relates to the fields of computer vision and multimedia analysis, and in particular to a vehicle re-identification method and device with complementary heterogeneous feature relations.
Background
Given a vehicle image, the goal of vehicle re-identification is to find, in a vehicle database, images of the same vehicle captured by different cameras. Vehicle re-identification has attracted increasing attention from researchers because of its broad application prospects in urban public safety and intelligent transportation systems. With the release of numerous datasets and the application of deep learning, vehicle re-identification has made significant progress in recent years.
Disclosure of Invention
In light of the above practical needs and key issues, the present invention aims to provide a vehicle re-identification method with complementary heterogeneous feature relations: a queried vehicle image is input, different heterogeneous features are extracted through a deep network, a graph relation complementation module realizes relation-based complementation among the features, and the representation features of the vehicle are output.
The invention comprises the following 4 steps:
Step S100: obtain a vehicle image, input it into the convolutional neural network ResNet, and extract a plurality of heterogeneous features at different levels, where the heterogeneous features of different levels range from low level to high level;
Step S200: construct a graph relation complementation module and use it to fuse, based on their relations, the heterogeneous features of different levels from low to high, obtaining cross-layer complementary features, i.e. multi-level complementary features from low level to high level;
Step S300: extract local features of the vehicle image through a progressive central pooling operation, and use the graph relation complementation module to perform heterogeneous relation fusion of the local features with the highest-level complementary feature among the cross-layer complementary features, obtaining heterogeneous complementary features, where the local features contain local region information;
Step S400: concatenate the cross-layer complementary features and the heterogeneous complementary features to obtain vehicle image representation features comprising multi-level semantic information and multi-level local region information; in the training stage of steps S100 to S400, a triplet loss function and a cross-entropy loss function are adopted to supervise and optimize the network.
The biggest difficulty in vehicle re-identification is that the appearance of the same vehicle differs markedly across viewing angles; for example, the front and the rear of a vehicle differ greatly in shape. To address this difficulty, current deep learning methods can be divided into two types: data-driven and feature-complementary. Data-driven methods hold that overcoming this difficulty requires sufficient data; since real data are too costly to collect, such methods generate large amounts of synthetic data using three-dimensional (3D) rendering models or adversarial learning. Current feature-complementation methods mainly supplement global features with highly discriminative local region features. To accurately locate highly discriminative local regions, current methods rely on additional annotation information, such as key-point labels, detection-box labels, and part-segmentation labels, to help the network learn the corresponding local features.
The disclosed method belongs to the class of vehicle re-identification methods that perform feature complementation using heterogeneous features extracted by a deep network, and has two beneficial properties compared with existing feature-complementation networks: 1) no extra image annotation is needed, saving labor cost and improving practicality; 2) it both supplements key local region features and complements semantic information across different levels.
Drawings
The above and other features, advantages, and aspects of various embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements are not necessarily drawn to scale.
FIG. 1 is a general block diagram of an implementation of the vehicle re-identification method with complementary heterogeneous feature relations of the present invention;
FIG. 2 is a flow chart of some embodiments of the vehicle re-identification method with complementary heterogeneous feature relations of the present invention;
FIG. 3 is a diagram of steps S200 and S300 of the vehicle re-identification method with complementary heterogeneous feature relations of the present invention;
FIG. 4 is a flowchart of step S200 of the method of the present invention;
FIG. 5 is a flowchart of step S300 of the method of the present invention.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 2 is a flow chart of some embodiments of the vehicle re-identification method with complementary heterogeneous feature relations of the present invention.
Step S100: acquire a vehicle image, input it into the convolutional neural network ResNet, and extract a plurality of heterogeneous features at different levels.
In some embodiments, the executing body of the vehicle re-identification method with complementary heterogeneous feature relations may obtain a vehicle image, input it into the convolutional neural network ResNet, and extract a plurality of heterogeneous features at different levels, from low level to high level. ResNet is a feature extractor comprising 4 stages. The executing body selects the output features of the last layer of the network blocks in the last 3 stages (the number of network blocks can be adjusted dynamically according to the specific situation) as the heterogeneous features of different levels. For this design, ResNet or variant networks of different architectures (such as ResNeXt and SE-Net) can be adopted, extracting the last-layer features of the network blocks of the corresponding stages.
As an example, the execution subject may select the last 3 stages of extraction to be S2, S3, and S4 in fig. 1. S2 represents the second stage. S3 denotes the third stage. S4 denotes the fourth stage.
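As an illustrative sketch of this extraction step (not part of the claimed method; the channel widths follow ResNet-50 and the reduced dimension d is an assumption, since the patent does not fix it), per-stage feature maps can be pooled and reduced into heterogeneous feature vectors as follows:

```python
import numpy as np

def gap(feature_map):
    """Global average pooling: (C, H, W) -> (C,)."""
    return feature_map.mean(axis=(1, 2))

def compress_stage(feature_map, w):
    """Compress one stage's feature map into a reduced-dimension vector.
    A 1x1 convolution applied after GAP is just a matrix multiply."""
    return w @ gap(feature_map)

rng = np.random.default_rng(0)
# Hypothetical outputs of ResNet stages S2-S4 (ResNet-50 widths 512/1024/2048).
stages = {"S2": rng.standard_normal((512, 28, 28)),
          "S3": rng.standard_normal((1024, 14, 14)),
          "S4": rng.standard_normal((2048, 7, 7))}
d = 256  # assumed common reduced dimension
vectors = {name: compress_stage(f, rng.standard_normal((d, f.shape[0])) / np.sqrt(f.shape[0]))
           for name, f in stages.items()}
```

Each stage thus yields one fixed-length vector, so features of heterogeneous spatial sizes become directly comparable.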
And S200, constructing a graph relation complementation module, and fusing a plurality of heterogeneous features of different levels from a low level to a high level based on a relation by using the graph relation complementation module to obtain the cross-layer complementation features.
Fig. 4 is a flowchart of step S200 of the vehicle re-identification method with complementary heterogeneous feature relations of the present invention. The flow of S200 is as follows:
and step S210, carrying out graph relation complementation on the characteristics in the S2 stage in ResNet. And the executing main body constructs a graph relation complementation module, and the graph relation complementation module is utilized to fuse a plurality of heterogeneous features of different levels from a low level to a high level based on the relation to obtain the cross-layer complementation features. Wherein, the cross-layer complementary features are multi-layer complementary features from a low layer to a high layer. May include the steps of:
First, pairwise dot products of the heterogeneous feature vectors V are computed and thresholded by a preset value α, giving the relation coefficient matrix A of the heterogeneous feature vectors:

A_ij = V_i · V_jᵀ, if V_i · V_jᵀ > α; otherwise A_ij = 0

where A denotes the relation coefficient matrix, A_ij its entry relating V_i and V_j, V the heterogeneous feature vectors, V_i the i-th heterogeneous feature vector, V_j the j-th heterogeneous feature vector (i and j are indices), V_jᵀ the transpose of the j-th heterogeneous feature vector, and α the preset threshold.
And secondly, regularizing the relation coefficient matrix A to obtain a regularized relation coefficient matrix.
And thirdly, multiplying the regularized relation coefficient matrix by a heterogeneous characteristic vector V, and performing characteristic complementation based on the relation to obtain a cross-layer complementary characteristic.
As an example, the feature map extracted at stage S2 within ResNet is compressed into a vector v by global average pooling (GAP), and a 1×1 convolutional layer is used to reduce the dimensionality of the feature vector. Then all the heterogeneous feature vectors are concatenated into a single heterogeneous feature vector V using the concatenation operation C(·):

V = C(W_1 V_1, …, W_k V_k)

where V denotes the concatenated heterogeneous feature vector, C(·) the concatenation operation, W_k the k-th learnable parameter matrix of a 1×1 convolutional layer, and V_k the k-th heterogeneous feature vector (k is an index).
Then, through the graph relation complementation module, each feature vector fuses the relation-based complementary information of the other vectors. In the module, pairwise dot products of the heterogeneous feature vectors V are computed and thresholded by the preset value α, giving the relation coefficient matrix A. Next, L1 regularization (L1-norm regularization) is applied to A: adding the L1 norm to the cost function sparsifies the learned coefficients, which facilitates feature extraction, and the values of each row of the relation matrix are constrained to lie in (0, 1). Graph regularization (Kipf and Welling, 2016) is then applied so that the relation matrix approximates a Laplacian matrix; finally, the regularized relation coefficient matrix is multiplied by the heterogeneous feature matrix V, performing relation-based feature complementation to obtain the cross-layer complementary features.
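A minimal numerical sketch of the relation-based complementation described above (the threshold value and the per-row L1 normalization scheme are assumptions consistent with the text, not the patent's exact implementation):

```python
import numpy as np

def relation_matrix(V, alpha):
    """Pairwise dot products, thresholded at alpha, then L1-normalized per row
    so each row of relation coefficients sums to 1."""
    A = V @ V.T                      # A_ij = V_i . V_j
    A = np.where(A > alpha, A, 0.0)  # suppress weak relations below threshold
    row_sums = A.sum(axis=1, keepdims=True)
    return A / np.maximum(row_sums, 1e-12)

V = np.array([[1.0, 0.0],   # three toy heterogeneous feature vectors
              [0.8, 0.6],
              [0.0, 1.0]])
A = relation_matrix(V, alpha=0.5)
complemented = A @ V  # each vector absorbs relation-weighted information
```

Multiplying A by V mixes each vector with its strongly related neighbors while the threshold keeps unrelated vectors (here, the first and third) from influencing each other.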
In some optional implementation manners of some embodiments, the using the graph relationship complementation module to perform relationship-based fusion from a low level to a high level on a plurality of heterogeneous features of different levels to obtain the cross-layer complementary feature may further include the following steps:
the cross-layer complementary features are multiplied by a learnable parameter matrix W and processed through a neuron removal layer dropout, a Batch regularization layer Batch Norm, and an activation function ReLU, as shown in fig. 3. The cross-layer complementary features are further enhanced using the following equations. Meanwhile, in order to prevent the gradient from disappearing, the method adds a residual error connection:
Figure BDA0003263254520000051
wherein ,
Figure BDA0003263254520000052
representing cross-layer complementary features with constant feature dimensions. ReLU () represents an activation function. BN () represents batch regularization layer operations. Dropout () represents the neuron removal layer operation. A denotes a relational coefficient matrix. V denotes a heterogeneous feature vector. WaRepresenting a first learnable parameter matrix. WbRepresenting a second learnable parameter matrix.
Figure BDA0003263254520000053
Cross-layer complementary features representing feature dimension compression.
In the graph relation complementation module, the learnable parameter matrix is used twice. The first use leaves the dimensionality of the feature vectors unchanged, preserving the original properties of the cross-layer complementary features; the second use reduces the dimensionality of the heterogeneous feature vectors, lowering the complexity of subsequent operations.
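The two-matrix design can be sketched end to end as follows (an inference-time simplification: BN and Dropout are training-time layers, so they are omitted here, which is an assumption of this sketch rather than the patent's formulation):

```python
import numpy as np

def graph_relation_complement(V, Wa, Wb, alpha=0.0):
    """Sketch of the graph relation complementation module:
    relation matrix -> dimension-preserving Wa with ReLU and residual ->
    dimension-compressing Wb."""
    A = V @ V.T
    A = np.where(A > alpha, A, 0.0)
    A = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    enhanced = np.maximum(A @ V @ Wa, 0.0) + V  # ReLU + residual, dims unchanged
    return enhanced @ Wb                        # second matrix compresses dims

rng = np.random.default_rng(1)
V = rng.standard_normal((5, 32))           # 5 heterogeneous vectors, dim 32
Wa = np.eye(32)                            # dimension-preserving (toy choice)
Wb = rng.standard_normal((32, 16)) / 8.0   # dimension-compressing
out = graph_relation_complement(V, Wa, Wb)
```

Note that Wa maps 32 to 32 while Wb maps 32 to 16, mirroring the stated roles of the two parameter matrices.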
Optionally, the constructing graph relationship complementation module performs relationship-based fusion from a low level to a high level on a plurality of heterogeneous features of different levels by using the graph relationship complementation module to obtain the cross-layer complementary feature, and may include the following steps:
First, semantic-information complementation is performed on the low-level heterogeneous features through the graph relation complementation module, and the complemented heterogeneous feature is obtained by the following formula:

V̂ = G(C(W_1 V_1, …, W_k V_k))

where V̂ denotes the complemented heterogeneous feature, G() the graph relation complementation module, C() the concatenation operation, W_k the k-th learnable parameter matrix, and V_k the k-th heterogeneous feature vector (k is an index).

Second, the complemented heterogeneous features are concatenated into a feature vector and input to the next layer, where they undergo feature fusion with the higher-level heterogeneous features; the complemented heterogeneous feature in the next layer is obtained by the following formula:

V′ = C(W′_1 V′_1, …, W′_u V′_u, W′_{u+1} V̂)

where V′ denotes the complemented heterogeneous feature in the next layer, W′_u the u-th learnable parameter matrix in the next layer, V′_u the u-th heterogeneous feature vector in the next layer (u is an index), W′_{u+1} the (u+1)-th learnable parameter matrix in the next layer, and V̂ the complemented heterogeneous feature from the previous layer.
Step S220: perform graph relation complementation between the features of stage S3 and the complementary features of step S210. The feature map extracted at S3 within ResNet is processed as in step S210 to generate feature vectors; all feature vectors of stage S3 are then concatenated with the complemented heterogeneous feature V̂ output by step S210, generating the complemented heterogeneous feature V′ of the next layer (e.g. S4). Then, through the graph relation complementation module, each feature vector fuses the relation-based complementary information of the other vectors and is passed as a complementary vector to S4.
Step S230: perform relation complementation between the features of stage S4 and the complementary features of step S220, separate the concatenated features, and pass the highest-level complementary feature to step S300. After the feature map extracted at S4 within ResNet undergoes the same operations as in step S220, complementary feature vectors fusing semantic information of different levels are obtained; a separation operation then splits out the feature vectors representing the different levels. The feature vector representing the highest-level information is passed to step S300 for further complementation with the local region features, while the other vectors are passed to step S400 as part of the final feature.
Step S300: extract local features of the vehicle image through a progressive central pooling operation, and use the graph relation complementation module to perform heterogeneous relation fusion of the local features with the highest-level complementary feature among the cross-layer complementary features, obtaining the heterogeneous complementary features.
In some embodiments, the executing body may extract local features of the vehicle image through a progressive central pooling operation and use the graph relation complementation module to perform heterogeneous relation fusion of the local features with the highest-level complementary feature among the cross-layer complementary features, obtaining the heterogeneous complementary features; the local features contain local region information.
Under the graph relation complementation module, information complementation is performed between the local features and the highest-level complementary feature among the cross-layer complementary features, so that the highest-level complementary feature, which already fuses low-level semantic information, is further complemented with local region information, yielding the heterogeneous complementary features.
Fig. 5 is a flowchart of step S300 of the vehicle re-identification method with complementary heterogeneous feature relations of the present invention. The flow of S300 is as follows:
and step S310, acquiring local region characteristics by adopting progressive central pooling operation and mapping operation. The progressive central pooling operation described above may include the steps of:
the method comprises the steps of firstly, based on priori knowledge, adopting progressive central pooling operation, taking the center of an image with the size of X multiplied by Y as a fixed point, gradually enlarging a sensing area, and extracting S mask tensors M of local areas with different sizes based on the image center. The priori knowledge is that in vehicle weight recognition, a vehicle is located in the middle of an image, and S mask tensors M of local areas based on the center of the image and with different sizes are extracted through the following formula:
Figure BDA0003263254520000071
where M denotes a mask tensor.
Figure BDA0003263254520000072
Representing the k mask tensor. x represents the abscissa of the position coordinates of the pixel points in the image. y represents the ordinate of the position coordinate of the pixel point in the image. k represents a serial number. X represents the width of the image. Y represents the height of the image.
Figure BDA0003263254520000073
Representing the square of the radius of the kth local area. R represents a moietyThe radius of the area. RkThe radius of the kth local area is indicated. R has a value range of
Figure BDA0003263254520000081
And is
Figure BDA0003263254520000082
The value range of k is that k is less than or equal to S;
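The mask construction can be sketched as follows (the linearly growing radius schedule is an assumption; the patent states only that the regions grow progressively from the image center):

```python
import numpy as np

def central_masks(X, Y, S):
    """Progressive central pooling masks: S concentric circular masks centered
    at the image center, with radii growing linearly up to the half-diagonal."""
    x, y = np.meshgrid(np.arange(X), np.arange(Y), indexing="ij")
    r2 = (x - X / 2) ** 2 + (y - Y / 2) ** 2       # squared distance to center
    r_max = np.sqrt((X / 2) ** 2 + (Y / 2) ** 2)   # half-diagonal of the image
    radii = [(k / S) * r_max for k in range(1, S + 1)]
    return np.stack([(r2 <= R ** 2).astype(np.float32) for R in radii])

M = central_masks(16, 16, S=3)
```

Because the radii increase monotonically, the masks are nested: each larger region contains all smaller ones, and the last mask covers the full image.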
Second, exploiting the position invariance of convolutional neural networks, the corresponding region feature maps are extracted from the global features through a mapping operation and a global pooling operation, and a linear transformation is applied via a learnable parameter matrix and a learnable offset vector, giving the local feature map F_r by the following formula:

F_r^k = W_k φ(P(F_g, M_k)) + B_k,  k = 1, …, S

where F_r^k denotes the k-th local feature map, F_r the local feature map, W the learnable parameter matrix, W_k the k-th learnable parameter matrix, φ the global pooling operation, P() the mapping operation, F_g the global feature map, M_k the k-th mask tensor, B the learnable offset vector, B_k the k-th learnable offset vector, k an index, and S the total number of mask tensors.
Step S320: complement the local features with the highest-level feature from step S230, and pass the result to step S400. Each local feature is turned into a feature vector by the same operations as in S210; all these feature vectors are then concatenated with the feature vector representing the highest-level information output by S230 into a vector matrix. Through the graph relation complementation module, each feature vector undergoes relation-based fusion and complementation, forming complementary vectors that include local key information, which are passed to S400.
Step S400: concatenate the cross-layer complementary features and the heterogeneous complementary features to obtain vehicle image representation features comprising multi-level semantic information and multi-level local region information.
In some embodiments, the executing body may concatenate the cross-layer complementary features and the heterogeneous complementary features to obtain vehicle image representation features comprising multi-level semantic information and multi-level local region information. In the training stage of steps S100 to S400, a triplet loss function and a cross-entropy loss function are used to supervise and optimize the network.
As an example, all cross-layer complementary features and local-region complementary features are compressed in dimensionality according to their importance: higher-level features, and features containing larger-region information, retain higher dimensionality. They are then concatenated into the final feature vector; in the training stage, a triplet loss function and a cross-entropy loss function are adopted to supervise and optimize the network.
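The final concatenation and the two supervision losses can be sketched as follows (the per-part dimensions and the triplet margin are illustrative assumptions; the patent specifies neither):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet loss on Euclidean distances (margin is an assumption)."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(d_ap - d_an + margin, 0.0)

def cross_entropy(logits, label):
    """Softmax cross-entropy for the identity-classification branch."""
    z = logits - logits.max()                 # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

# Final representation: concatenate cross-layer and heterogeneous complementary
# features, with higher-level parts given higher dimensionality.
parts = [np.ones(128), np.ones(256), np.ones(512)]
feature = np.concatenate(parts)
loss = triplet_loss(feature, feature, np.zeros_like(feature)) \
       + cross_entropy(np.array([2.0, 1.0, 0.1]), label=0)
```

In training, the triplet term pulls same-identity features together while the cross-entropy term supervises an identity classifier on the same concatenated feature.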
It will be understood that the units described in a vehicle re-identification apparatus with complementary heterogeneous feature relations correspond to the respective steps of the method described with reference to FIG. 2. Therefore, the operations, features, and beneficial effects described above for the method also apply to the apparatus and the units it comprises, and are not repeated here.
The foregoing description covers only preferred embodiments of the disclosure and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combinations of the above features, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the inventive concept; for example, technical solutions formed by replacing the above features with (but not limited to) features of similar function disclosed in the embodiments of the present disclosure.

Claims (7)

1. A vehicle re-identification method with complementary heterogeneous feature relations, comprising the following steps:
step S100, obtaining a vehicle image, inputting the vehicle image into a convolutional neural network ResNet, and extracting to obtain a plurality of heterogeneous features of different layers, wherein the heterogeneous features of the different layers are heterogeneous features from a low layer to a high layer;
step S200, constructing a graph relation complementation module, and fusing a plurality of heterogeneous features of different levels from a low level to a high level by using the graph relation complementation module based on a relation to obtain a cross-layer complementation feature, wherein the cross-layer complementation feature is a multi-layer complementation feature from the low level to the high level;
step S300, extracting local features of the vehicle image through progressive central pooling operation, and performing heterogeneous relation fusion on the local features and the complementary features of the highest level in the cross-layer complementary features by using a graph relation complementary module to obtain heterogeneous complementary features, wherein the local features comprise local region information;
and step S400, concatenating the cross-layer complementary features and the heterogeneous complementary features to obtain vehicle image representation features comprising multi-level semantic information and multi-level local region information, wherein in the training stage of steps S100 to S400, a triplet loss function and a cross-entropy loss function are adopted to supervise and optimize the network.
2. The method of claim 1, wherein using the graph relation complementation module to fuse, based on their relations, the plurality of heterogeneous features at different layers from low layers to high layers to obtain the cross-layer complementary features comprises:
obtaining a relation coefficient matrix A of the heterogeneous feature vectors by performing pairwise dot-product operations on the heterogeneous feature vectors V and applying a preset threshold α, using the following formula:

A_ij = V_i · V_j^T, if V_i · V_j^T ≥ α; A_ij = 0, otherwise

wherein A represents the relation coefficient matrix, A_ij represents the relation coefficient between V_i and V_j, V represents the heterogeneous feature vectors, V_i denotes the i-th heterogeneous feature vector, V_j denotes the j-th heterogeneous feature vector, V_j^T denotes the transpose of the j-th heterogeneous feature vector, i and j denote sequence numbers, and α denotes the preset threshold;
regularizing the relation coefficient matrix A to obtain a regularized relation coefficient matrix;
and multiplying the regularized relation coefficient matrix by the heterogeneous feature vectors V to perform relation-based feature complementation, obtaining the cross-layer complementary features.
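The relation-based complementation of claim 2 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the exact thresholding rule (hard zeroing below α) and the regularization (row-wise normalization) are assumptions, not the patented implementation.

```python
import numpy as np

def relation_matrix(V, alpha=0.1):
    """Pairwise dot-product relation coefficients with preset threshold alpha.

    V: (n, d) array stacking n heterogeneous feature vectors.
    Coefficients below alpha are zeroed, then each row is normalized
    (one plausible reading of the claim's 'regularization').
    """
    A = V @ V.T                              # A[i, j] = V_i . V_j^T
    A[A < alpha] = 0.0                       # suppress weak relations
    row_sums = A.sum(axis=1, keepdims=True)
    return A / np.maximum(row_sums, 1e-8)    # row-wise regularization

def relation_complement(V, alpha=0.1):
    """Relation-based feature complementation: multiply A back onto V."""
    return relation_matrix(V, alpha) @ V
```

Each output row is then a relation-weighted mixture of the input heterogeneous features, so information flows between layers in proportion to their pairwise similarity.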
3. The method of claim 2, wherein using the graph relation complementation module to fuse, based on their relations, the plurality of heterogeneous features at different layers from low layers to high layers to obtain the cross-layer complementary features further comprises:
multiplying the cross-layer complementary features by learnable parameter matrices W, and further enhancing the cross-layer complementary features through a neuron removal layer Dropout, a batch regularization layer Batch Norm, and an activation function ReLU, using the following formulas:

V̂ = ReLU(BN(Dropout(A · V · W_a)))
Ṽ = ReLU(BN(Dropout(V̂ · W_b)))

wherein V̂ represents the cross-layer complementary feature with invariant feature dimension, Ṽ represents the cross-layer complementary feature with compressed feature dimension, ReLU() represents the activation function operation, BN() represents the batch regularization layer operation, Dropout() represents the neuron removal layer operation, A represents the relation coefficient matrix, V represents the heterogeneous feature vectors, W_a represents the first learnable parameter matrix, and W_b represents the second learnable parameter matrix.
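A minimal sketch of the enhancement step in claim 3, with inference-mode stand-ins for the Dropout and Batch Norm layers. The composition order and the two-stage use of W_a (dimension-invariant) and W_b (dimension-compressing) are assumptions inferred from the claim's symbol definitions:

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(X, eps=1e-5):
    # Per-feature standardization; a simplified stand-in for a trained
    # batch regularization layer (no learned scale/shift here).
    return (X - X.mean(axis=0)) / np.sqrt(X.var(axis=0) + eps)

def dropout(X, p=0.5, training=False):
    # At inference, dropout is the identity; the training branch is
    # shown only for completeness.
    if not training:
        return X
    mask = rng.random(X.shape) > p
    return X * mask / (1.0 - p)

def enhance(A, V, Wa, Wb):
    """ReLU(BN(Dropout(.))) wrapped around two learnable projections:
    Wa keeps the feature dimension, Wb compresses it."""
    H = np.maximum(batch_norm(dropout(A @ V @ Wa)), 0.0)  # V-hat
    return np.maximum(batch_norm(dropout(H @ Wb)), 0.0)   # V-tilde
```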
4. The method of claim 3, wherein constructing the graph relation complementation module and using the graph relation complementation module to fuse, based on their relations, the heterogeneous features at different layers from low layers to high layers to obtain the cross-layer complementary features comprises:
performing semantic information complementation on the low-layer heterogeneous features through the graph relation complementation module, and obtaining the complemented heterogeneous feature by the following formula:

V̂ = G(C(W_1 · V_1, …, W_k · V_k))

wherein V̂ represents the complemented heterogeneous feature, G() represents the graph relation complementation module processing, C() represents the splicing operation, W represents a learnable parameter matrix, W_1 denotes the 1st learnable parameter matrix, V_1 denotes the 1st heterogeneous feature vector, W_k denotes the k-th learnable parameter matrix, k denotes a sequence number, and V_k denotes the k-th heterogeneous feature vector;
splicing the complemented heterogeneous feature into a feature vector, inputting the feature vector into the next layer, and performing feature fusion with the heterogeneous features of the higher layer, the complemented heterogeneous feature in the next layer being obtained by the following formula:

V' = G(C(W'_1 · V'_1, …, W'_u · V'_u, W'_{u+1} · V̂))

wherein V' represents the complemented heterogeneous feature in the next layer, C() represents the splicing operation, W'_1 denotes the 1st learnable parameter matrix in the next layer, V'_1 denotes the 1st heterogeneous feature vector in the next layer, W'_u denotes the u-th learnable parameter matrix in the next layer, u denotes a sequence number, V'_u denotes the u-th heterogeneous feature vector in the next layer, W'_{u+1} denotes the (u+1)-th learnable parameter matrix in the next layer, and V̂ represents the complemented heterogeneous feature from the previous layer.
5. The method of claim 4, wherein the progressive central pooling operation comprises the following steps:
based on prior knowledge, adopting the progressive central pooling operation, taking the center of an image of size X × Y as a fixed point, gradually enlarging the perception region, and extracting S mask tensors M of image-center-based local regions of different sizes, wherein the prior knowledge is that, in vehicle re-identification, the vehicle is located in the middle of the image, and the mask tensors M of the image-center-based local regions of different sizes are extracted by the following formula:

M_k(x, y) = 1, if (x − X/2)² + (y − Y/2)² ≤ R_k²; M_k(x, y) = 0, otherwise

wherein M denotes the mask tensors, M_k denotes the k-th mask tensor, x denotes the abscissa of the position coordinate of a pixel point in the image, y denotes the ordinate of the position coordinate of a pixel point in the image, k denotes a sequence number, X denotes the width of the image, Y denotes the height of the image, R denotes the radius of a local region, R_k denotes the radius of the k-th local region, and R_k² denotes the square of the radius of the k-th local region; the radii increase progressively, with R_1 < R_2 < … < R_S ≤ min(X, Y)/2, and the value range of k is 1 ≤ k ≤ S;
considering the position invariance of the convolutional neural network, extracting the corresponding region feature maps from the global feature map through a mapping operation and a global pooling operation, and performing a linear transformation through a learnable parameter matrix and a learnable offset vector to obtain the local feature maps F_r, using the following formula:

F_r^k = W_k · θ(P(F_g, M_k)) + B_k, k = 1, …, S

wherein F_r^k denotes the k-th local feature map, F_r represents the local feature maps, W represents a learnable parameter matrix, W_k denotes the k-th learnable parameter matrix, θ() represents the global pooling operation, P() represents the mapping operation, F_g represents the global feature map, M_k denotes the k-th mask tensor, B represents a learnable offset vector, B_k denotes the k-th learnable offset vector, k denotes the sequence number, and S denotes the total number of mask tensors.
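The progressive central pooling of claim 5 can be sketched as concentric circular masks followed by masked average pooling and a per-region linear map. The linear radius schedule is an assumption for illustration, since the claim does not fix the radii; masked averaging is likewise one plausible reading of θ(P(·)):

```python
import numpy as np

def central_masks(X, Y, S):
    """S concentric circular masks centred on an X-wide, Y-tall image,
    with radii growing progressively up to half the shorter side
    (linear schedule assumed)."""
    ys, xs = np.mgrid[0:Y, 0:X]
    d2 = (xs - X / 2) ** 2 + (ys - Y / 2) ** 2
    r_max = min(X, Y) / 2
    radii = [r_max * (k + 1) / S for k in range(S)]
    return np.stack([(d2 <= r * r).astype(np.float32) for r in radii])

def local_features(Fg, masks, Wk, Bk):
    """Masked average pooling of the global feature map Fg (C, Y, X),
    then a per-region linear map: F_r^k = W_k * pool(Fg, M_k) + B_k."""
    out = []
    for k, M in enumerate(masks):
        pooled = (Fg * M).sum(axis=(1, 2)) / max(M.sum(), 1.0)
        out.append(Wk[k] @ pooled + Bk[k])
    return np.stack(out)
```

Because each mask contains the previous one, the pooled vectors form a coarse-to-fine description of the image centre, where the vehicle is assumed to sit.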
6. The method of claim 5, wherein using the graph relation complementation module to perform heterogeneous relation fusion on the local features and the highest-layer complementary feature among the cross-layer complementary features to obtain the heterogeneous complementary features comprises:
performing information complementation between the local features and the highest-layer complementary feature among the cross-layer complementary features under the graph relation complementation module, so that the highest-layer complementary feature, which fuses low-layer semantic information, is further complemented with the local region information to obtain the heterogeneous complementary features.
7. A vehicle re-identification device with complementary heterogeneous feature relationships, comprising:
an acquisition unit, configured to obtain a vehicle image, input the vehicle image into a convolutional neural network ResNet, and extract a plurality of heterogeneous features at different layers, wherein the plurality of heterogeneous features at different layers are heterogeneous features from low layers to high layers;
a fusion unit, configured to construct a graph relation complementation module, and use the graph relation complementation module to fuse, based on their relations, the plurality of heterogeneous features at different layers from low layers to high layers to obtain cross-layer complementary features, wherein the cross-layer complementary features are multi-layer complementary features from low layers to high layers;
a heterogeneous relation fusion unit, configured to extract local features of the vehicle image through a progressive central pooling operation, and perform heterogeneous relation fusion on the local features and the highest-layer complementary feature among the cross-layer complementary features by using the graph relation complementation module to obtain heterogeneous complementary features, wherein the local features comprise local region information;
and a splicing unit, configured to splice the cross-layer complementary features and the heterogeneous complementary features to obtain a vehicle image representation feature comprising multi-layer semantic information and multi-layer local region information, wherein, in the training stage, a triplet loss function and a cross-entropy loss function are adopted to supervise and optimize the network.
CN202111078976.7A 2021-09-15 2021-09-15 Vehicle re-identification method and device with complementary heterogeneous characteristic relationships Active CN113723356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111078976.7A CN113723356B (en) 2021-09-15 2021-09-15 Vehicle re-identification method and device with complementary heterogeneous characteristic relationships

Publications (2)

Publication Number Publication Date
CN113723356A true CN113723356A (en) 2021-11-30
CN113723356B CN113723356B (en) 2023-09-19

Family

ID=78683911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111078976.7A Active CN113723356B (en) 2021-09-15 2021-09-15 Vehicle re-identification method and device with complementary heterogeneous characteristic relationships

Country Status (1)

Country Link
CN (1) CN113723356B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114495571A (en) * 2022-04-18 2022-05-13 科大天工智能装备技术(天津)有限公司 Parking space state detection method and device based on cross-layer coupling network and storage medium
CN115984948A (en) * 2023-03-20 2023-04-18 广东广新信息产业股份有限公司 Face recognition method applied to temperature sensing and electronic equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN112836677A (en) * 2021-03-02 2021-05-25 西安建筑科技大学 Weak supervision vehicle heavy identification method using deep learning
WO2021114809A1 (en) * 2020-05-27 2021-06-17 平安科技(深圳)有限公司 Vehicle damage feature detection method and apparatus, computer device, and storage medium
CN113343974A (en) * 2021-07-06 2021-09-03 国网天津市电力公司 Multi-modal fusion classification optimization method considering inter-modal semantic distance measurement

Non-Patent Citations (1)

Title
窦鑫泽; 盛浩; 吕凯; 刘洋; 张洋; 吴玉彬; 柯韦: "Vehicle re-identification optimization algorithm based on high-confidence local features", Journal of Beijing University of Aeronautics and Astronautics, no. 09 *

Also Published As

Publication number Publication date
CN113723356B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
Zhou et al. Split depth-wise separable graph-convolution network for road extraction in complex environments from high-resolution remote-sensing images
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108154194B (en) Method for extracting high-dimensional features by using tensor-based convolutional network
CN107239730B (en) Quaternion deep neural network model method for intelligent automobile traffic sign recognition
CN112380921A (en) Road detection method based on Internet of vehicles
CN109784197B Pedestrian re-identification method based on hole convolution and an attention learning mechanism
CN106127197B (en) Image saliency target detection method and device based on saliency label sorting
CN104200228B (en) Recognizing method and system for safety belt
CN112926396A (en) Action identification method based on double-current convolution attention
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN114663670A (en) Image detection method and device, electronic equipment and storage medium
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN110674741A (en) Machine vision gesture recognition method based on dual-channel feature fusion
CN113723356A (en) Heterogeneous characteristic relation complementary vehicle weight recognition method and device
CN111652273B (en) Deep learning-based RGB-D image classification method
CN112801015A (en) Multi-mode face recognition method based on attention mechanism
CN112861970B (en) Fine-grained image classification method based on feature fusion
CN112581409A (en) Image defogging method based on end-to-end multiple information distillation network
CN112949740A (en) Small sample image classification method based on multilevel measurement
CN110009051A (en) Feature extraction unit and method, DCNN model, recognition methods and medium
CN113269224A (en) Scene image classification method, system and storage medium
CN114332889A (en) Text box ordering method and text box ordering device for text image
CN112509021A (en) Parallax optimization method based on attention mechanism
CN114596589A (en) Domain-adaptive pedestrian re-identification method based on interactive cascade lightweight transformations
CN116630917A (en) Lane line detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant