CN113420742A - Global attention network model for vehicle re-identification - Google Patents

Global attention network model for vehicle re-identification

Info

Publication number
CN113420742A
Authority
CN
China
Prior art keywords
global
channels
network model
global attention
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110977958.6A
Other languages
Chinese (zh)
Other versions
CN113420742B (en)
Inventor
Pang Xiyu (庞希愚)
Tian Xin (田鑫)
Wang Cheng (王成)
Jiang Gangwu (姜刚武)
Zheng Yanli (郑艳丽)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jiaotong University
Original Assignee
Shandong Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Jiaotong University filed Critical Shandong Jiaotong University
Priority to CN202110977958.6A priority Critical patent/CN113420742B/en
Publication of CN113420742A publication Critical patent/CN113420742A/en
Application granted granted Critical
Publication of CN113420742B publication Critical patent/CN113420742B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of vehicle identification, and in particular to a global attention network model for vehicle re-identification, which comprises a backbone network, a local branch that divides the feature map into two parts, and two global branches with global attention modules; the backbone network is split into 3 branches; the global attention network model extracts feature vectors using global average pooling so as to cover the information of the entire vehicle; the local branch divides the feature map horizontally into only two parts. The invention constructs a global attention network with three branches to extract a large amount of discriminative information. Two global attention modules, CGAM and SGAM, are constructed, which model the global relation of nodes through the average pairwise relations among nodes and infer the importance of each node, reducing computational complexity. On the local branch the feature map is divided horizontally into only two parts, which largely alleviates the problems of part misalignment and broken local consistency.

Description

Global attention network model for vehicle re-identification
Technical Field
The invention relates to the technical field of vehicle identification, and in particular to a global attention network model for vehicle re-identification.
Background
Vehicle re-identification refers to recognizing a target vehicle across different cameras. It plays an important role in intelligent transportation and smart cities and has many applications in real life. For example, in real traffic monitoring systems, vehicle re-identification can be used to locate, monitor, and conduct criminal investigations on target vehicles. With the rise of deep neural networks and the introduction of large datasets, improving the accuracy of vehicle re-identification has become a research hotspot in computer vision and multimedia in recent years. However, owing to the different viewing angles of multiple cameras and the influence of illumination, occlusion, and other factors, intra-class feature distances become larger while inter-class feature distances become smaller, which further increases the difficulty of recognition.
Pedestrian re-identification and vehicle re-identification are essentially the same problem: both belong to the image retrieval task. In recent years, methods based on convolutional neural networks (CNNs) have made great progress in pedestrian re-identification, so CNN models applied to pedestrian re-identification also perform well in vehicle re-identification. Most advanced CNN-based pedestrian re-identification methods employ CNN models pre-trained on ImageNet and fine-tune them on the re-identification dataset under the supervision of different losses.
CNN-based vehicle and pedestrian re-identification often focuses on extracting global features of person or vehicle images. In this way, complete feature information can be obtained globally, but global features cannot describe well the intra-class differences caused by factors such as viewing angle. To extract fine-grained local features, pedestrian re-identification network models with local branches, such as PCB (Part-based Convolutional Baseline) and MGN (Multiple Granularity Network), have been designed. These networks divide the feature map into several stripes to extract local features; in addition, the latter combines local features with global features, further improving model performance. For vehicle re-identification, vehicles of the same model are substantially identical in global appearance, while small regions, such as inspection marks, decorations, and usage marks, may vary greatly. Therefore, local detail information is also important for the vehicle re-identification task.
However, these part-based models share a common drawback: they require the body parts of the same person to be relatively aligned in order to learn significant local features. Although vehicle re-identification and pedestrian re-identification are both image retrieval problems in nature, the part boundaries of vehicle bodies are not as sharp as those of pedestrians, and the body of the same vehicle looks very different when viewed from different angles. On the other hand, strictly uniform partitioning of the feature map destroys local consistency, and the degree of destruction is generally proportional to the number of partitions: the more partitions, the more easily local consistency is broken. This makes it difficult for deep neural networks to obtain meaningful fine-grained information from local parts, thereby degrading performance. Therefore, simply transferring the part-partition methods of pedestrian re-identification to vehicles is not feasible.
Attention mechanisms play an important role in the human perception system, helping people focus on useful discriminative features while suppressing noise and background interference. For network models, an attention mechanism can focus the model on the target subject rather than the background, and it is widely used in re-identification tasks; many networks with attention modules have therefore been proposed. However, they mainly build attention over nodes (channels, spatial positions) by convolving directly on the nodes' own information, or directly reconstruct nodes using pairwise relations between nodes, without considering that the global relations between nodes provide important guidance for building node attention (importance).
In the vehicle re-identification task, different camera positions produce illumination changes, viewpoint changes, and resolution differences, so that the intra-class difference of the same vehicle under different viewpoints is large, while the inter-class difference of different vehicles of the same model is small. This greatly increases the difficulty of vehicle re-identification. The key to vehicle re-identification is extracting discriminative vehicle features. To better extract such features from vehicle images and improve recognition accuracy, it is necessary to propose a global attention network model for vehicle re-identification.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a global attention network model for vehicle re-identification, which can extract fine local information simply, largely solving the problems of part misalignment and broken local consistency, and which can build reliable node attention from the global relations between nodes, thereby extracting more credible saliency information for vehicle re-identification.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a global attention network model for vehicle weight recognition comprises a backbone network, a local branch dividing a feature map into two parts and two global branches with global attention modules; the backbone network is split into 3 branches; the global attention network model extracts and obtains a feature vector on a final feature map output by each branch by using a global average pooling GAP (GAP) so as to cover the whole body information of the vehicle image; the local branch only horizontally divides the vehicle characteristic diagram into two parts, and the problems of misalignment and local consistency damage can be solved to a great extent.
The two global branches carry, respectively, a Channel Global Attention Module (CGAM) and a Spatial Global Attention Module (SGAM) for extracting more reliable saliency information. The backbone network employs the ResNet50 network model.
To increase the resolution, the downsampling stride of the res_conv5_1 block of the Global-C branch and the Local branch is changed from 2 to 1, and a spatial global attention module and a channel global attention module are then added after the res_conv5 blocks of the two global branches to extract reliable saliency information and enhance feature discrimination, where res_conv5 denotes the fourth residual layer of the ResNet50 network model and res_conv5_1 denotes the first building block of that layer.
After a feature vector is extracted with global average pooling (GAP) on each branch, a feature dimension-reduction module, consisting of a 1×1 convolution, a BN layer, and a ReLU function, reduces the feature vector to 256 dimensions, providing a compact feature representation. The network model is trained by applying a triplet loss and a cross-entropy loss on each branch: the triplet loss is applied directly on the 256-dimensional feature vector, and a fully connected layer is added after the 256-dimensional feature vector to apply the cross-entropy loss. In the testing stage, the features before the fully connected layers of the three branches are concatenated as the final output feature.
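A minimal PyTorch sketch of such a per-branch head follows; it assumes a 2048-channel ResNet50 output feature map, and the class name, argument names, and identity count are illustrative placeholders, not the patent's code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ReductionHead(nn.Module):
    # GAP -> 1x1 conv + BN + ReLU down to 256 dimensions; the triplet loss
    # is applied on the 256-d vector, and a fully connected layer on top of
    # it produces the logits for the cross-entropy loss.
    def __init__(self, in_channels=2048, out_dim=256, num_ids=576):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, out_dim, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_dim),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Linear(out_dim, num_ids)  # num_ids is illustrative

    def forward(self, feat_map):
        v = F.adaptive_avg_pool2d(feat_map, 1)  # GAP over the branch output
        v = self.reduce(v).flatten(1)           # compact 256-d representation
        return v, self.classifier(v)            # (triplet feature, CE logits)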
The CGAM architecture: let the tensor X ∈ ℝ^{C×H×W} be the feature map input to CGAM, where C is the number of channels and H and W are the spatial height and width of the tensor. The functions θ(·) and φ(·) yield the tensors A = θ(X) and B = φ(X); A is reshaped into ℝ^{C×HW} and B into ℝ^{HW×C}. θ and φ have identical architectures, consisting of two 1×1 convolutions and two 3×3 group convolutions, together with two BatchNorm layers and two ReLU activation functions; the two 3×3 group convolutions enlarge the receptive field while reducing the number of parameters. Matrix multiplication is then used to obtain the matrix M ∈ ℝ^{C×C}, which expresses the pairwise relations of all channels. M is written as:

M = θ(X) · φ(X)   (after the reshaping above)

Each row of M represents the pairwise relations between one channel and all other channels. The average pairwise relation of each channel is modeled to obtain the global relation of that channel; the importance of a channel's global relation relative to the other channels is then used to obtain the weight of that channel among all channels.

The weight of a channel among all channels is obtained as follows: relational average pooling (RAP) is applied to the matrix M to obtain the vector r ∈ ℝ^C, where C is the number of channels. Each element of r represents the global relation between one channel and all channels, the i-th element being defined as

r_i = (1/C) · Σ_{j=1}^{C} M_{i,j}

The softmax function converts all global relations into a weight for each channel:

w_i = exp(r_i) / Σ_{j=1}^{C} exp(r_j)

To obtain the attention map A_c, the vector w is first reshaped into ℝ^{C×1×1} and then broadcast to ℝ^{C×H×W}, i.e. the attention map A_c is obtained. Finally, element-wise multiplication of elements at the same positions and element-wise addition of elements at the same positions yield the final feature map Y, which can be expressed as:

Y = A_c ⊙ X + X
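The channel-attention computation just described can be summarized in a short PyTorch sketch. This is an illustrative rendering under the definitions above, with θ and φ simplified to single 1×1 convolutions rather than the full convolution stacks of the patent.

import torch
import torch.nn as nn

class CGAM(nn.Module):
    # Channel global attention: pairwise channel relations -> RAP -> softmax.
    def __init__(self, channels):
        super().__init__()
        # theta/phi simplified here; the patent uses two 1x1 convs plus two
        # 3x3 group convs with BatchNorm and ReLU.
        self.theta = nn.Conv2d(channels, channels, kernel_size=1)
        self.phi = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        a = self.theta(x).reshape(b, c, h * w)                 # A in R^{C x HW}
        bt = self.phi(x).reshape(b, c, h * w).transpose(1, 2)  # B in R^{HW x C}
        m = torch.bmm(a, bt)                                   # M in R^{C x C}
        r = m.mean(dim=2)                                      # RAP: r_i = mean_j M_ij
        wgt = torch.softmax(r, dim=1).reshape(b, c, 1, 1)      # per-channel weights
        return wgt * x + x                                     # A_c * X + X (residual)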
the SGAM architecture: spatial attention and channel attention, which work in a similar manner, use global relationships between locations and channels to determine the importance of each location and channel, respectively. However, SGAM has three differences compared to CGAM. First, let the quantity of the image
Figure 654235DEST_PATH_IMAGE030
Is a characteristic diagram of the SGAM input,
Figure 924696DEST_PATH_IMAGE031
and
Figure 388170DEST_PATH_IMAGE032
the system has the same structure and comprises a 1 × 1 convolution, a BN layer and a ReLU function, and the number of channels is measured
Figure 656471DEST_PATH_IMAGE033
Is reduced to
Figure 697239DEST_PATH_IMAGE034
Figure 607558DEST_PATH_IMAGE035
For the reduction factor, set to 2 in the experiment; by a function
Figure 784592DEST_PATH_IMAGE036
And
Figure 286112DEST_PATH_IMAGE037
obtain the tensor
Figure 79756DEST_PATH_IMAGE038
And
Figure 856082DEST_PATH_IMAGE039
and will be
Figure 153202DEST_PATH_IMAGE038
Is deformed into
Figure 622361DEST_PATH_IMAGE040
Will be
Figure 903300DEST_PATH_IMAGE039
Is deformed into
Figure 748897DEST_PATH_IMAGE041
(ii) a Matrix multiplication is then employed to determine the pairwise relationship between locations and obtain a matrix
Figure 431682DEST_PATH_IMAGE042
Figure 540583DEST_PATH_IMAGE043
Second, to determine the importance of a location, the matrix is aligned
Figure 980923DEST_PATH_IMAGE044
Obtaining vectors using relational average pooled RAPs
Figure 833472DEST_PATH_IMAGE045
(ii) a Vector quantity
Figure 449393DEST_PATH_IMAGE046
To (1) a
Figure 604562DEST_PATH_IMAGE047
The individual elements may be represented as:
Figure 860094DEST_PATH_IMAGE048
thirdly, the invention firstly transforms the vector generated by the softmax function into the vector
Figure 516334DEST_PATH_IMAGE049
Then broadcast it as
Figure 908132DEST_PATH_IMAGE050
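A companion sketch of the spatial module follows under the same caveats; here θ and φ do follow the 1×1 convolution + BN + ReLU structure stated above, with reduction factor 2.

import torch
import torch.nn as nn

class SGAM(nn.Module):
    # Spatial global attention: pairwise position relations -> RAP -> softmax.
    def __init__(self, channels, reduction=2):
        super().__init__()
        mid = channels // reduction
        def branch():
            return nn.Sequential(
                nn.Conv2d(channels, mid, kernel_size=1, bias=False),
                nn.BatchNorm2d(mid),
                nn.ReLU(inplace=True),
            )
        self.theta = branch()
        self.phi = branch()

    def forward(self, x):
        b, c, h, w = x.shape
        a = self.theta(x).reshape(b, -1, h * w).transpose(1, 2)  # A in R^{HW x C/s}
        p = self.phi(x).reshape(b, -1, h * w)                    # B in R^{C/s x HW}
        m = torch.bmm(a, p)                                      # M in R^{HW x HW}
        r = m.mean(dim=2)                                        # RAP over positions
        wgt = torch.softmax(r, dim=1).reshape(b, 1, h, w)        # broadcast over C
        return wgt * x + x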
In both CGAM and SGAM, the feature map to which attention has been applied is added to the original feature map to obtain the final output feature map. There are two reasons for using this addition. First, the normalization function used here is softmax, which maps the weight values into the range 0 to 1 with all weights summing to 1. Because there are many weights, the element values of the feature map output by the attention module may be very small, which weakens the features of the original network; without adding the original feature map, training would be greatly hindered. Second, the addition also highlights the reliable saliency information in the attention-weighted feature map. Experiments likewise show that the model performs well with this residual structure: compared with the model without the addition, it improves mAP and Top-1 by 1.2% and 1.5%, respectively.
For the loss functions, the most common cross-entropy loss and triplet loss are used.
The cross entropy measures the difference between the true probability distribution and the predicted probability distribution. It can be expressed as:

L_ce = −(1/N) · Σ_{i=1}^{N} log( exp(p_{y_i}) / Σ_k exp(p_k) )

where N denotes the number of images in the mini-batch, y_i denotes the true ID label of the i-th image, and p_k denotes the predicted logit of the k-th ID class.
The objective of the triplet loss is to pull samples with the same label as close together as possible in the embedding space, while pushing samples with different labels as far apart as possible. The invention adopts the batch-hard triplet loss: each mini-batch randomly samples P identities and K images per identity to meet the requirements of the batch-hard triplet loss. The loss can be defined as

L_triplet = Σ_{i=1}^{P} Σ_{a=1}^{K} [ m + max_{p=1..K} ‖f(x_a^i) − f(x_p^i)‖_2 − min_{j≠i, n=1..K} ‖f(x_a^i) − f(x_n^j)‖_2 ]_+

where f(x_a^i), f(x_p^i), and f(x_n^j) are the features extracted from the anchor, positive, and negative samples, respectively. The margin m is set to 1.2, which helps to reduce intra-class variation and widen inter-class variation, improving model performance.
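The batch-hard selection can be written compactly; the following is a generic sketch of the standard batch-hard triplet loss with the stated margin of 1.2, not code from the patent.

import torch

def batch_hard_triplet_loss(features, labels, margin=1.2):
    # For each anchor, take the farthest positive and the closest negative.
    dist = torch.cdist(features, features)             # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # positive mask (incl. self)
    hardest_pos = (dist * same.float()).max(dim=1).values
    inf = torch.full_like(dist, float("inf"))
    hardest_neg = torch.where(same, inf, dist).min(dim=1).values
    return torch.relu(margin + hardest_pos - hardest_neg).mean()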
The total training loss is the sum of the cross-entropy loss and the triplet loss over all branches:

L_total = Σ_{i=1}^{B} ( λ1 · L_ce^i + λ2 · L_triplet^i )

where λ1 and λ2 are hyperparameters balancing the two loss terms, both set to 1 in the experiments, and B is the number of branches.
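Combining the terms per branch is then straightforward; a sketch reusing the batch_hard_triplet_loss function above, with λ1 = λ2 = 1 as in the experiments:

import torch.nn.functional as F

def total_loss(branch_outputs, labels):
    # branch_outputs: one (256-d feature, ID logits) pair per branch.
    loss = 0.0
    for feats, logits in branch_outputs:
        loss = loss + F.cross_entropy(logits, labels) \
                    + batch_hard_triplet_loss(feats, labels)
    return loss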
The technical effects of the invention:
Compared with the prior art, the global attention network model for vehicle re-identification has the following advantages. The invention constructs a global attention network with three branches to extract a large amount of discriminative information. Based on the global relations of nodes, the invention constructs two global attention modules, CGAM and SGAM: the global relation of a node is obtained by modeling the average pairwise relations between that node and all other nodes, and the global importance of the node is then inferred. On the one hand this lowers the difficulty of attention learning and reduces computational complexity; on the other hand, a more reliable measure of node importance is obtained through group evaluation, so more reliable saliency information is extracted. In addition, the vehicle image is divided horizontally into only two parts on the local branch, which largely solves the problems of part misalignment and broken local consistency. The effectiveness of the algorithm is verified by experiments on two vehicle re-identification datasets, where its performance is superior to SOTA methods.
Drawings
FIG. 1 is a schematic diagram of the overall network architecture of the present invention;
FIG. 2 is a block diagram of the CGAM architecture of the present invention;
FIG. 3 is a diagram of the architecture of the function θ of the present invention;
FIG. 4 is a diagram of the SGAM architecture of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings.
As shown in FIG. 1, a global attention network model for vehicle re-identification includes a backbone network, a local branch that divides the feature map into two parts, and two global branches with global attention modules. The backbone network uses ResNet50 as the basis for feature map extraction; the stages are adjusted and the original fully connected layer is removed for multi-loss training, and the ResNet50 backbone is split into 3 branches after the res_conv4_1 residual block. The global attention network model uses global average pooling (GAP) to cover the whole-body information of the vehicle image; the local branch divides the vehicle feature map horizontally into only two parts, which largely solves the problems of misalignment and broken local consistency.
To increase the resolution, the downsampling stride of the res_conv5_1 block of the Global-C branch and the Local branch is changed from 2 to 1, and a spatial global attention module and a channel global attention module are then added after the res_conv5 blocks of the two global branches to extract reliable saliency information and enhance feature discrimination.
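One possible reading of this split in code, assuming a recent torchvision where layer3[0] corresponds to res_conv4_1 and layer4 to res_conv5; the exact branch composition in the patent may differ, so this is a sketch, not the patented network.

import copy
import torch
import torchvision

resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
shared = torch.nn.Sequential(  # trunk shared up to and including res_conv4_1
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
    resnet.layer1, resnet.layer2, resnet.layer3[0],
)

def make_branch(stride_one):
    rest = torch.nn.Sequential(copy.deepcopy(resnet.layer3[1:]),
                               copy.deepcopy(resnet.layer4))
    if stride_one:  # change the res_conv5_1 downsampling stride from 2 to 1
        rest[1][0].conv2.stride = (1, 1)
        rest[1][0].downsample[0].stride = (1, 1)
    return rest

# Per the text, the Global-C branch and the Local branch use stride 1.
global_c, local, global_s = make_branch(True), make_branch(True), make_branch(False)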
The feature dimension-reduction module contains a 1×1 convolution, a BN layer, and a ReLU function, and reduces the feature vector to 256 dimensions, providing a compact feature representation. The network model is trained by applying a triplet loss and a cross-entropy loss on each branch: the triplet loss is applied directly on the 256-dimensional feature vector, and a fully connected layer is added after it to apply the cross-entropy loss. In the testing stage, the features before the fully connected layers of the three branches are concatenated as the final output feature.
The two global branches carry, respectively, a Channel Global Attention Module (CGAM) and a Spatial Global Attention Module (SGAM) for extracting more reliable saliency information.
As shown in FIG. 2, which illustrates the CGAM architecture: let the tensor X ∈ ℝ^{C×H×W} be the feature map input to CGAM, where C is the number of channels and H and W are the spatial height and width of the tensor. The functions θ(·) and φ(·) yield the tensors A = θ(X) and B = φ(X); A is reshaped into ℝ^{C×HW} and B into ℝ^{HW×C}. θ and φ have identical architectures, consisting of two 1×1 convolutions and two 3×3 group convolutions, together with two BatchNorm layers and two ReLU activation functions; the two 3×3 group convolutions enlarge the receptive field while reducing the number of parameters. Matrix multiplication is then used to obtain the matrix M ∈ ℝ^{C×C}, which expresses the pairwise relations of all channels, written as:

M = θ(X) · φ(X)   (after the reshaping above)

Each row of M represents the pairwise relations between one channel and all other channels. The average pairwise relation of each channel is modeled to obtain the global relation of that channel; the importance of a channel's global relation relative to the other channels is then used to obtain the weight of that channel among all channels.
Specifically, as shown in FIG. 3, a 1×1 convolution first halves the number of channels C of the input tensor. The feature map is then divided into 32 groups by a 3×3 group convolution; each group is convolved separately, and padding of 1 keeps the feature map size unchanged. This 3×3 convolution also keeps the number of channels unchanged. A BatchNorm (BN) layer is used for normalization, and a ReLU activation function adds non-linearity. Then a 1×1 convolution and a 3×3 group convolution are applied again to restore the number of channels to that of the original input tensor.
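Read literally, the block in FIG. 3 might look as follows in PyTorch; this is an interpretation of the text (the ordering of BN and ReLU within the stack is assumed), not verified code from the patent.

import torch.nn as nn

def theta_block(channels, groups=32):
    mid = channels // 2  # the first 1x1 conv halves the channel count
    return nn.Sequential(
        nn.Conv2d(channels, mid, kernel_size=1, bias=False),
        nn.Conv2d(mid, mid, kernel_size=3, padding=1, groups=groups, bias=False),
        nn.BatchNorm2d(mid),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid, channels, kernel_size=1, bias=False),  # restore channels
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=groups,
                  bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )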
The weight of a channel among all channels is obtained as follows: relational average pooling (RAP) is applied to the matrix M to obtain the vector r ∈ ℝ^C, where C is the number of channels. Each element of r represents the global relation between one channel and all channels, the i-th element being defined as

r_i = (1/C) · Σ_{j=1}^{C} M_{i,j}

The softmax function converts all global relations into a weight for each channel:

w_i = exp(r_i) / Σ_{j=1}^{C} exp(r_j)

To obtain the attention map A_c, the vector w is first reshaped into ℝ^{C×1×1} and then broadcast to ℝ^{C×H×W}. Finally, element-wise multiplication and element-wise addition with the original feature map yield the final feature map Y, which can be expressed as:

Y = A_c ⊙ X + X
as shown in fig. 4, which illustrates the SGAM architecture, spatial attention and channel attention utilize global relationships between locations and between channels, respectively, to determine the importance of each location and channel, and they operate similarly. However, SGAM has three differences compared to CGAM. First, let the quantity of the image
Figure 588688DEST_PATH_IMAGE030
Is a characteristic diagram of the SGAM input,
Figure 135207DEST_PATH_IMAGE031
and
Figure 246383DEST_PATH_IMAGE032
the system has the same structure and comprises a 1 × 1 convolution, a BN layer and a ReLU function, and the number of channels is measured
Figure 194747DEST_PATH_IMAGE033
Is reduced to
Figure 834807DEST_PATH_IMAGE034
Figure 868622DEST_PATH_IMAGE035
For the reduction factor, set to 2 in the experiment; by a function
Figure 252330DEST_PATH_IMAGE036
And
Figure 55201DEST_PATH_IMAGE037
obtain the tensor
Figure 131741DEST_PATH_IMAGE038
And
Figure 387273DEST_PATH_IMAGE039
and will be
Figure 574672DEST_PATH_IMAGE038
Is deformed into
Figure 497629DEST_PATH_IMAGE040
Will be
Figure 745071DEST_PATH_IMAGE039
Is deformed into
Figure 511336DEST_PATH_IMAGE041
(ii) a Matrix multiplication is then employed to determine the pairwise relationship between locations and obtain a matrix
Figure 971267DEST_PATH_IMAGE042
Figure 483151DEST_PATH_IMAGE076
Second, to determine the importance of a position, the invention applies relational average pooling (RAP) to the matrix M to obtain the vector r ∈ ℝ^{HW}; the i-th element of r can be expressed as:

r_i = (1/(HW)) · Σ_{j=1}^{HW} M_{i,j}
thirdly, the invention firstly transforms the vector generated by the softmax function into the vector
Figure 209241DEST_PATH_IMAGE078
Then broadcast it as
Figure 542134DEST_PATH_IMAGE079
In both CGAM and SGAM, the feature map to which attention has been applied is added to the original feature map to obtain the final output feature map. There are two reasons for using this addition. First, the normalization function used here is softmax, which maps the weight values into the range 0 to 1 with all weights summing to 1. Because there are many weights, the element values of the feature map output by the attention module may be very small, which weakens the features of the original network; without adding the original feature map, training would be greatly hindered. Second, the addition also highlights the reliable saliency information in the attention-weighted feature map. Experiments likewise show that the model performs well with this residual structure: compared with the model without the addition, it improves mAP and Top-1 by 1.2% and 1.5%, respectively.
For the loss functions, the most common cross-entropy loss and triplet loss are used.
The cross entropy measures the difference between the true probability distribution and the predicted probability distribution. It can be expressed as:

L_ce = −(1/N) · Σ_{i=1}^{N} log( exp(p_{y_i}) / Σ_k exp(p_k) )

where N denotes the number of images in the mini-batch, y_i denotes the true ID label of the i-th image, and p_k denotes the predicted logit of the k-th ID class.
The objective of the triplet loss is to pull samples with the same label as close together as possible in the embedding space, while pushing samples with different labels as far apart as possible. The invention adopts the batch-hard triplet loss: each mini-batch randomly samples P identities and K images per identity to meet the requirements of the batch-hard triplet loss. The loss can be defined as

L_triplet = Σ_{i=1}^{P} Σ_{a=1}^{K} [ m + max_{p=1..K} ‖f(x_a^i) − f(x_p^i)‖_2 − min_{j≠i, n=1..K} ‖f(x_a^i) − f(x_n^j)‖_2 ]_+

where f(x_a^i), f(x_p^i), and f(x_n^j) are the features extracted from the anchor, positive, and negative samples, respectively. The margin m is set to 1.2, which helps to reduce intra-class variation and widen inter-class variation, improving model performance.
The total training loss is the sum of the cross-entropy loss and the triplet loss over all branches:

L_total = Σ_{i=1}^{B} ( λ1 · L_ce^i + λ2 · L_triplet^i )

where λ1 and λ2 are hyperparameters balancing the two loss terms, both set to 1 in the experiments, and B is the number of branches.
Experiments:
data set: the model of the invention was evaluated on two common vehicle weight identification data sets, including VeRi776 and VehicleID.
VeRi776 consists of about 50,000 images of 776 vehicles, captured by 20 cameras at different locations and from different viewing angles. The training set contains 576 vehicles and the test set contains the remaining 200 vehicles.
VehicleID contains daytime data captured by multiple real-world surveillance cameras distributed across a small city in China. The entire dataset contains 26,267 vehicles (221,763 images). Small, medium, and large test sets are extracted according to test-set size. In the inference stage, one image of each vehicle is randomly selected to form the gallery set, and the remaining images serve as query images.
Evaluation metrics: on the basis of a comprehensive evaluation of each dataset, two metrics, CMC and mAP, are adopted for comparison with existing methods. CMC estimates the probability of finding a correct match among the top-K returned results. mAP is a comprehensive metric that jointly considers the precision and recall of the query results.
Implementation details: ResNet50 is selected as the backbone network that generates the features, and the same training strategy is applied to both datasets. The RGB channels of each pixel are normalized, and images are resized to 256 × 256 before being input to the network. From each mini-batch, P identities are randomly sampled, and K images are randomly drawn for each identity, to meet the requirement of the triplet loss; the proposed model is trained with fixed settings of P and K. The margin parameter of the triplet loss is set to 1.2 in all experiments. Adam is used as the optimizer. For the learning-rate schedule, the initial learning rate is set to 2e-4, decays to 2e-5 after 120 epochs, and further drops to 2e-6 and 2e-7 at epochs 220 and 320 for faster convergence. The entire training process lasts 450 epochs, and each branch is trained with the cross-entropy loss and the batch-hard triplet loss.
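The stepped schedule maps directly onto a standard PyTorch scheduler; in the sketch below, model and train_one_epoch are hypothetical placeholders for the network and the training step.

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # model: placeholder
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[120, 220, 320], gamma=0.1)  # 2e-4 -> 2e-5 -> 2e-6 -> 2e-7

for epoch in range(450):
    train_one_epoch(model, optimizer)  # hypothetical: CE + batch-hard triplet per branch
    scheduler.step()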
In the testing phase, the VeRi776 dataset is tested in image-to-track form: the distances between the query image and all images in the gallery set are computed, and the minimum image-to-image distance is taken as the image-to-track distance. For the VehicleID dataset, its three test sets are tested separately. The features before the fully connected layers of the three branches are concatenated as the final output feature.
Experimental results: the results of the proposed model are compared with those of other state-of-the-art models on both datasets. Prior work designed the Local Maximal Occurrence (LOMO) representation to address viewpoint and illumination changes. The GoogLeNet model was fine-tuned to obtain better results on the CompCars dataset, and SIFT, Color Name, and GoogLeNet features were then combined for vehicle re-identification. RAM first divides the image horizontally into three parts and then embeds detailed visual cues in these local regions. To improve the ability to recognize subtle differences, PRN introduces a part-regularized (PR) constraint into the vehicle re-identification task. The parsing-based view-aware embedding network (PVEN) avoids the mismatch of local features under different views. Generative Adversarial Networks (GANs) use a generative model and a discriminative model that learn from each other to produce good outputs; VAMI generates the features of different views with the help of a GAN. TAMR proposes a two-stage attention network that gradually focuses on subtle but distinctive local details in the visual appearance of the vehicle, together with a multi-grain ranking loss for learning structured deep feature embeddings.
The experimental results on VeRi776 and VehicleID are shown in Table 1 and Table 2, respectively. Among all vision-based methods, the TGRA method of the invention achieves the best results. From Table 1, first, TGRA improves mAP by 2.7% and CMC@1 by 0.1% compared with PVEN. Second, the CMC@5 of the proposed method already exceeds 99.1%, a promising performance for real vehicle re-identification scenarios. Table 2 shows the comparison results on the three test sets of different scales, where TGRA improves CMC@5 by more than 4.0% over PRN on the different test data. It should be noted that some advanced network models require additional auxiliary models, which increases algorithmic complexity: PVEN uses U-Net to parse a vehicle into four different views, PRN uses YOLO as a detection network for part localization, and TAMR uses an STN to automatically localize the windshield and the front of the vehicle. The model of the invention achieves better performance without using any auxiliary model.
The model of the invention reports 82.24% mAP, 95.77% CMC@1, and 99.11% CMC@5 on the VeRi776 test set. On the three VehicleID test sets, CMC@1 is 81.51%, 95.54%, and 72.81%, and CMC@5 is 96.38%, 93.69%, and 91.01%. All results are obtained in single-query mode without re-ranking.
Table 1: comparison with state-of-the-art methods on the VeRi776 test set (rendered as an image in the original document).
Table 2: comparison on the three VehicleID test sets (rendered as an image in the original document).
ablation study: a number of experiments were performed on both data sets to verify the validity of the key module in the TGRA. The optimal structure of the model is determined by comparing the performances of different structures.
Effectiveness of CGAM and SGAM: CGAM and SGAM are the channel global attention module and the spatial global attention module, respectively. The results on the VeRi776 test set are shown in Table 3.
Table 3: ablation of CGAM and SGAM on the VeRi776 test set (rendered as an image in the original document).
the validity of the local branch is verified on the Veni 776 as shown in Table 4. "w/o" means none; "local" refers to a local branch of TGRA; "PART-3" and "PART-4" refer to references that divide the feature map into three or four PARTs, respectively.
Table 4:
Figure 117515DEST_PATH_IMAGE093
the model of the invention consists of three branches, on two global branches, channel global attention and spatial global attention are used to extract reliable saliency information. The present invention verifies the effect of SGAM and CGAM on the model, respectively (table 3). As can be seen from Table 3, on the test set of VeRi776, "Baseline + SGAM" was improved by 0.6% and 0.6% at mAP and CMC @1, respectively, compared to Baseline. In addition, compared with Baseline, "Global-C (Branch)" is improved by 1.7% in mAP and 1.0% in CMC @ 1. Then, when both branches with CGAM and SGAM were trained simultaneously, the model yielded 5.0% and 1.6% improvement in mAP and CMC @1 compared to Baseline.
In addition, the global attention modules are analyzed qualitatively to show their effectiveness more intuitively. The experimental results show that the network with global attention modules can accurately find images of the same vehicle. Recognizing the same vehicle is very difficult when the query image and the target image are taken from different viewpoints, yet the model of the invention still identifies it well. The global attention modules therefore perform well at enhancing discriminative pixels and suppressing noise pixels.
Local branch verification: "TGRA w/o local" denotes the TGRA model without the local branch. To fully verify the effectiveness of the proposed local branch, two further experiments are carried out, dividing the feature map into three and four parts, respectively. As seen in Table 4, first, among the four models, TGRA without the local branch performs worst, indicating that local detail information is crucial for the vehicle re-identification task. Second, "TGRA (ours)" improves mAP by 0.5% and CMC@1 by 0.6% on the VeRi776 test set compared with "TGRA (Part-3)". Moreover, the larger the number of partitions, the worse the performance, owing to misalignment and broken local consistency. The local branch proposed by the invention largely solves these problems, and the ablation experiments prove the effectiveness of the method.
The invention provides a global attention network with three branches for vehicle re-identification; the model can extract useful vehicle features from multiple perspectives. Furthermore, on the local branch, to largely solve the problems of misalignment and broken local consistency, the vehicle feature map is divided evenly into two parts. Finally, through the global attention modules, the network can focus on the parts most significant for the vehicle re-identification task and learn more discriminative and robust features. The features of the three branches are concatenated during the test phase to obtain better performance. Experiments show that the model of the invention is significantly superior to the current best models on the VeRi776 and VehicleID datasets.

Claims (8)

1. A global attention network model for vehicle re-identification, characterized in that: it comprises a backbone network, a local branch that divides the feature map into two parts, and two global branches with global attention modules; the backbone network is split into 3 branches; the global attention network model obtains a feature vector from the final feature map output by each branch using global average pooling (GAP); and the local branch divides the vehicle feature map horizontally into only two parts.
2. The global attention network model for vehicle re-identification according to claim 1, characterized in that: the two global branches carry a channel global attention module CGAM and a spatial global attention module SGAM, respectively, and the backbone network employs ResNet50.
3. The global attention network model for vehicle re-identification according to claim 1 or 2, characterized in that: the downsampling stride of the res_conv5_1 block of the global branch and the local branch is changed from 2 to 1, and a spatial global attention module and a channel global attention module are then added after the res_conv5 blocks of the two global branches to extract reliable saliency information and enhance feature discrimination, where res_conv5 denotes the fourth residual layer of the ResNet50 network model and res_conv5_1 denotes the first building block of that layer.
4. The global attention network model for vehicle re-identification according to claim 2, characterized in that: in the CGAM architecture, the tensor X ∈ ℝ^{C×H×W} is the feature map input to CGAM, where C is the number of channels and H and W are the spatial height and width of the tensor; the functions θ(·) and φ(·) yield the tensors A = θ(X) and B = φ(X), A being reshaped into ℝ^{C×HW} and B into ℝ^{HW×C}; θ and φ have identical architectures, consisting of two 1×1 convolutions and two 3×3 group convolutions, together with two BatchNorm layers and two ReLU activation functions.
5. The global attention network model for vehicle re-identification according to claim 4, characterized in that: in the θ and φ architecture, two 3×3 group convolutions are used to enlarge the receptive field and reduce the number of parameters; matrix multiplication is then used to obtain the matrix M ∈ ℝ^{C×C}, which expresses the pairwise relations of all channels and is written as

M = θ(X) · φ(X)   (after the reshaping above)

each row of M represents the pairwise relations between one channel and all other channels; the average pairwise relation of each channel is modeled to obtain the global relation of that channel, and the importance of a channel's global relation relative to the other channels is then used to obtain the weight of that channel among all channels.
6. The global attention network model for vehicle re-identification according to claim 5, characterized in that: the specific process of obtaining the weight of a channel among all channels from the importance of its global relation relative to the other channels is as follows:

relational average pooling (RAP) is applied to the matrix M to obtain the vector r ∈ ℝ^C, where C is the number of channels; each element of r represents the global relation between one channel and all channels, the i-th element being defined as

r_i = (1/C) · Σ_{j=1}^{C} M_{i,j}

the softmax function is adopted to convert all global relations into a weight for each channel:

w_i = exp(r_i) / Σ_{j=1}^{C} exp(r_j)

first, the vector w is reshaped into ℝ^{C×1×1} and then broadcast to ℝ^{C×H×W}, i.e. the attention map A_c is obtained; finally, element-wise multiplication of elements at the same positions and element-wise addition of elements at the same positions yield the final feature map Y:

Y = A_c ⊙ X + X
7. The global attention network model for vehicle re-identification according to any one of claims 2, 4, 5, and 6, characterized in that: in the SGAM architecture, the tensor X ∈ ℝ^{C×H×W} is the feature map input to SGAM; θ and φ have the same structure, consisting of a 1×1 convolution, a BN layer, and a ReLU function, and reduce the number of channels from C to C/s, where s is the reduction factor, set to 2 in the experiments; the functions θ and φ yield the tensors A = θ(X) and B = φ(X), A being reshaped into ℝ^{HW×C/s} and B into ℝ^{C/s×HW}; matrix multiplication is then employed to determine the pairwise relations between positions and obtain the matrix M ∈ ℝ^{HW×HW}:

M = θ(X) · φ(X)   (after the reshaping above)
8. The global attention network model for vehicle re-identification according to claim 7, characterized in that: relational average pooling (RAP) is applied to the matrix M to obtain the vector r ∈ ℝ^{HW}; the i-th element of r can be expressed as:

r_i = (1/(HW)) · Σ_{j=1}^{HW} M_{i,j}
CN202110977958.6A 2021-08-25 2021-08-25 Global attention network model for vehicle re-identification Active CN113420742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110977958.6A CN113420742B (en) 2021-08-25 2021-08-25 Global attention network model for vehicle re-identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110977958.6A CN113420742B (en) 2021-08-25 2021-08-25 Global attention network model for vehicle re-identification

Publications (2)

Publication Number Publication Date
CN113420742A true CN113420742A (en) 2021-09-21
CN113420742B CN113420742B (en) 2022-01-11

Family

ID=77719317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110977958.6A Active CN113420742B (en) Global attention network model for vehicle re-identification

Country Status (1)

Country Link
CN (1) CN113420742B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822246A (en) * 2021-11-22 2021-12-21 山东交通学院 Vehicle re-identification method based on global reference attention mechanism
CN113989836A (en) * 2021-10-20 2022-01-28 华南农业大学 Dairy cow face re-identification method, system, device, and medium based on deep learning
CN114663861A (en) * 2022-05-17 2022-06-24 山东交通学院 Vehicle re-identification method based on dimension decoupling and non-local relation
CN116052218A (en) * 2023-02-13 2023-05-02 中国矿业大学 Pedestrian re-identification method
CN116110076A (en) * 2023-02-09 2023-05-12 国网江苏省电力有限公司苏州供电分公司 Power transmission aerial work personnel identity re-identification method and system based on mixed granularity network
CN116311105A (en) * 2023-05-15 2023-06-23 山东交通学院 Vehicle re-identification method based on inter-sample context guidance network
CN116704453A (en) * 2023-08-08 2023-09-05 山东交通学院 Adaptive partitioning and a priori reinforcement part learning network for vehicle re-identification

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism
CN110084139A (en) * 2019-04-04 2019-08-02 长沙千视通智能科技有限公司 A kind of recognition methods again of the vehicle based on multiple-limb deep learning
CN111325111A (en) * 2020-01-23 2020-06-23 同济大学 Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
CN111368815A (en) * 2020-05-28 2020-07-03 之江实验室 Pedestrian re-identification method based on multi-component self-attention mechanism
CN111401177A (en) * 2020-03-09 2020-07-10 山东大学 End-to-end behavior recognition method and system based on adaptive space-time attention mechanism
CN111507217A (en) * 2020-04-08 2020-08-07 南京邮电大学 Pedestrian re-identification method based on local resolution feature fusion
CN111931624A (en) * 2020-08-03 2020-11-13 重庆邮电大学 Attention mechanism-based lightweight multi-branch pedestrian heavy identification method and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084139A (en) * 2019-04-04 2019-08-02 长沙千视通智能科技有限公司 A kind of recognition methods again of the vehicle based on multiple-limb deep learning
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism
CN111325111A (en) * 2020-01-23 2020-06-23 同济大学 Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
US20210232813A1 (en) * 2020-01-23 2021-07-29 Tongji University Person re-identification method combining reverse attention and multi-scale deep supervision
CN111401177A (en) * 2020-03-09 2020-07-10 山东大学 End-to-end behavior recognition method and system based on adaptive space-time attention mechanism
CN111507217A (en) * 2020-04-08 2020-08-07 南京邮电大学 Pedestrian re-identification method based on local resolution feature fusion
CN111368815A (en) * 2020-05-28 2020-07-03 之江实验室 Pedestrian re-identification method based on multi-component self-attention mechanism
CN111931624A (en) * 2020-08-03 2020-11-13 重庆邮电大学 Attention mechanism-based lightweight multi-branch pedestrian heavy identification method and system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
QIAN SHI et al.: "Hyperspectral Image Denoising Using a 3-D" (title truncated in the original), IEEE Transactions on Geoscience and Remote Sensing *
TENG, S. Z. et al.: "SCAN: Spatial and Channel Attention Network for Vehicle Re-Identification", Advances in Multimedia Information Processing *
LIU Ziyan et al.: "Feature extraction method for person re-identification based on attention mechanism", Journal of Computer Applications *
ZHU Shaoxiang: "Research and implementation of a deep-learning-based person re-identification ***", China Master's Theses Full-text Database, Information Science and Technology *
XIE Pengyu et al.: "Person re-identification based on multi-scale joint learning", Journal of Beijing University of Aeronautics and Astronautics *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989836A (en) * 2021-10-20 2022-01-28 华南农业大学 Dairy cow face re-identification method, system, device, and medium based on deep learning
CN113822246A (en) * 2021-11-22 2021-12-21 山东交通学院 Vehicle re-identification method based on global reference attention mechanism
CN114663861A (en) * 2022-05-17 2022-06-24 山东交通学院 Vehicle re-identification method based on dimension decoupling and non-local relation
CN114663861B (en) * 2022-05-17 2022-08-26 山东交通学院 Vehicle re-identification method based on dimension decoupling and non-local relation
CN116110076A (en) * 2023-02-09 2023-05-12 国网江苏省电力有限公司苏州供电分公司 Power transmission aerial work personnel identity re-identification method and system based on mixed granularity network
CN116110076B (en) * 2023-02-09 2023-11-07 国网江苏省电力有限公司苏州供电分公司 Power transmission aerial work personnel identity re-identification method and system based on mixed granularity network
CN116052218A (en) * 2023-02-13 2023-05-02 中国矿业大学 Pedestrian re-identification method
CN116052218B (en) * 2023-02-13 2023-07-18 中国矿业大学 Pedestrian re-identification method
CN116311105A (en) * 2023-05-15 2023-06-23 山东交通学院 Vehicle re-identification method based on inter-sample context guidance network
CN116311105B (en) * 2023-05-15 2023-09-19 山东交通学院 Vehicle re-identification method based on inter-sample context guidance network
CN116704453A (en) * 2023-08-08 2023-09-05 山东交通学院 Adaptive partitioning and a priori reinforcement part learning network for vehicle re-identification
CN116704453B (en) * 2023-08-08 2023-11-28 山东交通学院 Method for vehicle re-identification by adopting self-adaptive division and priori reinforcement part learning network

Also Published As

Publication number Publication date
CN113420742B (en) 2022-01-11

Similar Documents

Publication Publication Date Title
CN113420742B (en) Global attention network model for vehicle re-identification
Chen et al. Partition and reunion: A two-branch neural network for vehicle re-identification.
CN108197326B (en) Vehicle retrieval method and device, electronic equipment and storage medium
CN106557579B (en) Vehicle model retrieval system and method based on convolutional neural network
CN112966137B (en) Image retrieval method and system based on global and local feature rearrangement
CN111507217A (en) Pedestrian re-identification method based on local resolution feature fusion
CN108154133B (en) Face portrait-photo recognition method based on asymmetric joint learning
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN111582339B (en) Vehicle detection and recognition method based on deep learning
CN108764096B (en) Pedestrian re-identification system and method
CN113592007B (en) Knowledge distillation-based bad picture identification system and method, computer and storage medium
CN102176208A (en) Robust video fingerprint method based on three-dimensional space-time characteristics
CN112785480B (en) Image splicing tampering detection method based on frequency domain transformation and residual error feedback module
Zang et al. Traffic lane detection using fully convolutional neural network
CN110826415A (en) Method and device for re-identifying vehicles in scene image
CN109325407B (en) Optical remote sensing video target detection method based on F-SSD network filtering
CN112861605A (en) Multi-person gait recognition method based on space-time mixed characteristics
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN105184299A (en) Vehicle body color identification method based on local restriction linearity coding
CN113269224A (en) Scene image classification method, system and storage medium
CN117197763A (en) Road crack detection method and system based on cross attention guide feature alignment network
Elkerdawy et al. Fine-grained vehicle classification with unsupervised parts co-occurrence learning
CN110458234B (en) Vehicle searching method with map based on deep learning
CN110516640B (en) Vehicle re-identification method based on feature pyramid joint representation
CN112418262A (en) Vehicle re-identification method, client and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant