CN113255490A - Unsupervised pedestrian re-identification method based on k-means clustering and merging - Google Patents
- Publication number: CN113255490A
- Application number: CN202110530514.8A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- identification
- features
- network
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
Abstract
The invention relates to the field of pedestrian re-identification in deep learning, and in particular to an unsupervised pedestrian re-identification method based on K-means clustering and merging, in which an unsupervised K-means clustering and merging algorithm is used for image processing and a network model based on SE-ResNet50 is used for pedestrian re-identification. The method comprises the following steps: selecting a pedestrian re-identification data set; constructing a pedestrian re-identification model comprising a feature extraction network and a connection prediction network; clustering and merging the extracted feature images with the K-means algorithm; and finally training the fusion network on the clustered images to obtain the pedestrian re-identification result. The method obtains good results on the two pedestrian re-identification data sets Market-1501 and DukeMTMC-reID, and is of some help to pedestrian re-identification in the field of unsupervised learning.
Description
Technical Field
The invention relates to pedestrian re-identification in deep learning: SE-ResNet50 is used to extract image features, which are clustered with k-means before training. The invention belongs to the field of computer vision.
Background
Pedestrian re-identification is a popular research direction in computer vision in recent years: the technology of using computer vision to search for and identify a specific pedestrian across cameras and scenes. Pedestrian re-identification has good applications in criminal investigation, medical imaging, security, and smart living.
Thanks to improvements in computer hardware and the rapid growth of computing power, deep learning has entered a golden period of development and produced many excellent results. Pedestrian re-identification has likewise flourished: deep neural networks extract basic low-level features and form abstract deep features, thereby discovering the internal rules of the data. Mainstream deep learning approaches to pedestrian re-identification use networks such as GoogLeNet and ResNet, first extracting pedestrian image features and then performing re-identification.
However, although conventional pedestrian re-identification methods achieve good results, they are supervised learning methods based on labeled classification data. Cross-domain unsupervised or semi-supervised pedestrian re-identification is still progressing slowly, mainly because of the following problems: 1) a model trained under supervised conditions and transferred directly to a target domain for testing suffers a large drop in performance; 2) domain adaptation is very difficult without id-labeled target-domain data.
Therefore, to address these difficulties, an improved unsupervised K-means clustering and merging pedestrian re-identification method is proposed. Its difference is that, on the basis of the original bottom-up clustering, an SE module is added to the convolutional neural network to extract pedestrian features; K-means clustering and merging is then performed on the extracted features so that pedestrian pictures with similar features are grouped into one class, and the model is retrained until it converges, at which point training stops.
Disclosure of Invention
The invention uses SE-ResNet-50 to extract image features, then performs K-means clustering and merging, and finally performs pedestrian re-identification. The overall network diagram is shown in FIG. 2.
The first step is as follows: SE-ResNet-50 extracts image abstract features.
We denote the input feature map size as H × W × C, where H, W, and C are the height, width, and number of channels of the input feature map, and u_c is the value at the corresponding position of channel c. The input feature map first passes through a 3×3 convolution and max pooling.
The last two layers of the convolutional neural network are combined with the SE module to form an SE-based pedestrian re-identification network structure. This makes full use of the SE module: it helps the convolutional neural network discriminate information, makes the importance of each channel easier to identify, enhances important features, and suppresses secondary ones.
After squeezing over the full spatial receptive field, the channel weight vector has size 1 × 1 × C. Multiplying these weights with the corresponding channels recalibrates the final features; this adds only a small number of parameters while letting the network better identify spatial features.
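The squeeze-excite-scale pipeline described above can be sketched in plain NumPy. This is a minimal illustration, not the patent's implementation: `w1, b1, w2, b2` stand in for the two learned fully connected layers of the excitation step.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration for a feature map x of shape (H, W, C).

    w1/b1 and w2/b2 are the two fully connected layers of the excitation step
    (shapes (C, C//r) and (C//r, C) for a reduction ratio r). The weights here
    are illustrative; in the patent's network they are learned during training.
    """
    H, W, C = x.shape
    # Squeeze: global average pooling -> one real number per channel (1 x 1 x C)
    z = x.mean(axis=(0, 1))                      # shape (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1)
    s = np.maximum(z @ w1 + b1, 0.0)             # ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))     # sigmoid
    # Scale: recalibrate each channel of the input by its weight
    return x * s.reshape(1, 1, C)
```

Note that the output has the same shape as the input: the block only re-weights channels, which is why it can be dropped into the last residual stages of ResNet-50 without changing the rest of the architecture.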
The second step is as follows: the extracted feature images are clustered with the K-means algorithm.
(1) Randomly select K elements from the extracted original feature data as the centers of the K clusters.
(2) Compute the distance from every element x_i to each of the K cluster centers c_j and assign the element to the nearest cluster, using the Euclidean distance d(x_i, c_j) = ||x_i − c_j||_2.
(3) Recompute the center of each cluster as the mean of its members: c_j = (1/|C_j|) Σ_{x∈C_j} x.
Here C_j denotes a cluster; K-means optimizes the result by minimizing the within-cluster sum of squared distances J = Σ_j Σ_{x∈C_j} ||x − c_j||².
(4) Repeat the last two steps until the iteration converges.
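Steps (1)-(4) above can be sketched as a small NumPy routine. This is a generic k-means sketch, not the patent's code; in the invention the input rows would be SE-ResNet50 feature vectors, but any vectors work here.

```python
import numpy as np

def kmeans(features, k, iters=100, seed=0):
    """Plain k-means over a (N, D) array of feature vectors.

    Returns (labels, centers), following steps (1)-(4) of the description.
    """
    rng = np.random.default_rng(seed)
    # (1) pick k elements at random as the initial cluster centers
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # (2) assign every element to its nearest center (Euclidean distance)
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # (3) recompute each center as the mean of its cluster members
        new_centers = np.array([features[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        # (4) stop once the iteration has converged
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```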
The third step is as follows: the unsupervised clustered pedestrian images are trained by the fusion network.
Because the clustering-based pedestrian re-identification task is more complex, a deeper network is required for training; in theory, the generalization ability of a convolutional neural network improves as the number of layers increases [18]. In practice, however, deeper convolutional neural networks suffer from model degradation, chiefly because gradients vanish as they propagate between layers. To better improve the model's recognition of input pictures, an SENet module is introduced into the convolutional neural network.
The fourth step is as follows: the performance of the model is evaluated on the test set.
The method of the present invention is evaluated using the Cumulative Matching Characteristic (CMC) curve and mean average precision (mAP). The CMC curve is a common evaluation criterion in pedestrian re-identification. Let the number of searched pedestrian pictures be N; each query is compared against the candidate set, the gallery images in the set are sorted by distance to the query, the ranking result is expressed as (r1, r2, ..., rN), and the CMC curve is obtained by counting the matches at each rank. The higher the correct pedestrian is ranked, the better the algorithm.
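The rank-counting procedure above can be sketched as follows. This is a simplified illustration: it assumes one correct gallery match per query identity and omits the camera-id filtering that a full Market-1501 evaluation would apply.

```python
import numpy as np

def cmc_curve(dist, query_ids, gallery_ids):
    """Cumulative Matching Characteristic from a query-gallery distance matrix.

    dist[i, j] is the distance between query i and gallery image j. The value
    at rank r (0-indexed) is the fraction of queries whose correct identity
    appears among the top r+1 ranked gallery images.
    """
    n_q, n_g = dist.shape
    hits = np.zeros(n_g)
    for i in range(n_q):
        order = np.argsort(dist[i])                  # sort gallery by distance
        ranked_ids = np.asarray(gallery_ids)[order]
        first_hit = np.flatnonzero(ranked_ids == query_ids[i])[0]
        hits[first_hit:] += 1                        # counts for all ranks >= hit
    return hits / n_q
```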
Drawings
FIG. 1 is a network flow diagram
FIG. 2 is a diagram of the network architecture based on SE modules
FIG. 3 is a basic framework diagram for unsupervised learning
FIG. 4 is a cluster merging diagram
FIG. 5 is a graph showing the experimental results
FIG. 6 is a graph showing the experimental results
Detailed Description
An SE-ResNet50 network is built with the PyTorch deep learning framework under CentOS 6.0, with an NVIDIA GTX 1080Ti as the hardware configuration and the Market-1501 and DukeMTMC-reID pedestrian re-identification data sets as the experimental data. In the first stage, training is run with epochs = 20, batch_size = 16, drop = 0.5, mp = 0.05, and lambda = 0.005. SGD optimization is then performed with momentum = 0.9; the learning rate is lr = 0.1 for the first 15 epochs and lr = 0.01 for the last 5 epochs. We tested on the Market-1501 and DukeMTMC data sets, and training took 39 hours.
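The piecewise learning-rate schedule above (lr = 0.1 for the first 15 epochs, then 0.01 for the remaining 5) can be written as a small helper; the function and parameter names are mine, not the patent's.

```python
def learning_rate(epoch, base_lr=0.1, drop_lr=0.01, drop_at=15):
    """Step schedule for SGD: base_lr until epoch drop_at, then drop_lr.

    epoch is 0-indexed, so epochs 0..14 use 0.1 and epochs 15..19 use 0.01
    for the 20-epoch run described in the experiments.
    """
    return base_lr if epoch < drop_at else drop_lr
```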
1. Feature extraction stage
SE-ResNet50 helps the convolutional neural network improve its ability to recognize information. Each feature channel is compressed into one real number that has a global receptive field; after all channels are compressed, the resulting set of real numbers has one entry per feature channel, each representing the corresponding global feature channel.
Next the excitation function is introduced: global average pooling over the H × W × C input yields a 1 × 1 × C feature map that at this stage has a global receptive field. The excitation operation then transforms this result nonlinearly through a fully connected neural network, and the output is multiplied onto the input features as per-channel weights.
2. Image feature clustering stage
As shown in FIG. 4, k feature points are randomly selected as initial features; the distance between each feature and the selected features is computed in turn, and each feature point is assigned to the nearest (i.e., most similar) feature point, finally yielding k clusters. The pedestrian features are partitioned into spherical clusters so as to maximize the diversity of the pedestrian categories. In (b), clusters are merged so that features embedded in the same spherical cluster grow closer and closer. In (c), the upper half of the sphere shows cluster merging without diversity regularization: (point 1, point 3) and (point 4, point 8) have the shortest distances and are each merged into one cluster. The lower half shows cluster merging with diversity regularization: although the distance between the yellow and green clusters is shortest, those two clusters are too large to merge, so point 6 and point 7 are merged instead.
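One merge step with the diversity constraint described above can be sketched as follows. This is my reading of FIG. 4 (b)/(c), not the patent's exact procedure: `max_size` stands in for the size cap that keeps two large clusters (the yellow/green case) from merging, and the patent's precise distance and regularizer may differ.

```python
import numpy as np

def merge_clusters(features, labels, max_size):
    """One bottom-up merge step with a diversity (cluster-size) constraint.

    Merges the two clusters whose centroids are closest, skipping candidate
    pairs whose combined size would exceed max_size. Returns the updated
    label array, or the input unchanged if no pair qualifies.
    """
    ids = np.unique(labels)
    cents = {i: features[labels == i].mean(axis=0) for i in ids}
    sizes = {i: int((labels == i).sum()) for i in ids}
    best, best_d = None, np.inf
    for a_idx in range(len(ids)):
        for b_idx in range(a_idx + 1, len(ids)):
            a, b = ids[a_idx], ids[b_idx]
            if sizes[a] + sizes[b] > max_size:   # diversity: too large to merge
                continue
            d = np.linalg.norm(cents[a] - cents[b])
            if d < best_d:
                best, best_d = (a, b), d
    if best is None:
        return labels
    labels = labels.copy()
    labels[labels == best[1]] = best[0]          # merge cluster b into cluster a
    return labels
```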
3. Network training phase
The clustered and merged images are put into the pedestrian re-identification network for training; the network chosen is SE-ResNet50, and training ends at convergence.
4. Comparison of Experimental results
Compared with existing unsupervised methods, the improved method clearly improves the final results. The Market-1501 and DukeMTMC-reID pedestrian re-identification data sets were used, and the method is evaluated with the cumulative matching characteristic (CMC) curve and mean average precision (mAP). The experimental results are shown in Tables 1 and 2.
Table 1: comparison of the presented method with current advanced methods on the two image-based data sets
Table 2: comparison with the exemplary methods on the two data sets
Claims (2)
1. An unsupervised K-means-based clustering and merging algorithm for image processing, characterized by comprising the following steps:
1) selecting a pedestrian re-identification data set;
2) designing an unsupervised learning basic framework, wherein a network for extracting image features comprises a convolutional layer, a maximum pooling layer and an activation layer;
3) in order to obtain groups of images with the most similar features, clustering is performed with the K-means algorithm, the distance between features being computed as the Euclidean distance d(x_i, c_j) = ||x_i − c_j||_2.
2. A network model based on SE-ResNet50 for pedestrian re-identification, characterized by comprising the following steps:
1) the feature network model adds the SE module to the convolutional neural network to avoid errors in the clustering process; within the SE network the SE module is the core component: it compresses a large number of recognized features so that only important feature information is processed, finally achieving effective feature extraction;
2) a "squeeze" operation is applied to the features one by one in the spatial dimension, replacing each input feature channel with a real number so that the real numbers have a global receptive field and correspond one-to-one with the input channels;
3) an excitation operation is applied to the features, calibrating a weight for every feature channel to express its importance;
4) according to the channel weights calibrated by the excitation operation, each channel of the initial features is weighted multiplicatively, thereby recalibrating the original features channel by channel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110530514.8A CN113255490A (en) | 2021-05-15 | 2021-05-15 | Unsupervised pedestrian re-identification method based on k-means clustering and merging |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113255490A true CN113255490A (en) | 2021-08-13 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110400349A (en) * | 2019-07-03 | 2019-11-01 | 成都理工大学 | Robot navigation tracks restoration methods under small scene based on random forest |
CN111860678A (en) * | 2020-07-29 | 2020-10-30 | 中国矿业大学 | Unsupervised cross-domain pedestrian re-identification method based on clustering |
CN112560604A (en) * | 2020-12-04 | 2021-03-26 | 中南大学 | Pedestrian re-identification method based on local feature relationship fusion |
CN112766237A (en) * | 2021-03-12 | 2021-05-07 | 东北林业大学 | Unsupervised pedestrian re-identification method based on cluster feature point clustering |
Non-Patent Citations (1)
Title |
---|
Wu Ke et al.: "Pedestrian Re-identification Based on Squeeze-and-Excitation Residual Network and Feature Fusion", Laser & Optoelectronics Progress *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20210813 |