CN113255490A - Unsupervised pedestrian re-identification method based on k-means clustering and merging - Google Patents
- Publication number: CN113255490A
- Application number: CN202110530514.8A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- identification
- features
- network
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
Abstract
The invention relates to the field of pedestrian re-identification in deep learning, and in particular to an unsupervised pedestrian re-identification method based on K-means clustering and merging, in which an unsupervised K-means clustering and merging algorithm is used for image processing and a network model based on SE-ResNet50 is used for pedestrian re-identification. The method comprises the following steps: selecting a pedestrian re-identification data set; constructing a pedestrian re-identification model comprising a feature extraction network and a connection prediction network; clustering and merging the extracted feature images with the K-means algorithm; and finally training the fusion network on the clustered images to obtain the pedestrian re-identification result. The method obtains good results on the two pedestrian re-identification data sets Market-1501 and DukeMTMC-reID, and is of some help to pedestrian re-identification in the field of unsupervised learning.
Description
Technical Field
The invention relates to pedestrian re-identification in deep learning: SE-ResNet50 is used to extract image features, which are clustered with k-means before training. The invention belongs to the field of computer vision.
Background
Pedestrian re-identification is a popular research direction in computer vision in recent years: the technology of using computer vision to search for and identify a specific pedestrian across cameras and scenes. Pedestrian re-identification has good applications in criminal investigation, medical imaging, security, and smart living.
Thanks to improvements in computer hardware and the rapid growth of computing power, deep learning has entered a golden period of development and produced many excellent results. Pedestrian re-identification has likewise flourished: deep neural networks extract basic low-level features and form abstract deep features, thereby discovering the internal rules of the data. Mainstream deep learning approaches to pedestrian re-identification use networks such as GoogLeNet and ResNet, first extracting pedestrian image features and then performing re-identification.
However, although conventional pedestrian re-identification methods achieve good results, they are supervised learning methods based on labeled classification data. Cross-domain unsupervised or semi-supervised pedestrian re-identification is still progressing slowly, mainly because of the following problems: 1) a model trained under supervised conditions and transferred directly to a target domain for testing suffers a large drop in performance; 2) domain adaptation is very difficult without id-labeled target-domain data.
Therefore, to address these difficulties, an improved unsupervised K-means clustering and merging pedestrian re-identification method is proposed. Its difference is that, on the basis of the original bottom-up clustering, an SE module is added to the convolutional neural network to extract pedestrian features; K-means clustering and merging is then performed on the extracted features so that pedestrian pictures with similar features are grouped into one class, and the model is retrained until it converges, at which point training stops.
Disclosure of Invention
The invention uses SE-ResNet-50 to extract image features, then performs K-means clustering and merging, and finally performs pedestrian re-identification. The overall network diagram is shown in FIG. 2.
The first step is as follows: SE-ResNet-50 extracts image abstract features.
We denote the input feature map size as H × W × C, where H, W, and C are the height, width, and number of channels of the input feature map, and u_c is the value at the corresponding position of channel c. The input feature map first passes through a 3×3 convolution and max pooling.
The last two layers of the convolutional neural network are combined with the SE module to form an SE-based pedestrian re-identification network structure. This makes full use of the SE module: it helps the convolutional neural network discriminate information, makes the importance of each channel easier to identify, enhances important features, and suppresses secondary ones.
After squeezing over the full spatial receptive field, the channel weight vector has size 1 × 1 × C. Multiplying these weights with the corresponding channels recalibrates the final features; this adds only a small number of parameters while letting the network better identify spatial features.
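The squeeze-excite-scale pipeline described above can be sketched in plain NumPy. This is a minimal illustration, not the patent's implementation: `w1, b1, w2, b2` stand in for the two learned fully connected layers of the excitation step.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration for a feature map x of shape (H, W, C).

    w1/b1 and w2/b2 are the two fully connected layers of the excitation step
    (shapes (C, C//r) and (C//r, C) for a reduction ratio r). The weights here
    are illustrative; in the patent's network they are learned during training.
    """
    H, W, C = x.shape
    # Squeeze: global average pooling -> one real number per channel (1 x 1 x C)
    z = x.mean(axis=(0, 1))                      # shape (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1)
    s = np.maximum(z @ w1 + b1, 0.0)             # ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))     # sigmoid
    # Scale: recalibrate each channel of the input by its weight
    return x * s.reshape(1, 1, C)
```

Note that the output has the same shape as the input: the block only re-weights channels, which is why it can be dropped into the last residual stages of ResNet-50 without changing the rest of the architecture.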
The second step is as follows: the extracted feature images are clustered with the K-means algorithm.
(1) Randomly select K elements from the extracted original feature data as the centers of the K clusters.
(2) Compute the distance from every element x_i to each of the K cluster centers c_j and assign the element to the nearest cluster, using the Euclidean distance d(x_i, c_j) = ||x_i − c_j||_2.
(3) Recompute the center of each cluster as the mean of its members: c_j = (1/|C_j|) Σ_{x∈C_j} x.
Here C_j denotes a cluster; K-means optimizes the result by minimizing the within-cluster sum of squared distances J = Σ_j Σ_{x∈C_j} ||x − c_j||².
(4) Repeat the last two steps until the iteration converges.
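Steps (1)-(4) above can be sketched as a small NumPy routine. This is a generic k-means sketch, not the patent's code; in the invention the input rows would be SE-ResNet50 feature vectors, but any vectors work here.

```python
import numpy as np

def kmeans(features, k, iters=100, seed=0):
    """Plain k-means over a (N, D) array of feature vectors.

    Returns (labels, centers), following steps (1)-(4) of the description.
    """
    rng = np.random.default_rng(seed)
    # (1) pick k elements at random as the initial cluster centers
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # (2) assign every element to its nearest center (Euclidean distance)
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # (3) recompute each center as the mean of its cluster members
        new_centers = np.array([features[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        # (4) stop once the iteration has converged
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```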
The third step is as follows: the unsupervised clustered pedestrian images are trained by the fusion network.
Because the clustering-based pedestrian re-identification task is more complex, a deeper network is required for training; in theory, the generalization ability of a convolutional neural network improves as the number of layers increases [18]. In practice, however, deeper convolutional neural networks suffer from model degradation, chiefly because gradients vanish as they propagate between layers. To better improve the model's recognition of input pictures, an SENet module is introduced into the convolutional neural network.
The fourth step is as follows: the performance of the model is evaluated on the test set.
The method of the present invention is evaluated using the Cumulative Matching Characteristic (CMC) curve and mean average precision (mAP). The CMC curve is a common evaluation criterion in pedestrian re-identification. Let the number of searched pedestrian pictures be N; each query is compared against the candidate set, the gallery images in the set are sorted by distance to the query, the ranking result is expressed as (r1, r2, ..., rN), and the CMC curve is obtained by counting the matches at each rank. The higher the correct pedestrian is ranked, the better the algorithm.
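The rank-counting procedure above can be sketched as follows. This is a simplified illustration: it assumes one correct gallery match per query identity and omits the camera-id filtering that a full Market-1501 evaluation would apply.

```python
import numpy as np

def cmc_curve(dist, query_ids, gallery_ids):
    """Cumulative Matching Characteristic from a query-gallery distance matrix.

    dist[i, j] is the distance between query i and gallery image j. The value
    at rank r (0-indexed) is the fraction of queries whose correct identity
    appears among the top r+1 ranked gallery images.
    """
    n_q, n_g = dist.shape
    hits = np.zeros(n_g)
    for i in range(n_q):
        order = np.argsort(dist[i])                  # sort gallery by distance
        ranked_ids = np.asarray(gallery_ids)[order]
        first_hit = np.flatnonzero(ranked_ids == query_ids[i])[0]
        hits[first_hit:] += 1                        # counts for all ranks >= hit
    return hits / n_q
```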
Drawings
FIG. 1 is a network flow diagram
FIG. 2 is a diagram of the network architecture based on SE modules
FIG. 3 is a basic framework diagram for unsupervised learning
FIG. 4 is a cluster merging diagram
FIG. 5 is a graph showing the experimental results
FIG. 6 is a graph showing the experimental results
Detailed Description
An SE-ResNet50 network is built with the PyTorch deep learning framework under CentOS 6.0, with an NVIDIA GTX 1080Ti as the hardware configuration and the Market-1501 and DukeMTMC-reID pedestrian re-identification data sets as the experimental data. In the first stage, training is run with epochs = 20, batch_size = 16, drop = 0.5, mp = 0.05, and lambda = 0.005. SGD optimization is then performed with momentum = 0.9; the learning rate is lr = 0.1 for the first 15 epochs and lr = 0.01 for the last 5 epochs. We tested on the Market-1501 and DukeMTMC data sets, and training took 39 hours.
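The piecewise learning-rate schedule above (lr = 0.1 for the first 15 epochs, then 0.01 for the remaining 5) can be written as a small helper; the function and parameter names are mine, not the patent's.

```python
def learning_rate(epoch, base_lr=0.1, drop_lr=0.01, drop_at=15):
    """Step schedule for SGD: base_lr until epoch drop_at, then drop_lr.

    epoch is 0-indexed, so epochs 0..14 use 0.1 and epochs 15..19 use 0.01
    for the 20-epoch run described in the experiments.
    """
    return base_lr if epoch < drop_at else drop_lr
```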
1. Feature extraction stage
SE-ResNet50 helps the convolutional neural network improve its ability to recognize information. Each feature channel is compressed into one real number that has a global receptive field; after all channels are compressed, the resulting set of real numbers has one entry per feature channel, each representing the corresponding global feature channel.
Next the excitation function is introduced: global average pooling over the H × W × C input yields a 1 × 1 × C feature map that at this stage has a global receptive field. The excitation operation then transforms this result nonlinearly through a fully connected neural network, and the output is multiplied onto the input features as per-channel weights.
2. Image feature clustering stage
As shown in FIG. 4, k feature points are randomly selected as initial features; the distance between each feature and the selected features is computed in turn, and each feature point is assigned to the nearest (i.e., most similar) feature point, finally yielding k clusters. The pedestrian features are partitioned into spherical clusters so as to maximize the diversity of the pedestrian categories. In (b), clusters are merged so that features embedded in the same spherical cluster grow closer and closer. In (c), the upper half of the sphere shows cluster merging without diversity regularization: (point 1, point 3) and (point 4, point 8) have the shortest distances and are each merged into one cluster. The lower half shows cluster merging with diversity regularization: although the distance between the yellow and green clusters is shortest, those two clusters are too large to merge, so point 6 and point 7 are merged instead.
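One merge step with the diversity constraint described above can be sketched as follows. This is my reading of FIG. 4 (b)/(c), not the patent's exact procedure: `max_size` stands in for the size cap that keeps two large clusters (the yellow/green case) from merging, and the patent's precise distance and regularizer may differ.

```python
import numpy as np

def merge_clusters(features, labels, max_size):
    """One bottom-up merge step with a diversity (cluster-size) constraint.

    Merges the two clusters whose centroids are closest, skipping candidate
    pairs whose combined size would exceed max_size. Returns the updated
    label array, or the input unchanged if no pair qualifies.
    """
    ids = np.unique(labels)
    cents = {i: features[labels == i].mean(axis=0) for i in ids}
    sizes = {i: int((labels == i).sum()) for i in ids}
    best, best_d = None, np.inf
    for a_idx in range(len(ids)):
        for b_idx in range(a_idx + 1, len(ids)):
            a, b = ids[a_idx], ids[b_idx]
            if sizes[a] + sizes[b] > max_size:   # diversity: too large to merge
                continue
            d = np.linalg.norm(cents[a] - cents[b])
            if d < best_d:
                best, best_d = (a, b), d
    if best is None:
        return labels
    labels = labels.copy()
    labels[labels == best[1]] = best[0]          # merge cluster b into cluster a
    return labels
```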
3. Network training phase
The clustered and merged images are put into the pedestrian re-identification network for training; the network chosen is SE-ResNet50, and training ends at convergence.
4. Comparison of Experimental results
Compared with existing unsupervised methods, the improved method clearly improves the final results. The Market-1501 and DukeMTMC-reID pedestrian re-identification data sets were used, and the method is evaluated with the cumulative matching characteristic (CMC) curve and mean average precision (mAP). The experimental results are shown in Tables 1 and 2.
Table 1: comparison of the presented method with current advanced methods on the two image-based data sets
Table 2: comparison with the exemplary methods on the two data sets
Claims (2)
1. An unsupervised K-means-based clustering and merging algorithm for image processing, characterized by comprising the following steps:
1) selecting a pedestrian re-identification data set;
2) designing an unsupervised learning basic framework, wherein a network for extracting image features comprises a convolutional layer, a maximum pooling layer and an activation layer;
3) in order to obtain groups of images with the most similar features, clustering is performed with the K-means algorithm, the distance between features being computed as the Euclidean distance d(x_i, c_j) = ||x_i − c_j||_2.
2. A network model based on SE-ResNet50 for pedestrian re-identification, characterized by comprising the following steps:
1) the feature network model adds the SE module to the convolutional neural network to avoid errors in the clustering process; within the SE network the SE module is the core component: it compresses a large number of recognized features so that only important feature information is processed, finally achieving effective feature extraction;
2) a "squeeze" operation is applied to the features one by one in the spatial dimension, replacing each input feature channel with a real number so that the real numbers have a global receptive field and correspond one-to-one with the input channels;
3) an excitation operation is applied to the features, calibrating a weight for every feature channel to express its importance;
4) according to the channel weights calibrated by the excitation operation, each channel of the initial features is weighted multiplicatively, thereby recalibrating the original features channel by channel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110530514.8A CN113255490A (en) | 2021-05-15 | 2021-05-15 | Unsupervised pedestrian re-identification method based on k-means clustering and merging |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113255490A true CN113255490A (en) | 2021-08-13 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110400349A (en) * | 2019-07-03 | 2019-11-01 | 成都理工大学 | Robot navigation tracks restoration methods under small scene based on random forest |
CN111860678A (en) * | 2020-07-29 | 2020-10-30 | 中国矿业大学 | Unsupervised cross-domain pedestrian re-identification method based on clustering |
CN112560604A (en) * | 2020-12-04 | 2021-03-26 | 中南大学 | Pedestrian re-identification method based on local feature relationship fusion |
CN112766237A (en) * | 2021-03-12 | 2021-05-07 | 东北林业大学 | Unsupervised pedestrian re-identification method based on cluster feature point clustering |
Non-Patent Citations (1)
Title |
---|
Wu Ke et al.: "Pedestrian Re-identification Based on Squeeze-and-Excitation Residual Network and Feature Fusion", Laser & Optoelectronics Progress *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20210813 |