CN117612201B - Single-sample pedestrian re-identification method based on feature compression - Google Patents

Info

Publication number
CN117612201B
Authority
CN
China
Prior art keywords
pedestrian
image
sample
recognition
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311371401.3A
Other languages
Chinese (zh)
Other versions
CN117612201A (en
Inventor
吕泽
董彦斌
王进
徐嘉玲
王可
赵颖钏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University
Priority to CN202311371401.3A
Publication of CN117612201A
Application granted
Publication of CN117612201B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a single-sample pedestrian re-identification method based on feature compression, comprising the following steps: first, adversarially generated images are produced from the labeled images in a standard pedestrian re-identification dataset; then, the generated images and the unlabeled images are fed into a single-sample pedestrian re-identification network to obtain a distance matrix, and a certain number of the highest-scoring images are selected and assigned pseudo labels; next, pedestrian images are selected and fed into the network for training, during which feature compression is applied to the images and, in combination with the loss functions, a well-performing single-sample re-identification network is trained; finally, the target pedestrian is identified. In this method, unlabeled images are added to the model step by step, overfitting is avoided by using the adversarially generated images, and the influence of image noise is reduced by feature compression, thereby improving the recognition accuracy of the model.

Description

Single-sample pedestrian re-identification method based on feature compression
Technical Field
The invention relates to the technical field of artificial intelligence and computer vision, in particular to a single-sample pedestrian re-identification method based on feature compression.
Background
Pedestrian re-identification (Re-ID) is the task of finding the same pedestrian among a large number of pedestrian images captured by multiple non-overlapping cameras. The technology is widely applied in fields such as video surveillance and disease tracking. With the rapid development of deep learning, deep-learning-based pedestrian re-identification has become the mainstream approach.
Early Re-ID models relied primarily on fully supervised learning, which requires multiple labeled samples of each pedestrian to ensure recognition accuracy. Large-scale manual labeling is expensive and can be unreliable, especially when labels are ambiguous. Furthermore, collecting private information such as pedestrians' faces and locations carries a risk of privacy disclosure. Without a large amount of labeled data, however, model performance may degrade severely, especially in practical applications where only a few samples, or even a single sample, are available per pedestrian.
Single-sample pedestrian re-identification has therefore become a new technical problem in the industry, and research on single-sample methods is particularly important. Such methods aim to optimize the performance of the Re-ID model when little data is available, thereby extending the range of applications of Re-ID models.
Although existing single-sample pedestrian re-identification methods offer some solutions, their accuracy is still not high enough. The main reason is that these schemes fail to avoid the uncertainty introduced by unlabeled images, which slows gradient descent and occasionally leaves training stuck. To solve these problems, the invention provides a single-sample pedestrian re-identification method based on feature compression, in which unlabeled images are gradually added to the model, images generated by CycleGAN are used to avoid overfitting, and feature compression is used to reduce the influence of the noise these images introduce, thereby improving the recognition accuracy of the model.
The method differs clearly from prior methods: it uses GAN-based generation to obtain more training data and so avoid overfitting, and it further provides three feature compression methods that suppress the noise introduced by the generated images and force the model to learn more effective image features. Meanwhile, because erroneous pseudo labels can cause severe interference in single-sample pedestrian re-identification, the method proposes a weighted hard-sample triplet loss, which addresses the problem that the model cannot reach its optimum when it is misled by incorrect pseudo labels.
Disclosure of Invention
The purpose of the invention is to provide a single-sample pedestrian re-identification method based on feature compression that improves the model's ability to recognize unlabeled samples and thereby improves its recognition accuracy.
To achieve the above, the invention designs a single-sample pedestrian re-identification method based on feature compression, which performs the following steps S1 to S6 to complete the identification of a target pedestrian:
Step S1: select a standard pedestrian re-identification dataset composed of pedestrian images, the dataset comprising labeled images and unlabeled images;
Step S2: based on a CycleGAN network, construct an adversarial image generation module, and take the labeled images in the standard pedestrian re-identification dataset as input to generate a preset number of adversarially generated images;
Step S3: based on a ResNet network, introduce a feature compression method for data augmentation and construct a single-sample pedestrian re-identification network; take the images generated in step S2 and the unlabeled images in the dataset as input to obtain a distance matrix, and assign pseudo labels to the unlabeled images according to the distance matrix;
Step S4: randomly select P pedestrians and K images for each pedestrian, input them into the single-sample pedestrian re-identification network, and train the network by combining a weighted hard-sample triplet loss function with a label-smoothing-regularized cross-entropy loss function;
Step S5: if the specified number of training rounds has been reached or all images have been assigned pseudo labels, training is complete and the trained single-sample pedestrian re-identification network is obtained; otherwise, return to step S3 and continue training the network;
Step S6: apply the trained single-sample pedestrian re-identification network to identify the target pedestrian.
As a preferred technical scheme of the invention: the standard pedestrian re-identification dataset in step S1 is the Market-1501 dataset.
As a preferred technical scheme of the invention: the standard pedestrian re-identification dataset is represented as x = {l, u}, where l is the labeled image set and u is the unlabeled image set; the CycleGAN network described in step S2 trains two generators G and G': generator G takes a labeled image l as input and outputs an unlabeled image u', denoted G(l) = u'; generator G' takes an unlabeled image u as input and outputs a labeled image l', denoted G'(u) = l';
Correspondingly, two scorers (discriminators) S and S' are trained to judge the quality of the images output by the two generators: if the image u' output by generator G has no similar image in the unlabeled image set u, scorer S gives u' a preset low score; otherwise, if u' has a similar image in the unlabeled image set u, scorer S gives u' a preset high score;
If the image l' output by generator G' has no similar image in the labeled image set l, scorer S' gives l' a preset low score; otherwise, if l' has a similar image in the labeled image set l, scorer S' gives l' a preset high score.
As a preferred technical scheme of the invention: the feature compression method introduced in step S3 includes a feature stepping technique, which reduces the color values of an input image from an i-bit representation to a j-bit representation, expressed as follows:
$$X = \left\lfloor \frac{x}{2^{i-j}} \right\rfloor \times 2^{i-j}$$
where x is the single-channel input value of each pixel of the input image and X is the output value;
A value p in the range 0 to 1 is randomly generated for each image, and the feature compression applied to the image is determined by the value of p, expressed as follows:
where α1 and α2 are the boundary values.
As a preferred technical scheme of the invention: in step S4, competitive occlusion is applied to the images of the same pedestrian. Each image is divided into n mutually non-overlapping areas, and the areas of each image are occluded in turn, so that each image yields n occluded images. The occluding content is a random area of a random picture from the large-scale dataset ImageNet. A preset number of images is generated in this way and used as training samples for the single-sample pedestrian re-identification network.
As a preferred technical scheme of the invention: the weighted hard-sample triplet loss function L_Sig_Hard_TriLoss described in step S4 is as follows:
where A is the anchor sample, P is the positive sample, N is the negative sample, and α is the threshold parameter. The Euclidean distance is calculated as follows:
The feature operation is calculated as follows:
where f is the input feature, P is the positive sample, BN is the batch normalization operation, and the result is the output feature;
The cross-entropy loss function L_cross-entropy is as follows:
where num is the number of pedestrians, y is the true label of the pedestrian, p_i is the predicted identity probability, and q_i is as follows:
where ε is the error rate; substituting q_i into the cross-entropy loss function yields the label-smoothing-regularized cross-entropy loss function L_lsr of step S4;
The weighted hard-sample triplet loss function and the label-smoothing-regularized cross-entropy loss function are combined as follows:
$$L_{sum} = L_{Sig\_Hard\_Tri} + L_{lsr}$$
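The expressions for q_i and L_lsr are not reproduced in this text. As a hedged sketch only, the following Python function assumes the label-smoothing form commonly used in re-identification work, in which the true class receives 1 − ε(num − 1)/num and every other class receives ε/num. The function name, the use of NumPy, and the default ε = 0.1 are assumptions for illustration and are not taken from the patent.

```python
import numpy as np

def label_smoothing_cross_entropy(logits, target, epsilon=0.1):
    """Cross entropy against a smoothed target distribution q (assumed form)."""
    num = logits.shape[0]                      # number of pedestrian identities
    shifted = logits - logits.max()            # stabilise the softmax numerically
    log_p = shifted - np.log(np.exp(shifted).sum())
    q = np.full(num, epsilon / num)            # assumed: epsilon/num for wrong classes
    q[target] = 1.0 - epsilon * (num - 1) / num
    return float(-(q * log_p).sum())

logits = np.array([2.0, 0.0, 0.0])
plain_loss = label_smoothing_cross_entropy(logits, target=0, epsilon=0.0)
smoothed_loss = label_smoothing_cross_entropy(logits, target=0, epsilon=0.1)
```

With ε = 0 the function reduces to the ordinary cross entropy; with ε > 0 a confident correct prediction is penalised slightly more, which is the regularizing effect label smoothing is meant to provide.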
Beneficial effects: the advantages of the present invention over the prior art include:
The invention provides a single-sample pedestrian re-identification strategy based on feature compression. It continuously shrinks the set of unlabeled images through progressive labeling, and trains the model jointly with the proposed feature compression method and the weighted hard-sample triplet loss, so that all data are eventually trained as labeled images, thereby improving the accuracy of single-sample pedestrian re-identification.
Specifically, the invention designs a single-sample recognition strategy for pedestrian re-identification with progressive labeling as the main strategy. It uses ResNet as the backbone network and uses CycleGAN to supplement the number of images, which avoids the overfitting caused by a small dataset. To avoid destabilizing the model when local regions of the generated and unlabeled images are overemphasized, a further processing module is added. In addition, a feature compression method is introduced for data augmentation, and supervised training uses the label-smoothing-regularized cross-entropy loss jointly with the weighted hard-sample triplet loss. With this design, samples do not mislead the model through overemphasized local information. Moreover, the model can cope effectively with the larger errors caused by the unavoidable misclassifications that occur while training on the single-sample pedestrian re-identification task, so that gradient descent remains effective throughout training, which ultimately improves the robustness and recognition accuracy of the model.
On the public Market-1501 dataset, the method achieves a mean average precision (mAP) of 82.1% and a Rank-1 accuracy of 90.3%. This accuracy is higher than that of most existing single-sample pedestrian re-identification models; the method extracts features more effectively and improves the recognition accuracy of the network.
Drawings
FIG. 1 shows some of the pedestrian images in the Market-1501 dataset;
FIG. 2 is a schematic diagram of a prior-art single-sample recognition model for pedestrian re-identification;
FIG. 3 is a network framework diagram of the single-sample pedestrian re-identification method based on feature compression provided in accordance with an embodiment of the present invention;
FIG. 4 is a flow chart of the single-sample pedestrian re-identification method based on feature compression provided in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of RGB image pixel feature stepping provided in accordance with an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Fig. 1 shows some of the pedestrian images in the Market-1501 dataset, which was captured by 6 cameras and cropped using the Deformable Part Model (DPM) pedestrian detector; it includes 32668 images of 1501 pedestrians. Of these, 12936 images of 751 persons are used as the training set, and 19732 images of 750 persons are used as the test set. The test set is further divided into 3368 query images and 19364 gallery images.
Compared with the traditional Re-ID task, single-sample Re-ID requires only a very small amount of label information. Specifically, in single-sample Re-ID each class has only one labeled sample, and the rest are unlabeled. During the iterative training process, some of the unlabeled samples are selected and given pseudo labels for training. The Re-ID model is then updated with the data carrying real labels and pseudo labels. However, the results obtained in this way still fall short of fully supervised learning. This is mainly because the single-sample pedestrian re-identification task faces three challenges: 1) how to screen unlabeled samples for pseudo-label assignment; 2) how to formulate an effective loss function for semi-supervised learning; 3) how to avoid the overfitting caused by data scarcity.
In recent years, a number of single-sample recognition methods for pedestrian re-identification have been proposed to address the low model accuracy in single-sample recognition. One example is a scheme that gradually adds pseudo-labeled samples and generates images on a per-camera basis, thereby improving the accuracy of model identification. However, because its picture generation is camera-based, the generated pictures match the cameras only in quantity and numbering. The generated pictures are also not processed effectively, so their features are difficult to predict. And although the loss function is tailored to the task, it only optimizes the anchor of each picture during training and does not consider the interference that image misclassification may cause, so the accuracy is still not high enough. The model structure is shown in Fig. 2.
To address the low accuracy of current single-sample pedestrian re-identification, the invention provides a single-sample recognition method based on feature compression (Feature Squeezing Re-ID, FS Re-ID). The method generates, directly from the given labeled pictures, more pictures of each sample with dispersed features, which alleviates the shortage of training data in single-sample pedestrian re-identification; it applies feature compression to both the generated and the original pictures; and it designs a weighted hard-sample triplet loss to handle the image misclassifications that occur in the task, thereby improving the recognition accuracy of the model.
The single-sample pedestrian re-identification method based on feature compression provided by the embodiment of the invention, referring to Fig. 3 and Fig. 4, performs the following steps S1 to S6 to complete the identification of the target pedestrian:
Step S1: select a standard pedestrian re-identification dataset composed of pedestrian images. The dataset is the Market-1501 dataset; each pedestrian image is composed of the three primary colors red, green and blue and has three channels, one per primary color. The dataset comprises labeled images and unlabeled images;
All the labeled images are fed into the model constructed in the following steps, and the network is first trained directly on this small set of images, so that the model performs reasonably well from the start and model optimization is accelerated.
Step S2: based on a CycleGAN network, construct an adversarial image generation module, and take the labeled images in the standard pedestrian re-identification dataset as input to generate a preset number of adversarially generated images, so as to ensure the diversity of the images and improve the generalization ability of the model;
The standard pedestrian re-identification dataset is represented as x = {l, u}, where l is the labeled image set and u is the unlabeled image set; the CycleGAN network described in step S2 trains two generators G and G': generator G takes a labeled image l as input and outputs an unlabeled image u', denoted G(l) = u'; generator G' takes an unlabeled image u as input and outputs a labeled image l', denoted G'(u) = l';
Correspondingly, two scorers (discriminators) S and S' are trained to judge the quality of the images output by the two generators: if the image u' output by generator G has no similar image in the unlabeled image set u, scorer S gives u' a preset low score; otherwise, if u' has a similar image in the unlabeled image set u, scorer S gives u' a preset high score;
If the image l' output by generator G' has no similar image in the labeled image set l, scorer S' gives l' a preset low score; otherwise, if l' has a similar image in the labeled image set l, scorer S' gives l' a preset high score.
Step S3: based on a ResNet network, introduce a feature compression method for data augmentation and construct a single-sample pedestrian re-identification network; take the images generated in step S2 and the unlabeled images in the dataset as input to obtain a distance matrix, and assign pseudo labels to the unlabeled images according to the distance matrix;
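As a hedged illustration of the pseudo-labeling idea in step S3, the sketch below builds a Euclidean distance matrix between unlabeled and labeled features and labels the unlabeled samples that lie closest to a labeled one. The function name, the nearest-neighbour rule, and the interpretation of "highest score" as "smallest distance" are assumptions for illustration; the patent text does not spell out the selection rule.

```python
import numpy as np

def assign_pseudo_labels(labeled_feats, labeled_ids, unlabeled_feats, num_select):
    """Label the num_select unlabeled samples nearest to any labeled sample."""
    # Distance matrix: rows are unlabeled samples, columns are labeled samples.
    diff = unlabeled_feats[:, None, :] - labeled_feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    nearest = dist.argmin(axis=1)        # closest labeled sample per row
    nearest_dist = dist.min(axis=1)
    # Assumed rule: the smallest distances are the most reliable candidates.
    chosen = np.argsort(nearest_dist)[:num_select]
    return chosen, labeled_ids[nearest[chosen]], dist

labeled_feats = np.array([[0.0, 0.0], [10.0, 0.0]])
labeled_ids = np.array([7, 9])
unlabeled_feats = np.array([[1.0, 0.0], [9.0, 1.0], [4.0, 6.0]])
chosen, pseudo_ids, dist = assign_pseudo_labels(
    labeled_feats, labeled_ids, unlabeled_feats, num_select=2
)
```

In a progressive scheme such as the one described here, `num_select` would grow from round to round so that more unlabeled images receive pseudo labels as the model improves.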
The feature compression method introduced in step S3 includes a feature stepping technique, whose idea is to reduce the color values of the input image from their original i-bit representation to a j-bit representation; this technique is applied only before the image is input to ResNet. A computer supports only discrete representations as approximations of continuous natural data. A standard digital image is represented by an array of pixels, each typically represented as three numbers for red, green and blue. Since Market-1501 is the test dataset, two common representations are considered here: 8-bit grayscale and 24-bit color. A grayscale image provides 2^8 = 256 possible values for each pixel. The 8-bit value represents the intensity of the pixel, where 0 is black, 255 is white, and intermediate values are shades of gray. The 8-bit scale extends to color images with separate red, green and blue channels, giving 24 bits per pixel and 2^24 ≈ 16 million different colors. On the one hand, although a larger bit depth is generally preferred because it brings the displayed image closer to the natural one, a high color depth is generally not necessary to interpret the image. On the other hand, randomly reducing the bit depth expands the diversity of the samples and prevents small perturbations in the generated images from causing large changes in the model. Fig. 5 shows the same picture represented with different bit depths.
The original 8 bits per RGB channel can be reduced to fewer bits without significantly reducing human recognizability of the image. It is difficult to tell an original image with 8 bits of color per channel from one with as few as 3 bits of color depth, so a strategy of compressing to 3 to 5 bits can mitigate many adversarial examples while maintaining accuracy on legitimate examples;
To reduce the color values of the input image from a bit depth of i to a bit depth of j (1 ≤ j ≤ 7), the expression is as follows:
$$X = \left\lfloor \frac{x}{2^{i-j}} \right\rfloor \times 2^{i-j}$$
where x is the single-channel input value of each pixel of the input image and X is the output value. The input value x is divided by 2^(i-j) and rounded down; the resulting integer is then multiplied by 2^(i-j). This integer rounding operation steps the feature from i bits down to j bits. In the present invention, since the image is an RGB image, i = 8.
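As a minimal sketch of the feature stepping operation described above, the following Python function applies the divide, round down, multiply sequence to every channel value of an image array. The function name `reduce_bit_depth` and the use of NumPy are assumptions for illustration, not part of the patent text.

```python
import numpy as np

def reduce_bit_depth(image, j, i=8):
    """Step i-bit channel values down to j significant bits."""
    if not 1 <= j <= i:
        raise ValueError("j must satisfy 1 <= j <= i")
    step = 2 ** (i - j)
    # floor(x / 2^(i-j)) * 2^(i-j), applied to every channel value
    stepped = image.astype(np.int64) // step * step
    return stepped.astype(image.dtype)

pixel = np.array([[[200, 17, 255]]], dtype=np.uint8)  # one RGB pixel
squeezed = reduce_bit_depth(pixel, j=3)                # step size 32
```

With j = 3 the step size is 2^5 = 32, so the pixel (200, 17, 255) becomes (192, 0, 224): only 8 distinct levels remain per channel, while the values stay on the original 0 to 255 scale.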
The degree of compression is controlled randomly and independently for each picture by generating a random number. Specifically, for the strategy of compressing pictures to 3 to 5 bits, the invention selects α1 and α2 as boundary values. These two values are hyperparameters set before the experiment; in the present invention, α1 is set to 0.3 and α2 is set to 0.6.
For each picture, a value p in the range 0 to 1 is randomly generated, and the feature compression applied to the picture is determined by the value of p, expressed as follows:
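The piecewise expression itself is not reproduced in this text, so which interval of p maps to which bit depth is unknown. The sketch below shows one hypothetical mapping that is consistent with the stated boundaries α1 = 0.3 and α2 = 0.6 and with the stated 3 to 5 bit range; the assignment of 3, 4 and 5 bits to the three intervals is an assumption, not the patent's rule.

```python
import random

ALPHA_1 = 0.3  # boundary values given in the text
ALPHA_2 = 0.6

def choose_bit_depth(p, alpha_1=ALPHA_1, alpha_2=ALPHA_2):
    """Map p in [0, 1] to a target bit depth (hypothetical mapping)."""
    if p < alpha_1:
        return 3   # assumed: strongest compression
    if p < alpha_2:
        return 4   # assumed: intermediate compression
    return 5       # assumed: mildest compression

rng = random.Random(0)                       # seeded only for repeatability
depths = [choose_bit_depth(rng.random()) for _ in range(5)]
```

Each image would draw its own p, so different images in the same batch are stepped to different bit depths.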
Step S4: randomly select P pedestrians and K images for each pedestrian, input them into the single-sample pedestrian re-identification network, and train the network by combining a weighted hard-sample triplet loss function with a label-smoothing-regularized cross-entropy loss function;
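The P × K selection in step S4 can be sketched as a small sampler. This is an illustrative sketch only: the function name is hypothetical, and skipping identities that have fewer than K images is an assumption (the patent does not say how such identities are handled).

```python
import random
from collections import defaultdict

def sample_pk_batch(labels, P, K, rng):
    """Return indices of K images for each of P randomly chosen identities."""
    by_id = defaultdict(list)
    for index, pid in enumerate(labels):
        by_id[pid].append(index)
    # Assumption: only identities with at least K images are eligible.
    eligible = [pid for pid, indices in by_id.items() if len(indices) >= K]
    batch = []
    for pid in rng.sample(eligible, P):
        batch.extend(rng.sample(by_id[pid], K))
    return batch

labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3]   # real or pseudo identity labels
batch = sample_pk_batch(labels, P=2, K=3, rng=random.Random(0))
```

Early in training many identities have only one labeled image, so in practice the sampler would draw from the generated and pseudo-labeled images as well.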
In this embodiment, the image is also feature-compressed by competitive occlusion; specifically, this technique is applied before the image is input to ResNet and after feature stepping. A major problem of single-sample pedestrian re-identification is that there are too few training images: once the model fixates on local features of the images, it cannot effectively recognize their overall features, and it falls into a local or per-batch optimum instead of reaching the optimum over the whole training stage. We therefore occlude the picture locally. However, the quality of the occlusion strongly affects the result of feature compression and hence the training result of the model. The invention therefore provides a competitive occlusion method: a picture is randomly occluded, the original picture and all the occluded pictures generated from it are input into the rest of the network, the model's scores for them are obtained, and one picture is selected. Partial competitive occlusion forces the model to learn the features of other, wider areas, which reduces the probability of the model falling into a local or per-batch optimum so that it can reach the global optimum.
Specifically, in step S4, for the images of the same pedestrian, competitive occlusion divides each image into n mutually non-overlapping areas and occludes the areas of each image in turn, so that each image yields n occluded images. The occluding content is a random area of a random picture from the large-scale dataset ImageNet, which increases the randomness of the pictures and thus strengthens the generalization ability of the model. A preset number of images is generated in this way to serve as training samples for the single-sample pedestrian re-identification network. The original picture pic and all of its occluded pictures pic_i are then input into the model, all pictures are scored, the Euclidean distance is calculated, and the picture with the lowest score is selected. The formula is as follows:
where pic is the current picture, pic_i is the i-th competing picture, and Win(pic) is the selected, logically optimal picture. The Euclidean distance is calculated as follows:
where dim is the dimensionality of the final feature, and i and j are two different features.
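The selection formula is not reproduced in this text, so the following is only a hedged sketch of competitive occlusion under stated assumptions: the n areas are taken to be horizontal stripes, `patch_source` stands in for a random ImageNet picture, `extract_features` is a hypothetical stand-in for the network, and the "lowest score" is interpreted as the occluded picture whose feature lies farthest (in Euclidean distance) from the original's feature.

```python
import numpy as np

def occlusion_candidates(image, n, patch_source, rng):
    """Occlude each of n horizontal stripes in turn (stripe layout assumed)."""
    height, width = image.shape[:2]
    bounds = np.linspace(0, height, n + 1).astype(int)
    candidates = []
    for top, bottom in zip(bounds[:-1], bounds[1:]):
        stripe_h = bottom - top
        # Random crop of the patch source, same size as the stripe.
        y0 = rng.integers(0, patch_source.shape[0] - stripe_h + 1)
        x0 = rng.integers(0, patch_source.shape[1] - width + 1)
        occluded = image.copy()
        occluded[top:bottom] = patch_source[y0:y0 + stripe_h, x0:x0 + width]
        candidates.append(occluded)
    return candidates

def select_winner(image, candidates, extract_features):
    """Pick the candidate farthest from the original in feature space."""
    reference = extract_features(image)
    distances = [float(np.linalg.norm(extract_features(c) - reference))
                 for c in candidates]
    return int(np.argmax(distances)), distances

rng = np.random.default_rng(0)
image = np.zeros((8, 4, 3), dtype=np.uint8)
patch_source = np.full((16, 16, 3), 255, dtype=np.uint8)
row_weights = np.arange(1, 9)  # toy feature: lower rows weigh more

def toy_features(img):
    return img.astype(float).mean(axis=(1, 2)) * row_weights

candidates = occlusion_candidates(image, 4, patch_source, rng)
winner, distances = select_winner(image, candidates, toy_features)
```

In this toy setup the feature weights the bottom rows most heavily, so occluding the last stripe changes the feature the most and that candidate is selected.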
Unlike ordinary random occlusion, competitive occlusion occludes a picture randomly several times and then inputs the original picture and all occluded pictures into the network to compute their features. Unlike the strategy used after CycleGAN training, which selects the picture with the higher score, competitive occlusion always selects the picture with the lower score: because every occluded picture derives from the original picture, a lower score usually means that more critical information has been occluded. This kind of occlusion may hide some important information, but replacing a picture at a certain stage with the logically optimal picture forces the model to pay more attention to overall information and reduces the extent to which noisy features in the picture mislead the model.
Thus, on the one hand, occlusion forces the model to learn more, and wider, areas. On the other hand, although critical information may be hidden during occlusion, the model does not always attend to the areas that are actually useful for identification, so attention to unimportant areas is often unavoidable. Compared with the small number of adverse cases, the model is in most cases attending to unimportant areas; by occluding the area the model is currently attending to, competitive occlusion effectively forces the model to attend to other areas and to the truly important ones. In addition, by combining competitive occlusion with maximizing feature selection, the effect of occlusion is maximized, the proportion of features that may mislead the model is reduced, and the features are thereby compressed.
In this embodiment, the picture is feature-compressed by spatial smoothing; specifically, the features are compressed at the second BottleNeck of each ResNet layer. Although it takes time to compute, the method is efficient in that it requires no additional parameters. Local smoothing smooths each pixel using its neighboring pixels; depending on how the neighboring pixels are weighted, it can be designed as Gaussian smoothing, mean smoothing, or median smoothing.
In median smoothing, a median filter runs a sliding window over each pixel of the image and replaces the center pixel with the median of the neighboring pixels within the window. It does not reduce the number of pixels in the image; rather, it spreads pixel values across nearby pixels. In effect, the median filter squeezes features out of the sample by making adjacent pixels more similar.
The window size is a configurable parameter ranging from 1 to the image size. If it is set to the image size, the entire image is flattened to a single color. The present invention uses a square window with a side length of 3. Pixels on the edges have no real neighbors to fill the window, so a padding method is needed; of the several available, reflective padding is chosen in this embodiment, in which the image is mirrored along its edges where necessary to calculate the median of the window.
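The median smoothing described above can be sketched as follows. This is a minimal pure-Python illustration on a single 2-D channel, not the patented implementation, which applies the operation to feature maps inside the network. It assumes that "mirrored along its edges" means the border pixel itself is repeated in the mirror image; `reflect_index` and `median_smooth` are hypothetical names.

```python
from statistics import median


def reflect_index(i, n):
    """Mirror an out-of-range index back into [0, n) at the border.

    Valid as long as the window radius does not exceed n.
    """
    if i < 0:
        return -i - 1
    if i >= n:
        return 2 * n - i - 1
    return i


def median_smooth(image, window=3):
    """Median-filter a 2-D list of numbers with a square window."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a center pixel")
    r = window // 2
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbours = [
                image[reflect_index(y + dy, h)][reflect_index(x + dx, w)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = median(neighbours)
    return out


# A single outlier surrounded by a flat region is removed by the filter.
noisy = [[1, 1, 1],
         [1, 9, 1],
         [1, 1, 1]]
smoothed = median_smooth(noisy, window=3)
```

The output has the same height and width as the input, which matches the statement that the filter does not reduce the number of pixels.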
In step S4, the network is trained jointly with a weighted hard-sample triplet loss function and a label-smoothing-regularized cross-entropy loss function, both of which are commonly used in the field of pedestrian re-identification. Compared with the traditional triplet loss, the weighted hard-sample loss selects three more challenging images, which improves the feature extraction capability of the model, and it additionally applies a weight to each value;
P pedestrians are selected and K images are selected for each pedestrian, forming a batch of P × K images. In each training iteration, triplets are formed from this batch by selecting, for an anchor image, the image of the same pedestrian that is farthest from it and the image of a different pedestrian that is closest to it, so that the hardest sample pairs are used to calculate the loss function. The loss function expression is as follows:
Wherein A is the anchor sample, P is the positive sample, N is the negative sample, and α is the threshold parameter;
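A hard-sample triplet loss of this kind is commonly written as max(0, α + max d(A, P) − min d(A, N)), averaged over the anchors in the batch. The sketch below implements that common form; it is an assumption about the lost expression, not a transcription of it. The margin value `alpha=0.3` is illustrative, since the source does not state a value, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(features, labels, alpha=0.3):
    """Average of max(0, alpha + hardest_positive - hardest_negative).

    features: list of feature vectors for a batch of P x K images.
    labels:   pedestrian identity of each image, in the same order.
    alpha:    margin (threshold parameter); 0.3 is an illustrative value.
    """
    losses = []
    for a, (feat_a, label_a) in enumerate(zip(features, labels)):
        positives = [
            euclidean(feat_a, feat_b)
            for b, (feat_b, label_b) in enumerate(zip(features, labels))
            if label_b == label_a and b != a
        ]
        negatives = [
            euclidean(feat_a, feat_b)
            for feat_b, label_b in zip(features, labels)
            if label_b != label_a
        ]
        if not positives or not negatives:
            continue  # an anchor needs at least one positive and one negative
        losses.append(max(0.0, alpha + max(positives) - min(negatives)))
    return sum(losses) / len(losses) if losses else 0.0


# P = 2 pedestrians, K = 2 images each.
well_separated = batch_hard_triplet_loss(
    [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]], [0, 0, 1, 1])
interleaved = batch_hard_triplet_loss(
    [[0.0, 0.0], [4.0, 0.0], [1.0, 0.0], [5.0, 0.0]], [0, 0, 1, 1])
```

When the identities are well separated the loss is zero; when images of different pedestrians are interleaved, every anchor contributes a positive loss.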
However, because the sample labels in single-sample pedestrian re-identification contain errors, directly taking the anchor sample itself as the center of its label may introduce excessive error. Therefore, when the actual score of each sample is calculated, the center of the sample and its adjacent positive samples is always used as the anchor of that sample, which effectively avoids the large impact of misclassified samples.
On the other hand, when a negative sample is mistakenly treated as a positive sample, its features lie far from those of the true positive samples, so they are pulled rapidly toward the positive samples. As a result, the model can no longer recognize the negative samples within the positive group, which makes the model unstable.
Likewise, the negative samples in single-sample pedestrian re-identification contain errors, that is, positive samples that are mistaken for negative samples. Because the features of such samples are very close to those of the positive samples, the features of these positive samples are pushed rapidly toward the negative samples, which makes the model even less able to identify positive samples within the negative group. From the model's point of view the data then appear to fall into clearly separated categories, but in practice positive and negative samples are mixed within each category, which has a lasting effect on final performance.
In a single-sample pedestrian re-identification model, misclassification during training is in fact unavoidable. To mitigate the severe interference it causes, a coefficient is applied to each value. If a value is small, or close to the other values, then even if it is erroneous its effect is tolerable, because its influence on the model is small. If some values are too large, however, letting them modify the model parameters freely has a non-negligible effect on the model. Specifically, all data are batch-normalized and a sigmoid function is used to limit the values, expressed as follows:
Wherein f is the input feature, P is the positive sample, BN is the batch normalization operation, and the left-hand side of the expression is the output feature;
Finally, the weighted hard-sample triplet loss function L_Sig_Hard_TriLoss described in step S4 is obtained as follows:
Wherein A is the anchor sample, P is the positive sample, N is the negative sample, and α is the threshold parameter;
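The value-limiting step (batch normalization followed by a sigmoid) can be sketched as below. This is a hedged illustration only: it assumes a per-dimension batch normalization without learnable scale or shift, followed by an element-wise sigmoid, and it does not reproduce the exact expression of the weighted loss. The names `batch_norm`, `sigmoid`, and `squash_features` are hypothetical.

```python
import math


def batch_norm(features, eps=1e-5):
    """Normalise each feature dimension to zero mean and unit variance
    over the batch (no learnable scale or shift in this sketch)."""
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in features) / n for d in range(dims)
    ]
    return [
        [(f[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
        for f in features
    ]


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def squash_features(features):
    """Apply sigmoid(BN(f)) so that no single value can dominate."""
    return [[sigmoid(v) for v in row] for row in batch_norm(features)]


# The second dimension contains an extreme value (-1000.0).
batch = [[0.0, 100.0], [2.0, 300.0], [4.0, -1000.0]]
limited = squash_features(batch)
```

After this step every value lies strictly between 0 and 1, so an outlier caused by a wrong pseudo label contributes a bounded amount to the distances used in the triplet loss.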
The invention also adopts a cross-entropy loss function, the most commonly used loss function in pedestrian re-identification, which measures the difference between the model's prediction and the true label. The cross-entropy loss function L_cross-entropy is as follows:

$$L_{cross\text{-}entropy} = -\sum_{i=1}^{num} q_i \log(p_i)$$

Wherein num is the number of pedestrians, y is the true label of the pedestrian, p_i is the predicted identity probability output by the network, and q_i is as follows:

$$q_i = \begin{cases} 1, & i = y \\ 0, & i \neq y \end{cases}$$

That is, q_i is 1 when the predicted label i equals the pedestrian's true label y, and 0 otherwise.
To further improve the robustness of the model and avoid the overfitting that direct use of the cross-entropy loss function may cause, label smoothing is introduced, which assigns a small amount of weight to the incorrect labels, and q_i becomes:
Wherein ε is the error rate, generally set to 0.1; substituting this q_i into the cross-entropy loss function gives the label-smoothing-regularized cross-entropy loss function L_lsr of step S4;
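A minimal sketch of label-smoothing-regularized cross-entropy follows. It assumes the widely used form in which the true label receives 1 − ε + ε/num and every other label receives ε/num; the source's own expression for q_i is not reproduced in this text, so this form is an assumption. The function names are hypothetical.

```python
import math


def smoothed_targets(true_label, num, epsilon=0.1):
    """Smoothed target distribution q over num identities.

    Assumed form: q_y = 1 - epsilon + epsilon / num, q_i = epsilon / num otherwise.
    """
    off_value = epsilon / num
    return [
        1.0 - epsilon + off_value if i == true_label else off_value
        for i in range(num)
    ]


def label_smoothing_cross_entropy(probs, true_label, epsilon=0.1):
    """L_lsr = -sum_i q_i * log(p_i), with probs the predicted probabilities."""
    q = smoothed_targets(true_label, len(probs), epsilon)
    return -sum(q_i * math.log(p_i) for q_i, p_i in zip(q, probs))


# Four identities, true identity 0, epsilon = 0.1.
targets = smoothed_targets(0, 4, epsilon=0.1)
loss_smoothed = label_smoothing_cross_entropy([0.7, 0.1, 0.1, 0.1], 0, epsilon=0.1)
loss_plain = label_smoothing_cross_entropy([0.7, 0.1, 0.1, 0.1], 0, epsilon=0.0)
```

With ε = 0 the function reduces to the ordinary cross-entropy, −log(p_y). With ε > 0 the loss also penalizes predictions that assign almost no probability to the other identities, which discourages over-confident outputs.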
Combining the weighted hard-sample triplet loss function with the label-smoothing-regularized cross-entropy loss function gives samples a better clustering effect in the feature space and improves the accuracy of pedestrian re-identification. The joint loss function is as follows:
L_sum = L_Sig_Hard_Tri + L_lsr
Step S5: If the specified number of training rounds is reached, or all images have been assigned pseudo labels, training is complete and the trained single-sample pedestrian re-recognition network is obtained; otherwise, return to step S3 and continue training the single-sample pedestrian re-recognition network;
Step S6: Apply the trained single-sample pedestrian re-recognition network to complete recognition of the target pedestrian.
The test flow of the invention is as follows:
Step 1: Input a query set and a gallery set, then go to step 2;
Step 2: Use the model obtained in the training process to extract features from all pedestrian images of the query set and the gallery set input in step 1, then go to step 3;
Step 3: Calculate the similarity between the features of the query set and the features of the gallery set, then go to step 4;
Step 4: Obtain the matching result for each pedestrian image in the query set according to the similarity, then go to step 5;
Step 5: End.
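Steps 3 and 4 of the test flow can be sketched as a ranking of gallery features by similarity to a query feature. The sketch assumes cosine similarity, in line with the cosine distance reported in Example 1, and takes already-extracted feature vectors as input; `cosine_similarity` and `rank_gallery` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices ordered from most to least similar."""
    sims = [cosine_similarity(query_feat, g) for g in gallery_feats]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)


# One query feature matched against three gallery features.
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
ranking = rank_gallery(query, gallery)
```

The returned list is the matching result for one query image: the first index is the gallery image judged most likely to show the same pedestrian.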
Preferably, the query set in step 1 of the test flow represents the adversarial samples, and the gallery set represents the set of pedestrian images against which the query set is matched.
Preferably, in step 2 of the test flow, features of the test data are extracted by the backbone network alone, without the CycleGAN and feature compression processes, as shown in fig. 3.
Preferably, in step 4 of the test flow, each query image is matched with several images from the gallery set, and Rank-n and mean average precision (mAP) are used as evaluation metrics. Rank-n is the probability that a correct result appears among the top n images of the matching results, and mAP reflects the average retrieval performance of the method.
The following is a specific embodiment of the single-sample pedestrian re-identification method based on feature compression designed according to the invention:
Example 1:
This embodiment is evaluated on the test set of Market-1501, a widely used public dataset for pedestrian re-identification. Market-1501 contains 32,668 images of 1,501 person IDs, of which 751 person IDs serve as the training set and 750 person IDs serve as the test set.
This embodiment uses the Cumulative Matching Characteristics (CMC) curve and mean average precision (mAP) to evaluate the performance of the system. The CMC curve gives the correct matching accuracy at different ranks; Re-ID tasks typically report Rank-1, Rank-5, and Rank-10 to represent the CMC curve. mAP is the mean of the average precision (AP) over all query images.
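The two metrics can be sketched as follows. This is a simplified illustration: it takes, for each query, the ranked list of gallery identities, and it omits dataset-specific filtering such as excluding gallery images taken by the same camera as the query. The function names and the `results` format are assumptions made for the example.

```python
def average_precision(ranked_ids, query_id):
    """AP for one query: mean of precision@k over the ranks k of correct matches."""
    hits = 0
    precisions = []
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def rank_n_hit(ranked_ids, query_id, n):
    """True if a correct match appears among the top n results."""
    return query_id in ranked_ids[:n]


def evaluate(results, n_values=(1, 5, 10)):
    """results: list of (query_id, ranked_gallery_ids) pairs.

    Returns ({n: Rank-n accuracy}, mAP).
    """
    total = len(results)
    cmc = {
        n: sum(rank_n_hit(ranked, qid, n) for qid, ranked in results) / total
        for n in n_values
    }
    m_ap = sum(average_precision(ranked, qid) for qid, ranked in results) / total
    return cmc, m_ap


# Two queries: the first is matched at rank 1, the second only at rank 2.
toy_results = [(1, [1, 2, 1]), (2, [3, 2, 2])]
cmc, m_ap = evaluate(toy_results)
```

In the toy example Rank-1 is 0.5 because only the first query has a correct match in first place, while mAP also rewards the correct matches found lower in each list.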
In this embodiment, a ResNet network is selected as the backbone of the network; its parameters are pre-trained on ImageNet, and the stride of the last bottleneck block of the network is set to 1. For image preprocessing, three data augmentation methods are used: random cropping, horizontal flipping, and erasing. The label smoothing regularization rate is set to 0.1, and the image size is uniformly set to 256 × 128. The learning rate is initially set to 3.5 × 10⁻⁶, is increased linearly to 3.5 × 10⁻⁵ over the first ten epochs, and is thereafter reduced to 1/10 of the original at the 40th and 80th epochs, respectively. During testing, the pedestrian features after the BN layer are used for retrieval, and cosine distance is used to measure the distance between pedestrians. This embodiment uses the PyTorch 1.8.8 deep learning framework, accelerated with one NVIDIA 3090 GPU.
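The learning-rate schedule described above can be written as a small function. The sketch makes two assumptions that the text leaves open: epochs are counted from 0, and each of the two reductions divides the current rate by 10, so the rate after the 80th epoch is 1/100 of the peak. The function name and parameter names are hypothetical.

```python
def learning_rate(epoch, base_lr=3.5e-5, warmup_start=3.5e-6, warmup_epochs=10):
    """Learning rate for a 0-based epoch index.

    Linear warm-up from warmup_start to base_lr over warmup_epochs,
    then a step decay by a factor of 10 at epochs 40 and 80 (assumed cumulative).
    """
    if epoch < warmup_epochs:
        fraction = epoch / warmup_epochs
        return warmup_start + fraction * (base_lr - warmup_start)
    if epoch < 40:
        return base_lr
    if epoch < 80:
        return base_lr * 0.1
    return base_lr * 0.01


# Learning rate at a few representative epochs.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 5, 10, 40, 80)}
```

In a PyTorch training loop the same shape is usually obtained with a learning-rate scheduler attached to the optimizer; the plain function is used here only to make the schedule explicit.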
In this embodiment, the identification method of the invention is evaluated on the Market-1501 dataset. The performance of each method on the Market-1501 dataset is compared in Table 1.
Table 1. Comparison of the method of the invention with other methods on the Market-1501 dataset (%)
As can be seen from Table 1, among the existing methods, the first four adopt a transfer learning strategy and therefore achieve high accuracy, while the following three adopt single-sample learning and achieve slightly lower accuracy.
The method designed by the invention achieves new state-of-the-art performance on Market-1501, with an mAP of 43.4% and a Rank-1 of 75.3%. Compared with PSM, the most advanced single-sample learning method in the table, it improves mAP by 1.2% on Market-1501.
Compared with the transfer learning group, the invention obtains competitive results, with some improvement on both the mAP and Rank-1 metrics. The method and loss function proposed by the invention give the model a better ability to handle difficult data and prevent the model from being strongly disturbed, which accelerates convergence and improves the final accuracy of the model. The invention achieves accuracy on par with transfer learning while requiring fewer training labels, which demonstrates its efficiency.
In summary, when handling the single-sample pedestrian re-identification task, the single-sample learning method provided by the invention effectively copes with the unavoidable misclassification, reduces its influence on the model, and improves the recognition accuracy of the network.
Example 2:
This embodiment describes an applicable scenario of the invention.
When a crime occurs somewhere, the police may have only a few pictures of the criminal, for example only 3, and need to combine these pictures with the large amount of nearby surveillance footage to trace the criminal's movements and make an arrest. A manual search consumes a great deal of manpower and time, gives the criminal time to escape and possibly the time and opportunity to commit another crime, and harms the case clearance rate and public safety. If an ordinary pedestrian re-identification model is used, its effectiveness drops sharply because of the shift in picture style, and a large amount of manpower is still required for secondary verification.
By adopting the identification method provided by the invention, criminals can therefore be found quickly and efficiently.
First, pictures of people are cropped from the surveillance footage by computer, and a small number are labeled manually;
Then, the model is trained using its own strategy so that it adapts to the image style;
Next, the pictures of the criminal are input into the model to obtain an accurate identification result;
In this way, a relatively accurate result is obtained with only a small amount of manpower, which improves the efficiency with which police apprehend criminals and the public's sense of security.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (5)

1. A single-sample pedestrian re-recognition method based on feature compression, characterized by comprising the following steps S1 to S6 to complete recognition of a target pedestrian:
Step S1: selecting a standard dataset of pedestrian re-recognition formed by pedestrian images, wherein the standard dataset of pedestrian re-recognition comprises labeled images and unlabeled images;
Step S2: based on a CycleGAN network, constructing an adversarial picture generation module, and taking the labeled images in the standard dataset of pedestrian re-recognition as input to generate a preset number of adversarially generated images;
Step S3: based on a ResNet network, introducing a feature compression method to perform data augmentation, constructing a single-sample pedestrian re-recognition network, taking the images generated in step S2 and the unlabeled images in the standard dataset of pedestrian re-recognition as input to obtain a distance matrix, and assigning pseudo labels to the unlabeled images according to the distance matrix;
Step S4: randomly selecting P pedestrians, selecting K images for each pedestrian, inputting them into the single-sample pedestrian re-recognition network, and training the single-sample pedestrian re-recognition network jointly with a weighted hard-sample triplet loss function and a label-smoothing-regularized cross-entropy loss function;
The weighted hard-sample triplet loss function L_Sig_Hard_TriLoss described in step S4 is as follows:
wherein A is the anchor sample, P is the positive sample, N is the negative sample, α is the threshold parameter, and the Euclidean distance is calculated as follows:
The feature operation is calculated as follows:
wherein f is the input feature, P is the positive sample, BN is the batch normalization operation, and the left-hand side of the formula is the output feature;
The cross-entropy loss function L_cross-entropy is as follows:
wherein num is the number of pedestrians, y is the true label of the pedestrian, p_i is the predicted identity probability output by the network, and q_i is as follows:
wherein ε is the error rate, and substituting q_i into the cross-entropy loss function gives the label-smoothing-regularized cross-entropy loss function L_lsr of step S4;
the joint loss function combining the weighted hard-sample triplet loss function and the label-smoothing-regularized cross-entropy loss function is as follows:
L_sum = L_Sig_Hard_Tri + L_lsr
Step S5: if the specified number of training rounds is reached or all images have been assigned pseudo labels, training is complete and the trained single-sample pedestrian re-recognition network is obtained; otherwise, returning to step S3 and continuing to train the single-sample pedestrian re-recognition network;
Step S6: applying the trained single-sample pedestrian re-recognition network to complete recognition of the target pedestrian.
2. The single-sample pedestrian re-recognition method based on feature compression according to claim 1, wherein the standard dataset of pedestrian re-recognition in step S1 is the Market-1501 dataset.
3. The single-sample pedestrian re-recognition method based on feature compression according to claim 1, wherein the standard dataset of pedestrian re-recognition is expressed as x = {l, u}, where l is the labeled image set and u is the unlabeled image set; the CycleGAN network described in step S2 trains two generators G and G', wherein generator G takes a labeled image l as input and outputs an unlabeled image u', denoted G(l) = u'; and generator G' takes an unlabeled image u as input and outputs a labeled image l', denoted G'(u) = l';
two scoring devices S and S' are trained correspondingly to judge the quality of the images output by the two generators: if the image u' output by generator G has no similar image in the unlabeled image set u, scoring device S gives u' a preset low score; otherwise, if u' has a similar image in the unlabeled image set u, scoring device S gives u' a preset high score;
if the image l' output by generator G' has no similar image in the labeled image set l, scoring device S' gives l' a preset low score; otherwise, if l' has a similar image in the labeled image set l, scoring device S' gives l' a preset high score.
4. The single-sample pedestrian re-recognition method based on feature compression according to claim 1, wherein the feature compression method introduced in step S3 comprises a feature ladder technique, in which the color values of the input image, represented with i bits, are reduced to a representation with j bits, expressed as follows:
wherein X is a single-channel input value of each pixel of the input image, and X is an output value;
a value p in the range 0 to 1 is randomly generated for each image, and the feature compression applied to the image is determined according to the value of p, expressed as follows:
wherein α_1 and α_2 are boundary values.
5. The single-sample pedestrian re-recognition method based on feature compression according to claim 1, wherein in step S4 a competitive occlusion mode is applied to the images of the same pedestrian: each image is divided into n mutually non-overlapping areas, part of the areas in each image are occluded in turn so that each image generates n images, the occluding content is taken from random areas of random pictures in the large-scale dataset ImageNet, and a preset number of images are generated and used as training samples of the single-sample pedestrian re-recognition network.
CN202311371401.3A 2023-10-20 2023-10-20 Single-sample pedestrian re-identification method based on feature compression Active CN117612201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311371401.3A CN117612201B (en) 2023-10-20 2023-10-20 Single-sample pedestrian re-identification method based on feature compression

Publications (2)

Publication Number Publication Date
CN117612201A CN117612201A (en) 2024-02-27
CN117612201B true CN117612201B (en) 2024-05-28

Family

ID=89955049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311371401.3A Active CN117612201B (en) 2023-10-20 2023-10-20 Single-sample pedestrian re-identification method based on feature compression

Country Status (1)

Country Link
CN (1) CN117612201B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242064A (en) * 2020-01-17 2020-06-05 山东师范大学 Pedestrian re-identification method and system based on camera style migration and single marking
CN112131961A (en) * 2020-08-28 2020-12-25 中国海洋大学 Semi-supervised pedestrian re-identification method based on single sample
CN112668544A (en) * 2021-01-13 2021-04-16 昆明理工大学 Pedestrian re-identification method based on hard sample confusion and feature robustness enhancement
CN112906606A (en) * 2021-03-05 2021-06-04 南京航空航天大学 Domain-adaptive pedestrian re-identification method based on mutual divergence learning
CN114782997A (en) * 2022-05-12 2022-07-22 东南大学 Pedestrian re-identification method and system based on multi-loss attention adaptive network
CN116824695A (en) * 2023-06-07 2023-09-29 南通大学 Pedestrian re-identification non-local defense method based on feature denoising

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163234B (en) * 2018-10-10 2023-04-18 腾讯科技(深圳)有限公司 Model training method and device and storage medium

Also Published As

Publication number Publication date
CN117612201A (en) 2024-02-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant