CN115063836A - Pedestrian tracking and re-identification method based on deep learning - Google Patents

Pedestrian tracking and re-identification method based on deep learning

Info

Publication number
CN115063836A
CN115063836A
Authority
CN
China
Prior art keywords
pedestrian
matching
tracking
track
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210657848.6A
Other languages
Chinese (zh)
Inventor
王璇
宋永超
吕骏
王莹洁
徐金东
赵金东
阎维青
雷明威
李凯强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai University
Original Assignee
Yantai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai University
Priority to CN202210657848.6A
Publication of CN115063836A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

A pedestrian tracking and re-identification method based on deep learning comprises the following steps. Step 1: perform pedestrian target detection on the video images frame by frame. Step 2: extract features of the pedestrians detected in each frame of step 1 with a DeepSORT model and save them as an npy file. Step 3: perform pedestrian re-identification with FastReID, extracting features from a preset gallery of pedestrian pictures and saving them as an npy file. Step 4: compute the cosine similarity between the feature extraction result of each pedestrian target and the feature extraction result of the specific pedestrian gallery; if the similarity is greater than a threshold γ, the target is judged to be the specific pedestrian to be re-identified and the pedestrian is tracked; otherwise, no tracking is performed. The invention can accurately locate specific pedestrians across time, regions and cameras, supports inference and detection on real-time video, achieves its best effect through a series of improvements, has been deployed in practice, and can be widely applied to systems such as intelligent monitoring and intelligent security.

Description

Pedestrian tracking and re-identification method based on deep learning
Technical Field
The invention belongs to the technical field of intelligent monitoring and security protection, and particularly relates to a pedestrian tracking and re-identification method based on deep learning.
Background
With the development of science and technology, surveillance video is widely used in commerce, security, search and other fields, and plays a very important role in daily life. Pedestrian re-identification has grown alongside face recognition into one of the main directions of computer vision. Although face recognition technology is relatively mature, it cannot achieve ideal results under conditions such as dense crowds, low camera resolution and off-angle camera views; in such settings pedestrian re-identification can still play an important role, locating and identifying specific pedestrians in surveillance video in a timely manner, which is of great significance for criminal investigation, search and rescue of missing persons, and similar tasks.
To date, enterprises in the field of artificial intelligence at home and abroad have researched pedestrian re-identification intensively, and the technology currently faces the following difficulties and problems:
(1) Real-world pedestrians appear under complex, changeable conditions such as occlusion by obstacles, day-to-night lighting changes and clothing changes, which make high algorithm accuracy difficult to achieve.
(2) Cross-region identification raises safety and privacy concerns, data sets are hard to obtain, and building a highly robust model under sample imbalance is extremely challenging.
(3) When tracking across cameras, lighting, occlusion and camera sharpness all change with the camera; re-identifying the same target without being limited to a single tracking range is therefore a problem that pedestrian re-identification urgently needs to solve.
Disclosure of Invention
To overcome the above technical problems, the invention provides a pedestrian tracking and re-identification method based on deep learning, which combines an improved YOLOv5-Lite target detection algorithm with an improved DeepSORT target tracking algorithm, can accurately locate specific pedestrians across time, regions and cameras, supports inference and detection on real-time video, and brings the system model to its best effect through a series of improvements.
In order to achieve the purpose, the invention adopts the following technical scheme:
A pedestrian tracking and re-identification method based on deep learning comprises the following steps:
step 1: perform pedestrian target detection on the video images frame by frame with the improved YOLOv5-Lite model;
step 2: extract features of the pedestrians detected in each frame of step 1 with the DeepSORT model and save them as an npy file;
step 3: perform pedestrian re-identification with FastReID, extracting features from a preset gallery of pedestrian pictures and saving them as an npy file;
step 4: compute the cosine similarity between the feature extraction result of each pedestrian target from step 2 and the feature extraction result of the specific pedestrian gallery from step 3 according to formula (1), where x_1 and x_2 are two non-zero vectors; if the similarity is greater than a threshold γ, the target is judged to be the specific pedestrian to be re-identified and is tracked with the tracking strategy of the improved DeepSORT model; otherwise, no target tracking is performed.
cos(x_1, x_2) = (x_1 · x_2) / (‖x_1‖ · ‖x_2‖) (1)
Further, the step 1 comprises the following sub-steps:
step 1.1: input the pictures of the data set into the improved YOLOv5-Lite network; a BiFPN module is added on the basis of YOLOv5-Lite; BiFPN combines cross-scale bidirectional connections with fast normalized fusion, weighting the different feature inputs and letting the network learn the weights itself; the weights are normalized to between 0 and 1 by Softmax-based fusion, as shown in formula (2):
w_i′ = exp(w_i) / Σ_j exp(w_j) (2)
where w_i and w_j are learnable weights;
step 1.2: extract features of the picture with the convolutional neural network and output a feature map, divide the picture into small blocks and generate anchor boxes, associate the labeled prediction boxes with the feature map, and finally build a loss function and start end-to-end training; the loss function is shown in formula (3).
L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv (3)
where ρ(b, b^gt) denotes the Euclidean distance between the center points of the prediction box and the ground-truth box, c denotes the diagonal distance of the smallest enclosing region that contains both the prediction box and the ground-truth box, v measures the consistency of the relative proportions of the two rectangular boxes, and α is a weighting factor:
v = (4/π²) · (arctan(w^gt/h^gt) − arctan(w/h))², α = v / ((1 − IoU) + v) (4)
further, the step 3 comprises the following sub-steps:
step 3.1: after the input picture is preprocessed, the pre-trained model ResNet50 is called as the backbone; the output feature map is aggregated by GeM Pooling into a feature vector that represents the object; a BNNeck module then adjusts the feature vector obtained earlier; finally a triplet loss is defined to learn inter-class separation and intra-class similarity, so that different feature vectors become more clearly distinguishable while feature vectors of the same identity converge.
The input of the triplet loss is a triple consisting of an Anchor example, a Positive example and a Negative example; similarity between samples is learned by optimizing the distance from the anchor to the positive example to be smaller than the distance from the anchor to the negative example. Here a is the anchor example; p is a sample of the same class as a; n is a sample of a different class from a; margin is a constant greater than 0:
L = max(d(a, p) − d(a, n) + margin, 0) (5)
further, the tracking strategy in step 4 includes the following sub-steps:
step 4.1: track the specific pedestrian target selected for re-identification with the NSA Kalman filtering algorithm; specifically, update the appearance state e_i^t of the i-th track at frame t by an exponential moving average:
e_i^t = α · e_i^{t−1} + (1 − α) · f_i^t (6)
where f_i^t is the appearance embedding of the currently matched detection and α = 0.9 is the momentum term.
Adaptive noise is added at the same time to enhance the robustness of tracking, where the adaptive measurement-noise covariance R̃_k is given by formula (7):
R̃_k = (1 − c_k) · R_k (7)
where R_k is the preset constant measurement-noise covariance and c_k is the detection confidence score in state k; the matching process no longer uses the appearance feature distance alone but considers both appearance and motion information;
The matching cascade is replaced by an ordinary global linear assignment, where the assignment matrix C is the weighted sum of the appearance cost A_a and the motion cost A_m:
C = λ · A_a + (1 − λ) · A_m (8)
where the weighting factor λ is set to 0.98;
step 4.2: after the Kalman filtering algorithm predicts a track for the current frame, if a confirmed object (pedestrian or vehicle) is predicted, detection is run on the current frame; the detection boxes are then cascade-matched against the confirmed track boxes, and the matched detection boxes update their tracks;
if track matching fails, IoU matching is attempted; if it succeeds, the track is updated and the predict-observe-update tracking cycle repeats. IoU matching failures fall into observation matching failures and track matching failures. For an observation matching failure, a new track is created and then examined for three frames; if the target (pedestrian or vehicle) is still an actual target, the track is confirmed. For a track matching failure, if the track was never confirmed as a pedestrian or vehicle it is deleted; otherwise a threshold is applied: if the number of missed frames exceeds the threshold max_age the track is deleted and the target is considered to have left the observation range, while below the threshold the track is examined for three frames again and returned to the initial stage.
The invention has the following beneficial effects.
The invention realizes real-time tracking and re-identification of a specific pedestrian target. Compared with the algorithm before improvement, the YOLOv5-Lite detection module raises recognition precision by 3% while keeping the average precision, reaching a detection precision of 92%; the DeepSORT tracking module improves, by varying margins, every index used to evaluate tracking performance, yielding a better tracking effect; the feature extraction logic of the FastReID re-identification module is optimized, speeding up the algorithm substantially. The overall model achieves high precision in real-time detection, meets the needs of practical video surveillance, and has broad application prospects.
Description of the drawings:
FIG. 1 Overall flow of pedestrian re-identification System
FIG. 2 is a modified YOLOv5-Lite model network structure.
Figure 3 tracking algorithm Deepsort improves strategy diagram.
Fig. 4 shows the pictures of the pedestrians to be detected, named bag and red from left to right.
Fig. 5 re-identification result of pedestrian bag in area 1.
Fig. 6 re-identification result of pedestrian red in area 1.
Fig. 7 re-identification result of pedestrian bag in area 2.
Fig. 8 re-identification result of pedestrian red in area 2.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings and the attached tables in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
A pedestrian tracking and re-identification method based on deep learning comprises the following steps:
step 1: carrying out pedestrian target detection on the video image frame by adopting an improved YOLOv5-Lite model;
step 2: adopting a DeepSort model to perform feature extraction on the pedestrians detected in each frame in the step 1 to generate npy files;
and step 3: adopting Fastreid to carry out pedestrian re-identification detection, and carrying out feature extraction according to a preset pedestrian picture base to generate a npy file;
step 4: compute the cosine similarity between the feature extraction result of each pedestrian target from step 2 and the feature extraction result of the specific pedestrian gallery from step 3 according to formula (1); if the similarity is greater than the threshold γ, the target is judged to be the specific pedestrian to be re-identified and is tracked with the tracking strategy of the improved DeepSORT model; otherwise, no target tracking is performed.
cos(x_1, x_2) = (x_1 · x_2) / (‖x_1‖ · ‖x_2‖) (1)
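The similarity test of step 4 can be sketched in a few lines of plain Python. The threshold value used here is a placeholder, since the patent leaves γ unspecified:

```python
import math

def cosine_similarity(x1, x2):
    """Formula (1): cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(x1, x2))
    norm1 = math.sqrt(sum(a * a for a in x1))
    norm2 = math.sqrt(sum(b * b for b in x2))
    return dot / (norm1 * norm2)

def is_target_pedestrian(gallery_feature, detection_feature, gamma=0.5):
    """Step 4: a detection is the target when similarity exceeds threshold gamma.
    gamma=0.5 is a placeholder; the patent does not fix its value."""
    return cosine_similarity(gallery_feature, detection_feature) > gamma
```

In the full system the two vectors are the npy-file embeddings produced by the DeepSORT and FastReID feature extractors.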
The step 1 comprises the following substeps:
step 1.1: the pictures of the data set are input into the improved YOLOv5-Lite network structure, as shown in fig. 2. The invention adds a BiFPN module (weighted bidirectional feature pyramid network) on top of the original YOLOv5-Lite, which effectively strengthens feature extraction. BiFPN combines cross-scale bidirectional connections with fast normalized fusion, weighting the different feature inputs and letting the network learn the weights itself; the weights are normalized to between 0 and 1 by Softmax-based fusion, as shown in formula (2):
w_i′ = exp(w_i) / Σ_j exp(w_j) (2)
where w_i and w_j are learnable weights.
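The Softmax-based fusion of formula (2) can be illustrated with scalars standing in for feature maps; the function names are our own, and a real BiFPN fuses tensors of matching shape:

```python
import math

def softmax_fusion_weights(raw_weights):
    """Formula (2): normalize learnable fusion weights into (0, 1), summing to 1."""
    exps = [math.exp(w) for w in raw_weights]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(inputs, raw_weights):
    """Weighted sum of same-shaped inputs (scalars stand in for feature maps)."""
    weights = softmax_fusion_weights(raw_weights)
    return sum(w * x for w, x in zip(weights, inputs))
```

Because the weights are softmax-normalized, the fused output is always a convex combination of the inputs, which keeps training numerically stable.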
Step 1.2: extract features of the picture with the convolutional neural network and output a feature map, divide the picture into small blocks and generate anchor boxes, associate the labeled prediction boxes with the feature map, and finally build a loss function and start end-to-end training; the loss function is shown in formula (3).
L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv (3)
where ρ(b, b^gt) represents the Euclidean distance between the center points of the prediction box and the ground-truth box, c represents the diagonal distance of the smallest enclosing region that contains both the prediction box and the ground-truth box, v measures the consistency of the relative proportions of the two rectangular boxes, and α is a weighting factor:
v = (4/π²) · (arctan(w^gt/h^gt) − arctan(w/h))², α = v / ((1 − IoU) + v) (4)
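Assuming formula (3) is the standard CIoU loss (consistent with the ρ², c, v and α terms described above), a sketch for axis-aligned (x1, y1, x2, y2) boxes is:

```python
import math

def ciou_loss(box, gt):
    """Sketch of formulas (3)-(4) for (x1, y1, x2, y2) boxes, assuming the
    standard CIoU definition implied by the rho^2, c, v and alpha terms."""
    # intersection-over-union
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_box + area_gt - inter)
    # squared center distance rho^2 and squared enclosing-box diagonal c^2
    bx, by = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    gx, gy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    rho2 = (bx - gx) ** 2 + (by - gy) ** 2
    cx1, cy1 = min(box[0], gt[0]), min(box[1], gt[1])
    cx2, cy2 = max(box[2], gt[2]), max(box[3], gt[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    # aspect-ratio consistency v and trade-off weight alpha (formula (4))
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4.0 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1.0 - iou) + v) if v > 0.0 else 0.0
    return 1.0 - iou + rho2 / c2 + alpha * v
```

A perfectly matching prediction gives zero loss; misaligned boxes are penalized for overlap, center distance and aspect-ratio mismatch simultaneously.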
the step 3 comprises the following substeps:
step 3.1: after the input picture is preprocessed, the pre-trained model ResNet50 is called as the backbone; the output feature map is aggregated by GeM Pooling into a feature vector that represents the object; a BNNeck module then adjusts the feature vector obtained earlier; finally a triplet loss is defined to learn inter-class separation and intra-class similarity, so that different feature vectors become more clearly distinguishable while feature vectors of the same identity converge.
The input of the triplet loss is a triple consisting of an Anchor example, a Positive example and a Negative example; similarity between samples is learned by optimizing the distance from the anchor to the positive example to be smaller than the distance from the anchor to the negative example. Here a is the anchor example; p is a sample of the same class as a; n is a sample of a different class from a; margin is a constant greater than 0:
L = max(d(a, p) − d(a, n) + margin, 0) (5)
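A minimal sketch of the triplet loss of formula (5), with Euclidean distance as d and an assumed margin of 0.3 (the patent only requires margin > 0):

```python
import math

def euclidean(a, b):
    """Euclidean distance d(·,·) between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Formula (5): L = max(d(a, p) - d(a, n) + margin, 0).
    margin=0.3 is an assumed value, not fixed by the patent."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, which is exactly the separation the text describes.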
the invention improves the tracking strategy in Deepsort, and the tracking strategy in the step 4 comprises the following substeps:
step 4.1: track the specific pedestrian target selected for re-identification with the NSA Kalman filtering algorithm. In particular, the appearance state e_i^t of the i-th track at frame t is updated by an exponential moving average:
e_i^t = α · e_i^{t−1} + (1 − α) · f_i^t (6)
where f_i^t is the appearance embedding of the currently matched detection and α = 0.9 is the momentum term.
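The update of formula (6) is a plain exponential moving average; a sketch over list-valued embeddings:

```python
def ema_update(prev_embedding, detection_embedding, alpha=0.9):
    """Formula (6): e_i^t = alpha * e_i^{t-1} + (1 - alpha) * f_i^t,
    with alpha = 0.9 as the momentum term given in the text."""
    return [alpha * e + (1.0 - alpha) * f
            for e, f in zip(prev_embedding, detection_embedding)]
```

With α = 0.9 the track keeps 90% of its remembered appearance per frame, so a single noisy detection cannot overwrite the identity of the track.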
Adaptive noise is added to the algorithm at the same time to enhance the robustness of tracking, where the adaptive measurement-noise covariance R̃_k is given by formula (7):
R̃_k = (1 − c_k) · R_k (7)
where R_k is the preset constant measurement-noise covariance and c_k is the detection confidence score in state k; the matching process no longer uses the appearance feature distance alone but considers both appearance and motion information.
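Formula (7) scales the preset noise covariance by the detection confidence, so confident detections are trusted with lower measurement noise. Shown here for a scalar covariance; in a full Kalman filter R_k is a matrix and every entry is scaled the same way:

```python
def nsa_noise_covariance(r_preset, confidence):
    """Formula (7) of the NSA Kalman update: R~_k = (1 - c_k) * R_k.
    r_preset stands for R_k (scalar here), confidence for c_k in [0, 1]."""
    return (1.0 - confidence) * r_preset
```

A detection with confidence 0.9 thus contributes measurement noise ten times smaller than a detection with confidence 0, pulling the filtered state toward high-quality observations.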
To remove the extra prior constraint that limits matching precision, an ordinary global linear assignment is adopted in place of the matching cascade, where the assignment matrix C is the weighted sum of the appearance cost A_a and the motion cost A_m:
C = λ · A_a + (1 − λ) · A_m (8)
with the weighting factor λ set to 0.98.
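The weighted cost of formula (8) and the global linear assignment that replaces the matching cascade can be sketched as follows; a brute-force solver stands in for the Hungarian algorithm used in practice:

```python
import itertools

def combined_cost(appearance_cost, motion_cost, lam=0.98):
    """Formula (8): C = lam * A_a + (1 - lam) * A_m, element-wise over
    track-by-detection cost matrices, with lam = 0.98 from the text."""
    return [[lam * a + (1.0 - lam) * m for a, m in zip(row_a, row_m)]
            for row_a, row_m in zip(appearance_cost, motion_cost)]

def global_assignment(cost):
    """Global linear assignment by brute force (a stand-in for the Hungarian
    algorithm); returns, for each track i, the index of its assigned detection."""
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(best_perm)
```

Solving all tracks jointly over C avoids the age-ordered priority of the cascade, which is exactly the prior constraint the text says limited matching precision.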
Step 4.2: after the Kalman filtering algorithm predicts a track for the current frame, if a confirmed object (pedestrian or vehicle) is predicted, detection is run on the current frame; the detection boxes are then cascade-matched against the confirmed track boxes, and the matched detection boxes update their tracks.
If track matching fails, IoU matching is attempted; if it succeeds, the track is updated and the predict-observe-update tracking cycle repeats. IoU matching failures fall into observation matching failures and track matching failures. For an observation matching failure, a new track is created and then examined for three frames; if the target (pedestrian or vehicle) is still an actual target, the track is confirmed. For a track matching failure, if the track was never confirmed as a pedestrian or vehicle it is deleted; otherwise a threshold is applied: if the number of missed frames exceeds the threshold max_age the track is deleted and the target is considered to have left the observation range, while below the threshold the track is examined for three frames again and returned to the initial stage.
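The life-cycle rules of step 4.2 amount to a small state machine; the parameter names `n_init` and `max_age` are our labels for the three-frame confirmation and the deletion threshold mentioned above:

```python
class Track:
    """Minimal sketch of the track life cycle in step 4.2 (parameter
    names n_init and max_age are assumed labels, not from the patent)."""

    def __init__(self, n_init=3, max_age=30):
        self.state = "tentative"   # newly created track, not yet confirmed
        self.hits = 0
        self.misses = 0
        self.n_init = n_init       # frames of "investigation" before confirmation
        self.max_age = max_age     # missed-frame threshold before deletion

    def mark_hit(self):
        """A detection matched this track in the current frame."""
        self.hits += 1
        self.misses = 0
        if self.state == "tentative" and self.hits >= self.n_init:
            self.state = "confirmed"

    def mark_miss(self):
        """No detection matched this track in the current frame."""
        self.misses += 1
        if self.state == "tentative":
            self.state = "deleted"             # unconfirmed tracks drop immediately
        elif self.misses > self.max_age:
            self.state = "deleted"             # target left the observation range
```

The `max_age=30` default is an illustrative value; the patent only states that a threshold exists.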
Embodiment:
As shown in fig. 1: first, a picture of the target pedestrian is captured, and the captured pedestrian gallery is passed through the FastReID feature extraction model to generate the corresponding npy file. The video to be examined is then read in, all pedestrians in the current video frame are detected with the YOLOv5-Lite target detection algorithm, and the detected pedestrians are passed through the DeepSORT feature extractor to generate an npy file. The cosine similarity of the two generated npy files is computed: if the similarity is greater than the threshold γ, the person is judged to be the target pedestrian and tracked with the DeepSORT algorithm, while a similarity below the threshold γ marks a non-target pedestrian; finally, the whole flow is shown through a simple visualization.
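The overall flow of fig. 1 can be sketched with the detector, feature extractor and similarity measure passed in as callables; all three are stand-ins for YOLOv5-Lite, the DeepSORT/FastReID extractors and formula (1), and gamma=0.5 is a placeholder threshold:

```python
def reid_pipeline(frames, gallery_feature, detect, extract, similarity, gamma=0.5):
    """Sketch of the fig. 1 flow: detect all pedestrians per frame, embed each
    detection, and keep those whose similarity to the gallery exceeds gamma."""
    matches = []
    for frame_index, frame in enumerate(frames):
        for box in detect(frame):                    # all pedestrians in the frame
            feature = extract(frame, box)            # per-detection embedding
            if similarity(gallery_feature, feature) > gamma:
                matches.append((frame_index, box))   # target pedestrian found
    return matches
```

In the real system the matched boxes would also be handed to the tracker; here the sketch stops at the re-identification decision.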
As shown in fig. 2: the Concat modules of the original network head are all replaced with BiFPN_Concat modules.
As shown in fig. 3, the ordinary Kalman filter is replaced by the NSA Kalman filtering algorithm, an ordinary global linear assignment replaces the matching cascade, and the track appearance is updated by an exponential moving average (EMA).
Fig. 4 shows a picture of a pedestrian to be searched, which is captured in advance.
The search results of the pedestrian to be searched in the area 1 are shown in fig. 5-6.
Fig. 7-8 show the search result of the pedestrian to be searched in the area 2.
TABLE 1 comparison of indices before and after improvement of the YOLOv5-Lite Algorithm
Metric       Before improvement    After improvement
Precision    0.89                  0.92
(Input size 640 × 640; mAP@0.5 and mAP@0.5:0.95 essentially unchanged; Recall and FPS slightly lower; model size slightly larger.)
TABLE 2 comparison of indices before and after Deepsort algorithm improvement
Metric    Before improvement    After improvement
IDR       21.7                  24.9
IDP       71.8                  74.7
IDF1      33.3                  37.4
Rcll      27.2                  31.3
Prcn      89.9                  94.0
FAR       0.63                  0.42
MT        25                    30
ML        339                   307
TABLE 3 Deepsort algorithm before and after improvement index comparison (continued)
Metric    Before improvement    After improvement
FP        3352                  2214
FN        80411                 75817
IDs       218                   239
FM        1121                  1190
MOTA      23.9                  29.1
MOTP      78.4                  78.5
As shown in table 1, with a picture input size of 640 × 640 the improved model grows slightly in overall size; with mAP@0.5 and mAP@0.5:0.95 essentially level before and after the improvement, and with small decreases in Recall and frame rate (FPS), the precision of the model rises by 3%, from 0.89 to 0.92. This indicates that the accuracy of the improved model increases appreciably; compared with other algorithms, the model is markedly better in size and precision and performs better on the test set.
As shown in tables 2-3, IDR rises from 21.7 to 24.9, IDP from 71.8 to 74.7, and IDF1 from 33.3 to 37.4, indicating that both the recall and the precision of correctly identified detections improve significantly. Rcll improves from 27.2 to 31.3 and Prcn from 89.9 to 94.0, showing a clear precision gain for the improved DeepSORT algorithm. FAR drops from 0.63 to 0.42, i.e. fewer false identifications per frame after the improvement. MT rises from 25 to 30 and ML falls from 339 to 307, meaning more ground-truth tracks are successfully tracked for over 80% of their frames and fewer for under 20%. False positives (FP) fall from 3352 to 2214 and false negatives (FN) from 80411 to 75817. IDs goes from 218 to 239, so ID switches occur somewhat more often after the model changes; FM rises from 1121 to 1190, reflecting the improved algorithm's ability to resume tracking a target that reappears after occlusion. MOTA rises from 23.9 to 29.1 and MOTP from 78.4 to 78.5, improving both detection quality and tracking accuracy. Analysis and comparison show that the tracking performance and accuracy of DeepSORT are greatly improved by the algorithm changes, and the improved version performs better on the same data set.
The innovation points of the invention are as follows:
First, for the YOLOv5-Lite branch model v5Lite-g, the network head is modified and every Concat is replaced by BiFPN_Concat.
Second, ordinary Kalman filtering is replaced by the NSA Kalman filtering algorithm, which introduces an adaptive noise covariance calculated as
R̃_k = (1 − c_k) · R_k (1)
where R_k is the preset constant measurement-noise covariance and c_k is the detection confidence score in state k; the matching process no longer uses the appearance feature distance alone but considers both appearance and motion information.
The cost matrix C is the weighted sum of the appearance cost A_a and the motion cost A_m:
C = λ · A_a + (1 − λ) · A_m (2)
where the weighting factor λ is set to 0.98; in addition, to remove the extra prior constraint that limits matching precision, an ordinary global linear assignment is adopted in place of the matching cascade.
The appearance state e_i^t of the i-th track at frame t is updated by an exponential moving average:
e_i^t = α · e_i^{t−1} + (1 − α) · f_i^t (3)
where f_i^t is the appearance embedding of the currently matched detection and α = 0.9 is the momentum term.
Third, the FastReID model file with the .pth suffix is converted into a model file with the .onnx suffix.
Fourth, per-frame pedestrian detection is changed to strided detection: the YOLOv5-Lite detection model detects all pedestrians in the video only every other frame, and a frame-rate display module is added to the real-time video visualization interface.
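The every-other-frame detection of the fourth innovation can be sketched as follows, under the assumption that detections are simply reused between detector runs (the patent does not spell out how the skipped frames are filled):

```python
def detect_with_stride(frames, detect, stride=2):
    """Sketch of strided detection: run the detector only every `stride`
    frames and reuse the most recent detections in between."""
    results, last_detections = [], []
    for index, frame in enumerate(frames):
        if index % stride == 0:
            last_detections = detect(frame)   # full YOLOv5-Lite pass
        results.append(last_detections)       # reused on skipped frames
    return results
```

Halving the number of detector passes roughly halves the detection cost per second of video, which is what makes the real-time visualization interface feasible.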

Claims (4)

1. A pedestrian tracking and re-identification method based on deep learning, characterized by comprising the following steps:
step 1: carrying out pedestrian target detection on the video image frame by adopting an improved YOLOv5-Lite model;
step 2: performing feature extraction on the pedestrian detected in each frame in the step 1 by adopting a Deepsort model to generate a npy file;
and step 3: adopting Fastreid to carry out pedestrian re-identification detection, and carrying out feature extraction according to a preset pedestrian picture base to generate a npy file;
step 4: perform cosine similarity calculation between the feature extraction result of each pedestrian target in step 2 and the feature extraction result of the specific pedestrian gallery in step 3 according to formula (1), where x_1 and x_2 are two non-zero vectors; if the similarity is greater than the threshold γ, the target is judged to be the specific pedestrian to be re-identified and is tracked with the tracking strategy of the improved DeepSORT model; otherwise, no target tracking is performed
Figure FDA0003689045620000011
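Formula (1) and the threshold test can be sketched as follows (a minimal illustration; the function names and the example γ value are not from the patent, which leaves the threshold unspecified):

```python
import numpy as np

def cosine_similarity(x1, x2):
    """Formula (1): cos(theta) = x1 . x2 / (||x1|| * ||x2||), non-zero vectors."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    return float(x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2)))

def is_target(track_feat, gallery_feats, gamma=0.5):
    """Declare a re-identification match when the best similarity against the
    gallery library exceeds the threshold gamma (value here is illustrative)."""
    return max(cosine_similarity(track_feat, g) for g in gallery_feats) > gamma
```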
2. The deep learning-based pedestrian tracking and re-identification method according to claim 1, wherein the step 1 comprises the following sub-steps:
step 1.1: inputting pictures of the data set into the improved YOLOv5-Lite network structure, wherein a BiFPN module is added on the basis of YOLOv5-Lite; the BiFPN combines cross-scale bidirectional connections with fast normalized fusion, assigns different weights to the input features and lets the network learn them by itself, and the weights are normalized to between 0 and 1 in a Softmax-based fusion mode as shown in formula (2):

O = Σ_i ( e^{w_i} / Σ_j e^{w_j} ) · I_i (2)

wherein w_i and w_j are learnable weights;
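The Softmax-based fusion of formula (2) can be sketched as follows (a minimal illustration; the function name is hypothetical, and the feature maps are plain arrays here rather than network tensors):

```python
import numpy as np

def softmax_fusion(features, weights):
    """Formula (2): normalize learnable weights w_i with a softmax so they
    lie in (0, 1) and sum to 1, then take the weighted sum of the inputs I_i."""
    w = np.exp(np.asarray(weights, float))
    w = w / w.sum()                     # normalized to between 0 and 1
    return sum(wi * np.asarray(fi, float) for wi, fi in zip(w, features))
```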
step 1.2: extracting features of the picture with the convolutional neural network and outputting a feature map, dividing the picture into small blocks and generating anchor boxes, associating the labeled prediction boxes with the feature map, and finally establishing a loss function and starting end-to-end training, wherein the loss function is shown in formula (3):

L_CIoU = 1 - IoU + ρ²(b, b^gt)/c² + αv (3)

wherein ρ²(b, b^gt) is the squared Euclidean distance between the center points of the prediction box b and the ground-truth box b^gt, c represents the diagonal length of the smallest enclosing region that contains both the prediction box and the ground-truth box, v measures the consistency of the aspect ratios of the two rectangular boxes, and α is the weighting factor:

v = (4/π²)·(arctan(w^gt/h^gt) - arctan(w/h))² (4)
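Formulas (3) and (4) together form the standard CIoU loss, which can be sketched for a single pair of axis-aligned boxes as below (a minimal plain-Python illustration; the trade-off factor α follows the usual CIoU definition α = v / (1 - IoU + v), which the patent does not spell out):

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss for boxes given as (x1, y1, x2, y2)."""
    # Intersection over union
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area(box_p) + area(box_g) - inter)
    # rho^2: squared distance between the two box centers
    cx = lambda b: ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    (px, py), (gx, gy) = cx(box_p), cx(box_g)
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    # c^2: squared diagonal of the smallest enclosing box
    c2 = (max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])) ** 2 \
       + (max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])) ** 2
    # v: aspect-ratio consistency term, formula (4)
    w_p, h_p = box_p[2] - box_p[0], box_p[3] - box_p[1]
    w_g, h_g = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / math.pi ** 2) * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v
```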
3. the deep learning-based pedestrian tracking and re-identification method according to claim 1, wherein the step 3 comprises the following sub-steps:
step 3.1: after the input picture is preprocessed, the pre-trained model ResNet50 is called as the backbone; the feature map output through GeM Pooling is then aggregated into a feature vector representing the object; a BNNeck module transforms the feature vector obtained before; and finally a triplet loss is defined to learn inter-class separation and intra-class similarity, so that the distinction between feature vectors of different classes becomes more obvious and feature vectors of the same class converge more closely;
the triplet loss input is a triplet comprising an anchor (Anchor) example, a positive (Positive) example and a negative (Negative) example; similarity calculation between samples is realized by optimizing the distance between the anchor and the positive example to be smaller than the distance between the anchor and the negative example, wherein a denotes the anchor example, p denotes a positive example (a sample of the same class as a), n denotes a negative example (a sample of a different class from a), and margin is a constant greater than 0:

L = max(d(a,p) - d(a,n) + margin, 0) (5).
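Formula (5) can be sketched for a single triplet as below (a minimal illustration with Euclidean distance; the margin value 0.3 is illustrative, the patent does not fix it):

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.3):
    """Formula (5): L = max(d(a,p) - d(a,n) + margin, 0).
    a: anchor, p: positive (same class as a), n: negative (different class)."""
    d = lambda x, y: float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))
    return max(d(a, p) - d(a, n) + margin, 0.0)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, which is exactly the intra-class/inter-class structure described above.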
4. the deep learning-based pedestrian tracking and re-identification method according to claim 1, wherein the tracking strategy in the step 4 comprises the following sub-steps:
step 4.1: tracking the specific pedestrian target selected for re-identification with the NSA Kalman filtering algorithm, specifically updating the appearance state e_i^t of the i-th track at frame t in an exponential moving average manner, as shown in formula (6):

e_i^t = α·e_i^{t-1} + (1-α)·f_i^t (6)

wherein f_i^t is the appearance embedding of the currently matched detection and α = 0.9 is the momentum term;
adaptive noise is added at the same time to enhance the robustness of tracking, wherein the adaptive noise covariance R̃_k is shown in formula (7):

R̃_k = (1 - c_k)·R_k (7)
wherein R_k is the preset constant measurement noise covariance and c_k is the detection confidence score in state k; the matching process no longer uses only the appearance feature distance, but considers both appearance and motion information;
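The NSA noise scaling of formula (7) can be sketched as below (a minimal illustration; the function name is hypothetical):

```python
import numpy as np

def nsa_noise(r_k, conf):
    """Formula (7): scale the preset measurement-noise covariance R_k by
    (1 - c_k), so high-confidence detections are trusted with lower noise."""
    return (1.0 - conf) * np.asarray(r_k, float)
```

In the Kalman update, a detection with confidence near 1 therefore pulls the state estimate strongly toward the measurement, while a low-confidence detection barely moves it.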
the matching cascade is replaced by an ordinary global linear assignment, wherein the assignment matrix C is the weighted sum of the appearance cost A_a and the motion cost A_m:

C = λ·A_a + (1-λ)·A_m (8)

wherein the weighting factor λ is set to 0.98;
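Formula (8) and the global linear assignment can be sketched as below (a minimal illustration; for clarity this sketch solves the square assignment by brute force, whereas production code would use the Hungarian algorithm, e.g. scipy.optimize.linear_sum_assignment):

```python
import itertools
import numpy as np

def global_assignment(app_cost, motion_cost, lam=0.98):
    """Formula (8): C = lam * A_a + (1 - lam) * A_m, then one global
    minimum-cost assignment of tracks to detections (no matching cascade)."""
    C = lam * np.asarray(app_cost, float) + (1 - lam) * np.asarray(motion_cost, float)
    n = C.shape[0]
    # Brute-force search over all track->detection permutations (small n only)
    best = min(itertools.permutations(range(n)),
               key=lambda perm: sum(C[i, perm[i]] for i in range(n)))
    return [(i, best[i]) for i in range(n)]
```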
step 4.2: after the Kalman filtering algorithm predicts a track for the current frame, if a confirmed track (pedestrian or vehicle) is predicted, detection is performed on the current frame, the detection boxes are cascade-matched with the confirmed track boxes, and the tracked detection boxes are updated after matching is completed;

if track matching fails, IoU matching is performed again; if this matching succeeds, the update is performed again and the predict-observe-update tracking cycle is repeated. IoU matching failure is divided into observation matching failure and track matching failure: for observation matching failure, a new track is created and then examined over three frames; if the target (pedestrian or vehicle) is still an actual target, the track is confirmed. For track matching failure, it is judged whether the track has been confirmed as a pedestrian or vehicle; if it has not been confirmed, the track is deleted; otherwise a threshold is set for it: if the number of missed frames is larger than the threshold max_age, the track is deleted and considered to have moved out of the observation range, and if it is smaller than the threshold, the track is examined over three frames again and returns to the initial stage.
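The track life-cycle described above can be sketched as a small state machine (a minimal illustration; class, attribute, and state names are hypothetical, and only the tentative/confirmed/deleted logic with the three-frame check and max_age threshold follows the text):

```python
class Track:
    """Tentative -> confirmed after three consecutive matches;
    deleted when unconfirmed and missed, or missed longer than max_age."""

    def __init__(self, max_age=30, n_init=3):
        self.state = "tentative"
        self.hits, self.misses = 0, 0
        self.max_age, self.n_init = max_age, n_init

    def mark_matched(self):
        self.hits += 1
        self.misses = 0
        if self.state == "tentative" and self.hits >= self.n_init:
            self.state = "confirmed"   # survived the three-frame examination

    def mark_missed(self):
        self.misses += 1
        if self.state == "tentative":
            self.state = "deleted"     # unconfirmed tracks are dropped at once
        elif self.misses > self.max_age:
            self.state = "deleted"     # considered moved out of observation range
```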
CN202210657848.6A 2022-06-10 2022-06-10 Pedestrian tracking and re-identification method based on deep learning Pending CN115063836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210657848.6A CN115063836A (en) 2022-06-10 2022-06-10 Pedestrian tracking and re-identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210657848.6A CN115063836A (en) 2022-06-10 2022-06-10 Pedestrian tracking and re-identification method based on deep learning

Publications (1)

Publication Number Publication Date
CN115063836A true CN115063836A (en) 2022-09-16

Family

ID=83200418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210657848.6A Pending CN115063836A (en) 2022-06-10 2022-06-10 Pedestrian tracking and re-identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN115063836A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115620242A (en) * 2022-12-19 2023-01-17 城云科技(中国)有限公司 Multi-line person target re-identification method, device and application
CN116453103A (en) * 2023-06-15 2023-07-18 松立控股集团股份有限公司 Vehicle cross-mirror tracking license plate recognition method, system and electronic equipment
CN116453103B (en) * 2023-06-15 2023-08-18 松立控股集团股份有限公司 Vehicle cross-mirror tracking license plate recognition method, system and electronic equipment
CN116766213A (en) * 2023-08-24 2023-09-19 烟台大学 Bionic hand control method, system and equipment based on image processing
CN116766213B (en) * 2023-08-24 2023-11-03 烟台大学 Bionic hand control method, system and equipment based on image processing

Similar Documents

Publication Publication Date Title
CN115063836A (en) Pedestrian tracking and re-identification method based on deep learning
Li et al. A deep learning approach for real-time rebar counting on the construction site based on YOLOv3 detector
CN103246896B (en) A kind of real-time detection and tracking method of robustness vehicle
CN108564052A (en) Multi-cam dynamic human face recognition system based on MTCNN and method
Yang et al. Single shot multibox detector with kalman filter for online pedestrian detection in video
Dai et al. A survey of detection-based video multi-object tracking
CN116363694A (en) Multi-target tracking method of unmanned system crossing cameras matched with multiple pieces of information
Yin Object Detection Based on Deep Learning: A Brief Review
Liu et al. Video face detection based on improved SSD model and target tracking algorithm
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
Zhang et al. An efficient deep neural network with color-weighted loss for fire detection
Zhou Deep learning based people detection, tracking and re-identification in intelligent video surveillance system
Tang et al. Multilevel traffic state detection in traffic surveillance system using a deep residual squeeze-and-excitation network and an improved triplet loss
Li et al. Nighttime pedestrian detection based on feature attention and transformation
CN109063600A (en) Human motion method for tracing and device based on face recognition
Jia et al. PV-YOLO: An Object Detection Model for Panoramic Video based on YOLOv4
Kim et al. Development of a real-time automatic passenger counting system using head detection based on deep learning
Li et al. Review of multi-object tracking based on deep learning
Xiang et al. Safety helmet detection algorithm in complex scenarios based on YOLOX
Tian et al. Pedestrian multi-target tracking based on YOLOv3
Hajari et al. Novel approach for pedestrian unusual activity detection in academic environment
Bai et al. Pedestrian Tracking and Trajectory Analysis for Security Monitoring
Yin et al. Flue gas layer feature segmentation based on multi-channel pixel adaptive
Sharma et al. Multi-object tracking using TLD framework
Dileep et al. Anomalous event detection in crowd scenes using histogram of optical flow and entropy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination