CN114120188B - Multi-pedestrian tracking method based on joint global and local features

Multi-pedestrian tracking method based on joint global and local features

Info

Publication number
CN114120188B
CN114120188B
Authority
CN
China
Prior art keywords
detection frame
track
frame
detection
new
Prior art date
Legal status
Active
Application number
CN202111373622.5A
Other languages
Chinese (zh)
Other versions
CN114120188A (en)
Inventor
陈军
孙志宏
梁超
王晓芬
柴笑宇
杨斌
姚红豆
邱焰升
高浩
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN202111373622.5A
Publication of CN114120188A
Application granted
Publication of CN114120188B
Status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a multi-pedestrian tracking method based on joint global and local features, which comprises the following steps: first, fusing the original detection frames, detecting key points in the fused detection frames, and filtering the key points by confidence to determine the final retained key point coordinates; second, generating a new detection frame from the retained key point coordinates; then, extracting the global features and local features of the new pedestrian detection frame, and computing the similarity between tracks and detection frames with a metric designed over the joint global and local features; finally, executing a tracking management strategy that updates, terminates, and otherwise maintains tracks to obtain the final motion trajectories. The method addresses the inaccurate expression of pedestrian identity features caused by occlusion in crowded scenes, effectively improves the quality of the original detection results, strengthens the expressive power of the pedestrian identity features, raises data association precision, and improves tracking accuracy.

Description

Multi-pedestrian tracking method based on joint global and local features
Technical Field
The invention relates to the technical field of surveillance target tracking, and in particular to a multi-pedestrian tracking method based on joint global and local features.
Background
Multi-target tracking is a mid-level task in computer vision with broad application prospects, such as security surveillance, sports analysis, autonomous driving, and biomedicine. The task of multi-target tracking is to take a video segment as input and output the trajectories of the targets appearing in the video. Because pedestrian tracking has wide research value, multi-pedestrian tracking is the mainstream of research in the multi-target tracking field.
In recent years, as the performance of detection algorithms has continuously improved, multi-pedestrian tracking based on the tracking-by-detection framework has become the mainstream approach. The principle of the tracking-by-detection framework is to first detect the pedestrians in each frame of the video and then extract their appearance features to perform data association and form the final motion trajectories. Currently, most researchers use the global features of pedestrian detection frames to characterize pedestrian identities. However, in crowded scenes, frequent occlusion between pedestrians means that an occluded pedestrian's detection frame often contains information from other interfering objects, so the extracted global features contain interference, leading to inaccurate feature expression, degraded matching precision in data association, and reduced tracking accuracy. Therefore, the representation of the identity features of occluded pedestrians in crowded scenes plays a very important role in the accuracy of multi-pedestrian tracking.
Researchers have expressed pedestrian identity features by introducing local features. In [1], the authors design a part-based multi-target tracking method that uses a part detector to detect pedestrians, obtaining a pedestrian detection frame composed of several sub-blocks; the HOG feature of each sub-block is extracted, and the similarity between corresponding sub-blocks of two pedestrians is computed. The similarities of all sub-blocks are fused using the detection confidence of each sub-block as its weight. In [2], a multi-target tracking method based on locally online-learned appearance models is adopted: each pedestrian detection frame is normalized to a size of 24×58 and then cut into 15 small sub-blocks along the horizontal and vertical directions, and a color histogram is extracted from each of the 15 sub-blocks as its appearance feature. [3] builds a part-based multi-target tracking method to handle partial occlusion: the pedestrian detection frame is divided into 8 sub-blocks, HOG features of the sub-blocks are extracted, and a first-order Markov model performs data association. [4] proposes a multi-target tracking method based on main parts: a part whose appearance changes little over time is considered a main part, while a part whose appearance changes greatly is considered occluded. Although these methods of extracting local features account for occlusion to some extent, they ignore the overall internal relations between the sub-blocks of a detection frame, making it difficult to express the whole image effectively.
Related references:
[1] Izadinia H, Saleemi I, Li W, et al. (MP)2T: Multiple people multiple parts tracker[C]//Proceedings of the European Conference on Computer Vision. Springer, 2012: 100-114.
[2] Yang B, Nevatia R. Online learned discriminative part-based appearance models for multi-human tracking[C]//Proceedings of the European Conference on Computer Vision. Springer, 2012: 484-498.
[3] Liu H, Chang F. A novel multi-object tracking method based on main-parts model[C]//Proceedings of the Chinese Control and Decision Conference. IEEE, 2017: 4569-4573.
[4] Shu G, Dehghan A, Oreifej O, et al. Part-based multiple-person tracking with partial occlusion handling[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012: 1815-1821.
Disclosure of the Invention
In view of these problems, the invention provides a multi-pedestrian tracking method based on joint global and local features. The method first corrects the original detection frames with a detection corrector, improving the quality of the original detections. Second, the global features and local features of each detection frame are extracted to express pedestrian identity, and a similarity metric over the joint global and local features is designed to associate pedestrians. Finally, a tracking management strategy is executed: tracks associated with detection frames are updated, tracks not matched to any detection frame are suspended, and detection frames not matched to any track are initialized as new tracks.
The aim of the invention can be achieved by the following technical scheme:
a multi-pedestrian tracking method based on joint global and local features, comprising the steps of:
step 1, acquiring the detection results of each frame from a public data set;
step 2, detection frame fusion; the detection results on the public data set may suffer from missed detections and false detections and need to be corrected to obtain more accurate results, so detection frame fusion is adopted: whenever the effective overlap rate between two detection frames in a frame exceeds a certain threshold, they are merged into a new first detection frame;
step 3, key point detection; after the new detection frames are obtained through the fusion of step 2, key points are detected, each new detection frame containing a large number of key points;
step 4, key point filtering; a threshold is set and key points with low confidence are filtered out; if the number of remaining key points of a pedestrian exceeds a certain threshold, the detection is considered correct;
step 5, detection correction; a new second detection frame is recovered from the key points retained in the new detection frame, and a corrected detection frame is obtained from the boundary key points using the proportional relation between human key points and human height;
step 6, feature extraction, in which a convolutional neural network extracts features from the corrected detection frame; a pedestrian re-identification PCB network is first trained on a pedestrian re-identification data set and then used to extract features from the corrected detection frame; the PCB network divides the pedestrian into p blocks along the horizontal and vertical directions and obtains a feature vector for each block, and the visible-region labels of the pedestrian are computed by checking whether key points exist in each block of the pedestrian detection frame;
step 7, local appearance feature association: after features of a historical track and of a target detection frame in frame t are extracted with the method of step 6, the local feature association is computed as the cosine distance between their local feature vectors;
step 8, global appearance feature association; after features of a historical track and of a target detection frame in frame t are extracted with the method of step 6, the global feature association is computed as the cosine distance between their global feature vectors;
step 9, data association; the global appearance features, local appearance features, and visibility labels of a historical track and of a target detection frame in frame t from steps 6-8 are fused to compute the data association, and the Hungarian matching algorithm is then applied to obtain the optimal matching result;
step 10, tracking management; after the optimal solution of the data association is obtained through the Hungarian algorithm in step 9, the successfully matched detection frame and track pairs, the detection frames not matched to any track, and the tracks not matched to any detection frame are returned; a track matched to a detection frame is updated; a track not matched to any detection frame is set to the suspended state; a detection frame not matched to any track is treated as a new track, initialized, and added to the track set;
and step 11, repeating steps 2-10 until all frames are processed, and outputting the target trajectories.
Further, in step 2, the detection frames whose effective overlap rate in a frame exceeds a certain threshold are merged to obtain a new detection frame;

the effective overlap rate is calculated as follows:

$$r_{\mathrm{eff}}(D_t^A, D_t^B) = \frac{\operatorname{area}(D_t^A \cup D_t^B)}{\operatorname{area}\big(\operatorname{cover}(D_t^A, D_t^B)\big)}$$

where $D_t^A$ and $D_t^B$ denote the A-th and B-th detection frames in frame t, and $\operatorname{cover}(D_t^A, D_t^B)$ denotes the smallest rectangle that can cover the two frames;
the new detection frame is calculated as follows:

$$x_{new} = \min(x_A, x_B)$$
$$y_{new} = \min(y_A, y_B)$$
$$w_{new} = \max(x_A + w_A,\ x_B + w_B) - x_{new}$$
$$h_{new} = \max(y_A + h_A,\ y_B + h_B) - y_{new}$$

where $(x_A, y_A)$ are the top-left coordinates of detection frame A, $(w_A, h_A)$ its width and height, $(x_B, y_B)$ the top-left coordinates of detection frame B, and $(w_B, h_B)$ its width and height.
Further, the key points detected in step 3 are respectively: left eye, right eye, nose, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle.
Further, in step 4, a pedestrian detection is considered correct if its number of remaining key points exceeds a certain threshold, and the specific filtering formula is as follows:

$$\varepsilon_\tau = \begin{cases} 1, & CS_\tau > \gamma \\ 0, & \text{otherwise} \end{cases} \qquad \operatorname{count}(\beta) = \sum_{\tau} \varepsilon_\tau$$

where $\varepsilon_\tau$ is a binary variable: $\varepsilon_\tau = 1$ indicates that the confidence of the τ-th key point is greater than the threshold and the key point is kept, otherwise it is deleted; $\operatorname{count}(\beta)$ denotes the number of valid key points in the β-th detection frame, $CS_\tau$ denotes the confidence score of the τ-th key point, and γ denotes a preset threshold.
Further, in step 5, the newly corrected detection frame is obtained from the boundary key points; the specific correction formula is as follows:

$$D_{new} = \{L_x,\ T_y,\ R_x - L_x + \theta,\ B_y - T_y + \rho\}$$

where $L_x$ and $R_x$ are the leftmost and rightmost x-axis coordinates of the key points in the corrected detection frame, $T_y$ and $B_y$ are the uppermost and lowermost y-axis coordinates of the key points in the corrected detection frame, and θ and ρ are two parameters representing the x-axis offset and the y-axis offset respectively.
Further, in step 6, whether key points exist in each block of the pedestrian detection frame is checked to compute the visible-region labels of the pedestrian; the specific calculation formula is as follows:

$$l_v = \begin{cases} 1, & \exists\, s:\ \frac{(v-1)H}{p} \le y(q_s) < \frac{vH}{p} \\ 0, & \text{otherwise} \end{cases} \qquad v = 1, \ldots, p$$

where $l_v$ indicates whether the v-th block is visible (1 if visible, otherwise 0), $q_s$ denotes the s-th key point in the pedestrian detection frame with $y(q_s)$ its vertical coordinate, p denotes the total number of blocks of the detection frame, and H denotes the height of the image.
Further, in step 7, the local appearance feature association is calculated as follows:

$$d_v = 1 - \frac{f_{i,v} \cdot f_{j,v}}{\lVert f_{i,v} \rVert\, \lVert f_{j,v} \rVert}$$

where $f_{i,v}$ and $f_{j,v}$ are the feature vectors of the v-th block of historical track i and of a target detection frame j in frame t respectively, $d_v$ is the feature distance of the v-th block between them, and p denotes the total number of blocks of the detection frame.
Further, the global appearance feature similarity metric in step 8 is as follows:

$$d_g = 1 - \frac{f_i^g \cdot f_j^g}{\lVert f_i^g \rVert\, \lVert f_j^g \rVert}$$

where $f_i^g$ and $f_j^g$ are the global feature vectors of historical track i and of a target detection frame j in frame t respectively, and $d_g$ is the global feature distance between them.
Further, the data association in step 9 is implemented as follows;

the data association is realized by calculating the feature distance dist between historical track i and target detection frame j; the specific calculation formula is as follows:

$$\operatorname{dist} = \frac{d_g + \sum_{v=1}^{p} l_v^i\, l_v^j\, d_v}{1 + \sum_{v=1}^{p} l_v^i\, l_v^j}$$

where $l_v^i$ and $l_v^j$ are the visibility scores of the v-th block of historical track i and of a target detection frame j in frame t respectively (0 means the block is invisible, 1 means it is visible), and p denotes the total number of blocks of the detection frame.
Further, in step 10, the specific tracking management method is as follows:
the matching result of this step has 3 cases: detection frame and track pairs that matched successfully, tracks not matched to any detection frame, and detection frames not matched to any track;
for a successfully matched detection frame and track pair, the historical track i is updated with the target detection frame j of frame t that it matched;
for a track not matched to any detection frame, the state of historical track i is set to suspended and its track ID is added to the vanished-target set;
for a detection frame not matched to any track, the target detection frame j of frame t is initialized as a track, assigned a new track ID, and added to the track set.
Compared with the existing multi-pedestrian tracking technology, the invention has the following advantages and beneficial effects:
1) Compared with the prior art, the method solves the inability of multi-pedestrian tracking to handle missed and false detections in crowded scenes, greatly improving detection quality. The detection frame fusion strategy designed by the invention eliminates redundancy and false detections in the original detection results, and the detection frames missed in the original results are recovered through key point detection on the new detection frames.
2) The invention adopts a tracking strategy that fuses global and local features, which can express the identity of the whole pedestrian while also handling partial occlusion. The designed feature matching strategy is simple and effective, making the invention easier to realize in practical engineering and improving engineering efficiency.
Drawings
FIG. 1 is a diagram of a detection corrector designed by the present invention.
Fig. 2 is a system framework diagram of the present invention.
Detailed Description
To help those of ordinary skill in the art understand and practice the present invention, the invention is described in further detail below with reference to the accompanying drawings. It should be understood that the examples described herein are for illustration and explanation only and are not intended to limit the invention.
Compared with existing methods that use only global features, the present method can effectively express the identity of occluded pedestrians, improving the data association matching precision in multi-pedestrian tracking and the overall tracking accuracy. The invention first corrects the original detection results with a detection corrector to resolve missed detections, false detections, and similar problems; it then extracts the global and local features of the pedestrian detection frames and applies a newly designed feature matching strategy for multi-pedestrian tracking to perform data association. Finally, tracking management updates the tracks matched to detection frames after data association, suspends the tracks not matched to any detection frame, and initializes tracks for the detection frames not matched to any track, yielding the final pedestrian trajectories.
The specific implementation method comprises the following steps:
step S1: and (5) detecting pedestrians. Obtaining a pedestrian detection frame set D, D= { D by adopting an original detection result on the public data set 1 ,D 2 ,…,D t ,…};
Wherein D is t A set of boxes is detected for pedestrians over t frames.
Step S2: detection frame fusion. The detection results of step S1 may contain missed detections, false detections, and similar problems, and need to be corrected to obtain more accurate results. The detection frame fusion strategy merges the detection frames whose effective overlap rate in a frame exceeds a certain threshold into a new detection frame.

The effective overlap rate is calculated as follows:

$$r_{\mathrm{eff}}(D_t^A, D_t^B) = \frac{\operatorname{area}(D_t^A \cup D_t^B)}{\operatorname{area}\big(\operatorname{cover}(D_t^A, D_t^B)\big)}$$

where $D_t^A$ and $D_t^B$ denote the A-th and B-th detection frames in frame t, and $\operatorname{cover}(D_t^A, D_t^B)$ denotes the smallest rectangle that can cover the two frames.
The new detection frame is calculated as follows:

$$x_{new} = \min(x_A, x_B)$$
$$y_{new} = \min(y_A, y_B)$$
$$w_{new} = \max(x_A + w_A,\ x_B + w_B) - x_{new}$$
$$h_{new} = \max(y_A + h_A,\ y_B + h_B) - y_{new}$$

where $(x_A, y_A)$ are the top-left coordinates of detection frame A, $(w_A, h_A)$ its width and height, $(x_B, y_B)$ the top-left coordinates of detection frame B, and $(w_B, h_B)$ its width and height.
Step S2 is performed repeatedly until no pair of detection frames in frame t satisfies the condition.
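As a concrete illustration, the fusion strategy of step S2 can be sketched in a few lines of Python. This is a minimal sketch under the overlap-rate reading reconstructed above (union area over the area of the smallest covering rectangle); the threshold of 0.7 and all function names are illustrative assumptions rather than values fixed by the patent.

```python
def area(box):
    # box = (x, y, w, h): top-left corner, width, height
    return box[2] * box[3]

def cover_box(a, b):
    # Smallest axis-aligned rectangle covering both boxes; this is
    # exactly the new-detection-frame formula of step S2.
    x = min(a[0], b[0])
    y = min(a[1], b[1])
    w = max(a[0] + a[2], b[0] + b[2]) - x
    h = max(a[1] + a[3], b[1] + b[3]) - y
    return (x, y, w, h)

def intersection_area(a, b):
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def effective_overlap(a, b):
    # Assumed reading: union area over the minimal covering area.
    union = area(a) + area(b) - intersection_area(a, b)
    return union / area(cover_box(a, b))

def fuse_frame(boxes, thresh=0.7):
    # Repeatedly merge any pair whose effective overlap exceeds the
    # threshold, until no pair in the frame satisfies the condition.
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if effective_overlap(boxes[i], boxes[j]) > thresh:
                    boxes[j] = cover_box(boxes[i], boxes[j])
                    del boxes[i]
                    merged = True
                    break
            if merged:
                break
    return boxes
```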
Step S3: key point detection. After the new detection frames of each frame are obtained in step S2, key points are detected using an open-source human key point detection algorithm. Suppose K pedestrians are detected in the new detection frames; each pedestrian contains 17 key points: the left eye, right eye, nose, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle, giving 17 × K key points in total.
Step S4: key point filtering. A threshold is set and key points with low confidence are filtered out. A pedestrian detection is considered correct if its number of remaining key points exceeds a certain threshold. The specific filtering formula is as follows:

$$\varepsilon_\tau = \begin{cases} 1, & CS_\tau > \gamma \\ 0, & \text{otherwise} \end{cases} \qquad \operatorname{count}(\beta) = \sum_{\tau} \varepsilon_\tau$$

where $\varepsilon_\tau$ is a binary variable: $\varepsilon_\tau = 1$ means the confidence of the τ-th key point is greater than the threshold and the key point is kept, otherwise it is deleted; $\operatorname{count}(\beta)$ denotes the number of valid key points in the β-th detection frame, $CS_\tau$ the confidence score of the τ-th key point, and γ a preset threshold.
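A minimal sketch of this filtering rule, assuming the detector returns key points as (x, y, confidence) triples; the values of gamma and of the minimum valid-key-point count are illustrative placeholders, since the patent only states that both are preset thresholds.

```python
def filter_keypoints(keypoints, gamma=0.4, min_valid=6):
    """keypoints: list of (x, y, confidence) triples for one detection frame.
    Returns the kept key points, or [] when too few survive, in which
    case the frame is treated as an incorrect detection."""
    kept = [(x, y, c) for (x, y, c) in keypoints if c > gamma]  # eps_tau = 1
    return kept if len(kept) >= min_valid else []               # count(beta) test
```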
Step S5: detection correction. A new detection frame is recovered from the key points retained in the new detection frame. From the boundary key points in the new detection frame, such as the eyes and ankles, the corrected detection frame is obtained using the proportional relation between human key points and human height. The specific correction formula is as follows:

$$D_{new} = \{L_x,\ T_y,\ R_x - L_x + \theta,\ B_y - T_y + \rho\}$$

where $L_x$ and $R_x$ are the leftmost and rightmost x-axis coordinates of the key points in the corrected detection frame, $T_y$ and $B_y$ are the uppermost and lowermost y-axis coordinates of the key points in the corrected detection frame, and θ and ρ are two parameters representing the x-axis offset and the y-axis offset respectively, obtained by regression learning on training samples.
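The correction of step S5 can then be sketched as follows; the fixed theta and rho values are placeholders standing in for the offsets that the patent learns by regression on training samples.

```python
def correct_box(kept_keypoints, theta=10.0, rho=20.0):
    """Rebuild a detection frame from the retained key points.
    theta, rho: x- and y-axis offsets (placeholders here; learned
    by regression in the patent)."""
    xs = [x for (x, y, c) in kept_keypoints]
    ys = [y for (x, y, c) in kept_keypoints]
    L_x, R_x = min(xs), max(xs)   # leftmost / rightmost key point
    T_y, B_y = min(ys), max(ys)   # topmost / bottommost key point
    return (L_x, T_y, R_x - L_x + theta, B_y - T_y + rho)
```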
Step S6: feature extraction. A convolutional neural network extracts features from the corrected detection frame. A pedestrian re-identification PCB network is first trained on a pedestrian re-identification data set and then used for feature extraction. The PCB network divides the pedestrian into p blocks along the horizontal and vertical directions and obtains the feature vector $f_v$ ($v = 1, \ldots, p$) of each block. The visible-region labels of the pedestrian are computed from whether key points exist in each block of the pedestrian detection frame:

$$l_v = \begin{cases} 1, & \exists\, s:\ \frac{(v-1)H}{p} \le y(q_s) < \frac{vH}{p} \\ 0, & \text{otherwise} \end{cases} \qquad v = 1, \ldots, p$$

where $l_v$ indicates whether the v-th block is visible (1 if visible, otherwise 0), $q_s$ denotes the s-th key point in the pedestrian detection frame with $y(q_s)$ its vertical coordinate, p denotes the total number of blocks of the detection frame, and H denotes the height of the image.
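The visible-region labels might be computed as below, assuming the p blocks are equal-height horizontal strips of the detection frame (the usual PCB layout); p = 6 is the common PCB setting and an assumption here.

```python
def visibility_labels(kept_keypoints, box, p=6):
    """l_v for each of the p blocks of one detection frame.
    box: (x, y, w, h) of the corrected detection frame."""
    x0, y0, w, h = box
    labels = [0] * p
    for (x, y, c) in kept_keypoints:
        v = int((y - y0) / h * p)   # index of the strip containing this key point
        if 0 <= v < p:
            labels[v] = 1           # l_v = 1: at least one key point, block visible
    return labels
```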
Step S7: local appearance feature association. After the features of historical track i and of a target detection frame j in frame t are extracted in step S6, the local feature association is computed by the cosine distance of the block feature vectors:

$$d_v = 1 - \frac{f_{i,v} \cdot f_{j,v}}{\lVert f_{i,v} \rVert\, \lVert f_{j,v} \rVert}$$

where $f_{i,v}$ and $f_{j,v}$ are the feature vectors of the v-th block of historical track i and of target detection frame j in frame t respectively, $d_v$ is the feature distance of the v-th block between them, and p denotes the total number of blocks of the detection frame.
Step S8: global appearance feature association. After the features of historical track i and of a target detection frame j in frame t are extracted in step S6, the global appearance feature distance between them is calculated as follows:

$$d_g = 1 - \frac{f_i^g \cdot f_j^g}{\lVert f_i^g \rVert\, \lVert f_j^g \rVert}$$

where $f_i^g$ and $f_j^g$ are the global feature vectors of historical track i and of target detection frame j in frame t respectively, and $d_g$ is the global feature distance between them.
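Steps S7 and S8 both reduce to cosine distances and can be sketched together; the array shapes below are assumptions about how the global and PCB block features are stored.

```python
import numpy as np

def cosine_distance(f1, f2):
    # d = 1 - cos(f1, f2), used for both the global and the block features.
    return 1.0 - float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

def local_distances(track_parts, det_parts):
    """Per-block distances d_v between a historical track and a detection
    frame; track_parts and det_parts have shape (p, dim), one row per block."""
    return [cosine_distance(t, d) for t, d in zip(track_parts, det_parts)]
```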
Step S9: data association. The global appearance features, local appearance features, and visibility labels of historical track i and of a target detection frame j in frame t from steps S6-S8 are fused to compute the data association, and the Hungarian matching algorithm is then applied to obtain the optimal matching result. The feature distance dist between historical track i and target detection frame j in frame t is calculated as follows:

$$\operatorname{dist} = \frac{d_g + \sum_{v=1}^{p} l_v^i\, l_v^j\, d_v}{1 + \sum_{v=1}^{p} l_v^i\, l_v^j}$$

where $l_v^i$ and $l_v^j$ are the visibility scores of the v-th block of historical track i and of target detection frame j in frame t respectively (0 means the block is invisible, 1 means it is visible), and p denotes the total number of blocks of the detection frame.
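The fusion and matching of step S9 can be sketched with SciPy's Hungarian solver. The fused-distance form follows the reconstruction above and is an assumed reading of the patent's formula, and max_dist is an illustrative gating parameter the patent does not specify.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fused_distance(d_g, d_parts, vis_track, vis_det):
    # Combine the global distance with the local distances of the
    # blocks that are visible in both the track and the detection.
    gates = [vi * vj for vi, vj in zip(vis_track, vis_det)]
    num = d_g + sum(g * d for g, d in zip(gates, d_parts))
    return num / (1 + sum(gates))

def associate(cost_matrix, max_dist=0.5):
    """Hungarian assignment over the tracks-by-detections cost matrix.
    Returns matched (track, detection) pairs plus the unmatched track
    and detection indices."""
    rows, cols = linear_sum_assignment(cost_matrix)
    matches = [(i, j) for i, j in zip(rows, cols) if cost_matrix[i, j] <= max_dist]
    matched_i = {i for i, _ in matches}
    matched_j = {j for _, j in matches}
    un_tracks = [i for i in range(cost_matrix.shape[0]) if i not in matched_i]
    un_dets = [j for j in range(cost_matrix.shape[1]) if j not in matched_j]
    return matches, un_tracks, un_dets
```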
Step S10: tracking management. After the optimal solution of the data association is obtained through the Hungarian algorithm in step S9, the successfully matched detection frame and track pairs, the detection frames not matched to any track, and the tracks not matched to any detection frame are returned. A track matched to a detection frame is updated; a track not matched to any detection frame is set to the suspended state; a detection frame not matched to any track is treated as a new track, initialized, and added to the track set.

The matching result of this step therefore has 3 cases: detection frame and track pairs that matched successfully, tracks not matched to any detection frame, and detection frames not matched to any track.

For a successfully matched detection frame and track pair, the historical track i is updated with the target detection frame j of frame t that it matched.

For a track not matched to any detection frame, the state of historical track i is set to suspended and its track ID is added to the vanished-target set.

For a detection frame not matched to any track, the target detection frame j of frame t is initialized as a track, assigned a new track ID, and added to the track set.
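The three cases of step S10 map naturally onto a small bookkeeping routine; the Track fields and the dictionary layout of the detections are assumptions made for this sketch, not structures prescribed by the patent.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)

@dataclass
class Track:
    track_id: int
    box: tuple            # last associated detection frame (x, y, w, h)
    features: object      # last global + block feature vectors
    suspended: bool = False

def manage_tracks(tracks, detections, matches, un_tracks, un_dets, lost_ids):
    """Apply the three association cases returned by step S9."""
    for i, j in matches:                 # case 1: matched pair, update track i
        tracks[i].box = detections[j]["box"]
        tracks[i].features = detections[j]["features"]
        tracks[i].suspended = False
    for i in un_tracks:                  # case 2: unmatched track, suspend it
        tracks[i].suspended = True
        lost_ids.add(tracks[i].track_id)
    for j in un_dets:                    # case 3: unmatched detection, new track
        d = detections[j]
        tracks.append(Track(next(_next_id), d["box"], d["features"]))
    return tracks
```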
Step S11: steps S2-S10 are repeated until all frames are processed, and the target trajectories are output.
It should be understood that the foregoing description of the preferred embodiments is for illustration only and is not intended to limit the scope of the invention, which is defined by the appended claims; those skilled in the art can make substitutions or modifications without departing from the scope of the invention as set forth in the appended claims.

Claims (10)

1. A multi-pedestrian tracking method based on joint global and local features, comprising the steps of:
step 1, acquiring the detection results of each frame from a public data set;
step 2, detection frame fusion; the detection results on the public data set may suffer from missed detections and false detections and need to be corrected to obtain more accurate results, so detection frame fusion is adopted: whenever the effective overlap rate between two detection frames in a frame exceeds a certain threshold, they are merged into a new first detection frame;
step 3, key point detection; after the new detection frames are obtained through the fusion of step 2, key points are detected, each new detection frame containing a large number of key points;
step 4, key point filtering; a threshold is set and key points with low confidence are filtered out; if the number of remaining key points of a pedestrian exceeds a certain threshold, the detection is considered correct;
step 5, detection correction; a new second detection frame is recovered from the key points retained in the new detection frame, and a corrected detection frame is obtained from the boundary key points using the proportional relation between human key points and human height;
step 6, feature extraction, in which a convolutional neural network extracts features from the corrected detection frame; a pedestrian re-identification PCB network is first trained on a pedestrian re-identification data set and then used to extract features from the corrected detection frame; the PCB network divides the pedestrian into p blocks along the horizontal and vertical directions and obtains a feature vector for each block, and the visible-region labels of the pedestrian are computed by checking whether key points exist in each block of the pedestrian detection frame;
step 7, local appearance feature association: after features of a historical track and of a target detection frame in frame t are extracted with the method of step 6, the local feature association is computed as the cosine distance between their local feature vectors;
step 8, global appearance feature association; after features of a historical track and of a target detection frame in frame t are extracted with the method of step 6, the global feature association is computed as the cosine distance between their global feature vectors;
step 9, data association; the global appearance features, local appearance features, and visibility labels of a historical track and of a target detection frame in frame t from steps 6-8 are fused to compute the data association, and the Hungarian matching algorithm is then applied to obtain the optimal matching result;
step 10, tracking management; after the optimal solution of the data association is obtained through the Hungarian algorithm in step 9, the successfully matched detection frame and track pairs, the detection frames not matched to any track, and the tracks not matched to any detection frame are returned; a track matched to a detection frame is updated; a track not matched to any detection frame is set to the suspended state; a detection frame not matched to any track is treated as a new track, initialized, and added to the track set;
and step 11, repeating steps 2-10 until all frames are processed, and outputting the target trajectories.
2. The multi-pedestrian tracking method based on joint global and local features of claim 1, wherein: in step 2, the detection frames whose effective overlap rate in a frame exceeds a certain threshold are merged to obtain a new detection frame;

the effective overlap rate is calculated as follows:

$$r_{\mathrm{eff}}(D_t^A, D_t^B) = \frac{\operatorname{area}(D_t^A \cup D_t^B)}{\operatorname{area}\big(\operatorname{cover}(D_t^A, D_t^B)\big)}$$

where $D_t^A$ and $D_t^B$ denote the A-th and B-th detection frames in frame t, and $\operatorname{cover}(D_t^A, D_t^B)$ denotes the smallest rectangle that can cover the two frames;
the new detection frame is calculated as follows:

$$x_{new} = \min(x_A, x_B)$$
$$y_{new} = \min(y_A, y_B)$$
$$w_{new} = \max(x_A + w_A,\ x_B + w_B) - x_{new}$$
$$h_{new} = \max(y_A + h_A,\ y_B + h_B) - y_{new}$$

where $(x_A, y_A)$ are the top-left coordinates of detection frame A, $(w_A, h_A)$ its width and height, $(x_B, y_B)$ the top-left coordinates of detection frame B, and $(w_B, h_B)$ its width and height.
3. The multi-pedestrian tracking method based on joint global and local features of claim 1, wherein: the key points detected in step 3 are respectively: left eye, right eye, nose, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle.
4. The multi-pedestrian tracking method based on joint global and local features of claim 1, wherein: in step 4, a pedestrian detection is considered correct if its number of remaining key points exceeds a certain threshold, and the specific filtering formula is as follows:

$$\varepsilon_\tau = \begin{cases} 1, & CS_\tau > \gamma \\ 0, & \text{otherwise} \end{cases} \qquad \operatorname{count}(\beta) = \sum_{\tau} \varepsilon_\tau$$

where $\varepsilon_\tau$ is a binary variable: $\varepsilon_\tau = 1$ indicates that the confidence of the τ-th key point is greater than the threshold and the key point is kept, otherwise it is deleted; $\operatorname{count}(\beta)$ denotes the number of valid key points in the β-th detection frame, $CS_\tau$ denotes the confidence score of the τ-th key point, and γ denotes a preset threshold.
5. The multi-pedestrian tracking method based on joint global and local features of claim 1, wherein: in step 5, the newly corrected detection frame is obtained from the boundary key points; the specific correction formula is as follows:

$$D_{new} = \{L_x,\ T_y,\ R_x - L_x + \theta,\ B_y - T_y + \rho\}$$

where $L_x$ and $R_x$ are the leftmost and rightmost x-axis coordinates of the key points in the corrected detection frame, $T_y$ and $B_y$ are the uppermost and lowermost y-axis coordinates of the key points in the corrected detection frame, and θ and ρ are two parameters representing the x-axis offset and the y-axis offset respectively.
6. The multi-pedestrian tracking method based on joint global and local features of claim 1, wherein: in step 6, whether key points exist in each block of the pedestrian detection frame is checked to compute the visible-region labels of the pedestrian; the specific calculation formula is as follows:

$$l_v = \begin{cases} 1, & \exists\, s:\ \frac{(v-1)H}{p} \le y(q_s) < \frac{vH}{p} \\ 0, & \text{otherwise} \end{cases} \qquad v = 1, \ldots, p$$

where $l_v$ indicates whether the v-th block is visible (1 if visible, otherwise 0), $q_s$ denotes the s-th key point in the pedestrian detection frame with $y(q_s)$ its vertical coordinate, p denotes the total number of blocks of the detection frame, and H denotes the height of the image.
7. The multi-pedestrian tracking method based on joint global and local features of claim 1, wherein: in step 7, the local appearance feature association is calculated as follows:

$$d_v = 1 - \frac{f_{i,v} \cdot f_{j,v}}{\lVert f_{i,v} \rVert\, \lVert f_{j,v} \rVert}$$

where $f_{i,v}$ and $f_{j,v}$ are the feature vectors of the v-th block of historical track i and of a target detection frame j in frame t respectively, $d_v$ is the feature distance of the v-th block between them, and p denotes the total number of blocks of the detection frame.
8. The multi-pedestrian tracking method based on joint global and local features of claim 1, wherein: in step 8, the global appearance feature similarity metric is as follows:

$$d_g = 1 - \frac{f_i^g \cdot f_j^g}{\lVert f_i^g \rVert\, \lVert f_j^g \rVert}$$

where $f_i^g$ and $f_j^g$ are the global feature vectors of historical track i and of a target detection frame j in frame t respectively, and $d_g$ is the global feature distance between them.
9. The multi-pedestrian tracking method based on joint global and local features of claim 1, wherein: the data association of step 9 is implemented as follows;

the data association is realized by calculating the feature distance dist between historical track i and target detection frame j; the specific calculation formula is as follows:

$$\operatorname{dist} = \frac{d_g + \sum_{v=1}^{p} l_v^i\, l_v^j\, d_v}{1 + \sum_{v=1}^{p} l_v^i\, l_v^j}$$

where $l_v^i$ and $l_v^j$ are the visibility scores of the v-th block of historical track i and of a target detection frame j in frame t respectively (0 means the block is invisible, 1 means it is visible), and p denotes the total number of blocks of the detection frame.
10. The multi-pedestrian tracking method based on joint global and local features of claim 1, wherein: in step 10, the specific tracking management method is as follows:
the matching result of this step has 3 cases: detection frame and track pairs that matched successfully, tracks not matched to any detection frame, and detection frames not matched to any track;
for a successfully matched detection frame and track pair, the historical track i is updated with the target detection frame j of frame t that it matched;
for a track not matched to any detection frame, the state of historical track i is set to suspended and its track ID is added to the vanished-target set;
for a detection frame not matched to any track, the target detection frame j of frame t is initialized as a track, assigned a new track ID, and added to the track set.
CN202111373622.5A 2021-11-19 2021-11-19 Multi-pedestrian tracking method based on joint global and local features Active CN114120188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111373622.5A CN114120188B (en) Multi-pedestrian tracking method based on joint global and local features


Publications (2)

Publication Number Publication Date
CN114120188A CN114120188A (en) 2022-03-01
CN114120188B true CN114120188B (en) 2024-04-05

Family

ID=80396409


Country Status (1)

Country Link
CN (1) CN114120188B (en)





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant