CN112116634B - Semi-online multi-target tracking method - Google Patents

Semi-online multi-target tracking method

Info

Publication number
CN112116634B
CN112116634B (application number CN202010754142.2A)
Authority
CN
China
Prior art keywords
frame
detection
kalman
track
target
Prior art date
Legal status
Active
Application number
CN202010754142.2A
Other languages
Chinese (zh)
Other versions
CN112116634A (en)
Inventor
刘龙军 (Liu Longjun)
金焰明 (Jin Yanming)
孙宏滨 (Sun Hongbin)
郑南宁 (Zheng Nanning)
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202010754142.2A priority Critical patent/CN112116634B/en
Publication of CN112116634A publication Critical patent/CN112116634A/en
Application granted granted Critical
Publication of CN112116634B publication Critical patent/CN112116634B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06T 7/277: Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters (G: Physics; G06: Computing; calculating or counting; G06T: Image data processing or generation, in general)
    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06T 2207/10016: Image acquisition modality: video; image sequence
    • G06T 2207/20084: Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/30196: Subject of image: human being; person


Abstract

The semi-online multi-target tracking method comprises: obtaining detection boxes of pedestrians or moving targets from a video of the pedestrians or moving targets; obtaining a Kalman sequence spectrum from the position-change information between the detection boxes within a time window; finding pairs of Kalman heads from the Kalman sequence spectrum; and obtaining the detection box of the target or moving object to be tracked in the next frame through the similarity of an appearance model, a motion model and a size-change model, so that the target or moving object lies in a detection box in that frame; otherwise, the target is lost. Detection boxes whose similarity is higher than a threshold are spliced into the Kalman sequence spectrum, the motion model and the appearance model in the Kalman sequence spectrum are updated, and the pedestrian or moving-object target is tracked in the next frame. The method is applicable to any track-splicing multi-target tracking algorithm, i.e., it is not limited by the constraints of the different tracks generated by multiple targets such as pedestrians and moving objects; it can effectively improve tracking precision and reduce the number of identity switches.

Description

Semi-online multi-target tracking method
Technical Field
The invention relates to a tracking method, and in particular to a semi-online multi-target tracking method.
Background
The multi-target tracking method is mainly applied to tracking the trajectories of multiple people or moving objects in a video sequence shot by a camera. In the driving scene of an unmanned vehicle, pedestrians or other vehicles on the road captured by the vehicle's camera can be tracked in real time and their motion trajectories predicted, so that the unmanned vehicle can avoid them effectively or make automatic-driving decisions according to their motion. In multi-camera surveillance scenes, multiple pedestrians can be tracked across cameras as required, and the walking trajectories and positions of multiple pedestrian targets can be monitored through the videos captured by different cameras. In sports scenes shot by cameras, such as a basketball game, the running trajectories of the athletes can each be tracked by a multi-target tracking method, and the actions and behavior of the athletes on the field analyzed based on the tracked trajectories. The multi-target tracking method can also be applied to tracking multiple targets such as enemy ships and vehicles in military scenes. Many tracking methods currently exist, but to track efficiently, a multi-target tracking method must be improved and optimized in terms of real-time performance and accuracy.
MOT (multi-object tracking) can be broadly divided into online MOT and offline MOT. The former advances with the incoming frames and gives the tracking result in a timely manner; overall, its real-time performance is higher than that of the latter and its accuracy lower. The latter must wait for the forward computation over the whole video sequence to finish and tracks only after the detection boxes and other information in all video frames have been obtained, so it is difficult for it to meet real-time requirements, but its accuracy is generally higher because it makes better use of global information. Online tracking requires the track to be extended immediately after the detection of each new frame is completed; it therefore has intuitively better real-time performance, but it cannot effectively use the global information of the video, so accuracy may drop. In contrast, offline tracking builds the tracks after all frames of a given video sequence have been detected; this makes good use of global information and gives relatively accurate tracking results, but the real-time requirement cannot be met. The temporal receptive fields of online tracking, semi-online tracking and offline tracking are, respectively, the current frame, a time window and the whole sequence, increasing in that order, while real-time performance decreases in the same order.
Occlusion has long been one of the difficulties in MOT; although algorithms iterate and update rapidly, the performance of most of them remains hard to keep robust under severe occlusion. Whether online MOT, offline MOT, or MOT built with deep-learning methods, various attempts have been made to address occlusion, but essentially at the cost of real-time performance. Both real-time performance and accuracy matter greatly in practical tracking applications: poor real-time performance of a tracking algorithm in an unmanned vehicle can delay the vehicle's judgment, cause misjudgment or delayed decisions, and lead to unnecessary traffic accidents; poor accuracy can confuse the multiple targets being tracked and cause tracking failure. For example, when a criminal suspect is tracked by a multi-target tracking algorithm across a city's smart cameras, losing the suspect or tracking a non-suspect may allow the real suspect to escape.
Disclosure of Invention
The invention aims to provide a semi-online multi-target tracking method.
In order to achieve the above object, the present invention is realized by the following technical scheme:
According to the semi-online multi-target tracking method, a detection box of a pedestrian or moving target is obtained by a YOLO-V3 detector from a video of the pedestrian or moving target; a Kalman sequence spectrum is obtained from the position-change information between the detection boxes within a time window; a pair of Kalman heads is then found from the Kalman sequence spectrum; the detection box of the target or moving object to be tracked in the next frame is obtained through the similarity of an appearance model, a motion model and a size-change model, so that the target or moving object lies in a detection box in that frame; otherwise, the target is lost. Detection boxes whose similarity is higher than a threshold are spliced into the Kalman sequence spectrum, the motion model and the appearance model in the Kalman sequence spectrum are updated, and the pedestrian or moving-object target is tracked in the next frame.
The invention is further improved in that the similarity of the appearance model is obtained as follows:
in the n-th video frame, the patch size is fixed to [64, 128]; there are D detection boxes and D patches, the X-th detection box being denoted D_X^n and its corresponding patch P_X^n;
in the n-th frame, crop and resize operations are performed on the region where each detection box is located to obtain the D patches, equal in number to the detection boxes and each of the fixed size; the pixels of each of the D patches are then divided into several groups by color interval;
the matrix obtained by the grouping is reshaped into a one-dimensional vector Tsr_X, which is taken as the representation vector of the appearance features of P_X^n to obtain the appearance model; the appearance models of the X-th detection box and the Y-th trajectory are denoted f(X) and f(Y); finally, the appearance model is updated by vector fusion, and the similarity of the appearance model is computed as in formula (3-1);
wherein Λ_A(X, Y) denotes the similarity of the appearance model.
The invention is further improved in that the similarity of the motion model and the size-change model is obtained as follows: the time difference between adjacent frames is Δt; the k-th object in the n-th frame is denoted d_k^n, with position center coordinates (c, d) and corresponding velocity and acceleration vectors; the detection box size of the target is (w, h), with a corresponding change speed and change driving force; the detector influence factor is α;
the motion state and the size state of the k-th object in the n-th frame collect the position, velocity and acceleration factors and the size, change-speed and driving-force factors respectively, with a covariance matrix between the element factors of the motion state and a covariance matrix between the element factors of the size state; according to the laws of physical motion, the position prediction equation and the size prediction equation for the next frame are obtained;
the two iterative state-transfer equations and the covariance-matrix update equation are then simplified into formulas (3-8) and (3-9);
Kalman filter prediction based on a normal distribution is carried out with formulas (3-8) and (3-9) as the iterative equations of the motion model and the size model, giving the position prediction information and the size prediction information of the (n+1)-th frame;
for any first-segment track X and second-segment track Y, a forward velocity vector points from the head to the tail of the first track X and a reverse velocity vector points from the tail to the head of the second track Y; the course of motion is simulated by a Kalman filter; F(X, Y) is the forward similarity score pointing from the tail of track X toward the head of track Y, and the reverse similarity score points from the head of track Y toward the tail of track X;
wherein Λ_M(X, Y) denotes the similarity between the first-segment track X and the second-segment track Y.
A further improvement of the invention is that the length of the time window is defined as N, the minimum instantiation length of a short track is T_m, the Kalman family graph is denoted KFM, the k-th detection box in the n-th frame is denoted d_k^n, and Seq(d_k^n) represents the order of detection box d_k^n in its corresponding fragment trajectory in the KFM;
if Seq(d_k^n) is unset, the detection box has not been cascaded with any fragment trajectory in the KFM, and Seq(d_k^n) = x means d_k^n is the (x+1)-th member of a fragment trajectory in the KFM; the i-th fragment trajectory in the KFM is defined as TK_i; if the length of the i-th fragment trajectory is greater than T_m and its motion model, appearance model and size model are not updated in the n-th frame, the i-th fragment trajectory is instantiated as a reliable short track ST_j; otherwise, the i-th fragment trajectory is disassembled;
The invention is further improved in that the specific process of splicing the detection boxes whose similarity is higher than the threshold into the Kalman sequence spectrum is as follows:
first, finding the paired detection boxes KH of the n-th pedestrian frame: for a detection box d_i^n in the n-th frame and a detection box d_j^{n+1} in the (n+1)-th frame, where d_i^n is the i-th detection box in the n-th frame, search for each pair of detection boxes that may belong to the same target and are close in their IOU relation; if the IOU condition is satisfied, Seq(d_i^n) and Seq(d_j^{n+1}) are marked 0 and 1 respectively, and (d_i^n, d_j^{n+1}) is called a pair of detection boxes, a Kalman head KH;
there will be several pairs KH in the n-th and (n+1)-th frames; Seq(d_k^n) represents the order of detection box d_k^n in its corresponding fragment trajectory in the KFM;
second, prediction:
the position of each pedestrian target in the next frame is predicted according to the motion model of that target in the current (n+1)-th frame;
third, track growth: the detection box most similar to the prediction is selected according to formula (3); it is cascaded onto the trajectory, and the motion model and the appearance model of the not-yet-instantiated unstable track TK_i are updated;
the position in the next frame is predicted using the updated motion model and appearance model of the unstable track TK_i;
fourth, the first to third steps are repeated for the tracking of each frame;
fifth, instantiation or backtracking: short tracks in the KFM in the current frame are instantiated or backtracked according to the following conditions:
a) instantiation: if the length of the unstable track TK_i in the Kalman sequence spectrum reaches the threshold T_m and the track is not updated in the last frame, the unstable track TK_i is instantiated as a new reliable track ST_j;
b) backtracking: if the length of the unstable track TK_i in the Kalman sequence spectrum is smaller than the threshold T_m and the track is not updated in the last frame, the unstable track TK_i is deleted from the Kalman family graph KFM, the Seq markers of its detection boxes are unset, and the fragment track in the Kalman sequence spectrum is marked as a forbidden route.
The invention is further improved in that the specific process of the second step is as follows: a motion model of the unstable track TK_i is established from its cascaded detection boxes, and the position of the pedestrian target belonging to track TK_i in the (n+2)-th frame is predicted according to the motion model and defined as the predicted position of that track.
Compared with the prior art, the invention has the beneficial effects that:
First: the method is applicable to any track-splicing multi-target tracking algorithm, i.e., it is not limited by the constraints of the different tracks generated by multiple targets such as pedestrians and moving objects; it can effectively improve tracking precision and reduce the number of identity switches;
Second: the generated tracking result can be checked and an erroneous tracking result corrected, making the algorithm more robust; for example, when the target is mislocated in the current video frame during pedestrian tracking, i.e., when the tracking result of the online multi-target tracking algorithm is wrong, the error can be detected by the backtracking module within the time window of the method, so that the tracking track is corrected;
Third: by masking the intersection-over-union (IOU) area between targets, the degree of distinction between multiple targets is effectively improved at extremely low computational cost; the problem of target features disappearing under severe occlusion in crowded places such as malls and station intersections can be effectively alleviated, and the feature distinction between incompletely occluded targets and occluding targets is effectively improved;
Fourth: the invention can use the global information within the existing time window to check and correct erroneous tracking results within a certain time while meeting the real-time requirement. The method is very robust in various extreme scenes and transfers well to other algorithms based on similar short-track splicing.
Drawings
FIG. 1 is a flowchart of the backtracking mechanism algorithm of the present invention.
FIG. 2 is a flowchart of the overall algorithm of the present invention.
FIG. 3 is a schematic diagram of an IOU mask module in accordance with an embodiment of the present invention.
FIG. 4 is a schematic diagram of appearance model creation in an example of the invention.
Fig. 5 is a comparison of the performance of each algorithm on MOT2015.
Fig. 6 is a comparison of the FPS of each algorithm on MOT2015.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
The invention adopts MOT with a semi-online mechanism, which allows a good compromise and optimization between real-time performance and precision.
Referring to fig. 1, the specific process of the invention is as follows. From a video of pedestrians or moving objects shot by a camera, the detection boxes of the pedestrians or moving objects are obtained with a YOLO-V3 detector; that is, other objects and the background are excluded from each pedestrian or moving-object box. Over a period of video, a Kalman sequence spectrum is obtained from the position-change information between the detection boxes within the time window; pairs of Kalman heads (Kalman Head, KH) are found from the Kalman sequence spectrum; the detection box of the target or moving object to be tracked in the next frame is obtained through the similarity of an appearance model, a motion model and a size-change model, so that the target or moving object always lies in a detection box in that frame; otherwise, the target is lost. Detection boxes whose similarity is higher than the threshold are spliced into the Kalman sequence spectrum, and the motion model and the appearance model in the Kalman sequence spectrum are updated for tracking the pedestrian or moving-object target in the next frame.
The similarity of the appearance model is obtained through the following processes:
In the n-th frame, the patch size is fixed to [64, 128] and the number of pixel-histogram bins is 64; there are D detection boxes and D patches, the X-th detection box being denoted D_X^n and its corresponding patch P_X^n.
Then, in the n-th frame, crop and resize operations are performed on the region where each detection box is located (each patch is cut out and resized to a tensor of shape [64, 128]). After these operations, D patches are obtained, equal in number to the detection boxes and each of the fixed size. The pixels of each of the D patches are then divided into groups (here 64 groups) by color interval,
and the matrix obtained by the grouping is reshaped into a one-dimensional vector Tsr_X; that is, a 1 x 192 tensor is obtained by reshaping the 3 x 64 tensor produced by the color-interval grouping. Tsr_X is then taken as the representation vector of the appearance features of P_X^n. Combined with the appearance model function of [12], the appearance model is obtained, and the appearance models of the X-th detection box and the Y-th track are denoted f(X) and f(Y). Finally, the appearance model is updated by vector fusion, and the appearance similarity is obtained as shown in formula (3-1).
Wherein Λ_A(X, Y) denotes the similarity of the appearance model. This is an effective way to enhance the discrimination between targets when a single physical motion model fails because the relationship between the trajectories and the detection boxes is complex.
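As a concrete illustration, the pixel-histogram appearance model described above can be sketched as follows. The cosine similarity and the fusion momentum are assumptions for illustration only, since formula (3-1) and the exact fusion weights are not reproduced in this text.

```python
import numpy as np

N_BINS = 64  # pixel-histogram groups per channel (64 in the text)

def appearance_vector(patch):
    """Bin each RGB channel of a fixed-size patch into 64 color groups,
    then reshape the 3x64 histogram matrix into a length-192 vector Tsr_X."""
    hist = np.stack([
        np.histogram(patch[..., c], bins=N_BINS, range=(0, 256))[0]
        for c in range(3)
    ])                                    # shape (3, 64)
    vec = hist.reshape(-1).astype(float)  # shape (192,)
    n = np.linalg.norm(vec)
    return vec / n if n > 0 else vec

def appearance_similarity(f_x, f_y):
    """Lambda_A(X, Y): assumed here to be cosine similarity of unit vectors."""
    return float(np.dot(f_x, f_y))

def fuse(f_track, f_det, momentum=0.8):
    """Vector-fusion update of a track's appearance model (momentum assumed)."""
    fused = momentum * f_track + (1.0 - momentum) * f_det
    n = np.linalg.norm(fused)
    return fused / n if n > 0 else fused
```

A patch cropped and resized to [64, 128] can be passed directly to `appearance_vector`; the resulting vectors are unit-normalized so the similarity score lies in [0, 1] for non-negative histograms.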
The similarity of the motion model and the size-change model is obtained as follows: the time difference between adjacent frames is Δt; the k-th object in the n-th frame is denoted d_k^n, with position center coordinates (c, d) and corresponding velocity and acceleration vectors; the detection box size of the target is (w, h), with a corresponding change speed and change driving force; the detector influence factor is α (the higher the mIOU of the detector, the higher the value; the default is 0.7).
The motion state and the size state of the k-th object in the n-th frame collect the position, velocity and acceleration factors and the size, change-speed and driving-force factors respectively, with a covariance matrix between the element factors of the motion state and a covariance matrix between the element factors of the size state. According to the laws of physical motion, the position prediction equation and the size prediction equation for the next frame are obtained,
and the two iterative state-transfer equations and the covariance-matrix update equation are simplified into formulas (3-8) and (3-9).
Kalman filter prediction based on a normal distribution is carried out with formulas (3-8) and (3-9) as the iterative equations of the motion model and the size model, giving the position prediction information and the size prediction information of the (n+1)-th frame.
For any two segment tracks X and Y, the forward velocity vector from the head to the tail of track X and the reverse velocity vector from the tail to the head of track Y are available from equations (3-10) and (3-11); the course of motion is simulated by a Kalman filter. F(X, Y) is the forward similarity score pointing from the tail of track X toward the head of track Y, and the reverse similarity score points from the head of track Y toward the tail of track X.
The overall similarity can be expressed as (3-12), where Λ_M(X, Y) denotes the similarity between the first segment track X and the second segment track Y calculated from equations (3-10) and (3-11). The value range of Λ_M(X, Y) is [0, 1]; the closer its value is to 1, the more likely the first segment track X and the second segment track Y belong to the same target in the model's simulation of the physical motion, which is an important basis for judging the connection between fragment tracks.
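As an illustrative sketch of the prediction step, a constant-acceleration Kalman prediction over the position state can be written as below. The 6-dimensional state layout [c, d, velocities, accelerations] and the isotropic process noise are assumptions, since formulas (3-8) and (3-9) are not reproduced in this text; the size model would use the analogous iteration over (w, h).

```python
import numpy as np

def make_F(dt):
    """Constant-acceleration state transition for state
    [c, d, c_dot, d_dot, c_ddot, d_ddot] over a frame gap dt."""
    F = np.eye(6)
    for i in range(2):
        F[i, i + 2] = dt            # position += velocity * dt
        F[i, i + 4] = 0.5 * dt * dt  # position += 0.5 * accel * dt^2
        F[i + 2, i + 4] = dt         # velocity += accel * dt
    return F

def kalman_predict(x, P, dt, q=1e-2):
    """One Kalman prediction step in the style of iterations (3-8)/(3-9):
    propagate the state and update the covariance matrix."""
    F = make_F(dt)
    Q = q * np.eye(6)   # process noise, assumed isotropic for illustration
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred
```

Running `kalman_predict` once per frame yields the predicted center of the (n+1)-th frame used when matching detection boxes to fragment tracks.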
Track confidence can be intuitively understood as the degree of matching between the constructed track and the actual track of the object. The confidence conf(T_i) of a track can be represented by formula (3-13),
wherein the first term represents the average similarity between the detections in the existing trajectory, taken over pairs of detection boxes in track T_i, and the second term represents the continuity of the trajectory; α is the number of frames for which the object is lost, and β is a control parameter related to the accuracy of the detector (default 0.4).
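A minimal sketch of this confidence score follows; since formula (3-13) is not reproduced in this text, the exact combination is an assumption, modeled here as the average pairwise similarity damped by an exponential continuity term in the number of lost frames.

```python
import math

def track_confidence(pair_sims, lost_frames, beta=0.4):
    """conf(T_i) sketch: average pairwise detection similarity within the
    track, damped by a continuity term exp(-beta * lost_frames).
    beta defaults to 0.4 as in the text; the product form is assumed."""
    if not pair_sims:
        return 0.0
    avg = sum(pair_sims) / len(pair_sims)
    return avg * math.exp(-beta * lost_frames)
```

Under this form the confidence decays monotonically as more frames are lost, matching the intuition that a long-interrupted track is less trustworthy.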
The video sequence selected by the semi-online mechanism on the time axis lies between those of the online and offline mechanisms, so its performance is a good compromise between the two; moreover, through optimizations such as occlusion handling and semantic-segmentation optimization, the semi-online tracking mechanism can be well optimized in both real-time performance and accuracy.
The invention defines the length of the time window as N and the minimum instantiation length of a short track as T_m. The Kalman family graph is denoted KFM and records the detection relationships between the motion models and the appearance models. The k-th detection box in the n-th frame is denoted d_k^n; it also contains the detected coordinates and reliability in the list [x, y, w, h, conf]. Seq(d_k^n) represents the order of detection box d_k^n in its corresponding fragment trajectory in the KFM.
If Seq(d_k^n) is unset, the detection box has not been cascaded with any fragment trajectory in the KFM; Seq(d_k^n) = x means d_k^n is the (x+1)-th member of a fragment trajectory in the KFM. The i-th fragment track in the KFM is defined as TK_i; it is instantiated as a reliable short track ST_j if its length is greater than T_m and its motion model, appearance model and size model are not updated in the n-th frame; otherwise the track is disassembled.
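A minimal sketch of this bookkeeping follows, assuming the order markers are stored in a dictionary keyed by (frame, box index), with an absent key standing for "not cascaded"; the concrete storage layout is an assumption, not the patent's own.

```python
class KalmanFamilyGraph:
    """Minimal KFM bookkeeping sketch: fragment tracks and Seq order markers."""

    def __init__(self, t_min=5):
        self.t_min = t_min    # minimum instantiation length T_m
        self.seq = {}         # (frame, k) -> order within its fragment track
        self.tracks = {}      # track id TK_i -> list of (frame, k) members

    def cascade(self, track_id, det_key):
        """Append detection box det_key to fragment track track_id;
        the (x+1)-th member receives order marker x."""
        members = self.tracks.setdefault(track_id, [])
        self.seq[det_key] = len(members)
        members.append(det_key)

    def is_cascaded(self, det_key):
        return det_key in self.seq

    def instantiable(self, track_id, updated_this_frame):
        """A fragment track becomes a reliable short track ST_j once it is
        long enough and no longer updated in the current frame."""
        return (len(self.tracks.get(track_id, [])) > self.t_min
                and not updated_this_frame)
```

Each detection box would additionally carry its [x, y, w, h, conf] list; that payload is omitted here to keep the structure of the Seq bookkeeping visible.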
Taking the n-th pedestrian frame as an example, the short-track tracking process and the track-backtracking strategy of the invention are introduced, as shown in fig. 2:
first, finding the paired detection boxes KH of the n-th pedestrian frame: for a detection box d_i^n in the n-th frame and a detection box d_j^{n+1} in the (n+1)-th frame, search for each pair of detection boxes that may belong to the same target and are close in their IOU relation. If the IOU condition is satisfied, Seq(d_i^n) and Seq(d_j^{n+1}) are marked 0 and 1 respectively, and (d_i^n, d_j^{n+1}) is called a pair of detection boxes, a Kalman head KH.
After this step, there will be several pairs KH in the n-th and (n+1)-th frames. Seq(d_k^n) represents the order of detection box d_k^n in its corresponding fragment trajectory in the KFM, and d_i^n denotes the i-th detection box in the n-th frame.
Second, prediction:
the position of each pedestrian target in the next frame is predicted according to the motion model of that target in the current (n+1)-th frame. The specific process is as follows:
a motion model of the unstable track TK_i is established from its cascaded detection boxes; based on the motion model, the position of the pedestrian target belonging to track TK_i in the (n+2)-th frame is predicted and defined as the predicted position of that track.
Third, track growth: according to the matching strategy of formula (3), the detection box most similar to the predicted position is selected; it is cascaded onto the trajectory, and the motion model and the appearance model of the not-yet-instantiated unstable track TK_i are updated.
The position in the next frame is predicted using the updated motion model and appearance model of the unstable track TK_i.
Fourth, the first to third steps are repeated for the tracking of each frame.
Fifth, instantiation or backtracking: the short tracks in the KFM (e.g., TK_0, TK_1, ..., TK_i) in the current frame are instantiated or backtracked according to the following conditions:
a) instantiation: if the length of the unstable track TK_i in the Kalman sequence spectrum reaches the threshold T_m and the track is not updated in the previous frame, the unstable track TK_i is instantiated as a new reliable track ST_j.
That is, if the length of the unstable track TK_i is greater than or equal to the threshold T_m, it is a reliable track. The threshold T_m is determined according to the actual situation and is generally taken to be 5.
b) backtracking: if the length of the unstable track TK_i in the Kalman sequence spectrum is smaller than the threshold T_m and the track is not updated in the last frame, the unstable track TK_i is deleted from the Kalman family graph KFM, the Seq markers of its detection boxes are unset, and the fragment track in the Kalman sequence spectrum is marked as a forbidden route to avoid exploring the same path again later.
The invention adopts an IOU mask module to handle the situation where two or more targets occlude each other; the process is as follows. Fig. 3 shows a scene in which targets occlude each other. When object A and object B occlude each other, before features are extracted from the detection box region, the IOU area between A and B is used as a mask to cover the pixel information of that area; this prevents the related objects from sharing the feature information of the IOU area and effectively improves the distinction between the appearance models of different objects. However, when many objects occlude one another, the detection area of an object can easily be almost completely covered by multiple IOU masks, so that the appearance features of the occluded object are covered entirely. To avoid this, the invention sets a threshold Thres_IM to rule out the worst case.
Referring to fig. 3, in the n-th frame, the k-th detection box is marked d_k^n, and an IOU mask is defined between each pair of detection boxes. For the k-th detection box, suppose all the detection boxes in a set O_k^n occlude the area of d_k^n; the IOU masks between d_k^n and the boxes in O_k^n are merged and recorded as the total occlusion merge area, and covering d_k^n with the total occlusion merge area leaves a remaining visible area of the box.
If the remaining visible proportion obtained is less than the predetermined threshold Thres_IM, the appearance features of the target are difficult to express in the appearance model; the invention therefore sorts the detection boxes in O_k^n by the size of their occlusion area with d_k^n and removes the detection box with the smallest occlusion area from the set one at a time, obtaining a new set that is substituted back into equations (4) and (5) for calculation, until the visible proportion is no longer below Thres_IM; the final IOU mask is then taken as the output of the IOU mask module. When appearance features are extracted, the pixel values of the original image area covered by this mask are set to zero, so that both the occluded target and the occluding target skip the occlusion intersection area, which increases the feature distinction between targets.
The following are specific examples.
The time window length is first set to 40 frames. The video and the detection boxes of each frame are used as input; the patch of each detection box, obtained after cropping and resizing, is subjected to feature extraction, and the appearance is represented by pixel-histogram grouping as shown in fig. 3 to establish the appearance model. The appearance model is built as follows: in the n-th frame, the patch size is fixed to [64, 128] and the number of pixel-histogram groups is 64; there are D detection boxes in the frame and hence D patches, the X-th detection box being denoted D_n^X and its corresponding patch P_n^X.
In the n-th frame, a crop-and-resize operation is performed on the region of each detection box; that is, each patch is cropped and resized to a tensor of shape [64, 128]. After these operations, D fixed-size patches are obtained, one per detection box.
The pixels of each of the D patches are then divided into groups (64 groups per colour channel) according to colour interval, and the matrix obtained by the grouping is reshaped into a one-dimensional vector Tsr_X; that is, the 3 × 64 tensor obtained by colour-interval grouping is remodelled into a 1 × 192 tensor. The one-dimensional vector Tsr_X is then taken as the appearance model of the patch P_n^X corresponding to the X-th detection box.
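The pixel-histogram grouping just described can be sketched as follows (the per-channel `np.histogram` call and the normalisation step are illustrative choices; the text specifies only the 64 groups per channel and the 3 × 64 → 1 × 192 reshape):

```python
import numpy as np

def appearance_descriptor(patch, bins=64):
    """Pixel-histogram appearance model: one `bins`-bin histogram per
    colour channel, reshaped into a single 1-D vector (3 x 64 -> 192)."""
    patch = np.asarray(patch)                    # H x W x 3, uint8 pixel values
    hists = [np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    vec = np.stack(hists).reshape(-1).astype(np.float64)   # shape (192,)
    return vec / vec.sum()                       # normalised here for convenience

patch = np.zeros((128, 64, 3), dtype=np.uint8)   # a dummy all-black patch
desc = appearance_descriptor(patch)              # Tsr_X for this patch
```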
The appearance models of the X-th detection box and of the Y-th track are denoted f(X) and f(Y), respectively. Finally, the invention updates the appearance model by vector fusion of f(Y) with f(X), after which the appearance similarity is obtained as shown in formula (7),
wherein Λ_A(X, Y) denotes the similarity of the appearance models.
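The vector fusion and formula (7) are not reproduced in this text; a plausible instantiation (an exponential moving average for the fusion and cosine similarity for Λ_A(X, Y), both assumptions rather than the patented formulas) is:

```python
import numpy as np

def fuse(track_desc, det_desc, alpha=0.9):
    """Vector fusion of a track's appearance model f(Y) with the newly
    matched detection's descriptor f(X) (EMA form; alpha is assumed)."""
    return alpha * track_desc + (1.0 - alpha) * det_desc

def appearance_similarity(fx, fy):
    """Cosine similarity stands in here for Lambda_A(X, Y) of formula (7)."""
    denom = np.linalg.norm(fx) * np.linalg.norm(fy) + 1e-12
    return float(np.dot(fx, fy) / denom)
```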
After the above steps, fragment tracks of greatly improved reliability are obtained, and the noise occurring in the detections can be effectively screened out, as shown in table 1.
Table 1 comparison of results of the algorithms on MOT15
Referring to figs. 5 and 6, the invention combines the advantages of online and offline MOT at the expense of a small amount of real-time performance, obtains good improvements in MOTA, MOTP, IDS, ML, MT and FM, and strikes a suitable balance between real-time performance and accuracy.
On the MOT15 dataset the algorithm of the invention performs substantially better than the baseline on every metric except FPS: MOTA and MOTP are raised by 12.6 and 6.3 percentage points respectively, indicating a very large improvement in the ability to track targets continuously and showing, as a whole, that the short fragment tracks generated by the algorithm are more accurate and robust. The algorithm also reduces IDS by 82, a modest improvement in the total number of identity switches. Its higher MT value and lower ML value show that, to a certain extent, more robust short tracks reduce the number of missing frames between fragment tracks. Among the compared algorithms the invention ranks first in the MOTA column, and other metrics such as MOTP, Recall and IDS are generally far better than average, which means the algorithm framework has greater stability and generalisation capability. It is particularly notable that the backtracking mechanism contains no module involving complex computation and relies only on a simple appearance model and a motion model to carry out backtracking between the online and offline states, so the algorithm has an extremely obvious FPS advantage over the other algorithms in the table.
The method adopts a semi-online mechanism to optimise both the real-time performance and the accuracy of multi-target tracking. It can detect and correct tracking results that have already been established, effectively improves the distinguishability of target appearance features, runs fast, and requires few computing resources, so it can be deployed on embedded platforms such as the NVIDIA Jetson TX2 in scenarios such as autonomous driving and pedestrian tracking. It effectively alleviates the difficulty that current multi-target tracking algorithms cannot simultaneously achieve the best real-time performance and the best accuracy (e.g. as measured by MOTA).

Claims (2)

1. A semi-online multi-target tracking method, characterised in that: detection boxes of pedestrian targets are obtained by a YOLO-V3 detector from a pedestrian-target video; a Kalman family map is obtained from the position-change information among the detection boxes within a time window; a pair of Kalman heads is then found from the Kalman family map, and the detection box of the pedestrian target to be tracked in the next frame is obtained through the similarity of an appearance model, a motion model and a size-change model; if the target lies inside a detection box of that frame it is tracked successfully, otherwise the target is reported lost; a detection box whose similarity is higher than a threshold is spliced into the Kalman family map, the motion model and the appearance model in the Kalman family map are updated, and the pedestrian target is tracked in the next frame;
In the n-th frame, the k-th detection box is denoted D_n^k, and the IOU mask between D_n^k and D_n^j is denoted M_n^{k,j}; for the k-th detection box, let S_n^k be the set of detection boxes all of which overlap with the region of D_n^k; the union of the IOU masks produced by the detection boxes in S_n^k is recorded as the merged occlusion area M_n^k; the fraction of D_n^k remaining after being covered by M_n^k is R_n^k:

M_n^k = ∪_{j ∈ S_n^k} M_n^{k,j} (4)

R_n^k = 1 − Area(M_n^k) / Area(D_n^k) (5)

If the obtained R_n^k is less than the predetermined threshold Thres_IM, the detection boxes in S_n^k are sorted by the area of their occlusion region with D_n^k, and the detection box with the smallest occlusion area is removed from S_n^k one at a time, giving a new set S_n^k that is substituted into equations (4) and (5) again, until R_n^k ≥ Thres_IM; the final M_n^k is then used by the IOU mask module: when appearance features are extracted, the pixels of the original image inside the mask region are set to zero; because both the occluding target and the occluded target exclude the occlusion intersection area, the feature distinction between targets is increased;
Defining the length of the time window as N and the minimum instantiation length of a short track as L_min; the Kalman family map is denoted KFM, the k-th detection box in the n-th frame is denoted D_n^k, and Ord(D_n^k) denotes the order of the detection box D_n^k within its corresponding fragment track;
(1)
If Ord(D_n^k) = 0, the detection box has not yet been concatenated with any fragment track in the KFM; Ord(D_n^k) = m indicates that D_n^k is the m-th member of a fragment track; the i-th fragment track is defined as T_i; if the length of the i-th fragment track is greater than L_min and its motion model, appearance model and size model were not updated in the n-th frame, the i-th fragment track is instantiated as a reliable short track; otherwise, the i-th fragment track is disassembled;
(2)
for detection boxes whose similarity is higher than the threshold, the specific process of splicing them into the Kalman family map is as follows:
First step, finding Kalman-pair detection boxes in adjacent frames of the pedestrian pictures: among the detection boxes of the n-th frame and the (n+1)-th frame pictures, every pair of detection boxes that may belong to the same target and are close in their IOU relation is found; if the IOU of a pair (D_n^i, D_{n+1}^j) exceeds the IOU threshold, D_n^i and D_{n+1}^j are marked as the Kalman head and its successor respectively, and (D_n^i, D_{n+1}^j) is called a Kalman pair;
between the n-th frame and the (n+1)-th frame there may be several such Kalman pairs; Ord(D_n^i) denotes the order of D_n^i, the i-th detection box in the n-th frame, within its corresponding fragment track;
Second step, prediction:
predicting the position of each pedestrian target in the next frame of the picture according to the motion model of that target in the current (n+1)-th frame picture;
Third step, track growth: the detection box most similar to the predicted position is selected according to formula (3) and denoted D*; D* is then appended to the track, and the motion model and appearance model of the not-yet-instantiated unstable track T_u are updated;

Λ(T_u, D_{n+1}^j) = Λ_A · Λ_M · Λ_S if this combined similarity exceeds the threshold, else 0 (3)

the position in the next frame is then predicted using the updated motion model and appearance model of the unstable track T_u;
fourth step: repeating the first to third steps for each frame of tracking;
fifth step, instantiation or backtracking: short tracks in the KFM are instantiated or backtracked in the current frame according to the following conditions:
a) Instantiation: if the length of an unstable track T_u in the Kalman family map is greater than or equal to the threshold L_min and the track was not updated in the previous frame, the unstable track T_u is instantiated as a new reliable track T_r;
b) Backtracking: if the length of an unstable track T_u in the Kalman family map is less than the threshold L_min and the track was not updated in the previous frame, the unstable track T_u is removed from the Kalman family map, the order marks of its detection boxes are reset, and the fragment track is marked in the Kalman family map as a forbidden route.
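Conditions a) and b) of the fifth step can be sketched as follows (the `Track` container, its field names, and the value of L_min are illustrative assumptions, not the claimed data structures):

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    boxes: list = field(default_factory=list)   # one detection box per frame
    updated_last_frame: bool = False

def instantiate_or_backtrack(unstable_tracks, l_min=5):
    """A track that stopped growing is either promoted to a reliable short
    track (long enough) or discarded with its boxes marked as a forbidden
    route (too short); tracks still being updated keep growing."""
    reliable, forbidden, still_unstable = [], [], []
    for t in unstable_tracks:
        if t.updated_last_frame:
            still_unstable.append(t)        # keep growing
        elif len(t.boxes) >= l_min:
            reliable.append(t)              # a) instantiation
        else:
            forbidden.extend(t.boxes)       # b) backtracking: forbid this route
    return reliable, forbidden, still_unstable
```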
2. The semi-online multi-target tracking method according to claim 1, characterised in that the specific process of the second step is: an unstable track T_u is built from the Kalman head and its paired detection box; the position of the pedestrian target belonging to track T_u in the next frame is predicted from the motion model, and this position is defined as the predicted box P.
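The motion-model prediction of claim 2 can be illustrated with a constant-velocity Kalman filter over the box centre (the state layout [cx, cy, vx, vy] and the noise values are assumptions; the patent does not fix these details):

```python
import numpy as np

class BoxMotionModel:
    """Constant-velocity Kalman filter over [cx, cy, vx, vy]."""
    def __init__(self, cx, cy):
        self.x = np.array([cx, cy, 0.0, 0.0])          # state
        self.P = np.eye(4) * 10.0                      # state covariance
        self.F = np.array([[1, 0, 1, 0],               # transition: position += velocity
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float) # only the centre is observed
        self.R = np.eye(2) * 1.0                       # measurement noise (assumed)
        self.Q = np.eye(4) * 0.01                      # process noise (assumed)

    def predict(self):
        """Predicted centre in the next frame (the position of claim 2)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, cx, cy):
        """Correct the state with the matched detection's centre."""
        z = np.array([cx, cy])
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

After a few predict/update cycles with detections moving at constant speed, the filter's predicted centre runs ahead of the last observation, which is what the track-growth step matches against.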
CN202010754142.2A 2020-07-30 2020-07-30 Multi-target tracking method of semi-online machine Active CN112116634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010754142.2A CN112116634B (en) 2020-07-30 2020-07-30 Multi-target tracking method of semi-online machine


Publications (2)

Publication Number Publication Date
CN112116634A CN112116634A (en) 2020-12-22
CN112116634B true CN112116634B (en) 2024-05-07

Family

ID=73799581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010754142.2A Active CN112116634B (en) 2020-07-30 2020-07-30 Multi-target tracking method of semi-online machine

Country Status (1)

Country Link
CN (1) CN112116634B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906533B (en) * 2021-02-07 2023-03-24 成都睿码科技有限责任公司 Safety helmet wearing detection method based on self-adaptive detection area

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101141633A (en) * 2007-08-28 2008-03-12 湖南大学 Moving object detecting and tracing method in complex scene
CN103530894A (en) * 2013-10-25 2014-01-22 合肥工业大学 Video target tracking method based on multi-scale block sparse representation and system thereof
CN103632376A (en) * 2013-12-12 2014-03-12 江苏大学 Method for suppressing partial occlusion of vehicles by aid of double-level frames
CN104915970A (en) * 2015-06-12 2015-09-16 南京邮电大学 Multi-target tracking method based on track association
CN105809714A (en) * 2016-03-07 2016-07-27 广东顺德中山大学卡内基梅隆大学国际联合研究院 Track confidence coefficient based multi-object tracking method
CN106096645A (en) * 2016-06-07 2016-11-09 上海瑞孚电子科技有限公司 Resist and repeatedly block and the recognition and tracking method and system of color interference
WO2017185688A1 (en) * 2016-04-26 2017-11-02 深圳大学 Method and apparatus for tracking on-line target
CN108447080A (en) * 2018-03-02 2018-08-24 哈尔滨工业大学深圳研究生院 Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks
CN109191497A (en) * 2018-08-15 2019-01-11 南京理工大学 A kind of real-time online multi-object tracking method based on much information fusion
CN109919981A (en) * 2019-03-11 2019-06-21 南京邮电大学 A kind of multi-object tracking method of the multiple features fusion based on Kalman filtering auxiliary
CN110135314A (en) * 2019-05-07 2019-08-16 电子科技大学 A kind of multi-object tracking method based on depth Trajectory prediction
CN110362715A (en) * 2019-06-28 2019-10-22 西安交通大学 A kind of non-editing video actions timing localization method based on figure convolutional network
KR20200039043A (en) * 2018-09-28 2020-04-16 한국전자통신연구원 Object recognition device and operating method for the same
KR20200061118A (en) * 2018-11-23 2020-06-02 인하대학교 산학협력단 Tracking method and system multi-object in video
CN111242985A (en) * 2020-02-14 2020-06-05 电子科技大学 Video multi-pedestrian tracking method based on Markov model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102366779B1 (en) * 2017-02-13 2022-02-24 한국전자통신연구원 System and method for tracking multiple objects
US10846915B2 (en) * 2018-03-21 2020-11-24 Intel Corporation Method and apparatus for masked occlusion culling
US10957053B2 (en) * 2018-10-18 2021-03-23 Deepnorth Inc. Multi-object tracking using online metric learning with long short-term memory


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Confidence-Based Data Association and Discriminative Deep Appearance Learning for Robust Online Multi-Object Tracking; Seung-Hwan Bae et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; March 2018; vol. 40, no. 3; pp. 595-610 *
Online multi-object tracking algorithm based on hierarchical data association; Li Minghua et al.; Modern Computer (现代计算机); 2018-02-15; vol. 2018, no. 5; pp. 25-29 *
In-depth analysis of the principle of the Kalman filter algorithm; strongerHuang; https://mp.weixin.qq.com/s/OSTyc-NA-gFjNcz2xqqTdQ; 2020-06-24; pp. 1-18 *
Deep interpretation: the Kalman filter, a tool so powerful it is worth understanding!; Embedded ARM (嵌入式ARM); 2019-09-08; pp. 1-21 *
Detailed explanation of the principle of the Kalman filter; Huitiandi (慧天地); https://www.sohu.com/a/332038419_650579; 2019-08-07; pp. 1-24 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant