WO2020215492A1 - 基于YOLOv3多伯努利视频多目标检测跟踪方法 - Google Patents

基于YOLOv3多伯努利视频多目标检测跟踪方法 (YOLOv3-based multi-Bernoulli video multi-target detection and tracking method)

Info

Publication number
WO2020215492A1
WO2020215492A1 · PCT/CN2019/094662 · CN2019094662W
Authority
WO
WIPO (PCT)
Prior art keywords
target
frame
detection
detection frame
tracking
Prior art date
Application number
PCT/CN2019/094662
Other languages
English (en)
French (fr)
Inventor
杨金龙
程小雪
张光南
刘建军
张媛
葛洪伟
Original Assignee
江南大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 江南大学 filed Critical 江南大学
Priority to US16/869,645 priority Critical patent/US11094070B2/en
Publication of WO2020215492A1 publication Critical patent/WO2020215492A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Definitions

  • The invention relates to a video multi-target detection and tracking method based on YOLOv3 and multi-Bernoulli filtering, and belongs to the fields of machine vision and intelligent information processing.
  • the target detection and tracking method based on data association was mainly used in the early stage.
  • the target detector was first used to detect multiple targets in the video sequence, and then the video multi-target tracking was completed with the help of technologies such as data association.
  • Typical data associations include: multi-hypothesis tracking, joint probabilistic data association, graph decomposition, dynamic programming, etc.
  • Random Finite Set (RFS) theory has achieved certain advantages in tracking an unknown and varying number of multiple targets.
  • the random set modeling of target states and observations can avoid complex data association operations.
  • Since Professor Mahler proposed the Probability Hypothesis Density (PHD) and Multi-Bernoulli (MeMBer) filters, random finite set theory has been widely applied in the field of target tracking.
  • Multi-target tracking algorithms based on random finite set theory mainly fall into two categories: algorithms based on the probability hypothesis density (PHD) / cardinalized probability hypothesis density (CPHD), and algorithms based on the multi-Bernoulli (MeMBer) / cardinality-balanced multi-Bernoulli (CBMeMBer) filters.
  • Typical closed solutions include: particle filter PHD/CPHD, Gaussian mixture PHD/CPHD, particle filter CBMeMBer and Gaussian mixture CBMeMBer, etc.
  • Particle Filter Multiple Bernoulli (PFMB) technology recursively approximates the posterior probability density of the multi-target state set with the help of multi-target Bayesian estimation theory, which can improve the tracking accuracy of multi-targets with varying numbers.
  • However, the PFMB method has difficulty detecting newborn targets, and when multiple targets occlude and interfere with one another its tracking accuracy decreases and targets may even be missed.
  • To address this, the present invention provides a YOLOv3-based multi-Bernoulli video multi-target detection and tracking method. During detection and tracking, the YOLOv3 technique is used to detect the kth and (k+1)th frames of the video sequence.
  • The number of detection boxes in the kth frame is recorded as n and the number in the (k+1)th frame as m, each frame having its own detection-box state set; each detection-box state vector consists of the abscissa and ordinate of the upper-left corner of the i-th detection box in that frame, together with the width, height and label of the detection box.
  • For an intermediate frame of the video (k > 0), the detection boxes, the target trajectory information and the target template information are used to determine newborn targets, re-identify missed targets, and optimize the tracking of surviving targets; the surviving-target tracking is optimized, within the multi-Bernoulli filtering framework, by using the information of detection boxes whose confidence in the current frame exceeds a given confidence threshold T_b.
  • Anti-interference convolution features are used to represent the detection boxes of the kth and (k+1)th frames, the feature of the i-th detection box in frame k being its convolution feature vector; the similarity matrix Λ between the convolution features of the detection boxes of the two frames is then computed, whose entry (i, j) is the similarity between the i-th detection box of frame k and the j-th detection box of frame k+1.
  • For an intermediate frame, using the detection boxes, the target trajectory information and the target template information to determine newborn targets, re-identify missed targets, and optimize the tracking of surviving targets includes:
  • matching the detection boxes of adjacent frames, and
  • performing target identification according to the detection boxes of the adjacent frames, the target template set and the surviving-target trajectory set, which includes the following cases:
  • if the label carried over from frame k for a matched detection box is not empty, the target of the detection box is a surviving target and keeps that label; if the label is empty but the detection box matches a target with label l_b in the surviving-target trajectory set, the target can likewise be judged to be a surviving target and is assigned that label;
  • if the detection box matches neither a detection box in the next frame nor any target in the target template set, it is determined to be interference clutter.
  • the method further includes: constructing a target motion model according to the confidence of the detection frame;
  • The target motion model is assumed to be a random walk model.
  • When the YOLOv3 detector detects the target and the confidence of its detection box is greater than the confidence threshold T_B, the detection box is used to adjust the target state.
  • Here the adjusted state is the target state vector with label l_i in frame k-1, the detection box is the one with label l_i in frame k, and the detection-box confidence represents the probability that the box contains the target together with a score for how well the box matches the target; e(k-1) denotes zero-mean Gaussian white noise, and η is the learning rate between the target state and the corresponding detection-box information: the higher the detection-box confidence, the larger η, i.e. the more the detection result is trusted.
  • A multi-Bernoulli parameter set represents the posterior probability distribution of the multiple targets, where M_{k-1} is the number of targets existing at time k-1; each component carries the existence probability of target l_i at time k-1 and the probability distribution of target l_i at time k-1, the latter represented by a set of weighted particles.
  • The predicted multi-target probability distribution is still a multi-Bernoulli parameter set, composed of the predicted parameters of the surviving targets and the parameters of the newborn targets.
  • M_Γ,k is the number of newborn targets in the kth frame.
  • Given the predicted multi-target probability distribution obtained from frame k-1, the multi-target posterior probability distribution is updated through the measurement likelihood of the particles.
  • During target motion, the surviving target trajectory, the target template and the target detection result are fused to update the target template.
  • (1) If the detector can still detect both targets, the occlusion is slight and the target template is updated adaptively.
  • (2) If only one of the two targets can be detected, the other can be determined to be an occluded target.
  • For the occluded target, the template update is stopped, and the displacement difference of the target over the previous two frames is used to estimate its real-time velocity v and movement direction θ so that the target can be predicted; the size of the target box remains unchanged.
  • (3) If neither target can be detected, the occluded target is judged from the similarity between the target box and the template and handled in the same way as in case (2); if a target disappears or is lost during tracking, it is re-identified when it separates or reappears, according to the match between the detection result and the target template.
  • Anti-interference convolution features are used to describe the target details in depth, and the target state is accurately estimated with the help of an adaptive PFMB method.
  • In particular, the detection results and the tracking results are fused interactively to improve the estimation accuracy of the multi-target state.
  • The present invention also addresses the fact that YOLOv3, when detecting multiple targets in videos of complex environments, can itself produce detection errors and inaccurate detection results.
  • By integrating the YOLOv3 detection method into the PFMB filtering framework, the present invention not only solves the problems of estimating unknown newborn targets and identifying target identities in the PFMB method, but also fuses the detection confidence with the filtering likelihood in the model, which effectively improves the tracking accuracy for a varying number of video targets in complex environments.
  • Figure 1 is an algorithm framework diagram of the method of the present invention.
  • Figure 2 is a schematic diagram of the working principle of YOLOv3.
  • Figure 3 is a diagram of the YOLOv3 network structure.
  • Figure 4 is the experimental results of EnterExitCrossingPaths1cor sequence.
  • Figure 5 is a comparison chart of the estimated number of experimental targets in the EnterExitCrossingPaths1cor sequence.
  • Figure 6 is a comparison diagram of the OSPA distance estimation in the EnterExitCrossingPaths1cor sequence experiment.
  • FIG. 7 shows the results of Jogging sequence experiments.
  • Figure 8 is a comparison diagram of the estimated number of experimental targets in the Jogging sequence.
  • Figure 9 is a comparison diagram of the OSPA distance estimation in the Jogging sequence experiment.
  • FIG. 10 shows the results of the Subway sequence experiment.
  • Figure 11 is a comparison chart of the estimated number of targets in the Subway sequence experiment.
  • Figure 12 is a comparison diagram of OSPA distance estimation in Subway sequence experiment.
  • FIG. 13 shows the results of the Human4 sequence experiment.
  • Figure 14 is a comparison diagram of the estimated number of targets in the Human4 sequence experiment.
  • Figure 15 is a comparison diagram of the OSPA distance estimation in the Human4 sequence experiment.
  • FIG. 16 shows the results of the Suv sequence experiment.
  • Figure 17 is a comparison diagram of the estimated number of targets in the Suv sequence experiment.
  • Figure 18 is a comparison diagram of OSPA distance estimation in Suv sequence experiment.
  • Each target component is parameterized through its Bernoulli parameters (r^(i), p^(i)), where r^(i) and p^(i) represent the existence probability and the probability distribution of the i-th target, respectively.
  • MBF uses Bayesian theory to iteratively update the posterior probability distribution of multiple targets to achieve the state estimation of multiple targets.
  • M_{k-1} represents the number of surviving targets in frame k-1.
  • The predicted multi-target probability distribution consists of the predicted multi-Bernoulli parameters of the surviving targets of frame k-1 and the multi-Bernoulli parameters of the newborn targets of frame k.
  • ⟨f_1(·), f_2(·)⟩ denotes the standard inner product ∫ f_1(x) f_2(x) dx.
  • P_{S,k} is the target survival probability.
  • f_{k|k-1}(x|·) is the target state transition function.
  • M_{Γ,k} represents the number of newborn targets in the k-th frame.
  • The updated multi-target posterior probability density can be approximated by the multi-Bernoulli parameters of the missed (legacy) targets together with the measurement-updated multi-Bernoulli parameters.
  • p_{D,k}(x) is the target detection probability.
  • g_k(z|x) is the measurement likelihood function; Z_k and κ_k(z) denote the measurement set and the clutter density function, and the number of targets in frame k is estimated as M_k = M_{k|k-1} + |Z_k|.
  • YOLOv3 uses Darknet-53 as the feature extraction network.
  • The network structure is shown in Figure 3. It consists of successive 3×3 and 1×1 convolutional layers and incorporates the residual blocks of the Residual Neural Network (ResNet); the whole network is divided into multiple sub-segments that are trained stage by stage, and the residual of each sub-segment is trained through shortcut connections so that the overall residual is minimized.
  • YOLOv3 predicts bounding boxes on three different scales, each scale predicts 3 bounding boxes, and local feature interaction is performed within the scale.
  • After the base network, a series of convolutional layers is added to obtain a feature map.
  • Position regression and classification are performed on this feature map; this is the smallest-scale prediction. The convolutional layers of the previous scale are upsampled and concatenated with the last 16×16 feature map, and prediction information is output after several further convolutions; likewise, the mid-scale convolutional layers are upsampled and concatenated with the last 32×32 feature map, and the bounding-box prediction at the largest scale is obtained after a series of convolutions.
  • YOLOv3 adopts the anchor-box idea of Faster R-CNN: nine bounding-box priors are generated by k-means clustering, and three bounding boxes are predicted for each size.
  • YOLOv3 divides the image into G×G grid cells, so the predicted tensor at each scale is G×G×[3*(4+1+80)], where 3 is the number of predicted bounding boxes; 4 means each bounding box contains four predicted offset values (t_x, t_y, t_w, t_h), namely the offsets of the box centre coordinates and of its width and height; 1 is the confidence score of the bounding box; and 80 is the number of conditional class probabilities P_r(Class_i | Object) over the 80 object classes.
  • The bounding boxes are trained with a sum-of-squared-error loss.
  • The confidence of a bounding box is composed of the probability P_r(object) that the box contains a target and the accuracy of the box: if the box contains no object, P_r(object) = 0 and the confidence is 0; otherwise P_r(object) = 1 and the confidence equals the intersection-over-union between the predicted box and the ground-truth box.
  • the conditional class probability P r (Class i /Object) represents the probability that it belongs to a certain class under the premise that the bounding box contains the target.
  • YOLOv3 combines conditional category probabilities and bounding box confidence to obtain bounding box category confidence scores (class specific confidence scores, C) to indicate the probability that the bounding box contains the target classified into each category and how well the bounding box matches the target.
  • The class confidence score of a bounding box can therefore be expressed as the product of the conditional class probability and the bounding-box confidence.
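  • As an illustration only (not part of the original patent text), the following minimal Python sketch computes class-specific confidence scores as the product of the conditional class probabilities with the box confidence P_r(object)·IOU, which is the standard YOLO formulation the passage above describes; all names and values are illustrative.

```python
import numpy as np

def class_confidence_scores(p_object, iou_pred_truth, p_class_given_object):
    """Class-specific confidence C = Pr(Class_i|Object) * Pr(Object) * IOU.

    p_object:              scalar, probability that the box contains an object
    iou_pred_truth:        scalar, IOU between predicted and ground-truth box
    p_class_given_object:  array of 80 conditional class probabilities
    """
    box_confidence = p_object * iou_pred_truth
    return np.asarray(p_class_given_object) * box_confidence

# toy example: a box that almost certainly contains an object, with IOU 0.8
scores = class_confidence_scores(0.95, 0.8, np.full(80, 1.0 / 80))
print(scores.shape, scores.max())
```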
  • This embodiment provides a multi-target detection and tracking method based on YOLOv3 multi-Bernoulli video.
  • the method includes:
  • The YOLOv3 algorithm is used to detect the kth and (k+1)th frames of the video sequence; the number of detection boxes in the kth frame is recorded as n and the number in the (k+1)th frame as m, each frame having its own detection-box state set. Each detection-box state vector consists of the abscissa and ordinate of the upper-left corner of the detection box, together with its width, height and label.
  • Anti-interference convolution features are used to represent the detection boxes of the kth and (k+1)th frames, the feature of the i-th detection box in frame k being its convolution feature vector; the similarity matrix Λ between the convolution features of the detection boxes of the two frames is then computed, whose entry (i, j) is the similarity between the i-th detection box of frame k and the j-th detection box of frame k+1.
  • Since the same target is assumed not to undergo a particularly large displacement between two adjacent frames, the intersection-over-union (IOU) of the target boxes is added as a restriction on top of the similarity matching: if the IOU of two detection boxes is greater than the IOU threshold T_u (and their similarity is the row maximum and exceeds the similarity threshold T_l), the two detection boxes are judged to match.
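  • The adjacent-frame matching rule just described can be sketched as follows. The sketch assumes a normalized inner-product (cosine) similarity between the one-dimensional convolution features, since the patent's exact similarity formula is given only as an image, and uses the usual IOU definition; t_l and t_u stand for the thresholds T_l and T_u, their default values and all function names being illustrative assumptions.

```python
import numpy as np

def iou(a, b):
    """IOU of two boxes given as (x, y, w, h), with (x, y) the top-left corner."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match_detections(feats_k, feats_k1, boxes_k, boxes_k1, t_l=0.6, t_u=0.3):
    """Match frame-k detections to frame-(k+1) detections.

    feats_*: (n, d) / (m, d) convolution feature vectors, one row per detection
    boxes_*: (n, 4) / (m, 4) boxes (x, y, w, h)
    Returns a list of (i, j) index pairs judged to match.
    """
    f1 = feats_k / np.linalg.norm(feats_k, axis=1, keepdims=True)
    f2 = feats_k1 / np.linalg.norm(feats_k1, axis=1, keepdims=True)
    sim = f1 @ f2.T                      # similarity matrix Lambda
    matches = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))       # best candidate in row i
        if sim[i, j] > t_l and iou(boxes_k[i], boxes_k1[j]) > t_u:
            matches.append((i, j))
    return matches
```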
  • If the label carried over from frame k for a matched detection box is not empty, the target of the detection box is a surviving target and keeps that label; if the label is empty but the detection box matches a target with label l_b in the surviving-target trajectory set, it can likewise be judged to be a surviving target and is assigned that label.
  • If a detection box has no matching detection box in frame k+1 and no matching target in the target template set, it is judged to be interference clutter.
  • Step 3 Target prediction
  • The target motion model is assumed to be a random walk model.
  • When the YOLOv3 detector detects the target and the confidence of its detection box is greater than the confidence threshold T_B, the detection box is used to adjust the target state.
  • Here the adjusted state is the target state vector with label l_i in frame k-1, the detection box is the one with label l_i in frame k, and the detection-box confidence represents the probability that the box contains the target together with a score for how well the box matches the target; e(k-1) denotes zero-mean Gaussian white noise, and η is the learning rate between the target state and the corresponding detection-box information: the higher the confidence, the larger η and the more the detection result is trusted.
  • Using a good detection box to adjust the target state in this way eliminates the error accumulated during long-term tracking and better optimizes the prediction result.
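  • One plausible reading of this detection-confidence-weighted adjustment followed by random-walk prediction is sketched below; the exact formula appears only as an image in the original, so the blending form and the choice of setting η equal to the detection confidence are assumptions made for illustration.

```python
import numpy as np

def adjust_and_predict(x_prev, det_box, det_conf, t_b=0.5, noise_std=2.0):
    """Random-walk prediction, optionally pulled toward a confident detection.

    x_prev:   previous state (x, y, w, h) for one labelled target
    det_box:  matched detection box (x, y, w, h) in the current frame, or None
    det_conf: detection-box confidence; eta grows with it (here eta = det_conf)
    """
    x_prev = np.asarray(x_prev, dtype=float)
    if det_box is not None and det_conf > t_b:
        eta = det_conf                       # assumed: learning rate tied to confidence
        x_adj = (1.0 - eta) * x_prev + eta * np.asarray(det_box, dtype=float)
    else:
        x_adj = x_prev
    # zero-mean Gaussian random-walk step
    return x_adj + np.random.normal(0.0, noise_std, size=x_adj.shape)
```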
  • At time k-1, a multi-Bernoulli parameter set is used to represent the posterior probability distribution of the multiple targets, where M_{k-1} is the number of targets existing at time k-1; each component carries the existence probability of target l_i at time k-1 and the probability distribution of target l_i at time k-1, the latter represented by a set of weighted particles.
  • The predicted multi-target probability distribution is still a multi-Bernoulli parameter set, composed of the predicted parameters of the surviving targets and the parameters of the newborn targets.
  • M_Γ,k is the number of newborn targets in the kth frame.
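  • A minimal sketch of the surviving-target prediction step under these assumptions (constant survival probability P_S, random-walk transition used as the proposal so that the particle weights stay unchanged) is given below; the data layout and names are illustrative, not taken from the patent.

```python
import numpy as np

def predict_surviving(components, p_s=0.99, noise_std=2.0):
    """Particle multi-Bernoulli prediction for surviving targets.

    components: list of dicts {'r': float, 'particles': (L, 4) array,
                'weights': (L,) array, 'label': int}
    Returns the predicted components; newborn components would be appended separately.
    """
    predicted = []
    for c in components:
        parts = c['particles'] + np.random.normal(0.0, noise_std, c['particles'].shape)
        predicted.append({
            'r': p_s * c['r'],               # existence probability scaled by survival prob.
            'particles': parts,              # particles pushed through the random-walk model
            'weights': c['weights'].copy(),  # weights unchanged under this proposal choice
            'label': c['label'],
        })
    return predicted
```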
  • Around the target box, m background sample boxes are randomly sampled using rectangles of the same size as the target box.
  • The distance between the centre of a background sample box and the centre of the target box must be greater than 1/4 of the target-box width in the horizontal direction or greater than 1/4 of the target-box height in the vertical direction. The target box and the background sample boxes are then normalized to size n×n and converted to grayscale, giving the sample set {I, B_1, B_2, ..., B_m}, where I is the target-box image and B_i is the i-th background sample image.
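  • The background sampling rule just described can be sketched as follows; the (x, y, w, h) box layout and the function name are assumptions made for illustration.

```python
import numpy as np

def sample_background_boxes(target_box, img_w, img_h, m=8, rng=None):
    """Sample m background boxes of the same size as the target box.

    target_box: (x, y, w, h), top-left corner plus width and height.
    A candidate is kept only if its centre is more than w/4 away horizontally
    or more than h/4 away vertically from the target centre.
    """
    rng = rng or np.random.default_rng()
    x, y, w, h = target_box
    cx, cy = x + w / 2.0, y + h / 2.0
    boxes = []
    while len(boxes) < m:
        bx = rng.uniform(0, img_w - w)
        by = rng.uniform(0, img_h - h)
        if abs(bx + w / 2.0 - cx) > w / 4.0 or abs(by + h / 2.0 - cy) > h / 4.0:
            boxes.append((bx, by, w, h))
    return boxes
```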
  • Feature maps with weakened background information are extracted; the feature maps are expanded row by row and concatenated in order to obtain the final one-dimensional convolution feature f.
  • vec(C) is the column vector obtained by concatenating all the elements of C.
  • The unique solution of the sparse representation is obtained by the soft-shrinkage method.
  • Here λ is the median of the tensor C.
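  • The soft-shrinkage step is the standard soft-thresholding operator; a minimal sketch, with λ taken as the median of the tensor C as stated above, is:

```python
import numpy as np

def soft_shrink(c):
    """Sparse representation via soft-shrinkage: sign(c) * max(|c| - lambda, 0),
    with lambda taken as the median of the feature tensor C, as stated above."""
    lam = np.median(c)
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

# example on a random (n-w+Delta) x (n-w+Delta) x d feature tensor
C = np.random.randn(10, 10, 6)
f_sparse = soft_shrink(C).ravel()   # flattened one-dimensional sparse feature
```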
  • f_1 and f_2 are the convolution features of the two target boxes being compared.
  • When the measurement likelihood between a particle and the target is computed, the target feature map set and the background feature map set of that target are used to compute the convolution features of the particle.
  • I_l is the normalized image of the candidate box represented by the particle; S_i is expanded row by row into d one-dimensional vectors, which are concatenated in order into the feature f_l of the candidate box corresponding to the particle.
  • Step 5 Target status update and extraction
  • To prevent particle degeneracy, the present invention resamples the particle set with a random resampling method and eliminates Bernoulli components whose existence probability is too low; the target states are then extracted from the Bernoulli components whose existence probability exceeds 0.5.
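  • A minimal sketch of this random resampling and pruning step (multinomial resampling per Bernoulli component, dropping components whose existence probability is negligible; the pruning threshold r_min is an assumed illustrative value) is:

```python
import numpy as np

def resample_and_prune(components, r_min=1e-3, rng=None):
    """Multinomial resampling of each component's particles; components whose
    existence probability has fallen below r_min are discarded."""
    rng = rng or np.random.default_rng()
    kept = []
    for c in components:
        if c['r'] < r_min:
            continue                                    # prune negligible components
        w = c['weights'] / c['weights'].sum()
        idx = rng.choice(len(w), size=len(w), p=w)      # draw particles with prob. = weight
        kept.append({
            'r': c['r'],
            'particles': c['particles'][idx],
            'weights': np.full(len(w), 1.0 / len(w)),   # equal weights after resampling
            'label': c['label'],
        })
    return kept
```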
  • Step 6 Target occlusion processing mechanism
  • When the IOU of two adjacent target boxes is greater than the threshold T_I, the two targets are judged to be in close proximity and partially occluded, and the detector may be in one of three situations: 1) when neither target is completely occluded, the detector may still detect both targets, and the method of the present invention tracks the targets and updates their templates with the update mechanism; 2) when only one target can be detected by the detector, the other can be determined to be an occluded target.
  • For the occluded target, the template update is stopped, the displacement difference of its previous two frames is used to estimate its real-time speed v and moving direction θ, the target is predicted and estimated from these, and the size of the target box remains unchanged; 3) when neither target can be detected by the detector, the occluded target is judged from the similarity between the target and the template and handled in the same way as in the second case. If a target disappears or is lost during tracking, it can be re-identified, when it separates or reappears, according to the match between the detection results and the target template.
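  • The per-target occlusion logic for cases (2) and (3) above can be sketched as follows; the track/detection data layout is an assumption made for illustration.

```python
import numpy as np

def handle_occluded(track, detected):
    """One step of the occlusion mechanism for a single occluded track.

    track:    dict with 'box' (x, y, w, h) and 'prev_box' from the two latest frames
    detected: True if the detector still sees this target in the current frame
    Returns (predicted_box, update_template_flag).
    """
    if detected:
        return track['box'], True            # lightly occluded: keep tracking, update template
    # occluded: freeze the template, extrapolate with the displacement of the last two frames
    dx = track['box'][0] - track['prev_box'][0]
    dy = track['box'][1] - track['prev_box'][1]
    v = np.hypot(dx, dy)                     # real-time speed (pixels/frame)
    theta = np.arctan2(dy, dx)               # movement direction
    x, y, w, h = track['box']
    pred = (x + v * np.cos(theta), y + v * np.sin(theta), w, h)   # box size unchanged
    return pred, False
```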
  • Step 7 Target template update
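  • Under the description above (template learning rate ρ, detection learning rate tied to the detection-box confidence), the fused template update of this step can be read roughly as the sketch below; the exact formula is given only as an image in the original, so this blending form is an assumption made for illustration.

```python
import numpy as np

def update_template(tmpl_prev, f_track, f_det, det_conf, rho=0.05, t_b=0.5):
    """Fused template update (one plausible reading of the rule described above).

    tmpl_prev: previous convolution-feature template of the target
    f_track:   sparse conv. feature of the frame k-1 tracking result
    f_det:     sparse conv. feature of the matched frame-k detection box, or None
    det_conf:  confidence of that detection box; used here as the detection learning rate
    """
    tmpl = (1.0 - rho) * tmpl_prev + rho * f_track         # blend in the tracking result
    if f_det is not None and det_conf > t_b:
        tmpl = (1.0 - det_conf) * tmpl + det_conf * f_det  # confident detections dominate
    return tmpl
```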
  • The video sequence data used in the present invention are the Human4 sequence from Visual Tracker Benchmark TB50, the Jogging, Subway and Suv sequences from Visual Tracker Benchmark TB100, and the EnterExitCrossingPaths1cor sequence from the CAVIAR data set. These five typical video sequences come from different scenes and include interference such as dynamic background, targets in close proximity, target deformation, image blur, target size change and target occlusion.
  • The evaluation indicators used in the experiments are Multiple Object Tracking Accuracy (MOTA), Multiple Object Tracking Precision (MOTP), the number of mostly tracked trajectories (Mostly Tracked, MT), and the number of identity switches (Identity Switches, IDs), defined as follows:
  • m_t is the number of targets missed in frame t;
  • fp_t is the number of falsely detected targets in frame t;
  • mme_t is the number of label switches occurring in the tracking trajectories in frame t;
  • g_t is the actual number of targets in frame t.
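  • With these per-frame counts, MOTA is computed as shown in the following sketch (1 minus the ratio of the summed errors to the summed ground-truth target counts):

```python
def mota(misses, false_positives, id_switches, gt_counts):
    """MOTA = 1 - sum_t (m_t + fp_t + mme_t) / sum_t g_t."""
    errors = sum(m + fp + mme for m, fp, mme in zip(misses, false_positives, id_switches))
    return 1.0 - errors / sum(gt_counts)

# toy example over three frames
print(mota([0, 1, 0], [0, 0, 1], [0, 0, 0], [3, 3, 3]))  # -> 0.777...
```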
  • MT (Mostly Tracked) is the number of target trajectories whose tracked portion covers more than 80% of the true trajectory length; it describes the completeness of the trajectories.
  • IDs (identity switches) is the number of times target labels change during tracking.
  • The method of this application is implemented in Matlab2016a and runs on a workstation with an Intel Core i7-8700 (3.2 GHz, 12 cores), 16 GB of memory and an NVIDIA GeForce GTX 1080 Ti graphics card. Its performance is compared with the traditional particle multi-Bernoulli filter (PFMB) method and with the IOU-Tracker method proposed by Erik Bochinski et al. in the 2017 paper "High-Speed Tracking-by-Detection Without Using Image Information". Because the traditional PFMB method lacks a detection mechanism for newborn targets, YOLOv3 is also used in this experiment for detection, and the detection results judged to be newborn targets are taken as the newborn targets of PFMB before tracking.
  • The video sequence used in this experiment is the EnterExitCrossingPaths1cor sequence from the CAVIAR data set. This sequence contains 383 frames and involves targets in close proximity and relatively long occlusions, accompanied by considerable deformation as targets gradually enter the scene.
  • Because long occlusions easily cause targets to be lost, the present invention fuses the detector results, the existing target trajectories and the target templates to re-identify such targets and add them back to the tracking trajectory.
  • Figure 4 shows the experimental results of the EnterExitCrossingPaths1cor sequence.
  • Figure 4(a) is the detection result of the YOLOv3 algorithm
  • Figure 4(b) is the tracking result of the traditional PFMB method
  • Figure 4(c) is the tracking result of the IOU-Tracker method
  • Figure 4(d) is the tracking result of the method of the present invention. It can be seen that when a target is occluded the detector may fail to detect it, which greatly reduces the detection effect, as in frame 93 of Figure 4(a).
  • With the traditional PFMB method, the target is lost when occlusion occurs; because IOU-Tracker discards image information entirely and relies only on the detection results, it likewise cannot continue tracking a missed target, and when the target is detected again it is treated as a new target that is difficult to associate with its history. The method of the present invention, in contrast, can effectively re-identify a target that reappears after a long occlusion and add it back to the tracking trajectory.
  • Figure 5 compares the estimated number of targets in the EnterExitCrossingPaths1cor experiment.
  • Figure 6 compares the OSPA distance estimates. The traditional PFMB method lacks an occlusion-handling mechanism, so after a target is occluded it is prone to false and missed tracking, which makes the tracking box deviate considerably and the OSPA value rise; the IOU-Tracker method is limited by detector performance, so while a target is occluded the estimated number of targets drops and the OSPA value rises sharply; the method of the present invention may also miss a target that is occluded for a long time, but once the target leaves the occlusion it can effectively re-identify it, and its overall tracking performance is better than the traditional PFMB and IOU-Tracker methods, as can be seen in particular from the four evaluation indicators MOTA, MOTP, MT and IDs shown in Table 1.
  • Table 1 shows the results of 50 Monte Carlo runs. The traditional PFMB method has the lowest MOTA, MOTP and MT values and an IDs value of 1, because it lacks a target re-identification process: after a target is lost and reappears, it cannot be associated with its previous trajectory, so its label switches. In the IOU-Tracker method, labels switch frequently when targets are close together or occluded, giving the highest IDs. The method of the present invention can effectively re-identify targets, reduce label switching and reduce trajectory fragments.
  • The video sequences used are the Jogging and Subway sequences from the Visual Tracker Benchmark TB100 data set.
  • The Jogging sequence is an intersection scene with a moving camera; it contains three moving targets, and targets are occluded.
  • The Subway sequence contains eight targets, with problems such as multiple targets in close proximity and frequent occlusion.
  • FIG. 7(a) is the detection result of the YOLOv3 algorithm
  • Figure 7(b) is the tracking result of the traditional PFMB method
  • Figure 7(c) is the tracking result of the IOU-Tracker method
  • Figure 7(d) is the tracking result of the method of the present invention.
  • The YOLOv3 detector has difficulty correctly detecting the occluded target.
  • The IOU-Tracker method therefore loses the missed target.
  • When the target leaves the occluded state, the IOU-Tracker method defines the re-detected target as a new target; the traditional PFMB method, even though it does not lose the target, tracks it with a tracking box whose deviation increases.
  • the method of the present invention can well integrate the detection result and tracking result of the target, realize the adjustment of the tracking frame, and obtain more accurate tracking results.
  • Figure 8 shows a comparison chart of the number of targets in the Jogging sequence experiment.
  • In it, the target-number estimates of the traditional PFMB method coincide with those of the method of the present invention.
  • Figure 9 compares the OSPA distance estimates for the Jogging sequence. It can be seen that the missed detections of the YOLOv3 detector make the target-number estimate of the IOU-Tracker method inaccurate, with targets under-estimated.
  • Its OSPA value therefore rises sharply and its accuracy is low; the traditional PFMB method suffers a growing tracking-box deviation after the target is occluded, which also drives the OSPA value up, whereas the method of the present invention has better resistance to occlusion and its tracking performance is clearly better than the traditional PFMB and IOU-Tracker methods.
  • Figure 10(a) is the detection result of the YOLO v3 algorithm
  • Figure 10(b) is the tracking result of the traditional PFMB method
  • Figure 10(c) is the tracking result of the IOU-Tracker method
  • Figure 10(d) shows the tracking result of the method of the present invention. It can be seen that when multiple targets are close to or occlude each other, the detector has difficulty detecting the occluded target or produces detections with large deviations, which causes the IOU-Tracker method to lose targets frequently.
  • When the traditional PFMB method deals with occlusion and close-proximity problems, large deviations and even lost tracks can also occur, as shown in frames 17 and 19 of Figure 10(b), whereas the method of the present invention tracks the occluded targets better.
  • Figure 11 shows the comparison diagram of the target number estimation of the Subway sequence
  • Figure 12 shows the comparison diagram of the OSPA distance estimation of the Subway sequence.
  • the quantitative analysis results of the Subway sequence are shown in Table 2.2.
  • In this sequence as well, multiple targets remain occluding one another and in close proximity for long periods.
  • In terms of MOTA, MOTP, MT and IDs, the method of the present invention is significantly better than the traditional PFMB and IOU-Tracker methods.
  • This experiment uses the Human4 sequence from the Visual Tracker Benchmark TB50 data set.
  • The sequence contains 667 frames and shows a traffic-light intersection filmed with a moving camera, including 17 targets of three types.
  • Camera motion and rapid target motion blur the targets in the video, and many targets deform frequently.
  • Figure 13 shows the results of the Human4 sequence experiment.
  • Figure 13 (a) is the detection result of the YOLO v3 algorithm
  • (b) is the tracking result of the traditional PFMB method
  • (c) is the tracking result of the IOU-Tracker method
  • (d) is the method of the present invention
  • From the tracking results it can be seen that image blur may degrade the detector's performance so that some targets are missed, as in frames 50 and 74 of Figure 13(a).
  • the target deformation has little impact on the performance of the detector.
  • Although the traditional PFMB method can keep up with the targets, the tracking boxes of some targets deviate.
  • The method of the present invention handles these two situations better and has higher tracking accuracy.
  • Figure 14 compares the estimated number of targets in the Human4 sequence.
  • Figure 15 compares the OSPA distance estimates for the Human4 sequence. It can be seen that the method of the present invention handles image blur and target deformation better, and its tracking performance is superior to the traditional PFMB and IOU-Tracker methods.
  • Frequent target deformation and image blur cause the traditional PFMB method to miss targets or to track only part of a target region, which results in low MOTA, MOTP and MT values.
  • The IOU-Tracker method makes good use of the detector's results, so its MOTP is comparable to that of the method of the present invention; however, target size changes and image blur degrade the detector and some targets are missed, which makes its MOTA value too small and its IDs value the largest.
  • That is, a large number of target labels switch and the track estimates are inaccurate.
  • The method of the present invention associates the detection boxes better, effectively reduces target label switching, and is clearly better than IOU-Tracker in IDs.
  • This experiment uses the Suv sequence from the Visual Tracker Benchmark TB100 data set.
  • The sequence contains 400 frames; it is a road scene with a dynamic background caused by camera motion, contains six targets, and exhibits large displacements caused by fast motion.
  • Figure 16 shows the experimental results of the Suv sequence.
  • Figure 16(a) is the detection result of the YOLO v3 algorithm
  • Figure 16(b) is the tracking result of the traditional PFMB method
  • Figure 16(c) is the tracking result of the IOU-Tracker method
  • Figure 16(d) is the tracking result of the method of the present invention. It can be seen that the large displacements of the targets have no effect on the detector and the IOU-Tracker method performs well, while the traditional PFMB method loses targets. Because the method of the present invention uses the detector's results to adjust the tracking process, its tracking results for targets with large displacements are also significantly better than those of the other two methods.
  • Figure 17 is a comparison diagram of the target number estimation of Suv sequence
  • Figure 18 compares the OSPA distance estimates for the Suv sequence. It can be seen that the traditional PFMB method lacks a mechanism for handling large target displacements, so when the displacement is large the target may be lost or the tracking box may deviate considerably, which yields a large OSPA value.
  • The IOU-Tracker method and the method of the present invention can use the detector's results to improve the handling of large target displacements and achieve better tracking accuracy.
  • the method of the present invention can effectively process the large displacement of the target by fusing the detection results, and is significantly better than the traditional PFMB method in various indicators, and is also significantly better than the IOU-Tracker method in MOTA, MT, and IDs.
  • The YOLOv3 multi-Bernoulli video multi-target detection and tracking method provided by this application interactively fuses the detection results with the tracking results. During target re-identification, a target that has been occluded for a long time can be effectively re-identified when it reappears; when multiple targets are in close proximity and frequently occluded, the proposed method fuses the detection and tracking results of the targets to adjust the tracking boxes and obtain more accurate tracking results; when there is image blur and target deformation, the proposed method associates the detection boxes well and effectively reduces target label switching.
  • When targets undergo large displacements, the proposed method can use the detector's results to adjust the tracking process, thereby preventing targets from being lost.
  • Part of the steps in the embodiments of the present invention can be implemented by software, and the corresponding software program can be stored in a readable storage medium, such as an optical disc or a hard disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a YOLOv3-based multi-Bernoulli video multi-target detection and tracking method, belonging to the fields of machine vision and intelligent information processing. The invention introduces the YOLOv3 detection technique into the multi-Bernoulli filtering framework, describes targets with anti-interference convolution features, and interactively fuses the detection results with the tracking results, so that the states of a video's multiple targets, whose number is unknown and time-varying, are accurately estimated. During tracking, the matched detection boxes are combined with the target trajectories and target templates to determine newborn targets and re-identify occluded targets in real time; the identity labels of detected and estimated targets are considered jointly to achieve target identification and track-level tracking, which effectively improves the tracking accuracy for occluded targets and reduces trajectory fragments. Experiments show that the invention has good tracking performance and robustness, and can broadly meet the practical design requirements of systems such as intelligent video surveillance, human-computer interaction and intelligent traffic control.

Description

基于YOLOv3多伯努利视频多目标检测跟踪方法 技术领域
本发明涉及基于YOLOv3多伯努利视频多目标检测跟踪方法,属于机器视觉、智能信息处理领域。
背景技术
复杂环境下视频多目标跟踪应用领域中,除了存在光照变化、目标形变、目标被遮挡等问题,还存在目标数目未知、新生目标不确定、目标交叉或紧邻运动、目标消失及杂波干扰等复杂情况,一直是多目标跟踪领域中研究的难点和具有挑战性的问题。
针对视频多目标跟踪问题,早期主要采用基于数据关联的目标检测跟踪方法,先采用目标检测器对视频序列进行多目标检测,然后借助数据关联等技术完成对视频多目标跟踪。典型的数据关联如:多假设跟踪、联合概率数据关联、图分解、动态规划等,虽然这些方法在视频多目标跟踪中取得了一定的效果,但由于复杂的数据关联运算,一定程度上降低了算法的运算效率,此外,对数目未知且变化的多目标跟踪,存在目标数目及状态估计不准确的问题。
近年来,随机有限集(Random Finite Set,RFS)理论在对数目未知且变化的多目标跟踪中取得了一定优势,分别对目标状态和观测进行随机集建模,可避免复杂的数据关联运算。自Mahler教授提出概率假设密度(Probability hypothesis density,PHD)和多伯努利(MeMBer)滤波器之后,随机有限集理论在目标跟踪领域得到了广泛地应用。概括来说,基于随机有限集理论的多目标跟踪算法主要包含两大类:基于概率假设密度(PHD)/势概率假设密度(CPHD)的多目标跟踪算法和基于多伯努利(MeMBer)/势均衡多伯努利(CBMeMBer)的多目标跟踪算法。典型的闭合解有:粒子滤波PHD/CPHD、高斯混合PHD/CPHD、粒子滤波CBMeMBer和高斯混合CBMeMBer等。尤其是粒子多伯努利滤波(Particle Filter Multiple Bernoulli,PFMB)技术,借助多目标贝叶斯估计理论递推近似多目标状态集的后验概率密度,可提高对数目变化多目标的跟踪精度。但PFMB方法难以检测新生目标,且当多目标之间出现相互遮挡和干扰时,跟踪精度下降,甚至出现目标被漏估计的问题。
发明内容
为了解决目前存在的现有目标跟踪方法无法检测新生目标以及当多目标之间出现相互遮挡和干扰时,跟踪精度下降,甚至出现目标被漏估计的问题,本发明提供了一种基于YOLOv3多伯努利视频多目标检测跟踪方法,所述方法检测跟踪过程中,采用YOLOv3技术检测第k和k+1帧视频序列,记第k帧检测框个数为n,检测框状态集为
Figure PCTCN2019094662-appb-000001
第k+1帧检测框个数为m,其检测框状态集为
Figure PCTCN2019094662-appb-000002
Figure PCTCN2019094662-appb-000003
其中,
Figure PCTCN2019094662-appb-000004
表示第i个检测框状态向量,参数
Figure PCTCN2019094662-appb-000005
分别表示第k帧第i个检测框左上角的横坐标、纵坐标,以及检测框的宽、高和标签;
将第k和k+1帧视频序列的检测框进行匹配;对于视频的初始帧,k=0,将初始帧中已匹配的检测框作为初始的新生目标加入目标模板集和目标轨迹集中;对于视频的中间帧,k>0,利用检测框、目标轨迹信息和目标模板信息实现新生目标判定、漏跟目标重识别和存活目标优化跟踪;其中,存活目标优化跟踪是在多伯努利滤波框架下,利用当前帧置信度大于给定置信度阈值T b的检测框信息,优化对应目标的跟踪过程。
可选的,所述对于视频的初始帧,k=0,将已匹配的检测框作为新生目标加入目标模板集和目标轨迹集中,包括:
采用抗干扰的卷积特征表示第k帧和第k+1帧的检测框,分别记为
Figure PCTCN2019094662-appb-000006
Figure PCTCN2019094662-appb-000007
Figure PCTCN2019094662-appb-000008
Figure PCTCN2019094662-appb-000009
表示第k帧中第i个检测框的卷积特征向量;计算检测框卷积特征对应的相似度矩阵Λ,即:
Figure PCTCN2019094662-appb-000010
其中,
Figure PCTCN2019094662-appb-000011
表示第k帧中第i个检测框与第k+1帧中第j个检测框的相似度;从相似度矩阵Λ的每一行选择值最大、且大于相似度阈值T l的两个检测框作为匹配结果,将最终匹配的检测框对作为初始的新生目标加入目标模板集和目标轨迹集中。
可选的,将初始两帧的检测框进行匹配,并判定新生目标,将最终匹配的检测框对作为初始的新生目标加入目标模板集中,包括:
假定同一个目标在相邻两帧不会出现特别大的位移变化,在相似度匹配的基础上,加入目标框的交 并比IOU作为限制:若两个检测框的交并比IOU大于交并比阈值T u,则可判定两个检测框匹配;
从相似度矩阵Λ的每一行选择值最大、且大于相似度阈值T l,同时交并比IOU大于交并比阈值T u的两个检测框作为匹配结果,将最终匹配的检测框对作为初始的新生目标,分配标签
Figure PCTCN2019094662-appb-000012
并加入目标轨迹集
Figure PCTCN2019094662-appb-000013
中,其中,
Figure PCTCN2019094662-appb-000014
为目标状态向量,各个分量分别表示第i个目标框左上角横坐标、纵坐标以及目标框的宽、高和标签,M k为第k帧目标的个数,给已匹配的检测框对添加与初始的新生目标对应的目标标签,即
Figure PCTCN2019094662-appb-000015
同时建立新生目标模板。
可选的,所述对于视频的中间帧,k>0,利用检测框、目标轨迹信息和目标模板信息来实现新生目标判定、漏跟目标重识别和存活目标优化跟踪,包括:
对相邻的检测框进行匹配,确定新生目标与重识别目标的判定条件,再根据相邻帧的检测框、目标模板集和存活目标轨迹集来判定目标是否为新生目标、重识别目标或存活目标。
可选的,所述对相邻的检测框进行匹配,包括:
计算第k和k+1帧视频序列中检测框之间的相似度矩阵Λ,从相似度矩阵Λ的每一行选择值最大且大于相似度阈值T l的两个检测框,且这两个检测框的交并比IOU大于交并比阈值T u,则可判定两个检测框匹配,假设第k帧中的第i个检测框与第k+1帧中的第j个检测框匹配,则为第k+1帧中匹配目标的标签赋值,即:
Figure PCTCN2019094662-appb-000016
Figure PCTCN2019094662-appb-000017
表示第k帧中第i个检测框的标签,若
Figure PCTCN2019094662-appb-000018
为空,则表示该检测框包含的目标在第k-1帧中未被检测到。
可选的,所述对于当前帧检测框目标,根据相邻帧的检测框、目标模板集和存活目标轨迹集来进行目标识别,包括:
(1)新生目标识别
若为
Figure PCTCN2019094662-appb-000019
空,与第k+1帧第j个检测框匹配,且目标模板集中没有与之匹配的目标,则判定该目标为新生目标,分配标签
Figure PCTCN2019094662-appb-000020
建立新生目标状态集:
Figure PCTCN2019094662-appb-000021
其中
Figure PCTCN2019094662-appb-000022
表示标签为l i的新生目标状态向量,M Γ表示新生目标个数,n表示目标模板的个数;给已匹配的检测框对添加与新生目标对应相同的目标标签,即
Figure PCTCN2019094662-appb-000023
同时将新生目标加入目标模板集,并根据新生参数初始化新生目标的采样粒子;
(2)漏跟目标重识别
Figure PCTCN2019094662-appb-000024
为空,与第k+1帧第j个检测框匹配,在存活目标轨迹集中没有与之匹配的目标,但与目标模板中标签为l a的目标匹配,表示该检测框包含的目标在之前帧漏检,且出现漏跟情况,在第k帧,对该目标进行了重识别,将其重新加入到存活目标轨迹集中:
Figure PCTCN2019094662-appb-000025
其中,
Figure PCTCN2019094662-appb-000026
表示标签为l i的存活目标状态向量;给已匹配的检测框对添加与重识别目标对应相同的目标标签,即
Figure PCTCN2019094662-appb-000027
(3)存活目标识别
Figure PCTCN2019094662-appb-000028
不为空,则该检测框目标为存活目标,目标标签为
Figure PCTCN2019094662-appb-000029
Figure PCTCN2019094662-appb-000030
为空,但与存活目标轨迹集中标签为l b的目标匹配,也可判定该检测框目标为存活目标,为其进行标签赋值,即
Figure PCTCN2019094662-appb-000031
(4)干扰杂波识别
Figure PCTCN2019094662-appb-000032
为空,在第k+1帧中没有与之匹配的检测框,同时目标模板集中也不存在相匹配的目标,则判定该检测框为干扰杂波。
可选的,所述方法还包括:根据检测框置信度构造目标运动模型;
假设目标运动模型为随机游走模型,若当YOLOv3检测器检测到该目标,且检测框置信度
Figure PCTCN2019094662-appb-000033
大于置信度阈值T B时,采用检测框调整目标状态,即
Figure PCTCN2019094662-appb-000034
其中,
Figure PCTCN2019094662-appb-000035
为第k-1帧中标签为l i目标状态向量,
Figure PCTCN2019094662-appb-000036
为第k帧中标签为l i目标的检测框,检测框置信度
Figure PCTCN2019094662-appb-000037
表示检测框包含目标的概率和检测框与目标匹配程度的得分,e(k-1)表示零均值高斯白噪声,η为目标状态与对应检测框信息之间的学习率,检测框的置信度越高,η越大,表示越信任检测结果。
可选的,在k-1时刻,采用多伯努利参数集
Figure PCTCN2019094662-appb-000038
表示多目标的后验概率分布,其中,M k-1为k-1时刻存在目标的数目,
Figure PCTCN2019094662-appb-000039
表示在k-1时刻目标l i的存在概率,
Figure PCTCN2019094662-appb-000040
表示k-1时刻目标l i的概率分布,由一组加权粒子表示:
Figure PCTCN2019094662-appb-000041
其中,
Figure PCTCN2019094662-appb-000042
表示第k-1帧目标l i的第j个采样粒子的权值,
Figure PCTCN2019094662-appb-000043
表示第k-1帧目标l i的第j个采样粒子的状态向量,
Figure PCTCN2019094662-appb-000044
为目标l i的采样粒子个数,δ(·)为狄拉克函数;
多目标概率分布预仍为多伯努利参数集,表示为:
Figure PCTCN2019094662-appb-000045
存活目标的多伯努利参数预测
Figure PCTCN2019094662-appb-000046
可根据下式得到:
Figure PCTCN2019094662-appb-000047
Figure PCTCN2019094662-appb-000048
其中,
Figure PCTCN2019094662-appb-000049
Figure PCTCN2019094662-appb-000050
Figure PCTCN2019094662-appb-000051
Figure PCTCN2019094662-appb-000052
为状态转移函数,采用所述随机游走模型,通过下式计算得到:
Figure PCTCN2019094662-appb-000053
其中,
Figure PCTCN2019094662-appb-000054
Figure PCTCN2019094662-appb-000055
新生目标多伯努利参数集
Figure PCTCN2019094662-appb-000056
可由下式计算得到:
Figure PCTCN2019094662-appb-000057
其中,M Γ,k为第k帧新生目标数目。
可选的,假设第k-1帧多目标预测概率分布为:
Figure PCTCN2019094662-appb-000058
通过粒子的量测似然更新多目标后验概率分布:
Figure PCTCN2019094662-appb-000059
根据下式获得更新后的多伯努利参数集:
Figure PCTCN2019094662-appb-000060
Figure PCTCN2019094662-appb-000061
Figure PCTCN2019094662-appb-000062
其中,
Figure PCTCN2019094662-appb-000063
为目标l i对应模板的卷积特征
Figure PCTCN2019094662-appb-000064
与粒子
Figure PCTCN2019094662-appb-000065
的之间的量测似然,与相似度矩阵元素计算过程一致。
可选的,目标运动过程中,采用存活目标轨迹、目标模板和目标检测结果进行融合更新目标模板,即:
Figure PCTCN2019094662-appb-000066
其中,
Figure PCTCN2019094662-appb-000067
分别表示第k和k-1帧时目标l i对应的卷积特征模板,
Figure PCTCN2019094662-appb-000068
为第k-1帧目标l i跟踪结果卷积特征的稀疏表示,
Figure PCTCN2019094662-appb-000069
为第k帧目标l i检测框卷积特征的稀疏表示,ρ为模板的学习速率,
Figure PCTCN2019094662-appb-000070
为第k帧目标l i对应检测框的置信度,
Figure PCTCN2019094662-appb-000071
为检测框的学习率。
可选的,检测跟踪过程中,当相邻两个目标框的交并比IOU大于阈值T u时,判定这两个目标出现遮挡情况;
此时,(1)当检测器能检测出这两个目标,表示目标轻度遮挡,对目标模板进行自适应更新;(2)当只有一个目标能被检测器检测出来时,可判定另一个目标为被遮挡目标,对被遮挡目标,停止模板更新,采用该目标前两帧的位移差,估计目标实时速度v与运动方向θ,对目标进行预测,目标框的大小保持不变;(3)当两个目标都无法通过检测器检测出来时,依据目标框与模板的相似度判断被遮挡目标,与第(2)种情况中采用同样的方式处理被遮挡的目标;若目标在跟踪过程中消失或者跟丢,在目标分离或者重新出现时,根据检测结果与目标模板的匹配对目标进行重识别。
本发明有益效果是:
通过引入YOLOv3技术对视频序列中的多目标进行检测,并将其作为新生目标,考虑到检测中存在目标过检测或漏检测、以及检测结果不精确的问题,采用抗干扰的卷积特征对目标细节进行深度描述,并借助自适应PFMB方法进行目标状态精确估计,本发明中尤其是将检测结果与跟踪结果进行交互融合,以提高多目标状态的估计精度。本发明还针对YOLOv3在对复杂环境的视频多目标检测,同样会存在检测错误和检测结果不准确的问题,本发明在PFMB滤波框架下融入YOLOv3检测方法,既解决了PFMB方法中未知新生目标的估计问题、目标身份的识别问题,同时将检测置信度和滤波似然进行融合建模,也可以有效提高对复杂环境下数目变化的视频多目标跟踪精度。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明方法的算法框架图。
图2是YOLOv3工作原理示意图。
图3是YOLOv3网络结构图。
图4是EnterExitCrossingPaths1cor序列实验结果。
图5是EnterExitCrossingPaths1cor序列实验目标数目估计对比图。
图6是EnterExitCrossingPaths1cor序列实验OSPA距离估计对比图。
图7是Jogging序列实验结果。
图8是Jogging序列实验目标数目估计对比图。
图9是Jogging列实验OSPA距离估计对比图。
图10是Subway序列实验结果。
图11是Subway序列实验目标数目估计对比图。
图12是Subway序列实验OSPA距离估计对比图。
图13是Human4序列实验结果。
图14是Human4序列实验目标数目估计对比图。
图15是Human4序列实验OSPA距离估计对比图。
图16是Suv序列实验结果。
图17是Suv序列实验目标数目估计对比图。
图18是Suv序列实验OSPA距离估计对比图。
具体实施方式
为使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明实施方式作进一步地详细描述。
首先对本申请涉及的相关内容进行介绍如下:
1、多伯努利滤波原理
在空间χ上,将多目标状态RFS表示为X={X (1),…,X (M)},X (i)={(r (i),p (i))}为第i个目标分量,通过伯努利参数(r (i),p (i))来参数化目标后验概率分布,其中r (i)和p (i)分别表示第i个目标的存在概率和概率分布。MBF采用贝叶斯理论对多目标的后验概率分布进行迭代更新,以达到对多目标的状态估计。
假设第k-1帧多目标的后验概率分布为:
Figure PCTCN2019094662-appb-000072
其中,M k-1表示第k-1帧存活目标的个数。则预测的多目标概率分布可表示为:
Figure PCTCN2019094662-appb-000073
由第k-1帧存活目标的多伯努利参数
Figure PCTCN2019094662-appb-000074
和第k帧新生目标的多伯努利参数
Figure PCTCN2019094662-appb-000075
构成,其中,
Figure PCTCN2019094662-appb-000076
Figure PCTCN2019094662-appb-000077
<f 1(·),f 2(·)>表示标准内积∫f 1(x)f 2(x)dx,P S,k为目标存活概率,f k|k-1(x|·)为目标状态转移函数,M Γ,k表示第k帧新生目标的数目。
若第k帧,预测的多目标概率分布为:
Figure PCTCN2019094662-appb-000078
则更新后的多目标后验概率密度,可由漏检目标的多伯努利参数
Figure PCTCN2019094662-appb-000079
和量测更新后的多伯努利参数
Figure PCTCN2019094662-appb-000080
近似表示,即:
Figure PCTCN2019094662-appb-000081
其中,
Figure PCTCN2019094662-appb-000082
Figure PCTCN2019094662-appb-000083
Figure PCTCN2019094662-appb-000084
Figure PCTCN2019094662-appb-000085
ψ k,z=g k(z|x)p D,k(x)
p D,k(x)为目标检测概率,g k(z|x)表示量测似然函数,Z k、κ k(z)分别表示量测集和杂波密度函数,第k帧目标数目估计为M k=M k|k-1+|Z k|。
2、YOLOv3目标检测
YOLOv3的工作原理如图2所示。
YOLOv3采用Darknet-53作为特征提取的网络,该网络结构如图3所示,由连续的3×3和1×1卷积层组成,融合残差网络(Residual Neural Network,ResNet)的残差块(Residual block),将整个网络分为多个子段逐阶段训练,采用shortcut的连接方式对每个子段的残差进行训练,从而达到总体残差最小。
YOLOv3在三个不同尺度上预测边界框,每个尺度预测3个边界框,尺度内进行局部特征交互,在基础网络之后,添加一系列卷积层得到特征图(feature map),在此基础上,进行位置回归与分类,此过程为最小尺度预测;将上一尺度中的卷积层上采样与最后一个16×16大小的特征图连接,再次通过多个卷积后输出预测信息;同理,将中间尺度的卷积层上采样与最后一个32×32大小的特征图连接,经过一系列卷积得到最大尺度上的边界框预测。
YOLOv3采用Faster R-CNN中锚框(anchor box)的思想,通过k-means聚类的方法生成9个边界框的先验,每个尺寸各预测3个边界框。YOLOv3将图像划分为G×G个网格,则每个尺度预测的张量为G×G×[3*(4+1+80)],其中数字3表示3个预测边界框,数字4表示每个预测边界框包含4个偏移值(t x,t y,t w,t h),(t x,t y,t w,t h)为预测边界框中心坐标偏移值和宽、高的偏移值,数字1表示边界框的置信度(confidence score),数字80表示预测边界框属于80类物体条件类别概率P r(Class i/Object)。
边界框在训练时采用平方误差损失的总和。边界框的置信度由边界框包含目标的可能性P r(object)和边界框的准确度
Figure PCTCN2019094662-appb-000086
构成:
Figure PCTCN2019094662-appb-000087
若边界框中不包含物体,P r(object)的值为0,置信度为0,否则P r(object)的值为1,置信度为预测框与真实框之间的交并比。条件类别概率P r(Class i/Object)表示在边界框包含目标的前提下,它属于某个类别的概率。YOLOv3联合条件类别概率和边框置信度,得出边界框类别置信度(class specific confidence scores,C)来表示边界框包含目标归类于每个类别的可能性和边界框与目标匹配的好坏,边框类别置信度可表示为:
Figure PCTCN2019094662-appb-000088
实施例一:
本实施例提供一种基于YOLOv3多伯努利视频多目标检测跟踪方法,参见图1,所述方法包括:
步骤一:初始化
1.1参数初始化,初始时刻k=0,视频总帧数为N,初始化采样粒子最大数目为L max,粒子最小数目为L min,初始目标存在概率P s=0.99。
1.2目标检测,
采用YOLOv3算法检测第k和k+1帧视频序列,记第k帧检测框个数为n,检测框状态集为
Figure PCTCN2019094662-appb-000089
Figure PCTCN2019094662-appb-000090
第k+1帧检测框个数为m,其检测框状态集为
Figure PCTCN2019094662-appb-000091
其中,
Figure PCTCN2019094662-appb-000092
表示第i个检测框状态向量,参数
Figure PCTCN2019094662-appb-000093
分别表示第k帧第i个检测框左上角的横坐标、纵坐标,以及检测框的宽、高和标签。
1.3初始帧目标框匹配
k=0时,采用抗干扰的卷积特征表示第k帧和第k+1帧的检测框,分别记为
Figure PCTCN2019094662-appb-000094
Figure PCTCN2019094662-appb-000095
Figure PCTCN2019094662-appb-000096
表示第k帧中第i个检测框的卷积特征向量;计算检测框卷积特征对应的相似度矩阵Λ,即:
Figure PCTCN2019094662-appb-000097
其中,
Figure PCTCN2019094662-appb-000098
表示第k帧中第i个检测框与第k+1帧中第j个检测框的相似度;从相似度矩阵Λ的每一行选择值最大、且大于相似度阈值T l的两个检测框作为匹配结果,将最终匹配的检测框对作为初始的新生目标加入目标模板集和目标轨迹集中。
由于视频各帧中目标连续变化,假定同一个目标在相邻两帧不会出现特别大的位移变化,因此,在相似似然匹配的基础上,加入目标框的交并比IOU作为限制:若两个检测框的交并比IOU大于交并比阈值T u,则可判定两个检测框匹配。从相似度矩阵Λ的每一行选择值最大、且大于相似度阈值T l,同时交并比IOU大于交并比阈值T u的两个检测框作为匹配结果,将最终匹配的检测框对作为初始的新生目标,并分配标签
Figure PCTCN2019094662-appb-000099
并加入目标轨迹集
Figure PCTCN2019094662-appb-000100
中,其中,
Figure PCTCN2019094662-appb-000101
为目标状态向量,各个分量分别表示第i个目标框左上角横坐标、纵坐标以及目标框的宽、高和标签,M k为第k帧目标的个数,给已匹配的检测框对添加与初始的新生目标对应的目标标签,即
Figure PCTCN2019094662-appb-000102
同时建立新生目标模板。
步骤二:目标识别
当k>0时,根据相邻帧的检测框、目标模板集和存在目标轨迹集来判定目标是否为新生目标、重识别目标或存在目标。
2.1检测框匹配
计算第k和k+1帧视频序列中检测框之间的相似度矩阵Λ,从相似度矩阵Λ的每一行选择值最大且大于相似度阈值T l的两个检测框,且这两个检测框的交并比IOU大于交并比阈值T u,则可判定两个检测框匹配,假设第k帧中的第i个检测框与第k+1帧中的第j个检测框匹配,则为第k+1帧中匹配目标的标签赋值,即:
Figure PCTCN2019094662-appb-000103
Figure PCTCN2019094662-appb-000104
表示第k帧中第i个检测框的标签,若
Figure PCTCN2019094662-appb-000105
为空,则表示该检测框包含的目标在第k-1帧中未被检测到。
2.2目标识别
(1)新生目标识别
若为
Figure PCTCN2019094662-appb-000106
空,与第k+1帧第j个检测框匹配,且目标模板集中没有与之匹配的目标,则判定该目标为新生目标,分配标签
Figure PCTCN2019094662-appb-000107
建立新生目标状态集:
Figure PCTCN2019094662-appb-000108
其中
Figure PCTCN2019094662-appb-000109
表示标签为l i的新生目标状态向量,M Γ表示新生目标个数,n表示目标模板的个数。给已匹配的检测框对添加与新生目标对应相同的目标标签,即
Figure PCTCN2019094662-appb-000110
同时将新生目标加入目标模板集,并根据新生参数初始化新生目标的采样粒子。
(2)漏跟目标重识别
Figure PCTCN2019094662-appb-000111
为空,与第k+1帧第j个检测框匹配,在存活目标轨迹集中没有与之匹配的目标,但与目标模板集中标签为l a的目标匹配,表示该检测框包含的目标在之前帧漏检,且出现漏跟情况,在第k帧,对该目标进行了重识别,将其重新加入到存活目标轨迹中:
Figure PCTCN2019094662-appb-000112
其中,
Figure PCTCN2019094662-appb-000113
表示标签为l i的存活目标状态向量。给已匹配的检测框对添加与重识别目标对应相同的目标标签,即
Figure PCTCN2019094662-appb-000114
(3)存活目标识别
Figure PCTCN2019094662-appb-000115
不为空,则该检测框目标为存活目标,目标标签为
Figure PCTCN2019094662-appb-000116
Figure PCTCN2019094662-appb-000117
为空,但与存活目标轨迹中标签为l b的目标匹配,也可判定该检测框目标为存活目标,为其进行标签赋值,即
Figure PCTCN2019094662-appb-000118
(4)干扰杂波识别
Figure PCTCN2019094662-appb-000119
为空,在第k+1帧中没有与之匹配的检测框,同时目标模板中也不存在相匹配的目标,则判定该检测框为干扰杂波。
步骤三:目标预测
3.1目标运动模型
假设目标运动模型为随机游走模型,若当YOLOv3检测器检测到该目标,且检测框置信度
Figure PCTCN2019094662-appb-000120
大于置信度阈值T B时,采用检测框调整目标状态,即
Figure PCTCN2019094662-appb-000121
其中,
Figure PCTCN2019094662-appb-000122
为第k-1帧中标签为l i目标状态向量,
Figure PCTCN2019094662-appb-000123
为第k帧中标签为l i目标的检测框,检测框置信度
Figure PCTCN2019094662-appb-000124
表示检测框包含目标的概率和检测框与目标匹配程度的得分,e(k-1)表示零均值高斯白噪声,η为目标状态与对应检测框信息之间的学习率,检测框的置信度越高,η越大,表示越信任检测结果,利用好的检测框去调整目标状态,能消除长时间跟踪而导致的累加误差,可以较好地优化预测结果。
3.2目标预测
在k-1时刻,采用多伯努利参数集
Figure PCTCN2019094662-appb-000125
表示多目标的后验概率分布,其中,M k-1为k-1时刻存在目标的数目,
Figure PCTCN2019094662-appb-000126
表示在k-1时刻目标l i的存在概率,
Figure PCTCN2019094662-appb-000127
表示k-1时刻目标l i的概率分布,由一组加权粒子表示:
Figure PCTCN2019094662-appb-000128
其中,
Figure PCTCN2019094662-appb-000129
表示第k-1帧目标l i的第j个采样粒子的权值,
Figure PCTCN2019094662-appb-000130
表示第k-1帧目标l i的第j个采样粒子的状态向量,
Figure PCTCN2019094662-appb-000131
为目标l i的采样粒子个数,δ(·)为狄拉克函数。
多目标概率分布预仍为多伯努利参数集,表示为:
Figure PCTCN2019094662-appb-000132
存活目标的多伯努利参数预测
Figure PCTCN2019094662-appb-000133
可根据下式得到:
Figure PCTCN2019094662-appb-000134
Figure PCTCN2019094662-appb-000135
其中,
Figure PCTCN2019094662-appb-000136
Figure PCTCN2019094662-appb-000137
Figure PCTCN2019094662-appb-000138
Figure PCTCN2019094662-appb-000139
为状态转移函数,采用上文提出的运动模型,可通过下式计算得到:
Figure PCTCN2019094662-appb-000140
其中,
Figure PCTCN2019094662-appb-000141
Figure PCTCN2019094662-appb-000142
新生目标多伯努利参数集
Figure PCTCN2019094662-appb-000143
可由下式计算得到:
Figure PCTCN2019094662-appb-000144
Figure PCTCN2019094662-appb-000145
其中,M Γ,k为第k帧新生目标数目。
步骤四:相似度计算
4.1目标卷积特征提取
(1)构造卷积核
在目标框周围,以目标框大小的矩形框随机采样m个背景样本框,背景样本框与目标框中心位置的距离,要求在水平方向距离大于1/4目标框宽度或者在竖直方向距离大于1/4目标框高度,然后将目标框与背景样本框尺寸规范化为n×n,并灰度化图像,得到样本集{I,B 1,B 2,…,B m},其中I表示目标框图像,B i为第i个背景样本图像。使用大小为w×w的滑动窗口,以步长Δ分别对样本集图像进行卷积操作,得到目标图像块集合y={Y 1,Y 2,…,Y l}和背景图像块集合
Figure PCTCN2019094662-appb-000146
其中,Y i∈R w×w,Z ij∈R w×w,l=(n-w+Δ) 2,为保留梯度信息,弱化亮度影响,将所有图像块减去自身均值并二范数归一化处理,最终,使用k-means算法,从目标图像块集合中选出d个图像块作为目标卷积核:
Figure PCTCN2019094662-appb-000147
从m个背景样本对应的背景图像块集合中,选出m×d个背景图像块
Figure PCTCN2019094662-appb-000148
对这些图像块平均池化得到背景卷积核:
Figure PCTCN2019094662-appb-000149
(2)提取卷积特征
利用提取的目标卷积核,在目标图像I上,以步长Δ进行卷积操作
Figure PCTCN2019094662-appb-000150
提取目标特征图
Figure PCTCN2019094662-appb-000151
其中,
Figure PCTCN2019094662-appb-000152
同时采用(1)中所提取的背景卷积核,同样以步长Δ在图像I上卷积
Figure PCTCN2019094662-appb-000153
得到相应的背景特征图
Figure PCTCN2019094662-appb-000154
在目标特征图上进行背景信息减除:
Figure PCTCN2019094662-appb-000155
提取弱化背景信息的特征图
Figure PCTCN2019094662-appb-000156
将特征图按行展开并顺序拼接,得到最终的一维卷积特征f,其中,
Figure PCTCN2019094662-appb-000157
4.2特征的稀疏表示
将特征图集看作三维的张量C∈R (n-w+Δ)×(n-w+Δ),对张量进行稀疏化表示,凸显目标的特征,利用稀疏向量f去逼近vec(C),使以下目标函数最小化:
Figure PCTCN2019094662-appb-000158
其中,vec(C)是通过串联C中所有元素的列向量,
Figure PCTCN2019094662-appb-000159
通过soft-shrinking方法求得稀疏表示的唯一解:
Figure PCTCN2019094662-appb-000160
其中,λ是张量C的中位数。
4.3计算相似度
两个目标框的相似度计算公式为:
Figure PCTCN2019094662-appb-000161
其中f 1,f 2为对应的目标框卷积特征,在计算粒子与目标的量测似然时,利用目标对应的目标特征图集和背景特征图集来计算粒子对应的卷积特征:
Figure PCTCN2019094662-appb-000162
其中I l为粒子表示的候选框的规范化之后的图像。将S i按行展开,得到d个一维向量,顺序连接为最终粒子对应候选框的特征f l
步骤五:目标状态更新及提取
5.1目标状态更新
假设第k-1帧多目标预测概率分布为:
Figure PCTCN2019094662-appb-000163
通过粒子的量测似然更新多目标后验概率分布:
Figure PCTCN2019094662-appb-000164
根据下式获得更新后的多伯努利参数集:
Figure PCTCN2019094662-appb-000165
Figure PCTCN2019094662-appb-000166
Figure PCTCN2019094662-appb-000167
其中,
Figure PCTCN2019094662-appb-000168
为目标l i对应模板的卷积特征
Figure PCTCN2019094662-appb-000169
与粒子
Figure PCTCN2019094662-appb-000170
的之间的量测似然,与相似度矩阵元素计算过程一致。
5.2目标状态提取
为了防止粒子出现退化,本发明采用随机重采样方式对采样粒子集进行重采样,来避免粒子退化,剔除存在概率过小的伯努利分量。根据更新后的多目标后验概率分布,提取存在概率大于0.5的伯努利分量对应的目标状态.
步骤六:目标遮挡处理机制
当相邻两个目标框的交并比IOU大于阈值T I时,可判定这两个目标出现紧邻,且部分遮挡,此时,检测器可能出现三种情况:1)当目标没有被完全遮挡时,检测器可能检测出两个目标,本发明方法中提出采用更新机制对目标进行跟踪和模板更新;2)只有一个目标能被检测器检测出来时,可判定另一个目标为被遮挡目标,提出对被遮挡目标,停止模板更新,采用该目标前两帧的位移差估计目标实时速度v与运动方向θ,对目标进行预测估计,目标框的大小保持不变;3)两个目标都无法通过检测器检测出来时,依据目标与模板的相似度判断被遮挡目标,与第二种情况中采用同样的方式处理被遮挡的目标。若目标在跟踪过程中消失或者跟丢,在目标分离或者重新出现时,可根据检测结果与目标模板的匹配对目标进行重识别。
步骤七:目标模板更新
目标运动过程中,周围环境及自身状态会不断发生变化,如背景变化、自身扭曲、旋转及尺度变化等,因此,需要对目标模板进行实时更新,综合考虑采用存活目标轨迹、目标模板和目标检测结果进行融合更新目标模板,即:
Figure PCTCN2019094662-appb-000171
其中,
Figure PCTCN2019094662-appb-000172
分别表示第k和k-1帧时目标l i对应的卷积特征模板,
Figure PCTCN2019094662-appb-000173
为第k-1帧目标l i跟踪结果卷积特征的稀疏表示,
Figure PCTCN2019094662-appb-000174
为第k帧目标l i检测框卷积特征的稀疏表示,ρ为模板的学习速率,
Figure PCTCN2019094662-appb-000175
为第k帧目标l i对应检测框的置信度,
Figure PCTCN2019094662-appb-000176
为检测框的学习率,采用较好的检测框去更新目标模板,可以有效去除跟踪过程中的累积误差。同时,在运动过程中,目标状态会发生不断变化,采用高准确率的检测框去更新模板,可有效地将目标最新状态加入到模板中,更好地适应目标后续跟踪。
为验证本申请提出的上述基于YOLOv3多伯努利视频多目标检测跟踪方法的效果,特实验如下:
1、实验条件及参数
本发明采用的视频序列数据为Visual Tracker Benchmark TB50中的序列Huamn4,Visual Tracker Benchmark TB100中的序列Jogging、Subway、Suv,以及CAVUAR数据集中的序列EnterExitCrossingPaths1cor,这5组典型视频序列分别来源于不同场景,且包含动态背景、目标紧邻、目标形变、图片模糊、目标尺寸变化、目标遮挡等干扰情况。实验中采用的评价指标为多目标跟踪正确度(Multiple Object Tracking Accuracy,MOTA)、多目标跟踪精度(Multiple Object Tracking Precision,MOTP)、轨迹完整目标数目(Mostly Tracked,MT)、标签跳变数(Identity Switch,IDs),分别定义如下:
1)多目标跟踪正确度(MOTA)
Figure PCTCN2019094662-appb-000177
其中,m t为第t帧中被跟丢的目标数量,fp t为第t帧误检的目标数,mme t为第t帧中跟踪轨迹中目标标签发生跳变的数目,g t表示第t帧中目标的实际个数。
2)多目标跟踪精度(MOTP)
Figure PCTCN2019094662-appb-000178
其中,
Figure PCTCN2019094662-appb-000179
为第t帧中第i个目标的跟踪框与目标真实框的重叠率。
3)轨迹完整目标数目(MT),表示目标跟踪轨迹占真实轨迹长度80%以上的目标轨迹数目,刻画了轨迹的完整程度。
4)标签跳变数(IDs),表示跟踪过程中目标标签发生变化的次数。
2、实验及结果分析
本申请方法采用Matlab2016a实现,在处理器为Intel Core i7-8700、3.2GHz,12核,内存为16GB,显卡为NVIDIA Geforce GTX 1080 Ti的工作站上运行,并与传统的粒子多伯努利滤波(PFMB)方法和2017年Erik Bochinski等在发表论文《High-Speed Tracking-by-Detection Without Using Image Informations》中提出的IOU-Tracker方法进行性能比较与分析。由于传统PFMB方法中,缺少对新生目标检测机制,本实验中也采用YOLOv3进行检测,并将判定为新生目标的检测结果作为PFMB的新生目标,然后进行跟踪。
具体实验从四个方面对发明方法进行性能评估,即:目标重识别、目标紧邻与遮挡、图像模糊与目标形变、目标大位移等,实验结果如下。
实验一:目标重识别
本实验采用的视频序列为CAVUAR数据集中EnterExitCrossingPaths1cor序列,该序列包含383帧图像,存在目标紧邻和较长时间的目标遮挡问题,同时伴随目标逐渐出现导致形变较大的情况。由于较长时间遮挡,很容易导致目标跟丢,本发明融合检测器结果、存在目标轨迹和目标模板能够实现对目标的重识别,将目标重新加入到跟踪轨迹中。
图4给出了EnterExitCrossingPaths1cor序列实验结果,其中,图4(a)为YOLOv3算法检测结果,图4(b)为传统PFMB方法跟踪结果,图4(c)为IOU-Tracker方法跟踪结果,图4(d)为本发明方法的跟踪结果。可以看出,当目标被遮挡时,检测器可能无法检测出被遮挡目标,大幅度降低检测效果,如图4(a)中第93帧,采用传统的PFMB方法,存在遮挡时,目标被跟丢;由于IOU-Tracker完全抛弃使用图像信息,只利用目标检测结果进行跟踪处理,所以该方法也无法继续跟踪漏检目标,当目标再次被检测到时,将会被定义为新目标,难以与历史目标关联;而本发明方法,对于较长时间被遮挡的目标,由于存在概率逐渐降低导致目标消失,但当目标再次出现时,能有效地进行重识别,重新加入到目标跟踪轨迹中。
图5为EnterExitCrossingPaths1cor序列实验目标数目估计对比图,图6为EnterExitCrossingPaths1cor序列实验OSPA距离估计对比图,可以看出,传统的PFMB方法缺少目标遮挡处理机制,在目标被被遮挡后,容易出现误跟和漏跟情况,导致跟踪框偏离较大,OSPA值上升;IOU-Tracker方法受检测器性能影响,目标被遮挡过程中,目标数目减少,OSPA值急剧上升,本发明方法处理长时间被遮挡目标时,也可能出现目标漏跟的情况,当目标脱离遮挡后,能有效地对漏跟目标进行重识别,整体跟踪性能优于传统PFMB方法和IOU-Tracker方法。尤其从表1所示的MOTA、MOTP、MT、IDs四个评价指标可以看出。
表1为50次蒙特卡洛仿真结果,可以看出,传统PFMB方法的指标MOTA、MOTP、MT的值最低,IDs为1,是因为该方法缺少目标重识别过程,导致目标被跟丢后,再重新出现时将无法与之前的轨迹关联上,从而会出现标签跳变情况。IOU-Tracker方法在目标紧邻或遮挡时,目标标签将会出现频繁跳变,导致IDs的值最高。而本发明方法能有效地对目标进行重识别,减少目标标签跳变的问题,有效减少轨迹碎片。
表1目标重识别跟踪性能评价(表中↑表示值越大越好,↓表示值越小越好)
Figure PCTCN2019094662-appb-000180
实验二:目标紧邻与遮挡
采用视频序列为Visual Tracker Benchmark TB100数据集中的Jogging序列、Subway序列。Jogging序列为相机移动的路口场景,包含3个运动目标,存在目标被遮挡的情况。Subway序列包含8个目标,存在多个目标紧邻和频繁被遮挡等问题。
Jogging序列的实验结果如图7所示,其中,图7(a)为YOLOv3算法检测结果,图7(b)为传统PFMB方法跟踪结果,图7(c)为IOU-Tracker方法跟踪结果,图7(d)为本发明方法的跟踪结果。可以看出,YOLOv3检测器难以正确检测出被遮挡目标,IOU-Tracker方法丢失漏检目标,当目标脱离遮挡状态时,IOU-Tracker方法将重新检测到的目标定义为新的目标,传统的PFMB方法即使没有丢失目标,但是跟踪框的偏离程度却增大,而本发明的方法能很好地融合目标的检测结果和跟踪结果,实现对跟踪框进行调整,可以获得比较准确的跟踪结果。
图8给出了Jogging序列实验目标数目变化估计对比图,其中传统PFMB方法与本发明方法的目标数目变化一致,图9给出了Jogging序列实验OSPA距离估计对比图,可以看出,YOLOv3检测器存在漏检,导致IOU-Tracker方法目标数目估计不准确,出现漏估计减少,OSPA值急剧上升较大,精度不高;传统PFMB方法在目标被遮挡后,跟踪框的偏离程度增加,导致OSPA值上升增大,而本发明方法具有较好抗遮挡能力,跟踪性能明显优于传统PFMB方法与和IOU-Tracker方法。
Subway序列的实验结果如图10所示,其中,图10(a)为YOLO v3算法检测结果,图10(b)为传统PFMB方法跟踪结果,图10(c)为IOU-Tracker方法跟踪结果,图10(d)为本发明方法的跟踪结果。可以看出,当多个目标邻近或相互遮挡的时候,检测器难以检测到被遮挡目标或者检测结果偏差较大,导致IOU-Tracker方法频繁丢失目标,传统的PFMB方法处理遮挡、紧邻问题时,也会出现较大的偏差,甚至跟丢,如图10(b)中第17、19帧,而本发明方法能够较好地对被遮挡目标进行跟踪。
图11给出Subway序列目标数目估计对比图,图12给出了Subway序列OSPA距离估计对比图。可以看出,由于YOLOv3检测器难以检测到部分出现和被遮挡目标,所以IOU-Tracker方法目标数目变化大,OSPA值较高,传统的PFMB方法在跟踪过程中丢失目标,OSPA值突增,而本发明方法能够较好地处理目标遮挡问题,具有较高的跟踪精度。
进行50次蒙特卡洛仿真,Jogging序列跟踪的量化统计结果如表2.1所示。可以看出,虽然传统PFMB方法与本发明方法的多目标跟踪正确度MOTA相当,但遮挡恢复后,传统PFMB方法的跟踪框会出现偏差,导致跟踪误差偏大,所以MOTP值比本发明方法要小。此外,由于检测器对被遮挡目标检测效果差,且该序列中两个目标一直处于紧邻状态,所以IOU-Tracker方法在对存在紧邻的序列Jogging上跟踪精度较低,且目标标签出现频繁跳变。
Subway序列的定量分析结果如表2.2所示,该序列中,多个目标也长期处于互相遮挡、紧邻状态,本发明方法在MOTA、MOTP、MT、IDs上都明显要优于传统PFMB方法与IOU-Tracker方法。
表2.1 Jogging序列目标紧邻与遮挡
Figure PCTCN2019094662-appb-000181
表2.2 Subway序列目标紧邻与遮挡
Figure PCTCN2019094662-appb-000182
实验三:图像模糊与目标形变
采用视频序列Visual Tracker Benchmark TB50数据集中的Human4序列,该序列共包含667帧图像,为相机移动红绿灯路口场景,其中,包括三种类型17个目标,视频中存在由于相机运动或目标快速运动等造成的目标模糊情况,且存在许多目标频繁发生形变问题。
图13为Human4序列实验结果,其中,图13(a)为YOLO v3算法检测结果,(b)为传统PFMB方法跟踪结果,(c)为IOU-Tracker方法跟踪结果,(d)为本发明方法的跟踪结果。可以看出,图片模糊可能导致检测器性能下降,部分目标被漏检,如图13(a)中第50帧、第74帧。目标形变对检测器的性能影响较小,虽然,传统PFMB方法能跟上目标,但部分目标的跟踪框会出现偏离,而本发明方法能够较好地处理这两种情况,具有较高的跟踪精度。
图13给出Human4序列目标数目估计对比图,图14给出了Human4序列OSPA距离估计对比图。可以看出,本发明方法能较好的处理图像模糊与目标形变问题,跟踪性能优于传统的PFMB方法和IOU-Tracker方法。
进行50次蒙特卡洛仿真,定量分析结果如表3所示。可以看出,由于频繁的目标形变、图像模糊而导致传统PFMB方法中出现目标漏跟或仅跟踪到目标的部分区域,导致MOTA、MOTP、MT的值都比较低;IOU-Tracker方法能较好地利用检测器的检测结果,所以,MOTP值与本发明方法相当,而目标尺寸变化和图像模糊导致检测器性能下降,存在部分目标被漏检,使得MOTA值偏小,IDs值最大,即存在大量目标标签跳变,航迹估计不准确。本发明方法能较好地对检测框进行关联,有效减少目标标签的跳变,在IDs上明显优于IOU-Tracker。
表3目标模糊与形变跟踪性能评价
Figure PCTCN2019094662-appb-000183
实验四:目标大位移
实验中采用视频序列Visual Tracker Benchmark TB100数据集中的Suv序列,该序列共包含400帧图像,为相机移动动态背景的公路场景,包含6个目标,存在快速运动导致的大位移情况。
图16给出了Suv序列实验结果,其中,图16(a)为YOLO v3算法检测结果,图16(b)为传统PFMB方法跟踪结果,图16(c)为IOU-Tracker方法跟踪结果,图16(d)为本发明方法的跟踪结果。可以看出,目标的大位移对检测器没有影响,IOU-Tracker方法表现良好,而传统PFMB方法会出现目标跟丢的情况,本发明方法由于利用检测器的结果去调整跟踪过程,对于较大位移的目标跟踪结果也明显要优于另外两种方法。
图17为Suv序列目标数目估计对比图,图18为Suv序列OSPA距离估计对比图。可以看出,传统的PFMB方法由于缺少对目标大位移的处理机制,所以,当目标位移较大时,可能出现目标丢失或跟踪框偏离较大的情况,导致OSPA值较大,而IOU-Tracker方法与本发明方法能利用检测器结果提高对目标大位移的处理能力,具有较好的跟踪精度。
进行50次蒙特卡洛仿真,定量分析结果如表4所示。对大位移目标进行跟踪,传统的PFMB方法容易出现漏跟、位置偏移过大的情况因此,四个评价指标结果都比较差;由于视频序列也还存在目标遮挡和紧邻等情况,导致IOU-Tracker方法跟踪结果中,目标标签也出现频繁的跳变情况,所以MOTA值偏低,IDs值偏高。而本发明方法能融合检测结果有效地处理目标大位移情况,在各项指标上都明显优于传统PFMB方法,在MOTA,MT,IDs上也明显优于IOU-Tracker方法。
Table 4 Tracking performance under large target displacement
Figure PCTCN2019094662-appb-000184
The above experiments show that, by fusing the detection results with the tracking results interactively, the YOLOv3-based multi-Bernoulli video multi-target detection and tracking method provided in this application can effectively re-identify a target that has been occluded for a long time when it reappears. When several targets are closely adjacent and frequently occluded, the proposed method fuses the detection and tracking results of the targets to adjust the tracking boxes and obtains comparatively accurate tracking results. When image blur and target deformation occur, the proposed method associates the detection boxes well and effectively reduces label switches. When large target displacements occur, the proposed method uses the detector output to adjust the tracking process and thus prevents targets from being lost.
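As a concrete illustration of the detection-box association summarized above, the following sketch matches the detection boxes of two consecutive frames by combining an appearance-similarity matrix with an IoU gate. It is a minimal sketch under stated assumptions: cosine similarity over generic feature vectors stands in for the convolution-feature similarity of the method, and the threshold values T_l and T_u are illustrative, not the values used in the patent.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x, y, w, h) with (x, y) the top-left corner."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match_boxes(feat_k, feat_k1, boxes_k, boxes_k1, T_l=0.6, T_u=0.3):
    """Match detections of frame k to frame k+1 (similarity row-maximum + IoU gate).

    feat_k, feat_k1   : (n, d) and (m, d) appearance feature vectors.
    boxes_k, boxes_k1 : (n, 4) and (m, 4) boxes as (x, y, w, h).
    Returns index pairs (i, j) with similarity > T_l and IoU > T_u.
    """
    fk = feat_k / np.linalg.norm(feat_k, axis=1, keepdims=True)
    fk1 = feat_k1 / np.linalg.norm(feat_k1, axis=1, keepdims=True)
    sim = fk @ fk1.T                       # similarity matrix (stand-in for Λ)
    pairs = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))         # best candidate in frame k+1
        if sim[i, j] > T_l and iou(boxes_k[i], boxes_k1[j]) > T_u:
            pairs.append((i, j))
    return pairs

# toy example: features of matching boxes are nearly identical
rng = np.random.default_rng(0)
feat_k = rng.normal(size=(2, 64))
feat_k1 = feat_k + 0.05 * rng.normal(size=(2, 64))
boxes_k  = np.array([[10.0, 20.0, 40.0, 80.0], [200.0, 50.0, 30.0, 60.0]])
boxes_k1 = np.array([[12.0, 22.0, 40.0, 80.0], [204.0, 52.0, 30.0, 60.0]])
print(match_boxes(feat_k, feat_k1, boxes_k, boxes_k1))   # [(0, 0), (1, 1)]
```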
Some of the steps in the embodiments of the invention may be implemented in software, and the corresponding software programs may be stored in a readable storage medium such as an optical disc or a hard disk.
The above are only preferred embodiments of the invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (11)

  1. A multi-target detection and tracking method, characterized in that, during detection and tracking, the YOLOv3 technique is used to detect frames k and k+1 of the video sequence; the number of detection boxes in frame k is denoted n and their state set is
    Figure PCTCN2019094662-appb-100001
    the number of detection boxes in frame k+1 is m and their state set is
    Figure PCTCN2019094662-appb-100002
    where
    Figure PCTCN2019094662-appb-100003
    denotes the state vector of the i-th detection box, and the parameters
    Figure PCTCN2019094662-appb-100004
    denote the horizontal and vertical coordinates of the top-left corner of the i-th detection box in frame k, and the width, height and label of the detection box;
    the detection boxes of frames k and k+1 of the video sequence are matched; for the initial frame of the video, k=0, the matched detection boxes in the initial frame are added to the target template set and the target trajectory set as initial new-born targets; for an intermediate frame of the video, k>0, the detection boxes, the target trajectory information and the target template information are used to identify new-born targets, re-identify missed targets and optimize the tracking of surviving targets; the optimized tracking of surviving targets uses, within the multi-Bernoulli filtering framework, the information of the detection boxes of the current frame whose confidence is greater than a given confidence threshold T b to optimize the tracking of the corresponding targets.
  2. The method according to claim 1, characterized in that, for the initial frame of the video, k=0, adding the matched detection boxes to the target template set and the target trajectory set as new-born targets comprises:
    representing the detection boxes of frame k and frame k+1 with anti-interference convolution features, denoted respectively
    Figure PCTCN2019094662-appb-100005
    and
    Figure PCTCN2019094662-appb-100006
    where
    Figure PCTCN2019094662-appb-100007
    denotes the convolution feature vector of the i-th detection box in frame k; computing the similarity matrix Λ corresponding to the convolution features of the detection boxes, i.e.:
    Figure PCTCN2019094662-appb-100008
    where
    Figure PCTCN2019094662-appb-100009
    denotes the similarity between the i-th detection box in frame k and the j-th detection box in frame k+1; selecting from each row of the similarity matrix Λ, as the matching result, the two detection boxes whose similarity is the largest and greater than the similarity threshold T l, and adding the finally matched detection box pairs to the target template set and the target trajectory set as initial new-born targets.
  3. The method according to claim 2, characterized in that matching the detection boxes of the first two frames, determining the new-born targets, and adding the finally matched detection box pairs to the target template set as initial new-born targets comprises:
    assuming that the same target does not undergo a particularly large displacement between two adjacent frames, adding the intersection-over-union IOU of the target boxes as a constraint on top of the similarity matching: if the IOU of two detection boxes is greater than the IOU threshold T u, the two detection boxes can be judged as matched;
    selecting from each row of the similarity matrix Λ, as the matching result, the two detection boxes whose similarity is the largest and greater than the similarity threshold T l and whose IOU is greater than the IOU threshold T u, taking the finally matched detection box pairs as initial new-born targets, assigning them labels
    Figure PCTCN2019094662-appb-100010
    and adding them to the target trajectory set
    Figure PCTCN2019094662-appb-100011
    where
    Figure PCTCN2019094662-appb-100012
    is the target state vector whose components denote the horizontal and vertical coordinates of the top-left corner of the i-th target box and the width, height and label of the target box, and M k is the number of targets in frame k; adding to the matched detection box pairs the target labels corresponding to the initial new-born targets, i.e.
    Figure PCTCN2019094662-appb-100013
    and at the same time establishing the new-born target templates.
  4. The method according to claim 3, characterized in that, for an intermediate frame of the video, k>0, using the detection boxes, the target trajectory information and the target template information to identify new-born targets, re-identify missed targets and optimize the tracking of surviving targets comprises:
    matching adjacent detection boxes, determining the conditions for judging new-born targets and re-identified targets, and then judging, from the detection boxes of adjacent frames, the target template set and the surviving-target trajectory set, whether a target is a new-born target, a re-identified target or a surviving target.
  5. The method according to claim 4, characterized in that matching adjacent detection boxes comprises:
    computing the similarity matrix Λ between the detection boxes of frames k and k+1 of the video sequence, and selecting from each row of Λ the two detection boxes whose similarity is the largest and greater than the similarity threshold T l; if the IOU of these two detection boxes is also greater than the IOU threshold T u, the two detection boxes are judged as matched; assuming that the i-th detection box in frame k matches the j-th detection box in frame k+1, the label of the matched target in frame k+1 is assigned, i.e.:
    Figure PCTCN2019094662-appb-100014
    where
    Figure PCTCN2019094662-appb-100015
    denotes the label of the i-th detection box in frame k; if
    Figure PCTCN2019094662-appb-100016
    is empty, the target contained in this detection box was not detected in frame k-1.
  6. The method according to claim 5, characterized in that, for the detection box targets of the current frame, performing target identification from the detection boxes of adjacent frames, the target template set and the surviving-target trajectory set comprises:
    (1) new-born target identification
    if
    Figure PCTCN2019094662-appb-100017
    is empty, the detection box matches the j-th detection box in frame k+1, and there is no matching target in the target template set, the target is judged to be a new-born target and is assigned a label
    Figure PCTCN2019094662-appb-100018
    and the new-born target state set is established:
    Figure PCTCN2019094662-appb-100019
    where
    Figure PCTCN2019094662-appb-100020
    denotes the state vector of the new-born target with label l i, M Γ denotes the number of new-born targets, and n denotes the number of target templates; the target labels corresponding to the new-born targets are added to the matched detection box pairs, i.e.
    Figure PCTCN2019094662-appb-100021
    the new-born targets are added to the target template set, and the sampling particles of the new-born targets are initialized according to the birth parameters;
    (2) re-identification of missed targets
    if
    Figure PCTCN2019094662-appb-100022
    is empty, the detection box matches the j-th detection box in frame k+1, there is no matching target in the surviving-target trajectory set, but it matches the target with label l a in the target template set, this indicates that the target contained in the detection box was missed in previous frames and its track was lost; in frame k the target is re-identified and added back into the surviving-target trajectory set:
    Figure PCTCN2019094662-appb-100023
    where
    Figure PCTCN2019094662-appb-100024
    denotes the state vector of the surviving target with label l i; the target labels corresponding to the re-identified targets are added to the matched detection box pairs, i.e.
    Figure PCTCN2019094662-appb-100025
    (3) surviving target identification
    if
    Figure PCTCN2019094662-appb-100026
    is not empty, the detection box target is a surviving target with target label
    Figure PCTCN2019094662-appb-100027
    if
    Figure PCTCN2019094662-appb-100028
    is empty but the detection box matches the target with label l b in the surviving-target trajectory set, the detection box target can also be judged to be a surviving target and is assigned a label, i.e.
    Figure PCTCN2019094662-appb-100029
    (4) clutter identification
    if
    Figure PCTCN2019094662-appb-100030
    is empty, there is no matching detection box in frame k+1, and there is no matching target in the target template set, the detection box is judged to be interfering clutter.
  7. The method according to claim 6, characterized in that the method further comprises: constructing the target motion model according to the detection box confidence;
    the target motion model is assumed to be a random walk model; when the YOLOv3 detector detects the target and the detection box confidence
    Figure PCTCN2019094662-appb-100031
    is greater than the confidence threshold T B, the detection box is used to adjust the target state, i.e.
    Figure PCTCN2019094662-appb-100032
    where
    Figure PCTCN2019094662-appb-100033
    is the state vector of the target with label l i in frame k-1,
    Figure PCTCN2019094662-appb-100034
    is the detection box of the target with label l i in frame k, the detection box confidence
    Figure PCTCN2019094662-appb-100035
    denotes a score combining the probability that the detection box contains a target and the degree of matching between the detection box and the target, e(k-1) denotes zero-mean Gaussian white noise, and η is the learning rate between the target state and the information of the corresponding detection box: the higher the confidence of the detection box, the larger η, i.e. the more the detection result is trusted.
  8. The method according to claim 7, characterized in that, at time k-1, a multi-Bernoulli parameter set
    Figure PCTCN2019094662-appb-100036
    Figure PCTCN2019094662-appb-100037
    is used to represent the posterior probability distribution of the multiple targets, where M k-1 is the number of targets existing at time k-1,
    Figure PCTCN2019094662-appb-100038
    denotes the existence probability of target l i at time k-1, and
    Figure PCTCN2019094662-appb-100039
    denotes the probability distribution of target l i at time k-1, represented by a set of weighted particles:
    Figure PCTCN2019094662-appb-100040
    where
    Figure PCTCN2019094662-appb-100041
    denotes the weight of the j-th sampling particle of target l i in frame k-1,
    Figure PCTCN2019094662-appb-100042
    denotes the state vector of the j-th sampling particle of target l i in frame k-1,
    Figure PCTCN2019094662-appb-100043
    is the number of sampling particles of target l i, and δ(·) is the Dirac delta function;
    the predicted multi-target probability distribution is still a multi-Bernoulli parameter set, expressed as:
    Figure PCTCN2019094662-appb-100044
    the predicted multi-Bernoulli parameters of the surviving targets
    Figure PCTCN2019094662-appb-100045
    can be obtained from:
    Figure PCTCN2019094662-appb-100046
    Figure PCTCN2019094662-appb-100047
    where
    Figure PCTCN2019094662-appb-100048
    Figure PCTCN2019094662-appb-100049
    Figure PCTCN2019094662-appb-100050
    Figure PCTCN2019094662-appb-100051
    is the state transition function, which follows the random walk model and is computed as:
    Figure PCTCN2019094662-appb-100052
    where
    Figure PCTCN2019094662-appb-100053
    Figure PCTCN2019094662-appb-100054
    the multi-Bernoulli parameter set of the new-born targets
    Figure PCTCN2019094662-appb-100055
    can be computed as:
    Figure PCTCN2019094662-appb-100056
    Figure PCTCN2019094662-appb-100057
    where M Γ,k is the number of new-born targets in frame k.
  9. The method according to claim 8, characterized in that, assuming the predicted multi-target probability distribution of frame k-1 is:
    Figure PCTCN2019094662-appb-100058
    the multi-target posterior probability distribution is updated through the measurement likelihoods of the particles:
    Figure PCTCN2019094662-appb-100059
    and the updated multi-Bernoulli parameter set is obtained from:
    Figure PCTCN2019094662-appb-100060
    Figure PCTCN2019094662-appb-100061
    Figure PCTCN2019094662-appb-100062
    where
    Figure PCTCN2019094662-appb-100063
    is the measurement likelihood between the convolution feature
    Figure PCTCN2019094662-appb-100064
    of the template corresponding to target l i and the particle
    Figure PCTCN2019094662-appb-100065
    and is computed in the same way as the elements of the similarity matrix.
  10. The method according to claim 9, characterized in that, during target motion, the target template is updated by fusing the surviving-target trajectory, the target template and the target detection result, i.e.:
    Figure PCTCN2019094662-appb-100066
    where
    Figure PCTCN2019094662-appb-100067
    denote the convolution feature templates of target
    Figure PCTCN2019094662-appb-100068
    in frame k and frame k-1 respectively,
    Figure PCTCN2019094662-appb-100069
    is the sparse representation of the convolution feature of the tracking result of target l i in frame k-1,
    Figure PCTCN2019094662-appb-100070
    is the sparse representation of the convolution feature of the detection box of target l i in frame k, ρ is the learning rate of the template,
    Figure PCTCN2019094662-appb-100071
    is the confidence of the detection box corresponding to target l i in frame k, and
    Figure PCTCN2019094662-appb-100072
    is the learning rate of the detection box.
  11. The method according to claim 10, characterized in that, during detection and tracking, when the intersection-over-union IOU of two adjacent target boxes is greater than the threshold T u, the two targets are judged to be in an occlusion situation;
    in that case, (1) when the detector can detect both targets, the occlusion is light and the target templates are updated adaptively;
    (2) when only one of the two targets can be detected by the detector, the other target is judged to be the occluded target; for the occluded target, template updating is stopped, the displacement difference over the previous two frames is used to estimate the instantaneous velocity v and motion direction θ of the target, the target is predicted accordingly, and the size of the target box is kept unchanged;
    (3) when neither target can be detected by the detector, the occluded target is determined from the similarity between the target box and the template and is handled in the same way as in case (2); if a target disappears or is lost during tracking, it is re-identified, when the targets separate or the target reappears, by matching the detection results against the target templates.
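Claim 7 builds the motion model as a random walk whose prediction is pulled toward a high-confidence detection box through a learning rate η. The following is a minimal numerical sketch of that convex combination; the mapping from confidence to η (here simply η = confidence), the noise level and the threshold value T_B are illustrative assumptions, since the exact formula in the claim is given by an image not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def adjust_state(x_prev, det_box, det_conf, T_B=0.5, sigma=2.0):
    """Random-walk prediction pulled toward a confident detection box.

    x_prev  : previous state (x, y, w, h) of the target.
    det_box : detection box of the same target in the current frame, or None.
    det_conf: detection confidence; boxes at or below T_B are ignored.
    """
    noise = rng.normal(0.0, sigma, size=4)          # zero-mean Gaussian noise e(k-1)
    predicted = x_prev + noise                      # plain random walk
    if det_box is None or det_conf <= T_B:
        return predicted
    eta = det_conf                                  # assumed: trust grows with confidence
    return (1.0 - eta) * predicted + eta * det_box  # convex blend of prediction and detection

x_prev = np.array([120.0, 60.0, 40.0, 80.0])
det    = np.array([126.0, 63.0, 42.0, 82.0])
print(adjust_state(x_prev, det, det_conf=0.9))      # close to the detection box
print(adjust_state(x_prev, None, det_conf=0.0))     # pure random walk
```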
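Claims 8 and 9 describe the particle representation of each Bernoulli component: a random-walk prediction with a survival probability, followed by a weight update using the measurement likelihood and a corresponding update of the existence probability. The sketch below implements those two steps for a single component in the standard particle multi-Bernoulli form; the survival probability, the Gaussian appearance likelihood and the particle counts are placeholders, since the precise formulas in the claims are given by images not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_bernoulli(r, particles, weights, p_s=0.99, sigma=2.0):
    """Prediction step for one Bernoulli component under random-walk dynamics."""
    particles = particles + rng.normal(0.0, sigma, size=particles.shape)
    r_pred = p_s * r                       # existence probability shrinks by survival prob.
    return r_pred, particles, weights      # weights are unchanged by prediction

def update_bernoulli(r, particles, weights, likelihood):
    """Update step: re-weight particles by their measurement likelihood.

    likelihood : callable mapping an (N, d) particle array to (N,) likelihood values.
    """
    g = likelihood(particles)
    num = r * np.sum(weights * g)
    r_upd = num / (1.0 - r + num)          # updated existence probability
    w_upd = weights * g
    w_upd /= w_upd.sum()                   # normalised particle weights
    return r_upd, particles, w_upd

# toy example with a Gaussian likelihood around an observed box
particles = rng.normal([100.0, 50.0, 40.0, 80.0], 3.0, size=(200, 4))
weights = np.full(200, 1.0 / 200)
obs = np.array([103.0, 52.0, 41.0, 79.0])
lik = lambda p: np.exp(-0.5 * np.sum((p - obs) ** 2, axis=1) / 5.0 ** 2)

r, particles, weights = predict_bernoulli(0.9, particles, weights)
r, particles, weights = update_bernoulli(r, particles, weights, lik)
print(round(r, 3), particles[np.argmax(weights)])
```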
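Claim 10 fuses the previous template, the sparse representation of the tracking-result feature and the detection-box feature into the updated template, controlled by the template learning rate ρ and a detection learning rate driven by the box confidence. The sketch below shows one possible convex fusion of this kind; treating the detection learning rate as ρ multiplied by the confidence, and re-normalising the template, are illustrative assumptions and not the exact weighting of the claim.

```python
import numpy as np

def update_template(tpl_prev, feat_track, feat_det, det_conf, rho=0.05):
    """Blend the previous template with tracking and detection features.

    tpl_prev   : previous convolution-feature template of the target.
    feat_track : feature of the current tracking result.
    feat_det   : feature of the associated detection box, or None if undetected.
    det_conf   : confidence of that detection box in [0, 1].
    rho        : template learning rate.
    """
    tpl = (1.0 - rho) * tpl_prev + rho * feat_track
    if feat_det is not None:
        alpha = rho * det_conf             # detection learning rate (assumed form)
        tpl = (1.0 - alpha) * tpl + alpha * feat_det
    return tpl / np.linalg.norm(tpl)       # keep the template normalised (assumption)

tpl = np.ones(128) / np.sqrt(128)
trk = np.random.default_rng(2).normal(0, 1, 128)
det = np.random.default_rng(3).normal(0, 1, 128)
print(update_template(tpl, trk, det, det_conf=0.8)[:4])
```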
PCT/CN2019/094662 2019-04-23 2019-07-04 基于YOLOv3多伯努利视频多目标检测跟踪方法 WO2020215492A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/869,645 US11094070B2 (en) 2019-04-23 2020-05-08 Visual multi-object tracking based on multi-Bernoulli filter with YOLOv3 detection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910328735.X 2019-04-23
CN201910328735.XA CN110084831B (zh) 2019-04-23 2019-04-23 基于YOLOv3多伯努利视频多目标检测跟踪方法

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/869,645 Continuation US11094070B2 (en) 2019-04-23 2020-05-08 Visual multi-object tracking based on multi-Bernoulli filter with YOLOv3 detection

Publications (1)

Publication Number Publication Date
WO2020215492A1 true WO2020215492A1 (zh) 2020-10-29

Family

ID=67416267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/094662 WO2020215492A1 (zh) 2019-04-23 2019-07-04 基于YOLOv3多伯努利视频多目标检测跟踪方法

Country Status (2)

Country Link
CN (1) CN110084831B (zh)
WO (1) WO2020215492A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205544A (zh) * 2021-04-27 2021-08-03 武汉大学 基于交并比估计的空间注意力强化学习跟踪方法

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472588B (zh) * 2019-08-19 2020-11-24 上海眼控科技股份有限公司 锚点框确定方法、装置、计算机设备和存储介质
CN110647818A (zh) * 2019-08-27 2020-01-03 北京易华录信息技术股份有限公司 一种遮挡目标物体的识别方法及装置
CN112465859A (zh) * 2019-09-06 2021-03-09 顺丰科技有限公司 快速运动目标的检测方法、装置、设备和储存介质
CN110765363B (zh) * 2019-09-27 2023-05-05 复旦大学 一种基于高斯分布表示的深度推荐***
CN110780164B (zh) * 2019-11-04 2022-03-25 华北电力大学(保定) 基于yolo的绝缘子红外故障定位诊断方法及装置
CN111107427B (zh) * 2019-11-20 2022-01-28 Oppo广东移动通信有限公司 图像处理的方法及相关产品
CN112850436A (zh) * 2019-11-28 2021-05-28 宁波微科光电股份有限公司 一种电梯智能光幕的行人趋势检测方法及***
CN111126235B (zh) * 2019-12-18 2023-06-16 浙江大华技术股份有限公司 一种船只违章停泊检测处理方法及装置
CN111079694A (zh) * 2019-12-28 2020-04-28 神思电子技术股份有限公司 一种柜面助手履职监控装置和方法
CN111274917B (zh) * 2020-01-17 2023-07-18 江南大学 一种基于深度检测的长时目标跟踪方法
CN111292355B (zh) * 2020-02-12 2023-06-16 江南大学 一种融合运动信息的核相关滤波多目标跟踪方法
CN111340855A (zh) * 2020-03-06 2020-06-26 电子科技大学 一种基于轨迹预测的道路移动目标检测方法
CN111402295B (zh) * 2020-03-11 2023-08-08 桂林理工大学 基于对象检测和跟踪的移动物体识别方法
CN111524095A (zh) * 2020-03-24 2020-08-11 西安交通大学 一种用于旋转物体的目标检测方法
CN111754545B (zh) * 2020-06-16 2024-05-03 江南大学 一种基于iou匹配的双滤波器视频多目标跟踪方法
CN111797802B (zh) * 2020-07-14 2023-06-02 华侨大学 一种基于ai视觉的扶梯不安全行为实时预警方法
CN111680689B (zh) * 2020-08-11 2021-03-23 武汉精立电子技术有限公司 一种基于深度学习的目标检测方法、***及存储介质
US11889227B2 (en) 2020-10-05 2024-01-30 Samsung Electronics Co., Ltd. Occlusion processing for frame rate conversion using deep learning
CN112507874B (zh) * 2020-12-10 2023-02-21 上海芯翌智能科技有限公司 一种用于检测机动车加塞行为的方法与设备
CN112529942B (zh) * 2020-12-22 2024-04-02 深圳云天励飞技术股份有限公司 多目标跟踪方法、装置、计算机设备及存储介质
CN112329893A (zh) * 2021-01-04 2021-02-05 中国工程物理研究院流体物理研究所 基于数据驱动的异源多目标智能检测方法及***
CN112967320B (zh) * 2021-04-02 2023-05-30 浙江华是科技股份有限公司 一种基于桥梁防撞的船舶目标检测跟踪方法
CN113139468B (zh) * 2021-04-24 2023-04-11 西安交通大学 融合局部目标特征与全局特征的视频摘要生成方法
CN113537077B (zh) * 2021-07-19 2023-05-26 江苏省特种设备安全监督检验研究院 基于特征池优化的标签多伯努利视频多目标跟踪方法
CN113674306A (zh) * 2021-07-29 2021-11-19 杭州宇泛智能科技有限公司 基于鱼眼镜头的行人轨迹获取方法、***、装置和介质
CN113421288B (zh) * 2021-08-23 2021-12-17 杭州云栖智慧视通科技有限公司 一种多目标实时轨迹跟踪中的静止轨迹碎片改进方法
CN113920164B (zh) * 2021-10-27 2024-05-24 浙江工商大学 一种剧场环境下基于近红外防伪油墨的演员身份重识别方法
CN113724298B (zh) * 2021-11-01 2022-03-18 深圳市城市交通规划设计研究中心股份有限公司 多点位感知融合方法及装置、计算机可读存储介质
CN114092515B (zh) * 2021-11-08 2024-03-05 国汽智控(北京)科技有限公司 用于障碍遮挡的目标跟踪检测方法、装置、设备及介质
CN113989332B (zh) * 2021-11-16 2022-08-23 苏州魔视智能科技有限公司 一种目标跟踪方法、装置、存储介质及电子设备
CN114972418B (zh) * 2022-03-30 2023-11-21 北京航空航天大学 基于核自适应滤波与yolox检测结合的机动多目标跟踪方法
CN115471773B (zh) * 2022-09-16 2023-09-15 北京联合大学 一种面向智慧教室的学生跟踪方法及***

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104766320B (zh) * 2015-04-02 2017-06-13 西安电子科技大学 阈值化量测下的多伯努利滤波弱目标检测与跟踪方法
CN105182291B (zh) * 2015-08-26 2017-08-25 西安电子科技大学 自适应目标新生强度的phd平滑器的多目标跟踪方法
US10242266B2 (en) * 2016-03-02 2019-03-26 Mitsubishi Electric Research Laboratories, Inc. Method and system for detecting actions in videos
CN108876819A (zh) * 2018-06-11 2018-11-23 深圳市唯特视科技有限公司 一种基于泊松多伯努利滤波的三维多目标追踪算法
CN109325407B (zh) * 2018-08-14 2020-10-09 西安电子科技大学 基于f-ssd网络滤波的光学遥感视频目标检测方法
CN109118523B (zh) * 2018-09-20 2022-04-22 电子科技大学 一种基于yolo的图像目标跟踪方法
CN109598742A (zh) * 2018-11-27 2019-04-09 湖北经济学院 一种基于ssd算法的目标跟踪方法及***

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176164A (zh) * 2013-04-11 2013-06-26 北京空间飞行器总体设计部 基于无线传感器网络的多目标无源跟踪方法
CN105354860A (zh) * 2015-08-26 2016-02-24 西安电子科技大学 基于箱粒子滤波的扩展目标CBMeMBer跟踪方法
CN106408594A (zh) * 2016-09-28 2017-02-15 江南大学 基于多伯努利特征协方差的视频多目标跟踪方法
CN106910205A (zh) * 2017-03-03 2017-06-30 深圳市唯特视科技有限公司 一种基于随机有限集滤波器耦合的多目标跟踪方法
CN109002783A (zh) * 2018-07-02 2018-12-14 北京工业大学 救援环境中的人体检测以及姿态识别方法
CN109508444A (zh) * 2018-12-18 2019-03-22 桂林电子科技大学 区间量测下交互式多模广义标签多伯努利的快速跟踪方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, DIANWEI ET AL.: "An improved infrared video image pedestrian detection algorithm", JOURNAL OF XI'AN UNIVERSITY OF POSTS AND TELECOMMUNICATIONS, vol. 23, no. 4, 31 July 2018 (2018-07-31), pages 48 - 52 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205544A (zh) * 2021-04-27 2021-08-03 武汉大学 基于交并比估计的空间注意力强化学习跟踪方法
CN113205544B (zh) * 2021-04-27 2022-04-29 武汉大学 基于交并比估计的空间注意力强化学习跟踪方法

Also Published As

Publication number Publication date
CN110084831A (zh) 2019-08-02
CN110084831B (zh) 2021-08-24

Similar Documents

Publication Publication Date Title
WO2020215492A1 (zh) 基于YOLOv3多伯努利视频多目标检测跟踪方法
US11094070B2 (en) Visual multi-object tracking based on multi-Bernoulli filter with YOLOv3 detection
Maksai et al. Non-markovian globally consistent multi-object tracking
CN107563313B (zh) 基于深度学习的多目标行人检测与跟踪方法
Jana et al. YOLO based Detection and Classification of Objects in video records
WO2023065395A1 (zh) 作业车辆检测与跟踪方法和***
Wojek et al. Monocular visual scene understanding: Understanding multi-object traffic scenes
Wang et al. Robust video-based surveillance by integrating target detection with tracking
CN111127513A (zh) 一种多目标跟踪方法
CN110288627B (zh) 一种基于深度学习和数据关联的在线多目标跟踪方法
CN106778712B (zh) 一种多目标检测与跟踪方法
EP2131328A2 (en) Method for automatic detection and tracking of multiple objects
CN111932583A (zh) 一种基于复杂背景下的时空信息一体化智能跟踪方法
CN103971386A (zh) 一种动态背景场景下的前景检测方法
CN110532921B (zh) 基于ssd检测广义标签多伯努利视频多目标跟踪方法
KR20190023389A (ko) 변화점 검출을 활용한 다중클래스 다중물체 추적 방법
Dai et al. Instance segmentation enabled hybrid data association and discriminative hashing for online multi-object tracking
CN106780567B (zh) 一种融合颜色和梯度直方图的免疫粒子滤波扩展目标跟踪方法
Yang et al. A probabilistic framework for multitarget tracking with mutual occlusions
Li et al. Robust object tracking via multi-feature adaptive fusion based on stability: contrast analysis
He et al. Fast online multi-pedestrian tracking via integrating motion model and deep appearance model
CN114926859A (zh) 一种结合头部跟踪的密集场景下行人多目标跟踪方法
CN114627339B (zh) 茂密丛林区域对越境人员的智能识别跟踪方法及存储介质
Vo et al. Object tracking: an experimental and comprehensive study on vehicle object in video
Lu et al. Hybrid deep learning based moving object detection via motion prediction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19925790

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19925790

Country of ref document: EP

Kind code of ref document: A1