CN112927267A - Target tracking method under multi-camera scene - Google Patents
- Publication number
- CN112927267A (application CN202110275199.9A)
- Authority
- CN
- China
- Prior art keywords
- target
- tracking
- data set
- target object
- pictures
- Prior art date
- Legal status
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/292—Multi-camera tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a target tracking method for a multi-camera scene. It uses YOLO-V4 combined with an improved Deepsort algorithm, together with an image stitching algorithm that stitches the pictures from different cameras, and finally performs multi-target tracking in the stitched video. For data, a self-built intelligent vehicle data set and a self-built vehicle re-identification data set containing the intelligent vehicle are used. By constructing rich data sets, improving the models, and stitching and fusing the pictures, the invention achieves multi-target tracking in a multi-camera scene and improves the accuracy of vehicle re-identification.
Description
Technical Field
The invention relates to a target tracking method, in particular to a target tracking method in a multi-camera scene.
Background
Target detection and tracking is currently a research hotspot in computer vision, with wide applications in video surveillance, autonomous driving, human-computer interaction, smart homes and other fields. Moving-target tracking belongs to video analysis, which combines the mid- and high-level processing stages of computer vision research: an image sequence is processed to study the motion patterns of moving targets, or to provide semantic and non-semantic information (motion detection, target classification, target tracking, event detection, etc.) for a system's decision-making and alerting. Video target tracking is an important branch of computer vision and is increasingly applied across science and technology, national defense, aerospace, medicine and the national economy, so target tracking research has great practical value and broad development prospects.
With the development of neural networks, networks for object detection and tracking have evolved from machine learning to deep learning. Current target detection algorithms broadly fall into two categories. Two-stage detection algorithms split detection into two stages: candidate regions (region proposals) are generated first and then classified; typical representatives are the R-CNN, Fast R-CNN and Faster R-CNN family. Their false-detection and missed-detection rates are low, but they are slow and cannot meet real-time detection requirements. The other category, one-stage detection algorithms, needs no candidate-region stage: class probabilities and position coordinates are generated directly, and the final detection result is obtained in a single pass, so detection is faster; typical algorithms include YOLO, SSD, YOLOv3, YOLO-V4 and CenterNet. The main task of Multiple Object Tracking (MOT) is, given an image sequence, to find the moving objects in it and associate them across frames, i.e. to assign each a definite, accurate id; the objects may be arbitrary, such as pedestrians, vehicles and various animals. The mainstream tracking strategy currently studied in academia and industry is TBD (Tracking-by-Detection): target detection is run on each frame, and the detection results are then used for tracking; classical algorithms include SORT and DeepSORT.
Image stitching spatially aligns a group of image sequences with mutually overlapping parts and, after resampling and synthesis, forms a complete wide-angle image of the scene containing the information of each sequence. It compensates for the limited field of view of a single camera and extends the effective range of the equipment; representative algorithms are SIFT, SURF and ORB.
Target tracking has been applied in many scenarios, and many scholars have already produced good research results. In multi-target cross-camera tracking, however, most current academic research focuses on finding the overlapping region between the pictures captured by the cameras and using it as the basis for tracking across the different cameras. For example, an optimized SURF algorithm has been used to match the overlapping parts of two cameras' pictures, completing target handover between cameras and realizing cross-camera tracking. Yet when multiple targets appear in different camera pictures, such methods can only perform cross-camera tracking in the overlapping region, where a definite, accurate id is assigned; no suitable solution has been given for assigning ids to targets tracked in the non-overlapping regions. For target detection, the prior art uses the frame-difference method, which subtracts two consecutive frames of a video sequence to obtain the moving target's contour. The algorithm is simple to implement, cheap to program and fast to run, but it depends heavily on the chosen inter-frame interval and segmentation threshold, generalizes poorly and is easily constrained by the scene.
In order to realize multi-target tracking across multiple cameras and to make target detection and tracking more effective, the invention adopts YOLO-V4 combined with Deepsort, improves the target appearance feature extraction network in Deepsort, and introduces an attention mechanism to obtain better matching features.
Disclosure of Invention
The purpose of the invention is as follows: to provide a target tracking method for a multi-camera scene that tracks multiple targets with high identification accuracy.
The technical scheme is as follows: the invention discloses a target tracking method under a multi-camera scene, which comprises the following steps:
s1: shooting a picture of a target object, and labeling the picture to obtain a first target object data set;
s2: shuffling and mixing the first target object data set and the collected target-object-associated data set to obtain a total data set, and training a YOLO-V4 model with the total data set;
s3: shooting each target object at multiple angles to obtain pictures of each target object at different angles and obtain a second target object data set;
s4: an attention mechanism is introduced to improve a target appearance characteristic extraction network in a Deepsort algorithm;
s5: training the improved target appearance characteristic extraction network by using a second target object data set;
s6: combining the trained YOLO-V4 model with an improved Deepsort algorithm, obtaining a detection frame of a target object by using the YOLO-V4 model, and tracking the detected target object by using the improved Deepsort algorithm to obtain a target object tracking model;
s7: deploying a plurality of cameras at different positions and tracking the target objects to be tracked with the target object tracking model.
Beneficial effects: compared with the prior art, the invention has the following notable advantages:
the invention obtains the video spliced by multiple cameras by applying the SURF image splicing algorithm, and finally realizes multi-target tracking by applying the YOLO-V4 model in combination with the improved Deepsort algorithm in the video. The target appearance information extraction network in the Deepsort algorithm is optimized, a channel attention mechanism is introduced, and the vehicle weight identification accuracy is improved by 1.1% compared with the prior art. In conclusion, the multi-target tracking method and the multi-target tracking system realize multi-target tracking in a multi-camera scene, and the accuracy rate of vehicle weight identification is improved well.
Drawings
FIG. 1 compares the performance of the original target appearance information extraction network in the Deepsort algorithm and the improved network, both trained on the self-built vehicle re-identification data set of the present invention.
FIG. 2 shows the tracking effect for multiple intelligent vehicles in a single-camera scene using the YOLO-V4 model and the improved Deepsort algorithm.
FIG. 3 shows the tracking effect for multiple intelligent vehicles in a scene where the pictures of two cameras are fused, using the YOLO-V4 model and the improved Deepsort algorithm.
Detailed Description
The technical scheme of the invention is further explained by combining the attached drawings.
The method uses YOLO-V4 combined with an improved Deepsort algorithm, together with an image stitching algorithm that stitches the pictures from different cameras, and finally performs multi-target tracking in the stitched video. For data, a self-built intelligent vehicle data set and a self-built vehicle re-identification data set containing the intelligent vehicle are used. The specific steps are as follows:
s1: shooting photos of the intelligent vehicle and labeling them to obtain an intelligent vehicle data set;
s2: combining the intelligent vehicle data set with the collected vehicle data set to obtain a total data set, and training a YOLO-V4 model with the total data set;
s3: shooting each intelligent vehicle from multiple angles to obtain pictures of each vehicle at different angles, cropping out the portions of the pictures containing the intelligent vehicles, and merging them with the collected vehicle re-identification data sets to obtain a vehicle re-identification data set containing the intelligent vehicles;
s4: improving the target appearance feature extraction network in the Deepsort algorithm by introducing an attention mechanism;
s5: training the improved target appearance feature extraction network in the Deepsort algorithm with the vehicle re-identification data set;
s6: combining the trained YOLO-V4 model with the improved Deepsort algorithm to obtain a model capable of tracking the intelligent vehicle;
s7: stitching the videos shot by the multiple cameras with the SURF algorithm to obtain a stitched video, and tracking the intelligent vehicles in that video with the YOLO-V4 model and the improved Deepsort algorithm.
In step S1, the target detection performance of the YOLO-V4 model is closely tied to the data set, so the data set must be sufficient. When producing it, every situation in which the intelligent vehicle may appear in the scene must be considered. Pictures of the intelligent vehicle are shot from different angles, at different distances and in different scenes, yielding 560 pictures containing the intelligent vehicle; the number, size and angle of the intelligent vehicles differ from picture to picture. The intelligent vehicles in the pictures are then labeled with data-set annotation software, producing a label file for each picture. Finally, the collected partial car data set is combined with the self-built intelligent vehicle data set to obtain the final data set, of which 80% is used for the training set, 10% for the validation set and the remaining 10% for the test set.
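The 80/10/10 split described above can be sketched as follows; the function is a generic illustration, not the patent's tooling.

```python
import random

def split_dataset(items, train=0.8, val=0.1, seed=0):
    """Shuffle and split a data set into train/validation/test (80/10/10 by default)."""
    items = list(items)
    random.Random(seed).shuffle(items)     # deterministic shuffle for reproducibility
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# 560 pictures, as in the self-built intelligent vehicle data set
train_set, val_set, test_set = split_dataset(range(560))
```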
In step S2, the target detection network used is YOLO-V4, a one-stage detector of the YOLO series. It is an improved version of YOLOv3: many small improvements on top of YOLOv3 give a large gain in target detection accuracy without reducing the recognition rate. The main improvements of YOLO-V4 are as follows: 1. The YOLOv3 backbone feature extraction network Darknet53 is improved, and the activation function of its DarknetConv2D blocks is changed from LeakyReLU to Mish, where the Mish function is:
Mish(x) = x * tanh(ln(1 + e^x))
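As a quick numerical check of the formula above, a direct implementation (plain Python, for illustration only):

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    return x * math.tanh(math.log1p(math.exp(x)))
```

Mish is smooth and non-monotonic: it passes through zero at the origin, approaches the identity for large positive inputs, and decays toward zero for large negative inputs, which is what makes it attractive as a drop-in replacement for LeakyReLU.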
The network structure of Darknet53 is then modified to use the CSPNet structure, turning Darknet53 into CSPDarknet53. 2. The SPP and PANet structures are used. The SPP structure is attached to the last feature layer of CSPDarknet53: after three convolutions, that layer is processed with max pooling at four different scales, with pooling kernel sizes of 13x13, 9x9, 5x5 and 1x1, and features are repeatedly extracted with the up-sampling and down-sampling network of PANet. 3. Training uses the Mosaic data enhancement method: 4 pictures are read each time, each is flipped, scaled, color-gamut shifted, etc., and the four pictures are arranged in the four directions to compose a new picture. 4. CIoU is used as the bounding-box regression loss. CIoU takes into account the distance between the target and the prior box, the overlap rate, the scale and a penalty term, which makes target-box regression more stable. It is computed as:
CIOU = IOU - ρ²(b, b^gt)/c² - αv
where IOU is the intersection over union of the areas of the predicted box and the ground-truth box, ρ²(b, b^gt) is the squared Euclidean distance between the center points b and b^gt of the predicted and ground-truth boxes, and c is the diagonal length of the smallest enclosing region that contains both boxes. α and v are computed as:
v = (4/π²) * (arctan(w^gt/h^gt) - arctan(w/h))²
α = v / ((1 - IOU) + v)
where w, h and w^gt, h^gt are the width and height of the predicted box and the ground-truth box. The corresponding loss is:
LOSS_CIOU = 1 - CIOU
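A direct implementation of the CIoU loss described above (IOU, ρ², c², α, v), written as a hedged sketch in plain Python with boxes given as (x1, y1, x2, y2) corners:

```python
import math

def ciou_loss(box_pred, box_gt):
    """CIoU loss between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_pred
    bx1, by1, bx2, by2 = box_gt
    # Intersection over union of the two box areas
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # rho^2: squared distance between box centers; c^2: enclosing-box diagonal squared
    rho2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    # Aspect-ratio consistency term v and trade-off weight alpha
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    v = 4 / math.pi ** 2 * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)   # small epsilon avoids 0/0 for identical boxes
    ciou = iou - rho2 / c2 - alpha * v
    return 1 - ciou

loss_same = ciou_loss((0, 0, 2, 2), (0, 0, 2, 2))       # identical boxes
loss_far = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))        # disjoint boxes
```

Identical boxes give a loss of 0, while disjoint boxes are still penalized through the center-distance term even though their IOU is 0, which is the property that makes CIoU regression more stable than plain IOU loss.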
in step S3, the quality of the extraction capability of the target appearance feature extraction network in deep sort is closely related to the data set used for training the network, and therefore, a vehicle re-identification data set needs to be created. Every dolly all need take the picture of different angles separately, draws the position of the intelligent vehicle in the picture alone, and every dolly is about taking 40 pictures, then combines together the vehicle heavy identification data set that collects with the intelligent car heavy identification data set of self-control, obtains the vehicle heavy identification data set that contains the intelligent vehicle, and the data set contains 585 different vehicles, and every kind of vehicle possess about 40 pictures. Taking 90% as training set and 10% as testing set.
In steps S4-S5, the Deepsort target tracking algorithm is used and improved. Deepsort is an improvement on the Sort algorithm. Sort associates ids across frames by feeding the IOU between detection boxes and tracking boxes into the Hungarian algorithm for linear assignment; this tracks with high precision and accuracy but easily causes id switches. Deepsort therefore adds the target's appearance information to the matching computation, so an id can still be matched correctly when a target is occluded and reappears later, effectively reducing frequent id switches. The appearance information is extracted by a convolutional neural network that computes a 128-dimensional feature vector for each detection box, and how well this network extracts appearance information directly affects the tracking result. In this patent, the main target to identify is the intelligent vehicle, so a suitable convolutional neural network must be trained to extract its appearance information. To strengthen the network's feature extraction capability, the method improves the Deepsort feature extraction network by inserting the channel attention network ECA-Net after the original residual network. ECA-Net provides a local cross-channel interaction strategy without dimensionality reduction and a method for adaptively selecting the one-dimensional convolution kernel size, thereby improving performance.
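The ECA mechanism just described (global average pooling per channel, a 1-D convolution across the channel descriptor, a sigmoid gate, then channel-wise rescaling) can be sketched as a toy stand-in. This is an illustration only: the real ECA-Net learns the 1-D convolution weights during training, whereas fixed averaging weights are used here so the sketch stays self-contained.

```python
import math

def eca_attention(feature_map, k=3):
    """ECA-style channel attention on a feature map shaped [C][H][W] (nested lists).

    1) global average pooling per channel, 2) 1-D convolution of size k across
    the channel descriptor (zero padding, fixed averaging weights here),
    3) sigmoid gate, 4) channel-wise rescaling of the input.
    """
    C = len(feature_map)
    gap = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    pad = k // 2
    padded = [0.0] * pad + gap + [0.0] * pad
    conv = [sum(padded[i + j] / k for j in range(k)) for i in range(C)]
    gate = [1.0 / (1.0 + math.exp(-z)) for z in conv]
    return [[[v * gate[c] for v in row] for row in feature_map[c]] for c in range(C)]

# Two channels, 2x2 each: one active channel, one silent channel
fm = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
out = eca_attention(fm)
```

Note the key design point: the 1-D convolution only mixes each channel with its k neighbors, so the cost is O(C·k) rather than the O(C²/r) of a squeeze-and-excitation bottleneck, with no dimensionality reduction.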
In step S6, the YOLO-V4 model is combined with the improved Deepsort algorithm, and multi-target tracking is carried out in the scene of a single camera.
In step S7, the SURF algorithm is used to extract the feature points of the image sequences so that stitching is accurate, robust and real-time. Among current mainstream image stitching algorithms, SURF stands out for speed and matching quality, hence its name, Speeded-Up Robust Features. Multi-camera video stitching works as follows: first the pictures captured by each camera are read; the captured pictures are then stitched with the SURF algorithm to obtain stitched pictures; finally all stitched pictures are fused into the final multi-camera fused video. The intelligent vehicles in this video are then tracked with the YOLO-V4 model combined with the improved Deepsort algorithm.
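The geometric core of the stitch is mapping one camera's picture into the other's coordinate frame through a homography estimated from matched SURF keypoints (in practice via OpenCV). A minimal sketch of that geometry, with hypothetical helper names and the homography given directly instead of estimated:

```python
def apply_homography(H, pt):
    """Map a 2-D point through a 3x3 homography (row-major nested lists)."""
    x, y = pt
    denom = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / denom,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / denom)

def panorama_bounds(size_left, size_right, H_right_to_left):
    """Bounding box of the stitched canvas: the left image's corners plus the
    right image's corners mapped into the left image's coordinate frame."""
    w_l, h_l = size_left
    w_r, h_r = size_right
    corners = [(0, 0), (w_l, 0), (0, h_l), (w_l, h_l)]
    corners += [apply_homography(H_right_to_left, p)
                for p in [(0, 0), (w_r, 0), (0, h_r), (w_r, h_r)]]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)

# Pure translation: the right camera sees the scene shifted 500 px to the right
H = [[1, 0, 500], [0, 1, 0], [0, 0, 1]]
bounds = panorama_bounds((640, 480), (640, 480), H)
```

In a full pipeline, SURF keypoints matched between the overlapping parts of the two pictures would be fed to a RANSAC homography fit, and the warped images would then be blended onto this canvas.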
In order to better demonstrate the technical effect of the invention, the classification performance of the trained YOLO-V4 network is measured; the results are shown in Tables 1 and 2. In Table 1, AP is the average precision, reflecting the recognition accuracy of the YOLO-V4 network on a single category; mAP is the mean of the AP values over all categories, reflecting the accuracy of the network over all categories; and F1 is a comprehensive evaluation index of the model, reflecting the classification performance of the YOLO-V4 network.
TABLE 1 Classification Performance index of trained YOLO-V4 network
TABLE 2 Deepsort target appearance information extraction network comparison before and after improvement
Accuracy in Table 2 is the classification accuracy: the larger it is, the stronger the extraction capability of the target appearance information extraction network. Loss is the value of the loss function: the smaller it is, the stronger the extraction capability.
Claims (5)
1. A target tracking method under a multi-camera scene is characterized by comprising the following steps:
s1: shooting a picture of a target object, and labeling the picture to obtain a first target object data set;
s2: shuffling and mixing the first target object data set and the collected target-object-associated data set to obtain a total data set, and training a YOLO-V4 model with the total data set;
s3: shooting each target object at multiple angles to obtain pictures of each target object at different angles and obtain a second target object data set;
s4: an attention mechanism is introduced to improve a target appearance characteristic extraction network in a Deepsort algorithm;
s5: training the improved target appearance characteristic extraction network by using a second target object data set;
s6: combining the trained YOLO-V4 model with an improved Deepsort algorithm, obtaining a detection frame of a target object by using the YOLO-V4 model, and tracking the detected target object by using the improved Deepsort algorithm to obtain a target object tracking model;
s7: deploying a plurality of cameras at different positions and tracking the target objects to be tracked with the target object tracking model.
2. The method for tracking the target under the multi-camera scene according to claim 1, wherein the step S2 of building the YOLO-V4 model comprises:
(1) the YOLOv3 backbone feature extraction network Darknet53 is improved, and the activation function of its DarknetConv2D blocks is changed from LeakyReLU to Mish, where the Mish function is:
Mish(x) = x * tanh(ln(1 + e^x))
then the network structure of Darknet53 is modified to use the CSPNet structure, turning Darknet53 into CSPDarknet53;
(2) the SPP structure is attached to the last feature layer of CSPDarknet53: after three convolutions, that layer is processed with max pooling at four different scales, with pooling kernel sizes of 13x13, 9x9, 5x5 and 1x1, and features are repeatedly extracted with the up-sampling and down-sampling network of PANet;
(3) training uses the Mosaic data enhancement method: several pictures are read each time, each is flipped, scaled, color-gamut shifted, etc., and the pictures are placed in different directions to compose new pictures;
(4) CIoU is used as the bounding-box regression loss, computed as:
CIOU = IOU - ρ²(b, b^gt)/c² - αv
where IOU is the intersection over union of the areas of the predicted box and the ground-truth box, ρ²(b, b^gt) is the squared Euclidean distance between the center points of the predicted and ground-truth boxes, and c is the diagonal length of the smallest enclosing region that contains both boxes; α and v are computed as:
v = (4/π²) * (arctan(w^gt/h^gt) - arctan(w/h))²
α = v / ((1 - IOU) + v)
where w, h and w^gt, h^gt are the width and height of the predicted box and the ground-truth box, and the corresponding LOSS is:
LOSS_CIOU = 1 - CIOU
3. the method for tracking the target under the multi-camera scenario as claimed in claim 1, wherein the attention mechanism of step S4 is a channel attention mechanism network ECA-Net.
4. The method for tracking the target under the multi-camera scene according to claim 1, wherein the step S7 further comprises the steps of: firstly, reading pictures captured by each camera, splicing the captured pictures by using a SURF algorithm, and then fusing all the spliced pictures to obtain a final multi-camera fused video.
5. The method for tracking the target under the multi-camera scene as claimed in claim 1, wherein the target object is an intelligent vehicle, and the target-object-associated data are data of other objects similar in shape to the intelligent vehicle.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110275199.9A CN112927267A (en) | 2021-03-15 | 2021-03-15 | Target tracking method under multi-camera scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112927267A true CN112927267A (en) | 2021-06-08 |
Family
ID=76174965
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110275199.9A Withdrawn CN112927267A (en) | 2021-03-15 | 2021-03-15 | Target tracking method under multi-camera scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112927267A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114882393A (en) * | 2022-03-29 | 2022-08-09 | 华南理工大学 | Road reverse running and traffic accident event detection method based on target detection |
CN114882351A (en) * | 2022-03-31 | 2022-08-09 | 河海大学 | Multi-target detection and tracking method based on improved YOLO-V5s |
CN114882351B (en) * | 2022-03-31 | 2024-04-26 | 河海大学 | Multi-target detection and tracking method based on improved YOLO-V5s |
CN115035251A (en) * | 2022-06-16 | 2022-09-09 | 中交第二航务工程局有限公司 | Bridge deck vehicle real-time tracking method based on domain-enhanced synthetic data set |
CN115035251B (en) * | 2022-06-16 | 2024-04-09 | 中交第二航务工程局有限公司 | Bridge deck vehicle real-time tracking method based on field enhanced synthetic data set |
CN116993779A (en) * | 2023-08-03 | 2023-11-03 | 重庆大学 | Vehicle target tracking method suitable for monitoring video |
CN116993779B (en) * | 2023-08-03 | 2024-05-14 | 重庆大学 | Vehicle target tracking method suitable for monitoring video |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sun et al. | Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning | |
Asha et al. | Vehicle counting for traffic management system using YOLO and correlation filter | |
CN104244113B (en) | A kind of video abstraction generating method based on depth learning technology | |
CN112927267A (en) | Target tracking method under multi-camera scene | |
CN111914664A (en) | Vehicle multi-target detection and track tracking method based on re-identification | |
Lyu et al. | Small object recognition algorithm of grain pests based on SSD feature fusion | |
Jadhav et al. | Aerial multi-object tracking by detection using deep association networks | |
Tang et al. | Integrated feature pyramid network with feature aggregation for traffic sign detection | |
CN116402850A (en) | Multi-target tracking method for intelligent driving | |
Zhang et al. | Exploiting Offset-guided Network for Pose Estimation and Tracking. | |
CN108280844A (en) | A kind of video object localization method based on the tracking of region candidate frame | |
Adeli et al. | A component-based video content representation for action recognition | |
Shehzadi et al. | 2d object detection with transformers: a review | |
CN112232240A (en) | Road sprinkled object detection and identification method based on optimized intersection-to-parallel ratio function | |
Kalva et al. | Smart Traffic monitoring system using YOLO and deep learning techniques | |
Wang et al. | Summary of object detection based on convolutional neural network | |
Hassan et al. | Multi-object tracking: a systematic literature review | |
Alomari et al. | Smart real-time vehicle detection and tracking system using road surveillance cameras | |
CN114821482A (en) | Vector topology integrated passenger flow calculation method and system based on fisheye probe | |
Rabecka et al. | Assessing the performance of advanced object detection techniques for autonomous cars | |
CN113420660A (en) | Infrared image target detection model construction method, prediction method and system | |
Akdag et al. | Transformer-based fusion of 2D-pose and spatio-temporal embeddings for distracted driver action recognition | |
Tian et al. | Pedestrian multi-target tracking based on YOLOv3 | |
Kovbasiuk et al. | Detection of vehicles on images obtained from unmanned aerial vehicles using instance segmentation | |
Quang et al. | Character time-series matching for robust license plate recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20210608 |