CN116523957A - Multi-target tracking method, system, electronic equipment and storage medium - Google Patents

Multi-target tracking method, system, electronic equipment and storage medium

Info

Publication number
CN116523957A
Authority
CN
China
Prior art keywords
target
detection
tracking
container
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310301644.3A
Other languages
Chinese (zh)
Inventor
王军德
张佳琦
叶辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Kotei Informatics Co Ltd
Original Assignee
Wuhan Kotei Informatics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Kotei Informatics Co Ltd filed Critical Wuhan Kotei Informatics Co Ltd
Priority to CN202310301644.3A priority Critical patent/CN116523957A/en
Publication of CN116523957A publication Critical patent/CN116523957A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-target tracking method, system, electronic equipment and storage medium, wherein the method comprises the following steps: creating a target tracking container, performing target detection on image frames acquired by a camera through a CenterNet network, outputting target detection information, and adding the detected targets into the target tracking container; extracting the corresponding feature vectors from the target detection information through principal component analysis; calculating the feature similarity between the detection targets of the current frame and the targets in the target tracking container based on the extracted feature vectors, and performing target matching through the Hungarian algorithm; and updating the targets in the target tracking container in real time based on the target matching result. On the premise of ensuring detection precision, the scheme can reduce the difficulty of constructing and training the tracking model dataset, reduce the occupation of computing resources, and improve the running speed of the model.

Description

Multi-target tracking method, system, electronic equipment and storage medium
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a multi-target tracking method, a multi-target tracking system, electronic equipment and a storage medium.
Background
Multi-target tracking tracks a plurality of targets in a video simultaneously and assigns an ID to each target, finally obtaining the motion trajectory of each target. Multi-target tracking covers content from multiple computing fields, is a key technology in computer vision, and has wide application in directions such as automatic driving, intelligent monitoring and behavior recognition.
Currently, commonly used multi-target tracking methods include: (1) tracking based on target detection, wherein target detection is performed on each frame of image by a target detection algorithm and the targets in the preceding and following frames are then associated, generally implemented with the SORT or DeepSORT algorithm; (2) a framework combining target detection and tracking, in which an embedding branch is added to the target detection part so that the target detection result and the target feature vector are output at the same time, and target matching is then performed to obtain the final result; (3) a multi-target tracking framework based on an attention mechanism, which uses a query-key mechanism to track the targets existing in the current frame while simultaneously completing the detection of new targets.
However, these methods all suffer from major drawbacks. For method (1), the ID-switch phenomenon of the SORT algorithm is serious; DeepSORT extracts a target embedding through a ReID network to construct a similarity matrix, but the training dataset for the ReID network is very difficult to produce and the publicly available datasets are small in scale. For method (2), since the number of targets has no upper limit, a fully connected layer is added to the output layer to cope with the allocation of a large number of target IDs (for example, JDE sets 14455 output nodes), which greatly increases the calculation amount and storage volume of the model and makes the training process difficult. For method (3), the Transformer computation is heavy, so the tracking model runs slowly on devices with slightly poorer performance and is difficult to update in real time.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a multi-target tracking method, system, electronic device and storage medium, which are used for solving the problems of difficult data set creation and model training and large calculation amount of model operation in the existing multi-target tracking.
In a first aspect of an embodiment of the present invention, there is provided a multi-target tracking method, including:
creating a target tracking container, performing target detection on an image frame acquired by a camera through a CenterNet network, outputting target detection information, and adding a detected target into the target tracking container;
extracting corresponding feature vectors in the target detection information through principal component analysis;
calculating the feature similarity of the current frame detection target and the detection target in the target tracking container based on the extracted feature vector, and performing target matching through the Hungarian algorithm;
and updating the target in the target tracking container in real time based on the target matching result.
In a second aspect of an embodiment of the present invention, there is provided a multi-target tracking system in panoramic imaging, including:
the target detection module is used for creating a target tracking container, carrying out target detection on an image frame acquired by a camera through a CenterNet network, outputting target detection information and adding a detected target into the target tracking container;
the feature extraction module is used for extracting corresponding feature vectors in the target detection information through principal component analysis;
the target matching module is used for calculating the feature similarity of the detection target of the current frame and the detection target in the target tracking container based on the extracted feature vector, and performing target matching through the Hungarian algorithm;
and the target updating module is used for updating the target in the target tracking container in real time based on the target matching result.
In a third aspect of the embodiments of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect of the embodiments of the present invention when executing the computer program.
In a fourth aspect of the embodiments of the present invention, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method provided by the first aspect of the embodiments of the present invention.
In the embodiment of the invention, a target tracking container is constructed, the CenterNet network is adopted for target detection, target feature vectors are extracted through principal component analysis, and target matching is performed in combination with the Hungarian algorithm, thereby realizing real-time updating of the targets in the target tracking container. On the premise of ensuring detection precision, the calculation amount of the model is greatly reduced and the consumption of computing resources is reduced. Extracting target features based on principal component analysis reduces the difficulty of feature extraction, avoids the influence of target occlusion, blur and the like, and overcomes the difficulties of dataset production and model training in traditional multi-target tracking.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of a multi-objective tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a multi-target tracking system according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the term "comprising" and similar terms in the description, the claims and the above figures are intended to cover a non-exclusive inclusion, such that a process, method, system or apparatus comprising a series of steps or elements is not limited to the listed steps or elements. Furthermore, "first" and "second" are used to distinguish between different objects and are not used to describe a particular order.
It should be noted that purely vision-based multi-target tracking has problems such as difficult dataset annotation, frequent target occlusion, strong influence of lighting on the camera, the camera being unable to measure target depth, and the target frame being unable to completely match the detected target at different angles.
Methods that perform data association based on IoU do not model the features of the detected target, so ID switches caused by target occlusion can occur; this phenomenon is obvious in dense pedestrian tracking. Meanwhile, changes in obstacle posture affect the detection frame. The traditional Kalman filter may include the length and width of the detection frame in the state variable, but since the variation law of the obstacle posture cannot be accurately modeled, the accuracy of each predicted detection frame is affected, causing accumulated error. In the solution of this embodiment, the state variable does not predict the length and width of the detection frame but focuses on predicting the centre position of the obstacle, and to improve the accuracy of centre-position prediction, CenterNet is used to perform target detection.
Calculating the association matrix with IoU alone generally only considers positional matching and does not consider the features of the detection target. Deep-learning-based ReID can better extract target features, but the network is difficult to train and faces the problem of insufficient training datasets. In this scheme, an unsupervised feature extraction method is adopted: target features are extracted through principal component analysis, and the association matrix is calculated in combination with IoU to realize target similarity calculation.
Referring specifically to fig. 1, a flow chart of a multi-target tracking method according to an embodiment of the present invention includes:
s101, creating a target tracking container, performing target detection on an image frame acquired by a camera through a central Net network, outputting target detection information, and adding a detected target into the target tracking container;
the target tracking container is used for storing target information of the image frames and can update targets in real time. The central Net is a target detector, in this embodiment, the central Net is adopted as an anchor frame-free detector, so that inefficient and complex Anchor operation is removed, performance of a detection algorithm is improved, and running speed of the whole algorithm is further improved after time-consuming non-extremely-inhibited NMS is removed; meanwhile, the central Net outputs the distribution of the detection target center in a hematmap mode, and the loss function also comprises correction of the detection center, so that the accuracy of the detection target center is higher than that of a method based on an anchor frame, and the prediction accuracy of a Kalman filter on the detection target center can be improved.
Preferably, object detection is performed on the image frames captured by the camera through a CenterNet network using an anchor-free framework.
The target detection information can comprise information such as target positions, target count and detection frames; outputting this information facilitates extraction of the feature vectors of the targets.
The target detection information at least comprises the number of detected targets, two-dimensional coordinates of the targets in the image, the size of a detection frame, the confidence of the targets, the types of the targets and the confidence of the types.
For example, target detection information BBoxes is generated with data dimension (n, 7), where n represents the number of detected targets and the 7 dimensions are (x, y, w, h, obj_conf, class_conf, class): x, y are the coordinates of the target centre in the picture, w, h are the width and height of the detection frame, obj_conf and class_conf are the target confidence and class confidence, and class is the class.
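As an illustration, the (n, 7) layout described above might be represented as follows; the concrete values and class indices are hypothetical, not taken from the patent:

```python
import numpy as np

# Hypothetical (n, 7) detection output; values are illustrative only.
# Columns: x, y (box centre), w, h (box size), obj_conf, class_conf, class
bboxes = np.array([
    [320.0, 240.0, 60.0, 120.0, 0.91, 0.88, 0.0],  # e.g. a pedestrian
    [500.0, 260.0, 90.0,  70.0, 0.85, 0.93, 2.0],  # e.g. a vehicle
])

n = bboxes.shape[0]        # number of detected targets
centres = bboxes[:, 0:2]   # (x, y) centres, later fed to the Kalman filter
sizes = bboxes[:, 2:4]     # (w, h) taken directly from the detector output
```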
S102, extracting corresponding feature vectors in target detection information through principal component analysis;
the principal component analysis is to perform feature decomposition on the covariance matrix to obtain principal components (i.e. feature vectors) and weights of the data. Is often used to reduce the dimensionality of the data set and to preserve features in the data set that contribute most to the variance.
Specifically, the target detection frames are scaled to a uniform length and width to obtain an m-row matrix X, and the covariance of X is calculated according to the formula:

Σ = (1/m) · sum_{i=1..m} (X_i − X̄)^T (X_i − X̄)

where Σ represents the covariance, m represents the number of rows of the matrix, i is the counting variable, X represents the feature matrix, and X̄ represents the mean of the feature matrix;

the eigenvectors of the covariance matrix are then calculated, and X is projected onto the top k eigenvectors to obtain an m×k feature matrix of the target detection information, representing the target feature vectors.
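A minimal sketch of this principal-component step, assuming the scaled detection patches have been flattened into the rows of X; the function name and the test data are illustrative, not from the patent:

```python
import numpy as np

def pca_features(X, k):
    """Project the m-row sample matrix X onto its top-k principal
    components, yielding an (m, k) feature matrix as described above."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)          # subtract the column means
    cov = np.cov(X_centered, rowvar=False)   # covariance of the columns
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    top_k = eigvecs[:, order[:k]]            # top-k eigenvectors
    return X_centered @ top_k                # (m, k) projection

# e.g. m = 4 scaled detection patches flattened into 6-dimensional rows
X = np.arange(24, dtype=float).reshape(4, 6)
features = pca_features(X, k=2)
```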
S103, calculating the feature similarity of the detection target of the current frame and the detection target in the target tracking container based on the extracted feature vector, and performing target matching through a Hungary algorithm;
and respectively acquiring the feature vector of the detection target in the current image frame and the feature vector of the target in the target tracking container, and calculating the feature similarity of the detection target and the feature vector. Wherein, the similarity matrix is calculated by alpha IoU + (1-) corr, and alpha is the weight. The similarity includes location and object feature information and takes a value between [0,1 ].
Wherein the detection target in the target tracking container is predicted by Kalman filtering.

The Kalman filter state is s = [x, y, v_x, v_y]^T, representing the coordinates of the centre point in the image and their rates of change. Using a uniform (constant-velocity) model, the following prediction equation can be established:

s_k = A·s_{k−1} + ω

where A represents the state transition matrix, which for a frame interval dt takes the form:

A =
[1 0 dt 0]
[0 1 0 dt]
[0 0 1  0]
[0 0 0  1]

ω represents the process noise with variance Q. Denoting the variance of the predicted value by P_k, the variance calculation formula is:

P_k = A·P_{k−1}·A^T + Q
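A minimal sketch of one Kalman prediction step under the uniform (constant-velocity) motion model; the frame interval and noise values are illustrative assumptions, since the patent does not state them:

```python
import numpy as np

dt = 1.0  # frame interval (assumed; the patent does not state it)

# Constant-velocity model over the state s = [x, y, vx, vy]^T
A = np.array([[1.0, 0.0,  dt, 0.0],
              [0.0, 1.0, 0.0,  dt],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
Q = np.eye(4) * 0.01  # process-noise covariance (illustrative value)

def kalman_predict(s, P):
    """One prediction step: s_k = A s_{k-1}, P_k = A P_{k-1} A^T + Q."""
    return A @ s, A @ P @ A.T + Q

# a target centred at (100, 50), moving 2 px right and 1 px up per frame
s, P = np.array([100.0, 50.0, 2.0, -1.0]), np.eye(4)
s_pred, P_pred = kalman_predict(s, P)  # centre predicted at (102, 49)
```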
The Hungarian algorithm is used to match the targets of the image frame with the targets in the target tracking container to obtain the matching result between them.
Optionally, for a matched detection target, the centre of the detection result is used as an observation value for updating the Kalman filter;

for an unmatched new detection target, the new target is added to the target tracking container;

and for a detection target that is unmatched but has a history record, it is judged whether its unmatched time exceeds a preset value; if so, the target is deleted from the target tracking container, and if not, it continues to be predicted by the Kalman filter.
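The three update rules above can be sketched with the Hungarian algorithm as provided by SciPy's `linear_sum_assignment`; the track dictionary structure, the similarity threshold and the missed-frame limit are illustrative assumptions rather than values from the patent:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_update(sim_matrix, tracks, detections,
                     sim_threshold=0.3, max_missed=5):
    """Hungarian matching over a tracks-by-detections similarity matrix,
    followed by the three update rules described above. `tracks` is a
    list of dicts with 'state' and 'missed' keys (illustrative layout)."""
    cost = 1.0 - sim_matrix                    # maximise similarity
    rows, cols = linear_sum_assignment(cost)

    matched_tracks, matched_dets = set(), set()
    for r, c in zip(rows, cols):
        if sim_matrix[r, c] >= sim_threshold:  # accept good matches only
            tracks[r]['state'] = detections[c]  # observation updates track
            tracks[r]['missed'] = 0
            matched_tracks.add(r)
            matched_dets.add(c)

    # unmatched existing tracks age; drop those missed for too long
    kept = []
    for r, trk in enumerate(tracks):
        if r not in matched_tracks:
            trk['missed'] += 1
        if trk['missed'] <= max_missed:
            kept.append(trk)

    # unmatched detections open new tracks
    for c, det in enumerate(detections):
        if c not in matched_dets:
            kept.append({'state': det, 'missed': 0})
    return kept

tracks = [{'state': 'a', 'missed': 0}, {'state': 'b', 'missed': 4}]
detections = ['d0', 'd1']
sim = np.array([[0.9, 0.1], [0.2, 0.05]])
kept = match_and_update(sim, tracks, detections, max_missed=4)
```

Here track 0 matches detection 0, track 1 is dropped after exceeding the missed-frame limit, and detection 1 opens a new track.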
S104, updating the target in the target tracking container in real time based on the target matching result.
Specifically, if a matched target exists in the target tracking container, the target is updated according to the observed value of the currently detected target; if no matched target exists, the target is updated based on the Kalman filter prediction.
In this embodiment, the problem that features of the tracking target cannot be extracted is overcome by principal component analysis, and the ID-switch phenomenon of the SORT framework can be alleviated by feature + IoU matching. The SORT framework cannot extract target features, so when an occluded target reappears it is detected as a new obstacle; by extracting target features through principal component analysis and comparing the extracted features with the history record, the problem of tracking a briefly occluded target can be solved.
In the multi-target tracking problem, the length and width precision of the detection frame is not critical and a small error is allowable, while accurately estimating the centre position of the tracked target is what reduces the calculation amount of the model. Therefore, prediction of detection-frame changes is removed from the Kalman filtering stage and only the centre position is corrected. Using the length and width of the detector's predicted frame directly as the result is precise enough for use; this reduces the dimension of the Kalman prediction model on the premise of ensuring usable precision and improves the operation speed.
Therefore, although the target detection precision of this embodiment is slightly lower than that of a traditional deep learning model, the computing power and storage space consumed are far smaller than those of conventional models, and the running speed can be greatly improved.
It should be understood that the sequence number of each step in the above embodiment does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not be construed as limiting the implementation process of the embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a multi-target tracking system according to an embodiment of the present invention, where the system includes:
the target detection module 210 is configured to create a target tracking container, perform target detection on an image frame acquired by a camera through a centrnet network, output target detection information, and add a detected target to the target tracking container;
wherein, the image frame collected by the camera is subject to object detection through a CenterNet network adopting an anchor-free framework.
The target detection information at least comprises the number of detected targets, two-dimensional coordinates of the targets in the image, the size of a detection frame, the confidence of the targets, the types of the targets and the confidence of the types.
The feature extraction module 220 is configured to extract a corresponding feature vector in the target detection information through principal component analysis;
specifically, the extracting the corresponding feature vector in the target detection information through principal component analysis includes:
the length and width of the target detection frame are scaled to obtain m rows of matrix X, and X covariance is calculated according to a formula:
where Σ represents covariance, m represents matrix line number, i is count variable, X represents feature matrix,representing the average value of the feature matrix;
and calculating a feature matrix of covariance, taking the first k column vectors as feature matrices of m multiplied by k shapes of the target detection information, and representing the target feature vectors.
The target matching module 230 is configured to calculate the feature similarity between the current frame detection target and the detection target in the target tracking container based on the extracted feature vector, and perform target matching through the Hungarian algorithm;
optionally, the object matching module 230 further includes:
and the target prediction module is used for predicting the detection target in the target tracking container through kalman filtering.
Preferably, the target matching by the Hungarian algorithm includes:
for a matched detection target, using the centre of the detection result as an observation value for updating the Kalman filter;

for an unmatched new detection target, adding the new target to the target tracking container;

and for a detection target that is unmatched but has a history record, judging whether its unmatched time exceeds a preset value; if so, deleting the target from the target tracking container, and if not, continuing to predict it through the Kalman filter.
The target updating module 240 is configured to update the target in the target tracking container in real time based on the target matching result.
If a matched target exists in the target tracking container, the target is updated according to the observed value of the currently detected target; if not, the target is updated based on the Kalman filter prediction.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and module may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device is used for multi-target detection and tracking. As shown in fig. 3, the electronic apparatus 3 of this embodiment includes: a memory 310, a processor 320 and a system bus 330, the memory 310 including an executable program 3101 stored thereon. It will be understood by those skilled in the art that the electronic device structure shown in fig. 3 does not limit the electronic device, which may include more or fewer components than illustrated, combine certain components, or have a different arrangement of components.
The following describes the respective constituent elements of the electronic device in detail with reference to fig. 3:
the memory 310 may be used to store software programs and modules, and the processor 320 may execute various functional applications and data processing of the electronic device by executing the software programs and modules stored in the memory 310. The memory 310 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device (such as cache data), and the like. In addition, memory 310 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
An executable program 3101 containing network request methods on the memory 310, the executable program 3101 may be partitioned into one or more modules/units stored in the memory 310 and executed by the processor 320 to implement multi-objective tracking and the like, the one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions for describing the execution of the computer program 3101 in the electronic device 3. For example, the computer program 3101 may be divided into functional modules such as a target detection module, a feature extraction module, a target matching module, and a target updating module.
Processor 320 is a control center of the electronic device that utilizes various interfaces and lines to connect various portions of the overall electronic device, perform various functions of the electronic device and process data by running or executing software programs and/or modules stored in memory 310, and invoking data stored in memory 310, thereby performing overall condition monitoring of the electronic device. Optionally, processor 320 may include one or more processing units; preferably, the processor 320 may integrate an application processor that primarily handles operating systems, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 320.
The system bus 330 is used to connect various functional components inside the computer and can transfer data information, address information and control information; its type may be, for example, a PCI bus, an ISA bus or a CAN bus. Instructions from the processor 320 are transferred to the memory 310 through the bus, the memory 310 feeds back data to the processor 320, and the system bus 330 is responsible for data and instruction interaction between the processor 320 and the memory 310. Of course, the system bus 330 may also connect other devices, such as a network interface, a display device, etc.
In an embodiment of the present invention, the executable program executed by the processor 320 included in the electronic device includes:
creating a target tracking container, performing target detection on an image frame acquired by a camera through a CenterNet network, outputting target detection information, and adding a detected target into the target tracking container;
extracting corresponding feature vectors in the target detection information through principal component analysis;
calculating the feature similarity of the current frame detection target and the detection target in the target tracking container based on the extracted feature vector, and performing target matching through the Hungarian algorithm;
and updating the target in the target tracking container in real time based on the target matching result.
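The detection-to-track association step above can be sketched as follows. This is an illustrative Python implementation, not code from the patent: the cost matrix is built from cosine similarity between feature vectors, and scipy's `linear_sum_assignment` provides the Hungarian algorithm; the similarity threshold is an assumed parameter.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_targets(track_feats, det_feats, sim_threshold=0.5):
    """Return (matched (track, det) pairs, unmatched track indices,
    unmatched detection indices)."""
    if len(track_feats) == 0 or len(det_feats) == 0:
        return [], list(range(len(track_feats))), list(range(len(det_feats)))
    t = np.asarray(track_feats, dtype=float)
    d = np.asarray(det_feats, dtype=float)
    # Cosine-similarity matrix, rows = tracked targets, columns = detections
    sim = (t @ d.T) / (np.linalg.norm(t, axis=1)[:, None]
                       * np.linalg.norm(d, axis=1)[None, :] + 1e-12)
    # Hungarian algorithm minimises cost, so use 1 - similarity as cost
    rows, cols = linear_sum_assignment(1.0 - sim)
    matches = [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= sim_threshold]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_t = [i for i in range(len(t)) if i not in matched_t]
    unmatched_d = [j for j in range(len(d)) if j not in matched_d]
    return matches, unmatched_t, unmatched_d
```

Pairs below the similarity threshold are treated as unmatched rather than forced, so a poor assignment does not corrupt a track.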
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and modules described above, reference may be made to the corresponding processes in the foregoing method embodiments; they are not repeated here.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described or illustrated in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-target tracking method, comprising:
creating a target tracking container, performing target detection on an image frame acquired by a camera through a CenterNet network, outputting target detection information, and adding a detected target into the target tracking container;
extracting corresponding feature vectors in the target detection information through principal component analysis;
calculating the feature similarity between the detection targets of the current frame and the detection targets in the target tracking container based on the extracted feature vectors, and performing target matching through the Hungarian algorithm;
and updating the target in the target tracking container in real time based on the target matching result.
2. The method according to claim 1, wherein performing target detection on the image frames acquired by the camera through the CenterNet network comprises:
performing target detection on the image frames acquired by the camera through a CenterNet network employing an anchor-free framework.
3. The method according to claim 1, wherein the target detection information includes at least the number of detected targets, the two-dimensional coordinates of each target in the image, the size of the detection frame, the target confidence, the target category, and the category confidence.
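The detection information enumerated in claim 3 could be held in a structure like the following; all field names are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    cx: float        # x coordinate of the detection-frame center, in image pixels
    cy: float        # y coordinate of the detection-frame center, in image pixels
    w: float         # detection-frame width
    h: float         # detection-frame height
    obj_conf: float  # target (objectness) confidence
    category: str    # target category, e.g. "car" or "pedestrian"
    cls_conf: float  # category confidence
```

The number of detected targets is then simply the length of the per-frame list of `Detection` objects.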
4. The method according to claim 1, wherein extracting the corresponding feature vector in the target detection information by principal component analysis includes:
the length and width of the target detection frame are scaled to obtain an m-row feature matrix X, and the covariance of X is calculated according to the formula
Σ = (1/m) · Σ_{i=1}^{m} (X_i − X̄)(X_i − X̄)^T
where Σ represents the covariance, m represents the number of rows of the matrix, i is a counting variable, X represents the feature matrix, and X̄ represents the mean value of the feature matrix;
and the eigenvectors of the covariance matrix are calculated, and the first k column vectors are taken to form an m×k feature matrix of the target detection information, representing the target feature vectors.
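A minimal numpy sketch of the principal-component extraction in claim 4, under the assumption that the scaled detection-frame features are stacked into an m-row matrix X; the function name and the choice of k are illustrative.

```python
import numpy as np


def pca_features(X, k):
    """Project the m-row feature matrix X onto its first k principal
    components, returning an m x k feature matrix."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    centred = X - X.mean(axis=0)
    # Covariance: (1/m) * sum_i (X_i - mean)(X_i - mean)^T
    cov = centred.T @ centred / m
    # eigh returns eigenvalues in ascending order for a symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # first k components
    return X @ top
```

Projecting onto the leading eigenvectors keeps the directions of largest variance, so the reduced vectors remain discriminative for the later similarity computation.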
5. The method according to claim 1, wherein calculating the feature similarity between the detection targets of the current frame and the detection targets in the target tracking container based on the extracted feature vectors further comprises:
predicting the detection targets in the target tracking container by Kalman filtering.
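The Kalman prediction step can be illustrated with a constant-velocity filter over the detection-frame center; the state layout and the noise values below are assumed defaults, not parameters from the patent.

```python
import numpy as np


class SimpleKalman:
    """Constant-velocity Kalman filter over the box center.
    State vector: [cx, cy, vx, vy]."""

    def __init__(self, cx, cy):
        self.x = np.array([cx, cy, 0.0, 0.0])   # initial state, zero velocity
        self.P = np.eye(4) * 10.0               # state covariance (assumed)
        self.F = np.array([[1, 0, 1, 0],        # transition, dt = 1 frame
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.Q = np.eye(4) * 0.01               # process noise (assumed)

    def predict(self):
        # Propagate the state and its covariance one frame forward
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                       # predicted center (cx, cy)
```

The predicted center stands in for a target's position in frames where no detection is matched to it.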
6. The method according to claim 1, characterized in that the target matching through the Hungarian algorithm comprises:
for a matched detection target, using the center of the detection result as the observation value for updating the Kalman filter;
for an unmatched new detection target, adding the new detection target to the target tracking container;
and for a detection target that is unmatched but has a history record, judging whether the time for which the target has been unmatched exceeds a preset value; if it does, the target is deleted from the target tracking container, and if it does not, the target is predicted by the Kalman filter.
7. The method according to claim 1, wherein updating the targets in the target tracking container in real time comprises:
if a matched target exists in the target tracking container, updating the target according to the observed value of the current detection target; if no matched target exists, updating the target based on the Kalman filter prediction.
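The container-update rules of claims 6 and 7 can be sketched as follows; the track representation and the `max_age` value are illustrative assumptions, and a full tracker would also run the Kalman predict/update on each track here.

```python
class TrackContainer:
    """Minimal target tracking container with the lifecycle rules:
    matched -> update, new -> add, unmatched too long -> delete."""

    def __init__(self, max_age=5):
        self.tracks = {}     # track_id -> {"centre": (x, y), "misses": int}
        self.max_age = max_age
        self._next_id = 0

    def step(self, matched, new_centres):
        """matched: dict track_id -> observed center of the matched detection;
        new_centres: centers of detections with no matching track."""
        touched = set()
        for tid, centre in matched.items():
            # Matched target: the detection center is the observation
            self.tracks[tid] = {"centre": centre, "misses": 0}
            touched.add(tid)
        for centre in new_centres:
            # Unmatched new detection: add it to the container
            self.tracks[self._next_id] = {"centre": centre, "misses": 0}
            touched.add(self._next_id)
            self._next_id += 1
        for tid in list(self.tracks):
            if tid not in touched:
                # Unmatched target with history: age it, delete when stale
                self.tracks[tid]["misses"] += 1
                if self.tracks[tid]["misses"] > self.max_age:
                    del self.tracks[tid]
                # otherwise its center would be Kalman-predicted here
```

Calling `step` once per frame with the output of the Hungarian matching keeps the container synchronized with the scene.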
8. A multi-target tracking system, comprising:
the target detection module is used for creating a target tracking container, carrying out target detection on an image frame acquired by a camera through a CenterNet network, outputting target detection information and adding a detected target into the target tracking container;
the feature extraction module is used for extracting corresponding feature vectors in the target detection information through principal component analysis;
the target matching module is used for calculating the feature similarity between the detection targets of the current frame and the detection targets in the target tracking container based on the extracted feature vectors, and performing target matching through the Hungarian algorithm;
and the target updating module is used for updating the target in the target tracking container in real time based on the target matching result.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the multi-target tracking method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed, implements the steps of the multi-target tracking method according to any one of claims 1 to 7.
CN202310301644.3A 2023-03-24 2023-03-24 Multi-target tracking method, system, electronic equipment and storage medium Pending CN116523957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310301644.3A CN116523957A (en) 2023-03-24 2023-03-24 Multi-target tracking method, system, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116523957A true CN116523957A (en) 2023-08-01

Family

ID=87405482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310301644.3A Pending CN116523957A (en) 2023-03-24 2023-03-24 Multi-target tracking method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116523957A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912508A (en) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 Target tracking method and device for multimedia data


Similar Documents

Publication Publication Date Title
CN109559320B (en) Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network
CN112669349B (en) Passenger flow statistics method, electronic equipment and storage medium
WO2016034008A1 (en) Target tracking method and device
CN113963445B (en) Pedestrian falling action recognition method and equipment based on gesture estimation
CN111523447B (en) Vehicle tracking method, device, electronic equipment and storage medium
CN111627050B (en) Training method and device for target tracking model
CN108182695B (en) Target tracking model training method and device, electronic equipment and storage medium
CN109165309B (en) Negative example training sample acquisition method and device and model training method and device
CN113628244B (en) Target tracking method, system, terminal and medium based on label-free video training
WO2022142417A1 (en) Target tracking method and apparatus, electronic device, and storage medium
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN110738650B (en) Infectious disease infection identification method, terminal device and storage medium
CN115345905A (en) Target object tracking method, device, terminal and storage medium
CN112634368A (en) Method and device for generating space and OR graph model of scene target and electronic equipment
CN113240718A (en) Multi-target identification and tracking method, system, medium and computing device
CN111091101A (en) High-precision pedestrian detection method, system and device based on one-step method
CN112634369A (en) Space and or graph model generation method and device, electronic equipment and storage medium
CN112802108A (en) Target object positioning method and device, electronic equipment and readable storage medium
CN116523957A (en) Multi-target tracking method, system, electronic equipment and storage medium
CN112634316A (en) Target tracking method, device, equipment and storage medium
CN112036381A (en) Visual tracking method, video monitoring method and terminal equipment
CN113256683B (en) Target tracking method and related equipment
CN114168768A (en) Image retrieval method and related equipment
CN112884804A (en) Action object tracking method and related equipment
CN112734800A (en) Multi-target tracking system and method based on joint detection and characterization extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination