CN113780064A - Target tracking method and device

Info

Publication number
CN113780064A
CN113780064A (application CN202110852187.8A)
Authority
CN
China
Prior art keywords
tracking
detection
current frame
state quantity
tracking object
Prior art date
Legal status
Pending
Application number
CN202110852187.8A
Other languages
Chinese (zh)
Inventor
张海鸣
曹彤彤
刘冰冰
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202110852187.8A
Publication of CN113780064A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The application discloses a target tracking method and device in the field of artificial intelligence, used to determine efficiently and accurately whether to output a tracked target by taking the detector's performance into account, improving tracking efficiency and tracking accuracy. The method comprises the following steps: acquiring at least one detection object in a current frame through a detector, wherein the current frame is any frame in the input data; acquiring a tracking object, the tracking object including an object detected by the detector in a frame preceding the current frame; matching the tracking object with the at least one detection object to obtain a matching result; and determining a first state quantity of the tracking object according to the matching result and a performance map of the detector, wherein the first state quantity indicates whether the tracking object is to be output or terminated, and the performance map comprises the detection accuracy of the detector in a plurality of grids within its detection range.

Description

Target tracking method and device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a target tracking method and apparatus.
Background
Multi-object tracking (MOT) is a key task in many important scenarios, such as autonomous driving, where it establishes associations for obstacle objects across consecutive frames. When object detection fails on a frame, tracking can continue to maintain the output of a tracked object; it can also mitigate detection false positives to some extent, compensating for the shortcomings of frame-by-frame detection. In addition, multi-object tracking yields target motion state information, which provides important input for higher-level tasks such as intention recognition and behavior prediction in autonomous driving.
In some target tracking schemes, a detector performs target detection frame by frame to produce a sequence of detected targets; data association, motion state estimation, and tracking management then associate the detections of consecutive frames and estimate their states, and an optimal sequence of tracked objects is output. However, the detector may produce false or missed detections, so a detected object is usually not output immediately but only after a certain delay. The efficiency of outputting tracked objects is therefore low, and efficiently and accurately deciding whether to output a tracked object becomes an urgent problem to be solved.
Disclosure of Invention
The application provides a target tracking method and a target tracking device, which take the detector's performance into account to determine efficiently and accurately whether to output a tracked target, improving tracking efficiency and tracking accuracy.
In view of the above, in a first aspect, the present application provides a target tracking method, including: acquiring at least one detection object in a current frame through a detector, wherein the current frame is any frame in the input data; acquiring a tracking object, the tracking object including an object detected by the detector in a frame preceding the current frame; matching the tracking object with the at least one detection object to obtain a matching result; and determining a first state quantity of the tracking object according to the matching result and a performance map of the detector, wherein the first state quantity indicates whether the tracking object is to be output or terminated, and the performance map comprises the detection accuracy of the detector in a plurality of grids within its detection range.
Therefore, in this embodiment of the application, by taking into account the detector's detection accuracy in each region, the state quantity of the tracked object can be computed more accurately: the confidence of a detected object can be derived from the detector's accuracy, the validity of the tracked object can be determined efficiently, and whether to output the tracked object can be decided accurately. Output delay is thereby reduced, and the efficiency of deciding whether to output the tracked object is improved.
In a possible implementation, determining the first state quantity of the tracking object according to the matching result and the performance map of the detector may include: determining position information of the tracking object in the current frame according to the matching result; querying the performance map for the detection accuracy corresponding to the tracking object based on the position information of the tracking object in the current frame; and calculating the first state quantity according to the detection accuracy.
In this embodiment, the position of the tracked object can be determined from the matching result between the tracked object and the detection objects, the corresponding detection accuracy can then be queried in the performance map according to that position, and the confidence that the tracked object is genuinely present can be inferred from it. The state quantity of the tracked object can thus be determined more accurately from the detection accuracy, which reduces the delay in deciding whether to output or terminate the tracked object and makes that decision more efficient.
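For illustration only, the following sketch shows one way such a performance-map lookup could be realized; the grid layout, cell size, class keys, and the PerformanceMap container are assumptions made for this example and are not fixed by the embodiments:

    from typing import Dict, Tuple

    class PerformanceMap:
        """Per-grid detection accuracy, keyed by grid cell and object class."""

        def __init__(self, cell_size: float,
                     accuracy: Dict[Tuple[int, int, str], float]):
            self.cell_size = cell_size   # side length of one grid cell, in meters
            self.accuracy = accuracy     # (grid_x, grid_y, class) -> accuracy in [0, 1]

        def lookup(self, x: float, y: float, cls: str, default: float = 0.5) -> float:
            # Quantize the tracked object's position into a grid cell and return
            # the detector's measured accuracy for that cell and class.
            cell = (int(x // self.cell_size), int(y // self.cell_size), cls)
            return self.accuracy.get(cell, default)

    # Query the accuracy at the tracked object's position in the current frame.
    pmap = PerformanceMap(10.0, {(0, 0, "car"): 0.93, (3, 1, "car"): 0.71})
    acc = pmap.lookup(x=35.2, y=12.8, cls="car")   # falls in cell (3, 1) -> 0.71

Keying the map by class as well as by cell corresponds to the category-aware variant described further below.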
In a possible implementation, determining the position information of the tracking object in the current frame according to the matching result may include: if no detection object in the at least one detection object matches the tracking object, determining the position information of the tracking object in the current frame according to the motion state information of the tracking object; and if the tracking object matches a first detection object in the at least one detection object, taking the position information of the first detection object as the position information of the tracking object in the current frame.
Therefore, in this embodiment of the application, if the tracking object matches a detection object, that is, the two may be the same object, the position of the detection object can be used as the position of the tracking object, realizing the tracking of the tracking object; the corresponding detection accuracy is then queried in the performance map of the detector, so that the state quantity indicating whether to output or terminate the tracking object is calculated more accurately, a more accurate state quantity is obtained, and the decision is made more efficiently. Further, when the tracking object is occluded, its position can be predicted, and when the detection accuracy is determined based on the predicted position, the tracking object can still be tracked for a period while occluded, so that it is not lost.
In a possible implementation, determining the position information of the tracking object in the current frame according to the motion state information of the tracking object may include: acquiring a predicted position of the tracking object in the current frame according to its motion state information; calculating, from the predicted position, a predicted distance value between the tracking object in the current frame and the acquisition device that collected the current frame; acquiring, from the input data, an actual distance value between the predicted position in the current frame and the acquisition device; and if the difference between the predicted distance value and the actual distance value is greater than a first threshold, taking the predicted position as the position information of the tracking object in the current frame.
Therefore, in this embodiment of the application, when the tracking object is occluded, the predicted distance between the tracking object at its predicted position and the acquisition device can be compared with the actual distance between whatever the acquisition device measures at that position and the acquisition device. Whether the tracking object is occluded can be determined from the difference between the two: if the difference is greater than a certain value, the tracking object is occluded by an obstacle, and the data collected at the predicted position in the current frame belongs to the obstacle. The tracking object can then be tracked based on the predicted position, avoiding track loss caused by temporary occlusion.
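A minimal sketch of this occlusion test follows; the coordinate convention and the value of the first threshold are assumptions made for the example:

    import math

    def is_occluded(predicted_pos, sensor_pos, measured_range, threshold):
        """Return True if the tracked object appears occluded at its predicted position.

        predicted_pos, sensor_pos: (x, y) positions in a common frame.
        measured_range: the distance the sensor actually measures along the
                        ray toward predicted_pos (e.g., from the point cloud).
        """
        predicted_range = math.dist(predicted_pos, sensor_pos)
        # If the sensor returns something much closer than the prediction, an
        # obstacle is blocking the line of sight to the tracked object.
        return (predicted_range - measured_range) > threshold

    # A nearer obstacle at 12 m hides an object predicted at 30 m.
    occluded = is_occluded((30.0, 0.0), (0.0, 0.0), measured_range=12.0,
                           threshold=2.0)   # True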
In one possible implementation, the first state quantity includes a first output indication state quantity indicating whether the tracking object is output in the current frame and a first termination indication state quantity indicating whether the tracking object is terminated in the current frame. Calculating the first state quantity according to the detection accuracy may include: acquiring a direct observation quantity indicating the motion state of the tracking object, which may include information such as its measured movement speed and movement direction; calculating, from the detection accuracy and the direct observation quantity, a first posterior probability and a second posterior probability of the tracking object in the current frame, the first posterior probability representing the probability of outputting the tracking object in the current frame and the second posterior probability representing the probability of terminating the tracking object in the current frame; and obtaining the first output indication state quantity based on the first posterior probability and the first termination indication state quantity based on the second posterior probability.
Therefore, in this embodiment of the application, whether to output or terminate the tracking object can be represented by the first output indication state quantity and the first termination indication state quantity respectively. Both can be calculated from the motion state of the tracking object together with the detection accuracy of the detector, so they can be calculated more accurately, and whether to output or terminate the tracking object can be known more accurately.
In a possible implementation, obtaining the first output indication state quantity based on the first posterior probability and the first termination indication state quantity based on the second posterior probability may include: acquiring a second output indication state quantity and a second termination indication state quantity of the tracking object in the previous frame, the second output indication state quantity indicating whether the tracking object was output in the previous frame and the second termination indication state quantity indicating whether the tracking object was terminated in the previous frame; fusing the first posterior probability with the second output indication state quantity to obtain the first output indication state quantity; and fusing the second posterior probability with the second termination indication state quantity to obtain the first termination indication state quantity.
In this embodiment of the application, the state quantities calculated when tracking the previous frame can thus be carried forward, so that the tracking object is tracked iteratively and the decision whether to output or terminate it is made in time; for example, a recursive Bayesian inference tracking scheme improves the efficiency of deciding whether to output or terminate the tracking object.
In one possible embodiment, the first posterior probability calculated when no detection object in the at least one detection object matches the tracking object is less than the first posterior probability calculated when the tracking object matches the first detection object in the at least one detection object; and the second posterior probability calculated when no detection object matches the tracking object is greater than the second posterior probability calculated when the tracking object matches the first detection object in the at least one detection object.
Therefore, when no detection object matching the tracking object is detected, or the tracking object is occluded, the output indication state quantity of the tracking object receives a negative gain, i.e., the confidence in outputting the tracking object decreases, while the termination indication state quantity receives a positive gain, i.e., the confidence in terminating the tracking object increases, so that false detections are not kept alive.
In one possible implementation, the detection accuracy of each grid in the performance map includes a corresponding accuracy for each of a plurality of categories. Querying the performance map for the detection accuracy corresponding to the tracking object based on its position information in the current frame may then include: acquiring the category of the tracking object; and querying the performance map for the detection accuracy corresponding to the tracking object based on both the position information of the tracking object in the current frame and the category of the tracking object.
Therefore, in this embodiment of the application, the performance map of the detector can record the detector's accuracy per area and per category, so that a more precise detection accuracy can be queried by combining the position and category of the tracking object. A more accurate state quantity can then be calculated from this accuracy, improving the efficiency of deciding whether to output or terminate the tracking object.
In a possible implementation, the method may further include: and if the at least one detection object comprises a second detection object which is not matched with the tracking object, taking the second detection object as a new tracking object, and tracking the new tracking object in the next frame of the current frame.
In the embodiment of the application, if a new detection object is detected, the detection object is used as a new tracking object, so that the new tracking object is updated in the next frame, and the detected object can be tracked in time.
In a possible implementation, before acquiring the at least one detection object in the current frame through the detector, the method may further include: dividing the detection range of the detector into a plurality of grids; acquiring truth data, the truth data including collected data gathered by the acquisition device and information of the corresponding truth objects; detecting the collected data with the detector to obtain information of predicted objects; and calculating the detection accuracy corresponding to each of the plurality of grids from the predicted objects and the truth objects, to obtain the performance map.
Therefore, in this embodiment of the application, the performance of the detector can be encoded offline in advance to obtain the performance map, so that during target tracking the accuracy corresponding to the tracking object can be looked up quickly, improving the efficiency of deciding whether to output or terminate the tracking object.
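As an illustration of this offline encoding, the sketch below accumulates per-grid recall from annotated truth data; the run_detector callable, the matching tolerance, and the cell size are assumptions for the example:

    from collections import defaultdict

    def build_performance_map(truth_frames, run_detector,
                              cell_size=10.0, match_dist=2.0):
        hits = defaultdict(int)    # grid cell -> truth objects the detector found
        totals = defaultdict(int)  # grid cell -> all truth objects in that cell

        for frame in truth_frames:
            detections = run_detector(frame["data"])      # list of (x, y) predictions
            for tx, ty in frame["truth_objects"]:         # annotated truth positions
                cell = (int(tx // cell_size), int(ty // cell_size))
                totals[cell] += 1
                if any(abs(dx - tx) <= match_dist and abs(dy - ty) <= match_dist
                       for dx, dy in detections):
                    hits[cell] += 1

        # Per-grid detection accuracy, expressed here as recall; precision or
        # average precision could be accumulated in the same way.
        return {cell: hits[cell] / totals[cell] for cell in totals}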
In a second aspect, an embodiment of the present application provides a target tracking apparatus having the functions to implement the target tracking method of the first aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In a third aspect, an embodiment of the present application provides a target tracking apparatus, including: a processor and a memory, wherein the processor and the memory are interconnected by a line, and the processor calls the program code in the memory for executing the processing-related function in the target tracking method according to any one of the first aspect. Alternatively, the target tracking device may be a chip.
In a fourth aspect, embodiments of the present application provide an object tracking apparatus, which may also be referred to as a digital processing chip or chip, where the chip includes a processing unit and a communication interface, and the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, and the processing unit is configured to execute functions related to processing in the first aspect or any one of the optional implementations of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the method in the first aspect or any optional implementation manner of the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the method of the first aspect or any of the optional embodiments of the first aspect.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence body framework for use in the present application;
FIG. 2 is a system architecture diagram provided herein;
fig. 3 is a schematic view of an application scenario of a target tracking method provided in the present application;
fig. 4 is a schematic flowchart of a target tracking method provided in the present application;
FIG. 5 is a schematic diagram of an application framework provided herein;
FIG. 6 is a schematic flow chart of detector performance coding provided herein;
FIG. 7 is a schematic flow chart illustrating target tracking according to the present disclosure;
FIG. 8 is a schematic diagram of an occlusion analysis method provided herein;
fig. 9 is a state quantity diagram of Bayesian tracking provided by the present application;
FIG. 10 is a schematic diagram of a target tracking device according to the present application;
FIG. 11 is a schematic diagram of another object tracking device provided in the present application;
fig. 12 is a schematic structural diagram of a chip provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The general workflow of an artificial intelligence system is described first. Referring to fig. 1, which shows a schematic diagram of the main artificial intelligence framework, the framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The intelligent information chain reflects the sequence of processes from data acquisition onward, in general terms: intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. Along this chain, data undergoes a refinement process of "data-information-knowledge-wisdom". The IT value chain reflects the value artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and is supported by a base platform. Communication with the outside is achieved through sensors; computing power is provided by intelligent chips, such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), or other hardware acceleration chips; the base platform includes platform guarantees and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data is provided for computation to the intelligent chips in the distributed computing system provided by the base platform.
(2) Data
Data at the layer above the infrastructure represents the data sources of the artificial intelligence field. The data involves graphics, images, speech, and text, as well as Internet-of-Things data from traditional devices, including service data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference means a process of simulating an intelligent human inference mode in a computer or an intelligent system, using formalized information to think about and solve a problem by a machine according to an inference control strategy, and a typical function is searching and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications are the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision making, and realize practical applications. Application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and the like.
First, the system architecture to which the target tracking method provided by the present application applies is introduced by way of example. Referring to fig. 2, a schematic diagram of the system architecture provided by the present application, the system may include an acquisition device 201 and an execution device 202.
The acquisition device 201 may include a device for acquiring information including depth, such as a laser radar, a millimeter wave radar, an image sensor, an infrared sensor, and the like. The collection device may transmit the collected data to the execution device.
The execution device 202 may detect information of objects included in the data collected by the collection device 201 based on the data collected by the collection device 201, track each object in the data collected by the collection device 201, and output each tracked object.
For example, a multi-object tracking (MOT) algorithm can establish association relationships for a target across consecutive frames and basically follows a tracking-by-detection framework: a detector performs target detection frame by frame and outputs a sequence of detected targets as the input of a multi-target tracking module; the multi-target tracking module then completes the data association and state estimation of detected targets across consecutive frames through data association, motion state estimation, tracking management, and the like; finally, the tracking management module outputs an optimal sequence of tracked objects.
However, tracking management, as the output interface between the multi-target tracking module and downstream modules, must not only maintain the internal sequence of tracked objects but also decide which reliable tracked objects to output externally. In general, the detector may produce missed detections (false negatives, FN) and false detections (false positives, FP), so the tracking module cannot directly output all tracked objects; it should have a certain delayed-output mechanism to reduce the false and missed detections caused by the detector's uncertainty.
On the other hand, on top of this delayed output, users usually expect a real tracked object to be generated and output as soon as possible, and a falsely detected target to be terminated as soon as possible, so that an optimal sequence of tracked objects is output and tracking performance is guaranteed. Moreover, the tracking management module must also cope with missed detections caused by short-term occlusion, and keep outputting the tracked object for a period of time when the detector misses it because of occlusion.
Tracking management therefore plays a very important role in the multi-target tracking task. Common multi-target tracking management mechanisms generally fall into two types: a fixed-wait-threshold generation/termination mechanism, and a finite-state-machine state-transition mechanism. The former sets fixed waiting thresholds for the generation and termination stages of a tracked object, generating or terminating it with a delay, which alleviates the detector's false and missed detections to some extent. However, the thresholds are inconvenient to determine and are usually set empirically; such heuristic threshold selection lacks theoretical support, and a fixed threshold struggles to balance reducing false tracked objects caused by false detections against reducing target loss caused by missed detections. The latter performs tracking state transitions according to the number of consecutively tracked frames or the tracking duration; compared with directly setting generation and termination thresholds, it can recover briefly lost targets to some extent, but it does not adapt its thresholds to the detector's performance or the tracking state, and more thresholds must be set manually.
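For contrast with the approach of the present application, the following sketch shows the fixed-wait-threshold mechanism criticized above, with the hit and miss thresholds as hand-tuned constants (the values are illustrative):

    class FixedThresholdTrack:
        CONFIRM_HITS = 3    # consecutive matches before the track is output
        DELETE_MISSES = 5   # consecutive misses before the track is terminated

        def __init__(self):
            self.hits = 0
            self.misses = 0
            self.confirmed = False

        def update(self, matched: bool) -> bool:
            # Returns False once the track should be deleted.
            if matched:
                self.hits += 1
                self.misses = 0
                if self.hits >= self.CONFIRM_HITS:
                    self.confirmed = True
            else:
                self.misses += 1
            return self.misses < self.DELETE_MISSES

Note that neither threshold adapts to where the object sits in the detector's field of view, which is exactly the gap the performance map addresses.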
As another example, in some common target tracking methods, such as the multi-layer track archive management method applied in multi-hypothesis tracking (MHT) systems, a multi-layer track archive comprising a general track archive and an internal track archive is established, and the internal track archive is updated in real time along with MHT hypothesis generation and pruning. Corresponding logic is set for the general and internal track archives, the general track archive is updated based on the internal one, and the target track is output and displayed in real time according to the target information in the general track archive, completing tracking management and target reporting. However, this method manages the number of tracked objects only indirectly, through the hypothesis generation and pruning operations of the multi-hypothesis tracking algorithm; as tracking time grows, a large set of hypotheses must be stored for each tracked object, and operation efficiency is low. Moreover, the method suits distant targets; for nearby targets, the tracking effect may be poor because of occlusion, detection accuracy, and similar factors.
As a further example, a tracking result may be generated by tracking feature points of the target area with an optical flow method; according to the tracking result and the inverse mapping between feature points and tracking object boxes, it is determined whether the tracking object box of each tracked object was tracked successfully, and if so, the box is recalculated and the feature points updated; otherwise, the box and the corresponding feature points are deleted. However, this method relies only on the matching result of optical flow features to decide whether to add or delete a tracked object, which can give poor tracking and low robustness in scenes with complicated optical flow. And if a target is temporarily lost, for example occluded, it cannot be maintained and recovered, so the tracking effect suffers.
Therefore, the present application provides a target tracking method that determines quickly and accurately which tracked objects can be output by distinguishing the detector's detection accuracy across different areas, thereby improving target tracking efficiency and tracking effect. The method is introduced below.
Firstly, the target tracking method provided by the application can be applied to various scenes needing target detection or target tracking, such as automatic driving, monitoring or shooting and the like. For ease of understanding, the following description is given by way of example only, and not by way of limitation, of some possible scenarios.
Scene one, automatic driving
The method provided by the application can be applied to the perception module of a vehicle, for example as a software and hardware system integrated with an autonomous vehicle. The hardware system may include an object detection sensor, a processor, and the like. The object detection sensor may include a lidar sensor for detecting objects in the surroundings of the host vehicle. The processor may be configured to receive and process data from the object detection sensor and output obstacle objects; it may be, for example, a general-purpose processor or a graphics processor. The software system includes an operating system, sensor drivers, sensor data processing programs, and the like. The method can be deployed in the perception module of an autonomous driving software system as a tracking management submodule of the multi-target tracking module; it maintains the internal sequence of tracked objects and outputs stable, reliable tracked-object results, typically to other submodules of the perception module, such as a prediction module.
Specifically, in autonomous driving, tracking the obstacles near the vehicle, such as other vehicles, pedestrians, road barriers, or signs, is important and affects the driving safety and efficiency of the vehicle. For example, as shown in fig. 3, one or more lidars, such as the lidar 301 shown in fig. 3, may be mounted on the vehicle roof. During autonomous driving, they collect environmental information near the vehicle in real time as a laser point cloud; a detector then detects the objects in the point cloud data, and those objects are tracked in real time, so that the vehicle knows at all times which nearby obstacles affect its driving safety and can plan or adjust its driving path in time. Of course, the lidar may also be replaced by other sensors capable of acquiring depth, such as an infrared sensor or an image sensor, which are not described again here.
Scene two, monitoring
In a monitoring scenario, an image sensor may be arranged in the monitoring device to collect surveillance video in real time. The monitoring device can detect each frame, identify the objects in it, establish associations between those objects and the objects in the preceding frame or frames, and track the objects continuously. When the output condition is satisfied, a tracked object can be output. For example, a person in the monitored picture can be tracked, and the camera orientation adjusted in real time according to the tracked person, so that the camera follows the person and monitors their state in the scene in time.
Scene three, robot
The target tracking method can be applied to an intelligent robot. The robot may be equipped with a lidar or an image sensor that collects data within a monitoring range in real time; objects in the collected data are identified and tracked, and a tracked object is output once the output condition is met. For example, an image sensor arranged in the robot can capture images in real time; objects in those images are detected and tracked, and after the output condition is met, the tracked object is output so that the robot can act on it, such as adjusting its orientation or direction of travel.
The target tracking method can thus be applied widely in scenarios requiring target tracking. By analyzing the performance of the detector, the method provided by the application determines the detector's detection accuracy in different areas within its detection range, derives the reliability of a detected object from that accuracy, and thereby determines more accurately and efficiently whether to output the tracked object.
Referring to fig. 4, a specific flow of the target tracking method provided in the present application is described below.
401. At least one detection object in the current frame is acquired by a detector.
The detector is used to identify the objects in the current frame and output at least one detection object. That is, the current frame is used as the input of the detector, which outputs information about the at least one identified object, such as its category, position, or motion state.
The current frame may be any one of the input data, which may be the data collected by the aforementioned collection device, such as the point cloud data collected by the laser radar, the point cloud data collected by the millimeter wave radar, or the image collected by the image sensor. For example, the point cloud data collected by the laser radar may be divided according to a preset time length unit, and each unit time length is divided into one frame, so that the point cloud data is divided into multiple frames.
The detector may comprise a pre-selected model or a pre-trained model; for example, it may comprise a network for object detection, a classification network, or a segmentation network. Specifically, the detector may include a convolutional neural network (CNN), a deep convolutional neural network (DCNN), a recurrent neural network (RNN), a region-based convolutional neural network (RCNN), or Faster R-CNN.
402. And acquiring a tracking object.
The tracked object is an object that has been determined to be tracked in the input data, and the tracked object may include an object detected by a detector in the previous frame, and information such as a position, a classification, or a motion state of the tracked object may be acquired.
For example, when processing input data, a tracking object set may be established, where the tracking object set includes one or more tracking objects, and when processing a first frame, the tracking object set is initialized, a detection object identified by a detector is added to the tracking object set, and when processing each subsequent frame, an object in the tracking object set may be tracked.
The number of the tracking objects may include one or more, and the present application exemplarily illustrates one of the tracking objects by way of example and not by way of limitation.
It should be understood that the previous frame referred to herein refers to a frame arranged before the current frame according to a certain sequence, and the sequence may be a time sequence for collecting input data, or a sequence for detecting the input data, for example, if the input data is a segment of video, the target tracking may be performed on the video according to a forward playing sequence of the video, or the target tracking may be performed on the video according to a reverse sequence, and the like.
403. And matching the tracking object with at least one detection object to obtain a matching result.
Wherein, the tracking object and at least one detection object can be matched, and the matching result between the tracking object and the at least one object is determined.
It is understood that the tracking object and the at least one detection object may be matched to determine whether the tracking object and one of the at least one detection object are the same object. For example, if the tracking object is a red car, it may be determined whether there is a car that is the same as the tracking object from among the identified cars, and if so, it indicates that there is an object that matches the tracking object from among the identified cars.
Specifically, the tracking object and the at least one detection object may be matched according to their specific information, for example their category, shape, position, movement speed, or movement direction, so as to identify whether they are the same object or to compute a confidence that they are the same object. For example, if the size, shape, and color of the tracking object and the detection object are the same, and the position of the detection object agrees with the predicted position of the tracking object (which can be calculated from its movement speed and direction), the tracking object and the detection object are the same object.
Generally, the matching result between the tracking object and the detection object can be classified as matching or not matching, where matching indicates that the tracking object and the detection object are the same object, and not matching indicates that the tracking object and the detection object are different objects. Of course, the matching result between the tracking object and the detection object may also be represented by a matching degree, for example, when the matching degree is higher than a certain value, that is, the tracking object and the detection object are the same object, the tracking of the tracking object is continued.
For example, the matching result may specifically be: a detection object matching the tracking object exists; no detection object matches the tracking object (i.e., there is an unmatched tracking object); or no tracking object matches a detection object (i.e., there is an unmatched detection object).
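One common way to obtain such a matching result is distance-gated Hungarian assignment. The sketch below is an assumed realization of this kind of data association, not a method prescribed by the application; the 3 m gate is illustrative:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match(track_pos, det_pos, gate=3.0):
        """track_pos: (N, 2) array; det_pos: (M, 2) array of (x, y) positions."""
        cost = np.linalg.norm(track_pos[:, None, :] - det_pos[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)     # minimum-cost pairing
        pairs = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
        unmatched_tracks = set(range(len(track_pos))) - {r for r, _ in pairs}
        unmatched_dets = set(range(len(det_pos))) - {c for _, c in pairs}
        # The three returned sets mirror the three matching outcomes above.
        return pairs, unmatched_tracks, unmatched_dets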
404. And determining a first state quantity of the tracking object according to the matching result and the performance map of the detector.
After the tracking object and the detection objects are matched, whether to output or terminate the tracking object can be determined based on the performance map of the detector; that is, the decision is represented by the calculated first state quantity. Once the first state quantity is obtained, it determines whether the tracking object is output in the current frame or terminated in the current frame.
Outputting the tracked object may be understood as determining that the tracked object exists continuously in multiple frames, i.e. confirming that the tracked object exists, and outputting the tracked object for further processing based on the tracked object. For example, in an automatic driving scenario, a tracking object is output, that is, it is determined that there are other vehicles or pedestrians near the vehicle, and information of the tracking object may be transmitted to an automatic driving control module of the vehicle, and further processing may be performed on the tracking object, such as adjusting a driving path, accelerating or decelerating, and the like of the automatic driving vehicle.
Terminating the tracking object can be understood as stopping its tracking. When the next frame is processed, an object determined to be terminated is deleted from the tracking objects, i.e., it is not tracked further. For example, if the tracking objects include a green car and a red car, and it is determined that tracking of the red car should stop, for instance because it drove out of the monitoring range of the host vehicle or is occluded, the red car can be deleted from the tracking objects; when the next frame is processed, the red car is no longer tracked, reducing the tracking workload.
The performance map may include the detection accuracy of the detector in a plurality of grids within its detection range, and can represent differences in the detector's accuracy across regions or categories. The detection accuracy can be expressed as recall, precision, or average precision. The detection range may be the effective detection range of the detector, such as the range in which detection accuracy exceeds a preset accuracy, or a predetermined range. For example, the detection range of the detector may be divided into a plurality of grids in advance, and the detection accuracy of the detector in each grid computed, thereby composing the performance map. The performance map may be stored as a graph, a table, or in other forms, adjusted to the actual application scenario.
In general, the performance of the detector is a valuable reference for life-cycle management of the tracked object, and the detector's detection performance differs across areas. In this embodiment of the application, whether to output or terminate the tracked object is determined through the detector's detection accuracy in different areas, so that the decision is more accurate and the efficiency of outputting the tracked object is improved.
Specifically, it is possible to determine position information of the tracking object in the current frame according to the matching result, and to query the performance map for detection accuracy corresponding to the tracking object based on the position information, and to calculate the first state quantity based on the detection accuracy.
In one possible implementation, if there is no detection object matching the tracking object in at least one detection object in the current frame output by the detector, the position information of the tracking object in the current frame may be determined according to the motion state of the tracking object. If the tracking object is matched with one of the at least one detection object (referred to as a first detection object), it can be understood that the first detection object and the tracking object are the same object, and the position information of the first detection object is used as the position information of the tracking object.
Specifically, if there is no detection object matching the tracking object in at least one detection object in the current frame output by the detector, the predicted position of the tracking object in the current frame may be obtained according to the motion state information of the tracking object, where the motion state information may specifically include information such as a motion speed, a motion direction, or a start position of the tracking object, and the motion state information may be extracted from the input data or calculated according to the input data. Then, calculating a predicted distance value between the tracking object in the current frame and a sensor for acquiring the current frame according to the predicted position, and acquiring an actual distance value between the predicted position in the current frame and acquisition equipment through input data; and if the difference value between the predicted distance value and the actual distance value is larger than a first threshold value, taking the predicted position as the position information of the tracking object in the current frame.
In other words, if the input data shows that the difference between the distance actually measured at the predicted position and the predicted distance is greater than the first threshold, the tracking object may be occluded so that the acquisition device cannot collect its data, and the predicted position may be used as the position information of the tracking object. Therefore, in this embodiment of the application, occlusion of the tracking object is handled adaptively: predicting its position avoids temporary loss, enables continuous tracking, and allows the tracking object to be output stably even when it is briefly missing.
More specifically, the first state quantity may include a first output indication state quantity indicating whether the tracking object is output in the current frame and a first termination indication state quantity indicating whether it is terminated in the current frame; in general, the two are negatively correlated. Calculating the first state quantity may specifically include: acquiring a direct observation quantity representing the motion state of the tracking object (the motion state information may serve as the direct observation quantity); calculating, from the detection accuracy and the direct observation quantity, a first posterior probability and a second posterior probability of the tracking object in the current frame, the first representing the probability of outputting the tracking object in the current frame and the second the probability of terminating it; and obtaining the first output indication state quantity from the first posterior probability and the first termination indication state quantity from the second posterior probability. For example, the first posterior probability may be converted into an observation likelihood ratio in terms of precision and recall and used as the first output indication state quantity, and the second posterior probability may likewise be converted and used as the first termination indication state quantity.
Optionally, a second output indication state quantity and a second termination indication state quantity of the tracking object in the previous frame may also be acquired, the former indicating whether the tracking object was output in the previous frame and the latter whether it was terminated there. The first posterior probability is then fused with the second output indication state quantity to obtain the first output indication state quantity, and the second posterior probability with the second termination indication state quantity to obtain the first termination indication state quantity. Thus, in this embodiment of the application, whether to output or terminate the tracking object can be determined quickly and accurately by iterating its state quantities over the input data.
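A hedged sketch of this recursive update follows. It keeps the output indication state quantity as a log-odds score and fuses it each frame with an observation likelihood ratio built from the per-grid precision and recall; the exact fusion rule of the embodiments is not fixed here, so this is one plausible realization under stated assumptions:

    import math

    def update_output_state(prev_log_odds: float, matched: bool,
                            recall: float, precision: float) -> float:
        eps = 1e-6
        if matched:
            # A matched detection supports existence in proportion to precision.
            likelihood_ratio = precision / max(1.0 - precision, eps)
        else:
            # A miss penalizes existence in proportion to the local recall:
            # where the detector rarely misses, a miss is strong negative evidence.
            likelihood_ratio = (1.0 - recall) / max(recall, eps)
        return prev_log_odds + math.log(likelihood_ratio)

    # Iterate frame by frame; output the track while the score stays high.
    score = 0.0
    for matched in (True, True, False, True):
        score = update_output_state(score, matched, recall=0.9, precision=0.95)

The termination indication state quantity can be maintained symmetrically, with the roles of hit and miss reversed.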
In general, the first posterior probability calculated when no detection object in the at least one detection object matches the tracking object is less than the first posterior probability calculated when the tracking object matches the first detection object, while the second posterior probability calculated when no detection object matches is greater than the second posterior probability calculated when the tracking object matches the first detection object.
Of course, the first posterior probability may also be used directly as the first output indication state quantity and the second posterior probability as the first termination indication state quantity, to reduce subsequent computation.
In one possible implementation, a performance map of the detector may also be obtained prior to step 404. The performance map may be extracted from a memory, or may be obtained by analyzing the performance of the detector.
Obtaining the performance map may specifically include dividing the detection range of the detector into a plurality of grids and acquiring truth data, where the truth data includes collected data gathered by the acquisition device and the annotation information of the corresponding truth objects, such as the annotated object sequence; the annotation information may include the information of the truth objects in each frame of the collected data. The collected data is fed to the detector to obtain information of the predicted objects it outputs, such as the sequence of predicted objects; the differences between the predicted objects and the truth objects are then compared, the detection accuracy of each grid is calculated, and the performance map of the detector is obtained. The detection accuracy may be expressed as recall, precision, or average precision.
The manner of dividing the detection range may include dividing according to a depth or an angle range. For example, the division may be performed according to the distance from the acquisition device, such as dividing the range within 10 meters into a grid, dividing the range of 10-20 meters into a grid, and so on. Or, the vehicle body may be divided according to a plane area, for example, in an automatic driving scene, a certain area where the vehicle head faces may be divided into a grid, areas where the vehicle bodies on both sides face may be divided into grids, and so on. Of course, the grid may also be divided according to a three-dimensional space, that is, the grid may be divided by combining the depth and the plane area, and the grid may be specifically adjusted according to an actual application scene.
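The depth/angle division mentioned above can be realized, for example, as concentric range rings crossed with angular sectors; the ring width and sector count below are illustrative assumptions:

    import math

    def grid_index(x: float, y: float, ring_width: float = 10.0, sectors: int = 8):
        rng = math.hypot(x, y)              # distance from the acquisition device
        ring = int(rng // ring_width)       # ring 0: <10 m, ring 1: 10-20 m, ...
        angle = math.atan2(y, x) % (2 * math.pi)
        sector = int(angle / (2 * math.pi / sectors))
        return ring, sector

    ring, sector = grid_index(15.0, 15.0)   # about 21.2 m away -> ring 2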
In addition, the detection range of the detector may be an area within a preset range of the acquisition device, for example the area within 100 meters around it; alternatively, the effective detection range of the acquisition device may be used, for example the range in which the detector's accuracy exceeds 90%. The detection range can be selected according to the actual application scenario.
In one possible implementation, if one of the at least one detection object in the current frame output by the detector does not match any tracking object, that detection object may be treated as a new tracking object and tracked in the next frame. For example, if in the current frame the tracking objects include A, B, C and D while the detection objects include A, B, C, D and E, then in the next frame the object E may also be taken as a tracking object and tracked.
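A toy version of this example (objects reduced to plain identifiers, purely for illustration):

```python
tracks = {"A", "B", "C", "D"}           # tracking object set in the current frame
detections = {"A", "B", "C", "D", "E"}  # detection objects in the current frame

new_tracks = detections - tracks        # detections with no matching track
tracks |= new_tracks                    # "E" is tracked from the next frame on
print(sorted(tracks))                   # ['A', 'B', 'C', 'D', 'E']
```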
In addition, if the performance map indicates the detection accuracy of the detector for different areas and different categories of objects within the detection range, then when querying the detection accuracy corresponding to the position information of the tracking object, the category of the tracking object is also taken into account; that is, the detection accuracy corresponding to the tracking object is queried in the performance map based on both the position information of the tracking object in the current frame and its category. This achieves a finer-grained division of the detector's accuracy, makes the calculated first state quantity more accurate, and therefore makes the decision to keep tracking or kill the tracking object more accurate, so that whether to output or kill the tracking object can be determined efficiently.
The foregoing describes a flow of the target tracking method provided in the present application, and for convenience of understanding, the following describes the flow of the target tracking method provided in the present application in more detail with reference to a specific application scenario.
Referring to fig. 5, an architecture diagram of an application of the target tracking method provided in the present application is shown.
As shown in fig. 5, the architecture is divided into two parts: an offline part 501 and an online part 502.
The offline part 501 divides the detection range of the detector in advance and, using pre-labeled truth sequences, that is, the sequences assigned to the objects in the collected truth data, counts the detection accuracy within each grid, thereby obtaining the performance map of the detector.
The online part 502 analyzes the data acquired by the acquisition device using the performance map output by the offline part, and outputs the sequences of the tracked objects.
Specifically, the online part 502 may include: collecting input data by an acquisition device; detecting the objects in each frame of the input data using the detector and outputting the detected objects of each frame; performing data association based on these results, that is, analyzing the matching between tracking objects and detection objects; and then performing tracking management on the tracking objects based on the matching result.
Further, after the input data is acquired by the acquisition device, each frame of the input data is processed by the detector to identify the objects in that frame. During the processing of each frame, the objects in the tracking object set are tracked. For example, when the current frame is processed, a tracking object set including one or more tracking objects may be acquired, the tracking objects including the detection objects detected by the detector in the previous frame. The tracking objects in the set are then matched with the one or more detection objects detected in the current frame to realize data association between them, for example by appending the information of a tracking object in the current frame to that tracking object's sequence. The matching result may fall into several cases, such as a tracking object matched to a detection object, a tracking object with no matching detection object, or a detection object with no matching tracking object. Each tracking object maintains a pair of state quantities, namely a generation indication state quantity and an extinction indication state quantity: the generation indication state quantity indicates whether to output the tracking object, and the extinction indication state quantity indicates whether to kill the tracking object, that is, to stop tracking it.
The detection objects can then be tracked according to the matching result. If a tracking object and a detection object are successfully matched, that is, they are the same object, the tracking object can continue to be tracked; according to the detection accuracy of the detector for that detection object, a positive gain is applied to the generation indication state quantity of the tracking object, increasing the confidence of outputting it, while a negative gain may be applied to its extinction indication state quantity. If a detection object has no matching tracking object, the detection object may be added to the tracking object set as a new tracking object to be tracked in the next frame, and its generation indication state quantity and extinction indication state quantity are initialized. If a tracking object has no matching detection object, occlusion analysis may be performed on it, that is, it is judged whether the tracking object is occluded: if it is determined to be occluded, tracking continues; if it is not occluded, the tracking object may be killed, that is, tracking of it stops. The occlusion analysis may be performed according to the difference between the predicted distance value from the predicted position of the tracking object to the acquisition device and the actual distance value corresponding to that predicted position in the input data: if the difference between the predicted distance value and the actual distance value is too large, it may be determined that the tracking object is occluded; if no object is detected at the predicted position, it may be determined that the tracking object is lost or has left the acquisition range of the acquisition device, that is, it no longer needs to be tracked.
If a tracking object meets the output condition, for example the value of its generation indication state quantity exceeds a preset value, the tracking object may be output. If a tracking object meets the extinction condition, for example the value of its extinction indication state quantity exceeds a preset value, the tracking object may be killed, that is, deleted from the tracking object set so that tracking of it stops.
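A hedged sketch of this output/extinction decision (field names, thresholds, and values are hypothetical placeholders, not from the patent):

```python
def manage(track, tracking_set, outputs, gen_threshold=0.0, ext_threshold=0.0):
    """Output or kill a track based on its two indication state quantities."""
    if track["gen"] > gen_threshold:       # output condition met
        outputs.append(track["id"])
    if track["ext"] > ext_threshold:       # extinction condition met
        tracking_set.discard(track["id"])

tracking_set, outputs = {"t1", "t2"}, []
manage({"id": "t1", "gen": 1.2, "ext": -0.5}, tracking_set, outputs)
manage({"id": "t2", "gen": -2.0, "ext": 3.1}, tracking_set, outputs)
print(outputs, tracking_set)               # ['t1'] {'t1'}
```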
Therefore, in the embodiment of the present application, the detector can be used to detect the objects in the input data, and the output or extinction of each tracking object can be analyzed based on the detection accuracy of the detector for different areas and/or different categories, so as to obtain an accurate analysis result, thereby improving the efficiency of outputting or killing tracking objects.
Further, for convenience of understanding, the target tracking method provided by the present application may be divided into several stages, specifically detector performance encoding, target tracking, and so on. Each stage is described below.
Stage one, detector performance encoding
The purpose of detector performance encoding is to count the performance of the detector by region and by object category, determining its performance in different regions and for different categories, so that life-cycle management of the tracking objects can be performed in a targeted manner. The specific metrics for measuring detector performance may include recall, precision, or average precision; this embodiment takes recall and precision as examples, and wherever recall and precision are mentioned below they may be replaced by average precision or other parameters for measuring detector performance.
Illustratively, the flow of detector performance encoding may be as shown in fig. 6.
First, grid division is performed. A detection area of interest is defined as the detection range of the detector and divided into a plurality of grids; the division may be done on a two-dimensional plane or in three-dimensional space, which amounts to discretizing the detection space of the detector so as to conveniently count its performance over different areas. For example, if the method provided by the present application is deployed in a vehicle, a rectangular area centered on the central axis of the vehicle may be taken as the detection range. It can be understood that performing grid division on the detection area of the detector yields a two-dimensional grid map and realizes spatial discretization of the detection area, which facilitates subsequently counting the performance differences of the detector over different areas.
Next, data with truth labels is acquired. This data includes data collected by a sensor, which is then annotated; the annotation information of the truth objects included in each frame of the collected data constitutes the truth labels, such as the sequence, position, or category of each truth object.
The data collected by the sensor is then used as the input of the detector, which outputs information of one or more detection objects in each frame of the collected data, such as their positions or categories. The detector may comprise a pre-trained model for detecting or identifying objects in the input data, for example an object detection model or a classification model.
Accuracy calculation is then performed based on the information of the detection objects output by the detector and the information of the labeled truth objects, for example computing the recall and precision of different areas, so that the detection accuracy of the detector is represented by recall and precision and the performance map of the detector is obtained. Of course, the detection accuracy may also be represented by other parameters besides recall and precision, such as average precision, and may be adjusted according to the actual application scenario; this is not limited in the present application.
For example, the detection accuracy of the detector can be expressed as:

Recall:

$$\mathrm{Recall}(u,v)=\frac{TP(u,v)}{TP(u,v)+FN(u,v)}$$

Precision:

$$\mathrm{Precision}(u,v)=\frac{TP(u,v)}{TP(u,v)+FP(u,v)}$$
where (u, v) denotes the index or coordinate of a grid and is used to represent its position. For example, a coordinate system may be established in the detection area, with u the abscissa and v the ordinate of the center point of a grid; alternatively, u may be the longitude and v the latitude of the center point of a grid. TP denotes the number of true positives, FN the number of missed detections (false negatives), and FP the number of false detections (false positives). Recall(u, v) and Precision(u, v) respectively denote the recall and precision of the detector in different areas: recall evaluates how well the detector avoids missed detections, that is, the higher the recall, the fewer the missed detections; precision evaluates how well the detector avoids false detections, that is, the higher the precision, the fewer the false positives.
After the recall and precision of each grid are obtained, they are aggregated to finally obtain the performance map, so that the detection accuracy of each grid can be conveniently queried in the online part. The performance map may be stored in memory so that it can be extracted when stage two is performed; alternatively, stage one may be executed immediately before stage two so that stage two proceeds from the result of stage one. The specific execution order may be adjusted according to the actual application scenario.
In addition to counting the overall detection accuracy of each grid, the detection accuracy of the detector for different categories within each grid can be counted at a finer granularity. Generally, the detection accuracy of the detector may differ between categories of objects; when calculating recall and precision, the TP, FN and FP of each category may be counted separately, so that per-category recall and precision are calculated for each grid. In this way the performance of the detector is divided at a finer granularity and more accurate detection accuracy is obtained.
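A minimal sketch of this per-grid, per-category statistic (the cell indices, categories, and counts are assumed example values):

```python
from collections import defaultdict

# counts[(cell, category)] = [TP, FN, FP], accumulated over the truth data
counts = defaultdict(lambda: [0, 0, 0])

def add(cell, category, tp=0, fn=0, fp=0):
    c = counts[(cell, category)]
    c[0] += tp; c[1] += fn; c[2] += fp

def build_performance_map():
    perf = {}
    for key, (tp, fn, fp) in counts.items():
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        perf[key] = (recall, precision)
    return perf

add((5, 0), "car", tp=90, fn=5, fp=3)
add((5, 9), "pedestrian", tp=30, fn=20, fp=10)
print(build_performance_map())
# {((5, 0), 'car'): (~0.947, ~0.968), ((5, 9), 'pedestrian'): (0.6, 0.75)}
```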
Stage one may be performed offline: before target tracking is carried out, the performance of the detector may be analyzed and the analysis result stored in memory, so that when target tracking is performed the performance map can be extracted from the stored data and accurate tracking achieved in combination with the detector's performance. Of course, stage one may also be executed immediately before stage two; this may be adjusted according to the actual application scenario and is not limited in the present application.
Generally, common target detection tasks on 3D point clouds, such as those performed with deep neural networks like PointRCNN and TANet, only evaluate the overall performance of the algorithm on a test set to obtain overall recall and precision, and do not consider the performance differences of the detector caused by differences in point cloud density, point count, and similar characteristics of targets in different directions and of different categories. The present application counts the detection accuracy of different areas and/or different categories, which amounts to obtaining detection accuracy at a finer granularity, so that the state of a tracking object can subsequently be estimated more accurately based on this finer detection accuracy, improving the accuracy of the estimation result and indirectly improving the efficiency of outputting or killing tracking objects.
Stage two, target tracking
Illustratively, the flow of target tracking may be as shown in fig. 7.
First, the detector outputs information of one or more detection objects in the current frame of the input data, that is, a set of detection objects together with information such as the category of each identified detection object and its position in the current frame.
The tracking objects are then matched with the detection objects, and it is judged whether a detection object associated with each tracking object exists. A tracking object is any one object in the tracking object set.
The tracking object set includes the objects identified when detecting previous frames. For example, when the first frame is processed, the objects detected in the first frame are added to the tracking object set; when the second frame is processed, the tracking objects in the set are matched with the objects identified in the second frame, and the set is updated with any new detection objects so that they continue to be tracked in the next frame, and so on.
However, there are various situations in the matching result between the tracking object and the detection object, such as the presence of the detection object associated with the tracking object, the absence of the detection object associated with the tracking object, or the absence of the tracking object associated with the detection object, which will be described below.
Case one, the existence of a detection object associated with a tracking object
If a detection object associated with the tracking object exists, the corresponding detection accuracy is queried in the performance map according to the position and category of the detection object, a positive gain is applied to the target generation management model, and a negative gain is applied to the target extinction management model. The target generation management model is used to calculate the first output indication state quantity, and the target extinction management model is used to calculate the first extinction indication state quantity.
Second, there is no tracking object associated with the detection object
If the detection object set includes a redundant detection object that matches no tracking object in the tracking object set, the tracking object set may be updated with that detection object, that is, the detection object is added to the tracking object set so that it is tracked in the next frame.
Case three, absence of detection object associated with tracking object
If there is no detection object associated with the tracking object, the position of the tracking object in the current frame may be predicted, and then the corresponding detection accuracy may be queried in the performance map according to the predicted position and the category of the tracking object.
It is then judged whether the tracking object is occluded. If it is confirmed to be occluded, a positive gain continues to be applied to the target generation management model and a negative gain to the target extinction management model.
If the tracking object is determined not to be occluded, a negative gain may be applied to the target generation management model and a positive gain to the target extinction management model; that is, the possibility of outputting the tracking object is reduced and the possibility of killing it is increased.
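The three cases can be summarized in a small dispatch sketch. The gain values here are placeholders standing in for the log-likelihood ratios derived later; case two (a new detection) is handled by simply adding a fresh track, as shown earlier, and is omitted:

```python
def apply_gains(track, matched, occluded, pos_gain, neg_gain):
    """pos_gain/neg_gain stand for the positive and negative gains of the
    generation and extinction models; here they are just numbers."""
    if matched or occluded:          # case one, or case three while occluded
        track["gen"] += pos_gain     # more confidence in outputting
        track["ext"] += neg_gain     # less confidence in killing
    else:                            # case three, not occluded
        track["gen"] += neg_gain
        track["ext"] += pos_gain

track = {"gen": 0.0, "ext": 0.0}
apply_gains(track, matched=True, occluded=False, pos_gain=1.0, neg_gain=-1.0)
print(track)                         # {'gen': 1.0, 'ext': -1.0}
```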
Generally, during target tracking it often happens that a tracking object is occluded by other obstacles or by other tracking objects. One of the advantages of multi-target tracking over plain target detection is the ability to maintain the target output for a period of time when detection fails or misses because the tracking object is occluded, thereby providing important information for further behavior decisions, for example the behavior decisions of an autonomous vehicle.
Therefore, occlusion analysis is introduced into the tracking process, and the handling of occluded targets is considered in the tracking management module. For example, as shown in fig. 8, a 3D lidar scanner may be mounted on the roof of the host vehicle; the scanning range of the lidar is 360 degrees, and if it is divided at an angular resolution of 0.5 degrees into 720 angular ranges, there are 720 virtual rays. When performing occlusion analysis, it must be determined from each frame's obstacle point cloud whether an obstacle lies on each of the 720 virtual rays, along with the radial distance from the origin of the coordinate system to the obstacle along the ray; initially each ray is set to "no obstacle" with a default distance of 0.
Specifically, as shown in fig. 8, at time t-1 (i.e., the previous frame) the scanned surface of the tracking object is completely visible to the host vehicle. The ray distance values within the virtual-ray range covering the tracking object can be calculated from the spatial position it occupies, and the ray distance values within the corresponding angular range can also be calculated from the obstacle point cloud distribution at that time; since there is no occlusion at time t-1, the two distance values of each ray are substantially equal. At time t the tracking object is occluded by another obstacle, and most of its area cannot be scanned by the lidar: the solid line in fig. 8 is the area scanned by the lidar, and the dashed segment is the area that cannot be scanned, that is, the area hidden by the occluding obstacle.
Similarly, a theoretical ray distance value (i.e., the predicted distance value) within the virtual-ray range covering the tracking object may be calculated from the spatial position it occupies, and an actual ray distance value (i.e., the actual distance value) may be calculated from the obstacle point cloud distribution at that time. The relationship between the predicted distance and the actual distance of each virtual ray is therefore used as the condition for judging whether the tracking object is occluded.
For example, if half or more of the virtual rays in the virtual ray coverage area of the tracked object are occluded, the tracked object is considered to be occluded.
Further, if the tracking object is occluded for several consecutive frames, it can be killed. For example, if the tracking object is detected to be occluded for 10 consecutive frames, it may be killed to reduce the workload.
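A simplified sketch of the virtual-ray occlusion test. The 0.5-degree resolution and "half or more rays occluded" rule follow the text; the distance margin is an assumed parameter:

```python
def is_occluded(ray_ids, predicted, actual, margin=1.0):
    """ray_ids: indices of the virtual rays covering the tracking object.
    predicted/actual: per-ray distances; actual == 0.0 means no return.
    margin (metres) is an assumed tolerance, not from the patent."""
    blocked = sum(
        1 for i in ray_ids
        if actual[i] > 0.0 and actual[i] < predicted[i] - margin
    )
    return 2 * blocked >= len(ray_ids)   # half or more rays blocked

predicted = [20.0] * 720                 # track expected ~20 m away on its rays
actual = [20.0] * 720
for i in range(100, 140):                # a nearer obstacle returns at 8 m
    actual[i] = 8.0
print(is_occluded(range(100, 160), predicted, actual))  # True (40 of 60 rays)
```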
Therefore, in the embodiment of the present application, occlusion analysis of the tracking object is added: even if the tracking object is occluded, it can be further processed according to the occlusion situation, and the decision to output or kill the tracking object can be made more accurately and efficiently.
The target generation management model and the target extinction management model involved in the above cases are described below by way of example.
First, a management mechanism is established for each tracking object, comprising a target generation management model and a target extinction management model, which judge whether the tracking object should be generated or killed. Both models can take occlusion of the tracking object into account, that is, a suitable gain is selected according to whether the object is occluded or not; this alleviates, to a certain degree, the problem of losing a tracking object due to occlusion, so that the tracking object can continue to be tracked. By contrast, if a tracking object were killed immediately when it disappears, then once it is detected again its state would have to be reinitialized and it would have to be tracked anew, which is inefficient.
Specifically, at time t there are M tracking object sequences $\Theta_t=\{\theta_{1,t},\theta_{2,t},\ldots,\theta_{i,t},\ldots,\theta_{M,t}\}$. Taking the historical tracking information of the tracking objects into account, the generation and extinction management process of each tracking object can be represented by a recursive Bayesian model; fig. 9 shows the recursive Bayesian graph of the tracking state of one tracking object.

Here $\theta_t$ is the state quantity of the tracking object concerned at time t (i.e., the first state quantity). The state quantity can be understood as a hidden variable, that is, it cannot be observed directly; instead, it is inferred indirectly from the direct observations $O_{1:t}=[O_1,O_2,\ldots,O_t]$ accumulated up to each time, where $O_t=\{o_{t,1},o_{t,2},\ldots,o_{t,N}\}$, and the state of the tracking object is thereby estimated.

For example, the hidden variables may include a variable $x_{i,t}$ indicating whether the tracking object should be generated and a variable $y_{i,t}$ indicating whether it should be killed.
Generation management and extinction management are described separately below.
1. Target generation management model
In order to determine whether to output a tracking object from the observations, the posterior probability of the hidden variable $x_{i,t}$ (i.e., the first posterior probability) must be calculated. According to the recursive Bayesian posterior estimation formula, the posterior probability of whether the tracking object is generated at time t is:

$$p(x_{i,t}=k\mid O_{1:t})=C\,p(O_t\mid x_{i,t}=k)\,p(x_{i,t}=k\mid O_{1:t-1})$$

where C is a normalization coefficient and $k\in\{0,1\}$: k = 0 indicates that the tracking object is not generated (i.e., not output), and k = 1 indicates that it should be generated. $p(O_t\mid x_{i,t}=k)$ is the observation likelihood, and $p(x_{i,t}=k\mid O_{1:t-1})$ carries the posterior of the previous time step forward as the prior.

Therefore, to decide whether to output the tracking object, the following ratio is calculated:

$$r_t=\frac{p(x_{i,t}=1\mid O_{1:t})}{p(x_{i,t}=0\mid O_{1:t})}$$

If the ratio $r_t$ is greater than 1, the tracking object is considered to really exist and may be output for use by downstream tasks. For ease of calculation, the above expression is converted into logarithmic form:

$$\ln r_t=\ln r_{t-1}+\ln\frac{p(O_t\mid x_{i,t}=1)}{p(O_t\mid x_{i,t}=0)}$$

The original product expression thus becomes an additive iterative expression, greatly reducing the computational complexity. The key, then, is to calculate the observation likelihood ratio $\dfrac{p(O_t\mid x_{i,t}=1)}{p(O_t\mid x_{i,t}=0)}$.
To this end, the embodiment of the present application builds an observation model for target generation management. For target generation (i.e., outputting a tracking object), the recall and precision of the detector have a large influence: the higher the recall and precision of the detector used, the higher the probability that a detected object is a real tracking object that should be output. In addition, during tracking the tracking object may appear in different directions relative to the acquisition device, where the detector's performance differs, so the observation model established in this embodiment must incorporate the recall and precision of the detector.
Taking the detection accuracy queried in the detector performance map as the key observation for judging target generation, the following can be obtained:

$$p(O_t=\text{matched}\mid x_{i,t}=1)=R(u,v)$$

$$p(O_t=\text{missed}\mid x_{i,t}=1)=1-R(u,v)$$

$$p(O_t=\text{matched}\mid x_{i,t}=0)=1-P(u,v)$$

$$p(O_t=\text{missed}\mid x_{i,t}=0)=P(u,v)$$

where R(u, v) and P(u, v) respectively denote the recall and precision queried in the performance map of the detector for the area in which the observation matched to the tracking object lies. The observation likelihood ratio is thus obtained as:

$$\ln\frac{p(O_t\mid x_{i,t}=1)}{p(O_t\mid x_{i,t}=0)}=\begin{cases}\ln\dfrac{R(u,v)}{1-P(u,v)}, & \text{if the tracking object is matched,}\\[2ex]\ln\dfrac{1-R(u,v)}{P(u,v)}, & \text{if it is not matched.}\end{cases}$$
the forward gain is usually a positive value, i.e., the confidence of target generation is increased, and when the forward gain is applied to the target generation management model in fig. 7, the above-described gain is applied to the target generation management model
Figure BDA0003182760310000168
Is that
Figure BDA0003182760310000169
Figure BDA00031827603100001610
Represents a negative gain when the target generation management model is subjected to the negative gain
Figure BDA00031827603100001611
Can select
Figure BDA00031827603100001612
Usually negative, meaning minusLess confidence in the tracking formation is generated.
The final decision condition for tracking object generation is thus obtained:

$$\ln\frac{p(x_{i,t}=1\mid O_{1:t})}{p(x_{i,t}=0\mid O_{1:t})}>0$$

If the current frame is the first frame in which the tracking object is detected, the initial value of this log ratio can be set to be not greater than 0; that is, by default the tracking object is not output in the first frame in which it is detected, so as to avoid false detections.
Therefore, in the embodiment of the present application, whether to output the tracking object is determined in combination with the detection accuracy of the detector, and different gain directions are selected adaptively according to the situation, so that the tracking object can be tracked more accurately, the delay before output is reduced, and a more accurate output result is obtained more efficiently.
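A minimal sketch of the recursive log-odds update for generation, assuming the likelihood ratios reconstructed above; the initial value and the R, P numbers are illustrative assumptions:

```python
import math

def update_generation(log_r, matched, R, P):
    """One recursive step of the generation log-odds; R and P are the recall
    and precision queried from the performance map for the track's cell."""
    if matched:
        log_r += math.log(R / (1.0 - P))       # positive gain
    else:
        log_r += math.log((1.0 - R) / P)       # negative gain
    return log_r

log_r = math.log(1e-3)    # assumed initial value: not output in its first frame
for matched in (True, True, True):
    log_r = update_generation(log_r, matched, R=0.9, P=0.95)
    print(log_r > 0)      # False, False, True -> output from the third frame
```

Because the update is a single addition per frame, tracks in high-accuracy grid cells (R and P near 1) accumulate confidence quickly, while tracks in low-accuracy cells need more consecutive matches before being output.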
2. Target extinction management model
Similar to the target generation management model, the goal of target extinction management is to keep the number of internally maintained tracking object sequences bounded: falsely detected tracking objects are deleted in time, avoiding a set of tracking sequences that grows without limit over time. Killing stale sequences and maintaining an updated, managed set also reduces the association uncertainty when tracking objects are matched with detection objects.
When managing target extinction, in order to determine whether a certain tracking object should be killed, the posterior probability of the hidden variable $y_{i,t}$ must be calculated from the observations; the decision condition for extinction of a tracking object can likewise be established through recursive Bayesian inference:

$$\ln\frac{p(y_{i,t}=1\mid O_{1:t})}{p(y_{i,t}=0\mid O_{1:t})}>0$$
similarly to the target generation management described above, an observation model may be created to calculate a likelihood ratio as to whether or not the tracking target should die. For the target extinction process, the influence of the recall rate and the detection precision of the detector is large, and if the recall rate and the precision of the used detector are higher, the higher the recall rate and the precision of the used detector are, the smaller the probability of the target extinction is shown to be smaller if the detector observes the tracking object. Therefore, the observation likelihood ratio for judging the target extinction is obtained as follows:
Figure BDA00031827603100001616
Figure BDA00031827603100001617
Figure BDA00031827603100001618
the forward gain is usually a positive value, that is, the confidence of target extinction is increased to accelerate the extinction of the tracked object, and in the above fig. 7, when the forward gain is applied to the target extinction management model, the above mentioned
Figure BDA00031827603100001619
Is that
Figure BDA00031827603100001620
Figure BDA00031827603100001621
Represents the negative gain, when the negative gain is performed on the target extinction management model
Figure BDA00031827603100001622
Can select
Figure BDA00031827603100001623
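A companion sketch for the extinction log-odds, mirroring the generation update with the reciprocal likelihood ratios (the reconstructed form above; values are illustrative):

```python
import math

def update_extinction(log_r, matched, R, P):
    """One recursive step of the extinction log-odds; R and P are queried
    from the performance map for the track's grid cell."""
    if matched:
        log_r += math.log((1.0 - P) / R)       # negative gain: keep the track
    else:
        log_r += math.log(P / (1.0 - R))       # positive gain: move toward killing
    return log_r

print(update_extinction(0.0, matched=False, R=0.9, P=0.95) > 0)  # True
```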
Therefore, in the embodiment of the present application, the detection accuracy corresponding to the tracking object may be extracted from the performance map of the detector, so that the confidence of tracking the object can be calculated accurately on that basis. This improves the efficiency of outputting or killing tracking objects: a tracking object can be output promptly for further processing, or an invalid tracking object killed in time, reducing the workload of tracking invalid objects and improving the working efficiency of the device. It can be understood that, by means of Bayesian estimation, the present application avoids manually setting fixed generation/extinction thresholds and improves the efficiency and accuracy of outputting or killing tracking objects. At the same time, the method adapts to occlusion of the tracking object, outputs the sequence of the optimal tracking object, and effectively resolves the difficulty, common in tracking management schemes that set multiple thresholds, of balancing detector performance against tracking output performance.
The foregoing describes a flow of a target tracking method provided in the present application, and a device for executing the foregoing method is described in detail below.
First, referring to fig. 10, a schematic structural diagram of a target tracking device provided in the present application is as follows.
The target tracking device includes:
a detection module 1001, configured to obtain at least one detection object in a current frame through a detector, where the current frame is any one frame in input data;
a tracking module 1002 for acquiring a tracking object, the tracking object including an object detected in a frame previous to a current frame using a detector;
a matching module 1003, configured to match the tracking object with at least one detection object to obtain a matching result;
the tracking module 1002 is further configured to determine a first state quantity of the tracking object according to the matching result and a performance map of the detector, where the first state quantity is used to indicate whether the tracking object is output or whether the tracking object is killed, and the performance map includes detection accuracies of the detector in a plurality of grids in the detection range.
In a possible implementation, the tracking module 1002 is specifically configured to: determining the position information of the tracking object in the current frame according to the matching result; inquiring the detection accuracy corresponding to the tracking object in the performance map based on the position information of the tracking object in the current frame; the first state quantity is calculated according to the detection accuracy.
In a possible implementation, the tracking module 1002 is specifically configured to: if the at least one detection object does not have a detection object matched with the tracking object, determining the position information of the tracking object in the current frame according to the motion state information of the tracking object; and if the tracking object is matched with a first detection object in the at least one detection object, taking the position information of the first detection object as the position information of the tracking object in the current frame.
In a possible implementation, the tracking module 1002 is specifically configured to: acquiring a predicted position of the tracking object in the current frame according to the motion state information of the tracking object; calculating a prediction distance value between a tracking object in the current frame and a collection device for collecting the current frame according to the prediction position; acquiring an actual distance value between a predicted position in a current frame and acquisition equipment through input data; and if the difference value between the predicted distance value and the actual distance value is larger than a first threshold value, taking the predicted position as the position information of the tracking object in the current frame.
In one possible implementation, the first state quantity includes a first output indication state quantity and a first extinction indication state quantity, the first output indication state quantity is used for indicating whether the tracking object is output in the current frame, and the first extinction indication state quantity is used for indicating whether the tracking object is extinguished in the current frame; the tracking module 1002 is specifically configured to: acquiring a direct observation quantity, wherein the direct observation quantity is used for representing the motion state of a tracking object; according to the detection accuracy and the direct observed quantity, calculating a first posterior probability and a second posterior probability of the tracked object in the current frame, wherein the first posterior probability is used for representing the probability of outputting the tracked object in the current frame, and the second posterior probability is used for representing the probability of eliminating the tracked object in the current frame; and obtaining a first output indication state quantity based on the first posterior probability, and obtaining a first extinction indication state quantity based on the second posterior probability.
In a possible implementation, the tracking module 1002 is specifically configured to: acquire a second output indication state quantity and a second extinction indication state quantity of the tracking object in the previous frame, where the first state quantity includes a first output indication state quantity and a first extinction indication state quantity, the second output indication state quantity is used to indicate whether the tracking object is output in the previous frame, and the second extinction indication state quantity is used to indicate whether the tracking object is killed in the previous frame; fuse the first posterior probability with the second output indication state quantity to obtain the first output indication state quantity; and fuse the second posterior probability with the second extinction indication state quantity to obtain the first extinction indication state quantity.
In one possible embodiment, the detection accuracy of each grid in the performance map comprises a plurality of categories of corresponding accuracies; the tracking module 1002 is specifically configured to: acquiring the category of a tracking object; and inquiring the detection accuracy corresponding to the tracking object in the performance map based on the position information of the tracking object in the current frame and the category of the tracking object.
In a possible implementation, the tracking module 1002 is further configured to: and if the at least one detection object comprises a second detection object which is not matched with the tracking object, taking the second detection object as a new tracking object, and tracking the new tracking object in the next frame of the current frame.
In a possible implementation, the target tracking apparatus further includes an encoding module 1004 for dividing a detection range of the detector into a plurality of grids before acquiring at least one detection object in the current frame by the detector; acquiring truth value data, wherein the truth value data comprises acquisition data acquired by acquisition equipment and information of a corresponding truth value object; detecting the acquired data by using a detector to obtain information of a predicted object; and calculating the detection accuracy corresponding to each grid in the plurality of grids according to the prediction object and the truth object to obtain a performance map.
Referring to fig. 11, a schematic structural diagram of another object tracking device provided in the present application is as follows.
The target tracking device may include a processor 1101 and a memory 1102. The processor 1101 and memory 1102 are interconnected by wires. Wherein program instructions and data are stored in memory 1102.
The memory 1102 stores program instructions and data corresponding to the steps of fig. 4-9 described above.
The processor 1101 is configured to perform the method steps performed by the target tracking apparatus shown in any one of the embodiments of fig. 4-9.
Optionally, the object tracking device may also include a transceiver 1103 for receiving or transmitting data.
Also provided in an embodiment of the present application is a computer-readable storage medium storing a program which, when run on a computer, causes the computer to execute the steps in the methods described in the foregoing embodiments shown in fig. 4 to 9.
Alternatively, the aforementioned target tracking device shown in fig. 11 is a chip.
The present application further provides a target tracking apparatus, which may also be referred to as a digital processing chip or a chip, where the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps executed by the target tracking apparatus shown in any one of the foregoing embodiments in fig. 4 to fig. 9.
The embodiment of the application also provides a digital processing chip. Integrated with the digital processing chip are circuitry and one or more interfaces for implementing the processor 1101 described above, or the functionality of the processor 1101. When integrated with memory, the digital processing chip may perform the method steps of any one or more of the preceding embodiments. When the digital processing chip is not integrated with the memory, the digital processing chip can be connected with the external memory through the communication interface. The digital processing chip implements the actions performed by the target tracking device in the above embodiments according to the program codes stored in the external memory.
Embodiments of the present application also provide a computer program product, which when running on a computer, causes the computer to execute the steps performed by the target tracking apparatus in the method described in the foregoing embodiments shown in fig. 4 to 9.
The target tracking device provided by the embodiment of the present application may be a chip, the chip comprising: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin, or a circuit. The processing unit may execute the computer-executable instructions stored in the storage unit to cause the chip in the server to execute the target tracking method described in the embodiments shown in fig. 4-9. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, the aforementioned processing unit or processor may be a central processing unit (CPU), a network processor (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a chip according to an embodiment of the present disclosure, where the chip may be represented as a neural network processor NPU 120, and the NPU 120 is mounted on a main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks. The core portion of the NPU is an arithmetic circuit 1203, and the controller 1204 controls the arithmetic circuit 1203 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuitry 1203 internally includes multiple processing units (PEs). In some implementations, the operational circuitry 1203 is a two-dimensional systolic array. The arithmetic circuit 1203 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 1203 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1202 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory 1201, performs the matrix operation with matrix B, and stores the partial or final result in the accumulator 1208.
The unified memory 1206 is used for storing input data and output data. Weight data is transferred to the weight memory 1202 through a direct memory access controller (DMAC) 1205, and input data is likewise carried into the unified memory 1206 by the DMAC.
A bus interface unit (BIU) 1210 is used for the interaction of the AXI bus with the DMAC and the instruction fetch buffer (IFB) 1209: through it, the instruction fetch buffer 1209 fetches instructions from the external memory, and the storage unit access controller 1205 fetches the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1206 or to transfer weight data into the weight memory 1202 or to transfer input data into the input memory 1201.
The vector calculation unit 1207 includes a plurality of operation processing units and, where necessary, performs further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, or magnitude comparison. It is mainly used for non-convolutional/fully-connected layer computation in the neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 1207 can store the processed output vector to the unified memory 1206. For example, the vector calculation unit 1207 may apply a linear function and/or a nonlinear function to the output of the operation circuit 1203, for example, linear interpolation is performed on the feature planes extracted by the convolution layer, and further, for example, a vector of accumulated values is used to generate an activation value. In some implementations, the vector calculation unit 1207 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to arithmetic circuitry 1203, e.g., for use in subsequent layers in a neural network.
An instruction fetch buffer 1209, connected to the controller 1204, is used to store the instructions used by the controller 1204;
the unified memory 1206, the input memory 1201, the weight memory 1202, and the instruction fetch memory 1209 are all On-Chip memories. The external memory is private to the NPU hardware architecture.
The operation of each layer in the recurrent neural network can be performed by the operation circuit 1203 or the vector calculation unit 1207.
Where any of the aforementioned processors may be a general purpose central processing unit, microprocessor, ASIC, or one or more integrated circuits configured to control the execution of the programs of the methods of fig. 4-9.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, functions performed by computer programs can easily be implemented by corresponding hardware, and the specific hardware structure implementing the same function may take various forms, such as analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is usually preferable. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk of a computer, including instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: the above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.

Claims (22)

1. A target tracking method, comprising:
acquiring at least one detection object in a current frame through a detector, wherein the current frame is any one frame in input data;
acquiring a tracking object including an object detected in a frame previous to the current frame using the detector;
matching the tracking object with the at least one detection object to obtain a matching result;
and determining a first state quantity of the tracking object according to the matching result and a performance map of the detector, wherein the first state quantity is used for indicating whether the tracking object is output or whether the tracking object is killed, and the performance map comprises the detection accuracy of the detector in a plurality of grids in a detection range.
2. The method according to claim 1, wherein the determining a first state quantity of the tracking object according to the matching result and the performance map of the detector comprises:
determining the position information of the tracking object in the current frame according to the matching result;
based on the position information of the tracking object in the current frame, inquiring the detection accuracy corresponding to the tracking object in the performance map;
calculating the first state quantity according to the detection accuracy.
3. The method of claim 2, wherein the determining the position information of the tracking object in the current frame according to the matching result comprises:
if the detection object matched with the tracking object does not exist in the at least one detection object, determining the position information of the tracking object in the current frame according to the motion state information of the tracking object;
and if the tracking object is matched with a first detection object in the at least one detection object, taking the position information of the first detection object as the position information of the tracking object in the current frame.
4. The method according to claim 3, wherein the determining the position information of the tracking object in the current frame according to the motion state information of the tracking object comprises:
acquiring the predicted position of the tracking object in the current frame according to the motion state information of the tracking object;
calculating a prediction distance value between the tracking object in the current frame and a collection device for collecting the current frame according to the prediction position;
acquiring an actual distance value between the predicted position and the acquisition equipment in the current frame through the input data;
and if the difference value between the predicted distance value and the actual distance value is larger than a first threshold value, taking the predicted position as the position information of the tracking object in the current frame.
5. The method according to any one of claims 2 to 4, wherein the first state quantity includes a first output indication state quantity for indicating whether the tracking object is output in the current frame and a first extinction indication state quantity for indicating whether the tracking object is extinguished in the current frame;
the calculating the first state quantity according to the detection accuracy includes:
acquiring a direct observation quantity, wherein the direct observation quantity is used for representing the motion state of the tracking object;
calculating a first posterior probability and a second posterior probability of the tracking object in the current frame according to the detection accuracy and the direct observation, wherein the first posterior probability is used for representing the probability of outputting the tracking object in the current frame, and the second posterior probability is used for representing the probability of dying the tracking object in the current frame;
and obtaining the first output indication state quantity based on the first posterior probability, and obtaining the first death indication state quantity based on the second posterior probability.
6. The method of claim 5, wherein said obtaining the first output indication state quantity based on the first posterior probability and obtaining the first extinction indication state quantity based on the second posterior probability comprises:
acquiring a second output indication state quantity and a second extinction indication state quantity of the tracked object in the previous frame, wherein the first state quantity comprises a first output indication state quantity and a first extinction indication state quantity, the second output indication state quantity is used for indicating whether the tracked object is output in the previous frame, and the second extinction indication state quantity is used for indicating whether the tracked object is extinguished in the previous frame;
and fusing the first posterior probability and the second output indication state quantity to obtain the first output indication state quantity, and fusing the second posterior probability and the second apoptosis indication state quantity to obtain the first apoptosis indication state quantity.
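Claim 6 only says each posterior is "fused" with the corresponding indication state quantity from the previous frame. An exponential moving average is used below as one simple stand-in for that fusion; alpha and the decision thresholds are invented for the sketch.

```python
def fuse_state_quantities(first_posterior, second_posterior,
                          prev_output_state, prev_extinction_state, alpha=0.7):
    """Claim 6: blend this frame's posteriors with last frame's states."""
    first_output_state = alpha * first_posterior + (1.0 - alpha) * prev_output_state
    first_extinction_state = alpha * second_posterior + (1.0 - alpha) * prev_extinction_state
    return first_output_state, first_extinction_state  # carried to the next frame

# For example: output the object while first_output_state > 0.5, and drop
# the track once first_extinction_state exceeds 0.5 (thresholds assumed).
```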
7. The method of any one of claims 2 to 6, wherein the detection accuracy of each grid in the performance map comprises accuracies corresponding to a plurality of categories; and
the querying, in the performance map, the detection accuracy corresponding to the tracking object based on the position information of the tracking object in the current frame comprises:
acquiring the category of the tracking object; and
querying the detection accuracy corresponding to the tracking object in the performance map based on the position information of the tracking object in the current frame and the category of the tracking object.
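Claim 7 amounts to a two-key lookup: the grid cell derived from the position, then the object's category. A dictionary-of-dictionaries layout is assumed below; the cell size and the neutral fallback value are illustrative.

```python
CELL_SIZE = 2.0  # metres per grid cell -- an assumed resolution

def query_accuracy(performance_map, position, category, fallback=0.5):
    """Claim 7: per-category detection accuracy for the object's grid cell."""
    cell = (int(position[0] // CELL_SIZE), int(position[1] // CELL_SIZE))
    return performance_map.get(cell, {}).get(category, fallback)
```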
8. The method according to any one of claims 1 to 7, further comprising:
if the at least one detection object comprises a second detection object that matches no tracking object, taking the second detection object as a new tracking object and tracking the new tracking object in the frame following the current frame.
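Claim 8 is the track-birth rule. The sketch assumes each detection exposes a position and a category; the NewTrack record is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class NewTrack:  # hypothetical record for a freshly created tracking object
    track_id: int
    position: tuple
    category: str

def spawn_tracks(unmatched_detections, tracks, next_id):
    """Claim 8: every unmatched detection seeds a tracking object that is
    followed from the next frame on."""
    for det in unmatched_detections:
        tracks.append(NewTrack(next_id, det.position, det.category))
        next_id += 1
    return next_id
```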
9. The method according to any one of claims 1 to 8, wherein before the acquiring the at least one detection object in the current frame through the detector, the method further comprises:
dividing the detection range of the detector into a plurality of grids;
acquiring truth data, wherein the truth data comprises acquisition data acquired by an acquisition device and information of corresponding truth objects;
detecting the acquisition data using the detector to obtain information of predicted objects; and
calculating the detection accuracy corresponding to each grid of the plurality of grids according to the predicted objects and the truth objects to obtain the performance map.
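Claim 9 describes the offline calibration that produces the performance map: run the detector over data with ground truth and record, per grid cell (and, per claim 7, per category), how often truth objects were actually detected. The nearest-match rule within a fixed radius is an assumption; the claims do not specify how predictions are matched to truths here.

```python
import math
from collections import defaultdict

def build_performance_map(truth_objects, predicted_objects,
                          cell_size=2.0, match_radius=1.0):
    """Claim 9: detection accuracy per grid cell and category."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth in truth_objects:  # each assumed to carry .position and .category
        cell = (int(truth.position[0] // cell_size),
                int(truth.position[1] // cell_size))
        totals[(cell, truth.category)] += 1
        if any(pred.category == truth.category and
               math.dist(pred.position, truth.position) <= match_radius
               for pred in predicted_objects):
            hits[(cell, truth.category)] += 1
    performance_map = {}
    for (cell, category), total in totals.items():
        performance_map.setdefault(cell, {})[category] = hits[(cell, category)] / total
    return performance_map  # same shape as the lookup sketch under claim 7
```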
10. A target tracking device, comprising:
a detection module, configured to acquire at least one detection object in a current frame through a detector, wherein the current frame is any frame in input data;
a tracking module, configured to acquire a tracking object, the tracking object comprising an object detected by the detector in a frame previous to the current frame; and
a matching module, configured to match the tracking object against the at least one detection object to obtain a matching result,
wherein the tracking module is further configured to determine a first state quantity of the tracking object according to the matching result and a performance map of the detector, the first state quantity indicating whether to output the tracking object or whether to extinguish the tracking object, and the performance map comprising detection accuracies of the detector in a plurality of grids within a detection range.
11. The device according to claim 10, wherein the tracking module is specifically configured to:
determine the position information of the tracking object in the current frame according to the matching result;
query, in the performance map, the detection accuracy corresponding to the tracking object based on the position information of the tracking object in the current frame; and
calculate the first state quantity according to the detection accuracy.
12. The device according to claim 11, wherein the tracking module is specifically configured to:
if no detection object among the at least one detection object matches the tracking object, determine the position information of the tracking object in the current frame according to motion state information of the tracking object; and
if the tracking object matches a first detection object among the at least one detection object, take the position information of the first detection object as the position information of the tracking object in the current frame.
13. The device according to claim 12, wherein the tracking module is specifically configured to:
obtain a predicted position of the tracking object in the current frame according to the motion state information of the tracking object;
calculate, according to the predicted position, a predicted distance value between the tracking object in the current frame and an acquisition device that acquires the current frame;
obtain, from the input data, an actual distance value between the predicted position and the acquisition device in the current frame; and
if the difference between the predicted distance value and the actual distance value is greater than a first threshold, take the predicted position as the position information of the tracking object in the current frame.
14. The device according to any one of claims 11 to 13, wherein the first state quantity comprises a first output indication state quantity indicating whether the tracking object is output in the current frame and a first extinction indication state quantity indicating whether the tracking object is extinguished in the current frame; and
the tracking module is specifically configured to:
acquire a direct observation quantity representing the motion state of the tracking object;
calculate a first posterior probability and a second posterior probability of the tracking object in the current frame according to the detection accuracy and the direct observation quantity, wherein the first posterior probability represents the probability that the tracking object is output in the current frame, and the second posterior probability represents the probability that the tracking object is extinguished in the current frame; and
obtain the first output indication state quantity based on the first posterior probability, and obtain the first extinction indication state quantity based on the second posterior probability.
15. The device according to claim 14, wherein the tracking module is specifically configured to:
acquire a second output indication state quantity and a second extinction indication state quantity of the tracking object in the previous frame, wherein the second output indication state quantity indicates whether the tracking object is output in the previous frame, and the second extinction indication state quantity indicates whether the tracking object is extinguished in the previous frame; and
fuse the first posterior probability with the second output indication state quantity to obtain the first output indication state quantity, and fuse the second posterior probability with the second extinction indication state quantity to obtain the first extinction indication state quantity.
16. The device according to any one of claims 11 to 15, wherein the detection accuracy of each grid in the performance map comprises accuracies corresponding to a plurality of categories; and
the tracking module is specifically configured to:
acquire the category of the tracking object; and
query the detection accuracy corresponding to the tracking object in the performance map based on the position information of the tracking object in the current frame and the category of the tracking object.
17. The device according to any one of claims 10 to 16, wherein the tracking module is further configured to:
if the at least one detection object comprises a second detection object that matches no tracking object, take the second detection object as a new tracking object and track the new tracking object in the frame following the current frame.
18. The device according to any one of claims 10 to 17, further comprising an encoding module configured to:
before the at least one detection object in the current frame is acquired through the detector, divide the detection range of the detector into a plurality of grids;
acquire truth data, wherein the truth data comprises acquisition data acquired by an acquisition device and information of corresponding truth objects;
detect the acquisition data using the detector to obtain information of predicted objects; and
calculate the detection accuracy corresponding to each grid of the plurality of grids according to the predicted objects and the truth objects to obtain the performance map.
19. A target tracking device, comprising a processor coupled to a memory, the memory storing a program, wherein the program, when executed by the processor, implements the method of any one of claims 1 to 9.
20. A computer-readable storage medium comprising a program which, when executed by a processing unit, performs the method of any one of claims 1 to 9.
21. A target tracking device, comprising a processing unit and a communication interface, wherein the processing unit obtains program instructions through the communication interface, and the program instructions, when executed by the processing unit, implement the method of any one of claims 1 to 9.
22. A computer program product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1 to 9.
CN202110852187.8A 2021-07-27 2021-07-27 Target tracking method and device Pending CN113780064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110852187.8A CN113780064A (en) 2021-07-27 2021-07-27 Target tracking method and device

Publications (1)

Publication Number Publication Date
CN113780064A true CN113780064A (en) 2021-12-10

Family

ID=78836366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110852187.8A Pending CN113780064A (en) 2021-07-27 2021-07-27 Target tracking method and device

Country Status (1)

Country Link
CN (1) CN113780064A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114379592A (en) * 2022-01-10 2022-04-22 中国第一汽车股份有限公司 Target association method and device, electronic equipment and storage medium
CN114379592B (en) * 2022-01-10 2024-06-14 中国第一汽车股份有限公司 Target association method, device, electronic equipment and storage medium
CN115359301A (en) * 2022-09-06 2022-11-18 上海寻序人工智能科技有限公司 Data mining method based on cloud platform
CN115359097A (en) * 2022-10-20 2022-11-18 湖北芯擎科技有限公司 Dense optical flow generation method and device, electronic equipment and readable storage medium
CN115439509A (en) * 2022-11-07 2022-12-06 成都泰盟软件有限公司 Multi-target tracking method and device, computer equipment and storage medium
CN115439509B (en) * 2022-11-07 2023-02-03 成都泰盟软件有限公司 Multi-target tracking method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110363058B (en) Three-dimensional object localization for obstacle avoidance using one-shot convolutional neural networks
CN113780064A (en) Target tracking method and device
US11816585B2 (en) Machine learning models operating at different frequencies for autonomous vehicles
Nieto et al. Real-time lane tracking using Rao-Blackwellized particle filter
US11308639B2 (en) Tool and method for annotating a human pose in 3D point cloud data
CN112417967A (en) Obstacle detection method, obstacle detection device, computer device, and storage medium
CN111062263B (en) Method, apparatus, computer apparatus and storage medium for hand gesture estimation
CN107423686B (en) Video multi-target fuzzy data association method and device
CN113264066A (en) Obstacle trajectory prediction method and device, automatic driving vehicle and road side equipment
CN112947419B (en) Obstacle avoidance method, device and equipment
US10945888B2 (en) Intelligent blind guide method and apparatus
CN113177968A (en) Target tracking method and device, electronic equipment and storage medium
CN112258565B (en) Image processing method and device
CN111913177A (en) Method and device for detecting target object and storage medium
CN109509212B (en) Target tracking processing method and electronic equipment
CN115760921A (en) Pedestrian trajectory prediction method and system based on multi-target tracking
CN116503803A (en) Obstacle detection method, obstacle detection device, electronic device and storage medium
Hussain et al. Drivable region estimation for self-driving vehicles using radar
Zhang et al. Spatiotemporal adaptive attention 3D multiobject tracking for autonomous driving
CN113177980B (en) Target object speed determining method and device for automatic driving and electronic equipment
US11899750B2 (en) Quantile neural network
CN116189150A (en) Monocular 3D target detection method, device, equipment and medium based on fusion output
Batista et al. A probabilistic approach for fusing people detectors
CN113516013B (en) Target detection method, target detection device, electronic equipment, road side equipment and cloud control platform
CN117523914A (en) Collision early warning method, device, equipment, readable storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination