CN116266359A - Target tracking method, device, computer equipment and storage medium - Google Patents

Target tracking method, device, computer equipment and storage medium

Info

Publication number
CN116266359A
CN116266359A
Authority
CN
China
Prior art keywords
point cloud
target
cloud cluster
detection frame
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111542820.XA
Other languages
Chinese (zh)
Inventor
干磊
马志超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Pudu Technology Co Ltd
Original Assignee
Shenzhen Pudu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Pudu Technology Co Ltd filed Critical Shenzhen Pudu Technology Co Ltd
Priority to CN202111542820.XA
Publication of CN116266359A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The application relates to a tracking method, a tracking device, computer equipment and a storage medium for a target object. The method comprises the following steps: acquiring a depth point cloud, a color map and a radar point cloud collected from a target environment; respectively carrying out density clustering on the depth point cloud and the radar point cloud to obtain depth point cloud clusters and radar point cloud clusters; detecting the target object in the color map to obtain a detection frame with a semantic tag; matching the detection frame with the depth point cloud clusters and the radar point cloud clusters respectively to obtain a target depth point cloud cluster and a target radar point cloud cluster which intersect the target detection frame; and tracking the target object based on the target detection frame and a fusion point cloud cluster formed from the target depth point cloud cluster and the target radar point cloud cluster. By adopting a tracking mode that combines 2D and 3D, the method can improve the precision of the tracking system.

Description

Target tracking method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to a method and apparatus for tracking a target object, a computer device, and a storage medium.
Background
Target tracking (Visual Object Tracking) is one of the important technologies in the field of computer vision. Although it has been studied extensively in recent years, with the continuous progress of science and technology target tracking has become an indispensable capability for robots, driven by task needs and the requirements of decision planning.
In conventional target tracking schemes, the target is usually detected and tracked based on color maps acquired at different moments, so as to achieve the purpose of target tracking. However, using such schemes may result in inaccurate tracking results.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a tracking method, apparatus, computer device, and storage medium for a target object.
A method of tracking a target, the method comprising:
acquiring a depth point cloud, a color map and a radar point cloud collected from a target environment;
respectively carrying out density clustering on the depth point cloud and the radar point cloud to obtain a depth point cloud cluster and a radar point cloud cluster; detecting the target object in the color map to obtain a detection frame with a semantic tag;
matching the detection frame with the depth point cloud cluster and the radar point cloud cluster respectively to obtain a target depth point cloud cluster and a target radar point cloud cluster which intersect the target detection frame;
and tracking the target object based on the target detection frame and a fusion point cloud cluster between the target depth point cloud cluster and the target radar point cloud cluster.
In one embodiment, the performing density clustering on the depth point cloud and the radar point cloud to obtain a depth point cloud cluster and a radar point cloud cluster includes:
selecting depth points and radar points from the depth point cloud and the radar point cloud as target depth points and target radar points;
determining a first distance between the target depth point and a depth point neighborhood and a second distance between the target radar point and a radar point neighborhood;
when the first distance is smaller than a distance threshold, adding the target depth point to the depth point neighborhood; and adding the target radar point to the radar point neighborhood when the second distance is less than the distance threshold;
traversing the depth points in the depth point cloud and the radar points in the radar point cloud until the depth points in the depth point cloud and the radar points in the radar point cloud are added into the corresponding depth point neighborhood and the corresponding radar point neighborhood, and obtaining a depth point cloud cluster and a radar point cloud cluster according to the depth point neighborhood and the radar point neighborhood.
In one embodiment, the detecting the target object in the color map to obtain a detection frame with a semantic tag includes:
detecting a target object in the color map by using a target detection model to carry out framing to obtain a detection frame;
determining the behavior state of the target object, and generating a multi-level semantic tag according to the behavior state;
outputting a detection frame with the multi-level semantic tags.
In one embodiment, the target detection frame comprises a first target detection frame and a second target detection frame;
the step of matching the detection frame with the depth point cloud cluster and the radar point cloud cluster respectively to obtain a target depth point cloud cluster and a radar point cloud cluster which have intersection with the target detection frame comprises the following steps:
projecting the depth point cloud cluster to a plane where the color map is located, so as to obtain a depth point cloud projection based on the depth point cloud cluster and the first target detection frame; calculating the intersection ratio between the first target detection frame and the depth point cloud projection; determining a target depth point cloud cluster according to the size of the intersection ratio;
projecting the radar point cloud cluster to a plane where the color map is located to obtain radar point cloud projections and the second target detection frame, wherein an intersection area exists between each radar point cloud projection and the second target detection frame; and determining the radar point cloud cluster with the largest intersection area as the target radar point cloud cluster.
In one embodiment, the determining the target depth point cloud cluster according to the size of the intersection ratio includes:
comparing each intersection ratio with a preset intersection ratio upper limit value and a preset intersection ratio lower limit value;
when a first target intersection ratio is larger than the preset intersection ratio upper limit value, determining the depth point cloud cluster corresponding to the first target intersection ratio as the target depth point cloud cluster; the first target intersection ratio belongs to at least one of the intersection ratios;
when a second target intersection ratio is larger than the preset intersection ratio lower limit value and smaller than the preset intersection ratio upper limit value, clustering the depth point cloud clusters corresponding to the second target intersection ratio to obtain the target depth point cloud cluster; the second target intersection ratio belongs to at least two of the intersection ratios.
In one embodiment, the target detection frame comprises a first target detection frame and a second target detection frame; the method further comprises the steps of:
when the first target detection frame matches the target depth point cloud cluster and the second target detection frame matches the target radar point cloud cluster, tracking the target object based on the first target detection frame and the target depth point cloud cluster; or,
tracking the target object based on the second target detection frame and the target radar point cloud cluster.
In one embodiment thereof, the method further comprises:
when the Euclidean distance between the target depth point cloud cluster and the radar point cloud cluster is smaller than a preset distance threshold, fusing the target depth point cloud cluster and the radar point cloud cluster to obtain the fusion point cloud cluster.
In one embodiment, the tracking the target object based on the target detection frame and a fusion point cloud cluster between the target depth point cloud cluster and the radar point cloud cluster includes:
calculating a first matching cost between the current target detection frame and the target detection frame tracked by history, and a second matching cost between the current fusion point cloud cluster and the fusion point cloud cluster tracked by history; the current fusion point cloud cluster is a point cloud cluster fused between the target depth point cloud cluster and the radar point cloud cluster;
weighting and summing the first matching cost and the second matching cost to obtain a comprehensive matching cost;
determining a comprehensive matching result based on the comprehensive matching cost; the comprehensive matching result is used for representing the matching condition between the current target detection frame and the history tracking target detection frame and between the current fusion point cloud cluster and the history tracking fusion point cloud cluster;
and determining tracking information for tracking the target object according to the comprehensive matching result, and tracking the target object according to the tracking information.
In one embodiment, the determining the tracking information for tracking the target object according to the comprehensive matching result includes:
and when the comprehensive matching result is determined based on the comprehensive matching cost smaller than the preset loss threshold, carrying out state update on the target detection frame and the fusion point cloud cluster of the history tracking according to the current target detection frame and the current fusion point cloud cluster to obtain tracking information.
A tracking device for an object, the device comprising:
the acquisition module is used for acquiring a depth point cloud, a color map and a radar point cloud collected from the target environment;
the clustering detection module is used for respectively carrying out density clustering on the depth point cloud and the radar point cloud to obtain a depth point cloud cluster and a radar point cloud cluster; detecting the target object in the color map to obtain a detection frame with a semantic tag;
the matching module is used for respectively matching the detection frame with the depth point cloud cluster and the radar point cloud cluster to obtain a target depth point cloud cluster and a radar point cloud cluster which intersect the target detection frame;
and the tracking module is used for tracking the target object based on the target detection frame and the fusion point cloud cluster between the target depth point cloud cluster and the radar point cloud cluster.
In one embodiment, the cluster detection module is further configured to select a depth point and a radar point from the depth point cloud and the radar point cloud as a target depth point and a target radar point; determining a first distance between the target depth point and a depth point neighborhood and a second distance between the target radar point and a radar point neighborhood; when the first distance is smaller than a distance threshold, adding the target depth point to the depth point neighborhood; and adding the target radar point to the radar point neighborhood when the second distance is less than the distance threshold; traversing the depth points in the depth point cloud and the radar points in the radar point cloud until the depth points in the depth point cloud and the radar points in the radar point cloud are added into the corresponding depth point neighborhood and the corresponding radar point neighborhood, and obtaining a depth point cloud cluster and a radar point cloud cluster according to the depth point neighborhood and the radar point neighborhood.
In one embodiment, the cluster detection module is further configured to detect a target object in the color chart by using a target detection model to perform framing, so as to obtain a detection frame; determining the behavior state of the target object, and generating a multi-level semantic tag according to the behavior state; outputting a detection frame with the multi-level semantic tags.
In one embodiment, the target detection frame comprises a first target detection frame and a second target detection frame;
the matching module is further configured to project the depth point cloud cluster to a plane where the color map is located, so as to obtain a depth point cloud projection based on the depth point cloud cluster and the first target detection frame; calculate the intersection ratio between the first target detection frame and the depth point cloud projection; determine a target depth point cloud cluster according to the size of the intersection ratio; project the radar point cloud cluster to a plane where the color map is located to obtain radar point cloud projections and the second target detection frame, wherein an intersection area exists between each radar point cloud projection and the second target detection frame; and determine the radar point cloud cluster with the largest intersection area as a target radar point cloud cluster.
In one embodiment, the matching module is further configured to compare each intersection ratio with a preset intersection ratio upper limit value and a preset intersection ratio lower limit value; when a first target intersection ratio is larger than the preset intersection ratio upper limit value, determine the depth point cloud cluster corresponding to the first target intersection ratio as the target depth point cloud cluster, the first target intersection ratio belonging to at least one of the intersection ratios; and when a second target intersection ratio is larger than the preset intersection ratio lower limit value and smaller than the preset intersection ratio upper limit value, cluster the depth point cloud clusters corresponding to the second target intersection ratio to obtain the target depth point cloud cluster, the second target intersection ratio belonging to at least two of the intersection ratios.
In one embodiment, the target detection frame comprises a first target detection frame and a second target detection frame; the apparatus further comprises:
the selection module is used for tracking the target object based on the first target detection frame and the target depth point cloud cluster when the first target detection frame is matched with the target depth point cloud cluster and the second target detection frame is matched with the target radar point cloud cluster; or tracking the target object based on the second target detection frame and the target radar point cloud cluster.
In one embodiment thereof, the apparatus further comprises:
and the fusion module is used for fusing the target depth point cloud cluster and the radar point cloud cluster to obtain the fused point cloud cluster when the Euclidean distance between the target depth point cloud cluster and the radar point cloud cluster is smaller than the preset distance threshold.
In one embodiment, the tracking module is further configured to calculate a first matching cost between the current target detection frame and the target detection frame of the history tracking, and a second matching cost between the current fusion point cloud cluster and the fusion point cloud cluster of the history tracking; the current fusion point cloud cluster is a point cloud cluster fused between the target depth point cloud cluster and the radar point cloud cluster; weighting and summing the first matching cost and the second matching cost to obtain a comprehensive matching cost; determining a comprehensive matching result based on the comprehensive matching cost; the comprehensive matching result is used for representing the matching condition between the current target detection frame and the history tracking target detection frame and between the current fusion point cloud cluster and the history tracking fusion point cloud cluster; and determining tracking information for tracking the target object according to the comprehensive matching result, and tracking the target object according to the tracking information.
In one embodiment, when the comprehensive matching result is determined based on the comprehensive matching cost smaller than the preset loss threshold, the tracking module is further configured to update the state of the target detection frame and the fusion point cloud cluster according to the current target detection frame and the current fusion point cloud cluster, so as to obtain tracking information.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to invoke and execute the steps of the method of tracking an object as described above.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to invoke and execute the steps of the tracking method of an object described above.
The target object tracking method, apparatus, computer device and storage medium acquire a depth point cloud, a color map and a radar point cloud collected from a target environment; respectively carry out density clustering on the depth point cloud and the radar point cloud to obtain depth point cloud clusters and radar point cloud clusters; detect the target object in the color map to obtain a detection frame with a semantic tag; and match the detection frame with the depth point cloud clusters and the radar point cloud clusters respectively to obtain a target depth point cloud cluster and a radar point cloud cluster which intersect the target detection frame. A 2D-and-3D multi-sensor tracking scheme is thus constructed: the target object is tracked based on the target detection frame and a fusion point cloud cluster between the target depth point cloud cluster and the radar point cloud cluster, and the accuracy of the tracking system is improved through the combined 2D and 3D tracking mode.
Drawings
FIG. 1 is an application environment diagram of a target tracking method in one embodiment;
FIG. 2a is a schematic flow chart of an algorithm of a tracking method of a target object in one embodiment;
FIG. 2b is a flow chart of a method for tracking a target object according to an embodiment;
FIG. 3a is a schematic diagram illustrating an intersection of a target detection frame and a point cloud cluster in one embodiment;
FIG. 3b is a schematic diagram illustrating another intersection of a target detection frame and a point cloud cluster in one embodiment;
FIG. 4 is a flow diagram of density clustering in one embodiment;
FIG. 5 is a schematic flow chart of determining a target depth point cloud cluster according to the size of the intersection ratio in one embodiment;
FIG. 6 is a schematic diagram of semi-supervised clustering in one embodiment;
FIG. 7 is a block diagram of a tracking device for an object in one embodiment;
FIG. 8 is a block diagram of a tracking device for an object in one embodiment;
fig. 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The target object tracking method provided by the application can be applied to an application environment shown in fig. 1. The tracking method of the target object is applied to a tracking system of the target object, and the tracking system of the target object comprises a terminal 102 and a server 104.
The terminal 102 may be, but is not limited to, a robot, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc.
The server 104 may be a separate physical server or may be a service node in a blockchain system, where the service nodes in the blockchain system form a Peer-to-Peer (P2P) network, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP).
The server 104 may also be a server cluster formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Networks (CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
The terminal 102 and the server 104 may be connected by a communication connection manner such as bluetooth, USB (Universal Serial Bus ) or a network, which is not limited herein.
In one embodiment, as shown in fig. 2a, an algorithm flowchart of a target tracking method is provided, as shown in fig. 2b, a flowchart of a target tracking method is provided, and the method is applied to the terminal in fig. 1 for illustration, and includes the following steps:
S202, acquiring a depth point cloud, a color map and a radar point cloud collected from the target environment.
The depth point cloud is converted from a depth map. The depth map is an image captured by a depth camera; each pixel value in the depth map is the perpendicular distance from a point of an object in space to the plane that is perpendicular to the optical axis of the lens and passes through the optical center of the depth camera. Converting the depth map into a depth point cloud is the inverse of projecting 3D points onto a 2D plane. The color map is captured by a color camera. The radar point cloud is obtained from the lidar. Each point of the radar point cloud contains three-dimensional coordinate information, namely the three elements X, Y, Z, and sometimes also contains color information, reflection intensity information, echo frequency information and the like.
In one embodiment, before S202, the terminal converts the depth map into a depth point cloud using the camera intrinsics as constraints. Let the depth point cloud coordinates in the world coordinate system be (x, y, z), the depth map coordinates in the image coordinate system be (x', y'), D be the depth value, and the camera intrinsic matrix be
K = [ fx 0 cx ; 0 fy cy ; 0 0 1 ].
The conversion formula is:
z = D, x = (x' − cx) · D / fx, y = (y' − cy) · D / fy.
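For illustration only, a minimal sketch of this back-projection under a standard pinhole model; the function name and the intrinsic parameter names fx, fy, cx, cy are choices made here, not names used in the patent:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, in metres) into a 3D point cloud.

    Assumes a pinhole model: for a pixel (x', y') with depth D,
    x = (x' - cx) * D / fx, y = (y' - cy) * D / fy, z = D.
    """
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates (x', y')
    z = depth
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth
```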
s204, performing density clustering on the depth point cloud and the radar point cloud respectively to obtain a depth point cloud cluster and a radar point cloud cluster; and detecting the target object in the color map to obtain a detection frame with a semantic tag.
Wherein clustering is the partitioning of a data set into different classes or clusters according to a certain criteria (e.g. distance), such that the similarity of data objects within the same cluster is high, while the variability of data objects not within the same cluster is high. That is, the data of the same class after clustering are gathered together as much as possible, and the data of different classes are separated as much as possible. Clustering is an unsupervised learning (Unsupervised Learning) method. The data clustering method can be mainly classified into a Partition-based clustering method (Partition-based Methods), a Density-based clustering method (Density-based Methods), a hierarchical clustering method (Hierarchical Methods), and the like.
In addition, the density-based clustering method may be Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN is a very typical density clustering algorithm: a maximal set of density-connected samples derived from the density-reachability relationship forms one category, or cluster, of the final clustering. Density clustering algorithms generally assume that categories can be determined by how tightly the samples are distributed. Samples of the same category are closely connected, that is, there must be samples of the same category a short distance away from any sample of that category. By grouping closely connected samples into one category, a cluster is obtained; by dividing all groups of closely connected samples into different categories, the final clustering result is obtained.
Here, detection refers to target detection, which focuses on specific object targets and requires obtaining the category information and position information of each target simultaneously. Detection gives an understanding of the foreground and background of a picture: the object of interest must be separated from the background and its description (category and position) determined, so the output of a detection model is a list in which each item gives the category and position (usually represented by the coordinates of a rectangular detection box) of a detected object.
Target detection algorithms fall roughly into two types. One is the two-stage algorithm represented by Fast R-CNN (Regions with CNN Features), in which target detection is divided into two parts and a dedicated module generates candidate frames, finds the foreground and adjusts the bounding boxes. The other is the one-stage algorithm represented by SSD and YOLO, which directly classifies and adjusts bounding boxes based on anchors.
YOLO is a target detection method characterized by achieving high accuracy while detecting rapidly. It treats the target detection task as a regression problem of target region prediction and category prediction, and uses a single neural network to directly predict object boundaries and class probabilities, thereby realizing end-to-end object detection.
YOLOv5 is a one-stage target detection algorithm that balances accuracy and real-time performance. On the basis of YOLOv5 target detection, three secondary semantic labels of a person are added: the posture of the person (sitting, standing, other), the orientation (facing, back-facing, other), and the risk degree (elderly, child, pregnant woman, other). All three are multi-class classification problems; for YOLOv5 this actually increases the dimension of the output feature map, giving five label categories in total, with a first-level classification label added together with four classifiers. Correspondingly, three classification loss functions are added at the training end, where each classification loss is calculated with the same BCEcls two-class cross-entropy loss function as in the original YOLOv5, and the back-propagation loss of the whole model is:
Loss = Loss_obj + Loss_cls + Loss_cls_pose + Loss_cls_orient + Loss_cls_risk
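As a hedged illustration of how the extra secondary-label losses could be composed, the sketch below assumes a dictionary of prediction/target tensors and uses binary cross-entropy as in the original YOLOv5 classification loss; only the sum of the five terms comes from the formula above, the head structure and names are assumptions:

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # two-class cross-entropy, as in the YOLOv5 BCEcls loss

def total_loss(pred, target):
    """Sketch: combine the YOLOv5 objectness/class losses with the three
    added secondary-label losses (posture, orientation, risk)."""
    loss_obj    = bce(pred["obj"],    target["obj"])
    loss_cls    = bce(pred["cls"],    target["cls"])
    loss_pose   = bce(pred["pose"],   target["pose"])    # sitting / standing / other
    loss_orient = bce(pred["orient"], target["orient"])  # facing / back-facing / other
    loss_risk   = bce(pred["risk"],   target["risk"])    # elderly / child / pregnant / other
    return loss_obj + loss_cls + loss_pose + loss_orient + loss_risk
```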
In one embodiment, detecting the target object in the color chart, the obtaining the detection frame with the semantic tag may include: detecting a target object in the color chart by using a target detection model to carry out framing to obtain a detection frame; determining the behavior state of the target object, and generating a multi-level semantic tag according to the behavior state; outputting a detection frame with multi-level semantic tags.
S206, matching the detection frame with the depth point cloud cluster and the radar point cloud cluster respectively to obtain a target depth point cloud cluster and a target radar point cloud cluster which have intersection with the target detection frame;
In one embodiment, S206 may include: the terminal projects the depth point cloud clusters onto the plane where the color map is located to obtain the depth point cloud projections based on the depth point cloud clusters and the first target detection frame; calculates the intersection ratio between the first target detection frame and each depth point cloud projection; determines the target depth point cloud cluster according to the size of the intersection ratio; projects the radar point cloud clusters onto the plane where the color map is located to obtain the radar point cloud projections and the second target detection frame, wherein an intersection area exists between each radar point cloud projection and the second target detection frame; and determines the radar point cloud cluster with the largest intersection area as the target radar point cloud cluster.
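As an illustration of the depth-side matching, the following sketch computes the intersection ratio (IoU) between the first target detection frame and each projected depth point cloud cluster, assuming each projection is summarised by its 2D bounding box (x1, y1, x2, y2); the function names are illustrative:

```python
def iou(box_a, box_b):
    """Intersection-over-union between two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_depth_clusters(detection_box, cluster_projection_boxes):
    """Intersection ratio between the first target detection frame and the
    bounding box of each projected depth point cloud cluster."""
    return [iou(detection_box, proj) for proj in cluster_projection_boxes]
```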
In one embodiment, before S206, the terminal screens the depth point cloud clusters according to the number of points they contain: when the number of points in a depth point cloud cluster is greater than a certain fixed value, the cluster is judged to be background information and filtered out. For example, a depth point cloud cluster is judged to be background information and filtered out when it contains more than 2000 points. The terminal also screens the radar point cloud clusters according to the number of points they contain: when the number of points in a radar point cloud cluster is greater than a certain fixed value, the cluster is judged to be background information and filtered out. For example, a radar point cloud cluster is judged to be background information and filtered out when it contains more than 50 points.
In an embodiment, the step of projecting the depth point cloud clusters onto the plane where the color map is located to obtain the depth point cloud projections based on the depth point cloud clusters and the first target detection frame may specifically include: the terminal projects the depth point cloud clusters onto the plane where the color map is located to obtain pseudo target frames of the depth point cloud clusters and the first target detection frame; the Euclidean distances from each pseudo target frame to the coordinate origin of the world coordinate system (or to the target position of a historical frame) are sorted from small to large, and the smaller the distance, the higher the priority; for lower-priority pseudo target frames the overlapping part is removed during single-target matching, and the depth point cloud projections based on the depth point cloud clusters are obtained after the pseudo target frames are de-overlapped.
In an embodiment, the step of projecting the radar point cloud clusters onto the plane where the color map is located to obtain the radar point cloud projections and the second target detection frame may specifically include: the terminal projects the radar point cloud clusters onto the plane where the color map is located to obtain the radar point cloud projections, screens the detection frames according to the intersection range between the radar point cloud projections and the detection frames, and synchronizes the height of the screened detection frame with the lidar projection area to obtain the second target detection frame.
In one embodiment, the step of screening the radar point cloud clusters to obtain the target radar point cloud cluster that intersects the target detection frame may specifically include: the terminal needs to match the second target detection frame with the radar point cloud clusters corresponding to the radar point cloud projections, which can be regarded as a 0-1 programming problem.
This problem can be transformed into finding, under the constraint w_ij = 0 or w_ij = 1, the set of solutions for which the value of the objective function f(w_ij) is smallest. FIG. 3a is a schematic diagram showing the intersection of the target detection frames and the point cloud clusters,
wherein u_ij is an element of U, and U is the set of effective intersection areas between the box O regions and the box C regions; v_ij is an element of V, and V is the set of intersection areas between the box O regions and the box C regions; r_cj is an element of R_c, and R_c is the set of areas of the bounding box regions of the box C clusters. As shown in FIG. 3b, which is another schematic diagram of the intersection of the target detection frame and the point cloud clusters, when Bc_{j+1} and Bc_j both intersect Bo_i and overlap each other in a region S_0, the effective areas are:
u_ij = v_ij
u_i(j+1) = v_i(j+1) − S_0
R_c = {r_c1, r_c2, ..., r_cm}
Solving this 0-1 linear programming problem yields the entries with w_iq = 1 in W; {w_1q}, {w_2q}, ..., {w_nq} are the point cloud clusters corresponding to targets 1 to n respectively. If a target corresponds to several point cloud clusters, these clusters are merged, and the merged cluster set is:
Clusters* = {Cl_0, Cl_1, ..., Cl_n}
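The exact objective function f(w_ij) is given only in the figures, so the sketch below merely illustrates one plausible way such a box-to-cluster assignment could be solved: each pair is scored by the effective intersection area u_ij normalised by the cluster box area r_cj (an assumption), and the Hungarian algorithm from scipy is used as the solver. Note that the patent's formulation also allows several clusters to be merged for one target, which this one-to-one sketch does not cover:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_clusters(u, r_c):
    """u: n x m matrix of effective intersection areas u_ij between detection
    boxes Bo_i and cluster boxes Bc_j; r_c: areas of the m cluster boxes.
    Returns, for each detection i, the index of the assigned cluster (or -1)."""
    u = np.asarray(u, dtype=float)
    score = u / (np.asarray(r_c, dtype=float)[None, :] + 1e-9)  # assumed score, not the patent's f(w_ij)
    rows, cols = linear_sum_assignment(-score)                  # maximise the total score
    assignment = -np.ones(u.shape[0], dtype=int)
    for i, j in zip(rows, cols):
        if u[i, j] > 0:                                         # keep only pairs that actually intersect
            assignment[i] = j
    return assignment
```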
S208, tracking the target object based on the target detection frame and the fusion point cloud cluster between the target depth point cloud cluster and the target radar point cloud cluster.
Target tracking is the process of searching an image sequence for the candidate target region most similar to the target template, based on an effective representation of the target. Single-target tracking algorithms include optical flow, MeanShift, CamShift, Kalman filtering, particle filtering, Correlation Filtering (CF) and the like, and multi-target tracking algorithms include DeepSORT, MOTDT, Towards Real-Time Multi-Object Tracking and the like. The detected targets are tracked by an MOT algorithm, where the MOT includes two parts: tracking based on the 2D image and tracking based on the 3D spatial location. The algorithm mainly comprises two parts; the KF uniform-motion equation uses a constant velocity model.
In one embodiment, before S208, when the first target detection frame matches the target depth point cloud cluster and the second target detection frame matches the target radar point cloud cluster, the terminal tracks the target object based on the first target detection frame and the target depth point cloud cluster; or tracking the target object based on the second target detection frame and the target radar point cloud cluster.
In one embodiment, before S208, when the euclidean distance between the target depth point cloud cluster and the radar point cloud cluster is smaller than the preset distance threshold, the terminal fuses the target depth point cloud cluster and the radar point cloud cluster to obtain a fused point cloud cluster.
Specifically, when the target depth point cloud cluster and the target radar point cloud cluster cover the same region, two 3D target center positions P are obtained, one from the radar and one from the depth camera.
P = k1 · P_lidar + k2 · P_depth
When both P_lidar and P_depth are present: if the Euclidean distance between the two sources is greater than a threshold, for example 0.5 m, the radar detection is considered abnormal and k1 = 0, k2 = 1; within the threshold range, k1 and k2 take fixed constants, for example k1 = 0.7, k2 = 0.3.
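A minimal sketch of this weighting rule; the 0.5 m threshold and the constants 0.7/0.3 are the example values from the text, while the function and variable names are illustrative:

```python
import numpy as np

def fuse_centers(p_lidar, p_depth, dist_threshold=0.5, k1=0.7, k2=0.3):
    """Fuse the 3D centre from the radar cluster (p_lidar) with the one
    from the depth camera cluster (p_depth)."""
    if p_lidar is None:
        return np.asarray(p_depth)
    if p_depth is None:
        return np.asarray(p_lidar)
    p_lidar, p_depth = np.asarray(p_lidar), np.asarray(p_depth)
    if np.linalg.norm(p_lidar - p_depth) > dist_threshold:
        # radar detection treated as abnormal: k1 = 0, k2 = 1
        return p_depth
    return k1 * p_lidar + k2 * p_depth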
In one embodiment, S208 may include: the terminal calculates a first matching cost between the current target detection frame and the history tracking target detection frame, and a second matching cost between the current fusion point cloud cluster and the history tracking fusion point cloud cluster; the current fusion point cloud cluster is a point cloud cluster fused between the target depth point cloud cluster and the radar point cloud cluster; weighting and summing the first matching cost and the second matching cost to obtain a comprehensive matching cost; determining a comprehensive matching result based on the comprehensive matching cost; the comprehensive matching result is used for representing the matching condition between the current target detection frame and the history tracking target detection frame and between the current fusion point cloud cluster and the history tracking fusion point cloud cluster; and determining tracking information for tracking the target object according to the comprehensive matching result, and tracking the target object according to the tracking information.
In one embodiment, determining the tracking information for tracking the target object based on the comprehensive matching result includes: when the comprehensive matching result is determined based on a comprehensive matching cost smaller than the preset loss threshold, the terminal updates the state of the historically tracked target detection frame and fusion point cloud cluster according to the current target detection frame and the current fusion point cloud cluster to obtain the tracking information. When the comprehensive matching result is based on a comprehensive matching cost larger than or equal to the preset loss threshold and a detected target cannot be matched to any existing tracked target, the current number and the period of appearance are recorded, and a new track is generated when this period exceeds the generation time. When the comprehensive matching result is based on a comprehensive matching cost larger than or equal to the preset loss threshold and an existing tracked target cannot be matched to any detected target, the current number and the period of disappearance are recorded; the current track is deleted when this period exceeds the deletion time, and otherwise it is kept. For example, the generation time may be set to 2 seconds and the deletion time to 3 seconds.
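A hedged sketch of the comprehensive matching cost and the track life-cycle rules above; the weights and the structure of the TrackState class are illustrative assumptions, while the 2 s generation time and 3 s deletion time are the example values from the text:

```python
def comprehensive_cost(cost_2d, cost_3d, w_2d=0.5, w_3d=0.5):
    """Weighted sum of the 2D detection-frame matching cost and the
    3D fused-point-cloud-cluster matching cost (weights are illustrative)."""
    return w_2d * cost_2d + w_3d * cost_3d

class TrackState:
    """Minimal life-cycle bookkeeping for one tracked target."""
    def __init__(self, birth_time=2.0, death_time=3.0):
        self.birth_time = birth_time   # seconds a candidate must persist before a track is created
        self.death_time = death_time   # seconds a track may stay unmatched before deletion
        self.unmatched_for = 0.0
        self.observed_for = 0.0
        self.confirmed = False

    def step(self, matched, dt):
        """Advance the track by dt seconds; returns False if the track should be deleted."""
        if matched:
            self.unmatched_for = 0.0
            self.observed_for += dt
            if self.observed_for >= self.birth_time:
                self.confirmed = True          # generate a new (confirmed) track
        else:
            self.unmatched_for += dt
        return self.unmatched_for <= self.death_time
```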
In the above embodiment, the depth point cloud, the color map and the radar point cloud collected from the target environment are acquired; density clustering is carried out on the depth point cloud and the radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters; the target object is detected in the color map to obtain a detection frame with a semantic tag; and the detection frame is matched with the depth point cloud clusters and the radar point cloud clusters respectively to obtain a target depth point cloud cluster and a radar point cloud cluster which intersect the target detection frame. A 2D-and-3D multi-sensor tracking scheme is thus constructed: the target object is tracked based on the target detection frame and a fusion point cloud cluster between the target depth point cloud cluster and the radar point cloud cluster, and the accuracy of the tracking system is improved through the combined 2D and 3D tracking mode.
In one embodiment, as shown in fig. 4, density clustering of the depth point cloud and the radar point cloud may specifically include:
S402, selecting depth points and radar points from the depth point cloud and the radar point cloud as the target depth point and the target radar point.
S404, determining a first distance between the target depth point and the depth point neighborhood and a second distance between the target radar point and the radar point neighborhood.
Wherein the depth point neighborhood is a set of at least core depth points in the depth point cloud, and depth points with a distance from the core depth points less than a distance threshold may be included in the set. A radar point neighborhood is a set of at least core radar points in the Lei Dadian cloud, which may also include radar points that have a distance to the core radar points that is less than a distance threshold. The first distance refers to a distance between the target depth point and a core depth point (depth core object) in the vicinity of the depth point. The second distance refers to a distance between the target radar point and a core radar point (radar core object) in the vicinity of the radar point.
In one embodiment, prior to S404, the terminal determines a core depth point and a core radar point in the depth point cloud and the radar point cloud.
S406, when the first distance is smaller than the distance threshold, adding the target depth point to the depth point neighborhood; and adding the target radar point to the radar point neighborhood when the second distance is less than the distance threshold.
S408, traversing the depth points in the depth point cloud and the radar points in the radar point cloud until the depth points in the depth point cloud and the radar points in the radar point cloud are added into corresponding depth point neighborhood and radar point neighborhood, and obtaining a depth point cloud cluster and a radar point cloud cluster according to the depth point neighborhood and the radar point neighborhood.
Density clustering (DBSCAN) describes how tight a sample set is based on a set of neighborhoods, using the parameters (ε, MinPts) to describe how tightly the samples of a neighborhood are distributed: ε is the neighborhood distance threshold of a sample, and MinPts is the threshold on the number of samples whose distance to a given sample is at most ε. For example, according to the point cloud distribution of the specific application, the two neighborhood parameters may be taken as ε = 0.05 m and MinPts = 20 for the depth camera point cloud, and usually ε = 0.05 m and MinPts = 3 for the lidar point cloud.
Assuming the sample set is D = (x1, x2, ..., xm), the density concepts of DBSCAN are defined as follows:
ε-neighborhood: for xj ∈ D, its ε-neighborhood contains the subset of samples in D whose distance to xj is no more than ε, that is, Nε(xj) = {xi ∈ D | distance(xi, xj) ≤ ε}.
Core object: for any sample xj ∈ D, if its ε-neighborhood Nε(xj) contains at least MinPts samples, that is, if |Nε(xj)| ≥ MinPts, then xj is a core object.
Directly density-reachable: if xi lies in the ε-neighborhood of xj and xj is a core object, then xi is said to be directly density-reachable from xj. Note that the converse is not necessarily true, that is, xj cannot be said to be directly density-reachable from xi unless xi is also a core object.
Density-reachable: for xi and xj, if there is a sample sequence p1, p2, ..., pT satisfying p1 = xi and pT = xj, and p(t+1) is directly density-reachable from pt, then xj is said to be density-reachable from xi. That is, density-reachability is transitive. The transfer samples p1, p2, ..., p(T−1) in the sequence are all core objects, since only core objects can make other samples directly density-reachable. Note that density-reachability is not symmetric, which follows from the asymmetry of direct density-reachability.
Density-connected: for xi and xj, if there is a core object sample xk such that both xi and xj are density-reachable from xk, then xi and xj are said to be density-connected. Note that the density-connected relation is symmetric.
The steps of the DBSCAN clustering algorithm may be:
Input: sample set D = (x1, x2, ..., xm), neighborhood parameters (ε, MinPts), sample distance measure.
Output: cluster partition C.
1) Initialize the core object set Ω = ∅, the cluster number k = 0, the unvisited sample set Γ = D, and the cluster partition C = ∅.
2) For j = 1, 2, ..., m, find all core objects as follows:
a) find the ε-neighborhood sub-sample set Nε(xj) of sample xj using the distance measure;
b) if the number of samples in this sub-sample set satisfies |Nε(xj)| ≥ MinPts, add sample xj to the core object set: Ω = Ω ∪ {xj}.
3) If the core object set Ω = ∅, the algorithm ends; otherwise, go to step 4.
4) In the core object set Ω, randomly select a core object o, initialize the current cluster core object queue Ωcur = {o}, initialize the class number k = k + 1, initialize the current cluster sample set Ck = {o}, and update the unvisited sample set Γ = Γ − {o}.
5) If the current cluster core object queue Ωcur = ∅, the current cluster Ck has been generated; update the cluster partition C = {C1, C2, ..., Ck}, update the core object set Ω = Ω − Ck, and go to step 3. Otherwise update the core object set Ω = Ω − Ck.
6) Take a core object o' out of the current cluster core object queue Ωcur, find its ε-neighborhood sub-sample set Nε(o') using the neighborhood distance threshold ε, let Δ = Nε(o') ∩ Γ, update the current cluster sample set Ck = Ck ∪ Δ, update the unvisited sample set Γ = Γ − Δ, update Ωcur = Ωcur ∪ (Δ ∩ Ω) − {o'}, and go to step 5.
The output result is: cluster partition C = {C1, C2, ..., Ck}.
In the above embodiment, the depth point cloud, the color map and the radar point cloud collected from the target environment are acquired, and density clustering is carried out on the depth point cloud and the radar point cloud respectively to obtain depth point cloud clusters and radar point cloud clusters. A 2D-and-3D multi-sensor tracking scheme is thus constructed, and the accuracy of the tracking system is improved through the combined 2D and 3D tracking mode.
In one embodiment, as shown in fig. 5, determining the target depth point cloud cluster according to the size of the intersection ratio includes:
S502, comparing each intersection ratio with a preset intersection ratio upper limit value and a preset intersection ratio lower limit value.
S504, when a first target intersection ratio is larger than the preset intersection ratio upper limit value, determining the depth point cloud cluster corresponding to the first target intersection ratio as the target depth point cloud cluster; the first target intersection ratio belongs to at least one of the intersection ratios.
S506, when a second target intersection ratio is larger than the preset intersection ratio lower limit value and smaller than the preset intersection ratio upper limit value, clustering the depth point cloud clusters corresponding to the second target intersection ratio to obtain the target depth point cloud cluster; the second target intersection ratio belongs to at least two of the intersection ratios.
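A small sketch of the decision rule in S502–S506; the upper and lower intersection ratio limits are placeholder values, and the re-clustering of the partially overlapping clusters is only indicated by a comment:

```python
def select_depth_clusters(ious, clusters, iou_upper=0.7, iou_lower=0.3):
    """Pick the target depth point cloud cluster(s) from the intersection ratios
    between the first target detection frame and each projected cluster."""
    confident = [c for i, c in zip(ious, clusters) if i > iou_upper]
    ambiguous = [c for i, c in zip(ious, clusters) if iou_lower < i <= iou_upper]
    if confident:
        return confident           # ratio above the upper limit: take the cluster directly
    if len(ambiguous) >= 2:
        # ratios between the limits: these clusters are clustered again
        # (e.g. by the semi-supervised clustering described below) to form the target cluster
        return ambiguous
    return []
```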
In one embodiment, as shown in fig. 6, which is a schematic diagram of semi-supervised clustering, clustering the depth point cloud clusters corresponding to the second target intersection ratio may include: for a given data set D, the number of clusters l, the must-link constraint set ML and the cannot-link constraint set CL are given, together with their respective penalty constraint sets {k_ij} and {k'_ij}. On the basis of PCKMeans, a label constraint set L is added, and the pairwise-constraint initialization strategy of the original algorithm is changed to a label-constraint structure, so that the points belonging to the target depth point cloud cluster, namely the set P_Y, can be further extracted through these constraints, and the target depth point cloud cluster is obtained (semi-supervised clustering).
Label constraint: the points falling inside the circles centered at p_Y and p_N with radii r_Y and r_N are labeled Y and N respectively, i.e. the green and blue points shown in the figure. Pairwise constraint: a point of the projected shadow region that lies inside the target frame Bo_i forms a must-link (ML) pair with P_Y, and otherwise a cannot-link (CL) pair; the pairwise constraints of P_N are obtained in the same way. The distance from a point to p_Y is used as the penalty weight, where p_i is a point of Pcl_i and L_max is the maximum distance from a point of Pcl_i to p_Y.
In the above embodiment, the target depth point cloud cluster is determined according to the size of the intersection ratio, and each intersection ratio is compared with the preset intersection ratio upper limit value and the preset intersection ratio lower limit value. The target depth point cloud cluster is thus obtained accurately, a 2D-and-3D multi-sensor tracking scheme is constructed, and the accuracy of the tracking system is improved through the combined 2D and 3D tracking mode.
It should be understood that, although the steps in the flowcharts of FIGS. 2a-2b and 4-5 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 2a-2b and 4-5 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; the order in which these sub-steps or stages are performed is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided a tracking device for a target object, the tracking device for a target object specifically includes: an acquisition module 702, a cluster detection module 704, a matching module 706, a tracking module 708; wherein:
an acquisition module 702, configured to acquire a depth point cloud, a color map, and a radar point cloud collected from a target environment;
The cluster detection module 704 is configured to perform density clustering on the depth point cloud and the radar point cloud to obtain a depth point cloud cluster and a radar point cloud cluster; detecting a target object in the color chart to obtain a detection frame with a semantic tag;
the matching module 706 is configured to match the detection frame with a depth point cloud cluster and a radar point cloud cluster, respectively, to obtain a target depth point cloud cluster and a radar point cloud cluster that have an intersection with the target detection frame;
the tracking module 708 is configured to track the target object based on the target detection frame and a fused point cloud cluster between the target depth point cloud cluster and the radar point cloud cluster.
In one embodiment, the cluster detection module 704 is further configured to select a depth point and a radar point from the depth point cloud and the radar point cloud as a target depth point and a target radar point; determine a first distance between the target depth point and a depth point neighborhood and a second distance between the target radar point and a radar point neighborhood; add the target depth point to the depth point neighborhood when the first distance is smaller than the distance threshold, and add the target radar point to the radar point neighborhood when the second distance is smaller than the distance threshold; and traverse the depth points in the depth point cloud and the radar points in the radar point cloud until the depth points in the depth point cloud and the radar points in the radar point cloud are added into the corresponding depth point neighborhood and radar point neighborhood, and obtain a depth point cloud cluster and a radar point cloud cluster according to the depth point neighborhood and the radar point neighborhood.
In one embodiment, the cluster detection module 704 is further configured to detect a target object in the color chart by using the target detection model to perform framing, so as to obtain a detection frame; determining the behavior state of the target object, and generating a multi-level semantic tag according to the behavior state; outputting a detection frame with multi-level semantic tags.
In one embodiment, the target detection frame comprises a first target detection frame and a second target detection frame; the matching module 706 is further configured to project the depth point cloud cluster onto a plane where the color map is located, so as to obtain a depth point cloud projection based on the depth point cloud cluster and a first target detection frame; calculating the intersection ratio between the first target detection frame and the depth point cloud projection; determining a target depth point cloud cluster according to the size of the cross ratio; projecting the radar point cloud clusters to a plane where the color map is located to obtain radar point cloud projections and a second target detection frame, wherein an intersection area exists between each radar point cloud projection and the second target detection frame; and determining the radar point cloud cluster with the largest intersection area as a target radar point cloud cluster.
In one embodiment, the matching module 706 is further configured to compare each intersection ratio with a preset intersection ratio upper limit value and a preset intersection ratio lower limit value; when a first target intersection ratio is larger than the preset intersection ratio upper limit value, determine the depth point cloud cluster corresponding to the first target intersection ratio as a target depth point cloud cluster, the first target intersection ratio belonging to at least one of the intersection ratios; and when a second target intersection ratio is larger than the preset intersection ratio lower limit value and smaller than the preset intersection ratio upper limit value, cluster the depth point cloud clusters corresponding to the second target intersection ratio to obtain the target depth point cloud cluster, the second target intersection ratio belonging to at least two of the intersection ratios.
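One possible reading of this two-threshold rule is sketched below. The threshold values 0.7 and 0.3 are placeholders, and merging the borderline clusters with a simple vertical stack stands in for whatever re-clustering step an implementation would actually use:

```python
import numpy as np

def select_target_depth_clusters(clusters, ratios, upper=0.7, lower=0.3):
    # Clusters whose intersection ratio exceeds the upper limit are
    # accepted directly; clusters whose ratio lies between the lower
    # and upper limits are merged into a single target cluster.
    accepted = [c for c, r in zip(clusters, ratios) if r > upper]
    borderline = [c for c, r in zip(clusters, ratios) if lower < r <= upper]
    if len(borderline) >= 2:
        accepted.append(np.vstack(borderline))
    return accepted
```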
In one embodiment, as shown in FIG. 8, the target detection frame includes a first target detection frame and a second target detection frame; the apparatus further comprises:
the selection module 710 is configured to, when the first target detection frame is matched with the target depth point cloud cluster and the second target detection frame is matched with the target radar point cloud cluster, track the target object based on the first target detection frame and the target depth point cloud cluster, or track the target object based on the second target detection frame and the target radar point cloud cluster.
the fusion module 712 is configured to fuse the target depth point cloud cluster and the target radar point cloud cluster to obtain the fusion point cloud cluster when the Euclidean distance between the target depth point cloud cluster and the target radar point cloud cluster is smaller than a preset distance threshold.
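As a sketch only: one simple way to apply the distance test is to compare cluster centroids, as below. The centroid criterion and the 0.5 m threshold are assumptions; the embodiment only requires the Euclidean distance to be below a preset threshold:

```python
import numpy as np

def fuse_clusters(depth_cluster, radar_cluster, dist_threshold=0.5):
    # Fuse the two (N, 3) clusters when their centroids are closer
    # than dist_threshold; otherwise report that no fusion happened.
    d = np.linalg.norm(depth_cluster.mean(axis=0) - radar_cluster.mean(axis=0))
    if d < dist_threshold:
        return np.vstack([depth_cluster, radar_cluster])
    return None
```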
In one embodiment, the tracking module 708 is further configured to calculate a first matching cost between the current target detection frame and a historically tracked target detection frame, and a second matching cost between the current fusion point cloud cluster and a historically tracked fusion point cloud cluster, the current fusion point cloud cluster being the point cloud cluster fused from the target depth point cloud cluster and the target radar point cloud cluster; weight and sum the first matching cost and the second matching cost to obtain a comprehensive matching cost; determine a comprehensive matching result based on the comprehensive matching cost, the comprehensive matching result representing the matching condition between the current target detection frame and the historically tracked target detection frame and between the current fusion point cloud cluster and the historically tracked fusion point cloud cluster; and determine tracking information for tracking the target object according to the comprehensive matching result, and track the target object according to the tracking information.
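For illustration, the weighted-cost association can be sketched with a standard assignment solver, assuming the two cost matrices have already been computed (for example from detection-frame IoU and cluster centroid distance); the equal weights and the use of the Hungarian algorithm are assumptions of this sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def comprehensive_match(box_cost, cloud_cost, w_box=0.5, w_cloud=0.5):
    # box_cost and cloud_cost are (num_tracks, num_detections) matrices:
    # the 2D cost between historically tracked and current detection frames,
    # and the 3D cost between historically tracked and current fusion clusters.
    cost = w_box * box_cost + w_cloud * cloud_cost
    rows, cols = linear_sum_assignment(cost)  # globally optimal assignment
    return list(zip(rows, cols)), cost[rows, cols]
```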
In one embodiment, the tracking module 708 is further configured to, when the comprehensive matching result is determined based on a comprehensive matching cost smaller than a preset loss threshold, update the states of the historically tracked target detection frame and fusion point cloud cluster according to the current target detection frame and the current fusion point cloud cluster to obtain the tracking information.
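A minimal sketch of this gated state update is shown below; the dictionary track representation, the 0.8 loss threshold, and the plain overwrite (in place of, say, a filter-based update) are assumptions for illustration only:

```python
def update_tracks(matches, costs, tracks, detections, loss_threshold=0.8):
    # Accept a match only when its comprehensive cost is below the
    # loss threshold, then refresh the tracked state from the current
    # detection frame and fusion point cloud cluster.
    for (t_idx, d_idx), c in zip(matches, costs):
        if c < loss_threshold:
            tracks[t_idx]["box"] = detections[d_idx]["box"]
            tracks[t_idx]["cloud"] = detections[d_idx]["cloud"]
    return tracks
```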
In the above embodiment, the depth point cloud, the color map, and the radar point cloud collected from the target environment are acquired; density clustering is performed on the depth point cloud and the radar point cloud respectively to obtain a depth point cloud cluster and a radar point cloud cluster; the target object in the color map is detected to obtain a detection frame with a semantic tag; the detection frame is matched with the depth point cloud cluster and the radar point cloud cluster respectively to obtain a target depth point cloud cluster and a target radar point cloud cluster that have an intersection with the target detection frame; a 2D-and-3D multi-sensor tracking scheme is thereby constructed, the target object is tracked based on the target detection frame and the fusion point cloud cluster between the target depth point cloud cluster and the target radar point cloud cluster, and the accuracy of the tracking system is improved through this combined 2D-and-3D tracking mode.
For specific limitations of the tracking device for the target object, reference may be made to the above limitations of the tracking method for the target object, which are not repeated here. Each module in the above tracking device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal or a server, and in this embodiment, the computer device is taken as an example of a terminal, and the internal structure thereof may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of tracking an object. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in the above-described method embodiments.
Those skilled in the art will appreciate that all or part of the flows of the methods in the above embodiments may be implemented by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of these technical features, they should be considered as falling within the scope of this specification.
The foregoing embodiments represent only a few implementations of the present application, and their descriptions are relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (12)

1. A method of tracking a target, the method comprising:
acquiring a depth point cloud, a color map, and a radar point cloud collected from a target environment;
respectively carrying out density clustering on the depth point cloud and the radar point cloud to obtain a depth point cloud cluster and a radar point cloud cluster; detecting the target object in the color map to obtain a detection frame with a semantic tag;
matching the detection frame with the depth point cloud cluster and the radar point cloud cluster respectively to obtain a target depth point cloud cluster and a target radar point cloud cluster which are intersected with the target detection frame;
and tracking the target object based on the target detection frame and a fusion point cloud cluster between the target depth point cloud cluster and the target radar point cloud cluster.
2. The method of claim 1, wherein the performing density clustering on the depth point cloud and the radar point cloud to obtain a depth point cloud cluster and a radar point cloud cluster respectively comprises:
selecting depth points and radar points from the depth point cloud and the radar point cloud as target depth points and target radar points;
determining a first distance between the target depth point and a depth point neighborhood and a second distance between the target radar point and a radar point neighborhood;
when the first distance is smaller than a distance threshold, adding the target depth point to the depth point neighborhood; and adding the target radar point to the radar point neighborhood when the second distance is less than the distance threshold;
traversing the depth points in the depth point cloud and the radar points in the radar point cloud until the depth points in the depth point cloud and the radar points in the radar point cloud are added into the corresponding depth point neighborhood and the corresponding radar point neighborhood, and obtaining a depth point cloud cluster and a radar point cloud cluster according to the depth point neighborhood and the radar point neighborhood.
3. The method of claim 1, wherein detecting the object in the color map to obtain a detection box with a semantic tag comprises:
detecting and framing the target object in the color map by using a target detection model to obtain a detection frame;
determining the behavior state of the target object, and generating a multi-level semantic tag according to the behavior state;
outputting a detection frame with the multi-level semantic tags.
4. The method of claim 1, wherein the target detection frame comprises a first target detection frame and a second target detection frame;
the step of matching the detection frame with the depth point cloud cluster and the radar point cloud cluster respectively to obtain a target depth point cloud cluster and a target radar point cloud cluster which have intersection with the target detection frame comprises the following steps:
projecting the depth point cloud cluster to a plane where the color map is located, so as to obtain a depth point cloud projection based on the depth point cloud cluster and the first target detection frame; calculating the intersection ratio between the first target detection frame and the depth point cloud projection; determining a target depth point cloud cluster according to the size of the intersection ratio;
projecting the radar point cloud cluster onto a plane where the color map is located to obtain radar point cloud projections and the second target detection frame, wherein an intersection area exists between each radar point cloud projection and the second target detection frame; and determining the radar point cloud cluster with the largest intersection area as a target radar point cloud cluster.
5. The method of claim 4, wherein the determining the target depth point cloud cluster according to the size of the intersection ratio comprises:
comparing each intersection ratio with a preset intersection ratio upper limit value and a preset intersection ratio lower limit value;
when a first target intersection ratio is larger than the preset intersection ratio upper limit value, determining a depth point cloud cluster corresponding to the first target intersection ratio as a target depth point cloud cluster; the first target intersection ratio belongs to at least one of the intersection ratios;
when a second target intersection ratio is larger than the preset intersection ratio lower limit value and smaller than the preset intersection ratio upper limit value, clustering the depth point cloud clusters corresponding to the second target intersection ratio to obtain a target depth point cloud cluster; the second target intersection ratio belongs to at least two of the intersection ratios.
6. The method of claim 1, wherein the target detection frame comprises a first target detection frame and a second target detection frame; the method further comprises the steps of:
when the first target detection frame is matched with the target depth point cloud cluster and the second target detection frame is matched with the target radar point cloud cluster, tracking the target object based on the first target detection frame and the target depth point cloud cluster; or,
tracking the target object based on the second target detection frame and the target radar point cloud cluster.
7. The method according to claim 1, wherein the method further comprises:
when the Euclidean distance between the target depth point cloud cluster and the target radar point cloud cluster is smaller than a preset distance threshold, fusing the target depth point cloud cluster and the target radar point cloud cluster to obtain the fusion point cloud cluster.
8. The method of claim 1, wherein the tracking the target object based on the target detection frame and the fusion point cloud cluster between the target depth point cloud cluster and the target radar point cloud cluster comprises:
calculating a first matching cost between a current target detection frame and a historically tracked target detection frame, and a second matching cost between a current fusion point cloud cluster and a historically tracked fusion point cloud cluster; the current fusion point cloud cluster is the point cloud cluster fused from the target depth point cloud cluster and the target radar point cloud cluster;
weighting and summing the first matching cost and the second matching cost to obtain a comprehensive matching cost;
determining a comprehensive matching result based on the comprehensive matching cost; the comprehensive matching result is used for representing the matching condition between the current target detection frame and the historically tracked target detection frame, and between the current fusion point cloud cluster and the historically tracked fusion point cloud cluster;
And determining tracking information for tracking the target object according to the comprehensive matching result, and tracking the target object according to the tracking information.
9. The method of claim 8, wherein the determining tracking information for tracking the target object according to the comprehensive matching result comprises:
when the comprehensive matching result is determined based on a comprehensive matching cost smaller than a preset loss threshold, updating the states of the historically tracked target detection frame and fusion point cloud cluster according to the current target detection frame and the current fusion point cloud cluster to obtain the tracking information.
10. A tracking device for an object, the device comprising:
the acquisition module is used for acquiring a depth point cloud, a color map, and a radar point cloud collected from the target environment;
the clustering detection module is used for respectively carrying out density clustering on the depth point cloud and the radar point cloud to obtain a depth point cloud cluster and a radar point cloud cluster; detecting the target object in the color map to obtain a detection frame with a semantic tag;
the matching module is used for respectively matching the detection frame with the depth point cloud cluster and the radar point cloud cluster to obtain a target depth point cloud cluster and a target radar point cloud cluster that have an intersection with the target detection frame;
And the tracking module is used for tracking the target object based on the target detection frame and the fusion point cloud cluster between the target depth point cloud cluster and the radar point cloud cluster.
11. A computer device comprising a memory storing a computer program and a processor, characterized in that the processor is adapted to implement the steps of the method of any of claims 1 to 9 when the computer program is invoked and executed.
12. A computer readable storage medium storing a computer program, characterized in that the computer program when invoked and executed by a processor realizes the steps of the method according to any one of claims 1 to 9.
CN202111542820.XA 2021-12-16 2021-12-16 Target tracking method, device, computer equipment and storage medium Pending CN116266359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111542820.XA CN116266359A (en) 2021-12-16 2021-12-16 Target tracking method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111542820.XA CN116266359A (en) 2021-12-16 2021-12-16 Target tracking method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116266359A true CN116266359A (en) 2023-06-20

Family

ID=86743149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111542820.XA Pending CN116266359A (en) 2021-12-16 2021-12-16 Target tracking method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116266359A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117425154A (en) * 2023-10-19 2024-01-19 南京邮电大学 Radar network connectivity optimization and target tracking method for position privacy protection
CN117671296A (en) * 2023-12-19 2024-03-08 珠海市欧冶半导体有限公司 Target tracking method, apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination