CN114332158A - 3D real-time multi-target tracking method based on camera and laser radar fusion - Google Patents

3D real-time multi-target tracking method based on camera and laser radar fusion

Info

Publication number
CN114332158A
Authority
CN
China
Prior art keywords
dimensional
track
camera
target object
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111553630.8A
Other languages
Chinese (zh)
Other versions
CN114332158B (en)
Inventor
王西洋
傅春耘
赖颖
李占坤
何嘉伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202111553630.8A
Publication of CN114332158A
Application granted
Publication of CN114332158B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The invention relates to a 3D real-time multi-target tracking method based on camera and laser radar fusion, which belongs to the field of environmental perception and comprises the following steps: S1: obtaining two-dimensional information and three-dimensional information of the target object at each moment; S2: fusing to obtain the target objects detected by both the laser radar sensor and the camera sensor, the target objects detected only by the laser radar and the target objects detected only by the camera; S3: matching the target objects detected by both sensors with the three-dimensional tracks; S4: matching the target objects detected only by the laser radar with the remaining unmatched three-dimensional tracks; S5: matching the target objects detected only by the camera with the two-dimensional tracks; S6: matching the three-dimensional tracks with the two-dimensional tracks; S7: managing the tracks.

Description

3D real-time multi-target tracking method based on camera and laser radar fusion
Technical Field
The invention belongs to the field of environmental perception, and relates to a 3D real-time multi-target tracking method based on camera and laser radar fusion.
Background
At present, the mainstream multi-target tracking methods follow the tracking-by-detection paradigm, whose process comprises two steps: 1) target detection; 2) data association. As the precision of target detection keeps increasing, tracking accuracy improves correspondingly; nevertheless, many difficulties remain in data association, the most important stage of multi-target tracking, and overcoming the missed tracking and false tracking caused by inaccurate detection and occlusion is still challenging. The existing multi-target tracking methods mainly comprise camera-based target tracking and laser-radar-based target tracking.
The camera-based methods use the information of a target object on the RGB image, generally including appearance information, motion information and the like, to accomplish the object similarity association task. Camera-based multi-target tracking is typically 2D, i.e. tracking on the image plane, although 3D tracking that extracts depth with a binocular camera also exists. The mainstream approach is to extract target information with a target detection algorithm, predict through Kalman filtering, and then compute a cost matrix between the objects of the previous frame and the next frame, such as the intersection-over-union, the Euclidean distance or the Mahalanobis distance, and finally perform matching association with the Hungarian algorithm or a greedy algorithm. Later, researchers proposed single-stage tracking frameworks (joint detection and tracking models), arguing that detection and tracking can be performed simultaneously, and provided tracking models that learn target detection and appearance embedding in a shared structure, which not only helps to prevent false tracking caused by missed detection but also alleviates missed tracking, and obtains better results.
Lidar-based methods naturally provide depth information and thus facilitate 3D tracking. As deep learning has made major breakthroughs in processing lidar point cloud data, lidar-based 3D tracking is becoming increasingly popular. However, lidar point cloud data lacks the pixel information available to vision, and although progress has recently been made in point cloud feature extraction, appearance features based on point clouds are generally less accurate than those based on vision.
Disclosure of Invention
In view of this, the invention aims to provide a 3D real-time multi-target tracking method based on the fusion of a camera and a laser radar.
In order to achieve the purpose, the invention provides the following technical scheme:
The invention discloses a 3D real-time multi-target tracking method based on camera and laser radar fusion, which comprises the following steps:
s1: the method comprises the steps that a camera-based 2D detector is used for obtaining two-dimensional information of a target object at each moment, and a laser radar-based 3D detector is used for obtaining three-dimensional information of the target object at each moment;
s2: three-dimensional information of a target object obtained by detection based on a laser radar is projected onto an image plane through coordinate system conversion, and is fused with the target object obtained by detection based on a camera, so that the target object detected by two sensors at the same time, the target object detected only by the laser radar and the target object detected only by the camera are obtained;
s3: matching the target objects detected by both sensors at time t with the three-dimensional tracks predicted from time t-1 to time t by Kalman filtering, to obtain successfully matched tracks and unmatched tracks;
s4: matching the target object detected only by the laser radar with the remaining unmatched three-dimensional tracks;
s5: matching the target objects detected by the camera at time t with the two-dimensional tracks predicted from time t-1 to time t by Kalman filtering, to obtain the two-dimensional tracks at time t;
s6: projecting the three-dimensional track at the time t to an image plane through coordinate system transformation to match with the corresponding two-dimensional track;
s7: initializing unmatched targets detected by both sensors as new confirmed tracks; initializing unmatched target objects detected only by the laser radar as three-dimensional tracks to be confirmed, which are converted into confirmed tracks if matched in the next three consecutive frames; initializing unmatched target objects detected only by the camera as two-dimensional tracks to be confirmed, which are converted into confirmed tracks if matched in the next three consecutive frames; unmatched two-dimensional and three-dimensional tracks are retained for 6 frames.
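By way of non-limiting illustration, the track management rules of step S7 (a tentative track is confirmed after matches in three consecutive frames, and an unmatched track is retained for 6 frames) can be sketched as follows in Python; the class name, the hit/miss counters and the exact deletion condition are assumptions of this sketch rather than requirements of the invention.

class Track:
    """Minimal track record for the lifecycle rules of step S7."""

    def __init__(self, track_id, confirmed=False):
        self.id = track_id
        self.confirmed = confirmed      # targets seen by both sensors start as confirmed tracks
        self.consecutive_hits = 0       # consecutive frames with a successful match
        self.misses = 0                 # consecutive frames without a match

    def mark_matched(self):
        """Called when the track is matched at the current frame."""
        self.consecutive_hits += 1
        self.misses = 0
        if not self.confirmed and self.consecutive_hits >= 3:
            self.confirmed = True       # tentative track confirmed after three consecutive matches

    def mark_missed(self):
        """Called when the track finds no match at the current frame."""
        self.consecutive_hits = 0
        self.misses += 1

    def should_be_deleted(self, max_misses=6):
        """One reading of the 6-frame rule: keep up to 6 unmatched frames, then drop."""
        return self.misses > max_misses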
Further, in step S1, the information of the target object obtained by the 3D detector is represented by a three-dimensional bounding box (x, y, z, w, h, l, θ), where (x, y, z) are the coordinates of the center point of the three-dimensional detection frame in the lidar coordinate system, (w, h, l) are the width, height and length of the three-dimensional detection frame, and θ is the orientation (heading) angle; the target object detected by the 2D detector is represented by a two-dimensional bounding box (x_c, y_c, w, h), where (x_c, y_c) are the coordinates of the center point of the target detection frame in the pixel coordinate system, and (w, h) are the width and height of the detection frame, respectively.
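For illustration only, the detection representations of step S1 can be captured by simple data containers such as the following Python sketch; the field names and the optional confidence score are choices of this sketch, not prescribed by the invention.

from dataclasses import dataclass

@dataclass
class Detection3D:
    """3D detection in the lidar coordinate system: (x, y, z, w, h, l, theta)."""
    x: float
    y: float
    z: float
    w: float
    h: float
    l: float
    theta: float        # orientation (heading) angle
    score: float = 1.0  # detector confidence, not part of the patented representation

@dataclass
class Detection2D:
    """2D detection in the pixel coordinate system: (x_c, y_c, w, h)."""
    xc: float
    yc: float
    w: float
    h: float
    score: float = 1.0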
Further, step S2 specifically includes the following steps:
s21: projecting the three-dimensional bounding box onto the image plane through coordinate system conversion to obtain a two-dimensional bounding box, and then calculating the intersection-over-union with the bounding box of the target object detected by the camera; the intersection-over-union is calculated as:

d_2diou = area(b_3d_2d ∩ b_2d) / area(b_3d_2d ∪ b_2d)

wherein b_3d_2d represents the bounding box obtained by converting the three-dimensional box to two dimensions, and b_2d represents the two-dimensional bounding box of the target object detected by the camera;
s22: the calculated intersection-over-union d_2diou is compared with a threshold σ1; if d_2diou ≥ σ1, the target is considered to be detected by both the laser radar and the camera, and the set of such targets is denoted D_fusion; if d_2diou < σ1, the target is considered to be detected by only one of the two sensors: the objects detected only by the camera are denoted D_only2d, and the objects detected only by the laser radar are denoted D_only3d.
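By way of non-limiting example, steps S21 and S22 can be sketched in Python as follows; the corner-format boxes (x1, y1, x2, y2), the greedy pairing of each projected 3D box with its best-overlapping camera box, and the value of the threshold σ1 are assumptions of this sketch, not limitations of the invention.

import numpy as np

def iou_2d(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def split_detections(projected_3d_boxes, camera_boxes, sigma1=0.5):
    """Split detections into D_fusion, D_only3d and D_only2d (steps S21-S22)."""
    fusion, matched_3d, matched_2d = [], set(), set()
    for i, b3 in enumerate(projected_3d_boxes):
        ious = [iou_2d(b3, b2) for b2 in camera_boxes]
        if not ious:
            continue
        j = int(np.argmax(ious))
        if ious[j] >= sigma1 and j not in matched_2d:
            fusion.append((i, j))            # target seen by both sensors
            matched_3d.add(i)
            matched_2d.add(j)
    only_3d = [i for i in range(len(projected_3d_boxes)) if i not in matched_3d]
    only_2d = [j for j in range(len(camera_boxes)) if j not in matched_2d]
    return fusion, only_3d, only_2d

A call such as split_detections(projected_boxes, camera_boxes, sigma1=0.5) then returns the index pairs playing the role of D_fusion together with the indices of D_only3d and D_only2d.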
Further, step S3 specifically includes:
s31: the target objects detected by both sensors at time t are denoted D_fusion = {D_1, D_2, …, D_n}, and the three-dimensional tracks predicted from time t-1 to time t are denoted T_3D = {T_1, T_2, …, T_m}; the pairwise intersection-over-union and normalized Euclidean distance between D_fusion and T_3D form a cost function d_fusion; the intersection-over-union of a detection box D_i and a predicted track box T_j is calculated as

d_3diou = volume(D_i ∩ T_j) / volume(D_i ∪ T_j)

and the normalized Euclidean distance measures how far apart the two boxes are in three-dimensional space.
S32: if d_fusion is greater than a threshold σ2, the track and the measurement are matched successfully, and the matched track T_matched is updated with the corresponding measurement information; if d_fusion < σ2, the unmatched measurements D_unmatched are initialized as new confirmed tracks and the unmatched tracks T_unmatched enter the next stage of matching.
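The Kalman filtering prediction used in steps S3 and S5 is not specified further in this disclosure; purely as an illustrative sketch, a constant-velocity filter over the centre of the three-dimensional box could look as follows in Python (the state vector, the noise settings and the handling of box size and heading are assumptions of this sketch).

import numpy as np

class KalmanBoxTracker3D:
    """Minimal constant-velocity Kalman filter over a 3D box centre.

    State: [x, y, z, vx, vy, vz]; the box size (w, h, l) and the heading theta
    are simply carried along unchanged (an assumption, not taken from the patent).
    """

    def __init__(self, box):  # box = (x, y, z, w, h, l, theta)
        self.x = np.array([box[0], box[1], box[2], 0.0, 0.0, 0.0])
        self.size = np.array(box[3:7])
        self.P = np.eye(6) * 10.0                           # state covariance
        self.F = np.eye(6)                                  # constant-velocity transition
        self.F[0, 3] = self.F[1, 4] = self.F[2, 5] = 1.0
        self.Q = np.eye(6) * 0.01                           # process noise
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])   # only the centre is observed
        self.R = np.eye(3) * 0.1                            # measurement noise

    def predict(self):
        """Predict the track box at time t from its state at time t-1."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return (*self.x[:3], *self.size)                    # predicted (x, y, z, w, h, l, theta)

    def update(self, box):
        """Correct the state with a matched measurement, as in step S32."""
        z = np.array(box[:3])
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        self.size = np.array(box[3:7])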
Further, step S4 specifically includes:
s41: the unmatched tracks T_unmatched from step S32 are matched with the objects D_only3d detected only by the laser radar using the Hungarian algorithm; the cost function is formed, in the same way as in step S31, from the intersection-over-union and the normalized Euclidean distance between each track and each detection;
S42: the cost function is compared with a threshold; if it is greater than the threshold, the measurement and the track are matched successfully and the track is updated with the corresponding measurement; unmatched measurements are initialized as tracks to be confirmed, and unmatched tracks enter the next stage of matching.
Further, the Hungarian algorithm in step S41 comprises the following steps (see also the illustrative sketch after this list):
s411: subtracting the minimum value of each row of the cost matrix formed by the cost function from every element of that row;
s412: subtracting the minimum value of each column of the resulting matrix from every element of that column;
s413: covering all zero elements in the new matrix with the minimum number of row and column lines; if these lines cannot cover all the zero elements, proceeding to S414, otherwise proceeding to S415;
s414: finding the minimum value among the elements not covered by any line, subtracting it from all uncovered elements, adding it to the elements at the intersections of the covering lines, and returning to S413;
s415: starting the assignment from the row or column with the fewest zero elements until all rows and columns are assigned, which yields the optimal matching scheme.
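By way of non-limiting illustration, the row/column-reduction procedure above computes the same optimal assignment as an off-the-shelf linear assignment solver. The sketch below (Python; the helper name associate, the use of SciPy and the sign convention are assumptions of this sketch, not part of the invention) also gates the assignment with a threshold in the manner of steps S32 and S42, where a larger affinity means a better match.

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(score_matrix, threshold):
    """Assign tracks (rows) to detections (columns) from a 2D affinity matrix.

    score_matrix[i, j] is the affinity between track i and detection j
    (larger is better, e.g. intersection-over-union plus a normalized-distance term).
    """
    score_matrix = np.asarray(score_matrix, dtype=float)
    n_tracks, n_dets = score_matrix.shape
    if n_tracks == 0 or n_dets == 0:
        return [], list(range(n_tracks)), list(range(n_dets))
    rows, cols = linear_sum_assignment(-score_matrix)   # maximise the total affinity
    matches = [(i, j) for i, j in zip(rows, cols) if score_matrix[i, j] > threshold]
    matched_rows = {i for i, _ in matches}
    matched_cols = {j for _, j in matches}
    unmatched_tracks = [i for i in range(n_tracks) if i not in matched_rows]
    unmatched_dets = [j for j in range(n_dets) if j not in matched_cols]
    return matches, unmatched_tracks, unmatched_dets

With a score matrix built from the intersection-over-union and the normalized Euclidean distance, the returned pairs play the role of T_matched, while the two remainder lists play the roles of T_unmatched and D_unmatched.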
Further, step S5 specifically includes:
s51: matching the two-dimensional trajectory in the image plane with the target object only detected by the camera by using a Hungarian algorithm, wherein the calculation step of the cost function is the same as the step S41;
s52: the cost function is compared with a threshold σ3; if it is greater than σ3, the measurement and the two-dimensional track are matched successfully and the two-dimensional track is updated with the corresponding measurement; if it is smaller than σ3, the unmatched measurements are initialized as two-dimensional tracks to be confirmed.
Further, step S6 specifically includes:
s61: the three-dimensional track is transformed and projected onto the image plane through coordinate system conversion to form a two-dimensional track; the three-dimensional track is expressed in the lidar coordinate system and the two-dimensional track in the image coordinate system, and the three-dimensional bounding box bounding_box_3D of each three-dimensional track is converted to the image plane to form a two-dimensional bounding box bounding_box_3D_to_2D as follows:

bounding_box_3D_to_2D = P_rect · R_0 · Tr_velo_to_cam · bounding_box_3D

in the formula, P_rect is the intrinsic projection matrix of the camera, R_0 is the rectifying rotation matrix, and Tr_velo_to_cam is the transformation matrix that converts points from the lidar coordinate system to the camera coordinate system;
s62: the converted bounding box bounding_box_3D_to_2D of the three-dimensional track is matched with the bounding box bounding_box_2D of the two-dimensional track obtained in step S52; the cost function is calculated in the same manner as in step S5, i.e. from the intersection-over-union and the normalized Euclidean distance on the image plane;
S63: the cost value of each successfully matched pair is compared with a threshold σ4; if it is greater than σ4, the two-dimensional track and the three-dimensional track are fused, that is, the two-dimensional track information is updated with the three-dimensional track information.
The invention has the beneficial effects that: the tracking framework can be combined with any of the current mainstream 2D and 3D detectors, fully fuses the characteristics of the camera and the laser radar, and realizes the fusion of 2D and 3D tracks; when a distant target is detected only by the camera, it is tracked in 2D, and once the target enters the detection range of the laser radar sensor, it is tracked in 3D.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the algorithm of the present invention;
fig. 2 is a schematic diagram of the fusion of a three-dimensional trajectory and a two-dimensional trajectory.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are provided for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of the actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Referring to fig. 1 to 2, the present invention provides a 3D real-time multi-target tracking method based on camera and lidar fusion, which includes the following steps:
Step 1: a camera-based 2D detector is used for obtaining two-dimensional information of a target object at each moment, and a laser radar-based 3D detector is used for obtaining three-dimensional information of the target object at each moment; the information of the target object obtained by the 3D detector is represented by a three-dimensional bounding box (x, y, z, w, h, l, θ), where (x, y, z) are the coordinates of the center point of the three-dimensional detection frame in the lidar coordinate system, (w, h, l) are the width, height and length of the three-dimensional detection frame, and θ is the orientation (heading) angle; the target object detected by the 2D detector is represented by a two-dimensional bounding box (x_c, y_c, w, h), where (x_c, y_c) are the coordinates of the center point of the target detection frame in the pixel coordinate system, and (w, h) are the width and height of the detection frame, respectively.
Step 2: three-dimensional information of a target object obtained by detection based on a laser radar is projected onto an image plane through coordinate system conversion and is fused with the target object obtained by detection based on a camera, so that the target object detected by two sensors at the same time, the target object detected only by the laser radar and the target object detected only by the camera can be obtained; the method specifically comprises the following steps:
2-1: the three-dimensional bounding box is converted and projected onto the image plane through coordinate system conversion to obtain a two-dimensional bounding box, and the intersection-over-union with the bounding box of the target object detected by the camera is then calculated as

d_2diou = area(b_3d_2d ∩ b_2d) / area(b_3d_2d ∪ b_2d)

wherein b_3d_2d represents the bounding box obtained by converting the three-dimensional box to two dimensions, and b_2d represents the two-dimensional bounding box of the target object detected by the camera.
2-2: the calculated intersection-over-union d_2diou is compared with a threshold σ1; if d_2diou ≥ σ1, the target is considered to be detected by both the laser radar and the camera, and the set of such targets is denoted D_fusion; if d_2diou < σ1, the target is considered to be detected by only one of the two sensors: the objects detected only by the camera are denoted D_only2d, and the objects detected only by the laser radar are denoted D_only3d.
Step 3: the target objects detected by both sensors at time t are matched with the three-dimensional tracks predicted from time t-1 to time t by Kalman filtering, to obtain successfully matched tracks and unmatched tracks; the method specifically comprises the following steps:
3-1: the target objects detected by both sensors at time t are denoted D_fusion = {D_1, D_2, …, D_n}, and the three-dimensional tracks predicted from time t-1 to time t are denoted T_3D = {T_1, T_2, …, T_m}; the pairwise intersection-over-union and normalized Euclidean distance between D_fusion and T_3D form a cost function d_fusion; the intersection-over-union of a detection box D_i and a predicted track box T_j is calculated as

d_3diou = volume(D_i ∩ T_j) / volume(D_i ∪ T_j)

and the normalized Euclidean distance measures how far apart the two boxes are in three-dimensional space.
3-2: if d_fusion is greater than a threshold σ2, the track and the measurement are matched successfully, and the matched track (T_matched) is updated with the corresponding measurement information; if d_fusion < σ2, the unmatched measurements (D_unmatched) are initialized as new confirmed tracks and the unmatched tracks (T_unmatched) enter the next stage of matching.
Step 4: the target objects detected only by the laser radar are matched with the remaining unmatched three-dimensional tracks; the method specifically comprises the following steps:
4-1: the unmatched tracks from step 3 are matched with the objects detected only by the lidar using the Hungarian algorithm; the cost function is calculated in the same way as in step 3, i.e. from the intersection-over-union and the normalized Euclidean distance between each track and each detection.
The Hungarian algorithm comprises the following steps:
1) subtracting the minimum value of each row of the cost matrix formed by the cost function from every element of that row;
2) subtracting the minimum value of each column of the resulting matrix from every element of that column;
3) covering all zero elements in the new matrix with the minimum number of row and column lines; if these lines cannot cover all the zero elements, proceeding to step 4), otherwise proceeding to step 5);
4) finding the minimum value among the elements not covered by any line, subtracting it from all uncovered elements, adding it to the elements at the intersections of the covering lines, and returning to step 3);
5) starting the assignment from the row or column with the fewest zero elements until all rows and columns are assigned, which yields the optimal matching scheme.
4-2: the cost function is compared with a threshold; if it is greater than the threshold, the measurement and the track are matched successfully and the track is updated with the corresponding measurement; unmatched measurements are initialized as tracks to be confirmed, and unmatched tracks enter the next stage of matching.
Step 5: the target objects detected by the camera at time t are matched with the two-dimensional tracks predicted from time t-1 to time t by Kalman filtering to obtain the two-dimensional tracks at time t; the method specifically comprises the following steps:
5-1: matching a two-dimensional track in an image plane with a target object detected only by a camera by using a Hungarian algorithm;
5-2: the cost function is compared with a threshold σ3; if it is greater than σ3, the measurement and the two-dimensional track are matched successfully and the two-dimensional track is updated with the corresponding measurement; if it is smaller than σ3, the unmatched measurements are initialized as two-dimensional tracks to be confirmed.
Step 6: projecting the three-dimensional track at the time t to an image plane through coordinate system transformation to match with the corresponding two-dimensional track; the method specifically comprises the following steps:
6-1: the three-dimensional track is transformed and projected onto the image plane through coordinate system conversion to form a two-dimensional track; the three-dimensional track is expressed in the lidar coordinate system and the two-dimensional track in the image coordinate system, and the three-dimensional bounding box bounding_box_3D of each three-dimensional track is converted to the image plane to form a two-dimensional bounding box bounding_box_3D_to_2D as follows:

bounding_box_3D_to_2D = P_rect · R_0 · Tr_velo_to_cam · bounding_box_3D

where P_rect is the intrinsic projection matrix of the camera, R_0 is the rectifying rotation matrix, and Tr_velo_to_cam is the transformation matrix that converts points from the lidar coordinate system to the camera coordinate system;
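As a non-limiting sketch of the conversion of step 6-1, the eight corners of a lidar-frame box can be projected with the chain P_rect · R_0 · Tr_velo_to_cam and enclosed by an axis-aligned image-plane box; the corner construction, the z-up heading convention and the assumption that R_0 and Tr_velo_to_cam are padded to 4×4 homogeneous matrices (KITTI-style calibration) are choices of this sketch, not statements of the patent.

import numpy as np

def box3d_corners(x, y, z, w, h, l, theta):
    """Eight corners of a 3D box centred at (x, y, z) in the lidar frame.

    Assumes a z-up lidar frame with the heading theta measured about the z axis.
    """
    dx, dy, dz = l / 2.0, w / 2.0, h / 2.0
    corners = np.array([[ dx,  dy, -dz], [ dx, -dy, -dz], [-dx, -dy, -dz], [-dx,  dy, -dz],
                        [ dx,  dy,  dz], [ dx, -dy,  dz], [-dx, -dy,  dz], [-dx,  dy,  dz]])
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    return corners @ rot.T + np.array([x, y, z])

def project_box_to_image(corners_lidar, P_rect, R0_rect, Tr_velo_to_cam):
    """Project lidar-frame corners into the image and return the enclosing
    2D box (x1, y1, x2, y2), following the chain P_rect · R_0 · Tr_velo_to_cam.

    P_rect is 3x4; R0_rect and Tr_velo_to_cam are assumed to be already padded
    to 4x4 homogeneous matrices.
    """
    pts_h = np.hstack([corners_lidar, np.ones((corners_lidar.shape[0], 1))]).T  # (4, 8)
    img_h = P_rect @ R0_rect @ Tr_velo_to_cam @ pts_h                           # (3, 8)
    img = img_h[:2] / img_h[2:3]                                                # perspective division
    return np.array([img[0].min(), img[1].min(), img[0].max(), img[1].max()])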
6-2: the converted bounding box bounding_box_3D_to_2D of the three-dimensional track is matched with the bounding box bounding_box_2D of the two-dimensional track formed in step 5 using the Hungarian algorithm; the cost function is the same as in step 5;
6-3: the track fusion is illustrated in fig. 2; for the object with track number 2268, the camera already detects it at frame 7 while the lidar does not detect it until frame 33, so the method starts 2D tracking of the object at frame 9 and switches to 3D tracking once the object enters the range of the lidar sensor, i.e. at frame 33.
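As a minimal, non-limiting sketch of the fusion of step 6-3, a matched two-dimensional track record can simply be overwritten with the information carried by its three-dimensional counterpart; the dictionary layout and the choice of which identity is kept are assumptions of this sketch.

def fuse_2d_with_3d(track_2d, track_3d, projected_box_2d):
    """Update a 2D track with the information of its matched 3D track (step 6-3).

    Both tracks are plain dicts here (hypothetical layout). The object keeps a
    single identity when it passes from camera-only coverage into lidar range,
    as for track 2268 in fig. 2.
    """
    track_2d["id"] = track_3d["id"]             # keep one identity across the handover
    track_2d["box_2d"] = projected_box_2d       # image-plane box derived from the 3D track
    track_2d["box_3d"] = track_3d["box_3d"]     # expose the full 3D state as well
    track_2d["tracked_in_3d"] = True            # from now on the object is tracked in 3D
    return track_2d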
Step 7: unmatched targets detected by both sensors are initialized as new confirmed tracks; unmatched target objects detected only by the laser radar are initialized as three-dimensional tracks to be confirmed (converted into confirmed tracks if matched in the next three consecutive frames); unmatched target objects detected only by the camera are initialized as two-dimensional tracks to be confirmed (converted into confirmed tracks if matched in the next three consecutive frames); unmatched two-dimensional and three-dimensional tracks are retained for 6 frames.
finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (8)

1. A 3D real-time multi-target tracking method based on camera and laser radar fusion, characterized in that the method comprises the following steps:
s1: the method comprises the steps that a camera-based 2D detector is used for obtaining two-dimensional information of a target object at each moment, and a laser radar-based 3D detector is used for obtaining three-dimensional information of the target object at each moment;
s2: three-dimensional information of a target object obtained by detection based on a laser radar is projected onto an image plane through coordinate system conversion, and is fused with the target object obtained by detection based on a camera, so that the target object detected by two sensors at the same time, the target object detected only by the laser radar and the target object detected only by the camera are obtained;
s3: matching the target objects detected by both sensors at time t with the three-dimensional tracks predicted from time t-1 to time t by Kalman filtering, to obtain successfully matched tracks and unmatched tracks;
s4: matching the target object detected only by the laser radar with the remaining unmatched three-dimensional tracks;
s5: matching the target objects detected by the camera at time t with the two-dimensional tracks predicted from time t-1 to time t by Kalman filtering, to obtain the two-dimensional tracks at time t;
s6: projecting the three-dimensional track at the time t to an image plane through coordinate system transformation to match with the corresponding two-dimensional track;
s7: initializing unmatched targets detected by both sensors as new confirmed tracks; initializing unmatched target objects detected only by the laser radar as three-dimensional tracks to be confirmed, which are converted into confirmed tracks if matched in the next three consecutive frames; initializing unmatched target objects detected only by the camera as two-dimensional tracks to be confirmed, which are converted into confirmed tracks if matched in the next three consecutive frames; unmatched two-dimensional and three-dimensional tracks are retained for 6 frames.
2. The camera and lidar fusion based 3D real-time multi-target tracking method of claim 1, wherein: in step S1, the information of the target object obtained by the 3D detector is represented by a three-dimensional bounding box (x, y, z, w, h, l, θ), where (x, y, z) are the coordinates of the center point of the three-dimensional detection frame in the lidar coordinate system, (w, h, l) are the width, height and length of the three-dimensional detection frame, and θ is the orientation (heading) angle; the target object detected by the 2D detector is represented by a two-dimensional bounding box (x_c, y_c, w, h), where (x_c, y_c) are the coordinates of the center point of the target detection frame in the pixel coordinate system, and (w, h) are the width and height of the detection frame, respectively.
3. The camera and lidar fusion based 3D real-time multi-target tracking method of claim 1, wherein: step S2 specifically includes the following steps:
s21: projecting the three-dimensional bounding box onto the image plane through coordinate system conversion to obtain a two-dimensional bounding box, and then calculating the intersection-over-union with the bounding box of the target object detected by the camera; the intersection-over-union is calculated as:

d_2diou = area(b_3d_2d ∩ b_2d) / area(b_3d_2d ∪ b_2d)

wherein b_3d_2d represents the bounding box obtained by converting the three-dimensional box to two dimensions, and b_2d represents the two-dimensional bounding box of the target object detected by the camera;
s22: the calculated intersection-over-union d_2diou is compared with a threshold σ1; if d_2diou ≥ σ1, the target is considered to be detected by both the laser radar and the camera, and the set of such targets is denoted D_fusion; if d_2diou < σ1, the target is considered to be detected by only one of the two sensors: the objects detected only by the camera are denoted D_only2d, and the objects detected only by the laser radar are denoted D_only3d.
4. The camera and lidar fusion based 3D real-time multi-target tracking method of claim 1, wherein: step S3 specifically includes:
s31: the target objects detected by both sensors at time t are denoted D_fusion = {D_1, D_2, …, D_n}, and the three-dimensional tracks predicted from time t-1 to time t are denoted T_3D = {T_1, T_2, …, T_m}; the pairwise intersection-over-union and normalized Euclidean distance between D_fusion and T_3D form a cost function d_fusion; the intersection-over-union of a detection box D_i and a predicted track box T_j is calculated as

d_3diou = volume(D_i ∩ T_j) / volume(D_i ∪ T_j)

and the normalized Euclidean distance measures how far apart the two boxes are in three-dimensional space.
S32: if d_fusion is greater than a threshold σ2, the track and the measurement are matched successfully, and the matched track T_matched is updated with the corresponding measurement information; if d_fusion < σ2, the unmatched measurements D_unmatched are initialized as new confirmed tracks and the unmatched tracks T_unmatched enter the next stage of matching.
5. The camera and lidar fusion based 3D real-time multi-target tracking method of claim 4, wherein: step S4 specifically includes:
s41: the unmatched tracks T_unmatched from step S32 are matched with the objects D_only3d detected only by the laser radar using the Hungarian algorithm; the cost function is formed, in the same way as in step S31, from the intersection-over-union and the normalized Euclidean distance between each track and each detection;
S42: the cost function is compared with a threshold; if it is greater than the threshold, the measurement and the track are matched successfully and the track is updated with the corresponding measurement; unmatched measurements are initialized as tracks to be confirmed, and unmatched tracks enter the next stage of matching.
6. The camera and lidar fusion based 3D real-time multi-target tracking method of claim 5, wherein: the Hungarian algorithm in step S41 has the following steps:
s411: subtracting the minimum value of each row of the cost matrix formed by the cost function from every element of that row;
s412: subtracting the minimum value of each column of the resulting matrix from every element of that column;
s413: covering all zero elements in the new matrix with the minimum number of row and column lines; if these lines cannot cover all the zero elements, proceeding to S414, otherwise proceeding to S415;
s414: finding the minimum value among the elements not covered by any line, subtracting it from all uncovered elements, adding it to the elements at the intersections of the covering lines, and returning to S413;
s415: starting the assignment from the row or column with the fewest zero elements until all rows and columns are assigned, which yields the optimal matching scheme.
7. The camera and lidar fusion based 3D real-time multi-target tracking method of claim 6, wherein: step S5 specifically includes:
s51: matching the two-dimensional trajectory in the image plane with the target object only detected by the camera by using a Hungarian algorithm, wherein the calculation step of the cost function is the same as the step S41;
s52: the cost function is compared with a threshold σ3; if it is greater than σ3, the measurement and the two-dimensional track are matched successfully and the two-dimensional track is updated with the corresponding measurement; if it is smaller than σ3, the unmatched measurements are initialized as two-dimensional tracks to be confirmed.
8. The camera and lidar fusion based 3D real-time multi-target tracking method of claim 7, wherein: step S6 specifically includes:
s61: the three-dimensional track is transformed and projected onto the image plane through coordinate system conversion to form a two-dimensional track; the three-dimensional track is expressed in the lidar coordinate system and the two-dimensional track in the image coordinate system, and the three-dimensional bounding box bounding_box_3D of each three-dimensional track is converted to the image plane to form a two-dimensional bounding box bounding_box_3D_to_2D as follows:

bounding_box_3D_to_2D = P_rect · R_0 · Tr_velo_to_cam · bounding_box_3D

in the formula, P_rect is the intrinsic projection matrix of the camera, R_0 is the rectifying rotation matrix, and Tr_velo_to_cam is the transformation matrix that converts points from the lidar coordinate system to the camera coordinate system;
s62: the converted bounding box bounding_box_3D_to_2D of the three-dimensional track is matched with the bounding box bounding_box_2D of the two-dimensional track obtained in step S52; the cost function is calculated in the same manner as in step S5, i.e. from the intersection-over-union and the normalized Euclidean distance on the image plane;
S63: the cost value of each successfully matched pair is compared with a threshold σ4; if it is greater than σ4, the two-dimensional track and the three-dimensional track are fused, that is, the two-dimensional track information is updated with the three-dimensional track information.
CN202111553630.8A 2021-12-17 2021-12-17 3D real-time multi-target tracking method based on fusion of camera and laser radar Active CN114332158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111553630.8A CN114332158B (en) 2021-12-17 2021-12-17 3D real-time multi-target tracking method based on fusion of camera and laser radar


Publications (2)

Publication Number Publication Date
CN114332158A true CN114332158A (en) 2022-04-12
CN114332158B CN114332158B (en) 2024-05-07

Family

ID=81052718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111553630.8A Active CN114332158B (en) 2021-12-17 2021-12-17 3D real-time multi-target tracking method based on fusion of camera and laser radar

Country Status (1)

Country Link
CN (1) CN114332158B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170261604A1 (en) * 2016-03-11 2017-09-14 Raytheon Bbn Technologies Corp. Intercept drone tasked to location of lidar tracked drone
CN110246159A (en) * 2019-06-14 2019-09-17 湖南大学 The 3D target motion analysis method of view-based access control model and radar information fusion
CN110675431A (en) * 2019-10-08 2020-01-10 中国人民解放军军事科学院国防科技创新研究院 Three-dimensional multi-target tracking method fusing image and laser point cloud
CN112346073A (en) * 2020-09-25 2021-02-09 中山大学 Dynamic vision sensor and laser radar data fusion method
CN112487919A (en) * 2020-11-25 2021-03-12 吉林大学 3D target detection and tracking method based on camera and laser radar
CN112561966A (en) * 2020-12-22 2021-03-26 清华大学 Sparse point cloud multi-target tracking method fusing spatio-temporal information
CN113034504A (en) * 2021-04-25 2021-06-25 重庆大学 Plane feature fusion method in SLAM mapping process

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIYANG WANG et al.: "DeepFusionMOT: A 3D Multi-Object Tracking Framework Based on Camera-LiDAR Fusion With Deep Association", IEEE Robotics and Automation Letters, 29 June 2022 (2022-06-29)
Zheng Shaowu; Li Weihua; Hu Jianyao: "Vehicle detection in traffic environment based on fusion of laser point cloud and image information", Chinese Journal of Scientific Instrument (仪器仪表学报), no. 12, 15 December 2019 (2019-12-15)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237401A (en) * 2023-11-08 2023-12-15 北京理工大学前沿技术研究院 Multi-target tracking method, system, medium and equipment for fusion of image and point cloud
CN117237401B (en) * 2023-11-08 2024-02-13 北京理工大学前沿技术研究院 Multi-target tracking method, system, medium and equipment for fusion of image and point cloud
CN117522924A (en) * 2023-11-22 2024-02-06 重庆大学 Depth-associated multi-target tracking method based on detection positioning confidence level guidance
CN117576166A (en) * 2024-01-15 2024-02-20 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar
CN117576166B (en) * 2024-01-15 2024-04-30 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar
CN117934549A (en) * 2024-01-16 2024-04-26 重庆大学 3D multi-target tracking method based on probability distribution guiding data association
CN117934549B (en) * 2024-01-16 2024-07-09 重庆大学 3D multi-target tracking method based on probability distribution guiding data association

Also Published As

Publication number Publication date
CN114332158B (en) 2024-05-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant