CN111583663A

CN111583663A - Monocular perception correction method and device based on sparse point cloud and storage medium

Info

Publication number: CN111583663A
Application number: CN202010338036.6A
Authority: CN
Inventors: 严鑫
Original assignee: Zhejiang Geely Holding Group Co Ltd; Ningbo Geely Automobile Research and Development Co Ltd
Current assignee: Zhejiang Geely Holding Group Co Ltd; Ningbo Geely Automobile Research and Development Co Ltd
Priority date: 2020-04-26
Filing date: 2020-04-26
Publication date: 2020-08-25
Anticipated expiration: 2040-04-26
Also published as: CN111583663B

Abstract

The invention discloses a monocular perception correction method, a monocular perception correction device and a storage medium based on sparse point cloud, wherein the method comprises the following steps: acquiring original camera data of a monocular camera and original sparse point cloud data of a radar sensor; processing original camera data to obtain three-dimensional detection results of a plurality of targets in an image plane, wherein the three-dimensional detection results comprise target depth values and two-dimensional bounding boxes; acquiring a conversion matrix; mapping original sparse point cloud data to corresponding positions of an image plane based on a conversion matrix to obtain a point cloud projection depth map, and setting a point cloud frame for each two-dimensional boundary frame in the point cloud projection depth map, wherein the point cloud projection depth map comprises a plurality of projection points corresponding to the original sparse point cloud data, and each projection point comprises a point cloud depth value; and correcting the target depth values of the multiple targets based on the point cloud depth values of the projection points contained in all the point cloud frames. According to the method, the accuracy of target depth value correction is improved by designing the point cloud frame characteristics.

Description

Monocular perception correction method and device based on sparse point cloud and storage medium

Technical Field

The invention relates to the field of automatic driving, in particular to a monocular perception correction method and device based on sparse point cloud and a storage medium.

Background

The intelligent perception is an important link of automatic driving and is a link of interaction between a vehicle and the environment. At present, mainstream perception sensors comprise cameras, millimeter wave radars, laser radars and the like, but the multi-beam laser radars are very expensive and are not suitable for mass production, and point clouds obtained by the low-beam laser radars and the millimeter wave radars are very sparse and are not suitable for being directly used for perception of three-dimensional obstacles. Compared with a binocular camera, the monocular camera is a relatively inexpensive sensor and has an advantage of being remarkably superior in obstacle detection and tracking, but the monocular camera has a limitation in depth information perception.

In the prior art, a scheme of fusing monocular camera data and sparse point cloud data is often adopted, but in the existing data fusion scheme, a target detection algorithm needs to be executed on the sparse point cloud data, so that under the condition that the complexity of the algorithm is increased, the target depth estimation deviation is large, and the improvement of the detection effect is limited.

Disclosure of Invention

The invention aims to provide a monocular perception method based on sparse point cloud, which is used for solving the problems of high algorithm complexity and large target depth estimation deviation when the depth information of a monocular camera is corrected in the prior art.

In order to achieve the purpose, the invention adopts the technical scheme that:

in a first aspect, an embodiment of the present invention provides a monocular perception method based on sparse point cloud, including:

acquiring original camera data of a monocular camera and original sparse point cloud data of a radar sensor;

processing the original camera data to obtain three-dimensional detection results of a plurality of targets in an image plane, wherein the three-dimensional detection result of each target comprises a target depth value and a two-dimensional bounding box;

acquiring a conversion matrix, wherein the conversion matrix is obtained in advance when the monocular camera and the radar sensor are jointly calibrated;

based on the conversion matrix, mapping the original sparse point cloud data to a corresponding position of the image plane to obtain a point cloud projection depth map, and setting a point cloud frame for each two-dimensional boundary frame in the point cloud projection depth map, wherein the point cloud projection depth map comprises a plurality of projection points corresponding to the original sparse point cloud data, and each projection point comprises a point cloud depth value;

correcting the target depth values of the plurality of targets based on the point cloud depth values of the projection points included in all the point cloud frames.

In a second aspect, an embodiment of the present invention further provides a monocular perception correction device based on a sparse point cloud, including:

the data acquisition module is used for acquiring original camera data of the monocular camera and original sparse point cloud data of the radar sensor;

a first processing module, configured to process the original camera data to obtain three-dimensional detection results of a plurality of targets in an image plane, where the three-dimensional detection result of each target includes a target depth value and a two-dimensional bounding box;

the parameter acquisition module is used for acquiring a conversion matrix, and the conversion matrix is obtained in advance when the monocular camera and the radar sensor are subjected to combined calibration;

the second processing module is used for mapping the original sparse point cloud data to the corresponding position of the image plane based on the conversion matrix to obtain a point cloud projection depth map, and setting a point cloud frame for each two-dimensional boundary frame in the point cloud projection depth map, wherein the point cloud projection depth map comprises a plurality of projection points corresponding to the original sparse point cloud data, and each projection point comprises a point cloud depth value;

and the depth correction module is used for correcting the target depth values of the plurality of targets based on the point cloud depth values of the projection points contained in all the point cloud frames.

In a third aspect, an embodiment of the present invention further provides a computer storage medium, where at least one instruction or at least one program is stored in the computer storage medium, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the monocular sensing method according to the first aspect.

The technical scheme of the invention has the following beneficial effects:

according to the method, the original sparse point cloud data are subjected to spatial transformation and projection through the transformation matrix, the original sparse point cloud data are mapped to the camera image plane, the depth information of the monocular camera is corrected by utilizing the depth information of the point cloud in the same plane, the data fusion is simple, and the algorithm complexity is low; by designing the characteristics of the point cloud frame, a method for screening out the point cloud depth values capable of representing the target is adopted, the accuracy of target depth value correction is improved, the target depth estimation deviation is reduced, and the target detection effect is improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a flowchart of a monocular perception correction method based on a sparse point cloud according to an embodiment of the present invention.

FIG. 2 is an exemplary diagram of a point cloud projection depth map provided by an embodiment of the invention.

Fig. 3 is a flowchart of correcting a target depth value according to an embodiment of the present invention.

FIG. 4 is another exemplary diagram of a point cloud projection depth map provided by an embodiment of the invention.

Fig. 5 to 8 are diagrams illustrating selection of effective boxes according to an embodiment of the present invention.

Fig. 9 is a flowchart of correcting a target depth value according to an embodiment of the present invention.

FIG. 10 is an exemplary diagram of a curve of a relational equation provided by one embodiment of the invention.

Fig. 11 is a block diagram of a monocular perception correcting device based on a sparse point cloud according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The intelligent perception is an important link of automatic driving, the measurement of the distance of the obstacle is one of basic tasks of three-dimensional perception, and the sparse point cloud-based three-dimensional target detection method can lose a lot of outline or detail information due to the sparsity of the point cloud, so that the detection effect is poor, and the missed detection is relatively serious; the monocular camera-based three-dimensional target detection is limited in that the position prediction of the three-dimensional target is directly influenced by the error of the depth estimation. In the existing fusion scheme based on the sparse point cloud and the monocular camera, a target detection algorithm needs to be executed on sparse point cloud data, so that under the condition that the responsibility degree of the algorithm is increased, the target depth estimation deviation is large, and the improvement of the detection effect is limited.

Referring to the attached drawing 1 in the specification, an embodiment of the present invention provides a monocular perception correction method based on sparse point cloud, and the monocular perception correction method may be applied to the monocular perception correction device provided in the embodiment of the present invention, and may also be applied to a vehicle with an automatic driving function. As shown in fig. 1, the monocular perception modification method may include the following steps:

step S101, collecting original camera data of a monocular camera and original sparse point cloud data of a radar sensor.

The radar sensor may include some common radars, such as laser radar and millimeter wave radar of various line beams, and may also include some distance measuring devices, which may generate point cloud data of a spatial position and may be calibrated with a monocular camera. After the raw camera data and the radar sensor data are acquired, delay synchronization processing is also required. The raw camera data and the raw sparse point cloud data may be correlated by time synchronizing the raw camera data and the radar sensor data.

The original camera data and the radar sensor data are synchronized, the acquisition of the camera sensor and the radar sensor can be triggered in a hardware or software triggering mode, data delay is added after the data are acquired, and the purpose of data synchronization is achieved.

Step S102, processing the original camera data to obtain three-dimensional detection results of a plurality of targets in an image plane, wherein the three-dimensional detection result of each target comprises a target depth value and a two-dimensional bounding box.

In the embodiment of the invention, a common target detection framework and a geometric constraint relation of a monocular camera can be combined by utilizing a perception algorithm of the common monocular camera to obtain three-dimensional detection results of a plurality of targets in an image plane, such as RCNN (region with CNN features) series, YOLO (you Only Live one) series, SSD (Single shot Multi Box Detector) series and the like.

In one possible embodiment, in order to directly regress the size, direction and depth information of the target, branches of size estimation, direction estimation and depth estimation can be added in the deep learning network to obtain a target detection network; and processing the original camera data based on a target detection network to obtain three-dimensional detection results of a plurality of targets in an image plane. The three-dimensional detection result may include information such as a two-dimensional bounding box (bounding box) and a depth value of the target, where the two-dimensional bounding box may include a position, a direction, and a size of the two-dimensional bounding box, and the two-dimensional bounding box represents a compact rectangular box containing the target. Various obstacles are detected as much as possible by depending on an advanced deep learning target detection framework, and target loss caused by rear-end target-level fusion due to detection capability difference among sensors is overcome.

Step S103, a conversion matrix is obtained in advance when the monocular camera and the radar sensor are subjected to combined calibration.

Generally, in order to correct the target depth value of each target more accurately, a monocular camera and a radar sensor need to be calibrated in advance, the calibration of the monocular camera and the radar sensor belongs to the spatial synchronization category, and the rotation and the translation of the radar sensor relative to the monocular camera can be obtained through the calibration.

The joint calibration may be performed in two steps, first an internal reference calibration of the monocular camera, and then an external reference calibration of the monocular camera and the radar sensor. The internal reference calibration is to calibrate information such as focal length, principal point, distortion coefficient and the like of the monocular camera, and a conventional checkerboard calibration method can be adopted; the external reference calibration is to calibrate the conversion of rotation, translation and the like between the monocular camera and the radar sensor, and can be solved by using the characteristic points based on a matching method. And after the external parameters are calibrated, storing the calibrated conversion matrix.

And step S104, mapping the original sparse point cloud data to the corresponding position of the image plane based on the conversion matrix to obtain a point cloud projection depth map, and setting a point cloud frame for each two-dimensional boundary frame in the point cloud projection depth map, wherein the point cloud projection depth map comprises a plurality of projection points corresponding to the original sparse point cloud data, and each projection point comprises a point cloud depth value.

The calibration results of the monocular camera and the radar sensor are utilized, the original sparse point cloud data of the radar sensor are mapped to the same image plane of the monocular camera, the high-precision distance measurement advantage of the radar sensor is fully utilized in the same image plane, the radar sensor assists the monocular camera to carry out target detection, the limitation of depth information perception of the monocular camera is solved, a target detection algorithm does not need to be adopted for the original sparse point cloud data, and the algorithm complexity of the target detection is reduced.

In order to further improve the accuracy of target depth information correction and improve the detection effect, projection points which can represent the depth information of a target need to be screened from the projection points (namely point clouds). The position and size of the point cloud frame can be set according to the position and size of the corresponding two-dimensional boundary frame. For example, the position and size of the point cloud frame may be set to be the same as those of the corresponding two-dimensional bounding box, that is, the two-dimensional bounding box itself may be the point cloud frame, or the position and size of the point cloud frame may be set to be different from those of the corresponding two-dimensional bounding box.

In a preferred embodiment, for the point cloud frame in each two-dimensional bounding box in the point cloud projection depth map, the width of the point cloud frame may be set to 1/3 of the width of the two-dimensional bounding box, the height of the point cloud frame may be set to 1/3 of the height of the two-dimensional bounding box, the center point of the point cloud frame may be set to be located directly below the center point of the two-dimensional bounding box, and the distance between the center point of the point cloud frame and the center point of the two-dimensional bounding box may be set to 1/6 of the height of the two-dimensional bounding box.

Referring to the specification, fig. 2 shows an example of a point cloud projection depth map provided by an embodiment of the invention. As shown in fig. 2, the original sparse point cloud data is transformed into a matrix, and a plurality of projection points 3 are mapped in the point cloud projection depth map, so that it is easy to see that the point cloud formed by the plurality of projection points 3 is relatively sparse, and fig. 2 is a result of projection performed by the point cloud data of the 64-line laser radar sensor. In practical application, a millimeter wave radar or a 16-line laser radar is often adopted for distance measurement in order to reduce cost, and point cloud is more sparse. The two-dimensional boundary frame 1 is provided with a point cloud frame 2, and the position and the size of the point cloud frame 2 are set according to the two-dimensional boundary frame 1.

Step S105, correcting the target depth values of the targets based on the point cloud depth values of the projection points contained in all the point cloud frames.

Specifically, the characteristics of the point cloud frame are designed, and the point cloud depth value of the projection point included in the point cloud frame is used for correcting the target depth value of the target detected by the monocular camera, because the measurement accuracy of the radar sensor on the distance between the front obstacle is high, the point cloud data is mainly used as the reference when the target depth value of the target is corrected. The target depth values of the limited targets and the relation equation of the point cloud depth values are fitted by the target depth values and the point cloud depth values, the target depth values are updated to be point cloud depth values, and the target depth values of the other targets are calculated according to the relation equation, so that the target depth values of all the detected targets are corrected, and the depth estimation capability of the monocular camera is improved.

Because of the problems of perception capability or shielding and the like, the original sparse point cloud data may not have the depth information of some targets, so that after the original sparse point cloud is mapped to an image plane, some point cloud frames in the obtained point cloud projection depth map do not contain projection points, or the same projection point covers a plurality of targets, namely two-dimensional boundary frames are overlapped. Therefore, it is necessary to process the overlapped two-dimensional bounding boxes to screen out effective two-dimensional bounding boxes.

Referring to fig. 3 in the specification, fig. 3 illustrates a process of correcting a target depth value according to an embodiment of the present invention. As shown in fig. 3, the step of correcting the target depth values of the plurality of targets based on the point cloud depth values of the projection points included in all the point cloud frames may include:

step S301, divide all two-dimensional bounding boxes into overlapping sets and non-overlapping sets. The overlapped set stores zero or more two-dimensional bounding box groups, each two-dimensional bounding box group stores at least two-dimensional bounding boxes, each two-dimensional bounding box in the two-dimensional bounding box group is overlapped with at least one other two-dimensional bounding box, and every two-dimensional bounding boxes in the non-overlapped set are not overlapped with each other.

The overlapping comprises partial overlapping and full overlapping, namely each two-dimensional bounding box in the two-dimensional bounding box group is partially overlapped or fully overlapped with at least one rest two-dimensional bounding box part, and all two-dimensional bounding boxes in the non-overlapping set are not partially overlapped or fully overlapped.

Referring specifically to fig. 4, fig. 4 shows another example of a point cloud projection depth map provided by an embodiment of the invention. As shown in fig. 4, the object 1 does not include a projection point and does not overlap any other object, the object 2 does not overlap any other object, and the object 3, the object 4, and the object 5 overlap each other, so that the object 1 and the object 2 are drawn into a non-overlapping set, and the object 3, the object 4, and the object 5 are drawn into an overlapping set and belong to a two-dimensional bounding box group.

Step S302, screening out an effective two-dimensional bounding box set from the overlapped set.

Specifically, whether the number of the two-dimensional bounding box groups in the overlapping set is zero or not can be judged; if the number of the two-dimensional bounding box groups is not zero, calculating the area intersection ratio of every two-dimensional bounding boxes in the two-dimensional bounding box groups for each two-dimensional bounding box group in the overlapping set; if the area intersection ratio is larger than or equal to a preset intersection ratio threshold value, selecting a two-dimensional bounding box from the two-dimensional bounding box group as an effective box; and saving the effective frame to an effective two-dimensional bounding box set. Wherein the preset intersection ratio threshold may be set to 0.5.

It should be noted that, if the number of the two-dimensional bounding box groups is zero, it indicates that all the targets do not overlap, that is, all the two-dimensional bounding boxes corresponding to all the targets do not partially or completely overlap, and at this time, the two-dimensional bounding box set does not need to be processed.

In one possible embodiment, selecting a two-dimensional bounding box from the two-dimensional bounding box set as the valid box may include: acquiring a longitudinal coordinate value of the bottom frame center point of each two-dimensional boundary frame in the two-dimensional boundary frame group; and determining the two-dimensional boundary box corresponding to the maximum or minimum longitudinal coordinate value as an effective box in all the longitudinal coordinate values. The method comprises the steps of screening possibly corresponding effective frames by performing area intersection and comparison on a two-dimensional boundary frame and the information of a frame at the bottom of the two-dimensional boundary frame, and then determining target data to be fitted according to the same point cloud of a specific position in the two-dimensional boundary frame to ensure the validity of the data. Wherein the longitudinal direction represents a direction perpendicular to the horizontal direction.

In practical applications, the maximum longitudinal coordinate value or the minimum longitudinal coordinate value is selected to be associated with the establishment of the coordinate system in the image plane, and the manner of selecting is described below with reference to a two-dimensional bounding box set of the target 3, the target 4, and the target 5 in fig. 4, and for the sake of clearer explanation, the two-dimensional bounding boxes corresponding to the target 3, the target 4, and the target 5 are abstracted into the coordinate system.

For example, when a coordinate system is established with the upper left corner of the image as the origin, the horizontal right direction as the positive direction of the horizontal axis, and the vertical downward direction as the positive direction of the vertical axis, the vertical coordinate of the bottom frame center point of each two-dimensional bounding box in the two-dimensional bounding box group is obtained based on the coordinate system, and the two-dimensional box corresponding to the maximum vertical coordinate value among all the vertical coordinate values is determined as the effective box. As shown in FIG. 5, P1, P2, and P3 are the bottom border center points of the three two-dimensional borders, respectively, and P3 has the largest vertical coordinate value, so that the target 3 is the valid box.

Under the condition that a coordinate system is established by taking the lower left corner of the image as an origin, the horizontal right direction as the positive direction of a cross shaft and the vertical upward direction as the positive direction of a longitudinal shaft, the longitudinal coordinate value of the bottom frame central point of each two-dimensional boundary frame in the two-dimensional boundary frame group is obtained based on the coordinate system, and the two-dimensional boundary frame corresponding to the minimum longitudinal coordinate is determined as an effective frame in all longitudinal coordinates. As shown in FIG. 6, P1, P2, and P3 are the bottom border center points of the three two-dimensional borders, respectively, and P3 has the smallest vertical coordinate value, so that the target 3 is the valid box.

Under the condition that a coordinate system is established by taking the upper right corner of the image as an origin, the horizontal left direction as the positive direction of a cross shaft and the vertical downward direction as the positive direction of a longitudinal shaft, the longitudinal coordinate of the bottom frame center point of each two-dimensional boundary frame in the two-dimensional boundary frame group is obtained based on the coordinate system, and the two-dimensional boundary frame corresponding to the maximum longitudinal coordinate value in all the longitudinal coordinate values is determined as an effective frame. As shown in FIG. 7, P1, P2, and P3 are the bottom border center points of the three two-dimensional borders, respectively, and P3 has the largest vertical coordinate value, then the target 3 is the valid box.

Under the condition that a coordinate system is established by taking the lower right corner of the image as an origin, the horizontal left direction as the positive direction of a cross shaft and the vertical upward direction as the positive direction of a longitudinal shaft, the longitudinal coordinate of the bottom frame center point of each two-dimensional boundary frame in the two-dimensional boundary frame group is obtained based on the coordinate system, and the two-dimensional boundary frame corresponding to the minimum longitudinal coordinate value in all the longitudinal coordinate values is determined as an effective frame. As shown in FIG. 8, P1, P2, and P3 are the bottom border center points of the three two-dimensional borders, respectively, and P3 has the smallest vertical coordinate value, then object 3 is the valid box.

Of course, any point in the image may be selected as the origin to establish the coordinate system, but for each overlapped two-dimensional bounding box group, the two-dimensional bounding box with the smallest or largest longitudinal coordinate of the center point of the bottom bounding box is selected as the effective box, without departing from the scope of the claimed invention.

Step S303, determining whether the two-dimensional bounding box is a target box or not for each two-dimensional bounding box in the effective two-dimensional bounding box set and the non-overlapping set according to the number of projection points contained in the point cloud box in the two-dimensional bounding box, and if the two-dimensional bounding box is the target box, storing the two-dimensional bounding box in the target set.

Specifically, whether the number of projection points contained in a point cloud frame in the two-dimensional boundary frame is zero or not is judged; if the number of the projection points contained in the point cloud frame is zero, that is, no projection point exists in the two-dimensional bounding box, the target depth value of the target corresponding to the two-dimensional bounding box cannot be corrected through the point cloud depth value of the projection point, and the two-dimensional bounding box is not the target frame; if the number of the projection points contained in the point cloud frame is not zero, the two-dimensional bounding frame is indicated to have the projection points, and the target depth value of the target corresponding to the two-dimensional bounding frame can be corrected through the point cloud depth value of the projection points, so that the two-dimensional bounding frame is the target frame.

In step S304, the target depth value of each target is corrected based on the target set.

Referring to fig. 9 in detail, fig. 9 illustrates a process of correcting a target depth value according to an embodiment of the present invention. As shown in fig. 9, the step of correcting the target depth value of each target based on the target set may include:

step S901, determining whether the number of two-dimensional bounding boxes in the target set is greater than or equal to a preset target box threshold.

In the embodiment of the present invention, the preset target frame threshold may be set as the number of two-dimensional bounding boxes required for establishing the relational equation. If the number of the two-dimensional bounding boxes in the target set is smaller than a preset target box threshold value, jumping to step S902; and if the number of the two-dimensional bounding boxes in the target set is greater than or equal to the preset target box threshold value, skipping to the step S903.

Step S902, for each two-dimensional bounding box in the target set, calculating the depth value of the point cloud box in the two-dimensional bounding box, and updating the target depth value of the target corresponding to the two-dimensional bounding box into the depth value of the point cloud box.

And when the number of the two-dimensional boundary frames in the target set is smaller than a preset target frame threshold value, the target depth value of the target corresponding to the two-dimensional boundary frame and the depth value of the point cloud frame corresponding to the two-dimensional boundary frame are indicated to be incapable of passing through, and the relational equation is simulated, wherein the relational equation is not established at the moment. For each two-dimensional bounding box which is not in the target set, the target depth value of the corresponding target is not processed, and after the processing in step S902 is completed, the monocular perception correction method is completed.

Step S903, for each two-dimensional bounding box in the target set, calculating the depth value of the point cloud box in the two-dimensional bounding box, forming a depth value pair by the target depth value of the target corresponding to the two-dimensional bounding box and the depth value of the point cloud box, and updating the target depth value to the depth value of the point cloud box.

Because the number of the two-dimensional boundary frames necessary for establishing the relation equation is stored in the target set, the relation equation of the camera data and the point cloud data can be fitted through the target depth value of the target corresponding to the two-dimensional boundary frames and the depth value pair of the point cloud frame, and then the target depth value of the target corresponding to the two-dimensional boundary frame which is not in the target set is corrected according to the fitted relation equation.

Step S904, a relational equation is established based on all the depth value pairs.

Specifically, various fitting methods may be used to find a governing equation, such as the most common least squares method, that approximately corresponds to the depth measured by the radar sensor and the depth measured by the monocular camera.

If all the targets on the image have corresponding point cloud data, the depth information of the targets is based on the depth information of the point cloud. However, because the point cloud data obtained by the radar sensor is sparse, the point cloud mapped onto the image may also be sparse, and there may be targets on the image without corresponding point cloud, and at this time, modeling of the corresponding relationship between the depth of the point cloud data and the depth of the monocular camera is performed.

Referring specifically to fig. 10, fig. 10 illustrates an example of a simulation equation with depth values provided by an embodiment of the present invention. As shown in fig. 10, the horizontal axis represents the depth value of the point cloud frame, i.e., the result of radar sensor estimation, and the vertical axis represents the target depth value, i.e., the result of monocular camera estimation, and the relationship equation established according to the depth value pair may be represented by a relationship curve on the image. This method requires that there be at least two different depth value pairs, and therefore, the preset target box threshold may be set to 2.

Step S905, for each two-dimensional bounding box which is not in the target set, substituting the target depth value of the target corresponding to the two-dimensional bounding box into the relation equation to obtain a corrected depth value, and updating the target depth value into the corrected depth value.

In one possible embodiment, for each two-dimensional bounding box in the target set, the step of calculating a depth value for the point cloud box in the two-dimensional bounding box may comprise: acquiring point cloud depth values of all projection points in the point cloud frame; calculating the average value of the depth values of all the point clouds to obtain the depth average value; and determining the depth average value as the depth value of the point cloud frame. That is, the depth value of the point cloud frame may be an average value of the point cloud depth values of all the projection points included in the point cloud frame.

According to the monocular perception correction method, sparse point cloud data of the radar sensor with accurate distance measurement is combined with camera data of the monocular camera, a relational equation between monocular depth estimation of the specific target and radar sensor estimation is fitted through the monocular depth estimation and point cloud data pair, the depth value of the specific target is updated to be the depth value of the point cloud, and the depth values of other targets are recalculated according to the relational equation, so that the depth values of all detected targets in the image are corrected, and the depth estimation capability of the monocular camera is improved. Compared with the existing data fusion scheme, the monocular perception correction method does not need to execute a target detection algorithm on sparse point cloud, does not need to execute target-level fusion processing at the rear end, only performs spatial transformation, namely projection on the point cloud data, and has lower algorithm complexity. The method mainly uses the image information with rich color, texture and contour information and uses the sparse point cloud as the auxiliary to perceive the three-dimensional target, thereby reducing the demands on perception of the radar sensor and the like and reducing the cost.

Referring to the specification and the attached fig. 11, an embodiment of the present invention further provides a monocular perception correcting device based on sparse point cloud. As shown in fig. 11, the monocular perception correcting device may include a data collecting module 111, a first processing module 112, a parameter obtaining module 113, a second processing module 114, and a depth correcting module 115.

The data acquisition module 111 may be configured to acquire raw camera data of a monocular camera and raw sparse point cloud data of a radar sensor; the first processing module 112 may be configured to process the raw camera data to obtain three-dimensional detection results of a plurality of targets in the image plane, where the three-dimensional detection result of each target includes a target depth value and a two-dimensional bounding box; the parameter obtaining module 113 may be configured to obtain a conversion matrix, where the conversion matrix is obtained in advance when the monocular camera and the radar sensor are jointly calibrated; the second processing module 114 may be configured to map the original sparse point cloud data to a corresponding position of the image plane based on the transformation matrix, obtain a point cloud projection depth map, and set a point cloud frame for each two-dimensional bounding box in the point cloud projection depth map, where the point cloud projection depth map includes a plurality of projection points corresponding to the original sparse point cloud data, and each projection point includes a point cloud depth value; the depth correction module 115 may be configured to correct the target depth values of the plurality of targets based on the point cloud depth values of the projection points included in all the point cloud boxes.

In one possible embodiment, the first processing module 112 may include an extension unit, and the extension unit may be configured to add branches of size estimation, direction estimation, and depth estimation in the deep learning network to obtain a target detection network; based on the target detection network, the original camera data is processed to obtain three-dimensional detection results of a plurality of targets in the image plane.

In one possible embodiment, the depth modification module 115 may include a preprocessing unit, a screening unit, a determining unit, and a modifying unit, where the preprocessing unit may be configured to divide all two-dimensional bounding boxes into an overlapping set and a non-overlapping set, where the overlapping set stores zero or more two-dimensional bounding box groups, each two-dimensional bounding box group stores at least two-dimensional bounding boxes, each two-dimensional bounding box in the two-dimensional bounding box group overlaps at least one other two-dimensional bounding box, and all two-dimensional bounding boxes in the non-overlapping set do not overlap with each other two by two; the screening unit may be configured to screen out a set of valid two-dimensional bounding boxes from the overlapping set; the judging unit can be used for determining whether the two-dimensional bounding box is a target box or not according to the number of projection points contained in a point cloud box in the two-dimensional bounding box for each two-dimensional bounding box in the effective two-dimensional bounding box set and the non-overlapping set, and storing the two-dimensional bounding box into the target set if the two-dimensional bounding box is the target box; the correction unit may be configured to correct the target depth value for each target based on the target set.

In one possible embodiment, the screening unit may be further configured to: judging whether the number of the two-dimensional bounding box groups in the overlapping set is zero or not; if the number of the two-dimensional bounding box groups is not zero, calculating the area intersection ratio of every two-dimensional bounding boxes in the two-dimensional bounding box groups for each two-dimensional bounding box group in the overlapping set; if the area intersection ratio is larger than or equal to a preset intersection ratio threshold value, selecting a two-dimensional bounding box from the two-dimensional bounding box group as an effective box; and saving the effective frame to an effective two-dimensional bounding box set.

In one possible embodiment, the screening unit may be further configured to: for each two-dimensional boundary frame in the two-dimensional boundary frame group, acquiring a longitudinal coordinate value of a bottom frame central point of the two-dimensional boundary frame; and determining the two-dimensional boundary box corresponding to the maximum or minimum longitudinal coordinate value as an effective box in all the longitudinal coordinate values.

In a possible embodiment, the correction unit may be further configured to: when the number of the two-dimensional bounding boxes in the target set is greater than or equal to a preset target box threshold value, calculating the depth value of a point cloud box in each two-dimensional bounding box in the target set, forming a depth value pair by the target depth value of a target corresponding to the two-dimensional bounding box and the depth value of the point cloud box, and updating the target depth value to the depth value of the point cloud box; establishing a relation equation based on all depth value pairs; and substituting the target depth value of the target corresponding to the two-dimensional bounding box into the relation equation for each two-dimensional bounding box which is not in the target set to obtain a corrected depth value, and updating the target depth value into the corrected depth value.

In a possible embodiment, the correction unit may be further configured to: and when the number of the two-dimensional bounding boxes in the target set is smaller than a preset target box threshold value, calculating the depth value of a point cloud box in each two-dimensional bounding box in the target set, and updating the target depth value of the target corresponding to the two-dimensional bounding box into the depth value of the point cloud box.

In a possible embodiment, the correction unit may be further configured to: acquiring point cloud depth values of all projection points in the point cloud frame; calculating the average value of the depth values of all the point clouds to obtain the depth average value; and determining the depth average value as the depth value of the point cloud frame.

In a possible embodiment, the monocular perception correcting device of the embodiment of the present invention may further include a calibration module and a synchronization module. The calibration module is used for carrying out combined calibration on the monocular camera and the radar sensor; the synchronization module is used for synchronizing the original camera data and the original sparse point cloud data so as to enable the original camera data and the original sparse point cloud data to be associated.

When the monocular perception correction device provided by the embodiment of the invention is used, the monocular camera and the radar sensor are jointly calibrated through the calibration module, the conversion matrix is stored, the monocular camera and the radar sensor can be jointly calibrated usually when leaving a factory, and the on-line self-calibration can be carried out at intervals of one month or longer after leaving the factory. The data acquisition module 111 sends a synchronous trigger signal to the monocular camera and the radar sensor to acquire data. After the monocular camera module and the radar sensor acquire data, the original camera data and the original sparse point cloud data are synchronized through the synchronization module, and the data of the monocular camera module and the data of the radar sensor are correlated. Then, the synchronization module sends the original camera data to the first processing module 112 for processing, sends the original sparse point cloud data to the second processing module 114 for processing, and sends the processed data to the depth correction module 115 for data fusion processing, so as to correct the depth information of the detected target of the monocular camera. And finally, integrating the corrected information and the information originally detected by the monocular camera to obtain a final three-dimensional target detection result.

It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.

An embodiment of the present invention further provides a computer storage medium, where at least one instruction or at least one program is stored in the computer storage medium, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the steps of the monocular perception correcting method in the above method embodiments.

The foregoing description has disclosed fully preferred embodiments of the present invention. It should be noted that those skilled in the art can make modifications to the embodiments of the present invention without departing from the scope of the appended claims. Accordingly, the scope of the appended claims is not to be limited to the specific embodiments described above.

Claims

1. A monocular perception correction method based on sparse point cloud is characterized by comprising the following steps:

2. The monocular perception modification method of claim 1, wherein the processing the raw camera data to obtain three-dimensional detection results of a plurality of targets in an image plane comprises:

adding branches of size estimation, direction estimation and depth estimation in a deep learning network to obtain a target detection network;

and processing the original camera data based on the target detection network to obtain three-dimensional detection results of a plurality of targets in an image plane.

3. The monocular perception correction method according to claim 1 or 2, wherein the correcting the target depth values of the plurality of targets based on the point cloud depth values of the projection points included in all the point cloud frames includes:

dividing all the two-dimensional bounding boxes into an overlapping set and a non-overlapping set, wherein zero or more two-dimensional bounding box groups are stored in the overlapping set, at least two-dimensional bounding boxes are stored in each two-dimensional bounding box group, each two-dimensional bounding box in the two-dimensional bounding box groups is overlapped with at least one other two-dimensional bounding box, and every two-dimensional bounding boxes in the non-overlapping set are not overlapped with each other;

screening out an effective two-dimensional bounding box set from the overlapped set;

determining whether the two-dimensional bounding box is a target box or not for each two-dimensional bounding box in the effective two-dimensional bounding box set and the non-overlapping set according to the number of the projection points contained in the point cloud box in the two-dimensional bounding box, and if the two-dimensional bounding box is the target box, storing the two-dimensional bounding box in the target set;

modifying the target depth value for each target based on the target set.

4. The monocular perception modification method of claim 3, wherein the filtering out the valid set of two-dimensional bounding boxes from the overlapping set comprises:

judging whether the number of the two-dimensional bounding box groups in the overlapping set is zero or not;

if the number of the two-dimensional bounding box groups is not zero, calculating the area intersection ratio of every two-dimensional bounding boxes in the two-dimensional bounding box groups for each two-dimensional bounding box group in the overlapped set;

if the area intersection ratio is larger than or equal to a preset intersection ratio threshold value, selecting one two-dimensional bounding box from the two-dimensional bounding box group as an effective box;

and saving the effective frame to the effective two-dimensional bounding box set.

5. The monocular perception modification method of claim 4, wherein the selecting one of the two-dimensional bounding boxes from the set of two-dimensional bounding boxes as the valid box comprises:

for each two-dimensional boundary frame in the two-dimensional boundary frame group, acquiring a longitudinal coordinate value of a bottom frame central point of the two-dimensional boundary frame;

and determining the two-dimensional boundary box corresponding to the maximum or minimum longitudinal coordinate value as the effective box in all the longitudinal coordinate values.

6. The monocular perception modification method of claim 3, wherein the modifying the target depth value for each target based on the target set comprises:

if the number of the two-dimensional bounding boxes in the target set is greater than or equal to a preset target box threshold value, calculating the depth value of the point cloud box in the two-dimensional bounding boxes for each two-dimensional bounding box in the target set, forming a depth value pair by the target depth value of the target corresponding to the two-dimensional bounding box and the depth value of the point cloud box, and updating the target depth value to the depth value of the point cloud box;

establishing a relational equation based on all the depth value pairs;

and substituting the target depth value of the target corresponding to the two-dimensional bounding box into the relation equation to obtain a corrected depth value for each two-dimensional bounding box which is not in the target set, and updating the target depth value to the corrected depth value.

7. The monocular perception modification method according to claim 6, further comprising:

if the number of the two-dimensional bounding boxes in the target set is smaller than the preset target box threshold value, calculating the depth value of the point cloud box in the two-dimensional bounding box for each two-dimensional bounding box in the target set, and updating the target depth value of the target corresponding to the two-dimensional bounding box into the depth value of the point cloud box.

8. The monocular perception modification method of claim 6 or 7, wherein the calculating, for each of the two-dimensional bounding boxes in the target set, a depth value of the point cloud box in the two-dimensional bounding box comprises:

acquiring the point cloud depth values of all the projection points in the point cloud frame;

calculating the average value of all the point cloud depth values to obtain a depth average value;

and determining the depth average value as the depth value of the point cloud frame.

9. A monocular perception correction device based on sparse point cloud is characterized by comprising:

10. A computer storage medium having at least one instruction or at least one program stored therein, the at least one instruction or the at least one program being loaded and executed by a processor to implement the monocular perception modification method according to any one of claims 1 to 8.