CN113761999B - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN113761999B
CN113761999B (application CN202010931022.5A)
Authority
CN
China
Prior art keywords
point cloud
dimensional image
dimensional
determining
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010931022.5A
Other languages
Chinese (zh)
Other versions
CN113761999A (en)
Inventor
刘浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Qianshi Technology Co Ltd filed Critical Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN202010931022.5A priority Critical patent/CN113761999B/en
Publication of CN113761999A publication Critical patent/CN113761999A/en
Application granted granted Critical
Publication of CN113761999B publication Critical patent/CN113761999B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention disclose a target detection method and device, electronic equipment and a storage medium. The method comprises the following steps: determining the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space; projecting the contour onto an initial two-dimensional image captured of the physical space to obtain a first two-dimensional image corresponding to the detection target; and determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image. The technical scheme provided by the embodiments of the invention reduces the amount of computation during target detection and thereby improves target detection speed.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a target detection method, a target detection device, electronic equipment and a storage medium.
Background
In the field of automatic driving, to ensure the running safety of an autonomous vehicle, obstacles that may obstruct the vehicle must be detected in real time, so that reasonable avoidance actions can be taken according to the type and state of each obstacle. Two target detection methods are commonly used at present. The first is detection based on lidar point clouds: the three-dimensional point cloud data is converted into an image from a bird's-eye view, and target detection is then performed with a two-dimensional target detection algorithm. The second is visual detection based on RGB images: features are extracted from the original image to obtain a feature image, and target recognition is then performed on each recognition unit of the feature image.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
the lidar-point-cloud-based detection method is limited by the characteristics of the lidar: recognition precision and stability for distant targets are poor, the problem is particularly obvious with lidars that have fewer scanning lines, and using a lidar with more scanning lines inevitably increases the cost of the autonomous vehicle. In the RGB-image-based visual detection method, the size and position of the detection target are unknown, so a global search over the feature image is required when performing target recognition on each recognition unit; this method therefore suffers from a large amount of computation, high consumption of computing resources, and slow detection speed.
Disclosure of Invention
The embodiments of the invention provide a target detection method and device, electronic equipment and a storage medium, which improve target detection precision and speed and reduce the amount of computation required for target detection.
In a first aspect, an embodiment of the present invention provides a target detection method, including:
determining the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space;
projecting the contour onto an initial two-dimensional image captured of the physical space to obtain a first two-dimensional image corresponding to the detection target;
determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
In a second aspect, an embodiment of the present invention further provides an object detection apparatus, including:
the candidate region determining module is used for determining the contour of the detection target candidate region based on three-dimensional point cloud data acquired by scanning the physical space;
the projection module is used for projecting the contour onto an initial two-dimensional image captured of the physical space to obtain a first two-dimensional image corresponding to the detection target;
and the detection module is used for determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
In a third aspect, an embodiment of the present invention further provides an object detection system, including: the device comprises a three-dimensional point cloud acquisition device, a two-dimensional image acquisition device and a processor;
the three-dimensional point cloud acquisition device is in communication connection with the processor, and is used for scanning the physical space to obtain the three-dimensional point cloud data and sending the three-dimensional point cloud data to the processor;
the two-dimensional image acquisition device is in communication connection with the processor, and is used for capturing the initial two-dimensional image of the physical space and sending the initial two-dimensional image to the processor;
the processor is used for executing the steps of the target detection method according to the embodiment of the invention based on the three-dimensional point cloud data and the initial two-dimensional image.
In a fourth aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target detection method steps as provided by any embodiment of the present invention.
In a fifth aspect, embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method steps as provided by any of the embodiments of the present invention.
The embodiments of the above invention have the following advantages or benefits:
the contour of the detection target candidate region is determined based on three-dimensional point cloud data acquired by scanning the physical space, rather than by directly extracting fine features of the detection target from the three-dimensional point cloud data, which reduces the amount of computation; the contour is projected onto an initial two-dimensional image captured of the physical space to obtain a first two-dimensional image corresponding to the detection target; and the position and/or type of the detection target is determined based on the first two-dimensional image and the initial two-dimensional image. These technical means achieve the purpose of improving target detection speed and accuracy.
Drawings
FIG. 1 is a flowchart of a target detection method according to a first embodiment of the present invention;
fig. 2 is a flowchart of a target detection method according to a second embodiment of the present invention;
fig. 3 is a flowchart of a target detection method according to a third embodiment of the present invention;
FIG. 4 is a schematic view of a manner of setting an anchor according to a third embodiment of the present invention;
FIG. 5 is a schematic diagram of a target detection algorithm according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of a target detection device according to a fourth embodiment of the present invention;
fig. 7 is a schematic structural diagram of an object detection system according to a fifth embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1 is a flowchart of a target detection method according to a first embodiment of the present invention. The embodiment is applicable to scenes in the field of automatic driving in which obstacles that may obstruct the running of a vehicle are detected. The method may be performed by a target detection device, which may be implemented in software and/or hardware and integrated in an electronic apparatus such as an autonomous vehicle or a server.
As shown in fig. 1, the target detection method specifically includes the following steps:
step 110, determining the outline of the candidate region of the detection target based on the three-dimensional point cloud data acquired for the physical space scanning.
The three-dimensional point cloud data can be obtained by scanning the physical space with a vehicle-mounted lidar, where the physical space may be the driving environment of the autonomous vehicle. The detection target may be a movable object such as a vehicle or a pedestrian travelling on the road. The three-dimensional point cloud data consists of a very large number of point cloud points, each carrying four dimensions of information (x, y, z, intensity), where x, y and z are the point's coordinate values in the point cloud coordinate system and intensity is the point's intensity information.
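As a concrete illustration, the (n, 4) matrix layout described above can be sketched as follows; the sample values are invented for illustration only:

```python
import numpy as np

# Hypothetical single frame: n points, 4 channels (x, y, z, intensity).
point_cloud = np.array([
    [12.3, -1.7, 0.4, 0.82],   # x, y, z in the point cloud frame; intensity
    [12.5, -1.6, 0.5, 0.79],
], dtype=np.float32)

assert point_cloud.shape[1] == 4  # the (n, 4) matrix format described above
```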
Illustratively, the determining the outline of the detection target candidate region based on the three-dimensional point cloud data acquired for the physical space scanning includes:
preprocessing the three-dimensional point cloud data to remove point cloud data belonging to a preset static object in the three-dimensional point cloud data;
Clustering operation is carried out on the preprocessed three-dimensional point cloud data through a set clustering algorithm, and a clustering result containing at least one clustering cluster is obtained;
determining a minimum bounding box of each cluster in the clustering result;
and determining the area where each minimum bounding box is located as the outline of the detection target candidate area.
Wherein the preset static object is, for example, the ground, a flower bed, a utility pole or a curb. By removing the point cloud data of preset static objects in advance, the number of clusters in the subsequent clustering result is greatly reduced, which in turn reduces the number of subsequent matching operations on the feature image and the amount of feature-fusion computation, thereby improving the target detection speed.
Further, if the preset static object is the ground, the preprocessing the three-dimensional point cloud data to remove the point cloud data belonging to the preset static object in the three-dimensional point cloud data includes:
determining the inclination angle between two point cloud points obtained by scanning two adjacent laser emission ends at the same moment in the three-dimensional point cloud data;
if the inclination angle is smaller than the inclination angle threshold value, marking the two point cloud points as ground point cloud data;
And removing the ground point cloud data from the three-dimensional point cloud data.
Taking a 16-line lidar as an example, a single frame of point cloud data is the data scanned while the 16 laser emitters simultaneously rotate through one full circle. The elevation angles of the 16 emitters are uniformly spaced, typically 2 degrees apart. Each emitter scans about 1800 points per revolution (determined by the scanning frequency), so a single frame consists of 16 x 1800 points, forming a matrix of 16 rows and 1800 columns. Because the ground is flat, for ground point cloud data the inclination angle between two points in the same column and adjacent rows is no larger than the difference between the elevation angles of the two adjacent emitters; this property can be used to separate ground points from non-ground points within a single frame. More generally, single-frame point cloud data is obtained by rotationally scanning at least two mutually adjacent laser emitters through one circle.
Determining the inclination angle between two point cloud points obtained by scanning with two adjacent laser emitters at the same moment includes:
determining the tilt angle between the two point cloud points based on the following formula:

α(i, j) = atan2( |z_{i+1,j} − z_{i,j}| , √( (x_{i+1,j} − x_{i,j})² + (y_{i+1,j} − y_{i,j})² ) )

where α(i, j) is the inclination angle between the point cloud point in row i, column j of the single-frame point cloud data set and the point in row i+1, column j; x_{i,j}, y_{i,j} and z_{i,j} are the x, y and z coordinate values of the point in row i, column j. Row elements of the single-frame data set are points scanned by the same laser emitter at different moments, and column elements are points scanned by different laser emitters at the same moment. atan2(y, x) denotes the angle, in the coordinate plane, between the positive x-axis and the ray from the origin to the point (x, y).
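A minimal sketch of this ground-marking rule, assuming the frame is organized as a 16×1800 range image as described above; the function name and threshold default are illustrative, not the patent's implementation:

```python
import numpy as np

def mark_ground(xyz, angle_threshold_deg=2.0):
    """Mark ground points in a range-image-organized frame.

    xyz: array of shape (rows, cols, 3); rows index the laser emitters
    (e.g. 16), cols index scan instants (e.g. 1800). The threshold is an
    assumption, chosen near the 2-degree emitter spacing.
    """
    dx = xyz[1:, :, 0] - xyz[:-1, :, 0]
    dy = xyz[1:, :, 1] - xyz[:-1, :, 1]
    dz = xyz[1:, :, 2] - xyz[:-1, :, 2]
    # Tilt angle between vertically adjacent points (the formula above).
    alpha = np.degrees(np.arctan2(np.abs(dz), np.hypot(dx, dy)))
    flat = alpha < angle_threshold_deg
    ground = np.zeros(xyz.shape[:2], dtype=bool)
    # A small angle marks both points of the adjacent pair as ground.
    ground[:-1, :] |= flat
    ground[1:, :] |= flat
    return ground
```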
Further, if the preset static object is a static object other than the ground (such as a flower bed, a utility pole, a garbage can or a curb), the preprocessing of the three-dimensional point cloud data to remove the point cloud data belonging to the preset static object includes:
determining world coordinate values of each three-dimensional point cloud point in the three-dimensional point cloud data;
and removing from the three-dimensional point cloud data the three-dimensional point cloud points whose world coordinate values fall within a set range, wherein the set range is determined according to the world coordinate values of the preset static object. A world coordinate value is a coordinate value in the world coordinate system; every static, immovable object in the physical space (such as a flower bed, a utility pole, a garbage can or a curb) corresponds to a unique coordinate range in the world coordinate system, so the world coordinate values of such an object can be used to identify its three-dimensional point cloud points and remove them.
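A minimal sketch of this world-coordinate filter; the data layout and the axis-aligned box representation of the set ranges are assumptions made for illustration:

```python
import numpy as np

def remove_static_objects(points_world, static_ranges):
    """Drop points whose world coordinates fall inside any set range.

    points_world: (n, 3) world-frame coordinates. static_ranges: list of
    (min_xyz, max_xyz) axis-aligned boxes around known static objects
    (flower beds, utility poles, ...) -- an assumed representation of the
    "set range" described above.
    """
    keep = np.ones(len(points_world), dtype=bool)
    for lo, hi in static_ranges:
        inside = np.all((points_world >= lo) & (points_world <= hi), axis=1)
        keep &= ~inside
    return points_world[keep]
```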
Further, the set clustering algorithm may be, for example, a density clustering algorithm landmark FN-DBSCAN, or a grid clustering algorithm STING.
The determining of the minimum bounding box of each cluster C in the clustering result {C} specifically includes calculating the minimum circumscribed cuboid of each cluster C (the cuboid may in particular be a cube). The borders of the minimum circumscribed cuboids form the regions where the three-dimensional point cloud points of each cluster C are located, i.e. the contours of the detection target candidate regions (proposal regions), each described by the tuple (id, px, py, pz, pd, ph, pw), where id is the class index of cluster C, (px, py, pz) is the position of the geometric center of the current candidate region, and (pd, ph, pw) are its length, height and width.
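The clustering-plus-bounding-box step could look like the following sketch, which substitutes scikit-learn's DBSCAN for the landmark FN-DBSCAN named above; the clustering parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN  # stand-in for landmark FN-DBSCAN

def candidate_regions(points):
    """Cluster the preprocessed cloud and return one axis-aligned minimum
    bounding box per cluster as (id, px, py, pz, pd, ph, pw).
    points: (n, 4) array of (x, y, z, intensity)."""
    labels = DBSCAN(eps=0.8, min_samples=10).fit_predict(points[:, :3])
    proposals = []
    for cid in set(labels) - {-1}:           # -1 is DBSCAN noise
        cluster = points[labels == cid, :3]
        lo, hi = cluster.min(axis=0), cluster.max(axis=0)
        center = (lo + hi) / 2.0             # geometric center (px, py, pz)
        size = hi - lo                       # extent per axis (pd, ph, pw)
        proposals.append((cid, *center, *size))
    return proposals
```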
And step 120, projecting the outline to an initial two-dimensional image acquired by shooting in a physical space, and obtaining a first two-dimensional image corresponding to the detection target.
The initial two-dimensional image may be a color image captured by an RGB camera.
Specifically, based on coordinate transformation, the region contour of the candidate region is projected onto the corresponding position of the initial two-dimensional image, and the position and/or type of the detection target is determined based on the projected first two-dimensional image. Because the possible position area of the detection target (i.e., the candidate region) has already been determined at coarse granularity from the three-dimensional point cloud data, projecting the contour of the candidate region onto the initial two-dimensional image narrows the detection region: target detection only needs to be performed on the small image patch corresponding to the candidate region, not on the whole initial two-dimensional image, which greatly reduces the amount of detection computation and improves detection speed while ensuring detection precision.
Exemplarily, projecting the contour onto an initial two-dimensional image captured of the physical space to obtain a first two-dimensional image corresponding to the detection target includes the following steps (a sketch follows the list):
determining a coordinate conversion matrix according to the calibration parameters of the lidar and the camera;
and, based on the coordinate conversion matrix, projecting the contour onto the initial two-dimensional image captured of the physical space to obtain the first two-dimensional image corresponding to the detection target.
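A minimal sketch of this projection, assuming a standard pinhole model with a 4×4 lidar-to-camera extrinsic matrix and a 3×3 intrinsic matrix; the patent only states that a coordinate conversion matrix is built from the calibration parameters, so the matrix names are assumptions:

```python
import numpy as np

def project_to_image(corners_3d, T_cam_lidar, K):
    """Project 3D box corners into the image plane.

    corners_3d: (n, 3) lidar-frame points; T_cam_lidar: 4x4 extrinsic
    matrix; K: 3x3 camera intrinsics.
    """
    homo = np.hstack([corners_3d, np.ones((len(corners_3d), 1))])
    cam = (T_cam_lidar @ homo.T)[:3]        # to camera frame
    uv = K @ cam                            # to pixel coordinates
    uv = uv[:2] / uv[2]                     # perspective divide
    return uv.T                             # (n, 2) pixel positions
```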
Step 130, determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
Illustratively, the determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image includes:
inputting the initial two-dimensional image into a preset detection model;
determining an output result of the middle layer of the preset detection model as a characteristic image corresponding to the initial two-dimensional image;
mapping the first two-dimensional image to the characteristic image according to the downsampling ratio corresponding to the characteristic image, and obtaining a second two-dimensional image corresponding to a detection target;
a position and/or type of the detection target is determined based on the second two-dimensional image.
Feature extraction of the initial two-dimensional image may be accomplished by a neural network model, such as a convolutional neural network model. In order to improve the accuracy of target detection, in the solution of this embodiment, the three-dimensional point cloud features of the detection target and the two-dimensional image features are fused to finally determine the position and/or type of the detection target (such as a person, a car, etc.), so the feature image is not the feature image output by the last layer of the neural network detection model, but the feature image output by the middle layer, and the position and/or type of the detection target is predicted by combining the candidate regions based on the feature image output by the middle layer.
According to the technical scheme of this embodiment, a coarse-grained target contour is detected from the three-dimensional point cloud data, the contours of the detected candidate regions are projected onto the initial two-dimensional image, and target detection is then performed on the small image patches corresponding to those contours, instead of extracting fine detection target features directly from the three-dimensional point cloud data; this greatly reduces the amount of detection computation and improves detection speed. Projecting the determined contours of the detection target candidate regions onto the initial two-dimensional image narrows the detection region: only the image patches corresponding to the candidate regions need to be examined, not the whole initial two-dimensional image, which greatly reduces the amount of computation and improves detection speed while ensuring detection precision.
Example two
Fig. 2 is a flowchart of a target detection method according to a second embodiment of the present invention. On the basis of the above embodiments, this embodiment concretizes step 130, "determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image". Compared with the method of the previous embodiment, the method provided here can compensate for the influence of the lidar's characteristics on the acquired three-dimensional point cloud data. Specifically, if the characteristics of the lidar are poor, three-dimensional point cloud data of distant objects cannot be obtained well; the problem is particularly serious when the lidar has few scanning lines, and without good-quality three-dimensional point cloud data as a basis, accurate subsequent target detection cannot be guaranteed. Deliberately improving the lidar's characteristics, however, would undoubtedly increase hardware cost. The technical scheme of this embodiment provides a solution to these problems. Terms identical or corresponding to those of the above embodiments are not explained again here.
Referring to fig. 2, the target detection method includes the steps of:
Step 210, determining the outline of the detection target candidate region based on the three-dimensional point cloud data acquired for the physical space scanning.
And 220, projecting the outline to an initial two-dimensional image acquired by shooting in a physical space, and obtaining a first two-dimensional image corresponding to the detection target.
Step 230, inputting the initial two-dimensional image into a preset detection model; and determining an output result of the middle layer of the preset detection model as a characteristic image corresponding to the initial two-dimensional image.
And step 240, extending the corresponding image contour of the candidate region in the first two-dimensional image by a set distance so as to supplement the edge of the main body of the detection target.
And step 250, mapping the first two-dimensional image to the characteristic image according to the downsampling magnification corresponding to the characteristic image, and obtaining a second two-dimensional image corresponding to the detection target.
The feature image is an intermediate, not yet fully refined image in the target detection process: the output of an intermediate layer of the preset detection model. Any target detection task reaches its final result through a detection pipeline in which the detection algorithm extracts features of the original image step by step; the feature image is such an in-process image, carrying the features of the detection target to a greater or lesser extent. Taking a convolutional neural network model as the preset detection model, the feature image may for example be the output of the second convolutional layer of the eighth block of the model structure, which is generally downsampled relative to the input image; the first two-dimensional image is therefore mapped onto the feature image according to the downsampling ratio corresponding to the feature image, yielding the second two-dimensional image corresponding to the detection target.
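A minimal sketch of the downsampling-ratio mapping; the ratio of 16 is an assumption consistent with the 800×600 input and 50×38 feature map quoted later, and the real ratio is derived from the backbone's strides and pooling layers:

```python
def map_to_feature(box_2d, downsample=16):
    """Map an image-plane box (px, py, ph, pw) onto the feature image
    by dividing each coordinate by the downsampling ratio."""
    px, py, ph, pw = box_2d
    return (px / downsample, py / downsample,
            ph / downsample, pw / downsample)
```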
Further, the characteristics of the lidar are limited. If they are poor, the three-dimensional point cloud data of distant objects cannot be obtained well, which may leave the main body of a target object incomplete in the mapped region of the feature image corresponding to a candidate region, and in turn degrade the subsequent localization of the target. To address this, the technical scheme of this embodiment adds the following step:
before the first two-dimensional image is mapped onto the feature image, the image contour corresponding to the candidate region in the first two-dimensional image is extended by a set distance so as to supplement the edges of the main body of the detection target. Specifically, the extension direction may be determined from the image contour features corresponding to the three-dimensional point cloud data of the candidate region in the first two-dimensional image, and the contour is then extended by the set distance along that direction. Alternatively, the contour may simply be extended along the straight line on which the corresponding image contour lies.
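A minimal sketch of the contour extension, assuming a simple symmetric margin; the patent leaves the set distance and the direction as design choices, so margin_px is illustrative:

```python
def extend_contour(box_2d, margin_px=8):
    """Grow a projected box by a set distance to recover target edges
    that sparse point clouds of distant objects may have missed."""
    px, py, ph, pw = box_2d                # center position, height, width
    return (px, py, ph + 2 * margin_px, pw + 2 * margin_px)
```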
Step 260, determining the position and/or type of the detection target based on the second two-dimensional image.
Specifically, the second two-dimensional image may be divided into grid areas with a fixed size, and then the positions of the anchors are further determined according to the grid division, and then target detection is performed in the anchors.
Assuming the number of contours of the determined detection target candidate regions is Q, that the second two-dimensional image is divided into a fixed-size 3×3 grid, and that the corresponding number of anchors is T, only Q×T matches are needed during target detection. In the prior art, since the size and position of the detection target are unknown, a global search must be performed over the feature image, i.e., the anchors of every grid cell are matched; if the feature image is divided into M×N grid cells and K anchors are pre-allocated per cell, the number of matches is M×N×K, and typically M×N×K is far greater than Q×T.
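As a worked illustration with assumed numbers (a 50×38 feature map with K = 15 anchors per cell, versus Q = 10 clusters with T = 15 anchors each; all values are hypothetical):

M×N×K = 50×38×15 = 28,500 matches, versus Q×T = 10×15 = 150 matches,

i.e., roughly a 190-fold reduction in matching operations under these assumptions.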
According to the technical scheme, the outline of the candidate detection target area is determined based on three-dimensional point cloud data acquired by physical space scanning; projecting the outline to an initial two-dimensional image acquired by shooting in a physical space to obtain a first two-dimensional image corresponding to a detection target; extending the contour of the candidate region by a set distance from the corresponding image contour in the first two-dimensional image to supplement the main body edge of the detection target, and mapping the first two-dimensional image to the characteristic image according to the downsampling magnification corresponding to the characteristic image to obtain a second two-dimensional image corresponding to the detection target; the position and/or type of the detection target are determined based on the second two-dimensional image, so that the problem that three-dimensional point cloud data of a distant object is incomplete due to the characteristics of the laser radar is solved, and the requirements on the characteristics of the laser radar are reduced; the target detection is realized, and the detection speed and the detection precision are improved.
Example III
Fig. 3 is a flowchart of a target detection method according to a third embodiment of the present invention. Based on the above embodiment, the present embodiment embodies the step 260 "determining the position and/or type of the detection target based on the second two-dimensional image", and several manners of setting the anchor are provided, which is helpful for further improving the target detection speed and accuracy. Wherein the same or corresponding terms as those of the above-described embodiments are not explained in detail herein.
Referring to fig. 3, the target detection method includes the steps of:
step 310, determining the outline of the candidate region of the detection target based on the three-dimensional point cloud data acquired for the physical space scanning.
Step 320, projecting the outline to an initial two-dimensional image acquired by shooting in a physical space, and obtaining a first two-dimensional image corresponding to the detection target.
Step 330, inputting the initial two-dimensional image into a preset detection model; and determining an output result of the middle layer of the preset detection model as a characteristic image corresponding to the initial two-dimensional image.
And 340, extending the corresponding image contour of the candidate region in the first two-dimensional image by a set distance so as to supplement the edge of the main body of the detection target.
And 350, mapping the first two-dimensional image to the characteristic image according to the downsampling ratio corresponding to the characteristic image, and obtaining a second two-dimensional image corresponding to the detection target.
Step 360, dividing the second two-dimensional image into a set number of grid areas, and determining at least one frame area according to the grid areas; performing frame rough regression operation according to the frame region by using a set neural network model to obtain a segmented image; and determining the position and/or type of the detection target according to the segmented image.
Assume the second two-dimensional image is divided into a 3×3 grid and 15 anchors are allocated to that grid; the allocation of the 15 anchors is shown in fig. 4.
Illustratively, determining at least one frame region (i.e., anchor box region) according to the grid regions includes at least one of the following manners (a sketch enumerating these regions follows the list):
determining all the grid areas as first frame areas;
determining the grid areas in the same row as a second frame area;
determining the grid areas in the same column as a third frame area;
determining the grid areas of two adjacent rows as fourth frame areas;
Determining the grid areas of two adjacent columns as a fifth frame area;
a grid region constituting a square is determined as a sixth border region, wherein a side length of the square is greater than a side length of a single grid region and less than a side length of the second two-dimensional image.
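A sketch enumerating the six region families over a 3×3 grid; the (r0, c0, r1, c1) inclusive cell-span representation is chosen for this illustration. For a 3×3 grid the six families yield exactly 1 + 3 + 3 + 2 + 2 + 4 = 15 anchors, matching the count above:

```python
def grid_anchors(rows=3, cols=3):
    """Enumerate the six anchor-region families over a rows x cols grid."""
    anchors = [(0, 0, rows - 1, cols - 1)]                  # 1) whole grid
    anchors += [(r, 0, r, cols - 1) for r in range(rows)]   # 2) single rows
    anchors += [(0, c, rows - 1, c) for c in range(cols)]   # 3) single columns
    anchors += [(r, 0, r + 1, cols - 1)
                for r in range(rows - 1)]                   # 4) adjacent row pairs
    anchors += [(0, c, rows - 1, c + 1)
                for c in range(cols - 1)]                   # 5) adjacent column pairs
    # 6) squares larger than one cell, smaller than the grid (2x2 here)
    anchors += [(r, c, r + 1, c + 1)
                for r in range(rows - 1) for c in range(cols - 1)]
    return anchors

assert len(grid_anchors()) == 15
```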
The anchor mechanism is specifically as follows: a point on the feature map corresponds to a small area of the original image, on which several anchor boxes can be generated; each anchor box is then examined for the possible presence of an object, and coordinates are regressed to obtain the object's position information. In the technical scheme of this embodiment, the contour of the detection target is first determined from the three-dimensional point cloud data, the detection area of the two-dimensional feature image is narrowed on that basis, and only the regions of the feature image corresponding to the detected contours are kept, which greatly reduces the amount of detection computation; the anchor mechanism is then used to detect whether the anchor corresponding to each pixel point contains a detection target.
Further, determining the position and/or type of the detection target according to the segmented image includes the following (a sketch follows the list):
Inputting the segmented image into a target classification model to obtain the type of a detection target;
and inputting the segmented image into a full convolution model to obtain the position of the detection target.
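A minimal sketch of these two parallel heads, written as a PyTorch-style module; the channel sizes, layer counts and class count are invented for illustration and are not the patent's exact architecture:

```python
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Two heads over a segmented feature map: a classifier for the
    target type and a fully convolutional branch for position regression."""
    def __init__(self, in_channels=512, num_classes=4):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Conv2d(in_channels, num_classes, kernel_size=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Softmax(dim=1),              # class probabilities
        )
        self.regressor = nn.Conv2d(in_channels, 4, kernel_size=1)  # box (x, y, h, w)

    def forward(self, segmented_feature):
        return self.classifier(segmented_feature), self.regressor(segmented_feature)
```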
Correspondingly, referring to the framework diagram of the object detection algorithm shown in fig. 5, the algorithm comprises two modules: a 3D point cloud data processing module (3D-Proposal-Projector) and a 2D image detection module (2D-Feature-Object-Detector). The 3D-Proposal-Projector extracts the contours of the detection target candidate regions from the point cloud and maps them onto the corresponding positions of the feature map extracted by the 2D-Feature-Object-Detector. The 2D-Feature-Object-Detector performs feature extraction and carries out target detection within the candidate region contours extracted by the 3D-Proposal-Projector.
Unlike existing fusion schemes, this method does not directly extract or detect features from the point cloud; instead it clusters the point cloud in an unsupervised manner, quickly obtaining rough three-dimensional candidate regions of the detection target. Next, the three-dimensional candidate regions are mapped into the camera's 2D coordinate system through the calibration parameters of the radar, camera and other hardware. While the point cloud is being processed, the two-dimensional image is fed into a CNN backbone network for feature extraction, yielding a feature map of the whole image. The positions of the previously obtained 3D candidate regions are then projected directly onto the corresponding positions of the feature map according to the downsampling ratio and cropped, producing several small local feature maps. Note that these local feature maps correspond to a clustering result from which the preset static point clouds have been filtered out; because of that filtering, the number of clusters is greatly reduced, and with it the number of matches the detection stage performs on the whole feature map. In addition, conventional detection methods pre-allocate K anchors for every pixel of the global feature map (assumed M×N), giving a computation amount of M×N×K. In this scheme, the region to be detected is not the global feature map but several local feature maps (corresponding to the extracted 3D candidate regions), each of which has a high probability of containing one detection target or fewer. Because detection targets of the same category are relatively concentrated in three-dimensional space, their scales on the feature map are relatively similar, so each local feature map to be detected can be dynamically divided into a fixed-size grid such as 3×3, the grid is further partitioned spatially to serve as anchors, and detection is carried out inside each anchor. Assuming the number of detected clusters is Q, the fixed grid size of a local feature map is 3×3, and the corresponding number of anchors is T, only Q×T matches are needed during detection. In general M×N×K ≫ Q×T, which is the root cause of this scheme's speed improvement.
Specifically, referring to fig. 5, the algorithm flow of the 3D point cloud data processing module (also called 3D-Proposal-Projector) is as follows:
1) The input is a point cloud matrix PC, the format of the point cloud matrix PC is (n, 4), wherein n represents the number of point cloud points, 4 represents four dimensions (x, y, z, intensity) of the point cloud points, and the four dimensions respectively represent space three-dimensional coordinates and point cloud intensity;
2) Data preprocessing is performed on the point cloud matrix PC, for example using a ground removal algorithm to remove ground point clouds and a static obstacle removal algorithm to remove environment point clouds (e.g., point clouds of utility poles, flower beds, garbage cans, etc.).
3) Clustering the point cloud by using a clustering algorithm, such as a landmark FN-DBSCAN spatial clustering algorithm, to obtain a clustering result { C }, wherein C is a cluster.
4) The minimum bounding box (i.e., the minimum circumscribed cuboid) of every cluster C in the clustering result is calculated, forming the three-dimensional point cloud candidate regions (proposal regions) of the detection target, each described by the tuple (id, px, py, pz, pd, ph, pw), where id is the class index of cluster C, (px, py, pz) is the position of the geometric center of the current candidate region, and (pd, ph, pw) are its length, height and width.
5) The contours of the three-dimensional candidate proposal regions are projected onto the 2D coordinates of the image using the calibration parameters of the lidar and the camera, obtaining a first two-dimensional image whose regions are described by (id, px, py, ph, pw), where id is the class index of cluster C, (px, py) is the center position of the current candidate region in the 2D coordinate system, and (ph, pw) are its height and width. It will be appreciated that the number of three-dimensional proposal regions may be one or several, and in general it is several. If there is one proposal region, the first two-dimensional image obtained after projection contains one two-dimensional image contour; if there are several, it contains several two-dimensional image contours.
The algorithm flow of the 2D image detection module (also called 2D-Feature-Object-Detector) is as follows:
1) The input is a two-dimensional image (i.e., the image captured of the physical space by the vehicle-mounted camera) of size (800, 600, 3). The image is fed into a VGG backbone network, and the output of convolution layer conv8_2 is taken as the full-image feature map, denoted G, of size (50, 38, 512).
2) The downsampling ratio is calculated from the convolution operations involved in downsampling (kernel size, padding, stride, number of convolution layers, pooling parameters, etc.), and each image contour of the first two-dimensional image PR2 is mapped onto the feature image G to obtain the second two-dimensional image. Each image contour corresponds to a small local region patch in the second two-dimensional image, called a local feature map. It will be appreciated that each cluster corresponds to one candidate region, each candidate region to one region patch on the first two-dimensional image, and each such patch to one region patch on the second two-dimensional image; the projection and mapping operations thus yield several local region patches on the second two-dimensional image, i.e., the regions where detection targets may exist. Subsequently only these local patches need to undergo target detection, rather than every pixel region of the whole feature image, which greatly reduces the amount of computation.
3) And respectively cutting out the local area small blocks, namely cutting out each local area small block from the second two-dimensional image to obtain a local characteristic image. It can be understood that if the number of the candidate regions is one, a local region small block is corresponding to the second two-dimensional image, and no clipping is needed at this time, and the second two-dimensional image is the local feature image.
4) In particular, image region proposals are generated in the second two-dimensional image, each region is judged to be foreground (a detection target such as a pedestrian or vehicle) or background (other objects that are not detection targets), and a coarse regression of the bounding box is performed; this is implemented with an RPN network. Regions whose intersection-over-union (IoU) with the ground truth exceeds 0.7 may be taken as positive samples and those below 0.3 as negative samples; smooth L1 plus a regularization term may serve as the regression loss, and binary focal loss as the foreground/background classification loss. Each local feature map is divided into a 3×3 grid, over which 15 anchors are allocated in total. All foreground targets from the RPN stage are selected, and the corresponding local feature maps are cropped according to the coarsely regressed coordinates. The cropped feature map is passed through spatial pyramid pooling (SPP); with SPP parameters {5×5, 3×3, 2×2, 1×1}, a feature of uniform size (39 bins per channel) is obtained, denoted F. F is fed into a CNN branch network to extract an embedding, trained with a cosine loss; it outputs a 128×1 visual feature vector that supplies the visual features of the detected object to a downstream (tracking) module of the algorithm. F is fed into a fully convolutional layer followed by a softmax layer to classify the detection target, with categorical cross-entropy as the loss. F is also fed into a fully convolutional layer for fine bounding-box regression, with smooth L1 as the loss.
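A minimal sketch of the SPP step, assuming PyTorch; with levels {5×5, 3×3, 2×2, 1×1} every crop, whatever its size, is pooled to 25 + 9 + 4 + 1 = 39 bins per channel, the uniform size referred to above:

```python
import torch
import torch.nn.functional as F

def spp_pool(feature, levels=(5, 3, 2, 1)):
    """Spatial pyramid pooling over one cropped local feature map.

    feature: (channels, h, w) tensor. Returns shape (1, channels * 39)
    for the default levels, regardless of h and w.
    """
    pooled = [F.adaptive_max_pool2d(feature.unsqueeze(0), (n, n)).flatten(1)
              for n in levels]
    return torch.cat(pooled, dim=1)
```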
According to the target detection method, the operation complexity is greatly reduced, the response speed of target detection in automatic driving is improved, meanwhile, the laser radar point cloud is not directly used for detection, so that a laser radar product with fewer scanning lines can be used, and the cost of electronic equipment is effectively reduced.
The following is an embodiment of an object detection device provided in the present embodiment, which belongs to the same inventive concept as the object detection method of the above embodiments, and reference may be made to the embodiment of the object detection method for details that are not described in detail in the embodiment of the object detection device.
Example IV
Fig. 6 is a schematic structural diagram of a target detection device according to a fourth embodiment of the present invention. The embodiment is applicable to a scene in which an obstacle that may obstruct the running of a vehicle is detected in the field of automatic driving. The device may be integrated in an autonomous vehicle or in a server.
As shown in fig. 6, the apparatus includes: a candidate region determination module 610, a projection module 620, and a detection module 630.
The candidate region determining module 610 is configured to determine a contour of the detection target candidate region based on three-dimensional point cloud data acquired for physical space scanning; the projection module 620 is configured to project the contour to an initial two-dimensional image acquired for physical space shooting, so as to obtain a first two-dimensional image corresponding to the detection target; a detection module 630 for determining a position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image.
Further, the candidate region determination module 610 includes:
the preprocessing unit is used for preprocessing the three-dimensional point cloud data to remove point cloud data belonging to a preset static object in the three-dimensional point cloud data;
the clustering unit is used for carrying out clustering operation on the preprocessed three-dimensional point cloud data through a set clustering algorithm to obtain a clustering result containing at least one clustering cluster;
the first determining unit is used for determining the minimum bounding box of each cluster in the clustering result;
and the second determining unit is used for determining the area where each minimum bounding box is located as the outline of the detection target candidate area.
Further, if the preset static object is the ground, the preprocessing unit is specifically configured to:
determining the inclination angle between two point cloud points obtained by scanning two adjacent laser emission ends at the same moment in the three-dimensional point cloud data;
if the inclination angle is smaller than the inclination angle threshold value, marking the two point cloud points as ground point cloud data;
and removing the ground point cloud data from the three-dimensional point cloud data.
Further, if the preset static object is a static object except the ground, the preprocessing unit is specifically configured to:
Determining world coordinate values of each three-dimensional point cloud point in the three-dimensional point cloud data;
and removing the three-dimensional point cloud points with the world coordinate values falling in a set range from the three-dimensional point cloud data, wherein the set range is determined according to the world coordinate values of the preset static object.
Further, the projection module 620 includes:
the first determining unit is used for determining a coordinate transformation matrix according to calibration parameters of the laser radar and the camera;
and the projection unit is used for projecting the outline of the candidate region to an initial two-dimensional image acquired by shooting in a physical space based on the coordinate transformation matrix, and obtaining a first two-dimensional image corresponding to the detection target.
Further, the detection module 630 includes:
a mapping unit, configured to map the first two-dimensional image to the feature image according to a downsampling magnification corresponding to the feature image, and obtain a second two-dimensional image corresponding to a detection target;
and a second determining unit for determining the position and/or type of the detection target based on the second two-dimensional image.
Further, the detection module 630 further includes: and the supplementing unit is used for extending the corresponding image contour of the candidate region in the first two-dimensional image by a set distance before the first two-dimensional image is mapped to the characteristic image so as to supplement the edge of the main body of the detection target.
Further, the second determining unit includes:
a dividing subunit configured to divide the second two-dimensional image into a set number of grid areas;
a first determining subunit, configured to determine at least one frame area according to the grid area;
an operation subunit, configured to perform a frame coarse regression operation according to the frame region by using a set neural network model, so as to obtain a segmented image;
and the second determination subunit is used for determining the position and/or type of the detection target according to the segmented image.
Further, the first determining subunit is specifically configured to at least one of the following:
determining all the grid areas as first frame areas;
determining the grid areas in the same row as a second frame area;
determining the grid areas in the same column as a third frame area;
determining the grid areas of two adjacent rows as fourth frame areas;
determining the grid areas of two adjacent columns as a fifth frame area;
a grid region constituting a square is determined as a sixth border region, wherein a side length of the square is greater than a side length of a single grid region and less than a side length of the second two-dimensional image.
Further, the second determining subunit is specifically configured to:
inputting the segmented image into a target classification model to obtain the type of a detection target;
and inputting the segmented image into a full convolution model to obtain the position of the detection target.
According to the technical scheme of this embodiment, a coarse-grained target contour is detected from the three-dimensional point cloud data, the contours of the detected candidate regions are projected onto the two-dimensional feature image, and target detection is then performed on the small region patches corresponding to those contours, instead of extracting fine detection target features directly from the three-dimensional point cloud data; this greatly reduces the amount of detection computation and improves detection speed. Projecting the determined contours of the detection target candidate regions onto the two-dimensional feature image narrows the detection region: only the region patches corresponding to the candidate regions need to be examined, not the whole feature image, which greatly reduces the amount of computation and improves detection speed while ensuring detection precision.
The object detection device provided by the embodiment of the invention can execute the object detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the object detection method.
Example five
Fig. 7 is a schematic structural diagram of an object detection system according to a fifth embodiment of the present invention, as shown in fig. 7, where the system includes: a three-dimensional point cloud acquisition device 710, a two-dimensional image acquisition device 720, and a processor 730;
the three-dimensional point cloud acquisition device 710 is in communication connection with the processor 730, and is configured to scan the physical space to obtain the three-dimensional point cloud data and send it to the processor;
the two-dimensional image acquisition device 720 is in communication connection with the processor 730, and is configured to capture the initial two-dimensional image of the physical space and send it to the processor;
the processor 730 is configured to perform the steps of the object detection method according to any of the above embodiments based on the three-dimensional point cloud data and the initial two-dimensional image.
According to the target detection method, a coarse-grained target contour is first detected from the three-dimensional point cloud data, the contours of the detected candidate regions are projected onto the two-dimensional feature image, and target detection is then performed on the small region patches corresponding to those contours, rather than extracting fine detection target features directly from the three-dimensional point cloud data; this greatly reduces the amount of detection computation and improves detection speed. Projecting the determined contours of the detection target candidate regions onto the two-dimensional feature image narrows the detection region: only the region patches corresponding to the candidate regions need to be examined, not the whole feature image, which greatly reduces the amount of computation and improves detection speed while ensuring detection precision.
Example six
Fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention. Fig. 8 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 8 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 8, the electronic device 12 is in the form of a general purpose computing electronic device. Components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, commonly referred to as a "hard disk drive"). Although not shown in fig. 8, a magnetic disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set of program modules (e.g., at least the candidate region determination module 610, projection module 620, and detection module 630) configured to perform the functions of embodiments of the invention.
A program/utility 40 having a set of program modules 42 (e.g., at least one of the candidate region determination module 610, the feature image determination module 620, and the detection module 630) may be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the electronic device 12 over the bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and performs target detection by running programs stored in the system memory 28, for example implementing the steps of the target detection method provided by this embodiment, which includes:
determining the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space;
projecting the contour onto an initial two-dimensional image acquired by photographing the physical space to obtain a first two-dimensional image corresponding to the detection target; and
determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image. As will be understood by those skilled in the art, the processor may also implement the technical solution of the target detection method provided in any embodiment of the present invention.
Example seven
The seventh embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the target detection method provided by any embodiment of the present invention, the method comprising:
determining the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space;
projecting the contour onto an initial two-dimensional image acquired by photographing the physical space to obtain a first two-dimensional image corresponding to the detection target; and
determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image. The computer storage medium of this embodiment may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to: wireless, wireline, fiber-optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It will be appreciated by those of ordinary skill in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed over a network of computing devices. Alternatively, they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may be fabricated separately as individual integrated circuit modules, or multiple modules or steps within them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, while the invention has been described in some detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the inventive concept, the scope of which is determined by the appended claims.

Claims (11)

1. A target detection method, comprising:
determining the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space;
projecting the contour onto an initial two-dimensional image acquired by photographing the physical space to obtain a first two-dimensional image corresponding to a detection target;
determining a position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image;
wherein the determining the position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image comprises:
inputting the initial two-dimensional image into a preset detection model;
determining the output of an intermediate layer of the preset detection model as a feature image corresponding to the initial two-dimensional image;
mapping the first two-dimensional image onto the feature image according to a downsampling ratio corresponding to the feature image to obtain a second two-dimensional image corresponding to the detection target;
determining a position and/or type of the detection target based on the second two-dimensional image;
wherein before the first two-dimensional image is mapped onto the feature image, the method further comprises:
extending, by a set distance, the image contour in the first two-dimensional image that corresponds to the contour of the candidate region, so as to supplement the main-body edge of the detection target;
wherein the determining the position and/or type of the detection target based on the second two-dimensional image comprises:
dividing the second two-dimensional image into a set number of grid regions;
determining at least one border region according to the grid regions;
performing a coarse border regression operation on the border region using a set neural network model to obtain a segmented image;
and determining the position and/or type of the detection target according to the segmented image.
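For illustration only (not part of the claim language), the downsampling-ratio mapping of claim 1 can be sketched as follows; the floor/ceiling rounding convention and the channels-last feature-map layout are assumptions of this sketch.

```python
import numpy as np

def map_to_feature_image(bbox, ratio):
    """Map a box given in initial-image pixels onto the feature image produced
    by an intermediate layer whose total downsampling ratio is `ratio`."""
    u1, v1, u2, v2 = bbox
    return (int(u1 // ratio), int(v1 // ratio),
            int(np.ceil(u2 / ratio)), int(np.ceil(v2 / ratio)))

def second_two_dimensional_image(feature_image, bbox, ratio):
    """Crop the mapped block (the "second two-dimensional image") from a
    (H, W, C) feature image; only this block is searched for targets."""
    fu1, fv1, fu2, fv2 = map_to_feature_image(bbox, ratio)
    return feature_image[fv1:fv2, fu1:fu2, :]
```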
2. The method of claim 1, wherein determining the contour of the detection target candidate region based on the three-dimensional point cloud data acquired by scanning the physical space comprises:
preprocessing the three-dimensional point cloud data to remove point cloud data belonging to a preset static object from the three-dimensional point cloud data;
performing a clustering operation on the preprocessed three-dimensional point cloud data using a set clustering algorithm to obtain a clustering result containing at least one cluster;
determining a minimum bounding box of each cluster in the clustering result;
and determining the region where each minimum bounding box is located as the contour of a detection target candidate region.
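Illustrative only (not part of the claim language): one common reading of the minimum bounding box in claim 2 is a minimum-area rotated rectangle around each cluster in bird's-eye view; the use of OpenCV and the x-y projection are assumptions of this sketch.

```python
import numpy as np
import cv2

def min_bounding_contour(cluster_points):
    """cluster_points: (N, 3) points of one cluster. Project to the x-y plane
    (bird's-eye view) and take the minimum-area rectangle as the contour."""
    xy = cluster_points[:, :2].astype(np.float32)
    rect = cv2.minAreaRect(xy)        # ((cx, cy), (w, h), angle)
    return cv2.boxPoints(rect)        # (4, 2) corner points of the contour
```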
3. The method according to claim 2, wherein if the preset static object is the ground, the preprocessing the three-dimensional point cloud data to remove point cloud data belonging to the preset static object from the three-dimensional point cloud data comprises:
determining the inclination angle between two point cloud points in the three-dimensional point cloud data that are obtained at the same moment by two adjacent laser emitters;
if the inclination angle is smaller than an inclination angle threshold, marking the two point cloud points as ground point cloud data;
and removing the ground point cloud data from the three-dimensional point cloud data.
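Illustrative only (not part of the claim language): the inclination angle in claim 3 can be read as the angle between the line joining the two point cloud points and the horizontal plane; the 10-degree threshold below is an assumed value, not one fixed by the claim.

```python
import numpy as np

def is_ground_pair(p_lower, p_upper, angle_threshold_deg=10.0):
    """p_lower, p_upper: (3,) points scanned at the same moment by two
    vertically adjacent laser emitters. A near-horizontal pair is ground."""
    dx, dy, dz = p_upper - p_lower
    inclination = np.degrees(np.arctan2(abs(dz), np.hypot(dx, dy)))
    return inclination < angle_threshold_deg
```

Iterating this test over every vertically adjacent emitter pair marks the ground point cloud data to be removed.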
4. The method according to claim 2, wherein if the preset static object is a static object other than the ground, the preprocessing the three-dimensional point cloud data to remove point cloud data belonging to the preset static object from the three-dimensional point cloud data comprises:
determining the world coordinate value of each three-dimensional point cloud point in the three-dimensional point cloud data;
and removing, from the three-dimensional point cloud data, the three-dimensional point cloud points whose world coordinate values fall within a set range, wherein the set range is determined according to the world coordinate values of the preset static object.
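Illustrative only (not part of the claim language): with the set range read as an axis-aligned box in world coordinates (an assumption), the removal of claim 4 reduces to a single mask.

```python
import numpy as np

def remove_preset_static(points_world, range_min, range_max):
    """points_world: (N, 3) points already in world coordinates; range_min and
    range_max: (3,) bounds of the set range around the preset static object."""
    inside = np.all((points_world >= range_min) & (points_world <= range_max), axis=1)
    return points_world[~inside]
```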
5. The method of claim 1, wherein projecting the contour onto the initial two-dimensional image acquired by photographing the physical space to obtain the first two-dimensional image corresponding to the detection target comprises:
determining a coordinate conversion matrix according to calibration parameters of the laser radar and the camera;
and projecting, based on the coordinate conversion matrix, the contour onto the initial two-dimensional image acquired by photographing the physical space to obtain the first two-dimensional image corresponding to the detection target.
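Illustrative only (not part of the claim language): assuming the coordinate conversion matrix of claim 5 decomposes into a 4x4 lidar-to-camera extrinsic matrix T_cam_lidar and a 3x3 camera intrinsic matrix K, both obtained from joint calibration, the projection is:

```python
import numpy as np

def lidar_points_to_pixels(points_lidar, T_cam_lidar, K):
    """Project (N, 3) lidar points into pixel coordinates."""
    homo = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    cam = (T_cam_lidar @ homo.T).T[:, :3]    # lidar frame -> camera frame
    uvw = (K @ cam.T).T                      # camera frame -> image plane
    return uvw[:, :2] / uvw[:, 2:3]          # perspective division -> (u, v)
```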
6. The method of claim 1, wherein the determining at least one border region according to the grid regions comprises at least one of:
determining all the grid regions as a first border region;
determining the grid regions in a same row as a second border region;
determining the grid regions in a same column as a third border region;
determining the grid regions of two adjacent rows as a fourth border region;
determining the grid regions of two adjacent columns as a fifth border region;
and determining grid regions constituting a square as a sixth border region, wherein the side length of the square is greater than the side length of a single grid region and less than the side length of the second two-dimensional image.
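Illustrative only (not part of the claim language): the six border-region variants of claim 6 can be enumerated over an n-by-n grid as below, each region expressed as inclusive grid-cell bounds (row1, col1, row2, col2).

```python
def border_regions(n):
    """Enumerate the six claimed border-region variants over an n x n grid."""
    regions = [(0, 0, n - 1, n - 1)]                          # first: all grids
    regions += [(r, 0, r, n - 1) for r in range(n)]           # second: single rows
    regions += [(0, c, n - 1, c) for c in range(n)]           # third: single columns
    regions += [(r, 0, r + 1, n - 1) for r in range(n - 1)]   # fourth: adjacent row pairs
    regions += [(0, c, n - 1, c + 1) for c in range(n - 1)]   # fifth: adjacent column pairs
    for s in range(2, n):                                     # sixth: squares, 1 cell < side < image
        regions += [(r, c, r + s - 1, c + s - 1)
                    for r in range(n - s + 1) for c in range(n - s + 1)]
    return regions
```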
7. The method according to claim 1, wherein said determining the position and/or type of the detection target from the segmented image comprises:
inputting the segmented image into a target classification model to obtain the type of the detection target;
and inputting the segmented image into a full convolution model to obtain the position of the detection target.
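Illustrative only (not part of the claim language): claim 7's two heads can be sketched in PyTorch as a classifier for the target type plus a small fully convolutional head for the position; the layer widths and the 4-channel box encoding are assumptions of this sketch.

```python
import torch.nn as nn

class TypeAndPositionHeads(nn.Module):
    """Two heads over the segmented image's features: type and position."""
    def __init__(self, in_ch=64, num_classes=4):
        super().__init__()
        self.cls = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(in_ch, num_classes))  # target type
        self.pos = nn.Conv2d(in_ch, 4, kernel_size=1)            # per-pixel box offsets

    def forward(self, segmented):        # segmented: (N, in_ch, H, W)
        return self.cls(segmented), self.pos(segmented)
```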
8. A target detection apparatus, comprising:
a candidate region determination module, configured to determine the contour of a detection target candidate region based on three-dimensional point cloud data acquired by scanning a physical space;
a projection module, configured to project the contour onto an initial two-dimensional image acquired by photographing the physical space to obtain a first two-dimensional image corresponding to the detection target;
a detection module, configured to determine a position and/or type of the detection target based on the first two-dimensional image and the initial two-dimensional image;
wherein the detection module comprises:
an input unit, configured to input the initial two-dimensional image into a preset detection model;
a first determining unit, configured to determine the output of an intermediate layer of the preset detection model as a feature image corresponding to the initial two-dimensional image;
a mapping unit, configured to map the first two-dimensional image onto the feature image according to a downsampling ratio corresponding to the feature image to obtain a second two-dimensional image corresponding to the detection target;
a second determining unit, configured to determine the position and/or type of the detection target based on the second two-dimensional image;
a supplementing unit, configured to extend, by a set distance, the image contour in the first two-dimensional image that corresponds to the contour of the candidate region before the first two-dimensional image is mapped onto the feature image, so as to supplement the main-body edge of the detection target;
wherein the second determining unit comprises:
a dividing subunit, configured to divide the second two-dimensional image into a set number of grid regions;
a first determining subunit, configured to determine at least one border region according to the grid regions;
an operation subunit, configured to perform a coarse border regression operation on the border region using a set neural network model to obtain a segmented image;
and a second determining subunit, configured to determine the position and/or type of the detection target according to the segmented image.
9. A target detection system, comprising: a three-dimensional point cloud acquisition device, a two-dimensional image acquisition device, and a processor;
wherein the three-dimensional point cloud acquisition device is communicatively connected to the processor and is configured to acquire three-dimensional point cloud data by scanning a physical space and send the three-dimensional point cloud data to the processor;
the two-dimensional image acquisition device is communicatively connected to the processor and is configured to acquire an initial two-dimensional image by photographing the physical space and send the initial two-dimensional image to the processor;
and the processor is configured to perform the steps of the target detection method of any one of claims 1-7 based on the three-dimensional point cloud data and the initial two-dimensional image.
10. An electronic device, the electronic device comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the steps of the target detection method of any one of claims 1-7.
11. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the target detection method of any one of claims 1-7.
CN202010931022.5A 2020-09-07 2020-09-07 Target detection method and device, electronic equipment and storage medium Active CN113761999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010931022.5A CN113761999B (en) 2020-09-07 2020-09-07 Target detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010931022.5A CN113761999B (en) 2020-09-07 2020-09-07 Target detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113761999A CN113761999A (en) 2021-12-07
CN113761999B true CN113761999B (en) 2024-03-05

Family

ID=78785707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010931022.5A Active CN113761999B (en) 2020-09-07 2020-09-07 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113761999B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114170221B (en) * 2021-12-23 2023-04-07 深圳市铱硙医疗科技有限公司 Method and system for confirming brain diseases based on images
CN114387346A (en) * 2022-03-25 2022-04-22 阿里巴巴达摩院(杭州)科技有限公司 Image recognition and prediction model processing method, three-dimensional modeling method and device
CN114743169A (en) * 2022-04-11 2022-07-12 南京领行科技股份有限公司 Object abnormity detection method and device, electronic equipment and storage medium
CN114663438A (en) * 2022-05-26 2022-06-24 浙江银轮智能装备有限公司 Track detection method, system, apparatus, storage medium and computer program product
CN115082662B (en) * 2022-07-15 2023-02-03 深圳市速腾聚创科技有限公司 Target area positioning method and target area positioning device
CN115131525B (en) * 2022-07-26 2024-04-05 白犀牛智达(北京)科技有限公司 Curb detection method
CN117611592B (en) * 2024-01-24 2024-04-05 长沙隼眼软件科技有限公司 Foreign matter detection method, device, electronic equipment and storage medium


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110037184A (en) * 2009-10-06 2011-04-13 한국과학기술원 Pipelining computer system combining neuro-fuzzy system and parallel processor, method and apparatus for recognizing objects using the computer system in images
CN107203754A (en) * 2017-05-26 2017-09-26 北京邮电大学 A kind of license plate locating method and device based on deep learning
CN109100741A (en) * 2018-06-11 2018-12-28 长安大学 A kind of object detection method based on 3D laser radar and image data
CN109948661A (en) * 2019-02-27 2019-06-28 江苏大学 A kind of 3D vehicle checking method based on Multi-sensor Fusion
CN111191582A (en) * 2019-12-27 2020-05-22 深圳市越疆科技有限公司 Three-dimensional target detection method, detection device, terminal device and computer-readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Automatic Segmentation of Rotational X-Ray Images for Anatomic Intra-Procedural Surface Generation in Atrial Fibrillation Ablation Procedures; Robert Manzke; IEEE Transactions on Medical Imaging, Volume 29, Issue 2, February 2010; full text *
Aircraft target detection in remote sensing images based on deep neural networks; ***; He Ran; Computer Engineering (07); full text *

Also Published As

Publication number Publication date
CN113761999A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN113761999B (en) Target detection method and device, electronic equipment and storage medium
CN110163930B (en) Lane line generation method, device, equipment, system and readable storage medium
CN109271944B (en) Obstacle detection method, obstacle detection device, electronic apparatus, vehicle, and storage medium
CN108509820B (en) Obstacle segmentation method and device, computer equipment and readable medium
CN112287860B (en) Training method and device of object recognition model, and object recognition method and system
GB2520338A (en) Automatic scene parsing
CN110956137A (en) Point cloud data target detection method, system and medium
CN113706480A (en) Point cloud 3D target detection method based on key point multi-scale feature fusion
WO2024012211A1 (en) Autonomous-driving environmental perception method, medium and vehicle
CN112154448A (en) Target detection method and device and movable platform
CN111611900B (en) Target point cloud identification method and device, electronic equipment and storage medium
CN112257668A (en) Main and auxiliary road judging method and device, electronic equipment and storage medium
CN115147328A (en) Three-dimensional target detection method and device
CN115100741A (en) Point cloud pedestrian distance risk detection method, system, equipment and medium
CN113763438B (en) Point cloud registration method, device, equipment and storage medium
CN112529917A (en) Three-dimensional target segmentation method, device, equipment and storage medium
CN113255779A (en) Multi-source perception data fusion identification method and system and computer readable storage medium
CN112639822B (en) Data processing method and device
CN117315372A (en) Three-dimensional perception method based on feature enhancement
CN117173399A (en) Traffic target detection method and system of cross-modal cross-attention mechanism
CN116681932A (en) Object identification method and device, electronic equipment and storage medium
CN109598199B (en) Lane line generation method and device
CN116642490A (en) Visual positioning navigation method based on hybrid map, robot and storage medium
CN116246033A (en) Rapid semantic map construction method for unstructured road
CN116263504A (en) Vehicle identification method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant