CN113256709A - Target detection method, target detection device, computer equipment and storage medium

Publication number
CN113256709A
Authority
CN
China
Prior art keywords
target object
information
image
dimensional
block diagram
Legal status
Pending
Application number
CN202110396755.8A
Other languages
Chinese (zh)
Inventor
彭亮
刘飞
邓丹
钱炜
杨政
何晓飞
Current Assignee
Hangzhou Fabu Technology Co Ltd
Original Assignee
Hangzhou Fabu Technology Co Ltd
Application filed by Hangzhou Fabu Technology Co Ltd
Priority to CN202110396755.8A
Publication of CN113256709A

Classifications

    • G06T 7/11 Region-based segmentation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 7/50 Depth or shape recovery
    • G06T 7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06V 20/64 Three-dimensional objects
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a target detection method, a target detection device, computer equipment and a storage medium. The method first acquires an image to be processed and performs target detection on it to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed. It then obtains three-dimensional center information, dimension information, and orientation information of each target object from the depth information and the two-dimensional block diagram information, and finally determines the three-dimensional block diagram information of each target object from the three-dimensional center, dimension, and orientation information. In this technical scheme, adaptive voxel processing is applied to the depth information and the two-dimensional block diagram information of each target object, and the two-dimensional block diagram information of each target object is scaled, so that target detection on the image to be processed is more accurate, solving the problem of low detection accuracy in the prior art.

Description

Target detection method, target detection device, computer equipment and storage medium
Technical Field
The present application relates to the field of detection technologies, and in particular, to a target detection method and apparatus, a computer device, and a storage medium.
Background
The detection of three-dimensional objects is a matter of great concern in the field of automatic driving, and the detection results play a crucial role in the driving safety of vehicles. For example, in the case where a detection device is mounted on a vehicle to detect an obstacle, and another vehicle is a preceding obstacle, it is required to be able to detect the other vehicle and calculate the position and speed information thereof with higher accuracy, and the calculated position and speed information is used as an input of, for example, a collision avoidance function or a preceding vehicle tracking function, and contributes to more appropriate vehicle control.
In the prior art, a common method for detecting a three-dimensional object is a camera-based method, and specifically, an image in front of a vehicle is acquired by a single camera, and orientation estimation is performed by extracting features of the object in the image, so as to determine a detection result of the object in the image.
However, the camera-based three-dimensional detection method, which relies only on cheap and mature camera sensors, obtains only planar information of an object and cannot provide depth information of the scene, and therefore suffers from low detection accuracy.
Disclosure of Invention
The embodiment of the application provides a target detection method, a target detection device, computer equipment and a storage medium, which are used for solving the problem of low detection accuracy of a three-dimensional object in the prior art.
In a first aspect, an embodiment of the present application provides a target detection method, including:
acquiring an image to be processed, wherein the image to be processed comprises: at least one target object;
carrying out target detection on the image to be processed to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed;
obtaining three-dimensional center information, dimension and orientation information of each target object according to the depth information and the two-dimensional block diagram information of each target object in the image to be processed;
and determining the three-dimensional block diagram information of each target object according to the three-dimensional center information, the dimension and the orientation information of each target object.
In a possible design of the first aspect, the obtaining, according to the depth information and the two-dimensional block diagram information of each target object in the image to be processed, three-dimensional center information, dimension information, and orientation information of each target object includes:
carrying out self-adaptive voxel processing on the depth information and the two-dimensional block diagram information of each target object to determine the three-dimensional center information of each target object;
and scaling the two-dimensional block diagram information of each target object, and determining the dimension and orientation information of each target object.
In this possible design, the performing adaptive voxel processing on the depth information and the two-dimensional block diagram information of each target object to determine three-dimensional center information of each target object includes:
carrying out reprojection processing on each target object according to the depth information of the target object to obtain the spatial point cloud information of the target object;
quantizing the spatial point cloud information of the target object into voxel information according to the two-dimensional block diagram information of the target object to obtain self-adaptive voxel information of the target object;
and inputting the self-adaptive voxel information of the target object into a pre-trained three-dimensional center locator to obtain the three-dimensional center information of the target object.
In this possible design, the quantizing the spatial point cloud information of the target object into voxel information according to the two-dimensional block diagram information of the target object to obtain adaptive voxel information of the target object includes:
determining a point cloud range corresponding to the target object according to the two-dimensional block diagram information of the target object;
determining the size of each voxel in the point cloud range according to the point cloud density in the point cloud range;
and dividing the point cloud range according to the size of each voxel in the point cloud range to obtain the self-adaptive voxel information of the target object.
In another possible design of the first aspect, the scaling the two-dimensional block diagram information of each target object to determine the dimension and orientation information of each target object includes:
zooming the two-dimensional block diagram information of each target object according to a preset length-width ratio to obtain a zooming result of a preset size;
carrying out zero filling operation on the blank area in the zooming result to obtain the self-adaptive frame information of each target object;
and inputting the self-adaptive frame information of each target object into an object dimension and orientation estimator to obtain dimension and orientation information of each target object.
Optionally, after the three-dimensional block diagram information of each target object is determined according to the three-dimensional center information, the dimension, and the orientation information of each target object, the method further includes:
and for each target object, determining the confidence of the three-dimensional block diagram information according to the two-dimensional block diagram information of the target object and the three-dimensional block diagram information of the target object.
In yet another possible design of the first aspect, the performing target detection on the image to be processed to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed includes:
inputting the image to be processed into a pre-trained two-dimensional detector to obtain a two-dimensional frame area of each target object in the image to be processed;
cutting the two-dimensional frame area of each target object in the image to be processed to obtain two-dimensional frame information of each target object;
inputting the image to be processed into a depth estimator trained in advance to obtain a depth image corresponding to the image to be processed;
and according to the two-dimensional frame area of each target object in the image to be processed, carrying out cropping processing on the depth image to obtain the depth information of each target object.
In a second aspect, an embodiment of the present application provides an object detection apparatus, including: the device comprises an acquisition module, a detection module and a processing module;
the acquiring module is used for acquiring an image to be processed, and the image to be processed comprises: at least one target object;
the detection module is used for carrying out target detection on the image to be processed to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed;
the processing module is used for obtaining three-dimensional center information, dimensionality and orientation information of each target object according to the depth information and the two-dimensional block diagram information of each target object in the image to be processed, and determining the three-dimensional block diagram information of each target object according to the three-dimensional center information, the dimensionality and the orientation information of each target object.
In a possible design of the second aspect, the processing module is configured to obtain three-dimensional center information, dimension, and orientation information of each target object according to the depth information and the two-dimensional block diagram information of each target object in the image to be processed, and specifically:
the processing module is specifically configured to:
carrying out self-adaptive voxel processing on the depth information and the two-dimensional block diagram information of each target object to determine the three-dimensional center information of each target object;
and scaling the two-dimensional block diagram information of each target object, and determining the dimension and orientation information of each target object.
In this possible design, the processing module is configured to perform adaptive voxel processing on the depth information and the two-dimensional block diagram information of each target object, and determine three-dimensional center information of each target object, specifically:
the processing module is specifically configured to:
carrying out reprojection processing on each target object according to the depth information of the target object to obtain the spatial point cloud information of the target object;
quantizing the spatial point cloud information of the target object into voxel information according to the two-dimensional block diagram information of the target object to obtain self-adaptive voxel information of the target object;
and inputting the self-adaptive voxel information of the target object into a pre-trained three-dimensional center locator to obtain the three-dimensional center information of the target object.
Optionally, the processing module is configured to quantize the spatial point cloud information of the target object into voxel information according to the two-dimensional block diagram information of the target object, so as to obtain adaptive voxel information of the target object, and specifically includes:
the processing module is specifically configured to:
determining a point cloud range corresponding to the target object according to the two-dimensional block diagram information of the target object;
determining the size of each voxel in the point cloud range according to the point cloud density in the point cloud range;
and dividing the point cloud range according to the size of each voxel in the point cloud range to obtain the self-adaptive voxel information of the target object.
In another possible design of the second aspect, the processing module is configured to perform scaling processing on the two-dimensional block diagram information of each target object, and determine the dimension and orientation information of each target object, specifically:
the processing module is specifically configured to:
zooming the two-dimensional block diagram information of each target object according to a preset length-width ratio to obtain a zooming result of a preset size;
carrying out zero filling operation on the blank area in the zooming result to obtain the self-adaptive frame information of each target object;
and inputting the self-adaptive frame information of each target object into an object dimension and orientation estimator to obtain dimension and orientation information of each target object.
Optionally, the processing module is further configured to determine, for each target object, a confidence level of the three-dimensional block diagram information according to the two-dimensional block diagram information of the target object and the three-dimensional block diagram information of the target object.
In yet another possible design of the second aspect, the detection module is specifically configured to:
inputting the image to be processed into a pre-trained two-dimensional detector to obtain a two-dimensional frame area of each target object in the image to be processed;
cutting the two-dimensional frame area of each target object in the image to be processed to obtain two-dimensional frame information of each target object;
inputting the image to be processed into a depth estimator trained in advance to obtain a depth image corresponding to the image to be processed;
and according to the two-dimensional frame area of each target object in the image to be processed, carrying out cropping processing on the depth image to obtain the depth information of each target object.
In a third aspect, an embodiment of the present application provides a computer device, including: at least one processor, a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions to cause the computer apparatus to perform the object detection method as described in the first aspect and various possible designs above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the computer-executable instructions are used to implement the object detection method as described in the first aspect and various possible designs.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program, which when executed by a processor, is configured to implement the object detection method as described in the first aspect and various possible designs.
According to the target detection method, the target detection device, the computer equipment, and the storage medium provided by the application, an image to be processed is first acquired, and target detection is performed on it to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed. Three-dimensional center information, dimension information, and orientation information of each target object are then obtained from the depth information and the two-dimensional block diagram information of each target object, and finally the three-dimensional block diagram information of each target object is determined from the three-dimensional center, dimension, and orientation information of each target object. In this technical scheme, adaptive voxel processing is applied to the depth information and the two-dimensional block diagram information of each target object, and the two-dimensional block diagram information of each target object is scaled, so that target detection on the image to be processed is more accurate, solving the problem of low detection accuracy in the prior art.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic view of an application scenario of a target detection method provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a first embodiment of a target detection method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a second embodiment of a target detection method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a spatial point cloud provided in the embodiment of the present application;
fig. 5 is a schematic diagram of adaptive voxel construction provided in an embodiment of the present application;
fig. 6 is a schematic flowchart of a third embodiment of a target detection method according to an embodiment of the present application;
FIG. 7A is a schematic diagram of a scaling process in the prior art according to an embodiment of the present application;
fig. 7B is a schematic diagram of a scaling process provided in the embodiment of the present application;
fig. 8 is a schematic flowchart of a fourth embodiment of a target detection method according to an embodiment of the present application;
FIG. 9 is a general block diagram of a target detection method according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Before introducing the embodiments of the present application, the background of the present application is explained first:
in the field of automatic driving, three-dimensional object detection has been a problem of great interest, and various solutions have emerged. These methods can be broadly divided into two categories, lidar-based methods and camera-based methods.
The lidar-based method provides accurate scene depth values through the lidar point cloud, so the detection results it obtains are relatively accurate. However, a lidar has a limited working range and is too expensive, so the method is not suitable for large-scale adoption.
As an alternative, the camera-based approach, while still far from satisfactory in detection performance, relies solely on cheaper and more mature camera sensors, and is therefore drawing increasing interest in industry and academia, particularly the monocular approach (i.e., requiring only a single camera).
Since conventional monocular image-based methods lack explicit depth information, it is difficult for them to accurately predict objects in three-dimensional space. For this reason, many recent monocular three-dimensional object detection methods acquire depth information directly from an estimated depth map. Some of these methods convert the depth map into a pseudo-lidar point cloud, which is then fed to a detector. However, pseudo-lidar-based methods mostly reuse existing three-dimensional detectors designed specifically for accurate lidar point clouds; such detectors struggle with the high noise of monocular pseudo-lidar, resulting in poor performance.
The orientation estimated for a three-dimensional object also has an important influence on the detection result. The prior art typically performs orientation estimation by extracting image features from the entire image to be detected, or from a point cloud converted from the entire depth map. However, the orientation of a three-dimensional object depends only on the appearance of the object in the image to be detected, and global semantics are unnecessary or even detrimental to orientation estimation: semantics from regions outside the object may interfere with or even overwhelm the important local semantics. Moreover, directly resizing the cropped image destroys semantic cues associated with orientation and is therefore harmful to the orientation estimation result.
Furthermore, most three-dimensional object detection methods reuse the confidence of the two-dimensional detection as the confidence score of the three-dimensional detection. This ignores the complexity of three-dimensional detection: objects with high two-dimensional detection confidence, e.g., occluded, truncated, or distant objects, may still be difficult to locate accurately in three-dimensional space. The confidence strategy should therefore be redesigned according to the actual situation of the three-dimensional detection.
To address the above problems in the prior art, fig. 1 is a schematic view of an application scenario of the target detection method provided in an embodiment of the present application. As shown in fig. 1, the application scenario diagram includes: an image to be detected 1000, an object detection device 1100, and a detected image 1200.
The image 1000 to be detected includes at least one target object, e.g., a vehicle image 1001; the detected image 1200 includes the vehicle image 1001, a two-dimensional block diagram 1202, and a three-dimensional block diagram 1203.
Alternatively, the object detection apparatus 1100 may be a physical device such as a computer device, for example, the object detection apparatus 1100 may be mounted on a vehicle.
In a possible design, the object detection apparatus 1100 may receive the image 1000 to be detected (for example, the image 1000 to be detected includes an image 1001 of a vehicle of another vehicle to be detected) acquired by the vehicle photographing device, and process the image 1000 to be detected (the processing procedure is described in detail in the following embodiments), so as to obtain a detected image 1200, where a two-dimensional frame 1202 and a three-dimensional frame 1203 of the image 1001 of the marked vehicle are displayed in the detected image 1200.
After the two-dimensional block diagram 1202 and the three-dimensional block diagram 1203 are obtained, the vehicle can calculate the position information and the speed information of other vehicles with higher precision, and the calculated position information and speed information can be used as the input of the anti-collision function or the front tracking function of the vehicle, so that more appropriate vehicle control can be realized.
In order to solve the technical problems, the technical conception process of the inventor is as follows: the inventor found that adaptive voxel construction can be performed after the depth map of the image to be detected is obtained, so as to better estimate the most likely absolute position of a target object in the image to be detected; in addition, when the target object is scaled during orientation estimation, the aspect ratio of the object should be preserved to suit the regression network required for orientation estimation, thereby improving the accuracy of the detection result.
The technical solution of the present application is described in detail below with reference to an application scenario diagram shown in fig. 1 by specific embodiments. It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic flowchart of a first embodiment of a target detection method according to an embodiment of the present application. As shown in fig. 2, the following steps may be included:
and step 21, acquiring an image to be processed.
Wherein, the image to be processed includes: at least one target object.
In this step, when a target object in a scene needs to be detected, first, scene information including the target object needs to be captured, that is, an image obtained by shooting with a device such as a camera or an image extracted from another device or a memory, and the image is called an image to be detected.
The image to be detected may contain vehicles, pedestrians, obstacles, and the like, which are the target objects.
Alternatively, the image to be processed may be a Red Green Blue (RGB) color mode image.
And 22, carrying out target detection on the image to be processed to obtain the depth information and the two-dimensional block diagram information of each target object in the image to be processed.
In this step, each target object in the image to be processed is detected; the distance from the camera to each target object, namely the depth information of each target object, is obtained, and the frame that encloses each target object from a two-dimensional perspective, i.e., the two-dimensional block diagram information of each target object, is also detected.
Optionally, the depth information of each target object may be obtained by inputting an image to be detected to a pre-trained depth estimator, and the two-dimensional block diagram information of each target object may be obtained by inputting an image to be detected to a pre-trained two-dimensional detector.
And step 23, obtaining three-dimensional center information, dimension and orientation information of each target object according to the depth information and the two-dimensional block diagram information of each target object in the image to be processed.
In this step, for each target object, the depth information and the two-dimensional block diagram information obtained in the above step are further processed into Adaptive Voxel (AV) information that can be input to a three-dimensional center locator, and the three-dimensional center locator outputs the three-dimensional center information, i.e., the most likely absolute position of the target object in the image to be processed. The two-dimensional block diagram information of each target object is likewise further processed into adaptive frame information that can be input to an object dimension and orientation estimator, and the object dimension and orientation estimator outputs the dimension and orientation information, i.e., the size and direction of the target object.
Optionally, the two-dimensional block diagram information of each target object is scaled, so that the scaled result is input to the object dimension and orientation estimator, and the dimension and orientation information of each target object is output.
Optionally, the depth information and the two-dimensional block diagram information of each target object are subjected to adaptive voxel processing to obtain adaptive voxel information, so that the adaptive voxel information is input to the three-dimensional center locator, and the three-dimensional center information of the target object is output.
And 24, determining the three-dimensional block diagram information of each target object according to the three-dimensional center information, the dimension and the orientation information of each target object.
In this step, for each target object, once the most likely absolute position of the target object in the image to be processed and the dimension and orientation information of the target object have been obtained, the three-dimensional block diagram corresponding to the target object can be marked in the image to be processed.
After this step, for each target object, a confidence of the three-dimensional block diagram information is determined from the two-dimensional block diagram information of the target object and the three-dimensional block diagram information of the target object.
Optionally, in this embodiment, the confidence detection of the three-dimensional block diagram needs to use two-dimensional block diagram information of the target object and three-dimensional block diagram information of the target object.
In a possible implementation, the Intersection over Union (IoU) of the two-dimensional block diagram and the three-dimensional block diagram may be used as a measure. In addition, because depth prediction in three-dimensional detection is unreliable, the intersection ratio of a far target object in the image to be detected tends to be greater than that of a near target object, so the confidence Conf_3d of the three-dimensional block diagram information is defined by the following formula:
Conf_3d = IoU(box_proj, box_2d) × Conf_2d / e^(dis/λ)
where IoU(box_proj, box_2d) denotes the intersection over union of the projected three-dimensional block diagram and the two-dimensional block diagram, box_proj denotes the three-dimensional block diagram projected onto the image, box_2d denotes the two-dimensional block diagram, Conf_2d denotes the confidence of the two-dimensional block diagram information, dis denotes the distance of the target object from the optical center of the camera, and λ is a control factor that may take the value 80.
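For reference, the confidence formula above can be computed as in the following illustrative Python sketch; it is a minimal example rather than the claimed implementation, the IoU helper assumes axis-aligned boxes given as (x1, y1, x2, y2) pixel coordinates, and box_proj is assumed to be the axis-aligned bounding box of the three-dimensional block diagram projected onto the image.

    import math

    def box_iou(a, b):
        """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def conf_3d(box_proj, box_2d, conf_2d, dis, lam=80.0):
        """Conf_3d = IoU(box_proj, box_2d) * Conf_2d / exp(dis / lambda)."""
        return box_iou(box_proj, box_2d) * conf_2d / math.exp(dis / lam)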
The target detection method provided by the embodiment of the application first acquires an image to be processed and performs target detection on it to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed, then obtains three-dimensional center information, dimension information, and orientation information of each target object from the depth information and the two-dimensional block diagram information, and finally determines the three-dimensional block diagram information of each target object from the three-dimensional center, dimension, and orientation information. In this technical scheme, adaptive voxel processing is applied to the depth information and the two-dimensional block diagram information of each target object, and the two-dimensional block diagram information of each target object is scaled, so that target detection on the image to be processed is more accurate, solving the problem of low detection accuracy in the prior art; moreover, a verification basis is provided for the confidence of the three-dimensional block diagram of the target object.
On the basis of the foregoing embodiments, fig. 3 is a schematic flowchart of a second embodiment of a target detection method provided in the embodiments of the present application. As shown in fig. 3, the step 23 of performing adaptive voxel processing on the depth information and the two-dimensional block information of each target object to determine the three-dimensional center information of each target object may include the following steps (for each target object in the image to be detected):
and 31, carrying out re-projection processing according to the depth information of the target object to obtain the spatial point cloud information of the target object.
In this step, the depth information of the target object is re-projected to generate spatial point cloud information of the target object, i.e., the distribution of the spatial point cloud of the target object.
Optionally, the embodiment is described with reference to the content shown in fig. 4, where fig. 4 is a schematic view of a spatial point cloud provided in the embodiment of the present application, and points of a spatial point cloud distribution 400 in fig. 4 represent a spatial point cloud of an image to be processed.
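As a concrete illustration of the reprojection in step 31, the following Python sketch back-projects a per-object depth map into a spatial point cloud using the standard pinhole camera model; the intrinsic matrix K and all function and variable names are assumptions made for illustration and are not taken from the embodiment.

    import numpy as np

    def depth_to_point_cloud(depth, K):
        """Back-project a depth map (H x W, in metres) into an N x 3 point cloud.

        K is the 3 x 3 camera intrinsic matrix; pixels with depth <= 0 are dropped.
        """
        h, w = depth.shape
        fx, fy = K[0, 0], K[1, 1]
        cx, cy = K[0, 2], K[1, 2]
        us, vs = np.meshgrid(np.arange(w), np.arange(h))
        z = depth.reshape(-1)
        valid = z > 0
        u, v, z = us.reshape(-1)[valid], vs.reshape(-1)[valid], z[valid]
        x = (u - cx) * z / fx          # pinhole back-projection
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=1)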
And step 32, quantizing the spatial point cloud information of the target object into voxel information according to the two-dimensional block diagram information of the target object to obtain self-adaptive voxel information of the target object.
In this step, the spatial point cloud information of the target object is determined in the spatial point cloud of the image to be processed, and is quantized into the voxel information, so as to obtain the adaptive voxel information of the target object, and thus, the feature extraction of the target object can be fully performed, that is, the feature value of each voxel is the RGB value corresponding to the target object.
Specifically, the quantization process can be implemented by the following steps:
step 1, determining a point cloud range corresponding to a target object according to two-dimensional block diagram information of the target object.
Optionally, the spatial point cloud information of the image to be processed is cut by combining the two-dimensional block diagram of the target object, so as to obtain the spatial point cloud information presented in the dotted line frame in the spatial point cloud distribution 400, that is, the point cloud range corresponding to the target object.
And 2, determining the size of each voxel in the point cloud range according to the point cloud density in the point cloud range.
In one possible implementation, the spatial point cloud shown in the dashed box of the spatial point cloud distribution 400 is divided into voxels of equal size; for example, in the first construction mode 401 in fig. 4, every voxel has the same size, while the point cloud density inside each voxel differs.
In another possible implementation, the division follows the density of the spatial point cloud in the dashed box of the spatial point cloud distribution 400; for example, in the second construction mode 402 in fig. 4, the number of spatial points contained in each voxel is consistent, that is, voxels are small where the point cloud density is high and large where it is low.
And 3, dividing the point cloud range according to the size of each voxel in the point cloud range to obtain the self-adaptive voxel information of the target object.
Optionally, taking the second construction mode 402 in fig. 4 as an example, fig. 5 is a schematic diagram of adaptive voxel construction provided in an embodiment of the present application; fig. 5 shows the adaptive voxel information of the target object under the second construction mode 402.
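A minimal sketch of the quantization in steps 1 to 3 is given below, assuming the second construction mode (roughly equal point counts per voxel): voxel boundaries are placed at per-axis quantiles of the point coordinates, so voxels are small where the point cloud is dense and large where it is sparse. The quantile-based splitting and all names are illustrative assumptions, not the claimed implementation.

    import numpy as np

    def adaptive_voxelize(points, bins_per_axis=8):
        """Divide the point cloud range into an adaptive voxel grid.

        points: N x 3 array of spatial points cropped by the two-dimensional block diagram.
        Returns per-axis voxel edge arrays and the voxel index of every point.
        """
        edges, indices = [], np.zeros((points.shape[0], 3), dtype=np.int64)
        for axis in range(3):
            coords = points[:, axis]
            qs = np.linspace(0.0, 1.0, bins_per_axis + 1)
            e = np.quantile(coords, qs)        # denser data gives closer edges
            edges.append(e)
            idx = np.searchsorted(e, coords, side="right") - 1
            indices[:, axis] = np.clip(idx, 0, bins_per_axis - 1)
        return edges, indices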
And step 33, inputting the self-adaptive voxel information of the target object into a pre-trained three-dimensional center locator to obtain the three-dimensional center information of the target object.
In this step, the pre-training process of the three-dimensional center locator may be: the three-dimensional center information of a certain target object is known, then the self-adaptive voxel information of the target object is input into a three-dimensional center locator to obtain the processed three-dimensional center information, and then the three-dimensional center information is compared with the known three-dimensional center information of the target object, and the parameters of the three-dimensional center locator are continuously adjusted until the known three-dimensional center information of the target object is consistent with the processed three-dimensional center information, so that the three-dimensional center locator with accurate parameters is obtained.
At this time, the adaptive voxel information of the target object is input into the three-dimensional center locator, which outputs a three-dimensional probability map having the same resolution as the adaptive voxel information of the target object; each probability represents the likelihood that the target object is at the corresponding absolute position, and the position with the maximum probability is finally selected as the three-dimensional center of the target object.
Alternatively, the three-dimensional center locator may be a three-dimensional U-network structure (U-net).
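To make the selection of the three-dimensional center concrete, the sketch below takes the probability volume produced by the locator (same resolution as the adaptive voxel grid) and returns the centre of the voxel with the maximum probability; the 3D U-net itself is not reproduced, and the argument names are illustrative assumptions.

    import numpy as np

    def pick_3d_center(prob_volume, voxel_edges):
        """Return the spatial centre of the most probable voxel.

        prob_volume: (X, Y, Z) probabilities output by the three-dimensional center locator.
        voxel_edges: per-axis arrays of voxel boundary coordinates (length = dim size + 1).
        """
        ix, iy, iz = np.unravel_index(np.argmax(prob_volume), prob_volume.shape)
        center = []
        for axis, idx in zip(range(3), (ix, iy, iz)):
            e = voxel_edges[axis]
            center.append(0.5 * (e[idx] + e[idx + 1]))   # midpoint of the selected voxel
        return np.array(center)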
According to the target detection method provided by the embodiment of the application, reprojection is performed according to the depth information of the target object to obtain the spatial point cloud information of the target object, the spatial point cloud information of the target object is quantized into voxel information according to the two-dimensional block diagram information of the target object to obtain the adaptive voxel information of the target object, and finally the adaptive voxel information of the target object is input into a pre-trained three-dimensional center locator to obtain the three-dimensional center information of the target object. In this technical scheme, adaptive voxel processing of the depth information and the two-dimensional block diagram information of each target object provides a basis for accurately obtaining the three-dimensional block diagram information.
On the basis of the foregoing embodiments, fig. 6 is a schematic flowchart of a third embodiment of a target detection method provided in the embodiments of the present application. As shown in fig. 6, the scaling process performed on the two-dimensional block diagram information of each target object in step 23 to determine the dimension and orientation information of each target object may include the following steps:
and 61, zooming the two-dimensional block diagram information of each target object according to a preset length-width ratio to obtain a zooming result with a preset size.
For example, fig. 7A is a schematic diagram of a scaling process in the prior art provided in an embodiment of the present application. As shown in fig. 7A, when scaling a target object (e.g., a vehicle), it is necessary to meet the input requirements of the object dimension and orientation estimator, for example, if the input size required by the object dimension and orientation estimator is a × a, the length and width of the target object are extended to a, respectively, to obtain a processed target object image.
In this step, in order to avoid the shape distortion introduced in the prior art when the target object is scaled to meet the input requirement of the object dimension and orientation estimator, it is first necessary to ensure that the aspect ratio of the target object is not changed during the scaling process.
Alternatively, the two-dimensional block diagram information of the target object may be an image of the clipped target object, and may be a rectangle.
In one possible implementation, fig. 7B is a schematic diagram of a scaling process provided in the embodiment of the present application. As shown in fig. 7B, if the input size required by the object dimension and orientation estimator is a × a (i.e., a preset size), the length and width of the target object are scaled at this time, so that the length of the target object is a, and the width of the target object is smaller than a, that is, the scaling result of the preset size is obtained.
It should be understood that the scaling process described above may also be performed by scaling up the length and width to meet the preset size required by the input of the object dimension and orientation estimator.
And step 62, performing zero filling operation on the blank area in the zooming result to obtain the self-adaptive frame information of each target object.
In this step, as shown in fig. 7B, zero padding is performed on a blank region in the scaling result, that is, image information of the target object is not included in the blank region, and then adaptive frame information of the target object is obtained.
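The following sketch shows one way to realize steps 61 and 62: the cropped object image is resized so that its longer side equals the preset input size a while its aspect ratio is preserved, and the remaining blank area is zero-filled. OpenCV is used purely for illustration, and the parameter a = 224 is an assumed input size, not one specified by the embodiment.

    import cv2
    import numpy as np

    def make_adaptive_frame(crop, a=224):
        """Scale a cropped object image (H x W x 3) into an a x a input without
        distorting it, then zero-pad the blank area (steps 61 and 62)."""
        h, w = crop.shape[:2]
        scale = a / max(h, w)                        # keep the original aspect ratio
        new_h, new_w = int(round(h * scale)), int(round(w * scale))
        resized = cv2.resize(crop, (new_w, new_h))
        padded = np.zeros((a, a, crop.shape[2]), dtype=crop.dtype)
        padded[:new_h, :new_w] = resized             # blank region stays zero
        return padded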
And step 63, inputting the self-adaptive frame information of each target object into an object dimension and orientation estimator to obtain dimension and orientation information of each target object.
In this step, the pre-training process of the object dimension and orientation estimator may be: knowing the dimension and orientation information of a certain target object, inputting the self-adaptive frame information of the target object into an object dimension and orientation estimator to obtain the processed dimension and orientation information, comparing the processed dimension and orientation information with the known dimension and orientation information of the target object, and continuously adjusting the parameters of the object dimension and orientation estimator until the known dimension and orientation information of the target object is consistent with the processed dimension and orientation information, thereby obtaining the object dimension and orientation estimator with accurate parameters.
At this time, the adaptive frame information of the target object is input into the object dimension and orientation estimator, and the dimension and orientation information of the target object is obtained.
Alternatively, the object dimension and orientation estimator may be a Visual Geometry Group Network (VGG) model.
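As a rough illustration of such an estimator, the sketch below attaches a small regression head to a VGG-16 backbone and predicts three dimensions plus an orientation encoded as (sin θ, cos θ); this particular head, the output encoding, and the use of torchvision are assumptions made for illustration and are not part of the embodiment.

    import torch
    import torch.nn as nn
    import torchvision

    class DimOrientationEstimator(nn.Module):
        """VGG-based head regressing object dimensions (h, w, l) and an
        orientation angle encoded as (sin, cos) from the adaptive frame."""
        def __init__(self):
            super().__init__()
            backbone = torchvision.models.vgg16()
            self.features = backbone.features            # convolutional part of VGG-16
            self.pool = nn.AdaptiveAvgPool2d((7, 7))
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(512 * 7 * 7, 512), nn.ReLU(inplace=True),
                nn.Linear(512, 5),                        # 3 dims + sin/cos of orientation
            )

        def forward(self, x):                             # x: (N, 3, a, a) adaptive frames
            out = self.head(self.pool(self.features(x)))
            dims, angle = out[:, :3], out[:, 3:]
            angle = nn.functional.normalize(angle, dim=1) # keep (sin, cos) on the unit circle
            return dims, angle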
The target detection method provided by the embodiment of the application comprises the steps of firstly carrying out scaling processing on two-dimensional block diagram information of each target object according to a preset length-width ratio to obtain a scaling result with a preset size, carrying out zero filling operation on a hollow white area in the scaling result to obtain self-adaptive frame information of each target object, and finally inputting the self-adaptive frame information of each target object to an object dimension and orientation estimator to obtain the dimension and orientation information of each target object. In the technical scheme, zero filling processing is carried out on the scaling result of the target object, so that the two-dimensional block diagram information of the target object is ensured not to be deformed before being input to the object dimension and orientation estimator, and the accuracy of the dimension and orientation information of the target object is improved.
On the basis of the foregoing embodiment, fig. 8 is a schematic flowchart of a fourth embodiment of a target detection method provided in the embodiment of the present application. As shown in fig. 8, the step 22 may include the following steps:
and 81, inputting the image to be processed into a pre-trained two-dimensional detector to obtain a two-dimensional frame area of each target object in the image to be processed.
In this step, the image to be processed is input to a pre-trained two-dimensional detector, and an image having two-dimensional frame areas, each of which includes a target object, is output.
It should be understood that the pre-training process of the two-dimensional detector is the same as above, and will not be described herein.
And 82, cropping the two-dimensional frame area of each target object in the image to be processed to obtain the two-dimensional frame information of each target object.
In this step, the image output by the two-dimensional detector is cut according to the two-dimensional frame area to obtain two-dimensional frame information of each target object, where the two-dimensional frame information may be two-dimensional frames of each target object with different sizes.
And 83, inputting the image to be processed into a depth estimator trained in advance to obtain a depth image corresponding to the image to be processed.
In this step, the image to be processed is input into a pre-trained depth estimator, and the output image is the depth value of each pixel in the current scene shown in the image to be processed.
It should be understood that the pre-training process of the depth estimator is the same as above, and will not be described herein.
And 84, according to the two-dimensional frame area of each target object in the image to be processed, performing cropping processing on the depth image to obtain the depth information of each target object.
In this step, the depth image corresponding to the image to be processed is obtained in the above steps, and the cropping processing is performed according to the position of each target object on the depth image (the two-dimensional frame area of each target object), so as to obtain the depth image of each target object, that is, the depth information.
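A minimal sketch of steps 81 to 84 is given below, assuming the two-dimensional detector returns boxes as (x1, y1, x2, y2) pixel coordinates; the detector and the depth estimator themselves are treated as black boxes, and all names are illustrative.

    import numpy as np

    def crop_objects(image, depth_map, boxes_2d):
        """Crop each detected two-dimensional frame area from the RGB image and from
        the depth image (steps 82 and 84), giving per-object patches and depth information."""
        object_crops, depth_crops = [], []
        for (x1, y1, x2, y2) in boxes_2d:
            x1, y1, x2, y2 = map(int, (x1, y1, x2, y2))
            object_crops.append(image[y1:y2, x1:x2])      # two-dimensional frame information
            depth_crops.append(depth_map[y1:y2, x1:x2])   # per-object depth information
        return object_crops, depth_crops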
The target detection method provided by the embodiment of the application obtains the two-dimensional frame area of each target object in the image to be processed by inputting the image to be processed into the pre-trained two-dimensional detector, and performs cropping processing on the two-dimensional frame area of each target object in the image to be processed to obtain the two-dimensional frame information of each target object, then inputs the image to be processed into the pre-trained depth estimator to obtain the depth image corresponding to the image to be processed, and finally performs cropping processing on the depth image according to the two-dimensional frame area of each target object in the image to be processed to obtain the depth information of each target object. According to the technical scheme, the depth information of each target object is determined, so that a foundation is provided for obtaining three-dimensional block diagram information more accurately in the follow-up process.
On the basis of the foregoing embodiments, fig. 9 is a general framework schematic diagram of a target detection method provided in the embodiments of the present application. As shown in fig. 9, the overall frame schematic includes: an image to be detected 1000, an object detection device 1100 and a detected image 1200.
Wherein, the image to be detected 1000 includes: at least one vehicle image 1001; the object detection apparatus 1100 includes: a depth estimator 901, a two-dimensional detector 902, an object dimension and orientation estimator 903, and a three-dimensional center locator 904; the detected image 1200 includes: a three-dimensional block diagram labeled with a vehicle image 1001.
In a possible implementation manner, the image 1000 to be detected is input to the two-dimensional detector 902 to obtain a two-dimensional block diagram 1202 corresponding to the vehicle image 1001, the two-dimensional block diagram 1202 is cut in combination with the image 1000 to be detected to obtain a cut vehicle image 1002, at this time, the image 1000 to be detected is input to the depth estimator 901 to obtain the depth information of the image 1000 to be detected, and the depth information of the image 1000 to be detected is cut based on the two-dimensional block diagram 1202 to obtain the depth information of the vehicle.
Further, the vehicle image is zoomed to obtain adaptive frame information, and conversion processing is performed in combination with the depth information of the vehicle to obtain adaptive voxel information, at this time, the adaptive frame information is input to the object dimension and orientation estimator 903 to obtain dimension and orientation information of the vehicle, the adaptive voxel information is input to the three-dimensional center locator 904 to obtain three-dimensional center information of the vehicle, and then the three-dimensional block diagram 1203 of the vehicle is determined according to the dimension and orientation information and the three-dimensional center information.
The specific process of the conversion treatment is as follows: carrying out reprojection operation on the depth information of the vehicle to obtain point cloud information of the vehicle, and generating self-adaptive voxel information by combining the sheared vehicle image 1002.
The general framework of the target detection method provided by the embodiment of the application obtains the depth information of the image to be detected and the two-dimensional block diagram of the vehicle by inputting the image to be detected into the depth estimator and the two-dimensional detector respectively, further performs shearing operation on the depth information of the image to be detected and the two-dimensional block diagram of the vehicle to obtain the depth information of the vehicle and the vehicle image, performs conversion operation on the depth information of the vehicle to obtain adaptive voxel information, performs scaling operation on the vehicle image to obtain adaptive frame information, finally inputs the adaptive voxel information and the adaptive frame information into the three-dimensional center locator and the object dimension and orientation estimator 903 respectively, and determines the three-dimensional block diagram information of the vehicle in the mapping according to the obtained result. According to the technical scheme, the depth information and the two-dimensional block diagram information of each target object are subjected to self-adaptive voxel processing, and the two-dimensional block diagram information of each target object is subjected to scaling processing, so that the target detection is more accurately performed on the image to be processed, and the problem of low detection accuracy in the prior art is solved.
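To tie the framework of fig. 9 together, the following sketch wires the helper functions sketched earlier in this description into a single pipeline; the two-dimensional detector, depth estimator, dimension and orientation estimator, and three-dimensional center locator are passed in as opaque callables because their internals are not specified here, and every name and signature is an illustrative assumption. The confidence of each result could then be scored with the conf_3d sketch above once the three-dimensional block diagram is projected back onto the image.

    import numpy as np

    def detect_3d(image, K, detector_2d, depth_estimator, dim_orient_estimator, center_locator):
        """End-to-end sketch of the framework in fig. 9 (illustrative only)."""
        boxes_2d, confs_2d = detector_2d(image)           # two-dimensional block diagrams
        depth_map = depth_estimator(image)                # depth image of the whole scene
        crops, depth_crops = crop_objects(image, depth_map, boxes_2d)

        results = []
        for crop, depth_crop, box_2d, conf_2d in zip(crops, depth_crops, boxes_2d, confs_2d):
            x1, y1 = int(box_2d[0]), int(box_2d[1])
            K_crop = K.copy()
            K_crop[0, 2] -= x1                            # shift principal point into crop coordinates
            K_crop[1, 2] -= y1
            points = depth_to_point_cloud(depth_crop, K_crop)   # re-projection to a point cloud
            edges, _ = adaptive_voxelize(points)                # adaptive voxel information
            prob_volume = center_locator(points, edges)         # three-dimensional probability map
            center = pick_3d_center(prob_volume, edges)

            frame = make_adaptive_frame(crop)                   # aspect-ratio-preserving scaling
            dims, orientation = dim_orient_estimator(frame)
            results.append({"center": center, "dims": dims, "orientation": orientation,
                            "box_2d": box_2d, "conf_2d": conf_2d})
        return results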
On the basis of the above-mentioned embodiment of the target detection method, fig. 10 is a schematic structural diagram of a target detection apparatus provided in the embodiment of the present application. As shown in fig. 10, the object detection device includes: the system comprises an acquisition module 101, a detection module 102 and a processing module 103;
an obtaining module 101, configured to obtain an image to be processed, where the image to be processed includes: at least one target object;
the detection module 102 is configured to perform target detection on an image to be processed to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed;
the processing module 103 is configured to obtain three-dimensional center information, dimension, and orientation information of each target object according to the depth information and the two-dimensional block diagram information of each target object in the image to be processed, and determine three-dimensional block diagram information of each target object according to the three-dimensional center information, the dimension, and the orientation information of each target object.
In a possible design provided in the embodiment of the present application, the processing module 103 is configured to obtain three-dimensional center information, dimension, and orientation information of each target object according to the depth information and the two-dimensional block diagram information of each target object in the image to be processed, and specifically:
the processing module 103 is specifically configured to:
carrying out self-adaptive voxel processing on the depth information and the two-dimensional block diagram information of each target object to determine the three-dimensional center information of each target object;
and scaling the two-dimensional block diagram information of each target object, and determining the dimension and orientation information of each target object.
In this possible design, the processing module 103 is configured to perform adaptive voxel processing on the depth information and the two-dimensional block diagram information of each target object, and determine three-dimensional center information of each target object, specifically:
the processing module 103 is specifically configured to:
carrying out reprojection processing on each target object according to the depth information of the target object to obtain the spatial point cloud information of the target object;
quantizing the spatial point cloud information of the target object into voxel information according to the two-dimensional block diagram information of the target object to obtain self-adaptive voxel information of the target object;
and inputting the self-adaptive voxel information of the target object into a pre-trained three-dimensional center locator to obtain the three-dimensional center information of the target object.
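The reprojection step can be realized with a standard pinhole back-projection, assuming known camera intrinsics (fx, fy, cx, cy); the sketch below is a generic illustration rather than the embodiment's exact procedure.

```python
import numpy as np

def depth_to_point_cloud(depth_patch, box_2d, fx, fy, cx, cy):
    """Back-project a cropped depth patch into a spatial point cloud."""
    x1, y1, _, _ = box_2d
    h, w = depth_patch.shape
    us, vs = np.meshgrid(np.arange(w) + x1, np.arange(h) + y1)  # pixel coords in the full image
    z = depth_patch
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                             # keep pixels with valid depth
```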
Optionally, the processing module 103 is configured to quantize the spatial point cloud information of the target object into voxel information according to the two-dimensional block diagram information of the target object, to obtain adaptive voxel information of the target object, and specifically includes:
the processing module 103 is specifically configured to:
determining a point cloud range corresponding to the target object according to the two-dimensional block diagram information of the target object;
determining the size of each voxel in the point cloud range according to the point cloud density in the point cloud range;
and dividing the point cloud range according to the size of each voxel in the point cloud range to obtain the self-adaptive voxel information of the target object.
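A minimal sketch of the adaptive voxelization described above, taking the reprojected point cloud as input: the point cloud range comes from the points themselves, the voxel size is derived from the point density, and the range is divided on a regular grid. The target count of points per voxel and the occupancy-grid output format are illustrative assumptions.

```python
import numpy as np

def adaptive_voxelize(points, target_points_per_voxel=8):
    """Quantize an object's point cloud into a density-adapted voxel grid."""
    lo, hi = points.min(axis=0), points.max(axis=0)        # point cloud range of the object
    extent = np.maximum(hi - lo, 1e-6)
    density = len(points) / np.prod(extent)                # points per unit volume
    voxel_size = (target_points_per_voxel / density) ** (1.0 / 3.0)
    grid_shape = np.ceil(extent / voxel_size).astype(int)
    indices = np.floor((points - lo) / voxel_size).astype(int)
    indices = np.minimum(indices, grid_shape - 1)
    occupancy = np.zeros(grid_shape, dtype=np.float32)
    np.add.at(occupancy, (indices[:, 0], indices[:, 1], indices[:, 2]), 1.0)  # count points per voxel
    return occupancy                                       # self-adaptive voxel information
```

Denser point clouds therefore receive smaller voxels, so near and far objects are quantized at comparable resolution.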
In another possible design provided in this embodiment of the present application, the processing module 103 is configured to perform scaling processing on the two-dimensional block diagram information of each target object, and determine the dimension and orientation information of each target object, specifically:
the processing module 103 is specifically configured to:
scaling the two-dimensional block diagram information of each target object according to a preset aspect ratio to obtain a scaling result of a preset size;
carrying out a zero-filling operation on the blank area in the scaling result to obtain the self-adaptive frame information of each target object;
and inputting the self-adaptive frame information of each target object into an object dimension and orientation estimator to obtain dimension and orientation information of each target object.
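The scaling and zero-filling step corresponds to a standard letterbox resize. The sketch below assumes a 128 x 128 output size and uses OpenCV's resize purely for illustration; the actual preset size is not specified here.

```python
import cv2
import numpy as np

def scale_and_pad(patch, out_h=128, out_w=128):
    """Resize an object patch while keeping its aspect ratio, then zero-fill the blank area."""
    h, w = patch.shape[:2]
    scale = min(out_h / h, out_w / w)                    # fit inside the preset size
    new_h, new_w = int(h * scale), int(w * scale)
    resized = cv2.resize(patch, (new_w, new_h))          # cv2.resize takes (width, height)
    canvas = np.zeros((out_h, out_w) + patch.shape[2:], dtype=patch.dtype)
    canvas[:new_h, :new_w] = resized                     # zero padding fills the blank area
    return canvas                                        # self-adaptive frame information
```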
Optionally, the processing module 103 is further configured to determine, for each target object, a confidence level of the three-dimensional block diagram information according to the two-dimensional block diagram information of the target object and the three-dimensional block diagram information of the target object.
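The embodiment does not spell out how this confidence is computed; one plausible realization, shown purely as an assumption, scores the overlap between the detected 2D block diagram and the image projection of the estimated 3D block diagram.

```python
import numpy as np

def box_confidence(box_2d, corners_3d, fx, fy, cx, cy):
    """Hypothetical confidence: IoU between the 2D box and the projected 3D box."""
    # corners_3d: (8, 3) corners of the 3D box in camera coordinates, assumed to have z > 0
    u = corners_3d[:, 0] * fx / corners_3d[:, 2] + cx
    v = corners_3d[:, 1] * fy / corners_3d[:, 2] + cy
    px1, py1, px2, py2 = u.min(), v.min(), u.max(), v.max()   # tight box around the projection
    x1, y1, x2, y2 = box_2d
    iw = max(0.0, min(x2, px2) - max(x1, px1))
    ih = max(0.0, min(y2, py2) - max(y1, py1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (px2 - px1) * (py2 - py1) - inter
    return inter / union if union > 0 else 0.0
```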
In yet another possible design provided in this embodiment of the present application, the detection module 102 is specifically configured to:
inputting the image to be processed into a pre-trained two-dimensional detector to obtain a two-dimensional frame area of each target object in the image to be processed;
cropping the two-dimensional frame area of each target object in the image to be processed to obtain two-dimensional frame information of each target object;
inputting the image to be processed into a depth estimator trained in advance to obtain a depth image corresponding to the image to be processed;
and according to the two-dimensional frame area of each target object in the image to be processed, cropping the depth image to obtain the depth information of each target object.
The target detection device provided in the embodiment of the present application may be used to execute the technical solution corresponding to the target detection method in the above embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
It should be noted that the division of the modules in the above target detection apparatus is only a division by logical function; in actual implementation, all or part of the modules may be integrated into one physical entity or may be physically separate. These modules may all be implemented in the form of software invoked by a processing element, may all be implemented in hardware, or some modules may be implemented as software invoked by the processing element while the others are implemented in hardware. In addition, all of the modules may be integrated together, or each may be implemented independently. The processing element described herein may be an integrated circuit having signal processing capability. In implementation, each step of the above method, or each of the above modules, may be implemented by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
Fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 11, the computer apparatus may include: at least one processor 110, a memory 111, and computer program instructions stored on the memory 111 and operable on the processor 110.
Optionally, the computer device may further include: a transceiver 112.
The processor 110 executes the computer-executable instructions stored in the memory 111, causing the processor 110 to perform the technical solutions of the embodiments described above. The processor 110 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The memory 111 and the transceiver 112 are coupled to the processor 110 via a system bus and communicate with each other, and the memory 111 is used for storing computer program instructions.
The transceiver 112 is used to communicate with other computer devices, and the transceiver 112 forms a communication interface.
Optionally, in terms of hardware implementation, the acquisition module 101 in the embodiment shown in fig. 10 corresponds to the transceiver 112 in this embodiment.
In one possible implementation, the computer device may further include: a display 113, the display 113 is used for displaying the target detection result.
The system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
It should be understood that the computer device may be a computer, an electronic control unit of a vehicle, a cell phone, etc.
The computer device provided in the embodiment of the present application may be configured to execute the technical solution corresponding to the target detection method in the above embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the application also provides a chip for running the instructions, and the chip is used for executing the technical scheme of the target detection method in the embodiment.
The embodiment of the present application further provides a computer-readable storage medium, where a computer instruction is stored in the computer-readable storage medium, and when the computer instruction runs on a computer device, the computer device is enabled to execute the technical solution of the target detection method in the foregoing embodiment.
The embodiment of the present application further provides a computer program product, which includes a computer program, and the computer program is used for executing the technical solution of the object detection method in the foregoing embodiment when being executed by a processor.
The computer-readable storage medium described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method of object detection, comprising:
acquiring an image to be processed, wherein the image to be processed comprises: at least one target object;
carrying out target detection on the image to be processed to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed;
obtaining three-dimensional center information, dimension and orientation information of each target object according to the depth information and the two-dimensional block diagram information of each target object in the image to be processed;
and determining the three-dimensional block diagram information of each target object according to the three-dimensional center information, the dimension and the orientation information of each target object.
2. The method according to claim 1, wherein obtaining three-dimensional center information, dimension information, and orientation information of each target object according to the depth information and two-dimensional block diagram information of each target object in the image to be processed comprises:
carrying out self-adaptive voxel processing on the depth information and the two-dimensional block diagram information of each target object to determine the three-dimensional center information of each target object;
and scaling the two-dimensional block diagram information of each target object, and determining the dimension and orientation information of each target object.
3. The method according to claim 2, wherein the performing adaptive voxel processing on the depth information and the two-dimensional block diagram information of each target object to determine three-dimensional center information of each target object comprises:
carrying out reprojection processing on each target object according to the depth information of the target object to obtain the spatial point cloud information of the target object;
quantizing the spatial point cloud information of the target object into voxel information according to the two-dimensional block diagram information of the target object to obtain self-adaptive voxel information of the target object;
and inputting the self-adaptive voxel information of the target object into a pre-trained three-dimensional center locator to obtain the three-dimensional center information of the target object.
4. The method according to claim 3, wherein the quantizing the spatial point cloud information of the target object into voxel information according to the two-dimensional block diagram information of the target object to obtain adaptive voxel information of the target object comprises:
determining a point cloud range corresponding to the target object according to the two-dimensional block diagram information of the target object;
determining the size of each voxel in the point cloud range according to the point cloud density in the point cloud range;
and dividing the point cloud range according to the size of each voxel in the point cloud range to obtain the self-adaptive voxel information of the target object.
5. The method of claim 2, wherein the scaling the two-dimensional block diagram information of each target object to determine the dimension and orientation information of each target object comprises:
scaling the two-dimensional block diagram information of each target object according to a preset aspect ratio to obtain a scaling result of a preset size;
carrying out a zero-filling operation on the blank area in the scaling result to obtain the self-adaptive frame information of each target object;
and inputting the self-adaptive frame information of each target object into an object dimension and orientation estimator to obtain dimension and orientation information of each target object.
6. The method according to any one of claims 1-5, wherein after determining the three-dimensional block diagram information of each target object according to the three-dimensional center information and the dimension and orientation information of each target object, the method further comprises:
and for each target object, determining the confidence of the three-dimensional block diagram information according to the two-dimensional block diagram information of the target object and the three-dimensional block diagram information of the target object.
7. The method according to any one of claims 1 to 5, wherein the performing target detection on the image to be processed to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed comprises:
inputting the image to be processed into a pre-trained two-dimensional detector to obtain a two-dimensional frame area of each target object in the image to be processed;
cropping the two-dimensional frame area of each target object in the image to be processed to obtain two-dimensional frame information of each target object;
inputting the image to be processed into a depth estimator trained in advance to obtain a depth image corresponding to the image to be processed;
and according to the two-dimensional frame area of each target object in the image to be processed, cropping the depth image to obtain the depth information of each target object.
8. An object detection device, comprising: the device comprises an acquisition module, a detection module and a processing module;
the acquiring module is used for acquiring an image to be processed, and the image to be processed comprises: at least one target object;
the detection module is used for carrying out target detection on the image to be processed to obtain depth information and two-dimensional block diagram information of each target object in the image to be processed;
the processing module is used for obtaining three-dimensional center information, dimensionality and orientation information of each target object according to the depth information and the two-dimensional block diagram information of each target object in the image to be processed, and determining the three-dimensional block diagram information of each target object according to the three-dimensional center information, the dimensionality and the orientation information of each target object.
9. A computer device, comprising: a processor, a memory and computer program instructions stored on the memory and executable on the processor, wherein the processor implements the object detection method as claimed in any one of claims 1 to 7 when executing the computer program instructions.
10. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, are configured to implement the object detection method of any one of claims 1 to 7.
CN202110396755.8A 2021-04-13 2021-04-13 Target detection method, target detection device, computer equipment and storage medium Pending CN113256709A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110396755.8A CN113256709A (en) 2021-04-13 2021-04-13 Target detection method, target detection device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113256709A true CN113256709A (en) 2021-08-13

Family

ID=77220711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110396755.8A Pending CN113256709A (en) 2021-04-13 2021-04-13 Target detection method, target detection device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113256709A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200200912A1 (en) * 2018-12-19 2020-06-25 Andrew Chen Detection and tracking of road-side pole-shaped static objects from lidar point cloud data
CN111857111A (en) * 2019-04-09 2020-10-30 商汤集团有限公司 Object three-dimensional detection and intelligent driving control method, device, medium and equipment
US10937178B1 (en) * 2019-05-09 2021-03-02 Zoox, Inc. Image-based depth data and bounding boxes
CN110991468A (en) * 2019-12-13 2020-04-10 深圳市商汤科技有限公司 Three-dimensional target detection and intelligent driving method, device and equipment
CN111724432A (en) * 2020-06-04 2020-09-29 杭州飞步科技有限公司 Object three-dimensional detection method and device
CN111738995A (en) * 2020-06-10 2020-10-02 苏宁云计算有限公司 RGBD image-based target detection method and device and computer equipment
CN112288709A (en) * 2020-10-28 2021-01-29 武汉大学 Three-dimensional target detection method based on point cloud

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822159A (en) * 2021-08-20 2021-12-21 杭州飞步科技有限公司 Three-dimensional target detection method and device and computer
CN113822159B (en) * 2021-08-20 2023-09-22 杭州飞步科技有限公司 Three-dimensional target detection method, device and computer
CN114310886A (en) * 2021-12-28 2022-04-12 深圳中智永浩机器人有限公司 Human leg recognition method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
EP3876141A1 (en) Object detection method, related device and computer storage medium
CN115082924B (en) Three-dimensional target detection method based on monocular vision and radar pseudo-image fusion
CN112507862B (en) Vehicle orientation detection method and system based on multitasking convolutional neural network
US11200432B2 (en) Method and apparatus for determining driving information
US11443151B2 (en) Driving assistant system, electronic device, and operation method thereof
CN111080784B (en) Ground three-dimensional reconstruction method and device based on ground image texture
CN113343745B (en) Remote target detection method and system based on binocular camera and intelligent terminal
CN112419385A (en) 3D depth information estimation method and device and computer equipment
CN112106111A (en) Calibration method, calibration equipment, movable platform and storage medium
CN111009011B (en) Method, device, system and storage medium for predicting vehicle direction angle
CN113256709A (en) Target detection method, target detection device, computer equipment and storage medium
WO2022206517A1 (en) Target detection method and apparatus
US20240029303A1 (en) Three-dimensional target detection method and apparatus
CN115410167A (en) Target detection and semantic segmentation method, device, equipment and storage medium
CN114919584A (en) Motor vehicle fixed point target distance measuring method and device and computer readable storage medium
CN114662587A (en) Three-dimensional target sensing method, device and system based on laser radar
US20240193788A1 (en) Method, device, computer system for detecting pedestrian based on 3d point clouds
CN112733678A (en) Ranging method, ranging device, computer equipment and storage medium
CN114648639B (en) Target vehicle detection method, system and device
CN116343165A (en) 3D target detection system, method, terminal equipment and storage medium
CN115661014A (en) Point cloud data processing method and device, electronic equipment and storage medium
CN115908551A (en) Vehicle distance measuring method and device, electronic equipment and storage medium
GB2583774A (en) Stereo image processing
CN115829898B (en) Data processing method, device, electronic equipment, medium and automatic driving vehicle
US20230386231A1 (en) Method for detecting three-dimensional objects in relation to autonomous driving and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210813