CN112446227A - Object detection method, device and equipment

Info

Publication number
CN112446227A
Authority
CN
China
Prior art keywords
data
dimensional
voxels
voxel
image
Legal status
Pending
Application number
CN201910746537.5A
Other languages
Chinese (zh)
Inventor
苗振伟
Current Assignee
Wuzhou Online E Commerce Beijing Co ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201910746537.5A
Priority to PCT/CN2020/107738
Publication of CN112446227A

Classifications

    • G06V 20/00 Scenes; Scene-specific elements
    • G06T 7/00 Image analysis
    • G06T 7/0012 Biomedical image inspection
    • G06V 20/10 Terrestrial scenes
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06T 2207/10081 Computed x-ray tomography [CT]
    • G06T 2207/30096 Tumor; Lesion

Abstract

The application discloses an object detection method, device and equipment, and a vehicle driving information determination method, device and equipment. The object detection method comprises the following steps: from environmental point cloud data, determining first feature data of a plurality of first voxels in a target space corresponding to a first spatial resolution, and second feature data of a plurality of second voxels within the first voxels corresponding to at least one second spatial resolution, and taking the first feature data and the second feature data together as third feature data of the first voxels; generating a two-dimensional feature image of the target space from the third feature data using a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution; and determining object information in the target space from the two-dimensional feature image, which includes both first inter-voxel and second inter-voxel neighborhood information. With this processing approach, high object detection accuracy and fast detection speed can be achieved at the same time.

Description

Object detection method, device and equipment
Technical Field
The application relates to the technical field of image processing, and in particular to an object detection method, device and equipment, a vehicle driving information determination method, device and equipment, a lesion detection method, device and equipment, a virtual shopping method, device and equipment, and a weather prediction method, device and equipment.
Background
In fields such as autonomous driving and robotics, machine perception is an important component. Perception sensors include lidar, cameras, ultrasonic sensors, millimeter-wave radar and the like. Compared with cameras, ultrasonic sensors and millimeter-wave radar, the laser point cloud of a multi-line lidar contains accurate target position information and geometric shape information of the target, so the laser point cloud plays an important role in perception for autonomous driving and robotics.
At present, a typical processing pipeline for target recognition from a laser point cloud includes the following steps: 1) unlike detection algorithms that directly project the laser point cloud onto one or more planes, 3D voxelization methods divide the laser point cloud into multiple voxels of equal volume; 2) for each voxel, voxel features are extracted from the point cloud data in that voxel, for example with an MLP followed by pooling; 3) the spatial 3D information is reduced to spatial 2D information through multiple layers of 3D convolution; 4) target detection is performed on the reduced 2D information. Compared with existing deep learning methods based on laser point cloud projection (projecting the point cloud directly onto a specific plane), this approach fully extracts the neighborhood information between voxels using 3D convolution, and thereby improves lidar perception performance.
In summary, the prior art cannot achieve both high object detection accuracy and fast detection speed at the same time. How to improve object detection accuracy without reducing detection speed is a problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The application provides an object detection method that aims to solve the problem that the prior art cannot achieve both high object detection accuracy and fast detection speed. The application further provides an object detection device and equipment, a vehicle driving information determination method, device and equipment, a lesion detection method, device and equipment, a virtual shopping method, device and equipment, and a weather prediction method, device and equipment.
The application provides an object detection method, comprising:
according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
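For illustration only, the sketch below shows one way the second step could be realized: the feature data of the non-empty first voxels are scattered into a dense grid, collapsed along the height axis into a bird's-eye-view image, and processed by a stack of 2D convolutions whose depth is derived from the first spatial resolution. The function names, the depth heuristic and the PyTorch framing are assumptions of this sketch, not the patent's implementation.

```python
# Illustrative sketch (not the patent's implementation): scatter coarse-voxel features
# into a dense grid, flatten height into channels, and apply a 2D CNN whose depth is
# chosen from the first spatial resolution. All names and values are assumed.
import torch
import torch.nn as nn

def bev_image_from_voxels(coords, feats, grid_xyz):
    """coords: (N, 3) integer (x, y, z) indices of non-empty first voxels.
    feats:  (N, C) third feature data of those voxels.
    Returns a (C * Z, Y, X) tensor: the 3D feature grid flattened along height."""
    X, Y, Z = grid_xyz
    C = feats.shape[1]
    dense = torch.zeros(Z, Y, X, C)
    dense[coords[:, 2], coords[:, 1], coords[:, 0]] = feats
    return dense.permute(0, 3, 1, 2).reshape(Z * C, Y, X)   # stack height into channels

def make_2d_extractor(in_channels, first_voxel_size_m, target_receptive_field_m=8.0):
    # Assumed heuristic: enough 3x3 layers that the receptive field covers a large target.
    depth = max(2, int(target_receptive_field_m / (2 * first_voxel_size_m)))
    layers, c = [], in_channels
    for _ in range(depth):
        layers += [nn.Conv2d(c, 64, kernel_size=3, padding=1), nn.ReLU()]
        c = 64
    return nn.Sequential(*layers)

# Example usage with toy data.
coords = torch.tensor([[3, 5, 1], [10, 7, 0]])           # two non-empty first voxels
feats = torch.randn(2, 16)                                # their third feature data
bev = bev_image_from_voxels(coords, feats, grid_xyz=(64, 64, 4))
model = make_2d_extractor(in_channels=bev.shape[0], first_voxel_size_m=0.4)
feature_image = model(bev.unsqueeze(0))                   # 2D feature image of the target space
```

The only point carried over from the claim is that the 2D extractor's depth is tied to the coarse voxel size; everything else is placeholder.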
Optionally, the first characteristic data and the second characteristic data are determined by the following steps:
dividing a target space into a plurality of first voxels according to the environmental point cloud data;
dividing the first voxel into a plurality of second voxels;
determining the first characteristic data according to the point cloud data of the first voxel; and determining second characteristic data according to the point cloud data of the second voxel.
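A minimal NumPy sketch of these optional steps follows, assuming a 0.4 m first voxel subdivided 2x2x2 into 0.2 m second voxels and a simple [mean x, mean y, mean z, point count] feature per voxel; the voxel sizes and the feature definition are illustrative assumptions rather than the patent's feature definition.

```python
# Hierarchical voxelization sketch: coarse (first) voxel features are concatenated with
# the features of the fine (second) voxels they contain to form the third feature data.
import numpy as np

def voxel_features(points, voxel_size, origin):
    """Group points by voxel index; return {index: [mean_x, mean_y, mean_z, count]}."""
    idx = np.floor((points - origin) / voxel_size).astype(int)
    feats = {}
    for key in {tuple(i) for i in idx}:
        pts = points[np.all(idx == key, axis=1)]
        feats[key] = np.concatenate([pts.mean(axis=0), [len(pts)]])
    return feats

def third_features(points, origin, first_size=0.4, subdiv=2):
    first = voxel_features(points, first_size, origin)
    second = voxel_features(points, first_size / subdiv, origin)
    third = {}
    for key, f1 in first.items():
        # Concatenate the coarse-voxel feature with the features of its fine voxels
        # (fixed order over the subdiv**3 sub-cells; empty sub-cells contribute zeros).
        subs = []
        for dx in range(subdiv):
            for dy in range(subdiv):
                for dz in range(subdiv):
                    sub_key = (key[0] * subdiv + dx, key[1] * subdiv + dy, key[2] * subdiv + dz)
                    subs.append(second.get(sub_key, np.zeros(4)))
        third[key] = np.concatenate([f1] + subs)
    return third

points = np.random.rand(1000, 3) * 4.0        # toy environmental point cloud
features = third_features(points, origin=np.zeros(3))
```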
Optionally, the method further includes:
determining a region of interest within the target space;
and clearing point cloud data outside the region of interest in the environmental point cloud data.
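A small sketch of this optional region-of-interest step, with assumed axis-aligned ROI bounds:

```python
# Keep only points inside an axis-aligned region of interest; bounds are assumed values.
import numpy as np

def crop_to_roi(points, roi_min, roi_max):
    mask = np.all((points >= roi_min) & (points <= roi_max), axis=1)
    return points[mask]

points = np.random.randn(5000, 3) * 30.0
roi_points = crop_to_roi(points,
                         roi_min=np.array([-40.0, -40.0, -3.0]),
                         roi_max=np.array([40.0, 40.0, 1.0]))
```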
Optionally, the generating a two-dimensional feature image of the target space by the two-dimensional feature extraction model includes:
splitting a three-dimensional convolution kernel of an image feature extraction layer based on manifold sparse convolution into a plurality of single convolution kernels;
for a first single convolution kernel k arranged before the central single convolution kernel in the image feature extraction layer, if, for an input non-empty voxel i, an output non-empty voxel j generated by the first single convolution kernel k exists, constructing a convolution index relation R(i, k, j) among the input non-empty voxel, the first single convolution kernel and the output non-empty voxel;
for a second single convolution kernel k arranged after the central single convolution kernel in the image feature extraction layer, if a corresponding relation R(i, k, j) exists, constructing a convolution index relation R(j, k, i) among the input non-empty voxel j, the second single convolution kernel k and the output non-empty voxel i;
for the central single convolution kernel k, constructing a convolution index relation R(i, k, i) among the input non-empty voxel i, the central single convolution kernel k and the output non-empty voxel i;
and determining the non-empty voxels output by the manifold sparse convolution-based image feature extraction layer according to the convolution index relation, the single convolution kernel and the input non-empty voxels.
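The following sketch shows, schematically, how such a convolution index relation (a "rule book") could be built for the manifold sparse convolution case, searching rules only for the single kernels before the central one and mirroring them for the symmetric kernels after it, as described above. The data structures, kernel size and function names are assumptions for illustration.

```python
# Rule-book construction sketch for manifold (submanifold) sparse convolution: the output
# voxel set equals the input voxel set, and rules for kernels after the center are
# obtained by mirroring the rules found for kernels before the center.
import itertools
import numpy as np

def build_rulebook(coords, kernel=3):
    """coords: list of (x, y, z) integer coordinates of input non-empty voxels.
    Returns rules (i, k, j): input voxel i contributes to output voxel j through
    single (per-offset) convolution kernel k."""
    index_of = {c: i for i, c in enumerate(coords)}
    offsets = list(itertools.product(range(kernel), repeat=3))   # kernel**3 single kernels
    center = len(offsets) // 2
    rules = []
    # Central single kernel: every non-empty voxel maps to itself, R(i, center, i).
    for i in range(len(coords)):
        rules.append((i, center, i))
    # Single kernels before the center: search neighbours and record R(i, k, j);
    # the symmetric kernel after the center reuses the found pair as R(j, k', i).
    half = kernel // 2
    for k, off in enumerate(offsets[:center]):
        d = np.array(off) - half                                  # spatial offset of kernel k
        for i, c in enumerate(coords):
            j = index_of.get(tuple(np.array(c) + d))
            if j is not None:
                rules.append((i, k, j))
                rules.append((j, len(offsets) - 1 - k, i))        # mirrored rule
    return rules

coords = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]        # toy non-empty voxels
rules = build_rulebook(coords)
```

With the rules in hand, the output feature of each non-empty voxel is accumulated by applying the weight of single kernel k to the input feature of voxel i for every rule (i, k, j), which is the final step of this optional procedure.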
Optionally, the generating a two-dimensional feature image of the target space by the two-dimensional feature extraction model includes:
splitting a three-dimensional convolution kernel of an image feature extraction layer based on non-manifold sparse convolution into a plurality of single convolution kernels;
determining a global index j of an output voxel according to the spatial size, padding and stride of the convolution kernel;
for an input non-empty voxel i, if an output non-empty voxel j generated by a single convolution kernel k exists, constructing a convolution index relation R(i, k, j) among the input non-empty voxel i, the single convolution kernel k and the output non-empty voxel j;
and determining the non-empty voxels output by the image feature extraction layer based on the non-manifold sparse convolution according to the convolution index relation, the single convolution kernel and the input non-empty voxels.
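For the non-manifold case, the global index j of an output voxel follows from the input voxel coordinate, the kernel offset, the padding and the stride, as in an ordinary strided convolution. The sketch below assumes a 3x3x3 kernel with stride 2 and padding 1; these values and the flattening used for the global index are illustrative choices, not fixed by the patent.

```python
# Rule-book sketch for non-manifold (strided) sparse convolution: output coordinates are
# derived from spatial size, padding and stride, then flattened to a global index j.
import itertools

def build_strided_rulebook(coords, in_shape, kernel=3, stride=2, padding=1):
    """coords: (x, y, z) integer coordinates of input non-empty voxels.
    Returns rules R(i, k, j) and the sorted global indices of non-empty output voxels."""
    offsets = list(itertools.product(range(kernel), repeat=3))
    out_shape = tuple((s + 2 * padding - kernel) // stride + 1 for s in in_shape)

    def global_index(out_xyz):                       # flatten an output coordinate to one index
        x, y, z = out_xyz
        return (x * out_shape[1] + y) * out_shape[2] + z

    rules, out_voxels = [], set()
    for i, c in enumerate(coords):                   # input non-empty voxel i
        for k, off in enumerate(offsets):            # single convolution kernel k
            num = tuple(ci + padding - oi for ci, oi in zip(c, off))
            if all(n % stride == 0 for n in num):
                out = tuple(n // stride for n in num)
                if all(0 <= o < s for o, s in zip(out, out_shape)):
                    j = global_index(out)            # global index of the output voxel
                    out_voxels.add(j)
                    rules.append((i, k, j))          # R(i, k, j)
    return rules, sorted(out_voxels)

rules, out_voxels = build_strided_rulebook([(0, 0, 0), (3, 2, 1)], in_shape=(8, 8, 4))
```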
Optionally, the object comprises: vehicle, person, obstacle.
The application provides an object detection device, includes:
a feature data determination unit, configured to determine, according to environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
and an object information determination unit configured to determine object information in the target space from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
Optionally, the method further includes:
a region-of-interest determining unit for determining a region of interest within the target space;
and the data clearing unit is used for clearing point cloud data outside the region of interest in the environment point cloud data.
The present application provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing the object detection method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
Optionally, the apparatus comprises a vehicle; the vehicle includes a three-dimensional space scanning device.
The application provides a vehicle driving information determination method, which comprises the following steps:
according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
determining object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information; the object information includes current position information of the first vehicle;
and determining the running speed and/or the running track of the first vehicle according to the current position information and the historical position information of the first vehicle.
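A minimal sketch of this last step under an assumed data layout, where each detection of the first vehicle is a (timestamp, x, y) tuple in a common map frame; the finite-difference speed estimate is an illustrative choice.

```python
# Derive driving speed and track from current + historical positions (assumed layout).
import numpy as np

def driving_speed_and_track(history, current):
    """history: list of (t, x, y) past positions; current: (t, x, y) current position."""
    track = np.array(history + [current], dtype=float)     # driving track, ordered by time
    (t0, *p0), (t1, *p1) = track[-2], track[-1]
    speed = np.linalg.norm(np.array(p1) - np.array(p0)) / (t1 - t0)   # metres per second
    return speed, track

speed, track = driving_speed_and_track(
    history=[(0.0, 10.0, 2.0), (0.1, 10.8, 2.1)],
    current=(0.2, 11.6, 2.2))
```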
Optionally, the method further includes:
and adjusting the running mode of the second vehicle according to the running speed and/or the running track.
The application provides an object detection method, comprising:
according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
determining first object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information;
and determining second object information in the target space according to the environment perception data outside the environment point cloud data and the first object information.
Optionally, the perception data includes at least one of the following data: two-dimensional images, millimeter wave radar perception data.
The application provides a vehicle travel information determination device, including:
a feature data determination unit, configured to determine, according to environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
a position information determination unit configured to determine object information in a target space from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information; the object information includes current position information of the first vehicle;
and the running information determining unit is used for determining the running speed and/or the running track of the first vehicle according to the current position information and the historical position information of the first vehicle.
The application provides an object detection device, includes:
a feature data determination unit, configured to determine, according to environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
a first object information determination unit configured to determine first object information in a target space from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information;
and the second object information determining unit is used for determining second object information in the target space according to the environment perception data outside the environment point cloud data and the first object information.
The present application provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing a vehicle travel information determination method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; determining object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information; the object information includes current position information of the first vehicle; and determining the running speed and/or the running track of the first vehicle according to the current position information and the historical position information of the first vehicle.
The present application provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing the object detection method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; determining first object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information; and determining second object information in the target space according to the environment perception data outside the environment point cloud data and the first object information.
The present application also provides an object detection method, including:
acquiring a two-dimensional environment image of a target space;
constructing a three-dimensional environment image of a target space according to the two-dimensional environment image;
according to the three-dimensional environment image, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
The present application also provides a lesion detection method, comprising:
acquiring at least one CT image of the target part;
constructing a three-dimensional image of the target part according to at least one CT image;
determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target region and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively according to the three-dimensional image, and taking the first feature data and the second feature data as third feature data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target part through the two-dimensional feature extraction model;
and determining lesion information of the target part according to the two-dimensional feature image comprising the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
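The first two steps amount to stacking the CT slices into a volume that is then voxelized in the same way as the point cloud case. A small sketch under assumed slice dimensions and an assumed intensity threshold for "non-empty" voxels:

```python
# Build a 3D volume from CT slices and find candidate non-empty voxels (assumed threshold).
import numpy as np

def build_ct_volume(slices, hu_threshold=-300.0):
    """slices: list of equally spaced 2D CT slices (Hounsfield units).
    Returns the 3D volume and the (z, y, x) indices of voxels above the threshold."""
    volume = np.stack(slices, axis=0)                 # shape (num_slices, H, W)
    nonempty = np.argwhere(volume > hu_threshold)
    return volume, nonempty

slices = [np.full((64, 64), -1000.0) for _ in range(8)]    # toy air-only slices
slices[4][20:30, 20:30] = 40.0                              # a small soft-tissue region
volume, nonempty = build_ct_volume(slices)
```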
The present application further provides an object detection device, including:
a two-dimensional image acquisition unit for acquiring a two-dimensional environment image of a target space;
the three-dimensional image construction unit is used for constructing a three-dimensional environment image of a target space according to the two-dimensional environment image;
a feature data determination unit configured to determine, from a three-dimensional environment image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
and an object information determination unit configured to determine object information in the target space from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
The present application further provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing the object detection method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: acquiring a two-dimensional environment image of a target space; constructing a three-dimensional environment image of a target space according to the two-dimensional environment image; according to the three-dimensional environment image, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
The present application further provides a lesion detection apparatus, comprising:
an image acquisition unit for acquiring at least one Computed Tomography (CT) image of a target region;
the three-dimensional image construction unit is used for constructing a three-dimensional image of the target part according to at least one CT image;
a feature data determination unit configured to determine, from the three-dimensional image, first feature data of a plurality of first voxels corresponding to a first spatial resolution within a target region and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution among the first voxels, respectively, and to use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target portion through the two-dimensional feature extraction model;
and a lesion information determination unit for determining lesion information of the target region based on the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
The present application further provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing a lesion detection method, the apparatus performing the following steps after being powered on and running the program of the method through the processor: acquiring at least one CT image of the target part; constructing a three-dimensional image of the target part according to at least one CT image; determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target region and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively according to the three-dimensional image, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target part through the two-dimensional feature extraction model; and determining the focus information of the target part according to the two-dimensional characteristic image comprising the first inter-voxel adjacent information and the second inter-voxel adjacent information.
The application also provides a virtual shopping method, which comprises the following steps:
according to the point cloud data of the shopping environment, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the shopping space through the two-dimensional feature extraction model;
determining object information in a shopping space according to a two-dimensional characteristic image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information;
and placing the virtual object into the three-dimensional image of the shopping space according to the object information.
The present application further provides a weather prediction method, including:
acquiring a three-dimensional space image of a target cloud layer;
according to the three-dimensional space image, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target cloud layer through the two-dimensional feature extraction model;
weather information is determined from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information.
The present application further provides a virtual shopping device, comprising:
the characteristic data determining unit is used for determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively according to the shopping environment point cloud data, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels;
the two-dimensional feature extraction unit is used for taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a shopping space through the two-dimensional feature extraction model;
an object information determination unit configured to determine object information in the shopping space from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information;
and the virtual object adding unit is used for placing the virtual object into the three-dimensional image of the shopping space according to the object information.
The present application further provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing a virtual shopping method, the apparatus performing the following steps after being powered on and running the program of the method through the processor: according to the point cloud data of the shopping environment, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the shopping space through the two-dimensional feature extraction model; determining object information in a shopping space according to a two-dimensional characteristic image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information; and placing the virtual object into the three-dimensional image of the shopping space according to the object information.
The present application further provides a weather prediction apparatus, including:
the image acquisition unit is used for acquiring a three-dimensional space image of a target cloud layer;
a feature data determination unit, configured to determine, according to the three-dimensional spatial image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
the two-dimensional feature extraction unit is used for taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target cloud layer through the two-dimensional feature extraction model;
and the weather information determining unit is used for determining weather information according to the two-dimensional characteristic image comprising the first inter-voxel adjacent information and the second inter-voxel adjacent information.
The present application further provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing the weather prediction method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: acquiring a three-dimensional space image of a target cloud layer; according to the three-dimensional space image, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target cloud layer through the two-dimensional feature extraction model; weather information is determined from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information.
The present application also provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the various methods described above.
The present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the various methods described above.
Compared with the prior art, the method has the following advantages:
according to the object detection method provided by the embodiment of the application, first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels are determined according to environment point cloud data, and the first characteristic data and the second characteristic data are used as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; determining object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information; the processing mode enables three-dimensional image dimension reduction processing to be carried out based on coarse-grained voxels to generate a two-dimensional characteristic image, and the coarse-grained voxels are divided more finely, so that the characteristic data of the coarse-grained voxels not only comprise the voxel characteristic data of the coarse-grained voxels, but also comprise the voxel characteristic data of fine-grained voxels, more sufficient point cloud original information is reserved in the characteristic data of the coarse-grained voxels, and therefore, partial object information can be prevented from being lost; therefore, higher object detection accuracy and higher detection speed can be effectively considered. In addition, the multi-voxelization division mode occupies less system resources.
According to the vehicle driving information determination method provided by the embodiments of the application, first feature data of a plurality of first voxels in a target space corresponding to a first spatial resolution, and second feature data of a plurality of second voxels within the first voxels corresponding to at least one second spatial resolution, are determined from environmental point cloud data and taken together as third feature data of the first voxels; the third feature data are used as input to a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, which generates a two-dimensional feature image of the target space; object information in the target space, including the current position information of the first vehicle, is determined from the two-dimensional feature image containing first inter-voxel and second inter-voxel neighborhood information; and the driving speed and/or driving track of the first vehicle is determined from its current and historical position information. Because the dimensionality reduction is performed on coarse-grained voxels whose feature data also include the features of the fine-grained voxels they contain, more of the original point cloud information is retained, loss of part of the vehicle information is avoided, and the driving speed and/or driving track can be determined from more accurate vehicle information. High accuracy of the vehicle driving information and fast determination of that information can therefore both be achieved.
According to the object detection method provided by the embodiments of the application, first feature data of a plurality of first voxels in a target space corresponding to a first spatial resolution, and second feature data of a plurality of second voxels within the first voxels corresponding to at least one second spatial resolution, are determined from environmental point cloud data and taken together as third feature data of the first voxels; the third feature data are used as input to a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, which generates a two-dimensional feature image of the target space; first object information in the target space is determined from the two-dimensional feature image containing first inter-voxel and second inter-voxel neighborhood information; and second object information in the target space is determined from the first object information together with environment perception data other than the point cloud. Because the dimensionality reduction is performed on coarse-grained voxels whose feature data also include the features of the fine-grained voxels they contain, more of the original point cloud information is retained and loss of part of the object information is avoided; on this basis, more accurate second object information is determined from the other perception data (such as two-dimensional images or millimeter-wave radar data). High object detection accuracy and fast detection speed can therefore both be achieved.
According to the object detection method provided by the embodiments of the application, a two-dimensional environment image of the target space is acquired; a three-dimensional environment image of the target space is constructed from the two-dimensional environment image; first feature data of a plurality of first voxels corresponding to a first spatial resolution, and second feature data of a plurality of second voxels within the first voxels corresponding to at least one second spatial resolution, are determined from the three-dimensional environment image and taken together as third feature data of the first voxels; the third feature data are used as input to a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, which generates a two-dimensional feature image of the target space; and object information in the target space is determined from the two-dimensional feature image containing first inter-voxel and second inter-voxel neighborhood information. Because the dimensionality reduction is performed on coarse-grained voxels whose feature data also include the features of the fine-grained voxels they contain, more of the spatial information is retained and loss of part of the object information is avoided, so that high object detection accuracy and fast detection speed can both be achieved. In addition, with this processing approach the point cloud data need not be acquired by a lidar; the object information can be determined directly from a two-dimensional image captured by a camera, which effectively reduces equipment cost. Moreover, compared with prior-art schemes in which object information is determined directly from the captured two-dimensional image by an object detection model (such as the deep-learning-based RefineDet method), the processing approach provided here gives the object detection model input data (the two-dimensional feature image) that include spatial information, so the object information determined from the two-dimensional feature image is more accurate.
According to the lesion detection method provided by the embodiments of the application, at least one computed tomography (CT) image of the target part is acquired; a three-dimensional image of the target part is constructed from the at least one CT image; first feature data of a plurality of first voxels within the target part corresponding to a first spatial resolution, and second feature data of a plurality of second voxels within the first voxels corresponding to at least one second spatial resolution, are determined from the three-dimensional image and taken together as third feature data of the first voxels; the third feature data are used as input to a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, which generates a two-dimensional feature image of the target part; and lesion information of the target part is determined from the two-dimensional feature image containing first inter-voxel and second inter-voxel neighborhood information. Because the dimensionality reduction of the three-dimensional image is performed on coarse-grained voxels whose feature data also include the features of the fine-grained voxels they contain, more of the spatial information is retained and loss of part of the lesion information is avoided, so that high lesion detection accuracy and fast detection speed can both be achieved.
According to the virtual shopping method provided by the embodiments of the application, first feature data of a plurality of first voxels in a shopping space corresponding to a first spatial resolution, and second feature data of a plurality of second voxels within the first voxels corresponding to at least one second spatial resolution, are determined from shopping environment point cloud data and taken together as third feature data of the first voxels; the third feature data are used as input to a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, which generates a two-dimensional feature image of the shopping space; object information in the shopping space is determined from the two-dimensional feature image containing first inter-voxel and second inter-voxel neighborhood information; and a virtual object is placed into the three-dimensional image of the shopping space according to the object information. Because the dimensionality reduction of the shopping space is performed on coarse-grained voxels whose feature data also include the features of the fine-grained voxels they contain, more of the spatial information is retained and loss of part of the commodity information is avoided, so that high object detection accuracy and fast detection speed can both be achieved, which helps improve the commodity sales rate and the user experience.
According to the weather prediction method provided by the embodiments of the application, a three-dimensional spatial image of a target cloud layer is acquired; first feature data of a plurality of first voxels in the target cloud layer corresponding to a first spatial resolution, and second feature data of a plurality of second voxels within the first voxels corresponding to at least one second spatial resolution, are determined from the three-dimensional spatial image and taken together as third feature data of the first voxels; the third feature data are used as input to a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, which generates a two-dimensional feature image of the target cloud layer; and weather information is determined from the two-dimensional feature image containing first inter-voxel and second inter-voxel neighborhood information. Because the dimensionality reduction of the cloud layer image is performed on coarse-grained voxels whose feature data also include the features of the fine-grained voxels they contain, more of the spatial information is retained and loss of part of the weather information is avoided, so that high weather prediction accuracy and fast prediction speed can both be achieved.
Drawings
FIG. 1 is a flow chart of an embodiment of an object detection method provided herein;
FIG. 2 is a schematic view of voxel division of an embodiment of an object detection method provided herein;
FIG. 3 is a schematic diagram of a network structure of an embodiment of an object detection method provided in the present application;
FIG. 4 is a detailed flow chart of an embodiment of an object detection method provided herein;
FIG. 5 is a schematic view of an embodiment of an object detection device provided herein;
FIG. 6 is a schematic diagram of an embodiment of an apparatus provided herein;
FIG. 7 is a flow chart of an embodiment of a vehicle travel information determination method provided herein;
FIG. 8 is a flow chart of an embodiment of an object detection method provided herein;
FIG. 9 is a flow chart of an embodiment of an object detection method provided herein;
FIG. 10 is a flow chart of an embodiment of a lesion detection method provided herein;
FIG. 11 is a flow chart of an embodiment of a virtual shopping method provided herein;
FIG. 12 is a flowchart of an embodiment of a weather prediction method provided herein.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and similar adaptations can be made by those skilled in the art without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The application provides an object detection method, an object detection device and equipment, a vehicle driving information determination method, a vehicle driving information determination device and equipment, a focus detection method, a focus detection device and equipment, a virtual shopping method, a virtual shopping device and equipment, and a weather prediction method, a weather prediction device and equipment. Each of the schemes is described in detail in the following examples.
First embodiment
Please refer to FIG. 1, which is a flowchart of an embodiment of an object detection method according to the present disclosure. The method may be performed by subjects including, but not limited to, unmanned vehicles such as smart logistics vehicles. The identifiable objects include traffic objects on the road such as pedestrians, vehicles and obstacles; other objects can also be identified. The method is described below using the detection of traffic objects as an example. The object detection method provided by the application comprises the following steps:
step S101: according to the environmental point cloud data, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels.
According to the method provided by the embodiment of the application, during the driving of a vehicle (hereinafter referred to as the ego vehicle), the spatial coordinates of each sampling point on the surfaces of objects in the environment of the driving road can be obtained through a three-dimensional space scanning device mounted on the vehicle, yielding a set of points; this massive point data is referred to as road environment point cloud (Point Cloud) data. The road environment point cloud data records the surfaces of the scanned objects in the form of points; each point contains three-dimensional coordinates, and some points may also contain color information (RGB) or reflection intensity information (Intensity). By means of the point cloud data, the target space can be expressed under a single spatial reference system.
The three-dimensional space scanning device may be a laser radar (Lidar), which performs laser detection and ranging by laser scanning to obtain information about obstacles in the surrounding environment, such as buildings, trees, people and vehicles; the measured data are represented as discrete points of a Digital Surface Model (DSM). In a specific implementation, a multi-line laser radar with, for example, 16, 32 or 64 beams can be adopted; the frame rate of the collected point cloud data differs with the number of laser beams, and 16-line and 32-line radars, for example, generally collect 10 frames of point cloud data per second. The three-dimensional space scanning device may also be a three-dimensional laser scanner, a photographic scanner, or the like.
The road environment point cloud data of this embodiment comprises the spatial coordinate data of sampling points on object surfaces within the sensing area of the laser radar. From the road environment point cloud data, point cloud data of a plurality of first voxels (Voxels) corresponding to a first spatial resolution within the sensing area (the target space) can be determined. A voxel, short for volume element (Volume Pixel), represents a unit display base point in three-dimensional space, analogous to a pixel (Pixel) in a two-dimensional plane. The volume of a first voxel depends on the first spatial resolution and may be a relatively coarse-grained first volume.
On one hand, object detection accuracy is directly related to the voxel volume: the finer the voxelization, the smaller the voxel volume and the more fully the voxels retain the original information of the point cloud, so the object detection accuracy is higher. On the other hand, the running speed of a 3D convolutional network is directly related to the size of the voxelized input data: the finer the voxelization, the larger the input data volume, which increases the video memory load of the graphics processor (GPU) and reduces the running speed of the 3D convolutional network. In yet another aspect, the required network depth of the 3D convolutional network is also directly related to the voxel volume: the smaller the voxels, the smaller the field of view covered by each voxel, so the network needs to be deeper to achieve the receptive field required to detect large targets, which reduces network efficiency.
The method provided by this embodiment differs from the prior art in the following respect: in the prior art only one level of voxel division is performed on the sensing region, whereas the method provided by this embodiment can perform multi-level voxel division on the sensing region. For example, the method may first divide the sensing region into a plurality of first voxels at the first spatial resolution, where the first voxels generally have the same volume, referred to as the first volume; each first voxel is then further divided into sub-voxels. This embodiment refers to a sub-voxel of a first voxel as a second voxel and to the spatial resolution of the second voxels as the second spatial resolution, which is typically higher than the first spatial resolution. Second voxels corresponding to different second spatial resolutions typically have different volumes, each referred to as a second volume.
In one example, a first voxel may be divided into a plurality of second voxels of different second volumes, i.e. the first voxel is divided at a plurality of second spatial resolutions. For example, the laser point cloud is first divided into first voxels of relatively coarse granularity, such as 0.1 × 0.1 × 0.2 meters per first voxel; then each coarse-grained first voxel may be further divided at finer granularity into 2 × 2 × 2 = 8 voxels, and/or 3 × 3 × 3 = 27 voxels, and so on up to higher-order divisions into n × n × n = n³ voxels, as shown in fig. 2. For each coarse-grained first voxel, its descriptive feature (i.e. the third feature data) is F = [f_{1,1}, f_{8,1}, f_{8,2}, …, f_{8,8}, f_{27,1}, f_{27,2}, …, f_{27,27}, …], where f_{i,j} is the feature of the j-th sub-voxel when the voxel is divided into i parts. The voxel features may be simple features, such as the mean of the points within the voxel, or complex features such as those produced by MLP + pooling.
In another example, a first voxel may be divided into a plurality of second voxels of the same second volume, i.e. the first voxel is divided at only one second spatial resolution. For example, the spatial resolution of a relatively coarse-grained first voxel is 0.1 × 0.1 × 0.2 meters, and the spatial resolution of the smaller second voxels into which it is divided is 0.025 × 0.025 × 0.05 meters, the second voxel being 1/4 of the first voxel in each single dimension.
In a specific implementation, step S101 may include the following sub-steps: 1) dividing a target space (a perception area) into a plurality of first voxels according to the environmental point cloud data; 2) dividing the first voxel into a plurality of second voxels; 3) determining the first characteristic data according to the point cloud data of the first voxel; and determining second characteristic data according to the point cloud data of the second voxel.
In specific implementation, the first characteristic data of each first voxel and the second characteristic data of each second voxel can be determined through a voxel characteristic learning network, and the first characteristic data and the second characteristic data are used as the third characteristic data of the first voxel. Since the voxel feature learning network belongs to the mature prior art, it is not described here in detail.
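As an illustrative aid only (not part of the claimed embodiments), the following Python sketch shows one possible way of assembling the third feature data described above. It assumes mean-of-points voxel features and a single 2 × 2 × 2 sub-division; the function names, the voxel sizes in the example and the zero-filling of empty sub-voxels are all assumptions made for the sketch.

```python
import numpy as np

def voxel_index(points, origin, voxel_size):
    """Map each 3D point to the integer index of the voxel that contains it."""
    return np.floor((points - origin) / voxel_size).astype(np.int64)

def multi_level_voxel_features(points, origin, first_size, sub_divisions=(2,)):
    """Sketch of the third feature data: per first voxel, the mean of its points
    (first feature data) concatenated with the means of its sub-voxels
    (second feature data). Empty sub-voxels contribute zero vectors."""
    first_idx = voxel_index(points, origin, first_size)
    features = {}
    for key in {tuple(i) for i in first_idx}:          # each non-empty first voxel
        mask = np.all(first_idx == key, axis=1)
        pts = points[mask]
        feat = [pts.mean(axis=0)]                       # first feature data
        for n in sub_divisions:                         # e.g. n=2 -> 2x2x2 = 8 sub-voxels
            sub_size = first_size / n
            sub_origin = origin + np.array(key) * first_size
            sub_idx = voxel_index(pts, sub_origin, sub_size)
            for sub in np.ndindex(n, n, n):             # fixed order keeps the feature layout stable
                sub_mask = np.all(sub_idx == sub, axis=1)
                feat.append(pts[sub_mask].mean(axis=0) if sub_mask.any()
                            else np.zeros(3))           # second feature data
        features[key] = np.concatenate(feat)            # third feature data
    return features

# Example: coarse voxels of 0.1 x 0.1 x 0.2 m, each split into 2 x 2 x 2 sub-voxels.
pts = np.random.rand(1000, 3) * np.array([2.0, 2.0, 1.0])
feats = multi_level_voxel_features(pts, origin=np.zeros(3),
                                   first_size=np.array([0.1, 0.1, 0.2]))
```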
In one example, step S101 can be implemented as follows: if the multi-level voxel division condition is met, determining the first characteristic data and the second characteristic data according to the environment point cloud data, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxel; and if the multi-level voxel division condition is not met, determining the first characteristic data according to the environment point cloud data, taking the first characteristic data as input data of the two-dimensional characteristic extraction model, and determining object information in a target space according to a two-dimensional characteristic image comprising adjacent information between first voxels.
The multi-level voxel-dividing condition includes, but is not limited to, at least one of the following conditions: the object identification accuracy is greater than an accuracy threshold; the object information includes a vehicle, a person, or an obstacle in the driving road; the resource occupancy of a processor executing the method is greater than an occupancy threshold.
Wherein the accuracy threshold can be determined according to the service requirement. For example, the accuracy threshold is 95%, and if the object identification accuracy is required to be 98%, the hierarchical voxel division condition is satisfied, and multi-level voxel division is required.
For another example, in a service scenario in which an unmanned vehicle automatically identifies obstacles in the driving road, automatic driving demands higher safety and frequent braking caused by erroneous identification must be avoided, so the object identification accuracy requirement is higher and the scenario meets the multi-level voxel division condition.
By adopting the processing mode, the switching of the multi-level voxel division working mode is carried out in real time according to the actual situation; therefore, computing resources and storage resources can be effectively saved, and energy consumption and hardware cost are reduced.
For another example, if the processor resources (CPU or GPU occupancy) of the current vehicle system are heavily used, multi-level voxel division should not be performed, in order to avoid a crash fault caused by an over-busy CPU or GPU; normal operation of the processor is thus obtained at the cost of sacrificing some object identification accuracy, and when the processor is no longer busy the system switches back to the multi-level voxel division working mode in real time. In this embodiment, the method is executed on the GPU and occupies few system resources.
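Purely as an illustration of this run-time switching, the following Python sketch shows one plausible reading of the conditions above; the thresholds and the exact way the conditions are combined are assumptions, since the embodiment only requires that the multi-level voxel division working mode can be switched in real time.

```python
def use_multi_level_voxels(required_accuracy, accuracy_threshold=0.95,
                           gpu_occupancy=0.0, occupancy_threshold=0.9):
    """Illustrative run-time switch between single-level and multi-level voxel
    division. A busy processor forces the single-level mode; otherwise a high
    accuracy requirement enables the multi-level mode."""
    if gpu_occupancy > occupancy_threshold:
        return False          # processor too busy: fall back to single-level voxels
    return required_accuracy > accuracy_threshold

# Example: high accuracy requirement and an idle processor -> multi-level division.
mode = use_multi_level_voxels(required_accuracy=0.98, gpu_occupancy=0.4)
```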
In another example, the second characteristic data may be determined using the following steps: 1) dividing the target space into a first region which needs multi-voxel division and a second region which does not need multi-voxel division; 2) second feature data of a plurality of second voxels in the first voxel corresponding to at least one second spatial resolution in the first region are determined.
For example, the first region is defined as: the area between the road surface and a spatial location where the spatial height is less than the height threshold. The height threshold may be 4 meters, so that various objects in the driving road, such as a double-deck bus and the like, can be basically identified.
By adopting the processing mode, under the condition of ensuring that the object information can be identified, multi-level voxel division is only carried out on partial areas; therefore, computing resources and storage resources can be effectively saved, and energy consumption and hardware cost are reduced.
In yet another example, the third characteristic data is determined using the steps of: 1) determining an area to be subjected to second voxel division in the first voxel, for example, an automatic driving region of interest determined according to a high-precision map and a real-time position of an automatic driving vehicle, including a road, a sidewalk, a travelable area and the like; 2) and taking the first characteristic data of the first voxel and the second characteristic data in the region to be subjected to the second voxel division as the third characteristic data.
In practical applications, some of the first voxels include some regions that do not need to be divided by the second voxel, for example, regions that do not affect autonomous driving, such as roadside barriers and regions outside the barriers, which are determined according to a high-precision map and a real-time position of an autonomous vehicle; in this case, the division of the second voxel into the partial region is not necessary, and the second feature data of the partial region is also not necessary.
By adopting the processing mode, after one first voxel is subjected to multi-level division, only part of second characteristic data is taken as third characteristic data, so that the sparsity of input data input into the two-dimensional characteristic extraction model can be ensured as much as possible; therefore, the object recognition speed can be effectively improved.
Step S103: and taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model.
The input data of the two-dimensional feature extraction model comprises three-dimensional feature data, and the output data is a two-dimensional feature image. The two-dimensional characteristic image comprises spatial information of a target space, and the information is arranged into a two-dimensional matrix in a two-dimensional mode.
Compared with the prior art, the method provided by this embodiment has the following characteristics: 1) to achieve the same object detection accuracy, the network depth of the two-dimensional feature extraction model in the prior art must be greater than that of the two-dimensional feature extraction model in the method provided by this embodiment; 2) the original point cloud information retained in the two-dimensional feature image generated by the prior-art two-dimensional feature extraction model is far less than that retained in the two-dimensional feature image generated by the method provided by this embodiment.
1) If the same object detection accuracy is to be achieved, the network depth of the two-dimensional feature extraction model in the prior art is higher than that of the two-dimensional feature extraction model in the method provided by the embodiment.
In the prior art, achieving higher object detection accuracy requires voxels of smaller granularity, higher spatial resolution and larger number for the 3D convolution, which makes the 3D convolutional network deeper. For example, if the spatial resolution of the coarser voxels is 0.1 × 0.1 × 0.2 meters and that of the smaller voxels is 0.025 × 0.025 × 0.05 meters, the smaller voxels are 1/4 of the larger voxels in each single dimension, and to achieve the same receptive field, the receptive field at that resolution needs to be expanded four times. How much network depth this adds depends on the network structure; for example, if the receptive field is enlarged using pooling layers, two convolution blocks with pooling are required, where a convolution block comprises several convolution layers and a pooling layer.
As shown in fig. 3, the method provided in this embodiment performs the 3D convolution on first voxels whose third feature data have relatively coarse granularity and relatively low spatial resolution but retain sufficient original point cloud information, instead of on the relatively fine-grained second voxels. The total number of voxels input to the two-dimensional feature extraction model can thus be controlled, and on the premise of ensuring higher object detection accuracy, the network depth of the two-dimensional feature extraction model can be effectively reduced, thereby increasing the object detection speed; that is, both higher object detection accuracy and a faster detection speed can be achieved.
2) The original point cloud information retained in the two-dimensional feature image generated by the two-dimensional feature extraction model in the method provided by this embodiment is far richer than that retained in the two-dimensional feature image of the prior art.
The method provided by the embodiment identifies the object based on the two-dimensional characteristic image with more sufficient point cloud original information, so that the detected object information is more accurate. In the prior art, a two-dimensional feature image only includes neighborhood information between voxels of one volume; in this embodiment, the two-dimensional feature image includes not only the neighborhood information between the first voxels, but also the neighborhood information between the second voxels of at least one volume. Therefore, the two-dimensional feature image of the embodiment has sufficient point cloud original information, and object information identified based on the two-dimensional feature image with the sufficient point cloud original information is more accurate.
Please refer to fig. 4, which is a detailed flowchart of an embodiment of an object detection method according to the present application. In this embodiment, the method may further include the steps of:
step S401: a region of interest within the target space is determined.
The region of interest (ROI) includes, but is not limited to, a vehicle driving road, a sidewalk, and other regions that affect vehicle driving; accordingly, the regions of non-interest may include areas occupied by buildings, trees, etc. on both sides of the roadway.
Step S403: and clearing point cloud data outside the region of interest in the environmental point cloud data.
The region of interest may be determined as follows: firstly, high-precision positioning is carried out on a point cloud data acquisition device to determine the position information of the device; then, determining the ROI in the target space according to the position information through a high-precision map ROI determining module; and finally, filtering out point cloud data outside the ROI through an ROI filtering module.
In the method provided by this embodiment, the point cloud data outside the ROI is filtered out of the environmental point cloud data, and steps S101 to S105 are executed on the remaining ROI point cloud data. This treatment greatly reduces the number of points to be processed, with roughly 50% or more of the irrelevant points filtered out; computing resources and storage resources can therefore be saved effectively, and the object detection speed is improved.
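The ROI filtering step can be illustrated with the following Python sketch; a real system would derive the ROI from the high-precision map and the vehicle pose, whereas the axis-aligned box used here is a simplifying assumption.

```python
import numpy as np

def filter_points_to_roi(points, roi_min, roi_max):
    """Keep only points whose (x, y, z) fall inside an axis-aligned ROI box."""
    inside = np.all((points >= roi_min) & (points <= roi_max), axis=1)
    return points[inside]

# Example: the effective sensing range quoted below (x, y in [-64, 64] m, z in [-3, 1] m).
pts = np.random.uniform(-80, 80, size=(57600, 3))
roi_pts = filter_points_to_roi(pts,
                               roi_min=np.array([-64.0, -64.0, -3.0]),
                               roi_max=np.array([ 64.0,  64.0,  1.0]))
```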
It should be noted that the 3D convolution operations in the three-dimensional image dimension reduction of step S103 involve a larger amount of computation than the 2D convolution operations in the target detection of step S105. For an effective sensing range of [-64, 64] m in the x (front-back) direction, [-64, 64] m in the y (left-right) direction and [-3, 1] m in the z (up-down) direction, with a first spatial resolution of [0.1, 0.1, 0.2] m, the spatial dimension of the input features of the two-dimensional feature extraction model is 1280 × 1280 × 20; input data of such high dimensionality is a heavy burden even for 2D convolution and incurs a very large time cost for 3D convolution. Compared with two-dimensional images, laser radar data is highly sparse: for a 32-line laser radar, one frame of laser point cloud contains 57600 laser points and the number of corresponding non-empty first voxels is about 25000, in which case about 99.92% of the first voxels are empty; if the laser point cloud outside the ROI is additionally filtered out by means of the high-precision map, the proportion of empty first voxels is even higher. In this case, sparse 3D convolution can effectively increase the convolution speed and thereby the object detection speed.
In summary, real-time performance is a very important issue for 3D convolution, which can be alleviated by sparse (Sparse) 3D convolution. However, the real-time performance of existing algorithms still needs further improvement. How to realize a faster sparse convolution, and thereby further improve the object detection speed, has become a problem urgently needing to be solved by those skilled in the art.
Sparse 3D convolution comes in two types. One is ordinary sparse convolution, which places no particular restriction on the output of the convolution; the other is manifold sparse convolution, which produces an output at an output location only if there is an active voxel (non-empty voxel) at the corresponding input location. With ordinary sparse convolution, the number of activated voxels grows as the number of convolution layers increases, so the data become less sparse and the computational efficiency of the 3D convolution suffers. Manifold sparse convolution guarantees that the output layer has the same number of activated voxels as the input layer, so no extra activated voxels are generated during sparse convolution and the computational performance is preserved. In sparse network design, ordinary sparse convolution layers and manifold sparse convolution layers are usually combined; for example, a convolution block may consist of one ordinary sparse convolution layer followed by three manifold sparse convolution layers, which ensures that features are extracted between adjacent voxels while suppressing a rapid increase in the number of activated voxels, thereby ensuring both network performance (i.e. object detection accuracy) and computation speed.
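The composition of such a convolution block can be sketched structurally as follows; this is only a descriptive illustration, and the kernel size and stride values are assumptions rather than the patent's own parameters.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SparseConvLayer:
    kind: str            # "regular" (may activate new voxels) or "submanifold" (manifold)
    kernel: int = 3
    stride: int = 1

def sparse_conv_block() -> List[SparseConvLayer]:
    """One convolution block as described above: a single ordinary sparse
    convolution followed by three manifold sparse convolutions that keep the
    set of active voxels fixed."""
    return [SparseConvLayer("regular", kernel=3, stride=2),
            SparseConvLayer("submanifold"),
            SparseConvLayer("submanifold"),
            SparseConvLayer("submanifold")]
```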
The process of sparse convolution is as follows: first, the non-empty voxel regions are indexed and gathered together; then a General Matrix Multiplication (GEMM) algorithm is used to realize fast convolution; finally, the convolved output is scattered back to the corresponding 3D output positions. However, owing to the high sparsity of lidar data, the input voxels x_{P(j),l} gathered within the same convolution kernel may still include a large number of empty voxels; for example, for a three-dimensional convolution with a 3 × 3 × 3 kernel, more than 50% of the corresponding 3 × 3 × 3 = 27 input voxels may be empty, which still represents a large computational burden.
To solve this problem, step S103 may include the following sub-steps:
step S1031: and splitting the three-dimensional convolution kernel of the manifold sparse convolution-based image feature extraction layer into a plurality of single convolution kernels.
The method provided by this embodiment further splits the convolution kernel along its spatial dimensions; for example, a convolution kernel of spatial size 3 × 3 × 3 can be split into 27 single convolution kernels, a kernel of spatial size 5 × 5 × 5 into 125 single convolution kernels, and so on. In this case, the corresponding sparse-convolution GEMM expression is:

y_{j,m} = Σ_k Σ_l w_{k,l,m} · x_{R(i,k,j),l}

where k is the index of a single convolution kernel (the number of index values equals the number of single convolution kernels), w is the weight of a single convolution kernel, l is the feature dimension of the input data, and m is the feature dimension of the output data. R(i, k, j) denotes the convolution index relation by which input voxel i contributes to output j through the k-th single convolution kernel, and x_{R(i,k,j),l} is the aggregation of the input-layer data corresponding to the output at the j-th location.
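The split-kernel GEMM formulation above can be illustrated with the following Python sketch, which performs a gather, a matrix multiplication per single convolution kernel, and a scatter-add; the shapes, the dense accumulation and the list representation of R are simplifying assumptions.

```python
import numpy as np

def sparse_conv_gemm(x, rules, weights, num_out):
    """Gather-GEMM-scatter evaluation of the split-kernel formulation.
    x:       (N_in, C_in) features of the non-empty input voxels
    rules:   list of (i, k, j) triples, the convolution index relation R
    weights: (K, C_in, C_out), one weight matrix per single convolution kernel
    num_out: number of non-empty output voxels"""
    C_out = weights.shape[2]
    y = np.zeros((num_out, C_out))
    for k in range(weights.shape[0]):
        pairs = [(i, j) for (i, kk, j) in rules if kk == k]
        if not pairs:
            continue
        in_idx, out_idx = zip(*pairs)
        gathered = x[list(in_idx)]              # gather the inputs hit by single kernel k
        y_part = gathered @ weights[k]          # GEMM for this single kernel
        np.add.at(y, list(out_idx), y_part)     # scatter-add to the output voxels
    return y

# Tiny example with a hand-written index relation.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                     # 4 non-empty input voxels, 8 channels
W = rng.normal(size=(27, 8, 16))                # 3x3x3 kernel split into 27 single kernels
R = [(0, 13, 0), (1, 13, 1), (0, 12, 1)]
y = sparse_conv_gemm(x, R, W, num_out=2)
```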
Step S1033: and for a first single convolution kernel arranged before a central single convolution kernel in the image feature extraction layer, if an output non-empty voxel j generated by the first single convolution kernel k exists for an input non-empty voxel i, constructing a convolution index relation R (i, k, j) among the input non-empty voxel, the first single convolution kernel and the output non-empty voxel.
Step S1035: and if a convolution index relation R (i, k, j) exists for a second single convolution kernel arranged after the central single convolution kernel in the image feature extraction layer, constructing a convolution index relation R (j, k, i) among the input non-empty voxel j, the second single convolution kernel k and the output non-empty voxel i.
Sparse convolution requires a step of establishing the convolution index, which can be realized by traversal. The traversal process is as follows: for a sparse convolution with a 3 × 3 × 3 kernel, traverse the 3 × 3 × 3 neighborhood of each non-empty voxel, find the other non-empty voxels in that neighborhood, and establish the corresponding relations. For example, if for non-empty voxel i the j-th non-empty voxel lies at the k-th position of its 3 × 3 × 3 neighborhood, an index (i, k, j) is established. Conventional approaches must search every location in the neighborhood to establish the index relations.
The core of the fast sparse convolution realized by this method is to quickly establish the index relation R between input i, output j and single convolution kernel k. Taking a convolution kernel of spatial size 3 × 3 × 3 as an example, for manifold sparse convolution the inventors found the following two laws while implementing the invention:
law 1: if there is a corresponding relationship (i, k, j) of input i, single convolution kernel k, output j, that is: if the relationship holds that input i produces output j through single convolution kernel k, then there must be a relationship (j, 26-k, i) that input j produces output i through single convolution kernel 26-k.
Law 2: for the central single convolution kernel, which for a 3 × 3 × 3 kernel is single convolution kernel 13, the corresponding index relations are exactly {(i, 13, i), i ∈ [0, N)}.
By means of law 1 and law 2, all correspondences for single convolution kernels k ∈ [0, 27) can be established merely by searching for and establishing the correspondences (i, k, j) for k ∈ [0, 13), so the time required to build the index can be cut in half.
For example, assuming the number of input-layer voxels is N and the spatial size of the convolution kernel for one output channel is 3 × 3 × 3 = 27, the process of establishing the manifold sparse convolution index relation R may include the following steps:
Step 1: establish for the output voxels the same global indices as for the input layer, and set the index of empty voxels to -1;
for example, the input layer includes 10 non-empty voxels with global positions of 50, 200, 1000, 9400, 90700, 109000, 180000, 370000, 500000, 700000, and corresponding global indices of 0 to 9; accordingly, the global voxel position of the output voxel is 50, 200, 1000, 9400, 90700, 109000, 180000, 370000, 500000, 700000, and the global index is also 0 to 9, respectively.
Step 2: for each input non-empty voxel i, determine whether there exists an output j produced by a single convolution kernel k ∈ [0, 13); if so, store (i, k, j) into the index relation R.
For example, the determination is made as follows: taking a 3 × 3 × 3 convolution kernel as an example, for a non-empty voxel i, the 13 neighbours forming the first half of the 27 neighbourhood positions of i are examined; their relative positions are (0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 0), (0, 2, 1), (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2) and (1, 1, 0), numbered k from 0 to 12.
And step 3: the index relationship R of k ∈ [13,27) is complemented by law 1 and law 2.
First, the index of the centre point (i, 13, i) can be added according to law 2; then, if (i, k, j) exists for some k ∈ [0, 13) at the k-th neighbour position, then (j, 26-k, i) exists according to law 1, and this relation is added to the index relation R.
According to the method provided by the embodiment of the application, thanks to the symmetry laws found by the inventors, the index can be built by searching only half of the neighbourhood space.
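A minimal Python sketch of this half-neighbourhood index construction is given below; the hash-map lookup of non-empty neighbours and the function name are assumptions made only for illustration, while the use of law 1 (mirroring) and law 2 (the centre kernel) follows the description above.

```python
import numpy as np

def build_submanifold_rules(coords, kernel=3):
    """Build the index relation R for manifold sparse convolution: only offsets
    k in [0, 13) are searched, the centre kernel 13 maps every voxel to itself
    (law 2), and k in (13, 27) is filled in by mirroring (law 1)."""
    K = kernel ** 3
    centre = K // 2                                   # 13 for a 3x3x3 kernel
    index_of = {tuple(c): i for i, c in enumerate(coords)}
    offsets = [np.array([dx, dy, dz])
               for dx in range(kernel) for dy in range(kernel) for dz in range(kernel)]
    rules = [(i, centre, i) for i in range(len(coords))]         # law 2
    for i, c in enumerate(coords):
        for k in range(centre):                                   # only half the neighbourhood
            j = index_of.get(tuple(c + offsets[k] - 1))           # -1 recentres the offset
            if j is not None:
                rules.append((i, k, j))                           # searched relation
                rules.append((j, K - 1 - k, i))                   # law 1: mirrored relation
    return rules

# Example: three non-empty voxels with integer voxel coordinates.
coords = np.array([[5, 5, 5], [5, 5, 6], [5, 6, 5]])
R = build_submanifold_rules(coords)
```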
Step S1037: and determining the non-empty voxels output by the manifold sparse convolution-based image feature extraction layer according to the convolution index relation, the single convolution kernel and the input non-empty voxels.
After the index relationship is quickly established in the above manner, a general matrix multiplication algorithm can be used for realizing quick convolution according to the convolution index relationship, the single convolution kernel and the input non-empty voxels, determining the output non-empty voxels, and returning the convolution output to the corresponding 3D output position.
The above describes the fast construction of the manifold sparse convolution index relation R; next, the fast construction of the ordinary sparse convolution index relation R is described.
For ordinary sparse convolution, there is no need to restrict the locations of the output voxels, so input and output voxel locations are not consistent; in this case laws 1 and 2 are not applicable. To quickly establish the ordinary sparse convolution index relation R, step S103 may include the following sub-steps:
step S1031': and splitting the three-dimensional convolution kernel of the image feature extraction layer based on the non-manifold sparse convolution into a plurality of single convolution kernels.
Step S1033': and determining a global index j of an output voxel according to the space size, filling and convolution step length of the single convolution kernel.
For example, for an ordinary convolution with a 3 × 3 × 3 kernel, the same input and output size, a padding of 0 and a convolution stride of 1, for an input non-empty voxel i the diffusivity of the convolution kernel means that all 27 neighbouring spatial positions of the 3 × 3 × 3 neighbourhood centred on i are outputs in the corresponding output layer; the output voxel positions corresponding to all input non-empty voxels are determined, and a unique global index j is assigned to these output voxels.
Specifically, a lookup table of the same size as the output layer is first established for assigning the unique global index j; the values of all voxels in the lookup table are initialised to -1. If an output voxel is an activated output and has been assigned a global index, the value in the lookup table for that voxel is set to -2; by checking whether the value in the lookup table is -1 or -2, it is ensured that each activated voxel is assigned a unique global index.
Step S1035': and (5) judging whether an output j generated by a single convolution kernel k exists or not for each input voxel i, and storing (i, k, j) into an index relation R if the output j exists.
Taking a 3 × 3 × 3 convolution kernel as an example, for the i-th input voxel, the output voxels j in the corresponding output-layer neighbourhood are determined and the index relations (i, k, j) are established. For an ordinary convolution with a 3 × 3 × 3 kernel, the same input and output size, a padding of 0 and a convolution stride of 1, for an input non-empty voxel i the 27 voxels at the corresponding positions of the output layer need to be indexed.
Step S1037': and determining the non-empty voxels output by the image feature extraction layer based on the non-manifold sparse convolution according to the convolution index relation, the single convolution kernel and the input non-empty voxels.
Step S105: object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
The method provided by the embodiment further comprises the following steps: the two-dimensional feature image in the prior art only includes neighboring information between voxels of one volume, but the two-dimensional feature image in the method provided by this embodiment retains neighboring information between first voxels and neighboring information between second voxels, and determines object information in a sensing region based on the image, thereby effectively improving higher object detection accuracy.
In particular embodiments, object information in the target space may be determined from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information via an image-based target detection algorithm (e.g., Faster R-CNN, SSD, FCN, etc.). The object of the present embodiment may be a traffic object. The traffic object can be a vehicle, a person, an obstacle such as a tree, and the like.
In this embodiment, the object information in the target space is determined from the two-dimensional feature image by a traffic object detection model. The traffic object detection model may adopt the deep-learning-based RefineDet method, which draws on a single-stage method such as SSD and combines it with a two-stage method such as Faster R-CNN, and therefore has the advantage of high object detection accuracy. When a traffic object image is detected (for a running vehicle, the traffic objects are obstacles), the method obtains the coordinates of the bounding box of the traffic object image, i.e. the position data of the traffic object image within the traffic environment image. The position data may be the vertex coordinate data of the 3D rectangular bounding box of the traffic object, i.e. a 24-dimensional vector containing the x, y and z coordinates of all 8 vertices. After the traffic object image in the two-dimensional feature image has been determined, the object detection result can be converted from the vehicle coordinate system to the world coordinate system, thereby determining the spatial object information. Since object detection on two-dimensional feature images is mature prior art, it is not described further here.
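The final conversion of a detected bounding box from the vehicle coordinate system to the world coordinate system can be illustrated with the following sketch, which assumes a simplified planar pose (yaw angle plus world-frame position); the patent itself does not prescribe this pose model.

```python
import numpy as np

def box_vehicle_to_world(box_vertices_vehicle, yaw, vehicle_position):
    """Convert the 8 vertices of a detected 3D bounding box from the vehicle
    coordinate system to the world coordinate system."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])                  # rotation about the vertical axis
    return box_vertices_vehicle @ R.T + vehicle_position

# Example: a 24-dimensional detection reshaped into 8 (x, y, z) vertices.
detection = np.random.rand(24).reshape(8, 3)
world_box = box_vehicle_to_world(detection, yaw=np.pi / 6,
                                 vehicle_position=np.array([300.0, 120.0, 0.0]))
```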
As can be seen from the foregoing embodiments, the object detection method provided in the embodiments of the present application determines, from the environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution within the first voxels, and takes the first and second feature data as the third feature data of the first voxels; the third feature data of the first voxels are used as the input data of a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, and a two-dimensional feature image of the target space is generated through the model; the object information in the target space is then determined from the two-dimensional feature image, which includes both the neighbouring information between first voxels and the neighbouring information between second voxels. This processing mode performs the three-dimensional image dimension reduction on the basis of coarse-grained voxels to generate the two-dimensional feature image, while the coarse-grained voxels are themselves divided more finely, so that the feature data of a coarse-grained voxel comprise not only its own voxel features but also those of its fine-grained sub-voxels; more of the original point cloud information is therefore retained in the coarse-grained features and the loss of part of the object information can be avoided. As a result, higher object detection accuracy and a higher detection speed can both be achieved.
Second embodiment
In the foregoing embodiment, an object identification method is provided, and correspondingly, the present application further provides an object detection apparatus. The apparatus corresponds to an embodiment of the method described above.
Please refer to fig. 5, which is a schematic diagram of an embodiment of an object detecting apparatus of the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The present application additionally provides an object detection device, comprising:
a feature data determining unit 501, configured to determine, according to environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, where the first feature data and the second feature data are used as third feature data of the first voxels;
a two-dimensional feature extraction unit 503, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
an object information determining unit 505, configured to determine object information in the target space according to the two-dimensional feature image including the first inter-voxel neighboring information and the second inter-voxel neighboring information.
Optionally, the object comprises: vehicle, person, obstacle.
Optionally, the method further includes:
a region-of-interest determining unit for determining a region of interest within the target space;
and the data clearing unit is used for clearing point cloud data outside the region of interest in the environment point cloud data.
Optionally, the feature data determining unit includes:
the first voxel dividing unit is used for dividing the target space into a plurality of first voxels according to the environmental point cloud data;
a second voxel dividing unit for dividing the first voxel into a plurality of second voxels;
a first characteristic data determining subunit, configured to determine the first characteristic data according to the point cloud data of the first voxel;
and the second characteristic data determining subunit is used for determining second characteristic data according to the point cloud data of the second voxel.
Optionally, the two-dimensional feature extraction unit includes:
the convolution kernel splitting subunit is used for splitting a three-dimensional convolution kernel of the image feature extraction layer based on manifold sparse convolution into a plurality of single convolution kernels;
a first index relation construction subunit, configured to construct, for a first single convolution kernel arranged before a central single convolution kernel in the image feature extraction layer, a convolution index relation R (i, k, j) among an input non-empty voxel, the first single convolution kernel, and an output non-empty voxel if an output non-empty voxel j generated by the first single convolution kernel k exists for the input non-empty voxel i;
a second index relation construction subunit, configured to construct, for a second single convolution kernel arranged after the central single convolution kernel in the image feature extraction layer, a convolution index relation R (j, k, i) among the input non-empty voxel j, the second single convolution kernel k, and the output non-empty voxel i if a correspondence R (i, k, j) exists;
a third index relation construction subunit, configured to construct a convolution index relation R (i, k, i) between the input non-empty voxel i, the central single convolution kernel k, and the output non-empty voxel i;
and the convolution operation subunit is used for determining the non-empty voxels output by the manifold sparse convolution-based image feature extraction layer according to the convolution index relationship, the single convolution kernel and the input non-empty voxels.
Optionally, the two-dimensional feature extraction unit includes:
the convolution kernel splitting subunit is used for splitting a three-dimensional convolution kernel of the image feature extraction layer based on the non-manifold sparse convolution into a plurality of single convolution kernels;
the global index determining subunit is used for determining a global index j of an output voxel according to the space size, filling and convolution step length of the convolution kernel;
an index relation construction subunit, configured to construct, for an input non-empty voxel i, a convolution index relation R (i, k, j) between the input non-empty voxel i, a single convolution kernel k and an output non-empty voxel j if the output non-empty voxel j generated by the single convolution kernel k exists;
and the convolution operation subunit is used for determining the non-empty voxels output by the image feature extraction layer based on the non-manifold sparse convolution according to the convolution index relationship, the single convolution kernel and the input non-empty voxels.
Third embodiment
In the above embodiment, an object detection method is provided, and correspondingly, the present application also provides an apparatus. Embodiments of the device correspond with the embodiments of the method described above.
Please refer to fig. 6, which is a schematic diagram of an embodiment of the apparatus of the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The present application additionally provides an apparatus comprising: a processor 601; and a memory 602 for storing a program for implementing the object detection method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
In one example, the device is a vehicle or a roadside sensing device, and the device may further include a three-dimensional space scanning apparatus for acquiring the environmental point cloud data.
In another example, the device is a server, a personal computer or the like, and may be further configured to receive an object detection request carrying environmental point cloud data sent by a vehicle or a roadside sensing device, and to send the detected object information back to the requester.
Fourth embodiment
In the above embodiment, an object detection method is provided; correspondingly, the present application also provides a vehicle travel information determination method. The method is executed by a subject including, but not limited to, an unmanned vehicle, and may also be executed by a roadside sensing device or other equipment.
Please refer to fig. 7, which is a flowchart of an embodiment of a vehicle travel information determining method of the present application. The vehicle driving information determination method provided by the application comprises the following steps:
step S701: according to the environmental point cloud data, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels.
Step S703: and taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model.
Step S705: object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
The object information may include position information of the first vehicle, such as longitude and latitude information of the vehicle, and the information may be used as the current position information. The first vehicle comprises a vehicle other than the execution subject of the method.
Step S707: and determining the running speed and/or the running track of the first vehicle according to the current position information and the historical position information of the first vehicle.
The historical position information may be the position information of the first vehicle determined according to the previous frame or previous frames of the environmental point cloud data by using the method provided by the embodiment of the present application, or the position information of the vehicle determined by using other existing positioning technologies, and the like.
After the current position information and the historical position information of the first vehicle are obtained, the running speed and/or the running trajectory of the first vehicle can be determined from the two pieces of position information. For example, the distance between the two positions divided by the time difference between the two frames of data is the running speed. For another example, the travel trajectory may be determined to be toward the southeast based on the two positions, or may be determined to follow a street, so as to facilitate tracking of the first vehicle, and so on.
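As a simple illustration of this calculation, the following sketch estimates speed and heading from two successive positions; the 0.1 s frame interval follows the 10 Hz frame rate mentioned earlier, and treating the displacement direction as the travel heading is an assumption.

```python
import numpy as np

def speed_and_heading(current_pos, previous_pos, frame_interval=0.1):
    """Estimate the first vehicle's speed and heading from two successive detections."""
    displacement = np.asarray(current_pos) - np.asarray(previous_pos)
    speed = np.linalg.norm(displacement[:2]) / frame_interval      # m/s in the ground plane
    heading = np.arctan2(displacement[1], displacement[0])         # radians, x-axis = 0
    return speed, heading

speed, heading = speed_and_heading([12.4, 3.1, 0.0], [11.9, 3.0, 0.0])
```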
In one example, the method may further comprise the steps of: and adjusting the running mode of the second vehicle according to the running speed and/or the running track. The second vehicle comprises a vehicle for carrying out the method.
For example, if it is determined that the first vehicle is in an acceleration travel state and the first vehicle is in front of the second vehicle, based on the current speed and the historical speed of the first vehicle, the second vehicle can also be appropriately accelerated to travel; if the first vehicle is behind the second vehicle and is in an acceleration running state, the second vehicle can also run at an appropriate acceleration to avoid vehicle collision and improve the safety of automatic driving.
For another example, if the travel trajectory of the first vehicle is determined to be southeast, the travel line of the second vehicle may be adjusted to facilitate tracking of the first vehicle, and so on.
As can be seen from the foregoing embodiments, the vehicle driving information determination method provided in an embodiment of the present application determines, from the environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution within the first voxels, and takes the first and second feature data as the third feature data of the first voxels; the third feature data of the first voxels are used as the input data of a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, and a two-dimensional feature image of the target space is generated through the model; the object information in the target space, including the current position information of the first vehicle, is determined from the two-dimensional feature image comprising the neighbouring information between first voxels and between second voxels; and the running speed and/or running trajectory of the first vehicle is determined from the current position information and the historical position information of the first vehicle. This processing mode performs the three-dimensional image dimension reduction on the basis of coarse-grained voxels to generate the two-dimensional feature image, while the coarse-grained voxels are themselves divided more finely, so that the feature data of a coarse-grained voxel comprise not only its own voxel features but also those of its fine-grained sub-voxels; more of the original point cloud information is therefore retained, the loss of part of the vehicle information can be avoided, and the running speed and/or running trajectory of the vehicle can be determined from more accurate vehicle information. As a result, higher accuracy of the vehicle driving information and a higher speed of determining it can both be achieved.
Fifth embodiment
In the above-described embodiment, a vehicle travel information determination method is provided, and correspondingly, the present application also provides a vehicle travel information determination device. The present application provides a vehicle travel information determination device including:
a feature data determination unit, configured to determine, according to environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
a position information determination unit configured to determine object information in a target space from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information; the object information includes current position information of the first vehicle;
and the running information determining unit is used for determining the running speed and/or the running track of the first vehicle according to the current position information and the historical position information of the first vehicle.
Optionally, the apparatus further comprises:
and the vehicle running mode adjusting unit is used for adjusting the running mode of the second vehicle according to the running speed and/or the running track.
Sixth embodiment
In the embodiment described above, a vehicle travel information determination method is provided, and correspondingly, the present application also provides a device. The present application provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing a vehicle travel information determination method, the apparatus performing the following steps after being powered on and running the program for the vehicle travel information determination method by the processor: according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; determining object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information; the object information includes current position information of the first vehicle; and determining the running speed and/or the running track of the first vehicle according to the current position information and the historical position information of the first vehicle.
Seventh embodiment
In the above embodiment, an object detection method is provided; correspondingly, the present application also provides another object detection method. The method is executed by a subject including, but not limited to, an unmanned vehicle, and may also be executed by a roadside sensing device or other equipment.
Please refer to fig. 8, which is a flowchart of an embodiment of an object detection method of the present application. The application provides an object detection method, which comprises the following steps:
step S801: according to the environmental point cloud data, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels.
Step S803: and taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model.
Step S805: first object information in a target space is determined from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information.
Step S807: and determining second object information in the target space according to the environment perception data outside the environment point cloud data and the first object information.
The perception data comprises other types of environment perception data besides the environment point cloud data, including but not limited to at least one of the following data: two-dimensional images, millimeter wave radar perception data.
For example, the execution subject of the method is an unmanned vehicle equipped with a laser radar, a millimeter wave radar and a camera; the three sensors respectively collect different types of road environment perception data, and the vehicle processes each kind of perception data separately. The first object information is determined from the point cloud data through steps S801 to S805, third object information is determined from the two-dimensional image using other prior art, and fourth object information is determined from the millimeter wave radar perception data; the third and fourth object information are then combined to adjust the first object information, and the adjusted object information is called the second object information. The second object information is thus more accurate than the other three kinds of object information, and the target confidence is higher.
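The following sketch illustrates, in a highly simplified form, how detections from other sensors could be used to adjust the confidence of the first object information; the distance gate and the confidence update rule are assumptions and not the patent's fusion procedure.

```python
def fuse_detections(lidar_objects, camera_objects, radar_objects, distance_gate=1.5):
    """Very simplified late fusion: a lidar detection (first object information)
    corroborated by a camera or radar detection within a distance gate gets a
    higher confidence, yielding the adjusted (second) object information."""
    def near(a, b):
        return sum((pa - pb) ** 2 for pa, pb in zip(a["position"], b["position"])) ** 0.5 < distance_gate

    fused = []
    for obj in lidar_objects:
        confirmations = sum(any(near(obj, o) for o in others)
                            for others in (camera_objects, radar_objects))
        fused.append({**obj, "confidence": min(1.0, obj["confidence"] + 0.2 * confirmations)})
    return fused

lidar = [{"position": (10.0, 2.0, 0.0), "confidence": 0.6}]
camera = [{"position": (10.3, 2.1, 0.0), "confidence": 0.7}]
radar = []
print(fuse_detections(lidar, camera, radar))
```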
In one example, the method may further comprise the steps of:
and transmitting the second object information to the surrounding vehicle in a short-distance communication mode, wherein the surrounding vehicle can adjust the driving mode thereof according to the second object information, and the like.
As can be seen from the foregoing embodiments, the object detection method provided in the embodiments of the present application determines, according to the environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, and uses the first feature data and the second feature data as third feature data of the first voxels; uses the third feature data of the first voxels as input data of a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, and generates a two-dimensional feature image of the target space through the two-dimensional feature extraction model; determines first object information in the target space according to the two-dimensional feature image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information; and determines second object information in the target space according to environment perception data other than the environment point cloud data and the first object information. With this processing mode, the three-dimensional image is reduced in dimension on the basis of coarse-grained voxels to generate the two-dimensional feature image, and the coarse-grained voxels are further divided more finely, so that the feature data of a coarse-grained voxel comprise not only its own voxel feature data but also the voxel feature data of the fine-grained voxels; more of the original point cloud information is therefore retained in the feature data of the coarse-grained voxels, part of the object information can be prevented from being lost, and more accurate second object information is determined, on the basis of the first object information, according to perception data other than the point cloud (such as two-dimensional images and millimeter-wave radar perception data); as a result, higher object detection accuracy and higher detection speed can both be effectively achieved.
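To make the dimension-reduction step recapped above more concrete, the following sketch assumes that the third feature data of the first voxels have already been scattered into a bird's-eye-view grid, and shows one possible way of choosing the network depth of the two-dimensional feature extraction model according to the first spatial resolution. The depth rule, channel widths and grid size are illustrative assumptions only and are not details given in this application.

```python
# Hedged sketch: a 2D feature extraction network whose depth is picked from the
# first spatial resolution; all architectural choices below are example assumptions.
import torch
import torch.nn as nn

def build_feature_extractor(in_channels, first_spatial_resolution):
    # Assumed rule: coarser first voxels -> fewer grid cells -> a shallower network.
    depth = 2 if first_spatial_resolution >= 0.4 else 4
    layers, channels = [], in_channels
    for _ in range(depth):
        layers += [nn.Conv2d(channels, 64, kernel_size=3, padding=1),
                   nn.BatchNorm2d(64),
                   nn.ReLU(inplace=True)]
        channels = 64
    return nn.Sequential(*layers)

bev_grid = torch.zeros(1, 8, 200, 200)           # third feature data scattered onto a BEV grid
model = build_feature_extractor(8, 0.4)
two_dimensional_feature_image = model(bev_grid)  # input to the object information determination step
```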
Eighth embodiment
In the foregoing embodiment, an object detection method is provided, and correspondingly, an object detection apparatus is also provided in the present application. The object detection apparatus provided by the present application includes:
a feature data determination unit, configured to determine, according to environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
a first object information determination unit configured to determine first object information in a target space from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information;
and the second object information determining unit is used for determining second object information in the target space according to the environment perception data outside the environment point cloud data and the first object information.
Ninth embodiment
In the above embodiment, an object detection method is provided, and correspondingly, the present application also provides an apparatus. The present application provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing an object detection method, the apparatus performing the following steps after being powered on and running the program for the object detection method by the processor: according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; determining first object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information; and determining second object information in the target space according to the environment perception data outside the environment point cloud data and the first object information.
Tenth embodiment
In the above embodiments, an object detection method is provided; correspondingly, the present application also provides another object detection method. The subject executing the method includes, but is not limited to, an unmanned vehicle, and may also be a roadside sensing device or other devices.
Please refer to fig. 9, which is a flowchart of an embodiment of an object detection method of the present application. The application provides an object detection method, which comprises the following steps:
step S901: a two-dimensional environment image of a target space is acquired.
For example, the unmanned vehicle collects a two-dimensional environment image of a driving road through an image collecting device (such as a camera or the like) mounted on the unmanned vehicle.
Step S903: and constructing a three-dimensional environment image of the target space according to the two-dimensional environment image.
The three-dimensional image may be constructed from the two-dimensional image using mature prior art, for example, depth estimation based on binocular stereo vision to obtain a three-dimensional depth image. Since constructing a three-dimensional image from two-dimensional images is mature prior art, it is not described in detail here.
The acquired two-dimensional environment image of the target space may be a single image or a plurality of images. To construct a three-dimensional environment image of the target space from a plurality of two-dimensional environment images, the intrinsic and extrinsic parameters of the cameras must be strictly calibrated.
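For illustration, a minimal sketch of the binocular stereo approach mentioned above is given below, using OpenCV. The image files, SGBM parameters and the reprojection matrix file are placeholders; the Q matrix would normally be produced by the strict camera calibration noted above (for example, via cv2.stereoRectify).

```python
# Sketch only: binocular stereo depth estimation with OpenCV; file names and
# parameters are placeholder assumptions, not values specified in this application.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # SGBM returns fixed-point disparities

Q = np.load("stereo_Q.npy")                       # 4x4 reprojection matrix from calibration (placeholder file)
points_3d = cv2.reprojectImageTo3D(disparity, Q)  # per-pixel (X, Y, Z) in camera coordinates
valid = disparity > 0                             # keep pixels with a valid disparity
environment_points = points_3d[valid]             # N x 3 points of the three-dimensional environment image
```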
Step S905: according to the three-dimensional environment image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels are determined, and the first feature data and the second feature data are used as third feature data of the first voxels.
In this embodiment, according to point data on a surface of a three-dimensional environment image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, are determined, and the first feature data and the second feature data are used as third feature data of the first voxels.
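The multi-level voxel division of such surface points may, for example, be sketched as follows. The coarse and fine voxel sizes and the per-voxel descriptor (mean coordinates plus point count) are illustrative assumptions rather than values specified in this application; the result maps each first voxel to its third feature data, which can then be arranged into the input of the two-dimensional feature extraction model.

```python
# Illustrative sketch of multi-level voxel features; voxel sizes and descriptors are example assumptions.
import numpy as np
from collections import defaultdict

def voxel_feature(pts):
    # Simple per-voxel descriptor: mean coordinates plus the number of points.
    return np.concatenate([pts.mean(axis=0), [len(pts)]])

def multi_level_features(points, coarse=0.4, fine=0.1):
    coarse_pts = defaultdict(list)
    for p in points:
        coarse_pts[tuple(np.floor(p / coarse).astype(int))].append(p)

    third_feature = {}
    for cidx, plist in coarse_pts.items():
        pts = np.asarray(plist)
        first = voxel_feature(pts)                       # first feature data (coarse voxel)
        fine_pts = defaultdict(list)
        for p in pts:
            fine_pts[tuple(np.floor(p / fine).astype(int))].append(p)
        seconds = [voxel_feature(np.asarray(v)) for v in fine_pts.values()]
        second = np.mean(seconds, axis=0)                # second feature data (fine voxels), pooled here
        third_feature[cidx] = np.concatenate([first, second])  # third feature data of the first voxel
    return third_feature
```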
Step S907: and taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model.
Step S909: object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
As can be seen from the foregoing embodiments, the object detection method provided in the embodiments of the present application acquires a two-dimensional environment image of a target space; constructs a three-dimensional environment image of the target space according to the two-dimensional environment image; determines, according to the three-dimensional environment image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, and uses the first feature data and the second feature data as third feature data of the first voxels; uses the third feature data of the first voxels as input data of a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, and generates a two-dimensional feature image of the target space through the two-dimensional feature extraction model; and determines object information in the target space according to the two-dimensional feature image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information. With this processing mode, the three-dimensional image is reduced in dimension on the basis of coarse-grained voxels to generate the two-dimensional feature image, and the coarse-grained voxels are further divided more finely, so that the feature data of a coarse-grained voxel comprise not only its own voxel feature data but also the voxel feature data of the fine-grained voxels; more spatial information is therefore retained in the feature data of the coarse-grained voxels and part of the object information can be prevented from being lost; as a result, higher object detection accuracy and higher detection speed can both be effectively achieved.
In addition, compared with the technical solution of the first embodiment, the processing mode adopted by this embodiment does not need to acquire point cloud data through a laser radar, but can directly determine object information according to a two-dimensional image acquired by a camera, so that equipment cost can be effectively reduced.
In addition, compared with the technical solution in the prior art that object information is determined directly according to the acquired two-dimensional image through an object detection model (such as a deep learning-based RefineDet method, etc.), the processing method adopted in this embodiment enables input data (a two-dimensional feature image) of the object detection model to include spatial information, and the accuracy of the object information determined according to the two-dimensional feature image is higher.
Eleventh embodiment
In the foregoing embodiment, an object detection method is provided, and correspondingly, an object detection apparatus is also provided in the present application. The object detection apparatus provided by the present application includes:
a two-dimensional image acquisition unit for acquiring a two-dimensional environment image of a target space;
the three-dimensional image construction unit is used for constructing a three-dimensional environment image of a target space according to the two-dimensional environment image;
a feature data determination unit configured to determine, from a three-dimensional environment image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
and an object information determination unit configured to determine object information in the target space from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
Twelfth embodiment
In the above embodiment, an object detection method is provided, and correspondingly, the present application also provides an apparatus. The present application provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing an object detection method, the apparatus performing the following steps after being powered on and running the program for the object detection method by the processor: acquiring a two-dimensional environment image of a target space; constructing a three-dimensional environment image of a target space according to the two-dimensional environment image; according to the three-dimensional environment image, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
Thirteenth embodiment
In the above embodiments, an object detection method is provided, and correspondingly, a lesion detection method is also provided in the present application. The subject executing the method includes, but is not limited to, a CT lesion detection device and other devices.
Please refer to fig. 10, which is a flowchart of an embodiment of a lesion detection method according to the present application. The focus detection method provided by the application comprises the following steps:
step S1001: at least one Computed Tomography (CT) image of the target site is acquired.
For example, a three-dimensional CT apparatus can be used to perform multi-slice scanning on the same part (target part, such as head) of a human body, so as to obtain at least one CT image of the target part.
Step S1003: and constructing a three-dimensional image of the target part according to the at least one CT image.
The three-dimensional image of the target part may be constructed from the at least one CT image using mature prior art, for example, by registering the slice images and expressing them in the same three-dimensional space. Since constructing a three-dimensional image from two-dimensional images is mature prior art, it is not described in detail here.
Step S1005: according to the three-dimensional image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target region and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels are determined, and the first feature data and the second feature data are used as third feature data of the first voxels.
The multiple slice inputs of different sections are expressed in the same three-dimensional space, multi-level voxel division is then performed, and the feature data of the first voxels are determined.
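For illustration, expressing multiple slices in one three-dimensional space may be sketched as follows. The pixel spacing, slice spacing and intensity threshold are placeholder values, and the resulting points can then be fed to the multi-level voxel division described above.

```python
# Sketch only: stacking equally spaced CT slices into one volume and extracting
# candidate points; spacings and threshold are example assumptions.
import numpy as np

def slices_to_points(slices, pixel_spacing=0.7, slice_spacing=1.25, threshold=300):
    volume = np.stack(slices, axis=0)             # (num_slices, H, W) three-dimensional image
    z, y, x = np.nonzero(volume > threshold)      # keep voxels above the intensity threshold
    points = np.stack([x * pixel_spacing,
                       y * pixel_spacing,
                       z * slice_spacing], axis=1)  # N x 3 coordinates in millimetres
    intensities = volume[z, y, x]
    return points, intensities
```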
Step S1007: and taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target part through the two-dimensional feature extraction model.
Step S1009: and determining the focus information of the target part according to the two-dimensional characteristic image comprising the first inter-voxel adjacent information and the second inter-voxel adjacent information.
The lesion information includes, but is not limited to, lesion category, lesion location, and the like.
As can be seen from the above embodiments, the lesion detection method provided in the embodiments of the present application obtains at least one CT image of a target part; constructs a three-dimensional image of the target part according to the at least one CT image; determines, according to the three-dimensional image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target part and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, and takes the first feature data and the second feature data as third feature data of the first voxels; takes the third feature data of the first voxels as input data of a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, and generates a two-dimensional feature image of the target part through the two-dimensional feature extraction model; and determines lesion information of the target part according to the two-dimensional feature image comprising the first inter-voxel neighborhood information and the second inter-voxel neighborhood information. With this processing mode, the three-dimensional image of the target part is reduced in dimension on the basis of coarse-grained voxels to generate the two-dimensional feature image, and the coarse-grained voxels are further divided more finely, so that the feature data of a coarse-grained voxel comprise not only its own voxel feature data but also the voxel feature data of the fine-grained voxels; more spatial information is therefore retained in the feature data of the coarse-grained voxels and part of the lesion information can be prevented from being lost; as a result, higher lesion detection accuracy and higher detection speed can both be effectively achieved.
Fourteenth embodiment
In the above embodiments, a lesion detection method is provided, and correspondingly, a lesion detection apparatus is also provided. The lesion detection apparatus provided by the present application includes:
an image acquisition unit for acquiring at least one Computed Tomography (CT) image of a target region;
the three-dimensional image construction unit is used for constructing a three-dimensional image of the target part according to at least one CT image;
a feature data determination unit configured to determine, from the three-dimensional image, first feature data of a plurality of first voxels corresponding to a first spatial resolution within a target region and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution among the first voxels, respectively, and to use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target portion through the two-dimensional feature extraction model;
and a lesion information determination unit for determining lesion information of the target region based on the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
Fifteenth embodiment
In the above embodiment, a lesion detection method is provided, and correspondingly, an apparatus is also provided. The present application provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing a lesion detection method, the apparatus performing the following steps after being powered on and running the program for the lesion detection method through the processor: acquiring at least one CT image of the target part; constructing a three-dimensional image of the target part according to at least one CT image; determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target part and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively according to the three-dimensional image, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target part through the two-dimensional feature extraction model; and determining the lesion information of the target part according to the two-dimensional feature image comprising the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
Sixteenth embodiment
In the above embodiments, an object detection method is provided, and correspondingly, the present application further provides a virtual shopping method. The execution subject of the method includes, but is not limited to, a virtual shopping device or the like.
Please refer to fig. 11, which is a flowchart of an embodiment of a virtual shopping method of the present application. The virtual shopping method provided by the application comprises the following steps:
step S1101: according to the point cloud data of the shopping environment, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels are determined, and the first feature data and the second feature data are used as third feature data of the first voxels.
For example, the user may construct a three-dimensional image of the shopping space through AR (Augmented Reality) smart glasses, and may perform multi-level voxel division on the shopping space based on the three-dimensional image to obtain the third feature data.
In a specific implementation, the AR glasses may estimate the 3D spatial image of the current shopping scene through binocular vision or monocular 3D regression, and the 3D image is discretized to obtain a three-dimensional point cloud. On this basis, targets of interest can be detected and identified through a multi-level convolutional network, and their positions and sizes can be estimated.
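As an illustration of the discretization step, the following sketch back-projects an estimated depth image into a three-dimensional point cloud under a pinhole camera model; the camera intrinsics are placeholder values, not parameters of any particular AR glasses.

```python
# Sketch only: converting a depth image into a 3D point cloud; fx, fy, cx, cy are
# assumed intrinsics of the AR glasses camera.
import numpy as np

def depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels without a valid depth estimate
```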
Step S1103: and taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the shopping space through the two-dimensional feature extraction model.
Step S1105: object information within the shopping space is determined from a two-dimensional feature image that includes first inter-voxel neighborhood information and second inter-voxel neighborhood information.
For example, if the shopping space is an air-conditioner sales area, the object information includes the types of the displayed air conditioners and their accurate 3D spatial positions.
Step S1107: and placing the virtual object into the three-dimensional image of the shopping space according to the object information.
After the object information in the shopping space is identified through the previous steps, virtual objects can be added to the three-dimensional image of the space. The virtual objects may include a virtual shopping guide, air-conditioner sales information, and the like.
In this embodiment, by recognizing the model and the 3D spatial position of the target, a customized 3D model corresponding to the target (e.g., an air conditioner) is placed in the virtual space during virtual shopping; furthermore, the user can interact with the virtual model, for example, by opening a refrigerator door.
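A minimal sketch of placing a customized virtual model at the detected position is given below, assuming the object information provides a 3D center and the model is a set of vertices in its own local coordinates; the lateral offset is an example value, not a parameter specified in this application.

```python
# Sketch only: translating a virtual model's vertices to the detected object position.
import numpy as np

def place_virtual_model(model_vertices, object_center, offset=(0.5, 0.0, 0.0)):
    translation = np.asarray(object_center) + np.asarray(offset)
    return model_vertices + translation   # vertices expressed in the shopping-space frame
```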
As can be seen from the foregoing embodiments, the virtual shopping method provided in the embodiments of the present application determines, according to the shopping environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, and uses the first feature data and the second feature data as third feature data of the first voxels; uses the third feature data of the first voxels as input data of a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, and generates a two-dimensional feature image of the shopping space through the two-dimensional feature extraction model; determines object information in the shopping space according to the two-dimensional feature image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information; and places the virtual object into the three-dimensional image of the shopping space according to the object information. With this processing mode, the three-dimensional image of the shopping space is reduced in dimension on the basis of coarse-grained voxels to generate the two-dimensional feature image, and the coarse-grained voxels are further divided more finely, so that the feature data of a coarse-grained voxel comprise not only its own voxel feature data but also the voxel feature data of the fine-grained voxels; more spatial information is therefore retained in the feature data of the coarse-grained voxels and part of the commodity information can be prevented from being lost; as a result, higher object detection accuracy and higher detection speed can both be effectively achieved, thereby improving the commodity sales rate and the user experience.
Seventeenth embodiment
In the above embodiment, a virtual shopping method is provided, and correspondingly, the present application also provides a virtual shopping apparatus. The virtual shopping apparatus provided by the present application includes:
the characteristic data determining unit is used for determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively according to the shopping environment point cloud data, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels;
the two-dimensional feature extraction unit is used for taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a shopping space through the two-dimensional feature extraction model;
an object information determination unit configured to determine object information in the shopping space from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information;
and the virtual object adding unit is used for placing the virtual object into the three-dimensional image of the shopping space according to the object information.
Eighteenth embodiment
In the above embodiment, a virtual shopping method is provided, and correspondingly, the present application further provides an apparatus. The present application provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing a virtual shopping method, the device performing the following steps after being powered on and running the program for the virtual shopping method through the processor: according to the point cloud data of the shopping environment, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the shopping space through the two-dimensional feature extraction model; determining object information in a shopping space according to a two-dimensional characteristic image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information; and placing the virtual object into the three-dimensional image of the shopping space according to the object information.
Nineteenth embodiment
In the above embodiment, an object detection method is provided, and correspondingly, a weather prediction method is also provided in the present application. The execution subject of the method includes but is not limited to weather prediction equipment and the like.
Please refer to fig. 12, which is a flowchart of an embodiment of a weather prediction method of the present application. The weather prediction method provided by the application comprises the following steps:
step S1201: and acquiring a three-dimensional space image of the target cloud layer.
Step S1203: according to the three-dimensional space image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels are determined, and the first feature data and the second feature data are used as third feature data of the first voxels.
For example, a cloud layer 3D space image may be acquired by a meteorological radar, and a target cloud layer may be subjected to multi-level voxel division based on the three-dimensional image, so as to obtain third feature data.
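For illustration, when the radar data are already gridded, the multi-level voxel division may be sketched as block pooling over the reflectivity volume; the block size and the choice of mean and maximum as the coarse-voxel and fine-voxel descriptors are assumptions made for illustration only.

```python
# Sketch only: multi-level voxel features over a gridded radar reflectivity volume,
# assuming each coarse voxel spans a block x block x block group of fine voxels.
import numpy as np

def cloud_layer_features(reflectivity, block=4):
    d, h, w = (s // block for s in reflectivity.shape)
    fine = reflectivity[:d * block, :h * block, :w * block]
    blocks = fine.reshape(d, block, h, block, w, block)
    first = blocks.mean(axis=(1, 3, 5))          # first feature data: coarse-voxel mean reflectivity
    second = blocks.max(axis=(1, 3, 5))          # second feature data: pooled fine-voxel maxima
    third = np.stack([first, second], axis=-1)   # third feature data per coarse voxel
    return third
```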
Step S1205: and taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target cloud layer through the two-dimensional feature extraction model.
Step S1207: weather information is determined from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information.
The weather information includes but is not limited to: snow, rain, and the like.
As can be seen from the foregoing embodiments, the weather prediction method provided in the embodiments of the present application acquires a three-dimensional space image of a target cloud layer; determines, according to the three-dimensional space image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, and uses the first feature data and the second feature data as third feature data of the first voxels; uses the third feature data of the first voxels as input data of a two-dimensional feature extraction model whose network depth corresponds to the first spatial resolution, and generates a two-dimensional feature image of the target cloud layer through the two-dimensional feature extraction model; and determines weather information according to the two-dimensional feature image comprising the first inter-voxel neighborhood information and the second inter-voxel neighborhood information. With this processing mode, the three-dimensional image of the target cloud layer is reduced in dimension on the basis of coarse-grained voxels to generate the two-dimensional feature image, and the coarse-grained voxels are further divided more finely, so that the feature data of a coarse-grained voxel comprise not only its own voxel feature data but also the voxel feature data of the fine-grained voxels; more spatial information is therefore retained in the feature data of the coarse-grained voxels and part of the weather information can be prevented from being lost; as a result, higher weather prediction accuracy and higher detection speed can both be effectively achieved.
Twentieth embodiment
In the foregoing embodiment, a weather prediction method is provided, and correspondingly, a weather prediction apparatus is also provided in the present application. The weather prediction apparatus provided by the present application includes:
the image acquisition unit is used for acquiring a three-dimensional space image of a target cloud layer;
a feature data determination unit, configured to determine, according to the three-dimensional spatial image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
the two-dimensional feature extraction unit is used for taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target cloud layer through the two-dimensional feature extraction model;
and the weather information determining unit is used for determining weather information according to the two-dimensional characteristic image comprising the first inter-voxel adjacent information and the second inter-voxel adjacent information.
Twenty-first embodiment
In the above embodiment, a weather prediction method is provided, and correspondingly, the present application also provides a device. The present application provides an apparatus comprising:
a processor; and
a memory for storing a program for implementing a weather prediction method, wherein the following steps are performed after the device is powered on and the program for the weather prediction method is executed by the processor: acquiring a three-dimensional space image of a target cloud layer; according to the three-dimensional space image, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target cloud layer through the two-dimensional feature extraction model; weather information is determined from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information.
Although the present application has been described with reference to preferred embodiments, they are not intended to limit the present application. Those skilled in the art can make possible variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
2. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (35)

1. An object detection method, comprising:
according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
2. The method of claim 1, wherein the first characteristic data and the second characteristic data are determined by:
dividing a target space into a plurality of first voxels according to the environmental point cloud data;
dividing the first voxel into a plurality of second voxels;
determining the first characteristic data according to the point cloud data of the first voxel; and determining second characteristic data according to the point cloud data of the second voxel.
3. The method of claim 1, further comprising:
determining a region of interest within the target space;
and clearing point cloud data outside the region of interest in the environmental point cloud data.
4. The method of claim 1, wherein generating a two-dimensional feature image of the target space through the two-dimensional feature extraction model comprises:
splitting a three-dimensional convolution kernel of an image feature extraction layer based on manifold sparse convolution into a plurality of single convolution kernels;
for a first single convolution kernel arranged before a central single convolution kernel in the image feature extraction layer, if an output non-empty voxel j generated by a first single convolution kernel k exists for an input non-empty voxel i, constructing a convolution index relation R (i, k, j) among the input non-empty voxel, the first single convolution kernel and the output non-empty voxel;
if a corresponding relation R (i, k, j) exists for a second single convolution kernel arranged behind the central single convolution kernel in the image feature extraction layer, constructing a convolution index relation R (j, k, i) among the input non-empty voxel j, the second single convolution kernel k and the output non-empty voxel i;
constructing a convolution index relation R (i, k, i) among the input non-empty voxel i, the central single convolution kernel k and the output non-empty voxel i;
and determining the non-empty voxels output by the manifold sparse convolution-based image feature extraction layer according to the convolution index relation, the single convolution kernel and the input non-empty voxels.
5. The method according to claim 1 or 4, wherein the generating a two-dimensional feature image of the target space by the two-dimensional feature extraction model comprises:
splitting a three-dimensional convolution kernel of an image feature extraction layer based on non-manifold sparse convolution into a plurality of single convolution kernels;
determining a global index j of an output voxel according to the spatial size, padding and convolution stride of the convolution kernel;
for an input non-empty voxel i, if an output non-empty voxel j generated by a single convolution kernel k exists, constructing a convolution index relation R (i, k, j) among the input non-empty voxel i, the single convolution kernel k and the output non-empty voxel j;
and determining the non-empty voxels output by the image feature extraction layer based on the non-manifold sparse convolution according to the convolution index relation, the single convolution kernel and the input non-empty voxels.
6. The method of claim 1, wherein the object comprises: a vehicle, a person, or an obstacle.
7. The method of claim 1,
if the multi-level voxel division condition is met, determining the first characteristic data and the second characteristic data according to the environment point cloud data, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxel;
and if the multi-level voxel division condition is not met, determining the first characteristic data according to the environment point cloud data, taking the first characteristic data as input data of the two-dimensional characteristic extraction model, and determining object information in a target space according to a two-dimensional characteristic image comprising adjacent information between first voxels.
8. The method according to claim 7, wherein the multi-level voxel-splitting condition comprises at least one of:
the object identification accuracy is greater than an accuracy threshold;
the object information includes a vehicle, a person, or an obstacle in the driving road;
the resource occupancy of a processor executing the method is greater than an occupancy threshold.
9. The method of claim 1, wherein the second characterization data is determined by:
dividing the target space into a first region which needs multi-level voxel division and a second region which does not need multi-level voxel division;
second feature data of a plurality of second voxels in the first voxel corresponding to at least one second spatial resolution in the first region are determined.
10. The method of claim 9, wherein the first region comprises a region between a road surface and a spatial location having a spatial height less than a height threshold.
11. The method of claim 1, wherein the third characterization data is determined by:
determining a region to be subjected to second voxel division in the first voxel;
and taking the first characteristic data of the first voxel and the second characteristic data in the region to be subjected to the second voxel division as the third characteristic data.
12. An object detecting device, comprising:
a feature data determination unit, configured to determine, according to environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
and an object information determination unit configured to determine object information in the target space from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
13. The apparatus of claim 12, further comprising:
a region-of-interest determining unit for determining a region of interest within the target space;
and the data clearing unit is used for clearing point cloud data outside the region of interest in the environment point cloud data.
14. An apparatus, comprising:
a processor; and
a memory for storing a program for implementing the object detection method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; object information in the target space is determined from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
15. The apparatus of claim 14,
the apparatus comprises a vehicle;
the vehicle includes a three-dimensional space scanning device.
16. A vehicle travel information determination method characterized by comprising:
according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
determining object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information; the object information includes current position information of the first vehicle;
and determining the running speed and/or the running track of the first vehicle according to the current position information and the historical position information of the first vehicle.
17. The method of claim 16, further comprising:
and adjusting the running mode of the second vehicle according to the running speed and/or the running track.
18. An object detection method, comprising:
according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
determining first object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information;
and determining second object information in the target space according to the environment perception data outside the environment point cloud data and the first object information.
19. The method of claim 18, wherein the perception data comprises at least one of: two-dimensional images, millimeter wave radar perception data.
20. A vehicle travel information determination device characterized by comprising:
a feature data determination unit, configured to determine, according to environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
a position information determination unit configured to determine object information in a target space from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information; the object information includes current position information of the first vehicle;
and the running information determining unit is used for determining the running speed and/or the running track of the first vehicle according to the current position information and the historical position information of the first vehicle.
21. An object detecting device, comprising:
a feature data determination unit, configured to determine, according to environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
a first object information determination unit configured to determine first object information in a target space from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information;
and the second object information determining unit is used for determining second object information in the target space according to the environment perception data outside the environment point cloud data and the first object information.
22. An apparatus, comprising:
a processor; and
a memory for storing a program for implementing a vehicle travel information determination method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; determining object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information; the object information includes current position information of the first vehicle; and determining the running speed and/or the running track of the first vehicle according to the current position information and the historical position information of the first vehicle.
23. An apparatus, comprising:
a processor; and
a memory for storing a program for implementing the object detection method, the apparatus performing the following steps after being powered on and running the program of the method by the processor: according to environmental point cloud data, determining first characteristic data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second characteristic data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first characteristic data and the second characteristic data as third characteristic data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model; determining first object information in a target space according to a two-dimensional characteristic image comprising first inter-voxel neighbor information and second inter-voxel neighbor information; and determining second object information in the target space according to the environment perception data outside the environment point cloud data and the first object information.
24. An object detection method, comprising:
acquiring a two-dimensional environment image of a target space;
constructing a three-dimensional environment image of a target space according to the two-dimensional environment image;
according to the three-dimensional environment image, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
and determining object information in the target space from the two-dimensional feature image comprising the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
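Claim 24 starts from a two-dimensional environment image and builds a three-dimensional environment image before voxelization. One common way to do this, shown here only as an assumed example, is to back-project a per-pixel depth map through a pinhole camera model; the depth values and intrinsics below are made up, and how the depth itself is obtained (e.g. a monocular depth network or stereo matching) is left open by the claim.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a per-pixel depth map into 3D camera coordinates (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Hypothetical depth estimated from the two-dimensional environment image;
# the camera intrinsics are made-up values.
depth = np.full((480, 640), 5.0, dtype=np.float32)
points = backproject(depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(points.shape)   # (307200, 3): a 3D environment representation that can be voxelized as above
```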
25. A method of lesion detection, comprising:
acquiring at least one CT image of a target region;
constructing a three-dimensional image of the target region according to the at least one CT image;
according to the three-dimensional image, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target region and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target region through the two-dimensional feature extraction model;
and determining lesion information of the target region according to the two-dimensional feature image comprising the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
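Claim 25 first assembles a three-dimensional image of the target region from CT slices and then computes voxel features at two resolutions. A rough sketch of that preprocessing, using stacked random arrays in place of real CT data and mean intensity as a stand-in voxel feature, might look as follows; slice counts, image sizes, and pooling factors are arbitrary assumptions.

```python
import numpy as np

# Hypothetical stack of CT slices (one 2D array per slice) for the target region.
slices = [np.random.default_rng(i).normal(size=(128, 128)).astype(np.float32) for i in range(40)]
volume = np.stack(slices, axis=0)          # (depth, height, width) three-dimensional image

def mean_pool3d(vol, factor):
    """Average-pool a 3D volume by an integer factor along each axis."""
    d, h, w = (s // factor for s in vol.shape)
    v = vol[:d * factor, :h * factor, :w * factor]
    return v.reshape(d, factor, h, factor, w, factor).mean(axis=(1, 3, 5))

# Mean intensity per voxel at a coarse and a finer resolution, analogous to the
# first / second voxel features of the claimed method.
coarse_feat = mean_pool3d(volume, 8)       # (5, 16, 16)
fine_feat   = mean_pool3d(volume, 4)       # (10, 32, 32)
print(volume.shape, coarse_feat.shape, fine_feat.shape)
```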
26. An object detection device, comprising:
a two-dimensional image acquisition unit for acquiring a two-dimensional environment image of a target space;
a three-dimensional image construction unit, configured to construct a three-dimensional environment image of the target space according to the two-dimensional environment image;
a feature data determination unit configured to determine, from a three-dimensional environment image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of a target space through the two-dimensional feature extraction model;
and an object information determination unit configured to determine object information in the target space from the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
27. An apparatus, comprising:
a processor; and
a memory for storing a program implementing the object detection method, wherein after the apparatus is powered on and the processor runs the program of the method, the apparatus performs the following steps: acquiring a two-dimensional environment image of a target space; constructing a three-dimensional environment image of the target space according to the two-dimensional environment image; according to the three-dimensional environment image, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target space through the two-dimensional feature extraction model; and determining object information in the target space from the two-dimensional feature image comprising the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
28. A lesion detection apparatus, comprising:
an image acquisition unit for acquiring at least one Computed Tomography (CT) image of a target region;
a three-dimensional image construction unit, configured to construct a three-dimensional image of the target region according to the at least one CT image;
a feature data determination unit configured to determine, from the three-dimensional image, first feature data of a plurality of first voxels corresponding to a first spatial resolution within a target region and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution among the first voxels, respectively, and to use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with a network depth corresponding to a first spatial resolution, and generate a two-dimensional feature image of the target region through the two-dimensional feature extraction model;
and a lesion information determination unit for determining lesion information of the target region based on the two-dimensional feature image including the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
29. An apparatus, comprising:
a processor; and
a memory for storing a program implementing a lesion detection method, wherein after the apparatus is powered on and the processor runs the program of the method, the apparatus performs the following steps: acquiring at least one CT image of a target region; constructing a three-dimensional image of the target region according to the at least one CT image; according to the three-dimensional image, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target region and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target region through the two-dimensional feature extraction model; and determining lesion information of the target region according to the two-dimensional feature image comprising the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
30. A virtual shopping method, comprising:
according to the point cloud data of the shopping environment, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the shopping space through the two-dimensional feature extraction model;
determining object information in the shopping space according to the two-dimensional feature image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information;
and placing the virtual object into the three-dimensional image of the shopping space according to the object information.
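The last step of claim 30 places the virtual object into the three-dimensional image of the shopping space according to the detected object information. A toy example of such placement logic, assuming the object information is an axis-aligned box for a detected table, is sketched below; the box format, sizes, and "rest on top" rule are hypothetical.

```python
import numpy as np

# Hypothetical detection output for a table in the shopping space:
# axis-aligned box given as centre (x, y, z) and size (dx, dy, dz), in metres.
table = dict(center=np.array([2.0, 1.0, 0.4]), size=np.array([1.2, 0.8, 0.8]))

# A virtual product to insert, with its own extents.
product_size = np.array([0.2, 0.2, 0.3])

# Place the virtual object on the detected surface: same x/y centre, resting on
# top of the table (table centre z + half table height + half product height).
product_center = table["center"].copy()
product_center[2] = table["center"][2] + table["size"][2] / 2 + product_size[2] / 2
print(product_center)   # pose to use when compositing the virtual object into the 3D shopping image
```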
31. A weather prediction method, comprising:
acquiring a three-dimensional space image of a target cloud layer;
according to the three-dimensional space image, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels;
taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target cloud layer through the two-dimensional feature extraction model;
and determining weather information from the two-dimensional feature image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information.
32. A virtual shopping device, comprising:
a feature data determination unit, configured to determine, according to the shopping environment point cloud data, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generate a two-dimensional feature image of the shopping space through the two-dimensional feature extraction model;
an object information determination unit configured to determine object information in the shopping space from a two-dimensional feature image including first inter-voxel neighborhood information and second inter-voxel neighborhood information;
and the virtual object adding unit is used for placing the virtual object into the three-dimensional image of the shopping space according to the object information.
33. An apparatus, comprising:
a processor; and
a memory for storing a program implementing a virtual shopping method, wherein after the apparatus is powered on and the processor runs the program of the method, the apparatus performs the following steps: according to the point cloud data of the shopping environment, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in a shopping space and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the shopping space through the two-dimensional feature extraction model; determining object information in the shopping space according to the two-dimensional feature image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information; and placing the virtual object into the three-dimensional image of the shopping space according to the object information.
34. A weather prediction apparatus, comprising:
an image acquisition unit, configured to acquire a three-dimensional space image of a target cloud layer;
a feature data determination unit, configured to determine, according to the three-dimensional spatial image, first feature data of a plurality of first voxels corresponding to a first spatial resolution in a target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels, respectively, and use the first feature data and the second feature data as third feature data of the first voxels;
a two-dimensional feature extraction unit, configured to use the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generate a two-dimensional feature image of the target cloud layer through the two-dimensional feature extraction model;
and a weather information determination unit, configured to determine weather information according to the two-dimensional feature image comprising the first inter-voxel neighborhood information and the second inter-voxel neighborhood information.
35. An apparatus, comprising:
a processor; and
a memory for storing a program implementing the weather prediction method, wherein after the apparatus is powered on and the processor runs the program of the method, the apparatus performs the following steps: acquiring a three-dimensional space image of a target cloud layer; according to the three-dimensional space image, determining first feature data of a plurality of first voxels corresponding to a first spatial resolution in the target cloud layer and second feature data of a plurality of second voxels corresponding to at least one second spatial resolution in the first voxels respectively, and taking the first feature data and the second feature data as third feature data of the first voxels; taking the third feature data of the first voxel as input data of a two-dimensional feature extraction model with the network depth corresponding to the first spatial resolution, and generating a two-dimensional feature image of the target cloud layer through the two-dimensional feature extraction model; and determining weather information from the two-dimensional feature image comprising first inter-voxel neighborhood information and second inter-voxel neighborhood information.
CN201910746537.5A 2019-08-12 2019-08-12 Object detection method, device and equipment Pending CN112446227A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910746537.5A CN112446227A (en) 2019-08-12 2019-08-12 Object detection method, device and equipment
PCT/CN2020/107738 WO2021027710A1 (en) 2019-08-12 2020-08-07 Method, device, and equipment for object detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910746537.5A CN112446227A (en) 2019-08-12 2019-08-12 Object detection method, device and equipment

Publications (1)

Publication Number Publication Date
CN112446227A true CN112446227A (en) 2021-03-05

Family

ID=74570909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910746537.5A Pending CN112446227A (en) 2019-08-12 2019-08-12 Object detection method, device and equipment

Country Status (2)

Country Link
CN (1) CN112446227A (en)
WO (1) WO2021027710A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116047537A (en) * 2022-12-05 2023-05-02 北京中科东信科技有限公司 Road information generation method and system based on laser radar
TWI814500B (en) * 2022-07-22 2023-09-01 鴻海精密工業股份有限公司 Method for reducing error of a depth estimation model, device, equipment and storage media

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11636685B1 (en) * 2020-12-18 2023-04-25 Zoox, Inc. Multi-resolution top-down segmentation
CN113223091B (en) * 2021-04-29 2023-01-24 达闼机器人股份有限公司 Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN113763262A (en) * 2021-07-27 2021-12-07 华能伊敏煤电有限责任公司 Application method of vehicle body filtering technology in point cloud data of automatic driving mine truck
CN113743395A (en) * 2021-08-31 2021-12-03 中科海微(北京)科技有限公司 Method, equipment and device for reading instrument
CN113808111A (en) * 2021-09-18 2021-12-17 广州幻境科技有限公司 Three-dimensional virtual reconstruction method and system for medical image
CN114708230B (en) * 2022-04-07 2022-12-16 深圳市精明检测设备有限公司 Vehicle frame quality detection method, device, equipment and medium based on image analysis

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150243031A1 (en) * 2014-02-21 2015-08-27 Metaio Gmbh Method and device for determining at least one object feature of an object comprised in an image
CN109118564B (en) * 2018-08-01 2023-09-19 山东佳音信息科技有限公司 Three-dimensional point cloud marking method and device based on fusion voxels
CN110059608B (en) * 2019-04-11 2021-07-06 腾讯科技(深圳)有限公司 Object detection method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI814500B (en) * 2022-07-22 2023-09-01 鴻海精密工業股份有限公司 Method for reducing error of a depth estimation model, device, equipment and storage media
CN116047537A (en) * 2022-12-05 2023-05-02 北京中科东信科技有限公司 Road information generation method and system based on laser radar
CN116047537B (en) * 2022-12-05 2023-12-26 北京中科东信科技有限公司 Road information generation method and system based on laser radar

Also Published As

Publication number Publication date
WO2021027710A1 (en) 2021-02-18

Similar Documents

Publication Publication Date Title
CN112446227A (en) Object detection method, device and equipment
JP7430277B2 (en) Obstacle detection method and apparatus, computer device, and computer program
WO2020253121A1 (en) Target detection method and apparatus, intelligent driving method and device, and storage medium
CN113128348B (en) Laser radar target detection method and system integrating semantic information
CN113819890B (en) Distance measuring method, distance measuring device, electronic equipment and storage medium
US8199977B2 (en) System and method for extraction of features from a 3-D point cloud
CN108509820B (en) Obstacle segmentation method and device, computer equipment and readable medium
CN113284163B (en) Three-dimensional target self-adaptive detection method and system based on vehicle-mounted laser radar point cloud
CN111257882B (en) Data fusion method and device, unmanned equipment and readable storage medium
WO2020233436A1 (en) Vehicle speed determination method, and vehicle
WO2024012211A1 (en) Autonomous-driving environmental perception method, medium and vehicle
Yoo et al. Real-time rear obstacle detection using reliable disparity for driver assistance
CN114463736A (en) Multi-target detection method and device based on multi-mode information fusion
GB2610410A (en) Incremental dense 3-D mapping with semantics
CN116310673A (en) Three-dimensional target detection method based on fusion of point cloud and image features
CN115311646A (en) Method and device for detecting obstacle
CN116246119A (en) 3D target detection method, electronic device and storage medium
CN114118247A (en) Anchor-frame-free 3D target detection method based on multi-sensor fusion
CN114648639B (en) Target vehicle detection method, system and device
Xiong et al. A 3d estimation of structural road surface based on lane-line information
CN116152800A (en) 3D dynamic multi-target detection method, system and storage medium based on cross-view feature fusion
CN116189138A (en) Visual field blind area pedestrian detection algorithm based on vehicle-road cooperation
WO2022052853A1 (en) Object tracking method and apparatus, device, and a computer-readable storage medium
CN115497061A (en) Method and device for identifying road travelable area based on binocular vision
US11544899B2 (en) System and method for generating terrain maps

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230711

Address after: Room 437, Floor 4, Building 3, No. 969, Wenyi West Road, Wuchang Subdistrict, Yuhang District, Hangzhou City, Zhejiang Province

Applicant after: Wuzhou Online E-Commerce (Beijing) Co.,Ltd.

Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK

Applicant before: ALIBABA GROUP HOLDING Ltd.
