WO2022126523A1 - Object detection method, device, movable platform, and computer-readable storage medium - Google Patents

Object detection method, device, movable platform, and computer-readable storage medium

Info

Publication number
WO2022126523A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
size
network layer
object detection
target
Prior art date
Application number
PCT/CN2020/137299
Other languages
French (fr)
Chinese (zh)
Inventor
蒋卓键
孙扬
陈晓智
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/137299
Publication of WO2022126523A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Definitions

  • the present application relates to the technical field of object detection, and in particular, to an object detection method, device, movable platform, and computer-readable storage medium.
  • Movable platforms such as unmanned vehicles need to use sensors to sense the surrounding environment, and control the movable platform according to the information obtained by sensing the surrounding objects, so that the movable platform can work safely and reliably. How to accurately detect the object information around the movable platform has become an urgent technical problem to be solved.
  • the present application provides an object detection method, a device, a movable platform and a computer-readable storage medium, to address the urgent need in the related art to improve object detection accuracy.
  • an object detection method including:
  • a plurality of sampling points are obtained by sampling the space through the sensor of the movable platform
  • the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer and a position detection network layer;
  • the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network
  • the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, and the size information is used to indicate the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is used to detect target sampling points of the same object corresponding to the plurality of sampling points based on the feature information and size information;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • an object detection device comprising: a processor and a memory storing a computer program
  • the processor implements the following steps when executing the computer program:
  • the space is sampled by the sensor to obtain multiple sampling points to be identified;
  • a plurality of sampling points are obtained by sampling the space through the sensor of the movable platform
  • the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer and a position detection network layer;
  • the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network
  • the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, and the size information is used to indicate the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is configured to detect target sampling points of the same object corresponding to the plurality of sampling points based on the feature information and size information;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • a movable platform including:
  • a power system mounted within the body for powering the movable platform
  • the object detection device according to the aforementioned second aspect.
  • a computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the object detection method according to the foregoing first aspect.
  • the feature extraction network layer in the object detection model can be used to obtain the feature information of the sampling points; the size information extraction network layer is used to determine the size information of the sampling points according to their feature information, where the size information represents the probability that a sampling point belongs to an object whose size is within the target size range; the object detection network layer is used to detect the target sampling points of the same object among the plurality of sampling points based on the feature information and size information.
  • the size information can be extracted based on the feature information, which increases the object detection model's attention to objects whose size is within the target size range, and the position detection network layer can accurately detect the position information of the object based on the position information of the target sampling points, so that objects whose size is within the target size range can be better identified.
  • FIG. 1 is a schematic diagram of an object detection method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an object detection model according to an embodiment of the present application.
  • FIG. 3 is a hardware structure diagram of an object detection apparatus according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a movable platform according to an embodiment of the present application.
  • the movable platform in the embodiment of the present application may be a car, an unmanned aerial vehicle, an unmanned ship, or a robot, etc., wherein the car may be an unmanned vehicle or a manned vehicle, and the unmanned aerial vehicle may be a drone or another type of unmanned aircraft.
  • the movable platform is not limited to the movable platforms listed above, and can also be other movable platforms.
  • unmanned vehicles use on-board sensors to sense the surrounding environment of the vehicle, and control the steering and speed of the vehicle according to the object information obtained by sensing, so that the vehicle can drive on the road safely and reliably.
  • Vehicle sensors mainly include lidar, millimeter-wave radar, and vision sensors.
  • the process of recognizing an object may be to obtain data from a sensor and then input it to a trained object detection model, and the object detection model outputs an object recognition result.
  • the training process of the object detection model can be: firstly express a model through modeling, then evaluate the model by constructing an evaluation function, and finally optimize the evaluation function according to the sample data and the optimization method, and adjust the model to the optimum.
  • modeling is to convert practical problems into problems that can be understood by computers, that is, to convert practical problems into ways that computers can represent.
  • Modeling generally refers to the process of estimating the objective function of the model based on a large number of sample data.
  • evaluation is an indicator used to represent the quality of the model.
  • evaluation indicators will involve some evaluation indicators and the design of some evaluation functions.
  • evaluation indicators There will be targeted evaluation indicators in machine learning. For example, after the modeling is completed, a loss function needs to be designed for the model to evaluate the output error of the model.
  • the goal of optimization is the evaluation function. That is, the optimization method is used to optimize the evaluation function and find the model with the highest evaluation. For example, an optimization method such as gradient descent can be used to find the minimum value (optimal solution) of the output error of the loss function, and adjust the parameters of the model to the optimum.
  • existing object detection models can already achieve very good results and detect objects with high accuracy, but the inventors found that their accuracy still cannot reach a perfect 100% state. From the perspective of business scenarios, for example in the field of vehicle driving, in some extreme scenarios a subtle defect in the object detection model may have serious consequences for the safe driving of vehicles. From a technical point of view, further eliminating subtle defects on top of the existing high accuracy is extremely challenging, because in machine learning, as mentioned above, there are many links from modeling to training: the selection and processing of sample data, the design of data features, the design of the model, the design of the loss function, the design of the optimization method, and so on; a subtle difference in any link can lead to a subtle defect in detection accuracy.
  • the inventors of the present application focused their research on objects to be detected.
  • object detection often focuses on large-sized objects such as people, vehicles, roads, or trees, and these larger-sized objects in the sample data often have a large amount of data.
  • the model will tend to be globally optimal during the training process.
  • the features of objects with larger sizes are often more obvious and easier to be noticed.
  • the features of objects with small sizes are relatively subtle and struggle to attract the model's attention, which biases the model toward extracting the features of large objects; this bias eventually produces models that recognize large objects well but small objects poorly, and this is one of the reasons for the subtle flaws in object detection models.
  • FIG. 1 is a flowchart of an object detection method provided by the embodiment of the present application, including the following steps:
  • step 102: a plurality of sampling points are obtained by sampling the space through the sensor of the movable platform;
  • step 104: an object detection model is used to detect the target sampling points of the same object corresponding to the plurality of sampling points, and the position information of the object is detected according to the position information of the target sampling points.
  • FIG. 2 it is a schematic diagram of an object detection model provided by an embodiment of the present application, wherein the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer and a position detection network layer ;
  • the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network
  • the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, and the size information is used to indicate the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is configured to detect target sampling points of the same object corresponding to the plurality of sampling points based on the feature information and size information;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • the object detection model has two information extraction network layers: a feature extraction network layer and a size information extraction network layer.
  • the feature extraction network layer is used to obtain the feature information of the sampling point
  • the size information extraction network layer can further determine the size information of the sampling point on the basis of the feature information, where the size information indicates the probability that the sampling point belongs to an object whose size is within the target size range, so that the object detection model pays attention not only to the feature information of objects but also to the size information of objects belonging to the target size range.
  • because the size information represents objects of a specific size, the model gives more consideration to objects of that size, enabling further accurate identification of objects of that specific size.
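As a rough illustration (not the patent's actual implementation), the four-layer pipeline described above can be sketched in Python, with each network layer reduced to a simple numpy stub; all function and weight names here are hypothetical stand-ins for trained networks:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detect(points, w_feat, w_size, w_conf, conf_thresh=0.5):
    """Toy stand-in for the four network layers; `points` is an (N, 3) array
    of sampled coordinates, and the `w_*` matrices play the role of trained weights."""
    feats = np.tanh(points @ w_feat)            # feature extraction network layer
    # size information extraction layer: P(point belongs to a target-size object)
    size_prob = sigmoid(feats @ w_size).ravel()
    # object detection layer: per-point confidence, boosted by the size probability
    conf = sigmoid(feats @ w_conf).ravel() * size_prob
    target = points[conf > conf_thresh]         # target sampling points of one object
    if len(target) == 0:
        return None
    return target.mean(axis=0)                  # position detection layer: e.g. centroid
```

The point of the sketch is only the data flow: the size branch consumes the features, and its output modulates the detection confidence before position estimation.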
  • the method of the embodiment of the present application can be applied to a movable platform, and the movable platform recognizes an object during the movement process, so as to perform movement control based on the recognition result.
  • the movable platform in this embodiment of the present application may include: a car, an unmanned aerial vehicle, an unmanned ship, or a robot, wherein the car may be an unmanned vehicle or a manned vehicle, and the unmanned aerial vehicle may be a drone or another type of unmanned aircraft.
  • the movable platform is not limited to the movable platforms listed above, and can also be other movable platforms.
  • an object detection model may be pre-trained, and the object detection model may be set in the movable platform, or may be set in a server connected to the movable platform.
  • the object detection model may be pre-trained by the business party, and the trained object detection model may be stored in the movable platform, so that the movable platform can recognize the object.
  • the mobile platform may also send the collected data to the server, and the object detection model configured on the server uses the collected data to identify the object, and then returns the identification result to the mobile platform.
  • the business party may prepare sample data for training in advance.
  • the sample data may include: data belonging to objects whose size is within the target size range.
  • for convenience, an object of this kind is referred to as the second type of object.
  • the second type of object has a specific size, and the specific size can be flexibly configured based on actual business requirements, which is not limited in this embodiment.
  • the target size range in this embodiment may include: a size range smaller than a preset target size threshold, and the preset target size threshold can be flexibly configured according to business needs ;
  • the sample data also includes data of objects that do not belong to the target size range, which is referred to as the first type of objects in this embodiment.
  • objects of the second type have a specific size relative to objects of the first type, e.g., a smaller size.
  • the data of the second type of object is added to the sample data, which can enhance the recognition of the second type of object by the object detection model.
  • Model training in this embodiment may be supervised training or unsupervised training.
  • a supervised training method can be used to improve the training speed, and the real values can be marked in the sample data.
  • the sample data is marked with the position information of the object.
  • the position information of the object may include one or more kinds of information, and the specific information may be configured according to business needs.
  • the position information of the object may include any of the following: size information, coordinate information or direction information.
  • the sample data may be point cloud data for multiple sampling points or image data for multiple sampling points.
  • point cloud data can be collected by sensors such as lidar or millimeter wave radar.
  • unmanned vehicles use on-board sensors to sense the surrounding environment of the vehicle, and control the steering and speed of the vehicle according to the road, vehicle position and object information obtained by sensing, so that the vehicle can drive safely and reliably on the road.
  • Vehicle sensors can include lidar, millimeter-wave radar, and vision sensors.
  • the basic principle of lidar is to actively transmit a laser pulse signal toward the detected object, receive the reflected pulse signal, and calculate the depth information of the detected object from the time difference between the transmitted and received signals; since the emission direction is known, the angle information of the measured object relative to the lidar is also obtained; combining the depth and angle information yields point cloud data.
  • point cloud data can be converted to image data.
  • the vehicle-mounted sensor may further include a plurality of cameras that can collect image data from multiple viewing angles and a plurality of depth cameras that can collect image data with depth information from multiple viewing angles.
  • Image data can also be collected by image acquisition sensors such as cameras.
  • the above-mentioned sample data may be data obtained after feature engineering of the original data.
  • feature engineering refers to the process of finding physically meaningful features in the original data to participate in model training; this process involves data cleaning, data dimensionality reduction, feature extraction, feature normalization, feature evaluation and screening, feature dimensionality reduction, or feature encoding, etc.
  • the point cloud data is unstructured data and needs to be processed into a format that can be input to the object detection model.
  • the point cloud data is processed to obtain the point cloud density corresponding to each voxel of the point cloud data.
  • the point cloud density corresponding to each voxel of the point cloud data is used as the input of the object detection model.
  • the point cloud data processing method may be point cloud three-dimensional grid processing.
  • the point cloud data is divided into grids to obtain multiple voxels of the point cloud data, and for each voxel, the ratio of the number of points it contains to the total number of points in the point cloud data constitutes the point cloud density of that voxel.
  • the point cloud density represents the number of point clouds contained in the voxel
  • when the point cloud density is large, the voxel has a greater probability of corresponding to an object, so the point cloud density corresponding to each voxel of the point cloud data can be used as feature information of objects; processing the irregular point cloud into a regular representation also better represents the contour information of the object.
  • with grid data including point cloud density as model input, the number of points in each 3D grid cell can be distinguished, improving the accuracy of object detection.
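The voxelization and density computation described above can be sketched as follows; `voxel_densities` and its `voxel_size` parameter are illustrative names, not taken from the patent:

```python
import numpy as np

def voxel_densities(points, voxel_size=0.5):
    """Divide a point cloud into a 3D grid and return, per occupied voxel,
    the fraction of all points that fall inside it (the 'point cloud density')."""
    idx = np.floor(points / voxel_size).astype(int)        # voxel index of each point
    voxels, counts = np.unique(idx, axis=0, return_counts=True)
    density = counts / len(points)                         # ratio to the total point count
    return {tuple(v): d for v, d in zip(voxels, density)}
```

The densities over all occupied voxels sum to 1, and the resulting regular grid is what would be fed to the object detection model.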
  • in other examples, the sample data is image data containing real objects, including image data containing the first type of objects and image data containing the second type of objects, from which the object detection model can automatically learn object features.
  • the object detection model can be obtained by training a machine learning model using the sample data.
  • the machine learning model may be a neural network model or the like, such as a deep learning-based neural network model.
  • the specific structural design of the object detection model is one of the important aspects of the training process.
  • the structure of the object detection model at least includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer and a position detection network layer.
  • the feature extraction network layer is used for acquiring feature information of the sampling points based on a deep learning neural network.
  • the object detection model of this embodiment additionally adds a size information extraction network layer, so as to enhance the model's recognition of the second type of objects by extracting the size information of the second type of objects.
  • the size information extraction network layer can be independent of the feature extraction network layer, that is, two independent neural networks are used to extract the two types of information respectively; this implementation requires training two independent neural networks, and the algorithm overhead is relatively large.
  • alternatively, it can be implemented in the form of a backbone network and a branch network: the backbone network receives the input data and extracts the feature information of the object, and the branch network is dedicated to extracting the size information, so that the size information is further extracted on the basis of the feature information; the execution overhead is correspondingly reduced, and model execution efficiency is improved.
  • the feature may be the pixel value of the image data; for point cloud data, the feature may be the point cloud density corresponding to each voxel of the point cloud data.
  • the size information is used to represent the probability that each voxel of the point cloud data belongs to an object whose size is within the target size range, or the probability that each pixel of the image data belongs to an object whose size is within the target size range.
  • the size information is used to: enhance the recognition by the object detection network layer that the sampling point belongs to an object whose size is within the target size range.
  • the feature information may be in the form of a feature map, and the feature map may have a specified size, such as H*W, where H and W represent length and width respectively, and the specific values can be flexibly configured as needed; feature map A is taken as an example below.
  • size information is further extracted, which can be the size information extracted from each position in feature map A.
  • the size information represents the probability that the sampling point belongs to an object whose size is within the target size range, which means that each position in feature map A carries the probability that the point belongs to such an object.
  • the extracted size information can be combined with the feature information for object recognition, and the combination of the two can be implemented in various ways.
  • the two are independent, and they are used as two types of data for object recognition.
  • the feature information and the size information can also be added, and the target sampling point of the same object corresponding to the multiple sampling points can be detected according to the addition result.
  • the size information can be used as additional information of the feature information.
  • a feature map A representing feature information is extracted, and the size information can be added to feature map A to form a new feature map B, and object recognition can be performed based on feature map B.
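An illustrative sketch of the element-wise addition described above; the map names and the use of a sigmoid to produce the size probabilities are assumptions for the example:

```python
import numpy as np

# Add a per-position size-information map onto feature map A,
# producing the new feature map B used for object recognition.
H, W = 4, 4
rng = np.random.default_rng(0)
feature_map_a = rng.standard_normal((H, W))           # feature map A (H*W)
size_info = 1.0 / (1.0 + np.exp(-feature_map_a))      # per-position probability in (0, 1)
feature_map_b = feature_map_a + size_info             # combined feature map B
```

Because the size term lies in (0, 1), positions likely to belong to target-size objects receive a larger boost in B than unlikely ones.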
  • the size information extraction network layer can be implemented with various network structures.
  • since a feature map is used to represent the feature information, the layer can be implemented using a convolutional neural network.
  • the size information extraction network layer includes a convolution layer, for example, there may be at least two convolution layers.
  • the convolutional layer in the size information network layer has at least two layers.
  • the size information may be obtained by performing at least two convolution operations on the feature information by the at least two convolution layers.
  • the loss function is also called the cost function.
  • the real value is marked in the sample data, and the loss function is used to estimate the error between the detection value of the model and the real value.
  • the loss function is very important to the recognition accuracy of the model, and what kind of loss function to design based on the existing sample data and the requirements of the model is a difficult problem.
  • some existing loss functions such as logarithmic loss functions, squared loss functions, exponential loss functions, 0/1 loss functions, etc. can be used to form loss functions of corresponding scenarios.
  • the object detection network layer is used to: obtain the confidence that the sampling point belongs to an object based on the feature information and size information of the sampling point, and use the confidence that the sampling point belongs to the object to detect A target sampling point of the same object corresponding to the plurality of sampling points.
  • the loss functions used in the training process of the object detection model include: an object position information loss sub-function, an object confidence loss sub-function and a size information coefficient loss sub-function.
  • the loss function additionally adds a size information coefficient loss sub-function, so that the model pays attention to the second type of objects and can distinguish them more clearly.
  • the optimization objective of the object position information loss function includes: reducing the difference between the object position information obtained from the sample data by the object detection model and the object position information calibrated by the sample data.
  • the position information of the object includes size information, position information or orientation information of the object, and based on this, the position difference includes: the size information, position information or orientation information of the object obtained by the object detection model from the sample data, The difference from the size information, position information or direction information of the object calibrated by the sample data respectively.
  • the position information of an object is specifically expressed as (x, y, z, l, h, w, θ), where x, y, z represent the position (that is, the coordinate information of the point), l, h, w represent the length, height, and width, and θ represents the orientation; then in the object position information loss sub-function, the sub-function used to describe the position difference can include:
  • f_loc(x_i) represents the position information of the object identified by the object detection model.
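The patent does not fix the exact form of this sub-function; one plausible sketch, assuming a simple mean squared error over the (x, y, z, l, h, w, θ) tuple:

```python
import numpy as np

def loc_loss(pred_boxes, gt_boxes):
    """One plausible form of the position-information loss sub-function:
    mean squared error between predicted and labelled
    (x, y, z, l, h, w, theta) tuples. The squared-error form is an
    assumption; the patent only describes minimizing the position difference."""
    return float(np.mean((pred_boxes - gt_boxes) ** 2))
```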
  • the loss function of this embodiment further includes an object confidence loss sub-function, and the optimization goal of the object confidence loss sub-function includes: improving the confidence of the object detected by the object detection model from the sample data; as an example:
  • f_pred(x_i) represents the confidence of the object detected by the object detection model from the sample data, that is, the probability of recognizing the object.
  • a size information coefficient loss sub-function is also added to the loss function in this embodiment, which is used to enhance the feature values of sampling points belonging to objects whose size is within the target size range; in some examples, the optimization goal of this sub-function includes improving the confidence with which the object detection model detects objects whose size is within the target size range from the sample data.
  • using the size information, the object detection model detects, for each sampling point, whether the point belongs to an object whose size is within the target size range; the size information coefficient loss sub-function therefore makes the feature values at positions corresponding to small objects large and those of non-small objects small, so that the network can distinguish small objects more clearly:
  • f_seg(x_k) represents the confidence with which the object detection model detects objects whose size is within the target size range from the sample data.
  • the loss function used in the training process of the object detection model of this embodiment may include the combination of the above three sub-functions.
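A minimal sketch of such a combined loss; the per-term weights are an assumption, since the patent only names the three sub-functions:

```python
def total_loss(l_loc, l_conf, l_seg, w_loc=1.0, w_conf=1.0, w_seg=1.0):
    """Combined training loss: weighted sum of the object position loss,
    the object confidence loss, and the size information coefficient loss.
    The weights are hypothetical tuning knobs, not taken from the patent."""
    return w_loc * l_loc + w_conf * l_conf + w_seg * l_seg
```

Increasing `w_seg` would be one way to push the model's attention further toward objects within the target size range.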
  • in the training process, it is necessary to use an optimization method to optimize the evaluation function and find the model with the highest evaluation; for example, the minimum value (optimal solution) of the output error of the loss function can be found through optimization methods such as gradient descent, and the parameters of the model adjusted to the optimum, that is, the optimal coefficients of each network layer in the model can be solved.
  • the process of solving may be to solve for gradients that adjust model parameters by computing the output of the model and the error value of the loss function.
  • a back-propagation function can be called to calculate the gradient, and the calculation result of the loss function can be back-propagated into the object detection model, so that the object detection model can update model parameters.
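A toy illustration of this optimization step, using plain gradient descent on a one-parameter quadratic loss as a stand-in for full back-propagation through the network layers:

```python
def gradient_descent(grad_fn, theta, lr=0.1, steps=100):
    """Repeatedly update the parameter along the negative gradient of the loss,
    driving the loss output error toward its minimum value."""
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)   # parameter update step
    return theta

# Example: loss(theta) = (theta - 3)^2 has gradient 2*(theta - 3)
# and its minimum (optimal solution) at theta = 3.
theta_opt = gradient_descent(lambda t: 2 * (t - 3), theta=0.0)
```

In a real training run, `grad_fn` would be the gradient produced by back-propagating the loss through the object detection model.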
  • the solution of the loss function described above can be solved using a stand-alone solver.
  • a network branch may be set on the basis of the backbone network to calculate the loss function of the network.
  • the loss function can be divided into the above three sub-functions: the object position information loss sub-function, the object confidence loss sub-function and the size information coefficient loss sub-function.
  • the corresponding three network branches can be set to solve separately.
  • the object detection model is obtained after the training, and the obtained object detection model can also be tested by using the test sample to check the recognition accuracy of the object detection model.
  • the finally obtained object detection model can be set in the movable platform or the server.
  • the space is sampled by the sensors of the movable platform to obtain multiple sampling points to be identified, and the object detection model detects the position information of the object.
  • the sensors may be lidars, millimeter-wave radars, vision sensors, etc.; correspondingly, point cloud data of multiple sampling points to be identified may be collected, and the point cloud data can also be converted into image data.
  • the sensor may also be an image sensor, and correspondingly, image data of multiple sampling points to be identified may be collected.
  • the point cloud data can be divided into a grid to obtain multiple voxels. After calculating the point cloud density corresponding to each voxel, the point cloud density corresponding to each voxel is input into the object detection model for identification.
  • the data input to the object detection model includes the pixel values of the image data and/or the point cloud density corresponding to each voxel; the voxels of the point cloud data are obtained by dividing the point cloud data into a grid.
  • the size information of the sampling points is further extracted; since the size information represents the probability that a sampling point belongs to an object whose size is within the target size range, extracting the size information of the sampling points in this embodiment makes the object detection model better at recognizing objects of a certain size.
  • objects whose size is within the target size range are referred to as the second type of objects; in some examples, the target size range includes a size range smaller than a preset target size threshold, that is, the second type of objects are smaller-sized objects.
  • the feature information can be in the form of a feature map, and the feature map can have a specified size, such as the size of H*W, where H and W represent length and width respectively, and the specific values can be flexibly configured as needed;
  • the size information is further extracted on the basis of the feature map A, which can be the size information extracted from each position in the feature map A.
  • the size information represents the probability that the sampling point belongs to an object whose size is within the target size range, that is, the probability that each position in feature map A belongs to such an object.
  • the extracted size information can be combined with the feature information for object detection, and the combination of the two can be implemented in various ways.
  • the two may remain independent and be used as two types of data for object recognition.
  • the feature information and the size information can also be added, and the target sampling point of the same object corresponding to the multiple sampling points can be detected according to the addition result.
  • the size information can be used as additional information of the feature information.
  • a feature map A representing the feature information is extracted, and the size information can be added to feature map A to form a new feature map B; object recognition can then be performed based on feature map B.
  • the size information extraction network layer can be implemented with various network structures.
  • a feature map can be used to represent the feature information, which can be implemented using a convolutional neural network.
  • the size information extraction network layer includes a convolution layer; for example, there may be at least two convolution layers.
  • the size information may be obtained by performing at least two convolution operations on the feature information by the at least two convolution layers.
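The two-convolution branch above can be sketched as follows. This is a minimal NumPy illustration, not the patent's actual network: the use of 1x1 convolutions (implemented here as per-position matrix multiplies), the ReLU between them, the sigmoid output, and the channel sizes are all assumptions.

```python
import numpy as np

# Hypothetical sketch of the size information extraction branch: two 1x1
# convolutions over an H*W*C feature map, ending in a sigmoid so that each
# position yields a probability of belonging to an object whose size is
# within the target size range. Kernel size and channel counts are assumed.

def size_info_branch(feature_map, w1, w2):
    # feature_map: (H, W, C); w1: (C, C_mid); w2: (C_mid, 1)
    hidden = np.maximum(feature_map @ w1, 0.0)   # first convolution + ReLU
    logits = hidden @ w2                         # second convolution
    return 1.0 / (1.0 + np.exp(-logits))         # per-position probability map
```

The output has shape (H, W, 1), one probability per feature-map position, matching the description that each position generates a piece of size information.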
  • the feature extraction network layer is used to obtain the feature information of the sampling points based on the neural network;
  • the size information extraction network layer is used to determine the size information of a sampling point according to the feature information of that sampling point, where the size information represents the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is used to detect, based on the feature information and the size information, the target sampling points of the same object corresponding to the plurality of sampling points;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • the object detection network layer is configured to: obtain, based on the feature information and size information of a sampling point, a confidence that the sampling point belongs to an object, and use that confidence to detect the target sampling points of the same object among the multiple sampling points.
  • the model in this embodiment can extract specific size information; since the size information represents the probability that a sampling point belongs to an object whose size is within the target size range, the object detection model pays more attention to sampling points that belong to such objects, and thus can better detect objects whose size is within the target size range.
  • the object recognition accuracy can still be guaranteed under the condition of sparse point clouds.
  • the point cloud obtained by a movable platform using a single sensor is generally sparse, and under sparse point cloud conditions the object detection effect is usually poor.
  • related technologies use a combination of multiple sensors to obtain point clouds; for example, while lidar detection is used to obtain point clouds, other sensors are used for auxiliary fusion to obtain dense, high-quality point clouds. However, the data fusion process between sensors is complicated, and multiple sensors also increase hardware cost.
  • in this embodiment, object recognition accuracy can be guaranteed, so that the movable platform does not need to use other sensors for auxiliary fusion.
  • a deep learning neural network is used to detect the position and confidence of three-dimensional objects; by adding a network branch to the neural network and improving the training strategy of the deep learning algorithm, the branch is used to extract the size information of small objects, which finally makes the model friendlier to small object detection.
  • the point cloud is divided into a three-dimensional grid according to a certain resolution in the x, y, and z directions; that is, the three-dimensional space is voxelized.
  • the point cloud density in a voxel determines the point cloud feature corresponding to that voxel: if a voxel contains point cloud data, the point cloud density p at that position is calculated and the feature of the position is set to p; for empty voxels, the point cloud feature is set to 0. This generates input of a dimension that the neural network can receive.
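The voxelization step above can be sketched as follows. This is a minimal illustration under assumptions: the density here is defined as the fraction of all points falling in each voxel, and the grid origin, resolution, and shape are parameters the patent leaves open.

```python
import numpy as np

# Hypothetical voxelization sketch: bin points into a 3-D grid at a fixed
# resolution; each voxel's feature is its point density p, and empty voxels
# keep feature 0. The density definition (count / total points) is assumed.

def voxelize(points, grid_min, voxel_size, grid_shape):
    grid = np.zeros(grid_shape, dtype=np.float32)
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < grid_shape), axis=1)  # drop out-of-grid points
    for i, j, k in idx[inside]:
        grid[i, j, k] += 1.0
    return grid / max(len(points), 1)  # density p per voxel
```

The resulting dense array has a fixed shape regardless of how many points were sampled, which is what makes it a valid neural network input.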
  • the input data will first go through the feature extraction network layer in the object detection model, so that the features of the point cloud can be extracted to generate a feature map, which represents the feature information of the sampling points.
  • another network branch, that is, the size information extraction network layer, will then be connected to generate a new feature map, which represents the size information of the sampling points.
  • the network branch can perform two convolution operations, and each position of the feature map will generate a piece of size information; the size information carries stronger semantics for describing objects belonging to the target size range, characterizing the probability that the position belongs to an object whose size is within the target size range.
  • the extracted size information can be added to the feature map of the feature information to obtain a new feature map B; the addition operation serves to fuse stronger semantic information.
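The addition step above amounts to an element-wise sum. A minimal sketch, assuming the single-channel size map is broadcast across the feature channels (the patent does not fix the channel handling):

```python
import numpy as np

# Minimal fusion sketch: the size-information map is added element-wise to
# feature map A, producing the fused feature map B that the object detection
# network layer consumes. Broadcasting the (H, W, 1) size map over the C
# feature channels is an assumption.

def fuse(feature_map_a, size_map):
    # feature_map_a: (H, W, C); size_map: (H, W, 1)
    return feature_map_a + size_map  # feature map B
```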
  • the target sampling points of the same object corresponding to the plurality of sampling points can be detected based on the feature information and size information, and the position information of the object is then detected according to the position information of the target sampling points.
  • a series of candidate frames can be identified. Each candidate frame may correspond to an object.
  • the sampling points in the candidate frame are the target sampling points corresponding to the same object.
  • since the confidence represents the probability of belonging to an object, the probability that each candidate frame belongs to an object can be determined; the probabilities of the candidate frames are sorted and filtered according to a set threshold to identify objects, and the position information of each object can then be detected according to the position information of the target sampling points in its candidate frame, yielding the final detection result.
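The sorting-and-thresholding step described above can be sketched as below. The candidate-frame representation (a probability paired with a frame identifier) and the threshold value are illustrative assumptions.

```python
# Hypothetical sketch of candidate-frame filtering: frames are ranked by
# their probability of belonging to an object and those below a set
# threshold are discarded. Frame structure and threshold are assumptions.

def filter_candidates(frames, threshold=0.5):
    """frames: list of (probability, frame_id) pairs; returns the kept
    frames sorted by descending probability."""
    ranked = sorted(frames, key=lambda f: f[0], reverse=True)
    return [f for f in ranked if f[0] >= threshold]
```

In a full pipeline this step would typically be followed by non-maximum suppression to merge overlapping frames, though the text here only describes sorting against a threshold.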
  • the position information of the object identified by the object detection model of this embodiment can be used for automatic movement decision of the movable platform, for example, it can be used for automatic driving decision of a car, automatic flight decision of an unmanned aerial vehicle, and the like.
  • the foregoing method embodiments may be implemented by software, and may also be implemented by hardware or a combination of software and hardware.
  • taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the device in which it is located reading the corresponding computer program instructions from non-volatile memory into memory for execution.
  • FIG. 3 is a hardware structure diagram for implementing the object detection apparatus 300 of this embodiment; in addition to the processor 301, the memory 302, and the non-volatile memory 303 shown in FIG. 3, the device used to implement the object detection method in this embodiment may also include other hardware according to its actual function, which will not be repeated here.
  • the processor 301 implements the following steps when executing the computer program:
  • a plurality of sampling points are obtained by sampling the space through the sensor of the movable platform
  • the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer and a position detection network layer;
  • the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network
  • the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, and the size information is used to indicate the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is configured to detect target sampling points of the same object corresponding to the plurality of sampling points based on the feature information and size information;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • the target size range includes a size range smaller than a preset target size threshold.
  • the data of the multiple sampling points obtained by spatial sampling includes: point cloud data of the multiple sampling points and/or image data of the multiple sampling points.
  • the data input to the object detection model includes: a point cloud density corresponding to each voxel of the point cloud data and/or a pixel value of the image data.
  • the voxels are obtained by rasterizing the point cloud data.
  • the size information is used to characterize the probability that each voxel of the point cloud data belongs to an object whose size is within the target size range, or the size information is used to characterize that each pixel of the image data belongs to Probability of an object whose size is within the target size range.
  • the size information is used to: enhance the object detection network layer's recognition of sampling points that belong to objects whose size is within a target size range.
  • the object detection network layer is configured to: add the feature information and the size information, and use the addition result to detect a target sampling point of the same object corresponding to the plurality of sampling points.
  • the feature extraction network layer is used for acquiring feature information of the sampling points based on a deep learning neural network.
  • the size information extraction network layer is a branch of the feature extraction network layer.
  • the dimension information extraction network layer includes a convolutional layer.
  • the size information is obtained by using the convolution layer to perform a convolution operation on the feature information.
  • the size information is obtained by performing at least two convolution operations on the feature information using at least two convolutional layers.
  • the object detection network layer is configured to: obtain, based on the feature information and size information of a sampling point, a confidence that the sampling point belongs to an object, and use that confidence to detect the target sampling points of the same object among the multiple sampling points.
  • the loss functions used in the training process of the object detection model include: an object position information loss sub-function, an object confidence loss sub-function, and a size information coefficient loss sub-function.
  • the optimization objective of the object position information loss function includes: reducing the difference between the object position information obtained from the sample data by the object detection model and the object position information calibrated by the sample data.
  • the optimization objective of the object confidence loss sub-function includes: improving the confidence of the object detected by the object detection model from the sample data.
  • the size information coefficient loss sub-function is used to enhance the feature values of sampling points belonging to objects whose size is within the target size range.
  • the optimization objective of the size information coefficient loss function includes: improving the confidence that the object detection model detects objects whose size is within the target size range from the sample data.
  • the training process of the object detection model includes back-propagating the calculation result of the loss function into the object detection model, so that the object detection model updates model parameters.
  • the location information of the object includes any of the following: size information, coordinate information or orientation information.
  • the apparatus is applied to a movable platform.
  • the point cloud data is acquired by using a lidar or a camera device with a depth information acquisition function configured on the movable platform.
  • the image data is acquired using a camera device disposed on the movable platform.
  • the movable platform includes: an unmanned aerial vehicle, a car, an unmanned boat, or a robot.
  • the detected location information of the object is used to: make an autonomous driving decision for the car.
  • the detected location information of the object is used to: make automatic flight decisions for the UAV.
  • an embodiment of the present application further provides a movable platform 400, including: a body 401; a power system 402 installed in the body 401 to provide power for the movable platform; and the object detection device 300 described in the foregoing embodiments.
  • the movable platform 400 is a vehicle, an unmanned aerial vehicle, an unmanned ship or a mobile robot.
  • the embodiments of this specification further provide a computer-readable storage medium, where several computer instructions are stored on the readable storage medium, and when the computer instructions are executed, the steps of the object detection method in any one of the embodiments are implemented.
  • Embodiments of the present specification may take the form of a computer program product embodied on one or more storage media having program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.
  • Computer-usable storage media includes permanent and non-permanent, removable and non-removable media, and storage of information can be accomplished by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.


Abstract

Provided are an object detection method, device, movable platform, and computer-readable storage medium. A feature extraction network layer in an object detection model can be used to obtain feature information of sampling points. A size information extraction network layer is used to determine, according to the feature information, size information of the sampling points, the size information characterizing the probability that a sampling point belongs to an object whose size is within a target size range. An object detection network layer is used to detect, on the basis of the feature information and size information, the target sampling points corresponding to the same object among a plurality of sampling points. Since size information can be extracted on the basis of the feature information, the object detection model's focus on sampling points belonging to objects within the target size range is improved, and the position detection network layer can accurately detect the position information of the object according to the position information of the target sampling points, enabling better identification of objects whose size is within the target size range.

Description

Object Detection Method, Device, Movable Platform, and Computer-Readable Storage Medium
Technical Field
The present application relates to the technical field of object detection, and in particular, to an object detection method, device, movable platform, and computer-readable storage medium.
Background
Movable platforms such as unmanned vehicles need to use sensors to sense the surrounding environment, and control the movable platform according to the information obtained by sensing the surrounding objects, so that the movable platform can work safely and reliably. How to accurately detect the object information around the movable platform has become an urgent technical problem to be solved.
Summary of the Invention
In view of this, the present application provides an object detection method, device, movable platform, and computer-readable storage medium, to solve the technical problem in the related art that object detection accuracy urgently needs to be improved.
In a first aspect, an object detection method is provided, including:
obtaining a plurality of sampling points by sampling a space through a sensor of a movable platform;
using an object detection model to detect target sampling points of the same object among the plurality of sampling points, and detecting the position information of the object according to the position information of the target sampling points;
wherein the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer, and a position detection network layer;
the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network;
the size information extraction network layer is used to determine the size information of the sampling points according to the feature information of the sampling points, the size information being used to characterize the probability that a sampling point belongs to an object whose size is within a target size range;
the object detection network layer is used to detect, based on the feature information and size information, the target sampling points of the same object among the plurality of sampling points;
the position detection network layer is used to detect the position information of the object according to the position information of the target sampling points.
In a second aspect, an object detection device is provided, including: a processor and a memory storing a computer program;
the processor implements the following steps when executing the computer program:
sampling a space through a sensor to obtain a plurality of sampling points to be identified;
obtaining a plurality of sampling points by sampling the space through a sensor of a movable platform;
using an object detection model to detect target sampling points of the same object among the plurality of sampling points, and detecting the position information of the object according to the position information of the target sampling points;
wherein the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer, and a position detection network layer;
the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network;
the size information extraction network layer is used to determine the size information of the sampling points according to the feature information of the sampling points, the size information being used to characterize the probability that a sampling point belongs to an object whose size is within a target size range;
the object detection network layer is used to detect, based on the feature information and size information, the target sampling points of the same object among the plurality of sampling points;
the position detection network layer is used to detect the position information of the object according to the position information of the target sampling points.
In a third aspect, a movable platform is provided, including:
a body;
a power system, installed in the body, for providing power for the movable platform; and
the object detection device according to the foregoing second aspect.
In a fourth aspect, a computer-readable storage medium is provided, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the object detection method according to the foregoing first aspect is implemented.
Applying the solution provided in this application: the feature extraction network layer in the object detection model can be used to obtain the feature information of the sampling points; the size information extraction network layer is used to determine, according to the feature information of the sampling points, the size information of the sampling points, the size information characterizing the probability that a sampling point belongs to an object whose size is within a target size range; and the object detection network layer is used to detect, based on the feature information and size information, the target sampling points of the same object among the plurality of sampling points. Since size information can be extracted on the basis of the feature information, the object detection model pays more attention to sampling points belonging to objects whose size is within the target size range, and the position detection network layer can accurately detect the position information of the object according to the position information of the target sampling points, enabling better identification of objects whose size is within the target size range.
Brief Description of the Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.
FIG. 1 is a schematic diagram of an object detection method according to an embodiment of the present application.
FIG. 2 is a schematic diagram of an object detection model according to an embodiment of the present application.
FIG. 3 is a hardware structure diagram of an object detection apparatus according to an embodiment of the present application.
FIG. 4 is a schematic diagram of a movable platform according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them.
The movable platform in the embodiments of the present application may be a car, an unmanned aerial vehicle, an unmanned ship, or a robot, where the car may be an unmanned vehicle or a manned vehicle, and the unmanned aerial vehicle may be a drone or another unmanned aircraft. Of course, the movable platform is not limited to those listed above and may be another movable platform.
Taking an unmanned vehicle as an example: an unmanned vehicle uses on-board sensors to sense the environment around the vehicle and, according to the object information obtained by sensing, controls the steering and speed of the vehicle so that it can drive safely and reliably on the road. On-board sensors mainly include lidar, millimeter-wave radar, and vision sensors. The object recognition process may be to acquire data through the sensors, input it into a trained object detection model, and have the object detection model output the recognition result.
The training process of the object detection model may be: first express a model through modeling, then evaluate the model by constructing an evaluation function, and finally optimize the evaluation function according to sample data and an optimization method, adjusting the model to the optimum.
Here, modeling means converting a practical problem into one a computer can understand, that is, into a representation a computer can work with. Modeling generally refers to the process of estimating the objective function of a model based on a large amount of sample data.
The goal of evaluation is to judge the quality of the established model. For the model built in the first step, the evaluation is an indicator used to represent the quality of the model. This involves the design of evaluation indicators and evaluation functions, and machine learning has targeted evaluation indicators. For example, after modeling is completed, a loss function needs to be designed for the model to evaluate its output error.
The target of optimization is the evaluation function; that is, an optimization method is used to solve the evaluation function optimally and find the model with the highest evaluation. For example, an optimization method such as gradient descent can be used to find the minimum value (optimal solution) of the output error of the loss function and adjust the parameters of the model to the optimum.
It can be understood that before training a machine learning model, a suitable parameter estimation method is first determined; this method is then used to estimate the parameters in the objective function of the model, thereby determining the final mathematical expression of the objective function.
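The gradient descent optimization mentioned above can be sketched in a few lines. The quadratic example loss, learning rate, and step count below are illustrative assumptions, not part of the patent's method.

```python
# A minimal gradient-descent sketch: repeatedly step a parameter against the
# gradient of a loss until the loss is (locally) minimized. The example loss
# f(x) = (x - 3)^2 and the learning rate are illustrative assumptions.

def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # move against the gradient
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

In model training the scalar `x` becomes the full parameter vector and the gradient is supplied by back-propagation, but the update rule is the same.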
已有的物体检测模型在物体检测上已经能够取得非常好的效果,能够以很高的准确性检测出物体,但发明人发现,物体检测模型的准确性仍然不能达到百分百的完美状态,从业务场景的角度,例如车辆驾驶领域在一些极端场景下,物体检测模型细微的缺陷对于车辆安全行驶可能带来严重的后果。从技术的角度,在已有较高准确性的基础上再进一步解决细微的缺陷是极具挑战和难度的,因为在机器学习领域,如上所述,从建模至训练阶段涉及非常多的环节,例如样本数据的选择与处理、数据特征的设计、模型的设计、损失函数的设计或优化方法的设计等等,任一环节的细微差别都是导致检测准确度细微缺陷的因素。The existing object detection models have been able to achieve very good results in object detection, and can detect objects with high accuracy, but the inventors found that the accuracy of the object detection models still cannot reach a 100% perfect state. From the perspective of business scenarios, for example, in the field of vehicle driving, in some extreme scenarios, subtle defects in the object detection model may have serious consequences for the safe driving of vehicles. From a technical point of view, it is extremely challenging and difficult to further solve subtle defects on the basis of the existing high accuracy, because in the field of machine learning, as mentioned above, there are many links from modeling to training. For example, the selection and processing of sample data, the design of data features, the design of models, the design of loss functions or the design of optimization methods, etc., subtle differences in any link are factors that lead to subtle defects in detection accuracy.
The inventors of the present application focused their research on the objects to be detected. In practical business scenarios involving movable platforms, object detection usually concerns relatively large targets such as people, vehicles, roads, or trees, and such large targets also tend to account for a large share of the sample data. During training the model tends toward the global optimum: the features of large objects are usually more salient and more easily attended to, while the features of small objects are comparatively subtle and receive little of the model's attention. The model is therefore biased toward extracting the features of large objects, and this bias ultimately yields a model that recognizes large objects well but attends poorly to the recognition of small objects, which is precisely one of the causes of the subtle defects in object detection models.
On this basis, by focusing on the objects themselves, the embodiments of the present application make improvements in several respects, including the model structure design and the feature design, and provide an object detection method that can further improve the detection accuracy of the model. FIG. 1 is a flowchart of an object detection method provided by an embodiment of the present application, which includes the following steps:
In step 102, a plurality of sampling points are obtained by sampling the space with a sensor of the movable platform.
In step 104, an object detection model is used to detect, among the plurality of sampling points, the target sampling points corresponding to a same object, and the position information of the object is detected according to the position information of the target sampling points.
FIG. 2 is a schematic diagram of an object detection model provided by an embodiment of the present application. The object detection model includes a feature extraction network layer, a size information extraction network layer, an object detection network layer, and a position detection network layer.
The feature extraction network layer is configured to obtain feature information of the sampling points based on a neural network.
The size information extraction network layer is configured to determine size information of each sampling point according to its feature information, where the size information represents the probability that the sampling point belongs to an object whose size is within a target size range.
The object detection network layer is configured to detect, among the plurality of sampling points, the target sampling points corresponding to a same object based on the feature information and the size information.
The position detection network layer is configured to detect the position information of the object according to the position information of the target sampling points.
In this embodiment, the object detection model has two information extraction network layers: a feature extraction network layer and a size information extraction network layer. The feature extraction network layer obtains the feature information of the sampling points, and the size information extraction network layer can further determine, on the basis of that feature information, the size information of each sampling point, which represents the probability that the sampling point belongs to an object whose size is within the target size range. The object detection model therefore attends not only to the feature information of objects but also to the size information of objects whose size falls within the target size range. Because the size information characterizes objects of a specific size, the model gives additional consideration to such objects and can identify objects of a specific size more accurately.
The method of the embodiments of the present application can be applied to a movable platform: the movable platform recognizes objects while moving and performs movement control based on the recognition results. The movable platform in the embodiments of the present application may include a car, an unmanned aerial vehicle, an unmanned ship, or a robot, where the car may be an unmanned vehicle or a manned vehicle, and the unmanned aerial vehicle may be a drone or another unmanned aircraft. Of course, the movable platform is not limited to those listed above and may also be another movable platform.
In the embodiments of the present application, an object detection model may be trained in advance, and the model may be deployed in the movable platform or in a server connected to the movable platform. In some examples, the object detection model is pre-trained by the business party, and the trained model is stored in the movable platform so that the platform can recognize objects. In other examples, the movable platform sends the collected data to the server; the object detection model deployed on the server performs object recognition on the collected data and returns the recognition result to the movable platform.
Next, the training process of the object detection model is described. In this embodiment, the business party may prepare sample data for training in advance. The sample data may include data of objects whose size is within the target size range; for ease of description, this embodiment refers to these as objects of the second type. Objects of the second type have a specific size, and the specific size can be flexibly configured according to actual business requirements, which is not limited in this embodiment. Taking the aforementioned scenario in the field of unmanned driving where the recognition accuracy for small objects is low as an example, the target size range in this embodiment may include a size range smaller than a preset target size threshold, where the threshold can be flexibly configured according to business needs. The sample data also includes data of objects whose size is not within the target size range, referred to in this embodiment as objects of the first type.
Relative to objects of the first type, objects of the second type have a specific size, for example a smaller size. In this embodiment, data of objects of the second type is added to the sample data, which enhances the object detection model's recognition of objects of the second type.
Model training in this embodiment may be supervised or unsupervised. In some examples, a supervised training method can be adopted to increase the training speed: ground-truth values are annotated in the sample data, for example the position information of objects, and supervised training improves both the speed and the accuracy of model training. The position information of an object may include one or more kinds of information, configurable according to business needs; as an example, it may include any of the following: size information, coordinate information, or orientation information.
In some examples, the sample data may be point cloud data of a plurality of sampling points or image data of a plurality of sampling points.
Point cloud data can be collected by sensors such as lidar or millimeter-wave radar. Taking unmanned vehicles as an example, an unmanned vehicle uses on-board sensors to sense its surroundings and controls the steering and speed of the vehicle according to the road, vehicle position, and object information obtained by sensing, so that the vehicle can drive safely and reliably on the road. On-board sensors may include lidar, millimeter-wave radar, and vision sensors. The basic principle of lidar is to actively emit laser pulse signals toward the detected object and receive the reflected pulse signals; the depth information of the detected object is computed from the time difference between the emitted and received signals, the angle information of the object relative to the lidar is obtained from the lidar's known emission direction, and the point cloud data is obtained by combining the depth and angle information.
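The time-of-flight geometry just described can be sketched as follows. This is a minimal illustration, not part of the application; the function name and the spherical-to-Cartesian convention are assumptions:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def lidar_point(dt_seconds, azimuth_rad, elevation_rad):
    """Convert one lidar return into a 3D point.

    dt_seconds: time difference between emitted and received pulse.
    azimuth_rad / elevation_rad: the beam's known emission direction.
    """
    # The pulse travels out and back, so one-way depth is half of c * dt.
    depth = C * dt_seconds / 2.0
    # Spherical coordinates -> Cartesian, with the sensor at the origin.
    x = depth * math.cos(elevation_rad) * math.cos(azimuth_rad)
    y = depth * math.cos(elevation_rad) * math.sin(azimuth_rad)
    z = depth * math.sin(elevation_rad)
    return (x, y, z)

# A pulse returning after 2 * 10 m / c corresponds to a target 10 m away.
print(lidar_point(2 * 10.0 / C, 0.0, 0.0))  # ≈ (10.0, 0.0, 0.0)
```

Repeating this for every emission direction in a scan yields the point cloud.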
In some examples, the point cloud data can be converted into image data. In other examples, the on-board sensors may further include multiple cameras that collect multi-view image data and multiple depth cameras that collect multi-view image data with depth information. Image data can also be collected by image acquisition sensors such as cameras.
In some examples, the above sample data may be data obtained from raw data through feature engineering. Feature engineering is the process of finding physically meaningful features in the raw data to participate in model training; it involves processing such as data cleaning, data dimensionality reduction, feature extraction, feature normalization, feature evaluation and screening, feature dimensionality reduction, and feature encoding.
For example, point cloud data is unstructured and needs to be processed into a format that can be input to the object detection model: the point cloud data may be processed to obtain the point cloud density corresponding to each voxel, and these per-voxel densities are used as the input of the object detection model. The point cloud data may be processed by three-dimensional gridding: for example, the point cloud data is divided into a grid to obtain multiple voxels, and the ratio of the number of points contained in each voxel to the total number of points in the point cloud constitutes the point cloud density of that voxel. Since the point cloud density characterizes the number of points contained in a voxel, a higher density indicates a higher probability that the voxel corresponds to an object, so the per-voxel point cloud density can serve as feature information of objects. Processing an irregular point cloud into a regular representation better captures the contour information of objects. Moreover, using grid data that includes point cloud density as the model input distinguishes the number of points in each 3D cell and improves the accuracy of object detection.
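The gridding and density computation described above can be sketched in numpy. This is a minimal illustration under stated assumptions (cubic voxels, a fixed grid origin and shape); the function name is hypothetical:

```python
import numpy as np

def voxel_density(points, voxel_size, grid_min, grid_shape):
    """Grid a point cloud and return the per-voxel point cloud density.

    points: (N, 3) array of xyz coordinates.
    voxel_size: edge length of each cubic voxel.
    grid_min: xyz origin of the grid.
    grid_shape: (nx, ny, nz) number of voxels per axis.
    Density = points in voxel / total points, as described in the text.
    """
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    # Keep only points that fall inside the grid.
    inside = np.all((idx >= 0) & (idx < grid_shape), axis=1)
    idx = idx[inside]
    counts = np.zeros(grid_shape, dtype=float)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return counts / len(points)

pts = np.array([[0.1, 0.1, 0.1],
                [0.2, 0.2, 0.2],
                [1.5, 0.5, 0.5],
                [9.9, 9.9, 9.9]])   # last point lies outside the grid
d = voxel_density(pts, voxel_size=1.0, grid_min=np.zeros(3), grid_shape=(2, 2, 2))
print(d[0, 0, 0], d[1, 0, 0])  # 0.5 0.25
```

The resulting dense density grid is a regular representation that can be fed directly to a convolutional model.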
For image data, the sample data is image data containing real objects, including image data containing objects of the first type and image data containing objects of the second type; the object detection model can automatically learn object features from the image data of real objects.
Using the above sample data, the object detection model can be obtained by training a machine learning model. The machine learning model may be, for example, a neural network model, such as a deep-learning-based neural network model. The specific structural design of the object detection model is one of the important aspects of the training process. In this embodiment, the structure of the object detection model includes at least a feature extraction network layer, a size information extraction network layer, an object detection network layer, and a position detection network layer.
In some examples, the feature extraction network layer obtains the feature information of the sampling points based on a deep-learning neural network.
To enhance the recognition of objects of the second type, the object detection model of this embodiment adds an additional size information extraction network layer, so that the model's recognition of second-type objects is enhanced by extracting their size information.
In some examples, the size information extraction network layer can be independent of the feature extraction network layer, that is, two independent neural networks separately extract the two types of information; this implementation requires training two independent networks and incurs a relatively large algorithmic overhead. In other examples, a backbone-and-branch structure can be used: the backbone network receives the input data and extracts the feature information of objects, while a branch network dedicated to extracting the size information builds on the extracted feature information, which correspondingly reduces the execution overhead and improves the model's execution efficiency.
Another important aspect of the training process is selecting suitable features. As mentioned above, for image data the features may be the pixel values; for point cloud data the features may be the per-voxel point cloud density. To make the model attend better to small objects, in this embodiment the size information represents the probability that each voxel of the point cloud data belongs to an object whose size is within the target size range, or the probability that each pixel of the image data belongs to such an object. The size information is used to enhance the object detection network layer's recognition of sampling points that belong to objects whose size is within the target size range.
In this embodiment, the feature information may take the form of a feature map of a specified size, for example H*W, where H and W denote the height and width respectively and the specific values can be flexibly configured as needed. Taking a feature map A as an example, size information is further extracted on the basis of A, for example the size information extracted at each position of A; the size information represents the probability that the sampling point belongs to an object whose size is within the target size range, that is, each position of feature map A carries the probability that the corresponding point belongs to an object whose size is within the target size range.
The extracted size information can be combined with the feature information for object recognition, and the combination can be implemented in multiple ways; for example, the two can be kept independent and used as two kinds of data for object recognition. In some examples, the feature information and the size information can be added together, and the target sampling points corresponding to a same object among the plurality of sampling points are detected according to the sum; the size information then acts as supplementary information to the feature information. In other examples, taking the aforementioned feature map as an example, a feature map A representing the feature information is extracted, the size information is added to A to form a new feature map B, and object recognition is performed based on B.
In practical applications, the size information extraction network layer can be implemented with various network structures. As an example, when the feature information is represented as a map, a convolutional neural network can be used. Optionally, the size information extraction network layer includes convolution layers, for example at least two; the size information may then be obtained by performing at least two convolution operations on the feature information through these at least two convolution layers.
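A minimal single-channel numpy sketch of such a branch follows: two stacked convolutions over the feature map, with a sigmoid mapping each position to a probability, and the result added back to the feature map. The kernels, activation choices, and the final addition into a new feature map B are illustrative assumptions, not the application's actual network:

```python
import numpy as np

def conv2d(x, kernel):
    """Single-channel 'same' 2-D convolution (cross-correlation, as in
    deep-learning frameworks) with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def size_branch(feature_map, k1, k2):
    """Two convolution operations on the feature information; the final
    sigmoid yields, per position, the probability of belonging to an
    object in the target size range."""
    h = np.maximum(conv2d(feature_map, k1), 0.0)   # conv + ReLU
    return sigmoid(conv2d(h, k2))                  # conv + sigmoid

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))         # feature map A (H*W)
k1 = rng.standard_normal((3, 3)) * 0.1
k2 = rng.standard_normal((3, 3)) * 0.1
size_map = size_branch(A, k1, k2)       # per-position probabilities
B = A + size_map                        # feature map B = A + size info
print(size_map.shape)
```

A real implementation would operate on multi-channel tensors with learned kernels, but the data flow (features in, per-position size probability out, then fused with the features) is the same.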
Another important aspect of the training process is designing a loss function suited to the business requirements. The loss function, also called the cost function, is used in supervised training, where the sample data carries ground-truth annotations, to estimate the error between the model's detected value and the ground-truth value. The loss function is critical to the recognition accuracy of the model, and designing a suitable loss function from the available sample data and the model's requirements is rather difficult. In some examples, existing loss functions such as the logarithmic loss, squared loss, exponential loss, or 0/1 loss can be used to compose the loss function for the corresponding scenario.
Based on the requirements of this embodiment, the object detection network layer is configured to obtain, based on the feature information and the size information of the sampling points, the confidence that each sampling point belongs to an object, and to use this confidence to detect, among the plurality of sampling points, the target sampling points corresponding to a same object. As an example, the loss function used in training the object detection model includes an object position information loss sub-function, an object confidence loss sub-function, and a size information coefficient loss sub-function. In this embodiment, the size information coefficient loss sub-function is added on top of the object position loss sub-function and the object confidence loss sub-function, so that the model attends to objects of the second type and distinguishes small objects more clearly.
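The application does not spell out how the per-point confidences are turned into target sampling points, so the following is one simple, hypothetical interpretation for illustration only: threshold the confidences and derive the object's position from the surviving points (here, their centroid):

```python
import numpy as np

def select_target_points(points, confidence, threshold=0.5):
    """Keep sampling points whose object confidence exceeds a threshold;
    treat them as target points and derive a position (their centroid).
    The threshold and centroid rule are illustrative assumptions."""
    mask = confidence > threshold
    targets = points[mask]
    position = targets.mean(axis=0) if len(targets) else None
    return targets, position

pts = np.array([[0.0, 0.0, 0.0],
                [0.2, 0.0, 0.0],
                [5.0, 5.0, 5.0]])
conf = np.array([0.9, 0.8, 0.1])
targets, pos = select_target_points(pts, conf)
print(len(targets))  # 2 target points; centroid near (0.1, 0, 0)
```

In practice the position detection network layer would regress the full object pose rather than averaging point coordinates.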
In some examples, the optimization objective of the object position information loss sub-function includes reducing the difference between the object position information obtained by the object detection model from the sample data and the object position information annotated in the sample data.
As an example, the position information of an object includes its size information, coordinate information, or orientation information. Accordingly, the position difference includes the differences between the size, coordinate, or orientation information obtained by the object detection model from the sample data and the size, coordinate, or orientation information annotated in the sample data, respectively.
Next, a formula is taken as an example. The position, dimensions, and orientation of an object can be denoted (x, y, z, l, h, w, θ), where x, y, z denote the coordinates of the point, l, h, w denote the length, height, and width, and θ denotes the orientation. In the object position information loss sub-function, the sub-function describing the position difference may, for example, take a form such as

L_loc = Σ_i || floc(x_i) − y_i* ||

where y_i* is the annotated ground-truth value, containing any of the size information, coordinate information, or orientation information of the object annotated in the sample data, and floc(x_i) denotes the position information of the object identified by the object detection model.
The loss function of this embodiment further includes an object confidence loss sub-function, whose optimization objective includes increasing the confidence of the objects detected by the object detection model from the sample data; as an example,

L_conf = − Σ_i log fpred(x_i)

where fpred(x_i) denotes the confidence of the object detected by the object detection model from the sample data, that is, the probability of recognizing the object.
To enhance the recognition of objects of the second type, a size information coefficient loss sub-function is further added to the loss function of this embodiment. It is used to strengthen the feature values of sampling points that belong to objects whose size is within the target size range. In some examples, its optimization objective includes increasing the confidence with which the object detection model detects, from the sample data, objects whose size is within the target size range.
As an example, based on the size information of objects extracted by the size information extraction network layer, the object detection model detects, for each sampling point, the probability that the point belongs to an object whose size is within the target size range. The size information coefficient loss sub-function therefore makes the feature values at positions corresponding to small objects large and those of non-small objects small, so that the network distinguishes small objects more clearly. As an example,

L_seg = − Σ_k log fseg(x_k)

where fseg(x_k) denotes the confidence with which the object detection model detects, from the sample data, objects whose size is within the target size range.
In summary, the loss function used in training the object detection model of this embodiment may, for example, take the form

L = L_loc + L_conf + L_seg

The specific formulas above are only illustrative; in practical applications the concrete mathematical form of each function can be flexibly configured as needed, and whether to add a regularization term can also be decided as needed, which is not limited in this embodiment.
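For concreteness, the three sub-losses can be combined as a single scalar as sketched below. The concrete forms (an absolute position error plus two negative-log-likelihood confidence terms) are assumptions matching the representative formulas above, not the application's exact loss:

```python
import numpy as np

def total_loss(pred_pos, true_pos, pred_conf, pred_size_conf):
    """Illustrative combined loss L = L_loc + L_conf + L_seg.

    pred_pos / true_pos: predicted and annotated position vectors.
    pred_conf: per-object confidences fpred(x_i), in (0, 1].
    pred_size_conf: small-object confidences fseg(x_k), in (0, 1].
    """
    l_loc = np.sum(np.abs(pred_pos - true_pos))   # position difference
    l_conf = -np.sum(np.log(pred_conf))           # object confidence
    l_seg = -np.sum(np.log(pred_size_conf))       # size-range confidence
    return l_loc + l_conf + l_seg

pred = np.array([1.0, 2.0, 3.0])
true = np.array([1.5, 2.0, 3.0])
loss = total_loss(pred, true, np.array([0.9, 0.8]), np.array([0.7]))
print(round(loss, 4))  # 1.1852
```

Each term pulls the parameters in its own direction during optimization: L_loc toward accurate poses, L_conf toward confident detections, and L_seg toward attending to small objects.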
During training, an optimization method is applied to the evaluation function to find the model with the best evaluation. For example, an optimization method such as gradient descent can be used to find the minimum (optimal solution) of the loss function's output error and tune the model parameters to their optimum, that is, to solve for the optimal coefficients of each network layer in the model. In some examples, the solving process computes the model output and the error value of the loss function to obtain the gradients with which the model parameters are adjusted. As an example, a back-propagation function can be invoked to compute the gradients and propagate the result of the loss function back into the object detection model, so that the model updates its parameters.
In some examples, the above loss function can be solved with an independent solver. In other examples, taking a neural network model as the object detection model, network branches can be set on top of the backbone network to compute the network's loss. As an example, the loss function can be divided into the three sub-functions above: the object position information loss sub-function, the object confidence loss sub-function, and the size information coefficient loss sub-function, with three corresponding network branches computing them separately; these sub-losses jointly guide the update of the neural network's parameters so that it achieves better detection performance.
Through the above training process, the object detection model is obtained when training ends. The obtained model can further be tested with test samples to verify its recognition accuracy. The final object detection model can be deployed in the movable platform or in the server; when needed, the sensors of the movable platform sample the space to obtain the plurality of sampling points to be recognized, and the object detection model detects the position information of objects.
Corresponding to the training phase, in some examples the sensor may be a lidar, a millimeter-wave radar, a vision sensor, or the like, which collects point cloud data of the plurality of sampling points to be recognized; alternatively, the point cloud data can be converted into image data. In other examples, the sensor may be an image sensor, which collects image data of the plurality of sampling points to be recognized. In some examples, the point cloud data is divided into a grid to obtain multiple voxels; after the point cloud density of each voxel is computed, the per-voxel densities are input into the object detection model for recognition, where the voxels are obtained by gridding the point cloud data. In other examples, the data input to the object detection model includes the pixel values of the image data.
In this embodiment, on the basis of the obtained feature information of the sampling points, the size information of the sampling points is further extracted. Since the size information represents the probability that a sampling point belongs to an object whose size is within the target size range, the object detection model of this embodiment can better recognize objects of a specific size. This embodiment refers to objects whose size is within the target size range as objects of the second type; in some examples, the target size range includes a size range smaller than a preset target size threshold, that is, objects of the second type are relatively small objects.
In this embodiment, the feature information may take the form of a feature map of a specified size, for example H*W, where H and W denote the height and width respectively, and the specific values can be flexibly configured as needed. Taking feature map A as an example, size information is further extracted on the basis of feature map A; the size information may be extracted at each position of feature map A, and it represents the probability that the sampling point belongs to an object whose size is within the target size range, i.e., the probability that each position in feature map A belongs to such an object.
The extracted size information can be combined with the feature information for object detection, and the combination can be implemented in various ways; for example, the two may remain independent and be used as two types of data for object recognition. In some examples, the feature information and the size information may be added together, and the target sampling points of the same object among the multiple sampling points may be detected according to the addition result; for example, the size information can serve as supplementary information for the feature information. In other examples, taking the aforementioned feature map as an example, a feature map A representing the feature information is extracted, the size information is added to feature map A to form a new feature map B, and object recognition is performed based on feature map B.
In practical applications, the size information extraction network layer can be implemented with various network structures. As an example, with a feature map as the representation of the feature information, a convolutional neural network can be used. Optionally, the size information extraction network layer includes convolution layers, for example at least two convolution layers, and the size information may be obtained by performing at least two convolution operations on the feature information through the at least two convolution layers.
In this embodiment, after the collected data is input into the object detection model, the feature extraction network layer is used to obtain the feature information of the sampling points based on a neural network; the size information extraction network layer is used to determine the size information of the sampling points according to the feature information of the sampling points, where the size information represents the probability that a sampling point belongs to an object whose size is within the target size range; the object detection network layer is used to detect, based on the feature information and the size information, the target sampling points of the same object among the multiple sampling points; and the position detection network layer is used to detect the position information of the object according to the position information of the target sampling points.
In some examples, the object detection network layer is configured to: obtain, based on the feature information and size information of the sampling points, the confidence that each sampling point belongs to an object, and use this confidence to detect the target sampling points of the same object among the multiple sampling points.
On the basis of the feature information, the model of this embodiment can extract specific size information. Since the size information represents the probability that a sampling point belongs to an object whose size is within the target size range, the object detection model pays greater attention to sampling points belonging to such objects, and can therefore better detect objects whose size is within the target size range.
In the scenario where the movable platform collects point cloud data as model input, since the solution of this embodiment can attend well to small objects and recognize them, object recognition accuracy can still be guaranteed even when the point cloud is sparse. The point cloud obtained by a single sensor on a movable platform is generally sparse, and under sparse point cloud conditions object detection usually performs poorly. To solve the sparsity problem, related technologies combine multiple sensors to obtain the point cloud; for example, while a lidar is used to obtain the point cloud, other sensors are used for auxiliary fusion to obtain a dense, high-quality point cloud. The data fusion between different sensors is complicated, and the additional sensors increase the hardware cost. In contrast, the embodiments of the present application can guarantee object recognition accuracy when the movable platform uses only a single sensor, so the movable platform does not need additional sensors for auxiliary fusion.
Next, description is given by way of another embodiment.
In the field of autonomous driving, three-dimensional object detection is a core problem, and small objects can be difficult to detect with sensors such as lidar. In the solution of this embodiment, a neural network from deep learning is used to detect the position and confidence of three-dimensional objects. By adding a network branch to the neural network and improving the training strategy of the neural network in the deep learning algorithm, the branch is made to extract the size information of small objects, ultimately making the model friendlier to small-object detection.
As an example, a neural-network-based object detection model capable of detecting three-dimensional objects is first generated; the model can receive three-dimensional point cloud or image data as input. Taking a three-dimensional point cloud as an example, the point cloud is first divided into a three-dimensional grid at a certain resolution along the x, y, and z directions, i.e., the three-dimensional space is voxelized to obtain the voxels of the space; the point cloud feature of each voxel is then determined based on the point cloud density within it. If a voxel contains points, the point cloud density p at that position is calculated and the feature of that position is set to p; for voxels containing no points, the point cloud feature is set to 0. This generates an input of dimensions the neural network can receive.
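The voxelization step described above can be sketched as follows. This is a minimal NumPy illustration only; the spatial bounds, grid resolution, and the choice of normalizing the per-voxel count by the total number of points are assumptions made for the example, not fixed by the embodiment.

```python
import numpy as np

def voxelize_density(points, bounds, resolution):
    """Rasterize an (N, 3) point cloud into a 3-D grid whose cell value is a
    point cloud density; voxels containing no points keep a feature of 0."""
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    dims = np.ceil((hi - lo) / resolution).astype(int)       # grid shape along x, y, z
    grid = np.zeros(dims, dtype=np.float64)
    # Map each point to its voxel index and discard points outside the bounds
    idx = np.floor((np.asarray(points, float) - lo) / resolution).astype(int)
    mask = np.all((idx >= 0) & (idx < dims), axis=1)
    for i, j, k in idx[mask]:
        grid[i, j, k] += 1.0                                 # count points per voxel
    return grid / max(len(points), 1)                        # one possible density normalization

# Example: 4 points; the first two fall into the same voxel
pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.1, 0.1], [9.9, 9.9, 9.9]])
grid = voxelize_density(pts, bounds=([0, 0, 0], [10, 10, 10]), resolution=1.0)
```

The resulting grid can then be flattened or stacked into the input tensor the feature extraction network layer expects.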
The input data first passes through the feature extraction network layer of the object detection model, which extracts the features of the point cloud to generate a feature map representing the feature information of the sampling points.
This feature map is then fed into another network branch, namely the size information extraction network layer, which generates a new feature map representing the size information of the sampling points. In this embodiment, this network branch can apply two convolution operations, generating a piece of size information for each position of the feature map. The size information carries strong semantic information describing objects whose size is within the target size range, and represents the probability that the position belongs to such an object. The extracted size information can be added to the feature map of the feature information to obtain a new feature map B; the addition operation fuses stronger semantic information.
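The size branch and the addition step can be sketched as follows. This is an illustrative NumPy sketch under assumed simplifications: the two convolutions are modeled as 1x1 convolutions (per-position linear maps) with random weights, and a sigmoid turns the second convolution's output into a per-position probability; a real implementation would use learned convolution kernels in a deep-learning framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def size_branch(feat, w1, b1, w2, b2):
    """Two 1x1 convolutions over a (C, H, W) feature map: the first maps C
    channels to C' with a ReLU, the second maps C' to a single channel; a
    sigmoid yields, per position, the probability of a small object."""
    x = np.maximum(0.0, np.einsum('dc,chw->dhw', w1, feat) + b1[:, None, None])
    logit = np.einsum('dc,chw->dhw', w2, x) + b2[:, None, None]
    return sigmoid(logit[0])                       # (H, W) size-information map

def fuse(feat, size_map):
    """Form feature map B by adding the size map to every channel of A."""
    return feat + size_map[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))              # feature map A: C=8, H=W=4
w1, b1 = rng.standard_normal((16, 8)) * 0.1, np.zeros(16)
w2, b2 = rng.standard_normal((1, 16)) * 0.1, np.zeros(1)
size_map = size_branch(feat, w1, b1, w2, b2)
feat_b = fuse(feat, size_map)                      # feature map B
```

Feature map B then carries both the original features and the small-object probability at every position.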
Based on the new feature map B, which represents both the feature information and the size information, the target sampling points of the same object among the multiple sampling points can be detected, and the position information of the object can then be detected according to the position information of the target sampling points. As an example, for the input sampling points to be identified, a series of candidate boxes can be identified, each of which may correspond to an object; the sampling points within a candidate box are the target sampling points corresponding to the same object. Based on the confidence that the identified sampling points belong to an object, where the confidence represents the probability of belonging to an object, the probability that each candidate box corresponds to an object can be determined. The probabilities of the candidate boxes are sorted and then filtered against a set threshold; a candidate box exceeding the threshold can be considered to identify an object, and the position information of the object can then be detected from the position information of the target sampling points within the box, yielding the final detection result.
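The sort-and-threshold selection of candidate boxes described above can be sketched as follows; the candidate list, the placeholder box labels, and the threshold value are assumptions for illustration only.

```python
def select_boxes(candidates, threshold):
    """Sort candidate boxes by the probability that they correspond to an
    object and keep those exceeding a set threshold, as described above.
    Each candidate is a (confidence, box) pair; boxes are placeholders."""
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    return [box for conf, box in ranked if conf > threshold]

cands = [(0.92, 'box_a'), (0.30, 'box_b'), (0.75, 'box_c'), (0.10, 'box_d')]
kept = select_boxes(cands, threshold=0.5)   # -> ['box_a', 'box_c']
```

Each kept box is then considered an identified object, and its position is read off from the target sampling points it contains.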
The position information of objects identified by the object detection model of this embodiment can be used for automatic movement decisions of the movable platform, for example, automatic driving decisions for a car, automatic flight decisions for an unmanned aerial vehicle, and the like.
The foregoing method embodiments may be implemented by software, by hardware, or by a combination of software and hardware. Taking software implementation as an example, a device in the logical sense is formed by the processor of the object detection device reading the corresponding computer program instructions from a non-volatile memory into memory and running them. At the hardware level, FIG. 3 shows a hardware structure diagram of an object detection device 300 implementing this embodiment. In addition to the processor 301, memory 302, and non-volatile memory 303 shown in FIG. 3, the object detection device used to implement the object recognition method of the embodiments may also include other hardware according to its actual functions, which will not be described here.
In this embodiment, the processor 301 implements the following steps when executing the computer program:
sampling a space through a sensor of the movable platform to obtain multiple sampling points;
using an object detection model to detect target sampling points of the same object among the multiple sampling points, and detecting position information of the object according to position information of the target sampling points;
wherein the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer, and a position detection network layer;
the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network;
the size information extraction network layer is used to determine size information of the sampling points according to the feature information of the sampling points, the size information representing the probability that a sampling point belongs to an object whose size is within a target size range;
the object detection network layer is used to detect, based on the feature information and the size information, the target sampling points of the same object among the multiple sampling points;
the position detection network layer is used to detect the position information of the object according to the position information of the target sampling points.
In some examples, the target size range includes a size range smaller than a preset target size threshold.
In some examples, the data of the multiple sampling points obtained by spatial sampling includes: point cloud data of the multiple sampling points and/or image data of the multiple sampling points.
In some examples, the data input into the object detection model includes: the point cloud density corresponding to each voxel of the point cloud data and/or the pixel values of the image data.
In some examples, the voxels are obtained by rasterizing the point cloud data.
In some examples, the size information represents the probability that each voxel of the point cloud data belongs to an object whose size is within the target size range, or the size information represents the probability that each pixel of the image data belongs to an object whose size is within the target size range.
In some examples, the size information is used to enhance the object detection network layer's recognition of sampling points belonging to objects whose size is within the target size range.
In some examples, the object detection network layer is configured to add the feature information and the size information, and use the addition result to detect the target sampling points of the same object among the multiple sampling points.
In some examples, the feature extraction network layer is used to obtain the feature information of the sampling points based on a deep learning neural network.
In some examples, the size information extraction network layer is a branch of the feature extraction network layer.
In some examples, the size information extraction network layer includes a convolution layer.
In some examples, the size information is obtained by performing a convolution operation on the feature information using the convolution layer.
In some examples, the size information extraction network layer contains at least two convolution layers.
In some examples, the size information is obtained by performing at least two convolution operations on the feature information through the at least two convolution layers.
In some examples, the object detection network layer is configured to: obtain, based on the feature information and size information of the sampling points, the confidence that each sampling point belongs to an object, and use this confidence to detect the target sampling points of the same object among the multiple sampling points.
In some examples, the loss function used during training of the object detection model includes: an object position information loss sub-function, an object confidence loss sub-function, and a size information coefficient loss sub-function.
In some examples, the optimization objective of the object position information loss sub-function includes: reducing the difference between the object position information obtained by the object detection model from the sample data and the object position information annotated in the sample data.
In some examples, the optimization objective of the object confidence loss sub-function includes: improving the confidence of objects detected by the object detection model from the sample data.
In some examples, the size information coefficient loss sub-function is used to enhance the feature values of sampling points belonging to objects whose size is within the target size range.
In some examples, the optimization objective of the size information coefficient loss sub-function includes: improving the confidence with which the object detection model detects, from the sample data, objects whose size is within the target size range.
In some examples, the training process of the object detection model includes: back-propagating the calculation result of the loss function into the object detection model, so that the object detection model updates its model parameters.
In some examples, the position information of the object includes any of the following: size information, coordinate information, or orientation information.
In some examples, the device is applied to a movable platform.
In some examples, the point cloud data is acquired using a lidar or a camera device with a depth information acquisition function configured on the movable platform.
In some examples, the image data is acquired using a camera device configured on the movable platform.
In some examples, the movable platform includes: an unmanned aerial vehicle, a car, an unmanned ship, or a robot.
In some examples, the detected position information of the object is used to make automatic driving decisions for the car.
In some examples, the detected position information of the object is used to make automatic flight decisions for the unmanned aerial vehicle.
As shown in FIG. 4, an embodiment of the present application further provides a movable platform 400, including: a body 401; a power system 402 installed in the body 401 to provide power for the movable platform; and the object detection device 300 described in any of the embodiments.
Optionally, the movable platform 400 is a vehicle, an unmanned aerial vehicle, an unmanned ship, or a mobile robot.
The embodiments of this specification further provide a computer-readable storage medium on which computer instructions are stored; when the computer instructions are executed, the steps of the object detection method of any of the embodiments are implemented.
The embodiments of this specification may take the form of a computer program product implemented on one or more storage media containing program code (including but not limited to disk storage, CD-ROM, optical storage, and the like). Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
For the device embodiments, since they basically correspond to the method embodiments, reference may be made to the relevant descriptions of the method embodiments. The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or also includes elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
The methods and devices provided by the embodiments of the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as a limitation of the present invention.

Claims (58)

  1. An object detection method, comprising:
    sampling a space through a sensor of a movable platform to obtain multiple sampling points;
    using an object detection model to detect target sampling points of the same object among the multiple sampling points, and detecting position information of the object according to position information of the target sampling points;
    wherein the object detection model comprises: a feature extraction network layer, a size information extraction network layer, an object detection network layer, and a position detection network layer;
    the feature extraction network layer is configured to obtain feature information of the sampling points based on a neural network;
    the size information extraction network layer is configured to determine size information of the sampling points according to the feature information of the sampling points, the size information representing the probability that a sampling point belongs to an object whose size is within a target size range;
    the object detection network layer is configured to detect, based on the feature information and the size information, the target sampling points of the same object among the multiple sampling points;
    the position detection network layer is configured to detect the position information of the object according to the position information of the target sampling points.
  2. 根据权利要求1所述的方法,其特征在于,所述目标尺寸范围包括:小于预设目标尺寸阈值的尺寸范围。The method according to claim 1, wherein the target size range comprises: a size range smaller than a preset target size threshold.
  3. 根据权利要求2所述的方法,其特征在于,对空间采样得到的多个采样点的数据包括:多个采样点的点云数据和/或多个采样点的图像数据。The method according to claim 2, wherein the data of multiple sampling points obtained by spatial sampling comprises: point cloud data of multiple sampling points and/or image data of multiple sampling points.
  4. 根据权利要求3所述的方法,其特征在于,输入至所述物体检测模型的数据包括:所述点云数据的每个体素对应的点云密度和/或图像数据的像素值。The method according to claim 3, wherein the data input to the object detection model comprises: a point cloud density corresponding to each voxel of the point cloud data and/or a pixel value of the image data.
  5. 根据权利要求4所述的方法,其特征在于,所述体素是对所述点云数据进行栅格划分获得的。The method according to claim 4, wherein the voxels are obtained by grid division of the point cloud data.
  6. 根据权利要求5所述的方法,其特征在于,所述尺寸信息用于表征所述点云数据的每个体素属于尺寸在目标尺寸范围内的物体的概率,或所述尺寸信息用于表征所述图像数据的每个像素属于尺寸在目标尺寸范围内的物体的概率。The method according to claim 5, wherein the size information is used to characterize the probability that each voxel of the point cloud data belongs to an object whose size is within a target size range, or the size information is used to characterize all the voxels of the point cloud data. The probability that each pixel of the image data belongs to an object whose size is within the target size range.
  7. 根据权利要求1或6所述的方法,其特征在于,所述尺寸信息用于:增强所述物体检测网络层对所述采样点属于尺寸在目标尺寸范围内的物体的识别。The method according to claim 1 or 6, wherein the size information is used to: enhance the recognition by the object detection network layer that the sampling point belongs to an object whose size is within a target size range.
  8. 根据权利要求7所述的方法,其特征在于,所述物体检测网络层用于:将所述特征信息和尺寸信息相加,利用相加结果检测多个所述采样点中对应的同一物体的目标采样点。The method according to claim 7, wherein the object detection network layer is configured to: add the feature information and size information, and use the addition result to detect the same object corresponding to the plurality of sampling points target sampling point.
  9. 根据权利要求1所述的方法,其特征在于,所述特征提取网络层用于基于深度学习的神经网络获取所述采样点的特征信息。The method according to claim 1, wherein the feature extraction network layer is used for acquiring feature information of the sampling points based on a deep learning neural network.
  10. 根据权利要求1所述的方法,其特征在于,所述尺寸信息提取网络层是所述特征提取网络层的分支。The method of claim 1, wherein the size information extraction network layer is a branch of the feature extraction network layer.
  11. 根据权利要求10所述的方法,其特征在于,所述尺寸信息提取网络层中包括卷积层。The method according to claim 10, wherein the size information extraction network layer includes a convolution layer.
  12. 根据权利要求11所述的方法,其特征在于,所述尺寸信息,是利用所述卷积层对所述特征信息进行卷积操作得到的。The method according to claim 11, wherein the size information is obtained by performing a convolution operation on the feature information by using the convolution layer.
  13. 根据权利要求11所述的方法,其特征在于,所述尺寸信息提取网络层中的卷积层至少有两层。The method according to claim 11, wherein there are at least two convolution layers in the size information extraction network layer.
  14. 根据权利要求13所述的方法,其特征在于,所述尺寸信息是通过所述至少两层卷积层对所述特征信息进行至少两次卷积操作得到的。The method according to claim 13, wherein the size information is obtained by performing at least two convolution operations on the feature information through the at least two convolution layers.
  15. 根据权利要求1所述的方法,其特征在于,所述物体检测网络层用于:基于所述采样点的特征信息和尺寸信息,获取所述采样点属于物体的置信度,利用所述采样点属于物体的置信度检测多个所述采样点中对应的同一物体的目标采样点。The method according to claim 1, wherein the object detection network layer is configured to: obtain the confidence that the sampling point belongs to an object based on the feature information and size information of the sampling point, and use the sampling point The confidence level belonging to the object detects the target sampling point of the same object corresponding to the plurality of sampling points.
  16. The method according to claim 15, wherein the loss function used in training the object detection model comprises: an object position information loss sub-function, an object confidence loss sub-function, and a size information coefficient loss sub-function.
  17. The method according to claim 16, wherein the optimization objective of the object position information loss sub-function comprises: reducing the difference between the object position information obtained by the object detection model from sample data and the object position information annotated in the sample data.
  18. The method according to claim 16, wherein the optimization objective of the object confidence loss sub-function comprises: increasing the confidence of the objects detected by the object detection model from the sample data.
  19. The method according to claim 16, wherein the size information coefficient loss sub-function is used to: enhance the feature values of the sampling points belonging to objects whose size is within the target size range.
  20. The method according to claim 16, wherein the optimization objective of the size information coefficient loss sub-function comprises: increasing the confidence with which the object detection model detects, from the sample data, objects whose size is within the target size range.
  21. The method according to claim 16, wherein the training process of the object detection model comprises: back-propagating the calculation result of the loss function into the object detection model, so that the object detection model updates its model parameters.
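Claims 16 to 21 combine three loss sub-functions whose result is back-propagated. A hedged sketch of one plausible combination: smooth L1 for the position term and binary cross-entropy for the confidence and size-coefficient terms, summed with weights. The particular loss forms and the weights are assumptions; the claims name only the three sub-functions and their objectives:

```python
import numpy as np

def bce(p, t, eps=1e-7):
    # Binary cross-entropy; used here for both the object-confidence
    # term and the size-information-coefficient term.
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(t * np.log(p) + (1 - t) * np.log(1 - p))))

def smooth_l1(pred, target):
    # Position-regression term: quadratic for small errors, linear for large.
    d = np.abs(pred - target)
    return float(np.mean(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)))

def total_loss(pos_pred, pos_gt, conf_pred, conf_gt, size_pred, size_gt,
               w_pos=1.0, w_conf=1.0, w_size=0.5):  # weights are assumptions
    return (w_pos * smooth_l1(pos_pred, pos_gt)
            + w_conf * bce(conf_pred, conf_gt)
            + w_size * bce(size_pred, size_gt))

loss = total_loss(np.array([1.2, 0.8]), np.array([1.0, 1.0]),
                  np.array([0.9]), np.array([1.0]),
                  np.array([0.7]), np.array([1.0]))
print(loss > 0.0)  # True: a scalar loss value to back-propagate
```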
  22. The method according to claim 1, wherein the position information of the object comprises any one of the following: size information, coordinate information, or direction information.
  23. The method according to claim 1 or 3, wherein the method is applied to a movable platform.
  24. The method according to claim 23, wherein the point cloud data is acquired using a lidar arranged on the movable platform or a camera device having a depth information collection function.
  25. The method according to claim 23, wherein the image data is acquired using a camera device arranged on the movable platform.
  26. The method according to claim 23, wherein the movable platform comprises: an unmanned aerial vehicle, an automobile, an unmanned ship, or a robot.
  27. The method according to claim 26, wherein the detected position information of the object is used to: make automatic driving decisions for the automobile.
  28. The method according to claim 26, wherein the detected position information of the object is used to: make automatic flight decisions for the unmanned aerial vehicle.
  29. An object detection device, comprising: a processor and a memory storing a computer program;
    wherein the processor, when executing the computer program, implements the following steps:
    obtaining a plurality of sampling points by sampling a space through a sensor of a movable platform;
    detecting, using an object detection model, target sampling points among the plurality of sampling points that correspond to the same object, and detecting the position information of the object according to the position information of the target sampling points;
    wherein the object detection model comprises: a feature extraction network layer, a size information extraction network layer, an object detection network layer, and a position detection network layer;
    the feature extraction network layer is configured to obtain feature information of the sampling points based on a neural network;
    the size information extraction network layer is configured to determine size information of the sampling points according to the feature information, the size information characterizing the probability that a sampling point belongs to an object whose size is within a target size range;
    the object detection network layer is configured to detect, based on the feature information and the size information, the target sampling points among the plurality of sampling points that correspond to the same object;
    the position detection network layer is configured to detect the position information of the object according to the position information of the target sampling points.
  30. The device according to claim 29, wherein the target size range comprises: a size range smaller than a preset target size threshold.
  31. The device according to claim 30, wherein the data of the plurality of sampling points obtained by sampling the space comprises: point cloud data of the plurality of sampling points and/or image data of the plurality of sampling points.
  32. The device according to claim 31, wherein the data input to the object detection model comprises: a point cloud density corresponding to each voxel of the point cloud data and/or pixel values of the image data.
  33. The device according to claim 32, wherein the voxels are obtained by dividing the point cloud data into a grid.
  34. The device according to claim 33, wherein the size information characterizes the probability that each voxel of the point cloud data belongs to an object whose size is within the target size range, or the size information characterizes the probability that each pixel of the image data belongs to an object whose size is within the target size range.
  35. The device according to claim 29 or 34, wherein the size information is used to: enhance the object detection network layer's recognition of sampling points belonging to objects whose size is within the target size range.
  36. The device according to claim 35, wherein the object detection network layer is configured to: add the feature information and the size information, and detect, using the addition result, the target sampling points among the plurality of sampling points that correspond to the same object.
  37. The device according to claim 29, wherein the feature extraction network layer is configured to obtain the feature information of the sampling points based on a deep learning neural network.
  38. The device according to claim 29, wherein the size information extraction network layer is a branch of the feature extraction network layer.
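Claims 32 and 33 feed the model the per-voxel point density of a grid-divided point cloud. A hedged sketch of that voxelization step; the grid origin, edge length, and resolution below are illustrative assumptions:

```python
import numpy as np

def voxel_density(points, origin, voxel_size, grid_shape):
    """Count points per voxel of a regular grid.

    points: (N, 3) xyz coordinates; origin: (3,) grid corner;
    voxel_size: voxel edge length; grid_shape: (X, Y, Z) voxel counts.
    """
    idx = np.floor((points - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    density = np.zeros(grid_shape, dtype=np.int64)
    np.add.at(density, tuple(idx[inside].T), 1)  # unbuffered scatter-add
    return density

pts = np.array([[0.1, 0.1, 0.1],
                [0.2, 0.15, 0.1],
                [1.5, 0.5, 0.5],
                [9.0, 9.0, 9.0]])
d = voxel_density(pts, origin=np.zeros(3), voxel_size=1.0, grid_shape=(4, 4, 4))
print(int(d[0, 0, 0]))  # 2: two points fall into the first voxel
print(int(d.sum()))     # 3: the point at (9, 9, 9) lies outside the grid
```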
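Claims 35 and 36 fuse the two signals by element-wise addition, so cells with a high small-object probability get an amplified response before detection. A minimal sketch; the tensor layout and the broadcast of a single size channel across all feature channels are assumptions:

```python
import numpy as np

def fuse(features, size_prob):
    # features: (H, W, C) feature information; size_prob: (H, W, 1)
    # per-cell probability that the cell belongs to an object whose
    # size is within the target size range.
    # Element-wise addition per claim 36; broadcasting the single size
    # channel over all C feature channels is an illustrative assumption.
    return features + size_prob

feat = np.zeros((2, 2, 3))
size_prob = np.array([[[0.9], [0.1]],
                      [[0.0], [0.8]]])
fused = fuse(feat, size_prob)
print(fused.shape)            # (2, 2, 3)
print(float(fused[0, 0, 0]))  # 0.9: high size probability raises the response
```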
  39. The device according to claim 38, wherein the size information extraction network layer comprises a convolution layer.
  40. The device according to claim 39, wherein the size information is obtained by performing a convolution operation on the feature information using the convolution layer.
  41. The device according to claim 40, wherein the size information extraction network layer comprises at least two convolution layers.
  42. The device according to claim 41, wherein the size information is obtained by performing at least two convolution operations on the feature information through the at least two convolution layers.
  43. The device according to claim 29, wherein the object detection network layer is configured to: obtain, based on the feature information and the size information of the sampling points, the confidence that each sampling point belongs to an object, and detect, using that confidence, the target sampling points among the plurality of sampling points that correspond to the same object.
  44. The device according to claim 29 or 43, wherein the loss function used in training the object detection model comprises: an object position information loss sub-function, an object confidence loss sub-function, and a size information coefficient loss sub-function.
  45. The device according to claim 44, wherein the optimization objective of the object position information loss sub-function comprises: reducing the difference between the object position information obtained by the object detection model from sample data and the object position information annotated in the sample data.
  46. The device according to claim 44, wherein the optimization objective of the object confidence loss sub-function comprises: increasing the confidence of the objects detected by the object detection model from the sample data.
  47. The device according to claim 44, wherein the size information coefficient loss sub-function is used to: enhance the feature values of the sampling points belonging to objects whose size is within the target size range.
  48. The device according to claim 44, wherein the optimization objective of the size information coefficient loss sub-function comprises: increasing the confidence with which the object detection model detects, from the sample data, objects whose size is within the target size range.
  49. The device according to claim 44, wherein the training process of the object detection model comprises: back-propagating the calculation result of the loss function into the object detection model, so that the object detection model updates its model parameters.
  50. The device according to claim 29, wherein the position information of the object comprises any one of the following: size information, coordinate information, or direction information.
  51. The device according to claim 29 or 31, wherein the device is applied to a movable platform.
  52. The device according to claim 51, wherein the point cloud data is acquired using a lidar arranged on the movable platform or a camera device having a depth information collection function.
  53. The device according to claim 51, wherein the image data is acquired using a camera device arranged on the movable platform.
  54. The device according to claim 51, wherein the movable platform comprises: an unmanned aerial vehicle, an automobile, an unmanned ship, or a robot.
  55. The device according to claim 54, wherein the detected position information of the object is used to: make automatic driving decisions for the automobile.
  56. The device according to claim 54, wherein the detected position information of the object is used to: make automatic flight decisions for the unmanned aerial vehicle.
  57. A movable platform, comprising:
    a body;
    a power system, mounted in the body and configured to provide power for the movable platform; and
    the object detection device according to any one of claims 29 to 56.
  58. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 28.
PCT/CN2020/137299 2020-12-17 2020-12-17 Object detection method, device, movable platform, and computer-readable storage medium WO2022126523A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/137299 WO2022126523A1 (en) 2020-12-17 2020-12-17 Object detection method, device, movable platform, and computer-readable storage medium


Publications (1)

Publication Number Publication Date
WO2022126523A1 true WO2022126523A1 (en) 2022-06-23

Family

ID=82058813


Country Status (1)

Country Link
WO (1) WO2022126523A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032962A (en) * 2019-04-03 2019-07-19 腾讯科技(深圳)有限公司 A kind of object detecting method, device, the network equipment and storage medium
CN110298262A (en) * 2019-06-06 2019-10-01 华为技术有限公司 Object identification method and device
CN110942000A (en) * 2019-11-13 2020-03-31 南京理工大学 Unmanned vehicle target detection method based on deep learning
US20200145569A1 (en) * 2017-10-19 2020-05-07 DeepMap Inc. Lidar to camera calibration for generating high definition maps



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20965539; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20965539; Country of ref document: EP; Kind code of ref document: A1)