WO2020186444A1 - Object detection method, electronic device, and computer storage medium - Google Patents

Object detection method, electronic device, and computer storage medium

Info

Publication number
WO2020186444A1
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
detected
point
feature map
point cloud
Prior art date
Application number
PCT/CN2019/078629
Other languages
French (fr)
Chinese (zh)
Inventor
张磊杰
陈晓智
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2019/078629 priority Critical patent/WO2020186444A1/en
Priority to CN201980005385.1A priority patent/CN111316285A/en
Publication of WO2020186444A1 publication Critical patent/WO2020186444A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks

Definitions

  • the embodiments of the present invention relate to the technical field of object detection, and in particular to an object detection method, electronic equipment, and computer storage medium.
  • Intelligent driving includes automatic driving and assisted driving.
  • One of the core technologies of intelligent driving technology is obstacle detection.
  • the accuracy of obstacle detection results is directly related to the safety and reliability of intelligent driving.
  • a common obstacle detection technology uses sensors mounted on the vehicle, such as cameras, lidars, and millimeter-wave radars, to detect dynamic obstacles (such as vehicles and pedestrians) in the road scene, and then obtains information such as the attitude, three-dimensional position, and physical dimensions of the dynamic obstacles.
  • the embodiments of the present invention provide an object detection method, electronic device, and computer storage medium to achieve accurate detection of objects.
  • an object detection method according to a first aspect of an embodiment of the present application includes: performing first processing on acquired point cloud data of an object to be detected to obtain a first feature map; detecting each pixel in the first feature map to obtain a first detection result of each pixel; and determining a second detection result of the object to be detected according to the first detection result of each pixel.
  • an electronic device according to a second aspect of an embodiment of the present application includes:
  • a memory, used to store a computer program;
  • a processor, configured to execute the computer program, specifically to: perform first processing on acquired point cloud data of an object to be detected to obtain a first feature map; detect each pixel in the first feature map to obtain a first detection result of each pixel; and determine a second detection result of the object to be detected according to the first detection result of each pixel.
  • a vehicle according to an embodiment of the present application includes: a vehicle body and the electronic device as described in the second aspect installed on the vehicle body.
  • an unmanned aerial vehicle includes: a fuselage and the electronic device as described in the second aspect installed on the fuselage.
  • a computer storage medium according to an embodiment of the present application stores a computer program, and the computer program, when executed, implements the object detection method as described in the first aspect.
  • the object detection method, electronic device, and computer storage medium provided by the embodiments of the present application obtain a first feature map by performing first processing on the point cloud data of the object to be detected; detect each pixel in the first feature map to obtain a first detection result of each pixel; and determine a second detection result of the object to be detected according to the first detection result of each pixel. That is, in the embodiments of the present application, pixel-level detection is performed on each pixel in the first feature map; this dense detection method can increase the recall rate of object detection, thereby improving the accuracy of object detection.
  • FIG. 1 is a schematic diagram of an application scenario involved in an embodiment of this application
  • FIG. 2 is a flowchart of an object detection method provided by an embodiment of the application
  • FIG. 3A is a schematic diagram of the framework of a neural network model involved in an embodiment of this application.
  • FIG. 3B is a schematic diagram of a neural network model involved in an embodiment of this application.
  • FIG. 3C is a schematic diagram of another neural network model involved in an embodiment of this application.
  • FIG. 4 is another flowchart of an object detection method provided by an embodiment of the application.
  • FIG. 5 is another flowchart of the object detection method provided by an embodiment of the application.
  • FIG. 6 is another flowchart of an object detection method provided by an embodiment of the application.
  • FIG. 7 is another flowchart of an object detection method provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of an electronic device provided by an embodiment of the application.
  • FIG. 9 is another schematic diagram of an electronic device provided by an embodiment of the application.
  • FIG. 10 is a schematic structural diagram of a vehicle provided by an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of a transportation tool provided by an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a drone provided by an embodiment of the application.
  • words such as "first" and "second" are used to distinguish identical or similar items with substantially the same function and effect. Those skilled in the art can understand that words such as "first" and "second" do not limit the quantity or order of execution, nor do they necessarily indicate that the items are different.
  • the embodiments of this application can be applied to any field that needs to detect objects, such as intelligent driving fields including automatic driving and assisted driving, where obstacles such as vehicles and pedestrians in road scenes can be detected, thereby improving the safety of intelligent driving.
  • Figure 1 is a schematic diagram of an application scenario involved in an embodiment of this application.
  • the intelligent driving vehicle includes detection equipment.
  • the detection equipment can detect obstacles in the lane ahead (such as falling rocks, dead branches, pedestrians, and vehicles) to obtain detection information such as the location, attitude, orientation, and size of the obstacles, and plan the intelligent driving state, such as lane changing, deceleration, or parking, based on the detection information.
  • the detection equipment may specifically be radar, ultrasonic detection equipment, Time Of Flight (TOF) ranging detection equipment, visual detection equipment, laser detection equipment, etc., and combinations thereof.
  • FIG. 1 is a schematic diagram of an application scenario of this application, and the application scenario of the embodiment of this application includes but is not limited to that shown in FIG. 1.
  • FIG. 2 is a flowchart of an object detection method provided by an embodiment of the application. As shown in FIG. 2, the method of the embodiment of the application includes:
  • S101 Perform first processing on the acquired point cloud data of the object to be detected, and obtain a first feature map of the point cloud data.
  • the execution subject of the embodiments of the present application is a device with an object detection function, for example, a detection device, which can be integrated in any electronic device as a part of the electronic device.
  • the detection device may also be a separate electronic device.
  • the electronic device may be a vehicle-mounted device, such as a driving recorder.
  • the carrier of the electronic device may also be an aircraft, an unmanned aerial vehicle, a smart handheld device, a smart handheld device with a stable platform, and the like.
  • each point in the aforementioned point cloud data includes information such as three-dimensional data and reflectivity, where the three-dimensional data includes the three-dimensional coordinates of the point in the point cloud data coordinate system.
  • the first processing includes at least one of the following: at least one convolution operation, at least one sampling operation, and at least one stacking operation.
  • the sampling operation may include: a down sampling operation and/or an up sampling operation.
  • the point cloud data of the object to be detected obtained in this step is input into the neural network model shown in FIG. 3A.
  • the neural network model performs the first processing on the point cloud data to obtain the first feature map of the point cloud data.
  • the neural network model includes N layers of feature maps, and at least one convolution and at least one down-sampling operation are included between the first-layer feature map and the second-layer feature map.
  • the purpose of the convolution/down-sampling process is to extract high-level information and enlarge the receptive field of the neurons. At least one convolution and at least one down-sampling operation are likewise included between the second-layer feature map and the third-layer feature map.
  • from the (N-2)-th layer feature map to the (N-1)-th layer feature map, at least one convolution and at least one up-sampling operation are included, where the purpose of the convolution/up-sampling process is to extract high-level information and enable pixel-level detection.
  • from the (N-1)-th layer feature map to the N-th layer feature map, at least one convolution and at least one up-sampling operation are also included.
  • FIG. 3A is only an example of a neural network model.
  • the neural network model of the embodiment of the present application includes but is not limited to that shown in FIG. 3A, and the number of operations performed on each feature map shown in FIG. 3A can be set according to resource requirements.
  • optionally, the output of the second-layer feature map can also be used as an input of the (N-1)-th layer feature map.
  • the neural network model used in the embodiment of the present application may also be a segmentation network model as shown in FIG. 3B.
  • the segmentation network model has a structure as shown in the figure, including, for example, 9 layers, where:
  • on the down-sampling path, each layer includes at least one convolution operation (Conv), a pooling operation (Pooling) is performed between two adjacent layers, and the convolution and pooling operations implement down-sampling.
  • on the up-sampling path, each layer includes at least one convolution operation (Conv), an up-convolution operation (up-Conv) is performed between two adjacent layers, and the up-convolution operation implements up-sampling.
  • Different layers can be stacked.
  • for example, the output of the 1st layer can be input to the 9th layer for stacking, the output of the 2nd layer to the 8th layer, the output of the 3rd layer to the 7th layer, and the output of the 4th layer to the 6th layer.
  • stacking achieves multi-scale feature fusion, by point-by-point addition of features or by splicing along the feature channel dimension, so that a pixel-level segmentation map can be obtained and a semantic category judgment can be made for each pixel.
  • the output of the m-th layer is directly stacked with that of the (N-m)-th layer, realizing the fusion of feature maps of the same level and the same size. Because the m-th layer has undergone fewer convolutions, its feature map retains more shallow information; after stacking, the output information of the m-th layer is fused with information that has not undergone many convolutions and pooling operations, realizing the splicing and alignment of shallow and deep information at the pixel scale. During stacking, since feature maps of the same level are of equal size, splicing and alignment only need to match the feature map size, which is very beneficial to the fusion of deep semantic information and shallow feature map information at each layer.
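  • As a minimal illustration of such an encoder-decoder with stacking, the following PyTorch-style sketch shows one down-sampling path, one up-sampling path, and a skip connection that splices same-size feature maps along the channel dimension. The layer count, channel widths, and the use of PyTorch are illustrative assumptions, not the architecture claimed in this application.

```python
# A minimal sketch of an encoder-decoder segmentation network with
# "stacking" (skip connections), in the spirit of FIG. 3B. All sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # each layer includes at least one convolution operation (Conv)
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class MiniSegNet(nn.Module):
    def __init__(self, c_in=40, num_classes=4):
        super().__init__()
        self.enc1 = conv_block(c_in, 64)
        self.enc2 = conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)                          # down-sampling
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)   # up-convolution
        self.dec1 = conv_block(64 + 64, 64)                  # stacked channels
        self.head = nn.Conv2d(64, num_classes, 1)            # per-pixel output

    def forward(self, x):
        e1 = self.enc1(x)              # shallow features, full resolution
        e2 = self.enc2(self.pool(e1))  # deep features, half resolution
        d1 = self.up(e2)               # restore resolution
        # stacking: splice same-size shallow and deep maps on the channel dim
        d1 = self.dec1(torch.cat([d1, e1], dim=1))
        return self.head(d1)           # pixel-level semantic predictions
```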
  • the neural network model used in the embodiment of the present application may also be a DeepLab-style network as shown in FIG. 3C, where the encoder module encodes multi-scale context information by applying dilated (atrous) convolution at multiple scales, and the decoder module refines the segmentation result along object boundaries. Such a network can achieve pixel-level classification.
  • S102: each pixel in the first feature map is detected to obtain the detection result of each pixel, where the detection result of each pixel is recorded as the first detection result.
  • the first feature map is input into the neural network model, and the first detection result of each pixel is predicted.
  • the above steps S101 and S102 can be performed by one neural network model. For example, the point cloud data of the object to be detected is input into the neural network model shown in FIG. 3B or FIG. 3C, which first predicts the first feature map of the point cloud data and then takes the first feature map as input to predict the first detection result of each pixel in the first feature map.
  • the first detection result of the pixel point includes at least one of the following: the semantic category of the pixel point, the orientation of the pixel point, and the distance between the pixel point and the center point of the object to be detected.
  • S103 Determine a second detection result of the object to be detected according to the first detection result of each pixel.
  • the second detection result of the object to be detected includes at least one of the following: the semantic category of the object to be detected, the three-dimensional coordinates of the object to be detected, and the size of the object to be detected.
  • for example, according to the semantic category of each pixel, the semantic category of the object to be detected can be obtained; according to the spatial position and orientation of each pixel and the distance between each pixel and the center point of the object to be detected, the three-dimensional coordinates and size of the object to be detected can be obtained.
  • in this way, the second detection result of the object to be detected is determined, and object detection at the pixel level can improve the accuracy of object detection.
  • the object detection method provided by the embodiment of the application obtains a first feature map by performing first processing on the acquired point cloud data of the object to be detected; detects each pixel in the first feature map to obtain the first detection result of each pixel; and determines the second detection result of the object to be detected according to the first detection result of each pixel. That is, by performing pixel-level detection on each pixel in the first feature map, this dense detection method can increase the recall rate of object detection, thereby improving the accuracy of object detection.
  • on the basis of the foregoing embodiment, before S101, the method of the embodiment of the present application further includes: acquiring the point cloud data of the object to be detected.
  • this step does not limit the way of obtaining the point cloud data of the object to be detected, which is determined according to actual needs.
  • the depth sensor collects point cloud data of the object to be detected, and the electronic device obtains the point cloud data of the object to be detected collected by the depth sensor from the depth sensor.
  • the depth sensor can be installed on the electronic device and is a part of the electronic device.
  • the depth sensor and the electronic device are two components, and the depth sensor is in communication connection with the electronic device, and the depth sensor can transmit the collected point cloud data of the object to be detected to the electronic device.
  • the communication connection between the depth sensor and the electronic device may be a wired connection or a wireless connection, which is not limited.
  • the depth sensor may be radar, ultrasonic detection equipment, TOF ranging detection equipment, laser detection equipment, etc. and combinations thereof.
  • the method for the electronic device to obtain the point cloud data of the object to be detected may also be: the electronic device obtains a first image and a second image of the object to be detected collected by a binocular camera, and obtains the point cloud data of the object to be detected according to the first image and the second image.
  • a binocular camera is installed on the vehicle.
  • the binocular camera collects images of the road scene, and can thereby collect the first image and the second image of the object to be detected.
  • for example, the first image may be the left-eye image and the second image the right-eye image, or vice versa.
  • the electronic device matches the pixels of the first image and the second image to obtain the disparity value of each pixel; based on the triangulation principle and according to the disparity value of each pixel, the point cloud data corresponding to each physical point of the object to be detected can be obtained, as sketched below.
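  • A minimal sketch of this triangulation step is shown below, assuming the disparity map has already been computed by pixel matching; the focal length f, baseline b, and principal point (cx, cy) are illustrative camera parameters, not values from this application.

```python
import numpy as np

def disparity_to_point_cloud(disparity, f=700.0, b=0.12, cx=320.0, cy=240.0):
    """disparity: H x W array of per-pixel disparity values (in pixels)."""
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0                  # pixels with a usable match
    z = f * b / disparity[valid]           # depth from the triangulation principle
    x = (u[valid] - cx) * z / f            # back-project into the camera frame
    y = (v[valid] - cy) * z / f
    return np.stack([x, y, z], axis=1)     # N x 3 point cloud
```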
  • FIG. 4 is another flowchart of the object detection method provided by the embodiment of the application, showing the specific process of performing the first processing on the acquired point cloud data of the object to be detected to obtain the first feature map.
  • the above S101 may include:
  • S201 Perform feature encoding on the acquired point cloud data to obtain an encoding feature map of the point cloud data.
  • the point cloud data obtained in the above steps includes multiple points, and each point is a 4-dimensional vector, including the three-dimensional coordinates and reflectivity of the point.
  • the above point cloud data is disordered. In order to facilitate accurate and rapid execution of the subsequent step of obtaining the first feature map, this step can preprocess the obtained disordered point cloud data with as little information loss as possible: feature encoding is performed on the point cloud data to obtain the feature encoding map of the point cloud data.
  • the specific encoding method for feature encoding of the point cloud data is not limited; for example, feature encoding can be performed on each point in the point cloud data based on the three-dimensional coordinates and/or reflectivity of each point, to obtain the feature encoding map of the point cloud data.
  • when object detection is performed, the object is usually projected onto the top view (bird's-eye view), and information such as the position of the object in the top view direction is detected.
  • the top view includes the two-dimensional coordinates of the three-dimensional data projected in the horizontal direction, together with the height data and reflectivity information corresponding to the two-dimensional coordinates. Therefore, the encoding method of performing feature encoding on the point cloud data in this step can also be to project the point cloud data into the top view and perform feature encoding on the point cloud data in the top view direction to obtain the feature encoding map of the point cloud data.
  • the size of the first feature map obtained in this step is consistent with the size of the feature encoding map of the point cloud data.
  • the encoded feature map may be input into the neural network model shown in FIG. 3B or FIG. 3C to obtain the first feature map of the point cloud data.
  • in this way, the generation speed of the first feature map can be increased, and the accuracy of the first feature map can be improved.
  • the foregoing S201 performs feature encoding on the acquired point cloud data to obtain an encoding feature map of the point cloud data, which may include:
  • this step can project the point cloud data into the top view direction, and then compress the point cloud data in the top view direction to obtain the encoding feature map of the point cloud data.
  • each three-dimensional point can be expressed as a four-dimensional vector (x, y, z, f); both the three-dimensional point cloud set and the points in it are disordered, and point cloud compression can compress such disordered point sets with little loss.
  • FIG. 5 is another flowchart of the object detection method provided by an embodiment of the application. As shown in FIG. 5, the foregoing compression step S300 may include:
  • S301: the point cloud data in the embodiment of the application is three-dimensional point cloud data, and the three-dimensional point cloud data is constrained according to a preset coding space, so as to constrain the three-dimensional point cloud data to the preset coding space.
  • This step can be understood as the compression of point cloud data.
  • the preset coding space can be understood as a cube, and the corresponding coding range is L × W × H, where L represents distance (length), W represents width, and H represents height, and the unit can be meters.
  • the embodiment of the present application does not limit the size of the preset coding space, which is specifically determined according to actual needs.
  • this step can constrain the three-dimensional point cloud data in the physical space to the preset coding space L × W × H.
  • S302 Perform grid division on the point cloud data in the coding space according to a preset resolution, and determine the characteristics of each grid.
  • the point cloud data is constrained to the coding space according to the above steps, and then the point cloud data in the coding space is grid-divided according to the preset resolution to obtain multiple grids, one of which may include One or more point clouds, or not including point clouds. Then, the characteristics of each grid are determined, for example, based on the point cloud included in the grid.
  • the preset resolution includes resolutions in three directions: length, width, and height.
  • the grid division of the point cloud data in the coding space according to the preset resolution in S302 may include the following step A1:
  • Step A1 Perform grid division on the point cloud data in the coding space according to the resolution in the three directions of length, width and height.
  • specifically, the point cloud data is divided in the length direction according to the resolution in the length direction and the length of the preset coding space; divided in the width direction according to the resolution in the width direction and the width of the preset coding space; and divided in the height direction according to the resolution in the height direction and the height of the preset coding space.
  • for example, the preset coding space is L × W × H, and the preset resolutions in the three directions of length, width, and height are dl, dw, and dh, so that the point cloud data in the coding space L × W × H is divided into grids according to dl, dw, and dh, and the size of the resulting grid is L/dl × W/dw × H/dh.
  • determining the characteristics of each grid in the foregoing S302 may specifically include the following step B1:
  • Step B1 Determine the characteristics of the grid according to the number of point clouds included in the grid and/or the reflectivity of the point clouds included in the grid.
  • specifically, after the grid of size L/dl × W/dw × H/dh is obtained, the number of point clouds included in each grid and/or the reflectivity of the point clouds included in each grid is obtained, and the characteristics of the grid are determined accordingly.
  • for example, the feature of the grid is determined according to the number of point clouds included: if the grid includes a point cloud, the feature of the grid is determined to be 1; if the grid does not include a point cloud, the feature of the grid is determined to be 0 (see the sketch below).
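  • A minimal sketch of this encoding is shown below: the point cloud is constrained to a preset coding space, divided into grids at the preset resolution, and each grid is marked 1 if it contains at least one point and 0 otherwise, following the C = H/dh, A = L/dl, B = W/dw relation described later. The coordinate-origin convention and the concrete sizes are illustrative assumptions.

```python
import numpy as np

def encode_point_cloud(points, L=80.0, W=40.0, H=4.0, dl=0.1, dw=0.1, dh=0.1):
    """points: N x 4 array of (x, y, z, reflectivity)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # constrain the point cloud data to the preset coding space L x W x H
    keep = (x >= 0) & (x < L) & (y >= 0) & (y < W) & (z >= 0) & (z < H)
    pts = points[keep]
    # grid indices along the length, width, and height directions
    ix = (pts[:, 0] / dl).astype(int)
    iy = (pts[:, 1] / dw).astype(int)
    iz = (pts[:, 2] / dh).astype(int)
    # C x A x B map: per height slice, occupancy in the top view
    grid = np.zeros((int(H / dh), int(L / dl), int(W / dw)), dtype=np.float32)
    grid[iz, ix, iy] = 1.0   # 1 if the grid includes a point cloud, else 0
    return grid
```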
  • the embodiment of the application determines the coding feature map of the point cloud data in the top view.
  • in the top view, the distance information and width information are retained while the height information is lost; therefore, the height information needs to be extracted as feature channels to obtain the final encoding feature map of the point cloud data.
  • for example, if the scale of the coding space L × W × H is 80 × 40 × 4 and the resolutions in length, width, and height are all 0.1, the final coded feature map C × A × B obtained is 80 × 800 × 400.
  • the object detection method provided by the embodiment of the application constrains the point cloud data to a preset coding space; divides the point cloud data in the coding space into grids according to a preset resolution and determines the characteristics of each grid; and, in the top view direction, obtains the encoding feature map of the point cloud data according to the characteristics of each grid, thereby obtaining the encoding feature map accurately.
  • FIG. 6 is another flowchart of the object detection method provided by the embodiment of the application, describing the specific process of detecting each pixel in the first feature map to obtain the first detection result of each pixel.
  • steps S401 to S403 are the specific process of obtaining the orientation of each pixel, step S404 is the specific process of obtaining the semantic category of each pixel, and steps S405 and S406 are the specific process of obtaining the distance between each pixel and the center point of the object to be detected. As shown in FIG. 6, obtaining the orientation of each pixel may include:
  • S401: divide the first feature map into multiple intervals; specifically, with the center of the first feature map as the center of the circle, the first feature map is divided into multiple intervals along the circumferential direction.
  • for example, with the center of the first feature map as the center of the circle, the first feature map is divided into several intervals covering [-180°, 180°], and the center of each interval is determined.
  • S402 Predict the interval to which the pixel point belongs, and the position coordinate of the pixel point in the interval to which the pixel point belongs.
  • that is, for each pixel, the interval to which the object orientation belongs is predicted, together with the residual amount within that interval, i.e., the relative position, which is the position coordinate of the pixel in the interval to which it belongs.
  • the prediction of the interval to which each pixel in the first feature map belongs is used as a classification problem, and the prediction of the position coordinate of each pixel in the interval to which it belongs is used as a regression problem.
  • the first feature map can be input into the trained prediction model to predict the interval to which each pixel in the first feature map belongs and the position coordinate of each pixel in the interval to which it belongs.
  • S403 Determine the orientation of the pixel point according to the angle of the interval to which the pixel point belongs and the position coordinate of the pixel point in the interval to which the pixel point belongs.
  • after the angle of the interval to which the pixel belongs and the position coordinate of the pixel in that interval are predicted, the specific angle of the pixel can be determined, and the orientation of the pixel can be determined according to the specific angle, as sketched below.
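  • A minimal sketch of decoding an orientation from the predicted interval and intra-interval position is shown below; the number of intervals and the convention that the position coordinate is a fraction of the interval width in [-0.5, 0.5] are illustrative assumptions.

```python
import numpy as np

NUM_BINS = 12                          # intervals covering [-180, 180)
BIN_WIDTH = 360.0 / NUM_BINS           # degrees per interval

def bin_center(bin_idx):
    # center angle of each interval
    return -180.0 + (bin_idx + 0.5) * BIN_WIDTH

def decode_orientation(bin_idx, residual):
    """bin_idx: interval predicted by classification; residual: predicted
    position coordinate within the interval, as a fraction of its width."""
    return bin_center(bin_idx) + residual * BIN_WIDTH

angle = decode_orientation(9, 0.25)    # interval 9, residual 0.25 -> 112.5 deg
```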
  • in some embodiments, detecting each pixel in the first feature map in S102 to obtain the first detection result of each pixel may include:
  • S404 Perform semantic category detection on the pixel to obtain the semantic category of the pixel.
  • in some embodiments, performing feature prediction on each pixel in the first feature map in S102 to obtain the first detection result of each pixel may further include:
  • S405 Perform position detection on the pixel point to obtain the distance between the pixel point and the center point of the object to be detected.
  • the vector distance between each pixel and the center point of the object to be detected can be obtained.
  • in the embodiment of the present application, the orientation, the semantic category, and the distance from the center point of the object to be detected are obtained for each pixel, thereby achieving accurate acquisition of the first detection result of each pixel.
  • FIG. 7 is another flowchart of the object detection method provided by the embodiment of the application, relating to the specific process of determining the second detection result of the object to be detected according to the first detection result of each pixel.
  • step S501 is the specific process of determining the semantic category of the object to be detected, and steps S502 to S504 are the specific process of determining the size of the object to be detected. As shown in FIG. 7, the foregoing S103 may include:
  • S501: cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map, to determine the semantic category of the object to be detected.
  • specifically, the pixels in the first feature map can be clustered to determine the semantic category of the object to be detected. For example, by clustering pixels with the same semantic category, one or more clustering results can be obtained; the clustering result including the largest number of pixels is regarded as the final clustering result, and the semantic category corresponding to the final clustering result is determined as the semantic category of the object to be detected.
  • the foregoing S501 may include, for example, step C1 and step C2;
  • Step C1 cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map, and obtain the cluster area.
  • specifically, clustering the pixels in the first feature map according to the semantic category of each pixel may produce multiple candidate cluster areas, and the largest candidate cluster area among them, that is, the candidate cluster area including the most pixels, is determined as the cluster area.
  • the foregoing method of clustering the pixels may be a bottom-up gradual clustering method.
  • Step C2 Determine the semantic category of the object to be detected according to the semantic category of each pixel in the cluster area.
  • according to the semantic category of each pixel in the cluster area, the semantic category of the object to be detected is determined. For example, if the semantic category of each pixel in the cluster area is pedestrian, the semantic category of the object to be detected is determined to be pedestrian.
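  • One simple realization of this bottom-up clustering, shown as a hedged sketch below, is a 4-connected flood fill over the per-pixel semantic category map that keeps the candidate cluster area containing the most pixels:

```python
import numpy as np
from collections import deque

def largest_cluster(category_map, target_class):
    """category_map: H x W array of per-pixel semantic categories."""
    h, w = category_map.shape
    seen = np.zeros((h, w), dtype=bool)
    best = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy, sx] or category_map[sy, sx] != target_class:
                continue
            cluster, queue = [], deque([(sy, sx)])  # grow one candidate area
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny, nx]
                            and category_map[ny, nx] == target_class):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(cluster) > len(best):            # keep the largest candidate
                best = cluster
    return best
```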
  • S502. Determine the center of the cluster area according to the spatial position of each pixel in the cluster area and the distance between each pixel in the cluster area and the center point of the object to be detected.
  • the position information of the pixel can be determined according to the preset resolution and the grid corresponding to the pixel.
  • specifically, the above first feature map is obtained based on the encoding feature map, and the encoding feature map is obtained by rasterizing at a certain resolution. Each pixel can be understood as a grid, so that the position information of the grid can be obtained based on the preset coding space and the preset resolution; the position information of the pixel corresponding to the grid can be determined according to the position information of the grid, and the accuracy of the position information is the resolution value.
  • the center position of the cluster area can be determined based on the spatial position of each pixel in the cluster area and the distance between each pixel in the cluster area and the center point of the object to be detected.
  • the center of the cluster area coincides with the center point of the object to be detected, so that the position of the center point of the object to be detected, and thus the center of the cluster area, can be determined according to the spatial position of each pixel and the distance between each pixel and the center point of the object to be detected.
  • the foregoing S502 includes step D;
  • Step D: Determine the center of the cluster area according to the spatial position of each pixel in the cluster area, the distance between each pixel in the cluster area and the center point of the object to be detected, and the first weight of each pixel in the cluster area.
  • that is, the first weight is used as the weight of the distance between the pixel and the center point of the object to be detected. In this way, when determining the center of the cluster area from the spatial position of each pixel in the cluster area and the distance between each pixel and the center point of the object to be detected, the first weight of each pixel is taken into account, thereby improving the calculation accuracy of the center of the cluster area.
  • the first weight of each pixel in the cluster area is the semantic category probability value of each pixel in the cluster area.
  • S503 Determine the orientation of the cluster area according to the orientation of each pixel in the cluster area.
  • each pixel has an orientation, so that the orientation of the cluster area can be determined according to the orientation of each pixel.
  • for example, the orientation shared by the most pixels in the cluster area is taken as the orientation of the cluster area; if most pixels in the cluster area have orientation a, the orientation of the cluster area is also a.
  • the foregoing S503 includes step E;
  • Step E Determine the orientation of the cluster area according to the orientation of each pixel in the cluster area and the second weight of each pixel in the cluster area.
  • that is, the second weight of the orientation of each pixel is taken into account, thereby improving the accuracy of determining the orientation of the cluster area.
  • the second weight of each pixel in the cluster area may be a semantic category probability value of each pixel in the cluster area.
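  • A minimal sketch of this weighted fusion is shown below: each pixel votes for the object center with its own spatial position plus its predicted offset to the center, and for the orientation, and the votes are averaged using the semantic category probability values as the first/second weights. The exact prediction encodings are illustrative assumptions.

```python
import numpy as np

def fuse_cluster(positions, center_offsets, orientations, probs):
    """positions: K x 2 pixel spatial positions; center_offsets: K x 2
    predicted offsets to the object center; orientations: K angles (deg);
    probs: K semantic category probability values used as weights."""
    w = probs / probs.sum()
    center = ((positions + center_offsets) * w[:, None]).sum(axis=0)
    # average angles as unit vectors to handle the -180/180 wrap-around
    rad = np.deg2rad(orientations)
    vec = (np.stack([np.cos(rad), np.sin(rad)], axis=1) * w[:, None]).sum(axis=0)
    orientation = np.rad2deg(np.arctan2(vec[1], vec[0]))
    return center, orientation
```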
  • the orientation of the object to be detected can be determined according to the orientation of the cluster area, for example, the orientation of the cluster area is determined as the orientation of the object to be detected.
  • S504 Determine the size of the object to be detected according to the center of the cluster area and the orientation of the cluster area.
  • the size of the cluster area can be determined, and then the size of the object to be detected can be determined according to the size of the cluster area.
  • in some embodiments, the foregoing S504 may include step F1 and step F2.
  • Step F1 Determine the largest circumscribed rectangular frame of the cluster area according to the center of the cluster area and the orientation of the cluster area.
  • the largest circumscribed rectangular frame of the cluster area is obtained by fitting.
  • Step F2 Determine the size of the object to be detected according to the largest circumscribed rectangular frame of the cluster area.
  • the size of the largest circumscribed rectangular frame of the cluster area is taken as the size of the object to be detected.
  • this embodiment of the present application may also determine the three-dimensional coordinates of the object to be detected according to the largest circumscribed rectangular frame of the cluster area; that is, the three-dimensional coordinates of the largest circumscribed rectangular frame are used as the three-dimensional coordinates of the object to be detected.
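  • A minimal sketch of this rectangle fitting is shown below: with the cluster center fixed as the frame center and the cluster orientation as a constraint, the cluster pixels are rotated into the orientation-aligned frame and the symmetric extents are taken as the circumscribed size. This is one straightforward realization, not necessarily the fitting procedure of this application.

```python
import numpy as np

def fit_oriented_box(positions, center, orientation_deg):
    """positions: K x 2 cluster pixel positions; returns (length, width)."""
    rad = np.deg2rad(orientation_deg)
    rot = np.array([[np.cos(rad), np.sin(rad)],
                    [-np.sin(rad), np.cos(rad)]])   # world -> box frame
    local = (positions - center) @ rot.T
    # circumscribed extents, symmetric about the fixed center
    half_l = np.abs(local[:, 0]).max()
    half_w = np.abs(local[:, 1]).max()
    return 2.0 * half_l, 2.0 * half_w
```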
  • the object detection method provided by the embodiment of the present application clusters the pixels in the first feature map according to the semantic category of each pixel to determine the semantic category of the object to be detected; determines the orientation of the cluster area according to the orientation of each pixel in the cluster area; and determines the largest circumscribed rectangular frame of the object to be detected according to the center and the orientation of the cluster area, from which the size and the three-dimensional coordinates of the object to be detected can be determined. This achieves accurate determination of the semantic category, the three-dimensional coordinates, and the size of the object to be detected.
  • FIG. 8 is a schematic diagram of an electronic device provided by an embodiment of the application.
  • the electronic device 200 of the embodiment of the application includes at least one memory 210 and at least one processor 220.
  • the memory 210 is used to store a computer program;
  • the processor 220 is used to execute the computer program, specifically:
  • the processor 220, when executing the computer program, is specifically configured to: perform first processing on the acquired point cloud data of the object to be detected to obtain a first feature map; detect each pixel in the first feature map to obtain the first detection result of each pixel; and determine the second detection result of the object to be detected according to the first detection result of each pixel.
  • the electronic device of the embodiment of the present application may be used to execute the technical solution of the method embodiment shown above, and its implementation principles and technical effects are similar, and will not be repeated here.
  • in some embodiments, the processor 220 is specifically configured to perform feature encoding on the acquired point cloud data to obtain an encoding feature map of the point cloud data, and to perform the first processing on the encoding feature map to obtain the first feature map.
  • the processor 220 is specifically configured to compress the point cloud data in a top view direction to obtain an encoding feature map of the point cloud data.
  • the processor 220 is specifically configured to constrain the point cloud data to a preset coding space; according to a preset resolution, perform a calculation of the point cloud data in the coding space Perform grid division and determine the characteristics of each grid; and in the top view direction, obtain the coded feature map of the point cloud data according to the characteristics of each grid.
  • in some embodiments, the preset resolution includes resolutions in the three directions of length, width, and height, and the processor 220 is specifically configured to divide the point cloud data in the coding space into grids according to the resolutions in the three directions of length, width, and height.
  • in some embodiments, the processor 220 is specifically configured to divide the point cloud data in the length direction according to the resolution in the length direction and the length of the preset coding space; divide the point cloud data in the width direction according to the resolution in the width direction and the width of the preset coding space; and divide the point cloud data in the height direction according to the resolution in the height direction and the height of the preset coding space.
  • in some embodiments, the processor 220 is specifically configured to determine the characteristics of the grid according to the number of point clouds included in the grid and/or the reflectivity of the point clouds included in the grid.
  • the scale of the encoding feature map is C ⁇ A ⁇ B, where C is determined by the ratio of the height of the preset encoding space to the resolution in the height direction, and A is determined by the ratio of the length of the preset coding space to the resolution in the length direction, and the B is determined by the ratio of the width of the preset coding space to the resolution in the width direction.
  • the first processing includes at least one of the following: at least one convolution operation, at least one sampling operation, and at least one stacking operation.
  • the sampling operation includes: a down sampling operation and/or an up sampling operation.
  • the size of the first feature map and the encoding feature map are the same.
  • in some embodiments, the first detection result of the pixel includes at least one of the following: the semantic category of the pixel, the orientation of the pixel, and the distance between the pixel and the center point of the object to be detected.
  • when the first detection result of the pixel includes the semantic category of the pixel, the processor 220 is specifically configured to perform category detection on the pixel to obtain the semantic category of the pixel.
  • when the first detection result of the pixel includes the orientation of the pixel, the processor 220 is specifically configured to: divide the first feature map into a plurality of intervals along the circumferential direction with the center of the first feature map as the center of the circle; predict the interval to which the pixel belongs and the position coordinate of the pixel in that interval; and determine the orientation of the pixel according to the angle of the interval to which the pixel belongs and the position coordinate of the pixel in that interval.
  • when the first detection result of the pixel includes the distance between the pixel and the center point of the object to be detected, the processor 220 is specifically configured to perform position detection on the pixel to obtain the distance between the pixel and the center point of the object to be detected.
  • the second detection result of the object to be detected includes at least one of the following: semantic category of the object to be detected, size of the object to be detected, and three-dimensional coordinates of the object to be detected .
  • when the second detection result of the object to be detected includes the semantic category of the object to be detected, the processor 220 is specifically configured to cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map to determine the semantic category of the object to be detected.
  • in some embodiments, the processor 220 is specifically configured to cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map to obtain the cluster area, and to determine the semantic category of the object to be detected according to the semantic category of each pixel in the cluster area.
  • the method of clustering the pixels is a method of gradually clustering from bottom to top.
  • when the second detection result of the object to be detected includes the size of the object to be detected, the processor 220 is specifically configured to: determine the center of the cluster area according to the spatial position of each pixel in the cluster area and the distance between each pixel in the cluster area and the center point of the object to be detected; determine the orientation of the cluster area according to the orientation of each pixel in the cluster area; and determine the size of the object to be detected according to the center of the cluster area and the orientation of the cluster area.
  • in some embodiments, the processor 220 is specifically configured to determine the center of the cluster area according to the spatial position of each pixel in the cluster area, the distance between each pixel in the cluster area and the center point of the object to be detected, and the first weight of each pixel in the cluster area.
  • in some embodiments, the processor 220 is specifically configured to determine the orientation of the cluster area according to the orientation of each pixel in the cluster area and the second weight of each pixel in the cluster area.
  • the first weight and/or the second weight of each pixel in the cluster area is a semantic category probability value of each pixel in the cluster area.
  • in some embodiments, the processor 220 is specifically configured to determine the largest circumscribed rectangular frame of the cluster area according to the center of the cluster area and the orientation of the cluster area, and to determine the size of the object to be detected according to the largest circumscribed rectangular frame of the cluster area.
  • the processor 220 is specifically configured to use the center of the cluster region as the center of the largest circumscribed rectangular frame, and use the orientation of the cluster region as a constraint to obtain The largest circumscribed rectangular frame of the cluster area.
  • the processor 220 is further configured to determine the three-dimensional coordinates of the object to be detected according to the largest circumscribed rectangular frame of the cluster area.
  • the processor 220 is further configured to obtain point cloud data of the object to be detected.
  • the processor 220 is specifically configured to obtain the point cloud data of the object to be detected collected by the depth sensor.
  • the processor 220 is specifically configured to obtain the first image and the second image collected by the binocular camera on the object to be detected; according to the first image and the second image Image to obtain point cloud data of the object to be detected.
  • the electronic device 200 of the embodiment of the present application may be used to implement the technical solutions of the method embodiments shown above, and its implementation principles and technical effects are similar, and will not be repeated here.
  • FIG. 9 is another schematic diagram of an electronic device provided by an embodiment of the application. Based on the foregoing embodiment, as shown in FIG. 9, the electronic device 200 of the embodiment of the application further includes a binocular camera 230,
  • the binocular camera 230 is used to collect the first image and the second image of the object to be detected;
  • the processor 220 is specifically configured to obtain the first image and the second image collected by the binocular camera; and obtain a point cloud of the object to be detected according to the first image and the second image data.
  • each point cloud in the point cloud data includes three-dimensional data and reflectivity of the point cloud.
  • the electronic device of the embodiment of the present application can be used to implement the technical solution of the method embodiment shown above, and its implementation principles and technical effects are similar, and will not be repeated here.
  • FIG. 10 is a schematic structural diagram of a vehicle provided by an embodiment of the application.
  • a vehicle 50 in this embodiment includes a body 51 and an electronic device 52 installed on the body 51.
  • the electronic device 52 is the electronic device shown in FIG. 8 or FIG. 9, and the electronic device 52 is used for object detection, for example, detecting obstacles on the running path of the vehicle.
  • for example, the electronic device 52 is installed on the roof of the vehicle body 51. If the electronic device is the electronic device shown in FIG. 9, the binocular camera in the electronic device 52 can face the front or the rear of the vehicle for collecting the first image and the second image.
  • the electronic device 52 is installed on the front windshield of the vehicle body 51, or the electronic device 52 is installed on the rear windshield of the vehicle body 51.
  • the electronic device 52 is installed on the front of the vehicle body 51, or the electronic device 52 is installed on the rear of the vehicle body 51.
  • the embodiment of the present application does not limit the installation position of the electronic device 52 on the body 51, which is specifically determined according to actual needs.
  • the vehicle of the embodiment of the present application may be used to implement the technical solution of the embodiment of the object detection method shown above, and its implementation principles and technical effects are similar, and will not be repeated here.
  • FIG. 11 is a schematic structural diagram of a transportation tool provided by an embodiment of this application.
  • the transportation tool 60 of this embodiment includes: a body 61 and an electronic device 62 installed on the body 61.
  • the electronic device 62 is the electronic device shown in FIG. 8 or FIG. 9, and the electronic device 62 is used for object detection, such as detecting obstacles on the traveling path of the transportation tool.
  • the transportation tool 60 in this embodiment may be a ship, automobile, bus, railway vehicle, aircraft, railway locomotive, scooter, bicycle, etc.
  • the electronic device 62 can be installed on the front, rear, or middle of the body 61, etc.
  • the embodiment of the present application does not limit the installation position of the electronic device 62 on the body 61, which is specifically determined according to actual needs.
  • the transportation tool of the embodiment of the present application can be used to implement the technical solution of the above-mentioned object detection method embodiment, and its implementation principles and technical effects are similar, and will not be repeated here.
  • in summary, this dense detection method makes the recall rate of object detection high.
  • the orientation and three-dimensional position are predicted at the pixel level and then weighted and fused; this method can obtain more reliable and stable positioning accuracy.
  • combining the orientation prediction with the prior that target frames in the top view are unlikely to overlap effectively reduces the amount of computation for rectangular frame fitting.
  • the solution provided by this application can therefore efficiently realize high-precision three-dimensional dynamic obstacle positioning, which is very suitable for dynamic obstacle perception in autonomous driving scenarios.
  • FIG. 12 is a schematic structural diagram of a drone provided by an embodiment of the application.
  • the drone 100 shown in FIG. 12 may be a multi-rotor, fixed-wing, or other type of drone, where the multi-rotor drone may be a quad-rotor, hexa-rotor, octo-rotor, or a drone with another number of rotors.
  • a rotary wing drone is taken as an example for description.
  • the drone 100 may include a power system 150, a flight control system 160, a frame, and electronic equipment 120 fixed on the frame.
  • the frame may include a fuselage and a tripod (also called a landing gear).
  • the fuselage may include a center frame and one or more arms connected to the center frame, and the one or more arms extend radially from the center frame.
  • the tripod is connected with the fuselage and used for supporting the UAV 100 when it is landed.
  • the power system 150 may include one or more electronic speed controllers (ESCs) 151, one or more propellers 153, and one or more motors 152 corresponding to the one or more propellers 153, where the motor 152 is connected between the electronic speed controller 151 and the propeller 153, and the motor 152 and the propeller 153 are arranged on the arm of the drone 100; the electronic speed controller 151 is used to receive the driving signal generated by the flight control system 160 and provide driving current to the motor 152 according to the driving signal, so as to control the speed of the motor 152.
  • the motor 152 is used to drive the propeller to rotate, thereby providing power for the flight of the drone 100, and the power enables the drone 100 to realize one or more degrees of freedom of movement.
  • the drone 100 may rotate around one or more rotation axes.
  • the aforementioned rotation axis may include a roll axis (Roll), a yaw axis (Yaw), and a pitch axis (pitch).
  • the motor 152 may be a DC motor or an AC motor.
  • the motor 152 may be a brushless motor or a brushed motor.
  • the flight control system 160 may include a flight controller 161 and a sensing system 162.
  • the sensing system 162 is used to measure the attitude information of the drone, that is, the position information and state information of the drone 100 in space, such as three-dimensional position, three-dimensional angle, three-dimensional velocity, three-dimensional acceleration, and three-dimensional angular velocity.
  • the sensing system 162 may include, for example, at least one of sensors such as a gyroscope, an ultrasonic sensor, an electronic compass, an inertial measurement unit (IMU), a vision sensor, a global navigation satellite system, and a barometer.
  • the global navigation satellite system may be a global positioning system (Global Positioning System, GPS).
  • the flight controller 161 is used to control the flight of the drone 100; for example, it can control the flight of the drone 100 according to the attitude information measured by the sensing system 162. It should be understood that the flight controller 161 can control the drone 100 according to pre-programmed program instructions, and can also control the drone 100 by responding to one or more control instructions from the control terminal 140.
  • the electronic device 120 is used to implement object detection and send the detection result to the flight control system 160.
  • the above flight control system 160 controls the flight of the drone 100 according to the object detection result.
  • the electronic device further includes a photographing component, and the photographing component is a binocular camera for collecting the first image and the second image.
  • the photographing component of this embodiment at least includes a photosensitive element, and the photosensitive element is, for example, a Complementary Metal Oxide Semiconductor (CMOS) sensor or a Charge-coupled Device (CCD) sensor.
  • the unmanned aerial vehicle of the embodiment of the present application can be used to implement the technical solution of the object detection method in the foregoing method embodiment, and its implementation principles and technical effects are similar, and will not be repeated here.
  • the embodiment of the present application also provides a computer storage medium.
  • the computer storage medium is used to store computer software instructions for the object detection described above, which, when run on a computer, enable the computer to execute the various possible object detection methods in the foregoing method embodiments.
  • when the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application can be generated in whole or in part.
  • the computer instructions can be stored in a computer storage medium, or transmitted from one computer storage medium to another computer storage medium; for example, the transmission may be performed wirelessly (such as by cellular communication, infrared, short-range wireless, or microwave) to another website, computer, server, or data center.
  • the computer storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, an SSD).
  • a person of ordinary skill in the art can understand that all or part of the steps in the above method embodiments can be implemented by a program instructing relevant hardware.
  • the foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes any medium that can store program code, such as read-only memory (ROM), random access memory (RAM), magnetic disks, or optical discs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An object detection method, an electronic device (200), and a computer storage medium, the method comprising: performing first processing on acquired point cloud data of an object to be detected to acquire a first feature map (S101); performing detection on each pixel in the first feature map, to acquire a first detection result for each pixel (S102); and on the basis of the first detection result for each pixel, determining a second detection result for the object to be detected (S103). By means of performing pixel-level detection on each pixel in a feature map, such a mode of dense detection is capable of improving the recall rate of object detection, thus improving the accuracy of object detection.

Description

Object detection method, electronic device, and computer storage medium
Technical Field
The embodiments of the present invention relate to the technical field of object detection, and in particular to an object detection method, an electronic device, and a computer storage medium.
Background
With the development of image processing technology, intelligent driving technology has received extensive attention. Intelligent driving includes automatic driving and assisted driving. One of its core technologies is obstacle detection, and the accuracy of obstacle detection results directly affects the safety and reliability of intelligent driving.
A common obstacle detection technology uses sensors mounted on the vehicle, such as cameras, lidars, and millimeter-wave radars, to detect dynamic obstacles (for example, vehicles and pedestrians) in the road scene, and then obtains information such as the attitude orientation, three-dimensional position, and physical dimensions of the dynamic obstacles.
However, existing obstacle detection methods have poor detection accuracy and cannot achieve accurate detection of obstacles.
Summary
The embodiments of the present invention provide an object detection method, an electronic device, and a computer storage medium to achieve accurate detection of objects.
In a first aspect, an object detection method according to an embodiment of the present application includes:
performing first processing on acquired point cloud data of an object to be detected to obtain a first feature map;
detecting each pixel in the first feature map to obtain a first detection result of each pixel; and
determining a second detection result of the object to be detected according to the first detection result of each pixel.
In a second aspect, an electronic device according to an embodiment of the present application includes:
a memory, used to store a computer program; and
a processor, configured to execute the computer program, and specifically to:
perform first processing on acquired point cloud data of an object to be detected to obtain a first feature map;
detect each pixel in the first feature map to obtain a first detection result of each pixel; and
determine a second detection result of the object to be detected according to the first detection result of each pixel.
In a third aspect, a vehicle according to an embodiment of the present application includes: a vehicle body and the electronic device as described in the second aspect installed on the vehicle body.
In a fourth aspect, a means of transportation according to an embodiment of the present application includes: a transportation body and the electronic device as described in the second aspect installed on the transportation body.
In a fifth aspect, an unmanned aerial vehicle according to an embodiment of the present application includes: a fuselage and the electronic device as described in the second aspect installed on the fuselage.
In a sixth aspect, an embodiment of the present application provides a computer storage medium storing a computer program which, when executed, implements the object detection method as described in the first aspect.
In the object detection method, electronic device, and computer storage medium provided by the embodiments of the present application, first processing is performed on the acquired point cloud data of the object to be detected to obtain a first feature map; each pixel in the first feature map is detected to obtain a first detection result of each pixel; and a second detection result of the object to be detected is determined according to the first detection result of each pixel. That is, in the embodiments of the present application, pixel-level detection is performed on each pixel in the first feature map; this dense detection approach can increase the recall rate of object detection and thereby improve its accuracy.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on them without creative work.
FIG. 1 is a schematic diagram of an application scenario involved in an embodiment of this application;
FIG. 2 is a flowchart of an object detection method provided by an embodiment of this application;
FIG. 3A is a schematic diagram of the framework of a neural network model involved in an embodiment of this application;
FIG. 3B is a schematic diagram of a neural network model involved in an embodiment of this application;
FIG. 3C is a schematic diagram of another neural network model involved in an embodiment of this application;
FIG. 4 is another flowchart of an object detection method provided by an embodiment of this application;
FIG. 5 is another flowchart of an object detection method provided by an embodiment of this application;
FIG. 6 is another flowchart of an object detection method provided by an embodiment of this application;
FIG. 7 is another flowchart of an object detection method provided by an embodiment of this application;
FIG. 8 is a schematic diagram of an electronic device provided by an embodiment of this application;
FIG. 9 is another schematic diagram of an electronic device provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of a vehicle provided by an embodiment of this application;
FIG. 11 is a schematic structural diagram of a means of transportation provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of an unmanned aerial vehicle provided by an embodiment of this application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
In the description of this application, unless otherwise specified, "plurality" means two or more.
In addition, in order to clearly describe the technical solutions of the embodiments of the present application, words such as "first" and "second" are used to distinguish identical or similar items with substantially the same function and effect. Those skilled in the art can understand that words such as "first" and "second" do not limit the quantity or order of execution, nor do they imply that the items are necessarily different.
The embodiments of this application can be applied to any field that requires object detection, for example, intelligent driving fields such as automatic driving and assisted driving, where they can detect obstacles such as vehicles and pedestrians in road scenes and thereby improve the safety of intelligent driving.
FIG. 1 is a schematic diagram of an application scenario involved in an embodiment of this application. As shown in FIG. 1, an intelligent driving vehicle includes a detection device. While the vehicle is driving, the detection device can detect obstacles in the lane ahead (such as falling rocks, spilled objects, dead branches, pedestrians, and vehicles) to obtain detection information such as the position, attitude orientation, and size of the obstacles, and plan the intelligent driving state accordingly, for example, changing lanes, decelerating, or stopping.
Optionally, the detection device may specifically be a radar, an ultrasonic detection device, a time-of-flight (TOF) ranging detection device, a visual detection device, a laser detection device, or the like, or a combination thereof.
It should be noted that FIG. 1 is a schematic diagram of one application scenario of this application; the application scenarios of the embodiments of this application include but are not limited to that shown in FIG. 1.
The technical solutions of the present invention are described in detail below with specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some of them.
FIG. 2 is a flowchart of an object detection method provided by an embodiment of this application. As shown in FIG. 2, the method of the embodiment includes:
S101. Perform first processing on the acquired point cloud data of the object to be detected to obtain a first feature map of the point cloud data.
The execution subject of the embodiments of this application is an apparatus with an object detection function, for example, a detection apparatus, which can be integrated in any electronic device as a part of it. Optionally, the detection apparatus may also be a stand-alone electronic device.
The electronic device may be a vehicle-mounted device, such as a driving recorder. The carrier of the electronic device may also be an aircraft, an unmanned aerial vehicle, a smart handheld device, a smart handheld device with a stabilizing gimbal, or the like.
The embodiments of this application are described by taking an electronic device as the execution subject.
Each point in the above point cloud data includes information such as its three-dimensional data and reflectivity, where the three-dimensional data of a point includes its three-dimensional coordinates in the point cloud data coordinate system.
Optionally, the first processing includes at least one of the following: at least one convolution operation, at least one sampling operation, and at least one stacking operation.
Optionally, the sampling operation may include a downsampling operation and/or an upsampling operation.
In a possible implementation, the point cloud data of the object to be detected is input into the neural network model shown in FIG. 3A, which performs the first processing on the point cloud data to obtain the first feature map of the point cloud data. The neural network model includes N layers of feature maps. Between the layer-1 feature map and the layer-2 feature map there are at least one convolution and at least one downsampling operation, where the convolution/downsampling process extracts high-level information and enlarges the receptive field of the neurons. Between the layer-2 feature map and the layer-3 feature map there are at least one convolution and at least one downsampling operation. Between the layer-3 feature map and the layer-(N-2) feature map there is at least one stacking operation, where stacking gives the feature maps better detail information during the upsampling process. Between the layer-(N-2) feature map and the layer-(N-1) feature map there are at least one convolution and at least one upsampling operation, where the convolution/upsampling process extracts high-level information and enables pixel-level detection. Between the layer-(N-1) feature map and the layer-N feature map there are also at least one convolution and at least one upsampling operation.
It should be noted that FIG. 3A is only one example of a neural network model; the neural network models of the embodiments of this application include but are not limited to that shown in FIG. 3A, and the number of operations performed between the feature maps shown in FIG. 3A can be set according to computing resource requirements. Optionally, as shown in FIG. 3A, the output of the layer-2 feature map can also be used as the input of the layer-(N-1) feature map.
In one example, the neural network model used in the embodiments of this application may also be a segmentation network model as shown in FIG. 3B. Exemplarily, the segmentation network model has the structure shown in the figure and includes, for example, 9 layers. In layers 1 to 5, each layer includes at least one convolution operation (Conv), with a pooling operation (Pooling) between two adjacent layers; the convolution and pooling operations implement downsampling. In layers 5 to 9, each layer includes at least one convolution operation (Conv), with an up-convolution operation (up-Conv) between two adjacent layers; the up-convolution operation implements upsampling. Stacking can be implemented between different layers: as shown in FIG. 3B, the output of layer 1 can be input to layer 9 for stacking, the output of layer 2 to layer 8, the output of layer 3 to layer 7, and the output of layer 4 to layer 6. Stacking can implement multi-scale feature fusion, point-wise feature addition, and concatenation along the feature channel dimension; a pixel-level segmentation map can be obtained, and a semantic category judgment can be made for each pixel. The output of the m-th layer is stacked directly with the (N-m)-th layer, thereby fusing feature maps of the same level and the same size. Because the m-th layer's feature map has undergone fewer convolutions, it carries more shallow information; after stacking, the information output by the m-th layer can be fused with information that has not yet undergone multiple convolutions and pooling, thereby splicing and aligning shallow and deep information at the pixel scale. During stacking, since feature maps at the same level have equal sizes, splicing and alignment only need to match the feature map size, which greatly facilitates the fusion of deep semantic information with shallow feature map information at each stacking step. In another example, the neural network model used in the embodiments of this application may be the deep network shown in FIG. 3C, in which the encoder module encodes multi-scale context information by applying atrous (dilated) convolution at multiple scales, while the decoder module refines the segmentation result along object boundaries. This network can achieve pixel-level classification.
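To make the figures concrete, the following is a minimal sketch of such an encoder-decoder with one skip connection, assuming PyTorch; the layer count, channel widths, and class count are illustrative choices, not values taken from the patent.

```python
import torch
import torch.nn as nn

class MiniSegNet(nn.Module):
    def __init__(self, in_ch=80, num_classes=4):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                          # downsampling
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)   # upsampling
        self.dec1 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, num_classes, 1)            # per-pixel output

    def forward(self, x):
        e1 = self.enc1(x)                       # shallow, full-resolution features
        e2 = self.enc2(self.pool(e1))           # deep features at half resolution
        d1 = self.up(e2)                        # back to full resolution
        d1 = self.dec1(torch.cat([e1, d1], 1))  # stacking / skip connection
        return self.head(d1)                    # pixel-level prediction map

x = torch.randn(1, 80, 128, 128)   # a downscaled stand-in for the CxAxB map
out = MiniSegNet()(x)              # -> (1, num_classes, 128, 128)
```

The torch.cat call is the "stacking" described above: same-size shallow and deep feature maps are concatenated along the channel dimension before further convolution.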
S102. Detect each pixel in the first feature map to obtain a first detection result of each pixel.
After the first feature map is obtained in the above step, each pixel in the first feature map is detected to obtain a detection result of each pixel, which is recorded as the first detection result.
For example, the first feature map is input into a neural network model, and the first detection result of each pixel is predicted.
In one example, the above steps S101 and S102 can be performed by one neural network model. For example, the point cloud data of the object to be detected is input into the neural network model shown in FIG. 3B or FIG. 3C, which first produces the first feature map of the point cloud data and then, taking the first feature map as input, predicts the first detection result of each pixel in the first feature map.
Optionally, the first detection result of a pixel includes at least one of the following: the semantic category of the pixel, the orientation of the pixel, and the distance between the pixel and the center point of the object to be detected.
S103. Determine a second detection result of the object to be detected according to the first detection result of each pixel.
Optionally, the second detection result of the object to be detected includes at least one of the following: the semantic category of the object to be detected, the three-dimensional coordinates of the object to be detected, and the size of the object to be detected.
For example, the semantic category of the object to be detected can be obtained from the semantic category of each pixel detected in the above step; and the three-dimensional coordinates and size of the object to be detected can be obtained from the spatial position and orientation of each pixel and the distance between each pixel and the center point of the object to be detected.
In this step, the second detection result of the object to be detected is determined based on the first detection result of each pixel; such pixel-level object detection can improve the accuracy of object detection.
In the object detection method provided by the embodiments of this application, first processing is performed on the acquired point cloud data of the object to be detected to obtain a first feature map; each pixel in the first feature map is detected to obtain a first detection result of each pixel; and a second detection result of the object to be detected is determined according to the first detection result of each pixel. That is, by performing pixel-level detection on each pixel in the first feature map, this dense detection approach can increase the recall rate of object detection and thereby improve its accuracy.
Before the above S101, the embodiment of this application further includes:
S100. Acquire point cloud data of the object to be detected.
This step does not limit the way the point cloud data of the object to be detected is acquired, which is determined according to actual needs.
In one example, a depth sensor collects the point cloud data of the object to be detected, and the electronic device obtains the collected point cloud data from the depth sensor.
Optionally, the depth sensor can be installed on the electronic device as a part of it.
Optionally, the depth sensor and the electronic device are two separate components that are communicatively connected, and the depth sensor can transmit the collected point cloud data of the object to be detected to the electronic device. Optionally, the communication connection between the depth sensor and the electronic device may be wired or wireless, which is not limited here. Optionally, the depth sensor may be a radar, an ultrasonic detection device, a TOF ranging detection device, a laser detection device, or the like, or a combination thereof.
In another example, the electronic device may also acquire the point cloud data of the object to be detected as follows: the electronic device obtains a first image and a second image of the object to be detected collected by a binocular camera, and obtains the point cloud data of the object to be detected according to the first image and the second image.
In this approach, suppose that, in the intelligent driving field, a binocular camera is installed on the vehicle and captures the road scene, thereby collecting the first image and the second image of the object to be detected. Optionally, the first image is the left-eye image and the second image is the right-eye image, or vice versa. The electronic device matches the pixels of the first image and the second image to obtain the disparity value of each pixel; then, based on the triangulation principle and the disparity value of each pixel, the point cloud data corresponding to each physical point on the object to be detected can be obtained.
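The triangulation step can be sketched as follows, assuming a rectified stereo pair with known focal length f, baseline, and principal point (cx, cy); all numeric values here are example parameters, not the patent's.

```python
# Standard-stereo triangulation from a dense disparity map to a point cloud.
import numpy as np

def disparity_to_points(disp, f=700.0, baseline=0.12, cx=640.0, cy=360.0):
    v, u = np.indices(disp.shape)          # pixel row/column coordinates
    valid = disp > 0                       # ignore unmatched pixels
    z = f * baseline / disp[valid]         # depth from disparity
    x = (u[valid] - cx) * z / f
    y = (v[valid] - cy) * z / f
    return np.stack([x, y, z], axis=-1)    # (N, 3) points in the camera frame

disp = np.random.uniform(1.0, 64.0, size=(720, 1280)).astype(np.float32)
points = disparity_to_points(disp)         # one 3-D point per matched pixel
```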
FIG. 4 is another flowchart of the object detection method provided by an embodiment of this application, showing the specific process of performing the first processing on the acquired point cloud data of the object to be detected to obtain the first feature map. As shown in FIG. 4, the above S101 may include:
S201. Perform feature encoding on the acquired point cloud data to obtain an encoding feature map of the point cloud data.
The point cloud data obtained in the above step includes multiple points, and each point is a 4-dimensional vector including its three-dimensional coordinates and reflectivity. The point cloud data is unordered. To enable the subsequent step of obtaining the first feature map to be carried out accurately and quickly, this step can preprocess the unordered point cloud data and perform feature encoding on it with as little information loss as possible, thereby obtaining the feature encoding map of the point cloud data.
This step does not restrict the specific way the point cloud data is feature-encoded. For example, each point in the point cloud data can be feature-encoded according to its three-dimensional coordinates and/or reflectivity to obtain the feature encoding map of the point cloud data.
Optionally, in the intelligent driving field, object detection is usually performed by projecting the object onto a top view and detecting information such as the object's position in the top-view direction. Here, the top view is an orthographic top view, which includes the two-dimensional coordinates of the three-dimensional data projected onto the horizontal plane, together with the height information and reflectivity information corresponding to those two-dimensional coordinates. Therefore, the feature encoding in this step can also be performed by projecting the point cloud data onto the top view and feature-encoding it in the top-view direction to obtain the feature encoding map of the point cloud data.
Optionally, the size of the first feature map obtained in this step is consistent with the size of the feature encoding map of the point cloud data.
S202. Perform the first processing on the encoding feature map to obtain the first feature map.
Exemplarily, the encoding feature map may be input into the neural network model shown in FIG. 3B or FIG. 3C to obtain the first feature map of the point cloud data.
In the embodiments of this application, by performing feature encoding on a large amount of unordered point cloud data and generating the first feature map based on the resulting encoding feature map, the generation speed of the first feature map can be increased and its accuracy can be improved.
On the basis of the foregoing embodiment, in a possible implementation, the above S201 of performing feature encoding on the acquired point cloud data to obtain the encoding feature map may include:
S300. Compress the point cloud data in the top-view direction to obtain the encoding feature map of the point cloud data.
In the intelligent driving field, object detection is usually performed in the top-view direction. Therefore, this step can project the point cloud data into the top-view direction and then compress it to obtain the encoding feature map of the point cloud data. In an optional embodiment, a three-dimensional point can be represented as a four-dimensional vector (x, y, z, f); when represented as feature vectors, the points in a three-dimensional point cloud set are unordered, and compressing the three-dimensional point cloud allows this unordered point set to be compressed with a small amount of loss.
FIG. 5 is another flowchart of the object detection method provided by an embodiment of this application. As shown in FIG. 5, the above S300 may include:
S301. Constrain the point cloud data to a preset coding space.
The point cloud data in the embodiments of this application is three-dimensional point cloud data, which is constrained according to a preset coding space so as to be confined within it.
This step can be understood as a compression of the point cloud data.
The preset coding space can be understood as a cuboid with a coding range of L×W×H, where L denotes length (distance), W denotes width, and H denotes height, for example in meters. The embodiments of this application do not limit the size of the preset coding space, which is determined according to actual needs.
In this way, this step constrains the three-dimensional point cloud data in physical space to the preset coding space L×W×H.
S302. Divide the point cloud data in the coding space into grids according to a preset resolution, and determine the feature of each grid cell.
Specifically, the point cloud data is constrained to the coding space according to the above step; then, the point cloud data in the coding space is divided into grid cells according to the preset resolution, yielding multiple cells, where each cell may contain one or more points or no points at all. Then, the feature of each cell is determined, for example according to the points it contains.
Optionally, the preset resolution includes resolutions in three directions: length, width, and height.
In a possible implementation, dividing the point cloud data in the coding space into grids according to the preset resolution in the above S302 may include the following step A1:
Step A1. Divide the point cloud data in the coding space into grids according to the resolutions in the three directions of length, width, and height.
Specifically, the point cloud data is divided in the length direction according to the resolution in the length direction and the length of the preset coding space; divided in the width direction according to the resolution in the width direction and the width of the preset coding space; and divided in the height direction according to the resolution in the height direction and the height of the preset coding space.
For example, if the preset coding space is L×W×H and the preset resolutions in the length, width, and height directions are dl, dw, and dh respectively, then dividing the point cloud data in the coding space L×W×H according to dl, dw, and dh yields a grid of L/dl × W/dw × H/dh cells.
In a possible implementation, determining the feature of each grid cell in the above S302 may specifically include the following step B1:
Step B1. Determine the feature of the grid cell according to the number of points included in the cell and/or the reflectivity of the points included in the cell.
According to the above step, a grid of L/dl × W/dw × H/dh cells is obtained; then, the number of points included in each cell and/or the reflectivity of the points included in each cell are obtained, and the feature of the cell is determined accordingly.
In one example, the feature of a grid cell is determined according to the number of points it includes: for example, if the cell includes points, its feature is determined to be 1; if it includes no points, its feature is determined to be 0.
S303. Obtain the encoding feature map of the point cloud data according to the feature of each grid cell in the top-view direction.
The embodiments of this application determine the encoding feature map of the point cloud data in the top view. In the top-view direction, the height information would be lost while the distance and width information remain, so the height information needs to be extracted to obtain the final encoding feature map of the point cloud data.
Optionally, the scale of the encoding feature map is C×A×B, where C is determined by the ratio of the height H of the preset coding space to the resolution dh in the height direction, for example C = 2H/dh; A is determined by the ratio of the length L of the preset coding space to the resolution dl in the length direction, for example A = L/dl; and B is determined by the ratio of the width W of the preset coding space to the resolution dw in the width direction, for example B = W/dw.
For example, if the coding space L×W×H is 80×40×4 and the resolution in the length, width, and height directions is 0.1 in each case, the resulting encoding feature map C×A×B is 80×800×400.
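Steps S301 to S303 can be sketched end to end with the example numbers above. The patent does not state why C = 2H/dh rather than H/dh; one plausible reading, assumed here, is that each height slice contributes two feature planes (occupancy and strongest reflectance), which reproduces the 80×800×400 map.

```python
import numpy as np

# Coding space and resolution from the example above.
L, W, H = 80.0, 40.0, 4.0          # metres
dl, dw, dh = 0.1, 0.1, 0.1

def encode(points):
    """points: (N, 4) array of (x, y, z, reflectivity)."""
    x, y, z, r = points.T
    # S301: constrain the cloud to the preset coding space
    # (here x is forward, y is lateral and centred, z is height).
    keep = (x >= 0) & (x < L) & (y >= -W / 2) & (y < W / 2) & (z >= 0) & (z < H)
    x, y, z, r = x[keep], y[keep] + W / 2, z[keep], r[keep]
    # S302: grid division; one occupancy and one reflectance feature per cell.
    A, B, C = int(L / dl), int(W / dw), int(H / dh)
    ix, iy, iz = (x / dl).astype(int), (y / dw).astype(int), (z / dh).astype(int)
    occ = np.zeros((C, A, B), dtype=np.float32)
    refl = np.zeros((C, A, B), dtype=np.float32)
    occ[iz, ix, iy] = 1.0                      # 1 if the cell holds any point
    np.maximum.at(refl, (iz, ix, iy), r)       # strongest return per cell
    # S303: top view -- height slices become channels; two feature planes
    # per slice give 2H/dh = 80 channels of size A x B = 800 x 400.
    return np.concatenate([occ, refl], axis=0)

pts = np.random.rand(10000, 4) * [80, 40, 4, 1] - [0, 20, 0, 0]
fmap = encode(pts)                             # shape (80, 800, 400)
```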
In the object detection method provided by the embodiments of this application, the point cloud data is constrained to a preset coding space; the point cloud data in the coding space is divided into grid cells according to a preset resolution and the feature of each cell is determined; and the encoding feature map of the point cloud data is obtained in the top-view direction according to the feature of each cell, thereby achieving accurate acquisition of the encoding feature map.
FIG. 6 is another flowchart of the object detection method provided by an embodiment of this application. On the basis of the foregoing embodiments, this part concerns the specific process of detecting each pixel in the first feature map to obtain the first detection result of each pixel. Steps S401 to S403 describe how the orientation of each pixel is obtained, step S404 how the semantic category of each pixel is obtained, and steps S405 and S406 how the distance between each pixel and the center point of the object to be detected is obtained. As shown in FIG. 6, the above S102 may include:
S401. Using the center of the first feature map as the center of a circle, divide the first feature map into multiple intervals along the circumferential direction.
First, the first feature map is divided into multiple intervals; specifically, taking the center of the first feature map as the center of a circle, the first feature map is divided into multiple intervals along the circumferential direction.
For example, taking the center of the first feature map as the center of the circle, the first feature map is divided evenly into several intervals over [-180°, 180°], and the center of each interval is determined.
S402. Predict the interval to which each pixel belongs, and the position coordinate of the pixel within that interval.
After the first feature map has been divided into multiple intervals, for each pixel in the first feature map an interval to which the object orientation belongs is predicted, together with a residual for that interval; the residual is the relative position of the pixel within the interval, that is, the pixel's position coordinate within it.
For example, predicting the interval to which each pixel in the first feature map belongs can be treated as a classification problem, and predicting the position coordinate of each pixel within its interval as a regression problem. In this way, the first feature map can be input into a trained prediction model to predict the interval to which each pixel belongs and the position coordinate of each pixel within that interval.
S403. Determine the orientation of the pixel according to the angle of the interval to which it belongs and its position coordinate within that interval.
From the angle of the interval predicted for the pixel and its position coordinate within the interval, the specific angle of the pixel can be determined, and from that angle, its orientation.
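A minimal decoding of this bin-plus-residual scheme might look as follows; the number of intervals K and the residual parameterization (degrees from the interval center) are assumptions, not values fixed by the patent.

```python
import numpy as np

K = 12                                   # assumed number of angular intervals
bin_width = 360.0 / K                    # degrees per interval
bin_centers = -180.0 + bin_width * (np.arange(K) + 0.5)

def decode_orientation(bin_logits, residual):
    """bin_logits: (K,) classification scores; residual: (K,) offsets in
    degrees from each bin center (only the chosen bin's entry is used)."""
    k = int(np.argmax(bin_logits))       # predicted interval (classification)
    angle = bin_centers[k] + residual[k] # interval center + in-bin position
    return (angle + 180.0) % 360.0 - 180.0  # wrap back into [-180, 180)

theta = decode_orientation(np.random.randn(K), np.random.randn(K))
```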
In one implementation, if the first detection result of a pixel includes the semantic category of the pixel, then detecting each pixel in the first feature map to obtain its first detection result in the above S102 may include:
S404. Perform semantic category detection on the pixel to obtain the semantic category of the pixel.
For example, the first feature map is input into a classification model to determine the probability that each pixel belongs to a certain semantic category, for instance the probability that a pixel belongs to a vehicle, a pedestrian, a bicycle, or the background.
In one implementation, if the first detection result of a pixel includes the distance between the pixel and the center point of the object to be detected, then detecting each pixel in the first feature map to obtain its first detection result in the above S102 may include:
S405. Perform position detection on the pixel to obtain the distance between the pixel and the center point of the object to be detected.
For example, by inputting the first feature map into a position detection model, the vector distance between each pixel and the center point of the object to be detected can be obtained.
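Combining this with the grid geometry above, each pixel's predicted offset can be turned into an absolute vote for the object center; a sketch, assuming the offset is a metric 2-D vector in the top view:

```python
import numpy as np

def center_votes(offsets, dl=0.1, dw=0.1):
    """offsets: (A, B, 2) per-pixel vector distance to the object center,
    in metres. Returns (A*B, 2) absolute center hypotheses."""
    A, B = offsets.shape[:2]
    ix, iy = np.indices((A, B))
    # Pixel position = cell center in the top-view metric frame.
    px = (ix + 0.5) * dl
    py = (iy + 0.5) * dw
    pos = np.stack([px, py], axis=-1)
    return (pos + offsets).reshape(-1, 2)

votes = center_votes(np.zeros((800, 400, 2)))   # trivial all-zero offsets
```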
In the method of the embodiments of this application, by detecting each pixel in the first feature map, the orientation, the semantic category, and the distance to the center point of the object to be detected are obtained for each pixel, thereby achieving accurate acquisition of the first detection result of each pixel.
FIG. 7 is another flowchart of the object detection method provided by an embodiment of this application. On the basis of the foregoing embodiments, this part concerns the specific process of obtaining the second detection result of the object to be detected according to the first detection result of each pixel. Step S501 describes how the semantic category of the object to be detected is determined, and step S502 how its size is determined. As shown in FIG. 7, the above S103 may include:
S501. Cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map, and determine the semantic category of the object to be detected.
In this step, the pixels in the first feature map can be clustered according to the semantic category of each pixel obtained in the above step, and the semantic category of the object to be detected is thereby determined. For example, clustering pixels with the same semantic category may yield one or more clustering results; the clustering result containing the largest number of pixels is taken as the final clustering result, and the semantic category corresponding to it is determined to be the semantic category of the object to be detected.
In a possible implementation, the above S501 may include steps C1 and C2:
Step C1. Cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map, and obtain the cluster region.
Specifically, clustering the pixels in the first feature map according to the semantic category of each pixel may yield multiple candidate cluster regions, and the largest candidate cluster region is determined to be the cluster region. Alternatively, the candidate cluster region containing the most pixels is determined to be the cluster region.
Optionally, the pixels can be clustered in a bottom-up, progressive manner, as in the sketch after step C2 below.
Step C2. Determine the semantic category of the object to be detected according to the semantic category of each pixel in the cluster region.
The semantic category of the object to be detected is determined according to the semantic category of each pixel in the cluster region; for example, if the semantic category of each pixel in the cluster region is pedestrian, the semantic category of the object to be detected is determined to be pedestrian.
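A sketch of steps C1 and C2, assuming same-class pixels are grouped by 4-connectivity; the patent only requires a bottom-up grouping, so the specific connectivity rule is an assumption.

```python
import numpy as np
from scipy import ndimage

def largest_cluster(class_map, cls):
    """class_map: (A, B) per-pixel semantic labels. Returns a boolean
    mask of the largest connected region of class `cls`."""
    labels, n = ndimage.label(class_map == cls)   # bottom-up grouping
    if n == 0:
        return np.zeros(class_map.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())[1:]       # pixel count per region
    return labels == (1 + int(np.argmax(sizes)))  # keep the biggest one

cmap = np.random.randint(0, 4, size=(800, 400))   # 0 assumed to be background
mask = largest_cluster(cmap, cls=1)               # e.g. class 1 = vehicle
```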
S502. Determine the center of the cluster region according to the spatial position of each pixel in the cluster region and the distance between each pixel in the cluster region and the center point of the object to be detected.
For each pixel, its position information can be determined according to the preset resolution and the grid cell corresponding to the pixel.
The first feature map is obtained from the encoding feature map, which in turn is obtained by rasterizing at a certain resolution, so each pixel can be understood as a grid cell. The position information of a cell can thus be obtained from the preset coding space and the preset resolution, and the position information of the pixel corresponding to that cell can be determined from it; the precision of this position information is the resolution value.
In this way, the center position of the cluster region can be determined from the spatial position of each pixel in the cluster region and the distance between each pixel and the center point of the object to be detected. For example, the center of the cluster region coincides with the center point of the object to be detected; thus, from the spatial position of each pixel and its distance to the center point of the object to be detected, the position of the center point of the object to be detected, and hence the center position of the cluster region, can be determined.
In a possible implementation, the above S502 includes step D:
Step D. Determine the center of the cluster region according to the spatial position of each pixel in the cluster region, the distance between each pixel in the cluster region and the center point of the object to be detected, and the first weight of each pixel in the cluster region.
The first weight serves as the weight of the distance between the pixel and the center point of the object to be detected. In this way, when determining the center of the cluster region from the spatial position of each pixel and each pixel's distance to the center point of the object to be detected, a first weight on that distance is added for each pixel, which improves the accuracy of the computed center of the cluster region.
Optionally, the first weight of each pixel in the cluster region is the semantic category probability value of that pixel.
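With the class probability as the first weight, the center estimate of step D reduces to a weighted mean of the per-pixel center votes; a sketch:

```python
import numpy as np

def weighted_center(votes, probs):
    """votes: (M, 2) center hypotheses from the cluster's pixels;
    probs: (M,) first weights (semantic class probability per pixel)."""
    w = probs / probs.sum()                   # normalise the weights
    return (votes * w[:, None]).sum(axis=0)   # weighted mean position

center = weighted_center(np.random.rand(90, 2), np.random.rand(90))
```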
S503. Determine the orientation of the cluster region according to the orientation of each pixel in the cluster region.
Specifically, the cluster region includes multiple pixels, each with an orientation, so the orientation of the cluster region can be determined from the orientations of its pixels. For example, the orientation held by the most pixels in the cluster region is taken as the orientation of the region: if the cluster region contains 100 pixels and 90 of them share the same orientation a, the orientation of the cluster region is determined to be a as well.
In a possible implementation, the above S503 includes step E:
Step E. Determine the orientation of the cluster region according to the orientation of each pixel in the cluster region and the second weight of each pixel in the cluster region.
In this step, when determining the orientation of the cluster region from the orientation of each pixel, a second weight on each pixel's orientation is added, which improves the accuracy of the determined orientation of the cluster region.
Optionally, the second weight of each pixel in the cluster region may be the semantic category probability value of that pixel.
Further, the orientation of the object to be detected can be determined according to the orientation of the cluster region, for example by taking the orientation of the cluster region as the orientation of the object to be detected.
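One hedged way to realize step E is a probability-weighted circular mean, which avoids the wrap-around problem of averaging angles near ±180°; the patent does not prescribe this particular averaging (a weighted majority vote over orientation bins would also fit the description above), so it is an assumption.

```python
import numpy as np

def weighted_orientation(angles_deg, probs):
    """angles_deg: (M,) per-pixel orientations; probs: (M,) second weights."""
    a = np.deg2rad(angles_deg)
    s = np.sum(probs * np.sin(a))          # weighted vector components
    c = np.sum(probs * np.cos(a))
    return np.rad2deg(np.arctan2(s, c))    # circular mean in [-180, 180]

yaw = weighted_orientation(np.random.uniform(-180, 180, 90), np.random.rand(90))
```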
S504. Determine the size of the object to be detected according to the center of the cluster region and the orientation of the cluster region.
With the center and the orientation of the cluster region determined in the above steps, the extent of the cluster region can be determined, and the size of the object to be detected can then be determined from it.
In a possible implementation, the above S504 may include steps F1 and F2.
Step F1. Determine the maximum circumscribed rectangular frame of the cluster region according to the center of the cluster region and the orientation of the cluster region.
For example, taking the center of the cluster region as the center of the maximum circumscribed rectangular frame and the orientation of the cluster region as a constraint, the maximum circumscribed rectangular frame of the cluster region is obtained by fitting.
Step F2. Determine the size of the object to be detected according to the maximum circumscribed rectangular frame of the cluster region.
For example, the size of the maximum circumscribed rectangular frame of the cluster region is taken as the size of the object to be detected.
Optionally, the embodiments of this application can also determine the three-dimensional coordinates of the object to be detected from the maximum circumscribed rectangular frame of the cluster region, that is, by taking the three-dimensional coordinates of the maximum circumscribed rectangular frame as those of the object to be detected.
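A sketch of the fitting in steps F1 and F2 in the top view: cluster pixels are rotated into the frame defined by the cluster orientation, and the center-symmetric extents give the rectangle. The height of a 3-D box would come from the z-extent of the cluster's points, which this 2-D sketch omits.

```python
import numpy as np

def oriented_box(points_xy, center, yaw_deg):
    """points_xy: (M, 2) top-view positions of the cluster's pixels."""
    a = np.deg2rad(yaw_deg)
    R = np.array([[np.cos(a), np.sin(a)],        # rotation by -yaw
                  [-np.sin(a), np.cos(a)]])
    local = (points_xy - center) @ R.T           # rotate into the box frame
    half = np.abs(local).max(axis=0)             # center-symmetric extents
    return {"center": center, "yaw": yaw_deg,
            "length": 2 * half[0], "width": 2 * half[1]}

box = oriented_box(np.random.rand(90, 2), np.array([0.5, 0.5]), 30.0)
```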
In the object detection method provided by the embodiments of this application, the pixels in the first feature map are clustered according to the semantic category of each pixel to determine the semantic category of the object to be detected; the center of the cluster region is determined according to the spatial position of each pixel in the cluster region and the distance between each pixel in the cluster region and the center point of the object to be detected; the orientation of the cluster region is determined according to the orientation of each pixel in the cluster region; and the maximum circumscribed rectangular frame of the object to be detected is determined according to the center and orientation of the cluster region. From this maximum circumscribed rectangular frame, the size and three-dimensional coordinates of the object to be detected can be determined, thereby achieving accurate determination of the semantic category, three-dimensional coordinates, and size of the object to be detected.
FIG. 8 is a schematic diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 8, the electronic device 200 of this embodiment includes at least one memory 210 and at least one processor 220. The memory 210 is configured to store a computer program, and the processor 220 is configured to execute the computer program.
Specifically, when executing the computer program, the processor 220 is configured to: perform first processing on acquired point cloud data of an object to be detected to obtain a first feature map; detect each pixel in the first feature map to obtain a first detection result of each pixel; and determine a second detection result of the object to be detected according to the first detection result of each pixel.
The electronic device of this embodiment of the present application can be used to execute the technical solutions of the method embodiments shown above; its implementation principles and technical effects are similar and are not repeated here.
In a possible implementation, the processor 220 is specifically configured to: perform feature encoding on the acquired point cloud data to obtain an encoded feature map of the point cloud data; and perform the first processing on the encoded feature map to obtain the first feature map.
In a possible implementation, the processor 220 is specifically configured to compress the point cloud data in a top view direction to obtain the encoded feature map of the point cloud data.
In a possible implementation, the processor 220 is specifically configured to: constrain the point cloud data to a preset encoding space; divide the point cloud data in the encoding space into grids according to a preset resolution and determine a feature of each grid; and, in the top view direction, obtain the encoded feature map of the point cloud data according to the feature of each grid.
In a possible implementation, the preset resolution includes resolutions in three directions: length, width, and height. The processor 220 is specifically configured to divide the point cloud data in the encoding space into grids according to the resolutions in the three directions of length, width, and height.
In a possible implementation, the processor 220 is specifically configured to: divide the point cloud data in the length direction according to the resolution in the length direction and the length of the preset encoding space; divide the point cloud data in the width direction according to the resolution in the width direction and the width of the preset encoding space; and divide the point cloud data in the height direction according to the resolution in the height direction and the height of the preset encoding space.
In a possible implementation, the processor 220 is specifically configured to determine the feature of a grid according to the number of points included in the grid and/or the reflectivity of the points included in the grid.
In a possible implementation, the scale of the encoded feature map is C×A×B, where C is determined by the ratio of the height of the preset encoding space to the resolution in the height direction, A is determined by the ratio of the length of the preset encoding space to the resolution in the length direction, and B is determined by the ratio of the width of the preset encoding space to the resolution in the width direction.
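For illustration, a minimal sketch of such an encoding step, assuming the per-grid feature is simply the point count (the preceding paragraph also allows reflectivity statistics instead or in addition); the function name and argument layout are hypothetical, not taken from the patent.

import numpy as np

def encode_point_cloud(points, space_min, space_max, res):
    """Voxelize a point cloud into a C x A x B encoding feature map (sketch).

    points:    (N, 4) array of [x, y, z, reflectivity].
    space_min: (3,) lower corner of the preset encoding space (x, y, z).
    space_max: (3,) upper corner of the preset encoding space.
    res:       (3,) grid resolution along length (x), width (y), height (z).
    """
    space_min = np.asarray(space_min, dtype=np.float32)
    space_max = np.asarray(space_max, dtype=np.float32)
    res = np.asarray(res, dtype=np.float32)

    # Constrain the point cloud to the preset encoding space.
    xyz = points[:, :3]
    keep = np.all((xyz >= space_min) & (xyz < space_max), axis=1)
    xyz = xyz[keep]

    # Grid dimensions: A = length / res_l, B = width / res_w, C = height / res_h.
    dims = np.floor((space_max - space_min) / res).astype(int)  # (A, B, C)
    idx = np.floor((xyz - space_min) / res).astype(int)
    idx = np.minimum(idx, dims - 1)

    # Count points per cell; seen from the top view, each height slice is a channel.
    feat = np.zeros((dims[2], dims[0], dims[1]), dtype=np.float32)  # C x A x B
    np.add.at(feat, (idx[:, 2], idx[:, 0], idx[:, 1]), 1.0)
    return feat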
Optionally, the first processing includes at least one of the following: at least one convolution operation, at least one sampling operation, and at least one stacking operation.
Optionally, the sampling operation includes a down-sampling operation and/or an up-sampling operation.
Optionally, the first feature map and the encoded feature map have the same size.
In a possible implementation, the first detection result of a pixel includes at least one of the following: the semantic category of the pixel, the orientation of the pixel, and the distance between the pixel and the center point of the object to be detected.
In a possible implementation, the first detection result of the pixel includes the semantic category of the pixel, and the processor 220 is specifically configured to perform category detection on the pixel to obtain the semantic category of the pixel.
In a possible implementation, the first detection result of the pixel includes the orientation of the pixel, and the processor 220 is specifically configured to: divide the first feature map into a plurality of intervals along the circumferential direction with the center of the first feature map as the circle center; predict the interval to which the pixel belongs and the position coordinate of the pixel in that interval; and determine the orientation of the pixel according to the angle of the interval to which the pixel belongs and the position coordinate of the pixel in that interval.
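A hedged sketch of how this interval-based orientation decoding might be realized, assuming the circle is split into equal angular bins and the in-interval position is a normalized offset; these assumptions and all names are illustrative.

import numpy as np

def decode_orientation(bin_index, in_bin_offset, num_bins):
    """Recover a pixel's orientation from its predicted angular interval (sketch).

    bin_index:     which of the num_bins circular sectors the pixel falls in.
    in_bin_offset: predicted position inside the interval, normalized to [0, 1).
    The angle at which the interval starts, plus the offset within the
    interval, gives the pixel's orientation in radians.
    """
    bin_size = 2.0 * np.pi / num_bins
    return bin_index * bin_size + in_bin_offset * bin_size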
In a possible implementation, the first detection result of the pixel includes the distance between the pixel and the center point of the object to be detected, and the processor 220 is specifically configured to perform position detection on the pixel to obtain the distance between the pixel and the center point of the object to be detected.
In a possible implementation, the second detection result of the object to be detected includes at least one of the following: the semantic category of the object to be detected, the size of the object to be detected, and the three-dimensional coordinates of the object to be detected.
In a possible implementation, the second detection result of the object to be detected includes the semantic category of the object to be detected, and the processor 220 is specifically configured to cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map to determine the semantic category of the object to be detected.
In a possible implementation, the processor 220 is specifically configured to: cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map to obtain a cluster area; and determine the semantic category of the object to be detected according to the semantic category of each pixel in the cluster area.
Optionally, the pixels are clustered in a gradual bottom-up clustering manner, as sketched below.
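A minimal sketch of one possible bottom-up clustering, assuming pixels of the same semantic category are grown into connected regions over a 2D category map; the patent does not fix the exact scheme, so the 4-connected region growing below is only an illustration.

import numpy as np
from collections import deque

def cluster_bottom_up(semantic_map, category):
    """Bottom-up clustering of same-category pixels (sketch, 4-connectivity).

    semantic_map: (H, W) array of per-pixel semantic category ids.
    category:     the category id to cluster.
    Returns a list of clusters, each a list of (row, col) pixel coordinates.
    """
    H, W = semantic_map.shape
    seen = np.zeros((H, W), dtype=bool)
    clusters = []
    for r in range(H):
        for c in range(W):
            if seen[r, c] or semantic_map[r, c] != category:
                continue
            # Grow a new cluster outward from this seed pixel.
            queue, members = deque([(r, c)]), []
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                members.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < H and 0 <= nx < W and not seen[ny, nx]
                            and semantic_map[ny, nx] == category):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            clusters.append(members)
    return clusters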
In a possible implementation, the second detection result of the object to be detected includes the size of the object to be detected, and the processor 220 is specifically configured to: determine the center of the cluster area according to the spatial position of each pixel in the cluster area and the distance between each pixel in the cluster area and the center point of the object to be detected; determine the orientation of the cluster area according to the orientation of each pixel in the cluster area; and determine the size of the object to be detected according to the center of the cluster area and the orientation of the cluster area.
In a possible implementation, the processor 220 is specifically configured to determine the center of the cluster area according to the spatial position of each pixel in the cluster area, the distance between each pixel in the cluster area and the center point of the object to be detected, and a first weight of each pixel in the cluster area.
In a possible implementation, the processor 220 is specifically configured to determine the orientation of the cluster area according to the orientation of each pixel in the cluster area and a second weight of each pixel in the cluster area.
Optionally, the first weight and/or the second weight of each pixel in the cluster area is the semantic category probability value of that pixel.
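For illustration, a sketch of the weighted fusion described above, assuming each pixel's distance to the object center is given as a 2D offset vector and the semantic category probability serves as both the first and the second weight; all names are hypothetical.

import numpy as np

def fuse_cluster(pixel_xy, center_offsets, yaws, class_probs):
    """Weighted fusion of per-pixel predictions into one cluster result (sketch).

    pixel_xy:       (N, 2) spatial positions of the cluster's pixels.
    center_offsets: (N, 2) each pixel's predicted vector toward the object center.
    yaws:           (N,) per-pixel orientation predictions in radians.
    class_probs:    (N,) semantic category probability of each pixel.
    """
    w = class_probs / class_probs.sum()
    # Each pixel votes for a center; the votes are averaged with its weight.
    center = (w[:, None] * (pixel_xy + center_offsets)).sum(axis=0)
    # Average orientations on the unit circle to avoid wrap-around artifacts.
    yaw = np.arctan2((w * np.sin(yaws)).sum(), (w * np.cos(yaws)).sum())
    return center, yaw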
In a possible implementation, the processor 220 is specifically configured to: determine the largest circumscribed rectangular frame of the cluster area according to the center of the cluster area and the orientation of the cluster area; and determine the size of the object to be detected according to the largest circumscribed rectangular frame of the cluster area.
In a possible implementation, the processor 220 is specifically configured to fit the largest circumscribed rectangular frame of the cluster area with the center of the cluster area as the center of the largest circumscribed rectangular frame and the orientation of the cluster area as a constraint.
In a possible implementation, the processor 220 is further configured to determine the three-dimensional coordinates of the object to be detected according to the largest circumscribed rectangular frame of the cluster area.
In a possible implementation, the processor 220 is further configured to acquire the point cloud data of the object to be detected.
In a possible implementation, the processor 220 is specifically configured to acquire the point cloud data of the object to be detected collected by a depth sensor.
In a possible implementation, the processor 220 is specifically configured to: acquire a first image and a second image of the object to be detected collected by a binocular camera; and obtain the point cloud data of the object to be detected according to the first image and the second image.
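A hedged sketch of obtaining a point cloud from the two images using OpenCV's semi-global block matching; the file names, the SGBM settings, and the placeholder reprojection matrix Q are illustrative only, and in practice Q comes from stereo calibration (cv2.stereoRectify).

import cv2
import numpy as np

left = cv2.imread("first_image.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("second_image.png", cv2.IMREAD_GRAYSCALE)

# Compute a disparity map between the first and second image.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Q is the 4x4 reprojection matrix produced by stereo rectification.
Q = np.eye(4, dtype=np.float32)  # placeholder; use the real calibration result
points = cv2.reprojectImageTo3D(disparity, Q)
valid = disparity > 0
point_cloud = points[valid]  # (M, 3) three-dimensional data per point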
The electronic device 200 of this embodiment of the present application can be used to execute the technical solutions of the method embodiments shown above; its implementation principles and technical effects are similar and are not repeated here.
FIG. 9 is another schematic diagram of an electronic device provided by an embodiment of the present application. On the basis of the foregoing embodiment, as shown in FIG. 9, the electronic device 200 of this embodiment further includes a binocular camera 230.
The binocular camera 230 is configured to collect the first image and the second image of the object to be detected.
The processor 220 is specifically configured to acquire the first image and the second image collected by the binocular camera, and obtain the point cloud data of the object to be detected according to the first image and the second image.
Optionally, each point in the point cloud data includes the three-dimensional data and the reflectivity of that point.
The electronic device of this embodiment of the present application can be used to execute the technical solutions of the method embodiments shown above; its implementation principles and technical effects are similar and are not repeated here.
FIG. 10 is a schematic structural diagram of a vehicle provided by an embodiment of the present application. As shown in FIG. 10, the vehicle 50 of this embodiment includes a vehicle body 51 and an electronic device 52 installed on the vehicle body 51.
The electronic device 52 is the electronic device shown in FIG. 8 or FIG. 9 and is used for object detection, for example, detecting obstacles on the travel path of the vehicle.
Optionally, the electronic device 52 is installed on the roof of the vehicle body 51. If the electronic device is the one shown in FIG. 9, the binocular camera in the electronic device 52 may face the front or the rear of the vehicle to collect the first image and the second image.
Optionally, the electronic device 52 is installed on the front windshield of the vehicle body 51, or the electronic device 52 is installed on the rear windshield of the vehicle body 51.
Optionally, the electronic device 52 is installed on the front of the vehicle body 51, or the electronic device 52 is installed on the rear of the vehicle body 51.
This embodiment of the present application does not limit the installation position of the electronic device 52 on the vehicle body 51, which is determined according to actual needs.
The vehicle of this embodiment of the present application can be used to execute the technical solutions of the object detection method embodiments shown above; its implementation principles and technical effects are similar and are not repeated here.
FIG. 11 is a schematic structural diagram of a transportation vehicle provided by an embodiment of the present application. As shown in FIG. 11, the transportation vehicle 60 of this embodiment includes a transportation vehicle body 61 and an electronic device 62 installed on the transportation vehicle body 61.
The electronic device 62 is the electronic device shown in FIG. 8 or FIG. 9 and is used for object detection, for example, detecting obstacles on the travel path of the transportation vehicle.
Optionally, the transportation vehicle 60 of this embodiment may be a ship, an automobile, a bus, a railway vehicle, an aircraft, a railway locomotive, a scooter, a bicycle, or the like.
Optionally, the electronic device 62 may be installed on the front, the rear, or the middle of the transportation vehicle body 61. This embodiment of the present application does not limit the installation position of the electronic device 62 on the transportation vehicle body 61, which is determined according to actual needs.
The transportation vehicle of this embodiment of the present application can be used to execute the technical solutions of the object detection method embodiments shown above; its implementation principles and technical effects are similar and are not repeated here.
Since this patent performs pixel-level prediction in the top view, this dense detection approach can yield a high recall rate for object detection. Moreover, both the orientation and the three-dimensional position are predicted at the pixel level and then fused with weights, which yields more reliable and stable positioning accuracy. The orientation prediction, together with the prior that target frames cannot overlap in the top view, effectively reduces the computational cost of rectangular frame fitting. In summary, the solution provided by this patent can efficiently achieve high-precision three-dimensional dynamic obstacle positioning and is well suited to dynamic obstacle perception in autonomous driving scenarios.
FIG. 12 is a schematic structural diagram of an unmanned aerial vehicle (UAV) provided by an embodiment of the present application. The UAV 100 shown in FIG. 12 may be a multi-rotor, fixed-wing, or other type of UAV, where a multi-rotor UAV may have four, six, eight, or another number of rotors. This embodiment takes a rotary-wing UAV as an example.
The UAV 100 may include a power system 150, a flight control system 160, a frame, and an electronic device 120 fixed on the frame.
The frame may include a fuselage and a tripod (also called a landing gear). The fuselage may include a center frame and one or more arms connected to the center frame, the one or more arms extending radially from the center frame. The tripod is connected to the fuselage and supports the UAV 100 when it lands.
The power system 150 may include one or more electronic speed controllers (ESCs) 151, one or more propellers 153, and one or more motors 152 corresponding to the one or more propellers 153. Each motor 152 is connected between an ESC 151 and a propeller 153, and the motors 152 and propellers 153 are arranged on the arms of the UAV 100. The ESC 151 receives a drive signal generated by the flight control system 160 and supplies a drive current to the motor 152 according to the drive signal to control the rotational speed of the motor 152. The motors 152 drive the propellers to rotate, providing power for the flight of the UAV 100 and enabling the UAV 100 to move with one or more degrees of freedom. In some embodiments, the UAV 100 may rotate about one or more rotation axes, for example, a roll axis, a yaw axis, and a pitch axis. It should be understood that the motor 152 may be a DC motor or an AC motor, and may be a brushless motor or a brushed motor.
The flight control system 160 may include a flight controller 161 and a sensing system 162. The sensing system 162 measures the attitude information of the UAV, that is, the position and state information of the UAV 100 in space, such as its three-dimensional position, three-dimensional angle, three-dimensional velocity, three-dimensional acceleration, and three-dimensional angular velocity. The sensing system 162 may include, for example, at least one of a gyroscope, an ultrasonic sensor, an electronic compass, an inertial measurement unit (IMU), a vision sensor, a global navigation satellite system, and a barometer. For example, the global navigation satellite system may be the Global Positioning System (GPS). The flight controller 161 controls the flight of the UAV 100; for example, it may control the flight according to the attitude information measured by the sensing system 162. It should be understood that the flight controller 161 may control the UAV 100 according to pre-programmed instructions, or by responding to one or more control instructions from a control terminal 140.
The electronic device 120 is used for object detection and sends the detection result to the flight control system 160, which then controls the flight of the UAV 100 according to the object detection result.
Optionally, the electronic device further includes a shooting assembly, which is a binocular camera configured to collect the first image and the second image. The shooting assembly of this embodiment includes at least a photosensitive element, such as a complementary metal-oxide-semiconductor (CMOS) sensor or a charge-coupled device (CCD) sensor.
The UAV of this embodiment of the present application can be used to execute the technical solutions of the object detection methods in the foregoing method embodiments; its implementation principles and technical effects are similar and are not repeated here.
Further, when at least part of the functions of the object detection method in the embodiments of the present application is implemented by software, an embodiment of the present application also provides a computer storage medium for storing the computer software instructions for the object detection described above; when run on a computer, the instructions enable the computer to execute the various possible object detection methods in the foregoing method embodiments. When the computer-executable instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application may be produced in whole or in part. The computer instructions may be stored in a computer storage medium or transmitted from one computer storage medium to another, and the transmission may be performed wirelessly (for example, by cellular communication, infrared, short-range wireless, or microwave) to another website, computer, server, or data center. The computer storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, an SSD).
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some or all of the technical features therein, and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (65)

  1. An object detection method, characterized by comprising:
    performing first processing on acquired point cloud data of an object to be detected to obtain a first feature map;
    detecting each pixel in the first feature map to obtain a first detection result of each pixel; and
    determining a second detection result of the object to be detected according to the first detection result of each pixel.
  2. The method according to claim 1, wherein the performing first processing on the acquired point cloud data of the object to be detected to obtain the first feature map comprises:
    performing feature encoding on the acquired point cloud data to obtain an encoded feature map of the point cloud data; and
    performing the first processing on the encoded feature map to obtain the first feature map.
  3. The method according to claim 2, wherein the performing feature encoding on the point cloud data to obtain the encoded feature map of the point cloud data comprises:
    compressing the point cloud data in a top view direction to obtain the encoded feature map of the point cloud data.
  4. The method according to claim 3, wherein the compressing the point cloud data in the top view direction to obtain the encoded feature map of the point cloud data comprises:
    constraining the point cloud data to a preset encoding space;
    dividing the point cloud data in the encoding space into grids according to a preset resolution, and determining a feature of each grid; and
    obtaining the encoded feature map of the point cloud data in the top view direction according to the feature of each grid.
  5. The method according to claim 4, wherein the preset resolution comprises resolutions in three directions: length, width, and height; and
    the dividing the point cloud data in the encoding space into grids according to the preset resolution comprises:
    dividing the point cloud data in the encoding space into grids according to the resolutions in the three directions of length, width, and height.
  6. The method according to claim 5, wherein the dividing the point cloud data in the encoding space into grids according to the resolutions in the three directions of length, width, and height comprises:
    dividing the point cloud data in the length direction according to the resolution in the length direction and the length of the preset encoding space;
    dividing the point cloud data in the width direction according to the resolution in the width direction and the width of the preset encoding space; and
    dividing the point cloud data in the height direction according to the resolution in the height direction and the height of the preset encoding space.
  7. The method according to any one of claims 4-6, wherein the determining the feature of each grid comprises:
    determining the feature of the grid according to the number of points included in the grid and/or the reflectivity of the points included in the grid.
  8. The method according to claim 6, wherein a scale of the encoded feature map is C×A×B, wherein C is determined by a ratio of the height of the preset encoding space to the resolution in the height direction, A is determined by a ratio of the length of the preset encoding space to the resolution in the length direction, and B is determined by a ratio of the width of the preset encoding space to the resolution in the width direction.
  9. The method according to any one of claims 1-6, wherein the first processing comprises at least one of the following: at least one convolution operation, at least one sampling operation, and at least one stacking operation.
  10. The method according to claim 9, wherein the sampling operation comprises a down-sampling operation and/or an up-sampling operation.
  11. The method according to claim 2, wherein the first feature map and the encoded feature map have the same size.
  12. The method according to any one of claims 1-11, wherein the first detection result of the pixel comprises at least one of the following: a semantic category of the pixel, an orientation of the pixel, and a distance between the pixel and a center point of the object to be detected.
  13. The method according to claim 12, wherein the first detection result of the pixel comprises the semantic category of the pixel, and the performing feature prediction on each pixel in the first feature map to obtain the first detection result of each pixel comprises:
    performing category detection on the pixel to obtain the semantic category of the pixel.
  14. The method according to claim 12 or 13, wherein the first detection result of the pixel comprises the orientation of the pixel, and the detecting each pixel in the first feature map to obtain the first detection result of each pixel comprises:
    dividing the first feature map into a plurality of intervals along a circumferential direction with a center of the first feature map as a circle center;
    predicting the interval to which the pixel belongs and a position coordinate of the pixel in the interval; and
    determining the orientation of the pixel according to an angle of the interval to which the pixel belongs and the position coordinate of the pixel in the interval.
  15. The method according to claim 14, wherein the first detection result of the pixel comprises the distance between the pixel and the center point of the object to be detected, and the performing feature prediction on each pixel in the first feature map to obtain the first detection result of each pixel comprises:
    performing position detection on the pixel to obtain the distance between the pixel and the center point of the object to be detected.
  16. The method according to claim 12, wherein the second detection result of the object to be detected comprises at least one of the following: a semantic category of the object to be detected, a size of the object to be detected, and three-dimensional coordinates of the object to be detected.
  17. The method according to claim 16, wherein the second detection result of the object to be detected comprises the semantic category of the object to be detected, and the determining the second detection result of the object to be detected according to the first detection result of each pixel comprises:
    clustering the pixels in the first feature map according to the semantic category of each pixel in the first feature map to determine the semantic category of the object to be detected.
  18. The method according to claim 17, wherein the clustering the pixels in the first feature map according to the semantic category of each pixel in the first feature map to determine the semantic category of the object to be detected comprises:
    clustering the pixels in the first feature map according to the semantic category of each pixel in the first feature map to obtain a cluster area; and
    determining the semantic category of the object to be detected according to the semantic category of each pixel in the cluster area.
  19. The method according to claim 18, wherein the pixels are clustered in a gradual bottom-up clustering manner.
  20. The method according to claim 18, wherein the second detection result of the object to be detected comprises the size of the object to be detected, and the determining the second detection result of the object to be detected according to the first detection result of each pixel comprises:
    determining a center of the cluster area according to a spatial position of each pixel in the cluster area and the distance between each pixel in the cluster area and the center point of the object to be detected;
    determining an orientation of the cluster area according to the orientation of each pixel in the cluster area; and
    determining the size of the object to be detected according to the center of the cluster area and the orientation of the cluster area.
  21. The method according to claim 20, wherein the determining the center of the cluster area according to the spatial position of each pixel in the cluster area and the distance between each pixel in the cluster area and the center point of the object to be detected comprises:
    determining the center of the cluster area according to the spatial position of each pixel in the cluster area, the distance between each pixel in the cluster area and the center point of the object to be detected, and a first weight of each pixel in the cluster area.
  22. The method according to claim 21, wherein the determining the orientation of the cluster area according to the orientation of each pixel in the cluster area comprises:
    determining the orientation of the cluster area according to the orientation of each pixel in the cluster area and a second weight of each pixel in the cluster area.
  23. The method according to claim 22, wherein the first weight and/or the second weight of each pixel in the cluster area is a semantic category probability value of the pixel.
  24. The method according to any one of claims 20-23, wherein the determining the size of the object to be detected according to the center of the cluster area and the orientation of the cluster area comprises:
    determining a largest circumscribed rectangular frame of the cluster area according to the center of the cluster area and the orientation of the cluster area; and
    determining the size of the object to be detected according to the largest circumscribed rectangular frame of the cluster area.
  25. The method according to claim 24, wherein the determining the largest circumscribed rectangular frame of the cluster area according to the center of the cluster area and the orientation of the cluster area comprises:
    fitting the largest circumscribed rectangular frame of the cluster area with the center of the cluster area as the center of the largest circumscribed rectangular frame and the orientation of the cluster area as a constraint.
  26. The method according to claim 24, wherein the method further comprises:
    determining the three-dimensional coordinates of the object to be detected according to the largest circumscribed rectangular frame of the cluster area.
  27. The method according to any one of claims 1-26, wherein before the performing first processing on the acquired point cloud data of the object to be detected to obtain the first feature map, the method further comprises:
    acquiring the point cloud data of the object to be detected.
  28. The method according to claim 27, wherein the acquiring the point cloud data of the object to be detected comprises:
    acquiring the point cloud data of the object to be detected collected by a depth sensor.
  29. The method according to claim 27, wherein the acquiring the point cloud data of the object to be detected comprises:
    acquiring a first image and a second image of the object to be detected collected by a binocular camera; and
    obtaining the point cloud data of the object to be detected according to the first image and the second image.
  30. The method according to claim 1, wherein each point in the point cloud data includes three-dimensional data and a reflectivity of the point.
  31. An electronic device, characterized by comprising:
    a memory, configured to store a computer program; and
    a processor, configured to execute the computer program, and specifically configured to:
    perform first processing on acquired point cloud data of an object to be detected to obtain a first feature map;
    detect each pixel in the first feature map to obtain a first detection result of each pixel; and
    determine a second detection result of the object to be detected according to the first detection result of each pixel.
  32. The electronic device according to claim 31, wherein
    the processor is specifically configured to: perform feature encoding on the acquired point cloud data to obtain an encoded feature map of the point cloud data; and perform the first processing on the encoded feature map to obtain the first feature map.
  33. The electronic device according to claim 32, wherein
    the processor is specifically configured to compress the point cloud data in a top view direction to obtain the encoded feature map of the point cloud data.
  34. The electronic device according to claim 33, wherein
    the processor is specifically configured to: constrain the point cloud data to a preset encoding space; divide the point cloud data in the encoding space into grids according to a preset resolution, and determine a feature of each grid; and obtain the encoded feature map of the point cloud data in the top view direction according to the feature of each grid.
  35. The electronic device according to claim 34, wherein the preset resolution comprises resolutions in three directions: length, width, and height; and
    the processor is specifically configured to divide the point cloud data in the encoding space into grids according to the resolutions in the three directions of length, width, and height.
  36. The electronic device according to claim 35, wherein
    the processor is specifically configured to: divide the point cloud data in the length direction according to the resolution in the length direction and the length of the preset encoding space; divide the point cloud data in the width direction according to the resolution in the width direction and the width of the preset encoding space; and divide the point cloud data in the height direction according to the resolution in the height direction and the height of the preset encoding space.
  37. The electronic device according to any one of claims 34-36, wherein
    the processor is specifically configured to determine the feature of the grid according to the number of points included in the grid and/or the reflectivity of the points included in the grid.
  38. The electronic device according to claim 36, wherein a scale of the encoded feature map is C×A×B, wherein C is determined by a ratio of the height of the preset encoding space to the resolution in the height direction, A is determined by a ratio of the length of the preset encoding space to the resolution in the length direction, and B is determined by a ratio of the width of the preset encoding space to the resolution in the width direction.
  39. The electronic device according to any one of claims 31-36, wherein the first processing comprises at least one of the following: at least one convolution operation, at least one sampling operation, and at least one stacking operation.
  40. The electronic device according to claim 39, wherein the sampling operation comprises a down-sampling operation and/or an up-sampling operation.
  41. The electronic device according to claim 32, wherein the first feature map and the encoded feature map have the same size.
  42. The electronic device according to any one of claims 31-41, wherein the first detection result of the pixel comprises at least one of the following: a semantic category of the pixel, an orientation of the pixel, and a distance between the pixel and a center point of the object to be detected.
  43. The electronic device according to claim 42, wherein the first detection result of the pixel comprises the semantic category of the pixel, and
    the processor is specifically configured to perform category detection on the pixel to obtain the semantic category of the pixel.
  44. The electronic device according to claim 42 or 43, wherein the first detection result of the pixel comprises the orientation of the pixel, and
    the processor is specifically configured to: divide the first feature map into a plurality of intervals along a circumferential direction with a center of the first feature map as a circle center; predict the interval to which the pixel belongs and a position coordinate of the pixel in the interval; and determine the orientation of the pixel according to an angle of the interval to which the pixel belongs and the position coordinate of the pixel in the interval.
  45. The electronic device according to claim 44, wherein the first detection result of the pixel comprises the distance between the pixel and the center point of the object to be detected, and
    the processor is specifically configured to perform position detection on the pixel to obtain the distance between the pixel and the center point of the object to be detected.
  46. The electronic device according to claim 42, wherein the second detection result of the object to be detected comprises at least one of the following: a semantic category of the object to be detected, a size of the object to be detected, and three-dimensional coordinates of the object to be detected.
  47. The electronic device according to claim 46, wherein the second detection result of the object to be detected comprises the semantic category of the object to be detected, and
    the processor is specifically configured to cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map to determine the semantic category of the object to be detected.
  48. The electronic device according to claim 47, wherein
    the processor is specifically configured to: cluster the pixels in the first feature map according to the semantic category of each pixel in the first feature map to obtain a cluster area; and determine the semantic category of the object to be detected according to the semantic category of each pixel in the cluster area.
  49. The electronic device according to claim 48, wherein the pixels are clustered in a gradual bottom-up clustering manner.
  50. The electronic device according to claim 48, wherein the second detection result of the object to be detected comprises the size of the object to be detected, and
    the processor is specifically configured to: determine a center of the cluster area according to a spatial position of each pixel in the cluster area and the distance between each pixel in the cluster area and the center point of the object to be detected; determine an orientation of the cluster area according to the orientation of each pixel in the cluster area; and determine the size of the object to be detected according to the center of the cluster area and the orientation of the cluster area.
  51. The electronic device according to claim 50, wherein
    the processor is specifically configured to determine the center of the cluster area according to the spatial position of each pixel in the cluster area, the distance between each pixel in the cluster area and the center point of the object to be detected, and a first weight of each pixel in the cluster area.
  52. The electronic device according to claim 51, wherein
    the processor is specifically configured to determine the orientation of the cluster area according to the orientation of each pixel in the cluster area and a second weight of each pixel in the cluster area.
  53. The electronic device according to claim 52, wherein the first weight and/or the second weight of each pixel in the cluster area is a semantic category probability value of the pixel.
  54. The electronic device according to any one of claims 50-53, wherein
    the processor is specifically configured to: determine a largest circumscribed rectangular frame of the cluster area according to the center of the cluster area and the orientation of the cluster area; and determine the size of the object to be detected according to the largest circumscribed rectangular frame of the cluster area.
  55. The electronic device according to claim 54, wherein
    the processor is specifically configured to fit the largest circumscribed rectangular frame of the cluster area with the center of the cluster area as the center of the largest circumscribed rectangular frame and the orientation of the cluster area as a constraint.
  56. The electronic device according to claim 54, wherein
    the processor is further configured to determine the three-dimensional coordinates of the object to be detected according to the largest circumscribed rectangular frame of the cluster area.
  57. The electronic device according to any one of claims 31-56, wherein
    the processor is further configured to acquire the point cloud data of the object to be detected.
  58. The electronic device according to claim 57, wherein
    the processor is specifically configured to acquire the point cloud data of the object to be detected collected by a depth sensor.
  59. The electronic device according to claim 57, wherein
    the processor is specifically configured to: acquire a first image and a second image of the object to be detected collected by a binocular camera; and obtain the point cloud data of the object to be detected according to the first image and the second image.
  60. The electronic device according to claim 57, wherein the electronic device further comprises a binocular camera,
    the binocular camera is configured to collect a first image and a second image of the object to be detected; and
    the processor is specifically configured to acquire the first image and the second image collected by the binocular camera, and obtain the point cloud data of the object to be detected according to the first image and the second image.
  61. The electronic device according to claim 31, wherein each point in the point cloud data includes three-dimensional data and a reflectivity of the point.
  62. A vehicle, characterized by comprising: a vehicle body and the electronic device according to any one of claims 31-61 installed on the vehicle body.
  63. A transportation vehicle, characterized by comprising: a transportation vehicle body and the electronic device according to any one of claims 31-61 installed on the transportation vehicle body.
  64. An unmanned aerial vehicle, characterized by comprising: a fuselage and the electronic device according to any one of claims 31-61 installed on the fuselage.
  65. A computer storage medium, wherein a computer program is stored in the storage medium, and when executed, the computer program implements the object detection method according to any one of claims 1-30.
PCT/CN2019/078629 2019-03-19 2019-03-19 Object detection method, electronic device, and computer storage medium WO2020186444A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/078629 WO2020186444A1 (en) 2019-03-19 2019-03-19 Object detection method, electronic device, and computer storage medium
CN201980005385.1A CN111316285A (en) 2019-03-19 2019-03-19 Object detection method, electronic device, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/078629 WO2020186444A1 (en) 2019-03-19 2019-03-19 Object detection method, electronic device, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2020186444A1

Family

ID=71157767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/078629 WO2020186444A1 (en) 2019-03-19 2019-03-19 Object detection method, electronic device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN111316285A (en)
WO (1) WO2020186444A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111973410A (en) * 2020-06-30 2020-11-24 北京迈格威科技有限公司 Obstacle detection method and device, obstacle avoidance equipment and computer readable storage medium
CN112149746B (en) * 2020-09-27 2024-02-06 中国商用飞机有限责任公司北京民用飞机技术研究中心 Landing gear remaining use number model training method and device and computer equipment
CN112233052B (en) * 2020-10-15 2024-04-30 北京四维图新科技股份有限公司 Expansion convolution processing method, image processing method, apparatus and storage medium
CN112509052B (en) * 2020-12-22 2024-04-23 苏州超云生命智能产业研究院有限公司 Method, device, computer equipment and storage medium for detecting macula fovea
CN112927559A (en) * 2021-01-20 2021-06-08 国汽智控(北京)科技有限公司 Vehicle danger avoiding control method, device and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9110163B2 (en) * 2013-06-14 2015-08-18 Microsoft Technology Licensing, Llc Lidar-based classification of object movement
CN109145677A (en) * 2017-06-15 2019-01-04 百度在线网络技术(北京)有限公司 Obstacle detection method, device, equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778693A (en) * 2015-04-08 2015-07-15 云挺 Leaf area index calculation method based on projection algorithm and active contour model
US20180225515A1 (en) * 2015-08-04 2018-08-09 Baidu Online Network Technology (Beijing) Co. Ltd. Method and apparatus for urban road recognition based on laser point cloud, storage medium, and device
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN108681718A (en) * 2018-05-20 2018-10-19 北京工业大学 A kind of accurate detection recognition method of unmanned plane low target
CN109146943A (en) * 2018-08-03 2019-01-04 百度在线网络技术(北京)有限公司 Detection method, device and the electronic equipment of stationary object

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614226A (en) * 2020-12-07 2021-04-06 深兰人工智能(深圳)有限公司 Point cloud multi-view feature fusion method and device
CN112800873A (en) * 2021-01-14 2021-05-14 知行汽车科技(苏州)有限公司 Method, device and system for determining target direction angle and storage medium
CN112784814A (en) * 2021-02-10 2021-05-11 中联重科股份有限公司 Posture recognition method for vehicle backing and warehousing and conveying vehicle backing and warehousing guide system
CN112784814B (en) * 2021-02-10 2024-06-07 中联重科股份有限公司 Gesture recognition method for reversing and warehousing of vehicle and reversing and warehousing guiding system of conveying vehicle
CN112927234A (en) * 2021-02-25 2021-06-08 中国工商银行股份有限公司 Point cloud semantic segmentation method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN111316285A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
WO2020186444A1 (en) Object detection method, electronic device, and computer storage medium
US11788861B2 (en) Map creation and localization for autonomous driving applications
WO2021184218A1 (en) Relative pose calibration method and related apparatus
JP2019532268A (en) Determination of stereo distance information using an imaging device integrated in the propeller blades
US11120280B2 (en) Geometry-aware instance segmentation in stereo image capture processes
CN111247557A (en) Method and system for detecting moving target object and movable platform
US20210365038A1 (en) Local sensing based autonomous navigation, and associated systems and methods
CN111670339B (en) Techniques for collaborative mapping between unmanned aerial vehicles and ground vehicles
CN112639882A (en) Positioning method, device and system
EP3447729B1 (en) 2d vehicle localizing using geoarcs
CN112378397B (en) Unmanned aerial vehicle target tracking method and device and unmanned aerial vehicle
CN110751336B (en) Obstacle avoidance method and obstacle avoidance device of unmanned carrier and unmanned carrier
EP3291178B1 (en) 3d vehicle localizing using geoarcs
US11842440B2 (en) Landmark location reconstruction in autonomous machine applications
CN113887400B (en) Obstacle detection method, model training method and device and automatic driving vehicle
CN114200481A (en) Positioning method, positioning system and vehicle
US11308324B2 (en) Object detecting system for detecting object by using hierarchical pyramid and object detecting method thereof
CN112380933B (en) Unmanned aerial vehicle target recognition method and device and unmanned aerial vehicle
CN116678424A (en) High-precision vehicle positioning, vectorization map construction and positioning model training method
US20240151855A1 (en) Lidar-based object tracking
US20230098223A1 (en) Systems and method for lidar grid velocity estimation
CN115131509A (en) Method and device for processing point cloud data
CN114384486A (en) Data processing method and device
WO2022133911A1 (en) Target detection method and apparatus, movable platform, and computer-readable storage medium
JP7295320B1 (en) Information processing device, program, system, and information processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 19920499
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 19920499
Country of ref document: EP
Kind code of ref document: A1