WO2022126522A1 - Object recognition method, apparatus, movable platform, and storage medium - Google Patents


Info

Publication number
WO2022126522A1
WO2022126522A1 · PCT/CN2020/137298 · CN2020137298W
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
data
point cloud
fusion feature
Application number
PCT/CN2020/137298
Other languages
French (fr)
Chinese (zh)
Inventor
蒋卓键
黄浩洸
栗培梁
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Application filed by 深圳市大疆创新科技有限公司
Priority to CN202080071443.3A (published as CN114556445A)
Priority to PCT/CN2020/137298
Publication of WO2022126522A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes

Definitions

  • the present application relates to the technical field of object recognition, and in particular, to an object recognition method, device, movable platform and storage medium.
  • one of the objectives of the present application is to provide an object recognition method, device, movable platform and storage medium.
  • an embodiment of the present application provides an object recognition method, including:
  • acquiring point cloud data and image data of an object to be recognized, acquiring a first fusion feature according to the point cloud data and the image data, and acquiring a first image feature according to the image data;
  • obtaining a first size coefficient according to the first fusion feature and a second size coefficient according to the first image feature, where the first size coefficient and the second size coefficient indicate the probability that the object to be recognized belongs to an object in the target size range;
  • obtaining a second fusion feature according to the first fusion feature and the first size coefficient, and a second image feature according to the first image feature and the second size coefficient;
  • the object to be recognized is recognized according to the second fusion feature and the second image feature.
  • an embodiment of the present application provides an object recognition device, comprising a processor and a memory storing a computer program
  • the processor implements the following steps when executing the computer program:
  • acquiring point cloud data and image data of an object to be recognized, acquiring a first fusion feature according to the point cloud data and the image data, and acquiring a first image feature according to the image data;
  • obtaining a first size coefficient according to the first fusion feature and a second size coefficient according to the first image feature, where the first size coefficient and the second size coefficient indicate the probability that the object to be recognized belongs to an object in the target size range;
  • obtaining a second fusion feature according to the first fusion feature and the first size coefficient, and a second image feature according to the first image feature and the second size coefficient;
  • recognizing the object to be recognized according to the second fusion feature and the second image feature.
  • an embodiment of the present application provides a movable platform, including:
  • a body;
  • a power system mounted within the body for providing power for the movable platform; and
  • the object recognition device according to the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the method according to the first aspect.
  • with the object recognition method, device, movable platform and storage medium provided by the embodiments of the present application, after point cloud data and image data of an object to be recognized are acquired, a first fusion feature is obtained according to the point cloud data and the image data, and a first image feature is obtained according to the image data; then a first size coefficient is obtained according to the first fusion feature and a second size coefficient is obtained according to the first image feature, where the first size coefficient and the second size coefficient represent the probability that the object to be recognized belongs to an object in the target size range; then a second fusion feature is obtained according to the first fusion feature and the first size coefficient, and a second image feature is obtained according to the first image feature and the second size coefficient; finally, the object to be recognized is recognized according to the second fusion feature and the second image feature.
  • the first size coefficient and the second size coefficient are determined to increase the attention paid to objects in the target size range; the two coefficients are then used to increase the information about objects in the target size range in the second fusion feature and the second image feature, so that the second fusion feature and the second image feature contain more semantic information about objects in the target size range. Based on the second fusion feature and the second image feature, small objects can therefore be accurately identified, improving object recognition accuracy;
  • object recognition is performed by combining point cloud data and image data: the point cloud data can provide the assistance of spatial information for the image data, and the image data can provide the assistance of color information for the point cloud data. The two kinds of data complement each other, which is beneficial to further improving the accuracy of object recognition.
  • FIG. 1A is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 1B is a schematic structural diagram of an unmanned vehicle provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an object recognition method provided by an embodiment of the present application.
  • FIG. 3A and FIG. 3B are schematic diagrams of different structures of an object recognition model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an object recognition device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a movable platform provided by an embodiment of the present application.
  • for a large object, the relevant sensor can obtain more detection data, and therefore more feature information can be extracted from that detection data.
  • this abundance of feature information makes the object recognition methods in the related art relatively accurate for large objects. For small objects, however, the sensors obtain far less detection data than for large objects, so far less feature information can be extracted, and the recognition accuracy of the related-art methods for small objects is correspondingly low. From the perspective of business scenarios, for example in some extreme scenarios in the field of vehicle driving, this weakness in recognizing small objects may have serious consequences for the safe driving of vehicles.
  • an embodiment of the present application provides an object recognition method. After acquiring point cloud data and image data of an object to be recognized, a first fusion feature is acquired according to the point cloud data and the image data, and a first image feature is acquired according to the image data.
  • the first size coefficient and the second size coefficient are determined to increase the attention paid to objects in the target size range, and the two coefficients are then used to increase the information about objects in the target size range in the second fusion feature and the second image feature, so that small objects can be accurately recognized.
  • object recognition is performed by combining point cloud data and image data: the point cloud data can provide the assistance of spatial information for the image data, and the image data can provide the assistance of color information for the point cloud data. The two kinds of data complement each other, which is beneficial to further improving the accuracy of object recognition.
  • the object recognition method of this embodiment may be implemented by an object recognition device.
  • the object recognition device may be a computer chip or integrated circuit with data processing capability, such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA), etc.
  • the object recognition device may be installed in a movable platform; the movable platform in the embodiment of the present application may include: a car, an unmanned aerial vehicle, an unmanned ship or a robot, wherein the car may be an unmanned vehicle or a manned vehicle
  • the unmanned aerial vehicle may be a quad-rotor drone, a hexa-rotor drone, or an octa-rotor drone.
  • the object recognition device may itself be a movable platform, etc.; the movable platform includes at least a car, an unmanned aerial vehicle, an unmanned ship or a robot, wherein the car may be an unmanned vehicle or a manned vehicle.
  • in the following, the movable platform is described by taking as an example an unmanned vehicle that includes the object recognition device.
  • FIG. 1A shows a driving scene of the unmanned vehicle 100, and FIG. 1B shows the structure of the unmanned vehicle 100.
  • a lidar 10 for acquiring point cloud data can be installed on the unmanned vehicle 100.
  • a photographing device 20 for acquiring image data is also mounted on the unmanned vehicle 100.
  • the number of the lidar 10 and the photographing device 20 may be one or more. It can be understood that the installation positions of the lidar 10 and the photographing device 20 can be specifically set according to the actual application scenario.
  • one of the lidars 10 and one of the photographing devices 20 can be installed at the head of the unmanned vehicle 100.
  • the lidar 10 collects point cloud data of objects around the unmanned vehicle 100 and transmits it to the object recognition device 30 in the unmanned vehicle 100
  • the photographing device 20 collects image data of objects around the unmanned vehicle 100 and transmits it to the object recognition device 30 in the unmanned vehicle 100
  • the object recognition device 30 acquires the point cloud data and image data of objects around the unmanned vehicle 100, and performs object recognition based on the object recognition method of the embodiment of the present application to obtain a recognition result.
  • in a first possible implementation manner, the unmanned vehicle 100 may use the recognition result to make an obstacle avoidance decision or perform route planning; in a second possible implementation manner, the recognition result can be displayed on the interface of the unmanned vehicle 100 or on the interface of a terminal communicatively connected with the unmanned vehicle 100, so that the user knows the driving situation of the unmanned vehicle 100 and the road conditions around it; in a third possible implementation manner, the recognition result may be transmitted to other components in the unmanned vehicle 100, so that those components control the unmanned vehicle 100 based on the recognition result to keep it working safely.
  • FIG. 2 is a schematic flowchart of an object recognition method provided by the embodiment of the present application, and the method may be implemented by an object recognition device;
  • the object recognition device may be a movable platform, or may be installed in the movable platform as a chip; the method includes:
  • step S101: point cloud data and image data of an object to be recognized are acquired, and a first fusion feature is acquired according to the point cloud data and the image data; and a first image feature is acquired according to the image data.
  • step S102: a first size coefficient is obtained according to the first fusion feature, and a second size coefficient is obtained according to the first image feature; the first size coefficient and the second size coefficient represent the probability that the object to be recognized belongs to an object in the target size range.
  • step S103: a second fusion feature is obtained according to the first fusion feature and the first size coefficient, and a second image feature is obtained according to the first image feature and the second size coefficient.
  • step S104: the object to be recognized is recognized according to the second fusion feature and the second image feature.
  • the point cloud data and image data of the object to be recognized are obtained by spatial sampling of the sensor of the movable platform.
  • the point cloud data may be acquired by using a lidar configured on the movable platform or a photographing device with a depth information collection function; and/or the image data may be acquired by using a photographing device configured on the movable platform.
  • the lidar is used for emitting a laser pulse sequence into the space where the movable platform is located, then receiving the laser pulse sequence reflected from the object to be identified, and generating point cloud data according to the reflected laser pulse sequence.
  • the lidar can determine the reception time of the reflected laser pulse sequence, e.g., by detecting the rising edge time and/or the falling edge time of the electrical signal pulse.
  • the lidar can calculate the time of flight (TOF) by using the reception time information and the transmission time of the laser pulse sequence, so as to determine the distance from the object to be identified to the lidar.
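  • As an illustration of the TOF relationship just described, the following minimal Python sketch (the function name and values are illustrative, not from this application) converts a pulse's round-trip time of flight into a range:

```python
# Minimal sketch: range from a lidar pulse's round-trip time of flight.
C = 299_792_458.0  # speed of light in m/s

def range_from_tof(t_emit_s: float, t_receive_s: float) -> float:
    """Distance to the reflecting object, assuming a direct two-way path."""
    tof = t_receive_s - t_emit_s   # round-trip time of flight
    return C * tof / 2.0           # halve it: the pulse travels out and back

# A pulse received 400 ns after emission corresponds to roughly 60 m:
print(range_from_tof(0.0, 400e-9))  # ~59.96
```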
  • the lidar is a self-illuminating sensor: it does not depend on illumination from a light source, is less disturbed by ambient light, and can work normally even in a closed environment without light, so it can later generate a high-precision three-dimensional model, and it has a wide range of applicability.
  • the photographing device with a depth information collection function includes, but is not limited to, a binocular vision sensor or a structured light depth camera, etc.
  • the binocular vision sensor acquires two images of the target scene from different positions based on the parallax principle and calculates the position deviation between corresponding points of the two images to obtain three-dimensional geometric information, thereby generating point cloud data;
  • the structured light depth camera projects light with certain structural characteristics into space and then captures it; the captured light carries different image phase information depending on the depth of each area of the object to be recognized, which is then converted into depth information to obtain point cloud data.
  • the image data may be a color image, a grayscale image, an infrared image, etc.
  • the photographing device used for collecting the image data includes, but is not limited to, a visible light camera, a grayscale camera, an infrared camera, and the like.
  • the camera may capture a sequence of images at a specified frame rate.
  • the photographing device may have adjustable photographing parameters. Under different capture parameters, the photographing device may capture different images despite being subjected to exactly the same external conditions (e.g., location, lighting). Capture parameters may include exposure (e.g., exposure time, shutter speed, aperture, film speed), gain, gamma, region of interest, binning/subsampling, pixel clock, offset, trigger, ISO, and the like.
  • Exposure-related parameters can control the amount of light reaching the image sensor in the camera. For example, shutter speed can control the amount of time light reaches the image sensor and aperture can control the amount of light that reaches the image sensor in a given time.
  • a gain-related parameter can control the amplification of the signal from the optical sensor.
  • ISO controls the level of sensitivity of the camera to the available light.
  • the movable platform is equipped with a lidar and a visible light camera.
  • the lidar and the visible light camera may operate at the same frame rate.
  • the lidar and the visible light camera may also work at different frame rates, provided the frame rates of the lidar and the visible light camera are sufficient to obtain point cloud data and image data within a preset time period.
  • the lidar can collect point cloud data of the object to be recognized in real time and transmit it to the object recognition device, and the photographing device can collect image data of the object to be recognized in real time and transmit it to the object recognition device.
  • object recognition is performed by combining point cloud data and image data.
  • the point cloud data can provide the assistance of spatial information for the image data, and the image data can provide the assistance of color information for the point cloud data.
  • the two kinds of data complement each other, which is conducive to further improving the accuracy of object recognition.
  • after acquiring the point cloud data and the image data of the object to be recognized, the object recognition device acquires the first fusion feature according to the point cloud data and the image data, and acquires the first image feature according to the image data;
  • a first size coefficient is then obtained according to the first fusion feature, and a second size coefficient is obtained according to the first image feature; the first size coefficient and the second size coefficient represent the probability that the object to be recognized belongs to an object in the target size range.
  • in this embodiment, the information of objects belonging to the target size range (that is, the first size coefficient and the second size coefficient) is determined from the first fusion feature and the first image feature, and the first size coefficient and the second size coefficient are used to increase the attention paid to objects in the target size range. Further, a second fusion feature is obtained according to the first fusion feature and the first size coefficient, and a second image feature is obtained according to the first image feature and the second size coefficient; through the first size coefficient and the second size coefficient, the semantic information about objects in the target size range in the second fusion feature and the second image feature is increased, so that small objects can be accurately recognized based on the second fusion feature and the second image feature, improving the accuracy of object recognition.
  • the target size range in this embodiment may include: a size range smaller than a preset target size threshold.
  • a preset target size threshold may be flexibly configured according to business needs, which is not limited in this embodiment.
  • the point cloud data is unstructured data and needs to be processed into a format that can be used for data analysis.
  • the point cloud data is processed to obtain the point cloud density corresponding to each voxel of the point cloud data.
  • the point cloud data processing method may be point cloud three-dimensional grid processing.
  • the point cloud data is divided into grids to obtain multiple voxels of the point cloud data; for each voxel, the ratio of the number of points contained in the voxel to the number of all points in the point cloud data constitutes the point cloud density of that voxel.
  • the point cloud density thus reflects the number of points contained in a voxel.
  • if the point cloud density is large, the voxel has a greater probability of corresponding to an object, so the point cloud density corresponding to each voxel of the point cloud data can be used as characteristic information of objects. Processing the irregular point cloud into such a regular representation can also better represent the contour information of the object to be recognized.
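  • A minimal sketch of this density computation (the grid parameters and function names are assumptions for illustration, not fixed by this application) could look like:

```python
import numpy as np

def voxel_density(points, origin, voxel_size, grid_shape):
    """points: (N, 3) xyz array; returns a grid_shape array of per-voxel densities."""
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    # keep only points that fall inside the grid
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    idx = idx[inside]
    counts = np.zeros(grid_shape, dtype=np.float64)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    # ratio of the points in each voxel to all points in the cloud
    return counts / max(len(points), 1)

# e.g. a 200 x 200 x 16 grid of 0.25 m voxels around the sensor:
# density = voxel_density(cloud_xyz, origin=(-25, -25, -2),
#                         voxel_size=0.25, grid_shape=(200, 200, 16))
```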
  • data analysis can be performed using pixel values in the image data, where the pixel values include but are not limited to RGB values or grayscale values.
  • the object recognition apparatus may obtain the first fusion feature according to the point cloud density of each voxel in the point cloud data and the pixel value in the image data.
  • the first fusion feature fuses the feature information of the point cloud data and the feature information of the image data, which can better reflect the characteristics of the object to be recognized from three-dimensional and two-dimensional perspectives and is beneficial to improving the accuracy of object recognition.
  • the point cloud data and the image data may be spliced to obtain spliced data, and feature extraction is performed on the spliced data to obtain the first fusion feature.
  • feature extraction may be performed on the spliced data through a pre-trained object recognition model to obtain the first fusion feature.
  • the point cloud data can be rasterized into a three-dimensional grid of H*W*C (where H and W represent the length and width, respectively, and C represents the depth of the three-dimensional grid), where each grid cell represents a voxel whose value is the point cloud density of the voxel; the image data (taking an RGB image as an example) can be expressed as H*W*3 data (where H and W represent the length and width, and 3 represents the three RGB channels). The point cloud data and the image data can then be spliced into spliced data with a size of H*W*(C+3). The spliced data includes the position information of the object to be recognized (obtained from the point cloud data) and color information (obtained from the image data), and can better reflect the characteristics of the object to be recognized from three-dimensional and two-dimensional perspectives, thereby helping to improve the accuracy of object recognition.
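  • The channel-wise splicing described above can be sketched as follows (the shapes are the example dimensions from the text; the code itself is illustrative):

```python
import numpy as np

H, W, C = 200, 200, 16                             # example grid dimensions
density = np.zeros((H, W, C), dtype=np.float32)    # H*W*C voxel densities
rgb = np.zeros((H, W, 3), dtype=np.float32)        # H*W*3 image data

spliced = np.concatenate([density, rgb], axis=-1)  # H*W*(C+3) spliced data
assert spliced.shape == (H, W, C + 3)
```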
  • the object recognition device may determine the first projection position of the point cloud data in the two-dimensional space based on the projection relationship between the point cloud and the image, then obtain the pixel value at the first projection position in the image data, and generate point cloud data including the pixel value according to the pixel value at the first projection position and the point cloud data;
  • taking the image data being an RGB image as an example, the object recognition device can assign the RGB value at the first projection position in the image data to the corresponding point cloud, thereby generating a colored point cloud; in this embodiment, the point cloud data is fused with the color data in the image, which further strengthens the correspondence between the image data and the point cloud data, so that the point cloud data including the pixel values can better reflect the object to be recognized.
  • the object recognition device may obtain the first fusion feature according to the point cloud data including the pixel values and the image data; for example, the object recognition device may splice the point cloud data including the pixel values with the image data to obtain spliced data, and perform feature extraction on the spliced data to obtain the first fusion feature.
  • feature extraction may be performed on the spliced data through a pre-trained object recognition model to obtain the first fusion feature.
  • the point cloud data can be rasterized into an H*W*C three-dimensional grid, where each grid cell represents a voxel whose values are the point cloud density of the voxel and the pixel values (such as RGB values) of its points; the image data (taking an RGB image as an example) can be expressed as H*W*3 data. The point cloud data including the pixel values and the image data can then be spliced into spliced data with a size of H*W*(C+3). The spliced data includes the position information of the object to be recognized, the correspondence information between position and color (obtained from the point cloud data including the pixel values), and color information (obtained from the image data), which can comprehensively reflect the characteristics of the object to be recognized from three-dimensional and two-dimensional perspectives, thereby helping to improve the accuracy of object recognition.
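  • A hedged sketch of generating such a colored point cloud (the projection conventions and names are assumptions): project each lidar point into the image with camera intrinsics K and extrinsics (R, t) from the point cloud frame to the camera frame, then read the pixel value at the projection position:

```python
import numpy as np

def colorize_points(points, image, K, R, t):
    """points: (N, 3) xyz in the lidar frame; image: (h, w, 3); returns (M, 6) xyzrgb."""
    cam = points @ R.T + t                # lidar frame -> camera frame
    in_front = cam[:, 2] > 0              # keep points in front of the camera
    cam = cam[in_front]
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]           # perspective divide -> pixel coordinates
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    h, w = image.shape[:2]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    rgb = image[v[ok], u[ok]]             # pixel value at the first projection position
    return np.hstack([points[in_front][ok], rgb])
```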
  • the object recognition apparatus performs feature extraction on the image data to obtain the first image feature; in an example, feature extraction can be performed on the image data through a pre-trained object recognition model to obtain the first image feature.
  • after acquiring the first fusion feature and the first image feature, the object recognition apparatus acquires a first size coefficient from the first fusion feature and a second size coefficient from the first image feature; the first size coefficient and the second size coefficient represent the probability that the object to be recognized belongs to an object in the target size range. In this way, the information of objects belonging to the target size range is determined, and the first size coefficient and the second size coefficient increase the attention paid to objects in the target size range, thereby improving the recognition accuracy for small objects.
  • the first fusion feature and the first image feature may be represented in the form of a feature map
  • the object recognition apparatus may obtain a first size coefficient at each position in the feature map including the first fusion feature; the first size coefficient represents the probability that each position in that feature map belongs to an object in the target size range.
  • in the first fusion feature, the point cloud information and the image information are fused, so the first size coefficient obtained from the first fusion feature also reflects the point cloud information and the image information and therefore has better semantic information; likewise, the object recognition device may obtain a second size coefficient at each position in the feature map including the first image feature, the second size coefficient characterizing the probability that each position in that feature map belongs to an object in the target size range.
  • the object recognition apparatus may use a pre-trained object recognition model to obtain a first size coefficient from the first fusion feature, and obtain a second size coefficient from the first image feature.
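  • One way such a size-coefficient branch could be realized (a PyTorch sketch under assumed layer choices; the application fixes neither the framework nor the architecture) is a small stack of convolutions ending in a sigmoid, so that every position of the feature map yields a probability of belonging to an object in the target size range:

```python
import torch
import torch.nn as nn

class SizeCoefficientBranch(nn.Module):
    """Per-position probability that a feature-map location belongs to an
    object in the target size range (illustrative layer choices)."""
    def __init__(self, in_channels: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=1),  # one coefficient per position
            nn.Sigmoid(),                         # probability in [0, 1]
        )

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # feature_map: (B, C, H, W) -> size coefficients: (B, 1, H, W)
        return self.net(feature_map)
```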
  • after acquiring the first size coefficient and the second size coefficient, the object recognition apparatus acquires a second fusion feature according to the first fusion feature and the first size coefficient, and a second image feature according to the first image feature and the second size coefficient, and finally recognizes the object to be recognized according to the second fusion feature and the second image feature.
  • the first size coefficient and the second size coefficient are used to increase the information about objects in the target size range in the second fusion feature and the second image feature, so that the second fusion feature and the second image feature are fused with more semantic information about objects in the target size range; small objects can then be accurately recognized based on the second fusion feature and the second image feature, improving the accuracy of object recognition;
  • the point cloud data and image data also include data collected from large objects, so the second fusion feature and the second image feature obtained from them also contain feature information extracted from large objects, and large objects can therefore also be accurately identified.
  • the second fusion feature may be the sum of the first fusion feature and the first size coefficient; and/or the second image feature may be the sum of the first image feature and the second size coefficient. Summation enables the second fusion feature and the second image feature to be fused with more semantic information about objects in the target size range.
  • the second fusion feature or the second image feature may also be obtained through other operation methods; for example, in some other possible embodiments, the second fusion feature is the product of the first fusion feature and the first size coefficient, or the second image feature is the product of the first image feature and the second size coefficient, which can be specifically set according to the actual application scenario.
  • the first fusion feature, the first image feature, the second fusion feature, and the second image feature may be represented in the form of a feature map.
  • the object recognition device may obtain a first size coefficient at each position in the feature map including the first fusion feature, and a second size coefficient at each position in the feature map including the first image feature; that is, each position in the feature map including the first fusion feature corresponds to a first size coefficient, and each position in the feature map including the first image feature corresponds to a second size coefficient. Therefore, the value at each position in the feature map including the first fusion feature can be added to the first size coefficient at that position to obtain the feature map including the second fusion feature, and the value at each position in the feature map including the first image feature can be added to the second size coefficient at that position to obtain the feature map including the second image feature.
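  • In tensor terms, the position-wise sum (and the product variant mentioned above) reduces to a broadcast operation; a minimal sketch with assumed shapes:

```python
import torch

first_fusion = torch.randn(1, 128, 50, 50)      # (B, C, H, W) feature map
size_coeff = torch.rand(1, 1, 50, 50)           # one coefficient per position

second_fusion_sum = first_fusion + size_coeff   # sum variant (broadcast over channels)
second_fusion_prod = first_fusion * size_coeff  # product variant
```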
  • after obtaining the second fusion feature and the second image feature, the object recognition device may further fuse the two: specifically, it determines the second projection position of the second fusion feature in the two-dimensional space based on the projection relationship between the point cloud and the image, obtains the feature at the second projection position in the second image feature, and fuses the two to obtain a third fusion feature; finally, the object to be recognized is recognized according to the third fusion feature, and a recognition result is obtained.
  • the third fusion feature, obtained by fusing the second fusion feature and the second image feature, fuses image information and point cloud information and further contains more information about objects in the target size range, giving it better semantic information; using the third fusion feature for object recognition is therefore beneficial to improving the accuracy of object recognition.
  • the projection relationship of the point cloud to the image can be obtained based on the external parameters from the point cloud coordinate system to the camera coordinate system and the internal parameters of the camera.
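  • A hedged sketch of this feature-level fusion (the data layout is an assumption; the projected positions would come from the K·[R|t] projection described above): gather the second image feature at each projected position and combine it with the second fusion feature to form the third fusion feature:

```python
import torch

def third_fusion(second_fusion, second_image, proj_uv):
    """second_fusion: (N, Cf) features; second_image: (Ci, H, W) feature map;
    proj_uv: (N, 2) integer pixel positions from the point-to-image projection."""
    u, v = proj_uv[:, 0], proj_uv[:, 1]
    gathered = second_image[:, v, u].T                   # (N, Ci) image feature per position
    return torch.cat([second_fusion, gathered], dim=1)   # (N, Cf + Ci) third fusion feature
```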
  • the recognition result includes at least: confidence and/or state information of the object to be recognized; the confidence is used to represent the probability that the object to be recognized belongs to an obstacle.
  • the state information includes at least one of the following: size information, position information and orientation information of the object.
  • in this embodiment, the point cloud data and the image data may be input into a pre-trained object recognition model, and the point cloud data and the image data are processed by the object recognition model to recognize the object.
  • the object recognition method of the embodiments of the present application may be implemented by a trained object recognition model, and the object recognition device may be preset with the object recognition model, so that the object recognition model can be used to carry out the object recognition process.
  • the training process of the object recognition model can be: firstly express a model through modeling, then evaluate the model by constructing an evaluation function, and finally optimize the evaluation function according to the sample data and the optimization method, and adjust the model to the optimum.
  • modeling is to convert practical problems into problems that can be understood by computers, that is, to convert practical problems into ways that computers can represent.
  • Modeling generally refers to the process of estimating the objective function of the model based on a large number of sample data.
  • evaluation is an indicator used to represent the quality of the model; it involves the design of some evaluation indicators and evaluation functions.
  • in machine learning there are targeted evaluation indicators; for example, after modeling is completed, a loss function needs to be designed for the model to evaluate its output error.
  • the goal of optimization is the evaluation function: the optimization method is used to optimize the evaluation function and find the model with the highest evaluation. For example, an optimization method such as gradient descent can be used to find the minimum value (optimal solution) of the output error of the loss function and adjust the parameters of the model to the optimum.
  • object recognition models in the related art have achieved very good results and can recognize objects with high accuracy. However, the inventors found that in practical business scenarios related to movable platforms, object recognition models often focus on objects with larger sizes such as people, vehicles, roads or trees, and such larger objects tend to account for larger amounts of the sample data. During training, the model tends toward a global optimum, and the features of larger objects are more obvious and easier to notice than the features of small objects; this bias eventually leads to models that recognize large objects well but pay little attention to small objects, resulting in the defects of the object recognition models in the related art.
  • an embodiment of the present application provides an object recognition model for object recognition.
  • the object recognition model includes a first feature extraction network, a second feature extraction network, a first size coefficient extraction network, a second size coefficient extraction network, and an object prediction network; the first feature extraction network is used to obtain the first fusion feature according to the point cloud data and the image data; the second feature extraction network is used to obtain the first image feature according to the image data; the first size coefficient extraction network is configured to obtain a first size coefficient according to the first fusion feature, and to obtain a second fusion feature according to the first fusion feature and the first size coefficient.
  • the second size coefficient extraction network is configured to obtain a second size coefficient according to the first image feature, and to obtain a second image feature according to the first image feature and the second size coefficient.
  • the object prediction network is configured to recognize the object to be recognized according to the second fusion feature and the second image feature, and obtain a recognition result.
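  • The overall five-network structure could be sketched as follows (a PyTorch illustration reusing the size-coefficient branch sketched earlier; the backbones and head are simple stand-ins, not the application's actual layers):

```python
import torch
import torch.nn as nn

class ObjectRecognitionModel(nn.Module):
    def __init__(self, spliced_ch: int, image_ch: int = 3, feat: int = 128):
        super().__init__()
        # first feature extraction network: operates on the spliced data
        self.fusion_backbone = nn.Conv2d(spliced_ch, feat, 3, padding=1)
        # second feature extraction network: operates on the image data
        self.image_backbone = nn.Conv2d(image_ch, feat, 3, padding=1)
        # first and second size coefficient extraction networks (branches)
        self.size_branch_fusion = SizeCoefficientBranch(feat)
        self.size_branch_image = SizeCoefficientBranch(feat)
        # object prediction network: confidence and state from the fused features
        self.head = nn.Conv2d(2 * feat, 8, 1)  # e.g. confidence + box parameters

    def forward(self, spliced, image):
        f1 = self.fusion_backbone(spliced)     # first fusion feature
        i1 = self.image_backbone(image)        # first image feature
        f2 = f1 + self.size_branch_fusion(f1)  # second fusion feature (sum variant)
        i2 = i1 + self.size_branch_image(i1)   # second image feature (sum variant)
        fused = torch.cat([f2, i2], dim=1)     # stand-in for the projection-based fusion
        return self.head(fused)                # recognition result
```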
  • the recognition result includes at least: confidence and/or state information of the object to be recognized; the confidence is used to represent the probability that the object to be recognized belongs to an obstacle.
  • the state information includes at least one of the following: size information, position information and orientation information of the object to be identified.
  • the first feature extraction network can perform feature extraction on spliced data (formed by splicing the point cloud data and the image data, or by splicing the aforementioned point cloud data including the pixel values with the image data) to obtain the first fusion feature.
  • the second feature extraction network is configured to perform feature extraction on the image data to obtain the first image feature.
  • both the first size coefficient extraction network and the second size coefficient extraction network include convolution layers: the first size coefficient is obtained by performing a convolution operation on the first fusion feature through the first size coefficient extraction network, and the second size coefficient is obtained by performing a convolution operation on the first image feature through the second size coefficient extraction network.
  • the number of convolutional layers can be specifically set according to the actual application scenario; for example, the number of convolutional layers is at least 2, so that the first size coefficient is obtained by performing at least two convolution operations on the first fusion feature through the first size coefficient extraction network, and the second size coefficient is obtained by performing at least two convolution operations on the first image feature through the second size coefficient extraction network.
  • the first size coefficient and the second size coefficient represent the probability that the object to be identified belongs to an object in a target size range.
  • point cloud information and image information are fused in the first fusion feature, so that the first size coefficient obtained from the first fusion feature also reflects point cloud information and image information, and therefore has better semantic information.
  • the information of objects belonging to the target size range can be determined from the first fusion feature and the first image feature.
  • the first size coefficient and the second size coefficient are used to increase the attention to objects in the target size range, which is beneficial to improve the recognition accuracy of the model for small objects.
  • the first size coefficient extraction network is further configured to obtain the second fusion feature according to the sum of the first fusion feature and the first size coefficient; the second size coefficient extraction network is further configured to obtain the second image feature according to the sum of the first image feature and the second size coefficient, but this is not limiting.
  • the second fusion feature or the second image feature may also be obtained by other operation methods.
  • for example, the second fusion feature is the product of the first fusion feature and the first size coefficient, or the second image feature is the product of the first image feature and the second size coefficient, which can be specifically set according to actual application scenarios.
  • the first size coefficient extraction network includes a first size coefficient extraction sub-network and a first fusion sub-network: the first size coefficient extraction sub-network is used to perform a convolution operation on the first fusion feature to obtain the first size coefficient, and the first fusion sub-network is used to obtain the second fusion feature according to the first fusion feature and the first size coefficient.
  • the second size coefficient extraction network includes a second size coefficient extraction sub-network and a second fusion sub-network: the second size coefficient extraction sub-network is used to perform a convolution operation on the first image feature to obtain the second size coefficient, and the second fusion sub-network is used to obtain the second image feature according to the first image feature and the second size coefficient.
  • the first size coefficient and the second size coefficient are determined to increase the attention paid to objects in the target size range, and are then used to increase the information about objects in the target size range in the second fusion feature and the second image feature; the second fusion feature and the second image feature therefore contain more semantic information about objects in the target size range, so that small objects can be accurately identified based on the second fusion feature and the second image feature.
  • the object prediction network is used to obtain a third fusion feature according to the second fusion feature and the second image feature, and then perform target recognition on the object to be recognized according to the third fusion feature.
  • the object prediction network may obtain the third fusion feature based on the foregoing method, and perform target recognition on the object to be recognized according to the third fusion feature to obtain a recognition result.
  • the recognition result includes at least: confidence and/or state information of the object to be recognized; the confidence is used to represent the probability that the object to be recognized belongs to an obstacle.
  • the state information includes at least one of the following: size information, position information, and orientation information.
  • sample data for training may be prepared in advance.
  • the sample data may include: point cloud sample data and image sample data, wherein the point cloud sample data and the image sample data include data of objects that belong to the target size range and data of objects that do not, so that the trained object recognition model can accurately identify both small objects and large objects.
  • Model training in this embodiment may be supervised training or unsupervised training.
  • a supervised training method can be used to improve the training speed, and real values can be marked in the sample data.
  • the object state information may include one or more kinds of information, and the specific information may be configured according to business needs.
  • the true values include: the confidence of the object belonging to the target size range (indicating the probability that the object to be identified is a small object), the confidence of the object (indicating the probability that the object to be recognized is an obstacle), and the state information of the object; the state information may include at least one of the following: size information, position information, and orientation information.
  • the above-mentioned sample data may be data obtained after feature engineering of the original data.
  • feature engineering refers to the process of finding physically meaningful features from the original data to participate in model training; this process involves data cleaning, data dimensionality reduction, feature extraction, feature normalization, feature evaluation and screening, feature dimensionality reduction or feature encoding, etc.
  • the point cloud data is unstructured data and needs to be processed into a format that can be input into the object recognition model.
  • the point cloud data is processed to obtain the point cloud density corresponding to each voxel of the point cloud data.
  • the point cloud density corresponding to each voxel of the point cloud data is used as the input of the object recognition model.
  • for the image data, the pixel values of the image data can be used as input to the object recognition model.
  • the sample data input to the object recognition model in this embodiment includes: stitched sample data obtained according to point cloud sample data and image sample data, and the image sample data.
  • the point cloud sample data can be rasterized into a three-dimensional grid of H*W*C (where H and W represent the length and width, respectively, and C represents the depth of the three-dimensional grid), where each grid cell represents a voxel whose value is the point cloud density of the voxel; the image sample data (taking an RGB image as an example) can be expressed as H*W*3 data (where H and W represent the length and width, and 3 represents the three RGB channels); the point cloud sample data and the image sample data can then be spliced into spliced sample data with a size of H*W*(C+3).
  • the spliced sample data includes the position information of the object (obtained from the point cloud data) and color information (obtained from the image data), which can better reflect the characteristics of the object from three-dimensional and two-dimensional perspectives and is beneficial to improving the accuracy of object recognition.
  • alternatively, point cloud sample data including pixel values may be generated according to the point cloud sample data and the image sample data (for the generation method, refer to the foregoing method for generating point cloud data including the pixel values, which is not repeated here); the point cloud sample data can be rasterized into a three-dimensional grid of H*W*C, where each grid cell represents a voxel whose values are the point cloud density of the voxel and the pixel values (such as RGB values) of its points, and the image sample data (taking an RGB image as an example) can be expressed as H*W*3 data; the point cloud sample data including pixel values and the image sample data can then be spliced into spliced sample data with a size of H*W*(C+3).
  • the object recognition model can be obtained by training the machine learning model using the sample data.
  • the machine learning model may be a neural network model, for example, a deep learning-based neural network model.
  • the specific structural design of the object recognition model is one of the important aspects of the training process.
  • the structure of the object recognition model at least includes: a first feature extraction network, a second feature extraction network, a first size coefficient extraction network, a second size coefficient extraction network, and an object prediction network.
  • the first size coefficient extraction network and the second size coefficient extraction network are used to extract information about objects within the target size range, obtaining the first size coefficient and the second size coefficient respectively; in the two networks, the information about objects within the target size range is then respectively enhanced, yielding the second fusion feature and the second image feature that include more semantic information about objects within the target size range, thereby enhancing the model's recognition of objects within the target size range.
  • the first feature extraction network and the second feature extraction network are backbone networks
  • the first size coefficient extraction network is a branch network of the first feature extraction network
  • the second size coefficient extraction network is a branch network of the second feature extraction network
  • three-dimensional object detection is a core problem, and when detection relies on sensors such as lidar, small objects may be difficult to detect.
  • in this embodiment, a neural network is used to detect the position and confidence of three-dimensional objects; by adding network branches (the first size coefficient extraction network and the second size coefficient extraction network) to the neural network and improving the training strategy of the deep learning algorithm, the network branches are made to extract the feature information of small objects, which finally makes the model friendlier for small object detection.
  • the loss function is also called the cost function.
  • the sample data is marked with the real value, and the loss function is used to estimate the error between the predicted value of the model and the real value.
  • the loss function is very important to the recognition accuracy of the model, and a key difficulty is deciding what kind of loss function to design based on the existing sample data and the requirements of the model.
  • some existing loss functions such as logarithmic loss functions, squared loss functions, exponential loss functions, 0/1 loss functions, etc. can be used to form loss functions of corresponding scenarios.
  • the loss function used by the object recognition model in the training process includes at least: a first loss function for optimizing the first size coefficient extraction network, a second loss function for optimizing the second size coefficient extraction network, a third loss function for describing state differences, and a fourth loss function for describing confidence differences.
  • the first loss function and the second loss function can make the model pay attention to objects within the target size range, so that the model can distinguish small objects more clearly.
  • the optimization objective of the first loss function includes: if the sample data indicates an object belonging to the target size range, increasing the first size coefficient; the sample data includes point cloud sample data and image sample data. More specifically, the first size coefficient extraction network is configured to extract a first size coefficient from the first fusion feature, and further obtain the second fusion feature based on the first size coefficient and the first fusion feature; the first size coefficient extraction network is used to predict the probability that the second fusion feature belongs to an object in the target size range.
  • the optimization objective of the first loss function specifically includes: if the second fusion feature obtained based on the sample data indicates an object belonging to the target size range, increasing the second fusion feature; if the second fusion feature obtained based on the sample data indicates an object that does not belong to the target size range, reducing the second fusion feature.
  • taking the second fusion feature represented in the form of a feature map as an example: the optimization goal of the first loss function specifically includes: if a certain position in the feature map including the second fusion feature obtained based on the sample data indicates an object belonging to the target size range, the feature value at that position is increased; otherwise, the feature value at that position is decreased, so that the object recognition model distinguishes small objects more clearly.
  • the first loss function can be used to describe the difference between the confidence prediction value, obtained by the object recognition model from the sample data, of objects belonging to the target size range and the annotated true value of that confidence corresponding to the sample data. A formula of the following general form can serve as an example: $L_{1}=\sum_{k_{1}}\ell\big(\hat{y}_{k_{1}},\,f_{seg}(x_{k_{1}})\big)$, where $\hat{y}_{k_{1}}$ represents the annotated true value, that is, the true confidence that the object to be recognized belongs to the target size range, and $f_{seg}(x_{k_{1}})$ represents the confidence prediction value, predicted by the object recognition model from the sample data, that the object belongs to the target size range.
  • the optimization objective of the second loss function includes: if the image sample data indicates an object belonging to the target size range, increasing the second size coefficient. More specifically, the second size coefficient extraction network is configured to extract a second size coefficient from the first image feature, and further obtain the second image feature based on the second size coefficient and the first image feature; the second size coefficient extraction network is used to predict the probability that the second image feature belongs to an object in the target size range. Therefore, the optimization goal of the second loss function specifically includes: if the second image feature obtained based on the image sample data indicates an object belonging to the target size range, increasing the second image feature; if the second image feature obtained based on the sample data indicates an object that does not belong to the target size range, reducing the second image feature.
  • taking the second image feature represented in the form of a feature map as an example: the optimization objective of the second loss function specifically includes: if a certain position in the feature map including the second image feature obtained based on the image sample data indicates an object belonging to the target size range, the feature value at that position is increased; otherwise, the feature value at that position is decreased, so that the object recognition model distinguishes small objects more clearly.
  • the second loss function is used to describe the difference between the confidence prediction value, obtained by the object recognition model from the image sample data, of objects belonging to the target size range and the annotated true value of that confidence corresponding to the image sample data. A formula of the following general form can serve as an example: $L_{2}=\sum_{k_{2}}\ell\big(\hat{y}_{k_{2}},\,f_{seg}(x_{k_{2}})\big)$, where $\hat{y}_{k_{2}}$ represents the annotated true value, that is, the true confidence that the object to be recognized belongs to the target size range, and $f_{seg}(x_{k_{2}})$ represents the confidence prediction value, predicted by the object recognition model from the image sample data, that the object belongs to the target size range.
  • the optimization objective of the third loss function includes: reducing the difference between the predicted value of the state information of the object obtained from the sample data by the object recognition model and the actual value of the state information of the object corresponding to the sample data.
  • the object state information includes at least one of the following: size information, position information and orientation information of the object.
  • the third loss function is used to describe the differences between the predicted size, predicted position and/or predicted orientation of the object obtained by the object recognition model from the sample data and, respectively, the real size, real position and/or real orientation of the object corresponding to the sample data; the sample data includes point cloud sample data and image sample data.
  • a formula of the following general form can be used as an example: $L_{3}=\sum_{i}\ell\big(\hat{s}_{i},\,f_{loc}(x_{i})\big)$, where $\hat{s}_{i}$ represents the annotated true value, that is, the real size, real position and/or real orientation of the object marked in the sample data, and $f_{loc}(x_{i})$ represents the predicted size, predicted position and/or predicted orientation obtained by the object recognition model from the sample data.
  • the optimization objective of the fourth loss function includes: reducing the difference between the predicted value of the confidence of the object obtained from the sample data by the object recognition model and the actual value of the confidence of the object corresponding to the sample data.
  • the confidence of the object represents the probability that the predicted object is an obstacle.
  • the fourth loss function is used to describe the difference between the predicted confidence value of the object obtained from the sample data by the object recognition model and the actual confidence value of the object corresponding to the sample data.
  • a formula can be used as an example to illustrate: L4 = Σ_i ||ŷ_i − fpred(x_i)||, where ŷ_i represents the annotated true value, that is, the true confidence value of the object marked in the sample data, and fpred(x_i) represents the confidence prediction value of the object obtained by the object recognition model from the sample data x_i.
  • the loss function used in the training process of the object recognition model of this embodiment may include, for example, the sum of the above four loss functions: L = L1 + L2 + L3 + L4 (a weighted combination may also be used).
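As a rough illustration of how these four terms could be combined in code, the following sketch assumes an L1 distance for each term and tunable weights; the dictionary keys and the weighting are illustrative assumptions, not the patent's definitive implementation:

```python
import torch.nn.functional as F

def combined_loss(preds, targets, w=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical combination of the four losses described above."""
    loss1 = F.l1_loss(preds["size_coef_fused"], targets["size_coef_fused"])  # first loss: fseg on fused feature
    loss2 = F.l1_loss(preds["size_coef_img"], targets["size_coef_img"])      # second loss: fseg on image feature
    loss3 = F.l1_loss(preds["state"], targets["state"])                      # third loss: state difference (floc)
    loss4 = F.l1_loss(preds["conf"], targets["conf"])                        # fourth loss: confidence difference (fpred)
    return w[0] * loss1 + w[1] * loss2 + w[2] * loss3 + w[3] * loss4
```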
  • regarding the optimization method: in the training process, it is necessary to use an optimization method to optimize the evaluation function and find the model with the highest evaluation. For example, the minimum value (optimal solution) of the output error of the loss function can be found through optimization methods such as gradient descent, and the parameters of the model can thereby be adjusted to the optimum, that is, the optimal coefficients of each network layer in the model can be solved.
  • the solving process may compute the output of the model and the error value of the loss function, and then solve for the gradients used to adjust the model parameters.
  • a back-propagation function can be called to calculate the gradient, and the calculation result of the loss function can be back-propagated into the object recognition model, so that the object recognition model can update model parameters.
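A minimal sketch of this optimization step, assuming a PyTorch-style framework in which `model` and `dataloader` already exist and `combined_loss` is the sketch above; SGD stands in for "optimization methods such as gradient descent":

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # gradient descent

for splice_batch, image_batch, targets in dataloader:
    preds = model(splice_batch, image_batch)   # forward pass: compute the model output
    loss = combined_loss(preds, targets)       # evaluate the loss function
    optimizer.zero_grad()
    loss.backward()                            # back-propagate to compute gradients
    optimizer.step()                           # update the model parameters
```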
  • the solution of the loss function described above can be solved using a stand-alone solver.
  • a network branch may be set on the basis of the backbone network to calculate the loss function of the network.
  • the loss function can be divided into the above four functions: a first loss function for optimizing the first size coefficient extraction network, a second loss function for optimizing the second size coefficient extraction network, a third loss function for describing the state difference, and a fourth loss function for describing the confidence difference; these loss functions jointly guide the update of the parameters of the neural network so that it achieves better prediction performance.
  • the object recognition model is obtained after the training, and the obtained object recognition model can also be tested with test samples to check the recognition accuracy of the object recognition model.
  • the finally obtained object recognition model can be set in an object recognition device.
  • the object recognition device can be a movable platform, or the object recognition device can be installed in the movable platform as a chip.
  • point cloud data can be obtained through a lidar or a camera with a depth information collection function configured on the movable platform, and image data obtained through a camera configured on the movable platform.
  • the object recognition device in the movable platform obtains spliced data according to the point cloud data and the image data, and then inputs the spliced data and the image data into the object recognition model, thereby obtaining the recognition result output by the object recognition model; the recognition result includes the confidence and state information of the object.
  • the recognition result is data with a confidence level greater than a preset threshold; data with a confidence level not greater than the preset threshold indicates that the object is not an obstacle, and such data needs no further processing. The preset threshold can be set according to the actual application scenario. As an example, for the input object to be recognized, a series of candidate boxes can be identified, each candidate box possibly corresponding to an object; based on the confidence of the object to be recognized, the probability that a candidate box corresponds to an obstacle can be determined. The confidences corresponding to the candidate boxes are sorted and then screened according to the set threshold; those greater than the set threshold can be considered recognized objects, yielding the final recognition result.
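A minimal sketch of this screening step (the box representation and the threshold value are assumptions):

```python
def filter_candidates(boxes, confidences, threshold=0.5):
    """Sort candidate boxes by confidence, then keep only those above the threshold."""
    ranked = sorted(zip(confidences, boxes), key=lambda cb: cb[0], reverse=True)
    return [(conf, box) for conf, box in ranked if conf > threshold]
```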
  • the movable platform may use the identification result to make an obstacle avoidance decision or make a route planning.
  • the recognition result may be displayed on the interface of the movable platform or the interface of a terminal communicatively connected to the movable platform, so as to let the user know the driving situation of the movable platform and the road conditions around it; further, the point cloud data and the image data can be displayed on the interface of the movable platform or the interface of the terminal communicatively connected to the movable platform, and the recognition result can additionally be overlaid on the point cloud data and the image data, which is convenient for the user to view in combination with the specific scene and allows the user to further understand the actual driving situation.
  • the identification result may be transmitted to other components in the movable platform, so that the other components control the movable platform to work safely and reliably based on the identification result.
  • the embodiment of the present application further provides an object recognition device 30, and the object recognition device 30 may be a movable platform, or the object recognition device 30 may be installed in the movable platform as a chip;
  • the object recognition device 30 includes a processor 31 and a memory 32 storing a computer program;
  • the processor 31 implements the following steps when executing the computer program:
  • obtain point cloud data and image data of the object to be recognized, and obtain a first fusion feature according to the point cloud data and the image data, and a first image feature according to the image data; obtain a first size coefficient according to the first fusion feature and a second size coefficient according to the first image feature, the first size coefficient and the second size coefficient representing the probability that the object to be recognized belongs to an object in the target size range; obtain a second fusion feature according to the first fusion feature and the first size coefficient, and a second image feature according to the first image feature and the second size coefficient;
  • the object to be recognized is recognized according to the second fusion feature and the second image feature.
  • the point cloud data and the image data of the object to be identified are obtained by spatial sampling through the sensor of the movable platform.
  • the processor 31 is further configured to: obtain the first fusion feature according to the point cloud density of each voxel in the point cloud data and the pixel values in the image data; wherein the voxels are obtained by dividing the point cloud data into a grid.
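A minimal sketch of computing per-voxel point cloud density, assuming a NumPy point array of shape (N, 3); the grid extent, voxel size and origin are illustrative assumptions:

```python
import numpy as np

def voxel_density(points, grid_shape=(200, 200, 10), voxel_size=0.5, origin=(0.0, 0.0, 0.0)):
    """Rasterize a point cloud into a grid; each voxel holds its point count
    divided by the total number of points (the point cloud density)."""
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    idx = idx[inside]
    grid = np.zeros(grid_shape, dtype=np.float32)
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)   # count points per voxel
    return grid / max(len(points), 1)
```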
  • the first fusion feature and the first image feature are represented in the form of a feature map.
  • the first size coefficient characterizes the probability that each position in the feature map including the first fusion feature belongs to an object in the target size range; the second size coefficient characterizes the probability that each position in the feature map including the first image feature belongs to an object in the target size range.
  • the processor 31 is further configured to: determine the first projection position of the point cloud data in the two-dimensional space based on the projection relationship from the point cloud to the image; obtain the pixel value at the first projection position in the image data; generate point cloud data including the pixel value according to the pixel value and the point cloud data; and obtain the first fusion feature according to the point cloud data including the pixel value and the image data.
  • the projection relationship from the point cloud to the image is obtained based on the external parameters from the point cloud coordinate system to the camera coordinate system and the internal parameters of the camera.
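A minimal sketch of this projection and pixel lookup, written with the standard pinhole decomposition p ~ K(R·P + t) of the extrinsic/intrinsic relationship the text describes; the array shapes are assumptions:

```python
import numpy as np

def colorize_point_cloud(points, image, K, R, t):
    """Project points (N, 3) into the image, read the pixel value at each first
    projection position, and return point cloud data including those pixel values."""
    cam = points @ R.T + t                         # point cloud frame -> camera frame (extrinsics)
    uvw = cam @ K.T                                # camera frame -> image plane (intrinsics)
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)    # first projection positions
    h, w = image.shape[:2]
    valid = (cam[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
            & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    colors = np.zeros((len(points), 3), dtype=points.dtype)
    colors[valid] = image[uv[valid, 1], uv[valid, 0]]          # e.g. RGB values
    return np.hstack([points, colors])                         # colored point cloud
```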
  • the processor 31 is further configured to: splice the point cloud data and the image data to obtain spliced data and input it into the object recognition model; feature extraction is performed on the spliced data through the first feature extraction network in the object recognition model to obtain the first fusion feature.
  • the first image feature is obtained by performing feature extraction on the image data.
  • the processor 31 is further configured to: input the image data into the object recognition model, and perform feature extraction on the image data through the second feature extraction network in the object recognition model to obtain the first image feature.
  • the first size coefficient is obtained from the first fusion feature by a first size coefficient extraction network in the object recognition model; and the second size coefficient is obtained from the first image feature by a second size coefficient extraction network in the object recognition model.
  • both the first size coefficient extraction network and the second size coefficient extraction network include convolutional layers.
  • the first size coefficient is obtained by the first size coefficient extraction network performing a convolution operation on the first fusion feature; and the second size coefficient is obtained by the second size coefficient extraction network performing a convolution operation on the first image feature.
  • the loss function used by the object recognition model in the training process includes at least: a first loss function for optimizing the first size coefficient extraction network and a second loss function for optimizing the second size coefficient extraction network.
  • the optimization objective of the first loss function includes: if the sample data indicates an object belonging to the target size range, increasing the first size coefficient; the sample data includes point cloud sample data and image sample data. And the optimization objective of the second loss function includes: if the image sample data indicates an object belonging to the target size range, increasing the second size coefficient.
  • the first loss function is used to describe the difference between the confidence prediction value, obtained by the object recognition model from the sample data, of the object belonging to the target size range and the true confidence value that the object corresponding to the sample data belongs to the target size range; the sample data includes point cloud sample data and image sample data.
  • the second loss function is used to describe the difference between the confidence prediction value, obtained by the object recognition model from the image sample data, of the object belonging to the target size range and the true confidence value that the object corresponding to the image sample data belongs to the target size range.
  • the second fusion feature is the sum of the first fusion feature and the first size coefficient; and/or the second image feature is the sum of the first image feature and the second size coefficient.
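A minimal sketch of a size coefficient extraction network together with the additive fusion just described, assuming a single-channel coefficient map that broadcasts over the feature channels (the kernel size and channel layout are assumptions):

```python
import torch
import torch.nn as nn

class SizeCoefficientHead(nn.Module):
    """Convolve the input feature map and squash with a sigmoid, so each position
    holds the probability of belonging to an object in the target size range;
    the enhanced feature is the sum of the feature and the coefficient."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, feat):                      # feat: (N, C, H, W)
        coef = torch.sigmoid(self.conv(feat))     # size coefficient map: (N, 1, H, W)
        return feat + coef                        # broadcast sum -> second (fusion/image) feature
```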
  • the processor 31 is further configured to: determine the second projection position of the second fusion feature in the two-dimensional space based on the projection relationship from the point cloud to the image; obtain the image feature at the second projection position; obtain a third fusion feature according to the image feature and the corresponding second fusion feature; and recognize the object to be recognized according to the third fusion feature.
  • the third fusion feature is the mean value of the image feature and the corresponding second fusion feature; or, the third fusion feature is the larger of the image feature and the corresponding second fusion feature.
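A minimal sketch of both variants of this third fusion, assuming the image feature has already been sampled at the second projection positions so the two tensors are aligned:

```python
import torch

def third_fusion(image_feat, second_fusion_feat, mode="mean"):
    """Fuse the sampled image feature with the corresponding second fusion feature."""
    if mode == "mean":
        return (image_feat + second_fusion_feat) / 2        # mean value variant
    return torch.maximum(image_feat, second_fusion_feat)    # "larger one" variant
```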
  • the processor 31 is further configured to: recognize the object to be recognized according to the second fusion feature and the second image feature, and generate a recognition result; the recognition result at least includes: confidence and/or state information of the object to be identified; the confidence is used to represent the probability that the object to be identified belongs to an obstacle.
  • the state information includes at least one of the following: size information, location information, and orientation information.
  • the confidence and state information of the object to be recognized are obtained by processing the second fusion feature and the second image feature by using an object prediction network in an object recognition model.
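A minimal sketch of such an object prediction network; the two 1x1-convolution branches and the 7-value state encoding (x, y, z, length, width, height, yaw) are assumptions:

```python
import torch.nn as nn

class ObjectPredictionHead(nn.Module):
    """From a fused feature map, predict per-position obstacle confidence and
    state information (size, position, orientation)."""
    def __init__(self, channels):
        super().__init__()
        self.conf = nn.Conv2d(channels, 1, kernel_size=1)    # confidence branch
        self.state = nn.Conv2d(channels, 7, kernel_size=1)   # state branch

    def forward(self, feat):
        return self.conf(feat).sigmoid(), self.state(feat)
```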
  • the loss function used by the object recognition model in the training process includes at least a third loss function for describing the state difference and a fourth loss function for describing the confidence difference.
  • the state difference includes: the differences between the predicted size, predicted position and/or predicted orientation of the object obtained by the object recognition model from the sample data and, respectively, the real size, real position and/or real orientation of the object corresponding to the sample data; the sample data includes point cloud sample data and image sample data.
  • the confidence level difference includes: the difference between the confidence level prediction value of the object obtained by the object recognition model from the sample data and the confidence level real value of the object corresponding to the sample data.
  • the identification result is data with a confidence level greater than a preset threshold.
  • the processor 31 is further configured to: recognize the object to be recognized according to the second fusion feature and the second image feature to generate a recognition result; the recognition result is used by the movable platform to make obstacle avoidance decisions or mobile route planning.
  • the processor 31 is further configured to: recognize the object to be recognized according to the second fusion feature and the second image feature to generate a recognition result; the recognition result is displayed on the interface of the movable platform or on the interface of a terminal device communicatively connected to the movable platform.
  • the point cloud data is obtained by using a lidar configured on the movable platform or a camera with a depth information acquisition function; and/or the image data is captured by a camera configured on the movable platform.
  • the movable platform includes: an unmanned aerial vehicle, a car, an unmanned ship, or a robot.
  • the various embodiments described herein can be implemented using computer readable media such as computer software, hardware, or any combination thereof.
  • the embodiments described herein can be implemented using application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or electronic units designed to perform the functions described herein.
  • embodiments such as procedures or functions may be implemented with separate software modules that allow the performance of at least one function or operation.
  • the software codes may be implemented by a software application (or program) written in any suitable programming language, which may be stored in a memory and executed by a processor.
  • the embodiment of the present application further provides a movable platform 100 , including: a body 101 ; and a power system 102 , which is installed in the body 101 and is used to provide power for the movable platform 100 ; and, the above-mentioned object recognition device 30 .
  • the movable platform 100 is a vehicle, a drone, an unmanned ship or a mobile robot.
  • there is also provided a non-transitory computer-readable storage medium, such as a memory including instructions, the instructions being executable by a processor of an apparatus to perform the above-described method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • there is provided a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by the processor of a terminal, the terminal is enabled to execute the above method.


Abstract

Provided are an object recognition method, apparatus, movable platform, and storage medium, said method comprising: obtaining point cloud data and image data of an object to be recognized, and obtaining a first fused feature according to said point cloud data and said image data; and obtaining a first image feature according to said image data (S101); obtaining a first dimension factor according to said first fused feature; and obtaining a second dimension factor according to said first image feature; said first dimension factor and said second dimension factor characterizing the probability that said object to be recognized is an object within a target size range (S102); obtaining a second fused feature according to the first fused feature and the first dimension factor (S103); and obtaining a second image feature according to the first image feature and the second dimension factor; according to the second fused feature and the second image feature, recognizing the object to be recognized (S104). The present method achieves effective recognition of objects within a target size range.

Description

Object recognition method, device, movable platform and storage medium

Technical Field

The present application relates to the technical field of object recognition, and in particular, to an object recognition method, device, movable platform and storage medium.

Background Art

With the development of technology, movable platforms such as unmanned vehicles and unmanned aerial vehicles have gradually developed. During the movement of a movable platform, it is necessary to perceive the environment around the movable platform and obtain information about the objects existing in the environment, so as to control the movable platform to work safely and reliably. For example, obstacle avoidance is an issue that movable platforms such as unmanned vehicles and unmanned aerial vehicles need to focus on, and the key to achieving obstacle avoidance lies in the accurate identification of the objects around the movable platform. Therefore, how to accurately identify information about the objects around the movable platform has become an urgent technical problem to be solved.

Summary of the Invention

In view of this, one of the objectives of the present application is to provide an object recognition method, device, movable platform and storage medium.

In a first aspect, an embodiment of the present application provides an object recognition method, including:

acquiring point cloud data and image data of an object to be recognized, and acquiring a first fusion feature according to the point cloud data and the image data; and acquiring a first image feature according to the image data;

acquiring a first size coefficient according to the first fusion feature; and acquiring a second size coefficient according to the first image feature; the first size coefficient and the second size coefficient representing the probability that the object to be recognized belongs to an object in a target size range;

acquiring a second fusion feature according to the first fusion feature and the first size coefficient; and acquiring a second image feature according to the first image feature and the second size coefficient;

recognizing the object to be recognized according to the second fusion feature and the second image feature.
In a second aspect, an embodiment of the present application provides an object recognition device, including a processor and a memory storing a computer program;

the processor implements the following steps when executing the computer program:

acquiring point cloud data and image data of an object to be recognized, and acquiring a first fusion feature according to the point cloud data and the image data; and acquiring a first image feature according to the image data;

acquiring a first size coefficient according to the first fusion feature; and acquiring a second size coefficient according to the first image feature; the first size coefficient and the second size coefficient representing the probability that the object to be recognized belongs to an object in a target size range;

acquiring a second fusion feature according to the first fusion feature and the first size coefficient; and acquiring a second image feature according to the first image feature and the second size coefficient;

recognizing the object to be recognized according to the second fusion feature and the second image feature.
In a third aspect, an embodiment of the present application provides a movable platform, including:

a body;

a power system, installed in the body and configured to provide power for the movable platform; and

the object recognition device according to the second aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method according to the first aspect is implemented.
In the object recognition method, device, movable platform and storage medium provided by the embodiments of the present application, after the point cloud data and the image data of the object to be recognized are acquired, a first fusion feature is acquired according to the point cloud data and the image data, and a first image feature is acquired according to the image data; then a first size coefficient is acquired according to the first fusion feature, and a second size coefficient is acquired according to the first image feature, the first size coefficient and the second size coefficient representing the probability that the object to be recognized belongs to an object in a target size range; next, a second fusion feature is acquired according to the first fusion feature and the first size coefficient, and a second image feature is acquired according to the first image feature and the second size coefficient; finally, the object to be recognized is recognized according to the second fusion feature and the second image feature. In this embodiment, the determined first size coefficient and second size coefficient are used to increase the attention paid to objects in the target size range, and thereby to increase the information about objects in the target size range carried in the second fusion feature and the second image feature, so that the second fusion feature and the second image feature contain more semantic information about objects in the target size range; as a result, small objects can be accurately recognized based on the second fusion feature and the second image feature, improving the accuracy of object recognition. Further, in this embodiment, object recognition is performed by combining point cloud data and image data: the point cloud data can provide spatial information to assist the image data, and the image data can provide color information to assist the point cloud data; the two kinds of data complement each other, which helps to further improve the accuracy of object recognition.
Description of the Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings that need to be used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
FIG. 1A is a schematic diagram of an application scenario provided by an embodiment of the present application;

FIG. 1B is a schematic structural diagram of an unmanned vehicle provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of an object recognition method provided by an embodiment of the present application;

FIG. 3A and FIG. 3B are schematic diagrams of different structures of an object recognition model provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an object recognition device provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a movable platform provided by an embodiment of the present application.
Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
With the development of technology, movable platforms such as unmanned vehicles and unmanned aerial vehicles have gradually developed. During the movement of a movable platform, it is necessary to perceive the environment around the movable platform and obtain information about the objects existing in the environment, so as to control the movable platform to work safely and reliably. For example, obstacle avoidance is an issue that movable platforms such as unmanned vehicles and unmanned aerial vehicles need to focus on, and the key to achieving obstacle avoidance lies in the accurate identification of the objects around the movable platform. Therefore, how to accurately identify information about the objects around the movable platform has become an urgent technical problem to be solved.
The inventors found that, in the object recognition methods in the related art, for large objects, the relevant sensors can obtain more detection data from the large object, and more feature information can then be extracted from that detection data; the richer feature information makes the object recognition methods in the related art relatively accurate for large objects. For small objects, however, the detection data that the relevant sensors obtain from the small object is much less than that obtained from a large object, so the feature information extracted from the detection data of the small object is also relatively limited, which makes the recognition accuracy of the related-art object recognition methods for small objects low. From the perspective of business scenarios, for example in the field of vehicle driving, in some extreme scenarios the defects of the related-art object recognition methods in recognizing small objects may have serious consequences for the safe driving of vehicles.
Based on this, an embodiment of the present application provides an object recognition method. After point cloud data and image data of an object to be recognized are acquired, a first fusion feature is acquired according to the point cloud data and the image data, and a first image feature is acquired according to the image data; then a first size coefficient is acquired according to the first fusion feature, and a second size coefficient is acquired according to the first image feature, the first size coefficient and the second size coefficient representing the probability that the object to be recognized belongs to an object in the target size range; next, a second fusion feature is acquired according to the first fusion feature and the first size coefficient, and a second image feature is acquired according to the first image feature and the second size coefficient; finally, the object to be recognized is recognized according to the second fusion feature and the second image feature. In this embodiment, the determined first size coefficient and second size coefficient are used to increase the attention paid to objects in the target size range, and thereby to increase the information about objects in the target size range carried in the second fusion feature and the second image feature, so that the second fusion feature and the second image feature contain more semantic information about objects in the target size range; as a result, small objects can be accurately recognized based on the second fusion feature and the second image feature, improving the accuracy of object recognition. Further, in this embodiment, object recognition is performed by combining point cloud data and image data: the point cloud data can provide spatial information to assist the image data, and the image data can provide color information to assist the point cloud data; the two kinds of data complement each other, which helps to further improve the accuracy of object recognition.
The object recognition method of this embodiment may be implemented by an object recognition device. In one possible implementation, the object recognition device may be a computer chip or an integrated circuit with data processing capability, for example a central processing unit (CPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). The object recognition device may be installed in a movable platform; the movable platform in the embodiments of the present application may include: a car, an unmanned aerial vehicle, an unmanned ship or a robot, where the car may be an unmanned vehicle or a manned vehicle, and the unmanned aerial vehicle may be a quad-rotor, hexa-rotor or octa-rotor drone. In another implementation, the object recognition device may itself be a movable platform; the movable platform includes at least a car, an unmanned aerial vehicle, an unmanned ship or a robot, where the car may be an unmanned vehicle or a manned vehicle.
In an exemplary application scenario, referring to FIG. 1A and FIG. 1B, the movable platform is an unmanned vehicle that includes the object recognition device. FIG. 1A shows a driving scene of the unmanned vehicle 100, and FIG. 1B shows a structural diagram of the unmanned vehicle 100. A lidar 10 for acquiring point cloud data may be installed on the unmanned vehicle 100, and a photographing device 20 for acquiring image data may also be installed on the unmanned vehicle 100. Exemplarily, the number of lidars 10 and of photographing devices 20 may each be one or more. It can be understood that the installation positions of the lidar 10 and the photographing device 20 can be set according to the actual application scenario; exemplarily, one lidar 10 and one photographing device 20 may be installed at the front of the unmanned vehicle 100. While the unmanned vehicle 100 is driving, the lidar 10 collects point cloud data of the objects around the unmanned vehicle 100 and transmits it to the object recognition device 30 in the unmanned vehicle 100, and the photographing device 20 collects image data of the objects around the unmanned vehicle 100 and transmits it to the object recognition device 30. The object recognition device 30 acquires the point cloud data and the image data of the objects around the unmanned vehicle 100 and performs object recognition based on the object recognition method of the embodiments of the present application to obtain a recognition result. In a first possible implementation, after obtaining the recognition result, the unmanned vehicle 100 may use the recognition result to make obstacle avoidance decisions or to plan routes; in a second possible implementation, the recognition result may be displayed on the interface of the unmanned vehicle 100 or on the interface of a terminal communicatively connected to the unmanned vehicle 100, so as to let the user know the driving situation of the unmanned vehicle 100 and the road conditions around it; in a third possible implementation, the recognition result may be transmitted to other components in the unmanned vehicle 100, so that the other components control the unmanned vehicle 100 to work safely and reliably based on the recognition result.
Next, the object recognition method provided by the embodiments of the present application is described. Referring to FIG. 2, FIG. 2 is a schematic flowchart of an object recognition method provided by an embodiment of the present application. The method may be implemented by an object recognition device; the object recognition device may be a movable platform, or the object recognition device may be installed in a movable platform as a chip. The method includes:

In step S101, point cloud data and image data of an object to be recognized are acquired, and a first fusion feature is acquired according to the point cloud data and the image data; and a first image feature is acquired according to the image data.

In step S102, a first size coefficient is acquired according to the first fusion feature; and a second size coefficient is acquired according to the first image feature; the first size coefficient and the second size coefficient represent the probability that the object to be recognized belongs to an object in a target size range.

In step S103, a second fusion feature is acquired according to the first fusion feature and the first size coefficient; and a second image feature is acquired according to the first image feature and the second size coefficient.

In step S104, the object to be recognized is recognized according to the second fusion feature and the second image feature.
The point cloud data and the image data of the object to be recognized are obtained by spatially sampling through the sensors of the movable platform. Specifically, the point cloud data may be acquired by a lidar configured on the movable platform or by a photographing device with a depth information collection function; and/or the image data may be acquired by a photographing device configured on the movable platform.

The lidar is used to emit a laser pulse sequence into the space where the movable platform is located, then receive the laser pulse sequence reflected from the object to be recognized, and generate point cloud data according to the reflected laser pulse sequence. In one example, the lidar may determine the reception time of the reflected laser pulse sequence, for example by detecting the rising edge time and/or the falling edge time of the electrical signal pulse. In this way, the lidar can calculate the TOF (time of flight) using the reception time and the emission time of the laser pulse sequence, so as to determine the distance from the object to be recognized to the lidar. The lidar is a self-illuminating sensor that does not depend on an external light source, is little disturbed by ambient light, and can work normally even in a closed, lightless environment, which facilitates the subsequent generation of high-precision three-dimensional models and gives it wide applicability.
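Since the pulse travels to the object and back, the one-way distance follows from the time of flight as d = c·TOF/2; a one-line sketch:

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_distance(t_emit, t_receive):
    """One-way range from a round-trip laser pulse time of flight."""
    return SPEED_OF_LIGHT * (t_receive - t_emit) / 2.0
```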
The photographing device with a depth information collection function includes, but is not limited to, a binocular vision sensor or a structured-light depth camera. Based on the parallax principle, the binocular vision sensor acquires two images of the target scene from different positions and obtains three-dimensional geometric information by calculating the positional deviation between corresponding points of the two images, thereby generating point cloud data. The structured-light depth camera projects light with certain structural characteristics into the space and then collects it; such structured light yields different image phase information for different depth regions of the object to be recognized, which is then converted into depth information to obtain point cloud data.

The image data may be a color image, a grayscale image or an infrared image, and the photographing devices used for collecting image data include, but are not limited to, visible light cameras, grayscale cameras and infrared cameras. The photographing device may capture an image sequence at a specified frame rate. The photographing device may have adjustable shooting parameters: under different shooting parameters, the photographing device may capture different images despite being subjected to exactly the same external conditions (for example, position and lighting). Shooting parameters may include exposure (for example, exposure time, shutter speed, aperture, film speed), gain, gamma, region of interest, binning/subsampling, pixel clock, offset, trigger, ISO and the like. Exposure-related parameters can control the amount of light reaching the image sensor in the photographing device; for example, the shutter speed can control the amount of time light reaches the image sensor, and the aperture can control the amount of light that reaches the image sensor in a given time. Gain-related parameters can control the amplification of the signal from the optical sensor, and ISO can control the camera's level of sensitivity to the available light.

In an exemplary embodiment, the movable platform is equipped with a lidar and a visible light camera. In one example, the lidar and the visible light camera may work at the same frame rate. In another example, the lidar and the visible light camera may also work at different frame rates, as long as their frame rates allow point cloud data and image data to be acquired within a preset time period.

Exemplarily, during the movement of the movable platform, the lidar may collect point cloud data of the object to be recognized in real time and transmit it to the object recognition device, and the photographing device may collect image data of the object to be recognized in real time and transmit it to the object recognition device. In this embodiment, object recognition is performed by combining point cloud data and image data: the point cloud data can provide spatial information to assist the image data, and the image data can provide color information to assist the point cloud data; the two kinds of data complement each other, which helps to further improve the accuracy of object recognition.
After acquiring the point cloud data and the image data of the object to be recognized, the object recognition device acquires the first fusion feature according to the point cloud data and the image data, and acquires the first image feature according to the image data; it then acquires the first size coefficient according to the first fusion feature and the second size coefficient according to the first image feature, the first size coefficient and the second size coefficient representing the probability that the object to be recognized belongs to an object in the target size range. This embodiment thus determines, from the first fusion feature and the first image feature, information about objects belonging to the target size range (namely the first size coefficient and the second size coefficient), and uses the determined first and second size coefficients to increase the attention paid to objects in the target size range. Further, a second fusion feature is acquired according to the first fusion feature and the first size coefficient, and a second image feature is acquired according to the first image feature and the second size coefficient, so that the first and second size coefficients increase the semantic information about objects in the target size range carried in the second fusion feature and the second image feature; small objects can therefore be accurately recognized based on the second fusion feature and the second image feature, improving the accuracy of object recognition.

The target size range of this embodiment may include a size range smaller than a preset target size threshold. Considering that different kinds of movable platforms define the size of a small object differently, the preset target size threshold can be flexibly configured according to business needs, which is not limited in this embodiment.
Point cloud data is unstructured and needs to be processed into a format suitable for data analysis; for example, the point cloud data can be processed to obtain the point cloud density corresponding to each voxel. The point cloud data may be processed by three-dimensional gridding: for example, the point cloud data is divided into a grid to obtain multiple voxels, and the ratio of the number of points contained in each voxel to the number of all points in the point cloud data constitutes the point cloud density of that voxel. Since the point cloud density characterizes the number of points contained in the voxel, a higher point cloud density means a higher probability that the voxel corresponds to an object, so the point cloud density corresponding to each voxel can serve as feature information of the object. Processing the irregular point cloud into a regular representation can better express the contour information of the object to be recognized.

For image data, the pixel values in the image data can be used for data analysis; the pixel values include, but are not limited to, RGB values or grayscale values.
In some embodiments, the object recognition device may acquire the first fusion feature according to the point cloud density of each voxel in the point cloud data and the pixel values in the image data. The first fusion feature fuses the feature information of the point cloud data with the feature information of the image data, and can better reflect the characteristics of the object to be recognized from both three-dimensional and two-dimensional perspectives, which helps to improve the accuracy of object recognition.

In one possible implementation, the point cloud data and the image data may be spliced to obtain spliced data, and feature extraction is performed on the spliced data to acquire the first fusion feature. In one example, feature extraction may be performed on the spliced data through a pre-trained object recognition model to acquire the first fusion feature.
其中,所述点云数据可以被栅格化为H*W*C的三维网格(其中,H和W分别代表长和宽,C表示所述三维网格的深度),每个网格表示一个体素,所述体素的值为该体素的点云密度,所述图像数据(以RGB图像为例)可以表示为H*W*3的数据(其中,H和W分别代表长和宽,3表示RGB3个通道),则可以将所述点云数据和所述图像数据拼接成大小为H*W*(C+3)的拼接数据,所述拼接数据包括了待识别物体的位置信息(从点云数据中获得)和色彩信息(从图像数据中获得),能够从三维角度和二维角度更好的体现出待识别物体的特性,从而有利于提高物体识别的准确度。The point cloud data can be rasterized into a three-dimensional grid of H*W*C (where H and W represent the length and width, respectively, and C represents the depth of the three-dimensional grid), and each grid represents A voxel, the value of the voxel is the point cloud density of the voxel, and the image data (taking an RGB image as an example) can be expressed as H*W*3 data (where H and W represent the long and width, 3 means RGB3 channels), then the point cloud data and the image data can be spliced into spliced data with a size of H*W*(C+3), and the spliced data includes the position of the object to be recognized Information (obtained from point cloud data) and color information (obtained from image data) can better reflect the characteristics of the object to be recognized from a three-dimensional and two-dimensional perspective, thereby helping to improve the accuracy of object recognition.
In another possible implementation, in order to further improve the accuracy of object recognition, the object recognition device may determine, based on the projection relationship from the point cloud to the image, the first projection position of the point cloud data in the two-dimensional space, then obtain the pixel value at the first projection position in the image data, and generate point cloud data including the pixel values according to the pixel values at the first projection positions and the point cloud data. Taking the image data being an RGB image as an example, the object recognition device may assign the RGB value at the first projection position in the image data to the corresponding point, thereby generating a colored point cloud. In this embodiment, the point cloud data is fused with the color data in the image, which further strengthens the correspondence between the image data and the point cloud data, so that the point cloud data including the pixel values can better reflect the characteristics of the object to be recognized, which helps to improve the accuracy of object recognition. The projection relationship from the point cloud to the image can be obtained based on the extrinsic parameters from the point cloud coordinate system to the camera coordinate system and the intrinsic parameters of the camera: let the coordinates of a point in the point cloud coordinate system be P, let the extrinsic parameters from the point cloud coordinate system to the camera coordinate system be RT, let the camera intrinsic parameters be K, and let the first projection position of the point in the two-dimensional space be p; then p = RT*K*P.

Next, the object recognition device may acquire the first fusion feature according to the point cloud data including the pixel values and the image data. For example, the object recognition device may splice the point cloud data including the pixel values with the image data to obtain spliced data, and perform feature extraction on the spliced data to acquire the first fusion feature. In one example, feature extraction may be performed on the spliced data through a pre-trained object recognition model to acquire the first fusion feature.

Here the point cloud data can be rasterized into a three-dimensional grid of H*W*C, each cell representing one voxel whose values are the point cloud density of that voxel and the pixel values (for example RGB values) of its points; the image data (taking an RGB image as an example) can be represented as data of size H*W*3, and the point cloud data including the pixel values can be spliced with the image data into spliced data of size H*W*(C+3). The spliced data includes the position information of the object to be recognized, the correspondence between position and color (obtained from the point cloud data including the pixel values) and the color information (obtained from the image data), and can comprehensively reflect the characteristics of the object to be recognized from both three-dimensional and two-dimensional perspectives, which helps to improve the accuracy of object recognition.
In some embodiments, the object recognition apparatus performs feature extraction on the image data to obtain the first image feature; in one example, feature extraction may be performed on the image data by a pre-trained object recognition model to obtain the first image feature.

After obtaining the first fusion feature and the first image feature, the object recognition apparatus obtains a first size coefficient from the first fusion feature and a second size coefficient from the first image feature. The first size coefficient and the second size coefficient represent the probability that the object to be recognized belongs to an object in the target size range. This embodiment thereby extracts, from the first fusion feature and the first image feature, information about objects in the target size range, and uses the determined first and second size coefficients to increase the attention paid to such objects, thereby improving the recognition accuracy for small objects.
In one possible implementation, the first fusion feature and the first image feature may be represented in the form of feature maps. The object recognition apparatus may obtain, for each position in the feature map containing the first fusion feature, a first size coefficient at that position, where the first size coefficient represents the probability that the position belongs to an object in the target size range; because the first fusion feature fuses point cloud information and image information, the first size coefficient obtained from it also reflects both, and therefore carries better semantic information. Likewise, the apparatus may obtain, for each position in the feature map containing the first image feature, a second size coefficient at that position, representing the probability that the position belongs to an object in the target size range.

In one example, the object recognition apparatus may use a pre-trained object recognition model to obtain the first size coefficient from the first fusion feature and the second size coefficient from the first image feature.
After obtaining the first size coefficient and the second size coefficient, the object recognition apparatus obtains a second fusion feature from the first fusion feature and the first size coefficient, obtains a second image feature from the first image feature and the second size coefficient, and finally performs target recognition on the object to be recognized based on the second fusion feature and the second image feature. In this embodiment, the first and second size coefficients are used to amplify, within the second fusion feature and the second image feature, the information about objects in the target size range, so that these features carry more semantic information about such objects; small objects can therefore be recognized accurately based on the second fusion feature and the second image feature, improving recognition accuracy. Of course, when the point cloud data and image data also include data collected from large objects, the second fusion feature and the second image feature obtained from them also contain feature information extracted from the large objects, so large objects can be recognized accurately as well.

In one example, the second fusion feature may be the sum of the first fusion feature and the first size coefficient, and/or the second image feature may be the sum of the first image feature and the second size coefficient, so that the second fusion feature and the second image feature carry more semantic information about objects in the target size range. In other examples, the second fusion feature or the second image feature may be obtained by other operations; for instance, in some other possible embodiments the second fusion feature is the product of the first fusion feature and the first size coefficient, or the second image feature is the product of the first image feature and the second size coefficient, which can be configured according to the actual application scenario.
Exemplarily, the first fusion feature, the first image feature, the second fusion feature, and the second image feature may all be represented as feature maps. The object recognition apparatus may obtain the first size coefficient at each position of the feature map containing the first fusion feature, and the second size coefficient at each position of the feature map containing the first image feature; that is, every position in the first-fusion-feature map has a corresponding first size coefficient, and every position in the first-image-feature map has a corresponding second size coefficient. The value at each position of the first-fusion-feature map can therefore be added to the first size coefficient at that position to obtain the feature map containing the second fusion feature, and the value at each position of the first-image-feature map can be added to the second size coefficient at that position to obtain the feature map containing the second image feature.
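The per-position fusion described above amounts to an elementwise combination of a feature map with a coefficient map. A minimal sketch, assuming the coefficient map has one channel that broadcasts over the feature channels; the names are illustrative:

```python
import numpy as np

def apply_size_coefficient(feature_map, size_coeff, mode="sum"):
    """Combine a feature map with a per-position size coefficient map.

    feature_map: (H, W, C) first fusion feature or first image feature
    size_coeff:  (H, W, 1) probability that each position belongs to an
                 object in the target size range
    mode:        "sum" or "product", matching the two variants in the text
    """
    if mode == "sum":
        return feature_map + size_coeff   # broadcasts over the C channels
    return feature_map * size_coeff

# second_fusion = apply_size_coefficient(first_fusion, first_coeff)
# second_image  = apply_size_coefficient(first_image, second_coeff)
```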
When recognizing the object to be recognized based on the second fusion feature and the second image feature, the object recognition apparatus may fuse the two further in order to improve the recognition accuracy again. Specifically, based on the point-cloud-to-image projection relationship, the apparatus determines a second projection position of the second fusion feature in two-dimensional space, obtains the image features of the second image feature at that second projection position, obtains a third fusion feature from those image features and the corresponding second fusion feature, and finally recognizes the object to be recognized based on the third fusion feature to obtain a recognition result. In this embodiment, the third fusion feature obtained by fusing the second fusion feature and the second image feature combines image information and point cloud information and further incorporates additional information about objects in the target size range, so it carries better semantic information; using the third fusion feature for object recognition helps improve the accuracy of object recognition.
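A minimal sketch of this cross-view fusion follows, assuming each cell of the second-fusion-feature map has a precomputed integer projection into image-feature coordinates; it implements the mean and elementwise-maximum variants described later in this disclosure, and all names are illustrative.

```python
import numpy as np

def fuse_with_image_features(second_fusion, second_image, proj_uv, mode="mean"):
    """Build the third fusion feature by sampling image features at the
    projected positions of the fusion-feature cells.

    second_fusion: (Hf, Wf, C) second fusion feature map
    second_image:  (Hi, Wi, C) second image feature map
    proj_uv:       (Hf, Wf, 2) integer (row, col) projection of each fusion
                   cell into the image feature map
    """
    rows = np.clip(proj_uv[..., 0], 0, second_image.shape[0] - 1)
    cols = np.clip(proj_uv[..., 1], 0, second_image.shape[1] - 1)
    gathered = second_image[rows, cols]          # (Hf, Wf, C)

    if mode == "mean":
        return 0.5 * (second_fusion + gathered)  # average of the two features
    return np.maximum(second_fusion, gathered)   # elementwise larger value
```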
The point-cloud-to-image projection relationship may be obtained from the extrinsic parameters between the point cloud coordinate system and the camera coordinate system together with the camera intrinsics. The recognition result includes at least the confidence and/or state information of the object to be recognized; the confidence represents the probability that the object to be recognized is an obstacle. The state information includes at least one of the following: the size information, position information, and orientation information of the object.

In some embodiments, the point cloud data and the image data may be input into a pre-trained object recognition model, and the model processes them to carry out the object recognition.

In some embodiments, the object recognition method of the embodiments of the present application may be implemented by a single trained object recognition model, which may be preset in the object recognition apparatus so that the recognition process is performed through the model.
The training process of the object recognition model may be as follows: first express a model through modeling, then evaluate the model by constructing an evaluation function, and finally optimize the evaluation function using the sample data and an optimization method, adjusting the model to its optimum.

Modeling converts a practical problem into one a computer can understand, that is, into a representation a computer can work with. Modeling generally refers to the process of estimating the model's objective function from a large amount of sample data.

The goal of evaluation is to judge the quality of the model that has been built. For the model built in the first step, the evaluation is an indicator of how good the model is; this involves the choice of evaluation metrics and the design of evaluation functions, and machine learning offers targeted metrics for this purpose. For example, after modeling is completed, a loss function needs to be designed for the model to evaluate its output error.

The target of optimization is the evaluation function: an optimization method is used to solve for the model with the highest evaluation. For example, an optimization method such as gradient descent can be used to find the minimum (optimal solution) of the loss function's output error and tune the model parameters to their optimum.
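For reference, the standard gradient descent update mentioned here takes the form

$$\theta \leftarrow \theta - \eta\,\nabla_{\theta} L(\theta)$$

where $\theta$ denotes the model parameters, $\eta$ the learning rate, and $L$ the loss function; iterating this update moves the parameters toward a minimum of the loss.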
In other words, before training a model one first determines a suitable parameter estimation method, then uses it to estimate each parameter of the model's objective function, thereby determining the final mathematical expression of the objective function.

Object recognition models in the related art already achieve very good results and can recognize objects with high accuracy. The inventors found, however, that in practical business scenarios involving movable platforms, such models tend to focus on larger targets such as people, vehicles, roads, or trees, and these larger targets also tend to dominate the sample data. During training the model is biased toward the global optimum; the features of larger targets are usually more salient and attract more attention, while the features of small objects are comparatively subtle and receive little attention, so the model is biased toward extracting the features of large objects. This bias ultimately means the model recognizes large objects well but does not attend well to small objects, which is a defect of the object recognition models in the related art. From the business perspective, in some extreme scenarios in fields such as vehicle driving, even a subtle defect in the object recognition model may have serious consequences for safe driving. From the technical perspective, further eliminating such defects on top of an already high accuracy is extremely challenging, because in machine learning, as described above, many stages are involved from modeling to training, such as the selection and processing of sample data, the design of data features, the design of the model, and the design of the loss function or the optimization method; a subtle difference at any stage can introduce a defect into the model.
On this basis, and referring to FIG. 3A, an embodiment of the present application provides an object recognition model for object recognition. The model includes a first feature extraction network, a second feature extraction network, a first size coefficient extraction network, a second size coefficient extraction network, and an object prediction network. The first feature extraction network obtains the first fusion feature from the point cloud data and the image data; the second feature extraction network obtains the first image feature from the image data; the first size coefficient extraction network obtains the first size coefficient from the first fusion feature and obtains the second fusion feature from the first fusion feature and the first size coefficient; the second size coefficient extraction network obtains the second size coefficient from the first image feature and obtains the second image feature from the first image feature and the second size coefficient; the object prediction network recognizes the object to be recognized based on the second fusion feature and the second image feature and obtains a recognition result. The recognition result includes at least the confidence and/or state information of the object to be recognized; the confidence represents the probability that the object to be recognized is an obstacle, and the state information includes at least one of the following: the size information, position information, and orientation information of the object to be recognized.

The first feature extraction network may perform feature extraction on concatenated data (formed by concatenating the point cloud data with the image data, or by concatenating the aforementioned point cloud data carrying the pixel values with the image data) to obtain the first fusion feature.

The second feature extraction network performs feature extraction on the image data to obtain the first image feature.
Both the first size coefficient extraction network and the second size coefficient extraction network include convolutional layers: the first size coefficient is obtained by the first size coefficient extraction network performing a convolution operation on the first fusion feature, and the second size coefficient is obtained by the second size coefficient extraction network performing a convolution operation on the first image feature. It can be understood that the number of convolutional layers may be configured for the actual application scenario; for example, with at least two convolutional layers, the first size coefficient is obtained by applying at least two convolution operations to the first fusion feature, and the second size coefficient by applying at least two convolution operations to the first image feature. The first size coefficient and the second size coefficient represent the probability that the object to be recognized belongs to an object in the target size range.

Because the first fusion feature fuses point cloud information and image information, the first size coefficient obtained from it also reflects both, and therefore carries better semantic information. Further, this embodiment determines, from the first fusion feature and the first image feature, the information about objects belonging to the target size range (namely the first size coefficient and the second size coefficient), and uses these coefficients to increase the attention paid to such objects, helping to improve the model's recognition accuracy for small objects.

In some embodiments, the first size coefficient extraction network is further configured to obtain the second fusion feature from the sum of the first fusion feature and the first size coefficient, and the second size coefficient extraction network is further configured to obtain the second image feature from the sum of the first image feature and the second size coefficient, but this is not limiting. The second fusion feature or the second image feature may also be obtained by other operations; for example, in some other possible embodiments the second fusion feature is the product of the first fusion feature and the first size coefficient, or the second image feature is the product of the first image feature and the second size coefficient, which can be configured according to the actual application scenario.
Exemplarily, referring to FIG. 3B, the first size coefficient extraction network includes a first size coefficient extraction sub-network and a first fusion sub-network: the first size coefficient extraction sub-network performs a convolution operation on the first fusion feature to obtain the first size coefficient, and the first fusion sub-network obtains the second fusion feature from the first fusion feature and the first size coefficient. The second size coefficient extraction network includes a second size coefficient extraction sub-network and a second fusion sub-network: the second size coefficient extraction sub-network performs a convolution operation on the first image feature to obtain the second size coefficient, and the second fusion sub-network obtains the second image feature from the first image feature and the second size coefficient. In this embodiment, the determined first and second size coefficients increase the attention paid to objects in the target size range, enriching the second fusion feature and the second image feature with information about such objects; the two features thus contain more semantic information about objects in the target size range, so small objects can be recognized accurately based on them.
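A minimal PyTorch sketch of one such branch is given below, assuming two convolutional layers followed by a sigmoid so the coefficient map can be read as a per-position probability, and a sum-style fusion sub-network; the layer widths and names are illustrative assumptions, not taken from the original disclosure.

```python
import torch
import torch.nn as nn

class SizeCoefficientBranch(nn.Module):
    """Extracts a per-position size coefficient from a feature map and
    fuses it back in, as in the first/second size coefficient networks."""

    def __init__(self, in_channels, hidden_channels=64):
        super().__init__()
        # At least two convolution operations produce the coefficient map.
        self.coeff = nn.Sequential(
            nn.Conv2d(in_channels, hidden_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_channels, 1, kernel_size=1),
            nn.Sigmoid(),  # probability of belonging to the target size range
        )

    def forward(self, feature):              # feature: (N, C, H, W)
        size_coeff = self.coeff(feature)     # (N, 1, H, W)
        fused = feature + size_coeff         # sum variant; broadcasts over C
        return fused, size_coeff

# fused_fusion, coeff1 = SizeCoefficientBranch(C)(first_fusion_feature)
# fused_image,  coeff2 = SizeCoefficientBranch(C)(first_image_feature)
```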
The object prediction network obtains the third fusion feature from the second fusion feature and the second image feature and then performs target recognition on the object to be recognized based on the third fusion feature. Specifically, the object prediction network may obtain the third fusion feature in the manner described above and recognize the object to be recognized based on it to obtain the recognition result. The recognition result includes at least the confidence and/or state information of the object to be recognized; the confidence represents the probability that the object to be recognized is an obstacle, and the state information includes at least one of the following: size information, position information, and orientation information.

The training process of the object recognition model is described next. In this embodiment, sample data for training may be prepared in advance. The sample data may include point cloud sample data and image sample data containing both objects that belong to the target size range and objects that do not, so that the trained object recognition model can accurately recognize small objects as well as large objects.
The model training in this embodiment may be supervised or unsupervised. In some examples a supervised scheme may be adopted to accelerate training: ground-truth values are annotated in the sample data, and supervised training improves both the speed and the accuracy of training. The object state information may include one or more kinds of information, configurable according to business needs. As an example, the ground-truth values include the confidence of belonging to the target size range (the probability that the object to be recognized is a small object), the object confidence (the probability that the object to be recognized is an obstacle), and the state information of the object, which may include at least one of size information, position information, and orientation information.
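For concreteness, one annotated training sample might be organized as follows; this layout and all field names are hypothetical, since the disclosure only lists which quantities are annotated:

```python
# One annotated sample, following the ground-truth fields listed above.
sample_annotation = {
    "small_object_confidence": 1.0,   # belongs to the target size range
    "obstacle_confidence": 1.0,       # the object is an obstacle
    "state": {
        "size": [0.6, 0.4, 1.1],      # width, length, height (assumed meters)
        "position": [12.3, -2.1, 0.0],
        "orientation": 1.57,          # yaw angle (assumed radians)
    },
}
```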
In some examples, the above sample data may be obtained from raw data through feature engineering, which is the process of finding physically meaningful features in the raw data to participate in model training; it involves operations such as data cleaning, data dimensionality reduction, feature extraction, feature normalization, feature evaluation and screening, feature dimensionality reduction, and feature encoding.

For example, point cloud data is unstructured and needs to be processed into a format that can be fed into the object recognition model, such as computing the point cloud density of each voxel of the point cloud data and using those densities as the model input. For image data, the pixel values can be used as the model input.

In some embodiments, to improve the accuracy of object recognition, the sample data input into the object recognition model in this embodiment includes the concatenated sample data obtained from the point cloud sample data and the image sample data, together with the image sample data itself.
In one example, the point cloud sample data may be rasterized into an H*W*C three-dimensional grid (where H and W denote the length and width and C the depth of the grid), each cell representing a voxel whose value is that voxel's point cloud density; the image sample data (taking an RGB image as an example) can be represented as H*W*3 data (3 being the RGB channels), so the point cloud sample data and the image sample data can be concatenated into sample data of size H*W*(C+3). The concatenated sample data includes the position information of the object (obtained from the point cloud data) and its color information (obtained from the image data), which reflects the characteristics of the object from both a three-dimensional and a two-dimensional perspective and helps improve the accuracy of object recognition.

In another example, point cloud sample data carrying pixel values may be generated from the point cloud sample data and the image sample data (in the same way as the point cloud data carrying the pixel values described above, not repeated here). The point cloud sample data may be rasterized into an H*W*C three-dimensional grid, each cell representing a voxel whose value is the voxel's point cloud density together with the pixel values (such as RGB values) of its points; the image sample data (taking an RGB image as an example) can be represented as H*W*3 data, so the point cloud sample data carrying the pixel values and the image data can be concatenated into sample data of size H*W*(C+3).

Using the above sample data, the object recognition model can be obtained by training a machine learning model, which may be a neural network model, for example one based on deep learning. The specific structural design of the object recognition model is one important aspect of the training process. In this embodiment, the structure of the object recognition model includes at least the first feature extraction network, the second feature extraction network, the first size coefficient extraction network, the second size coefficient extraction network, and the object prediction network.
In the embodiments of the present application, the first and second size coefficient extraction networks separately extract information about objects within the target size range to obtain the first and second size coefficients, and this information is then reinforced on top of the first fusion feature and the first image feature respectively, yielding a second fusion feature and a second image feature that carry more semantic information about objects within the target size range, thereby strengthening the model's recognition of such objects.

In one possible implementation, the first and second feature extraction networks are backbone networks, the first size coefficient extraction network is a branch of the first feature extraction network, and the second size coefficient extraction network is a branch of the second feature extraction network. The first and second size coefficients can then be extracted on top of the already-extracted first fusion feature and first image feature to obtain the second fusion feature and the second image feature, which, compared with using two independent neural networks, reduces execution overhead and improves model efficiency.

In an exemplary embodiment, three-dimensional object detection is a core problem in the field of autonomous driving, and detection with sensors such as lidar can leave small objects hard to detect. In the solution of this embodiment, a deep-learning neural network is used to detect the position and confidence of three-dimensional objects; by adding network branches to the neural network (the first size coefficient extraction network and the second size coefficient extraction network) and improving the training strategy of the deep learning algorithm, the branches are made to extract the feature information of small objects, ultimately making the model friendlier to small object detection.
Another important aspect of the training process is designing a suitable loss function for the business requirements. The loss function, also called the cost function, estimates, in supervised training where the sample data is annotated with ground-truth values, the error between the model's predictions and the ground truth. The loss function is critical to the model's recognition accuracy, and designing one from the available sample data and the model's requirements is quite difficult. In some examples, existing losses such as the logarithmic loss, squared loss, exponential loss, or 0/1 loss can be used to compose the loss function for the corresponding scenario.

Based on the requirements of this embodiment, as an example, the loss function used in training the object recognition model includes at least: a first loss function for optimizing the first size coefficient extraction network, a second loss function for optimizing the second size coefficient extraction network, a third loss function describing state differences, and a fourth loss function describing confidence differences. The first and second loss functions make the model attend to objects within the target size range, so that it distinguishes small objects more clearly.
The optimization objective of the first loss function includes: if the sample data indicates an object belonging to the target size range, increase the first size coefficient; here the sample data includes point cloud sample data and image sample data. More specifically, the first size coefficient extraction network extracts the first size coefficient from the first fusion feature and further obtains the second fusion feature from the first size coefficient and the first fusion feature; the network thus predicts the probability that the second fusion feature belongs to an object in the target size range. Accordingly, the optimization objective of the first loss function specifically includes: if the second fusion feature obtained from the sample data indicates an object belonging to the target size range, increase the second fusion feature; if it indicates an object outside the target size range, decrease it.

In one example, with the second fusion feature represented as a feature map: the optimization objective of the first loss function specifically includes increasing the feature value at a position of the feature map containing the second fusion feature if that position, obtained from the sample data, indicates an object belonging to the target size range, and decreasing it otherwise, so that the object recognition model distinguishes small objects more clearly.
The first loss function may describe the difference between the confidence value, predicted by the object recognition model from the sample data, that an object belongs to the target size range, and the ground-truth confidence value for the target size range corresponding to the sample data. It can be illustrated with a formula as an example:

$$L_{seg1} = \sum_{k_1} \ell\big(\hat{y}_{k_1},\, f_{seg}(x_{k_1})\big)$$

where $\hat{y}_{k_1}$ denotes the annotated ground truth, i.e., the true confidence that the object to be recognized belongs to an object in the target size range; $f_{seg}(x_{k_1})$ denotes the confidence, predicted by the object recognition model from the sample data, that the object belongs to the target size range; and $\ell$ is a per-sample error term (for example, a cross-entropy term).
The optimization objective of the second loss function includes: if the image sample data indicates an object belonging to the target size range, increase the second size coefficient. More specifically, the second size coefficient extraction network extracts the second size coefficient from the first image feature and further obtains the second image feature from the second size coefficient and the first image feature; the network thus predicts the probability that the second image feature belongs to an object in the target size range. Accordingly, the optimization objective of the second loss function specifically includes: if the second image feature obtained from the image sample data indicates an object belonging to the target size range, increase the second image feature; if it indicates an object outside the target size range, decrease it.

In one example, with the second image feature represented as a feature map: the optimization objective of the second loss function specifically includes increasing the feature value at a position of the feature map containing the second image feature if that position, obtained from the image sample data, indicates an object belonging to the target size range, and decreasing it otherwise, so that the object recognition model distinguishes small objects more clearly.
The second loss function describes the difference between the confidence value, predicted by the object recognition model from the image sample data, that an object belongs to the target size range, and the corresponding ground-truth confidence value. It can be illustrated with a formula as an example:

$$L_{seg2} = \sum_{k_2} \ell\big(\hat{y}_{k_2},\, f_{seg}(x_{k_2})\big)$$

where $\hat{y}_{k_2}$ denotes the annotated ground truth, i.e., the true confidence that the object to be recognized belongs to an object in the target size range, and $f_{seg}(x_{k_2})$ denotes the confidence, predicted by the object recognition model from the image sample data, that the object belongs to the target size range.
The optimization objective of the third loss function includes: reduce the difference between the state information predicted by the object recognition model from the sample data and the ground-truth state information of the corresponding object. The object state information includes at least one of the following: the size information, position information, and orientation information of the object.
The third loss function describes the differences between the size, position, and/or orientation predicted by the object recognition model from the sample data and the true size, true position, and/or true orientation of the corresponding object; the sample data includes point cloud sample data and image sample data. It can be illustrated with a formula as an example:

$$L_{loc} = \sum_{i} \ell\big(\hat{y}_{i},\, f_{loc}(x_{i})\big)$$

where $\hat{y}_{i}$ denotes the annotated ground truth, i.e., the true size, true position, and/or true orientation of the object annotated in the sample data; $f_{loc}(x_{i})$ denotes the size, position, and/or orientation predicted by the object recognition model from the sample data; and $\ell$ may be a regression error term such as a squared or smooth-L1 term.
The optimization objective of the fourth loss function includes: reduce the difference between the object confidence predicted by the object recognition model from the sample data and the ground-truth confidence of the corresponding object. The confidence of an object represents the probability that the predicted object is an obstacle.
The fourth loss function describes the difference between the confidence predicted by the object recognition model from the sample data and the ground-truth confidence of the corresponding object. It can be illustrated with a formula as an example:

$$L_{pred} = \sum_{i} \ell\big(\hat{y}_{i},\, f_{pred}(x_{i})\big)$$

where $\hat{y}_{i}$ denotes the annotated ground truth, i.e., the true confidence of the object annotated in the sample data, and $f_{pred}(x_{i})$ denotes the confidence predicted by the object recognition model from the sample data.
In summary, the loss function used in training the object recognition model of this embodiment may include, for example, the sum of the four terms:

$$L = L_{seg1} + L_{seg2} + L_{loc} + L_{pred}$$

The specific formulas of the above loss functions are only schematic; in practical applications the concrete mathematical description of each function can be configured flexibly as needed, and whether to add a regularization term can also be decided as needed, which is not limited in this embodiment.
During training, an optimization method is used to solve the evaluation function for the model with the highest evaluation. For example, an optimization method such as gradient descent can find the minimum (optimal solution) of the loss function's output error and tune the model parameters, i.e., solve for the optimal coefficients of each network layer in the model. In some examples, the solving process computes the model's output and the loss function's error value to obtain the gradients with which the model parameters are adjusted; as an example, a backpropagation function may be called to compute the gradients and propagate the result of the loss function back into the object recognition model so that it updates its parameters.

In some examples, the above loss functions may be solved with an independent solver. In other examples, taking a neural network model as the object recognition model, network branches may be set up on top of the backbone to compute the network's loss functions. As an example, the loss can be divided into the four functions above: the first loss function for optimizing the first size coefficient extraction network, the second loss function for optimizing the second size coefficient extraction network, the third loss function describing state differences, and the fourth loss function describing confidence differences; together these loss functions guide the update of the neural network's parameters so that it achieves better prediction performance.
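A minimal sketch of one such training step is shown below, assuming the model returns the two per-position size-range probability maps plus state and confidence predictions, and choosing cross-entropy for the classification-style terms and smooth-L1 for the state regression; these choices, the model interface, and the batch field names are illustrative, since the disclosure leaves the exact forms open.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, batch):
    """One supervised update combining the four loss terms."""
    coeff1, coeff2, state_pred, conf_pred = model(batch["concat_data"],
                                                  batch["image"])

    # First/second losses: per-position target-size-range classification.
    l_seg1 = F.binary_cross_entropy(coeff1, batch["size_range_truth"])
    l_seg2 = F.binary_cross_entropy(coeff2, batch["size_range_truth_img"])

    # Third loss: size/position/orientation regression.
    l_loc = F.smooth_l1_loss(state_pred, batch["state_truth"])

    # Fourth loss: obstacle confidence.
    l_pred = F.binary_cross_entropy(conf_pred, batch["obstacle_truth"])

    loss = l_seg1 + l_seg2 + l_loc + l_pred
    optimizer.zero_grad()
    loss.backward()   # backpropagate the loss into all network layers
    optimizer.step()
    return loss.item()
```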
Through the above training process, the object recognition model is obtained at the end of training; it may further be tested with test samples to verify its recognition accuracy. The final object recognition model may be deployed in an object recognition apparatus; exemplarily, the apparatus may be a movable platform, or the apparatus may be a chip installed in a movable platform.

While the movable platform is moving, point cloud data can be acquired by a lidar mounted on the platform or by a capture device with a depth-information acquisition function, and image data by a capture device mounted on the platform. The object recognition apparatus in the platform then obtains concatenated data from the point cloud data and the image data, inputs the concatenated data and the image data into the object recognition model, and obtains the recognition result output by the model, which includes the confidence and state information of the object. Further, to facilitate subsequent processing based on the recognition result, the recognition result consists of data whose confidence is greater than a preset threshold; data whose confidence does not exceed the threshold indicates something that is not an obstacle and needs no further processing, and the preset threshold can be configured for the actual application scenario. As an example, for an input object to be recognized, a series of candidate boxes may be identified, each possibly corresponding to an object; based on the confidence of the object to be recognized, the probability that each candidate box corresponds to an obstacle can be determined, the confidences of the candidate boxes are sorted and then filtered against the set threshold, and boxes above the threshold are taken as recognized objects, giving the final recognition result.
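A minimal sketch of this sort-and-threshold post-processing step, with illustrative names:

```python
import numpy as np

def filter_detections(boxes, confidences, threshold=0.5):
    """Keep candidate boxes whose obstacle confidence exceeds the threshold,
    returned in descending order of confidence.

    boxes:       (N, D) candidate box parameters (size/position/orientation)
    confidences: (N,) probability that each candidate is an obstacle
    """
    order = np.argsort(-confidences)           # sort high to low
    keep = confidences[order] > threshold      # filter by preset threshold
    return boxes[order][keep], confidences[order][keep]
```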
After the recognition result is obtained, in a first possible implementation the movable platform may use it for obstacle avoidance decisions or route planning.

In a second possible implementation, the recognition result may be displayed on an interface of the movable platform or of a terminal communicatively connected to it, so that the user can follow the platform's movement and the road conditions around it. Further, the point cloud data and image data may be displayed on that interface with the recognition result overlaid on them, making it convenient for the user to view the result in its concrete scene and better understand the actual driving situation.

In a third possible implementation, the recognition result may be transmitted to other components of the movable platform so that those components control the platform to operate safely and reliably based on it.
Correspondingly, referring to FIG. 4, an embodiment of the present application further provides an object recognition apparatus 30, which may be a movable platform or a chip installed in a movable platform. The object recognition apparatus 30 includes a processor 31 and a memory 32 storing a computer program.

When executing the computer program, the processor 31 implements the following steps:

acquiring point cloud data and image data of an object to be recognized, obtaining a first fusion feature from the point cloud data and the image data, and obtaining a first image feature from the image data;

obtaining a first size coefficient from the first fusion feature and a second size coefficient from the first image feature, the first size coefficient and the second size coefficient representing the probability that the object to be recognized belongs to an object in a target size range;

obtaining a second fusion feature from the first fusion feature and the first size coefficient, and obtaining a second image feature from the first image feature and the second size coefficient;

recognizing the object to be recognized based on the second fusion feature and the second image feature.
In an embodiment, the point cloud data and image data of the object to be recognized are obtained by spatially sampling with sensors of the movable platform.

In an embodiment, the processor 31 is further configured to obtain the first fusion feature from the point cloud density of each voxel in the point cloud data and the pixel values in the image data, the voxels being obtained by dividing the point cloud data into a grid.

In an embodiment, the first fusion feature and the first image feature are represented in the form of feature maps.

The first size coefficient represents the probability that each position in the feature map containing the first fusion feature belongs to an object in the target size range, and the second size coefficient represents the probability that each position in the feature map containing the first image feature belongs to an object in the target size range.
在一实施例中,所述处理器31还用于:基于点云到图像的投影关系,确定所述点云数据在二维空间中的第一投影位置;获取所述图像数据中所述第一投影位置处的像素值;根据所述像素值以及所述点云数据生成包括有所述像素值的点云数据;根据所述包括有所述像素值的点云数据以及所述图像数据获取所述第一融合特征。In one embodiment, the processor 31 is further configured to: determine the first projection position of the point cloud data in the two-dimensional space based on the projection relationship between the point cloud and the image; obtain the first projection position in the image data. a pixel value at a projected position; generating point cloud data including the pixel value according to the pixel value and the point cloud data; obtaining the point cloud data including the pixel value and the image data the first fusion feature.
在一实施例中,所述点云到图像的投影关系基于点云坐标系到相机坐标系的外参以及相机的内参得到。In an embodiment, the projection relationship from the point cloud to the image is obtained based on the external parameters from the point cloud coordinate system to the camera coordinate system and the internal parameters of the camera.
在一实施例中,所述处理器31还用于:拼接所述点云数据和所述图像数据,得到拼接数据并输入物体识别模型中;通过所述物体识别模型中的第一特征提取网络对所述拼接信息进行特征提取,获取所述第一融合特征。In one embodiment, the processor 31 is further configured to: splicing the point cloud data and the image data to obtain the spliced data and input it into the object recognition model; through the first feature extraction network in the object recognition model Feature extraction is performed on the splicing information to obtain the first fusion feature.
在一实施例中,所述第一图像特征通过对所述图像数据进行特征提取后获得。In one embodiment, the first image feature is obtained by performing feature extraction on the image data.
在一实施例中,所述处理器31还用于:将所述图像数据输入物体识别模型中,通过所述物体识别模型中的第二特征提取网络对所述图像数据进行特征提取,获得所述第一图像特征。In one embodiment, the processor 31 is further configured to: input the image data into an object recognition model, perform feature extraction on the image data through the second feature extraction network in the object recognition model, and obtain the obtained image data. Describe the first image feature.
In an embodiment, the first size coefficient is obtained from the first fusion feature through a first size-coefficient extraction network in the object recognition model; and the second size coefficient is obtained from the first image feature through a second size-coefficient extraction network in the object recognition model.
In an embodiment, the first size-coefficient extraction network and the second size-coefficient extraction network each include a convolutional layer.
In an embodiment, the first size coefficient is obtained by the first size-coefficient extraction network performing a convolution operation on the first fusion feature; and the second size coefficient is obtained by the second size-coefficient extraction network performing a convolution operation on the first image feature.
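The disclosure requires only that each size-coefficient extraction network include a convolutional layer; the sketch below, assuming PyTorch, uses a single 1x1 convolution followed by a sigmoid so each feature-map position yields a probability-like coefficient. The channel count is illustrative.

```python
import torch
import torch.nn as nn

class SizeCoefficientHead(nn.Module):
    """Maps a feature map to a per-position size coefficient in [0, 1]."""

    def __init__(self, in_channels=256):  # assumed channel width
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feature_map):
        # (B, C, H, W) -> (B, 1, H, W): probability that each position
        # belongs to an object within the target size range.
        return torch.sigmoid(self.conv(feature_map))

# One head per branch, as the embodiment above describes:
first_size_head = SizeCoefficientHead()   # applied to the first fusion feature
second_size_head = SizeCoefficientHead()  # applied to the first image feature
```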
In an embodiment, the loss functions used by the object recognition model during training include at least: a first loss function for optimizing the first size-coefficient extraction network and a second loss function for optimizing the second size-coefficient extraction network.
In an embodiment, the optimization objective of the first loss function includes: increasing the first size coefficient if the sample data indicates an object within the target size range, the sample data including point cloud sample data and image sample data; and the optimization objective of the second loss function includes: increasing the second size coefficient if the image sample data indicates an object within the target size range.
In an embodiment, the first loss function describes the difference between the confidence predicted by the object recognition model from the sample data for objects within the target size range and the ground-truth confidence, corresponding to the sample data, for objects within the target size range; the sample data includes point cloud sample data and image sample data.
The second loss function describes the difference between the confidence predicted by the object recognition model from the image sample data for objects within the target size range and the ground-truth confidence, corresponding to the image sample data, for objects within the target size range.
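One plausible concrete form for these two loss functions is a binary cross-entropy between the predicted per-position coefficient and a ground-truth mask marking positions labeled as objects within the target size range; the disclosure leaves the exact form open, so the sketch below is an assumption:

```python
import torch.nn.functional as F

def size_coefficient_loss(pred_coeff, target_mask):
    """Difference between predicted and ground-truth small-object confidence.

    pred_coeff:  (B, 1, H, W) output of a size-coefficient extraction network
    target_mask: (B, 1, H, W) with 1.0 where the sample labels indicate an
                 object in the target size range, 0.0 elsewhere (assumed encoding)
    """
    # Minimizing this drives the coefficient up at labeled target-size
    # positions, matching the stated optimization objective.
    return F.binary_cross_entropy(pred_coeff, target_mask)
```

During training, the first loss would be computed on the fusion branch and the second on the image branch, with both joining the total training objective.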
In an embodiment, the second fusion feature is the sum of the first fusion feature and the first size coefficient; and/or the second image feature is the sum of the first image feature and the second size coefficient.
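In tensor terms this summation is a broadcast addition, since the single-channel coefficient spreads across all feature channels; a sketch under the (B, C, H, W) layout assumed in the earlier snippets, with illustrative shapes:

```python
import torch

first_fusion_feature = torch.randn(1, 256, 64, 64)  # (B, C, H, W), assumed shapes
first_image_feature = torch.randn(1, 256, 64, 64)
first_size_coefficient = torch.rand(1, 1, 64, 64)   # per-position probability maps
second_size_coefficient = torch.rand(1, 1, 64, 64)

# (B, C, H, W) + (B, 1, H, W): the single-channel coefficient broadcasts across
# all channels, boosting positions likely to hold target-size objects.
second_fusion_feature = first_fusion_feature + first_size_coefficient
second_image_feature = first_image_feature + second_size_coefficient
```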
In an embodiment, the processor 31 is further configured to: determine a second projection position of the second fusion feature in two-dimensional space based on the point-cloud-to-image projection relationship; obtain, from the second image feature, the image feature at the second projection position; and obtain a third fusion feature according to that image feature and the corresponding second fusion feature, and recognize the object to be recognized according to the third fusion feature.
In an embodiment, the third fusion feature is the mean of the image feature and the corresponding second fusion feature; or the third fusion feature is the larger of the image feature and the corresponding second fusion feature.
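Both combination rules are elementwise once the two features have been aligned by the projection step; a sketch, assuming PyTorch tensors of matching shape:

```python
import torch

def third_fusion_feature(image_feat, second_fusion_feat, mode="mean"):
    """Combine the projected image feature with its second fusion feature."""
    if mode == "mean":
        return (image_feat + second_fusion_feat) / 2      # elementwise mean
    return torch.maximum(image_feat, second_fusion_feat)  # elementwise larger value
```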
In an embodiment, the processor 31 is further configured to: recognize the object to be recognized according to the second fusion feature and the second image feature, and generate a recognition result; the recognition result includes at least the confidence and/or state information of the object to be recognized, where the confidence characterizes the probability that the object to be recognized is an obstacle.
In an embodiment, the state information includes at least one of the following: size information, position information, and orientation information.
In an embodiment, the confidence and state information of the object to be recognized are obtained by processing the second fusion feature and the second image feature with an object prediction network in the object recognition model.
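One way such an object prediction network could be shaped, assuming PyTorch and a convolutional head whose output channels split into one obstacle-confidence map plus regressors for size, position, and orientation; the split and channel counts are assumptions:

```python
import torch
import torch.nn as nn

class ObjectPredictionHead(nn.Module):
    """Predicts obstacle confidence and state from fused features."""

    def __init__(self, in_channels=256):
        super().__init__()
        # 1 confidence + 3 size + 2 position + 1 orientation channels (assumed split)
        self.conv = nn.Conv2d(in_channels, 7, kernel_size=1)

    def forward(self, fused):
        out = self.conv(fused)
        confidence = torch.sigmoid(out[:, :1])  # probability the object is an obstacle
        state = out[:, 1:]                      # size, position, orientation regressors
        return confidence, state
```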
In an embodiment, the loss functions used by the object recognition model during training include at least a third loss function describing state differences and a fourth loss function describing confidence differences.
The state differences include the differences between the predicted size, predicted position, and/or predicted orientation of an object, obtained by the object recognition model from the sample data, and the real size, real position, and/or real orientation of the object corresponding to the sample data; the sample data includes point cloud sample data and image sample data.
The confidence differences include the difference between the confidence predicted by the object recognition model for an object from the sample data and the ground-truth confidence of the object corresponding to the sample data.
In an embodiment, the recognition result is data whose confidence is greater than a preset threshold.
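Filtering the output by the preset threshold then reduces to a boolean mask over the prediction maps; a short sketch with an assumed threshold value of 0.5:

```python
import torch

confidence = torch.rand(1, 1, 64, 64)  # obstacle-probability map from the prediction head
state = torch.randn(1, 6, 64, 64)      # size/position/orientation regressors

threshold = 0.5                               # assumed value; the disclosure leaves it open
mask = (confidence > threshold).squeeze(1)    # (B, H, W) positions kept as detections
detections = state.permute(0, 2, 3, 1)[mask]  # (num_kept, 6) state vectors
```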
In an embodiment, the processor 31 is further configured to: recognize the object to be recognized according to the second fusion feature and the second image feature to generate a recognition result, the recognition result being used for obstacle avoidance decisions or movement route planning for the movable platform.
In an embodiment, the processor 31 is further configured to: recognize the object to be recognized according to the second fusion feature and the second image feature to generate a recognition result, the recognition result being displayed on an interface of the movable platform or on an interface of a terminal device communicatively connected to the movable platform.
In an embodiment, the point cloud data is acquired using a lidar mounted on the movable platform or a camera with a depth-information acquisition function; and/or the image data is acquired using a camera mounted on the movable platform.
In an embodiment, the movable platform includes: an unmanned aerial vehicle, an automobile, an unmanned ship, or a robot.
As the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The various embodiments described herein may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. For a software implementation, embodiments such as procedures or functions may be implemented with separate software modules that each allow at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, stored in a memory, and executed by a controller.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Correspondingly, referring to FIG. 5, an embodiment of the present application further provides a movable platform 100, including: a body 101; a power system 102, mounted in the body 101, for providing power for the movable platform 100; and the above-described object recognition apparatus 30.
Optionally, the movable platform 100 is a vehicle, an unmanned aerial vehicle, an unmanned ship, or a mobile robot.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory including instructions executable by a processor of an apparatus to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium which, when the instructions in the storage medium are executed by a processor of a terminal, enables the terminal to perform the above method.
The methods and apparatuses provided by the embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application. The description of the above embodiments is intended only to help understand the methods of the present application and their core ideas. Meanwhile, for those of ordinary skill in the art, there may be changes in the specific implementation and scope of application according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (56)

  1. An object recognition method, comprising:
    obtaining point cloud data and image data of an object to be recognized, and obtaining a first fusion feature according to the point cloud data and the image data; and obtaining a first image feature according to the image data;
    obtaining a first size coefficient according to the first fusion feature; and obtaining a second size coefficient according to the first image feature, the first size coefficient and the second size coefficient characterizing the probability that the object to be recognized belongs to objects within a target size range;
    obtaining a second fusion feature according to the first fusion feature and the first size coefficient; and obtaining a second image feature according to the first image feature and the second size coefficient; and
    recognizing the object to be recognized according to the second fusion feature and the second image feature.
  2. The method according to claim 1, wherein the point cloud data and the image data of the object to be recognized are obtained by spatially sampling with a sensor of a movable platform.
  3. The method according to claim 1, wherein the obtaining a first fusion feature according to the point cloud data and the image data comprises:
    obtaining the first fusion feature according to the point cloud density of each voxel in the point cloud data and the pixel values in the image data, wherein the voxels are obtained by dividing the point cloud data into a grid.
  4. The method according to claim 1, wherein the first fusion feature and the first image feature are represented in the form of feature maps;
    the first size coefficient characterizes the probability that each position in the feature map of the first fusion feature belongs to an object within the target size range; and
    the second size coefficient characterizes the probability that each position in the feature map of the first image feature belongs to an object within the target size range.
  5. The method according to claim 1, wherein the obtaining a first fusion feature according to the point cloud data and the image data comprises:
    determining a first projection position of the point cloud data in two-dimensional space based on a point-cloud-to-image projection relationship;
    obtaining pixel values at the first projection position in the image data;
    generating, according to the pixel values and the point cloud data, point cloud data including the pixel values; and
    obtaining the first fusion feature according to the point cloud data including the pixel values and the image data.
  6. The method according to claim 5, wherein the point-cloud-to-image projection relationship is obtained based on extrinsic parameters from the point cloud coordinate system to the camera coordinate system and intrinsic parameters of the camera.
  7. The method according to claim 1, wherein the obtaining a first fusion feature according to the point cloud data and the image data comprises:
    concatenating the point cloud data and the image data to obtain concatenated data and inputting it into an object recognition model; and
    performing feature extraction on the concatenated data through a first feature extraction network in the object recognition model to obtain the first fusion feature.
  8. The method according to claim 1, wherein the first image feature is obtained by performing feature extraction on the image data.
  9. The method according to claim 1 or 8, wherein the obtaining a first image feature according to the image data comprises:
    inputting the image data into an object recognition model, and performing feature extraction on the image data through a second feature extraction network in the object recognition model to obtain the first image feature.
  10. The method according to claim 1, wherein the first size coefficient is obtained from the first fusion feature through a first size-coefficient extraction network in an object recognition model; and
    the second size coefficient is obtained from the first image feature through a second size-coefficient extraction network in the object recognition model.
  11. The method according to claim 10, wherein the first size-coefficient extraction network and the second size-coefficient extraction network each comprise a convolutional layer.
  12. The method according to claim 10 or 11, wherein the first size coefficient is obtained by the first size-coefficient extraction network performing a convolution operation on the first fusion feature; and
    the second size coefficient is obtained by the second size-coefficient extraction network performing a convolution operation on the first image feature.
  13. The method according to claim 10, wherein loss functions used by the object recognition model during training comprise at least: a first loss function for optimizing the first size-coefficient extraction network and a second loss function for optimizing the second size-coefficient extraction network.
  14. The method according to claim 13, wherein an optimization objective of the first loss function comprises: increasing the first size coefficient if sample data indicates an object within the target size range, the sample data comprising point cloud sample data and image sample data; and
    an optimization objective of the second loss function comprises: increasing the second size coefficient if the image sample data indicates an object within the target size range.
  15. The method according to claim 13 or 14, wherein
    the first loss function describes a difference between a confidence predicted by the object recognition model from sample data for objects within the target size range and a ground-truth confidence, corresponding to the sample data, for objects within the target size range, the sample data comprising point cloud sample data and image sample data; and
    the second loss function describes a difference between a confidence predicted by the object recognition model from the image sample data for objects within the target size range and a ground-truth confidence, corresponding to the image sample data, for objects within the target size range.
  16. The method according to claim 1, wherein the second fusion feature is the sum of the first fusion feature and the first size coefficient;
    and/or the second image feature is the sum of the first image feature and the second size coefficient.
  17. The method according to claim 1, wherein the recognizing the object to be recognized according to the second fusion feature and the second image feature comprises:
    determining a second projection position of the second fusion feature in two-dimensional space based on a point-cloud-to-image projection relationship;
    obtaining, from the second image feature, the image feature at the second projection position; and
    obtaining a third fusion feature according to the image feature and the corresponding second fusion feature, and recognizing the object to be recognized according to the third fusion feature.
  18. The method according to claim 17, wherein the third fusion feature is the mean of the image feature and the corresponding second fusion feature;
    or the third fusion feature is the larger of the image feature and the corresponding second fusion feature.
  19. The method according to claim 1, further comprising:
    recognizing the object to be recognized according to the second fusion feature and the second image feature, and generating a recognition result; the recognition result comprising at least the confidence and/or state information of the object to be recognized, the confidence characterizing the probability that the object to be recognized is an obstacle.
  20. The method according to claim 19, wherein the state information comprises at least one of the following: size information, position information, and orientation information.
  21. The method according to claim 19, wherein the confidence and state information of the object to be recognized are obtained by processing the second fusion feature and the second image feature with an object prediction network in an object recognition model.
  22. The method according to claim 21, wherein loss functions used by the object recognition model during training comprise at least a third loss function describing state differences and a fourth loss function describing confidence differences;
    the state differences comprising differences between the predicted size, predicted position, and/or predicted orientation of an object obtained by the object recognition model from sample data and the real size, real position, and/or real orientation of the object corresponding to the sample data, the sample data comprising point cloud sample data and image sample data; and
    the confidence differences comprising a difference between a confidence predicted by the object recognition model for an object from the sample data and a ground-truth confidence of the object corresponding to the sample data.
  23. The method according to claim 19, wherein the recognition result is data whose confidence is greater than a preset threshold.
  24. The method according to claim 1, further comprising:
    recognizing the object to be recognized according to the second fusion feature and the second image feature to generate a recognition result, the recognition result being used for obstacle avoidance decisions or movement route planning for a movable platform.
  25. The method according to claim 1, further comprising:
    recognizing the object to be recognized according to the second fusion feature and the second image feature to generate a recognition result; and
    displaying the recognition result on an interface of a movable platform or on an interface of a terminal device communicatively connected to the movable platform.
  26. The method according to claim 1, wherein the point cloud data is acquired using a lidar mounted on a movable platform or a camera with a depth-information acquisition function; and/or the image data is acquired using a camera mounted on a movable platform.
  27. The method according to claim 26, wherein the movable platform comprises: an unmanned aerial vehicle, an automobile, an unmanned ship, or a robot.
  28. An object recognition apparatus, comprising a processor and a memory storing a computer program;
    wherein the processor, when executing the computer program, implements the following steps:
    obtaining point cloud data and image data of an object to be recognized, and obtaining a first fusion feature according to the point cloud data and the image data; and obtaining a first image feature according to the image data;
    obtaining a first size coefficient according to the first fusion feature; and obtaining a second size coefficient according to the first image feature, the first size coefficient and the second size coefficient characterizing the probability that the object to be recognized belongs to objects within a target size range;
    obtaining a second fusion feature according to the first fusion feature and the first size coefficient; and obtaining a second image feature according to the first image feature and the second size coefficient; and
    recognizing the object to be recognized according to the second fusion feature and the second image feature.
  29. The apparatus according to claim 28, wherein the point cloud data and the image data of the object to be recognized are obtained by spatially sampling with a sensor of a movable platform.
  30. The apparatus according to claim 28, wherein the processor is further configured to: obtain the first fusion feature according to the point cloud density of each voxel in the point cloud data and the pixel values in the image data, wherein the voxels are obtained by dividing the point cloud data into a grid.
  31. The apparatus according to claim 28, wherein the first fusion feature and the first image feature are represented in the form of feature maps;
    the first size coefficient characterizes the probability that each position in the feature map of the first fusion feature belongs to an object within the target size range; and
    the second size coefficient characterizes the probability that each position in the feature map of the first image feature belongs to an object within the target size range.
  32. The apparatus according to claim 28, wherein the processor is further configured to:
    determine a first projection position of the point cloud data in two-dimensional space based on a point-cloud-to-image projection relationship;
    obtain pixel values at the first projection position in the image data;
    generate, according to the pixel values and the point cloud data, point cloud data including the pixel values; and
    obtain the first fusion feature according to the point cloud data including the pixel values and the image data.
  33. The apparatus according to claim 32, wherein the point-cloud-to-image projection relationship is obtained based on extrinsic parameters from the point cloud coordinate system to the camera coordinate system and intrinsic parameters of the camera.
  34. The apparatus according to claim 28, wherein the processor is further configured to:
    concatenate the point cloud data and the image data to obtain concatenated data and input it into an object recognition model; and
    perform feature extraction on the concatenated data through a first feature extraction network in the object recognition model to obtain the first fusion feature.
  35. The apparatus according to claim 28, wherein the first image feature is obtained by performing feature extraction on the image data.
  36. The apparatus according to claim 28 or 35, wherein the processor is further configured to: input the image data into an object recognition model, and perform feature extraction on the image data through a second feature extraction network in the object recognition model to obtain the first image feature.
  37. The apparatus according to claim 28, wherein the first size coefficient is obtained from the first fusion feature through a first size-coefficient extraction network in an object recognition model; and
    the second size coefficient is obtained from the first image feature through a second size-coefficient extraction network in the object recognition model.
  38. The apparatus according to claim 37, wherein the first size-coefficient extraction network and the second size-coefficient extraction network each comprise a convolutional layer.
  39. The apparatus according to claim 37 or 38, wherein the first size coefficient is obtained by the first size-coefficient extraction network performing a convolution operation on the first fusion feature; and
    the second size coefficient is obtained by the second size-coefficient extraction network performing a convolution operation on the first image feature.
  40. The apparatus according to claim 37, wherein loss functions used by the object recognition model during training comprise at least: a first loss function for optimizing the first size-coefficient extraction network and a second loss function for optimizing the second size-coefficient extraction network.
  41. The apparatus according to claim 40, wherein an optimization objective of the first loss function comprises: increasing the first size coefficient if sample data indicates an object within the target size range, the sample data comprising point cloud sample data and image sample data; and
    an optimization objective of the second loss function comprises: increasing the second size coefficient if the image sample data indicates an object within the target size range.
  42. The apparatus according to claim 40 or 41, wherein the first loss function describes a difference between a confidence predicted by the object recognition model from sample data for objects within the target size range and a ground-truth confidence, corresponding to the sample data, for objects within the target size range, the sample data comprising point cloud sample data and image sample data; and
    the second loss function describes a difference between a confidence predicted by the object recognition model from the image sample data for objects within the target size range and a ground-truth confidence, corresponding to the image sample data, for objects within the target size range.
  43. The apparatus according to claim 28, wherein the second fusion feature is the sum of the first fusion feature and the first size coefficient;
    and/or the second image feature is the sum of the first image feature and the second size coefficient.
  44. The apparatus according to claim 28, wherein the processor is further configured to:
    determine a second projection position of the second fusion feature in two-dimensional space based on a point-cloud-to-image projection relationship;
    obtain, from the second image feature, the image feature at the second projection position; and
    obtain a third fusion feature according to the image feature and the corresponding second fusion feature, and recognize the object to be recognized according to the third fusion feature.
  45. The apparatus according to claim 44, wherein the third fusion feature is the mean of the image feature and the corresponding second fusion feature;
    or the third fusion feature is the larger of the image feature and the corresponding second fusion feature.
  46. The apparatus according to claim 28, wherein the processor is further configured to: recognize the object to be recognized according to the second fusion feature and the second image feature and generate a recognition result; the recognition result comprising at least the confidence and/or state information of the object to be recognized, the confidence characterizing the probability that the object to be recognized is an obstacle.
  47. The apparatus according to claim 46, wherein the state information comprises at least one of the following: size information, position information, and orientation information.
  48. The apparatus according to claim 46, wherein the confidence and state information of the object to be recognized are obtained by processing the second fusion feature and the second image feature with an object prediction network in an object recognition model.
  49. The apparatus according to claim 48, wherein loss functions used by the object recognition model during training comprise at least a third loss function describing state differences and a fourth loss function describing confidence differences;
    the state differences comprising differences between the predicted size, predicted position, and/or predicted orientation of an object obtained by the object recognition model from sample data and the real size, real position, and/or real orientation of the object corresponding to the sample data, the sample data comprising point cloud sample data and image sample data; and
    the confidence differences comprising a difference between a confidence predicted by the object recognition model for an object from the sample data and a ground-truth confidence of the object corresponding to the sample data.
  50. The apparatus according to claim 46, wherein the recognition result is data whose confidence is greater than a preset threshold.
  51. The apparatus according to claim 28, wherein the processor is further configured to: recognize the object to be recognized according to the second fusion feature and the second image feature to generate a recognition result, the recognition result being used for obstacle avoidance decisions or movement route planning for a movable platform.
  52. The apparatus according to claim 28, wherein the processor is further configured to: recognize the object to be recognized according to the second fusion feature and the second image feature to generate a recognition result, the recognition result being displayed on an interface of a movable platform or on an interface of a terminal device communicatively connected to the movable platform.
  53. The apparatus according to claim 28, wherein the point cloud data is acquired using a lidar mounted on a movable platform or a camera with a depth-information acquisition function; and/or the image data is acquired using a camera mounted on a movable platform.
  54. The apparatus according to claim 53, wherein the movable platform comprises: an unmanned aerial vehicle, an automobile, an unmanned ship, or a robot.
  55. A movable platform, comprising:
    a body;
    a power system, mounted in the body, for providing power for the movable platform; and
    the object recognition apparatus according to any one of claims 28 to 54.
  56. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 27.
PCT/CN2020/137298 2020-12-17 2020-12-17 Object recognition method, apparatus, movable platform, and storage medium WO2022126522A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080071443.3A CN114556445A (en) 2020-12-17 2020-12-17 Object recognition method, device, movable platform and storage medium
PCT/CN2020/137298 WO2022126522A1 (en) 2020-12-17 2020-12-17 Object recognition method, apparatus, movable platform, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/137298 WO2022126522A1 (en) 2020-12-17 2020-12-17 Object recognition method, apparatus, movable platform, and storage medium

Publications (1)

Publication Number Publication Date
WO2022126522A1 (en)

Family

ID=81668390

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/137298 WO2022126522A1 (en) 2020-12-17 2020-12-17 Object recognition method, apparatus, movable platform, and storage medium

Country Status (2)

Country Link
CN (1) CN114556445A (en)
WO (1) WO2022126522A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882020B (en) * 2022-07-06 2022-11-11 深圳市信润富联数字科技有限公司 Product defect detection method, device, equipment and computer readable medium
CN114998567B (en) * 2022-07-18 2022-11-01 中国科学院长春光学精密机械与物理研究所 Infrared point group target identification method based on multi-modal feature discrimination


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308977A1 (en) * 2016-04-20 2017-10-26 Sony Corporation Apparatus and method for 3d printing
CN111742344A (en) * 2019-06-28 2020-10-02 深圳市大疆创新科技有限公司 Image semantic segmentation method, movable platform and storage medium
CN111199230A (en) * 2020-01-03 2020-05-26 腾讯科技(深圳)有限公司 Target detection method and device, electronic equipment and computer readable storage medium
CN111339973A (en) * 2020-03-03 2020-06-26 北京华捷艾米科技有限公司 Object identification method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740873A (en) * 2023-08-08 2023-09-12 深圳市劳恩科技有限公司 Measurement detection system and method based on optical sensing technology
CN116740873B (en) * 2023-08-08 2023-10-03 深圳市劳恩科技有限公司 Measurement detection system and method based on optical sensing technology

Also Published As

Publication number Publication date
CN114556445A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
WO2022083402A1 (en) Obstacle detection method and apparatus, computer device, and storage medium
WO2022126522A1 (en) Object recognition method, apparatus, movable platform, and storage medium
US10885352B2 (en) Method, apparatus, and device for determining lane line on road
Chen et al. Lidar-histogram for fast road and obstacle detection
US20230099113A1 (en) Training method and apparatus for a target detection model, target detection method and apparatus, and medium
CN111222395A (en) Target detection method and device and electronic equipment
CN111602138B (en) Object detection system and method based on artificial neural network
EP3621041A1 (en) Three-dimensional representation generating system
CN114495064A (en) Monocular depth estimation-based vehicle surrounding obstacle early warning method
CN115797736A (en) Method, device, equipment and medium for training target detection model and target detection
CN108229473A (en) Vehicle annual inspection label detection method and device
CN112699748B (en) Human-vehicle distance estimation method based on YOLO and RGB image
US20220215576A1 (en) Information processing device, information processing method, and computer program product
WO2022126540A1 (en) Obstacle detection and re-identification method, apparatus, movable platform, and storage medium
CN111813882B (en) Robot map construction method, device and storage medium
CN111696147B (en) Depth estimation method based on improved YOLOv3 model
CN116343143A (en) Target detection method, storage medium, road side equipment and automatic driving system
CN113658274B (en) Automatic individual spacing calculation method for primate population behavior analysis
CN115565072A (en) Road garbage recognition and positioning method and device, electronic equipment and medium
CN114140660A (en) Vehicle detection method, device, equipment and medium
CN114359891A (en) Three-dimensional vehicle detection method, system, device and medium
CN115249407A (en) Indicating lamp state identification method and device, electronic equipment, storage medium and product
WO2022126523A1 (en) Object detection method, device, movable platform, and computer-readable storage medium
CN117523428B (en) Ground target detection method and device based on aircraft platform
Kumar et al. Water-Puddle Segmentation Using Deep Learning in Unstructured Environments

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20965538

Country of ref document: EP

Kind code of ref document: A1