WO2023050589A1 - Intelligent cargo box loading method and system based on rgbd camera - Google Patents

Intelligent cargo box loading method and system based on RGBD camera

Info

Publication number
WO2023050589A1
WO2023050589A1, PCT/CN2021/138155, CN2021138155W
Authority
WO
WIPO (PCT)
Prior art keywords
loading
point cloud
features
feature
network
Prior art date
Application number
PCT/CN2021/138155
Other languages
French (fr)
Chinese (zh)
Inventor
任柯燕
闫桐
张云路
胡兆欣
Original Assignee
北京工业大学
Priority date
Filing date
Publication date
Application filed by 北京工业大学 (Beijing University of Technology)
Publication of WO2023050589A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1679Programme controls characterised by the tasks executed
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • G06T2207/30208Marker matrix
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the invention relates to the field of intelligent loading of cargo boxes, in particular to a method and system for intelligent loading of cargo boxes based on an RGBD camera.
  • target detection plays a pivotal role in intelligent systems in various industries.
  • the field of target detection also has different network structures to complete corresponding tasks.
  • the classic 2D detection network faster-rcnn divides the detection task into two stages: the first stage extracts the region of interest, and the second stage performs precise regression on the proposed anchor box.
  • the anchor box here is initially defaulted to be generated at each anchor point with a classic ratio of (0.5:1:2), which is used to capture objects of different sizes and shapes as much as possible.
  • Faster-rcnn finally generates the size and position of the detected object to form the final detection box.
  • each detection frame also has a classification score. This classification score passes through a softmax network layer, and the object with the highest score is determined as the final classification object of the current detection frame.
  • Pointnet++ has four feature extraction layers and two upsampling layers. Among them, a key step is to group the point cloud according to the point cloud distance. In order to cover objects of various sizes and shapes as much as possible, Pointnet++ uses the Euclidean distance to calculate the point cloud distance, and uses the range of the sphere for grouping.
  • Pointnet++ makes the point cloud feature extraction process simple and effective, so it is adopted by many 3D detection networks, such as Group-free-3D-Net.
  • Group-free-3D-Net takes point cloud data as input and uses Pointnet++ to extract point cloud features.
  • In addition, Group-free-3D-Net uses the Transformer structure to refine the final 3D detection boxes.
  • the features extracted from the 2D detection results include: 1. Semantic features: the classification score of the detection frame in the 2D detection result; 2. Texture features: the RGB initial pixel values of all pixels in the 2D detection frame; 3. Geometric features: Starting from the center point of the 2D detection frame, the direction of the ray projected into the 3D scene through the projection principle.
  • EP-Net proposes the LI-Fusion module, which is specially used to fuse point cloud data features and image data features.
  • The LI-Fusion module first aligns the image feature channels with the point cloud feature channels through a feature channel alignment layer; the aligned image features and the point cloud features are then fed together into a fully connected layer, which compares the image features with the point cloud features and generates a set of attention weights corresponding to the image features; these weights are multiplied with the image features to obtain weighted image features, which are concatenated with the point cloud features to obtain preliminary fusion features; finally, the preliminary fusion features pass through a feature mixing layer that outputs the final fusion features.
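  • As a minimal, non-authoritative sketch of this fusion flow (PyTorch-style; the layer sizes and the use of 1x1 convolutions are illustrative assumptions, not taken from EP-Net's released code):

    ```python
    import torch
    import torch.nn as nn

    class LIFusionSketch(nn.Module):
        """Illustrative LI-Fusion-style block: align, attend, concatenate, mix."""
        def __init__(self, img_ch=128, pc_ch=288):
            super().__init__()
            self.align = nn.Conv1d(img_ch, pc_ch, 1)            # feature channel alignment layer
            self.attn = nn.Sequential(                           # "fully connected" layer producing attention weights
                nn.Conv1d(pc_ch * 2, pc_ch, 1), nn.ReLU(),
                nn.Conv1d(pc_ch, pc_ch, 1), nn.Sigmoid())
            self.mix = nn.Conv1d(pc_ch * 2, pc_ch * 2, 1)        # feature mixing layer

        def forward(self, img_feat, pc_feat):
            # img_feat: (B, img_ch, N), pc_feat: (B, pc_ch, N), one image feature per point
            aligned = self.align(img_feat)                        # align image channels to point cloud channels
            w = self.attn(torch.cat([aligned, pc_feat], dim=1))   # weights from comparing the two features
            weighted = aligned * w                                # weighted image features
            fused = torch.cat([weighted, pc_feat], dim=1)         # preliminary fusion by concatenation
            return self.mix(fused)                                # final fused features
    ```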
  • TAP-Net is one of the networks used to solve the bin packing problem.
  • TAP-Net consists of a loading sequence generation subnetwork and a loading position generation subnetwork. It takes as input the 3D sizes and positions of the boxes to be loaded and the current loading situation of the target area; the loading sequence generation subnetwork generates the optimal loading sequence, and the loading position generation subnetwork then generates the optimal loading positions according to that sequence and the loading situation of the target area. The combination of the optimal loading sequence and the optimal loading positions forms the final loading strategy.
  • However, one of the inputs of TAP-Net is the size and position of each box, and in real scenes box sizes and positions are highly variable. Especially in warehousing logistics, logistics boxes come in many different sizes. This makes it difficult to obtain box sizes and positions directly in practical applications, which greatly limits the deployment of TAP-Net.
  • In view of the above, the present invention provides a 3D target detection network based on image data and point cloud data, whose network structure is improved to take full account of the characteristics of regular objects such as cargo boxes. Further, since currently deployed loading systems are heavily constrained and related academic research is difficult to apply in practice due to limitations such as its required inputs, the present invention builds on this 3D target detection network to provide an RGBD-camera-based intelligent cargo box loading method and system, which can automatically identify the size and position of the cargo boxes currently to be loaded and give a suitable loading sequence and loading positions, thereby realizing automated loading of cargo boxes of different sizes.
  • the specific method is as follows:
  • The RGBD-camera-based intelligent cargo box loading method comprises the following steps:
  • S1. Collect the color and depth information of the cargo boxes in the area to be loaded and in the target area with the RGBD camera, and generate an RGB image and a corresponding depth image; then perform camera calibration to determine the transformation between the image coordinate system and the world coordinate system. The area to be loaded is used to place the cargo boxes that need to be loaded, and the target area is used to place the cargo boxes that have already been loaded;
  • S2. Convert the depth image into a point cloud according to the camera parameters obtained from the calibration, and apply data enhancement to the RGB image and the point cloud respectively;
  • S3. Input the enhanced RGB image and point cloud data obtained in step S2 into the 3D target detection network, detect the position and size of each cargo box in the area to be loaded and in the target area, and generate 3D detection boxes described by their center coordinates and their length, width, and height;
  • S4. Input the position and size information of the cargo boxes obtained in step S3 into the reinforcement-learning-based loading strategy generation network to obtain the final loading strategy;
  • S5. Calculate the deflection displacement and rotation angle of the mechanical arm according to the loading strategy generated in step S4, and control the mechanical arm to load the cargo boxes.
  • the camera calibration method in the step S1 adopts a standard calibration plate calibration method to obtain the internal and external parameters of the camera, so as to determine the position correspondence between the world coordinate system and the image coordinate system.
  • the calibration formula can be expressed as:
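  • Since the calibration formula itself is published as an image, a reconstruction under the standard pinhole-camera assumption is given below (the principal point (u_0, v_0) is a symbol added here for completeness and is not named in the text):

    ```latex
    C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
      = \begin{bmatrix} f_u & 0 & u_0 & 0 \\ 0 & f_v & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
        \begin{bmatrix} R & T \\ \mathbf{0}^{\top} & 1 \end{bmatrix}
        \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
    ```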
  • Here f_u and f_v are the focal lengths of the camera in the horizontal and vertical directions respectively, R and T are the rotation matrix and translation matrix of the camera, and C is the scale factor.
  • The enhancement in step S2 includes: for the RGB image, to prevent the detection network in step S3 from overfitting the image and to enhance its expressiveness, color jitter and random cropping are applied and the image is scaled to a fixed size, in preparation for input to the detection network in step S3; for the point cloud data, to prevent the detection network in step S3 from overfitting the point cloud, the point cloud is downsampled to 20,000 points, in preparation for input to the detection network in step S3.
  • the 3D object detection network in step S3 includes three modules: a feature extraction module, a feature fusion module and a suggestion generation module.
  • the feature extraction module is divided into two branches: image feature branch and point cloud feature branch.
  • the image feature branch is a 2D target detection network based on an improved version of Faster-rcnn
  • the point cloud feature branch is a point cloud feature extraction network based on an improved version of pointnet++
  • the feature fusion module is used to fuse point cloud features and image features
  • The suggestion generation network is based on the Transformer structure and generates the position and size of each cargo box in the 3D scene.
  • the specific process of the 3D target detection network is as follows:
  • the final feature of the image branch is obtained through the 2D target detection network based on the improved version of Faster-rcnn.
  • The enhanced RGB image is input into the 2D target detection network based on the improved version of Faster-rcnn, and the network outputs the positions and sizes of the 2D detection boxes of the cargo boxes in the RGB image and the classification score of each detection box; features are then extracted from these 2D detection boxes, including semantic, texture, and geometric features.
  • The semantic feature is the classification score of the 2D detection box: besides outputting the position and size of each detection box, the 2D detection network also gives the classification of each box.
  • This classification score can serve as a hint for the 3D target detection task, so the present invention uses the semantic feature composed of the classification scores as one of the image branch features. The texture feature is the RGB pixel values of all pixels in the detection box; compared with point cloud data, an image contains RGB color pixel values and therefore more semantic and texture information.
  • The present invention uses texture features composed of the original RGB values as one of the image branch inputs to enhance the feature representation. The geometric feature is the ray along which the center of the 2D detection box projects into the 3D scene, which can be obtained from the size and position of the 2D detection box together with the intrinsic and extrinsic parameters obtained by camera calibration. The geometric feature follows the feature extraction form of Imvotenet.
  • Since one of the final outputs of the 3D detection network is the position of the object, providing the ray direction along which the object's center in the 2D image projects into 3D can guide the 3D detection network in generating 3D detection boxes.
  • These three features are concatenated in the channel dimension as the final feature output of the image branch.
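  • A minimal sketch of assembling these three features for one 2D detection box (NumPy style; the flattened packing of the texture feature and the intrinsic parameter names are illustrative assumptions):

    ```python
    import numpy as np

    def image_branch_features(box_uv, score, patch_rgb, fu, fv, u0, v0):
        """box_uv: (u_min, v_min, u_max, v_max); score: 2D classification score;
        patch_rgb: (H, W, 3) pixels inside the box; fu, fv, u0, v0: camera intrinsics."""
        # Semantic feature: the classification score of the 2D detection box.
        semantic = np.array([score], dtype=np.float32)
        # Texture feature: the original RGB values of all pixels inside the box (flattened here).
        texture = patch_rgb.astype(np.float32).reshape(-1) / 255.0
        # Geometric feature: direction of the ray back-projecting the box center into the 3D scene.
        u_c = 0.5 * (box_uv[0] + box_uv[2])
        v_c = 0.5 * (box_uv[1] + box_uv[3])
        ray = np.array([(u_c - u0) / fu, (v_c - v0) / fv, 1.0], dtype=np.float32)
        geometric = ray / np.linalg.norm(ray)
        # The three features are concatenated along the channel dimension.
        return np.concatenate([semantic, texture, geometric])
    ```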
  • The final feature of the point cloud branch is obtained through the point cloud feature extraction network based on the improved version of Pointnet++: 20,000 sampled point cloud points are taken as input, and the improved Pointnet++ network produces the point cloud features that serve as the final output of the point cloud branch.
  • As a classic backbone network for point cloud feature extraction, Pointnet++ has a simple and effective structure and has been applied in many network models;
  • S32. The feature fusion module fuses the image branch features and the point cloud branch features from step S31; the fusion method is based on the improved version of the LI-fusion module and outputs the fused features.
  • the suggestion generation module based on the Transformer structure takes the fusion features as input, calculates the features of the cargo box from all the fusion features, and finally outputs the spatial coordinate position and the length, width, and height of the center point of the cargo box in the area to be loaded and the target area. Due to the existence of its own attention mechanism, Transformer is suitable for calculating the relationship between the part of the input data and the whole. This characteristic fits the task of calculating the features of the existing object parts from the overall features. Based on this idea, the present invention follows the Group-free-3D-Net network, and uses the Transformer structure to output the final 3D detection result.
  • The improvement to the 2D target detection network based on the improved version of Faster-rcnn in step S31 is as follows: the classic ratio (0.5:1:2) used when generating anchor boxes in Faster-rcnn is abandoned, and the average length, width, and height of common cargo box sizes are used as the anchor generation ratio instead. This helps reduce the error when regressing the anchor boxes and speeds up the regression training process, making the 2D detection results more accurate.
  • The improvement to the point cloud feature extraction network based on the improved version of Pointnet++ in step S31 is as follows: no matter how the size of a cargo box changes, its shape always remains cuboid, so when grouping and clustering, the originally used spherical range is abandoned and a cubic range is used for clustering instead, with the radius of the original sphere equal to half the length of the cube's face diagonal, i.e. the original sphere is the edge-tangent sphere of the cube.
  • In addition, the distance calculation is changed from the Euclidean distance to the Manhattan distance, so that the point cloud distance representation better matches the cuboid characteristics, reducing the network training error and enhancing the final detection effect.
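  • A minimal sketch of the modified grouping step (cubic range plus Manhattan distance), assuming the points are given as an (N, 3) array; in the real network this would replace the ball query inside Pointnet++:

    ```python
    import numpy as np

    def cube_group(points, center, cube_side):
        """Indices of points inside an axis-aligned cube around `center`, nearest (in L1) first.

        Plays the role of Pointnet++'s ball query, but with a cubic neighborhood and the
        Manhattan distance, to better match box-shaped objects.
        """
        half = cube_side / 2.0
        offsets = np.abs(points - center)              # per-axis distance to the center
        in_cube = np.all(offsets <= half, axis=1)      # cubic range test
        idx = np.nonzero(in_cube)[0]
        l1 = offsets[idx].sum(axis=1)                  # Manhattan distance to the center
        return idx[np.argsort(l1)]

    # Example: group around one sampled centroid with a 0.57 m cube.
    pts = np.random.rand(2048, 3)
    group = cube_group(pts, center=pts[0], cube_side=0.57)
    ```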
  • The improvements to the fusion strategy based on the improved version of the LI-fusion module in step S32 are as follows. First, the feature channel alignment layer that the image features pass through before being fused with the point cloud features in the LI-fusion module is abandoned, and the unaligned image features and point cloud features are instead fed directly into the following fully connected layer.
  • The function of the feature channel alignment layer is to align the image feature dimension with the point cloud feature dimension, but in the fusion process the RGB image features are only used to enhance the point cloud feature representation. In 3D target detection, the structural and scene information of the point cloud data still makes it the dominant part of the feature extraction process, so the feature alignment layer contributes little while increasing the network depth.
  • This layer is therefore removed in the improved version. Second, the feature mixing layer that the fused features pass through before output in the LI-fusion module is abandoned, and the concatenated image features and point cloud features are used directly as the final output of the feature fusion module.
  • After removing the feature channel alignment layer, this feature mixing layer also needs to be removed in order to keep the numbers of input and output channels consistent.
  • feeding unmixed image features and point cloud features into the following modules allows the network to return gradient information more clearly during training.
  • the loading policy generating network in step S4 adopts TAP-Net.
  • TAP-Net is a network that specifically solves the problem of box loading, and is in line with the task of box loading.
  • TAP-Net can give the loading sequence and loading position of the current cargo boxes to be loaded according to the cargo boxes in the loading area and the target area, and form the final loading strategy.
  • TAP-Net is trained by reinforcement learning.
  • The reward function of the reinforcement learning can be composed of a remaining-space index and a loading stability index. Specifically: after the current cargo box is loaded, the remaining space below the height of the currently loaded box in the target loading area is calculated, and the size of this remaining space serves as the remaining-space index; after the current cargo box is loaded, it is also judged whether there is support under its center of gravity, and the loading stability index is returned according to the support situation. The sum of these two indices is the output value of the final reward function.
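  • A minimal sketch of such a reward, assuming axis-aligned boxes described by (x, y, z, l, w, h) in the target area; the normalization of the remaining-space term and the equal weighting of the two indices are assumptions, since the text only states that the two indices are summed:

    ```python
    def loading_reward(placed_box, other_boxes, bin_size):
        """Sum of a remaining-space index and a loading stability index."""
        x, y, z, l, w, h = placed_box
        top = z + h
        bin_l, bin_w, _ = bin_size
        # Remaining space below the height of the current box: slab volume minus occupied volume.
        slab = bin_l * bin_w * top
        occupied = sum(bl * bw * min(bh, max(0.0, top - bz))
                       for (_, _, bz, bl, bw, bh) in other_boxes + [placed_box])
        remaining_space_index = -(slab - occupied) / slab   # tighter packing scores higher (assumption)
        # Stability index: is there support (floor or another box top) under the center of gravity?
        cx, cy = x + l / 2.0, y + w / 2.0
        supported = z <= 1e-6 or any(
            abs(bz + bh - z) < 1e-6 and bx <= cx <= bx + bl and by <= cy <= by + bw
            for (bx, by, bz, bl, bw, bh) in other_boxes)
        stability_index = 1.0 if supported else 0.0
        return remaining_space_index + stability_index
    ```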
  • TAP-Net includes a loading sequence generation subnetwork and a loading location generation subnetwork, which respectively generate the loading sequence and loading location of the boxes to be loaded. The set of loading order and loading location is the final required loading strategy. The specific process is as follows:
  • Completing the loading task in step S5 specifically means: according to the loading strategy generated in S4, obtain the pick-up position and the target loading position of the cargo box currently to be loaded, and calculate the deflection displacement and rotation angle required for the mechanical arm to reach the pick-up surface of that box;
  • control the mechanical arm to reach the specified position at the specified angle and pick up the box; after picking is completed, calculate the deflection displacement and rotation angle from the current position to the target loading position, and control the mechanical arm to reach the specified position at the specified angle and put down the cargo box. Repeat this step until all boxes are loaded.
  • the present invention also provides a cargo box intelligent loading system based on an RGBD camera, including:
  • An RGBD camera acquisition device is used to collect 3D scene data of the cargo box to be loaded and the target area through the RGBD camera;
  • the mechanical arm is used to assist the lifting device to reach the designated position, and carry out lifting or loading in the designated direction;
  • the control system stores the pre-trained network model, acquires 3D scene data through the RGBD camera acquisition device, and finally generates the deflection displacement and rotation angle of the robotic arm based on the data.
  • The RGBD camera acquisition device consists of an RGB camera, a depth camera, and a supplementary light device; the RGB camera is used to collect RGB images, the depth camera is used to collect depth images, and the supplementary light device provides a light source to ensure that the light intensity is appropriate;
  • the mechanical arm is composed of a front arm, a rear arm and three rotation axes, which can assist the lifting device to reach the designated position of the loading area and the target area.
  • The present invention uses the 3D detection network and the reinforcement learning decision network to load cargo boxes of different sizes; through the characteristics of reinforcement learning, it jointly considers the sizes and positions of all cargo boxes in the current state and the remaining space of the target area, and outputs a loading strategy with overall consideration. The method has high flexibility, high space utilization, and high loading efficiency, and realizes automated loading of cargo boxes of different sizes.
  • Fig. 1 is the basic flow chart of the RGBD-camera-based intelligent cargo box loading method of the present invention;
  • Fig. 2 is an algorithm flow chart of the 3D target detection network of the present invention
  • Figure 3 is a flow chart of network algorithm generation based on reinforcement learning loading strategy of the present invention.
  • Fig. 4(a) is the training loss plot of the 3D target detection network without the improvements described in step S3 of the present invention;
  • Fig. 4(b) is the training loss plot of the improved 3D target detection network described in step S3 of the present invention;
  • Fig. 5(a) and Fig. 5(b) show a point cloud representation of a 3D scene and a schematic diagram of the corresponding 3D detection result;
  • Fig. 6 is a structural diagram of an intelligent cargo box loading system based on an RGBD camera of the present invention.
  • 1. Control system, 2. Signal transmission line, 3. Operating table, 4. Variable-length rear arm, 5. Forearm, 6. Rotation shaft, 7. RGBD camera, 8. Supplementary light equipment, 9. Holder device, 10. Area to be loaded, 11. Cargo boxes of different sizes, 12. Target area.
  • S3. Input the enhanced RGB image and point cloud data obtained in step S2 into the 3D target detection network, detect the position and size of each cargo box in the area to be loaded and in the target area, and generate 3D detection boxes described by their center coordinates and their length, width, and height;
  • S4. Input the position and size information of the cargo boxes obtained in step S3 into the reinforcement-learning-based loading strategy generation network to obtain the final loading strategy;
  • S5. Calculate the deflection displacement and rotation angle of the mechanical arm according to the loading strategy generated in step S4, and control the mechanical arm to load the cargo boxes.
  • all network models that require pre-training include:
  • The 2D target detection network uses the improved Faster-rcnn network model, with the detection object set to warehouse cargo boxes; the input data are collected RGB images of common cargo boxes, such as standard No. 1, No. 2, and No. 3 cartons,
  • and the output target is the RGB image annotated with the size, position, and classification of each 2D detection box;
  • the 3D target detection network adopts the 3D target detection network proposed by the present invention, sets the detection object as a warehouse cargo box in the 3D scene, and outputs the size and position of the cargo box in the 3D scene.
  • the pre-trained 2D target detection network is used as a part of the 3D target detection network, and then the 3D target detection network is pre-trained as a whole.
  • the resulting RGB image and the corresponding depth map data are used as the input data for the pre-training of the 3D object detection network.
  • The loading strategy generation network uses the TAP-Net network model; the learning task is set to finding the optimal loading positions and loading sequence for the current batch of cargo boxes to be loaded, and after reinforcement learning training the output target is the optimal loading strategy composed of the loading sequence and loading positions.
  • Loading of the cargo boxes in the present specific embodiment then begins.
  • The camera is first calibrated; the calibration uses the calibration board method to obtain the intrinsic and extrinsic parameters of the camera, so that the conversion relationship between the camera coordinate system and the world coordinate system can be determined.
  • the conversion formula from a point (x, y, z) in the world coordinate system to a point (u, v) in the image coordinate system is:
  • Here f_u and f_v are the focal lengths of the camera in the horizontal and vertical directions respectively, R and T are the rotation matrix and translation matrix of the camera, and C is the scale factor.
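  • A minimal sketch of the corresponding back-projection used in step S2 to turn the depth image into a point cloud (pinhole model assumed; the extrinsics convention and variable names are illustrative):

    ```python
    import numpy as np

    def depth_to_point_cloud(depth, fu, fv, u0, v0, R=np.eye(3), T=np.zeros(3)):
        """Back-project a depth image (in meters) into a point cloud in world coordinates."""
        v, u = np.indices(depth.shape)                 # pixel grid (rows = v, cols = u)
        z = depth.reshape(-1)
        valid = z > 0                                  # skip pixels with no depth return
        x = (u.reshape(-1) - u0) / fu * z              # camera-frame X
        y = (v.reshape(-1) - v0) / fv * z              # camera-frame Y
        cam_pts = np.stack([x, y, z], axis=1)[valid]
        # Camera-to-world transform from the calibrated extrinsics (p_cam = R p_world + T assumed).
        return (cam_pts - T) @ R                       # equivalent to R^T (p_cam - T)

    # Example with assumed intrinsics of the depth camera.
    cloud = depth_to_point_cloud(np.random.rand(480, 640), fu=600.0, fv=600.0, u0=320.0, v0=240.0)
    ```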
  • the RGBD camera is aimed at the area to be loaded, and the RGB image of the container to be loaded is obtained by the ordinary camera, and the depth image of the container in the area to be loaded is obtained by the depth camera. Then point the camera at the target loading area to acquire RGB images and depth images of the target loading area. The RGBD image of the area to be loaded and the RGBD image of the target area are then fed into the next stage separately.
  • The enhancement in step S2 includes: for the RGB image, color jitter and random cropping are applied, and the image is then scaled to a fixed size; this embodiment uses the classic Faster-rcnn size of 1000*600, which is convenient for the detection network in step S3 to process. For the point cloud data, the point cloud is randomly flipped about the yz plane with 50% probability and randomly rotated about the z axis within (-30°, 30°) as data augmentation, and is downsampled to 20,000 points by random sampling, in preparation for input to the detection network in step S3.
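  • A minimal sketch of these augmentations (torchvision for the image side; the jitter strengths and the crop size before resizing are illustrative assumptions):

    ```python
    import numpy as np
    import torchvision.transforms as T

    # RGB branch: color jitter, random crop, then resize to the classic Faster-rcnn size (600 x 1000).
    rgb_augment = T.Compose([
        T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        T.RandomCrop((540, 900), pad_if_needed=True),
        T.Resize((600, 1000)),
    ])

    def augment_point_cloud(points, n_points=20000):
        """Random flip about the yz plane, random rotation about z, random downsampling."""
        if np.random.rand() < 0.5:
            points[:, 0] = -points[:, 0]                        # flip across the yz plane
        theta = np.deg2rad(np.random.uniform(-30.0, 30.0))      # rotation about the z axis
        rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                        [np.sin(theta),  np.cos(theta), 0.0],
                        [0.0,            0.0,           1.0]])
        points = points @ rot.T
        idx = np.random.choice(len(points), n_points, replace=len(points) < n_points)
        return points[idx]
    ```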
  • the 3D object detection network consists of three modules: feature extraction module, feature fusion module and proposal generation module.
  • the feature extraction module is divided into two branches: image feature branch and point cloud feature branch.
  • the image feature branch is a 2D target detection network based on an improved version of Faster-rcnn
  • the point cloud feature branch is a point cloud feature extraction network based on an improved version of Pointnet++
  • The feature fusion module is a fusion model based on an improved version of the LI-Fusion module and is used to fuse the point cloud features and the image features; the proposal generation network follows the Transformer structure of the Group-free-3D-Net network and generates the position and size of each cargo box in the 3D scene.
  • the specific process of the 3D target detection network is as follows:
  • the final feature of the image branch is obtained through the 2D target detection network based on the improved version of Faster-rcnn.
  • The enhanced RGB image is input into the 2D target detection network based on the improved version of Faster-rcnn, and the network outputs the positions and sizes of the 2D detection boxes of the cargo boxes in the RGB image and the classification score of each detection box; features are then extracted from these 2D detection boxes.
  • The features include the semantic, texture, and geometric features extracted from the 2D detection boxes, where the semantic feature is the classification score of the 2D detection box.
  • The final feature of the point cloud branch is obtained through the point cloud feature extraction network based on the improved version of Pointnet++: 20,000 sampled point cloud points are taken as input, and the improved Pointnet++ network produces the point cloud features that serve as the final output of the point cloud branch;
  • S32. The image branch features and the point cloud branch features are fused; the fusion method is based on the improved version of the LI-fusion module and outputs the fused features.
  • the suggestion generation module based on the Transformer structure takes the fusion features as input, calculates the features of the cargo box from all the fusion features, and finally outputs the spatial coordinate position and the length, width, and height of the center point of the cargo box in the area to be loaded and the target area.
  • The improvement to the 2D target detection network based on the improved version of Faster-rcnn in step S31 is as follows: the classic ratio (0.5:1:2) used when generating anchor boxes in Faster-rcnn is abandoned, and the average length, width, and height of the different warehouse cargo box sizes are used as the anchor generation ratio instead.
  • In this embodiment, standard No. 1, No. 2, and No. 3 cartons are used, whose average length, width, and height are 497mm*243mm*310mm; converted, the ratio is about (4.7:2.3:3.0). This helps reduce the initial error when regressing the anchor boxes and speeds up the regression training process, making the 2D detection results more accurate.
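  • A minimal sketch of deriving that ratio and generating anchors from it (the base scale and the use of pairwise dimension combinations as 2D width/height ratios are assumptions; the text only states that the averaged dimensions replace (0.5:1:2)):

    ```python
    import numpy as np

    # Average dimensions (mm) of standard No. 1, No. 2 and No. 3 cartons, from the text.
    avg_lwh = np.array([497.0, 243.0, 310.0])

    # Normalize so the three terms sum to 10, giving roughly (4.7 : 2.3 : 3.0).
    anchor_ratio = np.round(10.0 * avg_lwh / avg_lwh.sum(), 1)   # -> [4.7, 2.3, 3.0]

    def anchors_at(cx, cy, scale, ratios=anchor_ratio):
        """One 2D anchor per ordered pair of box dimensions at anchor point (cx, cy)."""
        boxes = []
        for i in range(3):
            for j in range(3):
                if i == j:
                    continue                     # a visible cuboid face exposes two distinct dimensions
                w, h = scale * ratios[i], scale * ratios[j]
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
        return np.array(boxes)                   # shape (6, 4)
    ```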
  • The improvement to the point cloud feature extraction network based on the improved version of Pointnet++ in step S31 is as follows: when grouping and clustering, the originally used spherical range is abandoned and a cubic range is used for clustering instead, with the radius of the original sphere equal to half the length of the cube's face diagonal, i.e. the original sphere is the edge-tangent sphere of the cube.
  • In the original network, the sphere radii of the grouping layers are 0.2m, 0.4m, 0.8m, and 1.2m respectively.
  • The side lengths of the cubes in the improved version are therefore 0.28m, 0.57m, 1.13m, and 1.70m, which keeps each cube essentially consistent with the region enclosed by the original sphere and makes the clusters better match the shape of the cargo boxes.
  • In addition, the distance between any two points in the point cloud data is calculated with the Manhattan distance instead of the Euclidean distance, so that the point cloud distance representation better matches the cuboid characteristics, reducing the network training error and enhancing the final detection effect.
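  • The radius-to-side conversion follows from the edge-tangent condition, radius = side × √2 / 2, and can be checked numerically (a small sketch):

    ```python
    import math

    radii = [0.2, 0.4, 0.8, 1.2]                         # grouping-layer sphere radii (meters)
    sides = [round(r * math.sqrt(2), 2) for r in radii]  # edge-tangent sphere: side = radius * sqrt(2)
    print(sides)                                         # [0.28, 0.57, 1.13, 1.7]
    ```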
  • The improvements to the fusion strategy based on the improved version of the LI-fusion module in step S32 are as follows: first, the feature channel alignment layer that the image features pass through before being fused with the point cloud features in the LI-fusion module is abandoned, and the initial image features and point cloud features from S31 are fed directly into the following fully connected layer; second, the feature mixing layer that the fused features pass through before output in the LI-fusion module is abandoned, and the concatenated image features and point cloud features are used directly as the final output of the feature fusion module.
  • After removing these two layers, the numbers of channels of the point cloud features and the image features remain unchanged.
  • The point cloud features follow the channel number of the Pointnet++ backbone of the Group-free-3D-Net network, namely 288 dimensions;
  • the image features follow the channel number of the image branch in Imvotenet, namely 128 dimensions, so the final fused feature has 416 channels.
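  • A minimal sketch of the simplified fusion with these channel counts (the attention layer sizes are illustrative; only the alignment and mixing layers of the original module are dropped):

    ```python
    import torch
    import torch.nn as nn

    class SimplifiedFusion(nn.Module):
        """Fusion without the channel alignment and feature mixing layers: 288 + 128 -> 416."""
        def __init__(self, pc_ch=288, img_ch=128):
            super().__init__()
            self.attn = nn.Sequential(                    # fully connected layer on the unaligned features
                nn.Conv1d(pc_ch + img_ch, img_ch, 1), nn.ReLU(),
                nn.Conv1d(img_ch, img_ch, 1), nn.Sigmoid())

        def forward(self, pc_feat, img_feat):
            # pc_feat: (B, 288, N), img_feat: (B, 128, N)
            w = self.attn(torch.cat([pc_feat, img_feat], dim=1))  # attention weights for the image features
            return torch.cat([pc_feat, img_feat * w], dim=1)      # (B, 416, N), used directly as output

    fused = SimplifiedFusion()(torch.randn(2, 288, 1024), torch.randn(2, 128, 1024))
    print(fused.shape)   # torch.Size([2, 416, 1024])
    ```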
  • the loading policy generation network in step S4 adopts TAP-Net.
  • TAP-Net includes a loading sequence generation subnetwork and a loading location generation subnetwork, which respectively generate the loading sequence and loading location of the boxes to be loaded.
  • the set of loading order and loading location is the final required loading strategy.
  • the specific process is as follows:
  • According to the loading sequence, the first cargo box to be loaded is taken first; the deflection displacement and rotation angle required for the mechanical arm to reach the pick-up surface of this box are calculated, and the arm is moved accordingly. The lifting device is then controlled to pick up the cargo box to be loaded. After picking is completed, the deflection displacement and rotation angle from the current position to the target loading position are calculated, and the mechanical arm is controlled to move the cargo box. When the mechanical arm reaches the designated position, the lifting device is controlled to place the cargo box. This step is repeated until all boxes are loaded.
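  • A minimal sketch of deriving the commanded displacement and rotation for one such move (treating the move as a planar arm extension plus a base rotation is a simplifying assumption; the actual arm geometry and controller are not specified in the text):

    ```python
    import math

    def move_command(current_xy, target_xy, current_yaw):
        """Deflection displacement and rotation angle needed to bring the arm over a target position."""
        dx = target_xy[0] - current_xy[0]
        dy = target_xy[1] - current_xy[1]
        displacement = math.hypot(dx, dy)                           # how far the variable-length arm must extend
        rotation = math.atan2(dy, dx) - current_yaw                 # how far the base shaft must rotate
        rotation = (rotation + math.pi) % (2 * math.pi) - math.pi   # wrap to (-pi, pi]
        return displacement, math.degrees(rotation)

    # Move from the pick-up position toward the target loading position.
    reach, turn = move_command(current_xy=(0.0, 0.0), target_xy=(0.8, 0.5), current_yaw=0.0)
    ```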
  • The figure shows the loss visualization for training the 3D target detection network: (a) is the result without the improvements described in S3, and (b) is the result with the improved 3D target detection network. Both were trained for 600 epochs, and the three epochs with the lowest loss are shown. As the figure shows, after the targeted improvements are added, the loss of the improved 3D target detection network is lower than that of the original version, which demonstrates that the targeted improvements are effective.
  • the display effect is the visualization result of the 3D target detection model.
  • the design detection object is a box
  • The detection network accurately marks the position and size of the box, which demonstrates that the 3D target detection model is effective for cargo box detection and can meet the requirements of subsequent strategy generation and the final loading task.
  • the present invention also provides an RGBD camera-based cargo box intelligent loading system, including:
  • An RGBD camera acquisition device is used to collect 3D scene data of the cargo box to be loaded and the target area through the RGBD camera;
  • the mechanical arm is used to assist the lifting device to reach the designated position, and carry out lifting or loading in the designated direction;
  • the control system stores the pre-trained network model, acquires 3D scene data through the RGBD camera acquisition device, and finally generates the deflection displacement and rotation angle of the robotic arm based on the data.
  • the RGBD camera acquisition device is installed on the console instead of on the mechanical arm, so that the relative position of the RGBD camera can be kept unchanged, the difficulty and frequency of calibration can be reduced, and the loading efficiency can be improved.
  • the lifting device can move up and down, the lifting task can be completed by moving upward, and the placing task can be completed by moving downward.
  • the mechanical arm includes a forearm, a variable-length rear arm, and three rotation axes that can rotate 360°.
  • the variable-length rear arm is connected to the operating table through the rotating shaft, and the length of the variable-length rear arm is adjusted according to the position of the cargo box;
  • the forearm is connected to the variable-length rear arm through the rotating shaft to expand the reachable range of the device;
  • the shaft is connected to the front end of the forearm, and is used to cooperate with the mechanical arm to complete the loading of the cargo box at a specified position and a specified angle
  • The control system is connected to the RGBD camera acquisition device and the mechanical arm, and has the trained 3D target detection network and loading strategy generation network described above built in; it obtains the 3D scene information of the cargo boxes and the loading area through the RGBD camera, calculates the moving distances and angles of the mechanical arm according to the output loading sequence and loading positions, and controls the lifting device to complete the loading task.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)

Abstract

An intelligent cargo box loading method and system based on an RGBD camera (7). The method comprises: for cargo boxes which are placed in a scattered manner, first acquiring, by means of an RGBD camera (7), RGB images and depth images of cargo boxes in an area to be subjected to loading (10) and a target area (12), and performing camera calibration; converting the depth images into point clouds according to obtained intrinsic and extrinsic parameters of the camera, and performing data enhancement on the RGB images and the point clouds; inputting the enhanced RGB images and point clouds into an improved 3D target detection network, so as to detect the position and size of each cargo box; inputting, into a loading policy generation network based on reinforcement learning, the obtained sizes and positions of cargo boxes to be loaded, and the loaded cargo boxes in the target area (12), taking the utilization rate of a loading space and a stability factor into comprehensive consideration, generating loading sequences and loading positions of all the cargo boxes to be loaded, and forming a final loading policy; and calculating a deflection displacement and a rotation angle of a mechanical arm according to the loading policy, and controlling the mechanical arm to load the cargo boxes.

Description

A method and system for intelligent loading of cargo boxes based on an RGBD camera
Technical Field
The invention relates to the field of intelligent loading of cargo boxes, and in particular to a method and system for intelligent loading of cargo boxes based on an RGBD camera.
Background Art
Under the competitive pressure of the social environment and with the rapid development of computer-based automatic identification technology, warehouse management is gradually becoming information-driven, and the demand for automation in e-commerce and warehousing logistics keeps growing. As a common logistics task, a loading operation loads all the cargo boxes at the current location into a designated space and plays a pivotal role in warehouse management. Traditional loading tasks are done manually, which is labor-intensive and inefficient; manual loading is also easily affected by subjective human factors, is arbitrary, and lacks consideration of the overall loading layout.
With the development of hardware technology, more and more warehousing and logistics operations apply automated robotic arms to cargo box loading. The current common automated loading systems mark positions with laser and infrared devices and use robotic arms for collective loading. Such loading systems not only require expensive hardware, but also require all cargo boxes to be of the same size and stacked according to specified rules, which greatly restricts their practical implementation.
In recent years, with the development of neural networks, more and more industries have begun to apply neural-network-based intelligent systems. As one of the basic tasks of artificial intelligence, target detection plays a pivotal role in intelligent systems across industries. For different kinds of data, the field of target detection offers different network structures to complete the corresponding tasks.
The classic 2D detection network Faster-rcnn takes RGB images as input and divides the detection task into two stages: the first stage extracts regions of interest, and the second stage performs precise regression on the proposed anchor boxes. By default, the anchor boxes are generated at each anchor point with the classic ratio (0.5:1:2), in order to capture objects of different sizes and shapes as far as possible. Faster-rcnn finally generates the size and position of each detected object to form the final detection box. At the same time, each detection box carries a classification score; after a softmax layer, the class with the highest score is taken as the final classification of the current detection box.
However, real scenes are three-dimensional, and it is difficult to express the complexity of a 3D scene using only 2D image data as input. Many detection networks therefore focus on point cloud data, which contains 3D information. For point cloud data, one of the most commonly used networks for extracting point cloud features is Pointnet++. Pointnet++ has four feature extraction layers and two upsampling layers. A key step is grouping the point cloud according to the distance between points. To cover objects of various sizes and shapes as far as possible, Pointnet++ uses the Euclidean distance to compute point distances and groups points within spherical ranges. The network structure of Pointnet++ makes point cloud feature extraction simple and effective, so it is adopted by many 3D detection networks, such as Group-free-3D-Net. Group-free-3D-Net takes point cloud data as input, uses Pointnet++ to extract point cloud features, and in addition uses a Transformer structure to refine the final 3D detection boxes.
However, point cloud data itself has no color and, compared with image data, lacks important semantic and edge information. Many network models therefore focus on the fusion of multimodal data. Imvotenet first performs 2D target detection on the image and then extracts features from the 2D detection boxes and adds them to the point cloud features to enhance them. Specifically, the features extracted from the 2D detection results include: 1. semantic features: the classification score of the detection box in the 2D detection result; 2. texture features: the original RGB pixel values of all pixels in the 2D detection box; 3. geometric features: the direction of the ray projected from the center point of the 2D detection box into the 3D scene according to the projection principle. After extracting these image features, however, Imvotenet simply appends them to the point cloud features, without considering the difference between the two data forms. EP-Net proposes the LI-Fusion module, which is dedicated to fusing point cloud data features and image data features. The LI-Fusion module first aligns the image feature channels with the point cloud feature channels through a feature channel alignment layer; the aligned image features and the point cloud features are then fed together into a fully connected layer, which compares the image features with the point cloud features and generates a set of attention weights corresponding to the image features; these weights are multiplied with the image features to obtain weighted image features, which are concatenated with the point cloud features to obtain preliminary fusion features; finally, the preliminary fusion features pass through a feature mixing layer that outputs the final fusion features.
These 3D target detection models are oriented toward academic research, so most of them consider the generality of the model across objects of many shapes. In practical applications in the warehousing and logistics field, however, the main objects are regular-shaped logistics boxes. Although logistics boxes vary in size, their shape remains cuboid. At present, there are few detection networks designed specifically for regular objects such as logistics boxes.
On the other hand, the introduction of reinforcement learning has provided new solutions to many complex combinatorial optimization problems. The bin packing problem, as one of the most classic combinatorial optimization problems, has also received much attention. TAP-Net is one of the networks used to solve the bin packing problem. TAP-Net consists of a loading sequence generation subnetwork and a loading position generation subnetwork. It takes the 3D sizes and positions of the boxes to be loaded and the current loading situation of the target area as input; the loading sequence generation subnetwork generates the optimal loading sequence, and the loading position generation subnetwork then generates the optimal loading positions according to that sequence and the loading situation of the target area. The combination of the optimal loading sequence and the optimal loading positions forms the final loading strategy. However, one of the inputs of TAP-Net is the size and position of each box, and in real scenes box sizes and positions are highly variable. Especially in warehousing logistics, logistics boxes come in many different sizes. This makes it difficult to obtain the sizes and positions of the boxes directly in practical applications, which greatly limits the deployment of TAP-Net.
Summary of the Invention
In view of the above lack of multi-modal 3D target detection models designed specifically for regular objects such as cargo boxes, the present invention provides a 3D target detection network based on image data and point cloud data, whose network structure is improved to take full account of the characteristics of regular objects such as cargo boxes. Further, since currently deployed loading systems are heavily constrained and related academic research is difficult to apply in practice due to limitations such as its required inputs, the present invention builds on this 3D target detection network to provide an RGBD-camera-based intelligent cargo box loading method and system, which can automatically identify the size and position of the cargo boxes currently to be loaded and give a suitable loading sequence and loading positions, thereby realizing automated loading of cargo boxes of different sizes. The specific method is as follows:
1. An RGBD-camera-based intelligent cargo box loading method, the method comprising the following steps:
S1. Collect the color and depth information of the cargo boxes in the area to be loaded and in the target area with the RGBD camera, and generate an RGB image and a corresponding depth image; then perform camera calibration to determine the transformation between the image coordinate system and the world coordinate system. The area to be loaded is used to place the cargo boxes that need to be loaded, and the target area is used to place the cargo boxes that have already been loaded.
S2. Convert the depth image into a point cloud according to the camera intrinsic and extrinsic parameters obtained from the calibration in S1, and apply data enhancement to the RGB image and the point cloud respectively.
S3. Input the enhanced RGB image and point cloud data obtained in step S2 into the 3D target detection network, detect the position and size of each cargo box in the area to be loaded and in the target area, and generate 3D detection boxes described by their center coordinates and their length, width, and height.
S4. Input the position and size information of the cargo boxes obtained in step S3 into the reinforcement-learning-based loading strategy generation network to obtain the final loading strategy.
S5. Calculate the deflection displacement and rotation angle of the mechanical arm according to the loading strategy generated in step S4, and control the mechanical arm to load the cargo boxes.
Preferably, the camera calibration in step S1 uses a standard calibration board method to obtain the intrinsic and extrinsic parameters of the camera, so as to determine the positional correspondence between the world coordinate system and the image coordinate system. The calibration formula can be expressed as:
(Calibration formula: see image PCTCN2021138155-appb-000001 in the original publication.)
Here f_u and f_v are the focal lengths of the camera in the horizontal and vertical directions respectively, R and T are the rotation matrix and translation matrix of the camera, and C is the scale factor. From the calibration formula, the conversion between a point (x, y, z) in the world coordinate system and a point (u, v) in the image coordinate system can be obtained.
Preferably, the enhancement in step S2 includes: for the RGB image, to prevent the detection network in step S3 from overfitting the image and to enhance its expressiveness, color jitter and random cropping are applied and the image is scaled to a fixed size, in preparation for input to the detection network in step S3; for the point cloud data, to prevent the detection network in step S3 from overfitting the point cloud, random scaling, random rotation, and random sampling are applied and the point cloud is downsampled to 20,000 points, in preparation for input to the detection network in step S3.
优选地,所述步骤S3的3D目标检测网络包括三个模块:特征提取模块、特征融合模块和建议生成模块。其中特征提取模块又分为两个分支:图像特征分支和点云特征分支。具体地,图像特征分支是基于Faster-rcnn改进版的2D目标检测网络、点云特征分支是基于pointnet++改进版的点云特征提取网络;特征融合模块用于融合点云特征和图像特征;建议 生成网络基于Transformer结构,生成3D场景中每个货物箱的位置和尺寸大小。3D目标检测网络的具体流程如下:Preferably, the 3D object detection network in step S3 includes three modules: a feature extraction module, a feature fusion module and a suggestion generation module. The feature extraction module is divided into two branches: image feature branch and point cloud feature branch. Specifically, the image feature branch is a 2D target detection network based on an improved version of Faster-rcnn, and the point cloud feature branch is a point cloud feature extraction network based on an improved version of pointnet++; the feature fusion module is used to fuse point cloud features and image features; suggestion generation Based on the Transformer structure, the network generates the position and size of each cargo box in the 3D scene. The specific process of the 3D target detection network is as follows:
S31. Obtain the final features of the image branch and the final features of the point cloud branch;
The final features of the image branch are obtained through the 2D object detection network based on the improved Faster R-CNN. Specifically, the enhanced RGB image is input into this network, which outputs the position and size of the 2D detection box of each cargo box in the RGB image together with the classification score of the corresponding box; features are then extracted from these 2D detection boxes, including semantic, texture and geometric features. The semantic feature is the classification score of the 2D detection box: since the 2D detection network outputs the classification of each box along with its position and size, this score can act as a prior for the 3D detection task, so the present invention takes the semantic feature formed by the classification scores as one of the image-branch features. The texture feature is the RGB pixel values of all pixels in the detection box: compared with point cloud data, the image contains RGB colour values and therefore carries more semantic and texture information, so the present invention takes the texture feature formed by the raw RGB values as one of the image-branch inputs to enhance the feature representation. The geometric feature is the projection ray from the centre of the 2D detection box into the 3D scene, which can be obtained from the size and position of the 2D detection box and the intrinsic and extrinsic parameters obtained by camera calibration. The geometric feature follows the feature extraction form of ImVoteNet: since one of the final outputs of the 3D detection network is the position of the object, providing the ray direction along which the object's centre in the 2D image projects into 3D can guide the generation of the 3D detection box. These three kinds of features are concatenated along the channel dimension as the final feature output of the image branch;
The final features of the point cloud branch are obtained through the point cloud feature extraction network based on the improved PointNet++. Specifically, the 20000 sampled points are taken as input, and the point cloud features obtained through the improved PointNet++ network are used as the final feature output of the point cloud branch. As a classic backbone for point cloud feature extraction, PointNet++ is simple and effective in structure and has been applied in many network models;
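For readers unfamiliar with PointNet++-style backbones, the sketch below shows only the generic farthest-point-sampling step that such a set-abstraction layer starts from; it is not the patented improvement (the grouping changes are described further below) and the sample counts are illustrative:

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=0):
    """Pick n_samples points spread out over the cloud (PointNet++-style sampling)."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(points)))]
    dist = np.full(len(points), np.inf)
    for _ in range(n_samples - 1):
        # Distance of every point to its nearest already-selected centre.
        dist = np.minimum(dist, np.linalg.norm(points - points[selected[-1]], axis=1))
        selected.append(int(dist.argmax()))            # take the farthest point next
    return points[np.array(selected)]

cloud = np.random.rand(20000, 3)
centres = farthest_point_sampling(cloud, 1024)         # centres for a grouping layer
print(centres.shape)  # (1024, 3)
```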
S32. The feature fusion module fuses the features of the image branch in step S31 with the features of the point cloud branch; the fusion method is based on an improved version of the LI-Fusion module, and the fused features are output;
S33. The proposal generation module based on the Transformer structure takes the fused features as input, computes cargo box features from all of the fused features, and finally outputs the spatial coordinates of the centre point and the length, width and height of the cargo boxes in the area to be loaded and in the target area. Owing to its attention mechanism, the Transformer is well suited to computing the relationship between parts of the input data and the whole, which fits the task of computing the features of object parts from the overall features. Based on this idea, the present invention follows the Group-free-3D-Net network and uses the Transformer structure to output the final 3D detection results.
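The following is a simplified stand-in for such an attention-based proposal head, using DETR-style learned box queries rather than the point-sampled candidates of Group-free-3D-Net; the query count and layer sizes are assumptions, while the 416-channel fused feature matches the channel count stated later in the embodiment:

```python
import torch
import torch.nn as nn

class BoxProposalHead(nn.Module):
    """A set of learned box queries attends to the fused point/image features and
    regresses a centre and a size per query."""
    def __init__(self, feat_dim=416, num_queries=64, nhead=8, num_layers=2):
        super().__init__()
        self.queries = nn.Embedding(num_queries, feat_dim)
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.center_head = nn.Linear(feat_dim, 3)   # (x, y, z) of the box centre
        self.size_head = nn.Linear(feat_dim, 3)     # (l, w, h) of the box

    def forward(self, fused_feats):
        # fused_feats: (B, N, 416) fused per-point features from the fusion module
        b = fused_feats.shape[0]
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        out = self.decoder(q, fused_feats)           # queries attend to all features
        return self.center_head(out), self.size_head(out)

head = BoxProposalHead()
centers, sizes = head(torch.randn(2, 1024, 416))     # 2 scenes, 1024 fused features
print(centers.shape, sizes.shape)                    # (2, 64, 3) and (2, 64, 3)
```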
Preferably, the improvements in the 2D object detection network based on the improved Faster R-CNN in step S31 include: discarding the classic ratio (0.5:1:2) used when generating anchor boxes in Faster R-CNN and instead using the averages of the length, width and height of the cargo box sizes commonly found in the warehouse as the ratio for anchor generation. This helps reduce the error when regressing the anchor boxes and speeds up the regression training, making the 2D detection results more accurate.
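A short sketch of deriving such a ratio is shown below, using the standard postal carton dimensions cited later in the embodiment; how the resulting ratios are wired into a particular Faster R-CNN anchor generator is left open here:

```python
import numpy as np

# Average the length/width/height of the warehouse's common carton sizes and use
# the result in place of the classic 0.5 : 1 : 2 anchor ratios.
carton_sizes = np.array([
    [530, 290, 370],   # standard carton no. 1 (mm)
    [530, 230, 290],   # standard carton no. 2 (mm)
    [430, 210, 270],   # standard carton no. 3 (mm)
])

mean_lwh = carton_sizes.mean(axis=0)      # -> [496.7, 243.3, 310.0]
ratio = mean_lwh / 100.0                  # -> roughly 5.0 : 2.4 : 3.1
print(mean_lwh.round(1), ratio.round(2))
```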
Preferably, the improvements in the point cloud feature extraction network based on the improved PointNet++ in step S31 include: exploiting the fact that a cargo box keeps a cuboid shape regardless of its size, the grouping step clusters points within a cube-shaped range instead of the originally used spherical range, with the requirement that the original sphere radius equals half the length of the cube's face diagonal, i.e. the original sphere is the edge-tangent sphere of the cube. This keeps the region enclosed by the cube essentially consistent with that of the original sphere and makes the clusters better match the shape of the cargo boxes. In addition, to match the cube clustering strategy, the distance computation is changed from the Euclidean distance to the Manhattan distance, so that the representation of point cloud distances better fits the cube characteristics, reducing the network training error and improving the final detection results.
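One possible reading of this grouping change is sketched below: points are gathered inside an axis-aligned cube whose side is radius * sqrt(2), and the Manhattan distance ranks points within the cube; this is an illustrative interpretation, not the patented implementation, and the sample limit is an assumption:

```python
import numpy as np

def cube_group(points, centers, radius, max_samples=32):
    """Group points inside an axis-aligned cube around each centre.

    The cube side is radius * sqrt(2), so the original grouping sphere of the given
    radius is the edge-tangent sphere of the cube (e.g. radius 0.2 m -> side 0.28 m).
    """
    half_side = radius * np.sqrt(2) / 2.0
    groups = []
    for c in centers:
        diff = np.abs(points - c)                       # per-axis offsets
        inside = np.all(diff <= half_side, axis=1)      # cube-range test
        idx = np.flatnonzero(inside)
        manhattan = diff[idx].sum(axis=1)               # L1 distance to the centre
        idx = idx[np.argsort(manhattan)][:max_samples]  # keep the closest points
        groups.append(idx)
    return groups

pts = np.random.rand(20000, 3) * 2.0
ctrs = pts[np.random.choice(len(pts), 128, replace=False)]
groups = cube_group(pts, ctrs, radius=0.2)
print(len(groups), len(groups[0]))
```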
Preferably, the improvements to the fusion strategy based on the improved LI-Fusion module in step S32 include: first, discarding the feature-channel alignment layer that the image features pass through before being fused with the point cloud features in LI-Fusion, and instead feeding the unaligned image features and point cloud features directly into the following fully connected layer. The purpose of the alignment layer is to align the image feature dimension with the point cloud feature dimension, but in the fusion process the RGB image features are only used to strengthen the point cloud feature representation; in 3D object detection, the structural and scene information of the point cloud still dominates feature extraction, so the alignment layer contributes little while adding network depth, and it is therefore removed in the improved version. Second, discarding the feature mixing layer that the fused features pass through before being output in LI-Fusion, and instead taking the concatenated image features and point cloud features directly as the final output of the fusion module. After the alignment layer is removed, this mixing layer also needs to be removed to keep the number of input and output channels unchanged. In addition, passing the unmixed image and point cloud features into the subsequent modules allows the network to back-propagate gradient information more cleanly during training.
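A rough sketch of such a simplified fusion step follows; the gating layer is an illustrative reading of "the following fully connected layer" and is not the exact LI-Fusion variant, while the 288 + 128 = 416 channel split comes from the embodiment below:

```python
import torch
import torch.nn as nn

class SimplifiedFusion(nn.Module):
    """No channel-alignment layer and no final mixing layer: a fully connected layer
    gates the per-point image features, and the gated image features are concatenated
    with the point features as the module output."""
    def __init__(self, point_dim=288, image_dim=128):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(point_dim + image_dim, image_dim),
            nn.Sigmoid(),
        )

    def forward(self, point_feats, image_feats):
        # point_feats: (B, N, 288); image_feats: (B, N, 128), gathered per 3D point
        w = self.gate(torch.cat([point_feats, image_feats], dim=-1))
        return torch.cat([point_feats, image_feats * w], dim=-1)   # (B, N, 416)

fusion = SimplifiedFusion()
print(fusion(torch.randn(2, 1024, 288), torch.randn(2, 1024, 128)).shape)
```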
Preferably, the loading strategy generation network of step S4 uses TAP-Net. TAP-Net is a network designed specifically for box loading problems and therefore matches the cargo box loading task. Given the cargo boxes in the area to be loaded and in the target area, TAP-Net gives the loading order and loading position of the boxes currently awaiting loading, forming the final loading strategy. TAP-Net is trained by reinforcement learning. For this loading task, the reinforcement-learning reward function can be composed of a remaining-space index and a loading-stability index, specifically: after the current box is loaded, the remaining space in the target loading area below the height of the current box is computed, and the size of this remaining space gives the remaining-space weight; after the current box is loaded, whether there is support below its centre of gravity is checked, and the loading-stability index is returned according to the support situation. The sum of these two indices is the output value of the final reward function (an illustrative sketch follows step S42 below). TAP-Net includes a loading-order generation sub-network and a loading-position generation sub-network, which generate the loading order and loading position of the boxes to be loaded, respectively. The set of loading orders and loading positions is the final loading strategy, and the specific process is as follows:
S41. Input the 3D sizes of the cargo boxes in the area to be loaded and in the target area output by S33 into the loading-order generation sub-network of TAP-Net to generate the loading order of all boxes to be loaded;
S42. Input the loading order of the boxes to be loaded obtained in S41, together with the positions and sizes of the boxes already loaded in the target area, into the loading-position generation sub-network of TAP-Net to generate the final loading positions. The loading orders and loading positions form a one-to-one set, yielding the final loading strategy.
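The sketch below illustrates the two reward terms described above on a hypothetical height-map representation of the target area; it is not TAP-Net's actual reward, and the grid resolution, weighting and support test are all assumptions:

```python
import numpy as np

def loading_reward(height_map, box_lwh, position, cell=0.01):
    """Illustrative reward: remaining-space term plus centre-of-gravity support term.

    height_map : 2D grid of current stack heights in the target area (metres).
    box_lwh    : (l, w, h) of the box just placed.
    position   : (row, col) grid indices of the box's corner in the height map.
    """
    l, w, h = box_lwh
    r, c = position
    rows, cols = int(round(l / cell)), int(round(w / cell))
    footprint = height_map[r:r + rows, c:c + cols]

    support_height = footprint.max()
    top = support_height + h

    # Remaining-space term: free volume below the new top surface of the stack.
    filled = height_map.clip(max=top).sum() * cell * cell
    total = top * height_map.size * cell * cell
    remaining_space_penalty = (total - filled) / total      # smaller is better

    # Stability term: is the cell under the box's centre of gravity supported?
    centre = footprint[rows // 2, cols // 2]
    stability = 1.0 if np.isclose(centre, support_height) else 0.0

    height_map[r:r + rows, c:c + cols] = top                # place the box
    return stability - remaining_space_penalty

hm = np.zeros((100, 100))           # 1 m x 1 m target area at 1 cm resolution
print(loading_reward(hm, (0.53, 0.29, 0.37), (0, 0)))
```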
Preferably, completing the loading task in step S5 specifically comprises: according to the loading strategy generated in S4, obtaining the pick-up position and the target loading position of the cargo box currently to be loaded, calculating the deflection displacement and rotation angle of the robotic arm to the pick-up face of that box, and controlling the arm to reach the specified position at the specified angle; after the pick-up is completed, calculating the deflection displacement and rotation angle from the current position to the target loading position, controlling the arm to reach the specified position at the specified angle, and putting down the cargo box. This step is repeated until all cargo boxes have been loaded.
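A minimal sketch of this pick-and-place loop is given below; the displacement and yaw computation, and the printed "controller" actions, are hypothetical placeholders for the real arm kinematics and control interface:

```python
import numpy as np

def yaw_between(p_from, p_to):
    """Rotation about the vertical axis needed to face the target position."""
    d = p_to - p_from
    return np.degrees(np.arctan2(d[1], d[0]))

def execute_plan(plan, arm_base=np.zeros(3)):
    """For each (pick_pose, place_pose) in the loading plan, derive a displacement
    and a yaw angle, then pick and place; printing stands in for the controller."""
    current = arm_base.copy()
    for pick_pose, place_pose in plan:
        for target, action in ((pick_pose, "pick"), (place_pose, "place")):
            displacement = target - current
            angle = yaw_between(current, target)
            print(f"{action}: move by {displacement.round(3)}, rotate {angle:.1f} deg")
            current = target
    return current

plan = [(np.array([0.6, 0.1, 0.2]), np.array([0.2, 0.8, 0.0])),
        (np.array([0.6, 0.3, 0.2]), np.array([0.2, 0.8, 0.37]))]
execute_plan(plan)
```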
The present invention further provides an RGBD-camera-based intelligent cargo box loading system, comprising:
an RGBD camera acquisition device for acquiring 3D scene data of the cargo boxes to be loaded and of the target area through the RGBD camera;
an operating table for mounting the RGBD camera acquisition device and fixing the robotic arm;
a pick-up device for picking up and loading the cargo boxes to be loaded;
a robotic arm for assisting the pick-up device in reaching a specified position and picking up or loading in a specified direction;
a control system, which stores the pre-trained network models, obtains the 3D scene data through the RGBD camera acquisition device, and finally generates the deflection displacement and rotation angle of the robotic arm from the data.
Preferably, the RGBD camera acquisition device consists of an RGB camera, a depth camera and a fill-light device; the RGB camera acquires the RGB image, the depth camera acquires the depth image, and the fill-light device provides a light source to keep the light intensity suitable;
Preferably, the robotic arm consists of a forearm, a rear arm and three rotation axes, and can assist the pick-up device in reaching the specified positions in the area to be loaded and in the target area.
Compared with the prior art, the present invention uses a 3D detection network and a reinforcement-learning decision network to load cargo boxes of different sizes; through the characteristics of reinforcement learning, it also combines the sizes and positions of all boxes in the current state with the remaining space of the target area to output a loading strategy that takes the whole situation into account. The method features high flexibility, high space utilisation and high loading efficiency, and realises automated loading of cargo boxes of different sizes.
Description of the drawings
Fig. 1 is a basic flow chart of the RGBD-camera-based intelligent cargo box loading method of the present invention;
Fig. 2 is an algorithm flow chart of the 3D object detection network of the present invention;
Fig. 3 is an algorithm flow chart of the reinforcement-learning-based loading strategy generation network of the present invention;
Fig. 4(a) is a training-loss log of the 3D object detection network without the improvements described in step S3;
Fig. 4(b) is a training-loss log of the 3D object detection network with the improvements described in step S3;
Fig. 5(a) is an RGB image of an example 3D scene of the present invention;
Fig. 5(b) is the point cloud representation of the 3D scene in Fig. 5(a) and a schematic diagram of the 3D detection results;
Fig. 6 is a structural diagram of the RGBD-camera-based intelligent cargo box loading system of the present invention.
In the figures: 1. control system; 2. signal transmission line; 3. operating table; 4. variable-length rear arm; 5. forearm; 6. rotation axis; 7. RGBD camera; 8. fill-light device; 9. pick-up device; 10. area to be loaded; 11. cargo boxes of different sizes; 12. target area.
Detailed description of the embodiments
To enable those skilled in the art to better understand the design of the present invention, the present invention is further described below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, a method for intelligent loading of cargo boxes based on an RGBD camera comprises the following steps:
S1. Acquire the colour and depth information of the cargo boxes in the area to be loaded and in the target area through the RGBD camera, generate an RGB image and a corresponding depth image, and perform camera calibration to determine the transformation between the image coordinate system and the world coordinate system;
S2. Convert the depth map into a point cloud according to the camera intrinsic and extrinsic parameters obtained from the calibration in S1, and apply data enhancement to the RGB image and the point cloud respectively;
S3. Input the enhanced RGB image and point cloud data obtained in step S2 into the 3D object detection network, detect the position and size of each cargo box in the area to be loaded and in the target area, and generate three-dimensional detection boxes carrying centre-position coordinates and length, width and height;
S4. Input the position and size information of the cargo boxes obtained in step S3 into the reinforcement-learning-based loading strategy generation network to obtain the final loading strategy;
S5. According to the loading strategy generated in step S4, calculate the deflection displacement and rotation angle of the robotic arm, and control the robotic arm to load the cargo boxes.
In this embodiment, the length, width and height of cargo boxes of different sizes commonly found in warehouses first need to be collected for model pre-training. Taking standard postal cartons No. 1, No. 2 and No. 3 as examples, the standard No. 1 carton measures 530*290*370 mm, the standard No. 2 carton measures 530*230*290 mm, and the standard No. 3 carton measures 430*210*270 mm. After the data of the common cargo boxes are collected, they are used to pre-train the required network models.
In this embodiment, the network models that require pre-training include:
1. The 2D object detection network, which adopts the improved Faster R-CNN model; the detection object is set to warehouse cargo boxes, the input data are the collected RGB images of common cargo boxes, such as the standard No. 1, No. 2 and No. 3 cartons, and the output target is the RGB image with the sizes, positions and classifications of the 2D detection boxes;
2. The 3D object detection network, which adopts the 3D object detection network proposed by the present invention; the detection object is set to warehouse cargo boxes in the 3D scene, and the output target is the size and position of the cargo boxes in the 3D scene. During pre-training, the pre-trained 2D object detection network is used as part of the 3D object detection network, and the 3D object detection network is then pre-trained as a whole; the RGB images with 2D detection results output by the pre-trained 2D detection network, together with the corresponding depth map data, serve as the input data for pre-training the 3D object detection network;
3. The loading strategy generation network, which adopts the TAP-Net model; the learning task is set to find the optimal loading positions and loading order for the current batch of cargo boxes to be loaded, and, trained by reinforcement learning, the output target is the optimal loading strategy composed of the loading order and loading positions.
In this embodiment, after all the pre-trained network models have been obtained, loading of the cargo boxes of the present embodiment begins. In step S1, the camera is first calibrated using the calibration-board method to obtain its intrinsic and extrinsic parameters, from which the transformation between the camera coordinate system and the world coordinate system can be determined. The conversion from a point (x, y, z) in the world coordinate system to a point (u, v) in the image coordinate system is:
$$C\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}f_u & 0 & u_0\\ 0 & f_v & v_0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}R & T\end{bmatrix}\begin{bmatrix}x\\ y\\ z\\ 1\end{bmatrix}$$
where f_u and f_v are the camera focal lengths in the horizontal and vertical directions, (u_0, v_0) is the principal point, R and T are the camera's rotation matrix and translation matrix, and C is the scale factor.
In this embodiment, after the camera calibration is completed, the RGBD camera is aimed at the area to be loaded; the RGB image of the cargo boxes to be loaded is obtained by the colour camera, and the depth image of the cargo boxes in that area is obtained by the depth camera. The camera is then aimed at the target loading area to obtain its RGB image and depth image. The RGBD images of the area to be loaded and of the target area are then fed separately into the subsequent stages.
In this embodiment, the enhancement in step S2 includes: for the RGB image, applying the colour jitter and random cropping image enhancement methods and then scaling the image to a fixed size, here the classic Faster R-CNN size of 1000*600, which is convenient for the detection network of step S3 to process; for the point cloud data, applying data enhancement in which the point cloud is randomly flipped about the yz plane with a probability of fifty percent and randomly rotated about the z axis within (-30°, 30°), and sampling the point cloud to 20000 points by random sampling to prepare the input for the detection network of step S3.
As shown in Fig. 2, the 3D object detection network includes three modules: a feature extraction module, a feature fusion module and a proposal generation module. The feature extraction module is divided into two branches: an image feature branch and a point cloud feature branch. Specifically, the image feature branch is a 2D object detection network based on the improved Faster R-CNN, and the point cloud feature branch is a point cloud feature extraction network based on the improved PointNet++; the feature fusion module is a fusion model based on an improved LI-Fusion model and fuses the point cloud features with the image features; the proposal generation network is based on the Transformer structure of the Group-free-3D-Net network and generates the position and size of each cargo box in the 3D scene. The specific flow of the 3D object detection network is as follows:
S31. Obtain the final features of the image branch and the final features of the point cloud branch;
The final features of the image branch are obtained through the 2D object detection network based on the improved Faster R-CNN. Specifically, the enhanced RGB image is input into this network, which outputs the position and size of the 2D detection box of each cargo box in the RGB image and the classification score of the corresponding box; features are then extracted from these 2D detection boxes, including semantic, texture and geometric features, where the semantic feature is the classification score of the 2D detection box, the texture feature is the RGB pixel values of all pixels in the detection box, and the geometric feature is the projection ray from the centre of the 2D detection box into the 3D scene. These three kinds of features are concatenated along the channel dimension using the concatenate function and form the final feature output of the image branch;
The final features of the point cloud branch are obtained through the point cloud feature extraction network based on the improved PointNet++. Specifically, the 20000 sampled points are taken as input, and the point cloud features obtained through the improved PointNet++ network are used as the final feature output of the point cloud branch;
S32. The feature fusion module fuses the features of the image branch in step S31 with the features of the point cloud branch; the fusion method is based on an improved version of the LI-Fusion module, and the fused features are output;
S33. The proposal generation module based on the Transformer structure takes the fused features as input, computes cargo box features from all of the fused features, and finally outputs the spatial coordinates of the centre point and the length, width and height of the cargo boxes in the area to be loaded and in the target area.
In this embodiment, the improvements in the 2D object detection network based on the improved Faster R-CNN in step S31 include: discarding the classic ratio (0.5:1:2) used when generating anchor boxes in Faster R-CNN and instead using the averages of the length, width and height of the different cargo box sizes in the warehouse as the ratio for anchor generation. In this embodiment, the standard No. 1, No. 2 and No. 3 cartons are used, whose average length, width and height are 497 mm * 243 mm * 310 mm, corresponding to a ratio of approximately 5.0:2.4:3.1. This helps reduce the initial error when regressing the anchor boxes and speeds up the regression training, making the 2D detection results more accurate.
In this embodiment, the improvements in the point cloud feature extraction network based on the improved PointNet++ in step S31 include: during grouping, clustering within a cube-shaped range instead of the originally used spherical range, with the requirement that the original sphere radius equals half the length of the cube's face diagonal, i.e. the original sphere is the edge-tangent sphere of the cube. In the original PointNet++, the sphere radii of the four feature extraction layers during grouping are 0.2 m, 0.4 m, 0.8 m and 1.2 m, so in the improved version the cube side lengths should be 0.28 m, 0.57 m, 1.13 m and 1.70 m; this keeps the region enclosed by the cube essentially consistent with that of the original sphere and makes the clusters better match the shape of the cargo boxes. In addition, the method for computing the distance between any two points in the point cloud is changed from the Euclidean distance to the Manhattan distance, so that the representation of point cloud distances better fits the cube characteristics, reducing the network training error and improving the final detection results.
In this embodiment, the improvements to the fusion strategy based on the improved LI-Fusion module in step S32 include: first, discarding the feature-channel alignment layer that the image features pass through before being fused with the point cloud features in LI-Fusion, and instead feeding the initial image features and point cloud features from S31 directly into the following fully connected layer; second, discarding the feature mixing layer that the fused features pass through before being output in LI-Fusion, and instead taking the concatenated image features and point cloud features directly as the final output of the fusion module. After these two layers are removed, the channel numbers of the point cloud features and image features remain unchanged: in this embodiment, the point cloud features follow the channel number of the PointNet++ network of Group-free-3D-Net, i.e. 288 dimensions, and the image features follow the channel number of the image branch in ImVoteNet, i.e. 128 dimensions, so the final fused features have 416 channels.
As shown in Fig. 3, the loading strategy generation network of step S4 uses TAP-Net. TAP-Net includes a loading-order generation sub-network and a loading-position generation sub-network, which generate the loading order and loading position of the boxes to be loaded, respectively. The set of loading orders and loading positions is the final loading strategy, and the specific process is as follows:
S41. Input the 3D sizes and loading situation of the cargo boxes in the area to be loaded and in the target area output by S33 into the loading-order generation sub-network of TAP-Net, and use the pre-trained TAP-Net model to generate the loading order of all boxes to be loaded, for example loading the No. 1 carton first, then the No. 2 carton, and finally the No. 3 carton;
S42. Input the loading order of the boxes to be loaded obtained in S41, together with the positions and sizes of the boxes already loaded in the target area, into the loading-position generation sub-network of TAP-Net, and use the pre-trained TAP-Net model to generate the final loading positions. The loading orders and loading positions form a one-to-one set, yielding the final loading strategy, for example: the No. 1 carton is loaded at the upper-right corner of the bottom layer, the No. 2 carton is loaded immediately beside the No. 1 carton, and the No. 3 carton is loaded at the upper-right corner above the No. 1 carton. The specific loading strategy consists of the cargo box ID numbers and the loading position of each box.
In this embodiment, in step S5, according to the loading strategy generated in step S4, the first cargo box to be loaded is taken out according to the loading order; the deflection displacement and rotation angle required for the robotic arm to reach the pick-up face of that box are calculated and the arm is moved accordingly. The pick-up device is then controlled to pick up the box. After the pick-up is completed, the deflection displacement and rotation angle from the current position to the target loading position are calculated and the robotic arm is controlled to move the box. When the arm reaches the specified position, the pick-up device is controlled to put down the box. This step is repeated until all cargo boxes have been loaded.
As shown in Fig. 4, the training loss of the 3D object detection network is visualised: (a) shows the 3D object detection network without the improvements described in S3, and (b) shows the improved 3D object detection network; training uses 600 epochs, and the three epochs with the lowest loss are shown for each. As shown in the figure, after the targeted improvements are added, the loss of the 3D object detection model is lower than that of the original version; taking the three lowest-loss epochs out of the 600, the losses of the improved network are all lower than those of the unimproved network, demonstrating that the targeted improvements are effective.
As shown in Fig. 5, the visualisation results of the 3D object detection model are presented. With boxes as the detection objects, the figure shows that the detection network accurately marks the positions and sizes of the boxes, demonstrating that the 3D object detection model is effective for cargo box detection and can satisfy the subsequent strategy generation and final loading tasks.
As shown in Fig. 6, the present invention further provides an RGBD-camera-based intelligent cargo box loading system, comprising:
an RGBD camera acquisition device for acquiring 3D scene data of the cargo boxes to be loaded and of the target area through the RGBD camera;
a pick-up device for picking up and loading the cargo boxes to be loaded;
a robotic arm for assisting the pick-up device in reaching a specified position and picking up or loading in a specified direction;
an operating table for mounting the RGBD camera acquisition device and fixing the robotic arm;
a control system, which stores the pre-trained network models, obtains the 3D scene data through the RGBD camera acquisition device, and finally generates the deflection displacement and rotation angle of the robotic arm from the data.
In this embodiment, the RGBD camera acquisition device is mounted on the operating table rather than on the robotic arm; this keeps the relative position of the RGBD camera unchanged, reduces the difficulty and frequency of calibration, and improves loading efficiency.
In this embodiment, the pick-up device can move up and down: moving upward completes the pick-up task and moving downward completes the placement task.
In this embodiment, the robotic arm includes a forearm, a variable-length rear arm and three rotation axes capable of 360° rotation. The variable-length rear arm is connected to the operating table through a rotation axis and adjusts its length according to the position of the cargo box; the forearm is connected to the variable-length rear arm through a rotation axis, enlarging the reachable range of the device; the pick-up device is connected to the front end of the forearm through a rotation axis and cooperates with the robotic arm to complete the loading of cargo boxes at the specified position and angle.
In this embodiment, the control system is connected to the RGBD camera acquisition device and the robotic arm and contains the trained 3D object detection network and loading strategy generation network described above; it obtains the 3D scene information of the cargo boxes in the area to be loaded through the RGBD camera and, according to the output loading order and loading positions, calculates the moving distance and angle of the robotic arm and controls the pick-up device to complete the loading task.
The RGBD-camera-based intelligent cargo box loading method and system provided by the present invention have been described in detail above. The above description is only intended to help those skilled in the art understand the present invention. Those skilled in the art may make various modifications and refinements on the basis of the present invention, and such modifications and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims (9)

1. An RGBD-camera-based intelligent cargo box loading method, characterised in that the method comprises the following steps:
S1. Acquiring the colour and depth information of the cargo boxes in the area to be loaded and in the target area through an RGBD camera, generating an RGB image and a corresponding depth image, and performing camera calibration to determine the camera intrinsic and extrinsic parameters and the transformation between the image coordinate system and the world coordinate system;
S2. Converting the depth map into a point cloud according to the camera intrinsic and extrinsic parameters obtained from the calibration in S1, and applying data enhancement to the RGB image and the point cloud respectively;
S3. Inputting the enhanced RGB image and the enhanced point cloud data obtained in step S2 into a 3D object detection network, detecting the position and size of each cargo box in the area to be loaded and in the target area, and generating three-dimensional detection boxes carrying centre-position coordinates and length, width and height; the 3D object detection network comprises three modules in sequence: a feature extraction module, a feature fusion module and a proposal generation module; the feature extraction module is further divided into two branches: an image feature branch and a point cloud feature branch; specifically, the image feature branch is a 2D object detection network based on an improved Faster R-CNN, and the point cloud feature branch is a point cloud feature extraction network based on an improved PointNet++; the feature fusion module is a fusion model based on an improved LI-Fusion model and fuses the point cloud features with the image features; the proposal generation network is based on the Transformer structure of the Group-free-3D-Net network and generates the position and size of each cargo box in the 3D scene; the area to be loaded is used for placing the cargo boxes that need to be loaded, and the target area is used for placing the loaded cargo boxes;
S4. Inputting the position and size information of the cargo boxes obtained in step S3 into a reinforcement-learning-based loading strategy generation network to obtain a final loading strategy;
S5. According to the loading strategy generated in step S4, calculating the deflection displacement and rotation angle of a robotic arm, and controlling the robotic arm to load the cargo boxes.
2. The RGBD-camera-based intelligent cargo box loading method according to claim 1, characterised in that the calibration in step S1 specifically uses a calibration-board method to calibrate the camera intrinsic and extrinsic parameters; the enhancement in step S2 comprises: for the RGB image, colour jitter, random cropping and scaling the image to a fixed size; for the point cloud data, random scaling, random rotation, and sampling the point cloud to 20000 points by random sampling.
3. The RGBD-camera-based intelligent cargo box loading method according to claim 1, characterised in that the specific flow of the 3D object detection network is as follows:
S31. Using the feature extraction module to obtain the final features of the image branch and the final features of the point cloud branch respectively,
wherein the final features of the image branch are obtained as follows: the enhanced RGB images of the area to be loaded and of the target area are input into the 2D object detection network based on the improved Faster R-CNN, which outputs the position and size of the 2D detection box of each cargo box in the RGB image and the classification score of the corresponding box; features are then extracted from these 2D detection boxes, including semantic, texture and geometric features, wherein the semantic feature is the classification score of the 2D detection box, the texture feature is the RGB pixel values of all pixels in the detection box, and the geometric feature is the projection ray from the centre of the 2D detection box into the 3D scene, obtainable from the size and position of the 2D detection box and the intrinsic and extrinsic parameters obtained by camera calibration; these three kinds of features are concatenated as the final feature output of the image branch;
the final features of the point cloud branch are obtained through the point cloud feature extraction network based on the improved PointNet++, specifically by taking the 20000 sampled points as input and using the point cloud features obtained through the improved PointNet++ network as the final feature output of the point cloud branch;
S32. Using the feature fusion module to fuse the features of the image branch in step S31 with the features of the point cloud branch, wherein the fusion method is based on an improved version of the LI-Fusion module, and outputting the fused features;
S33. The proposal generation module based on the Transformer structure takes the fused features as input, computes cargo box features from all of the fused features, and finally outputs the spatial coordinates of the centre point and the length, width and height of the cargo boxes in the area to be loaded and in the target area.
4. The RGBD-camera-based intelligent cargo box loading method according to claim 1, characterised in that the loading strategy generation network of step S4 uses TAP-Net.
5. The RGBD-camera-based intelligent cargo box loading method according to claim 1, characterised in that the improvements in the 2D object detection network based on the improved Faster R-CNN described in step S31 comprise: discarding the classic ratio 0.5:1:2 used when generating anchor boxes in Faster R-CNN and instead using the averages of the length, width and height of the different cargo box sizes in the warehouse as the ratio for anchor generation, so as to reduce the error when regressing the anchor boxes and speed up the regression training, thereby making the 2D detection results more accurate.
6. The RGBD-camera-based intelligent cargo box loading method according to claim 1, characterised in that the improvements in the point cloud feature extraction network based on the improved PointNet++ described in step S31 comprise: during grouping, clustering within a cube-shaped range instead of the originally used spherical range, so that the clusters better match the shape of the cargo boxes, with the requirement that the original sphere radius equals half the length of the cube's face diagonal; in addition, changing the method for computing the distance between points in the point cloud from the Euclidean distance to the Manhattan distance, so that the representation of point cloud distances better fits the cube characteristics, reducing the network training error and improving the final detection results.
7. The RGBD-camera-based intelligent cargo box loading method according to claim 1, characterised in that the improvements to the fusion strategy based on the improved LI-Fusion module described in step S32 comprise: first, discarding the feature-channel alignment layer that the image features pass through before being fused with the point cloud features in the LI-Fusion module, and instead feeding the unaligned image features and point cloud features directly into the following fully connected layer; second, discarding the feature mixing layer that the fused features pass through before being output in the LI-Fusion module, and instead taking the concatenated image features and point cloud features directly as the final output of the feature fusion module.
8. The RGBD-camera-based intelligent cargo box loading method according to claim 1, characterised in that the 3D object detection network and the loading strategy generation network require pre-training.
9. An RGBD-camera-based intelligent cargo box loading system, characterised by comprising:
an RGBD camera acquisition device for acquiring 3D scene data of the cargo boxes to be loaded and of the target area through the RGBD camera;
a pick-up device for picking up and loading the cargo boxes to be loaded;
a robotic arm for assisting the pick-up device in reaching a specified position and picking up or loading in a specified direction;
an operating table for mounting the RGBD camera acquisition device and fixing the robotic arm;
a control system, which stores the pre-trained network models, obtains 3D scene data through the RGBD camera acquisition device, and finally generates the deflection displacement and rotation angle of the robotic arm from the data;
wherein the RGBD camera acquisition device is mounted on the operating table to keep the relative position of the RGBD camera unchanged; the pick-up device can move up and down to complete the pick-up and placement tasks; the robotic arm comprises a forearm, a variable-length rear arm and three rotation axes capable of 360° rotation; the variable-length rear arm is connected to the operating table through a rotation axis and adjusts its length according to the position of the cargo box; the forearm is connected to the variable-length rear arm through a rotation axis, enlarging the reachable range of the device; the pick-up device is connected to the front end of the forearm through a rotation axis and cooperates with the robotic arm to complete the loading of cargo boxes at the specified position and angle; the control system is connected to the RGBD camera acquisition device and the robotic arm, contains the trained 3D object detection network and loading strategy generation network described above, obtains the 3D scene information of the cargo boxes in the area to be loaded through the RGBD camera and, according to the output loading order and loading positions, calculates the moving distance and rotation angle of the robotic arm and controls the pick-up device to complete the loading task.
PCT/CN2021/138155 2021-09-30 2021-12-15 Intelligent cargo box loading method and system based on rgbd camera WO2023050589A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111160228.3 2021-09-30
CN202111160228.3A CN113963044B (en) 2021-09-30 2021-09-30 Cargo box intelligent loading method and system based on RGBD camera

Publications (1)

Publication Number Publication Date
WO2023050589A1 true WO2023050589A1 (en) 2023-04-06

Family

ID=79462885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138155 WO2023050589A1 (en) 2021-09-30 2021-12-15 Intelligent cargo box loading method and system based on rgbd camera

Country Status (2)

Country Link
CN (1) CN113963044B (en)
WO (1) WO2023050589A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503418A (en) * 2023-06-30 2023-07-28 贵州大学 Crop three-dimensional target detection method under complex scene
CN116587327A (en) * 2023-06-20 2023-08-15 广东电网有限责任公司广州供电局 Motion control system, live working robot detection method and related equipment
CN116843631A (en) * 2023-06-20 2023-10-03 安徽工布智造工业科技有限公司 3D visual material separating method for non-standard part stacking in light steel industry
CN117611034A (en) * 2024-01-17 2024-02-27 山东岱岳制盐有限公司 Intelligent conveying control management system for product loading
CN117765065A (en) * 2023-11-28 2024-03-26 中科微至科技股份有限公司 Target detection-based single-piece separated package rapid positioning method

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114890173A (en) * 2022-06-02 2022-08-12 未来机器人(深圳)有限公司 Cargo loading method and device, computer equipment and storage medium
CN115147054B (en) * 2022-09-05 2022-12-02 创新奇智(青岛)科技有限公司 Goods packing planning method and device
CN115619198B (en) * 2022-11-28 2023-05-16 中国外运股份有限公司 Library displacement dynamic programming method, device, electronic equipment and storage medium
CN116228838B (en) * 2023-05-10 2024-03-08 深圳大学 Object boxing reinforcement learning method and related device based on visual detection

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190113140A (en) * 2018-03-27 2019-10-08 한국철도기술연구원 Auto picking system and method for automatically picking using the same
CN111331607A (en) * 2020-04-03 2020-06-26 山东大学 Automatic grabbing and stacking method and system based on mechanical arm
CN111439594A (en) * 2020-03-09 2020-07-24 兰剑智能科技股份有限公司 Unstacking method and system based on 3D visual guidance
CN111470327A (en) * 2020-05-25 2020-07-31 常州墨狄机器人科技有限公司 Visual stacking equipment and visual information processing method thereof
CN111524184A (en) * 2020-04-21 2020-08-11 湖南视普瑞智能科技有限公司 Intelligent unstacking method and system based on 3D vision
CN111932625A (en) * 2020-09-01 2020-11-13 合肥泰禾光电科技股份有限公司 Bagged cargo stack unstacking method based on PointNet model
CN113307042A (en) * 2021-06-11 2021-08-27 梅卡曼德(北京)机器人科技有限公司 Object unstacking method and device based on conveyor belt, computing equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956656A (en) * 2019-12-17 2020-04-03 北京工业大学 Spindle positioning method based on depth target detection
CN113128348B (en) * 2021-03-25 2023-11-24 西安电子科技大学 Laser radar target detection method and system integrating semantic information
CN113052835B (en) * 2021-04-20 2024-02-27 江苏迅捷装具科技有限公司 Medicine box detection method and system based on three-dimensional point cloud and image data fusion
CN113408584B (en) * 2021-05-19 2022-07-26 成都理工大学 RGB-D multi-modal feature fusion 3D target detection method

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116587327A (en) * 2023-06-20 2023-08-15 广东电网有限责任公司广州供电局 Motion control system, live working robot detection method and related equipment
CN116843631A (en) * 2023-06-20 2023-10-03 安徽工布智造工业科技有限公司 3D visual material separating method for non-standard part stacking in light steel industry
CN116843631B (en) * 2023-06-20 2024-04-02 安徽工布智造工业科技有限公司 3D visual material separating method for non-standard part stacking in light steel industry
CN116503418A (en) * 2023-06-30 2023-07-28 贵州大学 Crop three-dimensional target detection method under complex scene
CN116503418B (en) * 2023-06-30 2023-09-01 贵州大学 Crop three-dimensional target detection method under complex scene
CN117765065A (en) * 2023-11-28 2024-03-26 中科微至科技股份有限公司 Target detection-based single-piece separated package rapid positioning method
CN117765065B (en) * 2023-11-28 2024-06-04 中科微至科技股份有限公司 Target detection-based single-piece separated package rapid positioning method
CN117611034A (en) * 2024-01-17 2024-02-27 山东岱岳制盐有限公司 Intelligent conveying control management system for product loading
CN117611034B (en) * 2024-01-17 2024-03-26 山东岱岳制盐有限公司 Intelligent conveying control management system for product loading

Also Published As

Publication number Publication date
CN113963044A (en) 2022-01-21
CN113963044B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
WO2023050589A1 (en) Intelligent cargo box loading method and system based on rgbd camera
CN109870983B (en) Method and device for processing tray stack image and system for warehousing goods picking
CN108555908B (en) Stacked workpiece posture recognition and pickup method based on RGBD camera
US11748618B2 (en) Methods for obtaining normal vector, geometry and material of three-dimensional objects based on neural network
CN112132972B (en) Three-dimensional reconstruction method and system for fusing laser and image data
CN113888631A (en) Designated object grabbing method based on target cutting area
JP2019056966A (en) Information processing device, image recognition method and image recognition program
CN110300292A Projection distortion correction method, device, system and storage medium
CN114724120B (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN110827398A (en) Indoor three-dimensional point cloud automatic semantic segmentation algorithm based on deep neural network
CN111998862B (en) BNN-based dense binocular SLAM method
CN110991227A (en) Three-dimensional object identification and positioning method based on depth-like residual error network
CN114693661A (en) Rapid sorting method based on deep learning
CN110796700A (en) Multi-object grabbing area positioning method based on convolutional neural network
CN111932625A (en) Bagged cargo stack unstacking method based on PointNet model
CN113501167A (en) Binocular vision-based small traditional Chinese medicine package positioning method
CN113723389B (en) Pillar insulator positioning method and device
CN113681552B (en) Five-dimensional grabbing method for robot hybrid object based on cascade neural network
CN114299339A (en) Three-dimensional point cloud model classification method and system based on regional correlation modeling
CN113936047A (en) Dense depth map generation method and system
Qian et al. A novel target detection and localization method in indoor environment for mobile robot based on improved YOLOv5
CN116863371A (en) Deep learning-based AGV forklift cargo pallet pose recognition method
CN113808205B (en) Rapid dynamic target grabbing method based on detection constraint
CN115546594A (en) Real-time target detection method based on laser radar and camera data fusion
Lagahit et al. Exploring FSCNN + focal loss: A faster alternative for road marking classification on mobile LiDAR sparse point cloud derived images

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21959149

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE