WO2023015914A1 - Method, apparatus, electronic device, and storage medium for acquiring the gravity direction of an image - Google Patents


Info

Publication number
WO2023015914A1
WO2023015914A1 · PCT/CN2022/084586 · CN2022084586W
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
image
dimensional
desktop
test
Prior art date
Application number
PCT/CN2022/084586
Other languages
English (en)
French (fr)
Inventor
喻月涵
Original Assignee
达闼机器人股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 达闼机器人股份有限公司
Publication of WO2023015914A1 (zh)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/66Analysis of geometric attributes of image moments or centre of gravity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Definitions

  • Embodiments of the present invention relate to the technical field of computer vision, and in particular to a method, apparatus, electronic device, and storage medium for acquiring the gravity direction of an image.
  • At present, commonly used methods for obtaining the gravity direction of an image include: methods based on AprilTag, in which a QR-code tag from the AprilTag library is pasted on a horizontal plane in the scene and the AprilTag library is then used to detect the pasted tag and obtain its rotation matrix; since this rotation matrix includes the angle between the gravity direction and the Z-axis of the camera coordinate system, obtaining it is equivalent to obtaining the gravity direction; and methods based on an inertial measurement unit (IMU), which obtain the gravity direction directly from the IMU. However, AprilTag-based methods require QR-code tags to be set up in the scene, which is cumbersome, and in many scenarios such tags cannot be placed at all; IMU-based methods suffer from systematic measurement errors that accumulate over time.
  • To address this, methods based on plane detection algorithms such as Random Sample Consensus (RANSAC) have been proposed: the plane detection algorithm is first used to detect the largest plane in the 3D point cloud corresponding to an RGB-D image, and the detected plane is treated as a horizontal plane, so that the prior knowledge that this plane is perpendicular to the gravity direction can be used to obtain the gravity direction. This requires neither peripherals nor manual setup of the scene.
  • The purpose of the embodiments of the present invention is to provide a method, apparatus, electronic device, and storage medium for acquiring the gravity direction of an image, so that the gravity direction of an image can be determined quickly and accurately, and the rotation matrix from the camera coordinate system to the world coordinate system can in turn be obtained quickly and accurately.
  • To achieve the above purpose, an embodiment of the present invention provides a method for acquiring the gravity direction of an image, comprising: acquiring a trained 3D desktop semantic segmentation model, wherein the training set of the 3D desktop semantic segmentation model is a collection of point cloud images, and each point cloud image carries a 3D desktop mask generated from the point cloud where a horizontal plane in that point cloud image is located; performing semantic segmentation on a test point cloud image using the 3D desktop semantic segmentation model to obtain a 3D test desktop mask of the test point cloud image; and determining the test normal direction of the 3D test desktop mask as the gravity direction of the test point cloud image.
  • An embodiment of the present invention also provides an apparatus for acquiring the gravity direction of an image, comprising: an acquisition module, configured to acquire a trained 3D desktop semantic segmentation model, wherein the training set of the 3D desktop semantic segmentation model is a collection of point cloud images, and each point cloud image carries a 3D desktop mask generated from the point cloud where a horizontal plane in that point cloud image is located; a semantic segmentation module, configured to perform semantic segmentation on a test point cloud image using the 3D desktop semantic segmentation model to obtain a 3D test desktop mask of the test point cloud image; and a determination module, configured to determine the test normal direction of the 3D test desktop mask as the gravity direction of the test point cloud image.
  • An embodiment of the present invention also provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method for acquiring the gravity direction of an image described above.
  • An embodiment of the present invention further provides a computer-readable storage medium storing a computer program, wherein the method for acquiring the gravity direction of an image described above is implemented when the computer program is executed by a processor.
  • An embodiment of the present invention further provides a computer program, which implements the method for acquiring the gravity direction of an image described above when executed by a processor.
  • Compared with the prior art, the method starts from the point cloud image: it first obtains a desktop semantic segmentation model trained on a training set of point cloud images annotated with 3D desktop masks, where each mask marks the point cloud in which a horizontal plane in the image lies, so that through training the model learns to identify the point cloud of horizontal-plane regions in an image. When the 3D desktop semantic segmentation model semantically segments a test point cloud image, it therefore produces the 3D test desktop mask of that image; and because the normal of this mask is aligned with the gravity direction, the gravity direction of the test point cloud image is obtained by computing the normal of the 3D test desktop mask. Since a model is used for processing, no iterations are required: the method is much faster and more accurate than plane detection algorithms such as RANSAC, and it also avoids introducing peripherals or manually setting up the scene.
  • Fig. 1 is a flowchart of the method for acquiring the gravity direction of an image in an embodiment of the present invention;
  • Fig. 2 is a flowchart of the method including the step of training the desktop semantic segmentation model in an embodiment of the present invention;
  • Fig. 3 is a flowchart of the method including the step of screening the first 3D point cloud in an embodiment of the present invention;
  • Fig. 4 is a flowchart of the method including the step of determining the normal of the first 3D point cloud in another embodiment of the present invention;
  • Fig. 5 is a flowchart of the method including the step of removing second plane instances whose number of pixels is less than a preset number threshold in another embodiment of the present invention;
  • Fig. 6 is a flowchart of the method including the step of performing 3D reconstruction in another embodiment of the present invention;
  • Fig. 7 is a schematic structural diagram of the apparatus for acquiring the gravity direction of an image in another embodiment of the present invention;
  • Fig. 8 is a schematic structural diagram of an electronic device in another embodiment of the present invention.
  • At present, there are three mainstream approaches to obtaining the gravity direction of an image: AprilTag-based, IMU-based, and plane-detection-based. AprilTag-based methods require the scene to be set up manually, and the overall process takes a long time. IMU-based methods require an IMU to assist in determining the gravity direction; not every camera can be equipped with an IMU, and the IMU's systematic errors accumulate over time. Plane detection algorithms cannot provide semantic information, so a vertical plane such as a wall is easily taken as the detected horizontal plane, yielding a wrong gravity direction; moreover, solving the algorithm requires multiple iterations and cannot meet the timing requirements of real-time tasks. It is therefore necessary to propose a method that can obtain the gravity direction of an image accurately and quickly.
  • To this end, an embodiment of the present invention proposes a method for acquiring the gravity direction of an image, which includes the following steps: acquiring a trained 3D desktop semantic segmentation model, wherein the training set of the model is a collection of point cloud images, and each point cloud image carries a 3D desktop mask generated from the point cloud where a horizontal plane in that point cloud image is located; performing semantic segmentation on a test point cloud image using the 3D desktop semantic segmentation model to obtain a 3D test desktop mask of the test point cloud image; and determining the test normal direction of the 3D test desktop mask as the gravity direction of the test point cloud image.
  • Compared with the prior art, this method starts from the point cloud image: it first obtains a desktop semantic segmentation model trained on a training set of point cloud images annotated with 3D desktop masks, where each mask marks the point cloud in which a horizontal plane in the image lies, so that through training the model learns to identify the point cloud of horizontal-plane regions in an image. Using the 3D desktop semantic segmentation model to semantically segment a test point cloud image therefore yields the 3D test desktop mask of that image; and since the normal direction of this mask is aligned with the gravity direction, the gravity direction of the test point cloud image is obtained by computing the normal of the 3D test desktop mask. Because a model is used for processing, no iterations are needed: the method is much faster and more accurate than plane detection algorithms such as RANSAC, and it also avoids the need for peripherals or manual scene setup.
  • The method for acquiring the gravity direction of an image is applied to devices that need to perform 3D scene reconstruction, such as robots and monitoring equipment. As shown in Figure 1, the method includes:
  • Step 101: obtain a trained 3D desktop semantic segmentation model, wherein the training set of the 3D desktop semantic segmentation model is a collection of point cloud images, and each point cloud image carries a 3D desktop mask generated from the point cloud where a horizontal plane in that point cloud image is located.
  • A point cloud refers to a collection of points in space that carry geometric meaning; for example, a point cloud in 3D space carries the coordinates of its points in a coordinate system. A point cloud image is an image composed of such point clouds. The 3D desktop mask is in effect a semantic mask labeling the horizontal plane in the point cloud image.
  • Through the training set, the desktop semantic segmentation model can learn the features of horizontal planes in point cloud images, so that the trained 3D desktop semantic segmentation model can segment and output the 3D point cloud where a horizontal plane in a point cloud image is located.
  • Note that "desktop" here assumes a table placed upright in the scene with a horizontal tabletop; that is, the tabletop is used to represent a standard horizontal plane, not specifically a desk in the scene. The mask highlights the region of interest in the image so as to distinguish the masked region from other regions.
  • Step 102: perform semantic segmentation on the test point cloud image using the 3D desktop semantic segmentation model to obtain a 3D test desktop mask of the test point cloud image.
  • The test point cloud image may be any point cloud image acquired by a camera or other device.
  • Specifically, step 102 may be: obtain a test RGB-D image and project it from 2D to 3D to obtain the test point cloud image, or use a point cloud image collected directly by a sensor; then input the test point cloud image into the 3D desktop semantic segmentation model and obtain the 3D desktop mask it outputs, namely the 3D test desktop mask.
  • The 3D desktop semantic segmentation model can quickly perform semantic segmentation on the test point cloud image and output a 3D desktop mask. Because the model directly outputs a 3D point cloud rather than 2D plane information, the 2D-to-3D conversion step is avoided and the procedure is simplified.
  • Step 103: determine the test normal direction of the 3D test desktop mask as the gravity direction of the test point cloud image.
  • Specifically, the Principal Component Analysis (PCA) algorithm is used to compute the normal direction of the 3D test desktop mask as the test normal direction, which gives the gravity direction of the test point cloud image.
  • The PCA algorithm applies an orthogonal transformation to linearly transform the observed values of a set of possibly correlated variables, projecting them onto the values of a set of linearly uncorrelated variables; these uncorrelated variables are called principal components.
  • Specifically, the centroid of the neighborhood is subtracted from each of the n neighborhood points to obtain an n × 3 matrix X.
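The PCA-based normal computation described above can be sketched as follows; this is a minimal illustration in Python/NumPy, with the function name and sample data assumed for demonstration rather than taken from the patent:

```python
import numpy as np

def pca_normal(points):
    """Estimate the normal of a near-planar 3D point set by PCA:
    center the points to form the n x 3 matrix X, then take the
    eigenvector of X^T X with the smallest eigenvalue."""
    X = points - points.mean(axis=0)             # subtract the neighborhood centroid
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                       # direction of least variance
    return normal / np.linalg.norm(normal)

# Points scattered on the plane z = 0 should yield a normal of (0, 0, +/-1).
rng = np.random.default_rng(0)
plane = np.c_[rng.uniform(-1, 1, (100, 2)), np.zeros(100)]
n = pca_normal(plane)
```

The eigenvector with the smallest eigenvalue is the direction of least variance in the point set, which for a planar patch is exactly the plane normal.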
  • In some embodiments, the method for acquiring the gravity direction of an image further includes:
  • Step 104: perform plane detection on a preset data set comprising several RGB-D images to obtain several first plane instances in each RGB-D image.
  • The data set may be an open-source data set, such as the SUN RGB-D data set or the NYU Depth Dataset V2, or a data set produced according to actual needs.
  • An RGB-D image may consist of two images, an RGB image and a depth image, or be a single RGB image annotated with depth information; details are omitted here.
  • Specifically, tools such as PlaneRCNN are used for plane detection to obtain each plane in each RGB-D image, such as the faces of tables, walls, and chairs.
  • Each RGB-D image yields several first plane instances, and different RGB-D images may contain first plane instances of different numbers and sizes because their scenes differ.
  • Step 105: identify and remove vertical planes from the first plane instances according to the semantic segmentation labels carried by the data set to obtain second plane instances.
  • For example, when the semantic segmentation labels include a flower label, the region where flowers are located can be identified and marked in the RGB-D image; when the labels include a table label, the region where a table is located can be identified and marked.
  • Through the semantic segmentation labels, the various things in an RGB-D image can be identified, so the object corresponding to each first plane instance is known. Horizontal planes are then screened out according to the characteristics of those objects: first plane instances corresponding to non-horizontal surfaces such as walls are removed, while first plane instances corresponding to tables and other objects with horizontal surfaces are retained.
  • Since the first plane instances are obtained by plane detection, they contain planes of various types, including inclined, vertical, and horizontal planes, and of various sizes. It is therefore necessary to remove all planes other than horizontal ones, so that an accurate horizontal plane is obtained before generating the desktop mask. This avoids treating a wall, a photo frame leaning on the desktop, or similar surfaces as the horizontal plane and thereby producing a wrong desktop mask that would harm the accuracy of the desktop semantic segmentation model.
  • Step 106: project the second plane instances from 2D to 3D to obtain the first 3D point clouds.
  • It should be noted that the first 3D point cloud is actually located in the camera coordinate system.
  • The camera coordinate system is a 3D Cartesian coordinate system established with the optical center of the camera as the origin and the optical axis as the Z-axis; the x-axis and y-axis are parallel to the X and Y axes of the image, and the z-axis, which is the camera's optical axis, is perpendicular to the image plane.
  • An RGB-D image is in effect a 3D image: depth information is added to the 2D information represented by RGB. Since the camera views the scene from its own position, objects in the scene appear in perspective; therefore, when a second plane instance is converted from 2D to 3D using the depth information of its region, the coordinates obtained are naturally coordinates in the camera coordinate system.
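The 2D-to-3D projection of a plane instance's region can be sketched with the standard pinhole camera model; the patent does not fix the intrinsics or names used here, so this is a hedged illustration only:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map into a point cloud in the camera
    coordinate system using the pinhole model:
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# A flat depth map: the pixel at the principal point maps to (0, 0, Z).
depth = np.full((3, 3), 2.0)
pts = backproject(depth, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```

In practice only the pixels inside the plane instance's mask would be back-projected, using the intrinsics carried by the data set.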
  • Step 107: use the first 3D point clouds belonging to the same RGB-D image as the 3D desktop mask of the corresponding point cloud image to obtain a training set.
  • That is, for each RGB-D image, all the first 3D point clouds corresponding to it are collected and used as a 3D desktop mask, so that this mask, together with the point cloud image obtained by projecting the RGB-D image from 2D to 3D, forms one training sample; multiple training samples form the training set.
  • Step 108: train the initial 3D desktop semantic segmentation model using the training set.
  • The desktop semantic segmentation model is a neural network for semantic segmentation. After training on a training set of point cloud images annotated with 3D desktop masks, it learns the features of horizontal planes in point cloud images and can then be used to determine the 3D desktop mask of a point cloud image.
  • In some embodiments, before step 107, the method for acquiring the gravity direction of an image further includes:
  • Step 109: screen the first 3D point clouds according to the degree of deviation of their normal directions in the world coordinate system from the Z-axis direction.
  • The degree of deviation of a first 3D point cloud's normal direction in the world coordinate system from the Z-axis direction may be the angle between the normal and the Z-axis, the distance between the normal and the vector (0, 0, 1), the correlation between the normal and the vector (0, 0, 1), or similar; other parameters describing the relationship between the normal direction and the Z-axis may also serve as the degree of deviation, and the details are omitted here.
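The deviation measures listed above can each be computed directly. A sketch follows; taking the absolute value to handle the sign ambiguity of a PCA normal is an assumption not stated in the patent:

```python
import numpy as np

Z_AXIS = np.array([0.0, 0.0, 1.0])

def deviation_metrics(normal):
    """Three measures of how far a plane normal deviates from the world
    Z-axis: the included angle, the distance to (0, 0, +/-1), and the
    correlation (cosine similarity) with (0, 0, 1)."""
    n = normal / np.linalg.norm(normal)
    corr = abs(n @ Z_AXIS)                                   # cosine similarity
    angle = np.degrees(np.arccos(np.clip(corr, 0.0, 1.0)))   # included angle in degrees
    dist = min(np.linalg.norm(n - Z_AXIS), np.linalg.norm(n + Z_AXIS))
    return angle, dist, corr
```

Any one of these can serve as the screening criterion; the angle form is the one the later steps use with a preset threshold.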
  • Correspondingly, step 107 becomes: project the RGB-D image from 2D to 3D to obtain a point cloud image, and determine the second 3D point clouds belonging to the same point cloud image as its 3D desktop mask, thereby obtaining the training set, wherein the second 3D point clouds are the first 3D point clouds retained after the screening.
  • That is, the first 3D point clouds used to obtain the 3D desktop mask in step 107 are those retained after the screening in step 109.
  • After the screening based on semantic segmentation labels, some non-horizontal planes may still be retained; for example, a table in the scene may be slightly tilted because of quality problems, with a tilt angle too small to be caught by the labels alone. The planes can therefore be further screened by the degree of deviation between their normal directions in the world coordinate system and the Z-axis of the world coordinate system, ensuring that the retained second plane instances are horizontal planes. This improves the accuracy of the desktop masks determined from the second plane instances, yields a more accurately labeled training set, improves the accuracy of the desktop masks the model obtains through semantic segmentation, and ultimately improves the accuracy of the normal direction determined from the desktop mask.
  • In other embodiments, it is also possible to obtain the point cloud image first, perform plane detection on it, compute the normal direction of each detected plane in the world coordinate system, screen the detected planes by the degree of deviation between their normal directions and the Z-axis, retain the planes whose deviation meets the requirements, generate the 3D desktop mask, and then obtain the training set from the point cloud image and its 3D desktop mask; details are omitted here.
  • In some embodiments, step 109 of the method for acquiring the gravity direction of an image includes:
  • Step 1091: transform the first 3D point cloud into the world coordinate system according to the extrinsic parameters carried by the data set.
  • That is, the data set carries camera extrinsic parameters, which are used to convert the first 3D point cloud from the camera coordinate system to the world coordinate system.
  • Step 1092: determine the normal of the plane where the first 3D point cloud lies after the coordinate system transformation.
  • Specifically, the PCA algorithm is used to solve for the normal of the plane where the first 3D point cloud lies.
  • Step 1093: remove every transformed first 3D point cloud whose normal forms an angle with the Z-axis of the world coordinate system that exceeds a preset angle threshold.
  • If a first 3D point cloud is truly the point cloud of a horizontal plane, its normal direction is the gravity direction, i.e., it points along the Z-axis in the world coordinate system; if it is not, its normal direction deviates from the gravity direction by some included angle. Therefore, whether a first 3D point cloud is the point cloud of a real horizontal plane can be judged by whether its normal direction in the world coordinate system deviates from the Z-axis direction. Considering measurement and calculation errors, the normal direction of a first 3D point cloud obtained from a second plane instance may not coincide exactly with the Z-axis even for a true horizontal plane; the judgment is therefore based on the degree of deviation rather than exact coincidence. For example, when the angle between the normal direction of a first 3D point cloud in the world coordinate system and the Z-axis is within 5°, that point cloud is considered the point cloud of a horizontal plane and is retained.
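Steps 1091–1093 can be sketched together as follows; the 4×4 extrinsic format, the SVD-based normal estimation, and all names are assumptions made for illustration:

```python
import numpy as np

def keep_horizontal(clouds, extrinsic, max_angle_deg=5.0):
    """Transform each candidate point cloud into the world coordinate
    system (step 1091), estimate its plane normal by PCA/SVD (step 1092),
    and keep it only if the normal is within `max_angle_deg` of the
    world Z-axis (step 1093)."""
    R, t = extrinsic[:3, :3], extrinsic[:3, 3]
    kept = []
    for pts in clouds:
        world = pts @ R.T + t                                  # camera -> world
        X = world - world.mean(axis=0)
        normal = np.linalg.svd(X, full_matrices=False)[2][-1]  # least-variance direction
        cos = abs(normal @ np.array([0.0, 0.0, 1.0]))
        angle = np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))
        if angle <= max_angle_deg:
            kept.append(world)
    return kept

# A horizontal patch (z = 0) passes the 5-degree test; a vertical one (x = 0) does not.
rng = np.random.default_rng(1)
horiz = np.c_[rng.uniform(-1, 1, (50, 2)), np.zeros(50)]
vert = np.c_[np.zeros(50), rng.uniform(-1, 1, (50, 2))]
kept = keep_horizontal([horiz, vert], np.eye(4))
```

The right-singular vector with the smallest singular value plays the same role here as the smallest-eigenvalue eigenvector in the PCA formulation.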
  • In some embodiments, the method for acquiring the gravity direction of an image further includes:
  • Step 110: from all the obtained second plane instances, remove those whose number of pixels is less than a preset number threshold.
  • Correspondingly, step 106 becomes: project the remaining second plane instances from 2D to 3D to obtain the first 3D point clouds.
  • Because the table category may contain the table's sides or leg planes, the plane instances must be filtered. Empirically, a plane instance with few pixels is more likely to be a table leg or a table side than a tabletop; and even if it is a tabletop, its quality may be greatly affected by inaccurate semantic segmentation labels, making the corresponding plane instance produce more noise points overall. Removing second plane instances with a small number of pixels therefore avoids this interference and makes it more likely that the remaining second plane instances are horizontal planes; it also reduces the number of second plane instances processed in step 107 and the number of first 3D point clouds processed in step 109, speeding up processing and improving efficiency.
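Step 110 amounts to a simple count filter on the instance masks. A sketch follows; the threshold value and the boolean-mask representation are assumptions, not values from the patent:

```python
import numpy as np

def filter_by_pixel_count(plane_instances, min_pixels=500):
    """Drop plane instances (boolean pixel masks) whose pixel count is
    below a preset threshold, before the 2D-to-3D projection."""
    return [m for m in plane_instances if int(m.sum()) >= min_pixels]

big = np.ones((30, 30), dtype=bool)    # 900 pixels: plausibly a tabletop
small = np.zeros((30, 30), dtype=bool)
small[:3, :3] = True                   # 9 pixels: likely a table leg or side
kept = filter_by_pixel_count([big, small])
```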
  • In some embodiments, the obtained gravity direction can be used for 3D reconstruction.
  • In this case, the method for acquiring the gravity direction of an image further includes:
  • Step 111 determine the rotation and translation matrix from the camera coordinate system to the world coordinate system according to the gravity direction.
  • That is, the angle between the gravity direction and the Z-axis of the camera coordinate system formed based on the image plane is the angle between the camera coordinate system and the world coordinate system in the Z-axis direction, from which the rotation and translation matrix can be determined.
  • Step 112: perform 3D reconstruction of the scene captured by the camera according to the rotation and translation matrix.
  • That is, the scene captured by the camera is projected from 2D into the 3D space based on the camera coordinate system, and the rotation and translation matrix is then used to transform it from the camera coordinate system to the world coordinate system, yielding the result of the 3D reconstruction.
  • The division of steps in the above methods is only for clarity of description. In implementation, steps may be combined into one, or a step may be split into multiple steps; as long as the same logical relationship is included, such variations fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, the algorithm or process without changing its core design also falls within the protection scope of this patent.
  • An embodiment of the present invention also provides an apparatus for acquiring the gravity direction of an image, as shown in FIG. 7, including:
  • The acquisition module 701 is configured to obtain the trained 3D desktop semantic segmentation model, wherein the training set of the 3D desktop semantic segmentation model is a collection of point cloud images, and each point cloud image carries a 3D desktop mask generated from the point cloud where a horizontal plane in that point cloud image is located.
  • the semantic segmentation module 702 is configured to perform semantic segmentation on the test point cloud image by using the three-dimensional desktop semantic segmentation model to obtain a three-dimensional test desktop mask of the test point cloud image.
  • the determination module 703 is configured to determine the test normal direction of the three-dimensional test desktop mask as the gravity direction of the test point cloud image.
  • This embodiment is an apparatus embodiment corresponding to the embodiment of the method for acquiring the gravity direction of an image, and the two can be implemented in cooperation with each other. The relevant technical details mentioned in the method embodiment remain valid in this embodiment and are not repeated here to reduce repetition; correspondingly, the relevant technical details mentioned in this embodiment also apply to the method embodiment.
  • The modules involved in this embodiment are logical modules. In practical application, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present invention, units not closely related to solving the technical problem proposed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
  • An embodiment of the present invention also provides an electronic device, as shown in FIG. 8, including at least one processor 801 and a memory 802 communicatively connected to the at least one processor 801, wherein the memory 802 stores instructions executable by the at least one processor 801, and the instructions are executed by the at least one processor 801 so that the at least one processor 801 can perform the method for acquiring the gravity direction of an image described in any of the above method embodiments.
  • the memory 802 and the processor 801 are connected by a bus, and the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors 801 and various circuits of the memory 802 together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the data processed by the processor 801 is transmitted on the wireless medium through the antenna, and further, the antenna also receives the data and transmits the data to the processor 801 .
  • the processor 801 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interface, voltage regulation, power management and other control functions. And the memory 802 may be used to store data used by the processor 801 when performing operations.
  • Embodiments of the present invention relate to a computer-readable storage medium storing a computer program.
  • the above method embodiments are implemented when the computer program is executed by the processor.
  • An embodiment of the present invention also provides a computer program, which implements the above method embodiments when the computer program is executed by a processor.
  • the program is stored in a storage medium and includes several instructions to make a device (which may be a microcontroller, a chip, etc.) or a processor execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disks, optical discs, and other media that can store program code.


Abstract

一种图像重力方向的获取方法、装置、电子设备及存储介质,方法包括:获取训练好的三维桌面语义分割模型,其中,三维桌面语义分割模型的训练集为若干点云图像组成的集合,点云图像携带根据点云图像中水平平面所在的点云生成的三维桌面掩码(101);利用三维桌面语义分割模型对测试点云图像进行语义分割,得到测试点云图像的三维测试桌面掩码(102);确定三维测试桌面掩码的测试法线方向作为测试点云图像的重力方向(103)。能够快速准确地确定出图像的重力方向,从而能够迅速得到准确的从相机坐标系到世界坐标系的旋转矩阵。

Description

图像重力方向的获取方法、装置、电子设备及存储介质
本申请要求2021年8月12日递交的、申请号为“202110926336.0”、发明名称为“图像重力方向的获取方法、装置、电子设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明实施例涉及计算机视觉技术领域,特别涉及一种图像重力方向的获取方法、装置、电子设备及存储介质。
背景技术
在机器人抓取物体等各种需要进行三维重建的场景下,需要将相机采集到的相机坐标系下的RGB-D图像通过旋转矩阵转换到世界坐标系下。为了得到旋转矩阵通常需要知道坐标系中三个坐标轴的旋转角度,其中,由于世界坐标系中的Z轴通常和重力方向一致,因此,通常是确定图像的重力方向,然后根据重力方向和相机坐标系的夹角确定Z轴的旋转角度。目前常用的获取图像重力方向的方法包括:基于AprilTag,即通过在场景中的水平平面粘贴AprilTag库中的二维码标签,然后利用AprilTag库对粘贴的标签进行检测,得到标签的标识和相机相对于标签的旋转矩阵,其中,旋转矩阵中包含了重力方向和相机坐标系Z轴的夹角,相当于获取了重力方向;基于惯性测量单元(Inertial measurement unit,IMU)获取重力方向;但是考虑到基于AprilTag时要求事先在场景中设置二维码标签,在实现时非常繁琐,很多场景不能人为设置二维码标签,基于IMU时又由于IMU在测量时存在系统误差并会随着时间进行误差积累,在某些场景下为相机配置IMU不仅成本高,还不易实现,为此提出基于随机采样一致性(Random Sample Consensus,RANSAC)等平面检测算法的方法,即首先利用平面检测算法来检测RGB-D图像对应的三维点云中的一个最大平面,并认为检测出来的平面为水平平面,从而利用检测出来的平面垂直于重力方向的先验知识获取重力方向,既不需要引入外设,也不需要人工对场景进行设置。
然而,RGB-D图像中可能存在平行于重力方向的大平面如墙壁等,此时由于RANSAC等平面检测算法不能结合语义信息,会将墙壁等平行于重力方向的平面作为检测到的最大平面,导致错误估计重力方向,并且RGB-D图像在三维坐标中很可能对应大量的点云,RANSAC等平面检测算法通过处理大量点云检测RGB-D图像中的水平平面时,需要经过多次迭代,速度慢耗时久,不能支持实时任务。亟需一种方法能够快速准确地确定图像的重力方向。
发明内容
本发明实施方式的目的在于提供一种图像重力方向的获取方法、装置、电子设备及存储介质,使得能够快速准确地确定出图像的重力方向,从而能够迅速得到准确的从相机坐标系到世界坐标系的旋转矩阵。
为达到上述目的,本发明的实施例提供了一种图像重力方向的获取方法,包括:获取训练好的三维桌面语义分割模型,其中,所述三维桌面语义分割模型的训练集为若干点云图像组成的集合,所述点云图像携带根据所述点云图像中水平平面所在的点云生成的三维桌面掩码;利用所述三维桌面语义分割模型对测试点云图像进行语义分割,得到所述测试点云图像的三维测试桌面掩码;确定所述三维测试桌面掩码的测试法线方向作为所述测试点云图像的重力方向。
为达到上述目的,本发明的实施例还提供了一种图像重力方向的获取装置,包括:获取模块,用于获取训练好的三维桌面语义分割模型,其中,所述三维桌面语义分割模型的训练集为若干点云图像组成的集合,所述点云图像携带根据所述点云图像中水平平面所在的点云生成的三维桌面掩码;语义分割模块,用于利用所述三维桌面语义分割模型对测试点云图像进行语义分割,得到所述测试点云图像的三维测试桌面掩码;确定模块,用于确定所述三维测试桌面掩码的测试法线方向作为所述测试点云图像的重力方向。
为达到上述目的,本发明的实施例还提供了一种电子设备,包括:至少一个处理器;以及,与所述至少一个处理器通信连接的存储器;其中,所述存储器存储有可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行如上所述的图像重力方向的获取方法。
为达到上述目的,本发明的实施例还提供了一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时实现如上所述的图像重力方向的获取方法。
为达到上述目的,本发明的实施例还提供了一种计算机程序,所述计算机程序被处理器执行时实现如上所述的图像重力方向的获取方法。
本发明实施例提供的图像重力方向的获取方法,从点云图像的角度出发,先获取经由若干标记有三维桌面掩码的点云图像组成的训练集训练好的桌面分割语义模型,并且点云图像中标记的三维桌面掩码为点云图像中的水平平面所在的点云,这样训练好的三维桌面语义分割模型通过训练能够识别图像中水平平面区域所在的点云,因此,利用三维桌面语义分割模型对测试点云图像进行语义分割时,能够得到测试点云图像的三维测试桌面掩码,进而由于测试点云图像中的三维测试桌面掩码的法线与重力方向一致,因此,通过获取三维测试桌面掩码的法线就能够得到测试点云图像的重力方向,并且由于是利用模型进行处理,不需要进行多次迭代,速度要远快于RANSAC等平面检测算法且准确性还要更高,还避免了将RGB-D图像作为模型输入时,需要利用模型对测试RGB-D图像中检测到的多个水平平面进行二维到三维的转换的情况,跳过维度转换的过程直接确定三维语义分割模型输出的三维桌面掩码进而确定出三维桌面掩码的法线方向,效率更高。
附图说明
一个或多个实施例通过与之对应的附图中的图片进行示例性说明,这些示例性说明并不构成对实施例的限定,附图中具有相同参考数字标号的元件表示为类似的元件,除非有特别申明,附图中的图不构成比例限制。
图1是本发明实施例中的图像重力方向的获取方法的流程图;
图2是本发明实施例中的包括训练桌面语义分割模型步骤的图像重力方向的获取方法的流程图;
图3是本发明实施例中的包括筛选第一三维点云步骤的图像重力方向的获取方法的流程图;
图4是本发明另一实施例中的包括确定第一三维点云法线步骤的图像重力方向的获取方法的流程图;
图5是本发明另一实施例中的包括去除像素点数量少于预设数量阈值的第二平面实例步骤的图像重力方向的获取方法的流程图;
图6是本发明另一实施例中的包括进行三维重建步骤的图像重力方向的获取方法的流程图;
图7是本发明另一实施例中的图像重力方向的获取装置的结构示意图;
图8是本发明另一实施例中的电子设备的结构示意图。
具体实施方式
由背景技术可知,获取图像重力方向的方法主要包括基于AprilTag、基于IMU和基于平面检测算法三种主流的方式,但是基于AprilTag需要人工对场景进行设置,且整体流程耗时长;基于IMU需要引入IMU辅助确定重力方向,不是所有场景都能够为相机配置IMU,且IMU存在系统误差并会随着时间推移导致误差积累;基于平面检测算法,则又会由于无法提供语义信息,容易将墙壁等竖直平面作为检测到的水平平面,进而得到错误的重力方向,以及算法求解需要多次迭代无法满足实时任务的时间要求。因此,需要提出一种能够准确快速地获取图像重力方向的方法。
为解决上述问题,本发明实施例提出了一种图像重力方向的获取方法,包括以下步骤:获取训练好的三维桌面语义分割模型,其中,所述三维桌面语义分割模型的训练集为若干点云图像组成的集合,所述点云图像携带根据所述点云图像中水平平面所在的点云生成的三维桌面掩码;利用所述三维桌面语义分割模型对测试点云图像进行语义分割,得到所述测试点云图像的三维测试桌面掩码;确定所述三维测试桌面掩码的测试法线方向作为所述测试点云图像的重力方向。
本发明实施例提供的图像重力方向的获取方法,从点云图像的角度出发,先获取经由若干标记有三维桌面掩码的点云图像组成的训练集训练好的桌面分割语义模型,并且点云图像中标记的三维桌面掩码为点云图像中的水平平面所在的点云,这样训练好的三维桌面语义分割模型通过训练能够识别图像中水平平面区域所在的点云,因此,利用三维桌面语义分割模型对测试点云图像进行语义分割时,能够得到测试点云图像的三维测试桌面掩码,进而由于测试点云图像中的三维测试桌面掩码的法线方向与重力方向一致,因此,通过获取三维测试桌面掩码的法线就能够得到测试点云图像的重力方向,并且由于是利用模型进行处理,不需要进行多次迭代,速度要远快于RANSAC等平面检测算法且准确性还要更高,还避免了将RGB-D图像作为模型输入时,需要利用模型对测试RGB-D图像中检测到的多个水平平面进行二维到三维的转换的情况,跳过维度转换的过程直接确定三维语义分割模型输出的三维桌面掩码进而确定出三维桌面掩码的法线方向,效率更高。
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合附图对本发明的各实施方式进行详细的阐述。然而,本领域的普通技术人员可以理解,在本发明各实施方式中,为了使读者更好地理解本申请而提出了许多技术细节。但是,即使没有这些技术细节和基于以下各实施方式的种种变化和修改,也可以实现本申请所要求保护的技术方案。以下各个实施例的划分是为了描述方便,不应对本发明的具体实现方式构成任何限定,各个实施例在不矛盾的前提下可以相互结合相互引用。
下面将对本实施例的图像重力方向的获取方法的实现细节进行具体的说明,以下内容仅为方便理解提供的实现细节,并非实施本方案的必须。
在一些实施例中,图像重力方向的获取方法应用于需要进行三维场景重建的设备上,如机器人、监控设备等,如图1所示,图像重力方向的获取方法包括:
步骤101,获取训练好的三维桌面语义分割模型,其中,三维桌面语义分割模型的训练集为若干点云图像组成的集合,点云图像携带根据点云图像中水平平面所在的点云生成的三维桌面掩码。
本实施例中,点云是指空间内由点组成的集合,这些点具有几何意义,如三维空间的点云携带其在坐标系中的坐标信息;点云图像是由若干点云组成的图像;三维桌面掩码实际是点云图像中水平平面的语义掩码标记。
值得一提的是,由于三维桌面掩码为点云图像中的水平平面所在的点云,因此,桌面语义分割模型能够通过训练集对点云图像中的水平平面特征进行学习,从而能够识别出点云图像中水平平面所在的三维点云。
需要说明的是,场景中的桌子竖直放置并且桌面呈现水平状态,即以桌面表示标准的水平平面,而不是特指场景中桌面这一事物;掩码用于对图像中的感兴趣的区域进行突出显示,从而将掩码所在区域与图像中的其他区域区分开来。
步骤102,利用三维桌面语义分割模型对测试点云图像进行语义分割,得到测试点云图像的三维测试桌面掩码。
本实施例中,测试点云图像可以是利用相机或其他设备获取的任何一张点云图像。
需要说明的是,步骤102可以是获取测试RGB-D图像,然后对RGB-D图像进行从二维到三维的投影得到测试点云图像,或者直接使用传感器采集到的点云图像,接着将测试点云图像输入三维桌面语义分割模型,获取三维桌面语义分割模型输出的三维桌面掩码,即三维测试桌面掩码。
值得一提的是,利用三维桌面语义分割模型处理图像,能够快速地在测试点云图像中进行语义分割并输出三维桌面掩码,并且由于能够直接由模型输出三维点云,而不是二维平面信息,避免了二维到三维的转化过程,操作流程更加简化。
步骤103,确定三维测试桌面掩码的测试法线方向作为测试点云图像的重力方向。
具体地说,利用主成分分析(Principal Component Analysis,PCA)算法计算三维测试桌面掩码的法线方向作为测试法线方向,进而得到测试点云图像的重力方向。其中,PCA算法是一种利用正交变换来对一系列可能相关的变量的观测值进行线性变换,从而投影为一系列线性不相关变量的值,这些不相关变量称为主成分。在一组三维点集中,将每个邻域点与该邻域的中心点相减,得到一个n×3的矩阵X。接着,用奇异值分解(Singular Value Decomposition,SVD)对该矩阵进行分解,得到:X=UΣV^T,V中的最后一列(即V^T的最后一行)就是最小奇异值对应的奇异向量,也就是要求解的表征法线方向的法向量n。
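上述PCA/SVD求法线的过程可以写成如下numpy草图(仅为示意性实现,函数名plane_normal为本文假设,并非本专利的正式代码):

```python
import numpy as np

def plane_normal(points: np.ndarray) -> np.ndarray:
    """Estimate the unit normal of a roughly planar 3-D point set via PCA/SVD.

    points: (n, 3) array. After centering we get the n x 3 matrix X; in
    X = U @ S @ V^T the right singular vector with the smallest singular
    value spans the direction of least variance, i.e. the plane normal.
    """
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]  # singular values are sorted descending, so the last row wins
    return normal / np.linalg.norm(normal)

# A noiseless horizontal patch: its normal must be +-(0, 0, 1).
pts = np.array([[x, y, 0.0] for x in range(5) for y in range(5)])
n = plane_normal(pts)
print(np.abs(n))  # -> [0. 0. 1.]
```

实际点云含噪声时,该估计对少量离群点仍较稳定,但大量非平面点会使最小奇异方向偏离真实法线。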
进一步地,在一些实施例中还包括如何对桌面语义分割模型的训练,如图2所示,步骤101之前,图像重力方向的获取方法还包括:
步骤104,对预设的包含有若干RGB-D图像的数据集进行平面检测,得到每个RGB-D图像中的若干第一平面实例。
本实施例中,数据集可以是开源数据集,如SUNRGBD数据集、NYU Depth Dataset V2数据集等,也可以是针对实际需求制作得到的数据集。RGB-D图像可以是两张图像,即RGB图像和深度图像,也可以是标记有深度信息的RGB图像。此处就不再一一赘述了。
具体地说,利用PlaneRCNN等工具进行平面检测,得到每个RGB-D图像中的各个平面,如桌子的各个面、墙壁、椅子的各个面等。
需要说明的是,每个RGB-D图像都会得到若干第一平面实例,不同的RGB-D图像由于场景不同,第一平面实例的数量、大小等都可能不同。
步骤105,根据数据集携带的语义分割标签识别并去除第一平面实例中的竖直平面,得到第二平面实例。
具体地说,不论是开源数据集还是用户自主制作的数据集,都会携带语义分割标签,以对数据集中的RGB-D图像进行语义分割,如语义分割标签包括花朵标签时能够从RGB-D图像中识别并标记出花朵所在区域、语义分割标签包括桌子标签时能够从RGB-D图像中识别并标记出桌子所在区域。因此,根据数据集携带的语义分割标签,能够将RGB-D图像中各种事物识别出来,从而能够得知第一平面实例所对应的场景中的事物,从而根据事物的特性在第一平面实例中筛选出水平平面,如将墙壁等非水平平面所对应的第一平面实例去除掉,将桌子等具有水平平面的事物所对应的第一平面实例保留。
值得一提的是,由于第一平面实例是平面检测得到,因此,第一平面实例中包含各个不同的平面,包括倾斜平面、竖直平面、水平平面等不同类型的平面,平面的大小也可能不同,因此,需要将除水平平面之外的其他平面去除掉,保留水平平面,从而得到准确的水平平面进而生成桌面掩码,避免将墙体、桌面上倾斜放置的相框等作为水平平面,进而得到错误的桌面掩码,影响桌面语义分割模型的准确性。
步骤106,将第二平面实例从二维投影到三维,得到第一三维点云。
本实施例中,第一三维点云实际位于相机坐标系,其中,相机坐标系是以相机的聚焦中心为原点,以光轴为Z轴建立的三维直角坐标系,即坐标系的原点为相机的光心,x轴与y轴与图像的X、Y轴平行,z轴为相机光轴,它与图像平面垂直。
需要说明的是,RGB-D图像实际为三维图像,即在RGB表示的二维信息的基础上加上深度信息,并且由于相机在对场景进行拍摄时是从自身出发对场景中的物体进行透视投影得到的,因此,第二平面实例通过其所在区域的深度信息从二维转化到三维时,得到的坐标信息自然就是相机坐标系内的坐标。
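步骤106中由深度信息将二维像素投影为相机坐标系下三维点的过程,可以用标准针孔相机模型写成如下草图(相机内参fx、fy、cx、cy为假设参数,本专利并未限定具体相机模型):

```python
import numpy as np

def backproject(depth: np.ndarray, fx: float, fy: float,
                cx: float, cy: float) -> np.ndarray:
    """Lift a depth map into a camera-frame point cloud (pinhole model).

    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth[v, u].
    Returns an (m, 3) array; zero-depth (invalid) pixels are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]

depth = np.full((4, 4), 2.0)  # a flat surface 2 m in front of the camera
pts = backproject(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(pts.shape)  # -> (16, 3)
```

对第二平面实例做投影时,只需在back-projection前用实例掩码筛出对应像素即可。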
步骤107,将属于同一RGB-D图像的第一三维点云作为相应的点云图像的三维桌面掩码,得到训练集。
具体地说,对于同一RGB-D图像,获取其对应的所有第一三维点云并作为三维桌面掩码,从而将三维桌面掩码和RGB-D图像从二维投影到三维得到的点云图像作为一个训练数据,多个训练数据组成训练集。
步骤108,利用训练集对初始的三维桌面语义分割模型进行训练。
需要说明的是,桌面语义分割模型为用于语义分割的神经网络,经过若干标记有三维桌面掩码的点云图像组成的训练集的训练学习,能够掌握水平平面的特征,进而能够用于从点云图像中确定出三维桌面掩码。
进一步地,在一些场景中点云图像中可能包含桌面上倾斜放置的相框等物体的点云,从而导致将近似标准水平平面识别为标准水平平面,出现数据集不准确的情况,因此,在一些实施例中,如图3所示,步骤107之前图像重力方向的获取方法还包括:
步骤109,根据第一三维点云在世界坐标系内的法线方向相对于Z轴方向的偏离程度对第一三维点云进行筛选。
具体地说,第一三维点云在世界坐标系内的法线方向相对于Z轴方向的偏离程度可以是法线方向与Z轴的夹角、法线和向量(0,0,1)的距离、法线和向量(0,0,1)的相关性等,当然,以上仅为具体的举例说明,还可以是将法线方向与Z轴的其他参数作为偏离程度,此处就不再一一赘述了。
需要说明的是,在本实施例中步骤107为:将RGB-D图像从二维投影到三维,得到点云图像,将属于同一点云图像的第二三维点云确定为三维桌面掩码,得到训练集,其中,第二三维点云为筛选后被保留的第一三维点云。
即步骤107中的用于获取三维桌面掩码是步骤109筛选后被保留下来的第一三维点云。
值得一提的是,执行步骤105之后可能仍然会保留部分非水平平面,如场景中的桌子由于存在质量问题桌面为倾斜状态但倾斜角度很小等,此时,再根据第一三维点云在世界坐标系的法线方向和世界坐标系的Z轴的偏离程度来进一步筛选,能够进一步对平面进行筛选,保证得到的第二平面实例是水平平面,从而提高由第二平面实例确定的桌面掩码的准确性,得到标记更加准确的训练集,提高模型通过语义分割得到桌面掩码的准确性,最终提高根据桌面掩码确定的法线方向的准确性。
需要说明的是,在其他实施例中,还可以是获取点云图像,然后对点云图像进行平面检测,接着计算检测到的平面在世界坐标系下的法线方向,根据世界坐标系下的法线方向与z轴的偏离程度对检测到的平面进行筛选,保留偏离长度满足要求的平面,并生成三维桌面掩码,进而根据点云图像及其三维桌面掩码得到训练集。当然,还有其他方式获取训练集,此处就不再一一赘述了。
进一步地,在一些实施例中利用法线方向与Z轴方向的夹角来衡量,如图4所示,图像重力方向的获取方法中步骤109包括:
步骤1091,根据数据集携带的外部参数将第一三维点云转换到世界坐标系下。
具体地说,数据集携带了相机参数,以实现将相机坐标系下地第一三维点云转换到世界坐标系。
步骤1092,确定坐标系转换后的第一三维点云所在平面的法线。
具体地说,利用PCA算法求解第一三维点云所在平面的法线。
步骤1093,去除法线与世界坐标系的Z轴的夹角超过预设角度阈值的坐标系转换后的第一三维点云。
需要说明的是,当某个第二平面实例为真正的水平平面时,其对应的第一三维点云的法线方向则会是重力方向,即在世界坐标系中与Z轴同向,同理,当某个第二平面实例不是真正的水平平面时,其对应的第一三维点云的法线方向不是重力方向,而是与重力方向之间存在偏差即夹角,因此,通过第一三维点云在世界坐标系中的法线方向是否与Z轴方向存在一定偏离判断第一三维点云是否为真正的水平平面所在的点云。此外,由于转换过程中可能存在投影误差以及计算法线方向也可能存在误差,因此,由第二平面实例得到第一三维点云在世界坐标系中的法线方向与Z轴方向不完全重合也是可能的情况,进而,根据偏离程度进行判断而不是根据是否重合进行判断,如第一三维点云在世界坐标系中的法线方向与Z轴方向的夹角在5°以内时认为相应的第一三维点云为水平平面所在的点云,可以保留。
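步骤1091至1093描述的"转换到世界坐标系并按法线与Z轴夹角筛选"的逻辑,可以用如下numpy草图演示(外参矩阵、5°阈值与函数名均为示意性假设):

```python
import numpy as np

def keep_horizontal(normals_cam: np.ndarray, extrinsic: np.ndarray,
                    max_deg: float = 5.0) -> np.ndarray:
    """Keep normals that, once rotated into the world frame, deviate from
    the world Z axis by less than max_deg degrees.

    extrinsic: 4x4 camera-to-world matrix carried by the dataset; only its
    3x3 rotation block acts on direction vectors.
    """
    rot = extrinsic[:3, :3]
    world = normals_cam @ rot.T
    world /= np.linalg.norm(world, axis=1, keepdims=True)
    # |cos angle| against (0, 0, 1); abs() accepts up- and down-facing normals
    cos = np.abs(world[:, 2])
    return cos >= np.cos(np.deg2rad(max_deg))

normals = np.array([[0.0, 0.0, 1.0],   # horizontal plane, e.g. a tabletop
                    [1.0, 0.0, 0.0]])  # vertical plane, e.g. a wall
mask = keep_horizontal(normals, np.eye(4))
print(mask)  # -> [ True False]
```

取绝对值是为了兼容法线方向上指或下指两种估计结果;返回的布尔掩码即对应被保留的第一三维点云。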
还需要说明的是,当第一三维点云在世界坐标系内的法线方向相对于Z轴方向的偏离程度为第一三维点云在世界坐标系内的法线方向和Z轴方向的其他参数时,相应地需要设置其他类型的阈值条件,此处就不再一一赘述了。
考虑到场景中的桌子等物体存在桌腿等干扰因素,也需要被去除掉,进一步地,在一些实施例中,如图5所示,步骤106之前,图像重力方向的获取方法还包括:
步骤110,从得到的所有第二平面实例中,去除像素点数量少于预设数量阈值的第二平面实例。
此时,步骤106为:将剩下的第二平面实例从二维投影到三维,得到第一三维点云。
需要说明的是,由于存在桌子类别中会包含桌子的侧面或桌腿平面等情况,需对平面实例进行筛选。而根据经验,像素点较少的平面实例有较大可能为桌腿或桌子侧面,而非桌面,且若为桌面,该桌面的质量可能受语义分割标签不准确的影响较大,会使得总体桌面对应的第二平面实例产生较多的噪声点,因此,通过将像素点数量少的第二平面实例进行去除,以避免以上所述的干扰,使得剩下的第二平面实例是水平平面的可能性更高,且还能够减少步骤106处理的第二平面实例的数量、减少步骤109处理的第一三维点云的数量,加快处理速度,提升效率。
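步骤110按像素点数量筛选平面实例的做法,可以写成如下草图(min_pixels阈值为假设值,专利中仅称"预设数量阈值"):

```python
import numpy as np

def drop_small_instances(instance_masks: list, min_pixels: int = 500) -> list:
    """Discard plane instances whose boolean masks contain fewer than
    min_pixels pixels -- small instances are more likely table legs or
    table sides than tabletops, and their labels tend to be noisier.
    """
    return [m for m in instance_masks if int(m.sum()) >= min_pixels]

big = np.ones((40, 40), dtype=bool)   # 1600 px: plausibly a tabletop
tiny = np.zeros((40, 40), dtype=bool)
tiny[:5, :5] = True                   # 25 px: plausibly a table leg
kept = drop_small_instances([big, tiny], min_pixels=500)
print(len(kept))  # -> 1
```

阈值的具体取值需按图像分辨率与场景尺度调节,过大会误删远处的小桌面。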
此外,获取的重力方向可以用于进行三维重建,具体地,在一些实施例中,如图6所示,步骤103之后,图像重力方向的获取方法还包括:
步骤111,根据重力方向确定从相机坐标系到世界坐标系的旋转平移矩阵。
具体地说,由于重力方向实际代表着世界坐标系的Z轴方向,因此在图像平面内重力方向确定后,重力方向与基于图像平面形成的相机坐标系的Z轴之间的夹角即为相机坐标系和世界坐标系在Z轴方向上的夹角,结合其他维度的信息,如X轴方向的夹角等,可以确定出旋转平移矩阵。
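步骤111中由重力方向构造旋转矩阵时,一种常见做法是让重力方向对齐世界坐标系Z轴,而绕Z轴的偏航角需由其他信息确定;以下为示意性草图(辅助向量的选取为本文假设,偏航角在此被任意固定):

```python
import numpy as np

def rotation_from_gravity(g_cam: np.ndarray) -> np.ndarray:
    """Build a camera-to-world rotation mapping the measured gravity
    direction onto the world Z axis.

    Gravity only pins down the Z axis; the remaining yaw about Z is chosen
    arbitrarily here via a helper vector.
    """
    z = g_cam / np.linalg.norm(g_cam)
    # pick any vector not parallel to z to span the horizontal plane
    helper = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(helper, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    # rows are the world axes expressed in camera coordinates
    return np.stack([x, y, z])

R = rotation_from_gravity(np.array([0.0, 0.0, 1.0]))
print(np.allclose(R @ R.T, np.eye(3)))  # -> True
```

将该旋转与平移合成4×4齐次矩阵后,即可按步骤112把相机坐标系下的点云变换到世界坐标系完成三维重建。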
步骤112,根据旋转平移矩阵对相机拍摄到的场景进行三维重建。
具体地说,得到旋转平移矩阵后,通过分别将相机拍摄到的场景从二维投影到基于相机坐标系的三维空间内,进而利用旋转平移矩阵从相机坐标系转换到世界坐标系,得到三维重建的结果。
需要说明的是,实现了三维重建之后,可以针对三维重建的结果进行多种操作,如机器人对当前所处环境的分析以进行抓取目标物体等操作,或者,呈现三维重建场景以供观众进行欣赏等。
上面各种方法的步骤划分,只是为了描述清楚,实现时可以合并为一个步骤或者对某些步骤进行拆分,分解为多个步骤,只要包括相同的逻辑关系,都在本专利的保护范围内;对算法中或者流程中添加无关紧要的修改或者引入无关紧要的设计,但不改变其算法和流程的核心设计都在该专利的保护范围内。
本发明实施例还提供了一种图像重力方向的获取装置,如图7所示,包括:
获取模块701,用于获取训练好的三维桌面语义分割模型,其中,三维桌面语义分割模型的训练集为若干点云图像组成的集合,点云图像携带根据点云图像中水平平面所在的点云生成的三维桌面掩码。
语义分割模块702,用于利用三维桌面语义分割模型对测试点云图像进行语义分割,得到测试点云图像的三维测试桌面掩码。
确定模块703,用于确定三维测试桌面掩码的测试法线方向作为测试点云图像的重力方向。
不难发现,本实施例为与图像重力方向的获取方法实施例相对应的装置实施例,本实施例可与图像重力方向的获取方法实施例互相配合实施。图像重力方向的获取方法实施例中提到的相关技术细节在本实施例中依然有效,为了减少重复,这里不再赘述。相应地,本实施例中提到的相关技术细节也可应用在图像重力方向的获取方法实施例中。
值得一提的是,本实施例中所涉及到的各模块均为逻辑模块,在实际应用中,一个逻辑单元可以是一个物理单元,也可以是一个物理单元的一部分,还可以以多个物理单元的组合实现。此外,为了突出本发明的创新部分,本实施例中并没有将与解决本发明所提出的技术问题关系不太密切的单元引入,但这并不表明本实施例中不存在其它的单元。
本发明实施例还提供了一种电子设备,如图8所示,包括至少一个处理器801;以及,与至少一个处理器801通信连接的存储器802;其中,存储器802存储有可被至少一个处理器801执行的指令,指令被至少一个处理器801执行,以使至少一个处理器801能够执行上述任一方法实施例所描述的图像重力方向的获取方法。
其中,存储器802和处理器801采用总线方式连接,总线可以包括任意数量的互联的总线和桥,总线将一个或多个处理器801和存储器802的各种电路连接在一起。总线还可以将诸如***设备、稳压器和功率管理电路等之类的各种其他电路连接在一起,这些都是本领域所公知的,因此,本文不再对其进行进一步描述。总线接口在总线和收发机之间提供接口。收发机可以是一个元件,也可以是多个元件,比如多个接收器和发送器,提供用于在传输介质上与各种其他装置通信的单元。经处理器801处理的数据通过天线在无线介质上进行传输,进一步,天线还接收数据并将数据传输给处理器801。
处理器801负责管理总线和通常的处理,还可以提供各种功能,包括定时,***接口,电压调节、电源管理以及其他控制功能。而存储器802可以被用于存储处理器801在执行操作时所使用的数据。
本发明的实施例涉及一种计算机可读存储介质,存储有计算机程序。计算机程序被处理器执行时实现上述方法实施例。
本发明的实施例还提供了一种计算机程序,所述计算机程序被处理器执行时实现上述方法实施例。
本领域技术人员可以理解实现上述实施例方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
本领域的普通技术人员可以理解,上述各实施方式是实现本发明的具体实施例,而在实际应用中,可以在形式上和细节上对其作各种改变,而不偏离本发明的精神和范围。

Claims (11)

  1. 一种图像重力方向的获取方法,其特征在于,包括:
    获取训练好的三维桌面语义分割模型,其中,所述三维桌面语义分割模型的训练集为若干点云图像组成的集合,所述点云图像携带根据所述点云图像中水平平面所在的点云生成的三维桌面掩码;
    利用所述三维桌面语义分割模型对测试点云图像进行语义分割,得到所述测试点云图像的三维测试桌面掩码;
    确定所述三维测试桌面掩码的测试法线方向作为所述测试点云图像的重力方向。
  2. 根据权利要求1所述的图像重力方向的获取方法,其特征在于,所述获取训练好的三维桌面语义分割模型之前,所述方法还包括:
    对预设的包含有若干RGB-D图像的数据集进行平面检测,得到每个所述RGB-D图像中的若干第一平面实例;
    根据所述数据集携带的语义分割标签识别并去除所述第一平面实例中的竖直平面,得到第二平面实例;
    将所述第二平面实例从二维投影到三维,得到第一三维点云;
    将属于同一所述RGB-D图像的所述第一三维点云作为相应的所述点云图像的所述三维桌面掩码,得到所述训练集;
    利用所述训练集对初始的所述三维桌面语义分割模型进行训练。
  3. 根据权利要求2所述的图像重力方向的获取方法,其特征在于,所述将属于同一所述RGB-D图像的所述第一三维点云作为相应的所述点云图像的所述三维桌面掩码,得到所述训练集之前,所述方法还包括:
    根据所述第一三维点云在世界坐标系内的法线方向相对于Z轴方向的偏离程度对所述第一三维点云进行筛选;
    所述将属于同一所述RGB-D图像的所述第一三维点云作为相应的所述点云图像的所述三维桌面掩码,得到所述训练集,包括:
    将所述RGB-D图像从二维投影到三维,得到所述点云图像;
    将属于同一所述点云图像的第二三维点云确定为所述三维桌面掩码,得到所述训练集,其中,所述第二三维点云为筛选后被保留的所述第一三维点云。
  4. 根据权利要求3所述的图像重力方向的获取方法,其特征在于,所述根据所述第一三维点云在世界坐标系内的法线方向相对于Z轴方向的偏离程度对所述第一三维点云进行筛选,包括:
    根据所述数据集携带的外部参数将所述第一三维点云转换到所述世界坐标系下;
    确定坐标系转换后的所述第一三维点云所在平面的法线;
    去除所述法线与所述世界坐标系的Z轴的夹角超过预设角度阈值的坐标系转换后的所述第一三维点云。
  5. 根据权利要求3或4所述的图像重力方向的获取方法,其特征在于,所述将所述第二平面实例从二维投影到三维,得到第一三维点云之前,所述方法还包括:
    从得到的所有所述第二平面实例中,去除像素点数量少于预设数量阈值的所述第二平面实例;
    所述将所述第二平面实例从二维投影到三维,得到第一三维点云,包括:
    将剩下的所述第二平面实例从二维投影到三维,得到所述第一三维点云。
  6. 根据权利要求1所述的图像重力方向的获取方法,其特征在于,所述确定所述三维测试桌面掩码的测试法线方向,包括:
    利用主成分分析PCA算法计算所述三维测试桌面掩码的法线方向作为所述测试法线方向。
  7. 根据权利要求1所述的图像重力方向的获取方法,其特征在于,所述确定所述三维测试桌面掩码的测试法线方向作为所述测试点云图像的重力方向之后,所述方法还包括:
    根据所述重力方向确定从相机坐标系到世界坐标系的旋转平移矩阵;
    根据所述旋转平移矩阵对相机拍摄到的场景进行三维重建。
  8. 一种图像重力方向的获取装置,其特征在于,包括:
    获取模块,用于获取训练好的三维桌面语义分割模型,其中,所述三维桌面语义分割模型的训练集为若干点云图像组成的集合,所述点云图像携带根据所述点云图像中水平平面所在的点云生成的三维桌面掩码;
    语义分割模块,用于利用所述三维桌面语义分割模型对测试点云图像进行语义分割,得到所述测试点云图像的三维测试桌面掩码;
    确定模块,用于确定所述三维测试桌面掩码的测试法线方向作为所述测试点云图像的重力方向。
  9. 一种电子设备,其特征在于,包括:
    至少一个处理器;以及,
    与所述至少一个处理器通信连接的存储器;其中,
    所述存储器存储有可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行如权利要求1-7中任一所述的图像重力方向的获取方法。
  10. 一种计算机可读存储介质,存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至7中任一项所述的图像重力方向的获取方法。
  11. 一种计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至7中任一项所述的图像重力方向的获取方法。
PCT/CN2022/084586 2021-08-12 2022-03-31 图像重力方向的获取方法、装置、电子设备及存储介质 WO2023015914A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110926336.0 2021-08-12
CN202110926336.0A CN115222799B (zh) 2021-08-12 2021-08-12 图像重力方向的获取方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023015914A1 true WO2023015914A1 (zh) 2023-02-16

Family

ID=83606133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/084586 WO2023015914A1 (zh) 2021-08-12 2022-03-31 图像重力方向的获取方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN115222799B (zh)
WO (1) WO2023015914A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127727A (zh) * 2016-06-07 2016-11-16 中国农业大学 一种家畜体表三维数据获取方法
CN109034065A (zh) * 2018-07-27 2018-12-18 西安理工大学 一种基于点云的室内场景物体提取方法
CN110378246A (zh) * 2019-06-26 2019-10-25 深圳前海达闼云端智能科技有限公司 地面检测方法、装置、计算机可读存储介质及电子设备
US10657647B1 (en) * 2016-05-20 2020-05-19 Ccc Information Services Image processing system to detect changes to target objects using base object models
CN111523547A (zh) * 2020-04-24 2020-08-11 江苏盛海智能科技有限公司 一种3d语义分割的方法及终端
CN111932688A (zh) * 2020-09-10 2020-11-13 深圳大学 一种基于三维点云的室内平面要素提取方法、***及设备
CN112529963A (zh) * 2020-12-11 2021-03-19 深圳一清创新科技有限公司 一种楼梯检测方法、装置及移动机器人

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2615580B1 (en) * 2012-01-13 2016-08-17 Softkinetic Software Automatic scene calibration
CN103514446B (zh) * 2013-10-16 2016-08-24 北京理工大学 一种融合传感器信息的室外场景识别方法
CN105488809B (zh) * 2016-01-14 2018-04-17 电子科技大学 基于rgbd描述符的室内场景语义分割方法
GB2565354A (en) * 2017-08-11 2019-02-13 Canon Kk Method and corresponding device for generating a point cloud representing a 3D object
WO2020024144A1 (zh) * 2018-08-01 2020-02-06 广东朗呈医疗器械科技有限公司 一种三维成像方法、装置及终端设备
CN110097553B (zh) * 2019-04-10 2023-05-02 东南大学 基于即时定位建图与三维语义分割的语义建图***
CN110006435A (zh) * 2019-04-23 2019-07-12 西南科技大学 一种基于残差网络的变电站巡检机器人视觉辅助导航方法
CN112396650B (zh) * 2020-03-30 2023-04-07 青岛慧拓智能机器有限公司 一种基于图像和激光雷达融合的目标测距***及方法
CN112767538B (zh) * 2021-01-11 2024-06-07 浙江商汤科技开发有限公司 三维重建及相关交互、测量方法和相关装置、设备
CN113223091B (zh) * 2021-04-29 2023-01-24 达闼机器人股份有限公司 三维目标检测方法、抓取方法、装置及电子设备

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657647B1 (en) * 2016-05-20 2020-05-19 Ccc Information Services Image processing system to detect changes to target objects using base object models
CN106127727A (zh) * 2016-06-07 2016-11-16 中国农业大学 一种家畜体表三维数据获取方法
CN109034065A (zh) * 2018-07-27 2018-12-18 西安理工大学 一种基于点云的室内场景物体提取方法
CN110378246A (zh) * 2019-06-26 2019-10-25 深圳前海达闼云端智能科技有限公司 地面检测方法、装置、计算机可读存储介质及电子设备
CN111523547A (zh) * 2020-04-24 2020-08-11 江苏盛海智能科技有限公司 一种3d语义分割的方法及终端
CN111932688A (zh) * 2020-09-10 2020-11-13 深圳大学 一种基于三维点云的室内平面要素提取方法、***及设备
CN112529963A (zh) * 2020-12-11 2021-03-19 深圳一清创新科技有限公司 一种楼梯检测方法、装置及移动机器人

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A N RUCHAY, K A DOROFEEV, A V KOBER: "Accurate reconstruction of the 3D indoor environment map with a RGB-D camera based on multiple ICP", IMAGE PROCESSING AND EARTH REMOTE SENSING, SAMARA NATIONAL RESEARCH UNIVERSITY, vol. 18, 1 January 2018 (2018-01-01), pages 300 - 308, XP055626529, DOI: 10.18287/1613-0073-2018-2210-300-308 *
方国润 (FANG, GUORUN): ""三维点云分割研究综述" ("A Review of Three-dimensional Point Cloud Segmentation")", 计量与测试技术 (METROLOGY & MEASUREMENT TECHNIQUE), vol. 48, no. 7, 1 January 2018 (2018-01-01), XP09543408, ISSN: 460-4099 *

Also Published As

Publication number Publication date
CN115222799A (zh) 2022-10-21
CN115222799B (zh) 2023-04-11

Similar Documents

Publication Publication Date Title
US11704833B2 (en) Monocular vision tracking method, apparatus and non-transitory computer-readable storage medium
US11373332B2 (en) Point-based object localization from images
US10086955B2 (en) Pattern-based camera pose estimation system
Wang et al. Latte: accelerating lidar point cloud annotation via sensor fusion, one-click annotation, and tracking
CN111127422A (zh) 图像标注方法、装置、***及主机
US10451403B2 (en) Structure-based camera pose estimation system
US9858669B2 (en) Optimized camera pose estimation system
CN109711472B (zh) 训练数据生成方法和装置
CN112183506A (zh) 一种人体姿态生成方法及其***
WO2024012333A1 (zh) 位姿估计方法及相关模型的训练方法、装置、电子设备、计算机可读介质和计算机程序产品
CN109934165A (zh) 一种关节点检测方法、装置、存储介质及电子设备
WO2023015914A1 (zh) 图像重力方向的获取方法、装置、电子设备及存储介质
CN114690144A (zh) 一种点云数据标注方法、装置及***
WO2023015915A1 (zh) 图像重力方向的获取方法、装置、电子设备及存储介质
US10438066B2 (en) Evaluation of models generated from objects in video
CN117576489B (zh) 智能机器人鲁棒实时目标感知方法、装置、设备及介质
US20240046122A1 (en) Cross-media corresponding knowledge generation method and apparatus
KR102590730B1 (ko) 학습 데이터 수집 시스템 및 방법
CN117576609A (zh) 实验器材的自动标注方法、***、设备和介质
CN116363307A (zh) 一种基于三维重建的实验操作评分方法和装置
Weichen et al. An accurate and efficient monocular mixed match SLAM
CN112070175A (zh) 视觉里程计方法、装置、电子设备及存储介质
Peng et al. Vision-Based Manipulator Grasping
Nobar Precise Hand Finger Width Estimation via RGB-D Data
CN117474961A (zh) 减少深度估计模型误差的方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22854919

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE