CN112801977B - Assembly body part relative pose estimation and monitoring method based on deep learning - Google Patents

Assembly body part relative pose estimation and monitoring method based on deep learning

Info

Publication number
CN112801977B
Authority
CN
China
Prior art keywords
point
coordinate system
assembly body
pose
relative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110117860.3A
Other languages
Chinese (zh)
Other versions
CN112801977A (en)
Inventor
陈成军
李长治
潘勇
李东年
洪军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University of Technology
Original Assignee
Qingdao University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Technology filed Critical Qingdao University of Technology
Priority to CN202110117860.3A priority Critical patent/CN112801977B/en
Publication of CN112801977A publication Critical patent/CN112801977A/en
Application granted granted Critical
Publication of CN112801977B publication Critical patent/CN112801977B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/66Analysis of geometric attributes of image moments or centre of gravity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30164Workpiece; Machine component
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention relates to an assembly part relative pose estimation and monitoring method based on deep learning, which comprises the following steps of: shooting images of a target assembly body at different angles through a camera, and establishing a sample data set through the collected images; performing feature extraction and 3D key point detection on the sample data set through a deep learning network to obtain a 3D key point set of each part in the assembly body; performing semantic segmentation according to the acquired image to distinguish different parts in the image; obtaining a pose prediction value of each part under a camera coordinate system by using a least square fitting algorithm according to the 3D key point set and the point cloud data set of each part; selecting a part as a reference system part, establishing a world coordinate system by taking the geometric center of the reference system part as an origin, and calculating the true pose value of the reference system part in a camera coordinate system; and respectively calculating the relative pose relationship between each other part and the reference system part, wherein the relative pose relationship comprises a space geometric distance, a relative rotation matrix and a relative angle.

Description

Assembly body part relative pose estimation and monitoring method based on deep learning
Technical Field
The invention relates to an assembly part relative pose estimation and monitoring method based on deep learning, and belongs to the technical field of computer vision and intelligent manufacturing.
Background
Computer vision is of great significance for the transformation and upgrading of intelligent manufacturing; in particular, the emergence of large numbers of deep learning networks has driven the development of modern industry. In traditional manual assembly, workers must consult assembly process drawings; the assembly information is cumbersome and poorly visualized, which makes it difficult to understand and lowers assembly efficiency. The assembled product must also undergo assembly quality inspection, and the inspection steps are numerous, with re-checked information recorded mainly in paper documents; this is time-consuming and labor-intensive, easily leads to incorrectly positioned assembly parts, and leaves assembly quality unguaranteed.
On an automated industrial assembly line, the positions of assembly body parts are fixed by specially designed tooling so that a manipulator can carry out the assembly; when the working environment changes abruptly or the machined product is upgraded or replaced, the assembly body parts must be removed and replaced, and their positions and postures must be checked again. In order to effectively monitor the positions and postures of assembly body parts and to improve production speed and product quality, a method for assisting in monitoring the positions and postures of assembly body parts is urgently needed. Six-degree-of-freedom pose estimation can estimate the position and posture of an object in space and is therefore of great help for monitoring assembly body parts.
At present, pose estimation algorithms for common objects are widespread: such objects have clear texture and color characteristics and do not exhibit strong reflection, so many classical pose estimation algorithms based on templates or on feature point methods can accurately estimate their pose and then complete the subsequent tasks. However, most mechanical parts lack texture and color and suffer from reflection, and parts in an assembly body are severely occluded by one another, all of which poses challenges to part pose estimation. Moreover, most existing pose estimation for mechanical parts is carried out on scattered parts: the pose of a single part in the camera coordinate system is estimated, overall association is lacking, and relative pose estimation between the parts of an assembly body cannot be performed. Therefore, six-degree-of-freedom relative pose estimation for assembly body parts has important practical value.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an assembly body part relative pose estimation and monitoring method based on deep learning. Images of the assembly body are captured from different angles, which mitigates mutual occlusion between the parts of the assembly body. In addition, one part is selected as a reference system part and the relative pose relationship between every other part and the reference system part is calculated; compared with estimating only the pose of each part in the camera coordinate system, this makes the relationships among the parts observable, so that whether the assembly body is qualified can be judged from those relationships.
The technical scheme of the invention is as follows:
an assembly part relative pose estimation and monitoring method based on deep learning comprises the following steps:
establishing an assembly body part data set, shooting images of a target assembly body at different angles through a camera, generating point cloud data sets corresponding to different parts in the assembly body through the collected images, and establishing a sample data set;
selecting 3D key points, performing feature extraction on the sample data set through a deep learning network to obtain surface information and geometric information of each part in the assembly body, and performing feature fusion on the surface information and the geometric information of each part to obtain point-by-point features of each part; performing 3D key point detection on the point-by-point characteristics of each part to obtain a 3D key point set of each part in the assembly body;
segmenting parts, performing semantic segmentation according to the acquired image, and identifying and segmenting different parts of the assembly in the image;
predicting the pose of the part, and obtaining a predicted pose value of each part under a camera coordinate system by using a least square fitting algorithm according to the 3D key point set and the point cloud data set of each part in the segmented image, wherein the pose comprises a rotation matrix and an offset matrix of the part under the camera coordinate system;
setting a reference system, selecting a part as a reference system part, establishing a world coordinate system by taking the geometric center of the reference system part as the origin, and calculating the true pose value of the reference system part in the camera coordinate system through the conversion relation between the world coordinate system and the camera coordinate system;
and estimating relative poses, namely respectively calculating the relative pose relations of the other parts and the reference system part according to the true pose values of the reference system part in the camera coordinate system and the predicted values of the other parts except the reference system part in the assembly body in the camera coordinate system, wherein the relative pose relations comprise space geometric distance, relative rotation matrix and relative angle.
Further, the image comprises a depth map and a color map, and scene registration is carried out through the depth map and the color map to obtain point cloud data containing the assembly body; the point cloud data containing the assembly body is then cropped to remove background information and clutter from the scene, and a three-dimensional model of the assembly body is generated, the three-dimensional model of the assembly body being composed of point cloud data sets corresponding to the different parts in the assembly body.
Further, the step of performing 3D keypoint detection on the point-by-point features of each part to obtain a 3D keypoint set of each part in the assembly body is specifically as follows:
inputting point-by-point characteristics of each part, initializing a plurality of initial key points of each part in a three-dimensional model of the assembly body by utilizing a farthest point sampling algorithm, and estimating the offset from a visible point to each initial key point;
voting is carried out on each initial key point through the offset to obtain a clustering point set of each part, outlier interference in each clustering point set is eliminated by using a clustering algorithm, and finally, the geometric center of each clustering point set is selected as a 3D key point of each part;
and traversing all the images to repeatedly update the initial key points of all the parts, repeating the steps to obtain a plurality of 3D key points of all the parts, and collecting the 3D key points into a 3D key point set of all the parts.
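As an illustration only, the following Python/NumPy sketch shows one way the farthest point sampling step described above could be realized; the function name farthest_point_sampling and the synthetic point cloud are assumptions for the example, not part of the patent.

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Pick m well-spread initial key points from an (N, 3) part point cloud.

    Minimal sketch: start from one point, then repeatedly add the point that
    is farthest from the points already selected.
    """
    n = points.shape[0]
    selected = np.zeros(m, dtype=np.int64)      # indices of the chosen key points (starts at index 0)
    dist = np.full(n, np.inf)                   # distance of each point to the chosen set
    for k in range(1, m):
        diff = points - points[selected[k - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        selected[k] = int(np.argmax(dist))      # farthest remaining point
    return points[selected]

# example: 8 initial key points on a synthetic part point cloud
part_cloud = np.random.rand(5000, 3)
initial_keypoints = farthest_point_sampling(part_cloud, 8)
```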
Further, the semantic segmentation according to the acquired image, and the step of identifying and segmenting different parts of the assembly in the image specifically comprises the following steps:
obtaining semantic labels, extracting global features and local features of each part in the image, predicting the semantic labels of the corresponding parts point by point according to the global features and the local features of the parts, and obtaining outlines of different parts;
acquiring the central points of different parts: voting for the central point of the semantic label of each part and taking the voted semantic label central point of each part as the two-dimensional central point of that part; mapping the two-dimensional central point of each part to the three-dimensional central point of the corresponding part in the three-dimensional model of the assembly body;
and performing instance segmentation on the image according to the semantic labels and the semantic label center points of the parts, and distinguishing different parts in the image.
Further, the specific steps of obtaining the pose prediction value of each part under the camera coordinate system by using the least square fitting algorithm are as follows:
obtaining a pose initial value of each part under a camera coordinate system by utilizing a least square fitting algorithm according to the 3D key point set and the point cloud data set of each part;
and (4) performing iterative refinement on the pose initial values of the parts under the camera coordinate system by using an iterative optimization algorithm until pose predicted values meeting the precision requirements are obtained.
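The least-squares fitting of a pose from two corresponding point sets has a standard closed-form SVD solution (often called the Kabsch/Umeyama method); the sketch below, with the assumed name fit_rigid_pose, is a minimal implementation of that idea and is not claimed to be the patent's exact algorithm. Given a part's model key points and its detected 3D key points, the returned R and t serve as the pose initial value.

```python
import numpy as np

def fit_rigid_pose(model_pts, detected_pts):
    """Least-squares rigid transform (R, t) such that detected ≈ R @ model + t.

    model_pts, detected_pts: (K, 3) arrays of corresponding 3D points.
    """
    mu_m = model_pts.mean(axis=0)
    mu_d = detected_pts.mean(axis=0)
    # cross-covariance of the two centred point sets
    H = (model_pts - mu_m).T @ (detected_pts - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_m
    return R, t
```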
Further, in the step of calculating the relative pose relationship between each of the other parts and the reference system part, the step of calculating the spatial geometric distance is specifically as follows:
acquiring a real value of the coordinate of the three-dimensional central point of the reference system part under the camera coordinate system according to the conversion relation between the world coordinate system and the camera coordinate system;
calculating a three-dimensional central point coordinate predicted value of a three-dimensional central point of the target part under a camera coordinate system according to the rotation matrix predicted value and the offset matrix predicted value of the target part;
calculating the space geometric distance between the predicted value of the three-dimensional center point coordinate of the target part and the true value of the three-dimensional center point coordinate of the reference system part by the following formula:
d = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}
where d represents the spatial geometric distance, n represents an n-dimensional space, and A_i - B_i denotes the difference between the i-th coordinate of point A and the i-th coordinate of point B.
Further, in the step of calculating the relative pose relationship between each of the other parts and the reference system part, the step of calculating the relative rotation matrix and the relative angle is specifically as follows:
defining the rotation matrix as:
R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}
Let the true value of the rotation matrix of the reference system part relative to the camera coordinate system be R_0 and the predicted value of the rotation matrix of the target part be R_i, where i denotes the i-th target part; the inverse matrix R_i^{-1} of R_i is then calculated (for a rotation matrix, R_i^{-1} = R_i^T).
The relative rotation matrix of the target part with respect to the reference system part is then:
R_{0i} = R_i^{-1} \cdot R_0
The relative angle between the target part and the reference system part is obtained by performing an angle conversion on the relative rotation matrix R_{0i}.
The invention has the following beneficial effects:
According to the invention, images of the assembly body are captured from different angles, which mitigates mutual occlusion between the parts of the assembly body; in addition, one part is selected as a reference system part and the relative pose relationship between every other part and the reference system part is calculated. Compared with estimating only the pose of each part in the camera coordinate system, this makes the relationships among the parts observable, so that whether the assembly body is qualified can be judged from those relationships.
Drawings
FIG. 1 is a flow chart of a first embodiment of the present invention;
FIG. 2 is a flowchart of a second embodiment of the present invention;
FIG. 3 is an illustration of an assembly in an embodiment of the invention;
FIG. 4 is an exemplary illustration of a reference frame component in an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments.
The first embodiment is as follows:
referring to fig. 1, an assembly part relative pose estimation and monitoring method based on deep learning includes the following steps:
establishing an assembly body part data set: the camera is moved over a bounding sphere around the target assembly body and the assembly body is shot at fixed intervals to obtain images from different angles; point cloud data sets corresponding to the different parts in the assembly body are generated from the acquired images, and a sample data set is established;
selecting 3D key points, loading the sample data set to a deep learning network (in the example, a specially-made extraction network is adopted) for feature extraction, obtaining surface information and geometric information of each part in the target assembly body, and performing feature fusion on the surface information and the geometric information of each part to obtain point-by-point features of each part; performing 3D key point detection on the point-by-point characteristics of each part to obtain a 3D key point set of each part in the assembly body;
segmenting parts, performing semantic segmentation according to the acquired image, and identifying and segmenting different parts of the assembly in the image;
predicting the pose of the part, and obtaining a predicted pose value of each part under a camera coordinate system by using a least square fitting algorithm according to the 3D key point set and the point cloud data set of each part in the segmented image, wherein the pose comprises a rotation matrix and an offset matrix of the part under the camera coordinate system;
setting a reference system, selecting a part as a reference system part, establishing a world coordinate system by taking a geometric center of the reference system part (in the embodiment, the geometric center is a three-dimensional central point of the reference system part) as an origin, and calculating a real pose value of the reference system part in the camera coordinate system through a conversion relation between the world coordinate system and the camera coordinate system;
and estimating relative pose, namely respectively calculating the relative pose relationship between each other part and the reference system part according to the true pose value of the reference system part in the camera coordinate system and the predicted value of other parts in the assembly body except the reference system part in the camera coordinate system, wherein the relative pose relationship comprises a space geometric distance, a relative rotation matrix and a relative angle.
In the embodiment, images at different angles are shot on the assembly body, so that the phenomenon that parts of the assembly body are mutually shielded is avoided; meanwhile, one part is selected as a reference system part, the relative pose relation between other parts and the reference system part is calculated, compared with the situation that the pose of the part under a camera coordinate system is only estimated, the relation among the parts can be monitored, and whether the assembly body is qualified or not is judged according to the relation among the parts.
The second embodiment:
Further, referring to fig. 2, in this embodiment the image includes a depth map and a color map, and scene registration is performed with the depth map and the color map. The gray value of each pixel of the depth map indicates the distance from a point in the scene to the camera and thus reflects the geometric shape of the visible object surfaces. From a group of captured depth images, the scene is reconstructed through coordinate transformation with the PCL function library to obtain the point cloud data of the scene objects. Because the scene contains several target objects, the point cloud is cropped with MeshLab software to remove background information and clutter, and a three-dimensional model of the assembly body in the initial frame coordinate system is then generated; the three-dimensional model of the assembly body comprises a three-dimensional model of each part, and the three-dimensional model of a part comprises the coordinates, in the camera coordinate system, of each point cloud point that forms the part.
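The embodiment performs the reconstruction with the PCL function library and crops the cloud in MeshLab; as a simplified, assumed illustration of the underlying coordinate transformation, the sketch below back-projects a depth image into a camera-frame point cloud using pinhole intrinsics (fx, fy, cx, cy and depth_scale are placeholder calibration values).

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project an (H, W) depth image into an (N, 3) camera-frame point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))      # pixel coordinates
    z = depth.astype(np.float64) * depth_scale           # depth in metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    cloud = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return cloud[cloud[:, 2] > 0]                        # drop pixels without valid depth
```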
In this embodiment, the color image and the depth image are loaded into a CNN and a PointNet++ network respectively to extract feature information, yielding the surface information and geometric information of each part in each image; a feature fusion network then fuses the surface information and geometric information to obtain the point-by-point features of each part in each image;
the step of performing 3D key point detection on the point-by-point characteristics of each part to obtain a 3D key point set of each part in the assembly body is as follows:
firstly, point-by-point characteristics are loaded into a 3D key point detection module, M initial key points on a three-dimensional model of each part are initialized by utilizing a farthest point sampling algorithm, and the offset of a visible point of the three-dimensional model of each part to the M initial key points in a two-dimensional image is estimated;
voting for the M initial key point positions through the offsets, and selecting the points that best represent the three-dimensional model of the target object as candidates to obtain a clustering point set for each key point; outlier interference is removed with a clustering algorithm, and the geometric center of each clustering point set is then selected as a 3D key point of the part;
and traversing all the images to repeatedly update the initial key points of each part, repeating the steps to obtain a plurality of 3D key points of each part, and collecting the 3D key points into a 3D key point set of the corresponding part.
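To complement the sampling sketch given earlier, the voting and clustering step could be realized as follows; MeanShift is used here as an assumed stand-in for the unspecified clustering algorithm, and the offsets array and bandwidth value are illustrative only.

```python
import numpy as np
from sklearn.cluster import MeanShift

def vote_keypoints(visible_pts, offsets, bandwidth=0.02):
    """visible_pts: (N, 3) visible model points; offsets: (N, M, 3) predicted offsets.

    Each visible point casts one vote per key point; the dominant MeanShift
    cluster is kept and its geometric centre becomes the 3D key point.
    """
    keypoints = []
    for m in range(offsets.shape[1]):
        votes = visible_pts + offsets[:, m, :]                 # voted key point positions
        labels = MeanShift(bandwidth=bandwidth).fit(votes).labels_
        main = labels == np.bincount(labels).argmax()          # discard outlier clusters
        keypoints.append(votes[main].mean(axis=0))             # cluster centre = 3D key point
    return np.stack(keypoints)
```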
Furthermore, the semantic segmentation is performed according to the acquired image, and the step of segmenting the point cloud data sets of different parts in the assembly body specifically comprises the following steps:
obtaining semantic labels, extracting global features and local features of each part in each image, wherein the global features refer to the overall attributes of the images, common global features comprise color features, texture features, shapes and other features, the local features are features extracted from local regions of the images and comprise corners, edges and other features, and the semantic labels of the corresponding parts are predicted point by point according to different global features and local features of the parts; the semantic tags are used for processing multi-target problems to distinguish different objects and obtain the outlines of different types of parts in the images.
Dividing different parts: a vote is cast for the center point of the semantic label of each part, and the voted semantic label center point is taken as the two-dimensional center point of the part. The two-dimensional center point of each part is then mapped to the three-dimensional center point of the part's three-dimensional model; because the three-dimensional center point carries three-dimensional spatial information, it effectively alleviates the occlusion problem. Instance segmentation of the assembly body image is performed using the three-dimensional center point and semantic label of each part, different instance features under the same semantic label are extracted, and the different instance parts in the assembly body and their corresponding point cloud data sets are segmented.
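One possible realization of mapping a voted two-dimensional center point to a three-dimensional center, assuming a depth image registered to the color image and the same placeholder pinhole intrinsics as in the earlier sketch (the function name and parameters are assumptions):

```python
import numpy as np

def lift_center_to_3d(center_uv, depth, fx, fy, cx, cy, depth_scale=0.001):
    """Lift a voted 2D centre (u, v) to a 3D point in the camera coordinate system."""
    u, v = center_uv
    z = depth[int(round(v)), int(round(u))] * depth_scale   # depth at the centre pixel
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
```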
Further, the specific steps of obtaining the pose prediction value of each part under the camera coordinate system by using the least square fitting algorithm are as follows:
firstly, two point sets of each target object in each picture are given, wherein one point set is a 3D key point set detected in a camera coordinate system, and the other point set is a point cloud data set corresponding to a three-dimensional model of the corresponding target object which is segmented from the corresponding picture; calculating a pose initial value of the corresponding part in a camera coordinate system by using a least square fitting algorithm;
and respectively acquiring pose initial values of the parts in a camera coordinate system, and iteratively refining the pose initial values by using an iterative optimization algorithm until pose predicted values meeting the precision requirements are acquired.
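The iterative optimization algorithm is not named in the embodiment; an ICP-style loop such as the following (with assumed names best_fit and refine_pose, and a SciPy k-d tree for nearest-neighbour correspondences) is one common way to refine the pose initial value until the residual stops improving.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_fit(model_pts, scene_pts):
    """SVD-based least-squares rigid fit, as in the earlier fitting sketch."""
    mu_m, mu_s = model_pts.mean(axis=0), scene_pts.mean(axis=0)
    U, _, Vt = np.linalg.svd((model_pts - mu_m).T @ (scene_pts - mu_s))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                         # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    return R, mu_s - R @ mu_m

def refine_pose(model_pts, scene_pts, R, t, iters=30, tol=1e-6):
    """Iteratively refine an initial pose (R, t) aligning model_pts onto scene_pts."""
    tree = cKDTree(scene_pts)
    prev_err = np.inf
    for _ in range(iters):
        moved = model_pts @ R.T + t                  # model points under the current pose
        dists, idx = tree.query(moved)               # nearest scene point per model point
        R, t = best_fit(model_pts, scene_pts[idx])   # re-fit on the new correspondences
        if abs(prev_err - dists.mean()) < tol:       # stop once the residual settles
            break
        prev_err = dists.mean()
    return R, t
```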
Further, in the step of calculating the relative pose relationship between each of the other parts and the reference system part, the step of calculating the spatial geometric distance is specifically as follows:
Firstly, a specific part is selected as the reference system part and a world coordinate system is established at the position of its geometric center (three-dimensional central point). Let X denote coordinates in the world coordinate system and X′ the corresponding coordinates in the camera coordinate system; the transformation from the world coordinate system to the camera coordinate system is:
X′=[R|t]*X;
where R is a rotation matrix and t is an offset matrix. As shown in FIGS. 3 and 4, FIG. 3 shows a two-stage cylindrical-conical reducer, and FIG. 4 takes the box body of the two-stage cylindrical-conical reducer assembly as the reference system part, with a fixed origin placed at the geometric center of the box body. In FIG. 3, the parts other than the box body are P1 gear shaft, P2 large gear, P3 large helical gear, P4 small helical gear, P5 shaft, P6 bearing and P7 shaft sleeve.
Acquiring a real value of the coordinate of the three-dimensional central point of the reference system part under the camera coordinate system according to the conversion relation between the world coordinate system and the camera coordinate system;
calculating a three-dimensional central point coordinate predicted value of a three-dimensional central point of the target part under a camera coordinate system according to the rotation matrix predicted value and the offset matrix predicted value of the target part;
calculating the space geometric distance between the predicted value of the three-dimensional central point coordinate of the target part and the real value of the three-dimensional central point coordinate of the reference system part by the following formula:
d = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}
where d represents the spatial geometric distance, n represents an n-dimensional space, and A_i - B_i denotes the difference between the i-th coordinate of point A and the i-th coordinate of point B. The points calculated in this embodiment lie in three-dimensional space, so n = 3 and
d = \sqrt{(A_1 - B_1)^2 + (A_2 - B_2)^2 + (A_3 - B_3)^2}
The spatial geometric distance is a fixed value that does not change as the camera moves, and is therefore used to judge the relative distance of an assembly part with respect to the fixed reference system part.
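Putting the transform X′ = [R|t]*X and the distance formula together, a minimal sketch of the spatial geometric distance computation; the poses below are placeholders standing in for the reference part's true pose and a target part's predicted pose.

```python
import numpy as np

def part_center_in_camera(R, t, center_local=np.zeros(3)):
    """Apply X' = [R | t] * X to a part centre expressed in its own frame."""
    return R @ center_local + t

# placeholder poses (in practice these come from the preceding steps)
R_ref, t_ref = np.eye(3), np.array([0.10, 0.05, 0.80])   # reference part, true pose
R_tgt, t_tgt = np.eye(3), np.array([0.02, 0.05, 0.85])   # target part, predicted pose

A = part_center_in_camera(R_ref, t_ref)   # true 3D centre of the reference part
B = part_center_in_camera(R_tgt, t_tgt)   # predicted 3D centre of the target part
d = np.linalg.norm(A - B)                 # spatial geometric distance, n = 3
print(f"relative distance: {d:.3f} m")
```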
Further, in the step of calculating the relative pose relationship between each of the other parts and the reference system part, the step of calculating the relative rotation matrix and the relative angle is specifically as follows:
defining the rotation matrix as:
R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}
Let the true value of the rotation matrix of the reference system part relative to the camera coordinate system be R_0 and the predicted value of the rotation matrix of the target part be R_i, where i denotes the i-th target part; the inverse matrix R_i^{-1} of R_i is then calculated (for a rotation matrix, R_i^{-1} = R_i^T).
The relative rotation matrix of the target part with respect to the reference system part is then:
R_{0i} = R_i^{-1} \cdot R_0
The relative rotation matrix of the target part with respect to the reference system part is thus obtained; it is likewise a fixed value that does not change as the camera moves, and is therefore used to judge the relative rotation of the other parts in the assembly body with respect to the set reference system part.
The relative angle between the target part and the reference system part is obtained by performing an angle conversion on the relative rotation matrix R_{0i}. Taking sequential rotation about the X-Y-Z axes as an example, the angles of the target part relative to the reference system part are:
\gamma = \operatorname{atan2}(r_{32}, r_{33})
\beta = \operatorname{atan2}\left(-r_{31}, \sqrt{r_{11}^2 + r_{21}^2}\right)
\alpha = \operatorname{atan2}(r_{21}, r_{11})
where γ corresponds to the rotation angle about the X axis, β to the rotation angle about the Y axis, and α to the rotation angle about the Z axis; the resulting relative angle is likewise a fixed value that does not change as the camera moves. Through the above steps, the spatial geometric distance, relative rotation matrix and relative angle of every other part in the assembly body with respect to the reference system part are obtained. Ideally all of these values are constants that do not change with camera motion; in practice there is some error, introduced by the accuracy of the semantic segmentation, the instance segmentation and the least-squares fitting. The pose of each other part relative to the set reference system part is evaluated comprehensively from the obtained values, and whether each target object is correctly positioned relative to the set reference system is judged. Taking the assembly body in fig. 3 as an example, the accuracy of the relative pose of every other part with respect to the reference system part (the box body) is finally calculated to be 98%.
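A minimal sketch of the relative-rotation and angle-conversion step, assuming the X-Y-Z rotation convention stated above and the reconstructed product order R_0i = R_i^{-1} · R_0; the placeholder rotations are illustrative only, and the formulas should be adapted if a different convention is intended.

```python
import numpy as np

def relative_rotation(R0, Ri):
    """Relative rotation matrix R_0i = Ri^{-1} * R0 (the inverse of a rotation is its transpose)."""
    return Ri.T @ R0

def rotation_to_xyz_angles(R):
    """Recover the X-Y-Z rotation angles (gamma, beta, alpha) from a rotation matrix."""
    gamma = np.arctan2(R[2, 1], R[2, 2])                        # rotation about X
    beta = np.arctan2(-R[2, 0], np.hypot(R[0, 0], R[1, 0]))     # rotation about Y
    alpha = np.arctan2(R[1, 0], R[0, 0])                        # rotation about Z
    return gamma, beta, alpha

# placeholder rotations standing in for the reference (true) and target (predicted) poses
R0 = np.eye(3)
Ri = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])                 # 90 degrees about the Z axis
R_0i = relative_rotation(R0, Ri)
print(np.degrees(rotation_to_xyz_angles(R_0i)))   # relative angles in degrees
```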
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. An assembly part relative pose estimation monitoring method based on deep learning is characterized by comprising the following steps:
establishing an assembly body part data set, shooting images of different angles of a target assembly body through a camera, generating point cloud data sets corresponding to different parts in the assembly body through the collected images, and establishing a sample data set;
selecting 3D key points, performing feature extraction on the sample data set through a deep learning network to obtain surface information and geometric information of each part in the assembly body, and performing feature fusion on the surface information and the geometric information of each part to obtain point-by-point features of each part; performing 3D key point detection on the point-by-point characteristics of each part to obtain a 3D key point set of each part in the assembly body;
segmenting parts, namely performing semantic segmentation according to the acquired image, and identifying and segmenting different parts of the assembly in the image;
predicting the pose of the part, and obtaining a pose prediction value of each part under a camera coordinate system by using a least square fitting algorithm according to the 3D key point set and the point cloud data set of each part in the segmented image, wherein the pose comprises a rotation matrix and an offset matrix of the part under the camera coordinate system;
setting a reference system, selecting a part as a reference system part, establishing a world coordinate system by taking the geometric center of the reference system part as the origin, and calculating the true pose value of the reference system part in the camera coordinate system through the conversion relation between the world coordinate system and the camera coordinate system;
estimating relative poses, namely respectively calculating relative pose relations between each other part and the reference system part according to a true pose value of the reference system part in a camera coordinate system and a predicted value of other parts in the assembly body except the reference system part in the camera coordinate system, wherein the relative pose relations comprise a space geometric distance, a relative rotation matrix and a relative angle;
the step of performing 3D keypoint detection on the point-by-point features of each part to obtain a 3D keypoint set of each part in the assembly body is specifically as follows:
inputting point-by-point characteristics of each part, initializing a plurality of initial key points of each part in a three-dimensional model of the assembly body by utilizing a farthest point sampling algorithm, and estimating the offset from a visible point to each initial key point;
voting is carried out on each initial key point through the offset to obtain a clustering point set of each part, the clustering algorithm is used for eliminating outlier interference in each clustering point set, and finally, the geometric center of each clustering point set is selected as the 3D key point of each part;
and traversing all the images to repeatedly update the initial key points of all the parts, repeating the steps to obtain a plurality of 3D key points of all the parts, and collecting the 3D key points into a 3D key point set of all the parts.
2. The assembly part relative pose estimation and monitoring method based on deep learning of claim 1, characterized in that: the image comprises a depth map and a color map, and scene registration is carried out through the depth map and the color map to obtain point cloud data containing an assembly body; and performing point cloud cutting on the point cloud data containing the assembly body, removing background information and messy information in a scene, and generating a three-dimensional model of the assembly body, wherein the three-dimensional model of the assembly body is composed of point cloud data sets corresponding to different parts in the assembly body.
3. The assembly part relative pose estimation and monitoring method based on deep learning of claim 2, wherein the step of performing semantic segmentation according to the acquired image, and identifying and segmenting different parts of the assembly body in the image specifically comprises:
obtaining semantic labels, extracting global features and local features of each part in the image, predicting the semantic labels of the corresponding parts point by point according to the global features and the local features of the parts, and obtaining the outlines of different parts;
acquiring the central points of different parts, voting the central points of the semantic tags of each part, acquiring the central points of the semantic tags by voting, and taking the semantic tag central point of each part as a two-dimensional central point of each part; mapping the two-dimensional central point of each part to the three-dimensional central point of the corresponding part in the three-dimensional model of the assembly body;
and performing instance segmentation on the image according to the semantic labels and the semantic label center points of the parts, and distinguishing different parts in the image.
4. The assembly part relative pose estimation and monitoring method based on deep learning of claim 3, wherein the specific step of obtaining the pose prediction value of each part in the camera coordinate system by using the least square fitting algorithm comprises:
obtaining a pose initial value of each part under a camera coordinate system by utilizing a least square fitting algorithm according to the 3D key point set and the point cloud data set of each part;
and (4) performing iterative refinement on the pose initial values of the parts under the camera coordinate system by using an iterative optimization algorithm until pose predicted values meeting the precision requirements are obtained.
5. An assembly part relative pose estimation and monitoring method based on deep learning as claimed in claim 4, wherein in the step of calculating the relative pose relationship between each other part and the reference system part, the step of calculating the spatial geometric distance is as follows:
acquiring a real value of the coordinate of the three-dimensional central point of the reference system part under the camera coordinate system according to the conversion relation between the world coordinate system and the camera coordinate system;
calculating a three-dimensional central point coordinate predicted value of a three-dimensional central point of the target part under a camera coordinate system according to the rotation matrix predicted value and the offset matrix predicted value of the target part;
calculating the space geometric distance between the predicted value of the three-dimensional center point coordinate of the target part and the true value of the three-dimensional center point coordinate of the reference system part by the following formula:
d = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}
where d represents the spatial geometric distance, n represents an n-dimensional space, and A_i - B_i denotes the difference between the i-th coordinate value of point A and the i-th coordinate value of point B.
6. An assembly part relative pose estimation and monitoring method based on deep learning as claimed in claim 4, wherein in the step of calculating the relative pose relationship between each other part and the reference system part, the step of calculating the relative rotation matrix and the relative angle is as follows:
defining the rotation matrix as:
R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}
Let the true value of the rotation matrix of the reference system part relative to the camera coordinate system be R_0 and the predicted value of the rotation matrix of the target part be R_i, where i denotes the i-th target part; the inverse matrix R_i^{-1} of R_i is then calculated (for a rotation matrix, R_i^{-1} = R_i^T).
The relative rotation matrix of the target part with respect to the reference system part is then:
R_{0i} = R_i^{-1} \cdot R_0
by a relative matrix R 0i And performing angle conversion to obtain the relative angle between the target part and the reference system part.
CN202110117860.3A 2021-01-28 2021-01-28 Assembly body part relative pose estimation and monitoring method based on deep learning Active CN112801977B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110117860.3A CN112801977B (en) 2021-01-28 2021-01-28 Assembly body part relative pose estimation and monitoring method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110117860.3A CN112801977B (en) 2021-01-28 2021-01-28 Assembly body part relative pose estimation and monitoring method based on deep learning

Publications (2)

Publication Number Publication Date
CN112801977A CN112801977A (en) 2021-05-14
CN112801977B true CN112801977B (en) 2022-11-22

Family

ID=75812448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110117860.3A Active CN112801977B (en) 2021-01-28 2021-01-28 Assembly body part relative pose estimation and monitoring method based on deep learning

Country Status (1)

Country Link
CN (1) CN112801977B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113192054B (en) * 2021-05-20 2023-04-28 清华大学天津高端装备研究院 Method and system for detecting and positioning complicated parts based on 2-3D vision fusion
CN113409392A (en) * 2021-06-28 2021-09-17 广东工业大学 6DoF pose estimation method of reflective workpiece
CN113642060A (en) * 2021-08-12 2021-11-12 武汉维谘信息技术有限公司 Modeling method for EOAT visual workstation and end effector
CN114155610B (en) * 2021-12-09 2023-01-24 中国矿业大学 Panel assembly key action identification method based on upper half body posture estimation
CN114417616A (en) * 2022-01-20 2022-04-29 青岛理工大学 Digital twin modeling method and system for assembly robot teleoperation environment
CN115049730B (en) * 2022-05-31 2024-04-26 北京有竹居网络技术有限公司 Component mounting method, component mounting device, electronic apparatus, and storage medium
CN117115258A (en) * 2023-08-30 2023-11-24 南京航空航天大学 Six-degree-of-freedom pose estimation method for auxiliary maintenance of aero-engine parts

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109029433A (en) * 2018-06-28 2018-12-18 东南大学 Join outside the calibration of view-based access control model and inertial navigation fusion SLAM on a kind of mobile platform and the method for timing
CN111267073A (en) * 2020-03-24 2020-06-12 青岛理工大学 Industrial robot teaching system and method based on augmented reality technology
CN111968129A (en) * 2020-07-15 2020-11-20 上海交通大学 Instant positioning and map construction system and method with semantic perception

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7602963B2 (en) * 2006-01-10 2009-10-13 General Electric Company Method and apparatus for finding anomalies in finished parts and/or assemblies
CN110910452B (en) * 2019-11-26 2023-08-25 上海交通大学 Low-texture industrial part pose estimation method based on deep learning

Also Published As

Publication number Publication date
CN112801977A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112801977B (en) Assembly body part relative pose estimation and monitoring method based on deep learning
CN111563442B (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN112476434B (en) Visual 3D pick-and-place method and system based on cooperative robot
CN111402336B (en) Semantic SLAM-based dynamic environment camera pose estimation and semantic map construction method
CN111325843B (en) Real-time semantic map construction method based on semantic inverse depth filtering
CN108656107B (en) Mechanical arm grabbing system and method based on image processing
EP3633610A1 (en) Learning device, learning method, learning model, estimation device, and grip system
CN105021124B (en) A kind of planar part three-dimensional position and normal vector computational methods based on depth map
JP7410499B2 (en) Digital twin modeling method and system for remote control environment of assembly robots
CN112836734A (en) Heterogeneous data fusion method and device and storage medium
CN111340797A (en) Laser radar and binocular camera data fusion detection method and system
CN111476841B (en) Point cloud and image-based identification and positioning method and system
CN111178250A (en) Object identification positioning method and device and terminal equipment
CN111598946B (en) Object pose measuring method and device and storage medium
CN112907735B (en) Flexible cable identification and three-dimensional reconstruction method based on point cloud
CN113034593B (en) 6D pose labeling method, system and storage medium
CN114355953B (en) High-precision control method and system of multi-axis servo system based on machine vision
CN112734844B (en) Monocular 6D pose estimation method based on octahedron
Yogeswaran et al. 3d surface analysis for automated detection of deformations on automotive body panels
CN111340834B (en) Lining plate assembly system and method based on laser radar and binocular camera data fusion
CN115685160A (en) Target-based laser radar and camera calibration method, system and electronic equipment
CN114972421A (en) Workshop material identification tracking and positioning method and system
CN115578460A (en) Robot grabbing method and system based on multi-modal feature extraction and dense prediction
Rogelio et al. Object detection and segmentation using Deeplabv3 deep neural network for a portable X-ray source model
CN111179271B (en) Object angle information labeling method based on retrieval matching and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant