CN112734843A

CN112734843A - Monocular 6D pose estimation method based on regular dodecahedron

Info

Publication number: CN112734843A
Application number: CN202110022822.XA
Authority: CN
Inventors: 孙昊; 谭英伦; 段伦辉; 崔睿; 吴梦坤
Original assignee: Hebei University of Technology
Current assignee: Hebei University of Technology
Priority date: 2021-01-08
Filing date: 2021-01-08
Publication date: 2021-04-30
Anticipated expiration: 2041-01-08
Also published as: CN112734843B

Abstract

The invention discloses a monocular 6D pose estimation method based on a regular dodecahedron. The method combines the ArUco code with the regular dodecahedron, then uses a monocular camera to collect images of the regular dodecahedron marked by the ArUco code, calculates the spatial three-dimensional coordinates of the central point of the ArUco code, obtains sparse point cloud of an inscribed sphere of the ArUco code, compares the virtual regular dodecahedron at the camera coordinates by utilizing multivariate nonlinear fitting and an iterative closest point algorithm, further calculates the spatial displacement and the rotation angle of the current regular dodecahedron, and indirectly calculates the pose of a measured object. The method overcomes the high requirements of the existing method on illumination, environment and equipment, the detection is rapid, and the stability is improved; a large amount of effort is not required to be invested in template acquisition or model training in the early stage, the pose estimation cost is obviously reduced, the pose estimation precision is ensured, the universality is enhanced, and the pose estimation of large-amplitude object motion can be carried out.

Description

Monocular 6D pose estimation method based on regular dodecahedron

Technical Field

The invention belongs to the field of image recognition pose detection, and particularly relates to a monocular 6D pose estimation method based on a regular dodecahedron.

Background

Pose estimation of a target object is one of key technologies for ensuring that the robot realizes random operation in a working space. At present, the industrial automation industry is developed rapidly, the requirements of real-time detection and feedback of the state of the robot are continuously improved, and the target pose estimation based on vision has important significance for improving the performance of the robot. The monocular vision system only uses one camera, has a simple structure and low cost, and is widely applied. The main methods for estimating the pose of the target at present comprise:

and estimating the pose by using a feature point matching method. The specific method comprises the steps of firstly extracting characteristic points in the image, matching the characteristic points in different images, and estimating the change of the target posture according to the motion of the characteristic points. A method for estimating the pose of an Autonomous vehicle using feature point extraction and matching techniques is proposed in the documents G.Lu, X.Wong and J.McBride.from Mapping to Localization A Complex frame to visual Estimate Position and Attitude for Autonomous Vehicles [ J ].2019 IEEE International Conference on Image Processing (ICIP), Taipei, China,2019pp.3103-3107, doi:10.1109/ICIP.2019.8803326. However, the feature point matching technique has its own limitations, and the feature point tracking will be greatly affected in the case of poor lighting conditions.

The method based on deep learning is used, the deep learning is widely applied in recent years, and the effect of identifying the pose of an object in an image is achieved by training a deep neural network and constructing a loss function. In the documents "REDMON J, DIVVAL S, GIRSHICK R, et al. you only look on:. Unifield, real-time object detection [ C ]// Proceedings of the IEEE conference on computer vision and pattern recognition.2016:779 and 788", the YoLO network is adopted, but the method based on deep learning needs a large amount of sample resources to construct a large amount of sample data and label the real pose, needs a large amount of computing resources to train the network, is difficult to control the cost, is mostly purposefully developed, and cannot meet the universality.

And constructing a three-dimensional template by using a template-based method, and searching the position closest to the actual position for estimating the current target position. For example, in AUBRY M, MATURANA D, EFROS A, et al, setting 3D channels, expressor part-based 2D-3D alignment using large data set of CAD models [ C ] Proceedings of the IEEE conference on computer vision and pattern registration.2014, 3762 and 3769, a large number of three-dimensional CAD models are used to locate the classification of the target in the image as an alignment problem from two-dimensional image to three-dimensional model. However, the template accuracy depends on the size of the template library, and one template is suitable for one scene, and the universality is poor.

Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to solve the technical problem of providing a monocular 6D pose estimation method based on a regular dodecahedron.

The technical scheme for solving the technical problem is to provide a monocular 6D pose estimation method based on a regular dodecahedron, and the method is characterized by comprising the following steps of:

firstly, marking the regular dodecahedron by using an Aruco code to obtain the Aruco code marked regular dodecahedron; calibrating a monocular camera, and observing the regular dodecahedron marked by the Aruco code through the calibrated monocular camera;

secondly, positioning a plurality of observed Aruco codes in the current frame and the initial frame respectively, and then obtaining the coding id of all the observed Aruco codes in the current frame and the initial frame respectively_i1,2,3, n and the spatial coordinates a of the central point of the ArUco code_i(x_i,y_i,z_i) 1,2,3, n, n represents the number of observed ArUco codes for a frame;

thirdly, performing multivariate nonlinear fitting on the spatial coordinates of the central point of the ArUco code of the current frame obtained in the second step to calculate the coordinates of the spherical center of the inscribed sphere of the regular dodecahedron of the current frame as the spatial coordinates t of the body center of the regular dodecahedron of the current frame₀Performing multivariate nonlinear fitting on the space coordinate of the central point of the Aruco code of the initial frame to calculate the spherical center coordinate of the regular dodecahedron inscribed sphere of the initial frame as the initialSpatial coordinate t of body center of frame regular dodecahedron_O(ii) a Then by the formula t_rel＝t₀-t_OObtaining the relative space coordinate t of the body center of the regular dodecahedron of the current frame relative to the initial frame_rel；

Constructing a virtual regular dodecahedron at the origin of a camera coordinate system, enabling the body center of the virtual regular dodecahedron to coincide with the origin of the camera coordinate system, marking the virtual regular dodecahedron by using the Aruco codes and obtaining the codes of all the Aruco codes of the virtual regular dodecahedron and the space coordinate A of the center point of the Aruco codes_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2,3, ·, 12; then the space coordinate of the central point of the Aruco code of the current frame obtained in the second step and the code thereof are compared with A_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2, 3.. 12, calculating a rotation matrix R of the regular dodecahedron of the current frame by an iterative closest point method, and combining the spatial coordinates of the central point of the Aruco code of the initial frame and the coding thereof with A_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2,3, 12 calculates the rotation matrix R of the regular dodecahedron of the initial frame by the iterative closest point method_OThen by the formula

Calculating to obtain a relative rotation matrix R of the regular dodecahedron of the current frame relative to the initial frame_rel；

According to the relative rotation matrix R_relAnd relative spatial coordinate t_relObtaining a pose matrix T of the current frame regular dodecahedron relative to the initial frame regular dodecahedron_rel：

In the formula (2), SE (3) represents an attribute of the matrix;

fourthly, firstly, a measured object coordinate system is constructed, the regular dodecahedron is fixed on the measured object, and the relative space coordinate t of the regular dodecahedron relative to the measured object is calculated_refAnd relative rotation matrix R_refObtaining a pose matrix T of the regular dodecahedron relative to the measured object_ref：

And then the relative pose matrix T of the measured object is as follows:

and obtaining the spatial attitude change of the measured object of the current frame relative to the initial frame.

Compared with the prior art, the invention has the beneficial effects that:

(1) the method creatively combines the Aruco code with the regular dodecahedron, expands the pose identification from two-dimensional label identification to three-dimensional space body identification, greatly improves the pose identification performance, uses a monocular camera to collect images of the regular dodecahedron marked by the Aruco code, calculates the spatial three-dimensional coordinate of the central point of the Aruco code by using a PnP algorithm, obtains sparse point cloud of an inscribed sphere, compares the virtual regular dodecahedron at the coordinate of the camera by using multivariate nonlinear fitting and an iterative closest point algorithm, converts the position solving and rotation matrix solving problems into an optimization problem, further calculates the space displacement and the rotation angle of the current regular dodecahedron, and indirectly calculates the pose of a measured object.

(2) The method overcomes the high requirements of the existing method on illumination, environment and equipment, the detection is rapid, the stability is improved, and the average image recognition time of each frame is about 10 ms; a large amount of effort is not required to be invested in template acquisition or model training in the early period, so that the pose estimation cost is obviously reduced, the pose estimation precision is ensured, and the universality is enhanced; the three-dimensional object is used as a mark, and the pose estimation of the object large-amplitude motion in the space can be carried out.

(3) The regular dodecahedron adopted by the invention is a three-dimensional object, and can realize the position and posture identification of any spatial angle and large-amplitude motion. Each surface area of the regular dodecahedron is larger than that of the regular icosahedron with the same volume, and each surface can put down larger Aruco codes for computer visual detection, so that the identification effectiveness can be ensured at a farther position.

(4) Compared with a method based on feature points, the method for identifying the Aruco codes by the monocular camera overcomes the sensitivity of the method to factors such as illumination, environment and the like, and can stably operate in most environments.

(5) Compared with a template-based method, the method provided by the invention has the advantages that the target pose is indirectly measured by using the regular dodecahedron, the universality is stronger, a large amount of template libraries do not need to be constructed, and the memory space and the workload are saved.

(6) Compared with a deep learning-based method, the algorithm does not need relatively strong hardware performance, saves cost and meets the universality.

Drawings

FIG. 1 is a schematic representation of an Aruco code-tagged regular dodecahedron of the present invention;

FIG. 2 is a schematic view of a regular dodecahedron and its inscribed sphere in accordance with the present invention;

FIG. 3 is a schematic diagram showing the position relationship between the virtual regular dodecahedron and the actual regular dodecahedron marked by Aruco code according to the present invention.

Detailed Description

Specific examples of the present invention are given below. The specific examples are only intended to illustrate the invention in further detail and do not limit the scope of protection of the claims of the present application.

The invention provides a monocular 6D pose estimation method (short method) based on a regular dodecahedron, which is characterized by comprising the following steps:

firstly, marking the regular dodecahedron by using an Aruco code to obtain the Aruco code marked regular dodecahedron (shown in figure 1); calibrating a monocular camera, and observing the regular dodecahedron marked by the Aruco code through the calibrated monocular camera;

preferably, in the first step, the process of labeling the dodecahedron by using the ArUco code is as follows: 12 Aruco codes are generated through C + + codes by using an OpenCV image processing library, the 12 Aruco codes are respectively attached to 12 faces of the regular dodecahedron in the coding sequence, and the geometric center of the Aruco codes is overlapped with the geometric midpoint of the attached faces.

Preferably, in the first step, the monocular camera is calibrated by using a checkerboard standard calibration board and by using an OpenCV image processing library to calibrate the internal parameters of the camera, so as to generate a camera internal parameter matrix K;

the camera internal reference matrix K is a matrix for converting the coordinates in the space into plane coordinates, and the standardized formula is as follows:

in the formula (1), f_x、f_yIs the focal length parameter of the camera; c. C_x、c_yIs the amount of translation of the pixel.

Secondly, positioning a plurality of observed Aruco codes in the current frame and the initial frame respectively by using an image processing algorithm, and obtaining the coding ids of all observed Aruco codes in the current frame and the initial frame respectively by using the image processing algorithm_i1,2,3, n and obtaining the spatial coordinates A of the center points of all observed Aruco codes in the current frame and the initial frame by using a pose calculation method_i(x_i,y_i,z_i) 1,2,3, n, n represents the number of observed ArUco codes for a frame;

the current frame is an image of the current position of the regular dodecahedron, and the initial frame is an image of the initial position of the regular dodecahedron shot by the monocular camera;

preferably, in the second step, the positioning process of the ArUco code is as follows: carrying out gray processing, median filtering and self-adaptive threshold segmentation on an image acquired by a monocular camera, extracting an Aruco code contour from the segmented image, and filtering non-convex and non-square images to further obtain a candidate region meeting the requirement.

Preferably, in the second step, the coding of the ArUco code is obtained: firstly, perspective transformation is applied to a candidate region to obtain a standard square mark, an Otsu threshold processing algorithm is used for separating black and white color positions, and the black and white color positions are divided into different cells according to the size of the mark; and determining the color of each cell according to the color corresponding to the pixel with the largest number on each cell, and finally converting the color into a binary value to determine the code of the mark.

Preferably, in the second step, the spatial coordinates of the central point of the ArUco code are obtained: the spatial coordinates of the central point of the Aruco code are obtained according to the four corner point information of the candidate area, the pose calculation function of the Aruco library function in the OpenCV framework is used, the four corner point information is used as input, and the spatial coordinates of the central point of the Aruco code under a camera coordinate system can be obtained by using a PnP mode (namely a 2D-3D matching algorithm).

Thirdly, performing multivariate nonlinear fitting on the space coordinates of the center points of all ArUco codes of the current frame obtained in the second step to calculate the coordinates of the spherical center of the regular dodecahedron inscribed sphere of the current frame as the space coordinates t of the body center of the regular dodecahedron of the current frame₀＝(x₀,y₀,z₀) Performing multivariate nonlinear fitting on the spatial coordinates of the central point of the Aruco code of the initial frame to calculate the spherical center coordinates of the regular dodecahedron inscribed sphere of the initial frame as the spatial coordinates t of the body center of the regular dodecahedron of the initial frame_O＝(x_O,y_O,z_O) (ii) a Then the space coordinate t of the body center of the regular dodecahedron of the current frame is used₀＝(x₀,y₀,z₀) Subtracting the spatial coordinate t of the body center of the regular dodecahedron of the initial frame_O＝(x_O,y_O,z_O) By the formula t_rel(x,y,z)＝t₀(x₀,y₀,z₀)-t_O(x_O,y_O,z_O) Obtaining the relative space coordinate t of the body center of the regular dodecahedron of the current frame relative to the initial frame_rel；

Constructing a virtual regular dodecahedron at the origin of a camera coordinate system, wherein the position relationship between the virtual regular dodecahedron and the actual regular dodecahedron is shown in figure 3, the body center of the virtual regular dodecahedron is superposed with the origin of the camera coordinate system Oc, and then using ArUco codes to mark the virtual regular dodecahedron to obtain the codes of all ArUco codes of the virtual regular dodecahedron and the space coordinates A of the center point of the ArUco codes_Wi(x_Wi,y_Wi,z_Wi,id_Wi),i＝12,3, ·, 12; then the space coordinate of the central point of the Aruco code of the current frame obtained in the second step and the code A thereof_i(x_i,y_i,z_i,id_i) 1,2,3, n and a_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2, 3.. 12, calculating a rotation matrix R of the regular dodecahedron of the current frame by an iterative closest point method, and combining the spatial coordinates of the central point of the Aruco code of the initial frame and the coding thereof with A_Wi(x_Wi,y_Wi,z_Wi,id_Wi) 1,2,3, 12 calculates the rotation matrix R of the regular dodecahedron of the initial frame by the iterative closest point method_OThen by the formula

In the formula (2), SE (3) represents a plum group, is a mathematical symbolic expression and represents the attribute of a matrix;

preferably, in the third step, the spatial coordinates of the body center of the regular dodecahedron of the current frame are obtained: since the geometric center of the Aruco code coincides with the geometric midpoint of the surface to which the Aruco code is attached, and the intersection point of the inscribed sphere of the regular dodecahedron and the regular dodecahedron is exactly positioned at the geometric center of each surface of the regular dodecahedron (as shown in FIG. 2), the spatial coordinate A of the central point of each Aruco code on the partial surface of the regular dodecahedron observed by the monocular camera obtained in the second step at the current frame can be used_i(x_i,y_i,z_i) 1,2,3, n is regarded as the tangent sphere point (x) of the regular dodecahedron of the current frame_i,y_i,z_i) The sparse point cloud coordinates of (1); when the monocular camera obtains more than three ArUco codesThe spatial coordinates of the central point satisfy the data condition of multivariate nonlinear fitting (generally, the monocular camera can obtain 6-8 Aruco code central point coordinates for each observation, namely, the condition can be satisfied for each observation), and the spherical point (x) of the inscribed sphere of the regular dodecahedron of the current frame_i,y_i,z_i) And the radius of the inscribed sphere is taken as a parameter, the standard equation of the inscribed sphere is as follows:

(x_i-x₀)²+(y_i-y₀)²+(z_i-z₀)²-R_{inner part} ²＝0 (3)

In the formula (3), R_{Inner part}Is the radius of the inscribed sphere of the regular dodecahedron,

a is the edge length of a regular dodecahedron;

the spherical center of the spherical multivariate nonlinear function is fitted to the formula (3) by using nonlinear least squares, and a loss function is constructed as follows:

since the requirement of multivariate nonlinear fitting on the initial point is high, the initial point is easy to fall into a local minimum value, and therefore the known spherical point coordinate (x) is utilized_i,y_i,z_i) Generating a group of initial values R of multivariate nonlinear fitting for the coordinates of the spherical center of the inscribed sphere_{First stage}：

Then R is put_{First stage}Obtaining a value of the loss function J in the formula (4), and then continuously changing R_{First stage}The value of (a) is that the value of the loss function J in the formula (4) is continuously reduced, and a group of (x) which enables the loss function J to be minimum can be obtained after multiple iterative solution₀,y₀,z₀) I.e. the coordinates of the spherical center of the inscribed sphere of the current frame (i.e. the space of the body center of the regular dodecahedron of the current frame under the coordinate system of the camera)Inter-coordinate t₀＝(x₀,y₀,z₀))。

Preferably, in the third step, the spatial coordinates of the regular dodecahedron body center of the initial frame are obtained: the space coordinate of each ArUco code central point on the partial surface of the regular dodecahedron observed by the monocular camera in the initial frame is taken as the sparse point cloud coordinate of the spherical surface point of the inscribed sphere of the regular dodecahedron in the initial frame, and the space coordinate t of the body center of the regular dodecahedron in the initial frame under the camera coordinate system can be obtained in the same way as the space coordinate of the body center of the regular dodecahedron in the current frame_O＝(x_O,y_O,z_O)。

Preferably, in the third step, the rotation matrix of the current frame regular dodecahedron is obtained:

search A_WiAnd matching the codes of the virtual regular dodecahedron ArUco codes corresponding to the codes of the actual regular dodecahedron ArUco codes of the current frame obtained in the second step, so as to obtain a group of matching points in the space:

(A_Wi,A_i),i＝1,2,3,...,n(6)

the set of matching points consists of the spatial coordinates of the central point of the ArUco code observed on the actual regular dodecahedron and the spatial coordinates of the central point of the ArUco code corresponding to the actual regular dodecahedron, and the centroiding of the matching points is respectively carried out through a formula (7) (namely, the spatial coordinates of the central point of the ArUco code observed on the actual regular dodecahedron and the centroiding of the spatial coordinates of the central point of the ArUco code corresponding to the virtual regular dodecahedron are translated to the original points):

obtained q_iDe-centrode, q, of the spatial coordinates of the center point of the actual regular dodecahedral ArUco code_i' is a centroid removing point of the space coordinate of the central point of the virtual regular dodecahedral ArUco code, and then q is obtained_iAnd q is_iThe relationship of' is as follows:

defining an error term:

the rotation matrix R of the current frame regular dodecahedron with the error term E minimized is obtained according to the formula (9), the first term in the formula (9)

Independent of the optimization objective, R in the second term^TI is also independent of the optimization objective, the error term becomes:

in the formula (10), tr represents a trace of the matrix;

to solve the optimization objective R in equation (10), a matrix is defined:

performing singular value decomposition on W in equation (11) to obtain:

W＝UΣV^T (12)

in equation (12), Σ is the eigenvalue matrix of W, U and V are diagonal matrices, and the rotation matrix R of the regular dodecahedron of the current frame is:

R＝UV^T (13)。

preferably, in the third step, the rotation matrix of the regular dodecahedron of the initial frame is obtained: search A_WiThe coding of the virtual regular dodecahedron ArUco code corresponding to the coding of the actual regular dodecahedron ArUco code of the initial frame obtained in the second step is matched with the same coding, and the rotation matrix R of the initial frame regular dodecahedron can be obtained in the same way as the obtaining of the rotation matrix of the current frame regular dodecahedron_O。

And fourthly, indirectly calculating the 6D pose of the measured object according to the space relative position relation between the regular dodecahedron and the measured object.

Preferably, the fourth step is specifically: firstly, a measured object coordinate system is constructed, the regular dodecahedron is fixed on the measured object, and the relative position and posture relation between the regular dodecahedron and the measured object is calculated (namely, the relative space coordinate t of the regular dodecahedron relative to the measured object is calculated_refAnd relative rotation matrix R_ref) Obtaining a pose matrix T of the regular dodecahedron relative to the measured object_ref：

And then the relative pose matrix T of the measured object is as follows:

and obtaining the space attitude change of the measured object of the current frame relative to the initial frame.

The camera calibration method, the Aruco code generation method, the image processing algorithm, the Otsu threshold processing algorithm and the pose calculation method are known in the field.

Nothing in this specification is said to apply to the prior art.

Claims

1. A monocular 6D pose estimation method based on a regular dodecahedron is characterized by comprising the following steps:

firstly, labeling the regular dodecahedron by using Aruco morphine to obtain the Aruco morpholine labeled regular dodecahedron; calibrating a monocular camera, and observing the regular dodecahedron marked by the Aruco through the calibrated monocular camera;

secondly, respectively positioning a plurality of observed Aruco in the current frame and the initial frame, and respectively obtaining the coding ids of all the observed Aruco in the current frame and the initial frame_i，i＝1，2，3，.., spatial coordinates A of the n and Aruco center points_i(x_i，y_i，z_i) N, n represents the number of ArUco observed for a frame;

thirdly, performing multivariate nonlinear fitting on the spatial coordinates of the central point of the Aruco moleculae of the current frame obtained in the second step to calculate the coordinates of the spherical center of the inscribed sphere of the regular dodecahedron of the current frame as the spatial coordinates t of the body center of the regular dodecahedron of the current frame₀Performing multivariate nonlinear fitting on the spatial coordinates of the central point of the Aruco molars of the initial frame to calculate the spherical center coordinates of the tangent sphere of the dodecahedron of the initial frame as the spatial coordinates t of the body center of the dodecahedron of the initial frame_O(ii) a Then by the formula t_rel＝t₀-t_OObtaining the relative space coordinate t of the body center of the regular dodecahedron of the current frame relative to the initial frame_rel；

Constructing a virtual regular dodecahedron at the origin of a camera coordinate system, enabling the body center of the virtual regular dodecahedron to coincide with the origin of the camera coordinate system, marking the virtual regular dodecahedron by using Aruco to obtain all the programs of the Aruco of the virtual regular dodecahedron and the space coordinates A of the center points of the Aruco_Wi(x_Wi，y_Wi，z_Wi，id_Wi) 1,2,3, ·, 12; then the space coordinate of the central point of the current frame Aruco obtained in the second step and the coding thereof are compared with A_Wi(x_Wi，y_Wi，z_Wi，id_Wi) 1,2,3, 12 calculating a rotation matrix R of a regular dodecahedron of the current frame by an iterative closest point method, and compiling the spatial coordinates of the central point of the initial frame Aruco and A_Wi(x_Wi，y_Wi，z_Wi，id_Wi) 1,2,3, 12 calculates the rotation matrix R of the regular dodecahedron of the initial frame by the iterative closest point method_OThen by the formula

According to the relative rotation matrix R_relAnd relativeSpatial coordinate t_relObtaining a pose matrix T of the current frame regular dodecahedron relative to the initial frame regular dodecahedron_rel：

In the formula (2), SE (3) represents an attribute of the matrix;

And then the relative pose matrix T of the measured object is as follows:

2. The monocular 6D pose estimation method based on a regular dodecahedron as claimed in claim 1, wherein in the first step, the process of marking the regular dodecahedron by using ArUco code is: 12 Aruco codes are generated by using an OpenCV image processing library, the 12 Aruco codes are respectively attached to 12 faces of the regular dodecahedron in the coding sequence, and the geometric center of the Aruco codes is coincident with the geometric center of the face to which the Aruco codes are attached.

3. The monocular 6D pose estimation method based on a regular dodecahedron as claimed in claim 1, wherein in the first step, a chessboard grid standard calibration board is used for calibrating monocular cameras, and an OpenCV image processing library is used for calibrating internal parameters of the cameras to generate a camera internal parameter matrix K;

4. The monocular 6D pose estimation method based on a regular dodecahedron as claimed in claim 1, wherein in the second step, the positioning process of the ArUco code is: carrying out gray processing, median filtering and adaptive threshold segmentation on the image acquired by the monocular camera in sequence, extracting an Aruco code contour from the segmented image, and filtering the non-convex and non-square images to further obtain a candidate region meeting the requirement.

5. The monocular pose estimation method based on regular dodecahedron of claim 4, wherein in the second step, the coded acquisition of Aruco code: firstly, perspective transformation is applied to a candidate region to obtain a standard square mark, an Otsu threshold processing algorithm is used for separating black and white color positions, and the black and white color positions are divided into different cells according to the size of the mark; and determining the color of each cell according to the color corresponding to the pixel with the largest number on each cell, and finally converting the color into a binary value to determine the code of the mark.

6. The monocular 6D pose estimation method based on a regular dodecahedron as claimed in claim 4, wherein in the second step, the spatial coordinates of the central point of the ArUco code are obtained: and solving the spatial coordinates of the central point of the Aruco code according to the four corner point information of the candidate area, using a pose calculation function of an Aruco library function in an OpenCV frame, taking the four corner point information as input, and obtaining the spatial coordinates of the central point of the Aruco code under a camera coordinate system by using a PnP mode.

7. The monocular 6D pose estimation method based on a regular dodecahedron as claimed in claim 1, wherein in the third step, the spatial coordinates of the body center of the regular dodecahedron of the current frame are obtained: the space coordinate of the central point of the Aruco code on the partial surface of the regular dodecahedron observed by the current frame obtained in the second step is taken as the inscribed sphere spherical point (x) of the regular dodecahedron of the current frame_i，y_i，z_i) The sparse point cloud coordinates of (1); when the monocular camera obtains the space coordinate more than three central points of the ArUco code, the spherical surface point (x) of the inscribed sphere of the regular dodecahedron of the current frame is obtained_i，y_i，z_i) And the radius of the inscribed sphere is taken as a parameter, the standard equation of the inscribed sphere is as follows:

(x_i-x₀)²+(y_i-y₀)²+(z_i-z₀)²-R_{inner part} ²＝0 (3)

a is the edge length of a regular dodecahedron;

since the requirement of multivariate nonlinear fitting on the initial point is high, the initial point is easy to fall into a local minimum value, and therefore the known spherical point coordinate (x) is utilized_i，y_i，z_i) Generating a group of initial values R of multivariate nonlinear fitting for the coordinates of the spherical center of the inscribed sphere_{First stage}：

Then R is put_{First stage}Obtaining a value of the loss function J in the formula (4), and then continuously changing R_{First stage}The value of (a) is that the value of the loss function J in the formula (4) is continuously reduced, and a group of (x) which enables the loss function J to be minimum can be obtained after multiple iterative solution₀，y₀，z₀) I.e. the coordinates of the spherical center of the inscribed sphere of the current frame.

8. The monocular 6D pose estimation method based on a regular dodecahedron as claimed in claim 1, wherein in the third step, the rotation matrix of the regular dodecahedron of the current frame is obtained:

(A_Wi，A_i)，i＝1，2，3，...，n (6)

the group of matching points consists of the spatial coordinates of the central point of the Aruco code observed on the actual regular dodecahedron and the spatial coordinates of the central point of the Aruco code corresponding to the actual regular dodecahedron, and the centroiding of the matching points is respectively removed by the following formula (7):

defining an error term:

in the formula (10), tr represents a trace of the matrix;

to solve the optimization objective R in equation (10), a matrix is defined:

performing singular value decomposition on W in equation (11) to obtain:

W＝U∑V^T (12)

in equation (12), Σ is the eigenvalue matrix of W, U and V are diagonal matrices, and then the rotation matrix R of the regular dodecahedron of the current frame is:

R＝UV^T (13)。