CN115661252A - Real-time pose estimation method and device, electronic equipment and storage medium - Google Patents

Real-time pose estimation method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN115661252A
Authority
CN
China
Prior art keywords
target
real
bounding rectangle
minimum bounding
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211421534.2A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Lyric Robot Automation Co Ltd
Original Assignee
Guangdong Lyric Robot Intelligent Automation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Lyric Robot Intelligent Automation Co Ltd filed Critical Guangdong Lyric Robot Intelligent Automation Co Ltd
Priority to CN202211421534.2A priority Critical patent/CN115661252A/en
Publication of CN115661252A publication Critical patent/CN115661252A/en
Pending legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a real-time pose estimation method and device, electronic equipment and a storage medium. The method comprises the following steps: initializing and configuring a coordinate reference system of a depth camera and a laser radar; fusing three-dimensional point cloud data acquired by the laser radar with image information acquired by the depth camera, and determining a target minimum bounding rectangle; selecting a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation, and determining target pose information of the target minimum bounding rectangle; determining the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle; and finally, according to the size of the overlapping area, determining the similarity between the target minimum bounding rectangle and the real minimum bounding rectangle and the real-time pose estimation result. The method and the device use the laser radar and the depth camera to acquire pose-related information of the target object, estimate the position and attitude of the target object, improve its positioning accuracy, and can be widely applied in the field of computer technology.

Description

Real-time pose estimation method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a real-time pose estimation method and device, electronic equipment and a storage medium.
Background
The pose estimation capability of a robot is key to target tracking, motion control and autonomous navigation, and is of great significance for improving the level of robot automation.
Simultaneous Localization and Mapping (SLAM) can be described as follows: a robot explores a path from an unknown position in an unknown environment, localizes itself from position information gathered while moving, and simultaneously builds an incremental map on the basis of this self-localization, thereby achieving autonomous positioning and navigation. The main principle of SLAM is to perceive the surrounding environment through the sensors on the robot and to build an environment map while estimating the robot's pose. Current SLAM systems fall mainly into two types: LiDAR-SLAM and Visual-SLAM. Existing SLAM methods are usually based on a single sensor; however, a single sensor cannot overcome the poor target pose estimation accuracy or poor mapping accuracy caused by sensor failure, error accumulation or occlusion.
Disclosure of Invention
In view of this, embodiments of the present invention provide a high-precision real-time pose estimation method and apparatus, an electronic device, and a storage medium.
One aspect of the embodiments of the present invention provides a real-time pose estimation method, including:
initializing and configuring a coordinate reference system of a depth camera and a laser radar;
based on the coordinate reference system of the depth camera and the laser radar, acquiring three-dimensional point cloud data of a target object through the laser radar, and acquiring image information of the target object through the depth camera;
fusing the three-dimensional point cloud data and the image information, and determining a target minimum bounding rectangle through a two-dimensional image edge detection algorithm;
selecting a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation, and determining target pose information of the target minimum bounding rectangle;
determining the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle;
and according to the size of the overlapping area, determining the similarity between the target minimum bounding rectangle and the real minimum bounding rectangle, and determining a real-time pose estimation result of the target object.
Optionally, the initializing configures a coordinate reference system of the depth camera and the lidar, including:
constructing a first coordinate system of a depth camera and a second coordinate system of a laser radar;
determining a transformation relation of the target detection point in different coordinate systems according to the geometric relation between the first coordinate system and the second coordinate system;
converting the coordinate information of the target detection point acquired by the depth camera into a second coordinate system according to the transformation relation;
and determining a rotation and translation relation between a first coordinate system and a second coordinate system according to the coordinate information of the target detection point in different coordinate systems, finishing joint calibration of the coordinate systems according to the rotation and translation relation, and determining a transformation relation between the first coordinate system and the second coordinate system.
Optionally, the obtaining, by the lidar, three-dimensional point cloud data of a target object and obtaining, by the depth camera, image information of the target object based on a coordinate reference system of the depth camera and the lidar includes:
acquiring three-dimensional point cloud data acquired by a laser radar and RGB image data acquired by a depth camera;
generating a BEV image according to the three-dimensional point cloud data, generating an RGB-D image according to the RGB image data, and projecting height information and depth information in the BEV image onto an RGB image plane through multi-sensor combined calibration and coordinate transformation;
for the BEV image, slicing the image plane on the basis of the point cloud and then identifying the attributes of each pixel by computing density and height features; for the RGB-D image, embedding the height information of the projected point cloud into the original RGB image.
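By way of illustration only (this sketch is not taken from the patent; the grid ranges, resolution, number of height slices and function name are assumed values), one way to rasterize a LiDAR point cloud into BEV height-slice and density features in Python is:

```python
import numpy as np

def make_bev_features(points, x_range=(0, 40), y_range=(-20, 20),
                      z_range=(-2, 2), res=0.1, n_slices=4):
    """Rasterize a LiDAR point cloud (N, 3) into BEV height-slice and density maps.

    Each height slice stores the maximum point height inside a grid cell;
    the density channel counts points per cell (log-normalized).
    """
    W = int((x_range[1] - x_range[0]) / res)
    H = int((y_range[1] - y_range[0]) / res)
    height_maps = np.zeros((n_slices, H, W), dtype=np.float32)
    density = np.zeros((H, W), dtype=np.float32)

    # Keep only points inside the BEV region of interest.
    m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
         (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
         (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[m]

    cols = ((pts[:, 0] - x_range[0]) / res).astype(int)
    rows = ((pts[:, 1] - y_range[0]) / res).astype(int)
    slice_h = (z_range[1] - z_range[0]) / n_slices
    slices = ((pts[:, 2] - z_range[0]) / slice_h).astype(int).clip(0, n_slices - 1)

    for r, c, s, z in zip(rows, cols, slices, pts[:, 2]):
        # Height feature: maximum height above the lower z bound per cell and slice.
        height_maps[s, r, c] = max(height_maps[s, r, c], z - z_range[0])
        density[r, c] += 1.0

    # Density feature, normalized logarithmically.
    density = np.log1p(density) / np.log(64.0)
    return height_maps, density
```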
Optionally, the selecting a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation and determining the target pose information of the target minimum bounding rectangle includes:
constructing the pose calculation problem as an algebraically defined nonlinear least squares problem, and solving for the optimal camera pose;
solving the constructed least squares problem and the Jacobian function in combination with the constructed error function; wherein the error function is used to determine the direction of the next optimal iterative estimate of the pose increment;
and determining the target pose information of the target minimum bounding rectangle according to the solution of the least squares problem and the Jacobian function.
Optionally, the determining, according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle, the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle comprises:
determining a first area of the target minimum bounding rectangle and a second area of the real minimum bounding rectangle according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle;
calculating intersection information and union information between the first area and the second area;
and calculating the intersection-over-union (IoU) ratio between the target minimum bounding rectangle and the real minimum bounding rectangle according to the intersection information and the union information, and determining the size of the overlapping area between the two rectangles according to the IoU.
Optionally, the calculation formula of the intersection-over-union ratio is:
IOU = (S_dp ∩ S_gp) / (S_dp ∪ S_gp)
wherein IOU represents the intersection-over-union ratio between the target minimum bounding rectangle and the real minimum bounding rectangle; S_dp represents the first area of the target minimum bounding rectangle; S_gp represents the second area of the real minimum bounding rectangle; ∩ represents the intersection; and ∪ represents the union.
In another aspect, an embodiment of the present invention further provides a real-time pose estimation apparatus, including:
a first module, configured to initialize and configure a coordinate reference system of a depth camera and a laser radar;
a second module, configured to acquire three-dimensional point cloud data of a target object through the laser radar and image information of the target object through the depth camera, based on the coordinate reference system of the depth camera and the laser radar;
a third module, configured to fuse the three-dimensional point cloud data and the image information and determine a target minimum bounding rectangle through a two-dimensional image edge detection algorithm;
a fourth module, configured to select a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation and determine target pose information of the target minimum bounding rectangle;
a fifth module, configured to determine, according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle, the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle;
a sixth module, configured to determine, according to the size of the overlapping area, the similarity between the target minimum bounding rectangle and the real minimum bounding rectangle, and determine a real-time pose estimation result of the target object.
Another aspect of the embodiments of the present invention further provides an electronic device, which includes a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Another aspect of the embodiments of the present invention also provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
Firstly, a coordinate reference system of a depth camera and a laser radar is initialized and configured; then, based on this coordinate reference system, three-dimensional point cloud data of a target object is obtained through the laser radar, and image information of the target object is obtained through the depth camera; after the three-dimensional point cloud data and the image information are fused, a target minimum bounding rectangle is determined through a two-dimensional image edge detection algorithm; a plurality of groups of feature points are then selected from the target minimum bounding rectangle for pose calculation, and target pose information of the target minimum bounding rectangle is determined; the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle is determined according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle; and finally, according to the size of the overlapping area, the similarity between the target minimum bounding rectangle and the real minimum bounding rectangle is determined, and a real-time pose estimation result of the target object is determined. The invention uses the laser radar and the depth camera to acquire pose-related information of the target object, so that the position and attitude of the target object are estimated and its positioning accuracy can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for the description of the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present application, and that other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flowchart illustrating the overall steps provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
In view of the problems in the prior art, an embodiment of the present invention provides a real-time pose estimation method, as shown in FIG. 1. The method of the present invention comprises the following overall steps:
initializing and configuring a coordinate reference system of a depth camera and a laser radar;
based on the coordinate reference system of the depth camera and the laser radar, acquiring three-dimensional point cloud data of a target object through the laser radar, and acquiring image information of the target object through the depth camera;
fusing the three-dimensional point cloud data and the image information, and determining a target minimum bounding rectangle through a two-dimensional image edge detection algorithm;
selecting a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation, and determining target pose information of the target minimum bounding rectangle;
determining the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle; the real minimum bounding rectangle is a reference (ground-truth) value annotated manually by the producer of the dataset;
and according to the size of the overlapping area, determining the similarity between the target minimum bounding rectangle and the real minimum bounding rectangle, and determining a real-time pose estimation result of the target object.
It is understood that the depth camera can measure the depth of the captured scene: for each point in the image, the depth camera provides the distance from the camera as well as the point's two-dimensional coordinates and color information in the 2D image.
A laser radar (lidar) is a radar system that detects characteristic quantities of a target, such as its position and velocity, by emitting a laser beam. Its working principle is to emit a detection signal (a laser beam) towards the target and then compare the received signal reflected from the target (the target echo) with the emitted signal; after appropriate processing, information about the target can be obtained, such as its distance, azimuth, height, speed, attitude and even shape, so that targets such as aircraft and missiles can be detected, tracked and identified.
Optionally, the initializing configures a coordinate reference system of the depth camera and the lidar, including:
constructing a first coordinate system of a depth camera and a second coordinate system of a laser radar;
determining a transformation relation of the target detection point in different coordinate systems according to the geometric relation between the first coordinate system and the second coordinate system;
converting the coordinate information of the target detection point acquired by the depth camera into a second coordinate system according to the transformation relation;
and determining a rotation-translation relationship between a first coordinate system and a second coordinate system according to the coordinate information of the target detection point in different coordinate systems, completing joint calibration of the coordinate systems according to the rotation-translation relationship, and determining a transformation relationship between the first coordinate system and the second coordinate system.
Optionally, the obtaining, by the lidar, three-dimensional point cloud data of a target object and obtaining, by the depth camera, image information of the target object based on a coordinate reference system of the depth camera and the lidar includes:
acquiring three-dimensional point cloud data acquired by a laser radar and RGB image data acquired by a depth camera;
generating a BEV image according to the three-dimensional point cloud data, generating an RGB-D image according to the RGB image data, and projecting height information and depth information in the BEV image onto an RGB image plane through multi-sensor combined calibration and coordinate transformation;
for the BEV image, slicing the image plane on the basis of the point cloud and identifying the attributes of each pixel by computing density and height features; for the RGB-D image, embedding the height information of the projected point cloud into the original RGB image.
The BEV (bird's-eye-view) image is a top-down view centered on the viewpoint.
An RGB-D image can be understood as two images: one is a common RGB three-channel color image; the other is a Depth image. The Depth image is similar to a grayscale image, except that each pixel value is the actual distance from the sensor to the object. Usually the RGB image and the Depth image are registered, so that there is a one-to-one correspondence between their pixels.
Thus, RGB-D = RGB + Depth Map.
The RGB color model is an industry color standard in which a wide range of colors is obtained by varying and superimposing the three color channels red (R), green (G) and blue (B). RGB represents the colors of the red, green and blue channels; this standard covers almost all colors perceivable by human vision and is one of the most widely used color systems at present.
Depth Map: in 3D computer graphics, a depth map is an image or image channel containing information about the distance from the surfaces of scene objects to a viewpoint. A depth map is similar to a grayscale image, except that each pixel value is the actual distance from the sensor to the object. Usually the RGB image and the Depth image are registered, so that there is a one-to-one correspondence between their pixels.
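By way of illustration, the one-to-one pixel correspondence of a registered RGB-D pair lets each depth pixel be back-projected to a 3D point with the pinhole model; the sketch below assumes a metric depth map and known intrinsics f_x, f_y, c_x, c_y (the function name is illustrative):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a registered depth map (H, W), in meters, to camera-frame 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinate grids
    z = depth
    x = (u - cx) * z / fx                            # inverse pinhole model
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)              # (H, W, 3) array of 3D points
```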
Image depth refers to the number of bits used to store each pixel and is also used to measure the color resolution of an image. The image depth determines the number of possible colors per pixel of a color image or the number of possible gray levels per pixel of a gray scale image. It determines the maximum number of colors that can be present in a color image, or the maximum gray level in a gray scale image. For example, if each pixel has 8 bits, the maximum number of gray levels is 2 to the power of 8, i.e., 256.
If the three RGB channels of a color image have 4, 4 and 2 bits per pixel respectively, the maximum number of colors is 2 to the power of (4+4+2), i.e. 1024; that is, the pixel depth is 10 bits and each pixel can take one of 1024 colors.
For example: a picture of size 1024 × 768 with a depth of 16 bits has a data volume of 1.5 MB.
The calculation is as follows:
1024 × 768 × 16 bit = (1024 × 768 × 16) / 8 Byte = 1,572,864 Byte = 1536 KB = 1.5 MB.
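The same arithmetic can be checked with a few lines of Python (purely illustrative):

```python
width, height, bit_depth = 1024, 768, 16
total_bytes = width * height * bit_depth // 8   # 1,572,864 bytes
print(total_bytes / 1024)                        # 1536.0 KB
print(total_bytes / 1024 / 1024)                 # 1.5 MB
```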
Optionally, the selecting a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation and determining the target pose information of the target minimum bounding rectangle includes:
constructing the pose calculation problem as an algebraically defined nonlinear least squares problem, and solving for the optimal camera pose;
solving the constructed least squares problem and the Jacobian function in combination with the constructed error function; wherein the error function is used to determine the direction of the next optimal iterative estimate of the pose increment;
and determining the target pose information of the target minimum bounding rectangle according to the solution of the least squares problem and the Jacobian function.
It should be noted that nonlinear least squares is a parameter estimation method that estimates the parameters of a nonlinear static model using the minimum sum of squared errors as the criterion. Let the model of the nonlinear system be y = f(x, θ), a form commonly used for sensor parameter fitting, where y is the system output, x the input and θ the parameters (each of which may be a vector). "Nonlinear" here means the model is nonlinear in the parameters θ, and does not refer to time-varying relations between the input and output variables. When estimating the parameters, the form f of the model is known, and data (x1, y1), (x2, y2), ..., (xn, yn) are obtained from N experiments. The criterion (objective function) for estimating the parameters is the sum Q of the squared errors between the model and the observations; the nonlinear least squares estimate is the parameter value that minimizes Q.
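As a minimal illustration of this formulation (not the solver used in the patent), the sketch below fits an assumed exponential model y = a·exp(-b·x) to noisy data with SciPy's nonlinear least-squares routine:

```python
import numpy as np
from scipy.optimize import least_squares

# Model y = f(x, theta); here an illustrative exponential decay with two parameters.
def f(x, theta):
    a, b = theta
    return a * np.exp(-b * x)

def residuals(theta, x, y):
    return f(x, theta) - y        # r(theta): predicted minus observed

# Synthetic experiment data (x_i, y_i), i = 1..N
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = f(x, (2.5, 1.3)) + 0.05 * rng.standard_normal(x.size)

# Minimize Q(theta) = sum_i r_i(theta)^2; the Jacobian is estimated numerically.
result = least_squares(residuals, x0=np.array([1.0, 1.0]), args=(x, y))
print(result.x)   # estimated parameters, close to (2.5, 1.3)
```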
In vector calculus, the Jacobian matrix is the matrix of all first-order partial derivatives arranged in a fixed order; its determinant is called the Jacobian determinant. The importance of the Jacobian matrix is that it gives the best linear approximation of a differentiable function near a given point; in this sense, the Jacobian matrix is analogous to the derivative of a multivariate function.
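For illustration, the Jacobian of any differentiable mapping can also be approximated numerically by finite differences; this generic sketch (not part of the patent) shows the idea:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f: R^n -> R^m at x by central differences."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps)
    return J

# Example: f(x, y) = (x*y, x + y**2) has Jacobian [[y, x], [1, 2*y]].
print(numerical_jacobian(lambda p: np.array([p[0] * p[1], p[0] + p[1] ** 2]), [2.0, 3.0]))
```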
Optionally, the determining, according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle, the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle includes:
determining a first area of the target minimum bounding rectangle and a second area of the real minimum bounding rectangle according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle;
calculating intersection information and union information between the first area and the second area;
and calculating the intersection-over-union (IoU) ratio between the target minimum bounding rectangle and the real minimum bounding rectangle according to the intersection information and the union information, and determining the size of the overlapping area between the two rectangles according to the IoU.
Optionally, the calculation formula of the intersection-over-union ratio is:
IOU = (S_dp ∩ S_gp) / (S_dp ∪ S_gp)
wherein IOU represents the intersection-over-union ratio between the target minimum bounding rectangle and the real minimum bounding rectangle; S_dp represents the first area of the target minimum bounding rectangle; S_gp represents the second area of the real minimum bounding rectangle; ∩ represents the intersection; and ∪ represents the union.
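As a simplified illustration, the sketch below computes this IoU for axis-aligned rectangles; the patent's minimum bounding rectangles carry a pose, so a full implementation would intersect rotated rectangles (for example as convex polygons), which is omitted here:

```python
def rect_iou(box_a, box_b):
    """IoU of two axis-aligned rectangles given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)   # S_dp
    area_b = (bx2 - bx1) * (by2 - by1)   # S_gp
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two boxes are treated as the same object when IoU > 0.5 (see step S6 below).
print(rect_iou((0, 0, 10, 10), (5, 5, 15, 15)))   # ~0.1429
```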
In another aspect, an embodiment of the present invention further provides a real-time pose estimation apparatus, including:
a first module, configured to initialize and configure a coordinate reference system of a depth camera and a laser radar;
a second module, configured to acquire three-dimensional point cloud data of a target object through the laser radar and image information of the target object through the depth camera, based on the coordinate reference system of the depth camera and the laser radar;
a third module, configured to fuse the three-dimensional point cloud data and the image information and determine a target minimum bounding rectangle through a two-dimensional image edge detection algorithm;
a fourth module, configured to select a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation and determine target pose information of the target minimum bounding rectangle;
a fifth module, configured to determine, according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle, the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle;
a sixth module, configured to determine, according to the size of the overlapping area, the similarity between the target minimum bounding rectangle and the real minimum bounding rectangle, and determine a real-time pose estimation result of the target object.
Another aspect of the embodiments of the present invention further provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Another aspect of the embodiments of the present invention also provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
The following describes in detail the specific implementation process of the real-time pose estimation method of the present invention:
First, it should be noted that positioning and navigation are key technologies for autonomous mobile service robots, and building a map for localization is regarded as a necessary basis for these functions. The main principle of SLAM is to perceive the surrounding environment through the sensors on the robot and to build an environment map while estimating the robot's pose. Current SLAM systems fall mainly into two types: LiDAR-SLAM and Visual-SLAM. If only a single sensor is used for SLAM, the positioning accuracy is low.
Simultaneous Localization and Mapping (SLAM), also called CML (Concurrent Mapping and Localization), can be described as follows: the robot starts to move from an unknown position in an unknown environment, localizes itself from its position and the map while moving, and simultaneously builds an incremental map on the basis of this self-localization, thereby realizing the autonomous positioning and navigation of the robot.
Aiming at the problems in the prior art, the invention provides a real-time pose estimation method which, in a specific application scenario, constructs a complementary system based on the laser radar and supplemented by multiple sensors such as an RGB-D depth camera and an IMU, and comprises the following steps:
S1: initialize the coordinate reference systems of the RGB-D camera and the lidar, and in particular define a world (global) coordinate system. To simplify the calculation, the reference frame of the mobile robot is assumed to coincide with that of the lidar. LiDAR and cameras detect objects in different data forms. Let P be the target point. In the lidar coordinate system (O_L, X_L, Y_L, Z_L), the coordinates of P are (x_L, y_L, z_L); in the camera coordinate system (O_c, X_c, Y_c, Z_c), the coordinates of P are (x_c, y_c, z_c); and in the image plane coordinate system (X_p O_p Y_p), P projects to the point p = (u, v). The lidar does not acquire (x_L, y_L, z_L) directly, but rather the range r and the angle α; the camera does not acquire (x_c, y_c, z_c) directly, but rather the projection coordinates (u, v) and the corresponding depth information Z_c.
From the geometric relationship between the lidar coordinate system (O_L, X_L, Y_L, Z_L) and the camera coordinate system (O_c, X_c, Y_c, Z_c), the transformation of the point P between the two coordinate systems is obtained, as shown in equation 1:
(x_c, y_c, z_c)^T = R · (x_L, y_L, z_L)^T + T    (1)
where R is the rotation matrix and T is the translation vector. For the classical camera model, the RGB-D camera acquires the coordinate data p = (u, v) and the depth information Z_c, where p is the projection of the point P on the image plane; the relationship between the camera coordinate system and the acquired data u, v, z is given by equation 2:
Z_c · (u, v, 1)^T = K · (x_c, y_c, z_c)^T,  with K = [f_x 0 c_x; 0 f_y c_y; 0 0 1]    (2)
where f_x and f_y are the equivalent focal lengths (in pixel units) along the x-axis and y-axis, c_x and c_y are the principal point coordinates along the x-axis and y-axis, all of which are intrinsic parameters of the camera, and (u, v) is the projection of the target point P on the image plane. According to equations (1) and (2), this embodiment can convert the coordinate information collected by the camera into coordinates in the LiDAR coordinate system. Here the coordinate origins of the LiDAR and the camera are kept on the same Y-axis with a vertical height difference h, so that
z_L = r cos α,  x_L = r sin α,
and substituting these lidar measurements together with the height difference h into equations (1) and (2) yields equation 3 (rendered as an image in the original and not reproduced here).
After the camera parameters f_x, f_y, c_x and c_y are determined, several groups of camera data (u, v, z) and lidar data (the scanning radius r and the angle α relative to the x- and y-axes) are substituted into equation (3); the matrices R and T are obtained by solving the resulting system of linear equations, which completes the joint calibration of the rotation-translation relation between the lidar and camera coordinate systems. The transformation between the camera and lidar coordinate systems is thereby determined and the calibration is finished.
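The patent text obtains R and T by solving a linear system built from equation (3). Purely as an illustrative alternative, assuming a set of matched target points is available in both frames, the closed-form Kabsch/SVD solution below recovers the same rigid transform (the function name and interface are assumptions):

```python
import numpy as np

def solve_rigid_transform(cam_pts, lidar_pts):
    """Estimate R, T such that lidar_pts ~= R @ cam_pts + T from matched 3D points.

    cam_pts, lidar_pts: (N, 3) arrays of the same target points expressed in the
    camera frame and in the LiDAR frame. Solved in closed form (Kabsch/SVD) rather
    than via the linear system of the text; the resulting rigid transform is the same.
    """
    mu_c = cam_pts.mean(axis=0)
    mu_l = lidar_pts.mean(axis=0)
    H = (cam_pts - mu_c).T @ (lidar_pts - mu_l)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct an improper rotation (reflection) if necessary.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    T = mu_l - R @ mu_c
    return R, T
```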
S2: the laser radar measures the shape and contour of the object and generates three-dimensional point cloud data; the camera acquires image information of the object; the image and the point cloud data are fused, the target minimum bounding rectangle is then obtained through a two-dimensional image edge detection algorithm, and a plurality of groups of feature points are selected from the polygon for pose calculation.
Specifically, this embodiment takes the lidar point cloud and the RGB image as input data, from which BEV and RGB-D images are obtained. The height and depth information of the three-dimensional radar point cloud is projected onto the RGB image plane through multi-sensor joint calibration and coordinate transformation. Thus, for the BEV image, the image plane is sliced on the basis of the point cloud, and the attributes of the pixels are then identified by computing density and height features. For the RGB-D image, only the height information of the projected point cloud needs to be embedded into the original RGB image.
The whole process is divided into three steps:
First, the point cloud (X, Y, Z) is mapped onto the original image plane (W, H), as shown in equation 4:
(u, v, 1)^T = M · (X, Y, Z, 1)^T,  with M = P_proj · [R_LC, t_LC; 0, 1]    (4)
where (u, v) are the image coordinates, P_proj is the projection matrix, R_LC is the rotation matrix from the LiDAR to the camera, t_LC is the translation vector, and M is the homogeneous transformation matrix from the LiDAR to the camera. In this embodiment, the LiDAR point cloud coordinates (X, Y, Z) are mapped onto the W × H image plane through equation (4), and the projection matrix M is solved for; the superscript T in (u, v, 1)^T denotes the transpose of the matrix.
Second, the points {(x, y, z) | x ∈ X, y ∈ Y, z ∈ Z} whose projections lie within the image of size W × H are retained. At the same time, the LiDAR points are projected into the camera coordinates, denoted (x_c, y_c, z_c), as shown in equation 5:
(x_c, y_c, z_c)^T = M · (x, y, z, 1)^T    (5)
Finally, z_c is mapped to the range 0-255 and then assigned to the corresponding image coordinates (u, v). In this embodiment, equation (5) follows from equation (4), where (x_c, y_c, z_c) are the camera coordinates and {(x, y, z) | x ∈ X, y ∈ Y, z ∈ Z} are the coordinates of the point cloud points.
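As an illustrative sketch (not the patent's own code), steps one to three could be implemented as follows, under the assumption that M is the 3×4 homogeneous LiDAR-to-image transform of equation (4), so that dividing by the third component gives the pixel coordinates and the third component itself is the depth z_c:

```python
import numpy as np

def project_lidar_to_image(points, M, width, height):
    """Project LiDAR points (N, 3) into the image with the 3x4 homogeneous transform M.

    Returns pixel coordinates (u, v), the depth z_c, and a depth value mapped to
    [0, 255] for embedding into the RGB image, keeping only points that fall inside
    the W x H image. Assumes at least one point projects inside the image.
    """
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])   # (N, 4) homogeneous
    cam = (M @ pts_h.T).T                                        # (N, 3)
    z_c = cam[:, 2]
    valid = z_c > 0                                              # in front of the camera
    u = cam[valid, 0] / z_c[valid]
    v = cam[valid, 1] / z_c[valid]
    z = z_c[valid]

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Map the retained depths to 0-255 for embedding into the image.
    depth_8bit = np.clip(255.0 * (z - z.min()) / max(z.max() - z.min(), 1e-6), 0, 255)
    return u.astype(int), v.astype(int), z, depth_8bit.astype(np.uint8)
```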
S3: the PnP problem is constructed as an algebraically defined nonlinear least squares problem and solved for the optimal camera pose, as shown in equation 6:
δ* = argmin_δ (1/2) Σ_i ||r_i(δ)||² = argmin_δ (1/2) Σ_i ||û_i(δ) - u_i||²    (6)
where δ is the lie-algebra pose, r(δ) is the residual (predicted value minus observed value), û_i(δ) is the predicted value and u_i is the observed value.
S4: the constructed least squares problem is shown in equation 7, and the least squares problem and the Jacobian function are solved:
min_δ (1/2) Σ_{i=1..n} || u_i - (1/s_i) K exp(δ^) P_i ||²    (7)
where K is the camera intrinsic matrix, exp(δ^) the camera pose corresponding to the lie algebra δ, P_i the 3D points and s_i their depths. From the constructed least squares problem and the Jacobian function, the embodiment of the invention obtains the Jacobian expression J shown in equation (8).
S5: the error function determines the direction of the next optimal iterative estimate of the pose increment. Following the pose transformation process, this embodiment expresses the Jacobian through j_0 using the chain rule of equation 8:
J = ∂e/∂δξ = j_0 · (∂P'/∂δξ)    (8)
where e is the error, δξ the lie-algebra pose increment, P' = (X', Y', Z') the point expressed in the current camera frame, and j_0 = ∂e/∂P'. From the above calculation it can be seen that the Jacobian matrices of the direct method and of the feature-point method differ only in j_0. The detailed derivation follows the error function of the pose Jacobian matrix used in SLAM optimization; the result is shown in equation 9:
∂e/∂δξ = - [ f_x/Z'   0   -f_x·X'/Z'²   -f_x·X'·Y'/Z'²   f_x + f_x·X'²/Z'²   -f_x·Y'/Z' ;
             0   f_y/Z'   -f_y·Y'/Z'²   -f_y - f_y·Y'²/Z'²   f_y·X'·Y'/Z'²   f_y·X'/Z' ]    (9)
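A direct transcription of equation (9) into code might look like the sketch below; it assumes the translation-first ordering of the se(3) increment and the observed-minus-predicted error convention, which produces the leading minus sign:

```python
import numpy as np

def reprojection_jacobian(P_cam, fx, fy):
    """2x6 Jacobian of the reprojection error w.r.t. an se(3) pose perturbation.

    P_cam = (X', Y', Z') is the landmark in the current camera frame; the increment
    is ordered (translation, rotation), and the error is observed minus predicted,
    which gives the leading minus sign (one common convention, as in Eq. 9).
    """
    X, Y, Z = P_cam
    Z2 = Z * Z
    J = -np.array([
        [fx / Z, 0.0,    -fx * X / Z2, -fx * X * Y / Z2,      fx + fx * X * X / Z2, -fx * Y / Z],
        [0.0,    fy / Z, -fy * Y / Z2, -fy - fy * Y * Y / Z2, fy * X * Y / Z2,       fy * X / Z],
    ])
    return J
```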
S6: the detected polygon is evaluated with the IoU, i.e. the area of overlap between the detected minimum bounding rectangle (S_dp) and the true minimum bounding rectangle (S_gp) divided by the area of their union. The calculation is shown in equation 10:
IOU = (S_dp ∩ S_gp) / (S_dp ∪ S_gp)    (10)
If the IoU between the detected polygon and the true minimum bounding rectangle is greater than 0.5, they are considered to be the same object; otherwise they are not.
In summary, the real-time pose estimation method of the invention combines the advantages of RGB-D and laser radar. Unlike existing two-stage frameworks or multi-stage pipeline methods, this scheme fuses the image data with the original point cloud data, uses the spatial anchor points of the input 3D point cloud as key points to predict the pose between two sequential frames, then uses a CNN to identify and extract the 3D bounding box, tracks the target object and projects it onto the RGB image to obtain the target MBR (minimum bounding rectangle), then uses a geometric method to calculate the rotation angle and translation distance of the target centroid, and finally performs joint optimization in combination with the IMU, thereby estimating the position and attitude and improving the positioning accuracy of the model.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise indicated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer given the nature, function, and interrelationships of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A real-time pose estimation method is characterized by comprising the following steps:
initializing and configuring a coordinate reference system of a depth camera and a laser radar;
based on the coordinate reference system of the depth camera and the laser radar, acquiring three-dimensional point cloud data of a target object through the laser radar, and acquiring image information of the target object through the depth camera;
fusing the three-dimensional point cloud data and the image information, and determining a target minimum bounding rectangle through a two-dimensional image edge detection algorithm;
selecting a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation, and determining target pose information of the target minimum bounding rectangle;
determining the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle;
and according to the size of the overlapping area, determining the similarity between the target minimum bounding rectangle and the real minimum bounding rectangle, and determining a real-time pose estimation result of the target object.
2. The real-time pose estimation method according to claim 1, wherein the initializing a coordinate reference system of the depth camera and the lidar includes:
constructing a first coordinate system of a depth camera and a second coordinate system of a laser radar;
determining a transformation relation of the target detection point in different coordinate systems according to the geometric relation between the first coordinate system and the second coordinate system;
converting the coordinate information of the target detection point acquired by the depth camera into a second coordinate system according to the transformation relation;
and determining a rotation and translation relation between a first coordinate system and a second coordinate system according to the coordinate information of the target detection point in different coordinate systems, finishing joint calibration of the coordinate systems according to the rotation and translation relation, and determining a transformation relation between the first coordinate system and the second coordinate system.
3. The real-time pose estimation method according to claim 1, wherein the acquiring three-dimensional point cloud data of a target object by the lidar and image information of the target object by the depth camera based on the coordinate reference system of the depth camera and the lidar comprises:
acquiring three-dimensional point cloud data acquired by a laser radar and RGB image data acquired by a depth camera;
generating a BEV image according to the three-dimensional point cloud data, generating an RGB-D image according to the RGB image data, and projecting height information and depth information in the BEV image onto an RGB image plane through multi-sensor combined calibration and coordinate transformation;
for the BEV image, slicing an image plane on the basis of point cloud, and identifying the attribute of a pixel point by calculating a density characteristic and a height characteristic; for an RGB-D image, the height information of the point cloud projection is embedded into the original RGB image.
4. The real-time pose estimation method according to claim 1, wherein the selecting a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation and determining the target pose information of the target minimum bounding rectangle comprises:
constructing the pose calculation problem as an algebraically defined nonlinear least squares problem, and solving for the optimal camera pose;
solving the constructed least squares problem and the Jacobian function in combination with the constructed error function; wherein the error function is used to determine the direction of the next optimal iterative estimate of the pose increment;
and determining the target pose information of the target minimum bounding rectangle according to the solution of the least squares problem and the Jacobian function.
5. The real-time pose estimation method according to claim 1, wherein the determining the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle comprises:
determining a first area of the target minimum bounding rectangle and a second area of the real minimum bounding rectangle according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle;
calculating intersection information and union information between the first area and the second area;
and calculating the intersection-over-union (IoU) ratio between the target minimum bounding rectangle and the real minimum bounding rectangle according to the intersection information and the union information, and determining the size of the overlapping area between the two rectangles according to the IoU.
6. The real-time pose estimation method according to claim 5, wherein
the calculation formula of the intersection-over-union ratio is:
IOU = (S_dp ∩ S_gp) / (S_dp ∪ S_gp)
wherein IOU represents the intersection-over-union ratio between the target minimum bounding rectangle and the real minimum bounding rectangle; S_dp represents the first area of the target minimum bounding rectangle; S_gp represents the second area of the real minimum bounding rectangle; ∩ represents the intersection; and ∪ represents the union.
7. A real-time pose estimation apparatus, comprising:
a first module, configured to initialize and configure a coordinate reference system of a depth camera and a laser radar;
a second module, configured to acquire three-dimensional point cloud data of a target object through the laser radar and image information of the target object through the depth camera, based on the coordinate reference system of the depth camera and the laser radar;
a third module, configured to fuse the three-dimensional point cloud data and the image information and determine a target minimum bounding rectangle through a two-dimensional image edge detection algorithm;
a fourth module, configured to select a plurality of groups of feature points from the target minimum bounding rectangle for pose calculation and determine target pose information of the target minimum bounding rectangle;
a fifth module, configured to determine, according to the target pose information of the target minimum bounding rectangle and the real pose information of the real minimum bounding rectangle, the size of the overlapping area between the target minimum bounding rectangle and the real minimum bounding rectangle;
a sixth module, configured to determine, according to the size of the overlapping area, the similarity between the target minimum bounding rectangle and the real minimum bounding rectangle, and determine a real-time pose estimation result of the target object.
8. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program realizes the method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the storage medium stores a program which is executed by a processor to implement the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the method according to any of claims 1 to 6 when executed by a processor.
CN202211421534.2A 2022-11-14 2022-11-14 Real-time pose estimation method and device, electronic equipment and storage medium Pending CN115661252A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211421534.2A CN115661252A (en) 2022-11-14 2022-11-14 Real-time pose estimation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211421534.2A CN115661252A (en) 2022-11-14 2022-11-14 Real-time pose estimation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115661252A true CN115661252A (en) 2023-01-31

Family

ID=85020475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211421534.2A Pending CN115661252A (en) 2022-11-14 2022-11-14 Real-time pose estimation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115661252A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883502A (en) * 2023-09-05 2023-10-13 深圳市智绘科技有限公司 Method, device, medium and equipment for determining camera pose and landmark point
CN116883502B (en) * 2023-09-05 2024-01-09 深圳市智绘科技有限公司 Method, device, medium and equipment for determining camera pose and landmark point
CN117974793A (en) * 2024-03-28 2024-05-03 广东电网有限责任公司佛山供电局 Monocular nut 6D gesture estimation method, monocular nut 6D gesture estimation device and gesture management system

Similar Documents

Publication Publication Date Title
CN110243360B (en) Method for constructing and positioning map of robot in motion area
CN110853075B (en) Visual tracking positioning method based on dense point cloud and synthetic view
Wei et al. A non-contact measurement method of ship block using image-based 3D reconstruction technology
Alismail et al. Automatic calibration of a range sensor and camera system
Herbst et al. Toward online 3-d object segmentation and mapping
EP2808842A2 (en) An apparatus and method for tracking and reconstructing three-dimensional objects
CN111123242B (en) Combined calibration method based on laser radar and camera and computer readable storage medium
WO2014153429A1 (en) Indoor navigation system and method
KR20210119417A (en) Depth estimation
Vidas et al. Real-time mobile 3D temperature mapping
JP2014063475A (en) Information processor, information processing method, and computer program
Li et al. Automatic targetless LiDAR–camera calibration: a survey
Wang et al. Acoustic camera-based pose graph slam for dense 3-d mapping in underwater environments
KR102490521B1 (en) Automatic calibration through vector matching of the LiDAR coordinate system and the camera coordinate system
CN113570662B (en) System and method for 3D localization of landmarks from real world images
Naudet-Collette et al. Constrained RGBD-SLAM
Khurana et al. Extrinsic calibration methods for laser range finder and camera: A systematic review
CN115661252A (en) Real-time pose estimation method and device, electronic equipment and storage medium
CN110619661A (en) Method for measuring volume of outdoor stock ground raw material based on augmented reality
Huang et al. A joint calibration method for the 3D sensing system composed with ToF and stereo camera
Hu et al. Tescalib: Targetless extrinsic self-calibration of lidar and stereo camera for automated driving vehicles with uncertainty analysis
CN117136382A (en) Modeling an environment using image data
Li et al. Geodetic coordinate calculation based on monocular vision on UAV platform
Baligh Jahromi et al. Layout slam with model based loop closure for 3d indoor corridor reconstruction
Lindzey et al. Extrinsic calibration between an optical camera and an imaging sonar

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination