CN109872343B - Weak texture object posture tracking method, system and device - Google Patents

Info

Publication number
CN109872343B
CN109872343B
Authority
CN
China
Prior art keywords
contour
image
point
posture
matching
Prior art date
Legal status
Active
Application number
CN201910105602.6A
Other languages
Chinese (zh)
Other versions
CN109872343A (en)
Inventor
刘力
李中源
张小军
Current Assignee
Shichen Information Technology (Shanghai) Co., Ltd.
Original Assignee
Shichen Information Technology (Shanghai) Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Shichen Information Technology (Shanghai) Co., Ltd.
Priority to CN201910105602.6A
Publication of CN109872343A
Application granted
Publication of CN109872343B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a method, a system and a device for tracking the posture of a weak texture object. The method comprises the following steps: performing posture prediction on an input current frame image to obtain a predicted object posture; acquiring object contour shape data according to the predicted object posture; searching and matching contour shape features on the current live-action image according to the object contour shape data; and performing posture calculation according to the matching information. The method uses only the 3D model of the weak texture object and the live-action image of the target object and does not need to rely on a large amount of prior data for training, so real-time posture tracking can be realized on common computing devices.

Description

Weak texture object posture tracking method, system and device
Technical Field
The embodiment of the invention relates to the field of computer vision, in particular to a method, a system and a device for tracking the posture of a weak texture object.
Background
Three-dimensional object posture tracking is a typical task in the field of computer vision. By calculating the 6-degree-of-freedom posture data (R, t) of a three-dimensional object in a scene in real time, the position and orientation of the object become clearly known to the outside world. Three-dimensional object posture tracking has numerous applications in fields such as augmented reality, industrial assistance and robotics. Three-dimensional objects can be divided into richly textured and weakly textured ones according to the richness of their surface texture.
A richly textured object carries abundant texture information, which can be extracted and described by mature computer vision algorithms; matching pairs with the 3D model are then obtained by methods such as 2D image feature matching, posture estimation and tracking are performed, and finally the 6-degree-of-freedom posture of the object is obtained.
Weak texture objects are also very common, ranging from screws, wrenches and various parts in a production workshop to almost all household appliances, automobiles and consumer electronics in daily life; they can be said to fill people's daily life and social production. Tracking the 6-degree-of-freedom posture of a weak texture object can solve many very practical problems: for example, placing virtual button labels on household appliances to realize an interactive manual, virtually decorating an automobile for multi-dimensional display, assisting workers in assembling parts in industry, and helping a robot grasp objects accurately. These augmented reality and industrial production applications require the 6-degree-of-freedom posture of the weakly textured object in real time, so that auxiliary digital content or tags can be accurately placed at the real position of the object, allowing a user to quickly obtain useful information in the same field of view and improving productivity and efficiency. However, the surface of a weak texture object is often uniform in color, lacks rich texture, and may even exhibit highlights and specular reflection; reliable visual features are scarce or entirely absent, so traditional computer vision algorithms cannot solve the 6-degree-of-freedom posture tracking problem well.
Chinese patent application CN107679537A (prior art 1) discloses a texture-free spatial target pose estimation algorithm based on contour point ORB feature matching. It extracts ORB features on the contours of multi-viewpoint projection images of a texture-free object, matches them against ORB features on the contour in the target image to be detected to obtain 2D-3D matching information, and finally solves the posture parameters with the aid of the feature-matching accuracy.
US patent US9892543B2 (prior art 2) discloses a system and method for estimating the pose of a texture-less object by machine learning. First, the 3D model of the weak texture object is projection-rendered, enumerating different viewpoints, illumination conditions and scale distances as exhaustively as possible to obtain a series of training data sets. During training, a tree structure is used to index the image blocks of the target object. The input target image to be detected is matched and retrieved against the data set, and the 6-degree-of-freedom posture of the target object in the current image is finally inferred.
However, in the process of implementing the invention, the inventor finds that the prior art has at least the following problems:
prior art 1 (CN107679537A) uses a conventional computer vision algorithm. Its disadvantage is that, for the contours of the projection images of most texture-less objects, ORB features cannot extract enough reliable visual features. During ORB feature matching on the image to be detected, the object contour information in a real-environment image is easily disturbed by noise, background, occlusion and the like, so feature uniqueness is poor and the feature-matching outlier rate is extremely high. The 6-degree-of-freedom posture solution therefore fails very easily in a real environment.
Prior art 2 (US9892543B2) uses a newer machine learning algorithm. Its disadvantage is that a very large training data set must be constructed for each weakly textured object to cover as many common situations and postures as possible. The success rate and precision depend on the scale of the training data set, so the method is costly in practice, lacks convenient expandability, and cannot be quickly applied to another weak texture object. A further disadvantage is that the matching and pose inference process is time-consuming and demands substantial computing resources, so it cannot track the posture in real time on general computing devices (such as smartphones and AR glasses).
In addition, weak texture objects in a real environment typically face occlusion, background interference, highlight materials, specular reflection and other conditions that seriously degrade computer vision or machine learning algorithms. The prior-art methods are not robust to these problems and are prone to tracking failure or posture errors in use.
It should be noted that the above background description is only for the sake of clarity and complete description of the technical solutions of the present invention and for the understanding of those skilled in the art. Such solutions are not considered to be known to the person skilled in the art merely because they have been set forth in the background section of the invention.
Disclosure of Invention
In view of the foregoing problems, an object of the embodiments of the present invention is to provide a method, a system and a device for tracking the posture of a weak texture object that use only a 3D model of the weak texture object and a live-action image of the target object, without relying on a large amount of prior data for training, so that real-time posture tracking can be implemented on common computing devices.
In order to achieve the above object, an embodiment of the present invention provides a method for tracking the posture of a weak texture object, including: performing posture prediction on an input current frame image to obtain a predicted object posture, which specifically comprises: inputting a current frame image; if the current frame is the initial frame, adjusting the system default posture as the predicted object posture, i.e., rotation matrix R_0 = R_adjust and translation vector t_0 = t_adjust. The default posture is represented by a default rotation matrix and translation vector, denoted R_init, t_init, whose values are obtained by optimizing the following criterion:
(R_init, t_init) = argmax_(R, t) Area(Proj(M; R, t)),  s.t.  Proj(Ori(M); R, t) = Ori(I)
wherein M represents the 3D model of the object; I represents the image; Ori(M) represents the center of the object 3D model; Ori(I) represents the center of the image; Proj(·; R, t) denotes projection according to posture (R, t); and Area(·) computes the area of a region;
the default posture is adjusted by a certain step (ΔR, Δt):
ΔR = exp((δr_x, δr_y, δr_z)^T),  Δt = (δt_x, δt_y, δt_z)^T
wherein ΔR is a preset rotation step and Δt is a preset translation step; δt_x, δt_y, δt_z are the unit translation step lengths; δr_x, δr_y, δr_z are the unit rotation-vector step lengths; and exp(·) is the exponential map of the SO(3) group, converting a rotation vector into a rotation matrix;
the adjusted posture is computed according to the SE(3) group as follows:
R_adjust = ΔR · R_init,  t_adjust = ΔR · t_init + Δt
so the predicted object posture (R_0, t_0) is R_0 = R_adjust, t_0 = t_adjust. Or, if the current frame is not the initial frame, the posture of the previous frame is used as the predicted object posture: rotation matrix R_0 = R_prev, translation vector t_0 = t_prev. Acquiring object contour shape data according to the predicted object posture specifically comprises: performing projection rendering of the object 3D model according to the predicted object posture and extracting the projection contour; sampling the contour information and computing statistics of the contour-local color information to obtain the object contour shape data. Contour shape features are then searched and matched on the current live-action image according to the object contour shape data, and posture calculation is performed according to the matching information.
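For orientation, the overall loop implied by these steps can be sketched as follows; this is a minimal illustration in which every helper name is a placeholder assumption, not a function defined by the invention:

```python
# Minimal sketch of the tracking loop implied by the method; every helper
# name here is a placeholder assumption, not an API defined by the patent.
def track(frames, model):
    pose = None                                      # no pose yet -> treat frame as initial
    for image in frames:
        if pose is None:
            pose = adjust_default_pose(model, image)     # initial-frame prediction
        shape = contour_shape_data(model, pose)          # project, sample, histogram
        matches = search_and_match(image, shape)         # 2D-3D matching pairs
        pose = solve_pose_ransac(matches)                # None on tracking failure
        if pose is not None:
            yield pose                                   # 6-degree-of-freedom posture (R, t)
```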
The embodiment of the invention also provides a system for tracking the posture of a weak texture object, comprising: an image acquisition unit for acquiring a live-action image of the target object; a human-computer interaction unit for displaying the acquired live-action image of the target object on a display unit; a data storage unit for storing the 3D model of the object; a rendering unit for performing projection rendering of the object 3D model according to the predicted object posture, specifically for rendering a binarized foreground image and extracting the 2D contour of the foreground image; a communication synchronization unit for synchronizing the rendered plane image with the image acquired by the image acquisition unit; and a computing unit for performing posture prediction on the input current frame image to obtain a predicted object posture, acquiring object contour shape data according to the predicted object posture, searching and matching contour shape features on the current live-action image according to the object contour shape data, and performing posture calculation according to the matching information. The computing unit is specifically configured to: judge whether the input current frame image is an initial frame; if the current frame is the initial frame, adjust the system default posture as the predicted object posture, i.e., rotation matrix R_0 = R_adjust and translation vector t_0 = t_adjust, where the default posture is represented by a default rotation matrix and translation vector, denoted R_init, t_init, whose values are obtained by optimizing the following criterion:
(R_init, t_init) = argmax_(R, t) Area(Proj(M; R, t)),  s.t.  Proj(Ori(M); R, t) = Ori(I)
wherein M represents the 3D model of the object; I represents the image; Ori(M) represents the center of the object 3D model; Ori(I) represents the center of the image; Proj(·; R, t) denotes projection according to posture (R, t); and Area(·) computes the area of a region;
the default posture is adjusted by a certain step (ΔR, Δt):
ΔR = exp((δr_x, δr_y, δr_z)^T),  Δt = (δt_x, δt_y, δt_z)^T
wherein ΔR is a preset rotation step and Δt is a preset translation step; δt_x, δt_y, δt_z are the unit translation step lengths; δr_x, δr_y, δr_z are the unit rotation-vector step lengths; and exp(·) is the exponential map of the SO(3) group, converting a rotation vector into a rotation matrix;
the adjusted posture is computed according to the SE(3) group as follows:
R_adjust = ΔR · R_init,  t_adjust = ΔR · t_init + Δt
so the predicted object posture (R_0, t_0) is R_0 = R_adjust, t_0 = t_adjust. If the current frame is not the initial frame, the posture of the previous frame is used as the predicted object posture: rotation matrix R_0 = R_prev, translation vector t_0 = t_prev.
The embodiment of the present invention further provides a weak texture object posture tracking device, including a memory and a processor, wherein: the memory is used for storing code and data; and the processor is configured to execute the code stored in the memory to implement the aforementioned method steps.
As can be seen from the above, the embodiment of the invention combines the projection contour of the weak texture object's 3D model with the color information on the live-action image to build a statistical description of local shape features; since no contour extraction is performed on the live-action image itself, interference of live-action image noise with the edge contour information is avoided. Second, when collecting statistics of the local features around the object contour, the embodiment performs discretized, conditional interval sampling that retains strongly structured local regions and thins out weakly structured and flat regions, which copes well with background interference, occlusion and similar situations. Third, the whole system relies only on the outer contour of the object and not on its inner surface, so under complex conditions such as highlight materials and specular reflection the influence of interference is reduced to a minimum, giving high system robustness. In addition, the system requires as input only the 3D model of the weak texture object and the live-action image of the target object, with no training on large amounts of prior data, and therefore has very high expandability and adaptability. Compared with traditional operators such as ORB and SIFT, the local feature statistics used here have low computational complexity, enabling real-time posture tracking on common computing devices (such as smartphones and AR glasses).
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for tracking a posture of a weak texture object according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a process of predicting a pose of an input current frame image according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of acquiring object contour shape data according to a predicted object posture according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a process of searching and matching contour shape features on a current live-action image according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of posture calculation according to matching information according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an architecture of a weak texture object pose tracking system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another weak texture object pose tracking apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The embodiment of the invention provides a weak texture object posture tracking method that mainly comprises object posture prediction, object contour shape data acquisition, live-action image contour shape searching and matching, and posture calculation. As shown in fig. 1, the method specifically comprises the following steps:
step 1, predicting the posture of an input current frame image to obtain a predicted object posture.
The details of this step are shown in fig. 2.
Step 101, updating the predicted object pose.
Step 102, inputting a current frame image.
Step 103, judging whether the current frame is an initial frame; if so, entering step 104; if not, entering step 105.
In this embodiment, if the system has never entered the tracking process (for example, the camera has just been turned on) or tracking failed on the previous frame, the current frame is considered the initial frame; otherwise it is a non-initial frame.
Step 104, adjusting the system default posture to be used as the predicted object posture.
In this embodiment, for the initial frame, the system default posture is adjusted and used as the predicted object posture, i.e., (R_0, t_0) = (R_adjust, t_adjust), where R_0 = R_adjust and t_0 = t_adjust; R denotes a rotation matrix and t a translation vector.
The default posture adjustment method is as follows:
The default posture is represented by a default rotation matrix and translation vector, denoted R_init, t_init, whose values are obtained by optimizing the following criterion:
(R_init, t_init) = argmax_(R, t) Area(Proj(M; R, t)),  s.t.  Proj(Ori(M); R, t) = Ori(I)
wherein M represents the 3D model of the object; I represents the image; Ori(M) represents the center of the object 3D model; Ori(I) represents the center of the image; Proj(·; R, t) denotes projection according to posture (R, t); and Area(·) computes the area of a region.
That is, under the default posture R_init, t_init, the projected center of the model M lies exactly at the center of the image, and the projected region occupies the largest possible area of the frame.
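A minimal sketch of one way to realize this criterion for a pinhole camera is given below; the identity initial rotation, the known model bounding radius and the principal point at the image center are assumptions for illustration, not requirements of the invention:

```python
import numpy as np

def default_pose(model_radius, fx, width, height):
    """Place the model centre on the optical axis (so it projects to the
    image centre) at a depth where the projection roughly fills the frame."""
    R_init = np.eye(3)                       # assumed canonical orientation
    # pinhole model: projected radius ~ fx * model_radius / z;
    # choose z so that it equals half the shorter image side
    z = 2.0 * fx * model_radius / min(width, height)
    t_init = np.array([0.0, 0.0, z])         # centre on the optical axis
    return R_init, t_init
```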
The default posture is adjusted by a certain step (ΔR, Δt):
ΔR = exp((δr_x, δr_y, δr_z)^T),  Δt = (δt_x, δt_y, δt_z)^T
wherein ΔR is a preset rotation step and Δt a preset translation step; δr_x, δr_y, δr_z are the unit rotation-vector step lengths; δt_x, δt_y, δt_z are the unit translation step lengths; and exp(·) is the exponential map of the SO(3) group, converting a rotation vector into a rotation matrix. SO(3) is the 3-dimensional special orthogonal group; any 3-dimensional rotation matrix belongs to SO(3), and operations follow the SO(3) group structure.
The adjusted posture is computed according to the SE(3) group as follows:
R_adjust = ΔR · R_init,  t_adjust = ΔR · t_init + Δt
wherein SE(3) is the 3-dimensional special Euclidean group, representing rigid-body transformation, i.e., rotation plus translation.
The predicted object posture (R_0, t_0) is then (R_adjust, t_adjust), i.e., R_0 = R_adjust, t_0 = t_adjust.
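As a concrete illustration of the SO(3) exponential map and the SE(3) composition above, the following sketch implements them with plain numpy via the Rodrigues formula; the step values are tuning parameters chosen by the caller:

```python
import numpy as np

def so3_exp(r):
    """exp(): rotation vector r = (dr_x, dr_y, dr_z) -> 3x3 rotation matrix,
    via the Rodrigues formula."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta                            # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])         # skew-symmetric matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def adjust_pose(R_init, t_init, delta_r, delta_t):
    """R_adjust = dR * R_init, t_adjust = dR * t_init + dt (SE(3) composition)."""
    dR = so3_exp(np.asarray(delta_r, dtype=float))
    return dR @ R_init, dR @ t_init + np.asarray(delta_t, dtype=float)
```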
Step 105, using the pose of the previous frame as the predicted pose of the object.
In this embodiment, for a non-initial frame, the posture of the previous frame (R_prev, t_prev) is used directly as the predicted object posture (R_0, t_0), i.e., R_0 = R_prev, t_0 = t_prev.
Step 2, acquiring object contour shape data according to the predicted object posture.
The details of this step are shown in fig. 3.
Step 201, updating the predicted object posture.
Step 202, performing projection rendering of the object 3D model according to the predicted object posture.
In this embodiment, projection rendering of the object 3D model is performed according to the predicted object posture, R_0 = R_adjust, t_0 = t_adjust or R_0 = R_prev, t_0 = t_prev.
Step 203, extracting the projection contour.
In this embodiment, to reduce computational complexity, only a binarized foreground map Fg may be rendered, and the 2D contour of Fg is then extracted. Preferably, only the outer contour is extracted; a border-following contour extraction method may be used, such as that of "Topological Structural Analysis of Digitized Binary Images by Border Following", although other methods may also be used without limitation.
It should be noted that, in this embodiment, contour extraction is performed on the projected binary image rather than on the live-action target image, so it is not disturbed by live-action image noise and the obtained contour is guaranteed to be accurate.
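A sketch of this step with OpenCV is shown below; cv2.findContours implements exactly the border-following extraction cited above, while render_foreground is a hypothetical renderer returning the binarized mask of the model under the predicted posture:

```python
import cv2

def projected_contour(render_foreground, R, t):
    """Render the binary foreground map Fg under posture (R, t) and return
    its outer 2D contour as an (N, 1, 2) array of pixel coordinates."""
    fg = render_foreground(R, t)                 # uint8 mask, 0 background / 255 object
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL,   # outer contours only
                                   cv2.CHAIN_APPROX_NONE)   # keep every boundary pixel
    return max(contours, key=cv2.contourArea)    # largest outer contour
```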
Step 204, sampling the contour information.
In this embodiment, after the projected 2D contour is obtained, its information needs to be sampled.
In practice, any sampling incurs some loss of shape information. The commonly used simple approach is to sample at uniform intervals along the contour line to obtain a series of 2D contour sampling points. However, uniform-interval sampling ignores the shape characteristics of the object in the actual field of view and cannot preserve structural features well: when sampling is sparse, the shape loss is large; when sampling is dense, the computation multiplies.
In this embodiment, a non-uniform sampling scheme is therefore adopted: an adaptive sampling step is set according to the local structural features of the contour, strong structural features are retained, and flat or weakly structured regions are sampled sparsely, preserving the original shape to the maximum extent without increasing the sampling density.
The adaptive sampling step calculation method is as follows:
λ_i = λ_0 · Weigh(θ),  θ = Angle(P_{i-1})
wherein λ_i is the adaptive sampling step; λ_0 is the base step; θ = Angle(P_{i-1}) is the angle formed by the contour on the two sides of the previous sampling point P_{i-1}; and Weigh(θ) is the weighting coefficient for this angle.
An angle θ closer to 180° indicates a weakly structured, flatter region; conversely, the region is strongly structured and sharper. Different weighting coefficients are set according to the included angle θ: a smaller coefficient in strongly structured regions and a larger one in weakly structured regions.
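The sketch below illustrates this non-uniform sampling rule; the linear mapping from θ to the weight and its bounds are illustrative assumptions (the invention only requires smaller weights at sharp points and larger ones on flat stretches):

```python
import numpy as np

def angle_at(contour, i, span=5):
    """Angle theta (radians) at contour[i], formed by the points `span`
    steps away on either side; theta near pi means a flat region."""
    a = contour[(i - span) % len(contour)] - contour[i]
    b = contour[(i + span) % len(contour)] - contour[i]
    c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return np.arccos(np.clip(c, -1.0, 1.0))

def sample_contour(contour, base_step=4.0, w_min=0.5, w_max=3.0):
    """Walk the closed contour (an (N, 2) float array) with adaptive step
    lambda_i = lambda_0 * Weigh(theta): dense at corners, sparse on flats."""
    samples, i = [], 0
    while i < len(contour):
        samples.append(contour[i])
        theta = angle_at(contour, i)
        w = w_min + (w_max - w_min) * theta / np.pi   # sharp -> w_min, flat -> w_max
        i += max(1, int(round(base_step * w)))
    return np.array(samples)
```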
Step 205, computing statistics of the contour-local color information to obtain the object contour shape data.
In this embodiment, the contour points obtained by sampling the contour information with the adaptive step are {P_i(u, v; x, y, z) | i = [1, N]}, N contour points in total, where each contour point P_i contains both the 3D point position (x, y, z) in the object 3D model and the 2D position (u, v) on the projection image.
After the N contour points are obtained, statistics of the contour-local color information are needed to obtain the final object contour shape data. In this embodiment, the statistics are computed on the live-action image of the target object in the previous frame.
First, the normal direction vector n_i of each contour point P_i is calculated:
n_i = (v_i - v_{i+1}, u_{i+1} - u_i)
wherein (u_i, v_i) are the pixel coordinates of contour point P_i and (u_{i+1}, v_{i+1}) those of P_{i+1}; the formula expresses that n_i is perpendicular to the line connecting P_i and P_{i+1}.
Then, pixels on the live-action image are counted along the positive and negative directions of the normal vector, and color histograms are computed as the local shape description of the contour point, recorded as the internal shape and the external shape respectively.
Finally, the internal shapes and external shapes of all contour points are aggregated to form the object contour shape data Appearance_obj:
Appearance_i = {IntAppearance_i, ExtAppearance_i},  IntAppearance_i = Histo({I(P_i - k·α·n_i) | k = 1, …, L}),  ExtAppearance_i = Histo({I(P_i + k·α·n_i) | k = 1, …, L})
Appearance_obj = {∑_i IntAppearance_i, ∑_i ExtAppearance_i}
wherein Appearance_i denotes the local shape of contour point P_i, IntAppearance_i its internal shape and ExtAppearance_i its external shape; Histo(·) is the color-histogram statistic function; and α is a step factor.
If L pixels are counted along each of the positive and negative normal directions, the shape feature of a point is described by 2L + 1 pixels, so the total number of pixels examined is N × (2L + 1). Compared with traditional local features such as ORB and SIFT, which must extract features over the whole image, the amount of computation is reduced dramatically, making the contour-local color statistics in this embodiment very efficient.
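The following sketch implements this statistic for one contour point; the 8-bins-per-channel quantization and the treatment of out-of-image samples are illustrative assumptions:

```python
import numpy as np

def normal_at(p, p_next):
    """n_i = (v_i - v_{i+1}, u_{i+1} - u_i), normalized; p, p_next are (u, v)."""
    n = np.array([p[1] - p_next[1], p_next[0] - p[0]], dtype=float)
    return n / (np.linalg.norm(n) + 1e-9)

def side_histogram(image, p, direction, L=10, alpha=1.0, bins=8):
    """Color histogram of the L pixels stepped alpha-at-a-time from p
    along `direction` on a BGR image."""
    h, w = image.shape[:2]
    hist = np.zeros((bins, bins, bins))
    for k in range(1, L + 1):
        u, v = (p + k * alpha * direction).round().astype(int)
        if 0 <= u < w and 0 <= v < h:            # ignore out-of-image samples
            b = image[v, u] // (256 // bins)     # quantize each color channel
            hist[b[0], b[1], b[2]] += 1
    s = hist.sum()
    return hist / s if s > 0 else hist

def point_appearance(image, p, p_next, L=10, alpha=1.0):
    """(IntAppearance_i, ExtAppearance_i) of contour point p."""
    n = normal_at(p, p_next)
    return side_histogram(image, p, -n, L, alpha), side_histogram(image, p, n, L, alpha)
```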
Step 3, searching and matching contour shape features on the current live-action image according to the object contour shape data.
The details of this step are shown in fig. 4.
Step 301, updating the predicted object pose.
Step 302, projecting the contour sampling points P_i onto the current live-action image.
In this embodiment, according to the predicted object posture, R_0 = R_adjust, t_0 = t_adjust or R_0 = R_prev, t_0 = t_prev, the contour sampling points {P_i(u, v; x, y, z) | i = [1, N]} are projected onto the current live-action image.
Step 303, calculating the gradient of each pixel on the current live-action image along the positive direction and the negative direction of the normal of the contour sampling point.
In this embodiment, the normal direction vector n_i of each contour sampling point P_i is calculated as in step 205 above, and the gradient of each pixel is computed on the current live-action image along the positive and negative directions of the normal.
Step 304, judging whether the pixel gradient exceeds a threshold; if so, executing step 305; if not, returning to step 303.
In this embodiment, a pixel gradient threshold is set in advance. If the pixel gradient exceeds the threshold, the pixel is taken as a candidate point; otherwise, the search continues along the normal for candidate points satisfying the gradient threshold. The candidate points are denoted {C_i | i = [1, m]}, assuming m candidate points are found in total.
Step 305, taking the pixel as a candidate point, counting local color information on both sides of the candidate point, comparing it with the object contour shape data, and finding matching points according to similarity.
In this embodiment, color histograms are computed on both sides of the normal centered at each candidate point C_i, giving the candidate's local shape data Appearance_c, which is then compared with the computed object contour shape data Appearance_obj using a similarity measure. A similarity measure comprehensively evaluates how close two things are: the closer they are, the larger the measure; the farther apart, the smaller. Many similarity measures exist and one is generally chosen according to the practical problem; for example, the L1 or L2 distance or other measures may be used here, without particular limitation.
In this embodiment, the most similar candidate point is found by the similarity measure; a similarity threshold may also be set, and a point whose similarity exceeds the threshold is regarded as a matching point.
Step 306, inserting the found matching points into 2D-3D matching pairs.
In this embodiment, the found matching points are inserted into 2D-3D matching pairs. The 2D coordinates of a matching pair come from the matching point, and the 3D coordinates come from the corresponding contour sampling point on the object 3D model.
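A sketch of this search-and-match step is given below, reusing normal_at and side_histogram from the previous sketch; the search range, Sobel gradient, correlation-based histogram similarity and the thresholds are illustrative choices, not values prescribed by the invention:

```python
import cv2
import numpy as np

# Per frame, a gradient-magnitude map can be prepared once, e.g.:
#   gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
#   grad_mag = cv2.magnitude(cv2.Sobel(gray, cv2.CV_32F, 1, 0),
#                            cv2.Sobel(gray, cv2.CV_32F, 0, 1))

def match_point(image, grad_mag, p2d, p3d, n,
                ref_int, ref_ext, search=15, grad_thresh=30.0, sim_thresh=0.6):
    """Search along +/- normal n from the projected sample p2d for edge
    candidates, compare their side histograms with the reference shape,
    and return a (2D point, 3D point) pair or None."""
    best, best_sim = None, sim_thresh
    for k in range(-search, search + 1):
        c = (p2d + k * n).round().astype(int)
        u, v = c
        if not (0 <= u < image.shape[1] and 0 <= v < image.shape[0]):
            continue
        if grad_mag[v, u] < grad_thresh:             # not a candidate point C_i
            continue
        h_int = side_histogram(image, c.astype(float), -n)
        h_ext = side_histogram(image, c.astype(float), n)
        sim = 0.5 * (cv2.compareHist(h_int.ravel().astype(np.float32),
                                     ref_int.ravel().astype(np.float32),
                                     cv2.HISTCMP_CORREL)
                     + cv2.compareHist(h_ext.ravel().astype(np.float32),
                                       ref_ext.ravel().astype(np.float32),
                                       cv2.HISTCMP_CORREL))
        if sim > best_sim:
            best, best_sim = ((u, v), p3d), sim      # keep the most similar candidate
    return best
```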
Step 4, performing posture calculation according to the matching information.
The details of this step are shown in fig. 5.
Step 401, solving PnP using a RANSAC mechanism according to the 2D-3D matching pairs.
In this embodiment, the PnP (Perspective-n-Point) algorithm solves for the position and posture of the object from the projection relationship of multiple 2D-3D matching pairs. RANSAC (RANdom SAmple Consensus) iteratively estimates the parameters of an optimal mathematical model from an observed data set containing inliers (data that fit the model) and outliers (data that cannot fit the model).
Step 402, judging whether the number of inliers exceeds a preset threshold; if so, entering step 403, where the current frame is tracked successfully and the 6-degree-of-freedom posture is output; if not, the current frame tracking fails, and the process returns to updating the predicted posture and repeats.
In this embodiment, after PnP is solved under the RANSAC mechanism, the number of inliers is determined from the reprojection error, and it is judged whether this number exceeds a preset threshold, for example 8.
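A sketch of this solve with OpenCV is shown below; the 3-pixel reprojection error and the zero-distortion assumption are illustrative, while the inlier threshold of 8 follows the example above:

```python
import cv2
import numpy as np

def solve_pose(pts3d, pts2d, K, min_inliers=8):
    """RANSAC PnP over the 2D-3D matching pairs; returns (R, t) or None."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d, dtype=np.float32),   # 3D coordinates from the model contour
        np.asarray(pts2d, dtype=np.float32),   # matched 2D points on the live image
        K, None,                               # intrinsics; no lens distortion assumed
        reprojectionError=3.0)                 # inlier test on reprojection error
    if not ok or inliers is None or len(inliers) < min_inliers:
        return None                            # tracking failed for this frame
    R, _ = cv2.Rodrigues(rvec)                 # rotation vector -> rotation matrix
    return R, tvec
```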
Step 403, successfully tracking the current frame and outputting a 6-degree-of-freedom posture;
Step 404, inputting the next frame.
As shown in fig. 6, the embodiment of the present invention further provides a weak texture object posture tracking system whose input is the live-action image of the target object collected by the image acquisition unit. The live-action image of the target object is displayed on the display unit of the device through the human-computer interaction unit, and the object 3D model is stored in the data storage unit. During posture tracking, the rendering unit performs projection rendering of the object 3D model; the rendered plane image is synchronized, via the communication synchronization unit, with the live-action image of the target object acquired by the image acquisition unit; the two are fed together into the computing unit for 6-degree-of-freedom posture tracking calculation, finally yielding the posture data of the target object.
The computing unit is specifically configured to perform attitude prediction on an input current frame image to obtain a predicted object attitude, obtain object contour shape data according to the predicted object attitude, perform search and matching of contour shape features on a current live-action image according to the object contour shape data, and perform attitude calculation according to matching information.
It should be noted that the specific technical details of the above-mentioned weak texture object posture tracking system are similar to the weak texture object posture tracking method, and in particular, the novel non-uniform adaptive step size contour point sampling algorithm and the efficient shape feature description and matching algorithm of the object contour involved in the processes of object posture prediction, object contour shape data acquisition, live-action image contour shape search and matching, and 6-degree-of-freedom posture calculation by the computing unit are similar to the aforementioned weak texture object posture tracking method, and therefore detailed description is omitted.
As shown in fig. 7, an embodiment of the present invention further provides a weak texture object pose tracking apparatus, including a memory and a processor, where:
a memory 701 for storing code and data;
a processor 702 for executing the code stored in the memory to implement the weak texture object posture tracking method described above.
The specific technical details of the above-mentioned weak texture object posture tracking device are similar to those of the above-mentioned weak texture object posture tracking method, and therefore detailed descriptions thereof are omitted.
As can be seen from the above, the embodiment of the invention combines the projection contour of the weak texture object's 3D model with the color information on the live-action image to build a statistical description of local shape features; since no contour extraction is performed on the live-action image itself, interference of live-action image noise with the edge contour information is avoided. Second, when collecting statistics of the local features around the object contour, the embodiment performs discretized, conditional interval sampling that retains strongly structured local regions and thins out weakly structured and flat regions, which copes well with background interference, occlusion and similar situations. Third, the whole system relies only on the outer contour of the object and not on its inner surface, so under complex conditions such as highlight materials and specular reflection the influence of interference is reduced to a minimum, giving high system robustness. In addition, the system requires as input only the 3D model of the weak texture object and the live-action image of the target object, with no training on large amounts of prior data, and therefore has very high expandability and adaptability. Compared with traditional operators such as ORB and SIFT, the local feature statistics used here have low computational complexity, enabling real-time posture tracking on common computing devices (such as smartphones and AR glasses).
Those skilled in the art will understand that all or part of the steps in the method according to the above embodiments may be implemented by a program instructing related hardware to complete, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, etc.) or a processor (processor) to execute all or part of the steps in the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments.
Finally, it should be noted that: the foregoing description of various embodiments of the invention is provided to those skilled in the art for the purpose of illustration. It is not intended to be exhaustive or to limit the invention to a single disclosed embodiment. Various alternatives and modifications of the invention, as described above, will be apparent to those skilled in the art. Thus, while some alternative embodiments have been discussed in detail, other embodiments will be apparent or relatively easy to derive by those of ordinary skill in the art. The present invention is intended to embrace all such alternatives, modifications, and variances which have been discussed herein, and other embodiments which fall within the spirit and scope of the above application.

Claims (10)

1. A weak texture object posture tracking method is characterized by comprising the following steps:
performing posture prediction on an input current frame image to obtain a predicted object posture, which specifically comprises the following steps: inputting a current frame image; if the current frame is the initial frame, adjusting the system default posture as the predicted object posture, with rotation matrix R_0 = R_adjust and translation vector t_0 = t_adjust; or, if the current frame is not the initial frame, using the posture of the previous frame as the predicted object posture, with rotation matrix R_0 = R_prev and translation vector t_0 = t_prev;
Acquiring object contour shape data according to the predicted object posture, specifically comprising: performing projection rendering on the 3D model of the object according to the predicted object posture, and extracting a projection outline; sampling the outline information, and carrying out statistics on the local color information of the outline to obtain object outline shape data;
searching and matching contour shape characteristics on the current live-action image according to the contour shape data of the object;
carrying out attitude calculation according to the matching information;
wherein the default posture is represented by a default rotation matrix and translation vector, denoted R_init, t_init, whose values are obtained by optimizing the following criterion:
(R_init, t_init) = argmax_(R, t) Area(Proj(M; R, t)),  s.t.  Proj(Ori(M); R, t) = Ori(I)
wherein M represents the 3D model of the object; I represents the image; Ori(M) represents the center of the object 3D model; Ori(I) represents the center of the image; Proj(·; R, t) denotes projection according to posture (R, t); and Area(·) computes the area of a region;
the default posture is adjusted by a certain step (ΔR, Δt):
ΔR = exp((δr_x, δr_y, δr_z)^T),  Δt = (δt_x, δt_y, δt_z)^T
wherein ΔR is a preset rotation step and Δt is a preset translation step; δt_x, δt_y, δt_z are the unit translation step lengths; δr_x, δr_y, δr_z are the unit rotation-vector step lengths; and exp(·) is the exponential map of the SO(3) group, converting a rotation vector into a rotation matrix;
the adjusted posture is computed according to the SE(3) group as follows:
R_adjust = ΔR · R_init,  t_adjust = ΔR · t_init + Δt
and the predicted object posture (R_0, t_0) is R_0 = R_adjust, t_0 = t_adjust.
2. The weak texture object pose tracking method according to claim 1, wherein the extracting of the projection contour specifically comprises:
rendering a binarized foreground image, and extracting a 2D contour of the foreground image.
3. The weak texture object pose tracking method according to claim 1, wherein the sampling of the contour information specifically comprises:
setting a self-adaptive sampling step length, and sampling the profile information by adopting a non-uniform sampling mode;
the self-adaptive sampling step calculation mode is as follows:
λ_i = λ_0 · Weigh(θ),  θ = Angle(P_{i-1});
wherein λ_0 is a base step; θ = Angle(P_{i-1}) is the angle formed by the contour on the two sides of the previous sampling point P_{i-1}; and Weigh(θ) is the weighting coefficient for this angle;
and the contour points obtained by sampling the contour information with the adaptive sampling step are {P_i(u, v; x, y, z) | i = [1, N]}, wherein each contour point P_i contains both the 3D point position (x, y, z) in the object 3D model and the 2D position (u, v) on the projection image.
4. The method for tracking the posture of the weakly-textured object according to claim 3, wherein the step of performing statistics on the local color information of the contour to obtain the contour shape data of the object specifically comprises the following steps:
calculating the normal direction vector n_i of each contour point P_i: n_i = (v_i - v_{i+1}, u_{i+1} - u_i);
wherein (u_i, v_i) are the pixel coordinates of contour point P_i and (u_{i+1}, v_{i+1}) those of P_{i+1}; n_i is perpendicular to the line connecting P_i and P_{i+1};
counting pixels on the live-action image along the positive direction and the negative direction of the normal vector, calculating a color histogram of the pixels, taking the color histogram as the local shape description of the contour point, and respectively recording the local shape description as an internal shape and an external shape;
and superposing the internal shapes and the external shapes of all the contour points respectively to form the object contour shape data Appearance_obj:
Appearance_i = {IntAppearance_i, ExtAppearance_i},  IntAppearance_i = Histo({I(P_i - k·α·n_i) | k = 1, …, L}),  ExtAppearance_i = Histo({I(P_i + k·α·n_i) | k = 1, …, L})
Appearance_obj = {∑_i IntAppearance_i, ∑_i ExtAppearance_i}
wherein Histo(·) is the color-histogram statistic function; Appearance_i denotes the local shape of contour point P_i, IntAppearance_i the internal shape and ExtAppearance_i the external shape; and α is a step factor.
5. The method for tracking the pose of a weak texture object according to claim 4, wherein the searching and matching of the contour shape features on the current live-action image according to the contour shape data of the object specifically comprises:
projecting the contour sampling points P_i onto the current live-action image, and calculating the gradient of each pixel on the current live-action image along the positive and negative directions of the normal of the contour sampling point;
presetting a pixel gradient threshold; if the pixel gradient exceeds the threshold, the pixel is a candidate point, the candidate points being denoted {C_i | i = [1, m]};
Counting local color information on two sides of the candidate point, and comparing the local color information with the object contour shape data;
finding out the most similar point in the candidate points as a matching point through similarity measurement, or presetting a similarity threshold value and finding out the candidate points with the similarity larger than the threshold value as the matching points;
and inserting the matching points into a 2D-3D matching pair, wherein the 2D coordinates in the 2D-3D matching pair are from the matching points, and the 3D coordinates are from the coordinates of the corresponding contour sampling points on the 3D model of the object.
6. The weak texture object posture tracking method according to claim 5, wherein the performing posture calculation according to the matching information specifically comprises:
solving PnP using a RANSAC mechanism according to the 2D-3D matching pairs, and determining the number of inliers from the reprojection error;
and presetting an inlier count threshold; if the number of inliers exceeds the threshold, the current frame is tracked successfully and the 6-degree-of-freedom posture is output.
7. A weak texture object pose tracking system, comprising:
the image acquisition unit is used for acquiring a real-scene image of the target object;
the human-computer interaction unit is used for displaying the acquired real scene image of the target object on the display unit;
a data storage unit for storing a 3D model of the object;
a rendering unit, configured to perform projection rendering of the 3D model of the object according to the predicted object pose, specifically configured to: performing projection rendering on a 3D model of an object according to the predicted object posture, rendering a binary foreground image, and extracting a 2D contour of the foreground image;
the communication synchronization unit is used for carrying out data synchronization on the rendered plane image and the acquired live-action image of the target object;
a computing unit for performing posture prediction on the input current frame image to obtain a predicted object posture, acquiring object contour shape data according to the predicted object posture, searching and matching contour shape features on the current live-action image according to the object contour shape data, and performing posture calculation according to the matching information; the computing unit being specifically configured to: judge whether the input current frame image is an initial frame; if the current frame is the initial frame, adjust the system default posture as the predicted object posture, with rotation matrix R_0 = R_adjust and translation vector t_0 = t_adjust, specifically comprising:
wherein the default posture is represented by a default rotation matrix and translation vector, denoted R_init, t_init, whose values are obtained by optimizing the following criterion:
(R_init, t_init) = argmax_(R, t) Area(Proj(M; R, t)),  s.t.  Proj(Ori(M); R, t) = Ori(I)
wherein M represents the 3D model of the object; I represents the image; Ori(M) represents the center of the object 3D model; Ori(I) represents the center of the image; Proj(·; R, t) denotes projection according to posture (R, t); and Area(·) computes the area of a region;
adjusting the default posture by a certain step (ΔR, Δt):
ΔR = exp((δr_x, δr_y, δr_z)^T),  Δt = (δt_x, δt_y, δt_z)^T
wherein δt_x, δt_y, δt_z are the unit translation step lengths; δr_x, δr_y, δr_z are the unit rotation-vector step lengths; and exp(·) is the exponential map of the SO(3) group, converting a rotation vector into a rotation matrix;
the adjusted posture being computed according to the SE(3) group as follows:
R_adjust = ΔR · R_init,  t_adjust = ΔR · t_init + Δt
the predicted object posture being R_0 = R_adjust, t_0 = t_adjust; and if the current frame is not the initial frame, using the posture of the previous frame as the predicted object posture, with rotation matrix R_0 = R_prev and translation vector t_0 = t_prev.
8. The weak texture object pose tracking system of claim 7, wherein the computing unit is further specifically configured to:
setting a self-adaptive sampling step length, and sampling the profile information by adopting a non-uniform sampling mode; the self-adaptive sampling step calculation mode is as follows:
λ_i = λ_0 · Weigh(θ),  θ = Angle(P_{i-1});
wherein λ_0 is a base step; θ is the angle formed by the contour on the two sides of the previous sampling point P_{i-1}; and Weigh(θ) is the weighting coefficient for this angle;
the contour points obtained by sampling the contour information with the adaptive sampling step being {P_i(u, v; x, y, z) | i = [1, N]}, wherein each contour point P_i contains both the 3D point position (x, y, z) in the object 3D model and the 2D position (u, v) on the projection image;
calculating the normal direction vector n_i of each contour point P_i: n_i = (v_i - v_{i+1}, u_{i+1} - u_i);
wherein (u_i, v_i) are the pixel coordinates of contour point P_i and (u_{i+1}, v_{i+1}) those of P_{i+1}; n_i is perpendicular to the line connecting P_i and P_{i+1};
counting pixels on the live-action image along the positive direction and the negative direction of the normal vector, calculating a color histogram of the pixels, taking the color histogram as the local shape description of the contour point, and respectively recording the local shape description as an internal shape and an external shape;
and superposing the internal shapes and the external shapes of all the contour points respectively to form the object contour shape data Appearance_obj:
Appearance_i = {IntAppearance_i, ExtAppearance_i},  IntAppearance_i = Histo({I(P_i - k·α·n_i) | k = 1, …, L}),  ExtAppearance_i = Histo({I(P_i + k·α·n_i) | k = 1, …, L})
Appearance_obj = {∑_i IntAppearance_i, ∑_i ExtAppearance_i}
wherein Histo(·) is the color-histogram statistic function; IntAppearance_i denotes the internal shape and ExtAppearance_i the external shape.
9. The weak texture object pose tracking system of claim 8, wherein the computing unit is further specifically configured to:
projecting the contour sampling points P_i onto the current live-action image, and calculating the gradient of each pixel on the current live-action image along the positive and negative directions of the normal of the contour sampling point; presetting a pixel gradient threshold, a pixel whose gradient exceeds the threshold being a candidate point, the candidate points denoted {C_i | i = [1, m]}; counting local color information on both sides of each candidate point and comparing it with the object contour shape data; finding the most similar candidate point as a matching point through a similarity measure, or presetting a similarity threshold and taking candidate points whose similarity exceeds the threshold as matching points; and inserting the matching points into 2D-3D matching pairs, wherein the 2D coordinates of a matching pair come from the matching point and the 3D coordinates come from the corresponding contour sampling point on the object 3D model;
solving PnP using a RANSAC mechanism according to the 2D-3D matching pairs, determining the number of inliers from the reprojection error, presetting an inlier count threshold, and, if the number of inliers exceeds the threshold, tracking the current frame successfully and outputting the 6-degree-of-freedom posture.
10. A weak texture object pose tracking apparatus, the apparatus comprising a memory and a processor, wherein:
the memory is used for storing code and data;
the processor is configured to execute the code stored in the memory to implement the method steps of any one of claims 1 to 6.
CN201910105602.6A 2019-02-01 2019-02-01 Weak texture object posture tracking method, system and device Active CN109872343B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910105602.6A CN109872343B (en) 2019-02-01 2019-02-01 Weak texture object posture tracking method, system and device

Publications (2)

Publication Number Publication Date
CN109872343A CN109872343A (en) 2019-06-11
CN109872343B true CN109872343B (en) 2020-03-17

Family

ID=66918593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910105602.6A Active CN109872343B (en) 2019-02-01 2019-02-01 Weak texture object posture tracking method, system and device

Country Status (1)

Country Link
CN (1) CN109872343B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783986A (en) * 2020-07-02 2020-10-16 清华大学 Network training method and device and posture prediction method and device
CN113298870B (en) * 2021-05-07 2023-03-28 中国科学院深圳先进技术研究院 Object posture tracking method and device, terminal equipment and storage medium
CN114067041B (en) * 2022-01-14 2022-06-14 深圳大学 Material generation method and device of three-dimensional model, computer equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100483462C (en) * 2002-10-18 2009-04-29 清华大学 Establishing method of human face 3D model by fusing multiple-visual angle and multiple-thread 2D information
CN101719286B (en) * 2009-12-09 2012-05-23 北京大学 Multiple viewpoints three-dimensional scene reconstructing method fusing single viewpoint scenario analysis and system thereof
CN102385754B (en) * 2010-08-30 2016-12-07 三星电子株式会社 A kind of object tracking methods and equipment
CN102184551A (en) * 2011-05-10 2011-09-14 东北大学 Automatic target tracking method and system by combining multi-characteristic matching and particle filtering
US10157468B2 (en) * 2014-05-07 2018-12-18 Nec Corporation Object detection device, object detection method, and object detection system
CN105096377B (en) * 2014-05-14 2019-03-19 华为技术有限公司 A kind of image processing method and device
CN106447725B (en) * 2016-06-29 2018-02-09 北京航空航天大学 Spatial target posture method of estimation based on the matching of profile point composite character
CN107679537B (en) * 2017-05-09 2019-11-19 北京航空航天大学 A kind of texture-free spatial target posture algorithm for estimating based on profile point ORB characteristic matching
CN108509848B (en) * 2018-02-13 2019-03-05 视辰信息科技(上海)有限公司 The real-time detection method and system of three-dimension object
CN109035327B (en) * 2018-06-25 2021-10-29 北京大学 Panoramic camera attitude estimation method based on deep learning

Also Published As

Publication number Publication date
CN109872343A (en) 2019-06-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant