CN111275734B - Object identification and tracking system and method thereof - Google Patents

Object identification and tracking system and method thereof

Info

Publication number
CN111275734B
CN111275734B (application CN201811626054.3A)
Authority
CN
China
Prior art keywords
tracking
mobile device
template
module
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811626054.3A
Other languages
Chinese (zh)
Other versions
CN111275734A (en)
Inventor
黄圣筑
林奕成
黄伟伦
卢奕丞
刘郁昌
刘旭航
林家煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiwan Chunghwa Telecom Co ltd
Original Assignee
Taiwan Chunghwa Telecom Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiwan Chunghwa Telecom Co ltd filed Critical Taiwan Chunghwa Telecom Co ltd
Publication of CN111275734A
Application granted
Publication of CN111275734B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object identification and tracking system and a method thereof. The system comprises a server and a mobile device. A template construction module of the server constructs a plurality of templates with different viewing angles from the three-dimensional model of an object by projection, and a feature extraction module of the server extracts the template features of these templates. The object identification and tracking module of the mobile device compares the data of the plurality of template features to identify the object and its viewing angle, and then tracks the viewing angle of the object using an iterative closest point algorithm, a hidden surface removal method and a bidirectional correspondence checking method. While the iterative closest point algorithm is executed, the hidden surface removal method removes or ignores template features that are not observable from the current viewing angle of the object. When the iterative closest point algorithm searches for the closest data of a template feature, the bidirectional correspondence checking method checks or searches in both directions whether the two data of the template feature are each other's closest data.

Description

Object identification and tracking system and method thereof
Technical Field
The present invention relates to object recognition and tracking technology, and more particularly, to an object recognition and tracking system and method thereof.
Background
In one prior art, a method and an electronic device for tracking a moving object are provided, which use a plurality of cameras to receive a plurality of video data and compare different frames to obtain the position and moving path of an object. However, this prior art can only track the translational position of the object within a frame; it cannot identify and track the object itself or obtain its viewing angle.
In another prior art, a multi-tracker object tracking system is proposed, which integrates multiple trackers (such as a contour tracker and an optical tracker) so that they work together to obtain a stable object tracking effect. However, this prior art makes it difficult to reduce the amount of computation required for object tracking.
Therefore, how to overcome the above-mentioned drawbacks of the prior art, so as to identify and track an object, obtain its viewing angle, and reduce the amount of computation required for tracking the object, has been a major problem for those skilled in the art.
Disclosure of Invention
The invention provides an object identification and tracking system and a method thereof, which can identify and track an object, obtain the viewing angle of the object, and reduce the amount of computation required for tracking the object.
The object identification and tracking system of the present invention comprises: a server having a template construction module and a feature extraction module, wherein the template construction module constructs a plurality of templates with different viewing angles from a three-dimensional model of an object by projection, and the feature extraction module extracts, analyzes or simplifies the data of the template features of these templates; and a mobile device, which obtains or downloads the data of the plurality of template features from the server. The mobile device has an object identification and tracking module that compares the data of the plurality of template features to identify the object and its viewing angle, and then tracks the viewing angle of the object using three methods: an iterative closest point algorithm (Iterative Closest Point algorithm, ICP), a hidden surface removal method and a bidirectional correspondence checking method. While the iterative closest point algorithm is executed, the object identification and tracking module uses the hidden surface removal method to remove or ignore template features that cannot be observed from the viewing angle of the object; when the iterative closest point algorithm searches for the closest data of a template feature, the object identification and tracking module uses the bidirectional correspondence checking method to check or search in both directions whether the two data of the template feature are each other's closest data.
The object identification and tracking method of the invention comprises the following steps: a template construction module of a server constructs a plurality of templates with different viewing angles from a three-dimensional model of an object by projection, and a feature extraction module of the server extracts, analyzes or simplifies the data of the template features of these templates; and a mobile device obtains or downloads the data of the plurality of template features from the server. An object identification and tracking module of the mobile device compares the data of the plurality of template features to identify the object and its viewing angle, and then tracks the viewing angle of the object using an iterative closest point algorithm, a hidden surface removal method and a bidirectional correspondence checking method. While the iterative closest point algorithm is executed, the object identification and tracking module uses the hidden surface removal method to remove or ignore template features that cannot be observed from the viewing angle of the object; when the iterative closest point algorithm searches for the closest data of a template feature, the object identification and tracking module uses the bidirectional correspondence checking method to check or search in both directions whether the two data of the template feature are each other's closest data.
In order to make the above features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below. Additional features and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The features and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the scope of the invention as claimed.
Drawings
FIG. 1 is a schematic architecture diagram of an object recognition and tracking system of the present invention;
FIG. 2 is a simplified schematic diagram of a usage flow of the object recognition and tracking system and method thereof according to the present invention;
FIGS. 3A and 3B are schematic diagrams illustrating the construction of a multi-view template by means of pictorial projection in accordance with the present invention;
FIG. 4 is a schematic diagram of a plurality of templates of the present invention rotated along the optical axis;
FIG. 5 is a schematic diagram of the invention in which all template vectors are formed into a matrix of templates;
FIG. 6 is a flow chart of the mobile device in interactive operation according to the present invention; and
fig. 7 is a schematic diagram of a dynamic handoff procedure of the mobile device in the tracking phase according to the present invention.
Symbol description
1. Object identification and tracking system
10. Mobile device
11. Color camera
12. Depth sensor
13. Foreground cutting module
14. Object identification and tracking module
141. Iterative nearest point algorithm
142. Hidden surface removing method
143. Two-way correspondence checking method
144. Device motion tracking method
145. Pose measurement method
15. Display module
20. Server device
21. Three-dimensional model reconstruction module
22. Template construction module
23. Feature extraction module
A object
B three-dimensional model
C template
D template features
F object selection interface
F1 Identification stage
F2 Tracking stage
T' template matrix
Steps S11 to S14, S21 to S25
Steps S31 to S33, S41 to S45.
Detailed Description
The present invention will be described below in terms of specific embodiments; those skilled in the art will readily appreciate from this disclosure that the invention may also be implemented or practiced in other embodiments.
The invention provides an object identification and tracking system and a method thereof, which are key technologies for extending augmented reality (Augmented Reality, AR) applications. For example, this marker-less object identification and tracking system and method can shoot or scan an object (target object) with the color camera and depth sensor of a mobile device, so as to identify and track the object (target object) for subsequent augmented reality (AR) applications.
The object identification and tracking system and method of the invention are based on computer vision technology: an object (target object) is shot or scanned by the color camera and depth sensor of a mobile device, and its color features and depth information are analyzed by an object identification and tracking module to identify the state and viewing angle of the object (target object). In addition, the invention uses the motion sensing information built into the mobile device: when the mobile device moves only slightly within a short time interval, it automatically switches to estimating the motion from this sensing information, so that the three-dimensional (3D) motion of the object (target object) can be tracked with a lower amount of computation. Meanwhile, the invention simplifies the data of the templates to be identified in advance on the server, so as to reduce the amount of computation and data required to identify the templates in real time.
Fig. 1 is a diagram of an object recognition and tracking system 1 according to the present invention, which includes a mobile device 10 and a server 20. The mobile device 10 may be, for example, a smart phone, smart glasses or a tablet computer, and the server 20 may be, for example, a remote server, a cloud server, a network server or a background server, but is not limited thereto.
The server 20 may have a template construction module 22 and a feature extraction module 23, wherein the template construction module 22 constructs templates C with a plurality of different angles of view for the three-dimensional model B of the object a in a projection manner, and the feature extraction module 23 extracts, analyzes or simplifies the data of the template features D of the templates C with the plurality of different angles of view. Meanwhile, the mobile device 10 may acquire or download data of a plurality of template features D from the server 20, the mobile device 10 has an object recognition and tracking module 14 for comparing the data of the plurality of template features D to identify the object a and its viewing angle, and the object recognition and tracking module 14 performs the viewing angle tracking of the object a by using three methods of iterative closest point algorithm (ICP) 141, hidden surface removal method 142 and two-way correspondence inspection 143. Moreover, when the iterative closest point algorithm 141 is executed, the object recognition and tracking module 14 removes or ignores the template feature D that cannot be observed by the view angle of the object a by using the hidden surface removal method 142, and when the iterative closest point algorithm 141 searches for the closest data of the template feature D, the object recognition and tracking module 14 uses the bi-directional correspondence checking method 143 to bi-directionally check or search whether the two data of the template feature D are the closest data of each other.
The object recognition and tracking system 1 operates in two stages: a pre-processing stage and an interactive operation stage. The pre-processing stage of the first part mainly comprises: the template constructing module 22 of the server 20 takes the three-dimensional model B of the object a and constructs templates C with a plurality of different viewing angles from it, and the feature extracting module 23 of the server 20 extracts the corresponding template features D from these templates C. The interactive operation phase of the second part mainly comprises: the object identification and tracking module 14 of the mobile device 10 identifies the object a and tracks its orientation.
In the pre-processing stage of the object recognition and tracking system 1, the user can shoot or scan the actual object a (target object) or input the three-dimensional model B of the object a (which can also be the target object) through the mobile device 10, so that the server 20 can establish a plurality of templates C and template features D with different viewing angles according to the three-dimensional model B of the object a. For example, the user may take or scan the object a around the mobile device 10 to upload the color image and the three-dimensional (3D) point cloud of the object a to the server 20, and then build the three-dimensional model B of the object a by the three-dimensional model reconstruction module 21 of the server 20, or the user may directly input or upload the three-dimensional model B of the object a to the server 20 through the mobile device 10 or any other electronic device. Then, the three-dimensional model B of the object a is constructed by the template constructing module 22 of the server 20 in a projection manner to form a plurality of templates C with different viewing angles, and the feature extracting module 23 of the server 20 extracts, analyzes or simplifies the data of the template features D of the templates C with different viewing angles for subsequent comparison.
In the interactive operation phase of the object recognition and tracking system 1, the user can recognize and track the object a through the object recognition and tracking module 14 of the mobile device 10 by the following procedures P11 to P14.
Program P11: the object recognition and tracking module 14 of the mobile device 10 compares the template features D of the templates C from a plurality of different perspectives to perform the recognition of the object a and the perspectives thereof. For example, after the mobile device 10 obtains or downloads the data of the plurality of template features D from the server 20, the object recognition and tracking module 14 of the mobile device 10 can compare the color images and the depth information of the plurality of template features D to identify the object a and the viewing angle (e.g. rough viewing angle) thereof.
Program P12: the object recognition and tracking module 14 of the mobile device 10 performs perspective tracking of the object a using an iterative closest point algorithm (ICP). For example, the object recognition and tracking module 14 can combine the hidden surface removal method 142 and the bi-directional correspondence checking method 143 according to the present invention based on the rough viewing angle of the object a obtained after recognition, so as to enhance the angle tracking effect of the conventional iterative closest point algorithm (iterative approximation method) on the object a.
Program P13: when the mobile device 10 has only a small motion within a short duration, the object recognition and tracking module 14 can switch automatically to perform the view tracking of the object a by the device motion tracking method 144. For example, when the object recognition and tracking module 14 analyzes only small movements of the mobile device 10 within a short time interval, the object recognition and tracking module 14 can automatically switch to estimate the relative viewing angle movement of the object a based on dynamic sensing information obtained by the inertial measurement unit (Inertial Measurement Unit, IMU) of the mobile device 10. Accordingly, it is the process comprises, the invention can reduce the complex comparison operation amount of the relative visual angle movement of the object A, improve the system response rate or reduce the calculation energy consumption.
Program P14: the object recognition and tracking module 14 of the mobile device 10 automatically determines whether to switch back to full view tracking or object recognition. For example, the object recognition and tracking module 14 can compare the difference between the dynamic tracking effect of the device on the object a and the scene of the photographed object a, so that the object recognition and tracking module 14 switches back to the complete view tracking calculation or needs to perform the object view recognition again when the difference exceeds the threshold value.
The above-mentioned five modules, namely the foreground cutting module 13, the object identifying and tracking module 14, the three-dimensional model reconstructing module 21, the template constructing module 22 and the feature extracting module 23, can be constructed, composed or realized in the form of hardware, firmware or software. For example, the five modules are constructed using a single chip or multiple chips of hardware. Alternatively, the foreground cutting module 13 may be foreground cutting software or a program, the object identifying and tracking module 14 may be object identifying and tracking software or a program, the three-dimensional model reconstructing module 21 may be three-dimensional model reconstructing software or a program, the template constructing module 22 may be template constructing software or a program, and the feature extracting module 23 may be feature extracting software or a program. However, the present invention is not limited thereto.
Fig. 2 is a simplified schematic diagram of the usage flow of the object identifying and tracking system 1 and the method thereof according to the present invention, please refer to fig. 1. Before the whole triggering procedure, the user can select the object a (see step S11 of fig. 2) to be identified and tracked, such as a toy car, a toy plane, etc., through the object selection interface F (see fig. 2) of the mobile device 10 (see fig. 1). If the data of the object a is not stored in the mobile device 10, the mobile device 10 obtains or downloads a data package of the object a from the server 20 (see step S12 of fig. 2), wherein the content of the data package of the object a includes multi-view template posture information, color template data, depth template data and weight values, and the data package is stored in a memory (e.g. a hard disk or a memory card) of the mobile device 10 of the user.
After the object a has been selected and its data has been confirmed to exist, the triggering procedure can be started. The object a is first placed near the center of the screen of the mobile device 10 so that the mobile device 10 can photograph it (see step S13 of fig. 2). The foreground cutting module 13 (see fig. 1) of the mobile device 10 then automatically performs foreground cutting, viewing angle recognition and tracking of the object a in the background, and the obtained pose of the object a is drawn as a three-dimensional (3D) point cloud at the corresponding position of the object on the screen of the mobile device 10. The three-dimensional (3D) point cloud result is displayed on the screen of the mobile device 10 through the display module 15 (see step S14 of fig. 2), or other augmented reality (AR) auxiliary information is presented on the screen of the mobile device 10.
Fig. 3A and fig. 3B are schematic views of constructing multi-view templates C for an object a by projection according to the present invention; please also refer to fig. 1. Fig. 3A shows the view projection of a general object a, which is projected over a hemisphere or at finer angular steps. Fig. 3B shows the view projection of a symmetric object a; since such an object has similar projected images around its symmetry axis, only a semicircular view projection of one cross section is required.
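As an illustrative sketch (not taken from the patent itself), the multi-view sampling of fig. 3A and fig. 3B could be realized roughly as follows; the 15-degree angular steps, the function names and the unit hemisphere radius are assumptions made only for this example.

```python
import numpy as np

def hemisphere_viewpoints(radius=1.0, elev_step_deg=15, azim_step_deg=15):
    """Sample camera positions on a hemisphere around the object (Fig. 3A).

    Each sampled viewpoint is used to project the three-dimensional model B
    into one template C. The angular steps are illustrative only.
    """
    viewpoints = []
    for elev in np.arange(0.0, 90.0 + 1e-6, elev_step_deg):   # 0 = equator, 90 = top
        for azim in np.arange(0.0, 360.0, azim_step_deg):
            e, a = np.radians(elev), np.radians(azim)
            viewpoints.append((radius * np.cos(e) * np.cos(a),
                               radius * np.cos(e) * np.sin(a),
                               radius * np.sin(e)))
    return np.array(viewpoints)

def symmetric_viewpoints(radius=1.0, elev_step_deg=15):
    """For a rotationally symmetric object (Fig. 3B), a single semicircular
    cross-section of viewpoints is sufficient."""
    elevs = np.radians(np.arange(0.0, 180.0 + 1e-6, elev_step_deg))
    return np.stack([radius * np.cos(elevs),
                     np.zeros_like(elevs),
                     radius * np.sin(elevs)], axis=1)
```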
As shown in fig. 3A, 3B and 1, in the preprocessing stage, after the mobile device 10 captures an object a (target object), the mobile device 10 may transmit the color image and depth information of the object a to the server 20 for the three-dimensional model reconstruction module 21 of the server 20 to model the object a to generate a three-dimensional model B, or may directly input the three-dimensional model B of the object a (target object) to the server 20 through the mobile device 10 or any other electronic device. Then, the server 20 may construct a multi-view template C for the three-dimensional model B of the object a in a schematic projection manner, so that the feature extraction module 23 of the server 20 analyzes the multi-view template C to obtain the information of the template feature D.
FIG. 4 is a schematic view of a plurality of templates C of the present invention rotated along an Optical Axis (Optical Axis). In order to quickly cope with the case where an object rotates along the optical axis at a certain viewpoint, the present invention also pre-calculates a plurality of templates C rotating along the optical axis, such rotation being called in-plane rotation.
FIG. 5 is a schematic diagram of the invention in which all template vectors are combined into a template matrix T'. On the right, T1, T2, ..., Tn represent the original template images; in the middle, t1', t2', ..., tn' represent the resulting images after applying a LoG, where LoG denotes the Laplacian of Gaussian. T' is the template matrix formed by combining the vectorized template data.
Since the comparison of templates C is easily disturbed by lighting changes, shadows, noise and the like, and a full-image comparison of the templates C requires a huge amount of computation, the mobile device 10 of the present invention, in order to increase the accuracy and robustness of template recognition, reshapes the LoG (Laplacian of Gaussian) and normalized information of each template C into a single vector, combines the vectors of all templates C into a template matrix T', and uses cross-correlation as the comparison method for the feature vectors.
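A minimal sketch of this feature-vector construction is given below; it assumes OpenCV for the Gaussian and Laplacian filtering, and the 64x64 template size and sigma value are illustrative choices, not values specified by the patent.

```python
import numpy as np
import cv2  # OpenCV, assumed available for the filtering steps

def log_feature_vector(image, size=(64, 64), sigma=1.5):
    """Resize a template image, apply a Laplacian of Gaussian (LoG),
    then normalize and flatten it into a single feature vector."""
    img = cv2.resize(image.astype(np.float32), size)
    log_img = cv2.Laplacian(cv2.GaussianBlur(img, (0, 0), sigma), cv2.CV_32F)
    vec = log_img.flatten()
    vec -= vec.mean()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def build_template_matrix(template_images):
    """Stack the vectors of all templates C into the template matrix T'."""
    return np.stack([log_feature_vector(img) for img in template_images])

def cross_correlation_scores(template_matrix, query_image):
    """With zero-mean, unit-norm vectors, this dot product acts as a
    normalized cross-correlation score against every template."""
    return template_matrix @ log_feature_vector(query_image)
```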
In addition, the mobile device 10 of the present invention can reduce the amount of data required on the mobile device 10, i.e. the dimension of the template matrix T', by singular value decomposition (Singular Value Decomposition, SVD). The invention retains only the dimensions that are sufficient to represent the original data, reducing the amount of data used without overly degrading the recognition accuracy and thereby improving efficiency. The data of the template features D generated at the server 20 are then packed into a data set for the mobile device 10 to download and compare.
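The SVD-based reduction could be sketched as follows; the 95% energy ratio used to pick the number of retained dimensions is an assumption for illustration, as the patent only states that dimensions sufficient to represent the original data are kept.

```python
import numpy as np

def compress_template_matrix(template_matrix, keep_ratio=0.95):
    """Reduce the dimension of the template matrix T' with singular value
    decomposition, keeping only enough components to represent the data."""
    U, S, Vt = np.linalg.svd(template_matrix, full_matrices=False)
    energy = np.cumsum(S ** 2) / np.sum(S ** 2)
    k = int(np.searchsorted(energy, keep_ratio)) + 1
    basis = Vt[:k]                       # k retained feature-space directions
    coeffs = template_matrix @ basis.T   # low-dimensional coefficients per template
    return basis, coeffs

def project_query(basis, query_vector):
    """Project a query feature vector into the same reduced space before matching."""
    return basis @ query_vector
```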
Fig. 6 is a flow chart of the mobile device 10 according to the present invention in interactive operation, please refer to fig. 1. The present invention can capture or scan a scene of an object a (target object) through the color camera 11 and the depth sensor 12 of the mobile device 10 of fig. 1, and perform foreground cutting by the foreground cutting module 13 by using a planar cutting technique to obtain a contour region of the object a (target object).
Meanwhile, the object identifying and tracking method of the present invention may include a first stage (identifying stage F1) and a second stage (tracking stage F2) of fig. 6.
In the first stage (recognition stage F1) of fig. 6, the object recognition and tracking module 14 of fig. 1 analyzes the foreground region features of the object a and compares them with the data of the template features D generated in advance, so as to recognize the state and viewing angle of the object a (target object). After obtaining the object in the foreground region, the object recognition and tracking module 14 normalizes and scales the foreground region to a specified size, applies to the foreground color and depth images the same LoG, normalization and vectorization used when creating the templates C, and performs a cross-correlation operation with the pre-generated template matrix T' to calculate the similarity of each template C. The template C with the highest cross-correlation score is the most similar one, and its pose is taken as the initial estimated pose of the object a. A quaternion is then used to check whether the rotation angle difference between the current result and the previous frame is too large, in order to avoid erroneous results caused by front and back views that look too similar. To ensure the reliability of the alignment, the similarity of the template C must exceed a certain threshold value before it is accepted and set as the initial alignment.
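The quaternion-based check described above might look like the following sketch; the 45-degree limit is an assumed value, since the text only speaks of an excessive rotation angle difference.

```python
import numpy as np

def quaternion_angle_deg(q1, q2):
    """Angular difference between two unit quaternions (w, x, y, z), in degrees."""
    dot = min(1.0, abs(float(np.dot(q1, q2))))   # abs() handles the q / -q ambiguity
    return float(np.degrees(2.0 * np.arccos(dot)))

def rotation_jump_too_large(q_current, q_previous, max_deg=45.0):
    """Reject a template match whose orientation jumps too far from the previous
    frame, e.g. when a front view and a back view look too similar."""
    return quaternion_angle_deg(q_current, q_previous) > max_deg
```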
For example, in the first stage (recognition stage F1) of fig. 6, the object recognition and tracking module 14 performs the template comparison of step S21 and the flip check of the templates C of step S22. If no angle of the templates C is smaller than the threshold value, step S23 is performed and the failure count C_miss is increased by 1; if, for example, C_miss exceeds 5, step S24 is performed to reset the initial alignment pose. Conversely, if an angle of the templates C is smaller than the threshold value, step S25 is performed to set the alignment pose.
In the second stage (tracking stage F2) of fig. 6, the object recognition and tracking module 14 may perform the ICP (iterative closest point algorithm) tracking or device motion tracking of step S31 according to the alignment pose set in step S25. If the tracking fails, the process returns to the recognition stage F1 (the template comparison of step S21). Conversely, if the tracking succeeds, the object recognition and tracking module 14 sequentially performs the pose smoothing of step S32 and the updated pose comparison of step S33, and then returns to the ICP (iterative closest point algorithm) tracking or device motion tracking of step S31.
The pose smoothing of step S32 is needed because the iterative closest point algorithm (ICP) 141 downsamples the data and the user's hand holding the mobile device 10 tends to shake during movement, so the tracked pose may jump rather than change smoothly. If tracking succeeds, the object recognition and tracking module 14 records the pose and smooths the current pose together with the poses of the previous two frames using a Gaussian filter, so that the displayed result is smoother.
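A possible sketch of this smoothing step is shown below; the Gaussian weighting over the last three poses and the simple weighted quaternion average are assumptions about implementation details the patent does not spell out.

```python
import numpy as np

def gaussian_weights(n=3, sigma=1.0):
    """Gaussian weights over the last n poses (newest pose last)."""
    x = np.arange(n)[::-1].astype(float)          # distance in frames from the current pose
    w = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def smooth_pose(translations, quaternions, sigma=1.0):
    """Blend the current pose with the two previous frames (three poses total).

    Translation uses a weighted mean; rotation uses a weighted quaternion
    average, an approximation that is adequate for small inter-frame motion.
    """
    w = gaussian_weights(len(translations), sigma)
    t = np.average(np.asarray(translations, dtype=float), axis=0, weights=w)
    qs = np.asarray(quaternions, dtype=float)
    qs[qs @ qs[-1] < 0] *= -1                     # align hemispheres before averaging
    q = np.average(qs, axis=0, weights=w)
    return t, q / np.linalg.norm(q)
```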
The first stage (recognition stage F1) estimates the rough viewing angle direction of the object a, while the second stage (tracking stage F2) requires a more accurate tracking viewing angle. Conventionally, the viewing angle is tracked only with an iterative closest point algorithm (ICP), which aims to find the rotation matrix R and the translation matrix t for which two point sets are best aligned. Assume a source point set $P=\{p_i\},\ i=1,\dots,N_P$ and a target point set of another object $Q=\{q_j\},\ j=1,\dots,N_Q$, where $P$ and $Q$ are point sets, $p_i$ and $q_j$ are points, $N_P$ and $N_Q$ are the numbers of points, and each point consists of its x-, y- and z-axis values. The conventional iterative closest point algorithm (ICP) matches each point to its closest point, giving the corresponding point set $\hat{Q}=\{\hat{q}_i\}$, as shown in the following formula (1):
$$\hat{q}_i=\arg\min_{q_j\in Q}\left\|q_j-p_i\right\|,\quad i=1,\dots,N_P \tag{1}$$
The optimal rotation matrix R and translation matrix t can then be written as the minimization of the objective function E(R, t) in the following formula (2), i.e. a set of R and t is sought that brings the corresponding points closest together, where E(R, t) is the total error between the transformed point set and the corresponding target points:
$$E(R,t)=\frac{1}{N_P}\sum_{i=1}^{N_P}\left\|\hat{q}_i-(R\,p_i+t)\right\|^2 \tag{2}$$
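For reference, one ICP iteration built from formulas (1) and (2) could be sketched as follows; the closed-form SVD (Kabsch) solution for R and t and the use of SciPy's k-d tree for the nearest-point search are implementation assumptions, not details given in the patent.

```python
import numpy as np
from scipy.spatial import cKDTree  # assumed available for nearest-point search

def best_rigid_transform(P, Q):
    """Closed-form R, t minimizing E(R, t) = mean ||q_i - (R p_i + t)||^2
    for already-matched N x 3 point sets P and Q (Kabsch / SVD method)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def icp_step(P, Q, R, t):
    """One ICP iteration: correspond each currently-transformed p_i to its
    closest q_j (cf. formula (1)), then re-estimate R and t (formula (2))."""
    _, idx = cKDTree(Q).query(P @ R.T + t)
    return best_rigid_transform(P, Q[idx])
```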
In this way, the rotation matrix R and the translation matrix t between the actually photographed viewing angle of the object a and the rough viewing angle can be estimated, and the relative motion of the object a obtained. However, the conventional iterative closest point algorithm (ICP) is prone to getting trapped in a local minimum, so the present invention adds (1) the hidden surface removal method 142 and (2) the bidirectional correspondence checking method 143 to the conventional ICP in order to obtain a more accurate tracking viewing angle of the object a.
(1) Hidden surface removal 142: conventional iterative closest point algorithms (ICP) compare the entire set of points, which is time consuming and prone to instability. Since the rough viewing angle of the object a can be obtained in the present invention, the hidden surface removing method 142 of the present invention can remove points of the object a that cannot be seen from the viewing angle, and only uses the visible points (the remaining points) of the viewing angle of the object a for comparison, so as to reduce the blur zone and the judder of the tracking track between consecutive frames in the comparison process.
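One simple way to realize such a visibility test is a back-face check on the template's point normals, as sketched below; the dot-product criterion and the zero threshold are assumptions, since the patent does not state how hidden points are detected.

```python
import numpy as np

def visible_points(points, normals, view_dir, cos_threshold=0.0):
    """Keep only points whose surface normals face the camera.

    view_dir is the unit vector pointing from the object toward the camera
    for the rough viewing angle obtained in the recognition stage; points on
    back-facing surfaces are removed before running ICP.
    """
    v = np.asarray(view_dir, dtype=float)
    v /= np.linalg.norm(v)
    mask = (normals @ v) > cos_threshold   # cosine between each normal and the view direction
    return points[mask], normals[mask]
```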
(2) Bi-directional correspondence check 143: the conventional iterative closest point algorithm (ICP) searches for corresponding points in only one direction, i.e. for each input point p_i it only searches for the point in Q closest to p_i. The bi-directional correspondence check 143 of the present invention, however, not only searches for the point q_j closest to the point p_i, but also searches for the point in P closest to the point q_j. When the point p_i and the point q_j are each other's closest points, they are said to be in bi-directional correspondence, and pairs with bi-directional correspondence are more representative.
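A sketch of this mutual-nearest-neighbour test is given below; it again assumes SciPy's k-d tree, which is an implementation choice rather than part of the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def mutual_nearest_pairs(P, Q):
    """Keep only bidirectional correspondences: p_i and q_j are paired when
    q_j is the closest point to p_i AND p_i is the closest point to q_j."""
    _, p_to_q = cKDTree(Q).query(P)   # for each p_i, index of its nearest q
    _, q_to_p = cKDTree(P).query(Q)   # for each q_j, index of its nearest p
    pairs = [(i, j) for i, j in enumerate(p_to_q) if q_to_p[j] == i]
    return np.array(pairs, dtype=int)
```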
In addition, considering that the computing capability of the mobile device 10 is weaker than that of the server 20, if the application program of the mobile device 10 performs excessive data computation, the rate of the mobile device 10 and the remaining battery life of the mobile device 10 are affected. In many mobile applications, such as augmented reality applications, the relative viewing angle of the object a (target object) to the mobile device 10 does not change much over a short time period and is primarily due to movement of the mobile device 10. Therefore, the present invention proposes the device motion tracking method 144 within a short time interval after the state and angle of the object a are identified, and the device motion tracking method 144 can use the dynamic sensing information obtained by the Inertial Measurement Unit (IMU) of the mobile device 10 as a motion conversion reference, so as to achieve a high response rate and a low operation amount for identifying and tracking the object a (target object) on the mobile device 10.
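Under the assumption that the object itself stays still over the short interval, the device motion tracking method 144 amounts to applying the inverse of the device's own motion to the last known object pose; the 4x4 homogeneous-transform representation below is an assumption used only for this sketch.

```python
import numpy as np

def update_object_pose_from_device_motion(T_obj_cam_prev, T_dev_prev, T_dev_now):
    """Propagate the object pose using only device motion (IMU-based sensing).

    T_obj_cam_prev maps object coordinates to the previous camera frame;
    T_dev_prev and T_dev_now map device (camera) coordinates to the world
    frame before and after the device moved. If the object does not move,
    its pose in the new camera frame is the old pose pre-multiplied by the
    relative camera motion.
    """
    delta = np.linalg.inv(T_dev_now) @ T_dev_prev   # previous camera frame -> new camera frame
    return delta @ T_obj_cam_prev
```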
Fig. 7 is a schematic diagram of the dynamic switching process of the mobile device 10 in the tracking stage according to the present invention; please also refer to fig. 1. The process of fig. 7 is mainly implemented by the iterative closest point algorithm (ICP) 141, the device motion tracking method 144 and the pose measurement method 145 working together.
In step S41 of fig. 7, after the mobile device 10 recognizes the rough gesture of the object a, the object recognition and tracking module 14 uses the iterative closest point algorithm (ICP) 141 to fine tune the viewing angle of the corrected object a. Meanwhile, in step S42 of fig. 7, the object recognition and tracking module 14 uses the pose measurement 145 to compare the differences between the contour and the depth image of the object a to calculate the error of the view angle of the object a.
In step S43 of fig. 7, if the error of the viewing angle of the object a is greater than the predetermined threshold value, the estimated direction is wrong (i.e. tracking has failed), and the process returns to the recognition stage (the object state recognition step). Otherwise, in step S44 of fig. 7, if the viewing angle error is not greater than the predetermined threshold value, the result is acceptable (i.e. tracking is successful), and the object recognition and tracking module 14 switches to inferring the current viewing angle of the object a from the device motion information of the device motion tracking method 144.
In step S45 of fig. 7, after a period of time (e.g., every 100 frames), the object recognition and tracking module 14 performs pose measurement on the current foreground object and the estimated object view angle by using the pose measurement method 145 to obtain a pose measurement value. If the pose measurement is less than the predetermined threshold (i.e., tracking is successful), the object recognition and tracking module 14 maintains the device motion tracking with the device motion tracking method 144 of step S44. Otherwise, if the pose measurement value is not smaller than the predetermined threshold value (i.e. tracking failure), the iterative closest point algorithm (ICP) 141 of step S41 is used again to adjust the viewing angle of the object, and the pose measurement method 145 of step S42 is used again to perform the pose measurement, and if the pose measurement value is still larger than the threshold value (i.e. tracking failure), the recognition stage (object state recognition step) of step S43 is returned to re-estimate the viewing angle of the object a.
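The switching behaviour of steps S41 to S45 can be summarized in the following sketch. The callables passed in stand for the iterative closest point algorithm 141, the device motion tracking method 144, the pose measurement method 145 and the recognition stage; the 100-frame check interval follows the example given above, while the two threshold values are assumptions.

```python
def tracking_loop(frames, init_pose, icp_refine, device_motion_update,
                  pose_error, recognize_object,
                  angle_threshold=10.0, pose_threshold=0.2, check_interval=100):
    """Dynamic switching between ICP tracking and device motion tracking (Fig. 7)."""
    pose, mode = init_pose, "icp"
    for n, frame in enumerate(frames, start=1):
        if mode == "icp":
            pose = icp_refine(frame, pose)                 # step S41: fine-tune the viewing angle
            if pose_error(frame, pose) > angle_threshold:  # steps S42/S43: viewing-angle error check
                pose = recognize_object(frame)             # tracking failed: back to recognition
                continue
            mode = "device"                                # step S44: switch to device motion tracking
        else:
            pose = device_motion_update(frame, pose)       # step S44: low-cost IMU-based update
            if n % check_interval == 0 and \
               pose_error(frame, pose) >= pose_threshold:  # step S45: periodic pose measurement
                mode = "icp"                               # fall back to ICP refinement
    return pose
```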
As described above with reference to fig. 1 to 7, the object identifying and tracking method of the present invention mainly includes: the three-dimensional model B of the object a is constructed by a template construction module 22 of the server 20 in a projection manner to form a plurality of templates C with different viewing angles, and the feature extraction module 23 of the server 20 extracts, analyzes or simplifies the data of the template features D of the templates C with different viewing angles. Meanwhile, the mobile device 10 obtains or downloads the data of the plurality of template features D from the server 20, and the object recognition and tracking module 14 of the mobile device 10 compares the data of the plurality of template features D to recognize the object a and the view angle thereof, and the object recognition and tracking module 14 performs the view angle tracking of the object a by using the iterative closest point algorithm 141, the hidden surface removal method 142 and the two-way correspondence inspection method 143. When the iterative closest point algorithm 141 is executed, the object recognition and tracking module 14 removes or ignores the template feature D that cannot be observed by the perspective of the object a using the hidden surface removal method 142, and when the iterative closest point algorithm 141 searches for the closest data of the template feature D, the object recognition and tracking module 14 uses the bi-directional correspondence checking method 143 to bi-directionally check or search whether the two data of the template feature D are the closest data to each other.
Specifically, the object identifying and tracking method of the present invention can be described in the following procedures P21 to P26, and the rest of the technical content is the same as the detailed description of fig. 1 to 7, and will not be repeated here.
Program P21: the three-dimensional model B is created or acquired by the providing server 20 by photographing or scanning the actual object a or inputting the three-dimensional model B of the object a by the mobile device 10.
Program P22: the template construction module 22 of the server 20 constructs templates C with a plurality of different viewing angles for the three-dimensional model B in a projection manner, and the feature extraction module 23 of the server 20 extracts the templates C with the plurality of different viewing angles to generate corresponding template features D.
Program P23: the object a and its rough viewing angle are identified by the object recognition and tracking module 14 of the mobile device 100 comparing the object a with the template features D of templates C of a plurality of different viewing angles.
Program P24: the object recognition and tracking module 14 of the mobile device 100 performs the view tracking of the object a by using an iterative closest point algorithm 141 (iterative approximation method) according to the rough view of the object a to obtain a more accurate view.
Program P25: when the mobile device 10 has only a small motion for a period of time, the object recognition and tracking module 14 of the mobile device 10 automatically changes the device motion tracking method 144 to perform the view tracking of the object a.
Program P26: the object recognition and tracking module 14 of the mobile device 10 compares the difference between the effect of viewing angle tracking of the object a and the photographed scene of the object a through the device motion tracking method 144, and when the difference exceeds the threshold value, the object recognition and tracking module 14 of the mobile device 100 automatically changes to the iterative closest point algorithm 141 (iterative approximation method) to perform the viewing angle tracking of the object a, or re-performs the recognition of the object a and the viewing angle thereof.
The object recognition and tracking module 14 may include a hidden surface removal method 142. While performing the iterative closest point algorithm 141, the object recognition and tracking module 14 uses the hidden surface removal method 142 to remove or ignore the template feature D that cannot be observed with the rough perspective of the object a.
The object identifying and tracking module 14 may include a bi-directional correspondence checking method 143, and when the iterative closest point algorithm 141 searches for the closest data of the template feature D, the object identifying and tracking module 14 uses the bi-directional correspondence checking method 143 to bi-directionally check or search whether the two data of the template feature D are the closest data to each other. For example, the bidirectional correspondence checking method 143 can search the closest data B of the data a, and can also check whether the closest data of the data B is the data a, thereby improving the reliability and accuracy of the correspondence between the data a and the data B.
In summary, the object recognition and tracking system and method of the present invention may have the following features, advantages or technical effects:
1. The mobile device of the invention can track the position and the viewing angle of an object (target object), thereby broadening the range of augmented reality applications.
2. The invention moves the time-consuming template construction and template characteristic analysis to the server for operation, so as to reduce the operation amount and data amount required by the mobile device for real-time identification.
3. The object identification and tracking module of the invention can combine an iterative closest point algorithm (ICP) with a hidden surface removal method and a two-way correspondence inspection method to obtain a more accurate tracking view angle of the object.
4. The hidden surface removal method can remove points which cannot be seen by a visual angle, and only uses the visual point (the rest points) of the visual angle for comparison, so as to reduce the fuzzy zone and the judder condition of tracking tracks between continuous pictures in the comparison process.
5. The bidirectional correspondence checking method can bidirectionally check or search whether the two data of the template feature are the closest data to each other, thereby improving the reliability and accuracy of the correspondence of the two data.
6. The object identification and tracking module can automatically estimate the three-dimensional relative motion of an object (target object) by using dynamic sensing information under the condition that the mobile device only has small-amplitude motion, so as to greatly reduce the complex comparison operation amount of the relative visual angle motion of the object, improve the system response rate or reduce the calculation energy consumption.
7. The invention can dynamically adjust the operation mode of the visual angle of the object according to the condition of the mobile device, can keep low angle error when tracking the object, reduces operation energy consumption and maintains instant interactivity.
8. The present invention is applicable to, for example, the following industries. (1) manufacturing: the assembly prompt of the product and the application of intelligent manufacturing and maintenance in new generation industry 4.0. (2) education industry: anatomical teaching of organ architecture. (3) food industry: description and suggestion of nutrient components and eating modes. (4) advertisement commerce: and displaying and interacting the commodity advertisement content. (5) service industry: the remote video assists the customer in performing troubleshooting or finishing operations. (6) gaming industry: doll interactive game. In addition, the invention can also be applied to products such as smart glasses.
The above-described embodiments are merely illustrative of the principles, features and effects of the present invention, and are not intended to limit the scope of the invention, which can be modified or altered by those skilled in the art without departing from the spirit and scope of the invention. Any equivalent changes and modifications made by the present disclosure are intended to be covered by the scope of the appended claims. Accordingly, the scope of the invention is to be indicated by the appended claims.

Claims (20)

1. An object recognition and tracking system, comprising:
a server having a template construction module and a feature extraction module, wherein the template construction module is used for constructing a plurality of templates with different viewing angles from a three-dimensional model of an object by projection, and the feature extraction module is used for extracting data of template features of the templates with the different viewing angles; and
the mobile device is provided with an object identification and tracking module for identifying the object and the view angle thereof by comparing the data of the plurality of template features, and the object identification and tracking module performs view angle tracking of the object by using an iterative nearest point algorithm, a hidden surface removal method and a bidirectional correspondence checking method, wherein the hidden surface removal method is used for removing points which cannot be seen by the view angle of the object and comparing the points by using the visual point of the view angle of the object, and the bidirectional correspondence checking method is used for checking or searching whether two data of the template features are the closest data of each other in a bidirectional way; when the iterative closest point algorithm is executed, the object identifying and tracking module removes or ignores the template feature which cannot be observed by the view angle of the object by using the hidden surface removing method, and when the iterative closest point algorithm searches the closest data of the template feature, the object identifying and tracking module bidirectionally checks or searches whether two data of the template feature are the closest data of each other by using the bidirectional correspondence checking method.
2. The system of claim 1, wherein the server further comprises a three-dimensional model reconstruction module for creating a three-dimensional model of the object for the template construction module to construct templates of the plurality of different perspectives on the three-dimensional model of the object in the projection manner.
3. The object recognition and tracking system according to claim 1, wherein the mobile device further comprises a color camera and a depth sensor for photographing or scanning the object, and the object recognition and tracking module analyzes color characteristics and depth information of the object to recognize the state and viewing angle of the object.
4. The object recognition and tracking system according to claim 1, wherein the mobile device further comprises a foreground cutting module for performing foreground cutting, perspective recognition and tracking with respect to the object.
5. The object recognition and tracking system according to claim 1, wherein the object recognition and tracking module automatically switches to perform visual angle tracking of the object by a device motion tracking method when the mobile device has only a small motion within a short time frame.
6. The system of claim 1, wherein the object recognition and tracking module automatically switches to estimate the relative angular motion of the object based on dynamic sensing information obtained by an inertial measurement unit of the mobile device when the mobile device has only a small motion within a short time frame.
7. The system of claim 1, wherein the object recognition and tracking module further compares the difference between the dynamic tracking effect of the device on the object and the scene in which the object is captured, so that the object recognition and tracking module switches back to the complete view tracking calculation or the object view recognition needs to be performed again when the difference exceeds a threshold value.
8. The object recognition and tracking system according to claim 1, wherein the object recognition and tracking module further uses a pose measurement method to compare the differences between the contour and the depth image of the object to calculate the error of the viewing angle of the object.
9. The system of claim 1, wherein the mobile device further reshapes the Laplacian of Gaussian and normalized information of each template into a single vector and combines the vectors of all templates into a template matrix.
10. The object recognition and tracking system of claim 9, wherein the mobile device is further configured to reduce the amount of data required on the mobile device or the dimensions of the template matrix by singular value decomposition.
11. An object identification and tracking method, comprising:
a template construction module of a server constructs templates of a plurality of different visual angles on a three-dimensional model of an object in a projection mode, and a characteristic extraction module of the server extracts data of template characteristics of the templates of the plurality of different visual angles; and
the method comprises the steps that a mobile device obtains or downloads data of a plurality of template features from the server, an object identification and tracking module of the mobile device compares the data of the plurality of template features to identify the object and the view angle of the object, the object identification and tracking module performs view angle tracking of the object by using an iterative nearest point algorithm, a hidden surface removal method and a bidirectional correspondence checking method, the hidden surface removal method is used for removing points which cannot be seen by the view angle of the object and comparing the points by using visible points of the view angle of the object, and the bidirectional correspondence checking method is used for checking or searching whether two data of the template features are the closest data of each other in a bidirectional way;
when the iterative closest point algorithm is executed, the object identifying and tracking module removes or ignores the template feature which cannot be observed by the view angle of the object by using the hidden surface removing method, and when the iterative closest point algorithm searches the closest data of the template feature, the object identifying and tracking module bidirectionally checks or searches whether two data of the template feature are the closest data of each other by using the bidirectional correspondence checking method.
12. The method according to claim 11, further comprising creating a three-dimensional model of the object by the three-dimensional model reconstruction module of the server for the template construction module to construct templates of the plurality of different perspectives on the three-dimensional model of the object in the projection manner.
13. The method of claim 11, further comprising capturing or scanning the object by a color camera and a depth sensor of the mobile device to analyze color characteristics and depth information of the object by the object recognition and tracking module to recognize a state and a viewing angle of the object.
14. The method of claim 11, further comprising performing foreground cutting, perspective recognition and tracking of the object by a foreground cutting module of the mobile device.
15. The method of claim 11, further comprising automatically switching by the object recognition and tracking module to perform view tracking of the object by device motion tracking when the mobile device has only small motion within a short time frame.
16. The method of claim 11, further comprising, when the mobile device has only small motion within a short time interval, automatically switching by the object identification and tracking module to estimating the relative viewing angle motion of the object from the dynamic sensing information obtained by the inertial measurement unit of the mobile device.
17. The method of claim 11, further comprising comparing, by the object recognition and tracking module, the difference between the effect of device motion tracking of the object and the scene in which the object was captured; when the difference exceeds the threshold value, the object recognition and tracking module switches back to complete viewing angle tracking calculation or performs object viewing angle recognition again.
18. The method of claim 11, further comprising comparing, by the object recognition and tracking module, the differences between the contour and the depth image of the object using a pose measurement method to calculate an error in the viewing angle of the object.
19. The method of claim 11, further comprising reshaping, by the mobile device, the Laplacian of Gaussian and normalized information of each template into a single vector and forming the vectors of all templates into a template matrix.
20. The method of claim 19 further comprising reducing the amount of data required on the mobile device or the dimensions of the template matrix by the mobile device through singular value decomposition.
CN201811626054.3A 2018-12-04 2018-12-28 Object identification and tracking system and method thereof Active CN111275734B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TW107143429A TWI684956B (en) 2018-12-04 2018-12-04 Object recognition and tracking system and method thereof
TW107143429 2018-12-04

Publications (2)

Publication Number Publication Date
CN111275734A CN111275734A (en) 2020-06-12
CN111275734B true CN111275734B (en) 2024-02-02

Family

ID=70413546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811626054.3A Active CN111275734B (en) 2018-12-04 2018-12-28 Object identification and tracking system and method thereof

Country Status (2)

Country Link
CN (1) CN111275734B (en)
TW (1) TWI684956B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220292717A1 (en) * 2019-09-13 2022-09-15 Google Llc 3D Object Detection Using Random Forests
TWI779488B (en) * 2021-02-09 2022-10-01 趙尚威 Feature identification method and system
TWI772020B (en) * 2021-05-12 2022-07-21 廣達電腦股份有限公司 Image positioning device and method
TWI817847B (en) * 2022-11-28 2023-10-01 國立成功大學 Method, computer program and computer readable medium for fast tracking and positioning objects in augmented reality and mixed reality

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI443587B (en) * 2011-05-30 2014-07-01 Univ Nat Cheng Kung Three dimensional dual-mode scanning apparatus and three dimensional dual-mode scanning system
TWI517100B (en) * 2014-01-22 2016-01-11 國立臺灣科技大學 Method for tracking moving object and electronic apparatus using the same
EP3114647A2 (en) * 2014-03-05 2017-01-11 Smart Picture Technology, Inc. Method and system for 3d capture based on structure from motion with simplified pose detection
FR3020699A1 (en) * 2014-04-30 2015-11-06 Centre Nat Rech Scient METHOD OF FOLLOWING SHAPE IN A SCENE OBSERVED BY AN ASYNCHRONOUS LIGHT SENSOR
US9830703B2 (en) * 2015-08-12 2017-11-28 Nvidia Corporation Model-based three-dimensional head pose estimation
US20170323149A1 (en) * 2016-05-05 2017-11-09 International Business Machines Corporation Rotation invariant object detection
EP3312762B1 (en) * 2016-10-18 2023-03-01 Axis AB Method and system for tracking an object in a defined area

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800103A (en) * 2012-06-18 2012-11-28 清华大学 Unmarked motion capturing method and device based on multi-visual angle depth camera
CN102802000A (en) * 2012-08-09 2012-11-28 冠捷显示科技(厦门)有限公司 Tracking type multi-angle three-dimensional display image quality improving method
CN107545581A (en) * 2016-06-28 2018-01-05 圆展科技股份有限公司 Target tracking method and target tracking device
CN108509848A (en) * 2018-02-13 2018-09-07 视辰信息科技(上海)有限公司 The real-time detection method and system of three-dimension object

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on key technologies of three-dimensional scene surface reconstruction based on depth cameras (基于深度摄像机的三维场景表面重建关键技术研究); 李阳; 《中国博士学位论文电子期刊网》; 2017-03-15; full text *

Also Published As

Publication number Publication date
TW202022803A (en) 2020-06-16
CN111275734A (en) 2020-06-12
TWI684956B (en) 2020-02-11

Similar Documents

Publication Publication Date Title
CN111275734B (en) Object identification and tracking system and method thereof
US10769411B2 (en) Pose estimation and model retrieval for objects in images
CN103514432B (en) Face feature extraction method, equipment and computer program product
US11237637B2 (en) Gesture recognition systems
Klein et al. Improving the agility of keyframe-based SLAM
US9177381B2 (en) Depth estimate determination, systems and methods
Baak et al. A data-driven approach for real-time full body pose reconstruction from a depth camera
Klein et al. Parallel tracking and mapping for small AR workspaces
CN108960045A (en) Eyeball tracking method, electronic device and non-transient computer-readable recording medium
JP5940453B2 (en) Method, computer program, and apparatus for hybrid tracking of real-time representations of objects in a sequence of images
EP2843621A1 (en) Human pose calculation from optical flow data
CN109684969B (en) Gaze position estimation method, computer device, and storage medium
CN111951325B (en) Pose tracking method, pose tracking device and electronic equipment
Núnez et al. Real-time human body tracking based on data fusion from multiple RGB-D sensors
US20220067357A1 (en) Full skeletal 3d pose recovery from monocular camera
Song et al. VTONShoes: Virtual try-on of shoes in augmented reality on a mobile device
CN112257617B (en) Multi-modal target recognition method and system
JP6305856B2 (en) Image processing apparatus, image processing method, and program
Xue et al. Event-based non-rigid reconstruction from contours
CN115841602A (en) Construction method and device of three-dimensional attitude estimation data set based on multiple visual angles
Khokhlova et al. 3D point cloud descriptor for posture recognition
Czúni et al. The use of IMUs for video object retrieval in lightweight devices
Burch et al. Convolutional neural networks for real-time eye tracking in interactive applications
Colombo et al. Face^ 3 a 2D+ 3D Robust Face Recognition System
Jiang et al. Real-time multiple people hand localization in 4d point clouds

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant