CN111595332B - Full-environment positioning method integrating inertial technology and visual modeling - Google Patents

Full-environment positioning method integrating inertial technology and visual modeling

Info

Publication number
CN111595332B
CN111595332B
Authority
CN
China
Prior art keywords
positioning
image
inertial
coordinate system
measurement unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010284018.4A
Other languages
Chinese (zh)
Other versions
CN111595332A (en)
Inventor
许凯
鲍翊平
王栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shenxun Technology Co ltd
Original Assignee
Ningbo Shenxun Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Shenxun Information Technology Co ltd filed Critical Ningbo Shenxun Information Technology Co ltd
Priority to CN202010284018.4A priority Critical patent/CN111595332B/en
Publication of CN111595332A publication Critical patent/CN111595332A/en
Application granted granted Critical
Publication of CN111595332B publication Critical patent/CN111595332B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/18 Stabilised platforms, e.g. by gyroscope

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of positioning methods, and in particular to a full-environment positioning method integrating inertial technology and visual modeling. The method comprises the following steps: S1, fixing the positions of an inertial measurement unit and an image acquisition device so that they remain relatively stationary; S2, performing positioning with the data acquired by the inertial measurement unit to obtain positioning coordinates, while performing visual modeling with the data acquired by the image acquisition device to obtain a visual model; S3, unifying the coordinate systems of the visual model and the positioning coordinates, performing clock synchronization, and finally displaying the positioning coordinates in the visual model. The positioning method can be used in unfamiliar environments without depending on external equipment, and offers low cost and high positioning accuracy.

Description

Full-environment positioning method integrating inertial technology and visual modeling
Technical Field
The invention relates to the technical field of positioning methods, and in particular to a full-environment positioning method integrating inertial technology and visual modeling.
Background
Because inertial navigation positioning suffers from drift accumulation, its positioning accuracy degrades as time and distance increase and cannot by itself meet everyday positioning needs. Other positioning technologies therefore have to be used together with inertial navigation, typically satellite positioning or UWB positioning.
However, the biggest defect of satellite positioning is its sensitivity to signal conditions: even outdoors, once the receiver is surrounded by high-rise buildings it cannot deliver accurate positioning even in combination with inertial navigation. The disadvantage of UWB positioning is that a large number of base stations must be deployed, which is impractical in an unfamiliar environment and costly.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a full-environment positioning method integrating inertial technology and visual modeling that can be used in unfamiliar environments without depending on external equipment, at low cost and with high positioning accuracy.
The technical scheme adopted by the invention is as follows: a full-environment positioning method integrating inertial technology and visual modeling comprises the following steps:
S1, fixing the positions of an inertial measurement unit and an image acquisition device so that they remain relatively stationary;
S2, performing positioning with the data acquired by the inertial measurement unit to obtain positioning coordinates, while performing visual modeling with the data acquired by the image acquisition device to obtain a visual model;
S3, unifying the coordinate systems of the visual model and the positioning coordinates, then performing clock synchronization, and finally displaying the positioning coordinates in the visual model.
Preferably, the clock synchronization in step S3 is achieved by continuously converging the time deviation to zero through successive optimization.
Preferably, the positioning by the inertial measurement unit in step S2 specifically includes the following steps:
SA1, measuring horizontal acceleration and gravity acceleration from an inertial navigation measurement unit;
SA2, obtaining a moving direction and a moving speed through horizontal acceleration and gravity acceleration, and obtaining a positioning position according to an initial position;
SA3, calculating errors and correcting.
Preferably, the errors in step SA3 include a velocity error, a position error, and an attitude error.
Preferably, the visual modeling of the data acquired by the image acquisition device in the step S2 specifically includes the following steps:
SB1, preprocessing an image acquired by image acquisition equipment, and converting the image into characteristic points with main directions;
SB2, performing closed loop detection;
SB3, determining depth data of each characteristic point to obtain three-dimensional coordinates of each characteristic point;
SB4, obtaining a relative pose through feature point matching;
SB5, 2D-3D conversion is carried out, and three-dimensional visual modeling is realized.
Preferably, step SB1 comprises the following specific steps:
SB11, converting the image acquired by the image acquisition device to grayscale;
SB12, applying Gaussian filtering to the grayscale image;
SB13, computing second-order partial derivatives (a Hessian matrix) at each pixel of the Gaussian-filtered image so as to find edge points in the image;
SB14, screening edge points to obtain final characteristic points;
SB15, giving the main direction of the final feature point.
Compared with the prior art, the method provided by the invention has the following advantages: by combining inertial navigation positioning with three-dimensional visual modeling, positioning is achieved without depending on external auxiliary equipment; an actual map of an unfamiliar environment can be generated in real time, providing on-site environment data to a commander for decision making; and the inertial positioning result is corrected with the visually modeled map, so that inertial positioning remains accurate and drift-free over long periods.
Drawings
FIG. 1 is a schematic diagram of integrated pixel values in a full environment localization method combining inertial techniques and visual modeling according to the present invention.
FIG. 2 is a schematic diagram of a Gaussian filter template and box filter in a full environment positioning method combining inertial technology and visual modeling according to the present invention.
Fig. 3 is a schematic diagram of three layers of images in a full environment positioning method integrating inertial technology and visual modeling.
FIG. 4 is a schematic diagram of a Haar wavelet filter in a full environment positioning method integrating inertial technology and visual modeling according to the present invention.
FIG. 5 is an explanatory diagram of epipolar constraints in a full environment positioning method integrating inertial technology with visual modeling in accordance with the present invention.
Fig. 6 is a schematic diagram of clock synchronization in a full environment positioning method integrating inertial technology and visual modeling according to the present invention.
Detailed Description
The present invention is further described below by way of the following embodiments, but the present invention is not limited to the following embodiments.
The invention realizes final positioning calculation mainly from two angles of inertial technology and visual modeling.
The positions of the inertial measurement unit (which may also be an inertial navigation module, comprising mainly three accelerometers) and the image acquisition device (comprising mainly a camera) need to be fixed so that they remain relatively stationary.
First, the specific force components measured by the three accelerometers inside the inertial measurement unit in each direction of the wearer are collected as the vector f^b = [f_x^b, f_y^b, f_z^b]^T for the subsequent positioning calculation.
In order to split the specific force vectorially into horizontal acceleration and gravitational acceleration, the specific force expressed in the carrier coordinate system (b-system) has to be converted into the navigation coordinate system (n-system), so that the true ground acceleration values can be obtained for effective positioning.
The conversion between the coordinate systems is realized by the direction cosine matrix C_b^n, namely f^n = C_b^n · f^b. To obtain C_b^n, the angles between the two coordinate systems must be known:
Heading angle ψ: the angle between the projection of the carrier's longitudinal axis y_b on the local tangent plane of the earth and the geographic north axis y_t, taken positive clockwise from north towards east, with a range of 0° to 360°.
Pitch angle θ: the angle between the carrier's longitudinal axis and its projection on the horizontal plane, positive upwards, with a range of -90° to 90°.
Roll angle γ: the angle between the carrier's transverse axis x_b and its projection on the horizontal plane, positive when tilted to the right, with a range of -180° to 180°.
The conversion relationship between the carrier coordinate system (b-system) and the navigation coordinate system (n-system) is f^n = C_b^n · f^b, where the attitude (direction cosine) matrix C_b^n is the 3×3 rotation matrix built from the heading angle ψ, pitch angle θ and roll angle γ defined above; to simplify the subsequent formulas, this attitude matrix is denoted T.
Once the conversion matrix C_b^n has been obtained, f^b can be converted into f^n for the subsequent position calculation. However, using f^n directly for the next calculation would be unreasonable: the carrier moves in the earth's environment and is subject to gravity and the earth's rotation, so these effects must be taken into account in the navigation coordinate system; otherwise the positioning result would contain a huge error.
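As an illustrative sketch of this conversion, the following Python snippet builds a direction cosine matrix from heading, pitch and roll and rotates the measured specific force into the navigation frame; the ENU/right-front-up axis convention and the exact element layout are assumptions, since the patent's explicit matrix is given only as an equation image.

    import numpy as np

    def dcm_body_to_nav(heading, pitch, roll):
        # Direction cosine matrix C_b^n from heading (psi), pitch (theta), roll (gamma), in radians.
        # Layout assumes an ENU navigation frame and a right-front-up body frame.
        sp, cp = np.sin(heading), np.cos(heading)
        st, ct = np.sin(pitch), np.cos(pitch)
        sr, cr = np.sin(roll), np.cos(roll)
        return np.array([
            [cr * cp + sr * sp * st,  sp * ct,  sr * cp - cr * sp * st],
            [-cr * sp + sr * cp * st, cp * ct, -sr * sp - cr * cp * st],
            [-sr * ct,                st,       cr * ct],
        ])

    # Specific force measured by the three accelerometers in the carrier frame (m/s^2).
    f_b = np.array([0.1, 0.2, 9.81])
    C_bn = dcm_body_to_nav(np.radians(30.0), np.radians(2.0), np.radians(-1.0))
    f_n = C_bn @ f_b   # specific force resolved in the navigation frame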
Taking gravity and the earth's rotation into account, and using the relation between the absolute and relative rates of change of a vector together with Newton's second law, the basic equation of the inertial navigation system is obtained as:

dv^n/dt = f^n - (2ω_ie^n + ω_en^n) × v^n + g^n

wherein:
dv^n/dt: the acceleration vector in the navigation coordinate system;
f^n: the specific force vector in the navigation coordinate system (output of the accelerometers, converted from f^b);
ω_ie^n: the angular velocity of the earth's rotation;
ω_en^n: the angular velocity of the carrier's motion relative to the earth;
g^n: the gravity acceleration vector;
(2ω_ie^n + ω_en^n) × v^n: the sum of the Coriolis acceleration and the centripetal acceleration due to the earth.
The Coriolis acceleration is defined as follows: when a point moves in a straight line relative to a moving reference frame and that frame is itself rotating, the relative velocity of the point changes and a Coriolis acceleration appears. In short, the Coriolis acceleration arises from the combined action of the relative motion and the frame's rotation.
This equation yields the velocity of the carrier, where

ω_ie^n = [0, ω_ie·cosL, ω_ie·sinL]^T
ω_en^n = [-v_N/(R_m + h), v_E/(R_n + h), v_E·tanL/(R_n + h)]^T

with L the latitude, h the height, and v_E, v_N the east and north velocity components. R_m and R_n are the principal radii of curvature of the earth reference ellipsoid, i.e. the meridian radius R_m and the prime-vertical radius R_n, both functions of the latitude L.
Substituting the expressions for f^n, ω_ie^n, ω_en^n and g^n into the velocity equation and integrating gives the three axial velocities v_E, v_N and v_U.
Once the three axial velocities have been obtained, the moving direction and speed of the person can be calculated. Substituting the velocity into the integral equations and adding the person's initial position then gives the final position, which is updated in real time as the velocity changes:

L = L_0 + ∫_0^T v_N/(R_m + h) dt
λ = λ_0 + ∫_0^T v_E/((R_n + h)·cosL) dt

wherein L_0 and λ_0 are respectively the latitude and longitude of the carrier at time zero, and T represents the elapsed time.
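The following sketch shows one possible discrete implementation of the velocity and position update described above; the Euler integration step, the ENU axis ordering and the variable names are illustrative assumptions rather than the patent's implementation.

    import numpy as np

    OMEGA_IE = 7.292115e-5   # earth rotation rate (rad/s)

    def ins_update(v_n, lat, lon, h, f_n, R_m, R_n, g, dt):
        # angular rate of the n-frame caused by earth rotation, resolved in ENU
        w_ie = OMEGA_IE * np.array([0.0, np.cos(lat), np.sin(lat)])
        # transport rate caused by motion over the curved earth
        v_e, v_north, v_u = v_n
        w_en = np.array([-v_north / (R_m + h),
                         v_e / (R_n + h),
                         v_e * np.tan(lat) / (R_n + h)])
        g_n = np.array([0.0, 0.0, -g])
        # basic INS equation: dv/dt = f_n - (2*w_ie + w_en) x v + g
        v_dot = f_n - np.cross(2.0 * w_ie + w_en, v_n) + g_n
        v_new = v_n + v_dot * dt
        # latitude/longitude/height rates integrated with a simple Euler step
        lat_new = lat + v_north / (R_m + h) * dt
        lon_new = lon + v_e / ((R_n + h) * np.cos(lat)) * dt
        h_new = h + v_u * dt
        return v_new, lat_new, lon_new, h_new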
Inevitably, every system has errors; to further improve the positioning accuracy of inertial navigation, a final error calculation is performed to optimize the positioning result.
Assume that the true navigation coordinate system n and the navigation coordinate system n' computed by the computer differ by the small misalignment angles φ = [φ_x, φ_y, φ_z]^T. The coordinate transformation matrix from the n-system to the n'-system is then, to first order,

C_n^n' ≈ I - (φ×)

where (φ×) denotes the antisymmetric (cross-product) matrix of φ. In the following, this transformation matrix is used to derive the errors of velocity, position and attitude respectively.
Calculating the velocity error:
In the ideal case, the velocity equation of the horizontal channels is the one given above. In the actual case, the computation uses the actual velocity value, the actual specific force value and the actual latitude value L_c, so a corresponding set of expressions for the actual velocity is readily obtained, with δv_i (i = x, y, z) denoting the velocity error and ∇ the accelerometer zero bias.
Subtracting the ideal equations from the actual ones and combining them yields the expression for the velocity error δv_i in terms of the misalignment angles φ, the accelerometer bias ∇ and the position error.
calculating a position error:
from the position calculation formula of the carrier
Figure BDA0002447802830000067
The longitude and latitude change rate under the actual error condition can be obtained as follows:
Figure BDA0002447802830000068
there are also actual error expressions:
Figure BDA0002447802830000069
/>
the expression of the error obtained by combining the above equations is:
Figure BDA00024478028300000610
calculating attitude errors:
known to have angular deviations of
Figure BDA00024478028300000611
The form written as a negative antisymmetric matrix is:
Figure BDA0002447802830000071
the previous conversion matrix may be changed to:
Figure BDA0002447802830000072
navigation coordinate system calculated by knowing relative computer of inertial measurement unit coordinate system s system
Figure BDA00024478028300000714
The system conversion matrix is:
Figure BDA0002447802830000073
and deriving the left and right sides of the equation to obtain:
Figure BDA0002447802830000074
then the properties contained in the cosine matrix
Figure BDA0002447802830000075
Substituting the above equation to obtain
Figure BDA0002447802830000076
Both sides multiply by the direction cosine matrix
Figure BDA0002447802830000077
And substitute into +.>
Figure BDA0002447802830000078
In the matrix, obtain:
Figure BDA0002447802830000079
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure BDA00024478028300000710
the drift errors of three axes of the gyroscope in static state are expressed, and the drift errors of x, y and z are respectively epsilon x 、ε y 、ε z I.e. +.>
Figure BDA00024478028300000711
Deriving from the angular velocity calculation matrix obtained in the velocity calculation, the corresponding drift can be obtained:
Figure BDA00024478028300000712
/>
Figure BDA00024478028300000713
by combining the above equations, the attitude error expression can be:
Figure BDA0002447802830000081
by the positioning calculation and error elimination algorithm, the influence of angle errors on the positioning accuracy can be reduced to a great extent. However, inertial navigation has a further characteristic, namely-error accumulation. Although the corrected inertial navigation has an accuracy of 3% relative error, its accuracy decreases with time.
The invention therefore corrects the inertial navigation positioning result by means of visual modeling, relying on the model to constrain the drift of the positioning points and thus maintain high-precision positioning over long periods. That is: the inertial measurement unit acquires the specific force components, while a lens group worn by the person acquires the scene in real time for modeling.
First, a live view is captured by the lens group worn by the person and converted to grayscale, so that each pixel of the image is represented by a gray value. A coordinate system is established with the upper-left corner of the image as the origin, the x-axis positive to the right and the y-axis positive downwards. For any pixel (x_i, y_i), the sum of all pixel values inside the rectangle enclosed by that pixel and the origin is its integral pixel value.
With the integral pixel values, the pixel sum of an arbitrary rectangle is obtained simply by adding and subtracting the integral values at its four corners, as shown in Fig. 1: Σ = A - B - C + D, where A is the upper-left point, B the upper-right point, C the lower-left point and D the lower-right point. Because the sum over a whole rectangle no longer requires accumulating every pixel inside it, the amount of image-processing computation is greatly reduced.
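A minimal sketch of the integral-image computation and the four-corner rectangle sum described above; NumPy is used for brevity and the function names are illustrative.

    import numpy as np

    def integral_image(gray):
        # cumulative sums along both axes give the integral pixel values
        return np.cumsum(np.cumsum(gray.astype(np.int64), axis=0), axis=1)

    def rect_sum(ii, top, left, bottom, right):
        # Sum of pixels in gray[top:bottom+1, left:right+1] using four lookups (A - B - C + D).
        D = ii[bottom, right]
        B = ii[top - 1, right] if top > 0 else 0
        C = ii[bottom, left - 1] if left > 0 else 0
        A = ii[top - 1, left - 1] if (top > 0 and left > 0) else 0
        return D - B - C + A

    gray = np.arange(25).reshape(5, 5)
    ii = integral_image(gray)
    assert rect_sum(ii, 1, 1, 3, 3) == gray[1:4, 1:4].sum()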
After the pixel values have been obtained, Gaussian filtering is applied to the image to ensure that the feature points finally extracted are scale-invariant (that is, machine vision can determine whether two images show the same scene regardless of its apparent size; this matters for the closed-loop detection performed later in the modeling process).
Gaussian filtering is a weighted averaging of the image: the value of each pixel is replaced by a weighted average of its own value and the values of the other pixels in its neighborhood, with closer points receiving higher weights and farther points lower weights. The image is thereby blurred, simulating how the machine images an object as the distance to it changes from far to near.
The two-dimensional Gaussian filter is given by:

G(x, y) = 1/(2πσ^2) · exp(-(x^2 + y^2)/(2σ^2))

wherein x and y are the coordinates of the pixel and σ^2 is the variance.
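For illustration, the kernel defined by this formula can be generated as follows; the 5×5 template size and σ = 1.0 are arbitrary example values.

    import numpy as np

    def gaussian_kernel(size, sigma):
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
        return g / g.sum()   # normalise so the weighted average preserves brightness

    kernel = gaussian_kernel(5, 1.0)   # 5x5 template, sigma = 1.0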
After the image has been Gaussian-blurred, the Hessian matrix is used to take second-order partial derivatives at every pixel, so as to find the edge (abrupt) points in the image.
The discriminant of the Hessian matrix is its determinant,

det(H) = L_xx·L_yy - (L_xy)^2

where L_xx, L_yy and L_xy are the second-order partial derivatives of the Gaussian-smoothed image.
Since the Gaussian kernel follows a normal distribution, the SURF algorithm approximates the Gaussian filter with a box filter in order to increase the computation speed, as shown in Fig. 2. The approximation of the Hessian determinant then becomes:

det(H) = D_xx·D_yy - (0.9·D_xy)^2

wherein the factor 0.9 multiplying D_xy balances the error introduced by using the box filter.
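The following hedged sketch computes a Hessian response map; it uses Gaussian derivative filters from SciPy in place of the integral-image box filters that SURF actually uses, but keeps the 0.9 balancing factor quoted in the text.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def hessian_response(gray, sigma):
        img = gray.astype(np.float64)
        Lxx = gaussian_filter(img, sigma, order=(0, 2))   # second derivative in x
        Lyy = gaussian_filter(img, sigma, order=(2, 0))   # second derivative in y
        Lxy = gaussian_filter(img, sigma, order=(1, 1))   # mixed derivative
        # det(H) approximation with the balancing factor from the text
        return Lxx * Lyy - (0.9 * Lxy) ** 2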
After all the abrupt points in the image have been found by the above steps, they are screened one by one to obtain the final feature points. A feature point is an extreme point in the image, and the extremum is taken not only over the 8 surrounding points in the same layer but also over the 9 points in each of the two adjacent layers.
As shown in Fig. 3, the detection point in the middle is compared with the 8 pixels in its 3×3 neighborhood in its own layer and with the 18 pixels in the 3×3 neighborhoods of the adjacent upper and lower layers, 26 pixels in total. Therefore, for every abrupt point to have an upper and a lower layer to compare against, a single image is far from enough, and a scale space has to be constructed.
The scale space is constructed as follows: the size of the original image is kept unchanged and only the size of the box-filter template is varied, producing images of different scales that together form a pyramid model.
The first filter is a 9×9 box filter, which corresponds to a second-order Gaussian filter with σ = 1.2. The scale space is made up of groups (octaves), and each group in turn contains several layers. The layers within a group are generated by filters of different sizes, while the groups differ in their starting filter size and size increment. The correspondence between the size of the filter template and σ is:
σ_approx = current template size × 1.2 / 9

For example, a 27×27 template corresponds to σ = 27 × 1.2 / 9 = 3.6.
Note that, to keep the scale space continuous and to avoid the first and last layers of each group having nothing to compare against, adjacent groups must partially overlap in scale, while the box filters within each group grow progressively larger.
The abrupt points obtained above can now be screened in the constructed scale space to find the extreme points among them, i.e. the feature points. Depending on the actual situation, the feature points may need to be further refined.
If few feature points are obtained, they are used directly without further refinement.
If there are many feature points, a further screening step is needed and only the feature points with larger contrast are kept. The specific method is as follows:
A threshold is set; a candidate feature point is denoted x, its offset is denoted Δx, and D(x) is expanded in a Taylor series, the contrast being the absolute value |D(x)|:

D(x + Δx) = D(x) + (∂D/∂x)^T·Δx + (1/2)·Δx^T·(∂^2 D/∂x^2)·Δx

Since x is an extreme point of D(x), differentiating the expansion and setting the derivative to zero gives

Δx = -(∂^2 D/∂x^2)^(-1)·(∂D/∂x)

Substituting the obtained Δx back into the Taylor expansion of D(x) gives

D(x + Δx) = D(x) + (1/2)·(∂D/∂x)^T·Δx

Let the contrast threshold be T. If |D(x + Δx)| ≥ T, the feature point is kept; otherwise it is removed. This ensures that the remaining feature points are high-contrast points.
In practice the threshold is chosen according to the required number and accuracy of feature points. The larger the threshold, the more robust the obtained feature points, but the fewer of them there are. For images of simple scenes the threshold can be lowered appropriately; for complex images, where the number of feature points changes considerably after rotation or blurring, the threshold needs to be raised appropriately during testing.
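As a sketch of the contrast test for a single candidate point: the gradient and Hessian of D are assumed to be precomputed finite-difference estimates around the candidate, and the default threshold value is only a placeholder.

    import numpy as np

    def passes_contrast_test(D0, grad, hess, T=0.03):
        # D0: D at the candidate, grad: dD/dx (3-vector), hess: d2D/dx2 (3x3 matrix)
        offset = -np.linalg.solve(hess, grad)       # delta_x = -(d2D/dx2)^-1 * dD/dx
        contrast = D0 + 0.5 * grad @ offset         # D(x + delta_x) from the Taylor expansion
        return abs(contrast) >= T, offset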
After the final feature points have been determined, each is assigned a main direction so that the feature points are rotation-invariant, that is, two images taken at different rotation angles can still be matched through the main direction of each feature point. This step is also relevant to the closed-loop detection performed later in the modeling process.
The method mainly comprises the following steps:
I. Determining the direction of the interest points
Each interest point is first assigned a direction characteristic. Within the circular neighborhood centered on the feature point with radius 6S (S being the scale of the interest point), the image is processed with a Haar wavelet template of size 4S to obtain the Haar wavelet responses in the x and y directions.
The Haar wavelet templates are shown in Fig. 4: the left template gives the response in the x direction and the right template the response in the y direction, with black representing -1 and white representing +1. After the circular neighborhood has been processed, the x and y responses of every point in the neighborhood are obtained and then weighted with a Gaussian function (σ = 2S) centered on the interest point.
A 60° sector is placed in the circular neighborhood and the sum of the Haar wavelet responses inside it is computed; the sector is then rotated and the sum recomputed. The direction in which the sum of the wavelet responses is largest is taken as the main direction.
II. Generating the feature point descriptor
A square frame is selected around the feature point, oriented along the main direction of the key point, with side length 20S. It is divided into 16 sub-regions (side length 5S), and in each sub-region the Haar wavelet responses of 25 pixels are computed in the horizontal and vertical directions (both defined relative to the main direction of the square frame).
The wavelet features of each sub-region are the sum of the horizontal responses, the sum of their absolute values, the sum of the vertical responses and the sum of their absolute values (the absolute values are accumulated so that polarity information about the intensity changes is also contained in the descriptor).
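A rough sketch of assembling the 64-dimensional descriptor from precomputed Haar responses follows; the sampling of the responses is simplified away, and the final normalisation step is a common practice added here for robustness rather than something stated in the text.

    import numpy as np

    def descriptor_from_responses(dx, dy):
        # dx, dy: 20x20 arrays of Haar responses, already Gaussian-weighted and
        # rotated into the main direction of the feature point
        desc = []
        for by in range(4):
            for bx in range(4):
                sx = dx[5 * by:5 * by + 5, 5 * bx:5 * bx + 5]
                sy = dy[5 * by:5 * by + 5, 5 * bx:5 * bx + 5]
                desc += [sx.sum(), np.abs(sx).sum(), sy.sum(), np.abs(sy).sum()]
        v = np.array(desc)                      # 16 sub-regions x 4 sums = 64 values
        return v / (np.linalg.norm(v) + 1e-12)  # normalisation (added for robustness)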
All of the above is preprocessing: the images acquired by the lens group are converted into feature points with main directions, which are then put to use as follows.
The first use is closed-loop detection. Since the whole algorithm only considers the correlation between adjacent moments, error accumulation is unavoidable, and the trajectory drift of the whole VSLAM becomes more and more serious as time goes on. The trajectory therefore has to be corrected with closed-loop detection, pulling the drifted points back to their original positions. After the feature points have been obtained by the above algorithm, the KD-Tree algorithm is used to match nearest-neighbor feature points and to recognize whether two images show the same scene, thereby judging whether a closed loop has been formed.
The KD-Tree algorithm is based on repeatedly splitting the data into two parts by value: all values on the left branch are smaller than the root value and all values on the right branch are larger, and the splitting continues until no subset can be divided further.
For two-dimensional data such as pixel positions, the variances of x and y are computed separately and the split is made along the direction with the larger variance, because a larger variance indicates that the data are more spread out in that direction and the division is more distinct.
Example: take the 6 pixel positions {(4, 7), (2, 3), (7, 2), (8, 1), (5, 4), (9, 6)}.
The variances of x and y are computed separately; the variance in the x direction is larger, so the first split is made in the x direction over the values 2, 4, 5, 7, 8, 9. Taking the middle value 7 gives a left branch (2, 4, 5) and a right branch (8, 9).
The next split is made in the y direction: the median 4 is taken from 3, 4, 7 on the left, and 6 from 1, 6 on the right, giving the binary tree.
In this way, when feature points are matched, the shortest-path search can be carried out directly on the binary tree according to the numerical values, realizing highly efficient image matching.
However, a single downward search does not necessarily return the point closest to the query point Q; a backtracking step is also required, i.e. checking whether an unvisited branch of the tree could contain a point closer to Q, in other words whether the distance from Q to the splitting plane of the unvisited branch is smaller than the distance from Q to the current nearest neighbor. Geometrically, this amounts to testing whether the hypersphere centered on Q with radius equal to the current nearest distance intersects the hyper-rectangle represented by the unvisited branch. If they intersect, the distances to the points in that branch are computed as well; otherwise the branch is skipped.
An intersection indicates that the backtracked branch may contain a point closer to Q; otherwise the current result is already the nearest point. The descriptors of each query point Q and its nearest neighbor are then compared. Since one image contains tens of such query points, tens of correspondences are formed, and when the values in these correspondences are similar to the feature descriptions of the adjacent frames of the other image, it can be considered that a closed loop has been formed.
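A minimal matching sketch using SciPy's KD-tree in place of a hand-written one; the ratio test at the end is an optional common addition, not part of the method described above.

    import numpy as np
    from scipy.spatial import cKDTree

    def match_descriptors(desc_a, desc_b, ratio=0.8):
        # desc_a, desc_b: (N, 64) and (M, 64) descriptor arrays from two images
        tree = cKDTree(desc_b)
        dists, idx = tree.query(desc_a, k=2)          # two nearest neighbours per query
        good = dists[:, 0] < ratio * dists[:, 1]      # reject ambiguous matches
        return [(i, idx[i, 0]) for i in np.flatnonzero(good)]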
After the trajectory-drift problem has been addressed, the depth data (the distance relative to the lens group) of each feature point is determined.
Here the principle of epipolar constraint is used. As shown in Fig. 5, a and c are the camera centers and x is an image point; the object X must lie on the extension of the line cx. If image data from another lens is available, a second such line can be drawn, and the intersection of the two lines is the true position of the object, which determines the distance from the object to the machine. On this basis, the first camera can be taken as the coordinate origin and assumed to have undergone no translation or rotation; the rotation and translation of the second camera relative to the first are then determined from the motion of the image points, giving the pose of the camera.
The derivation is as follows:
Let P = [X, Y, Z]^T be a point in three-dimensional space expressed in the coordinate system of the first camera, and let p_1 and p_2 be its image points in the two cameras. Since the center of the first camera serves as the origin of the world coordinate system, i.e. the first camera has no rotation or translation, the pinhole camera model gives:

p_1 = K·P,   p_2 = K·(R·P + t)

where K is the intrinsic matrix of the camera and R, t are the rotation and translation of the second camera relative to the first.
From p_1 = K·P one obtains P = K^(-1)·p_1, and substituting this into the second equation gives:

p_2 = K·(R·K^(-1)·p_1 + t)

Multiplying both sides on the left by K^(-1) gives

K^(-1)·p_2 = R·K^(-1)·p_1 + t

Let x_1 = K^(-1)·p_1 and x_2 = K^(-1)·p_2, so that

x_2 = R·x_1 + t

Multiplying both sides on the left by the antisymmetric matrix t× of the vector t eliminates t, since t×t = 0:

t×x_2 = t×R·x_1

Multiplying both sides on the left by x_2^T gives

x_2^T·t×x_2 = x_2^T·t×R·x_1

Since t×x_2 is perpendicular to both t and x_2, the left-hand side is zero, and therefore

x_2^T·t×R·x_1 = 0

Substituting x_1 and x_2 back gives

p_2^T·K^(-T)·t×R·K^(-1)·p_1 = 0

This equation is another expression of the epipolar constraint; it contains only the image points and the rotation and translation of the camera, and the matrix in the middle is the fundamental matrix F:

p_2^T·F·p_1 = 0,   where F = K^(-T)·t×R·K^(-1)

If the intrinsic parameters K of the camera are known, the intermediate matrix

E = t×R

can be extracted. E is called the essential matrix, and it differs from the fundamental matrix only by the camera intrinsics K. The pose of the camera is then calculated from the matched point pairs.
From the above, the matched pixels p_1, p_2 and the fundamental matrix F satisfy p_2^T·F·p_1 = 0 with F = K^(-T)·t×R·K^(-1). In other words, the fundamental matrix F of the two views can be computed from matched point pairs alone (at least 7 pairs), and the pose of the camera is then decomposed from F.
The relative pose of the camera can thus be estimated by feature point matching:
1. extract the feature points of the two images and match them;
2. compute the fundamental matrix F of the two views from the matched image points;
3. decompose the fundamental matrix F to obtain the rotation matrix R and translation vector t of the camera.
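For illustration, steps 2 and 3 can be carried out with OpenCV's estimators as sketched below; since the intrinsics K are known, the essential matrix is estimated directly. The function names are OpenCV's, not the patent's, and the RANSAC parameters are placeholders.

    import numpy as np
    import cv2

    def relative_pose(pts1, pts2, K):
        # pts1, pts2: (N, 2) arrays of matched pixel coordinates, N >= 8 recommended
        E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                          method=cv2.RANSAC, prob=0.999, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)   # decompose E = t x R into R and t
        return R, t, inliers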
Once the correspondence between spatial points and pixels in the image has been acquired by the above method, the final 2D-3D coordinate conversion can be carried out, realizing the three-dimensional modeling of the VSLAM.
That is, a spatial point [x, y, z] and its pixel coordinates [u, v, d] in the image (d being the depth data) are related by:

u = x·f_x/z + c_x,   v = y·f_y/z + c_y,   d = z·s

wherein f_x and f_y are the focal lengths of the camera along the x and y axes, c_x and c_y give the aperture (principal point) center of the camera, and s is the scaling factor of the depth map. This formula goes from (x, y, z) to (u, v, d); conversely, it can be rewritten with (u, v, d) known to derive (x, y, z):

z = d/s,   x = (u - c_x)·z/f_x,   y = (v - c_y)·z/f_y

A point cloud model can be constructed from this formula.
In general, the four parameters f_x, f_y, c_x, c_y are collected into the intrinsic matrix C of the camera, i.e. parameters that do not change once the camera is manufactured. Given the intrinsic matrix, the spatial position and pixel coordinates of each point can be described by a simple matrix model:

s·[u, v, 1]^T = C·(R·[x, y, z]^T + t)

where R and t are the pose of the camera: R is the rotation matrix and t the displacement vector. Because a single point cloud is being built here, the camera is considered not to rotate or translate, so R is set to the identity matrix I and t to zero. s is the ratio between the value stored in the depth map and the actual distance; since depth maps are usually stored as short integers in millimeters, s is typically 1000.
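A sketch of the (u, v, d) to (x, y, z) back-projection applied to a whole depth image; fx, fy, cx, cy and the depth scale s stand for the real calibration values of the camera.

    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy, s=1000.0):
        # depth: (H, W) array in millimetres, so s = 1000 converts to metres
        v, u = np.indices(depth.shape)
        z = depth.astype(np.float64) / s
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return pts[pts[:, 2] > 0]    # keep only pixels with a valid depth reading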
The present invention combines these two techniques through two steps: unifying the coordinate system and performing clock synchronization.
Coordinate system
In general, the two technologies have independent coordinate systems. For the inertial positioning points to be displayed with high precision inside the model generated by visual modeling, visual modeling and inertial positioning must share the same coordinate system; only then do the motion trajectories of a fixed point in space coincide in the coordinate systems of the two techniques.
The invention integrates the visual modeling and inertial positioning modules into a single product, so that they remain relatively stationary. This means that, once the product is manufactured, the angles between the camera coordinate system used by visual modeling and the carrier coordinate system used by inertial positioning are fixed. The z-axis of the camera coordinate system is the optical axis of the camera, its x- and y-axes are parallel to the x- and y-axes of the imaging plane, and its origin lies at the optical center of the camera; the carrier coordinate system is the coordinate system formed by the three accelerometers in the inertial positioning module.
With the two coordinate systems known, substituting them into the conversion formulas between the carrier coordinate system and the navigation coordinate system yields the corresponding transformation matrix T, built from the following angles:
θ_x: the angle between the projection of the y-axis (longitudinal axis) of the camera coordinate system on the xy-plane of the carrier coordinate system and the y-axis of the carrier coordinate system, taken positive clockwise, with a range of 0° to 360°.
θ_y: the angle between the y-axis of the camera coordinate system and its projection on the xy-plane of the carrier coordinate system, positive upwards, with a range of -90° to 90°.
θ_z: the angle between the x-axis of the camera coordinate system and its projection on the xy-plane of the carrier coordinate system, positive when tilted to the right, with a range of -180° to 180°.
The relation A = T·B then holds: substituting camera-coordinate-system points into B, the resulting new coordinates coincide with the inertial (carrier) coordinate system, which ensures that the positioning trajectories of the two technologies overlap and that the positioning trajectory is displayed accurately in the map model built in real time.
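An illustrative sketch of the unification A = T·B; the decomposition of T into three elementary rotations and their composition order are assumptions, since the explicit matrix is given only as an equation image in the original.

    import numpy as np

    def rot_x(a): return np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    def rot_y(a): return np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
    def rot_z(a): return np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])

    def mounting_matrix(theta_x, theta_y, theta_z):
        # theta_x acts as the heading-like angle (about z), theta_y as the pitch-like
        # angle (about x), theta_z as the roll-like angle (about y); all in radians
        return rot_z(theta_x) @ rot_x(theta_y) @ rot_y(theta_z)

    T = mounting_matrix(np.radians(5.0), np.radians(-2.0), np.radians(0.5))
    B = np.array([0.3, 0.1, 2.0])   # a point in the camera coordinate system
    A = T @ B                       # the same point in the carrier coordinate system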
Clock synchronization
After the two coordinate systems have been made to coincide by the coordinate conversion, the positioning points produced by the two techniques should in theory coincide as well. In practice, however, the results still differ even when drift is ignored: trigger delay, transmission delay and imperfect clock synchronization mean that the devices do not produce their positioning results at exactly the same instant, so the results deviate from each other. As shown in Fig. 6, the offset between the timestamps of the camera data and the IMU data is treated as part of the state variables and denoted t_d, with t_IMU = t_cam + t_d, so that the timestamps of the inertial measurement unit and the image acquisition device (camera) can be brought into coincidence.
Let the timestamp of the n-th frame be t_n; the corresponding true sampling instant is then t_n - t_d. For a feature point I_k on this frame with coordinates [x_k, y_k]^T, the relation between the observed and the true feature-point coordinates is:

I_k^true = [x_k, y_k]^T - t_d·V_k

where V_k is the velocity of the feature point on the image plane during the interval t_d. Since t_d is normally very small, the feature point can be regarded as moving uniformly, i.e. V_k is a constant, which can be approximated from the feature-point positions in consecutive frames.
Substituting this into the three-dimensional coordinate system, i.e. replacing I_k by I_k^true, and introducing the time-offset parameter t_d into the camera residual term of the optimization equation, the re-projection error can be written as:

e_k = I_k^true - π(R_k·(P_l - p_k))

where P_l = [x_l, y_l, z_l]^T is the three-dimensional coordinate of the feature point and (R_k, p_k) is the pose of the camera.
Since the error is not a fixed value, the minimum of the total error is sought (the error should not be large under normal conditions). The optimization therefore minimizes, over the state vector χ, the sum of a prior term, the inertial residuals and the visual residuals:

min_χ { ||e_p - H_p·χ||^2 + Σ_(k∈B) ||e_B(I_(k+1), χ)||^2 + Σ_((l,k)∈C) ||e_C(l, k, χ)||^2 }

wherein e_p - H_p·χ is the prior factor, e_B(I_(k+1), χ) is the error term produced by the inertial system, B is the set of individual measurements taken by the inertial device, e_C(l, k, χ) is the error term generated by visual modeling, and C is the set of feature points acquired by visual modeling and observed at least twice; this last term contains the information about t_d.
Finally, the t_d obtained from the minimization is substituted back into the initial time offset, t'_cam = t_cam - t_d, a new set of timestamps is established with the newly obtained t'_cam, and the above operations are repeated. In this way the deviation converges continuously toward zero in the successive optimizations, and clock synchronization is finally achieved.
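A very rough sketch of the iterative synchronization loop described above; estimate_td stands in for the full joint optimization and is a hypothetical callback, not a function defined by the patent.

    import numpy as np

    def feature_velocity(I_k, I_k1, t_k, t_k1):
        # approximate pixel velocity between consecutive frames (uniform motion assumed)
        return (I_k1 - I_k) / (t_k1 - t_k)

    def synchronise(t_cam, estimate_td, tol=1e-4, max_iters=10):
        for _ in range(max_iters):
            t_d = estimate_td(t_cam)      # time offset recovered by the joint optimisation
            t_cam = t_cam - t_d           # t'_cam = t_cam - t_d
            if abs(t_d) < tol:            # deviation has converged to (near) zero
                break
        return t_cam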
When the two technologies satisfy both coordinate-system unification and clock synchronization, accurate positioning of the same point can be achieved. The real-time model obtained by the visual modeling technique is simultaneously transmitted back to the background command terminal, providing the commander with a realistic simulation of the on-site environment. The method thus achieves long-duration, high-precision positioning and simulated map feedback of an unfamiliar environment without depending on any external base station.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; while the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that modifications may be made to the techniques described in the foregoing embodiments, or that equivalents may be substituted for elements thereof; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (4)

1. A full-environment positioning method integrating inertial technology and visual modeling, characterized by comprising the following steps:
S1, fixing the positions of an inertial measurement unit and an image acquisition device so that they remain relatively stationary;
S2, performing positioning with the data acquired by the inertial measurement unit to obtain positioning coordinates, while performing visual modeling with the data acquired by the image acquisition device to obtain a visual model;
S3, unifying the coordinate systems of the visual model and the positioning coordinates, then performing clock synchronization, and finally displaying the positioning coordinates in the visual model;
the positioning by the inertial measurement unit in step S2 specifically includes the following steps:
SA1, measuring horizontal acceleration and gravity acceleration from an inertial navigation measurement unit;
SA2, obtaining a moving direction and a moving speed through horizontal acceleration and gravity acceleration, and obtaining a positioning position according to an initial position;
SA3, correcting calculation errors;
the step S2 of performing visual modeling on the data acquired by the image acquisition equipment specifically comprises the following steps:
SB1, preprocessing an image acquired by image acquisition equipment, and converting the image into characteristic points with main directions;
SB2, performing closed loop detection;
SB3, determining depth data of each characteristic point to obtain three-dimensional coordinates of each characteristic point;
SB4, obtaining a relative pose through feature point matching;
SB5, 2D-3D conversion is carried out to realize three-dimensional visual modeling;
step SB1 comprises the following specific steps:
SB11, converting the image acquired by the image acquisition device to grayscale;
SB12, applying Gaussian filtering to the grayscale image;
SB13, computing second-order partial derivatives (a Hessian matrix) at each pixel of the Gaussian-filtered image so as to find edge points in the image;
SB14, screening edge points to obtain final characteristic points;
SB15, giving the main direction of the final feature point.
2. The full environment positioning method integrating inertial technology and visual modeling according to claim 1, wherein the method comprises the following steps: the clock synchronization in step S3 is achieved by continuously converging the deviation to zero in a successive optimization.
3. The full environment positioning method integrating inertial technology and visual modeling according to claim 1, wherein: the unification of the coordinate systems in step S3 means that a transformation matrix is obtained from the known coordinate system of the inertial measurement unit and the coordinate system of the image acquisition device, and the coordinates are then unified through this transformation matrix.
4. The full environment positioning method integrating inertial technology and visual modeling according to claim 1, wherein the method comprises the following steps: the errors in step SA3 include a velocity error, a position error, and an attitude error.
CN202010284018.4A 2020-04-13 2020-04-13 Full-environment positioning method integrating inertial technology and visual modeling Active CN111595332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010284018.4A CN111595332B (en) 2020-04-13 2020-04-13 Full-environment positioning method integrating inertial technology and visual modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010284018.4A CN111595332B (en) 2020-04-13 2020-04-13 Full-environment positioning method integrating inertial technology and visual modeling

Publications (2)

Publication Number Publication Date
CN111595332A CN111595332A (en) 2020-08-28
CN111595332B true CN111595332B (en) 2023-05-09

Family

Family ID: 72184968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010284018.4A Active CN111595332B (en) 2020-04-13 2020-04-13 Full-environment positioning method integrating inertial technology and visual modeling

Country Status (1)

Country Link
CN (1) CN111595332B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085794B (en) * 2020-09-11 2022-05-17 中德(珠海)人工智能研究院有限公司 Space positioning method and three-dimensional reconstruction method applying same
CN112747746A (en) * 2020-12-25 2021-05-04 珠海市一微半导体有限公司 Point cloud data acquisition method based on single-point TOF, chip and mobile robot
CN112862818B (en) * 2021-03-17 2022-11-08 合肥工业大学 Underground parking lot vehicle positioning method combining inertial sensor and multi-fisheye camera

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019062291A1 (en) * 2017-09-29 2019-04-04 歌尔股份有限公司 Binocular vision positioning method, device, and system
CN109752003A (en) * 2018-12-26 2019-05-14 浙江大学 A kind of robot vision inertia dotted line characteristic positioning method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10062010B2 (en) * 2015-06-26 2018-08-28 Intel Corporation System for building a map and subsequent localization
CN107255476B (en) * 2017-07-06 2020-04-21 青岛海通胜行智能科技有限公司 Indoor positioning method and device based on inertial data and visual features
CN208751577U (en) * 2018-09-20 2019-04-16 江阴市雷奥机器人技术有限公司 A kind of robot indoor locating system
CN109141433A (en) * 2018-09-20 2019-01-04 江阴市雷奥机器人技术有限公司 A kind of robot indoor locating system and localization method
CN109520476B (en) * 2018-10-24 2021-02-19 天津大学 System and method for measuring dynamic pose of rear intersection based on inertial measurement unit
CN110044354B (en) * 2019-03-28 2022-05-20 东南大学 Binocular vision indoor positioning and mapping method and device
CN110095116A (en) * 2019-04-29 2019-08-06 桂林电子科技大学 A kind of localization method of vision positioning and inertial navigation combination based on LIFT

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019062291A1 (en) * 2017-09-29 2019-04-04 歌尔股份有限公司 Binocular vision positioning method, device, and system
CN109752003A (en) * 2018-12-26 2019-05-14 浙江大学 A kind of robot vision inertia dotted line characteristic positioning method and device

Also Published As

Publication number Publication date
CN111595332A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN111595332B (en) Full-environment positioning method integrating inertial technology and visual modeling
CN106679648B (en) Visual inertia combination SLAM method based on genetic algorithm
CN109506642B (en) Robot multi-camera visual inertia real-time positioning method and device
CN104704384B (en) Specifically for the image processing method of the positioning of the view-based access control model of device
CN102607526B (en) Target posture measuring method based on binocular vision under double mediums
Li et al. Real-time motion tracking on a cellphone using inertial sensing and a rolling-shutter camera
CN106525074B (en) A kind of compensation method, device, holder and the unmanned plane of holder drift
Li et al. Real-time 3D motion tracking and reconstruction system using camera and IMU sensors
CN108810473B (en) Method and system for realizing GPS mapping camera picture coordinate on mobile platform
CN107909614B (en) Positioning method of inspection robot in GPS failure environment
CN109596121B (en) Automatic target detection and space positioning method for mobile station
CN109598794A (en) The construction method of three-dimension GIS dynamic model
CN110388919B (en) Three-dimensional model positioning method based on feature map and inertial measurement in augmented reality
CN107767425A (en) A kind of mobile terminal AR methods based on monocular vio
Li et al. Large-scale, real-time 3D scene reconstruction using visual and IMU sensors
CN105352509A (en) Unmanned aerial vehicle motion target tracking and positioning method under geographic information space-time constraint
CN112833892B (en) Semantic mapping method based on track alignment
CN109871739B (en) Automatic target detection and space positioning method for mobile station based on YOLO-SIOCTL
US11935249B2 (en) System and method for egomotion estimation
CN108958462A (en) A kind of methods of exhibiting and device of virtual objects
CN113516692A (en) Multi-sensor fusion SLAM method and device
CN110675453A (en) Self-positioning method for moving target in known scene
CN114708293A (en) Robot motion estimation method based on deep learning point-line feature and IMU tight coupling
CN114964276A (en) Dynamic vision SLAM method fusing inertial navigation
CN116772844A (en) Navigation method of visual inertial indoor robot based on dynamic environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 106-6, Dongliufang, Yinzhou District, Ningbo City, Zhejiang Province, 315000

Patentee after: Zhejiang shenxun Technology Co.,Ltd.

Country or region after: China

Address before: 315194 room 209-2, building B, science and technology information Incubation Park, No. 655, bachelor Road, Yinzhou District, Ningbo City, Zhejiang Province

Patentee before: Ningbo shenxun Information Technology Co.,Ltd.

Country or region before: China