CN113240597A - Three-dimensional software image stabilization method based on visual inertial information fusion - Google Patents

Three-dimensional software image stabilization method based on visual inertial information fusion

Info

Publication number
CN113240597A
Authority
CN
China
Prior art keywords
imu
coordinate system
matrix
camera
image stabilization
Prior art date
Legal status
Granted
Application number
CN202110497661.XA
Other languages
Chinese (zh)
Other versions
CN113240597B (en)
Inventor
唐成凯
李冠林
张怡
张玲玲
王晨
程卓
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202110497661.XA
Publication of CN113240597A
Application granted
Publication of CN113240597B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C 21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 11/00 Systems for determining distance or velocity not using reflection or reradiation
    • G01S 11/12 Systems for determining distance or velocity not using reflection or reradiation using electromagnetic waves other than radio waves
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Electromagnetism (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a three-dimensional software image stabilization method based on visual-inertial information fusion. First, the camera and the inertial sensor are calibrated to obtain the distortion parameters and the intrinsic matrix of the monocular camera, and the IMU error parameters are calibrated to obtain the IMU error model used in the subsequent tightly coupled joint optimization. Second, monocular visual initialization is performed to obtain sufficient three-dimensional feature points in the world coordinate system. Third, the IMU data and the visual data are tightly coupled for joint optimization, yielding the camera transformation matrix and the three-dimensional space feature points. Fourth, pre-warping is performed according to the obtained transformation matrix, and a local grid transformation is applied to the obtained feature points to produce the final stabilized image. The invention effectively overcomes the poor motion-vector estimation performance of the classical SFM method and improves both the spatial position of the motion-compensated three-dimensional feature points and the overall motion-vector accuracy.

Description

Three-dimensional software image stabilization method based on visual inertial information fusion
Technical Field
The invention relates to a visual three-dimensional software image stabilization method that incorporates inertial components, and in particular to a software image stabilization (Software Stabilization) method based on the theory of visual-inertial information fusion (Visual Inertial Information Fusion).
Background
The rapid development of the electronic information age has driven technology forward, and unmanned technology is gradually extending from ground vehicles to the air. Unmanned systems play an increasingly important role in military and civilian areas such as urban transportation and battlefield operations. In the development of unmanned systems, the design of the sensing, navigation and control systems is of paramount importance. In visual navigation, the unmanned system extracts abundant navigation information in real time from the environment images acquired by the camera, giving it the ability to interact with the environment and improving the automation and intelligence of its control. However, in some cases the camera carrier produces undesirable random jitter; the video and images it obtains then suffer from unstable and blurred pictures, degrading video quality and observation effectiveness. To preserve the usefulness of the obtained video sequence, anti-shake processing, i.e. image stabilization, must be applied to the moving camera platform.
Image stabilization systems can be classified by stabilization mode into mechanical, optical and digital image stabilization. A mechanical image stabilization platform is bulky and of low precision in practical use, and is easily disturbed by external forces such as friction and wind resistance, so it is not suitable as the main means of image stabilization for a small rotary-wing unmanned aerial vehicle. An optical image stabilizer achieves good stabilization and can stabilize the image sequence directly by optical means, but the variable optical wedge must be achromatized because of the secondary spectrum, which makes the structure and manufacturing process overly complex; moreover, compensation is performed only passively by a prism, mirror or optical wedge, which greatly limits its stabilization capability. Digital Video Stabilization (DVS), also called electronic image stabilization, builds on computer vision, image processing, signal processing and related disciplines: the stabilization function is realized mainly by an image stabilization algorithm implemented in software on a computer, without support from other systems, and since the software code can be ported, a DVS system is easy to maintain and update, has low cost and good precision. Digital video stabilization is an emerging technology of recent decades that produces a stable video sequence mainly by eliminating or reducing video jitter. According to current research models, image stabilization methods can be divided into 2D, 2.5D and 3D methods. The 2D methods have a simple algorithmic principle and a small computational load, but the stabilization result looks unnatural because depth of field is not modeled, and performance is limited. The 2.5D algorithms compromise between the 2D and 3D algorithms: their complexity is higher, the stabilization effect is good and parallax can be handled, but they require feature points that appear continuously over sufficiently many frames to form trajectories and stabilize the image under certain constraints, so they are difficult to use for real-time stabilization. The 3D algorithms have the highest complexity and the best performance, but their stabilization effect is limited by the performance of Structure from Motion (SFM), and it is difficult to obtain a sufficiently good stabilization result.
Disclosure of Invention
The invention aims to provide a three-dimensional software image stabilization method based on visual-inertial information fusion. The method can be combined with a visual navigation method, reduces the computational load of the algorithm, achieves good image stabilization performance, and effectively improves the quality of the obtained image sequence.
The three-dimensional software image stabilization method based on visual-inertial information fusion provided by the invention combines the high precision of the visual-inertial method with the relatively natural appearance of grid warping, and can effectively overcome the precision bottleneck of conventional image stabilization algorithms.
The technical scheme of the invention is as follows:
The three-dimensional software image stabilization method based on visual-inertial information fusion comprises the following steps:
Step 1: calibrating the camera and the IMU (Inertial Measurement Unit), acquiring the distortion parameters and the intrinsic matrix of the monocular camera, and calibrating the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization;
Step 2: performing monocular visual initialization to obtain sufficient three-dimensional feature points in the world coordinate system;
Step 3: tightly coupling the IMU data and the visual data for joint optimization, and acquiring the camera transformation matrix and the optimized three-dimensional space feature points;
Step 4: pre-warping according to the obtained transformation matrix, and applying a local grid transformation to the obtained feature points to obtain the final image stabilization result.
Further, in step 1, the camera and the IMU are calibrated with reference to the GitHub open-source calibration toolbox Kalibr.
Further, in step 1, the errors of the angular velocity and acceleration measurements are obtained through IMU calibration to correct the IMU error model; the camera intrinsic matrix K and the distortion coefficients are obtained through visual calibration; the parameters are further refined through joint calibration, which also yields the transformation matrix between the IMU and the camera carrier.
Further, in step 2, monocular visual initialization proceeds as follows:
For the image frames $I_1$ and $I_2$ obtained by the monocular camera, a common three-dimensional point $P$ in the world coordinate system has position $[X, Y, Z]^T$; the projections of this point in the two image frames satisfy
$$s_1 p_1 = K P, \qquad s_2 p_2 = K(R P + t)$$
$$x_1 = K^{-1} p_1, \qquad x_2 = K^{-1} p_2$$
where $R$ and $t$ are the rotation matrix and displacement vector between the two adjacent camera frames. From these relations the epipolar constraint
$$x_2^{T}\,\hat{t}\,R\,x_1 = 0, \qquad p_2^{T} K^{-T}\,\hat{t}\,R\,K^{-1} p_1 = 0$$
$$E = \hat{t} R, \qquad F = K^{-T} E K^{-1}, \qquad x_2^{T} E x_1 = p_2^{T} F p_1 = 0$$
is obtained, giving the essential matrix $E$ and the fundamental matrix $F$; the rotation matrix $R$ and translation vector $t$ are then recovered from $E$ and $F$, and triangulation is used to recover the relative depth of each feature point. The scale information of each feature point is then obtained using the IMU pre-integration model.
Further, the IMU pre-integration model is as follows:
The gyroscope bias $b_w$, the accelerometer bias $b_a$ and the additive noise $n_a$, $n_w$ are obtained from the IMU calibration of step 1. Using a 6-axis IMU, the accelerometer measurement and the gyroscope angular-velocity measurement are modeled as
$$\hat{a}_t = a_t + b_{a_t} + R_w^{b_t} g^w + n_a, \qquad \hat{\omega}_t = \omega_t + b_{w_t} + n_w$$
where the subscript $b$ denotes the body coordinate system, $R_w^{b_t}$ is the rotation matrix from the world coordinate system to the body coordinate system, and $g^w$ is the gravity vector in the world coordinate system. If the image frames acquired by the camera over $[t_k, t_{k+1}]$ are $k$ and $k+1$, with corresponding body frames $b_k$ and $b_{k+1}$, then the propagation of the position, velocity and orientation in the world coordinate system through the IMU measurements over the time interval is
$$p_{b_{k+1}}^{w} = p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt^2$$
$$v_{b_{k+1}}^{w} = v_{b_k}^{w} + \int_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt$$
$$q_{b_{k+1}}^{w} = q_{b_k}^{w} \otimes \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,q_t^{b_k}\,dt$$
where $\Delta t_k$ is the time interval $[t_k, t_{k+1}]$ and $R_t^{w}$ is the rotation matrix from the body coordinate system to the world coordinate system. The position, velocity and orientation values are then converted into the body coordinate system:
$$R_w^{b_k} p_{b_{k+1}}^{w} = R_w^{b_k}\left(p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k - \frac{1}{2} g^{w}\Delta t_k^2\right) + \alpha_{b_{k+1}}^{b_k}$$
$$R_w^{b_k} v_{b_{k+1}}^{w} = R_w^{b_k}\left(v_{b_k}^{w} - g^{w}\Delta t_k\right) + \beta_{b_{k+1}}^{b_k}$$
$$q_w^{b_k} \otimes q_{b_{k+1}}^{w} = \gamma_{b_{k+1}}^{b_k}$$
where the pre-integration terms are
$$\alpha_{b_{k+1}}^{b_k} = \iint_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt^2, \qquad \beta_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt, \qquad \gamma_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,\gamma_t^{b_k}\,dt$$
A first-order Taylor expansion of these terms, followed by midpoint (median) integration, gives the discrete pre-integration measurements, and the linearized error transfer function matrix of the form $\delta z_{k+1} = H\,\delta z_k + V n$ is obtained; the corresponding parameters are obtained from this error transfer function matrix. The metric scale $s$ is then obtained from the linear system relating the sliding-window velocities, the gravity vector $g^{c_0}$ and the scale, so as to recover the actual depth of each feature point and its position in the world coordinate system, where the superscript $c_0$ denotes the camera coordinate system of the first camera frame.
Further, in step 3, when the IMU data and the visual data are tightly coupled for joint optimization, a sliding window method is adopted, and a subset of frame information is selected as keyframe information to be jointly optimized with the remaining states.
Further, the optimization process using the sliding window method in step 3 is as follows:
The parameter variables to be optimized are written as
$$\mathcal{X} = \left[x_0, x_1, \ldots, x_n,\ x_c^b,\ \lambda_0, \lambda_1, \ldots, \lambda_m\right], \qquad x_k = \left[p_{b_k}^{w}, v_{b_k}^{w}, q_{b_k}^{w}, b_a, b_g\right],\ k \in [0, n], \qquad x_c^b = \left[p_c^b, q_c^b\right]$$
where $x_k$, $k \in [0, n]$ denotes the camera state at each time, including the position $p_{b_k}^{w}$, velocity $v_{b_k}^{w}$ and rotation $q_{b_k}^{w}$, together with the bias matrices $b_a$, $b_g$ of the accelerometer and gyroscope at each moment; $x_c^b$ is the transformation between the camera coordinate system and the body coordinate system, and $\lambda_n$ is the inverse depth value. The objective function
$$\min_{\mathcal{X}}\left\{\left\|r_p - H_p\mathcal{X}\right\|^2 + \sum_{k\in\mathcal{B}}\left\|r_{\mathcal{B}}\left(\hat{z}_{b_{k+1}}^{b_k}, \mathcal{X}\right)\right\|_{P_{b_{k+1}}^{b_k}}^{2} + \sum_{(l,j)\in\mathcal{C}}\left\|r_{\mathcal{C}}\left(\hat{z}_l^{c_j}, \mathcal{X}\right)\right\|_{P_l^{c_j}}^{2}\right\}$$
is solved and optimized, where the first part of the objective function represents the marginalized residual, the second part is the IMU residual, and the third part is the visual error function.
Advantageous effects
The invention establishes a monocular-vision-based motion estimation model through the visual-inertial fusion method, thereby improving the precision of motion estimation. A tightly coupled visual-inertial navigation fusion step is added on top of the traditional SFM motion estimation method, which improves the overall accuracy of the three-dimensional information obtained by the algorithm.
The pre-warping and grid-based local warping adopted by the invention effectively address the parallax problem through motion compensation, building on the traditional image stabilization approach, so that the stabilized result is more natural and the visual quality of the image stabilization algorithm is effectively improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic view of the epipolar geometry, with adjacent image frames $I_1$, $I_2$. Let the motion from image frame $I_1$ to $I_2$ be $R$, $t$, and let the camera centers of the two frames be $O_1$, $O_2$. Suppose image frame $I_1$ contains the feature point $p_1$ and image frame $I_2$ the corresponding feature point $p_2$; the two are matched by the ORB feature-point method and screened by RANSAC, and the resulting pair can be regarded as projections of the same point onto the two image planes. The rays $O_1 p_1$ and $O_2 p_2$ therefore intersect at a point $P$ in three-dimensional space. The plane determined by $O_1$, $O_2$ and $P$ is called the epipolar plane; the line $O_1 O_2$ intersects the two image frames at the points $e_1$, $e_2$, called the epipoles, and $O_1 O_2$ is called the baseline. The lines $l_1$, $l_2$, where the two image planes $I_1$, $I_2$ intersect the epipolar plane, are called the epipolar lines.
FIG. 2 is a schematic diagram of IMU integration, wherein red represents visual data and green represents IMU data.
FIG. 3 is a schematic view of the local warping, where $V_1$, $V_2$, $V_3$ denote mesh vertices and $u$ and $v$ denote the coordinates of $V_1$ in the local coordinate system defined by the other two vertices after the transformation.
Detailed Description
Aiming at the problem that the precision of traditional image stabilization methods cannot meet the requirements of future high-precision image stabilization, the invention provides a three-dimensional image stabilization method based on visual-inertial navigation information fusion. The method consists of two main parts: motion estimation and motion compensation. Motion estimation combines visual and inertial information, which effectively overcomes the poor motion-vector estimation performance of the classical SFM method and improves the spatial position of the three-dimensional feature points used for motion compensation as well as the overall motion-vector accuracy. Motion compensation pre-warps the image according to the obtained motion vector and the three-dimensional feature point cloud, and then performs further compensation by grid segmentation and three-dimensional image warping to obtain the stabilized result, thereby improving the image stabilization precision.
The main ideas of the invention are as follows: first, the camera and the inertial sensor are calibrated to obtain the distortion parameters and the intrinsic matrix of the monocular camera, and the IMU error parameters are calibrated to obtain the IMU error model used in the subsequent tightly coupled joint optimization; second, monocular visual initialization is performed to obtain sufficient three-dimensional feature points in the world coordinate system; third, the IMU data and the visual data are tightly coupled for joint optimization, yielding the camera transformation matrix and the three-dimensional space feature points; fourth, pre-warping is performed according to the obtained transformation matrix and a local grid transformation is applied to the obtained feature points to obtain the final image stabilization result.
The specific process is as follows:
(I) Calibrating the camera and the IMU (Inertial Measurement Unit), acquiring the distortion parameters and the intrinsic matrix of the monocular camera, and calibrating the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization.
The specific visual and IMU calibration can be carried out with reference to the GitHub open-source calibration toolbox Kalibr. The IMU calibration obtains the errors of the angular velocity and acceleration measurements to correct the IMU error model used for scale recovery and the final overall optimization; the visual calibration obtains the intrinsic matrix K and the distortion coefficients; joint calibration further refines these parameters and yields the transformation matrix between the IMU and the camera carrier. A monocular pinhole camera model is adopted by default. Its distortion parameters comprise the tangential distortion $p_1$, $p_2$ and the radial distortion $k_1$, $k_2$, $k_3$:
$$X = x\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2 p_1 x y + p_2\left(r^2 + 2 x^2\right)$$
$$Y = y\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1\left(r^2 + 2 y^2\right) + 2 p_2 x y \qquad (1\text{-}1)$$
where $X$ and $Y$ are the measured values in the camera normalized coordinate system, $x$ and $y$ are the actual values in the camera normalized coordinate system, and $r^2 = x^2 + y^2$; the true values $x$, $y$ after distortion correction can be obtained from equation (1-1).
(II) Performing monocular visual initialization to obtain sufficient three-dimensional feature points in the world coordinate system.
After distortion correction, the input image frames can be treated as distortion-free, i.e. the pixels of every subsequent image frame are assumed to be already corrected, and initialization can then proceed. As shown in FIG. 1, the image frames $I_1$, $I_2$ share a common three-dimensional point $P$ whose position in the world coordinate system is $[X, Y, Z]^T$. According to the monocular camera model, the projections of this point in the two images satisfy equation (1-2):
$$s_1 p_1 = K P, \qquad s_2 p_2 = K(R P + t), \qquad x_1 = K^{-1} p_1, \qquad x_2 = K^{-1} p_2 \qquad (1\text{-}2)$$
where $R$ and $t$ are the rotation matrix and displacement vector between two adjacent camera frames, and equation (1-3) can be deduced from equation (1-2):
$$x_2^{T}\,\hat{t}\,R\,x_1 = 0, \qquad E = \hat{t} R, \qquad F = K^{-T} E K^{-1}, \qquad x_2^{T} E x_1 = p_2^{T} F p_1 = 0 \qquad (1\text{-}3)$$
The essential matrix $E$ and the fundamental matrix $F$ are obtained from equation (1-3), and the rotation matrix $R$ and the translation vector $t$ can then be recovered from $E$ and $F$, for example by the eight-point or the five-point method. Triangulation is then used to recover the relative depth of each feature point; the depth is only relative because a monocular camera cannot observe the actual scale. For the joint optimization of scale, the scale information is obtained by means of the IMU pre-integration model. A further purpose of the IMU pre-integration is to obtain a linear model of the IMU integration error transfer formula for the subsequent joint optimization.
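A minimal sketch of this two-view initialization step using OpenCV follows; pts1 and pts2 are assumed to be matched, undistorted pixel coordinates from ORB matching and RANSAC screening as described for FIG. 1, and the recovered depths remain relative until the IMU provides the scale.

```python
import cv2
import numpy as np

def initialize_two_view(pts1, pts2, K):
    """pts1, pts2: (N, 2) matched feature coordinates in frames I1 and I2."""
    pts1 = np.asarray(pts1, dtype=np.float64)
    pts2 = np.asarray(pts2, dtype=np.float64)
    # Essential matrix E = t^R from the epipolar constraint, with RANSAC screening.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Decompose E into the rotation R and the (unit-norm) translation t.
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # Triangulate the matches; depths are relative until the metric scale s is known.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    X = (X_h[:3] / X_h[3]).T  # (N, 3) points in the first camera frame
    return R, t, X, mask.ravel().astype(bool)
```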
IMU pre-integration model: from the IMU calibration in step 1, the gyroscope bias $b_w$, the accelerometer bias $b_a$ and the additive noise $n_a$, $n_w$ are obtained.
To keep the IMU data acquisition module general, a 6-axis IMU is selected for data acquisition, so the accelerometer measurement $\hat{a}_t$ and the gyroscope angular-velocity measurement $\hat{\omega}_t$ are obtained and modeled as equation (1-4):
$$\hat{a}_t = a_t + b_{a_t} + R_w^{b_t} g^w + n_a, \qquad \hat{\omega}_t = \omega_t + b_{w_t} + n_w \qquad (1\text{-}4)$$
where the subscript $b$ denotes the body coordinate system, $R_w^{b_t}$ is the rotation matrix from the world coordinate system to the body coordinate system, and $g^w$ is the gravity vector in the world coordinate system.
Suppose the image frames acquired by the vision sensor over $[t_k, t_{k+1}]$ are $k$ and $k+1$, with corresponding body frames $b_k$ and $b_{k+1}$. The position, velocity and orientation values in the world coordinate system are propagated through the IMU measurements over the time interval as equations (1-5) to (1-8):
$$p_{b_{k+1}}^{w} = p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt^2 \qquad (1\text{-}5)$$
$$v_{b_{k+1}}^{w} = v_{b_k}^{w} + \int_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt \qquad (1\text{-}6)$$
$$q_{b_{k+1}}^{w} = q_{b_k}^{w} \otimes \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,q_t^{b_k}\,dt \qquad (1\text{-}7)$$
$$\Omega(\omega) = \begin{bmatrix} -[\omega]_{\times} & \omega \\ -\omega^{T} & 0 \end{bmatrix} \qquad (1\text{-}8)$$
where $\Delta t_k$ is the time interval $[t_k, t_{k+1}]$ and $R_t^{w}$ is the rotation matrix from the body coordinate system at time $t$ to the world coordinate system. As can be seen from FIG. 2, the state $[p_{b_{k+1}}^{w}, v_{b_{k+1}}^{w}, q_{b_{k+1}}^{w}]$ of the body frame $b_{k+1}$ depends on the real-time state at the previous moment $b_k$; if these quantities were used directly in the subsequent optimization, the overall computational load would increase greatly, because at every iteration of the optimization the state would have to be re-propagated through the IMU and updated.
To fuse the visual and IMU data, the position, velocity and orientation values in the world coordinate system are converted into the body coordinate system, i.e. multiplied by the rotation matrix $R_w^{b_k}$ from the world coordinate system $w$ to the body coordinate system $b_k$ at the corresponding moment, so that equations (1-5), (1-6) and (1-7) can be rewritten as:
$$R_w^{b_k} p_{b_{k+1}}^{w} = R_w^{b_k}\left(p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k - \frac{1}{2} g^{w}\Delta t_k^2\right) + \alpha_{b_{k+1}}^{b_k} \qquad (1\text{-}9)$$
$$R_w^{b_k} v_{b_{k+1}}^{w} = R_w^{b_k}\left(v_{b_k}^{w} - g^{w}\Delta t_k\right) + \beta_{b_{k+1}}^{b_k} \qquad (1\text{-}10)$$
$$q_w^{b_k} \otimes q_{b_{k+1}}^{w} = \gamma_{b_{k+1}}^{b_k} \qquad (1\text{-}11)$$
wherein:
$$\alpha_{b_{k+1}}^{b_k} = \iint_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt^2, \qquad \beta_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt, \qquad \gamma_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,\gamma_t^{b_k}\,dt \qquad (1\text{-}12)$$
Here $q$ denotes the quaternion form. The three terms can be regarded as the motion of $b_{k+1}$ relative to $b_k$, corresponding respectively to displacement, velocity and quaternion. Only the accelerometer bias $b_a$ and the gyroscope bias $b_w$ need to be considered in these three terms, and the state at the previous moment no longer affects them, so using them as optimization variables reduces the overall computational load to a certain extent. Furthermore, when the change between adjacent frames is small, the above three values can be approximated by a first-order Taylor expansion:
$$\alpha_{b_{k+1}}^{b_k} \approx \hat{\alpha}_{b_{k+1}}^{b_k} + J_{b_a}^{\alpha}\,\delta b_{a_k} + J_{b_w}^{\alpha}\,\delta b_{w_k}, \qquad \beta_{b_{k+1}}^{b_k} \approx \hat{\beta}_{b_{k+1}}^{b_k} + J_{b_a}^{\beta}\,\delta b_{a_k} + J_{b_w}^{\beta}\,\delta b_{w_k}, \qquad \gamma_{b_{k+1}}^{b_k} \approx \hat{\gamma}_{b_{k+1}}^{b_k} \otimes \begin{bmatrix} 1 \\ \frac{1}{2} J_{b_w}^{\gamma}\,\delta b_{w_k} \end{bmatrix} \qquad (1\text{-}13)$$
there are various pre-integration methods for IMU in discrete time, for example, euler integration method, RK4 integration method, median method, etc. are used, and in this embodiment, the median method is performed on the equation (1-13), and the following can be obtained:
Figure BDA0003055053410000103
at the initial moment of time, the device is started,
Figure BDA0003055053410000104
all state values are 0 and noise value na、nwThe value of (a) is 0,
Figure BDA0003055053410000105
is the unit quaternion and t is the time interval between adjacent measurements of the IMU.
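A simplified numerical sketch of the midpoint update of $\alpha$, $\beta$, $\gamma$ is given below; it drops the noise terms, holds the biases fixed over the interval, and uses SciPy rotations, so it illustrates the update structure rather than the exact formula (1-14).

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def preintegrate_midpoint(acc, gyro, dts, ba, bw):
    """Midpoint pre-integration sketch (noise dropped, biases held fixed).
    acc, gyro: (N, 3) raw IMU samples; dts: (N-1,) sample intervals; ba, bw: (3,) biases."""
    alpha = np.zeros(3)      # pre-integrated position term
    beta = np.zeros(3)       # pre-integrated velocity term
    gamma = R.identity()     # pre-integrated rotation relative to frame b_k
    for i in range(len(dts)):
        dt = dts[i]
        w_mid = 0.5 * (gyro[i] + gyro[i + 1]) - bw
        gamma_next = gamma * R.from_rotvec(w_mid * dt)
        # average of the two accelerometer samples, each rotated into frame b_k
        a_mid = 0.5 * (gamma.apply(acc[i] - ba) + gamma_next.apply(acc[i + 1] - ba))
        alpha += beta * dt + 0.5 * a_mid * dt ** 2
        beta += a_mid * dt
        gamma = gamma_next
    return alpha, beta, gamma
```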
The measurement values of the IMU pre-integration have now been obtained. To complete the whole IMU pre-integration flow for use in the nonlinear optimization, the covariance matrix of the equations must be derived and the corresponding Jacobian matrix solved. A linear Gaussian error-state recurrence equation is therefore established from the measurement values, and the error transfer function matrix is derived from equations (1-9), (1-10) and (1-11) using the covariance propagation of a linear Gaussian system; its detailed form is equation (1-15), in which $R_k$ and $R_{k+1}$ are obtained from the rotation matrix at the previous moment, the linear relationship being established by the kinematic integral equations:
[formula image (1-15): linearized error-state transition of the pre-integration terms, written blockwise in terms of sub-matrices of the state transition and of the noise mapping]
In equation (1-15), the sub-matrices are the partial derivatives of the midpoint update with respect to the error states and the noise terms (formula images).
for simplicity, the error transfer equation is abbreviated
δzk+1=Hδzk+Vn
The initial value of Jacobian is set as an identity matrix I, and the iterative formula is as follows:
Jk+1=FJk
from the error propagation model, an iterative formula of covariance can be obtained as
Pk+1=HPHT+VQVT
The covariance initial value is 0; the IMU pre-integration provides an initial value for observation for fusion initialization of subsequent information and provides a measurement item for iterative optimization. In an actual situation, in order to more accurately fuse the two pieces of information, the IMU and the camera are jointly calibrated through the step 1 to obtain distortion parameters of the IMU and the camera, and the distortion removal processing is performed on the obtained data to improve the accuracy of the data.
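The Jacobian and covariance recursions above can be sketched as follows; it is assumed here that the per-step linearized transition matrix (denoted H/F above and taken to be one and the same matrix) and the noise mapping V have already been evaluated from (1-15), and that Q is the IMU noise covariance.

```python
import numpy as np

def propagate_error_state(F_list, V_list, Q):
    """Recursion J_{k+1} = F J_k and P_{k+1} = F P_k F^T + V Q V^T,
    starting from J_0 = I and P_0 = 0."""
    n = F_list[0].shape[0]
    J = np.eye(n)            # Jacobian of the pre-integration w.r.t. the initial error state
    P = np.zeros((n, n))     # covariance of the accumulated pre-integration error
    for F, V in zip(F_list, V_list):
        J = F @ J
        P = F @ P @ F.T + V @ Q @ V.T
    return J, P
```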
Substituting equation (1-15) back into equation (1-13) yields the corresponding reference values; these are the numerical components required by the subsequent tight coupling.
[formula image (1-16): linear system relating the sliding-window velocities, the gravity vector $g^{c_0}$ and the metric scale $s$, built from the pre-integration terms and the visual poses]
In addition, to recover the actual scale of the three-dimensional space, the scale result is obtained by solving equation (1-16); this system can be decomposed by LDLT to obtain the final scale information $s$, from which the actual depths of the feature points and the poses in the solution, expressed in the world coordinate system, are recovered. The superscript $c_0$ denotes the camera coordinate system of the first camera frame, and $R$ and $p$ denote rotation matrix and translation respectively. Equation (1-16) is optimized by least squares to obtain the refined gravity vector; according to the change of gravity, the transformation from the $c_0$ coordinate system to the world coordinate system is obtained, and the variables in the initial camera coordinate system (the translation and the three-dimensional feature points recovered by triangulation in the visual epipolar-geometry step) are transferred to the world coordinate system and the scale is recovered, completing the initialization.
The initialized visual image frames are handled with a sliding-window approach, and 10 consecutive frames are used for initialization until it succeeds. After successful initialization, sufficient three-dimensional space feature points in the world coordinate system are available, and the feature points and the transformation matrix in the world coordinate system can then be solved by P3P and ICP methods, which effectively reduces the computational load.
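Once 3D feature points in the world frame are available, the frame-to-frame pose can be obtained by a PnP solve; the OpenCV-based sketch below is one possible realization (RANSAC with iterative refinement; a P3P minimal solver can be selected through the flags argument), not the exact implementation of the method.

```python
import cv2
import numpy as np

def solve_camera_pose(points_3d, points_2d, K, dist=None):
    """points_3d: (N, 3) world-frame feature points; points_2d: (N, 2) detections."""
    dist = np.zeros(5) if dist is None else dist
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, dist, reprojectionError=2.0, iterationsCount=100)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)  # world-to-camera rotation
    return R, tvec.reshape(3), inliers
```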
(III) Tightly coupling the IMU data and the visual data for joint optimization to obtain the camera transformation matrix and the optimized three-dimensional space feature points.
if only the pose calculation result is used without tight coupling optimization, errors are accumulated slowly along with the time, and all data are used as subsequent optimization production conditions, the calculated amount is excessively increased, so that the method selects a sliding window method for optimization, selects a part of frame information as key frame information and performs combined optimization with other solutions, and obtains the final optimized solution. The selection requirement of the key frame meets the large requirement of enough parallax, and the specific selection is according to actual reference. The parameter variables to be optimized can be represented by the equations (1-17)
Figure BDA0003055053410000131
Wherein xkN denotes a camera state at each time, including a position
Figure BDA0003055053410000132
Speed of rotation
Figure BDA0003055053410000133
And rotate
Figure BDA0003055053410000134
And an accelerometer b at each momentaAnd a gyroscope bgThe bias matrix of (a) is,
Figure BDA0003055053410000135
is a transformation matrix, lambda, between the camera coordinate system and the body coordinate systemnFor inverse depth values (the inverse of the depth of the feature points in the normalized camera coordinate system, the number of which is the number of landmark feature points included in the keyframe, which conforms to the gaussian distribution and can simplify the computational constraints), the objective function for state estimation optimization can be further expressed as
Figure BDA0003055053410000136
In equation (1-18), the first part represents the marginalized residual, the second part is the IMU residual, and the third part is the visual error function.
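The structure of objective (1-18) can be sketched as a weighted sum of squared residuals; the sketch below only evaluates the cost for given residual/covariance pairs, while in practice the minimization would be carried out by a nonlinear least-squares solver such as the Ceres library mentioned later.

```python
import numpy as np

def weighted_sq_norm(r, P):
    # ||r||^2_P = r^T P^{-1} r, the Mahalanobis norm used in (1-18)
    return float(r @ np.linalg.solve(P, r))

def total_cost(r_prior, H_prior, X, imu_residuals, visual_residuals):
    """imu_residuals / visual_residuals: lists of (residual, covariance) pairs."""
    cost = float(np.sum((r_prior - H_prior @ X) ** 2))                # marginalization prior
    cost += sum(weighted_sq_norm(r, P) for r, P in imu_residuals)     # IMU term
    cost += sum(weighted_sq_norm(r, P) for r, P in visual_residuals)  # visual term
    return cost
```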
The visual residual function obtains part of the optimal solution by minimizing the normalized-coordinate error of the same feature point observed at different times. For a feature point, the following geometric relationship (1-19) holds:
$$P^{c_j} = T_b^{c}\, T_w^{b_j}\, T_{b_i}^{w}\, T_c^{b}\, P^{c_i} \qquad (1\text{-}19)$$
where $T$ denotes a transformation matrix: the camera coordinate system at moment $i$ is converted to the inertial (body) coordinate system and then to the world coordinate system, after which the world coordinate system is converted to the inertial coordinate system at moment $j$ and then to the camera coordinate system, i.e. four coordinate-system conversions in total. From equation (1-19) the relation (1-20) is obtained:
$$P^{c_j} = R_b^{c}\left(R_w^{b_j}\left(R_{b_i}^{w}\left(R_c^{b} P^{c_i} + p_c^{b}\right) + p_{b_i}^{w}\right) + p_w^{b_j}\right) + p_b^{c} \qquad (1\text{-}20)$$
where $R$ and $p$ are the rotation and translation obtained by decomposing the transformation matrices. For the pinhole camera, the visual residual formula is
$$r_{\mathcal{C}}\left(\hat{z}_l^{c_j}, \mathcal{X}\right) = \begin{bmatrix} x^{c_j}/z^{c_j} - \hat{u}_l^{c_j} \\ y^{c_j}/z^{c_j} - \hat{v}_l^{c_j} \end{bmatrix}, \qquad \left[x^{c_j}, y^{c_j}, z^{c_j}\right]^T = P^{c_j} \qquad (1\text{-}21)$$
Combining this with equation (1-20), the minimized visual residual is obtained.
The second part is the error of the IMU component. According to the pre-integration formulas (1-9), (1-10) and (1-11), the IMU residual (1-22) is
$$r_{\mathcal{B}}\left(\hat{z}_{b_{k+1}}^{b_k}, \mathcal{X}\right) = \begin{bmatrix} R_w^{b_k}\left(p_{b_{k+1}}^{w} - p_{b_k}^{w} - v_{b_k}^{w}\Delta t_k + \frac{1}{2} g^{w}\Delta t_k^{2}\right) - \hat{\alpha}_{b_{k+1}}^{b_k} \\ R_w^{b_k}\left(v_{b_{k+1}}^{w} - v_{b_k}^{w} + g^{w}\Delta t_k\right) - \hat{\beta}_{b_{k+1}}^{b_k} \\ 2\left[\left(\hat{\gamma}_{b_{k+1}}^{b_k}\right)^{-1} \otimes q_w^{b_k} \otimes q_{b_{k+1}}^{w}\right]_{xyz} \\ b_{a_{k+1}} - b_{a_k} \\ b_{w_{k+1}} - b_{w_k} \end{bmatrix} \qquad (1\text{-}22)$$
The first part is the marginalized residual. Marginalization is introduced mainly as a trade-off between computation speed and performance. If only two adjacent image frames were considered when optimizing the pose, the algorithm would run fast but errors would accumulate quickly; on the other hand, optimizing over the information of all frames at every frame is highly redundant and computationally expensive. To keep the computational load roughly constant, a sliding window of fixed length is used for the nonlinear optimization and marginalization is introduced. Its purpose is to stop computing the pose of a frame and its associated landmark points while retaining the constraint that frame imposes on the window, thereby accelerating the computation while preserving the sparsity of the optimization matrix. Marginalization distinguishes two cases: if a new reference frame is added to the sliding window and the last frame in the current window is a keyframe, the pose of that frame is moved out and the visual and inertial data related to it are marginalized and used as prior values; when the penultimate frame is not a keyframe, the visually related portion is marginalized while the relevant IMU constraints are preserved. Non-key information is eliminated by Schur complement elimination, the matrix dimensions are expanded according to the quantities to be optimized, and the overall optimization is finally solved with the open-source Ceres library, yielding pose information with smaller errors and the feature-point positions in the world coordinate system. This part finally provides the inter-frame transformation matrices and the three-dimensional space point information for the subsequent weak-constraint transformation.
(IV) Processing and compensating the acquired three-dimensional motion vectors and spatial poses, i.e. pre-warping according to the acquired transformation matrix and performing a local grid transformation on the acquired feature points, to obtain the final image stabilization result.
Before this step, motion filtering is applied to the obtained inter-frame motion vectors following the steps of the classical image stabilization method, pre-warping is performed, and a first compensation is applied according to the global motion result; a second motion compensation is then performed using weak-constraint compensation, producing the stabilized image after the compensating transformation. In the second motion compensation, the information used is mainly the previously acquired three-dimensional space information, which is projected into the stabilized frame; the projection error of the corresponding points is observed and weak-constraint compensation is performed. The core idea of the weak-constraint compensation comprises the following two parts:
(1) Correcting the feature points in the image with strong constraints may deform the image completely, produce temporal incoherence, and distort the edge regions after the transformation. Using the computed feature points as weak constraints instead allows the distortion to be distributed to other flat regions of the image; the method therefore adopts a weak-constraint approach, so that these disturbances are visually negligible while keeping the image as undistorted as possible.
(2) To preserve the content and detail of the image while it is deformed, additional constraints are needed so that the deformation conforms to a similarity transformation. In regions where feature points are sparse, the constraints can be relaxed so that the image distortion is distributed to visually inconspicuous areas. To this end, the input video frame image $\hat{I}$ is divided uniformly into $n \times m$ parts; each part is called a grid cell, and the feature points lie within the cells.
The position of each obtained feature point projected into the image is denoted $\hat{P}_k$. In the algorithm, each feature point is represented by the four vertices of the grid cell containing it via bilinear interpolation. Let $\hat{V}_k$ be the vector formed by the four vertices of that cell in the input image and $V_k$ the corresponding four cell vertices in the output image. The vector $\omega_k$ contains the bilinear interpolation coefficients, which sum to 1, so that $\hat{P}_k = \omega_k^{T}\hat{V}_k$. The data term can then be expressed as a least-squares problem:
$$E_d(V) = \sum_k \left\|\omega_k^{T} V_k - P_k\right\|^2 \qquad (1\text{-}23)$$
where $V_k$ are the four unknown vertices and $P_k$ is the feature-point position obtained by the nonlinear optimization. The purpose of this data term is to minimize the distance between the feature point $P_k$ in the output image and the position interpolated from the output grid vertices with the coefficients of the original-image feature point $\hat{P}_k$.
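A sketch of the bilinear representation and of a data term of the form (1-23) follows; the cell-vertex ordering, the variable names and the per-feature cell origins are illustrative assumptions.

```python
import numpy as np

def bilinear_weights(p, cell_origin, cell_size):
    """Bilinear coefficients of point p w.r.t. its cell vertices
    (ordered TL, TR, BL, BR); they sum to 1."""
    u, v = (np.asarray(p, dtype=float) - np.asarray(cell_origin, dtype=float)) / cell_size
    return np.array([(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v])

def data_term(cells_out, feats_in, feats_target, cell_origins, cell_size):
    """E_d = sum_k || w_k^T V_k - P_k ||^2 over all features.
    cells_out: (N, 4, 2) output vertices of the cell containing each feature."""
    cost = 0.0
    for Vk, p_in, p_tgt, origin in zip(cells_out, feats_in, feats_target, cell_origins):
        w = bilinear_weights(p_in, origin, cell_size)   # weights from the input feature
        cost += float(np.sum((w @ Vk - np.asarray(p_tgt)) ** 2))
    return cost
```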
The similarity transformation term measures the deviation of each grid cell in the output image from a similarity transformation of the corresponding input cell. Each cell is divided into two triangles to build the model; within a triangle, the vertex $V_1$ is represented in the local coordinate system of the other two vertices:
$$V_1 = V_2 + u\left(V_3 - V_2\right) + v\,R_{90}\left(V_3 - V_2\right) \qquad (1\text{-}24)$$
where $R_{90}$ is a 90-degree rotation matrix, and $u$ and $v$ are known values computed in the original image; for a square grid cell $u = 0$ and $v = 1$. In the output image the similarity relation is generally no longer satisfied and $V_1$ deviates from the ideal position: as shown in FIG. 3, the transformed ideal position is shown by the dotted line and $V_1$ is the actual position. A similarity transformation term is therefore used so that the distance between the transformed position and the ideal position is minimized:
$$E_s(V_1) = \omega_s\left\|V_1 - \left(V_2 + u\left(V_3 - V_2\right) + v\,R_{90}\left(V_3 - V_2\right)\right)\right\|^2 \qquad (1\text{-}25)$$
where the weight $\omega_s$ used in computing the optimal solution within each cell is set to the two-norm of the color variance; the variance of each cell can then be regarded as the sum of the similarity transformations of its two triangles, giving 8 terms in total. When equations (1-23) and (1-25) reach their minimum simultaneously, the ideal optimal coordinates of the output vertices of the grid are obtained; a standard texture-mapping algorithm is then used to render the further-stabilized output image, which is cropped to remove the black borders, yielding the final stabilized image frame.
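A sketch of the similarity term (1-25) for one triangle follows; the sign convention of $R_{90}$ and the weight handling are assumptions, with $u$, $v$ computed once from the undeformed input mesh as in (1-24).

```python
import numpy as np

R90 = np.array([[0.0, 1.0], [-1.0, 0.0]])  # assumed 90-degree rotation convention

def local_coords(V1, V2, V3):
    """Recover u, v from the input mesh triangle so that
    V1 = V2 + u*(V3 - V2) + v*R90 @ (V3 - V2)."""
    A = np.column_stack([V3 - V2, R90 @ (V3 - V2)])
    u, v = np.linalg.solve(A, V1 - V2)
    return u, v

def similarity_term(V1, V2, V3, u, v, w_s=1.0):
    """E_s(V1) = w_s * || V1 - (V2 + u*(V3-V2) + v*R90@(V3-V2)) ||^2 on the output mesh."""
    ideal = V2 + u * (V3 - V2) + v * (R90 @ (V3 - V2))
    return w_s * float(np.sum((V1 - ideal) ** 2))
```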
Considering possible future real-time requirements, the method slightly relaxes the visual-quality requirement and tolerates a certain amount of temporal incoherence within an acceptable range; the feature points that appear continuously in the previous keyframe and in the data frames are preferentially selected as the feature points for image stabilization, so as to keep the algorithm as consistent as possible.
The above steps are then repeated continuously to obtain a stable output image sequence.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims (7)

1. A three-dimensional software image stabilization method based on visual-inertial information fusion, characterized by comprising the following steps:
Step 1: calibrating the camera and the IMU (Inertial Measurement Unit), acquiring the distortion parameters and the intrinsic matrix of the monocular camera, and calibrating the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization;
Step 2: performing monocular visual initialization to obtain sufficient three-dimensional feature points in the world coordinate system;
Step 3: tightly coupling the IMU data and the visual data for joint optimization, and acquiring the camera transformation matrix and the optimized three-dimensional space feature points;
Step 4: pre-warping according to the obtained transformation matrix, and applying a local grid transformation to the obtained feature points to obtain the final image stabilization result.
2. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 1, the camera and the IMU are calibrated with reference to the GitHub open-source calibration toolbox Kalibr.
3. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 1, the errors of the angular velocity and acceleration measurements are obtained through IMU calibration to correct the IMU error model; the camera intrinsic matrix K and the distortion coefficients are obtained through visual calibration; the parameters are further refined through joint calibration, which also yields the transformation matrix between the IMU and the camera carrier.
4. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 2, monocular visual initialization proceeds as follows:
For the image frames $I_1$ and $I_2$ obtained by the monocular camera, a common three-dimensional point $P$ in the world coordinate system has position $[X, Y, Z]^T$; the projections of this point in the two image frames satisfy
$$s_1 p_1 = K P, \qquad s_2 p_2 = K(R P + t)$$
$$x_1 = K^{-1} p_1, \qquad x_2 = K^{-1} p_2$$
where $R$ and $t$ are the rotation matrix and displacement vector between the two adjacent camera frames. From these relations the epipolar constraint
$$x_2^{T}\,\hat{t}\,R\,x_1 = 0, \qquad p_2^{T} K^{-T}\,\hat{t}\,R\,K^{-1} p_1 = 0$$
$$E = \hat{t} R, \qquad F = K^{-T} E K^{-1}, \qquad x_2^{T} E x_1 = p_2^{T} F p_1 = 0$$
is obtained, giving the essential matrix $E$ and the fundamental matrix $F$; the rotation matrix $R$ and translation vector $t$ are then recovered from $E$ and $F$, and triangulation is used to recover the relative depth of each feature point. The scale information of each feature point is then obtained using the IMU pre-integration model.
5. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 4, characterized in that the IMU pre-integration model is as follows:
The gyroscope bias $b_w$, the accelerometer bias $b_a$ and the additive noise $n_a$, $n_w$ are obtained from the IMU calibration of step 1. Using a 6-axis IMU, the accelerometer measurement and the gyroscope angular-velocity measurement are modeled as
$$\hat{a}_t = a_t + b_{a_t} + R_w^{b_t} g^w + n_a, \qquad \hat{\omega}_t = \omega_t + b_{w_t} + n_w$$
where the subscript $b$ denotes the body coordinate system, $R_w^{b_t}$ is the rotation matrix from the world coordinate system to the body coordinate system, and $g^w$ is the gravity vector in the world coordinate system. If the image frames acquired by the camera over $[t_k, t_{k+1}]$ are $k$ and $k+1$, with corresponding body frames $b_k$ and $b_{k+1}$, then the propagation of the position, velocity and orientation in the world coordinate system through the IMU measurements over the time interval is
$$p_{b_{k+1}}^{w} = p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt^2$$
$$v_{b_{k+1}}^{w} = v_{b_k}^{w} + \int_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt$$
$$q_{b_{k+1}}^{w} = q_{b_k}^{w} \otimes \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,q_t^{b_k}\,dt$$
where $\Delta t_k$ is the time interval $[t_k, t_{k+1}]$ and $R_t^{w}$ is the rotation matrix from the body coordinate system to the world coordinate system. The position, velocity and orientation values are then converted into the body coordinate system:
$$R_w^{b_k} p_{b_{k+1}}^{w} = R_w^{b_k}\left(p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k - \frac{1}{2} g^{w}\Delta t_k^2\right) + \alpha_{b_{k+1}}^{b_k}$$
$$R_w^{b_k} v_{b_{k+1}}^{w} = R_w^{b_k}\left(v_{b_k}^{w} - g^{w}\Delta t_k\right) + \beta_{b_{k+1}}^{b_k}$$
$$q_w^{b_k} \otimes q_{b_{k+1}}^{w} = \gamma_{b_{k+1}}^{b_k}$$
where the pre-integration terms are
$$\alpha_{b_{k+1}}^{b_k} = \iint_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt^2, \qquad \beta_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt, \qquad \gamma_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,\gamma_t^{b_k}\,dt$$
A first-order Taylor expansion of these terms, followed by midpoint (median) integration, gives the discrete pre-integration measurements, and the linearized error transfer function matrix of the form $\delta z_{k+1} = H\,\delta z_k + V n$ is obtained; the corresponding parameters are obtained from this error transfer function matrix. The metric scale $s$ is then obtained from the linear system relating the sliding-window velocities, the gravity vector $g^{c_0}$ and the scale, so as to recover the actual depth of each feature point and its position in the world coordinate system, where the superscript $c_0$ denotes the camera coordinate system of the first camera frame.
6. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 3, when the IMU data and the visual data are tightly coupled for joint optimization, a sliding window method is adopted, and a subset of frame information is selected as keyframe information to be jointly optimized with the remaining states.
7. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 6, characterized in that the optimization process using the sliding window method in step 3 is as follows:
The parameter variables to be optimized are written as
$$\mathcal{X} = \left[x_0, x_1, \ldots, x_n,\ x_c^b,\ \lambda_0, \lambda_1, \ldots, \lambda_m\right], \qquad x_k = \left[p_{b_k}^{w}, v_{b_k}^{w}, q_{b_k}^{w}, b_a, b_g\right],\ k \in [0, n], \qquad x_c^b = \left[p_c^b, q_c^b\right]$$
where $x_k$, $k \in [0, n]$ denotes the camera state at each time, including the position $p_{b_k}^{w}$, velocity $v_{b_k}^{w}$ and rotation $q_{b_k}^{w}$, together with the bias matrices $b_a$, $b_g$ of the accelerometer and gyroscope at each moment; $x_c^b$ is the transformation between the camera coordinate system and the body coordinate system, and $\lambda_n$ is the inverse depth value. The objective function
$$\min_{\mathcal{X}}\left\{\left\|r_p - H_p\mathcal{X}\right\|^2 + \sum_{k\in\mathcal{B}}\left\|r_{\mathcal{B}}\left(\hat{z}_{b_{k+1}}^{b_k}, \mathcal{X}\right)\right\|_{P_{b_{k+1}}^{b_k}}^{2} + \sum_{(l,j)\in\mathcal{C}}\left\|r_{\mathcal{C}}\left(\hat{z}_l^{c_j}, \mathcal{X}\right)\right\|_{P_l^{c_j}}^{2}\right\}$$
is solved and optimized, where the first part of the objective function represents the marginalized residual, the second part is the IMU residual, and the third part is the visual error function.
CN202110497661.XA 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion Active CN113240597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110497661.XA CN113240597B (en) 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110497661.XA CN113240597B (en) 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion

Publications (2)

Publication Number Publication Date
CN113240597A true CN113240597A (en) 2021-08-10
CN113240597B CN113240597B (en) 2024-04-26

Family

ID=77132324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110497661.XA Active CN113240597B (en) 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion

Country Status (1)

Country Link
CN (1) CN113240597B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020087846A1 (en) * 2018-10-31 2020-05-07 东南大学 Navigation method based on iteratively extended kalman filter fusion inertia and monocular vision
CN110030994A (en) * 2019-03-21 2019-07-19 东南大学 A kind of robustness vision inertia close coupling localization method based on monocular
CN110345944A (en) * 2019-05-27 2019-10-18 浙江工业大学 Merge the robot localization method of visual signature and IMU information
CN110498039A (en) * 2019-08-05 2019-11-26 北京科技大学 A kind of intelligent monitor system based on bionic flapping-wing flying vehicle
WO2021027323A1 (en) * 2019-08-14 2021-02-18 北京理工大学 Hybrid image stabilization method and device based on bionic eye platform
CN112749665A (en) * 2021-01-15 2021-05-04 东南大学 Visual inertia SLAM method based on image edge characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程传奇; 郝向阳; 李建胜; 刘智伟; 胡鹏: "Monocular vision/inertial integrated navigation algorithm based on nonlinear optimization" [基于非线性优化的单目视觉/惯性组合导航算法], Journal of Chinese Inertial Technology (中国惯性技术学报), no. 05 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147325A (en) * 2022-09-05 2022-10-04 深圳清瑞博源智能科技有限公司 Image fusion method, device, equipment and storage medium
CN116208855A (en) * 2023-04-28 2023-06-02 杭州未名信科科技有限公司 Multi-tower crane cradle head panoramic image jitter coordination inhibition method and system
CN116208855B (en) * 2023-04-28 2023-09-01 杭州未名信科科技有限公司 Multi-tower crane cradle head panoramic image jitter coordination inhibition method and system

Also Published As

Publication number Publication date
CN113240597B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN107255476B (en) Indoor positioning method and device based on inertial data and visual features
CN110068335B (en) Unmanned aerial vehicle cluster real-time positioning method and system under GPS rejection environment
CN104732518B (en) A kind of PTAM improved methods based on intelligent robot terrain surface specifications
CN112304307A (en) Positioning method and device based on multi-sensor fusion and storage medium
WO2024045632A1 (en) Binocular vision and imu-based underwater scene three-dimensional reconstruction method, and device
CN112102458A (en) Single-lens three-dimensional image reconstruction method based on laser radar point cloud data assistance
CN110349249B (en) Real-time dense reconstruction method and system based on RGB-D data
CN112669354B (en) Multi-camera motion state estimation method based on incomplete constraint of vehicle
CN113240597A (en) Three-dimensional software image stabilization method based on visual inertial information fusion
JP2023505891A (en) Methods for measuring environmental topography
CN110942477A (en) Method for depth map fusion by using binocular camera and laser radar
CN114964276A (en) Dynamic vision SLAM method fusing inertial navigation
CN114998556B (en) Virtual-real fusion method for mixed reality flight simulation system
CN113345032B (en) Initialization map building method and system based on wide-angle camera large distortion map
CN113211433B (en) Separated visual servo control method based on composite characteristics
Liu et al. Dense stereo matching strategy for oblique images that considers the plane directions in urban areas
Wang et al. A novel binocular vision system for accurate 3-D reconstruction in large-scale scene based on improved calibration and stereo matching methods
CN112767481B (en) High-precision positioning and mapping method based on visual edge features
CN117115271A (en) Binocular camera external parameter self-calibration method and system in unmanned aerial vehicle flight process
CN116182855B (en) Combined navigation method of compound eye-simulated polarized vision unmanned aerial vehicle under weak light and strong environment
CN111145267B (en) 360-degree panoramic view multi-camera calibration method based on IMU assistance
CN114485648B (en) Navigation positioning method based on bionic compound eye inertial system
CN112837409B (en) Method for reconstructing three-dimensional human body by using mirror
KR102107465B1 (en) System and method for generating epipolar images by using direction cosine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant