CN113240597A - Three-dimensional software image stabilization method based on visual inertial information fusion - Google Patents

Three-dimensional software image stabilization method based on visual inertial information fusion

Info

Publication number
CN113240597A
Authority
CN
China
Prior art keywords
imu
coordinate system
matrix
camera
image stabilization
Prior art date
Legal status
Granted
Application number
CN202110497661.XA
Other languages
Chinese (zh)
Other versions
CN113240597B (en)
Inventor
唐成凯
李冠林
张怡
张玲玲
王晨
程卓
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202110497661.XA
Publication of CN113240597A
Application granted
Publication of CN113240597B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C 21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 11/00 Systems for determining distance or velocity not using reflection or reradiation
    • G01S 11/12 Systems for determining distance or velocity not using reflection or reradiation using electromagnetic waves other than radio waves
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Electromagnetism (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a three-dimensional software image stabilization method based on visual-inertial information fusion. First, the camera and the inertial sensor are calibrated to obtain the distortion parameters and the intrinsic matrix of the monocular camera, and the IMU error parameters are calibrated to obtain the IMU error model used in the subsequent tightly coupled joint optimization. Second, monocular visual initialization is performed to obtain sufficient three-dimensional feature points in the world coordinate system. Third, the IMU data and the visual data are tightly coupled for joint optimization, yielding the camera transformation matrix and the three-dimensional space feature points. Fourth, pre-warping is performed according to the obtained transformation matrix, and a local grid transformation is applied to the obtained feature points to produce the final stabilized image. The invention effectively overcomes the poor motion-vector estimation performance of the classical SFM method and improves both the spatial position of the motion-compensated three-dimensional feature points and the overall motion-vector accuracy.

Description

Three-dimensional software image stabilization method based on visual inertial information fusion
Technical Field
The invention relates to a visual three-dimensional software image stabilization method that incorporates inertial components, and in particular to a software image stabilization (Software Stabilization) method based on the theory of visual-inertial information fusion (Visual Inertial Information Fusion).
Background
The rapid development of the electronic information age has driven technology forward, and unmanned technology is gradually extending from ground vehicles to the air. Unmanned systems play an increasingly important role in military and civilian areas such as urban transportation and battlefield operations. In the development of unmanned systems, the design of the sensing, navigation and control systems is of paramount importance. In visual navigation, the unmanned system extracts abundant navigation information in real time from the environment images acquired by the camera, giving it the ability to interact with the environment and improving the automation and intelligence of its control. However, in some cases the camera carrier produces undesirable random jitter; the video and images it obtains then suffer from unstable and blurred pictures, degrading video quality and observation effectiveness. To preserve the usefulness of the obtained video sequence, anti-shake processing, i.e. image stabilization, must be applied to the moving camera platform.
Image stabilization systems can be classified by stabilization mode into mechanical, optical and digital image stabilization. A mechanical image stabilization platform is bulky and of low precision in practical use, and is easily disturbed by external forces such as friction and wind resistance, so it is not suitable as the main means of image stabilization for a small rotary-wing unmanned aerial vehicle. An optical image stabilizer achieves good stabilization and can stabilize the image sequence directly by optical means, but the variable optical wedge must be achromatized because of the secondary spectrum, which makes the structure and manufacturing process overly complex; moreover, compensation is performed only passively by a prism, mirror or optical wedge, which greatly limits its stabilization capability. Digital Video Stabilization (DVS), also called electronic image stabilization, builds on computer vision, image processing, signal processing and related disciplines: the stabilization function is realized mainly by an image stabilization algorithm implemented in software on a computer, without support from other systems, and since the software code can be ported, a DVS system is easy to maintain and update, has low cost and good precision. Digital video stabilization is an emerging technology of recent decades that produces a stable video sequence mainly by eliminating or reducing video jitter. According to current research models, image stabilization methods can be divided into 2D, 2.5D and 3D methods. The 2D methods have a simple algorithmic principle and a small computational load, but the stabilization result looks unnatural because depth of field is not modeled, and performance is limited. The 2.5D algorithms compromise between the 2D and 3D algorithms: their complexity is higher, the stabilization effect is good and parallax can be handled, but they require feature points that appear continuously over sufficiently many frames to form trajectories and stabilize the image under certain constraints, so they are difficult to use for real-time stabilization. The 3D algorithms have the highest complexity and the best performance, but their stabilization effect is limited by the performance of Structure from Motion (SFM), and it is difficult to obtain a sufficiently good stabilization result.
Disclosure of Invention
The invention aims to provide a three-dimensional software image stabilization method based on visual-inertial information fusion. The method can be combined with a visual navigation method, reduces the computational load of the algorithm, achieves good image stabilization performance, and effectively improves the quality of the obtained image sequence.
The three-dimensional software image stabilization method based on visual-inertial information fusion provided by the invention combines the high precision of the visual-inertial method with the relatively natural appearance of grid warping, and can effectively overcome the precision bottleneck of conventional image stabilization algorithms.
The technical scheme of the invention is as follows:
The three-dimensional software image stabilization method based on visual-inertial information fusion comprises the following steps:
Step 1: calibrating the camera and the IMU (Inertial Measurement Unit), acquiring the distortion parameters and the intrinsic matrix of the monocular camera, and calibrating the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization;
Step 2: performing monocular visual initialization to obtain sufficient three-dimensional feature points in the world coordinate system;
Step 3: tightly coupling the IMU data and the visual data for joint optimization, and acquiring the camera transformation matrix and the optimized three-dimensional space feature points;
Step 4: pre-warping according to the obtained transformation matrix, and applying a local grid transformation to the obtained feature points to obtain the final image stabilization result.
Further, in step 1, the camera and the IMU are calibrated with reference to the GitHub open-source calibration toolbox Kalibr.
Further, in step 1, the errors of the angular velocity and acceleration measurements are obtained through IMU calibration to correct the IMU error model; the camera intrinsic matrix K and the distortion coefficients are obtained through visual calibration; the parameters are further refined through joint calibration, which also yields the transformation matrix between the IMU and the camera carrier.
Further, in step 2, monocular visual initialization proceeds as follows:
For the image frames $I_1$ and $I_2$ obtained by the monocular camera, a common three-dimensional point $P$ in the world coordinate system has position $[X, Y, Z]^T$; the projections of this point in the two image frames satisfy
$$s_1 p_1 = K P, \qquad s_2 p_2 = K(R P + t)$$
$$x_1 = K^{-1} p_1, \qquad x_2 = K^{-1} p_2$$
where $R$ and $t$ are the rotation matrix and displacement vector between the two adjacent camera frames. From these relations the epipolar constraint
$$x_2^{T}\,\hat{t}\,R\,x_1 = 0, \qquad p_2^{T} K^{-T}\,\hat{t}\,R\,K^{-1} p_1 = 0$$
$$E = \hat{t} R, \qquad F = K^{-T} E K^{-1}, \qquad x_2^{T} E x_1 = p_2^{T} F p_1 = 0$$
is obtained, giving the essential matrix $E$ and the fundamental matrix $F$; the rotation matrix $R$ and translation vector $t$ are then recovered from $E$ and $F$, and triangulation is used to recover the relative depth of each feature point. The scale information of each feature point is then obtained using the IMU pre-integration model.
Further, the IMU pre-integration model is as follows:
The gyroscope bias $b_w$, the accelerometer bias $b_a$ and the additive noise $n_a$, $n_w$ are obtained from the IMU calibration of step 1. Using a 6-axis IMU, the accelerometer measurement and the gyroscope angular-velocity measurement are modeled as
$$\hat{a}_t = a_t + b_{a_t} + R_w^{b_t} g^w + n_a, \qquad \hat{\omega}_t = \omega_t + b_{w_t} + n_w$$
where the subscript $b$ denotes the body coordinate system, $R_w^{b_t}$ is the rotation matrix from the world coordinate system to the body coordinate system, and $g^w$ is the gravity vector in the world coordinate system. If the image frames acquired by the camera over $[t_k, t_{k+1}]$ are $k$ and $k+1$, with corresponding body frames $b_k$ and $b_{k+1}$, then the propagation of the position, velocity and orientation in the world coordinate system through the IMU measurements over the time interval is
$$p_{b_{k+1}}^{w} = p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt^2$$
$$v_{b_{k+1}}^{w} = v_{b_k}^{w} + \int_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt$$
$$q_{b_{k+1}}^{w} = q_{b_k}^{w} \otimes \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,q_t^{b_k}\,dt$$
where $\Delta t_k$ is the time interval $[t_k, t_{k+1}]$ and $R_t^{w}$ is the rotation matrix from the body coordinate system to the world coordinate system. The position, velocity and orientation values are then converted into the body coordinate system:
$$R_w^{b_k} p_{b_{k+1}}^{w} = R_w^{b_k}\left(p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k - \frac{1}{2} g^{w}\Delta t_k^2\right) + \alpha_{b_{k+1}}^{b_k}$$
$$R_w^{b_k} v_{b_{k+1}}^{w} = R_w^{b_k}\left(v_{b_k}^{w} - g^{w}\Delta t_k\right) + \beta_{b_{k+1}}^{b_k}$$
$$q_w^{b_k} \otimes q_{b_{k+1}}^{w} = \gamma_{b_{k+1}}^{b_k}$$
where the pre-integration terms are
$$\alpha_{b_{k+1}}^{b_k} = \iint_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt^2, \qquad \beta_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt, \qquad \gamma_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,\gamma_t^{b_k}\,dt$$
A first-order Taylor expansion of these terms, followed by midpoint (median) integration, gives the discrete pre-integration measurements, and the linearized error transfer function matrix of the form $\delta z_{k+1} = H\,\delta z_k + V n$ is obtained; the corresponding parameters are obtained from this error transfer function matrix. The metric scale $s$ is then obtained from the linear system relating the sliding-window velocities, the gravity vector $g^{c_0}$ and the scale, so as to recover the actual depth of each feature point and its position in the world coordinate system, where the superscript $c_0$ denotes the camera coordinate system of the first camera frame.
Further, in step 3, when the IMU data and the visual data are tightly coupled for joint optimization, a sliding window method is adopted, and a subset of frame information is selected as keyframe information to be jointly optimized with the remaining states.
Further, the optimization process using the sliding window method in step 3 is as follows:
The parameter variables to be optimized are written as
$$\mathcal{X} = \left[x_0, x_1, \ldots, x_n,\ x_c^b,\ \lambda_0, \lambda_1, \ldots, \lambda_m\right], \qquad x_k = \left[p_{b_k}^{w}, v_{b_k}^{w}, q_{b_k}^{w}, b_a, b_g\right],\ k \in [0, n], \qquad x_c^b = \left[p_c^b, q_c^b\right]$$
where $x_k$, $k \in [0, n]$ denotes the camera state at each time, including the position $p_{b_k}^{w}$, velocity $v_{b_k}^{w}$ and rotation $q_{b_k}^{w}$, together with the bias matrices $b_a$, $b_g$ of the accelerometer and gyroscope at each moment; $x_c^b$ is the transformation between the camera coordinate system and the body coordinate system, and $\lambda_n$ is the inverse depth value. The objective function
$$\min_{\mathcal{X}}\left\{\left\|r_p - H_p\mathcal{X}\right\|^2 + \sum_{k\in\mathcal{B}}\left\|r_{\mathcal{B}}\left(\hat{z}_{b_{k+1}}^{b_k}, \mathcal{X}\right)\right\|_{P_{b_{k+1}}^{b_k}}^{2} + \sum_{(l,j)\in\mathcal{C}}\left\|r_{\mathcal{C}}\left(\hat{z}_l^{c_j}, \mathcal{X}\right)\right\|_{P_l^{c_j}}^{2}\right\}$$
is solved and optimized, where the first part of the objective function represents the marginalized residual, the second part is the IMU residual, and the third part is the visual error function.
Advantageous effects
The invention establishes a monocular-vision-based motion estimation model through the visual-inertial fusion method, thereby improving the precision of motion estimation. A tightly coupled visual-inertial navigation fusion step is added on top of the traditional SFM motion estimation method, which improves the overall accuracy of the three-dimensional information obtained by the algorithm.
The pre-warping and grid-based local warping adopted by the invention effectively address the parallax problem through motion compensation, building on the traditional image stabilization approach, so that the stabilized result is more natural and the visual quality of the image stabilization algorithm is effectively improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic view of the epipolar geometry, with adjacent image frames $I_1$, $I_2$. Let the motion from image frame $I_1$ to $I_2$ be $R$, $t$, and let the camera centers of the two frames be $O_1$, $O_2$. Suppose image frame $I_1$ contains the feature point $p_1$ and image frame $I_2$ the corresponding feature point $p_2$; the two are matched by the ORB feature-point method and screened by RANSAC, and the resulting pair can be regarded as projections of the same point onto the two image planes. The rays $O_1 p_1$ and $O_2 p_2$ therefore intersect at a point $P$ in three-dimensional space. The plane determined by $O_1$, $O_2$ and $P$ is called the epipolar plane; the line $O_1 O_2$ intersects the two image frames at the points $e_1$, $e_2$, called the epipoles, and $O_1 O_2$ is called the baseline. The lines $l_1$, $l_2$, where the two image planes $I_1$, $I_2$ intersect the epipolar plane, are called the epipolar lines.
FIG. 2 is a schematic diagram of IMU integration, wherein red represents visual data and green represents IMU data.
FIG. 3 is a schematic view of the local warping, where $V_1$, $V_2$, $V_3$ denote mesh vertices and $u$ and $v$ denote the coordinates of $V_1$ in the local coordinate system defined by the other two vertices after the transformation.
Detailed Description
Aiming at the problem that the precision of traditional image stabilization methods cannot meet the requirements of future high-precision image stabilization, the invention provides a three-dimensional image stabilization method based on visual-inertial navigation information fusion. The method consists of two main parts: motion estimation and motion compensation. Motion estimation combines visual and inertial information, which effectively overcomes the poor motion-vector estimation performance of the classical SFM method and improves the spatial position of the three-dimensional feature points used for motion compensation as well as the overall motion-vector accuracy. Motion compensation pre-warps the image according to the obtained motion vector and the three-dimensional feature point cloud, and then performs further compensation by grid segmentation and three-dimensional image warping to obtain the stabilized result, thereby improving the image stabilization precision.
The main ideas of the invention are as follows: first, the camera and the inertial sensor are calibrated to obtain the distortion parameters and the intrinsic matrix of the monocular camera, and the IMU error parameters are calibrated to obtain the IMU error model used in the subsequent tightly coupled joint optimization; second, monocular visual initialization is performed to obtain sufficient three-dimensional feature points in the world coordinate system; third, the IMU data and the visual data are tightly coupled for joint optimization, yielding the camera transformation matrix and the three-dimensional space feature points; fourth, pre-warping is performed according to the obtained transformation matrix and a local grid transformation is applied to the obtained feature points to obtain the final image stabilization result.
The specific process is as follows:
(I) Calibrating the camera and the IMU (Inertial Measurement Unit), acquiring the distortion parameters and the intrinsic matrix of the monocular camera, and calibrating the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization.
The specific visual and IMU calibration can be carried out with reference to the GitHub open-source calibration toolbox Kalibr. The IMU calibration obtains the errors of the angular velocity and acceleration measurements to correct the IMU error model used for scale recovery and the final overall optimization; the visual calibration obtains the intrinsic matrix K and the distortion coefficients; joint calibration further refines these parameters and yields the transformation matrix between the IMU and the camera carrier. A monocular pinhole camera model is adopted by default. Its distortion parameters comprise the tangential distortion $p_1$, $p_2$ and the radial distortion $k_1$, $k_2$, $k_3$:
$$X = x\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2 p_1 x y + p_2\left(r^2 + 2 x^2\right)$$
$$Y = y\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1\left(r^2 + 2 y^2\right) + 2 p_2 x y \qquad (1\text{-}1)$$
where $X$ and $Y$ are the measured values in the camera normalized coordinate system, $x$ and $y$ are the actual values in the camera normalized coordinate system, and $r^2 = x^2 + y^2$; the true values $x$, $y$ after distortion correction can be obtained from equation (1-1).
(II) Performing monocular visual initialization to obtain sufficient three-dimensional feature points in the world coordinate system.
After distortion correction, the input image frames can be treated as distortion-free, i.e. the pixels of every subsequent image frame are assumed to be already corrected, and initialization can then proceed. As shown in FIG. 1, the image frames $I_1$, $I_2$ share a common three-dimensional point $P$ whose position in the world coordinate system is $[X, Y, Z]^T$. According to the monocular camera model, the projections of this point in the two images satisfy equation (1-2):
$$s_1 p_1 = K P, \qquad s_2 p_2 = K(R P + t), \qquad x_1 = K^{-1} p_1, \qquad x_2 = K^{-1} p_2 \qquad (1\text{-}2)$$
where $R$ and $t$ are the rotation matrix and displacement vector between two adjacent camera frames, and equation (1-3) can be deduced from equation (1-2):
$$x_2^{T}\,\hat{t}\,R\,x_1 = 0, \qquad E = \hat{t} R, \qquad F = K^{-T} E K^{-1}, \qquad x_2^{T} E x_1 = p_2^{T} F p_1 = 0 \qquad (1\text{-}3)$$
The essential matrix $E$ and the fundamental matrix $F$ are obtained from equation (1-3), and the rotation matrix $R$ and the translation vector $t$ can then be recovered from $E$ and $F$, for example by the eight-point or the five-point method. Triangulation is then used to recover the relative depth of each feature point; the depth is only relative because a monocular camera cannot observe the actual scale. For the joint optimization of scale, the scale information is obtained by means of the IMU pre-integration model. A further purpose of the IMU pre-integration is to obtain a linear model of the IMU integration error transfer formula for the subsequent joint optimization.
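A minimal sketch of this two-view initialization step using OpenCV follows; pts1 and pts2 are assumed to be matched, undistorted pixel coordinates from ORB matching and RANSAC screening as described for FIG. 1, and the recovered depths remain relative until the IMU provides the scale.

```python
import cv2
import numpy as np

def initialize_two_view(pts1, pts2, K):
    """pts1, pts2: (N, 2) matched feature coordinates in frames I1 and I2."""
    pts1 = np.asarray(pts1, dtype=np.float64)
    pts2 = np.asarray(pts2, dtype=np.float64)
    # Essential matrix E = t^R from the epipolar constraint, with RANSAC screening.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    # Decompose E into the rotation R and the (unit-norm) translation t.
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # Triangulate the matches; depths are relative until the metric scale s is known.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    X = (X_h[:3] / X_h[3]).T  # (N, 3) points in the first camera frame
    return R, t, X, mask.ravel().astype(bool)
```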
IMU pre-integration model: from the IMU calibration in step 1, the gyroscope bias $b_w$, the accelerometer bias $b_a$ and the additive noise $n_a$, $n_w$ are obtained.
To keep the IMU data acquisition module general, a 6-axis IMU is selected for data acquisition, so the accelerometer measurement $\hat{a}_t$ and the gyroscope angular-velocity measurement $\hat{\omega}_t$ are obtained and modeled as equation (1-4):
$$\hat{a}_t = a_t + b_{a_t} + R_w^{b_t} g^w + n_a, \qquad \hat{\omega}_t = \omega_t + b_{w_t} + n_w \qquad (1\text{-}4)$$
where the subscript $b$ denotes the body coordinate system, $R_w^{b_t}$ is the rotation matrix from the world coordinate system to the body coordinate system, and $g^w$ is the gravity vector in the world coordinate system.
Suppose the image frames acquired by the vision sensor over $[t_k, t_{k+1}]$ are $k$ and $k+1$, with corresponding body frames $b_k$ and $b_{k+1}$. The position, velocity and orientation values in the world coordinate system are propagated through the IMU measurements over the time interval as equations (1-5) to (1-8):
$$p_{b_{k+1}}^{w} = p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt^2 \qquad (1\text{-}5)$$
$$v_{b_{k+1}}^{w} = v_{b_k}^{w} + \int_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt \qquad (1\text{-}6)$$
$$q_{b_{k+1}}^{w} = q_{b_k}^{w} \otimes \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,q_t^{b_k}\,dt \qquad (1\text{-}7)$$
$$\Omega(\omega) = \begin{bmatrix} -[\omega]_{\times} & \omega \\ -\omega^{T} & 0 \end{bmatrix} \qquad (1\text{-}8)$$
where $\Delta t_k$ is the time interval $[t_k, t_{k+1}]$ and $R_t^{w}$ is the rotation matrix from the body coordinate system at time $t$ to the world coordinate system. As can be seen from FIG. 2, the state $[p_{b_{k+1}}^{w}, v_{b_{k+1}}^{w}, q_{b_{k+1}}^{w}]$ of the body frame $b_{k+1}$ depends on the real-time state at the previous moment $b_k$; if these quantities were used directly in the subsequent optimization, the overall computational load would increase greatly, because at every iteration of the optimization the state would have to be re-propagated through the IMU and updated.
To fuse the visual and IMU data, the position, velocity and orientation values in the world coordinate system are converted into the body coordinate system, i.e. multiplied by the rotation matrix $R_w^{b_k}$ from the world coordinate system $w$ to the body coordinate system $b_k$ at the corresponding moment, so that equations (1-5), (1-6) and (1-7) can be rewritten as:
$$R_w^{b_k} p_{b_{k+1}}^{w} = R_w^{b_k}\left(p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k - \frac{1}{2} g^{w}\Delta t_k^2\right) + \alpha_{b_{k+1}}^{b_k} \qquad (1\text{-}9)$$
$$R_w^{b_k} v_{b_{k+1}}^{w} = R_w^{b_k}\left(v_{b_k}^{w} - g^{w}\Delta t_k\right) + \beta_{b_{k+1}}^{b_k} \qquad (1\text{-}10)$$
$$q_w^{b_k} \otimes q_{b_{k+1}}^{w} = \gamma_{b_{k+1}}^{b_k} \qquad (1\text{-}11)$$
wherein:
$$\alpha_{b_{k+1}}^{b_k} = \iint_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt^2, \qquad \beta_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt, \qquad \gamma_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,\gamma_t^{b_k}\,dt \qquad (1\text{-}12)$$
Here $q$ denotes the quaternion form. The three terms can be regarded as the motion of $b_{k+1}$ relative to $b_k$, corresponding respectively to displacement, velocity and quaternion. Only the accelerometer bias $b_a$ and the gyroscope bias $b_w$ need to be considered in these three terms, and the state at the previous moment no longer affects them, so using them as optimization variables reduces the overall computational load to a certain extent. Furthermore, when the change between adjacent frames is small, the above three values can be approximated by a first-order Taylor expansion:
$$\alpha_{b_{k+1}}^{b_k} \approx \hat{\alpha}_{b_{k+1}}^{b_k} + J_{b_a}^{\alpha}\,\delta b_{a_k} + J_{b_w}^{\alpha}\,\delta b_{w_k}, \qquad \beta_{b_{k+1}}^{b_k} \approx \hat{\beta}_{b_{k+1}}^{b_k} + J_{b_a}^{\beta}\,\delta b_{a_k} + J_{b_w}^{\beta}\,\delta b_{w_k}, \qquad \gamma_{b_{k+1}}^{b_k} \approx \hat{\gamma}_{b_{k+1}}^{b_k} \otimes \begin{bmatrix} 1 \\ \frac{1}{2} J_{b_w}^{\gamma}\,\delta b_{w_k} \end{bmatrix} \qquad (1\text{-}13)$$
there are various pre-integration methods for IMU in discrete time, for example, euler integration method, RK4 integration method, median method, etc. are used, and in this embodiment, the median method is performed on the equation (1-13), and the following can be obtained:
Figure BDA0003055053410000103
at the initial moment of time, the device is started,
Figure BDA0003055053410000104
all state values are 0 and noise value na、nwThe value of (a) is 0,
Figure BDA0003055053410000105
is the unit quaternion and t is the time interval between adjacent measurements of the IMU.
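A simplified numerical sketch of the midpoint update of $\alpha$, $\beta$, $\gamma$ is given below; it drops the noise terms, holds the biases fixed over the interval, and uses SciPy rotations, so it illustrates the update structure rather than the exact formula (1-14).

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def preintegrate_midpoint(acc, gyro, dts, ba, bw):
    """Midpoint pre-integration sketch (noise dropped, biases held fixed).
    acc, gyro: (N, 3) raw IMU samples; dts: (N-1,) sample intervals; ba, bw: (3,) biases."""
    alpha = np.zeros(3)      # pre-integrated position term
    beta = np.zeros(3)       # pre-integrated velocity term
    gamma = R.identity()     # pre-integrated rotation relative to frame b_k
    for i in range(len(dts)):
        dt = dts[i]
        w_mid = 0.5 * (gyro[i] + gyro[i + 1]) - bw
        gamma_next = gamma * R.from_rotvec(w_mid * dt)
        # average of the two accelerometer samples, each rotated into frame b_k
        a_mid = 0.5 * (gamma.apply(acc[i] - ba) + gamma_next.apply(acc[i + 1] - ba))
        alpha += beta * dt + 0.5 * a_mid * dt ** 2
        beta += a_mid * dt
        gamma = gamma_next
    return alpha, beta, gamma
```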
The measurement values of the IMU pre-integration have now been obtained. To complete the whole IMU pre-integration flow for use in the nonlinear optimization, the covariance matrix of the equations must be derived and the corresponding Jacobian matrix solved. A linear Gaussian error-state recurrence equation is therefore established from the measurement values, and the error transfer function matrix is derived from equations (1-9), (1-10) and (1-11) using the covariance propagation of a linear Gaussian system; its detailed form is equation (1-15), in which $R_k$ and $R_{k+1}$ are obtained from the rotation matrix at the previous moment, the linear relationship being established by the kinematic integral equations:
[formula image (1-15): linearized error-state transition of the pre-integration terms, written blockwise in terms of sub-matrices of the state transition and of the noise mapping]
In equation (1-15), the sub-matrices are the partial derivatives of the midpoint update with respect to the error states and the noise terms (formula images).
for simplicity, the error transfer equation is abbreviated
δzk+1=Hδzk+Vn
The initial value of Jacobian is set as an identity matrix I, and the iterative formula is as follows:
Jk+1=FJk
from the error propagation model, an iterative formula of covariance can be obtained as
Pk+1=HPHT+VQVT
The covariance initial value is 0; the IMU pre-integration provides an initial value for observation for fusion initialization of subsequent information and provides a measurement item for iterative optimization. In an actual situation, in order to more accurately fuse the two pieces of information, the IMU and the camera are jointly calibrated through the step 1 to obtain distortion parameters of the IMU and the camera, and the distortion removal processing is performed on the obtained data to improve the accuracy of the data.
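The Jacobian and covariance recursions above can be sketched as follows; it is assumed here that the per-step linearized transition matrix (denoted H/F above and taken to be one and the same matrix) and the noise mapping V have already been evaluated from (1-15), and that Q is the IMU noise covariance.

```python
import numpy as np

def propagate_error_state(F_list, V_list, Q):
    """Recursion J_{k+1} = F J_k and P_{k+1} = F P_k F^T + V Q V^T,
    starting from J_0 = I and P_0 = 0."""
    n = F_list[0].shape[0]
    J = np.eye(n)            # Jacobian of the pre-integration w.r.t. the initial error state
    P = np.zeros((n, n))     # covariance of the accumulated pre-integration error
    for F, V in zip(F_list, V_list):
        J = F @ J
        P = F @ P @ F.T + V @ Q @ V.T
    return J, P
```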
Substituting equation (1-15) back into equation (1-13) yields the corresponding reference values; these are the numerical components required by the subsequent tight coupling.
[formula image (1-16): linear system relating the sliding-window velocities, the gravity vector $g^{c_0}$ and the metric scale $s$, built from the pre-integration terms and the visual poses]
In addition, to recover the actual scale of the three-dimensional space, the scale result is obtained by solving equation (1-16); this system can be decomposed by LDLT to obtain the final scale information $s$, from which the actual depths of the feature points and the poses in the solution, expressed in the world coordinate system, are recovered. The superscript $c_0$ denotes the camera coordinate system of the first camera frame, and $R$ and $p$ denote rotation matrix and translation respectively. Equation (1-16) is optimized by least squares to obtain the refined gravity vector; according to the change of gravity, the transformation from the $c_0$ coordinate system to the world coordinate system is obtained, and the variables in the initial camera coordinate system (the translation and the three-dimensional feature points recovered by triangulation in the visual epipolar-geometry step) are transferred to the world coordinate system and the scale is recovered, completing the initialization.
The initialized visual image frames are handled with a sliding-window approach, and 10 consecutive frames are used for initialization until it succeeds. After successful initialization, sufficient three-dimensional space feature points in the world coordinate system are available, and the feature points and the transformation matrix in the world coordinate system can then be solved by P3P and ICP methods, which effectively reduces the computational load.
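Once 3D feature points in the world frame are available, the frame-to-frame pose can be obtained by a PnP solve; the OpenCV-based sketch below is one possible realization (RANSAC with iterative refinement; a P3P minimal solver can be selected through the flags argument), not the exact implementation of the method.

```python
import cv2
import numpy as np

def solve_camera_pose(points_3d, points_2d, K, dist=None):
    """points_3d: (N, 3) world-frame feature points; points_2d: (N, 2) detections."""
    dist = np.zeros(5) if dist is None else dist
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, dist, reprojectionError=2.0, iterationsCount=100)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)  # world-to-camera rotation
    return R, tvec.reshape(3), inliers
```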
(III) Tightly coupling the IMU data and the visual data for joint optimization to obtain the camera transformation matrix and the optimized three-dimensional space feature points.
if only the pose calculation result is used without tight coupling optimization, errors are accumulated slowly along with the time, and all data are used as subsequent optimization production conditions, the calculated amount is excessively increased, so that the method selects a sliding window method for optimization, selects a part of frame information as key frame information and performs combined optimization with other solutions, and obtains the final optimized solution. The selection requirement of the key frame meets the large requirement of enough parallax, and the specific selection is according to actual reference. The parameter variables to be optimized can be represented by the equations (1-17)
Figure BDA0003055053410000131
Wherein xkN denotes a camera state at each time, including a position
Figure BDA0003055053410000132
Speed of rotation
Figure BDA0003055053410000133
And rotate
Figure BDA0003055053410000134
And an accelerometer b at each momentaAnd a gyroscope bgThe bias matrix of (a) is,
Figure BDA0003055053410000135
is a transformation matrix, lambda, between the camera coordinate system and the body coordinate systemnFor inverse depth values (the inverse of the depth of the feature points in the normalized camera coordinate system, the number of which is the number of landmark feature points included in the keyframe, which conforms to the gaussian distribution and can simplify the computational constraints), the objective function for state estimation optimization can be further expressed as
Figure BDA0003055053410000136
In equation (1-18), the first part represents the marginalized residual, the second part is the IMU residual, and the third part is the visual error function.
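The structure of objective (1-18) can be sketched as a weighted sum of squared residuals; the sketch below only evaluates the cost for given residual/covariance pairs, while in practice the minimization would be carried out by a nonlinear least-squares solver such as the Ceres library mentioned later.

```python
import numpy as np

def weighted_sq_norm(r, P):
    # ||r||^2_P = r^T P^{-1} r, the Mahalanobis norm used in (1-18)
    return float(r @ np.linalg.solve(P, r))

def total_cost(r_prior, H_prior, X, imu_residuals, visual_residuals):
    """imu_residuals / visual_residuals: lists of (residual, covariance) pairs."""
    cost = float(np.sum((r_prior - H_prior @ X) ** 2))                # marginalization prior
    cost += sum(weighted_sq_norm(r, P) for r, P in imu_residuals)     # IMU term
    cost += sum(weighted_sq_norm(r, P) for r, P in visual_residuals)  # visual term
    return cost
```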
The visual residual function obtains part of the optimal solution by minimizing the normalized-coordinate error of the same feature point observed at different times. For a feature point, the following geometric relationship (1-19) holds:
$$P^{c_j} = T_b^{c}\, T_w^{b_j}\, T_{b_i}^{w}\, T_c^{b}\, P^{c_i} \qquad (1\text{-}19)$$
where $T$ denotes a transformation matrix: the camera coordinate system at moment $i$ is converted to the inertial (body) coordinate system and then to the world coordinate system, after which the world coordinate system is converted to the inertial coordinate system at moment $j$ and then to the camera coordinate system, i.e. four coordinate-system conversions in total. From equation (1-19) the relation (1-20) is obtained:
$$P^{c_j} = R_b^{c}\left(R_w^{b_j}\left(R_{b_i}^{w}\left(R_c^{b} P^{c_i} + p_c^{b}\right) + p_{b_i}^{w}\right) + p_w^{b_j}\right) + p_b^{c} \qquad (1\text{-}20)$$
where $R$ and $p$ are the rotation and translation obtained by decomposing the transformation matrices. For the pinhole camera, the visual residual formula is
$$r_{\mathcal{C}}\left(\hat{z}_l^{c_j}, \mathcal{X}\right) = \begin{bmatrix} x^{c_j}/z^{c_j} - \hat{u}_l^{c_j} \\ y^{c_j}/z^{c_j} - \hat{v}_l^{c_j} \end{bmatrix}, \qquad \left[x^{c_j}, y^{c_j}, z^{c_j}\right]^T = P^{c_j} \qquad (1\text{-}21)$$
Combining this with equation (1-20), the minimized visual residual is obtained.
The second part is the error of the IMU component. According to the pre-integration formulas (1-9), (1-10) and (1-11), the IMU residual (1-22) is
$$r_{\mathcal{B}}\left(\hat{z}_{b_{k+1}}^{b_k}, \mathcal{X}\right) = \begin{bmatrix} R_w^{b_k}\left(p_{b_{k+1}}^{w} - p_{b_k}^{w} - v_{b_k}^{w}\Delta t_k + \frac{1}{2} g^{w}\Delta t_k^{2}\right) - \hat{\alpha}_{b_{k+1}}^{b_k} \\ R_w^{b_k}\left(v_{b_{k+1}}^{w} - v_{b_k}^{w} + g^{w}\Delta t_k\right) - \hat{\beta}_{b_{k+1}}^{b_k} \\ 2\left[\left(\hat{\gamma}_{b_{k+1}}^{b_k}\right)^{-1} \otimes q_w^{b_k} \otimes q_{b_{k+1}}^{w}\right]_{xyz} \\ b_{a_{k+1}} - b_{a_k} \\ b_{w_{k+1}} - b_{w_k} \end{bmatrix} \qquad (1\text{-}22)$$
The first part is the marginalized residual. Marginalization is introduced mainly as a trade-off between computation speed and performance. If only two adjacent image frames were considered when optimizing the pose, the algorithm would run fast but errors would accumulate quickly; on the other hand, optimizing over the information of all frames at every frame is highly redundant and computationally expensive. To keep the computational load roughly constant, a sliding window of fixed length is used for the nonlinear optimization and marginalization is introduced. Its purpose is to stop computing the pose of a frame and its associated landmark points while retaining the constraint that frame imposes on the window, thereby accelerating the computation while preserving the sparsity of the optimization matrix. Marginalization distinguishes two cases: if a new reference frame is added to the sliding window and the last frame in the current window is a keyframe, the pose of that frame is moved out and the visual and inertial data related to it are marginalized and used as prior values; when the penultimate frame is not a keyframe, the visually related portion is marginalized while the relevant IMU constraints are preserved. Non-key information is eliminated by Schur complement elimination, the matrix dimensions are expanded according to the quantities to be optimized, and the overall optimization is finally solved with the open-source Ceres library, yielding pose information with smaller errors and the feature-point positions in the world coordinate system. This part finally provides the inter-frame transformation matrices and the three-dimensional space point information for the subsequent weak-constraint transformation.
(IV) Processing and compensating the acquired three-dimensional motion vectors and spatial poses, i.e. pre-warping according to the acquired transformation matrix and performing a local grid transformation on the acquired feature points, to obtain the final image stabilization result.
Before this step, motion filtering is applied to the obtained inter-frame motion vectors following the steps of the classical image stabilization method, pre-warping is performed, and a first compensation is applied according to the global motion result; a second motion compensation is then performed using weak-constraint compensation, producing the stabilized image after the compensating transformation. In the second motion compensation, the information used is mainly the previously acquired three-dimensional space information, which is projected into the stabilized frame; the projection error of the corresponding points is observed and weak-constraint compensation is performed. The core idea of the weak-constraint compensation comprises the following two parts:
(1) Correcting the feature points in the image with strong constraints may deform the image completely, produce temporal incoherence, and distort the edge regions after the transformation. Using the computed feature points as weak constraints instead allows the distortion to be distributed to other flat regions of the image; the method therefore adopts a weak-constraint approach, so that these disturbances are visually negligible while keeping the image as undistorted as possible.
(2) To preserve the content and detail of the image while it is deformed, additional constraints are needed so that the deformation conforms to a similarity transformation. In regions where feature points are sparse, the constraints can be relaxed so that the image distortion is distributed to visually inconspicuous areas. To this end, the input video frame image $\hat{I}$ is divided uniformly into $n \times m$ parts; each part is called a grid cell, and the feature points lie within the cells.
The position of each obtained feature point projected into the image is denoted $\hat{P}_k$. In the algorithm, each feature point is represented by the four vertices of the grid cell containing it via bilinear interpolation. Let $\hat{V}_k$ be the vector formed by the four vertices of that cell in the input image and $V_k$ the corresponding four cell vertices in the output image. The vector $\omega_k$ contains the bilinear interpolation coefficients, which sum to 1, so that $\hat{P}_k = \omega_k^{T}\hat{V}_k$. The data term can then be expressed as a least-squares problem:
$$E_d(V) = \sum_k \left\|\omega_k^{T} V_k - P_k\right\|^2 \qquad (1\text{-}23)$$
where $V_k$ are the four unknown vertices and $P_k$ is the feature-point position obtained by the nonlinear optimization. The purpose of this data term is to minimize the distance between the feature point $P_k$ in the output image and the position interpolated from the output grid vertices with the coefficients of the original-image feature point $\hat{P}_k$.
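A sketch of the bilinear representation and of a data term of the form (1-23) follows; the cell-vertex ordering, the variable names and the per-feature cell origins are illustrative assumptions.

```python
import numpy as np

def bilinear_weights(p, cell_origin, cell_size):
    """Bilinear coefficients of point p w.r.t. its cell vertices
    (ordered TL, TR, BL, BR); they sum to 1."""
    u, v = (np.asarray(p, dtype=float) - np.asarray(cell_origin, dtype=float)) / cell_size
    return np.array([(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v])

def data_term(cells_out, feats_in, feats_target, cell_origins, cell_size):
    """E_d = sum_k || w_k^T V_k - P_k ||^2 over all features.
    cells_out: (N, 4, 2) output vertices of the cell containing each feature."""
    cost = 0.0
    for Vk, p_in, p_tgt, origin in zip(cells_out, feats_in, feats_target, cell_origins):
        w = bilinear_weights(p_in, origin, cell_size)   # weights from the input feature
        cost += float(np.sum((w @ Vk - np.asarray(p_tgt)) ** 2))
    return cost
```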
The similarity transformation term measures the deviation of each grid cell in the output image from a similarity transformation of the corresponding input cell. Each cell is divided into two triangles to build the model; within a triangle, the vertex $V_1$ is represented in the local coordinate system of the other two vertices:
$$V_1 = V_2 + u\left(V_3 - V_2\right) + v\,R_{90}\left(V_3 - V_2\right) \qquad (1\text{-}24)$$
where $R_{90}$ is a 90-degree rotation matrix, and $u$ and $v$ are known values computed in the original image; for a square grid cell $u = 0$ and $v = 1$. In the output image the similarity relation is generally no longer satisfied and $V_1$ deviates from the ideal position: as shown in FIG. 3, the transformed ideal position is shown by the dotted line and $V_1$ is the actual position. A similarity transformation term is therefore used so that the distance between the transformed position and the ideal position is minimized:
$$E_s(V_1) = \omega_s\left\|V_1 - \left(V_2 + u\left(V_3 - V_2\right) + v\,R_{90}\left(V_3 - V_2\right)\right)\right\|^2 \qquad (1\text{-}25)$$
where the weight $\omega_s$ used in computing the optimal solution within each cell is set to the two-norm of the color variance; the variance of each cell can then be regarded as the sum of the similarity transformations of its two triangles, giving 8 terms in total. When equations (1-23) and (1-25) reach their minimum simultaneously, the ideal optimal coordinates of the output vertices of the grid are obtained; a standard texture-mapping algorithm is then used to render the further-stabilized output image, which is cropped to remove the black borders, yielding the final stabilized image frame.
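A sketch of the similarity term (1-25) for one triangle follows; the sign convention of $R_{90}$ and the weight handling are assumptions, with $u$, $v$ computed once from the undeformed input mesh as in (1-24).

```python
import numpy as np

R90 = np.array([[0.0, 1.0], [-1.0, 0.0]])  # assumed 90-degree rotation convention

def local_coords(V1, V2, V3):
    """Recover u, v from the input mesh triangle so that
    V1 = V2 + u*(V3 - V2) + v*R90 @ (V3 - V2)."""
    A = np.column_stack([V3 - V2, R90 @ (V3 - V2)])
    u, v = np.linalg.solve(A, V1 - V2)
    return u, v

def similarity_term(V1, V2, V3, u, v, w_s=1.0):
    """E_s(V1) = w_s * || V1 - (V2 + u*(V3-V2) + v*R90@(V3-V2)) ||^2 on the output mesh."""
    ideal = V2 + u * (V3 - V2) + v * (R90 @ (V3 - V2))
    return w_s * float(np.sum((V1 - ideal) ** 2))
```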
Considering possible future real-time requirements, the method slightly relaxes the visual-quality requirement and tolerates a certain amount of temporal incoherence within an acceptable range; the feature points that appear continuously in the previous keyframe and in the data frames are preferentially selected as the feature points for image stabilization, so as to keep the algorithm as consistent as possible.
The above steps are then repeated continuously to obtain a stable output image sequence.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims (7)

1. A three-dimensional software image stabilization method based on visual-inertial information fusion, characterized by comprising the following steps:
Step 1: calibrating the camera and the IMU (Inertial Measurement Unit), acquiring the distortion parameters and the intrinsic matrix of the monocular camera, and calibrating the IMU error parameters to obtain the IMU error model used in the subsequent tightly coupled joint optimization;
Step 2: performing monocular visual initialization to obtain sufficient three-dimensional feature points in the world coordinate system;
Step 3: tightly coupling the IMU data and the visual data for joint optimization, and acquiring the camera transformation matrix and the optimized three-dimensional space feature points;
Step 4: pre-warping according to the obtained transformation matrix, and applying a local grid transformation to the obtained feature points to obtain the final image stabilization result.
2. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 1, the camera and the IMU are calibrated with reference to the GitHub open-source calibration toolbox Kalibr.
3. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 1, the errors of the angular velocity and acceleration measurements are obtained through IMU calibration to correct the IMU error model; the camera intrinsic matrix K and the distortion coefficients are obtained through visual calibration; the parameters are further refined through joint calibration, which also yields the transformation matrix between the IMU and the camera carrier.
4. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 2, monocular visual initialization proceeds as follows:
For the image frames $I_1$ and $I_2$ obtained by the monocular camera, a common three-dimensional point $P$ in the world coordinate system has position $[X, Y, Z]^T$; the projections of this point in the two image frames satisfy
$$s_1 p_1 = K P, \qquad s_2 p_2 = K(R P + t)$$
$$x_1 = K^{-1} p_1, \qquad x_2 = K^{-1} p_2$$
where $R$ and $t$ are the rotation matrix and displacement vector between the two adjacent camera frames. From these relations the epipolar constraint
$$x_2^{T}\,\hat{t}\,R\,x_1 = 0, \qquad p_2^{T} K^{-T}\,\hat{t}\,R\,K^{-1} p_1 = 0$$
$$E = \hat{t} R, \qquad F = K^{-T} E K^{-1}, \qquad x_2^{T} E x_1 = p_2^{T} F p_1 = 0$$
is obtained, giving the essential matrix $E$ and the fundamental matrix $F$; the rotation matrix $R$ and translation vector $t$ are then recovered from $E$ and $F$, and triangulation is used to recover the relative depth of each feature point. The scale information of each feature point is then obtained using the IMU pre-integration model.
5. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 4, characterized in that the IMU pre-integration model is as follows:
The gyroscope bias $b_w$, the accelerometer bias $b_a$ and the additive noise $n_a$, $n_w$ are obtained from the IMU calibration of step 1. Using a 6-axis IMU, the accelerometer measurement and the gyroscope angular-velocity measurement are modeled as
$$\hat{a}_t = a_t + b_{a_t} + R_w^{b_t} g^w + n_a, \qquad \hat{\omega}_t = \omega_t + b_{w_t} + n_w$$
where the subscript $b$ denotes the body coordinate system, $R_w^{b_t}$ is the rotation matrix from the world coordinate system to the body coordinate system, and $g^w$ is the gravity vector in the world coordinate system. If the image frames acquired by the camera over $[t_k, t_{k+1}]$ are $k$ and $k+1$, with corresponding body frames $b_k$ and $b_{k+1}$, then the propagation of the position, velocity and orientation in the world coordinate system through the IMU measurements over the time interval is
$$p_{b_{k+1}}^{w} = p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt^2$$
$$v_{b_{k+1}}^{w} = v_{b_k}^{w} + \int_{t\in[t_k,t_{k+1}]}\left(R_t^{w}(\hat{a}_t - b_{a_t} - n_a) - g^{w}\right)dt$$
$$q_{b_{k+1}}^{w} = q_{b_k}^{w} \otimes \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,q_t^{b_k}\,dt$$
where $\Delta t_k$ is the time interval $[t_k, t_{k+1}]$ and $R_t^{w}$ is the rotation matrix from the body coordinate system to the world coordinate system. The position, velocity and orientation values are then converted into the body coordinate system:
$$R_w^{b_k} p_{b_{k+1}}^{w} = R_w^{b_k}\left(p_{b_k}^{w} + v_{b_k}^{w}\Delta t_k - \frac{1}{2} g^{w}\Delta t_k^2\right) + \alpha_{b_{k+1}}^{b_k}$$
$$R_w^{b_k} v_{b_{k+1}}^{w} = R_w^{b_k}\left(v_{b_k}^{w} - g^{w}\Delta t_k\right) + \beta_{b_{k+1}}^{b_k}$$
$$q_w^{b_k} \otimes q_{b_{k+1}}^{w} = \gamma_{b_{k+1}}^{b_k}$$
where the pre-integration terms are
$$\alpha_{b_{k+1}}^{b_k} = \iint_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt^2, \qquad \beta_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]} R_t^{b_k}(\hat{a}_t - b_{a_t} - n_a)\,dt, \qquad \gamma_{b_{k+1}}^{b_k} = \int_{t\in[t_k,t_{k+1}]}\frac{1}{2}\,\Omega(\hat{\omega}_t - b_{w_t} - n_w)\,\gamma_t^{b_k}\,dt$$
A first-order Taylor expansion of these terms, followed by midpoint (median) integration, gives the discrete pre-integration measurements, and the linearized error transfer function matrix of the form $\delta z_{k+1} = H\,\delta z_k + V n$ is obtained; the corresponding parameters are obtained from this error transfer function matrix. The metric scale $s$ is then obtained from the linear system relating the sliding-window velocities, the gravity vector $g^{c_0}$ and the scale, so as to recover the actual depth of each feature point and its position in the world coordinate system, where the superscript $c_0$ denotes the camera coordinate system of the first camera frame.
6. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 1, characterized in that: in step 3, when the IMU data and the visual data are tightly coupled for joint optimization, a sliding window method is adopted, and a subset of frame information is selected as keyframe information to be jointly optimized with the remaining states.
7. The three-dimensional software image stabilization method based on visual-inertial information fusion according to claim 6, characterized in that the optimization process using the sliding window method in step 3 is as follows:
The parameter variables to be optimized are written as
$$\mathcal{X} = \left[x_0, x_1, \ldots, x_n,\ x_c^b,\ \lambda_0, \lambda_1, \ldots, \lambda_m\right], \qquad x_k = \left[p_{b_k}^{w}, v_{b_k}^{w}, q_{b_k}^{w}, b_a, b_g\right],\ k \in [0, n], \qquad x_c^b = \left[p_c^b, q_c^b\right]$$
where $x_k$, $k \in [0, n]$ denotes the camera state at each time, including the position $p_{b_k}^{w}$, velocity $v_{b_k}^{w}$ and rotation $q_{b_k}^{w}$, together with the bias matrices $b_a$, $b_g$ of the accelerometer and gyroscope at each moment; $x_c^b$ is the transformation between the camera coordinate system and the body coordinate system, and $\lambda_n$ is the inverse depth value. The objective function
$$\min_{\mathcal{X}}\left\{\left\|r_p - H_p\mathcal{X}\right\|^2 + \sum_{k\in\mathcal{B}}\left\|r_{\mathcal{B}}\left(\hat{z}_{b_{k+1}}^{b_k}, \mathcal{X}\right)\right\|_{P_{b_{k+1}}^{b_k}}^{2} + \sum_{(l,j)\in\mathcal{C}}\left\|r_{\mathcal{C}}\left(\hat{z}_l^{c_j}, \mathcal{X}\right)\right\|_{P_l^{c_j}}^{2}\right\}$$
is solved and optimized, where the first part of the objective function represents the marginalized residual, the second part is the IMU residual, and the third part is the visual error function.
CN202110497661.XA 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion Active CN113240597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110497661.XA CN113240597B (en) 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110497661.XA CN113240597B (en) 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion

Publications (2)

Publication Number Publication Date
CN113240597A true CN113240597A (en) 2021-08-10
CN113240597B CN113240597B (en) 2024-04-26

Family

ID=77132324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110497661.XA Active CN113240597B (en) 2021-05-08 2021-05-08 Three-dimensional software image stabilizing method based on visual inertial information fusion

Country Status (1)

Country Link
CN (1) CN113240597B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020087846A1 (en) * 2018-10-31 2020-05-07 东南大学 Navigation method based on iteratively extended kalman filter fusion inertia and monocular vision
CN110030994A (en) * 2019-03-21 2019-07-19 东南大学 A kind of robustness vision inertia close coupling localization method based on monocular
CN110345944A (en) * 2019-05-27 2019-10-18 浙江工业大学 Merge the robot localization method of visual signature and IMU information
CN110498039A (en) * 2019-08-05 2019-11-26 北京科技大学 A kind of intelligent monitor system based on bionic flapping-wing flying vehicle
WO2021027323A1 (en) * 2019-08-14 2021-02-18 北京理工大学 Hybrid image stabilization method and device based on bionic eye platform
CN112749665A (en) * 2021-01-15 2021-05-04 东南大学 Visual inertia SLAM method based on image edge characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程传奇; 郝向阳; 李建胜; 刘智伟; 胡鹏: "Monocular vision/inertial integrated navigation algorithm based on nonlinear optimization" [基于非线性优化的单目视觉/惯性组合导航算法], Journal of Chinese Inertial Technology (中国惯性技术学报), no. 05 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147325A (en) * 2022-09-05 2022-10-04 深圳清瑞博源智能科技有限公司 Image fusion method, device, equipment and storage medium
CN116208855A (en) * 2023-04-28 2023-06-02 杭州未名信科科技有限公司 Multi-tower crane cradle head panoramic image jitter coordination inhibition method and system
CN116208855B (en) * 2023-04-28 2023-09-01 杭州未名信科科技有限公司 Multi-tower crane cradle head panoramic image jitter coordination inhibition method and system

Also Published As

Publication number Publication date
CN113240597B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN107255476B (en) Indoor positioning method and device based on inertial data and visual features
CN110068335B (en) Unmanned aerial vehicle cluster real-time positioning method and system under GPS rejection environment
CN104732518B (en) A kind of PTAM improved methods based on intelligent robot terrain surface specifications
CN112304307A (en) Positioning method and device based on multi-sensor fusion and storage medium
WO2024045632A1 (en) Binocular vision and imu-based underwater scene three-dimensional reconstruction method, and device
CN112102458A (en) Single-lens three-dimensional image reconstruction method based on laser radar point cloud data assistance
CN110349249B (en) Real-time dense reconstruction method and system based on RGB-D data
CN112669354B (en) Multi-camera motion state estimation method based on incomplete constraint of vehicle
CN113240597A (en) Three-dimensional software image stabilization method based on visual inertial information fusion
JP2023505891A (en) Methods for measuring environmental topography
CN110942477A (en) Method for depth map fusion by using binocular camera and laser radar
CN114964276A (en) Dynamic vision SLAM method fusing inertial navigation
CN114998556B (en) Virtual-real fusion method for mixed reality flight simulation system
CN113345032B (en) Initialization map building method and system based on wide-angle camera large distortion map
CN113211433B (en) Separated visual servo control method based on composite characteristics
Liu et al. Dense stereo matching strategy for oblique images that considers the plane directions in urban areas
Wang et al. A novel binocular vision system for accurate 3-D reconstruction in large-scale scene based on improved calibration and stereo matching methods
CN112767481B (en) High-precision positioning and mapping method based on visual edge features
CN117115271A (en) Binocular camera external parameter self-calibration method and system in unmanned aerial vehicle flight process
CN116182855B (en) Combined navigation method of compound eye-simulated polarized vision unmanned aerial vehicle under weak light and strong environment
CN111145267B (en) 360-degree panoramic view multi-camera calibration method based on IMU assistance
CN114485648B (en) Navigation positioning method based on bionic compound eye inertial system
CN112837409B (en) Method for reconstructing three-dimensional human body by using mirror
KR102107465B1 (en) System and method for generating epipolar images by using direction cosine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant