CN109186592B - Method and device for visual and inertial navigation information fusion and storage medium - Google Patents
- Publication number
- CN109186592B CN109186592B CN201811014745.8A CN201811014745A CN109186592B CN 109186592 B CN109186592 B CN 109186592B CN 201811014745 A CN201811014745 A CN 201811014745A CN 109186592 B CN109186592 B CN 109186592B
- Authority
- CN
- China
- Prior art keywords
- residual
- image data
- inertial navigation
- time
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/10—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
- G01C21/12—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
- G01C21/16—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
- G01C21/165—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C11/00—Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S11/00—Systems for determining distance or velocity not using reflection or reradiation
- G01S11/12—Systems for determining distance or velocity not using reflection or reradiation using electromagnetic waves other than radio waves
Landscapes
- Engineering & Computer Science (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Electromagnetism (AREA)
- Automation & Control Theory (AREA)
- Navigation (AREA)
Abstract
The invention discloses a method for fusing visual and inertial navigation information, including: acquiring image data and inertial navigation data from an image sensor and an inertial sensor of a terminal, respectively; obtaining a state estimate corresponding to the image data according to the time offset between the nominal generation time and the actual generation time of the image data; and obtaining a state parameter set of the terminal based on a dynamic residual of the image data and a basic residual of the inertial navigation data, where the dynamic residual represents the difference between the state estimate and the state parameter set, and the basic residual represents the difference between an integral estimate corresponding to the inertial navigation data and the state parameter set. By introducing a dynamic visual residual associated with the time offset of image-data acquisition, the scheme provided by the embodiments of the invention can fuse visual and inertial navigation information even when the inertial navigation data and the image data are acquired asynchronously.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for fusion of visual inertial navigation information, a computer storage medium, and an electronic device.
Background
Visual-inertial navigation information fusion generally refers to methods that fuse visual information with inertial navigation information, and is used for synchronous positioning and environment reconstruction. Visual information generally refers to two-dimensional images captured by a camera, and inertial navigation information generally refers to the angular velocity and acceleration information output by an IMU (Inertial Measurement Unit).
For example, an image captured by the camera of a mobile terminal is two-dimensional, i.e., a dimension-reduced representation of the three-dimensional environment. However, by combining the inertial navigation information output by the IMU, the three-dimensional environment in which the mobile terminal is located can be reconstructed from images taken by the camera at different times and positions, and the historical positions of the mobile terminal at those times can be inferred. This process is synchronous positioning and environment reconstruction based on visual-inertial navigation information fusion.
Once the position and environment information of the mobile terminal is obtained, the mobile terminal has the capability of interacting with the environment. For example, in VR (Virtual Reality) and AR (Augmented Reality) applications, virtual objects may be placed in the real environment based on the known environment information. Meanwhile, by combining the known terminal position information, the real environment and the virtual environment can be rendered into the image displayed on the terminal screen in the correct positional relationship. For example, in mall navigation, the known environment information may help a user identify the environment in which the user is located; meanwhile, by combining the known position information and overlaying virtual navigation information on the real environment displayed on the terminal screen, the user can be guided to nearby restaurants, shops, washrooms and the like as required.
With the rapid development of technologies such as VR and AR, synchronous positioning and environment reconstruction have gradually become a very important research direction in the field of computer vision with wide applications, placing ever higher requirements on the timeliness and reliability of visual-inertial navigation information fusion.
To fuse inertial navigation information with image information, besides aligning the coordinate systems of the inertial sensor and the image sensor and converting them into the real-world coordinate system, the measurement acquisition times of the two must also be synchronized. Such time synchronization generally requires complex hardware design (e.g., a unified clock) for the inertial sensor and the image sensor; however, the two sensors on a terminal device often come from different providers, so the requirement of time synchronization cannot easily be met. How to fuse visual and inertial navigation information when the acquisition of inertial navigation information and image information is not synchronized therefore becomes an urgent problem to be solved.
Disclosure of Invention
In order to solve the problem that the acquisition time of inertial navigation information and image information is difficult to synchronize in the related art, the invention provides a method and a device for fusion of visual inertial navigation information, a computer storage medium and an electronic device.
According to an embodiment of the present invention, there is provided a method for fusion of visual inertial navigation information, including: respectively acquiring image data and inertial navigation data from an image sensor and an inertial sensor of a terminal; acquiring state estimation corresponding to the image data according to time offset between the nominal generation time and the actual generation time of the image data; and acquiring a state parameter set of the terminal based on a dynamic residual of the image data and a basic residual of the inertial navigation data, wherein the dynamic residual represents a difference between the state estimation and the state parameter set, and the basic residual represents a difference between an integral estimation corresponding to the inertial navigation data and the state parameter set.
According to an embodiment of the present invention, there is provided an apparatus for fusion of visual inertial navigation information, including: the acquisition module is used for respectively acquiring image data and inertial navigation data from an image sensor and an inertial sensor of the terminal; the estimation module is used for acquiring state estimation corresponding to the image data according to the time offset between the nominal generation time and the actual generation time of the image data; and a fusion module, configured to obtain a state parameter set of the terminal based on a dynamic residual of the image data and a basic residual of the inertial navigation data, where the dynamic residual represents a difference between the state estimate and the state parameter set, and the basic residual represents a difference between an integral estimate corresponding to the inertial navigation data and the state parameter set.
According to an embodiment of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for visual inertial navigation information fusion as described above.
According to an embodiment of the present invention, there is provided an electronic apparatus including: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method for visual inertial navigation information fusion as described above.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
According to the scheme for fusing visual and inertial navigation information provided by the embodiments of the invention, by introducing a dynamic visual residual associated with the time offset of image-data acquisition, visual-inertial fusion can be applied even to asynchronously acquired inertial navigation data and image data.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 shows a schematic block diagram of an electronic device suitable for use in implementing embodiments of the present invention.
Fig. 2 and 3 respectively show schematic diagrams of synchronous acquisition and asynchronous acquisition of inertial navigation data and image data.
FIG. 4 is a flow chart illustrating a method for visual inertial navigation information fusion, according to an example embodiment.
FIG. 5 is a flow chart illustrating a method for visual inertial navigation information fusion, according to another exemplary embodiment.
FIG. 6 is a block diagram illustrating an apparatus for visual inertial navigation information fusion, according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
FIG. 1 shows a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. It should be noted that the electronic device 100 shown in fig. 1 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure. The electronic device 100 may be a mobile terminal such as a mobile phone or a tablet computer. Referring to fig. 1, the terminal 100 may include one or more of the following components: processing component 102, memory 104, power component 106, multimedia component 108, input/output (I/O) interface 112, sensor component 114, and communication component 116.
The processing component 102 generally controls the overall operation of the terminal 100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 102 may include one or more processors to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 102 can include one or more modules that facilitate interaction between the processing component 102 and other components. For example, the processing component 102 can include a multimedia module to facilitate interaction between the multimedia component 108 and the processing component 102.
The memory 104 is configured to store various types of data to support operations at the terminal 100. Examples of such data include instructions for any application or method operating on terminal 100, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 104 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 106 provides power to the various components of the terminal 100. The power components 106 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal 100.
The multimedia component 108 includes a screen providing an output interface between the terminal 100 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 108 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the terminal 100 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The I/O interface 112 provides an interface between the processing component 102 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc.
The sensor assembly 114 includes one or more sensors for providing various aspects of status assessment for the terminal 100. For example, sensor assembly 114 may detect an open/closed state of terminal 100, the relative positioning of components, such as a display and keypad of terminal 100, sensor assembly 114 may detect a change in position of terminal 100 or a component of terminal 100, the presence or absence of user contact with terminal 100, orientation or acceleration/deceleration of terminal 100, and a change in temperature of terminal 100. The sensor assembly 114 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 114 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 114 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 116 is configured to facilitate communications between the terminal 100 and other devices in a wired or wireless manner. The terminal 100 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 116 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 116 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the methods of the above-described embodiments.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below. For example, the electronic device may implement the steps shown in fig. 4 and 5.
Before explaining the technical solutions of the embodiments of the present invention in detail, some related technical solutions and principles are described below.
In the background art, a mobile terminal is taken as an example to introduce a scene for synchronous positioning and environment reconstruction based on visual inertial navigation information fusion. In the following, the present invention is still described by taking a mobile terminal as an example of a carrier for implementing the fusion of the visual and inertial navigation information, but this is only an example and does not constitute a limitation to the scope of the present invention.
When the method for fusing visual and inertial navigation information is implemented on a mobile terminal, it can be realized by running applications (such as VR and AR applications) installed on the mobile terminal. Accordingly, such mobile applications often need to meet requirements on real-time performance and mapping performance.
Firstly, when the user uses the application on the mobile terminal, the synchronous positioning and mapping process is on-line, that is, the positions of the mobile terminal at different moments need to be calculated in real time, and the surrounding environment needs to be reconstructed, so as to meet the application requirements.
Secondly, with visual information alone, the position information obtained by synchronous positioning and the map obtained by map construction are inconsistent in scale with the real world. This is simply because objects of different sizes at different distances can produce the same projection in the camera. To address this scale uncertainty, inertial navigation data from the inertial sensor may be introduced, including, for example, the triaxial acceleration measured by an accelerometer and the triaxial angular velocity measured by a gyroscope. Because the inertial navigation data are measured in the real-world coordinate system, fusing them with the image data acquired by the camera allows the finally obtained positioning and map to be mapped to a scale consistent with that of the real world.
On the other hand, as mentioned in the background section, fusing inertial navigation information and image information together also requires synchronizing the acquisition times of their measurements.
Typically, the IMU outputs data at above 100 Hz (i.e., more than 100 measurements per second), while image data are acquired at about 30 Hz (i.e., 30 frames per second). Synchronizing the IMU data with the image data therefore requires complex hardware design (e.g., a unified clock) for the inertial sensor and the image sensor; but since the two sensors on a terminal device usually come from different providers, the requirement of time synchronization cannot easily be met.
In view of this, a conventional method for fusing visual and inertial navigation information is generally performed on the assumption that the IMU data and the image data are always synchronized. FIG. 2 is a schematic diagram of synchronous acquisition of inertial navigation data and image data: as shown, it is assumed that at the acquisition times t_k, t_{k+1}, t_{k+2}, … of the image frames, the corresponding IMU data are acquired synchronously. In practice, however, the IMU data and the image data acquired by the terminal are generally not synchronized. FIG. 3 is a schematic diagram of asynchronous acquisition of inertial navigation data and image data: as shown in FIG. 3, there is a time offset t_d between the actual acquisition time t′_k of the k-th image frame and its nominal acquisition time t_k.
In view of this, another conventional method for fusing visual and inertial navigation information assumes that the time offset t_d is known; the former assumption of always-synchronized acquisition can then be regarded as the special case t_d = 0. In practice, however, the time offset t_d is generally unknown and changes dynamically, so that the fusion result of the visual and inertial navigation information is not accurate enough.
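To make the effect of a time offset concrete, the following Python sketch (illustrative only, not part of the patent; all names are assumptions) interpolates one IMU channel at the corrected image time t_k + t_d. With t_d unknown and time-varying, any fixed association between a frame and the nearest IMU sample would be systematically wrong, which is the problem the dynamic residual addresses.

```python
import bisect

def interpolate_imu(timestamps, values, t_query):
    """Linearly interpolate one IMU channel (e.g. gyro z-rate) at time t_query."""
    i = bisect.bisect_right(timestamps, t_query)
    if i == 0:
        return values[0]          # before first sample: clamp
    if i == len(timestamps):
        return values[-1]         # after last sample: clamp
    t0, t1 = timestamps[i - 1], timestamps[i]
    w = (t_query - t0) / (t1 - t0)
    return (1.0 - w) * values[i - 1] + w * values[i]

# Nominal frame time plus an estimated (in reality unknown, time-varying) offset t_d:
t_nominal, t_d = 0.100, 0.004            # seconds
imu_t  = [0.095, 0.100, 0.105, 0.110]    # 200 Hz IMU timestamps
gyro_z = [0.10, 0.20, 0.30, 0.40]        # angular rate about z, rad/s

w_at_actual = interpolate_imu(imu_t, gyro_z, t_nominal + t_d)  # sample at t_k + t_d
```

Sampling at t_nominal alone would return 0.20 rad/s, while the offset-corrected time yields 0.28 rad/s, illustrating how an uncompensated offset biases every visual measurement's associated motion state.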
Therefore, how to efficiently and accurately realize the fusion of the visual inertial navigation information under the condition that the acquisition of the inertial navigation information and the image information is not synchronous becomes a problem to be solved urgently.
In view of the foregoing related technologies, embodiments of the present invention provide a method and an apparatus for fusion of visual and inertial navigation information, a computer-readable storage medium, and an electronic device.
The principle and implementation details of the technical solution of the embodiments of the present invention are explained in detail below.
FIG. 4 is a flow chart illustrating a method for visual inertial navigation information fusion, according to an example embodiment. As shown in FIG. 4, the method may be performed by the electronic device shown in FIG. 1 and may include the following steps 410 to 450.
In step 410, image data and inertial navigation data are acquired from an image sensor and an inertial sensor of the terminal, respectively.
The image sensor herein includes an imaging device capable of detecting visible light, infrared light, or ultraviolet light, and includes, for example, a camera built in or externally attached to the terminal. Accordingly, the image data herein includes visible light, infrared light, or ultraviolet light imaging of the subject acquired by the image sensor. For the sake of simplicity, the following description will take an image sensor as the terminal camera and image data as the image captured by the terminal camera as an example.
In one embodiment, the terminal may include more than one image sensor, thereby providing more image data and remaining stable even if some of the image sensors fail. In addition, combining data from multiple image sensors can take into account the spatial relationships between the different image sensors, providing a more accurate fusion result.
In some embodiments, the image data may be an image captured with a global shutter, or an image captured with a rolling shutter and approximately converted to a global-shutter image. A shutter clears every pixel at the start of exposure and reads out the signal value after the exposure time elapses. With a global shutter, all pixels on the sensor are exposed and read out simultaneously; with a rolling shutter, the clearing, exposure and readout are performed sequentially line by line (usually from top to bottom). Compared with a global shutter, a rolling shutter tends to cause more severe image distortion, a phenomenon commonly referred to as the rolling-shutter effect (or "jello effect"). However, most current CMOS (Complementary Metal Oxide Semiconductor) camera sensors employ rolling shutters. For this reason, an image captured with a rolling shutter can be processed by a specific algorithm to approximate a global-shutter image; details can be found in another application filed by the applicant on the same day and are not repeated here.
The inertial sensors here may include, for example, an accelerometer capable of measuring the triaxial acceleration of the terminal, and a gyroscope capable of measuring its triaxial angular velocity. The inertial sensors may also include other inertial measurement units (IMU) capable of measuring acceleration and angular velocity. Accordingly, the inertial navigation data herein may include the acceleration and/or angular velocity measurements output by the inertial sensors. For simplicity, the following description takes as an example inertial sensors including an accelerometer and a gyroscope, with inertial navigation data including the acceleration and angular velocity they respectively acquire (collectively, IMU data).
Referring next to fig. 4, in step 430, a state estimate corresponding to the image data is obtained based on a time offset between a nominal generation time and an actual generation time of the image data.
In step 450, a state parameter set of the terminal is obtained based on the dynamic residual of the image data and the basic residual of the inertial navigation data.
Here, the dynamic residual is used to represent the difference between the state estimation obtained in step 430 and the state parameter set, and the basic residual is used to represent the difference between the integral estimation corresponding to the inertial navigation data and the state parameter set.
In some embodiments, steps 430 and 450 may be implemented based on a framework of non-linear optimization. An example of an algorithm for acquiring the set of status parameters at steps 430 and 450 is described in detail below.
First, let (·)^w denote the real-world coordinate system, and let b_k and c_k denote the IMU coordinate system and the camera coordinate system at the time t_k when the k-th image frame is captured. Let p^X_Y, v^X_Y and R^X_Y denote, respectively, the three-dimensional position, velocity and rotation from coordinate system Y to coordinate system X. A rotation R^X_Y can be represented by the corresponding Hamilton quaternion q^X_Y; this is merely a convenience for mathematical operation, and other rotation representations may be used for the same physical quantity. It is also assumed in this example that the image acquired by the camera has been rectified and that the intrinsic parameters (including, for example, focal length and principal point) are known. The displacement and rotation between the IMU data and the image are p^b_c and q^b_c, respectively. In addition, as mentioned above, there is a time offset t_d between the actual acquisition time t′_k of the k-th image frame and its nominal acquisition time t_k. In the original method, t_d is assumed to be known; based on the method provided by the embodiment of the invention, this condition can be relaxed, and t_d is assumed to be unknown and dynamically changing.
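As a side note on the Hamilton-quaternion rotation representation mentioned above, the following Python sketch (illustrative only, not from the patent) rotates a vector by a unit quaternion q = (w, x, y, z) via the equivalent rotation matrix:

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate 3-vector v by unit Hamilton quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    # Standard rotation matrix equivalent to the quaternion
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    return R @ v

# A 90-degree rotation about z maps the x axis onto the y axis:
s = np.sqrt(2) / 2
v_rot = quat_rotate((s, 0.0, 0.0, s), np.array([1.0, 0.0, 0.0]))  # ~ [0, 1, 0]
```

The same rotation could equally be stored as a rotation matrix or axis-angle vector, which is the point made above: the quaternion is one of several representations of the same physical quantity.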
Subsequently, the state parameter set X of the terminal may be defined as:

X = [x_0, x_1, …, x_n, p^b_c, q^b_c, P^w_0, P^w_1, …, P^w_m]
x_k = [p^w_{bk}, v^w_{bk}, q^w_{bk}, b^a_k, b^g_k]

where b^a_k and b^g_k are the acceleration and angular-velocity biases at time t_k, and P^w_j is the representation of feature point j in the real-world coordinate system.
In some embodiments, the objective of visual-inertial fusion is to obtain the state parameter set X that minimizes the following objective equation:

min_X { ‖b_p − H_p·X‖² + Σ_{i∈S_I} ‖r_I(ẑ_i, X)‖²_{Σ_i} + Σ_{i∈S_C} ‖r_C(ẑ_i, X)‖²_{Σ_c} }    (2)

where b_p and H_p represent the prior information; S_I represents the set of inertial navigation data; S_C represents the set of image data; r_I(ẑ_i, X) is the residual function of the IMU data, with corresponding covariance matrix Σ_i; and r_C(ẑ_i, X) is the residual function of the image data, with corresponding covariance matrix Σ_c.
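The shape of such an objective — a prior term plus Mahalanobis-weighted residual norms summed over the inertial and visual measurement sets — can be sketched as follows (an illustrative Python sketch; the function names and data layout are assumptions, not the patent's implementation):

```python
import numpy as np

def mahalanobis_sq(r, cov):
    """Squared Mahalanobis norm ||r||^2_Sigma = r^T Sigma^{-1} r."""
    return float(r @ np.linalg.solve(cov, r))

def total_cost(prior_r, imu_residuals, visual_residuals):
    """Evaluate a cost of the form prior + sum of weighted IMU and visual residuals.

    prior_r:          precomputed prior residual vector (b_p - H_p X)
    imu_residuals:    list of (residual, covariance) pairs over the IMU set
    visual_residuals: list of (residual, covariance) pairs over the image set
    """
    cost = float(prior_r @ prior_r)              # ||b_p - H_p X||^2
    for r, cov in imu_residuals:
        cost += mahalanobis_sq(r, cov)           # IMU terms, weighted by Sigma_i
    for r, cov in visual_residuals:
        cost += mahalanobis_sq(r, cov)           # visual terms, weighted by Sigma_c
    return cost
```

The covariance weighting is what lets noisy sensors contribute less: a residual measured with covariance 4·I counts a quarter as much as the same residual with identity covariance.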
To derive the residual function of the IMU, one can start from the propagation model of the IMU, given by the equations of motion:

p^w_{b(k+1)} = p^w_{bk} + v^w_{bk}·Δt_k − (1/2)·g^w·Δt_k² + R^w_{bk}·α^{bk}_{b(k+1)}
v^w_{b(k+1)} = v^w_{bk} − g^w·Δt_k + R^w_{bk}·β^{bk}_{b(k+1)}
q^w_{b(k+1)} = q^w_{bk} ⊗ γ^{bk}_{b(k+1)}

where Δt_k = t_{k+1} − t_k; g^w = [0, 0, 9.8]^T is the representation of the gravity vector in the world coordinate system; and α^{bk}_{b(k+1)}, β^{bk}_{b(k+1)} and γ^{bk}_{b(k+1)} are the relative quantities between t_k and t_{k+1}, obtained by integrating the acceleration and the angular velocity. In practice, since the measurement data obtained from the IMU are discrete, these relative quantities are generally approximated by numerical integration, during which the corresponding covariance matrix can also be computed. For the computation of these quantities and their covariance, reference may be made to Forster, C., Carlone, L., Dellaert, F., Scaramuzza, D., "IMU preintegration on manifold for efficient visual-inertial maximum-a-posteriori estimation", Proc. of Robotics: Science and Systems, which is not further described herein.
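As a toy illustration of the numerical integration just mentioned (a one-dimensional sketch, not the on-manifold preintegration scheme of the cited paper), the following Python code propagates a vertical position and velocity from discrete accelerometer samples, subtracting gravity in the same way as the g^w term above:

```python
def propagate(p0, v0, accels, dt, g=9.8):
    """Propagate 1-D vertical position/velocity from accelerometer samples.

    accels: specific-force readings along the world z axis (m/s^2);
    gravity g is subtracted, mirroring the g^w = [0, 0, 9.8]^T term.
    """
    p, v = p0, v0
    for a in accels:
        a_world = a - g                       # remove gravity from the measurement
        p += v * dt + 0.5 * a_world * dt * dt # constant-acceleration step
        v += a_world * dt
    return p, v

# Two samples at 10.8 m/s^2 (i.e. 1 m/s^2 net upward) over 0.1 s steps:
p_end, v_end = propagate(0.0, 0.0, [10.8, 10.8], dt=0.1)
```

A stationary device whose accelerometer reads exactly 9.8 m/s^2 propagates to zero net motion, while any bias in the readings integrates into position error quadratically — which is why the bias terms b^a_k, b^g_k appear in the state parameter set.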
Continuing, the residual function $r_c(\hat z^{c_k}_j, X)$ of the image data can be further understood as the difference between the projection of a feature point onto the camera plane and its observed value. Assuming a pinhole camera model, for a feature point $P^w_j$ in world coordinates, if the IMU and the acquisition of the image data are synchronized, its projection in the k-th image is:

$z^{c_k}_j = \begin{bmatrix} X^{c_k}_j / Z^{c_k}_j \\ Y^{c_k}_j / Z^{c_k}_j \end{bmatrix}$, with $[X^{c_k}_j, Y^{c_k}_j, Z^{c_k}_j]^T = R^b_c{}^T \big( R^w_{b_k}{}^T (P^w_j - p^w_{b_k}) - p^b_c \big)$

Accordingly, the visual residual can be further obtained as:

$r_c(\hat z^{c_k}_j, X) = z^{c_k}_j - \hat z^{c_k}_j$

where $z^{c_k}_j$ and $\hat z^{c_k}_j$ respectively represent the projection and observation values, in the k-th image data, of the feature point $P^w_j$ in the real-world coordinate system; $X^{c_k}_j$, $Y^{c_k}_j$ and $Z^{c_k}_j$ respectively represent the three-axis coordinates of the feature point $P^w_j$ converted into the coordinate system of the image sensor; and $q^b_c$ (with rotation matrix $R^b_c$) and $p^b_c$ represent the rotation and displacement between the image data and the inertial navigation data, respectively. Here, $\hat z^{c_k}_j$, the observation variable of the vision, can be obtained based on the point locations provided by a feature point tracker, and the covariance matrix $\Sigma_c$ can be set according to the performance of the feature point tracker. For a detailed description of the feature point tracker, reference is made to J. Shi and C. Tomasi, "Good features to track", published in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 1994, which is not described herein.
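As an illustrative sketch only (the function and variable names are assumptions, not the patent's), the visual residual under the pinhole model can be computed as follows: transform the world-frame feature point into the IMU frame, then into the camera frame, project onto the normalized image plane, and subtract the observation.

```python
import numpy as np

def visual_residual(P_w, p_wb, R_wb, p_bc, R_bc, z_obs):
    """Reprojection residual: normalized-plane projection minus observation.

    P_w:        feature point in world coordinates
    p_wb, R_wb: IMU position / rotation in the world frame
    p_bc, R_bc: camera-to-IMU extrinsic translation / rotation
    z_obs:      observed point on the normalized image plane
    """
    P_b = R_wb.T @ (P_w - p_wb)   # world -> IMU frame
    P_c = R_bc.T @ (P_b - p_bc)   # IMU -> camera frame
    z_proj = P_c[:2] / P_c[2]     # pinhole projection onto the normalized plane
    return z_proj - z_obs
```

With identity extrinsics and a point at $[1, 2, 4]^T$, the projection is $[0.25, 0.5]^T$, so a matching observation gives a zero residual.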
Next, in order to solve the objective equation (2), the Gauss-Newton method may be employed. Specifically, assume an initial value $X_0$ is available; it may be the estimate from the previous time instant or an initial value provided by a specific initialization method (see another Chinese patent application filed by the applicant on the same day). Based on this initial value, the objective equation (2) can be linearized with respect to the error state $\delta X$ and converted into iteratively solving for the minimum of the following objective equation:

$\min_{\delta X} \Big\{ \|b_p - H_p (X_0 \oplus \delta X)\|^2 + \sum_{k \in S_i} \big\|r_i + H_k\,\delta X\big\|^2_{\Sigma^{b_k}_{b_{k+1}}} + \sum_{(j,k) \in S_c} \big\|r_c + H^c_k\,\delta X\big\|^2_{\Sigma_c} \Big\}$

updating $X \leftarrow X \oplus \delta X$ until convergence, where $H_k$ and $H^c_k$ are the matrices of first derivatives of the residual function of the IMU data and of the residual function of the image data, respectively.
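A minimal generic sketch of the Gauss-Newton iteration described above, with assumed names (`residual`, `jacobian`); a real visual-inertial solver would operate on the full stacked residual vector, weight each block by its covariance, and use manifold updates for the rotation variables rather than plain addition.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=20, tol=1e-10):
    """Generic Gauss-Newton: solve J^T J dx = -J^T r, then update x += dx."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)                      # stacked residual vector at x
        J = jacobian(x)                      # its first-derivative matrix
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx                           # additive update (vector states)
        if np.linalg.norm(dx) < tol:
            break
    return x
```

For illustration, applying it to the scalar residual $x^2 - 2$ converges to $\sqrt{2}$, since on a scalar problem Gauss-Newton reduces to Newton's method.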
The derivation above is based on the assumption that the IMU data and the image data are acquired synchronously. If they are acquired asynchronously, equation (7) will become:

$r_c(\hat z^{c_k}_j, X) = z^{c_k}_j - \hat z^{c_k}_j$, with $[X^{c_k}_j, Y^{c_k}_j, Z^{c_k}_j]^T = R^b_c{}^T \big( R^w_{b_{t'_k}}{}^T (P^w_j - p^w_{b_{t'_k}}) - p^b_c \big)$

where $q^w_{b_{t'_k}}$ and $p^w_{b_{t'_k}}$ are the rotation and position of the image sensor at the actual acquisition time $t'_k$. In some embodiments, IMU integration may be used to compute $q^w_{b_{t'_k}}$ and $p^w_{b_{t'_k}}$ from $q^w_{b_k}$, $v^w_{b_k}$ and $p^w_{b_k}$:

$p^w_{b_{t'_k}} = p^w_{b_k} + v^w_{b_k}(t'_k - t_k) - \tfrac{1}{2} g^w (t'_k - t_k)^2 + R^w_{b_k}\,\alpha^{b_k}_{t'_k}$
$q^w_{b_{t'_k}} = q^w_{b_k} \otimes \gamma^{b_k}_{t'_k}$

where $\alpha^{b_k}_{t'_k}$ and $\gamma^{b_k}_{t'_k}$ are the IMU integration amounts from $t_k$ to $t'_k$:
$\alpha^{b_k}_{t'_k} = \iint_{t \in [t_k, t'_k]} R^{b_k}_{b_t} (\hat a_{b_t} - b_{a_k})\, dt^2$, and $\gamma^{b_k}_{t'_k}$ is obtained by integrating $\dot\gamma^{b_k}_{b_t} = \tfrac{1}{2}\, \gamma^{b_k}_{b_t} \otimes \begin{bmatrix} 0 \\ \hat\omega_{b_t} - b_{g_k} \end{bmatrix}$

where $\hat a_{b_t}$ and $\hat\omega_{b_t}$ are the instantaneous linear acceleration and rotational angular velocity at time $t$. After $t'_k$ is introduced into X, the error state $\delta X$ is extended accordingly. For a rotation variable, a minimal (three-parameter) representation $\delta\theta$ can be used, with the update $q \leftarrow q \otimes \delta q(\delta\theta)$; for the other variables, standard addition may be used.
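The short IMU integration from $t_k$ to the actual capture time $t'_k$ described above can be sketched as follows. This is an illustrative first-order propagation under assumed names (`pose_at_actual_time`, `accs`, `gyros`), not the patent's exact numerical scheme.

```python
import numpy as np

def pose_at_actual_time(p_wb, v_wb, R_wb, g_w, accs, gyros, dts):
    """Propagate the IMU pose from the nominal frame time t_k to the actual
    capture time t_k' by integrating the IMU samples in between."""
    p, v, R = p_wb.copy(), v_wb.copy(), R_wb.copy()
    for a, w, dt in zip(accs, gyros, dts):
        a_w = R @ a - g_w                      # acceleration in the world frame
        p = p + v * dt + 0.5 * a_w * dt * dt   # position update
        v = v + a_w * dt                       # velocity update
        # small-angle rotation update (first-order Rodrigues)
        wx = np.array([[0.0, -w[2], w[1]],
                       [w[2], 0.0, -w[0]],
                       [-w[1], w[0], 0.0]])
        R = R @ (np.eye(3) + wx * dt)
    return p, R
```

For a stationary-attitude segment moving at constant velocity (accelerometer reading only gravity, zero gyro), the position simply advances by $v\,(t'_k - t_k)$ and the rotation is unchanged.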
From the foregoing, when $t'_k$ is introduced, the residual function of the IMU data (also called the base residual) and its first derivative matrix $H_k$ remain unchanged, while the residual function of the image data (also called the dynamic residual) and its first derivative matrix both take their values at time $t'_k$. In particular, the first derivative of the dynamic residual with respect to time involves the instantaneous acceleration and angular velocity at $t'_k$; because $t'_k$ is itself an estimated quantity, the values calculated above change continuously as the iteration and update process proceeds. For this reason, the applicant proposes a new fast algorithm to calculate this formula; for details, reference can be made to the content of another application filed by the applicant on the same date, which is not described again herein.
As described above, step 450 constructs the objective functions (2) and (9) for the state parameter set based on the dynamic residual of the image data and the base residual of the IMU data.
According to the method for fusing visual inertial navigation information provided by this embodiment of the invention, by introducing a dynamic visual residual associated with the time offset of image data acquisition, visual inertial navigation fusion can be applied to asynchronously acquired inertial navigation data and image data.
FIG. 5 is a flow chart illustrating a method for visual inertial navigation information fusion, according to another example embodiment. As shown in FIG. 5, the method may be performed by the electronic device shown in FIG. 1 and may include the following steps 510-550.
In step 510, image data and inertial navigation data are acquired from an image sensor and an inertial sensor of the terminal, respectively.
The details of step 510 can be found in the detailed description of step 410 in the embodiment of fig. 4, and are not repeated here.
In step 530, a temporal residual corresponding to a temporal offset between a nominal generation time and an actual generation time of the image data is obtained.
As already mentioned above in connection with FIG. 3, there is a time offset $t^d_k = t'_k - t_k$ between the actual acquisition time $t'_k$ of an image and its nominal acquisition time $t_k$. Because the time offset not only changes with the system load but is also affected by sensor stalls, $t^d_k$ is generally unknown and dynamically changing. For example, the time offsets $t^d_k$ and $t^d_{k+1}$ of the adjacent k-th and (k+1)-th image data may also differ from each other.
In some embodiments of the present invention, in order to consider the above time offset and its dynamic change in the process of fusion of the visual inertial navigation information, a time residual corresponding to the time offset may be introduced in the target equation (2).
For this purpose, consider that the time offset $t^d$ is a quantity that changes constantly over time; it can then be modeled using a Gaussian random walk model:

$\dot t^d(t) = n_0$

where $n_0$ is Gaussian noise with zero mean and covariance matrix $\Sigma_0$.

Since the optimization target (the state parameter set) of the objective equation (2) is a set of discrete values, the Gaussian noise can be integrated between the acquisition times of two adjacent image data to obtain the following constraint between the time offsets $t^d_k$ and $t^d_{k+1}$ of two adjacent images:

$t^d_{k+1} = t^d_k + n_k$ (20)
$\Sigma_k = \Sigma_0\,\Delta t_k$ (21)

where $n_k$ and $\Sigma_k$ are respectively the discrete noise and its covariance matrix, and $\Delta t_k = t_{k+1} - t_k$ is the time period between the acquisition times of two adjacent image data.
Thus, based on the constraints of equations (20) and (21), the time residual corresponding to the time offset can be obtained as:

$r_t(t^d_k, t^d_{k+1}) = t^d_{k+1} - t^d_k$, with covariance matrix $\Sigma_0\,\Delta t_k$ (22)
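As an illustrative sketch (function names assumed, not the patent's), the time residual between adjacent frames and its contribution to the objective, weighted by the discrete covariance $\Sigma_0\,\Delta t_k$, can be written as:

```python
import numpy as np

def time_residual(td_k, td_k1):
    """Random-walk constraint between adjacent time offsets: r_t = td_{k+1} - td_k."""
    return td_k1 - td_k

def time_cost(td, sigma0, frame_times):
    """Sum of squared time residuals, each weighted by the discrete covariance
    Sigma_k = sigma0 * dt_k obtained by integrating the continuous noise."""
    cost = 0.0
    for k in range(len(td) - 1):
        dt_k = frame_times[k + 1] - frame_times[k]   # nominal inter-frame period
        r = time_residual(td[k], td[k + 1])
        cost += r * r / (sigma0 * dt_k)              # Mahalanobis-weighted term
    return cost
```

A constant sequence of offsets incurs zero cost, while a jump between adjacent offsets is penalized in inverse proportion to $\Sigma_0\,\Delta t_k$, which is what suppresses jumps in the time offset estimate.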
in step 550, an objective function of the state parameter set is constructed based on the temporal residual, the dynamic residual, and the basic residual, and a state parameter set that makes the objective function extremal is obtained.
Here, by introducing the time residual on the basis of the objective equation (2), an updated objective function can be obtained as follows:

$\min_X \Big\{ \|b_p - H_p X\|^2 + \sum_{k \in S_i} \big\|r_i(\hat z^{b_k}_{b_{k+1}}, X)\big\|^2_{\Sigma^{b_k}_{b_{k+1}}} + \sum_{(j,k) \in S_c} \big\|r_c(\hat z^{c_k}_j, X)\big\|^2_{\Sigma_c} + \sum_k \big\|t^d_{k+1} - t^d_k\big\|^2_{\Sigma_0 \Delta t_k} \Big\}$ (23)
To obtain the state parameter set that makes the objective function (23) take its extreme value, the calculation process may refer to the detailed descriptions of step 430 and step 450 in the embodiment of FIG. 4, which are not repeated here. It need only be added that the optimization additionally takes the time residual $r_t(t^d_k, t^d_{k+1})$ into account, whose first derivative matrix with respect to $(t^d_k, t^d_{k+1})$ is $[-1, 1]$.
According to the method for fusing visual inertial navigation information provided by this embodiment of the invention, by introducing a dynamic visual residual associated with the time offset of image data acquisition, visual inertial navigation fusion can be applied to asynchronously acquired inertial navigation data and image data.
On the other hand, by introducing a residual function between the time offsets of two adjacent image data, jumps in the time offset estimate can be avoided and the property that the time offset estimate changes continuously over time is preserved, further improving the reliability and accuracy of the visual inertial navigation information fusion.
The following is an embodiment of an apparatus of the present invention, which can be used to execute the above embodiment of the method for fusion of visual inertial navigation information of the present invention. For details that are not disclosed in the embodiments of the apparatus of the present invention, please refer to the above-mentioned embodiments of the method for fusion of the visual inertial navigation information of the present invention.
FIG. 6 is a block diagram illustrating an apparatus for visual inertial navigation information fusion, according to an example embodiment. As shown in fig. 6, the apparatus, which may be implemented by the electronic device shown in fig. 1, may include an obtaining module 610, an estimating module 630, and a fusing module 650.
The obtaining module 610 is configured to obtain image data and inertial navigation data from an image sensor and an inertial sensor of the terminal, respectively; the estimation module 630 is configured to obtain a state estimation corresponding to the image data according to a time offset between a nominal generation time and an actual generation time of the image data; the fusion module 650 is configured to obtain a state parameter set of the terminal based on the dynamic residual of the image data and the basic residual of the inertial navigation data. Wherein a dynamic residual represents a difference between the state estimate and the state parameter set, and a base residual represents a difference between an integral estimate corresponding to the inertial navigation data and the state parameter set.
In one embodiment, the fusion module 650 includes an objective function unit and a solving unit. The objective function unit is configured to construct an objective function of the set of state parameters based on the dynamic residual and the base residual; the solving unit is configured to acquire the state parameter set that makes the objective function take an extreme value.
In one embodiment, the fusion module 650 may further include a time offset unit for obtaining a time residual corresponding to the time offset. Accordingly, the objective function unit may construct an objective function of the state parameter set based on the temporal residual, the dynamic residual, and the base residual.
In one embodiment, the specific operation of the time offset unit to obtain the time residual may include: modeling the time offset as Gaussian noise with a mean value of zero based on a Gaussian random walk model; integrating the Gaussian noise between the acquisition moments of two adjacent image data to obtain a constraint condition between the time offsets of the two adjacent image data; and acquiring a time residual corresponding to the time offset according to the constraint condition.
In one embodiment, the time offset unit may construct the time residual as a function as shown in equation (22) above.
In one embodiment, the objective function unit may construct the objective function of the set of state parameters as shown in equation (2) above.
In one embodiment, the dynamic residuals introduced when the objective function unit constructs the objective function of the state parameter set may be constructed as shown in the above equations (8) and (10), and the base residuals may be constructed as shown in the above equation (6).
For calculation examples, reference may be made to the above detailed descriptions of equations (1)-(23), which are not repeated here.
In summary, according to the apparatus for fusion of visual inertial navigation information provided by the embodiment of the present invention, by introducing a dynamic visual residual associated with a time offset of image data acquisition, the visual inertial navigation information fusion can be applied to the asynchronously acquired inertial navigation data and image data.
On the other hand, by introducing a residual function between the time offsets of two adjacent image data, jumps in the time offset estimate can be avoided and the property that the time offset estimate changes continuously over time is preserved, further improving the reliability and accuracy of the fusion of the visual inertial navigation information.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units. The components shown as modules or units may or may not be physical units, i.e. may be located in one place or may also be distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiment of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (9)
1. A method for fusion of visual inertial navigation information, the method comprising:
respectively acquiring image data and inertial navigation data from an image sensor and an inertial sensor of a terminal;
acquiring state estimation corresponding to the image data according to time offset between the nominal generation time and the actual generation time of the image data;
acquiring a time residual corresponding to the time offset;
and constructing an objective function of the state parameter set of the terminal based on the time residual, the dynamic residual of the image data and the basic residual of the inertial navigation data, and acquiring the state parameter set enabling the objective function to take an extreme value, wherein the dynamic residual represents a difference between the state estimation and the state parameter set of the terminal, and the basic residual represents a difference between an integral estimation corresponding to the inertial navigation data and the state parameter set of the terminal.
2. The method of claim 1, wherein said obtaining a time residual corresponding to said time offset comprises:
modeling the time offset based on a Gaussian random walk model to enable the derivative of the time offset to be Gaussian noise with zero mean;
integrating the Gaussian noise between the acquisition moments of two adjacent image data to obtain a constraint condition between the time offsets of the two adjacent image data; and
and acquiring a time residual corresponding to the time offset according to the constraint condition.
3. The method of claim 2, wherein said obtaining a time residual corresponding to said time offset comprises:
obtaining the time residual as:
4. The method of claim 3, wherein constructing the objective function for the set of state parameters for the terminal based on the temporal residual, the dynamic residual of the image data, and the base residual of the inertial navigation data comprises:
constructing the objective function as:
wherein $b_p$ and $H_p$ represent prior information; $S_i$ represents the set of inertial navigation data; $S_c$ represents the set of image data; $\hat z^{b_k}_{b_{k+1}}$ represents the integral relative quantity of the inertial navigation data between the acquisition times of the k-th and (k+1)-th image data; X represents the state parameter set of the terminal; $r_i(\hat z^{b_k}_{b_{k+1}}, X)$ represents the base residual, with corresponding covariance matrix $\Sigma^{b_k}_{b_{k+1}}$; $\hat z^{c_k}_j$ represents the state estimate corresponding to the k-th image data; and $r_c(\hat z^{c_k}_j, X)$ represents the dynamic residual, with corresponding covariance matrix $\Sigma_c$.
5. The method of claim 4, wherein constructing the objective function for the set of state parameters for the terminal based on the temporal residual, the dynamic residual of the image data, and the base residual of the inertial navigation data further comprises:
wherein $z^{c_k}_i$ and $\hat z^{c_k}_i$ respectively represent the projection and observation values, in the k-th image data, of the i-th feature point $P^w_i$ in the real-world coordinate system; $X^{c_k}_i$, $Y^{c_k}_i$ and $Z^{c_k}_i$ respectively represent the three-axis coordinates of the feature point $P^w_i$ converted into the coordinate system of the image sensor; $q^b_c$ and $p^b_c$ respectively represent the rotation and displacement between the image data and the inertial navigation data; and $q^w_{b_{t'_k}}$ and $p^w_{b_{t'_k}}$ respectively represent the rotation and position of the image sensor at the actual generation time.
6. The method of claim 4, wherein constructing the objective function for the set of state parameters for the terminal based on the temporal residual, the dynamic residual of the image data, and the base residual of the inertial navigation data further comprises:
wherein $\alpha^{b_k}_{b_{k+1}}$ and $\gamma^{b_k}_{b_{k+1}}$ represent the integral relative quantities of the inertial navigation data between the acquisition times of the k-th and (k+1)-th image data; $\Delta t_k$ represents the time difference between the acquisition times of the k-th and (k+1)-th image data; $g^w$ is the gravitational acceleration in the real-world coordinate system; $p^w_{b_k}$, $v^w_{b_k}$ and $q^{b_k}_w$ respectively denote, at the acquisition time $t_k$ of the k-th image data, the position from the inertial sensor coordinate system to the real-world coordinate system, the velocity from the inertial sensor coordinate system to the real-world coordinate system, and the rotation from the real-world coordinate system to the inertial sensor coordinate system; $q^w_{b_k}$ denotes the rotation at the acquisition time $t_k$ from the inertial sensor coordinate system to the real-world coordinate system; and $q^w_{b_{k+1}}$ denotes the rotation at the acquisition time $t_{k+1}$ of the (k+1)-th image data from the inertial sensor coordinate system to the real-world coordinate system.
7. An apparatus for fusion of visual inertial navigation information, the apparatus comprising:
the first acquisition module is used for respectively acquiring image data and inertial navigation data from an image sensor and an inertial sensor of the terminal;
the estimation module is used for acquiring state estimation corresponding to the image data according to the time offset between the nominal generation time and the actual generation time of the image data;
a second obtaining module, configured to obtain a time residual corresponding to the time offset;
and a fusion module, configured to construct an objective function of a state parameter set of the terminal based on the temporal residual, a dynamic residual of the image data, and a basic residual of the inertial navigation data, and obtain the state parameter set enabling the objective function to take an extreme value, where the dynamic residual represents a difference between the state estimate and the state parameter set, and the basic residual represents a difference between an integral estimate corresponding to the inertial navigation data and the state parameter set.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method for visual inertial navigation information fusion according to any one of claims 1 to 6.
9. An electronic device, comprising:
a processor; and
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method for visual inertial navigation information fusion according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811014745.8A CN109186592B (en) | 2018-08-31 | 2018-08-31 | Method and device for visual and inertial navigation information fusion and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109186592A CN109186592A (en) | 2019-01-11 |
CN109186592B true CN109186592B (en) | 2022-05-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |