CN113392904B - LTC-DNN-based visual inertial navigation combined navigation system and self-learning method - Google Patents

LTC-DNN-based visual inertial navigation combined navigation system and self-learning method

Info

Publication number
CN113392904B
CN113392904B
Authority
CN
China
Prior art keywords
ltc
layer
inertial navigation
visual
rnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110664888.9A
Other languages
Chinese (zh)
Other versions
CN113392904A (en)
Inventor
胡斌杰
丘金光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110664888.9A priority Critical patent/CN113392904B/en
Publication of CN113392904A publication Critical patent/CN113392904A/en
Application granted granted Critical
Publication of CN113392904B publication Critical patent/CN113392904B/en
Priority to PCT/CN2022/112625 priority patent/WO2022262878A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 - Selection of the most significant subset of features
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G01C21/1656 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an LTC-DNN-based visual inertial navigation combined navigation system and a self-learning method. The system comprises a deep learning network model consisting of a visual feature extraction module, an inertial navigation feature extraction module and a pose regression module. The visual feature extraction module extracts visual features from two adjacent RGB frames; the inertial navigation feature extraction module extracts inertial navigation features from the inertial navigation data; the pose regression module comprises an attention mechanism fusion submodule, a liquid time-constant recurrent neural network (LTC-RNN) and a fully connected regression submodule, and predicts the relative displacement and relative rotation. The disclosed self-learning method is used to train the visual inertial navigation combined navigation system and, compared with algorithms of the same type, reduces the dependence on real labels; the deep learning network model estimates the relative displacement and relative pose with high accuracy and is robust to data corruption.

Description

LTC-DNN-based visual inertial navigation combined navigation system and self-learning method
Technical Field
The invention relates to the technical field of sensor fusion and motion estimation, in particular to a visual inertial navigation combined navigation system based on LTC-DNN and a self-learning method.
Background
With the continuous development of autonomous driving and unmanned aerial vehicles, high-precision, highly robust positioning is an important prerequisite for tasks such as autonomous navigation and exploration of unknown areas. A purely visual odometry method acquires information about the surrounding environment with a visual sensor and estimates the motion state by analysing the visual data; however, once occlusions appear in the scene or visual data are lost during transmission, the motion-state estimate, whose error is already considerable, is severely disturbed. A visual inertial odometer adds Inertial Measurement Unit (IMU) information to a purely visual odometer and can improve the accuracy of motion-state estimation when visual data are lost.
In recent years, deep learning techniques have achieved great success in computer vision and are widely used in many fields. Visual inertial navigation combined navigation is a regression task and can likewise be trained with deep learning methods. However, existing deep-learning-based visual inertial combined navigation algorithms are limited by the number of real labels available during training and generalise poorly; they also require a large number of trainable parameters, which greatly restricts their practical application.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides an LTC-DNN-based visual inertial navigation combined navigation system and a self-learning method.
The first purpose of the invention can be achieved by adopting the following technical scheme:
An LTC-DNN-based visual inertial navigation combined navigation system, used for automatic driving and autonomous navigation of unmanned aerial vehicles, comprises a deep learning network model consisting of a visual feature extraction module, an inertial navigation feature extraction module and a pose regression module that are connected in sequence, wherein
the visual feature extraction module is used to extract 1024-dimensional visual features; its input is two adjacent RGB frames stacked along the channel dimension, and its output is the 1024-dimensional visual feature;
the inertial navigation feature extraction module comprises a first single-layer LTC-RNN with a 1024-dimensional hidden state; its input is the inertial navigation data between the two adjacent RGB frames, and its output is a 1024-dimensional inertial navigation feature;
the pose regression module comprises an attention mechanism fusion submodule, a second single-layer LTC-RNN with a 1000-dimensional hidden state and a fully connected regression submodule that are connected in sequence; the input of the attention mechanism fusion submodule is the concatenated feature obtained by concatenating the visual feature and the inertial navigation feature, and it weights the two features to obtain a weighted fusion feature; the input of the second single-layer LTC-RNN is the weighted fusion feature, and its output is a regression feature; the input of the fully connected regression submodule is the regression feature, and its output is the estimate of the relative displacement and relative rotation.
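For illustration only, the following is a minimal PyTorch-style sketch of how the three modules described above could be wired together; the class and attribute names (LTCVIO, visual_encoder, imu_ltc, fusion, pose_ltc, regressor) are hypothetical placeholders and are not part of the disclosure itself.

```python
import torch
import torch.nn as nn

class LTCVIO(nn.Module):
    """Hypothetical wiring of the three modules: vision CNN, IMU LTC-RNN, pose regression."""
    def __init__(self, visual_encoder, imu_ltc, fusion, pose_ltc, regressor):
        super().__init__()
        self.visual_encoder = visual_encoder  # stacked RGB pair -> 1024-d visual feature
        self.imu_ltc = imu_ltc                # IMU data between the frames -> 1024-d inertial feature
        self.fusion = fusion                  # attention weighting of the concatenated features
        self.pose_ltc = pose_ltc              # second single-layer LTC-RNN (1000-d hidden state)
        self.regressor = regressor            # fully connected regression submodule -> 6-DoF output

    def forward(self, rgb_pair, imu_seq):
        v = self.visual_encoder(rgb_pair)     # (B, 1024)
        i = self.imu_ltc(imu_seq)             # (B, 1024)
        fused = self.fusion(v, i)             # weighted fusion feature, (B, 2048)
        h0 = torch.zeros(fused.size(0), 1000, device=fused.device)  # assumed zero initial hidden state
        reg = self.pose_ltc(fused, h0)        # regression feature, (B, 1000)
        return self.regressor(reg)            # (B, 6): relative displacement and relative rotation
```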
Further, the visual feature extraction module is formed by stacking 10 convolutional layers. The kernel sizes of the first three layers are 7 × 7, 5 × 5 and 5 × 5 in turn, and the kernel sizes of the remaining seven layers are all 3 × 3; the convolution stride of the fourth, sixth and eighth layers is 1, and the stride of the remaining layers is 2. All 10 convolutional layers use the ReLU activation function.
Further, the RGB pictures are resized to 416 × 128 before being input to the visual feature extraction module.
Further, the calculation formula of the first single-layer LTC-RNN and the second single-layer LTC-RNN is as follows:
h(t + Δt) = [h(t) + Δt · f(h(t), x(t), t, θ)] / [1 + Δt · (1/τ + f(h(t), x(t), t, θ))]
where h(t) is the hidden state of the LTC-RNN at the current time, τ is a constant time constant, Δt is the time step, x(t) is the input data at the current time, f(h(t), x(t), t, θ) is a deep learning network, θ are its trainable parameters, and t is the current time. At the start of each computation, the first and second single-layer LTC-RNNs feed the data x(t) and h(t) into the above formula, then use the current output h(t + Δt) as the input h(t) of the next pass and continue the computation, repeating it 6 times; the output h(t + Δt) of the 6th pass is taken as the computation result of the first or second single-layer LTC-RNN.
Furthermore, the attention mechanism fusion submodule comprises two sub-networks with the same structure, each formed by stacking two fully connected layers: the first fully connected layer has dimension 2048 and is followed by a ReLU activation function, and the second fully connected layer has dimension 1024 and is followed by a Sigmoid activation function.
Further, the fully connected regression submodule consists of four fully connected layers whose dimensions are 512, 128, 64 and 6, respectively; each of the first three layers is followed by a ReLU activation function, and the fourth layer is not followed by any activation function.
The other purpose of the invention can be achieved by adopting the following technical scheme:
a self-learning method of an LTC-DNN based visual inertial navigation combination navigation system comprises the following steps:
S1, standardizing the real labels, which contain the real relative displacement and relative rotation, to a standard normal distribution to obtain real standard labels together with the corresponding statistics mean 1 and variance 1, and performing the first training of the deep learning network model with the real standard labels;
S2, using the deep learning network model after the first training to predict on unlabeled data, and applying a first inverse standardization to the predictions with mean 1 and variance 1 to obtain pseudo labels;
S3, randomly selecting a number of pseudo labels and real labels and mixing them at a ratio of 0.2:1 to obtain mixed labels;
S4, standardizing the mixed labels to a standard normal distribution to obtain mixed standard labels together with the corresponding statistics mean 2 and variance 2, and performing the second training of the deep learning network model with the mixed standard labels.
Further, the pseudo label, the real label and the mixed label comprise relative displacement and relative rotation on x, y and z axes.
Further, converting the real labels and the mixed labels to a standard normal distribution means standardizing the relative displacement and the relative rotation on the x, y and z axes separately.
Further, the deep learning network model is trained with an Adam optimizer whose momentum is set to (0.9, 0.99); the learning rate of the first and second single-layer LTC-RNNs is set to 0.001, and the learning rate of the remaining modules is set to 0.00001; the loss function is smooth_l1_loss.
Compared with the prior art, the invention has the following advantages and effects:
(1) The invention provides an LTC-DNN-based visual inertial navigation combined navigation system comprising a deep learning network model; the model introduces a first single-layer LTC-RNN and a second single-layer LTC-RNN, which reduces the number of trainable parameters of the deep learning network model and improves its robustness.
(2) The invention provides a self-learning method for the LTC-DNN-based visual inertial navigation combined navigation system which, compared with algorithms of the same type, reduces the dependence on real labels.
Drawings
Fig. 1 is a schematic structural diagram of the deep learning network model in the LTC-DNN-based visual inertial navigation combined navigation system disclosed in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an attention mechanism fusion submodule in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a fully connected regression submodule in an embodiment of the present invention;
fig. 4 is a flowchart of a self-learning method of a visual inertial navigation integrated navigation system based on LTC-DNN disclosed in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Example one
This embodiment discloses an LTC-DNN-based visual inertial navigation combined navigation system; fig. 1 is a schematic structural diagram of the deep learning network model in this system.
Referring to fig. 1, the deep learning network model is composed of a visual feature extraction module, an inertial navigation feature extraction module and a pose regression module which are sequentially connected.
The visual feature extraction module is used to extract 1024-dimensional visual features; its input is two adjacent RGB frames stacked along the channel dimension, and its output is the 1024-dimensional visual feature.
The visual feature extraction module is formed by stacking 10 convolutional layers. The kernel sizes of the first three layers are 7 × 7, 5 × 5 and 5 × 5 in turn, and the kernel sizes of the remaining seven layers are all 3 × 3; the convolution stride of the fourth, sixth and eighth layers is 1, and the stride of the remaining layers is 2. All 10 convolutional layers use the ReLU activation function.
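By way of illustration, a minimal PyTorch-style sketch of such a 10-layer convolutional extractor is given below; the kernel sizes and strides follow the description above, while the channel widths, padding and the final pooling/projection to 1024 dimensions are assumptions that this paragraph does not specify.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k, stride):
    # convolution + ReLU; padding chosen to roughly preserve spatial extent before striding
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=k, stride=stride, padding=k // 2),
        nn.ReLU(inplace=True),
    )

class VisualEncoder(nn.Module):
    """10-layer CNN; kernel sizes and strides follow the text, channel widths are assumed."""
    def __init__(self, out_dim=1024):
        super().__init__()
        # (kernel, stride, out_channels); layers 4, 6 and 8 use stride 1, the rest stride 2
        cfg = [(7, 2, 64), (5, 2, 128), (5, 2, 256), (3, 1, 256), (3, 2, 512),
               (3, 1, 512), (3, 2, 512), (3, 1, 512), (3, 2, 1024), (3, 2, 1024)]
        layers, c_in = [], 6          # two RGB frames stacked along the channel axis
        for k, s, c_out in cfg:
            layers.append(conv_block(c_in, c_out, k, s))
            c_in = c_out
        self.cnn = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)   # assumed reduction of the feature map
        self.proj = nn.Linear(c_in, out_dim)  # assumed projection to the 1024-d visual feature

    def forward(self, x):                     # x: (B, 6, 128, 416)
        f = self.pool(self.cnn(x)).flatten(1)
        return self.proj(f)                   # (B, 1024)
```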
The inertial navigation feature extraction module comprises a first single-layer LTC-RNN (liquid time constant recurrent neural network) with 1024-dimensional hidden states; the input of the inertial navigation feature extraction module is inertial navigation data between the two adjacent frames of RGB pictures, and the output is 1024-dimensional inertial navigation features;
The pose regression module comprises an attention mechanism fusion submodule, a second single-layer LTC-RNN with a 1000-dimensional hidden state and a fully connected regression submodule that are connected in sequence; the input of the attention mechanism fusion submodule is the concatenated feature obtained by concatenating the visual feature and the inertial navigation feature, and it weights the two features to obtain a weighted fusion feature; the input of the second single-layer LTC-RNN is the weighted fusion feature, and its output is a regression feature; the input of the fully connected regression submodule is the regression feature, and its output is the estimate of the relative displacement and relative rotation.
The calculation formula of the first single-layer LTC-RNN and the second single-layer LTC-RNN is as follows:
h(t + Δt) = [h(t) + Δt · f(h(t), x(t), t, θ)] / [1 + Δt · (1/τ + f(h(t), x(t), t, θ))]
where h(t) is the hidden state of the LTC-RNN at the current time, τ is a constant time constant, Δt is the time step, x(t) is the input data at the current time, f(h(t), x(t), t, θ) is a deep learning network, θ are its trainable parameters, and t is the current time. At the start of each computation, the first and second single-layer LTC-RNNs feed the data x(t) and h(t) into the above formula, then use the current output h(t + Δt) as the input h(t) of the next pass and continue the computation, repeating it 6 times; the output h(t + Δt) of the 6th pass is taken as the computation result of the first or second single-layer LTC-RNN.
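The iterative computation can be sketched as follows (assuming PyTorch and a simple fully connected form for f(h(t), x(t), t, θ), which the text does not specify; only the fused update and its 6-fold repetition follow the description above):

```python
import torch
import torch.nn as nn

class LTCCell(nn.Module):
    """Single-layer LTC-RNN step, repeated 6 times as described above (sketch)."""
    def __init__(self, input_dim, hidden_dim, tau=1.0, dt=0.1, n_iter=6):
        super().__init__()
        # f(h(t), x(t), t, theta): the bounded MLP form and the tau/dt values are assumptions
        self.f = nn.Sequential(
            nn.Linear(input_dim + hidden_dim, hidden_dim),
            nn.Tanh(),
        )
        self.tau, self.dt, self.n_iter = tau, dt, n_iter

    def forward(self, x, h):
        for _ in range(self.n_iter):          # repeat the fused update 6 times
            f_val = self.f(torch.cat([x, h], dim=-1))
            h = (h + self.dt * f_val) / (1.0 + self.dt * (1.0 / self.tau + f_val))
        return h                              # h(t + dt) of the 6th pass
```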
Fig. 2 is a schematic structural diagram of the attention mechanism fusion submodule according to an embodiment of the present invention. Referring to fig. 2, the attention mechanism fusion submodule comprises two sub-networks with the same structure, each formed by stacking two fully connected layers: the first fully connected layer has dimension 2048 and is followed by a ReLU activation function, and the second fully connected layer has dimension 1024 and is followed by a Sigmoid activation function.
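A minimal sketch of such a fusion submodule is shown below (assuming PyTorch; the layer sizes follow the description, while applying the two sigmoid outputs as elementwise weights on the visual and inertial features is an assumption about how the weighting is carried out):

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Two identical sub-networks (2048 -> ReLU, 1024 -> Sigmoid) gating the two features."""
    def __init__(self, feat_dim=1024):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.Linear(2 * feat_dim, 2048), nn.ReLU(inplace=True),
                nn.Linear(2048, feat_dim), nn.Sigmoid(),
            )
        self.visual_gate = gate()
        self.imu_gate = gate()

    def forward(self, visual_feat, imu_feat):                    # each (B, 1024)
        concat = torch.cat([visual_feat, imu_feat], dim=-1)      # concatenated feature (B, 2048)
        w_v = self.visual_gate(concat)                           # weights for the visual feature
        w_i = self.imu_gate(concat)                              # weights for the inertial feature
        return torch.cat([w_v * visual_feat, w_i * imu_feat], dim=-1)  # weighted fusion feature (B, 2048)
```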
Fig. 3 is a schematic structural diagram of the fully connected regression submodule according to an embodiment of the present invention. Referring to fig. 3, the fully connected regression submodule consists of four fully connected layers whose dimensions are 512, 128, 64 and 6, respectively; each of the first three layers is followed by a ReLU activation function, and the fourth layer is not followed by any activation function.
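A corresponding sketch of the fully connected regression submodule (assuming PyTorch; the layer dimensions follow the description, and the 1000-dimensional input is taken from the regression feature produced by the second single-layer LTC-RNN):

```python
import torch.nn as nn

class Regressor(nn.Module):
    """Four fully connected layers (512, 128, 64, 6); ReLU after the first three only."""
    def __init__(self, in_dim=1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 6),   # 3-d relative displacement + 3-d relative rotation
        )

    def forward(self, regression_feature):
        return self.net(regression_feature)
```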
Example two
This embodiment discloses a self-learning method of the LTC-DNN-based visual inertial navigation combined navigation system disclosed in Example one. Fig. 4 is a flowchart of the self-learning method according to an embodiment of the present invention; referring to fig. 4, the self-learning method comprises the following four steps:
S1, standardizing the real labels, which contain the real relative displacement and relative rotation, to a standard normal distribution to obtain real standard labels together with the corresponding statistics mean 1 and variance 1, and performing the first training of the deep learning network model with the real standard labels;
S2, using the deep learning network model after the first training to predict on unlabeled data, and applying a first inverse standardization to the predictions with mean 1 and variance 1 to obtain pseudo labels;
S3, randomly selecting a number of pseudo labels and real labels and mixing them at a ratio of 0.2:1 to obtain mixed labels;
S4, standardizing the mixed labels to a standard normal distribution to obtain mixed standard labels together with the corresponding statistics mean 2 and variance 2, and performing the second training of the deep learning network model with the mixed standard labels.
The pseudo label, the real label and the mixed label comprise relative displacement and relative rotation on x, y and z axes.
Converting the real labels and the mixed labels to a standard normal distribution means standardizing the relative displacement and the relative rotation on the x, y and z axes separately.
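For illustration, steps S1 to S4 might be organized as in the following sketch; train_model and predict are hypothetical placeholders for the training and inference routines, and the standardization is applied per axis as described above.

```python
import numpy as np

def standardize(labels):
    """Standardize each of the 6 label columns (x/y/z displacement and rotation) separately."""
    mean, std = labels.mean(axis=0), labels.std(axis=0)
    return (labels - mean) / std, mean, std

def self_learning(model, train_model, predict,
                  labeled_inputs, real_labels, unlabeled_inputs):
    """S1-S4 of the self-learning method; train_model and predict are caller-supplied routines."""
    # S1: first training on standardized real labels
    real_std, mean1, std1 = standardize(real_labels)             # real_labels: (N, 6)
    train_model(model, labeled_inputs, real_std)

    # S2: predict on unlabeled data and invert the first standardization -> pseudo labels
    pseudo_labels = predict(model, unlabeled_inputs) * std1 + mean1

    # S3: randomly mix pseudo labels and real labels at a ratio of 0.2 : 1
    n_pseudo = int(0.2 * len(real_labels))
    idx = np.random.choice(len(pseudo_labels), n_pseudo, replace=False)
    mixed_inputs = np.concatenate([unlabeled_inputs[idx], labeled_inputs], axis=0)
    mixed_labels = np.concatenate([pseudo_labels[idx], real_labels], axis=0)

    # S4: second training on the re-standardized mixed labels
    mixed_std, mean2, std2 = standardize(mixed_labels)
    train_model(model, mixed_inputs, mixed_std)
    return mean2, std2          # kept for the second inverse standardization at inference time
```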
The deep learning network model is trained with an Adam optimizer whose momentum is set to (0.9, 0.99); the learning rate of the first and second single-layer LTC-RNNs is set to 0.001, and the learning rate of the remaining modules is set to 0.00001; the loss function is smooth_l1_loss.
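A possible PyTorch expression of this training configuration is sketched below; the module attribute names follow the earlier sketches, and the momentum pair (0.9, 0.99) is read here as Adam's betas, which is an interpretation rather than an explicit statement of the text.

```python
import torch
import torch.nn.functional as F

def build_optimizer(model):
    """Adam with betas (0.9, 0.99); the two LTC-RNN modules get lr 0.001, all other modules 0.00001."""
    ltc_params = list(model.imu_ltc.parameters()) + list(model.pose_ltc.parameters())
    ltc_ids = {id(p) for p in ltc_params}
    other_params = [p for p in model.parameters() if id(p) not in ltc_ids]
    return torch.optim.Adam(
        [{"params": ltc_params, "lr": 1e-3},
         {"params": other_params, "lr": 1e-5}],
        betas=(0.9, 0.99),
    )

def training_step(model, optimizer, rgb_pair, imu_seq, standardized_labels):
    optimizer.zero_grad()
    pred = model(rgb_pair, imu_seq)                       # 6-DoF prediction in standardized units
    loss = F.smooth_l1_loss(pred, standardized_labels)    # smooth L1 loss, as stated above
    loss.backward()
    optimizer.step()
    return loss.item()
```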
After the second training, two adjacent RGB frames stacked along the channel dimension are input to the visual feature extraction module of the deep learning network model to obtain the visual feature; at the same time, the inertial navigation data between the two adjacent RGB frames are input to the inertial navigation feature extraction module to obtain the inertial navigation feature. The visual feature and the inertial navigation feature are then concatenated along the row direction and input to the pose regression module, which outputs relative displacement 1 and relative rotation 1. Finally, relative displacement 1 and relative rotation 1 are inverse-standardized a second time with mean 2 and variance 2 to obtain relative displacement 2 and relative rotation 2.
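A short sketch of this second inverse standardization at inference time (assuming the hypothetical model wiring sketched earlier; mean 2 and variance 2 are the statistics saved from the second standardization, with std2 denoting the corresponding standard deviation):

```python
import torch

@torch.no_grad()
def predict_pose(model, rgb_pair, imu_seq, mean2, std2):
    """Undo the second standardization to recover metric-scale relative motion."""
    normalized = model(rgb_pair, imu_seq)     # relative displacement 1 / relative rotation 1, (B, 6)
    pose = normalized * std2 + mean2          # relative displacement 2 / relative rotation 2
    return pose[:, :3], pose[:, 3:]           # translation (x, y, z) and rotation (x, y, z)
```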
In summary, the self-learning method of this embodiment trains the deep learning network model with pseudo labels and real labels together, which reduces the required number of real labels, unlike other methods that need a large number of real labels for training. The method uses the first single-layer LTC-RNN and the second single-layer LTC-RNN to extract the inertial navigation features and to regress the pose, respectively; the iterative computation inside the two LTC-RNNs strengthens their feature-extraction capability, in contrast to other recurrent neural networks that extract features in a single pass.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (9)

1. An LTC-DNN-based visual inertial navigation combined navigation system, used for automatic driving and autonomous navigation of unmanned aerial vehicles, characterized by comprising a deep learning network model, wherein the deep learning network model consists of a visual feature extraction module, an inertial navigation feature extraction module and a pose regression module that are connected in sequence, wherein
the visual feature extraction module is used to extract 1024-dimensional visual features; its input is two adjacent RGB frames stacked along the channel dimension, and its output is the 1024-dimensional visual feature;
the inertial navigation feature extraction module comprises a first single-layer LTC-RNN with a 1024-dimensional hidden state; its input is the inertial navigation data between the two adjacent RGB frames, and its output is a 1024-dimensional inertial navigation feature;
the pose regression module comprises an attention mechanism fusion submodule, a second single-layer LTC-RNN with a 1000-dimensional hidden state and a fully connected regression submodule that are connected in sequence; the input of the attention mechanism fusion submodule is the concatenated feature obtained by concatenating the visual feature and the inertial navigation feature, and it weights the two features to obtain a weighted fusion feature; the input of the second single-layer LTC-RNN is the weighted fusion feature, and its output is a regression feature; the input of the fully connected regression submodule is the regression feature, and its output is the estimate of the relative displacement and relative rotation;
wherein the calculation formula of the first single-layer LTC-RNN and the second single-layer LTC-RNN is as follows:
h(t + Δt) = [h(t) + Δt · f(h(t), x(t), t, θ)] / [1 + Δt · (1/τ + f(h(t), x(t), t, θ))]
where h(t) is the hidden state of the LTC-RNN at the current time, τ is a constant time constant, Δt is the time step, x(t) is the input data at the current time, f(h(t), x(t), t, θ) is a deep learning network, θ are its trainable parameters, and t is the current time. At the start of each computation, the first and second single-layer LTC-RNNs feed the data x(t) and h(t) into the above formula, then use the current output h(t + Δt) as the input h(t) of the next pass and continue the computation, repeating it 6 times; the output h(t + Δt) of the 6th pass is taken as the computation result of the first or second single-layer LTC-RNN.
2. The LTC-DNN-based visual inertial navigation combined navigation system according to claim 1, wherein the visual feature extraction module is formed by stacking 10 convolutional layers; the kernel sizes of the first three layers are 7 × 7, 5 × 5 and 5 × 5 in turn, the kernel sizes of the remaining seven layers are all 3 × 3, the convolution stride of the fourth, sixth and eighth layers is 1, and the stride of the remaining layers is 2; all 10 convolutional layers use the ReLU activation function.
3. The LTC-DNN-based visual inertial navigation combined navigation system according to claim 1, wherein the RGB pictures are resized to 416 × 128 before being input to the visual feature extraction module.
4. The LTC-DNN-based visual inertial navigation combined navigation system according to claim 1, wherein the attention mechanism fusion submodule comprises two sub-networks with the same structure, each formed by stacking two fully connected layers: the first fully connected layer has dimension 2048 and is followed by a ReLU activation function, and the second fully connected layer has dimension 1024 and is followed by a Sigmoid activation function.
5. The LTC-DNN-based visual inertial navigation combined navigation system according to claim 1, wherein the fully connected regression submodule consists of four fully connected layers whose dimensions are 512, 128, 64 and 6, respectively; each of the first three layers is followed by a ReLU activation function, and the fourth layer is not followed by any activation function.
6. A self-learning method of the LTC-DNN-based visual inertial navigation combined navigation system according to any one of claims 1 to 5, characterized by comprising the following steps:
S1, standardizing the real labels, which contain the real relative displacement and relative rotation, to a standard normal distribution to obtain real standard labels together with the corresponding statistics mean 1 and variance 1, and performing the first training of the deep learning network model with the real standard labels;
S2, using the deep learning network model after the first training to predict on unlabeled data, and applying a first inverse standardization to the predictions with mean 1 and variance 1 to obtain pseudo labels;
S3, randomly selecting a number of pseudo labels and real labels and mixing them at a ratio of 0.2:1 to obtain mixed labels;
S4, standardizing the mixed labels to a standard normal distribution to obtain mixed standard labels together with the corresponding statistics mean 2 and variance 2, and performing the second training of the deep learning network model with the mixed standard labels.
7. The self-learning method of the LTC-DNN-based visual inertial navigation combined navigation system according to claim 6, wherein the pseudo labels, the real labels and the mixed labels contain the relative displacement and relative rotation on the x, y and z axes.
8. The self-learning method of the LTC-DNN-based visual inertial navigation combined navigation system according to claim 6, wherein converting the real labels and the mixed labels into a standard normal distribution comprises converting the relative displacement and the relative rotation on the x, y and z axes into standard normal distributions, respectively.
9. The self-learning method of the LTC-DNN-based visual inertial navigation combined navigation system according to claim 6, wherein the deep learning network model is trained with an Adam optimizer whose momentum is set to (0.9, 0.99); the learning rate of the first and second single-layer LTC-RNNs is set to 0.001, and the learning rate of the remaining modules is set to 0.00001; the loss function is smooth_l1_loss.
CN202110664888.9A 2021-06-16 2021-06-16 LTC-DNN-based visual inertial navigation combined navigation system and self-learning method Active CN113392904B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110664888.9A CN113392904B (en) 2021-06-16 2021-06-16 LTC-DNN-based visual inertial navigation combined navigation system and self-learning method
PCT/CN2022/112625 WO2022262878A1 (en) 2021-06-16 2022-08-15 Ltc-dnn-based visual inertial navigation combined navigation system and self-learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110664888.9A CN113392904B (en) 2021-06-16 2021-06-16 LTC-DNN-based visual inertial navigation combined navigation system and self-learning method

Publications (2)

Publication Number Publication Date
CN113392904A CN113392904A (en) 2021-09-14
CN113392904B (en) 2022-07-26

Family

ID=77621376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110664888.9A Active CN113392904B (en) 2021-06-16 2021-06-16 LTC-DNN-based visual inertial navigation combined navigation system and self-learning method

Country Status (2)

Country Link
CN (1) CN113392904B (en)
WO (1) WO2022262878A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230213936A1 (en) * 2022-01-05 2023-07-06 Honeywell International Inc. Multiple inertial measurement unit sensor fusion using machine learning
CN115953839B (en) * 2022-12-26 2024-04-12 广州紫为云科技有限公司 Real-time 2D gesture estimation method based on loop architecture and key point regression
CN115793001B (en) * 2023-02-07 2023-05-16 立得空间信息技术股份有限公司 Vision, inertial navigation and defending fusion positioning method based on inertial navigation multiplexing
CN116704026A (en) * 2023-05-24 2023-09-05 国网江苏省电力有限公司南京供电分公司 Positioning method, positioning device, electronic equipment and storage medium
CN116989820B (en) * 2023-09-27 2023-12-05 厦门精图信息技术有限公司 Intelligent navigation system and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940562B (en) * 2017-03-09 2023-04-28 华南理工大学 Mobile robot wireless cluster system and neural network visual navigation method
US10885659B2 (en) * 2018-01-15 2021-01-05 Samsung Electronics Co., Ltd. Object pose estimating method and apparatus
CN110084131A (en) * 2019-04-03 2019-08-02 华南理工大学 A kind of semi-supervised pedestrian detection method based on depth convolutional network
CN112556692B (en) * 2020-11-27 2023-01-31 绍兴市北大信息技术科创中心 Vision and inertia odometer method and system based on attention mechanism
CN112801201B (en) * 2021-02-08 2022-10-25 华南理工大学 Deep learning visual inertial navigation combined navigation design method based on standardization

Also Published As

Publication number Publication date
CN113392904A (en) 2021-09-14
WO2022262878A1 (en) 2022-12-22

Similar Documents

Publication Publication Date Title
CN113392904B (en) LTC-DNN-based visual inertial navigation combined navigation system and self-learning method
CN111738110A (en) Remote sensing image vehicle target detection method based on multi-scale attention mechanism
CN110781262B (en) Semantic map construction method based on visual SLAM
CN108520238B (en) Scene prediction method of night vision image based on depth prediction coding network
CN112733768B (en) Natural scene text recognition method and device based on bidirectional characteristic language model
CN111462324B (en) Online spatiotemporal semantic fusion method and system
CN113422952B (en) Video prediction method based on space-time propagation hierarchical coder-decoder
Sun et al. Unmanned surface vessel visual object detection under all-weather conditions with optimized feature fusion network in YOLOv4
CN113658189B (en) Cross-scale feature fusion real-time semantic segmentation method and system
CN114526728B (en) Monocular vision inertial navigation positioning method based on self-supervision deep learning
CN113838135A (en) Pose estimation method, system and medium based on LSTM double-current convolution neural network
CN116912485A (en) Scene semantic segmentation method based on feature fusion of thermal image and visible light image
CN112906549B (en) Video behavior detection method based on space-time capsule network
CN112268564B (en) Unmanned aerial vehicle landing space position and attitude end-to-end estimation method
Jo et al. Mixture density-PoseNet and its application to monocular camera-based global localization
CN112149496A (en) Real-time road scene segmentation method based on convolutional neural network
WO2020093210A1 (en) Scene segmentation method and system based on contenxtual information guidance
CN115797684A (en) Infrared small target detection method and system based on context information
CN113920317A (en) Semantic segmentation method based on visible light image and low-resolution depth image
Asghar et al. Allo-centric occupancy grid prediction for urban traffic scene using video prediction networks
CN114034312B (en) Light-weight multi-decoupling visual odometer implementation method
CN116486203B (en) Single-target tracking method based on twin network and online template updating
CN113837080B (en) Small target detection method based on information enhancement and receptive field enhancement
CN118155294B (en) Double-flow network classroom behavior identification method based on space-time attention
CN114170421B (en) Image detection method, device, equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant