WO2022262878A1 - LTC-DNN-based visual-inertial integrated navigation system and self-learning method - Google Patents

LTC-DNN-based visual-inertial integrated navigation system and self-learning method Download PDF

Info

Publication number
WO2022262878A1
WO2022262878A1 PCT/CN2022/112625 CN2022112625W WO2022262878A1 WO 2022262878 A1 WO2022262878 A1 WO 2022262878A1 CN 2022112625 W CN2022112625 W CN 2022112625W WO 2022262878 A1 WO2022262878 A1 WO 2022262878A1
Authority
WO
WIPO (PCT)
Prior art keywords
ltc
layer
visual
inertial navigation
rnn
Prior art date
Application number
PCT/CN2022/112625
Other languages
English (en)
French (fr)
Inventor
胡斌杰
丘金光
Original Assignee
华南理工大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华南理工大学 filed Critical 华南理工大学
Publication of WO2022262878A1 publication Critical patent/WO2022262878A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G01C21/1656Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the invention relates to the technical field of sensor fusion and motion estimation, in particular to an integrated visual-inertial navigation system and self-learning method based on LTC-DNN.
  • the pure visual odometry method uses visual sensors to obtain information about the surrounding environment and estimates the motion state by analysing the visual data; however, once an occluder appears in the scene or the visual data is lost during transmission, the estimate of the motion state is seriously disturbed and the error grows larger and larger.
  • visual-inertial odometry adds inertial measurement unit (IMU) information on top of pure visual odometry, which can improve the accuracy of motion state estimation when vision fails.
  • IMU inertial measurement unit
  • the purpose of the present invention is to solve the above-mentioned defects in the prior art, and provide a visual-inertial navigation integrated navigation system and self-learning method based on LTC-DNN.
  • an LTC-DNN-based visual-inertial integrated navigation system, used for automatic driving and autonomous navigation of unmanned aerial vehicles; the visual-inertial integrated navigation system comprises a deep learning network model composed of a visual feature extraction module, an inertial navigation feature extraction module, and a pose regression module connected sequentially.
  • the visual feature extraction module is used to extract 1024-dimensional visual features; its input is two adjacent RGB frames stacked along the channel dimension, and its output is the 1024-dimensional visual features;
  • the inertial navigation feature extraction module includes a first single-layer LTC-RNN with a 1024-dimensional hidden state; its input is the inertial navigation data between the two adjacent RGB frames, and its output is the 1024-dimensional inertial navigation features;
  • the pose regression module includes, connected in sequence, an attention mechanism fusion submodule, a second single-layer LTC-RNN with a 1000-dimensional hidden state, and a fully connected regression submodule, wherein the input of the attention mechanism fusion submodule is the concatenated feature obtained by concatenating the visual features and the inertial navigation features, and it weights the visual and inertial navigation features to obtain weighted fusion features
  • the input of the second single-layer LTC-RNN is the weighted fusion features, and its output is the regression features
  • the input of the fully connected regression submodule is the regression features, and its output is an estimate of relative displacement and relative rotation.
  • the visual feature extraction module is formed by sequentially stacking 10 convolutional layers; the kernel sizes of the first three convolutional layers are 7×7, 5×5 and 5×5, the kernel size of the following seven convolutional layers is 3×3, the convolution stride of the fourth, sixth and eighth layers is 1, and the convolution stride of the remaining layers is 2; all 10 convolutional layers use the ReLU activation function.
  • the RGB picture is converted into a size of 416 ⁇ 128 before being input into the feature extraction module.
  • calculation formulas of the first single-layer LTC-RNN and the second single-layer LTC-RNN are as follows:
  • h(t) is the hidden state of the LTC-RNN at the current moment
  • τ is a constant time constant
  • Δt is the time step
  • x(t) is the input data at the current moment
  • f(h(t), x(t), t, θ) is the deep learning network
  • θ is its trainable parameters
  • t is the current moment.
  • the first single-layer LTC-RNN and the second single-layer LTC-RNN are evaluated by feeding the data x(t) and h(t) into the above formula at the start of each evaluation, taking the current output h(t+Δt) as the next input h(t), and repeating this 6 times; the output h(t+Δt) of the sixth iteration is used as the calculation result of the first single-layer LTC-RNN and the second single-layer LTC-RNN.
  • the attention mechanism fusion sub-module includes two sub-networks with the same structure, and each sub-network is formed by superimposing two layers of fully-connected networks.
  • the dimension of the first-layer fully-connected network is 2048, followed by the ReLU activation function.
  • the dimension of the second-layer fully connected network is 1024, followed by the Sigmoid activation function.
  • the fully-connected regression sub-module is composed of a four-layer fully-connected network, wherein the dimension of the first-layer fully-connected network is 512, the dimension of the second-layer fully-connected network is 128, and the dimension of the third-layer fully-connected network is 64.
  • the dimension of the fourth-layer fully-connected network is 6; the first three-layer fully-connected networks in the fully-connected regression sub-module are followed by a ReLU activation function, and the fourth-layer fully-connected network is not connected with any activation function.
  • a self-learning method for the LTC-DNN-based visual-inertial integrated navigation system, comprising: converting the real labels to a standard normal distribution and training the deep learning network model a first time; predicting on unlabeled data with the trained model and inverse-standardizing the predictions to obtain pseudo labels; mixing pseudo labels and real labels at a ratio of 0.2:1 to obtain mixed labels; and converting the mixed labels to a standard normal distribution and training the model a second time.
  • pseudo labels, real labels, and mixed labels include the relative displacements and relative rotations on the x, y, and z axes.
  • the operation of converting the real label and the mixed label to the standard normal distribution is to convert the relative displacement and relative rotation on the x, y and z axes to the standard normal distribution respectively.
  • the training of the deep learning network model uses an Adam optimizer, and the momentum of the Adam optimizer is set to (0.9, 0.99); the learning rate of the first single-layer LTC-RNN and the second single-layer LTC-RNN is set to 0.001, and the learning rate of the remaining modules is set to 0.00001; the loss function is smooth_l1_loss.
  • the present invention has the following advantages and effects:
  • the present invention proposes an LTC-DNN-based visual-inertial integrated navigation system, including a deep learning network model that introduces the first single-layer LTC-RNN and the second single-layer LTC-RNN to reduce the number of trainable parameters of the deep learning network model and to improve its robustness.
  • the present invention proposes a self-learning method of a visual-inertial navigation integrated navigation system based on LTC-DNN. Compared with the same type of algorithm, the self-learning method reduces the dependence on real labels.
  • Fig. 1 is a schematic structural diagram of a deep learning network model in a visual-inertial navigation integrated navigation system based on LTC-DNN disclosed in an embodiment of the present invention
  • Fig. 2 is a schematic diagram of the structure of the attention mechanism fusion sub-module in the embodiment of the present invention.
  • Fig. 3 is a schematic structural diagram of a fully connected regression sub-module in an embodiment of the present invention.
  • Fig. 4 is a flow chart of a self-learning method of a visual-inertial navigation integrated navigation system based on LTC-DNN disclosed in an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of the visual-inertial navigation integrated navigation system based on LTC-DNN.
  • the deep learning network model is composed of a visual feature extraction module, an inertial navigation feature extraction module, and a pose regression module connected in sequence.
  • the visual feature extraction module is used to extract 1024-dimensional visual features, the input of the visual feature extraction module is two adjacent frames of RGB pictures superimposed along the channel, and the output is 1024-dimensional visual features.
  • the visual feature extraction module is formed by sequentially stacking 10 convolutional layers; the kernel sizes of the first three convolutional layers are 7×7, 5×5 and 5×5 in sequence, the kernel size of the following seven convolutional layers is 3×3, the convolution stride of the fourth, sixth and eighth layers is 1, and the convolution stride of the remaining layers is 2; all 10 convolutional layers use the ReLU activation function.
  • the inertial navigation feature extraction module includes a first single-layer LTC-RNN (liquid time-constant recurrent neural network) with a 1024-dimensional hidden state; its input is the inertial navigation data between the two adjacent RGB frames, and its output is the 1024-dimensional inertial navigation features;
  • LTC-RNN liquid time constant recurrent neural network
  • the pose regression module includes, connected in sequence, an attention mechanism fusion submodule, a second single-layer LTC-RNN with a 1000-dimensional hidden state, and a fully connected regression submodule, wherein the input of the attention mechanism fusion submodule is the concatenated feature obtained by concatenating the visual features and the inertial navigation features, and it weights the visual and inertial navigation features to obtain weighted fusion features
  • the input of the second single-layer LTC-RNN is the weighted fusion features, and its output is the regression features
  • the input of the fully connected regression submodule is the regression features, and its output is an estimate of relative displacement and relative rotation.
  • in the calculation formula of the first single-layer LTC-RNN and the second single-layer LTC-RNN, h(t) is the hidden state of the LTC-RNN at the current moment
  • τ is a constant time constant
  • Δt is the time step
  • x(t) is the input data at the current moment
  • f(h(t), x(t), t, θ) is the deep learning network
  • θ is its trainable parameters
  • t is the current moment.
  • the first single-layer LTC-RNN and the second single-layer LTC-RNN are evaluated by feeding the data x(t) and h(t) into the above formula at the start of each evaluation, taking the current output h(t+Δt) as the next input h(t), and repeating this 6 times; the output h(t+Δt) of the sixth iteration is used as the calculation result of the first single-layer LTC-RNN and the second single-layer LTC-RNN.
  • Fig. 2 is a schematic diagram of the attention mechanism fusion sub-module of the embodiment of the present invention.
  • the attention mechanism fusion sub-module includes two sub-networks with the same structure, and each sub-network is formed by superimposing two layers of fully connected networks.
  • the dimension of the first layer of fully connected network is 2048, followed by the ReLU activation function, and the dimension of the second layer of fully connected network is 1024, followed by the Sigmoid activation function.
  • Fig. 3 is a schematic diagram of the structure of the fully connected regression sub-module of the embodiment of the present invention.
  • the fully connected regression sub-module is composed of four layers of fully connected networks, in which the dimension of the first fully connected layer is 512, the dimension of the second fully connected layer is 128, the dimension of the third fully connected layer is 64, and the dimension of the fourth fully connected layer is 6; the first three fully connected layers in the fully connected regression sub-module are each followed by the ReLU activation function, and the fourth fully connected layer is not followed by any activation function.
  • Fig. 4 is a flow chart of the self-learning method of the embodiment of the present invention; referring to Fig. 4, the self-learning method consists of the four steps S1 to S4 set out in the description below:
  • pseudo labels, real labels, and mixed labels include relative displacements and relative rotations on the x, y, and z axes.
  • the operation of transforming the real label and the mixed label to the standard normal distribution is to convert the relative displacement and relative rotation on the x, y, and z axes to the standard normal distribution respectively.
  • the training of the deep learning network model uses the Adam optimizer, and the momentum of the Adam optimizer is set to (0.9,0.99); the learning rate of the first single-layer LTC-RNN and the second single-layer LTC-RNN is set to 0.001, and the learning rate of the other modules is Set to 0.00001; the loss function is smooth_l1_loss.
  • after the second training, the visual feature extraction module receives two adjacent RGB frames stacked along the channel dimension and outputs the visual features; at the same time, the inertial navigation feature extraction module receives the inertial navigation data between the two adjacent RGB frames and outputs the inertial navigation features; the visual features and inertial navigation features are then concatenated along the row direction and input to the pose regression module to obtain relative displacement 1 and relative rotation 1; finally, relative displacement 1 and relative rotation 1 are inverse-standardized a second time using mean 2 and variance 2 to obtain relative displacement 2 and relative rotation 2.
  • the self-learning method in this embodiment introduces pseudo labels and trains the deep learning network model with them together with the real labels, which reduces the number of real labels required, unlike other methods that need a large number of real labels for training.
  • the present invention uses the first single-layer LTC-RNN and the second single-layer LTC-RNN for inertial navigation feature extraction and pose regression respectively; their advantage is that the iterative calculation inside the first and second single-layer LTC-RNNs increases the ability to extract features, unlike other recurrent neural networks, which extract features with only a single calculation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An LTC-DNN-based visual-inertial integrated navigation system and self-learning method. The visual-inertial integrated navigation system contains a deep learning network model composed of a visual feature extraction module, an inertial navigation feature extraction module, and a pose regression module. The visual feature extraction module extracts visual features from two adjacent RGB frames; the inertial navigation feature extraction module extracts inertial navigation features from the inertial navigation data; the pose regression module comprises an attention-mechanism fusion submodule, a liquid time-constant recurrent neural network (LTC-RNN), and a fully connected regression submodule, and predicts relative displacement and relative rotation. The method trains the visual-inertial integrated navigation system and, compared with algorithms of the same type, reduces the dependence on real labels; moreover, the deep learning network model estimates relative displacement and relative pose with high accuracy and is robust to data corruption.

Description

LTC-DNN-based visual-inertial integrated navigation system and self-learning method
Technical Field
The present invention relates to the technical field of sensor fusion and motion estimation, and in particular to an LTC-DNN-based visual-inertial integrated navigation system and self-learning method.
Background Art
With the continuous development of autonomous driving and unmanned aerial vehicles, achieving high-precision, highly robust localization is an important prerequisite for tasks such as autonomous navigation and the exploration of unknown areas. Pure visual odometry methods use visual sensors to obtain information about the surrounding environment and estimate the motion state by analysing the visual data, but once an occluder appears in the scene or visual data is lost during transmission, the estimate of the motion state is seriously disturbed and the error grows larger and larger. Visual-inertial odometry adds inertial measurement unit (IMU) information on top of pure visual odometry, which can improve the accuracy of motion state estimation when vision fails.
In recent years, deep learning has achieved great success in the field of computer vision and is widely used in many areas. Visual-inertial integrated navigation, as a regression task, can likewise be trained with deep learning methods; however, existing deep-learning-based visual-inertial integrated navigation algorithms are limited by the number of real labels during training and generalize poorly, and existing deep-learning-based visual-inertial integrated navigation tasks require a large number of trainable parameters, which greatly hinders their practical application.
Summary of the Invention
The purpose of the present invention is to solve the above-mentioned defects in the prior art and to provide an LTC-DNN-based visual-inertial integrated navigation system and self-learning method.
The first object of the present invention can be achieved by the following technical solution:
An LTC-DNN-based visual-inertial integrated navigation system, used for automatic driving and autonomous navigation of unmanned aerial vehicles; the visual-inertial integrated navigation system comprises a deep learning network model composed of a visual feature extraction module, an inertial navigation feature extraction module, and a pose regression module connected in sequence, wherein:
the visual feature extraction module is used to extract 1024-dimensional visual features; its input is two adjacent RGB frames stacked along the channel dimension, and its output is the 1024-dimensional visual features;
the inertial navigation feature extraction module includes a first single-layer LTC-RNN with a 1024-dimensional hidden state; its input is the inertial navigation data between the two adjacent RGB frames, and its output is the 1024-dimensional inertial navigation features;
the pose regression module includes, connected in sequence, an attention mechanism fusion submodule, a second single-layer LTC-RNN with a 1000-dimensional hidden state, and a fully connected regression submodule, wherein the input of the attention mechanism fusion submodule is the concatenated feature obtained by concatenating the visual features and the inertial navigation features, and it weights the visual and inertial navigation features to obtain weighted fusion features; the input of the second single-layer LTC-RNN is the weighted fusion features, and its output is the regression features; the input of the fully connected regression submodule is the regression features, and its output is an estimate of relative displacement and relative rotation.
Further, the visual feature extraction module is formed by sequentially stacking 10 convolutional layers; the kernel sizes of the first three convolutional layers are 7×7, 5×5 and 5×5, the kernel size of the following seven convolutional layers is 3×3, the convolution stride of the fourth, sixth and eighth layers is 1, the convolution stride of the remaining layers is 2, and all 10 convolutional layers use the ReLU activation function.
Further, the RGB images are resized to 416×128 before being input to the visual feature extraction module.
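A minimal PyTorch sketch of how a visual feature extraction module of this kind could be assembled. The kernel sizes, strides, ReLU activations, six-channel stacked input, and 416×128 input size follow the text above; the channel widths, padding, and the final pooling/projection to a 1024-dimensional vector are illustrative assumptions not specified in the patent.

```python
import torch
import torch.nn as nn

class VisualFeatureExtractor(nn.Module):
    """10-layer CNN over two RGB frames stacked along the channel axis (6 input channels)."""
    def __init__(self, out_dim: int = 1024):
        super().__init__()
        kernels = [7, 5, 5, 3, 3, 3, 3, 3, 3, 3]
        strides = [2, 2, 2, 1, 2, 1, 2, 1, 2, 2]   # stride 1 for the 4th, 6th and 8th layers
        channels = [6, 64, 128, 256, 256, 512, 512, 512, 512, 1024, 1024]  # assumed widths
        layers = []
        for i in range(10):
            layers += [
                nn.Conv2d(channels[i], channels[i + 1], kernels[i],
                          stride=strides[i], padding=kernels[i] // 2),
                nn.ReLU(inplace=True),              # every convolutional layer uses ReLU
            ]
        self.cnn = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)          # assumed reduction to a vector
        self.proj = nn.Linear(channels[-1], out_dim) # assumed projection to 1024 dims

    def forward(self, two_frames: torch.Tensor) -> torch.Tensor:
        # two_frames: (batch, 6, 128, 416) — two RGB frames resized to 416x128 and stacked
        x = self.cnn(two_frames)
        x = self.pool(x).flatten(1)
        return self.proj(x)

if __name__ == "__main__":
    frames = torch.randn(2, 6, 128, 416)
    print(VisualFeatureExtractor()(frames).shape)  # torch.Size([2, 1024])
```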
Further, the first single-layer LTC-RNN and the second single-layer LTC-RNN are computed with the following formula:
Figure PCTCN2022112625-appb-000001
In the formula, h(t) is the hidden state of the LTC-RNN at the current moment, τ is a constant time constant, Δt is the time step, x(t) is the input data at the current moment, f(h(t), x(t), t, θ) is the deep learning network, θ is its trainable parameters, and t is the current moment. The first single-layer LTC-RNN and the second single-layer LTC-RNN are evaluated by feeding the data x(t) and h(t) into the above formula at the start of each evaluation, taking the current output h(t+Δt) as the next input h(t), and repeating this 6 times; the output h(t+Δt) of the sixth iteration is used as the calculation result of the first single-layer LTC-RNN and the second single-layer LTC-RNN.
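The formula itself appears in the published document only as the image placeholder above (Figure PCTCN2022112625-appb-000001). Going by the variable definitions in this paragraph and the Liquid Time-constant Network formulation of Hasani et al. cited under Non-Patent Citations, the fused-step update is presumably of the following form; this is a hedged reconstruction, not the patent's exact expression, and the hidden-state bias vector A is taken from the cited paper rather than from the patent text:

```latex
h(t+\Delta t) \;=\;
\frac{h(t) \;+\; \Delta t \, f\!\big(h(t),\,x(t),\,t,\,\theta\big) \odot A}
     {1 \;+\; \Delta t \left( \frac{1}{\tau} \;+\; f\!\big(h(t),\,x(t),\,t,\,\theta\big) \right)}
```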
Further, the attention mechanism fusion submodule comprises two subnetworks with identical structure, each formed by stacking two fully connected layers; the first fully connected layer has dimension 2048 and is followed by the ReLU activation function, and the second fully connected layer has dimension 1024 and is followed by the Sigmoid activation function.
Further, the fully connected regression submodule is composed of four fully connected layers, in which the dimension of the first fully connected layer is 512, the dimension of the second is 128, the dimension of the third is 64, and the dimension of the fourth is 6; the first three fully connected layers in the fully connected regression submodule are each followed by the ReLU activation function, and the fourth fully connected layer is not followed by any activation function.
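The fully connected regression submodule above is fully specified (512 → 128 → 64 → 6, ReLU after the first three layers, nothing after the last); a short PyTorch sketch follows. The 1000-dimensional input width (the hidden state of the second single-layer LTC-RNN) and the split of the 6 outputs into 3 displacement plus 3 rotation components are assumptions consistent with the surrounding text, not statements from the patent.

```python
import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    """Four fully connected layers: 512 -> 128 -> 64 -> 6, no activation on the last."""
    def __init__(self, in_dim: int = 1000):  # assumed: 1000-d regression feature from the 2nd LTC-RNN
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 6),  # relative displacement (3) + relative rotation (3), assumed split
        )

    def forward(self, regression_feature: torch.Tensor):
        out = self.net(regression_feature)
        return out[..., :3], out[..., 3:]  # (relative displacement, relative rotation)

if __name__ == "__main__":
    t, r = PoseRegressor()(torch.randn(2, 1000))
    print(t.shape, r.shape)  # torch.Size([2, 3]) torch.Size([2, 3])
```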
The other object of the present invention can be achieved by the following technical solution:
A self-learning method for the LTC-DNN-based visual-inertial integrated navigation system, the self-learning method comprising the following steps:
S1. Convert the real labels, which carry the true relative displacement and relative rotation, to a standard normal distribution to obtain real standardized labels, mean 1 and variance 1, and use the real standardized labels to train the deep learning network model a first time;
S2. Use the deep learning network model obtained from the first training to predict on unlabeled data, and apply a first inverse standardization to the predictions using mean 1 and variance 1 to obtain pseudo labels;
S3. Randomly select a certain number of pseudo labels and real labels and mix them at a ratio of 0.2:1 to obtain mixed labels;
S4. Convert the mixed labels to a standard normal distribution to obtain mixed standardized labels, mean 2 and variance 2, and use the mixed standardized labels to train the deep learning network model a second time.
Further, the pseudo labels, real labels and mixed labels contain the relative displacement and relative rotation on the x, y and z axes.
Further, converting the real labels and mixed labels to a standard normal distribution means converting the relative displacement and relative rotation on the x, y and z axes to a standard normal distribution separately.
Further, the deep learning network model is trained with the Adam optimizer, whose momentum is set to (0.9, 0.99); the learning rate of the first single-layer LTC-RNN and the second single-layer LTC-RNN is set to 0.001, the learning rate of the remaining modules is set to 0.00001, and the loss function is smooth_l1_loss.
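A hedged NumPy sketch of steps S1–S4 above: per-axis standardization of the real labels, a first training pass, pseudo-label generation by inverse standardization of predictions on unlabeled data, mixing pseudo and real labels at 0.2:1, and a second training pass on the re-standardized mix. The train_model and predict callables and the (N, 6) label arrays are placeholders for illustration, not interfaces defined by the patent.

```python
import numpy as np

def standardize(labels: np.ndarray):
    """Per-axis standardization of (N, 6) relative displacement + rotation labels."""
    mean, std = labels.mean(axis=0), labels.std(axis=0)
    return (labels - mean) / std, mean, std

def inverse_standardize(z: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    return z * std + mean

def self_learning(real_labels, labeled_inputs, unlabeled_inputs,
                  train_model, predict, pseudo_ratio: float = 0.2):
    """S1-S4 of the self-learning method; train_model/predict are user-supplied callables."""
    # S1: standardize real labels (mean 1 / variance 1) and train a first time
    real_std, mean1, std1 = standardize(real_labels)
    model = train_model(labeled_inputs, real_std)
    # S2: predict on unlabeled data, then inverse-standardize to obtain pseudo labels
    pseudo_labels = inverse_standardize(predict(model, unlabeled_inputs), mean1, std1)
    # S3: mix pseudo and real labels at a 0.2 : 1 ratio
    n_pseudo = int(pseudo_ratio * len(real_labels))
    picked = np.random.choice(len(pseudo_labels), size=n_pseudo, replace=False)
    mixed_labels = np.concatenate([real_labels, pseudo_labels[picked]], axis=0)
    mixed_inputs = labeled_inputs + [unlabeled_inputs[i] for i in picked]
    # S4: standardize the mixed labels (mean 2 / variance 2) and train a second time
    mixed_std, mean2, std2 = standardize(mixed_labels)
    model = train_model(mixed_inputs, mixed_std)
    return model, (mean2, std2)  # mean 2 / variance 2 are reused at inference time

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real, labeled, unlabeled = rng.normal(size=(10, 6)), list(range(10)), list(range(50))
    dummy_train = lambda inputs, labels: "model"
    dummy_predict = lambda model, inputs: rng.normal(size=(len(inputs), 6))
    model, stats = self_learning(real, labeled, unlabeled, dummy_train, dummy_predict)
    print(stats[0].shape)  # (6,)
```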
Compared with the prior art, the present invention has the following advantages and effects:
(1) The present invention proposes an LTC-DNN-based visual-inertial integrated navigation system containing a deep learning network model that introduces the first single-layer LTC-RNN and the second single-layer LTC-RNN, thereby reducing the number of trainable parameters of the deep learning network model and improving its robustness.
(2) The present invention proposes a self-learning method for the LTC-DNN-based visual-inertial integrated navigation system that, compared with algorithms of the same type, reduces the dependence on real labels.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of the deep learning network model in an LTC-DNN-based visual-inertial integrated navigation system disclosed in an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the attention mechanism fusion submodule in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the fully connected regression submodule in an embodiment of the present invention;
Fig. 4 is a flow chart of the self-learning method of an LTC-DNN-based visual-inertial integrated navigation system disclosed in an embodiment of the present invention.
Detailed Description of the Embodiments
To make the purpose, technical solution and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
Embodiment 1
This embodiment discloses an LTC-DNN-based visual-inertial integrated navigation system; Fig. 1 is a schematic structural diagram of this LTC-DNN-based visual-inertial integrated navigation system.
Referring to Fig. 1, the deep learning network model is composed of a visual feature extraction module, an inertial navigation feature extraction module, and a pose regression module connected in sequence.
The visual feature extraction module is used to extract 1024-dimensional visual features; its input is two adjacent RGB frames stacked along the channel dimension, and its output is the 1024-dimensional visual features.
The visual feature extraction module is formed by sequentially stacking 10 convolutional layers; the kernel sizes of the first three convolutional layers are 7×7, 5×5 and 5×5, the kernel size of the following seven convolutional layers is 3×3, the convolution stride of the fourth, sixth and eighth layers is 1, the convolution stride of the remaining layers is 2, and all 10 convolutional layers use the ReLU activation function.
The inertial navigation feature extraction module includes a first single-layer LTC-RNN (liquid time-constant recurrent neural network) with a 1024-dimensional hidden state; its input is the inertial navigation data between the two adjacent RGB frames, and its output is the 1024-dimensional inertial navigation features.
The pose regression module includes, connected in sequence, an attention mechanism fusion submodule, a second single-layer LTC-RNN with a 1000-dimensional hidden state, and a fully connected regression submodule, wherein the input of the attention mechanism fusion submodule is the concatenated feature obtained by concatenating the visual features and the inertial navigation features, and it weights the visual and inertial navigation features to obtain weighted fusion features; the input of the second single-layer LTC-RNN is the weighted fusion features, and its output is the regression features; the input of the fully connected regression submodule is the regression features, and its output is an estimate of relative displacement and relative rotation.
The first single-layer LTC-RNN and the second single-layer LTC-RNN are computed with the following formula:
Figure PCTCN2022112625-appb-000002
In the formula, h(t) is the hidden state of the LTC-RNN at the current moment, τ is a constant time constant, Δt is the time step, x(t) is the input data at the current moment, f(h(t), x(t), t, θ) is the deep learning network, θ is its trainable parameters, and t is the current moment. The first single-layer LTC-RNN and the second single-layer LTC-RNN are evaluated by feeding the data x(t) and h(t) into the above formula at the start of each evaluation, taking the current output h(t+Δt) as the next input h(t), and repeating this 6 times; the output h(t+Δt) of the sixth iteration is used as the calculation result of the first single-layer LTC-RNN and the second single-layer LTC-RNN.
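A minimal PyTorch sketch of such a single-layer LTC-RNN evaluated with the six fused-solver steps just described. The update rule follows the Hasani et al. Liquid Time-constant Network paper cited under Non-Patent Citations; since the exact formula in the patent is published as an image, the form of the update, the bias vector A, and the single-linear-layer choice for f(h, x, t, θ) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SingleLayerLTC(nn.Module):
    """One LTC-RNN cell evaluated with a fixed number of fused-solver steps.

    Assumed update (after Hasani et al., "Liquid Time-constant Networks"):
        h(t+dt) = (h(t) + dt * f(h, x) * A) / (1 + dt * (1/tau + f(h, x)))
    """
    def __init__(self, input_dim: int, hidden_dim: int, unfold_steps: int = 6,
                 dt: float = 1.0, tau: float = 1.0):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(input_dim + hidden_dim, hidden_dim), nn.Tanh())
        self.A = nn.Parameter(torch.ones(hidden_dim))   # assumed bias vector
        self.unfold_steps = unfold_steps                # "repeated 6 times"
        self.dt, self.tau = dt, tau

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_dim), h: (batch, hidden_dim)
        for _ in range(self.unfold_steps):
            fx = self.f(torch.cat([x, h], dim=-1))
            h = (h + self.dt * fx * self.A) / (1.0 + self.dt * (1.0 / self.tau + fx))
        return h  # the sixth output h(t+Δt) is taken as the cell's result

if __name__ == "__main__":
    cell = SingleLayerLTC(input_dim=6, hidden_dim=1024)  # e.g. a 6-DoF IMU sample -> 1024-d state
    out = cell(torch.randn(2, 6), torch.zeros(2, 1024))
    print(out.shape)  # torch.Size([2, 1024])
```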
Fig. 2 is a schematic diagram of the attention mechanism fusion submodule of an embodiment of the present invention. Referring to Fig. 2, the attention mechanism fusion submodule comprises two subnetworks with identical structure, each formed by stacking two fully connected layers; the first fully connected layer has dimension 2048 and is followed by the ReLU activation function, and the second fully connected layer has dimension 1024 and is followed by the Sigmoid activation function.
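A possible PyTorch reading of the attention mechanism fusion submodule just described: two identical two-layer subnetworks (2048 with ReLU, then 1024 with Sigmoid) take the concatenated visual-plus-inertial feature and produce weights for the two branches. That one subnetwork gates the visual features and the other gates the inertial navigation features is an inference from the text, not something stated explicitly.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Two identical gating subnetworks over the concatenated (visual ++ inertial) feature."""
    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.Linear(2 * feat_dim, 2048), nn.ReLU(inplace=True),
                nn.Linear(2048, feat_dim), nn.Sigmoid(),
            )
        self.visual_gate = gate()    # assumed: weights the visual features
        self.inertial_gate = gate()  # assumed: weights the inertial navigation features

    def forward(self, visual: torch.Tensor, inertial: torch.Tensor) -> torch.Tensor:
        concat = torch.cat([visual, inertial], dim=-1)            # (batch, 2048)
        return torch.cat([self.visual_gate(concat) * visual,
                          self.inertial_gate(concat) * inertial], dim=-1)  # weighted fusion feature

if __name__ == "__main__":
    v, i = torch.randn(2, 1024), torch.randn(2, 1024)
    print(AttentionFusion()(v, i).shape)  # torch.Size([2, 2048])
```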
Fig. 3 is a schematic structural diagram of the fully connected regression submodule of an embodiment of the present invention. Referring to Fig. 3, the fully connected regression submodule is composed of four fully connected layers, in which the dimension of the first fully connected layer is 512, the dimension of the second is 128, the dimension of the third is 64, and the dimension of the fourth is 6; the first three fully connected layers in the fully connected regression submodule are each followed by the ReLU activation function, and the fourth fully connected layer is not followed by any activation function.
Embodiment 2
Based on the LTC-DNN-based visual-inertial integrated navigation system disclosed in the above embodiment, this embodiment discloses a self-learning method for that visual-inertial integrated navigation system. Fig. 4 is a flow chart of the self-learning method of an embodiment of the present invention; referring to Fig. 4, the self-learning method consists of four steps, as follows:
S1. Convert the real labels, which carry the true relative displacement and relative rotation, to a standard normal distribution to obtain real standardized labels, mean 1 and variance 1, and use the real standardized labels to train the deep learning network model a first time;
S2. Use the deep learning network model obtained from the first training to predict on unlabeled data, and apply a first inverse standardization to the predictions using mean 1 and variance 1 to obtain pseudo labels;
S3. Randomly select a certain number of pseudo labels and real labels and mix them at a ratio of 0.2:1 to obtain mixed labels;
S4. Convert the mixed labels to a standard normal distribution to obtain mixed standardized labels, mean 2 and variance 2, and use the mixed standardized labels to train the deep learning network model a second time.
Here, the pseudo labels, real labels and mixed labels contain the relative displacement and relative rotation on the x, y and z axes.
Converting the real labels and mixed labels to a standard normal distribution means converting the relative displacement and relative rotation on the x, y and z axes to a standard normal distribution separately.
The deep learning network model is trained with the Adam optimizer, whose momentum is set to (0.9, 0.99); the learning rate of the first single-layer LTC-RNN and the second single-layer LTC-RNN is set to 0.001, the learning rate of the remaining modules is set to 0.00001, and the loss function is smooth_l1_loss.
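A short PyTorch sketch of the training setup just described: Adam with betas (0.9, 0.99) (the "momentum" pair above), learning rate 0.001 for the LTC-RNN parameters and 0.00001 for everything else, and smooth L1 as the loss. The helper names and the toy modules in the usage example are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_optimizer(model, ltc_modules):
    """Adam with betas (0.9, 0.99); lr 1e-3 for LTC-RNN parameters, 1e-5 for the rest."""
    ltc_params = {id(p) for m in ltc_modules for p in m.parameters()}
    groups = [
        {"params": [p for m in ltc_modules for p in m.parameters()], "lr": 1e-3},
        {"params": [p for p in model.parameters() if id(p) not in ltc_params], "lr": 1e-5},
    ]
    return torch.optim.Adam(groups, betas=(0.9, 0.99))

def pose_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """smooth_l1_loss over the standardized relative displacement and rotation."""
    return F.smooth_l1_loss(pred, target)

if __name__ == "__main__":
    ltc = nn.Linear(8, 8)                       # stand-in for an LTC-RNN parameter group
    model = nn.Sequential(ltc, nn.Linear(8, 6))
    opt = build_optimizer(model, [ltc])
    loss = pose_loss(model(torch.randn(4, 8)), torch.zeros(4, 6))
    loss.backward()
    opt.step()
    print(float(loss))
```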
The visual feature extraction module of the deep learning network model obtained from the second training receives two adjacent RGB frames stacked along the channel dimension and produces the visual features; at the same time, the inertial navigation feature extraction module receives the inertial navigation data between the two adjacent RGB frames and produces the inertial navigation features; the visual features and inertial navigation features are then concatenated along the row direction and input to the pose regression module to obtain relative displacement 1 and relative rotation 1; next, relative displacement 1 and relative rotation 1 are inverse-standardized a second time using mean 2 and variance 2 to obtain relative displacement 2 and relative rotation 2.
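A minimal sketch of this inference step, assuming a trained model that maps the stacked RGB frames and the intermediate IMU sequence to a standardized 6-dimensional prediction; the second inverse standardization with mean 2 and variance 2 follows the description above, while the function signature and tensor shapes are assumptions.

```python
import numpy as np
import torch

@torch.no_grad()
def estimate_relative_pose(model, frames_6ch: torch.Tensor, imu_seq: torch.Tensor,
                           mean2: np.ndarray, std2: np.ndarray):
    """Run the trained network, then apply the second inverse standardization.

    `model` is assumed to map (stacked RGB frames, IMU sequence) -> (batch, 6)
    standardized predictions; mean2/std2 are the statistics saved from step S4.
    """
    pred = model(frames_6ch, imu_seq).cpu().numpy()  # relative displacement 1 / rotation 1
    pred = pred * std2 + mean2                       # relative displacement 2 / rotation 2
    return pred[:, :3], pred[:, 3:]                  # (relative displacement, relative rotation)

if __name__ == "__main__":
    dummy = lambda frames, imu: torch.zeros(frames.shape[0], 6)  # placeholder network
    t, r = estimate_relative_pose(dummy, torch.randn(1, 6, 128, 416),
                                  torch.randn(1, 11, 6), np.zeros(6), np.ones(6))
    print(t.shape, r.shape)  # (1, 3) (1, 3)
```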
In summary, the self-learning method in this embodiment introduces pseudo labels to train the deep learning network model together with the real labels, which reduces the number of real labels required, unlike other methods that need a large number of real labels for training. The present invention uses the first single-layer LTC-RNN and the second single-layer LTC-RNN for inertial navigation feature extraction and pose regression respectively; their advantage is that the iterative calculation inside the first and second single-layer LTC-RNNs increases the ability to extract features, unlike other recurrent neural networks, which extract features with only a single calculation.
The above embodiment is a preferred implementation of the present invention, but the implementation of the present invention is not limited by the above embodiment; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the scope of protection of the present invention.

Claims (10)

  1. An LTC-DNN-based visual-inertial integrated navigation system, used for automatic driving and autonomous navigation of unmanned aerial vehicles, characterized in that the visual-inertial integrated navigation system comprises a deep learning network model composed of a visual feature extraction module, an inertial navigation feature extraction module, and a pose regression module connected in sequence, wherein:
    the visual feature extraction module is used to extract 1024-dimensional visual features; its input is two adjacent RGB frames stacked along the channel dimension, and its output is the 1024-dimensional visual features;
    the inertial navigation feature extraction module includes a first single-layer LTC-RNN with a 1024-dimensional hidden state; its input is the inertial navigation data between the two adjacent RGB frames, and its output is the 1024-dimensional inertial navigation features;
    the pose regression module includes, connected in sequence, an attention mechanism fusion submodule, a second single-layer LTC-RNN with a 1000-dimensional hidden state, and a fully connected regression submodule, wherein the input of the attention mechanism fusion submodule is the concatenated feature obtained by concatenating the visual features and the inertial navigation features, and it weights the visual and inertial navigation features to obtain weighted fusion features; the input of the second single-layer LTC-RNN is the weighted fusion features, and its output is the regression features; the input of the fully connected regression submodule is the regression features, and its output is an estimate of relative displacement and relative rotation.
  2. The LTC-DNN-based visual-inertial integrated navigation system according to claim 1, characterized in that the visual feature extraction module is formed by sequentially stacking 10 convolutional layers; the kernel sizes of the first three convolutional layers are 7×7, 5×5 and 5×5, the kernel size of the following seven convolutional layers is 3×3, the convolution stride of the fourth, sixth and eighth layers is 1, the convolution stride of the remaining layers is 2, and all 10 convolutional layers use the ReLU activation function.
  3. The LTC-DNN-based visual-inertial integrated navigation system according to claim 1, characterized in that the RGB images are resized to 416×128 before being input to the visual feature extraction module.
  4. The LTC-DNN-based visual-inertial integrated navigation system according to claim 1, characterized in that the first single-layer LTC-RNN and the second single-layer LTC-RNN are computed with the following formula:
    Figure PCTCN2022112625-appb-100001
    where h(t) is the hidden state of the LTC-RNN at the current moment, τ is a constant time constant, Δt is the time step, x(t) is the input data at the current moment, f(h(t), x(t), t, θ) is the deep learning network, θ is its trainable parameters, and t is the current moment; the first single-layer LTC-RNN and the second single-layer LTC-RNN are evaluated by feeding the data x(t) and h(t) into the above formula at the start of each evaluation, taking the current output h(t+Δt) as the next input h(t), and repeating this 6 times; the output h(t+Δt) of the sixth iteration is used as the calculation result of the first single-layer LTC-RNN and the second single-layer LTC-RNN.
  5. The LTC-DNN-based visual-inertial integrated navigation system according to claim 1, characterized in that the attention mechanism fusion submodule comprises two subnetworks with identical structure, each formed by stacking two fully connected layers; the first fully connected layer has dimension 2048 and is followed by the ReLU activation function, and the second fully connected layer has dimension 1024 and is followed by the Sigmoid activation function.
  6. The LTC-DNN-based visual-inertial integrated navigation system according to claim 1, characterized in that the fully connected regression submodule is composed of four fully connected layers, in which the dimension of the first fully connected layer is 512, the dimension of the second is 128, the dimension of the third is 64, and the dimension of the fourth is 6; the first three fully connected layers in the fully connected regression submodule are each followed by the ReLU activation function, and the fourth fully connected layer is not followed by any activation function.
  7. A self-learning method for the LTC-DNN-based visual-inertial integrated navigation system according to any one of claims 1 to 6, the self-learning method comprising the following steps:
    S1. converting the real labels, which carry the true relative displacement and relative rotation, to a standard normal distribution to obtain real standardized labels, mean 1 and variance 1, and using the real standardized labels to train the deep learning network model a first time;
    S2. using the deep learning network model obtained from the first training to predict on unlabeled data, and applying a first inverse standardization to the predictions using mean 1 and variance 1 to obtain pseudo labels;
    S3. randomly selecting a certain number of pseudo labels and real labels and mixing them at a ratio of 0.2:1 to obtain mixed labels;
    S4. converting the mixed labels to a standard normal distribution to obtain mixed standardized labels, mean 2 and variance 2, and using the mixed standardized labels to train the deep learning network model a second time.
  8. The self-learning method of an LTC-DNN-based visual-inertial integrated navigation system according to claim 7, characterized in that the pseudo labels, real labels and mixed labels contain the relative displacement and relative rotation on the x, y and z axes.
  9. The self-learning method of an LTC-DNN-based visual-inertial integrated navigation system according to claim 7, characterized in that converting the real labels and mixed labels to a standard normal distribution means converting the relative displacement and relative rotation on the x, y and z axes to a standard normal distribution separately.
  10. The self-learning method of an LTC-DNN-based visual-inertial integrated navigation system according to claim 7, characterized in that the deep learning network model is trained with the Adam optimizer, whose momentum is set to (0.9, 0.99); the learning rate of the first single-layer LTC-RNN and the second single-layer LTC-RNN is set to 0.001, the learning rate of the remaining modules is set to 0.00001, and the loss function is smooth_l1_loss.
PCT/CN2022/112625 2021-06-16 2022-08-15 LTC-DNN-based visual-inertial integrated navigation system and self-learning method WO2022262878A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110664888.9 2021-06-16
CN202110664888.9A CN113392904B (zh) 2021-06-16 2021-06-16 LTC-DNN-based visual-inertial integrated navigation system and self-learning method

Publications (1)

Publication Number Publication Date
WO2022262878A1 true WO2022262878A1 (zh) 2022-12-22

Family

ID=77621376

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/112625 WO2022262878A1 (zh) 2021-06-16 2022-08-15 LTC-DNN-based visual-inertial integrated navigation system and self-learning method

Country Status (2)

Country Link
CN (1) CN113392904B (zh)
WO (1) WO2022262878A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115793001A (zh) * 2023-02-07 2023-03-14 立得空间信息技术股份有限公司 Vision, inertial navigation and satellite navigation fusion positioning method based on inertial navigation multiplexing
CN115953839A (zh) * 2022-12-26 2023-04-11 广州紫为云科技有限公司 Real-time 2D gesture estimation method based on a recurrent architecture and coordinate-system regression
GB2615639A (en) * 2022-01-05 2023-08-16 Honeywell Int Inc Multiple inertial measurement unit sensor fusion using machine learning
CN116704026A (zh) * 2023-05-24 2023-09-05 国网江苏省电力有限公司南京供电分公司 Positioning method and apparatus, electronic device and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116989820B (zh) * 2023-09-27 2023-12-05 厦门精图信息技术有限公司 Intelligent navigation system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084131A (zh) * 2019-04-03 2019-08-02 华南理工大学 Semi-supervised pedestrian detection method based on a deep convolutional network
CN112556692A (zh) * 2020-11-27 2021-03-26 绍兴市北大信息技术科创中心 Visual and inertial odometry method and system based on an attention mechanism
CN112801201A (zh) * 2021-02-08 2021-05-14 华南理工大学 Standardization-based deep learning visual-inertial integrated navigation design method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940562B (zh) * 2017-03-09 2023-04-28 华南理工大学 Mobile robot wireless cluster system and neural network visual navigation method
US10885659B2 (en) * 2018-01-15 2021-01-05 Samsung Electronics Co., Ltd. Object pose estimating method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084131A (zh) * 2019-04-03 2019-08-02 华南理工大学 Semi-supervised pedestrian detection method based on a deep convolutional network
CN112556692A (zh) * 2020-11-27 2021-03-26 绍兴市北大信息技术科创中心 Visual and inertial odometry method and system based on an attention mechanism
CN112801201A (zh) * 2021-02-08 2021-05-14 华南理工大学 Standardization-based deep learning visual-inertial integrated navigation design method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RAMIN HASANI; MATHIAS LECHNER; ALEXANDER AMINI; DANIELA RUS; RADU GROSU: "Liquid Time-constant Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 14 December 2020 (2020-12-14), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081836317 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2615639A (en) * 2022-01-05 2023-08-16 Honeywell Int Inc Multiple inertial measurement unit sensor fusion using machine learning
CN115953839A (zh) * 2022-12-26 2023-04-11 广州紫为云科技有限公司 Real-time 2D gesture estimation method based on a recurrent architecture and coordinate-system regression
CN115953839B (zh) * 2022-12-26 2024-04-12 广州紫为云科技有限公司 Real-time 2D gesture estimation method based on a recurrent architecture and keypoint regression
CN115793001A (zh) * 2023-02-07 2023-03-14 立得空间信息技术股份有限公司 Vision, inertial navigation and satellite navigation fusion positioning method based on inertial navigation multiplexing
CN116704026A (zh) * 2023-05-24 2023-09-05 国网江苏省电力有限公司南京供电分公司 Positioning method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN113392904A (zh) 2021-09-14
CN113392904B (zh) 2022-07-26

Similar Documents

Publication Publication Date Title
WO2022262878A1 (zh) LTC-DNN-based visual-inertial integrated navigation system and self-learning method
CN110595466B (zh) Lightweight deep-learning-based inertial-assisted visual odometry implementation method
CN113393522B (zh) 6D pose estimation method based on depth information regressed from a monocular RGB camera
CN111275713A (zh) Cross-domain semantic segmentation method based on an adversarial self-ensembling network
US11100646B2 (en) Future semantic segmentation prediction using 3D structure
WO2020227651A1 (en) Methods, systems and computer program products for media processing and display
CN110533724B (zh) Monocular visual odometry computation method based on deep learning and an attention mechanism
CN109272493A (zh) Monocular visual odometry method based on a recurrent convolutional neural network
Sun et al. Unmanned surface vessel visual object detection under all-weather conditions with optimized feature fusion network in YOLOv4
CN113903011A (zh) Semantic map construction and localization method suitable for indoor parking lots
CN108288038A (zh) Night-time robot motion decision method based on scene segmentation
CN114022697A (zh) Vehicle re-identification method and system based on multi-task learning and knowledge distillation
CN114526728B (zh) Monocular visual-inertial localization method based on self-supervised deep learning
CN112556692A (zh) Visual and inertial odometry method and system based on an attention mechanism
CN114943757A (zh) UAV forest exploration system based on monocular scene depth prediction and deep reinforcement learning
CN117058474B (zh) Depth estimation method and system based on multi-sensor fusion
CN113160315B (zh) Semantic environment map representation method based on a dual quadric mathematical model
CN112945233B (zh) Globally drift-free simultaneous localization and mapping method for autonomous robots
Li et al. Multi-modal neural feature fusion for automatic driving through perception-aware path planning
Wang et al. LF-VISLAM: A SLAM framework for large field-of-view cameras with negative imaging plane on mobile agents
CN111611869B (zh) End-to-end monocular vision obstacle avoidance method based on serial deep neural networks
Jo et al. Mixture density-PoseNet and its application to monocular camera-based global localization
Dang et al. Real-time semantic plane reconstruction on a monocular drone using sparse fusion
CN112102399B (zh) Visual odometry algorithm based on generative adversarial networks
Wang et al. Attention guided unsupervised learning of monocular visual-inertial odometry

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22824350

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22824350

Country of ref document: EP

Kind code of ref document: A1