WO2024098240A1 - Gastrointestinal endoscopy visual reconstruction navigation system and method - Google Patents

Gastrointestinal endoscopy visual reconstruction navigation system and method

Info

Publication number
WO2024098240A1
Authority
WO
WIPO (PCT)
Prior art keywords
map
topological
module
network
navigation
Prior art date
Application number
PCT/CN2022/130535
Other languages
French (fr)
Chinese (zh)
Inventor
熊璟
谭敏
夏泽洋
谢高生
Original Assignee
中国科学院深圳先进技术研究院
Priority date
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 filed Critical 中国科学院深圳先进技术研究院
Priority to PCT/CN2022/130535 priority Critical patent/WO2024098240A1/en
Publication of WO2024098240A1 publication Critical patent/WO2024098240A1/en


Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B1/00: Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B1/31: Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes, for the rectum, e.g. proctoscopes, sigmoidoscopes, colonoscopes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods

Definitions

  • the embodiments of the present application relate to the field of medical image processing technology, and in particular to a digestive endoscopy visual reconstruction navigation system and method.
  • Colorectal cancer is the digestive system cancer with the highest prevalence in China, and colonoscopy is the best means of detecting malignant polyps. During colonoscopy, doctors rely on clinical experience to observe endoscopic images while operating the colonoscope's control handle to advance. However, the tissue-interior image regions obtained by digestive endoscopy have weak textures, many repeated textures, and large changes in scene lighting. In addition, camera movement produces motion blur, making image feature extraction difficult. Therefore, when an endoscopic image contains a "no information frame", the cavity is lost and the correct direction cannot be identified.
  • the embodiments of the present application provide a digestive endoscope visual reconstruction navigation system and method to solve the problem that feature point extraction of endoscopic images under low light and little texture is difficult and the direction cannot be accurately identified.
  • an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system, comprising: a data acquisition module, a map construction module and a path planning module connected in sequence; the data acquisition module is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module; the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and the image depth data; and according to the optical flow self-supervised network and the improved residual network, perform camera pose estimation and depth map estimation respectively to construct an environment map; the path planning module is used to extract a topological center line according to the environment map, and perform path planning and navigation around the topological center line.
  • the map construction module includes a camera pose estimation module and a depth map estimation module; the camera pose estimation module is used to obtain an estimated camera pose based on the optical flow self-supervised network; the depth map estimation module is used to obtain an estimated endoscopic image depth based on the improved residual network.
  • the map construction module constructs an environment map through three-dimensional reconstruction according to the estimated camera pose and the estimated endoscopic image depth.
  • the path planning module includes a topological centerline acquisition module and a navigation module; the topological centerline acquisition module is used to obtain the topological centerline of the intestinal cavity in combination with the pipeline characteristics of the intestinal cavity; the navigation module is used to extract the topological centerline and perform path planning and navigation around the topological centerline.
  • an embodiment of the present application also provides a digestive endoscope visual reconstruction and navigation method, which uses the above-mentioned digestive endoscope visual reconstruction and navigation system for navigation, including the following steps: obtaining virtual camera pose data and image depth data; building an optical flow self-supervised network based on the virtual camera pose data, and obtaining an estimated camera pose based on the optical flow self-supervised network; building an improved residual network based on the image depth data, and obtaining an estimated endoscopic image depth based on the improved residual network; constructing an environmental map based on the estimated camera pose and the estimated endoscopic image depth; extracting a topological centerline based on the environmental map, and performing path planning and navigation around the topological centerline.
  • the estimated camera pose is obtained based on the optical flow self-supervised network, including: taking at least two pictures as input, performing network training, and obtaining a feature descriptor corresponding to each picture; matching the feature descriptors according to a sorting rule to obtain corresponding pixel points between different pictures; constructing a confidence score loss function to extract feature points from the pixel points; and obtaining an estimated camera pose based on the feature points and the geometric relationship between different pictures.
  • two images are used as input to perform network training to obtain two feature descriptors; the feature descriptors of the two images are matched according to a sorting rule to obtain corresponding pixel points between the two images; the confidence score loss function is shown in formula (1) in the description below;
  • the feature points are extracted through the loss function of average precision;
  • the loss function of average precision is shown in formula (2) in the description below;
  • (x_i, y) and (x_i, y') are the position coordinates of the corresponding pixel points in the two images with overlapping areas.
  • an estimated endoscopic image depth is obtained through convolution and batch normalization processing;
  • the improved residual network includes an encoder module and a decoder module, and the decoder module uses convolution blocks with an activation function, together with loss functions, for decoding;
  • the activation function is an exponential linear unit function, shown in formula (3) in the description below;
  • the loss function includes a first loss function, a second loss function and a third loss function; the first loss function is shown in formula (4) in the description below, where:
  • D_i(p) represents the real depth value image;
  • D_i'(p) represents the predicted depth map;
  • h_i = log D_i'(p) − log D_i(p);
  • T represents the number of valid values left after filtering, p ∈ T;
  • the second loss function is shown in formula (5) and the third loss function in formula (6) in the description below;
  • l_i(p) represents the color image;
  • the gradient terms denote derivatives of the color image and the depth image in the x and y directions, yielding the gradient images of the color image and the depth image.
  • a topological centerline is extracted based on the environmental map, and path planning and navigation are performed around the topological centerline, including: obtaining the topological centerline of the intestinal cavity based on the pipeline characteristics of the intestinal cavity; constructing a topological map in the movable cavity in the intestinal cavity based on the topological centerline; and performing path planning from the current position of the camera to the target position based on the topological map.
  • a topological map is constructed in a movable cavity in the intestinal cavity based on the topological centerline, including: traversing all voxels in the free space in the metric map; comparing the parent direction of each voxel with the parent direction of the voxel adjacent to it; the parent direction is the direction from the current voxel to the nearest occupied point voxel; based on the angle of the topological centerline, the voxels are filtered and the key points are retained as nodes of the topological map; the nodes are connected to obtain a topological map.
  • the embodiment of the present application mainly aims at the problem that it is difficult to extract feature points and cannot accurately identify directions of endoscopic images under low light and low texture characteristics, and proposes a digestive endoscope visual reconstruction navigation system and method, the system includes: a data acquisition module, a map construction module and a path planning module; the data acquisition module is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module; the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and image depth data; and according to the optical flow self-supervised network and the improved residual network, the camera pose estimation and depth map estimation are performed respectively to construct an environment map; the path planning module is used to extract the topological center line according to the environment map, and perform path planning and navigation around the topological center line.
  • the digestive endoscopy visual reconstruction navigation system and method provided in the embodiment of the present application can globally perceive the endoscope environment, and the visual reconstruction can record the historical trajectory of the endoscope.
  • the feature point network based on optical flow self-supervision constructed in the present application is more suitable for the characteristics of weak texture and smooth surface of endoscopic images, and can solve the problem that the feature points of endoscopic images under low light and few textures are difficult to extract.
  • the embodiment of the present application has built a data acquisition module to solve the problem that clinical images do not have true value labels. Therefore, the present application does not require pose true value labels for network training. The labels are only used to calculate accuracy indicators and errors in the verification stage.
  • FIG1 is a schematic diagram of the structure of a visual reconstruction navigation system for digestive endoscopy provided in one embodiment of the present application;
  • FIG2 is a schematic diagram of the framework structure of a digestive endoscope visual reconstruction navigation system provided in one embodiment of the present application;
  • FIG3 is a schematic diagram of a process of data acquisition performed on a virtual colon simulation platform by a data acquisition module provided in an embodiment of the present application;
  • FIG4 is a flow chart of a digestive endoscopy visual reconstruction navigation method provided in one embodiment of the present application.
  • FIG5 is a schematic diagram of a process of estimating a camera pose based on an optical flow self-supervisory network according to an embodiment of the present application
  • FIG6 is a schematic diagram of a process for estimating the depth of an endoscopic image based on an improved residual network provided in an embodiment of the present application.
  • existing digestive endoscopy visual navigation methods have difficulty extracting feature points from endoscopic images under low light and little texture, and cannot accurately identify the direction.
  • the existing visual navigation methods of digestive endoscopy are divided into traditional image processing algorithms and deep learning related algorithms.
  • the traditional image processing algorithm uses the significant contours and dark areas of the intestinal cavity, which are specifically divided into dark area extraction method, contour recognition method, etc.
  • the existing technology often combines the two as the basis for navigation. Since the endoscope advances in the closed intestinal cavity and the light is from far to near, the dark area is the most important and most significant feature for doctors to judge the direction of advancement. In addition, there are usually obvious muscle rings inside the colon. When the cavity is clearly visible, the semi-closed muscle curve shape of the intestine can be seen.
  • the contour recognition method, based on the structural characteristics of the colon itself, treats the direction of the curvature radius as pointing to the deepest part of the intestine for navigation.
  • deep learning related algorithms, in order to map the environment and localize the robot, must estimate the camera pose and depth map from the input image stream.
  • the pose of the camera is equivalent to the transformation from the world coordinate system to the camera coordinate system, which is also called external parameters in three-dimensional vision.
  • to complete the transformation from the camera coordinate system to the pixel coordinate system, an intrinsic parameter matrix related to the properties of the camera itself is also required.
  • camera pose estimation is also known as the front end in the SLAM (Simultaneous Localization and Mapping) framework, called visual odometry. After the camera pose is obtained, if the pixel depth corresponding to the color image frame can be obtained, a map of the environment can be reconstructed.
  • this self-constraint states that, in the transformation from the pixel coordinate system of the image to the camera coordinate system and finally to the world coordinate system, the three-dimensional point position can be recovered from the image if the depth map and the camera pose are known, satisfying the geometric consistency constraint, as sketched below.
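As an illustration of this constraint, the sketch below back-projects a pixel into the world frame given its depth and the camera pose (a minimal sketch; the intrinsics and pose values are illustrative assumptions, not values from the application):

```python
import numpy as np

# Illustrative pinhole intrinsics (fx, fy, cx, cy are assumed values).
K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_world(u, v, depth, R_wc, t_wc):
    """Back-project pixel (u, v) with known depth into the world frame.

    R_wc (3x3) and t_wc (3,) are the camera-to-world rotation and
    translation, i.e. the camera pose: X_world = R_wc @ X_cam + t_wc.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # normalized camera ray
    X_cam = depth * ray                             # 3D point in camera frame
    return R_wc @ X_cam + t_wc

# Example: identity pose, 5 cm depth at the image center.
X = pixel_to_world(320.0, 240.0, 0.05, np.eye(3), np.zeros(3))
```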
  • the embodiment of the present application provides a digestive endoscope visual reconstruction navigation system and method
  • the system includes: a data acquisition module, a map construction module and a path planning module connected in sequence;
  • the data acquisition module is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module;
  • the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and image depth data; and according to the optical flow self-supervised network and the improved residual network, the camera pose estimation and depth map estimation are performed respectively to build an environmental map;
  • the path planning module is used to extract the topological centerline according to the environmental map, and perform path planning and navigation around the topological centerline.
  • in the digestive endoscope visual reconstruction navigation system provided in the embodiment of the present application, first, a data acquisition module is built to solve the problem that clinical images have no ground-truth labels; secondly, a pose estimation network and a depth map prediction network are built using deep networks to construct an environmental map; finally, a topological centerline navigation algorithm is used to complete path planning.
  • the digestive endoscopy visual reconstruction navigation system and method provided in the embodiments of the present application can solve the problems of existing digestive endoscopy visual navigation methods, such as difficulty in extracting feature points of endoscopic images under low light and little texture, and inability to accurately identify directions.
  • an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system, including: a data acquisition module 101, a map construction module 102 and a path planning module 103 connected in sequence; the data acquisition module 101 is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module 102; the map construction module 102 is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and image depth data, and, according to the optical flow self-supervised network and the improved residual network, perform camera pose estimation and depth map estimation respectively and construct an environment map; the path planning module 103 is used to extract the topological centerline according to the environment map, and perform path planning and navigation around the topological centerline.
  • this application designs a digestive endoscope visual reconstruction and navigation system based on self-supervised deep learning.
  • the system can better assist doctors in navigation, and the constructed environmental map can also assist in diagnosis and record the location of lesions. If the algorithm is provided to a robot, the robot can complete autonomous navigation.
  • the data acquisition module 101 is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module 102. Since there are a lot of blurry situations in the real clinical colon surgery video stream, and the depth map and pose truth labels required for training the deep network are difficult to mark manually, the training data set collection work of the data acquisition module 101 is carried out on the virtual colon simulation platform.
  • Figure 3 shows a flow chart of data acquisition by the data acquisition module 101 on the virtual colon simulation platform.
  • the virtual camera pose and depth data collected by the data acquisition module 101 are used as the ground-truth labels during deep network training, as well as for subsequent related evaluations. As shown in Figure 3, the data acquisition module 101 collects data in the internal simulation environment of the colon on the virtual colon simulation platform.
  • the colon model obtained by CT scanning is imported into the virtual simulation platform, and then the environment and highlight rendering can be performed through the simulation platform.
  • the corresponding camera pose and depth image can be collected by writing a custom script. These two parts of data will be used as the truth labels for the subsequent training network, so as to perform verification and other work.
  • the map construction module 102 includes a camera pose estimation module 1021 and a depth map estimation module 1022; the camera pose estimation module 1021 is used to obtain an estimated camera pose based on an optical flow self-supervised network; the depth map estimation module 1022 is used to obtain an estimated endoscopic image depth based on an improved residual network.
  • the map construction module 102 is the core module, and the map construction module 102 can be divided into pose estimation based on the optical flow self-supervised network and monocular endoscopic image depth map estimation based on the improved residual network, wherein the camera pose estimation module 1021 is used to realize pose estimation based on the optical flow self-supervised network; the depth map estimation module 1022 is used to realize monocular endoscopic image depth map estimation based on the improved residual network; based on the camera pose and intestinal cavity depth map output by the two networks, the environmental map can be reconstructed, and the path planning and navigation algorithms can be developed on this basis.
  • the main steps of the map construction module 102 to construct the environment map are as follows: (1) Build an optical flow self-supervised network and an improved residual network, which are used for camera pose estimation and depth map estimation respectively, and train them. (2) Use the optical flow self-supervised network to obtain the estimated camera pose. (3) Use the monocular endoscopic image depth map estimation network based on the improved residual network to obtain the estimated endoscopic image depth. (4) Use the camera pose and endoscopic image depth obtained in steps (2) and (3) to construct the environment map.
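These four steps can be summarized in the following orchestration sketch (a minimal sketch; `pose_net`, `depth_net` and `fuse` are hypothetical placeholders for the two trained networks and the reconstruction step described above, not names from the application):

```python
from typing import Callable, Iterable, List

def build_environment_map(image_stream: Iterable,
                          pose_net: Callable,
                          depth_net: Callable,
                          fuse: Callable):
    """Hypothetical orchestration of steps (2)-(4); both networks are
    assumed to be already trained as in step (1)."""
    poses: List = []
    depths: List = []
    prev = None
    for frame in image_stream:
        if prev is not None:
            # Step (2): relative camera pose from the optical-flow
            # self-supervised network (hypothetical interface).
            poses.append(pose_net(prev, frame))
        # Step (3): per-frame depth map from the improved residual network.
        depths.append(depth_net(frame))
        prev = frame
    # Step (4): fuse the posed depth maps into one environment map,
    # e.g. a point cloud or TSDF volume (fusion left abstract here).
    return fuse(poses, depths)
```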
  • the map construction module 102 constructs an environment map through three-dimensional reconstruction according to the estimated camera pose and the estimated endoscopic image depth.
  • the path planning module 103 includes a topological centerline acquisition module 1031 and a navigation module 1032; the topological centerline acquisition module 1031 is used to obtain the topological centerline of the intestinal cavity in combination with the pipeline characteristics of the intestinal cavity; the navigation module 1032 is used to extract the topological centerline, and perform path planning and navigation around the topological centerline. Specifically, the navigation module 1032 constructs a topological map for describing the cavity in the movable cavity according to the topological centerline, and then performs path planning from the current position of the camera to the target position around the topological centerline. It should be noted that the target position is a special position seen during the endoscope advancement stage, and the path planning to the special position is completed during the retreat process; the special position includes one or both of the lesion position and the polyp position.
  • the main function of the path planning module 103 is to directly extract the topological center line of the cavity in combination with the pipeline characteristics of the intestinal cavity, construct a simple topological map for describing the cavity in the movable cavity, and then plan the path from the current position of the camera to the target position (also called the target point) based on the topological map.
  • the target point is defined as a special position such as a lesion or polyp seen during the endoscope's advancement stage, and the path planning to these special positions is completed during the retreat process.
  • the special location may be not only the lesion location and the polyp location, but also other locations with special markings.
  • the embodiment of the present application further provides a digestive endoscope visual reconstruction navigation method, which uses the above-mentioned digestive endoscope visual reconstruction navigation system for navigation, and includes the following steps:
  • Step S1 obtaining virtual camera pose data and image depth data.
  • Step S2 Based on the virtual camera pose data, an optical flow self-supervised network is built, and based on the optical flow self-supervised network, an estimated camera pose is obtained; based on the image depth data, an improved residual network is built, and based on the improved residual network, an estimated endoscopic image depth is obtained.
  • Step S3 construct an environment map based on the estimated camera pose and the estimated endoscopic image depth.
  • Step S4 extract the topological center line based on the environment map, and perform path planning and navigation around the topological center line.
  • the digestive endoscope visual reconstruction navigation method provided in this application is also called an endoscope navigation algorithm, which includes: a pose estimation algorithm based on an optical flow self-supervised network, a monocular endoscope image depth map estimation algorithm based on an improved residual network, and a topological centerline path planning and navigation algorithm to perform digestive endoscope visual reconstruction navigation.
  • step S1 the virtual camera pose data and image depth data are obtained through the data acquisition module 101.
  • the virtual camera pose and depth data collected by the data acquisition module 101 are used as true value labels during deep network training, as well as subsequent related evaluations.
  • step S2 is executed.
  • Step S2 can be divided into two steps: step S201, pose estimation based on optical flow self-supervised network; step S202, monocular endoscope image depth map estimation based on improved residual network.
  • the following describes the specific steps of step S201, pose estimation based on the optical flow self-supervised network.
  • an estimated camera pose is obtained based on the optical flow self-supervised network, including: taking at least two pictures as input, performing network training, and obtaining a feature descriptor corresponding to each picture; matching the feature descriptors according to a sorting rule to obtain corresponding pixel points between different pictures; constructing a confidence score loss function to extract feature points from the pixel points; and obtaining an estimated camera pose based on the feature points and the geometric relationship between different pictures.
  • in some embodiments, two images are used as input to perform network training to obtain two feature descriptors; the feature descriptors of the two images are matched according to a sorting rule to obtain corresponding pixel points between the two images; the confidence score loss function is shown in formula (1) in the description below.
  • the feature points are extracted through the loss function of average precision, shown in formula (2) in the description below.
  • the optical flow self-supervised network framework uses optical flow to construct a confidence score loss function to extract more robust feature points. Subsequently, the epipolar constraint between the two views is used to estimate the essential matrix, from which the pose is recovered.
  • the parameters of each node of the optical flow self-supervised network are shown in Figure 5.
  • the network takes two images as input, which are represented by img1 and img2 respectively.
  • after the Conv convolution, ReLU activation function and batch normalization (BN) layer operations, two feature descriptors are finally output.
  • the feature descriptors of the two images are matched according to the sorting rules to find the corresponding pixels between the two images (called corresponding points).
  • the matching sorting rules are self-supervised, driven by the pre-calculated optical flow.
  • the additional confidence score can help select more stable and reliable feature points from the corresponding points and filter out those with lower scores.
  • the design of the loss function introduces the optical flow of two additional images, which are generated during the data loading phase.
  • the specific values in the optical flow vector represent the position coordinates (x, y) of each pixel of img1 in img2.
  • the confidence evaluation loss function is shown in formula (1), where the confidence score R_ij ∈ [0, 1]; the larger R_ij, the greater the probability that the feature descriptor is a feature point; k ∈ [0, 1] is a threshold hyperparameter, which is usually set to 0.5 in the network.
  • when the calculated average precision AP(i, j) of a pixel position in the image is less than k, the loss drives R_ij to be smaller.
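A minimal PyTorch-style sketch of this behavior follows, assuming the form L = 1 − [AP·R_ij + k·(1 − R_ij)], which is consistent with the properties described above (the published formula image is not reproduced in this text, so the exact form is an assumption):

```python
import torch

def confidence_loss(ap, r, k=0.5):
    """Confidence-score loss as reconstructed here:
    L = 1 - [AP(i,j) * R_ij + k * (1 - R_ij)].

    ap, r: per-pixel average precision and confidence score, both in [0, 1].
    When AP < k, minimizing L pushes R_ij toward 0 (unreliable point);
    when AP > k, it pushes R_ij toward 1 (reliable feature point).
    """
    return (1.0 - (ap * r + k * (1.0 - r))).mean()

# Example with random AP and confidence maps of 8 x 64 pixels.
loss = confidence_loss(torch.rand(8, 64), torch.rand(8, 64))
```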
  • Average Precision is an evaluation index for measuring classification results in multi-label classification. It is used here as a loss function to minimize the matching error between two feature descriptors.
  • the matching of feature description vectors can be modeled as a sorting optimization problem, that is, in two images I and I' with overlapping areas, the distance (such as Euclidean distance) between each feature description vector in image I and the feature description vector in image I' is calculated. After obtaining the distance, sort them from small to large, and the one with the smallest distance is the matching feature.
  • the true value of the label is obtained by sparse sampling of the optical flow in Figure 5, which is equivalent to knowing the matching relationship between the two frames in advance. After the feature points are extracted, the pose estimation needs to be performed using the classic two-view geometric relationship.
  • the present invention proposes a deep learning network based on self-supervised feature point extraction, which directly self-supervises the feature description extracted by the network through information such as optical flow, thereby extracting more robust feature points in the image.
  • This self-supervised learning route can solve the problem of feature point extraction of endoscopic images under low light and low texture characteristics. After extracting the feature points, the feature descriptors of the two images are matched, and finally the camera pose is solved by the traditional multi-view geometric algorithm.
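This final step can be sketched with OpenCV's standard two-view routines (a minimal sketch; `pts1` and `pts2` stand for the matched feature point coordinates produced by the network, and `K` for the camera intrinsic matrix, both assumed to be available):

```python
import cv2

def recover_pose(pts1, pts2, K):
    """Estimate the relative camera pose from matched points via the
    essential matrix and the epipolar constraint (classic two-view geometry).

    pts1, pts2: float arrays of shape (N, 2) with matched pixel coordinates.
    """
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # Decompose E and disambiguate the solution with a cheirality check.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # rotation (3x3) and unit-norm translation (3x1)
```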
  • the embodiment of the present application adopts a self-supervised optical flow deep network output algorithm to obtain feature points. It is understandable that feature points can also be obtained through other algorithms.
  • the process of solving the feature points of two images can also be replaced by traditional SIFT (Scale-Invariant Feature Transform) feature points or ORB (Oriented FAST and Rotated BRIEF) feature points.
  • however, the self-supervised optical flow deep network used in this application outputs feature points that are more stable than the above-mentioned algorithms, enabling a more reliable pose solution.
  • the monocular endoscopic image depth map prediction process can be replaced by other custom supervised networks.
  • as described above, step S2 includes step S201 and step S202. The following describes step S202, monocular endoscopic image depth map estimation based on the improved residual network.
  • in some exemplary embodiments, based on the improved residual network, an estimated endoscopic image depth is obtained through convolution and batch normalization processing;
  • the improved residual network includes an encoder module and a decoder module, and the decoder module uses convolution blocks with an activation function, together with loss functions, for decoding;
  • the activation function is an exponential linear unit function, shown in formula (3) in the description below;
  • the loss function includes a first loss function, a second loss function and a third loss function; the first loss function is shown in formula (4) in the description below, where:
  • D_i(p) represents the real depth value image;
  • D_i'(p) represents the predicted depth map;
  • h_i = log D_i'(p) − log D_i(p);
  • T represents the number of valid values left after filtering, p ∈ T;
  • the second loss function is shown in formula (5) and the third loss function in formula (6) in the description below;
  • l_i(p) represents the color image;
  • the gradient terms denote derivatives of the color image and the depth image in the x and y directions, yielding the gradient images of the color image and the depth image.
  • the depth estimation of monocular endoscopic images improves upon the classic 18-layer residual network (ResNet-18).
  • the deep network architecture based on the improved residual network is shown in Figure 6. It mainly uses convolution combined with batch normalization (BN) to extract features.
  • the deep network is mainly composed of an encoder (Encoder) and a decoder (Decoder).
  • the complete ResNet is used in the encoder part, and convolution blocks with the exponential linear unit (ELU) activation function are used directly in the decoder part.
  • in the encoding stage, downsampling operations are performed in each Basic Block to gradually increase the number of channels of the feature vector; finally, an average pooling layer (Avg Pool) and a fully connected layer (FN) are added to extract a 512-dimensional feature vector.
  • in the decoding stage, a 3×3 convolution block with the ELU activation function is used directly to achieve upsampling.
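Such a decoder stage might look as follows in PyTorch (a minimal sketch assuming nearest-neighbor upsampling after each 3×3 convolution; the exact layer layout of the application's Figure 6 may differ):

```python
import torch.nn as nn

class UpConvBlock(nn.Module):
    """Decoder block: 3x3 convolution + ELU activation, then 2x upsampling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ELU(inplace=True),
            nn.Upsample(scale_factor=2, mode="nearest"),
        )

    def forward(self, x):
        return self.block(x)
```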
  • the present application designs three loss functions, namely the first loss function, the second loss function and the third loss function, as shown in formula (4), formula (5) and formula (6).
  • T represents the number of valid values left after the validity mask filtering, p ∈ T.
  • Both the first loss function and the second loss function are difference losses used to directly compare the differences between two depth maps.
  • Di (p) represents the true depth value image
  • D_i'(p) represents the predicted depth map. Since the depth map predicted by ResNet is too smooth and loses some detailed texture information, the present application improves on the over-smoothness of the ResNet-predicted depth map and proposes a third loss function, as shown in formula (6).
  • l i (p) represents the color image
  • the gradient terms denote derivatives of the color RGB image and the depth image in the x and y directions, yielding the gradient images of the color image and the depth image.
  • the basis is that the gradients of pixels at some curved edges in digestive endoscopy images are usually larger and the changes are more drastic.
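A minimal PyTorch sketch of the three losses, assuming the reconstructed forms given for formulas (4)-(6) in the description below (a scale-invariant log difference, a direct L1 difference, and an edge-aware gradient term; the published formula images are not reproduced in this text, so these forms are assumptions):

```python
import torch

def depth_losses(d_pred, d_true, img, eps=1e-6):
    """Sketch of the three depth losses; d_pred/d_true: (B,1,H,W), img: (B,3,H,W)."""
    mask = d_true > 0                         # validity mask; T = mask.sum()
    h = torch.log(d_pred[mask] + eps) - torch.log(d_true[mask] + eps)
    loss1 = (h ** 2).mean() - h.mean() ** 2   # formula (4), as reconstructed
    loss2 = (d_pred[mask] - d_true[mask]).abs().mean()  # formula (5), as reconstructed

    # Gradients of the predicted depth and the color image in x and y.
    dx_d = (d_pred[:, :, :, 1:] - d_pred[:, :, :, :-1]).abs()
    dy_d = (d_pred[:, :, 1:, :] - d_pred[:, :, :-1, :]).abs()
    dx_i = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean(1, keepdim=True)
    dy_i = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean(1, keepdim=True)
    # Formula (6), as reconstructed: permit depth edges where the image has edges.
    loss3 = (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
    return loss1, loss2, loss3
```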
  • step S3 is executed to construct an environment map through three-dimensional reconstruction according to the estimated camera pose and the estimated endoscope image depth.
  • step S4 is executed to construct a topology map based on the environment map and perform path planning around the topology map.
  • step S4 extracting a topological centerline based on the environment map, and performing path planning and navigation around the topological centerline includes the following steps:
  • Step S401 based on the pipeline characteristics of the intestinal cavity, obtain the topological center line of the intestinal cavity.
  • Step S402 construct a topological map in the traversable cavity in the intestinal cavity based on the topological center line.
  • Step S403 Based on the topological map, a path is planned from the current position of the camera to the target position.
  • step S4 is mainly an operation of path planning and navigation based on the topological centerline.
  • the basis of path planning is the topological centerline of the map.
  • the topological centerline can also be called the skeleton of the 3D generalized Voronoi diagram (GVD).
  • the generation of the GVD depends on the Euclidean signed distance function (ESDF) metric map.
  • after post-processing such as skeleton refinement and pruning, a complete sparse topological map description is obtained as the forward route for navigation.
  • step S402 constructs a topological map in the traversable cavity in the intestinal cavity based on the topological center line, comprising the following steps:
  • Step S4021 traverse all voxels in the free space in the metric map.
  • Step S4022 Compare the parent direction of each voxel with the parent direction of its adjacent voxel; the parent direction is the direction from the current voxel to the nearest occupied point voxel.
  • Step S4023 based on the angle of the topological centerline, filter the voxels, retain the key points as nodes of the topological map; connect the nodes to obtain the topological map.
  • the GVD extraction process includes: first, traversing all voxels in the free space of the ESDF; next, comparing each voxel's parent direction with the parent directions of its 6-connected neighbors, where the parent direction is defined as the direction from the current voxel to the nearest occupied voxel; then, filtering voxels according to the preset GVD angle between these parent directions. Finally, after the redundant voxels are filtered out, the remaining key points serve as the nodes of the topological map, and the nodes are connected to obtain the topological map.
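A minimal numpy sketch of this voxel filtering step follows; `free` and `parent_dir` are assumed to be precomputed from the ESDF (a dense boolean free-space grid and a field of unit parent-direction vectors, respectively), and the angle criterion is one plausible reading of the preset GVD angle:

```python
import numpy as np

NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def extract_gvd_nodes(free, parent_dir, max_angle_deg=45.0):
    """Keep free-space voxels whose parent direction (direction to the nearest
    occupied voxel) disagrees with a 6-connected neighbor's by more than the
    preset GVD angle; such voxels lie near the topological centerline.

    free: (X, Y, Z) bool grid; parent_dir: (X, Y, Z, 3) unit vectors.
    """
    cos_thresh = np.cos(np.deg2rad(max_angle_deg))
    nodes = []
    for idx in np.argwhere(free):
        d = parent_dir[tuple(idx)]
        for off in NEIGHBORS:
            n = tuple(idx + off)
            inside = all(0 <= n[k] < free.shape[k] for k in range(3))
            if not inside or not free[n]:
                continue
            if np.dot(d, parent_dir[n]) < cos_thresh:  # angle exceeds threshold
                nodes.append(tuple(idx))               # key point: map node
                break
    return nodes  # nodes to be connected into the topological map
```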
  • the digestive endoscopy visual reconstruction navigation system and method provided by the present invention are feasible after experimental simulation.
  • a total of 21,671 colonoscopy image data collected by the virtual platform were used in the training phase, and two sets of virtual data, 400 and 447 respectively, were used in the test phase.
  • in addition, two sets of clinical data, containing 82 and 109 images respectively, were used for preliminary experimental verification of feasibility.
  • the calculated accuracy metrics a1 and a2 of the predicted depth map are 0.7637 and 0.9471 respectively, and the average RMSE of the error is 0.0929, expressed in depth-value grayscale units.
  • the above errors are all within the controllable range; the evaluation indicators are calculated as follows, where d* represents the predicted depth map and d represents the true depth map (the standard depth accuracy and RMSE metrics):

$$a_n = \frac{1}{|d|}\left|\left\{ p : \max\!\left(\frac{d^*(p)}{d(p)}, \frac{d(p)}{d^*(p)}\right) < 1.25^n \right\}\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{|d|}\sum_{p}\big(d^*(p) - d(p)\big)^2}$$
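These indicators can be computed as follows (a minimal numpy sketch of the standard a_n and RMSE depth metrics):

```python
import numpy as np

def depth_metrics(d_pred, d_true):
    """Threshold accuracies a1, a2 and RMSE between predicted and true depth."""
    ratio = np.maximum(d_pred / d_true, d_true / d_pred)
    a1 = (ratio < 1.25).mean()        # fraction of pixels within 1.25x
    a2 = (ratio < 1.25 ** 2).mean()   # fraction within 1.25^2 x
    rmse = np.sqrt(((d_pred - d_true) ** 2).mean())
    return a1, a2, rmse
```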
  • the present application proposes a digestive endoscope visual reconstruction navigation system and method, which includes: a data acquisition module, a map construction module and a path planning module; the data acquisition module is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module; the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and image depth data; and according to the optical flow self-supervised network and the improved residual network, the camera pose estimation and depth map estimation are performed respectively to construct an environment map; the path planning module is used to extract the topological center line according to the environment map, and perform path planning and navigation around the topological center line.
  • the digestive endoscope visual reconstruction navigation system and method provided in the embodiments of the present application can globally perceive the endoscopic environment, and the visual reconstruction can record the historical trajectory of the endoscope.
  • the feature point network based on optical flow self-supervision constructed in the present application is more suitable for the characteristics of weak texture and smooth surface of endoscopic images, and can solve the problem of difficulty in extracting feature points of endoscopic images under low light and few textures.
  • the embodiment of the present application has built a data acquisition module to solve the problem of clinical images without true value labels. Therefore, the present application does not require pose true value labels for network training. The labels are only used to calculate accuracy indicators and errors in the verification stage.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Surgery (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Radiology & Medical Imaging (AREA)
  • Optics & Photonics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biophysics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Endoscopes (AREA)
  • Image Analysis (AREA)

Abstract

Provided are a gastrointestinal endoscopy visual reconstruction navigation system and method. The system comprises a data acquisition module (101), a map construction module (102), and a path planning module (103). The data acquisition module (101) is configured for acquiring virtual camera pose data and image depth data, and sending the acquired virtual camera pose data and image depth data to the map construction module (102). The map construction module (102) is configured for building an optical flow self-supervision network and an improved residual network according to the virtual camera pose data and the image depth data; and performing camera pose estimation and depth map estimation according to the optical flow self-supervision network and the improved residual network, respectively, to construct an environment map. The path planning module (103) is configured for extracting a topological center line according to the environment map and performing path planning and navigation around the topological center line. Therefore, the problems of difficulty in extracting feature points and inability to accurately discern the direction in an endoscopic image under the characteristics of low illumination and few textures can be solved.

Description

A digestive endoscopy visual reconstruction navigation system and method

Technical Field

The embodiments of the present application relate to the field of medical image processing technology, and in particular to a digestive endoscopy visual reconstruction navigation system and method.

Background Art

Colorectal cancer is the digestive system cancer with the highest prevalence in China, and colonoscopy is the best means of detecting malignant polyps. During colonoscopy, doctors rely on clinical experience to observe endoscopic images while operating the colonoscope's control handle to advance. However, the tissue-interior image regions obtained by digestive endoscopy have weak textures, many repeated textures, and large changes in scene lighting. In addition, camera movement produces motion blur, making image feature extraction difficult. Therefore, when an endoscopic image contains a "no information frame", the cavity is lost and the correct direction cannot be identified.

Existing digestive endoscopy visual navigation methods are divided into traditional image processing algorithms and deep learning related algorithms. Traditional image processing algorithms use features such as the prominent contours and dark areas of the intestinal cavity as the basis for navigation; deep learning related algorithms estimate the camera pose and depth map from the input image stream. However, for traditional image processing algorithms, effectiveness drops sharply when the image is occluded or blurred; when the endoscope is too close to the intestinal wall, the angle of light received by the endoscope head is too narrow, and the intestinal muscle lines and dark areas may even be confused, making image feature extraction difficult. For supervised deep learning methods, clinical surgical video images are relatively easy to obtain in the digestive endoscopy environment, but obtaining ground-truth labels such as the camera pose and depth for each frame is very difficult, so the endoscopic image cannot be accurately oriented.

Summary of the Invention

The embodiments of the present application provide a digestive endoscopy visual reconstruction navigation system and method to solve the problem that feature point extraction from endoscopic images under low light and little texture is difficult and the direction cannot be accurately identified.
To solve the above technical problems, in a first aspect, an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system, comprising: a data acquisition module, a map construction module and a path planning module connected in sequence; the data acquisition module is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module; the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and the image depth data, and, according to the optical flow self-supervised network and the improved residual network, perform camera pose estimation and depth map estimation respectively to construct an environment map; the path planning module is used to extract a topological centerline according to the environment map, and perform path planning and navigation around the topological centerline.

In some exemplary embodiments, the map construction module includes a camera pose estimation module and a depth map estimation module; the camera pose estimation module is used to obtain an estimated camera pose based on the optical flow self-supervised network; the depth map estimation module is used to obtain an estimated endoscopic image depth based on the improved residual network.

In some exemplary embodiments, the map construction module constructs an environment map through three-dimensional reconstruction according to the estimated camera pose and the estimated endoscopic image depth. In some exemplary embodiments, the path planning module includes a topological centerline acquisition module and a navigation module; the topological centerline acquisition module is used to obtain the topological centerline of the intestinal cavity in combination with the pipeline characteristics of the intestinal cavity; the navigation module is used to extract the topological centerline and perform path planning and navigation around the topological centerline.

In a second aspect, an embodiment of the present application also provides a digestive endoscopy visual reconstruction navigation method, which uses the above-mentioned digestive endoscopy visual reconstruction navigation system for navigation, including the following steps: obtaining virtual camera pose data and image depth data; building an optical flow self-supervised network based on the virtual camera pose data, and obtaining an estimated camera pose based on the optical flow self-supervised network; building an improved residual network based on the image depth data, and obtaining an estimated endoscopic image depth based on the improved residual network; constructing an environment map based on the estimated camera pose and the estimated endoscopic image depth; extracting a topological centerline based on the environment map, and performing path planning and navigation around the topological centerline.

In some exemplary embodiments, the estimated camera pose is obtained based on the optical flow self-supervised network, including: taking at least two pictures as input, performing network training, and obtaining a feature descriptor corresponding to each picture; matching the feature descriptors according to a sorting rule to obtain corresponding pixel points between different pictures; constructing a confidence score loss function to extract feature points from the pixel points; and obtaining an estimated camera pose based on the feature points and the geometric relationship between different pictures.

In some exemplary embodiments, two images are used as input to perform network training to obtain two feature descriptors; the feature descriptors of the two images are matched according to a sorting rule to obtain corresponding pixel points between the two images; the confidence score loss function is shown in formula (1):
$$L_{conf}(i,j) = 1 - \left[ AP(i,j)\, R_{ij} + k\left(1 - R_{ij}\right) \right] \tag{1}$$
where R_ij represents the confidence score, R_ij ∈ [0, 1]; the larger R_ij, the greater the probability that the feature descriptor is a feature point; (i, j) represents the position coordinates of the pixel point in the image; AP(i, j) represents the average precision of the pixel point; and k ∈ [0, 1] is a threshold hyperparameter.

The feature points are extracted through the loss function of average precision, shown in formula (2):
$$L_{AP} = \frac{1}{N}\sum_{i=1}^{N}\Big[ 1 - AP\big((x_i, y), (x_i, y')\big) \Big] \tag{2}$$
In the network, k = 0.5 is set in formula (1); when the calculated AP(i, j) is less than k, R_ij becomes smaller.

(x_i, y) and (x_i, y') are the position coordinates of the corresponding pixel points in the two images with overlapping areas.

In some exemplary embodiments, based on the improved residual network, an estimated endoscopic image depth is obtained through convolution and batch normalization processing; the improved residual network includes an encoder module and a decoder module, and the decoder module uses convolution blocks with an activation function, together with loss functions, for decoding; the activation function is an exponential linear unit function, as shown in formula (3):
$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ e^{x} - 1, & x \le 0 \end{cases} \tag{3}$$
where ELU(x) represents the exponential linear unit function.

The loss functions include a first loss function, a second loss function and a third loss function; the first loss function is shown in formula (4):
$$L_1 = \frac{1}{T}\sum_{p \in T} h_i(p)^2 - \frac{1}{T^2}\Big(\sum_{p \in T} h_i(p)\Big)^2 \tag{4}$$
其中,D i(p)表示真实的深度值图像,D i'(p)表示预测的深度图;h i=logD i'(p)-logD i(p);T表示经过滤后留下的有效值的数量,p∈T; Where D i (p) represents the real depth value image, D i '(p) represents the predicted depth map; h i = log D i '(p) - log D i (p); T represents the number of valid values left after filtering, p∈T;
所述第二损失函数如公式(5)所示:The second loss function is shown in formula (5):
$$L_2 = \frac{1}{T}\sum_{p \in T} \big| D_i'(p) - D_i(p) \big| \tag{5}$$
The third loss function is shown in formula (6):
$$L_3 = \frac{1}{T}\sum_{p \in T} \Big( \big|\partial_x D_i'(p)\big|\, e^{-|\partial_x l_i(p)|} + \big|\partial_y D_i'(p)\big|\, e^{-|\partial_y l_i(p)|} \Big) \tag{6}$$
其中,l i(p)表示彩色图像,而
Figure PCTCN2022130535-appb-000007
表示对彩色图像和深度图像在x和y方向上求导数,得到彩色图像和深度图像的梯度图像。
Among them, l i (p) represents the color image, and
Figure PCTCN2022130535-appb-000007
It means taking the derivatives of the color image and the depth image in the x and y directions to obtain the gradient images of the color image and the depth image.
In some exemplary embodiments, a topological centerline is extracted based on the environment map, and path planning and navigation are performed around the topological centerline, including: obtaining the topological centerline of the intestinal cavity based on the pipeline characteristics of the intestinal cavity; constructing a topological map in the travelable cavity within the intestinal cavity based on the topological centerline; and performing path planning from the current position of the camera to the target position based on the topological map.

In some exemplary embodiments, constructing a topological map in the travelable cavity within the intestinal cavity based on the topological centerline includes: traversing all voxels in the free space of the metric map; comparing the parent direction of each voxel with the parent directions of its adjacent voxels, the parent direction being the direction from the current voxel to the nearest occupied voxel; filtering the voxels based on the angle of the topological centerline and retaining the key points as nodes of the topological map; and connecting the nodes to obtain the topological map.

The technical solution provided by the embodiments of the present application has at least the following advantages:

The embodiments of the present application mainly address the problem that feature point extraction from endoscopic images under low light and little texture is difficult and the direction cannot be accurately identified, and propose a digestive endoscopy visual reconstruction navigation system and method. The system includes: a data acquisition module, a map construction module and a path planning module; the data acquisition module is used to collect virtual camera pose data and image depth data, and send them to the map construction module; the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and image depth data, and, according to these networks, perform camera pose estimation and depth map estimation respectively to construct an environment map; the path planning module is used to extract the topological centerline according to the environment map, and perform path planning and navigation around the topological centerline.

Compared with traditional digestive endoscopy navigation methods, the digestive endoscopy visual reconstruction navigation system and method provided in the embodiments of the present application can globally perceive the endoscopic environment, and the visual reconstruction can record the historical trajectory of the endoscope. Moreover, the feature point network based on optical flow self-supervision constructed in the present application is better suited to the weak texture and smooth surfaces of endoscopic images, and can solve the problem that feature points of endoscopic images under low light and few textures are difficult to extract. In addition, compared with supervised deep learning methods, the embodiments of the present application build a data acquisition module to solve the problem that clinical images have no ground-truth labels; therefore, the present application does not require ground-truth pose labels for network training, and the labels are only used to calculate accuracy indicators and errors in the verification stage.
BRIEF DESCRIPTION OF THE DRAWINGS
One or more embodiments are illustrated by way of example with reference to the corresponding figures; these illustrations do not limit the embodiments. Unless otherwise stated, the figures are not drawn to scale.
FIG. 1 is a schematic structural diagram of a digestive endoscopy visual reconstruction navigation system provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of the framework of a digestive endoscopy visual reconstruction navigation system provided in an embodiment of the present application;
FIG. 3 is a schematic flowchart of data acquisition performed by a data acquisition module on a virtual colon simulation platform, provided in an embodiment of the present application;
FIG. 4 is a schematic flowchart of a digestive endoscopy visual reconstruction navigation method provided in an embodiment of the present application;
FIG. 5 is a schematic flowchart of obtaining an estimated camera pose from the optical flow self-supervised network, provided in an embodiment of the present application;
FIG. 6 is a schematic flowchart of obtaining an estimated endoscopic image depth from the improved residual network, provided in an embodiment of the present application.
DETAILED DESCRIPTION
As noted in the Background, existing visual navigation methods for digestive endoscopy have difficulty extracting feature points from endoscopic images under low-light, low-texture conditions and cannot reliably identify the direction of travel.
Existing visual navigation methods for digestive endoscopy fall into traditional image processing algorithms and deep learning algorithms. Traditional image processing algorithms exploit salient features of the intestinal lumen such as prominent contours and dark regions, and are further divided into dark-region extraction, contour recognition, and similar techniques; in practice the two are often combined as the basis for navigation. Because the endoscope advances inside a closed lumen and illumination falls off with distance, the dark region is the most important and most salient cue by which physicians judge the direction of advance. In addition, the colon usually exhibits distinct muscle rings: when the lumen is clearly visible, the semi-closed muscular curves of the bowel can be seen. Contour recognition methods based on this structural property of the colon therefore navigate by taking the direction of the curvature radius as pointing toward the deepest part of the bowel. For deep learning algorithms, mapping the environment and localizing the robot requires estimating the camera pose and the depth map from the input image stream. The camera pose is the transformation from the world coordinate system to the camera coordinate system, also called the extrinsics in 3D vision. Completing the transformation from the camera coordinate system to the pixel coordinate system additionally requires the intrinsic matrix, which is tied to the camera itself. Camera pose estimation is also known as the front end of a SLAM (Simultaneous Localization and Mapping) framework, namely visual odometry. Once the camera pose is available, if the per-pixel depth corresponding to each color frame can also be obtained, a map of the environment can be reconstructed.
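As a concrete illustration of this coordinate chain, the sketch below lifts one pixel to a 3D world point given its depth, the intrinsic matrix, and the pose. All names and numeric values here are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def backproject_pixel(u, v, depth, K, T_wc):
    """Illustrative sketch: lift pixel (u, v) with known depth to a 3D world
    point using intrinsics K and a world-from-camera pose T_wc."""
    # Pixel -> normalized camera ray (inverse intrinsics)
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Scale by depth to get the point in the camera frame
    p_cam = depth * ray
    # Camera frame -> world frame (rigid transform, i.e., the camera pose)
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    return R @ p_cam + t

# Example: principal point at image center, focal length 320 px
K = np.array([[320.0,   0.0, 320.0],
              [  0.0, 320.0, 240.0],
              [  0.0,   0.0,   1.0]])
T_wc = np.eye(4)  # identity pose: camera frame coincides with world frame
print(backproject_pixel(100, 200, 0.05, K, T_wc))
```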
With the rise of data-driven deep networks, research combining deep learning with SLAM has begun to emerge. Such methods are either supervised or unsupervised. Supervised artificial neural networks generalize well and predict quickly; however, it is difficult to select an optimal parameter set during training, and they are sensitive to the choice of initial weights. Moreover, in the digestive endoscopy setting it is relatively easy to obtain clinical surgical videos but very difficult to obtain ground-truth labels, such as the camera pose and depth, for each frame. Unsupervised methods overcome precisely this difficulty of obtaining ground-truth labels in the medical domain: the network forms self-constraints through its loss functions. The self-constraint is that, in the chain from the pixel coordinate system of the image to the camera coordinate system and finally to the world coordinate system, the 3D position of a point can be recovered from the image once the depth map and camera pose are known, satisfying a geometric consistency constraint.
However, traditional image processing algorithms lose much of their effectiveness, or fail entirely, when the image is occluded or blurred. When the endoscope is too close to the intestinal wall, the angle of light received by the endoscope tip is too narrow, and the intestinal muscle lines may even be confused with the dark region. Traditional algorithms are robust when the lumen is clearly visible, but in that case they offer physicians little assistance. They also focus on processing every frame in real time: dark-region extraction and edge-contour extraction usually require fixed threshold parameters, which are difficult to adapt during processing. For supervised deep learning methods, clinical surgical video is relatively easy to obtain in the endoscopic setting, but ground-truth labels such as camera pose and depth for each frame are very hard to come by. For unsupervised self-constrained deep learning methods, errors in motion estimation can accumulate over time and cause trajectory drift. Most prior work targets natural driving scenes or medical laparoscopic images, where camera trajectories are not complex and organ boundaries in the abdominal cavity, such as the liver, have clear textures, making pose and depth easier to estimate. Existing pose and depth estimation work on colonoscopy images still shows large errors and can only handle simple trajectories such as straight lines. Yet inside the bowel it is often the turns that matter most, and unsupervised frameworks migrated from autonomous driving typically handle neither repeated textures nor pose prediction at complex turns well.
To solve the above technical problems, an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system and method. The system includes a data acquisition module, a map construction module, and a path planning module connected in sequence. The data acquisition module collects virtual camera pose data and image depth data and sends them to the map construction module. The map construction module builds an optical flow self-supervised network and an improved residual network from these data, performs camera pose estimation and depth map estimation with the two networks respectively, and constructs an environment map. The path planning module extracts a topological centerline from the environment map and performs path planning and navigation around it. In this system, first, a data acquisition module is built to address the lack of ground-truth labels in clinical images; second, deep networks are used to build a pose estimation network and a depth map prediction network, from which the environment map is constructed; finally, a topological-centerline navigation algorithm completes path planning. The system and method thereby solve the problems of existing digestive endoscopy visual navigation methods: difficulty extracting feature points from endoscopic images under low-light, low-texture conditions and inability to reliably identify the direction of travel.
The embodiments of the present application are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that many technical details are given in the embodiments to help the reader understand the present application; the claimed technical solutions can nevertheless be implemented without these details, and with various changes and modifications based on the following embodiments.
Referring to FIG. 1, an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system, including a data acquisition module 101, a map construction module 102, and a path planning module 103 connected in sequence. The data acquisition module 101 collects virtual camera pose data and image depth data and sends them to the map construction module 102. The map construction module 102 builds an optical flow self-supervised network and an improved residual network from these data, performs camera pose estimation and depth map estimation with the two networks respectively, and constructs an environment map. The path planning module 103 extracts a topological centerline from the environment map and performs path planning and navigation around it.
As shown in FIG. 1 and FIG. 2, the present application designs a digestive endoscopy visual reconstruction and navigation system based on self-supervised deep learning. The system can better assist physicians in navigation, and the constructed environment map can also aid diagnosis and record lesion locations. If the algorithm is provided to a robot, the robot can navigate autonomously.
Specifically, the data acquisition module 101 collects virtual camera pose data and image depth data and sends them to the map construction module 102. Because real clinical colonoscopy video streams contain heavy blur, and the ground-truth depth maps and poses needed to train a deep network are impractical to label by hand, the training dataset for the data acquisition module 101 is collected on a virtual colon simulation platform. FIG. 3 shows the data acquisition flow of the data acquisition module 101 on this platform: the virtual camera poses and depth data it collects serve as ground-truth labels for deep network training and for subsequent evaluation. As shown in FIG. 3, a colon model obtained from a CT scan is imported into the virtual simulation platform, which renders the environment and specular highlights; a custom script then captures the corresponding camera poses and depth images. These two kinds of data serve as ground-truth labels for subsequent network training and validation.
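A minimal sketch of what such a per-frame logging script might look like is given below. The simulator object `sim` and its methods are hypothetical placeholders, since the patent does not specify the simulation platform's API, only that poses and depth images are exported as labels.

```python
import json
import numpy as np

def record_sample(sim, frame_idx, out_dir="dataset"):
    """Hedged sketch of per-frame label logging on a simulation platform.
    `sim` and its methods are hypothetical placeholders."""
    rgb = sim.render_rgb()       # H x W x 3 color frame (hypothetical call)
    depth = sim.render_depth()   # H x W depth buffer (hypothetical call)
    pose = sim.camera_pose()     # 4 x 4 world-from-camera matrix (hypothetical)
    np.save(f"{out_dir}/rgb_{frame_idx:06d}.npy", np.asarray(rgb))
    np.save(f"{out_dir}/depth_{frame_idx:06d}.npy", np.asarray(depth))
    with open(f"{out_dir}/pose_{frame_idx:06d}.json", "w") as f:
        json.dump({"T_wc": np.asarray(pose).tolist()}, f)
```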
As shown in FIG. 1, in some embodiments the map construction module 102 includes a camera pose estimation module 1021 and a depth map estimation module 1022. The camera pose estimation module 1021 obtains the estimated camera pose from the optical flow self-supervised network; the depth map estimation module 1022 obtains the estimated endoscopic image depth from the improved residual network.
In the digestive endoscopy visual reconstruction navigation system provided by the embodiments, the map construction module 102 is the core module. It divides into pose estimation based on the optical flow self-supervised network and monocular endoscopic depth map estimation based on the improved residual network: the camera pose estimation module 1021 implements the former, and the depth map estimation module 1022 implements the latter. From the camera poses and lumen depth maps output by the two networks, the environment map can be reconstructed, and path planning and navigation algorithms are developed on this basis.
The main steps by which the map construction module 102 builds the environment map are as follows: (1) build and train the optical flow self-supervised network and the improved residual network, for camera pose estimation and depth map estimation respectively; (2) obtain the estimated camera pose from the optical flow self-supervised network; (3) obtain the estimated endoscopic image depth from the monocular depth estimation network based on the improved residual network; (4) construct the environment map from the camera poses and endoscopic image depths obtained in steps (2) and (3).
In some embodiments, the map construction module 102 constructs the environment map through three-dimensional reconstruction from the estimated camera pose and the estimated endoscopic image depth.
As shown in FIG. 1, in some embodiments the path planning module 103 includes a topological centerline acquisition module 1031 and a navigation module 1032. The topological centerline acquisition module 1031 obtains the topological centerline of the intestinal lumen by exploiting the lumen's tubular geometry; the navigation module 1032 extracts the topological centerline and performs path planning and navigation around it. Specifically, the navigation module 1032 constructs, from the topological centerline, a topological map describing the traversable cavity, and then plans a path around the centerline from the camera's current position to a target position. It should be noted that the target position is a special location seen during the endoscope's advance, and path planning to it is completed during withdrawal; special locations include lesion locations and/or polyp locations.
Specifically, the main function of the path planning module 103 is to extract the topological centerline of the lumen directly from its tubular geometry, construct a simple topological map describing the traversable cavity, and then plan a path around this map from the camera's current position to the target position (also called the target point). A target point is defined as a special location, such as a lesion or polyp, seen during the endoscope's advance; path planning to these locations is completed during withdrawal.
It should be noted that, besides lesion and polyp locations, a special location may also be any other specially marked position.
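On a topological map represented as a weighted graph, such point-to-point planning reduces to a shortest-path query. The sketch below is a minimal Dijkstra example under that assumption; the node names and edge weights are illustrative, not from the patent.

```python
import heapq

def shortest_path(graph, start, goal):
    """Minimal Dijkstra sketch over a topological map, assumed here to be an
    adjacency dict {node: [(neighbor, edge_length), ...]} built from the
    centerline nodes."""
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from the goal to recover the route
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]

# Example: camera at node "cam", polyp recorded earlier at node "polyp"
graph = {"cam": [("a", 1.0)],
         "a": [("b", 2.0), ("polyp", 4.0)],
         "b": [("polyp", 1.0)]}
print(shortest_path(graph, "cam", "polyp"))  # ['cam', 'a', 'b', 'polyp']
```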
Referring to FIG. 4, an embodiment of the present application further provides a digestive endoscopy visual reconstruction navigation method that navigates using the above digestive endoscopy visual reconstruction navigation system, including the following steps:
Step S1: acquire virtual camera pose data and image depth data.
Step S2: build the optical flow self-supervised network from the virtual camera pose data and obtain the estimated camera pose from it; build the improved residual network from the image depth data and obtain the estimated endoscopic image depth from it.
Step S3: construct the environment map from the estimated camera pose and the estimated endoscopic image depth.
Step S4: extract the topological centerline from the environment map and perform path planning and navigation around the topological centerline.
The digestive endoscopy visual reconstruction navigation method provided herein, also called the endoscope navigation algorithm, comprises a pose estimation algorithm based on the optical flow self-supervised network, a monocular endoscopic depth map estimation algorithm based on the improved residual network, and a path planning and navigation algorithm based on the topological centerline. With this algorithm, the environment and the robot's current pose can be perceived globally, and hints for the next action can be given from the current environment map.
Specifically, in step S1 the virtual camera pose data and image depth data are acquired through the data acquisition module 101; they serve as ground-truth labels for deep network training and for subsequent evaluation. After the virtual camera pose data and image depth data are obtained, step S2 is performed, and it splits into two sub-steps: step S201, pose estimation based on the optical flow self-supervised network; and step S202, monocular endoscopic image depth map estimation based on the improved residual network.
The following embodiments explain the specific steps of step S201, pose estimation based on the optical flow self-supervised network.
In some embodiments, obtaining the estimated camera pose from the optical flow self-supervised network in step S2 includes: taking at least two images as input and training the network to obtain a feature descriptor map for each image; matching the feature descriptors according to a ranking rule to obtain corresponding pixels between different images; constructing a confidence score loss function and extracting feature points from those pixels; and obtaining the estimated camera pose from the feature points and the geometric relationship between the images.
It should be noted that network training usually takes two images as input, yielding one feature descriptor map per image. Network training with two input images is explained in detail below.
In some embodiments, two images are taken as input for network training, yielding two feature descriptor maps; the descriptors of the two images are matched according to the ranking rule, giving the corresponding pixels between the two images. The confidence score loss function is shown in formula (1):

$$\mathcal{L}_{conf}(i,j) = 1 - \left[\, AP(i,j)\,R_{ij} + k\,(1 - R_{ij}) \,\right] \tag{1}$$

where $R_{ij} \in [0,1]$ is the confidence score, a larger $R_{ij}$ indicating a higher probability that the descriptor is a feature point; $(i,j)$ are the position coordinates of a pixel in the image; $AP(i,j)$ is the average precision at that pixel; and $k \in [0,1]$ is a threshold hyperparameter.

Feature points are extracted through the average precision loss function, shown in formula (2):

$$\mathcal{L}_{AP} = \frac{1}{B}\sum_{i=1}^{B}\Big[\,1 - AP\big((x_i, y), (x_i, y')\big)\Big] \tag{2}$$

In the network, $k = 0.5$ in formula (1); when the computed $AP(i,j)$ falls below $k$, $R_{ij}$ becomes smaller. $(x_i, y)$ and $(x_i, y')$ are the position coordinates of corresponding pixels in two images with an overlapping region.
The optical flow self-supervised network framework constructs the confidence score loss function with the aid of optical flow to extract more robust feature points. The essential matrix is then estimated from the two views using the epipolar constraint, and the pose is recovered from it. The node parameters of the optical flow self-supervised network are shown in FIG. 5. The network takes two images as input, denoted img1 and img2; after convolution (Conv), ReLU activation, and batch normalization (BN) layers, it outputs two feature descriptor maps. The descriptors of the two images are matched according to a ranking rule to find the corresponding pixels between them (called corresponding points). This ranking rule is self-supervised by the pre-computed optical flow, and the additional confidence score helps select more stable and reliable feature points among the corresponding points, filtering out those with lower scores.
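A minimal PyTorch sketch of such a dense descriptor branch with a confidence head follows; the channel sizes and the sigmoid confidence head are illustrative assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class DescriptorNet(nn.Module):
    """Sketch of a fully convolutional Conv + BN + ReLU descriptor stack that
    outputs dense descriptors and a per-pixel confidence score R."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, dim, 3, padding=1),
        )
        self.conf_head = nn.Conv2d(dim, 1, 1)  # confidence score per pixel

    def forward(self, img):
        feat = self.backbone(img)
        desc = nn.functional.normalize(feat, dim=1)  # unit-length descriptors
        conf = torch.sigmoid(self.conf_head(feat))   # R_ij in [0, 1]
        return desc, conf

desc1, conf1 = DescriptorNet()(torch.randn(1, 3, 256, 256))
print(desc1.shape, conf1.shape)  # dense per-pixel descriptors and scores
```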
The loss function design introduces the optical flow of the two images, generated at the data loading stage. The values in the flow vector represent, for each pixel of img1, its position coordinates (x, y) in img2. The confidence evaluation loss function is shown in formula (1), where the confidence score R_ij lies in [0, 1], a larger R_ij indicating a higher probability that the descriptor is a feature point; k ∈ [0, 1] is a threshold hyperparameter, usually set to 0.5 in the network, and when the computed AP(i, j) at a pixel falls below k, R_ij becomes smaller.
Average precision (AP) is an evaluation metric for classification results in multi-label classification; here it serves as the loss function that minimizes the matching error between two descriptor maps. Descriptor matching can be modeled as a ranking optimization problem: for two images I and I' with an overlapping region, the distance (e.g., Euclidean distance) is computed between each descriptor vector in I and the descriptor vectors in I'. The distances are then sorted in ascending order, and the descriptor with the smallest distance is the match. The ground-truth labels are obtained by sparsely sampling the optical flow in FIG. 5, which amounts to knowing the correspondence between the two frames in advance. After feature points are extracted, the pose is estimated using the classical two-view geometric relationship.
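The sketch below illustrates this nearest-neighbor ranking with a confidence filter; the 0.5 threshold and array shapes are illustrative assumptions.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, conf_a, conf_thresh=0.5):
    """Sketch of ranking-style matching: for each descriptor of image I,
    rank descriptors of I' by Euclidean distance and keep the nearest,
    filtering low-confidence feature points first."""
    matches = []
    for i, d in enumerate(desc_a):
        if conf_a[i] < conf_thresh:  # drop unreliable feature points
            continue
        dists = np.linalg.norm(desc_b - d, axis=1)  # distances to all of I'
        j = int(np.argmin(dists))                   # smallest distance wins
        matches.append((i, j, float(dists[j])))
    return matches

rng = np.random.default_rng(0)
A, B = rng.normal(size=(5, 128)), rng.normal(size=(8, 128))
print(match_descriptors(A, B, conf_a=np.ones(5)))
```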
The present invention proposes a deep learning network based on self-supervised feature point extraction, directly self-supervising the network's feature descriptors with information such as optical flow, thereby extracting more robust feature points from the image. Experimental tests show that this self-supervised route can solve feature point extraction for endoscopic images under low-light, low-texture conditions. After feature points are extracted, descriptors are matched between image pairs, and the camera pose is finally solved with classical multi-view geometry.
It should be noted here that the embodiments of the present application obtain feature points with the self-supervised optical flow deep network; other algorithms could be used instead. For example, the feature points of the two images could be replaced by classical SIFT (Scale-Invariant Feature Transform) or ORB (Oriented FAST and Rotated BRIEF) features. The self-supervised optical flow network used in the present application outputs feature points more stable than those of the above algorithms, enabling more reliable pose solving. In addition, as one implementation of the present application, the monocular endoscopic depth map prediction could be replaced by another custom supervised network.
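As a concrete illustration of the classical substitute pipeline mentioned above, the OpenCV sketch below matches ORB features and recovers the relative pose through the essential matrix; the parameter values are illustrative.

```python
import cv2
import numpy as np

def estimate_pose_orb(img1, img2, K):
    """Sketch of the classical two-view pipeline: ORB keypoints, descriptor
    matching, essential matrix with RANSAC, then pose recovery."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # Epipolar constraint: essential matrix from the matched points
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    # Decompose E into rotation R and unit-scale translation t
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```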
As mentioned above, step S2 includes step S201, pose estimation based on the optical flow self-supervised network, and step S202, monocular endoscopic image depth map estimation based on the improved residual network. The specific steps of step S202 are explained next.
In some embodiments, the estimated endoscopic image depth is obtained from the improved residual network through convolution and batch normalization. The improved residual network includes an encoder module and a decoder module; the decoder module decodes with convolution blocks carrying an activation function, together with loss functions. The activation function is the exponential linear unit function, shown in formula (3):

$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases} \tag{3}$$

where ELU(x) denotes the exponential linear unit function.

The loss functions include a first loss function, a second loss function, and a third loss function. The first loss function is shown in formula (4):

$$\mathcal{L}_{1} = \frac{1}{|T|}\sum_{p \in T} h_i^{2} \;-\; \frac{1}{|T|^{2}}\left(\sum_{p \in T} h_i\right)^{2} \tag{4}$$

where $D_i(p)$ denotes the ground-truth depth image, $D_i'(p)$ the predicted depth map, $h_i = \log D_i'(p) - \log D_i(p)$, and $T$ the set of valid values left after filtering, with $p \in T$.

The second loss function is shown in formula (5):

$$\mathcal{L}_{2} = \frac{1}{|T|}\sum_{p \in T} \big| D_i'(p) - D_i(p) \big| \tag{5}$$

The third loss function is shown in formula (6):

$$\mathcal{L}_{3} = \frac{1}{|T|}\sum_{p \in T} \Big( \big|\partial_x D_i'(p)\big|\, e^{-\left|\partial_x l_i(p)\right|} + \big|\partial_y D_i'(p)\big|\, e^{-\left|\partial_y l_i(p)\right|} \Big) \tag{6}$$

where $l_i(p)$ denotes the color image, and $\partial_x$, $\partial_y$ denote derivatives of the color image and the depth image in the x and y directions, yielding the gradient images of the color image and the depth image.
In monocular endoscopic image depth map estimation based on the improved residual network, the depth estimation improves upon the classic 18-layer residual network (ResNet). The deep network architecture is shown in FIG. 6; it mainly extracts features with convolutions combined with batch normalization (BN). To output a depth map with the same width and height as the input image, the network consists mainly of an encoder and a decoder: the encoder uses the complete ResNet, while the decoder directly uses convolution blocks with the exponential linear unit (ELU) activation function. In the encoding stage, each Basic Block performs downsampling, gradually increasing the channel count of the feature maps until an average pooling layer (Avg Pool) and a fully connected (FC) layer yield a 512-dimensional feature vector. In the decoding stage, 3x3 convolution blocks with ELU activations directly perform the upsampling.
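A minimal PyTorch sketch of such an encoder-decoder arrangement follows; since the patent does not specify skip connections, exact channel widths, or the output activation, those details below are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DepthNet(nn.Module):
    """Sketch of a ResNet-18 encoder with an ELU convolutional decoder that
    restores the input resolution."""
    def __init__(self):
        super().__init__()
        base = resnet18(weights=None)
        # Encoder: everything up to the 512-channel feature maps (stride 32)
        self.encoder = nn.Sequential(*list(base.children())[:-2])

        def up(cin, cout):
            # 3x3 conv + ELU block followed by 2x upsampling
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1), nn.ELU(),
                nn.Upsample(scale_factor=2, mode="nearest"))

        self.decoder = nn.Sequential(
            up(512, 256), up(256, 128), up(128, 64), up(64, 32), up(32, 16),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())  # depth in (0, 1)

    def forward(self, x):
        return self.decoder(self.encoder(x))

depth = DepthNet()(torch.randn(1, 3, 256, 256))
print(depth.shape)  # torch.Size([1, 1, 256, 256]) -- same H, W as the input
```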
In the loss function design, the present application defines three loss functions, namely the first, second, and third loss functions, as shown in formulas (4), (5), and (6). In formula (4), T denotes the set of valid values remaining after the validity mask filter, with p ∈ T. The first and second loss functions are both difference losses that directly compare two depth maps. In formulas (4) and (5), D_i(p) denotes the ground-truth depth image and D_i'(p) the predicted depth map. Because the depth maps predicted by ResNet are overly smooth and lose some fine texture detail, the present application improves the smoothness loss of the predicted depth map and proposes the third loss function, shown in formula (6), where l_i(p) denotes the color image and the x- and y-derivatives of the color RGB image and the depth image give their gradient images. The rationale is that at curved edges in digestive endoscopy images, pixel gradients are usually larger and vary more sharply.
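Under the reconstructions stated above for formulas (4) through (6), a PyTorch sketch of the three terms might look as follows; the exact formulas in the original publication may differ.

```python
import torch

def depth_losses(pred, gt, rgb, mask):
    """Sketch of the three loss terms, under the stated assumptions:
    (4) a scale-invariant log-difference term, (5) a direct difference term,
    and (6) an edge-aware smoothness term."""
    eps = 1e-6
    h = (torch.log(pred + eps) - torch.log(gt + eps))[mask]
    loss1 = (h ** 2).mean() - h.mean() ** 2          # scale-invariant (4)
    loss2 = (pred - gt).abs()[mask].mean()           # direct difference (5)
    # Gradients of depth and color in x and y; depth is allowed to vary
    # where the color image itself has strong edges
    dx_d = (pred[..., :, 1:] - pred[..., :, :-1]).abs()
    dy_d = (pred[..., 1:, :] - pred[..., :-1, :]).abs()
    dx_i = (rgb[..., :, 1:] - rgb[..., :, :-1]).abs().mean(1, keepdim=True)
    dy_i = (rgb[..., 1:, :] - rgb[..., :-1, :]).abs().mean(1, keepdim=True)
    loss3 = (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
    return loss1 + loss2 + loss3

pred = torch.rand(1, 1, 64, 64) + 0.1
gt = torch.rand(1, 1, 64, 64) + 0.1
rgb = torch.rand(1, 3, 64, 64)
print(depth_losses(pred, gt, rgb, mask=gt > 0))
```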
After the estimated camera pose and estimated endoscopic image depth are obtained, step S3 is performed: constructing the environment map through three-dimensional reconstruction from the estimated camera pose and the estimated endoscopic image depth. Once the environment map is built, step S4 follows: constructing a topological map from the environment map and planning paths around it.
In some embodiments, extracting the topological centerline from the environment map and performing path planning and navigation around it in step S4 includes the following steps:
Step S401: obtain the topological centerline of the intestinal lumen from the lumen's tubular geometry.
Step S402: construct a topological map within the traversable cavity of the lumen based on the topological centerline.
Step S403: plan a path from the camera's current position to the target position based on the topological map.
It should be noted that step S4 performs path planning and navigation based on the topological centerline; planning is grounded in the map's topological centerline. The topological centerline can also be called the skeleton of the 3D Generalized Voronoi Diagram (GVD). Generating the GVD relies on a metric map built from Euclidean Signed Distance Functions (ESDF), i.e., an ESDF metric map. In the metric map, all points equidistant from two or more obstacles are found iteratively and connected to form a ridge of free space, also called the medial axis. Post-processing such as skeleton thinning and pruning then yields a complete sparse topological map description that serves as the forward route for navigation.
In some embodiments, step S402, constructing the topological map within the traversable cavity of the lumen based on the topological centerline, includes the following steps:
Step S4021: traverse all voxels in the free space of the metric map.
Step S4022: compare the parent direction of each voxel with the parent directions of its neighboring voxels; the parent direction is the direction from the current voxel to its nearest occupied voxel.
Step S4023: filter the voxels based on the angle of the topological centerline, retaining key points as nodes of the topological map, and connect the nodes to obtain the topological map.
Specifically, the GVD extraction process is as follows: first, traverse all voxels in the free space of the ESDF; next, compare each voxel's parent direction with the parent directions of its six connected neighbors, the parent direction being defined as the direction from the current voxel to its nearest occupied voxel; then, discard voxels according to a preset GVD angle threshold on the parent directions. Finally, after redundant voxels are filtered out, the remaining key points serve as nodes of the topological map, and connecting the nodes yields the topological map.
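A rough sketch of this parent-direction test on a 3D occupancy grid is given below, assuming a SciPy distance transform supplies each free voxel's nearest occupied voxel. The angle threshold and the keep/discard convention are illustrative, since the patent states the filter only in terms of a preset GVD angle.

```python
import numpy as np
from scipy import ndimage

def gvd_ridge_voxels(occupancy, angle_thresh_deg=60.0):
    """Sketch: keep free voxels whose parent direction disagrees strongly
    with that of a 6-connected free neighbor -- these lie on the
    equidistant ridge (GVD) of the free space."""
    # For every free voxel, indices of its nearest occupied voxel
    _, idx = ndimage.distance_transform_edt(occupancy == 0,
                                            return_indices=True)
    coords = np.indices(occupancy.shape)
    parent = idx - coords                                 # parent directions
    unit = parent / (np.linalg.norm(parent, axis=0) + 1e-9)
    cos_t = np.cos(np.radians(angle_thresh_deg))
    offsets = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
    nodes = []
    for p in np.argwhere(occupancy == 0):
        for off in offsets:
            q = p + off
            if (q < 0).any() or (q >= np.array(occupancy.shape)).any():
                continue
            if occupancy[tuple(q)] != 0:                  # skip occupied
                continue
            # Large angle between parent directions => ridge candidate
            if np.dot(unit[(slice(None), *p)],
                      unit[(slice(None), *q)]) < cos_t:
                nodes.append(tuple(p))
                break
    return nodes
```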
The digestive endoscopy visual reconstruction navigation system and method provided by the present invention have been shown feasible in simulated trials. A total of 21,671 colonoscopy images collected on the virtual platform were used in the training stage; two sets of virtual data, of 400 and 447 images respectively, were used in the testing stage; and two sets of clinical data, of 82 and 109 images, gave preliminary experimental confirmation of feasibility.
The computed mean $a_1$ and $a_2$ accuracies of the predicted depth maps were 0.7637 and 0.9471 respectively, and the mean RMSE was 0.0929, all expressed in gray levels of the depth value. These errors are within a controllable range. The evaluation metrics are computed as follows:
$$a_j = \frac{1}{N}\sum_{p}\mathbf{1}\!\left[\max\!\left(\frac{d^*_p}{d_p},\, \frac{d_p}{d^*_p}\right) < 1.25^{\,j}\right], \quad j = 1, 2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{p}\big(d^*_p - d_p\big)^{2}}$$
where d* denotes the predicted depth map and d the ground-truth depth map.
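Assuming these standard definitions, a minimal NumPy sketch of the metrics is:

```python
import numpy as np

def depth_metrics(pred, gt):
    """a_j is the fraction of pixels whose depth ratio is within 1.25**j;
    RMSE is the root-mean-square error between prediction and ground truth."""
    ratio = np.maximum(pred / gt, gt / pred)
    a1 = float((ratio < 1.25).mean())
    a2 = float((ratio < 1.25 ** 2).mean())
    rmse = float(np.sqrt(((pred - gt) ** 2).mean()))
    return a1, a2, rmse

pred = np.random.rand(64, 64) + 0.1
gt = np.random.rand(64, 64) + 0.1
print(depth_metrics(pred, gt))
```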
In summary of the above technical solutions, the embodiments of the present application mainly address the problems of existing endoscope navigation algorithms, namely difficulty extracting feature points from endoscopic images under low-light, low-texture conditions and inability to reliably identify the direction of travel. The present application proposes a digestive endoscopy visual reconstruction navigation system and method. The system includes a data acquisition module, a map construction module, and a path planning module: the data acquisition module collects virtual camera pose data and image depth data and sends them to the map construction module; the map construction module builds the optical flow self-supervised network and the improved residual network from these data, performs camera pose estimation and depth map estimation respectively, and constructs the environment map; the path planning module extracts the topological centerline from the environment map and performs path planning and navigation around it.
Compared with traditional endoscope navigation algorithms, the system and method provided herein can perceive the endoscopic environment globally, and the visual reconstruction records the endoscope's historical trajectory. The optical-flow self-supervised feature point network is better suited to the weak texture and smooth surfaces of endoscopic images and solves the difficulty of extracting feature points under low-light, low-texture conditions. Moreover, unlike supervised deep learning methods, the embodiments build a data acquisition module to address the lack of ground-truth labels in clinical images, so no ground-truth pose labels are needed for network training; labels are used only in the validation stage to compute accuracy metrics and errors.
Those of ordinary skill in the art will appreciate that the above embodiments are specific examples of implementing the present application, and that in practice various changes may be made to them in form and detail without departing from the spirit and scope of the present application. Anyone skilled in the art may make changes and modifications without departing from the spirit and scope of the present application; the scope of protection shall therefore be as defined by the claims.

Claims (10)

1. A digestive endoscopy visual reconstruction navigation system, characterized by comprising: a data acquisition module, a map construction module, and a path planning module connected in sequence;
wherein the data acquisition module is configured to collect virtual camera pose data and image depth data, and to send the collected virtual camera pose data and image depth data to the map construction module;
the map construction module is configured to build an optical flow self-supervised network and an improved residual network from the virtual camera pose data and the image depth data, to perform camera pose estimation and depth map estimation respectively with the two networks, and to construct an environment map;
and the path planning module is configured to extract a topological centerline from the environment map, and to perform path planning and navigation around the topological centerline.
2. The digestive endoscopy visual reconstruction navigation system according to claim 1, wherein the map construction module comprises a camera pose estimation module and a depth map estimation module;
the camera pose estimation module is configured to obtain an estimated camera pose from the optical flow self-supervised network;
and the depth map estimation module is configured to obtain an estimated endoscopic image depth from the improved residual network.
3. The digestive endoscopy visual reconstruction navigation system according to claim 2, wherein the map construction module constructs the environment map through three-dimensional reconstruction from the estimated camera pose and the estimated endoscopic image depth.
4. The digestive endoscopy visual reconstruction navigation system according to claim 1, wherein the path planning module comprises a topological centerline acquisition module and a navigation module;
the topological centerline acquisition module is configured to obtain the topological centerline of the intestinal lumen from the lumen's tubular geometry;
and the navigation module is configured to extract the topological centerline and to perform path planning and navigation around the topological centerline.
5. A digestive endoscopy visual reconstruction navigation method, navigating with the digestive endoscopy visual reconstruction navigation system according to any one of claims 1 to 4, characterized by comprising the following steps:
acquiring virtual camera pose data and image depth data;
building an optical flow self-supervised network from the virtual camera pose data and obtaining an estimated camera pose from the optical flow self-supervised network; building an improved residual network from the image depth data and obtaining an estimated endoscopic image depth from the improved residual network;
constructing an environment map from the estimated camera pose and the estimated endoscopic image depth;
and extracting a topological centerline from the environment map and performing path planning and navigation around the topological centerline.
6. The digestive endoscopy visual reconstruction navigation method according to claim 5, wherein obtaining the estimated camera pose from the optical flow self-supervised network comprises:
taking at least two images as input and training the network to obtain a feature descriptor map corresponding to each image;
matching the feature descriptors according to a ranking rule to obtain corresponding pixels between different images;
constructing a confidence score loss function and extracting feature points from the pixels;
and obtaining the estimated camera pose from the feature points and the geometric relationship between the images.
7. The digestive endoscopy visual reconstruction navigation method according to claim 6, wherein two images are taken as input for network training to obtain two feature descriptor maps, and the descriptors of the two images are matched according to the ranking rule to obtain the corresponding pixels between the two images;
the confidence score loss function is shown in formula (1):

$$\mathcal{L}_{conf}(i,j) = 1 - \left[\, AP(i,j)\,R_{ij} + k\,(1 - R_{ij}) \,\right] \tag{1}$$

where $R_{ij} \in [0,1]$ is the confidence score, a larger $R_{ij}$ indicating a higher probability that the descriptor is a feature point; $(i,j)$ are the position coordinates of a pixel in the image; $AP(i,j)$ is the average precision of the pixel; and $k \in [0,1]$ is a threshold hyperparameter;
feature points are extracted through the average precision loss function, shown in formula (2):

$$\mathcal{L}_{AP} = \frac{1}{B}\sum_{i=1}^{B}\Big[\,1 - AP\big((x_i, y), (x_i, y')\big)\Big] \tag{2}$$

in the network, $k = 0.5$ in formula (1), and when the computed $AP(i,j)$ falls below $k$, $R_{ij}$ becomes smaller;
$(x_i, y)$ and $(x_i, y')$ are the position coordinates of corresponding pixels in two images with an overlapping region.
8. The digestive endoscopy visual reconstruction navigation method according to claim 5, wherein the estimated endoscopic image depth is obtained from the improved residual network through convolution and batch normalization;
the improved residual network comprises an encoder module and a decoder module, the decoder module decoding with convolution blocks carrying an activation function, together with loss functions;
the activation function is the exponential linear unit function, shown in formula (3):

$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases} \tag{3}$$

where ELU(x) denotes the exponential linear unit function;
the loss functions comprise a first loss function, a second loss function, and a third loss function; the first loss function is shown in formula (4):

$$\mathcal{L}_{1} = \frac{1}{|T|}\sum_{p \in T} h_i^{2} \;-\; \frac{1}{|T|^{2}}\left(\sum_{p \in T} h_i\right)^{2} \tag{4}$$

where $D_i(p)$ denotes the ground-truth depth image, $D_i'(p)$ the predicted depth map, $h_i = \log D_i'(p) - \log D_i(p)$, and $T$ the set of valid values left after filtering, with $p \in T$;
the second loss function is shown in formula (5):

$$\mathcal{L}_{2} = \frac{1}{|T|}\sum_{p \in T} \big| D_i'(p) - D_i(p) \big| \tag{5}$$

the third loss function is shown in formula (6):

$$\mathcal{L}_{3} = \frac{1}{|T|}\sum_{p \in T} \Big( \big|\partial_x D_i'(p)\big|\, e^{-\left|\partial_x l_i(p)\right|} + \big|\partial_y D_i'(p)\big|\, e^{-\left|\partial_y l_i(p)\right|} \Big) \tag{6}$$

where $l_i(p)$ denotes the color image, and $\partial_x$, $\partial_y$ denote derivatives of the color image and the depth image in the x and y directions, yielding the gradient images of the color image and the depth image.
9. The digestive endoscopy visual reconstruction navigation method according to claim 5, wherein extracting the topological centerline from the environment map and performing path planning and navigation around the topological centerline comprises:
obtaining the topological centerline of the intestinal lumen from the lumen's tubular geometry;
constructing a topological map within the traversable cavity of the lumen based on the topological centerline; and planning a path from the camera's current position to the target position based on the topological map.
10. The digestive endoscopy visual reconstruction navigation method according to claim 9, wherein constructing the topological map within the traversable cavity of the lumen based on the topological centerline comprises:
traversing all voxels in the free space of the metric map;
comparing the parent direction of each voxel with the parent directions of its neighboring voxels, the parent direction being the direction from the current voxel to its nearest occupied voxel;
filtering the voxels based on the angle of the topological centerline and retaining key points as nodes of the topological map;
and connecting the nodes to obtain the topological map.
PCT/CN2022/130535 2022-11-08 2022-11-08 Gastrointestinal endoscopy visual reconstruction navigation system and method WO2024098240A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/130535 WO2024098240A1 (en) 2022-11-08 2022-11-08 Gastrointestinal endoscopy visual reconstruction navigation system and method

Publications (1)

Publication Number Publication Date
WO2024098240A1 true WO2024098240A1 (en) 2024-05-16

Family

ID=91031707

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130535 WO2024098240A1 (en) 2022-11-08 2022-11-08 Gastrointestinal endoscopy visual reconstruction navigation system and method

Country Status (1)

Country Link
WO (1) WO2024098240A1 (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170084027A1 (en) * 2015-09-18 2017-03-23 Auris Surgical Robotics, Inc. Navigation of tubular networks
CN108685560A (en) * 2017-04-12 2018-10-23 香港生物医学工程有限公司 Automation steering and method for robotic endoscope
WO2020242949A1 (en) * 2019-05-28 2020-12-03 Google Llc Systems and methods for video-based positioning and navigation in gastroenterological procedures
CN111325784A (en) * 2019-11-29 2020-06-23 浙江省北大信息技术高等研究院 Unsupervised pose and depth calculation method and system
CN111739078A (en) * 2020-06-15 2020-10-02 大连理工大学 Monocular unsupervised depth estimation method based on context attention mechanism
CN113450410A (en) * 2021-06-29 2021-09-28 浙江大学 Monocular depth and pose joint estimation method based on epipolar geometry
CN114022527A (en) * 2021-10-20 2022-02-08 华中科技大学 Monocular endoscope depth and pose estimation method and device based on unsupervised learning
CN114399527A (en) * 2022-01-04 2022-04-26 北京理工大学 Method and device for unsupervised depth and motion estimation of monocular endoscope

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu, Haibin; Zhao, Jianbo; Xu, Kaiyang; Zhang, Yan; Xu, Ruotong; Wang, Aili; Iwahori, Yuji: "Semantic SLAM Based on Deep Learning in Endocavity Environment", Symmetry, vol. 14, no. 3, p. 614, XP093170401, ISSN: 2073-8994, DOI: 10.3390/sym14030614 *

Similar Documents

Publication Publication Date Title
Mahmood et al. Unsupervised reverse domain adaptation for synthetic medical images via adversarial training
US20180174311A1 (en) Method and system for simultaneous scene parsing and model fusion for endoscopic and laparoscopic navigation
CN112766416B (en) Digestive endoscope navigation method and digestive endoscope navigation system
WO2006127713A2 (en) A fast 2d-3d image registration method with application to continuously guided endoscopy
CN111524170A (en) Lung CT image registration method based on unsupervised deep learning
WO2024021523A1 (en) Graph network-based method and system for fully automatic segmentation of cerebral cortex surface
CN108090954A (en) Abdominal cavity environmental map based on characteristics of image rebuilds the method with laparoscope positioning
WO2022170562A1 (en) Digestive endoscope navigation method and system
Luo et al. Unsupervised learning of depth estimation from imperfect rectified stereo laparoscopic images
CN111080778A (en) Online three-dimensional reconstruction method of binocular endoscope soft tissue image
US20220198693A1 (en) Image processing method, device and computer-readable storage medium
Wang et al. Deep convolutional network for stereo depth mapping in binocular endoscopy
Moreau et al. Crossfire: Camera relocalization on self-supervised features from an implicit representation
Teutscher et al. PDC: piecewise depth completion utilizing superpixels
Chen et al. FRSR: Framework for real-time scene reconstruction in robot-assisted minimally invasive surgery
CN115797448A (en) Digestive endoscopy visual reconstruction navigation system and method
Langs et al. Robust autonomous model learning from 2d and 3d data sets
WO2024098240A1 (en) Gastrointestinal endoscopy visual reconstruction navigation system and method
US20230110263A1 (en) Computer-implemented systems and methods for analyzing examination quality for an endoscopic procedure
Liu et al. Sparse-to-dense coarse-to-fine depth estimation for colonoscopy
Luo et al. Monocular endoscope 6-DoF tracking with constrained evolutionary stochastic filtering
Liu et al. Joint estimation of depth and motion from a monocular endoscopy image sequence using a multi-loss rebalancing network
Lin et al. SuPerPM: A Large Deformation-Robust Surgical Perception Framework Based on Deep Point Matching Learned from Physical Constrained Simulation Data
Zenteno et al. Markerless tracking of micro-endoscope for optical biopsy in stomach
Chong 3D reconstruction of laparoscope images with contrastive learning methods

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22964716

Country of ref document: EP

Kind code of ref document: A1