WO2023142780A1

WO2023142780A1 - Mobile robot visual navigation method and apparatus based on deep reinforcement learning

Info

Publication number: WO2023142780A1
Application number: PCT/CN2022/140079
Authority: WO
Inventors: 张仪; 冯伟; 王卫军; 朱子翰
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2022-01-25
Filing date: 2022-12-19
Publication date: 2023-08-03
Also published as: CN114526738B; CN114526738A

Abstract

A mobile robot visual navigation method and apparatus based on deep reinforcement learning, a terminal device, and a computer readable medium, relating to the field of machine vision navigation. According to the method, on the basis of a deep reinforcement learning method, by using an image, a depth image, and a target position as input, navigation can be performed in a large space where a plurality of scenarios are mixed, and the navigation capability of a mobile robot visual navigation technology based on deep reinforcement learning is improved; in addition, according to the method, a reward function related to the speed of a mobile robot, and a distance between the mobile robot and a target is designed, so that the training of a deep reinforcement learning model can be rapidly converged; and according to the method, the problem of sparse rewards is solved, a model convergence speed is increased, and the navigation performance in a complex large scenario is improved.

Description

一种基于深度强化学习的移动机器人视觉导航方法及装置A mobile robot visual navigation method and device based on deep reinforcement learning

技术领域technical field

本发明涉及机器视觉导航领域，具体而言，涉及一种基于深度强化学习的移动机器人视觉导航方法及装置。The present invention relates to the field of machine vision navigation, in particular to a mobile robot vision navigation method and device based on deep reinforcement learning.

背景技术Background technique

基于深度强化学习的复杂大场景下的移动机器人视觉导航方法，以当前观测得到的图像及目标信息为输入，输出连续动作使智能体避开障碍，以较短路径到指定位置。目前基于深度强化学习的移动机器人视觉导航技术主要有以下两大问题：一是大空间下的视觉导航性能偏弱，二是同时在多种不同场景进行导航有难度。The visual navigation method of mobile robots in complex large scenes based on deep reinforcement learning takes the currently observed image and target information as input, and outputs continuous actions to enable the agent to avoid obstacles and reach the designated location with a short path. At present, the visual navigation technology of mobile robots based on deep reinforcement learning mainly has the following two problems: one is that the performance of visual navigation in large spaces is weak, and the other is that it is difficult to navigate in multiple different scenes at the same time.

目前技术比较成熟且应用较为广泛的定位技术为全球定位***定位技术，这种定位技术的定位方法是车辆、手机等移动设备通过搭载GPS模块实现对自身的定位，从而实现导航。但此种技术在室内会出现信号偏弱，定位不准等问题，导致导航效果不佳，无法实现在室内外同时进行导航的目标；此外，移动机器人导航技术应用较多使用基于激光雷达导航的技术和基于视觉导航的技术，通过激光雷达和视觉传感器完成同时定位与建图，实现移动机器人的导航；但激光雷达成本较高，也无法识别物体；而视觉传感器造价低，可通过图像对周围环境有一个清楚的认知。At present, the positioning technology that is relatively mature and widely used is the global positioning system positioning technology. The positioning method of this positioning technology is that mobile devices such as vehicles and mobile phones realize their own positioning by carrying GPS modules, so as to realize navigation. However, this kind of technology will have problems such as weak signal and inaccurate positioning indoors, resulting in poor navigation effect, and the goal of simultaneously navigating indoors and outdoors cannot be achieved; Technology and technology based on visual navigation, complete simultaneous positioning and mapping through laser radar and visual sensors, and realize the navigation of mobile robots; The environment has a clear perception.

现有的技术没有充分融合图像中的深度信息，对未知场景目标的泛化能力、避障能力较差，且对深度强化学习的奖励函数设计较简单，极易出现奖励稀疏问题，导致移动机器人极难到达目标点，导致训练收敛速度变慢，另外在复杂的大空间下导航性能大大减弱。The existing technology does not fully integrate the depth information in the image, the generalization ability and obstacle avoidance ability of unknown scene targets are poor, and the reward function design for deep reinforcement learning is relatively simple, which is prone to the problem of reward sparseness, which leads to mobile robots It is extremely difficult to reach the target point, resulting in slower training convergence, and the navigation performance is greatly weakened in complex large spaces.

因此，越来越多的研究人员将精力投入到基于深度强化学习的移动机器人的视觉导航中来，只需简单输入当前移动机器人观测到的图像以及目标位置，即可以较短路径无碰撞的到达指定位置。Therefore, more and more researchers are devoting their energy to the visual navigation of mobile robots based on deep reinforcement learning. Simply input the images observed by the current mobile robot and the target position, that is, they can reach the target in a short path without collision. specify the location.

发明内容Contents of the invention

本发明实施例提供了一种基于深度强化学习的移动机器人视觉导航方法及装置，以提高机器人在复杂的多场景下的快速导航性能。Embodiments of the present invention provide a mobile robot visual navigation method and device based on deep reinforcement learning, so as to improve the rapid navigation performance of the robot in complex multi-scenario.

根据本发明的一实施例，提供了一种基于深度强化学习的移动机器人视觉导航方法，包括以下步骤：According to an embodiment of the present invention, a kind of mobile robot visual navigation method based on deep reinforcement learning is provided, comprising the following steps:

构建具有多种场景的场景地图；Build scene maps with multiple scenarios;

移动机器人在场景地图中移动，并在场景地图中收集当前观测到的图像以及目标点位置信息，将当前观测到的图像以及目标点位置信息，设计卷积神经网络并提取出图像特征；The mobile robot moves in the scene map, and collects the currently observed images and target point location information in the scene map, designs the convolutional neural network and extracts image features from the currently observed images and target point position information;

在场景地图中的单场景中构建深度强化学习模型，将图像特征及目标点位置输入深度强化学习模型，并通过设计奖励函数进行深度强化学习模型的训练，输出移动机器人连续的线速度以及角速度；Build a deep reinforcement learning model in a single scene in the scene map, input image features and target point positions into the deep reinforcement learning model, and design a reward function to train the deep reinforcement learning model, and output the continuous linear velocity and angular velocity of the mobile robot;

将各个单场景中移动机器人学习到的动作存储到数据库中，在穿越多种场景时对应使用，将各单场景的连通位置作为中间目标点，导航移动机器人到达目标位置。The actions learned by the mobile robot in each single scene are stored in the database, and used when traveling through multiple scenes, and the connected positions of each single scene are used as intermediate target points to navigate the mobile robot to the target position.

进一步地，构建具有多种场景的场景地图具体为：Further, constructing a scene map with various scenes is specifically as follows:

基于gazebo仿真平台构建具有多种场景的场景地图。Based on the gazebo simulation platform, a scene map with various scenes is constructed.

进一步地，在场景地图中的单场景中构建PPO深度强化学习模型。Further, a PPO deep reinforcement learning model is constructed in a single scene in the scene map.

进一步地，奖励函数为：Further, the reward function is:

其中，奖励函数的含义为：当移动机器人到达目标，即可获得100的奖励，若在导航过程中发生碰撞，给予-50的碰撞奖励；导航过程中，为了以最短距离到达目标，将与目标的距离Δd乘一个系数C ₁，作为距离奖励；为了以最快速度到达目标，将移动机器人的线速度C _v乘一个系数C ₂，作为速度奖励；为了以平滑的路径到达目标，限制移动机器人的角速度C _w，将其乘一个系数C ₃作为转弯奖励；为了以较短步数到达目标，加入步长奖励C ₄。 Among them, the meaning of the reward function is: when the mobile robot reaches the target, it can get a reward of 100, and if a collision occurs during the navigation process, a collision reward of -50 will be given; during the navigation process, in order to reach the target with the shortest distance, it will The distance Δd multiplied by a coefficient C ₁ is used as a distance reward; in order to reach the target at the fastest speed, the linear velocity C _v of the mobile robot is multiplied by a coefficient C ₂ as a speed reward; in order to reach the target with a smooth path, limit the mobile robot The angular velocity C _w of , multiply it by a coefficient C ₃ as the turning reward; in order to reach the goal with a shorter number of steps, add the step size reward C ₄ .

进一步地，将各个单场景中移动机器人学习到的策略存储到数据库中，在穿越多种场景时对应使用，将各单场景的连通位置作为中间目标点，直到到达目标位置具体为：Furthermore, the strategies learned by the mobile robot in each single scene are stored in the database, and used correspondingly when traveling through multiple scenes, and the connected positions of each single scene are used as intermediate target points until reaching the target position. Specifically:

将移动机器人在单场景中动作存储到数据库中；Store the actions of the mobile robot in a single scene into the database;

在场景地图中，根据移动机器人的位置及图像特征确定所处场景；In the scene map, the scene is determined according to the position and image characteristics of the mobile robot;

从数据库中调取对应动作，以根据对应动作实现导航至目标位置。The corresponding action is called from the database, so as to realize the navigation to the target position according to the corresponding action.

进一步地，在从数据库中调取对应动作，以根据对应动作实现导航至目标位置之前还包括：Further, before calling the corresponding action from the database, so as to realize the navigation to the target position according to the corresponding action, it also includes:

判断移动机器人的起点与终点是否位于同一单场景；Determine whether the starting point and the ending point of the mobile robot are located in the same single scene;

若是同一单场景，则直接从对应的单场景的数据库中选取动作，移动机器人根据选取动作移动至目标点。If it is the same single scene, the action is directly selected from the database of the corresponding single scene, and the mobile robot moves to the target point according to the selected action.

进一步地，在判断移动机器人的起点与终点是否位于同一单场景之后还包括：Further, after judging whether the starting point and the ending point of the mobile robot are located in the same single scene, it also includes:

若不是同一单场景，则判断移动机器人是否需要穿越其它单场景才能到达目标；If it is not the same single scene, judge whether the mobile robot needs to pass through other single scenes to reach the target;

若不用穿越其它单场景，则确定当前单场景与目标单场景间的中间目标点，从单场景的数据库中选取动作，移动机器人根据选取的动作到达中间目标点，继续判断移动机器人的起点与终点是否位于同一单场景，直至移动机器人根据选取动作移动至目标点；If there is no need to traverse other single scenes, determine the intermediate target point between the current single scene and the target single scene, select an action from the database of the single scene, and the mobile robot will reach the intermediate target point according to the selected action, and continue to judge the starting point and end point of the mobile robot Whether it is in the same single scene until the mobile robot moves to the target point according to the selected action;

若需要穿越其它单场景，则确定当前单场景与要穿越的单场景之间的中间目标点，从对应单场景的数据库中取动作到达中间目标点，继续判断移动机器人的起点与终点是否位于同一单场景，直至移动机器人根据选取动作移动至目标点。If other single scenes need to be traversed, then determine the intermediate target point between the current single scene and the single scene to be traversed, take the action from the database corresponding to the single scene to reach the intermediate target point, and continue to judge whether the starting point and the ending point of the mobile robot are at the same Single scene until the mobile robot moves to the target point according to the selected action.

一种基于深度强化学习的移动机器人视觉导航装置，包括：A mobile robot visual navigation device based on deep reinforcement learning, including:

地图构建模块，用于构建具有多种场景的场景地图；A map building block for building a scene map with a variety of scenes;

特征提取模块，用于移动机器人在场景地图中移动，并在场景地图中收集当前观测到的图像以及目标点位置信息，将当前观测到的图像以及目标点位置信息，设计卷积神经网络并提取出图像特征；The feature extraction module is used for the mobile robot to move in the scene map, and collect the currently observed image and target point position information in the scene map, design the convolutional neural network and extract the currently observed image and target point position information image features;

机器人状态输出模块，用于在场景地图中的单场景中构建深度强化学习模型，将图像特征及目标点位置输入深度强化学习模型，并通过设计奖励函数进行深度强化学习模型的训练，输出移动机器人连续的线速度以及角速度；The robot state output module is used to build a deep reinforcement learning model in a single scene in the scene map, input image features and target point positions into the deep reinforcement learning model, and train the deep reinforcement learning model by designing a reward function, and output the mobile robot Continuous linear velocity and angular velocity;

目标位置导航模块，用于将各个单场景中移动机器人学习到的动作存储到数据库中，在穿越多种场景时对应使用，将各单场景的连通位置作为中间目标点，导航移动机器人到达目标位置。The target position navigation module is used to store the actions learned by the mobile robot in each single scene into the database, and use it when traveling through multiple scenes. The connected positions of each single scene are used as intermediate target points to guide the mobile robot to reach the target position.

一种计算机可读介质，计算机可读存储介质存储有一个或者多个程序，一个或者多个程序可被一个或者多个处理器执行，以实现如上述任意一项的基于深度强化学习的移动机器人视觉导航方法中的步骤。A computer-readable medium, the computer-readable storage medium stores one or more programs, and one or more programs can be executed by one or more processors, so as to realize a mobile robot based on deep reinforcement learning as described above. Steps in the visual navigation method.

一种终端设备，包括：处理器、存储器及通信总线；存储器上存储有可被处理器执行的计算机可读程序；A terminal device, including: a processor, a memory, and a communication bus; a computer-readable program that can be executed by the processor is stored on the memory;

通信总线实现处理器和存储器之间的连接通信；The communication bus realizes the connection communication between the processor and the memory;

处理器执行计算机可读程序时实现如上述任意一项的基于深度强化学习的移动机器人视觉导航方法中的步骤。When the processor executes the computer-readable program, the steps in any one of the above-mentioned deep reinforcement learning-based mobile robot vision navigation methods are realized.

本发明提供一种基于深度强化学习的移动机器人视觉导航方法及装置，本发明方法基于深度强化学习方法，以图像、深度图像、目标位置为输入，可实现在多种场景，例如包含工厂，餐厅，办公区，户外等环境混合的大空间下进行导航，提升了基于深度强化学习的移动机器人视觉导航技术的导航能力；此外，本发明通过设计移动机器人速度及移动机器人与目标距离相关的奖励函数，使深度强化学习模型的训练能够快速收敛；本发明可提高深度强化学习方法在复杂大场景下的导航能力，解决奖励稀疏问题，加快模型收敛速度，提高在复杂大场景下的导航性能。The present invention provides a mobile robot visual navigation method and device based on deep reinforcement learning. The method of the present invention is based on the deep reinforcement learning method, with images, depth images, and target positions as inputs, and can be implemented in various scenarios, such as factories and restaurants. , office area, outdoor and other environment mixed large space, which improves the navigation ability of the mobile robot visual navigation technology based on deep reinforcement learning; in addition, the present invention designs the reward function related to the speed of the mobile robot and the distance between the mobile robot and the target , so that the training of the deep reinforcement learning model can quickly converge; the invention can improve the navigation ability of the deep reinforcement learning method in complex large scenes, solve the problem of reward sparsity, accelerate the model convergence speed, and improve the navigation performance in complex large scenes.

附图说明Description of drawings

此处所说明的附图用来提供对本发明的进一步理解，构成本申请的一部分，本发明的示意性实施例及其说明用于解释本发明，并不构成对本发明的不当限定。在附图中：The accompanying drawings described here are used to provide a further understanding of the present invention and constitute a part of the application. The schematic embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute improper limitations to the present invention. In the attached picture:

图1为本发明基于深度强化学习的移动机器人视觉导航方法的流程图；Fig. 1 is the flow chart of the mobile robot visual navigation method based on deep reinforcement learning of the present invention;

图2为本发明基于深度强化学习的视觉导航模型图；Fig. 2 is a visual navigation model diagram based on deep reinforcement learning of the present invention;

图3为本发明移动机器人移至目标点的区域化导航模型图；Fig. 3 is the regionalized navigation model figure that the mobile robot of the present invention moves to the target point;

图4为本发明基于深度强化学习的移动机器人视觉导航装置的模块图；Fig. 4 is the block diagram of the mobile robot vision navigation device based on deep reinforcement learning of the present invention;

图5为本发明终端设备原理图。FIG. 5 is a schematic diagram of a terminal device of the present invention.

具体实施方式Detailed ways

为了使本技术领域的人员更好地理解本发明方案，下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分的实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都应当属于本发明保护的范围。In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only It is an embodiment of a part of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts shall fall within the protection scope of the present invention.

需要说明的是，本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换，以便这里描述的本发明的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外，术语“包括”和“具有”以及他们的任何变形，意图在于覆盖不排他的包含，例如，包含了一系列步骤或单元的过程、方法、***、产品或设备不必限于清楚地列出的那些步骤或单元，而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。It should be noted that the terms "first" and "second" in the description and claims of the present invention and the above drawings are used to distinguish similar objects, but not necessarily used to describe a specific sequence or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", as well as any variations thereof, are intended to cover a non-exclusive inclusion, for example, a process, method, system, product or device comprising a sequence of steps or elements is not necessarily limited to the expressly listed instead, may include other steps or elements not explicitly listed or inherent to the process, method, product or apparatus.

根据本发明一实施例，提供了一种基于深度强化学习的移动机器人视觉导航方法，参见图1，包括以下步骤：According to an embodiment of the present invention, a mobile robot visual navigation method based on deep reinforcement learning is provided, referring to Fig. 1, comprising the following steps:

S100：构建具有多种场景的场景地图；S100: Construct a scene map with various scenes;

S200：移动机器人在场景地图中移动，并在场景地图中收集当前观测到的图像以及目标点位置信息，将当前观测到的图像以及目标点位置信息，设计卷积神经网络并提取出图像特征；S200: The mobile robot moves in the scene map, collects the currently observed image and target point location information in the scene map, designs a convolutional neural network and extracts image features from the currently observed image and target point location information;

实施例中，将移动机器人模型在步骤101中的场景地图移动，将当前观测到的RGB-D图像以及目标点位置信息，设计卷积神经网络并提取出图像特征。In the embodiment, the scene map of the mobile robot model in step 101 is moved, and the currently observed RGB-D image and target point position information are used to design a convolutional neural network and extract image features.

S300：在场景地图中的单场景中构建深度强化学习模型，将图像特征及目标点位置输入深度强化学习模型，并通过设计奖励函数进行深度强化学习模型的训练，输出移动机器人连续的线速度以及角速度；S300: Construct a deep reinforcement learning model in a single scene in the scene map, input image features and target point positions into the deep reinforcement learning model, and design a reward function to train the deep reinforcement learning model, and output the continuous linear velocity and angular velocity;

S400：将各个单场景中移动机器人学习到的动作存储到数据库中，在穿越多种场景时对应使用，将各单场景的连通位置作为中间目标点，导航移动机器人到达目标位置。S400: Store the actions learned by the mobile robot in each single scene into the database, use it when traveling through multiple scenes, use the connected positions of each single scene as intermediate target points, and navigate the mobile robot to reach the target position.

本发明面向复杂大场景下的视觉导航，可实现在餐厅，办公室，户外，工厂等多种大空间下的视觉导航，并可穿越不同场景进行视觉导航。本发明基于区域化的方法，通过输入RGB图像及深度图像，设计和移动机器人与目标间距离、移动机器人速度相关的奖励函数，使移动机器人能较快到达目标位置；通过对多种场景的区域化处理，实现移动机器人在复杂大场景下的视觉导航。The invention is oriented to visual navigation in complex large scenes, and can realize visual navigation in various large spaces such as restaurants, offices, outdoors, factories, etc., and can perform visual navigation across different scenes. The present invention is based on the regionalization method, by inputting RGB images and depth images, designing a reward function related to the distance between the mobile robot and the target, and the speed of the mobile robot, so that the mobile robot can reach the target position quickly; to realize the visual navigation of mobile robots in complex large scenes.

实施例中，步骤S100具体为：In an embodiment, step S100 is specifically:

具体的，构建基于gazebo仿真平台的复杂大场景地图，地图中包括工厂、办公室、户外、餐厅等多种场景。Specifically, construct a complex large-scale scene map based on the gazebo simulation platform, which includes various scenes such as factories, offices, outdoors, and restaurants.

实施例中，步骤S300包括：In an embodiment, step S300 includes:

在场景地图中的单场景中构建PPO深度强化学习模型。Build a PPO deep reinforcement learning model in a single scene in a scene map.

具体的，在单场景中构建PPO深度强化学习模型，并将步骤S102中的图像特征及目标点位置作为模型的输入，设计奖励函数，进行模型的训练，输出为连续的移动机器人的线速度以及角速度。Specifically, a PPO deep reinforcement learning model is constructed in a single scene, and the image features and target point positions in step S102 are used as the input of the model, the reward function is designed, the model is trained, and the output is the continuous linear velocity of the mobile robot and angular velocity.

具体地，为了要使移动机器人快速到达目标位置，需要设计一个与移动机器人的目标距离及速度相关的奖励函数，因此设计如下奖励函数：Specifically, in order to make the mobile robot reach the target position quickly, it is necessary to design a reward function related to the target distance and speed of the mobile robot, so the following reward function is designed:

实施例中，步骤S400具体为：In an embodiment, step S400 is specifically:

S401：将移动机器人在单场景中动作存储到数据库中；S401: storing the actions of the mobile robot in a single scene into a database;

S402：在场景地图中，根据移动机器人的位置及图像特征确定所处场景；S402: In the scene map, determine the scene where the mobile robot is located according to the position and image features;

S403：从数据库中调取对应动作，以根据对应动作实现导航至目标位置。S403: Retrieving the corresponding action from the database, so as to realize the navigation to the target position according to the corresponding action.

具体的，将步骤S300中各个单场景中学习到的策略或动作存储到经验池或数据库中，在穿越多场景时对应使用，将各场景连通位置作为中间目标点，导航移动机器人，直至移动机器人到达目标位置。Specifically, the strategies or actions learned in each single scene in step S300 are stored in the experience pool or database, and used correspondingly when traveling through multiple scenes, and the connected positions of each scene are used as intermediate target points to navigate the mobile robot until the mobile robot reach the target location.

实施例中，在步骤S403之前还包括：In an embodiment, before step S403, it also includes:

S404：判断移动机器人的起点与终点是否位于同一单场景；S404: judging whether the starting point and the ending point of the mobile robot are located in the same single scene;

S405：若是同一单场景，则直接从对应的单场景的数据库中选取动作，移动机器人根据选取动作移动至目标点。S405: If it is the same single scene, directly select an action from the database of the corresponding single scene, and the mobile robot moves to the target point according to the selected action.

在步骤S404之后还包括：Also include after step S404:

S406：若不是同一单场景，则判断移动机器人是否需要穿越其它单场景才能到达目标；S406: If it is not the same single scene, judge whether the mobile robot needs to pass through other single scenes to reach the target;

S407：若不用穿越其它单场景，则确定当前单场景与目标单场景间的中间目标点，从单场景的数据库中选取动作，移动机器人根据选取的动作到达中间目标点，继续判断移动机器人的起点与终点是否位于同一单场景，直至移动机器人根据选取动作移动至目标点；S407: If there is no need to traverse other single scenes, determine the intermediate target point between the current single scene and the target single scene, select an action from the database of the single scene, and the mobile robot will reach the intermediate target point according to the selected action, and continue to judge the starting point of the mobile robot Whether it is located in the same single scene as the end point, until the mobile robot moves to the target point according to the selected action;

S408：若需要穿越其它单场景，则确定当前单场景与要穿越的单场景之间的中间目标点，从对应单场景的数据库中取动作到达中间目标点，继续判断移动机器人的起点与终点是否位于同一单场景，直至移动机器人根据选取动作移动至目标点。S408: If other single scenes need to be traversed, determine the intermediate target point between the current single scene and the single scene to be traversed, take actions from the database corresponding to the single scene to reach the intermediate target point, and continue to judge whether the starting point and the ending point of the mobile robot are Stay in the same single scene until the mobile robot moves to the target point according to the selected action.

在本发明中，我们提出了一个基于深度强化学习的用于复杂大场景的区域化视觉导航框架。主要包括：基于深度强化学习的导航模型、深度强化学习奖励函数设计、区域化导航模型。In this invention, we propose a deep reinforcement learning based regionalized visual navigation framework for complex large scenes. It mainly includes: navigation model based on deep reinforcement learning, design of reward function for deep reinforcement learning, and regionalized navigation model.

具体地，本发明步将移动机器人以第一视角观测到的RGB图像及深度图像输入卷积神经网络，提取出与目标和障碍物相关的特征。Specifically, the present invention first inputs RGB images and depth images observed by the mobile robot from the first perspective into the convolutional neural network to extract features related to targets and obstacles.

通过构建与移动机器人与目标之间距离、夹角、以及本身线速度及角速度相关的奖励函数，计算出移动机器人所采取的动作的奖励值。By constructing a reward function related to the distance between the mobile robot and the target, the included angle, and its own linear velocity and angular velocity, the reward value of the action taken by the mobile robot is calculated.

设计区域化导航模型，将移动机器人在单场景中动作存储到数据库中，在复杂大场景下，根据移动机器人的位置及周围图像特征确定所处场景，并从数据库中调取动作以实现导航至目标。Design a regionalized navigation model, store the actions of the mobile robot in a single scene into the database, and in a complex large scene, determine the scene according to the position of the mobile robot and the surrounding image features, and call the action from the database to realize navigation to Target.

具体地，基于深度强化学习的导航模型：Specifically, the navigation model based on deep reinforcement learning:

室内目标驱动视觉导航如图2所示，该网络以移动机器人64×48×3的RGB图像以及32×24×1的深度图像为输入，RGB图像首先经过32个滤波器，8×6的内核，跨度为4，ReLU为激活函数的二维卷积层，然后经过64个滤波器，4×3的内核，跨度为2，ReLU为激活函数的二维卷积层，然后经过以2×2的内核，跨度为2的最大化池化层，最后经过64个滤波器，2×2的内核，跨度为2，ReLU为激活函数的二维卷积层，获得有关RGB图像的特征向量；深度图像首先经过32个滤波器，4×3的内核，跨度为2，ReLU为激活函数的二维卷积层，然后经过64个滤波器，4×3的内核，跨度为2，ReLU为激活函数的二维卷积层，然后经过以2×2的内核，跨度为2的最大化池化层，最后经过64个滤波器，2×2的内核，跨度为2，ReLU为激活函数的二维卷积层，获得有关深度图像的特征向量；Indoor target-driven visual navigation is shown in Figure 2. The network takes a 64×48×3 RGB image and a 32×24×1 depth image of the mobile robot as input. The RGB image first passes through 32 filters, and the 8×6 kernel , the span is 4, ReLU is the two-dimensional convolutional layer of the activation function, and then passes through 64 filters, the kernel of 4×3, the span is 2, the ReLU is the two-dimensional convolutional layer of the activation function, and then passes through the 2×2 The kernel, the maximum pooling layer with a stride of 2, and finally 64 filters, a 2×2 kernel, a two-dimensional convolutional layer with a stride of 2, and ReLU as the activation function to obtain the feature vector of the RGB image; depth The image first passes through 32 filters, a 4×3 kernel, a span of 2, and ReLU is a two-dimensional convolutional layer of the activation function, and then passes through 64 filters, a 4×3 kernel, a span of 2, and ReLU is an activation function The two-dimensional convolution layer, then through the maximization pooling layer with a 2×2 kernel and a span of 2, and finally through 64 filters, a 2×2 kernel, a span of 2, and a two-dimensional ReLU activation function The convolutional layer obtains the feature vector of the depth image;

将有关RGB图像和深度图像的特征向量铺平整合，经过具有32个隐藏单元和ReLU激活函数的全连接层处理，将处理后结果与目标信息整合，输入具有256个隐藏单元的LSTM层中，将处理后结果与上一时刻移动机器人速度、上一时刻移动机器人获得的奖励进行整合，输入具有256个隐藏单元的LSTM层中，再输入具有32个隐藏单元和ReLU激活函数的全连接层处理，得到该时刻移动机器人的速度，实现端到端的视觉导航。The feature vectors of the RGB image and the depth image are flattened and integrated, processed by a fully connected layer with 32 hidden units and a ReLU activation function, and the processed results are integrated with the target information, and input into the LSTM layer with 256 hidden units. Integrate the processed result with the speed of the mobile robot at the last moment and the reward obtained by the mobile robot at the last moment, input it into the LSTM layer with 256 hidden units, and then input it into the fully connected layer with 32 hidden units and ReLU activation function for processing , get the speed of the mobile robot at this moment, and realize end-to-end visual navigation.

具体地，深度强化学习奖励函数的改进设计思路基于如下区域化导航模型，并参考图3所示内容。Specifically, the improved design idea of the deep reinforcement learning reward function is based on the following regionalized navigation model, and refer to the content shown in Figure 3.

步骤一：将移动机器人在单场景中学习到的策略存储到经验池或数据库中，在穿越多场景时对应使用，将各单场景连通位置作为中间目标点。Step 1: Store the strategy learned by the mobile robot in a single scene into the experience pool or database, and use it when traveling through multiple scenes, and use the connected positions of each single scene as the intermediate target point.

步骤二：判断移动机器人起点与终点是否在同一子地图(单场景)，若是，直接从对应子地图经验池中选取动作，根据所选取的动作即可到达目标点；若否执行步骤三。Step 2: Determine whether the starting point and the ending point of the mobile robot are in the same sub-map (single scene), if so, directly select an action from the experience pool of the corresponding sub-map, and reach the target point according to the selected action; if not, perform step 3.

步骤三：判断是否需要穿越其它子地图才能到达目标；若否，则先确定当前子地图与目标子地图间的中间目标点，从子地图经验池或数据库选取动作到达中间目标点，转执行步骤二；若是，则执行步骤四；Step 3: Determine whether you need to cross other sub-maps to reach the target; if not, first determine the intermediate target point between the current sub-map and the target sub-map, select an action from the sub-map experience pool or database to reach the intermediate target point, and go to the execution step 2; if so, go to step 4;

步骤四：确定移动机器人当前子地图与要穿越的子地图间的中间目标点，从子地图经验池或数据库选取动作到达中间目标点，转执行步骤二，直至移动机器人到达目的点。Step 4: Determine the intermediate target point between the current sub-map of the mobile robot and the sub-map to be traversed, select an action from the sub-map experience pool or database to reach the intermediate target point, and then perform step 2 until the mobile robot reaches the target point.

本发明通过gazebo仿真平台建立融合工厂，办公室，户外，餐厅的复杂大场景，通过设计区域化深度强化学习的视觉导航框架，改进深度强化学习中的奖励函数，实现移动机器人在复杂大场景下的视觉导航。本发明对比现有技术，提升了在复杂大场景下的泛化能力，提升了导航性能。The present invention uses the gazebo simulation platform to establish a complex large scene that integrates factories, offices, outdoors, and restaurants. By designing a visual navigation framework for regionalized deep reinforcement learning, the reward function in deep reinforcement learning is improved to realize mobile robots in complex large scenes. visual navigation. Compared with the prior art, the present invention improves the generalization ability in complex large scenes and improves the navigation performance.

本发明经过与视觉导航方法进行多组实验对比，在设计的仿真地图中取得了很好的效果，在复杂大场景下的泛化能力和导航性能都有所提升。The present invention has achieved good results in the designed simulation map through multiple sets of experimental comparisons with the visual navigation method, and the generalization ability and navigation performance in complex large scenes have been improved.

参见图4，根据本发明一实施例，提供了一种基于深度强化学习的移动机器人视觉导航装置，包括：Referring to Fig. 4, according to an embodiment of the present invention, a mobile robot visual navigation device based on deep reinforcement learning is provided, including:

地图构建模块100，用于构建具有多种场景的场景地图；A map construction module 100, configured to construct a scene map with multiple scenes;

特征提取模块200，用于移动机器人在场景地图中移动，并在场景地图中收集当前观测到的图像以及目标点位置信息，将当前观测到的图像以及目标点位置信息，设计卷积神经网络并提取出图像特征；The feature extraction module 200 is used for the mobile robot to move in the scene map, and collect the currently observed images and target point position information in the scene map, design the convolutional neural network and extract the currently observed images and target point position information out image features;

机器人状态输出模块300，用于在场景地图中的单场景中构建深度强化学习模型，将图像特征及目标点位置输入深度强化学习模型，并通过设计奖励函数进行深度强化学习模型的训练，输出移动机器人连续的线速度以及角速度；The robot state output module 300 is used to construct a deep reinforcement learning model in a single scene in the scene map, input image features and target point positions into the depth reinforcement learning model, and perform training of the depth reinforcement learning model by designing a reward function, and output movement Continuous linear velocity and angular velocity of the robot;

目标位置导航模块400，用于将各个单场景中移动机器人学习到的动作存储到数据库中，在穿越多种场景时对应使用，将各单场景的连通位置作为中间目标点，导航移动机器人到达目标位置。The target position navigation module 400 is used to store the actions learned by the mobile robot in each single scene into the database, and use it when passing through multiple scenes, and use the connected positions of each single scene as the intermediate target point to guide the mobile robot to reach the target position .

基于上述基于深度强化学习的移动机器人视觉导航方法，本实施例提供了一种计算机可读存储介质，计算机可读存储介质存储有一个或者多个程序，一个或者多个程序可被一个或者多个处理器执行，以实现如上述实施例的基于深度强化学习的移动机器人视觉导航方法中的步骤。Based on the above-mentioned mobile robot visual navigation method based on deep reinforcement learning, this embodiment provides a computer-readable storage medium, the computer-readable storage medium stores one or more programs, and one or more programs can be used by one or more The processor executes to realize the steps in the mobile robot visual navigation method based on deep reinforcement learning as in the above-mentioned embodiments.

一种终端设备，包括：处理器、存储器及通信总线；存储器上存储有可被处理器执行的计算机可读程序；通信总线实现处理器和存储器之间的连接通信；处理器执行计算机可读程序时实现上述的基于深度强化学习的移动机器人视觉导航方法中的步骤。A terminal device, including: a processor, a memory, and a communication bus; a computer-readable program that can be executed by the processor is stored on the memory; the communication bus realizes connection and communication between the processor and the memory; the processor executes the computer-readable program The steps in the above-mentioned mobile robot vision navigation method based on deep reinforcement learning are realized at the same time.

基于上述基于深度强化学习的移动机器人视觉导航方法，本申请提供了一种终端设备，如图5所示，其包括至少一个处理器(processor)20；显示屏21；以及存储器(memory)22，还可以包括通信接口(Communications Interface)23和总线24。其中，处理器20、显示屏21、存储器22和通信接口23可以通过总线24完成相互间的通信。显示屏21设置为显示初始设置模式中预设的用户引导界面。通信接口23可以传输信息。处理器 20可以调用存储器22中的逻辑指令，以执行上述实施例中的方法。Based on the above-mentioned mobile robot visual navigation method based on deep reinforcement learning, the present application provides a terminal device, as shown in FIG. 5 , which includes at least one processor (processor) 20; display screen 21; and memory (memory) 22, A communication interface (Communications Interface) 23 and a bus 24 may also be included. Wherein, the processor 20 , the display screen 21 , the memory 22 and the communication interface 23 can communicate with each other through the bus 24 . The display screen 21 is configured to display the preset user guidance interface in the initial setting mode. The communication interface 23 can transmit information. The processor 20 can invoke logic instructions in the memory 22 to execute the methods in the above-mentioned embodiments.

此外，上述的存储器22中的逻辑指令可以通过软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。In addition, the above-mentioned logic instructions in the memory 22 may be implemented in the form of software functional units and when sold or used as an independent product, may be stored in a computer-readable storage medium.

存储器22作为一种计算机可读存储介质，可设置为存储软件程序、计算机可执行程序，如本公开实施例中的方法对应的程序指令或模块。处理器20通过运行存储在存储器22中的软件程序、指令或模块，从而执行功能应用以及数据处理，即实现上述实施例中的方法。As a computer-readable storage medium, the memory 22 can be configured to store software programs and computer-executable programs, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 runs software programs, instructions or modules stored in the memory 22 to execute functional applications and data processing, ie to implement the methods in the above-mentioned embodiments.

存储器22可包括存储程序区和存储数据区，其中，存储程序区可存储操作***、至少一个功能所需的应用程序；存储数据区可存储根据终端设备的使用所创建的数据等。此外，存储器22可以包括高速随机存取存储器，还可以包括非易失性存储器。例如，U盘、移动硬盘、只读存储器(Read-Only Memory，ROM)、随机存取存储器(Random Access Memory，RAM)、磁碟或者光盘等多种可以存储程序代码的介质，也可以是暂态存储介质。The memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application required by a function; the data storage area may store data created according to the use of the terminal device, and the like. In addition, the memory 22 may include a high-speed random access memory, and may also include a non-volatile memory. For example, various media that can store program codes such as U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk, etc., can also be temporary state storage medium.

此外，上述存储介质以及终端设备中的多条指令处理器加载并执行的具体过程在上述方法中已经详细说明，在这里就不再一一陈述。In addition, the specific process of loading and executing multiple instruction processors in the storage medium and the terminal device has been described in detail in the above method, and will not be described here one by one.

以上所述仅是本发明的优选实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本发明原理的前提下，还可以做出若干改进和润饰，这些改进和润饰也应视为本发明的保护范围。The above is only a preferred embodiment of the present invention, it should be pointed out that, for those of ordinary skill in the art, without departing from the principle of the present invention, some improvements and modifications can also be made, and these improvements and modifications can also be made. It should be regarded as the protection scope of the present invention.

Claims

一种基于深度强化学习的移动机器人视觉导航方法，其特征在于，包括以下步骤：A mobile robot visual navigation method based on depth reinforcement learning, is characterized in that, comprises the following steps:

构建具有多种场景的场景地图；Build scene maps with multiple scenarios;

移动机器人在所述场景地图中移动，并在所述场景地图中收集当前观测到的图像以及目标点位置信息，将当前观测到的所述图像以及目标点位置信息，设计卷积神经网络并提取出图像特征；The mobile robot moves in the scene map, and collects the currently observed image and target point position information in the scene map, designs the convolutional neural network and extracts the currently observed image and target point position information image features;

在所述场景地图中的单场景中构建深度强化学习模型，将所述图像特征及目标点位置输入所述深度强化学习模型，并通过设计奖励函数进行所述深度强化学习模型的训练，输出所述移动机器人连续的线速度以及角速度；Construct a deep reinforcement learning model in a single scene in the scene map, input the image features and target point positions into the depth reinforcement learning model, and carry out the training of the depth reinforcement learning model by designing a reward function, and output the Describe the continuous linear velocity and angular velocity of the mobile robot;

将各个所述单场景中所述移动机器人学习到的动作存储到数据库中，在穿越多种场景时对应使用，将各所述单场景的连通位置作为中间目标点，导航所述移动机器人到达目标位置。Store the actions learned by the mobile robot in each of the single scenes into a database, use them when traveling through multiple scenes, use the connected positions of each of the single scenes as intermediate target points, and navigate the mobile robot to reach the target Location.
根据权利要求1所述的基于深度强化学习的移动机器人视觉导航方法，其特征在于，所述构建具有多种场景的场景地图具体为：The method for visual navigation of mobile robots based on deep reinforcement learning according to claim 1, wherein said construction of a scene map with multiple scenes is specifically:

基于gazebo仿真平台构建具有多种场景的所述场景地图。The scene map with multiple scenes is constructed based on the gazebo simulation platform.
根据权利要求1所述的基于深度强化学习的移动机器人视觉导航方法，其特征在于，在所述场景地图中的单场景中构建PPO深度强化学习模型。The mobile robot visual navigation method based on deep reinforcement learning according to claim 1, wherein a PPO deep reinforcement learning model is constructed in a single scene in the scene map.
根据权利要求1所述的基于深度强化学习的移动机器人视觉导航方法，其特征在于，所述奖励函数为：The mobile robot visual navigation method based on deep reinforcement learning according to claim 1, wherein the reward function is:

其中，所述奖励函数的含义为：当移动机器人到达目标，即可获得100的奖励，若在导航过程中发生碰撞，给予-50的碰撞奖励；导航过程中，为了以最短距离到达目标，将与目标的距离Δd乘一个系数C ₁，作为距离奖励；为了以最快速度到达目标，将移动机器人的线速度C _v乘一个系数C ₂，作为速度奖励；为了以平滑的路径到达目标，限制移动机器人的角速度C _w，将其乘一个系数C ₃作为转弯奖励；为了以较短步数到达目标，加入步长奖励C ₄。 Wherein, the meaning of the reward function is: when the mobile robot reaches the target, a reward of 100 can be obtained, and if a collision occurs during the navigation process, a collision reward of -50 will be given; during the navigation process, in order to reach the target with the shortest distance, the The distance Δd from the target is multiplied by a coefficient C ₁ as a distance reward; in order to reach the target at the fastest speed, the linear velocity C _v of the mobile robot is multiplied by a coefficient C ₂ as a speed reward; in order to reach the target with a smooth path, limit The angular velocity C _w of the mobile robot is multiplied by a coefficient C ₃ as a turning reward; in order to reach the goal with a shorter number of steps, a step reward C ₄ is added.
根据权利要求1所述的基于深度强化学习的移动机器人视觉导航方法，其特征在于，所述将各个所述单场景中所述移动机器人学习到的策略存储到数据库中，在穿越多种场景时对应使用，将各所述单场景的连通位置作为中间目标点，直到到达目标位置具体为：The mobile robot visual navigation method based on deep reinforcement learning according to claim 1, wherein the strategies learned by the mobile robot in each of the single scenes are stored in a database, and when passing through multiple scenes Corresponding to use, the connected position of each single scene is used as the intermediate target point until the target position is reached, specifically:

将所述移动机器人在单场景中动作存储到数据库中；Store the actions of the mobile robot in a single scene into a database;

在所述场景地图中，根据所述移动机器人的位置及所述图像特征确定所处场景；In the scene map, the scene where the mobile robot is located is determined according to the position of the mobile robot and the image features;

从所述数据库中调取对应动作，以根据所述对应动作实现导航至所述目标位置。The corresponding actions are retrieved from the database, so as to navigate to the target location according to the corresponding actions.
根据权利要求5所述的基于深度强化学习的移动机器人视觉导航方法，其特征在于，在所述从所述数据库中调取对应动作，以根据所述对应动作实现导航至所述目标位置之前还包括：The method for visual navigation of a mobile robot based on deep reinforcement learning according to claim 5, characterized in that before the corresponding action is called from the database to realize navigation to the target position according to the corresponding action include:

判断所述移动机器人的起点与终点是否位于同一所述单场景；judging whether the starting point and the ending point of the mobile robot are located in the same single scene;

若是同一所述单场景，则直接从对应的所述单场景的所述数据库中选取动作，所述移动机器人根据所述选取动作移动至目标点。If it is the same single scene, an action is directly selected from the database of the corresponding single scene, and the mobile robot moves to the target point according to the selected action.
根据权利要求6所述的基于深度强化学习的移动机器人视觉导航方法，其特征在于，在所述判断所述移动机器人的起点与终点是否位于同一所述单场景之后还包括：The mobile robot visual navigation method based on deep reinforcement learning according to claim 6, characterized in that, after said judging whether the starting point and the ending point of said mobile robot are located in the same single scene, it also includes:

若不是同一所述单场景，则判断所述移动机器人是否需要穿越其它所述单场景才能到达目标；If it is not the same single scene, it is judged whether the mobile robot needs to pass through other single scenes to reach the target;

若不用穿越其它所述单场景，则确定当前单场景与目标单场景间的中间目标点，从单场景的数据库中选取动作，所述移动机器人根据选取的动作到达所述中间目标点，继续判断所述移动机器人的起点与终点是否位于同一所述单场景，直至所述移动机器人根据所述选取动作移动至目标点；If there is no need to pass through other single scenes, then determine the intermediate target point between the current single scene and the target single scene, select an action from the database of the single scene, and the mobile robot will arrive at the intermediate target point according to the selected action, and continue to judge Whether the starting point and the ending point of the mobile robot are located in the same single scene until the mobile robot moves to the target point according to the selection action;

若需要穿越其它所述单场景，则确定当前单场景与要穿越的单场景之间的中间目标点，从对应单场景的所述数据库中取动作到达中间目标点，继续判断所述移动机器人的起点与终点是否位于同一所述单场景，直至所述移动机器人根据所述选取动作移动至目标点。If it is necessary to pass through other single scenes, then determine the intermediate target point between the current single scene and the single scene to be traversed, take actions from the database corresponding to the single scene to reach the intermediate target point, and continue to judge the position of the mobile robot Whether the start point and the end point are located in the same single scene, until the mobile robot moves to the target point according to the selection action.
一种基于深度强化学习的移动机器人视觉导航装置，其特征在于，包括：A mobile robot visual navigation device based on deep reinforcement learning, characterized in that it includes:

地图构建模块，用于构建具有多种场景的场景地图；A map building block for building a scene map with a variety of scenes;

特征提取模块，用于移动机器人在所述场景地图中移动，并在所述场景地图中收集当前观测到的图像以及目标点位置信息，将当前观测到的所述图像以及目标点位置信息，设计卷积神经网络并提取出图像特征；The feature extraction module is used for the mobile robot to move in the scene map, and collect the currently observed image and target point position information in the scene map, and design the currently observed image and target point position information A neural network is used to extract image features;

机器人状态输出模块，用于在所述场景地图中的单场景中构建深度强化学习模型，将所述图像特征及目标点位置输入所述深度强化学习模型，并通过设计奖励函数进行所述深度强化学习模型的训练，输出所述移动机器人连续的线速度以及角速度；The robot state output module is used to construct a deep reinforcement learning model in a single scene in the scene map, input the image features and target point positions into the depth reinforcement learning model, and perform the depth reinforcement by designing a reward function Learning model training, outputting the continuous linear velocity and angular velocity of the mobile robot;

目标位置导航模块，用于将各个所述单场景中所述移动机器人学习到的动作存储到数据库中，在穿越多种场景时对应使用，将各所述单场景的连通位置作为中间目标点，导航所述移动机器人到达目标位置。The target position navigation module is used to store the actions learned by the mobile robot in each of the single scenes into a database, and use it when passing through multiple scenes, and use the connected positions of each of the single scenes as intermediate target points, Navigate the mobile robot to a target location.
一种计算机可读介质，其特征在于，所述计算机可读存储介质存储有一个或者多个程序，所述一个或者多个程序可被一个或者多个处理器执行，以实现如权利要求1-7任意一项所述的基于深度强化学习的移动机器人视觉导航方法中的步骤。A computer-readable medium, characterized in that the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors, so as to realize the claims 1- 7. The steps in any one of the deep reinforcement learning-based mobile robot vision navigation methods.
一种终端设备，其特征在于，包括：处理器、存储器及通信总线；所述存储器上存储有可被所述处理器执行的计算机可读程序；A terminal device, characterized by comprising: a processor, a memory, and a communication bus; a computer-readable program executable by the processor is stored in the memory;

所述通信总线实现处理器和存储器之间的连接通信；The communication bus realizes connection and communication between the processor and the memory;

所述处理器执行所述计算机可读程序时实现如权利要求1-7任意一项所述的基于深度强化学习的移动机器人视觉导航方法中的步骤。When the processor executes the computer-readable program, the steps in the method for visual navigation of a mobile robot based on deep reinforcement learning according to any one of claims 1-7 are realized.