CN117631660A - Cross-media continuous learning-based robot multi-scene path planning method and system - Google Patents

Cross-media continuous learning-based robot multi-scene path planning method and system

Info

Publication number
CN117631660A
CN117631660A (Application CN202311371787.8A)
Authority
CN
China
Prior art keywords
cross
media
path planning
continuous learning
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311371787.8A
Other languages
Chinese (zh)
Inventor
张伟
赵越男
李晓磊
杨志强
李腾
李睿童
谢寅铎
许筱毓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202311371787.8A priority Critical patent/CN117631660A/en
Publication of CN117631660A publication Critical patent/CN117631660A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a robot multi-scene path planning method and system based on cross-media continuous learning, relating to the technical field of path planning. The method comprises the following steps: constructing a multi-scene task environment according to task requirements; acquiring scene images and laser point cloud data, and constructing a cross-media perception model; performing multi-task training and continuous learning with the cross-media perception model to obtain intermediate features; and training a path planning strategy network with the intermediate features to obtain a path planning strategy. The invention can train a mobile robot to complete flexible, safe, and reliable path planning tasks in different scenes.

Description

Cross-media continuous learning-based robot multi-scene path planning method and system
Technical Field
The invention relates to the technical field of path planning, in particular to a multi-scene path planning method and system for a robot based on cross-media continuous learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Mobile robots are one of the most important branches of robotics research. A mobile robot needs autonomous navigation and obstacle avoidance capabilities: it must autonomously complete navigation tasks from one point to another and avoid obstacles while traveling along a set path. Early mobile robots were commonly used in level-floored indoor environments; for example, the TurtleBot developed by the Willow Garage laboratory is a typical indoor mobile robot development platform, and move_base in the Robot Operating System (ROS) is a widely applied and mature indoor robot navigation framework. In recent years, with the development of communication technology and artificial intelligence, autonomous driving has become a research hot spot in academia and industry, and relatively mature technical solutions have formed, for example the Apollo platform proposed by Baidu and the autonomous driving frameworks represented by Autoware, proposed by Kato et al. at Nagoya University.
Mobile robots are applied very widely: homes, factories, warehouses, hospitals, agriculture, traffic, inspection, rescue, and so on. To accommodate these complex and diverse environments, mobile robots need powerful perception capabilities that can acquire and understand surrounding information in order to make reasonable decisions and behaviors. In the perception process of a mobile robot, a cross-media path planning method uses multiple types of sensors to acquire different information as system input, and a navigation strategy module outputs decision and control trajectories. Cross-media perception can improve the perception quality and robustness of a mobile robot, because it provides richer and more complete information: the perception of a single medium may be affected by noise, occlusion, illumination, and so on, resulting in inaccurate or incomplete information, whereas cross-media perception can supplement and enhance information quality by fusing data from different sensors. For example, a vision sensor provides color, shape, and texture information, while a lidar provides distance and depth information; combining this information yields a more detailed and accurate scene description.
Moreover, cross-media perception can improve the adaptability and flexibility of a mobile robot in path planning. Different environments and tasks may require different perception modes and strategies, and single-modality perception may be limited by certain specific conditions and fail to function properly. By selecting a suitable sensor combination and switching mode, cross-media perception can adapt to different situations and requirements, thereby improving the robot's path planning performance across different task scenes.
However, the differences between task scenes are often huge, so a cross-media perception model trained in a single task scene performs poorly in other scenes, which easily causes the mobile robot's path planning to fail. Meanwhile, collecting data from different scenes for joint training leads to high model training and deployment costs.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a robot multi-scene path planning method and system based on cross-media continuous learning. A continuous learning technique is adopted to improve the adaptability of the cross-media path planning method across different task scenes, continuously learning and updating in a changing environment while retaining the knowledge learned in previous scenes to avoid catastrophic forgetting.
In order to achieve the above object, the present invention is realized by the following technical scheme:
the invention provides a multi-scene path planning method of a robot based on cross-media continuous learning, which comprises the following steps:
constructing a multi-scene task environment according to task requirements;
acquiring images of scenes and laser point cloud data, and constructing a cross-media perception model;
performing multi-task training and continuous learning by using a cross-media perception model to obtain intermediate features, wherein the features of the image and the laser point cloud data are fused by using a self-attention mechanism in the training process;
and training the path planning strategy network by using the intermediate features to obtain a path planning strategy, wherein the path planning strategy network is optimized by using a reinforcement learning mode.
Further, the cross-media perception model comprises two feature extraction branches and a cross-media self-attention mechanism module; intermediate features are output after the cross-media self-attention mechanism module, and these intermediate features contain information from both data modalities, completing cross-media information fusion.
Further, in the process of multi-task training and continuous learning with the fused features, to improve the continuous learning capacity of the cross-media perception model, during task training on the n-th task scene a fixed proportion of data is randomly extracted from the previous n−1 scenes as memory replay, so as to avoid catastrophic forgetting of the model.
Further, the specific steps of fusing the image and the laser point cloud data by using the self-attention mechanism are as follows:
extracting features from the image and the laser point cloud data with the image feature extraction branch and the laser point cloud feature extraction branch, respectively, to obtain feature maps;
after average pooling, the feature maps are passed through a fully connected layer, and the feature vectors from the image and the radar are combined by element-wise summation, completing cross-media information fusion and obtaining the intermediate features.
Further, the multi-task training comprises a semantic map segmentation task and a target detection task.
Further, the specific steps of utilizing the intermediate features to train the path planning strategy network are as follows:
after the cross-media perception model has been trained with continuous learning, it has learned spatial characterization capabilities across different scenes; its weights are then frozen, and the path planning strategy network is trained on the intermediate features it extracts.
Further, the mobile robot acquires environmental state observations from the environment and inputs them into the frozen-weight cross-media perception model to obtain the current intermediate features; these serve as input to the trained path planning strategy network, which outputs actions, yielding the path planning strategy.
The second aspect of the invention provides a robot multi-scene path planning system based on cross-media continuous learning, comprising:
the scene building module is configured to build task environments of multiple scenes according to task requirements;
the model construction module is configured to acquire images of scenes and laser point cloud data and construct a cross-media perception model;
the continuous learning module is configured to perform multi-task training and continuous learning by using the cross-media perception model to obtain intermediate features, wherein the features of the image and the laser point cloud data are fused by using a self-attention mechanism in the training process;
and the path planning module is configured to train the path planning strategy network by utilizing the intermediate features to obtain a path planning strategy, wherein the path planning strategy network is optimized by utilizing a reinforcement learning mode.
A third aspect of the present invention provides a medium having stored thereon a program which when executed by a processor performs the steps in a method for multi-scenario path planning for a robot based on cross-media continuous learning according to the first aspect of the present invention.
A fourth aspect of the present invention provides an apparatus comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the method for planning a path of a robot based on cross-media continuous learning according to the first aspect of the present invention when the program is executed.
The above technical solutions have the following beneficial effects:
the invention discloses a robot multi-scene path planning method and a system based on cross-media continuous learning, which are used for improving the continuous learning capacity of a cross-media perception model to the environment through memory playback among multi-task scenes; then, the path planning model is input by utilizing the middle features of the learned multi-scene perception, and the mobile robot is trained through reinforcement learning to realize the completion of flexible, safe and reliable path planning tasks in different scenes.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flowchart of a method for planning a multi-scene path of a robot based on cross-media continuous learning in a first embodiment of the invention;
fig. 2 is a block diagram of a multi-scene path planning method of a robot based on cross-media continuous learning in accordance with an embodiment of the present invention;
fig. 3 is a schematic view of a multi-scene environment according to a first embodiment of the invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise; furthermore, it is to be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
embodiment one:
Mobile robots need to work in different task scenes, which places higher demands on the generalization of cross-media path planning methods, because these scenes often differ greatly in spatial structure. To solve this problem, the invention provides a robot multi-scene path planning method and system based on cross-media continuous learning. Continuous learning can improve the adaptability and flexibility of perception: different task scenes may have different data distributions, noise, occlusion, illumination, and so on, so a single perception model may not perform well in all scenes. Continuous learning dynamically adjusts and optimizes the perception model for new task scenes, and a memory replay method enhances the generalization capability and robustness of the model.
As shown in fig. 1, the invention first decouples the perception and planning modules of the mobile robot and improves the continuous learning capacity of the cross-media perception model through memory replay among multi-task scenes; the learned multi-scene perception intermediate features are then fed into the path planning model, and the mobile robot is trained through reinforcement learning to complete flexible, safe, and reliable path planning tasks in different scenes.
By decoupling the perception and path planning modules of the mobile robot and improving the system's perception and characterization capacity for different task scenes through continuous learning, the invention ultimately improves the mobile robot's path planning capability across multiple scenes.
The specific steps are as follows:
the first embodiment of the invention provides a multi-scene path planning method of a robot based on cross-media continuous learning, which is shown in fig. 2 and comprises the following steps:
and step 1, constructing a multi-scene task environment according to task requirements.
And 2, acquiring images of the scene and laser point cloud data, and constructing a cross-media perception model.
And 3, performing multi-task training and continuous learning by using the cross-media perception model to obtain intermediate features.
And 4, training a path planning strategy network by using the intermediate features to obtain a path planning strategy.
In step 1, three common working scenes of the mobile robot are set up in this embodiment: a highway scene, a campus scene, and an indoor scene, as shown in fig. 3. The obstacle and event types contained in different scenes differ greatly, so common obstacles from all scenes are added to each scene to improve the cross-media perception model's ability to perceive the various obstacle types. In total there are the following 12 obstacle classes: pedestrians, automobiles, electric vehicles, road fences, road shoulders, barriers, lamp posts, garbage cans, shrubs, tables, chairs, and bookshelves. Cross-media perception data are acquired in these 3 different working scenes, and the perception model is trained with the continuous learning technique.
In step 2, the cross-media perception model shown in fig. 1 is constructed. It comprises two feature extraction branches and a cross-media self-attention mechanism module; the feature extraction branches are an image feature extraction branch built from an image extractor and a laser point cloud feature extraction branch for the radar data. The input of the cross-media perception model comprises the two data modalities, image and laser point cloud, and after the cross-media self-attention module it outputs intermediate features containing the information of both modalities, completing cross-media information fusion.
In step 3, as shown in fig. 1, in the process of multi-task training and continuous learning with the fused features, to improve the continuous learning capacity of the cross-media perception model, during task training on the n-th task scene a fixed proportion of data is randomly extracted from the previous n−1 scenes as memory replay, avoiding catastrophic forgetting of the model.
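As a minimal Python sketch of this memory-replay scheme (the 20% replay proportion, the dataset layout, and the function name are illustrative assumptions; the disclosure only specifies a fixed proportion randomly extracted from the previous n−1 scenes):

```python
import random

def build_training_set(scene_datasets, n, replay_ratio=0.2):
    """Training data for the n-th scene (1-indexed): all data of scene n,
    plus a fixed proportion randomly extracted from each of the previous
    n-1 scenes as memory replay, to avoid catastrophic forgetting."""
    mixed = list(scene_datasets[n - 1])               # current scene data
    for past in scene_datasets[: n - 1]:              # previous n-1 scenes
        k = int(len(past) * replay_ratio)             # fixed proportion
        mixed.extend(random.sample(list(past), k))    # random extraction
    random.shuffle(mixed)
    return mixed
```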
In this embodiment, the self-attention mechanism is used to fuse the features of the image and the laser point cloud data during the training process. The method comprises the following specific steps:
the image feature extraction branch and the laser point cloud feature extraction branch extract features from the image and the laser point cloud data respectively, producing feature maps;
after average pooling, the feature maps are passed through a fully connected layer, and the feature vectors from the image and the radar are combined by element-wise summation, completing cross-media information fusion and obtaining the intermediate features.
In a specific implementation, this embodiment designs a cross-media self-attention mechanism to fuse global image information with the spatial distribution features of the laser point cloud modality. The input to the cross-media global self-attention mechanism is a sequence of discrete tokens, each representing a feature extracted from a corresponding channel.
Formally, the input sequence is expressed as F_in ∈ R^{N×D_f}, where N is the number of tokens in the sequence and D_f is the feature vector dimension of each sequence element. The cross-media self-attention mechanism uses linear projections to compute a set of queries, keys, and values (Q, K, and V):

Q = F_in W_Q,  K = F_in W_K,  V = F_in W_V    (1)

where W_Q, W_K, W_V are weight matrices. The scaled dot product between Q and K gives the attention weights, which are then used to aggregate the values for each query:

A = softmax( Q K^T / √D_f ) V    (2)
Finally, the cross-media self-attention mechanism uses a multi-layer perceptron (MLP) to compute the output features F_out, which have the same shape as the input features F_in:

F_out = MLP(A) + F_in    (3)
The cross-media self-attention mechanism applies attention multiple times throughout the architecture, giving L attention layers. In the standard cross-media self-attention mechanism, each layer has multiple parallel attention "heads", which generate multiple Q, K, and V values for equation (1) and concatenate the resulting values A of equation (2). Unlike token inputs in natural language processing, this embodiment operates on grid-structured feature maps: the intermediate feature map of each modality is treated as a set rather than a spatial grid, and each element of the set is treated as a token, similar to previous work applying self-attention mechanisms to images. The convolutional feature extractors of the image and point cloud inputs encode different aspects of the scene at different levels, so this embodiment fuses these features at multiple scales of the encoder.
This embodiment obtains feature maps from the feature extractors of the image branch and the laser point cloud branch respectively. These feature maps are average-pooled and then passed through a fully connected layer, and the feature vectors from the image and radar are combined by element-wise summation. The resulting intermediate feature vector forms a compact representation of the environment, encoding the global context of the various task scenes and completing cross-media information fusion.
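A minimal PyTorch sketch of this fusion pipeline is given below; the module structure, feature dimension, head count, and single-layer design are illustrative assumptions rather than details fixed by the disclosure:

```python
import torch
import torch.nn as nn

class CrossMediaFusion(nn.Module):
    """Sketch of the cross-media self-attention fusion of equations (1)-(3):
    the two modal feature maps are flattened into one token set, passed
    through multi-head self-attention plus an MLP residual, then each
    modality is average-pooled, projected, and summed element-wise."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        self.fc_img = nn.Linear(d_model, d_model)
        self.fc_pcd = nn.Linear(d_model, d_model)

    def forward(self, img_feat, pcd_feat):
        # img_feat, pcd_feat: (B, C, H, W) feature maps from the two branches
        img_tok = img_feat.flatten(2).transpose(1, 2)   # (B, N_img, C)
        pcd_tok = pcd_feat.flatten(2).transpose(1, 2)   # (B, N_pcd, C)
        f_in = torch.cat([img_tok, pcd_tok], dim=1)     # joint token set
        a, _ = self.attn(f_in, f_in, f_in)              # eqs (1)-(2)
        f_out = self.mlp(a) + f_in                      # eq (3), residual
        img_out, pcd_out = f_out.split(
            [img_tok.size(1), pcd_tok.size(1)], dim=1)
        # average-pool each modality, project, and fuse by element-wise sum
        return self.fc_img(img_out.mean(dim=1)) + self.fc_pcd(pcd_out.mean(dim=1))
```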
In this embodiment, to improve the feature expression capability of the cross-media perception model, a semantic map segmentation task and a target detection task are added to guide the learning of the feature extractors.
(1) Semantic map segmentation task: a three-channel bird's-eye-view segmentation mask is predicted, covering roads, lane markings, and other categories. This encourages the intermediate features to encode information about drivable and non-drivable regions. The semantic map uses the same coordinate frame as the mobile robot's input and is obtained from the feature map of the radar branch by a convolutional decoder. The semantic map segmentation task is trained with a cross-entropy loss.
(2) Target detection task: this embodiment locates other vehicles in the scene by designing a decoder for keypoint estimation. Specifically, a position map is predicted from the intermediate features using a convolutional decoder. As in the semantic segmentation task, the 2D target labels are rendered with Gaussian kernels placed at each object center in the training dataset. Since the orientation is a single scalar value, regressing it directly is challenging, as observed in existing 3D detectors. This embodiment therefore predicts a regression map containing three regression targets: vehicle size (∈ R²), position offset (∈ R²), and orientation offset (∈ R). The position offset compensates for the quantization error introduced by the lower resolution of the predicted position map, and the orientation offset corrects the orientation discretization error. Note in particular that only locations containing a vehicle center are supervised when predicting the offset and regression maps. The position map, orientation map, and regression map are trained with focal loss, cross-entropy loss, and L1 loss, respectively.
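For illustration, the four supervision signals could be combined as in the following sketch; the loss weights, the simplified focal-loss form, and all tensor layouts are assumptions, as the disclosure only names the focal, cross-entropy, and L1 losses:

```python
import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, seg_gt, pos_map, pos_gt,
                   ori_logits, ori_gt, reg_pred, reg_gt, center_mask,
                   w=(1.0, 1.0, 1.0, 1.0)):
    """Sketch of the auxiliary-task losses: cross-entropy for BEV semantic
    segmentation, a simplified focal loss for the vehicle center (position)
    map, cross-entropy for the discretized orientation map, and L1 loss on
    the regression map, supervised only at cells containing a vehicle
    center. All shapes and the weights w are assumptions."""
    l_seg = F.cross_entropy(seg_logits, seg_gt)
    # focal-style loss on the Gaussian-rendered center heatmap (gamma=2 assumed)
    bce = F.binary_cross_entropy_with_logits(pos_map, pos_gt, reduction="none")
    p_t = torch.exp(-bce)
    l_pos = ((1 - p_t) ** 2 * bce).mean()
    l_ori = F.cross_entropy(ori_logits, ori_gt)
    mask = center_mask.unsqueeze(1).float()           # supervise centers only
    l_reg = (F.l1_loss(reg_pred, reg_gt, reduction="none") * mask).sum() \
            / mask.sum().clamp(min=1)
    return w[0]*l_seg + w[1]*l_pos + w[2]*l_ori + w[3]*l_reg
```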
In step 4, the specific steps of using the intermediate features to train the path planning strategy network are as follows:
after the cross-media perception model has been trained with continuous learning, it has learned spatial characterization capabilities across different scenes; its weights are then frozen, and the path planning strategy network is trained on the intermediate features it extracts.
The path planning strategy network is optimized by reinforcement learning. After the cross-media perception model is trained, its weights are fixed. The mobile robot acquires environmental state observations from the environment and inputs them into the frozen-weight cross-media perception model to obtain the current intermediate features; these are fed to the path planning strategy network, which outputs actions, yielding the path planning strategy.
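As an illustrative sketch of this deployment flow (the model interfaces and names are assumptions, not part of the disclosure):

```python
import torch

def plan_action(perception_model, policy_net, image, point_cloud):
    """Sketch of the deployment step: the frozen cross-media perception
    model turns the raw observation into the intermediate feature, and the
    trained policy network maps it to an action (linear and angular
    velocity)."""
    perception_model.eval()
    for p in perception_model.parameters():
        p.requires_grad_(False)                       # freeze perception weights
    with torch.no_grad():
        feat = perception_model(image, point_cloud)   # intermediate feature
        linear_vel, angular_vel = policy_net(feat)    # action output
    return linear_vel, angular_vel
```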
Reinforcement learning is a goal-oriented paradigm that learns how to accomplish a particular goal through a series of sequential decisions. Reinforcement learning, supervised learning, and unsupervised learning constitute the three machine learning paradigms. Supervised learning relies on labeled sample data; models are induced from the training data to learn the feature distribution of the data. Unsupervised learning learns the intrinsic structural information of unlabeled data directly. Unlike these two paradigms, a reinforcement learning algorithm starts from a blank slate, gains experience by trial and error, and then summarizes rules from that experience; ultimately the algorithm learns which actions to perform in which states to obtain higher returns.
Reinforcement learning can be abstracted as a Markov Decision Process (MDP), a typical mathematical model for sequential decision making. An MDP is generally defined by a five-tuple (S, A, R, P, γ): S is the set of finite states s; A is the set of agent actions a; R is the immediate reward r obtained after the agent executes an action; P is the state transition probability function, giving the probability that executing action a_t takes the agent from the current state s_t to the next state s_{t+1}; and γ is the reward discount factor. The basic setting of an MDP in reinforcement learning is: within a finite horizon of T time steps, the agent interacts with the environment; at each discrete time step t it receives an environment state s_t, selects a corresponding action a_t according to that state, and after executing the action obtains immediate feedback, namely the reward value r_t and the next state s_{t+1}. The goal of the MDP is to find an optimal policy π* that enables the reinforcement learning agent to obtain the maximum expected return, namely:

π* = argmax_π E[ Σ_{t=0}^{T} γ^t r_t ]    (4)
This embodiment uses an Actor-Critic-based proximal policy optimization algorithm. In this algorithm, the Actor is the policy network, responsible for outputting actions, namely the linear velocity linear_vel and the angular velocity angular_vel, as shown in the policy network of stage two in fig. 1; the Critic is a value function network used to evaluate cumulative returns.
In the Actor-Critic framework, the difference between the action-value function and the state-value function output by the Critic network is called the advantage function A, which evaluates how good it is to take a certain action in the current state. Since the Actor-Critic algorithm uses a function approximator and bootstrapping, advantage estimation must balance bias and variance, which is done with the Generalized Advantage Estimator (GAE), expressed as equation (5), where γ is the reward discount factor and λ trades off the influence of bias and variance:

A_t^{GAE(γ,λ)} = Σ_{l=0}^{∞} (γλ)^l δ_{t+l}    (5)
δ_t = r_t + γ V(s_{t+1}) − V(s_t)    (6)
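The advantage computation of equations (5)-(6) can be sketched as follows; the truncation at the rollout length and the default γ and λ values are assumptions:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Sketch of the generalized advantage estimator of equations (5)-(6).
    `values` has one more element than `rewards` (the bootstrap value of
    the final state); gamma and lam trade off bias against variance."""
    advantages, gae = [], 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # eq (6)
        gae = delta + gamma * lam * gae                          # eq (5)
        advantages.insert(0, gae)
    return advantages
```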
The proximal policy optimization (PPO) algorithm used in this embodiment is a classical Actor-Critic algorithm. Like the policy gradient method, the Actor-Critic method is very sensitive to the update step size, and an improper choice easily causes the policy to diverge. PPO proposes a new objective function to solve the policy update problem; the update objective L_actor of the Actor network can be expressed as equation (7), where ε controls the clipping (clip) amplitude:

L_actor = E_t[ min( r_t(θ_π) A_t, clip(r_t(θ_π), 1 − ε, 1 + ε) A_t ) ]    (7)
By clipping the probability ratio

r_t(θ_π) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t)    (8)

the Actor network effectively suppresses the update amplitude and keeps the policy change before and after an update within a certain range. The purpose of the Critic network is to evaluate the state value function V(s), from which the advantage function A is estimated via the TD error of equation (6); its update objective is:
L_critic = ( r + γ V(s_{t+1}; θ_v) − V(s_t; θ_v) )²    (9)
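A compact Python sketch of these update objectives follows; tensor shapes and the value of ε are assumptions, and log-probabilities are taken as precomputed:

```python
import torch

def ppo_losses(new_logp, old_logp, advantages, values, returns, eps=0.2):
    """Sketch of the clipped PPO objectives of equations (7)-(9): the actor
    maximizes the clipped surrogate, and the critic regresses the value
    function onto the bootstrapped returns."""
    ratio = torch.exp(new_logp - old_logp)             # r_t(theta), eq (8)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    actor_loss = -torch.min(surr1, surr2).mean()       # eq (7), negated for minimization
    critic_loss = ((returns - values) ** 2).mean()     # eq (9)
    return actor_loss, critic_loss
```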
Policy-gradient-based and Actor-Critic-based algorithms generally strengthen exploration by adding an auxiliary entropy term to the policy network objective function. The entropy is computed as:
H(x) = −∫ P(x) ln P(x) dx    (10)
In continuous action space tasks, a Gaussian distribution is generally adopted as the output distribution of the continuous actions. The entropy of a univariate Gaussian distribution is:

H = (1/2) ln(2πeσ²)    (11)

where σ is the standard deviation of the Gaussian distribution.
In this embodiment, the policy network outputs the path planning actions, and the value function network evaluates those outputs. The reward guides the policy network to learn towards good path plans. After perception training is finished, the intermediate perception features are fed as input to the policy network, and a reward function is designed to guide its path planning actions, so that the policy network obtains a high cumulative reward from its interaction with the environment. If a penalty for collision is added to the reward, then as reinforcement learning proceeds, the policy network learns to avoid obstacles so that the cumulative reward is not "deducted" by collisions, thereby achieving obstacle avoidance.
In reinforcement learning, the bonus function is a crucial part. The method has the main function of measuring the feedback information of the environment after the mobile robot takes a certain action in a given state. The reward function provides a learning direction for the agent, which can learn which actions may lead to good results (positive rewards) and which actions may lead to bad results (negative rewards) by looking at the feedback provided by the reward function as the agent explores in the environment. Furthermore, by setting the bonus function, the goal of reinforcement learning can be determined. For example, if you want an agent to learn to play a game, you can set a bonus function to reflect success and failure in the game so that the agent learns what may lead to success. Therefore, in designing reinforcement learning algorithms, care must be taken in how to set the appropriate reward function to guide the agent to achieve the desired learning objective. The present embodiment designs the following reward function to guide the training of the path planning strategy model. In the path planning task, it is desirable that the mobile robot can reach the end point along a preset route and avoid collision with various static and dynamic obstacles in the process. The well designed reward function can effectively guide the strategy network to learn the correct path planning action, so that the mobile robot obtains the largest accumulated reward in the process of interacting with the environment, and the action meeting the maximum reward expectation is selected in the process of interacting, thereby realizing the functions of navigation obstacle avoidance and the like.
The overall reward R_total, given in equation (12), comprises four terms: R_towards, R_sidewalk, R_collision, and R_goal. R_towards, defined by equation (13), rewards the mobile robot for advancing towards the preset waypoint (Waypoint, wpt): the cosine similarity between the robot's heading and the direction to the waypoint is multiplied by the robot's speed, and the distance from the mobile robot to the target waypoint then adjusts the reward amplitude. R_sidewalk and R_collision denote the road-deviation penalty and the collision penalty, respectively: a penalty of −10 is generated when the mobile robot deviates from the road direction or collides, and both are 0 otherwise. The goal reward R_goal is obtained when the robot comes within 3 meters of the end point.
R_total = R_towards + R_sidewalk + R_collision + R_goal    (12)
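For illustration, the reward computation could look like the following sketch; the exact scaling of R_towards by the waypoint distance and the magnitude of the goal bonus are assumptions, since equation (13) and the R_goal value are not reproduced here:

```python
import math

def total_reward(heading, wpt_dir, speed, dist_to_wpt,
                 off_road, collided, dist_to_goal):
    """Sketch of the reward R_total of equation (12). `heading` and
    `wpt_dir` are 2D direction vectors; distances are in meters."""
    # R_towards: cosine similarity of heading and waypoint direction times
    # speed, with the amplitude adjusted by the distance to the waypoint
    cos_sim = (heading[0] * wpt_dir[0] + heading[1] * wpt_dir[1]) / (
        (math.hypot(*heading) * math.hypot(*wpt_dir)) or 1.0)
    r_towards = cos_sim * speed / max(dist_to_wpt, 1.0)   # scaling assumed
    r_sidewalk = -10.0 if off_road else 0.0               # road-deviation penalty
    r_collision = -10.0 if collided else 0.0              # collision penalty
    r_goal = 10.0 if dist_to_goal <= 3.0 else 0.0         # bonus value assumed
    return r_towards + r_sidewalk + r_collision + r_goal
```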
Embodiment two:
the second embodiment of the invention provides a robot multi-scene path planning system based on cross-media continuous learning, which comprises:
the scene building module is configured to build task environments of multiple scenes according to task requirements;
the model construction module is configured to acquire images of scenes and laser point cloud data and construct a cross-media perception model;
the continuous learning module is configured to perform multi-task training and continuous learning by using the cross-media perception model to obtain intermediate features, wherein the features of the image and the laser point cloud data are fused by using a self-attention mechanism in the training process;
and the path planning module is configured to train the path planning strategy network by utilizing the intermediate features to obtain a path planning strategy, wherein the path planning strategy network is optimized by utilizing a reinforcement learning mode.
Embodiment III:
the third embodiment of the invention provides a medium, on which a program is stored, which when executed by a processor, implements the steps in the method for planning a multi-scene path of a robot based on cross-media continuous learning according to the first embodiment of the invention.
Embodiment four:
the fourth embodiment of the invention provides a device, which comprises a memory, a processor and a program stored on the memory and capable of running on the processor, wherein the processor realizes the steps in the multi-scene path planning method based on the cross-media continuous learning according to the first embodiment of the invention when executing the program.
The steps involved in the second, third and fourth embodiments correspond to the first embodiment of the method, and the detailed description of the second embodiment refers to the relevant description of the first embodiment.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by general-purpose computer means, alternatively they may be implemented by program code executable by computing means, whereby they may be stored in storage means for execution by computing means, or they may be made into individual integrated circuit modules separately, or a plurality of modules or steps in them may be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. The robot multi-scene path planning method based on cross-media continuous learning is characterized by comprising the following steps of:
constructing a multi-scene task environment according to task requirements;
acquiring images of scenes and laser point cloud data, and constructing a cross-media perception model;
performing multi-task training and continuous learning by using a cross-media perception model to obtain intermediate features, wherein the features of the image and the laser point cloud data are fused by using a self-attention mechanism in the training process;
and training the path planning strategy network by using the intermediate features to obtain a path planning strategy, wherein the path planning strategy network is optimized by using a reinforcement learning mode.
2. The method for planning the multi-scene path of the robot based on the cross-media continuous learning according to claim 1, wherein the cross-media perception model comprises two feature extraction branches and a cross-media self-attention mechanism module, and intermediate features are output after passing through the cross-media self-attention module, and the intermediate features contain information of two data modes, so that cross-media information fusion is completed.
3. The method for planning the multi-scene path of the robot based on the cross-media continuous learning according to claim 1, wherein in the process of multi-task training and continuous learning by utilizing the fused characteristics, in order to improve the continuous learning capacity of a cross-media perception model, in the task training of an nth task scene, data with fixed proportion is randomly extracted from n-1 previous scenes to be used as memory playback so as to avoid catastrophic forgetting of the model.
4. The method for planning a multi-scene path of a robot based on cross-media continuous learning according to claim 1, wherein the specific steps of fusing the features of the image and the laser point cloud data by using a self-attention mechanism are as follows:
respectively carrying out feature extraction on the image and the laser point cloud data by utilizing an image feature extraction branch and a laser point cloud feature extraction branch to obtain a feature map;
the feature map is spliced through a full-connection layer after being subjected to average pooling, so that feature vectors from the image and the radar are combined through element-by-element summation, cross-media information fusion is completed, and intermediate features are obtained.
5. The method for planning a multi-scene path of a robot based on cross-media continuous learning according to claim 1, wherein the multi-task training comprises a semantic map segmentation task and a target detection task.
6. The method for planning a multi-scene path of a robot based on cross-media continuous learning according to claim 1, wherein the specific steps of training a path planning strategy network by using intermediate features are as follows:
after training of the cross-media perception model is completed by using continuous learning, the cross-media perception model learns the spatial characterization capability under different scenes, then freezes the weight of the cross-media perception model, and trains a path planning strategy network by using the intermediate features extracted by the cross-media perception model.
7. The method for planning a path of a plurality of scenes of a robot based on cross-media continuous learning according to claim 6, wherein the mobile robot acquires environmental state observation information from the environment, then inputs the environmental state observation information into a cross-media perception model of frozen weights to obtain current intermediate features, takes the current intermediate features as the input of a trained path planning strategy network, and outputs actions by the path planning strategy network to obtain a path planning strategy.
8. A cross-media continuous learning-based robotic multi-scenario path planning system, comprising:
the scene building module is configured to build task environments of multiple scenes according to task requirements;
the model construction module is configured to acquire images of scenes and laser point cloud data and construct a cross-media perception model;
the continuous learning module is configured to perform multi-task training and continuous learning by using the cross-media perception model to obtain intermediate features, wherein the features of the image and the laser point cloud data are fused by using a self-attention mechanism in the training process;
and the path planning module is configured to train the path planning strategy network by utilizing the intermediate features to obtain a path planning strategy, wherein the path planning strategy network is optimized by utilizing a reinforcement learning mode.
9. A computer readable storage medium, characterized in that a plurality of instructions are stored, which instructions are adapted to be loaded by a processor of a terminal device and to perform the cross-media continuous learning based robotic multi-scenario path planning method of any one of claims 1-7.
10. A terminal device comprising a processor and a computer readable storage medium, the processor configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the cross-media continuous learning based robotic multi-scenario path planning method of any one of claims 1-7.
CN202311371787.8A 2023-10-20 2023-10-20 Cross-media continuous learning-based robot multi-scene path planning method and system Pending CN117631660A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311371787.8A CN117631660A (en) 2023-10-20 2023-10-20 Cross-media continuous learning-based robot multi-scene path planning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311371787.8A CN117631660A (en) 2023-10-20 2023-10-20 Cross-media continuous learning-based robot multi-scene path planning method and system

Publications (1)

Publication Number Publication Date
CN117631660A true CN117631660A (en) 2024-03-01

Family

ID=90034615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311371787.8A Pending CN117631660A (en) 2023-10-20 2023-10-20 Cross-media continuous learning-based robot multi-scene path planning method and system

Country Status (1)

Country Link
CN (1) CN117631660A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination