CN113276883A - Unmanned vehicle driving strategy planning method based on dynamic generation environment and implementation device - Google Patents

Unmanned vehicle driving strategy planning method based on dynamic generation environment and implementation device

Info

Publication number
CN113276883A
CN113276883A
Authority
CN
China
Prior art keywords
vehicle
environment
strategy
road
unmanned vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110464610.7A
Other languages
Chinese (zh)
Other versions
CN113276883B (en)
Inventor
俞扬
詹德川
周志华
史正昕
罗凡明
袁雷
秦熔均
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110464610.7A priority Critical patent/CN113276883B/en
Publication of CN113276883A publication Critical patent/CN113276883A/en
Application granted granted Critical
Publication of CN113276883B publication Critical patent/CN113276883B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0043 Signal treatments, identification of variables or parameters, parameter estimation or state estimation
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2552/00 Input parameters relating to infrastructure
    • B60W2552/50 Barriers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Transportation (AREA)
  • Human Computer Interaction (AREA)
  • Automation & Control Theory (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an unmanned vehicle driving strategy planning method based on a dynamically generated environment, together with an implementation device. (1) An unmanned driving environment is constructed in a simulator. (2) The reinforcement learning parameters and the network strategy model are initialized. (3) The vehicle interacts with the environment: the current state of the unmanned vehicle is collected, the strategy network samples an action, and the action is executed in the simulator, leading to a new state. (4) The cumulative reward of the vehicle in a section of the generated environment and whether the task was completed successfully are collected, and a new section of generated environment is created. (5) For the driving strategy of the vehicle, after a certain number of reinforcement learning samples have been collected by repeating step (3), a strategy iteration is performed. (6) The environments faced by the vehicle are continuously generated as in (4); based on the collected success flags and cumulative rewards, the environment parameters of failed roads and low-cumulative-reward roads are extracted, and those road sections are trained repeatedly. (7) The training of the above steps continues until the strategy converges.

Description

Unmanned vehicle driving strategy planning method based on dynamic generation environment and implementation device
Technical Field
The invention relates to an unmanned vehicle driving strategy planning method based on a dynamically generated environment and an implementation device, and belongs to the technical field of unmanned vehicles.
Background
Unmanned vehicles have been one of the most active research fields in recent years: unmanned driving can change how people travel and improve traffic safety, and therefore has great application prospects. With the help of existing artificial intelligence techniques, it can effectively promote the development of the automobile industry, reduce manpower and material costs, and achieve efficient resource utilization. With the gradual maturation and application of deep learning in recent years, deep reinforcement learning has achieved a degree of success in games, recommendation systems, unmanned driving and other fields. Applied to unmanned driving, reinforcement learning can, to a certain extent, achieve efficient driving strategy training through a suitably designed reward function.
Because trial and error with a real unmanned vehicle is too costly, reinforcement-learning-based algorithms generally build a driving scene in a simulator for sample collection and training, and then migrate the strategy to the real environment. The current mainstream approach to simulator-based vehicle training selects a closed campus and builds a 1:1 model of it in the simulator. Such methods have several limitations. First, 1:1 modeling requires fine manual work on road surfaces, road edges and curves, consuming many repetitive operations. Second, because the environment is modeled 1:1, properties such as road width and style are hard to modify afterwards. Third, once the closed-campus model is built, the unmanned vehicle strategy trained in the simulator is usually suitable only for that selected campus and generalizes poorly: when the previously trained strategy faces similar but altered scenes, such as changed road widths or turning amplitudes, it cannot achieve the expected effect, and if the vehicle strategy is to be applied to a new campus, the simulator scene must be rebuilt and the strategy model retrained from scratch.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the defects of the prior art, namely that the environment scenes used for simulator training are single and lack diversity, and that driving strategies trained in traditional simulators generalize poorly, the invention provides an unmanned vehicle driving strategy planning method based on a dynamically generated environment, together with an implementation device, which achieves efficient training from the simulator to the real environment by dynamically adjusting the environment parameters inside the simulator.
The technical scheme is as follows: a method for planning the driving strategy of an unmanned vehicle based on a dynamically generated environment, which achieves efficient training from the simulator to the real environment by dynamically adjusting environment parameters inside the simulator, comprises the following steps:
(1) Construct the unmanned driving environment in the UE4 simulator using a parameter-based approach.
(2) Initialize the reinforcement learning parameters and the network strategy model, and begin training the unmanned vehicle; through a designed reward function, the vehicle is trained to reach a specified target point safely and efficiently.
(3) Interact with the environment: collect the current state of the unmanned vehicle in the dynamically generated environment, let the strategy network sample an action, execute the sampled action in the simulator, and enter a new state.
(4) One task of the unmanned vehicle in the dynamically generated environment begins when it enters a road section and ends when it collides or reaches the target point of that section. After a task ends, collect the cumulative reward of the unmanned vehicle in that section of the generated environment and whether it completed the task successfully, and create a new section of generated environment with a new set of parameters.
(5) For the driving strategy of the unmanned vehicle, after repeating step (3) until a certain number of reinforcement learning samples have been collected, perform a strategy iteration.
(6) Continue generating the environments faced by the unmanned vehicle as in step (4); based on the collected success flags and cumulative rewards, extract the environment parameters of failed roads and low-cumulative-reward roads and train those sections repeatedly, so that the unmanned vehicle receives more targeted training.
(7) Continue the training of steps (3) to (6) until the strategy converges, yielding a trained strategy model.
In step (1), the environment parameters include road parameters for modeling a section of road. A second-order Bezier curve serves as the road main line, which is expanded into a drivable road model for the unmanned vehicle using a mesh method; the shape of the road, the size of its corners and the road width are determined by controlling the environment parameters. The environment parameters also control dynamic and static obstacles, specifically the placement positions of static obstacles and the movement patterns of dynamic obstacles, so that the static obstacles and moving pedestrians of real scenes can be simulated, together yielding a rich training environment.
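As a concrete illustration, the following is a minimal sketch of what such an environment parameter set could look like; all names and value ranges (RoadSectionParams, road_width, and so on) are illustrative assumptions, not the patent's actual data layout.

```python
import random
from dataclasses import dataclass, field

@dataclass
class RoadSectionParams:
    """Hypothetical parameter set for one generated road section."""
    d1: float                # distance from P0 to P1 along the previous tangent
    d2: float                # distance from P1 to P2
    theta: float             # turning angle of the new section, in radians
    road_width: float        # lateral width of the drivable surface
    static_obstacles: list = field(default_factory=list)   # (curve_t, lateral_offset) pairs
    dynamic_obstacles: list = field(default_factory=list)  # (start_t, end_t, speed) triples

def sample_section_params():
    """Randomly sample one section; the randomness supplies the scene diversity."""
    return RoadSectionParams(
        d1=random.uniform(5.0, 20.0),
        d2=random.uniform(5.0, 20.0),
        theta=random.uniform(-1.0, 1.0),
        road_width=random.uniform(4.0, 10.0),
        static_obstacles=[(random.random(), random.uniform(-1.5, 1.5))
                          for _ in range(random.randint(0, 3))],
        dynamic_obstacles=[(random.random(), random.random(), random.uniform(0.5, 2.0))
                           for _ in range(random.randint(0, 2))],
    )
```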
Splicing roads for the unmanned vehicle to drive on: the parameter-generated roads are combined with special road scenes, whose characteristic is that they connect seamlessly to the parameter-generated roads, together forming a continuous dynamically generated environment in which the unmanned vehicle trains. The special road scenes include challenging roads such as narrow roads, crossroads and T-junctions; this combined approach makes the training samples richer, so that compared with traditional methods the strategy copes better with complex environments.
The simulator approximately simulates a Markov decision process in quadruple form (O, A, P, R), provides observation information consistent with the unmanned vehicle in the real scene, and accepts throttle, steering and other commands consistent with the real unmanned vehicle; sampling is performed in the Markov process provided by the simulator to train the unmanned vehicle. O, A, P and R denote vehicle state information, vehicle actions, state transition probabilities and rewards, respectively. The state includes the radar information received by the unmanned vehicle; its current speed, throttle and steering-angle state; and information describing the preview points of the road ahead. A vehicle action consists of a throttle command and a steering-angle command. The reward is a combination of several reward functions: a progressive positive reward for advancing toward the target point, a negative reward for colliding with an obstacle, and a positive reward for completing the task.
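To make the composition of the reward concrete, a minimal sketch follows; the weights and magnitudes below are assumptions for illustration, not values disclosed by the patent.

```python
def compute_reward(progress_gain, collided, reached_goal,
                   w_progress=1.0, collision_penalty=-100.0, goal_bonus=100.0):
    """Combine the three reward terms described above.

    progress_gain: reduction in distance to the section target since the last step
    collided:      whether the vehicle hit an obstacle or wall this step
    reached_goal:  whether the section's target point was reached this step
    All weights are illustrative assumptions.
    """
    reward = w_progress * progress_gain   # progressive positive reward toward the target
    if collided:
        reward += collision_penalty       # negative reward for collision
    if reached_goal:
        reward += goal_bonus              # positive reward for completing the task
    return reward
```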
A neural network is used as the strategy model for driving the unmanned vehicle. Based on observation information collected in the simulator, the strategy outputs the action to be executed, controlling the unmanned vehicle in the simulator, while the simulator environment returns a reward. Samples are collected during this process, and the current strategy is optimized with the proximal policy optimization algorithm PPO until it converges.
The radar information refers to the distances between the vehicle body and surrounding obstacles, represented by radar rays cast around the vehicle body. The preview-point information of the road ahead is the waypoint information planned ahead of the moving vehicle; it informs the vehicle of its future direction of travel and is especially valuable at crossroads and T-junctions, or when a static obstacle masks the shape of the road.
The trained strategy model can be migrated to a real scene: a campus for the unmanned vehicle to drive in is selected, the start and end points of the vehicle are chosen, and obstacles blocking the vehicle may be placed in the scene. The driving strategy trained in the simulator is then invoked to control the vehicle, realizing automatic driving of the vehicle.
An apparatus for implementing a dynamically generated environment based unmanned vehicle driving strategy planning method, comprising:
(1) radar mounted on unmanned vehicle body: for obtaining obstacle distance information in the vicinity of the unmanned vehicle.
(2) CAN equipment: the system is used for transmitting the current state of the vehicle to the trained strategy model and sending the control information given by the strategy model to the vehicle chassis.
(3) Differential GPS: the method is used for acquiring the current longitude and latitude position of the vehicle.
(4) A memory: for storing the trained strategy model.
(5) A processor: used for computing the control information with the strategy model after receiving the observation information, and for sending the control information out through the CAN equipment.
Compared with the prior art, the invention has the following advantages:
1) Compared with traditional simulator-based methods, the unmanned vehicle driving strategy trained in the dynamically generated environment achieves a more stable effect when facing new scenes, and by introducing dynamic and static obstacles the vehicle strategy acquires the ability to avoid obstacles and pedestrians.
2) Dynamically modeling roads from a second-order Bezier curve main line requires far less repetitive manual work than 1:1 modeling, makes the road easy to control through environment parameters, and keeps multi-section generated roads continuous and smooth. Connecting components such as crossroads and T-junctions are added at the joints; this processing brings the dynamically generated environment in the simulator closer to real roads and reduces the gap between the real environment and the simulator environment.
3) When roads are modeled through environment parameters, the nature of reinforcement learning makes it possible, after the unmanned vehicle finishes training on a road section, to record the cumulative reward of that section and whether the task succeeded. Combining the parameter description of the road with the vehicle's performance on it, scenes where the vehicle performs poorly can be trained repeatedly, improving the robustness of the strategy.
Drawings
FIG. 1 is a schematic diagram of a method for modeling a section of road based on a second-order Bezier curve according to the present invention;
FIG. 2 is a schematic diagram of obstacle avoidance in a dynamically generated environment according to the present invention;
FIG. 3 is a schematic illustration of training an unmanned vehicle driving strategy in a dynamically generated environment;
FIG. 4 is a schematic diagram of a hardware facility in a real scene according to the present invention;
FIG. 5 is a flow chart of policy usage in the migration to a real scenario of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
An unmanned vehicle driving strategy planning method based on a dynamically generated environment comprises the following steps:
Step one:
A training environment is constructed in the simulator with code. First, road parameters are used to construct a road section, as shown in fig. 1(a). To keep P_0 of the new curve section coincident with P'_2 of the previous section, and to keep the curve smooth, the direction P_0P_1 of the new section is set equal to the direction P'_1P'_2 of the previous section. The points P_1 and P_2 of the new section are then determined from the distances d1, d2 and the angle θ. From the three points P_0, P_1, P_2 the new curve is obtained via the second-order Bezier curve function B(t):

B(t) = (1 - t)^2 P_0 + 2t(1 - t) P_1 + t^2 P_2,  t ∈ [0, 1]

By sampling t, a set of intermediate points on the curve is obtained; these are used in subsequent steps to build the road model from the curve, and serve as preview-point information representing the vehicle's future driving plan.
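A minimal sketch of how the curve sampling and the smooth splicing described above could be computed (Python with numpy; the function names and the 2D setting are illustrative assumptions):

```python
import numpy as np

def bezier_points(p0, p1, p2, n=50):
    """Sample n intermediate points of the second-order Bezier curve
    B(t) = (1-t)^2 P0 + 2t(1-t) P1 + t^2 P2, t in [0, 1]."""
    p0, p1, p2 = map(np.asarray, (p0, p1, p2))
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - t) ** 2 * p0 + 2 * t * (1 - t) * p1 + t ** 2 * p2

def next_control_points(prev_p1, prev_p2, d1, d2, theta):
    """Place P0, P1, P2 of the next section so the spliced curve stays smooth:
    P0 coincides with the previous P'2 and P0->P1 continues the direction
    P'1->P'2; P2 is then offset by angle theta (2D points assumed)."""
    direction = (prev_p2 - prev_p1) / np.linalg.norm(prev_p2 - prev_p1)
    p0 = prev_p2
    p1 = p0 + d1 * direction
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    p2 = p1 + d2 * (rot @ direction)
    return p0, p1, p2
```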
Step two:
After a set of second-order Bezier curve intermediate points has been obtained with the above method, the intermediate points are expanded into a road using a mesh component (as shown in fig. 1(b)). During expansion, finite differences of consecutive intermediate points give the forward direction at each point. Direction vectors perpendicular to this forward direction and lying in the horizontal plane are then computed; for each intermediate point, moving along this direction vector by half the road width in the positive and negative directions yields the left and right boundary points of the road, which determine its shape. These boundary points form the vertices of the mesh and are connected in order to form its triangular faces (as shown in fig. 1(b)). In the UE4 simulator, after the road plane has been drawn this way, a collision body is added to make it a drivable surface, and walls are generated on both sides of the road by a similar method; the road surface and the walls together form the body of a road section.
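The boundary-point construction described above could be sketched as follows; this illustrates the technique under a horizontal 2D assumption and is not the UE4 implementation itself:

```python
import numpy as np

def road_boundaries(midpoints, road_width):
    """Expand curve midpoints into left/right road boundary points.

    Forward directions come from finite differences of consecutive midpoints;
    each boundary point is the midpoint shifted +/- half the road width along
    the in-plane perpendicular. Consecutive boundary pairs then form the
    triangles of the mesh, as in fig. 1(b).
    """
    midpoints = np.asarray(midpoints, dtype=float)              # shape (n, 2)
    forward = np.gradient(midpoints, axis=0)                    # finite differences
    forward /= np.linalg.norm(forward, axis=1, keepdims=True)
    normal = np.stack([-forward[:, 1], forward[:, 0]], axis=1)  # perpendicular
    left = midpoints + 0.5 * road_width * normal
    right = midpoints - 0.5 * road_width * normal
    return left, right
```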
Step three:
After a road body has been obtained, the difficulty of the road is increased by adding dynamic and static obstacles, whose generation is also controlled by parameters. Specifically: (1) the placement of a static obstacle is controlled by the position of a Bezier-curve intermediate point and an offset from that point; (2) the movement of a dynamic obstacle is rule-driven: it shuttles back and forth at a certain speed between a start point and a target point, and carries its own radar rays so that it can stop or accelerate when it sees a vehicle, thereby simulating pedestrian behavior in real situations. The start position, target position and movement speed of a dynamic obstacle are all determined by parameters. The generated environment after adding obstacles is shown in fig. 2.
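A minimal sketch of the rule-driven dynamic obstacle described above; the see-vehicle test is abstracted into a boolean argument, and the class interface is an assumption for illustration:

```python
import numpy as np

class DynamicObstacle:
    """Moving obstacle shuttling between a start and a target point.

    (start, target, speed) are the generation parameters mentioned above.
    `vehicle_visible` stands in for the obstacle's own radar rays; here the
    obstacle simply stops when it sees a vehicle, one of the two reactions
    (stop or accelerate) described in the text.
    """
    def __init__(self, start, target, speed):
        self.start = np.asarray(start, dtype=float)
        self.target = np.asarray(target, dtype=float)
        self.speed = speed
        self.pos = self.start.copy()
        self.heading = 1.0   # +1 toward target, -1 back toward start

    def step(self, dt, vehicle_visible):
        if vehicle_visible:              # pedestrian-like: stop for the car
            return self.pos
        goal = self.target if self.heading > 0 else self.start
        delta = goal - self.pos
        dist = np.linalg.norm(delta)
        if dist < self.speed * dt:       # reached one end: turn around
            self.pos = goal.copy()
            self.heading = -self.heading
        else:
            self.pos = self.pos + (self.speed * dt / dist) * delta
        return self.pos
```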
The second-order Bezier road generation method smoothly connects the roads of two consecutive generated environments, so that the unmanned vehicle can drive continuously through multiple sections of the generated environment.
Step four:
The road scenes composing the unmanned dynamically generated environment are not limited to roads formed from second-order Bezier curves. As shown in the left half of fig. 3, various roads such as crossroads, T-junctions and straight roads can also be built with the mesh method during road generation, giving the unmanned vehicle richer and more diverse training. In addition, exploiting the nature of reinforcement learning, from the moment the vehicle starts driving on a road section until it collides or completes the section, the cumulative reward of that section and whether the driving task was completed successfully are recorded. These records feed two parameter sets: a failure set storing the parameters of failed roads, and a low-cumulative-reward set maintained as a max-heap, holding the batch of successful road parameter sets with the lowest cumulative rewards. When parameters for a new generated environment are selected, a parameter set is drawn from these two sets with a set probability. In this way, targeted training can be carried out on the scenes where the vehicle is currently weakest, as in the sketch below.
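The bookkeeping of the two parameter sets and the probabilistic replay could look like the following sketch; the capacity and replay probability are assumed values, and the class name is illustrative:

```python
import heapq
import random

class HardCaseBuffer:
    """Failure set plus a max-heap of the k lowest-reward successful roads.

    Python's heapq is a min-heap, so rewards are stored negated: popping the
    smallest negated value evicts the HIGHEST-reward entry, keeping the lowest.
    """
    def __init__(self, capacity=64, replay_prob=0.3):
        self.failed = []        # parameter sets of failed road sections
        self.low_reward = []    # heap of (-cum_reward, tie, params)
        self.capacity = capacity
        self.replay_prob = replay_prob
        self._count = 0         # tie-breaker so heapq never compares params

    def record(self, params, cum_reward, success):
        """Called once per finished road section."""
        if not success:
            self.failed.append(params)
            return
        self._count += 1
        heapq.heappush(self.low_reward, (-cum_reward, self._count, params))
        if len(self.low_reward) > self.capacity:
            heapq.heappop(self.low_reward)   # evict the highest-reward entry

    def next_params(self, fresh_sampler):
        """With probability replay_prob replay a hard case, else sample fresh."""
        pool = self.failed + [p for _, _, p in self.low_reward]
        if pool and random.random() < self.replay_prob:
            return random.choice(pool)
        return fresh_sampler()
```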
Step five:
The method of training the unmanned vehicle in the dynamically generated environment is shown in the right half of fig. 3. The invention optimizes the driving strategy of the unmanned vehicle with the proximal policy optimization algorithm PPO (Proximal Policy Optimization). In the network structure, a combination of fully connected layers and relu activation functions is used, and an lstm layer is added to incorporate temporal information, allowing the vehicle to perform the obstacle-avoidance task better.
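A sketch of such a network in PyTorch, under the assumption of a Gaussian action head for throttle and steering; the layer sizes are illustrative, not taken from the patent:

```python
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    """Actor-critic network matching the structure described above: fully
    connected layers with ReLU, plus an LSTM layer for temporal context."""

    def __init__(self, obs_dim, act_dim=2, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, act_dim)       # throttle and steering means
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.value = nn.Linear(hidden, 1)          # critic head V(s)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim)
        h, state = self.lstm(self.encoder(obs_seq), state)
        dist = torch.distributions.Normal(self.mu(h), self.log_std.exp())
        return dist, self.value(h).squeeze(-1), state
```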
Step six:
The unmanned vehicle interacts with the environment generated as in steps three and four, which can be defined as a quadruple (O, A, P, R) of a Markov decision process (MDP). In the UE4 simulator, observation information O is collected once every 100 ms; it includes the radar information simulated in the simulator, vehicle state information such as the current speed, and preview-point information. The preview points are obtained and used as follows: with the Bezier-curve intermediate points from step one and the current position of the vehicle, a set of forward preview points is computed, and their positions relative to the vehicle body are used as the preview-point information. This information indicates the direction the vehicle should take next and guides it to drive better; in particular at crossroads, the preview points indicate which branch of the crossroads the vehicle should take.
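The preview-point computation could be sketched as follows, assuming 2D positions and a known vehicle yaw; the nearest-curve-point selection is an illustrative simplification:

```python
import numpy as np

def preview_points(curve_points, vehicle_pos, vehicle_yaw, n_ahead=5):
    """Pick the next n_ahead curve midpoints in front of the vehicle and
    express them in the vehicle's body frame, as described above."""
    curve_points = np.asarray(curve_points, dtype=float)
    vehicle_pos = np.asarray(vehicle_pos, dtype=float)
    nearest = int(np.argmin(np.linalg.norm(curve_points - vehicle_pos, axis=1)))
    ahead = curve_points[nearest + 1: nearest + 1 + n_ahead]
    c, s = np.cos(-vehicle_yaw), np.sin(-vehicle_yaw)
    rot = np.array([[c, -s], [s, c]])      # world -> body rotation
    return (ahead - vehicle_pos) @ rot.T   # positions relative to the body
```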
Step seven:
The observation information O is passed to the neural network; the policy network samples an action A to be executed, consisting of the throttle and steering commands of the unmanned vehicle, and the action is passed to the simulator for execution, yielding a reward R.
Step eight:
Step six is executed repeatedly, and the observation, action and reward information collected during interaction is stored as samples; when enough samples have been gathered, the strategy is updated. The optimization objective used in the proximal policy optimization algorithm is:

L(θ) = E_t[ min( r_t(θ) Â_t, clip(r_t(θ), 1 - ε, 1 + ε) Â_t ) ]

where θ is the parameter to be optimized in the algorithm and Â_t is the advantage function at time t, i.e. Q(s_t, a_t) - V(s_t). Q(s_t, a_t), called the state-action value function, represents the cumulative reward the vehicle accrues when it starts from state s_t, performs action a_t, and then follows the current strategy; V(s_t), called the state value function, represents the cumulative reward the vehicle accrues when it starts from state s_t and follows the current strategy. Here Q(s_t, a_t) is taken as the discounted cumulative reward from the current state to the end of a trajectory, and V(s_t) is computed by the critic network, so Â_t can be written as:

Â_t = r_t + γ r_{t+1} + γ^2 r_{t+2} + ... + γ^{T-t} r_T - V(s_t)

where γ is the discount factor, set to γ = 0.99 in this experiment. To compare the current strategy π_θ(a_t|s_t) with the old strategy π_θold(a_t|s_t), the clipping idea of PPO2 is used: the probability ratio r_t(θ) = π_θ(a_t|s_t) / π_θold(a_t|s_t) replaces the KL penalty of PPO1, and its value is clipped to lie between 1 - ε and 1 + ε, where ε is a hyper-parameter that determines the clipping range and prevents the parameters from updating too fast.
In this way, samples are acquired and network parameters updated repeatedly, and training ends when the maximum number of iterations set for training is reached.
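The clipped objective above corresponds to the following loss sketch (negated so that minimizing it maximizes the objective); eps plays the role of ε:

```python
import torch

def ppo_clip_loss(new_log_prob, old_log_prob, advantage, eps=0.2):
    """Clipped surrogate objective of PPO (the 'PPO2' clipping described above).

    new_log_prob / old_log_prob: log pi_theta(a_t|s_t) and log pi_theta_old(a_t|s_t)
    advantage: the advantage estimate A_t = Q(s_t, a_t) - V(s_t)
    eps: the clipping hyper-parameter. Returns a loss to minimize.
    """
    ratio = torch.exp(new_log_prob - old_log_prob)      # r_t(theta)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)  # clip to [1-eps, 1+eps]
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```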
Step nine:
After the unmanned vehicle driving strategy trained in the dynamically generated environment has been obtained, the strategy can be migrated to the real environment.
The steps for migrating the unmanned vehicle strategy to the real environment are shown in fig. 5:
step 01: and selecting a loaded and trained strategy model.
Step 02: and (4) with the help of map information, after a starting point and a target point are selected, planning a path by using an A-star algorithm, simulating the middle point of the Bezier curve mentioned in the step one by using a planned path point, and calculating the information of the pre-aiming point by using the method in the step six.
Step 03: as shown in fig. 5, vehicle state information such as radar data and vehicle speed and the preview point information are combined and transmitted to the policy model to obtain an action command.
Step 04: after the unmanned vehicle executes the action command in Step03 for a while, next state information is acquired.
Step 05: and repeatedly executing Step04-Step05 until the unmanned vehicle finishes the driving task.
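Steps 01-05 amount to the following control loop; `planner`, `radar`, `can_bus` and `policy_model.act` are hypothetical interfaces standing in for the A* planner, the on-board radar, the CAN device and the trained strategy model, so only the loop structure follows the text:

```python
import time
import numpy as np

def drive(policy_model, planner, radar, can_bus, dt=0.1):
    """Real-environment control loop covering Steps 01-05 (interfaces assumed)."""
    lstm_state = None
    while not planner.task_finished():
        # Step 03: combine radar data, vehicle state and preview points
        obs = np.concatenate([
            np.ravel(radar.scan()),                  # obstacle distances
            np.ravel(can_bus.vehicle_state()),       # speed, throttle, steering angle
            np.ravel(planner.preview(can_bus.position(), can_bus.yaw())),
        ]).astype(np.float32)
        action, lstm_state = policy_model.act(obs, lstm_state)
        # Step 04: execute the command on the chassis, then wait one interval
        can_bus.send(throttle=float(action[0]), steer=float(action[1]))
        time.sleep(dt)
```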
As shown in fig. 4, the apparatus for implementing the unmanned vehicle driving strategy planning method based on the dynamically generated environment includes:
(1) radar mounted on unmanned vehicle body: for obtaining obstacle distance information in the vicinity of the unmanned vehicle.
(2) CAN equipment: the system is used for transmitting the current state of the vehicle to the trained strategy model and sending the control information given by the strategy model to the vehicle chassis.
(3) Differential GPS: used for acquiring the current longitude and latitude of the vehicle; together with the radar information and the collected vehicle body state, this forms the observation information sent to the processor.
(4) A memory: for storing the trained strategy model.
(5) A processor: used for computing the control information with the strategy model after receiving the observation information, and for sending the control information out through the CAN equipment.
The method parameterizes the road and uses parameters to introduce the dynamic and static obstacles that block the vehicle's progress in the road scene, giving the vehicle obstacle-avoidance capability. Since the generated environment formed jointly by road and obstacles is fully determined by its parameters, the vehicle can be trained many times on the parameter combinations where it performs poorly, making the training more targeted. The combined connection of multiple road sections allows special sections such as crossroads, narrow roads and T-junctions to be added during splicing while keeping the road coherent, so the vehicle can learn additional special scenes as desired; this splicing mode also accommodates new road scenes, further enriching the training scenarios. The vehicle strategy model trained in the generated environment generalizes strongly: it can be applied directly to a closed campus environment built with UE4, and the strategy can further be migrated to a vehicle in the real environment.

Claims (10)

1. A method for planning the driving strategy of an unmanned vehicle based on a dynamically generated environment, characterized in that efficient training from the simulator to the real environment is achieved by dynamically adjusting environment parameters inside the simulator, the method comprising the following steps:
(1) constructing an unmanned driving environment in a simulator using a parameter-based approach;
(2) initializing reinforcement learning parameters and a network strategy model, starting to train the unmanned vehicle, and training the vehicle through a designed reward function to reach a specified target point;
(3) interacting with the environment: collecting the current state of the unmanned vehicle in the dynamically generated environment, the strategy network sampling an action, executing the sampled action in the simulator and entering a new state;
(4) a task of the unmanned vehicle in the dynamically generated environment beginning when the vehicle enters a road section and ending when it collides or reaches the target point of the section, and a new section of generated environment being established after the task ends;
(5) for the driving strategy of the unmanned vehicle, after repeating the operation of step (3) and collecting a preset number of reinforcement learning samples, performing a strategy iteration;
(6) for the environments faced by the unmanned vehicle, training a plurality of times on set parameter combinations;
(7) continuing the training of the above steps until the strategy converges, obtaining a trained strategy model.
2. The unmanned vehicle driving strategy planning method based on a dynamically generated environment according to claim 1, characterized in that in step (4), a task of the unmanned vehicle in the dynamically generated environment begins when it enters a road section and ends when it collides or reaches the target point of the section; after the task ends, the cumulative reward of the unmanned vehicle in that section of the generated environment and whether it completed the task successfully are collected, and a new section of generated environment is established with a new set of parameters.
3. The unmanned vehicle driving strategy planning method based on a dynamically generated environment according to claim 2, characterized in that in step (6), for the environments faced by the unmanned vehicle, environments are continuously generated in the manner of step (4); based on the collected success flags and cumulative rewards, the environment parameters of failed roads and low-cumulative-reward roads are extracted, and the unmanned vehicle is trained repeatedly on those road sections, so that it receives more targeted training.
4. The unmanned vehicle driving strategy planning method based on a dynamically generated environment according to claim 1, characterized in that in step (1) the environment parameters include road parameters for modeling a section of road; a second-order Bezier curve is used as the road main line, which is expanded into a drivable road model for the unmanned vehicle using a mesh method; the shape of the road, the size of its corners and the road width are determined by controlling the environment parameters; the environment parameters also control dynamic and static obstacles, in particular the placement positions of static obstacles and the movement patterns of dynamic obstacles, so as to simulate the static obstacles and moving pedestrians of real scenes.
5. The unmanned vehicle driving strategy planning method based on a dynamically generated environment according to claim 4, characterized in that roads for the unmanned vehicle to drive on are spliced: the parameter-generated roads are combined with special road scenes to jointly form a continuous dynamically generated environment for training the unmanned vehicle; the special road scenes include crossroads, T-junctions and narrow roads.
6. The unmanned vehicle driving strategy planning method based on a dynamically generated environment according to claim 1, characterized in that the simulator is configured to approximately simulate a Markov decision process in quadruple form (O, A, P, R), to provide observation information consistent with the unmanned vehicle in the real scene, and to accept throttle and steering commands consistent with the real unmanned vehicle; sampling is performed in the Markov process provided by the simulator to train the unmanned vehicle; O, A, P, R denote vehicle state information, vehicle actions, state transition probabilities and rewards, respectively; the state includes the radar information received by the unmanned vehicle, the current speed, throttle and steering-angle state of the unmanned vehicle, and information describing the preview points of the road ahead; a vehicle action comprises a throttle command and a steering-angle command; the reward is a combination of a plurality of reward functions, including a progressive positive reward for advancing toward the target point, a negative reward for colliding with an obstacle, and a positive reward for completing the task.
7. The unmanned vehicle driving strategy planning method based on a dynamically generated environment according to claim 1, characterized in that a neural network is used as the strategy model for driving the unmanned vehicle; according to observation information collected in the simulator, the strategy outputs the action to be executed to control the unmanned vehicle in the simulator, while the simulator environment returns a reward; samples are collected during this process, and the current strategy is optimized with the proximal policy optimization algorithm PPO until the strategy converges.
8. The unmanned vehicle driving strategy planning method based on a dynamically generated environment according to claim 6, characterized in that the radar information refers to the distances between the vehicle body and surrounding obstacles, represented by radar rays cast around the vehicle body; the preview-point information of the road ahead is the waypoint information planned ahead of the moving vehicle, which informs the vehicle of its future direction of travel.
9. The unmanned vehicle driving strategy planning method based on a dynamically generated environment according to claim 1, characterized in that the trained strategy model is migrated to a real scene: a campus for the unmanned vehicle to drive in is selected, the start and end points of the vehicle are chosen, and obstacles blocking the vehicle may be placed in the scene; the driving strategy trained in the simulator is invoked to control the vehicle, realizing automatic driving of the vehicle.
10. An apparatus for implementing a dynamically generated environment based unmanned vehicle driving strategy planning method, comprising:
(1) radar mounted on the unmanned vehicle body: for obtaining obstacle distance information in the vicinity of the unmanned vehicle;
(2) CAN equipment: the system is used for transmitting the current state of the vehicle to the trained strategy model and sending the control information given by the strategy model to the vehicle chassis;
(3) differential GPS: the system is used for acquiring the current longitude and latitude position of the vehicle;
(4) a memory: the strategy model is used for storing the trained strategy model;
(5) a processor: for computing the control information with the strategy model after receiving the observation information, and for sending the control information out through the CAN equipment.
CN202110464610.7A 2021-04-28 2021-04-28 Unmanned vehicle driving strategy planning method and implementation device based on dynamic generation environment Active CN113276883B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110464610.7A CN113276883B (en) 2021-04-28 2021-04-28 Unmanned vehicle driving strategy planning method and implementation device based on dynamic generation environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110464610.7A CN113276883B (en) 2021-04-28 2021-04-28 Unmanned vehicle driving strategy planning method and implementation device based on dynamic generation environment

Publications (2)

Publication Number Publication Date
CN113276883A true CN113276883A (en) 2021-08-20
CN113276883B CN113276883B (en) 2023-04-21

Family

ID=77277555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110464610.7A Active CN113276883B (en) 2021-04-28 2021-04-28 Unmanned vehicle driving strategy planning method and implementation device based on dynamic generation environment

Country Status (1)

Country Link
CN (1) CN113276883B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114117829A (en) * 2022-01-24 2022-03-01 清华大学 Dynamic modeling method and system for man-vehicle-road closed loop system under limit working condition
CN114491739A (en) * 2021-12-30 2022-05-13 深圳市优必选科技股份有限公司 Construction method and device of road traffic system, terminal equipment and storage medium
CN114596553A (en) * 2022-03-11 2022-06-07 阿波罗智能技术(北京)有限公司 Model training method, trajectory prediction method and device and automatic driving vehicle
CN114770497A (en) * 2022-03-31 2022-07-22 中国人民解放军陆军工程大学 Search and rescue method and device of search and rescue robot and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647839A (en) * 2019-09-18 2020-01-03 深圳信息职业技术学院 Method and device for generating automatic driving strategy and computer readable storage medium
CN110745136A (en) * 2019-09-20 2020-02-04 中国科学技术大学 Driving self-adaptive control method
CN111026127A (en) * 2019-12-27 2020-04-17 南京大学 Automatic driving decision method and system based on partially observable transfer reinforcement learning
CN111473794A (en) * 2020-04-01 2020-07-31 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111483468A (en) * 2020-04-24 2020-08-04 广州大学 Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning
US20200372822A1 (en) * 2019-01-14 2020-11-26 Polixir Technologies Limited Training system for autonomous driving control policy
US20200372410A1 (en) * 2019-05-23 2020-11-26 Uber Technologies, Inc. Model based reinforcement learning based on generalized hidden parameter markov decision processes

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200372822A1 (en) * 2019-01-14 2020-11-26 Polixir Technologies Limited Training system for autonomous driving control policy
US20200372410A1 (en) * 2019-05-23 2020-11-26 Uber Technologies, Inc. Model based reinforcement learning based on generalized hidden parameter markov decision processes
CN110647839A (en) * 2019-09-18 2020-01-03 深圳信息职业技术学院 Method and device for generating automatic driving strategy and computer readable storage medium
CN110745136A (en) * 2019-09-20 2020-02-04 中国科学技术大学 Driving self-adaptive control method
CN111026127A (en) * 2019-12-27 2020-04-17 南京大学 Automatic driving decision method and system based on partially observable transfer reinforcement learning
CN111473794A (en) * 2020-04-01 2020-07-31 北京理工大学 Structural road unmanned decision planning method based on reinforcement learning
CN111483468A (en) * 2020-04-24 2020-08-04 广州大学 Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491739A (en) * 2021-12-30 2022-05-13 深圳市优必选科技股份有限公司 Construction method and device of road traffic system, terminal equipment and storage medium
CN114117829A (en) * 2022-01-24 2022-03-01 清华大学 Dynamic modeling method and system for man-vehicle-road closed loop system under limit working condition
CN114117829B (en) * 2022-01-24 2022-04-22 清华大学 Dynamic modeling method and system for man-vehicle-road closed loop system under limit working condition
CN114596553A (en) * 2022-03-11 2022-06-07 阿波罗智能技术(北京)有限公司 Model training method, trajectory prediction method and device and automatic driving vehicle
CN114596553B (en) * 2022-03-11 2023-01-24 阿波罗智能技术(北京)有限公司 Model training method, trajectory prediction method and device and automatic driving vehicle
CN114770497A (en) * 2022-03-31 2022-07-22 中国人民解放军陆军工程大学 Search and rescue method and device of search and rescue robot and storage medium
CN114770497B (en) * 2022-03-31 2024-02-02 中国人民解放军陆军工程大学 Search and rescue method and device of search and rescue robot and storage medium

Also Published As

Publication number Publication date
CN113276883B (en) 2023-04-21

Similar Documents

Publication Publication Date Title
CN113276883A (en) Unmanned vehicle driving strategy planning method based on dynamic generation environment and implementation device
WO2022052406A1 (en) Automatic driving training method, apparatus and device, and medium
CN111506058B (en) Method and device for planning a short-term path for autopilot by means of information fusion
CN112882469B (en) Deep reinforcement learning obstacle avoidance navigation method integrating global training
CN112888612A (en) Autonomous vehicle planning
CN113405558B (en) Automatic driving map construction method and related device
CN114846425A (en) Prediction and planning of mobile robots
CN114358128A (en) Method for training end-to-end automatic driving strategy
CN111460879B (en) Neural network operation method using grid generator and device using the same
CN113104050B (en) Unmanned end-to-end decision method based on deep reinforcement learning
CN112508164A (en) End-to-end automatic driving model pre-training method based on asynchronous supervised learning
CN112114592B (en) Method for realizing autonomous crossing of movable frame-shaped barrier by unmanned aerial vehicle
Guvenc et al. Simulation Environment for Safety Assessment of CEAV Deployment in Linden
EP4302165A1 (en) Instantiating objects in a simulated environment based on log data
Hubmann Belief state planning for autonomous driving: Planning with interaction, uncertain prediction and uncertain perception
Batkovic Enabling Safe Autonomous Driving in Uncertain Environments
Li et al. Decision making for autonomous vehicles
CN114120653A (en) Centralized vehicle group decision control method and device and electronic equipment
Aflaki Teaching a Virtual Duckietown Agent to Stop
Zheng et al. The Navigation Based on Hybrid A Star and TEB Algorithm Implemented in Obstacles Avoidance
US11808582B1 (en) System processing scenario objects during simulation
Moudhgalya Language Conditioned Self-Driving Cars Using Environmental Object Descriptions For Controlling Cars
CN111434550B (en) Simulation-based parking strategy generation method and system
JPH10268749A (en) Method for simulating autonomous traveling body
Peiss et al. Graph-Based Autonomous Driving with Traffic-Rule-Enhanced Curriculum Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant