CN114179835A - Decision training method for automatic driving vehicle based on reinforcement learning in real scene - Google Patents
Decision training method for automatic driving vehicle based on reinforcement learning in real scene
- Publication number
- CN114179835A CN114179835A CN202111653767.0A CN202111653767A CN114179835A CN 114179835 A CN114179835 A CN 114179835A CN 202111653767 A CN202111653767 A CN 202111653767A CN 114179835 A CN114179835 A CN 114179835A
- Authority
- CN
- China
- Prior art keywords
- vehicle
- reinforcement learning
- automatic driving
- preset
- real scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Images
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Human Computer Interaction (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention discloses a decision training method for an autonomous vehicle based on reinforcement learning in a real scene. The autonomous vehicle is provided with a drive-by-wire chassis, a positioning device, a lidar device, and an autonomous driving controller, and the method comprises: while the vehicle drives along the track points of a preset driving path in a real scene, intermittently executing exploration behaviors and recording the input information of a reinforcement learning model, the input information comprising an input state, an action space, and the return after single-step execution; and training a reinforcement learning decision algorithm on the recorded input information. Using basic hardware such as a drive-by-wire chassis, four lidars, an RTK positioning unit, and a computer controller, together with key techniques such as a preset driving trajectory, small-range sampled action exploration, reliable safety protection, and automatic reset, the invention breaks through the reinforcement learning algorithm's dependence on a virtual environment and realizes online automatic data acquisition, training, and verification of a reinforcement learning decision algorithm for autonomous vehicles.
Description
Technical Field
The embodiment of the invention relates to the technical field of automatic driving, in particular to a decision training method for an automatic driving vehicle based on reinforcement learning in a real scene.
Background
An autonomous vehicle, also called an intelligent automobile, is an important application of outdoor wheeled mobile robots in the traffic field. It senses the surroundings of the vehicle with on-board sensors such as cameras, lidar, ultrasonic sensors, microwave radar, GPS, odometers, and magnetic compasses, and controls the steering and speed of the vehicle according to the road, vehicle-position, and obstacle information obtained by sensing, so that the vehicle can drive safely and reliably on the road.
The intelligent automobile fundamentally changes the traditional driver-vehicle-road closed-loop control mode by removing the uncontrollable human driver from the closed loop: human influence factors are reduced and precise control is achieved by a machine "driving brain", thereby greatly improving the efficiency and safety of the traffic system.
Traditional prediction methods based on hand-crafted features or vehicle dynamics models cannot cope with the high dynamics, uncertainty, and strong nonlinearity of the actual road traffic environment; these problems affect and limit the industrialization of intelligent driving technology.
Deep reinforcement learning theory addresses the random uncertainty of intelligent driving by analyzing big data, laying a scientific theoretical foundation for the further industrialization of intelligent vehicles. However, reinforcement learning algorithms mostly depend on virtual simulation environments for data acquisition and training, which greatly limits their application in real scenes.
Disclosure of Invention
The invention provides an automatic driving vehicle decision training method based on reinforcement learning in a real scene, which breaks through the limitation that a reinforcement learning algorithm depends on a virtual environment and realizes the online automatic acquisition, training and verification of the reinforcement learning decision algorithm of the automatic driving vehicle.
The invention provides a decision training method for an autonomous vehicle based on reinforcement learning in a real scene. The autonomous vehicle is provided with a drive-by-wire chassis, a positioning device, a lidar device, and an autonomous driving controller; after start-up, the drive-by-wire chassis drives along the track points of a preset driving path, the positioning device acquires the position information of the vehicle, the lidar device acquires environment data during driving, and the autonomous driving controller controls the driving process according to a preset algorithm. The method comprises the following steps:
when a vehicle runs according to track points of a preset running path in a real scene, intermittently executing exploration behaviors and recording input information of a reinforcement learning model, wherein the input information comprises an input state S, an action space A and a return R after single step execution;
and training a reinforcement learning decision algorithm according to the input information.
Optionally, the input state S includes: a track point S1 of a preset driving path and abstract information S2 of the surrounding environment acquired by the laser radar device.
Optionally, the action space A comprises two decomposed components: a lateral action space A1 and a longitudinal action space A2;
wherein the lateral action space A1 is assumed to follow a Gaussian distribution, which serves as the basis for subsequent random action sampling;
and a reference value is set in the longitudinal action space A2.
Optionally, the return R after single-step execution is the evaluation obtained after action A is executed for a single step in input state S;
factors related to the return R after single-step execution include: the offset between the vehicle's driving path and the preset driving path, the offset between the vehicle's driving speed and the expected driving speed, and an evaluation of vehicle collision risk and lane departure.
Optionally, when the number of consecutively executed exploration behaviors reaches a set threshold, the vehicle is controlled to reset and resume driving along the preset track points.
Optionally, the reinforcement learning algorithm is an off-policy (offline) reinforcement learning algorithm.
The invention has the beneficial effects that:
1. The invention provides a decision training method for autonomous vehicles based on reinforcement learning in a real scene. Through key techniques such as a preset driving trajectory, small-range sampled action exploration, reliable safety protection, and automatic reset, it breaks through the reinforcement learning algorithm's dependence on a virtual environment and provides a reference for the popularization of reinforcement learning in real environments.
2. The entire sample collection period is fully automatic driving, which greatly reduces manual workload, improves sampling efficiency, and supports the synchronous operation and sampling of multiple autonomous vehicles.
3. The invention applies a Gaussian-distribution constraint to the lateral action space and sets a reference value in the longitudinal action space; these designs favor the rapid convergence of the agent model.
Drawings
FIG. 1 is a flow chart of an automated driving vehicle decision training method based on reinforcement learning in real scenes according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
The embodiment of the invention provides a decision training method for an autonomous vehicle based on reinforcement learning in a real scene. The autonomous vehicle is provided with a drive-by-wire chassis, a positioning device, a lidar device, and an autonomous driving controller; after start-up, the drive-by-wire chassis drives along the track points of a preset driving path, the positioning device acquires the position information of the vehicle, the lidar device acquires environment data during driving, and the autonomous driving controller controls the driving process according to a preset algorithm.
Preferably, the lidar device may comprise four 360-degree lidars mounted on the front, rear, left, and right of the drive-by-wire chassis body; the positioning device may be a roof-mounted RTK high-precision positioning unit.
Referring to fig. 1, the method includes:
s110, when a vehicle runs according to track points of a preset running path in a real scene, intermittently executing exploration behaviors and recording input information of a reinforcement learning model, wherein the input information comprises an input state S, an action space A and a return R after single step execution;
and S120, training a reinforcement learning decision algorithm according to the input information.
The reinforcement learning model in this embodiment uses an off-policy reinforcement learning algorithm, such as the DDPG (Deep Deterministic Policy Gradient), TD3 (Twin Delayed Deep Deterministic Policy Gradient), or SAC (Soft Actor-Critic) algorithm in deep reinforcement learning. Off-policy algorithms can make full use of historical data.
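Off-policy algorithms reuse historical data through an experience replay buffer. As a minimal sketch (the class name, capacity, and tuple layout below are illustrative assumptions, not taken from the patent):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state):
        # Store one single-step transition recorded during driving.
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random mini-batch for an off-policy update (DDPG/TD3/SAC style).
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

This is why off-policy methods suit real-vehicle collection: transitions gathered once can be replayed for many gradient updates.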
Further, a high-precision map of the target driving area and the preset driving track points are prestored in the autonomous driving controller, and the vehicle drives along this trajectory in the park environment. In addition, the whole sampling process is fully automatic driving, which improves sampling efficiency and allows multi-vehicle parallel sampling. To explore more of the action space, a method of random exploration with automatic reset is provided to realize full exploration of the environment.
Specifically, the input state S includes two parts, S1 and S2. S1 is the track points of the preset driving path and constitutes global information; S2 is an abstraction of the surroundings perceived by the lidar, including dynamic and static obstacles and the drivable area around the vehicle.
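The composite state could be assembled by concatenating the upcoming waypoints (S1) with the lidar abstraction (S2); the following sketch is illustrative, and the exact encoding and dimensions are assumptions rather than details given in the patent:

```python
def build_state(waypoints, lidar_grid):
    """Concatenate global path info (S1) with local perception (S2).

    waypoints : list of (x, y) upcoming track points of the preset path (S1).
    lidar_grid: flat sequence abstracting obstacles / drivable area (S2).
    Returns a flat feature vector suitable as reinforcement learning input.
    """
    s1 = [float(c) for pt in waypoints for c in pt]  # flatten (x, y) pairs
    s2 = [float(v) for v in lidar_grid]              # perception abstraction
    return s1 + s2
```

For example, two waypoints plus an 8-cell grid yield a 12-dimensional state vector.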
The action space A comprises two decomposed components: a lateral action space A1 and a longitudinal action space A2.
The lateral action space A1 is assumed to follow a Gaussian distribution, which serves as the basis for subsequent random action sampling; a reference value is set in the longitudinal action space A2, and this design enables rapid convergence of the agent model.
The return R after single-step execution is the evaluation obtained after action A is executed for a single step in input state S. R is related to three factors: 1) the offset from the preset path (the smaller the offset, the larger R); 2) the offset from the expected driving speed (the smaller the offset, the larger R); 3) an evaluation of collision risk and lane departure.
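These three factors can be combined into a single scalar return; the weights and the penalty value in the sketch below are illustrative assumptions (the patent specifies only the factors, not their magnitudes):

```python
def step_reward(path_offset, speed_offset, collision_risk, lane_departed,
                w_path=1.0, w_speed=0.5, penalty=10.0):
    """Single-step return R: larger when the path and speed offsets are
    small, heavily penalized on collision risk or lane departure."""
    r = -w_path * abs(path_offset) - w_speed * abs(speed_offset)
    if collision_risk or lane_departed:
        r -= penalty  # safety terms dominate the tracking terms
    return r
```

The additive form keeps each factor's contribution interpretable when tuning; other shaping functions would serve equally well.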
In this embodiment, a high-precision map of the specific scene and a closed driving trajectory τ of the vehicle under normal conditions are preset in the autonomous driving controller, and under normal obstacle-free conditions the vehicle tracks this trajectory according to the positioning information received from the RTK positioning unit. However, since driving strictly along the trajectory cannot fully explore the environment, an exploratory lateral action a1' is sampled from the assumed Gaussian distribution to replace the standard preset action a1, and random noise is added to the speed command to form an action a2' that replaces the preset action a2, thereby realizing full exploration of the environment.
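The substitution of exploratory actions a1' and a2' for the preset actions can be sketched as follows; the standard deviations and clipping ranges are assumed values for illustration only:

```python
import random

def explore_actions(a1, a2, sigma_lat=0.1, sigma_lon=0.2,
                    lat_range=(-1.0, 1.0), lon_range=(0.0, 1.0)):
    """Replace the preset lateral action a1 and longitudinal action a2
    with Gaussian-perturbed exploratory actions a1' and a2'."""
    a1p = random.gauss(a1, sigma_lat)        # lateral: sample around a1
    a2p = a2 + random.gauss(0.0, sigma_lon)  # longitudinal: reference + noise
    clip = lambda v, lo, hi: max(lo, min(hi, v))
    # Clip to the physically admissible action ranges of the chassis.
    return clip(a1p, *lat_range), clip(a2p, *lon_range)
```

Keeping the perturbations small realizes the "small-range sampling action exploration" the patent relies on for safety.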
Because of this automatic exploration process, the vehicle's driving is objectively subject to strong random uncertainty, and deviation from the track can occur. To solve this problem, when the number of consecutively executed exploration behaviors reaches a set threshold, the vehicle is controlled to reset and resume driving along the preset track points. In this embodiment, the safety of the vehicle during random exploration is guaranteed by the constraint of the preset driving trajectory, the RTK high-precision positioning unit, the data of the four lidars, and the fully automatic driving state.
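The automatic-reset safeguard amounts to a counter over consecutive exploration steps; the threshold value below is an assumption for illustration:

```python
class ExplorationGuard:
    """Count consecutive exploration steps and signal a reset to the
    preset trajectory once the set threshold is reached."""

    def __init__(self, threshold=50):
        self.threshold = threshold
        self.count = 0

    def step(self, exploring):
        """Record one control step; return True when a reset is due."""
        if not exploring:
            self.count = 0  # tracking the preset path clears the run
            return False
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0  # vehicle resets onto the preset track points
            return True
        return False
```

Bounding each exploratory run keeps the accumulated deviation from the trajectory small enough for the safety layer to handle.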
Examples
The embodiment of the invention provides an application case of an automatic driving vehicle decision training method based on reinforcement learning in a real scene, which comprises the following steps:
1. Prepare a debugged drive-by-wire chassis and mount four 360-degree lidars on the front, rear, left, and right of the chassis body so that together they cover a full 360-degree field of view. Mount the RTK high-precision positioning unit on the roof and fix the autonomous driving controller to the vehicle body.
2. Download the high-precision map of the fixed park and the preset driving track points into the autonomous driving controller.
3. Download the programmed driving control and exploration algorithms into the autonomous driving controller.
4. Download the programmed safe collision-avoidance and behavior-evaluation algorithms into the autonomous driving controller.
5. After the drive-by-wire chassis and the autonomous driving controller are started, the chassis drives along the preset trajectory and intermittently executes exploration behaviors; at this point, recording of the S, A, R data begins. If the vehicle drifts off the track, it automatically resets to a preset track point and behavior exploration restarts.
6. If an exploration run has not yet reached the automatic-reset condition (reset after a set number of exploration steps), exploration continues.
7. Collect data until the training requirement is met; during training, adopt an off-policy (offline) reinforcement learning algorithm such as DDPG or SAC.
8. Download the trained model into the autonomous driving controller and evaluate its effect on the real vehicle.
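Steps 5 to 7 amount to a collect-then-train loop. A schematic sketch, in which the scalar state, the `env_step` callback, and the exploration probability are hypothetical placeholders rather than the patent's actual interfaces:

```python
import random

def collect_episode(env_step, preset_action, n_steps, explore_prob=0.3):
    """Drive n_steps along the preset track, intermittently substituting
    exploratory actions, and record (S, A, R, S') transitions.

    env_step(state, action) -> (next_state, reward) is a placeholder for
    the real vehicle-environment interaction.
    """
    data = []
    state = 0.0  # hypothetical scalar state for illustration
    for _ in range(n_steps):
        if random.random() < explore_prob:
            # Exploration step: perturb the preset tracking action.
            action = preset_action + random.gauss(0.0, 0.1)
        else:
            # Tracking step: follow the preset trajectory action.
            action = preset_action
        next_state, reward = env_step(state, action)
        data.append((state, action, reward, next_state))
        state = next_state
    return data
```

The recorded transitions would then be fed to an off-policy learner such as DDPG or SAC.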
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (6)
1. A decision training method for an autonomous vehicle based on reinforcement learning in a real scene, characterized in that the autonomous vehicle is provided with a drive-by-wire chassis, a positioning device, a lidar device, and an autonomous driving controller, the drive-by-wire chassis drives along the track points of a preset driving path after start-up, the positioning device is used for acquiring the position information of the vehicle, the lidar device is used for acquiring the environment data of the vehicle driving process, and the autonomous driving controller is used for controlling the vehicle driving process according to a preset algorithm, the method comprising:
when a vehicle runs according to track points of a preset running path in a real scene, intermittently executing exploration behaviors and recording input information of a reinforcement learning model, wherein the input information comprises an input state S, an action space A and a return R after single step execution;
and training a reinforcement learning decision algorithm according to the input information.
2. The method of claim 1, wherein the input state S comprises: a track point S1 of a preset driving path and abstract information S2 of the surrounding environment acquired by the laser radar device.
3. The method of claim 1, wherein the action space A comprises two decomposed components of a lateral action space A1 and a longitudinal action space A2;
wherein the lateral action space A1 is assumed to conform to a Gaussian distribution as the basis of subsequent random action sampling;
and a reference value is set in the longitudinal action space A2.
4. The method of claim 1, wherein the return R after single-step execution is the evaluation obtained after action A is executed for a single step in the input state S;
factors related to the return R after single-step execution include: the offset between the vehicle's driving path and the preset driving path, the offset between the vehicle's driving speed and the expected driving speed, and an evaluation of vehicle collision risk and lane departure.
5. The method of claim 1, wherein when the number of consecutively executed exploration behaviors reaches a set threshold, the vehicle is controlled to reset and resume driving along the preset track points.
6. The method of claim 1, wherein the reinforcement learning algorithm is an off-policy (offline) reinforcement learning algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111653767.0A CN114179835B (en) | 2021-12-30 | 2021-12-30 | Automatic driving vehicle decision training method based on reinforcement learning in real scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114179835A true CN114179835A (en) | 2022-03-15 |
CN114179835B CN114179835B (en) | 2024-01-05 |
Family
ID=80606422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111653767.0A Active CN114179835B (en) | 2021-12-30 | 2021-12-30 | Automatic driving vehicle decision training method based on reinforcement learning in real scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114179835B (en) |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106154834A (en) * | 2016-07-20 | 2016-11-23 | 百度在线网络技术(北京)有限公司 | For the method and apparatus controlling automatic driving vehicle |
CN106774291A (en) * | 2016-12-26 | 2017-05-31 | 清华大学苏州汽车研究院(吴江) | A kind of electric-control system of automatic Pilot electric automobile |
CN108196535A (en) * | 2017-12-12 | 2018-06-22 | 清华大学苏州汽车研究院(吴江) | Automated driving system based on enhancing study and Multi-sensor Fusion |
CN109597317A (en) * | 2018-12-26 | 2019-04-09 | 广州小鹏汽车科技有限公司 | A kind of Vehicular automatic driving method, system and electronic equipment based on self study |
CN109649390A (en) * | 2018-12-19 | 2019-04-19 | 清华大学苏州汽车研究院(吴江) | A kind of autonomous follow the bus system and method for autonomous driving vehicle |
CN110322017A (en) * | 2019-08-13 | 2019-10-11 | 吉林大学 | Automatic Pilot intelligent vehicle Trajectory Tracking Control strategy based on deeply study |
WO2020079066A1 (en) * | 2018-10-16 | 2020-04-23 | Five AI Limited | Autonomous vehicle planning and prediction |
US20200372822A1 (en) * | 2019-01-14 | 2020-11-26 | Polixir Technologies Limited | Training system for autonomous driving control policy |
CN112406904A (en) * | 2020-08-27 | 2021-02-26 | 腾讯科技(深圳)有限公司 | Method and device for training automatic driving strategy, automatic driving method, equipment, vehicle and computer readable storage medium |
CN112417756A (en) * | 2020-11-13 | 2021-02-26 | 清华大学苏州汽车研究院(吴江) | Interactive simulation test system of automatic driving algorithm |
US20210086798A1 (en) * | 2019-09-20 | 2021-03-25 | Honda Motor Co., Ltd. | Model-free reinforcement learning |
WO2021103834A1 (en) * | 2019-11-27 | 2021-06-03 | 初速度(苏州)科技有限公司 | Method for generating lane changing decision model, lane changing decision method for driverless vehicle, and device |
CN113044064A (en) * | 2021-04-01 | 2021-06-29 | 南京大学 | Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning |
CN113104050A (en) * | 2021-04-07 | 2021-07-13 | 天津理工大学 | Unmanned end-to-end decision method based on deep reinforcement learning |
CN113264059A (en) * | 2021-05-17 | 2021-08-17 | 北京工业大学 | Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning |
CN113420368A (en) * | 2021-05-24 | 2021-09-21 | 江苏大学 | Intelligent vehicle neural network dynamics model, reinforcement learning network model and automatic driving training method thereof |
WO2021212728A1 (en) * | 2020-04-24 | 2021-10-28 | 广州大学 | Unmanned vehicle lane changing decision-making method and system based on adversarial imitation learning |
CN113561986A (en) * | 2021-08-18 | 2021-10-29 | 武汉理工大学 | Decision-making method and device for automatically driving automobile |
CN113635909A (en) * | 2021-08-19 | 2021-11-12 | 崔建勋 | Automatic driving control method based on confrontation generation simulation learning |
CN113682312A (en) * | 2021-09-23 | 2021-11-23 | 中汽创智科技有限公司 | Autonomous lane changing method and system integrating deep reinforcement learning |
WO2021238303A1 (en) * | 2020-05-29 | 2021-12-02 | 华为技术有限公司 | Motion planning method and apparatus |
Non-Patent Citations (6)
Title |
---|
吕迪;徐坤;李慧云;潘仲鸣;: "Deep reinforcement learning method for driverless vehicles integrating human-like driving behavior", 集成技术 (Journal of Integration Technology), no. 05 *
孙嘉浩;陈劲杰;: "Simulation research on driverless driving based on reinforcement learning", 农业装备与车辆工程 (Agricultural Equipment & Vehicle Engineering), no. 06 *
张斌;何明;陈希亮;吴春晓;刘斌;周波;: "Application of an improved DDPG algorithm in autonomous driving", 计算机工程与应用 (Computer Engineering and Applications), no. 10 *
***;戴一凡;李升波;边明远;: "Development status and trends of intelligent and connected vehicle (ICV) technology", 汽车安全与节能学报 (Journal of Automotive Safety and Energy), no. 01 *
李国法;陈耀昱;吕辰;陶达;曹东璞;成波;: "Key technologies of driving behavior semantic analysis in intelligent vehicle decision-making", 汽车安全与节能学报 (Journal of Automotive Safety and Energy), no. 04 *
黄志清;曲志伟;张吉;张严心;田锐;: "End-to-end driverless decision-making based on deep reinforcement learning", 电子学报 (Acta Electronica Sinica), no. 09 *
Also Published As
Publication number | Publication date |
---|---|
CN114179835B (en) | 2024-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10882522B2 (en) | Systems and methods for agent tracking | |
CN109263639B (en) | Driving path planning method based on state grid method | |
CN110834644B (en) | Vehicle control method and device, vehicle to be controlled and storage medium | |
Llorca et al. | Autonomous pedestrian collision avoidance using a fuzzy steering controller | |
CN111309600B (en) | Virtual scene injection automatic driving test method and electronic equipment | |
US10800427B2 (en) | Systems and methods for a vehicle controller robust to time delays | |
CN102495631B (en) | Intelligent control method of driverless vehicle tracking desired trajectory | |
CN108860139B (en) | A kind of automatic parking method for planning track based on depth enhancing study | |
US11167770B2 (en) | Autonomous vehicle actuation dynamics and latency identification | |
JP2021049969A (en) | Systems and methods for calibrating steering wheel neutral position | |
US20190018412A1 (en) | Control Method for Autonomous Vehicles | |
CN108995538A (en) | A kind of Unmanned Systems of electric car | |
CN112477849B (en) | Parking control method and device for automatic driving truck and automatic driving truck | |
CN112829747A (en) | Driving behavior decision method and device and storage medium | |
CN110569602A (en) | Data acquisition method and system for unmanned vehicle | |
CN112193318A (en) | Vehicle path control method, device, equipment and computer readable storage medium | |
CN114638103A (en) | Automatic driving joint simulation method and device, computer equipment and storage medium | |
CN114179835B (en) | Automatic driving vehicle decision training method based on reinforcement learning in real scene | |
DE102022121602A1 (en) | OBJECT MOTION PATH PREDICTION | |
Spencer et al. | Trajectory based autonomous vehicle following using a robotic driver | |
CN112477861B (en) | Driving control method and device for automatic driving truck and automatic driving truck | |
Cremean et al. | Alice: An information-rich autonomous vehicle for high-speed desert navigation | |
Team | Stanford racing team’s entry in the 2005 DARPA grand challenge | |
CN110375751A (en) | A kind of automatic Pilot real-time navigation system framework | |
CN117826825B (en) | Unmanned mining card local path planning method and system based on artificial potential field algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |