CN112232490A - Deep imitation reinforcement learning driving strategy training method based on vision - Google Patents

Deep imitation reinforcement learning driving strategy training method based on vision

Info

Publication number
CN112232490A
Authority
CN
China
Prior art keywords: network, learning, training, reinforcement learning, layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011154491.7A
Other languages
Chinese (zh)
Other versions
CN112232490B (en)
Inventor
邹启杰
熊康
高兵
汪祖民
王东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University filed Critical Dalian University
Priority to CN202011154491.7A
Publication of CN112232490A
Application granted
Publication of CN112232490B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W 60/001 Planning or execution of driving tasks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a vision-based deep imitation reinforcement learning driving strategy training method, which comprises the following steps: constructing an imitation learning network; training the imitation learning network; splitting the trained imitation learning network to obtain a perception module; constructing a DDPG network to obtain a control module; completing the construction of a deep imitation reinforcement learning model from the perception module and the control module; and training the deep imitation reinforcement learning model. The imitation learning network comprises five convolutional layers and four fully connected layers: the convolutional layers extract features, and the fully connected layers predict the steering angle, throttle opening and brake opening. In addition, the reward function set during training of the deep imitation reinforcement learning model ensures comfort and safety when driving through curves.

Description

Deep imitation reinforcement learning driving strategy training method based on vision
Technical Field
The invention relates to the technical field of automatic driving, and in particular to a vision-based deep imitation reinforcement learning driving strategy training method.
Background
The rise of autonomous driving technology offers a new solution to existing traffic problems. Autonomous driving can effectively improve the driving efficiency of road vehicles and thereby relieve traffic pressure, and by exploiting the efficient and precise execution of machines it can reduce traffic accidents and improve driving safety. Meanwhile, advances in science and technology have promoted the rise of intelligent transportation, and computing power, traffic big data and the now-popular deep learning have jointly driven the rapid development of autonomous driving.
Among the various tasks of autonomous driving, sensors such as radar, lidar, ultrasonic sensors and infrared cameras are widely used, but the ordinary color camera, which is inexpensive and acquires rich information, remains the most widely accepted and highly reliable means of environment perception. Moreover, having an intelligent driving system use vision to recognize and analyze the traffic environment matches the way human drivers perceive their environment.
For learning human-like driving strategies, deep reinforcement learning, imitation learning, transfer learning and the like are commonly adopted. Driving strategy learning from the driver's viewpoint, when trained end to end, can adaptively extract features from raw images, which avoids the limitations of manual feature engineering, saves cost, removes the reliance on human experience, and improves the comprehensiveness, accuracy and objectivity of the learned strategy. However, these methods still have shortcomings for autonomous driving strategy learning: the quality of a strategy learned by imitation learning is limited by the individual expert's demonstration data, and a naive algorithm generalizes poorly to unknown scenes; deep reinforcement learning must explore the traffic environment on a large scale before it can make decisions, and its low learning efficiency makes it hard to guarantee the real-time performance of driving decision and planning; transfer learning is an effective route to driving strategy learning, but it too struggles to transfer driving strategies to atypical environments.
Therefore, on the one hand the learning of expert driving strategies must be addressed, and on the other hand the learned driving strategy should have a certain generalization capability, i.e. it should reach a human-like level even in unknown traffic environments, thereby enlarging the applicable scope of autonomous driving.
The performance of the end-to-end model in the vision-based end-to-end lane keeping method disclosed in Chinese patent document CN109446919A depends on the quantity and quality of the collected driving data, and collecting data for all kinds of driving scenes in order to obtain an excellent driving strategy takes a great deal of time. Moreover, it is impractical to collect driving data for every scene, so the model cannot handle unknown scenes and its performance is hard to improve further.
In the method for constructing a layered end-to-end vehicle automatic driving system disclosed in Chinese patent document CN108897313A, the first two layers of neural network models require a large amount of labelled data for training, while the last two layers still follow the traditional reinforcement learning training mode, which requires extensive exploration and makes training time-consuming. Because of the complex network structure, each network model also has to be trained separately, further increasing training complexity.
The biggest problem of the model-free deep reinforcement learning algorithm proposed by Berkeley in 2019 in "Model-free Deep Reinforcement Learning for Urban Autonomous Driving" is that the deep reinforcement learning part still follows the traditional learning mode, so learning efficiency is low in the huge search space of an urban simulation, which limits the applicable scope of the model.
The DAVE-2 network model proposed by the NVIDIA team in "End to End Learning for Self-Driving Cars" has the same shortcomings as the model of patent document CN109446919A: it requires a large amount of labelled data to train the network, and because it is difficult to collect driving data covering all scenes, it cannot handle other unknown scenes.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides a vision-based deep imitation reinforcement learning driving strategy training method, which solves the problem of driving strategy learning in unknown environments and thus improves the generalization capability of the learned driving strategy; it also draws on the way humans drive by setting a reinforcement learning reward function suited to curves, further ensuring the stability of the vehicle when driving through curves.
In order to achieve this purpose, the technical scheme of the application is as follows. A vision-based deep imitation reinforcement learning driving strategy training method comprises the following steps:
constructing an imitation learning network;
training the imitation learning network;
splitting the trained imitation learning network to obtain a perception module;
constructing a DDPG network to obtain a control module;
completing the construction of a deep imitation reinforcement learning model from the perception module and the control module;
training the deep imitation reinforcement learning model.
Further, the imitation learning network comprises five convolutional layers and four fully connected layers: the convolutional layers extract features, and the fully connected layers predict the steering angle, throttle opening and brake opening. The five convolutional layers use 5x5 convolution kernels, with max-pooling and Dropout layers added to optimize the network. The five convolutional layers and the first three fully connected layers all use ReLU activation functions; the last fully connected layer is the output layer and comprises three fully connected branches, which use tanh, sigmoid and sigmoid activations respectively and output the three actions of steering, throttle and brake.
Further, the imitation learning network takes processed images of 64x64 pixels as input and outputs vehicle control information comprising the predicted steering angle, predicted throttle information and predicted braking information.
Further, training the imitation learning network specifically comprises:
collecting human driving data with TORCS (The Open Racing Car Simulator), selecting well-performed human driving data and the corresponding driver-view video frames as sample data, and using the human control commands, which comprise steering, throttle and brake, as labels;
training the imitation learning network with the DAgger algorithm, an iterative policy training algorithm in the online learning setting in which, at each iteration, the learner is retrained on all states encountered so far.
Further, splitting the trained imitation learning network to obtain the perception module specifically comprises:
saving the weights of the trained imitation learning network and splitting the network, taking the first seven layers as the perception module and assigning them the corresponding weights, where the first seven layers comprise the five convolutional layers and two fully connected layers;
the perception module takes a first-person driving image as input and outputs the corresponding feature vector; the last two fully connected layers serve as the action generation network, whose weights are used to initialize the Actor network of the control part so as to guarantee the initial performance of the whole model.
Further, constructing the DDPG network to obtain the control module specifically comprises:
dividing the DDPG network into an Actor network and a Critic network, where the Actor network has three layers: an input layer that receives the features generated by the perception module, a hidden fully connected layer, and an output layer consisting of three fully connected branches corresponding to the steering, throttle and brake outputs;
the Actor network has the same structure as the action generation network and is initialized with its weights; the Critic network consists of fully connected layers and takes the environment features generated by the perception module and the action information from the Actor network as input; the action information passes through one fully connected layer, is combined with the feature vector, passes through another fully connected layer, and a final fully connected layer outputs the values of the three actions for DDPG learning.
Further, training the deep imitation reinforcement learning model specifically comprises:
adding Ornstein-Uhlenbeck (OU) exploration noise to the deep imitation reinforcement learning model and setting a reward function suited to the task, in which a target speed is specified at curves; the reward function encourages the car to decelerate to the specified speed at curves and to accelerate on straight roads;
the OU exploration formula is as follows:
E dxt=E(μ-xt)dt+σdwt (1)
wherein E represents the recovery to the average value too fast, μ represents the average value, σ represents the amplitude of the fluctuation, and the specific parameters are shown in Table 1;
Table 1. Parameters of the OU noise (the parameter values are provided as an image in the original document and are not reproduced here)
The reward function (provided as an image in the original document) is defined in terms of the following quantities: I is an indicator that equals 1 when the condition in brackets is satisfied and 0 otherwise; d1 is the distance to the lane line directly ahead of the vehicle; d2 is a lane-centering parameter whose value approaches 0 as the vehicle approaches the center of the lane; vx is the longitudinal speed of the vehicle, and θ is the angle between the vehicle and the lane line; α and β are the target speed at the curve and the penalty discount, respectively. When d1 is less than 10, the vehicle is in a curve; when d1 is less than 40, a curve is about to be entered ahead, and α is set to 50, encouraging the vehicle to decelerate to 50 at the curve; when d1 is greater than 40, the vehicle is on a straight road, and acceleration is encouraged.
The invention can obtain the following technical effects: the intelligent driving control structure is divided into a perception part and a control part, and the driving strategy is learned efficiently through imitation learning and deep reinforcement learning, respectively. On the one hand, imitation learning based on the DAgger algorithm reduces the learning cost and obtains a good driving strategy from a small number of samples; on the other hand, incremental learning in unknown environments improves the ability to handle unknown environments, accelerates model convergence and improves overall real-time performance. In addition, the reward function set during training of the deep imitation reinforcement learning model also ensures comfort and safety when driving through curves.
Drawings
In order to illustrate the technical solution of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described as follows:
FIG. 1 is a workflow diagram of the deep imitation reinforcement learning model;
FIG. 2 is a schematic diagram of network splitting and weight sharing;
FIG. 3 is a schematic diagram of the deep imitation reinforcement learning model;
FIG. 4 is the imitation learning network architecture diagram.
Detailed Description
The invention is described in further detail below with reference to the figures and a specific embodiment. It is to be understood that the described embodiments are only some, not all, of the embodiments of the invention.
This embodiment provides a vision-based deep imitation reinforcement learning driving strategy training method that combines the advantages of imitation learning and deep reinforcement learning: an initial driving strategy is obtained through imitation learning, and online driving strategy learning is handled by deep reinforcement learning. Using the output of imitation learning as the input of deep reinforcement learning reduces the exploration space and improves learning efficiency; meanwhile, deep reinforcement learning enables driving strategy learning in unknown environments, which improves the generalization capability of the learned strategy. The invention also draws on the way humans drive by setting a reinforcement learning reward function suited to curves, further ensuring the stability of the vehicle when driving through curves. The method specifically comprises the following steps:
Constructing an imitation learning network;
the simulation learning network comprises 5 convolutional layers and four full-connection layers, wherein the convolutional layers are used for extracting characteristics, and the full-connection layers are used for predicting steering angles, accelerator opening degrees and brake opening degrees; the input of the simulated learning network is processed images with the size of 64x64 pixels, and the output of the simulated learning network is automobile control information comprising predicted steering angle, predicted accelerator information and predicted brake information. The 5 convolutional layers use 5x5 convolutional kernels, wherein a max pooling layer and a Dropout layer are added to optimize the network; the 5 convolutional layers and the first three full-connection layers all use Relu activation functions, the last full-connection layer is an output layer and comprises three full-connection networks, the three full-connection networks respectively use tanh, sigmoid and sigmoid activation functions, and 3 actions of steering, accelerating and braking are correspondingly output.
Training the imitation learning network;
the method comprises the steps of utilizing a TORCS (simulated driving simulator) to collect artificial driving data, selecting data with excellent driving performance and a vehicle video frame of a corresponding driver visual angle as sample data, and using artificial control instructions (steering, accelerator and brake) as labels. Only typical driving data need to be collected, and the driving data can be used as a sample of simulation learning.
Under the condition of less demonstration data, the simulation learning network is trained by using a DAgger algorithm so as to obtain the best training effect. DAgger is an iterative strategy training algorithm that reverts to an online learning state. In each iteration, the learner retrains the primary classifier on all states encountered. The main advantage of DAgger is that expert demonstration is used to teach the learner how to recover from past errors, enabling active learning to the expert.
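As an illustration of the training procedure, the sketch below outlines the DAgger loop under assumed interfaces: `env` stands in for the TORCS simulator, `expert` for the human demonstrator, and `learner` for the imitation network; the method names used on these objects are hypothetical.

```python
import numpy as np

def dagger_train(env, expert, learner, n_iterations=10, steps_per_iter=1000):
    """Sketch of DAgger: at every iteration the learner drives, the expert labels the
    states the learner actually visits, and the learner is retrained on the aggregate
    of all states collected so far."""
    # start from the initial expert demonstrations (states + control labels)
    states, labels = expert.initial_demonstrations()
    states, labels = list(states), list(labels)
    for _ in range(n_iterations):
        # retrain the policy on everything encountered so far
        learner.fit(np.array(states), np.array(labels))
        obs = env.reset()
        for _ in range(steps_per_iter):
            action = learner.predict(obs)      # the learner drives ...
            states.append(obs)
            labels.append(expert.label(obs))   # ... the expert labels the visited state
            obs, done = env.step(action)
            if done:
                obs = env.reset()
    return learner
```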
Splitting the trained imitation learning network to obtain a perception module;
carrying out weight value storage and network splitting on the simulated learning network after training is finished, splitting a front 7-layer network comprising 5 convolutional layers and 2 full-connection layers to serve as a sensing module and endowing the sensing module with corresponding weight values, wherein a first visual angle driving image is input by the sensing module, and a corresponding characteristic vector is output by the sensing module; the latter two layers of networks serve as action generating networks, and the action generating networks are used for initializing the Actor network in the control part by using the weights of the action generating networks so as to ensure the initial performance of the whole model, as shown in fig. 2.
Constructing a DDPG network to obtain a control module;
the DDPG network is divided into an Actor network and a Critic network. Because the perception part processes the input of the original image, the network of the control part can be simplified to improve the learning efficiency. The Actor network is divided into three layers, an input layer receives characteristics generated by a sensing part, a hidden layer is a full-connection layer, and an output layer consists of three full-connection networks and respectively corresponds to output steering, an accelerator and a brake.
The last two layers of networks (action generating networks) of the Actor network and the simulation learning network have the same structure, and the Actor network is initialized by using the weight of the action generating network. The Critic network is composed of a fully-connected network, the environmental characteristics generated by a sensing part and action information in an Actor network are used as input, the action information is combined with the characteristic vector after being processed by a layer of the fully-connected network, then is processed by a layer of the fully-connected network, and finally, the values of three actions are output by a layer of the fully-connected network and are provided for DDPG network learning.
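The following sketch shows one possible PyTorch layout of the Actor and Critic described above, continuing the assumptions of the earlier sketches (128-dimensional perception features, hidden width 64); the layer widths are illustrative, not taken from this document.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Same structure as the action generation network: one hidden FC layer and
    three output heads for steering, throttle and brake."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.steer = nn.Sequential(nn.Linear(64, 1), nn.Tanh())
        self.throttle = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())
        self.brake = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, feat):
        h = self.hidden(feat)
        return torch.cat([self.steer(h), self.throttle(h), self.brake(h)], dim=-1)

class Critic(nn.Module):
    """The action passes through one FC layer, is concatenated with the feature
    vector, passes through another FC layer, and a final FC layer outputs the
    values of the three actions for DDPG learning."""
    def __init__(self, feat_dim=128, action_dim=3):
        super().__init__()
        self.action_fc = nn.Sequential(nn.Linear(action_dim, 64), nn.ReLU())
        self.merge_fc = nn.Sequential(nn.Linear(feat_dim + 64, 64), nn.ReLU())
        self.value_fc = nn.Linear(64, action_dim)

    def forward(self, feat, action):
        a = self.action_fc(action)
        h = self.merge_fc(torch.cat([feat, a], dim=-1))
        return self.value_fc(h)
```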
Completing the construction of the deep imitation reinforcement learning model (DIRL) from the perception module and the control module;
The deep imitation reinforcement learning model comprises the perception module and the control module; the workflow is shown in FIG. 1. The perception module takes a 64x64 first-person driving image from the TORCS simulator as input and processes it into the corresponding feature vector; the control module takes the feature vector generated by the perception module as input and outputs the final control command. The specific network structure is shown in FIG. 3.
Training the deep imitation reinforcement learning model;
and adding a proper OU exploration factor into the deep simulation reinforcement learning model, setting a reward function which accords with a task, exploring and learning in a simulation environment, and further improving the performance of the whole model. In order to solve the problem that the vehicle cannot be correctly braked at a curve when the vehicle is driven at a high speed, a specified speed is set at the curve in a reward function, and the vehicle is encouraged to decelerate to the specified speed at the curve and accelerate on a straight road through the reward function. Because the whole model has initial performance, a large amount of exploration time can be reduced, and the learning efficiency is improved.
The OU exploration process is:
dx_t = θ(μ - x_t)dt + σ dW_t   (1)
where θ represents how quickly the process reverts to the mean, μ is the mean value, and σ is the magnitude of the fluctuation; the specific parameters are shown in Table 1.
Table 1. Parameters of the OU noise (the parameter values are provided as an image in the original document and are not reproduced here)
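As an illustration of formula (1), the sketch below discretizes the OU process with an Euler-Maruyama step; since the parameter values of Table 1 are not reproduced here, the θ, μ and σ values passed to the constructor are placeholders, typically with one noise instance per action dimension.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise: dx_t = theta * (mu - x_t) dt + sigma * dW_t."""
    def __init__(self, theta, mu, sigma, dt=1.0):
        self.theta, self.mu, self.sigma, self.dt = theta, mu, sigma, dt
        self.x = mu

    def reset(self):
        self.x = self.mu

    def sample(self):
        # Euler-Maruyama discretization of the OU stochastic differential equation
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * np.random.randn()
        self.x = self.x + dx
        return self.x

# example: exploration noise added to the Actor's steering output during training
# (these theta/mu/sigma values are illustrative, not those of Table 1)
steer_noise = OUNoise(theta=0.6, mu=0.0, sigma=0.3)
```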
The reward function (provided as an image in the original document) is defined in terms of the following quantities: I is an indicator that equals 1 when the condition in brackets is satisfied and 0 otherwise; d1 is the distance to the lane line directly ahead of the vehicle; d2 is a lane-centering parameter whose value approaches 0 as the vehicle approaches the center of the lane; vx is the longitudinal speed of the vehicle, and θ is the angle between the vehicle and the lane line; α and β are the target speed at the curve and the penalty discount, respectively. When d1 is less than 10, the vehicle is in a curve; when d1 is less than 40, a curve is about to be entered ahead, and α is set to 50, encouraging the vehicle to decelerate to 50 at the curve; when d1 is greater than 40, the vehicle is on a straight road, and acceleration is encouraged.
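The exact reward formula is given only as an image in the original document and is not reproduced here; the sketch below merely illustrates the behaviour described in the preceding paragraph (penalizing heading error and lane-center deviation, and encouraging deceleration toward α = 50 when d1 < 40), with the base term and the weight β chosen as assumptions.

```python
import numpy as np

def reward(d1, d2, vx, theta, alpha=50.0, beta=0.5):
    """Illustrative reward only; the weights and the base term are assumptions."""
    # base term: reward longitudinal progress, penalize heading error (theta)
    # and deviation from the lane center (d2 -> 0 at the center)
    r = vx * np.cos(theta) - vx * abs(np.sin(theta)) - vx * abs(d2)
    if d1 < 40:
        # a curve is ahead (d1 < 40) or the car is already in it (d1 < 10):
        # penalize exceeding the target curve speed alpha, discounted by beta
        r -= beta * max(0.0, vx - alpha)
    return r
```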
The method solves the problem of learning driving strategies beyond the expert demonstrations: in the DIRL model, imitation learning first obtains a certain initial performance in a short time from a small amount of labelled data in a supervised manner; exploration learning is then carried out with the deep reinforcement learning algorithm DDPG, and by adding OU exploration noise and a corresponding reward function the agent further learns driving strategies beyond the expert demonstrations and thus obtains better performance.
The ability to cope with unknown traffic environments is improved. Because appropriate exploration noise is added to DDPG, the agent can obtain a large amount of driving data from unknown scenes in the simulation environment, and by adjusting the network parameters with the reward function it copes with unknown scenes better than imitation learning alone.
The exploration efficiency of deep reinforcement learning is improved and convergence is accelerated. With a small amount of labelled data, the imitation learning network is trained with the DAgger algorithm; the perception module and the action generation network are then obtained by network splitting, and the action generation network initializes the Actor network of the control part, so that the whole DIRL model has a certain initial performance. Because the model starts with this initial performance, a large amount of unnecessary exploration in reinforcement learning is avoided; and because the perception module handles the raw pixels, the network structure used for reinforcement learning can be greatly simplified, further improving learning efficiency.
The above description only presents preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any substitution or change that a person skilled in the art could make within the technical scope of the present invention based on the technical solution and inventive concept of the present invention shall fall within the scope of the present invention.

Claims (7)

1. A vision-based deep imitation reinforcement learning driving strategy training method, characterized by comprising the following steps:
constructing an imitation learning network;
training the imitation learning network;
splitting the trained imitation learning network to obtain a perception module;
constructing a DDPG network to obtain a control module;
completing the construction of a deep imitation reinforcement learning model from the perception module and the control module;
training the deep imitation reinforcement learning model.
2. The vision-based deep imitation reinforcement learning driving strategy training method according to claim 1, characterized in that the imitation learning network comprises five convolutional layers and four fully connected layers: the convolutional layers extract features, and the fully connected layers predict the steering angle, throttle opening and brake opening; the five convolutional layers use 5x5 convolution kernels, with max-pooling and Dropout layers added to optimize the network; the five convolutional layers and the first three fully connected layers all use ReLU activation functions, and the last fully connected layer is the output layer and comprises three fully connected branches, which use tanh, sigmoid and sigmoid activations respectively and output the three actions of steering, throttle and brake.
3. The method as claimed in claim 2, characterized in that the imitation learning network takes processed images of 64x64 pixels as input and outputs vehicle control information comprising the predicted steering angle, predicted throttle information and predicted braking information.
4. The vision-based deep imitation reinforcement learning driving strategy training method according to claim 1, characterized in that training the imitation learning network comprises:
collecting human driving data with TORCS (The Open Racing Car Simulator), selecting well-performed human driving data and the corresponding driver-view video frames as sample data, and using the human control commands, which comprise steering, throttle and brake, as labels;
training the imitation learning network with the DAgger algorithm, an iterative policy training algorithm in the online learning setting in which, at each iteration, the learner is retrained on all states encountered so far.
5. The vision-based deep imitation reinforcement learning driving strategy training method according to claim 1, characterized in that splitting the trained imitation learning network to obtain the perception module specifically comprises:
saving the weights of the trained imitation learning network and splitting the network, taking the first seven layers as the perception module and assigning them the corresponding weights, where the first seven layers comprise the five convolutional layers and two fully connected layers;
the perception module takes a first-person driving image as input and outputs the corresponding feature vector; the last two fully connected layers serve as the action generation network, whose weights are used to initialize the Actor network of the control part so as to guarantee the initial performance of the whole model.
6. The vision-based deep imitation reinforcement learning driving strategy training method according to claim 1, characterized in that constructing the DDPG network to obtain the control module specifically comprises:
dividing the DDPG network into an Actor network and a Critic network, where the Actor network has three layers: an input layer that receives the features generated by the perception module, a hidden fully connected layer, and an output layer consisting of three fully connected branches corresponding to the steering, throttle and brake outputs;
the Actor network has the same structure as the action generation network and is initialized with its weights; the Critic network consists of fully connected layers and takes the environment features generated by the perception module and the action information from the Actor network as input; the action information passes through one fully connected layer, is combined with the feature vector, passes through another fully connected layer, and a final fully connected layer outputs the values of the three actions for DDPG learning.
7. The vision-based deep imitation reinforcement learning driving strategy training method according to claim 1, characterized in that training the deep imitation reinforcement learning model specifically comprises:
adding Ornstein-Uhlenbeck (OU) exploration noise to the deep imitation reinforcement learning model and setting a reward function suited to the task, in which a target speed is specified at curves; the reward function encourages the car to decelerate to the specified speed at curves and to accelerate on straight roads;
the OU exploration process is:
dx_t = θ(μ - x_t)dt + σ dW_t   (1)
where θ represents how quickly the process reverts to the mean, μ is the mean value, and σ is the magnitude of the fluctuation; the specific parameters are shown in Table 1.
Table 1. Parameters of the OU noise (the parameter values are provided as an image in the original document and are not reproduced here)
The reward function (provided as an image in the original document) is defined in terms of the following quantities: I is an indicator that equals 1 when the condition in brackets is satisfied and 0 otherwise; d1 is the distance to the lane line directly ahead of the vehicle; d2 is a lane-centering parameter whose value approaches 0 as the vehicle approaches the center of the lane; vx is the longitudinal speed of the vehicle, and θ is the angle between the vehicle and the lane line; α and β are the target speed at the curve and the penalty discount, respectively. When d1 is less than 10, the vehicle is in a curve; when d1 is less than 40, a curve is about to be entered ahead, and α is set to 50, encouraging the vehicle to decelerate to 50 at the curve; when d1 is greater than 40, the vehicle is on a straight road, and acceleration is encouraged.
CN202011154491.7A 2020-10-26 2020-10-26 Vision-based deep imitation reinforcement learning driving strategy training method Active CN112232490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011154491.7A CN112232490B (en) Vision-based deep imitation reinforcement learning driving strategy training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011154491.7A CN112232490B (en) Vision-based deep imitation reinforcement learning driving strategy training method

Publications (2)

Publication Number Publication Date
CN112232490A true CN112232490A (en) 2021-01-15
CN112232490B CN112232490B (en) 2023-06-20

Family

ID=74109364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011154491.7A Active CN112232490B (en) Vision-based deep imitation reinforcement learning driving strategy training method

Country Status (1)

Country Link
CN (1) CN112232490B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112904864A (en) * 2021-01-28 2021-06-04 的卢技术有限公司 Automatic driving method and system based on deep reinforcement learning
CN113064424A (en) * 2021-03-17 2021-07-02 西安工业大学 Unmanned vehicle path planning method for improving DDPG algorithm
CN113353102A (en) * 2021-07-08 2021-09-07 重庆大学 Unprotected left-turn driving control method based on deep reinforcement learning
CN113715842A (en) * 2021-08-24 2021-11-30 华中科技大学 High-speed moving vehicle control method based on simulation learning and reinforcement learning
CN113741533A (en) * 2021-09-16 2021-12-03 中国电子科技集团公司第五十四研究所 Unmanned aerial vehicle intelligent decision-making system based on simulation learning and reinforcement learning
CN114104005A (en) * 2022-01-26 2022-03-01 苏州浪潮智能科技有限公司 Decision-making method, device and equipment of automatic driving equipment and readable storage medium
CN114444718A (en) * 2022-01-26 2022-05-06 北京百度网讯科技有限公司 Training method of machine learning model, signal control method and device
CN114609925A (en) * 2022-01-14 2022-06-10 中国科学院自动化研究所 Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
CN114708568A (en) * 2022-06-07 2022-07-05 东北大学 Pure vision automatic driving control system, method and medium based on improved RTFNet
CN114925850A (en) * 2022-05-11 2022-08-19 华东师范大学 Deep reinforcement learning confrontation defense method for disturbance reward

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427827A (en) * 2019-07-08 2019-11-08 辽宁工程技术大学 It is a kind of it is multiple dimensioned perception and Global motion planning under autonomous driving network
CN110795821A (en) * 2019-09-25 2020-02-14 的卢技术有限公司 Deep reinforcement learning training method and system based on scene differentiation
CN111010294A (en) * 2019-11-28 2020-04-14 国网甘肃省电力公司电力科学研究院 Electric power communication network routing method based on deep reinforcement learning
US20200241542A1 (en) * 2019-01-25 2020-07-30 Bayerische Motoren Werke Aktiengesellschaft Vehicle Equipped with Accelerated Actor-Critic Reinforcement Learning and Method for Accelerating Actor-Critic Reinforcement Learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200241542A1 (en) * 2019-01-25 2020-07-30 Bayerische Motoren Werke Aktiengesellschaft Vehicle Equipped with Accelerated Actor-Critic Reinforcement Learning and Method for Accelerating Actor-Critic Reinforcement Learning
CN110427827A (en) * 2019-07-08 2019-11-08 辽宁工程技术大学 It is a kind of it is multiple dimensioned perception and Global motion planning under autonomous driving network
CN110795821A (en) * 2019-09-25 2020-02-14 的卢技术有限公司 Deep reinforcement learning training method and system based on scene differentiation
CN111010294A (en) * 2019-11-28 2020-04-14 国网甘肃省电力公司电力科学研究院 Electric power communication network routing method based on deep reinforcement learning

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112904864A (en) * 2021-01-28 2021-06-04 的卢技术有限公司 Automatic driving method and system based on deep reinforcement learning
CN113064424A (en) * 2021-03-17 2021-07-02 西安工业大学 Unmanned vehicle path planning method for improving DDPG algorithm
CN113353102A (en) * 2021-07-08 2021-09-07 重庆大学 Unprotected left-turn driving control method based on deep reinforcement learning
CN113353102B (en) * 2021-07-08 2022-11-25 重庆大学 Unprotected left-turn driving control method based on deep reinforcement learning
CN113715842A (en) * 2021-08-24 2021-11-30 华中科技大学 High-speed moving vehicle control method based on simulation learning and reinforcement learning
CN113741533A (en) * 2021-09-16 2021-12-03 中国电子科技集团公司第五十四研究所 Unmanned aerial vehicle intelligent decision-making system based on simulation learning and reinforcement learning
CN114609925A (en) * 2022-01-14 2022-06-10 中国科学院自动化研究所 Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
CN114609925B (en) * 2022-01-14 2022-12-06 中国科学院自动化研究所 Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
CN114104005B (en) * 2022-01-26 2022-04-19 苏州浪潮智能科技有限公司 Decision-making method, device and equipment of automatic driving equipment and readable storage medium
CN114444718A (en) * 2022-01-26 2022-05-06 北京百度网讯科技有限公司 Training method of machine learning model, signal control method and device
CN114104005A (en) * 2022-01-26 2022-03-01 苏州浪潮智能科技有限公司 Decision-making method, device and equipment of automatic driving equipment and readable storage medium
CN114925850A (en) * 2022-05-11 2022-08-19 华东师范大学 Deep reinforcement learning confrontation defense method for disturbance reward
CN114925850B (en) * 2022-05-11 2024-02-20 华东师范大学 Deep reinforcement learning countermeasure defense method for disturbance rewards
CN114708568A (en) * 2022-06-07 2022-07-05 东北大学 Pure vision automatic driving control system, method and medium based on improved RTFNet
CN114708568B (en) * 2022-06-07 2022-10-04 东北大学 Pure vision automatic driving control system, method and medium based on improved RTFNet

Also Published As

Publication number Publication date
CN112232490B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN112232490B (en) Vision-based deep imitation reinforcement learning driving strategy training method
CN112965499B (en) Unmanned vehicle driving decision-making method based on attention model and deep reinforcement learning
CN111061277B (en) Unmanned vehicle global path planning method and device
CN110745136A (en) Driving self-adaptive control method
CN113044064B (en) Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning
CN112731925A (en) Conical barrel identification and path planning and control method for unmanned formula racing car
CN114358128A (en) Method for training end-to-end automatic driving strategy
CN114035575B (en) Unmanned vehicle motion planning method and system based on semantic segmentation
Hu et al. Learning a deep cascaded neural network for multiple motion commands prediction in autonomous driving
CN113715842B (en) High-speed moving vehicle control method based on imitation learning and reinforcement learning
CN111645673B (en) Automatic parking method based on deep reinforcement learning
CN110281949B (en) Unified hierarchical decision-making method for automatic driving
CN110196587A (en) Vehicular automatic driving control strategy model generating method, device, equipment and medium
CN113255054A (en) Reinforcement learning automatic driving method based on heterogeneous fusion characteristics
CN110930811B (en) System suitable for unmanned decision learning and training
CN116595871A (en) Vehicle track prediction modeling method and device based on dynamic space-time interaction diagram
CN108921044A (en) Driver's decision feature extracting method based on depth convolutional neural networks
Gao et al. Autonomous driving based on modified sac algorithm through imitation learning pretraining
CN116382267B (en) Robot dynamic obstacle avoidance method based on multi-mode pulse neural network
Zhang et al. A convolutional neural network method for self-driving cars
CN112991744B (en) Automatic driving decision-making method and system suitable for long-distance urban road
CN114170488A (en) Automatic driving method based on condition simulation learning and reinforcement learning
Liu et al. End-to-end control of autonomous vehicles based on deep learning with visual attention
Liu et al. Personalized Automatic Driving System Based on Reinforcement Learning Technology
CN116048096B (en) Unmanned vehicle movement planning method based on hierarchical depth perception

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant