CN109143852B - Intelligent driving vehicle environment self-adaptive importing method under urban environment - Google Patents


Info

Publication number
CN109143852B
Authority
CN
China
Prior art keywords
vehicle
value
action
reward
import
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810780413.4A
Other languages
Chinese (zh)
Other versions
CN109143852A (en
Inventor
Chen Xuemei
Liu Gemeng
Du Mingming
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201810780413.4A priority Critical patent/CN109143852B/en
Publication of CN109143852A publication Critical patent/CN109143852A/en
Application granted granted Critical
Publication of CN109143852B publication Critical patent/CN109143852B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B 13/04 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B 13/042 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

The invention discloses an environment self-adaptive importing method for an intelligent driving vehicle in an urban environment, which comprises the steps of: extracting an initial state vector; calculating action variables according to a greedy strategy, executing the import action and simultaneously updating the import scene, selecting an import gap and an import action according to uniform probability if the action variables take a random action, or, if the agent selects the action, comparing the maximum action value functions of all candidate gaps, selecting the gap and action corresponding to the maximum value, and returning the target import gap and the agent import action; sensing the state vector at the next moment; calculating a reward value according to the environment feedback information; storing the initial state vector, the action variable, the state vector at the next moment and the reward value into a sample set, and evaluating and improving the strategy according to an LSQ method after enough samples are obtained; and repeating the steps until the merging is successful. The sample set and the learning time of the invention are smaller than those of the Q learning algorithm, and the success rate is high.

Description

Intelligent driving vehicle environment self-adaptive importing method under urban environment
Technical Field
The invention relates to an environment self-adaptive import method which comprehensively considers target gap selection and the expected import opportunity in a complex urban environment.
Background
The unmanned vehicle, as a future traffic development trend, has great potential in solving problems such as traffic safety and road congestion management. As the "brain" of the unmanned vehicle, the decision-making system embodies its level of intelligence; improving the generalization and adaptability of the decision-making system in a complex urban environment is very important for developing unmanned vehicles that can actually drive on the road. However, a traditional rule-based unmanned vehicle can only adapt to a single driving environment and cannot cope with complex and changeable real scenes, and its decisions may not meet the requirements of robustness and flexibility. Merging onto an expressway in an urban environment requires safe and effective decisions under multiple constraints such as short time and limited space, which places higher requirements on the decision-making system of an unmanned vehicle.
In terms of research on merging behavior strategies, Yang proposed a longitudinal control algorithm that guides the unmanned vehicle to merge into the main line and provides a speed strategy according to the distance from the target gap. Liu et al. used an improved game-theory framework to model the merging behavior on freeway ramps. Ran et al. focused on how merging vehicles travel along freeway ramps to the desired merging location and quantified the interaction between vehicles by modeling their acceleration and deceleration. The above studies all focus on the highway environment; high-density urban environments are rarely involved, most studies consider the tactical lane-change decision, and few describe the continuous lane-change process.
In terms of the application of reinforcement learning to driving behavior decisions, Abbeel and Ng considered the interaction between the vehicle and the surrounding environment, used inverse reinforcement learning to learn vehicle operation, reflected the influence of the environment on behavior through the reward function, and established a functional mapping between environmental influence factors and vehicle motion. The Xu Xin research team at the National University of Defense Technology addressed the obstacle avoidance and navigation problems of intelligent vehicles in the continuous state space of expressways based on the approximate policy iteration KLSPI algorithm. Shalev-Shwartz discussed a safe reinforcement learning method that divides the policy network into two parts and learns driving safety and comfort separately, but model validity was only verified in a simple simulation environment.
The above methods consider only a few merging indexes and cannot reproduce the merging experience of a human driver.
Disclosure of Invention
1. The invention aims to provide a novel environment self-adaptive import method for intelligent driving vehicles in an urban environment.
The method comprehensively considers evaluation indexes such as safety, comfort and timeliness, establishes a linearly weighted comprehensive reward value model, and sets an action space containing two-dimensional variables, namely a longitudinal speed decision variable and a transverse speed decision variable, which decouples the transverse and longitudinal motions of the unmanned vehicle, realizes continuous control of the import process, and improves the adaptability of the unmanned vehicle to the dynamic environment during import.
2. The technical scheme adopted by the invention is as follows.
The invention provides an environment self-adaptive importing method of an intelligent driving vehicle in an urban environment, which is characterized by comprising the following steps of:
extracting an initial state vector;
calculating action variables according to a greedy strategy, executing the import action and simultaneously updating the import scene; if the action variables take a random action, selecting the import gap and the import action according to uniform probability,
if the agent selects the action, each candidate gap comprises a front vehicle, a rear vehicle and the merging vehicle; the maximum action value functions of all candidate gaps are compared, the gap and the action corresponding to the maximum value are selected, and the target import gap and the agent import action are returned;
sensing a state vector at the next moment;
calculating an award value according to the environment feedback information;
storing the initial state vector, the action variable, the state vector at the next moment and the reward value into a sample set, and evaluating and improving the strategy according to an LSQ method after enough samples are obtained;
and repeating the steps until the merging is successful.
Further, the state space is described as a seven-dimensional vector space, wherein the first three dimensions are the position coordinates and speed information of the merging vehicle, and the last four dimensions are the longitudinal position coordinates and speed information of the front vehicle and the rear vehicle of the target lane in the simulation process.
Furthermore, the basis functions adopted for the initial state space include the time to collision, headway, relative distance and relative speed of the two vehicles and the motion state.
Further, the action variables include a longitudinal speed decision variable and a transverse speed decision variable.
Furthermore, the acceleration of the longitudinal speed decision variable in the action variables is discretized into five action values of rapid deceleration, deceleration, uniform speed, acceleration and rapid acceleration, and the transverse speed decision variable takes two action values, so that the action space of the longitudinal and transverse speed decision variables contains 10 actions.
Furthermore, the reward function for calculating the reward value is a linear weighting of the safety reward value, the reward value for success or failure of the task, the import efficiency reward value, the speed limit reward value and the comfort reward value.
Furthermore, the safety reward function is specifically as follows:
when a collision occurs or is imminent, a large negative reward (penalty) is given, and when the safety condition is met, the reward value is 0, so the weight of the safety reward value is a large negative value;
[Formula not reproduced: Rsafety(s, a) is a penalty term when min(dx10, dx02) < dis and 0 when the safety condition is met.]
dx10 and dx02 are the relative distances from the merging vehicle to the front vehicle and the rear vehicle of the target lane, respectively, and dis is the safety threshold for the relative distance between the merging vehicle and the front and rear vehicles of the target lane.
Further, the task success reward function:
[Formula not reproduced: Rtask(s, a) gives a positive reward once the vehicle has merged into the target lane with both dx10 and dx02 above the safe distance threshold dis1.]
dis1 is the safe distance threshold, and dx10, dx02 are the relative distances from the merging vehicle to the front vehicle and the rear vehicle of the target lane, respectively; when the unmanned vehicle merges successfully, a larger positive reward is given, and the weight is a large positive value.
Further, the import efficiency reward value function is:
[Formula not reproduced: Rtime(s, a) is positive when the merge is completed within the preset number of periods and negative otherwise.]
step represents the current period; when the unmanned vehicle merges successfully within the preset value, a positive reward is given, otherwise a negative reward is given, so the weight is positive.
Furthermore, the speed limit reward function is as follows:
[Formula not reproduced: Rrule(s, a) is 0 when the vehicle speed is within the road speed limit vlimit and negative when speeding.]
vlimit denotes the road speed limit. When the unmanned vehicle is within the speed limit range, the speed limit reward value is 0; if it speeds, a negative reward value is given, so the weight is positive.
Further, the comfort reward function:
the comfort degree in the driving process comprises characteristic indexes of acceleration and impact degree in the longitudinal and transverse directions, the impact degree refers to the change rate of the acceleration along with time, the comfort reward value considers the change of the longitudinal acceleration and is normalized as follows:
Figure BDA0001732416040000041
where | △ a | represents the longitudinal acceleration motion difference of two cycles, amaxRepresents the maximum acceleration, aminRepresents the maximum deceleration, when the acceleration difference is 0, the reward value is zero; in other cases, the acceleration changes constantly, the driving comfort decreases, and a negative reward is given, so the weight is a negative value.
3. The technical effect produced by the invention.
(1) Compared with the Q learning algorithm, the environment self-adaptive import method (LSPI algorithm) for intelligent driving vehicles in an urban environment has the following advantages: the discrete state space of Q learning restricts its generalization to the environment, samples cannot be fully learned, and information is lost; if the state space is discretized more finely, the amount of computation grows exponentially with the state space dimension and the storage requirement becomes larger, so the convergence time and the required sample set of Q learning are far larger than those of the LSPI algorithm, and its learning time is also longer.
(2) Comparison of the LSPI algorithm with the Q learning algorithm shows that the import success rate based on the LSPI algorithm gradually improves as training proceeds and finally reaches 86%, which shows that the method can autonomously learn the import strategy. The success rate of Q learning fluctuates around 25%; its import success rate is low and the applicability of the algorithm is not high.
Drawings
FIG. 1 is a graph comparing success rate of Q learning and LSPI algorithm.
Fig. 2 shows the comparison result between the import strategy gap selection and the real data.
Fig. 3 is a graph comparing the data of imported vehicle No. 2745 with the simulation experiment.
Fig. 4 is a graph comparing the data of imported vehicle No. 63 with the simulation experiment.
FIG. 5 is a flow chart of multi-target candidate gap selection.
Fig. 6 is a flowchart of the import strategy training based on the LSPI algorithm.
Detailed Description
Examples
The invention comprehensively considers target gap selection and the expected merging opportunity, and provides an environment self-adaptive import method based on a least squares policy iteration (LSPI) algorithm.
The method regards the front vehicle and rear vehicle of a candidate gap, together with the merging vehicle, as a unit merging system and performs reinforcement learning modeling on it. In the strategy optimization process, the maximum action value functions of all candidate gaps are compared, and the strategy corresponding to the maximum value is selected as the output strategy; a minimal sketch of this candidate-gap comparison is given below. In the reinforcement learning modeling of the unit system, evaluation indexes such as safety, comfort and timeliness are comprehensively considered, a linearly weighted comprehensive reward value model is established, and the action space is set to contain two-dimensional variables, namely a longitudinal speed decision variable and a transverse speed decision variable, so that the transverse and longitudinal motions of the unmanned vehicle are decoupled and continuous control of the merging process is realized.
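The sketch below illustrates the candidate-gap comparison of Fig. 5, assuming a linear action value Q(s, a) = ω·φ(s, a). The names phi, omega, actions and EPSILON are placeholder assumptions for illustration, not values defined by the patent.

```python
# Sketch of the multi-target candidate gap selection (Fig. 5), assuming a linear
# action value Q(s, a) = omega . phi(s, a). The names phi, omega, actions and
# EPSILON are placeholders, not values taken from the patent.
import random
import numpy as np

EPSILON = 0.1  # assumed exploration probability of the greedy strategy

def q_value(omega, phi, gap_state, action):
    """Approximate action value of one unit merging system (one candidate gap)."""
    return float(np.dot(omega, phi(gap_state, action)))

def select_gap_and_action(candidate_gaps, omega, phi, actions):
    """Compare the maximum action value functions of all candidate gaps and
    return the gap/action pair with the overall maximum; with probability
    EPSILON, pick a gap and an action uniformly at random instead."""
    if random.random() < EPSILON:
        return random.choice(candidate_gaps), random.choice(actions)
    best = None
    for gap_state in candidate_gaps:          # each candidate gap is a unit system state
        for action in actions:
            q = q_value(omega, phi, gap_state, action)
            if best is None or q > best[0]:
                best = (q, gap_state, action)
    return best[1], best[2]                   # target import gap and agent import action
```

Because every candidate gap contributes its own unit-system state, adding or removing candidate gaps does not change the learned value function, only the set of states compared at selection time.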
Import strategy modeling based on the LSPI algorithm:
(1) state space
The unit system state space of the LSPI algorithm is described as a seven-dimensional vector space (x0, y0, v0, x1, v1, x2, v2), wherein (x0, y0, v0) are the position coordinates and speed information of the merging vehicle, and (x1, v1, x2, v2) represent the longitudinal position coordinates and speed information of the front vehicle and the rear vehicle of the target lane during simulation.
(2) Basis function establishment
The basis functions, which are also referred to as features in some cases, are generally selected based on empirical knowledge; commonly used basis functions include Gaussian radial basis functions, polynomial basis functions, and the like. The invention includes the time to collision (TTC) and headway (gti) of the two vehicles in the unit system, the relative distance (dxi) and relative speed (dvi), and part of the unmanned vehicle's own state information (y0, v0) in the basis functions, which comprise 14 dimensions, as shown in Table 1; a sketch of the feature construction follows the table.
TABLE 1 basis function establishment
[Table 1 (image not reproduced): the 14 basis-function dimensions built from TTC, headway, relative distance, relative speed and the merging vehicle's own state]
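Since Table 1 itself is not reproduced, the exact composition of the 14 dimensions is unknown here; the sketch below only builds the features named in the text (TTC, headway, relative distance and relative speed for the front and rear vehicles of the candidate gap, plus the merging vehicle's own y0 and v0) from the seven-dimensional unit-system state, using standard definitions of TTC and headway as an assumption.

```python
# Sketch of the gap-related features named in the text, computed from the
# seven-dimensional unit-system state (x0, y0, v0, x1, v1, x2, v2). The TTC and
# headway definitions below are common ones assumed for illustration; the full
# 14-dimensional basis of Table 1 is not reproduced in this text.
import numpy as np

def gap_features(x0, y0, v0, x1, v1, x2, v2, eps=1e-6):
    dx1, dv1 = x1 - x0, v0 - v1      # relative distance / closing speed to the front vehicle
    dx2, dv2 = x0 - x2, v2 - v0      # relative distance / closing speed to the rear vehicle
    ttc1 = dx1 / (dv1 + eps)         # time to collision with the front vehicle
    ttc2 = dx2 / (dv2 + eps)         # time to collision with the rear vehicle
    gt1 = dx1 / (v0 + eps)           # time headway to the front vehicle
    gt2 = dx2 / (v2 + eps)           # time headway of the rear vehicle
    # gap features plus the merging vehicle's own lateral position and speed
    return np.array([ttc1, gt1, dx1, dv1, ttc2, gt2, dx2, dv2, y0, v0])
```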
(3) Action space
In order to simplify the action space of the model and ensure the comfort requirement, the invention discretizes the longitudinal acceleration into five action values: rapid deceleration, deceleration, uniform speed, acceleration and rapid acceleration, corresponding respectively to (-4, -2, 0, 2, 4). Combined with the two transverse speed decision values, the action space thus contains 10 actions, as shown in Table 2 and the sketch after it.
TABLE 2 action space settings
[Table 2 (image not reproduced): the 10 actions formed by combining the five longitudinal acceleration values with the two transverse speed decision values]
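A sketch of the 10-action space follows. The five longitudinal acceleration values are taken from the text; the encoding of the two transverse speed decision values is an assumption, since Table 2 is not reproduced.

```python
# Sketch of the action space: 5 longitudinal acceleration values x 2 transverse
# speed decisions = 10 actions. The lateral encoding is assumed (Table 2 is not
# reproduced in this text).
from itertools import product

LONGITUDINAL_ACCEL = [-4, -2, 0, 2, 4]   # rapid deceleration ... rapid acceleration
LATERAL_DECISION = [0, 1]                # assumed: 0 = hold lateral position, 1 = move toward target lane

ACTIONS = list(product(LONGITUDINAL_ACCEL, LATERAL_DECISION))
assert len(ACTIONS) == 10
```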
(4) Reward function
In the merging process, the first condition the unmanned vehicle must satisfy is safety. Secondly, the merging process requires that the lane change be completed within limited time and space constraints, so the efficiency of the merging process is also one of the evaluation indexes. The method refers to real-world merging behavior on urban expressway ramps, aims to comply with traffic rules, and takes the comfort during driving into account, so the speed limit and comfort are also included in the evaluation indexes. In view of the above, the invention establishes a linearly weighted comprehensive reward value model, as shown in formula (1); a sketch combining all five terms is given after the comfort reward function below.
R(s, a) = μ1·Rsafety(s, a) + μ2·Rtask(s, a) + μ3·Rtime(s, a) + μ4·Rrule(s, a) + μ5·Rcomfort(s, a)    (1)
Rsafety(s, a) represents the safety reward value and μ1 is the safety reward value weight; Rtask(s, a) represents the reward value for success or failure of the task and μ2 is the task success reward value weight; Rtime(s, a) represents the import efficiency reward value and μ3 is the import efficiency reward value weight; Rrule(s, a) represents the speed limit reward value and μ4 is the speed limit reward value weight; Rcomfort(s, a) represents the comfort reward value and μ5 is the comfort reward value weight.
In the formulas below, dx10 and dx02 denote the relative distances from the merging vehicle to the front vehicle and the rear vehicle of the target lane, respectively.
1) Security reward function
During driving, safety is the most important evaluation index. When a collision occurs or is imminent, a large negative reward (penalty) is given; when the safety condition is met, the reward value is 0. Therefore, the safety reward value weight μ1 is a large negative value.
[Formula not reproduced: Rsafety(s, a) is a penalty term when min(dx10, dx02) < dis and 0 when the safety condition is met.]
where dis is the safety threshold for the relative distance between the merging vehicle and the front and rear vehicles of the target lane.
2) Task success reward function
The task success reward value is the reward fed back when the import task is completed safely and efficiently. For the unit import system of the invention:
[Formula not reproduced: Rtask(s, a) gives a positive reward once the vehicle has merged into the target lane with both dx10 and dx02 above the safe distance threshold dis1.]
dis1 is the safe distance threshold. When the unmanned vehicle merges successfully, a larger positive reward is given, and the weight μ2 is a large positive value.
3) Import efficiency reward function
The merging behavior is required to complete the lane change task efficiently within certain space constraints. Therefore, the invention designs the import efficiency reward value according to the timeliness of completing the import task:
[Formula not reproduced: Rtime(s, a) is positive when the merge is completed within the preset number of periods and negative otherwise.]
step denotes the current period. When the unmanned vehicle merges successfully within 6.5 seconds, a positive reward is given; otherwise, a negative reward is given. Therefore, the weight μ3 is positive.
4) Speed limit reward function
During driving, traffic laws and regulations must be complied with; the speed limit reward value is introduced to keep the speed of the unmanned vehicle within a reasonable range.
[Formula not reproduced: Rrule(s, a) is 0 when the vehicle speed v0 is within the road speed limit vlimit and negative when speeding.]
vlimit denotes the road speed limit. When the unmanned vehicle is within the speed limit range, the speed limit reward value is 0; if it speeds, a negative reward value is given. Therefore, the weight μ4 is positive.
5) Comfort reward function
The comfort during driving involves characteristic indexes of acceleration and jerk in the longitudinal and transverse directions; jerk is the rate of change of acceleration over time. In the study of the merging process only a simple two-degree-of-freedom kinematic model is considered, so the comfort reward value mainly considers the change of longitudinal acceleration and is normalized as follows:
[Formula not reproduced: Rcomfort(s, a) is 0 when the acceleration difference |Δa| between two cycles is 0 and otherwise takes a value normalized by the acceleration range from amin to amax.]
where |Δa| denotes the difference in longitudinal acceleration between two cycles, amax denotes the maximum acceleration and amin denotes the maximum deceleration. When the acceleration difference is 0, the reward value is zero; in other cases the acceleration changes constantly, driving comfort decreases and a negative reward is given, so the weight μ5 is negative.
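The sketch below combines the five reward terms according to formula (1) and the piecewise descriptions above. The indicator values (1 or -1), the weights mu and the period limit step_limit are placeholder assumptions consistent with the stated signs, not numbers taken from the patent.

```python
# Sketch of the linearly weighted reward of formula (1). The indicator values,
# the weights mu and step_limit are assumptions consistent with the signs
# described above, not values taken from the patent.
def reward(dx10, dx02, dis, dis1, merged, step, step_limit,
           v0, v_limit, delta_a, a_max, a_min,
           mu=(-100.0, 50.0, 1.0, 1.0, -1.0)):       # (mu1..mu5), assumed magnitudes
    mu1, mu2, mu3, mu4, mu5 = mu
    # safety term: flags collision / near-collision; zero when the safety condition holds
    r_safety = 1.0 if min(dx10, dx02) < dis else 0.0
    # task-success term: positive once the vehicle has merged with both gaps above dis1
    r_task = 1.0 if merged and min(dx10, dx02) > dis1 else 0.0
    # import efficiency term: positive within the preset number of periods, negative after
    r_time = 1.0 if step <= step_limit else -1.0
    # speed limit term: zero inside the limit, negative when speeding
    r_rule = 0.0 if v0 <= v_limit else -1.0
    # comfort term: change of longitudinal acceleration normalized by the acceleration range
    r_comfort = abs(delta_a) / (a_max - a_min)
    return (mu1 * r_safety + mu2 * r_task + mu3 * r_time
            + mu4 * r_rule + mu5 * r_comfort)
```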
Example 2
The specific process of the optimization training of the environment self-adaptive import strategy method based on the LSPI algorithm is as follows; a sketch of the corresponding training loop is given after the steps:
(1) Initialize the strategy π0 and the sample set D0;
(2) Run the Vissim + PreScan combined traffic simulation platform;
(3) Obtain the import environment information from the simulation environment and extract the state vector st;
(4) Compute the action variable at according to the greedy strategy; if a random action is taken, select an import gap and an import action with uniform probability; if the agent selects the action, compare the maximum action value functions of all candidate gaps, select the gap and action corresponding to the maximum value, and return the target import gap and the agent import action. The simulation platform executes the import action at and updates the import scene.
(5) The unmanned vehicle senses the state vector st+1 at the next moment through its sensors;
(6) Calculate the reward value Rt based on the environmental feedback information;
(7) Store (st, at, st+1, Rt) into the sample set; when enough samples have been collected, evaluate and improve the strategy according to the LSQ method.
(8) Repeat steps 3-7 until the merge succeeds;
(9) Repeat steps 2-8 until the strategy converges or the maximum number of iterations is reached.
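A condensed sketch of the training loop in steps (1)-(9) is given below. The simulation platform is abstracted into a generic env object with reset/step methods (the patent uses the Vissim + PreScan co-simulation), phi is the state-action basis-function map, actions is the 10-action set, and the discount factor gamma is assumed; the LSQ evaluation step is written as a standard LSTDQ least-squares solve, and the termination test uses the sum-of-squared-differences criterion on ω (< 0.01) mentioned in the experimental section.

```python
# Sketch of LSPI-based import strategy training, steps (1)-(9). env, phi and
# actions are placeholder interfaces; the LSQ policy evaluation is written as
# a standard LSTDQ least-squares solve.
import numpy as np

def lstdq(samples, phi, actions, omega, gamma=0.95):
    """Least-squares policy evaluation over the collected (s, a, r, s') samples."""
    k = len(phi(*samples[0][:2]))                       # feature dimension
    A, b = np.zeros((k, k)), np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        # greedy action of the current policy in the next state
        a_next = max(actions, key=lambda u: np.dot(omega, phi(s_next, u)))
        A += np.outer(f, f - gamma * phi(s_next, a_next))
        b += f * r
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)     # small ridge term for stability

def train_lspi(env, phi, actions, epsilon=0.1, max_iter=5000, tol=0.01):
    omega = np.zeros(len(phi(env.reset(), actions[0])))  # (1) initial strategy and empty sample set
    samples = []
    for _ in range(max_iter):                            # (9) outer loop until convergence
        s = env.reset()                                  # (2)-(3) run the platform, read the state
        done = False
        while not done:                                  # (8) one import episode
            if np.random.rand() < epsilon:               # (4) greedy strategy with random exploration
                a = actions[np.random.randint(len(actions))]
            else:
                a = max(actions, key=lambda u: np.dot(omega, phi(s, u)))
            s_next, r, done = env.step(a)                # (5)-(6) next state and reward value
            samples.append((s, a, r, s_next))            # (7) store the sample
            s = s_next
        omega_new = lstdq(samples, phi, actions, omega)  # (7) LSQ evaluation and improvement
        if np.sum((omega_new - omega) ** 2) < tol:       # termination: sum of squared differences < 0.01
            return omega_new
        omega = omega_new
    return omega
```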
Experimental verification
(1) Simulation comparison experiment of the Q learning and LSPI algorithms:
1) A comparison experiment was designed in the unit import system strategy learning stage; the table below shows the number of samples and the convergence time required for the unmanned vehicle unit system strategy to converge. Comparing the LSPI algorithm with the Q learning algorithm, the number of samples required by the LSPI algorithm during optimization training is lower than that of the Q learning algorithm, and the number of iterations on the strategy learning problem is relatively small. The specific comparison is shown in the following table:
TABLE 3 comparison of simulation results for LSPI and Q learning algorithms
[Table 3 (image not reproduced): number of samples and convergence time required by the LSPI and Q learning algorithms]
The termination condition of the Q learning algorithm is that the sum of squared differences of the Q value table between two iterations is less than 1, and the termination condition of the LSPI algorithm is that the sum of squared differences of the parameter vector ω between two iterations is less than 0.01. The data in the table are mean values obtained from multiple experiments. Analysis shows that the number of samples required for the Q learning algorithm to converge is far larger than that of the LSPI algorithm, and its learning time is also longer. The discrete state space restricts the generalization of Q learning to the environment, and the samples cannot be fully learned, so information is lost. If the state space is discretized more finely, the amount of computation grows exponentially with the state space dimension and the storage requirement becomes larger, so the convergence time and the required sample set of the Q learning algorithm are far larger than those of the LSPI algorithm.
2) A typical 3-gap import scene is designed on the combined simulation platform for strategy optimization training; through interactive feedback with the simulation environment, the algorithm agent continuously explores and optimizes the import strategy.
As shown in fig. 1, the maximum number of iterations of each set of experiments is 5000; the strategy is evaluated once every 500 iterations and the success rate of the import behavior is recorded. The import success rate based on the LSPI algorithm gradually increases with the number of training iterations and finally reaches 86%, which shows that the method can autonomously learn the import strategy. The success rate of Q learning fluctuates around 25%; its import success rate is low and the applicability of the algorithm is not high.
(2) Verification against real import data.
As shown in fig. 2, 30 sets of real data from the NGSIM (Next Generation Simulation) US101 data set are selected to verify the import strategy. Under the same conditions, the gap selection after training is slightly conservative: the original gap is selected in 73% of cases, and the proportion of selecting the gap ahead is lower than in the real import data.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (9)

1. An environment self-adaptive importing method of an intelligent driving vehicle in an urban environment is characterized by comprising the following steps:
extracting an initial state vector;
calculating action variables according to a greedy strategy, executing the import action and simultaneously updating the import scene, if the action variables take a random action, selecting the import gap and the import action according to uniform probability,
if the agent selects the action, each candidate gap comprises a front vehicle, a rear vehicle and the merging vehicle; the maximum action value functions of all the candidate gaps are compared, the gap and the action corresponding to the maximum value are selected, and the target import gap and the agent import action are returned;
sensing a state vector at the next moment;
calculating an award value according to the environment feedback information;
storing the initial state vector, the action variable, the state vector at the next moment and the reward value into a sample set, and after enough samples are obtained, evaluating and improving the import strategy model based on the LSPI algorithm;
repeatedly executing the steps until the merging is successful;
the LSPI algorithm is imported into strategy modeling:
(1) state space
The unit system state space of the LSPI algorithm is described as a seven-dimensional vector space (x0, y0, v0, x1, v1, x2, v2), wherein (x0, y0, v0) are the position coordinates and speed information of the merging vehicle, and (x1, v1, x2, v2) represent the longitudinal position coordinates and speed information of the front vehicle and the rear vehicle of the target lane in the simulation process;
(2) basis function establishment
The time to collision TTC and headway gti of the two vehicles in the unit system, the relative distance dxi and relative speed dvi, and part of the unmanned vehicle's own state information (y0, v0) are included in the basis functions, and the basis functions comprise 14 dimensions;
(3) Action space
The longitudinal acceleration is discretized into five action values of rapid deceleration, deceleration, uniform speed, acceleration and rapid acceleration, corresponding respectively to (-4, -2, 0, 2, 4);
(4) reward function
The speed limit and the comfort are included in the evaluation indexes, and a linearly weighted comprehensive reward value model is established, as shown in formula (1),
R(s, a) = μ1·Rsafety(s, a) + μ2·Rtask(s, a) + μ3·Rtime(s, a) + μ4·Rrule(s, a) + μ5·Rcomfort(s, a)    (1)
Rsafety(s, a) represents the safety reward value and μ1 is the safety reward value weight, Rtask(s, a) represents the reward value for success or failure of the task and μ2 is the task success reward value weight, Rtime(s, a) represents the import efficiency reward value and μ3 is the import efficiency reward value weight, Rrule(s, a) represents the speed limit reward value and μ4 is the speed limit reward value weight, Rcomfort(s, a) represents the comfort reward value and μ5 is the comfort reward value weight.
2. The adaptive importing method for the environment of the intelligent driving vehicle in the urban environment according to claim 1, wherein: the state space is described as a seven-dimensional vector space, wherein the first three dimensions are the position coordinates and speed information of the merging vehicle, and the last four dimensions are the longitudinal position coordinates and speed information of the front vehicle and the rear vehicle of the target lane in the simulation process.
3. The adaptive importing method for the environment of the intelligent driving vehicle in the urban environment according to claim 1, wherein: the basis functions adopted for the initial state space include the time to collision, headway, relative distance, relative speed and motion state of the two vehicles.
4. The adaptive importing method for the environment of the intelligent driving vehicle in the urban environment according to claim 1, wherein: the action variables comprise a longitudinal speed decision variable and a transverse speed decision variable.
5. The adaptive importing method for the environment of the intelligent driving vehicle in the urban environment according to claim 4, wherein: the acceleration of the longitudinal speed decision variable in the action variables is discretized into five action values of rapid deceleration, deceleration, uniform speed, acceleration and rapid acceleration, and the transverse speed decision variable takes two action values, so that the action space of the longitudinal and transverse speed decision variables contains 10 actions.
6. The adaptive importing method for the environment of the intelligent driving vehicle in the urban environment according to claim 1, wherein the safety reward value is specifically:
when a collision occurs or is imminent, a large negative reward is given, and when the safety condition is met, the reward value is 0, so that the weight of the safety reward value is a large negative value;
[Formula not reproduced: Rsafety(s, a) is a penalty term when min(dx10, dx02) < dis and 0 when the safety condition is met.]
dx10 and dx02 are the relative distances from the merging vehicle to the front vehicle and the rear vehicle of the target lane, respectively, and min(dx10, dx02) is the minimum of these relative distances; dis is the safety threshold for the relative distance between the merging vehicle and the front and rear vehicles of the target lane.
7. The adaptive importing method for intelligent driving vehicle environment in urban environment according to claim 1, wherein the reward value for success or failure of task is:
[Formula not reproduced: Rtask(s, a) gives a positive reward once the vehicle has merged into the target lane with both dx10 and dx02 above the safe distance threshold dis1.]
dis1 is the safe distance threshold, y0 is the vehicle ordinate, and dx10, dx02 are the relative distances from the merging vehicle to the front vehicle and the rear vehicle of the target lane, respectively; when the unmanned vehicle merges successfully, a larger positive reward is given, and the weight is a large positive value.
8. The adaptive import method for the environment of the intelligent driving vehicle under the urban environment according to claim 1, wherein the import efficiency reward value is as follows:
[Formula not reproduced: Rtime(s, a) is positive when the merge is completed within the preset number of periods and negative otherwise.]
step represents the current period; when the unmanned vehicle merges successfully within the preset value, a positive reward is given, otherwise a negative reward is given, so the weight is positive.
9. The adaptive importing method for the environment of the intelligent driving vehicle in the urban environment according to claim 1, wherein the speed limit reward value is as follows:
[Formula not reproduced: Rrule(s, a) is 0 when the vehicle speed v0 is within the road speed limit vlimit and negative when speeding.]
vlimit denotes the road speed limit; when the unmanned vehicle is within the speed limit range, the speed limit reward value is 0, and if it speeds, a negative reward value is given, so the weight is positive; v0 represents the vehicle speed;
the comfort reward value is as follows:
the comfort during driving involves characteristic indexes of acceleration and jerk in the longitudinal and transverse directions, jerk being the rate of change of acceleration over time; the comfort reward value considers the change of the longitudinal acceleration and is normalized as follows:
[Formula not reproduced: Rcomfort(s, a) is 0 when the acceleration difference |Δa| between two cycles is 0 and otherwise takes a value normalized by the acceleration range from amin to amax.]
where |Δa| denotes the difference in longitudinal acceleration between two cycles, amax denotes the maximum acceleration and amin denotes the maximum deceleration; when the acceleration difference is 0, the reward value is zero; in other cases, the acceleration changes constantly, driving comfort decreases, and a negative reward is given, so the weight is a negative value.
CN201810780413.4A 2018-07-17 2018-07-17 Intelligent driving vehicle environment self-adaptive importing method under urban environment Active CN109143852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810780413.4A CN109143852B (en) 2018-07-17 2018-07-17 Intelligent driving vehicle environment self-adaptive importing method under urban environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810780413.4A CN109143852B (en) 2018-07-17 2018-07-17 Intelligent driving vehicle environment self-adaptive importing method under urban environment

Publications (2)

Publication Number Publication Date
CN109143852A CN109143852A (en) 2019-01-04
CN109143852B true CN109143852B (en) 2020-09-18

Family

ID=64800630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810780413.4A Active CN109143852B (en) 2018-07-17 2018-07-17 Intelligent driving vehicle environment self-adaptive importing method under urban environment

Country Status (1)

Country Link
CN (1) CN109143852B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI705377B (en) * 2019-02-01 2020-09-21 緯創資通股份有限公司 Hardware boost method and hardware boost system
CN111243296B (en) * 2020-01-15 2020-11-27 清华大学 Ramp confluence cooperative control method and system based on confluence time optimization
CN111625989B (en) * 2020-03-18 2024-02-13 北京联合大学 Intelligent vehicle incoming flow method and system based on A3C-SRU
CN112590792B (en) * 2020-12-18 2024-05-10 的卢技术有限公司 Vehicle convergence control method based on deep reinforcement learning algorithm
CN113110043B (en) * 2021-03-25 2022-04-08 南京航空航天大学 Vehicle convergence control method considering workshop interaction
CN115909780B (en) * 2022-11-09 2023-07-21 江苏大学 Expressway import control system and method based on intelligent networking and RBF neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101789183A (en) * 2010-02-10 2010-07-28 北方工业大学 Self-adaptive control system and method for entrance ramp
CN105912814A (en) * 2016-05-05 2016-08-31 苏州京坤达汽车电子科技有限公司 Lane change decision model of intelligent drive vehicle
CN106601002A (en) * 2016-11-23 2017-04-26 苏州大学 City expressway access ramp vehicle pass guiding system in car networking environment and guiding method thereof
CN107700293A (en) * 2017-08-31 2018-02-16 Jin Yong Automatic running transit system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9108652B2 (en) * 2012-07-09 2015-08-18 General Electric Company Method and system for timetable optimization utilizing energy consumption factors

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101789183A (en) * 2010-02-10 2010-07-28 北方工业大学 Self-adaptive control system and method for entrance ramp
CN105912814A (en) * 2016-05-05 2016-08-31 苏州京坤达汽车电子科技有限公司 Lane change decision model of intelligent drive vehicle
CN106601002A (en) * 2016-11-23 2017-04-26 苏州大学 City expressway access ramp vehicle pass guiding system in car networking environment and guiding method thereof
CN107700293A (en) * 2017-08-31 2018-02-16 Jin Yong Automatic running transit system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lane change and merging control method for unmanned vehicles under vehicle-vehicle cooperation; Zhang Ronghui et al.; China Journal of Highway and Transport; 2018-04-30; Vol. 31, No. 4; pp. 180-191 *

Also Published As

Publication number Publication date
CN109143852A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN109143852B (en) Intelligent driving vehicle environment self-adaptive importing method under urban environment
CA3065617C (en) Method for predicting car-following behavior under apollo platform
CN110969848B (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
CN109733415B (en) Anthropomorphic automatic driving and following model based on deep reinforcement learning
Xin et al. Intention-aware long horizon trajectory prediction of surrounding vehicles using dual LSTM networks
Wang et al. A novel pure pursuit algorithm for autonomous vehicles based on salp swarm algorithm and velocity controller
CN113110592A (en) Unmanned aerial vehicle obstacle avoidance and path planning method
CN111931902A (en) Countermeasure network generation model and vehicle track prediction method using the same
CN115257745A (en) Automatic driving lane change decision control method based on rule fusion reinforcement learning
Yao et al. Optimizing traffic flow efficiency by controlling lane changes: Collective, group, and user optima
Ye et al. Meta reinforcement learning-based lane change strategy for autonomous vehicles
CN114781072A (en) Decision-making method and system for unmanned vehicle
Venkatesh et al. Connected and automated vehicles in mixed-traffic: Learning human driver behavior for effective on-ramp merging
Ulfsjöö et al. On integrating POMDP and scenario MPC for planning under uncertainty–with applications to highway driving
CN110390398B (en) Online learning method
US20240202393A1 (en) Motion planning
Yuan et al. From Naturalistic Traffic Data to Learning-Based Driving Policy: A Sim-to-Real Study
Ma et al. Evolving testing scenario generation method and intelligence evaluation framework for automated vehicles
CN114701517A (en) Multi-target complex traffic scene automatic driving solution based on reinforcement learning
Cho A hierarchical learning approach to autonomous driving using rule specifications
WO2021148113A1 (en) Computing system and method for training a traffic agent in a simulation environment
CN114627640B (en) Dynamic evolution method of intelligent network-connected automobile driving strategy
Yang et al. Deep Reinforcement Learning Lane-Changing Decision Algorithm for Intelligent Vehicles Combining LSTM Trajectory Prediction
Ramkumar Realistic Speed Control of Agents in Traffic Simulation
Bhattacharyya Modeling Human Driving from Demonstrations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Chen Xuemei

Inventor after: Liu Gemeng

Inventor after: Du Mingming

Inventor before: Chen Xuemei

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant