CN110568760A - Parameterized learning decision control system and method suitable for lane changing and lane keeping


Info

Publication number
CN110568760A
Authority
CN
China
Prior art keywords
vehicle
lane
state
module
learning
Prior art date
Legal status
Granted
Application number
CN201910952119.1A
Other languages
Chinese (zh)
Other versions
CN110568760B (en)
Inventor
高炳钊
张羽翔
吕吉东
陈虹
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University
Priority to CN201910952119.1A
Publication of CN110568760A
Application granted
Publication of CN110568760B
Legal status: Active

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion; electric
    • G05B13/04 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion; electric, involving the use of models or simulators
    • G05B13/042 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion; electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

The invention belongs to the technical field of the design of advanced driver-assistance and unmanned driving systems for automobiles, and in particular relates to a parameterized learning decision control system and method suitable for lane-changing and lane-keeping behaviors. Based on a parameterized decision framework, the invention designs a parameterized learning control system suitable for lane-changing and lane-keeping behaviors, comprising a learning decision method designed on a reinforcement learning algorithm for the vehicle lane-changing and lane-keeping scenario, and a trajectory planning controller which, after corresponding parameterization, is applicable to straight roads and curves in this scenario.

Description

Parameterized learning decision control system and method suitable for lane changing and lane keeping
Technical Field
The invention belongs to the technical field of the design of advanced driver-assistance and unmanned driving systems for automobiles, and in particular relates to a parameterized learning decision control system and method suitable for lane-changing and lane-keeping behaviors.
Background
With the continuous development of intelligent driver-assistance and unmanned driving technology, motion control systems of different forms are continually being proposed and applied. In the motion trajectory planning and control problem, for example, to make the system more functional and adaptive to various scenarios under a hierarchical vehicle control framework, the integrated low-level motion controller needs to perform various driving tasks in various scenarios, such as lane changing and lane keeping. At the same time, each actuation subsystem, such as the driving, braking, and steering systems, must be capable of coordinated control and of stable switching between different tasks. The parameterized decision framework proposed in the prior art can meet these requirements: it is a trajectory planning control method based on a parametric decision framework, built on model predictive control, that integrates trajectory planning and motion control across various scenarios. This trajectory planning and control method has advantages and development potential because it takes a simple form and can be adapted to various driving tasks and conditions. Under this trajectory planning control framework, human-like driving decisions are described at the decision control layer as several decision parameters closely related to trajectory characteristics. Furthermore, the solution of the different decision parameters needs to adapt to changeable driving conditions and to continuously adapt to the behavior and feedback of real human drivers in real driving scenarios; such a continuous learning effect is difficult to achieve with a model-based control method. Therefore, for the design of the decision-layer control algorithm, a reinforcement learning algorithm, which among learning algorithms has advantages in sequential control and continuous learning, can be used. Under urban or highway conditions, lane changing and lane keeping are the most common behaviors, and their decision-parameter feature relationships are simple and consistent.
Disclosure of Invention
The invention provides a parameterized learning decision control system and a parameterized learning decision control method suitable for lane-changing and lane-keeping behaviors, comprising a learning decision method designed on a reinforcement learning algorithm and a trajectory planning controller which, after corresponding parameterization, is applicable to straight roads and curves in this scenario.
The technical scheme of the invention is described below with reference to the attached drawings:
A parameterized learning decision control system suitable for lane changing and lane keeping is characterized by comprising a perception-signal collection and data storage module A, a learning decision parameter module B, a trajectory planning and motion control module C, and an execution tracking module D.
The perception-signal collection and data storage module A is used to obtain the running-state information of the host vehicle and of the vehicles in the surrounding environment, to process the signals, and to collect data for the subsequent learning and training of the decision parameters.
The learning decision parameter module B is used to learn from the collected decision data; when the quantity of data collected by the system reaches a certain threshold, or the data have been updated to a certain degree, the system continues learning, and suitable decision-parameter values are learned by a reinforcement-learning method.
The trajectory planning and motion control module C is used for real-time trajectory planning and motion control of the vehicle; based on a model predictive control method, it determines the controller form from the specific decision-parameter values output by the learning decision parameter module B and the current road type judged by the perception-signal collection and data storage module A, and optimizes the trajectory in a rolling (receding-horizon) manner.
The execution tracking module D is used for tracking control of the control quantities output by the algorithm, and is implemented with a PID (proportional-integral-derivative) controller to ensure control precision.
The perception-signal collection and data storage module A is connected to the learning decision parameter module B, the trajectory planning and motion control module C, and the execution tracking module D; the learning decision parameter module B is connected to the trajectory planning and motion control module C; and the trajectory planning and motion control module C is connected to the execution tracking module D.
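The module interconnections above can be summarized in a minimal structural sketch; all class, method, and field names below are illustrative assumptions, not identifiers from the patent.

```python
# Minimal structural sketch of the four-module dataflow (A -> B -> C -> D);
# the placeholder bodies mark where each module's algorithm would run.

class PerceptionStorageA:
    def collect(self):
        # host lane/speed, per-position environment records, and road type
        return {"L_h": 3, "v_h": 20.0, "road_type": "straight", "env": {}}

class LearningDecisionB:
    def decide(self, state):
        # learned policy output (Ty, tf, a_tar); fixed here as a placeholder
        return (0.0, 3.0, 0.0)

class PlanningControlC:
    def plan(self, decision, state):
        # model-predictive trajectory optimization would go here;
        # returns the control quantities (a, delta_f)
        return (decision[2], 0.0)

class ExecutionTrackingD:
    def track(self, control):
        # PID tracking of the commanded (a, delta_f) on the actuators
        print("tracking control:", control)

def control_step(A, B, C, D):
    state = A.collect()                 # A feeds B, C, and D
    decision = B.decide(state)          # B feeds C with decision parameters
    control = C.plan(decision, state)   # C feeds D with control quantities
    D.track(control)

control_step(PerceptionStorageA(), LearningDecisionB(),
             PlanningControlC(), ExecutionTrackingD())
```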
A method of the parameterized learning decision control system for lane changing and lane keeping, the method comprising the following steps:
Step one: obtain, through the perception-signal collection and data storage module A, the state information of the host vehicle and of the environmental vehicles required by the vehicle control algorithm, comprising: the lane, speed, and acceleration of the surrounding vehicles and their relative distances to the host vehicle, taking the lane as reference, obtained by means of the on-board camera and radar environment-sensing elements of the on-board intelligent sensing module; and the driving intention of each environmental vehicle, namely lane keeping or lane changing, obtained from its deviation from the lane centerline or from turn-signal information, together with the lane and speed of the host vehicle; and store this information in the module.
Step two: learn appropriate decision-parameter values, namely the specific values of the behavior-terminal lateral offset, the action time, and the acceleration/deceleration behavior, through the learning decision parameter module B; discretize the two continuous variables, action time and acceleration/deceleration behavior, over their value-range spaces to obtain a discrete action space; perform the state design and return design of a kernel-based least-squares policy-iteration reinforcement learning method; and learn with the reinforcement learning algorithm once the quantity of data collected by the system reaches a certain threshold.
Step three: perform the online optimization solution for trajectory planning and motion control through the trajectory planning and motion control module C according to the decision-parameter values output by the learning decision parameter module B, using a state-space equation containing the vehicle dynamics equations and a six-dimensional state vector, and establishing constraint equations with terminal-state constraints so that the execution of the action can match different road types. The decision parameters corresponding to the lane-changing and lane-keeping behavior scenarios are unified and determined, namely the behavior-terminal lateral offset, the action time, and the acceleration/deceleration behavior, corresponding respectively to the terminal lateral-offset equality constraint, the prediction horizon, and the acceleration reference term of the objective function in the model predictive controller. For the two different road conditions, straight road and curve, two different sets of terminal-state equality constraints are used: under the straight-road condition the terminal lateral offset, heading angle, lateral velocity, and yaw rate of the vehicle are constrained, whereas under the curve condition only the terminal lateral displacement and heading angle are constrained.
Step four: perform tracking control of the control quantities output by the algorithm through the execution tracking module D, using a PID controller to ensure control precision.
The invention has the following beneficial effects:
1. The invention designs a parameterized learning control system suitable for lane-changing and lane-keeping behaviors, and uses consistent driving-decision and trajectory-planning forms across different driving tasks and environments.
2. The invention uses a learning decision method designed on a reinforcement learning algorithm, whose decision simultaneously comprises three variables: behavior-terminal lateral offset, action time, and acceleration/deceleration behavior.
3. The invention performs trajectory planning and motion control by the online optimization solution of the decision-parameter values using a model predictive control method, with different terminal-state constraints suited to different driving tasks and road conditions.
Drawings
FIG. 1 is a schematic view of position numbers of a host vehicle and an environmental vehicle;
FIG. 2 is a block diagram of the system architecture of the present invention;
FIG. 3 is a general flow diagram of the system of the present invention;
FIG. 4 is a lane change diagram of the host vehicle (H) and the environmental vehicles (N1-N8) under scene 1;
FIG. 5 is a lane change diagram of the host vehicle (H) and the environmental vehicles (N1-N8) under scene 2;
Detailed Description
Because the driving-behavior characteristics of a driver in a real driving environment are unknown at the system design stage, an accurate model is difficult to establish, and the system needs to improve its overall performance through continuous learning. To improve the system's adaptability to the different driving-behavior characteristics of different drivers, and to further ensure system safety while obtaining good driving performance, the invention designs, based on a parameterized decision framework, a parameterized learning control system suitable for lane-changing and lane-keeping behaviors. It comprises a learning decision method designed on a reinforcement learning algorithm for the vehicle lane-changing and lane-keeping scenario, and a trajectory planning controller which, after corresponding parameterization, is applicable to straight roads and curves in this scenario.
A parameterized learning decision control system suitable for lane-changing and lane-keeping behaviors comprises several sub-modules; its structural block diagram is shown in Fig. 2. It mainly comprises the perception-signal collection and data storage module A, the learning decision parameter module B, the trajectory planning and motion control module C, and the execution tracking module D, which together form a parameterized decision framework suitable for lane-changing and lane-keeping behaviors.
The perception-signal collection and data storage module A obtains the running-state information of the host vehicle and the surrounding environmental vehicles and performs signal processing, including: the lane, speed, and acceleration of the surrounding vehicles and their relative distances to the host vehicle, taking the lane as reference, obtained by means of the on-board camera and radar environment-sensing elements of the on-board intelligent sensing module; and the driving intention of each environmental vehicle (lane keeping or lane changing), obtained from its deviation from the lane centerline or from turn-signal information, together with the lane and speed of the host vehicle. The module also collects data for the subsequent learning and training of the decision parameters.
The learning decision parameter module B learns suitable decision-parameter values by a reinforcement-learning method. Under urban or highway conditions, lane-changing and lane-keeping behaviors are the most common, and the decision-parameter feature relationships are simple and consistent, namely the specific values of the behavior-terminal lateral offset, the action time, and the acceleration/deceleration behavior. The two continuous variables, action time and acceleration/deceleration behavior, are discretized over their value-range spaces to obtain a discrete action space, and the state design and return design are then carried out. When the quantity of data collected by the system reaches a certain threshold, learning is performed with a kernel-based least-squares policy-iteration reinforcement learning algorithm.
The trajectory planning and motion control module C performs the online optimization solution according to the decision-parameter values output by the learning decision parameter module B and is used for real-time trajectory planning and motion control of the vehicle. The perception-signal collection and data storage module A judges the type of the road currently being driven, and the trajectory is optimized in a rolling manner based on a model predictive control method. A nonlinear state-space equation with a six-dimensional state vector is established, together with constraint equations with terminal-state constraints, so that the execution of the action can match different road types.
The specific decision-parameter values output by the learning decision parameter module B determine the form of the controller. For the two different road conditions, straight road and curve, two different sets of terminal-state equality constraints are used: under the straight-road condition the terminal lateral offset, heading angle, lateral velocity, and yaw rate of the vehicle are constrained, whereas under the curve condition only the terminal lateral displacement and heading angle are constrained. The behavior-terminal lateral offset, action time, and acceleration/deceleration behavior correspond respectively to the terminal lateral-offset equality constraint, the prediction horizon, and the acceleration reference term of the objective function in the model predictive controller. The execution tracking module D performs tracking control of the control quantities output by the algorithm, and is implemented with a PID controller to ensure control precision.
On this basis, Fig. 3 shows the overall flowchart of the technical scheme of the invention; the specific implementation process is as follows.
As shown in Fig. 3, the learning process of the entire system takes place either during human driving or in a virtual simulation environment. When a human driver drives, only the perception-signal collection and data storage module A and the learning decision parameter module B work; when learning in the virtual simulation environment, or when verifying the learning effect, modules A-D work simultaneously. The perception-signal collection and data storage module A obtains the lane, speed, and acceleration of the surrounding vehicles and their relative distances to the host vehicle, taking the lane as reference, by means of the on-board camera and radar environment-sensing elements of the on-board intelligent sensing module; it obtains the driving intention of each environmental vehicle (lane keeping or lane changing) from its deviation from the lane centerline or from turn-signal information, together with the lane and speed of the host vehicle, and stores this information in the module. When the sample count in the learning decision parameter module B reaches the threshold (10^3), or after the quantity of updated data exceeds 20%, the decision parameters are learned and updated according to the designed kernel-based least-squares policy-iteration reinforcement learning algorithm; otherwise, the system continues to collect human driving data, or uses a random policy to search the action space in the simulation environment. The trajectory planning and motion control module C carries out the online optimization solution for trajectory planning and motion control according to the decision-parameter values output by the learning decision parameter module B, obtaining the control quantities, the front-wheel steering angle δ_f and the rate of change a of the longitudinal speed, whose final output acts on the execution tracking module D. Because the control precision of the vehicle actuators over the control quantities must be ensured, the vehicle execution module D adopts a feedback proportional-integral-derivative (PID) controller to realize tracking execution of the decided quantities.
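The learning trigger described above (a sample threshold of 10^3 or more than 20% updated data) can be sketched as follows; the function and variable names are illustrative.

```python
import random

SAMPLE_THRESHOLD = 1_000   # 10^3 samples, as stated above
UPDATE_FRACTION = 0.20     # 20% of the stored data updated

def should_learn(n_samples: int, n_updated: int) -> bool:
    """Trigger a learning/update pass of the decision parameters."""
    if n_samples >= SAMPLE_THRESHOLD:
        return True
    return n_samples > 0 and n_updated / n_samples > UPDATE_FRACTION

def next_action(policy, state, actions, exploring: bool):
    """While data are still being collected, a random policy searches the
    action space in the simulation environment; afterwards the learned
    policy is used."""
    return random.choice(actions) if exploring else policy(state)
```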
A parameterized learning decision control method suitable for lane changing and lane keeping behaviors comprises the following steps:
Step one: obtain, through the perception-signal collection and data storage module A, the state information of the host vehicle and of the environmental vehicles required by the vehicle control algorithm, comprising: the lane, speed, and acceleration of the surrounding vehicles and their relative distances to the host vehicle, taking the lane as reference, obtained by means of the on-board camera and radar environment-sensing elements of the on-board intelligent sensing module; and the driving intention of each environmental vehicle (lane keeping or lane changing), obtained from its deviation from the lane centerline or from turn-signal information, together with the lane and speed of the host vehicle; and store this information in the module. The specific method is as follows:
The state information of the host vehicle and the environmental vehicles required by the vehicle control algorithm is obtained in the perception-signal collection and data storage module A, comprising the surrounding-vehicle state information obtained by means of the on-board camera and radar environment-sensing elements of the on-board intelligent sensing module. As shown in Fig. 1, the different positions around the host vehicle are labeled, and the target vehicles at the corresponding positions are screened. If a target vehicle is present at a position, the activation flag signal of that position is P_N_flag = 1; otherwise P_N_flag = 0. When the activation flag signal P_N_flag at position N is 1, the corresponding vehicle's lane L_N, velocity v_N, acceleration a_N, and relative distance d_N to the host vehicle, taking its lane as reference, are recorded, together with the driving intention I_N of the environmental vehicle, obtained from its deviation from the lane centerline or from turn-signal information, and the host vehicle's lane L_h and velocity v_h. The driving intention I_N is calculated by equation (1), in which values of I_N of -1, 0, 1 respectively indicate an intention of the environmental vehicle to change lane to the right, keep its lane, or change lane to the left; Flag_light is the turn-signal state, whose values of -1, 0, 1 respectively indicate that the environmental vehicle's right turn signal is on, no turn signal is on, or its left turn signal is on; Δd is the lateral distance of the environmental vehicle, perpendicular to the lane-line direction, relative to the lane it occupies; and d_lane is the distance between two adjacent lanes. This information is finally stored in the module.
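Equation (1) itself is not reproduced above, so the following sketch is only an assumed reading of the two cues it combines: the turn signal takes precedence, and a lateral offset beyond half the lane width otherwise signals an intended change. Both the function and the threshold are illustrative.

```python
def driving_intention(flag_light: int, delta_d: float, d_lane: float) -> int:
    """Infer I_N in {-1, 0, 1} (right change, keep, left change).

    Assumptions: Flag_light in {-1, 0, 1} takes precedence; otherwise a
    lateral offset beyond half the lane width signals a lane change.
    """
    if flag_light != 0:
        return flag_light
    if abs(delta_d) > 0.5 * d_lane:
        return 1 if delta_d > 0 else -1
    return 0
```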
Step two: learn appropriate decision-parameter values, namely the specific values of the behavior-terminal lateral offset, the action time, and the acceleration/deceleration behavior, through the learning decision parameter module B; discretize the two continuous variables, action time and acceleration/deceleration behavior, over their value-range spaces to obtain a discrete action space; perform the state design and return design of the kernel-based least-squares policy-iteration reinforcement learning method; and learn with the reinforcement learning algorithm once the quantity of data collected by the system reaches a certain threshold. The specific method is as follows:
The learning decision parameter module B learns suitable decision-parameter values with a kernel-based least-squares policy-iteration reinforcement learning method. The driving-decision process for lane-changing and lane-keeping behaviors is modeled as a Markov decision process, comprising the state design, action design, and return design. According to the designed Markov decision process model and the recorded data, learning is performed with the kernel-based least-squares policy-iteration reinforcement learning method once the quantity of data collected by the system reaches a certain threshold.
2.1) Establishing the Markov decision process model.
First, state design: according to the numbering of the environmental-vehicle positions relative to the host vehicle in Fig. 1, and in order to fully express the traffic-flow state of the environment, the state of the vehicle at each position N is considered: its current lane L_N, velocity v_N, acceleration a_N, relative distance d_N to the host vehicle, taking its lane as reference, and driving intention I_N, obtained from the lateral deviation Δd of the environmental vehicle relative to the lane centerline or from the turn-signal information Flag_light, where the subscript N denotes the vehicle at position N. The state vector also includes the state of the host vehicle: its lane L_h and velocity v_h. The numerical values of these state quantities are read, calculated, and stored in the perception-signal collection and data storage module A. Thus, the state vector s can be represented as

s = (L_h, v_h, L_1, v_1, a_1, d_1, I_1, ..., L_8, v_8, a_8, d_8, I_8)^T.  (2)

When no environmental vehicle exists at a given position, the corresponding state-vector entries are set to 0.
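A sketch of assembling the state vector s of equation (2), assuming the eight environment positions N1-N8 of Fig. 1; the record field names are illustrative:

```python
import numpy as np

N_POSITIONS = 8   # environment-vehicle slots N1..N8 of Fig. 1

def build_state(host, env):
    """Assemble s of equation (2): (L_h, v_h) followed by
    (L_N, v_N, a_N, d_N, I_N) per position, zero-filled when empty."""
    s = [host["L_h"], host["v_h"]]
    for n in range(1, N_POSITIONS + 1):
        veh = env.get(n)          # None when P_N_flag == 0
        s.extend([0.0] * 5 if veh is None else
                 [veh["L"], veh["v"], veh["a"], veh["d"], veh["I"]])
    return np.asarray(s, dtype=float)
```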
Second, action design: in the framework of this problem, the decision parameters corresponding to the lane-changing and lane-keeping behavior scenarios are unified and determined: the behavior-terminal lateral offset T_y, the action time t_f, and the acceleration/deceleration behavior a_tar. These decision parameters can be applied directly to the controller in the trajectory planning and motion control module C, corresponding respectively to the terminal lateral-offset equality constraint, the prediction horizon, and the acceleration reference term of the objective function in the model predictive controller. Thus, the action vector a can be represented as

a = (T_y, t_f, a_tar)^T,  (3)

where the behavior-terminal lateral offset T_y ∈ {-d_lane, 0, d_lane}, with d_lane the distance between two adjacent lanes, corresponding respectively to changing lane to the left, keeping the lane, and changing lane to the right. Within the action space, the two continuous variables, action time t_f and acceleration/deceleration behavior a_tar, are discretized over their value ranges to obtain a discrete action space: the action time t_f takes the discrete values of equation (4), and the acceleration/deceleration behavior a_tar ∈ {-1.5, -0.5, 0, 0.5, 1.5}. These parameterized decisions can be used to describe human driving behavior, as shown in Table 1.
TABLE 1 parameterized decision and human decision analogy example
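The resulting discrete action space can be enumerated as follows; the lane width and the t_f grid are assumed values (the text discretizes t_f over its range without reproducing the grid of equation (4)), while the a_tar values are from the text:

```python
from itertools import product

D_LANE = 3.5                             # lane width in m (assumed value)
TY_SET = [-D_LANE, 0.0, D_LANE]          # left change / keep / right change
TF_SET = [2.0, 3.0, 4.0]                 # s; assumed grid over the t_f range
ATAR_SET = [-1.5, -0.5, 0.0, 0.5, 1.5]   # m/s^2, from the text

ACTIONS = list(product(TY_SET, TF_SET, ATAR_SET))
# 3 x 3 x 5 = 45 discrete actions a = (Ty, tf, a_tar)
```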
Third, return design. The return function considers a safety factor r_s, a rapidity factor r_r, and a ride-comfort factor r_c. The safety factor r_s is given by equation (5); the rapidity and comfort factors are

r_r = β1 · a_tar,  (6)
r_r = r_r - 0.5 if t_f = 4,  (7)
r_c = -β1 · |a_tar|,  (8)
r_c = r_c - 0.5 if t_f = 2,  (9)

where d_N is the relative distance of the vehicle at position N to the host vehicle, taking its lane as reference, d_c is the collision distance, TH = d_N / v_h is the time headway, TH_exp is the desired headway, L_N is the lane of the vehicle at position N, L_h is the host-vehicle lane, β1 and β2 are weight coefficients, t_f is the action time, and a_tar is the acceleration/deceleration behavior. The total return is then calculated as

r = r_s + r_r + r_c + r_a,  (10)

where the transfer function of reinforcement learning is replaced here by the actual trajectory planning and motion control module C, so that r_a is the return fed back by module C after trajectory planning; its specific value is explained further in the description of module C.
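The return terms can be sketched as follows; since equation (5) is not reproduced above, the safety term r_s below is only an assumed shape built from its listed variables, while r_r and r_c follow equations (6)-(9):

```python
def reward(d_N, d_c, v_h, TH_exp, L_N, L_h, a_tar, t_f, beta1, r_a):
    """Total return r = r_s + r_r + r_c + r_a, equation (10)."""
    TH = d_N / v_h                     # time headway
    # assumed shape for r_s (equation (5) not reproduced): penalize being
    # inside the collision distance or below the desired headway in-lane
    r_s = -1.0 if (L_N == L_h and (d_N < d_c or TH < TH_exp)) else 0.0
    r_r = beta1 * a_tar                # (6) rapidity
    if t_f == 4:
        r_r -= 0.5                     # (7)
    r_c = -beta1 * abs(a_tar)          # (8) ride comfort
    if t_f == 2:
        r_c -= 0.5                     # (9)
    return r_s + r_r + r_c + r_a       # (10)
```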
2.2) Kernel-based least-squares policy-iteration algorithm: in a continuous state space, a function-approximation method is used to represent the state-action value function, and the kernel-based least-squares policy-iteration algorithm is used to solve for the weight vector of the state-action value function in reinforcement learning. First, a kernel dictionary is obtained through a sparsification process. The feature vector is designed from the state vector s and action vector a of the state-action pair m = (s, a) and can be expressed as φ(m) = [s^T, a^T]^T. A radial basis function is selected as the kernel, which can be expressed as

κ(m_i, m_j) = <φ(m_i), φ(m_j)>,  (11)

where <·,·> denotes the inner product of two vectors, and the feature maps φ(m_i), φ(m_j) normalize state vectors of different ranges and distinguish action vectors from state vectors. The sample set is denoted M = {m_1, m_2, ..., m_p}, and the feature-vector set is Φ = {φ(m_1), φ(m_2), ..., φ(m_p)}. Screening is performed over this set: if the residual of linearly approximating the current feature vector by the feature vectors already in the dictionary exceeds a threshold, the feature vector is added to the kernel dictionary used to approximate the value function.
The screening process is described as follows. Assume that after traversing q samples, the kernel dictionary D_{t-1} contains t-1 (1 < t ≤ p) feature vectors. For the (q+1)-th sample, whether it should be added to the kernel dictionary is judged by computing

ξ = min_λ ‖ Σ_{j=1}^{t-1} λ_j φ(m_j) - φ(m_{q+1}) ‖²,  (12)

where λ = [λ_1, λ_2, ..., λ_{t-1}] is a weight vector. The solution of equation (12) is

ξ = w_{(q+1)(q+1)} - w_{t-1}(m_{q+1})^T λ,  with  λ = W_{t-1}^{-1} w_{t-1}(m_{q+1}),  (13)

where [W_{t-1}]_{i,j} = κ(m_i, m_j) is a (t-1) × (t-1) matrix, w_{(q+1)(q+1)} = κ(m_{q+1}, m_{q+1}) is the inner product of the current feature vector m_{q+1} with itself, and w_{t-1}(m_{q+1}) = [κ(m_1, m_{q+1}), κ(m_2, m_{q+1}), ..., κ(m_{t-1}, m_{q+1})]^T is the (t-1)-dimensional column vector of inner products between the dictionary feature vectors and the current one. If ξ > μ, the feature vector is added to the kernel dictionary; otherwise it is not. This continues until all samples have been tested.
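A sketch of this sparsification in code, using a radial-basis kernel with an assumed width σ:

```python
import numpy as np

def rbf(mi, mj, sigma=1.0):
    """Radial-basis kernel kappa(m_i, m_j); the width sigma is assumed."""
    diff = np.asarray(mi) - np.asarray(mj)
    return float(np.exp(-np.linalg.norm(diff) ** 2 / (2.0 * sigma ** 2)))

def build_dictionary(samples, mu=0.1, kernel=rbf):
    """Sparsification of equations (12)-(13): a sample joins the dictionary
    only if its feature vector is not well approximated by those in it."""
    D = [samples[0]]
    for m in samples[1:]:
        W = np.array([[kernel(a, b) for b in D] for a in D])  # W_{t-1}
        w = np.array([kernel(d, m) for d in D])               # w_{t-1}(m)
        lam = np.linalg.solve(W, w)                           # lambda in (13)
        xi = kernel(m, m) - w @ lam                           # residual in (13)
        if xi > mu:
            D.append(m)
    return D
```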
After the kernel dictionary is obtained, the state-action value function is linearly approximated using the feature vectors in the kernel dictionary. The state-action value function can be expressed as

Q̂(m_i) = Σ_{j=1}^{t} α_j κ(m_i, m_j),  (14)

where Q̂(m_i) is the estimate of the state-action value function at state-action pair m_i, α = (α_1, α_2, ..., α_t) is a weight vector, and φ(m_j) is the feature vector of pair m_j. For the ii-th sample pair m_ii and the (ii+1)-th sample pair m_{ii+1}, the incremental iterative update equations are

A_ii = A_{ii-1} + w_t(m_ii) (w_t(m_ii) - γ w_t(m_{ii+1}))^T,
b_ii = b_{ii-1} + w_t(m_ii) r_ii,
α_ii = A_ii^{-1} b_ii,  (15)

where w_t(m_ii) = [κ(m_1, m_ii), κ(m_2, m_ii), ..., κ(m_t, m_ii)]^T and w_t(m_{ii+1}) = [κ(m_1, m_{ii+1}), κ(m_2, m_{ii+1}), ..., κ(m_t, m_{ii+1})]^T are computed from m_ii, m_{ii+1} and the feature vectors in the dictionary; γ is the discount factor and r_ii the recorded return of sample ii; A_{ii-1}, A_ii are t × t matrices and b_{ii-1}, b_ii are t-dimensional column vectors, corresponding to the values of matrix A and vector b in two successive iterative updates; and α_ii is the linear-approximation weight vector of the state-action value function estimated after ii samples.
Based on the estimate Q̂ of the state-action value function, the policy is improved. The updated policy can be expressed as

π(s) = argmax_a Q̂(s, a).  (16)

Iteration continues until, for all sample states in the data set, the stored actions coincide with the actions given by the current policy; the algorithm has then converged and terminates.
The specific calculation process is as follows:
Step (1): obtain the data set M = {m_1, m_2, ..., m_p} and the kernel function κ, and initialize the empty kernel dictionary D_0 and the threshold μ;
Step (2): loop over i = 1 : p, computing equation (13); if ξ > μ, add the current feature vector to the dictionary, otherwise set i = i + 1;
Step (3): with the kernel dictionary obtained, perform policy iteration, initializing a zero matrix A, a zero vector b, and a zero weight vector α;
Step (4): loop repeatedly over i = 1 : p, computing equation (15), until the data-set policy is consistent with the current policy;
Step (5): output the weight vector α.
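A sketch of the policy-evaluation update (15) and greedy improvement (16), reusing rbf from the sparsification sketch above; the discount factor γ and the batch (rather than strictly incremental) accumulation of A and b are assumptions:

```python
import numpy as np

def lstdq_weights(transitions, D, kernel=rbf, gamma=0.95):
    """Accumulate A and b over samples (m_ii, r_ii, m_ii+1) and solve for
    the weight vector alpha, as in equation (15)."""
    t = len(D)
    A, b = np.zeros((t, t)), np.zeros(t)
    for m, r, m_next in transitions:
        w = np.array([kernel(d, m) for d in D])            # w_t(m_ii)
        w_next = np.array([kernel(d, m_next) for d in D])  # w_t(m_ii+1)
        A += np.outer(w, w - gamma * w_next)
        b += r * w
    return np.linalg.lstsq(A, b, rcond=None)[0]            # alpha

def greedy_action(state, actions, alpha, D, kernel=rbf):
    """Policy improvement, equation (16): argmax over the discrete actions
    of the kernel-approximated Q-value."""
    def q(a):
        m = np.concatenate([state, np.asarray(a)])
        return alpha @ np.array([kernel(d, m) for d in D])
    return max(actions, key=q)
```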
Step three: perform the online optimization solution for trajectory planning and motion control through the trajectory planning and motion control module C according to the decision-parameter values output by the learning decision parameter module B, using a state-space equation containing the vehicle dynamics equations and a six-dimensional state vector, and establishing constraint equations with terminal-state constraints so that the execution of the action can match different road types. The decision parameters corresponding to the lane-changing and lane-keeping behavior scenarios are unified and determined, namely the behavior-terminal lateral offset, the action time, and the acceleration/deceleration behavior, corresponding respectively to the terminal lateral-offset equality constraint, the prediction horizon, and the acceleration reference term of the objective function in the model predictive controller. For the two different road conditions, straight road and curve, two different sets of terminal-state equality constraints are used: under the straight-road condition the terminal lateral offset, heading angle, lateral velocity, and yaw rate of the vehicle are constrained, whereas under the curve condition only the terminal lateral displacement and heading angle are constrained. The specific method is as follows:
3.1) Establishment of the nonlinear trajectory-planning motion equations: the single-track (bicycle) vehicle dynamics model can be expressed as

M (dv_y/dt + v_x w_r) = F_yf + F_yr,
I_z (dw_r/dt) = l_f F_yf - l_r F_yr,  (17)

where M is the vehicle mass, v_x the longitudinal speed, v_y the vehicle lateral speed, w_r the vehicle yaw rate, F_yf and F_yr the front- and rear-wheel lateral forces, I_z the moment of inertia of the vehicle about the z-axis, and l_f, l_r the distances from the center of mass to the front and rear axles. Since the longitudinal speed and the steering motion of the vehicle are tracked and controlled in the execution tracking module D, the control quantities are simplified here to the front-wheel steering angle δ_f and the rate of change a of the longitudinal speed. The tire lateral forces F_yf, F_yr can be expressed as

F_yf = C_f (δ_f - (v_y + l_f w_r) / v_x),
F_yr = C_r (l_r w_r - v_y) / v_x,  (18)

where δ_f is the front-wheel steering angle and C_f, C_r are the front- and rear-wheel cornering stiffnesses. Meanwhile, from the kinematic relations of the vehicle, dψ/dt = w_r, dX/dt = v_x cos ψ - v_y sin ψ, and dY/dt = v_x sin ψ + v_y cos ψ, where ψ is the vehicle heading angle. Considering these motion equations in the global coordinate system, the nonlinear vehicle motion-space equation is established as

dx/dt = f(x, u),  x = [X, Y, ψ, v_x, v_y, w_r]^T,  u = [a, δ_f]^T,  (19)

where x is the six-dimensional state vector, u is the control vector, F_yf and F_yr are calculated from equation (18), and X, Y are the vehicle position in the global coordinate system.
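A sketch of equations (17)-(19) as a state-derivative function; the sign conventions follow the standard linear single-track model and are assumptions consistent with the reconstruction above:

```python
import numpy as np

def vehicle_dynamics(x, u, p):
    """State derivative for equations (17)-(19).
    x = (X, Y, psi, vx, vy, wr); u = (a, delta_f);
    p = dict with M, Iz, lf, lr, Cf, Cr. Requires vx > 0."""
    X, Y, psi, vx, vy, wr = x
    a, delta_f = u
    Fyf = p["Cf"] * (delta_f - (vy + p["lf"] * wr) / vx)   # (18), front
    Fyr = p["Cr"] * (p["lr"] * wr - vy) / vx               # (18), rear
    return np.array([
        vx * np.cos(psi) - vy * np.sin(psi),               # X_dot
        vx * np.sin(psi) + vy * np.cos(psi),               # Y_dot
        wr,                                                # psi_dot
        a,                                                 # vx_dot (tracked by module D)
        (Fyf + Fyr) / p["M"] - vx * wr,                    # vy_dot from (17)
        (p["lf"] * Fyf - p["lr"] * Fyr) / p["Iz"],         # wr_dot from (17)
    ])
```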
3.2) Establishment of the optimized trajectory planner: first come the terminal-state equality constraints, which depend on the road type. The idea is that, for a given task, its completion can be guaranteed only if certain terminal-state conditions are satisfied at the end of the prediction horizon. For lane-keeping and lane-changing tasks in a straight-road environment, the task is completed when, at the terminal time, the yaw rate and lateral velocity have returned to 0, the heading angle is consistent with the current lane centerline, and the position lies on the target-lane centerline; in a curve environment, the equality constraints requiring the yaw rate and lateral velocity to return to 0 can be relaxed. Thus, the terminal equality constraints in a straight-road environment are

w_r(t_f) = 0,  v_y(t_f) = 0,  ψ(t_f) = ψ_lane,  Y(t_f) = y_l,f,  (20)

where w_r(t_f), v_y(t_f), ψ(t_f), Y(t_f) are respectively the yaw rate, lateral velocity, heading angle, and lateral displacement at the end of the prediction horizon, ψ_lane is the heading of the lane centerline, and y_l,f is the desired terminal lateral displacement: for lane keeping, y_l,f = 0; for lane changing, y_l,f = d_lane, the lateral distance between adjacent lanes. The terminal equality constraints in a curved-road environment are

ψ(t_f) = ψ_lane,  P(t_f) = P_lane,  (21)

where ψ_lane is the heading angle of the target-lane centerline at the point of closest perpendicular distance to the current vehicle position, P(t_f) is the vehicle position at the end of the prediction horizon, and P_lane is the position of the target-lane centerline at that closest point. At the same time, the control quantities must satisfy the inequality constraints

a_min ≤ a ≤ a_max,  δ_f,min ≤ δ_f ≤ δ_f,max,  (22)

where the subscripts min and max denote the minimum and maximum values of the corresponding variables.
The objective function considers, over the prediction horizon, integral-type performance indices on the changes Δδ_f and Δa of the control quantities (the front-wheel steering angle δ_f and the rate of change a of the longitudinal speed) and on the deviation of a from the desired acceleration/deceleration behavior a_tar. The objective function of the controller can be expressed as

J = Σ_k ( w_1 ‖Δδ_f(k)‖² + w_2 ‖Δa(k)‖² + w_3 ‖a(k) - a_tar‖² ),  (23)

where w_1, w_2, w_3 are weight coefficients.
Thus, an optimization problem can be established as

min_u J subject to the vehicle model (19), the inequality constraints (22), and the terminal constraints (20) with P(t_f) ∈ R_ac, or (21) with P(t_f) ∈ R_cd,  (24)

where P(t_f) ∈ R_ac and P(t_f) ∈ R_cd denote the vehicle position at the end of the prediction horizon lying in the straight-road and curved-road terminal sets, respectively.
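A direct-shooting sketch of problem (24) for the straight-road case, reusing vehicle_dynamics from the sketch above; here the terminal equality constraints (20) are handled as quadratic penalties, and the bounds, weights, and horizon discretization are assumed values rather than the patent's:

```python
import numpy as np
from scipy.optimize import minimize

def plan_trajectory(x0, a_tar, y_lf, p, tf=3.0, N=15):
    """Plan over horizon tf with N Euler steps; x0 = (X, Y, psi, vx, vy, wr)
    with vx > 0, controls u_k = (a_k, delta_f_k)."""
    dt = tf / N

    def simulate(u_flat):
        u = u_flat.reshape(N, 2)
        x, cost = np.asarray(x0, dtype=float), 0.0
        for k in range(N):
            cost += (u[k, 0] - a_tar) ** 2                  # accel reference, (23)
            if k > 0:
                cost += np.sum((u[k] - u[k - 1]) ** 2)      # rate terms, (23)
            x = x + dt * vehicle_dynamics(x, u[k], p)       # dynamics (19)
        return x, cost

    def objective(u_flat):
        xT, cost = simulate(u_flat)
        # quadratic penalties standing in for (20): Y, psi, vy, wr terminal
        terminal = (xT[1] - y_lf) ** 2 + xT[2] ** 2 + xT[4] ** 2 + xT[5] ** 2
        return cost + 1e3 * terminal

    bounds = [(-3.0, 3.0), (-0.5, 0.5)] * N                 # (22), assumed limits
    return minimize(objective, np.zeros(2 * N), bounds=bounds,
                    method="L-BFGS-B")
```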
3.3) Driving-decision return calculation by the trajectory planning and motion control module: the transfer function in reinforcement learning is replaced by the actual trajectory planning and motion control module C, and the return r_a fed back by module C after trajectory planning is calculated by equation (25) from the result of solving the optimization problem (24).
Finally, the driving strategy is verified after learning. In driving scenario 1, shown in Fig. 4, environmental vehicle N1 keeps driving in lane 2; a second environmental vehicle first drives in lane 2 and then changes into lane 1; a third keeps driving along lane 3; and a fourth changes from lane 3 into lane 4 and then into lane 5, finally keeping its lane. In this scenario, the host vehicle first changes continuously from lane 3 to lane 5, then changes into lane 2, and finally into lane 1.
In driving scenario 2, shown in Fig. 5, environmental vehicle N3 changes into lane 1 after keeping to lane 2 for a period of time; environmental vehicle N4 changes from lane 2 into lane 3 and then into lane 4; environmental vehicle N5 keeps driving along lane 3; environmental vehicle N7 changes into lane 3 after keeping to lane 4 for a period of time; and environmental vehicle N8 keeps driving along lane 4. In this scenario, the host vehicle changes continuously from lane 3 to lane 1 and then keeps its lane.
It can thus be seen that the host vehicle can autonomously switch between lane-keeping and lane-changing operations and perform active lane changes according to the environment; the system is therefore a parameterized learning decision control system suitable for lane-changing and lane-keeping behaviors.

Claims (5)

1. A parameterized learning decision control system suitable for lane changing and lane keeping, characterized by comprising a perception-signal collection and data storage module (A), a learning decision parameter module (B), a trajectory planning and motion control module (C), and an execution tracking module (D);
the perception-signal collection and data storage module (A) is used to obtain the running-state information of the host vehicle and of the vehicles in the surrounding environment, to process the signals, and to collect data for the subsequent learning and training of the decision parameters;
the learning decision parameter module (B) is used to learn from the collected decision data; when the quantity of data collected by the system reaches a certain threshold, or the data have been updated to a certain degree, the system continues learning, and suitable decision-parameter values are learned by a reinforcement-learning method;
the trajectory planning and motion control module (C) is used for real-time trajectory planning and motion control of the vehicle; based on a model predictive control method, the controller form is determined from the specific decision-parameter values output by the learning decision parameter module (B) and the current road type judged by the perception-signal collection and data storage module (A), and the trajectory is optimized in a rolling manner;
the execution tracking module (D) is used for tracking control of the control quantities output by the algorithm, and is implemented with a PID controller to ensure control precision;
the perception-signal collection and data storage module (A) is connected to the learning decision parameter module (B), the trajectory planning and motion control module (C), and the execution tracking module (D); the learning decision parameter module (B) is connected to the trajectory planning and motion control module (C); and the trajectory planning and motion control module (C) is connected to the execution tracking module (D).
2. A method of the parameterized learning decision control system suitable for lane changing and lane keeping according to claim 1, characterized by comprising the following steps:
step one, obtaining, through the perception signal collection and data storage module (A), the state information of the host vehicle and of the environmental vehicles required by the vehicle control algorithm, including: the lane, speed and acceleration of the surrounding vehicles, obtained by means of the on-board camera and radar environment-sensing elements of the on-board intelligent perception module; the relative distance of each surrounding vehicle with respect to the host vehicle, referenced to its lane; the driving intention of each environmental vehicle, namely lane keeping or lane changing, obtained from the vehicle's deviation from the lane centre line or from its turn-signal information; and the lane and speed of the host vehicle; this information is stored in the module;
step two, learning appropriate decision parameter values, namely specific values of the terminal lateral offset, the action time and the acceleration/deceleration behavior, through the learning decision parameter module (B); the two continuous variables, action time and acceleration/deceleration behavior, are discretized over their value-range space to obtain a discrete action space; the state design and return design are carried out for a kernel-function-based least-squares policy iteration reinforcement learning method, and learning with the reinforcement learning algorithm is performed once the amount of data collected by the system reaches a certain threshold;
step three, performing an online optimization solution for trajectory planning and motion control through the trajectory planning and motion control module (C) according to the decision parameter values output by the learning decision parameter module (B), using a state-space equation containing the vehicle dynamics equations and a six-dimensional state vector, and establishing constraint equations with terminal-state constraints so that the executed motion can match different road types; the decision parameters corresponding to the lane-changing and lane-keeping behavior scenes are unified and determined as the terminal lateral offset, the action time and the acceleration/deceleration behavior, corresponding respectively to the terminal lateral-offset equality constraint, the prediction horizon and the acceleration reference term in the objective function of the model predictive controller; for the two road conditions of straight road and curve, two different sets of terminal-state equality constraints are used: under the straight-road condition the terminal lateral offset, heading angle, lateral velocity and yaw rate of the vehicle are constrained, whereas under the curve condition only the terminal lateral displacement and heading angle of the vehicle are constrained;
and step four, performing tracking control of the control quantities output by the algorithm through the execution tracking module (D), with a PID controller adopted to guarantee control accuracy.
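A minimal Python sketch of the step-four tracking loop is given below; the discrete-time PID form, the gains and the variable names are illustrative assumptions, not values taken from the patent.

class PID:
    """Discrete-time PID tracking controller (illustrative gains)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g. tracking the front-wheel angle commanded by module (C):
# steer_pid = PID(kp=2.0, ki=0.1, kd=0.05, dt=0.01)
# actuator_cmd = steer_pid.step(delta_f_cmd, delta_f_measured)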
3. The method of the parameterized learning decision control system for lane changing and lane keeping according to claim 1, wherein the specific method of step one is as follows:
obtaining, in the perception signal collection and data storage module (A), the state information of the host vehicle and of the environmental vehicles required by the vehicle control algorithm, as follows: the state information of the surrounding vehicles is obtained by means of the on-board camera and radar environment-sensing elements of the on-board intelligent perception module; the different positions of the surrounding vehicles are labelled, and the target vehicles at the corresponding positions are screened; if a target vehicle exists at a corresponding position, the activation flag signal of that position is P_N_flag = 1, otherwise P_N_flag = 0; when the activation flag signal P_N_flag at position N is 1, the lane L_N, velocity v_N and acceleration a_N of the corresponding vehicle, together with its relative distance d_N with respect to the host vehicle referenced to its lane, are recorded, the driving intention I_N of the environmental vehicle is obtained from its deviation from the lane centre line or from its turn-signal information, and the lane L_h and velocity v_h of the host vehicle are recorded; the driving intention I_N is calculated by equation (1),
wherein values of I_N of -1, 0 and 1 indicate, respectively, the intention of the environmental vehicle to change lane to the right, keep its lane, or change lane to the left; Flag_light is the turn-signal flag, whose values of -1, 0 and 1 indicate, respectively, that the environmental vehicle's right turn signal is on, that no turn signal is on, and that the left turn signal is on; Δd is the lateral distance of the environmental vehicle relative to its current lane, perpendicular to the lane-line direction; d_lane is the distance between two adjacent lanes; this information is finally stored in the module.
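The following Python sketch illustrates one plausible realisation of the intention estimate of equation (1); since the equation itself is not reproduced in the text, the thresholding of Δd (here 25% of d_lane) is an assumption.

def driving_intention(flag_light, delta_d, d_lane, thresh=0.25):
    # Turn signal dominates: -1 right signal, 0 none, 1 left signal.
    if flag_light != 0:
        return flag_light
    # Otherwise infer intention from lateral drift off the centre line;
    # the 25% threshold is an illustrative assumption, not the patent's value.
    if delta_d > thresh * d_lane:
        return 1    # drifting left: lane change to the left
    if delta_d < -thresh * d_lane:
        return -1   # drifting right: lane change to the right
    return 0        # lane keeping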
4. The method of the parameterized learning decision control system for lane changing and lane keeping according to claim 1, wherein the specific method of step two is as follows:
the learning decision parameter module (B) learns appropriate decision parameter values using a kernel-function-based least-squares policy iteration reinforcement learning method; the driving decision process for lane-changing and lane-keeping behaviors is modelled as a Markov decision process, comprising the state design, the action design and the return design; according to the designed Markov decision process model and the recorded data, learning is performed with the kernel-function-based least-squares policy iteration method once the amount of data collected by the system reaches a certain threshold;
2.1) establishing a Markov decision process model;
firstly, designing a state;
the relative positions of the environmental vehicles with respect to the host vehicle are numbered; for a complete representation of the traffic state in the environment, the state of the vehicle at each position N is taken into account, namely its current lane L_N, velocity v_N, acceleration a_N and relative distance d_N with respect to the host vehicle referenced to its lane, together with the driving intention I_N of the environmental vehicle, obtained from its lateral offset Δd relative to the lane centre line or from the turn-signal information Flag_light, where the subscript N denotes the vehicle at position N; the state vector also includes the state of the host vehicle, namely its lane L_h and velocity v_h; the numerical values of these state quantities are read, calculated and stored in the perception signal collection and data storage module (A); thus the state vector s can be represented as
s = (L_h, v_h, ..., L_N, v_N, a_N, d_N, I_N, ...)^T,    (2)
When no environment vehicle exists at the corresponding position, the corresponding state vector value is set to be 0;
Secondly, designing the action;
in the framework of this problem, the decision parameters corresponding to the lane-changing and lane-keeping behavior scenes are unified and determined as the terminal lateral offset T_y, the action time t_f and the acceleration/deceleration behavior a_tar; these decision parameters can be applied directly to the trajectory planning and motion controller in the trajectory planning and motion control module (C), corresponding respectively to the terminal lateral-offset equality constraint, the prediction horizon and the acceleration reference term in the objective function of the model predictive controller; thus the action vector a can be represented as
a = (T_y, t_f, a_tar)^T,    (3)
wherein the terminal lateral offset T_y ∈ {-d_lane, 0, d_lane}, d_lane being the distance between two adjacent lanes, corresponds respectively to changing lane to the left, keeping the lane and changing lane to the right; within the action space, the two continuous variables, action time t_f and acceleration/deceleration behavior a_tar, are discretized over their value-range space to obtain a discrete action space; the action time t_f is thus restricted to the finite set of values given in equation (4),
and the acceleration/deceleration behavior a_tar ∈ {-1.5, -0.5, 0, 0.5, 1.5}; these parameterized decisions are used to describe human driving behavior;
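A Python sketch of the resulting discrete action space follows; the lane width and the discretisation of t_f are assumptions, since equation (4) is not reproduced in the text, while the a_tar set is taken from the claim.

from itertools import product

d_lane = 3.5                              # lane width in metres (assumed)
T_y    = [-d_lane, 0.0, d_lane]           # terminal lateral offset: left / keep / right
t_f    = [2.0, 3.0, 4.0]                  # action time in seconds (assumed discretisation)
a_tar  = [-1.5, -0.5, 0.0, 0.5, 1.5]      # acceleration/deceleration set from the claim

# every action a = (T_y, t_f, a_tar)^T; here 3 x 3 x 5 = 45 candidates
actions = [(ty, tf, at) for ty, tf, at in product(T_y, t_f, a_tar)]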
Thirdly, return design;
in the design of the return function, a safety factor r_s, a rapidity factor r_r and a ride-comfort factor r_c are considered; the safety factor r_s is given by equation (5), and the rapidity and comfort factors are expressed respectively as:
r_r = β_1 a_tar,    (6)
r_r = r_r - 0.5, if t_f = 4,    (7)
r_c = -β_2 |a_tar|,    (8)
r_c = r_c - 0.5, if t_f = 2,    (9)
wherein d_N is the relative distance of the vehicle at position N with respect to the host vehicle, referenced to its lane; d_c is the collision distance; TH = d_N / v_h is the time headway; TH_exp is the desired time headway; L_N is the lane of the vehicle at position N; L_h is the lane of the host vehicle; β_1, β_2 are weight coefficients; t_f is the action time and a_tar the acceleration/deceleration behavior; thus the total return can be calculated as follows:
r = r_s + r_r + r_c + r_a,    (10)
wherein r_a is the return returned after the trajectory planning is carried out by the trajectory planning and motion control module (C);
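As a sketch, the return of equations (6)-(10) can be assembled in Python as below; the safety term r_s is passed in, since its formula (equation (5)) is not reproduced in the text, and the β values are illustrative assumptions.

def total_return(r_s, a_tar, t_f, r_a, beta1=0.5, beta2=0.5):
    r_r = beta1 * a_tar             # rapidity factor, eq. (6)
    if t_f == 4:                    # slow manoeuvres penalised, eq. (7)
        r_r -= 0.5
    r_c = -beta2 * abs(a_tar)       # ride-comfort factor, eq. (8)
    if t_f == 2:                    # abrupt manoeuvres penalised, eq. (9)
        r_c -= 0.5
    return r_s + r_r + r_c + r_a    # total return, eq. (10)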
2.2) the kernel-function-based least-squares policy iteration algorithm: in the continuous state space, a function-approximation method is used to represent the state-action value function; a kernel-function-based least-squares policy iteration algorithm is used to solve for the weight vector of the state-action value function in reinforcement learning; firstly, a kernel dictionary is obtained through a sparsification process; the feature vector is designed from the state vector s and the action vector a of the state-action pair m = (s, a) and can be expressed as φ(m) = [s^T, a^T]^T; selecting a radial basis function as the kernel function, it can be expressed as
κ(m_i, m_j) = <φ(m_i), φ(m_j)>,    (11)
wherein <·,·> denotes the inner product of two vectors, and φ(m_i), φ(m_j) are feature vectors; the state quantities are normalized over their respective ranges so that the action and state components remain distinguishable; the sample set is denoted M = {m_1, m_2, ..., m_p}, and the feature-vector set is Φ = {φ(m_1), φ(m_2), ..., φ(m_p)}; screening is performed on the feature-vector set: if the residual of linearly approximating the current feature vector by the feature vectors already in the dictionary exceeds a threshold, the feature vector is added to the kernel dictionary used to approximate the state-action value function;
the screening process is described as follows: assume that after traversing q samples the kernel dictionary D_{t-1} contains t-1 (1 < t ≤ p) feature vectors; for the (q+1)-th sample, to judge whether it should be added to the kernel dictionary, calculate:
ξ = min_λ || Σ_{j=1}^{t-1} λ_j φ(m_j) - φ(m_{q+1}) ||²,    (12)
wherein λ = [λ_1, λ_2, ..., λ_{t-1}] is the weight vector; the solution of equation (12) is:
λ = W_{t-1}^{-1} w_{t-1}(m_{q+1}),  ξ = w_{(q+1)(q+1)} - w_{t-1}(m_{q+1})^T λ,    (13)
wherein [W_{t-1}]_{i,j} = κ(m_i, m_j) is a (t-1)×(t-1)-dimensional matrix; w_{(q+1)(q+1)} = κ(m_{q+1}, m_{q+1}) is the inner product of the current feature vector m_{q+1} with itself; w_{t-1}(m_{q+1}) = [κ(m_1, m_{q+1}), κ(m_2, m_{q+1}), ..., κ(m_{t-1}, m_{q+1})]^T is the (t-1)-dimensional column vector of inner products between the feature vectors already in the dictionary and the current feature vector; if ξ > μ, the feature vector is added to the kernel dictionary, otherwise it is not; this is repeated until all samples have been tested;
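A direct Python transcription of this sparsification test, under the definitions above, might look as follows (the kernel function and the sample representation are assumptions):

import numpy as np

def build_kernel_dictionary(samples, kernel, mu):
    # The first sample seeds the dictionary.
    dictionary = [samples[0]]
    for m in samples[1:]:
        # W_{t-1}: kernel matrix of the current dictionary members.
        W = np.array([[kernel(mi, mj) for mj in dictionary] for mi in dictionary])
        # w_{t-1}(m): inner products of dictionary members with the candidate.
        w = np.array([kernel(mi, m) for mi in dictionary])
        lam = np.linalg.solve(W, w)           # eq. (13), weight vector lambda
        xi = kernel(m, m) - w @ lam           # residual xi of eq. (12)
        if xi > mu:                           # poorly approximated: add it
            dictionary.append(m)
    return dictionary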
after the kernel dictionary is obtained, the state-action value function is approximated linearly with the feature vectors in the kernel dictionary; the state-action value function is represented as:
Q(m_i) = Σ_{j=1}^{t} α_j κ(m_j, m_i),    (14)
wherein Q(m_i) is the estimate of the state-action value function at the state-action pair m_i; α = (α_1, α_2, ..., α_t) is the weight vector; φ(m_j) is the feature vector of the state-action pair m_j; for the ii-th sample pair m_ii and the (ii+1)-th sample pair m_{ii+1}, the incremental iterative update equations take the least-squares form
A_ii = A_{ii-1} + w_t(m_ii)(w_t(m_ii) - γ w_t(m_{ii+1}))^T,  b_ii = b_{ii-1} + r_ii w_t(m_ii),  α_ii = A_ii^{-1} b_ii,    (15)
with γ the discount factor and r_ii the return of the ii-th sample;
wherein w_t(m_ii) = [κ(m_1, m_ii), κ(m_2, m_ii), ..., κ(m_t, m_ii)]^T and w_t(m_{ii+1}) = [κ(m_1, m_{ii+1}), κ(m_2, m_{ii+1}), ..., κ(m_t, m_{ii+1})]^T are the feature vectors calculated from m_ii, m_{ii+1} and the feature vectors in the dictionary; A_{ii-1}, A_ii are t×t-dimensional matrices and b_{ii-1}, b_ii are t-dimensional column vectors, corresponding respectively to the values of the matrix A and the vector b in two successive iterative updates; α_ii is the linear-approximation weight vector of the state-action value function estimated after iterating over ii samples;
the policy is improved based on the estimate of the state-action value function; the updated policy can be expressed as:
π(s) = argmax_{a∈A} Q(s, a),    (16)
iteration continues until, for all sample states in the data set, the stored actions coincide with the actions produced by the current policy, at which point the algorithm has converged and terminates;
The specific calculation process is as follows:
step (1): obtain the data set M = {m_1, m_2, ..., m_p} and the kernel function κ, and initialize an empty kernel dictionary D_0 and a threshold μ;
step (2): for the loop i = 1 : p, calculate equation (13); if ξ > μ, add the current feature vector to the dictionary; otherwise set i = i + 1;
step (3): with the kernel dictionary obtained, perform policy iteration: initialize a zero matrix A, a zero vector b and a zero weight vector α;
step (4): for the loop i = 1 : p, calculate equation (15), repeating until the actions in the data set are consistent with those of the current policy;
step (5): output the weight vector α.
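The following Python sketch assembles steps (3)-(5) into a compact policy-iteration loop; the discount factor gamma, the greedy next-action selection and the convergence tolerance are standard least-squares policy iteration ingredients assumed here rather than spelled out in the text.

import numpy as np

def klspi(transitions, dictionary, kernel, action_space,
          gamma=0.95, max_iter=50, tol=1e-6):
    # transitions: list of (m, r, s_next) with m = (s, a) a state-action pair.
    t = len(dictionary)

    def w(m):                                  # w_t(m): kernel feature vector
        return np.array([kernel(d, m) for d in dictionary])

    def q(alpha, m):                           # value estimate, eq. (14)
        return alpha @ w(m)

    alpha = np.zeros(t)
    for _ in range(max_iter):
        A, b = np.zeros((t, t)), np.zeros(t)
        for m, r, s_next in transitions:
            # greedy policy improvement, eq. (16)
            a_next = max(action_space, key=lambda a: q(alpha, (s_next, a)))
            phi, phi_next = w(m), w((s_next, a_next))
            A += np.outer(phi, phi - gamma * phi_next)   # eq. (15)
            b += r * phi
        alpha_new = np.linalg.lstsq(A, b, rcond=None)[0]
        if np.linalg.norm(alpha_new - alpha) < tol:      # policy converged
            return alpha_new
        alpha = alpha_new
    return alpha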
5. The method of the parameterized learning decision control system for lane changing and lane keeping according to claim 1, wherein the specific method of step three is as follows:
3.1) establishing the nonlinear trajectory-planning and motion equations: the bicycle vehicle dynamics model can be expressed as
M (dv_y/dt + v_x w_r) = F_yf + F_yr,  I_z (dw_r/dt) = l_f F_yf - l_r F_yr,    (17)
wherein M is the vehicle mass; v_x is the longitudinal vehicle speed; v_y is the lateral vehicle speed; w_r is the vehicle yaw rate; F_yf, F_yr are respectively the lateral forces on the front and rear wheels of the vehicle; I_z is the moment of inertia of the vehicle about the z-axis; l_f, l_r are the distances from the centre of mass to the front and rear axles; the tracking of the longitudinal speed and of the steering motion of the vehicle is carried out in the execution tracking module (D), so the control quantities are here simplified to the front-wheel steering angle δ_f and the rate of change a of the longitudinal speed; the tire lateral forces F_yf, F_yr can be expressed with the linear tire model as
F_yf = C_f (δ_f - (v_y + l_f w_r)/v_x),  F_yr = -C_r (v_y - l_r w_r)/v_x,    (18)
wherein δ_f is the front-wheel steering angle and C_f, C_r are respectively the front- and rear-wheel cornering stiffnesses; meanwhile, according to the motion relationships of the vehicle, dφ/dt = w_r, where φ is the heading angle of the vehicle; considering the equations of motion of the vehicle in the global coordinate system, the nonlinear vehicle motion state-space equation is established as
dX/dt = v_x cos φ - v_y sin φ,  dY/dt = v_x sin φ + v_y cos φ,  dφ/dt = w_r,  dv_x/dt = a,  dv_y/dt = (F_yf + F_yr)/M - v_x w_r,  dw_r/dt = (l_f F_yf - l_r F_yr)/I_z,    (19)
wherein the state variable is the six-dimensional vector [X, Y, φ, v_x, v_y, w_r]^T and the control variable is u = [a, δ_f]^T; F_yf, F_yr can be calculated from equation (18); X and Y are the positions of the vehicle in the global coordinate system;
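A Python sketch of the state-space model of equations (17)-(19) as reconstructed above (the linear tire model and the parameter-dictionary layout are assumptions):

import numpy as np

def bicycle_dynamics(state, u, p):
    # state = [X, Y, phi, v_x, v_y, w_r]; u = [a, delta_f]; requires v_x > 0.
    X, Y, phi, vx, vy, wr = state
    a, delta_f = u
    # linear tire model, eq. (18)
    Fyf = p['Cf'] * (delta_f - (vy + p['lf'] * wr) / vx)
    Fyr = -p['Cr'] * (vy - p['lr'] * wr) / vx
    return np.array([
        vx * np.cos(phi) - vy * np.sin(phi),        # dX/dt
        vx * np.sin(phi) + vy * np.cos(phi),        # dY/dt
        wr,                                         # dphi/dt
        a,                                          # dv_x/dt (rate commanded directly)
        (Fyf + Fyr) / p['M'] - vx * wr,             # dv_y/dt, eq. (17)
        (p['lf'] * Fyf - p['lr'] * Fyr) / p['Iz'],  # dw_r/dt, eq. (17)
    ])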
3.2) establishing the optimized trajectory planner: firstly, the terminal-state equality constraints are related to the road type; for a given task, completion can be guaranteed only when certain terminal-state conditions are satisfied at the end of the prediction horizon; for lane-keeping and lane-changing tasks in a straight-road environment, the task is completed when, at the terminal moment, the yaw rate and the lateral velocity return to 0, the heading angle is consistent with the centre line of the lane, and the position lies on the centre line of the desired lane; in a curve environment, the equality constraints requiring the yaw rate and the lateral velocity to return to 0 can be relaxed; thus the terminal equality constraints in a straight-road environment are
w_r(t_f) = 0,  v_y(t_f) = 0,  φ(t_f) = 0,  Y(t_f) = y_{l,f},    (20)
wherein w_r(t_f), v_y(t_f), φ(t_f) and Y(t_f) are respectively the yaw rate, lateral velocity, heading angle and lateral displacement at the end of the prediction horizon; y_{l,f} is the desired terminal lateral displacement; for lane keeping, y_{l,f} = 0; for lane changing, y_{l,f} = ±d_lane, where d_lane is the lateral distance between adjacent lanes; the terminal equality constraints in a curved-road environment are
φ(t_f) = φ_lane,  p_lane(P(t_f)) = 0,    (21)
wherein φ_lane is the heading angle of the target-lane centre line at the point perpendicularly closest to the current vehicle position; P(t_f) is the predicted vehicle position at the end of the prediction horizon; p_lane(·) is the perpendicular distance from the vehicle position to the centre line of the nearest target lane; at the same time, the control quantities must satisfy the inequality constraints
a_min ≤ a ≤ a_max,  δ_f,min ≤ δ_f ≤ δ_f,max,    (22)
Wherein, subscripts min, max represent the minimum and maximum values of the corresponding variables, respectively;
the objective function considers, over the prediction horizon, the increments Δδ_f and Δa of the control quantities (the front-wheel steering angle δ_f and the rate of change a of the longitudinal speed), together with an integral performance index on the deviation between the rate of change a of the longitudinal speed and the desired acceleration/deceleration behavior a_tar; the objective function of the controller is expressed as
J = Σ ( w_1 ||Δδ_f||² + w_2 ||Δa||² + w_3 ||a - a_tar||² ),    (23)
wherein w_1, w_2, w_3 are weight coefficients;
thus the optimization problem can be established as: minimize the objective function (23) subject to the vehicle model (19), the terminal equality constraints (20) or (21), and the control inequality constraints (22),    (24)
wherein P(t_f) ∈ R_ac and P(t_f) ∈ R_cd denote that the predicted terminal vehicle position lies in the straight-road and curved-road terminal regions, respectively;
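The road-type switch of the terminal constraints can be sketched in Python as residual functions handed to a nonlinear programming solver; the residual forms for the curve case follow the reconstruction above and are assumptions:

def terminal_constraints(state_tf, road_type, y_lf, phi_lane=None, p_lane=None):
    # state_tf = [X, Y, phi, v_x, v_y, w_r] at the prediction-horizon terminal.
    X, Y, phi, vx, vy, wr = state_tf
    if road_type == 'straight':
        # eq. (20): yaw rate, lateral speed, heading and lateral offset all pinned
        return [wr, vy, phi, Y - y_lf]
    # eq. (21), curve: only heading and lateral position are constrained;
    # phi_lane and p_lane come from the target-lane centre-line geometry.
    return [phi - phi_lane, p_lane]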
3.3) the trajectory planning and motion control module performs the driving-decision return calculation: the transition function in reinforcement learning is replaced by the actual trajectory planning and motion control module (C), and the return r_a returned after the trajectory planning and motion control module (C) performs the trajectory planning is calculated.
CN201910952119.1A 2019-10-08 2019-10-08 Parameterized learning decision control system and method suitable for lane changing and lane keeping Active CN110568760B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910952119.1A CN110568760B (en) 2019-10-08 2019-10-08 Parameterized learning decision control system and method suitable for lane changing and lane keeping

Publications (2)

Publication Number Publication Date
CN110568760A true CN110568760A (en) 2019-12-13
CN110568760B CN110568760B (en) 2021-07-02

Family

ID=68784244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910952119.1A Active CN110568760B (en) 2019-10-08 2019-10-08 Parameterized learning decision control system and method suitable for lane changing and lane keeping

Country Status (1)

Country Link
CN (1) CN110568760B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105955930A (en) * 2016-05-06 2016-09-21 天津科技大学 Guidance-type policy search reinforcement learning algorithm
CN106114501A (en) * 2016-06-23 2016-11-16 吉林大学 A kind of have multimodal lane-change collision avoidance control method based on steering-by-wire
CN107168303A (en) * 2017-03-16 2017-09-15 中国科学院深圳先进技术研究院 A kind of automatic Pilot method and device of automobile
CN109204308A (en) * 2017-07-03 2019-01-15 上海汽车集团股份有限公司 The control method and system that the determination method of lane keeping algorithm, lane are kept
US20180093671A1 (en) * 2017-11-21 2018-04-05 GM Global Technology Operations LLC Systems and methods for adjusting speed for an upcoming lane change in autonomous vehicles
US20190302785A1 (en) * 2018-04-02 2019-10-03 Sony Corporation Vision-based sample-efficient reinforcement learning framework for autonomous driving
CN108819948A (en) * 2018-06-25 2018-11-16 大连大学 Driving behavior modeling method based on reverse intensified learning
CN108932840A (en) * 2018-07-17 2018-12-04 北京理工大学 Automatic driving vehicle urban intersection passing method based on intensified learning
CN110187639A (en) * 2019-06-27 2019-08-30 吉林大学 A kind of trajectory planning control method based on Parameter Decision Making frame

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
CHANG WANG et al.: "Cognitive Competence Improvement for Autonomous Vehicles: A Lane Change Identification Model for Distant Preceding Vehicles", 《DIGITAL OBJECT IDENTIFIER》 *
JINLONG HONG: "Engine Speed Control During Gear Shifting of AMT HEVs with Identified Intake-to-Power Delay", 《IFAC-PAPERSONLINE》 *
JUNJIE WANG et al.: "Lane Change Decision-making through Deep Reinforcement Learning with Rule-based Constraints", 《IJCNN 2019. INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS》 *
YUXIANG ZHANG et al.: "Deterministic Promotion Reinforcement Learning Applied to Longitudinal Velocity Control for Automated Vehicles", 《IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY》 *
YUXIANG ZHANG et al.: "Velocity control in a right-turn across traffic scenario for autonomous vehicles using kernel-based reinforcement learning", 《CHINESE AUTOMATION CONGRESS (CAC)》 *
ZHU BING et al.: "Car-following control of vehicles based on deep reinforcement learning" (in Chinese), 《CHINA JOURNAL OF HIGHWAY AND TRANSPORT》 *
CHEN HONG et al.: "Receding-horizon path planning of intelligent vehicles for dynamic obstacle avoidance" (in Chinese), 《CHINA JOURNAL OF HIGHWAY AND TRANSPORT》 *
CHEN YINYIN: "Research on reinforcement learning algorithms for autonomous driving" (in Chinese), 《CHINA MASTERS' THESES FULL-TEXT DATABASE, ENGINEERING SCIENCE AND TECHNOLOGY II》 *

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021077725A1 (en) * 2019-10-21 2021-04-29 南京航空航天大学 System and method for predicting motion state of surrounding vehicle based on driving intention
CN111192284B (en) * 2019-12-27 2022-04-05 吉林大学 Vehicle-mounted laser point cloud segmentation method and system
CN111192284A (en) * 2019-12-27 2020-05-22 吉林大学 Vehicle-mounted laser point cloud segmentation method and system
WO2021212728A1 (en) * 2020-04-24 2021-10-28 广州大学 Unmanned vehicle lane changing decision-making method and system based on adversarial imitation learning
CN111746544B (en) * 2020-07-13 2021-05-25 吉林大学 Lane changing method for embodying individual behavior of driver
CN111746544A (en) * 2020-07-13 2020-10-09 吉林大学 Lane changing method for embodying individual behavior of driver
CN111985614A (en) * 2020-07-23 2020-11-24 中国科学院计算技术研究所 Method, system and medium for constructing automatic driving decision system
CN111985614B (en) * 2020-07-23 2023-03-24 中国科学院计算技术研究所 Method, system and medium for constructing automatic driving decision system
CN114074680B (en) * 2020-08-11 2023-08-22 湖南大学 Vehicle channel change behavior decision method and system based on deep reinforcement learning
CN114074680A (en) * 2020-08-11 2022-02-22 湖南大学 Vehicle lane change behavior decision method and system based on deep reinforcement learning
CN112051846B (en) * 2020-08-17 2021-11-19 华中科技大学 Multi-mode switching control method and system for full-steering mobile robot
CN112051846A (en) * 2020-08-17 2020-12-08 华中科技大学 Multi-mode switching control method and system for full-steering mobile robot
CN111959492B (en) * 2020-08-31 2022-05-20 重庆大学 HEV energy management hierarchical control method considering lane change behavior in internet environment
CN111959492A (en) * 2020-08-31 2020-11-20 重庆大学 HEV energy management hierarchical control method considering lane change behavior in networking environment
CN111967094B (en) * 2020-09-01 2022-08-16 吉林大学 Backward lane line calculating method based on Mobileye lane line equation
CN111967094A (en) * 2020-09-01 2020-11-20 吉林大学 Backward lane line calculating method based on Mobileye lane line equation
CN114217601A (en) * 2020-09-03 2022-03-22 财团法人车辆研究测试中心 Hybrid decision-making method and system for self-driving
CN114217601B (en) * 2020-09-03 2024-02-27 财团法人车辆研究测试中心 Hybrid decision method and system for self-driving
CN112046484B (en) * 2020-09-21 2021-08-03 吉林大学 Q learning-based vehicle lane-changing overtaking path planning method
CN112046484A (en) * 2020-09-21 2020-12-08 吉林大学 Q learning-based vehicle lane-changing overtaking path planning method
CN114620059A (en) * 2020-12-14 2022-06-14 广州汽车集团股份有限公司 Automatic driving method and system thereof, and computer readable storage medium
CN114620059B (en) * 2020-12-14 2024-05-17 广州汽车集团股份有限公司 Automatic driving method, system thereof and computer readable storage medium
CN112578672B (en) * 2020-12-16 2022-12-09 吉林大学青岛汽车研究院 Unmanned vehicle trajectory control system based on chassis nonlinearity and trajectory control method thereof
CN112578672A (en) * 2020-12-16 2021-03-30 吉林大学青岛汽车研究院 Unmanned vehicle trajectory control system based on chassis nonlinearity and trajectory control method thereof
CN112590792B (en) * 2020-12-18 2024-05-10 的卢技术有限公司 Vehicle convergence control method based on deep reinforcement learning algorithm
CN112590792A (en) * 2020-12-18 2021-04-02 的卢技术有限公司 Vehicle convergence control method based on deep reinforcement learning algorithm
CN112965489A (en) * 2021-02-05 2021-06-15 北京理工大学 Intelligent vehicle high-speed lane change planning method based on collision detection
CN112896191A (en) * 2021-03-08 2021-06-04 京东鲲鹏(江苏)科技有限公司 Trajectory processing method and apparatus, electronic device and computer readable medium
CN112937608B (en) * 2021-03-31 2022-06-21 吉林大学 Track prediction-based integrated rolling decision method and device for unmanned vehicle in ice and snow environment and storage medium
CN112937608A (en) * 2021-03-31 2021-06-11 吉林大学 Track prediction-based integrated rolling decision method and device for unmanned vehicle in ice and snow environment and storage medium
CN113191248A (en) * 2021-04-25 2021-07-30 国能智慧科技发展(江苏)有限公司 Vehicle deviation route detection system based on video linkage and intelligent Internet of things
WO2022237115A1 (en) * 2021-05-13 2022-11-17 中车长春轨道客车股份有限公司 Capability managing and energy saving assisted driving method for railway vehicle, and related device
CN113264059A (en) * 2021-05-17 2021-08-17 北京工业大学 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning
CN113177663A (en) * 2021-05-20 2021-07-27 启迪云控(上海)汽车科技有限公司 Method and system for processing intelligent network connection application scene
CN113177663B (en) * 2021-05-20 2023-11-24 云控智行(上海)汽车科技有限公司 Processing method and system of intelligent network application scene
CN113548047A (en) * 2021-06-08 2021-10-26 重庆大学 Personalized lane keeping auxiliary method and device based on deep learning
CN113511222A (en) * 2021-08-27 2021-10-19 清华大学 Scene self-adaptive vehicle interactive behavior decision and prediction method and device
CN113511222B (en) * 2021-08-27 2023-09-26 清华大学 Scene self-adaptive vehicle interaction behavior decision and prediction method and device
WO2023082726A1 (en) * 2021-11-12 2023-05-19 京东鲲鹏(江苏)科技有限公司 Lane changing strategy generation method and apparatus, computer storage medium, and electronic device
CN114084155B (en) * 2021-11-15 2023-10-20 清华大学 Predictive intelligent automobile decision control method and device, automobile and storage medium
CN114084155A (en) * 2021-11-15 2022-02-25 清华大学 Predictive intelligent automobile decision control method and device, vehicle and storage medium
CN114114929A (en) * 2022-01-21 2022-03-01 北京航空航天大学 Unmanned vehicle path tracking method based on LSSVM
CN114114929B (en) * 2022-01-21 2022-04-29 北京航空航天大学 Unmanned vehicle path tracking method based on LSSVM
CN115202341B (en) * 2022-06-16 2023-11-03 同济大学 Automatic driving vehicle lateral movement control method and system
CN115202341A (en) * 2022-06-16 2022-10-18 同济大学 Transverse motion control method and system for automatic driving vehicle
CN116088321A (en) * 2023-04-12 2023-05-09 宁波吉利汽车研究开发有限公司 Automatic driving decision control method and device and electronic equipment
CN116476825A (en) * 2023-05-19 2023-07-25 同济大学 Automatic driving lane keeping control method based on safe and reliable reinforcement learning
CN116476825B (en) * 2023-05-19 2024-02-27 同济大学 Automatic driving lane keeping control method based on safe and reliable reinforcement learning

Also Published As

Publication number Publication date
CN110568760B (en) 2021-07-02

Similar Documents

Publication Publication Date Title
CN110568760B (en) Parameterized learning decision control system and method suitable for lane changing and lane keeping
CN111845774B (en) Automatic driving automobile dynamic trajectory planning and tracking method based on transverse and longitudinal coordination
CN111338346B (en) Automatic driving control method and device, vehicle and storage medium
Chen et al. Human-centered trajectory tracking control for autonomous vehicles with driver cut-in behavior prediction
Rupp et al. Survey on control schemes for automated driving on highways
CN114379583B (en) Automatic driving vehicle track tracking system and method based on neural network dynamics model
Yoganandhan et al. Fundamentals and development of self-driving cars
Koga et al. Realization of different driving characteristics for autonomous vehicle by using model predictive control
Wu et al. Route planning and tracking control of an intelligent automatic unmanned transportation system based on dynamic nonlinear model predictive control
Kebbati et al. Lateral control for autonomous wheeled vehicles: A technical review
WO2024088068A1 (en) Automatic parking decision making method based on fusion of model predictive control and reinforcement learning
CN115303289A (en) Vehicle dynamics model based on depth Gaussian, training method, intelligent vehicle trajectory tracking control method and terminal equipment
Azam et al. N 2 C: neural network controller design using behavioral cloning
CN114030485A (en) Automatic driving automobile man lane change decision planning method considering attachment coefficient
CN114779641A (en) Environment self-adaptive MPC path tracking control method based on new course error definition
CN113184040B (en) Unmanned vehicle line-controlled steering control method and system based on steering intention of driver
Chen et al. An improved IOHMM-based stochastic driver lane-changing model
Chen et al. Online learning-informed feedforward-feedback controller synthesis for path tracking of autonomous vehicles
CN114077242A (en) Device and method for controlling a hardware agent in a control situation with a plurality of hardware agents
CN113033902A (en) Automatic driving track-changing planning method based on improved deep learning
Fehér et al. Proving ground test of a ddpg-based vehicle trajectory planner
CN115343950A (en) Vehicle path tracking control method and control system suitable for complex road surface
Zhan et al. Risk-aware lane-change trajectory planning with rollover prevention for autonomous light trucks on curved roads
Ting An output-feedback fuzzy approach to guaranteed cost control of vehicle lateral motion
Wang et al. Learning and generalizing motion primitives from driving data for path-tracking applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant