CN117666559A - Autonomous vehicle transverse and longitudinal decision path planning method, system, equipment and medium - Google Patents


Info

Publication number
CN117666559A
Authority
CN
China
Prior art keywords
decision model, longitudinal, decision, autonomous vehicle, transverse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311468384.5A
Other languages
Chinese (zh)
Other versions
CN117666559B (en)
Inventor
陈雪梅
徐书缘
朱宇臻
肖龙
薛杨武
沈晓旭
赵小萱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Advanced Technology Research Institute of Beijing Institute of Technology
Original Assignee
Beijing Institute of Technology BIT
Advanced Technology Research Institute of Beijing Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT and Advanced Technology Research Institute of Beijing Institute of Technology
Priority to CN202311468384.5A
Publication of CN117666559A
Application granted
Publication of CN117666559B
Legal status: Active
Anticipated expiration

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a method, a system, equipment and a medium for transverse and longitudinal decision path planning of an autonomous vehicle, relating to the technical field of vehicle driving decision-making. The method comprises: under global path navigation, obtaining candidate position points for each step by sampling offsets from the road center line; taking the positions and speeds of the autonomous vehicle and the environmental vehicles as state observables and the position point selected at each step as the action quantity to construct a transverse decision model, taking the accelerator-pedal opening and brake-pedal opening as the action quantities to construct a longitudinal decision model, designing reward functions, and training the transverse and longitudinal decision models; selecting the optimal position point for each step according to the trained transverse decision model, and obtaining a local path trajectory by polynomial fitting of the optimal position points; and, based on the local path trajectory, obtaining the speed control quantity according to the trained longitudinal decision model. The decision-planning effect under perceptual occlusion is thereby improved.

Description

Autonomous vehicle transverse and longitudinal decision path planning method, system, equipment and medium
Technical Field
The invention relates to the technical field of vehicle driving decision making, in particular to a method, a system, equipment and a medium for planning a transverse and longitudinal decision making path of an autonomous vehicle.
Background
In terms of safety and efficiency, unmanned vehicles have great advantages over manned vehicles. With the development of deep learning, learning-based methods, particularly reinforcement learning algorithms, have attracted wide attention in autonomous vehicle decision-planning research. However, relying entirely on conventional reinforcement learning algorithms cannot fully guarantee the safety and feasibility of trajectories; in addition, many reinforcement learning algorithms do not achieve high traffic efficiency.
Therefore, some scholars have integrated reinforcement learning into traditional decision planning, selecting local path points on unstructured roads based on reinforcement learning to guide local planning. However, this approach does not take perceptual occlusion into account and cannot adapt to scenes with perceptual uncertainty, such as a pedestrian suddenly emerging, which can easily cause an accident in a narrow field of view.
Other work builds a layered structure that decomposes the global task into several local subtasks and reaches the destination by accomplishing each subtask in turn; however, this architecture is relatively complex, and a fusion of traditional path planning with reinforcement learning algorithms is currently lacking.
Disclosure of Invention
In order to solve the above problems, the invention provides a method, a system, equipment and a medium for transverse and longitudinal decision path planning of an autonomous vehicle, which decouple the transverse and longitudinal decision problems, make decisions with a value-distributional reinforcement learning algorithm, and improve the decision-planning effect under perceptual occlusion.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, the present invention provides an autonomous vehicle transverse and longitudinal decision path planning method, comprising:
under global path navigation, obtaining candidate position points for each step by sampling offsets from the road center line;
taking the positions and speeds of the autonomous vehicle and the environmental vehicles as state observables and the position point selected at each step as the action quantity to construct a transverse decision model; taking the accelerator-pedal opening and brake-pedal opening as the action quantities to construct a longitudinal decision model; and designing reward functions for the interaction with the environment after the autonomous vehicle executes an action, so as to train the transverse decision model and the longitudinal decision model;
selecting the optimal position point for each step according to the trained transverse decision model, and obtaining a local path trajectory by polynomial fitting of the optimal position points;
and, based on the local path trajectory, obtaining the speed control quantity according to the trained longitudinal decision model.
As an alternative embodiment, the reward function of the longitudinal decision model comprises: a safety reward function, an efficiency reward function, a destination-arrival reward function, and a comfort reward.
As an alternative embodiment, the safety reward function assigns a reward value in the event of a collision;
the destination-arrival reward function assigns a reward value if the destination is reached, and another reward value if the destination is not reached but no collision occurs;
the efficiency reward function controls the vehicle speed toward a desired vehicle speed;
the comfort reward r_comf is expressed as a function of the vehicle acceleration a.
As an alternative embodiment, the reward function of the transverse decision model comprises: the reward function of the longitudinal decision model, a reference-line reward function, and a lane-change reward function.
As an alternative embodiment, the reference-line reward function assigns a reward value if the vehicle is on the reference line;
the lane-change reward function assigns a reward value if a lane change occurs.
As an alternative implementation, when training the transverse decision model and the longitudinal decision model, the fully parameterized quantile function of value-distributional reinforcement learning is adopted; the longitudinal decision model is trained first, then the transverse decision model, after which iterative optimization is performed; during iterative optimization, the policy in one direction is fixed while the policy in the other direction is learned.
In the transverse decision model and the longitudinal decision model, the positions and speeds of the autonomous vehicle and the environmental vehicles are taken as state observables to form an observation space, which comprises the position differences and speed differences between the autonomous vehicle and the environmental vehicles; the state observables at different moments are stacked to form the state space.
In a second aspect, the present invention provides an autonomous vehicle transverse and longitudinal decision path planning system, comprising:
the sampling module is configured to obtain a position point of each step based on the sampling offset of the central line of the road under the global path navigation;
the model training module, configured to take the positions and speeds of the autonomous vehicle and the environmental vehicles as state observables, take the position point selected at each step as the action quantity to construct a transverse decision model, take the accelerator-pedal opening and brake-pedal opening as the action quantities to construct a longitudinal decision model, and design reward functions for the interaction with the environment after the autonomous vehicle executes an action, so as to train the transverse decision model and the longitudinal decision model;
the transverse decision module is configured to select the optimal position point of each step length according to the trained transverse decision model, and obtain a local path track after polynomial fitting of the optimal position point of each step length;
and the longitudinal decision module is configured to obtain the speed control quantity according to the trained longitudinal decision model based on the local path track.
In a third aspect, the invention provides an electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the method of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a method for planning a transverse and longitudinal decision path of an autonomous vehicle, which provides a decoupled layered framework, decouples a transverse and longitudinal decision problem, wherein the transverse decision problem is the planning of a local path track, firstly, the position point of each step length is obtained based on a sampling method under global path navigation, then the position point of each step length is decided by using a completely parameterized quantile function of value distributed reinforcement learning, the optimal position point of each step length is selected, and finally, the local path track is generated by using polynomial fitting of traditional path planning; the longitudinal decision-making problem is that the speed control quantity is subjected to reinforcement learning based on the local path track, the decision is made by utilizing a completely parameterized quantile function algorithm of value distributed reinforcement learning, a traditional path planning and value distributed reinforcement learning algorithm are fused, the reinforced learning neural network can fit the total rewarding value (namely Q value) under risk distribution, the safety action under perceived occlusion is potentially learned, the risk in an uncertainty environment is learned, and the decision planning effect under perceived occlusion is improved.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flow chart of an autonomous vehicle transverse and longitudinal decision path planning method provided in embodiment 1 of the present invention;
fig. 2 is a schematic diagram of transverse track sampling provided in embodiment 1 of the present invention;
FIG. 3 is a diagram of a framework of a transversal decision problem provided in embodiment 1 of the present invention;
FIG. 4 is a schematic view of a transversal and longitudinal iterative optimization provided in embodiment 1 of the present invention;
fig. 5 is a longitudinal strategy training convergence graph based on real vehicle data according to embodiment 1 of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
This embodiment provides an autonomous vehicle transverse and longitudinal decision path planning method suitable for perceptually occluded environments. It is an autonomous vehicle decision-planning technique that realizes a decoupled layered structure by combining value-distributional reinforcement learning with a traditional path planning method, and it effectively improves traffic safety and traffic efficiency by accounting for potentially occluded obstacles.
As shown in fig. 1, the method specifically includes:
under global path navigation, obtaining candidate position points for each step by sampling offsets from the road center line;
taking the positions and speeds of the autonomous vehicle and the environmental vehicles as state observables and the position point selected at each step as the action quantity to construct a transverse decision model; taking the accelerator-pedal opening and brake-pedal opening as the action quantities to construct a longitudinal decision model; and designing reward functions for the interaction with the environment after the autonomous vehicle executes an action, so as to train the transverse decision model and the longitudinal decision model;
selecting the optimal position point for each step according to the trained transverse decision model, and obtaining a local path trajectory by polynomial fitting of the optimal position points;
and obtaining the speed control quantity according to the trained longitudinal decision model, based on the local path trajectory.
In this embodiment, in the Frenet coordinate system (also referred to as the S-L coordinate system) in an urban environment, offsets are sampled based on the road center line; as shown in fig. 2, Δs denotes the difference in longitudinal displacement and Δl denotes the difference in lateral displacement.
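The offset sampling described above can be sketched as follows; the function name, parameter names and the choice of evenly spaced offsets are illustrative assumptions, not the patent's exact implementation.

```python
import numpy as np

def sample_candidate_points(s0, delta_s, max_offset, n_offsets=5):
    """Candidate position points one decision step ahead in Frenet (S-L) coordinates.

    s0: current longitudinal position along the road center line.
    delta_s: longitudinal advance per step (the displacement difference Δs).
    max_offset: largest lateral offset Δl sampled on either side of the center line.
    """
    s_next = s0 + delta_s
    # Evenly spaced lateral offsets about the center line (l = 0); one of these
    # points is later chosen by the transverse decision model as the action.
    offsets = np.linspace(-max_offset, max_offset, n_offsets)
    return [(s_next, float(l)) for l in offsets]

candidates = sample_candidate_points(s0=10.0, delta_s=5.0, max_offset=1.5)
```

Each tuple is an (S, L) candidate; the transverse decision model's action is the index of the chosen candidate.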
The Frenet coordinate system represents road position more intuitively than traditional Cartesian coordinates; in Frenet coordinates, the position of the vehicle on the road is described by the variables S and L, where S represents the distance along the road (also referred to as the longitudinal displacement) and L represents the left-right position on the road (also referred to as the lateral displacement).
Because the vehicle kinematic model cannot directly follow a polyline, the local start point and local target point of the trajectory cannot simply be connected by straight lines; this embodiment therefore fits the trajectory between path points with a quintic (fifth-degree) polynomial.
It should be noted that the quintic polynomial is fitted to the S-L path points in the Frenet coordinate system and is independent of time information.
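A minimal sketch of such a quintic fit between two S-L path points, assuming boundary conditions on the lateral offset l and its first two derivatives with respect to s (the function names and boundary-condition choice are illustrative):

```python
import numpy as np

def quintic_coeffs(s0, bc0, s1, bc1):
    """Coefficients c_0..c_5 of l(s) = sum c_k * s**k.

    bc0 and bc1 are (l, dl/ds, d2l/ds2) at the start point s0 and end point s1:
    matching position, heading and a curvature-related term at both ends gives
    a 6x6 linear system with a unique quintic solution.
    """
    def rows(s):
        return [[1, s, s**2, s**3, s**4, s**5],        # l(s)
                [0, 1, 2*s, 3*s**2, 4*s**3, 5*s**4],   # l'(s)
                [0, 0, 2, 6*s, 12*s**2, 20*s**3]]      # l''(s)
    A = np.array(rows(s0) + rows(s1), dtype=float)
    b = np.array(list(bc0) + list(bc1), dtype=float)
    return np.linalg.solve(A, b)

def eval_quintic(c, s):
    return float(sum(ck * s**k for k, ck in enumerate(c)))

# Smooth segment: start on the center line, end 1.5 m to one side,
# with zero slope and zero second derivative at both ends.
c = quintic_coeffs(0.0, (0.0, 0.0, 0.0), 5.0, (1.5, 0.0, 0.0))
```

With zero first and second derivatives at both ends, the resulting curve is the familiar smooth-step profile used for lane-change-like segments.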
In this embodiment, the autonomous vehicle transverse and longitudinal decision path planning problem is decoupled into a transverse decision problem and a longitudinal decision problem, used respectively for planning the local path trajectory and planning the speed.
The transverse decision problem is decomposed into two parts, as shown in fig. 3: first, candidate position points for each step are obtained by sampling under the global path navigation, and the position point for each step is then decided with the fully parameterized quantile function (Fully parameterized Quantile Function, FQF) of value-distributional reinforcement learning, selecting the optimal position point for each step; second, the optimal position points are fitted with a quintic polynomial to generate the local path trajectory.
The longitudinal decision problem is to learn the speed control quantity based on the local path trajectory.
Therefore, this embodiment formulates the transverse and longitudinal decision problems as two separate Markov decision processes and makes decisions with the FQF algorithm.
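In FQF, the network outputs both the quantile fractions and the quantile values of the return distribution, and the Q value used for greedy action selection is the expectation of that distribution. The sketch below shows only this final expectation-and-argmax step over given quantiles; it is a simplification for illustration, not the full FQF training procedure used in the patent.

```python
import numpy as np

def q_from_quantiles(taus, quantile_values):
    """Q(s, a) as the expectation of the learned return distribution.

    taus: increasing quantile fractions with taus[0] = 0 and taus[-1] = 1, shape (N+1,).
    quantile_values: return quantiles for each interval, shape (N, n_actions).
    """
    weights = np.diff(taus)           # probability mass of each quantile interval
    return weights @ quantile_values  # expected return per action, shape (n_actions,)

def greedy_action(taus, quantile_values):
    """Pick the candidate position point (action) with the highest expected return."""
    return int(np.argmax(q_from_quantiles(taus, quantile_values)))

# Two actions, two quantile intervals of mass 0.5 each.
taus = np.array([0.0, 0.5, 1.0])
z = np.array([[1.0, 0.0],    # lower-interval quantile values for actions 0 and 1
              [3.0, 10.0]])  # upper-interval quantile values
q_values = q_from_quantiles(taus, z)
best = greedy_action(taus, z)
```

Because the whole distribution is represented rather than a single mean, risk in uncertain (e.g. occluded) situations is reflected in the quantile spread.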
In this embodiment, the transverse decision model is constructed by taking the positions and speeds of the autonomous vehicle and the environmental vehicles as state observables and the position point selected at each step as the action quantity; the longitudinal decision model is constructed by taking the positions and speeds of the autonomous vehicle and the environmental vehicles as state observables and the accelerator-pedal opening and brake-pedal opening as the action quantities;
wherein:
(1) The positions and speeds of the autonomous vehicle and the environmental vehicles are taken as state observables, forming the observation space o_t, which comprises the position differences and speed differences between the autonomous vehicle and the environmental vehicles; the state observables at different moments are stacked to form the state space s_t:
o_t = [p_1 - p_e, v_1 - v_e, p_2 - p_e, v_2 - v_e] (1)
s_t = [o_t, o_{t-1}, o_{t-2}] (2)
where o_t, o_{t-1} and o_{t-2} are the observation spaces at times t, t-1 and t-2; s_t is the state space at time t; subscripts 1 and 2 denote the first and second environmental vehicles and e denotes the autonomous vehicle; p_1, p_2 and p_e are the position coordinates of the first environmental vehicle, the second environmental vehicle and the autonomous vehicle; and v_1, v_2 and v_e are the speeds of the first environmental vehicle, the second environmental vehicle and the autonomous vehicle, respectively.
(2) The action quantity of the transverse decision model is the choice of one position point from the several position points available at each step;
the action quantity of the longitudinal decision model is the accelerator-pedal opening and the brake-pedal opening, which adjust the speed.
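The observation and state construction of equations (1) and (2) can be sketched as follows; the dictionary layout, the two-dimensional positions, and the zero-padding of the initial frames are illustrative assumptions:

```python
from collections import deque
import numpy as np

def make_observation(ego, vehicles):
    """o_t: position and speed differences of each environmental vehicle w.r.t. the ego."""
    obs = []
    for v in vehicles:
        obs.extend([v["pos"][0] - ego["pos"][0],
                    v["pos"][1] - ego["pos"][1],
                    v["speed"] - ego["speed"]])
    return np.array(obs, dtype=float)

class StateStacker:
    """s_t = [o_t, o_{t-1}, o_{t-2}]: stack of the last k observations."""
    def __init__(self, k=3, obs_dim=6):
        self.frames = deque([np.zeros(obs_dim)] * k, maxlen=k)

    def push(self, obs):
        self.frames.append(obs)
        # Newest observation first, matching s_t = [o_t, o_{t-1}, o_{t-2}].
        return np.concatenate(list(self.frames)[::-1])

ego = {"pos": (0.0, 0.0), "speed": 5.0}
vehicles = [{"pos": (10.0, 0.0), "speed": 4.0},
            {"pos": (-3.0, 1.0), "speed": 6.0}]
stacker = StateStacker()
s_t = stacker.push(make_observation(ego, vehicles))
```

At the first step, the earlier frames o_{t-1} and o_{t-2} are still zero; after three pushes the stack is fully populated.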
In this embodiment, designing a reward function for interaction with the environment after the autonomous vehicle performs the action specifically includes:
(1) The reward function r_lon of the longitudinal decision model comprises: a safety reward function r_safe, an efficiency reward function r_eff, a destination-arrival reward function r_dest, and a comfort reward r_comf, as given by equations (3)-(7):
r_safe = p_7 if a collision occurs, and 0 otherwise (3)
r_eff penalizes the deviation of the current vehicle speed v from the desired vehicle speed v_des, so as to keep the speed close to the desired speed (4)
r_dest = p_8 if the destination is reached; p_9 if the destination is not reached but no collision occurs; and 0 otherwise (5)
r_comf is a function of the vehicle acceleration a (6)
r_lon = r_safe + r_eff + r_dest + r_comf (7)
where v_des is the desired vehicle speed, v is the current vehicle speed, and a is the vehicle acceleration; equation (7) is the overall reward function of the longitudinal decision model.
(2) In the transverse decision model, the lateral offset from the reference line and the lateral displacement variation are additionally considered; that is, the reward function r_lat of the transverse decision model comprises: the safety reward function r_safe, the efficiency reward function r_eff, the destination-arrival reward function r_dest, the comfort reward r_comf, a reference-line reward function r_ref, and a lane-change reward function r_lane, as given by equations (8)-(10):
r_ref = p_10 if the vehicle is on the reference line, and p_11 otherwise (8)
r_lane = p_12 if the current action quantity a_t produces a lane change, and 0 otherwise (9)
r_lat = r_safe + r_eff + r_dest + r_comf + r_ref + r_lane (10)
where a_t is the current action quantity; equation (10) is the overall reward function of the transverse decision model. p_7 to p_12 are hyperparameters, with the specific values p_7 = -1000, p_8 = 500, p_9 = -200, p_10 = -1000, p_11 = 500, p_12 = -200.
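Under the hyperparameter values above, the longitudinal reward of equations (3)-(7) can be sketched as follows; the exact closed forms of the efficiency and comfort terms are not given in the text, so the absolute-deviation penalties and the weights w_eff and w_comf are illustrative assumptions:

```python
def longitudinal_reward(collided, reached, v, v_des, a,
                        p7=-1000.0, p8=500.0, p9=-200.0,
                        w_eff=1.0, w_comf=0.1):
    """r_lon = r_safe + r_eff + r_dest + r_comf, following equations (3)-(7)."""
    r_safe = p7 if collided else 0.0               # (3) collision penalty
    r_eff = -w_eff * abs(v_des - v)                # (4) push speed toward v_des (assumed form)
    if reached:                                    # (5) destination-arrival reward
        r_dest = p8
    elif not collided:
        r_dest = p9
    else:
        r_dest = 0.0
    r_comf = -w_comf * abs(a)                      # (6) comfort penalty (assumed form)
    return r_safe + r_eff + r_dest + r_comf        # (7)
```

The transverse reward of equation (10) would add the reference-line and lane-change terms to the same sum.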
In this embodiment, when training the transverse decision model and the longitudinal decision model, the longitudinal decision model is trained first, then the transverse decision model, after which iterative optimization is performed; during iterative optimization, the policy in one direction is fixed while the policy in the other direction is learned, as shown in fig. 4.
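The alternating scheme described above and in fig. 4 can be sketched as follows; the trainer-callback interface and the stub trainers are illustrative assumptions:

```python
def alternating_training(train_longitudinal, train_transverse, n_rounds=2):
    """Train the longitudinal policy first, then the transverse policy, then refine
    the two in alternation; while one policy is being trained, the other is passed
    in as a fixed (non-updated) policy."""
    lon = train_longitudinal(fixed_transverse=None)     # initial longitudinal training
    lat = train_transverse(fixed_longitudinal=lon)      # transverse with longitudinal fixed
    for _ in range(n_rounds):
        lon = train_longitudinal(fixed_transverse=lat)  # longitudinal with transverse fixed
        lat = train_transverse(fixed_longitudinal=lon)  # transverse with longitudinal fixed
    return lon, lat

# Minimal demonstration with stub trainers that only count invocations.
calls = {"lon": 0, "lat": 0}

def _stub_lon(fixed_transverse=None):
    calls["lon"] += 1
    return ("lon", calls["lon"])

def _stub_lat(fixed_longitudinal=None):
    calls["lat"] += 1
    return ("lat", calls["lat"])

final_lon, final_lat = alternating_training(_stub_lon, _stub_lat, n_rounds=2)
```

Holding one policy fixed keeps each training phase a stationary single-agent problem for the other policy.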
Experiment verification
This embodiment provides a training mode based on real-vehicle data, used mainly to verify the intersection pedestrian-emergence scene. The real-vehicle data introduced are real environment data collected at an urban T-shaped intersection with an autonomous driving platform; the data rely mainly on a camera and a lidar, where the lidar obtains the distance between a pedestrian and the platform and the camera assists the lidar in acquiring information. From the lidar point cloud and the camera data, pedestrians can be accurately identified and their relative pose obtained.
Pedestrian-emergence data were acquired at the T-junction, with 100 sets of pedestrian data collected in this scenario; this amount of data is sufficient for pedestrians but insufficient for vehicles. The initial position of the vehicle was therefore expanded by data augmentation, generating an initial vehicle position for each episode together with the corresponding real pedestrian data, and a longitudinal policy training convergence curve was obtained by replaying the real-vehicle data over multiple training runs, as shown in fig. 5, where the gray shaded part of the curve is the confidence interval over the multiple training runs.
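The initial-position augmentation can be sketched as follows; the uniform perturbation range, the seed, and the function name are illustrative assumptions, not the patent's exact augmentation scheme:

```python
import random

def augment_initial_positions(base_s, n_episodes, max_offset=5.0, seed=0):
    """Expand the limited real-vehicle data by perturbing the ego vehicle's initial
    longitudinal position; each generated episode replays the recorded pedestrian
    data from a different vehicle start position."""
    rng = random.Random(seed)
    return [base_s + rng.uniform(-max_offset, max_offset) for _ in range(n_episodes)]

starts = augment_initial_positions(base_s=20.0, n_episodes=100)
```

Seeding the generator makes each augmented training set reproducible across runs.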
Example 2
The embodiment provides an autonomous vehicle transverse and longitudinal decision path planning system, comprising:
the sampling module is configured to obtain a position point of each step based on the sampling offset of the central line of the road under the global path navigation;
the model training module, configured to take the positions and speeds of the autonomous vehicle and the environmental vehicles as state observables, take the position point selected at each step as the action quantity to construct a transverse decision model, take the accelerator-pedal opening and brake-pedal opening as the action quantities to construct a longitudinal decision model, and design reward functions for the interaction with the environment after the autonomous vehicle executes an action, so as to train the transverse decision model and the longitudinal decision model;
the transverse decision module is configured to select the optimal position point of each step length according to the trained transverse decision model, and obtain a local path track after polynomial fitting of the optimal position point of each step length;
and the longitudinal decision module is configured to obtain the speed control quantity according to the trained longitudinal decision model based on the local path track.
It should be noted that the above modules correspond to the steps described in embodiment 1; the modules and the corresponding steps share the same examples and application scenarios but are not limited to the content disclosed in embodiment 1. The modules may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
In further embodiments, there is also provided:
an electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the method described in embodiment 1. For brevity, the description is omitted here.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate array FPGA or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.
The method in embodiment 1 may be directly embodied as a hardware processor executing or executed with a combination of hardware and software modules in the processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
Those of ordinary skill in the art will appreciate that the elements of the various examples described in connection with the present embodiments, i.e., the algorithm steps, can be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. An autonomous vehicle transverse and longitudinal decision path planning method, comprising:
under global path navigation, obtaining a position point of each step based on the sampling offset of the central line of the road;
the method comprises the steps of taking the positions and the speeds of an autonomous vehicle and an environmental vehicle as state observables, taking a position point selected under each step as an action quantity to construct a transverse decision model, taking the opening of an accelerator pedal and the opening of a brake pedal as action quantities to construct a longitudinal decision model, and designing a reward function interacted with the environment after the autonomous vehicle executes the action to train the transverse decision model and the longitudinal decision model;
selecting an optimal position point of each step according to the trained transverse decision model, and obtaining a local path track after polynomial fitting of the optimal position point of each step;
and obtaining the speed control quantity according to the trained longitudinal decision model based on the local path track.
2. The autonomous vehicle transverse and longitudinal decision path planning method of claim 1, wherein the reward function of the longitudinal decision model comprises: a safety reward function, an efficiency reward function, a destination-arrival reward function, and a comfort reward.
3. The autonomous vehicle transverse and longitudinal decision path planning method of claim 2, wherein the safety reward function assigns a reward value in the event of a collision;
the destination-arrival reward function assigns a reward value if the destination is reached, and another reward value if the destination is not reached but no collision occurs;
the efficiency reward function controls the vehicle speed toward a desired vehicle speed; and
the comfort reward r_comf is expressed as a function of the vehicle acceleration a.
4. The autonomous vehicle transverse and longitudinal decision path planning method of claim 2, wherein the reward function of the transverse decision model comprises: the reward function of the longitudinal decision model, a reference-line reward function, and a lane-change reward function.
5. The autonomous vehicle transverse and longitudinal decision path planning method of claim 4, wherein the reference line reward function represents a reward value given if the vehicle is on the reference line;
and the lane change reward function means that a reward value is given if a lane change occurs.
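The reward structure of claims 2 through 5 can be sketched as follows. All numeric weights and magnitudes here are illustrative placeholders (the patent does not disclose them), and the function names are hypothetical; only the decomposition into safety, destination, efficiency, comfort, reference-line, and lane-change terms follows the claims.

```python
def longitudinal_reward(collided, reached_goal, v, v_des, a):
    """Sketch of the longitudinal reward: safety + destination-arrival
    + efficiency-pass + comfort terms.  Weights are illustrative."""
    r_safe = -10.0 if collided else 0.0                 # collision penalty
    r_goal = 10.0 if reached_goal else (1.0 if not collided else 0.0)
    r_eff = -abs(v - v_des) / max(v_des, 1e-6)          # track the desired speed
    r_comfort = -abs(a)                                 # penalize harsh acceleration
    return r_safe + r_goal + r_eff + r_comfort

def transverse_reward(collided, reached_goal, v, v_des, a,
                      on_ref_line, lane_changed):
    """Per claim 4: longitudinal terms plus reference-line and
    lane-change terms."""
    r = longitudinal_reward(collided, reached_goal, v, v_des, a)
    r += 0.5 if on_ref_line else 0.0     # reward staying on the reference line
    r += -0.5 if lane_changed else 0.0   # discourage unnecessary lane changes
    return r
```

A safe arrival at the desired speed with zero acceleration thus scores highest, while a collision dominates every other term.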
6. The autonomous vehicle transverse and longitudinal decision path planning method of claim 1, wherein, when training the transverse decision model and the longitudinal decision model, a fully parameterized quantile function of value-distribution reinforcement learning is adopted; the longitudinal decision model is trained first, then the transverse decision model, after which the two are optimized iteratively; during iterative optimization, the strategy in one direction is fixed while the strategy in the other direction is learned.
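The alternating scheme of claim 6 is a coordinate-ascent-style loop: one policy is frozen while the other learns. The sketch below stubs out the fully-parameterized-quantile-function (FQF) update with a counter, since the patent does not disclose the network details; `Agent`, `train_one`, and all episode counts are hypothetical.

```python
class Agent:
    """Minimal stand-in for an FQF-based policy; the real agent would
    learn a fully parameterized quantile function of the return."""
    def __init__(self, name):
        self.name = name
        self.updates = 0

    def act(self, obs):
        return 0.0            # a frozen policy simply emits its current action

    def update(self, transition):
        self.updates += 1     # placeholder for the FQF gradient step

def train_one(learner, frozen, episodes):
    """Only `learner` receives updates; `frozen` supplies the
    other control axis (environment interaction elided)."""
    for _ in range(episodes):
        obs = None
        _ = frozen.act(obs)
        learner.update(transition=None)

def alternating_training(long_agent, lat_agent, rounds=2, episodes=10):
    """Longitudinal first, then transverse, then alternate with one
    side fixed, as described in claim 6."""
    train_one(long_agent, lat_agent, episodes)
    train_one(lat_agent, long_agent, episodes)
    for _ in range(rounds):
        train_one(long_agent, lat_agent, episodes)
        train_one(lat_agent, long_agent, episodes)

long_agent, lat_agent = Agent("longitudinal"), Agent("transverse")
alternating_training(long_agent, lat_agent)
```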
7. The autonomous vehicle transverse and longitudinal decision path planning method of claim 1, wherein, in the transverse decision model and the longitudinal decision model, the positions and speeds of the autonomous vehicle and the environmental vehicle are taken as state observables to form an observation space; the observation space comprises the position difference and the speed difference between the autonomous vehicle and the environmental vehicle, and the state observables at different moments are stacked to form the state space.
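The frame-stacking of claim 7 can be sketched as a fixed-length buffer of relative observables. The frame layout (position difference in x and y plus speed difference) and the stack depth of four are illustrative assumptions; the claims specify only that differences are observed and that observables from different moments are stacked.

```python
from collections import deque
import numpy as np

class StackedObservation:
    """State builder: each frame holds the ego/environment position and
    speed differences; the last `n_frames` frames form the state."""
    def __init__(self, n_frames=4, frame_dim=3):
        self.frames = deque([np.zeros(frame_dim) for _ in range(n_frames)],
                            maxlen=n_frames)

    def step(self, ego, env_veh):
        # ego / env_veh: (x, y, v) tuples; the frame stores their differences
        frame = np.array([env_veh[0] - ego[0],
                          env_veh[1] - ego[1],
                          env_veh[2] - ego[2]])
        self.frames.append(frame)               # oldest frame drops out
        return np.concatenate(self.frames)      # stacked state vector

stacker = StackedObservation()
state = stacker.step((0.0, 0.0, 10.0), (20.0, 3.5, 8.0))
```

With a depth of four and three observables per frame, the state vector has twelve components, the newest frame occupying the final three.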
8. An autonomous vehicle transverse and longitudinal decision path planning system, comprising:
a sampling module configured to obtain the position points of each step based on sampled offsets from the road center line under global path navigation;
a model training module configured to take the positions and speeds of the autonomous vehicle and the environmental vehicle as state observables, take the position point selected at each step as the action quantity to construct a transverse decision model, take the accelerator pedal opening and the brake pedal opening as the action quantities to construct a longitudinal decision model, and design a reward function for the interaction with the environment after the autonomous vehicle executes an action, so as to train the transverse decision model and the longitudinal decision model;
a transverse decision module configured to select the optimal position point for each step according to the trained transverse decision model, and obtain a local path trajectory after polynomial fitting of the optimal position points;
and a longitudinal decision module configured to obtain the speed control quantity according to the trained longitudinal decision model based on the local path trajectory.
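The sampling module of claim 8 can be sketched as generating, at each look-ahead step along the global path, a set of candidate position points as lateral offsets from the road centreline; the transverse decision model then picks one candidate per step as its action. The step length, number of steps, and offset grid below are hypothetical, as is the straight centreline.

```python
def sample_candidate_points(centerline_y=0.0, step_x=5.0, n_steps=4,
                            offsets=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """For each look-ahead step, return candidate (x, y) position points
    as lateral offsets from the road centreline (assumed straight here)."""
    candidates = []
    for i in range(1, n_steps + 1):
        x = i * step_x                                    # longitudinal station
        candidates.append([(x, centerline_y + d) for d in offsets])
    return candidates

cands = sample_candidate_points()
```

In a real implementation the offsets would be applied perpendicular to a curved centreline in a Frenet-style frame rather than along a fixed y axis.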
9. An electronic device, comprising a memory, a processor, and computer instructions stored on the memory and runnable on the processor, wherein the computer instructions, when executed by the processor, perform the method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of any of claims 1-7.
CN202311468384.5A 2023-11-07 2023-11-07 Autonomous vehicle transverse and longitudinal decision path planning method, system, equipment and medium Active CN117666559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311468384.5A CN117666559B (en) 2023-11-07 2023-11-07 Autonomous vehicle transverse and longitudinal decision path planning method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN117666559A true CN117666559A (en) 2024-03-08
CN117666559B CN117666559B (en) 2024-07-02

Family

ID=90072318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311468384.5A Active CN117666559B (en) 2023-11-07 2023-11-07 Autonomous vehicle transverse and longitudinal decision path planning method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN117666559B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200043324A1 (en) * 2017-11-01 2020-02-06 Tencent Technology (Shenzhen) Company Limited Method for obtaining road condition information, apparatus thereof, and storage medium
CN111383474A (en) * 2018-12-29 2020-07-07 长城汽车股份有限公司 Decision making system and method for automatically driving vehicle
CN114435396A (en) * 2022-01-07 2022-05-06 北京理工大学前沿技术研究院 Intelligent vehicle intersection behavior decision method
CN114919578A (en) * 2022-07-20 2022-08-19 北京理工大学前沿技术研究院 Intelligent vehicle behavior decision method, planning method, system and storage medium
CN115257746A (en) * 2022-07-21 2022-11-01 同济大学 Uncertainty-considered decision control method for lane change of automatic driving automobile
US11748664B1 (en) * 2023-03-31 2023-09-05 Geotab Inc. Systems for creating training data for determining vehicle following distance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Su Weixing et al., "Local path planning algorithm for autonomous driving based on environmental risk", Information and Control, vol. 52, no. 3, 3 November 2022 (2022-11-03), pages 369-381 *
Chen Xuemei et al., "Driving rule acquisition and decision algorithm for unmanned vehicles in urban environments", Transactions of Beijing Institute of Technology, vol. 37, no. 5, 15 May 2017 (2017-05-15), pages 491-496 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant