CN113625718A - Method for planning driving path of vehicle - Google Patents

Method for planning driving path of vehicle

Info

Publication number
CN113625718A
CN113625718A
Authority
CN
China
Prior art keywords
vehicle
network
environment
value
speed
Prior art date
Legal status
Granted
Application number
CN202110927868.6A
Other languages
Chinese (zh)
Other versions
CN113625718B (en)
Inventor
莫建林
赖哲渊
张汉驰
Current Assignee
SAIC Volkswagen Automotive Co Ltd
Original Assignee
SAIC Volkswagen Automotive Co Ltd
Priority date
Filing date
Publication date
Application filed by SAIC Volkswagen Automotive Co Ltd filed Critical SAIC Volkswagen Automotive Co Ltd
Priority to CN202110927868.6A priority Critical patent/CN113625718B/en
Publication of CN113625718A publication Critical patent/CN113625718A/en
Application granted granted Critical
Publication of CN113625718B publication Critical patent/CN113625718B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0238Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
    • G05D1/024Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Optics & Photonics (AREA)
  • Electromagnetism (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides a method for planning a driving path of a vehicle, comprising the following steps: generating an environment state feature map sequence of the host vehicle based on map data and target tracking results; acquiring vehicle state information of the host vehicle; inputting the environment state feature map sequence of the host vehicle and the host vehicle state information into a path planning model as environment and state data; and acquiring the planned trajectory of the host vehicle output by the path planning model. The method enables reinforcement learning and training of the path planning model, so that it can be better applied to autonomous-driving scenarios.

Description

Method for planning driving path of vehicle
Technical Field
The present invention relates to the field of automatic driving, and in particular, to a method and an apparatus for planning a driving path of a vehicle, and a computer readable medium.
Background
Some automatic-driving path planning methods are based on a modular approach: the automatic-driving system is explicitly divided into an environment perception module, a prediction module, a path planning module and a control module, with the output of each module used as the input of the next. After the perception and prediction results are obtained, the path planning module searches, using optimization methods such as dynamic programming, for an optimal planned path that satisfies a defined loss function. Other methods adopt imitation learning, which is based on deep learning and learns from large data sets of expert driving through a deep neural network model in order to learn an optimized planned path.
The modular approach is highly interpretable, unlike the essentially uninterpretable black-box mode of deep learning, but it has drawbacks: the modules influence one another, the accuracy of the perception and prediction modules strongly constrains the design of the downstream planning module, and inter-module interference is severe.
Although imitation-learning methods based on deep learning can directly learn a planned trajectory or a driving behavior end to end, they have drawbacks: a very large amount of labeled data is needed for learning, and the learned model only copes with scene patterns that appeared during learning. Data covering extreme scenes (such as running a red light, illegal driving, or vehicle collisions) is extremely rare in expert data, so the learned model cannot be applied well to automatic-driving scenarios.
Disclosure of Invention
The invention aims to provide a method for planning a driving path of a vehicle, which realizes the reinforcement learning and training of a path planning model and enables the path planning model to be better applied to the scene of vehicle automatic driving.
In order to solve the above technical problem, the invention provides a method for planning a driving path of a vehicle, comprising the following steps: generating an environment state feature map sequence of the host vehicle based on map data and target tracking results; acquiring vehicle state information of the host vehicle; inputting the environment state feature map sequence of the host vehicle and the host vehicle state information into a path planning model as environment and state data; and acquiring the planned trajectory of the host vehicle output by the path planning model.
In an embodiment of the present invention, the method further includes: and obtaining a model return estimation value output by the path planning model, and evaluating the model based on the return estimation value.
In an embodiment of the present invention, the path planning model includes a backbone neural network, a first feature vectorization module, a first fully connected network, a second fully connected network, a second feature vectorization module, a third fully connected network, and a fourth fully connected network, connected in sequence; the environment state feature map sequence of the host vehicle is input into the backbone neural network, and the vehicle state information is input into the first feature vectorization module.
In an embodiment of the present invention, the environment and state data are updated through the operation of a controller module and an environment and state data calculation module; wherein the second fully connected network inputs the planned trajectory data of the host vehicle into the controller module, the controller module controls the host vehicle to travel and inputs first data generated by the travel of the host vehicle and second data generated by observing the surroundings of the host vehicle into the environment and state data calculation module, and the environment and state data calculation module updates the environment and state data based on the first data and the second data; the fourth fully connected network outputs the model return estimate.
In an embodiment of the invention, the environment and state data calculation module outputs a vehicle state transition reward value of the path planning model.
In an embodiment of the present invention, the vehicle state information includes a speed, an acceleration, a heading angle and a heading angular velocity.
In an embodiment of the present invention, obtaining the model return estimation value output by the path planning model includes: calculating a vehicle speed reward value of the path planning model; calculating a vehicle position reward value of the path planning model; deriving a vehicle state transition reward value based on the vehicle speed reward value and the vehicle location reward value; and calculating to obtain the model return estimation value based on the vehicle state transition reward value.
In an embodiment of the present invention, calculating the vehicle speed reward value of the path planning model includes: obtaining a reward parameter G_speed from the actual speed V_real of the host vehicle and the desired speed V_exp of the host vehicle; and obtaining the vehicle speed reward value r_t,speed from the reward parameter G_speed. In an embodiment of the present invention, obtaining the reward parameter G_speed from the actual speed V_real and the desired speed V_exp includes:

[formula for G_speed given as an image in the original]

where |V_real − V_exp| denotes the absolute value of V_real − V_exp.

In an embodiment of the invention, obtaining the vehicle speed reward value r_t,speed from the reward parameter G_speed includes: when G_speed > 1 or G_speed = 1, r_t,speed = 0; when G_speed = 0, r_t,speed = 1; when 0 < G_speed < 1,

[formula for r_t,speed given as an image in the original]
In an embodiment of the present invention, the calculation of the desired speed V_exp of the host vehicle includes: when the host vehicle encounters a red-light road condition and the distance between the host vehicle and the red-light stop line is greater than L1, the desired speed of the host vehicle is

V_exp = V_exp,max;

when the distance between the host vehicle and the red-light stop line is less than or equal to L1, the desired speed of the host vehicle decelerates linearly according to

[formula given as an image in the original]

where LD is the distance from the host vehicle to the red-light stop line at the current time, L2 is the distance between the red-light stop line and the red light, and V_exp,max is the maximum desired speed.
In an embodiment of the present invention, the calculation of the desired speed V_exp of the host vehicle includes: when the host vehicle encounters an obstacle road condition, and the actual distance P between the host vehicle and the obstacle, the distance D2 between the obstacle stop line and the obstacle, and the distance D1 between the host vehicle and the obstacle stop line satisfy P > D1 + D2, the desired speed of the host vehicle is

V_exp = V_exp,max;

when P, D2 and D1 satisfy P ≤ D1 + D2, the desired speed of the host vehicle decelerates linearly according to

[formula given as an image in the original]

where V_exp,max is the maximum desired speed.
In an embodiment of the present invention, the calculation of the desired speed V_exp of the host vehicle includes: when the host vehicle encounters a green-light road condition, the desired speed of the host vehicle is

V_exp = V_exp,max,

where V_exp,max is the maximum desired speed.
In an embodiment of the present invention, calculating the vehicle position reward value of the path planning model includes: determining the vehicle position reward value from the distance S1 between the center point of the host vehicle and the center line of the lane, wherein:

when |S1| > 1 or |S1| = 1, r_t,position = −1;

when |S1| = 0, r_t,position = 0;

when 0 < |S1| < 1,

[formula given as an image in the original]

where |S1| denotes the absolute value of S1.
In an embodiment of the present invention, deriving a vehicle state transition reward value based on the vehicle speed reward value and the vehicle position reward value comprises: the vehicle state transition reward value is

r_t = r_t,speed + r_t,position,

where r_t,speed denotes the vehicle speed reward value and r_t,position denotes the vehicle position reward value.
In an embodiment of the present invention, calculating the model return estimate based on the vehicle state transition reward value includes: the model return estimate is

[formula given as an image in the original]

where ρ is an estimation coefficient, T is the total number of frames of environment state feature maps in the environment state feature map sequence, T corresponds to the time point at which path planning ends, and T is a positive integer.
In an embodiment of the present invention, the backbone neural network, the first feature vectorization module, the first fully connected network, and the second fully connected network are connected to form an Actor network; the Actor network is connected with the second feature vectorization module, the third fully connected network, and the fourth fully connected network to form a Critic network. The Actor network outputs the planned trajectory a_t of the host vehicle; the neural network weight parameters to be learned by the Actor network are θ_μ, and the Actor network is expressed in terms of its weight parameters as a_t = μ(s_t | θ_μ), where s_t denotes the environment and state data at the current time. The Critic network outputs the model return estimate Q_t; the neural network weight parameters to be learned by the Critic network comprise θ_μ of the first half of the network and θ_E of the second half of the network, and the Critic network is expressed in terms of its weight parameters as Q_t = Q(s_t, a_t | θ_μ, θ_E). The environment state feature map sequence comprises multiple frames of environment state feature maps.
In an embodiment of the present invention, the process of performing reinforcement learning on the neural network weight parameters θ_μ and θ_E of the Actor network and the Critic network includes: setting the replay buffer size RB used for reinforcement learning and the sample batch size N used during training, where RB and N are positive integers; initializing the Actor network μ(s_t | θ_μ) and the Critic network Q(s_t, a_t | θ_μ, θ_E) with respect to the neural network weight parameters θ_μ and θ_E; constructing a first target network μ′(s_t | θ_μ′) and a second target network Q′(s_t, a_t | θ_μ′, θ_E′) whose structures are identical to the Actor network μ(s_t | θ_μ) and the Critic network Q(s_t, a_t | θ_μ, θ_E); initializing the weight parameters θ_μ′ and θ_E′ of the first target network μ′(s_t | θ_μ′) and the second target network Q′(s_t, a_t | θ_μ′, θ_E′); setting the update period Num_update of the target network weight parameters; setting the initial value s_1 of the environment and state data and the initial value of the target network update counter Num_count; and, for environment and state data with a total of T frames, performing the learning step starting from s_1.
In an embodiment of the present invention, for environment and state data with a total of T frames, performing the learning step starting from s_1 includes: adding a perturbation to the output a_t of the current Actor network to obtain a_t,d as the motion trajectory indication of the current frame; based on the environment and state data s_t at the current time, executing a_t,d on the environment and state to obtain the environment and state data s_{t+1} after the vehicle state transition and the corresponding vehicle state transition reward value r_t; saving the sample vector (s_t, a_t,d, r_t, s_{t+1}) corresponding to the current vehicle state transition in the replay buffer; and randomly drawing N samples (s_i, a_i,d, r_i, s_{i+1}) (i = 1, 2, …, N, a_i,d ∈ a_t,d, r_i ∈ r_t) from the replay buffer and training the Actor network and the Critic network.
In an embodiment of the invention, the initialization comprises initializing the Actor network μ(s_t | θ_μ) and the Critic network Q(s_t, a_t | θ_μ, θ_E) with randomized values of the neural network weight parameters θ_μ and θ_E.
In an embodiment of the present invention, adding a perturbation to the output a_t of the current Actor network to obtain a_t,d as the motion trajectory indication of the current frame comprises:

a_t,d = μ(s_t | θ_μ) + σζ_t − βμ(s_t | θ_μ),

where ζ is a Gaussian random process, σ is a first perturbation parameter, and β is a second perturbation parameter.
In an embodiment of the invention, executing a_t,d on the environment and state based on the environment and state data s_t at the current time and obtaining the environment and state data s_{t+1} after the vehicle state transition is accomplished through the operation of the controller module and the environment and state data calculation module.
In one embodiment of the present invention, the controller module controls lateral and longitudinal movements of the host vehicle.
In an embodiment of the invention, randomly drawing N samples (s_i, a_i,d, r_i, s_{i+1}) (i = 1, 2, …, N, a_i,d ∈ a_t,d) from the replay buffer and training the Actor network and the Critic network comprises: calculating the target value of the model return estimate Q_t; calculating the average residual between the model return estimate Q_t of the current frame and the model return target value; selecting the manner of updating the weight parameters θ_μ and θ_E of the Critic network according to the sampling result of a Bernoulli distribution; updating the weight parameter θ_μ of the Actor network; updating the target network update counter Num_count; comparing the updated target network update counter Num_count with the update period Num_update to obtain a judgment result; and determining, according to the judgment result, whether to update the weight parameters θ_μ′ and θ_E′ of the target networks.
In an embodiment of the present invention, calculating the target value of the model return estimate Q_t includes: the target value of the model return estimate Q_t is

y_i = r_i + γ·Q′(s_{i+1}, μ′(s_{i+1} | θ_μ′) | θ_μ′, θ_E′),

where γ is the target value coefficient. Calculating the average residual between the model return estimate Q_t of the current frame and the model return target value includes: the average residual is

[formula given as an image in the original]

where y_i denotes the model return target value.
In an embodiment of the present invention, selecting the manner of updating the weight parameters θ_μ and θ_E of the Critic network according to the sampling result of a Bernoulli distribution includes: for each frame of the environment and state data, drawing one Bernoulli sample according to the Bernoulli distribution to obtain a sampling result;

if the sampling result is 1, updating the weight parameter θ_E of the Critic network according to

[update formulas given as images in the original]

while keeping the weight parameter θ_μ unchanged;

if the sampling result is 0, updating the weight parameters θ_μ and θ_E of the Critic network according to

[update formulas given as images in the original]

where ∂L/∂θ_E denotes the derivative of the function L with respect to θ_E, ∂L/∂θ_μ denotes the derivative of the function L with respect to θ_μ, and the probability that a Bernoulli sample in the Bernoulli distribution equals 1 is taken as k, with 0 < k < 1.
In an embodiment of the present invention, updating the weight parameter θ_μ of the Actor network comprises: updating the weight parameter θ_μ of the Actor network according to

[update formulas given as images in the original]

where J = Q(s, a | θ_μ, θ_E).
In an embodiment of the present invention, updating the target network update counter Num_count comprises: Num_count = Num_count + 1.
In an embodiment of the present invention, determining according to the judgment result whether to update the weight parameters θ_μ′ and θ_E′ of the target networks comprises: if the target network update counter Num_count is less than the update period Num_update, continuing the learning step; if the target network update counter Num_count equals the update period Num_update, updating the target network weight parameters θ_μ′ and θ_E′ according to

θ_E′ ← τθ_E + (1 − τ)θ_E′,
θ_μ′ ← τθ_μ + (1 − τ)θ_μ′,

and resetting the target network update counter Num_count to zero, where τ is the update coefficient of the target network weights.
In an embodiment of the present invention, the sequence of the environmental status characteristic maps of the host vehicle includes a plurality of frames of environmental status characteristic maps, and each of the environmental status characteristic maps is generated by the following steps: generating an environment static picture taking the self-vehicle as a picture center based on the map data; generating an environment dynamic picture taking the self-vehicle as a picture center based on the target detection tracking result; and generating the environment state characteristic graph according to the environment static picture and the environment dynamic picture.
In an embodiment of the present invention, generating the environment status feature map according to the environment still picture and the environment moving picture includes: taking the environment static picture as a base map; overlaying picture information contained in the environment dynamic picture on the base map; taking the self-vehicle central point of the current frame as a pixel central point on the environment state characteristic diagram; and setting the heading angle direction of the vehicle as the direction right above the environmental state characteristic diagram, and generating the environmental state characteristic diagram.
Compared with the prior art, the invention has the following advantages: according to the technical scheme, the path planning model for automatic driving path planning can be subjected to reinforcement learning and training through designing the reinforcement learning model of the shared network and the related algorithm for model training, so that the application requirement of automatic driving is better met.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principle of the invention. In the drawings:
fig. 1 is a flowchart of a method for planning a driving path of a vehicle according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a path planning model according to an embodiment of the present application.
Fig. 3 is a schematic training diagram of a path planning model according to an embodiment of the present application.
Fig. 4-6 are schematic diagrams illustrating calculation of a desired speed of a host vehicle according to some embodiments of the present disclosure.
FIG. 7 is a schematic illustration of the calculation of a vehicle location reward value in accordance with some embodiments of the present application.
Fig. 8 is a schematic system implementation environment diagram of a vehicle travel path planning apparatus according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described herein, and thus the present invention is not limited to the specific embodiments disclosed below.
As used herein, the singular forms "a," "an," and "the" also include the plural forms, unless the context clearly indicates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
In the description of the present application, it is to be understood that the orientation or positional relationship indicated by the directional terms such as "front, rear, upper, lower, left, right", "lateral, vertical, horizontal" and "top, bottom", etc., are generally based on the orientation or positional relationship shown in the drawings, and are used for convenience of description and simplicity of description only, and in the case of not making a reverse description, these directional terms do not indicate and imply that the device or element being referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore, should not be considered as limiting the scope of the present application; the terms "inner and outer" refer to the inner and outer relative to the profile of the respective component itself.
Furthermore, it should be noted that the terms "first", "second", etc. are used to define the components or assemblies, and are only used for convenience to distinguish the corresponding components or assemblies, and the terms have no special meaning if not stated, and therefore, the scope of protection of the present application should not be construed as being limited. Further, although the terms used in the present application are selected from publicly known and used terms, some of the terms mentioned in the specification of the present application may be selected by the applicant at his or her discretion, the detailed meanings of which are described in relevant parts of the description herein. Further, it is required that the present application is understood not only by the actual terms used but also by the meaning of each term lying within.
Flow charts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in the exact order in which they are performed. Rather, various steps may be processed in reverse order or simultaneously. Meanwhile, other operations are added to or removed from these processes.
The embodiment of the invention describes a method, a device and a computer readable medium for planning a driving path of a vehicle.
Fig. 1 is a flowchart of a method for planning a driving path of a vehicle according to an embodiment of the present application.
As shown in fig. 1, the method for planning a driving path of a vehicle includes a step 101 of generating an environmental state feature map sequence of the vehicle based on map data and a target tracking result. And 102, acquiring the vehicle state information of the self vehicle. And 103, inputting the environment state feature diagram sequence of the self vehicle and the self vehicle state information into a path planning model as environment and state data. And 104, acquiring a planned track of the self vehicle output by the path planning model.
Specifically, in step 101, an environmental state feature map sequence of the own vehicle is generated based on the map data and the target tracking result.
In some embodiments, the sequence of environmental status feature maps of the host vehicle includes a plurality of frames of environmental status feature maps, each of the environmental status feature maps being generated by: step 1011, generating an environment static picture taking the self-vehicle as a picture center based on the map data; step 1012, generating an environment dynamic picture taking the self-vehicle as a picture center based on the target detection tracking result; and 1013, generating the environment state feature map according to the environment static picture and the environment dynamic picture.
In some embodiments, generating the environment state feature map from the environment static picture and the environment dynamic picture in step 1013 comprises: step 1101, taking the environment static picture as the base map; step 1102, overlaying the picture information contained in the environment dynamic picture on the base map; step 1103, taking the host vehicle center point of the current frame as the pixel center point of the environment state feature map; and step 1104, setting the heading angle direction of the host vehicle as the direction pointing to the top of the environment state feature map, and generating the environment state feature map.
In some embodiments, the method for planning the driving path of the vehicle further includes step 105 of obtaining a model return estimation value output by the path planning model, and evaluating the model based on the return estimation value.
Fig. 2 is a schematic structural diagram of a path planning model according to an embodiment of the present application.
Referring to fig. 2, in some embodiments, the path planning model 401 includes a trunk neural network 403, a feature vectorization first module 405, a first fully-connected network FC1, a second fully-connected network FC2, a feature vectorization second module 407, a third fully-connected network FC3, and a fourth fully-connected network FC4 connected in series.
In some embodiments, the sequence 421 of the feature map of the environmental state of the host vehicle is input to the backbone neural network 403, and the vehicle state information 423 of the host vehicle is input to the first feature vectorization module 405. The sequence 421 of the characteristic map of the environmental state of the own vehicle and the information 423 of the vehicle state of the own vehicle constitute the environmental and state data 411.
Fig. 3 is a schematic training diagram of a path planning model according to an embodiment of the present application.
Referring to fig. 3, in some embodiments, the environment and state data 411 is updated by the operation of the controller module 502 and the environment and state data calculation module 504.
In some embodiments, the environment and status data 411 may also be taken from other existing data sets. The controller module 502 and the environment and state data calculation module 504 are only used to better understand the training process of the path planning model in the method for planning a driving path of a vehicle according to the present application, and are not used to limit the structure of the path planning model according to the present application.
In some embodiments, the second fully-connected network FC2 inputs planned trajectory data FC2_ output of the own vehicle into the controller module 502, the controller module 502 controls the running of the own vehicle, and inputs first data generated by the running of the own vehicle and second data generated by observing the surroundings of the own vehicle into the environment and state data calculation module 504, and the environment and state data calculation module 504 updates the environment and state data 411 based on the first data and the second data. As described above, the environment and state data 411 includes the environment state feature map sequence 421 of the own vehicle and the vehicle state information 423 of the own vehicle.
In actual driving situations, the controller module 502 controls lateral and longitudinal movement of the vehicle, for example, by controlling components such as the throttle, brake, and steering wheel of the host vehicle.
In some embodiments, the fourth fully connected network FC4 outputs the model return estimate Q_t. The model return estimate Q_t is labeled 416 in fig. 3.
In some embodiments, the environment and state data calculation module 504 outputs the vehicle state transition reward value r_t of the path planning model. The vehicle state transition reward value r_t is labeled 510 in fig. 3.
In some embodiments, the vehicle state information includes speed, acceleration, heading angle, and heading angular velocity. In an actual driving situation, the raw data corresponding to the vehicle state information is acquired by, for example, a camera device, a laser radar, a millimeter wave radar, and the like mounted on the vehicle, and then the state information is obtained through data processing.
In some embodiments, obtaining the model return estimate output by the path planning model at step 105 includes: step 1051, calculating a vehicle speed reward value of the path planning model; step 1052, calculating a vehicle position reward value of the path planning model; step 1053, obtaining a vehicle state transition reward value based on the vehicle speed reward value and the vehicle position reward value; and 1054, calculating to obtain the model return estimation value based on the vehicle state transition reward value.
In some embodiments, calculating the vehicle speed reward value of the path planning model in step 1051 comprises: step 201, obtaining the reward parameter G_speed from the actual speed V_real of the host vehicle and the desired speed V_exp of the host vehicle; step 202, obtaining the vehicle speed reward value r_t,speed from the reward parameter G_speed.

In some embodiments, obtaining the reward parameter G_speed from the actual speed V_real and the desired speed V_exp in step 201 comprises:

[formula for G_speed given as an image in the original]

where |V_real − V_exp| denotes the absolute value of V_real − V_exp.

In some embodiments, obtaining the vehicle speed reward value r_t,speed from the reward parameter G_speed in step 202 comprises:

when G_speed > 1 or G_speed = 1, r_t,speed = 0;

when G_speed = 0, r_t,speed = 1;

when 0 < G_speed < 1,

[formula for r_t,speed given as an image in the original]
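The G_speed and r_t,speed expressions above appear only as images in the source. A minimal Python sketch of one plausible reading, assuming G_speed is the normalized speed error |V_real − V_exp| / V_exp,max and that the reward falls off linearly from 1 to 0 as G_speed grows from 0 to 1 (both assumptions, not the patent's exact formulas):

```python
def speed_reward(v_real: float, v_exp: float, v_exp_max: float) -> float:
    """Vehicle speed reward r_t,speed (illustrative reading only).

    Assumption: G_speed = |V_real - V_exp| / V_exp,max and the reward
    interpolates linearly on 0 < G_speed < 1; the patent gives both
    formulas only as images.
    """
    g_speed = abs(v_real - v_exp) / v_exp_max
    if g_speed >= 1.0:       # large deviation from the desired speed
        return 0.0
    if g_speed == 0.0:       # exactly at the desired speed
        return 1.0
    return 1.0 - g_speed     # assumed linear interpolation in between
```

For example, `speed_reward(8.0, 10.0, 16.7)` returns roughly 0.88 under these assumptions.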
Fig. 4, 5 and 6 are schematic diagrams illustrating calculation of a desired speed of a host vehicle according to some embodiments of the present application.
In some embodiments, the calculation of the desired speed V_exp of the host vehicle includes:

Case one, referring to fig. 4: when the host vehicle encounters a red traffic light,

when the distance between the host vehicle and the red-light stop line is greater than L1, the desired speed of the host vehicle is

V_exp = V_exp,max;

when the distance between the host vehicle and the red-light stop line is less than or equal to L1, the desired speed of the host vehicle decelerates linearly according to

[formula given as an image in the original]

where LD is the distance from the host vehicle to the red-light stop line at the current time, L2 is the distance between the red-light stop line and the red light, and V_exp,max is the maximum desired speed.

The values of L1 and L2 can be set as desired, for example L1 = 60 meters and L2 = 10 meters.
Case two, referring to fig. 5: when the host vehicle encounters an obstacle,

when the actual distance P between the host vehicle and the obstacle, the distance D2 between the obstacle stop line and the obstacle, and the distance D1 between the host vehicle and the obstacle stop line satisfy P > D1 + D2, the desired speed of the host vehicle is

V_exp = V_exp,max;

when P, D2 and D1 satisfy P ≤ D1 + D2, the desired speed of the host vehicle decelerates linearly according to

[formula given as an image in the original]

where V_exp,max is the maximum desired speed.

The values of D1 and D2 can be set as desired, for example D1 = 60 meters and D2 = 2 meters. An obstacle may be, for example, a leading vehicle or a road barrier.
Case three, referring to fig. 6: when the host vehicle encounters a green-light road condition, the desired speed of the host vehicle is

V_exp = V_exp,max,

where V_exp,max is the maximum desired speed.
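The case structure above (full speed beyond a threshold distance, linear deceleration inside it) is stated in the text, but the exact linear-deceleration expressions are given only as images. A compact sketch under simplified assumptions, where the ramp is taken proportional to the remaining distance to the stop point:

```python
def desired_speed_red_light(ld: float, l1: float, v_exp_max: float) -> float:
    """Case one: red light. ld = current distance to the red-light stop line.

    Assumption: the speed ramps from V_exp,max at ld == L1 down to 0 at the
    stop line; the patent's formula (an image) also involves L2, which this
    simplified ramp ignores.
    """
    if ld > l1:
        return v_exp_max
    return v_exp_max * max(ld, 0.0) / l1


def desired_speed_obstacle(p: float, d1: float, d2: float, v_exp_max: float) -> float:
    """Case two: obstacle. p = actual distance from the host vehicle to the obstacle.

    Assumption: the speed ramps from V_exp,max at p == D1 + D2 down to 0 at the
    obstacle stop line (p == D2).
    """
    if p > d1 + d2:
        return v_exp_max
    return v_exp_max * max(p - d2, 0.0) / d1


def desired_speed_green_light(v_exp_max: float) -> float:
    """Case three: green light, drive at the maximum desired speed."""
    return v_exp_max
```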
FIG. 7 is a schematic illustration of the calculation of a vehicle location reward value in accordance with some embodiments of the present application.
Referring to fig. 7, in some embodiments, the calculating of the vehicle location reward value of the path planning model of step 1052 comprises:
determining the vehicle position reward value from the distance S1 between the center point of the host vehicle and the center line of the lane, wherein:

when |S1| > 1 or |S1| = 1, r_t,position = −1;

when |S1| = 0, r_t,position = 0;

when 0 < |S1| < 1,

[formula given as an image in the original]

where |S1| denotes the absolute value of S1.
In fig. 7, S2 represents the self vehicle width, and S3 represents half the road width. The values of S2 and S3 can be set according to the needs of model training and the actual automatic driving situation, for example, S2 is 1.8 meters, and S3 is 2 meters. The dotted box 702 of fig. 7 includes a legend.
The rectangle illustrating the host vehicle illustrated in fig. 7 may correspond to a rectangular box area in the environmental status feature map that indicates the host vehicle. The center point of the bicycle is indicated by the center point of the rectangle.
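As with the speed reward, the expression for 0 < |S1| < 1 is given only as an image; the sketch below assumes a penalty that grows linearly with the offset |S1| from the lane center line, which matches the stated endpoints (0 at the center line, −1 at |S1| = 1) but is otherwise an illustrative assumption:

```python
def position_reward(s1: float) -> float:
    """Vehicle position reward r_t,position from the offset S1 between the
    host-vehicle center point and the lane center line.

    Assumption: the reward decreases linearly from 0 at the center line to -1
    at |S1| = 1; the patent's exact expression is given only as an image.
    """
    off = abs(s1)
    if off >= 1.0:
        return -1.0
    return -off  # 0 at the center line, approaching -1 near the lane edge
```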
In some embodiments, deriving a vehicle state transition reward value based on the vehicle speed reward value and the vehicle location reward value of step 1053 comprises:
the vehicle state transition reward value is

r_t = r_t,speed + r_t,position,

where r_t,speed denotes the vehicle speed reward value and r_t,position denotes the vehicle position reward value.
In some embodiments, calculating the model reward estimate based on the vehicle state transition reward value of step 1054 includes:
the model return estimate is

[formula given as an image in the original]

where ρ is an estimation coefficient, T is the total number of frames of environment state feature maps in the environment state feature map sequence, T corresponds to the time point at which path planning ends, and T is a positive integer.
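The return formula itself is an image in the source; given that ρ is described as an estimation coefficient and the sum runs over the T frames of the episode, one natural reading is a ρ-discounted sum of the per-frame state transition rewards. This is an assumption shown only for illustration:

```python
def model_return_estimate(rewards: list[float], rho: float) -> float:
    """Assumed reading of the model return: sum over t = 1..T of rho**(t-1) * r_t."""
    return sum((rho ** t) * r for t, r in enumerate(rewards))
```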
In order to explain the training process of the path planning model of the present application, the composition of the path planning model is further explained.
In some embodiments, the backbone neural network, the feature vectorization first module, the first fully-connected network and the second fully-connected network in the path planning model are connected to form an Actor network;
And the Actor network is connected with the feature vectorization second module, the third full-connection network and the fourth full-connection network to form a Critic network.
Wherein the Actor network outputs a planned trajectory a of the host vehicletThe neural network weight parameter needing to be learned by the Actor network is thetaμThe Actor network is expressed as a weight parameter in the form of at=μ(stu),stRepresenting the environment and state data at the current time;
the Critic network output model return estimation value QtThe neural network weight parameters needing to be learned by the Critic network comprise theta of the first half networkμAnd theta of the latter half networkEThe Critic network is expressed as a weight parameterIs of the form Qt=Q(st,atμ,θE) (ii) a The environment state characteristic diagram sequence comprises a multi-frame environment state characteristic diagram.
Next, a method for training the path planning model of the present application will be explained. The process of training the path planning model is also the process of implementing reinforcement learning.
In some embodiments, the process of performing reinforcement learning on the neural network weight parameters θ_μ and θ_E of the Actor network and the Critic network includes: step 301, setting the replay buffer size RB used for reinforcement learning and the sample batch size N used during training, where RB and N are positive integers; step 302, initializing the Actor network μ(s_t | θ_μ) and the Critic network Q(s_t, a_t | θ_μ, θ_E) with respect to the neural network weight parameters θ_μ and θ_E; step 303, constructing a first target network μ′(s_t | θ_μ′) and a second target network Q′(s_t, a_t | θ_μ′, θ_E′) whose structures are identical to the Actor network μ(s_t | θ_μ) and the Critic network Q(s_t, a_t | θ_μ, θ_E); step 304, initializing the weight parameters θ_μ′ and θ_E′ of the first target network μ′(s_t | θ_μ′) and the second target network Q′(s_t, a_t | θ_μ′, θ_E′); step 305, setting the update period Num_update of the target network weight parameters; step 306, setting the initial value s_1 of the environment and state data and the initial value of the target network update counter Num_count; step 307, for environment and state data with a total of T frames, performing the learning step starting from s_1.
In some embodiments, for environment and state data with a total of T frames, performing the learning step starting from s_1 in step 307 includes: step 3071, adding a perturbation to the output a_t of the current Actor network to obtain a_t,d as the motion trajectory indication of the current frame; step 3072, based on the environment and state data s_t at the current time, executing a_t,d on the environment and state to obtain the environment and state data s_{t+1} after the vehicle state transition and the corresponding vehicle state transition reward value r_t; step 3073, saving the sample vector (s_t, a_t,d, r_t, s_{t+1}) corresponding to the current vehicle state transition in the replay buffer; step 3074, randomly drawing N samples (s_i, a_i,d, r_i, s_{i+1}) (i = 1, 2, …, N, a_i,d ∈ a_t,d, r_i ∈ r_t) from the replay buffer and training the Actor network and the Critic network.
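A condensed sketch of steps 3071-3073, assuming an Actor callable μ(s | θ_μ) and an environment wrapper standing in for the controller module and the environment-and-state-data calculation module; names such as `actor`, `env_step` and `ReplayBuffer` are illustrative placeholders, not the patent's code:

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size replay buffer holding (s_t, a_t_d, r_t, s_t1) transitions."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, n: int):
        return random.sample(self.buffer, n)


def perturbed_action(actor_output: np.ndarray, sigma: float, beta: float) -> np.ndarray:
    """Step 3071: a_t,d = mu(s_t) + sigma * zeta_t - beta * mu(s_t), zeta_t ~ N(0, 1)."""
    zeta = np.random.randn(*actor_output.shape)
    return actor_output + sigma * zeta - beta * actor_output


def collect_episode(actor, env_step, s1, buffer: ReplayBuffer, total_frames: int,
                    sigma: float = 1.2, beta: float = 0.15):
    """Steps 3071-3073: roll the perturbed policy through T frames and store transitions.

    `actor(s)` returns the planned trajectory; `env_step(s, a)` stands in for the
    controller + environment/state-data calculation modules and returns (s_next, r).
    """
    s = s1
    for _ in range(total_frames):
        a_d = perturbed_action(actor(s), sigma, beta)
        s_next, r = env_step(s, a_d)
        buffer.add((s, a_d, r, s_next))
        s = s_next
```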
In some embodiments, the initialization in step 302 includes initializing the Actor network μ(s_t | θ_μ) and the Critic network Q(s_t, a_t | θ_μ, θ_E) with randomized values of the neural network weight parameters θ_μ and θ_E. The randomization scheme can be chosen as desired.
In some embodiments, adding a perturbation to the output a_t of the current Actor network in step 3071 to obtain a_t,d as the motion trajectory indication of the current frame comprises:

a_t,d = μ(s_t | θ_μ) + σζ_t − βμ(s_t | θ_μ),

where ζ is a Gaussian random process, σ is a first perturbation parameter, and β is a second perturbation parameter. The perturbation is added during model training; it is not added once training is finished and the model is used. σ is, for example, 1.2 and β is, for example, 0.15.
In some embodiments, executing a_t,d on the environment and state based on the environment and state data s_t at the current time in step 3072 and obtaining the environment and state data s_{t+1} after the vehicle state transition is accomplished by operation of the controller module 502 and the environment and state data calculation module 504.
In some embodiments, randomly drawing N samples (s_i, a_i,d, r_i, s_{i+1}) (i = 1, 2, …, N, a_i,d ∈ a_t,d) from the replay buffer in step 3074 and training the Actor network and the Critic network comprises: step 401, calculating the target value of the model return estimate Q_t; step 402, calculating the average residual between the model return estimate Q_t of the current frame and the model return target value; step 403, selecting the manner of updating the weight parameters θ_μ and θ_E of the Critic network according to the sampling result of a Bernoulli distribution; step 404, updating the weight parameter θ_μ of the Actor network; step 405, updating the target network update counter Num_count; step 406, comparing the updated target network update counter Num_count with the update period Num_update to obtain a judgment result; step 407, determining, according to the judgment result, whether to update the weight parameters θ_μ′ and θ_E′ of the target networks.
In some embodiments, calculating the target value of the model return estimate Q_t in step 401 includes: the target value of the model return estimate Q_t is

y_i = r_i + γ·Q′(s_{i+1}, μ′(s_{i+1} | θ_μ′) | θ_μ′, θ_E′),

where γ is the target value coefficient. γ is, for example, 0.9.

In some embodiments, calculating the average residual between the model return estimate Q_t of the current frame and the model return target value in step 402 includes: the average residual is

[formula given as an image in the original]

where y_i denotes the model return target value.
In some embodiments, selecting the manner of updating the weight parameters θ_μ and θ_E of the Critic network according to the sampling result of a Bernoulli distribution in step 403 comprises:

for each frame of the environment and state data, drawing one Bernoulli sample according to the Bernoulli distribution to obtain a sampling result;

if the sampling result is 1, updating the weight parameter θ_E of the Critic network according to

[update formulas given as images in the original]

while keeping the weight parameter θ_μ unchanged;

if the sampling result is 0, updating the weight parameters θ_μ and θ_E of the Critic network according to

[update formulas given as images in the original]

where ∂L/∂θ_E denotes the derivative of the function L with respect to θ_E, ∂L/∂θ_μ denotes the derivative of the function L with respect to θ_μ, and the probability that a Bernoulli sample in the Bernoulli distribution equals 1 is taken as k, with 0 < k < 1. k is, for example, 0.55.
In some embodiments, updating the weight parameter θ_μ of the Actor network in step 404 comprises: updating the weight parameter θ_μ of the Actor network according to

[update formulas given as images in the original]

where J = Q(s, a | θ_μ, θ_E).
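A PyTorch-style sketch of steps 401-404. The exact update formulas appear only as images in the source, so this assumes the usual actor-critic reading: the critic loss L is the mean squared residual between Q_t and the target y_i, and J = Q(s, μ(s)) is ascended for the actor, with the Bernoulli sample gating whether only θ_E or both θ_μ and θ_E receive the critic gradient. `actor`, `critic`, `target_actor`, `target_critic`, `opt_theta_e` and `opt_theta_mu` are placeholder modules/optimizers whose parameters are assumed to be split into the shared part (θ_μ) and the critic-only part (θ_E):

```python
import torch


def update_step(batch, actor, critic, target_actor, target_critic,
                opt_theta_e, opt_theta_mu, gamma: float = 0.9, k: float = 0.55):
    """One training update (steps 401-404), under the assumptions stated above."""
    s, a, r, s_next = batch  # tensors: states, perturbed actions, rewards, next states

    # Step 401: target value y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})).
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))

    # Step 402: average residual between Q_t and the target value (assumed MSE form of L).
    critic_loss = torch.mean((y - critic(s, a)) ** 2)

    # Step 403: Bernoulli gate - with probability k update only theta_E,
    # otherwise update both theta_mu and theta_E with the critic gradient.
    opt_theta_e.zero_grad()
    opt_theta_mu.zero_grad()
    critic_loss.backward()
    if torch.bernoulli(torch.tensor(k)).item() == 1:
        opt_theta_e.step()                      # theta_mu kept unchanged
    else:
        opt_theta_e.step()
        opt_theta_mu.step()

    # Step 404: actor update ascending J = Q(s, mu(s)); only theta_mu is adjusted here.
    actor_loss = -critic(s, actor(s)).mean()
    opt_theta_mu.zero_grad()
    actor_loss.backward()
    opt_theta_mu.step()
```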
In some embodiments, updating the target network update counter Num_count in step 405 comprises: Num_count = Num_count + 1.
In some embodiments, determining according to the judgment result in step 407 whether to update the weight parameters θ_μ′ and θ_E′ of the target networks comprises:

if the target network update counter Num_count is less than the update period Num_update, continuing the learning step;

if the target network update counter Num_count equals the update period Num_update, updating the target network weight parameters θ_μ′ and θ_E′ according to

θ_E′ ← τθ_E + (1 − τ)θ_E′,
θ_μ′ ← τθ_μ + (1 − τ)θ_μ′,

and resetting the target network update counter Num_count to zero, where τ is the update coefficient of the target network weights. τ is, for example, 0.1.
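The corresponding soft update of the target networks (step 407), continuing the PyTorch-style sketch above; `tau` plays the role of τ and the function is applied to both target networks once every Num_update steps:

```python
import torch


@torch.no_grad()
def soft_update(target_net, online_net, tau: float = 0.1):
    """theta' <- tau * theta + (1 - tau) * theta', per parameter tensor."""
    for p_target, p_online in zip(target_net.parameters(), online_net.parameters()):
        p_target.mul_(1.0 - tau).add_(tau * p_online)
```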
As mentioned above, the environmental status characteristic diagram sequence of the own vehicle comprises a multi-frame environmental status characteristic diagram. In some embodiments, the environmental state feature map sequence is formed by the environmental state feature map of the current frame and several consecutive frames before the current frame. And taking the vehicle state information of the current frame as the vehicle state information of the own vehicle.
In some embodiments, the step 1011 of generating the image of the environment with the own vehicle as the center of the image based on the map data includes the step 501 of setting processing parameters of the image. And 502, acquiring local map information with the radius of R based on the coordinate position of the current central point of the self-vehicle. Step 503, performing coordinate transformation on the road center line and the road boundary line in the local map information. Step 504, determine the RGB values of the pixels of the environmental still picture. And 505, generating the environment static picture based on the RGB values of the pixel points.
In some embodiments, the processing parameters for the picture include an initial resolution, a final resolution, and a scale ratio of picture pixels to the actual perceptual environment. The radius R can be set according to the actual situation, for example, R is 100 meters.
In some embodiments, the coordinate transformation of the road centerline and the road boundary line in step 503 includes, in step 5031, taking a picture with all black pixels in RGB color representation as a base picture of the environmental still picture. Step 5032, placing the center point of the vehicle at the center of the base map, and setting the heading angle direction of the vehicle to be right above the base map. Step 5033, converting the coordinates of the road center line and the road boundary line from absolute coordinates in a world coordinate system to relative coordinates in a cartesian coordinate system with the self-vehicle as an origin and the heading angle direction of the self-vehicle as the positive direction of the y axis. Step 5034, converting the relative coordinates of the road center line and the road boundary line into pixel coordinates which are set to be right above the environment static picture by taking the vehicle center point as a pixel center point on the environment static picture and the heading angle direction of the vehicle.
In some embodiments, in step 5033, the specific transformation formulas for converting the coordinates of the road center line and the road boundary line from absolute coordinates in the world coordinate system to relative coordinates in a Cartesian coordinate system with the host vehicle as the origin and the heading angle direction of the host vehicle as the positive y-axis direction are:

x2 = (x − x_center)·cosθ + (y − y_center)·sinθ
y2 = (y − y_center)·cosθ − (x − x_center)·sinθ.

In step 5034, the specific conversion formulas for converting the relative coordinates of the road center line and the road boundary line into pixel coordinates, with the host vehicle center point as the pixel center point of the environment static picture and the heading angle direction of the host vehicle pointing to the top of the picture, are:

u = u_image_center + (x2 / scale)
v = v_image_center + (y2 / scale).

Combining step 5033 with step 5034, the conversion formulas from absolute coordinates to pixel coordinates are:

u = u_image_center + (((x − x_center)·cosθ + (y − y_center)·sinθ) / scale)
v = v_image_center + (((y − y_center)·cosθ − (x − x_center)·sinθ) / scale)

where x and y are the abscissa and ordinate of the absolute coordinates in the world coordinate system, u and v are the pixel coordinates, x_center and y_center are the absolute coordinates of the host vehicle center point, u_image_center and v_image_center are the pixel coordinates of the center point of the environment picture corresponding to the host vehicle center point, θ is the heading angle of the host vehicle, and scale is the scale ratio of picture pixels to the actual perceived environment.
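The world-to-pixel mapping of steps 5033 and 5034 can be written directly from the formulas above; here θ is assumed to be in radians and `scale` in meters per pixel:

```python
import math


def world_to_pixel(x: float, y: float,
                   x_center: float, y_center: float, theta: float,
                   u_image_center: float, v_image_center: float,
                   scale: float) -> tuple[float, float]:
    """Map absolute world coordinates (x, y) to pixel coordinates (u, v) in the
    ego-centered, heading-up environment picture (steps 5033 + 5034)."""
    # Step 5033: rotate/translate into the vehicle-centered Cartesian frame.
    x2 = (x - x_center) * math.cos(theta) + (y - y_center) * math.sin(theta)
    y2 = (y - y_center) * math.cos(theta) - (x - x_center) * math.sin(theta)
    # Step 5034: scale into pixel coordinates around the image center.
    u = u_image_center + x2 / scale
    v = v_image_center + y2 / scale
    return u, v
```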
In some embodiments, the determining RGB values of the pixels of the environmental still image in step 504 includes marking pixels in a polygonal area surrounded by the border line of the road as pure white pixels in an RGB color representation manner, where the polygonal area corresponds to a drivable area of the host vehicle. Then, for a point in the center line of the road, determining the RGB value of the point according to the deviation angle of the heading angle of the point and the heading angle of the own vehicle.
In some embodiments, determining the RGB value of a point in the road center line according to the deviation between the heading angle of the point and the heading angle of the host vehicle comprises:

determining the value of the V component of the point in the HSV color representation through

[formula given as an image in the original]

where π is the circular constant, [symbol given as an image in the original] is the heading angle of the point in the road center line, θ is the heading angle of the host vehicle, and V is the V component when the point's pixel is described in HSV; H is taken as 240 degrees and S is taken as 1;

after the value of the pixel point in the HSV color representation is obtained, the HSV value is converted into the corresponding RGB value.
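The V-component formula itself is an image in the source, but the HSV-to-RGB step (H = 240°, S = 1, V derived from the heading deviation) can be sketched with the standard library. The mapping V = 1 − |Δθ|/π used below is an assumption for illustration only:

```python
import colorsys
import math


def centerline_rgb(point_heading: float, ego_heading: float) -> tuple[int, int, int]:
    """RGB value for a road-center-line point, encoding its heading deviation
    from the host vehicle in the V channel (H = 240 deg, S = 1).

    Assumption: V = 1 - |delta| / pi with delta wrapped to [-pi, pi]; the
    patent's exact V formula is given only as an image.
    """
    delta = (point_heading - ego_heading + math.pi) % (2 * math.pi) - math.pi
    v = 1.0 - abs(delta) / math.pi
    r, g, b = colorsys.hsv_to_rgb(240.0 / 360.0, 1.0, v)
    return int(r * 255), int(g * 255), int(b * 255)
```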
In some embodiments, the generating the environmental still picture of step 505 comprises: and on the base map of the environment static picture, generating the environment static picture comprising the road center line and the drivable area around the self vehicle based on the drivable area surrounded by the road boundary line, the pixel point coordinates of the road center line and the pixel point RGB values.
In some embodiments, the generating the environmental dynamic picture centering on the self-vehicle based on the target detection and tracking result in step 1012 includes, in step 601, acquiring absolute coordinates of boundary points of the target object of which the target category is the vehicle. Step 602, performing coordinate transformation on the absolute coordinates of the target object. Step 603, determining the RGB values of the pixels of the target object in the environment dynamic picture. Step 604, generating the environment dynamic picture based on the RGB values of the pixels of the environment dynamic picture.
In some embodiments, the process of coordinate transformation in step 602 is, for example, similar to the process of coordinate transformation set forth in steps 5033 and 5034, and will not be described in detail here.
In some embodiments, determining the RGB values of the target object at the pixel points of the environment dynamic picture in step 603 comprises:
determining the value of the V component, in the HSV color representation mode, of the pixel points in the rectangular area corresponding to the target object by the formula shown in formula image BDA0003208634580000211,
wherein N_frames is the total number of the consecutive frames and N_position is the sequence number, among the consecutive frames, of the frame in which the rectangular area is located;
taking H as 0 degree for the self vehicle; taking H as 60 degrees for non-self vehicles; and taking S as 1;
and then, converting the value of the HSV color representation mode into the value of the corresponding RGB color representation mode.
The environment state picture is then generated from the environment static picture and the environment dynamic picture.
In some embodiments, generating the environment state picture further comprises performing resolution cropping on the environment state picture. For example, the environment state picture is cropped from an initial resolution to a final resolution.
In some embodiments, the input and output dimensions of the first, second, third, and fourth fully connected networks may be set according to model training needs and the actual situation of the autonomous driving. For example, if the path planning model outputs a vehicle trajectory for 5 seconds in the future (5 seconds after the current time), and outputs two-dimensional cartesian coordinates of a future position of the vehicle every half second, the Actor network outputs 20 values, and each value is a continuously variable value.
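In that example the 20 Actor output values correspond to 10 future positions (one every half second over 5 seconds) with two Cartesian coordinates each; a minimal reshape, shown here only for illustration, recovers the waypoint list:

```python
import numpy as np

actor_output = np.zeros(20)              # placeholder for the 20 continuous Actor outputs
waypoints = actor_output.reshape(10, 2)  # row t -> (x, y) of the ego position at 0.5*(t+1) seconds
```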
The feature vectorization first module expands the output of the backbone neural network into a one-dimensional vector, combines this vector with the vehicle state information of the host vehicle, and inputs the combined vector into the first fully-connected network FC1.
In some embodiments, the first fully-connected network FC1 has input and output dimensions of 2052 and 256, respectively, and the second fully-connected network FC2 has input and output dimensions of 256 and 20, respectively.
In the Critic network, the feature vectorization second module performs feature vectorization processing on the output of the Actor network (i.e., the output FC2_output of FC2) together with the result of the feature vectorization first module, and inputs the result into the third fully-connected network FC3. The output of the Critic network (i.e., the output of FC4) is the return estimate used for reinforcement learning of the model.
In some embodiments, the input and output dimensions of the third fully-connected network FC3 are 2072 and 256, respectively, and the input and output dimensions of the fourth fully-connected network FC4 are 256 and 1, respectively.
The front part of the Critic network is the Actor network, and the output of the Actor network (i.e., the output FC2_output of FC2) also serves as the input to the middle part of the Critic network. This Actor-Critic architecture may be referred to as a shared network.
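A minimal PyTorch sketch of this shared Actor-Critic head is given below; the 2048-dimensional flattened backbone feature and the ReLU activations are assumptions chosen so that the dimensions match the values stated above (2048 + 4 vehicle-state values = 2052, and 2052 + 20 = 2072).

```python
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """Sketch of the shared Actor-Critic head. The FC dimensions come from the
    text above; the 2048-dim flattened backbone feature and the ReLU
    activations are assumptions of this sketch."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone         # maps the feature-map sequence to a feature tensor
        self.fc1 = nn.Linear(2052, 256)  # FC1: vectorized features -> hidden
        self.fc2 = nn.Linear(256, 20)    # FC2: hidden -> 20 trajectory values (Actor output)
        self.fc3 = nn.Linear(2072, 256)  # FC3: [features, Actor output] -> hidden
        self.fc4 = nn.Linear(256, 1)     # FC4: hidden -> return estimate (Critic output)

    def forward(self, state_maps, vehicle_state):
        feat = torch.flatten(self.backbone(state_maps), start_dim=1)  # (B, 2048) assumed
        vec1 = torch.cat([feat, vehicle_state], dim=1)                # (B, 2052)
        action = self.fc2(torch.relu(self.fc1(vec1)))                 # planned trajectory, (B, 20)
        vec2 = torch.cat([vec1, action], dim=1)                       # (B, 2072)
        q_value = self.fc4(torch.relu(self.fc3(vec2)))                # return estimate, (B, 1)
        return action, q_value
```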
According to the method for planning the driving path of a vehicle of the present application, by designing a reinforcement learning model with a shared network and the associated model-training algorithm, reinforcement learning and training can be performed on the path planning model used for planning the automatic driving path, so that the application requirements of automatic driving can be better met.
Meanwhile, because both the input and the output of the model are expressed as intermediate quantities, the model can undergo reinforcement learning based on a simulation environment.
The present application further provides a driving path planning device for a vehicle, including: a memory for storing instructions executable by the processor; and a processor for executing the instructions to implement the method as previously described.
Fig. 8 is a schematic diagram of a system implementation environment of a driving path planning apparatus for a vehicle according to an embodiment of the present application. The driving path planning apparatus 800 of the vehicle may include an internal communication bus 801, a Processor (Processor)802, a Read Only Memory (ROM)803, a Random Access Memory (RAM)804, and a communication port 805. The vehicle driving path planning apparatus 800 is connected to a network through a communication port and may be connected to a server side, which may provide a strong data processing capability. The internal communication bus 801 may enable data communication between components of the driving path planning apparatus 800 of the vehicle, such as a CAN bus. The processor 802 may make the determination and issue the prompt. In some embodiments, the processor 802 may be comprised of one or more processors. The communication port 805 may enable sending and receiving information and data from a network. The vehicle's travel path planning apparatus 800 may also include various forms of program storage units and data storage units, such as a Read Only Memory (ROM)803 and a Random Access Memory (RAM)804, capable of storing various data files for computer processing and/or communication use, as well as possibly program instructions for execution by the processor 802. The processor executes these instructions to implement the main parts of the method. The results processed by the processor may be communicated to the user device via the communication port and displayed on a user interface, such as an interactive interface of the in-vehicle system.
The vehicle travel path planning apparatus 800 may be implemented as a computer program, stored in a memory, and executed by a processor 802 to implement the vehicle travel path planning method of the present application.
The present application also provides a computer readable medium having stored thereon computer program code which, when executed by a processor, implements a method of driving path planning for a vehicle as described above.
Aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." The processor may be one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or a combination thereof. Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media. For example, computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, magnetic tape…), optical disks (e.g., Compact Disk (CD), Digital Versatile Disk (DVD)…), smart cards, and flash memory devices (e.g., card, stick, key drive…).
The computer readable medium may comprise a propagated data signal with the computer program code embodied therein, for example, on a baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, and the like, or any suitable combination. The computer readable medium can be any computer readable medium that can communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or device. Program code on a computer readable medium may be propagated over any suitable medium, including radio, electrical cable, fiber optic cable, radio frequency signals, or the like, or any combination of the preceding.
Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.
Although the present application has been described with reference to the present specific embodiments, it will be recognized by those skilled in the art that the foregoing embodiments are merely illustrative of the present application and that various changes and substitutions of equivalents may be made without departing from the spirit of the application, and therefore, it is intended that all changes and modifications to the above-described embodiments that come within the spirit of the application fall within the scope of the claims of the application.

Claims (32)

1. A method for planning a driving path of a vehicle comprises the following steps:
generating an environment state characteristic diagram sequence of the self-vehicle based on the map data and the target tracking result;
acquiring vehicle state information of the self vehicle;
taking the environment state characteristic diagram sequence of the self vehicle and the self vehicle state information as environment and state data, and inputting the environment and state data into a path planning model;
and acquiring a planned track of the self-vehicle output by the path planning model.
2. The method for planning a travel path of a vehicle according to claim 1, characterized by further comprising:
and obtaining a model return estimation value output by the path planning model, and evaluating the model based on the return estimation value.
3. The method for planning a driving path of a vehicle according to claim 1, wherein the path planning model comprises a trunk neural network, a first feature vectorization module, a first fully-connected network, a second fully-connected network, a second feature vectorization module, a third fully-connected network, and a fourth fully-connected network, which are connected in sequence;
the environment state feature diagram sequence of the host vehicle is input into the trunk neural network, and the vehicle state information is input into the first feature vectorization module.
4. The method for planning a driving path of a vehicle according to claim 3, wherein the environment and state data is updated by operation of a controller module and an environment and state data calculation module;
wherein the second fully-connected network inputs planned trajectory data of the self-vehicle into the controller module, the controller module controls the self-vehicle to travel, and inputs first data generated by the self-vehicle to travel and second data generated by observing the periphery of the self-vehicle into the environment and state data calculation module, and the environment and state data calculation module updates the environment and state data based on the first data and the second data; the fourth fully-connected network outputs a model-reported estimate.
5. The method according to claim 4, wherein the environment and state data calculation module outputs a vehicle state transition reward value of the path planning model.
6. The method of claim 1, wherein the vehicle state information comprises speed, acceleration, heading angle, and heading angular velocity.
7. The method according to claim 2, wherein obtaining the model reward estimate output by the path planning model comprises:
calculating a vehicle speed reward value of the path planning model;
calculating a vehicle position reward value of the path planning model;
deriving a vehicle state transition reward value based on the vehicle speed reward value and the vehicle location reward value;
and calculating to obtain the model return estimation value based on the vehicle state transition reward value.
8. The method of claim 7, wherein calculating a vehicle speed reward value for the path planning model comprises:
according to the actual speed V of the bicyclerealAnd the desired speed V of the vehicleexpReceive the reward parameter G speed
According to the reward parameter GspeedObtaining the vehicle speed reward value rt,speed
9. The method for planning a driving path of a vehicle according to claim 8, wherein obtaining the reward parameter G_speed according to the actual speed V_real of the host vehicle and the desired speed V_exp of the host vehicle comprises:
Figure FDA0003208634570000021
wherein |V_real - V_exp| represents taking the absolute value of V_real - V_exp.
10. The method for planning a driving path of a vehicle according to claim 8, wherein obtaining the vehicle speed reward value r_t,speed according to the reward parameter G_speed comprises:
when G_speed > 1 or G_speed = 1, r_t,speed = 0;
when G_speed = 0, r_t,speed = 1;
when 0 < G_speed < 1,
Figure FDA0003208634570000022
11. The method for planning a driving path of a vehicle according to claim 8, wherein the calculation of the desired speed V_exp of the host vehicle comprises:
when the host vehicle meets a red-light road condition:
when the distance between the host vehicle and the red-light stop line is greater than L1, the desired speed of the host vehicle is
V_exp = V_exp,max;
when the distance between the host vehicle and the red-light stop line is less than or equal to L1, the desired speed of the host vehicle is linearly decelerated according to
Figure FDA0003208634570000031
wherein LD is the distance between the host vehicle and the red-light stop line at the current time, L2 is the distance between the red-light stop line and the red light, and V_exp,max is the maximum desired speed.
12. The method for planning a driving path of a vehicle according to claim 8, wherein the calculation of the desired speed V_exp of the host vehicle comprises:
when the host vehicle meets an obstacle road condition:
when the actual distance P between the host vehicle and the obstacle, the distance D2 between the obstacle stop line and the obstacle, and the distance D1 between the host vehicle and the obstacle stop line satisfy P > D1 + D2, the desired speed of the host vehicle is
V_exp = V_exp,max;
when the actual distance P between the host vehicle and the obstacle, the distance D2 between the obstacle stop line and the obstacle, and the distance D1 between the host vehicle and the obstacle stop line satisfy P ≤ D1 + D2, the desired speed of the host vehicle is linearly decelerated according to
Figure FDA0003208634570000032
wherein V_exp,max is the maximum desired speed.
13. The method for planning a driving path of a vehicle according to claim 8, wherein the calculation of the desired speed V_exp of the host vehicle comprises:
when the host vehicle meets a green-light road condition, the desired speed of the host vehicle is
V_exp = V_exp,max,
wherein V_exp,max is the maximum desired speed.
14. The method for planning a driving path of a vehicle according to claim 7, wherein calculating the vehicle location reward value of the path planning model comprises:
determining the vehicle location reward value according to the distance S1 between the center point of the host vehicle and the center line of the lane;
wherein,
when |S1| > 1 or |S1| = 1, r_t,position = -1;
when |S1| = 0, r_t,position = 0;
when 0 < |S1| < 1,
Figure FDA0003208634570000041
where |S1| represents the absolute value of S1.
15. The method for planning a travel path of a vehicle according to claim 7, wherein deriving a vehicle state transition reward value based on the vehicle speed reward value and the vehicle location reward value comprises:
the vehicle state transition reward value is
r_t = r_t,speed + r_t,position,
wherein r_t,speed represents the vehicle speed reward value and r_t,position represents the vehicle location reward value.
16. The method for planning a driving path of a vehicle according to claim 7, wherein calculating the model return estimation value based on the vehicle state transition reward value comprises:
the model return estimate is
Figure FDA0003208634570000042
wherein ρ is an estimation coefficient, T represents the total number of frames of environment state feature maps in the environment state feature map sequence, the total frame number corresponds to the time point at which the path planning ends, and T is a positive integer.
17. The method for planning a driving path of a vehicle according to claim 3, wherein the trunk neural network, the first feature vectorization module, the first fully-connected network and the second fully-connected network are connected to form an Actor network;
the Actor network is connected with the second feature vectorization module, the third fully-connected network and the fourth fully-connected network to form a Critic network;
wherein the Actor network outputs a planned trajectory a_t of the host vehicle, the neural network weight parameter to be learned by the Actor network is θ^μ, the Actor network is expressed with its weight parameter as a_t = μ(s_t|θ^μ), and s_t represents the environment and state data at the current time;
the Critic network outputs the model return estimation value Q_t, the neural network weight parameters to be learned by the Critic network comprise θ^μ of the first half of the network and θ^E of the latter half of the network, and the Critic network is expressed with its weight parameters as Q_t = Q(s_t, a_t|θ^μ, θ^E); the environment state feature map sequence comprises multiple frames of environment state feature maps.
18. The method for planning a driving path of a vehicle according to claim 17, wherein performing reinforcement learning on the neural network weight parameters θ^μ and θ^E of the Actor network and the Critic network comprises:
setting the playback buffer size RB for the reinforcement learning and the number N of samples in a training batch, wherein RB and N are positive integers;
initializing the Actor network μ(s_t|θ^μ) and the Critic network Q(s_t, a_t|θ^μ, θ^E) with respect to the neural network weight parameters θ^μ and θ^E;
constructing a first target network μ′(s_t|θ^μ′) and a second target network Q′(s_t, a_t|θ^μ′, θ^E′) whose structures are completely identical to those of the Actor network μ(s_t|θ^μ) and the Critic network Q(s_t, a_t|θ^μ, θ^E), respectively;
initializing the weight parameters θ^μ′ and θ^E′ of the first target network μ′(s_t|θ^μ′) and the second target network Q′(s_t, a_t|θ^μ′, θ^E′);
setting the update period value Num_update of the target network weight parameters;
setting an initial value s_1 of the environment and state data and an initial value of the target network update count value Num_count;
for the environment and state data with a total frame number of T frames, starting from the environment and state data s_1, performing a learning step.
19. The method for planning a driving path of a vehicle according to claim 18, wherein, for the environment and state data of T frames, starting from the environment and state data s_1, performing the learning step comprises:
adding a disturbance to the output a_t of the current Actor network to obtain a_t,d as the motion trajectory indication of the current frame;
executing a_t,d on the environment and state based on the environment and state data s_t at the current time, and obtaining the environment and state data s_{t+1} after the vehicle state transition and the corresponding vehicle state transition reward value r_t;
saving the sample vector (s_t, a_t,d, r_t, s_{t+1}) corresponding to the current vehicle state transition in the playback buffer;
randomly taking N samples (s_i, a_i,d, r_i, s_{i+1}) (i = 1, 2, …, N, a_i,d ∈ a_t,d, r_i ∈ r_t) from the playback buffer, and training the Actor network and the Critic network.
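A non-limiting Python sketch of one such learning step is given below; the objects actor, env, zeta and train_networks, as well as the buffer size value, are placeholders assumed for illustration, and the perturbation follows the form given in claim 21.

```python
import random
from collections import deque

RB = 100_000                          # assumed playback buffer size (RB in claim 18)
replay_buffer = deque(maxlen=RB)

def learning_step(s_t, actor, env, sigma, beta, zeta, N, train_networks):
    """One learning step of claim 19. `actor` maps state data to the trajectory
    a_t, `env.step` applies a_t,d and returns (s_next, r_t), `zeta()` returns
    Gaussian noise of the same shape (NumPy arrays assumed), and
    `train_networks` stands for the Actor/Critic updates of claims 24-30."""
    a_t = actor(s_t)                                  # current Actor output
    a_t_d = a_t + sigma * zeta() - beta * a_t         # perturbed action, as in claim 21
    s_next, r_t = env.step(a_t_d)                     # vehicle state transition and reward r_t
    replay_buffer.append((s_t, a_t_d, r_t, s_next))   # store the transition sample
    if len(replay_buffer) >= N:
        batch = random.sample(replay_buffer, N)       # N randomly drawn samples
        train_networks(batch)                         # train the Actor and Critic networks
    return s_next
```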
20. The method for planning a driving path of a vehicle according to claim 18, wherein the initializing comprises initializing, with randomized parameters, the Actor network μ(s_t|θ^μ) and the Critic network Q(s_t, a_t|θ^μ, θ^E) with respect to the neural network weight parameters θ^μ and θ^E.
21. The method for planning a driving path of a vehicle according to claim 19, wherein adding a disturbance to the output a_t of the current Actor network to obtain a_t,d as the motion trajectory indication of the current frame comprises:
a_t,d = μ(s_t|θ^μ) + σζ_t - βμ(s_t|θ^μ),
wherein ζ_t is a Gaussian random process, σ is a first disturbance parameter, and β is a second disturbance parameter.
22. The method for planning a driving path of a vehicle according to claim 19, wherein executing a_t,d on the environment and state based on the environment and state data s_t at the current time and obtaining the environment and state data s_{t+1} after the vehicle state transition is realized by operating the environment and state data calculation module.
23. The method for planning a driving path of a vehicle according to claim 22, wherein the controller module controls lateral and longitudinal movements of the host vehicle.
24. The method for planning a driving path of a vehicle according to claim 19, wherein randomly taking N samples (s_i, a_i,d, r_i, s_{i+1}) (i = 1, 2, …, N, a_i,d ∈ a_t,d) from the playback buffer and training the Actor network and the Critic network comprises:
calculating a target value of the model return estimation value Q_t;
calculating the average residual between the model return estimation value Q_t of the current frame and the model return target value;
selecting, according to a sampling result of a Bernoulli distribution, the manner of updating the weight parameters θ^μ and θ^E of the Critic network;
updating the weight parameter θ^μ of the Actor network;
updating the target network update count value Num_count;
comparing the updated target network update count value Num_count with the update period value Num_update to obtain a judgment result;
determining, according to the judgment result, whether to update the weight parameters θ^μ′ and θ^E′ of the target networks.
25. The method for planning a driving path of a vehicle according to claim 24, wherein calculating the target value of the model return estimation value Q_t comprises:
the target value of the model return estimate Q_t is
y_i = r_i + γQ′(s_{i+1}, μ′(s_{i+1}|θ^μ′)|θ^μ′, θ^E′),
where γ is a target value coefficient.
26. The method for planning a driving path of a vehicle according to claim 24, wherein calculating the average residual between the model return estimation value Q_t of the current frame and the model return target value comprises:
the average residual is
Figure FDA0003208634570000071
wherein y_i represents the model return target value.
27. The method for planning a driving path of a vehicle according to claim 24, wherein selecting, according to the sampling result of the Bernoulli distribution, the manner of updating the weight parameters θ^μ and θ^E of the Critic network comprises:
for each frame of the environment and state data, sampling a Bernoulli sample once according to Bernoulli distribution to obtain a sampling result;
if the sampling result is 1, the weight parameter θ^E of the Critic network is updated according to
Figure FDA0003208634570000072
Figure FDA0003208634570000073
and the weight parameter θ^μ is kept unchanged;
if the sampling result is 0, the weight parameters θ^μ and θ^E of the Critic network are updated according to
Figure FDA0003208634570000074
Figure FDA0003208634570000075
Figure FDA0003208634570000076
Figure FDA0003208634570000077
wherein the expression shown in
Figure FDA0003208634570000078
denotes the derivative of the function L with respect to θ^E, the expression shown in
Figure FDA0003208634570000079
denotes the derivative of the function L with respect to θ^μ, the probability that a Bernoulli sample of the Bernoulli distribution equals 1 is taken as k, and 0 < k < 1.
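For illustration only, the Bernoulli-gated choice between the two update paths of claim 27 can be sketched as follows in Python; the two callables stand in for the gradient updates whose exact formulas are given as formula images in the claim.

```python
import random

def bernoulli_gated_update(update_theta_E_only, update_theta_mu_and_E, k):
    """One Bernoulli sample with success probability k (0 < k < 1) per frame
    selects the update path of claim 27. The two callables stand in for the
    gradient updates whose exact formulas are given as formula images."""
    sample = 1 if random.random() < k else 0
    if sample == 1:
        update_theta_E_only()       # update theta^E, keep theta^mu unchanged
    else:
        update_theta_mu_and_E()     # update both theta^mu and theta^E
    return sample
```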
28. The method for planning a driving path of a vehicle according to claim 24, wherein updating the weight parameter θ^μ of the Actor network comprises:
updating the weight parameter θ^μ of the Actor network according to
Figure FDA0003208634570000081
Figure FDA0003208634570000082
wherein J = Q(s, a|θ^μ, θ^E).
29. The method for planning a driving path of a vehicle according to claim 24, wherein updating the target network update count value Num_count comprises:
Num_count = Num_count + 1.
30. The method for planning a driving path of a vehicle according to claim 24, wherein determining, according to the judgment result, whether to update the weight parameters θ^μ′ and θ^E′ of the target networks comprises:
if the target network update count value Num_count is less than the update period value Num_update, continuing the learning step;
if the target network update count value Num_count is equal to the update period value Num_update, updating the weight parameters θ^μ′ and θ^E′ of the target networks according to
θ^E′ ← τθ^E + (1 - τ)θ^E′
θ^μ′ ← τθ^μ + (1 - τ)θ^μ′
and resetting the target network update count value Num_count to zero;
wherein τ is the update coefficient of the target network weights.
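The soft update of claim 30 can be illustrated with the following Python sketch, in which the network weights are represented as plain lists of floats purely for readability.

```python
def soft_update(target_params, source_params, tau):
    """theta' <- tau * theta + (1 - tau) * theta', applied element-wise.
    Weights are plain lists of floats here, purely for illustration."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_params, source_params)]

# When Num_count reaches Num_update, both target networks are refreshed, e.g.:
# theta_E_target = soft_update(theta_E_target, theta_E, tau)
# theta_mu_target = soft_update(theta_mu_target, theta_mu, tau)
# and Num_count is reset to zero.
```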
31. The method for planning a travel path of a vehicle according to claim 1, wherein the sequence of the environmental state feature maps of the host vehicle includes a plurality of frames of environmental state feature maps, and each of the environmental state feature maps is generated by:
generating an environment static picture taking the self-vehicle as a picture center based on the map data;
generating an environment dynamic picture taking the self-vehicle as a picture center based on the target detection tracking result;
and generating the environment state characteristic graph according to the environment static picture and the environment dynamic picture.
32. The method for planning a driving path of a vehicle according to claim 31, wherein generating the environment state feature map based on the environment static picture and the environment dynamic picture comprises:
taking the environment static picture as a base map;
overlaying picture information contained in the environment dynamic picture on the base map;
taking the self-vehicle central point of the current frame as a pixel central point on the environment state characteristic diagram;
and setting the heading angle direction of the vehicle as the direction right above the environmental state characteristic diagram, and generating the environmental state characteristic diagram.
CN202110927868.6A 2021-08-12 2021-08-12 Vehicle travel path planning method Active CN113625718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110927868.6A CN113625718B (en) 2021-08-12 2021-08-12 Vehicle travel path planning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110927868.6A CN113625718B (en) 2021-08-12 2021-08-12 Vehicle travel path planning method

Publications (2)

Publication Number Publication Date
CN113625718A true CN113625718A (en) 2021-11-09
CN113625718B CN113625718B (en) 2023-07-21

Family

ID=78385148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110927868.6A Active CN113625718B (en) 2021-08-12 2021-08-12 Vehicle travel path planning method

Country Status (1)

Country Link
CN (1) CN113625718B (en)



Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3657130A1 (en) * 2017-01-12 2020-05-27 Mobileye Vision Technologies Ltd. Navigation based on vehicle activity
CN108931981A (en) * 2018-08-14 2018-12-04 汽-大众汽车有限公司 A kind of paths planning method of automatic driving vehicle
EP3688540A1 (en) * 2018-12-18 2020-08-05 Beijing Voyager Technology Co., Ltd. Systems and methods for autonomous driving
US20200331465A1 (en) * 2019-04-16 2020-10-22 Ford Global Technologies, Llc Vehicle path prediction
CN110631596A (en) * 2019-04-23 2019-12-31 太原理工大学 Equipment vehicle path planning method based on transfer learning
CN111141300A (en) * 2019-12-18 2020-05-12 南京理工大学 Intelligent mobile platform map-free autonomous navigation method based on deep reinforcement learning
CN111061277A (en) * 2019-12-31 2020-04-24 歌尔股份有限公司 Unmanned vehicle global path planning method and device
WO2021135554A1 (en) * 2019-12-31 2021-07-08 歌尔股份有限公司 Method and device for planning global path of unmanned vehicle
CN112381132A (en) * 2020-11-11 2021-02-19 上汽大众汽车有限公司 Target object tracking method and system based on fusion of multiple cameras
CN113156963A (en) * 2021-04-29 2021-07-23 重庆大学 Deep reinforcement learning automatic driving automobile control method based on supervision signal guidance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAIQING SHEN et al.: "Path-Following Control of Underactuated Ships using Actor-Critic Reinforcement Learning with MLP Neural Networks", Sixth International Conference on Information Science and Technology, pages 317-321 *
张栩源 et al.: "自动驾驶汽车路径规划技术" [Path planning technology for autonomous vehicles], Automotive Engineer (《汽车工程师》), pages 35-39 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023102827A1 (en) * 2021-12-09 2023-06-15 华为技术有限公司 Path constraint method and device

Also Published As

Publication number Publication date
CN113625718B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN114384920B (en) Dynamic obstacle avoidance method based on real-time construction of local grid map
CN110007675B (en) Vehicle automatic driving decision-making system based on driving situation map and training set preparation method based on unmanned aerial vehicle
Chen et al. Attention-based hierarchical deep reinforcement learning for lane change behaviors in autonomous driving
KR102539942B1 (en) Method and apparatus for training trajectory planning model, electronic device, storage medium and program
US11003928B2 (en) Using captured video data to identify active turn signals on a vehicle
CN115303297B (en) Urban scene end-to-end automatic driving control method and device based on attention mechanism and graph model reinforcement learning
JP2020064619A (en) Device and method for training image recognition model and method for recognizing image
US11873006B2 (en) Virtual lane estimation using a recursive self-organizing map
Friji et al. A dqn-based autonomous car-following framework using rgb-d frames
Masmoudi et al. Autonomous car-following approach based on real-time video frames processing
CN113625718B (en) Vehicle travel path planning method
Souza et al. Vision-based waypoint following using templates and artificial neural networks
Hu et al. Learning dynamic graph for overtaking strategy in autonomous driving
Bhaggiaraj et al. Deep Learning Based Self Driving Cars Using Computer Vision
da Silva Bastos et al. Vehicle speed detection and safety distance estimation using aerial images of Brazilian highways
Holder et al. Learning to drive: End-to-end off-road path prediction
Chen et al. From perception to control: an autonomous driving system for a formula student driverless car
CN113570595B (en) Vehicle track prediction method and optimization method of vehicle track prediction model
Zhang et al. Learning how to avoiding obstacles for end-to-end driving with conditional imitation learning
CN113793371B (en) Target segmentation tracking method, device, electronic equipment and storage medium
CN115107806A (en) Vehicle track prediction method facing emergency scene in automatic driving system
Beglerovic et al. Polar occupancy map-a compact traffic representation for deep learning scenario classification
Souza et al. Template-based autonomous navigation and obstacle avoidance in urban environments
Kashyap et al. A Minimalistic Model for Converting Basic Cars Into Semi-Autonomous Vehicles Using AI and Image Processing
Souza et al. Vision and GPS-based autonomous vehicle navigation using templates and artificial neural networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant