WO2023082726A1 - Lane-changing strategy generation method and apparatus, computer storage medium, and electronic device - Google Patents

Lane-changing strategy generation method and apparatus, computer storage medium, and electronic device

Info

Publication number
WO2023082726A1
WO2023082726A1 · PCT/CN2022/109804 · CN2022109804W
Authority
WO
WIPO (PCT)
Prior art keywords
lane
current
learner
state information
current vehicle
Prior art date
Application number
PCT/CN2022/109804
Other languages
English (en)
French (fr)
Inventor
徐鑫
Original Assignee
京东鲲鹏(江苏)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东鲲鹏(江苏)科技有限公司
Publication of WO2023082726A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Definitions

  • Embodiments of the present disclosure relate to the technical field of intelligent driving, and in particular, relate to a method for generating a lane-changing strategy, a device for generating a lane-changing strategy, a computer storage medium, and an electronic device.
  • smart cars can obtain real-time information about people, vehicles, and the road environment, have the capabilities of scene recognition, risk assessment, intelligent decision-making, and control in complex environments, and are expected to become an important means of reducing traffic accidents in the future.
  • decision-making and path planning are the core of smart driving.
  • Traditional decision-making approaches are rule-based expert systems, such as finite state machines and decision tree models. However, expert systems rely on prior knowledge, have high modeling costs, poor scalability, and insufficient scene generalization ability, making it difficult to adapt to complex and changeable driving conditions. Path planning usually uses a local path planning algorithm based on a receding (rolling) horizon, which can plan safe, efficient, collision-free paths in dynamic and changeable environments; however, its computational load is relatively large, which limits its practical application.
  • a method for generating a lane-changing strategy including:
  • a current path data pair is generated based on the trajectory optimization control quantity
  • the current lane-changing learner is trained with the current path data pair to obtain a target lane-changing learner, and the lane-changing strategy of the current vehicle is obtained by the target lane-changing learner.
  • obtaining status information of the current vehicle and status information of surrounding vehicles associated with the current vehicle includes:
  • determining the surrounding vehicles associated with the current vehicle, and obtaining state information of the surrounding vehicles includes:
  • the longitudinal relative speed, longitudinal and lateral relative distance, reciprocal collision avoidance time, and following time distance of the preceding vehicle in the own lane, the preceding vehicle in the target lane, and the following vehicle in the target lane are obtained.
  • inputting the state information of the current vehicle and the state information of the surrounding vehicles into the current lane change learner and the preset trajectory optimizer includes:
  • the trajectory learning control quantity includes current vehicle acceleration and front wheel rotation angle
  • the trajectory optimization control quantity includes target acceleration and target front wheel rotation angle
  • generating a current path data pair based on the trajectory optimization control quantity includes:
  • when it is determined that the deviation is smaller than the preset threshold, the current lane change learner has converged, and the current lane change learner is used as the target lane change learner;
  • the current path data pair is generated by using the trajectory optimization control amount, the current vehicle state information, and the surrounding vehicle state information.
  • the current lane change learner is trained through the current path data to obtain a target lane change learner, including:
  • the current lane change learner is trained with the path data pairs included in the target path data set until the current lane change learner converges, so as to obtain the target lane change learner.
  • an apparatus for generating a lane-changing strategy including:
  • An input data acquisition module configured to acquire state information of the current vehicle and state information of surrounding vehicles associated with the current vehicle;
  • the output data acquisition module is used to input the state information of the current vehicle and the state information of the surrounding vehicles into the current lane change learner and the preset trajectory optimizer to obtain the trajectory learning control amount and the trajectory optimization control amount;
  • a current path data pair generation module, configured to generate a current path data pair based on the trajectory optimization control quantity when it is determined that the deviation between the trajectory learning control quantity and the trajectory optimization control quantity is greater than a preset threshold;
  • a lane-changing strategy generating module configured to train the current lane-changing learner by using the current path data pair to obtain a target lane-changing learner.
  • a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method for generating a lane-changing strategy described in any one of the above exemplary embodiments is implemented.
  • an electronic device comprising:
  • a memory for storing executable instructions of the processor
  • the processor is configured to execute the lane-changing strategy generation method described in any one of the above exemplary embodiments by executing the executable instructions.
  • Fig. 1 schematically shows a system architecture block diagram of an application scenario of a method for generating a lane changing strategy according to an exemplary embodiment of the present disclosure.
  • Fig. 2 schematically shows a flowchart of a method for generating a lane changing strategy according to an exemplary embodiment of the present disclosure.
  • Fig. 3 schematically shows a block diagram of a system for generating a lane-changing strategy according to an exemplary embodiment of the present disclosure.
  • Fig. 4 schematically shows a flowchart of a method for acquiring state information of a current vehicle and state information of surrounding vehicles according to an exemplary embodiment of the present disclosure.
  • Fig. 5 schematically shows a flow chart of a method for determining surrounding vehicles associated with a current vehicle and acquiring state information of surrounding vehicles according to an exemplary embodiment of the present disclosure.
  • Fig. 6 schematically shows a structural block diagram of a current lane change learner according to an exemplary embodiment of the present disclosure.
  • Fig. 7 schematically shows a flow chart of a method for inputting state information of a current vehicle and surrounding vehicles into a current lane change learner and a preset trajectory optimizer according to an exemplary embodiment of the present disclosure.
  • Fig. 8 schematically shows a flowchart of a method for training a current lane-changing learner by using current path data to obtain a target lane-changing learner according to an exemplary embodiment of the present disclosure.
  • Fig. 9 schematically shows a flowchart of a method for generating a target lane change learner according to an exemplary embodiment of the present disclosure.
  • Fig. 10 schematically shows a comparison diagram of loss function curves during the training process of the current lane change learner according to an exemplary embodiment of the present disclosure.
  • Fig. 11 schematically shows the relationship between the number of training iterations and the mean absolute error during training of the current lane change learner according to an exemplary embodiment of the present disclosure.
  • Fig. 12 schematically shows the relationship between the number of training iterations and the mean absolute percentage error during training of the current lane change learner according to an exemplary embodiment of the present disclosure.
  • Fig. 13 schematically shows a comparison diagram of lane change strategies of an original path and a target lane change learner during an online simulation process according to an exemplary embodiment of the present disclosure.
  • Fig. 14 schematically shows a comparison of the original control quantities with the acceleration and front wheel angle obtained by the target lane change learner during an online simulation process according to an exemplary embodiment of the present disclosure.
  • Fig. 15 schematically shows a block diagram of an apparatus for generating a lane-changing strategy according to an exemplary embodiment of the present disclosure.
  • Fig. 16 schematically shows an electronic device for implementing the above-mentioned method for generating a lane-changing strategy according to an exemplary embodiment of the present disclosure.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of example embodiments to those skilled in the art.
  • the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
  • numerous specific details are provided in order to give a thorough understanding of embodiments of the present disclosure.
  • those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced with one or more of the specific details omitted, or other methods, components, devices, steps, etc. may be adopted.
  • well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
  • Scene cognition in mixed traffic scenarios requires smart cars to further process, store, and extract environmental perception information, understand the composition and interaction of the various traffic elements, and explain the behavior of road users and how it evolves; scene cognition is therefore a key link in intelligent driving. Traditional scene cognition is usually built on deductive logic and semantic description, and it is difficult to accurately model dynamic and changeable traffic objects and their behavior. In particular, Vulnerable Road Users (VRUs) are numerous and of many types, their traffic safety awareness is weak, and their behavior is highly dynamic and uncertain, which makes it difficult for existing behavioral cognition models to describe the behavioral characteristics of VRUs and poses a great challenge to autonomous driving in mixed traffic environments.
  • FIG. 1 shows a schematic block diagram of a system architecture of an exemplary application scenario where a method and device for generating a lane-changing strategy according to an embodiment of the present disclosure can be applied.
  • the system architecture 100 may include one or more of terminal devices 101, 102, a network 103 and a server 104.
  • the network 103 is used as a medium for providing communication links between the terminal devices 101 , 102 and the server 104 .
  • Network 103 may include various connection types, such as wired or wireless communication links, or fiber optic cables, among others.
  • the terminal devices 101 and 102 may be smart driving vehicles, including but not limited to smart driving cars, smart driving buses and the like. It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are only illustrative. According to the implementation needs, there can be any number of terminal devices, networks and servers.
  • the server 104 may be a server cluster composed of multiple servers.
  • the method for generating a lane-changing strategy provided by the embodiments of the present disclosure is generally executed by the server 104 , and correspondingly, the device for generating a lane-changing strategy is generally disposed in the server 104 .
  • the method for generating lane-changing strategies provided by the embodiments of the present disclosure can also be executed by terminal devices 101 and 102, and correspondingly, the device for generating lane-changing strategies can also be set in terminal devices 101 and 102 , which is not specifically limited in this exemplary embodiment.
  • the current vehicle status information and the status information of surrounding vehicles associated with the current vehicle may be uploaded to the server 104 through the terminal devices 101 and 102; the server generates a target lane-changing learner by the lane-changing strategy generation method provided by the embodiments of the present disclosure, generates the lane-changing strategy of the current vehicle through the target lane-changing learner, and transmits the generated lane-changing strategy to the terminal devices 101 and 102, so that the terminal devices 101 and 102 execute the corresponding lane-changing decision according to the received lane-changing strategy.
  • FIG. 2 shows a schematic flow diagram of a lane-changing strategy generation method. Referring to FIG. 2, the lane-changing strategy generation method may include the following steps:
  • Step S210 Obtain the state information of the current vehicle and the state information of surrounding vehicles associated with the current vehicle;
  • Step S220 Input the state information of the current vehicle and the state information of the surrounding vehicles into the current lane change learner and the preset trajectory optimizer to obtain the trajectory learning control amount and the trajectory optimization control amount;
  • Step S230 When it is determined that the deviation between the trajectory learning control quantity and the trajectory optimization control quantity is greater than a preset threshold, generate a current path data pair based on the trajectory optimization control quantity;
  • Step S240 Use the current path data pair to train the current lane-changing learner to obtain a target lane-changing learner, and obtain the lane-changing strategy of the current vehicle through the target lane-changing learner.
  • the above lane-changing strategy generation method obtains the state information of the current vehicle and the state information of surrounding vehicles associated with the current vehicle; inputs the state information of the current vehicle and of the surrounding vehicles into the current lane change learner and a preset trajectory optimizer to obtain a trajectory learning control quantity and a trajectory optimization control quantity; when it is determined that the deviation between the trajectory learning control quantity and the trajectory optimization control quantity is greater than a preset threshold, generates a current path data pair based on the trajectory optimization control quantity; and trains the current lane-changing learner with the current path data pair to obtain a target lane-changing learner, through which the lane-changing strategy of the current vehicle is obtained.
  • On the one hand, the acquired state information of the current vehicle and of the surrounding vehicles associated with it is input into the current lane change learner and the trajectory optimizer, the trajectory learning control quantity and the trajectory optimization control quantity are obtained, and the deviation between the two is calculated. When the deviation is greater than the preset threshold, the current path data pair is generated based on the trajectory optimization control quantity, and the lane change learner is trained with the current path data pair to obtain the target lane change learner.
  • the target lane-changing learner then generates the lane-changing strategy of the current vehicle, which solves the problems of expert-system-based decision-making in the prior art, namely reliance on prior knowledge, high modeling cost, poor scalability, and insufficient scene generalization ability, and thereby improves the efficiency of lane-changing strategy generation.
  • On the other hand, after the trajectory learning control quantity of the current vehicle is obtained through the current lane changing learner, its deviation from the trajectory optimization control quantity is calculated; when the deviation is determined to be greater than the preset threshold, a current path data pair is generated and the current lane change learner is trained with it to obtain the target lane change learner. This prevents large errors in the trajectory learning control quantity output by the current lane change learner from causing large offsets in the path of the current vehicle, and improves the accuracy of the generated lane-changing strategy.
  • the exemplary embodiments of the present disclosure can be applied to intelligent driving, mainly studying how to generate the lane-changing strategy of the current vehicle according to the state information of the current vehicle and the state information of surrounding vehicles associated with the current vehicle, and how to improve the efficiency and accuracy of generating that strategy.
  • Based on the acquired state information of the current vehicle and of the surrounding vehicles associated with it, the present disclosure inputs that information into the current lane change learner and a preset trajectory optimizer; the trajectory learning control quantity is obtained through the current lane-changing learner, and the trajectory optimization control quantity is obtained through the trajectory optimizer.
  • the current lane-changing learner is trained on a training set whose data come from the original path data set; the trajectory optimizer is an optimizer based on mixed-integer quadratic programming; after the trajectory learning control quantity and the trajectory optimization control quantity are obtained, a current path data pair is generated according to the deviation between them, the original path data set is updated with the generated current path data pair, the training set is updated with the updated original path data set, and the current lane change learner is trained on the updated training set to generate the target lane-changing learner.
  • the lane-changing strategy of the current vehicle is then generated through the target lane-changing learner, which improves the efficiency and accuracy of lane-changing strategy generation.
  • the system for generating lane-changing strategies may include a state information collection module 310 , a current lane-changing learner 320 , a trajectory optimizer 330 and a target lane-changing learner 340 .
  • the state information collection module 310 is used to collect the state information of the current vehicle and the state information of the surrounding vehicles associated with the current vehicle, and to perform normalization processing to obtain the normalized state information of the current vehicle and the normalized state information of the surrounding vehicles;
  • the current lane change learner 320 is connected to the state information collection module 310 over a network and is used to obtain the state information of the current vehicle and of the surrounding vehicles associated with it collected by the state information collection module 310; the acquired state information is input into the current lane change learner to obtain the trajectory learning control quantity of the current vehicle, which includes the acceleration of the current vehicle and the front wheel angle;
  • the trajectory optimizer 330 is connected to the state information collection module 310 over a network and is used to obtain the state information of the current vehicle and of the surrounding vehicles collected by the state information collection module 310; this state information is input into the trajectory optimizer 330 to obtain the trajectory optimization control quantity of the current vehicle.
  • step S210-step S240 will be explained and illustrated in detail with reference to FIG. 3 .
  • step S210 state information of the current vehicle and state information of surrounding vehicles associated with the current vehicle are acquired.
  • the state information shown in Table 1 is the state information that needs to be considered when the current vehicle makes a lane change decision. The absolute motion of the current vehicle can therefore be selected as the state information of the current vehicle, including: the driving speed of the current vehicle, the heading angle, and the lateral distance between the current vehicle and the target lane;
  • the surrounding vehicles associated with the current vehicle include: the vehicle ahead of the current vehicle in its own lane, the vehicle ahead in the target lane, and the vehicle behind in the target lane; for these surrounding vehicles, what matters during a lane change is the relative motion between the current vehicle and the surrounding vehicles, so
  • the longitudinal relative speed and the longitudinal and lateral relative distances of the surrounding vehicles can be selected; in addition, the inverse of the time to collision (TTCi) can reflect the driver's perception of driving risk and has a significant influence on the lane change decision, and the time headway (THW), as an important measure of car-following behavior, can also serve as state information of the surrounding vehicles.
  • TTCi: inverse of time to collision
  • Table 1: status information of the current vehicle and status information of the surrounding vehicles
  • acquiring the state information of the current vehicle and the state information of surrounding vehicles associated with the current vehicle may include steps S410 and S420:
  • Step S410 Obtain the current vehicle's driving speed, heading angle, and lateral distance from the target lane.
  • the state information of the current vehicle may include: the driving speed of the current vehicle, the heading angle of the current vehicle, and the lateral distance between the current vehicle and the target lane; the heading angle of the current vehicle is the angle, in the ground coordinate system, between the velocity of the current vehicle's center of mass and the horizontal axis.
  • Step S420 Determine the surrounding vehicles associated with the current vehicle, and acquire the status information of the surrounding vehicles.
  • determining the surrounding vehicles associated with the current vehicle, and obtaining the status information of the surrounding vehicles may include steps S510-step S530:
  • Step S510 Acquire the target lane and the current vehicle's own lane;
  • Step S520 Determine the vehicle ahead of the current vehicle in its own lane, the vehicle ahead of the current vehicle in the target lane, and the vehicle behind the current vehicle in the target lane;
  • Step S530 Obtain the longitudinal relative speed, the longitudinal and lateral relative distances, the reciprocal of the time to collision, and the time headway of the preceding vehicle in the own lane, the preceding vehicle in the target lane, and the following vehicle in the target lane.
  • step S510 to step S530 will now be explained. Specifically, first, the current vehicle's own lane and the target lane into which it intends to change are obtained; then, the vehicle ahead of the current vehicle in its own lane, the vehicle ahead of the current vehicle in the target lane, and the vehicle behind the current vehicle in the target lane are determined; finally, the longitudinal relative speed, the longitudinal and lateral relative distances, the reciprocal of the time to collision, and the time headway of these three vehicles are obtained and used as the state information of the surrounding vehicles associated with the current vehicle.
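  • As a concrete illustration of how the 18-dimensional observation described above (3 ego features plus 5 features for each of the 3 surrounding vehicles) could be assembled, the following sketch is offered. The helper names and the exact definitions of TTCi (closing speed over longitudinal gap) and THW (longitudinal gap over ego speed) are assumptions based on the common formulations, not a verbatim reproduction of the patent's expressions.

```python
import numpy as np

def ttci(rel_speed_lon: float, rel_dist_lon: float) -> float:
    """Inverse of time to collision: closing speed divided by longitudinal gap (assumed form)."""
    return rel_speed_lon / max(rel_dist_lon, 1e-3)

def thw(rel_dist_lon: float, ego_speed: float) -> float:
    """Time headway: longitudinal gap divided by the ego vehicle's speed (assumed form)."""
    return rel_dist_lon / max(ego_speed, 1e-3)

def build_observation(ego: dict, lead_own: dict, lead_target: dict, follow_target: dict) -> np.ndarray:
    """Concatenate 3 ego features with 5 features per surrounding vehicle (3 + 3 * 5 = 18)."""
    features = [ego["speed"], ego["heading_angle"], ego["lateral_dist_to_target_lane"]]
    for veh in (lead_own, lead_target, follow_target):
        features += [
            veh["rel_speed_lon"],                            # longitudinal relative speed
            veh["rel_dist_lon"],                             # longitudinal relative distance
            veh["rel_dist_lat"],                             # lateral relative distance
            ttci(veh["rel_speed_lon"], veh["rel_dist_lon"]), # inverse of time to collision
            thw(veh["rel_dist_lon"], ego["speed"]),          # time headway
        ]
    return np.asarray(features, dtype=np.float32)            # 18-dimensional state vector
```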
  • step S220 the state information of the current vehicle and the state information of the surrounding vehicles are input into the current lane change learner and the preset trajectory optimizer to obtain the trajectory learning control amount and the trajectory optimization control amount.
  • the current lane-changing learner is obtained by training a deep neural network on the training set, where the data in the training set are the data included in the original path data set, and the original path data set consists of path data pairs formed from the state information input to the trajectory optimizer and the trajectory optimization control quantities output by the trajectory optimizer.
  • when the input to the current lane change learner is the state information of the current vehicle and of the surrounding vehicles associated with it, the features output by the current lane change learner are the acceleration and the front wheel angle of the current vehicle.
  • the preset trajectory optimizer is an optimizer based on mixed-integer quadratic programming.
  • the features output by the trajectory optimizer are the target acceleration and the target front wheel angle of the current vehicle; the trajectory learning control quantity includes the acceleration and the front wheel angle of the current vehicle, and the trajectory optimization control quantity includes the target acceleration and the target front wheel angle of the current vehicle.
  • Keras is an open source artificial neural network library written in Python.
  • the network model includes an input layer, a hidden layer, and an output layer, all of which use a fully connected layer;
  • the input feature dimension is 18, the input layer includes 256 neurons, and the activation function can be ReLU;
  • the hidden layer includes 128 neurons, and the activation function is ReLU;
  • the output layer includes 2 neurons, corresponding to the output acceleration and front wheel angle, and the activation function is the tanh function.
  • the optimizer used for training selects the Adam algorithm, an extension of the stochastic gradient descent algorithm, which can estimate independent adaptive learning rates for different parameters from the gradients and offers high computational efficiency and low memory usage.
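  • The following is a minimal Keras sketch of the network described above (18 input features, fully connected layers of 256 and 128 units with ReLU, and a 2-unit tanh output for acceleration and front wheel angle). The function name and the use of the tf.keras API specifically are illustrative assumptions rather than details taken from the patent.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lane_change_learner(input_dim: int = 18) -> keras.Model:
    """Fully connected lane-change learner: 18 inputs -> 256 (ReLU) -> 128 (ReLU) -> 2 (tanh)."""
    model = keras.Sequential([
        layers.Dense(256, activation="relu", input_shape=(input_dim,)),  # input layer, 256 neurons
        layers.Dense(128, activation="relu"),                            # hidden layer, 128 neurons
        layers.Dense(2, activation="tanh"),                              # acceleration and front wheel angle
    ])
    return model
```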
  • inputting the state information of the current vehicle and the state information of the surrounding vehicles into the current lane change learner and the preset trajectory optimizer may include steps S710 and Step S720:
  • Step S710 Perform normalization processing on the status information of the current vehicle and the status information of the surrounding vehicles to obtain the normalized status information of the current vehicle and the normalized status information of the surrounding vehicles;
  • Step S720 Input the normalized state information of the current vehicle and the normalized state information of surrounding vehicles into the current lane change learner and the preset trajectory optimizer.
  • step S710 and step S720 will now be explained. Specifically, since the measurement units of the state information of the current vehicle and of the surrounding vehicles associated with it differ, directly feeding the raw state information into the current lane change learner makes it difficult to learn effective features.
  • the state information of the current vehicle and of the surrounding vehicles can therefore be normalized according to the maximum and minimum values of each feature;
  • the feature can be normalized with reference to expression (1).
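  • Expression (1) itself is not reproduced in this text; the sketch below therefore assumes a standard per-feature min-max normalization, which is the usual form of such an expression, and the sample feature bounds are arbitrary illustrative values.

```python
import numpy as np

def min_max_normalize(x: np.ndarray, x_min: np.ndarray, x_max: np.ndarray) -> np.ndarray:
    """Scale each state feature to [0, 1] using per-feature minimum and maximum values."""
    return (x - x_min) / (x_max - x_min + 1e-9)  # small epsilon guards against zero feature ranges

# Illustrative bounds for a 3-feature slice (speed in m/s, heading angle in rad, lateral distance in m).
x = np.array([12.0, 0.05, 1.8])
x_min = np.array([0.0, -0.5, 0.0])
x_max = np.array([30.0, 0.5, 3.75])
print(min_max_normalize(x, x_min, x_max))
```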
  • the output features are, respectively, the acceleration of the current vehicle, the front wheel angle of the current vehicle, the target acceleration of the current vehicle, and the target front wheel angle of the current vehicle.
  • step S230 when it is determined that the deviation between the trajectory learning control quantity and the trajectory optimization control quantity is greater than a preset threshold, a current path data pair is generated based on the trajectory optimization control quantity.
  • when there is a large deviation between the trajectory learning control quantity output by the current lane change learner and the optimal decision, that deviation will cause a large offset in the path of the current vehicle.
  • therefore, the trajectory learning control quantities output by the current lane-changing learner that may contain deviations are sampled and labeled, the original path data set is updated online, and
  • the current lane-changing learner is trained with the updated path data set to reduce the deviation between its output trajectory learning control quantity and the optimal decision.
  • a first deviation is calculated between the acceleration of the current vehicle in the trajectory learning control quantity and the target acceleration of the current vehicle in the trajectory optimization control quantity, and a second deviation between the front wheel angle in the trajectory learning control quantity and the target front wheel angle in the trajectory optimization control quantity;
  • when the deviation exceeds the preset threshold, the current path data pair is generated from the trajectory optimization control quantity together with the state information of the current vehicle and of the surrounding vehicles; when the first deviation is less than the first preset threshold and the second deviation is less than the second preset threshold, the current policy learner can be considered to have converged, that is, the current policy learner can serve as the target lane change learner.
  • the mean absolute error (Mean Absolute Error, MAE) can be used as the loss function to better reflect and control the output error.
  • MAE: Mean Absolute Error
  • MAE = (1/n) Σᵢ₌₁ⁿ |ŷᵢ − yᵢ|, where n is the number of predicted points, ŷᵢ is the predicted value of the i-th predicted point, and yᵢ is the true value of the i-th predicted point.
  • in step S240, the current lane-changing learner is trained with the current path data pair to obtain a target lane-changing learner, and the lane-changing strategy of the current vehicle is obtained by the target lane-changing learner.
  • training the current lane change learner with the current path data pair to obtain a target lane change learner may include steps S810 to S830:
  • Step S810 Obtain an original route data set, wherein the original route data set includes a plurality of route data pairs;
  • Step S820 Add the current route data pair to the original route data set, update the original route data set, and obtain a target route data set;
  • Step S830 Train the current lane-changing learner with the path data pairs included in the target route data set until the current lane-changing learner converges, to obtain the target lane-changing learner.
  • step S810-step S830 will be explained and described.
  • first, the original path data set in the original sample data is obtained, where the original path data set includes multiple path data pairs; then, the current path data pair is added to the original path data set, and the original path data set is updated to obtain the target path data set.
  • the training samples are updated through the target path data set, and the current lane-changing learner is trained through the updated sample data until the current lane-changing learner converges to obtain the target lane-changing learner.
  • the state information of the current vehicle and the state information of surrounding vehicles associated with the current vehicle are input into the target lane-changing learner to obtain the lane-changing strategy of the current vehicle.
  • the generation of the target lane change learner may include: inputting the state information of the current vehicle and the state information of surrounding vehicles into the neural-network-based current lane-changing learner π_θ(u_t | o_t), where o_t denotes the observed state at step t and u_t the control output;
  • when the deviation between the learner's output and the trajectory optimizer's output is smaller than the preset threshold, the current lane change policy learner is the target lane change learner; when the deviation is determined to be greater than the preset threshold,
  • the current path data pair is generated from the data input to the trajectory optimizer and its output, that is, the state information of the current vehicle and of the surrounding vehicles input to the trajectory optimizer together with
  • the trajectory optimization control quantity output by the trajectory optimizer; the original path data set D ∪ {(o_1, u_1), ..., (o_N, u_N)} is updated accordingly, and the training set is updated with the updated original path data set;
  • t denotes the t-th training iteration and is a positive integer.
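  • The online data aggregation described above resembles a DAgger-style imitation scheme. The sketch below uses assumed interfaces (learner.predict, trajectory_optimizer.solve, retrain) and a single aggregate deviation test in place of the separate acceleration and front-wheel-angle thresholds, so it illustrates the flow only and is not the patent's implementation.

```python
import numpy as np

def generate_target_learner(learner, trajectory_optimizer, dataset, observations, threshold, retrain):
    """Label states where the learner's output deviates too much from the optimizer's output,
    then retrain on the enlarged path data set until the learner converges."""
    converged = False
    while not converged:
        converged = True
        for o_t in observations:                      # normalized ego + surrounding-vehicle states
            u_learn = learner.predict(o_t)            # trajectory learning control (accel, wheel angle)
            u_opt = trajectory_optimizer.solve(o_t)   # trajectory optimization control (targets)
            if np.max(np.abs(np.asarray(u_learn) - np.asarray(u_opt))) > threshold:
                dataset.append((o_t, u_opt))          # current path data pair labeled by the optimizer
                converged = False
        if not converged:
            retrain(learner, dataset)                 # update the training set and retrain the learner
    return learner                                    # converged learner serves as the target learner
```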
  • the main parameters of the Adam algorithm are: an initial learning rate of 0.001;
  • the exponential decay rate of the first-order moment estimation is 0.9
  • the exponential decay rate of the second-order moment estimation is 0.999
  • the number of training epochs of the deep neural network model (one epoch being one full pass of the complete data set through the neural network and back) is 100;
  • the batch size is 32.
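  • Combining the hyperparameters listed above with the MAE loss mentioned earlier, the training call might look like the following sketch, which continues the build_lane_change_learner example given earlier; the placeholder training arrays and the validation split are assumptions for illustration only.

```python
import numpy as np
from tensorflow import keras

model = build_lane_change_learner()                        # network sketched earlier
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001,   # initial learning rate
                                    beta_1=0.9,            # first-moment exponential decay rate
                                    beta_2=0.999),         # second-moment exponential decay rate
    loss="mae",                                            # mean absolute error loss
)
# Placeholder arrays standing in for (normalized state, optimizer-labeled control) pairs.
X_train = np.random.rand(1024, 18).astype("float32")
y_train = np.random.uniform(-1.0, 1.0, size=(1024, 2)).astype("float32")
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1)
```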
  • Figure 10 shows the loss function curves on the training set and verification set in the current lane change learner training process.
  • As shown in Figure 10, the current lane change learner essentially converges after about 100 training epochs.
  • Figures 11 and 12 show that, after the current lane change learner is trained with the above parameters, its mean absolute errors are both around 5×10⁻⁴ and its mean absolute percentage error is 25%.
  • Figure 13 shows the comparison between the original path and the path obtained by the decision of the target lane-changing learner in the online simulation. It can be seen that the deviation between the path obtained by the decision of the target lane-changing learner and the original path is small.
  • Figure 14 compares the original acceleration and front wheel angle with the acceleration and front wheel angle obtained through the decisions of the target lane change learner in the online simulation. It can be seen that the acceleration and front wheel angle obtained by the target lane change learner remain within the required constraints, showing that the lane-changing learner can achieve safe, smooth, and efficient lane-changing decisions.
  • the lane changing strategy generation method has at least the following advantages: on the one hand, not only the state information of the current vehicle but also the state information of the surrounding vehicles associated with it is considered, which improves the completeness of the environment information; furthermore, the current lane-changing learner is obtained by training a deep neural network model on the original path data set, and the lane-changing strategy of the current vehicle is obtained through this learner, which does not rely on prior knowledge and keeps the modeling cost low,
  • thereby improving the efficiency of lane changing strategy generation; on the other hand, after the trajectory learning control quantity of the current vehicle is obtained through the current lane changing strategy learner, it is compared with the trajectory optimization control quantity obtained from the trajectory optimizer, the current path data pair is generated based on the comparison deviation, the original path data set is updated with the current path data pair, and the current lane-changing learner is retrained on the updated original path data set; when the current lane-changing learner converges, the target lane-changing learner is obtained, and the lane-changing strategy of the current vehicle is generated through it, which improves the accuracy of lane-changing strategy generation.
  • the exemplary embodiment of the present disclosure also provides a lane-changing strategy generating device. As shown in FIG. 15, the device includes an input data acquisition module 1510, an output data acquisition module 1520, a current path data pair generation module 1530, and a lane-changing strategy generation module 1540, wherein:
  • the input data obtaining module 1510 is used to obtain the state information of the current vehicle and the state information of surrounding vehicles associated with the current vehicle;
  • the output data acquisition module 1520 is configured to input the state information of the current vehicle and the state information of the surrounding vehicles into the current lane change learner and the preset trajectory optimizer to obtain the trajectory learning control amount and the trajectory optimization control amount ;
  • the current path data pair generation module 1530 is configured to generate a current path data pair based on the trajectory optimization control amount when determining that the deviation between the trajectory learning control amount and the trajectory optimization control amount is greater than a preset threshold;
  • the lane-changing policy generation module 1540 is configured to train the current lane-changing learner with the current path data pair to obtain a target lane-changing learner.
  • steps of the methods of the present disclosure are depicted in the drawings in a particular order, there is no requirement or implication that the steps must be performed in that particular order, or that all illustrated steps must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution, etc.
  • an electronic device capable of implementing the above method is also provided.
  • FIG. 16 An electronic device 1600 according to this embodiment of the present disclosure is described below with reference to FIG. 16 .
  • the electronic device 1600 shown in FIG. 16 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • electronic device 1600 takes the form of a general-purpose computing device.
  • the components of the electronic device 1600 may include, but are not limited to: at least one processing unit 1610, at least one storage unit 1620, a bus 1630 connecting different system components (including the storage unit 1620 and the processing unit 1610), and a display unit 1640.
  • the storage unit stores program codes, and the program codes can be executed by the processing unit 1610, so that the processing unit 1610 executes the steps of the various exemplary method implementations according to the present disclosure described in the "Exemplary Methods" section of this specification.
  • the processing unit 1610 may execute step S210 as shown in FIG. 2: obtaining the state information of the current vehicle and the state information of surrounding vehicles associated with the current vehicle.
  • the storage unit 1620 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 16201 and/or a cache storage unit 16202 , and may further include a read-only storage unit (ROM) 16203 .
  • RAM random access storage unit
  • ROM read-only storage unit
  • Storage unit 1620 may also include a program/utility 16204 having a set (at least one) of program modules 16205, such program modules 16205 including but not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a networked environment.
  • Bus 1630 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
  • the electronic device 1600 can also communicate with one or more external devices 1700 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 1600, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 1600 to communicate with one or more other computing devices. Such communication may occur through the input/output (I/O) interface 1650.
  • the electronic device 1600 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through the network adapter 1660 . As shown, network adapter 1660 communicates with other modules of electronic device 1600 via bus 1630 .
  • other hardware and/or software modules may be used in conjunction with electronic device 1600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, etc.
  • the example implementations described here can be implemented by software, or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of the present disclosure can be embodied in the form of software products, and a software product can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present disclosure.
  • a computing device which may be a personal computer, a server, a terminal device, or a network device, etc.
  • a computer-readable storage medium on which a program product capable of implementing the above-mentioned method in this specification is stored.
  • various aspects of the present disclosure may also be implemented in the form of a program product, which includes program code, and when the program product is run on a terminal device, the program code is used to make the The terminal device executes the steps according to various exemplary embodiments of the present disclosure described in the "Exemplary Method" section above in this specification.
  • a program product for implementing the above method according to the embodiment of the present disclosure, it may adopt a portable compact disk read only memory (CD-ROM) and include program codes, and may run on a terminal device such as a personal computer.
  • CD-ROM compact disk read only memory
  • the program product of the present disclosure is not limited thereto.
  • a readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus or device.
  • the program product may reside on any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer readable signal medium may include a data signal carrying readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a readable signal medium may also be any readable medium other than a readable storage medium that can transmit, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, etc., as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • Internet service provider

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

A lane-changing strategy generation method and apparatus, a computer storage medium, and an electronic device, relating to the technical field of intelligent driving. The method includes: acquiring state information of a current vehicle and state information of surrounding vehicles associated with the current vehicle (S210); inputting the state information of the current vehicle and the state information of the surrounding vehicles into a current lane-change learner and a preset trajectory optimizer to obtain a trajectory learning control quantity and a trajectory optimization control quantity (S220); when it is determined that the deviation between the trajectory learning control quantity and the trajectory optimization control quantity is greater than a preset threshold, generating a current path data pair based on the trajectory optimization control quantity (S230); and training the current lane-change learner with the current path data pair to obtain a target lane-change learner, and obtaining a lane-changing strategy of the current vehicle through the target lane-change learner (S240). The method improves the efficiency and accuracy of lane-changing strategy generation.

Description

Lane-changing strategy generation method and apparatus, computer storage medium, and electronic device
Cross-reference to related applications
The present disclosure claims priority to the Chinese patent disclosure with publication number 202111354329.4, entitled "Lane-changing strategy generation method and apparatus, computer storage medium, and electronic device" and filed on November 12, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of the present disclosure relate to the technical field of intelligent driving, and in particular to a lane-changing strategy generation method, a lane-changing strategy generation apparatus, a computer storage medium, and an electronic device.
Background
With the rapid development of the global economy and the automotive industry, the number of motor vehicles is climbing year by year, and the resulting traffic accidents pose a huge threat to people's lives and property. People, vehicles, and the environment are the three main factors in traffic accidents. A driver's perception, cognition, decision-making, and control abilities have limitations, dynamic uncertainty, and openness, and are easily affected by internal and external factors such as the driver's own state and the road environment, so it is difficult to guarantee the consistency and stability of driving safety.
In recent years, smart cars, relying on accurate positioning and sensing devices, can obtain information about people, vehicles, and the road environment in real time, and possess capabilities such as scene cognition, risk assessment, intelligent decision-making, and control in complex environments; they are expected to become an important way to reduce traffic accidents in the future.
In smart cars, decision-making and path planning are the core of intelligent driving. Traditional decision-making approaches are rule-based expert systems, such as finite state machines and decision tree models. However, expert systems rely on prior knowledge, have high modeling costs, poor scalability, and insufficient scene generalization ability, and find it difficult to adapt to complex and changeable driving conditions. Path planning usually adopts a local path planning algorithm based on a receding horizon, which can plan safe and efficient collision-free paths in dynamic and changeable environments; however, its computational load is relatively large, which limits its practical application.
Summary
According to one aspect of the present disclosure, a lane-changing strategy generation method is provided, including:
acquiring state information of a current vehicle and state information of surrounding vehicles associated with the current vehicle;
inputting the state information of the current vehicle and the state information of the surrounding vehicles into a current lane-change learner and a preset trajectory optimizer to obtain a trajectory learning control quantity and a trajectory optimization control quantity;
when it is determined that the deviation between the trajectory learning control quantity and the trajectory optimization control quantity is greater than a preset threshold, generating a current path data pair based on the trajectory optimization control quantity;
training the current lane-change learner with the current path data pair to obtain a target lane-change learner, and obtaining a lane-changing strategy of the current vehicle through the target lane-change learner.
In an exemplary embodiment of the present disclosure, acquiring the state information of the current vehicle and the state information of the surrounding vehicles associated with the current vehicle includes:
acquiring the driving speed and heading angle of the current vehicle and its lateral distance to the target lane;
determining the surrounding vehicles associated with the current vehicle, and acquiring the state information of the surrounding vehicles.
In an exemplary embodiment of the present disclosure, determining the surrounding vehicles associated with the current vehicle and acquiring the state information of the surrounding vehicles includes:
acquiring the target lane and the current vehicle's own lane;
determining the vehicle ahead of the current vehicle in its own lane, the vehicle ahead of the current vehicle in the target lane, and the vehicle behind the current vehicle in the target lane;
acquiring the longitudinal relative speed, the longitudinal and lateral relative distances, the reciprocal of the time to collision, and the time headway of the preceding vehicle in the own lane, the preceding vehicle in the target lane, and the following vehicle in the target lane.
In an exemplary embodiment of the present disclosure, inputting the state information of the current vehicle and the state information of the surrounding vehicles into the current lane-change learner and the preset trajectory optimizer includes:
normalizing the state information of the current vehicle and the state information of the surrounding vehicles to obtain normalized state information of the current vehicle and normalized state information of the surrounding vehicles;
inputting the normalized state information of the current vehicle and the normalized state information of the surrounding vehicles into the current lane-change learner and the preset trajectory optimizer.
In an exemplary embodiment of the present disclosure, the trajectory learning control quantity includes the acceleration and front wheel angle of the current vehicle, and the trajectory optimization control quantity includes a target acceleration and a target front wheel angle.
In an exemplary embodiment of the present disclosure, generating the current path data pair based on the trajectory optimization control quantity when it is determined that the deviation between the trajectory learning control quantity and the trajectory optimization control quantity is greater than the preset threshold includes:
acquiring the deviation between the trajectory learning control quantity and the trajectory optimization control quantity;
when it is determined that the deviation is smaller than the preset threshold, the current lane-change learner has converged, and the current lane-change learner is used as the target lane-change learner;
when it is determined that the deviation is greater than the preset threshold, generating the current path data pair from the trajectory optimization control quantity, the state information of the current vehicle, and the state information of the surrounding vehicles.
In an exemplary embodiment of the present disclosure, training the current lane-change learner with the current path data pair to obtain the target lane-change learner includes:
acquiring an original path data set, where the original path data set includes a plurality of path data pairs;
adding the current path data pair to the original path data set and updating the original path data set to obtain a target path data set;
training the current lane-change learner with the path data pairs included in the target path data set until the current lane-change learner converges, to obtain the target lane-change learner.
According to one aspect of the present disclosure, a lane-changing strategy generation apparatus is provided, including:
an input data acquisition module, configured to acquire state information of a current vehicle and state information of surrounding vehicles associated with the current vehicle;
an output data acquisition module, configured to input the state information of the current vehicle and the state information of the surrounding vehicles into a current lane-change learner and a preset trajectory optimizer to obtain a trajectory learning control quantity and a trajectory optimization control quantity;
a current path data pair generation module, configured to generate a current path data pair based on the trajectory optimization control quantity when it is determined that the deviation between the trajectory learning control quantity and the trajectory optimization control quantity is greater than a preset threshold;
a lane-changing strategy generation module, configured to train the current lane-change learner with the current path data pair to obtain a target lane-change learner.
According to one aspect of the present disclosure, a computer storage medium is provided, on which a computer program is stored, and when the computer program is executed by a processor, the lane-changing strategy generation method described in any one of the above exemplary embodiments is implemented.
According to one aspect of the present disclosure, an electronic device is provided, including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the lane-changing strategy generation method described in any one of the above exemplary embodiments by executing the executable instructions.
Brief description of the drawings
The accompanying drawings here are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and are used together with the specification to explain the principles of the present disclosure. Obviously, the drawings described below show only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 schematically shows a system architecture block diagram of an application scenario of a lane-changing strategy generation method according to an exemplary embodiment of the present disclosure.
Fig. 2 schematically shows a flowchart of a lane-changing strategy generation method according to an exemplary embodiment of the present disclosure.
Fig. 3 schematically shows a block diagram of a lane-changing strategy generation system according to an exemplary embodiment of the present disclosure.
Fig. 4 schematically shows a flowchart of a method for acquiring state information of a current vehicle and state information of surrounding vehicles according to an exemplary embodiment of the present disclosure.
Fig. 5 schematically shows a flowchart of a method for determining surrounding vehicles associated with a current vehicle and acquiring state information of the surrounding vehicles according to an exemplary embodiment of the present disclosure.
Fig. 6 schematically shows a structural block diagram of a current lane-change learner according to an exemplary embodiment of the present disclosure.
Fig. 7 schematically shows a flowchart of a method for inputting state information of a current vehicle and state information of surrounding vehicles into a current lane-change learner and a preset trajectory optimizer according to an exemplary embodiment of the present disclosure.
Fig. 8 schematically shows a flowchart of a method for training a current lane-change learner with a current path data pair to obtain a target lane-change learner according to an exemplary embodiment of the present disclosure.
Fig. 9 schematically shows a flowchart of a method for generating a target lane-change learner according to an exemplary embodiment of the present disclosure.
Fig. 10 schematically shows a comparison of loss function curves during training of a current lane-change learner according to an exemplary embodiment of the present disclosure.
Fig. 11 schematically shows the relationship between the number of training iterations and the mean absolute error during training of a current lane-change learner according to an exemplary embodiment of the present disclosure.
Fig. 12 schematically shows the relationship between the number of training iterations and the mean absolute percentage error during training of a current lane-change learner according to an exemplary embodiment of the present disclosure.
Fig. 13 schematically shows a comparison between an original path and the lane-changing strategy of a target lane-change learner during an online simulation according to an exemplary embodiment of the present disclosure.
Fig. 14 schematically shows a comparison between original control quantities and the acceleration and front wheel angle of a target lane-change learner during an online simulation according to an exemplary embodiment of the present disclosure.
Fig. 15 schematically shows a block diagram of a lane-changing strategy generation apparatus according to an exemplary embodiment of the present disclosure.
Fig. 16 schematically shows an electronic device for implementing the above lane-changing strategy generation method according to an exemplary embodiment of the present disclosure.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced with one or more of the specific details omitted, or other methods, components, devices, steps, etc. may be adopted. In other cases, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present disclosure.
In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Scene cognition in mixed traffic scenarios requires smart cars to further process, store, and extract environmental perception information, understand the composition and interaction of the various traffic elements, and explain the behavior of road users and how it evolves; scene cognition is therefore a key link in intelligent driving. Traditional scene cognition is usually built on deductive logic and semantic description, and it is difficult to accurately model dynamic and changeable traffic objects and their behavior. In particular, Vulnerable Road Users (VRUs) are of many types and large in number, have weak traffic safety awareness, and behave with a high degree of dynamic uncertainty, so existing behavioral cognition models struggle to describe the behavioral characteristics of VRUs, which poses a great challenge to autonomous driving in mixed traffic environments. With the development of technology in recent years, decision frameworks based on Deep Reinforcement Learning have become an important research direction for smart-car decision-making and planning: the driving decision is regarded as the outcome of a game between the smart car and its surrounding environment, a value network is built to evaluate the driving risk from the present to a future prediction horizon, and a policy network is then built to output the vehicle's control decisions, so that the framework can adapt to dynamic and changeable driving scenes. However, decision framework algorithms based on deep reinforcement learning have a large search space and converge slowly, making it difficult to meet real-time requirements.
基于上述一个或者多个问题，首先，参考图1，图1示出了可以应用本公开实施例的一种换道策略生成方法及装置的示例性应用场景的系统架构的示意框图。
如图1所示，系统架构100可以包括终端设备101、102中的一个或多个，网络103和服务器104。网络103用以在终端设备101、102和服务器104之间提供通信链路的介质。网络103可以包括各种连接类型，例如有线、无线通信链路或者光纤电缆等等。终端设备101、102可以是智能驾驶车，包括但不限于智能驾驶汽车、智能驾驶公交等。应该理解，图1中的终端设备、网络和服务器的数目仅仅是示意性的。根据实现需要，可以具有任意数目的终端设备、网络和服务器。比如服务器104可以是多个服务器组成的服务器集群等。
本公开实施例所提供的换道策略生成方法一般由服务器104执行,相应地,换道策略生成装置一般设置于服务器104中。但本领域技术人员容易理解的是,本公开实施例所提供的换道策略生成方法也可以由终端设备101、102执行,相应的,换道策略生成装置也可以设置于终端设备101、102中,本示例性实施例中对此不做特殊限定。举例而言,在一种示例性实施例中,可以是通过终端设备101、102将当前车辆状态信息以及与当前车辆关联的周围车辆状态信息上传至服务器104,服务器通过本公开实施例所提供的换道策略生成方法生成目标换道学习器,通过目标换道学习器生成当前车辆的换道策略,并将生成的当前车辆换道策略传输给终端设备101、102等以使终端设备101、102根据接收到的当前车辆换道策略执行对应的换道决策。
图2示出了换道策略生成方法的流程示意图,参考图2所示,该换道策略生成方法可以包括以下步骤:
步骤S210.获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息;
步骤S220.将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,得到轨迹学习控制量以及轨迹优化控制量;
步骤S230.在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时,基于所述轨迹优化控制量生成当前路径数据对;
步骤S240.通过所述当前路径数据对对所述当前换道学习器进行训练，得到目标换道学习器，通过所述目标换道学习器得到所述当前车辆的换道策略。
上述换道策略生成方法,获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息;将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,得到轨迹学习控制量以及轨迹优化控制量;在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时,基于所述轨迹优化控制量生成当前路径数据对;通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器,通过所述目标换道学习器得到所述当前车辆的换道策略;一方面,将获取到的当前车辆状态信息以及当前车辆关联的周围车辆状态信息输入至当前换道学习器以及轨迹优化器中,得到轨迹学习控制量以及轨迹优化控制量,计算轨迹学习控制量与轨迹优化控制量之间的偏差,当偏差大于预设阈值时,基于轨迹优化控制量生成当前路径数据对,通过当前路径数据对对换道学习器进行训练,得到目标换道学习器,并通过目标换道学习器生成当前车辆的换道策略,解决了现有技术中通过专家***在进行决策时,其需要依赖先验知识且建模成本高、可拓展性差、场景泛化能力不足的问题,提高了换道策略生成的效率;另一方面,通过当前换道学习器得到当前车辆的轨迹学习控制量之后,计算该轨迹学习控制量的偏差,在确定偏差大于预设阈值时,生成当前路径数据对并通过当前路径数据对对当前换道学习器进行训练,得到目标换道学习器,避免了当前换道学习器输出的轨迹学习控制量存在较大的误差,而导致当前车辆的路径发生较大偏移,提高了生成换道策略的准确度。
以下,对本公开示例实施例的换道策略生成方法中涉及的各步骤进行详细的解释以及说明。
首先,对本公开示例实施例的应用场景以及发明目的进行解释以及说明。具体的,本公开示例实施例可以应用于智能驾驶中,主要研究如何根据当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息,生成当前车辆的换道策略,提高当前车辆换道策略生成的效率以及准确度。
本公开以获取到的当前车辆状态信息以及与当前车辆关联的周围车辆状态信息为基础，将获取到的当前车辆状态信息以及与当前车辆关联的周围车辆状态信息输入至当前换道学习器以及预设的轨迹优化器中，通过当前换道学习器得到轨迹学习控制量，通过轨迹优化器得到轨迹优化控制量，其中，当前换道学习器是基于训练集训练得到的；其中，训练集中的数据为原始路径数据集中包括的数据；轨迹优化器是基于混合整数二次规划的优化器；当得到轨迹学习控制量以及轨迹优化控制量之后，根据轨迹学习控制量与轨迹优化控制量之间的偏差生成当前路径数据对，通过生成的当前路径数据对原始路径数据集进行更新，并通过更新后的原始路径数据集更新训练集，利用更新后的训练集对当前换道学习器进行训练，生成目标换道学习器，并通过目标换道学习器生成当前车辆的换道策略，提高了换道策略生成的效率以及准确度。
其次，对本公开示例实施例中涉及到的换道策略生成系统进行解释以及说明。参考图3所示，该换道策略生成系统可以包括状态信息收集模块310、当前换道学习器320、轨迹优化器330以及目标换道学习器340。其中，状态信息收集模块310，用于收集当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息，并对收集到的当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息进行归一化处理，得到归一化的当前车辆的状态信息以及归一化的周围车辆的状态信息；当前换道学习器320，与状态信息收集模块310网络连接，用于获取状态信息收集模块310中收集的当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息，将获取到的当前车辆的状态信息以及周围车辆的状态信息输入至当前换道学习器中，得到当前车辆的轨迹学习控制量，其中，当前车辆的轨迹学习控制量中包括当前车辆的加速度以及前轮转角；轨迹优化器330，与状态信息收集模块310网络连接，用于获取状态信息收集模块310中收集的当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息，将获取到的当前车辆的状态信息以及周围车辆的状态信息输入至轨迹优化器330中，得到当前车辆的轨迹优化控制量，其中轨迹优化控制量中包括当前车辆的目标加速度和目标前轮转角；目标换道学习器340，与当前换道学习器320以及轨迹优化器330网络连接，用于计算轨迹学习控制量与轨迹优化控制量之间的偏差，在确定偏差大于预设阈值时，根据输入至轨迹优化器中的当前车辆的状态信息、周围车辆的状态信息以及轨迹优化器输出的轨迹优化控制量生成当前路径数据对，通过当前路径数据对对原始路径数据集进行更新，并通过更新后的原始路径数据集对训练集进行更新，通过更新后的训练集对当前换道学习器进行训练，直至该当前换道学习器收敛，得到目标换道学习器，并通过该目标换道学习器得到当前车辆的换道策略。
以下,将结合图3对步骤S210-步骤S240进行详细的解释以及说明。
在步骤S210中,获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息。
在本示例实施例中,表1所示的状态信息为当前车辆换道决策时需要考虑的状态信息,对于当前车辆,由于可以通过车速表感知到当前车辆速度的变化,也可以根据外接参考物的位置感知自车的横向位置和纵向位置的变化,因此,可以选择当前车辆的绝对运动作为当前车辆的状态信息,包含:当前车辆的行驶速度、航向角以及当前车辆与目标车道的横向距离;与当前车辆关联的周围车辆包括:当前车辆的自车道的前车、目标车道的前车以及目标车道的后车;对于与当前车辆关联的周围车辆,当前车辆换道时考虑的是当前车辆与周围车辆的相对运动的变化,而不是绝对运动变化,因此,可以选择周围车辆的纵向相对速度、纵向和横向相对距离,此外,避撞时间倒数(inverse of Time to Collision,TTCi)可以反映驾驶人对行车风险的感知特性,对当前车辆换道决策影响显著,因此避撞时间倒数也可以作为周围车辆的状态信息;同时,跟车时距(Time Headway,THW)作为跟车特性的重要衡量指标,也可以作为周围车辆的状态信息,因此,周围车辆的状态信息包括:纵向相对速度、纵向和横向相对距离、避撞时间倒数以及跟车时距。
表1 当前车辆的状态信息以及周围车辆的状态信息
当前车辆：行驶速度、航向角、与目标车道的横向距离；
自车道前车、目标车道前车、目标车道后车：纵向相对速度、纵向相对距离、横向相对距离、避撞时间倒数（TTCi）、跟车时距（THW）。
在本示例实施例中,参考图4所示,获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息,可以包括步骤S410以及步骤S420:
步骤S410.获取当前车辆的行驶速度、航向角以及与目标车道的横向距离。
在本示例实施例中,在获取当前车辆的状态信息时,当前车辆的状态信息可以包括:当前车辆的行驶速度、当前车辆的航向角以及当前车辆与目标车道的横向距离;其中,当前车辆的航向角为基于地面坐标系,当前车辆的质心速度与横轴的夹角。
步骤S420.确定与所述当前车辆关联的周围车辆,获取所述周围车辆的状态信息。
在本示例实施例中,参考图5所示,确定与所述当前车辆关联的周围车辆,获取所述周围车辆的状态信息,可以包括步骤S510-步骤S530:
步骤S510.获取目标车道、所述当前车辆的自车道;
步骤S520.确定所述当前车辆的自车道的前车、目标车道中相对于当前车辆的目标车道前车以及目标车道后车;
步骤S530.获取所述自车道前车、所述目标车道前车以及所述目标车道后车的纵向相对速度、纵向和横向相对距离、避撞时间倒数以及跟车时距。
以下,将对步骤S510-步骤S530进行解释以及说明。具体的,首先,获取当前车辆的自车道以及当前车辆想要换道的目标车道;然后,获取自车道中当前车辆的前车,目标车道中相对于当前车辆的目标车道前车以及目标车道后车;最后,获取自车道前车、目标车道前车以及目标车道后车的纵向相对速度、纵向和横向相对距离、避撞时间倒数以及跟车时距,并将获取到的纵向相对速度、纵向和横向相对距离、避撞时间倒数以及跟车时距作为与当前车辆关联的周围车辆的状态信息。
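为便于理解上述周围车辆状态信息的获取过程，下面给出一个示意性的Python代码草图。需要说明的是，这并非本公开的原始实现：其中的字段名、函数名以及TTCi与THW的具体计算方式均为按常见定义作出的示例性假设。

```python
def surrounding_vehicle_features(ego, other):
    """计算某一周围车辆相对于当前车辆的状态信息（示意性实现）。

    ego、other为包含纵向坐标x、横向坐标y与纵向速度v的字典（字段名为示例性假设）。
    """
    dx = other["x"] - ego["x"]          # 纵向相对距离
    dy = other["y"] - ego["y"]          # 横向相对距离
    dv = other["v"] - ego["v"]          # 纵向相对速度

    # 避撞时间倒数TTCi：此处按常见定义取相对接近速度与纵向距离之比（示例性假设）
    closing = ego["v"] - other["v"]
    ttci = closing / dx if abs(dx) > 1e-3 else 0.0

    # 跟车时距THW：此处按常见定义取纵向距离与自车速度之比（示例性假设）
    thw = dx / ego["v"] if ego["v"] > 1e-3 else float("inf")

    return [dv, dx, dy, ttci, thw]

# 用法示意：对自车道前车、目标车道前车、目标车道后车分别调用上述函数，
# 再与当前车辆的行驶速度、航向角、与目标车道的横向距离拼接，即可得到18维状态向量。
```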
在步骤S220中,将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,得到轨迹学习控制量以及轨迹优化控制量。
在本示例实施例中，当前换道学习器是通过训练集对深度神经网络训练得到的，其中，训练集中的数据为原始路径数据集中包括的数据，原始路径数据集中包括由输入至轨迹优化器中的状态信息与该轨迹优化器输出的轨迹优化控制量构成的路径数据对；当输入至当前换道学习器中的特征为当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息时，该当前换道学习器输出的特征为当前车辆的加速度以及前轮转角。预设的轨迹优化器是基于混合整数二次规划的优化器，当输入至轨迹优化器中的特征为当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息时，该轨迹优化器输出的特征为当前车辆的目标加速度以及目标前轮转角；轨迹学习控制量中包括当前车辆的加速度以及当前车辆的前轮转角；轨迹优化控制量中包括当前车辆的目标加速度以及当前车辆的目标前轮转角。
具体的，参考图6所示，可以选择keras搭建深度神经网络模型，其中，Keras是由Python编写的开源人工神经网络库，网络模型包含输入层、隐藏层和输出层，均采用全连接层；其中，输入的特征维度为18，输入层包括256个神经单元，激活函数可以为ReLU；隐藏层包含128个神经单元，激活函数为ReLU；输出层包含2个神经元，分别对应输出的加速度以及前轮转角，激活函数为tanh函数。深度神经网络模型训练的优化器可以选择Adam算法，作为随机梯度下降算法的扩展，其可以通过计算梯度的矩估计为不同的参数设计独立的自适应学习率，计算效率高且占用内存低。
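按照上述网络结构与优化器参数，下面给出一个可运行的Keras模型搭建草图。该草图仅为依据本文描述的示例性实现，这里采用tensorflow.keras作为后端、函数名build_lane_change_learner均为假设，并非本公开限定的写法。

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lane_change_learner(input_dim: int = 18) -> keras.Model:
    """按文中所述结构搭建换道学习器网络（示意性实现）。"""
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),        # 输入特征维度为18
        layers.Dense(256, activation="relu"),   # 输入层：256个神经单元，ReLU
        layers.Dense(128, activation="relu"),   # 隐藏层：128个神经单元，ReLU
        layers.Dense(2, activation="tanh"),     # 输出层：加速度与前轮转角，tanh
    ])
    # Adam：初始学习率0.001，一阶/二阶矩估计的指数衰减率分别为0.9、0.999
    optimizer = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
    model.compile(optimizer=optimizer, loss="mae")  # 平均绝对误差作为损失函数
    return model
```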
在本示例实施例中,参考图7所示,将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,可以包括步骤S710以及步骤S720:
步骤S710.对所述当前车辆的状态信息以及所述周围车辆的状态信息进行归一化处理,得到归一化的当前车辆的状态信息以及归一化的周围车辆的状态信息;
步骤S720.将所述归一化的当前车辆的状态信息以及所述归一化的周围车辆的状态信息输入至所述当前换道学习器以及所述预设的轨迹优化器中。
以下,将对步骤S710、步骤S720进行解释以及说明。具体的,由于输入至当前换道学习器中的当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息的计量单位不同,直接将获取到的当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息输入至当前换道学习器中难以学习到有效的特征,因此,需要对获取到的当前车辆的状态信息以及周围车辆的状态信息进行归一化处理,得到归一化的当前车辆的状态信息以及归一化的周围车辆的状态信息;其中,在进行归一化处理时,可以根据最大值以及最小值对当前车辆的状态信息以及周围车辆的状态信息进行归一化处理,对于状态信息中的任一特征,可参考表达式(1)对该特征进行归一化处理。
$$\hat{x}_{i}^{j}=\frac{x_{i}^{j}-x_{\min}^{j}}{x_{\max}^{j}-x_{\min}^{j}}\qquad(1)$$

其中，$\hat{x}_{i}^{j}$表示归一化后的状态信息，$x_{i}^{j}$表示第i个样本的第j个状态信息，$x_{\min}^{j}$表示状态信息j对应的样本中的最小值，$x_{\max}^{j}$表示状态信息j对应的样本中的最大值。
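对应表达式(1)的归一化过程可以用如下示意性代码实现。该代码仅为按上述公式整理的草图，其中逐特征统计最大值、最小值以及对零跨度的数值保护均为示例性处理方式。

```python
import numpy as np

def min_max_normalize(samples: np.ndarray):
    """按表达式(1)对状态信息做逐特征的最大-最小归一化（示意性实现）。

    samples: 形状为(样本数, 特征数)的状态信息矩阵。
    返回归一化后的矩阵以及可用于在线归一化的(min, max)统计量。
    """
    x_min = samples.min(axis=0)
    x_max = samples.max(axis=0)
    span = np.where(x_max - x_min < 1e-9, 1.0, x_max - x_min)  # 避免除零（示例性处理）
    return (samples - x_min) / span, (x_min, x_max)
```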
在本示例实施例中,当将归一化的当前车辆的状态信息以及归一化的周围车辆的状态信息输入至当前换道学习器以及轨迹优化器中,输出的特征分别为当前车辆的加速度、当前车辆的前轮转角以及当前车辆的目标加速度、当前车辆的目标前轮转角。
在步骤S230中,在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时,基于所述轨迹优化控制量生成当前路径数据对。
在本示例实施例中,当当前换道学习器输出的轨迹学习控制量与最佳决策之间存在较大偏差时,该偏差将导致当前车辆的路径发生较大偏移,为解决这个问题,在对当前换道学习器训练过程中,对当前换道学习器中输出的可能产生偏差的轨迹学习控制量进行采样标记,并且在线对原始路径数据集进行更新,并通过更新后的路径数据集对当前换道学习器进行训练,以减少当前换道学习器输出的轨迹学习控制量与最佳决策之间的偏差。
在本示例实施例中,可以通过计算轨迹学习控制量中的当前车辆的加 速度与轨迹优化控制量中的当前车辆的目标加速度之间的第一偏差,以及计算轨迹学习控制量中的当前车辆的前轮转角与轨迹优化控制量中的当前车辆的目标前轮转角之间的第二偏差,当第一偏差和/或第二偏差大于预设阈值,其中,预设阈值中包括第一预设阈值以及第二预设阈值,即,当第一偏差大于第一预设阈值和/或第二偏差大于第二预设阈值时,通过轨迹优化控制量以及与该轨迹优化控制量对应的当前车辆的状态信息、周围车辆的状态信息生成当前路径数据对;当第一偏差小于第一预设阈值以及第二偏差小于第二预设阈值时,可以认为当前策略学习器收敛,即,当前策略学习器可以为目标换道学习器。
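上述偏差判断与当前路径数据对的生成逻辑可以整理为如下示意性代码草图，其中函数名check_and_collect以及第一、第二预设阈值的具体数值均为示例性假设，并非本公开限定的取值。

```python
def check_and_collect(o_t, learner_u, optimizer_u,
                      acc_threshold=0.5, steer_threshold=0.05):
    """比较两组控制量的偏差，必要时生成当前路径数据对（示意性实现，阈值为假设值）。

    o_t:         当前车辆与周围车辆的（归一化）状态信息
    learner_u:   轨迹学习控制量(加速度, 前轮转角)
    optimizer_u: 轨迹优化控制量(目标加速度, 目标前轮转角)
    """
    acc_dev = abs(learner_u[0] - optimizer_u[0])     # 第一偏差：加速度
    steer_dev = abs(learner_u[1] - optimizer_u[1])   # 第二偏差：前轮转角

    if acc_dev > acc_threshold or steer_dev > steer_threshold:
        # 偏差大于预设阈值：用状态信息与轨迹优化控制量生成当前路径数据对
        return (o_t, optimizer_u), False
    # 两个偏差均小于预设阈值：认为当前换道学习器收敛
    return None, True
```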
在当前换道学习器的训练过程中，可以选择平均绝对误差（Mean Absolute Error，MAE）作为损失函数，以更好地反映、控制输出误差，其中，损失函数可以参考表达式(2)：
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_{i}-y_{i}\right|\qquad(2)$$

其中，n为预测点的数量，$\hat{y}_{i}$为第i个预测点的预测值，$y_{i}$为第i个预测点的真实值。
在步骤S240中,通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器,通过所述目标换道学习器得到所述当前车辆的换道策略。
在本示例实施例中,参考图8所示,通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器,可以包括步骤S810-步骤S830:
步骤S810.获取原始路径数据集,其中,所述原始路径数据集中包括多个路径数据对;
步骤S820.将所述当前路径数据对添加至所述原始路径数据集中,对所述原始路径数据集进行更新,得到目标路径数据集;
步骤S830.通过所述目标路径数据集中包括的路径数据对对所述当前换道学习器进行训练，直至所述当前换道学习器收敛，得到所述目标换道学习器。
以下，将对步骤S810-步骤S830进行解释以及说明。具体的，获取原始样本数据中的原始路径数据集，其中，原始路径数据集中包括多个路径数据对；然后，将当前路径数据对添加至原始路径数据集中，对原始路径数据集进行更新，得到目标路径数据集，同时，通过目标路径数据集对训练样本进行更新，并通过更新的样本数据对当前换道学习器进行训练，直至当前换道学习器收敛，得到目标换道学习器。当得到目标换道学习器之后，将当前车辆的状态信息以及与当前车辆关联的周围车辆的状态信息输入至目标换道学习器中，得到当前车辆的换道策略。
进一步的，参考图9所示，目标换道学习器的生成可以包括：将当前车辆的状态信息以及周围车辆的状态信息o_t输入至基于神经网络的当前换道学习器π_θ(u_t|o_t)中以及基于混合整数二次规划的轨迹优化器中，分别得到当前车辆的轨迹学习控制量û_t以及当前车辆的轨迹优化控制量u_t*；对得到的当前车辆的轨迹学习控制量以及轨迹优化控制量进行对比，得到偏差，在确定偏差小于预设阈值时，当前换道策略学习器为目标换道学习器；在确定偏差大于预设阈值时，通过轨迹优化器中的数据生成当前路径数据对(o_t,u_t*)，对原始路径数据集进行更新，即，通过输入轨迹优化器中的当前车辆的状态信息、周围车辆的状态信息o_t以及轨迹优化器输出的轨迹优化控制量u_t*对原始路径数据集D={(o_1,u_1),…,(o_N,u_N)}进行更新，并通过更新后的原始路径数据集对训练集进行更新D←D∪{(o_t,u_t*)}，并通过更新后的训练集对基于深度神经网络的当前换道学习器进行训练，直至收敛，得到目标换道学习器π_θ*，其中，t为第t次训练，t为正整数。
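图9所示的在线更新与再训练流程可以整理为如下示意性的训练循环草图。其中仿真环境env及其rollout接口、轨迹优化器trajectory_optimizer的调用方式、轮次数max_rounds等均为示例性假设，check_and_collect为前文偏差判断草图中的函数，并非本公开限定的实现。

```python
import numpy as np

def train_target_learner(model, trajectory_optimizer, env, dataset,
                         max_rounds=50, epochs=100, batch_size=32):
    """在线聚合路径数据对并迭代训练当前换道学习器，直至收敛（示意性实现）。

    dataset:                  原始路径数据集，元素为(状态信息o, 控制量u)数据对
    trajectory_optimizer(o):  返回基于混合整数二次规划的轨迹优化控制量（接口为假设）
    env.rollout(model):       返回一次换道过程中的状态信息序列（接口为假设）
    """
    for _ in range(max_rounds):
        converged = True
        for o_t in env.rollout(model):
            learner_u = model.predict(o_t[None, :], verbose=0)[0]  # 轨迹学习控制量
            optimizer_u = trajectory_optimizer(o_t)                # 轨迹优化控制量
            pair, _ = check_and_collect(o_t, learner_u, optimizer_u)
            if pair is not None:
                dataset.append(pair)    # 更新原始路径数据集，得到目标路径数据集
                converged = False
        if converged:
            return model                # 偏差均小于阈值，当前学习器即目标换道学习器
        # 用更新后的数据集重新训练当前换道学习器
        X = np.array([o for o, _ in dataset])
        y = np.array([u for _, u in dataset])
        model.fit(X, y, epochs=epochs, batch_size=batch_size, verbose=0)
    return model
```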
在本示例实施例中，原始路径数据集中包括的路径数据对的数量为43550，训练集、验证集和测试集上样本比例分别为6:2:2，Adam算法的主要参数：初始学习率为0.001，一阶矩估计的指数衰减率为0.9，二阶矩估计的指数衰减率为0.999，深度神经网络模型训练的epoch（时期，当一个完整的数据集通过神经网络一次并且返回了一次，该过程为一个时期）为100次，批大小为32。图10示出了当前换道学习器训练过程中训练集和验证集上的损失函数曲线，从图10中可以得到，当前换道学习器在训练100次后基本收敛。图11以及图12示出了基于上述参数对当前换道学习器进行训练后，该当前换道学习器的绝对值误差均值为5×10⁻⁴，百分比绝对误差均值为25%。图13为在线仿真中原始路径与通过目标换道学习器得到的换道策略的对比图，可以看出，目标换道学习器能够做出平顺和高效的换道决策。
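上述训练配置（6:2:2的样本划分、epoch为100、批大小为32）可以按如下方式组织。该代码仅为示意性草图，其中sklearn的划分函数以及随机种子取值均为示例性假设。

```python
from sklearn.model_selection import train_test_split

def split_dataset(X, y, seed=42):
    """按6:2:2的比例划分训练集、验证集和测试集（示意性实现，随机种子为假设值）。"""
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# 训练示意（模型来自前文的搭建草图）：
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100, batch_size=32)
```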
本公开示例实施例提供的换道策略生成方法至少具有以下优点:一方面,不仅考虑当前车辆的状态信息,还考虑了与当前车辆关联的周围车辆的状态信息,提高了环境信息的完整程度;另一方面,通过原始路径数据集对深度神经网络模型进行训练得到当前换道学习器,并通过当前换道学习器得到当前车辆的换道策略,不需要依赖先验知识,建模成本低,提高了换道策略生成的效率;再一方面,当通过当前换道策略学习器得到当前车辆的轨迹学习控制量之后,对该轨迹学习控制量与轨迹优化器中得到的轨迹优化控制量进行对比,基于对比偏差生成的当前路径数据对,并通过当前路径数据对对原始路径数据集进行更新,通过更新后的原始路径数据集对当前换道学习器进行再次训练,在当前换道学习器收敛时,得到目标换道学习器,并通过目标换道学习器生成当前车辆的换道策略,提高了换道策略生成的准确率。
本公开示例实施例还提供了一种换道策略生成装置,参考图15所示,可以包括:输入数据获取模块1510、输出数据获取模块1520、当前路径数据对生成模块1530以及换道策略生成模块1540。其中:
输入数据获取模块1510,用于获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息;
输出数据获取模块1520,用于将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,得到轨迹学习控制量以及轨迹优化控制量;
当前路径数据对生成模块1530，在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时，基于所述轨迹优化控制量生成当前路径数据对；
换道策略生成模块1540,用于通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器。
上述换道策略生成装置中各模块的具体细节已经在对应的换道策略生成方法中进行了详细的描述,因此此处不再赘述。
在本公开的一种示例性实施例中,获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息,包括:
获取当前车辆的行驶速度、航向角以及与目标车道的横向距离;
确定与所述当前车辆关联的周围车辆,获取所述周围车辆的状态信息。
在本公开的一种示例性实施例中,确定与所述当前车辆关联的周围车辆,获取所述周围车辆的状态信息,包括:
获取目标车道、所述当前车辆的自车道;
确定所述当前车辆的自车道的前车、目标车道中相对于当前车辆的目标车道前车以及目标车道后车;
获取所述自车道前车、所述目标车道前车以及所述目标车道后车的纵向相对速度、纵向和横向相对距离、避撞时间倒数以及跟车时距。
在本公开的一种示例性实施例中,将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,包括:
对所述当前车辆的状态信息以及所述周围车辆的状态信息进行归一化处理,得到归一化的当前车辆的状态信息以及归一化的周围车辆的状态信息;
将所述归一化的当前车辆的状态信息以及所述归一化的周围车辆的状态信息输入至所述当前换道学习器以及所述预设的轨迹优化器中。
在本公开的一种示例性实施例中,所述轨迹学习控制量中包括当前车辆的加速度和前轮转角,所述轨迹优化控制量中包括目标加速度和目标前轮转角。
在本公开的一种示例性实施例中，在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时，基于所述轨迹优化控制量生成当前路径数据对，包括：
获取所述轨迹学习控制量与所述轨迹优化控制量的偏差;
在确定所述偏差小于所述预设阈值时,所述当前换道学习器收敛,并将所述当前换道学习器作为目标换道学习器;
在确定所述偏差大于所述预设阈值时,通过所述轨迹优化控制量以及所述当前车辆状态信息、所述周围车辆状态信息生成所述当前路径数据对。
在本公开的一种示例性实施例中,通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器,包括:
获取原始路径数据集,其中,所述原始路径数据集中包括多个路径数据对;
将所述当前路径数据对添加至所述原始路径数据集中,对所述原始路径数据集进行更新,得到目标路径数据集;
通过所述目标路径数据集中包括的路径数据对对所述当前换道学习器进行训练，直至所述当前换道学习器收敛，得到所述目标换道学习器。
根据本公开的一个方面,提供一种换道策略生成装置,包括:
输入数据获取模块,用于获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息;
输出数据获取模块,用于将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,得到轨迹学习控制量以及轨迹优化控制量;
当前路径数据对生成模块,在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时,基于所述轨迹优化控制量生成当前路径数据对;
换道策略生成模块,用于通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器。
应当注意，尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元，但是这种划分并非强制性的。实际上，根据本公开的实施方式，上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之，上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。
此外,尽管在附图中以特定顺序描述了本公开中方法的各个步骤,但是,这并非要求或者暗示必须按照该特定顺序来执行这些步骤,或是必须执行全部所示的步骤才能实现期望的结果。附加的或备选的,可以省略某些步骤,将多个步骤合并为一个步骤执行,以及/或者将一个步骤分解为多个步骤执行等。
在本公开的示例性实施例中,还提供了一种能够实现上述方法的电子设备。
所属技术领域的技术人员能够理解，本公开的各个方面可以实现为系统、方法或程序产品。因此，本公开的各个方面可以具体实现为以下形式，即：完全的硬件实施方式、完全的软件实施方式（包括固件、微代码等），或硬件和软件方面结合的实施方式，这里可以统称为“电路”、“模块”或“系统”。
下面参照图16来描述根据本公开的这种实施方式的电子设备1600。图16显示的电子设备1600仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图16所示，电子设备1600以通用计算设备的形式表现。电子设备1600的组件可以包括但不限于：上述至少一个处理单元1610、上述至少一个存储单元1620、连接不同系统组件（包括存储单元1620和处理单元1610）的总线1630以及显示单元1640。
其中,所述存储单元存储有程序代码,所述程序代码可以被所述处理单元1610执行,使得所述处理单元1610执行本说明书上述“示例性方法”部分中描述的根据本公开各种示例性实施方式的步骤。例如,所述处理单元1610可以执行如图2中所示的步骤S210:获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息;S220:将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,得到轨迹学习控制量以及轨迹优化控制量;S230: 在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时,基于所述轨迹优化控制量生成当前路径数据对;S240:通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器,通过所述目标换道学习器得到所述当前车辆的换道策略。
存储单元1620可以包括易失性存储单元形式的可读介质,例如随机存取存储单元(RAM)16201和/或高速缓存存储单元16202,还可以进一步包括只读存储单元(ROM)16203。
存储单元1620还可以包括具有一组（至少一个）程序模块16205的程序/实用工具16204，这样的程序模块16205包括但不限于：操作系统、一个或者多个应用程序、其它程序模块以及程序数据，这些示例中的每一个或某种组合中可能包括网络环境的实现。
总线1630可以为表示几类总线结构中的一种或多种，包括存储单元总线或者存储单元控制器、外围总线、图形加速端口、处理单元或者使用多种总线结构中的任意总线结构的局域总线。
电子设备1600也可以与一个或多个外部设备1700（例如键盘、指向设备、蓝牙设备等）通信，还可与一个或者多个使得用户能与该电子设备1600交互的设备通信，和/或与使得该电子设备1600能与一个或多个其它计算设备进行通信的任何设备（例如路由器、调制解调器等等）通信。这种通信可以通过输入/输出（I/O）接口1650进行。并且，电子设备1600还可以通过网络适配器1660与一个或者多个网络（例如局域网（LAN），广域网（WAN）和/或公共网络，例如因特网）通信。如图所示，网络适配器1660通过总线1630与电子设备1600的其它模块通信。应当明白，尽管图中未示出，可以结合电子设备1600使用其它硬件和/或软件模块，包括但不限于：微代码、设备驱动器、冗余处理单元、外部磁盘驱动阵列、RAID系统、磁带驱动器以及数据备份存储系统等。
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本公开实施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是 CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、终端装置、或者网络设备等)执行根据本公开实施方式的方法。
在本公开的示例性实施例中,还提供了一种计算机可读存储介质,其上存储有能够实现本说明书上述方法的程序产品。在一些可能的实施方式中,本公开的各个方面还可以实现为一种程序产品的形式,其包括程序代码,当所述程序产品在终端设备上运行时,所述程序代码用于使所述终端设备执行本说明书上述“示例性方法”部分中描述的根据本公开各种示例性实施方式的步骤。
根据本公开的实施方式的用于实现上述方法的程序产品，其可以采用便携式紧凑盘只读存储器（CD-ROM）并包括程序代码，并可以在终端设备，例如个人电脑上运行。然而，本公开的程序产品不限于此，在本文件中，可读存储介质可以是任何包含或存储程序的有形介质，该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。
所述程序产品可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以为但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件，或者任意以上的组合。可读存储介质的更具体的例子（非穷举的列表）包括：具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器（RAM）、只读存储器（ROM）、可擦式可编程只读存储器（EPROM或闪存）、光纤、便携式紧凑盘只读存储器（CD-ROM）、光存储器件、磁存储器件、或者上述的任意合适的组合。
计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号，其中承载了可读程序代码。这种传播的数据信号可以采用多种形式，包括但不限于电磁信号、光信号或上述的任意合适的组合。可读信号介质还可以是可读存储介质以外的任何可读介质，该可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。
可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于无线、有线、光缆、RF等等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言的任意组合来编写用于执行本公开操作的程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、C++等,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。
此外,上述附图仅是根据本公开示例性实施例的方法所包括的处理的示意性说明,而不是限制目的。易于理解,上述附图所示的处理并不表明或限制这些处理的时间顺序。另外,也易于理解,这些处理可以是例如在多个模块中同步或异步执行的。
本领域技术人员在考虑说明书及实践这里公开的发明后，将容易想到本公开的其他实施例。本申请旨在涵盖本公开的任何变型、用途或者适应性变化，这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的，本公开的真正范围和精神由权利要求指出。

Claims (10)

  1. 一种换道策略生成方法,其中,包括:
    获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息;
    将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,得到轨迹学习控制量以及轨迹优化控制量;
    在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时,基于所述轨迹优化控制量生成当前路径数据对;
    通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器,通过所述目标换道学习器得到所述当前车辆的换道策略。
  2. 根据权利要求1所述的换道策略生成方法,其中,获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息,包括:
    获取当前车辆的行驶速度、航向角以及与目标车道的横向距离;
    确定与所述当前车辆关联的周围车辆,获取所述周围车辆的状态信息。
  3. 根据权利要求2所述的换道策略生成方法,其中,确定与所述当前车辆关联的周围车辆,获取所述周围车辆的状态信息,包括:
    获取目标车道、所述当前车辆的自车道;
    确定所述当前车辆的自车道的前车、目标车道中相对于当前车辆的目标车道前车以及目标车道后车;
    获取所述自车道前车、所述目标车道前车以及所述目标车道后车的纵向相对速度、纵向和横向相对距离、避撞时间倒数以及跟车时距。
  4. 根据权利要求1所述的换道策略生成方法,其中,将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,包括:
    对所述当前车辆的状态信息以及所述周围车辆的状态信息进行归一化处理,得到归一化的当前车辆的状态信息以及归一化的周围车辆的状态信息;
    将所述归一化的当前车辆的状态信息以及所述归一化的周围车辆的状态信息输入至所述当前换道学习器以及所述预设的轨迹优化器中。
  5. 根据权利要求1所述的换道策略生成方法,其中,所述轨迹学习控制量中包括当前车辆的加速度和前轮转角,所述轨迹优化控制量中包括目标加速度和目标前轮转角。
  6. 根据权利要求1所述的换道策略生成方法,其中,在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时,基于所述轨迹优化控制量生成当前路径数据对,包括:
    获取所述轨迹学习控制量与所述轨迹优化控制量的偏差;
    在确定所述偏差小于所述预设阈值时,所述当前换道学习器收敛,并将所述当前换道学习器作为目标换道学习器;
    在确定所述偏差大于所述预设阈值时,通过所述轨迹优化控制量以及所述当前车辆状态信息、所述周围车辆状态信息生成所述当前路径数据对。
  7. 根据权利要求6所述的换道策略生成方法,其中,通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器,包括:
    获取原始路径数据集,其中,所述原始路径数据集中包括多个路径数据对;
    将所述当前路径数据对添加至所述原始路径数据集中,对所述原始路径数据集进行更新,得到目标路径数据集;
    通过所述目标路径数据集中包括的路径数据对对所述当前换道学习器进行训练，直至所述当前换道学习器收敛，得到所述目标换道学习器。
  8. 一种换道策略生成装置,其中,包括:
    输入数据获取模块,用于获取当前车辆的状态信息以及与所述当前车辆关联的周围车辆的状态信息;
    输出数据获取模块,用于将所述当前车辆的状态信息以及所述周围车辆的状态信息输入至当前换道学习器以及预设的轨迹优化器中,得到轨迹学习控制量以及轨迹优化控制量;
    当前路径数据对生成模块，在确定所述轨迹学习控制量与所述轨迹优化控制量的偏差大于预设阈值时，基于所述轨迹优化控制量生成当前路径数据对；
    换道策略生成模块,用于通过所述当前路径数据对对所述当前换道学习器进行训练,得到目标换道学习器。
  9. 一种计算机存储介质,其上存储有计算机程序,其中,所述计算机程序被处理器执行时实现权利要求1-7任一项所述的换道策略生成方法。
  10. 一种电子设备,其中,包括:
    处理器;以及
    存储器,用于存储所述处理器的可执行指令;
    其中,所述处理器配置为经由执行所述可执行指令来执行权利要求1-7任一项所述的换道策略生成方法。
PCT/CN2022/109804 2021-11-12 2022-08-02 换道策略生成方法和装置、计算机存储介质、电子设备 WO2023082726A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111354329.4 2021-11-12
CN202111354329.4A CN114021840A (zh) 2021-11-12 2021-11-12 换道策略生成方法和装置、计算机存储介质、电子设备

Publications (1)

Publication Number Publication Date
WO2023082726A1 true WO2023082726A1 (zh) 2023-05-19

Family

ID=80064375

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/109804 WO2023082726A1 (zh) 2021-11-12 2022-08-02 换道策略生成方法和装置、计算机存储介质、电子设备

Country Status (2)

Country Link
CN (1) CN114021840A (zh)
WO (1) WO2023082726A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021840A (zh) * 2021-11-12 2022-02-08 京东鲲鹏(江苏)科技有限公司 换道策略生成方法和装置、计算机存储介质、电子设备
CN115482687B (zh) * 2022-09-15 2024-05-07 吉咖智能机器人有限公司 用于车辆变道风险评估的方法、装置、设备和介质
CN115657684B (zh) * 2022-12-08 2023-03-28 禾多科技(北京)有限公司 车辆路径信息生成方法、装置、设备和计算机可读介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110568760A (zh) * 2019-10-08 2019-12-13 吉林大学 适用于换道及车道保持的参数化学习决策控制***及方法
CN112937564A (zh) * 2019-11-27 2021-06-11 初速度(苏州)科技有限公司 换道决策模型生成方法和无人车换道决策方法及装置
US20210181742A1 (en) * 2019-12-12 2021-06-17 Baidu Usa Llc Path planning with a preparation distance for a lane-change
CN112578672A (zh) * 2020-12-16 2021-03-30 吉林大学青岛汽车研究院 基于底盘非线性的无人驾驶汽车轨迹控制***及其轨迹控制方法
CN114021840A (zh) * 2021-11-12 2022-02-08 京东鲲鹏(江苏)科技有限公司 换道策略生成方法和装置、计算机存储介质、电子设备

Also Published As

Publication number Publication date
CN114021840A (zh) 2022-02-08

Similar Documents

Publication Publication Date Title
EP4009300A1 (en) Vehicle automatic control method and lane change intention prediction network training method
WO2023082726A1 (zh) 换道策略生成方法和装置、计算机存储介质、电子设备
Michelmore et al. Uncertainty quantification with statistical guarantees in end-to-end autonomous driving control
Ding et al. Multimodal safety-critical scenarios generation for decision-making algorithms evaluation
Yufang et al. Investigating long‐term vehicle speed prediction based on BP‐LSTM algorithms
Xiao et al. UB‐LSTM: a trajectory prediction method combined with vehicle behavior recognition
Makantasis et al. Deep reinforcement‐learning‐based driving policy for autonomous road vehicles
Jin et al. Gauss mixture hidden Markov model to characterise and model discretionary lane‐change behaviours for autonomous vehicles
CN114261400B (zh) 一种自动驾驶决策方法、装置、设备和存储介质
WO2022252457A1 (zh) 一种自动驾驶控制方法、装置、设备及可读存储介质
Liu et al. Smart city moving target tracking algorithm based on quantum genetic and particle filter
Yuan et al. End‐to‐end learning for high‐precision lane keeping via multi‐state model
CN114519433A (zh) 多智能体强化学习、策略执行方法及计算机设备
CN114004993A (zh) 基于lstm速度预测优化的ia-svm行驶工况识别方法及装置
Liu et al. Estimation of driver lane change intention based on the LSTM and Dempster–Shafer evidence theory
Lu et al. A sharing deep reinforcement learning method for efficient vehicle platooning control
Hu et al. Active uncertainty reduction for safe and efficient interaction planning: A shielding-aware dual control approach
Shang et al. [Retracted] Human‐Computer Interaction of Networked Vehicles Based on Big Data and Hybrid Intelligent Algorithm
Lu et al. Altruistic cooperative adaptive cruise control of mixed traffic platoon based on deep reinforcement learning
Yang et al. Improved deep reinforcement learning for car-following decision-making
Khanum et al. Involvement of deep learning for vision sensor-based autonomous driving control: a review
Hu et al. Manoeuvre prediction and planning for automated and connected vehicles based on interaction and gaming awareness under uncertainty
Yan et al. A Survey of Generative AI for Intelligent Transportation Systems
Gao et al. Deep learning‐based hybrid model for the behaviour prediction of surrounding vehicles over long‐time periods
CN114004406A (zh) 车辆轨迹预测方法、装置、存储介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22891535

Country of ref document: EP

Kind code of ref document: A1