WO2023004698A1 - Intelligent driving decision-making method, vehicle driving control method, device, and vehicle - Google Patents

Intelligent driving decision-making method, vehicle driving control method, device, and vehicle

Info

Publication number
WO2023004698A1
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
strategy
self-vehicle
game
game object
Prior art date
Application number
PCT/CN2021/109331
Other languages
English (en)
French (fr)
Inventor
戴正晨
王志涛
杨绍宇
王新宇
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21951305.8A (published as EP4360976A1)
Priority to CN202180008224.5A (published as CN115943354A)
Priority to PCT/CN2021/109331 (published as WO2023004698A1)
Publication of WO2023004698A1
Priority to US18/424,238 (published as US20240166242A1)

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W60/0015 Planning or execution of driving tasks specially adapted for safety
    • B60W60/0018 Planning or execution of driving tasks specially adapted for safety by employing degraded modes, e.g. reducing speed, in response to suboptimal conditions
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/08 Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B60W30/095 Predicting travel path or likelihood of collision
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/08 Interaction between the driver and the control system
    • B60W50/14 Means for informing the driver, warning the driver or prompting a driver intervention
    • B60W2050/146 Display means

Definitions

  • the present application relates to intelligent driving technology, in particular to an intelligent driving decision-making method, a vehicle driving control method, a device, and a vehicle.
  • driving automation is commonly graded into levels L1-L5, among which L1 (assisted driving) can help the driver complete certain driving tasks, but only assists with a single driving operation;
  • L2 (partial automation) can automatically perform acceleration, deceleration, and steering operations at the same time;
  • L3 (conditional automation) allows the vehicle to accelerate, decelerate, and steer automatically in specific environments, without driver operation;
  • L4 (high automation) can realize driverless operation for the whole journey, but with restrictions, such as limiting the vehicle speed to below a certain value and keeping the driving area relatively fixed;
  • L5 (full automation) is fully adaptive driving that adapts to any driving scene. The higher the level, the more capable the autonomous driving.
  • the present application provides an intelligent driving decision-making method, a vehicle driving control method, a device, a vehicle, and the like, which can realize driving decisions for the self-vehicle while consuming as little computing power as possible, on the premise of ensuring decision-making accuracy.
  • the first aspect of the present application provides an intelligent driving decision-making method, including: obtaining a game object of the self-vehicle; and performing multiple releases of multiple strategy spaces of the self-vehicle and the game object, where after each of the multiple releases is executed, the strategy feasible region of the self-vehicle and the game object is determined according to the strategy spaces released so far, and the decision result for self-vehicle driving is determined according to the strategy feasible region.
  • the strategy feasible region of the self-vehicle and a non-game object includes the behavior actions that the self-vehicle can perform relative to the non-game object.
  • the decision-making accuracy can be, for example, the execution probability of the decision result.
  • in this way, the strategy feasible region can be obtained while releasing as little strategy space as possible, and a behavior-action pair is selected from the strategy feasible region as the decision result. This minimizes the number of strategy-space releases and the associated operations, and reduces the requirements on hardware computing power.
  • the dimensions of the multiple strategy spaces include at least one of the following: a longitudinal sampling dimension, a lateral sampling dimension, or a time sampling dimension.
  • the multiple strategy spaces include: a longitudinal sampling strategy space spanned by the longitudinal sampling dimension of the self-vehicle and/or the game object; a lateral sampling strategy space spanned by the lateral sampling dimension of the self-vehicle and/or the game object; a time-dimension strategy space spanned by the time sampling dimension of the self-vehicle and/or the game object; or a strategy space spanned by any combination of two or three of the longitudinal sampling, lateral sampling, and time sampling dimensions.
  • the time-dimension strategy space corresponds to the strategy spaces spanned in the multiple single-frame deductions included in one decision step; in each single-frame deduction, the spanned strategy space may include a longitudinal sampling strategy space and/or a lateral sampling strategy space.
  • in this way, the corresponding strategy space can be spanned in at least one sampling dimension and then released.
  • performing multiple releases of the multiple strategy spaces includes performing the releases in the following dimension order: longitudinal sampling dimension, lateral sampling dimension, then time sampling dimension.
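  • for illustration only, the progressive release described above might be organized as in the following Python sketch; the helper names span_space and search_feasible_region, the tuple of dimension names, and the early-exit test are assumptions of this sketch, not the claimed implementation.

```python
# Minimal sketch of progressive strategy-space release in the stated
# dimension order (longitudinal, then lateral, then time).

RELEASE_ORDER = ("longitudinal", "lateral", "time")

def decide(ego, game_object, span_space, search_feasible_region):
    released = []                         # strategy spaces released so far
    for dim in RELEASE_ORDER:
        released.append(span_space(ego, game_object, dim))
        feasible = search_feasible_region(ego, game_object, released)
        if feasible:                      # a feasible region exists: stop early,
            return feasible               # remaining spaces are never expanded
    return None                           # caller falls back to a conservative decision
```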
  • the strategy spaces released over the multiple releases can include the following strategy spaces:
  • the longitudinal sampling strategy space spanned by one set of values of the self-vehicle's longitudinal sampling dimension; the longitudinal sampling strategy space spanned by another set of values of the self-vehicle's longitudinal sampling dimension;
  • the longitudinal sampling strategy space jointly spanned by one set of values of the self-vehicle's longitudinal sampling dimension and one set of values of the game object's longitudinal sampling dimension; the longitudinal sampling strategy space jointly spanned by another set of values of the self-vehicle's longitudinal sampling dimension and another set of values of the game object's longitudinal sampling dimension;
  • the strategy space jointly spanned by the lateral sampling strategy space, spanned by one set of values of the self-vehicle's lateral sampling dimension, and the longitudinal sampling strategy space spanned by the longitudinal sampling dimension of the self-vehicle and/or of the game object;
  • the strategy space jointly spanned by the lateral sampling strategy space, spanned by another set of values of the self-vehicle's lateral sampling dimension, and the longitudinal sampling strategy space spanned by the longitudinal sampling dimension of the self-vehicle and/or of the game object;
  • the released time-dimension strategy space includes, in each of multiple single-frame deductions, a spanned strategy space that can be any of the aforementioned longitudinal sampling strategy spaces, lateral sampling strategy spaces, or strategy spaces jointly spanned by longitudinal and lateral sampling strategy spaces.
  • the total cost value of a behavior-action pair in the strategy feasible region is determined according to one or more of the following: the safety cost, right-of-way cost, lateral offset cost, passability cost, comfort cost, inter-frame correlation cost, and risk area cost of the self-vehicle or the game object.
  • in this way, one or more cost values can be selected as needed to calculate the total cost value, which is used to determine the strategy feasible region.
  • each cost value has a different weight.
  • the different weights make it possible to emphasize driving safety, right of way, passability, comfort, risk, and so on.
  • the weights may be allocated as follows: safety weight > right-of-way weight > lateral offset weight > passability weight > comfort weight > risk area weight > inter-frame correlation weight.
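  • as a minimal sketch of such a weighted total cost, the Python snippet below uses illustrative weight values that merely respect the ordering above; the actual weight values are not specified here.

```python
# Hypothetical weights obeying: safety > right of way > lateral offset
# > passability > comfort > risk area > inter-frame correlation.
WEIGHTS = {
    "safety": 10.0,
    "right_of_way": 5.0,
    "lateral_offset": 3.0,
    "passability": 2.0,
    "comfort": 1.0,
    "risk_area": 0.5,
    "inter_frame": 0.2,
}

def total_cost(costs: dict) -> float:
    """Weighted sum of the individual cost values of one behavior-action pair."""
    return sum(WEIGHTS[name] * value for name, value in costs.items())
```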
  • when there are multiple game objects, the decision result for self-vehicle driving is determined according to the strategy feasible regions of the self-vehicle with each game object.
  • that is, from the separately obtained strategy feasible regions, the final strategy feasible region is determined as the intersection of the individual strategy feasible regions.
  • the intersection here refers to behavior-action pairs that all include the same action of the self-vehicle.
  • the method of the first aspect also includes: obtaining a non-game object of the self-vehicle; and determining the strategy feasible region of the self-vehicle and the non-game object, which includes the behavior actions that the self-vehicle can perform relative to the non-game object; the decision result for self-vehicle driving is determined at least according to the strategy feasible region of the self-vehicle and the non-game object.
  • the strategy feasible region of the decision result for self-vehicle driving is determined according to the intersection of the strategy feasible regions of the self-vehicle with each game object, or according to the intersection of the strategy feasible regions of the self-vehicle with each game object and the strategy feasible regions of the self-vehicle with each non-game object.
  • that is, the final strategy feasible region and the decision result for self-vehicle driving can be obtained through the intersection of the strategy feasible regions of the self-vehicle with multiple game objects.
  • alternatively, the final strategy feasible region and the decision result for self-vehicle driving can be obtained from the intersection of the strategy feasible regions of the self-vehicle with multiple game objects and non-game objects.
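  • the intersection-by-ego-action idea, together with the conservative fallback described further below, might look like the following sketch; the function names and the selection rule are assumptions of this illustration.

```python
# Sketch: two feasible pairs "intersect" when they share the same ego action.

def intersect_feasible_regions(regions):
    """regions: one dict per object, mapping ego_action -> set of object actions."""
    common = set(regions[0])
    for region in regions[1:]:
        common &= set(region)           # keep ego actions feasible for every object
    return common

def decide_with_fallback(regions, select, conservative_action="safe_stop"):
    common = intersect_feasible_regions(regions)
    if not common:
        return conservative_action      # empty intersection: safe stop / deceleration
    return select(common)               # e.g. pick the lowest-total-cost ego action
```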
  • the method of the first aspect also includes: obtaining a non-game object of the self-vehicle; and constraining, according to the motion state of the non-game object, the longitudinal sampling strategy space corresponding to the self-vehicle, or the lateral sampling strategy space corresponding to the self-vehicle.
  • constraining the longitudinal sampling strategy space corresponding to the self-vehicle means constraining the range of values in the self-vehicle's longitudinal sampling dimension used when spanning the longitudinal sampling strategy space;
  • constraining the lateral sampling strategy space corresponding to the self-vehicle means constraining the range of values in the self-vehicle's lateral sampling dimension used when spanning the lateral sampling strategy space.
  • in this way, the range of the self-vehicle's longitudinal acceleration or lateral offset in the spanned strategy space can be constrained by the motion state of the non-game object, such as its position and velocity, which reduces the number of behavior actions in the strategy space and can further reduce the amount of computation.
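  • a toy example of such a constraint is sketched below: the upper bound of the sampled longitudinal acceleration is shrunk until the worst-case travel over the deduction horizon stays short of the gap to a non-game object ahead (e.g. a parked vehicle). The constant-acceleration kinematics and the step size are assumptions of this sketch.

```python
# Clamp the ego's longitudinal-acceleration sampling range against the gap
# to a stationary non-game object ahead (illustrative only).

def constrain_longitudinal_range(a_min, a_max, gap_m, ego_speed_mps, horizon_s):
    """Return an acceleration range whose worst case stays within the gap."""
    lo, hi = a_min, a_max
    while hi > lo:
        travel = ego_speed_mps * horizon_s + 0.5 * hi * horizon_s ** 2
        if travel <= gap_m:        # max sampled acceleration keeps us inside the gap
            break
        hi -= 0.5                  # shrink the upper bound (coarse, illustrative step)
    return lo, hi                  # fewer sampled actions => less computation
```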
  • the method of the first aspect also includes: acquiring a non-game object of a game object of the self-vehicle; and constraining, according to the motion state of that non-game object, the longitudinal sampling strategy space corresponding to the game object of the self-vehicle, or the lateral sampling strategy space corresponding to the game object of the self-vehicle.
  • constraining the longitudinal sampling strategy space corresponding to the game object of the self-vehicle means constraining the range of values in the game object's longitudinal sampling dimension used when spanning the longitudinal sampling strategy space; constraining the lateral sampling strategy space corresponding to the game object of the self-vehicle means constraining the range of values in the game object's lateral sampling dimension used when spanning the lateral sampling strategy space.
  • in this way, the range of the longitudinal acceleration or lateral offset of the self-vehicle's game object in the spanned strategy space can be constrained by the motion state of the non-game object, such as its position and speed, which reduces the number of behavior actions in the strategy space and can further reduce the amount of computation.
  • when the strategy feasible region or the intersection is an empty set, a conservative decision for self-vehicle driving is executed; the conservative decision includes an action that makes the self-vehicle stop safely, or an action that makes the self-vehicle decelerate safely.
  • the game object or non-game object is determined according to the attention method.
  • that is, the game objects and non-game objects can be determined according to the attention that each obstacle allocates to the self-vehicle.
  • the attention method can be implemented through algorithms, or through neural network inference.
  • the method of the first aspect also includes: displaying at least one of the following through a human-computer interaction interface: the decision result of self-vehicle driving, the strategy feasible region of the decision result, the driving trajectory of the self-vehicle corresponding to the decision result, or the driving trajectory of the game object corresponding to the decision result.
  • in this way, the human-computer interaction interface can display rich content about the decision results of self-vehicle driving or the game, making the interaction with the user friendlier.
  • the second aspect of the present application provides an intelligent driving decision-making device, including: an acquisition module, used to obtain a game object of the self-vehicle; and a processing module, used to perform multiple releases of multiple strategy spaces of the self-vehicle and the game object, to determine, after each of the multiple releases is executed, the strategy feasible region of the self-vehicle and the game object according to the strategy spaces released so far, and to determine the decision result for self-vehicle driving according to the strategy feasible region.
  • the dimensions of the multiple strategy spaces include at least one of the following: a longitudinal sampling dimension, a lateral sampling dimension, or a time sampling dimension.
  • performing multiple releases of the multiple strategy spaces includes performing the releases in the following dimension order: longitudinal sampling dimension, lateral sampling dimension, then time sampling dimension.
  • the total cost value of a behavior-action pair in the strategy feasible region is determined according to one or more of the following: the safety cost, right-of-way cost, lateral offset cost, passability cost, comfort cost, inter-frame correlation cost, and risk area cost of the self-vehicle or the game object.
  • each cost value has a different weight.
  • when there are multiple game objects, the decision result for self-vehicle driving is determined according to the strategy feasible regions of the self-vehicle with each game object.
  • the acquisition module is also used to obtain a non-game object of the self-vehicle;
  • the processing module is also used to determine the strategy feasible region of the self-vehicle and the non-game object;
  • the strategy feasible region of the self-vehicle and the non-game object includes the behavior actions that the self-vehicle can perform relative to the non-game object; the decision result for self-vehicle driving is determined at least according to this strategy feasible region.
  • the processing module is also used to determine the strategy feasible region of the decision result for self-vehicle driving according to the intersection of the strategy feasible regions of the self-vehicle with each game object, or according to the intersection of the strategy feasible regions of the self-vehicle with each game object and the strategy feasible regions of the self-vehicle with each non-game object.
  • the acquisition module is also used to acquire a non-game object of the self-vehicle; the processing module is also used to constrain, according to the motion state of the non-game object, the longitudinal sampling strategy space corresponding to the self-vehicle, or the lateral sampling strategy space corresponding to the self-vehicle.
  • the acquisition module is also used to acquire a non-game object of a game object of the self-vehicle; the processing module is also used to constrain the longitudinal sampling strategy space, or the lateral sampling strategy space, corresponding to the game object of the self-vehicle.
  • when the intersection is an empty set, a conservative decision for the self-vehicle is executed; the conservative decision includes an action that makes the self-vehicle stop safely, or an action that makes the self-vehicle decelerate safely.
  • the game object or non-game object is determined according to the attention method.
  • the processing module is further configured to display at least one of the following through a human-computer interaction interface: the decision result of self-vehicle driving, the strategy feasible region of the decision result, the driving trajectory of the self-vehicle corresponding to the decision result, or the driving trajectory of the game object corresponding to the decision result.
  • the third aspect of the present application provides a vehicle driving control method, including: acquiring an obstacle outside the vehicle; determining the decision result of vehicle driving for the obstacle according to any method of the first aspect; and controlling the driving of the vehicle according to the decision result.
  • the fourth aspect of the present application provides a vehicle driving control device, including: an acquisition module, used to acquire an obstacle outside the vehicle; and a processing module, used to determine the decision result of vehicle driving for the obstacle according to any method of the first aspect, and to control the driving of the vehicle according to the decision result.
  • the fifth aspect of the present application provides a vehicle, including the vehicle driving control device of the fourth aspect and a driving system, where the vehicle driving control device controls the driving system.
  • the sixth aspect of the present application provides a computing device, including a processor and a memory on which program instructions are stored; when the program instructions are executed by the processor, they cause the processor to implement any intelligent driving decision-making method of the first aspect, or the vehicle driving control method of the third aspect.
  • the seventh aspect of the present application provides a computer-readable storage medium on which program instructions are stored; when executed by a processor, the program instructions cause the processor to implement any intelligent driving decision-making method of the first aspect, or the vehicle driving control method of the third aspect.
  • FIG. 1 is a schematic diagram of a traffic scene with vehicles driving on a road according to an embodiment of the present application;
  • FIG. 2 is a schematic diagram of an embodiment of the present application applied to a vehicle;
  • FIG. 3A to FIG. 3E are schematic diagrams of game objects and non-game objects in different traffic scenarios provided by an embodiment of the present application;
  • FIG. 4 is a flowchart of the intelligent driving decision-making method provided by an embodiment of the present application;
  • FIG. 5 is a flowchart of obtaining the game object in FIG. 4;
  • FIG. 6 is a flowchart of obtaining the decision result in FIG. 4;
  • FIG. 7 is a schematic diagram of multi-frame deduction provided in an embodiment of the present application;
  • FIG. 8A to FIG. 8F are schematic diagrams of the cost functions provided by an embodiment of the present application;
  • FIG. 9 is a flowchart of driving control provided by another embodiment of the present application;
  • FIG. 10 is a schematic diagram of a traffic scene in an embodiment of the present application;
  • FIG. 11 is a flowchart of driving control provided by a specific embodiment of the present application;
  • FIG. 12 is a schematic diagram of an intelligent driving decision-making device provided in an embodiment of the present application;
  • FIG. 13 is a flowchart of a vehicle driving control method provided in an embodiment of the present application;
  • FIG. 14 is a schematic diagram of a vehicle driving control device provided by an embodiment of the present application;
  • FIG. 15 is a schematic diagram of a vehicle provided in an embodiment of the present application;
  • FIG. 16 is a schematic diagram of an embodiment of a computing device of the present application.
  • the intelligent driving decision-making solutions provided herein include intelligent driving decision-making methods and devices, vehicle driving control methods and devices, vehicles, electronic devices, computing equipment, computer-readable storage media, and computer program products. Since these technical solutions solve problems on the same or similar principles, some repetition may be omitted in the following specific embodiments, which should be understood as referring to one another and as combinable with one another.
  • Figure 1 shows a traffic scene of vehicles driving on the road.
  • in FIG. 1, a north-south road and an east-west road form an intersection A, where: a first vehicle 901 is located on the south side of intersection A and travels from south to north; a second vehicle 902 is located on the north side of intersection A and travels from north to south; a third vehicle 903 is located on the east side of intersection A and travels from east to south, that is, it will turn left at intersection A and merge into the north-south road; behind the first vehicle 901 there is a fourth vehicle 904, which is also driving from south to north; and near the southeast corner of intersection A, a fifth vehicle 905 is parked on the side of the north-south road, that is, the fifth vehicle 905 is on the roadside in front of the first vehicle 901.
  • the first vehicle 901 can detect the current traffic scene, make a decision on the driving strategy in that scene, and then control the driving of the vehicle according to the decision result, for example, controlling the vehicle to accelerate, decelerate, or change lanes according to a decided preempting, yielding, or avoiding strategy.
  • one intelligent driving decision-making scheme decides driving strategies through a game. For example, the first vehicle 901 decides its own driving strategy through a game in a traffic scene with the second vehicle 902 driving in the opposite direction. Such game-based decision-making has difficulty handling complex traffic scenarios, for example, the traffic scenario shown in FIG. 1:
  • the third vehicle 903 crosses intersection A, and the fifth vehicle 905 is parked on the roadside ahead of the first vehicle 901.
  • the game objects of the first vehicle 901 are then the second vehicle 902 and the third vehicle 903 at the same time,
  • so a multi-dimensional game space is needed for game decision-making, for example, one formed by the respective lateral and longitudinal driving dimensions of the first vehicle 901, the second vehicle 902, and the third vehicle 903.
  • however, using a multi-dimensional game space leads to an explosive increase in the number of candidate game decisions and a geometric increase in the computational burden, which poses a great challenge to existing hardware computing power. Limited by hardware computing power, it is therefore currently difficult to productize game decision-making over a multi-dimensional game space in intelligent driving scenarios.
  • the embodiment of the present application provides an improved intelligent driving decision-making scheme.
  • the basic principle of the scheme includes: for the self-vehicle, identifying obstacles in the current traffic scene, where the obstacles may include game objects of the self-vehicle and non-game objects of the self-vehicle.
  • the strategy spaces formed by a single sampling dimension or by multiple sampling dimensions are released multiple times, and after each release of a strategy space, the solution of the self-vehicle and a single game object is searched within the released strategy space.
  • the decision result for self-vehicle driving is determined according to the game result, and the driving of the vehicle can be controlled according to the decision result.
  • once a solution is found, the strategy spaces of the multi-dimensional game space that have not yet been released no longer need to be released.
  • when the self-vehicle has multiple game objects, the strategy feasible regions of the self-vehicle for each game object can be determined separately as above, and, using the self-vehicle's action as the index, the intersection of the strategy feasible regions of the self-vehicle with each game object (the pairs in the intersection all include the same self-vehicle action) yields the driving strategy of the self-vehicle.
  • the method can obtain the optimal driving decision in the multi-dimensional game space with the fewest searches, minimizes the use of strategy space, reduces the requirements on hardware computing power, and makes productization on vehicles easier.
  • the implementation subject of the intelligent driving decision-making scheme of the embodiment of the present application may be an intelligent agent that has power and can move autonomously.
  • through the intelligent driving decision-making scheme provided by the embodiment of the present application, the agent can make game decisions with other objects in the traffic scene, generate semantic-level decision labels and the agent's expected driving trajectory, and thereby perform reasonable lateral and longitudinal motion planning.
  • the intelligent agent can be, for example, a vehicle with an automatic driving function, a robot that can move autonomously, and the like.
  • the vehicles here include general motor vehicles, such as cars, sport utility vehicles (SUV), multi-purpose vehicles (MPV), and automated guided vehicles (AGV); land transportation devices including buses, trucks, and other cargo or passenger vehicles; water surface transportation devices including various ships and boats; and aircraft.
  • among motor vehicles, hybrid vehicles, electric vehicles, gasoline vehicles, plug-in hybrid vehicles, fuel cell vehicles, and other alternative fuel vehicles are also included.
  • a hybrid vehicle refers to a vehicle with two or more power sources, and an electric vehicle includes a pure electric vehicle, an extended-range electric vehicle, and the like.
  • the aforementioned robot that can move autonomously may also be regarded as one of the vehicles.
  • in the following, the intelligent driving decision-making scheme provided by the embodiment of the present application is described as applied to a vehicle, by way of example.
  • besides the environment information acquisition device 11, the control device 12, and the driving system 13 described below, the vehicle in some embodiments may also include a communication device 14, a navigation device 15, or a display device 16.
  • the environment information acquiring device 11 can be used to acquire the external environment information of the vehicle.
  • the environment information acquisition device 11 may include one or more of a camera, a lidar, a millimeter-wave radar, an ultrasonic radar, or a Global Navigation Satellite System (GNSS) receiver as described later, where the camera can comprise a conventional RGB (Red Green Blue) camera sensor, an infrared camera sensor, etc.
  • the acquired environment outside the vehicle includes road surface information and objects on the road surface; the objects on the road surface include surrounding vehicles, pedestrians, and the like, and specifically may include vehicle motion state information, such as vehicle speed, acceleration, heading angle, and trajectory information.
  • the movement state information of surrounding vehicles may also be obtained through the communication device 14 of the vehicle 10 .
  • the vehicle-external environment information acquired by the environment information acquisition device 11 can be used to form a world model constructed from roads (corresponding to the road surface information) and obstacles (corresponding to the objects on the road surface).
  • the environment information acquisition device 11 can also be an electronic device that receives the vehicle-external environment information transmitted by camera sensors, infrared night vision camera sensors, lidars, millimeter-wave radars, ultrasonic radars, etc., such as a data transmission chip, for example a bus data transceiver chip or a network interface chip; the data transmission chip can also be a wireless transmission chip, such as a Bluetooth chip or a Wi-Fi chip.
  • the environment information acquisition device 11 may also be integrated into the control device 12 as an interface circuit or a data transmission module integrated into the processor.
  • the control device 12 can be used to make intelligent driving strategy decisions based on the acquired vehicle-external environment information (including the constructed world model) and to generate decision results.
  • the decision results can include acceleration, braking, and steering (including changing lanes or turning), and can also include the expected driving trajectory of the vehicle in the short term (e.g., within a few seconds).
  • the control device 12 can further generate corresponding instructions to control the driving system 13 according to the decision result, so as to execute the driving control of the vehicle through the driving system 13, and control the vehicle to realize the desired driving trajectory according to the decision result.
  • the control device 12 can be an electronic device, for example a processor of a vehicle-mounted processing device such as a head unit, a domain controller, a mobile data center (MDC), or a vehicle-mounted computer, or it can be a conventional chip such as a central processing unit (CPU) or a microcontroller unit (MCU).
  • the driving system 13 may include a power system 131, a steering system 132, and a braking system 133, which will be introduced separately below:
  • the power system 131 may include a driving electronic control unit (ECU) and a driving source.
  • the driving ECU controls the driving force (such as torque) of the vehicle 10 by controlling the driving source.
  • the driving source may be an engine, a drive motor, or the like.
  • the driving ECU can control the driving source according to the driver's operation of the accelerator pedal, or can control the driving source according to the command sent from the control device 12, thereby being able to control the driving force.
  • the driving force of the driving source is transmitted to the wheels via a transmission or the like, thereby driving the vehicle 10 to run.
  • the steering system 132 may include a steering electronic control unit (ECU) and an electric power steering system (Electric Power Steering, EPS).
  • the steering ECU can control the EPS motor according to the driver's operation of the steering wheel, or can control the EPS motor according to the command sent from the control device 12, thereby controlling the direction of the wheels (specifically, the steering wheels).
  • steering can also be performed by changing the torque distribution or braking force distribution to the left and right wheels.
  • the braking system 133 may include a braking electronic control unit (ECU) and a braking mechanism.
  • the brake mechanism makes the brake components work through the brake motor, hydraulic mechanism, etc.
  • the brake ECU can control the brake mechanism according to the driver's operation of the brake pedal, or can control the brake mechanism according to the command sent from the control device 12, and can control the braking force.
  • the braking system 133 may also include an energy regenerative braking mechanism.
  • a communication device 14 may also be included.
  • the communication device 14 can perform data interaction with external objects through wireless communication, and obtain data required by the vehicle 10 for smart driving decisions.
  • the communicable external object may include a cloud server, a mobile terminal (such as a mobile phone, a laptop, a tablet, etc.), a roadside device, or a surrounding vehicle.
  • the data required for decision-making includes user portraits of vehicles around the vehicle 10 (that is, other vehicles), which reflect the driving habits of the other vehicles' drivers, and may also include the positions and motion state information of the other vehicles.
  • a navigation device 15 may also be included, and the navigation device 15 may include a Global Navigation Satellite System (Global Navigation Satellite System, GNSS) receiver and a map database.
  • the navigation device 15 can determine the position of the vehicle 10 through satellite signals received by the GNSS receiver, and can generate a route to the destination according to the map information in the map database, and send information about the route (including the position of the vehicle 10) Provided to the control device 12.
  • the navigation device 15 may also have an inertial measurement unit (Inertial Measurement Unit, IMU), and perform more accurate positioning of the vehicle 10 by fusing the information of the GNSS receiver and the information of the IMU.
  • a display device 16 may also be included, for example, it may be a display screen installed in the central control position of the cockpit of the vehicle, or it may be a head-up display device (Head Up Display, HUD).
  • the control device 12 may display the decision result on the display device 16 in the cockpit of the vehicle in a manner understandable to the user, for example, in the form of expected driving trajectory, arrows, text, and the like.
  • when displaying the desired driving trajectory, it may also be shown on a display device in the vehicle cockpit as a partially enlarged view combined with the vehicle's current traffic scene (such as a graphical traffic scene).
  • the control device 12 can also display information on the route to the destination provided by the navigation device 15 .
  • in some embodiments, a voice playback system may also be included, which prompts the user with the decision for the current traffic scene by playing voice.
  • the intelligent driving vehicle that is in the traffic scene and executes the intelligent driving decision-making method provided in the embodiment of the present application is called the self-vehicle.
  • from the perspective of the self-vehicle, other objects in the traffic scene that affect or may affect the driving of the self-vehicle are called obstacles of the self-vehicle.
  • the self-vehicle has a certain behavior decision-making ability and can generate a driving strategy to change its own motion state.
  • the driving strategy includes acceleration, braking, and steering (including changing lanes or turning); the self-vehicle also has driving behavior execution ability, including executing the driving strategy and driving according to the desired driving trajectory determined by the decision.
  • the obstacle of the self-vehicle can also have behavioral decision-making ability to change its own motion state, for example, the obstacle can be a vehicle, a pedestrian, etc. that can move autonomously.
  • the obstacle of the self-vehicle may also not have behavioral decision-making ability, or may not change its motion state.
  • for example, the obstacle may be a vehicle parked on the side of the road (in a non-started state), a width-limiting pier on the road, etc.
  • the obstacles of the self-vehicle can include pedestrians, bicycles, motor vehicles (such as motorcycles, passenger cars, trucks, buses, etc.), and so on, where the motor vehicles can include intelligent driving vehicles that can implement the intelligent driving decision-making method.
  • the obstacles of the own vehicle can be further divided into game objects of the own vehicle, non-game objects of the own vehicle or irrelevant obstacles of the own vehicle.
  • the interaction strength of game objects, non-game objects, and irrelevant obstacles with the self-vehicle weakens gradually, from strong interaction to no interaction. It should be understood that, across the traffic scenes at different decision-making moments, game objects, non-game objects, and irrelevant obstacles may convert into one another.
  • the position or motion state of an irrelevant obstacle makes it completely irrelevant to the future behavior of the self-vehicle, and there will be no trajectory conflict or intention conflict between them in the future; therefore, unless otherwise specified, the obstacles below refer to the game objects of the self-vehicle and the non-game objects of the self-vehicle.
  • a non-game object of the self-vehicle may have a trajectory conflict or intention conflict with the self-vehicle in the future, but the self-vehicle needs to unilaterally adjust its own motion state to resolve that possible conflict; that is, a non-game object of the self-vehicle has no game interaction relationship with the self-vehicle.
  • in other words, a non-game object of the self-vehicle will not be affected by the driving behavior of the self-vehicle: it maintains its predetermined motion state and will not adjust its motion state to resolve possible future trajectory conflicts or intention conflicts with the self-vehicle.
  • in the scenario of FIG. 3A, the self-vehicle 101 goes straight through an unprotected intersection, while the oncoming vehicle 102 (to the left front of the self-vehicle 101) turns left through the same unprotected intersection.
  • there is a trajectory conflict or intention conflict between the oncoming vehicle 102 and the self-vehicle 101, and the oncoming vehicle 102 is a game object of the self-vehicle 101.
  • in the scenario of FIG. 3B, the self-vehicle 101 goes straight, while the vehicle 102 coming from the left crosses the lane where the self-vehicle 101 is located and passes.
  • in the scenario of FIG. 3C, the self-vehicle 101 goes straight, and a vehicle 102 traveling in the same direction (to the right front of the self-vehicle 101) merges into the self-vehicle's lane or an adjacent lane.
  • there is a trajectory conflict or intention conflict between the vehicle 102 and the self-vehicle 101, a game interaction relationship will be established with the self-vehicle 101, and the vehicle 102 is a game object of the self-vehicle 101.
  • in the scenario of FIG. 3D, the self-vehicle 101 goes straight, an oncoming vehicle 103 goes straight in the left adjacent lane of the self-vehicle 101, and a stationary vehicle 102 is in the right adjacent lane (to the right front of the self-vehicle 101).
  • there is a trajectory conflict or intention conflict between the oncoming vehicle 103 and the self-vehicle 101, a game interaction relationship will be established with the self-vehicle 101, and the oncoming vehicle 103 is a game object of the self-vehicle 101.
  • the position of the stationary vehicle 102 will conflict with the future trajectory of the self-vehicle 101, but from the acquired external environment information it can be confirmed that, during the interactive game decision-making process, the stationary vehicle 102 will not switch to a moving state, or, even if it does, it has a relatively high right of way and will not establish a game interaction relationship with the self-vehicle 101. Therefore the stationary vehicle 102 is a non-game object of the self-vehicle, and the self-vehicle must unilaterally adjust its own motion state to resolve the trajectory conflict.
  • in the scenario of FIG. 3E, the self-vehicle 101 changes lanes to the right from its current lane to merge into the right adjacent lane.
  • the first straight-going vehicle 103 has a higher right of way, does not establish a game interaction relationship with the self-vehicle 101, and is a non-game object of the self-vehicle 101.
  • the second straight-going vehicle 102, at the right rear of the self-vehicle 101, will have a trajectory conflict with the self-vehicle 101 in the future and will establish a game interaction relationship with it; the second straight-going vehicle 102 is a game object of the self-vehicle 101.
  • this step may include the following sub-steps:
  • S11: the self-vehicle obtains the vehicle-external environment information, which includes the motion states and relative position information of the self-vehicle and the obstacles in the road scene.
  • the self-vehicle can acquire the vehicle-external environment information through its environment information acquisition devices, such as camera sensors, infrared night vision camera sensors, lidars, millimeter-wave radars, ultrasonic radars, GNSS, etc.
  • alternatively, the vehicle may obtain its external environment information through its communication device, by communicating with a roadside device or with a cloud server.
  • the roadside device may have a camera or a communication device, which can obtain vehicle information around it, and the cloud server can receive and store the information reported by each roadside device.
  • the above two methods may also be combined to acquire the vehicle external environment information.
  • S12: according to the acquired motion state of each obstacle, or its motion state over a period of time, or the driving trajectory it has formed, together with the relative position information between the self-vehicle and the obstacle, identify the game objects of the self-vehicle from among the obstacles.
  • in this step, non-game objects of the self-vehicle may also be identified from the obstacles at the same time, or non-game objects of the self-vehicle's game objects may be identified from the obstacles.
  • the aforementioned game objects or non-game objects can be identified according to preset judging rules.
  • the judging rules are, for example: if the driving trajectory or driving intention of an obstacle conflicts with that of the self-vehicle, and the obstacle has behavioral decision-making ability and can change its own motion state, it is a game object of the self-vehicle; if the driving trajectory or driving intention of an obstacle conflicts with that of the self-vehicle, but the obstacle does not actively change its own motion state to avoid the conflict, it is a non-game object of the self-vehicle.
  • the driving trajectory or driving intention of an obstacle can be judged from the lane it is driving in (a straight-going or turning lane), whether its turn signal is on, the direction of its vehicle head, and so on.
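  • the preset judging rules above can be summarized, purely as an illustration with assumed field names, in the following sketch.

```python
# Rule-based classification sketch: conflict + ability to change motion state
# and willingness to avoid => game object; conflict without active avoidance
# (e.g. higher right of way) => non-game object.

def classify_obstacle(conflicts_with_ego: bool,
                      can_change_motion: bool,
                      avoids_actively: bool) -> str:
    if not conflicts_with_ego:
        return "irrelevant_obstacle"
    if can_change_motion and avoids_actively:
        return "game_object"        # two-sided interaction: decide via the game
    return "non_game_object"        # ego must unilaterally resolve the conflict
```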
  • the crossing obstacle and the obstacle merging into the lane both intersect the self-vehicle's trajectory at a large angle, thus have a driving trajectory conflict with the self-vehicle, and are further classified as game vehicles.
  • the oncoming vehicle 103 passing through the narrow lane in FIG. 3D, and the following vehicle 104 behind the self-vehicle when it merges into the side-lane traffic flow in FIG. 3E, have intention conflicts with the self-vehicle and are thus classified as game vehicles.
  • as for the vehicle 103 in front of the position where the self-vehicle merges into the side-lane traffic flow: although their trajectories or intentions conflict, the self-vehicle has a lower right of way than the other vehicle, the other vehicle will not act to resolve the conflict, and the self-vehicle's behavior cannot change the other vehicle's behavior; that other vehicle is therefore a non-game vehicle.
  • in some embodiments, the self-vehicle obtains obstacle information from its perceived or acquired vehicle-external environment information based on known algorithms, and identifies from the obstacles the game objects of the self-vehicle, the non-game objects of the self-vehicle, or the non-game objects of its game objects.
  • the above algorithm can be, for example, a classification neural network based on deep learning; since identifying the type of an obstacle is equivalent to classification, a classification-model neural network can be used for inference and determination.
  • the classification neural network can adopt Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Bidirectional Encoder Representations from Transformers (BERT), etc.
  • when the classification neural network is trained, it can be trained with sample data; the sample data can be pictures or video clips of vehicle driving scenes marked with classification labels, and the classification labels can include game object, non-game object, and non-game object of a game object.
  • the above algorithm may also use attention-related algorithms, such as a modeled attention model.
  • the attention model is used to output the attention value that each obstacle assigns to the self-vehicle, which is related to the degree of intention conflict or trajectory conflict between the obstacle and the self-vehicle. For example, obstacles that have intention conflicts or trajectory conflicts with the self-vehicle will assign more attention to the self-vehicle, while obstacles without such conflicts will assign little or zero attention; obstacles with a higher right of way than the self-vehicle may also assign little or zero attention to the self-vehicle.
  • if an obstacle assigns enough attention to the self-vehicle (for example, above a certain threshold), the obstacle can be identified as a game object of the self-vehicle; if an obstacle assigns little attention to the self-vehicle (for example, below the threshold), the obstacle can be identified as a non-game object of the self-vehicle.
  • the attention model can also be implemented by a neural network, in which case the output of the neural network is the attention value assigned to the self-vehicle by each identified obstacle.
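  • the threshold rule above might be expressed as in the following sketch; the threshold value and the three-way split are illustrative assumptions.

```python
# Map an obstacle's attention value toward the ego vehicle to a class.

ATTENTION_THRESHOLD = 0.3   # illustrative value

def classify_by_attention(attention_to_ego: float) -> str:
    if attention_to_ego >= ATTENTION_THRESHOLD:
        return "game_object"          # strong intention/trajectory coupling
    if attention_to_ego > 0.0:
        return "non_game_object"      # weak coupling: ego adjusts unilaterally
    return "irrelevant_obstacle"      # no interaction with the ego vehicle
```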
  • Step S20: for the interactive game task between the self-vehicle and a game object, perform multiple releases of the multiple strategy spaces of the self-vehicle and the game object; after each of the multiple releases is executed, determine the strategy feasible region of the self-vehicle and the game object according to the strategy spaces released so far, and determine the decision result of self-vehicle driving according to the strategy feasible region.
  • here, the decision result refers to an executable behavior-action pair of the self-vehicle and the game object in the strategy feasible region.
  • step S20 completes the single-vehicle interactive game decision-making process between the self-vehicle and any one game object, and determines the strategy feasible region of the self-vehicle and that game object.
  • in some embodiments, performing multiple releases of the multiple strategy spaces includes releasing the strategy spaces one after another, that is, only one strategy space is released in each release, and after the successive releases multiple strategy spaces have been released cumulatively. In this case, each strategy space is spanned by at least one sampling dimension.
  • considering that, when driving, the self-vehicle usually first accelerates or decelerates in the current lane before changing lanes, the multiple strategy spaces can optionally be released in the following dimension order: longitudinal sampling dimension, lateral sampling dimension, then time sampling dimension. Different dimensions can span different strategy spaces: when the longitudinal sampling dimension is released, the longitudinal sampling strategy space can be spanned; when the lateral sampling dimension is released, the lateral sampling strategy space can be spanned, either alone or jointly with the longitudinal sampling dimension.
  • when the time sampling dimension is released, the strategy space combining the longitudinal sampling strategy space and the lateral sampling strategy space can form multiple strategy spaces composed of multi-frame deduction.
  • the releases may also proceed over combinations of parts of different strategy spaces: for example, the first release covers part of the longitudinal sampling strategy space and part of the lateral sampling strategy space, and subsequent releases cover the remaining part of the longitudinal sampling strategy space and the remaining part of the lateral sampling strategy space.
  • the cumulatively released multiple strategy spaces may include: the longitudinal sampling strategy space; the lateral sampling strategy space; the strategy space formed by combining the longitudinal and lateral sampling strategy spaces; the strategy space formed by combining the longitudinal and lateral sampling strategy spaces with the time sampling dimension; and the strategy space formed by combining the longitudinal sampling, lateral sampling, and time sampling dimensions.
  • the dimensions constituting the strategy spaces may include longitudinal sampling dimensions, lateral sampling dimensions, or time sampling dimensions.
  • the longitudinal sampling dimension used when spanning the longitudinal sampling strategy space includes at least one of the following: the longitudinal acceleration of the self-vehicle and the longitudinal acceleration of the game object;
  • the lateral sampling dimension used when spanning the lateral sampling strategy space includes at least one of the following: the lateral offset of the self-vehicle and the lateral offset of the game object;
  • the time sampling dimension covers multiple strategy spaces composed of consecutive multi-frame deductions corresponding to consecutive time points (that is, successively increasing the deduction depth). Combinations of these three dimensions can constitute the multiple strategy spaces described above.
  • the values in each lateral or longitudinal sampling dimension used when spanning a strategy space correspond to sampling actions, that is, behavior actions, of the self-vehicle or the game object.
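  • as a rough illustration of the time-dimension release, the following sketch increases the deduction depth frame by frame; the helper deduce_frame, which spans and searches one single-frame strategy space, and the loop structure are assumptions of this sketch.

```python
# Minimal sketch of multi-frame deduction along the time sampling dimension.

def multi_frame_deduction(initial_state, deduce_frame, depth=5):
    state, plan = initial_state, []
    for frame in range(depth):             # successively increase deduction depth
        pair, state = deduce_frame(state)  # span and search one single-frame space
        if pair is None:
            return None                    # no feasible pair: caller falls back
        plan.append(pair)                  # one behavior-action pair per frame
    return plan                            # decision sequence over the horizon
```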
  • this step S20 may include the following substeps S21-S26:
  • S21: execute the first release of the strategy space, releasing a strategy space of the self-vehicle and the game object, and take out, one by one from the released strategy space, the behavior-action pairs formed by multiple values in at least one sampling dimension of the self-vehicle and multiple values in at least one sampling dimension of the game object.
  • for example, in the first release, the longitudinal sampling dimension is released, including the longitudinal acceleration of the self-vehicle and the longitudinal acceleration of the game object; the strategy space spanned by the released longitudinal sampling dimension is the longitudinal sampling strategy space, hereinafter referred to as the longitudinal sampling strategy space of the first release for brevity.
  • what is taken out of the strategy space are the behavior-action pairs consisting of a longitudinal acceleration of the self-vehicle to be evaluated and a longitudinal acceleration of the game object (namely the other vehicle).
  • multiple sampling values may be set in each sampling dimension.
  • a sampling interval may be formed by a plurality of sampling values taken uniformly and continuously at a fixed spacing.
  • alternatively, a plurality of sampling values scattered over the sampling dimension are discrete sampling points.
  • uniform sampling is performed on the longitudinal acceleration dimension of the ego vehicle at a predetermined sampling interval, and multiple sampling values of the ego vehicle in the longitudinal acceleration dimension can be obtained, denoted as M1, that is, M1 longitudinal acceleration sampling actions of the ego vehicle.
  • likewise, uniform sampling on the longitudinal acceleration dimension of the game object yields N1 longitudinal acceleration sampling actions of the game object.
  • the longitudinal sampling strategy space released for the first time includes M1*N1 behavior action pairs of the ego vehicle and the game object obtained by combining the ego vehicle’s longitudinal acceleration sampling action and the game object’s longitudinal acceleration sampling action.
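  • As a hedged illustration (not the patented implementation), the following Python sketch spans such a first-released longitudinal sampling strategy space as the Cartesian product of the two vehicles' sampled longitudinal accelerations; the [-4,3] range and the 1 m/s² step are taken from the worked example later in this description, and the function name is a hypothetical placeholder:

    import itertools

    def span_longitudinal_space(ego_range=(-4, 3), obj_range=(-4, 3), step=1.0):
        # Sample each longitudinal acceleration dimension uniformly at the given step.
        ego_samples = [ego_range[0] + i * step
                       for i in range(int((ego_range[1] - ego_range[0]) / step) + 1)]
        obj_samples = [obj_range[0] + i * step
                       for i in range(int((obj_range[1] - obj_range[0]) / step) + 1)]
        # The strategy space is the Cartesian product: M1 * N1 behavior-action pairs.
        return list(itertools.product(ego_samples, obj_samples))

    pairs = span_longitudinal_space()  # 8 * 8 = 64 pairs, matching Table 1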
  • a specific example of this strategy space can be found in Table 1 or Table 2 below.
  • the first row and the first column of Table 1 or Table 2 list the sampled values of the ego vehicle and of the game object (that is, the other vehicle O in the table), respectively.
  • in Table 1 the game object is the crossing game vehicle;
  • in Table 2 the game object is the opposing game vehicle.
  • S22 Deduce each behavior-action pair in the released strategy space into the sub-traffic scene currently constructed by the self-vehicle and the game object, and determine the cost value corresponding to each behavior-action pair.
  • the self-vehicle and each game object can also construct each sub-traffic scene separately, and each sub-traffic scene is a subset of the road scene where the self-vehicle is located.
  • the cost value corresponding to each behavior-action pair in the strategy space is determined according to at least one of the following costs of the ego vehicle and the game object executing the behavior-action pair: the safety cost value, comfort cost value, lateral offset cost value, passability cost value, right-of-way cost value, risk area cost value, and inter-frame correlation cost value.
  • the weighted sum of the above-mentioned cost values can be used.
  • the calculated weighted sum can be called the total cost value. The smaller the total cost value, the greater the benefit of the corresponding decision for the objects performing the behavior action, and the greater the possibility of the behavior action becoming a decision result.
  • the above-mentioned cost values will be further described later.
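  • As an illustrative sketch only, the total cost value can be computed as a weighted sum of the normalized cost terms; the concrete weight numbers below are assumptions that merely respect the weight ordering given later (safety > right of way > lateral offset > passability > comfort > risk area > inter-frame correlation):

    # Assumed weights; only their ordering follows the description.
    WEIGHTS = {"safety": 0.30, "right_of_way": 0.20, "lateral_offset": 0.15,
               "passability": 0.13, "comfort": 0.10, "risk_area": 0.07,
               "inter_frame": 0.05}

    def total_cost(costs):
        # costs maps each term name to a normalized cost value in [0, 1].
        return sum(WEIGHTS[name] * costs.get(name, 0.0) for name in WEIGHTS)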
  • S23 Add the behavior action pair whose cost value is not greater than the cost threshold to the strategy feasible region between the self-vehicle and the game object.
  • the strategy feasible region is the result of the game between the ego vehicle and the game object when the strategy space is released for the first time.
  • the strategy feasible region refers to the set of executable behavior-action pairs.
  • the table items whose table content is Cy or Cg in Table 1 below constitute the feasible domain of the strategy.
  • the second release of the strategy space is performed, that is, the next strategy space among the multiple strategy spaces is released.
  • for example, the second release is of the lateral sampling dimension,
  • and the lateral sampling strategy space is spanned by the lateral offset.
  • the lateral sampling strategy space released this time and the longitudinal sampling strategy space released the first time are jointly used as the current strategy space.
  • that is, the strategy space currently used for the interactive game between the ego vehicle and the game object is the strategy space formed by combining the longitudinal sampling strategy space and the lateral sampling strategy space.
  • the lateral sampling strategy space is spanned on the lateral offset dimension of the ego vehicle and the lateral offset dimension of the game object.
  • uniform sampling is performed at a predetermined sampling interval, and multiple sampling values of the ego vehicle on the lateral offset dimension can be obtained, denoted as Q, that is, Q lateral offset sampling actions of the ego vehicle.
  • uniform sampling is likewise carried out at a predetermined sampling interval, and multiple sampling values of the game object on the lateral offset dimension can be obtained, denoted as R, that is, R lateral offset sampling actions of the game object.
  • each behavior-action pair of the ego vehicle and the game object is jointly formed by the ego vehicle's lateral offset sampling action, the game object's lateral offset sampling action, the ego vehicle's longitudinal acceleration sampling action, and the game object's longitudinal acceleration sampling action.
  • the strategy space of the ego vehicle and the game object released the second time therefore includes M2*N2*Q*R behavior-action pairs.
  • a specific example is shown in Table 3 below, wherein each table item in the upper lateral sampling strategy space in Table 3 is associated with a table item in the lower longitudinal sampling strategy space in Table 3.
  • in Table 3 the game object (that is, the other vehicle O in the table) is the opposing game vehicle.
  • step S26 When the strategy feasible region (i.e., the game result) in step S25 is not empty, select a behavior-action pair as the decision result, and end the release of the strategy space.
  • when the strategy feasible region is empty, it means that the current strategy space has no solution, and at this time the third release of the strategy space is performed, that is, the next strategy space among the multiple strategy spaces is released. In this way, further strategy spaces can be released successively according to the above method, so as to continue determining the game result and the decision result.
  • in the above multiple releases, the strategy space spanned by the i-th group of values of the ego vehicle and/or the game object on the longitudinal acceleration dimension and the i-th group of values of the ego vehicle and/or the game object on the lateral offset dimension can be released first;
  • when there is no strategy feasible region in that strategy space, the strategy space spanned by the (i+1)-th group of values on the longitudinal acceleration dimension and the (i+1)-th group of values of the ego vehicle and/or the game object on the lateral offset dimension is released.
  • in other words, the strategy space released each time moves its local position inside the game space spanned by all the values of the ego vehicle and/or the game object on the longitudinal acceleration dimension and all the values of the ego vehicle and/or the game object on the lateral offset dimension.
  • here, i is a positive integer.
  • by releasing the strategy spaces corresponding to different parts successively, the optimal decision result can be obtained in the multi-dimensional game space with the fewest searches, the strategy space actually used can be kept as small as possible, and the requirements on hardware computing power can be reduced.
  • if the strategy feasible region between the ego vehicle and the game object is still empty after the above steps have been executed multiple times to release the strategy spaces, it means that there is still no solution, and the ego vehicle adopts a conservative decision.
  • the above-mentioned conservative decision includes a behavior that makes the ego vehicle brake safely, a behavior that makes the ego vehicle decelerate safely, or giving a prompt or warning so that the driver takes over control of the vehicle.
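  • A minimal Python sketch of this progressive release with a conservative fallback, under the assumption of hypothetical helpers (strategy_spaces yielding each space in release order, evaluate_cost returning the total cost value) that are not defined in the source:

    def decide(strategy_spaces, evaluate_cost, cost_threshold, conservative_action):
        # strategy_spaces is an iterable that yields each strategy space on demand,
        # in the release order (longitudinal, lateral, combined, ...).
        for space in strategy_spaces:
            feasible = [pair for pair in space if evaluate_cost(pair) <= cost_threshold]
            if feasible:                      # non-empty strategy feasible region
                return min(feasible, key=evaluate_cost)
        return conservative_action            # e.g. safe braking or driver takeover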
  • a single-frame deduction is completed.
  • determining the strategy feasible region may also include: following the development of the deduction time (that is, sequentially increasing the deduction depth over multiple consecutive moments), performing multiple releases of the time sampling dimension and performing multi-frame deduction.
  • T1 is used to indicate the initial motion state of the ego vehicle and the game object;
  • T2 is used to indicate the motion state of the ego vehicle and the game object after the first-frame deduction, that is, the deduction result of the first frame;
  • Tn is used to indicate the motion state of the ego vehicle and the game object after the deduction of frame n-1.
  • after each frame, the deduction time is moved backward by a predetermined time interval (e.g., 2 seconds or 5 seconds), that is, moved to the next moment (or time point).
  • the deduction result of the current frame is used as the initial condition of the next frame to deduce the motion state of the ego vehicle and the game object at the next moment; in this way, the time sampling dimension can continue to be released at subsequent moments, so as to continue executing subsequent continuous multi-frame deduction and continue determining the game results and decision results.
  • when releasing the time sampling dimension, it is necessary to evaluate the decision results of the behavior decisions between the ego vehicle and the game object determined by two adjacent single-frame deductions and to determine the inter-frame correlation cost value, which will be described in detail later.
  • releasing the time sampling dimension helps improve the behavior consistency of the vehicle. For example, when the motion states or decision results corresponding to the intention decisions of the ego vehicle and the game object are the same or similar across frames, the driving behavior of the intelligent driving vehicle executing the intelligent driving decision method is more stable in the time domain, the fluctuation of the driving trajectory is smaller, and the driving comfort of the vehicle is better.
  • releasing the time sampling dimension means that, at multiple consecutive decision moments, the released multiple strategy spaces are used to obtain the strategy feasible regions corresponding to the ego vehicle and the game object, and the motion states of the ego vehicle and the game object after executing the executable behavior-action pairs corresponding to these strategy feasible regions are deduced in time order.
  • in this way, the consistency of decision results over time can be achieved.
  • the decision result of the first frame in the multi-frame derivation can be used as the decision result of the self-vehicle driving.
  • the decision result of the first frame may be reselected.
  • the decision result of the first frame corresponds to the decision result of single-frame deduction, and the decision result of the first frame is reselected, that is, another behavior-action pair is selected as the decision result from the policy feasible domain of the decision result of the first frame.
  • multi-frame deduction can be performed again to judge whether it can be used as the final decision result.
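  • A hedged sketch of such multi-frame deduction, assuming hypothetical helpers single_frame_decide and propagate; the deduction result of each frame seeds the next frame, and the first-frame decision is returned only if every deeper frame still has a feasible solution:

    def multi_frame_deduction(state, single_frame_decide, propagate, depth=3):
        decisions = []
        for _ in range(depth):                # T1 -> T2 -> ... -> Tn in FIG. 7
            decision = single_frame_decide(state)
            if decision is None:              # no feasible solution at this depth:
                return None                   # caller may reselect the first frame
            decisions.append(decision)
            state = propagate(state, decision)  # frame result seeds the next frame
        return decisions[0]                   # first-frame decision drives the car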
  • the behavior-action pairs can be ranked according to their corresponding cost values, and the behavior-action pair with the smallest total cost value is preferably selected;
  • that behavior-action pair corresponds to the decision result.
  • the above-mentioned different cost values may have different weights, respectively referred to as the safety weight, comfort weight, lateral offset weight, passability weight, right-of-way weight, risk area weight, and inter-frame correlation weight.
  • the weight distribution may be assigned as follows: safety weight>right of way weight>lateral offset weight>passability weight>comfortability weight>risk area weight>inter-frame correlation weight.
  • the above cost values may be normalized respectively, and the value range is [0,1].
  • the above cost values can be calculated according to different cost functions, respectively referred to as the safety cost function, comfort cost function, passability cost function, lateral offset cost function, and right-of-way cost function.
  • the safety cost value can be calculated according to a safety cost function whose independent variable is the relative distance between the ego vehicle and the other vehicle (that is, the game object), and the safety cost value is negatively correlated with the relative distance. For example, the greater the relative distance between the two vehicles, the smaller the safety cost.
  • a normalized safety cost function is the following piecewise function, where C dist is the safety cost value, and dist is the relative distance between the ego vehicle and the game object, defined, for example, as the minimum polygon distance between the ego vehicle and the game object:
  • threLow is the lower threshold of the distance, which is 0.2 in FIG. 8A
  • threHigh is the upper threshold of the distance, and is 1.2 in FIG. 8A
  • the distance lower threshold threLow and the distance upper threshold threHigh can be dynamically adjusted according to the interaction between the self-vehicle and other vehicles, such as the relative speed, relative distance, and relative angle between the self-vehicle and other vehicles.
  • the safety cost value defined by the safety cost function is positively correlated with the relative velocity or relative angle. For example, when two vehicles meet in opposite directions or laterally (laterally meaning that the other vehicle crosses the ego vehicle's path), the greater the relative speed or relative angle of the interaction between the two vehicles, the greater the corresponding safety cost.
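  • A sketch of a normalized safety cost consistent with the description: the cost is maximal below the lower distance threshold (0.2 in FIG. 8A) and zero above the upper threshold (1.2); the linear interpolation in between is an assumed form, since the exact piecewise expression is not reproduced here:

    def safety_cost(dist, thre_low=0.2, thre_high=1.2):
        if dist <= thre_low:
            return 1.0                        # too close: maximum penalty
        if dist >= thre_high:
            return 0.0                        # far enough: no penalty
        return (thre_high - dist) / (thre_high - thre_low)  # assumed linear decay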
  • the comfort cost value of the vehicle can be calculated according to a comfort cost function whose independent variable is the absolute value of the acceleration change (ie, jerk).
  • a normalized comfort cost function is the following piecewise function, where C comf is the comfort cost value, and jerk is the acceleration change of the ego vehicle or the game object:
  • threMiddle is the jerk middle point threshold, which is 2 in the example shown in FIG. 8B
  • threHigh is the jerk upper limit threshold, which is 4 in FIG. 8B
  • C middle is the jerk cost slope. That is, the greater the acceleration change of the vehicle, the worse the comfort and the greater the comfort cost, and the comfort cost increases faster after the acceleration change of the vehicle is greater than the intermediate point threshold.
  • the acceleration variation of the vehicle may be a longitudinal acceleration variation, a lateral acceleration variation, or a weighted sum of the two.
  • the comfort cost may be the comfort cost of the self-vehicle, or the comfort cost of the game object, or the weighted sum of the comfort cost of the two.
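  • A sketch of the normalized comfort cost under assumed slopes: small below the jerk middle-point threshold (2 in FIG. 8B), rising faster up to the upper threshold (4), and saturating at 1; the exact segment values are assumptions:

    def comfort_cost(jerk, thre_middle=2.0, thre_high=4.0):
        j = min(abs(jerk), thre_high)
        if j <= thre_middle:
            return 0.25 * j / thre_middle     # assumed gentle slope below the middle point
        # assumed steeper slope (the "C middle" segment) between the two thresholds
        return 0.25 + 0.75 * (j - thre_middle) / (thre_high - thre_middle)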
  • the passability cost value can be calculated according to a passability cost function with the velocity variation of the ego vehicle or the game object as an independent variable. For example, a vehicle giving way with a large deceleration will result in a large speed loss (the difference between the current speed and the future speed, that is, acceleration) or a long waiting time, and the vehicle's passability cost will increase. For example, a vehicle rushing with a large acceleration will result in a large speed increase (the difference between the current speed and the future speed, that is, the acceleration) or a short waiting time, and the vehicle's passability cost will decrease.
  • the passability cost value can also be calculated according to a passability cost function whose independent variable is the relative speed ratio of the ego vehicle and the game object. For example, before the behavior-action pair is executed, the absolute value of the ego vehicle's speed accounts for a large proportion of the sum of the absolute speed values of the ego vehicle and the game object, and the absolute value of the game object's speed accounts for a small proportion of that sum. After executing the behavior-action pair, if the ego vehicle yields with a greater deceleration, its speed loss increases and its speed ratio decreases, so the passability cost of the ego vehicle performing the behavior action is greater. If, after executing the behavior-action pair, the game object rushes forward with a relatively large acceleration, its speed increases and its speed ratio increases, so the passability cost corresponding to the game object executing the behavior action is relatively small.
  • the passability cost value can be the ego vehicle passability cost value corresponding to the ego vehicle performing the behavior action, or the game object passability cost value corresponding to the game object performing the behavior action, or the weighted sum of the two passability cost values.
  • the normalized passability cost function is the following piecewise function, where C pass is the passability cost value, and speed is the absolute value of the speed of the vehicle:
  • the absolute value of the vehicle's intermediate speed is speed0
  • the maximum value of the vehicle's absolute speed is speed1
  • C middle is the slope of the speed cost. That is, the greater the absolute value of the speed of the vehicle, the better the passability and the smaller the passability cost, and the passability cost decreases faster after the absolute value of the vehicle speed is greater than the intermediate point threshold.
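  • A sketch of the normalized passability cost under assumed breakpoints: the text names speed0, speed1, and the slope C middle but does not fix their values, so the numbers below are placeholders; the cost decreases with the absolute speed and decreases faster beyond speed0:

    def passability_cost(speed, speed0=5.0, speed1=15.0):
        v = min(abs(speed), speed1)
        if v <= speed0:
            return 1.0 - 0.25 * v / speed0    # assumed gentle decrease up to speed0
        # assumed steeper decrease (the "C middle" segment) beyond speed0
        return 0.75 * (speed1 - v) / (speed1 - speed0)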
  • the right-of-way information corresponding to a vehicle can be determined according to the obtained user portrait of the ego vehicle or the game object. For example, if the driving behavior of the game object belongs to an aggressive style and it is more inclined to adopt a rushing decision, it has a high right of way; if the driving behavior of the game object is conservative and it is more inclined to adopt a yielding strategy, it has a low right of way. A high right of way tends to maintain the established motion state or established driving behavior, and a low right of way tends to change the established motion state or established driving behavior.
  • the user profile can be determined according to the user's gender, age, or completion of historical actions.
  • the data required for determining the user portrait may be acquired by the cloud server, which may determine the user portrait. If the ego vehicle and/or the game object executing the behavior action causes a vehicle with a high right of way to change its motion state, the right-of-way cost corresponding to that behavior action is greater, and the benefit is smaller.
  • penalties are increased by assigning a higher right-of-way cost value to behavior decisions that cause a vehicle with a high right of way to change its motion state. That is to say, through this feedback mechanism, behavior actions that let the high-right-of-way vehicle maintain its current motion state have a greater right-of-way benefit, that is, a smaller right-of-way cost.
  • the normalized road right cost function is the following piecewise function, where C roadRight is the road right cost value, and acc is the absolute value of the acceleration of the vehicle:
  • threHigh is the acceleration upper limit threshold, which is 1 in FIG. 8D. That is, the greater the acceleration of the vehicle, the greater the right-of-way cost value.
  • the right-of-way cost function makes the behavior of the high-right-of-way vehicle maintain the current motion state have a small right-of-way cost value, thereby preventing the behavior of the high-right-of-way vehicle from changing the current motion state from becoming a decision result.
  • the acceleration of the vehicle may be a longitudinal acceleration or a lateral acceleration. That is to say, in the dimension of lateral offset, a large lateral change will also make the cost of the right of way larger.
  • the right-of-way cost value can be the right-of-way cost value corresponding to the ego vehicle performing the behavior action, or the right-of-way cost value corresponding to the game object performing the behavior action, or the weighted sum of the two right-of-way cost values.
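  • A sketch of the normalized right-of-way cost: the larger the acceleration change imposed on the vehicle, the larger the cost, saturating at the upper threshold of 1 from FIG. 8D; the linear ramp itself is an assumed form:

    def right_of_way_cost(acc, thre_high=1.0):
        # Larger acceleration imposed on a high-right-of-way vehicle costs more;
        # the cost saturates at the upper threshold (1 in FIG. 8D).
        return min(abs(acc) / thre_high, 1.0)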
  • when a vehicle is in a risk area of the road (in this area the vehicle faces a greater driving risk), it needs to leave the risk area as soon as possible so that it will not have a serious impact on traffic.
  • therefore, it is preferred that a vehicle in a risk area of the road does not give way; that is, behavior decisions that would cause the vehicle in the risk area to give way are abandoned (such behavior decisions have a large risk area cost value), and the decision result in which the vehicle in the risk area rushes to leave the risk area as soon as possible is chosen (it has a small risk area cost value), so as to prevent the vehicle in the risk area from being stranded and seriously affecting traffic.
  • the risk area cost value can be the risk area cost value corresponding to the ego vehicle in a risk area of the road executing the behavior action, or the risk area cost value corresponding to the game object in a risk area of the road executing the behavior action, or the weighted sum of the two risk area cost values.
  • the lateral offset cost can be calculated according to the lateral offset of the ego vehicle or the game object.
  • the normalized lateral offset cost function is, in the right half-space, the following piecewise function, where C offset is the lateral offset cost value and offset is the lateral offset of the vehicle in meters;
  • the expression for the left half-space can be obtained by mirroring the expression for the right half-space of the coordinate plane:
  • threMiddle is the middle value of the lateral offset, for example the soft boundary of the road; C middle is the first lateral offset cost slope; threHigh is the upper threshold of the lateral offset, for example the hard boundary of the road. That is, the larger the lateral offset of the vehicle, the smaller the lateral offset benefit and the greater the lateral offset cost; after the vehicle's lateral offset exceeds the middle value, the lateral offset cost increases faster to increase the penalty, and after the lateral offset exceeds the upper threshold, the lateral offset cost value is fixed, for example at 1.2, to increase the penalty.
  • the lateral offset cost value can be the lateral offset cost value corresponding to the ego vehicle performing the behavior action, or the lateral offset cost value corresponding to the game object performing the behavior action, or the weighted sum of the two lateral offset cost values.
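  • A sketch of the lateral offset cost: the left half-space is obtained by mirroring (via the absolute value), the cost rises slowly up to the soft boundary threMiddle and faster up to the hard boundary threHigh, and is fixed at 1.2 beyond it; the 1.2 cap comes from the text, while the boundary positions and slopes are assumptions:

    def lateral_offset_cost(offset, thre_middle=1.5, thre_high=3.0):
        o = abs(offset)                       # the left half-space mirrors the right
        if o <= thre_middle:
            return 0.3 * o / thre_middle      # assumed first slope up to the soft boundary
        if o <= thre_high:                    # assumed steeper slope up to the hard boundary
            return 0.3 + 0.7 * (o - thre_middle) / (thre_high - thre_middle)
        return 1.2                            # fixed penalty beyond the hard boundary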
  • for example, suppose the intention decision of the ego vehicle in the previous frame K is to rush the game object.
  • if the intention decision of the ego vehicle in the current frame K+1 is also to rush the game object, the corresponding inter-frame correlation cost value will be small, such as 0.3, while the default value is 0.5, so it acts as a bonus.
  • if instead the intention decision of the ego vehicle in the current frame K+1 is to yield to the game object,
  • the corresponding inter-frame correlation cost value will be larger, such as 0.8, compared with the default value of 0.5, so it acts as a penalty.
  • this makes the strategy whose intention decision in the current frame is again to rush the game object more likely to become the feasible solution of the current frame.
  • the inter-frame correlation cost value can be calculated according to the ego vehicle's intention decision in the previous frame and its intention decision in the current frame, or according to the game object's intention decision in the previous frame and its intention decision in the current frame, or obtained by weighting the ego vehicle's and the game object's inter-frame correlation cost values.
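  • A sketch of the inter-frame correlation cost using exactly the example numbers above (0.3 bonus for a consistent intention decision, 0.8 penalty for a flip, both relative to the 0.5 default, and zero in the first frame):

    def inter_frame_cost(prev_intention, curr_intention):
        if prev_intention is None:            # first frame: no previous decision
            return 0.0
        if curr_intention == prev_intention:  # consistent intention: bonus vs 0.5 default
            return 0.3
        return 0.8                            # inconsistent flip: penalty vs 0.5 default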
  • steps S30 and/or S40 may also be included:
  • Step S30 The own vehicle generates the longitudinal/lateral control amount according to the decision result, so that the driving system of the own vehicle executes the longitudinal/lateral control amount to realize the expected driving trajectory of the own vehicle.
  • the control device of the ego vehicle generates the longitudinal/lateral control amount according to the decision result and sends it to the driving system 13, so that the driving system 13 can execute the driving control of the vehicle, including power control, steering control, and braking control, and the vehicle follows the desired driving trajectory of the ego vehicle according to the decision result.
  • Step S40 Display the decision result on the display device in a manner understandable to the user.
  • the decision result of ego vehicle driving includes the behavior action of the ego vehicle.
  • from it, the intention decision of the ego vehicle, such as rushing, yielding, or avoiding, can be determined, and the desired driving trajectory of the ego vehicle can also be predicted.
  • the decision result is displayed on the display device in the cockpit of the vehicle in a manner understandable to the user, such as the expected driving trajectory, arrows indicating the decision-making, and words indicating the decision-making.
  • when the desired driving trajectory is displayed, it may be combined with the vehicle's current traffic scene (such as a graphical traffic scene) and displayed on a display device in the vehicle cockpit in the form of a partially enlarged view.
  • a voice playback system may also be included, which can prompt the user for the intention decision or policy label that has been decided by playing voice.
  • considering the one-way interactive decision between the ego vehicle and a non-game object of the ego vehicle, or the one-way interactive decision between a game object of the ego vehicle and a non-game object of that game object, as in another embodiment shown in FIG. 9, the following step is further included between the above step S10 and step S20:
  • S15 Constrain the strategy space of the ego vehicle or the game object through the motion state of the non-game object.
  • the value range of the ego vehicle in each sampling dimension can be constrained by the motion state of the non-game object of the ego vehicle;
  • the value range of the game object of the ego vehicle in each sampling dimension can be constrained by the motion states of the non-game objects of that game object.
  • the value range may be one or more sampling intervals on the sampling dimension, or may be a plurality of discrete sampling points.
  • the value range after the constraint may be a partial value range.
  • step S15 includes: for the one-way interaction process between the ego vehicle and its non-game object, determining the value range of the ego vehicle in each sampling dimension under the constraints of the motion state of the non-game object; or, for the one-way interaction process between the game object of the ego vehicle and the non-game object of that game object, determining the value range of the game object of the ego vehicle in each sampling dimension under the constraints of the motion state of the non-game object of that game object.
  • then step S20 is performed. This is conducive to narrowing the game space and strategy space between the ego vehicle and the game objects of the ego vehicle in the process of single-vehicle interactive game decision-making, and reduces the computing power used in the interactive game decision-making process.
  • step S15 may include: first, receiving the motion state of the non-game object of the ego vehicle and observing the feature quantities of the non-game object; then, calculating the conflict area between the ego vehicle and the non-game object and determining the feature quantity of the ego vehicle, that is, the critical action.
  • this yields the feasible interval, that is, the value range of the ego vehicle in each sampling dimension after being constrained by the non-game object.
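  • A hedged sketch of such a constraint: the motion state of a non-game object (for example, a slower vehicle ahead in the same lane) prunes the ego vehicle's longitudinal acceleration samples to a feasible sub-interval; the constant-speed gap check over a short horizon is an illustrative assumption, not the source's method:

    def constrain_acc_samples(acc_samples, gap, ego_speed, lead_speed, horizon=2.0):
        feasible = []
        for acc in acc_samples:
            ego_travel = ego_speed * horizon + 0.5 * acc * horizon ** 2
            lead_travel = lead_speed * horizon   # non-game object at constant speed
            if gap + lead_travel - ego_travel > 0.0:  # no overlap within the horizon
                feasible.append(acc)
        return feasible                          # the constrained value range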
  • when the ego vehicle has a non-game object C, it is also possible to first process the interactive game decision between the ego vehicle A and the game object B of the ego vehicle and determine the corresponding strategy feasible region AB, then introduce the non-game feasible region AC of the ego vehicle and the non-game object C, and then take the intersection of the strategy feasible region AB and the non-game feasible region AC to obtain the final strategy feasible region ABC, based on which the decision result of ego vehicle driving is determined.
  • when the game object B of the ego vehicle has a non-game object D, it is likewise possible to first process the interactive game decision between the ego vehicle A and the game object B and determine the corresponding strategy feasible region AB, then introduce the non-game feasible region BD of the game object B and its non-game object D, and then take the intersection of the strategy feasible region AB and the non-game feasible region BD to obtain the final feasible region ABD, based on which the decision result for the ego vehicle is determined.
  • alternatively, the interactive game decision between the ego vehicle A and the game object B can be processed first to determine the corresponding strategy feasible region AB; the non-game feasible region AC of the ego vehicle A and the non-game object C and the non-game feasible region BD of the game object B and its non-game object D are then introduced, and the intersection of the strategy feasible region AB, the non-game feasible region AC, and the non-game feasible region BD is taken to obtain the final feasible region ABCD, based on which the decision result for ego vehicle driving is determined.
  • the above specifically exemplifies the steps of determining the executable behavior action pairs of the self-vehicle and the game object by successively releasing the multiple strategy spaces of the self-vehicle and a single game object.
  • when the ego vehicle has multiple game objects, the intelligent driving decision method provided by the embodiment of the present application includes:
  • Step 1 From the multiple strategy spaces of the ego vehicle and the first game object, execute successive releases of each of the strategy spaces, and determine the ego vehicle's strategy feasible region for the first game object. Determining this strategy feasible region is similar to the aforementioned step S20 and is not repeated here.
  • Step 2 From the plurality of strategy spaces of the self-vehicle and the second game object, execute successive releases of each of the strategy spaces, and determine the feasible domain of the self-vehicle's driving strategy for the second game object.
  • determining the ego vehicle's strategy feasible region for the second game object is similar to the aforementioned step S20 and is not repeated here.
  • Step 3 Determine the decision result of the driving of the self-vehicle according to the feasible domains of each strategy of the own vehicle and each of the game objects.
  • for example, the final strategy feasible region is obtained by intersecting the individual strategy feasible regions, and the decision result is then determined from the final strategy feasible region.
  • the decision result may be the behavior-action pair with the least cost in the feasible domain of the strategy.
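  • A minimal sketch of this intersection step, assuming each strategy feasible region is represented as a set of (ego action, object action) pairs and that the final decision minimizes an assumed per-action cost:

    def intersect_and_decide(feasible_regions, ego_action_cost):
        # Each region is a set of (ego_action, object_action) pairs; the final
        # feasible region is the intersection over the ego vehicle's own actions.
        common = set.intersection(*[{pair[0] for pair in region}
                                    for region in feasible_regions])
        if not common:
            return None                       # empty intersection: conservative fallback
        return min(common, key=ego_action_cost)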
  • the ego vehicle acquires the environment information outside the vehicle through the environment information acquisition device 11 .
  • S120 The ego vehicle determines game objects and non-game objects.
  • This step can be referred to the aforementioned step S12, and will not be repeated here.
  • for example, it is determined that one game object of the ego vehicle is a crossing game vehicle, and another game object is an opposing game vehicle.
  • S130 From the plurality of strategy spaces of the self-vehicle and the traversing game vehicle, perform sequential release of each of the strategy spaces, and determine the game result of the self-vehicle and the traversing game vehicle. Specifically, the following steps S131-S132 may be included:
  • the longitudinal acceleration dimension of the self-vehicle and the crossing game car is released, and the first longitudinal sampling strategy space of the self-vehicle and the crossing game car is created.
  • the value range of the longitudinal acceleration of the ego vehicle and of the crossing game vehicle is [-4,3], in units of m/s², where m means meter and s means second.
  • the sampling interval for the ego vehicle and the crossing game vehicle is determined to be 1 m/s².
  • the spanned strategy space, displayed in two-dimensional form, is shown in Table 1.
  • the first row of Table 1 lists all values Ae of the longitudinal acceleration of the ego vehicle, and the first column lists all values Ao1 of the longitudinal acceleration of the crossing game vehicle. That is, the longitudinal sampling strategy space of the ego vehicle and the crossing game vehicle released this time includes 8 times 8, that is, 64 longitudinal acceleration behavior-action pairs of the ego vehicle and the crossing game vehicle.
  • S132 According to the pre-defined cost value determination methods, such as each cost function, calculate the cost value corresponding to each behavior action pair in the longitudinal sampling strategy space of the self-vehicle and the crossing game vehicle, and determine the feasible region of the strategy.
  • after the ego vehicle and the crossing game vehicle execute any of 3 plus 13, that is, 16 behavior-action pairs, in the sub-traffic scene constructed by the ego vehicle and the crossing game vehicle, the weighted sum of the safety cost value, comfort cost value, passability cost value, lateral offset cost value, right-of-way cost value, risk area cost value, and inter-frame correlation cost value is not greater than the preset cost threshold; these are the feasible solutions in the strategy space and constitute the strategy feasible region of the ego vehicle and the crossing game vehicle.
  • since no lateral offset is involved, the lateral offset cost value is zero, and because this is the decision of the current frame and does not involve the decision result of a previous frame, the inter-frame correlation cost value is zero.
  • the interactive game between the ego vehicle and the crossing game vehicle finds enough feasible solutions in the longitudinal sampling strategy space, so it is not necessary to continue searching in the lateral sampling dimension.
  • the number of searched behavior-action pairs is 64, and the current round of the game consumes little computing power and computation time.
  • for 3 of these behavior-action pairs, the behavior decision of the ego vehicle and the crossing game vehicle is: the ego vehicle accelerates, and the crossing game vehicle decelerates.
  • after the ego vehicle and the crossing game vehicle perform any of these 3 sampling actions, the ego vehicle will pass before the crossing game vehicle; therefore, it is determined that the intention decision corresponding to these behavior-action pairs is that the ego vehicle rushes ahead of the crossing game vehicle.
  • accordingly, the ego vehicle rushing decision label, that is, "Cg" in Table 1, is set for these three behavior-action pairs.
  • for 13 of these behavior-action pairs, the behavior decision of the ego vehicle and the crossing game vehicle is: the crossing game vehicle accelerates, and the ego vehicle decelerates.
  • based on the motion states of the ego vehicle and the crossing game vehicle acquired at the beginning of the decision in the current frame deduction, it can be deduced that after the ego vehicle and the crossing game vehicle perform any of these 13 sampling actions, the crossing game vehicle will pass before the ego vehicle; therefore, it is determined that the intention decision corresponding to these behavior-action pairs is that the crossing game vehicle rushes ahead of the ego vehicle.
  • accordingly, the crossing game vehicle rushing decision label, that is, "Cy" in Table 1, is set for these behavior-action pairs.
  • S140 From the plurality of strategy spaces of the self-vehicle and the opposing game vehicle, execute successive releases of each of the strategy spaces, and determine the game result of the self-vehicle and the opposing game vehicle. Specifically, the following steps S141-S144 may be included:
  • S141 According to the principle of releasing the longitudinal sampling dimension first and then the lateral sampling dimension, release the longitudinal sampling dimension of the ego vehicle and the opposing game vehicle, and span the first longitudinal sampling strategy space of the ego vehicle and the opposing game vehicle.
  • the spanned strategy space, displayed in two-dimensional form, is shown in Table 2.
  • the first row of Table 2 lists all values Ae of the longitudinal acceleration of the ego vehicle; the first column lists all values Ao2 of the longitudinal acceleration of the opposing game vehicle. That is, the longitudinal sampling strategy space of the ego vehicle and the opposing game vehicle released this time includes 8 times 8, a total of 64 longitudinal acceleration behavior-action pairs of the ego vehicle and the opposing game vehicle.
  • S142 According to the predefined cost value determination methods, such as the cost functions, respectively calculate the cost value corresponding to each behavior-action pair in the longitudinal sampling strategy space of the ego vehicle and the opposing game vehicle, and determine the strategy feasible region.
  • in this example, for every behavior-action pair, the safety cost value or the passability cost value is greater than the preset cost threshold, so there is no feasible solution in the strategy space released the first time, and the strategy feasible region of the ego vehicle and the opposing game vehicle is empty.
  • S143 Release the lateral offset dimension of the own vehicle, and form a second strategy space of the own vehicle and the opposing game vehicle with the longitudinal acceleration dimension of the own vehicle and the opposing game vehicle.
  • release part of the values of the self-vehicle on the lateral offset dimension and part of the values of the self-vehicle and the opposing game car on the longitudinal acceleration dimension and create a second strategy space for the ego-vehicle and the opposing game car.
  • Figure 10 shows a schematic diagram of the lateral sampling actions determined for the lateral offset sampling for the two vehicles, respectively. That is, the multiple lateral deviation behaviors correspond to multiple mutually parallel lateral deviation trajectories that the vehicle can execute.
  • the first row of the upper sub-table of Table 3 lists all values Oe of the lateral offset of the ego vehicle, and the first column lists all values Oo2 of the lateral offset of the opposing game vehicle. Therefore, the lateral sampling strategy space spanned by the ego vehicle and the opposing game vehicle on the lateral offset dimension includes at most 7 times 7, that is, 49 lateral offset behavior-action pairs of the ego vehicle and the opposing game vehicle.
  • according to the cost value determination methods, such as the cost functions, calculate the cost value corresponding to each behavior-action pair in the strategy space released the second time, which is spanned by the partial values of the ego vehicle on the lateral offset dimension and the partial values of the ego vehicle and the opposing game vehicle on the longitudinal acceleration dimension, and determine the strategy feasible region.
  • for example, when the lateral offset value of the ego vehicle is 1, among the 64 released longitudinal acceleration behavior-action pairs of the ego vehicle and the opposing game vehicle, after the ego vehicle and the opposing game vehicle perform 48 of the sampling actions, in the sub-traffic scene constructed by the two vehicles, the weighted sum of the safety, comfort, passability, lateral offset, right-of-way, risk area, and inter-frame correlation cost values is not greater than the preset cost threshold; these are the feasible solutions in the strategy space and constitute the strategy feasible region of the ego vehicle and the opposing game vehicle.
  • these 48 action pairs are identified with the label "1".
  • the inter-frame correlation cost value is zero.
  • for example, when the longitudinal accelerations of the ego vehicle and the opposing game vehicle are both -1, the label of the corresponding behavior-action pair in the sub-table of Table 3 is adjusted from "-1" to "0". This is because, when the ego vehicle laterally shifts to the right by 1 m, the ego vehicle and the opposing game vehicle no longer have a collision risk, but the passability is too poor (due to braking), so the pair is still an infeasible solution; only the label is adjusted from "-1" to "0".
  • the intention decision can also be determined for the ego vehicle and the opposing game vehicle, and a policy label can be set, refer to step S132, and will not be repeated here.
  • the release of the strategy spaces between the ego vehicle and the opposing game vehicle described above can also select multiple sampling values on the lateral offset dimension of the ego vehicle, for example a lateral offset value of the ego vehicle of 2 or 3, and span more strategy spaces jointly with the longitudinal acceleration sampling strategy space of the ego vehicle and the opposing game vehicle.
  • S150 Find the intersection of the strategically feasible domains of the own vehicle and the opposing game vehicle and the strategically feasible domains of the own vehicle and the crossing game vehicle, and determine the game result of the own vehicle.
  • Table 4 shows the feasible solution with the minimum cost value (that is, the best benefit) found in the common feasible region of the strategy feasible region of the ego vehicle and the opposing game vehicle in Table 3 and the strategy feasible region of the ego vehicle and the crossing game vehicle in Table 1.
  • the feasible solution is the game decision behavior-action pair of the ego vehicle, the opposing game vehicle, and the crossing game vehicle,
  • that is, a multi-vehicle behavior-action pair formed by combining lateral offsets and longitudinal accelerations.
  • specifically: the ego vehicle decelerates to give way with a longitudinal acceleration of -2 m/s² and laterally shifts to the right by 1 m to avoid the opposing game vehicle; to ensure passability, the crossing game vehicle accelerates through the conflict area with a longitudinal acceleration of 1 m/s²; the opposing game vehicle accelerates through the conflict area with a longitudinal acceleration of 1 m/s².
  • the corresponding intention decisions are: the crossing game vehicle rushes ahead of the ego vehicle, the opposing game vehicle rushes ahead of the ego vehicle, the ego vehicle laterally avoids the opposing game vehicle to the right, and the ego vehicle gives way to the crossing game vehicle.
  • S160 From the game result of the ego vehicle, select the decision result: select the behavior-action pair with the smallest cost value, and determine from it the executable behavior action of the ego vehicle, which can be used to control the ego vehicle to execute the behavior action.
  • an action pair may be selected according to the cost value as the decision result.
  • continuous multi-frame deduction can further be carried out for each solution, that is, the time sampling dimension is released, so as to select the behavior-action pair with good consistency in the time dimension as the decision result of ego vehicle driving.
  • the present application also provides an embodiment of a corresponding intelligent driving decision-making device.
  • the intelligent driving decision-making device 100 includes:
  • the obtaining module 110 is used to obtain the game object of the ego vehicle. Specifically, it is used to execute the above step S10 or steps S110-S120, or the various optional embodiments corresponding to those steps.
  • the processing module 120 is used to perform multiple releases of the multiple strategy spaces of the ego vehicle and the game object; after each release among the multiple releases is executed, it determines the strategy feasible region of the ego vehicle and the game object according to the strategy spaces that have been released, and determines the decision result of ego vehicle driving according to the strategy feasible region. Specifically, it is used to execute the above steps S20-S40, or the various optional embodiments corresponding to those steps.
  • the dimensions of the multiple strategy spaces include at least one of the following: the longitudinal sampling dimension, the lateral sampling dimension, or the time sampling dimension.
  • performing multiple releases of the multiple strategy spaces includes performing the releases in the following dimension order: longitudinal sampling dimension, lateral sampling dimension, and time sampling dimension.
  • the total cost value of a behavior-action pair in the strategy feasible region is determined according to one or more of the following for the ego vehicle or the game object: the safety cost value, right-of-way cost value, lateral offset cost value, passability cost value, comfort cost value, inter-frame correlation cost value, and risk area cost value.
  • each cost value has a different weight.
  • the decision result of driving the own vehicle is determined according to the feasible domains of each strategy of the own vehicle and each game object.
  • the acquisition module 110 is also used to obtain the non-game object of the ego vehicle; the processing module 120 is also used to determine the strategy feasible region of the ego vehicle and the non-game object, which includes the executable behaviors of the ego vehicle relative to the non-game object, and to determine the decision result of ego vehicle driving at least according to the strategy feasible region of the ego vehicle and the non-game object.
  • the processing module 120 is also used to determine the strategy feasible region for the decision result of ego vehicle driving according to the intersection of the strategy feasible regions of the ego vehicle and each game object, or according to the intersection of the strategy feasible regions of the ego vehicle and each game object and the strategy feasible regions of the ego vehicle and each non-game object.
  • the obtaining module 110 is also used to obtain the non-game object of the ego vehicle; the processing module 120 is also used to constrain the longitudinal sampling strategy space corresponding to the ego vehicle, or the lateral sampling strategy space corresponding to the ego vehicle, according to the motion state of the non-game object.
  • the acquiring module 110 is also used to acquire the non-game objects of the game objects of the ego vehicle; the processing module 120 is also used to constrain the longitudinal sampling strategy space corresponding to a game object of the ego vehicle, or the lateral sampling strategy space corresponding to that game object, according to the motion states of those non-game objects.
  • when the intersection is an empty set, a conservative decision for ego vehicle driving is performed; the conservative decision includes an action of stopping the ego vehicle safely or an action of making the ego vehicle decelerate safely.
  • the game object or non-game object is determined according to the attention method.
  • the processing module 120 is further configured to display at least one of the following through the human-computer interaction interface: the decision result of the driving of the own vehicle, the policy feasible region of the decision result, and the driving trajectory of the own vehicle corresponding to the decision result of the driving of the own vehicle , or the driving trajectory of the game object corresponding to the decision result of ego vehicle driving.
  • the decision result of ego vehicle driving can be the decision result of the current single-frame deduction, or it can be the decision result corresponding to multiple single-frame deduction that has already been executed.
  • the decision result can be the executable behavior action of the ego vehicle, or the behavior action that the game object can execute, or the intention decision corresponding to the ego vehicle executing the behavior action, such as Cg or Cy in Table 1, for example rushing, giving way, or avoiding.
  • the policy feasible domain of the decision result may be the policy feasible domain of the current single-frame deduction, or the policy feasible domain corresponding to multiple single-frame deduction that has been executed.
  • the ego vehicle driving trajectory corresponding to the decision result can be the ego vehicle trajectory corresponding to the first single-frame deduction in one decision step, such as T1 in Figure 7, or the ego vehicle trajectory formed by sequentially connecting the multiple single-frame deductions already performed in one decision step, such as T1, T2, and Tn in Figure 7.
  • the driving trajectory of the game object corresponding to the decision result of ego vehicle driving can be the driving trajectory of the game object corresponding to the first single-frame deduction in one decision step, such as T1 in Figure 7, or the driving trajectory of the game object formed by sequentially connecting the multiple single-frame deductions already performed, such as T1, T2, and Tn in Figure 7.
  • the embodiment of the present application also provides a vehicle driving control method, including:
  • S220 According to the obstacle information, determine the decision result of vehicle driving according to any one of the above intelligent driving decision methods.
  • the embodiment of the present application also provides a vehicle driving control device 200, including: an acquisition module 210, used to acquire obstacles outside the vehicle; and a processing module 220, used to determine the decision result of vehicle driving according to any one of the above intelligent driving decision methods; the processing module is also used to control the vehicle's running according to the decision result.
  • the embodiment of the present application also provides a vehicle 300 , including: the vehicle travel control device 200 described above, and a travel system 250 ; the vehicle travel control device 200 controls the travel system 250 .
  • the driving system 250 may include the aforementioned driving system 13 in FIG. 2 .
  • FIG. 16 is a schematic structural diagram of a computing device 400 provided by an embodiment of the present application.
  • the computing device 400 includes: a processor 410 , a memory 420 , and may also include a communication interface 430 .
  • the communication interface 430 in the computing device 400 shown in FIG. 16 can be used to communicate with other devices.
  • the processor 410 may be connected to the memory 420 .
  • the memory 420 can be used to store program codes and data. The memory 420 may be a storage unit inside the processor 410, an external storage unit independent of the processor 410, or a component including both a storage unit inside the processor 410 and an external storage unit independent of the processor 410.
  • computing device 400 may also include a bus.
  • the memory 420 and the communication interface 430 may be connected to the processor 410 through a bus.
  • the bus may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the processor 410 may be a central processing unit (central processing unit, CPU).
  • the processor can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 410 adopts one or more integrated circuits for executing related programs, so as to implement the technical solutions provided by the embodiments of the present application.
  • the memory 420 may include read-only memory and random-access memory, and provides instructions and data to the processor 410 .
  • a portion of processor 410 may also include non-volatile random access memory.
  • processor 410 may also store device type information.
  • the processor 410 executes computer-implemented instructions in the memory 420 to perform the operation steps of the above method.
  • the computing device 400 may correspond to the corresponding subject performing the methods according to the various embodiments of the present application, and the above-mentioned and other operations and/or functions of the modules in the computing device 400 are for realizing the corresponding processes of the methods in the embodiments; for the sake of brevity, they are not repeated here.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and in actual implementation there may be other division methods.
  • for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions described above are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disc and other media that can store program codes. .
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and the program is used to execute the above method when executed by a processor, and the method includes at least one of the solutions described in the above embodiments one.
  • the computer storage medium in the embodiments of the present application may use any combination of one or more computer-readable media.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more leads, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • computer program code for performing the operations of this application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Traffic Control Systems (AREA)

Abstract

一种智能驾驶决策方法,涉及智能驾驶技术,首先,确定自车的博弈对象;然后,从自车与博弈对象的多个策略空间中,执行各策略空间的多次释放,根据已经释放的各策略空间确定自车与博弈对象的策略可行域,根据策略可行域确定自车行驶的决策结果,决策结果为自车的可执行的行为动作。通过多次释放策略空间,可以在保持决策精度的前提下,在尽量少释放策略空间的情况下得到决策结果,减少了计算量,降低了对硬件算力的需求。

Description

智能驾驶决策方法、车辆行驶控制方法、装置及车辆 技术领域
本申请涉及智能驾驶技术,特别涉及智能驾驶决策方法、车辆行驶控制方法、装置及车辆。
背景技术
随着人工智能技术的发展,自动驾驶技术正在逐渐广泛地被应用,从而降低了驾驶员的驾驶负担。关于自动驾驶,例如国际汽车工程师学会(SAE International,或称为国际自动机工程师学会)提出了5个等级,即L1-L5级,其中,L1级,辅助驾驶,能够帮助驾驶员完成某些驾驶任务,且只能帮助完成一项驾驶操作;L2级,部分自动化,可以同时自动进行加减速和转向的操作;L3级,条件自动化,车辆在特定环境中可以实现自动加减速和转向,不需要驾驶者的操作;L4级,高度自动化,可以实现驾驶全程不需要驾驶员,但是会有限制条件,例如限制车辆车速不能超过一定值,且驾驶区域相对固定;L5级,完全自动化,完全自适应驾驶,适应任何驾驶场景。这些等级越高,表示自动驾驶功能也越强大。
目前,针对L2级别以上、但需要人类驾驶员视情接管的自动驾驶技术,通常认为属于智能驾驶。车辆处于智能驾驶状态时,需要能够及时、准确地感知周围障碍物,例如对向来车、横穿车辆、静态车辆、行人等,就自身的行驶行为和行驶轨迹进行决策,例如加减速、变道等。
发明内容
本申请提供一种智能驾驶决策方法、车辆行驶控制方法、装置及车辆等,能够在保证决策精度的前提下,消耗尽量少的算力实现对自身行驶的决策。
本申请第一方面提供了一种智能驾驶决策方法,包括:获取自车的博弈对象;从自车与博弈对象的多个策略空间中,执行多个策略空间的多次释放,当多次释放中的一次释放执行后,根据已经释放的各策略空间确定自车与博弈对象的策略可行域,根据策略可行域确定自车行驶的决策结果。
其中,自车与非博弈对象的策略可行域包括自车相对于非博弈对象可执行的行为动作。由上,通过多个策略空间的多次释放,保证决策精度(决策精度可以例如为所决策的结果的执行概率)的前提下,以在释放尽量少的策略空间时得到策略可行域,从而从策略可行域中选取一行为动作对作为决策结果,实现了尽量减少策略空间的释放次数和运算,降低了对硬件算力的要求。
作为第一方面的一种可能的实现方式,多个策略空间的维度包括至少以下之一:纵向采样维度、横向采样维度、或时间采样维度。
以上,根据纵向采样维度、横向采样维度、或时间采样维度,张成多个策略空间。多个策略空间包括由自车和/或博弈对象的纵向采样维度张成的纵向采样策略空间、由 自车和/或博弈对象的横向采样维度张成的横向采样策略空间、或由自车和/或博弈对象在时间采样维度张成的时间维度策略空间、或所述纵向采样维度、横向采样维度、或时间采样维度任意两两组合或三者组合构成的策略空间。其中,时间维度策略空间对应于在一步决策中包括的多个单帧推演中分别张成的策略空间,而每一个单帧推演中,张成的策略空间可以包括纵向采样策略空间和/或横向采样策略空间。
由上,可以根据交通场景在至少一个采样维度上张成相应的策略空间,以及进行策略空间的释放。
作为第一方面的一种可能的实现方式,执行多个策略空间的多次释放包括按照以下维度的顺序执行所述释放:纵向采样维度、横向采样维度、时间采样维度。
以上,依次按照释放纵向采样维度、释放横向采样维度、释放时间采样维度的顺序,多次释放的策略空间可以包括下述策略空间:
由自车的纵向采样维度的一组取值张成的纵向采样策略空间;由自车的纵向采样维度的另一组取值张成的纵向采样策略空间;由自车的纵向采样维度的一组取值及博弈对象的纵向采样维度的一组取值共同张成的纵向采样策略空间;由自车的纵向采样维度的另一组取值及博弈对象的纵向采样维度的一组取值共同张成的纵向采样策略空间;由自车的纵向采样维度的另一组取值及博弈对象的纵向采样维度的另一组取值共同张成的纵向采样策略空间;由自车的横向采样维度的一组取值张成的横向采样策略空间,与由自车的纵向采样维度和/或博弈对象的纵向采样维度共同张成的纵向采样策略空间共同张成的策略空间;由自车的横向采样维度的另一组取值张成的横向采样策略空间,与由自车的纵向采样维度和/或博弈对象的纵向采样维度共同张成的纵向采样策略空间共同张成的策略空间;由自车的横向采样维度的一组取值及博弈对象的横向维度的一组取值共同张成的横向采样策略空间,与由自车的纵向采样维度和/或博弈对象的纵向采样维度共同张成的纵向采样策略空间共同张成的策略空间;由自车的横向采样维度的另一组取值及博弈对象的横向采样维度的一组取值共同张成的横向采样策略空间,与由自车的纵向采样维度和/或博弈对象的纵向采样维度共同张成的纵向采样策略空间共同张成的策略空间;由自车的横向采样维度的另一组取值及博弈对象的横向采样维度的另一组取值共同张成的横向采样策略空间,与由自车的纵向采样维度和/或博弈对象的纵向采样维度共同张成的纵向采样策略空间共同张成的策略空间。以及,在根据已经释放的各策略空间确定自车与博弈对象的策略可行域,根据策略可行域确定自车行驶的决策结果之后,释放的时间维度策略空间,包括:在一步决策中包括的多个单帧推演中分别张成的策略空间,在每一个单帧推演中,张成的策略空间可以包括前述的各纵向采样策略空间、横向采样策略空间、及各纵向采样策略空间和各横向采样策略空间共同张成的策略空间。
由上,顺序执行多个策略空间释放,也即先纵向改变车辆的加速度,再横向调整车辆的偏移,更符合车辆驾驶习惯,以及利于驾驶安全要求。最后可根据时间维度上释放的多帧推演进一步从多个可行域中确定时间一致性更好的决策结果。
作为第一方面的一种可能的实现方式,确定自车与博弈对象的策略可行域时,策略可行域中的行为动作对的总代价值,根据以下之一或多个确定:自车或博弈对象的安全性代价值、路权代价值、横向偏移代价值、通过性代价值、舒适性代价值、帧间 关联性代价值、风险区域代价值。
由上,可以根据需要选取一个或多个代价值,来计算总代价值,总代价值用于确定可行域。
作为第一方面的一种可能的实现方式,行为动作对的总代价值根据两个或两个以上的代价值进行确定时,各代价值具有不同的权重。
由上,该不同的权值可以分别着重关注行驶安全、路权、通过性、舒适性、风险性等。通过灵活设置各权值,增加了智能驾驶决策的灵活性。在一些可能的实现方式中,所述权重分配的大小可以按照如下分配:安全性权重>路权权重>横向偏移权重>通过性权重>舒适性权重>风险区域权重>帧间关联权重。
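下面以Python给出按上述权重顺序对各代价值加权求和的一个极简示意（权重的具体数值为便于说明而假设，并非本申请限定的取值）：

```python
# 示意：按"安全性>路权>横向偏移>通过性>舒适性>风险区域>帧间关联"的顺序分配权重，
# 各代价值已归一化到[0,1]；权重数值为假设值，仅用于展示加权求和的计算方式
WEIGHTS = {
    "safety": 0.30,          # 安全性权重
    "road_right": 0.20,      # 路权权重
    "lateral_offset": 0.15,  # 横向偏移权重
    "passability": 0.12,     # 通过性权重
    "comfort": 0.10,         # 舒适性权重
    "risk_area": 0.08,       # 风险区域权重
    "inter_frame": 0.05,     # 帧间关联权重
}

def total_cost(costs):
    """costs: {代价名: 归一化代价值}；返回行为动作对的总代价值。"""
    return sum(w * costs.get(k, 0.0) for k, w in WEIGHTS.items())

# 总代价值不大于代价阈值的行为动作对进入策略可行域
pair_cost = total_cost({"safety": 0.2, "passability": 0.4, "comfort": 0.1})
```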
作为第一方面的一种可能的实现方式,博弈对象包括两个或两个以上时,自车行驶的决策结果根据自车与各博弈对象的各策略可行域确定。
由上,当有多个博弈对象时,通过分别获得的各个策略可行域,再根据各个策略可行域的交集确定最终的策略可行域。其中,这里的交集指均包括该自车的同一个动作的行为动作。
作为第一方面的一种可能的实现方式,还包括:获取自车的非博弈对象;确定出自车与非博弈对象的策略可行域;自车与非博弈对象的策略可行域包括自车相对于非博弈对象可执行的行为动作;至少根据自车与非博弈对象的策略可行域确定自车行驶的决策结果。
由上,当存在非博弈对象时,最终决策结果的获得要与非博弈对象有关。
作为第一方面的一种可能的实现方式,根据自车与各博弈对象的各策略可行域的交集确定自车行驶的决策结果的策略可行域,或根据自车与各博弈对象的各策略可行域以及自车与各非博弈对象的各策略可行域的交集确定自车行驶的决策结果的策略可行域。
由上,当存在自车与多个博弈对象时,可以通过自车与多个博弈对象的各策略可行域的交集获得最终的策略可行域及自车行驶的决策结果。当存在自车与多个博弈对象、非博弈对象时,则可以根据自车与多个博弈对象、非博弈对象的各策略可行域的交集获得最终的策略可行域及自车行驶的决策结果。
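以自车动作为索引对各策略可行域取交集的一个示意如下（Python；可行域的数据结构为便于说明而假设）：

```python
def intersect_feasible_regions(regions):
    """regions: 可行域列表，每个可行域为 {自车动作: 该动作下可行的行为动作对列表}。
    这里的交集指各可行域均包含自车的同一个动作。"""
    if not regions:
        return {}
    common = set(regions[0])
    for region in regions[1:]:
        common &= set(region)        # 仅保留各可行域共有的自车动作
    # 对每个公共自车动作，汇总其在各可行域中对应的行为动作对
    return {a: [pair for region in regions for pair in region[a]] for a in common}

# 交集为空集时，执行保守决策（使自车安全停车或安全减速行驶）
```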
作为第一方面的一种可能的实现方式,还包括:获取自车的非博弈对象;根据非博弈对象的运动状态,约束与自车对应的纵向采样策略空间,或约束与自车对应的横向采样策略空间。
以上,约束与自车对应的纵向采样策略空间,也即约束张成纵向采样策略空间时所使用的在自车的纵向采样维度上的取值范围;约束与自车对应的横向采样策略空间,也即约束张成横向采样策略空间时所使用的在自车的横向采样维度上的取值范围。
由上,可以通过非博弈对象的运动状态,如位置、速度等,来约束张成的策略空间中自车的纵向加速度取值范围或横向偏移取值范围,降低了策略空间的行为动作数量量,可进一步减少运算量。
作为第一方面的一种可能的实现方式,还包括:获取自车的博弈对象的非博弈对象;根据非博弈对象的运动状态,约束与自车的博弈对象对应的纵向采样策略空间,或约束与自车的博弈对象对应的横向采样策略空间。
以上,约束与自车的博弈对象对应的纵向采样策略空间,也即约束张成纵向采样策略空间时所使用的在自车的博弈对象对应的纵向采样维度上的取值范围;约束与自车的博弈对象对应的横向采样策略空间,也即约束张成横向采样策略空间时所使用的在自车的博弈对象的横向采样维度上的取值范围。
由上,可以通过非博弈对象的运动状态,如位置、速度等,来约束张成的策略空间中自车博弈对象的纵向加速度取值范围或横向偏移取值范围,降低了策略空间的行为动作数量,可进一步减少运算量。
作为第一方面的一种可能的实现方式,交集为空集时,执行自车行驶的保守决策,保守决策包括使自车安全停车的动作,或,使自车安全减速行驶的动作。
由上,可以实现自车的策略可行域为空时,使得车辆能够安全行驶。
作为第一方面的一种可能的实现方式,博弈对象或非博弈对象,根据注意力方式进行确定。
由上,可以根据各障碍物对自车分配的注意力,来确定博弈对象、非博弈对象。该注意力方式可以是通过算法实现,也可以是神经网络推理实现。
作为第一方面的一种可能的实现方式,还包括:通过人机交互界面显示至少以下之一:自车行驶的决策结果、决策结果的策略可行域、自车行驶的决策结果对应的自车行驶轨迹、或自车行驶的决策结果对应的博弈对象的行驶轨迹。
由上，可以在人机交互界面以丰富的内容显示自车或博弈对象行驶的决策结果，与用户的交互更友好。
本申请第二方面提供了一种智能驾驶决策装置,包括:获取模块,用于获取自车的博弈对象;处理模块,用于从自车与博弈对象的多个策略空间中,执行多个策略空间的多次释放,当多次释放中的一次释放执行后,根据已经释放的各策略空间确定自车与博弈对象的策略可行域,根据策略可行域确定自车行驶的决策结果。
作为第二方面的一种可能的实现方式,多个策略空间的维度包括至少以下之一:纵向采样维度、横向采样维度、或时间采样维度。
作为第二方面的一种可能的实现方式,执行多个策略空间的多次释放包括按照以下维度的顺序执行所述释放:纵向采样维度、横向采样维度、时间采样维度。
作为第二方面的一种可能的实现方式,确定自车与博弈对象的策略可行域时,策略可行域中的行为动作对的总代价值,根据以下之一或多个确定:自车或博弈对象的安全性代价值、路权代价值、横向偏移代价值、通过性代价值、舒适性代价值、帧间关联性代价值、风险区域代价值。
作为第二方面的一种可能的实现方式,行为动作对的总代价值根据两个或两个以上的代价值进行确定时,各代价值具有不同的权重。
作为第二方面的一种可能的实现方式,博弈对象包括两个或两个以上时,自车行驶的决策结果根据自车与各博弈对象的各策略可行域确定。
作为第二方面的一种可能的实现方式,获取模块还用于获取自车的非博弈对象;处理模块还用于确定出自车与非博弈对象的策略可行域;自车与非博弈对象的策略可行域包括自车相对于非博弈对象可执行的行为动作;至少根据自车与非博弈对象的策略可行域确定自车行驶的决策结果。
作为第二方面的一种可能的实现方式,处理模块还用于根据自车与各博弈对象的各策略可行域的交集确定自车行驶的决策结果的策略可行域,或根据自车与各博弈对象的各策略可行域以及自车与各非博弈对象的各策略可行域的交集确定自车行驶的决策结果的策略可行域。
作为第二方面的一种可能的实现方式,获取模块还用于获取自车的非博弈对象;处理模块还用于根据非博弈对象的运动状态,约束与自车对应的纵向采样策略空间,或约束与自车对应的横向采样策略空间。
作为第二方面的一种可能的实现方式,获取模块还用于获取自车的博弈对象的非博弈对象;处理模块还用于根据非博弈对象的运动状态,约束与自车的博弈对象对应的纵向采样策略空间,或约束与自车的博弈对象对应的横向采样策略空间。
作为第二方面的一种可能的实现方式,交集为空集时,执行自车行驶的保守决策,保守决策包括使自车安全停车的动作,或,使自车安全减速行驶的动作。
作为第二方面的一种可能的实现方式,博弈对象或非博弈对象,根据注意力方式进行确定。
作为第二方面的一种可能的实现方式,处理模块还用于通过人机交互界面显示至少以下之一:自车行驶的决策结果、决策结果的策略可行域、自车行驶的决策结果对应的自车行驶轨迹、或自车行驶的决策结果对应的博弈对象的行驶轨迹。
本申请第三方面提供了一种车辆行驶控制方法,包括:获取车外障碍物;针对障碍物,根据第一方面任一方法确定车辆行驶的决策结果;根据决策结果控制车辆的行驶。
本申请第四方面提供了一种车辆行驶控制装置,包括:获取模块,用于获取车外障碍物;处理模块,用于针对障碍物,根据第一方面任一方法确定车辆行驶的决策结果;处理模块还用于根据决策结果控制车辆的行驶。
本申请第五方面提供了一种车辆,包括:第四方面的车辆行驶控制装置,及行驶***;车辆行驶控制装置控制行驶***。
本申请第六方面提供了一种计算设备,包括:处理器,以及存储器,其上存储有程序指令,程序指令当被处理器执行时使得处理器实现第一方面任一智能驾驶决策方法,或程序指令当被处理器执行时使得处理器实现第三方面的车辆行驶控制方法。
本申请第七方面提供了一种计算机可读存储介质,其上存储有程序指令,程序指令当被处理器执行时使得处理器实现第一方面任一智能驾驶决策方法,或程序指令当被处理器执行时使得处理器实现第三方面的车辆行驶控制方法。
本申请的这些和其它方面在以下(多个)实施例的描述中会更加简明易懂。
附图说明
以下参照附图来进一步说明本申请的各个特征和各个特征之间的联系。附图均为示例性的,一些特征并不以实际比例示出,并且一些附图中可能省略了本申请所涉及领域的惯常的且对于本申请非必要的特征,或是额外示出了对于本申请非必要的特征,附图所示的各个特征的组合并不用以限制本申请。另外,在本说明书全文中,相同的附图标记所指代的内容也是相同的。具体地附图说明如下:
图1为本申请实施例提供的路面车辆行驶的一交通场景的示意图;
图2为本申请实施例应用于车辆的示意图;
图3A-图3E为本申请实施例提供的博弈对象和非博弈对象在不同交通场景下的示意图;
图4为本申请实施例提供的智能驾驶决策方法的流程图;
图5为图4中获得博弈对象的流程图;
图6为图4中获得决策结果的流程图;
图7为本申请实施例中提供的多帧推演的示意图;
图8A-图8F为本申请实施例提供的代价函数的示意图;
图9为本申请另一实施例提供的行驶控制的流程图;
图10为本申请实施方式中的交通场景示意图;
图11为本申请具体实施方式提供的行驶控制的流程图;
图12为本申请实施例提供的智能驾驶决策装置的示意图;
图13为本申请实施例提供的车辆行驶控制方法的流程图;
图14为本申请实施例提供的车辆行驶控制装置的示意图;
图15为本申请实施例提供的车辆示意图;
图16为本申请计算设备的一实施例的示意图。
具体实施方式
下面结合附图并举实施例,对本申请提供的技术方案作进一步说明。应理解,本申请实施例中提供的***结构和业务场景主要是为了说明本申请的技术方案的可能的实施方式,不应被解读为对本申请的技术方案的唯一限定。本领域普通技术人员可知,随着***结构的演进和新业务场景的出现,本申请提供的技术方案对类似技术问题同样适用。
应理解,本申请实施例提供的智能驾驶决策方案,包括智能驾驶决策方法、装置、车辆行驶控制方法及装置、车辆、电子装置、计算设备、计算机可读存储介质及计算机程序产品。由于这些技术方案解决问题的原理相同或相似,在如下具体实施例的介绍中,某些重复之处可能不再赘述,但应视为这些具体实施例之间已有相互引用,可以相互结合。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。如有不一致,以本说明书中所说明的含义或者根据本说明书中记载的内容得出的含义为准。另外,本文中所使用的术语旨在描述本申请实施例的目的,而非限制本申请。
图1所示为路面车辆行驶的一个交通场景。如图1所示,该交通场景下,南北向道路与东西向道路形成交叉路口A,其中:第一车辆901位于路口A南侧、且由南向北行驶;第二车辆902位于路口A北侧、且由北向南行驶;第三车辆903位于路口A东侧、且由东向南行驶,即其将在路口A向左转汇入南北向道路;第一车辆901后方,具有第四车辆904,第四车辆904也由南向北行驶;靠近路口A的东南角、位于南北向道路路侧停放有第五车辆905,即第五车辆905位于第一车辆901前方道路路侧位 置。假设第一车辆901此时开启了智能驾驶功能,则其可以检测到当前交通场景,并可对当前交通场景下的行驶策略进行决策,进而可以根据决策结果控制车辆的行驶,例如,根据决策出的抢行或让行或避让策略,控制车辆加速或减速或变道行驶等。一种智能驾驶决策方案,是基于博弈方式进行行驶策略的决策。例如第一车辆901通过博弈方式决策具有对向行驶的第二车辆902的交通场景下自车的行驶策略。基于博弈方式进行行驶策略的决策,难以处理复杂交通场景下的决策,例如对于图1示出的交通场景,对于第一车辆901而言,既有对向行驶的第二车辆902,又有旁侧横穿路口A的第三车辆903,且其前方路侧还停放有第五车辆905,当基于博弈方式进行行驶策略决策时,第一车辆901的博弈对象同时为第二车辆902、第三车辆903,因此需使用多维度的博弈空间进行博弈决策,例如由第一车辆901、第二车辆902、第三车辆903各自的横向行驶维度、纵向行驶维度共同张成的多维度博弈空间。使用多维度的博弈空间将导致博弈决策的解的数量呈***式增长,导致计算负担呈几何量级增加,这对现有的硬件算力提出了较大挑战。因此,目前受硬件算力制约,使用多维度的博弈空间进行博弈决策,难以在智能驾驶场景实现产品化。
本申请实施例提供了一种改进的智能驾驶决策方案,该方案应用于车辆的智能驾驶时,该方案的基本原理包括:针对自车,识别出当前交通场景中的障碍物,障碍物可包括自车的博弈对象、自车的非博弈对象。对于自车的单个博弈对象,从自车与该单个博弈对象的多维度博弈空间中,多次释放单个采样维度或多个采样维度张成的策略空间,并在每次释放策略空间后,搜索自车与其单个博弈对象在该策略空间内的解。当有解时,即具有博弈结果时,也即自车与其单个博弈对象在该策略空间内有策略可行域时,根据博弈结果确定自车行驶的决策结果,进而可以根据决策结果控制车辆的行驶。这时,可不再继续从多维度博弈空间中释放尚未释放的策略空间。而针对自车的多个博弈对象,则可如上分别确定自车针对各博弈对象的策略可行域,并以自车动作为索引,从自车与其各博弈对象的策略可行域的交集(交集指均包括该自车的同一个动作)获取自车的行驶策略。该方法在保证决策精度(决策精度可以例如为所决策的结果的执行概率)的前提下,可以以最少的搜索次数在多维度博弈空间内求得最优的行驶决策,且可以尽可能地减少策略空间的使用,因此降低了对硬件算力的要求,更易于在车辆上的产品化。
本申请实施例的智能驾驶决策方案的实施主体可以是具有动力、可自主移动的智能体,智能体可通过本申请实施例提供的智能驾驶决策方案与所在交通场景内的其他物体进行博弈决策,生成语义级的决策标签和智能体的期望行驶轨迹,进而智能体可以进行合理的横向、纵向的运动规划。智能体例如可以是具有自动驾驶功能的车辆、可以自主移动的机器人等。这里的车辆包括一般的机动车辆,例如包括轿车、运动型多用途汽车(Sport Utility Vehicle,SUV)、多用途汽车(Multi-purpose Vehicle,MPV)、自动导引运输车(Automated Guided Vehicle,AGV)、公交车、卡车和其它载货或者载客车辆在内的陆地运输装置,也包括各种船、艇在内的水面运输装置,以及航空器等。对于机动车辆,还包括混合动力车辆、电动车辆、燃油车辆、插电式混合动力车辆、燃料电池汽车以及其它代用燃料车辆。其中,混合动力车辆指的是具有两种或者多种动力源的车辆,电动车辆包括纯电动汽车、增程式电动汽车等。在一些实施例中, 上述可以自主移动的机器人也可以归属于所述车辆的一种。
下面,以本申请实施例提供的智能驾驶决策方案应用于车辆为例进行介绍,如图2所示,应用于车辆时,该车辆10可以包括环境信息获取装置11、控制装置12、行驶***13,在一些实施例中还可以包括通信装置14、导航装置15、或显示装置16。
本实施例中,环境信息获取装置11可用于获取车辆外部环境信息。在一些实施例中,环境信息获取装置11可以包括摄像头、激光雷达、毫米波雷达、超声波雷达、或后述的全球导航卫星***(Global Navigation Satellite System,GNSS)等,数量可以是一个也可以是多个,其中,摄像头可以包括常规的RGB(Red Green Blue)三原色摄像头传感器、红外摄像头传感器等。所获取的车外环境包括路面信息、路面上的对象,路面上的对象包括周边车辆、行人等,具体可包括车辆的运动状态信息,运动状态信息可以包括车辆速度、加速度、航向角信息、轨迹信息等。在一些实施例中,周边车辆的运动状态信息也可以通过车辆10的通信装置14获取。环境信息获取装置11所获取的车外环境信息可以用来形成由道路(对应路面信息)和障碍物(对应路面上的对象)等构建的世界模型。
在其他一些实施例中,环境信息获取装置11也可以是接收摄像头传感器、红外夜视摄像头传感器、激光雷达、毫米波雷达、超声波雷达等所传输的车辆外部环境信息的电子设备,如数据传输芯片,数据传输芯片例如总线数据收发芯片、网络接口芯片等,数据传输芯片也可以是无线传输芯片,如蓝牙(Blue tooth)芯片或Wi-Fi芯片等。在另一些实施例中,环境信息获取装置11也可以集成于控制装置12中,成为集成到处理器中的接口电路或数据传输模块等。
本实施例中，控制装置12可用于根据获取的车辆外部环境信息（包括所构建的世界模型）进行智能行驶策略的决策，生成决策结果，示例的，决策结果可以包括加速、制动、转向（包括变道或转向），也包括车辆短期（如几秒钟之内）的期望行驶轨迹。在一些实施例中，控制装置12还可以进一步根据决策结果生成相应的指令去控制行驶***13，以通过行驶***13执行对车辆的行驶控制，控制车辆根据决策结果实现期望的行驶轨迹。本申请实施例中，控制装置12可以为电子设备，例如可以为车机、域控制器、移动数据中心（Mobile Data Center，MDC）或车载电脑等车载处理装置的处理器，也可以为中央处理器（Central Processing Unit，CPU）、微控制器（Micro Controller Unit，MCU）等常规的芯片。
本实施例中,行驶***13可包括动力***131、转向***132和制动***133,下面分别进行介绍:
其中,动力***131可包括驱动电控单元(Electrical Control Unit,ECU)和驱动源。驱动ECU通过控制驱动源来控制车辆10的驱动力(如扭矩)。作为驱动源的例子,可以是发动机、驱动电机等。驱动ECU能够根据驾驶员对加速踏板的操作来控制驱动源,或者能够根据从控制装置12发送来的指令来控制驱动源,从而能够控制驱动力。驱动源的驱动力经由变速器等传递给车轮,从而驱动车辆10行驶。
其中,转向***132可包括转向电控单元(ECU)和电动助力转向***(Electric Power Steering,EPS)。转向ECU能够根据驾驶员对方向盘的操作来控制EPS的电机,或者能够根据从控制装置12发送来的指令控制EPS的电机,从而控制车轮(具 体而言是转向轮)的朝向。另外,也可以通过改变对左右车轮的扭矩分配或制动力分配来进行转向操纵。
其中,制动***133可包括制动电控单元(ECU)和制动机构。制动机构通过制动电机、液压机构等使制动部件进行工作。制动ECU能够根据驾驶员对制动踏板的操作来控制制动机构,或者能够根据从控制装置12发送来的指令控制制动机构,从而能够控制制动力。在车辆10是电动车辆或者混合动力车辆的情况下,制动***133还可以包括能量回收制动机构。
本实施例中,还可包括通信装置14,通信装置14能够通过无线通信方式与外部对象进行数据交互,获得车辆10进行智能驾驶决策所需的数据。在一些实施例中,可通信的外部对象可以包括云端服务器、移动终端(如手机、便携式电脑、平板等)、路侧设备、或周边车辆等。在一些实施例中,决策所需数据包括车辆10周边车辆(也即他车)的用户画像,该用户画像体现了他车驾驶员的驾驶习惯,还可包括他车的位置、他车的运动状态信息等。
本实施例中,还可包括导航装置15,导航装置15可包括全球导航卫星***(Global Navigation Satellite System,GNSS)接收机和地图数据库。导航装置15能够通过GNSS接收机接收到的卫星信号来确定车辆10的位置,且能够根据地图数据库中的地图信息生成到达目的地的路径,并将关于该路径的信息(包括车辆10的位置)提供给控制装置12。导航装置15还可以具有惯性测量装置(Inertial Measurement Unit,IMU),通过融合GNSS接收机的信息和IMU的信息来进行车辆10更精确的定位。
本实施例中,还可包括显示装置16,例如可以是安装在车辆座舱中控位置的显示屏,也可以是抬头显示装置(Head Up Display,HUD)。在一些实施例中,控制装置12可以将决策结果以用户可理解的方式,例如期望行驶轨迹、箭头、文字等形式显示在车辆座舱内的显示装置16。在一些实施例中,当显示期望行驶轨迹时,还可以结合车辆的当前交通场景(如图形化的交通场景),以局部放大视图的形式显示在车辆座舱内的显示装置中。控制装置12还可以显示导航装置15提供的到达目的地的路径的信息。
在一些实施例中,还可包括语音播放***,通过播放语音的方式提示用户就当前交通场景所决策出的决策结果。
下面,将对本申请实施例提供的智能驾驶决策方法进行介绍。为描述方便,在本申请实施例中,将处于交通场景中且执行本申请实施例提供的智能驾驶决策方法的智能驾驶车辆称为自车。在自车视角,将交通场景中影响或可能影响自车行驶的其他物体称为自车的障碍物。
本申请实施例中,自车具有一定的行为决策能力,可以生成行驶策略,以改变自身运动状态,行驶策略包括加速、制动、转向(包括变道或转向),自车还具有行驶行为执行能力,包括执行所述行驶策略,按决策出的期望行驶轨迹行驶。
在一些实施例中,自车的障碍物也可以具有行为决策能力,以改变其自身运动状态,例如障碍物可以是可以自主移动的车辆、行人等。自车的障碍物也可以不具有行为决策能力,或不改变其运动状态,例如障碍物可以是停靠路侧的车辆(该车辆处于未启动状态)、路上的限宽墩等。综上,自车的障碍物可以包括:行人、自行车、机 动车(如摩托车、小客车、货车、卡车、公交车等)等,其中,机动车可以包括可执行智能驾驶决策方法的智能驾驶车辆。
根据是否会与自车建立博弈交互关系,可将自车的各障碍物进一步分为自车的博弈对象、自车的非博弈对象或自车的无关障碍物。具体地,博弈对象、非博弈对象和无关障碍物这三者与自车的交互强度从强交互逐渐减弱到不交互。应该理解为,在与不同决策时刻对应的多个交通场景中,博弈对象、非博弈对象、无关障碍物这三者有可能相互转换。自车的无关障碍物的位置或运动状态使得其与自车未来的行为完全无关,自车与无关障碍物在未来不存在轨迹冲突或意图冲突,故若无特别说明,本申请实施例中的障碍物指自车的博弈对象、自车的非博弈对象。
自车的非博弈对象与自车在未来存在轨迹冲突或意图冲突,因此,自车的非博弈对象将对自车未来的行为产生约束,但自车的非博弈对象不响应在未来与自车之间可能存在的轨迹冲突或意图冲突,而需要自车单方面调整自车的运动状态来解除在未来与其非博弈对象之间可能存在的轨迹冲突或意图冲突,也即自车的非博弈对象与自车不建立博弈交互关系。也即,自车的非博弈对象不会受到自车的行驶行为影响,会保持其既定的运动状态,不会调整其运动状态来解除在未来与自车之间可能存在的轨迹冲突或意图冲突。
自车与自车的博弈对象建立有博弈交互关系,自车的博弈对象会响应在未来与自车之间可能存在的轨迹冲突或意图冲突。在博弈决策开始时刻,自车的博弈对象与自车存在轨迹冲突或意图冲突,在博弈过程中,自车与自车的博弈对象可调整各自的运动状态,以在安全性前提下,逐步解除两者可能的轨迹冲突或意图冲突。其中,作为自车的博弈对象的车辆调整其运动状态时,可包括通过其智能驾驶功能自动调整,也包括通过其驾驶员调整手动驾驶调整。
为了对博弈对象、非博弈对象进一步进行理解,下面,结合图3A-图3E的几种交通场景的示意图,对自车的博弈对象和非博弈对象进行举例说明。
如图3A所示,自车101直行通过无保护路口。对向来车102(在自车101的左前方)左转通过该无保护路口。这时,对向来车102与自车101存在轨迹冲突或意图冲突,该对向来车102为自车101的博弈对象。
如图3B,自车101直行。左侧来车102横穿自车101所在车道并通过。这时,左侧来车102与自车101存在轨迹冲突或意图冲突,该左侧来车102为自车101的博弈对象。
如图3C,自车101直行。同向来车102(在自车101的右前方)汇入自车车道或自车的相邻车道。这时,同向来车102与自车101存在轨迹冲突或意图冲突,会与自车101建立博弈交互关系,该同向来车102为自车101的博弈对象。
如图3D,自车101直行,对向来车103在自车101的左侧相邻车道直行,在自车101的右侧相邻车道上有一台静止车辆102(在自车101的右前方)。这时,对向来车103与自车101存在轨迹冲突或意图冲突,会与自车101建立博弈交互关系,该来车103为自车101的博弈对象。静止车辆102所在的位置与自车101的轨迹在未来存在冲突,但根据获取的外部环境信息可以确认在交互博弈决策过程中,静止车辆102不会切换为移动状态或即使切换为移动状态,但其具有较高路权,不会与自车101建 立有博弈交互关系,所以静止车辆102为自车的非博弈对象,将由自车101单独调整其行驶行为及运动状态,以解除两者之间的轨迹冲突。
如图3E,自车101从当前车道向右变线以汇入右侧相邻车道。右侧相邻车道上有第一直行车辆103(在自车101的右前方)和第二直行车辆102(在自车101的右后方)。第一直行车辆103与自车101相比,具有较高路权,不会与自车101建立有博弈交互关系,为自车101的非博弈对象。在自车101右后方的第二直行车辆102与自车101在未来存在轨迹冲突,会与自车101建立博弈交互关系,该第二直行车辆102为自车101的博弈对象。
下面,参考图1和图2、并结合图4示出的流程图,对本申请实施例提供的智能驾驶决策方法进行介绍,包括以下步骤:
S10:由自车获取自车的博弈对象。
在一些实施例中,如图5所示的流程图,本步骤可以包括以下子步骤:
S11:自车获取车辆外部环境信息,所获取的该外部环境信息包括道路场景中的自车和障碍物的运动状态、相对位置信息等。
在本实施例中,自车对车辆外部环境信息的获取,可以是通过其环境信息获取装置,如摄像头传感器、红外夜视摄像头传感器、激光雷达、毫米波雷达、超声波雷达、GNSS等进行获取。在一些实施例中,自车对车辆外部环境信息的获取,可以是通过其通信装置与路侧装置通信,或与云服务器通信,来获得其车辆外部环境信息。其中,路侧装置可以具有摄像头或通信装置,其可以获取其周边的车辆信息,云服务器可以接收存储各个路侧装置上报的信息。在一些实施例中,也可以是上述两种方式结合进行车辆外部环境信息的获取。
S12:自车根据所获得的障碍物的运动状态,或一段时间内的障碍物的运动状态,或形成的障碍物的行驶轨迹,以及与障碍物的相对位置信息,从所述障碍物中识别出自车的博弈对象。
在本实施例中,在步骤S12中,也可以同时从所述障碍物中识别出自车的非博弈对象,或者从所述障碍物中识别出自车的博弈对象的非博弈对象。
在一些实施例中,可以根据预先设置的判断规则来识别上述博弈对象或非博弈对象。该判断规则例如:如果一障碍物的行驶轨迹或行驶意图与自车的行驶轨迹或行驶意图有冲突,且该障碍物具有行为决策能力,可以改变其自身运动状态,则为自车的博弈对象。如果一障碍物的行驶轨迹或行驶意图与自车的行驶轨迹或行驶意图有冲突,但不会主动改变其自身运动状态来主动避让冲突,则为与自车的非博弈对象。在一些实施例中,障碍物的行驶轨迹或行驶意图,可以根据其行驶所在车道(直行或转弯车道)、是否开启转向灯、车头朝向等进行判断。
例如图3A-图3C中,横穿的障碍物、车道汇入的障碍物,均属于与自车轨迹存在大角度相交因而具有行驶轨迹冲突,进而分类为博弈车。图3D窄道通行的对向来车103和图3E自车汇入旁边车道车流的后车104属于意图上存在冲突的,进而被分类为博弈车。图3D窄道通行右前方静止车辆102和图3E自车汇入旁边车道车流的前方车103,尽管轨迹或意图具有冲突,但因为自车相对于他车具有较低的路权,他车不会采取行为来解除冲突,且自车的行为也不能改变他车的行为,则他车属于非博弈 车。
在一些实施例中,自车是基于一些已知的算法从其感知或获取的车辆外部环境信息中获取到障碍物信息,并从障碍物中识别自车的博弈对象、非博弈对象或博弈对象的非博弈对象。
在一些实施例中,上述算法可以是例如基于深度学习的分类神经网络,由于是识别障碍物的类型,相当于分类,因此可以采用分类模型的神经网络进行推理后确定。该分类神经网络可以采用卷积神经网络(Convolutional Neural Networks,CNN)、循环神经网络(Recurrent Neural Network,RNN)、基于转换器的双向编码表示(Bidirectional Encoder Representations from Transformers,BERT)等。其中,所述分类神经网络进行训练时,可以使用样本数据对神经网络进行训练,样本数据可以是标记有分类标签的车辆行驶场景的图片或视频片段,分类标签可以包括博弈对象、非博弈对象、博弈对象的非博弈对象。
在一些实施例中,上述算法也可以利用注意力相关的算法,如建模的注意力模型。其中,注意力模型用于输出各个障碍物对自车分配的注意力值,该注意力值与障碍物与自车存在的意图冲突或轨迹冲突的程度相关。例如,与自车存在意图冲突或轨迹冲突的障碍物向自车会分配更多的注意力;而与自车不存在意图冲突或轨迹冲突的障碍物向自车会分配较少注意力或零注意力;具有比自车更高路权的障碍物,也可向自车分配较少注意力或零注意力。若障碍物向自车分配足够多(如高于某阈值)的注意力,则可以识别该障碍物为自车的博弈对象。若障碍物向自车分配足够少的注意力(如低于某阈值),则可以识别该障碍物为自车的非博弈对象。
在一些实施例中,注意力模型可以采用例如y=softmax(a1x1+a2x2+a3x3…)等数学模型构建,其中softmax表示归一化,a1、a2、a3…为权重系数,x1、x2、x3…为障碍物与自车的相关参数,例如纵向车距、横向车距、车速差、车加速度差、车位置关系(前方、后方、左方、右侧等)等,其中,x1、x2、x3…也可以是归一化后的值,即0-1之间的值。在一些实施例中,注意力模型也可以通过神经网络来实现,此时神经网络的输出是对应所识别的障碍物向自车分配的注意力值。
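上述注意力模型的一个极简实现示意如下（Python；特征选取与权重系数a1、a2、a3均为假设值，实际可按算法或神经网络推理实现）：

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# 权重系数a1、a2、a3（假设值），对应障碍物与自车的归一化相关参数，
# 如纵向车距、横向车距、车速差等归一化到[0,1]后的值
A = [0.5, 0.3, 0.2]

def attention_scores(obstacle_features):
    """obstacle_features: 每个障碍物一组[0,1]特征；返回各障碍物对自车分配的注意力值。"""
    raw = [sum(a * x for a, x in zip(A, feats)) for feats in obstacle_features]
    return softmax(raw)

scores = attention_scores([[0.9, 0.8, 0.7], [0.2, 0.1, 0.0]])
# 注意力值高于阈值的障碍物可识别为博弈对象，足够低的可识别为非博弈对象
```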
步骤S20:自车针对与博弈对象之间的交互博弈任务,从自车与所述博弈对象的多个策略空间中执行所述多个策略空间的多次释放,当所述多次释放中的一次释放执行后,根据已经释放的各所述策略空间确定自车与所述博弈对象的策略可行域,根据所述策略可行域确定所述自车行驶的决策结果。
在一些实施例中,所述决策结果指策略可行域中自车与博弈对象的可执行的行为动作对。这时,步骤S20完成自车与任一个博弈对象的单车交互博弈决策过程,并确定自车与该博弈对象的策略可行域。在一些实施例中,所述执行所述多个策略空间的多次释放,包括执行各所述策略空间的逐次释放,也即,执行每次释放时,只释放一个策略空间,以及执行逐次释放后,累计释放多个策略空间。这时,每一个策略空间是由至少一个采样维度张成的。
在一些实施例中,根据通常的车辆控制的安全性,或行驶习惯,通常是自车在当前车道先进行加减速的方式优先于更换车道,因此,上述多个策略空间释放时,可选的,在多个策略空间释放过程中,可以按照下述维度顺序逐次释放:纵向采样维度、 横向采样维度、时间采样维度维度。由不同的维度可以张成不同的策略空间,例如,释放纵向采样维度时可以张成纵向采样策略空间,释放横向采样维度时可以张成横向采样策略空间,或者与所述纵向采样维度共同张成纵向采样策略空间与横向采样策略空间组合后的策略空间,释放时间采样维度时可以形成多帧推演构成的多个策略空间。
在一些实施例中,也可以是不同策略空间的各部分空间组合的依次释放,例如首次释放时,先依次释放纵向采样策略空间的局部空间和横向采用策略空间的局部空间,第二次释放时,再依次释放纵向采样策略空间剩下的空间和横向采样策略空间剩下的局部空间。
在一些实施例中,累计释放的多个策略空间可以包括:纵向采样策略空间,横向采样策略空间,纵向采样策略空间与横向采样策略空间组合后张成的策略空间,纵向采样策略空间与横向采样策略空间分别与时间采样维度组合后张成的策略空间,纵向采样维度、横向采样维度、时间采样维度这三者组合后张成的策略空间。
在一些实施例中,如上所述,构成策略空间的维度可以包括纵向采样维度、横向采样维度、或时间采样维度。结合到车辆行驶场景,也即纵向加速度维度、横向偏移维度、在一步决策中包括的多个单帧推演分别对应的推演深度。对应的,所述纵向采样策略空间张成时所使用的纵向采样维度包括至少以下之一:自车的纵向加速度、博弈对象的纵向加速度;所述横向采样策略空间张成时所使用的横向采样维度包括至少以下之一:自车的横向偏移、博弈对象的横向偏移;所述时间采样维度包括由对应连续时间点(也即依次增加推演深度)的连续多帧推演构成的多个策略空间。该三个维度的组合可构成上述的多个策略空间。
这时,各所述策略空间张成时在各横向或纵向采样维度上的取值对应自车或博弈对象的采样动作,也即行为动作。
在一些实施例中,如图6所示的流程图,本步骤S20可以包括以下子步骤S21-S26:
S21:执行策略空间的第一次释放,释放自车与所述博弈对象的策略空间,根据所释放的策略空间,逐一取出由自车的至少一个采样维度上的多个取值和博弈对象的至少一个采样维度上的多个取值形成的行为动作对。
在一些实施例中,执行第一次策略空间的释放时,释放的为纵向采样维度,包括自车的纵向加速度、博弈对象的纵向加速度,释放的纵向采样维度张成的策略空间为纵向采样策略空间,为简化描述,后文简称为第一次释放的纵向采样策略空间。这时,策略空间内为待评价的自车纵向加速度与其博弈对象(即他车)的纵向加速度构成的行为动作对。这时,在各采样维度上可以设置多个采样值。这些采样值中,可以由均匀且连续的采样间隔的多个采样值构成采样区间。这些采样值中,分散在采样维度上的多个采样值,则为离散的采样点。示例的,在自车的纵向加速度维度上,以预先确定的采样间隔进行均匀采样,可以得到自车在纵向加速度维度上的多个采样值,记为M1个,也即自车的M1个纵向加速度采样动作。在博弈对象的纵向加速度维度上,以预先确定的采样间隔进行均匀采样,可以得到博弈对象在纵向加速度维度上的多个采样值,记为N1个,也即博弈对象的N1个纵向加速度采样动作。则该第一次释放的纵向采样策略空间中包括M1*N1个由自车的纵向加速度采样动作与博弈对象的纵向加速度采样动作组合得到的自车与博弈对象的行为动作对。该策略空间的一具体例 子可参见后文表1或表2所示,表1或表2的第一行和第一列分别为自车和博弈对象(也即表格中的他车O)的纵向加速度采样值,表1中,博弈对象为横穿博弈车,表2中,博弈对象为对向博弈车。
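按上述采样方式张成第一次释放的纵向采样策略空间，可用如下示意代码生成M1*N1个行为动作对（Python；采样区间与间隔按上文取值）：

```python
import itertools

def sample(lo, hi, step):
    """在[lo, hi]上以step为采样间隔均匀采样。"""
    n = int(round((hi - lo) / step)) + 1
    return [lo + i * step for i in range(n)]

ego_acc = sample(-4.0, 3.0, 1.0)    # 自车的M1=8个纵向加速度采样动作
other_acc = sample(-4.0, 3.0, 1.0)  # 博弈对象的N1=8个纵向加速度采样动作

# 第一次释放的纵向采样策略空间：M1*N1=64个行为动作对
longitudinal_space = list(itertools.product(ego_acc, other_acc))
```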
S22:将所释放的策略空间中的各行为动作对推演到自车与所述博弈对象当前构建的子交通场景中,确定各行为动作对对应的代价值。
这时,自车与各博弈对象还可以分别构建各子交通场景,各子交通场景为自车所在道路场景的子集。
在一些实施例中,所述策略空间中各行为动作对对应的代价值,根据至少以下之一所确定:自车和博弈对象执行所述行为动作对相应的安全性代价值、舒适性代价值、横向偏移代价值、通过性代价值、路权代价值、风险区域代价值、帧间关联性代价值。
在一些实施例中,可以采用上述各代价值的加权和,此时为了区分各个代价值,计算的加权和可以称为总代价值,总代价值的数值越小,则自车与所述博弈对象执行所述行为动作对相应的决策收益越大,则所述行为动作对作为决策结果的可能性越大。关于上述各代价值,将在后文进一步进行描述。
S23:将代价值不大于代价阈值的行为动作对加入到自车与所述博弈对象的策略可行域内,该策略可行域即为第一次释放策略空间时自车与所述博弈对象的博弈结果。
其中,策略可行域指的是可执行的行为动作对的集合。例如,后文表1中表格内容为Cy或Cg的表格项即构成策略可行域。
S24:在当前策略空间中的策略可行域(即博弈结果)不为空时,可以将策略可行域中至少一个自车与博弈对象的可执行的行为动作对作为自车与所述博弈对象的决策结果,并结束本次策略空间的释放。
在所述策略可行域为空时,表示当前策略空间无解,此时执行策略空间的第二次释放,也即,释放多个策略空间中的下一个策略空间,在本实施例中,第二次释放的为横向采样维度,以由横向偏移张成横向采样策略空间,将本次释放的横向采样策略空间与第一次释放的纵向采样策略空间共同作为当前的策略空间,此时,当前用于自车与博弈对象交互博弈的策略空间为纵向采样策略空间与横向采样策略空间组合后张成的策略空间。
其中横向采样策略空间在自车的横向偏移维度上及在博弈对象的横向偏移维度上张成。示例的,在自车的横向偏移维度上,以预先确定的采样间隔进行均匀采样,可以得到自车在横向偏移维度上的多个采样值,记为Q个,也即自车的Q个横向偏移采样动作。在博弈对象的横向偏移维度上,以预先确定的采样间隔进行均匀采样,可以得到博弈对象在横向偏移维度上的多个采样值,记为R个,也即博弈对象的R个横向偏移采样动作。
这时,当前用于自车与博弈对象交互博弈的策略空间中,每一个自车与博弈对象的行为动作对由自车横向偏移采样动作、博弈对象横向偏移采样动作、自车纵向加速度采样动作、博弈对象纵向加速度采样动作共同构成。
假设当前的策略空间是由自车横向偏移的Q个取值、博弈对象横向偏移的R个取值、自车纵向加速度的M2个取值、博弈对象纵向加速度的N2个取值构成时,该第二次释放的自车与博弈对象的策略空间中包括M2*N2*Q*R个行为动作对。其具体 一个例子可参见后文表3所示,其中,表3中上部横向采样策略空间的每个表格项都关联一个表3中下部纵向采样策略空间的表格项。表3中上部横向采样策略空间的表格中,博弈对象(也即表格中的他车O)为对向博弈车。
S25:在执行策略空间第二次释放后,基于当前的策略空间,将所释放的各行为动作对推演到自车与所述博弈对象当前构建的子交通场景中,确定各行为动作对对应的代价值,进而确定策略可行域,以确定出博弈结果,该步骤可参见步骤S22-S23。
S26:在步骤S25的策略可行域(即博弈结果)不为空时,则可从中选择一行为动作对作为决策结果,并结束本次策略空间的释放。
在所述策略可行域为空时,表示当前策略空间无解,此时执行策略空间的第三次释放,也即,释放多个策略空间中的下一个策略空间。如此,按照上述方式可以继续依次释放其他策略空间,以继续执行博弈结果及决策结果的确定。
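将S21-S26的逐次释放与可行域搜索合并起来，可得到如下极简的决策主循环示意（Python；其中span_space、evaluate_cost为假设的辅助函数，分别对应由已释放空间张成行为动作对、以及将行为动作对推演到子交通场景求总代价值）：

```python
def decide(spaces, span_space, evaluate_cost, cost_threshold):
    """spaces: 按释放顺序排列的策略空间（先纵向采样维度，再横向采样维度）；
    返回(策略可行域, 决策结果)；全部释放后仍无解则返回保守决策。"""
    released = []
    for space in spaces:                     # 每次只释放一个策略空间
        released.append(space)
        feasible = []
        for pair in span_space(released):    # 已释放空间张成的行为动作对
            cost = evaluate_cost(pair)       # 推演到子交通场景，计算总代价值
            if cost <= cost_threshold:
                feasible.append((cost, pair))
        if feasible:                         # 可行域非空即结束释放，不再释放后续空间
            feasible.sort(key=lambda cp: cp[0])  # 优先选总代价值小的行为动作对
            return feasible, feasible[0][1]
    return [], "conservative"                # 无解：执行安全刹停或安全减速的保守决策
```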
在一些实施例中,以上多次释放的策略空间,可以是先释放由自车和/或博弈对象在纵向加速度维度上的第i组取值与自车和/或博弈对象在横向偏移维度上的第i组取值张成的策略空间,并在该策略空间内不存在策略可行域时,再释放由自车和/或博弈对象在纵向加速度维度上的第i+1组取值与自车和/或博弈对象在横向偏移维度上的第i+1组取值张成的策略空间。也即多次释放的策略空间,分别在自车和/或博弈对象在纵向加速度维度上的全部取值及自车和/或博弈对象在横向偏移维度上的全部取值张成的博弈空间内移动其所在的局部位置,其中,i为正整数。以上以第i组取值为例,展示了依次释放在各采样维度上的部分取值,以依次在博弈空间不同的局部策略空间内寻找策略可行域及确定决策结果。这样依次释放不同局部对应的策略空间,可以以最少的搜索次数在多维度博弈空间内求得最优的决策结果,且可以尽可能地减少策略空间的使用,降低对硬件算力的要求。
如,先释放下表3中由对向博弈车的横向偏移值0、自车的横向偏移值1、自车及对向博弈车的全部的纵向加速度值张成的策略空间;并在该策略空间内不存在策略可行域时,再释放由对向博弈车的横向偏移值0、自车的横向偏移值为2或3、自车及对向博弈车的全部的纵向加速度取值张成的策略空间。
在一些实施例中,如果经过上述步骤多次执行释放策略空间后,自车与博弈对象的策略可行域仍为空,即表示仍无解,此时,可以执行自车行驶的保守决策,所述保守决策包括使得自车安全刹停的行为动作,使得自车安全减速行驶的行为动作,或给出提示或警告,以由驾驶员接手对车辆的控制。
以上,执行完步骤S10-S20,即完成了一次单帧推演。在一些实施例中,当执行完步骤S10-S20,若策略可行域不为空时,则还可以包括:按照推演的时间的发展(也即依次增加推演深度,为连续的多个时刻),执行时间采样维度的多次释放,执行多帧推演。这时,当所述多次释放中在推演的一时刻(或称时间点)执行了一次释放后,完成一次单帧推演,以确定自车与所述博弈对象的策略可行域,并在该单帧推演确定出的自车与所述博弈对象的策略可行域非空时,执行所推演的下一时刻的释放,以执行下一次单帧推演,直到时间采样维度的多次释放结束或连续多帧推演结束。
这时,实现了一步决策中的多个单帧推演。如图7所示,T1用于指示自车及博弈对象的初始的运动状态,T2用于指示第一帧推演后自车及博弈对象的运动状态,也 即第一帧推演结果,Tn用于指示第n-1帧推演后自车及博弈对象的运动状态。
在一些实施例中,在每次执行释放时间采样维度时,将推演时间向后移动预定时间间隔(如,2秒或5秒),即移到下一时刻(或称为时间点)。相应地,将当前帧的推演结果,作为下一帧的推演初始条件,来推演下一时刻自车与所述博弈对象的运动状态;如此,按照该方式,时间采样维度可以继续在后续时刻继续释放,以继续执行后续的连续多帧的推演,以继续执行博弈结果及决策结果的确定。
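时间采样维度多帧推演的一个示意如下（Python；single_frame_decide、step_dynamics为假设的单帧决策与运动学推演函数）：

```python
def multi_frame_rollout(state, single_frame_decide, step_dynamics, horizon, dt=2.0):
    """时间采样维度的多帧推演：以当前帧的推演结果作为下一帧的推演初始条件。"""
    frames = []
    for _ in range(horizon):
        feasible, action_pair = single_frame_decide(state)   # 一次单帧推演
        if not feasible:
            break                            # 策略可行域为空，结束时间维度的释放
        state = step_dynamics(state, action_pair, dt)  # 推演dt秒后双方的运动状态
        frames.append((action_pair, state))
    return frames  # 整体收益满足决策要求时，取第一帧的决策结果作为最终决策结果
```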
以上,释放时间采样维度中,需要就相邻两个单帧推演确定出的自车与所述博弈对象的行为决策进行决策结果评价,并确定帧间关联性代价值,将在后文详细说明。释放时间采样维度有助于提高车辆的行为一致性,例如,当连续多帧推演的自车与所述博弈对象的运动状态或决策结果对应的意图决策是相同或相似的,则智能行驶车辆执行该智能驾驶决策方法的行驶行为从时域上看更稳定,行驶轨迹的波动性更小,车辆行驶的舒适性就更好。
上述释放时间采样维度,也即在连续的多个决策时刻,分别利用释放的多个策略空间,获取自车与所述博弈对象对应的策略可行域,并推演自车和博弈对象按照时间顺序依次执行这些策略可行域对应的可执行的行为动作对后的运动状态。通过对自车和博弈对象的运动状态和/或期望行驶轨迹进行长期推演,可以实现决策结果在时间上的一致性。
以上多帧推演结束后,若多帧推演的整体收益满足决策要求,这时,可以确定各帧博弈结果逐渐收敛至纳什均衡状态(Nash equilibrium)。这时,可将多帧推演中第一帧的决策结果作为所述自车行驶的决策结果。
在一些实施例中,若多帧推演的整体收益不满足决策要求,可以重新选择第一帧的决策结果。其中第一帧的决策结果对应单帧推演的决策结果,重新选择第一帧的决策结果,也就是从第一帧的决策结果的策略可行域中,选择另一行为动作对作为决策结果。针对再次选择的决策结果可以再次执行多帧推演,以判断其是否可以作为最终的决策结果。
在一些实施例中,在上述第一次以及重新选择,或多次重新选择第一帧的决策结果时,可以按照各行为动作对对应的代价值的排序结果进行选择,优先选择总代价值小的动作行为对对应的决策结果。
在一些实施例中,上述各个不同的代价值可以具有不同的权重,对应的可分别称为安全性权重、舒适性权重、横向偏移权重、通过性权重、路权权重、风险区域权重、帧间关联权重。并且,在一些实施例中,所述权重分配的大小可以按照如下分配:安全性权重>路权权重>横向偏移权重>通过性权重>舒适性权重>风险区域权重>帧间关联权重。在一些实施例中,上述代价值可分别经过归一化处理,取值区间为[0,1]。
在一些实施例中,上述各个代价值可以根据不同的代价函数计算得到,对应的可分别称为安全性代价函数、舒适性代价函数、通过性代价函数、横向偏移代价函数、路权代价函数。
在一些实施例中，安全性代价值可以根据以自车与他车（也即博弈对象）交互时的相对距离为自变量的安全性代价函数计算得到，并且安全性代价值与相对距离负相关。如，两车相对距离越大，则安全性代价值越小。如图8A所示，一个均一化处理后的安全性代价函数为如下的分段函数（分段形式按文中对图8A的描述恢复），其中，C_dist为安全性代价值，dist为自车和博弈对象之间的相对距离，如定义为自车和博弈对象之间的polygon最小距离：

$$C_{dist}=\begin{cases}1, & dist\le threLow\\ \dfrac{threHigh-dist}{threHigh-threLow}, & threLow<dist<threHigh\\ 0, & dist\ge threHigh\end{cases}$$
其中,threLow为距离下限阈值,如图8A中为0.2,threHigh为距离上限阈值,如图8A中为1.2。可选的,距离下限阈值threLow和距离上限阈值threHigh可以随自车和他车交互情况而动态调整,如随自车和他车的相对速度、相对距离、相对角度等动态调整。
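该安全性代价函数的一个实现示意如下（Python；分段形式按上文描述恢复）：

```python
def safety_cost(dist, thre_low=0.2, thre_high=1.2):
    """dist: 自车与博弈对象之间的polygon最小距离；返回[0,1]的安全性代价值。"""
    if dist <= thre_low:
        return 1.0
    if dist >= thre_high:
        return 0.0
    return (thre_high - dist) / (thre_high - thre_low)  # 与相对距离负相关
```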
在一些实施例中,安全性代价函数定义的安全性代价值与相对速度或相对角度正相关。如,对向或横向(横向指他车相对自车为交叉)会车时,两车交互的相对速度或相对角度越大,则对应的安全性代价值就越大。
在一些实施例中，车辆（自车或博弈对象）的舒适性代价值可以根据以加速度变化量（也即加加速度，jerk）的绝对值为自变量的舒适性代价函数计算得到。如图8B所示，一个均一化处理后的舒适性代价函数为如下的分段函数（分段形式按文中对图8B的描述恢复），其中，C_comf为舒适性代价值，jerk为自车或博弈对象的加速度变化量：

$$C_{comf}=\begin{cases}C_{middle}\cdot jerk, & 0\le jerk\le threMiddle\\ C_{middle}\cdot threMiddle+\dfrac{1-C_{middle}\cdot threMiddle}{threHigh-threMiddle}\,(jerk-threMiddle), & threMiddle<jerk<threHigh\\ 1, & jerk\ge threHigh\end{cases}$$

其中，threMiddle为jerk中间点阈值，如图8B中示例为2；threHigh为jerk上限阈值，如图8B中为4；C_middle为jerk代价斜率。也即，车辆的加速度变化量越大，则舒适性越差，舒适性代价值越大，并且，车辆的加速度变化量大于中间点阈值之后，舒适性代价值增加得更快。
在一些实施例中,车辆的加速度变化量可以是纵向加速度变化量、横向加速度变化量,或二者的加权和。在一些实施例中,舒适性代价值可以为自车的舒适性代价值,或博弈对象的舒适性代价值,或二者舒适性代价值的加权和。
在一些实施例中,通过性代价值可以根据以自车或博弈对象的速度变化量为自变量的通过性代价函数计算得到。如,车辆以较大减速度让行将导致速度损失(当前速度与未来速度之间的差值,也即加速度)较大或等待时间较长,则车辆的通过性代价值增加。如,车辆以较大加速度抢行将导致速度增加较大(当前速度与未来速度之间的差值,也即加速度)或等待时间较短,则车辆的通过性代价值减少。
在一些实施例中,通过性代价值还可以根据以自车与博弈对象的相对速度占比为自变量的通过性代价函数计算得到。如,在执行该动作对前,自车的速度绝对值在自车与博弈对象的速度绝对值之和的占比较大,博弈对象的速度绝对值在自车与博弈对象的速度绝对值之和中的占比较小。在执行该行为动作对后,自车以较大减速度让行, 则其速度损失增加,速度占比减小,则自车执行该行为动作对应的通过性代价值较大。而若在执行该行为动作对后,博弈对象以较大加速度抢行,则其速度增加,速度占比增大,则博弈对象执行该行为动作对应的通过性代价值较小。
在一些实施例中,通过性代价值为自车执行所述行为动作对相应的自车通过性代价值,或博弈对象执行所述行为动作对相应的博弈对象通过性代价值,或二者通过性代价值的加权和。
在一些实施例中，如图8C所示，均一化处理后的通过性代价函数为如下的分段函数（分段形式按文中对图8C的描述恢复），其中C_pass为通过性代价值，speed为车辆的速度绝对值：

$$C_{pass}=\begin{cases}1-C_{middle}\cdot speed, & 0\le speed\le speed0\\ (1-C_{middle}\cdot speed0)\cdot\dfrac{speed1-speed}{speed1-speed0}, & speed0<speed<speed1\\ 0, & speed\ge speed1\end{cases}$$

其中，车辆的中间点速度绝对值为speed0，车辆的速度绝对值的最大值为speed1，C_middle为速度代价斜率。也即，车辆的速度绝对值越大，则通过性越好，通过性代价值越小，并且，车辆的速度绝对值大于中间点阈值之后，通过性代价值减少得更快。
在一些实施例中,可以根据获得的自车或博弈对象的用户画像来确定车辆对应的路权信息,如,若博弈对象的驾驶行为属于激进风格,更倾向采用抢行决策,则为高路权,若博弈对象的驾驶行为属于保守风格,更倾向采用让行策略,则为低路权。其中,高路权倾向保持既定运动状态或既定行驶行为,低路权更倾向改变既定运动状态或既定行驶行为。
在一些实施例中,用户画像可以根据用户的性别、年龄或历史行为动作的完成情况确定。在一些实施例中,可以由云服务器获取确定用户画像所需数据并确定用户画像。若自车和/或博弈对象执行所述行为动作对使得高路权的车辆改变运动状态,则所述行为动作对相应的路权代价值较大,收益越小。
在一些实施例中,通过为导致高路权车辆运动状态改变的行为决策确定一个较高的路权代价值以增加惩罚。也即,通过这个反馈机制,使得高路权车保持当前运动状态的行为动作对具有较大的路权收益,也即较小的路权代价值。
在一些实施例中，如图8D所示，均一化处理后的路权代价函数为如下的分段函数（分段形式按文中对图8D的描述恢复），其中C_roadRight为路权代价值，acc为车辆的加速度绝对值：

$$C_{roadRight}=\begin{cases}\dfrac{acc}{threHigh}, & 0\le acc<threHigh\\ 1, & acc\ge threHigh\end{cases}$$

其中，threHigh为加速度上限阈值，如图8D中为1。也即，车辆的加速度越大，则路权代价值越大。
也即,路权代价函数使得高路权车保持当前运动状态的行为动作具有较小的路权代价值,从而可以避免高路权车改变当前运动状态的行为动作对成为决策结果。
在一些实施例中,车辆的加速度可以是纵向加速度或横向加速度。也即,在横向偏移维度上,横向变化大也将使得路权代价值较大。在一些实施例中,路权代价值可以为自车执行所述行为动作对相应的路权代价值,或博弈对象执行所述行为动作对相 应的路权代价值,或二者路权代价值的加权和。
在一些实施例中,如,车辆处于道路内风险区域(该区域内,车辆有较大的行车风险,需要尽快离开该风险区域),则需要对车辆让行策略施以较高的风险区域代价值以增加惩罚,通过不选择车辆让行行为而选择车辆抢行行为作为决策结果,以使得车辆尽快驶离风险区域,也即进行车辆尽快驶离风险区域的决策,以保证车辆尽快驶离风险区域,不会对交通产生严重影响。
也即,通过风险区域代价值越大则策略收益越小这个反馈机制,使得处于道路内风险区域的车辆不发生让行行为,也即放弃导致处于道路内风险区域的车辆发生让行行为的行为决策(该行为决策具有较大风险区域代价值),而选择处于道路内风险区域的车辆发生抢行行为以尽快驶离风险区域的决策结果(具有较小风险区域代价值),从而避免处于道路内风险区域的车辆滞留并对交通产生严重影响。
在一些实施例中，风险区域代价值可以为处于道路内风险区域的自车执行所述行为动作对相应的风险区域代价值，或处于道路内风险区域的博弈对象执行所述行为动作对相应的风险区域代价值，或二者风险区域代价值的加权和。在一些实施例中，横向偏移代价值可以根据自车或博弈对象的横向偏移量计算得到。如图8E所示，均一化处理后的横向偏移代价函数在右半空间为如下的分段函数（分段形式按文中对图8E的描述恢复，其中第二横向偏移代价斜率C_high为按"第一横向偏移代价斜率"的命名推断的系数），其中C_offset为横向偏移代价值，offset为车辆的横向偏移量，单位为米，这时，左半空间的方程表达可以对坐标平面右半空间的方程表达取反得到：

$$C_{offset}=\begin{cases}C_{middle}\cdot offset, & 0\le offset\le threMiddle\\ C_{middle}\cdot threMiddle+C_{high}\cdot(offset-threMiddle), & threMiddle<offset\le threHigh\\ 1.2, & offset>threHigh\end{cases}$$

其中，threMiddle为横向偏移中间值，如，为道路软边界；C_middle为第一横向偏移代价斜率；threHigh为横向偏移上限阈值，如，为道路硬边界。也即，车辆的横向偏移越大，则横向偏移收益越小，横向偏移代价值越大，并且，车辆的横向偏移量大于横向偏移中间值之后，横向偏移代价值增加得更快，以增加惩罚。车辆的横向偏移量大于横向偏移上限阈值之后，示例的，横向偏移代价值为固定值1.2，以增加惩罚。
在一些实施例中,横向偏移代价值可以为自车执行所述行为动作对相应的横向偏移代价值,或博弈对象执行所述行为动作对相应的横向偏移代价值,或二者横向偏移代价值的加权和。
在前述的多帧推演步骤中,需要就相邻两个单帧推演确定出的自车与所述博弈对象的行为决策进行决策结果评价,并确定帧间关联性代价值。
在一些实施例中,如图8F所示,自车上一帧K的意图决策为抢行博弈对象,则若自车当前帧K+1的意图决策为抢行博弈对象时,对应的帧间关联性代价值会较小,如0.3,而默认值为0.5,因此为奖励。而若自车当前帧K+1的意图决策为让行博弈对象时,对应的帧间关联性代价值会较大,如0.8,而默认值为0.5,因此为惩罚。这时,选择使自车当前帧的意图决策为抢行博弈对象的策略成为当前帧的可行解。经过如上 针对帧间关联性代价值的惩罚或奖励,可以保证自车在当前帧的意图决策与上一帧的意图决策保持一致,从而使得自车在当前帧的运动状态与前一帧的运动状态保持一致,从时域上稳定自车的行为决策。
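上述帧间关联性代价值的奖惩逻辑可示意如下（Python；0.3、0.5、0.8按上文取值）：

```python
def inter_frame_cost(prev_intent, cur_intent):
    """prev_intent/cur_intent: 上一帧与当前帧的意图决策，如'抢行'、'让行'。"""
    if prev_intent is None:
        return 0.5      # 无上一帧决策时取默认值
    if cur_intent == prev_intent:
        return 0.3      # 意图一致：奖励，代价值低于默认值
    return 0.8          # 意图翻转：惩罚，代价值高于默认值
```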
在一些实施例中，帧间关联性代价值可以根据上一帧自车的意图决策和当前帧自车的意图决策计算得到，也可以根据上一帧博弈对象的意图决策和当前帧博弈对象的意图决策计算得到，或根据自车和博弈对象二者的帧间关联性代价值加权后得到。
在一些实施例中,如图4所示,在步骤S20确定决策结果之后,还可包括下述步骤S30和/或S40:
步骤S30:自车根据所述决策结果,生成纵向/横向控制量,以由自车的行驶***执行所述纵向/横向控制量实现自车期望行驶轨迹。
在一些实施例中,由自车的控制装置根据决策结果生成纵向/横向控制量,并将纵向/横向控制量发送给行驶***13,以通过行驶***13执行对车辆的行驶控制,包括动力控制、转向控制、制动控制,实现车辆根据所述决策结果执行自车的期望行驶轨迹。
步骤S40:将所述决策结果以用户可理解的方式显示在显示装置中。
其中,自车行驶的决策结果包括自车的行为动作。根据自车的行为动作,和当前帧推演中决策开始时刻获取的自车的运动状态,可以预测自车的意图决策,如抢行、让行或避让,还可以预测自车的期望行驶轨迹。在一些实施例中,以用户可理解的方式,例如期望行驶轨迹、指示意图决策的箭头、指示意图决策的文字等形式将决策结果显示在车辆座舱内的显示装置。在一些实施例中,当显示期望行驶轨迹时,可以结合车辆的当前交通场景(如图形化的交通场景),以局部放大视图的形式显示在车辆座舱内的显示装置中。在一些实施例中,还可包括语音播放***,可以通过播放语音方式提示用户所决策出的意图决策或策略标签。
在一些实施例中,考虑到自车与自车的非博弈对象之间的单向交互决策,或自车的博弈对象与自车的博弈对象的非博弈对象之间的单向交互决策,如图9所示的另一实施例,上述步骤S10与步骤S20之间,还包括下述步骤:
S15:对自车或博弈对象的策略空间,通过非博弈对象的运动状态进行约束。
在一些实施例中,可以就自车的非博弈对象的运动状态,对自车在各采样维度上的取值范围进行约束;
在一些实施例中,可以就自车的博弈对象的非博弈对象的运动状态,对自车的博弈对象在各采样维度上的取值范围进行约束。
在一些实施例中,取值范围可以是在采样维度上的一个或多个采样区间,也可以是离散的多个采样点。约束之后的取值范围可以为部分取值范围。
在一些实施例中,步骤S15包括:就自车与其非博弈对象的单向交互过程,确定在非博弈对象的运动状态的约束下,自车在各采样维度上的取值范围;或就自车的博弈对象与自车的博弈对象的非博弈对象的单向交互过程,确定在自车的博弈对象的非博弈对象的运动状态的约束下,自车的博弈对象在各采样维度上的取值范围。
由于非博弈对象不参与交互博弈，其运动状态保持不变，因此，将自车在各采样维度上的取值范围通过自车的非博弈对象的运动状态进行约束后，以及将自车的博弈对象在各采样维度上的取值范围通过自车的博弈对象的非博弈对象的运动状态进行约束后，再进行步骤S20，有利于缩小自车与自车的博弈对象进行单车交互博弈决策过程中的博弈空间及策略空间，减少交互博弈决策过程使用的算力。
在一些实施例中，针对自车进行约束，步骤S15可以包括：首先，接收自车的非博弈对象的运动状态，并观测该非博弈对象的特征量，如位置、速度、加速度等；然后，计算自车与该非博弈对象的冲突区域，确定自车的特征量，也即临界动作。如上，基于非博弈对象的位置、速度、加速度和/或行驶轨迹，计算自车做出避让、抢行或让行等意图决策的对应的临界动作，生成自车在各采样维度上针对该非博弈对象的可行区间，也即自车在各采样维度上经过非博弈对象约束后的取值范围。
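用非博弈对象约束自车纵向加速度取值范围（也即计算临界动作）的一个示意如下（Python；其中的匀加速运动学为便于说明而假设的简化形式）：

```python
def constrain_acc_range(acc_lo, acc_hi, ego_v, gap, horizon=3.0):
    """根据自车与非博弈对象的纵向间距gap（米）估计临界加速度，
    返回约束后的纵向加速度取值范围（假设的匀加速运动学简化）。"""
    # 临界动作：horizon秒内行驶距离不超过gap，即 v*t + 0.5*a*t^2 <= gap
    a_crit = 2.0 * (gap - ego_v * horizon) / (horizon ** 2)
    return acc_lo, min(acc_hi, a_crit)   # 收窄上界，缩小张成的策略空间

lo, hi = constrain_acc_range(-4.0, 3.0, ego_v=5.0, gap=20.0)  # -> (-4.0, 约1.11)
```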
在一些实施例中,同样可以针对自车的博弈对象进行约束,相对于上述针对自车进行约束,将自车更改为自车的博弈对象即可,来生成自车的博弈对象在各采样维度上经过非博弈对象约束后的取值范围。
在一些实施例中,当自车存在非博弈对象C时,也可以为:先处理自车A与自车的博弈对象B之间的交互博弈决策,确定对应的策略可行域AB,之后再引入自车与非博弈对象C的非博弈可行域AC,然后将策略可行域AB与非博弈可行域AC这两者取交集,获得最终的策略可行域ABC,基于该最终的策略可行域来确定针对自车行驶的决策结果。
在一些实施例中,当自车的博弈对象B存在非博弈对象D时,也可以先处理自车A与自车的博弈对象B之间的交互博弈决策,确定对应的策略可行域AB,之后再引入自车的博弈对象B与其非博弈对象D的非博弈可行域BD,然后将策略可行域AB与非博弈可行域BD两者取交集,获得最终的可行域ABD,基于该最终的策略可行域来确定针对自车行驶的决策结果。
在一些实施例中,当自车存在非博弈对象C,且自车的博弈对象B存在非博弈对象D时,也可以先处理自车A与自车的博弈对象B之间的交互博弈决策,确定对应的策略可行域AB,之后再引入自车A与非博弈对象C的非博弈可行域AC、及自车的博弈对象B与其非博弈对象D的非博弈可行域BD,然后将策略可行域AB、非博弈可行域AC、与非博弈可行域BD这三者取交集,获得最终的可行域ABCD,并基于该最终的策略可行域来确定针对自车行驶的决策结果。
以上具体示例了通过逐次释放自车与单个博弈对象的多个策略空间,确定自车与博弈对象的可执行的行为动作对的步骤。在一些实施例中,当自车有两个以上的博弈对象时,例如以两个博弈对象,包括第一博弈对象、及第二博弈对象为例,此时,本申请实施例提供的智能驾驶决策方法,包括:
第一步:从自车与第一个博弈对象的多个策略空间中,执行各所述策略空间的逐次释放,确定针对第一个博弈对象的所述自车行驶的策略可行域。其中,确定针对第一个博弈对象的所述自车行驶的策略可行域,与前述的步骤S20相似。这里不再赘述。
第二步:从自车与第二个博弈对象的多个策略空间中,执行各所述策略空间的逐次释放,确定针对第二个博弈对象的所述自车行驶的策略可行域。其中,确定针对第二个博弈对象的所述自车行驶的策略可行域,与前述的步骤S20相似。这里不再赘述。
第三步:根据自车与各个所述博弈对象的各个策略可行域,确定所述自车行驶的决策结果。在一些实施例中,通过对各个策略可行域取交集,来获得最终的策略可行域,进而从该策略可行域中确定决策结果。在一些实施例中,决策结果可以是该策略可行域中代价值最小的行为动作对。
下面,对本申请实施例提供的智能驾驶决策方法的一具体实施方式进行介绍。本具体实施方式仍以应用于路面车辆行驶的一个交通场景为例进行说明,如图10示出了本具体实施方式的场景为,自车101行驶在一公路上,公路为双向单车道,对向行驶来车辆103,即对向博弈车。自车前方具有一将要横穿公路的车辆102,即横穿博弈车。下面参见图11所示的流程图,对本申请具体实施方式提供的行驶控制方法进行详细描述,包括以下步骤:
S110:自车通过环境信息获取装置11获取车外环境信息。
该步骤可参见前述步骤S11,不再赘述。
S120:自车确定出博弈对象、非博弈对象。
该步骤可参见前述步骤S12,不再赘述,本步骤中,确定出自车的一博弈对象为横穿博弈车,一博弈对象为对向博弈车。
S130:从自车与横穿博弈车的多个策略空间中,执行各所述策略空间的逐次释放,并确定自车与横穿博弈车的博弈结果。具体可包括下述步骤S131-S132:
S131:按照先纵向采样维度再横向采样维度的释放原则,释放自车及横穿博弈车的纵向加速度维度。
从自车与横穿博弈车的多维博弈空间内，释放自车及横穿博弈车的纵向加速度维度，张成第一个自车及横穿博弈车的纵向采样策略空间。考虑自车及横穿博弈车的车辆纵向/横向动力学、运动学约束、自车及横穿博弈车的相对位置关系和相对速度关系，并认为两车具有相同的机动能力，确定自车及横穿博弈车的纵向加速度取值区间均为[-4,3]，单位为m/s²，其中m表示米，s表示秒。根据自车的计算能力和预先设定的决策精度，确定自车及横穿博弈车的采样间隔均为1 m/s²。
表1.第一次释放的自车及横穿博弈车的纵向采样策略空间
【表1图像：8×8行为动作对矩阵；第一行为自车纵向加速度Ae的全部取值，第一列为横穿博弈车纵向加速度Ao1的全部取值，取值区间均为[-4,3] m/s²、采样间隔1 m/s²；单元格标签：0表示通过性不可行解，-1表示安全性不可行解，Cg表示自车抢行的可行解，Cy表示横穿博弈车抢行的可行解】
张成的该策略空间以二维表格展示时如表1所示。表1的第一行罗列了自车的纵 向加速度的全部取值Ae,第一列罗列了横穿博弈车的纵向加速度值的全部取值Ao1。也即,本次释放的自车及横穿博弈车的纵向采样策略空间中包括8乘以8也即64个自车及横穿博弈车的纵向加速度行为动作对。
S132:根据预先定义的各代价值确定方法,如各代价函数,分别计算自车及横穿博弈车的纵向采样策略空间中各行为动作对对应的代价值,并确定策略可行域。
在释放的表1中的64个行为动作对中,自车及横穿博弈车执行其中9个采样动作后,在自车及横穿博弈车构建的子交通场景中,通过性太差(如刹停),为不可行解,在表1中,这些动作对用标签“0”来标识。
在释放的64个行为动作对中,自车及横穿博弈车执行其中39个采样动作后,在自车及横穿博弈车构建的子交通场景中,安全性太差(如碰撞),为不可行解,在表1中,这些动作对用标签“-1”来标识。
在释放的64个行为动作对中,自车及横穿博弈车执行其中3加13,也即16个采样动作后,在自车及横穿博弈车构建的子交通场景中,安全性代价值、舒适性代价值、通过性代价值、横向偏移代价值、路权代价值、风险区域代价值、帧间关联性代价值的加权和大于预先设定的代价阈值,为该策略空间内的可行解,构成了自车及横穿博弈车的策略可行域。
其中,因为释放的是纵向采样策略空间,不涉及横向偏移,因此横向偏移代价值为零。因为是当前帧决策,不涉及前一帧的决策结果,因此帧间关联性代价值为零。
这时,自车与横穿博弈车的交互博弈在纵向采样策略空间找到充分多的可行解,不再需要在横向采样维度继续寻找解决方案。这时,搜索的行为动作对数为64,本轮博弈消耗了较少的算力和较少的计算时间。
另外,还可以根据策略可行域内的各行为动作对对应的代价值,为各行为动作对添加决策标签。
自车及横穿博弈车执行其中3个采样动作后,自车及横穿博弈车的行为决策是:自车加速行进,且横穿博弈车减速行进。根据当前帧推演中决策开始时刻获取的自车及横穿博弈的运动状态,可以推演出自车及横穿博弈车执行这3个采样动作中的任一个之后,自车先于横穿博弈车通过冲突区域,因此确定这些行为动作对对应的意图决策为自车抢行横穿博弈车。相应地,为这3个行为动作对设置自车抢行决策标签,也即表1中的“Cg”。
自车及横穿博弈车执行其中13个采样动作后,自车及横穿博弈车的行为决策是:横穿博弈车加速行进,且自车减速行进。根据当前帧推演中决策开始时刻获取的自车及横穿博弈的运动状态,可以推演出自车及横穿博弈车执行这13个采样动作中的任一个之后,横穿博弈车先于自车通过冲突区域,因此确定这些行为动作对对应的意图决策为横穿博弈车抢行自车。相应地,为这13个行为动作对设置横穿博弈车抢行决策标签,也即表1中的“Cy”。
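根据自车与博弈对象通过冲突区域的先后次序为可行解设置决策标签的示意如下（Python；time_to_conflict为假设的推演函数）：

```python
def intent_label(ego_state, rival_state, action_pair, time_to_conflict):
    """比较双方按该行为动作对行驶后到达冲突区域的时间，返回决策标签。"""
    t_ego = time_to_conflict(ego_state, action_pair[0])      # 自车到达冲突区域的时间
    t_rival = time_to_conflict(rival_state, action_pair[1])  # 博弈对象到达的时间
    return "Cg" if t_ego < t_rival else "Cy"  # Cg: 自车抢行；Cy: 博弈对象抢行
```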
S140:从自车与对向博弈车的多个策略空间中,执行各所述策略空间的逐次释放,并确定自车与对向博弈车的博弈结果。具体可包括下述步骤S141-S144:
S141:按照先纵向采样维度再横向采样维度的释放原则,释放自车及对向博弈车的纵向采样维度,张成第一个自车及对向博弈车的纵向采样策略空间。
从自车与对向博弈车的多维博弈空间内，释放自车及对向博弈车的纵向加速度维度，张成第一个自车及对向博弈车的纵向采样策略空间。考虑自车及对向博弈车的车辆纵向/横向动力学、运动学约束、自车及对向博弈车的相对位置关系和相对速度关系，确定自车及对向博弈车的纵向加速度取值区间均为[-4,3]，单位为m/s²，确定自车及对向博弈车的采样间隔均为1 m/s²。
张成的该策略空间以二维表格展示如表2所示。表2的第一行罗列了自车的纵向加速度值的全部取值,Ae;第一列罗列了对向博弈车的纵向加速度值的全部取值,Ao2。也即,本次释放的自车及对向博弈车的纵向采样策略空间中包括8乘以8,共64个自车及对向博弈车的纵向加速度行为动作对。
表2.释放的自车及对向博弈车的纵向采样策略空间
【表2图像：8×8行为动作对矩阵；第一行为自车纵向加速度Ae的全部取值，第一列为对向博弈车纵向加速度Ao2的全部取值；单元格标签：0表示通过性不可行解，-1表示安全性不可行解；本次释放的策略空间内无可行解】
S142：根据预先定义的各代价值确定方法，如各代价函数，分别计算自车及对向博弈车的纵向采样策略空间中各行为动作对对应的代价值，并确定策略可行域。
在释放的64个行为动作对中,自车及对向博弈车执行其中9个采样动作后,在自车及对向博弈车构建的子交通场景中,通过性太差(如刹停),为不可行解,在表2中,这些动作对用标签“0”来标识。
在释放的64个行为动作对中,自车及对向博弈车执行其中55个采样动作后,在自车及对向博弈车构建的子交通场景中,安全性太差(如碰撞),为不可行解,在表2中,这些动作对用标签“-1”来标识。
也即,自车及对向博弈车执行释放的64个行为动作对后,在自车及对向博弈车构建的子交通场景中,安全性代价值或通过性代价值均大于预先设定的代价阈值,第一次释放的策略空间内没有可行解,自车及对向博弈车的策略可行域为空。
S143:释放自车的横向偏移维度,与所述自车及对向博弈车的纵向加速度维度张成第二个自车及对向博弈车的策略空间。
具体的,释放自车在横向偏移维度上的部分取值与自车及对向博弈车在纵向加速度维度上的部分取值,张成第二个自车及对向博弈车的策略空间。
首先,从自车与对向博弈车的多维博弈空间内,确定自车及对向博弈车在横向偏移维度上张成的最大横向采样策略空间。图10示出了针对这2个车辆分别针对横向 偏移采样而确定的横向采样动作的示意图。也即,多个横向偏移行为动作对应于车辆可以执行的多条相互平行的横向偏移轨迹。
考虑自车及对向博弈车的车辆纵向/横向动力学、运动学约束、自车及对向博弈车的相对位置关系和相对速度关系,确定自车及对向博弈车的横向偏移取值区间均为[-3,3],单位为m,m表示米。在采样时,根据自车的计算能力和预先设定的决策精度,确定自车及对向博弈车的采样间隔均为1m,这时释放的自车及对向博弈车的横向采样策略空间以二维表格展示如表3的上子表所示。表3的上子表的第一行罗列了自车的横向偏移值的全部取值Oe,第一列罗列了对向博弈车的横向偏移值的全部取值Oo2。故自车及对向博弈车在横向偏移维度上张成的横向采样策略空间中最多包括7乘以7,也即49个自车及对向博弈车的横向偏移行为动作对。
车辆行驶时,不能独立发生横向偏移而不发生纵向行为动作。因此,在释放自车及对向博弈车的横向采样策略空间的同时,需释放自车及对向博弈车在纵向加速度维度上的多个行为动作对。
为了减少算力、节约计算资源，在本次释放时，只释放自车在横向偏移维度上的部分取值，并与自车及对向博弈车在纵向加速度维度上的部分取值张成第二次释放的策略空间。这时，对向博弈车的横向偏移值为零。如表3的上子表所示，从横向采样策略空间中，选择对向博弈车的横向偏移值为0及自车的横向偏移值分别为-3，-2，-1，0，1，2或3组成7个横向偏移行为动作对。用这7个横向偏移行为动作对分别与前一次释放的64个自车及对向博弈车的纵向加速度行为动作对（如表2所示）进行组合，可以得到7乘以64，也即448个行为动作对，这时，每一个行为动作对中，对向博弈车的横向偏移值为0。与自车及对向博弈车的纵向加速度采样对应的策略空间最多可以释放64个行为动作对相比，这时释放的行为动作对的数量增加了6倍，为第一次释放的7倍。
S144:根据各代价值确定方法,如各代价函数,分别计算自车在横向偏移维度上的部分取值与自车及对向博弈车在纵向加速度维度上的部分取值张成第二次释放的策略空间中各行为动作对对应的代价值,并确定策略可行域。
如表3的下子表所示,在自车横向偏移值为1时,在释放的64个自车与对向博弈车的纵向加速度行为动作对中,自车及对向博弈车执行其中16个采样动作后,在自车及对向博弈车构建的子交通场景中,通过性太差(如刹停),为不可行解,在表3中,这些动作对用标签“0”来标识。
在自车横向偏移值为1时,在释放的64个自车与对向博弈车纵向加速度行为动作对中,自车及对向博弈车执行其中48个采样动作后,在自车及对向博弈车构建的子交通场景中,安全性、舒适性、通过性、横向偏移代价值、路权代价值、风险区域代价值、帧间关联性代价值的加权和大于预先设定的代价阈值,为该策略空间内的可行解,构成了自车及对向博弈车的策略可行域。在表3中,这48个动作对用标签“1”来标识。这时,因为是在当前帧交互博弈,不涉及前一帧的决策结果,因此帧间关联性代价值为零。
这时,自车与对向博弈车的交互博弈已经找到48个可行解,不再需要继续在自车与对向博弈车的博弈空间内寻找解决方案。这时,搜索的行为动作对总数为64,本 轮博弈消耗了较少的算力和较少的计算时间。
也即,用对向博弈车的横向偏移值为0及自车的横向偏移值为1的横向偏移行为动作对分别与前一次释放的64个自车及对向博弈车的纵向加速度行为动作对进行组合,得到的64个行为动作对中,有48个可行解(表3中用阴影底纹示出了可行解),这些可行解可以加入到自车及对向博弈车的策略可行域内。
这是因为,自车向右横向偏移1m(以自车为参考,向右横向偏移为正,向左横向偏移为负)后,由于自车及对向博弈车已在横向错开,在自车及对向博弈车的纵向采样策略空间内,其策略可行域覆盖了除两车都刹停外的所有情况(表3中用底纹示出了该动作对)。
并且,相对于表2,表3的下子表中自车及对向博弈车的纵向加速度分别为-1时对应的行为动作对的标签从“-1”调整到了“0”。这是因为,在自车向右横向偏移1m时,已经可以使得自车及对向博弈车不再具有碰撞风险,这些行为动作对映射到自车及对向博弈车构建的交通场景中的通过性太差(为刹停),仍旧为不可行解,但是标签从“-1”调整为了“0”。
并且,这时,针对自车及对向博弈车也可以确定意图决策,并设置策略标签,可以参考步骤S132,这里不再赘述。
表3.自车及对向博弈车的横向采样策略空间、自车及对向博弈车的纵向采样策略空间共同张成的策略空间
【表3图像：上子表为横向采样策略空间（第一行为自车横向偏移Oe的全部取值，第一列为对向博弈车横向偏移Oo2的全部取值，取值区间均为[-3,3] m、采样间隔1 m），上子表的每个表格项关联下子表的纵向采样策略空间；在对向博弈车横向偏移为0、自车横向偏移为1时，64个纵向加速度行为动作对中48个为可行解（标签1，阴影示出），16个为通过性不可行解（标签0）】
以上就自车与对向博弈车的策略空间的释放,还可以在自车的横向偏移维度上选择多个采样值,如,自车的横向偏移值分别为2或3,并与自车与对向博弈车的纵向加速度采样策略空间共同张成更多的策略空间。
本实施例中，用对向博弈车的横向偏移值为0及自车的横向偏移值为1的横向偏移行为动作对及前一次释放的64个自车及对向博弈车的纵向加速度行为动作对张成第二次释放的策略空间，并从该策略空间中搜索到48个可行解，因此，不再需要释放其他的策略空间。这时，交互博弈消耗了较少的算力和较少的计算时间。
表4.自车及对向博弈车、与自车及横穿博弈车的可行解
【表4图像：公共可行域中代价值最小的可行解，为多元行为动作对：自车纵向加速度-2 m/s²、自车横向偏移1 m、对向博弈车纵向加速度1 m/s²、横穿博弈车纵向加速度1 m/s²】
S150:就自车及对向博弈车的策略可行域与自车及横穿博弈车的策略可行域求交集,确定自车的博弈结果。
针对确定的自车及横穿博弈车的策略可行域和自车及对向博弈车的策略可行域取交集,找到两者的公共可行域,并从公共可行域中找到代价值最小(也即收益最好)的可行解。
表4展示了表3中自车及对向博弈车的策略可行域与表1中自车及横穿博弈车的策略可行域的公共可行域中找到的代价值最小(也即收益最好)的可行解。该可行解为自车、对向博弈车及横穿博弈车的博弈决策动作对,是由自车的纵向加速度、对向博弈车的纵向加速度、自车的横向偏移、横穿博弈车的纵向加速度组合而成的多元行为动作对。
也即，自车以-2 m/s²纵向加速度减速让行，并向右横向偏移1m避让对向博弈车；为保证通行性，横穿博弈车以1 m/s²纵向加速度加速通过冲突区域；对向博弈车以1 m/s²纵向加速度加速通过冲突区域。
自车、对向博弈车及横穿博弈车执行其中该行为动作后,意图决策分别是:横穿博弈车抢行自车,对向博弈车抢行自车,及自车横向向右避让对向博弈车,自车让行横穿博弈车。
S160:从自车的博弈结果中,选择决策结果,可以选择代价值最小的对应的动作对,据此确定出自车的可执行动作,可用于控制自车执行该动作。
在一些实施例中,对于博弈结果中的多个策略可行域,可以根据代价值来选择出一动作对作为决策结果。
在一些实施例中,对于博弈结果中的策略可行域中的多个解(即行为动作对),还可以进一步针对各个解进行连续多帧推演,也即,释放时间采样维度,以选择出在时间维度上一致性较好的行为动作对,作为自车行驶的决策结果。具体可以参照前述对图7的描述。
如图12所示,本申请还提供了相应的一种智能驾驶决策装置的实施例,关于该装置的有益效果或解决的技术问题,可以参见与各装置分别对应的方法中的描述,或者参见发明内容中的描述,此处不再一一赘述。
在该智能驾驶决策装置的实施例中,该智能驾驶决策装置100包括:
获取模块110,用于获取与自车的博弈对象。具体的,用于执行上述步骤S10或步骤S110-S120,或该步骤对应的各个可选的实施例。
处理模块120,用于从自车与博弈对象的多个策略空间中,执行多个策略空间的多次释放,当多次释放中的一次释放执行后,根据已经释放的各策略空间确定自车与博弈对象的策略可行域,根据策略可行域确定自车行驶的决策结果。具体的,用于执行上述步骤S20-S40,或该步骤对应的各个可选的实施例。
在一些实施例中,多个策略空间的维度包括至少以下之一:纵向采样维度、横向采样维度、或时间采样维度。
在一些实施例中,执行多个策略空间的多次释放包括按照以下维度的顺序执行所述释放:纵向采样维度、横向采样维度、时间采样维度。
在一些实施例中,确定自车与博弈对象的策略可行域时,策略可行域中的行为动 作对的总代价值,根据以下之一或多个确定:自车或博弈对象的安全性代价值、路权代价值、横向偏移代价值、通过性代价值、舒适性代价值、帧间关联性代价值、风险区域代价值。
在一些实施例中,行为动作对的总代价值根据两个或两个以上的代价值进行确定时,各代价值具有不同的权重。
在一些实施例中,博弈对象包括两个或两个以上时,自车行驶的决策结果根据自车与各博弈对象的各策略可行域确定。
在一些实施例中,获取模块110,还用于获取自车的非博弈对象;处理模块120,还用于确定出自车与非博弈对象的策略可行域;自车与非博弈对象的策略可行域包括自车相对于非博弈对象可执行的行为动作;至少根据自车与非博弈对象的策略可行域确定自车行驶的决策结果。
在一些实施例中,处理模块120,还用于根据自车与各博弈对象的各策略可行域的交集确定自车行驶的决策结果的策略可行域,或根据自车与各博弈对象的各策略可行域以及自车与各非博弈对象的各策略可行域的交集确定自车行驶的决策结果的策略可行域。
在一些实施例中,获取模块110,还用于获取自车的非博弈对象;处理模块120,还用于根据非博弈对象的运动状态,约束与自车对应的纵向采样策略空间,或约束与自车对应的横向采样策略空间。
在一些实施例中,获取模块110,还用于获取自车的博弈对象的非博弈对象;处理模块120,还用于根据非博弈对象的运动状态,约束与自车的博弈对象对应的纵向采样策略空间,或约束与自车的博弈对象对应的横向采样策略空间。
在一些实施例中,交集为空集时,执行自车行驶的保守决策,保守决策包括使自车安全停车的动作,或,使自车安全减速行驶的动作。
在一些实施例中,博弈对象或非博弈对象,根据注意力方式进行确定。
在一些实施例中,处理模块120,还用于通过人机交互界面显示至少以下之一:自车行驶的决策结果、决策结果的策略可行域、自车行驶的决策结果对应的自车行驶轨迹、或自车行驶的决策结果对应的博弈对象的行驶轨迹。
以上,自车行驶的决策结果可以是当前单帧推演的决策结果,还可以是已经执行的多个单帧推演分别对应的决策结果,决策结果可以是自车可执行的行为动作,还可以是博弈对象可执行的行为动作,还可以是自车执行该行为动作对应的意图决策如表1中的Cg或Cy,如抢行、让行或避让。
以上,决策结果的策略可行域可以是当前单帧推演的策略可行域,还可以是已经执行的多个单帧推演分别对应的策略可行域。
以上,自车行驶的决策结果对应的自车行驶轨迹可以是在一步决策中第一个单帧推演对应的自车行驶轨迹,如图7中的T1,还可以是在一步决策中已经执行的多个单帧推演依次连接而成的自车行驶轨迹,如图7中的T1、T2、及Tn。
以上,自车行驶的决策结果对应的博弈对象的行驶轨迹可以是在一步决策中第一个单帧推演对应的博弈对象的行驶轨迹,如图7中的T1,还可以是在一步决策中已经执行的多个单帧推演依次连接而成的博弈对象的行驶轨迹,如图7中的T1、T2、及 Tn。
如图13所示,本申请实施例还提供了一种车辆行驶控制方法,包括:
S210:获取车外障碍物信息;
S220:针对所述障碍物信息,根据以上任一智能驾驶决策方法确定车辆行驶的决策结果;
S230:根据所述决策结果控制车辆的行驶。
如图14所示,本申请实施例还提供了一种车辆行驶控制装置200,包括:获取模块210,用于获取车外障碍物;处理模块220,用于针对所述障碍物,根据以上任一智能驾驶决策方法确定车辆行驶的决策结果;处理模块还用于根据所述决策结果控制车辆的行驶。
如图15所示,本申请实施例还提供了一种车辆300,包括:上述车辆行驶控制装置200,及行驶***250;所述车辆行驶控制装置200控制所述行驶***250。在一些实施例中,行驶***250可以包括前述图2中的行驶***13。
图16是本申请实施例提供的一种计算设备400的结构性示意性图。该计算设备400包括:处理器410、存储器420,还可以包括通信接口430。
应理解,该图16中所示的计算设备400中的通信接口430可以用于与其他设备之间进行通信。
其中,该处理器410可以与存储器420连接。该存储器420可以用于存储该程序代码和数据。因此,该存储器420可以是处理器410内部的存储单元,也可以是与处理器410独立的外部存储单元,还可以是包括处理器410内部的存储单元和与处理器410独立的外部存储单元的部件。
可选的,计算设备400还可以包括总线。其中,存储器420、通信接口430可以通过总线与处理器410连接。总线可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。所述总线可以分为地址总线、数据总线、控制总线等。
应理解，在本申请实施例中，该处理器410可以采用中央处理单元（central processing unit，CPU）。该处理器还可以是其它通用处理器、数字信号处理器（digital signal processor，DSP）、专用集成电路（Application specific integrated circuit，ASIC）、现场可编程门阵列（field programmable gate array，FPGA）或者其它可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。或者该处理器410采用一个或多个集成电路，用于执行相关程序，以实现本申请实施例所提供的技术方案。
该存储器420可以包括只读存储器和随机存取存储器,并向处理器410提供指令和数据。处理器410的一部分还可以包括非易失性随机存取存储器。例如,处理器410还可以存储设备类型的信息。
在计算设备400运行时,所述处理器410执行所述存储器420中的计算机执行指令执行上述方法的操作步骤。
应理解,根据本申请实施例的计算设备400可以对应于执行根据本申请各实施例的方法中的相应主体,并且计算设备400中的各个模块的上述和其它操作和/或功能分 别为了实现本实施例各方法的相应流程,为了简洁,在此不再赘述。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的***、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的***、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时用于执行上述方法,该方法包括上述各个实施例所描述的方案中的至少之一。
本申请实施例的计算机存储介质,可以采用一个或多个计算机可读的介质的任意组合。计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质例如可以是,但不限于,电、磁、光、电磁、红外线、或半导体的***、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本文件中,计算机可读存储介质可以是任何包含或存储程序的有形介质, 该程序可以被指令执行***、装置或者器件使用或者与其结合使用。
计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行***、装置或者器件使用或者与其结合使用的程序。
计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括、但不限于无线、电线、光缆、RF等等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言或其组合来编写用于执行本申请操作的计算机程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
其中,说明书和权利要求书中的词语“第一、第二、第三等”或模块A、模块B、模块C等类似用语,仅用于区别类似的对象,不代表针对对象的特定排序,可以理解地,在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。
在以上的描述中,所涉及的表示步骤的标号,如S116、S124……等,并不表示一定会按此步骤执行,在允许的情况下可以互换前后步骤的顺序,或同时执行。
说明书和权利要求书中使用的术语“包括”不应解释为限制于其后列出的内容;它不排除其它的元件或步骤。因此,其应当诠释为指定所提到的所述特征、整体、步骤或部件的存在,但并不排除存在或添加一个或更多其它特征、整体、步骤或部件及其组群。因此,表述“包括装置A和B的设备”不应局限为仅由部件A和B组成的设备。
本说明书中提到的“一个实施例”或“实施例”意味着与该实施例结合描述的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在本说明书各处出现的用语“在一个实施例中”或“在实施例中”并不一定都指同一实施例,但可以指同一实施例。此外,在一个或多个实施例中,能够以任何适当的方式组合各特定特征、结构或特性,如从本公开对本领域的普通技术人员显而易见的那样。注意,上述仅为本申请的较佳实施例及所运用的技术原理。本领域技术人员会理解,本申请不限于这里所述的特定实施例,对本领域技术人员来说能够进行各种明显的变化、重新调整和替代而不会脱离本申请的保护范围。因此,虽然通过以上实施例对本申请进行了较为详细的说明,但是本申请不仅仅限于以上实施例,在不脱离本申请的构思的情况下,还可以包括更多其他等效实施例,均属于本申请的保护范畴。

Claims (31)

  1. 一种智能驾驶决策方法,其特征在于,包括:
    获取自车的博弈对象;
    从自车与所述博弈对象的多个策略空间中,执行所述多个策略空间的多次释放,当所述多次释放中的一次释放执行后,根据已经释放的各所述策略空间确定自车与所述博弈对象的策略可行域,根据所述策略可行域确定所述自车行驶的决策结果。
  2. 根据权利要求1所述的方法,其特征在于,
    所述多个策略空间的维度包括至少以下之一:纵向采样维度、横向采样维度、或时间采样维度。
  3. 根据权利要求2所述的方法,其特征在于,所述执行所述多个策略空间的多次释放包括按照以下维度的顺序执行所述释放:纵向采样维度、横向采样维度、时间采样维度。
  4. 根据权利要求1-3任一所述的方法,其特征在于,所述确定自车与所述博弈对象的策略可行域时,所述策略可行域中的行为动作对的总代价值,根据以下之一或多个确定:
    自车或博弈对象的安全性代价值、路权代价值、横向偏移代价值、通过性代价值、舒适性代价值、帧间关联性代价值、风险区域代价值。
  5. 根据权利要求4所述的方法,其特征在于,所述行为动作对的总代价值根据两个或两个以上的代价值进行确定时,各所述代价值具有不同的权重。
  6. 根据权利要求1所述的方法,其特征在于,所述博弈对象包括两个或两个以上时,所述自车行驶的决策结果根据自车与各所述博弈对象的各策略可行域确定。
  7. 根据权利要求1-6任一所述的方法,其特征在于,还包括:
    获取自车的非博弈对象;
    确定出自车与所述非博弈对象的策略可行域;
    至少根据自车与所述非博弈对象的策略可行域确定所述自车行驶的决策结果。
  8. 根据权利要求6或7所述的方法,其特征在于,根据自车与各所述博弈对象的各策略可行域的交集确定所述自车行驶的决策结果的策略可行域,或
    根据自车与各所述博弈对象的各策略可行域以及自车与各所述非博弈对象的各策略可行域的交集确定所述自车行驶的决策结果的策略可行域。
  9. 根据权利要求2-8任一所述的方法,其特征在于,还包括:
    获取自车的非博弈对象;
    根据所述非博弈对象的运动状态,约束与自车对应的纵向采样策略空间,或约束与自车对应的横向采样策略空间。
  10. 根据权利要求2-8任一所述的方法,其特征在于,还包括:
    获取自车的博弈对象的非博弈对象;
    根据所述非博弈对象的运动状态,约束与自车的博弈对象对应的纵向采样策略空间,或约束与自车的博弈对象对应的横向采样策略空间。
  11. 根据权利要求8所述的方法,其特征在于,所述交集为空集时,执行自车行驶的保守决策,所述保守决策包括使所述自车安全停车的动作,或,使所述自车安全 减速行驶的动作。
  12. 根据权利要求1所述的方法,其特征在于,所述博弈对象或非博弈对象,根据注意力方式进行确定。
  13. 根据权利要求1-12任一所述的方法,其特征在于,还包括:通过人机交互界面显示至少以下之一:
    所述自车行驶的决策结果、所述决策结果的策略可行域、所述自车行驶的决策结果对应的自车行驶轨迹、或所述自车行驶的决策结果对应的博弈对象的行驶轨迹。
  14. 一种智能驾驶决策装置,其特征在于,包括:
    获取模块,用于获取自车的博弈对象;
    处理模块,用于从自车与所述博弈对象的多个策略空间中,执行所述多个策略空间的多次释放,当所述多次释放中的一次释放执行后,根据已经释放的各所述策略空间确定自车与所述博弈对象的策略可行域,根据所述策略可行域确定所述自车行驶的决策结果。
  15. 根据权利要求14所述的装置,其特征在于,
    所述多个策略空间的维度包括至少以下之一:纵向采样维度、横向采样维度、或时间采样维度。
  16. 根据权利要求15所述的装置,其特征在于,所述执行所述多个策略空间的多次释放包括按照以下维度的顺序执行所述释放:纵向采样维度、横向采样维度、时间采样维度。
  17. 根据权利要求14-16任一所述的装置,其特征在于,所述确定自车与所述博弈对象的策略可行域时,所述策略可行域中的行为动作对的总代价值,根据以下之一或多个确定:
    自车或博弈对象的安全性代价值、路权代价值、横向偏移代价值、通过性代价值、舒适性代价值、帧间关联性代价值、风险区域代价值。
  18. 根据权利要求17所述的装置,其特征在于,所述行为动作对的总代价值根据两个或两个以上的代价值进行确定时,各所述代价值具有不同的权重。
  19. 根据权利要求14所述的装置,其特征在于,所述博弈对象包括两个或两个以上时,所述自车行驶的决策结果根据自车与各所述博弈对象的各策略可行域确定。
  20. 根据权利要求14-19任一所述的装置,其特征在于,所述获取模块还用于获取自车的非博弈对象;
    所述处理模块还用于确定出自车与所述非博弈对象的策略可行域;以及用于至少根据自车与所述非博弈对象的策略可行域确定所述自车行驶的决策结果。
  21. 根据权利要求19或20所述的装置,其特征在于,所述处理模块还用于:
    根据自车与各所述博弈对象的各策略可行域的交集确定所述自车行驶的决策结果的策略可行域,或
    根据自车与各所述博弈对象的各策略可行域以及自车与各所述非博弈对象的各策略可行域的交集确定所述自车行驶的决策结果的策略可行域。
  22. 根据权利要求15-21任一所述的装置,其特征在于,
    所述获取模块还用于获取自车的非博弈对象;
    所述处理模块还用于根据所述非博弈对象的运动状态,约束与自车对应的纵向采样策略空间,或约束与自车对应的横向采样策略空间。
  23. 根据权利要求15-21任一所述的装置,其特征在于,
    所述获取模块还用于获取自车的博弈对象的非博弈对象;
    所述处理模块还用于根据所述非博弈对象的运动状态,约束与自车的博弈对象对应的纵向采样策略空间,或约束与自车的博弈对象对应的横向采样策略空间。
  24. 根据权利要求21所述的装置,其特征在于,所述交集为空集时,执行自车行驶的保守决策,所述保守决策包括使所述自车安全停车的动作,或,使所述自车安全减速行驶的动作。
  25. 根据权利要求14所述的装置,其特征在于,所述博弈对象或非博弈对象,根据注意力方式进行确定。
  26. 根据权利要求14所述的装置,其特征在于,所述处理模块还用于通过人机交互界面显示至少以下之一:
    所述自车行驶的决策结果、所述决策结果的策略可行域、所述自车行驶的决策结果对应的自车行驶轨迹、或所述自车行驶的决策结果对应的博弈对象的行驶轨迹。
  27. 一种车辆行驶控制方法,其特征在于,包括:
    获取车外障碍物;
    针对所述障碍物,根据权利要求1-13任一所述方法确定车辆行驶的决策结果;
    根据所述决策结果控制车辆的行驶。
  28. 一种车辆行驶控制装置,其特征在于,包括:
    获取模块,用于获取车外障碍物;
    处理模块,用于针对所述障碍物,根据权利要求1-13任一所述方法确定车辆行驶的决策结果;
    所述处理模块还用于根据所述决策结果控制车辆的行驶。
  29. 一种车辆,其特征在于,包括:
    如权利要求28所述的车辆行驶控制装置,及行驶***;
    所述车辆行驶控制装置控制所述行驶***。
  30. 一种计算设备,其特征在于,包括:
    处理器,以及
    存储器,其上存储有程序指令,所述程序指令当被所述处理器执行时使得所述处理器实现权利要求1-13任一所述的智能驾驶决策方法,或所述程序指令当被所述处理器执行时使得所述处理器实现权利要求27所述的车辆行驶控制方法。
  31. 一种计算机可读存储介质,其特征在于,其上存储有程序指令,所述程序指令当被处理器执行时使得所述处理器实现权利要求1-13任一所述的智能驾驶决策方法,或所述程序指令当被所述处理器执行时使得所述处理器实现权利要求27所述的车辆行驶控制方法。
PCT/CN2021/109331 2021-07-29 2021-07-29 智能驾驶决策方法、车辆行驶控制方法、装置及车辆 WO2023004698A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP21951305.8A EP4360976A1 (en) 2021-07-29 2021-07-29 Method for intelligent driving decision-making, vehicle movement control method, apparatus, and vehicle
CN202180008224.5A CN115943354A (zh) 2021-07-29 2021-07-29 智能驾驶决策方法、车辆行驶控制方法、装置及车辆
PCT/CN2021/109331 WO2023004698A1 (zh) 2021-07-29 2021-07-29 智能驾驶决策方法、车辆行驶控制方法、装置及车辆
US18/424,238 US20240166242A1 (en) 2021-07-29 2024-01-26 Intelligent driving decision-making method, vehicle traveling control method and apparatus, and vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/109331 WO2023004698A1 (zh) 2021-07-29 2021-07-29 智能驾驶决策方法、车辆行驶控制方法、装置及车辆

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/424,238 Continuation US20240166242A1 (en) 2021-07-29 2024-01-26 Intelligent driving decision-making method, vehicle traveling control method and apparatus, and vehicle

Publications (1)

Publication Number Publication Date
WO2023004698A1 true WO2023004698A1 (zh) 2023-02-02

Family

ID=85085998

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109331 WO2023004698A1 (zh) 2021-07-29 2021-07-29 智能驾驶决策方法、车辆行驶控制方法、装置及车辆

Country Status (4)

Country Link
US (1) US20240166242A1 (zh)
EP (1) EP4360976A1 (zh)
CN (1) CN115943354A (zh)
WO (1) WO2023004698A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503025A (zh) * 2023-06-25 2023-07-28 深圳高新区信息网有限公司 一种基于工作流引擎的业务工单流程处理方法

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160144838A1 (en) * 2013-06-14 2016-05-26 Valeo Schalter Und Sensoren Gmbh Method and device for carrying out collision-avoiding measures
CN108595823A (zh) * 2018-04-20 2018-09-28 大连理工大学 一种联合驾驶风格和博弈理论的自主车换道策略的计算方法
EP3539837A1 (en) * 2018-03-13 2019-09-18 Veoneer Sweden AB A vehicle radar system for detecting preceding vehicles
CN110362910A (zh) * 2019-07-05 2019-10-22 西南交通大学 基于博弈论的自动驾驶车辆换道冲突协调模型建立方法
CN111267846A (zh) * 2020-02-11 2020-06-12 南京航空航天大学 一种基于博弈论的周围车辆交互行为预测方法
CN112907967A (zh) * 2021-01-29 2021-06-04 吉林大学 一种基于不完全信息博弈的智能车换道决策方法
CN113160547A (zh) * 2020-01-22 2021-07-23 华为技术有限公司 一种自动驾驶方法及相关设备

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160144838A1 (en) * 2013-06-14 2016-05-26 Valeo Schalter Und Sensoren Gmbh Method and device for carrying out collision-avoiding measures
EP3539837A1 (en) * 2018-03-13 2019-09-18 Veoneer Sweden AB A vehicle radar system for detecting preceding vehicles
CN108595823A (zh) * 2018-04-20 2018-09-28 大连理工大学 一种联合驾驶风格和博弈理论的自主车换道策略的计算方法
CN110362910A (zh) * 2019-07-05 2019-10-22 西南交通大学 基于博弈论的自动驾驶车辆换道冲突协调模型建立方法
CN113160547A (zh) * 2020-01-22 2021-07-23 华为技术有限公司 一种自动驾驶方法及相关设备
CN111267846A (zh) * 2020-02-11 2020-06-12 南京航空航天大学 一种基于博弈论的周围车辆交互行为预测方法
CN112907967A (zh) * 2021-01-29 2021-06-04 吉林大学 一种基于不完全信息博弈的智能车换道决策方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503025A (zh) * 2023-06-25 2023-07-28 深圳高新区信息网有限公司 一种基于工作流引擎的业务工单流程处理方法
CN116503025B (zh) * 2023-06-25 2023-09-19 深圳高新区信息网有限公司 一种基于工作流引擎的业务工单流程处理方法

Also Published As

Publication number Publication date
US20240166242A1 (en) 2024-05-23
EP4360976A1 (en) 2024-05-01
CN115943354A (zh) 2023-04-07

Similar Documents

Publication Publication Date Title
US11932284B2 (en) Trajectory setting device and trajectory setting method
CN110488802B (zh) 一种网联环境下的自动驾驶车辆动态行为决策方法
CN111081065B (zh) 路段混行条件下的智能车辆协同换道决策模型
US11364902B2 (en) Testing predictions for autonomous vehicles
CN108269424B (zh) 用于车辆拥堵估计的***和方法
CN110325823B (zh) 基于规则的导航
CN113071520B (zh) 车辆行驶控制方法及装置
CN110068346A (zh) 用于自主车辆中不受保护的操纵缓解的***和方法
EP4316935A1 (en) Method and apparatus for obtaining lane change area
CN114074681A (zh) 基于概率的车道变更决策和运动规划***及其方法
US20230037767A1 (en) Behavior planning for autonomous vehicles in yield scenarios
US20240166242A1 (en) Intelligent driving decision-making method, vehicle traveling control method and apparatus, and vehicle
CN112689024B (zh) 一种车路协同的货车队列换道方法、装置及***
US20210078608A1 (en) System and method for providing adaptive trust calibration in driving automation
WO2023087157A1 (zh) 一种智能驾驶方法及应用该方法的车辆
CN116142194A (zh) 一种拟人化的换道决策方法
CN115810291A (zh) 一种关联目标识别方法、装置、路侧设备及车辆
WO2023168630A1 (zh) 一种控车方法及相关装置
Qin et al. Two-lane multipoint overtaking decision model based on vehicle network
WO2024140381A1 (zh) 一种自动泊车方法、装置和智能驾驶设备
US20230278562A1 (en) Method to arbitrate multiple automatic lane change requests in proximity to route splits
CN113954871B (zh) 对于自主车辆测试预测的方法、装置及介质
CN115402321A (zh) 一种变道策略确定方法、***、电子设备及存储介质
CN117760453A (zh) 车辆并道行驶的路线规划方法、装置以及计算机设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21951305

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2024504975

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2021951305

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2021951305

Country of ref document: EP

Effective date: 20240126

NENP Non-entry into the national phase

Ref country code: DE