KR20220102694A

KR20220102694A - System and Method for Improving Traffic for Autonomous Vehicles at Non Signalized Intersections

Info

Publication number: KR20220102694A
Application number: KR1020210004703A
Authority: KR
Inventors: 배상훈
Original assignee: 부경대학교 산학협력단; 에스에이엠(주)
Priority date: 2021-01-13
Filing date: 2021-01-13
Publication date: 2022-07-21
Also published as: KR102479484B1

Abstract

The present invention relates to an apparatus and method for improved passage of an autonomous vehicle at a non-signalized intersection, which use Responsibility-Sensitive Safety (RSS) theory and a partially observable Markov decision process (POMDP) to enable efficient passage. To this end, the apparatus may comprise: a state initialization unit that initializes the learning state of the POMDP algorithm; an optimal action derivation unit that applies the RSS algorithm to the POMDP model for optimizing autonomous vehicle operation and derives the optimal action; an action execution unit that executes the optimal action derived by the optimal action derivation unit; a state observation unit that receives data observed by the vision sensor and the radar sensor of the autonomous vehicle and observes the driving state; and a reward determination unit that, according to whether a safe, unsafe, failure, or goal reward is recognized, corrects the behavior of the autonomous vehicle based on the RSS algorithm for the autonomous vehicle and in response to human-driven vehicles governed by the adaptive model predictive control system.

Description

System and Method for Improving Traffic for Autonomous Vehicles at Non-Signalized Intersections

The present invention relates to traffic control for autonomous vehicles and, more specifically, to an apparatus and method for improved passage of an autonomous vehicle at a non-signalized intersection that enable efficient passage by using Responsibility-Sensitive Safety (RSS) theory and a partially observable Markov decision process (POMDP).

An autonomous vehicle is a vehicle equipped with technology that recognizes lanes and steers automatically using a camera or a forward object detection sensor. Based on camera image processing or forward object sensing, the vehicle measures the lane width, its lateral position in the lane, the distance to the lane markings on both sides, the lane shape, and the road's radius of curvature. It uses the resulting position and road information to estimate a driving trajectory and changes lanes along the estimated trajectory.

An autonomous vehicle can also keep an appropriate distance from a preceding vehicle. Using the position and distance of the preceding vehicle detected by a front-mounted camera or forward object detection sensor, it automatically controls the throttle valve, brakes, and transmission to accelerate or decelerate as needed.

However, when such an autonomous vehicle passes through an intersection, it stops according to the traffic signal and, when departing, starts moving only after detecting the movement of the preceding vehicle. Departures are therefore delayed from one vehicle to the next, which can cause congestion at the intersection.

In particular, for a vehicle that perceives its driving environment from sensor input, as an autonomous vehicle does, driving through a non-signalized intersection is far more difficult than driving on an ordinary road.

Research is under way on efficient driving at non-signalized intersections based on the autonomous vehicle's perception of its driving environment. However, many problems remain for the passage of platooning autonomous vehicles through non-signalized intersections in mixed traffic (where autonomous vehicles and human drivers coexist).

Accordingly, new technology is needed to improve passage through non-signalized intersections and to ensure safety for platooning autonomous vehicles.

Republic of Korea Patent Publication No. 10-2020-0071406
Republic of Korea Patent Publication No. 10-2020-0058613
Republic of Korea Patent Publication No. 10-2018-0065196

The present invention is intended to solve the problems of prior-art traffic control technology for autonomous vehicles. Its object is to provide an apparatus and method for improved passage of an autonomous vehicle at a non-signalized intersection that enable efficient passage by using Responsibility-Sensitive Safety (RSS) theory and a partially observable Markov decision process (POMDP).

Another object of the present invention is to provide such an apparatus and method that build a model which learns autonomous driving behavior while taking into account the traffic safety of the autonomous vehicle among multiple human-driven vehicles at a non-signalized intersection, delay time, and similar factors, so that accidents involving autonomous vehicles can be prevented efficiently.

Another object of the present invention is to provide such an apparatus and method that learn, as in real conditions, only from information within the range the autonomous vehicle can observe, using a partially observable Markov decision process (POMDP) as the reinforcement learning model, so that the reinforcement learning reward for actions can be maximized.

Another object of the present invention is to provide such an apparatus and method that use radar and vision sensor data through MATLAB's Automated Driving Toolbox and optimize with a reinforcement learning and autonomous driving framework based on the Responsibility-Sensitive Safety (RSS) algorithm, so that the autonomous vehicle's system can drive while predicting the behavior of other vehicles.

Another object of the present invention is to provide such an apparatus and method that determine actions through a POMDP process, which lets the learning agent make decisions based on partial observation of the environment, and that maximize the reinforcement learning reward for those actions.

Another object of the present invention is to provide such an apparatus and method that, in order to reproduce a real autonomous driving environment in simulation, base learning and action decisions on data obtained through the autonomous vehicle's sensors (partial observation) rather than on the entire simulation environment (full observation), and that maximize the reinforcement learning reward for actions.

Another object of the present invention is to provide such an apparatus and method that use the RSS algorithm to maintain a safe distance between the autonomous vehicle and human drivers at a non-signalized intersection, so that the autonomous vehicle model can respond appropriately when the distance indicates that a dangerous situation may arise.

Another object of the present invention is to provide such an apparatus and method that apply an adaptive model predictive control (MPC) system for human-driven vehicles, determine the relative distance and relative speed to the nearest human driver ahead from the sensors of the simulated autonomous vehicle, and, as the control variable, have the autonomous vehicle autonomously maintain a certain distance from the vehicle ahead, so that it drives in response to the human driver's behavior.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above objects, an apparatus for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention comprises: a state initialization unit that initializes the learning state of a partially observable Markov decision process (POMDP) algorithm; an optimal action derivation unit that derives an optimal action by applying the Responsibility-Sensitive Safety (RSS) algorithm to the POMDP model for optimizing autonomous vehicle (AV) operation; an action execution unit that executes the optimal action derived by the optimal action derivation unit; a state observation unit that receives data observed by the AV's vision sensor and radar sensor and observes the driving state; and a reward determination unit that, according to whether a safe, unsafe, failure, or goal reward is recognized, corrects the AV's behavior based on the RSS algorithm for the AV and in response to human-driven vehicles governed by the adaptive model predictive control system.

To achieve another object, a method for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention comprises: a state initialization step of initializing the learning state of a partially observable Markov decision process (POMDP) algorithm; an optimal action derivation step of deriving an optimal action by applying the Responsibility-Sensitive Safety (RSS) algorithm to the POMDP model for optimizing autonomous vehicle (AV) operation; an action execution step of executing the optimal action derived in the optimal action derivation step; a state observation step of receiving data observed by the AV's vision sensor and radar sensor and observing the driving state; and a reward determination step of correcting, according to whether a safe, unsafe, failure, or goal reward is recognized, the AV's behavior based on the RSS algorithm for the AV and in response to human-driven vehicles governed by the adaptive model predictive control system.
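The five steps named above (state initialization, optimal action derivation, action execution, state observation, reward determination) can be pictured as one control loop. The following is only a minimal sketch under stated assumptions, not the patented implementation: it uses a greedy safety-filtered action choice instead of a trained POMDP policy, a stationary lead vehicle, and a simple braking-distance check as a stand-in for RSS. Every name, constant, and reward value is hypothetical.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the source.
ACTIONS = (2.0, 0.0, -2.0)   # candidate accelerations in m/s^2, fastest first
DT = 1.0                     # step length in s
MAX_BRAKE = 4.0              # assumed braking capability in m/s^2
MIN_GAP = 5.0                # assumed minimum acceptable gap in m
SENSOR_RANGE = 50.0          # assumed sensor range in m


@dataclass
class State:
    ego_pos: float
    ego_speed: float
    lead_pos: float


def initialize_state():
    """State initialization step (hypothetical starting scenario)."""
    return State(ego_pos=0.0, ego_speed=5.0, lead_pos=30.0)


def observe(state):
    """State observation step: the gap is hidden when out of sensor range."""
    gap = state.lead_pos - state.ego_pos
    return (gap if gap <= SENSOR_RANGE else None), state.ego_speed


def is_safe(gap, speed, accel):
    """Stand-in safety check: can the ego still stop short of MIN_GAP?"""
    if gap is None:
        return True
    next_speed = max(0.0, speed + accel * DT)
    stopping = next_speed ** 2 / (2.0 * MAX_BRAKE)
    return gap - next_speed * DT - stopping > MIN_GAP


def derive_action(observation):
    """Action derivation step: fastest action that passes the safety check."""
    gap, speed = observation
    for accel in ACTIONS:
        if is_safe(gap, speed, accel):
            return accel
    return min(ACTIONS)  # nothing is safe: brake as hard as allowed


def execute(state, accel):
    """Action execution step: advance the ego vehicle by one time step."""
    speed = max(0.0, state.ego_speed + accel * DT)
    return State(state.ego_pos + speed * DT, speed, state.lead_pos)


def determine_reward(state, goal_pos=20.0):
    """Reward determination step: failure, unsafe, goal, or safe."""
    gap = state.lead_pos - state.ego_pos
    if gap <= 0.0:
        return -100.0  # failure (collision)
    if gap < MIN_GAP:
        return -10.0   # unsafe
    if state.ego_pos >= goal_pos:
        return 100.0   # goal
    return 1.0         # safe


def run_episode(steps=10):
    state = initialize_state()
    gaps, total = [], 0.0
    for _ in range(steps):
        action = derive_action(observe(state))
        state = execute(state, action)
        total += determine_reward(state)
        gaps.append(state.lead_pos - state.ego_pos)
    return state, gaps, total
```

Calling `run_episode()` returns the final state, the gap after each step, and the accumulated reward. A learning agent would replace `derive_action` with a policy updated from those rewards.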

The apparatus and method for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention, as described above, have the following effects.

First, they enable efficient passage at non-signalized intersections by using Responsibility-Sensitive Safety (RSS) theory and a partially observable Markov decision process (POMDP).

Second, they build a model that learns autonomous driving behavior while taking into account the traffic safety of the autonomous vehicle among multiple human-driven vehicles at a non-signalized intersection, delay time, and similar factors, so that accidents involving autonomous vehicles can be prevented efficiently.

Third, they learn, as in real conditions, only from information within the range the autonomous vehicle can observe, using a POMDP as the reinforcement learning model, so that the reinforcement learning reward for actions can be maximized.

Fourth, they use radar and vision sensor data through MATLAB's Automated Driving Toolbox and optimize with a reinforcement learning and autonomous driving framework based on the RSS algorithm, so that the autonomous vehicle's system can drive while predicting the behavior of other vehicles.

Fifth, they determine actions through a POMDP process, which lets the learning agent make decisions based on partial observation of the environment, and maximize the reinforcement learning reward for those actions.

Sixth, they use the RSS algorithm to maintain a safe distance between the autonomous vehicle and human drivers at a non-signalized intersection, so that the autonomous vehicle model can respond appropriately when the distance indicates that a dangerous situation may arise.

Seventh, they apply the adaptive model predictive control (MPC) system to the control of human-driven vehicles, determine the relative distance and relative speed to the nearest human-driven vehicle ahead from the sensors of the simulated autonomous vehicle, and, as the control variable, have the autonomous vehicle autonomously maintain a certain distance from the vehicle ahead, so that it drives in response to the human driver's behavior.

FIG. 1a is a diagram showing the rules of the ACC system.
FIG. 1b is a block diagram of an apparatus for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention.
FIGS. 2a and 2b are diagrams explaining the partially observable Markov decision process (POMDP).
FIG. 3 is a flowchart of a method for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention.
FIG. 4 is a diagram, in pseudocode form, showing optimization of the RSS algorithm through the POMDP framework.
FIGS. 5a and 5b show the simulation experiment setup for evaluating the performance of the RSS-based POMDP model.
FIGS. 6a and 6b are graphs of simulation results using the output profile of the first experiment.
FIGS. 7a and 7b are graphs of simulation results using the output profile of the second experiment.
FIG. 8 is a graph comparing the performance of the model according to the present invention with the earlier adaptive MPC model.
FIG. 9 is a graph of simulation results by autonomous vehicle speed for the model according to the present invention.
FIG. 10 is a graph of simulation results by autonomous vehicle acceleration for the model according to the present invention.

Hereinafter, preferred embodiments of the apparatus and method for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention are described in detail.

The features and advantages of the apparatus and method according to the present invention will become apparent from the detailed description of each embodiment below.

FIG. 1a is a diagram showing the rules of the ACC system, and FIG. 1b is a block diagram of an apparatus for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention.

The apparatus and method according to the present invention enable efficient passage at non-signalized intersections by using Responsibility-Sensitive Safety (RSS) theory and a partially observable Markov decision process (POMDP).

To this end, the present invention may include a configuration that builds a model which learns autonomous driving behavior while taking into account the traffic safety of the autonomous vehicle among multiple human-driven vehicles at a non-signalized intersection, delay time, and similar factors.

The present invention may include a configuration that learns, as in real conditions, only from information within the range the autonomous vehicle can observe, using a POMDP as the reinforcement learning model, and maximizes the reinforcement learning reward for actions.

The present invention may include a configuration that uses radar and vision sensor data through MATLAB's Automated Driving Toolbox and optimizes with a reinforcement learning and autonomous driving framework based on the RSS algorithm, so that the autonomous vehicle's system can drive while predicting the behavior of other vehicles.

The present invention may include a configuration that determines actions through a POMDP process, which lets the learning agent make decisions based on partial observation of the environment, and assigns reinforcement learning rewards to those actions.

The present invention may include a configuration that uses the RSS algorithm to maintain a safe distance between the autonomous vehicle and human drivers at a non-signalized intersection, so that the autonomous vehicle model responds appropriately when the distance indicates that a dangerous situation may arise.

The present invention applies the adaptive model predictive control (MPC) system to human-driven vehicles, and may include a configuration that determines the relative distance and relative speed to the nearest human-driven vehicle ahead from the sensors of the simulated autonomous vehicle and, as the control variable, has the autonomous vehicle autonomously maintain a certain distance from the vehicle ahead, so that it drives in response to the human driver's behavior.

The adaptive model predictive control system is described below.

Model predictive control (MPC) is designed to estimate the future behavior of a human-driven vehicle and to use real-time optimization to determine an appropriate control action from that prediction. In response, it selects an appropriate acceleration for the autonomous vehicle.

In the present invention, the MPC takes as its detected object the nearest human driver ahead, whose relative distance and relative speed are obtained from the sensors of the simulated autonomous vehicle. As the control variable, the autonomous vehicle autonomously maintains a certain distance from the vehicle ahead, so that it drives in response to the human driver's behavior.

The ACC system consists of a low-level and a high-level controller. The high-level controller fuses relative speed and relative distance, and the low-level controller adjusts the brake system to achieve the optimal acceleration.

To compute the optimal acceleration, the relationship between the autonomous vehicle (ego vehicle) and the lead vehicle must be established. ACC controls the longitudinal acceleration of the autonomous vehicle by autonomously maintaining its position and speed relative to the vehicle ahead. The controller estimates relative speed and distance from real-time measurements by on-board sensors (e.g. radar and vision sensors) and through vehicle-to-vehicle (V2V) communication.

The safety distance is defined by an equation that is not reproduced in this text. Its terms are:

the safety distance of the ACC system,

the actual speed of the autonomous vehicle (ego vehicle),

the desired stopping distance, and

the time gap (travel time) between vehicles.
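One relation consistent with the four quantities listed above is the constant time-gap spacing policy commonly used in ACC formulations. The form and the symbol names below are assumptions made for illustration, since the original equation and its symbols did not survive in this text.

```latex
% Assumed symbols: D_safe = safety distance, D_stop = desired stopping distance,
% T_gap = time gap between vehicles, v_ego = actual ego-vehicle speed.
D_{\text{safe}} = D_{\text{stop}} + T_{\text{gap}} \, v_{\text{ego}}
```

Under this assumed form, the required spacing grows linearly with ego speed and reduces to the stopping distance at standstill.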

The driving decision function of the ACC system distinguishes two cases according to the relative distance and the safety distance (the two conditions themselves are not reproduced in this text).

In the first case, the human-driven vehicle is too close, and the autonomous vehicle decelerates until the safety distance is restored (spacing control).

In the second case, the human-driven vehicle is far enough away, and the autonomous vehicle drives as usual until it reaches the set speed (speed control).
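The two-case decision described above can be sketched as a small function. This is an illustrative sketch only: it assumes the comparison is between the measured relative distance and a safety distance of the constant time-gap form, and the function names and default parameter values are hypothetical.

```python
def acc_safety_distance(ego_speed_mps, stop_distance_m=10.0, time_gap_s=1.5):
    """Assumed constant time-gap form; default values are illustrative only."""
    return stop_distance_m + time_gap_s * ego_speed_mps


def acc_mode(relative_distance_m, ego_speed_mps,
             stop_distance_m=10.0, time_gap_s=1.5):
    """Return which ACC control mode applies for the current measurement."""
    d_safe = acc_safety_distance(ego_speed_mps, stop_distance_m, time_gap_s)
    if relative_distance_m < d_safe:
        # Lead vehicle too close: decelerate until the safety distance returns.
        return "spacing_control"
    # Lead vehicle far enough: track the driver-set speed.
    return "speed_control"
```

For example, at an ego speed of 10 m/s with the illustrative defaults, the safety distance is 25 m, so a measured gap of 20 m selects spacing control and a gap of 40 m selects speed control.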

In longitudinal tracking with sensor fusion, the lead-vehicle tracker detects objects ahead of the autonomous vehicle in the same lane, and in other lanes within the sensors' detection range, and finds the relative distance and relative speed between the autonomous vehicle and the lead vehicle (the nearest human-driven vehicle ahead of the autonomous vehicle).

FIG. 1a shows the rules of the ACC system: the relationship between the safety distance and the relative distance as it bears on the ACC system's driving decision function (i.e. spacing control and speed control).

In the ACC system, the acceleration of the ACC-equipped vehicle is given in discrete time by an equation that is not reproduced in this text. Its terms are:

the acceleration of the ACC-equipped vehicle,

the sampling period,

the time delay corresponding to the finite bandwidth of the low-level controller, and

the control variable matrix for acceleration.
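A common way to model a low-level controller with finite bandwidth is a first-order lag between the commanded and the realized acceleration. The sketch below assumes that form, a(k+1) = (1 - Ts/tau) a(k) + (Ts/tau) u(k). It is offered as one relation consistent with the quantities listed above, not as the source's equation, and the parameter names and defaults are hypothetical.

```python
def next_acceleration(a_k, u_k, ts=0.1, tau=0.5):
    """One discrete-time step of an assumed first-order actuator lag.

    a_k: current acceleration, u_k: commanded acceleration,
    ts: sampling period in s, tau: time delay (lag constant) in s.
    """
    if tau <= 0.0 or ts <= 0.0:
        raise ValueError("ts and tau must be positive")
    alpha = ts / tau
    return (1.0 - alpha) * a_k + alpha * u_k


def step_response(u, steps, ts=0.1, tau=0.5):
    """Acceleration trace when a constant command u is held from rest."""
    a, trace = 0.0, []
    for _ in range(steps):
        a = next_acceleration(a, u, ts, tau)
        trace.append(a)
    return trace
```

With a constant command, the realized acceleration approaches the command geometrically, which is the behavior the time-delay term is meant to capture.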

The MPC algorithm is designed to estimate future behavior and to use online optimization to determine an appropriate control action over the prediction horizon.

MPC accounts for the interaction between outputs and inputs, much as a feedback control algorithm does. The model selects the most suitable acceleration for the AV.

MPC has been introduced in several autonomous control applications, such as traction control and lane keeping assist systems.

In the MPC algorithm, the predicted state and the current state, which can be measured at sampling time k, are related by an equation that is not reproduced in this text. Its terms are:

the predicted state matrix,

the current state matrix, and

A and B, the state transition matrices (their entries are likewise not reproduced in this text).
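The description above matches the standard linear prediction model x(k+1|k) = A x(k) + B u(k). The sketch below assumes that form and rolls it forward over a short horizon. The matrices in the usage note are a generic double integrator chosen purely for illustration, since the source's A and B are not available here, and all function names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def predict_state(A, B, x, u):
    """One-step prediction under the assumed model x(k+1|k) = A x(k) + B u(k)."""
    ax = mat_vec(A, x)
    bu = mat_vec(B, u)
    return [p + q for p, q in zip(ax, bu)]


def rollout(A, B, x0, inputs):
    """Predicted state sequence over the horizon defined by `inputs`."""
    states = [list(x0)]
    for u in inputs:
        states.append(predict_state(A, B, states[-1], u))
    return states
```

For instance, with the illustrative matrices `A = [[1.0, 1.0], [0.0, 1.0]]` and `B = [[0.5], [1.0]]` (position and speed, unit time step), `rollout(A, B, [0.0, 0.0], [[1.0], [1.0]])` gives the predicted states over two steps of unit acceleration. An MPC controller would search over the input sequence to minimize a cost on such a rollout.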

The input data of the longitudinal vehicle dynamics, comprising the relative distance, the relative speed, and the ego speed, are given in continuous time and then discretized. The equations are not reproduced in this text. Their terms include:

the relative acceleration, and

the acceleration of the autonomous vehicle at time t.
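A continuous-time car-following model consistent with the quantities named above can be written as follows. The equations, symbol names, and sign convention (relative quantities taken as lead minus ego) are assumptions for illustration, because the original equations are missing from this text.

```latex
% Assumed symbols: d_rel = relative distance, v_rel = relative speed,
% v_ego = ego speed, a_rel = relative acceleration, a_ego = ego acceleration,
% a_lead = lead-vehicle acceleration (not named in the source).
\dot{d}_{\text{rel}}(t) = v_{\text{rel}}(t), \qquad
\dot{v}_{\text{rel}}(t) = a_{\text{rel}}(t) = a_{\text{lead}}(t) - a_{\text{ego}}(t), \qquad
\dot{v}_{\text{ego}}(t) = a_{\text{ego}}(t)
```

Discretizing these relations with the sampling period yields a state-space model of the kind an MPC prediction step operates on.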

The adaptive MPC system is made more robust by its use of full throttle or full braking, so the host vehicle can respond immediately to a sudden lane change or sudden braking. The adaptive MPC system accordingly focuses on safety, car-following control, ride smoothness, and fuel economy.

The constraints on speed (v), acceleration (a), braking (u), and jerk (j), which are incorporated as hard constraints as in the ACC system, are expressed by inequalities that are not reproduced in this text.
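Hard constraints of this kind are usually box bounds on each quantity. The sketch below assumes simple lower and upper limits on v, a, u, and j and checks whether a candidate operating point is feasible. The limit values are placeholders rather than figures from the source, and an actual MPC solver would enforce the bounds inside its optimization rather than test them afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Illustrative bounds only; the source's numerical limits are not available.
    v_min: float = 0.0
    v_max: float = 20.0
    a_min: float = -3.0
    a_max: float = 2.0
    u_min: float = -3.0
    u_max: float = 2.0
    j_min: float = -2.0
    j_max: float = 2.0


def satisfies_hard_constraints(v, a, u, j, limits=Limits()):
    """True when speed, acceleration, control input, and jerk are all in bounds."""
    return (limits.v_min <= v <= limits.v_max
            and limits.a_min <= a <= limits.a_max
            and limits.u_min <= u <= limits.u_max
            and limits.j_min <= j <= limits.j_max)
```

A planner could use such a predicate to discard candidate trajectories that violate any bound at any step.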

Given the sensor fusion inputs (simulation time, the longitudinal speed of the automated vehicle, and road information), the lead-vehicle tracker with sensor fusion first detects objects ahead of the autonomous vehicle and passes the detections to a multi-object tracker.

The states of the detected objects are estimated and fused by a Kalman filter algorithm.
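To make the fusion step concrete, here is a minimal scalar sketch of the Kalman measurement update, applied to two readings of the same distance (for example one from radar and one from vision). It assumes a static state and omits the prediction step and the multi-object association that a full tracker needs. The function names and noise values are hypothetical.

```python
def kalman_update(x_est, p_est, z, r):
    """Scalar Kalman measurement update.

    x_est, p_est: prior estimate and its variance
    z, r: measurement and its noise variance
    """
    gain = p_est / (p_est + r)
    x_new = x_est + gain * (z - x_est)
    p_new = (1.0 - gain) * p_est
    return x_new, p_new


def fuse_measurements(measurements, x0=0.0, p0=1e6):
    """Fuse (value, variance) pairs sequentially, starting from a vague prior."""
    x, p = x0, p0
    for z, r in measurements:
        x, p = kalman_update(x, p, z, r)
    return x, p
```

With two equally noisy readings, such as `fuse_measurements([(10.0, 1.0), (12.0, 1.0)])`, the fused estimate lands close to their average and its variance is roughly halved. A less noisy sensor would pull the estimate toward its own reading.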

The Responsibility-Sensitive Safety (RSS) algorithm was introduced to formalize rules for safe automated vehicles. It is based on human notions that apply to every driving scenario and is defined by a tuple: safe distance, dangerous situation, and proper response.

By following a few simple rules, such as keeping a safe following distance, the RSS algorithm offers a safety guarantee for the AV in response to its surroundings. For example, it is used to assess and determine responsibility when a collision occurs between an AV and a human-driven vehicle.

In the present invention, the RSS algorithm is used to maintain a safe distance between the autonomous vehicle and human drivers at a non-signalized intersection, so that the autonomous vehicle model responds appropriately when the distance indicates that a dangerous situation may arise.

Automated vehicles (AVs) that apply the RSS algorithm in mixed traffic must maintain safe distances (e.g. a safe longitudinal distance and a safe lateral distance) to avoid dangerous situations.

In addition, where an AV can avoid a collision with another vehicle, it must do so by decelerating at the minimum acceleration or by changing lanes.

The safe distance accounts for the AV's response time and braking distance and is expressed as follows.

Here, the quantities in the expression are the response time of the AV, the current speed of the AV, and the minimum acceleration of the AV.
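
One common way to combine these three quantities is to add the distance covered during the response time to the distance needed to brake to a stop at the minimum acceleration. The sketch below assumes that form; it is an illustration, not the patent's Equation 12, and the function name and the 0.5 s response time in the usage note are hypothetical.

```python
def safe_distance(speed: float, response_time: float,
                  min_accel: float) -> float:
    """Assumed safe distance [m]: reaction distance plus braking distance.

    speed         -- current speed of the AV [m/s]
    response_time -- response time of the AV [s]
    min_accel     -- minimum (most negative) acceleration of the AV [m/s^2]
    """
    if min_accel >= 0.0:
        raise ValueError("min_accel must be negative (a deceleration)")
    reaction_distance = speed * response_time
    braking_distance = speed ** 2 / (2.0 * abs(min_accel))
    return reaction_distance + braking_distance
```

With a speed of 10 m/s, an assumed response time of 0.5 s, and the minimum acceleration of -5.0 m/s² used in the simulation, this gives 5 m + 10 m = 15 m.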

In the present invention, the RSS algorithm is applied at a non-signalized intersection to maintain a safe distance between the driverless vehicle and manually driven vehicles.

Based on the mathematical formulation, the AV should stay out of other vehicles' paths wherever possible. In other words, the RSS algorithm ensures that the automated vehicle responds appropriately when another vehicle may create a dangerous situation.

First, the safe distance is calculated using its mathematical definition (Equation 12).

Second, a forward collision warning is determined from the relative distance and relative speed between the autonomous vehicle and the most important object (MIO) track among the confirmed tracks.

Finally, the AV takes appropriate action to restore a safe state (for example, speed control or spacing control in the ACC system).

According to the RSS algorithm, the AV accelerates at up to its maximum acceleration during the response time and, after the response time, decelerates at its minimum acceleration to maintain a safe distance from the manually driven vehicle.

The autonomous driving decision function is therefore as follows.

If the relative distance is at least the safe distance, both vehicles can drive normally and follow the set speed (speed control of the ACC system).

If the relative distance is less than the safe distance, the autonomous vehicle decelerates at the minimum acceleration until the safe distance is restored (spacing control of the ACC system).
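
The two cases of the decision function can be sketched as a small rule. The names `AccMode` and `decide` and the 2.0 m/s² maximum acceleration are hypothetical; only the -5.0 m/s² minimum acceleration appears in the simulation results of this document.

```python
from enum import Enum
from typing import Tuple


class AccMode(Enum):
    SPEED_CONTROL = "speed control"      # follow the set speed
    SPACING_CONTROL = "spacing control"  # restore the safe distance


def decide(relative_distance: float, safe_dist: float, speed: float,
           set_speed: float, min_accel: float = -5.0,
           max_accel: float = 2.0) -> Tuple[AccMode, float]:
    """Return the ACC mode and the commanded acceleration [m/s^2]."""
    if relative_distance >= safe_dist:
        # Safe: hold the set speed, or accelerate toward it if below it.
        accel = max_accel if speed < set_speed else 0.0
        return AccMode.SPEED_CONTROL, accel
    # Not safe: brake at the minimum acceleration until the gap is restored.
    return AccMode.SPACING_CONTROL, min_accel
```

A real ACC controller would shape the acceleration smoothly; the bang-bang command here only makes the switching condition explicit.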

The Markov decision process (MDP) is a powerful framework for choosing appropriate actions in a fully observable stochastic environment. An AV, however, maneuvers in an uncertain environment in which the intentions of other road users are not known exactly and sensor measurements are noisy.

To address this, the partially observable MDP (POMDP) is adopted.

Here, the POMDP, specified by the tuple (S, A, T, R, O, Z), is used to determine the appropriate action for each possible belief state over the time steps.

S and A are the states and actions of the participants, respectively. T represents the transition probability, R defines the reward for the selected action, O defines the observations, and Z is the observation function.

In the POMDP, a belief state is maintained (comprising, for example, position, speed, yaw, and yaw rate) because the system state is only incompletely known.

When the autonomous vehicle takes an action (such as accelerating, decelerating, or maintaining the desired speed) and receives an observation through the radar and vision sensors, a new belief state is obtained using Bayes' rule.
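
For a finite set of hidden states, the Bayes-rule update can be sketched as a prediction through the transition model followed by reweighting with the observation likelihood. The patent's belief state is continuous, so the discrete version below is a simplification for illustration; the function name and argument layout are hypothetical.

```python
from typing import List


def update_belief(belief: List[float],
                  transition: List[List[float]],
                  obs_likelihood: List[float]) -> List[float]:
    """Discrete Bayes filter step.

    belief[s]          -- prior probability of hidden state s
    transition[s][s2]  -- P(s2 | s, a) for the action just taken
    obs_likelihood[s2] -- P(o | s2, a) for the observation just received
    """
    n = len(belief)
    # Prediction: push the prior through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction: weight by the observation likelihood, then normalize.
    unnormalized = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalized]
```

For example, with two hypothetical hidden states ("the human driver yields" and "the human driver does not yield"), a uniform prior, and an observation that is nine times more likely under the first state, the updated belief assigns 0.9 to the first state.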

The POMDP framework aims to maximize the expected reward, evaluated using the Monte Carlo method and defined as follows.
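
In the standard POMDP formulation, the objective of this kind is the expected discounted sum of rewards starting from the initial belief, and a Monte Carlo method estimates it by averaging the returns of sampled rollouts. The expressions below are that textbook form, given only as a sketch; the discount factor \gamma, the initial belief b_0, the number of rollouts N, and the horizon H are assumed symbols and the patent's own equation may differ.

```latex
\pi^{*} = \arg\max_{\pi} \;
  \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)
  \;\middle|\; b_0, \pi \right],
\qquad
Q(b, a) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=0}^{H} \gamma^{t} r_t^{(i)}
```

The policy then selects the action with the largest estimated value for the current belief, \pi(b) = \arg\max_a Q(b, a).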

As shown in FIG. 1B, the apparatus for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention includes: a state initialization unit (10) that initializes the learning state of the POMDP algorithm; an optimal action derivation unit (20) that derives the optimal action by applying the RSS algorithm to the POMDP model for optimizing operation of the autonomous vehicle; an action execution unit (30) that executes the optimal action derived by the optimal action derivation unit (20); a state observation unit (40) that observes the driving state from data provided by the vision sensor and radar sensor of the autonomous vehicle in the simulation; a reward determination unit (50) that modifies the behavior of the autonomous vehicle, based on the RSS algorithm and the adaptive MPC system, according to whether the reward is a safe, unsafe, failure, or goal reward; a reward level judgment unit (60) that judges whether the reward level from the reward determination unit (50) is appropriate; and a state update unit (70) that updates the state when the reward level judgment unit (60) finds that the reward level has not reached the target value.

FIGS. 2A and 2B are diagrams illustrating the partially observable Markov decision process (POMDP).

The partially observable Markov decision process (POMDP) is a process that lets a learning agent make decisions based on partial observations of its environment.

In the present invention, to mimic a real autonomous driving environment within the simulation, learning and action selection are based not on the full simulation environment (full observation) but on the data obtained through the autonomous vehicle's sensors (partial observation). The POMDP is used to select actions on that basis with the goal of maximizing the reinforcement learning reward for those actions.

To select actions based on the data obtained through the autonomous vehicle's sensors (partial observation) and to assign reinforcement learning rewards to those actions, the POMDP comprises: the states of the autonomous vehicle and the human drivers (time, vehicle position, speed, and planned path); the autonomous vehicle's actions (accelerate, maintain, decelerate); the state observations (simulation time, vehicle position, vehicle speed, and data obtained from the vehicle sensors); and the reinforcement learning rewards, namely a negative reward when the safe distance is not secured, a normal reward when the safe distance is secured, a goal reward based on whether the vehicle has reached its destination, and a failure reward when an accident occurs.

The RSS method using the POMDP framework is described in detail below.

The main problem in the autonomous decision-making process is how to account for uncertainty and determine an appropriate driving strategy for the autonomous vehicle (ego vehicle). The present invention focuses on online optimization in a closed-loop setting.

The model according to the present invention fuses the RSS method, built on the adaptive MPC system, with the POMDP algorithm, with the aim of providing a safety guarantee for the autonomous vehicle in an uncertain environment (for example, one with unpredictable human drivers).

FIG. 3 is a flowchart of the method for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention.

The method for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention includes: a state initialization step (S301) of initializing the learning state of the POMDP algorithm; an optimal action derivation step (S302) of deriving the optimal action by applying the RSS algorithm to the POMDP model for optimizing operation of the autonomous vehicle; an action execution step (S303) of executing the optimal action derived in the optimal action derivation step; a state observation step (S304) of observing the driving state from data provided by the vision sensor and radar sensor of the autonomous vehicle in the simulation; a reward determination step (S305) of modifying the behavior of the autonomous vehicle, based on the RSS algorithm and the adaptive MPC system, according to whether the reward is a safe, unsafe, failure, or goal reward; a reward level judgment step (S306) of judging whether the reward level from the reward determination step is appropriate; and a state update step (S307) of updating the state when the reward level judgment step finds that the reward level has not reached the target value.

The belief state of the POMDP algorithm is a continuous state space expressed as follows.

Here, the terms of the expression denote, in order, the belief state of the autonomous vehicle, the state of a human-driven vehicle, and the number of human-driven vehicles in the model.

In the closed-loop setting, the state consists of the position (x, y), speed (v), yaw (θ), and yaw rate (ω) of the autonomous vehicle and the other vehicles at each time step, obtained through the radar and vision sensors, as follows.

The action space is as follows.

The ego actions in the present invention are accelerate, decelerate, and maintain the desired speed, and they are realized through the spacing and speed control of the ACC model.

The ego action is defined over this discrete set, whereas the actions of the manually driven vehicles are estimated by the MPC algorithm at each time step over continuous time.

The observation space is described as follows.

An observation consists of the position, speed, yaw, yaw rate, relative speed (rel_v), and relative distance (rel_d) between the driverless vehicle and the preceding vehicle.
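
The observation vector can be assembled from the ego state and the lead-vehicle state as sketched below. The `VehicleState` class and `observe` function are hypothetical, and two conventions are assumed for illustration: the relative distance is the straight-line distance between the two positions, and the relative speed is the lead speed minus the ego speed.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VehicleState:
    x: float         # position along the x axis [m]
    y: float         # position along the y axis [m]
    v: float         # speed [m/s]
    yaw: float       # heading angle [rad]
    yaw_rate: float  # heading rate [rad/s]


def observe(ego: VehicleState,
            lead: VehicleState) -> Tuple[float, float, float, float,
                                         float, float, float]:
    """Return (x, y, v, yaw, yaw_rate, rel_v, rel_d) for the ego vehicle."""
    rel_d = math.hypot(lead.x - ego.x, lead.y - ego.y)  # assumed: Euclidean
    rel_v = lead.v - ego.v  # assumed sign: negative when the ego is closing in
    return (ego.x, ego.y, ego.v, ego.yaw, ego.yaw_rate, rel_v, rel_d)
```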

The lead vehicle is identified as the MIO track, that is, the closest track ahead of the ego (autonomous) vehicle. The vision and radar sensors also measure the position and speed of the manually driven vehicles relative to the autonomous vehicle.

The observation function is defined as follows.

The reward function and the optimal action are described as follows.

The automated vehicle must follow simple rules such as maintaining a safe following distance.

Accordingly, four kinds of reward are considered: the safe reward, the unsafe reward, the failure reward, and the goal reward. The reward is determined from the observable parameters of the autonomous vehicle and the manually driven vehicles, obtained through sensor fusion (using the vision and radar sensors).

To carry out the proper response, the AV's action, based on the RSS algorithm and the adaptive MPC system, can be optimized for the next time step according to the reward value.

The reward function and the optimal action are therefore updated according to the following rules.

(1) The safe reward applies when the safe distance is secured. The AV can then maintain its speed or accelerate up to the set speed (speed control of the ACC system).

(2) The unsafe reward applies when the safe distance is not secured. The AV then decelerates at the minimum acceleration until the safe distance is restored (inter-vehicle spacing control of the ACC system).

(3) The failure reward indicates an accident, and the closed-loop run is stopped.

(4) The goal reward indicates that one of the vehicles has safely reached its target position, and the closed-loop run is stopped.
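
The four rules can be sketched as a classifier over the observed situation. The names `Reward`, `classify_reward`, and `is_terminal` are hypothetical, and the comparison of the relative distance with the safe distance is an assumed reading of the safe and unsafe conditions, whose original expressions are not reproduced in this text.

```python
from enum import Enum


class Reward(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"
    FAILURE = "failure"
    GOAL = "goal"


def classify_reward(relative_distance: float, safe_dist: float,
                    collided: bool, reached_goal: bool) -> Reward:
    """Map the observed situation to one of the four reward classes."""
    if collided:
        return Reward.FAILURE   # rule (3): accident, stop the run
    if reached_goal:
        return Reward.GOAL      # rule (4): target reached safely, stop the run
    if relative_distance >= safe_dist:
        return Reward.SAFE      # rule (1): keep or regain the set speed
    return Reward.UNSAFE        # rule (2): brake until the gap is restored


def is_terminal(reward: Reward) -> bool:
    """Failure and goal rewards end the closed-loop run."""
    return reward in (Reward.FAILURE, Reward.GOAL)
```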

The complete RSS algorithm within the POMDP framework is presented in detail, as pseudo-code, in the following algorithm.

FIG. 4 shows, in pseudo-code form, the optimization of the RSS algorithm through the POMDP framework.

The optimization of the RSS algorithm through the POMDP framework in FIG. 4 is described step by step as follows.

1: Collect the driving states S0, S1, S2, S3, ...

2: Apply the RSS algorithm based on the POMDP framework in the simulation

3: Input the simulation parameters

4: Repeat at every time step (0.1 s; the vehicle reacts every 0.1 seconds)

5: Set the initial state (initial states of the vehicles in the simulation)

6: S0 = (x0, y0, v0, θ0, ω0)^T

7: Sk = (xk, yk, vk, θk, ωk)^T

8: Compute the optimal driving strategy

9: π(b) := argmax_a Q(b, a)

10: Execute the action; action set = [accelerate, decelerate, maintain speed]

11: Collect data from the vision and radar sensors of the autonomous vehicle in the simulation

12: Observed data = (x, y, v, θ, ω, rel_v, rel_d)^T

13: Evaluate the reinforcement learning reward function

14: Reward R = [safe reward, unsafe reward, failure reward, goal reward]

15: Update the belief state: b = τ(b, a, O)

16: Decide whether to repeat the simulation: repeat until R = [failure reward, goal reward]
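
The loop structure of these steps can be sketched in one dimension with a single lead vehicle moving at constant speed. This is a deliberately simplified illustration rather than the algorithm of FIG. 4: the policy step argmax_a Q(b, a) is replaced by the safe-distance rule, no belief is maintained, and the function names, the assumed safe-distance form, and every default value other than the 0.1 s time step and the -5.0 m/s² minimum acceleration are hypothetical.

```python
from typing import Tuple


def rss_safe_distance(speed: float, response_time: float,
                      min_accel: float) -> float:
    """Assumed form: reaction distance plus braking distance."""
    return speed * response_time + speed ** 2 / (2.0 * abs(min_accel))


def run_episode(ego_speed: float = 10.0, lead_speed: float = 8.0,
                gap: float = 30.0, goal: float = 100.0,
                set_speed: float = 12.0, response_time: float = 0.5,
                min_accel: float = -5.0, max_accel: float = 2.0,
                dt: float = 0.1, max_steps: int = 2000) -> Tuple[str, int]:
    """Run one closed-loop episode; return (outcome, number of steps)."""
    ego_pos, lead_pos = 0.0, gap          # initial state
    for step in range(max_steps):         # repeat every dt seconds
        # Observe the relative distance to the lead vehicle.
        rel_d = lead_pos - ego_pos
        # Terminal rewards end the run.
        if rel_d <= 0.0:
            return "failure", step
        if ego_pos >= goal:
            return "goal", step
        # Choose the action from the safe / unsafe condition.
        if rel_d >= rss_safe_distance(ego_speed, response_time, min_accel):
            accel = max_accel if ego_speed < set_speed else 0.0
        else:
            accel = min_accel
        # Execute the action and advance both vehicles by one time step.
        ego_speed = min(set_speed, max(0.0, ego_speed + accel * dt))
        ego_pos += ego_speed * dt
        lead_pos += lead_speed * dt
    return "timeout", max_steps
```

With the default values the ego vehicle closes on the lead vehicle, brakes whenever the gap falls below the safe distance, and reaches the goal position without a collision.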

FIGS. 5A and 5B show the simulation setups used to evaluate the performance of the RSS-based POMDP model.

The performance of the RSS-based POMDP model was evaluated by simulation with an increasing number of human-driven vehicles. That is, the effect of platooned vehicles on the proposed RSS-based POMDP model under the adaptive MPC system was examined.

The RSS-based POMDP model was implemented on the adaptive MPC system so that the classical adaptive MPC model could be used as the baseline for comparison with the proposed model.

Two experimental cases with specific settings were created to evaluate how well the RSS algorithm strengthens the traffic safety guarantee, improves driving smoothness, and reduces delay time.

The proposed model and the classical adaptive MPC model use the same safe distance and initial speed. The autonomous vehicle uses sensor fusion and the confirmed tracks to estimate the relative distance and relative speed of the MIO track (that is, the nearest human-driven vehicle ahead of the autonomous vehicle). The autonomous vehicle must understand the behavior of the lead vehicle and decide whether to maintain the safe distance. The two experiments are as follows.

FIG. 5A shows the first experimental environment, in which the autonomous vehicle attempts a left turn and must take into account two human-driven vehicles approaching from both directions. The human-driven vehicles approach the intersection travelling straight ahead from both directions.

FIG. 5B shows the second experimental environment, in which the autonomous vehicle attempts a left turn while multiple vehicles enter the intersection from both directions. The spacing between the entering vehicles is set to 40 m.

Table 1 lists the settings used for the non-signalized intersection simulation that verifies the model according to the present invention: the simulation time step, the reaction time of the autonomous vehicle, the initial speed of the autonomous vehicle, the initial speed of the human drivers, the minimum acceleration of the vehicles, the maximum acceleration of the vehicles, the number of lanes on each road entering the intersection, the number of roads entering the intersection, and the lane width. These items can also be used as parameters for autonomous driving in actual operation.

The performance index for delay time is described as follows.

In the simulation scenarios, delay-time performance was analyzed by calculating the braking time, that is, the time from the start of deceleration to the start of acceleration, based on the relative distance between the autonomous vehicle and the lead vehicle measured by the vision and radar sensors.

The braking time is defined as follows.

The time performance index is then given as follows.

Here, the index is computed from the braking time, acceleration start time, and deceleration start time of the classical adaptive MPC model, and from the braking time, acceleration start time, and deceleration start time of the model according to the present invention.
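
The braking time follows directly from the description as the acceleration start time minus the deceleration start time. For the index itself, the sketch below assumes the percentage reduction in braking time relative to the classical adaptive MPC model, which is consistent with the index being reported as a percentage but is not confirmed by the text; the function names are hypothetical.

```python
def braking_time(decel_start: float, accel_start: float) -> float:
    """Time spent braking [s]: from deceleration start to acceleration start."""
    if accel_start < decel_start:
        raise ValueError("acceleration must start after deceleration")
    return accel_start - decel_start


def time_performance_index(baseline_braking_time: float,
                           proposed_braking_time: float) -> float:
    """Assumed index [%]: relative reduction in braking time vs. the baseline."""
    if baseline_braking_time <= 0.0:
        raise ValueError("baseline braking time must be positive")
    reduction = baseline_braking_time - proposed_braking_time
    return reduction / baseline_braking_time * 100.0
```

Under this assumption, a baseline braking time of 4.0 s and a proposed-model braking time of 3.0 s give an index of 25%.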

The performance index for smooth driving is described as follows.

The smooth-driving level score considers the minimum speed and the set speed in order to analyze smooth-driving performance.

Here, the score uses the reference speed and the minimum speed of the vehicle under the adaptive MPC model, and the reference speed and the minimum speed of the vehicle under the model according to the present invention.

FIGS. 6A and 6B are graphs of simulation results showing the output profiles of the first experiment, and FIGS. 7A and 7B are graphs of simulation results showing the output profiles of the second experiment.

FIG. 8 is a graph comparing the performance of the model according to the present invention with that of the earlier adaptive MPC model.

The smooth-driving improvement clearly tended to increase from the first scenario to the second, in which the number of potential conflicts is larger.

Table 2 and FIG. 8 show that no collision between the autonomous vehicle and a human-driven vehicle was detected in any experiment.

For example, the collision status value in FIGS. 6A and 6B and in FIGS. 7A and 7B is 0, indicating that no collision occurred between the autonomous vehicle and the human-driven vehicles.

The automated vehicle therefore moved safely. The largest improvement in the smooth-driving level score (53.26%) was observed in the second experiment.

The highest time performance index (31.60%) was obtained in the first experiment.

When the adaptive-MPC-based model according to the present invention was compared with the classical adaptive MPC model at the non-signalized intersection, the improvement in smooth driving trended upward and the reduction in delay time diminished as the number of manually driven vehicles increased.

The smooth-driving performance of the present invention with platooned vehicles is as follows.

FIG. 9 is a graph of simulation results for the speed of the autonomous vehicle in the model according to the present invention, and FIG. 10 is a graph of simulation results for the acceleration of the autonomous vehicle in the model according to the present invention.

Tables 3 and 4 give the corresponding minimum, maximum, and standard deviation values for the two experiments.

Using the RSS algorithm, the AV automatically tracked the relative distance and relative speed to the lead vehicle (a human-driven vehicle) while maintaining a safe distance.

As shown in FIG. 9, the AV speed was always able to follow the set speed at the start of the simulation. When the relative distance between the two vehicles fell below the braking distance, the AV decelerated at the minimum acceleration (-5.0 m/s²) until the safe distance was restored.

The standard deviations of the speed and acceleration distributions were smaller in the first experiment than in the second. The range of variation in speed and acceleration was therefore smaller in the first experiment than in the second.

The first experiment was smoother than the second in terms of speed and acceleration. The improvement in smooth driving, however, was smaller in the first experiment than when the number of manually driven vehicles was increased.

The apparatus and method for improved passage of an autonomous vehicle at a non-signalized intersection according to the present invention, as described above, enable efficient passage at a non-signalized intersection by using the responsibility-sensitive safety theory and the partially observable Markov decision process.

The present invention uses a partially observable Markov decision process (POMDP), a reinforcement learning model that learns from information within the range the autonomous vehicle can observe, as in real conditions, so that the reinforcement learning reward for its actions can be maximized.

As described above, it will be understood that the present invention may be implemented in modified forms without departing from its essential characteristics.

The disclosed embodiments should therefore be considered illustrative rather than restrictive. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of equivalents thereof should be construed as being included in the present invention.

10. State initialization unit 20. Optimal action derivation unit
30. Action execution unit 40. State observation unit
50. Reward determination unit 60. Reward level judgment unit
70. State update unit

Claims

An apparatus for improved passage of an autonomous vehicle at a non-signalized intersection, comprising:
a state initialization unit that initializes the learning state of a partially observable Markov decision process (POMDP) algorithm;
an optimal action derivation unit that derives an optimal action by applying the Responsibility-Sensitive Safety (RSS) algorithm to a POMDP model for optimizing operation of the autonomous vehicle (AV);
an action execution unit that executes the optimal action derived by the optimal action derivation unit;
a state observation unit that observes the driving state from data provided by a vision sensor and a radar sensor of the autonomous vehicle (AV); and
a reward determination unit that modifies the behavior of the autonomous vehicle, based on the RSS algorithm for the autonomous vehicle and an adaptive model predictive control (MPC) system for human-driven vehicles, according to whether the reward is a safe, unsafe, failure, or goal reward.

The apparatus of claim 1, further comprising:
a reward level judgment unit that judges whether the reward level from the reward determination unit is appropriate; and
a state update unit that updates the state when the reward level judgment unit finds that the reward level has not reached the target value.

The apparatus of claim 1, wherein, to select actions based on data obtained through the sensors of the autonomous vehicle and to assign reinforcement learning rewards to those actions, the POMDP model comprises:
states of the autonomous vehicle and the human drivers, including time, vehicle position, speed, and planned path;
actions of the autonomous vehicle, including accelerate, maintain, and decelerate;
state observations, including simulation time, vehicle position, vehicle speed, and data obtained from the vehicle sensors; and
reinforcement learning rewards, comprising a negative reward when the safe distance is not secured, a normal reward when the safe distance is secured, a goal reward based on whether the vehicle has reached its destination, and a failure reward when an accident occurs.

The device of claim 3, wherein the belief state of the POMDP model is a continuous state space denoted by
[equation]
where [symbol] is the belief state of the autonomous vehicle, [symbol] is the state of a human-driven vehicle, and [symbol] is the number of human-driven vehicles in the model.

The device of claim 4, wherein the state consists of the position (x, y), velocity (v), yaw ([symbol]), and yaw rate ([symbol]) of the autonomous vehicle and the other vehicles, obtained through the radar and vision sensors at each time step in a closed-loop setup.

The device of claim 3, wherein, in the action space, the ego actions include acceleration ([symbol]), deceleration ([symbol]), and maintaining the desired speed ([symbol]), and are governed by the spacing control and speed control of the ACC model; and
in the observation space, an observation consists of the position, velocity, yaw, yaw rate, relative velocity ([symbol]), and relative distance ([symbol]) between the driverless vehicle and the preceding vehicle.

The device of claim 3, wherein the reward function and the optimal action in the reward determination unit are updated according to the following rules:
(1) [symbol] denotes the safe reward, in which case the AV maintains its speed or accelerates up to the set speed;
(2) [symbol] denotes the unsafe reward, in which case the AV decelerates at the minimum acceleration until the safe distance is restored;
(3) [symbol] denotes the failure reward, in which case the closed-loop setup stops; and
(4) when one of the vehicles safely reaches the target position, which denotes the target reward, the closed-loop setup stops.
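Rules (1) to (4) amount to a lookup from the recognized reward category to the AV's response and to a flag that ends the closed loop. The sketch below is illustrative only: the string labels, the function name `respond`, and the tie-break at exactly the set speed (treated as "maintain") are assumptions, not part of the claim.

```python
from typing import Optional, Tuple


def respond(reward: str, speed: float, set_speed: float) -> Tuple[Optional[str], bool]:
    """Return (action, stop_loop) for a recognized reward category.

    reward is one of "safe", "unsafe", "failure", "target" (hypothetical labels).
    """
    if reward == "safe":
        # Rule (1): keep the speed, or accelerate up to the set speed.
        action = "accelerate" if speed < set_speed else "maintain"
        return action, False
    if reward == "unsafe":
        # Rule (2): decelerate until the safe distance is restored.
        return "decelerate", False
    if reward in ("failure", "target"):
        # Rules (3) and (4): the closed-loop setup stops.
        return None, True
    raise ValueError(f"unknown reward category: {reward!r}")
```

A caller would keep invoking `respond` once per time step until the second element of the returned pair is `True`.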

The device of claim 1, wherein, to optimize autonomous vehicle operation, the RSS algorithm is applied to mixed traffic so that a safe distance is maintained and dangerous situations are avoided,
the safe distance ([symbol]) being defined as
[equation]
where [symbol] is the reaction time of the AV, [symbol] is the braking distance, [symbol] is the response time of the AV, [symbol] is the actual speed of the AV, and [symbol] is the minimum acceleration of the AV.

The device of claim 8, wherein the AV accelerates during the response time ([symbol]) until it reaches the maximum acceleration and, after the response time, decelerates at the minimum acceleration to keep a safe distance from the manually driven vehicle; and
the autonomous driving decision function is such that, if [condition], both vehicles drive normally and follow the set speed, and, if [condition], the autonomous vehicle decelerates at the minimum acceleration until the safe distance is restored.
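The safe-distance formula of claim 8 and the two conditions of this claim are not reproduced in this text. As a hedged illustration, the sketch below uses the longitudinal safe-distance expression from the published RSS model (Shalev-Shwartz, Shammah and Shashua), which matches the behavior described here: the rear vehicle accelerates throughout its response time and then brakes at its minimum braking rate. It may differ from the patent's own formula. The parameter names and the choice of `>=` at the boundary are assumptions.

```python
def rss_safe_distance(v_rear: float, v_front: float, rho: float,
                      a_max_accel: float, a_min_brake: float,
                      a_max_brake: float) -> float:
    """Minimum safe longitudinal gap per the published RSS model.

    v_rear, v_front: speeds of the following and leading vehicle (m/s)
    rho:             response time of the following vehicle (s)
    a_max_accel:     worst-case acceleration during the response time (m/s^2)
    a_min_brake:     braking the following vehicle is guaranteed to apply (m/s^2)
    a_max_brake:     hardest braking the leading vehicle may apply (m/s^2)
    """
    v_after_response = v_rear + rho * a_max_accel
    gap = (v_rear * rho
           + 0.5 * a_max_accel * rho ** 2
           + v_after_response ** 2 / (2.0 * a_min_brake)
           - v_front ** 2 / (2.0 * a_max_brake))
    return max(gap, 0.0)  # the RSS expression is clipped at zero


def decide(gap: float, safe_gap: float) -> str:
    """Two-branch decision: drive normally, or decelerate until the gap recovers."""
    return "follow_set_speed" if gap >= safe_gap else "decelerate"
```

With both vehicles at 10 m/s, a 1 s response time, 2 m/s² acceleration, 4 m/s² minimum braking and 8 m/s² maximum braking, the terms are 10 + 1 + 18 - 6.25, so the safe gap is 22.75 m and a measured gap of 20 m would trigger deceleration.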

The device of claim 1, wherein the Adaptive Model Predictive Control System is applied to the human-driven vehicle so that
the relative distance and relative speed to the nearest human driver ahead, acquired from the autonomous vehicle's sensors, are determined, and
as the control variable, the autonomous vehicle autonomously maintains a constant distance from the human-driven vehicle ahead, thereby operating in response to the human driver's behavior.

The device of claim 10, wherein the setting values for simulation at the non-signalized intersection include:
the simulation time step, the reaction time of the autonomous vehicle, the initial speed of the autonomous vehicle, the initial speed of the human driver, the minimum acceleration of the vehicle, the maximum acceleration of the vehicle, the number of lanes on each road entering the non-signalized intersection, the number of roads entering the non-signalized intersection, and the lane width.

The device of claim 11, wherein, for performance analysis that accounts for delay time,
the braking time ([symbol]), running from the deceleration start time ([symbol]) to the acceleration start time ([symbol]), is calculated in the simulation scenario from the relative distance between the autonomous vehicle and the leading vehicle measured by the vision and radar sensors.
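The braking time described in this claim runs from the moment the AV starts decelerating to the moment it starts accelerating again. The sketch below extracts both instants from a logged speed trace. This is an assumption made for illustration: the claim derives them from the sensed relative distance to the leading vehicle, and the function name, the tolerance `eps`, and the use of a speed trace are not taken from the patent.

```python
from typing import Optional, Sequence, Tuple


def braking_time(times: Sequence[float], speeds: Sequence[float],
                 eps: float = 1e-6) -> Optional[Tuple[float, float, float]]:
    """Return (deceleration start, acceleration start, braking time).

    Returns None if the trace never decelerates and then accelerates again.
    """
    t_decel = None
    for i in range(1, len(speeds)):
        dv = speeds[i] - speeds[i - 1]
        if t_decel is None:
            if dv < -eps:
                # Speed starts dropping over this interval.
                t_decel = times[i - 1]
        elif dv > eps:
            # Speed starts rising again: braking phase is over.
            t_accel = times[i - 1]
            return t_decel, t_accel, t_accel - t_decel
    return None
```

Running it on one trace from a baseline controller and one from the proposed controller gives two braking times that can then be compared.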

The device of claim 12, wherein the braking time is defined as
[equation]
and the time performance index ([symbol]) is defined as
[equation], [equation]
where [symbol] is the braking time of the classical adaptive MPC model, [symbol] is the acceleration start time of the classical adaptive MPC model, [symbol] is the deceleration start time of the classical adaptive MPC model, [symbol] is the braking time of the proposed model, [symbol] is the acceleration start time of the proposed model, and [symbol] is the deceleration start time of the proposed model.

The device of claim 12, wherein the smooth driving level score ([symbol]), used to analyze smooth-driving performance, takes into account the minimum speed ([symbol]) and the set speed ([symbol]) and is defined as
[equation]
with
[equation], [equation]
where [symbol] is the reference speed of the adaptive MPC vehicle, [symbol] is the reference speed of the proposed-model vehicle, [symbol] is the minimum speed of the adaptive MPC vehicle, and [symbol] is the minimum speed of the proposed-model vehicle.

A method for improved passage of autonomous vehicles at non-signalized intersections, comprising:
a state initialization step of initializing the learning state of the Partially Observable Markov Decision Process (POMDP) algorithm;
an optimal action derivation step of deriving an optimal action by applying the Responsibility-Sensitive Safety (RSS) algorithm to the POMDP model for optimizing autonomous vehicle (AV) operation;
an action execution step of executing the optimal action derived in the optimal action derivation step;
a state observation step of observing the driving state from data provided by the vision sensor and radar sensor of the autonomous vehicle (AV); and
a reward determination step of correcting the autonomous vehicle's behavior, based on the RSS algorithm for the autonomous vehicle and the Adaptive Model Predictive Control System for the human-driven vehicle, according to whether a safe, unsafe, failure, or target reward is recognized.

The method of claim 15, further comprising: a reward level determination step of determining whether the reward level of the reward determination step is appropriate; and
a state update step of updating the state when the reward level determination step finds that the reward level has not reached the target value.
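Taken together, the claimed steps form a loop: initialize the state, derive an action, execute it, observe, determine the reward, and update the state whenever the reward is not yet the target. The skeleton below is a sketch of that control flow only. Each step is passed in as a callable, and all parameter names and the `max_steps` safeguard are assumptions introduced for illustration.

```python
from typing import Any, Callable, List, Tuple


def run_episode(init_state: Callable[[], Any],
                derive_action: Callable[[Any], Any],
                execute: Callable[[Any], None],
                observe: Callable[[], Any],
                reward: Callable[[Any, Any, Any], Any],
                is_target: Callable[[Any], bool],
                update: Callable[[Any, Any, Any], Any],
                max_steps: int = 1000) -> List[Tuple[Any, Any]]:
    """Run the claimed step sequence and return the (action, reward) history."""
    state = init_state()                    # state initialization step
    history = []
    for _ in range(max_steps):
        action = derive_action(state)       # optimal action derivation step
        execute(action)                     # action execution step
        obs = observe()                     # state observation step
        r = reward(state, action, obs)      # reward determination step
        history.append((action, r))
        if is_target(r):                    # reward level determination step
            break
        state = update(state, action, obs)  # state update step
    return history
```

Passing the steps in as callables keeps the skeleton independent of any particular POMDP solver, RSS implementation, or simulator.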

The method of claim 15, wherein, in order to determine an action based on data obtained through the autonomous vehicle's sensors and to assign a reinforcement-learning reward to that action, the POMDP model handles:
autonomous-vehicle and human-driver states, comprising time, vehicle position, speed, and determined route;
autonomous-vehicle actions, comprising acceleration, speed maintenance, and deceleration;
state observations, comprising simulation time, vehicle position, vehicle speed, and data obtained from the vehicle sensors; and
reinforcement-learning rewards, comprising a negative reward when the safe distance is not secured, a general reward when the safe distance is secured, a target reward based on whether the vehicle has reached its target, and a failure reward when an accident occurs.

The method of claim 15, wherein the reward function and the optimal action in the reward determination step are updated according to the following rules:
(1) [symbol] denotes the failure reward, in which case the closed-loop setup stops; and
(4) when one of the vehicles safely reaches the target position, which denotes the target reward, the closed-loop setup stops.

The method of claim 15, wherein, through application of the Adaptive Model Predictive Control System for the human-driven vehicle,
the relative distance and relative speed to the nearest human driver, acquired from the autonomous vehicle's sensors, are determined, and
as the control variable, the autonomous vehicle autonomously maintains a constant distance from the vehicle ahead, thereby operating in response to the human driver's behavior.