KR102618680B1

KR102618680B1 - Real-time 3D object detection and tracking system using visual and LiDAR

Info

Publication number: KR102618680B1
Application number: KR1020210118874A
Authority: KR
Inventors: 김곤우; 무하마드 수알레
Original assignee: 충북대학교 산학협력단
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2023-12-27
Also published as: KR20230036243A

Abstract

The present invention relates to a system for real-time 3D object recognition and tracking using images and LiDAR. The system includes: a ground classification module that divides the point cloud generated by a LiDAR, the first sensor for recognizing objects, into a ground point cloud and a non-ground point cloud; a clustering module that clusters the point cloud classified by the ground classification module by grouping and labeling the points that belong to a single object; a box fitting module that generates a 3D bounding box representing an object from the points clustered by the clustering module and classifies the object using the generated box region; a tracker module that assigns the objects classified by the box fitting module to tracks and tracks them by predicting their motion patterns; an object detector that recognizes objects in images captured by a camera, the second sensor for recognizing objects; and a track management module that fuses the objects tracked by the tracker module with the objects recognized by the object detector, cancels false tracks, and maintains and manages the tracks.

Description

Real-time 3D object detection and tracking system using visual and LiDAR

The present invention relates to 3D object recognition in the embedded systems of autonomous vehicles.

News about autonomous vehicles in recent years suggests that Level 5 vehicle autonomy is approaching. However, the biggest obstacle to claiming full autonomy is still the environmental perception that drives autonomous decisions. An efficient perception system requires redundancy across sensor modalities so that it can operate under a variety of environmental conditions and provide reliable information using limited computational resources.

For an unmanned vehicle to drive autonomously, it must detect dynamic objects ahead, estimate their motion, and then generate a driving path. Much research is under way on techniques for detecting and tracking dynamic objects using radar, cameras, and similar sensors, and the recent fall in the price of laser scanners is helping driver-assistance systems become common among mainstream automobile manufacturers.

To detect moving objects with a laser scanner, each laser return is converted into a depth value to build a point cloud of the surroundings of the host vehicle. Because the individual points in this point cloud carry no meaning on their own, detecting and tracking moving objects first requires a clustering technique that groups the points and represents each group as a single object.

Environmental perception is therefore essential in autonomous driving and must be robust in complex environments such as dense urban scenarios.

An automated driving system that can perform all driving tasks under every road and environmental condition a human driver can manage is classified by SAE (Society of Automotive Engineers) International as the highest level of automation. Advanced Driving Assists (ADA) are commercially available, but they either require human intervention or operate only under specific environmental conditions. Realizing this level of autonomy places heavy demands on related research areas such as Multiple Object Detection and Tracking (MODT), and understanding the dynamic properties of the entities that coexist in the surrounding environment is important for improving overall automation. This directly affects the quality of localization, mapping, and motion planning.

Over the past decade, numerous MODT approaches have traditionally been studied using camera-based perception, and they have been reviewed in detail. In these approaches, objects are detected in the camera reference frame, either in a 2D coordinate system or, with a stereo setup, in a 3D coordinate system, producing 2D or 3D trajectories respectively. However, the spatial information is computed from inaccurate camera geometry, and the field of view (FOV) is limited. Camera-based approaches also face a variety of problems, including object truncation, lighting conditions, fast-moving targets, sensor motion, and interactions between targets.

In autonomous driving, 3D object coordinates must be accurate and robust, and most object recognition devices run embedded in the autonomous vehicle. Meeting these constraints requires an efficient and compact 3D detection framework suited to the embedded system of a fully autonomous vehicle. Therefore, when detecting small 3D objects in a point cloud, it is important to implement the autonomous driving system in an embedded-system-friendly manner.

Recently, LiDAR (Light Detection and Ranging) technology, which provides wide panoramic information about the surroundings, has become increasingly popular as an alternative. Because LiDAR provides wide panoramic measurements out to 100 m at a reasonable rate of 10-15 Hz, it is an ideal sensor for MODT tasks.

Among the various sensors, LiDAR is used as an ideal sensor for 3D object detection, and the 3D point clouds it provides are ubiquitous in robot vision for many mobile robot applications, especially autonomous driving. Unlike visual information, however, LiDAR yields a very sparse point density distribution because of factors such as non-uniform sampling of the 3D real world, effective operating range, occlusion, noise, and relative pose in all the weather conditions that limit visual sensors.

The delay in large-scale commercialization of self-driving cars is tied to safety, feasibility, and cost, and driverless cars are being developed with a view to saving lives.

The current automobile fatality rate in the United States, including cases involving safety violations, is approximately 1.22 deaths per 100 million miles driven. This effectively sets the benchmark for autonomous vehicle failures, and meeting it remains a major challenge. Self-driving cars must also make decisions that balance safety against feasibility.

Another challenge for the autonomous vehicle paradigm is the growing computing demand of processing raw sensor data in real time. Edge computing can offload computationally expensive tasks to remote processing, but at the cost of reduced information security and reliability.

3D object detection faithfully represents the 3D space around the vehicle in terms of class, dimensions, and pose, while tracking makes it possible to estimate dynamic parameters. Tracking also compensates for detections that are temporarily missed because of detector shortcomings. The main challenges in developing a framework for 3D object detection and tracking on an autonomous vehicle include real-time performance, a limited computational budget, applicability in varied weather and lighting conditions, and easy adaptation to changes in the number and placement of sensors.

Autonomous vehicles are typically equipped with numerous sensors for environmental perception, such as ultrasonic sensors, radar, LiDAR, and cameras. Many recent approaches use a camera, a LiDAR, or a fusion of the two for 3D object detection. LiDAR and cameras can each perform object detection independently, but each sensor has limitations. LiDAR-based approaches are vulnerable to harsh weather and low resolution, while camera-based approaches suffer from inadequate lighting and a lack of depth information. The two sensors therefore need to work together to compensate for their individual limitations and to cover a wider range of environmental conditions.

A growing number of trackers that operate in 3D space are being proposed. However, these trackers require fairly accurate 3D detections, which adds considerable computational demand to the system. The main drawbacks of existing trackers are network parameters that must be trained, their computational requirements, their inapplicability to embedded systems, and their dependence on the performance of the 3D object detector.

Republic of Korea Registered Patent No. 10-1628155

The present invention was devised to solve the problems described above, and its object is to provide a system for real-time 3D object recognition and tracking using images and LiDAR in the embedded system of an autonomous vehicle.

The objects of the present invention are not limited to the one mentioned above, and other objects not mentioned here will be clearly understood by those skilled in the art from the description below.

To achieve this object, the present invention provides a system for real-time 3D object recognition and tracking using images and LiDAR. The system includes: a ground classification module that divides the point cloud generated by a LiDAR, the first sensor for recognizing objects, into a ground point cloud and a non-ground point cloud; a clustering module that clusters the point cloud classified by the ground classification module by grouping and labeling the points that belong to a single object; a box fitting module that generates a 3D bounding box representing an object from the points clustered by the clustering module and classifies the object using the generated box region; a tracker module that assigns the objects classified by the box fitting module to tracks and tracks them by predicting their motion patterns; an object detector that recognizes objects in images captured by a camera, the second sensor for recognizing objects; and a track management module that fuses the objects tracked by the tracker module with the objects recognized by the object detector, cancels false tracks, and maintains and manages the tracks.

The ground classification module may distribute the indices of the point cloud over a cylindrical polar grid and classify the points into a ground point cloud and a non-ground point cloud with reference to the ground level in that grid that corresponds to the height at which the LiDAR is mounted.

The clustering module may perform clustering by distributing the non-ground point cloud over a 3D cylindrical grid, searching the cells adjacent to a selected index cell for associated points, and marking the adjacent cells that contain associated points as members of the cluster.

The box fitting module may correct the pose of the 3D bounding box using a minimum-area rectangle with L-shaped cloud fitting and classify the object using the dimensions of the corrected 3D bounding box.

The tracker module may estimate the motion state of an object using IMM-UKF-JPDAF, which combines an Interacting Multiple Model (IMM) to capture different motion patterns, an Unscented Kalman Filter (UKF) to handle the nonlinearity of the motion models, and a Joint Probabilistic Data Association Filter (JPDAF) to associate measurements with objects in the presence of clutter.

The track management module may perform an initialization and validation process that initializes tracks and checks their validity, a pruning process that removes unnecessary tracks, and a process that updates the yaw, velocity, and angular velocity parameters after a track has been initialized.

By proposing a technique that recognizes and tracks 3D objects by fusing camera images with LiDAR, the present invention addresses occlusion and missed visual detections, integrates smoothly with a variety of sensor arrangements, and provides the dynamic properties of objects over time.

Figure 1 is a block diagram showing a V2X-based MODT structure.
Figure 2 shows the framework of a system for real-time 3D object recognition and tracking using images and LiDAR according to an embodiment of the present invention.
Figure 3 shows the platform on which the framework proposed in the present invention was implemented and tested on an actual vehicle.
Figure 4 shows a 2D polar grid for ground classification according to an embodiment of the present invention.
Figure 5 shows a cylindrical grid for clustering according to an embodiment of the present invention.
Figure 6 shows L-shaped box fitting according to an embodiment of the present invention.
Figure 7 shows an IMM-UKF-JPDAF tracker according to an embodiment of the present invention.
Figure 8 shows a visual object detector that classifies tracked objects using a class vector according to an embodiment of the present invention.
Figure 9 shows a track management module with pose, dimension, and centroid correctors according to an embodiment of the present invention.

The present invention can be modified in various ways and can have various embodiments, so specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to those specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "comprise" or "have" designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the present invention pertains. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related technology, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this application.

In the description that refers to the accompanying drawings, identical components are given the same reference numerals regardless of the figure in which they appear, and duplicate descriptions of them are omitted. Where a detailed description of related known technology would unnecessarily obscure the gist of the present invention, that description is omitted.

The present invention proposes a comprehensive framework for fused 3D object recognition and tracking on embedded systems. The framework uses a camera-LiDAR setup and exploits information redundancy to produce reliable results in real time. The 3D LiDAR point cloud is represented on a cylindrical grid, and candidate objects are filtered from it. The candidates are then tracked, and their position, pose, dimensions, and class vector are maintained. In parallel, a neural network visually classifies objects to generate proposals that temporally update the class vectors of the tracked candidates.

The present invention relates to 3D object detection and tracking as part of a smart car project that encompasses a broad Vehicle-to-Everything (V2X) based autonomous vehicle architecture. It allows smart vehicles to communicate and share environmental information through the V2X protocol. In dense urban situations, for example, most of the environment is occluded by other dynamic objects. Given that every vehicle can share at least minimal MODT information about its environment, the invention makes it possible to secure visibility beyond the range of the vehicle's own sensors. In addition, rapid advances in edge computing and 5G can improve computing and communication capacity.

The present invention relates to a system for real-time 3D object recognition and tracking using images and LiDAR.

Figure 1 is a block diagram showing a V2X-based MODT structure.

As shown in Figure 1, the MODT scheme proposed in the present invention is implemented within a Vehicle-to-Everything (V2X) based autonomous vehicle architecture. The invention populates a Local Dynamic Map (LDM) to support control of the n individual smart vehicles in the network and to share safety messages. Once a smart vehicle is localized on the map, its local MODT information is fused with the LDM information through the V2X transceiver. This provides environmental awareness beyond the sensing capability of a single vehicle's sensors.

The 3D MODT of the present invention recognizes objects by class, dimension, and orientation, and maintains a unique ID for each object together with its position and kinematic parameters. Most applications of object recognition rest on the idea that consecutive visual frames are temporally coherent: the scene does not change abruptly, and the objects in it remain visible over several frames. Applications such as autonomous vehicles also benefit greatly from dynamic information about detected objects. This is usually the job of a tracker, which maintains the unique ID of each detected object and predicts its motion pattern. Accurate 3D object detection without tracking, by contrast, provides no information about how an object moves.

The framework proposed in the present invention performs 3D object recognition and tracking in a temporal manner.

Figure 2 shows the framework of a system for real-time 3D object recognition and tracking using images and LiDAR according to an embodiment of the present invention, including the flow of information between the modules.

Referring to Figure 2, the 3D object recognition and tracking system of the present invention includes a ground classification module 110, a clustering module 120, a box fitting module 130, a tracker module 140, an object detector 150, and a track management module 160.

The ground classification module 110 divides the point cloud generated by the LiDAR 10, the first sensor for recognizing objects, into a ground point cloud and a non-ground point cloud.

The clustering module 120 clusters the point cloud classified by the ground classification module 110 by grouping and labeling the points that belong to a single object.

The box fitting module 130 generates a 3D bounding box representing an object from the points clustered by the clustering module 120 and classifies the object using the generated box region.

The tracker module 140 assigns the objects classified by the box fitting module 130 to tracks and tracks them by predicting their motion patterns.

The object detector 150 recognizes objects in images captured by the camera 20, the second sensor for recognizing objects.

The track management module 160 fuses the objects tracked by the tracker module 140 with the objects recognized by the object detector 150, cancels false tracks, and maintains and manages the tracks.

The ground classification module 110 may distribute the indices of the point cloud over a cylindrical polar grid and classify the points into a ground point cloud and a non-ground point cloud with reference to the ground level in that grid that corresponds to the height at which the LiDAR is mounted.
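The polar-grid ground split described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `split_ground`, the grid resolution, the 1.8 m sensor height, and the 0.2 m tolerance are all assumptions chosen for illustration. Points are taken to be in the LiDAR frame, so flat ground lies near `z = -sensor_height`.

```python
import math
from collections import defaultdict

def split_ground(points, sensor_height=1.8, n_sectors=180, ring_size=1.0, tolerance=0.2):
    """Split (x, y, z) points into ground and non-ground index lists.

    All parameter values are illustrative assumptions, not values from the patent.
    """
    ground_level = -sensor_height  # expected ground height in the LiDAR frame
    cells = defaultdict(list)      # (sector, ring) -> indices of the points in that cell
    for i, (x, y, z) in enumerate(points):
        sector = int((math.atan2(y, x) + math.pi) / (2 * math.pi) * n_sectors) % n_sectors
        ring = int(math.hypot(x, y) / ring_size)
        cells[(sector, ring)].append(i)

    ground, non_ground = [], []
    for indices in cells.values():
        lowest = min(points[i][2] for i in indices)
        # A cell contributes ground points only if its lowest return is near the ground level.
        cell_has_ground = abs(lowest - ground_level) <= tolerance
        for i in indices:
            if cell_has_ground and points[i][2] - lowest <= tolerance:
                ground.append(i)
            else:
                non_ground.append(i)
    return sorted(ground), sorted(non_ground)
```

Only point indices are stored in the grid, which mirrors the text's description of distributing indices rather than copying points. Calling `split_ground(points)` returns two index lists, and the second would feed the clustering step.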

The clustering module 120 may perform clustering by distributing the non-ground point cloud over a 3D cylindrical grid, searching the cells adjacent to a selected index cell for associated points, and marking the adjacent cells that contain associated points as members of the cluster.
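One way to read this neighbor search is as connected-component labeling over the occupied cells of the cylindrical grid. The sketch below takes that reading as an assumption. The function name `cluster_cylindrical`, the cell sizes, and the 26-cell neighborhood are hypothetical choices, not details from the patent.

```python
import math
from collections import deque

def cluster_cylindrical(points, n_sectors=360, ring_size=0.5, layer_size=0.5):
    """Return one label per (x, y, z) point; points in connected occupied cells share a label."""
    cell_of = []   # cell of each point, in input order
    occupied = {}  # (sector, ring, layer) -> indices of the points in that cell
    for i, (x, y, z) in enumerate(points):
        sector = int((math.atan2(y, x) + math.pi) / (2 * math.pi) * n_sectors) % n_sectors
        cell = (sector, int(math.hypot(x, y) / ring_size), math.floor(z / layer_size))
        cell_of.append(cell)
        occupied.setdefault(cell, []).append(i)

    cell_label = {}
    next_label = 0
    for seed in occupied:
        if seed in cell_label:
            continue
        cell_label[seed] = next_label
        queue = deque([seed])
        while queue:  # breadth-first growth of the cluster from the seed cell
            s, r, h = queue.popleft()
            for ds in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    for dh in (-1, 0, 1):
                        # Sectors wrap around, rings and layers do not.
                        nb = ((s + ds) % n_sectors, r + dr, h + dh)
                        if nb in occupied and nb not in cell_label:
                            cell_label[nb] = next_label
                            queue.append(nb)
        next_label += 1
    return [cell_label[cell] for cell in cell_of]
```

Because the search visits cells rather than individual points, its cost grows with the number of occupied cells, which is one plausible reason a grid representation suits an embedded target.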

The box fitting module 130 may correct the pose of the 3D bounding box using a minimum-area rectangle with L-shaped cloud fitting and classify the object using the dimensions of the corrected 3D bounding box.
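As a rough illustration of the minimum-area-rectangle idea, the following sketch searches candidate headings and keeps the one whose enclosing rectangle has the smallest area in the ground plane. This is a generic search-based fit, not the patent's own procedure. The function name `fit_min_area_box` and the 1 degree step are assumptions. Note that for an ideal two-sided L shape the area criterion alone can tie between orientations, so a practical L-shape fit usually adds a further criterion.

```python
import math

def fit_min_area_box(points_xy, step_deg=1.0):
    """Return (cx, cy, length, width, yaw) of the smallest enclosing rectangle found."""
    best = None
    for k in range(int(90 / step_deg)):  # a rectangle repeats every 90 degrees
        yaw = math.radians(k * step_deg)
        c, s = math.cos(yaw), math.sin(yaw)
        u = [x * c + y * s for x, y in points_xy]   # coordinates along the candidate heading
        v = [-x * s + y * c for x, y in points_xy]  # coordinates across the candidate heading
        du, dv = max(u) - min(u), max(v) - min(v)
        area = du * dv
        if best is None or area < best[0]:
            mu, mv = (max(u) + min(u)) / 2, (max(v) + min(v)) / 2
            # Rotate the rectangle center back into the original frame.
            best = (area, mu * c - mv * s, mu * s + mv * c, du, dv, yaw)
    _, cx, cy, du, dv, yaw = best
    if dv > du:  # report the longer side as the length
        du, dv, yaw = dv, du, yaw + math.pi / 2
    return cx, cy, du, dv, yaw
```

The returned length and width are the quantities a dimension-based classifier would then compare against per-class size ranges.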

The tracker module 140 may estimate the motion state of an object using IMM-UKF-JPDAF, which combines an Interacting Multiple Model (IMM) to capture different motion patterns, an Unscented Kalman Filter (UKF) to handle the nonlinearity of the motion models, and a Joint Probabilistic Data Association Filter (JPDAF) to associate measurements with objects in the presence of clutter.
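To make the IMM part concrete, the sketch below shows the textbook model-probability update: each motion model's prior probability is propagated through a Markov transition matrix and then reweighted by how well that model's filter explained the latest measurement. The function name and the two-model example are illustrative assumptions. The patent's own motion models, UKF, and JPDAF steps are not reproduced here.

```python
def imm_update_model_probabilities(mu, transition, likelihoods):
    """Standard IMM model-probability update.

    mu[i]            prior probability of motion model i
    transition[i][j] probability of switching from model i to model j
    likelihoods[j]   measurement likelihood reported by the filter for model j
    """
    n = len(mu)
    # Predicted probability of each model before the measurement is seen.
    predicted = [sum(transition[i][j] * mu[i] for i in range(n)) for j in range(n)]
    # Bayes update with the per-model measurement likelihoods.
    unnormalized = [likelihoods[j] * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]
```

With two equally likely models, a symmetric transition matrix, and likelihoods of 0.8 and 0.2, the update shifts the model probabilities to about 0.8 and 0.2, so the better-fitting motion model dominates the combined state estimate.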

The track management module 160 may perform an initialization and validation process that initializes tracks and checks their validity, a pruning process that removes unnecessary tracks, and a process that updates the yaw, velocity, and angular velocity parameters after a track has been initialized.
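The three track-management steps can be sketched as a small life-cycle model. Everything specific here is an assumption made for illustration: the `Track` fields, confirmation after three associated measurements, pruning after more than five consecutive misses, and the finite-difference estimates of yaw, speed, and yaw rate. In the described system those kinematic quantities would come from the tracker's filter rather than from raw differences.

```python
import math
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    x: float
    y: float
    yaw: float = 0.0
    speed: float = 0.0
    yaw_rate: float = 0.0
    hits: int = 1            # measurements associated so far
    misses: int = 0          # consecutive frames without a measurement
    confirmed: bool = False  # True once the track has passed validation

def update_track(track, measurement, dt, confirm_hits=3):
    """Apply one frame; measurement is (x, y), or None if nothing was associated."""
    if measurement is None:
        track.misses += 1
        return track
    mx, my = measurement
    dx, dy = mx - track.x, my - track.y
    distance = math.hypot(dx, dy)
    if distance > 1e-6:
        new_yaw = math.atan2(dy, dx)
        # Wrap the heading change into [-pi, pi) before differentiating.
        dyaw = (new_yaw - track.yaw + math.pi) % (2 * math.pi) - math.pi
        track.yaw_rate = dyaw / dt
        track.yaw = new_yaw
    track.speed = distance / dt
    track.x, track.y = mx, my
    track.hits += 1
    track.misses = 0
    track.confirmed = track.confirmed or track.hits >= confirm_hits
    return track

def prune(tracks, max_misses=5):
    """Drop tracks that have gone unobserved for too many consecutive frames."""
    return [t for t in tracks if t.misses <= max_misses]
```

A track that starts at the origin and receives measurements at (1, 0) and (2, 0) at 10 Hz becomes confirmed on the second update, and is removed by `prune` once it has missed six frames in a row.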

In Figure 2, the framework of the present invention runs in two threads, one associated with the LiDAR 10 and one with the camera 20. The LiDAR point cloud is processed by ground removal and clustering to predict the initial pose and dimensions of potentially trackable objects. The centroid of each object is treated as a measurement for a tracker based on IMM-UKF-JPDAF (Interacting Multiple Model, Unscented Kalman Filter, Joint Probabilistic Data Association Filter).

In parallel, the second thread runs visual detection on the images captured by the camera and provides localized bounding boxes and class information. Rather than assigning a fixed class and fixed dimensions to an object from a single frame, the class is assigned to the tracked object, while the parameters related to dimensions, pose, and velocity are updated temporally over multiple frames. The tracking information is merged with the visual detections to provide the 3D object pose together with the associated tracking parameters.
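The idea of attaching the class to the track and refining it over several frames can be sketched with a decaying vote per class. The text does not specify the update rule, so the rule below, the `decay` factor, and both function names are assumptions for illustration only.

```python
def update_class_vector(class_vector, detected_class, confidence, decay=0.9):
    """Return a new class vector after one visual detection is associated with the track."""
    # Older evidence fades, so a few early misclassifications do not lock in a class.
    updated = {name: score * decay for name, score in class_vector.items()}
    updated[detected_class] = updated.get(detected_class, 0.0) + confidence
    return updated

def most_likely_class(class_vector):
    """Return the class with the highest accumulated score, or None for an empty vector."""
    return max(class_vector, key=class_vector.get) if class_vector else None
```

For example, a track that receives "car" at 0.9, "truck" at 0.4, and "car" at 0.8 in successive frames ends with "car" as its most likely class, even though one frame disagreed.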

도 3은 본 발명에서 제안하는 프레임워크를 실제 차량에 구현하여 테스트한 플랫폼을 보여주는 것이다. Figure 3 shows a platform on which the framework proposed in the present invention was implemented and tested on an actual vehicle.

As shown in the embodiment of Figure 3, the framework was implemented and tested on a vehicle. The platform carries an Ouster OS1-64 LiDAR mounted at its top center. For visual perception, a ZED camera is mounted next to the LiDAR inside a custom case. The sensors provide raw measurements to an Nvidia Jetson AGX Xavier device, which performs the computations of the proposed framework. In addition, the vehicle CAN is linked with a V2X modem to perform V2X communication. The framework was developed to run on the ROS (Robot Operating System) 'Melodic Morenia' middleware on Ubuntu Linux 18.04.1. The Xavier's GPU is used through the CUDA 10.0 library for visual detection, while LiDAR preprocessing and tracking are handled by the NVIDIA Carmel ARM CPU.

The 3D MODT framework proposed in the present invention consists of two threads, for 3D LiDAR processing and camera processing, respectively. The thread for 3D LiDAR point cloud processing consists of ground segmentation, clustering, box fitting, and tracking. The thread that processes the camera images consists of YOLO v3, implemented as a ROS package, which provides object class information. The operation and structure of each submodule are described in detail below.

지면 분류 모듈(110)은 LiDAR 포인트 클라우드를 지면 및 비지면 측정으로 분류하는 필수 전처리 작업을 수행한다. 지면으로 분류된 포인트 클라우드 부분은 도로 표시, 연석 감지, 횡단 가능 영역 및 경로 계획 작업을 위해 추가 처리될 수 있다. 반면, 비지면으로 분류된 포인트 클라우드의 일부는 3D 물체 감지와 관련된 작업에 효과적으로 사용된다. The ground classification module 110 performs the necessary preprocessing task of classifying the LiDAR point cloud into ground and non-ground measurements. The portion of the point cloud classified as ground can be further processed for road marking, curb detection, traversable areas, and route planning tasks. On the other hand, the part of the point cloud classified as non-ground is effectively used for tasks related to 3D object detection.

Strategies for ground classification include scan rings, voxels, height thresholds, and feature learning. Scan-ring-based approaches are generally applicable to single-LiDAR setups, where the distance between consecutive scan lines is used for ground classification. Voxelizing the point cloud into 2D or 3D space is also a common way to reduce the number of measurements used for estimation. Likewise, if a planar ground is assumed, a height threshold can be set for ground classification. Other approaches use neural networks to classify sparse LiDAR point clouds. The present invention takes into account that the ground may be non-planar and assumes that the point cloud may be a merge of several calibrated LiDARs placed at arbitrary positions. This assumption rules out approaches that rely on height thresholds and scan rings, while the variability in the number and position of LiDAR sensors and the constraints of embedded computing limit the use of learning-based approaches. The approach adopted here indexes the point cloud into a 2D array, which handles the classification task efficiently.

도 4는 본 발명의 일 실시예에 따른 지면 분류를 위한 2D 극 그리드를 도시한 것이다. Figure 4 shows a 2D polar grid for ground classification according to an embodiment of the present invention.

Referring to Figure 4, each cell of the array holds the indices of the point cloud measurements that belong to a section of a vertically sliced cylinder, addressed by channel and bin. Each channel is traversed independently, outward from the vehicle, to estimate the ground level. The sensor height above the ground is taken as the initial ground height, and the slope to the lowest measurement of each successive cell is calculated. A slope that exceeds the threshold indicates that the cell contains non-ground measurements, and the previous ground level is retained. A slope within the threshold updates the ground level for the subsequent cell. Once the ground level has been obtained for all cells of the grid, the point cloud is separated using a tolerance parameter to remove edge noise.

As shown in Figure 4, the polar grid is composed of channels, where a channel is a vertical slice of the cylindrical polar grid running from the origin to the farthest measurement. Each channel is further subdivided into bins, each of which holds the points that fall within its region. All bins of the polar grid are traversed, starting from the vehicle, to track the local ground level, which is initialized from the height of the LiDAR sensor.

The ground level of an adjacent cell is estimated from the lowest point and the height of the bin and the absolute slope to the adjacent bin. The local ground level separates the points into ground and non-ground points at the bin level. A fine-tuning step then removes points in bins that do not meet the height threshold, such as vegetation.

The indices of the non-ground points are then used to filter the cloud, which is passed on for further processing.
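The channel-and-bin traversal described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name, the grid sizes, and the `max_slope` and `tolerance` values are hypothetical, and the points are assumed to be in the sensor frame so that the ground under the vehicle lies at `z = -sensor_height`.

```python
import math

def classify_ground(points, n_channels=180, n_bins=60, r_max=60.0,
                    sensor_height=1.8, max_slope=0.15, tolerance=0.2):
    """Split (x, y, z) points into ground and non-ground index lists.

    All parameter values are illustrative assumptions. Points beyond
    r_max are ignored.
    """
    bin_size = r_max / n_bins
    # grid[c][b] holds the indices of the points that fall in that cell.
    grid = [[[] for _ in range(n_bins)] for _ in range(n_channels)]
    lowest = [[math.inf] * n_bins for _ in range(n_channels)]
    for i, (x, y, z) in enumerate(points):
        r = math.hypot(x, y)
        if r >= r_max:
            continue
        c = int((math.atan2(y, x) + math.pi) / (2 * math.pi) * n_channels) % n_channels
        b = int(r / bin_size)
        grid[c][b].append(i)
        lowest[c][b] = min(lowest[c][b], z)  # lowest point found while filling

    ground, non_ground = [], []
    for c in range(n_channels):
        level, level_r = -sensor_height, 0.0  # local ground starts below the sensor
        for b in range(n_bins):               # single outward traversal per channel
            if not grid[c][b]:
                continue
            r_mid = (b + 0.5) * bin_size
            slope = abs(lowest[c][b] - level) / (r_mid - level_r)
            if slope <= max_slope:            # plausible ground: update the level
                level, level_r = lowest[c][b], r_mid
            # Otherwise the previous ground level is kept for this cell.
            for i in grid[c][b]:
                if points[i][2] <= level + tolerance:
                    ground.append(i)
                else:
                    non_ground.append(i)
    return ground, non_ground
```

For example, three returns near `z = -1.8` along one direction plus a return at sensor height in the same cell yield three ground indices and one non-ground index. Finding the lowest point while the grid is filled, and labeling points in the same outward pass, mirrors the single-traversal idea of the text.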

본 발명의 지면 분류 모듈(110)은 채널을 따라 일관성을 체크하는 부가적인 단계를 포함한다. 그런데, 모든 빈이 상대적으로 정확한 로컬 지면 정보를 얻으므로, 이 부가적인 단계가 필수적인 것은 아니다. The ground classification module 110 of the present invention includes the additional step of checking consistency along the channel. However, since all bins obtain relatively accurate local ground information, this additional step is not essential.

The ground classification module 110 proposed in the present invention has been further optimized, reducing the processing time to about half that of the conventional module. The earlier approach to ground classification traversed the data representation several times, once each for building the cylindrical grid, labeling the grid cells, and labeling the LiDAR points, whereas the present invention shortens the processing time through optimization. The main modifications are as follows.

(1) 각 셀의 최저점과 최고점은 포인트 클라우드의 인덱스가 그리드에 분포될 때 발견된다. (1) The lowest and highest points of each cell are found when the indices of the point cloud are distributed on a grid.

(2) 각 채널을 두 번 횡단하는 대신, 빈을 따라 경사 및 로컬 지면 수준의 추정이 단일 횡단에서 수행된다. (2) Instead of traversing each channel twice, estimation of slope and local ground level along the bin is performed in a single traverse.

(3) 포인트 클라우드 인덱스에 대한 반복 수행은 빈에 레이블을 지정하는 단계를 건너뛰고 지면 및 비 지면 포인트에 대한 포인트 클라우드를 형성하는 데 효율적으로 활용된다.(3) Iterating over the point cloud index is efficiently utilized to form point clouds for ground and non-ground points, skipping the step of labeling bins.

In addition to the processing speed of the computing platform and the optimized programming, the parameters of the data representation contribute to the overall processing time. These include the measurement range of the sensor R_range, the number of LiDAR measurements per time step, the LiDAR field of view (FOV), and the resolution of the grid-based representation, expressed as the grid cell area, as given by the following equations.

(1)

(2)

The number of bins and channels determines the area and number of grid cells that must be traversed for the slope test and local ground estimation. A higher resolution requires more processing, while a lower resolution yields a more compact representation. Once the ground is classified, the point cloud of non-ground LiDAR measurements is passed to the clustering module 120.
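As a purely geometric sketch of how the resolution parameters interact, and not necessarily the form used in equations (1) and (2), assume a uniform partition of the field of view (in radians) into N_ch channels and of the range into N_bin bins. The symbols N_ch, N_bin, Δθ, Δr, and A_b are introduced here for illustration only.

```latex
\Delta\theta = \frac{\mathrm{FOV}}{N_{ch}}, \qquad
\Delta r = \frac{R_{range}}{N_{bin}}, \qquad
N_{cells} = N_{ch}\, N_{bin}

A_b = \frac{\Delta\theta}{2}\left[(b\,\Delta r)^2 - ((b-1)\,\Delta r)^2\right]
    = \Delta\theta\,\Delta r^{2}\left(b - \tfrac{1}{2}\right),
\qquad b = 1, \dots, N_{bin}
```

Under this assumption the cell area grows linearly with the bin index b, so cells far from the sensor cover more ground, while the number of cells to traverse grows with the product of the two resolutions.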

Clustering groups entities on the basis of some similarity. Clustering a LiDAR point cloud so that each cluster corresponds to a unique object is difficult because of the sparsity of the data and the lack of texture information. Clustering approaches generally exploit connectivity, centroids, density, distribution, or learned features of the LiDAR measurements. Connectivity-based or hierarchical approaches rely on the proximity of neighboring measurements and are expanded iteratively.

Centroid-based clustering methods, such as K-means, Gaussian mixture models, and fuzzy c-means, require prior knowledge of the number of clusters into which the data is to be divided. Density-based approaches identify high-density regions for clustering, but the density of LiDAR measurements decreases radially with distance from the sensor. Occlusion affects the density further, so density-based clustering cannot be applied effectively to the 3D object detection paradigm.

Distribution-based clustering methods fit a distribution model to potential object clusters, providing more information than density-based methods at the cost of increased complexity. However, the absence of a suitable distribution model and measurements under partial occlusion make proper clustering difficult. Similarly, learning-based approaches train a neural network on a set of optimization functions or criteria for cluster-wise or point-wise classification. These approaches have proven effective for 3D object recognition, but they demand computational resources beyond the constraints of an embedded platform.

In the clustering task, the LiDAR point cloud is clustered using a connectivity-based approach. To reduce complexity, the present invention uses a 3D cylindrical grid instead of point-wise clustering.

도 5는 본 발명의 일 실시예에 따른 클러스터링을 위한 원통형 그리드를 도시한 것이다. Figure 5 shows a cylindrical grid for clustering according to an embodiment of the present invention.

도 5를 참조하면, 2D 그리드보다 3D 그리드의 장점은 신호등 및 다리와 같은 높은 구조물과 관련된 측정을 수용할 수 있다는 것이다. 또한 원통형 그리드는 센서에서 멀리 떨어진 측정의 희소성을 해결할 수 있다. 클러스터링을 위한 포인트 클라우드는 3D 배열로 표현되며, 각 셀에는 해당 포인트 인덱스가 포함된다. Referring to Figure 5, the advantage of a 3D grid over a 2D grid is that it can accommodate measurements involving tall structures such as traffic lights and bridges. Cylindrical grids can also address the sparsity of measurements far from the sensor. The point cloud for clustering is expressed as a 3D array, and each cell contains the corresponding point index.

The 3D array is processed with a 3D connected-component clustering approach that groups neighboring grid cells. The clustering procedure traverses every cell of the 3D array and checks its adjacent neighbors against the minimum number of cells required to form a cluster. The point cloud clusters are filtered by their dimensions: large clusters generally correspond to buildings, while very small clusters are noise, minor obstacles, or over-segmentation. Because the intent is to track objects moving on the ground, clusters elevated above the ground are also filtered out. The remaining clusters are passed to the box fitting operation, which estimates the pose and centroid of each object.

The clustering module 120 of the present invention is optimized so that the processing time is greatly reduced. The conventional clustering module uses a rectangular grid representation of the point cloud, which requires a relatively higher resolution. In addition, to cluster the occupied grid cells, it visited all 26 neighbors of each cell to check occupancy. The main modifications in the clustering module 120 of the present invention are as follows.

(1) 3개의 라이다의 병합된 포인트 클라우드 대신 단일 라이다의 포인트 클라우드를 활용하기 위해 직사각형 그리드가 3D 원통형 그리드로 대체되었다. (1) The rectangular grid was replaced with a 3D cylindrical grid to utilize the point cloud of a single LIDAR instead of the merged point cloud of three LIDARs.

(2) 클러스터링을 위해 그리드 셀의 26개 이웃을 검색하는 대신 6개의 인접 이웃을 탐색한다. (2) Instead of searching a grid cell's 26 neighbors for clustering, it searches its 6 immediate neighbors.
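The 6-neighbor search over occupied grid cells can be sketched as a breadth-first connected-component labeling. This is an illustrative sketch under stated assumptions: cells are given as hypothetical `(channel, bin, layer)` integer triples, the channel axis is assumed to cover a full 360 degrees so that it wraps around, and `min_cells` is an assumed stand-in for the minimum cluster size mentioned in the text.

```python
from collections import deque

def cluster_cells(occupied, n_channels, min_cells=2):
    """Group occupied (channel, bin, layer) cells using 6-connectivity.

    `occupied` is a set of integer triples. Only the angular (channel)
    axis wraps around. Returns a list of clusters, each a sorted list
    of cells; clusters smaller than min_cells are discarded.
    """
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen, clusters = set(), []
    for start in sorted(occupied):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:
            c, b, l = queue.popleft()
            cluster.append((c, b, l))
            for dc, db, dl in steps:  # the six face-adjacent neighbors
                nxt = ((c + dc) % n_channels, b + db, l + dl)
                if nxt in occupied and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if len(cluster) >= min_cells:
            clusters.append(sorted(cluster))
    return clusters
```

Visiting 6 instead of 26 neighbors reduces the lookups per cell by roughly a factor of four, at the price of not joining cells that touch only diagonally, which is why the grid resolution matters for this scheme.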

Unlike the 2D representation of the LiDAR data used for ground classification, the clustering process adopts a 3D, or volumetric, representation. In addition to bins and channels, the vertical range of the LiDAR cloud V_range is divided into layers. The number of LiDAR measurements, however, is reduced to only the non-ground measurements within the FOV and the range R_range. The volume of a grid cell is expressed as follows.

(3)

The clustering method adopted in the present invention requires that the LiDAR cloud indices belonging to a unique object fill adjacent cells of the grid. An optimal resolution is therefore needed when setting G_Vcell: a higher resolution leads to over-segmentation despite the increased computational cost, whereas a low-resolution representation tends to cluster the LiDAR measurements of nearby objects into a single object. The volume G_Vcell therefore has to balance performance against computation time.
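To make the volumetric indexing concrete, the sketch below maps a point to a cell of a cylindrical grid and computes the volume of a cell as an annular sector times the layer height. It is a geometric illustration only and does not claim to reproduce equation (3); the function names and parameters are hypothetical, a full 360-degree field of view is assumed, and points are assumed to satisfy `z >= z_min`.

```python
import math

def cell_index(x, y, z, n_channels, bin_size, layer_size, z_min):
    """Map a point to its (channel, bin, layer) cell in a cylindrical grid."""
    angle = math.atan2(y, x) + math.pi  # shift to the range 0..2*pi
    channel = int(angle / (2 * math.pi) * n_channels) % n_channels
    bin_idx = int(math.hypot(x, y) / bin_size)
    layer = int((z - z_min) / layer_size)
    return channel, bin_idx, layer

def cell_volume(bin_idx, n_channels, bin_size, layer_size):
    """Volume of a cell in radial bin `bin_idx` (0-based)."""
    d_theta = 2 * math.pi / n_channels
    r_in, r_out = bin_idx * bin_size, (bin_idx + 1) * bin_size
    return 0.5 * d_theta * (r_out ** 2 - r_in ** 2) * layer_size
```

Because the cell volume grows with the bin index, distant cells are larger, which is one way a cylindrical grid compensates for the sparser returns far from the sensor.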

The clustering module 120 includes a box fitting operation that strongly affects the overall tracking performance. Even when a tracked object is partially occluded, the recorded dimensions and pose of the tracked object still help recover the correct centroid and pose, which is handled by the track management module 160.

Box fitting of clustered LiDAR point cloud data is an essential but difficult task, because the measurements are always partially occluded by obstacles in the sensor's line of sight. An efficient box fitting technique estimates an accurate object pose and centroid while accounting for partial measurements. Box fitting can be performed with model-based or feature-based approaches. Model-based methods match the raw point cloud to a known geometric model, whereas feature-based approaches use edge features to estimate the pose. In MODT applications, model-based approaches lack generality and have excessive computational requirements, while selecting the features that best describe an object's pose is a difficult task. Neural networks are also trained for the feature selection process, but a change in the sensor setup often requires relabeling the dataset and retraining the network.

In view of the computational constraints, the present invention uses a feature-based method that performs L-shaped point cloud fitting within a minimum rectangular area.

도 6은 본 발명의 일 실시예에 따른 L형 박스 피팅을 도시한 것이다.Figure 6 shows an L-shaped box fitting according to an embodiment of the present invention.

Referring to Figure 6, the indices of the points whose coordinates define the minimum box fit are first searched to identify the corners of the clustered point cloud along the horizontal axes. A line is drawn between the farthest corners, which are selected according to the dimensions and position of the object cluster, and all points of the cluster are then searched to find the point farthest from this line as the third corner. The dimensions of the bounding box and its centroid are updated using the three corners. Finally, the pose of the clustered object is calculated with respect to the updated centroid. Because occlusion affects the accuracy of the pose and centroid estimates, the tracker module 140 maintains a history of each tracked object and heuristically adjusts the object's dimensions and pose over time.

Box fitting and object classification in a LiDAR point cloud must be performed carefully, because objects that block the LiDAR's field of view cause occlusion. Box fitting of the point cloud not only helps find the center of an object so that it can be tracked efficiently, but also provides an initial pose estimate.

본 발명에서 L 형(L-shape) 클라우드 피팅을 갖는 최소 직사각형 영역을 사용하며, 이는 최적화된 연산 및 정확도 고려사항을 만족한다. In the present invention, we use a minimum rectangular area with L-shape cloud fitting, which satisfies optimized computation and accuracy considerations.

도 6은 본 발명의 일 실시예에 따른 박스 피팅 과정을 예시한 것이다. Figure 6 illustrates a box fitting process according to an embodiment of the present invention.

Because objects in LiDAR point cloud measurements are always occluded to some degree, a minimum-area rectangle alone cannot serve the purpose of bounding box fitting and leads to inaccurate pose estimates.

따라서, 도 6에 도시된 바와 같이, 박스의 자세를 수정하기 위하여 최소 영역 직사각형 피팅에 L 형 피팅이 수행된다. Therefore, as shown in Figure 6, L-shaped fitting is performed on the minimum area rectangular fitting to modify the posture of the box.

도 6을 참조하면, 먼저 클러스터링된 포인트 클라우드의 차원이 계산되고, 계산된 차원으로부터 2D 직사각형 피팅의 초기 영역의 길이와 폭이 정해진다. Referring to Figure 6, first, the dimension of the clustered point cloud is calculated, and the length and width of the initial area of the 2D rectangular fitting are determined from the calculated dimension.

Next, the two points of the cluster that are farthest apart are found, and a straight line is drawn between them.

마지막으로, 클러스터링된 클라우드 내에서, 상기 직선으로부터 가장 멀리 떨어져 있는 포인트를 찾는다. 결과적으로 물체의 실제 자세(pose)의 세 코너에 대응하는 세 개의 포인트가 생성된다. Finally, within the clustered cloud, we find the point that is furthest from the straight line. As a result, three points are created corresponding to the three corners of the object's actual pose.

본 발명에서는 초기 직사각형의 모서리와 대각선에서 가장 먼 포인트를 연결한 선 사이의 각도를 계산하여, 직사각형의 회전 또는 물체의 자세를 추정한다. In the present invention, the rotation of the rectangle or the posture of the object is estimated by calculating the angle between the edge of the initial rectangle and the line connecting the point furthest from the diagonal.

그리고, 높이 정보는 3D 경계 상자를 형성하기 위한 포인트 클라우드 차원으로부터 계산된다. Then, height information is calculated from the point cloud dimensions to form a 3D bounding box.

This process generates a box that represents the object in the LiDAR point cloud, and the position of the object is indexed at the center of the box. The dimensions of the box are then used to classify the object as a vehicle or a pedestrian in order to initialize tracking. For visualization, for example, a bounding box that matches the dimension rules can be replaced with a 3D CAD model.

도 6에서 정보의 흐름이 도시되어 있으며, 여기서 점 a와 b는 각각 클러스터의 최대 좌표와 최소 좌표로 식별되는 클러스터의 가장 먼 지점을 나타낸다.The flow of information is depicted in Figure 6, where points a and b represent the farthest points of the cluster, identified as the maximum and minimum coordinates of the cluster, respectively.
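The three-corner procedure can be sketched in 2D as follows. This is a simplified illustration rather than the patented method: the two far corners a and b are approximated here by a two-pass search (farthest from the centroid, then farthest from a), an assumed stand-in for the min/max coordinate heuristic of the text, and the function name and return layout are hypothetical.

```python
import math

def fit_l_shape(points):
    """Return (centre, length, width, yaw) for a 2D cluster of (x, y) points.

    Needs at least two distinct points. The yaw is the direction of the
    longer visible side and is ambiguous by 180 degrees.
    """
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    # Corner a: farthest from the centroid. Corner b: farthest from a.
    a = max(points, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    b = max(points, key=lambda p: math.hypot(p[0] - a[0], p[1] - a[1]))
    # Corner c: the point with the largest perpendicular distance to line a-b.
    abx, aby = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(abx, aby)
    c = max(points,
            key=lambda p: abs(abx * (p[1] - a[1]) - aby * (p[0] - a[0])) / norm)
    # a-c and c-b are the two visible sides of the "L".
    side_ac = (c[0] - a[0], c[1] - a[1])
    side_cb = (b[0] - c[0], b[1] - c[1])
    len_ac, len_cb = math.hypot(*side_ac), math.hypot(*side_cb)
    long_side = side_ac if len_ac >= len_cb else side_cb
    yaw = math.atan2(long_side[1], long_side[0])
    centre = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)  # midpoint of the diagonal
    return centre, max(len_ac, len_cb), min(len_ac, len_cb), yaw
```

For an axis-aligned L of returns along a 4 m side and a 2 m side, the sketch recovers a 4 by 2 box centred on the midpoint of the diagonal with zero yaw. The height of the 3D box would be taken separately from the vertical extent of the cluster.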

박스 피팅 작업에서 클러스터에서 가장 먼 지점을 찾는 것이 전체 프로세스 시간에 기여하며, 최적화된 공정 시간을 얻기 위해 수정된 핵심 요소는 다음과 같다. In box fitting operations, finding the farthest point in the cluster contributes to the overall process time, and the key factors modified to obtain optimized process time are:

(1) 모든 포인트를 순회하는 대신 클러스터의 최소 및 최대 좌표에 해당하는 점을 활용한다. (1) Instead of traversing all points, use the points corresponding to the minimum and maximum coordinates of the cluster.

(2) The farthest points of the cluster are found heuristically, using the dimensions of the cluster and its position relative to the sensor.

Multi-object tracking is an essential component of the perception pipeline of an autonomous vehicle. Object tracking allows the system to make better decisions about the actions to take in complex environments. Existing 2D MOT (Multiple Object Tracking) algorithms focus on tracking objects in the image plane, aiming to maintain unique identities and provide temporally consistent positions of the detected objects. As more 3D MOT methods are proposed, however, objects need to be tracked in 3D space. 3D MOT systems generally share their components with 2D MOT systems and differ mainly in that object detection is performed in 3D space instead of the image plane. This potentially allows motion models, data association, occlusion handling, and trajectory maintenance to be designed directly in 3D space, without perspective distortion.

An autonomous vehicle is a dynamic system that must track the objects in its environment. Tracked objects tend not to follow regular motion patterns, which introduces motion uncertainty. Similarly, cluttered environments and sensor limitations lead to partial or complete occlusion of objects, adding uncertainty to the position and pose of the tracked objects. Bayesian filtering strategies are generally used to track under such uncertainty. When state estimation assumes a Gaussian mixture for the density or a Gaussian distribution for the transition, the assumption leads to the Gaussian mixture Probability Hypothesis Density Filter (PHDF) or the Joint Probabilistic Data Association Filter (JPDAF), respectively. For non-Gaussian assumptions, the Particle Filter (PF) method is used. The state of a tracked object is updated after the association between the track and the detection information has been established.

The present invention assumes a Gaussian distribution to resolve the uncertainty caused by clutter and applies JPDAF for data association. Similarly, the uncertainty caused by motion is handled by the IMM (Interacting Multiple Model), which performs nonlinear prediction of the state of the tracked object. A UKF (Unscented Kalman Filter) is used to accommodate the nonlinearity of the motion models under the Gaussian assumption. The IMM-UKF-JPDAF implementation is an approach that efficiently solves the problem of recursively estimating the state and mode probabilities of a target described by a jump Markov nonlinear system in the presence of clutter.
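The core of a UKF is the unscented transform, which propagates a Gaussian through a nonlinear function using a small set of sigma points. The sketch below shows the basic form of the transform with a single scaling parameter `kappa`. The parameterization, the helper names, and the pure-Python Cholesky factorization are assumptions for illustration; the text does not state which variant or motion models the tracker uses.

```python
import math

def cholesky(a):
    """Lower-triangular factor of a small symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate (mean, cov) through f using 2n+1 sigma points."""
    n = len(mean)
    low = cholesky([[(n + kappa) * c for c in row] for row in cov])
    sigma = [list(mean)]
    for j in range(n):  # column j of the scaled matrix square root
        col = [low[i][j] for i in range(n)]
        sigma.append([m + c for m, c in zip(mean, col)])
        sigma.append([m - c for m, c in zip(mean, col)])
    weights = [kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n)
    ys = [f(s) for s in sigma]
    dim = len(ys[0])
    m_out = [sum(w * y[i] for w, y in zip(weights, ys)) for i in range(dim)]
    cov_out = [[sum(w * (y[i] - m_out[i]) * (y[j] - m_out[j])
                    for w, y in zip(weights, ys))
                for j in range(dim)] for i in range(dim)]
    return m_out, cov_out
```

For a linear function such as a constant-velocity step the transform reproduces the exact Kalman prediction, which makes a convenient sanity check. With a nonlinear model, for instance one that includes a turn rate, the same code gives an approximation without requiring Jacobians. Process noise Q would be added to the returned covariance in a full prediction step.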

추적 가능한 물체는 비선형 확률적 상태 공간 모델로 표현되는 r개의 모션 모델을 따르는 것으로 가정한다.Trackable objects are assumed to follow r motion models expressed as nonlinear stochastic state space models.

본 발명에서 트랙 관리 과정에서는 트랙을 초기화하고 트랙의 유효성을 검사하는 초기화 및 유효성 검사 과정과, 불필요한 트랙을 제거하기 위한 트랙 가지 치기 과정과, 트랙 초기화가 진행된 후에 요, 속도 및 각속도 매개 변수를 업데이트하는 운동학적 매개 변수 업데이트 과정을 수행할 수 있다. In the present invention, the track management process includes an initialization and validation process to initialize the track and check the validity of the track, a track pruning process to remove unnecessary tracks, and an update of the yaw, velocity, and angular velocity parameters after track initialization. A kinematic parameter updating process can be performed.

Tracking objects in urban scenarios using LiDAR must address several problems, including occlusion, dissimilar motion patterns, and clutter. The term occlusion refers to the possibility of incomplete spatial information, caused by the pose of the autonomous vehicle or by objects that obstruct the measurement. As a result, an object may be partially or completely occluded. Partial occlusion can be handled by the detection component of the framework, whereas complete occlusion of a tracked object adds to the uncertainty of its position. In addition, objects in urban scenarios tend to follow dissimilar motion patterns, such as consistent changes in pose and velocity. Moreover, because many objects may be close to one another, irregularity increases, which raises the measurement uncertainty.

The tracking component must resolve two major uncertainties, caused by motion and by clutter. In the present invention, two filters are therefore set up to resolve the uncertainty during tracking. The filter for motion uncertainty incorporates a system with multiple motion models that tracks objects with uncertain motion patterns. The clutter filter uses a probabilistic data association approach to resolve the uncertainty that arises when tracked objects are close together. The two filters can be combined into a single tracking scheme, with the filtering performed at different stages.

In the framework proposed in the present invention, an Interacting Multiple Model (IMM) is deployed to accommodate the different motion patterns of objects, combined with an Unscented Kalman Filter (UKF) to handle the nonlinearity of the models. A Joint Probabilistic Data Association Filter (JPDAF), applied in the data association step of the UKF, is used to resolve the uncertainty due to clutter. In the present invention, the filter that combines the characteristics of both filters is named IMM-UKF-JPDAF.

The IMM-UKF-JPDAF implemented in the present invention is an approach for effectively solving the problem of recursively estimating the state and mode probabilities of a target described by a jump Markov nonlinear system in the presence of clutter.

The nonlinear stochastic state-space models that form the set of r models are expressed by Equations (4) and (5).

(4)

(5)

Here, the model takes an input vector and a measurement vector z_k, f is the system function, and h is the measurement function. In addition, w and v are mutually independent zero-mean Gaussian noise sequences with covariance matrices Q and R, respectively.
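For orientation only, a jump Markov nonlinear system consistent with the definitions above is commonly written in the following form. The symbols x_k (state), u_k (input), and the model index j are assumptions introduced for this sketch, and the exact notation of the original equations may differ.

```latex
\begin{align}
x_k &= f^{j}\left(x_{k-1}, u_{k-1}\right) + w^{j}_{k-1}, \\
z_k &= h^{j}\left(x_k, u_k\right) + v^{j}_{k}, \qquad j = 1, \dots, r,
\end{align}
\text{with } w^{j}_{k} \sim \mathcal{N}\left(0, Q^{j}\right), \quad v^{j}_{k} \sim \mathcal{N}\left(0, R^{j}\right).
```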

The evolution of the system among the r models is regarded as a first-order Markov chain governed by a time-invariant Markovian model transition probability matrix, which can be expressed as follows.

(6)

Here, the element p_ij of the matrix denotes the probability of a mode transition from model i to model j.
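As an illustration of how such a transition matrix is typically used, the following minimal sketch computes the predicted mode probabilities and the mixing probabilities of the interaction step in a standard IMM formulation. The function names, the two-model example matrix, and the prior mode probabilities are assumptions chosen for illustration and are not taken from the specification.

```python
def predict_mode_probabilities(P, mu):
    """Predicted probability of each model j: c_j = sum_i p_ij * mu_i."""
    r = len(mu)
    return [sum(P[i][j] * mu[i] for i in range(r)) for j in range(r)]


def mixing_probabilities(P, mu):
    """Mixing weights mu_(i|j) = p_ij * mu_i / c_j used in the interaction step."""
    r = len(mu)
    c = predict_mode_probabilities(P, mu)
    return [
        [P[i][j] * mu[i] / c[j] if c[j] > 0.0 else 0.0 for j in range(r)]
        for i in range(r)
    ]


# Hypothetical two-model example: rows are the current model i, columns the next model j.
P = [[0.9, 0.1],
     [0.2, 0.8]]
mu = [0.5, 0.5]  # assumed prior mode probabilities

c = predict_mode_probabilities(P, mu)
mix = mixing_probabilities(P, mu)
```

Each row of P sums to one, and each column of the mixing matrix sums to one whenever the corresponding predicted mode probability is nonzero.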

The IMM-UKF-JPDAF tracker proposed in the present invention comprises a five-step process: (a) interaction, (b) state prediction and measurement validation, (c) data association and model-specific filtering, (d) mode probability update, and (e) combination. In contrast to single-target approaches, JPDAF is deployed in the proposed framework to track multiple objects. This requires computing the association probability between each track and each measurement, and considering every possible joint association event over all measurements leads to a combinatorial explosion.

To mitigate the possible combinatorial explosion, a clustering technique is adopted in which the association matrix is clustered into sets of joint association events. The number of clusters equals the sum of the marginal and joint association events. The clustering technique helps contain the combinatorial explosion of hypotheses that naturally grows in complex environments. In addition, the covariance of a track prediction grows over consecutive time steps in which no measurement is associated, which enlarges the gate region for association. A larger gate region results in a greater number of joint association events.
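One common way to realize this kind of clustering is to treat the gated association matrix as a bipartite graph and split it into connected components, so that each component can be solved as an independent sub-problem. The sketch below follows that idea; the binary gating matrix, the function name, and the example values are hypothetical, and the specification does not prescribe this particular procedure.

```python
def cluster_association_matrix(gate):
    """Split a binary track-by-measurement gating matrix into independent clusters.

    gate[t][m] is truthy when measurement m falls inside the gate of track t.
    Tracks that share at least one gated measurement end up in the same cluster.
    Returns a list of (track_indices, measurement_indices) tuples.
    """
    n_tracks = len(gate)
    n_meas = len(gate[0]) if gate else 0
    parent = list(range(n_tracks))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Merge all tracks that gate the same measurement.
    for m in range(n_meas):
        owners = [t for t in range(n_tracks) if gate[t][m]]
        for t in owners[1:]:
            parent[find(t)] = find(owners[0])

    # Collect tracks and their gated measurements per cluster root.
    clusters = {}
    for t in range(n_tracks):
        tracks, meas = clusters.setdefault(find(t), ([], set()))
        tracks.append(t)
        meas.update(m for m in range(n_meas) if gate[t][m])
    return [(tracks, sorted(meas)) for tracks, meas in clusters.values()]


# Hypothetical example: 4 tracks, 4 measurements.
gate = [
    [1, 1, 0, 0],  # track 0 gates measurements 0 and 1
    [0, 1, 0, 0],  # track 1 shares measurement 1 with track 0
    [0, 0, 1, 0],  # track 2 is isolated with measurement 2
    [0, 0, 0, 0],  # track 3 has no gated measurement
]
clusters = cluster_association_matrix(gate)
```

In this example measurement 3 is gated by no track, so it would be a candidate for starting a new track rather than part of any cluster.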

The state, covariance, and mode probabilities of the IMM-UKF-JPDAF are estimated recursively with the aid of the individual model likelihoods. The individual filter states and covariances are combined into a single weighted output using the mode probabilities of the track.
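The mode probability update and the combination step can be sketched with the standard IMM formulas: each predicted mode probability is reweighted by its model likelihood, and the per-model estimates are merged into one mean and covariance. The function names and the numeric values are assumptions for illustration; the per-model states and covariances would in practice come from the UKF of each motion model.

```python
def update_mode_probabilities(likelihoods, predicted):
    """mu_j proportional to likelihood_j * predicted_j, normalized to sum to one."""
    weighted = [l * c for l, c in zip(likelihoods, predicted)]
    total = sum(weighted)
    return [w / total for w in weighted]


def combine_estimates(states, covariances, mu):
    """Merge per-model states and covariances into one weighted estimate.

    x = sum_j mu_j * x_j
    P = sum_j mu_j * (P_j + (x_j - x)(x_j - x)^T)
    """
    n = len(states[0])
    x = [sum(mu[j] * states[j][a] for j in range(len(mu))) for a in range(n)]
    P = [[0.0] * n for _ in range(n)]
    for j, w in enumerate(mu):
        d = [states[j][a] - x[a] for a in range(n)]
        for a in range(n):
            for b in range(n):
                # Spread-of-means term d[a] * d[b] accounts for model disagreement.
                P[a][b] += w * (covariances[j][a][b] + d[a] * d[b])
    return x, P


# Hypothetical two-model example with a 2D state.
mu = update_mode_probabilities([0.2, 0.6], [0.5, 0.5])
identity = [[1.0, 0.0], [0.0, 1.0]]
x, P = combine_estimates([[0.0, 1.0], [2.0, 1.0]], [identity, identity], [0.5, 0.5])
```

The combined covariance is larger than either model covariance along the axis where the two model states disagree, which reflects the remaining uncertainty about the active motion model.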

Figure 7 shows an IMM-UKF-JPDAF tracker according to an embodiment of the present invention.

Figure 7 shows, within the flow of the tracker module 140, the clustered association matrix together with the three association clusters that are formed as sub-problems for the tracker.

The tracked objects are also shown, with their tracking parameters displayed together with the class and the confidence percentage.

The execution time of the tracker module 140 depends mainly on the number of tracks maintained. An implementation without visual classification prunes tracks on the basis of inconsistent measurements and therefore has a large number of tracks to maintain. Processing time can be shortened by reducing the number of tracks through the additional condition that a tracked object be classified by the visual object detector 150. A key factor in optimizing the processing time is an efficient track management module 160 that keeps false-positive measurements from being tracked.

In the object classification task, the multimodal fusion paradigm can be regarded as late fusion, in which the tracked point cloud clusters are classified. Classification of a tracked point cloud cluster involves two components: class association and class management.

Class association involves the process of assigning the class of a visually detected object to a tracked point cloud cluster.

Class management uses the assignment history to maintain and select the class of a tracked object.

In the object classification task, the input image is resized to 416×416 pixels, which keeps the processing time below 100 milliseconds on an embedded system. However, reducing the input image resolution causes missed detections due to object size, saturation, and exposure issues in the image. The tracker module 140 handles missed detections with a class vector that probabilistically assigns a class to each object. In addition, the LiDAR range is limited to 60 to 80 m. Beyond this range it becomes difficult to estimate the size and pose of an object accurately from the point cloud, and a visual detector that recognizes objects beyond this range only adds complexity to the class association process.

Figure 8 shows a visual object detector that classifies tracked objects using class vectors according to an embodiment of the present invention.

Figure 8 shows the flow of the object classification thread, which runs in parallel with the LiDAR-based detection and tracking thread. The class vector of a mature track is shown with its age and number of class associations, which result in a class certainty of 66.6%. The red and blue points in the image denote the projected centroids of the clusters and the centers of the visually detected objects, respectively.

A tracked object maintains a class vector in which each dimension is registered to a visually detectable class. After a successful association in the fusion step, the corresponding class dimension of the vector is incremented by one. The dimension of the class vector with the maximum count designates the object class, while the certainty is computed relative to the age of the track.
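The class vector described above can be sketched as a per-track counter. The class names, the `ClassVector` type, and the choice of certainty as the winning count divided by the track age are assumptions for illustration; the last one is chosen because it reproduces the 66.6% figure for two associations over three frames.

```python
class ClassVector:
    """Per-track counter with one dimension per visually detectable class."""

    def __init__(self, classes):
        self.classes = list(classes)
        self.counts = [0] * len(self.classes)
        self.age = 0  # track age in frames

    def step(self, detected_class=None):
        """Advance one frame; increment a class dimension on a successful association."""
        self.age += 1
        if detected_class is not None:
            self.counts[self.classes.index(detected_class)] += 1

    def label(self):
        """Return (class, certainty); certainty is the winning count over the track age."""
        best = max(range(len(self.counts)), key=self.counts.__getitem__)
        if self.age == 0 or self.counts[best] == 0:
            return None, 0.0
        return self.classes[best], self.counts[best] / self.age


# Hypothetical track: associated with "car" in two of three frames.
vec = ClassVector(["car", "pedestrian", "cyclist"])
vec.step("car")
vec.step(None)   # missed visual detection in this frame
vec.step("car")
name, certainty = vec.label()
```

Because the counter persists across frames, a single missed visual detection lowers the certainty but does not remove the class from the track.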

MODT tasks, which involve uncertainties in object classification, clutter, and motion, require a robust track management module 160 to maintain and provide reliable information. The main purposes of the track management module 160 are to initialize and maintain track statistics, to handle occlusion of tracked objects, and to prune tracks associated with false-positive measurements.

The track management module 160 starts a new track with a unique ID for each unassociated measurement and records the track age in number of frames. The dimensions and pose of the tracked object are also maintained while taking the LiDAR characteristics into account. Measurements of an object moving away from the sensor tend to suffer increased occlusion and reduced dimensions, whereas an object approaching the sensor receives more exposure and yields relatively more accurate dimensions. A similar pattern is observed in the estimated pose of the object, and abrupt changes in the yaw angle are smoothed. The accuracy of the maintained dimensions and pose aids occlusion handling and centroid correction. Using the sensor characteristics and the maintained information, the center C of the tracked object is corrected using the changes in length ΔL, width ΔW, height ΔH, and yaw ψ as follows.

(7)

(8)

(9)

Figure 9 shows a track management module with pose, dimension, and centroid correctors according to an embodiment of the present invention. In Figure 9, the operations associated with the track management module 160 are shown together with an example of centroid correction.

Referring to Figure 9, the mature track of an object maintains the width W and the length L, which are compared with the measured dimensions W_m and L_m to find the differences ΔW and ΔL. The change in dimensions is used together with the position of the object relative to the sensor to obtain the corrected center C'. The mature track of the object is shown with the measured dimensions as a wireframe bounding box, alongside a red box with the corrected dimensions and center. An initialized track requires measurement associations for five consecutive time steps to be considered a mature track. A track is removed if it receives no classification from the visual detector within the maturity period and misses a measurement association. In addition, any mature track whose ratio Pⁱ_o exceeds 60% is filtered out as an outlier. Furthermore, when mature tracks share a common measurement for five consecutive time steps, the inconsistent or younger track is pruned.
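The maturity and pruning rules in this paragraph can be sketched as a small per-track state update. The `Track` fields, the function name, and the treatment of the ratio Pⁱ_o as an externally supplied `outlier_ratio` are assumptions for illustration; the pruning of tracks that share a measurement is omitted for brevity.

```python
from dataclasses import dataclass

MATURITY_STEPS = 5    # consecutive associations needed to become mature
OUTLIER_LIMIT = 0.60  # mature tracks above this ratio are filtered out


@dataclass
class Track:
    track_id: int
    age: int = 0               # track age in frames
    consecutive_hits: int = 0  # current run of measurement associations
    classified: bool = False   # has the visual detector ever classified it
    mature: bool = False


def update_track(track, associated, classified_now=False, outlier_ratio=0.0):
    """Advance a track by one time step; return False when it should be pruned."""
    track.age += 1
    track.classified = track.classified or classified_now
    track.consecutive_hits = track.consecutive_hits + 1 if associated else 0

    if not track.mature:
        # Within the maturity period: a miss without any visual class removes the track.
        if not associated and not track.classified:
            return False
        if track.consecutive_hits >= MATURITY_STEPS:
            track.mature = True
    elif outlier_ratio > OUTLIER_LIMIT:
        # Mature track flagged as an outlier.
        return False
    return True


# Hypothetical usage: a track that is associated in five consecutive frames matures.
steady = Track(track_id=1)
alive = [update_track(steady, associated=True) for _ in range(MATURITY_STEPS)]

# A young, unclassified track that misses a measurement is pruned.
flaky = Track(track_id=2)
update_track(flaky, associated=True)
flaky_alive = update_track(flaky, associated=False)
```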

As shown in Figure 9, the trajectory of the tracked object, including its position, pose, and time, is stored and used to compute the relative heading, speed, and angular velocity of the tracked object at each time step. The vector Aⁱ, which provides the class certainty Pⁱ_c of the tracked object, is also updated.
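A minimal sketch of deriving these quantities from two consecutive trajectory samples is given below, using finite differences in the sensor frame. The tuple layout `(t, x, y, yaw)`, the function name, and the sample values are assumptions for illustration.

```python
import math


def kinematics(prev, curr):
    """Finite-difference heading, speed, and yaw rate between two trajectory samples.

    Each sample is a tuple (t, x, y, yaw) with time in seconds, position in
    meters, and yaw in radians.
    """
    dt = curr[0] - prev[0]
    if dt <= 0.0:
        raise ValueError("trajectory samples must be strictly increasing in time")
    dx, dy = curr[1] - prev[1], curr[2] - prev[2]
    heading = math.atan2(dy, dx)      # direction of travel
    speed = math.hypot(dx, dy) / dt   # meters per second
    # Wrap the yaw difference to [-pi, pi] so a crossing of +/-pi is not a jump.
    dyaw = curr[3] - prev[3]
    dyaw = math.atan2(math.sin(dyaw), math.cos(dyaw))
    return heading, speed, dyaw / dt


# Hypothetical samples taken 0.1 s apart (a 10 Hz LiDAR is assumed).
heading, speed, yaw_rate = kinematics((0.0, 0.0, 0.0, 0.0), (0.1, 1.0, 0.0, 0.05))
```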

The present invention proposes an efficient MODT framework for embedded systems operating in a visual-LiDAR setup. The MODT framework exploits spatial LiDAR data and 2D scene understanding by performing late fusion of the temporal modalities. MODT can also be used to aid visual odometry in dense environmental conditions.

Although the present invention has been described above using several preferred embodiments, these embodiments are illustrative and not limiting. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications can be made without departing from the spirit of the present invention and the scope of rights set forth in the appended claims.

110 Ground classification module 120 Clustering module
130 Box fitting module 140 Tracker module
150 Object detector 160 Track management module

Claims

A system for real-time 3D object recognition and tracking, comprising:
a ground classification module for dividing a point cloud generated by a LiDAR, which is a first sensor for recognizing objects, into a ground point cloud and a non-ground point cloud;
a clustering module for performing a clustering task on the point cloud classified by the ground classification module by grouping and labeling points that belong to one object;
a box fitting module for generating a 3D bounding box representing an object for the points clustered by the clustering module and classifying the object using the generated box area;
a tracker module for assigning the objects classified by the box fitting module to tracks and tracking them by predicting their motion patterns;
an object detector for recognizing objects in an image captured by a camera, which is a second sensor for recognizing objects; and
a track management module for fusing the objects tracked by the tracker module with the objects recognized by the object detector, cancelling false tracks, and maintaining and managing the tracks,
wherein the ground classification module distributes the indices of the point cloud over a cylindrical polar grid and classifies the ground point cloud and the non-ground point cloud in the cylindrical polar grid with reference to a ground level equal to the height at which the LiDAR is located,
wherein the clustering module distributes the non-ground point cloud over a 3D cylindrical grid, searches for associated points in the neighboring cells around a selected index cell of the 3D cylindrical grid, and performs clustering by marking neighboring cells that contain associated points as cluster members,
wherein the box fitting module corrects the pose of the 3D bounding box using a minimum rectangular area with L-shaped cloud fitting and classifies the object using the dimensions of the corrected 3D bounding box,
wherein the track management module performs an initialization and validation process of initializing tracks and checking their validity, a pruning process of removing unnecessary tracks, and a process of updating the yaw, velocity, and angular velocity parameters after track initialization,
wherein the cylindrical polar grid is composed of channels, each channel being a vertical slice from the origin of the cylindrical polar grid to the farthest value and being further subdivided into bins that contain the possible number of points of the corresponding region; all bins of the polar grid are traversed, starting from the vehicle, to track the local ground level, which is the ground level equal to the LiDAR sensor height; the indices of the point cloud measurements belonging to a section of the vertically sliced cylinder are represented in each cell of the array by channel and bin; and each channel is traversed independently, outward from the vehicle, to estimate the ground level,
wherein the ground classification module takes the sensor height from the ground as the initial ground height in each cell of the polar grid and computes the slope to the lowest measurement of the successive cell; a slope exceeding a threshold is associated with a cell containing non-ground measurements, for which the previous ground level is maintained; the ground level of the subsequent cell is updated according to slopes within the threshold limit; and, once the ground level has been obtained for all cells of the grid, the point cloud is separated with a tolerance parameter to remove edge noise,
wherein, in the ground classification module, the lowest and highest points of each cell are found while the indices of the point cloud are distributed over the grid; the slope and the local ground level along the bins are estimated in a single traversal instead of traversing each channel twice; and the process is optimized in that the iteration over the point cloud indices is used to form the ground and non-ground point clouds, skipping the step of labeling the bins,
wherein, in correcting the pose of the 3D bounding box using the minimum rectangular area with L-shaped cloud fitting and classifying the object using the dimensions of the corrected 3D bounding box, the box fitting module searches for the indices of the points whose coordinates define the minimum box fit in order to identify the corners of the clustered point cloud in the horizontal axes, forms a line using the farthest corners according to the dimensions and position of the object cluster, searches all points of the cluster to find the point farthest from the line as the third corner, updates the dimensions of the bounding box and the center using the three corners, and computes the pose of the clustered object with respect to the updated center, and
wherein the track management module starts a new track with a unique ID for each unassociated measurement, records the track age in number of frames, maintains the dimensions and pose of the tracked object while taking the LiDAR characteristics into account, and corrects the center C(C_x, C_y, C_z) of the tracked object using the sensor characteristics and the maintained information, such that, where ΔL is the change in length, ΔW is the change in width, ΔH is the change in height, and ψ is the change in yaw, the corrected center coordinates can be expressed by a mathematical equation in terms of these changes.

(Deleted)

The system of claim 1,
wherein the tracker module estimates the motion state of the object using an IMM-UKF-JPDAF comprising an Interacting Multiple Model (IMM) for capturing different motion patterns, an Unscented Kalman Filter (UKF) for handling the nonlinearity of the motion models, and a Joint Probabilistic Data Association Filter (JPDAF) for associating measurement data with the object in the presence of clutter.

(Deleted)