KR101896941B1

KR101896941B1 - Stereo matching apparatus and its method

Info

Publication number: KR101896941B1
Application number: KR1020120068236A
Authority: KR
Inventors: 박정아; 이지하
Original assignee: 엘지이노텍 주식회사
Priority date: 2012-06-25
Filing date: 2012-06-25
Publication date: 2018-09-12
Also published as: KR20140000833A

Abstract

A stereo matching apparatus according to an embodiment of the invention includes: a stereo preprocessing unit that preprocesses a stereo image input through a stereo camera; a stereo matching unit that calculates a matching cost and a non-matching cost at matching nodes and non-matching nodes for the preprocessed stereo image and determines the disparity corresponding to the detected optimal path; and a stereo post-processing unit that is connected to the output of the stereo matching unit and contains a time-axis filtering unit that outputs the cumulative average pixel value of the current frame using the correlation between the current frame and the previous frame.

Description

STEREO MATCHING APPARATUS AND ITS METHOD

The present invention relates to stereo matching, and more particularly to a stereo matching apparatus and method that generate a depth map using a time-domain filter.

Human vision is one of the senses used to acquire information about the surrounding environment; through the two eyes, a person can perceive the position of objects and how near or far they are. In other words, the visual information entering through the two eyes is fused into a single piece of distance information, which allows a person to move about freely.

Robots that can stand in for humans have been developed by implementing this visual structure in machines. A robot's vision system consists of a stereo camera (that is, a left camera and a right camera); it reads images from the two cameras and reconstructs them into a single set of three-dimensional information.

Here, a single image is a projection of three-dimensional space onto a two-dimensional space. The three-dimensional distance information (depth) is lost in this process, so it is very difficult to restore the three-dimensional space directly from one image. However, when two or more images taken from different positions are available, the three-dimensional space can be restored.

That is, when a point in real space is imaged in two images, its position in real space can be found by locating the corresponding points in the two images and using the geometric structure.

However, finding the corresponding points of two images (hereinafter "stereo matching") is a very difficult task, and it is the most important technique needed for estimating and restoring three-dimensional space. Stereo matching can yield many forms of result, and because there are so many candidates, finding only the one that represents the actual space is a very difficult problem.

In general, stereo matching techniques are based on the Markov random field (MRF) model. This two-dimensional field model turns the modeling of complex objects into a simple probabilistic model with local relationships, but it has the drawbacks that the computation is complex and expensive and that the boundaries in the result become blurred.

Meanwhile, stereo matching techniques aimed at high-speed processing include dynamic programming on a trellis structure, which can perform fairly fast and accurate stereo matching. However, because this dynamic programming technique performs matching by searching within a single scan line only, it does not take the results of the lines above and below into account, and a large amount of stripe noise occurs.

That is, the stereo matching of every line is performed independently, so the result for the current row can differ from the results for the rows above and below it.

To reduce such noise, median filters, mean filters, and the like are used in image processing. However, because they are applied frame by frame, they do not consider the relationship with previous frames; the noise changes from frame to frame, which makes it difficult to extract a stable depth image.

To address the above problem, the present invention applies a time-axis filter using a pixel-to-multiple-pixels relationship between frames rather than a pixel-to-pixel relationship.

A further object is to reduce noise by applying a median filter in the vertical direction to each frame before the time-axis filter is applied.

A stereo matching method according to an embodiment of the invention includes: preprocessing a stereo image input through a stereo camera; calculating costs at matching nodes and non-matching nodes through disparity for the preprocessed stereo image, searching for an optimal path, and generating a corresponding first depth image; and extracting, in a time-axis filtering unit, a second depth image from which noise has been removed, using the correlation between the previous frame and the current frame in the depth image.

According to an embodiment of the invention, a disparity map (three-dimensional information) improved over the prior art can be obtained in three-dimensional space reconstruction through stereo matching.

FIG. 1 is a block diagram of a stereo matching apparatus according to an embodiment of the invention.
FIG. 2 is a diagram illustrating an epipolar line for estimating three-dimensional information according to an embodiment of the invention.
FIG. 3 is a diagram for explaining the relationship between disparity and three-dimensional information according to an embodiment of the invention.
FIG. 4 is a diagram illustrating matching nodes and non-matching nodes of a trellis structure according to an embodiment of the invention.
FIG. 5 is a block diagram of a stereo matching apparatus according to another embodiment of the invention.
FIG. 6 is a diagram illustrating vertical median filtering according to another embodiment of the invention.
FIG. 7 is a flowchart illustrating a process of performing stereo matching according to an embodiment of the invention.

Hereinafter, preferred embodiments of the invention are described in detail with reference to the accompanying drawings. Specific details of other embodiments are included in the detailed description and the drawings. The advantages and features of the invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings.

FIG. 1 is a block diagram of a stereo matching apparatus according to an embodiment of the invention. The stereo matching apparatus (100) includes a first camera (210), a second camera (220), a stereo preprocessing unit (300), a stereo matching unit (400), and a time-axis filtering unit (500).

The first camera (210) and the second camera (220) are the left camera and the right camera that photograph a subject using a module or the like.

The captured image (or video) is delivered through a lens to a CMOS module or a CCD module, which converts the optical signal of the subject that has passed through the lens into an electrical signal (capture signal). After the camera's exposure, gamma, gain adjustment, white balance, color matrix, and similar processing are performed, an analog-to-digital converter (ADC) converts the capture signal into a digital signal, and the left image and the right image are each delivered to the stereo preprocessing unit (300).

The stereo preprocessing unit (300) rectifies the stereo image using rectification parameter values and outputs it with the epipolar lines of the images aligned. Using the stereo camera's rectification parameters for the left and right images delivered from the first camera (210) and the second camera (220), it rectifies the images so that the epipolar lines become horizontal, and it corrects the difference in characteristics between the left and right cameras and the brightness difference between the left and right images.

In a typical stereo image pair, a point in one image corresponds to a line in the other image, and the epipolar line is this corresponding line. To align the epipolar lines, rectification parameters are extracted using a calibration pattern placed in front of the stereo camera, parallel to the baseline vector of the left and right cameras. These parameters are stored, and during initialization they are used to make the epipolar lines horizontal.

The stereo matching unit (400) searches for corresponding points through disparity in a pyramid-shaped trellis structure using a local matching technique. This is described in detail below.

FIG. 2 illustrates an epipolar line for estimating three-dimensional information according to an embodiment of the invention, and FIG. 3 explains the relationship between disparity and three-dimensional information according to an embodiment of the invention.

Stereo matching reconstructs three-dimensional space from a two-dimensional left image and right image, that is, from a stereo image: corresponding points are found in the two two-dimensional images, and three-dimensional information is estimated from the geometric relationship between them. For example, as shown in FIG. 2, the point corresponding to a point (P) in one image (Image1) must be found in the other image (Image2). This point (for example, P', P1', or P2') lies on the epipolar line in the corresponding image (Image2) for the point (P) in the reference image (Image1). Once the epipolar lines have been rectified, stereo matching can be performed by examining only two horizontal single scan lines.

FIG. 3 shows the relationship between the left and right images obtained from the stereo camera and the object imaged in them. The disparity (d) for each pair of corresponding pixels, as corresponding points on a single line, can be expressed as in Equation 1 below.

Here, d is the disparity, x^r is the x-axis length in the right image, and x^l is the x-axis length in the left image. When a point (x, y, z) is captured in the left image and the right image, the relationships between the parameters given in Equation 2 below follow from the geometric structure.

Here, f is the focal length, B (base length) is the distance between the two cameras, and Z is the three-dimensional distance. Applying Equation 2 to Equation 1 gives Equation 3 below.
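The relationships described above can be sketched in the standard form for a rectified pinhole stereo pair. This is a hedged reconstruction from the symbol definitions in the text (d, x^l, x^r, f, B, Z), not a transcription of the equations themselves. It assumes the origin lies midway between the two cameras, that X denotes the lateral coordinate of the scene point, and that disparity is taken as left minus right; the sign convention in particular is an assumption.

```latex
\begin{align}
  d   &= x^{l} - x^{r}
      && \text{(disparity on a single scan line; sign convention assumed)} \\
  x^{l} &= \frac{f\,\left(X + \tfrac{B}{2}\right)}{Z}, \qquad
  x^{r} = \frac{f\,\left(X - \tfrac{B}{2}\right)}{Z}
      && \text{(projection into each image; origin midway assumed)} \\
  d   &= \frac{f\,B}{Z}
      \quad\Longleftrightarrow\quad
  Z = \frac{f\,B}{d}
      && \text{(substituting the projections into the disparity)}
\end{align}
```

Under these assumptions the lateral coordinate X cancels, so depth depends only on the focal length, the baseline, and the disparity.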

Therefore, if the focal length (f) and the distance between the two cameras (B) are known and the corresponding points of the two images can be found, the three-dimensional information of the object, that is, its depth, can be estimated.
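As a minimal sketch of this depth estimate, the following function applies the usual rectified-stereo relation Z = f·B / d. The function name, the units (focal length and disparity in pixels, baseline in meters), and the sample numbers are illustrative assumptions, not values from the source.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return depth Z = f * B / d in meters, or None if d is not positive.

    Assumes rectified images, with the focal length and the disparity
    expressed in pixels and the baseline in meters.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative
        # values indicate a mismatch under this sign convention.
        return None
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: f = 700 px, B = 0.12 m, measured disparity = 42 px.
print(depth_from_disparity(42, 700.0, 0.12))
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant objects than for near ones, which is one reason noise in the disparity map is worth filtering.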

FIG. 4 illustrates matching nodes and non-matching nodes of a trellis structure according to an embodiment of the invention. It shows the matching nodes and occlusion (non-matching) nodes in a pyramid-shaped trellis: black dots are matching nodes and white dots are non-matching nodes. Many matching nodes can arise along the projection lines from the two images, and a correct match may occur at any of them. This does not mean, however, that matching can occur at every node regardless of the shape of the overall disparity map.

Stereo matching on this trellis is based on maximum a posteriori (MAP) estimation, which finds the disparity values that minimize an energy function. To this end, the optimal path is searched using the Viterbi algorithm, which is based on dynamic programming (DP). That is, paths are defined over the matching nodes and non-matching nodes, the costs assigned along each path are summed, and the path with the minimum total cost is detected. Through this process, corresponding points can be found through disparity in the pyramid-shaped trellis using a local matching technique.
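The minimum-cost path search can be illustrated with a generic scanline dynamic program. This is not the trellis formulation of the patent: it is a simplified textbook-style alignment in which a match costs the absolute intensity difference and skipping a pixel in either row (an occlusion) costs a fixed penalty. The function name, the cost function, and the `occlusion_cost` value are all assumptions made for illustration.

```python
def scanline_disparity(left, right, occlusion_cost=10.0):
    """Align one rectified scan line pair by dynamic programming.

    Returns one disparity (left index minus right index) per left pixel,
    or None where the left pixel is treated as occluded.
    """
    n, m = len(left), len(right)
    inf = float("inf")
    # cost[i][j]: minimum cost of aligning left[:i] with right[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # matching step
                c = cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1])
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "match"
            if i > 0:  # left pixel has no partner
                c = cost[i - 1][j] + occlusion_cost
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "skip_left"
            if j > 0:  # right pixel has no partner
                c = cost[i][j - 1] + occlusion_cost
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "skip_right"
    # Backtrack along the minimum-cost path to read off the disparities.
    disparity = [None] * n
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            disparity[i - 1] = (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif step == "skip_left":
            i -= 1
        else:
            j -= 1
    return disparity


# Hypothetical rows: the left row shows the right row shifted by two pixels.
right_row = [10, 20, 30, 40, 50, 60, 70, 80]
left_row = [1, 2, 10, 20, 30, 40, 50, 60]
print(scanline_disparity(left_row, right_row))
```

In this example the search recovers a disparity of 2 for the six pixels visible in both rows and marks the first two left pixels as occluded. Each row is solved independently, which is exactly why the stripe noise discussed in the background arises and why the later filtering stages are useful.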

The time-axis filtering unit (500) performs filtering using the correlation with previous frames.

Filters commonly used in image processing, such as mean filters or median filters, can reduce noise, but they cannot remove noise that occurs in regions whose values do not change over time. To remove noise while properly reflecting both static and dynamic changes, a time-domain filter that uses temporal correlation is needed. One example is the one-pole infinite impulse response (IIR) filter, a type of digital filter in which the input value and the output value are applied recursively (with feedback).

Applying a time-axis filter to image processing yields the cumulative average of the pixel values of all previous frames, which can be expressed by the following equation.

Here, X_t[x, y] is the input value at time t (the pixel value of the current frame), Y_{t-1}[x, y] is the accumulated result up to time t-1 (the cumulative average pixel value up to the previous frame), Y_t[x, y] is the final filtered result at time t (the cumulative average pixel value of the current frame), and a sets how long the influence of previous frames is retained. The value of a can range from 0 to 1. That is, the weights applied to the pixel value of the current frame and to the cumulative average pixel value up to the previous frame sum to 1.
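A form of the filter consistent with these definitions is the usual one-pole IIR recursion. This is a hedged reconstruction: the text states only that the two weights sum to 1 and that a controls how long earlier frames persist, so assigning a to the accumulated term and (1 - a) to the current frame is an assumption.

```latex
\begin{equation}
  Y_{t}[x,y] \;=\; a \, Y_{t-1}[x,y] \;+\; (1 - a)\, X_{t}[x,y],
  \qquad 0 \le a \le 1
\end{equation}
```

With this assignment, a close to 1 keeps a long memory of earlier frames, while a close to 0 makes the output follow the current frame almost directly.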

Accordingly, the cumulative average pixel value of the current frame can be expressed as the weighted sum of the pixel value of the current frame and the cumulative average pixel value up to the previous frame.

The value of a also determines whether more weight is given to recent values or to earlier values.
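The recursive update can be sketched per pixel as follows. This is a minimal illustration using plain nested lists; the function name, the default value of `a`, and the choice to give `a` to the accumulated history and `1 - a` to the current frame are assumptions rather than details taken from the source.

```python
def temporal_filter(current, accumulated, a=0.8):
    """One step of a one-pole IIR filter applied to every pixel.

    current:     2-D list of pixel values X_t for the current frame
    accumulated: 2-D list holding Y_{t-1}, or None for the first frame
    a:           weight in [0, 1] given to the accumulated history
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if accumulated is None:
        # No history yet: the first frame initializes the accumulator.
        return [[float(x) for x in row] for row in current]
    return [
        [a * y_prev + (1.0 - a) * x for x, y_prev in zip(row_x, row_y)]
        for row_x, row_y in zip(current, accumulated)
    ]


# Hypothetical 1x2 depth frames in which the second pixel flickers.
frames = [[[10, 20]], [[10, 40]], [[10, 20]]]
state = None
for frame in frames:
    state = temporal_filter(frame, state, a=0.5)
print(state)
```

Only the previous output has to be stored, so the memory cost is one extra frame regardless of how many frames contribute to the average.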

The depth image (600) can be represented based on the cumulative average pixel value of the current frame obtained by the time-axis filter.

FIG. 5 is a block diagram of a stereo matching apparatus according to another embodiment of the invention.

Referring to FIG. 5, a stereo matching apparatus (100A) according to another embodiment of the invention may include a vertical median filtering unit (700) between the stereo matching unit (400) and the time-axis filtering unit (500).

The time-axis filtering unit (500) computes the relationship between frames pixel by pixel. A median filter can be applied to add a smoothing effect that uses several pixels; instead of the commonly used horizontal median filter, a vertical median filter is used.

With the median filtering unit (700), the values X_t[x, y] and Y_{t-1}[x, y] that enter the time-axis filter of Equation 4 change. X_t[x, y] becomes each pixel value of the result of passing the current frame through the vertical median filter, and Y_{t-1}[x, y] becomes each pixel value of the result accumulated through the median filter up to the previous frame.

FIG. 6 illustrates vertical median filtering according to another embodiment of the invention. Referring to FIG. 6, when the pixels of a 3×3 block have the values 5, 3, 4, 6, 10, 5, 3, 4, and 5 as shown, listing the values in order of magnitude gives 3, 3, 4, 4, 5, 5, 5, 6, 10, and vertical median filtering takes the middle value, 5, as the median of the block. This becomes X_t[x, y] in Equation 4, each pixel value of the result of passing the current frame through the vertical median filter.
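A column-wise median can be sketched as below. This is a simplified illustration rather than the 3×3 block of FIG. 6: it takes the median of a vertical window of `window` pixels in each column and replicates the top and bottom rows at the borders. The function name, the window shape, the border handling, and the sample image are assumptions.

```python
from statistics import median


def vertical_median(image, window=3):
    """Apply a 1-D median filter along each column of a 2-D list."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        # Clamp row indices so that border rows are replicated.
        taps = [min(max(r, 0), rows - 1) for r in range(y - half, y + half + 1)]
        for x in range(cols):
            out[y][x] = median(image[r][x] for r in taps)
    return out


# Hypothetical depth patch with a one-pixel-high horizontal streak (row 2).
patch = [
    [5, 5, 5],
    [5, 5, 5],
    [9, 9, 9],
    [5, 5, 5],
    [5, 5, 5],
]
print(vertical_median(patch))
```

A horizontal median over the streak row would see only streak pixels and leave it untouched, whereas the vertical window straddles the streak and replaces it with the surrounding values, which is the motivation for filtering along the vertical axis.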

FIG. 7 is a flowchart illustrating a process of performing stereo matching according to another embodiment of the invention.

In the stereo preprocessing unit, camera calibration is performed first (S100). This is the process of finding the actual parameters from images and is required for precise 2D and 3D image processing.

This yields the intrinsic parameters and the extrinsic parameters, which are the data required for rectification.

Next, the epipolar lines of the left and right images are aligned horizontally by rectification (S200). That is, identical pixel correspondence points are found in the left image and the right image.

The stereo matching unit then calculates the cost (S300), using the pixel values on the same line of the left and right images.

Next, the cost calculated in the preceding step is compared with gamma, an arbitrary value, to determine whether each node is matched, and the path is determined (S400). In the following step, the optimal path is found using the cost and the determined path, and a depth image is generated (S500).

The stereo post-processing unit then applies a median filter in the vertical direction, in consideration of the characteristics of horizontally occurring streak noise (S600), and in the time-axis filtering step extracts a depth image with reduced noise using the correlation between the current frame and the previous frame (S700). The depth map extracted by this process is used as input for recognizing the user's motion, posture, and the like (S800).

The features, structures, effects, and the like described in the embodiments above are included in at least one embodiment of the invention and are not necessarily limited to a single embodiment. Furthermore, the features, structures, effects, and the like illustrated in each embodiment can be combined or modified for other embodiments by a person of ordinary skill in the art to which the embodiments belong. Accordingly, matters relating to such combinations and modifications should be construed as falling within the scope of the invention.

Claims

A stereo matching apparatus comprising:
a stereo preprocessing unit that preprocesses a stereo image input through a stereo camera;
a stereo matching unit that calculates a matching cost and a non-matching cost at matching nodes and non-matching nodes for the preprocessed stereo image and determines the disparity corresponding to the detected optimal path; and
a stereo post-processing unit connected to the output of the stereo matching unit and containing a time-axis filtering unit that outputs the cumulative average pixel value of the current frame using the correlation between the current frame and the previous frame,
wherein the stereo post-processing unit contains a vertical median filtering unit that performs filtering in the vertical direction,
wherein the time-axis filtering unit is connected to the output of the vertical median filtering unit and determines its output value by taking as inputs each pixel value of the result of passing the current frame through the vertical median filtering unit and each pixel value of the result accumulated through the vertical median filtering unit up to the previous frame, and
wherein the time-axis filtering unit outputs the sum of each pixel value of the result of passing the current frame through the vertical median filtering unit and each pixel value of the result accumulated through the vertical median filtering unit up to the previous frame.

(Deleted)

A stereo matching method comprising:
preprocessing a stereo image input through a stereo camera;
calculating costs at matching nodes and non-matching nodes through disparity for the preprocessed stereo image, searching for an optimal path, and generating a corresponding first depth image; and
extracting, in a time-axis filtering unit, a second depth image from which noise has been removed, using the correlation between the previous frame and the current frame in the depth image,
wherein a vertical median filter is applied to the first depth image and its output value is input to the time-axis filtering unit,
wherein the input values of the time-axis filtering unit are each pixel value of the result of passing the current frame through the vertical median filter and each pixel value of the result accumulated through the vertical median filter up to the previous frame, and
wherein the time-axis filtering unit outputs the sum of each pixel value of the result of passing the current frame through the vertical median filter and each pixel value of the result accumulated through the vertical median filter up to the previous frame.

(Deleted)