KR102107334B1

KR102107334B1 - Method, device and system for determining whether pixel positions in an image frame belong to a background or a foreground

Info

Publication number: KR102107334B1
Application number: KR1020190056929A
Authority: KR
Inventors: 구룬드스트롬 야콥; 발트센 조아킴; 몰린 사이몬; 브졸빈스도티르 한나
Original assignee: 엑시스 에이비
Priority date: 2018-06-14
Filing date: 2019-05-15
Publication date: 2020-05-06
Also published as: EP3582181B1; EP3582181A1; KR20190141577A; TW202018666A; TWI726321B; JP2020024675A; CN110610507A; US10726561B2; JP6767541B2; US20190385312A1; CN110610507B

Abstract

The present invention relates to the field of background subtraction in images. In particular, it relates to determining whether pixel positions in an image frame of a video sequence belong to the background or the foreground of a captured scene, using a determined level of dynamics of each pixel position.

Description

METHOD, DEVICE AND SYSTEM FOR DETERMINING WHETHER PIXEL POSITIONS IN AN IMAGE FRAME BELONG TO A BACKGROUND OR A FOREGROUND

The present invention relates to the field of background subtraction in images. In particular, it relates to determining whether pixel positions in an image frame of a video sequence belong to the background or the foreground of a captured scene.

In video surveillance, it is important to be able to detect moving objects in a scene captured in a video sequence. Many tools exist for motion detection in video. Some of them track objects frame by frame by following features in the video stream. Others compare the current frame to a static background frame, pixel by pixel. The latter is the basis of background subtraction, which aims to extract moving objects by detecting areas where significant change occurs. Static objects are part of the background, while moving objects are referred to as the foreground.

Separating moving objects from the background is a complex problem, and it becomes even more difficult when the background is dynamic, for example when there are swaying trees or water ripples in the background, or when the illumination varies. In particular, a dynamic background may result in an increased number of false detections of moving objects.

A review of background subtraction methods is given in "Background Modeling and Foreground Detection for Video Surveillance" (editors: Thierry Bouwmans, Fatih Porikli, Benjamin Hoferlin, and Antoine Vacavant), CRC Press, Taylor & Francis Group, Boca Raton, 2015. See, for example, Chapters 1 and 7.

Background subtraction methods generally involve comparing the current frame of a video stream with a reference background frame or model that is free of moving objects. By comparing an image to the background frame or model, it can be decided whether each pixel in the image belongs to the foreground or the background. In this way, the image can be divided into two complementary sets of pixels: the foreground and the background.

Background subtraction requires the definition of an underlying background model and an update strategy to accommodate background changes over time. Many background models have been proposed in the literature. These include parametric models (e.g., Gaussian distributions) and non-parametric models (e.g., sample-based models).

However, to achieve a correct separation between background and foreground, regardless of the approach to background modeling used, areas of a scene that represent a multi-modal environment (meaning that pixel values representing these areas are highly likely to change between frames of the video sequence capturing the scene) need to be handled differently from more static areas when determining whether an area represents background or foreground. This is because of the larger differences in image content (represented by pixel values) that will inherently exist between frames in these areas.

There is thus a need for improvements within this context.

In view of the above, it is thus an object of the present invention to overcome or at least mitigate the problems discussed above. In particular, it is an object to provide a method and a device for determining whether pixel positions in an image frame of a video sequence belong to the background or the foreground of a captured scene, which take into account that different pixel positions have different probabilities of changing values between frames of the video sequence capturing the scene.

According to a first aspect of the present invention, there is provided a method of determining whether pixel positions in an image frame of a video sequence belong to the background or the foreground of a captured scene. The method comprises, for each pixel position in the image frame:

- receiving a class into which the pixel position has been classified, the class representing a category of content in the captured scene at the pixel position,

- associating the pixel position with a level of dynamics of its corresponding class, wherein the level of dynamics of a class reflects the probability that pixel values at pixel positions belonging to the class change between frames of the video sequence,

- determining whether the pixel position in the image frame belongs to the background or the foreground of the captured scene by comparing the pixel value of the pixel position in the image frame to a background model and a threshold value, the background model including one or more values representing the pixel position, wherein the pixel position is determined to belong to the background if the difference between the pixel value and a first predetermined number of the values representing the position in the background model is less than the threshold value for the pixel position.

If the pixel position in the image frame is determined to belong to the background, the method further comprises:

- increasing the threshold value specific to the pixel position by an increment if the pixel value at the pixel position in the image frame has changed by more than a second value since the previous frame, wherein the increment is set to depend on the level of dynamics of the pixel position such that a higher level of dynamics results in a higher increment.
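The steps above can be sketched for a single pixel position as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `PixelModel`, `LEVEL_OF_DYNAMICS`, and `process_pixel`, the numeric levels, the scalar (grayscale) pixel values, and the choice to scale the increment linearly with the level of dynamics are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical class-to-dynamics table on a 0-1 scale (values are illustrative).
LEVEL_OF_DYNAMICS = {"house": 0.1, "water": 0.5, "tree": 0.9}


@dataclass
class PixelModel:
    values: list      # background-model values representing this pixel position
    threshold: float  # threshold value specific to this pixel position


def process_pixel(model, pixel_value, previous_value, pixel_class,
                  required_matches=1, second_value=3.0, base_increment=1.0):
    """Return True if the position is judged background; may raise its threshold."""
    # Associate the position with the level of dynamics of its received class.
    dynamics = LEVEL_OF_DYNAMICS[pixel_class]
    # Count the model values that differ from the pixel value by less than the threshold.
    matches = sum(1 for v in model.values if abs(pixel_value - v) < model.threshold)
    is_background = matches >= required_matches
    # For background positions that changed a lot since the previous frame,
    # raise the threshold; a higher level of dynamics gives a higher increment
    # (linear scaling is an assumption made for this sketch).
    if is_background and abs(pixel_value - previous_value) > second_value:
        model.threshold += base_increment * dynamics
    return is_background
```

With these assumed numbers, a background pixel classified as a tree raises its threshold by 0.9 after a large change, while one classified as a house raises it by only 0.1.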

The present invention stems from the realization that parts of a scene containing dynamic background objects, such as trees, water, or flags, will produce larger differences in the pixel values representing those parts between frames of the video sequence capturing the scene. Consequently, this needs to be considered when determining whether a certain pixel position in an image frame belongs to the foreground or the background of the captured scene. A larger difference between a pixel value in the image and the corresponding pixel value(s) in the background model should advantageously be allowed for areas containing dynamic background objects, to reduce the probability that these areas are mistakenly determined to belong to the foreground. In the present invention, the threshold (used to determine how similar the pixel value of a pixel position is to the background model for that pixel position) varies with the position of the pixel in the image. In particular, it may vary depending on the tendency of the background at the position of the pixel to change values between subsequent images in the image sequence. A higher tendency to change values generally gives a higher threshold. This is advantageous in that the foreground classification can be adapted to dynamic backgrounds such as swaying trees and water ripples. For example, the sensitivity of the classification for determining that a pixel position belongs to the foreground can be increased in non-dynamic areas compared to dynamic areas.

Moreover, the threshold is advantageously updated more quickly between lower and higher values for dynamic areas of the image, to allow for increased differences between image frames of the video sequence. Advantageously, this leads to a swift adaptation of the threshold when the wind starts blowing in the scene and trees and water become more dynamic. This in turn reduces the number of erroneous classifications of foreground pixels in these areas.

In the present invention, the scene is divided into classes representing categories of content in the captured scene. Such classification may be referred to as semantic segmentation, classification according to type of content, and so on. Possible classes include, for example, cars, trees, water, roads, people, and houses. The classification may be done manually, for example by an operator, or by an algorithm such as a semantic segmentation algorithm implemented in a deep learning neural network, as described in research literature such as "Fully Convolutional Networks for Semantic Segmentation" (Long et al.).

For each pixel position in an image frame of the video sequence of the scene, the class is received and used to associate the pixel position with the level of dynamics of its corresponding class. The association may be made using, for example, a table mapping predefined classes to different levels of dynamics, or any other data structure defining which level of dynamics a particular class represents.

The level of dynamics of a class thus reflects the probability that pixel values at pixel positions belonging to the class change between frames of the video sequence. As an example, a pixel position classified as (the crown of) a tree may have a relatively high level of dynamics, while a pixel position classified as a house may have a relatively low level of dynamics. The level of dynamics may span 1-100, 0-1, 1-10, or any other suitable range of values.
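The table-based association described above might look like the following sketch. The class names follow the examples in the text, but the numeric levels on a 0-1 scale, the fallback value for unlisted classes, and the helper name `dynamics_map` are assumptions made only for illustration.

```python
# Hypothetical table mapping predefined classes to levels of dynamics (0-1 scale).
CLASS_DYNAMICS = {"house": 0.1, "water": 0.5, "tree": 0.9}

# Assumed fallback for classes that are missing from the table.
DEFAULT_DYNAMICS = 0.3


def dynamics_map(class_map):
    """Turn a 2D grid of class labels into a 2D grid of levels of dynamics."""
    return [[CLASS_DYNAMICS.get(label, DEFAULT_DYNAMICS) for label in row]
            for row in class_map]
```

A grid such as `[["tree", "house"], ["water", "sky"]]` would map to `[[0.9, 0.1], [0.5, 0.3]]` under these assumed values, with "sky" taking the fallback.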

The background model includes, for each pixel position, one or more values representing the pixel position (for example, pixel values of previous image frames at the pixel position). The background model further includes a threshold value representing the allowable difference between the pixel value at the pixel position and the values for the corresponding position in the background model when determining whether the pixel position belongs to the foreground or the background. If the difference between the pixel value and a first predetermined number of the values representing the position in the background model is less than the threshold value for the pixel position, the pixel position is determined to belong to the background. For example, if the background model includes two values (e.g., 5 and 7) for a pixel position, the threshold value is 2, the first predetermined number is 1, and the pixel value at the pixel position is 9, the pixel position will be determined to belong to the foreground, since no difference is less than 2. However, if the pixel value at the pixel position is 7, the pixel position will be determined to belong to the background. In other words, the threshold defines the size of the range of values that the pixel position may take while still being determined to belong to the background, and this range grows as the threshold increases. For embodiments where the background model includes one value for each pixel position, the first predetermined number will always be 1. In embodiments where the background model includes a plurality of values for each pixel position, the predetermined number may be any suitable number between 1 and the number of values for each pixel position, depending on the use case and its sensitivity requirements when determining foreground pixels.
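The worked example with model values 5 and 7 can be reproduced with a few lines of code. This is a sketch only: the function name `is_background` is hypothetical, pixel values are treated as scalars, and "a first predetermined number of values" is read as "at least that many values", which is the reading the example implies.

```python
def is_background(pixel_value, model_values, threshold, required_matches):
    """True if enough model values lie within the threshold of the pixel value."""
    close = [v for v in model_values if abs(pixel_value - v) < threshold]
    return len(close) >= required_matches


# Pixel value 9: the differences are 4 and 2, and neither is less than 2.
print(is_background(9, [5, 7], threshold=2, required_matches=1))  # False (foreground)
# Pixel value 7: the difference to the model value 7 is 0, which is less than 2.
print(is_background(7, [5, 7], threshold=2, required_matches=1))  # True (background)
# Raising the threshold to 3 widens the accepted range, so 9 now matches 7.
print(is_background(9, [5, 7], threshold=3, required_matches=1))  # True (background)
```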

If the pixel position is determined to belong to the background, and the pixel value at the pixel position in the image frame differs from the corresponding pixel value of the previous frame by more than the second value, the threshold should be incremented in order to fine-tune it to the range of pixel values that this pixel position may take. This is where the level of dynamics comes into play. For a dynamic area, the increment should advantageously be higher than for a static area, because pixel values in a dynamic area are more likely to change. Varying the increment based on the level of dynamics of the pixel position may reduce the number of erroneous determinations of background and foreground pixels, respectively, in image frames of the video sequence.

According to some embodiments, the method further comprises, if the pixel position in the image frame is determined to belong to the background, decreasing the threshold value specific to the pixel position by a decrement if the pixel value at the pixel position in the image frame has changed by less than the second value since the previous frame, wherein the decrement is set to depend on the level of dynamics of the pixel position such that a higher level of dynamics results in a lower decrement. Similar to the fine-tuning of the threshold described above for the case where the pixel value has changed by more than the second value since the previous frame, the threshold may be decreased when, for example, the difference in pixel values between two subsequent image frames for a particular pixel position is less than the second value. In this embodiment, the decrement in dynamic areas should be lower than the decrement in static areas, because pixel values in dynamic areas are more likely to change. Varying the decrement based on the level of dynamics of the pixel position may reduce the number of erroneous determinations of background and foreground pixels, respectively, in image frames of the video sequence.

In some embodiments, the decrement for a dynamic area is lower than the corresponding increment. Since the increment value is larger than the decrement value, the method can respond quickly to increased dynamics in the background, which may be caused, for example, by changing wind conditions in the scene. At the same time, the threshold decreases more slowly for dynamic areas than for static areas, which is advantageous since the dynamic background movement is likely to occur again.
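The asymmetric increment and decrement can be illustrated with a small update rule. This is a sketch under assumptions: the function name `update_threshold`, a level of dynamics on a 0-1 scale, the linear formulas, and the constants `max_increment` and `max_decrement` are all hypothetical and chosen only so that a higher level of dynamics gives a higher increment and a lower decrement.

```python
def update_threshold(threshold, pixel_value, previous_value, dynamics,
                     second_value=3.0, max_increment=2.0, max_decrement=0.5):
    """Update the threshold of a background pixel position; dynamics is in [0, 1]."""
    change = abs(pixel_value - previous_value)
    if change > second_value:
        # Large change since the previous frame: dynamic areas grow the threshold faster.
        return threshold + max_increment * dynamics
    if change < second_value:
        # Small change: dynamic areas shrink the threshold more slowly than static areas.
        return max(0.0, threshold - max_decrement * (1.0 - dynamics))
    return threshold
```

With these assumed constants, a position with level of dynamics 0.9 gains 1.8 per large change but loses only 0.05 per quiet frame, so the threshold rises quickly when the wind picks up and falls back slowly afterwards.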

According to some embodiments, the method further comprises:

setting the threshold value specific to the pixel position to a value, wherein the value depends on the level of dynamics of the pixel position such that a higher level of dynamics results in a higher value. This embodiment thus defines an initialization or a reset of the threshold, which may be performed, for example, for an initial frame of the video stream and/or when the field of view of the video capturing device capturing the video sequence changes. By initializing or setting the threshold for a particular pixel position according to the level of dynamics associated with that pixel position, the fine-tuning of the threshold performed during subsequent image frames will be completed faster, since the threshold is likely to be more accurate from the start (compared to, for example, setting the threshold for all pixel positions to a predetermined value such as zero, or to a random value). This embodiment may further reduce the number of erroneous determinations of background and foreground pixels, respectively, in image frames of the video sequence, for example when starting the video capture or when changing the field of view.
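A possible initialization is sketched below. The function name `initial_thresholds`, the 0-1 scale for the level of dynamics, and the linear interpolation between the assumed bounds `min_threshold` and `max_threshold` are illustrative choices, not values taken from the text.

```python
def initial_thresholds(dynamics_grid, min_threshold=2.0, max_threshold=20.0):
    """Initialize per-pixel thresholds so that higher dynamics gives a higher value."""
    span = max_threshold - min_threshold
    return [[min_threshold + span * d for d in row] for row in dynamics_grid]
```

The same function could be called again to reset the thresholds when the field of view changes and a new classification has been received.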

According to some embodiments, the method further comprises setting a lower threshold for the threshold value specific to the pixel position depending on the level of dynamics of the pixel position, the lower threshold determining the minimum possible value of the threshold value, such that a higher level of dynamics results in a higher lower threshold. That is, since the threshold value cannot fall below this lower threshold, the background model will always remain less sensitive in dynamic areas such as trees.
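The lower threshold acts as a floor when the per-pixel threshold is decreased, which can be sketched as follows. The names `lower_bound` and `decrease_threshold`, the 0-1 scale for the level of dynamics, and the floor values are assumptions for illustration.

```python
def lower_bound(dynamics, floor_static=1.0, floor_dynamic=8.0):
    """Minimum allowed threshold; a higher level of dynamics gives a higher floor."""
    return floor_static + (floor_dynamic - floor_static) * dynamics


def decrease_threshold(threshold, decrement, dynamics):
    """Apply a decrement without letting the threshold drop below its floor."""
    return max(lower_bound(dynamics), threshold - decrement)
```

With these numbers, a fully dynamic position (level 1.0) never drops below 8.0, while a fully static position (level 0.0) can fall as low as 1.0.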

According to some embodiments, the pixel position is associated with a first level of dynamics if its corresponding class belongs to a first predefined group of classes, and with a second, higher level of dynamics if its corresponding class belongs to a second predefined group of classes. For example, only two levels of dynamics may be defined: one for static classes, such as those representing houses in the scene, and one for dynamic classes, such as those representing trees or water in the scene. In other embodiments, a finer-grained model is implemented, for example defining a lowest first level of dynamics for pixel positions representing a house in the scene, an intermediate second level of dynamics for pixel positions representing water in the scene, and a highest third level of dynamics for pixel positions representing a tree in the scene.

According to some embodiments, if the class corresponding to the pixel position belongs to a third predefined group of classes, the threshold value specific to the pixel position is maintained at a constant level. In this embodiment, the threshold is not adjusted for pixel positions classified as belonging to the third predefined group of classes.

According to some embodiments, the class into which the pixel position has been classified is determined using an algorithm for semantic segmentation, as also exemplified above.

According to some embodiments, the algorithm for semantic segmentation is run on a subset of the image frames of the video stream. In this embodiment, the semantic segmentation is not performed for every image frame in the video stream. Different time spans between two semantic segmentation operations on the video stream may be used, for example minutes, hours, or even days. The time span may depend on how much happens in the background of the captured scene. A longer time span reduces the computational requirements of the method, since semantic segmentation algorithms can be rather hardware-demanding, which makes it difficult (or impossible) to obtain real-time results from them.

According to some embodiments, the class into which the pixel position has been classified is determined using a combination of the results of the semantic segmentation algorithm for the pixel position from a plurality of image frames in the subset of image frames. In other words, previous classification results for a particular pixel position may be used together with a new result, for example to reduce the risk of erroneous classifications.
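The text does not specify how the results are combined. One simple possibility, shown here purely as an assumption, is a majority vote over the most recent segmentation results for a pixel position; the class `ClassHistory` and the window size are hypothetical.

```python
from collections import Counter, deque


class ClassHistory:
    """Keeps recent segmentation results for one pixel position."""

    def __init__(self, max_results=5):
        # Only the most recent results are kept; older ones are dropped automatically.
        self.results = deque(maxlen=max_results)

    def add(self, label):
        self.results.append(label)

    def combined(self):
        """Return the most frequent recent label, or None if nothing was recorded."""
        if not self.results:
            return None
        return Counter(self.results).most_common(1)[0][0]
```

A single outlier result, such as one frame in which a tree position is labeled as a house, would then not change the class used for that position.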

According to some embodiments, the background model includes a plurality of values representing the pixel position, and the step of determining whether the pixel position in the image frame belongs to the background or the foreground of the captured scene comprises:

- calculating the differences between the pixel value in the image frame at the pixel position and the plurality of values of the background model at the corresponding pixel position,

- calculating the number of differences that are smaller than the threshold value specific to the pixel position,

- determining that the pixel position in the image frame belongs to the background if the calculated number exceeds the first predetermined number of values, and that the pixel position in the image frame otherwise belongs to the foreground.

Depending on the value of the first predetermined number, this embodiment may result in a determination that is more or less sensitive to differences between the values of the currently processed image frame and the values of the background model. In some embodiments, a majority of the differences are required to be below the threshold for the pixel to be determined as background. In other embodiments, 1/3, 2/3, or any other suitable proportion of the differences are required to be below the threshold. In the extreme cases, all of the differences, or just one of them, need to be below the threshold for the pixel to be determined as background.
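How the required count changes the sensitivity can be seen in the following sketch. The function names and sample values are hypothetical. The comparison is taken as "at least the required number", matching the earlier worked example in which one match suffices when the number is 1; a strict "exceeds" reading would shift each case by one.

```python
def count_matches(pixel_value, model_values, threshold):
    """Number of model values whose difference to the pixel value is below the threshold."""
    return sum(1 for v in model_values if abs(pixel_value - v) < threshold)


def classify(pixel_value, model_values, threshold, required_matches):
    """Label a pixel position using the count of close model values."""
    matches = count_matches(pixel_value, model_values, threshold)
    return "background" if matches >= required_matches else "foreground"


# Hypothetical background-model samples for one pixel position.
model = [10, 12, 30, 31, 50, 52]

# Pixel value 11 with threshold 3 is close to two of the six samples (10 and 12).
print(classify(11, model, threshold=3, required_matches=1))  # background
print(classify(11, model, threshold=3, required_matches=4))  # foreground (majority needed)
print(classify(11, model, threshold=3, required_matches=6))  # foreground (all needed)
```

A low required count makes the decision tolerant of multi-modal backgrounds, while a high count makes it stricter and therefore more likely to report foreground.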

According to a second aspect of the present invention, the above object is achieved by a computer program product comprising a computer-readable medium having stored thereon computer code instructions for carrying out the method of the first aspect when executed by a device having processing capability.

According to a third aspect of the present invention, the above object is achieved by a device for determining whether pixel positions in an image frame of a video sequence belong to the background or the foreground of a captured scene, the device comprising a processor adapted to:

- receive a class into which the pixel position has been classified, the class representing the type of object in the captured scene at the pixel position,

- associate the pixel position with a level of dynamics based on its corresponding class, wherein the level of dynamics of a class reflects the probability that pixel values at pixel positions belonging to the class change between frames of the video sequence,

- determine whether the pixel position in the image frame belongs to the background or the foreground of the captured scene by comparing the pixel value of the pixel position in the image frame to a background model and a threshold value, the background model including one or more values representing the pixel position, wherein the pixel position is determined to belong to the background if the difference between the pixel value and a first predetermined number of the values representing the position in the background model is less than the threshold value for the pixel position.

If the pixel position in the image frame is determined to belong to the background, the processor of the device is further adapted to:

- increase the threshold value specific to the pixel position by an increment if the pixel value has changed by more than a second predetermined value since the previous frame, wherein the increment is set to depend on the level of dynamics of the pixel position such that a higher level of dynamics results in a higher increment.

According to a fourth aspect of the present invention, the above object is achieved by a system comprising:

a video capturing device adapted to continuously capture a video sequence depicting a scene;

a first device adapted to receive a first image frame of the video sequence from the video capturing device, classify each pixel position in the image frame, the class representing the type of object in the captured scene at the pixel position, and output a class for each pixel position in the image frame; and

a second device according to the third aspect, adapted to receive a second image frame of the video sequence from the video capturing device and to receive, from the first device, the class for each pixel position in the image frame.

The second, third and fourth aspects may generally have the same features and advantages as the first aspect. Furthermore, the invention relates to all possible combinations of features unless explicitly stated otherwise.

The above, as well as additional objects, features and advantages of the present invention, will be better understood through the following illustrative and non-limiting detailed description, with reference to the appended drawings, in which the same reference numerals are used for similar elements.
FIG. 1 shows a scene comprising background objects with different dynamic levels.
FIGS. 2-3 show embodiments of a background model for a pixel position.
FIG. 4 shows a flow chart of a method for determining whether a pixel position in an image frame of a video sequence belongs to the background or the foreground, according to an embodiment.
FIG. 5 shows a flow chart of a method for determining whether a pixel position in an image frame of a video sequence belongs to the background or the foreground, according to another embodiment.
FIG. 6 shows, by way of example, a system for capturing a video sequence of a scene and determining whether pixel positions in image frames of the video sequence belong to the background or the foreground.

The present invention will now be described in more detail with reference to the accompanying drawings, in which embodiments of the invention are shown. The systems and devices disclosed herein will be described during operation.

FIG. 1 shows an image 100 of a scene 101 that includes two background objects 102, 104. Normally, such a scene 101 would also include foreground objects, but these are omitted for ease of explanation. The first background object 102 in the scene is a building. The second background object 104 is a tree. The building 102 is typically a very static object, meaning that there is little or no difference between an image of the building taken at time t and an image of the building taken at a later time t + n. The tree, on the other hand, is a dynamic object: its leaves and branches may move substantially over time, particularly when the wind is blowing. In other words, there may be large differences between an image of the tree taken at time t and one taken at a later time t + n. This needs to be taken into account when detecting motion in images of the scene, for example for monitoring purposes. Such motion is typically of interest only when it originates from foreground objects; motion in the background should be ignored in these cases. Consequently, a background detection algorithm is needed that compensates for dynamic objects so that they are still detected as background, which reduces false motion detections and, for example, the number of false alarms in a monitoring application.

Embodiments of such a background detection algorithm will now be described using FIG. 1 in conjunction with FIG. 4.

Improved background detection is achieved by determining, for each pixel position in an image frame 100 (of a video sequence comprising a plurality of image frames) depicting the scene 101, a class into which the pixel position has been classified, the class representing a category of content in the captured scene at the pixel position. In FIG. 1, the dashed rectangles 106, 108 symbolize such classification: pixels inside the dashed rectangle 106, for example the pixel at a first pixel position 110, are classified as, for example, a building, and pixels inside the dashed rectangle 108, for example the pixel at a second pixel position 112, are classified as a tree. Note that the marking 108 around the tree 104 is drawn as a rectangle for ease of explanation. In many applications, the marking would follow the contour of the tree 104.

Such classification may be done by, for example, an operator who marks (106, 108) the two objects 102, 104 as belonging to different classes, for example a tree class and a building class. In other embodiments, the class into which the pixel position has been classified is determined using an algorithm for semantic segmentation, for example a neural network implementation as described above.

In some embodiments, the semantic segmentation may be run on a subset of the image frames of the video stream. Typically, such algorithms (or manual work) are difficult to run in real time, i.e. for every image frame of a video sequence. Advantageously, the semantic segmentation (classification) is run only at certain time intervals, for example every minute, hour or day. In other words, the algorithm for semantic segmentation is run on image frames of the video stream at a predetermined interval from each other. For the remaining image frames, a cached result may be used, for example the result of the most recently performed classification. In other embodiments, the class into which a pixel position has been classified is determined using a combination of the results of the semantic segmentation algorithm for that pixel position from a plurality of image frames in the subset, for example by taking the class that forms the majority of the results. In other embodiments, some classes are defined as "more important" than others: if a pixel position has previously been classified as one of these classes, that classification is kept even if the semantic segmentation algorithm (or manual work) yields a different class for the pixel position in some of the following or previous classification procedures.
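The majority-based combination of several segmentation results mentioned above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `combine_classifications` and the string labels are assumptions made for the example, and ties are resolved in favor of the label seen first.

```python
from collections import Counter


def combine_classifications(labels):
    """Return the majority class among several segmentation results
    obtained for one pixel position (hypothetical helper)."""
    if not labels:
        raise ValueError("at least one classification result is required")
    # most_common(1) returns [(label, count)]; on a tie the label that
    # was encountered first is returned.
    return Counter(labels).most_common(1)[0][0]


# Three segmentation runs for the same pixel position.
print(combine_classifications(["tree", "building", "tree"]))
```

A scheme with "more important" classes would need an extra rule on top of this vote, for example keeping a protected class once it has been assigned.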

Typically, if the camera capturing the scene changes its field of view, the classification needs to be performed again, at least for new parts of the scene that were not previously in the field of view and have therefore not been classified. In other words, the algorithm for semantic segmentation is run on an image frame captured when the field of view of the video capture device capturing the video sequence has changed.

The classification of the pixel positions 110, 112 is then used to determine whether the pixel positions 110, 112 in the image frame 100 belong to the background or the foreground of the captured scene 101. For a specific pixel position, in this example the pixel position 110 classified as a building pixel (inside the rectangle 106 in FIG. 1), the class is received (S302) (FIG. 4). The pixel position 110 is then associated (S304) with the dynamic level of its class, where the dynamic level of a class reflects the probability that pixel values at pixel positions belonging to the class change between frames of the video sequence. This association (S304) can be accomplished using, for example, a table defining the dynamic level of each class. In some cases, described further below in conjunction with FIG. 5, the determined (S304) dynamic level may be used to set (S306) the threshold used when determining whether the pixel position in the image frame belongs to the background or the foreground.

In some embodiments, the pixel position is associated with a first dynamic level if its class belongs to a first predefined group of classes (i.e. a predefined group of classes including buildings), and with a second, higher dynamic level if its class belongs to a second predefined group of classes (for example, a predefined group of classes including trees). A more fine-grained mapping between classes and dynamic levels may be implemented, with for example 3, 5 or 10 possible dynamic levels.
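A table-based association between classes and dynamic levels (S304) might look like the sketch below. The class names, the three-level scale and the default level for unknown classes are all assumptions chosen for illustration; the text only requires that a more dynamic class maps to a higher level.

```python
# Hypothetical lookup table: class label -> dynamic level (0 = static).
DYNAMIC_LEVEL_BY_CLASS = {
    "building": 0,
    "road": 0,
    "flag": 1,
    "tree": 2,
    "water": 2,
}


def dynamic_level_for(label, default=0):
    """Return the dynamic level of a class, falling back to a default
    level for classes missing from the table (an assumption)."""
    return DYNAMIC_LEVEL_BY_CLASS.get(label, default)


print(dynamic_level_for("building"), dynamic_level_for("tree"))
```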

Next, it is determined (S308) whether the pixel position 110 in the image frame 100 belongs to the background or the foreground. This is done by comparing the pixel value at the pixel position 110 in the image frame 100 with a background model and a threshold. FIGS. 2A and 3A show two different embodiments of the values 202 of a background model 200 and of the threshold 204 for the pixel position 110. In the embodiment of FIG. 2A, the values 202 of the background model 200 comprise a plurality of pixel values 202 (four in this example, also referred to below as background samples) representing the pixel position 110. In this example, each value 202 is represented by a single pixel value, for example representing the luminosity of the pixel. In other examples, each value 202 may be a vector representing the red, green and blue (RGB) intensities of the value of the pixel position.

The pixel position is determined to belong to the background if the difference between the pixel value and (at least) a first predetermined number (not shown) of the values representing the position in the background model 200 is less than the threshold 204 for the pixel position 110; otherwise it is determined to belong to the foreground. Thus, if the pixel value at the pixel position 110 is 12.9 and the first predetermined number is 2, the pixel position will be determined to belong to the foreground, since only one of the values 202 in the background model 200 lies within the threshold from the pixel value at the pixel position 110. If the first predetermined number is 1, the pixel position 110 will instead be determined to belong to the background. If the pixel value at the pixel position 110 is 10, the pixel position will be determined to belong to the background regardless of the first predetermined number.

In more detail, denote the observation at pixel m at time t of the image sequence by $x_t(m)$, and the collection of background samples of pixel m by $\{x_i(m) \mid i = 1, \ldots, N\}$. Each observation $x_t(m)$ has k channels (for example, in RGB color space each observation is expressed by the three channels R, G and B). For the pixel position 110 in the image 100, the image data, i.e. the intensity value at that pixel position (for each channel, if applicable), is compared with each background sample 202 in the background model 200 associated with the pixel position 110, to check whether the image data differs from each background sample by less than a threshold $T_r$ 204. For example, background samples that differ from the image data at the pixel by less than the threshold $T_r$ may be associated with the value "1", and other background samples with the value "0", according to:

[Equation 1]

$$\Gamma_i(m) = \begin{cases} 1 & \text{if } |x_i(m) - x_t(m)| < T_r \\ 0 & \text{otherwise} \end{cases}$$

where $\Gamma_i(m)$ denotes the value associated with background sample i of pixel m.

In the example of FIGS. 2A and 3A, $T_r = 2$.

If the number of background samples 202 in the background model 200 that differ from the image data of the pixel by less than the threshold $T_r$ is greater than or equal to the first predetermined number $T_N$, the pixel at the pixel position 110 is determined to belong to the background. Otherwise it belongs to the foreground.

This may be implemented by calculating a binary mask $B_t$ that at time t takes the value "1" for background pixels and "0" for foreground pixels:

[Equation 2]

$$B_t(m) = \begin{cases} 1 & \text{if } \sum_{i=1}^{N} \Gamma_i(m) \geq T_N \\ 0 & \text{otherwise} \end{cases}$$

Expressed differently, the number of background samples 202 in the background model 200 that differ from the image data at the pixel position by less than the threshold T_r 204 is counted. If that number equals or exceeds the first predetermined number T_N, the pixel is determined to belong to the background, and otherwise to the foreground. Thus, if at least T_N background samples 202 in the background model 200 are found to be similar (in the sense of Equation 1) to the image data at the pixel position 110, the pixel at the pixel position 110 will be classified as belonging to the background, and otherwise as belonging to the foreground.

In other words, the step of determining (S308) whether the pixel position 110 in the image frame 100 belongs to the background or the foreground of the captured scene comprises: calculating the differences between the pixel value in the image frame 100 at the pixel position 110 and each of the plurality of values 202 of the background model 200 at the corresponding pixel position; counting the number of differences that are smaller than the threshold 204 specific to the pixel position; and determining that the pixel position in the image frame belongs to the background if the counted number equals or exceeds the first predetermined number, and that it belongs to the foreground otherwise.
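The counting rule described above can be sketched for a single-channel pixel as follows. The function name and the sample values are hypothetical (the actual values of FIG. 2A are not reproduced in the text); they were chosen so that, with a threshold of 2, a pixel value of 12.9 matches exactly one sample and a pixel value of 10 matches all four.

```python
# Hypothetical background samples for one pixel position.
SAMPLES = [9.5, 10.2, 11.3, 10.0]


def is_background(pixel_value, background_samples, threshold, min_matches):
    """Count the samples that differ from the pixel value by less than
    the threshold; the pixel is background if at least min_matches
    (the first predetermined number) samples match."""
    matches = sum(
        1 for sample in background_samples
        if abs(pixel_value - sample) < threshold
    )
    return matches >= min_matches


print(is_background(12.9, SAMPLES, threshold=2, min_matches=2))
print(is_background(12.9, SAMPLES, threshold=2, min_matches=1))
print(is_background(10, SAMPLES, threshold=2, min_matches=4))
```

For multi-channel data, the comparison inside the generator expression would have to be replaced by a per-channel or vector distance test; which one applies is not specified here.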

FIG. 3A shows another example of a background model 200. In this case, the background model 200 comprises a single value 202 representing the pixel position 110, and the step of determining (S308) whether the pixel position 110 in the image frame 100 belongs to the background or the foreground of the captured scene comprises: calculating the difference between the pixel value in the image frame at the pixel position 110 and the value 202 of the background model 200 at the corresponding pixel position; and determining that the pixel position in the image frame belongs to the background if the difference is smaller than the threshold 204 specific to the pixel position, and that it belongs to the foreground otherwise.

FIGS. 2B and 3B show, by way of example, two background models 200 for the pixel position 112 (i.e. classified as a tree). As can be seen by comparison with the examples of FIGS. 2A and 3A, the larger threshold 204 allows the difference(s) between the value(s) 202 of the background model 200 and the pixel value at the pixel position 112 to be larger than described above for the pixel position 110, which is classified as a building.

Other ways of defining the background model 200 may be implemented. For example, the values of the background model for a pixel position may be represented by a Gaussian distribution with a mean value and a standard deviation. In this case, the step of determining (S308) whether the pixel position in the image frame belongs to the background or the foreground of the captured scene comprises: calculating the difference between the pixel value in the image frame at the pixel position and the mean value; normalizing the difference by the standard deviation; and determining that the pixel position in the image frame belongs to the background if the normalized difference is lower than the threshold specific to the pixel position, and that it belongs to the foreground otherwise.
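A minimal sketch of the Gaussian variant follows, assuming that "normalized difference" means the absolute difference from the mean divided by the standard deviation. The function name and the numeric values are illustrative only.

```python
def is_background_gaussian(pixel_value, mean, std, threshold):
    """Background test against a Gaussian background model: the absolute
    difference from the mean, expressed in standard deviations, must be
    lower than the pixel-specific threshold."""
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    normalized_difference = abs(pixel_value - mean) / std
    return normalized_difference < threshold


# Pixel value two standard deviations from the mean, threshold 2.5.
print(is_background_gaussian(12.0, mean=10.0, std=1.0, threshold=2.5))
```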

The values 202 of the background model 200 (for example the pixel value(s) or the Gaussian distribution) are advantageously updated at regular intervals. For example, the pixel values 202 of the model of FIGS. 2A-B may be implemented as a FIFO queue that is updated every n image frames with the pixel value at the pixel position in that frame, or in any other suitable way.
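The FIFO update of the background samples could be sketched with a bounded `collections.deque`, as below. The class name `SampleQueue` and its parameters are assumptions for the example; the queue keeps a fixed number of samples and replaces the oldest one every `update_every` frames.

```python
from collections import deque


class SampleQueue:
    """Hypothetical FIFO store of background samples for one pixel position."""

    def __init__(self, initial_samples, update_every):
        # A bounded deque drops the oldest item when a new one is appended.
        self.samples = deque(initial_samples, maxlen=len(initial_samples))
        self.update_every = update_every
        self.frames_seen = 0

    def observe(self, pixel_value):
        """Register one frame; store its pixel value every n-th frame."""
        self.frames_seen += 1
        if self.frames_seen % self.update_every == 0:
            self.samples.append(pixel_value)


queue = SampleQueue([1, 2, 3, 4], update_every=2)
queue.observe(9)   # frame 1: not stored
queue.observe(8)   # frame 2: stored, oldest sample (1) is dropped
print(list(queue.samples))
```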

To adapt to the dynamic level of the pixel position, the threshold needs to be updated. This may be accomplished by initializing or resetting the threshold (S306) based on the dynamic level at certain times (see further below). The threshold may also be increased under some circumstances. Specifically, if the pixel position is determined to belong to the background, the difference between the pixel value at the pixel position 110, 112 and the pixel value at the corresponding pixel position (i.e. the same pixel position) in a previous frame is calculated (S312). Depending on the use case, the previous frame may be the frame immediately preceding the current image frame 100, or a frame n frames before the current image frame 100. If the pixel value has changed by more than a second value since the previous frame, the threshold specific to the pixel position is increased (S314) by an increment. To adapt faster to a relatively high dynamic level at certain pixel positions, a higher dynamic level results in a higher increment. Consequently, the probability that such pixel positions are mistakenly determined to belong to the foreground is reduced, since the threshold is set to increase by a larger increment under these circumstances. The second value may be predetermined and may be a static value used for all pixel positions. The second value may be set to depend on the threshold of the corresponding pixel position. The second value may also be set to depend on the dynamic level associated with the pixel position.

In some embodiments, the threshold may also be decreased (S316) if the pixel position in the image frame is determined to belong to the background. Specifically, if the pixel value at the pixel position in the image frame has changed by less than the second value since the previous frame, the threshold specific to the pixel position is decreased (S316) by a decrement, the decrement being set to depend on the dynamic level of the pixel position such that a higher dynamic level results in a lower decrement. Consequently, the threshold for a dynamic pixel position is not lowered too quickly merely because the pixel value at the pixel position happens to stay similar between frames. This allows fast adaptation to increased differences later on, for example, for a pixel position classified as a tree, in a scenario where the wind starts blowing again.

Typically, the decrement is lower than the corresponding increment, which also supports fast adaptation in the above scenario.
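The increase (S314) and decrease (S316) of the threshold for a pixel already determined to be background can be sketched as follows. The increment and decrement tables are invented for illustration; they only respect the stated constraints that a higher dynamic level gives a higher increment and a lower decrement, and that each decrement is lower than the corresponding increment. Leaving the threshold unchanged when the change equals the second value is an assumption.

```python
# Hypothetical per-level step sizes (dynamic level -> step).
INCREMENT_BY_LEVEL = {0: 0.1, 1: 0.5, 2: 1.0}
DECREMENT_BY_LEVEL = {0: 0.05, 1: 0.02, 2: 0.01}


def update_threshold(threshold, pixel_value, previous_value,
                     dynamic_level, second_value):
    """Adapt the pixel-specific threshold for a background pixel."""
    change = abs(pixel_value - previous_value)  # S312
    if change > second_value:
        # Large change since the previous frame: raise the threshold (S314).
        return threshold + INCREMENT_BY_LEVEL[dynamic_level]
    if change < second_value:
        # Small change since the previous frame: lower the threshold (S316).
        return threshold - DECREMENT_BY_LEVEL[dynamic_level]
    return threshold  # assumption: no update when change == second_value


# A tree pixel (level 2) whose value jumped from 10 to 15.
print(update_threshold(2.0, 15, 10, dynamic_level=2, second_value=3))
```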

Note that, according to some embodiments, there may be a class (or several classes) for which the threshold is kept at a constant level. For example, if the trunk of a tree is a class separate from the tree crown, the threshold for pixel positions classified as trunk may be kept static. In other words, if the class of the pixel position belongs to a third predefined group of classes (in this example including the trunk of a tree), the threshold specific to the pixel position is kept at a constant level. Consequently, the dynamic background handling can be disabled for certain parts of the scene, which makes the background detection algorithm more flexible.

As described above, the algorithm of FIG. 4 may optionally include the step of setting (S306) a threshold for each pixel position. According to other embodiments, the step of setting the threshold is mandatory, and steps S310, S312, S314 and S316 for updating the threshold are optional. Such embodiments are described below in conjunction with FIG. 5. The step of setting (S306) a threshold for each pixel position is also described in detail below; that description therefore applies equally to the corresponding optional step S306 of the method of FIG. 4.

The method of FIG. 5 comprises receiving (S302) a class for the pixel position and determining/associating (S304) a dynamic level for the pixel position. It is then determined (S404) whether a new threshold should be set for the pixel position. If so, the threshold specific to the pixel position is set (S406) to a value that depends on the dynamic level of the pixel position, such that a higher dynamic level results in a higher value. Otherwise, if no threshold is to be set (S406), the step of determining (S308) whether the pixel position in the image frame belongs to the background or the foreground is performed directly, as described above, using the current threshold.

According to one example, the step of setting (S406) the threshold is performed for an initial frame of the video stream. Alternatively or additionally, the step of setting (S406) the threshold is performed when the field of view of the video capture device capturing the video sequence changes. The step of setting (S406) the threshold may also be performed on image frames of the video stream at a predetermined interval from each other.
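The decision of FIG. 5 on whether to set a new threshold (S404) and the setting itself (S406) might be sketched as below. The three trigger conditions are those named in the text (initial frame, changed field of view, fixed interval); the function names, the frame-index convention and the threshold table are assumptions.

```python
# Hypothetical initial thresholds (dynamic level -> threshold).
THRESHOLD_BY_LEVEL = {0: 2.0, 1: 4.0, 2: 8.0}


def should_set_threshold(frame_index, fov_changed, reset_interval):
    """S404: set a new threshold on the initial frame, when the field of
    view has changed, or every reset_interval frames."""
    return (
        frame_index == 0
        or fov_changed
        or frame_index % reset_interval == 0
    )


def threshold_for_frame(current_threshold, dynamic_level, frame_index,
                        fov_changed, reset_interval):
    """S406 if a new threshold is due, otherwise keep the current one."""
    if should_set_threshold(frame_index, fov_changed, reset_interval):
        return THRESHOLD_BY_LEVEL[dynamic_level]
    return current_threshold


print(threshold_for_frame(3.0, 2, frame_index=7, fov_changed=True,
                          reset_interval=100))
```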

Note that, apart from the threshold described above, other preferences for how background and foreground are to be detected may be set using the dynamic level of the pixel position. For example, the method described herein may further comprise setting a lower threshold for the threshold specific to the pixel position depending on the dynamic level of the pixel position, the lower threshold determining the minimum possible value of the threshold, such that a higher dynamic level results in a higher value of the lower threshold. This may reduce the risk that the step of decreasing the threshold lowers it too far when an area of the scene that is defined as dynamic (through the classification) is (almost) static for a period of time.
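The lower threshold described above acts as a floor on the pixel-specific threshold. A minimal sketch, with hypothetical floor values per dynamic level, clamps each decrease so the threshold never drops below that floor:

```python
# Hypothetical minimum thresholds (dynamic level -> lower threshold).
LOWER_BOUND_BY_LEVEL = {0: 1.0, 1: 2.0, 2: 4.0}


def decrease_with_floor(threshold, decrement, dynamic_level):
    """Decrease the threshold, but never below the lower threshold
    associated with the pixel position's dynamic level."""
    return max(threshold - decrement, LOWER_BOUND_BY_LEVEL[dynamic_level])


# A tree pixel (level 2) close to its floor stays at the floor.
print(decrease_with_floor(4.005, 0.01, dynamic_level=2))
```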

The methods described above may be implemented in software and/or hardware, for example as a computer program product comprising a computer-readable storage medium with instructions adapted to carry out the methods described herein when executed by a device having processing capability. The methods may thus be implemented in a device comprising a processor adapted to perform the methods described herein. The device may be part of a system of devices; such a system is exemplified below in conjunction with FIG. 6.

The system 600 of FIG. 6 includes a video capture device 604 adapted to continuously capture a video sequence (comprising a plurality of image frames 100a-c) depicting the scene 101. The system includes a first device 608 (denoted in FIG. 6 as a device for semantic analysis) adapted to receive a first image frame 100a of the video sequence from the video capture device 604, to classify each pixel position in the image frame 100a (the class representing the type of object in the captured scene at the pixel position), and to output 609 a class for each pixel position in the image frame 100a. As described above, the first device 608 may perform such classification on image frames selected on the basis of a number of prerequisites, such as whether the frame was captured at a certain time interval from the previous image frame used for classification, or whether the field of view of the video capture device 604 has changed.

The system further comprises a second device 610 (denoted in FIG. 6 as a background classifier) adapted to receive the output 609 from the first device 608, i.e. the classification for each pixel position in the first image frame 100a, together with at least a second image frame 100b-c of the video sequence from the video capture device 604, and to perform background classification (background analysis etc.) of the pixels of the received second image frame as described above. It should be noted that such background classification can of course also be performed on the first image frame 100a.

According to some embodiments, the second device 610 also receives input 612 from the video capture device 604, which is used to determine whether the threshold should be set (S306 in FIGS. 4-5), as described above. It should also be noted that the first and second devices 608, 610 may be implemented in the same physical device, or in the video capture device 604.

It will be appreciated that a person skilled in the art can modify the above-described embodiments in many ways and still use the advantages of the invention as shown in the embodiments above.

For example, according to some embodiments (not shown in FIG. 6), the second device is adapted to output data relating to which pixel positions in the second image frame belong to the background and the foreground of the captured scene. The system may then comprise a third device adapted to receive the video sequence 100a-c from the video capture device 604, to detect motion in the second image frame, and to output data relating to the motion detected in the second image frame. These outputs may be received by a fourth device adapted to receive the data output from the second device and the data output from the third device, and to use the received data to track objects in the video stream.

Thus, the invention should not be limited to the shown embodiments but should only be defined by the appended claims. Additionally, as the skilled person understands, the shown embodiments may be combined.

Claims

A computer-implemented method of determining whether pixel positions in an image frame of a video sequence belong to a background or a foreground of a captured scene, the method comprising, for each pixel position in the image frame:
receiving a classification into which the pixel position has been classified, the classification representing a category of content in the captured scene at the pixel position,
associating the pixel position with a dynamic level of its corresponding classification, wherein the dynamic level of a classification reflects the likelihood that pixel values at pixel positions belonging to the classification change between frames of the video sequence,
determining whether the pixel position in the image frame belongs to the background or the foreground of the captured scene by comparing the pixel value of the pixel position in the image frame to a background model and to a threshold for the pixel position, the background model including one or more values representing a pixel value of the pixel position, wherein the pixel position is determined to belong to the background if the difference between the pixel value and a first predetermined number of the one or more values in the background model is less than the threshold for the pixel position,
if the pixel position in the image frame is determined to belong to the background:
increasing the threshold specific to the pixel position by an increment if the pixel value at the pixel position in the image frame has changed by more than a second value since a previous frame, wherein the increment is set to depend on the dynamic level of the pixel position such that a higher dynamic level results in a higher increment.

The method of claim 1, further comprising,
if the pixel position in the image frame is determined to belong to the background:
decreasing the threshold specific to the pixel position by a decrement if the pixel value at the pixel position in the image frame has changed by less than the second value since the previous frame, wherein the decrement is set to depend on the dynamic level of the pixel position such that a higher dynamic level results in a lower decrement.
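The threshold increase and decrease recited in the two claims above can be illustrated with a short sketch. It is not the claimed implementation: the function name `adapt_threshold`, the parameters `second_value` and `base_step`, the representation of the dynamic level as a number in [0, 1], and the linear scaling are all assumptions. The claims only require that a higher dynamic level gives a higher increment and a lower decrement.

```python
def adapt_threshold(threshold, pixel_value, previous_value, dynamic_level,
                    second_value=10.0, base_step=1.0):
    """Adapt the per-pixel threshold for a pixel already judged background.

    dynamic_level is assumed to lie in [0, 1]; the linear scaling is an
    illustrative choice, not something the claims prescribe.
    """
    change = abs(pixel_value - previous_value)
    if change > second_value:
        # Large change since the previous frame: raise the threshold.
        # Higher dynamic level -> higher increment.
        return threshold + base_step * dynamic_level
    if change < second_value:
        # Small change: lower the threshold.
        # Higher dynamic level -> lower decrement.
        return threshold - base_step * (1.0 - dynamic_level)
    # Change exactly equals second_value: left unchanged in this sketch.
    return threshold
```

With these assumptions, a pixel classified as highly dynamic content quickly becomes tolerant of large fluctuations and only slowly becomes sensitive again.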

The method of claim 1, further comprising,
after associating the pixel position with the dynamic level of its corresponding classification, setting the threshold specific to the pixel position to a value, wherein the value depends on the dynamic level of the pixel position such that a higher dynamic level results in a higher value.

The method of claim 3, wherein the step of setting the threshold is performed for an initial frame of the video sequence.

The method of claim 3, wherein the step of setting the threshold is performed when the field of view of a video capture device capturing the video sequence is changed.

The method of claim 3, wherein setting the threshold specific to the pixel position to a value further comprises setting a lower threshold for the threshold specific to the pixel position depending on the dynamic level of the pixel position, the lower threshold determining the minimum possible value for the threshold, such that a higher dynamic level results in a higher value of the lower threshold.
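Setting the initial threshold and its lower bound from the dynamic level, as described in the preceding claims, might look like the sketch below. The function names, the constants `base`, `span` and `floor`, and the linear mappings are hypothetical; the claims only state that a higher dynamic level gives a higher initial value and a higher lower threshold.

```python
def initial_threshold(dynamic_level, base=10.0, span=30.0):
    """Starting threshold; higher dynamic level -> higher value (assumed linear)."""
    return base + span * dynamic_level


def lower_threshold(dynamic_level, floor=5.0, span=15.0):
    """Minimum allowed threshold; higher dynamic level -> higher minimum."""
    return floor + span * dynamic_level


def clamp_threshold(threshold, dynamic_level):
    """Keep an adapted threshold from dropping below its lower threshold."""
    return max(threshold, lower_threshold(dynamic_level))
```

A clamp of this kind would be applied after each decrease, so that a pixel in a highly dynamic region never becomes as sensitive as a pixel in a static region.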

The method of claim 1, wherein the pixel position is associated with a first dynamic level if its corresponding classification belongs to a first predefined group of classifications, and with a second, higher dynamic level if its corresponding classification belongs to a second predefined group of classifications.

The method of claim 7, wherein, if the classification corresponding to the pixel position belongs to a third predefined group of classifications, the threshold specific to the pixel position is maintained at a constant level.
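The grouping of classifications into dynamic levels can be pictured as a simple lookup. In the sketch below the class names, the group contents, and the numeric levels are invented for illustration and are not taken from the claims; classes in the third group are given no dynamic level here because their threshold is simply held constant.

```python
# Hypothetical groups; the claims do not name any specific classes.
FIRST_GROUP = frozenset({"building", "road"})   # lower dynamic level
SECOND_GROUP = frozenset({"tree", "water"})     # higher dynamic level
THIRD_GROUP = frozenset({"wall"})               # threshold held constant

FIRST_LEVEL = 0.2    # assumed numeric value
SECOND_LEVEL = 0.8   # assumed numeric value, higher than FIRST_LEVEL


def dynamic_level(classification):
    """Map a classification to the dynamic level of its group."""
    if classification in FIRST_GROUP:
        return FIRST_LEVEL
    if classification in SECOND_GROUP:
        return SECOND_LEVEL
    raise ValueError(f"no dynamic level defined for {classification!r}")


def threshold_is_frozen(classification):
    """True if the per-pixel threshold should stay at a constant level."""
    return classification in THIRD_GROUP
```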

The method of claim 1, wherein the classification into which the pixel position has been classified is determined using an algorithm for semantic segmentation.

The method of claim 9, wherein the algorithm for semantic segmentation is run on a subset of the image frames of the video sequence.

The method of claim 10, wherein the classification into which the pixel position has been classified is determined using a combination of results from the semantic algorithm for the pixel position from a plurality of image frames in the subset of image frames.
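The claim above does not say how results from several frames are combined. One plausible option, shown here purely as an assumption, is a majority vote over the labels that a single pixel position received in the classified frames; the function name `combine_classifications` is hypothetical.

```python
from collections import Counter


def combine_classifications(per_frame_labels):
    """Majority vote over the labels one pixel position got in several frames.

    Majority voting is an assumed combination rule, not the claimed one.
    """
    if not per_frame_labels:
        raise ValueError("need at least one label")
    # most_common(1) returns the label with the highest count.
    return Counter(per_frame_labels).most_common(1)[0][0]
```

A vote like this would suppress a one-off misclassification, for example when a passing object briefly covers a pixel that normally shows a tree.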

The method of claim 1, wherein the background model includes a plurality of values representing a pixel value of the pixel position, and wherein determining whether the pixel position in the image frame belongs to the background or the foreground of the captured scene comprises:
calculating a difference between the pixel value in the image frame at the pixel position and each of the plurality of values of the background model at the corresponding pixel position,
calculating the number of differences that are smaller than the threshold specific to the pixel position,
determining that the pixel position in the image frame belongs to the background if the calculated number exceeds or equals the first predetermined number of values, and otherwise that the pixel position in the image frame belongs to the foreground.
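The three steps of the claim above map directly onto a few lines of code. The sketch assumes scalar (for example grayscale) pixel values and an absolute difference; the function and parameter names are hypothetical.

```python
def is_background(pixel_value, background_values, threshold, required_matches):
    """Classify one pixel position as background (True) or foreground (False).

    Assumes scalar pixel values and absolute differences.
    """
    # Step 1: difference to each value stored in the background model.
    differences = [abs(pixel_value - value) for value in background_values]
    # Step 2: count the differences below the per-pixel threshold.
    close_count = sum(1 for diff in differences if diff < threshold)
    # Step 3: background if enough model values are close to the pixel.
    return close_count >= required_matches
```

For example, with model values `[98, 103, 150, 99]`, a pixel value of 100, a threshold of 5 and two required matches, three differences fall below the threshold, so the position is treated as background.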

A computer-readable recording medium having recorded thereon a program with instructions adapted to carry out the method of any one of claims 1 to 12 when executed by a device having processing capability.

A device for determining whether pixel positions in an image frame of a video sequence belong to a background or a foreground of a captured scene, the device comprising a processor adapted for:
receiving a classification into which the pixel position has been classified, the classification representing a type of object in the captured scene at the pixel position,
associating the pixel position with a dynamic level based on its corresponding classification, wherein the dynamic level of a classification reflects the likelihood that pixel values at pixel positions belonging to the classification change between frames of the video sequence,
determining whether the pixel position in the image frame belongs to the background or the foreground of the captured scene by comparing the pixel value of the pixel position in the image frame to a background model and to a threshold for the pixel position, the background model including one or more values representing a pixel value of the pixel position, wherein the pixel is determined to belong to the background if the difference between the pixel value and a first predetermined number of the one or more values in the background model is less than the threshold for the pixel position,
if the pixel position in the image frame is determined to belong to the background:
increasing the threshold specific to the pixel position by an increment if the pixel value has changed by more than a second value since a previous frame, wherein the increment is set to depend on the dynamic level of the pixel position such that a higher dynamic level results in a higher increment.

A system comprising:
a video capture device adapted to continuously capture a video sequence depicting a scene,
a first device adapted to receive a first image frame of the video sequence from the video capture device, to classify each pixel position in the image frame, the classification representing a type of object in the captured scene at the pixel position, and to output a classification for each pixel position in the image frame,
a second device according to claim 14, adapted to receive a second image frame of the video sequence from the video capture device and to receive the classification for each pixel position in the first image frame from the first device.