KR102140255B1

KR102140255B1 - Device and method for tracking object in image based on deep learning using rotatable elliptical model

Info

Publication number: KR102140255B1
Application number: KR1020180170829A
Authority: KR
Inventors: 김원준; 김민지
Original assignee: Konkuk University Industry-Academic Cooperation Foundation (건국대학교 산학협력단)
Priority date: 2018-12-27
Filing date: 2018-12-27
Publication date: 2020-07-31
Also published as: KR20200084467A

Abstract

An apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model may include: a model definition unit that defines an elliptical model corresponding to a tracking target object from a previous frame image; a model prediction unit that predicts an elliptical model corresponding to the tracking target object in a current frame image through an artificial neural network that receives characteristics of the previous frame image and the current frame image as input; and an object tracking unit that tracks the tracking target object in the current frame image based on the predicted elliptical model of the current frame image.

Description

DEVICE AND METHOD FOR TRACKING OBJECT IN IMAGE BASED ON DEEP LEARNING USING ROTATABLE ELLIPTICAL MODEL

The present application relates to an apparatus and method for tracking an object in an image based on deep learning using a rotatable elliptical model.

A convolutional neural network (CNN) is an artificial neural network technique that has been widely adopted in image analysis, where it plays a key role in achieving high analysis performance.

Conventional object tracking using an artificial neural network assigns a rectangular model to the object in an image. Because the model is rectangular, background around the object is included in the model, which requires unnecessary computation, and the characteristics of the tracking model cannot adapt to movement or shape deformation of the object. There is therefore a need for a tracking model that can be adaptively optimized according to the shape or size of the object.
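The background penalty of a rectangular model can be quantified with a short calculation. A minimal Python sketch, assuming the rectangle is the tight axis-aligned box around an ellipse with semi-axes a and b rotated by φ (the function name is hypothetical): even an unrotated ellipse fills only π/4 ≈ 78.5% of its box, and a 2:1 ellipse rotated by 45° fills only about 62.8%, so the remainder of the box is background.

```python
import math


def ellipse_fill_ratio(a, b, phi):
    """Fraction of the axis-aligned bounding rectangle covered by a rotated ellipse.

    a, b: semi-axis lengths; phi: rotation angle in radians.
    The remainder (1 - ratio) is background that a rectangular model would include.
    """
    # Half-width and half-height of the tight axis-aligned box of the rotated ellipse
    half_w = math.sqrt((a * math.cos(phi)) ** 2 + (b * math.sin(phi)) ** 2)
    half_h = math.sqrt((a * math.sin(phi)) ** 2 + (b * math.cos(phi)) ** 2)
    return (math.pi * a * b) / (4.0 * half_w * half_h)


print(round(ellipse_fill_ratio(2, 1, 0.0), 3))          # 0.785
print(round(ellipse_fill_ratio(2, 1, math.pi / 4), 3))  # 0.628
```

The more elongated and more rotated the object, the larger the share of background inside an axis-aligned rectangle, which is the situation a rotatable ellipse is meant to avoid.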

The background technology of the present application is disclosed in Korean Patent Publication No. 10-2017-0137350.

The present application is intended to solve the above-described problems of the prior art, and an object thereof is to provide an apparatus and method for tracking an object in an image based on deep learning using a rotatable elliptical model that can be optimized for the movement and shape deformation of the tracking target object.

Another object of the present application, likewise directed at the above-described problems of the prior art, is to provide an apparatus and method for tracking an object in an image based on deep learning using an elliptical model capable of minimizing the difference between the prediction model used for object tracking and the actual object.

However, the technical problems to be achieved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problems, a method for tracking an object in an image based on deep learning using an elliptical model according to an embodiment of the present application may include: (a) defining an elliptical model corresponding to a tracking target object from a previous frame image; (b) predicting an elliptical model corresponding to the tracking target object in a current frame image through an artificial neural network that receives characteristics of the previous frame image and the current frame image as input; and (c) tracking the tracking target object in the current frame image based on the predicted elliptical model of the current frame image.

According to an embodiment of the present application, step (a) may crop the tracking target object region from the previous frame image and define the elliptical model within that region. The elliptical model includes a plurality of vector-based feature domains, which may include the center point, major axis, minor axis, and angle of the ellipse.

According to an embodiment of the present application, step (b) may predict the elliptical model of the current frame image in consideration of a loss function based on the difference between the predicted elliptical model of the current frame image and the elliptical model of the actual tracking target object.

According to an embodiment of the present application, the weights included in the loss function may be determined for each feature domain based on the amount of change in that feature domain of the elliptical model.

According to an embodiment of the present application, in step (b) a feature map is computed for each of the previous frame image and the current frame image through the artificial neural network, and the artificial neural network processes these feature maps through fully connected layers to learn the motion difference of the tracking target object and thereby predicts the elliptical model corresponding to the tracking target object.

An apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application may include: a model definition unit that defines an elliptical model corresponding to a tracking target object from a previous frame image; a model prediction unit that predicts an elliptical model corresponding to the tracking target object in a current frame image through an artificial neural network that receives characteristics of the previous frame image and the current frame image as input; and an object tracking unit that tracks the tracking target object in the current frame image based on the predicted elliptical model of the current frame image.

According to an embodiment of the present application, the model definition unit may crop the tracking target object region from the previous frame image and define the elliptical model within that region. The elliptical model includes a plurality of vector-based feature domains, which may include the center point, major axis, minor axis, and angle of the ellipse.

According to an embodiment of the present application, the model prediction unit may predict the elliptical model of the current frame image in consideration of a loss function based on the difference between the predicted elliptical model of the current frame image and the elliptical model of the actual tracking target object.

According to an embodiment of the present application, the model prediction unit may compute a feature map for each of the previous frame image and the current frame image through the artificial neural network, and the artificial neural network may process these feature maps through fully connected layers to learn the motion difference of the tracking target object and thereby predict the elliptical model corresponding to the tracking target object.

The above-described means for solving the problems are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description.

According to the above-described means for solving the problems, it is possible to provide an apparatus and method for tracking an object in an image based on deep learning using a rotatable elliptical model that can be optimized for the tracking target object.

According to the above-described means for solving the problems, it is possible to provide an apparatus and method for tracking an object in an image based on deep learning using an elliptical model capable of minimizing the difference between the prediction model for object tracking and the actual object.

FIG. 1 is a diagram illustrating the configuration of an apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.
FIG. 2 is a diagram illustrating an example of the rotatable elliptical model of the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.
FIG. 3 is a diagram illustrating the flow for predicting the elliptical model in the current frame image in the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.
FIG. 4 is a diagram illustrating an example of predicting the elliptical model in consideration of the loss function in the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.
FIG. 5 is a diagram illustrating an example of the feature domains of the elliptical model predicted in consideration of the weights in the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.
FIG. 6 is a diagram illustrating object tracking in the current frame image using the predicted elliptical model in the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.

Hereinafter, embodiments of the present application will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can easily carry them out. However, the present application may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted to describe the present application clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them.

Throughout this specification, when a member is said to be positioned "on," "above," "at the top of," "under," "below," or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where yet another member is present between the two.

Throughout this specification, when a part is said to "include" a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram illustrating the configuration of an apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.

Referring to FIG. 1, the apparatus 100 for tracking an object in an image based on deep learning may include a model definition unit 110, a model prediction unit 120, and an object tracking unit 130.

The model definition unit 110 may define an elliptical model corresponding to the tracking target object from the previous frame image. The previous frame image refers to the frame (k-1) immediately preceding the current frame (k) in a time-series image sequence. The model definition unit 110 may crop the tracking target object region from the previous frame image and define the elliptical model within that region. Cropping of the tracking target object region is described below with reference to FIG. 3.

FIG. 2 is a diagram illustrating an example of the rotatable elliptical model of the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.

Referring to FIG. 2, the rotatable elliptical model may include a plurality of vector-based feature domains. For example, the feature domains may include the center point of the ellipse (cx, cy), the position value of the major axis (a), (cx+x₁, cy+y₁), the position value of the minor axis (b), (cx+x₂, cy+y₂), and the angle (φ). The rotatable elliptical model may rotate according to the shape of the tracking target object. In addition, the lengths of the major and minor axes may change adaptively with the shape of the tracking target object, so that the model takes the form best suited to tracking that object.
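A minimal Python sketch of this parameterization, assuming (x₁, y₁) is the offset from the center to the end of the major axis, (x₂, y₂) is the offset to the end of the minor axis, and φ is the angle of the major axis measured from the image x-axis; the class and function names are hypothetical. It converts the vector (cx, cy, x₁, y₁, x₂, y₂) into center/axis/angle form and tests whether a pixel lies inside the rotated ellipse.

```python
import math
from dataclasses import dataclass


@dataclass
class Ellipse:
    cx: float   # center x
    cy: float   # center y
    a: float    # semi-major axis length (assumed meaning of "a")
    b: float    # semi-minor axis length (assumed meaning of "b")
    phi: float  # major-axis angle from the x-axis, radians (assumed)


def ellipse_from_vector(cx, cy, x1, y1, x2, y2):
    """Convert (cx, cy, x1, y1, x2, y2) into center/axes/angle form."""
    a = math.hypot(x1, y1)    # length of the major-axis offset
    b = math.hypot(x2, y2)    # length of the minor-axis offset
    phi = math.atan2(y1, x1)  # direction of the major axis
    return Ellipse(cx, cy, a, b, phi)


def ellipse_to_vector(e):
    """Inverse mapping; places the minor axis 90 degrees from the major axis."""
    x1, y1 = e.a * math.cos(e.phi), e.a * math.sin(e.phi)
    x2, y2 = -e.b * math.sin(e.phi), e.b * math.cos(e.phi)
    return (e.cx, e.cy, x1, y1, x2, y2)


def contains(e, px, py):
    """True if pixel (px, py) lies inside or on the rotated ellipse."""
    dx, dy = px - e.cx, py - e.cy
    # Rotate the point into the ellipse's own axis frame
    u = dx * math.cos(e.phi) + dy * math.sin(e.phi)
    v = -dx * math.sin(e.phi) + dy * math.cos(e.phi)
    return (u / e.a) ** 2 + (v / e.b) ** 2 <= 1.0


e = ellipse_from_vector(50, 40, 20, 0, 0, 10)
print(e.a, e.b, contains(e, 65, 40), contains(e, 50, 55))  # 20.0 10.0 True False
```

Because rotation is carried by φ (or equivalently by the direction of (x₁, y₁)), the same membership test works for any orientation, which a fixed axis-aligned rectangle cannot express.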

FIG. 3 is a diagram illustrating the flow for predicting the elliptical model in the current frame image in the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.

Referring to FIG. 3, the model definition unit 110 may define the elliptical model 10 in the tracking target object region cropped from the previous frame image (k-1). For example, the tracking target object may be selected by the user. The crop may be set as a predetermined region around the center position of the tracking target object and may be performed so that its result follows a distribution such as that of a Laplace random variable; a detailed description is omitted. The model definition unit 110 may also crop a region of the current frame image corresponding to the tracking target object region cropped from the previous frame image. Specifically, the same region may be cropped in the current frame image based on the position of the cropped region in the previous frame image. The region cropped from the current frame image may be used as a candidate region for tracking the object in the current frame image.
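The paired crop can be sketched as follows, assuming a NumPy image array, a window of crop_factor times the object's width and height (FIG. 5 reports a crop factor of 2.5), zero padding at the image border, and Laplace-distributed center jitter for generating training pairs. The jitter scale and all function names are hypothetical; the disclosure does not specify them.

```python
import numpy as np


def crop_window(frame, cx, cy, w, h, crop_factor=2.5):
    """Crop a (crop_factor*h) x (crop_factor*w) window centred on (cx, cy).

    Pixels outside the frame are zero-padded, so the output size does not
    depend on where the object sits. Returns the crop and its top-left
    corner (x0, y0) in frame coordinates.
    """
    cw = int(round(crop_factor * w))
    ch = int(round(crop_factor * h))
    x0 = int(round(cx - cw / 2))
    y0 = int(round(cy - ch / 2))
    out = np.zeros((ch, cw) + frame.shape[2:], dtype=frame.dtype)
    # Intersection of the window with the frame
    fx0, fy0 = max(x0, 0), max(y0, 0)
    fx1 = min(x0 + cw, frame.shape[1])
    fy1 = min(y0 + ch, frame.shape[0])
    if fx1 > fx0 and fy1 > fy0:
        out[fy0 - y0:fy1 - y0, fx0 - x0:fx1 - x0] = frame[fy0:fy1, fx0:fx1]
    return out, (x0, y0)


def crop_pair(prev_frame, cur_frame, cx, cy, w, h, crop_factor=2.5):
    """Crop the same window, located from frame k-1, in frames k-1 and k.

    The first crop holds the tracking target; the second is the candidate
    region in which the target is searched for.
    """
    target, origin = crop_window(prev_frame, cx, cy, w, h, crop_factor)
    candidate, _ = crop_window(cur_frame, cx, cy, w, h, crop_factor)
    return target, candidate, origin


def laplace_jitter(cx, cy, w, h, scale=0.2, rng=None):
    """Training-time augmentation: shift the centre by Laplace-distributed
    offsets proportional to object size (scale value is hypothetical)."""
    rng = np.random.default_rng() if rng is None else rng
    dx, dy = rng.laplace(0.0, scale, size=2)
    return cx + dx * w, cy + dy * h
```

Zero padding keeps the network input size fixed even when the target is near the image border, and reusing the frame k-1 window for frame k is what makes the second crop a candidate region rather than a detection.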

The model prediction unit 120 may predict the elliptical model corresponding to the tracking target object in the current frame image through an artificial neural network that receives characteristics of the previous frame image and the current frame image as input. Specifically, the characteristics of the previous frame image include the features of the tracking target object in the cropped region, including the characteristic values of the elliptical model, and the characteristics of the current frame image include the region cropped from the current frame image. The artificial neural network may be a convolutional neural network, which may consist of a plurality of convolutional layers that extract local characteristics of the input data (for example, features of the tracking target object) and fully connected layers that analyze the extracted characteristics.

The model prediction unit 120 may compute a feature map for each of the previous frame image and the current frame image through the artificial neural network. The artificial neural network may process the two feature maps through fully connected layers to learn the motion difference of the tracking target object. Specifically, because the region cropped from the current frame image may not exactly coincide with the tracking target object region cropped from the previous frame image, the artificial neural network learns from the tracking target object region and the cropped region of the current frame image (in practice, the feature maps of the previous frame image and the current frame image may be input) and thereby recognizes the tracking target object in the current frame image. In addition, because the feature map of the previous frame image reflects the elliptical model, the model prediction unit 120 may compute the feature domains of the elliptical model predicted for the current frame image by linear regression through the fully connected layers of the artificial neural network. For example, the feature domains of the predicted elliptical model may be output as vector coordinates such as (cx, cy, x₁, y₁, x₂, y₂), and the elliptical model 11 of the next frame image (the current frame k relative to frame k-1) may be predicted based on these coordinates.
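A minimal NumPy sketch of the regression head, assuming the two feature maps come from a shared convolutional backbone (not shown) and that a single ReLU hidden layer precedes the 6-value linear output. The layer sizes and class name are hypothetical, and the weights here are randomly initialized rather than trained.

```python
import numpy as np


class EllipseRegressionHead:
    """Fully connected head: concat(feat_prev, feat_cur) -> (cx, cy, x1, y1, x2, y2).

    Layer sizes and the ReLU hidden layer are illustrative assumptions; the
    convolutional backbone that produces the feature maps is not shown.
    """

    def __init__(self, feat_dim, hidden=256, n_out=6, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 2 * feat_dim  # previous-frame + current-frame features
        # He-style random initialisation (untrained weights)
        self.w1 = rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, n_out))
        self.b2 = np.zeros(n_out)

    def __call__(self, feat_prev, feat_cur):
        # Flatten each (C, H, W) feature map and join them side by side
        x = np.concatenate([feat_prev.ravel(), feat_cur.ravel()])
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.w2 + self.b2  # linear regression output, no activation


feat = np.ones((4, 3, 3))  # toy (C, H, W) feature map
head = EllipseRegressionHead(feat_dim=feat.size, hidden=16)
pred = head(feat, feat)    # six ellipse parameters
```

Concatenating both frames' features before the fully connected layers is what lets the head see the displacement between frames k-1 and k; the final layer has no activation because the outputs are unbounded regression targets.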

FIG. 4 is a diagram illustrating an example of predicting the elliptical model in consideration of the loss function in the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.

Because the elliptical model 11 predicted for the current frame image estimates the position of the tracking target object from the motion difference described above, its shape needs to be corrected for more accurate position tracking. Accordingly, when predicting the elliptical model 11 in the current frame image, the model prediction unit 120 may take into account the difference from the actual tracking target object in the current frame image and predict an elliptical model that minimizes that difference. Specifically, referring to FIG. 4, the model prediction unit 120 may predict the elliptical model of the current frame image in consideration of a loss function based on the difference between the predicted elliptical model 11 of the current frame image and the elliptical model 20 of the actual tracking target object.

The model prediction unit 120 may predict the elliptical model of the current frame image in consideration of the loss computed by the loss function of Equation 1.

[Equation 1]

In Equation 1, m denotes the m-th sample (and the number of samples). The remaining terms are the weight of the i-th feature domain, the i-th feature domain of the predicted tracking target object, and the i-th feature domain of the actual tracking target object. The feature domains may include the center point of the ellipse (cx, cy), the major axis (a) position value, the minor axis (b) position value, and the angle (φ). Accordingly, the loss based on Equation 1 may be calculated for each feature domain and then summed, and the weights may likewise be calculated for each feature domain.

For example, the elliptical model 20 of the actual tracking target object may be set by the user.

FIG. 5 is a diagram illustrating an example of the feature domains of the elliptical model predicted in consideration of the weights in the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.

Referring to FIG. 5, the weights included in the loss function may be determined for each feature domain based on the amount of change in that feature domain of the elliptical model. Specifically, the amount of change between the feature domains of the predicted elliptical model and those of the elliptical model of the previous frame image may differ from one feature domain to another. FIG. 5 shows (a) the feature domains of the elliptical model predicted by an artificial neural network configured with a learning rate of 5e-5, a crop factor of 2.5, and a lambda value of 2.8 for the position shift function (Laplace distribution peak of 0.3), and (b) the feature domains of the elliptical model predicted in consideration of the weights.

When the same weight is applied to all feature domains, the amount of change, or the range over which each domain can change, differs from one feature domain to another. For example, referring to FIG. 5(a), the mean of cx, a component of the center point, is about 0.5, whereas the mean of b, a component of the minor axis, is about 0.15. If the loss function is computed with the same weight (for example, 3) for all feature domains and the elliptical model of the current frame image is predicted from it, the predicted elliptical model does not accurately reflect the characteristics of each feature domain. Applying equal weights can therefore yield an inaccurate loss function and, consequently, an inaccurate prediction of the elliptical model. Accordingly, the model prediction unit 120 may adaptively determine the weight for each feature domain based on that domain's characteristics, such as the amount of change or the range of possible change between the feature domain of the elliptical model of the previous frame image and that of the current frame image.

In this way, the weights of the feature domains can be normalized, and the loss function with normalized weights minimizes the difference from the elliptical model of the actual tracking target object, so that a more accurate elliptical model of the current frame image can be predicted.
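One way to set such adaptive weights is sketched below, assuming each weight is inversely proportional to the mean absolute frame-to-frame change of its feature domain and the weights are rescaled to sum to the number of domains. This rule and the function name are assumptions; the disclosure states only that the weights depend on each domain's amount or range of change. In the usage check, a domain that changes by 0.3 on average and one that changes by 0.1 receive weights 0.5 and 1.5, so their weighted contributions become equal (0.15 each).

```python
import numpy as np


def domain_weights(prev_params, cur_params, eps=1e-8):
    """Per-domain loss weights from the observed frame-to-frame change.

    prev_params, cur_params: arrays of shape (M, D) holding the ellipse
    feature domains of frame k-1 and frame k for M training samples.
    Domains that typically change little get larger weights, so every
    domain contributes on a comparable scale. Weights sum to D.
    """
    prev = np.asarray(prev_params, dtype=float)
    cur = np.asarray(cur_params, dtype=float)
    typical = np.abs(cur - prev).mean(axis=0) + eps  # mean |change| per domain
    inv = 1.0 / typical
    return inv * (len(inv) / inv.sum())


prev = np.zeros((2, 2))
cur = np.array([[0.2, 0.1], [0.4, 0.1]])
w = domain_weights(prev, cur)  # approximately [0.5, 1.5]
```

The small eps guards against division by zero for a domain that never changes in the sample set; rescaling to sum to D keeps the overall loss magnitude comparable to the equal-weight case.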

FIG. 6 is a diagram illustrating object tracking in the current frame image using the predicted elliptical model in the apparatus for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.

The object tracking unit 130 may track the tracking target object in the current frame image based on the predicted elliptical model of the current frame image. The object tracking unit 130 may track the target in the current frame and then in each new frame (k+1) subsequently input in time series. FIG. 6 shows an example of object tracking by the apparatus 100 for tracking an object in an image based on deep learning. The white ellipse represents the tracking target object, and the red ellipse represents the elliptical model. Conventional tracking with a rectangular model frequently includes unnecessary background in the model and makes it difficult to rotate the model or adjust its scale. By contrast, the apparatus 100 uses a rotatable elliptical model whose scale adapts to the size and shape of the tracking target object, which can further improve object tracking performance.
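The frame-by-frame loop can be sketched as follows. It assumes a hypothetical predict callable that wraps the cropping, CNN, and fully connected regression described above, and it assumes the network outputs coordinates normalized to the crop window (the cx mean of about 0.5 in FIG. 5(a) is consistent with, but does not confirm, such normalization); the hypothetical to_image_coords helper maps them back to frame pixels.

```python
from typing import Callable, List, Sequence, Tuple

# (cx, cy, x1, y1, x2, y2) in frame pixel coordinates
Vector = Tuple[float, float, float, float, float, float]


def to_image_coords(vec: Vector, origin: Tuple[float, float],
                    crop_w: float, crop_h: float) -> Vector:
    """Map a crop-normalised ellipse vector back to frame pixels.

    Assumes cx, cy are in [0, 1] over the crop window whose top-left corner
    is `origin`, and that axis offsets are scaled by the crop width/height.
    """
    cx, cy, x1, y1, x2, y2 = vec
    x0, y0 = origin
    return (x0 + cx * crop_w, y0 + cy * crop_h,
            x1 * crop_w, y1 * crop_h, x2 * crop_w, y2 * crop_h)


def track(frames: Sequence, init: Vector,
          predict: Callable[[object, object, Vector], Vector]) -> List[Vector]:
    """Track the target through `frames`, starting from `init` in frame 0.

    predict(prev_frame, cur_frame, prev_ellipse) returns the ellipse in
    cur_frame (hypothetical interface standing in for crop + CNN + FC).
    """
    ellipses = [init]
    for k in range(1, len(frames)):
        # (a) the ellipse from frame k-1 locates the crop,
        # (b) the network predicts the ellipse in frame k,
        # (c) the prediction becomes the reference for frame k+1
        ellipses.append(predict(frames[k - 1], frames[k], ellipses[-1]))
    return ellipses


# Usage with a toy predictor that moves the ellipse 2 px right per frame
def shift_right(prev_frame, cur_frame, e):
    return (e[0] + 2.0,) + e[1:]


path = track([None, None, None], (10.0, 10.0, 5.0, 0.0, 0.0, 3.0), shift_right)
print([p[0] for p in path])  # [10.0, 12.0, 14.0]
```

Feeding each prediction back as the next frame's reference is what carries the target forward through frame k+1 and beyond; a real predict would call the crop and regression steps and then to_image_coords.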

FIG. 7 is a diagram illustrating the flow of a method for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application.

The method for tracking an object in an image based on deep learning using a rotatable elliptical model according to an embodiment of the present application, shown in FIG. 7, may be performed by the apparatus 100 for tracking an object in an image based on deep learning described with reference to FIGS. 1 to 6. Therefore, even where omitted below, the description of the apparatus 100 given with reference to FIGS. 1 to 6 applies equally to FIG. 7.

In step S710, the model definition unit 110 may define an elliptical model corresponding to the tracking target object from the previous frame image. The model definition unit 110 may crop the tracking target object region from the previous frame image and define the elliptical model within that region. The rotatable elliptical model may include a plurality of vector-based feature domains. For example, the feature domains may include the center point of the ellipse (cx, cy), the position value of the major axis (a), (cx+x₁, cy+y₁), the position value of the minor axis (b), (cx+x₂, cy+y₂), and the angle (φ). The rotatable elliptical model may rotate according to the shape of the tracking target object. In addition, the lengths of the major and minor axes may change adaptively with the shape of the tracking target object, so that the model takes the form best suited to tracking that object.

The model definition unit 110 may define the elliptical model 10 in the tracked-object region cropped from the previous frame image (k-1). For example, the object to be tracked may be selected by the user. The crop may be set to a predetermined area centered on the position of the tracked object, and it may be performed so that the crop results follow the distribution of a Laplace random variable, so a detailed description is omitted. The model definition unit 110 may also crop a region of the current frame image corresponding to the tracked-object region cropped from the previous frame image. Specifically, the same region may be cropped from the current frame image based on the location of the region cropped from the previous frame image. The region cropped from the current frame image may serve as a candidate region for tracking the object in the current frame image.
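Below is a pure-Python sketch of cropping the same box from both frames. Images are plain lists of rows. The `context` enlargement factor of 2.0 is a hypothetical choice. The source does not spell out how the Laplace-distributed crop is realized. This sketch reads it as an optional Laplace jitter of the box center, as is sometimes done when generating training pairs. The jitter uses the fact that the difference of two i.i.d. exponentials with mean `scale` is Laplace(0, `scale`).

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


def crop_box(center: Tuple[float, float], size: Tuple[float, float],
             context: float = 2.0) -> Box:
    """Box centered on the object, enlarged by a (hypothetical) context factor."""
    cx, cy = center
    w, h = size[0] * context, size[1] * context
    return (round(cx - w / 2), round(cy - h / 2), round(cx + w / 2), round(cy + h / 2))


def crop(image: Image, box: Box, fill: int = 0) -> Image:
    """Crop `box` from `image`, padding with `fill` outside the image borders."""
    x0, y0, x1, y1 = box
    height, width = len(image), len(image[0])
    return [[image[y][x] if 0 <= y < height and 0 <= x < width else fill
             for x in range(x0, x1)]
            for y in range(y0, y1)]


def laplace(scale: float, rng: random.Random) -> float:
    # Difference of two i.i.d. Exp(mean=scale) samples is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def crop_pair(prev: Image, curr: Image, center: Tuple[float, float],
              size: Tuple[float, float], jitter: float = 0.0,
              rng: Optional[random.Random] = None) -> Tuple[Image, Image, Box]:
    """Crop the same box from the previous and current frames.

    If jitter > 0, the box center is shifted by Laplace noise scaled to the object size.
    """
    rng = rng or random.Random(0)
    if jitter > 0:
        center = (center[0] + laplace(jitter * size[0], rng),
                  center[1] + laplace(jitter * size[1], rng))
    box = crop_box(center, size)
    return crop(prev, box), crop(curr, box), box
```

The current-frame crop reuses the previous frame's box, so the object may sit off-center in it. The network in step S720 is then expected to recover that offset.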

In step S720, the model prediction unit 120 may predict the elliptical model corresponding to the tracked object in the current frame image through an artificial neural network that takes the characteristics of the previous frame image and the current frame image as inputs. Specifically, the characteristics of the previous frame image include the features of the tracked object in the cropped region containing the elliptical model. The characteristics of the current frame image include the region cropped from the current frame image. The artificial neural network may be a convolutional neural network (CNN). The CNN may consist of a plurality of convolutional layers that extract local characteristics of the input data (e.g., features of the tracked object) and fully connected layers that analyze the extracted characteristics.
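To illustrate what the convolutional layers compute, here is a toy, single-channel feature extractor in pure Python. It stacks "valid" convolutions with ReLU, followed by 2x2 max pooling. The source does not specify the actual layer configuration. A practical tracker would use multi-channel layers, typically from a pretrained backbone. All function names here are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, k: Matrix, bias: float = 0.0) -> Matrix:
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + u][j + v] * k[u][v] for u in range(kh) for v in range(kw)) + bias
             for j in range(out_w)]
            for i in range(out_h)]


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, val) for val in row] for row in x]


def max_pool2(x: Matrix) -> Matrix:
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]


def extract_features(x: Matrix, kernels: List[Matrix]) -> List[float]:
    """Stacked conv + ReLU layers, then pooling, flattened to a feature vector."""
    for k in kernels:
        x = relu(conv2d(x, k))
    x = max_pool2(x)
    return [v for row in x for v in row]
```

Each convolution only looks at a small neighborhood of the layer below, which is why these layers capture local characteristics. The fully connected layers that follow combine those local responses globally.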

The model prediction unit 120 may compute a feature map for each of the previous frame image and the current frame image through the artificial neural network. The network can learn the motion difference of the tracked object by passing the two feature maps through fully connected layers. The region cropped from the current frame image may not coincide exactly with the tracked-object region cropped from the previous frame image. For this reason, the network is trained with the tracked-object region and the cropped region of the current frame image as inputs (in practice, the feature maps of the previous and current frame images may be fed in), which lets it recognize the tracked object in the current frame image. In addition, because the feature map of the previous frame image already reflects the elliptical model, the model prediction unit 120 can compute the feature domains of the predicted elliptical model in the current frame image by linear regression through the fully connected layers. For example, the feature domains of the predicted elliptical model may be output as vector coordinates such as (cx, cy, x₁, y₁, x₂, y₂), and the elliptical model 11 of the next frame image can be predicted from these coordinates.
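The regression head described above can be sketched as follows: the two feature vectors are concatenated and passed through one linear (fully connected) layer with six outputs (cx, cy, x₁, y₁, x₂, y₂). This is a minimal illustration that assumes a single linear layer. The source does not say how many fully connected layers are used or how large they are. `fc_regress` is a hypothetical name.

```python
from typing import List, Sequence


def fc_regress(prev_feat: Sequence[float], curr_feat: Sequence[float],
               weights: List[List[float]], bias: List[float]) -> List[float]:
    """Concatenate both feature vectors and apply one linear layer: y = W [f_prev; f_curr] + b.

    Returns the predicted feature domains (cx, cy, x1, y1, x2, y2).
    `weights` must be 6 x (len(prev_feat) + len(curr_feat)); `bias` has length 6.
    """
    joint = list(prev_feat) + list(curr_feat)
    if len(weights) != 6 or len(bias) != 6 or any(len(row) != len(joint) for row in weights):
        raise ValueError("weights must be 6 x (len(prev_feat) + len(curr_feat)), bias length 6")
    return [sum(w * f for w, f in zip(row, joint)) + b for row, b in zip(weights, bias)]
```

Concatenation lets each output draw on both frames at once. A weight row can therefore encode a difference between previous-frame and current-frame features, which is the motion cue the text describes.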

In addition, the model prediction unit 120 may predict the elliptical model of the current frame image by taking into account a loss function based on the difference between the predicted elliptical model 11 of the current frame image and the elliptical model 20 of the actual tracked object.

The model prediction unit 120 may predict the elliptical model of the current frame image by taking into account the loss computed with the loss function of Equation 2.

[Equation 2]

Here, m denotes the m-th sample. The remaining terms of Equation 2 denote, respectively, the weight of the i-th feature domain, the i-th feature domain of the predicted tracked object, and the i-th feature domain of the actual tracked object. The feature domains may include the center point of the ellipse (cx, cy), the major-axis (a) position, the minor-axis (b) position, and the angle φ. Accordingly, the loss based on Equation 2 may be computed for each feature domain, and the weight may likewise be computed for each feature domain.
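Equation 2 itself is not reproduced in this text. The sketch below therefore assumes one plausible form that matches the listed terms: a weighted absolute error, averaged over M samples and summed over the feature domains. A squared or smooth-L1 error would fit the same description equally well. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_ellipse_loss(pred: Sequence[Sequence[float]],
                          target: Sequence[Sequence[float]],
                          weights: Sequence[float]) -> Tuple[float, List[float]]:
    """Assumed form: L = (1/M) * sum_m sum_i w_i * |pred[m][i] - target[m][i]|.

    pred, target: M samples x D feature domains (e.g. D = 6 for cx, cy, x1, y1, x2, y2).
    Returns (total loss, per-domain losses), since the loss is tracked per feature domain.
    """
    m_count = len(pred)
    per_domain = [0.0] * len(weights)
    for p_row, t_row in zip(pred, target):
        for i, (p, t) in enumerate(zip(p_row, t_row)):
            per_domain[i] += weights[i] * abs(p - t) / m_count
    return sum(per_domain), per_domain
```

Returning the per-domain terms separately makes it easy to check whether one domain (for example, the center in pixels) dominates the total. This is exactly the imbalance the per-domain weights are meant to correct.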

For example, the elliptical model 20 of the actual tracked object may be set by the user. The weights in the loss function may be determined for each feature domain based on how much that feature domain of the elliptical model changes. Specifically, the model prediction unit 120 may determine the weight of each feature domain from the change in that domain between the elliptical model of the previous frame image and the elliptical model of the current frame image. This normalizes the weights across feature domains. Training with the normalized weighted loss minimizes the difference from the elliptical model of the actual tracked object, so the elliptical model of the current frame image can be predicted more accurately.
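The text fixes neither the rule that maps change amounts to weights nor the normalization used. The following sketch assumes one reasonable rule. Each domain's weight is inversely proportional to its typical scale, and the weights are normalized to sum to 1. By default, the scale is the mean absolute change between previous and current frames. Optionally, a known changeable range per domain can be used instead (the claims mention both). With this rule, domains measured in pixels and the angle in radians contribute comparably. `adaptive_weights` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def adaptive_weights(prev: Sequence[Sequence[float]],
                     curr: Sequence[Sequence[float]],
                     ranges: Optional[Sequence[float]] = None,
                     eps: float = 1e-6) -> List[float]:
    """Hypothetical rule: w_i proportional to 1 / scale_i, normalized so sum(w) == 1.

    prev, curr: N samples x D feature domains of the previous/current elliptical models.
    scale_i is the given changeable range if `ranges` is provided, otherwise the
    mean absolute change of domain i between the two frames.
    """
    dims = len(prev[0])
    if ranges is not None:
        scale = [float(r) for r in ranges]
    else:
        n = len(prev)
        scale = [sum(abs(c[i] - p[i]) for p, c in zip(prev, curr)) / n for i in range(dims)]
    raw = [1.0 / (s + eps) for s in scale]
    total = sum(raw)
    return [r / total for r in raw]
```

The resulting list can be passed as the per-domain weights of a weighted loss such as the one assumed for Equation 2.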

In step S730, the object tracking unit 130 may track the object in the current frame image based on the predicted elliptical model of the current frame image. The object tracking unit 130 may track the object in the current frame and then in each new frame (k+1) that subsequently arrives in the time series.
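The frame-to-frame recurrence of steps S710 to S730 can be sketched as a simple loop. The ellipse predicted for frame k becomes the reference ellipse when frame k+1 arrives. Here `predict` is a hypothetical stand-in for the model prediction unit 120. Any callable with this signature works, for example one that combines cropping, feature extraction, and the regression head.

```python
from typing import Any, Callable, List, Sequence, Tuple

Ellipse = Tuple[float, float, float, float, float, float]  # (cx, cy, x1, y1, x2, y2)


def track(frames: Sequence[Any], init: Ellipse,
          predict: Callable[[Any, Any, Ellipse], Ellipse]) -> List[Ellipse]:
    """Track an object through `frames`, starting from a user-given ellipse in frames[0].

    predict(prev_frame, curr_frame, prev_ellipse) -> ellipse in curr_frame (hypothetical).
    """
    ellipses = [init]
    for k in range(1, len(frames)):
        # The previous prediction defines the reference model for the next frame.
        ellipses.append(predict(frames[k - 1], frames[k], ellipses[-1]))
    return ellipses
```

Because each prediction depends only on the previous frame, the previous ellipse, and the current frame, the tracker runs online: each new frame can be processed as soon as it arrives.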

The deep learning-based method for tracking an object in an image using a rotatable elliptical model according to an embodiment of the present application may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The foregoing description of the present application is illustrative. Those of ordinary skill in the art to which the present application pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above. All changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

10: Elliptical model of the previous frame image
11: Elliptical model of the next frame image
20: Elliptical model of the actual tracked object
100: Deep learning-based apparatus for tracking an object in an image
110: Model definition unit
120: Model prediction unit
130: Object tracking unit

Claims

A deep learning-based method for tracking an object in an image using a rotatable elliptical model, the method comprising:
(a) defining an elliptical model corresponding to an object to be tracked from a previous frame image;
(b) predicting an elliptical model corresponding to the object to be tracked in a current frame image through an artificial neural network that takes characteristics of the previous frame image and the current frame image as inputs; and
(c) tracking the object to be tracked in the current frame image based on the predicted elliptical model of the current frame image,
wherein the elliptical model includes vector-based feature domains,
the feature domains include a center point, a major axis, a minor axis, and an angle of the ellipse, and
step (b) comprises:
predicting the elliptical model of the current frame image in consideration of a loss function according to the difference between the predicted elliptical model of the current frame image and the elliptical model of the actual tracked object;
adaptively determining a weight for each feature domain based on a characteristic of that feature domain according to at least one of the amount of change and the changeable range of each feature domain between the elliptical model of the previous frame image and the elliptical model of the current frame image; and
normalizing the per-feature-domain weights and applying them to the loss function to predict the elliptical model of the current frame image.

The deep learning-based method for tracking an object in an image of claim 1,
wherein step (a) comprises
cropping a tracked-object region from the previous frame image and defining the elliptical model in the tracked-object region.

(Deleted)

The deep learning-based method for tracking an object in an image of claim 1,
wherein step (b) comprises
computing a feature map of each of the previous frame image and the current frame image through the artificial neural network,
wherein the artificial neural network processes each feature map through a fully connected layer to learn the motion difference of the object to be tracked and thereby predicts the elliptical model corresponding to the object to be tracked.

A deep learning-based apparatus for tracking an object in an image using a rotatable elliptical model, the apparatus comprising:
a model definition unit configured to define an elliptical model corresponding to an object to be tracked from a previous frame image;
a model prediction unit configured to predict an elliptical model corresponding to the object to be tracked in a current frame image through an artificial neural network that takes characteristics of the previous frame image and the current frame image as inputs; and
an object tracking unit configured to track the object to be tracked in the current frame image based on the predicted elliptical model of the current frame image,
wherein the elliptical model includes vector-based feature domains,
the feature domains include a center point, a major axis, a minor axis, and an angle of the ellipse, and
the model prediction unit:
predicts the elliptical model of the current frame image in consideration of a loss function according to the difference between the predicted elliptical model of the current frame image and the elliptical model of the actual tracked object;
adaptively determines a weight for each feature domain based on a characteristic of that feature domain according to at least one of the amount of change and the changeable range of each feature domain between the elliptical model of the previous frame image and the elliptical model of the current frame image; and
normalizes the per-feature-domain weights and applies them to the loss function to predict the elliptical model of the current frame image.

The deep learning-based apparatus for tracking an object in an image of claim 6,
wherein the model definition unit
crops a tracked-object region from the previous frame image and defines the elliptical model in the tracked-object region.

(Deleted)

The deep learning-based apparatus for tracking an object in an image of claim 6,
wherein the model prediction unit
computes a feature map of each of the previous frame image and the current frame image through the artificial neural network,
wherein the artificial neural network processes each feature map through a fully connected layer to learn the motion difference of the object to be tracked and thereby predicts the elliptical model corresponding to the object to be tracked.

A computer-readable recording medium on which is recorded a program for executing the method of any one of claims 1, 2, and 5 on a computer.