KR101027157B1

KR101027157B1 - clothing classification using the Support Vector Machine

Info

Publication number: KR101027157B1
Application number: KR1020080103128A
Authority: KR
Inventors: 최유주; 김구진; 박선미
Original assignee: 서울벤처정보대학원대학교 산학협력단
Priority date: 2008-10-21
Filing date: 2008-10-21
Publication date: 2011-04-05
Also published as: KR20100043883A

Abstract

The present invention relates to a clothing classification method using a maximum margin classifier. More particularly, it relates to a method that detects a person's face region in an image input from a camera, determines a clothing region based on the detected face region, and then uses a maximum margin classifier (support vector machine; SVM) to determine what type of clothing the person is wearing, so that the method can be used as a base technology for intelligent surveillance systems.

The clothing classification method of the present invention comprises: (a) subtracting a background image from an image input in real time; (b) calculating the ratio of foreground pixels to all pixels of the entire image subtracted in step (a) and the ratio of skin pixels to the foreground pixels; (c) dividing the entire image subtracted in step (a) into grid patches of a predetermined size and then calculating, for each grid patch, the ratio of foreground pixels to all pixels in the patch and the ratio of skin pixels to foreground pixels in the patch; (d) classifying each grid patch as a foreground grid, a skin grid, or a background grid according to the result of comparing the foreground-pixel ratio and the skin-pixel ratio calculated in step (c) with thresholds that are adaptively determined from the ratios calculated in step (b); (e) performing labeling on the basis of the grid patches classified in step (d) and extracting skin regions of interest as bounding boxes; (f) finding a face region among the skin regions of interest extracted in step (e) using the AdaBoost algorithm, and then setting a region of a predetermined size located below the found face region as the clothing region; (g) constructing a feature vector for the clothing region by applying a Sobel mask to the clothing region determined in step (f); and (h) determining the type of clothing by applying a maximum margin classifier to the feature vector constructed in step (g).

Clothing, classification, discrimination, grid, support vector machine, Sobel mask

Description

Clothing classification method using a maximum margin classifier {Clothing classification using the Support Vector Machine}

Vision-based interfaces are attracting considerable attention as one of the HCI (Human Computer Interaction) technologies for intuitive interaction between humans and computers. A vision-based approach lets the user interact easily with a computer through natural motions without wearing any external equipment. Because a vision-based approach mainly uses only images input from several cameras, a cost-effective interface system can be built.

Such vision-based interaction can be broadly divided into appearance-based approaches and feature-based approaches. An appearance-based approach analyzes the user's silhouette and distinguishes meaningful actions according to the various silhouettes of the user. A feature-based approach focuses on feature points or feature regions, such as the head and hands, that are established in advance for use in defining an action model.

Because a feature-based approach is faster than an appearance-based approach, it is suitable for real-time implementation. However, the robustness of gesture recognition then depends heavily on the accuracy of the feature points or feature regions, and extracting accurate features is itself a difficult problem.

Meanwhile, as various social problems have arisen and security has become a serious concern, surveillance systems using monitoring technology have come into the spotlight. Video-based surveillance systems are already widely used, but because the system operator watches the video with the naked eye, they have the drawback of requiring manpower and time. An intelligent surveillance system is efficient because it can automatically track and monitor, in real time, a person who has entered a building without authorization, making it possible to prevent various incidents in advance.

A video-based intelligent surveillance system requires technology that processes information about images of people captured by a camera or video and tracks a specific person in real time. In particular, a multi-camera system necessarily requires recognition technology that compares several images of people captured at different positions and determines whether they show the same person. Such recognition requires a base technology that extracts and analyzes the features of the person contained in an image.

The visible features of a person can be divided into physical features, such as the face or physique, and features of the clothing worn. Face recognition has been studied extensively to date (see W. Zhao, R. Chellappa and P. J. Phillips, "Face Recognition: A literature survey", ACM Computing Survey, 35(4), 2003, pp. 399-458.), but face recognition is often impossible, for example when facial features are unclear because the image was captured from a distance, or when the features are hidden because the person is wearing a hat or because of the hairstyle. To determine accurately whether two images show the same person, other kinds of feature information must therefore be used.

Apart from facial feature information, the feature information that can be used most usefully for person identification is that of the clothing. In general, because clothing designs are so varied, it is not easy to find consistent patterns in clothing recognition, and the features are difficult to analyze. Clothing recognition therefore needs to subdivide the problem and extract and recognize features according to the type of clothing.

Accordingly, the applicant arrived at the present invention while researching, using the Support Vector Machine, a maximum margin classifier (see C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Vol.2, pp.121-167, 1998.), a technique for determining whether the clothing contained in a given image is a suit consisting of a dress shirt, a necktie, and a jacket, and a technique for classifying the patterns or decorations of the clothing contained in a given image.


The present invention has been devised to solve the problems described above. Its object is to provide a clothing classification method using a maximum margin classifier that detects a person's face region in an image input from a camera, determines a clothing region based on the detected face region, and then uses a maximum margin classifier (support vector machine; SVM) to determine what type of clothing the person is wearing.

To achieve the above object, the clothing classification method of the present invention is performed by a computer device and comprises: (a) subtracting a background image from an image input in real time; (b) calculating the ratio of foreground pixels to all pixels of the entire image subtracted in step (a) and the ratio of skin pixels to the foreground pixels; (c) dividing the entire image subtracted in step (a) into grid patches of a predetermined size and then calculating, for each grid patch, the ratio of foreground pixels to all pixels in the patch and the ratio of skin pixels to foreground pixels in the patch; (d) classifying each grid patch as a foreground grid, a skin grid, or a background grid according to the result of comparing the foreground-pixel ratio and the skin-pixel ratio calculated in step (c) with thresholds that are adaptively determined from the ratios calculated in step (b); (e) performing labeling on the basis of the grid patches classified in step (d) and extracting skin regions of interest as bounding boxes; (f) finding a face region among the skin regions of interest extracted in step (e) using the AdaBoost algorithm, and then setting a region of a predetermined size located below the found face region as the clothing region; (g) constructing a feature vector for the clothing region by applying a Sobel mask to the clothing region determined in step (f); and (h) determining the type of clothing by applying a maximum margin classifier to the feature vector constructed in step (g).

In the above configuration, step (g) may calculate, for the clothing region determined in step (f), an edge histogram accumulated in the horizontal direction and an edge histogram accumulated in the vertical direction, and then combine the calculated edge histograms to construct the feature vector for the clothing region, in which case step (h) determines whether or not the clothing is a suit.
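As a rough illustration of this suit feature, the following sketch builds the combined vector from an edge-magnitude image that has already been computed with a Sobel mask. The function name is hypothetical, and reading "accumulated in the horizontal direction" as a sum along each row (and "vertical" as a sum along each column) is an interpretation rather than something the text states explicitly.

```python
import numpy as np

def edge_projection_features(edge: np.ndarray) -> np.ndarray:
    """Concatenate row-wise and column-wise sums of an edge-magnitude image."""
    edge = np.asarray(edge, dtype=np.float64)
    horizontal = edge.sum(axis=1)  # accumulate along each row: one bin per row
    vertical = edge.sum(axis=0)    # accumulate along each column: one bin per column
    return np.concatenate([horizontal, vertical])
```

For a clothing region of H rows and W columns the resulting vector has H + W entries, so the region would need to be resized to a fixed size before the vectors of different images can be compared.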

Alternatively, step (g) may comprise: (g-1) converting the clothing region determined in step (f) to grayscale; (g-2) calculating an edge image by applying a Sobel mask to the clothing region converted to grayscale in step (g-1); (g-3) dividing the edge image calculated in step (g-2) into grid patches of a predetermined size; (g-4) calculating an edge feature value for each grid patch by summing the Sobel edge values within the patch and dividing the sum by the total number of pixels in the patch; and (g-5) constructing the feature vector by listing the edge feature values of the grid patches calculated in step (g-4), in which case step (h) determines the pattern of the clothing.
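Steps (g-1) to (g-5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the luma weights used for the grayscale conversion, the use of |Gx| + |Gy| as the edge value, the edge-replicated border, and the 8-pixel patch size are all choices made here for the example.

```python
import numpy as np

def sobel_magnitude(gray: np.ndarray) -> np.ndarray:
    """Sobel edge value |Gx| + |Gy| with edge-replicated borders (assumed)."""
    g = np.pad(np.asarray(gray, dtype=np.float64), 1, mode="edge")
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[1:-1, :-2] - g[2:, :-2])
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[:-2, 1:-1] - g[:-2, 2:])
    return np.abs(gx) + np.abs(gy)

def grid_edge_features(rgb: np.ndarray, patch: int = 8) -> np.ndarray:
    """Per-patch mean Sobel edge value of an H x W x 3 clothing region."""
    # (g-1) grayscale conversion; these luma weights are an assumption
    gray = rgb[..., :3] @ np.array([0.299, 0.587, 0.114])
    # (g-2) edge image from the Sobel mask
    edge = sobel_magnitude(gray)
    # (g-3) crop to a multiple of the patch size and split into patches
    rows, cols = edge.shape[0] // patch, edge.shape[1] // patch
    blocks = edge[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch)
    # (g-4) sum of edge values in each patch divided by its pixel count
    means = blocks.sum(axis=(1, 3)) / (patch * patch)
    # (g-5) list the patch values in row-major order
    return means.ravel()
```

A uniformly colored region yields a vector of zeros, while striped or checked fabric produces large values in the patches that contain the pattern edges.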

According to the clothing classification method using a maximum margin classifier of the present invention, it is possible to determine, from images input in real time, whether the clothing is a suit and what form the pattern or decoration of the clothing takes. The method can accordingly be used as a base technology for identifying the same person across multi-camera images that are captured at different viewing angles by cameras installed at different positions in the intelligent surveillance systems that have recently been attracting interest.

Hereinafter, a clothing classification method using a maximum margin classifier according to a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

To extract a person's clothing region from an image, the person's face region must first be detected. Then, at a position a certain distance below the face region, a region extended horizontally and vertically in proportion to the size of the face region is defined as the person's clothing region. Texture information is then extracted from the clothing region. In the present invention, to reduce the influence of image noise and increase processing efficiency when extracting the person region, a method of defining a grid image (see Y. Choi, K. Kim, W. Cho, "Grid-based Approach for Detecting Head and Hand Regions", ICIC 2007, CCIS 2, 2007, pp. 1126-1132.) and a method of finding the face and clothing regions are applied.

FIG. 1 is a flowchart illustrating a clothing classification method using a maximum margin classifier according to an embodiment of the present invention.

As shown in FIG. 1, step S10 is a preprocessing step (and is drawn as a dotted block for that reason): a background image of the place where the human body will be located is captured, and background modeling is then performed by computing, for example, the per-pixel mean and standard deviation of the captured background image. The background modeling is performed using, for example, the HSI color model.

The HSI color model consists of hue, saturation, and intensity components. Because the hue component expresses the specific value of the color itself, it provides a stable value regardless of changes in illumination. Saturation expresses the purity of the color. For this preprocessing, that is, to obtain the background image in the HSI color space, a Gaussian background model of the hue and saturation values of the background image is built through a training process lasting a predetermined time, for example a time corresponding to about 50 frames.

Next, in step S20, the background image constructed in step S10 is subtracted from a continuously input series of images that contain a human body in front of that background, for example images input in real time from a camera. If the difference value at a pixel is greater than a threshold, that pixel is determined to be a foreground pixel.
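Steps S10 and S20 can be sketched roughly as below. The function names, the k-sigma form of the threshold, and the `min_std` floor are assumptions introduced for illustration; the text only states that a per-pixel mean and standard deviation are computed and that a pixel whose difference exceeds a threshold is foreground. The sketch also ignores the fact that hue is an angular quantity.

```python
import numpy as np

def fit_background(frames: np.ndarray):
    """Per-pixel mean and standard deviation over training frames.

    frames has shape (T, H, W, C), e.g. about 50 frames with C = 2
    channels holding hue and saturation.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return frames.mean(axis=0), frames.std(axis=0)

def foreground_mask(frame, mean, std, k=3.0, min_std=1.0):
    """Mark a pixel as foreground if any channel deviates by more than k sigma."""
    diff = np.abs(np.asarray(frame, dtype=np.float64) - mean)
    threshold = k * np.maximum(std, min_std)  # hypothetical threshold rule
    return (diff > threshold).any(axis=-1)
```

The `min_std` floor keeps perfectly static background pixels, whose measured standard deviation is zero, from being flagged by sensor noise alone.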

Specifically, background subtraction is applied to separate the background image from the region of the person moving in the foreground. Applying background subtraction yields a foreground image. The RGB (red-green-blue) value of each pixel belonging to the foreground image is converted to an HSV (hue-saturation-value) value, and the HSV value is used to classify each pixel as skin color or non-skin color. When the hue (H) is defined on the interval from 0 to 360 and the saturation (S) and value (V) on the interval from 0 to 255, the range of HSV values that belong to skin color can be obtained through the calculation of Equation 1.
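The conversion to the value ranges stated above (H from 0 to 360, S and V from 0 to 255) can be sketched with Python's standard `colorsys` module. The function names are hypothetical. Because the numeric bounds of Equation 1 are not reproduced in this text, the skin range is left as a caller-supplied parameter rather than guessed.

```python
import colorsys

def rgb_to_hsv_scaled(r: int, g: int, b: int):
    """Convert 8-bit RGB to (H in [0, 360), S in [0, 255], V in [0, 255])."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s * 255.0, v * 255.0

def in_hsv_range(hsv, h_range, s_range, v_range) -> bool:
    """True if each HSV component lies inside its (low, high) range.

    The ranges stand in for the skin-color bounds of Equation 1, which
    must be supplied by the caller.
    """
    return all(lo <= x <= hi for x, (lo, hi) in zip(hsv, (h_range, s_range, v_range)))
```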

After each pixel of the foreground image has been judged to be skin color or non-skin color, step S30 calculates the ratio of foreground pixels and the ratio of skin pixels within the entire image obtained by background subtraction. Step S40 then applies thresholds adapted to the ratios calculated in step S30 and defines each grid patch of a predetermined size, for example 8*8 pixels, as a background grid, a skin grid, or a general foreground grid other than skin.

To describe this in more detail, the image obtained by background subtraction is partitioned into grid patches of a predetermined size, for example 8*8 pixels, and the numbers of foreground pixels and skin pixels contained in each grid patch are then counted. For the 64 pixels contained in a grid patch, the ratio of foreground pixels to all pixels and the ratio of skin pixels to foreground pixels are used as the feature values of that grid patch; these are calculated by Equations 2 and 3.

Each grid patch is labeled as one of a background part, a non-skin region, or a skin region on the basis of the two patch feature values (the foreground-pixel ratio and the skin-pixel ratio of the patch). A patch whose foreground-pixel ratio is 0.45 or more is defined as a foreground grid; otherwise it is defined as a background grid. Among the foreground grids, a grid whose skin-pixel ratio is 0.40 or more is classified as a skin grid, and the remaining grids as non-skin grids.
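The patch features and the fixed-threshold labeling described above can be sketched as follows, taking the two ratios as "foreground pixels / all pixels in the patch" and "skin pixels / foreground pixels in the patch" as the text defines them. The function names, the label constants, and the convention that the skin ratio is 0 for a patch with no foreground pixels are assumptions made for this example.

```python
import numpy as np

BACKGROUND, NON_SKIN, SKIN = 0, 1, 2  # hypothetical label codes

def patch_ratios(fg: np.ndarray, skin: np.ndarray, patch: int = 8):
    """Per-patch foreground ratio and skin ratio from boolean pixel masks."""
    rows, cols = fg.shape[0] // patch, fg.shape[1] // patch

    def count(mask):
        m = mask[:rows * patch, :cols * patch].astype(np.int64)
        return m.reshape(rows, patch, cols, patch).sum(axis=(1, 3))

    n_fg = count(fg)
    n_skin = count(skin & fg)  # skin pixels are counted only inside the foreground
    fg_ratio = n_fg / float(patch * patch)
    skin_ratio = np.divide(n_skin, n_fg, out=np.zeros(n_fg.shape), where=n_fg > 0)
    return fg_ratio, skin_ratio

def label_patches(fg_ratio, skin_ratio, t_fg=0.45, t_skin=0.40):
    """Label each patch as background, non-skin foreground, or skin."""
    labels = np.full(fg_ratio.shape, BACKGROUND, dtype=np.uint8)
    is_fg = fg_ratio >= t_fg
    labels[is_fg] = NON_SKIN
    labels[is_fg & (skin_ratio >= t_skin)] = SKIN
    return labels
```

Passing the thresholds as parameters means the same labeling function can be reused with thresholds other than the fixed 0.45 and 0.40.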

The grid image, in which each grid patch has been classified into one of the three regions, has a resolution reduced to 1/64 of that of the original image. Using the grid image reduces the errors that image noise can cause when extracting blobs for skin regions, and it increases processing efficiency. FIG. 2 is an example picture showing a grid image with a resolution of 81×54, in which skin grids are shown in pink, non-skin grids in blue, and background grids as transparent.

Meanwhile, to classify grid patches accurately regardless of the distance between the camera and the human body, appropriate thresholds must be determined for the two patch feature values (the foreground-pixel ratio and the skin-pixel ratio), so histograms are analyzed to obtain two appropriate values for patch classification. First, the ratio of foreground pixels to all pixels of the entire image obtained by background subtraction and the ratio of skin pixels to foreground pixels in that image are calculated as in Equations 4 and 5, in order to determine the thresholds that serve as the criteria for classifying each grid patch.

FIG. 3 shows two example images of a human body captured at different distances from the camera. In images (a) and (b), the ratio of foreground pixels to all pixels is 10.73% and 20.2% respectively, and the ratio of skin pixels to foreground pixels likewise differs, at 3.5% and 11.2% respectively. Accordingly, even if the two patch feature values of an arbitrary grid patch are identical in images (a) and (b) of FIG. 3, the thresholds used as the criteria for classifying that grid patch are preferably set differently for images (a) and (b).

To this end, histograms are generated for the two patch feature values. That is, the foreground-pixel ratio is obtained for every grid patch, and in its histogram the number of grid patches satisfying each condition is normalized by the total number of grids into which the image is divided. The ratio of skin pixels to foreground pixels is obtained only for the cells in which foreground pixels exist, and in its histogram the number of patches satisfying each condition is normalized by the number of cells containing at least one foreground pixel.
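The two normalizations can be sketched with `numpy.histogram` as below. The inputs are the per-patch foreground ratios and skin ratios; the function name and the choice of 10 bins over [0, 1] are assumptions, since the text does not state a bin count.

```python
import numpy as np

def ratio_histograms(fg_ratio, skin_ratio, bins=10):
    """Normalized histograms of the two per-patch feature values."""
    fg_ratio = np.asarray(fg_ratio, dtype=np.float64).ravel()
    skin_ratio = np.asarray(skin_ratio, dtype=np.float64).ravel()
    has_fg = fg_ratio > 0  # patches containing at least one foreground pixel

    fg_counts, edges = np.histogram(fg_ratio, bins=bins, range=(0.0, 1.0))
    skin_counts, _ = np.histogram(skin_ratio[has_fg], bins=bins, range=(0.0, 1.0))

    # Foreground histogram: normalized by the total number of patches.
    fg_hist = fg_counts / fg_ratio.size
    # Skin histogram: normalized by the number of patches that contain foreground.
    n_fg_patches = int(has_fg.sum())
    skin_hist = skin_counts / n_fg_patches if n_fg_patches else np.zeros(bins)
    return fg_hist, skin_hist, edges
```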

Meanwhile, the thresholds for the two patch feature values, that is, for the foreground-pixel ratio and the skin-pixel ratio, are defined as in Equations 6 and 7.

The quantities appearing in Equations 6 and 7 are defined by expressions given as formula images in the original publication.

Each grid patch is then classified, on the basis of the two thresholds obtained by Equations 6 and 7, into one of three groups, namely a foreground grid, a skin grid, or a background grid, as in Equation 8.

FIG. 4 shows the histograms for the images of FIG. 3. In FIG. 4, the black bars indicate the thresholds for determining foreground grids and skin grids for the distant object (human body) and the near object (human body) respectively. For foreground grids, the threshold is set relatively lower for the distant object than for the near object, whereas for skin grids the threshold is set relatively lower for the near object than for the distant object.

FIG. 5 shows the grid patch images with a resolution of 81*54 converted from the images of FIG. 3, in which foreground regions are shown in blue and skin regions in pink.

Returning to FIG. 1, in step S50 labeling is performed on the basis of the grid patch image defined in this way, and in step S60 the skin regions of interest (ROI; Region Of Interest) are extracted.

To describe this in more detail, a simple background-subtracted image such as those of FIG. 2 and FIG. 3 contains noise pixels or holes in the skin regions of interest. Labeling such a background-subtracted image directly therefore gives undesired results, and the problem becomes worse as the image resolution increases.

With the detection method of the present invention, however, skin regions are labeled at a low resolution, so good performance can be expected even in the presence of unexpected artifacts such as noise pixels. In the detection method of the present invention, a CCL (connected component labeling) algorithm is applied to the grid patches to label the skin regions; if a region assigned a single label is too small, it is regarded as a noise patch. Finally, each region labeled as the same skin region is enclosed in a bounding box (a rectangular boundary), and the skin regions of interest are thereby extracted.
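The labeling and bounding-box extraction on the low-resolution grid can be sketched with a breadth-first flood fill. This is an illustrative stand-in for a CCL algorithm, not the specific one used by the inventors; the function name, the label code `skin=2`, 4-connectivity, and the `min_patches` noise cutoff are assumptions.

```python
from collections import deque

def skin_bounding_boxes(labels, skin=2, min_patches=2):
    """Bounding boxes (row0, col0, row1, col1), inclusive, of connected skin patches."""
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != skin or seen[r][c]:
                continue
            # Flood-fill one connected component using 4-connectivity.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and labels[ny][nx] == skin):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Components that are too small are treated as noise patches.
            if len(region) >= min_patches:
                ys = [p[0] for p in region]
                xs = [p[1] for p in region]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes
```

The boxes are in grid coordinates; multiplying them by the patch size (for example 8) maps them back to pixel coordinates in the original image.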

To acquire the experimental images, a Dragonfly2 IEEE 1394 digital camera capable of capturing 30 frames per second at a resolution of 640*480 was used. Each grid patch of the predetermined size was first classified in two ways: using fixed thresholds, and using adaptive thresholds based on patch histogram analysis.

FIG. 6 is a chart comparing skin region detection performance when fixed thresholds and adaptive thresholds are applied to the classification of grid patches. As shown in FIG. 6, when grid patches are classified using fixed thresholds, a large skin region at a short distance can be detected quickly, but the skin region of interest of a small skin region at a long distance may be misclassified as noise. When grid patches are classified by applying adaptive thresholds, as in the detection method of the present invention, even a small, distant skin region can be detected successfully.

The detection method of the present invention is fundamentally a skin pixel detection method based on a predefined skin color model. Consequently, if the user is wearing short sleeves, the hand and arm regions may be detected as a single skin region of interest, and a skeleton-based shape analysis technique may additionally be needed to detect only the hand region accurately.

FIG. 7 shows the result of detecting the skin regions of interest of a person wearing short sleeves: the left image is the original, the middle image shows the detected skin regions enclosed in bounding boxes, and the right image is an enlargement of the middle image.

Returning to FIG. 1, in step S70 the face region is found among the skin regions of interest extracted in step S60, and in step S80 a region of a predetermined size located below the found face region is set as the clothing region.

Specifically, whether a skin region of interest contains a face region is determined using the AdaBoost algorithm.

FIG. 8 shows an example of applying the AdaBoost algorithm to determine whether a detected skin region of interest corresponds to a face or a hand region. Among the detected skin blobs, the blob that scores highly on facial features is defined as the face blob, and the remaining blobs are classified as hand or arm blobs.

FIG. 9 is an example picture showing the result of extracting the clothing region. As shown in FIG. 9, the rectangle enclosing the found face blob is defined as the face region, and a rectangle that lies below it and is twice the width and twice the height of the face rectangle is then defined and set as the clothing region. The clothing region found in FIG. 9 is shown as the rectangle with a blue border. The position and size of the clothing region may be adjusted according to the position and angle of the capturing camera.
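The geometry can be sketched as follows. The text specifies only that the clothing rectangle lies below the face and is twice its width and twice its height; centering it horizontally under the face, the optional vertical `gap`, clipping to the image bounds, and the function name are assumptions made for this example.

```python
def clothing_region(face, image_size, gap=0):
    """Clothing rectangle (x, y, w, h) below a face rectangle (x, y, w, h).

    image_size is (width, height). Returns None if nothing remains
    inside the image after clipping.
    """
    fx, fy, fw, fh = face
    img_w, img_h = image_size
    cw, ch = 2 * fw, 2 * fh          # twice the face width and height
    cx = fx + fw // 2 - cw // 2      # centered under the face (assumption)
    cy = fy + fh + gap               # starts below the face rectangle
    x0, y0 = max(cx, 0), max(cy, 0)
    x1, y1 = min(cx + cw, img_w), min(cy + ch, img_h)
    if x1 <= x0 or y1 <= y0:
        return None
    return x0, y0, x1 - x0, y1 - y0
```

For instance, a 40×40 face at (100, 50) in a 640×480 image gives the clothing rectangle (80, 90, 80, 80).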

Returning to FIG. 1, in step S90 a Sobel mask is applied to the clothing region set in step S80 to construct a feature vector for the clothing region, and in step S100 a Support Vector Machine (SVM), which is a maximal margin classifier, is applied to the feature vector constructed in step S90 to determine the type of clothing.

Texture features of images have been widely used in content-based image retrieval (see J. R. Smith, S. F. Chang, "Tools and techniques for color image retrieval", Storage & Retrieval for Image and Video Databases IV, Vol. 2670, pp. 426-437, 1996) and in object recognition (see Nak-Hoon Baek, Ku-Jin Kim, "Vehicle region segmentation of road images using grid-unit feature values", Korea Multimedia Society, Vol. 8, No. 10, 2005). For suits in particular, the edge distribution of the image follows a regular pattern, so texture features are preferable for identifying suits. In addition, because an SVM is a binary classifier, it is well suited to separating the suit class from the non-suit class, and because its classification performance is good, it is preferable as the tool for both suit identification and clothing pattern classification.

First, suit selection is described in detail. Texture information of the clothing region is used to distinguish suits from non-suits. Many texture extraction techniques exist, such as the GLCM (Gray Level Co-occurrence Matrix) and the wavelet transform, but to represent texture information effectively with a short computation time, it is preferable to construct the feature vector from edge histograms.

FIG. 10 shows an example of a Sobel mask. To extract the feature vector, the clothing region is first converted to a grayscale image, and a Sobel filter (Sobel mask) is applied to each pixel of the grayscale image to compute an edge image. Next, because the size of the clothing region can vary with the input image, the clothing region image is normalized so that the edge image has a fixed size.
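The three operations in this step (grayscale conversion, Sobel filtering, size normalization) can be sketched in plain Python as below. Several details are assumptions not fixed by the text: the standard 3x3 Sobel kernels, the common luminance weights for grayscale conversion, the edge value taken as |Gx| + |Gy|, zero-valued border pixels, and nearest-neighbor resampling for normalization. All function names are illustrative.

```python
# Standard 3x3 Sobel kernels (assumed; FIG. 10 is not reproduced here).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


def sobel_edges(gray):
    """Return |Gx| + |Gy| for each pixel; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    p = gray[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            out[y][x] = abs(gx) + abs(gy)
    return out


def resize_nearest(img, new_w, new_h):
    """Resample an image (list of rows) to new_w x new_h, nearest neighbor."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]
```

A call such as `resize_nearest(sobel_edges(to_grayscale(region)), 40, 60)` would then yield an edge image of the 40*60 size used in the experiments.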

FIG. 11 illustrates histograms of an edge image. From the normalized edge image, a histogram accumulating the edge value of each pixel in the horizontal direction is computed (FIG. 11(a)), and for each of the three regions obtained by dividing the edge image into three equal parts in the vertical direction, a histogram accumulating the edge values in the vertical direction is computed (FIG. 11(b)).

The four computed edge histograms are then combined to form the final feature vector. For example, for an edge image of size 40*60, the horizontal histogram and the three vertical histograms of the per-pixel Sobel edge values are defined as in Equation 9, and the feature vector F is defined as the combination of these histograms.
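The construction of the feature vector from the four edge histograms can be sketched as follows. This is an interpretation rather than a transcription of Equation 9: it assumes the 40*60 edge image is 40 pixels wide and 60 pixels high, that the horizontal histogram has one bin per row, and that the three vertical histograms come from three stacked bands of 20 rows with one bin per column. That reading is the one consistent with the dimensions 60, 120, and 180 reported later in the text. The function name is illustrative.

```python
def edge_histogram_features(edges):
    """Build the combined edge-histogram feature vector.

    edges: list of H rows, each holding W edge values (e.g. W=40, H=60).
    Returns H + 3*W values: the horizontal histogram followed by the
    three vertical histograms (top, middle, and bottom bands).
    """
    h, w = len(edges), len(edges[0])
    # Horizontal accumulation: sum each row, giving H bins.
    horizontal = [sum(row) for row in edges]
    # Vertical accumulation inside each of three stacked bands: 3 * W bins.
    vertical = []
    for k in range(3):
        band = edges[k * h // 3:(k + 1) * h // 3]
        vertical.extend(sum(row[x] for row in band) for x in range(w))
    return horizontal + vertical
```

For a 40*60 edge image this gives 60 + 3 * 40 = 180 values.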

FIG. 12 illustrates edge histograms for suit and non-suit images. As shown in FIG. 12, suit and non-suit regions show different patterns in the feature vectors built from edge histograms. Taking the horizontal edge histograms at the bottom of FIG. 12 as an example, the graphs for the suits in FIG. 12(a) and FIG. 12(b) decrease monotonically, whereas no regular pattern appears for the non-suits in FIG. 12(c) and FIG. 12(d).

Finally, the SVM is a maximal margin classifier (see R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, 2nd Edition, Wiley-Interscience, 2000) that assigns a feature vector to one of two classes. Specifically, when the edge image is, for example, 40*60, the feature vector F formed by concatenating the histograms in order has dimension 180. The training images are labeled as suit or non-suit and their feature vectors are computed, with suits assigned to the positive class and non-suits to the negative class. Once the SVM has been built in this way, when an arbitrary image is input in the test phase, a feature vector is computed for the clothing region of the input image. The feature vector is then fed to the SVM: a positive real output means the clothing is classified as a suit, and a negative real output means it is classified as a non-suit.
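The decision rule can be illustrated with a linear SVM, whose output for a feature vector x is w . x + b. The linear form is an assumption made for brevity: the text does not state which kernel was used, and the weights and bias here stand in for a model trained elsewhere. Treating an output of exactly zero as non-suit is also an assumption, since the text only covers positive and negative outputs.

```python
def svm_output(weights, bias, features):
    """Real-valued output of a (hypothetical, already trained) linear SVM."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def is_suit(weights, bias, features):
    """Positive output: suit (positive class). Otherwise: non-suit."""
    return svm_output(weights, bias, features) > 0
```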

Next, the classification of clothing by pattern according to the present invention is described in detail.

Pattern classification means constructing a feature vector for the clothing region extracted from the input image, excluding the sleeves, and then using an SVM (support vector machine), one of the machine learning methods, to classify the pattern or decoration of the clothing into, for example, the four classes shown in Table 1 below. FIG. 13 is an example photograph showing clothing grouped by pattern type.

Table 1

| Class name | Classification criterion |
| --- | --- |
| 1 (plain) | No pattern or decoration |
| 2 (plain background with local pattern) | Local pattern or decoration on a plain background |
| 3 (uniform global pattern) | Uniform global pattern or decoration |
| 4 (non-uniform global pattern) | Non-uniform global pattern or decoration |

First, as preprocessing for pattern classification, an SVM is built using the training clothing set of each class. Then the clothing image is converted to a grayscale image, a Sobel edge filter (Sobel mask) is applied to the grayscale image to compute an edge image, and the edge image is divided, for example, into 4 parts horizontally and 5 parts vertically, giving a 4*5 grid of patches. The edge feature value of each grid patch is computed from the size of the patch and the Sobel edge values of its pixels as in Equation 10: the Sobel edge magnitudes in the patch are summed and the sum is divided by the number of pixels in the patch.

That is, the feature vector F is obtained by arranging the edge feature values computed for each grid patch into a one-dimensional vector.
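The grid-based edge feature can be sketched as follows, using the definition given in step (g-4) of the claims: the Sobel edge magnitudes in each patch are summed and divided by the number of pixels in the patch. The function name, the row-major ordering of the patches in the output vector, and the integer patch boundaries are assumptions. The edge image is assumed to be at least as large as the grid.

```python
def grid_edge_features(edges, cols=4, rows=5):
    """Mean Sobel edge magnitude per grid patch, as a flat list.

    edges: list of H rows of W edge values, with W >= cols and H >= rows.
    Returns rows * cols values (20 for the 4*5 grid), in row-major
    patch order (an assumed ordering).
    """
    h, w = len(edges), len(edges[0])
    features = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            total = sum(abs(edges[y][x])
                        for y in range(y0, y1) for x in range(x0, x1))
            features.append(total / ((y1 - y0) * (x1 - x0)))
    return features
```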

Finally, the feature vector F is input to the SVM to determine the clothing class.

In general an SVM is used as a binary classifier, but pattern classification is a multi-class problem, so the SVMs are applied in a one-vs-all scheme. In the preprocessing stage, a binary SVM is generated for each clothing class from the training image set, giving four SVMs in total. The SVM for classifying the feature vectors of a given class is constructed as follows.

In the training image set, feature vectors that belong to the given class are labeled +1 and feature vectors that belong to any other class are labeled -1, and the SVM is generated with these as input. The SVMs for all classes are generated in the same way. When the feature vector of a test image is given, it is input to all four SVMs and assigned to the clothing class whose SVM produces the largest output value.

The SVM application algorithm for classifying clothing patterns uses one SVM per class, each being the SVM that classifies that class, and the output of the algorithm is a clothing class.
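The one-vs-all procedure described in the text can be sketched as below. The sketch assumes each per-class SVM is available as a callable that returns a real-valued output; the function names and the dictionary layout are illustrative, not taken from the source.

```python
def one_vs_all_labels(class_ids, target):
    """Relabel training samples for the SVM of class `target`:
    +1 for samples of that class, -1 for all other classes."""
    return [1 if c == target else -1 for c in class_ids]


def classify_pattern(svms, features):
    """Return the class whose SVM gives the largest output.

    svms: dict mapping class id -> callable(features) -> float
          (hypothetical stand-ins for the four trained SVMs).
    """
    return max(svms, key=lambda c: svms[c](features))
```

With four classes, `one_vs_all_labels` would be called four times during training, once per class, and `classify_pattern` once per test image.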

The experimental results of the suit selection method according to the present invention are described below.

The algorithm for the suit selection experiment was implemented in Visual C++ 6.0 and run on a computer with 1 GB of memory and a 3 GHz Pentium 4 CPU. A total of 68 images were used for training and testing, 34 each of suit and non-suit clothing. Training and test images may come in various sizes, but the clothing region found in each image is normalized to 40*60. FIG. 14 shows example photographs used to validate the suit selection method: the upper photographs are the original images and the lower photographs are the clothing regions extracted from them. The training and test data sets were composed as shown in Table 2 below. To make the experiment more stable, the applicant selected the image data at random to form three sets, each with 17 training images and 17 test images per class. For applying the SVM in this experiment, the applicant referred to http://svmlight.joachims.org.

Table 2

| Image type | Training data | Test data | Total |
| --- | --- | --- | --- |
| Suit | 17 | 17 | 34 |
| Non-suit | 17 | 17 | 34 |

The results for the three sets are shown in Table 3 below. The recognition rate ranged from 91.18% to 97.06%, and the average recognition rate of 95.1% can be regarded as reliably high.

Table 3

| Set | Accuracy | Precision | Recall |
| --- | --- | --- | --- |
| 1 | 97.06% | 94.44% | 100.00% |
| 2 | 91.18% | 93.75% | 88.24% |
| 3 | 97.06% | 94.44% | 100.00% |
| Average | 95.1% | 94.21% | 96.08% |
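The three measures in Table 3 follow from the confusion counts of each test set. The sketch below shows the computation. The counts themselves are not given in the source: they are inferred here, as an assumption, from the reported percentages together with the 17 suit and 17 non-suit test images per set, with suit as the positive class.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Inferred (not source-stated) counts per set: (TP, FP, FN, TN).
counts = {1: (17, 1, 0, 16), 2: (15, 1, 2, 16), 3: (17, 1, 0, 16)}

# Percentages rounded to two decimals, one tuple per set.
table3 = {s: tuple(round(100 * m, 2) for m in metrics(*c))
          for s, c in counts.items()}
```

Under these inferred counts, Set 1 and Set 3 each have a single non-suit image classified as a suit, and Set 2 additionally has two suits classified as non-suits.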

Next, the SVM method was compared with classification based on the Euclidean distance between feature vectors. The results are shown in Table 4 below, and the difference in average recognition rate between the SVM and the Euclidean distance is shown in FIG. 15. For Set 2 the two methods give the same recognition rate. For the other sets the SVM gives a higher recognition rate than the Euclidean distance, and the average recognition rates differ by 2.94 percentage points (95.10% versus 92.16%).

Table 4

| Set | Accuracy |
| --- | --- |
| Set1 | 94.12% |
| Set2 | 91.18% |
| Set3 | 91.18% |
| Average | 92.16% |
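A distance-based baseline of this kind can be sketched as follows. The text does not say how the Euclidean distance was turned into a decision, so the nearest-training-vector rule used here is an assumption, as are the function names. A nearest class mean would be another plausible reading.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_label(train, query):
    """Label of the training vector closest to `query`.

    train: list of (feature_vector, label) pairs.
    """
    return min(train, key=lambda item: euclidean(item[0], query))[1]
```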

Next, Table 5 below lists ways of constructing the feature vector by varying how the edge histograms are combined, and Tables 6 and 7 below give the recognition rate for each type of feature vector. The recognition rate for the feature vector that combines all four edge histograms is the one given in Table 3. The recognition rate for the feature vector built from the horizontal edge histogram only is given in Table 6, and the recognition rate for the feature vector built from the vertical edge histograms only is given in Table 7.

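Table 5

| Feature vector | Histogram | Dimension |
| --- | --- | --- |
| All four histograms combined (F) | Horizontal and vertical | 180 |
| Horizontal only | Horizontal | 60 |
| Vertical only | Vertical | 120 |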

Table 6

| Set | Accuracy (%) | Precision (%) | Recall (%) |
| --- | --- | --- | --- |
| Set1 | 73.53 | 72.22 | 76.47 |
| Set2 | 67.65 | 63.64 | 82.35 |
| Set3 | 85.29 | 87.50 | 82.35 |
| Average | 75.49 | 74.45 | 80.39 |

Table 7

| Set | Accuracy (%) | Precision (%) | Recall (%) |
| --- | --- | --- | --- |
| Set1 | 85.29 | 92.86 | 76.47 |
| Set2 | 88.24 | 93.33 | 82.35 |
| Set3 | 91.75 | 93.75 | 88.24 |
| Average | 85.43 | 93.31 | 82.35 |

FIG. 16 compares the average recognition rates of the three types of feature vector. A lower-dimensional feature vector shortens the time needed for classification but also lowers the recognition rate. In the suit classification problem, the recognition rate differs substantially between the feature vector built from all edge histograms and the other two, so it is reasonable to use the higher-dimensional vector even at the cost of longer computation time.

FIG. 17 shows cases in which the SVM misclassified suit and non-suit clothing: (a) and (b) are suits classified as non-suits, and (c) and (d) are non-suits classified as suits. For the suit images, the failure is attributed to background and face pixels falling inside the clothing region and producing spurious edges. For the non-suit images, the design of the clothing resembles a suit, so the pattern of the edge histograms could not be distinguished from that of a suit.

The experimental results of the clothing pattern classification method according to the present invention are described below.

The algorithm for the pattern classification experiment was implemented in Visual C++ 6.0 and run on a computer with 1 GB of memory and a 3 GHz Pentium 4 CPU. 217 images were used for training and 217 for testing, and the number of training and test images per class is shown in Table 8 below. Under these conditions, applying the SVM gave an average classification success rate of 61.75% across the classes.

Table 8

| Class name | Training images | Test images | Total images |
| --- | --- | --- | --- |
| 1 | 29 | 28 | 57 |
| 2 | 110 | 111 | 221 |
| 3 | 46 | 46 | 92 |
| 4 | 32 | 32 | 64 |
| Total | 217 | 217 | 434 |

The clothing classification method using a maximal margin classifier according to the present invention is not limited to the embodiments described above and may be modified in various ways within the scope permitted by the technical idea of the present invention.

FIG. 1 is a flowchart illustrating a clothing classification method using a maximal margin classifier according to an embodiment of the present invention.

FIG. 2 is an example photograph showing a grid image with a resolution of 81×54.

FIG. 3 shows two example images of a person captured at different distances from the camera.

FIG. 4 shows histograms of the images of FIG. 3.

FIG. 5 shows grid patch images with a resolution of 81*54 converted from the images of FIG. 3.

FIG. 6 is a chart comparing skin region detection performance when fixed thresholds and adaptive thresholds are applied in grid patch classification.

FIG. 7 shows the result of detecting skin regions of interest on a person wearing short-sleeved clothing.

FIG. 8 shows an example of applying the AdaBoost algorithm to determine whether a detected skin region of interest corresponds to a face or a hand.

FIG. 9 is an example photograph showing the result of extracting the clothing region.

FIG. 10 shows an example of a Sobel mask.

FIG. 11 illustrates histograms of an edge image.

FIG. 12 illustrates edge histograms for suit and non-suit images.

FIG. 13 is an example photograph showing clothing grouped by pattern type.

FIG. 14 shows example photographs used to validate the suit selection method.

FIG. 15 shows the difference in average recognition rate between the SVM and the Euclidean distance.

FIG. 16 compares the average recognition rates of the three types of feature vector.

FIG. 17 shows cases in which the SVM misclassified suit and non-suit clothing.

Claims

(Deleted)

(a) subtracting a background image from a video image input in real time;

(b) calculating the ratio of foreground pixels to all pixels of the entire image subtracted in step (a), and the ratio of skin pixels to the foreground pixels;

(c) dividing the entire image subtracted in step (a) into grid patches of a predetermined size, and then calculating, for each grid patch, the ratio of foreground pixels to all pixels in the patch and the ratio of skin pixels to foreground pixels in the patch;

(d) classifying each grid patch as a foreground grid, a skin grid, or a background grid according to the result of comparing the foreground pixel ratio and the skin pixel ratio of the patch calculated in step (c) with thresholds determined adaptively from the ratios calculated in step (b);

(e) performing labeling based on the grid patches classified in step (d) to extract skin regions of interest as bounding boxes;

(f) finding a face region among the skin regions of interest extracted in step (e) using the AdaBoost algorithm, and then setting a region of predetermined size located below the found face region as a clothing region;

(g) constructing a feature vector for the clothing region by applying a Sobel mask to the clothing region set in step (f); and

(h) determining the type of clothing by applying a maximal margin classifier to the feature vector constructed in step (g),

wherein step (g) computes, for the clothing region set in step (f), an edge histogram accumulated in the horizontal direction and an edge histogram accumulated in the vertical direction, and then combines the computed edge histograms to construct the feature vector for the clothing region, and

wherein step (h) determines whether the clothing is a suit, the clothing classification method being performed by a computer device.

wherein step (g) comprises: (g-1) converting the clothing region set in step (f) to grayscale; (g-2) applying a Sobel mask to the clothing region converted to grayscale in step (g-1) to compute an edge image; (g-3) dividing the edge image computed in step (g-2) into grid patches of a predetermined size; (g-4) summing the magnitudes of the Sobel edges in each grid patch obtained in step (g-3) and dividing the sum by the total number of pixels in the patch to obtain the edge feature value of each grid patch; and (g-5) arranging the edge feature values of the grid patches obtained in step (g-4) to construct the feature vector, and

wherein step (h) determines the pattern of the clothing, the clothing classification method being performed by a computer device.