KR20170127354A

KR20170127354A - Apparatus and method for providing video conversation using face conversion based on facial motion capture

Info

Publication number: KR20170127354A
Application number: KR1020170031295A
Authority: KR
Inventors: 강미연
Original assignee: 강미연
Priority date: 2016-05-11
Filing date: 2017-03-13
Publication date: 2017-11-21

Abstract

Provided are a face conversion video conversation system using facial motion capture and a service method thereof. The method comprises the steps of: receiving facial motion data on a facial image of a conversation partner from the video conversation apparatus of the conversation partner, who participates in a video conversation; mapping the facial motion data to a preset computer graphics character; and outputting, as the facial image of the conversation partner, the computer graphics character to which the mapped facial motion data is applied. The facial motion data is the result of facial motion capture based on depth information of the facial image photographed through a depth camera in the video conversation apparatus of the conversation partner.

Description

Apparatus and Method for Providing Video Conversation Using Face Conversion Based on Facial Motion Capture

The present invention relates to an apparatus and method for providing a video conversation between participants through a computer graphics (CG) character generated on the basis of facial motion capture.

In recent years, the remote language education market has grown considerably as a way to avoid the inconvenience of offline language education. However, conventional language education conducted over telephone calls and the like is limited to audio, so the user's concentration drops and pronunciation cannot be checked by watching the shape of the mouth, which lowers learning efficiency. In addition, most online language education systems are one-way: the student watches educational video supplied by the language education service provider, which makes real-time instruction difficult.

In this regard, Korean Patent Application Publication No. 2008-0096862 (title of invention: Operation Service Method for Online Remote Video Education) discloses an operation service method for online remote video education. In that method, embedded-system education devices, provided so that native-speaker teachers and students can hold video classes in the manner of a video conversation, are installed in a native-speaker group and an academy group respectively, and the education devices that deliver video classes to native-speaker teachers and students according to class schedule information can be operated, managed, and serviced on a franchise basis.

Meanwhile, with the rapid development of wireless and information communication technology and the widespread use of personal portable terminals such as smart devices, users can now conveniently use video conversation services regardless of place and time.

Such video conversation services can be used in a variety of interactive systems. For example, they can be usefully applied not only to online language education systems, which previously required the participants to meet face to face, but also to information systems for various kinds of information guidance, foreign-language interpretation, and the like.

However, video conversation services are generally used between participants who know each other personally, and using them for language education, information guidance, interpretation services, and the like can place a psychological burden on the user. Accordingly, there is a need for a way to raise user concentration and system utilization in online interactive systems that use a video conversation service.

An embodiment of the present invention aims to provide a face conversion video conversation apparatus and method that process a video conversation between participants through a computer graphics character generated on the basis of facial motion capture.

However, the technical problem to be solved by this embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, a video conversation apparatus according to one aspect of the present invention, which processes face conversion of a conversation partner using facial motion capture information, includes: a communication module that handles data transmission and reception with the video conversation apparatus of a conversation partner participating in a video conversation; a memory in which a face conversion video conversation program is stored; and a processor that executes the face conversion video conversation program stored in the memory. In executing the face conversion video conversation program, when facial motion data on the face image of the conversation partner is received from the conversation partner's video conversation apparatus through the communication module, the processor maps the facial motion data to a preset computer graphics character and outputs the computer graphics character to which the mapped facial motion data is applied as the face image of the conversation partner. Here, the facial motion data is the result of facial motion capture based on depth information of a face image photographed through a depth camera in the conversation partner's video conversation apparatus.

A method of providing a video conversation service according to another aspect of the present invention, which processes face conversion of a conversation partner using facial motion capture information, includes the steps of: receiving facial motion data on the face image of a conversation partner from the video conversation apparatus of the conversation partner participating in a video conversation; mapping the facial motion data to a preset computer graphics character; and outputting the computer graphics character to which the mapped facial motion data is applied as the face image of the conversation partner. Here, the facial motion data is the result of facial motion capture based on depth information of a face image photographed through a depth camera in the conversation partner's video conversation apparatus.

In addition, a face conversion video conversation apparatus using facial motion capture according to yet another aspect of the present invention includes: a communication module that handles data transmission and reception with the video conversation apparatus of a conversation partner participating in a video conversation; a depth camera module that photographs the user's face image and extracts depth information from the photographed face image; a memory in which a face conversion video conversation program is stored; and a processor that executes the face conversion video conversation program stored in the memory. In executing the face conversion video conversation program, the processor extracts depth information from the photographed face image to perform facial motion capture, and transmits facial motion data resulting from the facial motion capture to the conversation partner's video conversation apparatus. Here, the facial motion data is mapped to a preset computer graphics character in the conversation partner's video conversation apparatus, and the computer graphics character to which the mapped facial motion data is applied is output as the user's face image.

According to any one of the above means for solving the problem, the effect is the same as conversing while seeing the other participant's actual face, while the participant's facial motion is reproduced as-is through a computer graphics character. This makes it possible to render the speaker's facial expression, realistic muscle-based body expression, skin rendering, motion generation, and the like, which raises realism and concentration on the conversation and, in the case of a language-learning video conversation, increases interest in learning.

FIG. 1 is a configuration diagram of a face conversion video conversation system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a method of generating reference data for facial motion capture according to an embodiment of the present invention.
FIG. 3 is a flowchart for explaining a method of providing a video conversation service on the face conversion information provider side according to an embodiment of the present invention.
FIG. 4 is a flowchart for explaining a method of providing a video conversation service on the face conversion information receiver side according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a 'unit' or 'module' includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by a single piece of hardware. A 'unit' or 'module' is not limited to software or hardware; it may be configured to reside in an addressable storage medium or configured to run on one or more processors. Thus, as an example, a 'unit' or 'module' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functions provided within the components and 'units' (or 'modules') may be combined into a smaller number of components and 'units' (or 'modules') or further separated into additional components and 'units' (or 'modules'). In addition, the components and 'units' (or 'modules') may be implemented to run on one or more CPUs in a device or a secure multimedia card.

The "terminal" referred to below may be implemented as a computer or portable terminal that can connect to a server or another terminal over a network. Here, the computer includes, for example, a notebook, desktop, or laptop equipped with a web browser. The portable terminal is a wireless communication device that guarantees portability and mobility and may include, for example, any kind of handheld wireless communication device such as a terminal based on IMT (International Mobile Telecommunication), CDMA (Code Division Multiple Access), W-CDMA (W-Code Division Multiple Access), or LTE (Long Term Evolution), a smartphone, or a tablet PC. In addition, the "network" may be implemented as a wired network such as a local area network (LAN), a wide area network (WAN), or a value added network (VAN), or as any kind of wireless network such as a mobile radio communication network or a satellite communication network.

FIG. 1 is a configuration diagram of a face conversion video conversation system according to an embodiment of the present invention.

As shown in FIG. 1, the face conversion video conversation system 10 includes a first video conversation apparatus 100, which extracts face conversion information from the user's face image through facial motion capture and provides it to the conversation partner's video conversation apparatus, and a second video conversation apparatus 200, which converts the conversation partner's face image into the form of an arbitrary digital actor on the basis of the face conversion information provided by the first video conversation apparatus 100 and outputs it.

The first video conversation apparatus 100 and the second video conversation apparatus 200 may each be the terminal of an individual user. Alternatively, when the first video conversation apparatus 100 is a device that provides an information service to many conversation partners and the second video conversation apparatus 200 is a shared device through which many users can use the information service provided by the first video conversation apparatus 100, the first video conversation apparatus 100 may be a kind of server device and the second video conversation apparatus 200 may be a kiosk terminal or a DID (Digital Information Display) system. In this case, the first video conversation apparatus 100 acting as a server device may implement a user interface that photographs the information provider's face image so that the provider can take part in the video conversation. In other words, neither the types of the first and second video conversation apparatuses 100 and 200 nor the communication method used to process the video conversation between them is limited.

For example, the face conversion video conversation system 10 according to an embodiment of the present invention may be a system that handles a language-learning video conversation service in which the user of the first video conversation apparatus 100 is a teacher and the user of the second video conversation apparatus 200 is a student. That is, the first video conversation apparatus 100, as the teacher-side terminal, photographs the teacher's face image with a depth camera, extracts depth information from the captured image data, performs facial motion capture, and transmits the resulting face conversion information to the second video conversation apparatus 200. The second video conversation apparatus 200, as the student-side terminal, applies the face conversion information received from the first video conversation apparatus 100 to a digital actor, so that the speaker's actual face is converted into a CG character and output.

Meanwhile, the face conversion video conversation system 10 according to another embodiment of the present invention may be an information system that provides information to many users at various kinds of sites such as airports, performance halls, traffic information centers, shopping malls, and government offices. In this case, the first video conversation apparatus 100 may be the terminal of one or more information providers (for example, interpreters), and the second video conversation apparatus 200 may be the terminal of one or more information users (for example, travelers visiting a foreign country, shoppers, or audience members) who connect to the first video conversation apparatus 100 to receive information. For reference, the second video conversation apparatus 200 may be a user's personal portable terminal or may take the form of a kiosk or the like installed at various locations.

The following description assumes that the face conversion video conversation system 10 according to an embodiment of the present invention is a conversation system for language learning, and that the digital actor reproducing the speaker's actual facial motion is a computer graphics character (that is, a CG character) chosen by the user or selected automatically.

Specifically, as shown in FIG. 1, the first video conversation apparatus 100 includes a depth camera module 110, an audio module 120, a communication module 130, a memory 140, and a processor 150, and the second video conversation apparatus 200 includes a camera module 210, an audio module 220, a communication module 230, a memory 240, and a processor 250.

Although not shown in FIG. 1, the first video conversation apparatus 100 and the second video conversation apparatus 200 may each further include components suited to their intended use. For example, when the first and second video conversation apparatuses 100 and 200 are terminals that handle wireless communication, such as smartphones, they may further include components for their original functions such as voice calls and wireless data communication. In this case, a camera mounted in the smartphone may be used as the depth camera module 110 of the first video conversation apparatus 100 and as the camera module 210 of the second video conversation apparatus 200. When the first and second video conversation apparatuses 100 and 200 are individual user terminals such as desktops or notebooks, a webcam may be used as the depth camera module 110 and the camera module 210. The types of the first and second video conversation apparatuses 100 and 200 are not limited.

First, the first video conversation apparatus 100 is described in detail.

The depth camera module 110 photographs the user's face image and extracts depth information from the photographed face image.

That is, it photographs, through the depth camera, the face image of the teacher who takes part in the video conversation to provide language instruction to the student, and provides the teacher's face image data, including the depth information, to the processor 150.

The audio module 120 records the user's voice data and also outputs the voice data received from the second video conversation apparatus 200 so that the user can hear it.

The communication module 130, under the control of the processor 150, handles data transmission and reception with the video conversation apparatus of the conversation partner participating in the video conversation (that is, the second video conversation apparatus 200).

The memory 140 stores a face conversion video conversation program that makes it possible to provide the conversation partner with face image information in which the user's actual face image is replaced by a CG character.

Here, the memory 140 refers collectively to non-volatile storage devices, which retain stored information even when power is not supplied, and volatile storage devices, which need power to retain stored information. In addition to the face conversion video conversation program, the memory 140 stores various other programs and data, which may be processed by the processor 150.

The processor 150 executes the face conversion video conversation program stored in the memory 140.

Specifically, in executing the face conversion video conversation program, the processor 150 performs the following operations.

The processor 150 extracts depth information from the user's face image photographed through the depth camera module 110 to perform facial motion capture, and transmits the facial motion data resulting from the facial motion capture to the second video conversation apparatus 200. The facial motion data is mapped to a CG character preset on the second video conversation apparatus 200, and the CG character to which the mapped facial motion data is applied is output as the face image of the user (that is, the teacher). As a result, the video conversation screen of the student-side terminal shows, instead of the teacher's actual face, a CG character that follows the teacher's facial expression, mouth shape, and so on.
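The receiver-side mapping described above can be pictured with a minimal sketch. The specification does not define the character model or the mapping rule, so everything here is an assumption made for illustration: the hypothetical `CGCharacter` class keeps one neutral control point per position element, and the received per-element offsets are simply scaled and added to those points.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class CGCharacter:
    """Hypothetical character rig: one neutral control point per position element."""
    neutral: dict[str, Vec3]
    scale: float = 1.0  # assumed factor adapting the user's face size to the character's

    def apply(self, motion: dict[str, Vec3]) -> dict[str, Vec3]:
        """Return the posed control points after adding the received offsets."""
        posed = {}
        for name, base in self.neutral.items():
            # Elements missing from the received data stay at their neutral position.
            dx, dy, dz = motion.get(name, (0.0, 0.0, 0.0))
            posed[name] = (
                base[0] + self.scale * dx,
                base[1] + self.scale * dy,
                base[2] + self.scale * dz,
            )
        return posed


# Example: the teacher's left mouth corner moved; the brow did not.
character = CGCharacter(
    neutral={"mouth_corner_left": (-2.0, -3.0, 0.0), "brow_center": (0.0, 4.0, 0.5)},
    scale=2.0,
)
posed = character.apply({"mouth_corner_left": (0.5, 1.0, 0.0)})
```

A production renderer would more likely drive blend shapes or bones than raw control points; the additive form is used here only because it is the simplest mapping consistent with the description.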

Here, the processor 150 extracts, through the depth camera module 110, depth information for a preset basic expression of the user (that is, the teacher) to generate reference data. It then compares the depth information values resulting from changes in the user's facial expression with the reference data to calculate the amount of movement of each position element preset on the user's face, tracks the amount of movement in real time to generate facial motion data containing the amount of change for each position element, and streams the data to the second video conversation apparatus 200.
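The comparison against reference data can be sketched as follows. This is an illustration under stated assumptions, not the method of the specification: position elements are taken to be named 3D points, the reference data is built by averaging a few neutral-expression frames (the averaging is an added assumption to reduce sensor noise), and the amount of movement is the per-axis difference from the reference. The function names are hypothetical.

```python
Vec3 = tuple[float, float, float]
Frame = dict[str, Vec3]  # position element name -> 3D position from the depth camera


def make_reference(neutral_frames: list[Frame]) -> Frame:
    """Average several frames of the basic (neutral) expression into reference data."""
    count = len(neutral_frames)
    return {
        name: tuple(sum(frame[name][axis] for frame in neutral_frames) / count
                    for axis in range(3))
        for name in neutral_frames[0]
    }


def displacement(reference: Frame, current: Frame) -> Frame:
    """Per-element movement of the current frame relative to the reference data."""
    return {
        name: tuple(current[name][axis] - ref[axis] for axis in range(3))
        for name, ref in reference.items()
        if name in current  # skip elements the tracker lost in this frame
    }


reference = make_reference([
    {"upper_lip": (0.0, 0.0, 10.0)},
    {"upper_lip": (0.0, 0.0, 12.0)},
])
motion = displacement(reference, {"upper_lip": (1.0, -1.0, 11.5)})
```

Calling `displacement` once per captured frame yields the stream of per-element changes that the sender would transmit.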

FIG. 2 is a diagram for explaining a method of generating reference data for facial motion capture according to an embodiment of the present invention.

The processor 150 captures a two-dimensional face image and a three-dimensional face image of the user at the same point in time through the depth camera module 110, and registers the positions in the two-dimensional face image with the corresponding positions in the three-dimensional face image.

That is, preset expression region elements are extracted, as in FIG. 2(b), from a two-dimensional face image of the user such as that in FIG. 2(a). Then, as in FIG. 2(c), the positions in the three-dimensional face image that correspond to those expression region elements are determined as position elements. For reference, various regions may be set as expression region elements, such as the center of the pupil, the corners of the mouth, the center of the eyebrow, the nose line, the upper lip line, and the lower lip line, and they may be set differently depending on the purpose of the video conversation. For example, in a video conversation service for foreign-language learning, regions such as the corners of the mouth and the upper and lower lip lines may be included as expression region elements so that expressions related to the teacher's pronunciation are emphasized.
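One common way to obtain a 3D position for each 2D expression region element, once the two images are registered, is pinhole back-projection through the depth map. The specification does not state which camera model is used, so the sketch below is an assumption: `fx`, `fy`, `cx`, and `cy` are hypothetical camera intrinsics, and the depth map is assumed to be pixel-aligned with the 2D image.

```python
Vec3 = tuple[float, float, float]


def lift_landmarks(
    landmarks_2d: dict[str, tuple[int, int]],  # element name -> (column u, row v)
    depth_map: list[list[float]],              # depth_map[v][u], same resolution as the 2D image
    fx: float, fy: float, cx: float, cy: float,
) -> dict[str, Vec3]:
    """Back-project 2D expression region elements into 3D position elements."""
    positions = {}
    for name, (u, v) in landmarks_2d.items():
        z = depth_map[v][u]
        if z <= 0.0:
            continue  # depth sensors commonly report 0 where no measurement exists
        positions[name] = ((u - cx) * z / fx, (v - cy) * z / fy, z)
    return positions


# Tiny 3x3 depth map with one invalid pixel at the top-left corner.
depth = [
    [0.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
]
positions = lift_landmarks(
    {"pupil_center": (2, 1), "brow_center": (0, 0)},
    depth, fx=1.0, fy=1.0, cx=1.0, cy=1.0,
)
```

Restricting `landmarks_2d` to the mouth corners and lip lines would give the pronunciation-focused element set mentioned for language learning.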

The user's basic expression serves as the baseline for detecting the amount of movement of each expression region element; two-dimensional and three-dimensional face images of the user with a neutral expression may be used for it.

In addition, so that the user's voice and expression match naturally when the CG character carrying the facial motion capture result of the user's (teacher's) face image is output on the second video conversation apparatus 200, the processor 150 synchronizes the voice data with the facial motion data corresponding to the time at which the voice data was recorded and transmits them to the second video conversation apparatus 200. Here, the processor 150 may perform the synchronization by buffering the user's voice data according to the delay that arises while facial motion capture is performed in real time after the user joins the video conversation. For reference, the synchronization of the user's voice data and facial motion data may also be handled by the second video conversation apparatus 200.
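A minimal sketch of the buffering idea follows. It assumes that audio chunks and captured frames both carry a capture timestamp in milliseconds, and that audio is held back until the facial motion data for the same capture time has been computed; the class name `AvSynchronizer` and the output dictionary layout are hypothetical.

```python
from collections import deque


class AvSynchronizer:
    """Holds audio until the motion data captured at the same time is ready."""

    def __init__(self) -> None:
        self._audio: deque[tuple[int, bytes]] = deque()  # (capture time in ms, chunk)

    def push_audio(self, t_ms: int, chunk: bytes) -> None:
        """Buffer an audio chunk recorded at t_ms."""
        self._audio.append((t_ms, chunk))

    def push_motion(self, t_ms: int, motion: dict) -> dict:
        """Called when capture of the frame taken at t_ms has finished.

        Releases every buffered audio chunk recorded up to t_ms, so the voice
        is delayed by exactly the capture latency of that frame.
        """
        chunks = []
        while self._audio and self._audio[0][0] <= t_ms:
            chunks.append(self._audio.popleft()[1])
        return {"t_ms": t_ms, "motion": motion, "audio": b"".join(chunks)}


sync = AvSynchronizer()
for t, chunk in [(0, b"aa"), (20, b"bb"), (40, b"cc")]:
    sync.push_audio(t, chunk)
first = sync.push_motion(20, {"upper_lip": (0.0, 0.5, 0.0)})
second = sync.push_motion(40, {"upper_lip": (0.0, 0.1, 0.0)})
```

The same class could run on the receiving side instead, which corresponds to the alternative in which the second apparatus performs the synchronization.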

Meanwhile, the processor 150 provides a user interface that outputs the conversation partner's face image, received through the second video conversation apparatus 200, together with the audio data and without any separate face conversion processing, so that the user can check it. That is, the video conversation data sent by the student or information requester may be output as the actual data, unchanged, so that the teacher or information provider can see it clearly.

Hereinafter, a method of providing a face conversion video conversation service according to an embodiment of the present invention will be described in detail with reference to FIG. 3.

FIG. 3 is a flowchart illustrating a method of providing a video conversation service on the face conversion information provider side according to an embodiment of the present invention.

First, reference data is generated by extracting depth information corresponding to the user's preset basic facial expression from the user's face image captured through the depth camera (S310).

Here, a two-dimensional face image and a three-dimensional face image of the user at the same point in time may be captured through the depth camera, the positions in the two-dimensional face image and the three-dimensional face image may be registered with each other, preset facial-expression region elements may be extracted from the two-dimensional face image, and the positions in the three-dimensional face image that correspond to those facial-expression region elements may be determined as position elements.
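One way to picture this step is the following sketch. It assumes the depth map is already registered pixel-for-pixel to the two-dimensional image and uses a pinhole camera model for back-projection; the specification does not state either. The function name and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
def landmarks_to_position_elements(landmarks_2d, depth_map, fx, fy, cx, cy):
    """Turn 2D facial-expression region elements into 3D position elements.

    landmarks_2d: {name: (u, v)} pixel coordinates found in the 2D face image
    depth_map:    depth_map[v][u] is the depth at pixel (u, v), registered to
                  the 2D image (assumption)
    fx, fy, cx, cy: pinhole intrinsics of the camera (assumption)
    """
    elements = {}
    for name, (u, v) in landmarks_2d.items():
        z = depth_map[v][u]
        # Back-project the pixel into camera coordinates.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        elements[name] = (x, y, z)
    return elements
```

Running this once on the neutral face would give the reference data of step S310 in this sketch; running it on later frames gives the values that are compared against it.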

Then, the face image of the user participating in the video conversation is captured through the depth camera (S320), and facial motion capture is performed by extracting depth information from the captured face image (S330).

Here, the depth information values that follow changes in the user's facial expression are compared with the reference data to calculate the amount of movement of each preset position element on the user's face, and the amounts of movement are tracked in real time to generate facial motion data that includes the amount of change of each position element.
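The comparison against the reference data can be sketched as a per-element subtraction. This is only an illustration under the assumption that each position element is a 3D coordinate tuple; the function name `facial_motion_data` and the dictionary layout are hypothetical.

```python
def facial_motion_data(reference, current, timestamp):
    """Compute per-element movement relative to the neutral reference.

    reference: {name: (x, y, z)} position elements for the basic expression
    current:   {name: (x, y, z)} position elements for the current frame
    Elements missing from the current frame are skipped.
    """
    deltas = {
        name: tuple(c - r for c, r in zip(current[name], ref))
        for name, ref in reference.items()
        if name in current
    }
    return {"t": timestamp, "deltas": deltas}
```

Because only the movement amounts are produced, the resulting record is much smaller than a video frame, which fits the streaming transmission described in step S340.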

Next, the facial motion data resulting from the facial motion capture is transmitted to the video conversation apparatus of the conversation partner participating in the video conversation (S340).

This facial motion data is streamed to the conversation partner's video conversation apparatus.
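The specification does not define a wire format for the stream. As a sketch only, the frames could be sent as length-prefixed JSON records over a byte stream; `encode_frame` and `decode_frames` are hypothetical helpers, and the 4-byte big-endian length prefix is an assumption.

```python
import json
import struct


def encode_frame(frame: dict) -> bytes:
    """Serialize one facial motion frame as a 4-byte length plus JSON payload."""
    payload = json.dumps(frame, separators=(",", ":")).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload


def decode_frames(buffer: bytes):
    """Decode every complete frame in the buffer; return (frames, leftover bytes)."""
    frames = []
    offset = 0
    while offset + 4 <= len(buffer):
        (length,) = struct.unpack_from("!I", buffer, offset)
        end = offset + 4 + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append(json.loads(buffer[offset + 4:end]))
        offset = end
    return frames, buffer[offset:]
```

The receiver keeps the leftover bytes and prepends them to the next chunk read from the connection, so frames split across reads are still recovered.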

Here, the facial motion data is mapped to a preset computer graphic character in the conversation partner's video conversation apparatus, and the computer graphic character to which the mapped facial motion data is applied is output as the face image of the user.

That is, the conversation partner's video conversation apparatus stores at least one three-dimensionally rendered computer graphic character, and each computer graphic character includes position elements corresponding to the position elements described above. Accordingly, the amount of movement of each position element of the computer graphic character is changed on the conversation partner's video conversation apparatus based on the facial motion data, and a face image rendered through the computer graphic character and corresponding to the user's facial expression is output.

Meanwhile, a step of recording the user's voice data through the audio module may further be performed in parallel with step S320, so that in step S340 the voice data and the facial motion data corresponding to the recording time of the voice data can be synchronized and transmitted to the conversation partner's video conversation apparatus.

In addition, a step of receiving video conversation data (that is, the partner's face image and voice data) from the conversation partner's video conversation apparatus may be performed in parallel with steps S310 to S340. In this case, a user interface may be provided that outputs the conversation partner's face image and audio data without face conversion processing.

Returning to FIG. 1, the second video conversation apparatus 200 will now be described in detail.

The camera module 210 captures the user's face image and provides the captured face image data to the processor 250.

The audio module 220 records the user's voice data and also outputs the voice data received from the first video conversation apparatus 100 so that the user can hear it.

The communication module 230 handles data transmission and reception with the video conversation apparatus of the conversation partner participating in the video conversation (that is, the first video conversation apparatus 100) under the control of the processor 250.

The memory 240 stores a face conversion video conversation program that provides video conversation data including the user's face image to the conversation partner and outputs a face image in which the conversation partner's actual face is replaced by a CG character.

Here, the memory 240 collectively refers to a non-volatile storage device that retains stored information even when power is not supplied and a volatile storage device that requires power to retain stored information. In addition to the face conversion video conversation program, the memory 240 stores various other programs and data, which may be processed by the processor 250.

The processor 250 executes the face conversion video conversation program stored in the memory 240.

Specifically, the processor 250 performs the following operations in executing the face conversion video conversation program.

The processor 250 outputs, as the conversation partner's face image, a CG character selected by the user or selected automatically from among at least one pre-stored, three-dimensionally rendered CG character. To this end, the processor 250 may provide a user interface that allows the user to select a desired character from among the at least one CG character. In addition, when the second video conversation apparatus 200 is an individual terminal such as a smartphone or a desktop computer, the user interface provided by the processor 250 may take the form of an application (e.g., a mobile app) installed on the terminal in advance.

Here, the processor 250 may generate and provide, as CG characters, characters that are familiar to or preferred by the user, such as animation characters and celebrity characters. For example, in a language education service, the teacher's face image may be output during a video conversation as a CG character preferred by children, students, or adults, respectively. This tailors comfort, fun, and interest to the user, reduces the pressure of conversing while looking at the teacher's actual face, and improves concentration.

Such a CG character is a three-dimensional face image and includes position elements corresponding to the position elements determined for each facial-expression region element set in the first video conversation apparatus 100 described above. Each CG character may be in a state in which position element values corresponding to the reference data set in the first video conversation apparatus 100 are set as its basic facial expression.

Accordingly, the processor 250 streams out a CG character corresponding to the conversation partner's facial expression by changing the amount of movement of each position element of the CG character based on the facial motion data received from the first video conversation apparatus 100.

Hereinafter, a method of providing a face conversion video conversation service according to an embodiment of the present invention will be described in detail with reference to FIG. 4.

FIG. 4 is a flowchart illustrating a method of providing a video conversation service on the face conversion information receiver side according to an embodiment of the present invention.

First, facial motion data, which is the result of facial motion capture of the conversation partner's face image, is received continuously in real time from the conversation partner's video conversation apparatus (that is, the first video conversation apparatus 100) (S410).

Then, the facial motion data is mapped to the previously selected CG character, and a CG character corresponding to the conversation partner's facial expression is generated by changing the amount of movement of each position element set for the CG character (S420).

Here, a plurality of 3D-rendered CG characters are stored on the video conversation apparatus, and each CG character may be in a state in which reference data obtained from the conversation partner's video conversation apparatus has been applied. For example, at the time a video conversation with a given conversation partner starts, reference data for the basic facial expression may be received from the conversation partner's video conversation apparatus and applied to each CG character, so that the character's basic facial expression is set to correspond to the position elements and position element values.
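Step S420 can be pictured with the following sketch, which adds the received movement amounts to the character's basic expression. The `CGCharacter` class, its `gain` factor for scaling between the user's face and the character's face, and the tuple layout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CGCharacter:
    name: str
    neutral: dict      # position element -> (x, y, z) for the basic expression
    gain: float = 1.0  # hypothetical scale from user movement to character movement

    def pose(self, deltas):
        """Return posed position elements for one facial motion frame."""
        posed = {}
        for element, base in self.neutral.items():
            # Elements with no received movement stay at the basic expression.
            d = deltas.get(element, (0.0, 0.0, 0.0))
            posed[element] = tuple(b + self.gain * x for b, x in zip(base, d))
        return posed
```

Calling `pose` once per received frame yields the positions a renderer would use to draw the character for step S430.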

Next, the CG character to which the mapped facial motion data is applied is displayed as the conversation partner's face image (S430).

Meanwhile, in step S410, the conversation partner's voice data may be received together with the facial motion data. Accordingly, in step S430, the CG character to which the facial motion data following changes in the conversation partner's facial expression is applied is output, and the conversation partner's voice data is output at the same time. Here, the conversation partner's voice data and facial motion data may already have been synchronized based on the times at which each was generated. Alternatively, when the conversation partner's voice data and facial motion data are received separately, information on the processing time of the facial motion data and information on the time at which the voice data was generated may be received with them, and the voice data may be delayed based on these two pieces of time information so that the face image and the voice data are output in synchronization.
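The receiver-side alternative can be sketched as a playout queue that delays each audio chunk by the reported motion processing time. This assumes the generation timestamps and the receiver's `now` value are on a common timeline, which the specification does not address; `ReceiverSync` and its methods are hypothetical.

```python
import heapq


class ReceiverSync:
    """Delays audio so it plays when the matching CG character frame is shown."""

    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker so chunks with equal due times keep order

    def push_audio(self, generated_at, processing_time, samples):
        # Due time = when the voice was generated + how long motion capture took.
        due = generated_at + processing_time
        heapq.heappush(self._queue, (due, self._seq, samples))
        self._seq += 1

    def pop_due(self, now):
        """Return all audio chunks whose delayed playout time has arrived."""
        out = []
        while self._queue and self._queue[0][0] <= now:
            out.append(heapq.heappop(self._queue)[2])
        return out
```

The playback loop would call `pop_due` on each tick and pass the returned chunks to the audio output.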

In addition, in parallel with steps S410 to S430, a step of capturing the user's face image through the camera and recording the user's voice through the audio module, and a step of transmitting the video conversation data (that is, the captured face image and the recorded voice data) to the conversation partner's video conversation apparatus, may further be performed.

The face conversion video conversation apparatus using facial motion capture and the method of providing it according to the embodiments of the present invention described above may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media and removable and non-removable media. A computer-readable medium may also include both computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, other data in a modulated data signal such as a carrier wave, or other transport mechanisms, and include any information delivery media.

The foregoing description of the present invention is illustrative, and a person of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

In addition, although the methods and systems of the present invention have been described with reference to specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The scope of the present invention is defined by the claims that follow rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10: Face conversion video conversation system
100: First video conversation apparatus
200: Second video conversation apparatus

Claims

A face conversion video conversation apparatus that processes face conversion of a conversation partner using facial motion capture information, the apparatus comprising:
a communication module that handles data transmission and reception with a video conversation apparatus of a conversation partner participating in a video conversation;
a memory storing a face conversion video conversation program; and
a processor that executes the face conversion video conversation program stored in the memory,
wherein the processor, in executing the face conversion video conversation program, upon receiving facial motion data for a face image of the conversation partner from the conversation partner's video conversation apparatus through the communication module, maps the facial motion data to a preset computer graphic character and outputs the computer graphic character to which the mapped facial motion data is applied as the face image of the conversation partner, and
wherein the facial motion data is a result of facial motion capture according to depth information of a face image captured through a depth camera in the conversation partner's video conversation apparatus.

The apparatus of claim 1,
wherein a three-dimensionally rendered computer graphic character is stored,
the computer graphic character includes position elements corresponding to position elements set according to a basic facial expression of the conversation partner, and
the processor changes the amount of movement of each position element of the computer graphic character based on the facial motion data.

The apparatus of claim 2,
wherein a plurality of computer graphic characters are stored, and
the processor provides a user interface that allows a user to select any one of the plurality of computer graphic characters.

The apparatus of claim 1, further comprising:
a camera module that captures a face image of a user; and
an audio module that records voice data of the user,
wherein the processor transmits video conversation data, including the face image data captured through the camera module and the voice data recorded through the audio module, to the conversation partner's video conversation apparatus, and
the face image data of the user is output in the conversation partner's video conversation apparatus without face conversion processing.

A method of providing a face conversion video conversation service through a video conversation apparatus that processes face conversion of a conversation partner using facial motion capture information, the method comprising:
receiving facial motion data for a face image of a conversation partner from a video conversation apparatus of the conversation partner participating in a video conversation;
mapping the facial motion data to a preset computer graphic character; and
outputting the computer graphic character to which the mapped facial motion data is applied as the face image of the conversation partner,
wherein the facial motion data is a result of facial motion capture according to depth information of a face image captured through a depth camera in the conversation partner's video conversation apparatus.

The method of claim 5,
wherein the computer graphic character is three-dimensionally rendered and includes position elements corresponding to position elements set according to a basic facial expression of the conversation partner, and
the mapping of the facial motion data changes the amount of movement of each position element of the computer graphic character based on the facial motion data.

The method of claim 6, further comprising,
before the mapping of the facial motion data,
providing a user interface that allows a user to select any one of a plurality of pre-stored computer graphic characters.

A face conversion video conversation apparatus using facial motion capture, the apparatus comprising:
a communication module that handles data transmission and reception with a video conversation apparatus of a conversation partner participating in a video conversation;
a depth camera module that captures a face image of a user and extracts depth information from the captured face image;
a memory storing a face conversion video conversation program; and
a processor that executes the face conversion video conversation program stored in the memory,
wherein the processor, in executing the face conversion video conversation program, performs facial motion capture by extracting depth information from the captured face image and transmits facial motion data resulting from the facial motion capture to the conversation partner's video conversation apparatus, and
wherein the facial motion data is mapped to a preset computer graphic character in the conversation partner's video conversation apparatus, and the computer graphic character to which the mapped facial motion data is applied is output as the face image of the user.

The apparatus of claim 8,
wherein the processor:
generates reference data by extracting, through the depth camera module, depth information corresponding to a preset basic facial expression of the user;
calculates the amount of movement of each preset position element on the user's face by comparing depth information values that follow changes in the user's facial expression with the reference data; and
tracks the amounts of movement in real time and streams facial motion data including the amount of change of each position element to the conversation partner's video conversation apparatus.

The apparatus of claim 9,
wherein the processor:
captures a two-dimensional face image and a three-dimensional face image of the user at the same point in time through the depth camera module;
registers the positions in the two-dimensional face image and the three-dimensional face image with each other;
extracts preset facial-expression region elements from the two-dimensional face image; and
determines, as the position elements, the positions in the three-dimensional face image that correspond to the extracted facial-expression region elements.

The apparatus of claim 9,
wherein the conversation partner's video conversation apparatus stores at least one three-dimensionally rendered computer graphic character,
each of the at least one computer graphic character includes position elements corresponding to the position elements, and
the amount of movement of each position element of the computer graphic character is changed based on the facial motion data.

The apparatus of claim 8, further comprising
an audio module that records voice data of the user,
wherein the processor synchronizes the voice data with the facial motion data corresponding to the recording time of the voice data and transmits them to the conversation partner's video conversation apparatus.

The apparatus of claim 8,
wherein the processor, upon receiving a face image and audio data of the conversation partner from the conversation partner's video conversation apparatus through the communication module, provides a user interface that outputs the received face image and audio data of the conversation partner without face conversion processing.