KR102456438B1

KR102456438B1 - Visual wake-up system using artificial intelligence

Info

Publication number: KR102456438B1
Application number: KR1020220086404A
Authority: KR
Inventors: 조한희
Original assignee: (주)인티그리트
Priority date: 2022-07-13
Filing date: 2022-07-13
Publication date: 2022-10-19

Abstract

The present invention relates to a visual wake-up system using artificial intelligence. It aims to minimize the probability of malfunction and wake-up failure that can occur when waking up a voice recognition conversation service robot in a noisy place, and to reduce the reluctance and discomfort that users may feel with the existing method of waking up by voice command alone. According to an embodiment, the visual wake-up system using artificial intelligence comprises: an image data acquisition unit that acquires image data through a camera built into a voice recognition service device; a visual information recognition unit that uses an image recognition solution built into the voice recognition service device to recognize, from the image data, at least one item of preset visual information among a user's face, gaze, and motion, and determines whether the recognized visual information satisfies a preset visual recognition condition; and a wake-up execution unit that executes a wake-up for driving the voice recognition service device when the visual information satisfies the visual recognition condition.

Description

Visual wake-up system using artificial intelligence {VISUAL WAKE-UP SYSTEM USING ARTIFICIAL INTELLIGENCE}

An embodiment of the present invention relates to a visual wake-up system using artificial intelligence.

Recently, to increase user convenience, voice recognition conversation service robots have been developed and deployed. Instead of receiving a command through manual operation of an input device, such a robot recognizes the user's voice when the user utters a desired command, identifies the command contained in the utterance, and provides a service corresponding to the user's intention.

Such voice recognition conversation service robots may include robots of various types and purposes, such as a guide robot placed at a specific location to provide users with various information, a home robot provided in a household, and an educational robot that guides or assists a learner's study through interaction with the learner.

When such a voice recognition conversation service robot is operated outdoors or in a large indoor space where many people come and go, and is woken up by a voice command, the various noises generated at that location make it highly likely that a malfunction will occur or that the wake-up will not be executed properly.

In addition, when only the voice command method is available for initially activating the voice recognition conversation service robot, users may feel reluctant or uncomfortable about waking it up in that way.

Laid-open Patent Publication No. 10-2018-0111859 (publication date: October 11, 2018)
Laid-open Patent Publication No. 10-2020-0075721 (publication date: June 26, 2020)

An embodiment of the present invention provides a visual wake-up system using artificial intelligence that minimizes the probability of malfunction and wake-up failure that can occur when waking up a voice recognition conversation service robot in a noisy place, and that reduces the reluctance and discomfort users may feel with the existing method of waking up by voice command alone.

A visual wake-up system using artificial intelligence according to an embodiment of the present invention includes: an image data acquisition unit that acquires image data through a camera built into a voice recognition service device; a visual information recognition unit that uses an image recognition solution built into the voice recognition service device to recognize, from the image data, at least one item of preset visual information among the user's face, gaze, and motion, and determines whether the recognized visual information satisfies a preset visual recognition condition; and a wake-up execution unit that executes a wake-up for driving the voice recognition service device when the visual information satisfies the visual recognition condition.

In addition, the system may further include: a voice data acquisition unit that acquires voice data in the form of text data through a microphone, an ADC, and an STT module built into the voice recognition service device; and a voice information recognition unit that determines whether the voice data matches any one of a plurality of preset wake-up text data. The wake-up execution unit may execute the wake-up for driving the voice recognition service device when the voice data matches any one of the plurality of wake-up text data and the visual information satisfies the visual recognition condition.

In addition, the system may further include a wake-up execution condition adjustment unit. Upon recognizing at least one of a first state in which the function of the voice information recognition unit is set to off, a second state in which two or more pieces of voice data are generated through the voice information recognition unit, and a third state in which a preset grammar check finds that the voice data includes ungrammatical content, this unit adjusts the execution condition of the wake-up execution unit so that the wake-up operation is executed based solely on the determination result of the visual information recognition unit.
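The decision logic described in the two paragraphs above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the names `VoiceStatus`, `visual_only_mode`, and `should_wake` are hypothetical, and the grammar check is represented by a boolean supplied by the caller because the text does not say how it is performed.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class VoiceStatus:
    """Hypothetical snapshot of the voice path at decision time."""
    recognizer_enabled: bool   # False corresponds to the first state
    transcripts: List[str]     # two or more entries correspond to the second state
    grammar_ok: bool           # False corresponds to the third state


def visual_only_mode(voice: VoiceStatus) -> bool:
    """True when any of the three states calls for ignoring the voice result."""
    return (
        not voice.recognizer_enabled
        or len(voice.transcripts) >= 2
        or not voice.grammar_ok
    )


def should_wake(voice: VoiceStatus, wake_phrases: Set[str], visual_ok: bool) -> bool:
    """Default: voice match AND visual condition; fallback: visual condition only."""
    if visual_only_mode(voice):
        return visual_ok
    voice_ok = any(t.strip() in wake_phrases for t in voice.transcripts)
    return voice_ok and visual_ok
```

With these definitions, a single clean transcript that matches a wake-up phrase still needs the visual condition, whereas a disabled recognizer, overlapping transcripts, or a failed grammar check leaves the visual result as the only criterion.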

In addition, the visual information recognition unit may store, as the visual recognition conditions, a first visual recognition condition in which the frontal face of the user is exposed at a preset ratio or more for a preset time or longer, a second visual recognition condition in which the user's gaze stays on the camera for a preset time or longer, and a third visual recognition condition in which at least one preset motion among a hand gesture and clapping by the user is recognized. At least one of the first, second, and third visual recognition conditions may be preset as the visual recognition condition according to the user's selection, and the unit may determine whether the visual information satisfies any one of the preset visual recognition conditions.

In addition, the visual information recognition unit may additionally store, as a visual recognition condition, a fourth visual recognition condition in which a finger object is additionally recognized within the user's frontal face recognition area. At least one of the first, second, third, and fourth visual recognition conditions may be preset as the visual recognition condition according to the user's selection, and the unit may determine whether the visual information satisfies any one of the preset visual recognition conditions.

In addition, the visual information recognition unit may additionally store, as a visual recognition condition, a fifth visual recognition condition in which a preset motion of a finger object is additionally recognized within the user's frontal face recognition area. At least one of the first, second, third, and fifth visual recognition conditions may be preset as the visual recognition condition according to the user's selection, and the unit may determine whether the visual information satisfies any one of the preset visual recognition conditions.

In addition, the visual information recognition unit may additionally store, as a visual recognition condition, a sixth visual recognition condition in which a finger object is additionally recognized in a preset quadrant among the quadrants formed about the center point of the recognized frontal face of the user. At least one of the first, second, third, and sixth visual recognition conditions may be preset as the visual recognition condition according to the user's selection, and the unit may determine whether the visual information satisfies any one of the preset visual recognition conditions.

In addition, the visual information recognition unit may additionally store, as a visual recognition condition, a seventh visual recognition condition in which a preset motion of a finger object is additionally recognized in a preset quadrant among the quadrants formed about the center point of the recognized frontal face of the user. At least one of the first, second, third, and seventh visual recognition conditions may be preset as the visual recognition condition according to the user's selection, and the unit may determine whether the visual information satisfies any one of the preset visual recognition conditions.

According to the present invention, it is possible to provide a visual wake-up system using artificial intelligence that minimizes the probability of malfunction and wake-up failure that can occur when waking up a voice recognition conversation service robot in a noisy place, and that reduces the reluctance and discomfort users may feel with the existing method of waking up by voice command alone.

FIG. 1 is a diagram illustrating the device environment to which a visual wake-up system using artificial intelligence according to an embodiment of the present invention is applied, and its general method of operation.
FIG. 2 is a block diagram showing the overall configuration of a visual wake-up system using artificial intelligence according to an embodiment of the present invention.
FIGS. 3 to 9 are diagrams illustrating the first to seventh visual recognition conditions according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating the operation of the wake-up execution condition adjustment unit according to an embodiment of the present invention.

The terms used in this specification will be briefly described, and then the present invention will be described in detail.

The terms used in the present invention have been selected, as far as possible, from general terms that are currently in wide use, taking into account their functions in the present invention; however, they may vary depending on the intention of those working in the field, on precedent, on the emergence of new technology, and so on. In certain cases, a term has been arbitrarily chosen by the applicant, and in such cases its meaning is described in detail in the corresponding part of the description. Therefore, the terms used in the present invention should be defined not by their names alone but by the meaning they carry and the overall content of the present invention.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts of the drawings unrelated to the description have been omitted, and similar parts are given similar reference numerals throughout the specification.

FIG. 1 is a diagram illustrating the device environment to which a visual wake-up system using artificial intelligence according to an embodiment of the present invention is applied, and its general method of operation.

Referring to FIG. 1, the visual wake-up system 1000 using artificial intelligence according to an embodiment of the present invention may be implemented in hardware and software within the voice recognition service device 10, and may perform the visual wake-up function by using various components built into the voice recognition service device 10.

The voice recognition service device 10 may include voice recognition conversation service robots of various types and purposes, such as a guide robot that provides users with various information, a home robot provided in a household, and an educational robot that guides or assists a learner's study through interaction with the learner.

In addition, the visual wake-up system 1000 using artificial intelligence may use the various hardware and software components built into the voice recognition service device 10 to execute a wake-up operation based on visual recognition of the user, voice recognition, or a combination of visual and voice recognition. More specifically, it may perform the visual wake-up operation according to this embodiment by using the camera 1, the image recognition solution 2, the microphone 3, the STT module 5, and the like of the voice recognition service device 10. Because the camera 1, the image recognition solution 2, the microphone 3, and the STT module 5 must detect wake-up information for the voice recognition service device 10, they can run even in the sleep state. Once the device is activated by the wake-up, the main operation of the artificial-intelligence-based voice recognition service can begin.

FIG. 2 is a block diagram showing the overall configuration of a visual wake-up system using artificial intelligence according to an embodiment of the present invention, FIGS. 3 to 9 are diagrams illustrating the first to seventh visual recognition conditions according to an embodiment of the present invention, and FIG. 10 is a flowchart illustrating the operation of the wake-up execution condition adjustment unit according to an embodiment of the present invention.

Referring to FIG. 2, the visual wake-up system 1000 using artificial intelligence of the present invention may include at least one of an image data acquisition unit 100, a visual information recognition unit 200, a voice data acquisition unit 300, a voice information recognition unit 400, a wake-up execution unit 500, and a wake-up execution condition adjustment unit 600.

The image data acquisition unit 100 may acquire image data through the camera 1 built into the voice recognition service device 10. Although not shown, the camera 1 may include a human body detection sensor (not shown) or the like; when the sensor detects an object within a certain distance, the camera is activated and photographs the object within its imaging area. When image data is generated by the camera 1 in the sleep state, the image data acquisition unit 100 receives that image data from the camera 1 and passes it to the visual information recognition unit 200 to determine whether the visual recognition condition for wake-up is satisfied. Here, it is assumed that the image data acquisition unit 100 secures approximately 3 to 5 seconds of image data and that the visual recognition condition can be determined on the image data captured within that time. The duration of the image data is of course not limited to 3 to 5 seconds; it can be reset to any duration long enough for the visual recognition condition to be determined from the image data.
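The acquisition step above can be illustrated with a short sketch. It assumes, purely for illustration, that the proximity sensor reports a distance in meters and that the camera yields `(timestamp, image)` pairs; the names `should_start_capture` and `collect_clip` and the 1.5 m trigger distance are hypothetical and do not come from the specification.

```python
from typing import Iterable, List, Tuple

# (timestamp in seconds, opaque image payload); the payload type is left open.
Frame = Tuple[float, object]


def should_start_capture(distance_m: float, trigger_distance_m: float = 1.5) -> bool:
    """Start the camera once an object is within the trigger distance (assumed value)."""
    return distance_m <= trigger_distance_m


def collect_clip(frames: Iterable[Frame], clip_seconds: float = 3.0) -> List[Frame]:
    """Collect frames until the clip spans clip_seconds, measured from the first frame."""
    clip: List[Frame] = []
    start = None
    for timestamp, image in frames:
        if start is None:
            start = timestamp
        if timestamp - start > clip_seconds:
            break
        clip.append((timestamp, image))
    return clip
```

The `clip_seconds` parameter reflects the statement that the 3 to 5 second window is adjustable; the resulting clip is what would be handed to the visual information recognition unit 200.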

Meanwhile, when the human body detection sensor detects an object within a certain distance of the voice recognition service device 10, the camera 1 operates, and the image captured by the camera 1 is output to a display (not shown) of the voice recognition service device 10. The display may show a guidance message and various guide information for wake-up, helping the user input the wake-up information more easily. The display may also indicate when input of the visual information starts and ends and that input is in progress, and the image data captured during that input period may be passed to the image data acquisition unit 100.

The visual information recognition unit 200 may use the image recognition solution 3 built into the voice recognition service device 10 to recognize, from the image data, at least one item of preset visual information among the user's face, gaze, and motion, and may determine whether the recognized visual information satisfies a preset visual recognition condition.

The visual information recognition unit 200 according to this embodiment stores the first, second, and third visual recognition conditions as the basic visual recognition conditions, and may work with the image recognition solution 3 to determine whether the user's visual information satisfies the first to third visual recognition conditions.

More specifically, the first visual recognition condition may be, as shown in FIG. 3, a condition in which the frontal face of the user is exposed at a preset ratio (about 90%) or more for a preset time (about 3 seconds) or longer; the second visual recognition condition may be, as shown in FIG. 4, a condition in which the user's gaze stays on the camera 1 for a preset time (about 3 seconds) or longer; and the third visual recognition condition may be, as shown in FIG. 5, a condition in which at least one preset motion among a hand gesture and clapping by the user is recognized.
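The first and second conditions both reduce to "a per-frame property holds continuously for a minimum time". A minimal sketch of that check follows, assuming an upstream recognizer already supplies a frontal-face ratio and a gaze flag for each frame; the `Observation` type and function names are hypothetical, and the default thresholds simply reuse the approximate values (90%, 3 seconds) given above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Observation:
    t: float              # timestamp in seconds
    frontal_ratio: float  # fraction of the frontal face visible, 0.0 to 1.0
    gazing: bool          # whether the gaze is judged to be on the camera


def longest_run(obs: Sequence[Observation], pred: Callable[[Observation], bool]) -> float:
    """Longest continuous span, in seconds, over which pred holds for every frame."""
    best = 0.0
    run_start = None
    for o in obs:
        if pred(o):
            if run_start is None:
                run_start = o.t
            best = max(best, o.t - run_start)
        else:
            run_start = None
    return best


def first_condition(obs: Sequence[Observation], min_ratio: float = 0.9,
                    min_seconds: float = 3.0) -> bool:
    return longest_run(obs, lambda o: o.frontal_ratio >= min_ratio) >= min_seconds


def second_condition(obs: Sequence[Observation], min_seconds: float = 3.0) -> bool:
    return longest_run(obs, lambda o: o.gazing) >= min_seconds
```

Measuring the longest uninterrupted run, rather than the total time, is a design choice of this sketch; the text only states that the exposure or gaze must last for the preset time or longer.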

Here, at least one of the first, second, and third visual recognition conditions may be set as the visual recognition condition according to the user's selection. That is, any one of the first to third visual recognition conditions may be set as the basic visual recognition condition, and a combination of two or more may also be set depending on the situation.
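The selection mechanism can be sketched as a lookup over per-condition results. The text does not state whether a combination of two or more conditions requires all of them or any one of them, so the `require_all` flag below is an assumption added for illustration, and `visual_condition_met` is a hypothetical name.

```python
from typing import Dict, Iterable


def visual_condition_met(results: Dict[int, bool], selected: Iterable[int],
                         require_all: bool = False) -> bool:
    """Evaluate only the conditions the user selected.

    results maps a condition number (1, 2, 3, ...) to whether it was satisfied.
    require_all=False treats the selection as "any one suffices";
    require_all=True treats it as a combination that must hold together (assumed).
    """
    checks = [results.get(number, False) for number in selected]
    if not checks:
        return False  # nothing selected, so nothing can trigger a wake-up
    return all(checks) if require_all else any(checks)
```

For example, with results `{1: True, 2: False, 3: False}` and conditions 1 and 2 selected, the any-of reading passes while the all-of reading does not.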

The visual information recognition unit 200 may additionally store, as a visual recognition condition, a fourth visual recognition condition in which a finger object is additionally recognized within the user's frontal face recognition area.

A case in which the visual information satisfies the fourth visual recognition condition is as follows. As shown in FIG. 6, when a face (frontal face) is first recognized in the image data, a recognition area for the recognized face (face recognition area F) is defined; if a finger object (more precisely, a hand) is then recognized within the face recognition area F, the user's visual information may be determined to satisfy the fourth visual recognition condition. Conversely, if the hand is located outside the face recognition area F and is therefore not recognized within it, the visual information does not satisfy the fourth visual recognition condition, even if the user's hand was actually captured by the camera 1.
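The fourth condition amounts to a containment test between two detections. A minimal sketch follows, assuming the recognizer returns axis-aligned bounding boxes in pixel coordinates; treating the hand as "within" the area F when its box center lies inside F is an assumption of this sketch, and `Box` and `fourth_condition` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def fourth_condition(face: Optional[Box], hand: Optional[Box]) -> bool:
    """Satisfied only if a face was found first and the hand center lies inside F."""
    if face is None or hand is None:
        return False
    return face.contains(*hand.center())
```

A hand that is visible in the frame but outside the face box returns `False`, matching the negative case described above.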

Meanwhile, at least one of the first, second, third, and fourth visual recognition conditions may be preset as the visual recognition condition according to the user's selection, and the visual information recognition unit 200 may operate to determine whether the visual information satisfies any one of the preset visual recognition conditions.

Here, at least one of the first, second, third, and fourth visual recognition conditions may be set as the visual recognition condition according to the user's selection. That is, any one of the first to fourth visual recognition conditions may be set as the basic visual recognition condition, and a combination of two or more may also be set depending on the situation.

The visual information recognition unit 200 may additionally store, as a visual recognition condition, a fifth visual recognition condition in which a preset motion of a finger object is additionally recognized within the user's frontal face recognition area.

A case in which the visual information satisfies the fifth visual recognition condition is as follows. As shown in FIG. 7, when a face (frontal face) is first recognized in the image data, a recognition area for the recognized face (face recognition area F) is defined; if a finger object (more precisely, a hand) forming a specific shape (an "OK" sign) is then recognized moving in a specific direction within the face recognition area F, the user's visual information may be determined to satisfy the fifth visual recognition condition. Conversely, if the hand is located and moves outside the face recognition area F and is therefore not recognized within it, the visual information does not satisfy the fifth visual recognition condition, even if the user's hand was actually captured by the camera 1.
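The fifth condition adds a shape and a direction of travel to the containment test. The sketch below assumes an upstream recognizer labels each hand sample with a shape string and a center position; the `HandSample` type, the `"ok"` label, the unit direction vector, and the 30-pixel minimum travel are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class HandSample:
    x: float    # hand center, pixels
    y: float
    shape: str  # label from a hypothetical hand-shape classifier


def fifth_condition(face: Tuple[float, float, float, float],
                    track: Sequence[HandSample],
                    required_shape: str = "ok",
                    direction: Tuple[float, float] = (1.0, 0.0),
                    min_travel: float = 30.0) -> bool:
    """True if the hand keeps the required shape inside F and moves far enough
    along direction (assumed to be a unit vector) between its first and last
    qualifying samples."""
    fx, fy, fw, fh = face  # left, top, width, height of area F
    inside = [
        s for s in track
        if s.shape == required_shape
        and fx <= s.x <= fx + fw
        and fy <= s.y <= fy + fh
    ]
    if len(inside) < 2:
        return False
    dx = inside[-1].x - inside[0].x
    dy = inside[-1].y - inside[0].y
    travel = dx * direction[0] + dy * direction[1]  # projection onto direction
    return travel >= min_travel
```

Samples outside the area F or with a different shape are simply ignored, so a gesture made beside the face never accumulates travel.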

Meanwhile, at least one of the first, second, third, and fifth visual recognition conditions may be preset as the visual recognition condition according to the user's selection, and the visual information recognition unit 200 may operate to determine whether the visual information satisfies any one of the preset visual recognition conditions.

Here, any one of the first, second, third, and fifth visual recognition conditions may be set as the basic visual recognition condition, and a combination of two or more may also be set depending on the situation.

The visual information recognition unit 200 may additionally store, as a visual recognition condition, a sixth visual recognition condition in which a finger object is additionally recognized in a preset quadrant among the quadrants formed about the center point of the recognized frontal face of the user.

A case in which the visual information satisfies the sixth visual recognition condition is described with reference to FIG. 8. When a face (frontal face) is first recognized in the image data, the recognition area for the recognized face (face recognition area F) and the center point C of the face recognition area F are each defined, and the image is divided into four quadrants about the center point C. If a finger object (more precisely, a hand) is recognized within, for example, the third quadrant (③), the user's visual information may be determined to satisfy the sixth visual recognition condition. Conversely, if the hand is located outside the third quadrant (③), so that no hand is recognized within it, the visual information does not satisfy the sixth visual recognition condition, even if the user's hand is actually located in one of the first, second, and fourth quadrants (①, ②, ④).
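The quadrant test can be illustrated with a short sketch. The quadrant numbering used here (1 = upper right, 2 = upper left, 3 = lower left, 4 = lower right, as seen in the image) is an assumption borrowed from the usual mathematical convention; the numbering actually shown in FIG. 8 may differ. The function names are hypothetical, and points lying exactly on a dividing line are arbitrarily assigned to the right or lower side.

```python
from typing import Tuple

Point = Tuple[float, float]


def quadrant_of(point: Point, center: Point) -> int:
    """Return the quadrant (1 to 4) of `point` relative to the face center C.

    Image coordinates are assumed: x grows to the right, y grows downward.
    """
    x, y = point
    cx, cy = center
    right = x >= cx
    upper = y < cy  # a smaller y is higher in the image
    if upper and right:
        return 1
    if upper and not right:
        return 2
    if not upper and not right:
        return 3
    return 4


def satisfies_sixth_condition(hand_center: Point,
                              face_center: Point,
                              target_quadrant: int = 3) -> bool:
    """True only when the hand lies in the preset quadrant."""
    return quadrant_of(hand_center, face_center) == target_quadrant
```

With the center at (200, 200), a hand at (150, 260) falls in quadrant 3 under this convention and satisfies the condition, while a hand at (250, 150) falls in quadrant 1 and does not.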

Meanwhile, the visual information recognition unit 200 may operate such that at least one of the first, second, third, and sixth visual recognition conditions is preset as the visual recognition condition according to the user's selection, and may determine whether the visual information satisfies any one of the preset visual recognition conditions.

Here, at least one of the first, second, third, and sixth visual recognition conditions may be set as the visual recognition condition according to the user's selection. That is, any one of these conditions may be set as the default visual recognition condition, and a combination of two or more may also be set depending on the situation.

The visual information recognition unit 200 may additionally store, as a visual recognition condition, a seventh visual recognition condition in which a preset motion of a finger object is additionally recognized in a preset quadrant among the quadrants formed about the center point of the recognized user's frontal face.

A case in which the visual information satisfies the seventh visual recognition condition is described with reference to FIG. 9. When a face (frontal face) is first recognized in the image data, the recognition area for the recognized face (face recognition area F) and the center point C of the face recognition area F are each defined, and the image is divided into four quadrants about the center point C. If a finger object (more precisely, a hand) is recognized within, for example, the third quadrant (③) making a specific shape (an OK sign) and moving in a specific direction, the user's visual information may be determined to satisfy the seventh visual recognition condition. Conversely, if the hand is located outside the third quadrant (③), so that no hand is recognized within it, the visual information does not satisfy the seventh visual recognition condition, even if the user's hand is actually located in one of the first, second, and fourth quadrants (①, ②, ④).

Meanwhile, the visual information recognition unit 200 may operate such that at least one of the first, second, third, and seventh visual recognition conditions is preset as the visual recognition condition according to the user's selection, and may determine whether the visual information satisfies any one of the preset visual recognition conditions.

Here, at least one of the first, second, third, and seventh visual recognition conditions may be set as the visual recognition condition according to the user's selection. That is, any one of these conditions may be set as the default visual recognition condition, and a combination of two or more may also be set depending on the situation.
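The selection mechanism, in which the user presets one or more conditions and the visual information is judged against them, might be sketched as below. The dictionary keys, thresholds, and gesture labels are hypothetical. The sketch also assumes that a combination of conditions is evaluated as "any one satisfied", following the wording of the determination step; an embodiment could equally require all selected conditions to hold.

```python
from typing import Callable, Dict, Iterable

# A condition receives per-frame recognition results and reports whether it is met.
Condition = Callable[[dict], bool]


def first_condition(info: dict) -> bool:
    # Frontal face exposed at a preset ratio for a preset time (values assumed).
    return (info.get("frontal_face_ratio", 0.0) >= 0.8
            and info.get("face_seconds", 0.0) >= 2.0)


def second_condition(info: dict) -> bool:
    # Gaze held on the camera for a preset time (value assumed).
    return info.get("gaze_seconds", 0.0) >= 2.0


def third_condition(info: dict) -> bool:
    # A preset hand gesture or clap was recognized (labels assumed).
    return info.get("gesture") in {"wave", "clap"}


CONDITIONS: Dict[int, Condition] = {
    1: first_condition,
    2: second_condition,
    3: third_condition,
}


def visual_condition_met(info: dict, selected: Iterable[int]) -> bool:
    """True if any user-selected condition is satisfied by `info`."""
    return any(CONDITIONS[number](info) for number in selected)
```

Further conditions, such as the quadrant-based ones, would be registered under their own numbers in the same table.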

Using the microphone 3, the ADC 4, and the STT module 5 built into the voice recognition service device 10, the voice data acquisition unit 300 may convert a voice signal (sound wave) input by the user into an electrical signal, convert the electrical signal into a digital signal, and then convert the digital signal into voice data in the form of text data. Although not shown, a human body detection sensor may be installed in the voice recognition service device 10. When the sensor detects an object (user) within a certain distance (or within a specific area), the microphone 3, the ADC 4, and the STT module 5 are driven to carry out the following process: conversion of the externally input voice into an electrical (analog) signal, conversion of the electrical signal into a digital signal (voice data) through sampling and quantization, and conversion of the voice data into text, which yields voice data in the form of text data.
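The sampling and quantization step performed by the ADC can be illustrated in isolation. This is a generic sketch of uniform quantization to signed integers, not the behavior of any particular ADC 4; the 16 kHz sample rate, the 16-bit depth, and the function name are assumptions.

```python
import math
from typing import Callable, List


def sample_and_quantize(signal: Callable[[float], float],
                        num_samples: int,
                        sample_rate_hz: int = 16000,
                        bits: int = 16) -> List[int]:
    """Sample an analog signal (amplitude in [-1, 1]) at a fixed rate and
    quantize each sample to a signed integer of the given bit depth."""
    max_level = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    samples = []
    for i in range(num_samples):
        t = i / sample_rate_hz                   # sampling instant in seconds
        value = max(-1.0, min(1.0, signal(t)))   # clip out-of-range amplitudes
        samples.append(round(value * max_level))
    return samples


# Example: 10 ms (160 samples at 16 kHz) of a 440 Hz tone.
tone = sample_and_quantize(lambda t: math.sin(2 * math.pi * 440 * t), 160)
```

The resulting integer sequence is the kind of digital signal that an STT module would then convert to text.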

At this time, when the human body detection sensor (not shown) operates while the voice recognition service device 10 is in a sleep state and voice data is generated through the microphone 3 and the STT module 5, the voice data acquisition unit 300 may acquire that voice data and pass it to the voice information recognition unit 400, which determines whether the voice recognition condition for wake-up is satisfied.

The voice information recognition unit 400 may determine whether the voice data received from the voice data acquisition unit 300 matches any one of a plurality of preset wake-up text data.

For example, if the input voice data reads "Hello robot, wake up" and the wake-up text data are "Hello friend, wake up", "Hello robot, wake up", "Hello robot", and so on, a matching wake-up text datum ("Hello robot, wake up") exists, so the input voice data may be determined to match the pre-stored text data. Of course, the text need not match exactly 100%; whether the voice data and the wake-up text data match within a preset similarity range may also be determined through natural language processing such as morphological analysis and semantic analysis.
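Matching within a preset similarity range can be sketched as follows. The description refers to morphological and semantic analysis; the sketch below substitutes a much simpler character-level similarity from Python's standard `difflib` module purely for illustration. The phrase list is taken from the example above in English, and the 0.85 threshold and function names are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable

# Wake-up phrases from the example, lowercased and without punctuation.
WAKE_PHRASES = ["hello friend wake up", "hello robot wake up", "hello robot"]


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.split()).lower()


def matches_wake_phrase(transcript: str,
                        phrases: Iterable[str] = WAKE_PHRASES,
                        threshold: float = 0.85) -> bool:
    """True if the transcript is similar enough to any wake-up phrase."""
    heard = normalize(transcript)
    return any(
        SequenceMatcher(None, heard, normalize(phrase)).ratio() >= threshold
        for phrase in phrases
    )
```

An exact transcript such as "Hello robot wake up" matches with a ratio of 1.0, and a transcript with a small recognition error still passes the threshold, whereas unrelated speech does not.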

When the visual information satisfies the visual recognition condition, the wake-up execution unit 500 may execute a wake-up for driving the voice recognition service device 10. By default, the wake-up execution unit 500 may execute the wake-up when the voice data matches any one of the plurality of wake-up text data (that is, the user's voice is recognized for wake-up) and the visual information satisfies the visual recognition condition (that is, the user is visually recognized for wake-up). Here, for visual information recognition, the first to seventh visual recognition conditions may be set alternatively or in combination as described above; for voice recognition, the wake-up may also be executed with voice recognition omitted, depending on the surroundings of the voice recognition service device 10. This is described in more detail below in connection with the wake-up execution condition adjustment unit 600.

As shown in FIG. 10, when the wake-up execution condition adjustment unit 600 recognizes at least one of a first state (S602) in which the function of the voice information recognition unit 400 is set to off, a second state (S603) in which two or more pieces of voice data are generated through the voice information recognition unit 400, and a third state (S604) in which a preset grammar check finds that the voice data contains ungrammatical content, it may adjust the execution condition of the wake-up execution unit 500 so that the wake-up operation is executed based solely on the determination result of the visual information recognition unit 200.

More specifically, when the voice recognition service device 10 is deployed in a noisy place, a wake-up through the voice information recognition unit 400 may malfunction, and a user whose voice is picked up together with other noise may be inconvenienced by voice recognition errors. In such an environment, the function of the voice information recognition unit 400 may therefore be turned off at the administrator's discretion. The voice information recognition unit 400 is a component dedicated to wake-up and may be operated separately from the voice recognition conversation service and the like that run after the voice recognition service device 10 has woken up. This case corresponds to the first state (S602), in which only the visual recognition result (S601) is input.

As for the second state (S603), when the voice recognition service device 10 is deployed in a noisy place as described above and another voice, for example "Come here", is mixed into the user's utterance "Hello robot, wake up", the input may be recognized as "Hello robot, wake up, come here". This is not a grammatical error, but two or more voice commands capable of triggering a wake-up have been input, which corresponds to the second state (S603).

As for the third state (S604), when the voice recognition service device 10 is deployed in a noisy place as described above and another voice, for example "Come here", is mixed into the user's utterance "Hello robot, wake up", the input may be recognized as "Hello come here robot wake up". This is recognized as a grammatical error and therefore corresponds to the third state (S604).

In this way, when the wake-up execution condition adjustment unit 600 recognizes any one of the first state (S602), the second state (S603), and the third state (S604) described above, it may adjust the execution condition of the wake-up execution unit 500 so that the wake-up operation is executed based solely on the determination result of the visual information recognition unit 200.
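The interplay between the wake-up execution unit 500 and the wake-up execution condition adjustment unit 600 can be summarized in a short sketch. The `VoiceStatus` fields and function names are hypothetical; how the utterance count and the grammar check are actually obtained is left to the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceStatus:
    recognizer_enabled: bool    # False corresponds to the first state (S602)
    utterance_count: int        # two or more corresponds to the second state (S603)
    grammar_ok: bool            # False corresponds to the third state (S604)
    matches_wake_phrase: bool   # result of the wake-up text comparison


def visual_only(voice: VoiceStatus) -> bool:
    """True if any of the three states is present, so that the wake-up
    should depend on the visual determination alone."""
    return (not voice.recognizer_enabled
            or voice.utterance_count >= 2
            or not voice.grammar_ok)


def should_wake(visual_ok: bool, voice: VoiceStatus) -> bool:
    """Default rule: voice match AND visual match.
    Adjusted rule (S602 to S604): visual match only."""
    if visual_only(voice):
        return visual_ok
    return visual_ok and voice.matches_wake_phrase
```

For instance, with the recognizer turned off, a satisfied visual condition alone triggers the wake-up; with a clean single utterance that does not match any wake-up phrase, the default rule applies and the device stays asleep.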

What has been described above is merely one embodiment for implementing the visual wake-up system using artificial intelligence according to the present invention. The present invention is not limited to this embodiment; as claimed in the following claims, the technical spirit of the present invention extends to the range within which anyone with ordinary knowledge in the field to which the invention pertains can make various modifications without departing from the gist of the invention.

1000: Visual wake-up system using artificial intelligence
100: image data acquisition unit
200: visual information recognition unit
300: voice data acquisition unit
400: voice information recognition unit
500: wake-up execution unit
600: wake-up execution condition adjustment unit
10: voice recognition service device
1: camera
2: image recognition solution
3: microphone
4: ADC
5: STT module
F: face recognition area
C: face center point

Claims

A visual wake-up system using artificial intelligence, comprising:
an image data acquisition unit that acquires image data through a camera built into a voice recognition service device;
a visual information recognition unit that, using an image recognition solution built into the voice recognition service device, recognizes from the image data at least one piece of visual information among a preset face, gaze, and motion of a user, and determines whether the recognized visual information satisfies a preset visual recognition condition; and
a wake-up execution unit that executes a wake-up for driving the voice recognition service device when the visual information satisfies the visual recognition condition,
wherein the visual information recognition unit
stores, as the visual recognition condition: a first visual recognition condition in which the frontal face of the user is exposed at a preset ratio and for a preset time or longer; a second visual recognition condition in which the user's gaze is directed at the camera for a preset time or longer; a third visual recognition condition in which at least one motion among a preset hand gesture and a clap of the user is recognized; a fourth visual recognition condition in which a finger object is additionally recognized within a frontal face recognition area of the user; a fifth visual recognition condition in which a preset motion of a finger object is additionally recognized within the frontal face recognition area of the user; a sixth visual recognition condition in which, after the frontal face recognition area and a center point of the frontal face recognition area are each defined, a finger object is additionally recognized in a preset one of the quadrants formed about the center point; and a seventh visual recognition condition in which, after the frontal face recognition area and the center point of the frontal face recognition area are each defined, a preset motion of a finger object is additionally recognized in a preset one of the quadrants formed about the center point, and
determines, with at least one of the first visual recognition condition, the second visual recognition condition, the third visual recognition condition, the fourth visual recognition condition, the fifth visual recognition condition, the sixth visual recognition condition, and the seventh visual recognition condition preset as the visual recognition condition according to the user's selection, whether the visual information satisfies any one of the preset visual recognition conditions.

The visual wake-up system using artificial intelligence of claim 1, further comprising:
a voice data acquisition unit that acquires voice data in the form of text data through a microphone, an ADC, and an STT module built into the voice recognition service device; and
a voice information recognition unit that determines whether the voice data matches any one of a plurality of preset wake-up text data,
wherein the wake-up execution unit
executes the wake-up for driving the voice recognition service device when the voice data matches any one of the plurality of wake-up text data and the visual information satisfies the visual recognition condition.

The visual wake-up system using artificial intelligence of claim 2, further comprising:
a wake-up execution condition adjustment unit that, upon recognizing at least one of a first state in which the function of the voice information recognition unit is set to off, a second state in which two or more pieces of the voice data are generated through the voice information recognition unit, and a third state in which a preset grammar check finds that the voice data contains ungrammatical content, adjusts the execution condition of the wake-up execution unit so that the wake-up operation is executed based solely on the determination result of the visual information recognition unit.

(Deleted)