KR20220019340A

KR20220019340A - Apparatus and method for monitoring emotion change through user's mobility and voice analysis in personal space observation image

Info

Publication number: KR20220019340A
Application number: KR1020200099655A
Authority: KR
Inventors: 신사임; 장진예; 정민영; 정혜동; 김보은; 김산
Original assignee: 한국전자기술연구원
Priority date: 2020-08-10
Filing date: 2020-08-10
Publication date: 2022-02-17

Abstract

The present invention relates to an apparatus for monitoring emotion change through mobility and voice analysis to accurately determine a patient's condition. According to the present invention, the apparatus comprises: a communication unit for communication; a camera unit continuously photographing a user's predetermined space to generate an observation image; an audio unit continuously receiving voices in the space to generate an observation voice; a mobility analysis unit generating a mobility pattern of the user on the basis of the observation image and analyzing the generated mobility pattern; a speech analysis unit generating a speech pattern of the user on the basis of the observation voice and analyzing the generated speech pattern; an emotion analysis unit analyzing the user's emotional state on the basis of the mobility pattern and the speech pattern through a neural network in which the user's emotional state corresponding to the mobility pattern and the speech pattern has been learned; and a report unit transmitting the analyzed emotional state to a guardian device and a remote medical treatment device through the communication unit.

Description

Apparatus and method for monitoring emotion change through user's mobility and voice analysis in personal space observation image

The present invention relates to a technique for monitoring emotional change and, more particularly, to an apparatus and method for monitoring emotional change through analysis of a user's mobility and voice in a personal space observation image.

With the recent aging of the population and the shift toward nuclear families, more elderly people live alone without a caregiver who can help them. When an emergency occurs, such elderly people, or mentally ill patients with severe mood swings, are restricted in their actions and cannot obtain urgent emergency help, so precious lives are lost.

Korean Patent Publication No. 1952281, published on February 20, 2019 (Title: Elderly life detection server and operating method thereof)

An object of the present invention is to provide an apparatus and method for monitoring emotional change through analysis of a user's mobility and voice in a personal space observation image.

To achieve the above object, an apparatus for monitoring emotional change through mobility and voice analysis according to a preferred embodiment of the present invention includes: a communication unit for communication; a camera unit that continuously photographs a predetermined space of a user to generate an observation image; an audio unit that continuously receives voice in the space to generate an observation voice; a mobility analysis unit that generates a mobility pattern of the user based on the observation image and analyzes the generated mobility pattern; a speech analysis unit that generates a speech pattern of the user based on the observation voice and analyzes the generated speech pattern; an emotion analysis unit that analyzes the emotional state of the user, based on the mobility pattern and the speech pattern, through a neural network that has learned the emotional state of the user corresponding to the mobility pattern and the speech pattern; and a report unit that transmits the analyzed emotional state to a guardian device and a remote medical treatment device through the communication unit.

The mobility analysis unit includes a first neural network that has learned the usual mobility pattern of the user. When the user is identified through image recognition in the observation image input in real time, the mobility analysis unit extracts the mobility pattern of the identified user, determines through the first neural network whether the extracted mobility pattern differs from the learned mobility pattern, and, if it does, outputs a mobility pattern abnormality signal together with the differing mobility pattern.

The mobility analysis unit divides the space into a plurality of unit spaces and generates the mobility pattern by extracting, for each unit time, the position of the user among the unit spaces from the observation image.

The speech analysis unit includes a second neural network that has learned the usual speech pattern of the user. When the user is identified through voice recognition in the observation voice input in real time, the speech analysis unit extracts the speech pattern of the identified user, determines through the second neural network whether the extracted speech pattern differs from the learned speech pattern, and, if it does, outputs a speech pattern abnormality signal together with the differing speech pattern.

The speech analysis unit generates the speech pattern by extracting, for each unit time, at least one of the speech rate, the tone of the voice signal, and the pitch of the voice signal from the observation voice.

To achieve the above object, a method for monitoring emotional change through mobility and voice analysis according to a preferred embodiment of the present invention includes the steps of: a camera unit continuously photographing a predetermined space of a user to generate an observation image; an audio unit continuously receiving voice in the space to generate an observation voice; a mobility analysis unit generating a mobility pattern of the user based on the observation image and analyzing the generated mobility pattern; a speech analysis unit generating a speech pattern of the user based on the observation voice and analyzing the generated speech pattern; an emotion analysis unit analyzing the emotional state of the user, based on the mobility pattern and the speech pattern, through a neural network that has learned the emotional state of the user corresponding to the mobility pattern and the speech pattern; and a report unit transmitting the analyzed emotional state to a guardian device and a remote medical treatment device through a communication unit.

The step of analyzing the mobility pattern includes the steps of: the mobility analysis unit extracting, when the user is identified through image recognition in the observation image input in real time, the mobility pattern of the identified user; the mobility analysis unit determining, through a first neural network that has learned the usual mobility pattern of the user, whether the extracted mobility pattern differs from the learned mobility pattern; and the mobility analysis unit outputting, if the determination shows a difference, a mobility pattern abnormality signal together with the differing mobility pattern.

The step of analyzing the speech pattern includes the steps of: the speech analysis unit extracting, when the user is identified through voice recognition in the observation voice input in real time, the speech pattern of the identified user; the speech analysis unit determining, through a second neural network that has learned the usual speech pattern of the user, whether the extracted speech pattern differs from the learned speech pattern; and the speech analysis unit outputting, if there is a difference, a speech pattern abnormality signal together with the differing speech pattern.

According to the present invention, continuous care can be supported for patients with mental illnesses such as panic disorder, depression, and schizophrenia, who need continuous observation and monitoring even in a home environment. Moreover, a guardian can be notified of a patient's emergency even from a distance, and the patient's condition outside the hospital can be monitored 24 hours a day and delivered to medical staff, so the patient's condition can be determined accurately.

FIG. 1 is a diagram illustrating the configuration of a system for monitoring emotional change through analysis of a user's mobility and voice in a personal space observation image according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of an apparatus for monitoring emotional change through analysis of a user's mobility and voice in a personal space observation image according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the detailed configuration of an apparatus for monitoring emotional change through analysis of a user's mobility and voice in a personal space observation image according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method for monitoring emotional change through analysis of a user's mobility and voice in a personal space observation image according to an embodiment of the present invention.

Prior to the detailed description of the present invention, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical idea, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing of the present application.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that, in the accompanying drawings, the same components are denoted by the same reference numerals wherever possible. Detailed descriptions of well-known functions and configurations that could obscure the gist of the present invention are omitted. For the same reason, some components in the accompanying drawings are exaggerated, omitted, or schematically illustrated, and the size of each component does not entirely reflect its actual size.

First, a system for monitoring emotional change through analysis of a user's mobility and voice in a personal space observation image according to an embodiment of the present invention is described. FIG. 1 is a diagram illustrating the configuration of this system.

Referring to FIG. 1, the monitoring system includes a monitoring device 10, a remote medical treatment device 20, and a guardian device 30.

The monitoring device 10 uses non-contact sensors, namely a camera and a microphone, installed in a designated space (R) where observation of the user is permitted. Through the camera and the microphone it collects video of the user's behavior and audio that includes the user's utterances. Based on the collected video and audio, it learns the user's behavior pattern and voice pattern, and learns the emotional state of the user that corresponds to those patterns. It then analyzes the user's behavior pattern and voice pattern according to what it has learned and estimates the corresponding emotional state. In this way, the present invention analyzes the user's emotional state using only a camera and a microphone, which are non-contact sensors. Emotion analysis that relies on contact sensors can restrict the user's mobility and cause discomfort, which can limit the continued use and expansion of the technology. The present invention therefore installs non-contact sensors in the designated space (R) where observation of the user is permitted and analyzes the user's everyday data, that is, video and audio, to estimate the emotional state of a user who needs observation.

The monitoring device 10 may also transmit the estimated emotional state of the user to the remote medical treatment device 20 and the guardian device 30. The remote medical treatment device 20 is for remote medical treatment and is used by the user's attending physician, who can continuously monitor the user's emotional state through it. The guardian device 30 is used by the user's guardian, who can likewise continuously monitor the user's emotional state through it.
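The reporting path described above can be sketched as follows. This is only an illustration: the patent does not specify a message format or transport, so the JSON field names, `build_report`, and `send_report` are all hypothetical, and the "senders" stand in for whatever link the communication unit actually provides.

```python
import json
from datetime import datetime, timezone


def build_report(user_id, emotional_state, mobility_abnormal, speech_abnormal, when=None):
    """Serialize one emotional-state report as JSON (field names are hypothetical)."""
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "user_id": user_id,
        "emotional_state": emotional_state,
        "mobility_abnormal": mobility_abnormal,
        "speech_abnormal": speech_abnormal,
        "reported_at": when.isoformat(),
    })


def send_report(report, senders):
    """Deliver the same report to every recipient.

    `senders` maps a recipient name to a callable that accepts the JSON string;
    in a real device these would wrap the communication unit.
    """
    for send in senders.values():
        send(report)
```

Passing the same report to both recipients mirrors the text, in which the guardian and the attending physician receive the same estimated emotional state.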

Next, an apparatus for monitoring emotional change through analysis of a user's mobility and voice in a personal space observation image according to an embodiment of the present invention is described. FIG. 2 is a diagram illustrating the configuration of this apparatus. Referring to FIG. 2, the monitoring device 10 includes a communication unit 11, a camera unit 12, an audio unit 13, an input unit 14, a display unit 15, a storage unit 16, and a control unit 17.

The communication unit 11 is for communicating with the remote medical treatment device 20 and the guardian device 30. The communication unit 11 may include a radio frequency (RF) transmitter (Tx) that up-converts and amplifies the frequency of a transmitted signal, and an RF receiver (Rx) that low-noise amplifies a received signal and down-converts its frequency. The communication unit 11 may also include a modem that modulates a transmitted signal and demodulates a received signal.

The camera unit 12 is for capturing images and includes an image sensor. The image sensor receives light reflected from a subject and converts it into an electrical signal, and may be implemented based on a charge-coupled device (CCD), a complementary metal-oxide semiconductor (CMOS), or the like. The camera unit 12 may further include an analog-to-digital converter, which converts the electrical signal output from the image sensor into a digital sequence and outputs it to the control unit 17.

In particular, the camera unit 12 includes a 3D sensor. The 3D sensor acquires, in a non-contact manner, 3D coordinates for each pixel of an image. When the camera unit 12 photographs an object, the 3D sensor detects the 3D coordinates of each pixel of the image of the photographed object and transmits the detected 3D coordinates to the control unit 17. Sensors of various types using laser, infrared, visible light, and the like may be used as the 3D sensor. Examples include a laser 3D scanner using any one of time of flight (TOF), phase shift, and online waveform analysis; a laser 3D scanner using optical triangulation; an optical 3D scanner using white light or modulated light; a handheld real-time photo or optical 3D scanner; an optical or laser whole-body scanner using pattern projection or line scanning; a photographic scanner using photogrammetry; and a real-time scanner using Kinect Fusion.

In particular, the camera unit 12 may continuously photograph the predetermined space of the user to generate an observation image. The observation image is provided to the control unit 17.

The audio unit 13 includes a speaker (SPK) for outputting an audio signal and a microphone (MIKE) for receiving an audio signal such as voice. Under the control of the control unit 17, the audio unit 13 may output an audio signal through the speaker (SPK) or transmit an audio signal input through the microphone (MIKE) to the control unit 17. In particular, the audio unit 13 may continuously receive audio signals in the predetermined space of the user to generate an observation voice, and then provides the generated observation voice to the control unit 17.

The input unit 14 receives the user's key operations for controlling the monitoring device 10, generates an input signal, and transmits it to the control unit 17. The input unit 14 may include various keys for controlling the monitoring device 10. When the display unit 15 is a touch screen, the functions of these keys may be performed on the display unit 15, and when all functions can be performed with the touch screen alone, the input unit 14 may be omitted.

The display unit 15 visually presents the menus of the monitoring device 10, input data, function setting information, and various other information to the user. The display unit 15 outputs screens such as the boot screen, standby screen, and menu screen of the monitoring device 10. In particular, the display unit 15 outputs the meter reading image according to the embodiment of the present invention to the screen. The display unit 15 may be formed of a liquid crystal display (LCD), organic light emitting diodes (OLED), active matrix organic light emitting diodes (AMOLED), or the like.

The display unit 15 may also be implemented as a touch screen, in which case it includes a touch sensor that detects the user's touch input. The touch sensor may be a touch sensing sensor of the capacitive overlay, pressure, resistive overlay, or infrared beam type, or may be a pressure sensor. Besides these, any kind of sensor device capable of sensing contact or pressure from an object may be used as the touch sensor of the present invention. The touch sensor detects the user's touch input, generates a detection signal, and transmits it to the control unit 17. In particular, when the display unit 15 is a touch screen, some or all of the functions of the input unit 14 may be performed through the display unit 15.

The storage unit 16 stores the programs and data needed for the operation of the monitoring device 10. In particular, the storage unit 16 is the area that stores user data generated through use of the monitoring device 10, such as mobility patterns, speech patterns, and emotion analysis results. The data stored in the storage unit 16 may be deleted, changed, or added according to the user's operation.

The control unit 17 may control the overall operation of the monitoring device 10 and the signal flow between its internal blocks, and may perform a data processing function. The control unit 17 also basically controls the various functions of the monitoring device 10. Examples of the control unit 17 include a central processing unit (CPU), a baseband processor (BP), an application processor (AP), a graphics processing unit (GPU), and a digital signal processor (DSP).

The configuration and operation of the control unit 17 are now described in more detail. FIG. 3 is a diagram illustrating the detailed configuration of the apparatus for monitoring emotional change through analysis of a user's mobility and voice in a personal space observation image according to an embodiment of the present invention. Referring to FIG. 3, the control unit 17 includes a mobility analysis unit 100, a speech analysis unit 200, an emotion analysis unit 300, and a report unit 400. The mobility analysis unit 100 includes a first neural network (NN1), the speech analysis unit 200 includes a second neural network (NN2), and the emotion analysis unit 300 includes a third neural network (NN3).

The first neural network (NN1), the second neural network (NN2), and the third neural network (NN3) are all artificial neural networks (ANNs). Various algorithms may be applied to these ANNs; any algorithm having a structure that classifies data may be used.

The mobility analysis unit 100 may receive, from the camera unit 12, the observation image generated by continuously photographing the predetermined space (R) of the user, and may generate the mobility pattern of the user based on the received observation image. The mobility analysis unit 100 divides the predetermined space (R) into a plurality of unit spaces (r1, r2, ... rn) and derives the mobility pattern based on these unit spaces. In the embodiment of the present invention, the observation image is a video, and the mobility analysis unit 100 generates the mobility pattern by extracting, for each unit time, the position of the user among the unit spaces from the observation image.

As an example, the floor of the predetermined space may be divided in a grid with a given number of rows and columns into a plurality of unit squares (for example, 5×5 = 25), and the space occupied by the cuboid whose base is a unit square and whose height runs from floor to ceiling may be designated as a unit space. When the space is divided in this way into 25 unit spaces with 5 rows and 5 columns, the position of the user among the unit spaces can be expressed as (row, column) coordinates. Assuming a unit time of 10 seconds, the mobility analysis unit 100 extracts the user's position among the unit spaces from the observation image every 10 seconds and derives the (row, column) coordinates of the unit space where the user is located as the mobility pattern. The mobility analysis unit 100 then generates the mobility pattern by continuously recording, over time, the coordinates of the unit space in which the user is located.
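The grid example above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the user's floor position in metres has already been recovered from the observation image (for instance from the 3D sensor), and the names `Grid`, `cell_of`, and `mobility_pattern` and the room dimensions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    width_m: float   # room extent along the column axis (assumed known)
    depth_m: float   # room extent along the row axis (assumed known)
    rows: int = 5
    cols: int = 5

    def cell_of(self, x_m, y_m):
        """Return the (row, column) of the unit space containing floor point (x, y)."""
        col = min(int(x_m / self.width_m * self.cols), self.cols - 1)
        row = min(int(y_m / self.depth_m * self.rows), self.rows - 1)
        # Clamp so points on or just outside the walls map to an edge cell.
        return max(row, 0), max(col, 0)


def mobility_pattern(track, grid, unit_time_s=10.0):
    """Sample a time-ordered track of (t_seconds, x_m, y_m) once per unit time.

    Returns a list of (t_seconds, (row, column)) entries.
    """
    pattern = []
    next_sample = None
    for t, x, y in track:
        if next_sample is None or t >= next_sample:
            pattern.append((t, grid.cell_of(x, y)))
            next_sample = t + unit_time_s
    return pattern
```

For a 5 m by 5 m room, `mobility_pattern([(0, 0.5, 0.5), (5, 1.5, 0.5), (10, 2.5, 4.9)], Grid(5.0, 5.0))` keeps the samples at 0 s and 10 s and skips the one at 5 s, because only one position per 10-second unit time is recorded.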

The mobility analysis unit 100 generates mobility patterns from observation images of the user's daily life captured through the camera unit 12 and trains the first neural network (NN1) using the generated mobility patterns as training data. The first neural network (NN1) can thereby learn the usual mobility pattern of that user.

As described above, with the first neural network NN1 having learned the user's usual movement pattern, the movement analysis unit 100 may receive the observation image in real time through the camera unit 12. The movement analysis unit 100 then identifies the user through image recognition in the observation image input in real time. Once the user is identified, the movement analysis unit 100 extracts the identified user's movement pattern from the observation image input in real time. Next, the movement analysis unit 100 determines, through the first neural network NN1, whether the extracted movement pattern differs from the learned movement pattern. If the two differ, the movement analysis unit 100 outputs a movement pattern abnormality signal together with the differing movement pattern. The movement pattern abnormality signal, which includes the differing movement pattern, is input to the emotion analysis unit 300. Conversely, if the learned movement pattern and the extracted movement pattern do not differ, the movement analysis unit 100 outputs the extracted movement pattern as is. The output movement pattern is input to the emotion analysis unit 300.
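The description does not specify the architecture of the first neural network NN1 or how a "difference" is measured. As a rough illustration only, the following Python sketch substitutes a simple statistical comparison for NN1: it compares how often each unit space is occupied in a baseline pattern and in a newly extracted pattern, using the total variation distance. The function names and the threshold value of 0.5 are assumptions, not part of the disclosed apparatus.

```python
from collections import Counter


def occupancy(pattern):
    """Fraction of samples spent in each (row, column) unit space."""
    counts = Counter(pattern)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: n / total for cell, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two occupancy distributions (0 to 1)."""
    cells = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)


def analyze_movement(baseline, observed, threshold=0.5):
    """Stand-in for the NN1 comparison: flag the observed pattern when its
    occupancy distribution is farther than `threshold` (assumed value) from
    the baseline. The pattern itself is always passed along, mirroring the
    description, with the abnormality flag added when a difference is found.
    """
    distance = total_variation(occupancy(baseline), occupancy(observed))
    return {"pattern": observed, "abnormal": distance > threshold, "distance": distance}
```

A learned model would replace `total_variation` here; the returned dictionary only mirrors the two outputs named in the description (the pattern, plus an abnormality signal when applicable).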

The speech analysis unit 200 may receive, from the audio unit 13, an observation voice generated by continuously receiving the voice occurring in the user's predetermined space R, and may generate the user's voice pattern based on the received observation voice. The speech analysis unit 200 generates a speech pattern by extracting, for each unit time, at least one of the speech rate, the tone of the voice signal, and the pitch of the voice signal from the observation voice. Assuming a unit time of 10 seconds, the speech analysis unit 200 may extract the speech rate, the tone of the voice signal, and the pitch of the voice signal from the observation voice every 10 seconds and derive them as the speech pattern. The speech analysis unit 200 then generates the speech pattern by continuously recording, over time, the speech rate, the tone of the voice signal, and the pitch of the voice signal.
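The per-unit-time feature extraction can be sketched as follows. This Python sketch assumes that an upstream audio front end (not described here) already provides syllable onset times and per-frame pitch estimates; it only shows the windowing into 10-second units. Tone is omitted because the description does not define how it is measured, and the function and key names are hypothetical.

```python
from statistics import mean


def speech_pattern(syllable_times, pitch_frames, duration_s, unit_time_s=10.0):
    """Aggregate speech features into consecutive unit-time windows.

    syllable_times: syllable onset times in seconds (assumed upstream output).
    pitch_frames:   (time_s, f0_hz) pairs for voiced frames (assumed upstream output).
    Returns one record per complete window with the speech rate in syllables
    per second and the mean pitch in Hz (None when the window has no voiced frame).
    """
    n_windows = int(duration_s // unit_time_s)
    pattern = []
    for k in range(n_windows):
        start = k * unit_time_s
        end = start + unit_time_s
        n_syll = sum(1 for t in syllable_times if start <= t < end)
        pitches = [f0 for t, f0 in pitch_frames if start <= t < end]
        pattern.append({
            "start_s": start,
            "rate": n_syll / unit_time_s,
            "pitch_hz": mean(pitches) if pitches else None,
        })
    return pattern
```

Recording these records continuously over time gives the sequence that the description calls the speech pattern.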

The speech analysis unit 200 generates voice patterns from observation voices collected from the user's daily life through the audio unit 13, and trains the second neural network NN2 using the generated voice patterns as training data. Accordingly, the second neural network NN2 can learn the user's usual voice pattern.

As described above, with the second neural network NN2 having learned the user's usual voice pattern, the speech analysis unit 200 may receive the observation voice from the audio unit 13 in real time. The speech analysis unit 200 then identifies the user through voice recognition in the observation voice input in real time. Once the user is identified, the speech analysis unit 200 extracts the identified user's voice pattern from the observation voice input in real time. Next, the speech analysis unit 200 checks, through the second neural network NN2, whether the extracted voice pattern differs from the learned voice pattern. If the extracted voice pattern differs from the previously stored voice pattern, the speech analysis unit 200 outputs a voice pattern abnormality signal together with the differing voice pattern. The voice pattern abnormality signal, which includes that voice pattern, is input to the emotion analysis unit 300. Conversely, if the difference between the learned voice pattern and the extracted voice pattern does not reach a threshold, the speech analysis unit 200 outputs the extracted voice pattern as is. The output voice pattern is input to the emotion analysis unit 300.

The emotion analysis unit 300 analyzes the user's emotional state through the third neural network NN3, based on the movement pattern received from the movement analysis unit 100 and the speech pattern received from the speech analysis unit 200. Machine learning (training) is required before the third neural network NN3 can be used to analyze the user's emotional state.

During training, an administrator of the monitoring device 10 inputs an emotional state (joy, happiness, surprise, sadness, anger, disgust, etc.) as the label corresponding to the movement pattern received from the movement analysis unit 100 and the speech pattern received from the speech analysis unit 200. The emotion analysis unit 300 then generates training data by mapping each input label to the corresponding movement pattern and speech pattern. Using the generated training data, the emotion analysis unit 300 trains the third neural network NN3 to estimate the user's emotional state from the movement pattern and the speech pattern. After training is complete, the emotion analysis unit 300 may receive the movement pattern and the speech pattern from the movement analysis unit 100 and the speech analysis unit 200 in real time. The emotion analysis unit 300 then estimates, through the third neural network NN3, the user's emotional state corresponding to the received movement pattern and speech pattern, and outputs the estimated emotional state to the report unit 400.
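The mapping of administrator-supplied labels onto pattern pairs can be sketched as below. This Python sketch assumes each movement pattern is a list of (row, column) pairs and each speech pattern is a flat list of numeric features; the flattened feature layout, the `EMOTIONS` tuple (limited to the six states named above, although the description leaves the list open), and the function name are all hypothetical.

```python
# Emotional states named in the description; the source list is open-ended ("etc.").
EMOTIONS = ("joy", "happiness", "surprise", "sadness", "anger", "disgust")


def build_training_data(movement_patterns, speech_patterns, labels):
    """Pair each (movement pattern, speech pattern) with its emotion label.

    Returns a list of (features, class_index) tuples, where `features` is the
    movement coordinates followed by the speech features, and `class_index`
    is the position of the label in EMOTIONS.
    """
    if not (len(movement_patterns) == len(speech_patterns) == len(labels)):
        raise ValueError("patterns and labels must have the same length")
    data = []
    for movement, speech, label in zip(movement_patterns, speech_patterns, labels):
        if label not in EMOTIONS:
            raise ValueError(f"unknown emotion label: {label!r}")
        features = [float(v) for cell in movement for v in cell]
        features += [float(v) for v in speech]
        data.append((features, EMOTIONS.index(label)))
    return data
```

The resulting (features, class index) pairs are in the form most classifier training loops expect; the choice of model for NN3 is left open, as in the description.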

The report unit 400 may receive the user's emotional state estimated by the emotion analysis unit 300. The report unit 400 may then periodically transmit the received emotional state, through the communication unit 11, to the guardian device 30 used by a guardian and to the remote medical treatment device 20 used by medical staff.

Next, a method for monitoring emotional change through analysis of a user's movement and voice in a personal space observation image, according to an embodiment of the present invention, will be described. FIG. 4 is a flowchart illustrating this method. In FIG. 4, it is assumed that the first neural network NN1, the second neural network NN2, and the third neural network NN3 have all already been trained, as described above.

Referring to FIG. 4, in step S110 the camera unit 12 continuously photographs the user's predetermined space R to generate an observation image. In step S120, the movement analysis unit 100 extracts a movement pattern from the observation image input in real time from the camera unit 12 and analyzes the extracted movement pattern through the first neural network NN1. Step S120 is described in more detail as follows. The movement analysis unit 100 identifies the user through image recognition in the observation image input in real time, and extracts the identified user's movement pattern from that observation image. Next, the movement analysis unit 100 determines, through the first neural network NN1, whether the extracted movement pattern differs from the learned movement pattern. If the two differ, the movement analysis unit 100 outputs a movement pattern abnormality signal together with the differing movement pattern and inputs it to the emotion analysis unit 300. Conversely, if the learned movement pattern and the extracted movement pattern do not differ, the movement analysis unit 100 outputs the extracted movement pattern and inputs it to the emotion analysis unit 300.

Meanwhile, in step S130 the audio unit 13 continuously receives the voice in the user's predetermined space R to generate and output an observation voice. In step S140, the speech analysis unit 200 extracts a speech pattern from the observation voice input in real time from the audio unit 13 and analyzes the extracted speech pattern through the second neural network NN2. Step S140 is described in more detail as follows.

The speech analysis unit 200 identifies the user through voice recognition in the observation voice input in real time, and extracts the identified user's voice pattern from that observation voice. Next, the speech analysis unit 200 checks, through the second neural network NN2, whether the extracted voice pattern differs from the learned voice pattern. If the extracted voice pattern differs from the previously stored voice pattern, the speech analysis unit 200 outputs a voice pattern abnormality signal together with the differing voice pattern and inputs it to the emotion analysis unit 300. Conversely, if the difference between the learned voice pattern and the extracted voice pattern does not reach a threshold, the speech analysis unit 200 outputs the extracted voice pattern and inputs it to the emotion analysis unit 300.

In step S150, the emotion analysis unit 300 analyzes the movement pattern received from the movement analysis unit 100 and the speech pattern received from the speech analysis unit 200, estimates the user's emotional state through the third neural network NN3, and outputs the estimated emotional state to the report unit 400.

In step S150, the report unit 400 transmits the emotional state received from the emotion analysis unit 300, through the communication unit 11, to the guardian device 30 used by the guardian and to the remote medical treatment device 20 used by the medical staff.
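The order of the steps described for FIG. 4 can be summarized in a short Python sketch. Each unit is passed in as a plain callable so that the sketch stays independent of any particular camera, audio, model, or network implementation; the function name `monitor_once` and all parameter names are hypothetical.

```python
def monitor_once(capture_video, capture_audio, analyze_movement,
                 analyze_speech, estimate_emotion, send_report):
    """Run one pass of the monitoring flow and return the estimated emotion.

    Every argument is a callable standing in for one unit of the apparatus
    (assumed interfaces; the description does not define them as code).
    """
    video = capture_video()                        # S110: camera unit generates the observation image
    movement = analyze_movement(video)             # S120: movement pattern, analyzed with NN1
    audio = capture_audio()                        # S130: audio unit generates the observation voice
    speech = analyze_speech(audio)                 # S140: speech pattern, analyzed with NN2
    emotion = estimate_emotion(movement, speech)   # S150: emotional state estimated with NN3
    send_report(emotion)                           # reporting to the guardian and remote medical treatment devices
    return emotion
```

In the described apparatus the video and audio branches run continuously and concurrently; the sequential calls here only show the data dependencies between the steps.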

Meanwhile, the method according to the embodiment of the present invention described above may be implemented as a program readable by various computer means and recorded on a computer-readable recording medium. The recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Emotion analysis technology for understanding a person's inner state has relied mainly on contact sensors for signals such as brain waves, body temperature, and heart rate, and on expensive equipment, so it has been usable only in medical institutions and research laboratories. Families and guardians report difficulty in providing round-the-clock care for psychiatric patients with severe mood swings and for elderly people living alone. Psychiatric medical staff, for their part, often struggle to assess a patient's exact condition from what they observe during consultation hours alone, or must rely entirely on their own experience. The present invention, however, can support continuous care in the home environment for patients with mental illnesses such as panic disorder, depression, and schizophrenia, who require continuous observation and monitoring. Furthermore, according to the present invention, a guardian can be notified of a patient's emergency even from a distance, and the patient's condition outside the hospital can be monitored 24 hours a day and delivered to the medical staff, which enables an accurate assessment of the patient's condition.

Although the present invention has been described above using several preferred embodiments, these embodiments are illustrative and not restrictive. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications can be made under the doctrine of equivalents without departing from the spirit of the present invention and the scope of rights set forth in the appended claims.

10: monitoring device
11: communication unit
12: camera unit
13: audio unit
14: input unit
15: display unit
16: storage unit
17: control unit
20: remote medical treatment device
30: guardian device
100: movement analysis unit
200: speech analysis unit
300: emotion analysis unit
400: report unit

Claims

An apparatus for monitoring emotional change through movement and voice analysis, the apparatus comprising:
a communication unit for communication;
a camera unit that continuously photographs a user's predetermined space to generate an observation image;
an audio unit that continuously receives voice in the space to generate an observation voice;
a movement analysis unit that generates the user's movement pattern based on the observation image and analyzes the generated movement pattern;
a speech analysis unit that generates the user's speech pattern based on the observation voice and analyzes the generated speech pattern;
an emotion analysis unit that analyzes the user's emotional state, based on the movement pattern and the speech pattern, through a neural network that has learned the user's emotional state corresponding to the movement pattern and the speech pattern; and
a report unit that transmits the analyzed emotional state to a guardian device and a remote medical treatment device through the communication unit.

The apparatus of claim 1,
wherein the movement analysis unit
includes a first neural network that has learned the user's usual movement pattern,
and, when the user is identified through image recognition in the observation image input in real time,
extracts the identified user's movement pattern,
determines, through the first neural network, whether the extracted movement pattern differs from the learned movement pattern,
and, if the learned movement pattern and the extracted movement pattern differ,
outputs a movement pattern abnormality signal together with the differing movement pattern.

The apparatus of claim 2,
wherein the movement analysis unit
divides the space into a plurality of unit spaces, and
generates the movement pattern by extracting from the observation image, for each unit time, the user's position in terms of the unit spaces.

The apparatus of claim 1,
wherein the speech analysis unit
includes a second neural network that has learned the user's usual speech pattern,
and, when the user is identified through voice recognition in the observation voice input in real time,
extracts the identified user's speech pattern,
determines, through the second neural network, whether the extracted speech pattern differs from the learned speech pattern,
and, if the learned speech pattern and the extracted speech pattern differ,
outputs a speech pattern abnormality signal together with the differing speech pattern.

The apparatus of claim 1,
wherein the speech analysis unit
generates the speech pattern by extracting from the observation voice, for each unit time, at least one of a speech rate, a tone of the voice signal, and a pitch of the voice signal.

A method for monitoring emotional change through movement and voice analysis, the method comprising:
generating, by a camera unit, an observation image by continuously photographing a user's predetermined space;
generating, by an audio unit, an observation voice by continuously receiving voice in the space;
generating, by a movement analysis unit, the user's movement pattern based on the observation image, and analyzing the generated movement pattern;
generating, by a speech analysis unit, the user's speech pattern based on the observation voice, and analyzing the generated speech pattern;
analyzing, by an emotion analysis unit, the user's emotional state, based on the movement pattern and the speech pattern, through a neural network that has learned the user's emotional state corresponding to the movement pattern and the speech pattern; and
transmitting, by a report unit, the analyzed emotional state to a guardian device and a remote medical treatment device through a communication unit.

The method of claim 6,
wherein analyzing the movement pattern comprises:
extracting, by the movement analysis unit, the identified user's movement pattern when the user is identified through image recognition in the observation image input in real time;
determining, by the movement analysis unit, through a first neural network that has learned the user's usual movement pattern, whether the extracted movement pattern differs from the learned movement pattern; and
outputting, by the movement analysis unit, a movement pattern abnormality signal together with the differing movement pattern when the determination shows that the learned movement pattern and the extracted movement pattern differ.

The method of claim 6,
wherein analyzing the speech pattern comprises:
extracting, by the speech analysis unit, the identified user's speech pattern when the user is identified through voice recognition in the observation voice input in real time;
determining, by the speech analysis unit, through a second neural network that has learned the user's usual speech pattern, whether the extracted speech pattern differs from the learned speech pattern; and
outputting, by the speech analysis unit, a speech pattern abnormality signal together with the differing speech pattern when the learned speech pattern and the extracted speech pattern differ.