KR101660306B1

KR101660306B1 - Method and apparatus for generating life log in portable terminal

Info

Publication number: KR101660306B1
Application number: KR1020100044953A
Authority: KR
Inventors: 정명기; 고한석; 김기현; 윤종성; 최우현
Original assignee: 삼성전자주식회사; 고려대학교 산학협력단
Priority date: 2010-05-13
Filing date: 2010-05-13
Publication date: 2016-09-27
Also published as: KR20110125431A

Abstract

The present invention relates to a method and apparatus for generating a life log in a portable terminal. The method includes: recognizing, from among preset acoustic environments, the acoustic environment corresponding to an acoustic signal input from a microphone; recognizing, from among preset image environments, the image environment corresponding to an image signal input from a camera; determining, from among preset situation models, the user situation corresponding to the acoustic environment recognition result and the image environment recognition result; and recording the determined user situation as a life log. A user can thus generate a life log in daily life without purchasing or carrying a separate device, and the user's life pattern can be understood from the life log alone.

Description

METHOD AND APPARATUS FOR GENERATING LIFE LOG IN PORTABLE TERMINAL

The present invention relates to a method and apparatus for generating a life log in a portable terminal, and more particularly to a method and apparatus for determining a user's situation using a microphone and a camera and generating a life log from that determination.

Recently, as portable terminals have advanced and come into wider use, life log services have been offered that record information obtainable in a user's daily life so that it can be retrieved when needed. That is, a life log service acquires and records information from the user's daily life through various sensors such as a camera, a Global Positioning System (GPS) receiver, an illuminance sensor, a geomagnetic sensor, and a thermometer, so that the recorded information can be used later.

A conventional life log service mounts a camera aligned with the user's line of sight, together with wearable devices, on the user's body to collect various data about the user's surroundings, such as time, location, brightness, and the image seen by the user. The data is then transmitted to and stored on a web server, so that the user can check his or her life log through a web service.

As described above, a conventional life log service requires a large number of sensors to be mounted on the user's body in order to obtain various information about the user. However, mounting many sensors on the body not only makes the user's daily life unnatural but can also provoke resistance from the user, which makes such a service difficult to commercialize. In addition, the conventional life log technique simply transmits the collected information, for example time, location, brightness, and the image seen by the user, to a web server and stores it in raw form, and therefore cannot represent the overall situation, such as what the user is doing or the environment the user is in.

The present invention has been devised to solve the problems described above, and an object of the present invention is to provide a method and apparatus for generating a life log in a portable terminal.

Another object of the present invention is to provide a method and apparatus for determining a user's situation using the image and sound input from a camera and a microphone in a portable terminal, and for generating a life log from that determination.

Still another object of the present invention is to provide a scheduling method and apparatus for processing and recognizing the image and sound input from a camera and a microphone in a portable terminal.

According to a first aspect of the present invention for achieving the above objects, a method for generating a life log in a portable terminal includes: recognizing, from among preset acoustic environments, the acoustic environment corresponding to an acoustic signal input from a microphone; recognizing, from among preset image environments, the image environment corresponding to an image signal input from a camera; determining, from among preset situation models, the user situation corresponding to the acoustic environment recognition result and the image environment recognition result; and recording the determined user situation as a life log.

According to a second aspect of the present invention for achieving the above objects, an apparatus for generating a life log in a portable terminal includes: a microphone that receives an acoustic signal; a camera that receives an image signal; an acoustic environment recognition unit that recognizes, from among preset acoustic environments, the acoustic environment corresponding to the acoustic signal input from the microphone; an image environment recognition unit that recognizes, from among preset image environments, the image environment corresponding to the image signal input from the camera; and a situation determination unit that determines, from among preset situation models, the user situation corresponding to the acoustic environment recognition result and the image environment recognition result, and records the determined user situation as a life log.

The present invention determines a user's situation using the image and sound input from a camera and a microphone in a portable terminal and generates a life log from that determination. As a result, the user can generate a life log in daily life without purchasing or carrying a separate device, and the user's life pattern can be understood from the life log alone. In addition, processing and recognizing image and sound according to the scheduling scheme proposed in the present invention reduces the time consumed in processing and recognizing them.

FIG. 1 is a diagram illustrating an example of determining a user situation from the user's surrounding environment according to an embodiment of the present invention;
FIG. 2 is a block diagram of a portable terminal according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an image signal analysis procedure in a portable terminal according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an example of restricting the analysis region according to the image signal analysis procedure in a portable terminal according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a scheduling technique for image and sound processing in a portable terminal according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating situation statistical models of a portable terminal according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an example of determining a user situation from image, sound, and the situation statistical models in a portable terminal according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating a life log storage scheme in a portable terminal according to an embodiment of the present invention; and
FIG. 9 is a diagram illustrating a procedure for generating and storing a life log in a portable terminal according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the invention.

The following describes a technique for determining a user's situation using the image and sound input from a camera and a microphone in a portable terminal, and for generating a life log from that determination.

The portable terminal according to the present invention recognizes the surrounding environment from the sound input from the microphone and the image input from the camera, and determines the user's situation from the recognition results. For example, as shown in FIG. 1, the portable terminal analyzes the sound input from the microphone to recognize a car sound 100, a music sound 102, and a babble sound 104, and analyzes the image input from the camera to recognize that it is an image of a person walking 106. The terminal then combines the car sound 100, the music sound 102, the babble sound 104, and the walking image 106 to determine that the user is moving along a street 110.

FIG. 2 shows the block configuration of a portable terminal according to an embodiment of the present invention. As shown in FIG. 2, the portable terminal includes an input unit 200, a recognition result accumulation unit 210, a situation determination unit 220, a situation statistical model 230, and a storage unit 240. The input unit 200 includes a microphone 202 and a camera 204, and the recognition result accumulation unit 210 includes an acoustic environment recognition unit 212 and an image environment recognition unit 214.

First, the input unit 200 receives the acoustic signal and the image signal needed to determine the user's situation and provides them to the recognition result accumulation unit 210. That is, the input unit 200 receives various acoustic signals around the user through the microphone 202, and the camera 204 receives image signals around the user through a camera sensor. The input unit 200 is activated at every preset period, receives the acoustic signal and the image signal for a predetermined time, and provides them to the recognition result accumulation unit 210. When the predetermined time expires, the input unit 200 returns to an inactive state until the next period.

The recognition result accumulation unit 210 analyzes the acoustic signal and the image signal input from the input unit 200 during an accumulation time, one preset short interval (e.g., an interval of 3 seconds or less) at a time, to recognize the acoustic environment and the image environment. It accumulates the results recognized during the accumulation time and provides them to the situation determination unit 220. In particular, the recognition result accumulation unit 210 recognizes the acoustic environment by analyzing the acoustic signal through the acoustic environment recognition unit 212, and recognizes the image environment by analyzing the image signal through the image environment recognition unit 214.

The acoustic environment recognition unit 212 refers to pre-stored Gaussian Mixture Models (GMMs) of the acoustic environments to search for and recognize the acoustic environment most similar to the input acoustic signal. Here, the Gaussian mixture models may represent the energy features of each acoustic environment, and the acoustic environments may be classified as, for example, babble, car sound, music, noise inside an enclosed space such as a bag, office noise, human voice, subway noise, noise of a quiet public place, running water, and loud sound (or loud noise). That is, the acoustic environment recognition unit 212 extracts Mel Frequency Cepstral Coefficients (MFCCs), the energy features of the acoustic signal input from the microphone 202, and then searches the acoustic environments represented by the Gaussian mixture models for the one with the maximum likelihood for the extracted features.
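The maximum-likelihood search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes diagonal-covariance mixtures, treats feature extraction as already done, and uses toy two-dimensional vectors in place of real MFCCs. The names `gmm_log_likelihood`, `recognize_acoustic_environment`, and `MODELS` are hypothetical.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log-likelihoods under a diagonal-covariance GMM.

    frames: list of feature vectors (e.g. MFCCs), each a list of floats.
    gmm: list of (weight, means, variances) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        component_logs = []
        for weight, means, variances in gmm:
            log_p = math.log(weight)
            for xi, mu, var in zip(x, means, variances):
                log_p += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
            component_logs.append(log_p)
        # Log-sum-exp over components keeps the sum numerically stable.
        peak = max(component_logs)
        total += peak + math.log(sum(math.exp(v - peak) for v in component_logs))
    return total


def recognize_acoustic_environment(frames, models):
    """Return the label of the environment model with the highest likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))


# Toy models (assumed values); real models would be trained on MFCC data.
MODELS = {
    "babble": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "subway": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}

print(recognize_acoustic_environment([[4.8, 5.1], [5.2, 4.9]], MODELS))
```

Working in the log domain avoids underflow when many frames from a short interval are scored together.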

When the acoustic signal contains a sound louder than a threshold, it is difficult to distinguish other voices, noises, or sounds. The acoustic environment recognition unit 212 may therefore first determine whether the input acoustic signal contains a loud sound. If it does, the unit classifies the signal as a loud-sound environment; if it does not, the unit extracts the energy features from the signal and searches for the acoustic environment.
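A minimal sketch of this two-stage check is shown below. The patent does not say how loudness is measured, so the root-mean-square level used here is an assumption, as are the names `rms_level` and `classify_segment` and the `fallback` callable that stands in for the feature-based search.

```python
import math


def rms_level(samples):
    """Root-mean-square amplitude of a segment (assumed loudness measure)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def classify_segment(samples, loud_threshold, fallback):
    """Label a segment 'loud' if it exceeds the threshold, else defer to fallback.

    fallback: callable taking the samples and returning an environment label,
    e.g. a GMM-based recognizer. It is only invoked for non-loud segments.
    """
    if rms_level(samples) >= loud_threshold:
        return "loud"
    return fallback(samples)


print(classify_segment([0.9, -0.9, 0.9], 0.5, lambda s: "office"))
print(classify_segment([0.01, -0.02], 0.5, lambda s: "office"))
```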

The image environment recognition unit 214 analyzes the input image signal according to preset criteria to recognize the image environment. That is, the image environment recognition unit 214 examines the frames of the input image signal for low light, movement, presence of a face, and indoor versus outdoor, and recognizes the image environment accordingly. The image environment recognition unit 214 measures the light intensity (or brightness) of the image frame to determine whether it is low-light, uses motion vectors to determine whether the terminal is moving, detects skin color to determine whether a face is present, and uses color and texture features to determine whether the scene is indoor or outdoor.
For example, the image environment recognition unit 214 may measure the gray scale representing the brightness of the input image frame and compare the average gray scale with a threshold to determine whether the frame is low-light. Alternatively, it may divide the input image frame into a plurality of sub-blocks, measure the average gray scale of each sub-block, and determine whether the frame is low-light from the number of sub-blocks whose gray scale is at or below the threshold. The image environment recognition unit 214 may also extract motion vectors from a plurality of image frames using the known 'Lucas-Kanade algorithm' or 'Pyramid algorithm' to determine whether the image frames are moving. To determine whether a face is included in the image frame, it may detect a skin-colored region in the frame and then check whether the detected region satisfies preset maximum and minimum size conditions, a width-to-height ratio condition, and a condition on the ratio of the detected region to the remaining region. Finally, it may divide the image frame into a plurality of sub-blocks, extract the color and texture features of each sub-block, and compare them with preset thresholds to determine whether the frame corresponds to an indoor or an outdoor image.
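The two low-light tests described above (whole-frame average versus a count of dark sub-blocks) can be sketched as follows. Frames are plain lists of rows of 0-255 gray values. The threshold of 40, the block size, and the 75% dark-block fraction are illustrative assumptions, and all function names are hypothetical.

```python
def mean_gray(frame):
    """Average gray level of a frame given as a list of rows of 0-255 values."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def is_low_light_global(frame, threshold=40):
    """Whole-frame test: compare the average gray level with a threshold."""
    return mean_gray(frame) <= threshold


def split_blocks(frame, block):
    """Yield block x block sub-frames (edge blocks may be smaller)."""
    for top in range(0, len(frame), block):
        for left in range(0, len(frame[0]), block):
            yield [row[left:left + block] for row in frame[top:top + block]]


def is_low_light_blocks(frame, block=2, threshold=40, min_dark_fraction=0.75):
    """Sub-block test: low-light if enough blocks are at or below the threshold."""
    blocks = list(split_blocks(frame, block))
    dark = sum(1 for b in blocks if mean_gray(b) <= threshold)
    return dark / len(blocks) >= min_dark_fraction


# A mostly dark frame with one bright corner (e.g. a lamp in a dark room).
frame = [
    [10, 10, 250, 250],
    [10, 10, 250, 250],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
print(is_low_light_global(frame), is_low_light_blocks(frame))
```

In this example the bright corner raises the global average to 70, so the whole-frame test reports normal lighting, while three of the four sub-blocks are dark and the sub-block test reports low light. This is one reason a block-based test can be more robust to small bright regions.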

In particular, to process the input image signal in real time, the image environment recognition unit 214 may analyze each image frame in the order shown in FIG. 3. First, the image environment recognition unit 214 determines whether the image frame is low-light 301. If the frame is low-light, the procedures for determining movement, presence of a face, and indoor versus outdoor are skipped, because a low-light frame does not carry enough information for image recognition. If the frame is not low-light, the image environment recognition unit 214 determines whether a face is present 303 and, if a face is present, determines whether the scene is indoor or outdoor 305. Here, when a face is present in the image frame, the image environment recognition unit 214 may skip the procedure for determining movement, because a frame in which a face is detected may lack the information needed to detect movement. If no face is present in the image frame, the image environment recognition unit 214 determines whether there is movement 307 and then determines whether the scene is indoor or outdoor 305. Analyzing image frames in this way omits unnecessary steps and increases the image processing speed, so that the input image can be processed in real time.
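The decision order of FIG. 3 can be expressed as a short control-flow sketch. The four detectors are passed in as callables because their internals are described elsewhere; the function name `analyze_frame` and the dictionary result format are assumptions made for illustration.

```python
def analyze_frame(frame, is_low_light, has_face, is_moving, is_outdoor):
    """Run the per-frame checks in the order described for FIG. 3.

    Each detector is a callable taking the frame and returning a bool.
    Checks that the text says may be skipped are never invoked.
    """
    # Step 301: a low-light frame carries too little information; stop here.
    if is_low_light(frame):
        return {"low_light": True}

    # Step 303: face presence decides whether the motion check is needed.
    result = {"low_light": False, "face": has_face(frame)}

    # Step 307: motion is only checked when no face was found.
    if not result["face"]:
        result["moving"] = is_moving(frame)

    # Step 305: indoor/outdoor is checked on both branches.
    result["outdoor"] = is_outdoor(frame)
    return result


always = lambda value: (lambda frame: value)
print(analyze_frame(None, always(False), always(False), always(True), always(True)))
```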

In addition, to process the input image signal in real time, the image environment recognition unit 214 may restrict the analysis region of an image frame as shown in FIG. 4. That is, the image environment recognition unit 214 determines whether the frame is low-light and checks for a face over the entire frame. If a face is present in the image, it then restricts the analysis region to the area remaining after excluding the region 410 where the face is located, and determines indoor versus outdoor and movement within that area. Restricting the analysis region of the image frame in this way reduces the amount of computation required for image processing, so that the input image can be processed in real time.
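As a rough sketch of this region restriction, the helper below returns only the pixels outside a rectangular face region, so that later checks operate on fewer pixels. The rectangular `(top, left, bottom, right)` box format and the name `pixels_outside_box` are assumptions; the patent only states that the face region 410 is excluded.

```python
def pixels_outside_box(frame, box):
    """Return the pixels of a frame that lie outside a rectangular region.

    frame: list of rows of pixel values.
    box: (top, left, bottom, right) with bottom and right exclusive
         (assumed format for the detected face region).
    """
    top, left, bottom, right = box
    return [
        pixel
        for r, row in enumerate(frame)
        for c, pixel in enumerate(row)
        if not (top <= r < bottom and left <= c < right)
    ]


frame = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# Exclude a 2x2 face region in the top-left corner.
print(pixels_outside_box(frame, (0, 0, 2, 2)))
```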

Here, as shown in FIG. 5, the acoustic environment recognition unit 212 and the image environment recognition unit 214 may process the acoustic signal and the image signal at different points in time. To output the acoustic and image recognition results simultaneously once a predetermined time has passed since input of the acoustic and image signals began, the two units operate with regard to the time and data needed to determine the acoustic environment and the time and data needed to determine the image environment. FIG. 5 shows the points in time at which the acoustic environment recognition unit 212 and the image environment recognition unit 214 operate so that the acoustic environment determination result and the image determination result are output together 3 seconds after the acoustic and image signals are input.
As shown in FIG. 5, the acoustic environment recognition unit 212 processes the acoustic signal at acoustic time points that repeat periodically during the 3 seconds to recognize the acoustic environment, and the image environment recognition unit 214 processes the image signal at image time points that repeat periodically during the same 3 seconds. The acoustic time points and the image time points do not overlap. To determine the image environment within the 3 seconds, the image environment recognition unit 214 may first determine whether the frame is low-light, then check for movement, and then determine the presence of a face and indoor versus outdoor. Because the remaining determinations of movement, face presence, and indoor versus outdoor need not be performed when the image frame is low-light, it is important to perform the low-light determination first.
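The idea of non-overlapping, periodically repeating acoustic and image time points within a 3-second window can be sketched as below. The actual slot layout of FIG. 5 is not given in the text, so the strict alternation and the slot lengths of 200 ms and 300 ms are purely illustrative assumptions, as is the name `build_schedule`.

```python
def build_schedule(window_ms, audio_ms, video_ms):
    """Alternate audio and video processing slots inside one window.

    Returns a list of (kind, start_ms, end_ms) tuples. Slots never overlap,
    and every slot ends at or before window_ms so that both results are
    ready when the window closes. Integer milliseconds avoid rounding drift.
    """
    slots = []
    t = 0
    while t + audio_ms + video_ms <= window_ms:
        slots.append(("audio", t, t + audio_ms))
        slots.append(("video", t + audio_ms, t + audio_ms + video_ms))
        t += audio_ms + video_ms
    return slots


for slot in build_schedule(3000, 200, 300)[:4]:
    print(slot)
```

On a single processing core, interleaving of this kind lets one recognizer run while the other is idle, which is consistent with the stated goal of reducing overall processing time.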

The situation statistical model 230 represents statistical models of the acoustic and image signals for a plurality of pre-classified situations. For example, based on the spaces in which people live and move in daily life, the situations may be classified as office, restaurant/café, watching a sports game, shopping mall, car/bus, lecture room, street, large supermarket, subway, outdoors, and home. The statistical model for each situation is obtained by collecting acoustic and image signals for a long time (several hours) in that situation, dividing the collected signals into short intervals (e.g., 3 seconds), performing acoustic environment recognition and image environment recognition on each interval, and accumulating the recognition results. As shown in FIG. 6, the statistical model for each situation can be represented as a two-dimensional (acoustic environment/image environment) histogram built from the accumulated recognition results, and each model reflects the characteristics of its situation. For example, in the statistical model of a large supermarket the indoor/babble and moving/babble bins are high, while in the statistical model of a subway the indoor/subway-noise and indoor/loud-noise bins are high.
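Accumulating per-interval recognition results into a two-dimensional histogram can be sketched as follows. Each interval contributes one (acoustic environment, image environment) pair; normalizing the counts so that they sum to 1 is an assumption that makes models built from recordings of different lengths comparable. The name `build_histogram` and the sample labels are hypothetical.

```python
from collections import Counter


def build_histogram(results):
    """Build a normalized 2D histogram from per-interval recognition results.

    results: iterable of (acoustic_label, image_label) pairs, one per short
    interval. Returns a dict mapping each observed pair to its relative
    frequency; pairs that never occurred are simply absent.
    """
    counts = Counter(results)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Four 3-second intervals recorded in a (hypothetical) supermarket.
intervals = [
    ("babble", "indoor"),
    ("babble", "indoor"),
    ("babble", "moving"),
    ("loud", "indoor"),
]
print(build_histogram(intervals))
```

The same function can serve both for building a situation model from hours of training data and for summarizing the results accumulated on the terminal before a situation is determined.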

When the accumulated result is provided from the recognition result accumulation unit 210, the situation determination unit 220 computes the probabilistic distance between the accumulated result and each of the per-situation models stored in the situation statistical model 230, and determines the situation whose model is at the smallest probabilistic distance to be the user's situation. For example, as shown in FIG. 7, the situation determination unit 220 may compute the probabilistic distance between a histogram of the acoustic and image environment recognition results accumulated over 5 minutes and the histograms of the per-situation models stored in the situation statistical model 230, such as large supermarket, subway, and street, and determine the situation model with the shortest computed distance to be the user's situation.
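The nearest-model search can be sketched as below. The text does not name a specific probabilistic distance, so the Bhattacharyya distance used here is an assumption chosen because it is defined for histograms with empty bins; other measures could be substituted. The names `bhattacharyya_distance` and `determine_situation` and the toy model values are hypothetical.

```python
import math


def bhattacharyya_distance(p, q, eps=1e-12):
    """Bhattacharyya distance between two normalized histograms.

    p, q: dicts mapping bins to probabilities (missing bins count as 0).
    eps bounds the result when the histograms share no bins at all.
    """
    bins = set(p) | set(q)
    coefficient = sum(math.sqrt(p.get(b, 0.0) * q.get(b, 0.0)) for b in bins)
    return -math.log(max(coefficient, eps))


def determine_situation(observed, models):
    """Return the name of the situation model closest to the observed histogram."""
    return min(models, key=lambda name: bhattacharyya_distance(observed, models[name]))


# Toy situation models (assumed values), keyed by (acoustic, image) bins.
models = {
    "supermarket": {("babble", "indoor"): 0.6, ("babble", "moving"): 0.4},
    "subway": {("subway", "indoor"): 0.7, ("loud", "indoor"): 0.3},
}
observed = {("babble", "indoor"): 0.5, ("babble", "moving"): 0.3, ("loud", "indoor"): 0.2}
print(determine_situation(observed, models))
```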

Once the user situation has been determined, the situation determination unit 220 maps a time index to the determined user situation information and stores it in the storage unit 240, as shown in FIG. 8. Here, the time index is the point in time at which the user situation was determined and may include date information and time information. The determined user situation is stored together with the time index so that the user can later search past situations easily when needed, that is, so that the user can look up past situations by date and time.

The storage unit 240 stores the various programs and data needed to generate a life log in the portable terminal and, in particular, stores the user situation information to which a time index has been mapped, under the control of the situation determination unit 220. Here, the user situation information to which a time index has been mapped may be referred to as the user's life log.
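A time-indexed life log of this kind can be sketched as a small in-memory store that supports lookup by date and by time range. The class name `LifeLog`, its methods, and the in-memory list are assumptions for illustration; a terminal would presumably persist the entries in the storage unit.

```python
from datetime import date, datetime


class LifeLog:
    """In-memory life log: user situations indexed by determination time."""

    def __init__(self):
        self._entries = []  # list of (timestamp, situation) tuples

    def record(self, timestamp, situation):
        """Store a determined situation together with its time index."""
        self._entries.append((timestamp, situation))

    def on_date(self, day):
        """Return all entries whose time index falls on the given date."""
        return [(t, s) for t, s in self._entries if t.date() == day]

    def between(self, start, end):
        """Return all entries with start <= time index < end."""
        return [(t, s) for t, s in self._entries if start <= t < end]


log = LifeLog()
log.record(datetime(2010, 5, 13, 8, 30), "subway")
log.record(datetime(2010, 5, 13, 12, 10), "restaurant/cafe")
log.record(datetime(2010, 5, 14, 9, 0), "office")
print(log.on_date(date(2010, 5, 13)))
```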

FIG. 9 shows a procedure for generating and storing a life log in a portable terminal according to an embodiment of the present invention.

Referring to FIG. 9, in step 901 the terminal receives various acoustic signals and image signals around the user through the microphone 202 and the camera 204.

Then, in step 903, the terminal separates the input acoustic signal and image signal in order to process them, proceeding to step 905 for an acoustic signal and to step 911 for an image signal.

In step 905, the terminal checks whether the acoustic signal input during a preset short interval contains a loud sound at or above a threshold. If it does, the terminal proceeds to step 909 and recognizes that the acoustic environment corresponding to the acoustic signal is a loud-sound environment.

If the acoustic signal does not contain a loud sound at or above the threshold, the terminal proceeds to step 907 to extract the energy features of the acoustic signal, and then proceeds to step 909 to recognize the acoustic environment according to the extracted energy features. That is, the terminal searches the pre-stored Gaussian mixture models of the acoustic environments for the acoustic environment with the maximum likelihood for the extracted energy features, and recognizes it. Here, the Gaussian mixture models may represent the energy features of each acoustic environment, and the acoustic environments may be classified as, for example, babble, car sound, music, noise inside an enclosed space such as a bag, office noise, human voice, subway noise, noise of a quiet public place, running water, and loud sound.

In this example, the terminal checks for a loud sound first and extracts the feature of the acoustic signal to recognize the acoustic environment only when no loud sound is present; however, the loud-sound check may be omitted and the feature of the acoustic signal extracted directly.

Meanwhile, in step 911, the terminal measures the intensity of the image frame corresponding to the video signal input during the preset short interval and checks whether the frame is under low illumination. If so, the terminal proceeds to step 915 and recognizes that the environment of the video signal is a low-illumination environment. If the image frame is not under low illumination, the terminal, in step 913, detects the motion vectors, skin color, and color and texture of the frame to determine whether the frame was captured while moving, whether it contains a face, and whether it is indoors or outdoors, and then proceeds to step 915 to recognize the image environment of the frame according to the determination result. To process the input video signal in real time, the terminal may analyze the image frame in the order shown in FIG. 3, or may restrict the analysis region as shown in FIG. 4. In particular, as shown in FIG. 5, the terminal may process the video signal at times when the acoustic signal is not being processed.
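
The branch structure of steps 911 to 915 might look like the sketch below. It assumes a non-empty grayscale frame with pixel values from 0 to 255 and uses mean intensity as the low-illumination test; the threshold, the label strings, and the boolean inputs standing in for the motion-vector, skin-color, and color/texture detectors are hypothetical, since the source does not detail them here.

```python
LOW_LIGHT_THRESHOLD = 40  # hypothetical value on a 0-255 grayscale scale


def mean_intensity(frame):
    """Average pixel value of a non-empty grayscale frame (list of rows)."""
    pixels = [pixel for row in frame for pixel in row]
    return sum(pixels) / len(pixels)


def recognize_image_environment(frame, is_moving, has_face, is_indoor):
    """Low-illumination check first (step 911), then the other cues (step 913).

    is_moving, has_face, and is_indoor are placeholders for detector
    outputs that this sketch does not implement.
    """
    if mean_intensity(frame) < LOW_LIGHT_THRESHOLD:
        return "low illumination"
    labels = [
        "moving" if is_moving else "stationary",
        "face" if has_face else "no face",
        "indoor" if is_indoor else "outdoor",
    ]
    return ", ".join(labels)


dark_frame = [[10, 20], [30, 5]]
bright_frame = [[200, 180], [220, 210]]
print(recognize_image_environment(dark_frame, False, False, True))
print(recognize_image_environment(bright_frame, True, True, False))
```

Testing for low illumination first lets the terminal skip the costlier detectors on frames where they would be unreliable anyway.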

Once the acoustic environment of the acoustic signal and the image environment of the video signal have been recognized, the terminal accumulates, in step 917, the acoustic environment result and the image environment result for the preset short interval. Here, the terminal may accumulate the recognition results and represent them as a two-dimensional histogram, as shown in FIG. 6.
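
One simple way to picture the accumulation in step 917 is a count table keyed by label pairs. The sketch assumes the two histogram axes are the acoustic label and the image label; FIG. 6 is not reproduced in this text, so that layout and the name `accumulate` are assumptions.

```python
from collections import Counter


def accumulate(histogram, acoustic_env, image_env):
    """Add one short-interval recognition result pair to the 2-D histogram."""
    histogram[(acoustic_env, image_env)] += 1
    return histogram


hist = Counter()
results = [
    ("office noise", "indoor"),
    ("office noise", "indoor"),
    ("human voice", "face"),
]
for acoustic_env, image_env in results:
    accumulate(hist, acoustic_env, image_env)

print(hist.most_common())
```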

Thereafter, the terminal proceeds to step 919 and checks whether a preset accumulation time has expired. If the accumulation time has not expired, the terminal returns to step 901 and performs the subsequent steps again.

If the accumulation time has expired, the terminal determines the user's situation in step 921 using the accumulated result. That is, the terminal compares the probabilistic distance between each of a plurality of pre-stored situation statistical models and the accumulated result, and determines the situation statistical model with the shortest probabilistic distance to be the user situation. Here, the situation statistical models are statistical models of the acoustic and video signals for a plurality of pre-classified situations, and may be represented as two-dimensional histograms. For example, the situation statistical models may represent the characteristics of the acoustic and video signals for each of an office, a restaurant/café, watching a sports game, a shopping mall, a car/bus, a lecture room, a street, a large supermarket, a subway, an outdoor setting, and a home.
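
The nearest-model decision in step 921 can be sketched as below. The source says only "probabilistic distance" without naming a measure, so the Bhattacharyya distance used here is a swapped-in assumption, as are the function names and the example counts. Histograms are assumed non-empty.

```python
import math


def normalize(histogram):
    """Convert raw counts to probabilities (histogram must be non-empty)."""
    total = sum(histogram.values())
    return {key: count / total for key, count in histogram.items()}


def bhattacharyya_distance(p, q):
    """Distance between two discrete distributions given as dicts."""
    coefficient = sum(math.sqrt(p[key] * q.get(key, 0.0)) for key in p)
    return -math.log(coefficient) if coefficient > 0.0 else float("inf")


def determine_situation(accumulated, situation_models):
    """Return the name of the model closest to the accumulated histogram."""
    p = normalize(accumulated)
    return min(
        situation_models,
        key=lambda name: bhattacharyya_distance(p, normalize(situation_models[name])),
    )


# Illustrative counts only; they are not taken from the source.
accumulated = {("office noise", "indoor"): 8, ("human voice", "face"): 2}
situation_models = {
    "office": {("office noise", "indoor"): 70, ("human voice", "face"): 30},
    "subway": {("subway noise", "moving"): 90, ("human voice", "face"): 10},
}
print(determine_situation(accumulated, situation_models))
```

Any other distance between histograms, such as a Kullback-Leibler divergence with smoothing, could be substituted in `determine_situation` without changing the surrounding flow.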

Thereafter, the terminal proceeds to step 923, maps a time index to the determined user situation information as shown in FIG. 8, and stores the result as a life log. Here, the time index is the point in time at which the user situation was determined, and may include date information and time information. The determined user situation is stored together with the time index so that the user can later easily look up past situations by searching by date or time as needed.
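
Storing and searching time-indexed entries, as in step 923, might be sketched like this. The record layout, the in-memory list, and the names `record_life_log` and `search_by_date` are hypothetical; the source specifies only that date and time information is mapped to the determined situation.

```python
from datetime import datetime


def record_life_log(log, situation, when):
    """Append the situation with its date and time index to the life log."""
    log.append(
        {
            "date": when.strftime("%Y-%m-%d"),
            "time": when.strftime("%H:%M"),
            "situation": situation,
        }
    )
    return log


def search_by_date(log, date):
    """Return all entries recorded on the given date (YYYY-MM-DD)."""
    return [entry for entry in log if entry["date"] == date]


life_log = []
record_life_log(life_log, "office", datetime(2010, 5, 13, 9, 30))
record_life_log(life_log, "subway", datetime(2010, 5, 13, 18, 45))
record_life_log(life_log, "home", datetime(2010, 5, 14, 8, 0))
print(search_by_date(life_log, "2010-05-13"))
```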

Thereafter, the terminal terminates the algorithm according to the present invention.

Although specific embodiments have been described in the detailed description of the present invention, various modifications are possible without departing from the scope of the invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the appended claims and their equivalents.

200: input unit 202: microphone
204: camera 210: recognition result accumulation unit
212: acoustic environment recognition unit 214: image environment recognition unit
220: situation determination unit 230: situation statistical model
240: storage unit

Claims

1. A method for generating a life log in a portable terminal, the method comprising:
recognizing, among preset acoustic environments, an acoustic environment corresponding to an acoustic signal input from a microphone;
recognizing, among preset image environments, an image environment corresponding to a video signal input from a camera;
determining, among preset situation models, a user situation corresponding to the acoustic environment recognition result and the image environment recognition result; and
recording the determined user situation as a life log,
wherein the acoustic signal and the video signal are processed at different points in time within a preset interval.

2. The method of claim 1,
wherein the acoustic environments represent energy features for each of a plurality of preset acoustic environments, and
wherein the acoustic environments include an environment for at least one of murmuring sound, car sound, music, noise inside an enclosed space such as a bag, office noise, human voice, subway noise, noise of a quiet public place, the sound of running water, and loud sound.

3. The method of claim 2,
wherein recognizing the acoustic environment comprises:
extracting an energy feature of the acoustic signal; and
recognizing, among the preset acoustic environments, the acoustic environment having the maximum likelihood value for the extracted energy feature as the acoustic environment corresponding to the acoustic signal.

4. The method of claim 1,
wherein the image environments are classified according to at least one condition among low illumination, whether the terminal is moving, presence or absence of a face, and indoors or outdoors.

5. The method of claim 4,
wherein recognizing the image environment comprises:
checking whether the video signal from the camera satisfies the at least one condition; and
recognizing the image environment corresponding to the video signal according to whether the at least one condition is satisfied.

6. (Deleted)

7. The method of claim 1,
wherein the situation models represent statistics of the recognition results of the acoustic signal and the video signal for each of a plurality of pre-classified situations, and
wherein the situation models include a model for at least one of an office, a restaurant/café, watching a sports game, a shopping mall, a car/bus, a lecture room, a street, a large supermarket, a subway, an outdoor setting, and a home.

8. The method of claim 7,
wherein determining, among the preset situation models, the situation corresponding to the acoustic environment recognition result and the image environment recognition result comprises:
accumulating the acoustic environment recognition result and the image environment recognition result during a predetermined time interval; and
determining, among the situation models, the situation model having the shortest probabilistic distance from the accumulated result to be the user situation.

9. The method of claim 1,
wherein recording the determined user situation as a life log comprises:
obtaining time information for the point in time at which the user situation was determined; and
mapping the time information to the determined user situation and recording the result.

10. An apparatus for generating a life log in a portable terminal, the apparatus comprising:
a microphone that receives an acoustic signal;
a camera that receives a video signal;
an acoustic environment recognition unit that recognizes, among preset acoustic environments, an acoustic environment corresponding to the acoustic signal input from the microphone;
an image environment recognition unit that recognizes, among preset image environments, an image environment corresponding to the video signal input from the camera; and
a situation determination unit that determines, among preset situation models, a user situation corresponding to the acoustic environment recognition result and the image environment recognition result, and records the determined user situation as a life log,
wherein the acoustic signal and the video signal are processed at different points in time within a preset interval.

11. The apparatus of claim 10,
wherein the acoustic environments represent energy features for each of a plurality of preset acoustic environments, and
wherein the acoustic environments include an environment for at least one of murmuring sound, car sound, music, noise inside an enclosed space such as a bag, office noise, human voice, subway noise, noise of a quiet public place, the sound of running water, and loud sound.

12. The apparatus of claim 10,
wherein the acoustic environment recognition unit extracts an energy feature of the acoustic signal and recognizes, among the preset acoustic environments, the acoustic environment having the maximum likelihood value for the extracted energy feature as the acoustic environment corresponding to the acoustic signal.

13. The apparatus of claim 10,
wherein the image environments are classified according to at least one condition among low illumination, whether the terminal is moving, presence or absence of a face, and indoors or outdoors.

14. The apparatus of claim 13,
wherein the image environment recognition unit checks whether the video signal satisfies the at least one condition and thereby recognizes the image environment corresponding to the video signal.

15. (Deleted)

16. The apparatus of claim 10,
wherein the situation models represent statistics of the recognition results of the acoustic signal and the video signal for each of a plurality of pre-classified situations, and
wherein the situation models include a model for at least one of an office, a restaurant/café, watching a sports game, a shopping mall, a car/bus, a lecture room, a street, a large supermarket, a subway, an outdoor setting, and a home.

17. The apparatus of claim 16,
wherein the situation determination unit determines, among the situation models, the situation model having the shortest probabilistic distance from the acoustic environment recognition result and the image environment recognition result accumulated during a predetermined time interval to be the user situation.

18. The apparatus of claim 10,
wherein the situation determination unit obtains time information for the point in time at which the user situation was determined, maps the time information to the determined user situation, and records the result.