TW201837901A - Emotion recognition device and emotion recognition program - Google Patents


Info

Publication number
TW201837901A
Authority
TW
Taiwan
Prior art keywords
feature amount
emotion
unit
subject
sound
Prior art date
Application number
TW107106555A
Other languages
Chinese (zh)
Inventor
淺井清美
Original Assignee
日商賽爾科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日商賽爾科技股份有限公司
Publication of TW201837901A


Classifications

    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 - Measuring for diagnostic purposes; Identification of persons
    • A61B 5/16 - Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B 5/18 - Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state for vehicle drivers or machine operators
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

Provided is an emotion recognition device that recognizes a person's emotion with improved recognition accuracy. An authentication device (1) includes: an image feature amount calculation unit that calculates a first emotion feature amount from image information of a subject captured by a camera (32); a voice feature amount calculation unit that calculates a second emotion feature amount from voice information obtained by collecting, with a microphone (31), a voice uttered by the subject; and a recognition unit that recognizes the subject's emotion based on an emotion feature amount obtained by combining the first emotion feature amount and the second emotion feature amount.

Description

Emotion recognition device and emotion recognition program

The present invention relates to an emotion recognition device and an emotion recognition program for recognizing the emotions of an operator.

With the increase in computing power in recent years, it has become possible for computers to recognize a wide variety of situations. In particular, when a person is the recognition target and that person's emotion is determined, a wide variety of applications can be conceived.

In an embodiment of Patent Document 1, the user's physical and mental condition is recognized while a video is being displayed, and when fatigue, weariness, fear, or disgust at or above a predetermined level is detected, the display is switched to another video, or a predetermined message is issued (a warning such as "Your brain is in a state of fatigue. Continuing may endanger your health.").

Patent Document 1: Japanese Laid-Open Patent Publication No. 2005-237561

The invention described in Patent Document 1 senses images, sound, and biological information. However, Patent Document 1 does not describe how images, sound, and biological information are combined to recognize the user's state, nor does it consider the case where there are multiple subjects being photographed. Accordingly, an object of the present invention is to improve emotion recognition accuracy in an emotion recognition device and an emotion recognition program that recognize human emotions.

To solve the above problem, an emotion recognition device according to the present invention includes: an image feature amount calculation unit that calculates a first emotion feature amount from image information of a subject captured by a camera; a voice feature amount calculation unit that calculates a second emotion feature amount from voice information obtained by collecting, with a microphone, a voice uttered by the subject; and a recognition unit that recognizes the subject's emotion based on an emotion feature amount obtained by combining the first emotion feature amount and the second emotion feature amount. Other aspects are described in the embodiments of the invention.
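
As a rough illustration of the claimed fusion (a sketch, not the patent's own implementation), the following Python fragment concatenates a hypothetical image-derived emotion feature vector and a voice-derived one into a combined emotion feature amount; the names and dimensions are assumptions.

    import numpy as np

    def fuse_features(image_feat: np.ndarray, voice_feat: np.ndarray) -> np.ndarray:
        # Combine the first (image) and second (voice) emotion feature
        # amounts into one higher-dimensional emotion feature amount.
        return np.concatenate([image_feat, voice_feat])

    image_feat = np.random.rand(8)   # hypothetical 8-dim image features
    voice_feat = np.random.rand(6)   # hypothetical 6-dim voice features
    combined = fuse_features(image_feat, voice_feat)
    assert combined.shape == (14,)

A recognition unit would then operate in this combined, higher-dimensional space, which is the basis of the accuracy claim.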

According to the present invention, in an emotion recognition device and an emotion recognition program that recognize human emotions, the recognition accuracy of emotions can be improved.

S‧‧‧authentication system

1‧‧‧authentication device

11‧‧‧CPU

12‧‧‧virtual machine monitor

13‧‧‧multi-function operating system

14‧‧‧real-time operating system

15‧‧‧secure operating system

2‧‧‧machine

31‧‧‧microphone

32‧‧‧camera

33‧‧‧heartbeat sensor (action sensor)

34‧‧‧air temperature sensor (environment sensor)

4‧‧‧machine learning server

411‧‧‧personal voice detection unit

412‧‧‧image processing unit

413‧‧‧action analysis unit

414‧‧‧environment analysis unit

42‧‧‧feature amount calculation unit

421‧‧‧voice feature amount calculation unit

422‧‧‧image feature amount calculation unit

423‧‧‧action feature amount calculation unit

424‧‧‧environment feature amount calculation unit

43‧‧‧feature amount storage unit

44‧‧‧memory unit

45‧‧‧correction unit

46‧‧‧communication unit

471‧‧‧voice emotion analysis unit

472‧‧‧image emotion analysis unit

473‧‧‧action emotion analysis unit

474‧‧‧environment emotion analysis unit

50‧‧‧configuration setting

501‧‧‧configuration database

51‧‧‧sensing data reception function

52‧‧‧signal processing function

53‧‧‧signal processing function utilization system

54‧‧‧machine learning function

55‧‧‧feature amount calculation function

561‧‧‧personal estimation function

562‧‧‧emotion estimation function

563‧‧‧physical condition estimation function

58‧‧‧correction function

581‧‧‧feature point database

582‧‧‧feature amount database

583‧‧‧label database

59‧‧‧result output function

611‧‧‧personal voice detection unit

612‧‧‧image processing unit (image specifying unit)

613‧‧‧action analysis unit

614‧‧‧environment analysis unit

621‧‧‧voice feature amount calculation unit

622‧‧‧image feature amount calculation unit

623‧‧‧action feature amount calculation unit

624‧‧‧environment feature amount calculation unit

63‧‧‧personal authentication unit

64‧‧‧operation permission unit

651‧‧‧feature amount storage unit

652‧‧‧correction unit

653‧‧‧communication unit

66‧‧‧memory unit

661‧‧‧secure area

662‧‧‧non-secure area

671‧‧‧voice emotion analysis unit

672‧‧‧image emotion analysis unit

673‧‧‧action emotion analysis unit

674‧‧‧environment emotion analysis unit

68‧‧‧emotion estimation unit

681‧‧‧physical condition estimation unit

71‧‧‧sound waveform area

72‧‧‧voiceprint area

73‧‧‧three-dimensional image

81‧‧‧emotion feature amount space

82‧‧‧emotion feature amount plane

9‧‧‧hierarchical artificial neural network

91‧‧‧input layer

92‧‧‧first intermediate layer

93‧‧‧second intermediate layer

94‧‧‧output layer

98‧‧‧node

99‧‧‧edge

Fig. 1 is a schematic configuration diagram of the authentication system of the present embodiment.

Fig. 2 is a schematic diagram showing an example of the software architecture of the present embodiment.

Fig. 3 is a schematic diagram showing the logical architecture of the authentication device of the present embodiment.

Fig. 4 is a schematic diagram of the logical architecture of the machine learning server of the present embodiment.

Fig. 5 is a flowchart of the authentication process.

Fig. 6 is a schematic diagram showing the details of the personal authentication operation based on voice.

Fig. 7 is a schematic diagram showing the details of the personal authentication operation based on images.

Fig. 8 is a graph conceptually showing the test results of a healthy group and an affected group and their distributions.

Fig. 9 is a flowchart of the personal authentication process (part 1).

Fig. 10 is a flowchart of the personal authentication process (part 2).

Fig. 11 is an explanatory diagram of a hierarchical artificial neural network that performs the personal authentication process.

Fig. 12 is an explanatory diagram showing an example of inter-layer indices and edge weights.

Fig. 13 is a flowchart of the emotion recognition process.

Fig. 14 is a conceptual diagram of a multi-dimensional emotion space composed of three axes: fatigue, wakefulness, and comfort.

Fig. 15 is a conceptual diagram of a two-axis emotion space of wakefulness and comfort.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. Fig. 1 is a schematic configuration diagram of the authentication system S of the present embodiment. The authentication system S includes an authentication device 1, a machine 2 whose operation it permits, a microphone 31, a camera 32, a heartbeat sensor 33, an air temperature sensor 34, and a machine learning server 4. The authentication device 1 is communicably connected to the microphone 31, the camera 32, the heartbeat sensor 33, and the like via cables, short-range wireless communication, or the like. The sensor information detected by the microphone 31, the camera 32, and the heartbeat sensor 33 is sent to the authentication device 1 and the machine learning server 4. The machine learning server 4 calculates feature amounts from each piece of sensor information, corrects the feature points, and stores them. Here, the machine 2 is, for example, a moving body such as a vehicle. By incorporating the authentication device 1 into a machine 2 such as a vehicle, an advanced driver assistance system, for example, can be constructed. Furthermore, by adding wearable sensors, the authentication device 1 can record the state of the body, diet, and lifestyle habits, and determine emotions, diseases, and drowsiness from these records. The user can thus be notified of his or her mental and physical health, which contributes to extending the user's healthy life span.

Here, a feature point is information indicating, in the multi-dimensional space composed of the feature amounts calculated from each piece of sensor information, which region corresponds to each recognition result; it is also called a machine learning label. The machine learning server 4 is installed, for example, in the cloud (a data center). The machine learning server 4 receives feature point information from the authentication device 1, corrects and updates it, and transmits the updated feature points back to the authentication device 1. The machine learning server 4 includes a feature amount calculation unit 42, a feature amount storage unit 43, and a correction unit 45. The feature amount calculation unit 42 calculates a feature amount from each piece of sensor information. The feature amount storage unit 43 stores the feature amounts calculated from each piece of sensor information. The correction unit 45 corrects the feature points (machine learning labels) based on the feature amounts calculated from each piece of sensor information. The logical architecture of the machine learning server 4 is described with reference to Fig. 4 below.

The authentication device 1 authenticates who the operator is based on the sensor information detected by, for example, the microphone 31 and camera 32 mounted on a machine 2 such as a vehicle and the heartbeat sensor 33 installed in the steering wheel or seat, and permits operation of the machine 2 if the person has operation authority for it. The authentication device 1 further adds the room temperature information (fourth emotion feature amount) detected by the air temperature sensor 34 to the voice information, image information, and heartbeat information to recognize the emotion of the authenticated person, and stops the machine 2 if a predetermined disease (for example, an epileptic seizure or heart attack) is detected from the recognized emotion.

The microphone 31 is installed facing the operator of the machine 2 and collects the voice uttered by the operator. The camera 32 is a stereo camera that photographs the face of the operator of the machine 2. The heartbeat sensor 33 is installed, for example, in the operator's seat of the machine 2, in close contact with the operator's body, so that it can detect the heartbeat of the operator. The heartbeat is the repeated beating of the heart, and this repetition, that is, the heartbeat pattern, differs from person to person. Therefore, by acquiring heartbeat data with the heartbeat sensor 33, it is possible to estimate who the person is. The heartbeat sensor 33 is an action sensor that detects a specific action of the operator of the machine 2. The air temperature sensor 34 is installed near the operator of the machine 2 and is an environment sensor that measures the air temperature around the operator.

The authentication device 1 includes a CPU (Central Processing Unit) 11. The CPU 11 executes a hypervisor program (not shown) to realize a virtual machine monitor 12. The virtual machine monitor 12 realizes virtual machines. Through the operation of the virtual machine monitor 12, virtual machines (not shown) that respectively run the multi-function operating system 13, the real-time operating system 14, and the secure operating system 15 are realized.

The multi-function operating system 13 is an operating system with a rich user interface, such as Linux (registered trademark). The real-time operating system 14 emphasizes real-time performance and is specialized in providing priority-based allocation of time resources and predictability of execution time. The secure operating system 15 is an operating system with enhanced security. In the present embodiment, the personal voice detection unit 611, the image processing unit 612, the action analysis unit 613, and the environment analysis unit 614 are realized by applications running on the real-time operating system 14. Performing personal authentication on an operating system that emphasizes real-time performance in this way shortens the time lag from the start to the end of authentication, and also makes it possible to grasp the correlation between voice and image in real time.

The personal voice detection unit 611 detects the voice information collected by the microphone 31. The image processing unit 612 processes the image information captured by the camera 32. The action analysis unit 613 analyzes the subject's actions from the heartbeat data (action information) detected by the heartbeat sensor 33. The environment analysis unit 614 adds the air temperature data (environment information) detected by the air temperature sensor to improve the accuracy of emotion recognition for the subject. The logical architecture of the authentication device 1 is described with reference to Fig. 3 below.

Fig. 2 is a schematic diagram showing an example of the software architecture of the present embodiment. The sensing data reception function 51 receives data such as voice, images, and heart sounds sensed by each sensor, and outputs each piece of received sensing data to the signal processing function 52. The signal processing function 52 performs signal processing such as filtering and smoothing on each piece of sensing data by means of the signal processing function utilization system 53, which executes such signal processing. The signal processing function 52 outputs the signal-processed sensing data to the machine learning function 54, the personal estimation function 561, and the emotion estimation function 562.
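
The patent does not specify the filter used; as one plausible form of the "filtering and smoothing" performed here, a minimal moving-average sketch in Python (the window length is an assumption):

    import numpy as np

    def smooth(signal: np.ndarray, window: int = 5) -> np.ndarray:
        # Moving-average smoothing of one sensor channel.
        kernel = np.ones(window) / window
        return np.convolve(signal, kernel, mode="same")

    noisy = np.sin(np.linspace(0.0, 6.28, 100)) + 0.2 * np.random.randn(100)
    clean = smooth(noisy)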

The machine learning function 54 performs machine learning on the signal-processed sensing data and outputs the machine learning result. The feature amount calculation function 55 detects feature points concerning the current operator from this machine learning result. The feature point database 581 associates individuals with their feature points. The feature amount database 582 stores the feature amounts calculated by the feature amount calculation function 55. The label database 583 associates the emotion space with the labels used to determine a person's emotion in that emotion space. A label corresponding to a specific individual is called a personal label; a label that is usable regardless of the person is called a general label.

The personal estimation function 561 compares the signal-processed voice, image, and action sensing data, or the feature amounts calculated by the feature amount calculation function 55, against the feature point database 581 to determine who the current operator is. The emotion estimation function 562 calculates emotion coordinates in the emotion space from the voice, image, and action sensing data and the feature amounts, and compares these emotion coordinates against the label database 583 to estimate the operator's emotion. The physical condition estimation function 563 estimates the physical condition from the operator's emotion.
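
One simple way to realize the label lookup of the emotion estimation function 562 (a sketch under assumed data structures, not the patent's implementation) is a nearest-label search in the emotion space:

    import numpy as np

    # Hypothetical label database 583: emotion label -> coordinates in a
    # two-axis (wakefulness, comfort) emotion space.
    LABELS = {
        "relaxed": np.array([-0.5, 0.8]),
        "excited": np.array([0.9, 0.6]),
        "drowsy": np.array([-0.9, -0.2]),
        "stressed": np.array([0.7, -0.7]),
    }

    def estimate_emotion(coord: np.ndarray) -> str:
        # Return the label whose registered coordinates lie closest to the
        # emotion coordinates calculated from the sensing data.
        return min(LABELS, key=lambda k: float(np.linalg.norm(LABELS[k] - coord)))

    print(estimate_emotion(np.array([-0.8, -0.1])))  # -> "drowsy"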

The correction function 58 maintains the feature point database 581 and the label database 583 based on the feature amount database 582. The maintenance targets are the voiceprint feature points for personal authentication, the face recognition feature points, and the action feature points stored in the feature point database 581, as well as the personal labels and general labels for emotion recognition stored in the label database 583. The feature point database 581 maintained by the correction function 58 is referenced by the personal estimation function 561. The label database 583 is referenced by the emotion estimation function 562 and the physical condition estimation function 563.

The result output function 59 displays text, diagrams, and two- or three-dimensional graphics, or issues an audible alarm. The result output function 59 displays or announces the individual estimated by the personal estimation function 561, or that individual's emotion estimated by the emotion estimation function 562. It also displays or announces the disease estimated by the physical condition estimation function 563, or an operation that is not permitted because of that disease. The configuration setting 50 sets the output format of the result output function 59, and also sets which sensors are reception targets of the sensing data reception function 51. The configuration database 501 stores the contents set by the configuration setting 50.

Fig. 3 is a schematic diagram showing the logical architecture of the authentication device 1 of the present embodiment. The authentication device 1 is communicably connected to the microphone 31, the camera 32, the heartbeat sensor 33, the air temperature sensor 34, and the machine 2. The authentication device 1 acquires sensor information, performs personal authentication of the user, and verifies whether the user is a person with operation authority for the machine 2. The authentication device 1 also estimates the user's emotion, infers disease from that emotion, and determines whether or not to permit operation of the machine 2.

The authentication device 1 includes a personal voice detection unit 611, a voice feature amount calculation unit 621, an image processing unit 612, an image feature amount calculation unit 622, an action analysis unit 613, an action feature amount calculation unit 623, an environment analysis unit 614, an environment feature amount calculation unit 624, a personal authentication unit 63, and an operation permission unit 64. Among these, the personal voice detection unit 611, the image processing unit 612, the action analysis unit 613, and the environment analysis unit 614 correspond to the sensing data reception function 51 and the signal processing function 52 shown in Fig. 2. The voice feature amount calculation unit 621, the image feature amount calculation unit 622, the action feature amount calculation unit 623, and the environment feature amount calculation unit 624 correspond to the feature amount calculation function 55 shown in Fig. 2. The personal authentication unit 63 and the operation permission unit 64 correspond to the personal estimation function 561 shown in Fig. 2.

The authentication device 1 further includes a voice emotion analysis unit 671, an image emotion analysis unit 672, an action emotion analysis unit 673, an environment emotion analysis unit 674, an emotion estimation unit 68, and a physical condition estimation unit 681. Among these, the voice emotion analysis unit 671, the image emotion analysis unit 672, the action emotion analysis unit 673, the environment emotion analysis unit 674, and the emotion estimation unit 68 correspond to the emotion estimation function 562 shown in Fig. 2. The physical condition estimation unit 681 corresponds to the physical condition estimation function 563 shown in Fig. 2. The authentication device 1 further includes a feature amount storage unit 651, a correction unit 652, a communication unit 653, and a memory unit 66.

The personal voice detection unit 611 acquires the voice data collected by the microphone 31, performs signal processing to convert the voice data into a spectrum, and outputs the resulting voiceprint to the voice feature amount calculation unit 621 and the personal authentication unit 63. The voice feature amount calculation unit 621 performs machine learning on the voiceprint, calculates the feature amount related to the voice data, and outputs this feature amount to the personal authentication unit 63, the voice emotion analysis unit 671, and the feature amount storage unit 651. An example of the voiceprint produced by the personal voice detection unit 611 is shown in Fig. 6, described later. The voice emotion analysis unit 671 analyzes the emotion of the person who uttered the voice based on the voiceprint and the feature amount calculated from it, and outputs the coordinate values of the analyzed emotion in the emotion space to the emotion estimation unit 68.

The image processing unit 612 acquires the image data captured by the camera 32, performs processing such as detecting the face region and detecting the three-dimensional coordinates of facial parts such as both eyes, the nose, and the corners of the mouth, and outputs the processing result to the image feature amount calculation unit 622 and the personal authentication unit 63. The image feature amount calculation unit 622 performs machine learning on the three-dimensional coordinates of each part in the processing result, calculates the feature amount of the image data, and outputs it to the personal authentication unit 63, the feature amount storage unit 651, and the image emotion analysis unit 672. The three-dimensional coordinates of each part produced by the image processing unit 612 are shown in Fig. 7, described later. The image emotion analysis unit 672 analyzes the emotion of the face based on the three-dimensional image of the face and the feature amount calculated from it, and outputs the coordinates of the analyzed emotion in the emotion space to the emotion estimation unit 68.

The action analysis unit 613 acquires the heartbeat data (biological information) sensed by the heartbeat sensor 33, performs processing to analyze the heartbeat pattern, and outputs the analyzed heartbeat pattern to the personal authentication unit 63, the action feature amount calculation unit 623, and the action emotion analysis unit 673. The action feature amount calculation unit 623 performs machine learning on the resulting heartbeat pattern and calculates the feature amount of the heartbeat data. The action emotion analysis unit 673 analyzes the emotion of the person based on the heartbeat data and the analyzed heartbeat pattern, and outputs the coordinate values of the analyzed emotion in the emotion space to the emotion estimation unit 68.

The environment analysis unit 614 acquires the air temperature data (environment information) sensed by the air temperature sensor 34, performs processing to analyze the temporal change of the air temperature, and outputs the analyzed air temperature to the environment feature amount calculation unit 624 and the environment emotion analysis unit 674. The environment feature amount calculation unit 624 performs machine learning on the resulting temperature change and calculates its feature amount. The environment emotion analysis unit 674 estimates the operator's emotion based on this feature amount and the temperature change.

The personal authentication unit 63 determines how the processing result of the voice data, the processing result of the image data, and the analyzed heartbeat pattern are to be combined, and based on this determination combines the voice feature amount, the image feature amount, and the heartbeat pattern feature amount. The personal authentication unit 63 then compares the combined feature amounts against the feature points of each person stored in the memory unit 66 to authenticate who the operator captured by the sensors is. This process of authenticating who the operator is is called personal authentication.

The emotion estimation unit 68 determines how the processing result of the voice data, the processing result of the image data, and the analyzed heartbeat pattern are to be combined, and based on this determination combines the emotion analyzed from the voice, the emotion analyzed from the image, and the emotion analyzed from the heartbeat pattern, and then adds the emotional tendency analyzed from the environment. Compared with analyzing emotion from the data sensed by a single sensor, this increases the number of dimensions and can therefore improve the accuracy of emotion estimation.
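
A minimal sketch of such multi-modal combination, assuming each analysis unit outputs coordinates in the same emotion space and that a weighted average plus an environment offset is acceptable (the patent leaves the exact combination rule open; the weights are assumptions):

    import numpy as np

    def combine_emotions(voice, image, heartbeat, environment_bias,
                         weights=(0.4, 0.4, 0.2)):
        # Weighted fusion of per-modality emotion coordinates, shifted by
        # the tendency analyzed from the environment.
        fused = np.average(np.stack([voice, image, heartbeat]),
                           axis=0, weights=weights)
        return fused + environment_bias

    coord = combine_emotions(np.array([0.2, -0.1]),   # from voice
                             np.array([0.3, 0.0]),    # from image
                             np.array([0.1, -0.3]),   # from heartbeat
                             np.array([0.0, -0.05]))  # environment tendency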

The physical condition estimation unit 681 estimates the operator's physical condition based on the emotion estimated by the emotion estimation unit 68. Here, if the physical condition estimation unit 681 infers that the operator is having an epileptic seizure or heart attack, the operation permission unit 64 stops the machine 2 and prohibits the operator from operating it.

The operation permission unit 64 determines whether the operator identified by the personal authentication unit 63 is a person with operation authority for the machine 2, and if so, permits operation of the machine 2. The authentication device 1 also includes the feature amount storage unit 651, the correction unit 652, the communication unit 653, and the memory unit 66, described next.

The feature amount storage unit 651 determines how the processing result of the voice data, the processing result of the image data, and the analyzed heartbeat pattern are to be combined, and based on this determination combines the voice feature amount, the image feature amount, and the heartbeat pattern feature amount and saves (stores) them in the memory unit 66. On the same basis, the feature amount storage unit 651 also combines the emotion analysis result of the voice, the emotion analysis result of the image, and the emotion analysis result of the heartbeat pattern, adds the emotional tendency analyzed from the environment, and saves (stores) them in the memory unit 66.

The correction unit 652 corrects the individual's feature point information based on the feature amounts stored in the memory unit 66, and saves the corrected feature point information in the feature point database 581 (see Fig. 2). By correcting the individual's feature point information, the authentication device 1 can improve the accuracy of personal authentication. The correction unit 652 also corrects the emotion analysis results stored in the memory unit 66 and stores the corrected personal emotion labels in the label database 583 (see Fig. 2). By correcting the personal emotion labels, the authentication device 1 can improve the accuracy of emotion recognition.

The communication unit 653 is a network interface with a function of exchanging information with the machine learning server 4 and the like. This allows the machine learning server 4 to carry out the correction of the individual's feature point information and emotion labels. The memory unit 66 has a secure area 661 that only the secure operating system 15 and the real-time operating system 14 of Fig. 1 can access, and a non-secure area 662 that any of the multi-function operating system 13, the real-time operating system 14, and the secure operating system 15 can access. The authentication device 1 stores the feature amounts for identifying individuals, the feature point information for recognizing individuals, and the emotion label information for recognizing emotions in the secure area 661. Thus, even if the multi-function operating system 13 is compromised, the authentication device 1 can prevent the leakage of the individual-specific information stored in the secure area 661.

Fig. 4 is a schematic diagram showing the logical architecture of the machine learning server 4 of the present embodiment. The machine learning server 4 includes a personal voice detection unit 411, a voice feature amount calculation unit 421, a voice emotion analysis unit 471, an image processing unit 412, an image feature amount calculation unit 422, an image emotion analysis unit 472, an action analysis unit 413, an action feature amount calculation unit 423, an action emotion analysis unit 473, an environment analysis unit 414, an environment feature amount calculation unit 424, an environment emotion analysis unit 474, a feature amount storage unit 43, a memory unit 44, a correction unit 45, and a communication unit 46. The voice feature amount calculation unit 421, the image feature amount calculation unit 422, and the action feature amount calculation unit 423 constitute the feature amount calculation unit 42 shown in Fig. 1.

The personal voice detection unit 411, the image processing unit 412, and the action analysis unit 413 correspond to the signal processing function 52 shown in Fig. 2 and are the same as the personal voice detection unit 611, the image processing unit 612, and the action analysis unit 613 shown in Fig. 3. The voice feature amount calculation unit 421, the image feature amount calculation unit 422, and the action feature amount calculation unit 423 correspond to the machine learning function 54 and the feature amount calculation function 55 shown in Fig. 2 and are the same as the voice feature amount calculation unit 621, the image feature amount calculation unit 622, and the action feature amount calculation unit 623 shown in Fig. 3. The voice emotion analysis unit 471, the image emotion analysis unit 472, the action emotion analysis unit 473, and the environment emotion analysis unit 474 correspond to part of the emotion estimation function 562 shown in Fig. 2 and are the same as the voice emotion analysis unit 671, the image emotion analysis unit 672, the action emotion analysis unit 673, and the environment emotion analysis unit 674 shown in Fig. 3.

The feature amount storage unit 43 determines how the processing result of the voice data, the processing result of the image data, and the analyzed heartbeat pattern are to be combined, and based on this determination combines the voice feature amount, the image feature amount, and the heartbeat pattern feature amount and saves (stores) them in the memory unit 44. On the same basis, the feature amount storage unit 43 also combines the emotion analysis result of the voice, the emotion analysis result of the image, and the emotion analysis result of the heartbeat pattern, adds the emotional tendency analyzed from the environment, and saves (stores) them in the memory unit 44.

The memory unit 44 is composed of, for example, a hard disk or flash memory, and stores the various pieces of sensor information, the feature amounts calculated from them, the individual feature point information corrected based on those feature amounts, and so on.

The communication unit 46 is, for example, a network interface that exchanges information with the authentication device 1 shown in Fig. 1. The correction unit 45 corresponds to the correction function 58 shown in Fig. 2 and is the same as the correction unit 652 shown in Fig. 3. When the computing capability of the authentication device 1 is low, the machine learning server 4 can perform the correction in its place, improving the accuracy of personal authentication.

Fig. 5 is a flowchart of the authentication process. The CPU 11 (see Fig. 1) of the authentication device 1 performs person-specific identification processing on the signals sensed by each sensor (step S1) and determines who the sensed person is. The person-specific identification processing is described with reference to Figs. 9 to 12 below. The CPU 11 then executes emotion recognition processing for the sensed person (step S2) and estimates the person's emotion. The emotion recognition processing is described with reference to Figs. 13 to 15 below.

After authenticating the individual, the CPU 11 performs emotion recognition processing based on that individual's personal emotion labels, so it can recognize emotion with higher accuracy than emotion recognition processing based on general emotion labels without identifying the individual. Furthermore, because the CPU 11 recognizes emotion by combining both the voice and the face image, it can recognize emotion with higher accuracy than with voice alone or images alone.

Next, in step S3, the CPU 11 determines the operator of the machine 2. For example, when the machine 2 is a vehicle, the CPU 11 determines that the person sitting in the driver's seat is the operator. When the machine 2 is a home appliance or the like, the person closest to the operation unit of the machine 2 is determined to be the operator.

The CPU 11 compares the feature amounts of the operator's emotion with labels related to medical conditions, quantitatively calculates signs of the operator's disease onset or drowsiness, and determines whether these signs exceed a first threshold (step S4). Medical conditions include epilepsy, depression, heart attack, and the like. The first threshold is a threshold for determining the presence or absence of signs.

If the signs of disease onset or drowsiness are at or above the first threshold (step S5: Yes), the CPU 11 notifies the operator or another person (step S6) and proceeds to step S7. Here, the CPU 11 conveys, by voice or the like, advice urging the operator to stop driving the vehicle. If the signs of disease onset or drowsiness are below the first threshold (step S5: No), the CPU 11 determines that the operator shows no signs of disease onset or drowsiness and ends the processing of Fig. 5.

In step S7, if the signs of disease onset or drowsiness are at or above a second threshold (step S7: Yes), the CPU 11 stops the machine 2 (step S8) and then ends the processing of Fig. 5. If the signs are below the second threshold (step S7: No), the CPU 11 ends the processing of Fig. 5. The quantitative test results indicating these signs and the relationship between the first and second thresholds are shown in Fig. 8, described later.
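
The two-threshold decision of steps S4 to S8 reduces to a few lines; the following Python sketch uses hypothetical threshold values and helper names:

    T1 = 0.6  # first threshold: notify (hypothetical value)
    T2 = 0.9  # second threshold: stop the machine (hypothetical value)

    def notify_operator() -> None:
        print("Warning: please stop driving.")   # step S6

    def stop_machine() -> None:
        print("Machine 2 stopped.")              # step S8

    def handle_sign(score: float) -> None:
        # 'score' is the quantitative sign of disease onset or drowsiness.
        if score >= T1:
            notify_operator()
            if score >= T2:
                stop_machine()

    handle_sign(0.95)  # triggers both the warning and the stop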

Fig. 6 is a schematic diagram showing the details of the personal authentication operation based on voice. Fig. 6 shows a sound waveform area 71 and a voiceprint area 72. The sound waveform area 71 displays the waveform of the sound collected by the microphone 31; its horizontal axis shows time and its vertical axis shows the magnitude of the signal. The voiceprint area 72 displays the voiceprint at the same moments as the sound in the sound waveform area 71. This voiceprint is the result of computing the spectrum of the sound signal through a Hamming window. The horizontal axis of the voiceprint area 72 represents time and the vertical axis represents frequency, and the intensity at each frequency at each point in time is displayed by the brightness and color of each point.
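
A minimal sketch of such a Hamming-window spectrogram ("voiceprint") computation, using SciPy on a synthetic signal; the sampling rate and frame length are assumptions:

    import numpy as np
    from scipy.signal import spectrogram

    fs = 16000                                   # assumed sampling rate [Hz]
    t = np.arange(fs) / fs                       # one second of samples
    voice = np.sin(2 * np.pi * 220 * t)          # synthetic stand-in for speech

    # Short-time spectrum through a Hamming window, as in voiceprint area 72.
    freqs, times, Sxx = spectrogram(voice, fs=fs, window="hamming",
                                    nperseg=512, noverlap=256)
    # Sxx[i, j] is the intensity at frequency freqs[i] and time times[j],
    # the quantity rendered as brightness and color in the voiceprint area.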

Differences in each person's voice arise from differences in the vocal organs (oral cavity, nostrils, vocal cords), the lips, the tongue, and so on. In particular, the timbre of the voice is constrained by the volume and structure of the oral cavity and nostrils. The shape and size of the path from the vocal cords to the oral cavity and nostrils have characteristics that differ from person to person, and these characteristics appear in the voiceprint. A person's height is also closely related to the pitch of that person's voice: in general, the taller the person, the larger each part of the body, and the vocal cords are usually larger as well, so taller people tend to have lower voices. The pitch of the voice also appears in the voiceprint. Therefore, by calculating feature amounts of the voice through machine learning on the voiceprint and comparing these feature amounts with each person's feature points, the individual can be authenticated.

Fig. 7 is a schematic diagram showing the details of the personal authentication operation based on images. The three-dimensional image 73 is composed of the three-dimensional coordinates of each facial part (for example, the corners of the eyes, the corners of the mouth, the top and bottom of the nose, the bottom of the chin, and so on) and a three-dimensional wireframe connecting these parts. The three-dimensional image 73 is obtained by detecting the face region in the two-dimensional image captured by the camera 32 and converting this face region into the three-dimensional image 73.

For detection of the face region, principal component analysis, linear discriminant analysis, elastic bunch graph matching, hidden Markov models, neurally motivated dynamic link matching, and the like are used. Here, the facial feature amounts are, for example, the relative positional relationships of the parts constituting the face, obtained from the three-dimensional coordinates of each part. By machine learning the relative positional relationships of these parts and their changes over time, and geometrically comparing them with each person's feature points, the individual can be authenticated.
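
As a sketch of one way to turn 3D landmark coordinates into such relative-position features (the landmark names and the normalization are assumptions, not the patent's exact recipe):

    import numpy as np

    def face_features(landmarks: dict) -> np.ndarray:
        # Pairwise distances between facial parts, normalized by the
        # inter-eye distance so the features do not depend on face size.
        pts = list(landmarks.values())
        scale = float(np.linalg.norm(landmarks["eye_l"] - landmarks["eye_r"]))
        dists = [np.linalg.norm(pts[i] - pts[j])
                 for i in range(len(pts)) for j in range(i + 1, len(pts))]
        return np.array(dists) / scale

    landmarks = {                       # hypothetical 3-D coordinates [cm]
        "eye_l": np.array([-3.0, 0.0, 0.0]),
        "eye_r": np.array([3.0, 0.0, 0.0]),
        "nose_tip": np.array([0.0, -3.5, 2.0]),
        "mouth_l": np.array([-2.0, -6.0, 0.5]),
        "mouth_r": np.array([2.0, -6.0, 0.5]),
    }
    feat = face_features(landmarks)     # 10 scale-invariant distances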

Fig. 8 is a graph conceptually showing the test results of a healthy group and an affected group and their distributions. The healthy group consists of people with no disease onset and no drowsiness and is shown as a solid line. The affected group consists of people with disease onset or drowsiness and is shown as a dotted line. The horizontal axis of the graph shows the test result of the machine learning, and the vertical axis shows the distribution of the number of people for that test result.

The first threshold T1 is a threshold that distinguishes positive from negative based on the machine learning test result. Here, positive means that disease onset or drowsiness is judged to be present. Positives include true positives in the affected group and false positives in the healthy group. Negatives include true negatives in the healthy group and false negatives in the affected group. False positives and false negatives are erroneous judgments.

In the present embodiment, the first threshold T1 is the value at which the distribution of the healthy group intersects the distribution of the affected group. At this value, the sum of the number of false positives and the number of false negatives is minimized. The second threshold T2 is the threshold at which the authentication device 1 stops the machine 2. In the region beyond the second threshold T2, almost no healthy group is distributed; only the affected group is. By setting the second threshold T2 in this way, situations in which the authentication device 1 erroneously stops the machine 2 while a healthy person is operating it are eliminated.
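
As a worked illustration of choosing T1 to minimize the sum of false positives and false negatives (synthetic Gaussian score distributions; all parameters are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    healthy = rng.normal(0.3, 0.1, 10000)    # synthetic healthy-group scores
    affected = rng.normal(0.7, 0.1, 10000)   # synthetic affected-group scores

    candidates = np.linspace(0.0, 1.0, 201)
    # For each candidate threshold: false positives are healthy scores at or
    # above it, false negatives are affected scores below it.
    errors = [(healthy >= t).sum() + (affected < t).sum() for t in candidates]
    T1 = candidates[int(np.argmin(errors))]
    print(f"T1 ~ {T1:.2f}")  # near 0.5, where the two distributions intersect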

Figs. 9 and 10 are flowcharts of the personal authentication process. In the following personal authentication process, the voice processing of steps S10 to S14, the image processing of steps S20 to S25, and the heartbeat information (biological information) processing of steps S30 to S33 are performed in parallel.

《Voice processing》

First, the personal voice detection unit 611 acquires the sound collected by the microphone 31 (step S10). The personal voice detection unit 611 mutes the sound during periods when the operator's mouth is not moving in the image captured by the camera 32 (step S11). The processing that detects the movement of the operator's mouth from the image captured by the camera 32 is performed in step S23 of the image processing described later. Next, the personal voice detection unit 611 performs spectral signal processing on the sound signal and calculates the voiceprint (step S12).

Next, the personal voice detection unit 611 determines whether the voiceprint contains a human voice (step S13). Whether the voiceprint contains a human voice can be judged from, for example, the loudness of the sound. If the personal voice detection unit 611 detects a voice, the voice feature amount calculation unit 621 calculates a feature amount from the voiceprint (step S14). This suppresses the calculation of feature amounts from non-human noise.
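
A minimal sketch of such a loudness-based check for step S13; the energy threshold is an assumption:

    import numpy as np

    def contains_voice(Sxx: np.ndarray, energy_threshold: float = 1e-3) -> bool:
        # Sxx: voiceprint (frequency x time). Judge a human voice to be
        # present when the mean spectral energy exceeds a fixed threshold.
        return float(Sxx.mean()) > energy_threshold

Applied to the Sxx computed in the spectrogram sketch above, this check would gate whether the feature amount calculation of step S14 runs at all.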

The feature amounts calculated by the voice feature amount calculation unit 621 are, for example, the duration of each sound and the relative intensity at each frequency, and are composed of multiple dimensions.

《Image processing》

First, the image processing unit 612 acquires the image captured by the camera 32 (step S20). The camera 32 shoots at 60 frames per second, and the image processing unit 612 acquires the image of each captured frame every 1/60 second. The image processing unit 612 determines whether the acquired frame contains a face image (step S21) and detects the face image. Next, the image processing unit 612 (image specifying unit) calculates the three-dimensional relative coordinates of each part constituting the face (step S22), and detects the movement of the mouth by calculating the motion of the three-dimensional relative coordinates around the mouth from temporally consecutive images (step S23). The information on the mouth movement is output to the personal voice detection unit 611 and is used to associate the face image with the voice, as described later. The image processing unit 612 also calculates the three-dimensional relative coordinates of skin wrinkles and spots (step S24). This increases the number of dimensions of the feature amount, making it easier to identify the individual.

When the image processing unit 612 detects a face image, the image feature amount calculation unit 622 calculates the feature amount of the operator's face from the three-dimensional image of the face composed of the three-dimensional relative coordinates of each part (step S25). This suppresses the calculation of facial feature amounts from images in which no face is captured.

The face feature amount consists of a plurality of dimensions, such as the size of the eyes or nose relative to the face and the heights of the eyes, nose, or mouth relative to the face.
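
For illustration, such relative measures could be derived from the three-dimensional coordinates of step S22 along the lines of the sketch below; the landmark names are hypothetical, not defined by the disclosure.

import numpy as np

def face_feature_amounts(lm):
    """Sketch of a multi-dimensional face feature amount: part sizes
    and heights expressed relative to the face. `lm` maps hypothetical
    landmark names to 3-D relative coordinates (numpy arrays)."""
    face_h = np.linalg.norm(lm["chin"] - lm["forehead"])
    eye_w = np.linalg.norm(lm["eye_outer"] - lm["eye_inner"])
    nose_h = np.linalg.norm(lm["nose_tip"] - lm["nose_bridge"])
    mouth_h = lm["mouth_center"][1] - lm["chin"][1]
    return np.array([eye_w / face_h, nose_h / face_h, mouth_h / face_h])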

《Action Analysis Processing》

First, the action analysis unit 613 acquires the heartbeat data sensed by the heartbeat sensor 33 (step S30). The action analysis unit 613 determines whether a human heartbeat could be measured, based on whether the acquired heartbeat data contains a human heartbeat pattern (step S31). Next, the action analysis unit 613 calculates a heartbeat pattern over a predetermined period from the heartbeat data (step S32). This reduces the influence of noise superimposed on the heartbeat data of any single cycle.

When the action analysis unit 613 determines that the heartbeat data is the result of measuring a human heartbeat, the action feature amount calculation unit 623 calculates a feature amount from this heartbeat pattern. The feature amount of the heartbeat pattern consists of a plurality of dimensions, such as the period of the heartbeat pattern, its variation, and the waveform of the heartbeat pattern. The feature amount of the heartbeat pattern correlates with the operator's actions.
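
A sketch of how such a heartbeat feature amount might be assembled is shown below; the input format (beat-to-beat intervals plus one resampled waveform per cycle) is an assumption.

import numpy as np

def heartbeat_features(intervals_s, cycles):
    """Sketch of steps S32 onward: summarize a predetermined period of
    heartbeat data as (mean period, period variation, averaged cycle
    waveform). Averaging over many cycles suppresses the noise riding
    on any single cycle, as described for step S32."""
    intervals = np.asarray(intervals_s, dtype=float)
    template = np.asarray(cycles, dtype=float).mean(axis=0)  # averaged waveform
    return np.concatenate([[intervals.mean(), intervals.std()], template])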

《Personal Identification Processing》

The processing of the following steps S40 to S48 is the personal identification processing. This personal identification processing is repeated at a predetermined interval (for example, every 3 seconds). The personal authentication unit 63 repeats steps S40 to S48 for each of the faces detected by the image processing unit 612 within that interval.

In step S41, the personal authentication unit 63 determines whether a voice corresponds to this face image. If there is a voice corresponding to this face image (step S41 → Yes), the personal authentication unit 63 proceeds to step S42; if there is no corresponding voice (step S41 → No), it proceeds to step S45. The personal authentication unit 63 matches the subject's face image to the times at which the subject spoke by associating the mouth movement of the captured face image with the sound volume at each time. Alternatively, a directional microphone may be used to detect the position of the person producing sound at each time and associate it with the position of each captured person.
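
One way to realize the mouth-movement/volume association of step S41 is sketched below; both series are assumed to be sampled on a common clock, and the correlation threshold is an illustrative assumption.

import numpy as np

def voice_matches_face(mouth_motion, volume, threshold=0.5):
    """Sketch of step S41: attribute a voice to a face when the frames
    in which that face's mouth moves line up with the frames in which
    the sound volume rises."""
    m = np.asarray(mouth_motion, dtype=float)
    v = np.asarray(volume, dtype=float)
    if m.std() == 0.0 or v.std() == 0.0:    # no movement or no sound at all
        return False
    return np.corrcoef(m, v)[0, 1] > threshold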

In step S42, the personal authentication unit 63 determines whether there is heartbeat data corresponding to this face image. If so (step S42 → Yes), the personal authentication unit 63 generates a multi-dimensional feature amount combining the face feature amount (first personal feature amount), the voiceprint feature amount (second personal feature amount), and the heartbeat pattern feature amount (third personal feature amount), and identifies the individual by comparing it against the personal feature points (labels) of the corresponding dimensions (step S43). The face image and the heartbeat data are associated by matching, for example, the position where the heartbeat sensor is installed (for example, the driver's seat) with the position of each captured person (driver's seat or passenger seat). Compared with matching a feature amount calculated from a single sensor against the personal feature points (labels) for that sensor alone, the comparison takes place in a higher-dimensional space, so the accuracy of personal authentication can be improved. When this processing ends, the flow proceeds to step S48.

If this face image has no corresponding heartbeat data (step S42 → No), the personal authentication unit 63 generates a multi-dimensional feature amount combining the face feature amount and the voiceprint feature amount, and identifies the individual by comparing it against the personal feature points (labels) of the corresponding dimensions (step S44). Compared with matching a feature amount calculated from a single sensor against the personal feature points (labels) for that sensor alone, the comparison takes place in a higher-dimensional space, so the accuracy of personal authentication can be improved. Because the face image has no corresponding heartbeat data, the personal authentication unit 63 does not combine the heartbeat pattern feature amount; it can thus prevent authentication errors caused by invalid sensing data. When this processing ends, the flow proceeds to step S48.

In step S45, the personal authentication unit 63 determines whether there is heartbeat data corresponding to this face image. If so (step S45 → Yes), the personal authentication unit 63 generates a multi-dimensional feature amount combining the face feature amount with the heartbeat pattern feature amount, and identifies the individual by comparing it against the personal feature points (labels) of the corresponding dimensions (step S46).

Compared with matching a feature amount calculated from a single sensor against the personal feature points (labels) for that sensor alone, the comparison takes place in a higher-dimensional space, so the accuracy of personal authentication can be improved. Because the face image has no corresponding voice data, the personal authentication unit 63 does not combine the voiceprint feature amount; it can thus prevent authentication errors caused by invalid sensing data. When this processing ends, the flow proceeds to step S48.

If there is no heartbeat data corresponding to this face image (step S45 → No), the personal authentication unit 63 identifies the individual by comparing the face feature amount against the personal feature points (labels) of the corresponding dimensions (step S47). When this processing ends, the flow proceeds to step S48.

In step S48, the personal authentication unit 63 determines whether the processing has been repeated for all faces detected in this interval. Once all detected faces have been processed, the detected faces, voiceprints, and actions (heartbeat data) are associated with each person (step S49).

The personal authentication unit 63 does not combine the voiceprint feature amount when there is no voice corresponding to the face image, and does not combine the heartbeat pattern feature amount when there is no heartbeat data corresponding to the face image. In this way, when sensing data is invalid, the feature amount calculated from it is not combined, so the personal authentication unit 63 can prevent authentication errors caused by invalid sensing data. Alternatively, the personal authentication unit 63 may combine a silent sound signal when there is no voice corresponding to the face image, or combine the feature amount of an average heartbeat pattern when there is no heartbeat data corresponding to the face image; this likewise prevents authentication errors caused by invalid sensing data.
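
The combination rule of steps S41 to S47 might be expressed as in the following sketch; the argument names and neutral stand-ins are illustrative. Because the dimensionality of the combined vector depends on which sensors were valid, the embodiment described below keeps separately trained networks for each combination.

import numpy as np

def combined_feature(face_f, voice_f=None, heartbeat_f=None,
                     silent_voice=None, mean_heartbeat=None):
    """Sketch of steps S41-S47: sensing data that is invalid for this
    face is either left out of the combined vector or, optionally,
    replaced by a neutral stand-in (a silent sound signal, an average
    heartbeat pattern), as the paragraph above describes."""
    parts = [np.asarray(face_f, dtype=float)]
    if voice_f is not None:
        parts.append(np.asarray(voice_f, dtype=float))
    elif silent_voice is not None:
        parts.append(np.asarray(silent_voice, dtype=float))
    if heartbeat_f is not None:
        parts.append(np.asarray(heartbeat_f, dtype=float))
    elif mean_heartbeat is not None:
        parts.append(np.asarray(mean_heartbeat, dtype=float))
    return np.concatenate(parts)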

Fig. 11 is an explanatory diagram of the hierarchical artificial neural network 9 that performs personal authentication processing on the face, voiceprint, and heartbeat. In the hierarchical artificial neural network 9 that constitutes the personal authentication unit 63, a plurality of nodes 98 are organized into four layers: an input layer 91, a first intermediate layer 92, a second intermediate layer 93, and an output layer 94. The nodes 98 of adjacent layers are connected by edges 99. The nodes 98 of the input layer 91 receive the face feature amount, the voiceprint feature amount, and the heartbeat feature amount, respectively, and the output layer 94 outputs the personal authentication result. In this embodiment, separate networks are also prepared: a hierarchical artificial neural network trained on the combination of the face feature amount and the voiceprint feature amount, one trained on the combination of the face feature amount and the heartbeat feature amount, and one trained on the face feature amount alone.

Fig. 12 is an explanatory diagram showing an example of the indices and edge weights between layers. The input layer 91 includes nodes 98-1, 98-2, and so on. The node 98-x in the first intermediate layer 92 is connected through edges 99 to nodes 98-2, 98-4, and 98-7 in the input layer 91. The weight of the edge 99 connecting node 98-x to node 98-2 of the input layer 91 is 0.2; the weight of the edge 99 connecting node 98-x to node 98-4 is -0.5; and the weight of the edge 99 connecting node 98-x to node 98-7 is 0.1. The value 1.1 shown at node 98-x in the first intermediate layer 92 indicates that the bias value of node 98-x is 1.1.

Next, the operation of this hierarchical artificial neural network 9 will be described. When the face feature amount, the voiceprint feature amount, and the heartbeat feature amount are given to the nodes 98 of the input layer 91 of the hierarchical artificial neural network 9, each node 98 calculates its activity from these feature amounts. Once the activity of each node 98 of the input layer 91 has been calculated, each node 98 of the first intermediate layer 92 obtains the activity of every node 98 of the input layer 91 connected to it by an edge 99.

For example, the node 98-x in the first intermediate layer 92 obtains the activities of nodes 98-2, 98-4, and 98-7 of the input layer 91, together with the weights of the edges 99 connecting node 98-x to those nodes. Node 98-x computes the product of each connected input-layer node's activity and the corresponding edge weight, adds the node's bias value to the sum of these products, and passes the total through the activation function F to obtain the activity of node 98-x.
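
Using the weights and bias of Fig. 12, the activity of node 98-x could be computed as in the sketch below; the input activities and the choice of ReLU (one of the activation functions listed below) are assumptions for illustration.

import numpy as np

def node_activity(inputs, weights, bias, activation):
    """Per-node computation described above: sum the products of each
    connected node's activity and its edge weight, add the bias, and
    pass the total through the activation function F."""
    return activation(np.dot(inputs, weights) + bias)

relu = lambda x: np.maximum(x, 0.0)
# Node 98-x of Fig. 12: edges from nodes 98-2, 98-4, 98-7 with weights
# 0.2, -0.5, 0.1 and a bias of 1.1; the input activities are assumed.
a_x = node_activity(np.array([0.8, 0.3, 0.6]),
                    np.array([0.2, -0.5, 0.1]), 1.1, relu)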

The activation function F used in the activity calculation is, for example, a linear function, a sigmoid function, a normalized exponential (softmax) function, or a rectified linear unit (ReLU) function. Although the calculation example given here is for node 98-x of the first intermediate layer 92, the activities of the other nodes 98 of the first intermediate layer 92 are calculated in the same way.

Once the activities of the nodes 98 of the first intermediate layer 92 have been calculated, the activities of the nodes 98 of the second intermediate layer 93 are calculated, using the same method as for the first intermediate layer 92. Once the activities of the nodes 98 of the second intermediate layer 93 have been calculated, the activities of the nodes 98 of the output layer 94 are calculated, again using the same method.
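
Stacked up, this layer-by-layer propagation amounts to the short forward pass sketched below; the (weight matrix, bias vector) representation of each layer is an illustrative assumption.

import numpy as np

def forward(x, layers, activation):
    """Sketch of the propagation just described: each layer's
    activities feed the next, from input layer 91 through the
    intermediate layers 92 and 93 to output layer 94. `layers` is a
    list of (weight_matrix, bias_vector) pairs."""
    a = np.asarray(x, dtype=float)
    for weights, bias in layers:
        a = activation(a @ weights + bias)
    return a            # activities of the output layer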

The activity of each node 98 of the output layer 94 is output as the multi-dimensional feature amount of the personal authentication result. For example, when recognizing whether the operator detected by the sensors is one of the pre-registered persons A, B, or C or someone else, the output layer 94 is configured with nodes 98 that represent the dimensions of the feature amount space used for personal identification. During training, the activities of the nodes 98 of the output layer 94 are brought as close as possible to the feature amounts (coordinate values) that represent A, B, and C in that multi-dimensional space.

At inference time, the feature amount in the multi-dimensional space given by the output layer 94 is compared against the personal labels of A, B, and C in that space, and an authentication result, for example "A", is output. Beyond a simple authentication result, processing such as reliability calculation using the activities or the output of regression prediction values may also be performed. The use of an artificial neural network is not limited to the personal authentication unit 63; it may also be used in the sound feature amount calculation unit 621, the image feature amount calculation unit 622, the action feature amount calculation unit 623, and so on.
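
The comparison against the personal labels might look like the following sketch, where the registered labels are points in the output feature space and the distance cutoff is an assumed parameter.

import numpy as np

def match_label(feature, labels, cutoff=1.5):
    """Sketch of the inference step: match the output-layer feature
    vector against the registered feature points (labels) A, B, C in
    the same multi-dimensional space; report the nearest label within
    the cutoff, otherwise an unregistered person."""
    feature = np.asarray(feature, dtype=float)
    best, best_d = "unregistered", cutoff
    for name, point in labels.items():
        d = np.linalg.norm(feature - np.asarray(point, dtype=float))
        if d < best_d:
            best, best_d = name, d
    return best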

Fig. 13 is a flowchart of the emotion recognition processing. The emotion recognition processing shown in steps S60 to S70 is performed after the personal identification processing shown in Figs. 9 and 10; that is, it is repeated at a predetermined interval (for example, every 3 seconds). The environmental emotion analysis unit 674 analyzes people's emotional tendencies from the current air temperature and its changes (step S60). This analysis result is added to the emotions analyzed for every person.

The emotion estimation unit 68 repeats the processing of steps S61 to S69 for each person detected in the personal identification processing. First, the emotion estimation unit 68 determines whether voiceprint information for this person has been detected (step S62). If voiceprint information has been detected (step S62 → Yes), the emotion estimation unit 68 proceeds to step S63; if not (step S62 → No), it proceeds to step S66.

In step S63, the emotion estimation unit 68 determines whether heartbeat data for this person has been detected. If heartbeat data has been detected (step S63 → Yes), the emotion estimation unit 68 calculates a multi-dimensional emotion feature amount from the face feature amount (first emotion feature amount), the voiceprint feature amount (second emotion feature amount), the heartbeat pattern feature amount (third emotion feature amount), and the air temperature information, and estimates this person's emotion by comparing it against the emotion labels of the multi-dimensional space (step S64). If this person has been authenticated, the emotion estimation unit 68 uses the person's personal labels, which allows it to estimate the emotion with higher accuracy than general-purpose labels. Moreover, compared with calculating a multi-dimensional emotion feature amount from a single sensor, the amount of information is greater, so the accuracy of the estimated emotion can be improved. When this processing ends, the flow proceeds to step S69.

If no heartbeat data has been detected (step S63 → No), the emotion estimation unit 68 calculates a multi-dimensional emotion feature amount from the face feature amount, the voiceprint feature amount, and the air temperature information, and estimates this person's emotion by comparing it against the emotion labels of the multi-dimensional space (step S65). Because no heartbeat data was detected, the emotion estimation unit 68 does not use the heartbeat pattern feature amount, and can thus prevent errors caused by invalid sensing data. When this processing ends, the flow proceeds to step S69.

In step S66, the emotion estimation unit 68 determines whether heartbeat data for this person has been detected. If heartbeat data has been detected (step S66 → Yes), the emotion estimation unit 68 calculates a multi-dimensional emotion feature amount from the face feature amount, the heartbeat pattern feature amount, and the air temperature information, and estimates this person's emotion by comparing it against the emotion labels of the multi-dimensional space (step S67). The emotion estimation unit 68 does not use the voiceprint feature amount, and can thus prevent errors caused by invalid sensing data. When this processing ends, the flow proceeds to step S69.

If no heartbeat data for this person has been detected (step S66 → No), the emotion estimation unit 68 calculates a multi-dimensional emotion feature amount from the face feature amount and the air temperature information, and estimates this person's emotion by comparing it against the emotion labels of the multi-dimensional space (step S68). When this processing ends, the flow proceeds to step S69.

In step S69, the emotion estimation unit 68 determines whether the processing has been repeated for every person detected in the personal identification processing. Once all persons have been processed, the emotion estimation unit 68 estimates the overall emotion of the place from the detected emotions of the people present (step S70), ending the processing of Fig. 13. As with the personal authentication unit 63, the emotion estimation unit 68 may combine a silent sound signal when there is no voice corresponding to the face image, and may combine the feature amount of an average heartbeat pattern when there is no heartbeat data corresponding to the face image.

Fig. 14 is a conceptual diagram of the multi-dimensional emotion feature amount space 81 formed by three axes: fatigue, arousal, and comfort. The multi-dimensional emotion feature amount space 81 of this embodiment consists of three axes. The emotion estimation unit 68 calculates an emotion feature amount from the face feature amount, the voiceprint feature amount, the heartbeat pattern feature amount, or the air temperature information. The emotion feature amount indicates at which coordinates within this multi-dimensional emotion feature amount space 81 the person's emotion lies. When calculating this emotion feature amount, a hierarchical artificial neural network or the like is used.

Fig. 15 is a conceptual diagram of the two-axis emotion feature amount plane 82 of arousal and comfort. This emotion feature amount plane 82 is the section of the emotion feature amount space 81 of Fig. 14 taken at a predetermined value of fatigue. The vertical axis of the emotion feature amount plane 82 is arousal, with excitement upward and calm downward. The horizontal axis is comfort, with comfort to the right and discomfort to the left.

The first quadrant of the emotion feature amount plane 82 represents uplifted emotions and carries the labels excitement, surprise, joy, and happiness. The second quadrant represents agitated emotions and carries the labels alarm, fear, anger, tension, dissatisfaction, displeasure, and frustration. The third quadrant represents listless emotions and carries the labels sadness, worry, boredom, and melancholy. The fourth quadrant represents relaxed emotions and carries the labels drowsiness, calm, relaxation, relief, and contentment. A person's emotion can be estimated by comparing the emotion feature amount calculated by the emotion estimation unit 68 against these emotion labels.
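
Reading Fig. 15 off the signs of the two coordinates could be sketched as follows; the label sets mirror the quadrants described above.

def quadrant_emotions(comfort, arousal):
    """Sketch of matching an emotion feature against Fig. 15: the signs
    of the comfort (x) and arousal (y) coordinates select a quadrant
    and its emotion labels."""
    if comfort >= 0 and arousal >= 0:      # 1st quadrant: uplifted
        return ["excitement", "surprise", "joy", "happiness"]
    if comfort < 0 and arousal >= 0:       # 2nd quadrant: agitated
        return ["alarm", "fear", "anger", "tension",
                "dissatisfaction", "displeasure", "frustration"]
    if comfort < 0:                        # 3rd quadrant: listless
        return ["sadness", "worry", "boredom", "melancholy"]
    return ["drowsiness", "calm",          # 4th quadrant: relaxed
            "relaxation", "relief", "contentment"]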

(Modifications)

The present invention is not limited to the embodiments described above; the embodiments may be modified, for example as in (a) to (j) below, without departing from the gist of the invention.

(a) In the present invention, the machine 2 is not limited to a vehicle; it may also be, without limitation, a medical machine, a home appliance, or an anti-theft machine that permits a door to be opened and closed.

(b) The action sensor of the present invention is not limited to a heartbeat sensor that senses the heartbeat; it may sense any of body weight, blood pressure, body fat percentage, amount of conversation, sleep time, pulse, skin temperature, respiration, body movement, acceleration, and so on. (c) The present invention may also perform personal authentication with a combination of a camera and a microphone, without an action sensor.

(d) The camera 32 of the present invention is not limited to a device that captures visible light; it may be a device that captures infrared or ultraviolet light, or an ultrasonic camera that images objects by ultrasound. (e) The camera 32 may also be a camera with a single optical system that captures two-dimensional images, and is not limited to a stereo camera. (f) The machine learning server 4 is not limited to a server installed in the cloud (a data center); it may be a machine learning device connected to the authentication device 1. (g) The artificial neural network is not limited to four layers; it may have three or fewer layers, or five or more. (h) The machine learning used in personal authentication or emotion recognition is not limited to hierarchical artificial neural networks; it may be decision tree learning, support vector machines, LSTM (long short-term memory), cluster analysis, association rule learning, genetic programming, Bayesian networks, reinforcement learning, representation learning, the k-nearest neighbor algorithm, RUSBoost, linear discriminant analysis, supervised ensemble methods such as boosting, bagging, and so on. (i) The dimensionality of the emotion feature amount space is not limited to three dimensions; it may be higher. (j) The microphone 31 is not limited to a directional microphone; the sound source position may be identified from a plurality of microphones.

Claims (6)

1. An emotion recognition device, comprising: an image feature amount calculation device that calculates a first emotion feature amount from image information of a subject captured by a camera; a sound feature amount calculation device that calculates a second emotion feature amount from sound information obtained by collecting, with a microphone, the sound emitted by the subject; and a recognition device that recognizes the emotion of the subject based on an emotion feature amount in which the first emotion feature amount and the second emotion feature amount are combined.

2. The emotion recognition device of claim 1, further comprising: an action feature amount calculation device that analyzes action information acquired from the subject by an action sensor and calculates a third emotion feature amount, wherein the recognition device recognizes the emotion of the subject based on an emotion feature amount in which the first through third emotion feature amounts are combined.

3. The emotion recognition device of claim 2, further comprising: an environmental feature amount calculation device that analyzes environmental information acquired from the subject's surroundings by an environmental sensor and calculates a fourth emotion feature amount, wherein the recognition device recognizes the emotion of the subject based on an emotion feature amount in which the first through fourth emotion feature amounts are combined.

4. The emotion recognition device of any one of claims 1 to 3, further comprising: an image specifying device that specifies, from the image information of the subject captured by the camera, the periods during which the subject's mouth is moving, wherein the sound feature amount calculation device calculates the second emotion feature amount for the subject from the sound information of the periods during which each subject's mouth is moving.

5. The emotion recognition device of any one of claims 1 to 3, wherein there are a plurality of subjects, and the emotion recognition device further comprises: an image/personal feature amount calculation device that calculates a first personal feature amount from the image information of each subject captured by the camera; and a sound/personal feature amount calculation device that calculates a second personal feature amount from the sound information recorded by the microphone, wherein the sound feature amount calculation device refers to the image information of each subject and the sound information to identify the sound information emitted by each subject.

6. An emotion recognition program that causes a computer to execute the steps of: capturing a human subject with a camera and recording image information while recording sound information emitted by the subject; calculating a first emotion feature amount from the image information; calculating a second emotion feature amount from the sound information; and recognizing the emotion of the subject based on an emotion feature amount in which the first emotion feature amount and the second emotion feature amount are combined.