JP6166234B2 - Robot control apparatus, robot control method, and robot control program

Publication number: JP6166234B2
Application number: JP2014162607A
Authority: JP
Inventors: 松元 崇裕; 山田 智広; 宮田 章裕; 青木 良輔; 瀬古 俊一; 渡部 智樹
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-08-08
Filing date: 2014-08-08
Publication date: 2017-07-19
Anticipated expiration: 2034-08-08
Also published as: JP2016036883A

Description

The present invention relates to a technique for controlling a robot.

It is known that people who watch a video together laugh more often and more intensely than people who watch alone, even when the same comedy video is shown, and that they perceive the video itself as funnier (see, for example, Non-Patent Document 1). Accordingly, if a robot co-views a video with a user and reacts in sympathy with the video content by expressing laughter, joy, sadness, or anger, emotions such as laughter and joy can be promoted, and emotions such as sadness and anger can be suppressed, compared with watching the video alone.

In research on communication between people and computer-generated (CG) characters, it has also been pointed out that having a CG character change its facial expression in an empathetic way gives the person an affiliation motive. An affiliation motive is defined as the desire to approach, cooperate with, and reciprocate the actions of another party, and people are thought to form an affiliation motive toward others who take an attitude similar to their own.

As a technique in which a robot provides information to a user during video viewing, Non-Patent Document 2, for example, discloses estimating the user's evaluation of viewed programs as a profile from the user's viewing log and the user's utterances during viewing, and having the robot use the profile to recommend another television program if the user appears bored during viewing.

A technique has also been disclosed in which the robot acts as an intermediary for social media: the robot uses social media comments about the program being viewed as its utterances in a dialogue with the user, and the robot also posts comments to social media (see, for example, Non-Patent Document 3).

A technique has also been disclosed that promotes the user's laughing behavior by detecting changes in the myoelectric potential of the chest when a person watching a video laughs, and having the robot laugh in response (see, for example, Non-Patent Document 4).

Non-Patent Document 1: 大森慈子 and one other, "Effects of the presence of others on the perceived funniness of a video and the expression of laughter", 2011, Research Bulletin of Jin-ai University (仁愛大学研究紀要), Faculty of Human Studies, No. 10, pp. 25-32
Non-Patent Document 2: 高間康史 and five others, "Human-robot communication based on information recommendation during TV viewing", 2007, Annual Conference of the Japanese Society for Artificial Intelligence 2007, 2D5-5
Non-Patent Document 3: 高橋達 and two others, "Social media mediating robot for increasing speaking opportunities of the elderly", October 2012, IEICE Technical Report, vol. 112, no. 233, CNR2012-9, pp. 21-26
Non-Patent Document 4: 福嶋政期 and three others, "Laughter amplifier: verification of the laughter amplification effect", 2010, Transactions of the Human Interface Society, pp. 199-207

However, the technique disclosed in Non-Patent Document 2 only recommends other programs based on the user's program-level evaluations, and cannot handle emotional expression during viewing.

Likewise, in Non-Patent Document 3, only the content of the robot's utterances is determined from social media; the emotional expression the robot should display is not addressed. In Non-Patent Document 3, the robot's speech behavior is determined solely from social media comment information. However, the content of social media comments is not necessarily close to the emotion the user receives from the video. When the emotion the user received from the video differs greatly from the impression the user receives from the robot's speech behavior determined from the social media comments, the robot's reaction becomes one the user cannot empathize with.

Non-Patent Document 4 is a technique that promotes only the user's laughing behavior. Laughter is just one element of the emotion of joy, and covers only a small part of the full range of emotional expression, which also includes excitement and sadness. In addition, because the only input to the robot is the user's laughter, the robot can perform only reactive control, in which it detects the user's reaction and then responds to what was detected. Consequently, when the user laughs so little that no change in myoelectric potential can be detected, the robot cannot perform proactive control, such as encouraging the user to laugh, and the conditions under which the viewing experience can be improved are limited.

The present invention has been made in view of these problems, and its object is to provide a robot control device, a robot control method, and a robot control program that cause a robot to react in sympathy, as if it empathized with the video the user is watching.

The robot control device of the present invention is a robot control device that causes a robot to behave as if it were watching a video together with a user. The device includes an emotional state determination unit that receives video impression information, which represents the emotions evoked in a person who watches the video, and user emotion information, which represents the emotions of the user who watched the video. With n pairs of emotion types set in advance (n is an integer of 1 or more), each pair consisting of two mutually related emotion types, the unit uses a conversion rule prepared in advance to generate, from the video impression information, a value indicating the magnitude of each emotion type, and uses the same conversion rule to generate, from the user emotion information, a value indicating the magnitude of each emotion type. For each pair, the unit (1) calculates, as the robot's value for one emotion type of the pair, the sum of the value for that emotion type generated from the video impression information multiplied by a predetermined weight α and the value for that emotion type generated from the user emotion information multiplied by a predetermined weight β, and (2) calculates, as the robot's value for the other emotion type of the pair, the sum of the value for that other emotion type generated from the video impression information multiplied by the weight α and the value for that other emotion type generated from the user emotion information multiplied by the weight β.

The robot control method of the present invention is a method performed by a robot control device that causes a robot to behave as if it were watching a video together with a user. The method receives video impression information, which represents the emotions evoked in a person who watches the video, and user emotion information, which represents the emotions of the user who watched the video. With n pairs of emotion types set in advance (n is an integer of 1 or more), each pair consisting of two mutually related emotion types, the method uses a conversion rule prepared in advance to generate, from the video impression information, a value indicating the magnitude of each emotion type, and uses the same conversion rule to generate, from the user emotion information, a value indicating the magnitude of each emotion type. For each pair, the method (1) calculates, as the robot's value for one emotion type of the pair, the sum of the value for that emotion type generated from the video impression information multiplied by a predetermined weight α and the value for that emotion type generated from the user emotion information multiplied by a predetermined weight β, and (2) calculates, as the robot's value for the other emotion type of the pair, the sum of the value for that other emotion type generated from the video impression information multiplied by the weight α and the value for that other emotion type generated from the user emotion information multiplied by the weight β.
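The weighted sum described above can be illustrated with a minimal Python sketch. The function name, the dictionary layout, the single pair ("arousal", "pleasure"), and the numeric inputs are assumptions made for illustration; the text only fixes the form of the calculation, namely the weight α times the video-derived value plus the weight β times the user-derived value for every emotion type in every pair.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical pair of mutually related emotion types (n = 1).
EMOTION_PAIRS: Tuple[Tuple[str, str], ...] = (("arousal", "pleasure"),)


def determine_robot_emotion(
    video_values: Dict[str, float],
    user_values: Dict[str, float],
    alpha: float,
    beta: float,
    pairs: Sequence[Tuple[str, str]] = EMOTION_PAIRS,
) -> Dict[str, float]:
    """Combine video-derived and user-derived values per emotion type."""
    robot_values: Dict[str, float] = {}
    for first, second in pairs:
        # Steps (1) and (2): the same weighted sum for both members of the pair.
        for kind in (first, second):
            robot_values[kind] = alpha * video_values[kind] + beta * user_values[kind]
    return robot_values


# Illustrative inputs only; none of these numbers come from the source.
robot_state = determine_robot_emotion(
    {"arousal": 0.8, "pleasure": 0.6},
    {"arousal": 0.2, "pleasure": 0.4},
    alpha=0.5,
    beta=0.5,
)
```

With equal weights the robot's value for each emotion type is the midpoint of the two inputs; raising α relative to β pulls the robot toward the impression of the video, and the reverse pulls it toward the user's observed emotion.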

In the present invention, the emotional state of the robot is determined using both the video impression information and the user emotion information, so the robot can be made to perform a sympathetic reaction that makes the user feel that empathy has been obtained.

FIG. 1 is an overall configuration diagram including the robot control device 1 of the present embodiment.
FIG. 2 is a diagram showing a configuration example of the emotion word dictionary 15.
FIG. 3 is a diagram showing a configuration example of the emotional state conversion rule base 18.
FIG. 4 is a diagram showing a configuration example of the voice expression database 19.
FIG. 5 is a diagram showing a configuration example of the body expression database 20.
FIG. 6 is a diagram showing a configuration example of the azimuth information database 60 held by the position information acquisition server 6.
FIG. 7 is a diagram showing the flow of video-related information and user information in the present embodiment.
FIG. 8 is a diagram in which the flow of robot gaze information is added to FIG. 7.
FIG. 9 is a diagram showing the operation flow of the video-related information collection unit 11.
FIG. 10 is a diagram showing the operation flow of the user information collection unit 12.
FIG. 11 is a diagram showing the operation flow of the video impression estimation unit 14.
FIG. 12 is a diagram showing the operation flow of the user emotion estimation unit 13.
FIG. 13 is a diagram showing the operation flow of the emotional state determination unit 16.
FIG. 14 is a diagram showing an example of mapping emotion elements onto two-dimensional coordinates.
FIG. 15 is a diagram showing the operation flow of the emotion expression generation unit 17.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows an overall configuration diagram including the robot control device of the present embodiment.

The robot control device 1 of the present embodiment acquires video, user moving-image information, and user voice information, determines the emotional state of the robot 5 based on that information, and controls the robot 5. The video is acquired from a video display device 2 that can both display and transmit video. The user moving-image information is acquired from a camera 3 that captures the user's facial expression and can transmit moving-image information. The user voice information is acquired from a microphone 4 that picks up the user's speech and can transmit voice information.

The robot 5 receives voice data and drive control commands, and can play back the voice data and drive its motors according to the drive control commands to produce bodily expressions. The position information acquisition server 6 holds, and can transmit, the azimuth angle and elevation angle of the user and of the video display device 2 as seen from the robot 5, and includes an azimuth information database 60.

[Configuration of the robot control device]
First, the configuration of the robot control device 1 of the present embodiment is described.

The robot control device 1 includes a video-related information collection unit 11, a user information collection unit 12, a user emotion estimation unit 13, a video impression estimation unit 14, an emotion word dictionary 15, an emotional state determination unit 16, an emotion expression generation unit 17, an emotional state conversion rule base 18, a voice expression database 19, and a body expression database 20. The robot control device 1 may be implemented by a computer that includes an arithmetic processing unit, a storage device, and the like, with the processing of each unit executed by a program. This program is stored in a storage device of the robot control device 1, and can be recorded on a recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or provided over a network. Each functional component of the robot control device 1 may also be implemented by a computer. Although FIG. 1 shows the robot control device 1 and the robot 5 separately, the robot control device 1 may be built into the robot 5.

The video-related information collection unit 11 receives video from the video display device 2 and transmits video-related information about that video to the video impression estimation unit 14. Here, video-related information is information that includes the moving-image information, audio information, and caption information of the video.

The user information collection unit 12 receives, as user moving-image information, the user's facial expression captured and transmitted by the camera 3. It also receives, as user voice information, the user's speech picked up by the microphone 4. It then transmits the received user moving-image information and user voice information to the user emotion estimation unit 13 as user information.

The video impression estimation unit 14 estimates, from the video-related information transmitted by the video-related information collection unit 11, video impression information, that is, the emotions evoked in a person who views that video-related information. The user emotion estimation unit 13 estimates, from the user information transmitted by the user information collection unit 12, user emotion information, that is, the emotions the user is feeling.

The emotional state determination unit 16 determines the robot emotional state, which is the emotional state of the robot 5, using two pieces of information: the video impression information estimated by the video impression estimation unit 14 and the user emotion information estimated by the user emotion estimation unit 13. The emotion expression generation unit 17 receives the robot emotional state determined by the emotional state determination unit 16 and, referring to data held by the robot control device 1, generates the voice data and drive control commands to be transmitted to the robot 5.

Next, the data held by the robot control device 1 is described.

FIG. 2 shows a configuration example of the emotion word dictionary 15. The emotion word dictionary 15 stores words that express emotions, such as "きれい" ("beautiful") and "凄い" ("amazing"), together with the emotion intensities of each word. Each word is associated with the intensities of the emotion elements into which it is decomposed, such as "excitement", "joy", "anger", and "sadness". Each intensity is represented, for example, by a value from 0.0 to 1.0. The emotion word dictionary 15 is used when the video impression estimation unit 14 estimates the video impression information.
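One way to picture the dictionary is as a mapping from each emotion word to a per-element intensity table, sketched below in Python. The intensity values, the choice of four emotion elements, and the all-zero default for unknown words are assumptions for illustration; the actual contents of FIG. 2 are not reproduced in the text.

```python
from typing import Dict, Tuple

# Emotion elements named in the text; the real dictionary may contain more.
EMOTION_ELEMENTS: Tuple[str, ...] = ("excitement", "joy", "anger", "sadness")

# Hypothetical entries: each intensity lies in the range 0.0 to 1.0.
EMOTION_WORD_DICTIONARY: Dict[str, Dict[str, float]] = {
    "きれい": {"excitement": 0.2, "joy": 0.7, "anger": 0.0, "sadness": 0.0},
    "凄い": {"excitement": 0.8, "joy": 0.5, "anger": 0.0, "sadness": 0.0},
}


def lookup_emotion_intensities(word: str) -> Dict[str, float]:
    """Return the intensity of each emotion element for a word.

    Words missing from the dictionary yield all zeros (an assumed policy).
    """
    entry = EMOTION_WORD_DICTIONARY.get(word)
    if entry is None:
        return {element: 0.0 for element in EMOTION_ELEMENTS}
    return dict(entry)  # copy so that callers cannot mutate the dictionary
```

Returning a copy keeps the stored dictionary read-only from the caller's point of view, which suits a lookup table shared between units.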

FIG. 3 shows a configuration example of the emotional state conversion rule base 18. The emotional state conversion rule base 18 stores an arousal value formula and a pleasure value formula for each emotion element of the emotion word dictionary 15. For the emotion element "excitement", for example, the arousal value is given by arousal value = n * 0.15 + 0.35 and the pleasure value by pleasure value = n * 0.10 + 0.10, where n is a variable representing the intensity of the emotion element. The emotional state conversion rule base 18 is used when the emotional state determination unit 16 determines the emotional state of the robot. The arousal value and the pleasure value are described in detail later.
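The two example formulas for "excitement" can be written as a small rule table, as in the following Python sketch. Only the coefficients for "excitement" come from the text; storing each formula as a (slope, offset) pair and the function name are assumptions, and rules for the other emotion elements are left out because the text does not give them.

```python
from typing import Dict, Tuple

# Each rule is (slope, offset) so that value = n * slope + offset.
# Only the "excitement" coefficients are taken from the text.
CONVERSION_RULES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "excitement": {"arousal": (0.15, 0.35), "pleasure": (0.10, 0.10)},
}


def convert_emotion_element(element: str, n: float) -> Tuple[float, float]:
    """Convert an emotion element of intensity n into (arousal, pleasure)."""
    rule = CONVERSION_RULES[element]
    arousal_slope, arousal_offset = rule["arousal"]
    pleasure_slope, pleasure_offset = rule["pleasure"]
    arousal = n * arousal_slope + arousal_offset
    pleasure = n * pleasure_slope + pleasure_offset
    return arousal, pleasure


# For intensities between 0.0 and 1.0, "excitement" maps to an arousal value
# between 0.35 and 0.50 and a pleasure value between 0.10 and 0.20.
arousal, pleasure = convert_emotion_element("excitement", 1.0)
```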

FIG. 4 shows a configuration example of the voice expression database 19. The voice expression database 19 stores, for each voice name, an arousal value, a pleasure value, and a voice file path. For example, the voice name "すごい" ("amazing") has an arousal value of 0.85 and a pleasure value of 0.80. The voice file path indicates the location of the voice data, such as the recording of "すごい". The voice data may reside, for example, inside the robot control device 1 or on an external server. The voice expression database 19 is used when the emotion expression generation unit 17 generates voice data.
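A record of this database might be modeled as follows. This is only a sketch: the class name, the lookup helper, and the file path are hypothetical, while the arousal and pleasure values for "すごい" are the ones quoted above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VoiceExpression:
    name: str        # voice name, e.g. the word that is spoken
    arousal: float   # arousal value associated with the voice
    pleasure: float  # pleasure value associated with the voice
    file_path: str   # location of the voice data (local path or server URL)


# The path below is a made-up placeholder, not a path from the source.
VOICE_EXPRESSIONS: Tuple[VoiceExpression, ...] = (
    VoiceExpression("すごい", 0.85, 0.80, "voices/sugoi.wav"),
)


def find_voice_by_name(name: str) -> Optional[VoiceExpression]:
    """Return the record with the given voice name, or None if absent."""
    for record in VOICE_EXPRESSIONS:
        if record.name == name:
            return record
    return None
```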

FIG. 5 shows a configuration example of the body expression database 20. The body expression database 20 stores, for each action name, an arousal value, a pleasure value, a gaze target, and a drive unit control function. The drive unit control function is an array whose entries each hold two items: the motor control locations with their values, and a sequence movement interval. The motor control locations and values specify which motor parts of the robot 5 to actuate and by how much (as angles). The sequence movement interval is a number representing the time until the motor control locations and values are changed to the next entry. The body expression database 20 is used when the emotion expression generation unit 17 generates drive control commands.
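The structure of a drive unit control function can be sketched as a sequence of steps, each pairing motor targets with an interval. In the Python sketch below, the class names, the action "raise_arms", the motor part names, the angles, and the use of seconds as the interval unit are all assumptions for illustration; the text does not state the unit or list concrete entries.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MotorStep:
    targets: Dict[str, float]  # motor part -> target angle in degrees
    interval: float            # time until the next step (assumed: seconds)


@dataclass(frozen=True)
class BodyExpression:
    name: str
    arousal: float
    pleasure: float
    gaze_target: str           # e.g. "user" or "video display device"
    steps: Tuple[MotorStep, ...]


def build_timeline(expression: BodyExpression) -> List[Tuple[float, Dict[str, float]]]:
    """Return (start_time, targets) pairs for each step of the expression."""
    timeline: List[Tuple[float, Dict[str, float]]] = []
    start_time = 0.0
    for step in expression.steps:
        timeline.append((start_time, step.targets))
        start_time += step.interval  # the next step begins after this interval
    return timeline


# Hypothetical entry; none of these values come from the source.
raise_arms = BodyExpression(
    name="raise_arms",
    arousal=0.8,
    pleasure=0.7,
    gaze_target="user",
    steps=(
        MotorStep({"left_arm": 90.0, "right_arm": 90.0}, 0.5),
        MotorStep({"left_arm": 120.0, "right_arm": 120.0}, 0.5),
        MotorStep({"left_arm": 0.0, "right_arm": 0.0}, 0.5),
    ),
)
timeline = build_timeline(raise_arms)
```

Keeping the interval next to the targets, as the database does, lets one entry describe an entire motion sequence without a separate timing table.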

[External data used by the robot control device]
Next, the data held by the external server used by the robot control device 1 is described. FIG. 6 shows a configuration example of the azimuth information database 60 held by the position information acquisition server 6. The azimuth information database 60 stores an azimuth angle and an elevation angle for each gaze target. A gaze target is the user or the video display device 2 located in the robot 5's line of sight. The azimuth angle and the elevation angle are the angles of the robot 5's line of sight.

The azimuth angle and elevation angle of each gaze target are updated as the user, the robot 5, and the video display device 2 move. A person may set them in the position information acquisition server 6 once the positions of the video display device 2 and the user have been fixed. Alternatively, the positions may be detected automatically and the angles updated successively.

The azimuth angle indicates the direction, in the plane parallel to the ground, in which each target lies as seen from the robot 5, with north taken as 0°. The elevation angle indicates the angle, in the plane perpendicular to the ground, at which each target lies as seen from the robot 5, with the horizontal taken as 0° and directly overhead as 90°.
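Where the angles are derived from positions rather than entered by hand, they could be computed as in the following Python sketch. The east/north/up coordinate frame centered on the robot and the clockwise direction of increasing azimuth (north toward east) are assumptions; the text fixes only that north is 0°, the horizontal is 0°, and directly overhead is 90°.

```python
import math
from typing import Tuple


def direction_angles(east: float, north: float, up: float) -> Tuple[float, float]:
    """Return (azimuth, elevation) in degrees for a target position.

    The position is given relative to the robot. Azimuth is 0 at north and,
    by assumption, increases clockwise toward east. Elevation is 0 on the
    horizontal and 90 directly overhead.
    """
    azimuth = math.degrees(math.atan2(east, north)) % 360.0
    horizontal_distance = math.hypot(east, north)
    elevation = math.degrees(math.atan2(up, horizontal_distance))
    return azimuth, elevation
```

For example, a target one meter due east at the robot's eye height has an azimuth of 90° and an elevation of 0° under these conventions.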

The azimuth information database 60 is used when the emotion expression generation unit 17 generates drive control commands.

[Flow of video-related information and user information]
Before the present embodiment is described in detail, its basic idea is explained with reference to FIG. 7, which shows the flow of the two main kinds of information: video-related information and user information. Video-related information is information about the scene being viewed that is needed to estimate the video impression, and includes data such as the audio and moving images of the viewed video and the captions it contains. Social comment information about the viewed video, such as comments on Twitter, may also be included in the video-related information.

User information is information that includes data such as the user's facial expression and the user's speech during video viewing. Biometric information, such as the user's posture and heart rate, may also be included in the user information.

The video impression estimation unit 14, which receives the video-related information, estimates as video impression information the emotions evoked in a typical person who views that video-related information. The user emotion estimation unit 13, which receives the user information, estimates from it the user emotion information, that is, the emotions the user is feeling.

The emotional state determination unit 16 takes the video impression information and the user emotion information as input and generates an emotional expression that causes the robot 5 to perform a sympathetic reaction that makes the user feel that empathy has been obtained. The emotional expressions performed by the robot 5 include bodily expressions such as arm and head movements, speech uttered by the robot 5, and the facial expression of the robot 5.

Because the emotional state determination unit 16 determines the emotional state of the robot 5 using both pieces of information, the robot 5 can be made to perform an appropriate emotional expression even when the user shows little emotional reaction, or when the emotional expressions that would be inferred for the robot 5 from the video impression information and from the user emotion information differ greatly.

In addition to the video impression information and the user emotion information, robot gaze information representing the gaze direction of the robot 5 may be used to generate the emotional expression the robot 5 performs. FIG. 8 shows the information flow of the present embodiment with robot gaze information added. By using the robot gaze information to set which of the video impression information and the user emotion information is weighted more heavily, the robot 5 can be made to perform a more natural, humanlike empathetic reaction.

Thus, the idea of the present embodiment is to determine an appropriate emotional state for the robot 5 using the video-related information and the user information. The robot control device 1 according to the present embodiment can therefore cause the robot 5 to perform a sympathetic reaction that makes the user, who is watching the video together with the robot, feel that empathy has been obtained.

[Operation of the robot control device]
Next, the operation of the robot control device 1 is described. The operation of each functional component of the robot control device 1 is described in turn below.

[Video-related information collection unit]
The operation of the video-related information collection unit 11 is described with reference to its operation flow in FIG. 9. The video-related information collection unit 11 starts operating when it receives video from the video display device 2. First, it determines whether the information received from the video display device 2 is video (step S110). If it is not video, the determination is repeated until video is received (No in step S110).

When video is received, the unit transmits the moving-image information, audio information, and caption information that make up the video to the video impression estimation unit 14 as video-related information (step S111). The video-related information collection unit 11 ends its operation once the video-related information has been transmitted to the video impression estimation unit 14.

[User information collection unit]
The operation of the user information collection unit 12 is described with reference to its operation flow in FIG. 10. The user information collection unit 12 starts operating when it receives either user moving-image information from the camera 3 or user voice information from the microphone 4. First, it determines whether the information received from the camera 3 and the microphone 4 is user moving-image information and user voice information (step S120). The determination is repeated until both the user moving-image information and the user voice information have been received (No in step S120).

Once both the user moving image information and the user voice information have been received, they are transmitted to the user emotion estimation unit 13 as user information (step S121). The user information collection unit 12 ends its operation once the user information has been transmitted to the user emotion estimation unit 13.

[Video Impression Estimation Unit]
FIG. 11 shows the operation flow of the video impression estimation unit 14. The video impression estimation unit 14 first determines whether the information received from the video-related information collection unit 11 is video-related information (step S140).

The video impression estimation unit 14 starts operating when it receives video-related information (Yes in step S140). First, it extracts a voice moving image emotion from the moving image information and audio information in the received video-related information (step S141). The voice moving image emotion consists of emotion element categories and their intensities.

The voice moving image emotion is extracted using, for example, the method of assigning emotion labels to video sections described in Reference 1 (Go Irie, Takashi Satou, Akira Kojima, Toshihiko Yamasaki and Kiyoharu Aizawa, "Affective Audio-Visual Words and Latent Topic Driving Model for Realizing Movie Affective Scene Classification", IEEE Transactions on Multimedia, Vol.12, No.6, pp.523-534, 2010). Reference 1 presents a technique that assigns emotion labels to video sections based on audio information. In other words, a video section is a section corresponding to an audio section. An audio section can be extracted easily by, for example, taking a section in which the amplitude of the user voice information is at or above a predetermined value. A video section may also be a section divided by topic. Topics may be extracted by analyzing the speech recognition result of the audio information, or from the result of morphological analysis of the subtitle information described later.

For each video section, the method described in Reference 1 estimates the most suitable label among labels representing eight types of emotion. The eight emotion labels are, for example, joy, acceptance, fear, surprise, sadness, disgust, anger, and anticipation. These emotion labels serve as the emotion element categories of the voice moving image emotion.

In the present embodiment, a plurality of scenes may be detected in a video section and a plurality of labels may be estimated, or the number of labels may be zero. Once emotion labels have been estimated for all scenes in the video section, the occurrences of each emotion label are counted, and the count of each label is used as the emotion intensity of the corresponding emotion element. The result is the voice moving image emotion.
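The counting step can be sketched as follows. This is a minimal illustration, not the implementation of Reference 1: the function name `count_label_intensities` and the input format (one estimated label per scene, already produced by some labeling method) are assumptions made for this example.

```python
from collections import Counter

# The eight emotion labels named in the text.
EMOTION_LABELS = [
    "joy", "acceptance", "fear", "surprise",
    "sadness", "disgust", "anger", "anticipation",
]


def count_label_intensities(scene_labels):
    """Use the number of occurrences of each label as its emotion intensity.

    scene_labels: list of labels estimated for the scenes of one video
    section (hypothetical input format). An empty list is allowed and
    yields an empty result, matching the "zero labels" case.
    """
    counts = Counter(scene_labels)
    return {label: counts[label] for label in EMOTION_LABELS if counts[label] > 0}


# Three scenes were labeled in this video section.
voice_moving_image_emotion = count_label_intensities(["joy", "joy", "surprise"])
print(voice_moving_image_emotion)
```

The same counting applies to the voice emotion of the user, with speech sections in place of scenes.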

Next, the video impression estimation unit 14 performs morphological analysis on the subtitle information in the video-related information (step S142). The morphological analysis is performed using a well-known method.

Each word obtained by morphological analysis of the subtitle information is looked up in the emotion word dictionary 15. A method of extracting emotions from Japanese text using an emotion word dictionary is described in, for example, Reference 2 (Sugawara (菅原久嗣) and two others, "Emotion extraction from Japanese text using emotion word dictionary", the 23rd Annual Conference of the Japanese Society for Artificial Intelligence, 2009). If a matching entry is found, the word is retained and the search continues. For example, suppose that the words obtained by morphological analysis of the subtitle information include 「凄い」 ("amazing") and 「きれい」 ("beautiful"). In that case, the video impression estimation unit 14 retains "excitement: 0.6", "joy: 0.2", and "anger: 0.1" as the emotion elements corresponding to 「凄い」 (see FIG. 2). Similarly, it retains "excitement: 0.3" and "joy: 0.8" as the emotion elements corresponding to 「きれい」. This search is repeated until all words in the morphological analysis result have been processed.

Then the video impression estimation unit 14 sums, for each emotion element, the emotion intensities over all words retained as matches, and extracts the pairs of emotion element and summed intensity as the subtitle emotion (step S144). In the example above, "excitement: 0.9 (0.6 + 0.3)", "joy: 1.0 (0.2 + 0.8)", and "anger: 0.1 (0.1 + 0)" are extracted as the subtitle emotion.
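The dictionary lookup and summation can be sketched as below. The dictionary excerpt contains only the two entries quoted in the text; the function name `subtitle_emotion` and the assumption that the subtitle has already been split into words by a morphological analyzer are introduced for this example.

```python
from collections import defaultdict

# Hypothetical excerpt of the emotion word dictionary 15, holding only the
# two entries quoted in the text.
EMOTION_WORD_DICT = {
    "凄い": {"excitement": 0.6, "joy": 0.2, "anger": 0.1},
    "きれい": {"excitement": 0.3, "joy": 0.8},
}


def subtitle_emotion(words, dictionary=EMOTION_WORD_DICT):
    """Sum the per-element intensities of every word found in the dictionary.

    words: output of a morphological analyzer (assumed to be available);
    words without a dictionary entry are skipped.
    """
    totals = defaultdict(float)
    for word in words:
        for element, intensity in dictionary.get(word, {}).items():
            totals[element] += intensity
    return dict(totals)


# "景色" has no dictionary entry and contributes nothing.
result = subtitle_emotion(["凄い", "景色", "きれい"])
for element, intensity in result.items():
    print(f"{element}: {intensity:.1f}")
```

With these inputs the sums agree with the worked example in the text: excitement 0.9, joy 1.0, anger 0.1 (up to floating-point rounding).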

Next, the video impression estimation unit 14 determines the video impression information from the voice moving image emotion and the subtitle emotion (step S145). For each emotion element, the intensities in the voice moving image emotion and the subtitle emotion are added together; if the sum is at or above the maximum value, for example 5, the intensity of that emotion element is set to 5. Once the values of all emotion elements have been calculated, the pairs of emotion element and calculated value are taken as the video impression information. The video impression estimation unit 14 transmits the determined video impression information to the emotional state determination unit 16 and ends its operation (step S146).
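A minimal sketch of this merge step follows, assuming both emotions are held as dictionaries from emotion element to intensity. The function name `combine_emotions` and the sample input values are hypothetical; the cap of 5 is the example maximum given in the text.

```python
def combine_emotions(first, second, max_intensity=5.0):
    """Add the intensities of two emotion dictionaries element by element.

    Elements present in only one input are treated as 0 in the other.
    Sums at or above max_intensity are capped at max_intensity.
    """
    combined = {}
    for element in set(first) | set(second):
        total = first.get(element, 0.0) + second.get(element, 0.0)
        combined[element] = min(total, max_intensity)
    return combined


# Hypothetical inputs: label counts from the audio/video analysis and
# summed dictionary intensities from the subtitles.
voice_moving_image = {"joy": 5, "surprise": 1}
subtitle = {"excitement": 0.9, "joy": 1.0, "anger": 0.1}

video_impression = combine_emotions(voice_moving_image, subtitle)
print(sorted(video_impression.items()))
```

Here joy sums to 6.0 and is capped at 5, while the other elements pass through unchanged. The same function would serve for merging the facial expression emotion and voice emotion of the user.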

[User Emotion Estimation Unit]
FIG. 12 shows the operation flow of the user emotion estimation unit 13. The user emotion estimation unit 13 first determines whether the information received from the user information collection unit 12 is user information (step S130).

The user emotion estimation unit 13 starts operating when it receives user information (Yes in step S130). It extracts the user's facial expression emotion from the user moving image information in the received user information (step S131). The facial expression emotion can be extracted using, for example, OKAO Vision, a product of OMRON Corporation.

OKAO Vision can measure seven emotion labels and their degrees for the user's facial expression in the user moving image information. The seven emotion labels are included in the eight emotion labels described above, and they represent the emotion element categories of the facial expression emotion. If the maximum possible degree is E_max, then for each emotion label the maximum degree measured within the measurement section is multiplied by 5/E_max, and all pairs of the emotion element matching each label and this scaled value are taken as the facial expression emotion.
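The rescaling to the 0 to 5 range can be sketched as below. This does not use or model the OKAO Vision API; the input format (a list of measured degrees per label over one measurement section), the function name, and the value E_max = 100 are assumptions for illustration only.

```python
def facial_expression_emotion(measurements, e_max, scale_max=5.0):
    """Scale the peak degree of each emotion label to the range 0..scale_max.

    measurements: dict mapping an emotion label to the degrees measured
    within one measurement section (hypothetical input format).
    e_max: the maximum value the degree can take (E_max in the text).
    """
    return {
        label: max(degrees) * scale_max / e_max
        for label, degrees in measurements.items()
        if degrees  # skip labels with no measurement in this section
    }


# Hypothetical measurements with E_max assumed to be 100.
measured = {"joy": [20, 80, 60], "surprise": [10, 30], "anger": []}
expression = facial_expression_emotion(measured, e_max=100)
print(expression)
```

With these inputs the peak joy degree 80 maps to 4.0 and the peak surprise degree 30 maps to 1.5, so the result is directly comparable with the label counts used for the voice emotion.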

Next, the user emotion estimation unit 13 extracts a voice emotion from the user voice information in the received user information (step S132). The voice emotion is extracted using, for example, the method of assigning emotion labels to human utterances described in Non-Patent Document 3. For each speech section, the method of Non-Patent Document 3 estimates the most suitable label among emotion labels representing five types of emotion, such as joy, sadness, and surprise. These emotion labels serve as the emotion element categories of the voice emotion.

In the present embodiment, a plurality of speech sections may be detected in the user voice information and a plurality of emotion labels may be estimated, or the number of emotion labels may be zero. Once emotion labels have been estimated for the speech sections of the user voice information, the emotion labels assigned to the speech sections are counted, and the count of each emotion label is used as the emotion intensity of the matching emotion element. The result is the voice emotion.

Then the user emotion estimation unit 13 calculates, for each emotion element, the sum of the intensities in the facial expression emotion extracted in step S131 and the voice emotion extracted in step S132. If the sum is at or above the maximum value, for example 5, the intensity of that emotion element is set to 5. Once the values of all emotion elements have been calculated, the pairs of emotion element and calculated value are taken as the user emotion information (step S133). The user emotion estimation unit 13 transmits the determined user emotion information to the emotional state determination unit 16 and ends its operation (step S134).

[Emotional State Determination Unit]
FIG. 13 shows the operation flow of the emotional state determination unit 16. The emotional state determination unit 16 determines whether the information received from the video impression estimation unit 14 is video impression information (step S160). If it is (Yes in step S160), the emotional state determination unit 16 determines whether the information received from the user emotion estimation unit 13 is user emotion information (step S161). The two determinations may be made in the opposite order, with the user emotion information checked first.

The emotional state determination unit 16 starts operating when it has received both the video impression information and the user emotion information (Yes in step S161). It determines the emotional state of the robot 5 using these two pieces of information.

The reason for using two pieces of information in the emotional state determination unit 16 is as follows. Video impression information is used because, if the emotion of the robot 5 were determined from the user emotion information alone, the robot 5 could not proactively act on the user, for example to make the user laugh more or become more excited, when the user shows little emotional reaction. Conversely, if the emotion of the robot 5 were determined from the video impression information alone, the reactions of the robot 5 would become something the user cannot empathize with whenever the user's emotion differs greatly from the impression estimated from the video impression information.

One example of such a mismatch is a video whose content is meant to provoke laughter but which the user does not find funny at all, so that the robot 5 keeps laughing regardless of the user's reaction. To avoid this kind of problem, the present embodiment determines the emotional state of the robot 5 using both the video impression information and the user emotion information.

For the robot 5 to express emotions in a human-like way that evokes empathy in the user, it must determine an appropriate emotional state according to the video and the user's situation, and then express emotions that convey that state. Here, an appropriate emotional state means that a value representing the intensity of each emotion element making up the emotion, such as joy, anger, sorrow, and pleasure, is appropriately determined.

For example, if the intensity of each emotion element lies in the range 0 to 1, an emotional state representing intense anger is joy = 0, anger = 1, sorrow = 0, pleasure = 0, and an emotional state representing slight joy is joy = 0.3, anger = 0, sorrow = 0, pleasure = 0. Once an appropriate emotional state has been determined, the robot 5 can express emotion through facial expressions, body movements, vocal expressions, and the like that correspond to that state.

One technique that uses the two pieces of information to mitigate or solve the above problems is the intermediate value utilization method, which is described next.

[Intermediate Value Utilization Method]
In the intermediate value utilization method, the video impression information and the user emotion information are mapped onto the same emotional state elements and the same value scale. The mapping can be implemented by preparing a database that describes the conversion correspondence. Mapping is performed treating two mutually related types of emotional state element as one set. For example, when the emotional state elements are the two sets arousal-non-arousal and pleasant-unpleasant, the conversion can be performed as shown in FIG. 14. Here, an emotional state element is a type of emotion of the robot 5. Thus, when the emotional state elements are arousal-non-arousal and pleasant-unpleasant, the emotion of the robot 5 is expressed by an arousal level (arousal value) and a pleasantness level (pleasant value).

The arousal-non-arousal and pleasant-unpleasant sets are based on Russell's circumplex model. Other possible pairs of axes representing emotion elements include delight-grief with rage-fear, and longing-hatred with vigilance-amazement.

That is, the emotional state determination unit 16 places the video impression information and the user emotion information in a two-dimensional space whose two axes are the emotional state elements used to determine the emotional state of the robot 5, and determines, as the emotional state of the robot 5, element values lying between the placed video impression information and the placed user emotion information.

Although the present embodiment is described using two sets of emotional state elements, the axes may be handled one set at a time. That is, the intermediate value utilization method may be applied to the single set (one axis) arousal-non-arousal. It may also be extended to four sets: joy-sadness, acceptance-disgust, fear-anger, and surprise-anticipation. In this way, the number of sets of two mutually related emotion types can be extended to n.

In FIG. 14, input 1 and input 2 are the two pieces of information. Input 1 shows an example in which the estimation result of the video impression information estimated by the video impression estimation unit 14 is a single label representing "anger". Since "anger" has an arousal value of 0.85 and a pleasant value of -0.5, the estimation result α(-0.5, 0.85) is plotted at those coordinates on the two axes.

Input 2 shows an example in which the estimation result of the user emotion information estimated by the user emotion estimation unit 13 consists of a plurality of emotion elements and their degrees, here "joy (n = 5)" and "excitement (n = 2)". When the estimation result consists of a plurality of emotion elements and their degrees, the degree n of each emotion element that matches a conversion rule is substituted into that rule's arousal value expression and pleasant value expression. The arousal values and the pleasant values obtained for the individual emotion elements are then summed separately to give the estimation result.

In this example, pleasant value = 2*0.05 + 0.10 + 5*0.08 + 0.35 = 0.95 and arousal value = 2*0.10 + 0.35 + 5*0.05 + 0.25 = 1.05. Since values of 1.0 or more are set to 1.0 in the present embodiment, the estimation result β(0.95, 1.0) is plotted at those coordinates on the two axes.
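The conversion for input 2 can be sketched as follows. The coefficients are read off the worked example for FIG. 14, on the assumption that the terms with n = 2 belong to "excitement" and the terms with n = 5 belong to "joy"; the rule table layout and the function names are hypothetical, and the lower bound of -1 follows the clipping rule the document gives for the conversion step.

```python
# Each rule is (slope, intercept) for an expression of the form n*slope + intercept.
# Coefficients inferred from the worked example (assumption, see lead-in).
CONVERSION_RULES = {
    "joy": {"pleasant": (0.08, 0.35), "arousal": (0.05, 0.25)},
    "excitement": {"pleasant": (0.05, 0.10), "arousal": (0.10, 0.35)},
}


def clip(value, low=-1.0, high=1.0):
    """Limit a value to the range [low, high]."""
    return max(low, min(high, value))


def to_pleasant_arousal(emotion, rules=CONVERSION_RULES):
    """Map {emotion element: degree n} to a clipped (pleasant, arousal) pair."""
    pleasant = 0.0
    arousal = 0.0
    for element, n in emotion.items():
        if n == 0 or element not in rules:
            continue  # only non-zero elements with a matching rule contribute
        p_slope, p_intercept = rules[element]["pleasant"]
        a_slope, a_intercept = rules[element]["arousal"]
        pleasant += n * p_slope + p_intercept
        arousal += n * a_slope + a_intercept
    return clip(pleasant), clip(arousal)


beta = to_pleasant_arousal({"joy": 5, "excitement": 2})
print(f"pleasant={beta[0]:.2f}, arousal={beta[1]:.2f}")
```

The unclipped sums are 0.95 and 1.05, so the arousal value is limited to 1.0, giving the point β(0.95, 1.0) from the text.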

In the intermediate value utilization method, the emotional state of the robot 5 is set to an intermediate value between the two pieces of information, using the emotional states obtained by mapping the video impression information and the user emotion information, as in the following equation.

$$
\begin{aligned}
\mathit{RobotArousal}_i &= (1 - \mathit{val}) \cdot \mathit{MovieArousal}_i + \mathit{val} \cdot \mathit{UserArousal}_i \\
\mathit{RobotPositive}_i &= (1 - \mathit{val}) \cdot \mathit{MoviePositive}_i + \mathit{val} \cdot \mathit{UserPositive}_i
\end{aligned}
\tag{1}
$$

Here RobotArousal_i and RobotPositive_i represent the values of the arousal-non-arousal and pleasant-unpleasant elements of the emotional state of the robot 5 at time i. The time i is, for example, a time corresponding to the audio section or video section described above. Equation (1) is an example for the case where the emotional state elements are the two sets arousal-non-arousal and pleasant-unpleasant.

The number of sets of emotional state elements may be n (n is an integer of 1 or more). For example, the delight-grief and rage-fear sets may be added to give four sets. In that case, a delight-grief level and a rage-fear level are added to the emotion of the robot 5, and equation (1) is expressed as four equations.

MovieArousal_i and MoviePositive_i represent the values of the arousal-non-arousal and pleasant-unpleasant elements of the video impression information. UserArousal_i and UserPositive_i represent the values of the arousal-non-arousal and pleasant-unpleasant elements of the user emotion information. The val value specifies how strongly the result should depend on the user emotion information as opposed to the video impression information, and is set to a value in the range 0 ≤ val ≤ 1.

With the intermediate value utilization method, even when the user shows little emotion, for example when UserArousal_i and UserPositive_i are close to 0, the robot 5 can proactively express emotion to the user using the values estimated from the video impression information. In addition, even when the user emotion information differs greatly from the impression estimated from the video impression information, setting the val value appropriately makes it possible to express emotion that takes the user's reaction into account.

For example, with a large val value, in a scene where the video provokes laughter but the user is not laughing at all, the robot 5 first shows a small laughing reaction. If the user then laughs in response to the robot 5, the robot 5 strengthens its laughing reaction in line with the changes in UserArousal_i and UserPositive_i.

In short, the emotional state determination unit 16 receives video impression information, representing the emotion evoked in a human who watches the video, and user emotion information, representing the emotion of the user who watched the video. Where n sets (n is an integer of 1 or more) of emotion types are set in advance, each set consisting of two mutually related emotion types, the unit generates a value indicating the magnitude of each emotion type from the video impression information using conversion rules prepared in advance, and generates a value indicating the magnitude of each emotion type from the user emotion information using the same conversion rules. Then, for each set, (1) it calculates, as the robot's value for one emotion type of the set, the sum of the value for that emotion type generated from the video impression information multiplied by a predetermined weight α (α = 1 - val) and the value for that emotion type generated from the user emotion information multiplied by a predetermined weight β (β = val), and (2) it calculates, as the robot's value for the other emotion type of the set, the sum of the value for that other emotion type generated from the video impression information multiplied by the weight α and the value for that other emotion type generated from the user emotion information multiplied by the weight β.
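The weighted sum summarized above can be sketched as follows, with weight 1 - val on the video impression values and weight val on the user emotion values. The function name `blend` and all numeric inputs are hypothetical; the scenario mirrors the large-val laughter example in the text.

```python
def blend(movie_value, user_value, val):
    """Weighted sum for one emotional state element.

    The video impression value gets weight (1 - val) and the user emotion
    value gets weight val, with 0 <= val <= 1.
    """
    if not 0.0 <= val <= 1.0:
        raise ValueError("val must be in the range 0..1")
    return (1.0 - val) * movie_value + val * user_value


VAL = 0.8  # hypothetical "large" val: depend mostly on the user

# Hypothetical funny scene: high arousal and pleasantness in the video.
movie_arousal, movie_positive = 0.8, 0.9

# Step 1: the user is not reacting at all, so the robot laughs only a little.
robot_before = (blend(movie_arousal, 0.0, VAL), blend(movie_positive, 0.0, VAL))

# Step 2: the user starts laughing, so the robot's reaction becomes stronger.
robot_after = (blend(movie_arousal, 0.7, VAL), blend(movie_positive, 0.8, VAL))

print(robot_before, robot_after)
```

With val = 0.8 the robot's values stay small while the user is passive and rise once the user reacts, which is the behavior the passage describes. Adding more sets of emotion types only means calling `blend` once per additional element.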

Next, a method that additionally uses the gaze direction of the robot 5 in the intermediate value utilization method is described.

[Method Using the Robot's Gaze Direction]
The video-related information and user information that the robot 5 uses to express emotions can be acquired regardless of which direction the robot 5 is facing. Humans, on the other hand, obtain much of their information about their surroundings through vision. When people interact, they communicate on the assumption that the other party is processing information visually.

Consequently, if the robot 5 expresses emotion based on information from a direction it is not looking in, the robot 5's information processing diverges from the model that people assume of an interaction partner, which hinders people from empathizing with the robot 5. One way to reduce this obstacle and produce a more natural, human-like empathic reaction is to use the robot's gaze direction to set which of the video-related information and the user information is weighted more heavily.

As an example, the method using the gaze direction of the robot 5 can be implemented by setting the val value, RobotArousal_i, and RobotPositive_i described for the intermediate value utilization method according to equation (2).

Here RobotView_i represents the gaze direction of the robot 5 at time i. Movie, User, and None are labels indicating that the gaze of the robot 5 is directed at the video display device 2, at the user, and in some other direction, respectively. In addition, val_see-movie, val_nosee-movie, val_nosee-user, and val_nosee-user are variables that determine the behavior of the robot 5. Each of these variables takes a value in the range 0 to 1 inclusive. To make the robot 5 react in a human-like way, these variables should be set to appropriate values. Even when the robot's gaze direction is used, the number of sets of emotional state elements can be increased as described above. In that case, the number of equations making up equation (2) increases with the number of sets of emotional state elements, in the same way as for equation (1).
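As a rough illustration of gaze-dependent weighting, the sketch below picks val from a lookup table keyed by the gaze label and then applies the same weighted sum as the intermediate value utilization method. This is not equation (2): the table, its values, the assumption that the robot leans toward whichever source it is looking at, and all names (`RobotView`, `HYPOTHETICAL_VAL_BY_VIEW`, `select_val`, `robot_state`) are invented for this example.

```python
from enum import Enum


class RobotView(Enum):
    """Gaze labels named in the text."""
    MOVIE = "Movie"
    USER = "User"
    NONE = "None"


# Hypothetical weights, not taken from the document: rely more on the video
# while looking at it, more on the user while looking at the user.
HYPOTHETICAL_VAL_BY_VIEW = {
    RobotView.MOVIE: 0.3,
    RobotView.USER: 0.8,
    RobotView.NONE: 0.5,
}


def select_val(view, table=HYPOTHETICAL_VAL_BY_VIEW):
    """Return the val value for the current gaze label (must lie in 0..1)."""
    val = table[view]
    if not 0.0 <= val <= 1.0:
        raise ValueError("val must be in the range 0..1")
    return val


def robot_state(view, movie, user):
    """Blend (arousal, pleasant) pairs with a gaze-dependent val."""
    val = select_val(view)
    return tuple((1.0 - val) * m + val * u for m, u in zip(movie, user))


state = robot_state(RobotView.MOVIE, movie=(1.0, 1.0), user=(0.0, 0.0))
print(state)
```

Keeping the weights in a table makes it easy to tune the robot's behavior per gaze label without touching the blending code.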

On receiving the video impression information and the user emotion information, the emotional state determination unit 16 first converts the video impression information into a pair of arousal value and pleasant value based on the intermediate value utilization method described above (step S162). To do so, it searches the conversion rules stored in the emotional state conversion rule base 18 for the emotion element entry that matches each emotion element of the video impression information, and substitutes the intensity of that emotion element for the variable n in the arousal value expression and the pleasant value expression of the matching rule.

The search and substitution are repeated for every emotion element of the video impression information whose intensity is not 0, and the results of all arousal value expressions and of all pleasant value expressions are summed separately to give the arousal value and the pleasant value. If a sum is -1 or less, the value is set to -1, and if a sum is 1 or more, the value is set to 1. The variable representing the arousal value converted from the video impression information is MovieArousal_i, and the variable representing the pleasant value is MoviePositive_i.

Next, the emotional state determination unit 16 converts the user emotion information into a pair of arousal value and pleasant value (step S163). The conversion method is the same as for the video impression information.

The conversion of user emotion information into a pair of arousal value and pleasant value is explained using a concrete example. Suppose the emotion elements and intensities n are "joy, n = 5" and "excitement, n = 4". In that case, the pleasant value expressions of the conversion rules stored in the emotional state conversion rule base 18 are "n*0.08 + 0.45" for "joy" and "n*0.10 + 0.10" for "excitement" (see FIG. 3). The pleasant value is therefore calculated as 5*0.08 + 0.45 + 4*0.10 + 0.10 = 1.35.

The arousal value expressions are "n*0.05+0.25" for "joy" and "n*0.15+0.35" for "excitement". The arousal value is therefore calculated as 5*0.05+0.25+4*0.15+0.35 = 1.45. The sums are a pleasantness value of 1.35 and an arousal value of 1.45; because each is 1 or more, both are set to 1.0. The variables therefore become UserArousal_i = 1.0 and UserPositive_i = 1.0.
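The conversion and clamping described above can be sketched as follows. This is only an illustration: the two rules are the ones quoted in the worked example, while the function names and the dictionary layout are assumptions and not part of the patent text (the full rule table is in FIG. 3).

```python
# Conversion rules quoted in the worked example (the full table is in FIG. 3).
# Each rule maps an intensity n to (arousal contribution, pleasantness contribution).
# The dictionary layout and all function names are hypothetical.
RULES = {
    "joy": (lambda n: n * 0.05 + 0.25, lambda n: n * 0.08 + 0.45),
    "excitement": (lambda n: n * 0.15 + 0.35, lambda n: n * 0.10 + 0.10),
}


def clamp(x, lo=-1.0, hi=1.0):
    """Limit a summed value to the range [-1, 1]."""
    return max(lo, min(hi, x))


def to_arousal_pleasantness(elements, rules=RULES):
    """Sum the rule outputs over all emotion elements with non-zero intensity."""
    arousal = 0.0
    pleasantness = 0.0
    for name, n in elements.items():
        if n == 0:
            continue  # elements with intensity 0 are skipped
        arousal_expr, pleasantness_expr = rules[name]
        arousal += arousal_expr(n)
        pleasantness += pleasantness_expr(n)
    return clamp(arousal), clamp(pleasantness)


user_arousal, user_positive = to_arousal_pleasantness({"joy": 5, "excitement": 4})
print(user_arousal, user_positive)  # 1.0 1.0
```

With "joy, n = 5" and "excitement, n = 4", the raw sums 1.45 and 1.35 both exceed 1 and are clamped, matching the values in the example.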

Assume here that the arousal value (MovieArousal_i) obtained from the video impression information is 0 and the pleasantness value (MoviePositive_i) is 0. The arousal value (RobotArousal_i) and the pleasantness value (RobotPositive_i) of the emotional state of the robot 5 can then be calculated by substituting the value of each variable into formula (1) above.

Let val be the value that specifies how strongly the result should depend on the user emotion information rather than on the video impression information, and assume val = 0.5. In this example, the emotional state of the robot 5 is generated as RobotArousal_i = (1-0.5)*0+0.5*1.0 = 0.5 and RobotPositive_i = (1-0.5)*0+0.5*1.0 = 0.5 (step S164).
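The calculation can be written as a one-line weighted blend. Formula (1) itself is given earlier in the description; the form used here is inferred from the worked example and from the relation α = (1 − β) in the claims, so it should be read as a sketch under that assumption. The function name `blend` is hypothetical.

```python
def blend(movie_value, user_value, val):
    """Weighted blend of a video-derived value and a user-derived value.

    val is the weight of the user emotion information and (1 - val) is the
    weight of the video impression information (assumed form of formula (1)).
    """
    if not 0.0 <= val <= 1.0:
        raise ValueError("val must lie in [0, 1]")
    return (1.0 - val) * movie_value + val * user_value


# Values from the example: MovieArousal_i = 0, UserArousal_i = 1.0, val = 0.5
robot_arousal = blend(0.0, 1.0, 0.5)
robot_positive = blend(0.0, 1.0, 0.5)
print(robot_arousal, robot_positive)  # 0.5 0.5
```

Setting val closer to 1 makes the robot follow the user's emotion more closely; setting it closer to 0 makes the robot follow the impression of the video.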

The emotional state determination unit 16 transmits RobotArousal_i and RobotPositive_i, which represent the generated emotional state of the robot 5, to the emotion expression generation unit 17 and then ends its operation (step S166). The process of step S165 shown in FIG. 11 is performed when robot line-of-sight information has been received.

When robot line-of-sight information is input, the emotional state determination unit 16 determines the emotional state of the robot 5 by increasing the weight of the video impression information (1-val) if the robot line-of-sight information indicates the direction of the video, and by increasing the weight of the user emotion information (val) if it indicates the direction of the user.

On receiving robot line-of-sight information representing the line-of-sight direction of the robot 5 from the emotion expression generation unit 17, the emotional state determination unit 16 sets a line-of-sight target variable (RobotView_i). If the line of sight is directed at the video display device 2, it sets RobotView_i = Movie; if it is directed at the user, it sets RobotView_i = User. If it is directed at neither the video display device 2 nor the user, it sets RobotView_i = None.

When the line-of-sight target variable has been set, the emotional state determination unit 16 generates RobotArousal_i and RobotPositive_i of the emotional state of the robot 5 using formula (2) above (step S164). Let MovieArousal_i and MoviePositive_i be the arousal value and pleasantness value obtained from the video impression information, and let UserArousal_i and UserPositive_i be those obtained from the user emotion information. The arousal value RobotArousal_i and the pleasantness value RobotPositive_i of the emotional state of the robot 5 are then obtained from formula (2) with val_see-movie = 0.8, val_nosee-movie = 0.2, val_see-user = 0.8, and val_nosee-user = 0.2.
The values val_see-movie = 0.8, val_nosee-movie = 0.2, val_see-user = 0.8, and val_nosee-user = 0.2 are only examples.

Thus, when the line-of-sight target variable is set, the emotional state determination unit 16 selects the weights according to the robot line-of-sight information. When the robot line-of-sight information indicates the direction of the video, the weight of the video impression information and the weight of the user emotion information predetermined for that case are set as weight α (val_see-movie) and weight β (val_nosee-user), respectively. When it indicates the direction of the user, the weights predetermined for that case are set as weight α (val_nosee-movie) and weight β (val_see-user), respectively. When it indicates any other direction, the weights predetermined for that case are set as weight α (val_nosee-movie) and weight β (val_nosee-user), respectively.
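The weight selection can be sketched as a small lookup followed by the weighted sum α × (video value) + β × (user value) stated in the claims. Formula (2) is defined earlier in the description, so this is a sketch under the assumption that it has that form. The numeric weights are the example values given above; the function names and data layout are hypothetical.

```python
# Example weight values from the description; names with underscores are
# Python-friendly spellings of val_see-movie etc.
VAL_SEE_MOVIE = 0.8
VAL_NOSEE_MOVIE = 0.2
VAL_SEE_USER = 0.8
VAL_NOSEE_USER = 0.2


def select_weights(robot_view):
    """Return (alpha, beta) for RobotView_i in {"Movie", "User", "None"}."""
    if robot_view == "Movie":
        return VAL_SEE_MOVIE, VAL_NOSEE_USER
    if robot_view == "User":
        return VAL_NOSEE_MOVIE, VAL_SEE_USER
    return VAL_NOSEE_MOVIE, VAL_NOSEE_USER  # any other direction


def robot_emotion(movie, user, robot_view):
    """movie and user are (arousal, pleasantness) pairs; returns the robot's pair."""
    alpha, beta = select_weights(robot_view)
    return tuple(alpha * m + beta * u for m, u in zip(movie, user))


print(select_weights("Movie"))  # (0.8, 0.2)
print(select_weights("User"))   # (0.2, 0.8)
print(select_weights("None"))   # (0.2, 0.2)
```

Note that with these example values α + β is 1 when the robot looks at the video or at the user, but only 0.4 when it looks elsewhere, so the resulting emotional state is weaker in that case.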

[Emotion expression generation unit]
FIG. 15 shows the operation flow of the emotion expression generation unit 17, and its operation is described below. The emotion expression generation unit 17 starts operating when it receives the emotional state of the robot 5 from the emotional state determination unit 16 (Yes in step S170). Once it has started, it refers to the voice expression database 19 using the arousal value and the pleasantness value of the emotional state of the robot 5, and determines the audio data that the robot 5 will utter (step S171).

Here, "determining" means selecting, from all the audio data O_N, the audio data to be referenced that satisfies the following expression, where VoiceArousal_On is the arousal value and VoicePositive_On is the pleasantness value of a given audio data item O_n (see FIG. 4). The voice expression database 19 may hold the audio data itself, but in this embodiment it stores audio file paths that indicate where the audio data is stored.

Next, the emotion expression generation unit 17 refers to the body expression database 20 using the arousal value and the pleasantness value of the emotional state of the robot 5, and determines the drive control information for driving the robot 5 (step S172). Here, "determining" means selecting, from all the drive control information K_N, the drive control information to be referenced that satisfies the following expression, where MotionArousal_Kn is the arousal value and MotionPositive_Kn is the pleasantness value of a given drive control information item K_n (see FIG. 5).
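The selection expressions for the two databases are not reproduced in this text. Purely to illustrate what a lookup keyed on an arousal value and a pleasantness value can look like, the sketch below substitutes a nearest-neighbour rule (smallest Euclidean distance). That rule is an assumption and should not be read as the criterion used in the patent. The database rows, file paths, and function name are likewise hypothetical.

```python
import math

# Hypothetical rows; the real contents are shown in FIG. 4 and FIG. 5.
VOICE_DB = [
    {"path": "voice/wow.wav", "arousal": 0.8, "positive": 0.8},
    {"path": "voice/sigh.wav", "arousal": -0.6, "positive": -0.5},
    {"path": "voice/hmm.wav", "arousal": 0.0, "positive": 0.0},
]


def nearest_entry(db, arousal, positive):
    """Pick the row whose (arousal, positive) pair is closest to the robot's state.

    Nearest-neighbour selection is a stand-in for the expression in the patent.
    """
    return min(
        db,
        key=lambda e: math.hypot(e["arousal"] - arousal, e["positive"] - positive),
    )


entry = nearest_entry(VOICE_DB, 0.5, 0.5)
print(entry["path"])  # voice/wow.wav
```

The same function could be applied to a table of drive control information by storing the motion's arousal and pleasantness values under the same keys.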

Once the drive unit control information to be referenced has been determined, the emotion expression generation unit 17 looks up, as azimuth information, the azimuth angle and the elevation/depression angle of the line-of-sight target contained in that drive unit control information (step S173). If the line-of-sight target is, for example, the video display device 2, then 195.6° and -15.1° (see FIG. 6) are substituted for the head azimuth angle dx and the head elevation/depression angle dy of the drive control function (step S174).

When the emotional state determination unit 16 uses robot line-of-sight information, the emotion expression generation unit 17 transmits the line-of-sight target contained in the drive unit control information to the emotional state determination unit 16 as the robot line-of-sight information (step S175). The drive control function, with angles substituted for the head azimuth angle dx and the head elevation/depression angle dy, and the audio file path are transmitted to the robot 5 as the drive control command and the audio data (step S176).

The transmission of the drive control command and the audio data is repeated at each interval set as the sequence movement interval of the drive control function. For example, when the drive unit control information in the first row of the body expression database 20 (FIG. 5) is referenced, the line of sight of the robot 5 stays on the video display device 2 while the robot 5 holds a right arm tilt angle of 0° and a left arm tilt angle of 0° for 20 seconds, and then changes to a right arm tilt angle of 30° and a left arm tilt angle of 30°.
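One way to picture the resulting command sequence is shown below. The actual format of the drive control function is defined in FIG. 5 and is not reproduced here, so the `DriveCommand` structure, its field names, and `build_sequence` are hypothetical; only the angles and the 20-second interval come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveCommand:
    """One step of a drive control sequence (hypothetical structure)."""
    head_azimuth_deg: float     # dx
    head_elevation_deg: float   # dy
    right_arm_tilt_deg: float
    left_arm_tilt_deg: float
    hold_seconds: float         # sequence movement interval


def build_sequence(dx, dy, arm_steps, interval_s):
    """Fix the head direction and step through the arm poses at a fixed interval."""
    return [DriveCommand(dx, dy, right, left, interval_s) for right, left in arm_steps]


# Line of sight on the video display device 2 (195.6 deg, -15.1 deg);
# arms at 0 deg for 20 seconds, then at 30 deg.
commands = build_sequence(195.6, -15.1, [(0.0, 0.0), (30.0, 30.0)], 20.0)
for command in commands:
    print(command)
```

Keeping the head angles identical across all steps reflects the point that the line of sight stays on the same target while the arm pose changes.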

The emotion expression generation unit 17 generates the content of the physical expression and the utterance expression of the robot 5 based on the emotional state of the robot 5. A physical expression is a motion of the robot 5 listed under a motion name stored in the body expression database 20 (FIG. 5). An utterance expression is a voice, such as "Wow", stored in the voice expression database 19.

The physical expression may also include a facial expression of the robot 5. In that case, the emotion expression generation unit 17 generates the content of the facial expression together with the physical expression and the utterance expression. The emotion expression generation unit 17 may generate the physical expression, facial expression, and utterance expression of the robot 5 dynamically based on the emotional state of the robot 5, or it may use physical, facial, and utterance expressions that are held in advance in association with emotional states. Although the description used two sets of emotional state elements, arousal/non-arousal and pleasant/unpleasant, to determine the emotional state of the robot 5, the number of sets of emotional state elements may be n, as noted above.

As described above, according to this embodiment, when the robot 5 views a video together with the user, the robot 5 expresses emotion in a more human-like way, so that the user can feel empathy with the robot 5 and enjoy viewing the video more fully.

1: Robot control device
11: Video-related information collection unit
12: User information collection unit
13: User emotion estimation unit
14: Video impression estimation unit
15: Emotion word dictionary
16: Emotional state determination unit
17: Emotion expression generation unit
18: Emotional state conversion rule base
19: Voice expression database
20: Body expression database
2: Video display device
3: Camera
4: Microphone
5: Robot
6: Position information acquisition server
60: Azimuth information database

Claims

A robot control device that causes a robot to perform an operation as if viewing a video together with a user, the robot control device comprising an emotional state determination unit that:
receives, as input, video impression information representing the emotion evoked in a human who views the video and user emotion information representing the emotion of the user who has viewed the video;
where n sets of emotion types (n being an integer of 1 or more) are set in advance, each set consisting of two mutually related emotion types, generates from the video impression information, using conversion rules prepared in advance, a value indicating the magnitude of each of the emotion types, and generates from the user emotion information, using the conversion rules, a value indicating the magnitude of each of the emotion types; and
for each set, (1) calculates, as the robot's value for one emotion type of the set, the sum of the value for that emotion type generated from the video impression information multiplied by a predetermined weight α and the value for that emotion type generated from the user emotion information multiplied by a predetermined weight β, and (2) calculates, as the robot's value for the other emotion type of the set, the sum of the value for the other emotion type generated from the video impression information multiplied by the weight α and the value for the other emotion type generated from the user emotion information multiplied by the weight β.

The robot control device according to claim 1, wherein, in the emotional state determination unit, the weight β lies in the range 0 ≤ β ≤ 1 and the weight α is α = (1 − β).

The robot control device according to claim 1, wherein robot line-of-sight information representing the line-of-sight direction of the robot is received as input, α > β when the robot line-of-sight information indicates the direction of the video, and β > α when the robot line-of-sight information indicates the direction of the user.

The robot control device according to claim 3, wherein the emotional state determination unit:
when the robot line-of-sight information indicates the direction of the video, sets the weight of the video impression information and the weight of the user emotion information that are predetermined for the case where the robot line-of-sight information indicates the direction of the video as the weight α and the weight β, respectively;
when the robot line-of-sight information indicates the direction of the user, sets the weight of the video impression information and the weight of the user emotion information that are predetermined for the case where the robot line-of-sight information indicates the direction of the user as the weight α and the weight β, respectively; and
when the robot line-of-sight information indicates another direction, sets the weight of the video impression information and the weight of the user emotion information that are predetermined for the case where the robot line-of-sight information indicates another direction as the weight α and the weight β, respectively.

A robot control method performed by a robot control device that causes a robot to perform an operation as if viewing a video together with a user, the method comprising:
receiving, as input, video impression information representing the emotion evoked in a human who views the video and user emotion information representing the emotion of the user who has viewed the video;
where n sets of emotion types (n being an integer of 1 or more) are set in advance, each set consisting of two mutually related emotion types, generating from the video impression information, using conversion rules prepared in advance, a value indicating the magnitude of each of the emotion types, and generating from the user emotion information, using the conversion rules, a value indicating the magnitude of each of the emotion types; and
for each set, (1) calculating, as the robot's value for one emotion type of the set, the sum of the value for that emotion type generated from the video impression information multiplied by a predetermined weight α and the value for that emotion type generated from the user emotion information multiplied by a predetermined weight β, and (2) calculating, as the robot's value for the other emotion type of the set, the sum of the value for the other emotion type generated from the video impression information multiplied by the weight α and the value for the other emotion type generated from the user emotion information multiplied by the weight β.

A robot control program for causing a computer to function as the robot control device according to any one of claims 1 to 4.