JP5326843B2

JP5326843B2 - Emotion estimation device and emotion estimation method

Info

Publication number: JP5326843B2
Application number: JP2009139815A
Authority: JP
Inventors: 英治外塚; 実冨樫; 健大野
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2009-06-11
Filing date: 2009-06-11
Publication date: 2013-10-30
Anticipated expiration: 2029-06-11
Also published as: JP2010286627A

Description

The present invention relates to an emotion estimation device and an emotion estimation method.

An emotion detection device is known that obtains the amounts of change in the intensity, tempo, and intonation of an input voice signal and, based on those amounts of change, identifies the emotions of anger, sadness, and joy (Patent Document 1).

Patent Document 1: JP 2002-91482 A

However, the conventional emotion detection device holds the amounts of change that characterize a voice in a database in advance, in association with emotions, and uses that database to detect the emotion corresponding to the amount of change of the input voice as the emotion at that time. It is therefore difficult to detect emotions in a way that matches the voice quality of each individual.

The present invention therefore provides an emotion estimation device that can estimate emotions while accommodating voice quality that differs from individual to individual.

The present invention solves the above problem by holding in advance a first table in which emotion data representing an operator's emotion is associated with each vehicle event, holding in a second table the features of a voice in association with the emotion indicated by the emotion data, and estimating the emotion using the second table.

According to the present invention, a first table in which emotion data representing an operator's emotion is associated with each vehicle event is held in advance, the features of a voice are held in a second table in association with the emotion indicated by the emotion data, and the emotion is estimated using the second table. The features of the voice that the operator utters at the time of a vehicle event can thus be held in association with the emotion indicated by the emotion data for that event, and the emotion can be estimated from them. As a result, emotions can be estimated in a way suited to each individual.

FIG. 1 is a block diagram of a navigation device according to an embodiment of the invention.
FIG. 2 shows the correspondence between events and the steering angle, the accelerator opening, the brake depression amount, and the vehicle speed in the navigation device of FIG. 1.
FIG. 3 shows the correspondence between events and emotions in the navigation device of FIG. 1.
FIG. 4 shows the correspondence between emotions and acoustic feature quantities in the navigation device of FIG. 1.
FIG. 5 shows the correspondence between emotions and acoustic feature quantities in the navigation device of FIG. 1.
FIG. 6 shows the correspondence between emotions and the voice guidance to be output in the navigation device of FIG. 1.
FIG. 7 is a flowchart showing a control procedure of the navigation device of FIG. 1.
FIG. 8 is a flowchart showing a control procedure of the navigation device of FIG. 1.
FIG. 9 is a block diagram of a navigation device according to another embodiment of the invention.
FIG. 10 shows the correspondence between events and the camera in the navigation device of FIG. 9.
FIG. 11 shows the correspondence between events and emotions in the navigation device of FIG. 9.
FIG. 12 is a block diagram of a navigation device according to another embodiment of the invention.

Embodiments of the invention are described below with reference to the drawings.

First Embodiment

A navigation device 1 that includes the emotion estimation device of the present invention is described, taking the case where it is mounted on a vehicle as an example. FIG. 1 shows a block diagram of the navigation device 1.

The navigation device 1 shown in FIG. 1 includes: an emotion estimation unit 10 that estimates the emotion of the operator of the vehicle from the voice uttered by that operator; a guidance response generation unit 11 that generates a signal for outputting voice guidance to the operator according to the emotion estimated by the emotion estimation unit 10; a D/A converter 12 (digital/analog converter) that converts the digital signal transmitted from the guidance response generation unit 11 into an analog signal; and an amplifier 13 that amplifies the analog signal. The navigation device 1 also has: an event detection unit 101 that receives a vehicle signal transmitted from outside and detects a vehicle event; an event emotion database 102 (hereinafter, event emotion DB) that holds, as a database, vehicle events in association with emotion data representing emotions such as "relief", "anxiety", and "anger"; a microphone 103 that receives the voice uttered by the operator of the vehicle; an acoustic feature quantity extraction unit 104 that extracts the features of the voice input to the microphone; a voice emotion database 105 (hereinafter, voice emotion DB) that holds, as a database, voice features in association with emotions; an ID recognition unit 106 that recognizes the ID (identification) assigned to each operator; and an emotion control unit 107.

Next, the control performed by each component is described. FIG. 2 shows the correspondence between each event and the steering angle of the steering wheel 201, the opening of the accelerator 202, the depression amount of the brake 203, and the vehicle speed. FIG. 3 shows the correspondence between each event and an emotion, stored in advance in the event emotion DB 102. FIG. 4 shows the correspondence between each emotion and the acoustic feature quantity parameters stored in the voice emotion DB 105, in a state where part of the table has not yet been updated. FIG. 5 shows the same correspondence in the updated state. FIG. 6 shows the correspondence between emotions and the voice guidance to be output, stored in the guidance response generation unit 11.

The vehicle signal input to the event detection unit 101 is set according to the steering wheel 201, the accelerator 202, and the brake 203 of the vehicle, and the event detection unit 101 detects a vehicle event from that vehicle signal. Here, an event indicates, for example, the traveling state of the vehicle, its driving state, or a situation occurring in the environment outside the vehicle. Specifically, a CPU (Central Processing Unit, not shown) that controls the vehicle transmits the steering angle of the steering wheel 201, the accelerator opening of the accelerator 202, the depression amount of the brake 203, and the vehicle speed to the event detection unit 101 as the vehicle signal.

As shown in FIG. 2, the event detection unit 101 identifies a vehicle event from the steering angle of the steering wheel 201, the opening of the accelerator 202, the depression amount of the brake 203, and the vehicle speed contained in the vehicle signal. For example, when the steering angle of the steering wheel 201 is 45 degrees, the opening of the accelerator 202 corresponds to ON, the depression amount of the brake 203 corresponds to OFF, and the vehicle speed is 20 km/h, the event detection unit 101 detects that the vehicle is traveling through a sharp curve at low speed. When the steering angle is 5 degrees, the accelerator opening corresponds to ON, the brake depression amount corresponds to OFF, and the vehicle speed is 20 km/h, the event detection unit 101 detects that the vehicle is traveling through a gentle curve at low speed. When the steering angle is 5 degrees, the accelerator opening corresponds to ON, the brake depression amount corresponds to OFF, and the vehicle speed is 80 km/h, the event detection unit 101 detects that the vehicle is traveling through a gentle curve at high speed. Although FIG. 2 shows the steering angle and other values for three events, the events are not limited to these three. In addition, a vehicle event need not be extracted from all of the steering angle of the steering wheel 201, the opening of the accelerator 202, the depression amount of the brake 203, and the vehicle speed; an event can also be extracted from at least one of these elements.
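The event identification described above can be sketched as a simple rule-based classifier. This is a minimal illustration, not the patented implementation: the source gives only three example operating points from FIG. 2, so the threshold constants, the `VehicleSignal` type, and the event label strings below are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VehicleSignal:
    """Hypothetical container for the values carried by the vehicle signal."""
    steering_angle_deg: float
    accelerator_on: bool
    brake_on: bool
    speed_kmh: float


# Assumed thresholds; the source does not specify any boundary values.
SHARP_CURVE_DEG = 30.0
GENTLE_CURVE_DEG = 3.0
HIGH_SPEED_KMH = 60.0


def detect_event(sig: VehicleSignal) -> Optional[str]:
    """Return an event label, or None when no event is recognized."""
    # All three examples in FIG. 2 have the accelerator ON and the brake OFF.
    if not sig.accelerator_on or sig.brake_on:
        return None
    angle = abs(sig.steering_angle_deg)
    fast = sig.speed_kmh >= HIGH_SPEED_KMH
    if angle >= SHARP_CURVE_DEG and not fast:
        return "sharp curve, low speed"
    if GENTLE_CURVE_DEG <= angle < SHARP_CURVE_DEG:
        return "gentle curve, high speed" if fast else "gentle curve, low speed"
    return None
```

With these assumed thresholds, the three operating points quoted from FIG. 2 map to the three events named in the text, and any other combination yields no event.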

The event emotion DB 102 holds in advance, as a database, the table shown in FIG. 3 (hereinafter, event-emotion table), which gives the emotion data representing the emotion that corresponds to each vehicle event. That is, the emotion indicated by the emotion data in the event-emotion table is defined in advance for each event. For example, when the vehicle event is a sharp curve at low speed, the corresponding emotion is "anxiety"; when the vehicle event is a gentle curve at low speed, the corresponding emotion is "relief"; and when the vehicle event is a gentle curve at high speed, the corresponding emotion is "fear". The emotions "anxiety", "relief", and "fear" are set in advance according to the state of mind that an operator ordinarily experiences when driving in the vehicle state of the corresponding event. In other words, when traveling through a gentle curve at high speed, many operators do not drive in a relaxed state of mind but feel some fear. The emotion "fear" is therefore assigned to the event for a gentle curve at high speed. Likewise, when traveling through a sharp curve at low speed, visibility is poor for the operator, so many operators drive in an anxious state of mind. The emotion "anxiety" is therefore assigned to the event for a sharp curve at low speed. The event emotion DB 102 receives a signal containing the event from the event detection unit 101, extracts the emotion data corresponding to that event from the table shown in FIG. 3, and transmits it to the emotion control unit 107.
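As a rough illustration, the event-emotion table of FIG. 3 can be modeled as a fixed lookup from event to emotion. The three entries come from the text; the string labels, the names `EVENT_EMOTION_TABLE` and `emotion_for_event`, and the behavior for unknown events are assumptions made for this sketch.

```python
from typing import Optional

# Predefined event-emotion table (entries taken from the description of FIG. 3).
EVENT_EMOTION_TABLE = {
    "sharp curve, low speed": "anxiety",
    "gentle curve, low speed": "relief",
    "gentle curve, high speed": "fear",
}


def emotion_for_event(event: Optional[str]) -> Optional[str]:
    """Look up the emotion data for an event; None if there is no entry (assumed)."""
    if event is None:
        return None
    return EVENT_EMOTION_TABLE.get(event)
```

Because the table is fixed in advance, it acts as the source of emotion labels for the voice features collected while each event is occurring.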

The microphone 103 detects the voice uttered by the operator in a given vehicle situation and transmits a voice signal to the acoustic feature quantity extraction unit 104. The acoustic feature quantity extraction unit 104 extracts the features of the voice from the voice signal as numerical values. The features of the voice are derived from, for example, the tempo of phonemes per unit time, the intensity of the voice, and the intonation of the voice. The acoustic feature quantity extraction unit 104 expresses the phoneme pitch (Pitch), the phoneme speed (Speed), the voice intensity (Loudness), and the voice intonation (PitchSlope) as numerical values (parameters) and transmits them to the emotion control unit 107 as the respective acoustic feature quantities.

The ID recognition unit 106 identifies and recognizes the ID of an operator registered in advance. For example, an ID is assigned to each key used to start the vehicle, and when the operator uses a key, the ID recognition unit 106 identifies the ID assigned to that key.

The emotion control unit 107 associates the emotion corresponding to a specific vehicle event, indicated by the emotion data transmitted from the event emotion DB 102, with the feature quantities contained in the signal transmitted from the acoustic feature quantity extraction unit 104, and stores them in the voice emotion DB 105. The voice emotion DB 105 has a table (hereinafter, emotion-feature table) that gives the correspondence between each emotion assigned in the event-emotion table and those feature quantities. The voice emotion DB 105 has one emotion-feature table for each ID registered in advance.

As shown in FIGS. 4 and 5, the emotions defined in the event-emotion table are each assigned to the emotion-feature table, and acoustic feature quantities are stored for each emotion. The acoustic feature quantities correspond to the parameters used by the acoustic feature quantity extraction unit 104 and are stored parameter by parameter.

First, the event emotion DB 102 refers to the event-emotion table and extracts the emotion data corresponding to a specific vehicle event, and the acoustic feature quantity extraction unit 104 extracts features from the voice that the operator utters during that vehicle event and transmits them to the emotion control unit 107. The emotion control unit 107 stores the extracted emotion data and feature values in the emotion-feature table. In this way, the emotion control unit 107 accumulates feature quantities for each of the emotions assigned in advance to the events and stores them in the emotion-feature table.

While storing feature quantities in the emotion-feature table, the emotion control unit 107 also uses that table to estimate the operator's emotion from the feature quantities of the voice. When the feature quantities extracted by the acoustic feature quantity extraction unit 104 are input, the emotion control unit 107 extracts the emotion corresponding to those feature quantities from the emotion-feature table and takes it as the estimated emotion of the operator at the time the feature quantities were extracted. The emotion control unit 107 then transmits the estimated emotion as a signal to the guidance response generation unit 11.
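The lookup described above can be sketched as follows. The source does not say how an incoming feature vector is matched against the stored entries, so this sketch assumes a nearest-neighbor match by Euclidean distance over the four parameters; that matching rule, the `Features` tuple layout, and the function name `estimate_emotion` are illustrative assumptions, not part of the disclosure.

```python
import math
from typing import Dict, Optional, Tuple

# (Pitch, Speed, Loudness, PitchSlope); the tuple layout is an assumption.
Features = Tuple[float, float, float, float]


def estimate_emotion(table: Dict[str, Features],
                     features: Features) -> Optional[str]:
    """Return the emotion whose stored features are closest to the input.

    Nearest-neighbor matching is assumed here; the source only states that the
    emotion "corresponding to" the feature quantities is extracted.
    """
    if not table:
        return None  # nothing stored yet for this operator
    return min(table, key=lambda emotion: math.dist(table[emotion], features))
```

A usage example with made-up values: given one operator's table `{"anxiety": (120, 110, 105, 95), "relief": (100, 95, 90, 100)}`, an utterance with features `(118, 108, 104, 96)` would be matched to "anxiety" because it lies closest to that entry.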

The guidance response generation unit 11 generates a signal for outputting a voice to the operator through the speaker 204 according to the emotion estimated by the emotion control unit 107. As shown in FIG. 6, the guidance response generation unit 11 defines in advance, for each estimated emotion, the guidance text of the voice to be output and the associated acoustic feature quantities.

For example, when the estimated emotion is "normal", the guidance response generation unit 11 sets the guidance text to "Your driving has improved." and sets the acoustic feature quantity parameters "Pitch", "Speed", "Loudness", and "PitchSlope" to the reference value of 100, then transmits the signal to the D/A converter 12.

When the estimated emotion is "anxiety", the guidance response generation unit 11 sets the guidance text to "Let's take it slowly, without rushing.", sets the parameters "Pitch", "Speed", and "PitchSlope" to the reference value of 100 and "Loudness" to 110, then transmits the signal to the D/A converter 12. Because "Loudness" is set higher than for "normal", the guidance is easier to hear and the operator can drive calmly.

When the estimated emotion is "relief", the guidance response generation unit 11 sets the guidance text to "Driving is fun today.", sets "Pitch" and "Speed" to 110, "Loudness" to 90, and "PitchSlope" to 100, then transmits the signal to the D/A converter 12. Because "Pitch" and "Speed" are set higher and "Loudness" lower than for "normal", the guidance is output quietly and in a short time. The guidance becomes somewhat harder to hear, but because the operator is calm, the operator can continue to drive safely.

When the estimated emotion is "fear", the guidance response generation unit 11 sets the guidance text to "Let's take it slowly, without rushing.", sets "Pitch" and "Speed" to 90 and "Loudness" and "PitchSlope" to 110, then transmits the signal to the D/A converter 12. Because "Pitch" and "Speed" are set lower and "Loudness" and "PitchSlope" higher than for "normal", the guidance is output loudly and over a longer time. The guidance becomes easier to hear, and the operator can drive calmly.
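The emotion-to-guidance mapping of FIG. 6 can be written down as a small lookup table. The parameter values below are the ones stated in the text; the `Guidance` type, the function name, the English renderings of the guidance texts, and the fallback to "normal" for an unlisted emotion are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guidance:
    """Guidance text plus synthesis parameters (100 is the reference value)."""
    text: str
    pitch: int
    speed: int
    loudness: int
    pitch_slope: int


# Values as described for FIG. 6; texts are English translations.
GUIDANCE_TABLE = {
    "normal": Guidance("Your driving has improved.", 100, 100, 100, 100),
    "anxiety": Guidance("Let's take it slowly, without rushing.", 100, 100, 110, 100),
    "relief": Guidance("Driving is fun today.", 110, 110, 90, 100),
    "fear": Guidance("Let's take it slowly, without rushing.", 90, 90, 110, 110),
}


def guidance_for(emotion: str) -> Guidance:
    """Return the guidance for an emotion; falling back to "normal" is assumed."""
    return GUIDANCE_TABLE.get(emotion, GUIDANCE_TABLE["normal"])
```

Keeping the text and the four parameters in one record mirrors the description, where each estimated emotion selects both what is said and how it is spoken.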

Next, the control procedure of the navigation device 1 of this example is described with reference to FIGS. 7 and 8. FIG. 7 is a flowchart of the control procedure for updating the emotion-feature table stored in the voice emotion DB 105, and FIG. 8 is a flowchart of the control procedure for estimating an emotion using the emotion-feature table stored in the voice emotion DB 105 and outputting voice guidance to the operator.

Referring to FIG. 7, when control starts, the ID recognition unit 106 first recognizes the operator's ID in step S1. Then, in step S2, the emotion control unit 107 extracts from the voice emotion DB 105 the emotion-feature table to which that ID is assigned.

Next, it is determined whether a voice is being input to the microphone 103 (step S3), and when a voice is input, acoustic feature quantities are extracted from that voice (step S4). As an example, let the acoustic feature quantity parameters extracted in step S4 be P2 for "Pitch", S2 for "Speed", L2 for "Loudness", and PS2 for "PitchSlope". Then, in step S5, the event detection unit 101 detects the event occurring at the time the voice is input to the microphone 103. If no event is detected, the emotion is estimated from the feature quantities extracted in step S4 (step S10). The control procedure from step S10 onward is described later.

If, on the other hand, an event is detected in step S5 (here, the event "gentle curve, low speed" is assumed to have been detected), the emotion control unit 107 identifies the emotion "relief" corresponding to the event "gentle curve, low speed" from the event-emotion table (see FIG. 3) and identifies the emotion data representing that emotion (step S6). Next, the emotion control unit 107 refers to the emotion-feature table that is stored in the voice emotion DB 105 and was extracted in step S2, and checks whether feature quantities corresponding to the emotion indicated by the emotion data of step S6 are stored in the table (step S7). If, as shown in FIG. 4, no feature quantities corresponding to the emotion "relief" are stored, the emotion control unit 107 decides to update the table and stores the parameters extracted in step S4 in the emotion-feature table (step S8). The emotion-feature table of the voice emotion DB 105 is thereby updated.

If, on the other hand, feature quantities corresponding to the emotion "relief" are already stored in step S7, the emotion is estimated from the feature quantities extracted in step S4 (step S10). The control procedure from step S10 onward is described later.
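The branch structure of FIG. 7 (steps S5 to S10) can be sketched as a single function that either updates the operator's table or hands off to estimation. This is an illustrative reading of the flowchart text, not the actual implementation: the function name, the return labels, the table representation, and the handling of an event with no table entry are all assumptions.

```python
from typing import Dict, Optional, Tuple

# (Pitch, Speed, Loudness, PitchSlope); the tuple layout is an assumption.
Features = Tuple[float, float, float, float]

# Predefined event-emotion table (entries taken from the description of FIG. 3).
EVENT_EMOTION_TABLE = {
    "sharp curve, low speed": "anxiety",
    "gentle curve, low speed": "relief",
    "gentle curve, high speed": "fear",
}


def update_or_estimate(table: Dict[str, Features],
                       features: Features,
                       event: Optional[str]) -> str:
    """Return "updated" if the table was updated (S8), else "estimate" (go to S10)."""
    if event is None:                          # S5: no event detected
        return "estimate"
    emotion = EVENT_EMOTION_TABLE.get(event)   # S6: identify the emotion data
    if emotion is None:                        # assumed: unknown event, no update
        return "estimate"
    if emotion in table:                       # S7: features already stored
        return "estimate"
    table[emotion] = features                  # S8: store extracted parameters
    return "updated"
```

In this reading, the per-operator table is filled in one emotion at a time as the corresponding events are first encountered, and every other utterance is routed to the estimation procedure of FIG. 8.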

Next, with reference to FIG. 8, the control procedure from step S10 onward is described, in which the operator's emotion is estimated from the acoustic feature quantities extracted by the acoustic feature quantity extraction unit 104 and voice guidance is output.

After step S10, in step S11, the emotion control unit 107 refers to the emotion-feature table stored in the voice emotion DB 105. The emotion-feature table referred to is the table extracted in step S2. In the following description, the feature quantity parameters extracted by the acoustic feature quantity extraction unit 104 are P1 for "Pitch", S1 for "Speed", L1 for "Loudness", and PS1 for "PitchSlope".

In step S12, the emotion control unit 107 uses the emotion-feature table to identify the emotion corresponding to those parameters and takes it as the estimate. As shown in FIG. 5, the emotion corresponding to "Pitch" P1, "Speed" S1, "Loudness" L1, and "PitchSlope" PS1 is "anxiety", so the emotion control unit 107 estimates the emotion to be "anxiety".

Next, the emotion control unit 107 transmits a signal indicating the emotion "anxiety" to the guidance response generation unit 11, and the guidance response generation unit 11 generates voice guidance. That is, as shown in FIG. 6, the voice guidance corresponding to the emotion "anxiety" consists of the guidance text "Let's take it slowly, without rushing." with Pitch 100, Speed 100, Loudness 110, and PitchSlope 100.

The voice guidance generated in step S13 is then output from the speaker 204 (step S14), and control ends.

As described above, the present invention holds in advance an event-emotion table that defines the emotion the operator feels for each event, and holds an emotion-feature table that gives the correspondence between the feature quantities of the input voice and emotions. This makes it possible to identify an emotion from a vehicle event and to hold data indicating the feature quantities for each identified emotion. Voice feature quantities generally differ between individuals; for example, the voices that operators utter in a "fear" situation vary widely. Because this example defines the differences between emotions by vehicle events and holds the voice feature quantities for each emotion, it can hold, for each individual, the acoustic feature quantities for a specific emotion as a table, and can then use that table to estimate the operator's emotion. This example can thus estimate emotions in a way that matches each individual's voice quality. When, as in the conventional approach, human voices are generalized and the emotion is judged from voice feature quantities alone, individual differences may prevent the emotion from being judged correctly. In this example, by contrast, emotions are defined in advance by events, and the feature quantities corresponding to each emotion are held for each individual and stored as a database. Emotions can therefore be estimated in a way that better matches each individual's voice quality than before.

This example also defines emotions for events that occur in the vehicle, extracts feature quantities from the voice that the operator utters when an event occurs, and associates the emotions with the feature quantities, so a system suited to the operator's voice can be realized. Furthermore, because this example can build an emotion-feature table specific to the operator, it can perform control that matches the operator's emotion in response to vehicle events.

Because this example can perform control according to the estimated emotion, it can provide a safer in-vehicle HMI (Human Machine Interface), for example by not outputting information to the operator in a situation where the operator feels "anxiety".

In this example, when the feature quantities for an emotion are not yet stored in the emotion-feature table, those feature quantities are stored in the table. This example can thus store, emotion by emotion, the voice features that differ between emotions, and can estimate emotions accurately by using the table.

The present invention also sets the voice output to the operator according to the estimated emotion. As a result, even when the operator feels, for example, "anxiety" or "fear", the announced voice can calm the operator. In addition, because this example sets the message, intonation, speed, and so on of the announced voice according to the estimated emotion, the voice becomes easier to hear and can calm the operator even in a situation where the operator feels, for example, "anxiety" or "fear" and would be unlikely to notice an ordinary voice. This example can therefore provide a safe in-vehicle HMI.

In this example, vehicle events are detected from vehicle signals. This allows emotions to be defined in advance according to vehicle events.

This example also has an ID recognition unit 106, prepares an emotion-feature amount table for each registered ID, and extracts the emotion-feature amount table corresponding to the recognized ID from the voice emotion DB 105. Because the voice feature amounts can thus be stored in a table for each operator ID, the feature amounts corresponding to each emotion are held per individual and stored as a database. Emotions can then be estimated in a way that matches each individual's voice quality.
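The per-ID table lookup can be illustrated with a small sketch. The class `VoiceEmotionDB`, its method names, the operator IDs, and the pitch values are hypothetical stand-ins introduced here; the source states only that one emotion-feature amount table is prepared per registered ID and extracted according to the recognized ID.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

Features = Dict[str, float]
EmotionTable = DefaultDict[str, List[Features]]


class VoiceEmotionDB:
    """Hypothetical stand-in for the voice emotion DB 105: one table per operator ID."""

    def __init__(self) -> None:
        # operator ID -> emotion -> list of accumulated feature samples
        self._tables: DefaultDict[str, EmotionTable] = defaultdict(
            lambda: defaultdict(list)
        )

    def table_for(self, operator_id: str) -> EmotionTable:
        """Return the emotion-feature amount table for the recognized ID."""
        return self._tables[operator_id]

    def add_sample(self, operator_id: str, emotion: str, features: Features) -> None:
        """Accumulate one feature sample under the given operator and emotion."""
        self._tables[operator_id][emotion].append(dict(features))


db = VoiceEmotionDB()
db.add_sample("driver_A", "anxiety", {"Pitch": 210.0})  # illustrative values
db.add_sample("driver_B", "anxiety", {"Pitch": 150.0})
table_a = db.table_for("driver_A")
```

Keeping the tables separate per ID means that the same emotion can be associated with different feature values for different operators, which is the point of the per-individual database.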

In this example, it is also possible to accumulate a plurality of feature amount data items for each emotion in the voice emotion DB 105 and normalize the data, with the emotion control unit 107 estimating the emotion by referring to the normalized emotion-feature amount table. The normalization is described below.

First, the acoustic feature amount extraction unit 104 extracts the features of the operator's voice as numerical values, and the emotion control unit 107 accumulates the numerical data in the emotion-feature amount table for the emotion corresponding to a specific event. For example, the emotion control unit 107 accumulates the feature amount data of the voice uttered at the time of the event (sharp curve, low speed) in the emotion-feature amount table shown in FIG. 5, in association with the emotion "anxiety". The table already stores the data "Pitch" P1, "Speed" S1, "Loudness" L1, and "PitchSlope" PS1; the new data is stored in addition to the existing data.

Even when the operator feels the same emotion, the feature amounts of the uttered voice are never exactly identical. The feature amount data accumulated in the emotion-feature amount table therefore takes various values within a single emotion. On the other hand, the feature amounts of the operator's voice show a certain tendency for each emotion, so a distribution of feature amounts forms for each emotion. For a specific emotion (for example, "anxiety"), this example therefore normalizes each accumulated acoustic feature amount (Para) using its mean (Ave) and standard deviation (Dev), as (Para - Ave) / Dev. The other emotions are normalized in the same way.

As a result, a table holding normalized feature amounts for each emotion can be stored in the voice emotion DB, so a distinct normalized region can be assigned to each emotion. This lowers the probability of misrecognition when estimating emotions.
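The normalization (Para - Ave) / Dev can be sketched as below. Several points are assumptions made for illustration and are not stated in the source: the population standard deviation is used, a small floor guards against a zero deviation, and the emotion is chosen as the one whose distribution gives the smallest sum of absolute normalized values. All function names and sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Features = Dict[str, float]
Stats = Dict[str, Dict[str, Tuple[float, float]]]


def emotion_stats(samples: Dict[str, List[Features]]) -> Stats:
    """Compute (Ave, Dev) of every feature for every emotion."""
    stats: Stats = {}
    for emotion, rows in samples.items():
        stats[emotion] = {}
        for name in rows[0]:
            values = [row[name] for row in rows]
            stats[emotion][name] = (mean(values), pstdev(values))
    return stats


def normalize(para: float, ave: float, dev: float, eps: float = 1e-9) -> float:
    """Return (Para - Ave) / Dev, flooring Dev at eps (an assumption)."""
    return (para - ave) / max(dev, eps)


def estimate(features: Features, stats: Stats) -> str:
    """Pick the emotion whose distribution lies closest to the input (assumed rule)."""
    def distance(emotion: str) -> float:
        return sum(
            abs(normalize(features[name], *stats[emotion][name]))
            for name in features
        )
    return min(stats, key=distance)


# Illustrative accumulated data; the source gives no concrete numbers.
samples = {
    "anxiety": [
        {"Pitch": 200.0, "Speed": 5.0},
        {"Pitch": 220.0, "Speed": 6.0},
        {"Pitch": 240.0, "Speed": 7.0},
    ],
    "normal": [
        {"Pitch": 120.0, "Speed": 3.0},
        {"Pitch": 130.0, "Speed": 4.0},
        {"Pitch": 140.0, "Speed": 5.0},
    ],
}
stats = emotion_stats(samples)
estimated = estimate({"Pitch": 225.0, "Speed": 6.2}, stats)
```

Because each emotion is scaled by its own mean and deviation, an utterance is compared against how that operator typically sounds under each emotion rather than against absolute values.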

In this example, the voice conveyed to the operator is set according to the estimated emotion, but a video image may be set instead. For example, when the estimated emotion is "anxiety" or "fear", the image displayed by the car navigation system, which is in-vehicle equipment, may be set. Also, as shown in FIGS. 3 to 6, this example divides emotions into four categories, "normal", "anxiety", "relief", and "fear", but the number of categories need not be four; two or three categories, or four or more, may be used.

The event detection unit 101 of this example corresponds to the "event detection means" of the present invention, the event emotion DB 102 to the "event emotion data holding means", the microphone 103 to the "voice input means", the acoustic feature amount extraction unit 104 to the "acoustic feature extraction means", the voice emotion DB 105 to the "voice emotion storage means", the emotion control unit 107 to the "control means", and the guidance response generation unit 11 and the speaker 204 to the "voice output means". The emotion estimation unit 10 of this example corresponds to the "emotion estimation device", and the "emotion estimation device" may include the guidance response generation unit 11.

<<Second Embodiment>>
FIG. 9 is a block diagram of a navigation device including an emotion estimation device according to another embodiment of the invention. This example differs from the first embodiment described above in that it includes a camera 301 instead of the steering 201, the accelerator 202, and the brake 203. For the parts of the configuration that are the same as in the first embodiment, the earlier description is incorporated as appropriate.

Referring to FIG. 9, the camera 301 captures the environment outside the vehicle and comprises a front camera that captures the area ahead of the vehicle, a side camera that captures the view to the side of the vehicle, and a rear camera that captures the area behind the vehicle. The video signal of the camera 301 is input to the event detection unit 101, which detects vehicle events from the video signal.

Next, event detection by the event detection unit 101 and the correspondence between detected events and emotions are described with reference to FIGS. 10 and 11. FIG. 10 shows the correspondence between each event and the images of the front camera, the side camera, and the rear camera, and FIG. 11 shows the event-emotion table stored in the event emotion DB 102. The event detection unit 101 identifies the vehicle's situation as an event from, for example, the timing at which the image captured by each camera changes and the apparent distance of objects in the image. Specifically, when the front camera image shows "no object", the side camera image shows "object at short distance", and the rear camera image shows "object at short distance", reverse parking or the like is assumed and the event detection unit 101 detects a "parking operation" event. When the front, side, and rear camera images all show "no object", normal driving is assumed and the event detection unit 101 detects a "normal driving" event. When the front camera image shows "object at short distance (sudden image change)" while the side and rear camera images show "no object", it is assumed that, for example, another vehicle has suddenly cut in from the adjacent lane, and the event detection unit 101 detects a "cut-in" event. The event detection unit 101 then transmits a signal corresponding to each event to the event emotion DB 102.
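The three camera patterns described above can be expressed as a lookup keyed by the (front, side, rear) observations. This is only a sketch: how each camera image is classified into these labels is not covered here, and the constant names, the event label strings, and the function `detect_event` are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Per-camera observation labels, following the three states named in the text.
NONE = "no object"
NEAR = "object at short distance"
NEAR_SUDDEN = "object at short distance (sudden image change)"

# (front, side, rear) -> event, covering only the three examples given.
CAMERA_EVENT_RULES: Dict[Tuple[str, str, str], str] = {
    (NONE, NEAR, NEAR): "parking operation",
    (NONE, NONE, NONE): "normal driving",
    # another vehicle suddenly cutting in from the adjacent lane
    (NEAR_SUDDEN, NONE, NONE): "cut-in",
}


def detect_event(front: str, side: str, rear: str) -> Optional[str]:
    """Return the event for a camera pattern, or None when no rule matches."""
    return CAMERA_EVENT_RULES.get((front, side, rear))


event = detect_event(NONE, NEAR, NEAR)
```

Returning `None` for unlisted patterns is a design choice of this sketch; the source does not say how combinations outside the listed examples are handled.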

The event emotion DB 102 holds in advance, as a database, the table shown in FIG. 11, which contains emotion data representing the emotion corresponding to each vehicle event. The emotion indicated by each item of emotion data in the event-emotion table is defined in advance for each event. For example, when the vehicle event indicates a parking operation, the corresponding emotion is "anxiety"; when it indicates normal driving, the corresponding emotion is "relief"; and when it indicates a cut-in, the corresponding emotion is "fear". The event emotion DB 102 receives a signal containing the event from the event detection unit 101, extracts the emotion data corresponding to that event from the table shown in FIG. 11, and transmits it to the emotion control unit 107. The control performed by the emotion control unit 107 is the same as in the first embodiment, so its description is omitted.
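The event-to-emotion lookup, followed by associating the emotion with the voice features, can be sketched as follows. The three table entries follow the examples in the text; the function name `label_utterance`, the list-based table layout, and the feature values are hypothetical.

```python
from typing import Dict, List, Optional

Features = Dict[str, float]

# Event-emotion table with the three entries given as examples in the text.
EVENT_EMOTION_TABLE: Dict[str, str] = {
    "parking operation": "anxiety",
    "normal driving": "relief",
    "cut-in": "fear",  # another vehicle suddenly cutting in
}


def label_utterance(
    event: str,
    features: Features,
    emotion_feature_table: Dict[str, List[Features]],
) -> Optional[str]:
    """Look up the emotion predefined for `event` and file `features` under it.

    Returns the emotion, or None when the event has no predefined emotion.
    """
    emotion = EVENT_EMOTION_TABLE.get(event)
    if emotion is None:
        return None
    emotion_feature_table.setdefault(emotion, []).append(dict(features))
    return emotion


emotion_feature_table: Dict[str, List[Features]] = {}
# Illustrative feature values only.
labelled = label_utterance(
    "cut-in", {"Pitch": 260.0, "Loudness": 80.0}, emotion_feature_table
)
```

The first table supplies the label and the operator's own voice supplies the features, so the second table is built up without the operator ever stating an emotion explicitly.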

As described above, the emotion estimation device of this example detects events from camera video information, so it can estimate the operator's emotion in vehicle systems that use cameras, such as parking assistance or side view systems. For example, when the operator is driving while feeling "anxious", control such as withholding information from the operator can be performed, and this example can thus provide a safe in-vehicle HMI.

In this example, the camera 301 is used instead of the steering 201, the accelerator 202, and the brake 203 to detect events, but both may be provided. This makes it possible to detect, for example, a lane change event, so that more vehicle events can be detected.

<<Third Embodiment>>
FIG. 12 is a block diagram of a navigation device including an emotion estimation device according to another embodiment of the invention. This example differs from the first embodiment described above in that it includes a dialogue control unit 401. For the parts of the configuration that are the same as in the first embodiment, the earlier description is incorporated as appropriate.

As shown in FIG. 12, the dialogue control unit 401 recognizes the voice input to the microphone 103. When the operator repeats the same message many times, the dialogue control unit 401 detects the repeated voice signal. If the emotion estimation device of this example fails to estimate the operator's emotion accurately, the operator speaks again and again, and voice input to the microphone 103 is repeated. In this case, the dialogue control unit 401 recognizes the repeated voice signal, determines that the operator's emotion is "troubled" or "surprised", and transmits emotion data indicating that emotion to the emotion control unit 107. The emotion control unit 107 then outputs to the operator, for example via the guidance response generation unit 11, a voice containing a message such as "The system did not operate normally".
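Detecting a repeated message can be sketched with a short sliding window over recognized utterances. The source says only that the same message is repeated "many times"; the threshold of three, the text normalization, and the class name `RepeatDetector` are assumptions for illustration, and the sketch presumes the speech has already been recognized as text.

```python
from collections import deque
from typing import Deque, List


class RepeatDetector:
    """Flags when the last `threshold` recognized utterances are identical."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # assumed value; the source gives no number
        self._recent: Deque[str] = deque(maxlen=threshold)

    def observe(self, recognized_text: str) -> bool:
        """Record one utterance; return True when a repeat is detected."""
        # Case and surrounding whitespace are ignored (an assumption).
        self._recent.append(recognized_text.strip().lower())
        return (
            len(self._recent) == self.threshold
            and len(set(self._recent)) == 1
        )


detector = RepeatDetector(threshold=3)
results: List[bool] = [
    detector.observe(text) for text in ["go home", "go home", "Go home "]
]
after_change = detector.observe("play music")
```

When `observe` returns True, a caller could report an emotion such as "troubled" to the emotion control unit 107, as the paragraph above describes.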

As a result, this example can provide the operator with a comfortable environment even when the emotion estimation device cannot estimate the operator's emotion correctly.

Although this example detects that a voice input is repeated, it may instead detect the content of the utterance.

Description of Symbols
1 ... Navigation device
10 ... Emotion estimation unit
11 ... Guidance response generation unit
12 ... D/A converter
13 ... Amplifier
101 ... Event detection unit
102 ... Event emotion database
103 ... Microphone
104 ... Acoustic feature amount extraction unit
105 ... Voice emotion database
106 ... ID recognition unit
107 ... Emotion control unit
201 ... Steering
202 ... Accelerator
203 ... Brake
204 ... Speaker
301 ... Camera
401 ... Dialogue control unit

Claims

An emotion estimation device comprising:
event detection means for detecting a vehicle event;
event emotion data holding means for holding in advance a first table in which emotion data representing an operator's emotion is associated with each vehicle event;
voice input means for inputting the operator's voice;
acoustic feature extraction means for extracting features of the voice input to the voice input means;
voice emotion storage means for holding, in a second table, the voice features in association with the emotion indicated by the emotion data; and
control means for estimating the operator's emotion from the features of the voice input to the voice input means, using the second table,
wherein the voice input means inputs the voice at the time of a specific vehicle event,
the acoustic feature extraction means detects the features of the voice at the time of the specific event, and
the control means
extracts, from the first table, the emotion data corresponding to the specific vehicle event, and
holds, in the second table, the emotion indicated by the emotion data extracted from the first table in association with the voice features in the specific vehicle state.

The emotion estimation device according to claim 1, wherein,
when the correspondence between the emotion indicated by the emotion data extracted from the first table and the voice features at the time of the specific vehicle event is not stored in the second table, the control means stores that correspondence in the second table.

The emotion estimation device according to claim 1 or 2, wherein
the acoustic feature extraction means extracts the voice features as numerical values, and
the control means
accumulates, in the second table, the numerical values of the voice features corresponding to the emotion indicated by the emotion data,
normalizes the numerical values accumulated in the second table, and
estimates the emotion using the second table having the normalized numerical values.

The emotion estimation device according to any one of claims 1 to 3, further comprising
voice output means for outputting a voice inside the vehicle,
wherein the control means sets the voice output from the voice output means according to the estimated emotion.

The emotion estimation device according to any one of claims 1 to 3, further comprising
video output means for outputting a video image inside the vehicle,
wherein the control means sets the video image output from the video output means according to the estimated emotion.

The emotion estimation device according to any one of claims 1 to 4, wherein
the event detection means detects the vehicle event from a signal transmitted from at least one of a steering, an accelerator, a brake, or a camera.

The emotion estimation device according to any one of claims 1 to 6, further comprising
ID recognition means for recognizing an ID of the operator,
wherein the control means refers to the ID recognized by the ID recognition means and extracts the second table corresponding to the ID from the voice emotion storage means.

The emotion estimation device according to any one of claims 1 to 7, further comprising
a camera that captures the environment outside the vehicle,
wherein the event detection means detects the vehicle event from a video signal of the camera.

An emotion estimation method comprising:
a vehicle event detection step of detecting a vehicle event;
an emotion data extraction step of extracting, from a first table that associates vehicle events with emotion data indicating an operator's emotion, the emotion data corresponding to the vehicle event detected in the vehicle event detection step;
a voice input step in which the operator's voice is input;
an acoustic feature extraction step of extracting features of the voice;
a step of storing, in a second table, the emotion indicated by the emotion data extracted in the emotion data extraction step in association with the voice features; and
an estimation step of estimating the operator's emotion from the features of the voice input in the voice input step, using the second table,
wherein the voice input step inputs the voice at the time of a specific vehicle event,
the acoustic feature extraction step detects the features of the voice at the time of the specific event, and
the estimation step
extracts, from the first table, the emotion data corresponding to the specific vehicle event, and
holds, in the second table, the emotion indicated by the emotion data extracted from the first table in association with the voice features in the specific vehicle state.