JP6604267B2

JP6604267B2 - Audio processing system and audio processing method

Info

Publication number: JP6604267B2
Application number: JP2016105257A
Authority: JP
Inventors: 聡彦渡部; 篤司池野; 純一伊藤
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2016-05-26
Filing date: 2016-05-26
Publication date: 2019-11-13
Anticipated expiration: 2036-05-26
Also published as: JP2017211539A

Description

The present invention relates to a voice processing system, and more particularly to a voice processing system for supporting communication among a plurality of vehicles.

When a large group travels by car, its members may be split across several vehicles. To communicate between vehicles, they could use a telephone (including voice chat applications) or text chat. Because a driver cannot use text chat while driving, this disclosure considers voice-based communication.

Patent Document 1 discloses an in-vehicle chat system for chatting among a plurality of vehicles. The system comprises three or more vehicles and distributes an utterance from one vehicle to the other vehicles. When competing voices arise from two or more other vehicles within a fixed time after distribution, only one of them is selected according to a selection criterion and distributed to each vehicle.

Patent Document 1: JP 2006-195577 A

Voice transmission has the problem that the speaker cannot always be identified, and the problem that utterances overlap between people, which makes turn-taking (deciding when to start speaking) difficult. Furthermore, with the method of Patent Document 1, only one voice is delivered when utterances occur simultaneously in multiple vehicles, so an utterance may fail to get through or the exchange may become unnatural.

In face-to-face conversation among many people, multimodal information other than voice is used to judge who is speaking and when to speak. Difficulties therefore arise when vehicle occupants try to communicate by voice alone.

In view of this situation, an object of the present invention is to provide a voice processing system capable of supporting communication among a plurality of vehicles.

A voice processing system according to one aspect of the present invention comprises:
speech recognition means for performing speech recognition on conversational speech in a first vehicle;
conversation content understanding means for determining the conversation content in the first vehicle based on the result of the speech recognition; and
output control means for generating an utterance that conveys the conversation content of the first vehicle and controlling the utterance so that it is output in a second vehicle.

Thus, the voice processing system according to this aspect does not transmit the conversation in a vehicle to another vehicle as it is. Instead, it determines the content of the conversation and conveys a summary of it. By conveying a summary of what is being discussed in another vehicle, the system can adequately support communication between vehicles while avoiding the difficulties of the prior art described above.

In this aspect, the system preferably further comprises transmission determination means for determining, based on the conversation content in the first vehicle, whether to inform the second vehicle of that content. When the transmission determination means determines that the second vehicle is to be informed of the conversation content in the first vehicle, the utterance generation means generates the utterance and the output control means transmits it.

With this configuration, notifying other vehicles of conversations that have little need to be shared can be suppressed, and notification can be limited to the conversations that matter.

In this aspect, the transmission determination means may determine that the second vehicle is to be informed of the conversation content in the first vehicle when the conversation in the first vehicle relates to a predetermined topic. The predetermined topics may be chosen freely. Examples include topics concerning upcoming plans, the current location, surrounding landmarks, the destination, and the departure point. When personal profiles of the users in each vehicle are available, it is also preferable to adopt topics related to attributes shared within the group.

In this aspect, it is also preferable that the transmission determination means takes the position information or vehicle control information of the second vehicle into account when determining whether to inform the second vehicle of the conversation content of the first vehicle.

For example, when the second vehicle is at a location known in advance to be hazardous for traffic safety or to require concentration on driving, or when driving operations are being performed frequently, a notification to the second vehicle could interfere with driving. In such cases it is preferable not to convey the conversation content of the first vehicle. The transmission determination means may also decide to notify the second vehicle of the conversation content of the first vehicle once such a situation has been resolved. In other words, it is also preferable to determine the timing of the notification in consideration of the position information or vehicle control information of the second vehicle.
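The deferral described above can be sketched as a small queue that holds summaries while the second vehicle's driver is judged to be busy and releases them once the situation has passed. This is only an illustration: the `TargetVehicleState` fields, the `BUSY_OPS_PER_MIN` threshold, and the `DeferredNotifier` class are hypothetical names that do not appear in the source.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class TargetVehicleState:
    # Hypothetical view of the second vehicle's position and control information.
    in_caution_zone: bool = False      # location known in advance to need concentration
    driving_ops_per_min: float = 0.0   # how often driving operations are being performed


BUSY_OPS_PER_MIN = 10.0  # assumed threshold; the source gives no numeric value


def driver_is_busy(state: TargetVehicleState) -> bool:
    """Return True when a notification could interfere with driving."""
    return state.in_caution_zone or state.driving_ops_per_min >= BUSY_OPS_PER_MIN


class DeferredNotifier:
    """Holds conversation summaries until the target vehicle can receive them."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def submit(self, summary: str, state: TargetVehicleState) -> List[str]:
        """Queue a summary and return whatever may be delivered right now."""
        self._pending.append(summary)
        return self.flush(state)

    def flush(self, state: TargetVehicleState) -> List[str]:
        """Release all queued summaries, in order, if the driver is not busy."""
        if driver_is_busy(state):
            return []
        released = list(self._pending)
        self._pending.clear()
        return released
```

A caller would invoke `flush` again whenever fresh position or vehicle control information arrives, so held summaries go out as soon as the busy condition clears.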

In this aspect, it is also preferable that the speech recognition means performs speech recognition on conversational speech in the second vehicle as well, that the conversation content understanding means determines the conversation content of the second vehicle based on the speech recognition result for that speech, and that the transmission determination means takes the state of the conversation in the second vehicle into account when determining whether to inform the second vehicle of the conversation content of the first vehicle.

For example, the amount of conversation that is notified may be increased or decreased according to how lively the conversation in the second vehicle is. When the conversation in the second vehicle is stagnant, more conversations may be notified. Conversely, when the conversation in the second vehicle is lively, only important conversations may be notified.

In this aspect, when it is judged from the vehicle control information of the first vehicle that the first vehicle has taken a danger avoidance action, the transmission determination means may determine whether to inform the second vehicle of the conversation content of the first vehicle according to a criterion different from the one used otherwise.

A danger avoidance action is an action the vehicle takes to avoid danger, such as sudden braking or sudden steering. If the system is configured so that topics concerning vehicle safety are preferentially notified to other vehicles when a danger avoidance action has been taken, the other vehicles can also become aware of the danger and take avoiding action with time to spare.
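One way to picture the switch of criteria is a function that detects sudden braking or steering and widens the set of transmittable topics to include safety-related ones. The sketch below is an assumption-laden illustration: the thresholds, the topic labels, and the idea of expressing "priority" as a wider topic set are all hypothetical and are not specified in the source.

```python
from typing import FrozenSet

# Hypothetical topic labels; the source only names the kinds of topics.
NORMAL_TOPICS: FrozenSet[str] = frozenset(
    {"schedule", "current_location", "landmark", "destination", "departure"}
)
SAFETY_TOPICS: FrozenSet[str] = frozenset({"road_hazard", "obstacle", "accident"})

# Assumed thresholds for "sudden" braking and steering.
HARD_BRAKE_MPS2 = 6.0
SUDDEN_STEER_DEG_PER_S = 180.0


def took_avoidance_action(decel_mps2: float, steering_rate_deg_per_s: float) -> bool:
    """Judge from vehicle control information whether avoidance occurred."""
    return (
        decel_mps2 >= HARD_BRAKE_MPS2
        or abs(steering_rate_deg_per_s) >= SUDDEN_STEER_DEG_PER_S
    )


def topics_to_transmit(avoidance_detected: bool) -> FrozenSet[str]:
    """Select the criterion: safety topics are added after an avoidance action."""
    if avoidance_detected:
        return NORMAL_TOPICS | SAFETY_TOPICS
    return NORMAL_TOPICS
```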

In the voice processing system according to this aspect, all of the functions may, for example, be implemented by one or more computers (not vehicle-mounted) capable of communicating with the first and second vehicles. Alternatively, some or all of the functions may be mounted on one or more vehicles. For example, part of the speech recognition processing may be executed in the vehicle, and the result may be sent to a server that executes the other functions.

The voice processing system according to this aspect may also be composed solely of in-vehicle computers. For example, an in-vehicle computer may analyze the conversation in its vehicle and notify the other in-vehicle computers of the conversation content. In this case, the function of the transmission determination means may be shared between the sending vehicle and the receiving vehicle. For example, the sending vehicle may transmit the conversation content if the conversation is on a predetermined topic, and the receiving vehicle may output the notified content after judging whether the timing is appropriate. Both the sending vehicle and the receiving vehicle may also judge whether the conversation should be conveyed.

The present invention can also be regarded as an utterance timing determination method that executes at least part of the above processing. The present invention can further be regarded as a computer program for causing a computer to execute this method, or as a non-transitory computer-readable storage medium storing this computer program. The above means and processes can be combined with one another to the extent possible to constitute the present invention.

According to the present invention, communication among a plurality of vehicles can be appropriately supported.

A diagram showing a configuration example of the voice processing system according to the first embodiment.
A flowchart showing the flow of processing in the first embodiment.
A flowchart showing the flow of the transmission determination process in the first embodiment.
A flowchart showing the flow of the transmission determination process in the second embodiment.
A diagram showing a configuration example of the voice processing system according to the third embodiment.
A flowchart showing the flow of the transmission determination process in the third embodiment.
A flowchart showing the flow of the transmission determination process in the fifth embodiment.
A diagram showing a configuration example of the voice processing system according to a modification.
A diagram showing a configuration example of the voice processing system according to another modification.

Exemplary embodiments of the present invention are described below with reference to the drawings. The following description is illustrative, and the present invention is not limited to the embodiments below.

(First Embodiment)
<System Configuration>
FIG. 1 shows the system configuration of the voice processing system 1 according to the first embodiment. The voice processing system 1 according to this embodiment includes an in-vehicle system 10 mounted on a vehicle and an agent system 20 built on a center server. Each vehicle that is part of the voice processing system 1 carries an in-vehicle system 10. FIG. 1 shows three vehicles A to C, but the voice processing system 1 may include any number of vehicles as long as there are two or more.

Vehicles A to C are registered in the agent system 20 as belonging to the same group. A typical case is a group of people riding in several vehicles toward the same destination. The conversation in each vehicle is sent from the in-vehicle system 10 to the agent system 20, and the agent system 20 grasps the content of the conversation taking place in each vehicle. The agent system 20 judges whether a conversation in one vehicle needs to be made known to the other vehicles in the group and, if so, performs control to inform the other vehicles of a summary of the conversation. In this way, users in the group can learn what is being discussed in the other vehicles.

[In-vehicle system]
The in-vehicle system 10 includes a computer comprising an arithmetic processor, a storage device, input devices such as a camera, microphone, buttons, and touch panel, output devices such as a speaker and display, and a communication device. The in-vehicle system 10 functions as a voice input unit 11, a voice output unit 12, and a control unit 13 when the arithmetic processor executes a program stored in the storage device. Some or all of these functional units may be implemented by dedicated logic circuits.

The voice input unit 11 acquires speech from one or more microphones or microphone arrays. The control unit 13 sends the speech acquired by the voice input unit 11 to the agent system 20 via a communication device (not shown). The speech data may be sent to the agent system 20 as it is, or the in-vehicle system 10 may first perform preprocessing such as noise removal, sound source separation, and speech feature extraction.

The voice output unit 12 synthesizes speech corresponding to the utterance content (text) sent from the agent system 20 and outputs it from the speaker. Any existing speech synthesis technique can be used, such as concatenative (waveform-concatenation) synthesis or formant synthesis.

The control unit 13 governs the overall control of the in-vehicle system 10. It controls, for example, the acquisition of speech from the voice input unit 11, the transmission of the acquired speech to the agent system 20, and the output of speech from the voice output unit 12 in accordance with utterance instructions received from the agent system 20.

The in-vehicle system 10 communicates wirelessly with the agent system 20 via a wireless communication device. The wireless communication device can use any existing wireless communication scheme, such as wireless LAN (IEEE 802.11 family), WiMAX (IEEE 802.16 family), or cellular communication such as LTE.

[Agent system]
The agent system 20 is composed of a server computer comprising an arithmetic processor, a storage device, an input device, an output device, and a communication device. The agent system 20 functions as a speech recognition unit 21, a conversation content understanding unit 22, a transmission determination unit 23, an utterance generation unit 24, and an utterance instruction unit 25 when the arithmetic processor executes a program stored in the storage device. Some or all of these functional units may be implemented by dedicated logic circuits.

The speech recognition unit 21 performs noise removal, sound source separation, and speech feature extraction on the speech data sent from the in-vehicle system 10, and converts the utterances into text by referring to a speech recognition dictionary that includes an acoustic model, a language model, and a pronunciation dictionary. The speech recognition unit 21 may use any existing speech recognition technique. The conversation is converted into text for each source vehicle, and preferably for each user. The speech recognition unit 21 sends the recognized user utterances to the conversation content understanding unit 22.

The conversation content understanding unit 22 works out the content of the conversation taking place in each vehicle from the text sent by the speech recognition unit 21.

For each utterance in a conversation, the conversation content understanding unit 22 refers to a vocabulary and intention understanding dictionary (not shown) stored in the storage unit and estimates the intention and topic of the utterance. Utterance intentions include, for example, raising a topic, making a proposal, agreeing or disagreeing with a proposal, and consolidating opinions. Utterance topics include, for example, the genre of the utterance, places, and things. Genres include, for example, food and drink, travel, music, and weather. Places under discussion include, for example, place names, landmarks, store names, and facility names. The dictionary contains the vocabulary used when raising a topic, proposing, asking, agreeing, disagreeing, and consolidating, the vocabulary about food and drink, travel, music, and weather used to identify the genre, and the vocabulary about place names, landmarks, store names, and facility names used to identify the place under discussion.

As a result of processing with these dictionaries, the conversation content understanding unit 22 can estimate, for each utterance, the speaker's intention (what the speaker wants to do) and the genre under discussion. For the text "Want to eat udon for lunch, don't you?", matching against the dictionary shows from the words "lunch", "udon", and "want to eat" that the genre is "lunch", and "udon" in particular, and from the sentence-final "ne?" (a tag seeking agreement) that the intention of the utterance is "proposal". Likewise, from the text "Sounds good. Shall we go?" the intention of the utterance can be judged to be "agreement".
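The dictionary matching described above can be illustrated with a toy keyword lookup. This is a minimal sketch under stated assumptions: the English keyword sets stand in for the vocabulary and intention understanding dictionary, the scoring by keyword count is an invented simplification, and the names `GENRE_VOCAB`, `INTENT_VOCAB`, and `analyze` are hypothetical. A real system would use proper morphological analysis rather than substring checks.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical, tiny stand-ins for the dictionary contents.
GENRE_VOCAB: Dict[str, Set[str]] = {
    "food": {"lunch", "udon", "eat"},
    "weather": {"rain", "sunny", "snow"},
}
INTENT_VOCAB: Dict[str, Set[str]] = {
    "proposal": {"want to", "how about", "why don't we"},
    "agreement": {"sounds good", "agreed", "let's go"},
}


@dataclass
class UtteranceAnalysis:
    genre: Optional[str]
    intent: Optional[str]


def _best_label(text: str, vocab: Dict[str, Set[str]]) -> Optional[str]:
    """Return the label whose keywords occur most often, or None if none occur."""
    scores = {
        label: sum(1 for keyword in keywords if keyword in text)
        for label, keywords in vocab.items()
    }
    label = max(scores, key=lambda k: scores[k])
    return label if scores[label] > 0 else None


def analyze(text: str) -> UtteranceAnalysis:
    """Estimate genre and intention of one utterance by naive substring matching."""
    lowered = text.lower()
    return UtteranceAnalysis(
        genre=_best_label(lowered, GENRE_VOCAB),
        intent=_best_label(lowered, INTENT_VOCAB),
    )
```

Calling `analyze("Want to eat udon for lunch?")` yields the genre `food` and the intention `proposal`, mirroring the udon example in the text.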

The conversation content understanding unit 22 determines the content of the conversation from the content of the individual utterances it contains. The content of a conversation can be specified by, for example, the topic of the conversation as a whole, the intention of the conversation, and a summary of the conversation. For example, the conversation content understanding unit 22 can determine that the content of a conversation is "a decision to eat udon for lunch" or "a discussion about lunch".

The transmission determination unit 23 judges whether the conversation content should be conveyed to other vehicles, based on the content determined by the conversation content understanding unit 22 and the transmission criteria stored in the transmission criteria storage unit 23a. Any transmission criterion may be used. For example, the decision can be based on whether the topic of the conversation is one of a set of specific topics. Examples of topics that should be conveyed to other vehicles include, but are not limited to, upcoming plans, the current location, surrounding landmarks, the destination, and the departure point. When personal profiles of the users in each vehicle are available, topics related to attributes shared within the group may also be adopted.

The utterance generation unit 24 generates the utterance (text) used when conversation content is conveyed to other vehicles. The generated utterance represents the content of the conversation being conveyed. When a conversation is summarized, the level of abstraction may be chosen as appropriate. For example, when the occupants of a vehicle have discussed what to eat for lunch and reached a conclusion, a summary or outline of the conversation such as "In vehicle A, they say they want to eat udon for lunch" or "In the vehicle, they are discussing what to do for lunch" can be generated as the utterance sentence.
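The choice of abstraction level can be pictured as picking between two sentence templates. The function below is a hedged sketch only: the parameter names, the `detailed` flag, and the English templates are assumptions introduced for illustration and are not how the source specifies the utterance generation unit 24.

```python
from typing import Optional


def generate_utterance(
    vehicle: str,
    topic: str,
    conclusion: Optional[str] = None,
    detailed: bool = True,
) -> str:
    """Build a summary sentence at one of two (assumed) abstraction levels.

    vehicle:    label of the source vehicle, e.g. "A"
    topic:      short description of what is being discussed
    conclusion: what the occupants decided or want, if any
    detailed:   True to report the conclusion, False to report only the topic
    """
    if detailed and conclusion:
        # Concrete summary: report what was actually decided.
        return f"In vehicle {vehicle}, they say they {conclusion}."
    # Abstract summary: report only the subject of the conversation.
    return f"In vehicle {vehicle}, they are talking about {topic}."
```

For instance, `generate_utterance("A", "what to do for lunch", "want to eat udon for lunch")` produces the concrete form, and passing `detailed=False` falls back to the topic-only form.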

The utterance instruction unit 25 sends the utterance sentence generated by the utterance generation unit 24 to the vehicle or vehicles that should output it. A vehicle that receives the utterance instruction outputs the generated utterance from its voice output unit 12. The utterance generation unit 24 and the utterance instruction unit 25 together correspond to the output control means of the present invention.

<Processing Flow>
FIG. 2 shows the flow of processing in the agent system 20 according to this embodiment. The flowchart in FIG. 2 covers the processing of one conversation (the conversation obtained from one vehicle). The same processing is applied to the conversation obtained from each vehicle in the group.

Before and after the processing shown in FIG. 2, the in-vehicle system 10 acquires conversational speech through the voice input unit 11 and sends it to the agent system 20, and the voice output unit 12 outputs speech in accordance with the utterance instruction sent from the agent system 20. The description here focuses on the processing in the agent system 20.

In step S101, the agent system 20 obtains, by wireless communication from the in-vehicle system 10, the conversational speech acquired by the voice input unit 11 of each vehicle. In step S102, the speech recognition unit 21 applies speech recognition to the obtained speech and converts the utterances into text. In step S103, the conversation content understanding unit 22 works out the conversation content, such as the topic, intention, and summary of the conversation, from the text of the utterances. Through the processing up to this point, the agent system 20 understands what kind of conversation is taking place in each vehicle.

The conversation content may be understood per vehicle, but it is more preferable to do so per user (speaker) within the vehicle. Knowing what each speaker said allows the content of the conversation in the vehicle to be analyzed in more detail.

In step S104, the transmission determination unit 23 decides whether to convey the obtained conversation to other vehicles. FIG. 3 shows the details of the transmission determination process in step S104. In step S300, the transmission determination unit 23 judges whether the obtained conversation relates to one of the predetermined topics stored in the transmission criteria storage unit 23a. If the topic of the conversation relates to a predetermined topic (S300-YES), the transmission determination unit 23 decides to convey the conversation to other vehicles (S301). Otherwise, it decides not to convey the conversation to other vehicles (S302).
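Steps S300 to S302 amount to a membership test against the stored topics. A minimal sketch, assuming topics are represented as string labels (the labels and the name `TRANSMISSION_TOPICS` are hypothetical stand-ins for the contents of the transmission criteria storage unit 23a):

```python
from typing import FrozenSet, Iterable

# Hypothetical labels for the predetermined topics named in the text.
TRANSMISSION_TOPICS: FrozenSet[str] = frozenset(
    {"schedule", "current_location", "landmark", "destination", "departure"}
)


def should_transmit(
    conversation_topics: Iterable[str],
    allowed: FrozenSet[str] = TRANSMISSION_TOPICS,
) -> bool:
    """S300: does the conversation touch any predetermined topic?

    Returns True for S301 (convey to other vehicles) and
    False for S302 (do not convey).
    """
    return not allowed.isdisjoint(conversation_topics)
```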

When it is decided that the conversation is to be conveyed to other vehicles (S104-YES), the process proceeds to step S105, where the utterance generation unit 24 determines the utterance (text) used to inform the other vehicles of the conversation. In step S106, the utterance instruction unit 25 sends an utterance instruction, which directs the recipient to output the generated utterance inside the vehicle, to the vehicles other than the one from which the conversation was obtained.
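Step S106 sends the instruction to every group member except the source. The sketch below assumes vehicles are identified by strings and that delivery is delegated to a caller-supplied `send` callback. Both are illustrative choices, and `send_utterance_instruction` is a hypothetical name.

```python
from typing import Callable, Iterable, List


def send_utterance_instruction(
    source_vehicle: str,
    group: Iterable[str],
    utterance_text: str,
    send: Callable[[str, str], None],
) -> List[str]:
    """S106: deliver the utterance to all vehicles in the group except the source.

    `send(vehicle_id, text)` is whatever transport the deployment uses;
    it is left abstract here. Returns the list of vehicles addressed.
    """
    targets = [vehicle_id for vehicle_id in group if vehicle_id != source_vehicle]
    for vehicle_id in targets:
        send(vehicle_id, utterance_text)
    return targets
```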

A vehicle that receives an utterance instruction does not need to output the utterance immediately and may output it at a suitable time. For example, when users in the vehicle are talking with each other, the utterance may be output when their conversation pauses. When the driver is estimated to be concentrating on driving, or to need to concentrate on it, the utterance may be output once that situation has passed.

<Advantageous Effects of the Embodiment>
With the voice processing system according to this embodiment, the agent system 20 judges whether a conversation taking place in one vehicle needs to be conveyed to other vehicles, and for conversations judged to need conveying, a summary of the conversation is made known to the other vehicles. Even though users in different vehicles do not communicate with each other directly, the agent system sends summaries of the conversations, so users can adequately grasp what is being discussed in the other vehicles.

Because this embodiment uses speech for both input and output (notification), the driver can use it even while driving. Regardless of the number of users in a vehicle, a single microphone (input device) and a single speaker (output device) per vehicle are enough, which keeps the configuration simple. Although speech is used, users in different vehicles do not talk to each other directly, so turn-taking problems do not arise. Multi-party conversation by voice alone is difficult for intuitive conversation such as small talk, but that problem does not occur here. In addition, a loss of communication has little impact, and notifications about the conversations in other vehicles can still be received after communication is restored.

(Second Embodiment)
The determination of whether to transmit a conversation to another vehicle may be made by a method other than the process shown in FIG. 3. The present embodiment differs from the first embodiment in the transmission determination process S104. The remaining configuration is substantially the same as in the first embodiment, so only the differences are described.

FIG. 4 is a flowchart showing the transmission determination process S104 in the present embodiment. In the first embodiment, whether to transmit is decided based only on the content of the conversation; in the present embodiment, the situation of the destination vehicle is also taken into account. The process of FIG. 4 is therefore executed for each pair of source vehicle and destination vehicle. In the following, a vehicle that is a candidate destination is referred to as the target vehicle.

The agent system 20 recognizes the conversation in each vehicle of the group and understands its content. Therefore, in step S303, the transmission determination unit 23 can determine how active the conversation in the target vehicle is. The activity level of a conversation can be determined from, for example, the number of utterances per unit time, the time intervals between utterances, and the expressions and words used in the utterances.
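As an illustration of the activity-level determination in step S303, the following is a minimal sketch that uses only two of the cues named above: the number of utterances per unit time and the interval between utterances. The names `ActivityMetrics`, `measure_activity`, and `is_stagnant`, and the threshold values, are hypothetical and do not appear in the source; the wording of the utterances is not analyzed here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ActivityMetrics:
    utterances_per_minute: float
    mean_gap_seconds: float


def measure_activity(timestamps: Sequence[float], window_seconds: float) -> ActivityMetrics:
    """Compute activity metrics from utterance start times (in seconds) observed in a window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    ordered = sorted(timestamps)
    rate = len(ordered) * 60.0 / window_seconds
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # With fewer than two utterances there is no gap; treat the whole window as silence.
    mean_gap = sum(gaps) / len(gaps) if gaps else window_seconds
    return ActivityMetrics(rate, mean_gap)


def is_stagnant(metrics: ActivityMetrics, min_rate: float = 4.0, max_gap: float = 10.0) -> bool:
    """Return True when the conversation looks stagnant (thresholds are assumed values)."""
    return metrics.utterances_per_minute < min_rate or metrics.mean_gap_seconds > max_gap
```

In practice the thresholds would have to be tuned, and a real system would presumably combine these timing cues with the linguistic cues mentioned in the text.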

If the conversation in the target vehicle is at least as active as usual (S303-NO), the transmission determination unit 23 adopts the normal criterion for deciding whether to transmit the conversation (S304). If, on the other hand, the conversation in the target vehicle is stagnant (S303-YES), it adopts a criterion looser than usual (S305).

In the present embodiment, a plurality of criteria are stored in advance in the transmission criteria storage unit 23a so that the criterion for conversations to be transmitted can be changed according to the activity level of the conversation in the vehicle. When the criteria are topic-based, the looser criterion registers more topics as topics to be transmitted to other vehicles.

The processing in steps S300 to S302 is the same as in the first embodiment, so its description is omitted.
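The criterion selection of steps S303 to S305, followed by the topic check of steps S300 to S302, could be sketched as follows. This assumes topic-based criteria represented as sets of topic labels; the topic names and the function names `select_criterion` and `should_transmit` are hypothetical and chosen only for illustration.

```python
# Hypothetical contents of the transmission criteria storage unit 23a.
NORMAL_TOPICS = frozenset({"destination", "rest stop", "meal"})
# The looser criterion registers more topics than the normal one.
RELAXED_TOPICS = NORMAL_TOPICS | {"scenery", "music", "sports"}


def select_criterion(target_is_stagnant: bool) -> frozenset:
    """S303-S305: pick the looser criterion when the target vehicle's conversation is stagnant."""
    return RELAXED_TOPICS if target_is_stagnant else NORMAL_TOPICS


def should_transmit(topic: str, target_is_stagnant: bool) -> bool:
    """S300-S302: transmit only when the topic is registered in the selected criterion."""
    return topic in select_criterion(target_is_stagnant)
```

Because the relaxed set is a superset of the normal set, a stagnant target vehicle never receives fewer notifications than an active one.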

According to the present embodiment, when the conversation in a vehicle is stagnant, more conversations from other vehicles are reported to it, which can stimulate conversation in that vehicle.

In the above description, the transmission criterion is selected from two alternatives according to the activity level of the conversation, but the activity level may instead be classified into three or more levels, with a different criterion adopted for each.

Also, in the above description the transmission criterion is changed according to the conversation activity level of the destination vehicle, but it may be changed in consideration of other conditions in the destination vehicle. For example, based on the content of the conversation taking place in the destination vehicle, a criterion may be adopted that more readily transmits conversations on similar topics.

(Third Embodiment)
In the first embodiment, whether to transmit a conversation to another vehicle is decided based on the conversation content. In the present embodiment, information other than the conversation content is also used to make that decision.

FIG. 5 shows the system configuration of the voice processing system 1 in the present embodiment. In the present embodiment, the in-vehicle system 30 provided in each vehicle includes a position information acquisition unit 14 and a vehicle control information acquisition unit 15 in addition to the voice input unit 11, the voice output unit 12, and the control unit 13.

The position information acquisition unit 14 acquires the current position of the vehicle from a GPS device (or GNSS device). The current position may instead be obtained by cellular base station positioning, or may be a position corrected using a gyro sensor or map matching.

The vehicle control information acquisition unit 15 acquires information related to control of the vehicle, such as speed, acceleration, yaw rate, engine speed, accelerator opening, brake pedal depression, steering angle, turn signal on/off state, and the on/off state of the automatic driving function or driving assist function.

In the present embodiment, the control unit 13 transmits to the agent system 20 not only the voice acquired by the voice input unit 11 but also the position information acquired by the position information acquisition unit 14 and the vehicle control information acquired by the vehicle control information acquisition unit 15.

The configuration of the agent system 20 in the present embodiment is substantially the same as in the first embodiment, but differs in that it includes a dangerous spot storage unit 23b. The dangerous spot storage unit 23b stores information on locations where accidents are likely to occur or where the driver needs to concentrate on driving (hereinafter, dangerous spots). How the content of the dangerous spot storage unit 23b is generated is not particularly limited.

FIG. 6 is a flowchart showing the transmission determination process S104 in the present embodiment. In the first embodiment, whether to transmit is decided based only on the content of the conversation; in the present embodiment, the situation of the destination vehicle is also taken into account. The process of FIG. 6 is therefore executed for each pair of source vehicle and destination vehicle. In the following, a vehicle that is a candidate destination is referred to as the target vehicle.

The processing in step S300 is the same as in the first embodiment. As in the second embodiment, the selection criterion of step S300 may be changed dynamically according to the conversation situation in the target vehicle.

As in the first embodiment, when the conversation is not on a predetermined topic (S300-NO), it is determined that the conversation is not to be transmitted (S302). When the conversation is on a predetermined topic (S300-YES), however, the process does not immediately determine that the conversation is to be transmitted but proceeds to step S306.

In step S306, the transmission determination unit 23 determines whether the target vehicle is located at a dangerous spot. This determination can be made by checking the position information acquired from the target vehicle against the dangerous spots specified in the dangerous spot storage unit 23b. When the target vehicle is not located at a dangerous spot (S306-NO), the transmission determination unit 23 further determines in step S307 whether the driver of the target vehicle needs to concentrate on driving. This determination can be made based on the vehicle control information acquired from the target vehicle. For example, it can be determined that concentration on driving is needed while the steering wheel is being operated or during a lane change.

When the target vehicle is not located at a dangerous spot (S306-NO) and concentration on driving is not needed (S307-NO), the transmission determination unit 23 determines that the conversation is to be transmitted to the target vehicle (S301).

On the other hand, when the target vehicle is located at a dangerous spot (S306-YES) or concentration on driving is needed (S307-YES), the transmission determination unit 23 waits for a fixed time (S308) and then performs the determinations of steps S306 and S307 again.
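The flow of FIG. 6 (S300, then S306 to S308) can be sketched as below. Several details are assumptions made for the sake of a runnable example and are not stated in the source: dangerous spots are modelled as circles in a local planar coordinate frame, concentration on driving is inferred from a steering-angle threshold and the turn signal, and the retry loop is bounded by `max_checks`. All function and key names are hypothetical.

```python
import math
import time
from typing import Callable, Sequence, Tuple

Position = Tuple[float, float]  # (x, y) in metres in a local planar frame (assumption)


def in_danger_spot(position: Position, danger_spots: Sequence[Tuple[Position, float]]) -> bool:
    """S306: check the position against (centre, radius) entries of the danger spot store."""
    return any(math.dist(position, centre) <= radius for centre, radius in danger_spots)


def needs_concentration(control: dict) -> bool:
    """S307: assumed heuristic based on steering angle and turn signal state."""
    steering = abs(control.get("steering_angle_deg", 0.0)) > 15.0
    return steering or bool(control.get("turn_signal_on", False))


def decide_transmission(
    topic_matches: bool,
    get_position: Callable[[], Position],
    get_control: Callable[[], dict],
    danger_spots: Sequence[Tuple[Position, float]],
    wait_seconds: float = 5.0,
    max_checks: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    if not topic_matches:
        return False  # S300-NO leads to S302: do not transmit
    for _ in range(max_checks):
        # S307 is evaluated only when S306 is NO, thanks to short-circuit evaluation.
        if not in_danger_spot(get_position(), danger_spots) and not needs_concentration(get_control()):
            return True  # S301: transmit
        sleep(wait_seconds)  # S308: wait, then re-evaluate S306 and S307
    return False  # gave up after max_checks attempts (bound is an assumption)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; the source does not say whether the wait loop is ever abandoned, so the bound is purely a safeguard.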

In this way, whether to transmit a conversation can be decided while also taking into account the vehicle state of the target vehicle (the candidate destination vehicle). The method of the present embodiment avoids sending an utterance instruction from the agent system 20 to the in-vehicle system 30 at a time when notification by voice output is inappropriate.

(Fourth Embodiment)
In the third embodiment, the vehicle control information of the target vehicle is used to decide when the agent system 20 sends the utterance instruction, but the criterion for selecting the conversation content to be transmitted may also be changed according to the vehicle control information of the target vehicle. For example, the position information of the target vehicle may be taken into account so that conversations relating to that position are determined to be transmitted. Alternatively, different criteria may be used for deciding whether to transmit a conversation depending on whether the automatic driving function is on or off.

(Fifth Embodiment)
In the present embodiment, the vehicle control state of the vehicle in which the conversation is taking place (the source vehicle) is also taken into account in deciding whether to transmit the conversation to other vehicles. The configuration of the present embodiment is the same as that of the third embodiment.

FIG. 7 is a flowchart showing the transmission determination process S104 in the present embodiment. First, in step S309, the transmission determination unit 23 determines whether the vehicle from which the conversation under consideration was acquired has taken a danger avoidance action. Examples of danger avoidance actions include sudden braking and sudden steering.

When the vehicle has not taken a danger avoidance action (S309-NO), the normal criterion is adopted for deciding whether to transmit the conversation (step S310). When the vehicle has taken a danger avoidance action (S309-YES), a criterion is adopted under which, in addition to conversations that satisfy the normal criterion, topics relating to traffic safety and obstacles on the road are also determined to be transmitted. The processing of steps S300 to S302 after the criterion has been decided is the same as in the first embodiment.
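A minimal sketch of the criterion selection in step S309 and the branch that follows it is shown below. Detecting sudden braking from longitudinal deceleration and sudden steering from yaw rate is an assumption, as are the threshold values, the dictionary keys, and the function names; the source only names sudden braking and sudden steering as examples.

```python
SAFETY_TOPICS = frozenset({"traffic safety", "road obstacle"})


def took_evasive_action(
    control: dict,
    decel_threshold: float = 6.0,     # m/s^2, assumed value for "sudden braking"
    yaw_rate_threshold: float = 0.5,  # rad/s, assumed value for "sudden steering"
) -> bool:
    """S309: infer a danger avoidance action from the source vehicle's control information."""
    hard_brake = control.get("longitudinal_accel_mps2", 0.0) <= -decel_threshold
    sharp_steer = abs(control.get("yaw_rate_radps", 0.0)) >= yaw_rate_threshold
    return hard_brake or sharp_steer


def select_topics(normal_topics: frozenset, source_control: dict) -> frozenset:
    """Return the topic set used by the subsequent S300 check."""
    if took_evasive_action(source_control):
        # S309-YES: normal topics plus safety-related topics.
        return normal_topics | SAFETY_TOPICS
    return normal_topics  # S309-NO leads to S310: normal criterion
```

The extended criterion is a superset of the normal one, matching the description that safety-related topics are transmitted in addition to conversations that already satisfy the normal criterion.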

In this way, for example, a conversation about an obstacle or an avoidance action, held after the vehicle has avoided an obstacle on the road, can be transmitted to other vehicles.

In the present embodiment, the positions of the source vehicle and the destination vehicle may also be considered, and the criterion may be changed only on the further condition that the destination vehicle is located behind the source vehicle. This is because there is considered to be little need to notify a destination vehicle of a danger that lies behind it.
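The additional positional condition can be sketched as follows, under the assumption that both vehicles travel the same route and that each position is expressed as distance travelled along that route in metres. The function name `topics_for_target` and the topic labels are hypothetical.

```python
SAFETY_TOPICS = frozenset({"traffic safety", "road obstacle"})


def topics_for_target(
    normal_topics,
    source_took_evasive_action: bool,
    source_route_m: float,
    target_route_m: float,
) -> frozenset:
    """Extend the criterion only for a target vehicle that is behind the source vehicle."""
    # A smaller distance along the shared route means the target has not yet
    # reached the place where the source vehicle took the avoidance action.
    target_is_behind = target_route_m < source_route_m
    if source_took_evasive_action and target_is_behind:
        return frozenset(normal_topics) | SAFETY_TOPICS
    return frozenset(normal_topics)
```

If the vehicles are on different routes, a comparison of route distances is not meaningful, and some other notion of "behind" would be needed.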

<Modifications>
[Direct Conversation Between Vehicles]
When a user speaks a specific keyword to the in-vehicle system 10, the system may switch to a mode of direct conversation between vehicles. In this way, users in different vehicles can talk to each other directly when needed, prompted by the conversations of other vehicles reported by the agent system 20.

[Configuration Variations]
In the above description, an example was described in which the voice processing system 1 consists of the in-vehicle system 10 and the agent system 20, but the specific arrangement of computers (hardware) is arbitrary, as long as the system as a whole provides the functions described above.

FIG. 8 shows one variation of the system configuration. In this example, a voice recognition server 40 that performs voice recognition processing is introduced, and the agent system 20 has no voice recognition function. The in-vehicle system 10 sends the conversation voice to the voice recognition server 40, and the voice recognition server 40 converts the conversation voice into text and sends it to the in-vehicle system 10. The in-vehicle system 10 then sends the conversation text to the agent system 20.

FIG. 8 can be regarded as an example in which the voice recognition function is assigned to a separate device. The function distributed in this way is not limited to voice recognition processing and may be any function in the above embodiments.

For example, it is also preferable to divide the function of the transmission determination unit 23 between the agent system 20 and the in-vehicle system 10. In the second to fifth embodiments, the transmission determination unit 23 of the agent system 20 decides whether to transmit a conversation by considering the situation of the target vehicle (conversation situation and vehicle control situation) in addition to the content of the utterances. Instead, the agent system 20 (its transmission determination unit 23) may decide whether to transmit a conversation based on the content of the utterances, and the in-vehicle system 10 (its own transmission determination unit) may decide whether to convey the conversation by considering the conversation situation and vehicle control state in that vehicle. That is, a summary of the conversation that the agent system 20 has determined should be transmitted is sent to the in-vehicle system 10, and the in-vehicle system 10 decides whether to actually convey (output) it to the users. With this configuration, the determination based on the situation of the vehicle is made inside the vehicle, which has the advantage that the processing can be performed more accurately and faster than when it is performed on the server side.

FIG. 9A shows yet another variation of the system configuration. In this example, the agent system 20 is installed in each vehicle rather than on a server. FIG. 9B shows an example of processing in the configuration of FIG. 9A. The agent system 20 in vehicle A acquires the conversation in vehicle A (S201) and determines whether it should be transmitted to the other vehicles B and C (S202). If information about the destination vehicles B and C is needed for this determination, the necessary information may be exchanged between the vehicles. When the conversation is determined to be one that should be transmitted to the other vehicles B and C, vehicle A generates utterance content including a summary of the conversation and sends it to the other vehicles (S203). On receiving this notification (S204), vehicles B and C determine whether it is an appropriate time for voice output, that is, whether no conversation is in progress in the vehicle and the driver is not busy with driving (S205), and if voice output is possible, they output the summary of the conversation that was sent (S206).
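The processing of FIG. 9B (S201 to S206) can be sketched as a small in-process simulation. The class `Vehicle`, the function `share_conversation`, the message wording, and the topic-set criterion are all hypothetical. Holding a notification until output becomes possible is also an assumption; the source only states that the summary is output when voice output is possible.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Vehicle:
    name: str
    in_conversation: bool = False
    busy_driving: bool = False
    pending: List[str] = field(default_factory=list)
    spoken: List[str] = field(default_factory=list)

    def receive(self, message: str) -> None:
        """S204: accept a notification from another vehicle."""
        self.pending.append(message)
        self.flush()

    def flush(self) -> None:
        """S205-S206: output pending notifications only when the timing is appropriate."""
        if self.in_conversation or self.busy_driving:
            return  # keep the notification for later (assumed behaviour)
        self.spoken.extend(self.pending)
        self.pending.clear()


def share_conversation(
    source: Vehicle,
    others: Iterable[Vehicle],
    topic: str,
    summary: str,
    topics_to_share: Set[str],
) -> bool:
    """S201-S203: decide whether to transmit, then send a summary to the other vehicles."""
    if topic not in topics_to_share:  # S202
        return False
    message = f"In vehicle {source.name}, they are talking about {summary}."  # S203
    for vehicle in others:
        vehicle.receive(message)
    return True
```

A usage example: with `b = Vehicle("B")` and `c = Vehicle("C", in_conversation=True)`, calling `share_conversation(Vehicle("A"), [b, c], "meal", "where to stop for lunch", {"meal"})` makes vehicle B output the summary at once, while vehicle C holds it until `c.flush()` is called after its own conversation has paused.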

Thus, even when an in-vehicle agent system is used, the same effects can be obtained as when a server-based agent system is used.

DESCRIPTION OF SYMBOLS
1: Voice processing system
10: In-vehicle system
11: Voice input unit
12: Voice output unit
13: Control unit
14: Position information acquisition unit
15: Vehicle control information acquisition unit
20: Agent system
21: Voice recognition unit
22: Conversation content understanding unit
23: Transmission determination unit
24: Utterance generation unit
25: Utterance instruction unit

Claims

A voice processing system comprising:
voice recognition means for performing voice recognition on conversation voice in a first vehicle;
conversation content understanding means for determining the conversation content in the first vehicle based on a result of the voice recognition; and
output control means for generating an utterance that reports the conversation content of the first vehicle and controlling the utterance so that it is output in a second vehicle.

The voice processing system according to claim 1, further comprising transmission determination means for determining, based on the conversation content in the first vehicle, whether to report the conversation content to the second vehicle,
wherein, when the transmission determination means determines that the conversation content in the first vehicle is to be reported to the second vehicle, the output control means performs control so that the utterance reporting the conversation content of the first vehicle is output in the second vehicle.

The voice processing system according to claim 2, wherein the transmission determination means determines that the conversation content in the first vehicle is to be reported to the second vehicle when the conversation in the first vehicle relates to a predetermined topic.

The voice processing system according to claim 2 or 3, wherein the transmission determination means determines whether to report the conversation content of the first vehicle to the second vehicle in consideration also of position information or vehicle control information of the second vehicle.

The voice processing system according to any one of claims 2 to 4, wherein:
the voice recognition means also performs voice recognition on conversation voice in the second vehicle;
the conversation content understanding means determines the conversation content of the second vehicle based on a result of the voice recognition on the conversation voice of the second vehicle; and
the transmission determination means determines whether to report the conversation content of the first vehicle to the second vehicle in consideration also of the conversation situation in the second vehicle.

The voice processing system according to any one of claims 2 to 5, wherein, when it is determined based on vehicle control information of the first vehicle that the first vehicle has taken a danger avoidance action, the transmission determination means determines whether to report the conversation content of the first vehicle to the second vehicle according to a criterion different from that used otherwise.

A voice processing method executed by a voice processing system, the method comprising:
a voice recognition step of performing voice recognition on conversation voice in a first vehicle;
a conversation content understanding step of determining the conversation content in the first vehicle based on a result of the voice recognition; and
an output control step of generating an utterance that reports the conversation content of the first vehicle and controlling the utterance so that it is output in a second vehicle.

A program for causing a computer to execute each step of the method according to claim 7.