JP2019200239A

JP2019200239A - Language setting device

Info

Publication number: JP2019200239A
Application number: JP2018093239A
Authority: JP
Inventors: 亮太尾首; Ryota Oshu
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2018-05-14
Filing date: 2018-05-14
Publication date: 2019-11-21

Abstract

To provide a language setting device capable of executing services in a language different from the set language. SOLUTION: In a language setting device 10, a sound acquisition unit 20 acquires detected in-vehicle sound. An utterance acquisition unit 22 acquires utterance information, extracted from the in-vehicle sound, of an occupant who requests a service. A language specifying unit 24 specifies, on the basis of the utterance information, the language spoken by the occupant who requested the service. An output control unit 28 causes the response to the requested service to be output in the specified language. SELECTED DRAWING: Figure 1

Description

The present invention relates to a language setting device that sets the language in which information is provided to an occupant.

Patent Literature 1 discloses a speech processing device for cases in which the user's native language is not supported. The device has the user take a test or speak in order to assess the user's language ability, selects a speech recognition target language from among languages other than the user's native language on the basis of that ability, and sets the operating conditions for speech recognition in the selected language.

Patent Literature 1: JP 2005-31150 A

Because the technique disclosed in Patent Literature 1 runs speech recognition in the selected language, it cannot handle a case in which the user speaks in a language other than the set language after the selection has been made. It is also preferable to be able to provide a service in the language that the user actually uses.

An object of the present invention is to provide a language setting device that enables a service to be executed in a language corresponding to the language spoken by an occupant.

To solve the above problem, a language setting device according to one aspect of the present invention includes: a sound acquisition unit that acquires detected in-vehicle sound; an utterance acquisition unit that acquires utterance information of an occupant requesting a service, extracted from the in-vehicle sound; a language specifying unit that specifies, on the basis of the utterance information, the language spoken by the occupant who requested the service; and an output control unit that causes a response to the requested service to be output in the specified language.

According to this aspect, the language spoken by the occupant who requested a service is specified each time a service is executed, and the service can be provided in the specified language.

According to the present invention, it is possible to provide a language setting device that enables a service to be executed in a language corresponding to the language spoken by an occupant.

FIG. 1 is a diagram for explaining the language setting device of the embodiment.
FIG. 2 is a flowchart showing the process of initially setting the display language.
FIG. 3 is a flowchart showing the process of setting a language for each service.

FIG. 1 is a diagram for explaining a language setting device 10 according to the embodiment. Each element shown in FIG. 1 as a functional block that performs various processes can be implemented in hardware by circuit blocks, memory, and other LSIs, and in software by a program loaded into memory or the like. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware alone, software alone, or a combination of the two, and they are not limited to any one of these.

The language setting device 10 is installed in a vehicle and can cause a response to be output in a language corresponding to the language spoken by an occupant. The language setting device 10 acquires in-vehicle sound from a microphone 12 and controls the output of an output unit 14 according to the acquired utterance of the occupant. The language setting device 10 includes a sound acquisition unit 20, an utterance acquisition unit 22, a language specifying unit 24, a response generation unit 26, and an output control unit 28.

The microphone 12 is arranged to detect in-vehicle sound, converts sound including the occupant's utterances into an electrical signal, and sends the signal to the sound acquisition unit 20. The output unit 14 is a speaker and/or a display and, under the control of the output control unit 28, outputs information to the occupant as audio or images.

The sound acquisition unit 20 acquires a sound signal from the microphone 12 as the in-vehicle sound. The utterance acquisition unit 22 extracts the occupant's utterance from the in-vehicle sound and acquires the occupant's utterance information.

The language specifying unit 24 specifies the language spoken by the occupant on the basis of the occupant's utterance information. It does so by matching model information held in advance for each language against the utterance information. The model information held in advance for each language may be a specific preset word, for example the word meaning "hello" in each language. The utterance with the predetermined meaning is obtained by having the output control unit 28 produce an output prompting the occupant to say a word with that meaning, for example by displaying on the in-vehicle display, in multiple languages, a request to say "hello". For example, the language specifying unit 24 determines that the language is Japanese if the occupant says "konnichiwa", English if the occupant says "hello", and Chinese if the occupant says "ni hao". The language specifying unit 24 may also extract words from the occupant's utterance information and compare the extracted words with the model information of each language to specify the language spoken by the occupant.
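The greeting-based matching described above can be sketched as follows. This is only a minimal illustration, not the patented implementation: it assumes the utterance has already been transcribed to text, and the table `GREETING_MODELS`, the language codes, and the function name `specify_language` are hypothetical names introduced here.

```python
from typing import Optional

# Hypothetical "model information": preset greetings held for each language.
GREETING_MODELS = {
    "ja": ("こんにちは", "konnichiwa"),
    "en": ("hello",),
    "zh": ("你好", "ni hao", "nihao"),
}


def specify_language(utterance_text: str) -> Optional[str]:
    """Return the code of the language whose preset greeting appears in the text.

    Returns None when no held greeting matches, i.e. the language
    could not be specified locally.
    """
    normalized = utterance_text.strip().lower()
    for language, greetings in GREETING_MODELS.items():
        if any(greeting in normalized for greeting in greetings):
            return language
    return None
```

A real device would match acoustic or speech-model features rather than transcribed strings; the lookup table only shows the shape of the decision.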

When an occupant makes an utterance requesting a service of the in-vehicle device, the language specifying unit 24 specifies the language spoken by the occupant on the basis of that utterance information. If the language specifying unit 24 cannot specify the language, it may transmit the occupant's utterance information to a server device and receive from the server device both the language specified by the server device and the language information for that language.
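The server fallback described above might look like the following sketch. The callables `specify_locally` and `query_server`, and the shape of the returned language information, are assumptions made for illustration; the source does not define any interface to the server device.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical shape of "language information": message key -> localized text.
LanguageInfo = Dict[str, str]


def specify_with_server_fallback(
    utterance: str,
    specify_locally: Callable[[str], Optional[str]],
    query_server: Callable[[str], Tuple[str, LanguageInfo]],
) -> Tuple[str, Optional[LanguageInfo]]:
    """Try on-board identification first and ask the server only on failure.

    Returns (language, language_info). language_info is None when the
    language was specified locally, because no download was needed.
    """
    language = specify_locally(utterance)
    if language is not None:
        return language, None
    # Assumed server behavior: it returns both the language it specified
    # and the language information for that language.
    return query_server(utterance)
```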

The language holding unit 30 holds language information for multiple languages, such as Japanese, English, and Chinese, which is used in the specifying process of the language specifying unit 24 and the generation process of the response generation unit 26. The language holding unit 30 holds predetermined language information for displaying in-vehicle functions, language information that serves as a dictionary, and speech model information. For example, the language holding unit 30 holds, for each language, the display language used to present services such as a destination setting function, a destination search function, a music setting function, and a map display function. The language holding unit 30 may hold language information for several commonly used languages and acquire language information for other languages from the server device. Language information acquired from the server device is also held in the language holding unit 30.
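One way to picture the holding behavior described above is a small cache that is preloaded with common languages and keeps anything it later fetches. The class name `LanguageHoldingUnit`, the `fetch_from_server` callable, and the dictionary layout are hypothetical; the source specifies only that fetched language information is retained.

```python
from typing import Callable, Dict

# Hypothetical shape of "language information": message key -> localized text.
LanguageInfo = Dict[str, str]


class LanguageHoldingUnit:
    """Holds language information and retains whatever is fetched later."""

    def __init__(
        self,
        preloaded: Dict[str, LanguageInfo],
        fetch_from_server: Callable[[str], LanguageInfo],
    ) -> None:
        self._held = dict(preloaded)      # commonly used languages
        self._fetch = fetch_from_server   # assumed server access function

    def get(self, language: str) -> LanguageInfo:
        # Fetch only when the language is not held yet, then keep it so
        # that later requests are served without contacting the server.
        if language not in self._held:
            self._held[language] = self._fetch(language)
        return self._held[language]
```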

The response generation unit 26 generates response information corresponding to the occupant's utterance information in the language initially set as the display language of the in-vehicle system software. The initially set display language may be acquired from the server device, but in either case it is held in the language holding unit 30. When an occupant's operation is input to the in-vehicle device, the response generation unit 26 generates response information in the initially set display language. The response generation unit 26 also generates spoken response information in the initially set display language. Menu items and the like shown on the display are displayed in the initially set display language.

The response generation unit 26 can retrieve the language specified by the language specifying unit 24 from the language holding unit 30 and generate response information in a language other than the initially set display language. For example, if the occupant's utterance information requests the destination setting service and was spoken in a language different from the initially set display language, the response generation unit 26 generates the destination setting image in the language specified by the language specifying unit 24. The response information may be an image displayed on the display or audio output from the speaker.

When the utterance acquisition unit 22 acquires an utterance of a trigger phrase requesting a service, the language specifying unit 24 specifies the language used by the occupant who spoke the trigger phrase, and the response generation unit 26 generates the requested response information in the language specified by the language specifying unit 24. A trigger phrase is a predetermined, preset word that serves as the cue for requesting a service such as the destination setting function, the destination search function, or the music setting function. When the utterance acquisition unit 22 detects a trigger phrase, speech recognition processing starts.
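Trigger-phrase handling can be sketched as a lookup that yields both the requested service and the language of the phrase. The phrases, service names, and the function `detect_trigger` below are invented for illustration, and the sketch again assumes the utterance is available as text.

```python
from typing import Optional, Tuple

# Hypothetical preset trigger phrases: phrase -> (service, language code).
TRIGGER_PHRASES = {
    "set destination": ("destination_setting", "en"),
    "目的地を設定": ("destination_setting", "ja"),
    "play music": ("music_setting", "en"),
    "音楽をかけて": ("music_setting", "ja"),
}


def detect_trigger(utterance_text: str) -> Optional[Tuple[str, str]]:
    """Return (service, language) if the utterance contains a trigger phrase.

    Returns None when no trigger phrase is present, in which case
    speech recognition for a service would not be started.
    """
    normalized = utterance_text.strip().lower()
    for phrase, (service, language) in TRIGGER_PHRASES.items():
        if phrase in normalized:
            return service, language
    return None
```

Keying the language off the trigger phrase itself is one possible design; the source leaves open how the language specifying unit 24 works on the trigger utterance.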

The output control unit 28 performs control so that the response information generated by the response generation unit 26 is output from the output unit 14. Depending on the case, the output control unit 28 outputs response information generated in the initially set display language or response information generated in the language spoken by the occupant.

The initial display language is set from an occupant's utterance acquired when the in-vehicle power supply is turned on. When the in-vehicle power supply is turned on, the output control unit 28 produces an output prompting the occupant who has boarded to speak, in order to specify the language the occupant uses. When the occupant speaks in response, the utterance acquisition unit 22 acquires the utterance, the language specifying unit 24 specifies the spoken language, and the response generation unit 26 sets the specified language as the display language of the system software. In a rental car, for example, the language used in the region may differ from the language used by the occupant; setting the display language of the in-vehicle system software to the occupant's language makes the various in-vehicle services easier to use. The language setting device 10 may have a function for returning the display language of the system software to its initial state. In that case, when the in-vehicle power supply is turned on after the display language has been reset, the utterance acquisition unit 22 acquires the occupant's utterance, the language specifying unit 24 specifies the spoken language, and the response generation unit 26 sets the specified language as the display language of the system software. The display language can thus be set automatically from the occupant's utterance, without the occupant having to perform a setting operation.

When the utterance acquisition unit 22 acquires utterance information requesting a service, the language specifying unit 24 specifies the language used by the occupant who spoke, and the response generation unit 26 generates the requested response information in that language until the service has been completed. For example, when an occupant requests the destination guidance function, the response generation unit 26 generates the requested response information in the language specified by the language specifying unit 24 until the vehicle arrives at the destination and guidance ends. In this way, a service can be provided in the language spoken by the occupant without changing the display language of the system software. Services can also be provided in a different language for each service.

FIG. 2 is a flowchart showing the process of initially setting the display language. When the in-vehicle power supply is turned on, the output control unit 28 causes the output unit 14 to produce an output prompting the occupant who has boarded to make an utterance with a predetermined meaning (S10). The occupant responds with an utterance having the predetermined meaning, and the utterance acquisition unit 22 acquires the occupant's utterance information (S12).

The language specifying unit 24 specifies the language spoken by the occupant on the basis of the acquired utterance information (S14). The response generation unit 26 generates response information asking whether the specified language may be set as the initial display language of the system software, and the output control unit 28 causes that response information to be displayed. If the occupant does not approve setting the specified language as the initial display language (N in S16), the process returns to S10.

If the occupant approves setting the specified language as the initial display language (Y in S16), the output control unit 28 sets the display language of the system software to the specified language and causes output to be produced in that display language (S18). In this way, the display language can be set to the occupant's language from an utterance, without the occupant performing an initial setting operation.
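The S10 to S18 flow of FIG. 2 can be sketched as a confirmation loop. All four callables are hypothetical stand-ins for the units of the device. The `max_attempts` limit is an assumption added so the sketch terminates, since the flowchart simply returns to S10; retrying when no language can be specified is likewise an assumption.

```python
from typing import Callable, Optional


def initialize_display_language(
    prompt_occupant: Callable[[], None],
    acquire_utterance: Callable[[], str],
    specify_language: Callable[[str], Optional[str]],
    confirm_with_occupant: Callable[[str], bool],
    max_attempts: int = 3,  # assumption: the source states no retry limit
) -> Optional[str]:
    """Return the approved display language, or None if none was approved."""
    for _ in range(max_attempts):
        prompt_occupant()                        # S10: prompt for an utterance
        utterance = acquire_utterance()          # S12: acquire the utterance
        language = specify_language(utterance)   # S14: specify the language
        # S16: ask the occupant whether this language may be set.
        if language is not None and confirm_with_occupant(language):
            return language                      # S18: set as display language
        # N in S16 (or, by assumption, no language specified): back to S10.
    return None
```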

FIG. 3 is a flowchart showing the process of setting a language for each service. The utterance acquisition unit 22 acquires utterance information of an occupant requesting a service (S20), and the language specifying unit 24 specifies the language of that utterance information (S22). The response generation unit 26 generates response information for the requested service in the language specified by the language specifying unit 24 (S24).

Until a predetermined end condition is satisfied (N in S26), the response generation unit 26 continues to generate response information for the requested service in the language specified by the language specifying unit 24 (S24). The predetermined end condition may be satisfied, for example, when the requested service has been completed, or when a single interaction has finished.

When the predetermined end condition is satisfied (Y in S26), the output control unit 28 ends the output of the requested service (S28) and causes display in the initially set display language (S30). In this way, when a service is requested by an utterance in a language different from the initially set display language, the service can be provided in that language.
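The per-service behavior of FIG. 3 amounts to a temporary language override that is dropped when the service ends. The sketch below illustrates that idea under stated assumptions: the class `DisplaySession`, its method names, and the bracketed output format are hypothetical, and real response generation would draw localized text from the language holding unit 30.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DisplaySession:
    """Tracks the default display language and a per-service override."""

    default_language: str
    service_language: Optional[str] = None
    outputs: List[str] = field(default_factory=list)

    @property
    def active_language(self) -> str:
        # The service language wins while a service is running.
        return self.service_language or self.default_language

    def start_service(self, specified_language: str) -> None:
        # S20-S22: a service was requested and its language was specified.
        self.service_language = specified_language

    def respond(self, message_key: str) -> str:
        # S24: generate a response tagged with the active language.
        response = f"[{self.active_language}] {message_key}"
        self.outputs.append(response)
        return response

    def end_service(self) -> None:
        # S28-S30: end the service and fall back to the default language.
        self.service_language = None
```

Because the override is stored separately from `default_language`, the display language of the system software is never modified, which matches the point that a service can run in the occupant's language without changing the system setting.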

The embodiment is merely illustrative; those skilled in the art will understand that various modifications to the combinations of its constituent elements are possible and that such modifications also fall within the scope of the present invention.

Description of symbols: 10 language setting device, 12 microphone, 14 output unit, 20 sound acquisition unit, 22 utterance acquisition unit, 24 language specifying unit, 26 response generation unit, 28 output control unit, 30 language holding unit.

Claims

A language setting device comprising:
a sound acquisition unit that acquires detected in-vehicle sound;
an utterance acquisition unit that acquires utterance information of an occupant requesting a service, extracted from the in-vehicle sound;
a language specifying unit that specifies, on the basis of the utterance information, the language spoken by the occupant who requested the service; and
an output control unit that causes a response to the requested service to be output in the specified language.