JP2023541603A

JP2023541603A - Chaotic testing of voice-enabled devices

Info

Publication number: JP2023541603A
Application number: JP2023515876A
Authority: JP
Inventors: アナンタプル、バチェ、ヴィジャイ、クマール; ジャヤラタナサミー、プラディープ、ラジ; タンガラジ、シュリタール、ラジャン; ランガラジャン、アービンド
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2020-09-11
Filing date: 2021-09-08
Publication date: 2023-10-03
Also published as: DE112021003937T5; CN116114015A; GB202304707D0; GB2614192A; WO2022052945A1; US20220084501A1; US11769484B2

Abstract

A computer-implemented method, computer program product, and computer system for testing a voice assistant device can include one or more processors configured to receive test data from a database, the test data including a first set of encoding parameters and a first user utterance having an expected device response. Additionally, the one or more processors may be configured to generate a first modified user utterance by applying the first set of encoding parameters to the first user utterance, the first modified user utterance being acoustically different from the first user utterance. The one or more processors may be configured to audibly present the first modified user utterance to the voice assistant device, receive a first device response from the voice assistant device, and determine whether the first voice assistant response is substantially similar to the expected device response.

Description

The present invention relates generally to the field of device testing, and more particularly to testing voice-enabled devices using machine learning techniques.

Interactive voice assistants can detect voice commands, process them, and determine a user's intent to perform a particular task. Voice assistant devices are evolving to interact with users and carry on continuous conversations with them. Voice assistants have become the primary interface to virtual assistant systems.

However, voice assistant devices can be hampered in noisy environments, which makes it harder to determine the exact meaning of a voice command or user utterance. Additionally, because voice assistants are not tested with varying voice types and degrees of modulation, it is difficult for a voice assistant device to identify words or phrases that are naturally modulated away from the standard words or phrases used during testing.

Embodiments of the present invention disclose a computer-implemented method, a computer program product, and a system for testing a voice assistant device. The computer-implemented method can include one or more processors configured to receive test data from a database, the test data including a first set of encoding parameters and a first user utterance having an expected device response. The one or more processors may also be configured to generate a first modified user utterance by applying the first set of encoding parameters to the first user utterance, the first modified user utterance being acoustically different from the first user utterance.

The one or more processors may be configured to audibly present the first modified user utterance to the voice assistant device and to receive a first voice assistant response from the voice assistant device, the first voice assistant response being responsive to the first modified user utterance. The first modified user utterance may be audibly presented to the voice assistant device by outputting it as a first audio signal through a speaker communicatively connected to the one or more processors, and the first voice assistant response received from the voice assistant device may be detected as a second audio signal by a microphone communicatively connected to the one or more processors. Further, the one or more processors may be configured to determine whether the first voice assistant response is substantially similar to the expected device response.

In response to determining that the first voice assistant response is substantially similar to the expected device response, the one or more processors may be configured to stop receiving additional test data. Further, in response to determining that the first voice assistant response is not substantially similar to the expected device response, the one or more processors may be configured to identify a first edge case based on at least the first set of encoding parameters and the first modified user utterance.

Further, the computer-implemented method for testing a voice assistant device may include one or more processors configured to determine a second set of encoding parameters based on at least one of the first set of encoding parameters, the first modified user utterance, and the first edge case. The one or more processors may be further configured to generate a second modified user utterance by applying the second set of encoding parameters to the first user utterance, the second modified user utterance being acoustically different from both the first user utterance and the first modified user utterance. The one or more processors may be further configured to audibly present the second modified user utterance to the voice assistant device, receive a second voice assistant response, and determine whether the second voice assistant response is substantially similar to the expected device response. Further, in response to determining that the second voice assistant response is substantially similar to the expected device response, the one or more processors may be configured to stop receiving additional test data that includes the first user utterance.
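The iteration summarized above (present a modified utterance, compare the response, record an edge case, derive a new parameter set, and stop on a match) can be sketched as a short loop. This is only an illustration under stated assumptions: the names `run_chaos_test`, `present`, `is_similar`, `next_params`, and `max_rounds` are hypothetical and do not appear in the patent, and the device interaction is abstracted behind a callable.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def run_chaos_test(
    utterance: str,
    expected: str,
    first_params: Params,
    present: Callable[[str, Params], str],
    is_similar: Callable[[str, str], bool],
    next_params: Callable[[Params, List[dict]], Params],
    max_rounds: int = 5,
) -> List[dict]:
    """Return the edge cases recorded before a matching response was seen.

    All callables are hypothetical stand-ins:
      present     - modifies the utterance with the parameters, plays it to
                    the device, and returns the device response as text
      is_similar  - decides whether a response matches the expected one
      next_params - derives the next parameter set from the previous one
                    and the edge cases seen so far
    """
    edge_cases: List[dict] = []
    params = first_params
    for _ in range(max_rounds):
        response = present(utterance, params)
        if is_similar(response, expected):
            break  # match: stop requesting more test data for this utterance
        # Mismatch: record the parameters and response as an edge case.
        edge_cases.append({"params": dict(params), "response": response})
        params = next_params(params, edge_cases)
    return edge_cases
```

In a real setup `present` would wrap the vocoder, speaker, microphone, and voice-to-text steps. The `max_rounds` cap is an added safeguard, not something the source describes.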

FIG. 1 is a functional block diagram illustrating a distributed data processing environment for testing voice assistant devices, according to one embodiment of the present invention.

FIG. 2 is a flowchart illustrating the operational steps of a computer-implemented method for testing a voice assistant device on a server computer in the distributed data processing environment of FIG. 1, according to one embodiment of the present invention.

FIG. 3 is a block diagram of the components of a server computer that executes the computer-implemented method for testing voice assistant devices within the distributed data processing environment of FIG. 1, according to one embodiment of the present invention.

The consumer experience reaches new levels every day, and conversational voice assistants are at the forefront. Various artificial intelligence (AI) systems and voice assistant (VA) systems are currently on the market. Like other products, VA technology continues to improve as breakthroughs in speech recognition and software support are achieved. VA devices can interact with users and are evolving to simulate human responses in ongoing conversations. VA devices have become the primary interface to VA-based systems developed by various technology companies.

One of the biggest challenges facing speech recognition systems is operating in noisy environments and competing with other environmental sounds. Although much research has been done on identifying the correct voice, VA systems cannot authenticate a user when the voice is noisy or modulated. Furthermore, as VA applications are increasingly deployed in enterprise systems, achieving accurate VA device responses becomes more complex. As a result, testing VA systems also becomes more complex and turns into a major undertaking. Inadequate testing leads to error-prone VA systems and a degraded user experience when a VA application fails. As the user base expands, the need to improve the user experience of VA devices and systems is clear.

Embodiments of the present invention recognize that, to increase the response accuracy of VA devices, VA devices need to be tested in many environments containing various types and levels of noise. In addition, VA devices are tested to identify edge cases, that is, specific modulations of a user utterance that are sufficient to prevent the VA device from understanding it. Once an edge case is identified, the test data used to modulate the user utterance that produced the edge case can be saved and associated with the conditions that made the user utterance unidentifiable. The test data associated with an edge case can also be fine-tuned to modulate the user utterance differently, and the fine-tuning may be sufficient to avoid the edge case in subsequent test iterations. By fine-tuning the test data, and in particular the encoding parameters (e.g., vocoder parameters), the VA device is tuned so that it can distinguish background noise and voice modulation, ignore them, and properly identify the command or question in the user utterance. Furthermore, to increase its response accuracy, the VA device also needs to be tested with various types of user utterances that audibly express the same command or phrase. Implementations of embodiments of the invention may take various forms, and exemplary implementation details are described below with reference to the figures.

FIG. 1 is a functional block diagram illustrating a distributed data processing environment for testing voice assistant devices, generally designated 100, in accordance with one embodiment of the present invention. The term "distributed" as used herein describes a computer system that includes multiple physically distinct devices that operate together as a single computer system. FIG. 1 provides only an illustration of one implementation and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Those skilled in the art can make many modifications to the depicted environment without departing from the scope of the invention as recited by the claims.

In the depicted embodiment, distributed data processing environment 100 includes voice assistant device 120, server 125, database 122, voice-to-text (VTT) module 140, and machine learning (ML) model 150, interconnected via network 110. Distributed data processing environment 100 can include database 122, which is configured to store data received from, and transmit data to, components within distributed data processing environment 100 (e.g., voice assistant device 120, vocoder 130, VTT module 140, or ML model 150, or a combination thereof) for testing voice assistant devices. Distributed data processing environment 100 may also include additional servers, computers, sensors, or other devices not shown. Each component (e.g., voice assistant device 120, vocoder 130, VTT module 140, or ML model 150, or a combination thereof) may be configured to communicate data with the others independently of network 110.

Network 110 operates as a computing network that can be, for example, a local area network (LAN), a wide area network (WAN), or a combination of the two, and can include wired, wireless, or fiber optic connections. In general, network 110 can be any combination of connections and protocols that supports communication between voice assistant device 120, server 125, database 122, VTT module 140, and ML model 150.

Voice assistant device 120 operates as a voice assistant device for responding to user prompts. In one embodiment, voice assistant device 120 may be configured to receive or detect audio data from vocoder 130, process the audio data, and output an acoustic signal that includes a voice assistant response corresponding to the audio data. Voice assistant device 120 may be configured to send and/or receive data from network 110 or via other system components within distributed data processing environment 100. In some embodiments, voice assistant device 120 can be a laptop computer, tablet computer, netbook computer, personal computer (PC), desktop computer, smartphone, smart speaker, virtual assistant, voice command device, or any programmable electronic device capable of receiving or detecting audible input, processing the audible input, and audibly outputting an associated response. Voice assistant device 120 may include components as described in further detail in FIG. 3.

Database 122 may be configured to operate as a repository for data flowing to and from network 110 and other connected components. Examples of data include test data, device data, network data, data corresponding to user utterances processed by the vocoder, encoding parameters, vocoder parameters, data corresponding to modified user utterances, and data corresponding to voice assistant responses. A database is an organized collection of data. Database 122 can be implemented with any type of storage device capable of storing data and configuration files that can be accessed and used by server 125 and/or voice assistant device 120, such as a database server, a hard disk drive, or flash memory. In one embodiment, database 122 may be accessed, via network 110 or independently of network 110, by vocoder 130, voice assistant device 120, server 125, VTT module 140, or ML model 150, or a combination thereof, to store and receive data corresponding to test plans executed on voice assistant device 120. For example, test data may be received and stored in database 122 and transmitted to vocoder 130 or other components connected to network 110 as required by the test plan. In another embodiment, database 122 may be accessed by server 125 and/or ML model 150 to access user data, device data, network data, or other data related to the test plan. Database 122 may also be accessed by VTT module 140 to store data corresponding to test result data processed and generated by voice assistant device 120 and VTT module 140. In another embodiment, database 122 may reside elsewhere within distributed data processing environment 100, provided that database 122 has access to network 110.

Test data can include voice command data and data in the JavaScript Object Notation (JSON) data interchange format. Test data can also include data corresponding to a text-to-speech conversation between the user and another party. For example, test data may include an audible user utterance, that is, audio spoken by a user. A user utterance may include a command to perform a particular task or function. A user utterance may also include a question posed by the user to solicit a response from the virtual assistant that runs as part of the operation of VA device 120. Test data may also include encoding parameters (e.g., vocoder parameters) and values for running tests on VA device 120. For example, test data may be provided to database 122 by an external source or received from one or more of the components in communication with database 122. In particular, the voice commands may include a trigger word or phrase followed by "What's the weather like outside?" and "Is it going to rain?", and the expected device responses may be "Hello AB", "It's 25 degrees and sunny", and "Yes, it looks like it's going to rain", respectively.
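As an illustration of what JSON-formatted test data of this kind might look like, the sketch below parses one record into (utterance, expected response, parameters) tuples. The field names (`trigger_phrase`, `encoding_parameters`, `cases`, and so on) and the parameter values are assumptions made for the example; the patent does not define a schema.

```python
import json

# Hypothetical test-data record; the schema and values are illustrative only.
TEST_DATA_JSON = """
{
  "trigger_phrase": "Hello assistant",
  "encoding_parameters": {"band_count": 8, "formant_shift": 1.5},
  "cases": [
    {"utterance": "What's the weather like outside?",
     "expected_response": "It's 25 degrees and sunny"},
    {"utterance": "Is it going to rain?",
     "expected_response": "Yes, it looks like it's going to rain"}
  ]
}
"""


def load_test_cases(raw: str) -> list:
    """Parse a JSON record into (utterance, expected_response, params) tuples."""
    data = json.loads(raw)
    params = data["encoding_parameters"]
    return [
        (case["utterance"], case["expected_response"], params)
        for case in data["cases"]
    ]
```

Keeping the encoding parameters next to the utterances in a single record makes it straightforward to store the exact conditions of a failing run alongside its result.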

Server 125 can be a standalone computing device, a management server, a web server, or any other electronic device or computing system capable of receiving, sending, and processing data and of communicating with voice assistant device 120 via network 110. In other embodiments, server 125 represents a server computing system that uses multiple computers as a server system, such as in a cloud computing environment. In yet other embodiments, server 125 represents a computing system that uses clustered computers and components (e.g., database server computers, application server computers) that act as a single pool of seamless resources when accessed within distributed data processing environment 100. Server 125 may include components as described in further detail in FIG. 3.

Vocoder 130 may be a voice codec or voice processor configured to analyze and synthesize an input human voice signal for audio data compression, multiplexing, voice encryption, or voice transformation. Vocoder 130 may include several filter bands tuned across the audio spectrum. Vocoder 130 may be configured to generate voice textures using vocoder parameters. Vocoder parameters are a type of encoding parameter used to program vocoder 130 to manipulate the received audio signal according to the number of measurable units specified for each vocoder parameter. Vocoder 130 may be configured to receive test data, process the test data by applying vocoder parameters to the user utterance in the test data, and output a modified user utterance that may be acoustically different from the user utterance received in the test data.

Vocoder 130 may also be configured to generate multiple variations of a voice or user utterance based on the vocoder parameters and values provided in the test data. A received user utterance may be modulated or modified by applying vocoder parameters, background noise, or an accent to change the characteristics of the user utterance. For example, Fliger distortion techniques or other known audio signal processing techniques may be used to modulate the user utterance in the test data so that the modulated or modified user utterance sounds acoustically different from the received user utterance. In addition, vocoder 130 may receive test data that includes a user utterance corresponding to a voice command or spoken question for a virtual assistant executed via voice assistant device 120, and the virtual assistant may be configured to provide the expected device response.

In another embodiment, vocoder 130 may be configured to add noise data to the vocoder parameters applied to the user utterance. Vocoder 130 may also be configured to add noise data directly to the user utterance, so that when the modified user utterance is presented to voice assistant device 120, it includes the noise data as part of the output audio signal. The encoding parameters may therefore include noise data. For example, one or more noise signatures may be added to the user utterance to simulate a real-world environment with ambient background noise. Noise data may include sounds produced by background television audio, background conversations between people, nature (e.g., animals, weather), traffic (e.g., public or private vehicles and other transportation), or any other sound in the surrounding environment that a microphone could detect.
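One common way to add a noise signature directly to an utterance is to scale the noise so that the mix reaches a chosen signal-to-noise ratio (SNR). The sketch below shows that technique on plain lists of samples. Treating the noise level as an SNR in decibels is an assumption of this example, and the function name `mix_noise` is hypothetical; the patent does not specify how noise data is parameterized.

```python
import math
from typing import List


def mix_noise(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to speech, scaled so the mix has the requested SNR in dB."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    speech_power = sum(s * s for s in speech) / len(speech)
    # Loop the noise clip so it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    noise_power = sum(n * n for n in tiled) / len(tiled)
    if noise_power == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # SNR(dB) = 10 * log10(speech_power / (gain^2 * noise_power)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, tiled)]
```

Lower `snr_db` values produce a noisier utterance, so sweeping this one value gives a simple axis along which edge cases can be searched.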

A user utterance may include voice characteristics, based on user characteristics, that uniquely distinguish one user's voice from another's. For example, voice characteristics may include pitch, speech rate, tone, texture, intonation, and loudness, and a combination of one or more voice characteristics may produce a distinctive voice corresponding to an accent or dialect.

A voice command may be any command in a user utterance that is one of a plurality of commands that voice assistant device 120 is configured to process. Upon processing the voice command, voice assistant device 120 may be configured to perform a function or to issue the expected device response. The expected device response may include voice assistant device 120 performing the function that corresponds to the command or question in the user utterance. For example, a voice command may include the command "Set an alarm for 6 a.m.", which, when processed and executed by voice assistant device 120, produces an expected device response that audibly presents "Set an alarm for 6 a.m." In addition to generating the expected device response, voice assistant device 120 may be configured to execute the voice command and perform the function it contains.

However, voice assistant device 120 may not be adequately configured to understand a voice command because of various factors, including unfamiliar accents or dialects and background noise, and the result is an edge case. According to the present invention, an edge case occurs when voice assistant device 120 receives and processes test data that includes a modified user utterance and generates a voice assistant response that does not match the corresponding expected device response. Further, an edge case may occur when voice assistant device 120 receives and processes test data that includes a modified user utterance and either fails to generate a voice assistant response or generates an error message indicating that it failed to process the modified user utterance.
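The three kinds of edge case described above (a mismatched response, no response, and an error message) can be told apart with a small classifier. This is a minimal sketch: the `Outcome` names, the `ERROR_MARKERS` phrases, and the exact-match comparison are all assumptions made for illustration, and a real test harness would likely use a looser similarity check.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"
    MISMATCH = "mismatch"
    NO_RESPONSE = "no_response"
    ERROR_MESSAGE = "error_message"


# Hypothetical phrases a device might use to signal that processing failed.
ERROR_MARKERS = ("sorry", "i didn't understand")


def classify(response: Optional[str], expected: str) -> Outcome:
    """Classify a transcribed device response against the expected response."""
    if response is None or not response.strip():
        return Outcome.NO_RESPONSE
    text = response.strip().lower()
    if text == expected.strip().lower():
        return Outcome.PASS
    if any(marker in text for marker in ERROR_MARKERS):
        return Outcome.ERROR_MESSAGE
    return Outcome.MISMATCH
```

Every outcome other than `PASS` would be stored as an edge case together with the encoding parameters that produced it.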

Thus, in one embodiment, vocoder 130 may be configured to receive test data that includes vocoder parameters which, when applied to the user utterance in the test data, produce a modified user utterance that is audibly presented to voice assistant device 120. Upon processing and executing the modified user utterance, voice assistant device 120 may be configured to audibly present a device response that may or may not substantially match the expected device response corresponding to the user utterance. If voice assistant device 120 presents a device response that does not substantially match the expected device response, an edge case may be identified based on the test data used to generate the modified user utterance.

Voice-to-text (VTT) module 140 may be a component configured to perform speech recognition, in which audio data is received and processed to output text data corresponding to the received audio data. For example, VTT module 140 may be configured to receive an audio signal containing audio data from voice assistant device 120, process the audio signal, and output text data corresponding to the received audio signal. The output text data may be included in the test result data sent to database 122. In addition, VTT module 140 may be configured to capture a speech audio signal and convert it to text data. VTT module 140 may be further configured to compare the text data corresponding to the actual voice assistant response with the test data corresponding to the expected device response to determine whether they match. Test result data may be captured and stored in database 122. Test result data may include data corresponding to the voice assistant response captured from voice assistant device 120. Test result data may also include the vocoder parameters used to modulate the user utterance in the test data and the number of units for each vocoder parameter.
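The comparison between the transcribed response and the expected response can be illustrated with a simple text-similarity check. The sketch below normalizes both strings and applies a ratio threshold using Python's standard `difflib.SequenceMatcher`. The threshold of 0.8 and the function names are assumptions; the patent does not say how "substantially similar" is computed.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def is_substantially_similar(actual: str, expected: str, threshold: float = 0.8) -> bool:
    """Return True when the normalized texts are at least `threshold` similar."""
    ratio = SequenceMatcher(None, normalize(actual), normalize(expected)).ratio()
    return ratio >= threshold
```

A threshold below 1.0 tolerates small transcription differences such as punctuation or a dropped word, which speech-to-text output commonly introduces.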

The number of units of a vocoder parameter corresponds to the quantifiable units of that parameter. For example, vocoder parameters may include number of bands, frequency range, envelope, unvoiced elements, formant shift, bandwidth, and the like. Each parameter may be quantified by the number of units applied to it. For example, the number of bands determines the total number of filter bands applied to the user utterance as an audio signal. A lower number of bands (e.g., in the range of 4 to 6) gives a more vintage sound, while more than 16 bands gives more detail, often at the expense of processor utilization. The frequency range vocoder parameter can be set with an upper limit and a lower limit, and the range can be divided among the available bands. The frequency range vocoder parameter may be used to adjust the filter bands of vocoder 130 to accommodate the higher or lower registers of female and male voices, respectively. The envelope vocoder parameter determines how quickly the effect reacts to dynamic volume changes. When applied to vocals or user utterances, the envelope vocoder parameter provides a fairly fast reaction time, while a longer reaction time may be used to achieve a more striking effect.
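
The relationship between the frequency range and the number of bands can be illustrated with a minimal sketch. The `VocoderParams` container, its field names and units, and the equal-width (linear) split are assumptions made for illustration only; the description does not specify how the range is divided.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VocoderParams:
    """Hypothetical container for the parameters named in the description."""
    num_bands: int        # total number of filter bands
    low_hz: float         # lower limit of the frequency range
    high_hz: float        # upper limit of the frequency range
    envelope_ms: float    # envelope reaction time (unit assumed)
    formant_shift: float  # formant shift (unit assumed, e.g. semitones)


def band_edges(params: VocoderParams) -> list[tuple[float, float]]:
    """Split [low_hz, high_hz] into num_bands equal-width bands (assumed linear split)."""
    if params.num_bands < 1 or params.high_hz <= params.low_hz:
        raise ValueError("need at least one band and high_hz > low_hz")
    width = (params.high_hz - params.low_hz) / params.num_bands
    return [
        (params.low_hz + i * width, params.low_hz + (i + 1) * width)
        for i in range(params.num_bands)
    ]
```

A real vocoder would more likely space its bands logarithmically; the linear split is used here only because it makes the arithmetic easy to follow.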

Additional vocoder parameters may be associated with frequency filters (e.g., high-pass, low-pass) that may help improve intelligibility for particular speech elements. For example, a high-pass filter vocoder parameter can improve intelligibility when plosives (e.g., the letters t, d, b) and sibilants (e.g., s, z, x) are processed by vocoder 130, by allowing these unvoiced elements to be better detected and identified.
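
As a rough illustration of what a high-pass filter parameter controls, the following sketch implements a generic first-order (RC-style) high-pass filter over a list of samples. It is not the filter used by vocoder 130, whose design is not described; the function name and arguments are hypothetical.

```python
import math
from typing import List, Sequence


def high_pass(samples: Sequence[float], cutoff_hz: float, sample_rate_hz: float) -> List[float]:
    """First-order high-pass filter: attenuates content below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate_hz               # time between samples
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Keep changes between samples, let constant (low-frequency) content decay.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out
```

A constant input decays toward zero while a rapidly alternating input passes largely intact, which is why such a filter favours the high-frequency energy of sibilants and plosives.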

Machine learning ("ML") model 150 may include one or more processors configured to receive test result data, process the received test result data, and generate output data corresponding to the received test result data. For example, ML model 150 may be configured to receive test result data from database 122, process the test result data to generate modified vocoder parameters, and update the test data based on the edge cases (i.e., the received test data). In other words, ML model 150 may be configured to fine-tune the vocoder parameters based on the test result data, generate new test data, and further test voice assistant device 120 to identify more edge cases. As more edge cases are identified, ML model 150 may become better configured to generate fine-tuned vocoder parameters for modulating user utterances, so that the resulting modified user utterances yield the expected device response from voice assistant device 120.

Additionally, in another embodiment, ML model 150 may be configured to apply chaos engineering principles to test for resilience, in which ML model 150 compares test results to identify edge cases and fine-tunes the vocoder parameters to identify more edge cases within preset control limits. For example, if the actual voice assistant response does not match the expected device response, ML model 150 may determine that the mismatch constitutes an edge case and may identify the modified test data as an edge case. Once an edge case is identified, ML model 150 may be configured to modify the first vocoder parameters used in the edge case to generate second vocoder parameters, which vocoder 130 applies to the user utterance to generate another modified user utterance for additional testing on voice assistant device 120. ML model 150 may be configured to continue the vocoder parameter fine-tuning process until enough edge cases have been identified to satisfy the control limits of the testing process. Once edge cases are identified, ML model 150 may be configured to modify or fine-tune the vocoder parameters corresponding to the identified edge cases. The modified or fine-tuned vocoder parameters may then be applied to the user utterance to generate a modified user utterance for presentation to VA device 120 for further testing. Testing of VA device 120 may continue through a number of iterations sufficient to satisfy the conditions associated with a successful test plan. For example, testing of VA device 120 may continue until the device response substantially matches the expected device response corresponding to the user utterance that was processed by vocoder 130 and presented to VA device 120 as a modified user utterance.
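
The fine-tuning loop described above can be sketched as a bounded random search. This is only a stand-in under stated assumptions: a seeded random perturbation replaces ML model 150, `device_matches` is a hypothetical callback that reports whether the device response matched the expected response for a given parameter set, and `limits` plays the role of the preset control limits.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Limits = Dict[str, Tuple[float, float]]


def perturb(params: Params, limits: Limits, rng: random.Random, step: float = 0.1) -> Params:
    """Nudge every parameter by up to +/- step of its allowed span, clamped to the limits."""
    nudged = {}
    for name, value in params.items():
        low, high = limits[name]
        delta = rng.uniform(-step, step) * (high - low)
        nudged[name] = min(high, max(low, value + delta))
    return nudged


def search_edge_cases(
    start: Params,
    limits: Limits,
    device_matches: Callable[[Params], bool],
    target: int,
    max_iterations: int,
    seed: int = 0,
) -> List[Params]:
    """Collect parameter sets for which the device response did not match."""
    rng = random.Random(seed)
    edge_cases: List[Params] = []
    current = dict(start)
    for _ in range(max_iterations):
        if len(edge_cases) >= target:
            break
        if not device_matches(current):
            edge_cases.append(current)  # mismatch: record this set as an edge case
        current = perturb(current, limits, rng)  # derive the next parameter set
    return edge_cases
```

In a fuller system the perturbation step would be replaced by whatever the trained model proposes from the accumulated test result data.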

The condition may be satisfied once sufficient test iterations have been completed. For example, the condition may correspond to a determination that the number of edge cases exceeds, or falls below, a predetermined threshold. For example, after three iterations of fine-tuning the vocoder parameters identified in an edge case, the one or more processors may be configured to signal the end of a stage of the test plan and stop sending test data to vocoder 130. The number of test iterations may be based on VA device processing time, which is the amount of time VA device 120 takes to process the user utterance and the modified user utterances. The number of test iterations may also be based on the number of identified edge cases, such that identifying a predetermined number of edge cases satisfies the condition to stop sending test data to vocoder 130. A tester may also manually stop the process of providing test data to VA device 120.
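
The stopping conditions listed above can be combined into a single check, sketched here. The `StopPolicy` name and its default values are hypothetical, apart from the three-iteration example taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopPolicy:
    """Hypothetical thresholds; only the 3-iteration example comes from the text."""
    max_iterations: int = 3
    max_edge_cases: int = 10
    max_device_seconds: float = 60.0


def should_stop(
    policy: StopPolicy,
    iterations_done: int,
    edge_cases_found: int,
    device_seconds: float,
    manual_stop: bool = False,
) -> bool:
    """Return True when any configured stopping condition is met."""
    return (
        manual_stop
        or iterations_done >= policy.max_iterations
        or edge_cases_found >= policy.max_edge_cases
        or device_seconds >= policy.max_device_seconds
    )
```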

In at least some embodiments, ML model 150 may implement a trained component or trained model configured to perform the processes described above with respect to ML model 150. The trained component may include one or more machine learning models, including but not limited to one or more classifiers, one or more neural networks, one or more probabilistic graphs, one or more decision trees, and the like. In other embodiments, the trained component may include a rule-based engine, one or more statistics-based algorithms, one or more mapping functions, or other types of functions or algorithms for determining whether a natural language input is a complex or a non-complex natural language input. In some embodiments, the trained component may be configured to perform binary classification, in which the natural language input is classified into one of two classes or categories. In some embodiments, the trained component may be configured to perform multiclass or multinomial classification, in which the natural language input is classified into one of three or more classes or categories. In some embodiments, the trained component may be configured to perform multi-label classification, in which the natural language input may be associated with more than one class or category. The one or more processors may include ML model 150, as described herein.

Various machine learning techniques may be used to train and operate trained components to perform the various processes described herein. Models may be trained and operated according to various machine learning techniques. Such techniques may include, for example, neural networks (such as deep neural networks and/or recurrent neural networks), inference engines, trained classifiers, and the like. Examples of trained classifiers include support vector machines (SVMs), neural networks, decision trees, AdaBoost (short for "Adaptive Boosting") combined with decision trees, and random forests. Taking the SVM as an example, an SVM is a supervised learning model with associated learning algorithms that analyze data and recognize patterns in the data, and it is commonly used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. More complex SVM models may be built with a training set that identifies two or more categories, with the SVM determining which category is most similar to the input data. An SVM model may be mapped so that the examples of the separate categories are divided by a clear gap. New examples are then mapped into the same space and predicted to belong to a category based on which side of the gap they fall on. A classifier may issue a "score" indicating which category the data most closely matches. The score provides an indication of how closely the data matches the category.
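
The idea of assigning a category by "which side of the gap" an example falls on, together with a score, can be shown with a minimal linear decision function. The weights and bias below are hypothetical placeholders rather than the output of any training procedure, and the label names are assumptions.

```python
from typing import Sequence, Tuple


def decision_score(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Signed score w.x + b: the sign gives the side, the magnitude the margin."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(
    weights: Sequence[float],
    bias: float,
    features: Sequence[float],
    labels: Tuple[str, str] = ("edge_case", "pass"),
) -> Tuple[str, float]:
    """Return the predicted label and its score for one feature vector."""
    score = decision_score(weights, bias, features)
    return (labels[1] if score >= 0 else labels[0]), score
```

For example, with weights `[1.0, -1.0]` and bias `0.0`, the feature vector `[2.0, 0.5]` scores `1.5` and lands on the "pass" side, while `[0.0, 1.0]` scores `-1.0` and lands on the "edge_case" side.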

In order to apply machine learning techniques, the machine learning processes themselves need to be trained. Training a machine learning component requires establishing a "ground truth" for the training examples. In machine learning, the term "ground truth" refers to the accuracy of a training set's classification for supervised learning techniques. Various techniques may be used to train the models, including backpropagation, statistical learning, supervised learning, semi-supervised learning, stochastic learning, or other known techniques.

A computer-implemented method for testing voice assistant device 120 is depicted and described in further detail with respect to FIG. 2.

FIG. 2 is a flowchart illustrating the operational steps of a computer-implemented method 200 for testing a voice assistant device within the distributed data processing environment of FIG. 1, according to an embodiment of the present invention.

In one embodiment, computer-implemented method 200 for testing a voice assistant device may be performed by one or more processors configured to receive 202 test data from a database (e.g., database 122), where the test data may include at least a first set of encoding parameters and a first user utterance having an expected device response. For example, the one or more processors may include vocoder 130 configured to receive 202 the test data from database 122 via a communication link between database 122 and vocoder 130. Vocoder 130 may include a data port configured to receive data transmissions and relay the data to a processor resident in vocoder 130 for further processing. The first set of encoding parameters may be used to configure the settings of vocoder 130 and may be used to modulate or modify the first user utterance.

Computer-implemented method 200 for testing a voice assistant device may further include one or more processors configured to generate 204 a first modified user utterance by applying the first set of encoding parameters to the first user utterance, where the first modified user utterance is acoustically different from the first user utterance. For example, the one or more processors may include vocoder 130 configured to generate 204 the first modified user utterance by applying the first set of encoding parameters to the first user utterance, as described above. The first modified user utterance may be acoustically different from the first user utterance at least because vocoder 130 applies the first set of encoding parameters to the first user utterance. Accordingly, the first modified user utterance may sound different due to one or more of the sound characteristics modified by vocoder 130.

Computer-implemented method 200 for testing a voice assistant device may further include one or more processors configured to audibly present 206 the first modified user utterance to voice assistant device 120. For example, vocoder 130 may include a speaker configured to output an audio signal that includes the first modified user utterance for detection by voice assistant device 120. Voice assistant device 120 may include a microphone configured to detect and receive the audio signal as sound waves within proximity of the speaker of vocoder 130.

Computer-implemented method 200 for testing a voice assistant device may further include one or more processors configured to receive 208 a first voice assistant response from voice assistant device 120. The one or more processors may be configured to receive the first voice assistant response via speech-to-text module 140. Speech-to-text module 140 may include a microphone to detect and receive the first voice assistant response as an audio input signal, and may convert the audio input signal to text data and send the text data to the one or more processors for further processing. For example, the microphone may be placed within range of the speaker of voice assistant device 120 so that the microphone can detect the audio signal output from that speaker. The microphone may be configured to send the detected audio signal to the one or more processors as data corresponding to the voice assistant response.

Computer-implemented method 200 for testing a voice assistant device may further include one or more processors configured to determine 210 whether the first voice assistant response is substantially similar to the expected device response. In one embodiment, the one or more processors may include ML model 150 for determining 210 whether the first voice assistant response is substantially similar to the expected device response. Further, the one or more processors may be configured to determine a first score based on a comparison between the first voice assistant response and the expected device response. For example, if the first voice assistant response substantially matches the expected device response, the first score may be 1. As another example, if the first voice assistant response does not substantially match the expected device response, the first score may be 0. If the first score is 0, the one or more processors may determine that an edge case, including the first test data and the voice assistant response, has been identified.
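
One way to turn "substantially similar" into the 1-or-0 score described above is a thresholded text similarity, sketched here with Python's standard `difflib.SequenceMatcher`. The normalization step and the `0.9` threshold are assumptions; the description does not define how similarity is measured.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def match_score(actual: str, expected: str, threshold: float = 0.9) -> int:
    """Return 1 if the transcribed response is close enough to the expected one, else 0."""
    ratio = SequenceMatcher(None, normalize(actual), normalize(expected)).ratio()
    return 1 if ratio >= threshold else 0
```

A score of 0 would mark the test data and the captured response as a candidate edge case.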

In another embodiment, in response to determining that the first voice assistant response is substantially similar to the expected device response, computer-implemented method 200 for testing a voice assistant device may further include one or more processors configured to stop receiving additional test data. Further, in response to determining that the first voice assistant response is not substantially similar to the expected device response, the one or more processors may be configured to identify a first edge case based at least on the first set of encoding parameters and the first modified user utterance. The one or more processors may be further configured to determine a second set of encoding parameters based on at least one of the first set of encoding parameters, the first modified user utterance, and the first edge case.

Additionally, computer-implemented method 200 for testing a voice assistant device may further include one or more processors configured to generate a second modified user utterance by applying the second set of encoding parameters to the first user utterance, where the second modified user utterance may be acoustically different from both the first user utterance and the first modified user utterance. The one or more processors may be further configured to audibly present the second modified user utterance to voice assistant device 120, receive a second voice assistant response from voice assistant device 120, and determine whether the second voice assistant response is substantially similar to the expected device response. Further, in response to determining that the second voice assistant response is substantially similar to the expected device response, the one or more processors may be configured to stop receiving additional test data from database 122. For example, once the one or more processors determine that a voice assistant response is substantially similar to the expected device response corresponding to the first user utterance, the one or more processors may be configured to stop receiving test data that includes the first user utterance. However, the one or more processors may be configured to continue receiving test data that includes additional user utterances different from the first user utterance in order to continue the test plan.
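
The sequence of steps 202 through 210, including the move from the first to the second set of encoding parameters, can be sketched for a single utterance as follows. Every name here is hypothetical: `vocode`, `present_and_listen`, `score`, and `next_params` are injected callables standing in for vocoder 130, the speaker and microphone path, the comparison step, and ML model 150, respectively.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Params = Dict[str, float]


@dataclass
class UtteranceCase:
    utterance: str           # first user utterance
    expected_response: str   # expected device response
    params: Params           # first set of encoding parameters


@dataclass
class RoundResult:
    params: Params
    response: str
    score: int               # 1 = substantially similar, 0 = edge case


def run_rounds(
    case: UtteranceCase,
    vocode: Callable[[str, Params], Any],
    present_and_listen: Callable[[Any], str],
    score: Callable[[str, str], int],
    next_params: Callable[[Params], Params],
    max_rounds: int = 3,
) -> List[RoundResult]:
    """Repeat generate/present/receive/compare until a match or max_rounds."""
    results: List[RoundResult] = []
    params = case.params                                # 202: from the received test data
    for _ in range(max_rounds):
        modified = vocode(case.utterance, params)       # 204: modified user utterance
        response = present_and_listen(modified)         # 206 and 208
        s = score(response, case.expected_response)     # 210
        results.append(RoundResult(params, response, s))
        if s == 1:
            break                                       # stop testing this utterance
        params = next_params(params)                    # derive the next parameter set
    return results
```

Injecting the hardware-facing steps as callables keeps the control flow testable without a physical speaker, microphone, or device.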

In one embodiment, the first modified user utterance may be audibly presented to voice assistant device 120 by a speaker communicatively connected to the one or more processors (e.g., vocoder 130), where the speaker is configured to output the modified user utterance as an audio signal.

In one embodiment, the first voice assistant response may be received as an audio signal from voice assistant device 120 by a microphone communicatively connected to the one or more processors (e.g., speech-to-text module 140).

FIG. 3 is a block diagram of components of a server computer within distributed data processing environment 100 of FIG. 1, according to an embodiment of the present invention. It should be understood that FIG. 3 provides only an illustration of one implementation and does not imply any limitation as to the environments in which different embodiments may be implemented. Many modifications to the depicted environment may be made.

FIG. 3 shows a block diagram of a computer 300 suitable for voice assistant device 120, according to an exemplary embodiment of the present invention.

Computer 300 includes communications fabric 302, which provides communications between cache 316, memory 306, persistent storage 308, communications unit 310, and input/output (I/O) interface 312. Communications fabric 302 can be implemented with any architecture designed for passing data and/or control information between processors (such as microprocessors, communications processors, and network processors), system memory, peripheral devices, and any other hardware components within a system. For example, communications fabric 302 can be implemented with one or more buses or crossbar switches.

Memory 306 and persistent storage 308 are computer-readable storage media. In this embodiment, memory 306 includes RAM. In general, memory 306 can include any suitable volatile or non-volatile computer-readable storage media. Cache 316 is a fast memory that enhances the performance of computer processor 304 by holding recently accessed data, and data near recently accessed data, from memory 306.

Software and data 314 may be stored in persistent storage 308 and in memory 306 for execution and/or access by one or more of the respective computer processors 304 via cache 316. In one embodiment, persistent storage 308 includes a magnetic hard disk drive. Alternatively, or in addition to a magnetic hard disk drive, persistent storage 308 can include a solid state hard drive, a semiconductor storage device, ROM, erasable programmable ROM (EPROM), flash memory, or any other computer-readable storage media capable of storing program instructions or digital information.

The media used by persistent storage 308 may also be removable. For example, a removable hard drive may be used for persistent storage 308. Other examples include optical disks, magnetic disks, thumb drives, and smart cards that are inserted into a drive for transfer onto another computer-readable storage medium that is also part of persistent storage 308.

Communications unit 310, in these examples, provides for communications with other data processing systems or devices. In these examples, communications unit 310 includes one or more network interface cards. Communications unit 310 may provide communications through the use of either or both physical and wireless communications links. Software and data 314 may be downloaded to persistent storage 308 through communications unit 310.

I/O interface 312 allows for input and output of data with other devices that may be connected to voice assistant device 120. For example, I/O interface 312 may provide a connection to external devices 318 such as a keyboard, a keypad, a touch screen, and/or some other suitable input device. External devices 318 can also include portable computer-readable storage media such as, for example, thumb drives, portable optical disks, portable magnetic disks, and memory cards. Software and data 314 used to practice embodiments of the present invention can be stored on such portable computer-readable storage media and can be loaded onto persistent storage 308 via I/O interface 312. I/O interface 312 also connects to display 320.

Display 320 provides a mechanism to display data to a user and may be, for example, a computer monitor.

The present invention may involve various accessible data sources, such as database 122, that may include personal data, content, or information the user wishes not to be processed. Personal data includes personally identifying information or sensitive personal information, as well as user information such as tracking or geolocation information. Processing refers to any operation or set of operations, automated or non-automated, such as collection, recording, organization, structuring, storage, adaptation, alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, combination, restriction, erasure, or destruction performed on personal data. Software and data 314 may enable the authorized and secure processing of personal data. Software and data 314 may be configured to provide informed consent, with notice of the collection of personal data, allowing the user to opt in or opt out of the processing of personal data. Consent can take several forms. Opt-in consent can require the user to take an affirmative action before personal data is processed. Alternatively, opt-out consent can require the user to take an affirmative action to prevent the processing of personal data before personal data is processed. Software and data 314 may provide information regarding the personal data and the nature of the processing (e.g., type, scope, purpose, duration, etc.). Software and data 314 provides the user with copies of stored personal data. Software and data 314 allows the correction or completion of incorrect or incomplete personal data. Software and data 314 allows the immediate deletion of personal data.

The computer-implemented methods described herein are identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be understood that any particular program nomenclature herein is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

The present invention may be a system, a computer-implemented method, and/or a computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions stored thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of these. More specific examples of the computer readable storage medium include a portable computer diskette, a hard disk, RAM, ROM, EPROM (or flash memory), SRAM, CD-ROM, DVD, a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of these. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium, or to an external computer or external storage device via a network (for example, the Internet, a local area network, a wide area network, and/or a wireless network). The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, a special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the invention. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

A computer-implemented method for testing a voice assistant device, the computer-implemented method comprising:
receiving, by one or more processors, test data from a database, the test data including a first set of encoding parameters and a first user utterance having an expected device response;
generating, by the one or more processors, a first modified user utterance by applying the first set of encoding parameters to the first user utterance, wherein the first modified user utterance is acoustically different from the first user utterance;
audibly presenting, by the one or more processors, the first modified user utterance to a voice assistant device;
receiving, by the one or more processors, a first voice assistant response from the voice assistant device; and
determining, by the one or more processors, whether the first voice assistant response is substantially similar to the expected device response.
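
The steps recited in the method claim above can be sketched as follows. This is an illustrative sketch only, not the claimed implementation: the encoding parameters `gain` and `noise_level`, the helper names, the text-based comparison with `difflib.SequenceMatcher`, and the 0.8 threshold are all assumptions, since the claim does not specify which parameters are used or how "substantially similar" is measured. The device is stood in for by a callable that takes audio samples and returns the response as text.

```python
import random
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class VoiceTestData:
    utterance: List[float]   # audio samples of the first user utterance
    expected_response: str   # expected device response, as text
    gain: float              # hypothetical encoding parameter
    noise_level: float       # hypothetical encoding parameter


def apply_encoding(samples: List[float], gain: float,
                   noise_level: float, seed: int = 0) -> List[float]:
    """Scale the samples and add uniform noise to get a modified utterance."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [gain * s + rng.uniform(-noise_level, noise_level) for s in samples]


def substantially_similar(actual: str, expected: str,
                          threshold: float = 0.8) -> bool:
    """Assumed similarity measure: case-insensitive text ratio vs. a threshold."""
    ratio = SequenceMatcher(None, actual.lower(), expected.lower()).ratio()
    return ratio >= threshold


def run_test(case: VoiceTestData,
             device: Callable[[List[float]], str]) -> bool:
    """Generate the modified utterance, present it, and compare the response."""
    modified = apply_encoding(case.utterance, case.gain, case.noise_level)
    response = device(modified)  # stands in for speaker playback and capture
    return substantially_similar(response, case.expected_response)
```

In a real test rig, `device` would wrap the playback of the modified utterance and the capture and transcription of the spoken response; here it is a plain function so that the control flow can be exercised in isolation.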

The computer-implemented method of claim 1, further comprising:
responsive to determining that the first voice assistant response is substantially similar to the expected device response, ceasing, by the one or more processors, to receive additional test data.

The computer-implemented method of claim 1, further comprising:
responsive to determining that the first voice assistant response is not substantially similar to the expected device response, identifying, by the one or more processors, a first edge case based on at least the first set of encoding parameters and the first modified user utterance.

The computer-implemented method of claim 3, further comprising:
determining, by the one or more processors, a second set of encoding parameters based on at least one of the first set of encoding parameters, the first modified user utterance, and the first edge case.

The computer-implemented method of claim 4, further comprising:
generating, by the one or more processors, a second modified user utterance by applying the second set of encoding parameters to the first user utterance, wherein the second modified user utterance is acoustically different from the first user utterance and the first modified user utterance;
audibly presenting, by the one or more processors, the second modified user utterance to the voice assistant device;
receiving, by the one or more processors, a second voice assistant response;
determining, by the one or more processors, whether the second voice assistant response is substantially similar to the expected device response; and
responsive to determining that the second voice assistant response is substantially similar to the expected device response, ceasing to receive additional test data including the first user utterance.
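
The dependent method claims above describe an iterative flow: a dissimilar response yields an edge case, the edge case informs a second set of encoding parameters, and testing of the utterance stops once a response is substantially similar. A minimal sketch of that loop follows. The rule in `next_params` (halving a `noise_level` parameter) is purely hypothetical and stands in for whatever technique determines the next parameter set; `EdgeCase`, `search`, and `max_rounds` are likewise illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]


@dataclass
class EdgeCase:
    params: Params                    # encoding parameters that caused the miss
    modified_utterance: List[float]   # the modified utterance that was presented


def next_params(params: Params) -> Params:
    """Hypothetical rule for the next parameter set: halve the noise level."""
    return {**params, "noise_level": params["noise_level"] / 2}


def search(utterance: List[float], params: Params,
           modify: Callable[[List[float], Params], List[float]],
           device: Callable[[List[float]], str],
           is_similar: Callable[[str], bool],
           max_rounds: int = 5) -> Tuple[List[EdgeCase], Optional[Params]]:
    """Retry with new parameter sets until the response is similar or rounds run out."""
    edge_cases: List[EdgeCase] = []
    for _ in range(max_rounds):
        modified = modify(utterance, params)
        if is_similar(device(modified)):
            # Similar response: stop testing this utterance.
            return edge_cases, params
        # Dissimilar response: record an edge case and derive new parameters.
        edge_cases.append(EdgeCase(params, modified))
        params = next_params(params)
    return edge_cases, None  # no parameter set passed within max_rounds
```

The function returns every edge case it recorded together with the first parameter set that produced a similar response, or `None` if none did within `max_rounds`.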

The computer-implemented method of claim 1, wherein the first modified user utterance is audibly presented to the voice assistant device by a speaker communicatively connected to the one or more processors, the speaker being configured to output the first modified user utterance as a first audio signal.

The computer-implemented method of claim 1, wherein the first voice assistant response received from the voice assistant device is detected as a second audio signal by a microphone communicatively connected to the one or more processors.

A computer program product for testing a voice assistant device, the computer program product comprising:
one or more computer-readable storage media and program instructions collectively stored on the one or more computer-readable storage media, the stored program instructions comprising:
program instructions to receive test data from a database, the test data including a first set of encoding parameters and a first user utterance having an expected device response;
program instructions to generate a first modified user utterance by applying the first set of encoding parameters to the first user utterance, wherein the first modified user utterance is acoustically different from the first user utterance;
program instructions to audibly present the first modified user utterance to a voice assistant device;
program instructions to receive a first voice assistant response from the voice assistant device; and
program instructions to determine whether the first voice assistant response is substantially similar to the expected device response.

The computer program product of claim 8, further comprising:
program instructions to cease receiving additional test data responsive to determining that the first voice assistant response is substantially similar to the expected device response.

The computer program product of claim 8, further comprising:
program instructions to identify, responsive to determining that the first voice assistant response is not substantially similar to the expected device response, a first edge case based on at least the first set of encoding parameters and the first modified user utterance.

The computer program product of claim 10, further comprising:
program instructions to determine a second set of encoding parameters based on at least one of the first set of encoding parameters, the first modified user utterance, and the first edge case.

The computer program product of claim 11, further comprising:
program instructions to generate a second modified user utterance by applying the second set of encoding parameters to the first user utterance, wherein the second modified user utterance is acoustically different from the first user utterance and the first modified user utterance;
program instructions to audibly present the second modified user utterance to the voice assistant device;
program instructions to receive a second voice assistant response;
program instructions to determine whether the second voice assistant response is substantially similar to the expected device response; and
program instructions to cease receiving additional test data including the first user utterance responsive to determining that the second voice assistant response is substantially similar to the expected device response.

The computer program product of claim 8, wherein the first modified user utterance is audibly presented to the voice assistant device by a speaker communicatively connected to the one or more processors, the speaker being configured to output the first modified user utterance as a first audio signal.

The computer program product of claim 8, wherein the first voice assistant response received from the voice assistant device is detected as a second audio signal by a microphone communicatively connected to the one or more processors.

A computer system for testing a voice assistant device, the computer system comprising:
one or more computer processors;
one or more computer-readable storage media; and
program instructions collectively stored on the one or more computer-readable storage media for execution by at least one of the one or more computer processors, the stored program instructions comprising:
program instructions to receive test data from a database, the test data including a first set of encoding parameters and a first user utterance having an expected device response;
program instructions to generate a first modified user utterance by applying the first set of encoding parameters to the first user utterance, wherein the first modified user utterance is acoustically different from the first user utterance;
program instructions to audibly present the first modified user utterance to a voice assistant device;
program instructions to receive a first voice assistant response from the voice assistant device; and
program instructions to determine whether the first voice assistant response is substantially similar to the expected device response.

The computer system of claim 15, further comprising:
program instructions to cease receiving additional test data responsive to determining that the first voice assistant response is substantially similar to the expected device response.

The computer system of claim 15, further comprising:
program instructions to identify, responsive to determining that the first voice assistant response is not substantially similar to the expected device response, a first edge case based on at least the first set of encoding parameters and the first modified user utterance.

The computer system of claim 17, further comprising:
program instructions to determine a second set of encoding parameters based on at least one of the first set of encoding parameters, the first modified user utterance, and the first edge case.

The computer system of claim 18, further comprising:
program instructions to generate a second modified user utterance by applying the second set of encoding parameters to the first user utterance, wherein the second modified user utterance is acoustically different from the first user utterance and the first modified user utterance;
program instructions to audibly present the second modified user utterance to the voice assistant device;
program instructions to receive a second voice assistant response;
program instructions to determine whether the second voice assistant response is substantially similar to the expected device response; and
program instructions to cease receiving additional test data including the first user utterance responsive to determining that the second voice assistant response is substantially similar to the expected device response.

The computer system of claim 15, wherein:
the first modified user utterance is audibly presented to the voice assistant device by a speaker communicatively connected to the one or more processors, the speaker being configured to output the first modified user utterance as a first audio signal; and
the first voice assistant response received from the voice assistant device is detected as a second audio signal by a microphone communicatively connected to the one or more processors.