JP2019204112A5

Info

Publication number: JP2019204112A5
Application number: JP2019137200A
Authority: JP
Filing date: 2019-07-25
Publication date: 2020-11-12
Anticipated expiration: 2035-04-10

Claims

A voice control method, applied to a terminal that includes a voice wakeup device and a voice recognition device, the method comprising:
listening, by the voice wakeup device, to first voice information in the surrounding environment, wherein the first voice information includes wakeup information and a first part of a command word, and the wakeup information is used to enable the voice recognition device;
enabling, by the voice wakeup device, the voice recognition device according to the wakeup information;
listening, by the voice recognition device, to second voice information, wherein the second voice information includes a second part of the command word; and
obtaining, by the voice recognition device, voice instruction information according to the first voice information and the second voice information, wherein the voice instruction information matches the command word, and the command word includes the first part of the command word and the second part of the command word.
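The two-stage flow of this claim can be illustrated with a minimal sketch. It works on text rather than audio, and the wake phrase, the command table, and the names `Recognizer` and `wake_up_device` are all hypothetical; the claim does not prescribe any of them. The point it shows is that the part of the command word captured before the recognizer is enabled is kept and joined with the part heard afterwards.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WAKE_WORD = "hello phone"            # hypothetical wake phrase (assumption)
COMMANDS = {"call mom": "DIAL_MOM"}  # hypothetical command-word table (assumption)


@dataclass
class Recognizer:
    """Stands in for the voice recognition device; disabled until woken."""
    enabled: bool = False
    heard: List[str] = field(default_factory=list)

    def enable(self, buffered_first_part: str) -> None:
        # Take over the first part of the command word captured by the wakeup device.
        self.enabled = True
        self.heard = [buffered_first_part] if buffered_first_part else []

    def listen(self, second_voice_info: str) -> None:
        # Second voice information is only heard once the recognizer is enabled.
        if self.enabled:
            self.heard.append(second_voice_info.lower().strip())

    def instruction(self) -> Optional[str]:
        # Join the first and second parts, then look up the full command word.
        command_word = " ".join(" ".join(self.heard).split())
        return COMMANDS.get(command_word)


def wake_up_device(first_voice_info: str, recognizer: Recognizer) -> bool:
    """Enable the recognizer if the first voice information starts with the wake phrase."""
    text = first_voice_info.lower().strip()
    if not text.startswith(WAKE_WORD):
        return False
    first_part = text[len(WAKE_WORD):].strip()
    recognizer.enable(first_part)
    return True


rec = Recognizer()
wake_up_device("Hello phone call", rec)  # wakeup information + first part of the command word
rec.listen("mom")                        # second part of the command word
print(rec.instruction())                 # prints "DIAL_MOM"
```

Without the buffered first part, the recognizer would see only "mom" and no command word would match, which is the situation the claim is written to avoid.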

The method according to claim 1, wherein the step of enabling, by the voice wakeup device, the voice recognition device according to the wakeup information comprises:
generating, by the voice wakeup device, a trigger signal for enabling the voice recognition device when it determines that the wakeup information matches a voice wakeup model.

The method according to claim 2, wherein determining that the wakeup information matches the voice wakeup model comprises:
determining that the wakeup information matches the voice wakeup model when the wakeup information matches predetermined wakeup voice information.

The method according to claim 2, wherein determining that the wakeup information matches the voice wakeup model comprises:
extracting a voiceprint feature from the wakeup information when the wakeup information matches predetermined wakeup voice information, and determining that the wakeup information matches the voice wakeup model when the extracted voiceprint feature matches a predetermined voiceprint feature.
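The two-stage check in this claim (phrase first, voiceprint second) can be sketched as follows. This is an illustration under stated assumptions, not the patented matcher: the voiceprint is represented as a plain feature vector, cosine similarity is used as the comparison, and the `threshold` value and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches_wakeup_model(
    phrase: str,
    features: Sequence[float],
    enrolled_phrase: str,
    enrolled_features: Sequence[float],
    threshold: float = 0.9,  # hypothetical similarity threshold (assumption)
) -> bool:
    # Stage 1: the wakeup information must match the predetermined wakeup phrase.
    if phrase.strip().lower() != enrolled_phrase:
        return False
    # Stage 2: only then compare the extracted voiceprint with the enrolled one.
    return cosine_similarity(features, enrolled_features) >= threshold


print(matches_wakeup_model("Hello Phone", [1.0, 2.0, 3.0], "hello phone", [1.0, 2.0, 3.1]))
```

Running the voiceprint comparison only after the phrase matches keeps the cheaper test in front, which fits a wakeup device that is always listening.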

The method according to claim 4, wherein the voiceprint feature includes one or more of the following features: a pitch contour, linear prediction coefficients, spectral envelope parameters, a harmonic energy ratio, a resonance peak frequency and its bandwidth, a cepstrum, or Mel-frequency cepstral coefficients.

The method according to claim 1, wherein the step of obtaining, by the voice recognition device, voice instruction information according to the first voice information and the second voice information comprises:
obtaining, by the voice recognition device, a recognition result according to the first voice information and the second voice information, wherein the recognition result includes command word information; and
obtaining, by the voice recognition device, the voice instruction information that matches the recognition result by matching the obtained recognition result against prestored voice instruction information.
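A minimal sketch of matching a recognition result against prestored voice instruction information follows. The claim does not say whether the match is exact or approximate; this sketch assumes an exact match after normalization with a fuzzy fallback via the standard-library `difflib.get_close_matches`. The table contents, the `cutoff` value, and the function name are hypothetical.

```python
import difflib
from typing import Dict, Optional

# Hypothetical prestored voice instruction information (assumption).
STORED_INSTRUCTIONS: Dict[str, str] = {
    "call mom": "DIAL_MOM",
    "play music": "START_PLAYER",
}


def match_instruction(recognition_result: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the stored instruction matching the recognition result, or None."""
    # Normalize case and whitespace before comparing.
    text = " ".join(recognition_result.lower().split())
    if text in STORED_INSTRUCTIONS:
        return STORED_INSTRUCTIONS[text]
    # Fallback: tolerate small recognition errors (an assumption, not from the claim).
    close = difflib.get_close_matches(text, list(STORED_INSTRUCTIONS), n=1, cutoff=cutoff)
    return STORED_INSTRUCTIONS[close[0]] if close else None


print(match_instruction("  Call   MOM "))  # prints "DIAL_MOM"
```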

The method according to any one of claims 1 to 6, wherein the wakeup information is listened to by the voice wakeup device within a first period, and the first part of the command word is listened to by the voice wakeup device within a second period; and
the second voice information is listened to by the voice recognition device within a third period.

The method according to any one of claims 1 to 6, wherein the step of listening, by the voice wakeup device, to first voice information in the surrounding environment comprises:
listening to the first voice information in the surrounding environment in a standby state; or
listening to the first voice information in the surrounding environment in a non-standby state; or
listening to the first voice information in the surrounding environment in a screen-locked state.

The method according to claim 2, further comprising:
sending, by the voice wakeup device, the trigger signal to the voice recognition device to enable the voice recognition device.

The method according to any one of claims 1 to 6, further comprising:
controlling, by the voice recognition device, execution of an operation corresponding to the matched voice instruction information.

The method according to any one of claims 1 to 6, wherein the voice recognition device automatically disables itself upon determining that no further voice information is received within a preset period after the voice recognition device is enabled.
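The automatic disabling described in this claim can be sketched with an inactivity timeout. The class name, the injected `clock`, and the choice to restart the countdown each time voice information arrives are assumptions made for illustration; the claim only requires that the recognizer be disabled when no voice information arrives again within the preset period.

```python
import time
from typing import Callable


class AutoDisablingRecognizer:
    """Hypothetical recognizer that disables itself after a period without voice input."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s   # the preset period, in seconds
        self._clock = clock          # injectable so the behaviour can be tested without sleeping
        self.enabled = False
        self._last_activity = 0.0

    def enable(self) -> None:
        self.enabled = True
        self._last_activity = self._clock()

    def tick(self) -> bool:
        """Call periodically; disables the recognizer once the preset period has elapsed."""
        if self.enabled and self._clock() - self._last_activity >= self.timeout_s:
            self.enabled = False
        return self.enabled

    def on_voice(self) -> bool:
        """Record incoming voice information; restarts the countdown if still enabled."""
        if self.tick():
            self._last_activity = self._clock()
        return self.enabled
```

A caller would typically invoke `tick()` from its audio loop or a timer and stop feeding audio to the recognizer once it returns `False`.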

The method according to any one of claims 1 to 6, wherein the voice wakeup device is a digital signal processor (DSP).

The method according to any one of claims 1 to 6, wherein the voice recognition device is an application processor (AP).

A terminal, comprising:
one or more processors; and
a memory storing instructions that, when executed by the one or more processors, cause the terminal to perform the method defined in any one of claims 1 to 13.

A non-transitory computer-readable medium having computer-usable instructions stored thereon for execution by a processor, wherein the instructions cause the processor to perform the method according to any one of claims 1 to 13.

A terminal, comprising a voice wakeup device and a voice recognition device, wherein:
the voice wakeup device is configured to listen to first voice information in the surrounding environment, wherein the first voice information includes wakeup information and a first part of a command word, and the wakeup information is used to enable the voice recognition device;
the voice wakeup device is configured to enable the voice recognition device according to the wakeup information;
the voice recognition device is configured to listen to second voice information, wherein the second voice information includes a second part of the command word; and
the voice recognition device is configured to obtain voice instruction information according to the first voice information and the second voice information, wherein the voice instruction information matches the command word, and the command word includes the first part of the command word and the second part of the command word.

The terminal according to claim 16, wherein the voice wakeup device is configured to determine that the wakeup information matches a voice wakeup model when the wakeup information matches predetermined wakeup voice information.

The terminal according to claim 16, wherein the voice wakeup device is configured to extract a voiceprint feature from the wakeup information when the wakeup information matches predetermined wakeup voice information, and to determine that the wakeup information matches a voice wakeup model when the extracted voiceprint feature matches a predetermined voiceprint feature.

The terminal according to claim 18, wherein the voiceprint feature includes one or more of the following features: a pitch contour, linear prediction coefficients, spectral envelope parameters, a harmonic energy ratio, a resonance peak frequency and its bandwidth, a cepstrum, or Mel-frequency cepstral coefficients.

The terminal according to claim 16, wherein the voice recognition device is configured to:
obtain a recognition result according to the first voice information and the second voice information, wherein the recognition result includes command word information; and
obtain the voice instruction information that matches the recognition result by matching the obtained recognition result against prestored voice instruction information.

The terminal according to any one of claims 16 to 20, wherein the wakeup information is listened to by the voice wakeup device within a first period, and the first part of the command word is listened to by the voice wakeup device within a second period; and
the second voice information is listened to by the voice recognition device within a third period.

The terminal according to any one of claims 16 to 20, wherein the voice wakeup device is configured to:
listen to the first voice information in the surrounding environment in a standby state; or
listen to the first voice information in the surrounding environment in a non-standby state; or
listen to the first voice information in the surrounding environment in a screen-locked state.

The terminal according to any one of claims 16 to 20, wherein the voice recognition device is configured to automatically disable itself upon determining that no further voice information is received within a preset period after the voice recognition device is enabled.

The terminal according to any one of claims 16 to 20, wherein the voice recognition device further includes an execution module;
the voice recognition device is further configured to send an execution instruction matching the voice instruction information to the execution module; and
the execution module is configured to execute an operation corresponding to the execution instruction.
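The hand-off from the voice recognition device to its execution module can be sketched as a small dispatch table. Everything named here (`ExecutionModule`, `INSTRUCTION_TO_EXECUTION`, `send_to_execution_module`, and the example instruction) is hypothetical and only illustrates the division of roles in this claim.

```python
from typing import Callable, Dict


class ExecutionModule:
    """Hypothetical execution module: maps execution instructions to operations."""

    def __init__(self) -> None:
        self._operations: Dict[str, Callable[[], str]] = {}

    def register(self, execution_instruction: str, operation: Callable[[], str]) -> None:
        self._operations[execution_instruction] = operation

    def execute(self, execution_instruction: str) -> str:
        # Run the operation that corresponds to the execution instruction, if any.
        operation = self._operations.get(execution_instruction)
        if operation is None:
            return "ignored: " + execution_instruction
        return operation()


# Hypothetical mapping from voice instruction information to execution instructions.
INSTRUCTION_TO_EXECUTION = {"call mom": "DIAL_MOM"}


def send_to_execution_module(voice_instruction: str, module: ExecutionModule) -> str:
    """Recognizer side: pick the matching execution instruction and send it on."""
    execution_instruction = INSTRUCTION_TO_EXECUTION.get(voice_instruction)
    if execution_instruction is None:
        return "no matching execution instruction"
    return module.execute(execution_instruction)


module = ExecutionModule()
module.register("DIAL_MOM", lambda: "dialing mom")
print(send_to_execution_module("call mom", module))  # prints "dialing mom"
```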

The terminal according to any one of claims 16 to 20, wherein the voice wakeup device is a digital signal processor (DSP), and
the voice recognition device is an application processor (AP).