JP2023040244A

JP2023040244A - Earphone, voice processing method, and voice processing program

Info

Publication number: JP2023040244A
Application number: JP2023003670A
Authority: JP
Inventors: 孝司大杉; Koji Osugi; 隆行荒川; Takayuki Arakawa; 昭彦杉山; Akihiko Sugiyama; 良次宮原; Ryoji Miyahara
Original assignee: NEC Platforms Ltd; NEC Corp
Current assignee: NEC Platforms Ltd; NEC Corp
Priority date: 2020-12-02
Filing date: 2023-01-13
Publication date: 2023-03-22
Also published as: JP7214704B2; JP2021047436A

Abstract

PROBLEM TO BE SOLVED: To provide an earphone, a voice processing method, and a voice processing program for inputting high-quality voice to an AI assistant by performing noise cancellation.

SOLUTION: A voice input/output device 200, which is a headphone, an earphone, or a headset, includes: an internal microphone 201 as a main voice acquisition part; an external microphone 202 as a noise acquisition part; a speaker 203 as a voice output part; and a voice processing part 290. The voice processing part 290 includes a noise cancellation part 204 and an echo cancellation part 205. The internal microphone 201 captures a mixed voice in which external noise, output voice 231, and main voice are mixed, and outputs a mixed voice signal 212. The external microphone 202 is arranged facing outward from the body of a user 270 and captures the external noise arriving from outside the user. A reception signal 240 received by a communication part 260 is converted into an output voice signal 232 and input to the speaker 203.

SELECTED DRAWING: Figure 2B

Description

The present invention relates to an earphone, a voice processing method, and a voice processing program.

In the above technical field, Patent Document 1 discloses a voice input/output device that outputs sound from a first speaker and a second speaker when the microphone unit is not in use and, when the microphone unit is in use, stops the sound output of the first speaker and outputs sound from the second speaker. Patent Document 2 discloses a technique for improving the S/N of a speech pickup signal: the S/N is first secured against environmental noise by the sound insulation performance of the housing of the wearing part, and noise in the internal space is then suppressed by NC processing.

Patent Document 1: JP 2015-61115 A
Patent Document 2: JP 2017-11754 A

However, in the technique described in Patent Document 1, no echo occurs, so there was no need to perform echo cancellation. In the technique described in Patent Document 2, environmental noise does not enter the internal microphone, so there was no need to cancel external noise from the audio signal input to the internal microphone. Consequently, a high-quality main audio signal could not be generated.

An object of the present invention is to provide a technique that solves the above problems.

To achieve the above object, an earphone according to the present invention has a noise canceling function and includes a microphone that acquires voice to be input to an AI assistant.

To achieve the above object, a voice processing method according to the present invention includes:
a voice acquisition step of acquiring voice to be input to an AI assistant; and
a noise canceling step of performing noise cancellation on the voice to be input to the AI assistant.

To achieve the above object, a voice processing program according to the present invention causes a computer to execute:
a voice acquisition step of acquiring voice to be input to an AI assistant; and
a noise canceling step of performing noise cancellation on the voice to be input to the AI assistant.

According to the present invention, noise cancellation can be performed so that high-quality voice is input to an AI assistant.

FIG. 1 is a block diagram showing the configuration of an audio input/output device according to a first embodiment of the present invention.
FIG. 2A is a diagram showing the configuration of an audio input/output device according to a second embodiment of the present invention.
FIG. 2B is a diagram showing the detailed configuration of the audio processing unit of the audio input/output device according to the second embodiment of the present invention.
FIG. 2C is a diagram illustrating the coefficient processing of the control unit of the audio input/output device according to the second embodiment of the present invention.
FIG. 3 is a diagram showing the configuration of an audio input/output device according to a third embodiment of the present invention.
FIG. 4 is a diagram showing the configuration of an audio input/output device according to a fourth embodiment of the present invention.
FIGS. 5A to 5C are diagrams showing the configuration of a hearing aid according to a fifth embodiment of the present invention.
FIG. 6 is a diagram showing the configuration of an audio input/output device according to a sixth embodiment of the present invention.
A further figure shows the configuration of an audio input/output device according to a seventh embodiment of the present invention.
A further figure is a configuration diagram of a computer that executes a signal processing program when the second embodiment is implemented by that signal processing program.
Three further figures are flowcharts showing the flow of processing executed by the CPU 820.

Embodiments for carrying out the present invention are described below in detail, by way of example, with reference to the drawings. The configurations, numerical values, processing flows, functional elements, and the like described in the following embodiments are merely examples and may be freely modified or changed; they are not intended to limit the technical scope of the present invention to the following description. In the drawings, a unidirectional arrow simply indicates the direction of flow of a given signal and does not exclude bidirectionality. In the following description, an "audio signal" means a direct electrical change that arises in accordance with voice or other sound and that serves to transmit voice or other sound; it is not limited to voice.

[First Embodiment]
An audio input/output device 100 as a first embodiment of the present invention is described with reference to FIG. 1.

As shown in FIG. 1, the audio input/output device 100 includes a main voice acquisition unit 101, a noise acquisition unit 102, an audio output unit 103, a noise cancellation unit 104, and an echo cancellation unit 105. The noise acquisition unit 102 is arranged facing outward from the body of a user 120 and acquires (captures) external noise 121 arriving from outside the user 120. The audio output unit 103 receives an audio signal 132 as input and outputs output audio 131 to the ear canal 110 of the user 120. The main voice acquisition unit 101 acquires (captures) a mixed audio in which the external noise 121, the output audio 131, and the main voice 111 of the user 120, which has traveled from the vocal cords of the user 120 through the ear canal, are mixed, and outputs a mixed audio signal 112. The noise cancellation unit 104 processes the mixed audio signal 112 using a noise signal based on the external noise 121. The echo cancellation unit 105 processes the mixed audio signal 112 using the audio signal 132.

According to this embodiment, both noise cancellation and echo cancellation can be performed to generate a high-quality main audio signal.

[Second Embodiment]
Next, an audio input/output device according to a second embodiment of the present invention is described with reference to FIGS. 2A to 3. FIG. 2A is a diagram showing the configuration of the audio input/output device according to this embodiment. The audio input/output device 200 has an internal microphone 201 as a main voice acquisition unit, an external microphone 202 as a noise acquisition unit, a speaker 203 as an audio output unit, and an audio processing unit 290. The audio processing unit 290 has a noise cancellation unit 204 and an echo cancellation unit 205. The audio input/output device 200 may be inner-ear headphones, canal-type headphones, binaural (both-ear) headphones, single-ear headphones, or monaural headphones, but is not limited to these. The audio input/output device 200 is also not limited to headphones and may be an earphone (earphone(s)) or a headset.

The internal microphone 201 is a microphone directed toward the ear canal 210 of the user 270. The main voice 211 of the user 270 captured by the internal microphone 201 is transmitted to a predetermined destination as a transmission signal 250.

The internal microphone 201 captures a mixed audio in which the external noise 221, the output audio 231, and the main voice 211 are mixed, and outputs a mixed audio signal 212. Even though the internal microphone 201 is placed in the ear canal 210, which is a closed space, when the external noise 221 is loud the microphone picks up the part of the external noise 221 that passes through the head of the user 270 and propagates into the ear canal. In addition, when sound is being output from the speaker 203, the internal microphone 201 picks up that sound as well.

The external microphone 202 is arranged facing outward from the body of the user 270 and captures external noise 221 arriving from outside the user 270, for example from the surroundings of the user 270. The external microphone 202 captures the external noise 221 and generates a noise signal 222.

A reception signal 240 received by the communication unit 260 is converted into an output audio signal 232 and input to the speaker 203. The speaker 203 receives the output audio signal 232 as input and outputs the output audio 231 to the ear canal 210 of the user 270.

The noise cancellation unit 204 uses a noise signal based on the external noise 221 captured by the external microphone 202 to process the mixed audio signal 212 that the internal microphone 201 outputs from the captured mixed audio. The internal microphone 201 captures a mixed audio in which the external noise 221 is mixed with the main voice 211 of the user 270.

The echo cancellation unit 205 applies echo cancellation processing to the mixed audio signal 212 output by the internal microphone 201, using the output audio signal 232 that is input to the speaker 203.

The communication unit 260 receives the reception signal 240 and sends the output audio signal 232 to the speaker 203. The communication unit 260 also receives the audio signal generated by the audio processing unit 290 and transmits it to the outside as the transmission signal 250.

FIG. 2B is a diagram showing the detailed configuration of the audio processing unit of the audio input/output device according to this embodiment. The noise cancellation unit 204 has an adaptive filter 241 and an adder 220. The external noise signal 222 generated by the external microphone 202 is input to the noise cancellation unit 204. The noise cancellation unit 204 processes the mixed audio signal 212 using the external noise signal 222, which is based on the external noise 221. The noise cancellation unit 204 drives the adaptive filter 241 to generate a pseudo signal (pseudo noise signal 242) that approximates the noise signal contained in the mixed audio signal. The adder 220 suppresses the noise by subtracting the pseudo noise signal 242 from the mixed audio signal 212 of the internal microphone 201. The pseudo main audio signal 291 output from the adder 220 contains residual noise and is used to update the coefficients of the adaptive filter 241.
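
The text does not name the adaptation algorithm used by the adaptive filter 241. As an illustration only, the loop described above can be written down under the assumption of a normalized LMS (NLMS) rule. All symbols here are hypothetical: x(n) is a vector of recent samples of the external noise signal 222, d(n) is the mixed audio signal 212, y(n) is the pseudo noise signal 242, e(n) is the pseudo main audio signal 291, w(n) holds the filter coefficients, and the step size μ and regularizer ε are assumed parameters.

```latex
\begin{aligned}
y(n) &= \mathbf{w}(n)^{\mathsf T}\,\mathbf{x}(n) \\
e(n) &= d(n) - y(n) \\
\mathbf{w}(n+1) &= \mathbf{w}(n) + \frac{\mu\, e(n)\,\mathbf{x}(n)}{\mathbf{x}(n)^{\mathsf T}\mathbf{x}(n) + \varepsilon}
\end{aligned}
```

The second line corresponds to the subtraction in the adder 220, and the third line is one possible way in which the residual e(n) could drive the coefficient update.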

The external noise signal 222 captured by the external microphone 202 is also input to the control unit 280. The control unit 280 controls the processing by the noise cancellation unit 204 based on the input external noise signal 222. The external noise signal 222, the pseudo noise signal 242, and the pseudo main audio signal 291 are input to the control unit 280, which generates the coefficients of the adaptive filter 241 based on these signals and controls the timing of the coefficient updates.

The pseudo main audio signal 291 is input to the echo cancellation unit 205. The echo cancellation unit 205 applies echo cancellation processing to the mixed audio signal 212 output by the internal microphone 201, using the output audio signal 232 that is input to the speaker 203. The echo cancellation unit 205 has an adaptive filter 251 and an adder 230. The adaptive filter 251 generates a pseudo echo signal 252 from the output audio signal 232. The adder 230 subtracts the pseudo echo signal 252 from the pseudo main audio signal 291 to generate a pseudo main audio signal 292. The output audio signal 232 and the pseudo main audio signals 291 and 292 are input to the control unit 280, which generates the coefficients of the adaptive filter 251 based on these signals and controls the timing of the coefficient updates.
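
The two-stage arrangement described above (noise stage first, echo stage second) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: NLMS is an assumed update rule, and the class `NlmsStage`, the functions `cascade`, `fir`, and `demo`, the tap count, the step size, and the synthetic noise and echo paths are all hypothetical. The per-sample `adapt_nc` and `adapt_ec` flags stand in for the timing control that the control unit 280 performs.

```python
import random


class NlmsStage:
    """Adaptive filter plus subtractor. NLMS is an assumed algorithm."""

    def __init__(self, taps=8, mu=0.5, eps=1e-8):
        self.w = [0.0] * taps   # filter coefficients
        self.x = [0.0] * taps   # recent reference samples, newest first
        self.mu = mu
        self.eps = eps

    def step(self, ref, desired, adapt):
        """Return desired minus the filtered reference; optionally learn."""
        self.x = [ref] + self.x[:-1]
        estimate = sum(w * x for w, x in zip(self.w, self.x))
        error = desired - estimate
        if adapt:
            power = sum(x * x for x in self.x) + self.eps
            gain = self.mu * error / power
            self.w = [w + gain * x for w, x in zip(self.w, self.x)]
        return error


def cascade(mixed, noise_ref, speaker_ref, adapt_nc, adapt_ec):
    """Noise stage first, echo stage second; returns the final residual."""
    nc = NlmsStage()  # stands in for noise cancellation unit 204
    ec = NlmsStage()  # stands in for echo cancellation unit 205
    out = []
    for d, n, s, a_nc, a_ec in zip(mixed, noise_ref, speaker_ref,
                                   adapt_nc, adapt_ec):
        s291 = nc.step(n, d, a_nc)     # signal 212 minus pseudo noise 242
        s292 = ec.step(s, s291, a_ec)  # signal 291 minus pseudo echo 252
        out.append(s292)
    return out


def fir(signal, taps):
    """Causal FIR filtering, used only to synthesize test signals."""
    return [sum(t * signal[i - k] for k, t in enumerate(taps) if i - k >= 0)
            for i in range(len(signal))]


def demo(n=3000, seed=0):
    """Synthetic check with no user speech; returns residual/input energy."""
    rng = random.Random(seed)
    half = n // 2
    noise = [rng.gauss(0, 1) for _ in range(n)]
    # Speaker is silent in the first half and active in the second half.
    speaker = [0.0] * half + [rng.gauss(0, 1) for _ in range(n - half)]
    leak = fir(noise, [0.6, 0.3, -0.1])    # made-up noise path
    echo = fir(speaker, [0.5, -0.2, 0.1])  # made-up echo path
    mixed = [a + b for a, b in zip(leak, echo)]
    adapt_nc = [i < half for i in range(n)]   # learn the noise path first
    adapt_ec = [i >= half for i in range(n)]  # then learn the echo path
    out = cascade(mixed, noise, speaker, adapt_nc, adapt_ec)
    tail = slice(n - 200, n)
    return sum(v * v for v in out[tail]) / sum(v * v for v in mixed[tail])
```

Calling `demo()` returns the ratio of residual energy to input energy over the last 200 samples of a synthetic signal that contains only noise and echo, so a small value indicates that both stages have learned their paths. Real signals also contain the user's voice, which is what should remain in signal 292.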

To remove the portion of the output audio signal 232 that has been mixed into the mixed audio signal 212 captured by the internal microphone 201, the echo cancellation unit 205 applies echo cancellation processing to the mixed audio signal 212 using the input audio signal.

In this way, the echo cancellation unit 205 applies echo cancellation processing to the audio signal that has already undergone noise cancellation processing. For example, even when the user speaks while music is playing from the speaker 203, the user's voice can be clearly extracted from the mixed audio signal captured by the internal microphone 201.

The communication unit 260 receives the pseudo main audio signal 292 processed by the noise cancellation unit and the echo cancellation unit, and transmits it to the outside as the transmission signal 250.

FIG. 2C is a diagram illustrating the coefficient processing of the control unit 280 of the audio input/output device 200 according to this embodiment. As described above, the noise cancellation unit 204 and the echo cancellation unit 205 perform noise cancellation processing and echo cancellation processing using the adaptive filters 241 and 251, respectively. In FIG. 2C, the vertical axis represents the update amount (learning amount), and the horizontal axis represents the S/N (signal-to-noise ratio). Graph 208 shows the update amount of the coefficients of the adaptive filter 241 of the noise cancellation unit 204. Graph 209 shows the update amount of the coefficients of the adaptive filter 251 of the echo cancellation unit 205. As shown in graphs 208 and 209, the control unit 280 updates the adaptive filter 241 and does not update the adaptive filter 251 until the update process of the adaptive filter 241 converges. In other words, the control unit 280 updates the adaptive filter 251 only after the update process of the adaptive filter 241 has converged.
While one adaptive filter is being updated, the control unit 280 does not update the other, so the adaptive filters 241 and 251 are never updated at the same time. It is not the noise cancellation unit 204 and the echo cancellation unit 205 themselves that are switched ON/OFF; rather, the updating (learning) of the adaptive filters 241 and 251 is switched ON/OFF, so that the two filters are updated alternately, like a seesaw. Once updating has progressed to a certain extent, the filter coefficients of the adaptive filters 241 and 251 hardly change. At that point the filter coefficients are settled, so, in principle, the control unit 280 does not update the adaptive filters 241 and 251 again.

The control unit 280 updates the adaptive filter 241 at times when the internal microphone 201 is not capturing the main voice 211 and the speaker 203 is not outputting the output audio 231. The control unit 280 updates the adaptive filter 251 at times when the speaker 203 is outputting the output audio 231.

The control unit 280 does not update the adaptive filters 241 and 251 at times when the internal microphone 201 is capturing the main voice 211 and the speaker 203 is outputting the output audio 231.
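
The update rules above can be summarized as a small decision function. This is a sketch of one possible reading, not the control unit's actual logic: the function name `select_update` and its four boolean inputs are hypothetical, and the text does not say how voice activity, speaker activity, or convergence are detected. The case in which the user speaks while the speaker is silent is not stated explicitly in the text; the sketch assumes no update there, because neither stated update condition is met.

```python
def select_update(user_speaking, speaker_active, nc_converged, ec_converged):
    """Return "nc" (filter 241), "ec" (filter 251), or None for this frame."""
    if user_speaking:
        # No learning while the main voice is present (assumed to also
        # cover the case where the speaker is silent).
        return None
    if not speaker_active:
        # Quiet moment: only the noise path is observable.
        return None if nc_converged else "nc"
    # Speaker is playing and the user is silent. The echo filter learns
    # only after the noise filter has converged (the "seesaw" ordering).
    if nc_converged and not ec_converged:
        return "ec"
    return None
```

At most one filter is selected per call, which mirrors the statement that the two adaptive filters are never updated at the same time. Once both convergence flags are set, the function always returns `None`, matching the statement that settled filters are in principle not updated again.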

According to this embodiment, both noise cancellation and echo cancellation can be performed to transmit a high-quality main audio signal. In other words, the user's voice can be delivered clearly to the other party. Because the adaptive filters are updated, the device can also follow changes in the external noise and changes in the sound output from the speaker. Furthermore, when the user's voice is sent to a smartphone so that an AI (Artificial Intelligence) assistant performs speech recognition, the recognition accuracy improves, so misrecognition by the AI assistant can be reduced even outdoors where external noise is loud. The user can also make voice calls and use the AI assistant while listening to music through the headphones.

[Third Embodiment]
Next, an audio input/output device according to a third embodiment of the present invention is described with reference to FIG. 3. FIG. 3 is a diagram showing the configuration of the audio input/output device according to this embodiment. The audio input/output device according to this embodiment differs from the second embodiment in the configuration of the audio processing unit 320. The other configurations and operations are the same as in the second embodiment, so the same configurations and operations are denoted by the same reference numerals and their detailed description is omitted.

In addition to the configuration of the audio processing unit 290 of the second embodiment, the audio processing unit 320 has a noise cancellation unit 301, an echo cancellation unit 303, and a control unit 310. The echo cancellation unit 303 has an adder 330 and an adaptive filter 331. In the echo cancellation unit 303, the adaptive filter 331 generates a pseudo output audio 332 from the output audio signal 232 of the speaker 203, and the adder 330 subtracts the pseudo output audio 332 from the external noise signal 222 captured by the external microphone 202. This cancels sound leakage from the speaker 203 and yields a high-quality pseudo external noise signal 322.

The control unit 310 receives the external noise signal 222, the external noise signal 222 after echo cancellation processing, and the output audio signal 232 as inputs, generates the coefficients of the adaptive filter 331, and controls their updating.

The noise cancellation unit 301 has an adder 312 and an adaptive filter 311. In the noise cancellation unit 301, the adder 312 subtracts the pseudo noise signal 323, which is generated from the pseudo external noise signal 322, from the audio signal 324, which is generated based on the reception signal 240.

According to this embodiment, both noise cancellation and echo cancellation can be performed to transmit a high-quality main audio signal. In addition, the influence of sound leaking from the speaker into the external microphone can be eliminated.

[Fourth Embodiment]
Next, an audio input/output device according to a fourth embodiment of the present invention is described with reference to FIG. 4. FIG. 4 is a diagram for explaining the configuration of the audio input/output device 400 according to this embodiment. The audio input/output device 400 according to this embodiment differs from the third embodiment in that the control unit 310 is not present. The other configurations and operations are the same as in the second and third embodiments, so the same configurations and operations are denoted by the same reference numerals and their detailed description is omitted.

The adaptive filter 421 generates a pseudo noise signal 422 from the pseudo external noise signal 322 that has undergone echo cancellation, and the adder 312 subtracts the pseudo noise signal 422 from the audio signal 324 generated from the reception signal 240.

The echo cancellation unit 403 has an adaptive filter 431 and the adder 330. The adaptive filter 431 generates a pseudo output audio signal 432. The adder 330 subtracts the pseudo output audio signal 432 from the external noise signal 222.

According to this embodiment, the same effects as in the third embodiment can be obtained with a simpler configuration.

[Fifth Embodiment]
Next, a hearing aid according to a fifth embodiment of the present invention is described with reference to FIGS. 5A to 5C. FIGS. 5A to 5C are diagrams showing the configuration of the hearing aid according to this embodiment. The hearing aid according to this embodiment differs from the fourth embodiment in that a hearing aid function and switches are added. The other configurations and operations are the same as in the fourth embodiment, so the same configurations and operations are denoted by the same reference numerals and their detailed description is omitted.

FIG. 5A shows the case in which the user listens to the other party's voice while allowing external noise to leak in. As shown in FIG. 5A, the hearing aid 500 has the internal microphone 201, the external microphone 202, the speaker 203, the communication unit 260, and an audio processing unit 560. The audio processing unit 560 further has an amplification unit 501, switches 521 and 503, and an adder 520. The audio signal 324 corresponding to the reception signal 240 input via the communication unit 260 is amplified by the amplification unit 501, input to the speaker 203, and output as output audio. In the hearing aid 500, the output audio from the speaker 203 is loud, so the proportion of output audio in the mixed audio is large. Canceling the output audio captured by the internal microphone 201 is therefore highly effective. The amplified output audio also leaks more easily from the hearing aid 500 to the outside of the user, which makes the echo cancellation unit 403 important. The user can hear the other party's voice at a high volume, and a high-quality main voice can be captured even with the hearing aid 500. Although the amplified output audio is more easily picked up by the internal microphone 201, the echo cancellation unit 205 makes it possible to generate a high-quality pseudo main audio signal.

FIG. 5B shows the case in which the user hears their own voice and the other party's voice loudly while external noise is canceled. In this case, the switch 521 is connected to the contact on the adaptive filter 421 side, and the switch 503 closes in conjunction with the movement of the switch 521. The adaptive filter 421 and the adder 312 operate as described with reference to FIG. 4. The user can thus hear sound from which the external sound has been canceled. With the switch 503 closed, the adder 520 adds the pseudo main audio signal to the audio signal 324 generated from the reception signal 240. This allows the user 270 to hear their own voice, known as sidetone.

FIG. 5C shows the case in which the user hears both the external noise and the other party's voice loudly. In this case, the switch 502 is connected to the contact on the side opposite to the noise cancellation unit 302, and the switch 503 opens in conjunction with the movement of the switch 502. The echo cancellation unit 403 cancels the influence of sound leakage. The adder 312 adds the clean external noise and the received voice of the other party. The amplification unit 501 amplifies the audio signal summed by the adder 312 to generate the output audio signal 232. The user can thus hear both the external sound and the other party's voice at a high volume.

[Sixth embodiment]
Next, an audio input/output device according to a sixth embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 shows the configuration of the audio input/output device according to this embodiment. It differs from the second embodiment in that it has an attachment/detachment detection unit 601. The other configurations and operations are the same as in the second embodiment, so the same reference numerals are used for them and their detailed description is omitted.

The attachment/detachment detection unit 601 detects whether the audio input/output device 600 is attached to or detached from the ear using, for example, the sound of blood flow or the heartbeat captured by the internal microphone 201. Alternatively, the attachment/detachment detection unit 601 may emit ultrasonic waves inaudible to humans and detect attachment or detachment from the presence or absence of a reflected wave. Attachment and detachment may also be detected using an infrared sensor, an acceleration sensor, or the like. The detection method is not limited to these.

When the attachment/detachment detection unit 601 detects that the audio input/output device 600 has been put on, the noise cancellation unit 204 performs noise cancellation processing using the adaptive filter 241, and the echo cancellation unit 205 performs echo cancellation processing using the adaptive filter 251. Because the echo conditions differ for each user who wears the audio input/output device 600, the control unit 280 updates the adaptive filter 251 each time wearing is detected. Likewise, because the noise conditions differ with the circumstances of wearing (place and time), the control unit 280 updates the adaptive filter 241 at each wearing. According to this embodiment, because the attachment/detachment detection unit is provided, the quality of the transmission signal can be improved even when the user of the audio input/output device changes or the user reattaches the device. The audio input/output device 600 may stop all of its functions when the attachment/detachment detection unit 601 detects that the device has been removed.

[Seventh embodiment]
Next, an audio input/output device according to a seventh embodiment of the present invention will be described with reference to FIG. 7. FIG. 7 shows the configuration of the audio input/output device according to this embodiment. It differs from the second embodiment in that it has a sound insulation unit. The other configurations and operations are the same as in the second embodiment, so the same reference numerals are used for them and their detailed description is omitted.

The sound insulation unit 701 restricts the paths by which the external noise 221 can reach the internal microphone 201. The sound insulation unit is, for example, a cylindrical member surrounding the internal microphone 201. The side facing the ear canal 210 of the user 270 is left open so that the main voice 211 arriving through the ear canal 210 is not blocked. The shape of the sound insulation unit 701 is not limited to the one shown here; any shape may be used as long as it can block the external noise 221 transmitted through the body of the user 270 or through the audio input/output device 700. Likewise, any material that can block the external noise 221 may be used for the sound insulation unit 701, for example rubber, resin, or glass. According to this embodiment, because the noise cancellation unit 204, the echo cancellation unit 205, and the sound insulation unit 701 are provided, a high-quality pseudo main voice signal can be generated.

[Other embodiments]
Although the present invention has been described with reference to the embodiments, it is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope. Any system or device that combines, in any manner, the separate features included in the respective embodiments also falls within the scope of the present invention.

The present invention may be applied to a system composed of a plurality of devices or to a single device. It is also applicable when an information processing program that implements the functions of the embodiments is supplied to a system or device directly or remotely. Accordingly, a program installed on a computer in order to implement the functions of the present invention on that computer, a medium storing the program, and a WWW (World Wide Web) server from which the program is downloaded are also included in the scope of the present invention. In particular, a non-transitory computer readable medium storing a program that causes a computer to execute at least the processing steps included in the above-described embodiments is included in the scope of the present invention.

FIG. 8A is a configuration diagram of a computer 800 that executes a signal processing program when the second embodiment is implemented as a signal processing program. The computer 800 includes an input unit 810, a CPU (Central Processing Unit) 820, an output unit 830, and a memory 840.

The CPU 820 controls the operation of the computer 800 by reading the signal processing program stored in the memory 840. Specifically, in step S801, the CPU 820 executing the signal processing program captures, via the input unit 810, the external noise 221 arriving from outside the user. In step S803, the CPU 820 outputs an audio signal from the output unit 830. In step S805, the CPU 820 acquires, via the input unit 810, the mixed audio signal 212 in which the external noise 221, the main voice 211, and the output sound 231 from the audio output unit are mixed. In step S807, the CPU 820 performs noise cancellation processing on the captured mixed audio signal 212. In step S809, the CPU 820 performs echo cancellation processing on the captured mixed audio signal 212 using the audio signal input to the speaker 203. In step S811, the CPU 820 transmits the audio signal.
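
Steps S805 to S809 can be illustrated with a small per-sample sketch in which noise cancellation is followed by echo cancellation. The description only states that adaptive filters are used; the normalized LMS (NLMS) algorithm, the class `NLMSFilter`, the function `process_sample`, the two-tap filter length, the step size, and the synthetic signals in `demo` are all assumptions made here for illustration.

```python
import math
import random
from typing import List


class NLMSFilter:
    """Normalized LMS adaptive FIR filter (an assumed choice of algorithm)."""

    def __init__(self, taps: int, mu: float = 0.5, eps: float = 1e-8) -> None:
        self.w: List[float] = [0.0] * taps   # filter coefficients
        self.x: List[float] = [0.0] * taps   # reference samples, newest first
        self.mu = mu
        self.eps = eps

    def push(self, sample: float) -> None:
        """Shift a new reference sample into the delay line."""
        self.x = [sample] + self.x[:-1]

    def estimate(self) -> float:
        """Estimate the part of the mixed signal explained by the reference."""
        return sum(wi * xi for wi, xi in zip(self.w, self.x))

    def update(self, error: float) -> None:
        """Adapt the coefficients so that the residual error shrinks."""
        norm = sum(xi * xi for xi in self.x) + self.eps
        step = self.mu * error / norm
        self.w = [wi + step * xi for wi, xi in zip(self.w, self.x)]


def process_sample(mixed: float, noise_ref: float, speaker_ref: float,
                   nc: NLMSFilter, ec: NLMSFilter,
                   adapt_nc: bool, adapt_ec: bool) -> float:
    """One pass of noise cancellation (S807) then echo cancellation (S809)."""
    nc.push(noise_ref)
    ec.push(speaker_ref)
    after_nc = mixed - nc.estimate()      # remove the estimated external noise
    after_ec = after_nc - ec.estimate()   # remove the estimated speaker echo
    if adapt_nc:
        nc.update(after_nc)
    if adapt_ec:
        ec.update(after_ec)
    return after_ec                        # pseudo main voice sample


def demo(seed: int = 0) -> float:
    """Return the worst residual between the output and a synthetic main voice."""
    rng = random.Random(seed)
    nc, ec = NLMSFilter(taps=2), NLMSFilter(taps=2)
    prev_noise, worst = 0.0, 0.0
    for n in range(1500):
        noise = rng.uniform(-1.0, 1.0)
        speaker = rng.uniform(-1.0, 1.0) if n >= 500 else 0.0
        voice = 0.8 * math.sin(2 * math.pi * n / 40) if n >= 1000 else 0.0
        # Hypothetical acoustic paths from each source to the internal microphone.
        mixed = 0.6 * noise + 0.3 * prev_noise + 0.5 * speaker + voice
        out = process_sample(mixed, noise, speaker, nc, ec,
                             adapt_nc=(n < 500), adapt_ec=(500 <= n < 1000))
        if n >= 1000:
            worst = max(worst, abs(out - voice))
        prev_noise = noise
    return worst


print(f"worst-case residual while the user speaks: {demo():.2e}")
```

In this sketch the noise filter adapts only while the speaker and the user are silent, the echo filter adapts only while the speaker is active and the user is silent, and both are frozen while the user speaks, so that neither filter tries to cancel the main voice.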

FIG. 8B is a flowchart showing the flow of processing executed by the CPU 820. In step S821, the CPU 820 determines whether the internal microphone 201 is capturing the mixed audio signal 212. If it determines that the mixed audio signal 212 is being captured (YES in step S821), the CPU 820 ends the processing. If it determines that the mixed audio signal 212 is not being captured (NO in step S821), the CPU 820 proceeds to step S823. In step S823, the CPU 820 determines whether the output sound 231 is being output from the speaker 203. If it determines that the output sound 231 is being output (YES in step S823), the CPU 820 ends the processing. If it determines that the output sound 231 is not being output (NO in step S823), the CPU 820 proceeds to step S825. In step S825, the CPU 820 updates the adaptive filter 241 of the noise cancellation unit 204.

FIG. 8C is a flowchart showing the flow of processing executed by the CPU 820. In step S831, the CPU 820 determines whether the output sound 231 is being output from the speaker 203. If it determines that the output sound 231 is not being output (NO in step S831), the CPU 820 ends the processing. If it determines that the output sound 231 is being output (YES in step S831), the CPU 820 proceeds to step S832. In step S832, the CPU 820 determines whether the main voice has been captured. If it determines that the main voice has been captured (YES in step S832), the CPU 820 ends the processing. If it determines that the main voice has not been captured (NO in step S832), the CPU 820 proceeds to step S833. In step S833, the CPU 820 updates the adaptive filter 251 of the echo cancellation unit 205.
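
The two decisions in FIGS. 8B and 8C can be combined into a single selection function, sketched below. The names `FilterUpdate` and `select_filter_update` are hypothetical, and the first condition of FIG. 8B is read here as "the main voice is being captured", which is an interpretation rather than the literal wording of step S821.

```python
from enum import Enum


class FilterUpdate(Enum):
    NONE = "none"
    NOISE_FILTER_241 = "noise cancellation filter 241"
    ECHO_FILTER_251 = "echo cancellation filter 251"


def select_filter_update(main_voice_active: bool,
                         speaker_active: bool) -> FilterUpdate:
    """Decide which adaptive filter, if any, may be updated right now."""
    if main_voice_active:
        # Both flowcharts end without an update while the user is speaking.
        return FilterUpdate.NONE
    if speaker_active:
        # FIG. 8C: output sound present, no main voice -> update filter 251.
        return FilterUpdate.ECHO_FILTER_251
    # FIG. 8B: no main voice and no output sound -> update filter 241.
    return FilterUpdate.NOISE_FILTER_241


for voice in (False, True):
    for speaker in (False, True):
        choice = select_filter_update(voice, speaker)
        print(f"main voice={voice!s:5} speaker={speaker!s:5} -> {choice.value}")
```

Returning a single value makes it explicit that at most one of the two filters is updated at any given moment.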

FIG. 8D is a flowchart showing the flow of processing executed by the CPU 820. In step S841, the CPU 820 determines whether the audio input/output device 600 has been put on. If it determines that the device has not been put on (NO in step S841), the CPU 820 ends the processing. If it determines that the device has been put on (YES in step S841), the CPU 820 proceeds to step S843. In step S843, the CPU 820 updates the adaptive filter 251 of the echo cancellation unit 205.
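
One way to realize the check in FIG. 8D is to trigger the filter update only on the transition from "not worn" to "worn". The sketch below assumes that the detector delivers a boolean reading at regular intervals; the class `WearMonitor`, its `report` method, and the callback are hypothetical names, and edge-triggering is an assumption rather than something the flowchart prescribes.

```python
from typing import Callable, List


class WearMonitor:
    """Calls a callback once each time the device goes from off-ear to on-ear."""

    def __init__(self, on_wear: Callable[[], None]) -> None:
        self._worn = False
        self._on_wear = on_wear

    def report(self, worn_now: bool) -> bool:
        """Feed the latest detector reading; return True if an update was triggered."""
        triggered = worn_now and not self._worn
        self._worn = worn_now
        if triggered:
            self._on_wear()   # e.g. start updating adaptive filter 251 (S843)
        return triggered


updates: List[str] = []
monitor = WearMonitor(lambda: updates.append("update adaptive filter 251"))
readings = [False, True, True, False, True]
results = [monitor.report(r) for r in readings]
print(results)
print(len(updates), "update(s) triggered")
```

The same callback could also restart adaptation of the noise cancellation filter 241, in line with the sixth embodiment's statement that both filters are updated at each wearing.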

[Other expressions of the embodiment]
Some or all of the above embodiments can also be described as in the following appendices, but are not limited to them.
(Appendix 1)
An audio input/output device comprising:
a noise acquisition unit that is arranged facing the outside of a user's body and acquires external noise arriving from outside the user;
an audio output unit that receives input of an audio signal and outputs output sound to the user's ear canal;
a main voice acquisition unit that acquires mixed sound in which the external noise, the output sound, and the user's main voice transmitted from the user's vocal cords through the ear canal are mixed, and outputs a mixed audio signal;
a noise cancellation unit that processes the mixed audio signal using a noise signal based on the external noise; and
an echo cancellation unit that processes the mixed audio signal using the audio signal.
(Appendix 2)
The audio input/output device according to Appendix 1, wherein the echo cancellation unit performs echo cancellation processing on the audio signal that has undergone noise cancellation processing in the noise cancellation unit.
(Appendix 3)
The audio input/output device according to Appendix 1 or 2, wherein the noise cancellation unit performs noise cancellation processing using a first adaptive filter, the echo cancellation unit performs echo cancellation processing using a second adaptive filter, the second adaptive filter is not updated while the first adaptive filter is being updated, and the first adaptive filter is not updated while the second adaptive filter is being updated.
(Appendix 4)
The audio input/output device according to Appendix 3, wherein the noise cancellation unit updates the first adaptive filter at a timing when the main voice acquisition unit is not acquiring the main voice and the audio output unit is not outputting the output sound.
(Appendix 5)
The audio input/output device according to Appendix 3 or 4, wherein the echo cancellation unit updates the second adaptive filter at a timing when the audio output unit is outputting the output sound.
(Appendix 6)
The audio input/output device according to Appendix 4 or 5, wherein the noise cancellation unit and the echo cancellation unit do not update the first and second adaptive filters at a timing when the main voice acquisition unit is acquiring the main voice and the audio output unit is outputting the output sound.
(Appendix 7)
The audio input/output device according to Appendix 1, wherein the noise cancellation unit performs noise cancellation processing, using the external noise acquired by the noise acquisition unit, on the audio signal that has undergone echo cancellation processing in the echo cancellation unit.
(Appendix 8)
The audio input/output device according to any one of Appendices 1 to 7, further comprising a sound insulation unit that restricts the entry path of the external noise to the main voice acquisition unit.
(Appendix 9)
The audio input/output device according to any one of Appendices 1 to 8, further comprising an attachment/detachment detection unit that detects attachment and detachment of the audio input/output device, wherein
the noise cancellation unit performs noise cancellation processing using a first adaptive filter, the echo cancellation unit performs echo cancellation processing using a second adaptive filter, and
at least one of the first adaptive filter and the second adaptive filter is updated when the attachment/detachment detection unit detects that the audio input/output device has been put on.
(Appendix 10)
The audio input/output device according to any one of Appendices 1 to 8, further comprising a communication unit that transmits the audio signal processed by both the noise cancellation unit and the echo cancellation unit.
(Appendix 11)
A hearing aid comprising:
a noise acquisition unit that is arranged facing the outside of a user's body and acquires external noise arriving from outside the user;
an audio output unit that receives input of an audio signal and outputs output sound to the user's ear canal;
a main voice acquisition unit that acquires mixed sound in which the external noise, the output sound, and the user's main voice transmitted from the user's vocal cords through the ear canal are mixed, and outputs a mixed audio signal;
a noise cancellation unit that processes the mixed audio signal using a noise signal based on the external noise;
an echo cancellation unit that processes the mixed audio signal using the audio signal; and
an amplification unit that amplifies the audio signal input to the audio output unit.
(Appendix 12)
An audio input/output method comprising:
a noise acquisition step of acquiring external noise arriving from outside a user, the acquisition being directed toward the outside of the user's body;
an audio output step of receiving audio input and outputting output sound to the user's ear canal;
a main voice acquisition step of acquiring mixed sound in which the external noise, the output sound, and the user's main voice transmitted from the user's vocal cords through the ear canal are mixed;
a noise cancellation step of processing the mixed sound using a noise signal based on the external noise; and
an echo cancellation step of processing the mixed sound using the audio signal.
(Appendix 13)
An audio input/output program that causes a computer to execute:
a noise acquisition step of acquiring external noise arriving from outside a user, the acquisition being directed toward the outside of the user's body;
an audio output step of receiving audio input and outputting output sound to the user's ear canal;
a main voice acquisition step of acquiring mixed sound in which the external noise, the output sound, and the user's main voice transmitted from the user's vocal cords through the ear canal are mixed;
a noise cancellation step of processing the mixed sound using a noise signal based on the external noise; and
an echo cancellation step of processing the mixed sound using the audio signal.

Claims

An earphone having a noise cancellation function and
comprising a microphone that acquires voice to be input to an AI assistant.

The earphone according to claim 1, wherein the earphone comprises two microphones, and
noise cancellation is performed using a first sound and a second sound obtained from the two microphones.

The earphone according to claim 1, further comprising a communication unit that transmits the voice.

The earphone according to claim 2, further comprising a speaker,
wherein echo cancellation is performed using
the first sound,
the second sound, and
an output audio signal for a third sound output from the speaker.

The earphone according to claim 2, further comprising a speaker,
wherein the noise cancellation is performed further using a pseudo-noise signal generated based on a noise signal, which is based on external sound that is the first sound, and an output audio signal, which is based on a third sound obtained from the speaker.

The earphone according to claim 2, further comprising a speaker, wherein
the earphone is worn by a user,
the first sound is external noise arriving from the outside,
the speaker receives input of an output audio signal and outputs a third sound to the user's ear canal,
the second sound includes mixed sound in which the external noise, the third sound, and a main voice arriving from inside the user are mixed, and
noise cancellation is performed using a pseudo-noise signal, generated based on a noise signal based on the external noise and on the output audio signal, and a mixed audio signal based on the sound.

The earphone according to claim 6, wherein noise cancellation is performed using a first adaptive filter, and
coefficients of the first adaptive filter are updated based on the noise signal and the pseudo-noise signal.

An audio processing method comprising:
a voice acquisition step of acquiring voice to be input to an AI assistant; and
a noise cancellation step of performing noise cancellation on the voice to be input to the AI assistant.

An audio input/output program that causes a computer to execute:
a voice acquisition step of acquiring voice to be input to an AI assistant; and
a noise cancellation step of performing noise cancellation on the voice to be input to the AI assistant.