JP6290827B2

JP6290827B2 - Method for processing an audio signal and a hearing aid system

Info

Publication number: JP6290827B2
Application number: JP2015114756A
Authority: JP
Inventors: Ching Feng Liu; Hsiao Han Chen
Original assignee: Ching Feng LIU; Hsiao Han Chen
Current assignee: Ching Feng LIU; Hsiao Han Chen
Priority date: 2015-06-05
Filing date: 2015-06-05
Publication date: 2018-03-07
Anticipated expiration: 2035-06-05
Also published as: JP2017005356A

Description

The present invention relates to a method for processing an audio signal and to a hearing aid system, and more particularly to a method and a hearing aid system that process a collected audio signal using image information and sound information.

The cocktail party effect refers to the ability to listen selectively: even when many people are chatting at once, as at a cocktail party, a listener can pick out the speech of the person of interest. This is generally possible because the human auditory system can localize a sound source and extract the sound information coming from that location. As a result, even in a noisy environment, a listener can converse easily with a target speaker by concentrating on that particular person.

However, some people with hearing impairment have a higher threshold of hearing than people with normal hearing and cannot localize a sound source, so they do not benefit from the cocktail party effect. People with impaired hearing in only one ear also cannot obtain the cocktail party effect well.

Title: Digital Hearing Aids; Publisher: Thieme Medical Publishers, Inc.; Author: Arthur Schaub; Publication date: June 16, 2008; ISBN: 978-1-60406-006-5.

Some conventional hearing aid systems assist hearing-impaired users by localizing the sound source. Such a system is configured to extract sound information coming from in front of the user, process it, and raise the signal-to-noise ratio (SNR). With such a system, a hearing-impaired user can hear a speaker in front of them clearly even in a noisy environment. When several people are in front of the user, however, a conventional hearing aid system cannot identify which of them is the user's target speaker.

An object of the present invention is to provide a method for processing an audio signal and a hearing aid system that help the user concentrate on the voice of a target speaker even when several speakers are present.

To achieve the above object, according to one aspect, a method for processing an audio signal according to the present invention is performed using a hearing aid system that includes an image capture module, a sound collection module, and a processor. The method comprises: step (a) of collecting, with the sound collection module, sound information from the surroundings of the hearing aid system; step (b) of acquiring, with the image capture module, an image of the surroundings of the hearing aid system; step (c) of performing, with the processor, a directional signal processing operation on the sound information collected by the sound collection module to generate an output audio signal that contains a voice signal extracted from an area associated with the position of a target object in the image acquired by the image capture module; and step (d) of outputting the output audio signal.

According to another aspect, a hearing aid system according to the present invention assists the hearing of a hearing-impaired person and comprises: a sound collection module that collects sound information from the surroundings of the hearing aid system; an image capture module that captures an image of the surroundings of the hearing aid system; a processor that is coupled to the sound collection module and the image capture module and includes an audio processing module, the audio processing module performing a directional signal processing operation on the sound information collected by the sound collection module to generate an output audio signal, the output audio signal containing a voice signal extracted from an area associated with the position of a target object in the image acquired by the image capture module; and an audio output device that is coupled to the processor and outputs the output audio signal.

With the method for processing an audio signal and the hearing aid system according to the present invention, the user can identify a target speaker and hear that speaker's voice clearly, even in a noisy environment or when several people are in front of the user.

FIG. 1 is a perspective view schematically showing an example implementation of one embodiment of the hearing aid system according to the present invention. FIG. 2 is a block diagram schematically showing the configuration of the hearing aid system according to the present invention. FIG. 3 is a flowchart of the method for processing an audio signal according to the present invention. FIG. 4 shows an image of the surroundings of the hearing aid system captured by the image capture module. FIG. 5 shows three identified human face objects and their included angles θ1, θ2, and θ3 with respect to the reference axis.

Hereinafter, one preferred embodiment of the hearing aid system according to the present invention is described in detail with reference to the accompanying drawings.
(Embodiment)

FIG. 1 is a perspective view schematically showing one embodiment of the hearing aid system according to the present invention. As shown, the hearing aid system 2 of this embodiment can be worn by the user by means of a wearable accessory 3 such as a pair of eyeglasses.

As shown in FIG. 2, the hearing aid system 2 of this embodiment includes electronic components such as an image capture module 4, a sound collection module 5, a processor 6, and an audio output device 7. These electronic components are mounted on the wearable accessory 3.

The wearable accessory 3 has a frame structure made up of a horizontally elongated front supporter 31, on which the image capture module 4 is mounted, and a pair of side supporters 32 connected to the two sides of the front supporter 31 so as to rest on the user's ears. In this embodiment, the wearable accessory 3 is configured like a pair of eyeglasses: the front supporter 31 is a lens frame that can hold a pair of lenses 310, and the side supporters 32 are a pair of temples.

As shown in FIG. 4, the image capture module 4 is configured to capture an image 40 of the surroundings of the hearing aid system 2. It has a camera unit that is mounted between the pair of lenses 310 and can capture a plurality of images 40.

The sound collection module 5 has, as one example, a microphone array made up of a plurality of microphones 51. The microphones 51 are mounted on the front supporter 31 and the side supporters 32 so as to collect information from the surroundings of the hearing aid system 2. Note that the microphones 51 may have various directivities, for example omnidirectional, unidirectional, or a mixture of the two.

The processor 6 is mounted on one of the side supporters 32 so as to be coupled to the image capture module 4 and the sound collection module 5. As shown in FIG. 2, it includes an image analysis module 61 and an audio processing module 62.

The image analysis module 61 is configured to identify each human face present in the image 40 captured by the image capture module 4 (see FIG. 4) as a human face object 401. The image analysis module 61 also determines object information for each human face object 401 and, based on that object information, determines a likelihood classification for each human face object 401. The likelihood here indicates how likely it is that the person associated with an identified human face object 401 is the target speaker. As detailed below, the human face objects 401 are classified according to how high or low this likelihood is.

In this embodiment, the object information determined by the image analysis module 61 includes relative depth information and relative direction information, both relative to the hearing aid system 2, for each identified human face object 401. The relative direction information indicates the angle (the included angle) between a reference axis 402 in the image 40 and the position of the identified human face object 401 in the image 40. The reference axis 402 is the axis of the image capture module 4, which, as shown in FIG. 1, is located on the front supporter 31 between the pair of lenses 310. The relative direction information also indicates the angle by which the identified human face object 401 is turned (its face orientation) in the image 40.

To obtain this object information, the image analysis module 61 includes, as shown in FIG. 2, a depth analysis unit 611, an angle analysis unit 612, a turn analysis unit 613, a likelihood classifier 614, and a lip movement detection unit 615.

The depth analysis unit 611 is configured to determine the relative depth of each identified human face object 401 in the image 40 with respect to the hearing aid system 2. More specifically, the depth analysis unit 611 has a database containing information on the relationship between the size of an identified human face object 401 and its relative depth. As one example, the depth analysis unit 611 obtains the size of each identified human face object 401 by calculating the area it occupies in the image 40, and then determines its relative depth by consulting the database. Although this embodiment uses the depth analysis unit 611 to determine the relative depth, other embodiments may use a different arrangement.
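The size-to-depth lookup performed by the depth analysis unit 611 can be sketched as follows. This is only an illustration under stated assumptions: the table `SIZE_TO_DEPTH`, its values, the bounding-box representation, and the linear interpolation between entries are all hypothetical and are not specified in the description, which says only that a database relates face size to relative depth.

```python
from bisect import bisect_left

# Hypothetical calibration table: (face area in pixels, relative depth in metres),
# sorted by ascending area. The values are illustrative, not taken from the patent.
SIZE_TO_DEPTH = [(900, 4.0), (2500, 2.5), (10000, 1.2), (40000, 0.5)]


def face_area(box):
    """Area of a face bounding box given as (x, y, width, height)."""
    _, _, width, height = box
    return width * height


def relative_depth(area, table=SIZE_TO_DEPTH):
    """Look up the depth for a face area, interpolating linearly between entries."""
    areas = [a for a, _ in table]
    if area <= areas[0]:
        return table[0][1]   # smaller than any entry: clamp to the farthest depth
    if area >= areas[-1]:
        return table[-1][1]  # larger than any entry: clamp to the nearest depth
    i = bisect_left(areas, area)
    (a0, d0), (a1, d1) = table[i - 1], table[i]
    t = (area - a0) / (a1 - a0)
    return d0 + t * (d1 - d0)
```

A larger face area maps to a smaller relative depth, which matches the intuition that a nearer face occupies more of the image.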

As shown in FIG. 5, the angle analysis unit 612 is configured to determine the included angle formed between the reference axis 402 in the image 40, which represents an axis of the image capture module 4, and the position of each identified human face object 401 in the image 40.

Because the image capture module 4 is mounted between the pair of lenses 310 in this embodiment, the reference axis 402 corresponds to a line within the field of view of the user wearing the wearable accessory 3.

FIG. 5 shows three identified human face objects 401 and their respective included angles θ1, θ2, and θ3 with respect to the reference axis 402. Using the relative depth and included angle of each identified human face object 401, the image analysis module 61 can obtain its position relative to the hearing aid system 2. Although this embodiment uses the image analysis module 61 to determine the relative position of the human face objects 401, other embodiments may use a different arrangement.
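As a rough sketch of how an included angle and a relative position might be computed, the following assumes a pinhole camera whose reference axis 402 projects onto the centre column of the image and whose horizontal field of view is known. The function names, the 60 degree default field of view, and the signed-angle convention (negative to the left of the axis) are assumptions made for illustration and are not stated in the description.

```python
import math


def included_angle(face_x, image_width, hfov_deg=60.0):
    """Signed angle in degrees between the reference axis and a face at pixel column face_x."""
    cx = image_width / 2.0
    # Focal length in pixels derived from the assumed horizontal field of view.
    focal_px = cx / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan2(face_x - cx, focal_px))


def relative_position(depth, angle_deg):
    """Convert (relative depth, included angle) to (lateral offset, forward distance)."""
    a = math.radians(angle_deg)
    return depth * math.sin(a), depth * math.cos(a)
```

The unsigned included angle of FIG. 5 would be the absolute value of the result; keeping the sign is convenient when the direction is later used to steer a microphone array.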

The turn analysis unit 613 is configured to determine the angle by which each identified human face object 401 in the image 40 is turned, that is, the angle by which the face deviates from the reference orientation in which it directly faces the user wearing the wearable accessory 3.

The likelihood classifier 614 is configured to determine the likelihood classification of each identified human face object 401 based on its object information. As noted above, the object information determined by the image analysis module 61 for each identified human face object 401 includes the relative depth information and the relative direction information.

In this embodiment, different weights are assigned in advance to different relative depths and to different relative directions, and the likelihood classifier 614 determines the likelihood classification of each identified human face object 401 from the weights associated with its relative depth information and relative direction information.

In proxemics, the concept of personal space divides interpersonal distance into four broad zones: intimate distance, within 0.5 m, kept between spouses, lovers, parents and children, and close friends; personal distance, from 0.5 m to 1.5 m, at which one can read the other person's facial expression and reach them by hand; social distance, from 1.5 m to 3 m, as in a meeting, at which the other person is hard to reach by hand but conversation is easy; and public distance, 3 m or more, as at a lecture, from which several people can be seen at once. Accordingly, the more intimate or more important a conversation partner is, the closer that person is to the user, and the closer the person is, the more directly they face the user and the nearer they are to the center of the user's field of view.

Accordingly, in this embodiment the likelihood classifier 614 assigns higher weights to a smaller relative depth, a smaller included angle, and a smaller turn angle of the face of the identified human face object 401 in the image 40. The likelihood classifier 614 then determines the likelihood classification from the weights assigned to each identified human face object 401. In this embodiment, each identified human face object 401 is classified as high likelihood, medium likelihood, or low likelihood.
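A weighted classification of this kind could look like the sketch below. The depth breakpoints reuse the proxemic zones mentioned in the description (0.5 m, 1.5 m, 3 m), but the numeric weights, the angle breakpoints, the summation of weights, and the score thresholds are all hypothetical, since the description does not give concrete values or a combination rule.

```python
def depth_weight(depth_m):
    """Higher weight for smaller relative depth (zones from the proxemics passage)."""
    if depth_m <= 0.5:
        return 3
    if depth_m <= 1.5:
        return 2
    if depth_m <= 3.0:
        return 1
    return 0


def angle_weight(angle_deg):
    """Higher weight for smaller angles; breakpoints are illustrative assumptions."""
    a = abs(angle_deg)
    if a <= 10:
        return 2
    if a <= 30:
        return 1
    return 0


def classify(depth_m, included_deg, turn_deg):
    """Combine the weights (here by summing) and map the score to a likelihood class."""
    score = depth_weight(depth_m) + angle_weight(included_deg) + angle_weight(turn_deg)
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"
```

For instance, a face about 1 m away, nearly on the reference axis and facing the user, scores in the top class, while a distant face turned well away scores in the bottom class.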

The lip movement detection unit 615 detects whether the lips of an identified human face object 401 are moving, and the object information determined by the image analysis module 61 includes the resulting indication of whether lip movement was detected for the corresponding human face object 401. In this embodiment, for example, the lip movement detection unit 615 is configured to detect lip movement in the identified human face objects 401 classified as high likelihood. When the lip movement detection unit 615 detects lip movement in an identified human face object 401, that human face object 401 is regarded as a speaker and can be taken as the target speaker.

When no identified human face object 401 is classified as high likelihood, the lip movement detection unit 615 is configured to detect lip movement in the identified human face objects 401 classified as medium likelihood. When no identified human face object 401 is classified as either high or medium likelihood, the lip movement detection unit 615 stops the determination. In that case, a target speaker is considered unlikely to be among the human face objects 401 identified in the image 40.
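The tiered behaviour of the lip movement detection unit 615 can be sketched as below. The dictionary keys `likelihood` and `lips_moving` are hypothetical names. The description does not say what happens when high-likelihood faces exist but none shows lip movement, so returning `None` in that case is an assumption.

```python
def select_target(faces):
    """Pick a target from faces given as dicts with 'likelihood' and 'lips_moving'.

    Only the highest non-empty tier ("high", else "medium") is examined.
    Returns the first face in that tier with moving lips, or None.
    """
    for tier in ("high", "medium"):
        candidates = [f for f in faces if f["likelihood"] == tier]
        if candidates:
            for face in candidates:
                if face["lips_moving"]:
                    return face
            return None  # assumption: no fallback to a lower tier
    return None  # neither high nor medium faces: stop the determination
```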

Although this embodiment uses the lip movement detection unit 615 to detect whether an identified human face object 401 is a speaker, the lip movement detection unit 615 may be omitted in some cases. The processor 6 then selects one identified human face object 401 as the target object, namely one that has a higher likelihood than another identified human face object 401 in the image 40. The image analysis module 61 sends the audio processing module 62 a signal indicating the identified human face object 401 taken to be the target object.

In this embodiment, the audio processing module 62 is configured to receive and process the sound information collected by the sound collection module 5. In processing the received sound information, it performs, for example, analog-to-digital conversion and noise reduction. The audio processing module 62 of this embodiment includes a speech detection unit 621 and a sound processor 622.

The speech detection unit 621 is configured to detect speech content that may be contained in the sound information collected by the sound collection module 5. Detecting speech content in the sound information means detecting the presence of a human voice in the environment around the hearing aid system 2. In this embodiment, the image capture module 4 acquires the image 40 after it has been sent a signal indicating that the speech detection unit 621 has detected speech content.

The sound processor 622 is configured to perform a directional signal processing operation on the sound information to generate the output audio signal. The directional signal processing operation keeps sound coming from the area associated with the position of the target object in the image 40 and filters out sound coming from other directions. In this embodiment, the directional signal processing operation is a beamforming operation.
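The description does not say which beamforming algorithm the sound processor 622 uses. As one common possibility, a minimal delay-and-sum beamformer is sketched here under several assumptions: a far-field source, microphones placed along a line perpendicular to the reference axis (positions `mic_x` in metres), a steering angle measured from the reference axis, and delays rounded to whole samples. All names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air


def steering_delays(mic_x, angle_deg, fs):
    """Per-microphone delays (in samples) that align a plane wave from angle_deg.

    A microphone lying further toward the source hears the wavefront earlier,
    so it receives a larger delay. Delays are shifted so the smallest is zero.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [x * s / SPEED_OF_SOUND * fs for x in mic_x]
    base = min(raw)
    return [int(round(r - base)) for r in raw]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample count and average the aligned channels."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += signal[i - d]
    return [v / len(channels) for v in out]
```

Sound arriving from the steered direction adds coherently after alignment, while sound from other directions is misaligned and partly cancels, which is the sense in which other directions are filtered out.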

After the directional signal processing operation, the output audio signal contains an extracted voice signal that comes from the area associated with the position of the target object in the image 40 acquired by the image capture module 4. The sound processor 622 is also configured to apply noise reduction and/or amplification to the extracted voice signal before sending it to the audio output device 7. In this embodiment, the audio output device 7 takes the form of a pair of earphones, each connected to one of the side supporters 32 and coupled to the processor 6. It receives the voice signal output by the processor 6 and outputs it as sound. In this way, a user wearing the hearing aid system 2 according to the present invention can selectively hear the voice of the target speaker.

In this embodiment, the processor 6 further includes a communication unit 9 that can communicate with an external electronic device 8 (see FIG. 2). The communication unit 9 is configured to communicate with the external electronic device 8 over any one of Wi-Fi, Zigbee (registered trademark), Bluetooth (registered trademark), and near field communication (NFC).

The external electronic device 8 is implemented as a mobile terminal, a tablet computer, a notebook computer, or another portable device, and includes a control unit (not shown), a touch display 81, and a communication unit 82. The communication unit 82 is implemented by an application program installed on the external electronic device 8.

To have the hearing aid system 2 communicate with the external electronic device 8, the user operates the touch display 81 to enter a command that controls the communication unit 82, which then attempts to establish a communication channel between the hearing aid system 2 and the external electronic device 8. Once the communication channel is established, the processor 6 sends the image 40 acquired by the image capture module 4 to the external electronic device 8 through the communication unit 9.

The control unit of the external electronic device 8 controls the touch display 81 to display the image 40 containing the human face objects 401 and lets the user select one of the human face objects 401 as the target object. The external electronic device 8 then generates an external control signal indicating the position of the target object in the image 40 and sends it to the hearing aid system 2 through the communication unit 82. When the processor 6 receives the external control signal, the image analysis module 61 bypasses the operations for selecting a target object from the identified human face objects 401, and the audio processing module 62 begins operating based on the external control signal.

As those skilled in the art will readily appreciate, the wearable accessory 3 further includes means, such as a battery, for supplying power to the electronic components mounted on it. The wearable accessory 3 may also be provided with a physical switch that lets the user turn the hearing aid system 2 on and off.

The hearing aid system 2 according to the present invention, configured as shown in FIG. 1 and FIG. 2, carries out an example of the audio signal processing method according to the present invention, as shown in FIG. 3.

In step 100, the hearing aid system 2 is activated, for example after the user has put on the wearable accessory 3 in the form of eyeglasses. Once activated, the sound collection module 5 begins collecting sound information from its surroundings.

In step 102, the speech detection unit 621 detects whether the sound information collected by the sound collection module 5 contains speech content. If speech content is detected, the flow proceeds to step 104. If not, the detection is performed again on the sound information collected by the sound collection module 5.

In step 104, the image capture module 4 is caused to acquire the image 40.

In step 106, the processor 6 determines whether a communication channel has been established between the hearing aid system 2 and the external electronic device 8. If no communication channel has been established, the hearing aid system 2 operates in a first mode and the flow proceeds to step 110. If a communication channel has been established, the hearing aid system 2 operates in a second mode and the flow proceeds to step 130.

In the first mode, in step 110, the image analysis module 61 identifies the human face objects 401 in the image 40 acquired by the image capture module 4.

In step 112, the image analysis module 61 determines the object information for each identified human face object 401. The depth analysis unit 611 determines the relative depth of each identified human face object 401 in the image 40 with respect to the hearing aid system 2. The angle analysis unit 612 determines the included angle between the reference axis 402 in the image 40 and the position of each identified human face object 401 in the image 40 (see FIG. 5). The turn analysis unit 613 determines the angle by which each identified human face object 401 in the image 40 is turned.

In step 114, the likelihood classifier 614 determines the likelihood classification of each identified human face object 401 based on its object information.

In step 116, the lip movement detection unit 615 detects whether the lips are moving in each identified human face object 401 classified as high likelihood (or as medium likelihood, if none is classified as high likelihood). An identified human face object 401 in which lip movement is detected is selected as the target object.

In step 118, the sound processor 622 performs the beamforming operation on the sound information to generate an output audio signal containing a voice signal extracted from the area associated with the target object in the image 40.

In step 120, the output audio signal is sent to the audio output device 7 and output.

On the other hand, when operating in the second mode (for example, when the hearing aid system 2 is in communication with the external electronic device 8), in step 130 the processor 6 sends the image 40 to the external electronic device 8 via the communication unit 9. On receiving the image 40, the external electronic device 8 displays it on the touch display 81, and the user manually selects one or more of the identified human face objects 401 as target objects.

In step 132, the processor 6 receives from the external electronic device 8 an external control signal indicating the position, in the image 40, of the target object selected by the user. The process then proceeds to step 118.
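How the external electronic device 8 turns a touch into a target position is not described. A minimal Python sketch of one possibility follows: the tap is scaled from display coordinates to image coordinates and tested against face bounding boxes. The function name, the bounding-box representation, and the choice of the box centre as the reported position are all hypothetical.

```python
def target_position_from_tap(tap_xy, display_size, image_size, face_boxes):
    """Map a tap on the display to the centre of the tapped face box.

    face_boxes is a list of (left, top, width, height) tuples in image
    coordinates. Returns an (x, y) position in image coordinates, or
    None when the tap does not fall on any face box.
    """
    # Scale the tap from display pixels to image pixels.
    scale_x = image_size[0] / display_size[0]
    scale_y = image_size[1] / display_size[1]
    x = tap_xy[0] * scale_x
    y = tap_xy[1] * scale_y
    for left, top, width, height in face_boxes:
        if left <= x <= left + width and top <= y <= top + height:
            return (left + width / 2.0, top + height / 2.0)
    return None
```

The returned position could then be carried in the external control signal; the actual signal format is outside the scope of this sketch.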

Note that although using the lip movement detection unit 615 can greatly improve the accuracy of target object selection, the lip movement detection unit 615 need not be activated, for example so that sound from certain predetermined objects is not erroneously filtered out automatically.

If the lip movement detection unit 615 is not activated, an identified human face object 401 with high likelihood is regarded as the target object. If no identified human face object 401 with high likelihood is present, an identified human face object 401 with medium likelihood is regarded as the target object. If neither an identified human face object 401 with high likelihood nor one with medium likelihood is present, the sound processor 622 may simply extract the audio information from the front direction.
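The selection rules of step 116 and the fallback described above can be sketched in Python as follows. The dictionary keys and the function name are hypothetical. An empty result means that no target was selected, in which case the caller may fall back to the front direction; the embodiment states this fallback only for the case where lip movement detection is not activated, so applying it elsewhere is an assumption.

```python
def select_targets(faces, use_lip_detection=True):
    """Return the ids of the face objects chosen as target objects.

    faces is a list of dicts with the keys 'id', 'likelihood'
    ('high', 'medium' or 'low') and 'lips_moving' (bool).
    """
    # Prefer the high-likelihood tier; use medium only if high is empty.
    for tier in ("high", "medium"):
        candidates = [f for f in faces if f["likelihood"] == tier]
        if candidates:
            break
    else:
        return []  # neither tier is present: no target object
    if use_lip_detection:
        # Keep only the candidates whose lips were detected as moving.
        return [f["id"] for f in candidates if f["lips_moving"]]
    return [f["id"] for f in candidates]
```

With lip movement detection enabled the result narrows to faces that appear to be speaking; with it disabled every face in the preferred tier is kept, which avoids filtering out a speaker whose lip movement was missed.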

In some cases, the hearing aid system 2 may omit the turn analysis unit 613; likewise, the operations associated with the turn analysis unit 613 may be omitted from the method according to the present invention.

In the present embodiment, the image capture module 4 is mounted between the pair of lenses 310 of the mounting tool 3, but it may be provided elsewhere on the mounting tool 3.

As described above, the hearing aid system 2 according to the present invention offers two ways of selecting a target speaker from the acquired image 40. In the first mode, the image analysis module 61 performs the selection automatically based on the object information. In the second mode, the target speaker is selected by the user. Thus, with the hearing aid system 2 according to the present invention, a user in a noisy environment can selectively listen to sound coming from the position of one or several target speakers among a plurality of speakers.

2 Hearing aid system
3 Mounting tool
31 Front supporter
310 Lens
32 Side supporter
4 Image capture module
40 Image
401 Human face object
402 Reference axis
5 Sound collection module
51 Microphone
6 Processor
61 Image analysis module
611 Depth analysis unit
612 Angle analysis unit
613 Turn analysis unit
614 Likelihood classifier
615 Lip movement detection unit
62 Audio processing module
621 Speech detection unit
622 Sound processor
7 Audio output device
8 External electronic device
81 Touch display
82 Communication unit
9 Communication unit
θ1, θ2, θ3 Included angle

Claims

A method for processing an audio signal, performed using a hearing aid system comprising an image capture module, a sound collection module, and a processor, the method comprising:
step (a) of collecting audio information around the hearing aid system using the sound collection module;
step (b) of acquiring an image of the surroundings of the hearing aid system using the image capture module;
step (c) of performing, using the processor, a directional signal processing operation on the audio information collected by the sound collection module to generate an output audio signal including a voice signal extracted from an area related to the position of a target object in the image acquired by the image capture module; and
step (d) of outputting the output audio signal,
wherein, in step (b), the image capture module is able to acquire the image after the processor detects that the audio information collected by the sound collection module includes speech content,
wherein step (c) includes:
sub-step (c1) of identifying, using the processor, the presence of human face objects in the image acquired by the image capture module;
sub-step (c2) of determining, using the processor, object information related to each identified human face object;
sub-step (c3) of determining, using the processor, a likelihood classification for each identified human face object based on the object information related to that human face object, the likelihood classification being a classification of the likelihood that the person associated with the identified human face object is a target speaker; and
sub-step (c4) of selecting, using the processor, one of the identified human face objects in the image as the target speaker, the selected human face object having a higher likelihood than the other human face objects, and
wherein, in sub-step (c2), the object information of each identified human face object includes at least one of:
the relative depth of the identified human face object in the image with respect to the hearing aid system; and
an included angle formed between a reference axis in the image, representing an axis of the image capture module, and the position of the identified human face object in the image.

The method according to claim 1, wherein, in sub-step (c2), the object information of each identified human face object further includes at least the angle by which the identified human face object is turned in the image.

The method according to claim 1, wherein:
in sub-step (c2), the object information includes relative depth information and relative direction information of the associated identified human face object with respect to the hearing aid system;
different weights are predetermined for different relative depths in the relative depth information;
different weights are predetermined for different relative directions in the relative direction information;
the likelihood classification of each identified human face object is predetermined according to the weights associated with its relative depth information and relative direction information; and
the relative direction information indicates:
an included angle formed between a reference axis in the image, representing an axis of the image capture module, and the position of the identified human face object in the image; and
the angle by which the associated identified human face object is turned in the image.

The method according to claim 1, wherein a plurality of the images are acquired in step (b), and the object information in sub-step (c2) indicates whether there is movement of the lips of the associated identified human face object in the images.

The method according to any one of claims 1 to 4, wherein step (c) includes a sub-step in which the processor operates in a selected one of:
a first mode in which the processor identifies the presence of the human face objects in the image acquired by the image capture module, determines object information related to each identified human face object, determines a likelihood classification for each identified human face object based on the object information related to that human face object, the likelihood classification being a classification of the likelihood that the person associated with the identified human face object is a target speaker, and selects, as the target object, at least one identified human face object having a higher likelihood classification in the image than the other identified human face objects; and
a second mode in which the processor sends the image acquired by the image capture module to an external electronic device and receives, from the external electronic device, an external control signal indicating the position of the target object in the image.

A hearing aid system for assisting the hearing of a hearing-impaired person, comprising:
a sound collection module that collects audio information around the hearing aid system;
an image capture module that captures an image of the surroundings of the hearing aid system;
a processor that is coupled to the sound collection module and the image capture module and includes an audio processing module that performs a directional signal processing operation on the audio information collected by the sound collection module to generate an output audio signal, the output audio signal including a voice signal extracted from an area related to the position of a target object in the image acquired by the image capture module; and
an audio output device that is coupled to the processor and outputs the output audio signal,
wherein the processor is configured to be operable in a first mode in which the processor identifies the presence of human face objects in the image acquired by the image capture module, determines object information related to each identified human face object, determines a likelihood classification for each identified human face object based on the object information related to that human face object, the likelihood classification being a classification of the likelihood that the person associated with the identified human face object is a target speaker, and selects, as the target object, at least one identified human face object having a higher likelihood classification in the image than the other identified human face objects.

The hearing aid system according to claim 6, wherein:
the processor further includes an image analysis module configured to identify the presence of human face objects in the image acquired by the image capture module, to determine object information related to the identified human face objects, and to determine a likelihood classification for each identified human face object based on the object information;
the likelihood classification determined for each identified human face object in the image is a classification of the likelihood that the person associated with the identified human face object is a target speaker;
the audio processing module selects at least one of the identified human face objects in the image as the target object, the at least one identified human face object selected as the target object having a higher likelihood than the other identified human face objects in the image; and
the image analysis module includes at least one of:
a depth analysis unit that determines the relative depth of the identified human face object in the image with respect to the hearing aid system;
an angle analysis unit that determines an included angle formed between a reference axis representing an axis of the image capture module and the position of the identified human face object in the image; and
a turn analysis unit that determines the angle by which the identified human face object is turned in the image.

The hearing aid system according to claim 7, wherein:
the object information determined by the image analysis module includes relative depth information and relative direction information of the associated identified human face object with respect to the hearing aid system;
different weights are predetermined for different relative depths in the relative depth information;
different weights are predetermined for different relative directions in the relative direction information; and
the image analysis module includes a likelihood classifier that determines the likelihood classification of each identified human face object according to the weights associated with the relative depth information and the relative direction information in its object information.

The hearing aid system according to claim 8, wherein the relative direction information indicates:
an included angle formed between a reference axis in the image, representing an axis of the image capture module, and the position of the identified human face object in the image; and
the angle by which the associated identified human face object is turned in the image.

The hearing aid system according to claim 7, wherein:
the image capture module is configured to capture a plurality of images;
the image analysis module includes a lip movement detection unit; and
the object information determined by the image analysis module includes information, detected by the lip movement detection unit, indicating whether there is lip movement of the corresponding identified human face object.

The hearing aid system according to any one of claims 6 to 10, wherein:
the audio processing module includes a speech detection unit that detects speech content that may be included in the audio information collected by the sound collection module; and
the image capture module is able to acquire the image after the speech detection unit detects speech content in the audio information.

The hearing aid system according to any one of claims 6 to 11, further comprising a mounting tool to which the image capture module, the sound collection module, and the processor are attached, wherein:
the mounting tool comprises a front supporter to which the image capture module is attached and two side supporters respectively connected to both sides of the front supporter;
the sound collection module is a microphone array including a plurality of microphones;
each of the microphones is provided on one of the front supporter and the two side supporters;
the processor is provided on one of the two side supporters;
the audio output device has a pair of earphones respectively connected to the two side supporters; and
the mounting tool comprises an eyeglass frame having a lens frame serving as the front supporter and a pair of temples serving as the side supporters.

A hearing aid system for assisting the hearing of a hearing-impaired person, comprising:
a sound collection module that collects audio information around the hearing aid system;
an image capture module that captures an image of the surroundings of the hearing aid system;
a processor that is coupled to the sound collection module and the image capture module and includes an audio processing module that performs a directional signal processing operation on the audio information collected by the sound collection module to generate an output audio signal, the output audio signal including a voice signal extracted from an area related to the position of a target object in the image acquired by the image capture module; and
an audio output device that is coupled to the processor and outputs the output audio signal,
wherein the processor is configured to be operable in a second mode in which the processor sends the image acquired by the image capture module to an external electronic device and receives, from the external electronic device, an external control signal indicating the position of the target object in the image.