JP4539385B2

JP4539385B2 - Imaging device, imaging control program

Info

Publication number: JP4539385B2
Application number: JP2005074779A
Authority: JP
Inventors: 一記喜多
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2005-03-16
Filing date: 2005-03-16
Publication date: 2010-09-08
Anticipated expiration: 2025-03-16
Also published as: JP2006261900A

Description

The present invention relates to an imaging apparatus and an imaging control program that display invisible information together with a subject image.

Conventionally, a camera device with a function for displaying the temperature of a subject has been proposed. This camera device comprises an optical viewfinder that admits light from the subject and forms an image, a temperature display unit disposed below the optical viewfinder, a television camera that captures the optical viewfinder and the temperature display unit in the same field of view, and an infrared detector. Based on the signal detected by the infrared detector, the temperature of the subject is shown digitally on the temperature display unit, and the television camera captures both the temperature display unit showing this temperature and the optical viewfinder in which the subject is imaged, so that the subject and its temperature are displayed on a television monitor (see, for example, Patent Document 1).
Japanese Patent No. 2747426

However, in such a conventional camera device, the television camera captures the optical viewfinder in which the subject is imaged together with the temperature display shown below it, and the result is displayed on the monitor, so the subject and its temperature appear at different locations on the monitor. Consequently, when several subjects are intermingled in a complex way, it is unclear which subject the displayed temperature belongs to. Likewise, when parts at different temperatures coexist within a subject, it is unclear which part of the subject the displayed temperature refers to.

Also, when a user wants to photograph a desired subject that is emitting a specific sound, and that subject is one of several subjects, it may not be possible to easily identify, even by looking through the viewfinder, which subject is the one emitting the specific sound.

The present invention has been made in view of these conventional problems, and an object thereof is to provide an imaging apparatus and an imaging control program capable of clearly displaying invisible information in a subject image. Another object of the present invention is to provide an imaging apparatus and an imaging control program that make it possible to easily and quickly photograph a desired subject that is emitting a specific sound.

To solve the above problems, the imaging apparatus according to the invention of claim 1 comprises: display means; imaging means; first display control means for causing the display means to display an image captured by the imaging means; ambient sound detection means for detecting ambient sound within the imaging range of the imaging means; second display control means for generating visible information representing the ambient sound detected by the ambient sound detection means and causing the display means to display this visible information in correspondence with the position, within the imaging range, of the ambient sound detected by the ambient sound detection means; selection means for selecting an arbitrary sound included in the ambient sound detected by the ambient sound detection means, by designating an arbitrary portion of the visible information representing the ambient sound displayed on the display means by the second display control means; sound control means for controlling the ambient sound detected by the ambient sound detection means and applying emphasis processing or suppression processing to the sound selected by the selection means; and recording means for recording the ambient sound in which the sound has been emphasized or suppressed by the sound control means.

Accordingly, the display means not only displays the ambient sound within the imaging range of the imaging means as visible information, but also displays this visible information in correspondence with the position of the ambient sound within the imaging range. By viewing the visible information displayed in correspondence with positions within the imaging range, the user can therefore clearly see the presence of ambient sound in the subject image in relation to the subject image. Moreover, by designating an arbitrary portion of the visible information representing the ambient sound displayed on the display means, the user can make a recording in which the sound from that portion is emphasized or suppressed.

In the imaging apparatus according to the invention of claim 2, the second display control means causes the display means to display the visible information superimposed on the image displayed on the display means by the first display control means. Accordingly, the display means not only displays the ambient sound within the imaging range of the imaging means as visible information, but also displays this visible information superimposed on the image captured by the imaging means. By viewing the visible information superimposed on the captured image, the user can therefore clearly see the presence of ambient sound in the subject image in relation to the subject image.

In the imaging apparatus according to the invention of claim 3, the visible information is rendered translucent. The user can therefore clearly see the presence of ambient sound in the subject image in relation to the subject image.

In the imaging apparatus according to the invention of claim 4, the visible information is a two-dimensional image representing the distribution of the ambient sound based on sound pressure level. The user can therefore visually recognize where sound is being generated and at what sound pressure.

In the imaging apparatus according to the invention of claim 5, the two-dimensional image differs in color according to sound pressure level. The user can therefore visually recognize where sound is being generated and at what sound pressure.
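As a rough illustration of the color-coded sound pressure image described for claims 4 and 5, the following Python sketch converts a grid of RMS sound pressures (one value per scan direction) into a grid of colors. The band thresholds, the colors, and the names `spl_db`, `color_for`, and `sound_map` are assumptions made for this example; the text above does not specify them.

```python
import math

P_REF = 20e-6  # standard reference pressure for dB SPL, in pascals

# Hypothetical color bands: (upper limit in dB, RGB color). Illustrative only.
BANDS = [(40.0, (0, 0, 255)), (60.0, (0, 255, 0)), (80.0, (255, 255, 0))]
LOUDEST = (255, 0, 0)  # color used at or above the last band limit


def spl_db(rms_pressure_pa):
    """Convert an RMS sound pressure in pascals to dB SPL."""
    # Clamp to a tiny positive value so that silence does not hit log10(0).
    return 20.0 * math.log10(max(rms_pressure_pa, 1e-12) / P_REF)


def color_for(level_db):
    """Pick the color of the first band whose upper limit exceeds the level."""
    for upper, rgb in BANDS:
        if level_db < upper:
            return rgb
    return LOUDEST


def sound_map(pressure_grid):
    """Map a 2-D grid of RMS pressures to a 2-D grid of RGB colors."""
    return [[color_for(spl_db(p)) for p in row] for row in pressure_grid]
```

Each cell of the returned grid corresponds to one scan direction, so the grid can be scaled to the display size and drawn over the captured image.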

In the imaging apparatus according to the invention of claim 6, the recording means records the ambient sound in which the sound has been emphasized or suppressed together with the image captured by the imaging means. It is therefore possible to record an image accompanied by audio in which the sound from a specific portion is emphasized or suppressed.

In the imaging apparatus according to the invention of claim 7, the sound control means calculates the direction of the designated ambient sound based on position coordinates obtained from an operation on an arbitrary portion of the visible information representing the ambient sound displayed on the display means by the second display control means, and applies emphasis processing or suppression processing to the sound from the calculated direction.

In the imaging apparatus according to the invention of claim 8, the sound control means calculates the direction of the designated ambient sound based on the position coordinates and on the focal length of the imaging means and/or the size of the image, and applies emphasis processing or suppression processing to the sound from the calculated direction.
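One way to picture the direction calculation of claims 7 and 8 is a simple pinhole-camera model, sketched below in Python. This is an assumption for illustration, not the formula used in the patent: the function name `touch_to_direction`, the sensor-size parameters, and the handling of digital zoom as a center crop are all hypothetical.

```python
import math


def touch_to_direction(x, y, width, height, focal_mm,
                       sensor_w_mm, sensor_h_mm, zoom=1.0):
    """Estimate the direction (in degrees) of a touched display position.

    x, y          : touched position in image pixels, origin at the top-left
    width, height : image size in pixels
    focal_mm      : lens focal length in millimeters
    sensor_*_mm   : assumed physical sensor dimensions in millimeters
    zoom          : digital zoom factor, modeled here as a center crop
    """
    # Offset of the touched point from the optical axis, on the sensor plane.
    dx = (x / width - 0.5) * sensor_w_mm / zoom
    dy = (y / height - 0.5) * sensor_h_mm / zoom
    # Pinhole model: angle = atan(offset / focal length).
    theta_x = math.degrees(math.atan2(dx, focal_mm))
    theta_y = math.degrees(math.atan2(dy, focal_mm))  # positive is downward on screen
    return theta_x, theta_y
```

Under this model a longer focal length or a larger digital zoom narrows the field of view, so the same touch offset maps to a smaller angle, which is consistent with the claim's dependence on focal length and image size.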

The imaging apparatus according to the invention of claim 9 comprises: imaging means; feature data storage means storing sound feature data; ambient sound detection means for detecting ambient sound; comparison means for comparing the sound feature data stored in the feature data storage means with sound data in the ambient sound detected by the ambient sound detection means; and subject detection means for detecting, based on the comparison by the comparison means, a subject within the imaging range of the imaging means that is generating ambient sound approximating the feature data. Therefore, when a user wants to photograph a desired subject that is emitting a specific sound, provided that the feature data of that specific sound is stored, the user can easily identify which subject is the desired one even when it is one of several subjects, and can photograph the desired subject easily and quickly.
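The comparison and subject detection of claim 9 could be sketched as follows. The patent text here does not say how feature data are compared, so cosine similarity, the threshold value, and the names `cosine_similarity` and `find_matching_direction` are assumptions chosen only to make the idea concrete.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_matching_direction(reference, features_by_direction, threshold=0.9):
    """Return the scan direction whose features best match the reference.

    reference             : stored feature vector of the target sound
    features_by_direction : dict mapping a direction to its feature vector
    threshold             : hypothetical minimum similarity for a match
    Returns None when no direction reaches the threshold.
    """
    best_dir, best_score = None, threshold
    for direction, feats in features_by_direction.items():
        score = cosine_similarity(reference, feats)
        if score >= best_score:
            best_dir, best_score = direction, score
    return best_dir
```

The returned direction identifies where in the imaging range the matching sound originates, which is the information a focus or display step would need to single out the subject.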

The imaging apparatus according to the invention of claim 10 further comprises focus control means for focusing the imaging means on the subject detected by the subject detection means. Therefore, if, for example, the sound feature data of a specific wild bird is stored, it is possible to focus on and photograph a subject that is emitting sound whose data approximates that of the wild bird.

The imaging apparatus according to the invention of claim 11 further comprises display means for displaying the subject detected by the subject detection means. By viewing the display means, the user can therefore shoot while confirming the desired subject.

The imaging apparatus according to the invention of claim 12 further comprises sound control means for controlling the ambient sound detected by the ambient sound detection means and applying emphasis processing or suppression processing to the sound from the subject detected by the subject detection means, and recording means for recording the ambient sound in which the sound has been emphasized or suppressed by the sound control means. It is therefore possible to record ambient sound in which the sound of the desired subject is emphasized or suppressed.

In the imaging apparatus according to the invention of claim 13, the recording means records the ambient sound together with the image captured by the imaging means. It is therefore possible to record both the desired subject and the ambient sound in which the sound of that subject is emphasized or suppressed.

The imaging apparatus according to the invention of claim 14 further comprises designation means for designating arbitrary feature data from among a plurality of sets of sound feature data, and the storage means stores the feature data designated by the designation means. Thus, through designation by the designation means, the sounds of subjects having various sound feature data, and the subjects themselves, can be recorded.

In the imaging apparatus according to the invention of claim 15, the ambient sound detection means is a microphone array having a plurality of microphones.

The imaging control program according to the invention of claim 16 causes a computer of an imaging apparatus comprising display means, imaging means, and ambient sound detection means for detecting ambient sound within the imaging range of the imaging means to function as: first display control means for causing the display means to display an image captured by the imaging means; second display control means for generating visible information representing the ambient sound detected by the ambient sound detection means and causing the display means to display this visible information in correspondence with the position, within the imaging range, of the ambient sound detected by the ambient sound detection means; selection means for selecting an arbitrary sound included in the ambient sound detected by the ambient sound detection means, by designating an arbitrary portion of the visible information representing the ambient sound displayed on the display means by the second display control means; sound control means for controlling the ambient sound detected by the ambient sound detection means and applying emphasis processing or suppression processing to the sound selected by the selection means; and recording means for recording the ambient sound in which the sound has been emphasized or suppressed by the sound control means. Accordingly, when the computer executes processing in accordance with this program, the same operation and effects as those of the invention of claim 1 are obtained.

The imaging control program according to the invention of claim 17 causes a computer of an imaging apparatus comprising imaging means, feature data storage means storing sound feature data, and ambient sound detection means for detecting ambient sound to function as: comparison means for comparing the sound feature data stored in the feature data storage means with sound data in the ambient sound detected by the ambient sound detection means; and subject detection means for detecting, based on the comparison by the comparison means, a subject among those captured by the imaging means that is generating ambient sound approximating the feature data. Accordingly, when the computer executes processing in accordance with this program, the same operation and effects as those of the invention of claim 9 are obtained.

As described above, according to the inventions of claims 1 and 16, the ambient sound within the imaging range of the imaging means can be displayed on the display means as visible information, and this visible information can be displayed in correspondence with the position of the ambient sound within the imaging range. By viewing the visible information displayed in correspondence with positions within the imaging range, the user can therefore clearly see the presence of invisible information in the subject image in relation to the subject image. In addition, by designating an arbitrary portion of the visible information representing the ambient sound displayed on the display means, the user can make a recording in which the sound from that portion is emphasized or suppressed.

According to the invention of claim 2, the ambient sound within the imaging range of the imaging means can be displayed on the display means as visible information, and this visible information can be displayed superimposed on the image captured by the imaging means. By viewing the visible information superimposed on the captured image, the user can therefore clearly see the presence of ambient sound in the subject image in relation to the subject image.

According to the invention of claim 4, the ambient sound within the imaging range of the imaging means can be displayed on the display means as visible information, and this visible information can be displayed as a two-dimensional image representing the distribution of the ambient sound based on sound pressure level. The user can therefore visually recognize where sound is being generated and at what sound pressure.

According to the inventions of claims 9 and 17, when a user wants to photograph a desired subject that is emitting a specific sound, provided that the feature data of that specific sound is stored, the user can easily identify which subject is the desired one even when it is one of several subjects, and can photograph the desired subject easily and quickly.

(First Embodiment)
As shown in FIG. 1, the main body 101 of the digital camera 100 according to each embodiment of the present invention has an imaging lens 102 disposed at the upper front and a microphone array unit 103 provided below it. The microphone array unit 103 has a plurality of microphones (microphones M1 to Mn, described later), consisting of horizontally arrayed microphones and vertically arrayed microphones, provided at equal intervals. A cover body 104 that can be opened and closed is provided on one side surface, and a finder display unit 119 and a touch panel 132, described later, are disposed on the back side of the cover body 104.

FIG. 2 is a block diagram showing the circuit configuration of the digital camera 100 according to the first embodiment. The digital camera 100 has general functions such as AE, AWB, and AF. The imaging lens 102 consists of a zoom lens and a focus lens and is driven by a lens driving unit 105. An imaging element 109 composed of a CCD or the like is disposed on the optical axis of the imaging lens 102, and the imaging element 109 is connected to a driver 111.

The shooting/recording control unit 112 (hereinafter simply "control unit 112"), which controls the entire digital camera 100, is composed of a CPU, a ROM, a working RAM, and the like. The ROM stores various programs for causing the control unit 112 to control the above units, such as programs for AE, AF, and AWB control and a program for causing the control circuit 312 to function as the means constituting the present invention. The driver 111 is connected to the control unit 112 together with the lens driving unit 105, and the driver 111 drives the imaging element 109 based on a timing signal generated by the control unit 112.

The imaging lens 102 forms an image of the subject on the light receiving surface of the imaging element 109. The imaging element 109 is driven by the driver 111, and the analog imaging signal corresponding to the optical image of the subject is converted into digital data by the A/D converter 114 and output to the image signal processing unit 115.

The image signal processing unit 115 applies processing such as pedestal clamping to the input imaging signal, converts it into a luminance (Y) signal and color difference (UV) signals, and performs digital signal processing for improving image quality, such as auto white balance, contour enhancement, and pixel interpolation. The YUV data converted by the image signal processing unit 115 is stored sequentially in the image memory 116. In REC through mode, each time one frame of data (image data) has accumulated, it is converted into a video signal and sent via the subject through-image unit 113 and the image synthesis unit 117 to the finder/display unit 119, where it is displayed on screen as a through image.

In still image shooting mode, triggered by operation of the shutter key provided on the operation input unit 130 described later, the control unit 112 instructs the imaging element 109, the driver 111, and the image signal processing unit 115 to switch from through-image shooting mode to still image shooting mode. The image data obtained by the shooting process in still image shooting mode and temporarily stored in the image memory 116 is compressed and encoded by the image compression encoder/decompression decoder 120, temporarily stored in the encoded image memory 121, and finally recorded in an external memory (not shown) as a still image file in a predetermined format.

In moving image shooting mode, between the first and second shutter key operations, the multiple sets of image data stored sequentially in the image memory 116 are compressed sequentially by the image compression encoder/decompression decoder 120, stored sequentially in the encoded image memory 121, and then recorded in the external memory as a moving image file. In PLAY mode, the still image files and moving image files recorded in the external memory are read out to the image compression encoder/decompression decoder 120 in response to the user's selection operation, decompressed and decoded, expanded as YUV data, and then displayed on the display unit 119. The drive amount/focal length unit 126 detects the drive amounts and focal length of the zoom lens and focus lens of the imaging lens 102 and inputs them to the control unit 112.

The operation input unit 130 is connected to the control unit 112 via an input circuit 131, and is provided with a plurality of operation keys and switches such as a mode selection key, a shutter key, and a zoom key. The touch panel 132 is laminated on the display unit 119, and coordinate values based on touch signals from the touch panel 132 are also input to the control unit 112 via the input circuit 131.

The digital camera 100 also has a recording function for recording ambient sound in the moving image shooting mode, in a recording mode that records sound only, and in a (still image) shooting mode with sound. For this purpose it has microphones for detecting ambient sound, namely the n microphones M1 to Mn, consisting of horizontally arrayed microphones and vertically arrayed microphones, provided in the microphone array unit 103. The audio signals from the microphones M1 to Mn are amplified by the corresponding amplifiers 133, sample-held and digitized by the A/D conversion circuit 134, and supplied to the directional beam generation unit 135. The directional beam generation unit 135 consists of n delay units D1 to Dn and amplifiers A1 to An, provided in correspondence with the microphones M1 to Mn, and an adder 136 that adds the signals from the amplifiers A1 to An. The directional beam generation unit 135 is controlled by a first directivity control unit 144 (scan control of the extraction direction) and a second directivity control unit 145 (control of the sound emphasis/suppression direction), and the control unit 112 supplies the second directivity control unit 145 with the direction coordinates 146 in which sound is to be emphasized or suppressed.
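The arrangement of per-microphone delay units, amplifiers, and a single adder corresponds to what is commonly called a delay-and-sum beamformer. The Python sketch below illustrates that general technique for a uniform linear array; it is not taken from the patent. The function names, the plane-wave assumption, the rounding of delays to whole samples, and the default equal gains are all assumptions for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def steering_delays(n_mics, spacing_m, angle_deg, sample_rate):
    """Whole-sample delays (the role of D1..Dn) that align a plane wave.

    A wave from angle_deg is assumed to reach microphone k later than
    microphone 0 by k * spacing * sin(angle) / c. Each channel is delayed
    so that all channels line up with the latest arrival.
    """
    lags = [k * spacing_m * math.sin(math.radians(angle_deg))
            / SPEED_OF_SOUND * sample_rate for k in range(n_mics)]
    latest = max(lags)
    return [round(latest - lag) for lag in lags]


def delay_and_sum(channels, delays, gains=None):
    """Delay each channel, scale it (the role of A1..An), and add them up."""
    gains = gains or [1.0 / len(channels)] * len(channels)
    length = len(channels[0])
    out = [0.0] * length
    for signal, delay, gain in zip(channels, delays, gains):
        for i in range(delay, length):
            out[i] += gain * signal[i - delay]
    return out
```

Sound arriving from the steered direction adds coherently, while sound from other directions adds out of phase and is attenuated. Scanning the steering angle over a range and recording the output level per angle would give the kind of per-direction data the scan control is described as collecting.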

The audio data obtained as the result of addition by the adder 136 is stored in the input audio memory 137, organized by scan direction, and in the audio memory 138. The audio data stored in the audio memory 138 is compressed sequentially by the audio compression encoder/decompression decoder 139 and stored sequentially in the encoded audio memory 140. The control unit 112 generates a moving image file with sound containing this compressed audio data and the compressed moving image data, and records it in the external memory.

In PLAY mode, the audio data of a moving image file recorded in the external memory is read out to the audio compression encoder/decompression decoder 139 in response to the user's selection operation, and is decompressed and decoded. The decompressed and decoded audio data is temporarily stored in the encoded audio memory 140, converted into an analog signal by the D/A converter 141, and supplied via the amplifier 142 to the speaker 143, where it is reproduced as sound. The timing of audio recording is not limited to moving image shooting; it may also be during a recording operation in the still image shooting mode with sound, or during a recording operation in the recording mode or the after-recording mode.

Meanwhile, features are extracted by the feature extraction unit 150 from the input audio data stored by scan direction in the memory 137, and the extracted features are stored in the feature extraction data memory 151. The two-dimensional image generation unit 152 generates a two-dimensional image based on these features, and the translucent image conversion unit 153 converts the two-dimensional image from the two-dimensional image generation unit 152 into a translucent image based on the translucency pattern generation unit 154 and outputs it to the image synthesis unit 117. The image synthesis unit 117 combines this translucent image with the subject through image from the subject through-image unit 113 and outputs the result to the display unit 119, so that the translucent image is displayed superimposed on the subject through image.
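The superimposition performed by the image synthesis unit can be pictured with the Python sketch below. The text attributes the translucency to a pattern generator without detailing it here, so plain per-pixel alpha blending is used as a stand-in; the function names and the default alpha of 0.5 are assumptions, and both images are assumed to have the same dimensions.

```python
def blend_pixel(base, overlay, alpha):
    """Blend one RGB pixel: alpha = 0 keeps the base, alpha = 1 shows the overlay."""
    return tuple(round((1.0 - alpha) * b + alpha * o)
                 for b, o in zip(base, overlay))


def overlay_sound_map(through_image, sound_image, alpha=0.5):
    """Superimpose a sound-distribution image on a through image.

    Both arguments are 2-D lists of RGB tuples with identical dimensions.
    """
    return [[blend_pixel(b, o, alpha) for b, o in zip(base_row, over_row)]
            for base_row, over_row in zip(through_image, sound_image)]
```

With an alpha around 0.5 the subject remains visible beneath the sound distribution, which is the stated purpose of making the visible information translucent.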

In the present embodiment configured as described above, the control unit 112 executes processing in accordance with the program, as shown in the series of flowcharts in FIGS. 3 and 4. First, it determines whether the recording/moving image shooting mode has been set (step S101 in FIG. 3). If a mode other than the recording/moving image shooting mode has been set, the processing for that other mode is executed (step S102). If the recording/moving image shooting mode has been set, photometry processing and WB processing are executed (step S103), zoom processing and AF processing are performed (step S104), and the lens focal length (f), which changes as the lens is driven by the lens driving unit 105, the digital zoom magnification (M), and the like are calculated (step S105).

Next, the subject through image is displayed on the finder display unit 119 together with the aiming mark, distance information, and the like (step S106). Specifically, as shown in (a) of the explanatory diagram of FIG. 5, the imaging signal from the image sensor 109 is converted into digital data by the A/D converter 114 and processed by the image signal processing unit 115, whereby the subject through image 160 is displayed on the finder display unit 119.

Next, it is determined whether an audio scan has been instructed by an operation on the operation input unit 130 (step S107). If not, the process proceeds to step S121 in FIG. 4, described later. If an audio scan has been instructed, the scan range (θxmin, θxmax, etc.) and the scan interval (Δθx, etc.) are set according to the image size (X′, Y′), the focal length (f), and the digital zoom magnification (M) (step S108). Then θy = θymin is set (step S109) and θx = θxmin is set (step S110). The delay time tD(j, k) of each delay device D(k) for focusing on the scanning sound source direction (θx, θy) is then set using the following equations (step S111).
tDx(j) = (m − j)·dx·sin θx / c,
tDy(k) = (n − k)·dy·sin θy / c,
tD(j, k) = √[{tDx(j)}² + {tDy(k)}²]
(where j: horizontal microphone number 1 to m, k: vertical microphone number 1 to n, d: microphone interval, c: speed of sound)
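As a minimal sketch of the delay setting in step S111, the equations above might be evaluated as follows. The function name `steering_delays`, the dictionary return type, and the value of 343 m/s for c are illustrative assumptions and are not part of the disclosure. Note that the root-sum-square form, as written, always yields a non-negative delay.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for c (air at roughly 20 degrees C)


def steering_delays(m, n, dx, dy, theta_x_deg, theta_y_deg, c=SPEED_OF_SOUND):
    """Return {(j, k): tD(j, k)} in seconds for an m x n microphone array.

    j = 1..m indexes the horizontal microphones, k = 1..n the vertical ones,
    dx and dy are the microphone intervals in metres.
    """
    sin_x = math.sin(math.radians(theta_x_deg))
    sin_y = math.sin(math.radians(theta_y_deg))
    delays = {}
    for j in range(1, m + 1):
        t_dx = (m - j) * dx * sin_x / c          # tDx(j)
        for k in range(1, n + 1):
            t_dy = (n - k) * dy * sin_y / c      # tDy(k)
            delays[(j, k)] = math.hypot(t_dx, t_dy)  # sqrt(tDx^2 + tDy^2)
    return delays


if __name__ == "__main__":
    for key, value in sorted(steering_delays(2, 2, 0.02, 0.02, 30.0, 10.0).items()):
        print(key, f"{value * 1e6:.2f} us")
```

The last microphone in each direction (j = m, k = n) receives zero delay, which matches the convention used later for the two-microphone case.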

Next, the sound directed in the θx, θy direction is recorded for a predetermined time in the input audio memory Ms (input audio memory 137) as a function of θx, θy, and t (step S112). Then θx = θx + Δθx is set (step S113), and it is determined whether θx > θxmax (step S114). If NO, the processing from step S111 is repeated. If YES, θy = θy + Δθy is set (step S115), and it is determined whether θy > θymax (step S116). If NO, the processing from step S110 is repeated. If YES, feature data Cs(θx, θy) is calculated from the extracted audio stored in Ms(θx, θy) of the input audio memory 137 (step S117), and this feature data Cs(θx, θy) is plotted on the θx, θy coordinates to draw a two-dimensional image (step S118). The feature extraction data image is then made translucent (step S119), and the translucent feature extraction data image is displayed on the finder display unit 119 superimposed on the through image (step S120).
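The raster scan of steps S109 to S117 can be sketched as a pair of nested loops. This is only an illustration under stated assumptions: `capture` is a hypothetical callback standing in for the beam-steered audio input of steps S111 and S112, and the RMS level is assumed as the feature Cs, since the text leaves the choice of feature open.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples (assumed feature)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def scan_feature_map(capture, tx_min, tx_max, dtx, ty_min, ty_max, dty):
    """Scan all directions, then return {(theta_x, theta_y): Cs}."""
    ms = {}                                   # stands in for input audio memory Ms
    theta_y = ty_min                          # S109
    while theta_y <= ty_max:                  # loop ends when S116 is YES
        theta_x = tx_min                      # S110
        while theta_x <= tx_max:              # loop ends when S114 is YES
            ms[(theta_x, theta_y)] = capture(theta_x, theta_y)  # S111, S112
            theta_x += dtx                    # S113
        theta_y += dty                        # S115
    # S117: compute the feature data once the whole range has been scanned
    return {direction: rms(samples) for direction, samples in ms.items()}


if __name__ == "__main__":
    # Hypothetical scene: a single source at (10, 0) degrees.
    def fake_capture(tx, ty):
        return [1.0, -1.0] if (tx, ty) == (10, 0) else [0.0, 0.0]

    cs = scan_feature_map(fake_capture, -10, 10, 10, -5, 5, 5)
    print(max(cs, key=cs.get))
```

Integer degree steps are used in the usage example so that the loop bounds are hit exactly; with fractional steps an index-based loop would avoid floating-point drift.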

That is, as shown in FIG. 5, the audio data from the microphone array unit 103 is delay-controlled under the control of the directivity control units 144 and 145 and stored in the input audio memory 137 per scanning direction. Feature data is extracted from the audio input while scanning the directivity and stored in the feature extraction data memory 151, and a two-dimensional image 161 is generated from the extracted feature data. As shown in (b) of the figure, this two-dimensional image is colored differently according to the sound pressure level. The two-dimensional image shown in (b) is then made translucent, and the resulting translucent feature extraction data image 162, shown in (c), is superimposed on the subject through image 160 of (a). As a result, the finder display unit 119 displays the subject through image 160 and the translucent feature extraction data image 162 superimposed on each other, as shown in (d). By viewing the display state shown in (d), the user can therefore visually recognize which of a plurality of subjects is emitting sound, and at what sound pressure.

Then, as shown in the flowchart of FIG. 4, it is determined whether the audio direction of a subject has been selected by the user's operation on the operation input unit 130 (step S121). If no selection has been made, the process skips steps S122 and S123 and proceeds to step S124. When the user operates the operation input unit 130 to move the video focus aiming mark 163 onto the subject whose sound is to be enhanced or suppressed, as shown in FIG. 6 (5), the distance to the subject is measured and the focused subject distance, "4M", is displayed as shown in the figure. When the user then touches the video focus aiming mark 163 on the subject with the finger F and presses the "voice enhancement setting button" or the "voice suppression setting button" on the operation input unit 130, the audio direction of the subject is selected.

Accordingly, in the flowchart of FIG. 4, when voice enhancement has been set, the determination in step S121 is YES and the process proceeds to step S122, where the input coordinates are stored in the RAM as the position coordinates (x, y) of the subject or sound source (step S122). As shown in FIG. 6, the focal length and the angle-of-view coordinates change with the zoom operation; if the focal length is f = 6 mm as shown in (6) of the figure, the input coordinates touched by the user's finger F are obtained as (x, y) = (0.7, 0.1), as shown in (7) of the figure. Next, the input position coordinates (x, y) are converted into the sound source direction (θx, θy) based on the lens focal length (f), the image size (X′, Y′), and the digital zoom magnification (M), using the following example equations (step S123).
(Example) θx = tan⁻¹((x / xmax) × X′ / 2f) / M,
θy = tan⁻¹((y / ymax) × Y′ / 2f) / M

FIG. 7 shows an example of converting the enhanced sound source direction coordinates (θx, θy) when the angle of view, the half angle of view, and the subject range change with the lens focal length (f), for example through a zoom operation. In the present embodiment, a subject or sound source is selected by calculating and setting the direction angle θf or θs of the sound source, or the direction angle coordinates (θx, θy), from the selected position coordinates (x, y) on the finder display according to the zoom magnification or focal length (f) at the time of shooting and the image size (X′, Y′). The selection therefore remains valid even when the zoom magnification or the angle of view changes. Assume that the input position coordinates (x, y) entered on the screen with the touch panel or the cursor lie in the range −1.0 ≤ x ≤ 1.0, −0.75 ≤ y ≤ 0.75. Because the angle θ of the subject or specific sound source is made to correspond to the half angle of view (θ/2) shown in the figure, the position coordinates (x, y) are converted into the angle θf of the enhanced sound source direction, or into the direction coordinates (θx, θy), based on the lens focal length (f) and the image size (X′, Y′) as follows.
For example, with xmax = 1.0 and ymax = 0.75,
θx = tan⁻¹{(x / xmax) × X′ / 2f}, θy = tan⁻¹{(y / ymax) × Y′ / 2f}.
(In the example of the figure, with an imaging size of X′ = 5.27 mm horizontally and Y′ = 3.96 mm vertically and a focal length f = 6 mm, the input position coordinates (0.7, −0.1) give θx = tan⁻¹{(0.7 / 1.0) × 5.27 / (2 × 6)} = +17.1 and θy = tan⁻¹{(−0.1 / 0.75) × 3.95 / (2 × 6)} = −2.5.)
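The coordinate-to-direction conversion can be checked numerically with a short sketch. The function name `touch_to_direction` and its parameter names are assumptions made for illustration; angles are returned in degrees, and the optional division by the digital zoom magnification M follows the example equations of step S123. The sketch uses Y′ = 3.96 mm; the worked example in the text gives both 3.96 and 3.95, and either rounds to the same result.

```python
import math


def touch_to_direction(x, y, f_mm, width_mm, height_mm,
                       x_max=1.0, y_max=0.75, zoom=1.0):
    """Convert finder coordinates (x, y) to direction angles in degrees.

    theta_x = atan((x / x_max) * X' / (2 f)) / M
    theta_y = atan((y / y_max) * Y' / (2 f)) / M
    """
    theta_x = math.degrees(math.atan((x / x_max) * width_mm / (2.0 * f_mm))) / zoom
    theta_y = math.degrees(math.atan((y / y_max) * height_mm / (2.0 * f_mm))) / zoom
    return theta_x, theta_y


if __name__ == "__main__":
    tx, ty = touch_to_direction(0.7, -0.1, f_mm=6.0, width_mm=5.27, height_mm=3.96)
    print(f"theta_x = {tx:+.1f} deg, theta_y = {ty:+.1f} deg")
```

Because the focal length enters the formula directly, the same touch position maps to a narrower angle as the lens zooms in, which is the behavior the paragraph above describes.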

When the microphone array unit 103 has microphones arranged only side by side (horizontally), θf is used as θf = θx; when the microphones are arranged only vertically, it is used as θf = θy. When the array is two-dimensional and both the horizontal and vertical directions are used, the delay times may be set as tDx(j) = (m − j)·dx·sin θx / c in the horizontal direction and tDy(k) = (n − k)·dy·sin θy / c in the vertical direction, and hence tD(j, k) = √[{tDx(j)}² + {tDy(k)}²], or the like.

Likewise, when the magnification of the optical system and the lens focal length do not change but the shooting angle of view is changed by image processing, as with digital zoom, the angle of view corresponding to the input coordinates on the finder screen narrows to 1/M, where M is the horizontal or vertical enlargement magnification of the digital zoom or the magnification converted to focal length. The direction θf of the subject or sound source may therefore be corrected as
θf = θx = (x / xmax) × tan⁻¹[X′ / 2f] / M, or
θf = θy = (y / ymax) × tan⁻¹[Y′ / 2f] / M.

Next, it is determined which of voice enhancement (step S124) and voice suppression (step S125) has been set. If voice suppression has been set, the sound source direction (θx, θy) is set as the sound source suppression direction θS (step S126). Based on this sound source suppression direction (θSx, θSy), the delay time tD(j, k) of each delay device D(k) of the sound pressure suppression unit (directional beam generation unit 135) is set using the following example equations (step S127).
(Example) tDx(j) = (m − j)·dx·sin θSx / c,
tDy(k) = (n − k)·dy·sin θSy / c,
tD(j, k) = √[{tDx(j)}² + {tDy(k)}²]
(where j: horizontal microphone number 1 to m, k: vertical microphone number 1 to n, d: microphone interval, c: speed of sound)
Further, the voice suppression aiming mark is displayed on the finder together with the voice suppression setting mark, superimposed on the through image (step S128).

If voice enhancement has been set, the sound source direction (θx, θy) is set as the sound source enhancement direction θF (step S129). Based on this sound source enhancement direction (θFx, θFy), the delay time tD(j, k) of each delay device D(k) of the directional beam generation unit 135 is set using the following example equations (step S130).
(Example) tDx(j) = (m − j)·dx·sin θFx / c,
tDy(k) = (n − k)·dy·sin θFy / c,
tD(j, k) = √[{tDx(j)}² + {tDy(k)}²]
Further, the voice enhancement aiming mark is displayed on the finder together with the voice enhancement setting mark, superimposed on the through image (step S131).

Thereafter, it is determined whether recording or moving image shooting is in progress (step S132). If neither is in progress, other processing is executed (step S140) and the process returns. If recording or moving image shooting is in progress, sound is input from the microphone array unit 103 (step S133), and it is determined which of the set-direction voice enhancement processing (step S134) and the set-direction voice suppression processing (step S135) described above has been set. If neither has been set, normal noise suppression processing is executed (step S136). If set-direction voice suppression has been set, the outputs of the delay devices D1 to Dn of the microphone array unit 103 are combined by addition and subtraction, and the audio in which the specific direction has been suppressed is output to the audio memory 138 (step S137). If set-direction voice enhancement has been set, the outputs of the delay devices D1 to Dn of the microphone array unit 103 are combined by addition, and the audio in which the specific direction has been enhanced is output to the audio memory 138 (step S138). The audio compression encoder/decompression decoder 139 and/or the image compression encoder/decompression decoder 120 then encode the recorded audio and/or the captured video, which are recorded in the encoded audio memory 140 and/or the encoded image memory 121 (step S139). Thereafter, other processing is executed (step S140), and the process returns.

(Modification of the first embodiment)
FIGS. 8 to 10 are block circuit diagrams showing modifications of the directivity control, voice enhancement, and voice suppression processing by the microphone array described in the flowcharts above.
FIG. 8 uses two microphones M1 and M2, for the case where the interval d between the two microphones and the direction θ of the specific sound source are known and the distance L to the specific sound source is large compared with the microphone interval d (L >> d). As shown in the figure, when the sound w(n) from a specific sound source in a specific direction is to be enhanced, the sound reaches the microphone M1 nearer the sound source first and enters the other microphone M2 slightly later. The delay device D therefore gives the nearer microphone M1 a delay time (T_D), set according to the angle θ, equal to its lead over the other microphone M2, while the delay time of the later microphone M2 is set to 0, and the two outputs are added by the adder circuit 170.

The audio signals from the direction θ then have the same propagation time from the microphones M1 and M2 at the input of the adder circuit 170 and reinforce each other, whereas signals from other directions partially cancel each other and are relatively suppressed. Thus, by setting and controlling the delay time t_D of the delay circuit of each of the microphones M1 and M2, directivity can be given to any specific direction θ to perform voice enhancement, and the directivity can be varied electronically.

Similarly, if the propagation times are aligned for sound from the specific direction θ and the signals are instead made to cancel each other in the subtraction circuit 171, a blind spot (null) is formed for sound from the specific sound source direction θ, which is thereby suppressed; the arrangement can be used as a noise suppression circuit.

In either case, the direction θ of the sound source must be known in order to determine the delay amount (T_D). In the present embodiment, the user selects a subject within the field of view of the finder (display unit 119), and the direction corresponding to the input coordinates is set as the specific sound source direction θ. The direction θ therefore need not be estimated and can be calculated and set easily.

For example, with two microphones M1 and M2, the propagation delay times to the microphones M1 and M2 are t₁ = 0 and t₂ = d·sin θ / c, respectively. The delay times t_D1 and t_D2 of the delay circuits of the microphones M1 and M2 are therefore each set to the other microphone's propagation delay time, that is,
t_D1 = t₂ = d·sin θ / c, t_D2 = t₁ = 0 (d: microphone interval, c: speed of sound).
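The two-microphone case can be illustrated with a small sketch that computes t_D1 and t_D2 and then combines the two channels by addition (enhancement) or subtraction (suppression). The function names, the whole-sample delay, and the value 343 m/s for c are assumptions chosen to keep the example short; an actual circuit would not be limited to whole-sample delays.

```python
import math


def two_mic_delays(d, theta_deg, c=343.0):
    """Return (t_D1, t_D2) in seconds: M1 is delayed by M2's arrival lag."""
    t2 = d * math.sin(math.radians(theta_deg)) / c  # propagation lag at M2
    return t2, 0.0


def delay_samples(signal, n):
    """Delay a sample list by n whole samples, zero-padding the front."""
    return [0.0] * n + list(signal[:len(signal) - n])


def combine(m1, m2, n_delay, subtract=False):
    """Delay M1 by n_delay samples, then add (enhance) or subtract (suppress)."""
    delayed_m1 = delay_samples(m1, n_delay)
    sign = -1.0 if subtract else 1.0
    return [a + sign * b for a, b in zip(delayed_m1, m2)]


if __name__ == "__main__":
    m1 = [0.0, 1.0, 0.0, -1.0, 0.0, 0.0]   # source as heard at M1
    m2 = [0.0, 0.0, 1.0, 0.0, -1.0, 0.0]   # same source, one sample later at M2
    print(combine(m1, m2, 1))                 # aligned and added
    print(combine(m1, m2, 1, subtract=True))  # aligned and cancelled
```

With the delay matched to the source direction, addition doubles the signal and subtraction cancels it, which corresponds to the adder circuit 170 and the subtraction circuit 171 respectively.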

As shown in FIG. 9, when sound from a specific direction is to be enhanced, the sound reaches the microphone nearest the sound source first, and the audio signal enters the other microphones in turn with successively longer delays. Each microphone nearer the sound source (microphone 1 in the figure) is therefore given, by its delay device, a delay time (tD) set according to the angle θ and equal to its lead over the others, the delay time of the last microphone to receive the sound is set to 0, and the outputs are added together by the adder 136. The audio signals from the direction θ then have the same propagation time from each microphone at the input of the adder 136 and reinforce each other, whereas signals from other directions partially cancel each other and are relatively suppressed. Thus, by setting and controlling the delay time (tD) of each microphone, directivity can be given to any specific direction θ to perform voice enhancement, and the directivity can be varied electronically.
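The directional selectivity of such a delay-and-sum arrangement can be illustrated by computing the normalized output magnitude of a uniform linear array for a single-frequency plane wave. This is a sketch under assumptions not stated in the text: the uniform spacing, the single test frequency, the far-field plane-wave model, and the function name `array_gain` are all illustrative.

```python
import cmath
import math


def array_gain(num_mics, spacing, steer_deg, arrival_deg, freq_hz, c=343.0):
    """Normalized delay-and-sum output magnitude (1.0 = fully in phase).

    After the steering delays are applied, microphone i is left with a
    residual time offset proportional to sin(arrival) - sin(steer).
    """
    diff = math.sin(math.radians(arrival_deg)) - math.sin(math.radians(steer_deg))
    total = 0j
    for i in range(num_mics):
        tau = i * spacing * diff / c                       # residual offset (s)
        total += cmath.exp(-2j * math.pi * freq_hz * tau)  # phasor for mic i
    return abs(total) / num_mics


if __name__ == "__main__":
    for arrival in (-40, -20, 0, 20, 40):
        gain = array_gain(4, 0.05, steer_deg=20, arrival_deg=arrival, freq_hz=1000)
        print(f"{arrival:+4d} deg: {gain:.3f}")
```

The gain is exactly 1 when the arrival direction equals the steered direction and falls below 1 elsewhere, which is the "relative suppression" of other directions described above.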

As shown in FIG. 10, if the propagation times are aligned for sound from the specific direction θ and, conversely to the above, a plurality of pairs are made to cancel each other in the addition/subtraction circuit 171, a blind spot (null) is formed for sound from the specific sound source direction θ, which is thereby suppressed; the arrangement can also be used as a noise suppression circuit. Because a microphone array suppresses noise by forming a null in the direction from which the noise arrives, noise of any characteristic can be removed, but the number of microphones must exceed the number of noise sources.

FIG. 11 shows an example of processing for generating a two-dimensional image of the feature data. Let Cs(θx, θy) be the audio feature extraction data extracted for each direction; this is plotted in the two-dimensional space [x, y] to create an image P₁[x, y]. This image P₁[x, y] is ANDed (logical product), element by element, with a translucency (lattice) pattern P₂[x, y] used for color-coded display with, for example, diagonal lines, vertical lines, horizontal lines, or hatching, to synthesize a two-dimensional image P₃[x, y] in which the feature extraction data is made translucent. Alternatively, provided the subject image remains easy to observe, other methods may be used instead of translucency, such as compositing an outline image or a contour-line image corresponding to the values of the feature data.
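The element-wise AND with a lattice pattern can be sketched as follows. The checkerboard pattern, the use of `None` to mark see-through pixels, and the function names are assumptions for illustration; the text only requires that P₂ be some line or lattice pattern.

```python
def checkerboard(width, height):
    """Lattice pattern P2: True where the feature image should be kept."""
    return [[(x + y) % 2 == 0 for x in range(width)] for y in range(height)]


def mask_feature_image(p1, p2, transparent=None):
    """P3[x, y] = P1[x, y] where P2 is set, otherwise a transparent marker."""
    return [[value if keep else transparent
             for value, keep in zip(row1, row2)]
            for row1, row2 in zip(p1, p2)]


def overlay(through_image, p3, transparent=None):
    """Show P3 over the through image; transparent pixels let it show through."""
    return [[base if top is transparent else top
             for base, top in zip(row_base, row_top)]
            for row_base, row_top in zip(through_image, p3)]


if __name__ == "__main__":
    p1 = [[5, 5], [5, 5]]               # feature image (one color code)
    through = [[1, 2], [3, 4]]          # subject through image
    p3 = mask_feature_image(p1, checkerboard(2, 2))
    print(overlay(through, p3))
```

Because only every other pixel of the feature image is drawn, the subject remains visible through the gaps, which gives the translucent appearance without any per-pixel blending arithmetic.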

FIG. 12 shows a display example when setting the direction of a subject whose sound is to be enhanced. As illustrated, in this example the finder display unit 119 displays a feature extraction data index/unit 180, a two-dimensional image 181 of the audio feature extraction data, a subject through image 182, a voice enhancement setting mark 183, an aiming mark (set direction) 184 for the sound source to be enhanced, an aiming mark 185 for the camera's video focus, the remaining time 186 available for shooting/recording, and a shooting/recording mode indication 187.

FIG. 13 shows a display example when setting the direction of a sound source or subject to be suppressed. As illustrated, in this example the finder display unit 119 displays not only the feature extraction data index/unit 180, the two-dimensional image 181 of the audio feature extraction data, the subject through image 182, the aiming mark 185 for the camera's video focus, the remaining time 186 available for shooting/recording, and the shooting/recording mode indication 187, but also a noise suppression setting mark 188 and an aiming mark (set direction) 189 for the sound source to be suppressed.

FIGS. 14 and 15 show examples of specific data analyzed and extracted from the audio signal input by scanning. In the example shown in FIG. 14, the signal from the directional beam generation unit 135, which consists of a group of delay circuits, is stored in the extracted audio memory 190 and integrated by the integrating circuit 191, and the resulting per-direction audio data is stored in the per-direction audio data memory 192. Based on this per-direction audio data, the sound pressure for each direction is displayed (a) one-dimensionally or (b) two-dimensionally.

In the example shown in FIG. 15, the signal from the directional beam generation unit 135, which consists of a group of delay circuits, is stored in the extracted audio memory 190 and Fourier-transformed by the Fourier transform (FFT) circuit 191 to output the amplitude spectrum |X(ω)|, and the per-direction audio spectrum data is stored in the per-direction audio spectrum data memory 193. Based on this per-direction audio spectrum data, (a) the audio signal, (b) the audio spectrum, (c) the change of the audio spectrum over time (sonagraph, spectrogram), and (d) the per-direction audio spectrum are generated. Based on (d), the per-direction audio spectrum is displayed (e) one-dimensionally, (f) two-dimensionally, or (g) three-dimensionally.
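The amplitude spectrum |X(ω)| for one scanning direction can be sketched with a direct discrete Fourier transform. This is a plain O(N²) DFT written for clarity, not the FFT circuit of the figure, although it produces the same values; the function name `amplitude_spectrum` is an assumption.

```python
import cmath
import math


def amplitude_spectrum(samples):
    """Return |X[k]| for k = 0 .. N/2 using a direct DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        # X[k] = sum over i of x[i] * exp(-j * 2 * pi * k * i / N)
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        spectrum.append(abs(x_k))
    return spectrum


if __name__ == "__main__":
    # Test tone: exactly two cycles across eight samples.
    tone = [math.cos(2.0 * math.pi * 2 * i / 8) for i in range(8)]
    print([round(v, 3) for v in amplitude_spectrum(tone)])
```

Running this once per scanning direction and storing the results keyed by (θx, θy) would give the per-direction spectrum data from which displays (e) to (g) are drawn.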

(Second Embodiment)
FIG. 16 is a block diagram showing the circuit configuration of a digital camera 300 according to the second embodiment of the present invention. The digital camera 300 has general functions such as AE, AWB, and AF. Its imaging lens 302 consists of a zoom lens and a focus lens and is driven by a focus driving unit 305 and a zoom driving unit 306. An aperture 307, a shutter 308, and an imaging unit 309 consisting of a CCD or the like are arranged on the optical axis of the imaging lens 302. The aperture 307 and the shutter 308 are connected to an aperture/shutter driving unit 310, and the imaging unit 309 is connected to a driver 311.

The shooting/recording control circuit 312 (hereinafter simply the control circuit 312), which controls the entire digital camera 300, consists of a CPU, a ROM, a working RAM, and the like. The ROM stores various programs, including programs that cause the control circuit 312 to control the units described above, such as programs for AE, AF, and AWB control, and programs that cause the control circuit 312 to function as the various means of the present invention. The driver 311 is connected to the control circuit 312 together with the driving unit 304, and drives the imaging unit 309 based on timing signals generated by the control circuit 312.

The imaging lens 302 forms an image of the subject on the light receiving surface of the imaging unit 309. The imaging unit 309 is driven by the driver 311 and outputs an analog imaging signal corresponding to the optical image of the subject to the unit circuit 313. The unit circuit 313 consists of a CDS circuit that removes noise contained in the output signal of the imaging unit 309 by correlated double sampling, a gain adjustment amplifier (AGC) that amplifies the video signal, and the like. The video signal from the unit circuit 313 is converted into digital data by the A/D converter 314 and output to the video signal processing unit 315.

The video signal processing unit 315 applies processing such as pedestal clamping to the input imaging signal, converts it into a luminance (Y) signal and color difference (UV) signals, and performs digital signal processing for improving image quality, such as auto white balance, edge enhancement, and pixel interpolation. The YUV data converted by the video signal processing unit 315 is stored sequentially in the image memory 316; in the REC through mode, each time one frame of data (image data) has been accumulated it is converted into a video signal and sent to the display unit 319, where it is displayed on the screen as a through image.

In the still image shooting mode, triggered by a shutter key operation, the control circuit 312 instructs the imaging unit 309, the driver 311, the unit circuit 313, and the video signal processing unit 315 to switch from the through image shooting mode to the still image shooting mode. The image data obtained by shooting in the still image shooting mode is compressed and encoded by the image encoder/decoder 320 and is finally recorded in an external memory (not shown) through the input interface 322 as a still image file in a predetermined format.

また、動画撮影モードにおいては、１回目のシャッターキーと２回目のシャッターキー操作との間に、画像メモリ３１６に順次記憶される複数の画像データが画像符号器／復号器３２０で順次圧縮され、符号化画像メモリ３２１に順次記憶された後、動画ファイルとして外部メモリに記録される。この外部メモリに記録された静止画ファイル及び動画ファイルは、ＰＬＡＹ・モードにおいてユーザーの選択操作に応じて画像伸張／復号化部３１８に読み出されるとともに伸張及び復号化され、表示部３１９に表示される。 In the moving image shooting mode, a plurality of image data sequentially stored in the image memory 316 are sequentially compressed by the image encoder / decoder 320 between the first shutter key operation and the second shutter key operation. After being sequentially stored in the encoded image memory 321, it is recorded in the external memory as a moving image file. The still image file and the moving image file recorded in the external memory are read to the image expansion / decoding unit 318 and expanded / decoded in accordance with the user's selection operation in the PLAY mode, and are displayed on the display unit 319. .

The digital camera 300 also includes a distance measuring sensor 326 that generates a distance measuring signal corresponding to the distance to each subject (subjects A, B, C, ...). The output signal from the distance measuring sensor 326 is input to the distance measuring unit/focus detection unit 327 together with the video signal from the video signal processing unit 315. Based on these input signals, the distance measuring unit/focus detection unit 327 detects the distance to each subject (subjects A, B, ...), and the detected distances are stored in the focus distance memory 328 as the subject distances LA, LB, LC of the subjects A, B, ....

また、制御回路３１２には、座標入力部及び座標入力部（共に図示せず）が入力回路３３１を介して接続されている。座標入力部は、前記表示部３１９に積層されているタッチパネル（図示せず）からのタッチ信号に基づく座標値を、入力回路３３１を介して制御回路３１２に出力する In addition, a coordinate input unit and a coordinate input unit (both not shown) are connected to the control circuit 312 via an input circuit 331. The coordinate input unit outputs a coordinate value based on a touch signal from a touch panel (not shown) stacked on the display unit 319 to the control circuit 312 via the input circuit 331.

また、このデジタルカメラ３００は、前記動画撮影モード、音声のみを記録する録音モード、音声付き（静止画）撮影モードにおいて、周囲音を記録する録音機能を備えており、このため周囲音を検出するマイクロホンを有し、このマイクロホンは前記マイクロホンアレー部１０３に配置されたマイクＭ１からマイクＭｎまでのｎ本のマイクロホンで構成されている。各マイクからの音声信号は、対応する各アンプ３３３・・・で増幅され、Ａ／Ｄ変換回路３３４でサンプルホールド及びデジタル変換され、雑音抽出部３５０と雑音抑圧部３６０とに入力される。 The digital camera 300 also has a recording function for recording ambient sounds in the moving image shooting mode, the recording mode for recording only sound, and the recording mode with sound (still image), and thus detects the ambient sound. A microphone is included, and the microphone includes n microphones M1 to Mn arranged in the microphone array unit 103. The audio signals from the microphones are amplified by the corresponding amplifiers 333..., Sample-held and digitally converted by the A / D conversion circuit 334, and input to the noise extraction unit 350 and the noise suppression unit 360.

The noise extraction unit 350 includes n delay units D1 to Dn provided in correspondence with the n microphones, amplifiers A1 to An that amplify the signals from the delay units D1 to Dn, an adder 351 that adds the outputs of the amplifiers A1 to An, an extracted voice memory 357 and a voice memory 352 that temporarily store the voice data, output from the adder 351, in which a specific direction is emphasized, a Fourier transform unit 353 that Fourier-transforms the voice data stored in the voice memory 352, and a recorded sound spectrum unit 354 that sends the data transformed by the Fourier transform unit 353 to the noise suppression unit 360. The delay units D1 to Dn are controlled by a delay control/array control circuit 356, which executes delay control or array control based on the focus direction coordinate θ stored in the audio focus setting memory 355 and the focus sound source distance Lf stored in the focus sound source distance memory 358.
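The delay, amplify, and add structure of the noise extraction unit 350 corresponds to what is commonly called a delay-and-sum beamformer. The following is a minimal sketch under stated assumptions, not the circuit of the embodiment: the function name `delay_and_sum`, the use of whole-sample non-negative delays, and the default equal gains are all illustrative choices that do not come from the description.

```python
import numpy as np

def delay_and_sum(signals, delays, gains=None):
    """Delay each microphone signal, scale it, and add the results.

    signals: array of shape (n_mics, n_samples), one row per microphone.
    delays:  non-negative integer delay in samples for each microphone
             (stands in for the delay units D1..Dn; an assumption).
    gains:   per-microphone gain (stands in for amplifiers A1..An);
             defaults to 1/n_mics so aligned signals average to unit level.
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    if gains is None:
        gains = np.full(n_mics, 1.0 / n_mics)
    out = np.zeros(n_samples)
    for k in range(n_mics):
        d = int(delays[k])
        delayed = np.zeros(n_samples)
        if d < n_samples:
            # Shift the signal right by d samples, padding with zeros.
            delayed[d:] = signals[k, :n_samples - d]
        out += gains[k] * delayed
    return out
```

When the delays compensate the arrival-time differences of a sound from one direction, that sound adds coherently while sounds from other directions are attenuated.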

一方、雑音抑圧部３６０は、収録音のスペクトル部３５４からの信号が入力される雑音スペクトルの推定部３６１、主マイクであるマイクＮ１側の信号が順次入力される窓関数部３６２、フーリエ変換部３６３、位相部３６４、逆フーリエ変換部３６５を有するとともに、前記フーリエ変換部３６３の出力信号から前記雑音スペクトルの推定部３６１の出力信号を減算して逆フーリエ変換部３６５に出力する減算回路３６６を有している。 On the other hand, the noise suppression unit 360 includes a noise spectrum estimation unit 361 to which a signal from the recorded sound spectrum unit 354 is input, a window function unit 362 to which a signal on the microphone N1 side that is the main microphone is sequentially input, and a Fourier transform unit. 363, a phase unit 364, and an inverse Fourier transform unit 365, and a subtraction circuit 366 that subtracts the output signal of the noise spectrum estimation unit 361 from the output signal of the Fourier transform unit 363 and outputs the result to the inverse Fourier transform unit 365. Have.

この逆フーリエ変換部３６５からの音声データは、音声メモリ３３８に格納され、この音声メモリ３３８に格納された音声データは、音声符号器／復号器３３９で順次圧縮される。制御回路３１２は、この圧縮音声データと前記圧縮動画データとを含む音声付き動画ファイルを生成して外部メモリに記録する。 The audio data from the inverse Fourier transform unit 365 is stored in the audio memory 338, and the audio data stored in the audio memory 338 is sequentially compressed by the audio encoder / decoder 339. The control circuit 312 generates an audio-added moving image file including the compressed audio data and the compressed moving image data, and records it in an external memory.

以上の構成に係る本実施の形態において、制御回路３１２は前記プログラムに基づき、図１７に示すフローチャートに示すように処理を実行する。すなわち、録音または動画撮影モードが設定されたか否かを判断し（ステップＳ２０１）、録音または動画撮影モード以外の他のモードが設定された場合には、設定された当該その他のモード処理を実行する（ステップＳ２０２）。また、録音または動画撮影モードが設定されたならば、測光処理、ＷＢ処理を実行するとともに（ステップＳ２０３）、ズーム処理を行ってズーム駆動部３０６を制御する（ステップＳ２０４）。また、測距センサ３２６を制御する測距処理を実行するとともに、フォーカス駆動部３０５を制御するＡＦ処理を実行して被写体を合焦させる（ステップＳ２０５）。次に、このＡＦ処理により合焦した被写体Ａ、またはＢ、Ｃ、Ｄの距離情報を測距部／合焦検出部３２７により検出させて、フォーカスフォーカス距離メモリ３２８に記憶させる（ステップＳ２０６）。 In the present embodiment having the above configuration, the control circuit 312 executes processing as shown in the flowchart shown in FIG. 17 based on the program. That is, it is determined whether or not the recording or moving image shooting mode is set (step S201), and when a mode other than the recording or moving image shooting mode is set, the set other mode processing is executed. (Step S202). If the recording or moving image shooting mode is set, photometry processing and WB processing are executed (step S203), and zoom processing is performed to control the zoom drive unit 306 (step S204). In addition, a ranging process for controlling the ranging sensor 326 is executed, and an AF process for controlling the focus driving unit 305 is executed to focus the subject (step S205). Next, the distance information of the subject A or B, C, and D focused by the AF process is detected by the distance measuring unit / focus detection unit 327 and stored in the focus focus distance memory 328 (step S206).

さらに、被写体像スルー画像を、照準、距離情報等とともに、ファインダーに表示させる（ステップＳ２０７）。また、操作により音声走査（スキャン）が指示されたか否かを判断し（ステップＳ２０８）、指示されていない場合にはステップＳ２０９の処理を実行することなく、ステップＳ２１０に進む。指示された場合には、前述した第１の実施の形態と同様に、音声走査入力、特徴抽出、特徴データの画像表示処理を実行する（ステップＳ２０９）。 Further, the subject image through image is displayed on the viewfinder together with the aim, distance information, and the like (step S207). Further, it is determined whether or not an audio scan (scan) is instructed by the operation (step S208). If not instructed, the process proceeds to step S210 without executing the process of step S209. When instructed, voice scanning input, feature extraction, and feature data image display processing are executed as in the first embodiment (step S209).

Next, it is determined whether or not a recording operation is in progress (step S210). If not, the process proceeds to step S219 described later. If a recording operation is in progress, the voice from the main microphone N1 is input (step S211) and A/D converted (step S212). It is then determined whether noise suppression based on a suppressed voice (noise) spectrum has been set (step S213). If it has not been set, normal voice suppression processing is executed without executing steps S215 to S217 (step S214). If it has been set, the digital voice from the window function unit 362 is converted into the frequency domain by the FFT operation in the Fourier transform unit 363, and the amplitude spectrum $|X(\omega)|$ and the phase information ω are output to the spectrum estimation unit 361, the inverse Fourier transform unit 365, and the subtraction circuit 366 (step S215). Further, the subtraction circuit 366 subtracts the voice spectrum $|W(\omega)|$ supplied by the spectrum estimation unit 361 from the amplitude spectrum $|X(\omega)|$ and outputs the result

$$|S(\omega)| = |X(\omega)| - |W(\omega)|$$

to the inverse Fourier transform unit 365 (step S216).

また、逆フーリエ変換部３６５にて、スペクトル減算出力に位相情報ωを付加し、逆ＦＦＴ演算で時間領域信号ｓ（ｎ）に変換して音声メモリ３３８に出力出力させる（ステップＳ２１７）。引き続き、逆フーリエ変換部３６５から出力された音声信号を音声符号器／復号器３３９で圧縮符号化処理させて、符号化音声メモリ３４０に記録し（ステップＳ２１８）、リターンする。 Further, the inverse Fourier transform unit 365 adds the phase information ω to the spectrum subtraction output, converts it into a time domain signal s (n) by inverse FFT calculation, and outputs it to the audio memory 338 (step S217). Subsequently, the audio signal output from the inverse Fourier transform unit 365 is compressed and encoded by the audio encoder / decoder 339, recorded in the encoded audio memory 340 (step S218), and the process returns.
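Steps S215 to S217 (FFT, spectral subtraction, reattaching the phase, inverse FFT) can be sketched for a single frame as follows. This is an illustrative sketch only: the function name `spectral_subtract_frame`, the Hanning window, and the clamping of negative magnitudes to zero are assumptions that the description does not specify.

```python
import numpy as np

def spectral_subtract_frame(x, noise_mag):
    """Apply spectral subtraction to one frame.

    x:         one frame of time-domain samples.
    noise_mag: noise amplitude spectrum |W(w)|, one value per rfft bin
               (len(x) // 2 + 1 values).
    """
    window = np.hanning(len(x))           # window function (assumed Hanning)
    X = np.fft.rfft(x * window)           # time domain -> frequency domain
    mag = np.abs(X)                       # amplitude spectrum |X(w)|
    phase = np.angle(X)                   # phase information of the input
    # |S(w)| = |X(w)| - |W(w)|; clamping at zero is an added assumption
    # so that the magnitude never becomes negative.
    s_mag = np.maximum(mag - noise_mag, 0.0)
    S = s_mag * np.exp(1j * phase)        # reattach the input phase
    return np.fft.irfft(S, n=len(x))      # frequency domain -> time domain
```

With an all-zero noise spectrum the function simply returns the windowed frame, which makes a convenient sanity check.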

On the other hand, if the result of the determination in step S210 is that no recording operation is in progress, it is determined whether or not a noise spectrum to be suppressed has been set (step S219), and if not, other processing is executed (step S220). Here, as shown in FIG. 12, the user operates the operation input unit 130 to move the video focus aim 163 onto the subject C whose sound is to be used as the noise spectrum to be suppressed, touches the video focus aim 163 on the subject C with a finger F, and then presses the suppressed-noise-spectrum setting button on the operation input unit 130. The noise spectrum to be suppressed is thereby set, a sound source aim 251 for voice suppression is displayed on the subject C, and a noise suppression setting mark 252 indicating that voice suppression has been set is displayed.

Accordingly, in the flowchart of FIG. 17, once the noise spectrum to be suppressed has been set, the determination in step S219 becomes YES and the process proceeds to step S221, where the distance information $L_C$ of the subject C in the direction in which voice suppression is desired is input or detected and set as the sound source distance $L_N$ for noise extraction (step S221). The input coordinates of the subject C entered by the operation are stored as the position coordinates (x, y) of the sound source for noise extraction (step S222). These position coordinates (x, y) are then converted, based on the lens focal length (f) and the image size (X′, Y′), into the angle θf of the emphasized sound source direction or the direction coordinates (θx, θy) (step S223).
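The description does not give the formula used in step S223. One plausible conversion, shown here purely as an assumption, is a pinhole-camera model in which the position (x, y), the image size (X′, Y′), and the focal length f are all expressed in the same unit and the image origin is at a corner. The function name `position_to_direction` is hypothetical.

```python
import math

def position_to_direction(x, y, f, image_size):
    """Convert an image position to direction angles (theta_x, theta_y) in radians.

    Assumes a pinhole model: the optical axis passes through the image
    centre, and x, y, image_size and f share one unit (for example pixels).
    """
    width, height = image_size
    theta_x = math.atan2(x - width / 2.0, f)   # horizontal angle from the axis
    theta_y = math.atan2(y - height / 2.0, f)  # vertical angle from the axis
    return theta_x, theta_y
```

Under this model a touch at the image centre maps to the optical axis (0, 0), and a point offset horizontally from the centre by exactly f maps to 45 degrees.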

Next, it is determined whether or not the sound source distance $L_N$ set in step S221 is equal to or greater than a predetermined value (step S224). If the sound source distance $L_N$ is less than the predetermined value, that is, a short distance, the delay times $t_D(k)$ of the delay units D(k) of the noise extraction unit 350 are set based on the sound source distance $L_N$ to be focused on (step S225). If the result of the determination in step S224 is that the sound source distance $L_N$ is equal to or greater than the predetermined value, that is, a long distance, the delay times $t_D(k)$ of the delay units D(k) of the noise extraction unit 350 are set based on the angle $\theta_N$ of the sound source direction from which noise is to be extracted (step S226).
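The switch between distance-based and angle-based delay setting in steps S224 to S226 can be illustrated for a linear array as below. This is a sketch under assumptions: the geometry (microphones on the x axis, source at distance L in direction θ from the array origin), the speed of sound, the 2 m threshold, and all function names are hypothetical and are not taken from the description.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s, assumed value

def far_field_delays(mic_x, theta):
    """Plane-wave delays: depend only on the direction angle theta."""
    lead = np.asarray(mic_x, dtype=float) * np.sin(theta)
    # Microphones nearer the source hear the wave earlier and need more delay.
    return (lead - lead.min()) / SOUND_SPEED

def near_field_delays(mic_x, distance, theta):
    """Spherical-wave delays: depend on the source distance as well."""
    mic_x = np.asarray(mic_x, dtype=float)
    src_x, src_z = distance * np.sin(theta), distance * np.cos(theta)
    r = np.hypot(src_x - mic_x, src_z)     # source-to-microphone distances
    return (r.max() - r) / SOUND_SPEED     # align all to the farthest microphone

def set_delays(mic_x, distance, theta, threshold=2.0):
    """Pick the delay rule by comparing the distance with a threshold."""
    if distance >= threshold:
        return far_field_delays(mic_x, theta)
    return near_field_delays(mic_x, distance, theta)
```

For very large distances the two rules give practically the same delays, which is why the angle alone suffices for a distant source.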

しかる後に、マイクロホンアレーから、フォーカスした方向／距離の音声が強調された音声を所定時間入力させ（ステップＳ２２７）、収録した音声を音声メモリ３５２に一時記憶させる（ステップＳ２２８）。また、デジタル音声信号をフーリエ変換部３５３のＦＦＴ演算で周波数領域に変換し、振幅スペクトル｜Ｘ（ω）｜を出力させる（ステップＳ２２９）。この収録音声の振幅スペクトル｜Ｘ（ω）｜を抑圧すべき雑音スペクトル｜Ｗ（ω）｜として、収録音のスペクトル部３５４から雑音抑圧部３６０に出力し、該雑音抑圧部３６０の減算回路３６６に設定し（ステップＳ２３０）、リターンする。 Thereafter, a voice in which the focused direction / distance is emphasized is input from the microphone array for a predetermined time (step S227), and the recorded voice is temporarily stored in the voice memory 352 (step S228). Also, the digital audio signal is converted into the frequency domain by the FFT operation of the Fourier transform unit 353, and the amplitude spectrum | X (ω) | is output (step S229). The amplitude spectrum | X (ω) | of the recorded sound is output as a noise spectrum | W (ω) | to be suppressed from the recorded sound spectrum unit 354 to the noise suppression unit 360, and the subtraction circuit 366 of the noise suppression unit 360 is output. (Step S230) and return.

したがって、このようにして抑圧する雑音スペクトルの設定がなされると、前述したステップＳ２１３の判断がＹＥＳとなることから、前述したステップＳ２１５〜Ｓ２１７の処理が実行されることとなる。 Therefore, when the noise spectrum to be suppressed is set in this way, the determination in step S213 described above becomes YES, and thus the processing in steps S215 to S217 described above is executed.

FIG. 18 is a diagram showing a configuration example of a noise suppression circuit based on the spectral subtraction method (hereinafter referred to as the SS method) used in the second embodiment. The audio signal from the microphone 401 is amplified by the amplifier 402, converted to digital form by the A/D conversion unit 403, and supplied to the Fourier transform unit 405 via the window function unit 404. The amplitude spectrum $|X(\omega)|$ obtained by the Fourier transform unit 405 is supplied to the noise estimation or noise spectrum setting unit 407 and to the subtractor 408 of the noise spectrum subtraction unit 406, and the phase information $\omega_x$ (phase spectrum) 409 is supplied to the inverse Fourier transform unit 410. The output of the subtractor 408 is also supplied to the inverse Fourier transform unit 410. The audio signal output from the inverse Fourier transform unit 410 is temporarily stored in the audio memory 411, converted to analog form by the D/A converter 412, amplified by the amplifier 413, and reproduced by the speaker 414.

Thus, in the SS method, the input audio signal $x(n) = s(n) + w(n)$, which contains the audio signal $s(n)$ and the noise signal $w(n)$, is divided into frames at every predetermined number of samples, windowed with a window function such as a Hanning window or a trapezoidal window, and then transformed from the time domain to the frequency domain by a Fourier transform (FFT). The estimated noise power spectrum $|\hat{X}(\omega)|$ is subtracted from the amplitude power spectrum $|X(\omega)|$ of the input signal, giving $|\hat{S}(\omega)| = |X(\omega)| - |\hat{X}(\omega)|$, and the phase $\omega_x$ of the input signal is attached to the result. Transforming the resulting $\hat{S}(\omega) = |\hat{S}(\omega)| \exp(j\omega_x)$ back to the time domain by an inverse Fourier transform (inverse FFT) yields an enhanced audio signal $\hat{s}(n)$ from which noise such as operating sound has been removed.

Regarding noise removal by the SS method as a filter with transfer function $H(\omega)$, and noting that $X(\omega) = |X(\omega)| \exp(j\omega_x)$, the transfer function is

$$H(\omega) = \frac{\hat{S}(\omega)}{X(\omega)} = \frac{\{|X(\omega)| - |\hat{X}(\omega)|\}\exp(j\omega_x)}{X(\omega)},$$

$$H(\omega) = 1 - \frac{|\hat{X}(\omega)|}{|X(\omega)|}.$$
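The equivalence between subtracting magnitudes and applying the gain $H(\omega)$ can be checked numerically. The snippet below is only an illustrative check: the random test spectrum and the choice of a noise magnitude equal to 10% of the input magnitude are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.fft.rfft(rng.standard_normal(64))   # input spectrum X(w)
noise_mag = 0.1 * np.abs(X)                # hypothetical noise estimate

# Direct form: subtract magnitudes, then reattach the input phase.
S_direct = (np.abs(X) - noise_mag) * np.exp(1j * np.angle(X))

# Filter form: multiply the input spectrum by H(w) = 1 - |noise| / |X(w)|.
H = 1.0 - noise_mag / np.abs(X)
S_filtered = H * X

print(np.allclose(S_direct, S_filtered))
```

Because $H(\omega)$ is real, the filter changes only the magnitude of each frequency component and leaves the phase of the input untouched.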

In the SS method, phase information, which is of little importance to human hearing, is left unprocessed, and processing is performed mainly on amplitude information, so the processing is simple. Noise can be suppressed with a single microphone, and the number of noise sources and the like need not be known in advance, but a processing delay of at least one frame occurs, and prior information on the noise power vector is required. For mobile phones and similar devices, complex noise estimation methods are being studied: one calculates the S/N ratio (SNR) of each subband of the signal converted into the frequency domain, performs non-adaptive noise estimation, and combines spectral subtraction (difference) with suppression by spectral gain (multiplication); another weights the power vector of the input signal in inverse proportion to the SNR estimate, estimates the noise adaptively, and suppresses the noise solely by adjusting the spectral gain (multiplication). For removing the operating noise of a motor inside the device, however, the noise spectrum data $|\hat{W}(\omega)|$ of the operating noise can be analyzed and set in advance, which has the advantage that the configuration is simple and easy to use.

なお、例えば、適応フィルタ方式のノイズキャンセラーでは、参照マイクの入力音声に適応フィルタ処理を施した信号を、主マイクの入力信号から減算するが、主マイクの他に雑音を検出するための参照用マイクを必要とする。実施の形態のようにマイクロホンアレー部１０３を設けた録音入力部の場合には、その一部を雑音参照用のマイクとして利用することもできる。 For example, in an adaptive filter type noise canceller, a signal obtained by performing adaptive filter processing on the input sound of the reference microphone is subtracted from the input signal of the main microphone. Need a microphone. In the case of the recording input unit provided with the microphone array unit 103 as in the embodiment, a part of the recording input unit can be used as a noise reference microphone.

In the adaptive filter method, the main microphone receives $s(n) + w(n)$, the sum of the desired audio signal $s(n)$ and the noise $w(n)$ that arrives from the noise source $w_s(n)$ via the path $h_k(m)$. Using the impulse response $\{h_k(m)\}$ ($m = 1, 2, \ldots, P-1$) of the noise path, the noise signal $w(n)$ is expressed as

$$w(n) = \sum_m h_k(m)\, w_s(n-m).$$

Likewise, if the impulse response of the adaptive filter is $\{h_f(m)\}$ ($m = 1, 2, \ldots, P-1$), the output $y(n)$ of the adaptive filter is expressed as

$$y(n) = \sum_m h_f(m)\, w_s(n-m).$$

The output $\hat{s}(n)$ of the noise canceller is then

$$\hat{s}(n) = s(n) + w(n) - y(n) = s(n) + \sum_m \{h_k(m) - h_f(m)\}\, w_s(n-m).$$

Therefore, if $h_f(m) = h_k(m)$ can be achieved, $\hat{s}(n) = s(n)$, so the noise signal is removed and only the audio signal is extracted.

Normally, to find the unknown noise path $h_k(m)$, the adaptive filter coefficients $h_f(m)$ are updated so as to statistically minimize the squared value of the estimation error $\hat{s}(n)$. Obtaining the optimum value of $h_f(m)$, however, requires solving a system of P simultaneous equations and requires the statistics of the signals. The adaptive filter therefore needs an adaptive algorithm, such as the LMS (least mean square) method or the NLMS (normalized least mean square) method, to learn the statistics and search for the optimum solution step by step.

However, when the statistics of the noise can be obtained, as in the embodiment described above, from noise voice data that the user has recorded with the voice focus, an initial value for the optimum $h_f(m)$ can be determined and set in advance. With such a noise canceller, even if the path from the noise source to the main microphone is unknown, noise can be removed as long as the adaptive filter estimates the impulse response of the noise path well, and the canceller can follow changes in the noise characteristics.
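An adaptive-filter noise canceller of the kind discussed above can be sketched with the NLMS update. The sketch is illustrative, not the embodiment's implementation: the function name `nlms_cancel`, the step size `mu`, the regularization constant `eps`, the zero-based tap indexing, and the optional `h_init` argument (standing in for a preset initial value of the filter coefficients) are all assumptions.

```python
import numpy as np

def nlms_cancel(primary, reference, taps=8, mu=0.5, eps=1e-8, h_init=None):
    """Adaptive noise canceller using the NLMS algorithm.

    primary:   main-microphone signal s(n) + w(n).
    reference: reference-microphone signal, taken here as the noise source w_s(n).
    h_init:    optional preset initial filter coefficients.
    Returns the canceller output s^(n) and the final coefficients h_f.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    h_f = np.zeros(taps) if h_init is None else np.array(h_init, dtype=float)
    buf = np.zeros(taps)               # buf[m] holds w_s(n - m)
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        y = h_f @ buf                  # adaptive filter output y(n)
        e = primary[n] - y             # canceller output s^(n), also the error
        # Normalized LMS update: step scaled by the reference signal power.
        h_f += mu * e * buf / (buf @ buf + eps)
        out[n] = e
    return out, h_f
```

When the primary input contains only noise that passed through a short path, the learned coefficients approach that path's impulse response and the output decays toward zero; starting from a good `h_init` shortens that convergence.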

(Third embodiment)
FIGS. 19 and 20 are a series of flowcharts showing the processing procedure in the third embodiment of the present invention. The control circuit 312 executes processing based on the program as shown in these flowcharts. That is, it is determined whether or not the still image/moving image shooting mode or the recording mode has been set (step S301 in FIG. 19). If any other mode has been set, the processing for that mode is executed (step S302). If the still image/moving image shooting mode or the recording mode has been set, photometry processing and WB processing are executed (step S303), zoom processing and AF processing are performed (step S304), and the subject through image is displayed on the finder display unit 119 (step S305).

Next, it is determined whether or not a search for the sound source direction has been instructed by an operation on the operation input unit 130 (step S306). If not, the process proceeds to other processing (step S307). If a search for the sound source direction has been instructed, the sound source to be searched for is selected from the recorded sound data in accordance with the operation on the operation input unit 130 (step S308), and the angles (θxmin, θxmax) of the scanning range are set from the image size (X′, Y′), the focal length (f), and the digital zoom magnification (M) (step S309). Then θy = θymin is set (step S310) and θx = θxmin is set (step S311). The delay time $t_D(j, k)$ of each delay unit D(k) for focusing in the scanning sound source direction (θx, θy) is then set using the following equations:

$$t_{Dx}(j) = (m - j)\, d_x \sin\theta_x / c,$$

$$t_{Dy}(k) = (n - k)\, d_y \sin\theta_y / c,$$

$$t_D(j, k) = \sqrt{\{t_{Dx}(j)\}^2 + \{t_{Dy}(k)\}^2}$$

(where k is the microphone number 1 to n, d is the microphone spacing, and c is the speed of sound).
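The delay-time equations for the scanning direction (θx, θy) can be evaluated for all microphones at once. The sketch below assumes that j = 1..m indexes the horizontally arranged microphones and k = 1..n the vertically arranged ones, which the description implies but does not state outright; the function name `scan_delays` and the default speed of sound are also assumptions.

```python
import numpy as np

def scan_delays(m, n, dx, dy, theta_x, theta_y, c=343.0):
    """Return the m-by-n table of delay times t_D(j, k) in seconds.

    m, n:   number of microphones in the x and y directions (assumed meaning).
    dx, dy: microphone spacing in metres along x and y.
    theta_x, theta_y: scanning direction in radians.
    """
    j = np.arange(1, m + 1)
    k = np.arange(1, n + 1)
    t_dx = (m - j) * dx * np.sin(theta_x) / c    # t_Dx(j)
    t_dy = (n - k) * dy * np.sin(theta_y) / c    # t_Dy(k)
    # t_D(j, k) = sqrt(t_Dx(j)^2 + t_Dy(k)^2), built by broadcasting.
    return np.sqrt(t_dx[:, None] ** 2 + t_dy[None, :] ** 2)
```

Row index `j - 1` and column index `k - 1` of the returned array hold $t_D(j, k)$; the last microphone in each direction always receives zero delay.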

次に、走査方向（θｘ、θｙ）に指向させたマイクロホンアレーから音声を入力させ（ステップＳ３１３）、Ａ／Ｄ変換し、所定時間の入力音声データを入力音声メモリ１３７にとして記録する（ステップＳ３１４）。また、ＦＦＴ演算で周波数領域に変換し、振幅スペクトル｜Ｘ（ω）｜を算出し、その時間変化データを求める（ステップＳ３１５）。次に、入力音声スペクトルの時間変化と選択音源のそれとを比較し、相関度を算出し、特徴データＣｓ（θｘ，θｙ）として記録する（ステップＳ３１６）。 Next, voice is input from the microphone array oriented in the scanning direction (θx, θy) (step S313), A / D conversion is performed, and input voice data for a predetermined time is recorded in the input voice memory 137 (step S314). ). Moreover, it converts to a frequency domain by FFT calculation, calculates an amplitude spectrum | X (ω) |, and obtains its time change data (step S315). Next, the time change of the input sound spectrum is compared with that of the selected sound source, the degree of correlation is calculated, and recorded as feature data Cs (θx, θy) (step S316).

Subsequently, θx = θx + Δθx is set (step S317), and it is determined whether θx > θxmax (step S318). If NO, the processing from step S312 is repeated. If YES, θy = θy + Δθy is set (step S319), and it is determined whether θy > θymax (step S320). If NO, the processing from step S311 is repeated. If YES, the feature data Cs(θx, θy) is plotted as a two-dimensional image on the corresponding θx, θy coordinates (step S321). The feature data image is then made semi-transparent, and the semi-transparent feature extraction data image is superimposed on the through image and displayed on the finder display unit 119 (step S322).
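The nested scan over θx and θy that fills the feature data Cs(θx, θy) can be sketched as two loops around a correlation measure. Everything specific here is an assumption for illustration: the callable `capture_spectrogram` stands in for steering the array and measuring the time variation of the amplitude spectrum, and the normalized correlation coefficient is one possible choice of correlation degree, which the description does not define.

```python
import numpy as np

def correlation(a, b):
    """Normalized correlation coefficient between two equally shaped arrays."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a.ravel() @ b.ravel() / denom) if denom > 0 else 0.0

def scan_directions(capture_spectrogram, reference, x_range, y_range, step):
    """Scan a grid of directions and return (xs, ys, cs).

    cs[iy, ix] holds the correlation between the spectrogram captured in
    direction (xs[ix], ys[iy]) and the reference sound source.
    """
    xs = np.arange(x_range[0], x_range[1] + step / 2, step)
    ys = np.arange(y_range[0], y_range[1] + step / 2, step)
    cs = np.zeros((len(ys), len(xs)))
    for iy, ty in enumerate(ys):          # outer loop over theta_y
        for ix, tx in enumerate(xs):      # inner loop over theta_x
            cs[iy, ix] = correlation(capture_spectrogram(tx, ty), reference)
    return xs, ys, cs
```

The resulting array `cs` can be drawn as a two-dimensional image over the scanned angles, and directions whose value exceeds a chosen threshold are candidates for the selected sound source.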

In addition, a sound source identification symbol, the type of sound source, and the like are displayed at the positions corresponding to the directions in which the feature data Cs(θx, θy), that is, the calculated degree of correlation, is equal to or greater than a predetermined value (step S323 in FIG. 20), and one direction (θFx, θFy) is selected, manually or automatically, from among the directions in which the feature data Cs(θx, θy) is equal to or greater than the predetermined value (step S324). Next, it is determined whether or not the camera is to be focused in the identified direction (step S325), and the focus point for camera shooting is set in the selected direction (θFx, θFy) (step S326).

Next, it is determined whether or not there has been an instruction to focus the voice input in the identified direction (step S327). If so, the directivity direction (voice focus) of the microphone array is set to the selected direction (θFx, θFy). That is, the delay time $t_D(j, k)$ of each delay unit D(k) of the sound pressure suppression unit (directional beam generation unit 135) is set, using the following example equations, so as to emphasize the direction (θFx, θFy) (step S328).

(Example)

$$t_{Dx}(j) = (m - j)\, d_x \sin\theta_{Fx} / c,$$

$$t_{Dy}(k) = (n - k)\, d_y \sin\theta_{Fy} / c,$$

$$t_D(j, k) = \sqrt{\{t_{Dx}(j)\}^2 + \{t_{Dy}(k)\}^2}$$

Further, the result is superimposed on the through image and displayed on the viewfinder (step S329). Then, voice is input from the microphone array whose directivity direction has been voice-emphasized (step S330), and the process proceeds to shooting processing or recording processing (step S331).

In other words, in the third embodiment, as shown in FIG. 21, the signal from the directional beam generation unit 135, which consists of a group of delay circuits, is stored in the extracted voice memory 190 and Fourier-transformed by the Fourier transform (FFT) circuit 191 to output the amplitude spectrum $|X(\omega)|$, and the voice spectrum data for each direction is stored in the direction-specific voice spectrum data memory 193. This data is compared with feature data stored in advance in the spectrum data memory 193, such as the sound source spectrum data of predetermined sound sources and their time-variation characteristics, and a degree of approximation or the like is calculated from the degree of correlation to identify the characteristics and type of the sound source of the input voice in each direction. According to the identification result, the feature data and the identified type information are superimposed at the corresponding positions on the M finder.

また、特定音源との相関度の高い類似の音声が入力された方向を識別して、当該方向にカメラの焦点もしくはマイク入力の指向方向を向けるように制御することで、所望の被写体の撮影や録音に利用できるようにしたのである。 In addition, by identifying the direction in which similar sound having a high degree of correlation with a specific sound source is input, and controlling the direction of the camera focus or the direction of microphone input in that direction, shooting of a desired subject can be performed. It was made available for recording.

例えば、撮影や録音したい野鳥の鳴き声の音声スペクトルの時間変化データなど、探したい所定の音源の特徴データを、予めメモリに記憶しておき、前記の比較する所定音源の特徴データとして選択して設定すれば、方向別に入力された各音声信号から、所望の野鳥のいる方向を識別したり、あるいは、入力された音声に最も相関度の高い所定音源を識別して、野鳥の種類などを音源識別情報としてファインダーに表示させたりすることができる。 For example, feature data of a predetermined sound source to be searched for, such as time change data of the voice spectrum of a bird's cry to be photographed or recorded, is stored in advance in memory, and is selected and set as the feature data of the predetermined sound source to be compared. In this way, the direction of the desired wild bird is identified from each audio signal input by direction, or the predetermined sound source with the highest degree of correlation with the input audio is identified, and the type of wild bird is identified as the sound source Information can be displayed on the viewfinder.

Based on the identified direction or on the sound source identification information for each direction, the control unit can focus the AF function of the camera in that direction, so that the desired wild bird is found immediately and captured as a still or moving image, or it can steer the directivity of the microphone array in that direction and emphasize the sound from that direction, so that the call of the desired wild bird is recorded clearly.
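As a rough illustration of steering the microphone array toward the identified direction, the sketch below picks the direction with the highest correlation score and computes per-microphone delays for conventional delay-and-sum steering of a uniform linear array. The array geometry, the microphone spacing, the speed of sound, the sign convention for the angle, and the names `best_direction` and `steering_delays` are all assumptions made for illustration; the description itself only states that the directivity is controlled toward the identified direction.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def best_direction(scores):
    """Return the direction (degrees) with the highest correlation score."""
    return max(scores, key=scores.get)


def steering_delays(angle_deg, num_mics, spacing_m):
    """Delays in seconds for delay-and-sum steering of a uniform linear array.

    The angle is measured from broadside (0 degrees is straight ahead).
    Delays are shifted so that the smallest one is zero.
    """
    step = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    delays = [i * step for i in range(num_mics)]
    offset = min(delays)
    return [d - offset for d in delays]


# Hypothetical correlation scores per horizontal direction (degrees).
scores = {-30: 0.2, 0: 0.5, 30: 0.9}
target = best_direction(scores)
delays = steering_delays(target, num_mics=4, spacing_m=0.02)
```

The resulting values would be loaded into the delay devices D1 to Dn before the channels are summed. The same `target` angle could equally be handed to the AF control, which is the other use the paragraph above mentions.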

FIG. 22 is a diagram showing feature data of predetermined sound sources stored in advance. The figure shows an example of setting data in which, for the predetermined sound sources to be recorded, the sound signals of the calls of several species of wild birds are stored together with feature data such as the sound spectrum and its time-variation data, or a sonagraph (a record of the time variation of the sound spectrum and the sound pressure intensity at each point). Of course, the data is not limited to sound sources such as wild birds, other living creatures, or animal calls; it may also be data on human speech, noise generated by aircraft, and the like.

(Other embodiments)
(1) In the above embodiments, the microphone array unit 103 is composed of a plurality of horizontally arranged microphones and vertically arranged microphones, but the arrangements shown in FIG. 23 may be used instead.
(a) The digital camera 500 includes a camera body 501 and a movable camera unit 502. An LCD finder 503 is disposed on the camera body 501. The movable camera unit 502 is provided with an imaging lens 504 and a strobe 505, and a microphone unit 506 composed of a plurality of microphones arranged horizontally is provided below the strobe 505.
(b) In the digital camera 600, an imaging lens 602 is disposed on the front surface of the camera body 601, and a left microphone unit 603L and a right microphone unit 603R, each composed of a plurality of microphones arranged horizontally, are provided on both sides of the upper front surface.
(c) In the digital camera 700, an imaging lens 702 is disposed on the front surface of the camera body 701, and a microphone unit 703 composed of a plurality of microphones arranged around the periphery of the imaging lens 702 so as to surround it is provided.
As described above, the microphones of the microphone unit may be arranged along a straight line or along a curve.

(2) In the embodiments, an image of sound feature data around the subject, such as sound pressure, frequency characteristics, and spectrum, is displayed superimposed on the subject image so that shooting and recording operations can be performed accordingly. Other feature data of the sound signal may instead be extracted, visualized, and displayed. Alternatively, for example, ultrasonic waves that certain creatures such as dogs and bats can hear but that lie beyond the human audible frequency range may be input, visualized, and displayed.

(3) Alternatively, signals other than sound signals may be used. For example, feature data may be derived from light other than the imaging signal that lies outside the range visible to humans, such as optical signals in the ultraviolet region that are visible to certain creatures such as the cabbage white butterfly, from radiation or electromagnetic waves, or from detection signals obtained from around the subject by other sensor means. This feature data may be converted into an image in association with the direction of the subject image and displayed superimposed on the subject image obtained from the imaging signal.

FIG. 1 is a perspective view of a digital camera according to each embodiment of the present invention.
FIG. 2 is a block diagram showing the circuit configuration of the digital camera according to the first embodiment.
FIG. 3 is a flowchart showing the processing procedure in the first embodiment.
FIG. 4 is a flowchart continuing from FIG. 3.
FIG. 5 is an explanatory diagram showing the operation of the first embodiment.
FIG. 6 is a screen transition diagram for the first embodiment.
FIG. 7 is a diagram showing an example of converting the emphasized sound source direction coordinates (θx, θy) when the angle of view, half angle of view, and subject range change with the lens focal length (f), for example during a zoom operation.
FIG. 8 is a block circuit diagram showing a modification of the directivity control, sound emphasis, and sound suppression processing by the microphone array.
FIG. 9 is a block circuit diagram showing a modification of the directivity control, sound emphasis, and sound suppression processing by the microphone array.
FIG. 10 is a block circuit diagram showing a modification of the directivity control, sound emphasis, and sound suppression processing by the microphone array.
FIG. 11 is a diagram showing an example of processing for generating a two-dimensional image of feature data.
FIG. 12 is a diagram showing a display example when setting the direction of the subject whose sound is to be emphasized.
FIG. 13 is a diagram showing a display example when setting the direction of a sound source or subject to be suppressed.
FIG. 14 is a diagram showing an example of specific data analyzed and extracted from sound signals input by scanning.
FIG. 15 is a diagram showing an example of specific data analyzed and extracted from sound signals input by scanning.
FIG. 16 is a block diagram showing the circuit configuration of a digital camera according to the second embodiment of the present invention.
FIG. 17 is a flowchart showing the processing procedure in the second embodiment.
FIG. 18 is a diagram showing a configuration example of the noise suppression circuit based on the spectral subtraction method used in the second embodiment.
FIG. 19 is a flowchart showing the processing procedure in the third embodiment of the present invention.
FIG. 20 is a flowchart continuing from FIG. 19.
FIG. 21 is an explanatory diagram showing an outline of the third embodiment.
FIG. 22 is a diagram showing feature data of predetermined sound sources stored in advance.
FIG. 23 is an external view of cameras showing other embodiments of the present invention.

Explanation of symbols

M1 to Mn Microphones
A1 to An Amplifiers
D1 to Dn Delay devices
100 Digital camera
101 Main body
102 Imaging lens
103 Microphone array unit
104 Cover body
105 Lens driving unit
109 Imaging element
111 Driver
112 Shooting and recording control unit
113 Subject image through image unit
114 A/D converter
115 Image signal processing unit
116 Image memory
117 Image composition unit
119 Finder display unit
119 Finder/display unit
119 Display unit
120 Image compression encoder/decompression decoder
121 Encoded image memory
126 Drive amount/focal length unit
130 Operation input unit
131 Input circuit
132 Touch panel
134 A/D conversion circuit
135 Directional beam generation unit
136 Adder
137 Input sound memory
138 Sound memory
139 Sound compression encoder/decompression decoder
144 Directivity control unit
144 First directivity control unit
145 Second directivity control unit
151 Feature extraction data memory
152 Two-dimensional image generation unit
153 Translucent image conversion unit
154 Translucent pattern generation unit
160 Subject through image
161 Two-dimensional image
162 Feature extraction data image
163 Image focus aiming mark
300 Digital camera

Claims

An imaging apparatus comprising:
display means;
imaging means;
first display control means for causing the display means to display an image captured by the imaging means;
ambient sound detection means for detecting ambient sound within the imaging range of the imaging means;
second display control means for generating visible information representing the ambient sound detected by the ambient sound detection means and causing the display means to display the visible information in correspondence with the position, within the imaging range, of the ambient sound detected by the ambient sound detection means;
selection means for selecting an arbitrary sound included in the ambient sound detected by the ambient sound detection means when an arbitrary portion of the visible information indicating the ambient sound, displayed on the display means by the second display control means, is designated;
sound control means for controlling the ambient sound detected by the ambient sound detection means and applying emphasis processing or suppression processing to the sound selected by the selection means; and
recording means for recording the ambient sound in which the sound has been emphasized or suppressed by the sound control means.

The imaging apparatus according to claim 1, wherein the second display control means causes the display means to display the visible information superimposed on the image displayed on the display means by the first display control means.

The imaging apparatus according to claim 2, wherein the visible information is rendered translucent.

The imaging apparatus according to any one of claims 1 to 3, wherein the visible information is a two-dimensional image representing the distribution of the ambient sound on the basis of sound pressure level.

The imaging apparatus according to claim 4, wherein the two-dimensional image differs in color according to the sound pressure level.

The imaging apparatus according to any one of claims 1 to 5, wherein the recording means records the ambient sound in which the sound has been emphasized or suppressed together with the image captured by the imaging means.

The camera apparatus according to any one of claims 1 to 6, wherein the sound control means calculates the direction of the designated ambient sound from position coordinates obtained by an operation on an arbitrary portion of the visible information indicating the ambient sound displayed on the display means by the second display control means, and applies emphasis processing or suppression processing to the sound from the calculated direction.

The camera apparatus according to claim 7, wherein the sound control means calculates the direction of the designated ambient sound from the position coordinates and from the focal length of the imaging means and/or the size of the image, and applies emphasis processing or suppression processing to the sound from the calculated direction.

An imaging apparatus comprising:
imaging means;
feature data storage means storing feature data of sounds;
ambient sound detection means for detecting ambient sound;
comparison means for comparing the feature data of the sounds stored in the feature data storage means with sound data in the ambient sound detected by the ambient sound detection means; and
subject detection means for detecting, on the basis of the comparison by the comparison means, a subject within the imaging range of the imaging means that is generating ambient sound approximating the feature data.

The imaging apparatus according to claim 9, further comprising focusing control means for focusing the imaging means on the subject detected by the subject detection means.

The imaging apparatus according to claim 9 or 10, further comprising display means for displaying the subject detected by the subject detection means.

The imaging apparatus according to claim 9, 10, or 11, further comprising:
sound control means for controlling the ambient sound detected by the ambient sound detection means and applying emphasis processing or suppression processing to the sound from the subject detected by the subject detection means; and
recording means for recording the ambient sound in which the sound has been emphasized or suppressed by the sound control means.

The imaging apparatus according to claim 12, wherein the recording means records the ambient sound together with the image captured by the imaging means.

The imaging apparatus according to any one of claims 9 to 13, further comprising designation means for designating arbitrary feature data from among feature data of a plurality of sounds,
wherein the storage means stores the feature data designated by the designation means.

The imaging apparatus according to any one of claims 3 to 14, wherein the ambient sound detection means is a microphone array having a plurality of microphones.

An imaging control program causing a computer of an imaging apparatus that comprises display means, imaging means, and ambient sound detection means for detecting ambient sound within the imaging range of the imaging means to function as:
first display control means for causing the display means to display an image captured by the imaging means;
second display control means for generating visible information representing the ambient sound detected by the ambient sound detection means and causing the display means to display the visible information in correspondence with the position, within the imaging range, of the ambient sound detected by the ambient sound detection means;
selection means for selecting an arbitrary sound included in the ambient sound detected by the ambient sound detection means when an arbitrary portion of the visible information indicating the ambient sound, displayed on the display means by the second display control means, is designated;
sound control means for controlling the ambient sound detected by the ambient sound detection means and applying emphasis processing or suppression processing to the sound selected by the selection means; and
recording means for recording the ambient sound in which the sound has been emphasized or suppressed by the sound control means.

An imaging control program causing a computer of an imaging apparatus that comprises imaging means, feature data storage means storing feature data of sounds, and ambient sound detection means for detecting ambient sound to function as:
comparison means for comparing the feature data of the sounds stored in the feature data storage means with sound data in the ambient sound detected by the ambient sound detection means; and
subject detection means for detecting, on the basis of the comparison by the comparison means, a subject among the subjects captured by the imaging means that is generating ambient sound approximating the feature data.