JP5219761B2

JP5219761B2 - Imaging device

Info

Publication number: JP5219761B2
Application number: JP2008307088A
Authority: JP
Inventors: 晋一増田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2008-12-02
Filing date: 2008-12-02
Publication date: 2013-06-26
Anticipated expiration: 2028-12-02
Also published as: JP2010135876A

Description

The present invention relates to an imaging apparatus and, more specifically, to an imaging apparatus that reduces noise contained in audio by referring to the video.

In recent years, digital still cameras, which capture still images, have also become able to shoot moving images in the same way as video cameras, and can record audio at the same time. Video cameras, for their part, can now capture high-resolution still images. In other words, the boundary between video cameras and still cameras is becoming blurred.

Imaging apparatuses such as still cameras and video cameras generally include a zoom lens for changing the shooting angle of view, a focus lens for adjusting focus, and an ND (neutral density) filter and an aperture for adjusting the amount of light. They also include drive units that drive these components mechanically. These mechanical parts inevitably produce sound when they operate. In an imaging apparatus that can record audio, the sound produced by these mechanisms is recorded as noise together with the intended audio.

Conventional countermeasures include placing the drive source, in particular the motor that is the main noise source, away from the microphone, arranging soundproofing material around the drive source, or using a motor with relatively quiet operation. For example, Patent Document 1 describes a technique for a still video camera in which electrical components (motors) are not operated while the audio input means is operating.

Patent Document 2 describes that, once the motor starts rotating, the camera noise band is uniformly reduced before recording. It also describes that, when the subject sound and the noise have the same frequency, the noise reduction processing is operated if the subject sound is loud.

Patent Document 3 describes changing the directivity of the microphone according to the shooting mode. For example, in portrait mode the microphone's sensitivity toward the front is raised to emphasize it, and the sensitivity toward the surroundings is lowered to cut ambient noise. In landscape mode, sports mode, and the like, the directivity is widened so that ambient sound can also be recorded.

Japanese Patent Laid-Open No. 9-233414
Japanese Patent Laid-Open No. 2006-186819
Japanese Patent Laid-Open No. 2002-199272

Physically separating the microphone from the drive source makes the imaging apparatus larger, which runs counter to the demand for compact imaging apparatuses. Arranging soundproofing members around the drive source raises the price of the imaging apparatus.

With the technique of Patent Document 1, the shooting angle of view cannot be changed and the focus cannot be adjusted while a moving image is being shot, so the image the photographer intends cannot be obtained.

The technique of Patent Document 2 decides whether to apply noise reduction processing from the relative magnitudes of the sound pressure picked up by the microphone and the sound pressure of the noise. It therefore does not take the importance of the subject being shot into account, and the noise reduction processing is not tailored to the subject.

The technique of Patent Document 3 changes the directivity of the microphone according to the shooting mode. The user therefore has the bother of switching shooting modes manually, and a wrong mode setting may also degrade the recorded sound quality.

The present invention has been made in view of the above circumstances, and its object is to provide an imaging apparatus that effectively reduces the influence of noise from the operation of drive mechanisms on the audio to be recorded, while limiting increases in size and cost.

An imaging apparatus according to the present invention comprises: imaging means; audio input means; a noise source including any of zoom lens group driving means, focus lens group driving means, ND filter driving means, and shutter driving means; subject recognition means for recognizing a subject from an image input by the imaging means; noise reduction means for reducing noise from the noise source in the audio input by the audio input means; and control means for controlling the noise reduction means in accordance with the recognition by the subject recognition means. The subject recognition means is means for recognizing a person in the image input by the imaging means, and, while the subject recognition means is recognizing the person, the control means controls the noise reduction means so as to reduce the noise band outside the person's utterance frequency band.

An imaging apparatus according to the present invention comprises: imaging means; audio input means; a noise source including any of zoom lens group driving means, focus lens group driving means, ND filter driving means, and shutter driving means; subject recognition means for recognizing a subject from an image input by the imaging means; noise reduction means for reducing noise from the noise source in the audio input by the audio input means; and control means for controlling the noise reduction means in accordance with the recognition by the subject recognition means. The subject recognition means is means for recognizing a person in the image input by the imaging means and the movement of that person's mouth, and the control means enables the noise reduction means while the subject recognition means is recognizing a person whose mouth is moving.

An imaging apparatus according to the present invention comprises: imaging means; audio input means; a noise source including any of zoom lens group driving means, focus lens group driving means, ND filter driving means, and shutter driving means; subject recognition means for recognizing a subject from an image input by the imaging means; noise reduction means for reducing noise from the noise source in the audio input by the audio input means; control means for controlling the noise reduction means in accordance with the recognition by the subject recognition means; and utterance frequency band data storage means for storing representative utterance frequency band values for each gender and age. The subject recognition means is means for recognizing a person in the image input by the imaging means together with that person's gender and age, and, while the subject recognition means is recognizing the person, the control means controls the noise reduction means so as to reduce the noise outside the utterance frequency band corresponding to the person's gender and age.

An imaging apparatus according to the present invention comprises: imaging means; audio acquisition means; and audio processing means for reducing, in the audio obtained by the audio acquisition means, noise that accompanies the driving of driving means associated with the imaging means. For audio obtained by the audio acquisition means while a person is detected in the image being captured by the imaging means, the audio processing means reduces the noise accompanying the driving of the driving means in a frequency band that is not the person's utterance frequency band.

An imaging apparatus according to the present invention comprises: imaging means; audio acquisition means; and audio processing means for reducing, in the audio obtained by the audio acquisition means, noise that accompanies the driving of driving means associated with the imaging means. For audio obtained by the audio acquisition means while a person is detected in the image being captured by the imaging means, the audio processing means reduces the noise accompanying the driving of the driving means in a frequency band that depends on the gender of the detected person.

An imaging apparatus according to the present invention comprises: imaging means; audio acquisition means; and audio processing means for reducing, in the audio obtained by the audio acquisition means, noise that accompanies the driving of driving means associated with the imaging means. For audio obtained by the audio acquisition means while a person is detected in the image being captured by the imaging means, the audio processing means reduces the noise accompanying the driving of the driving means in a frequency band that depends on the age of the detected person.
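The idea of reducing the noise band only outside a person's utterance frequency band can be sketched as a small interval calculation. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the function name `bands_to_attenuate`, the modeling of bands as (low, high) pairs in hertz, and the example frequencies are all hypothetical.

```python
def bands_to_attenuate(noise_band, voice_band=None):
    """Return the parts of noise_band that lie outside voice_band.

    Bands are (low_hz, high_hz) tuples. When no person is recognized,
    voice_band is None and the whole noise band is attenuated.
    """
    low, high = noise_band
    if voice_band is None:
        return [(low, high)]

    voice_low, voice_high = voice_band
    # No overlap: the voice band does not protect any part of the noise band.
    if voice_high <= low or voice_low >= high:
        return [(low, high)]

    remaining = []
    if low < voice_low:
        remaining.append((low, voice_low))    # part below the voice band
    if voice_high < high:
        remaining.append((voice_high, high))  # part above the voice band
    return remaining


# Hypothetical numbers: drive noise spans 100-4000 Hz, voice spans 300-3400 Hz.
print(bands_to_attenuate((100, 4000), (300, 3400)))
```

With a person recognized, only the portions of the noise band above and below the voice band are returned for attenuation, so the voice itself is left untouched.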

According to the present invention, noise generated by mechanical parts and the like can be reduced effectively, and recording quality or playback quality can be improved.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a front view of a first embodiment of an imaging apparatus according to the present invention, and FIG. 2 is a rear view of it. FIG. 3 is a schematic block diagram of this embodiment.

The imaging apparatus 10 of this embodiment includes a lens barrel 14, a flash 16, and a microphone (sound input means) 18 on a camera body 12. A shutter button 20 is provided on the top surface of the camera body 12. As shown in FIG. 2, the rear surface of the camera body 12 has an image display unit 22, a zoom button 24, a power button 26, a mode setting button 28, and a speaker (sound output means) 30. The image display unit 22 typically consists of a liquid crystal display (LCD) or organic EL (electroluminescence) elements.

As shown in FIG. 3, the lens barrel 14 houses a photographic lens that has a zoom lens group 32, a shutter 34, and a focus lens group 36. An ND filter advance/retract mechanism 38, which inserts an ND filter onto the optical axis or withdraws it, is arranged between the zoom lens group 32 and the shutter 34. The ND filter is used to reduce the amount of light from the subject when the subject is so bright that proper exposure cannot be obtained even by changing the shutter speed or the sensitivity of the image sensor 40. The focal length of the photographic lens can be changed with the zoom lens group 32, and its focus can be adjusted with the focus lens group 36.

A zoom lens group drive circuit 42 moves the zoom lens group 32 along the optical axis by means of a drive mechanism 44 consisting of a drive motor and a reduction mechanism. A shutter drive circuit 46 opens and closes the shutter blades of the shutter 34 by means of a drive mechanism 48. A focus lens group drive circuit 50 moves the focus lens group 36 along the optical axis by means of a drive mechanism 52 consisting of a drive motor and a reduction mechanism. An ND filter drive circuit 54 moves the ND filter of the ND filter advance/retract mechanism 38 between a position on the optical axis and a position withdrawn from the optical axis by means of a drive mechanism 56 consisting of a drive motor and a reduction mechanism.

As is well known for still cameras, the shutter button 20 can be pressed in two stages: a half press and a full press. In the still image shooting mode, a half press of the shutter button 20 triggers exposure control and focus control, and a full press triggers capture and recording of the subject image as a still image. In the moving image shooting mode, the first full press of the shutter button 20 starts capture and recording of the subject image as a moving image, and the second full press stops the recording. That is, the shutter button 20 functions as a toggle button that instructs the start and stop of recording.
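The toggle behavior of the shutter button in the moving image shooting mode can be sketched as a tiny state holder. This is only an illustrative sketch; the class name `MovieRecorder` and its method are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class MovieRecorder:
    """Toggle-style recording control driven by full presses of the shutter button."""

    recording: bool = False

    def on_full_press(self) -> bool:
        # Each full press flips the state: the first starts recording,
        # the second stops it, and so on.
        self.recording = not self.recording
        return self.recording


recorder = MovieRecorder()
states = [recorder.on_full_press() for _ in range(3)]
print(states)  # [True, False, True]
```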

Details are given later, but in the still image shooting mode, while the shutter button 20 is not being operated, an optical image of the subject is captured at a predetermined interval, for example every 1/15 second, and displayed on the screen of the image display unit 22 as a live view image. At the half-press stage of the shutter button 20, the exposure control values (shutter speed and aperture value) are set and the photographic lens is focused to the subject distance, putting the apparatus in an imaging standby state. At the full-press stage of the shutter button 20, the shutter 34 is opened for a time corresponding to the shutter speed in the exposure control values, and the image sensor 40 is exposed. The image signal generated by the image sensor 40 through this exposure is recorded on a recording medium after compression and other processing described later.

In the moving image shooting mode, on the other hand, the exposure and the focus of the photographic lens are controlled automatically and continuously while the shutter button 20 is not being operated. Naturally, when exposure control is switched to manual, exposure is controlled manually. Likewise, when focus control is switched to manual, focus is controlled manually. Both before and during recording, the subject image incident on the image sensor 40 is displayed on the screen of the image display unit 22 as a live view image.

The flash 16 is used to illuminate the subject with light for shooting, for example when the amount of light from the subject is insufficient.

The microphone 18 is an element that converts sound into an electrical signal (audio signal), and the speaker 30 is an element that outputs an audio signal as sound.

The image display unit 22 is used as a viewfinder during shooting and as a means of monitoring the image being recorded and played-back images. The image display unit 22 also displays various setting screens, for example for setting operating parameters.

The zoom button 24 consists of a wide button 24a, which instructs movement of the zoom lens group 32 toward the wide-angle end, and a tele button 24b, which instructs movement toward the telephoto end. Each press of the power button 26 alternately turns the main power of the imaging apparatus 10 on and off. The mode setting button 28 selects among a still image shooting mode for shooting still images of the subject, a moving image shooting mode for shooting moving images of the subject, and a playback mode for playing back recorded images.

The image sensor 40 converts the optical image formed by the photographic lens into an electrical image signal. The image sensor 40 is, for example, a CMOS color area sensor having a matrix of pixels that receive the R (red), G (green), and B (blue) color components. An overall control unit 60 controls the start and end of exposure of the image sensor 40 and the readout of each pixel signal.

An image processing unit 62 applies processing that is well known for cameras (noise reduction, black level correction, color conversion, white balance adjustment, gamma correction, and so on) to the image signal output from the image sensor 40. Specifically, the image processing unit 62 reduces noise in the image signal, adjusts its level, and digitizes it to a predetermined bit depth (for example, 8 bits per color). The image processing unit 62 further corrects the black level of the image data to a reference value, adjusts the pixel data of the R (red), G (green), and B (blue) color components to correct the overall white balance, and corrects the gamma characteristic.

The image data processed by the image processing unit 62 is temporarily stored in an image memory 64. During shooting, the image memory 64 is also used as a work area in which the overall control unit 60 applies predetermined processing to the image data from the image processing unit 62. In playback mode, the image memory 64 is used to temporarily store image data read from the recording medium and reproduced.

A VRAM 66 is a video memory that stores the image data to be displayed on the image display unit 22, and can hold image data with a pixel count corresponding to the number of display pixels of the image display unit 22.

An audio processing unit 68 has a function of amplifying the audio signal from the microphone 18 and converting it into a digital signal, and a function of converting audio data (a playback audio signal) from the overall control unit 60 into an analog signal and supplying it to the speaker 30.

An input operation unit 70 consists of the shutter button 20, the zoom button 24, the power button 26, the mode setting button 28, and other operating means. Operation information from the input operation unit 70 is input to the overall control unit 60.

A recording medium 72 consists of a memory card made of semiconductor memory elements, a hard disk, or the like, and stores image data, audio data, or multiplexed data of the two supplied from the overall control unit 60.

The overall control unit 60 consists of a microcomputer and performs overall control of the units in the camera body 12. The overall control unit 60 has a RAM used as a work area and a ROM that stores various function programs and the like. The overall control unit 60 has a plurality of functional elements 74 to 89 that are implemented by programs.

An imaging control unit 74 controls the exposure operation of the image sensor 40. Specifically, in the moving image shooting mode, in response to a full press of the shutter button 20, it makes the image sensor 40 repeat exposure and image signal readout at a predetermined interval and has the captured image data recorded on the recording medium 72. In response to another full press of the shutter button 20, it ends this operation.

An encoding/decoding processing unit 76 handles encoding (compression) of captured image data and input audio data, and decoding (decompression) of compressed image data and compressed audio data read from the recording medium 72. When compressed image data and compressed audio data are multiplexed, it also handles the multiplexing and, during playback, the demultiplexing.

A drive control unit 77 controls the operation of the zoom lens group drive mechanism 44, the focus lens group drive mechanism 52, the ND filter advance/retract drive mechanism 56, and the shutter drive mechanism 48. For example, when the zoom button 24 is operated, the drive control unit 77 controls the zoom lens group drive circuit 42 so that the zoom lens group 32 is driven along the optical axis for a period corresponding to the operation time. In response to a half press of the shutter button 20, the drive control unit 77 instructs the focus lens group drive circuit 50 to move the focus lens group 36 along the optical axis. The drive control unit 77 also instructs the ND filter drive circuit 54 to insert or withdraw the ND filter according to the amount of light imaged on the image sensor 40.

A subject recognition unit 78 recognizes whether a person is present in the captured image obtained from the image sensor 40, the number of persons, their positions on the screen, their gender, whether each is an adult or a child, whether the mouth is moving, and so on.

An automatic person selection unit 80 selects, from among the persons identified by the subject recognition unit 78, the person who is important as a subject according to fixed criteria. For example, the criteria are any one or a combination of the area each person occupies on the screen, whether the person is at the center or the periphery of the screen, and whether the person is the one on which autofocus was performed at the half press of the shutter button 20. Using such criteria, the unit estimates which person is most important as a subject during moving image shooting.
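One way to combine the selection criteria named above (occupied area, position relative to the screen center, and autofocus target) is a weighted score. The sketch below is an assumption made for illustration: the patent does not specify a scoring formula, and the names `DetectedPerson`, `importance`, `select_main_person`, and the weight values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectedPerson:
    label: str
    area_ratio: float      # fraction of the frame the person occupies (0..1)
    center_offset: float   # normalized distance from frame center (0 = center, 1 = edge)
    autofocused: bool      # True if autofocus locked on this person at half press


def importance(person, w_area=1.0, w_center=0.5, w_af=1.0):
    """Hypothetical score: larger, more central, autofocused persons rank higher."""
    score = w_area * person.area_ratio
    score += w_center * (1.0 - person.center_offset)
    if person.autofocused:
        score += w_af
    return score


def select_main_person(people):
    """Return the highest-scoring person, or None when nobody was detected."""
    return max(people, key=importance) if people else None


people = [
    DetectedPerson("A", area_ratio=0.10, center_offset=0.8, autofocused=False),
    DetectedPerson("B", area_ratio=0.25, center_offset=0.1, autofocused=True),
]
print(select_main_person(people).label)  # B
```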

Subject recognition and face recognition techniques are described in, for example, Japanese Patent Laid-Open Nos. 6-160944 and 2001-330882. In this embodiment, such known subject recognition techniques are used for the subject recognition unit 78 and the automatic person selection unit 80. Further detailed description of subject recognition techniques is omitted.

An audio processing control unit 82 controls the flow of data in the audio processing unit 68. That is, in the moving image shooting mode, in response to a full press of the shutter button 20, it controls the audio processing unit 68 so that the output audio signal of the microphone 18 is captured, digitized, and PCM-encoded. The audio processing control unit 82 also has a zoom filter processing unit 84, a focus filter processing unit 86, and an ND filter processing unit 88 reduce the noise components generated by the drive mechanisms 44, 56, 48, and 52. When the shutter button 20 is fully pressed again, the overall control unit 60 ends this processing by the audio processing unit 68.

The zoom filter processing unit 84 reduces, in the sound data input via the microphone 18 and the audio processing unit 68, the frequency components that belong to the frequency band of the noise generated by the zoom lens group drive mechanism 44.

The focus filter processing unit 86 reduces, in the sound data input via the microphone 18 and the audio processing unit 68, sound at frequencies that belong to the frequency band of the noise generated by the focus lens group drive mechanism 52.

The ND filter processing unit 88 reduces, in the sound data input via the microphone 18 and the audio processing unit 68, sound at frequencies that belong to the frequency band of the noise generated by the ND filter advance/retract drive mechanism 56.
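What each of these filter processing units does, reducing the components inside a given noise band, can be sketched on a simplified spectrum. This is an illustrative assumption rather than the filter used in the patent: the spectrum is modeled as a list of (frequency, magnitude) bins, and the function name `attenuate_band` and the numbers are hypothetical.

```python
def attenuate_band(spectrum, band_hz, reduction_rate):
    """Scale down the magnitude of every bin whose frequency lies in band_hz.

    spectrum is a list of (frequency_hz, magnitude) pairs, band_hz is a
    (low_hz, high_hz) pair, and reduction_rate is the fraction removed
    (0.0 leaves the band untouched, 1.0 removes it entirely).
    """
    low, high = band_hz
    keep = 1.0 - reduction_rate
    return [
        (freq, mag * keep if low <= freq <= high else mag)
        for freq, mag in spectrum
    ]


# Hypothetical spectrum with three bins; the drive noise sits around 2 kHz.
spectrum = [(500, 1.0), (2000, 1.0), (6000, 1.0)]
print(attenuate_band(spectrum, (1500, 2500), 0.5))
# [(500, 1.0), (2000, 0.5), (6000, 1.0)]
```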

As shown in FIG. 4, an utterance frequency band data storage unit 90 stores representative or average data on the reduction frequency band and the reduction rate for four categories in total, classified by male/female and by adult/child. These data are measured in advance.

As shown in FIG. 5, a posture correction coefficient storage unit 92 stores, as correction coefficients for the noise reduction characteristics, a correction coefficient for the reduction frequency band and a correction coefficient for the reduction rate for each camera posture. In the example shown in FIG. 5, the camera posture takes one of three states: normal shooting, facing upward, and facing downward. In the example shown in FIG. 5, the correction coefficients range from 0.8 to 1.1.

As shown in FIG. 6, a temperature correction coefficient storage unit 94 stores, as correction coefficients for the noise reduction characteristics, a correction coefficient for the reduction frequency band and a correction coefficient for the reduction rate for each camera temperature. In the example shown in FIG. 6, the camera temperature takes one of three states: −10 to 10 °C, 10 to 30 °C, and 30 to 40 °C. In the example shown in FIG. 6, the correction coefficients take values from 0.8 to 1.2.
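A table of this kind can be sketched as a simple range lookup. The three temperature ranges come from the text, but everything else here is an assumption made for illustration: the coefficient values are placeholders chosen inside the stated 0.8 to 1.2 range (the actual values of FIG. 6 are not reproduced in the text), the ranges are treated as half-open to resolve the shared boundaries, and the names `TEMPERATURE_TABLE` and `temperature_coefficients` are hypothetical.

```python
# (lower °C inclusive, upper °C exclusive, band coefficient, rate coefficient)
# Coefficient values are placeholders, not the values of FIG. 6.
TEMPERATURE_TABLE = [
    (-10.0, 10.0, 0.8, 1.2),
    (10.0, 30.0, 1.0, 1.0),
    (30.0, 40.0, 1.1, 0.9),
]


def temperature_coefficients(temp_c):
    """Return (band_coefficient, rate_coefficient) for a camera temperature."""
    for lower, upper, band_coeff, rate_coeff in TEMPERATURE_TABLE:
        if lower <= temp_c < upper:
            return band_coeff, rate_coeff
    raise ValueError(f"temperature {temp_c} °C is outside the table")


print(temperature_coefficients(20.0))  # (1.0, 1.0)
```

The posture correction coefficients of FIG. 5 could be looked up the same way, keyed by the three posture states instead of temperature ranges.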

A basic noise waveform storage unit 96 stores various basic noise waveforms. FIG. 7(a) shows an example of the zoom basic noise waveform. FIG. 7(b) shows an example of the focus basic noise waveform. FIG. 7(c) shows an example of the ND filter advance/retract basic noise waveform. In each, the horizontal axis represents time and the vertical axis represents sound pressure.

A posture sensor 98 measures the posture (orientation) of the imaging apparatus. Corresponding to the example shown in FIG. 5, the posture sensor 98 detects whether the imaging apparatus is facing upward, in the normal posture, or facing downward.

A temperature sensor 100 measures the temperature of the imaging apparatus. Corresponding to the example shown in FIG. 6, the temperature sensor 100 determines which of the three temperature ranges, −10 to 10 °C, 10 to 30 °C, or 30 to 40 °C, the temperature falls in.

Specifically, when noise from the zoom lens group driving mechanism 44 is to be removed, either the utterance frequency band data corresponding to the subject is read from the utterance frequency band data storage unit 90, or the zoom basic noise waveform is read from the basic noise waveform storage unit 96. A temperature correction coefficient is read from the temperature correction coefficient storage unit 94 based on the camera temperature detected by the temperature sensor 100. A posture correction coefficient is read from the posture correction coefficient storage unit 92 based on the camera posture detected by the posture sensor 98. The sound pressure level reduction frequency band and the reduction rate in the utterance frequency band data are then corrected with the temperature correction coefficient and the posture correction coefficient. The zoom filter processing unit 84 filters the audio signal input from the audio processing unit 68 according to the utterance frequency band data thus corrected.

The reason for correcting the utterance frequency band data (the sound pressure level reduction frequency band and the reduction rate) according to temperature is as follows. The parts that make up the various drive mechanisms 48, 52, 56, the zoom lens group 32, the shutter 34, the focus lens group 36, and the ND filter advance/retreat mechanism 38 expand and contract with temperature. As a result, the clearances between parts and the frictional forces at contacting surfaces change, and so does the behavior of each part while it is being driven. The driving force required of each drive source also changes, and with it the driving speed. Consequently, the frequency and sound pressure of the noise generated by each drive mechanism change. Correcting the frequencies to be reduced and the reduction rate according to temperature therefore allows noise to be reduced more precisely.

The camera posture is taken into account for the following reason. Because of gravity, the load on the drive source varies with the camera posture. For example, when the camera is pointed upward toward the sky and the zoom lens group 32 is extended upward, the zoom lens group 32 must be driven against gravity. Working against gravity increases the load on the drive source of the zoom lens group drive mechanism 44, which slows the zoom speed and changes the frequency and sound pressure of the zoom noise. Correcting the frequencies to be reduced and the reduction rate according to posture therefore allows noise to be reduced more precisely.

また、外気温度によってマイク１８の音響／電気変換の周波数特性が変化するので、入力音声に温度補正を加えるのが好ましい。 In addition, since the frequency characteristics of acoustic / electrical conversion of the microphone 18 change depending on the outside air temperature, it is preferable to add temperature correction to the input voice.

When noise from the focus lens group driving mechanism 52 is to be removed, either the utterance frequency band data corresponding to the subject is read from the utterance frequency band data storage unit 90, or the focus basic noise waveform is read from the basic noise waveform storage unit 96. A temperature correction coefficient is read from the temperature correction coefficient storage unit 94 based on the camera temperature detected by the temperature sensor 100. A posture correction coefficient is read from the posture correction coefficient storage unit 92 based on the camera posture detected by the posture sensor 98. The sound pressure level reduction frequency band and the reduction rate in the utterance frequency band data are corrected with the temperature correction coefficient and the posture correction coefficient. The focus filter processing unit 86 filters the corrected frequency band of the audio input from the audio processing unit 68 at the corrected reduction rate.

When noise from the ND filter advance/retreat drive mechanism 56 is to be removed, either the utterance frequency band data corresponding to the subject is read from the utterance frequency band data storage unit 90, or the ND filter advance/retreat basic noise waveform is read from the basic noise waveform storage unit 96. A temperature correction coefficient is read from the temperature correction coefficient storage unit 94 based on the camera temperature detected by the temperature sensor 100. A posture correction coefficient is read from the posture correction coefficient storage unit 92 based on the camera posture detected by the posture sensor 98. The sound pressure level reduction frequency band and the reduction rate in the utterance frequency band data are corrected with the temperature correction coefficient and the posture correction coefficient. The ND filter processing unit 88 filters the corrected frequency band of the audio input from the audio processing unit 68 at the corrected reduction rate.

When the shutter button 20 is fully pressed in the moving image shooting mode, the filter selection unit 89 selects the zoom filter processing unit 84, the focus filter processing unit 86, or the ND filter processing unit 88 for noise removal according to the component being operated. For example, when the zoom button 24 is operated, the filter selection unit 89 instructs the zoom filter processing unit 84 to execute noise removal. When driving of the focus lens group 36 starts, the focus filter processing unit 86 executes noise removal corresponding to that driving. Similarly, the ND filter processing unit 88 executes noise removal for advance/retreat movement of the ND filter.

符号化復号化処理部７６は、ズームフィルタ処理部８４、フォーカスフィルタ処理部８６又はＮＤフィルタ処理部８８により騒音を除去された音声データを符号化する。符号化復号化処理部７６は、符号化した音声データを記録媒体７２に格納する。 The encoding / decoding processing unit 76 encodes the audio data from which noise has been removed by the zoom filter processing unit 84, the focus filter processing unit 86, or the ND filter processing unit 88. The encoding / decoding processing unit 76 stores the encoded audio data in the recording medium 72.

With reference to FIG. 8, the frequency characteristics of human speech and of the noise generated by the various drive mechanisms will be described. FIG. 8A shows an example of the frequency characteristics of female speech. FIG. 8B shows an example of the frequency characteristics of male speech. FIG. 8C shows an example of the frequency characteristics of zoom noise. FIG. 8D shows an example of the frequency characteristics of ND filter advance/retreat noise.

日本語の周波数帯域は、１００〜１５００Ｈｚという説があり、男性に対して女性の発声の周波数帯域はやや高くなっている。周波数特性の数値としては各種研究・測定結果があり、どれが最適であるか不明であるが、ここでは、本実施例の特徴を明確にする為に、数値を仮に設定して説明する。 There is a theory that the frequency band of Japanese is 100 to 1500 Hz, and the frequency band of female utterances is slightly higher than that of men. There are various research and measurement results as numerical values of the frequency characteristics, and it is unclear which is optimal. Here, in order to clarify the characteristics of the present embodiment, numerical values will be temporarily set and described.

FIG. 8A schematically shows an example of the frequency characteristics when an adult woman converses for a certain period of time. The horizontal axis represents frequency [Hz], and the vertical axis represents sound pressure [dB]. On the curve, the sound pressure is low near 10 Hz, rises near 200 Hz, and falls near 2000 Hz. In the example of FIG. 8A, the vocal range thus lies between 200 and 2000 Hz.

FIG. 8B schematically shows an example of the frequency characteristics when an adult man converses for a certain period of time. The horizontal axis represents frequency [Hz], and the vertical axis represents sound pressure [dB]. On the curve, the sound pressure is low near 10 Hz, rises near 100 Hz, and falls near 1000 Hz. In the example of FIG. 8B, the vocal range thus lies between 100 and 1000 Hz.

図８（ｃ）は、ズームレンズ群駆動機構４４がズームレンズ群３２を駆動する際に発生する騒音をマイク１８で取り込んだときの、マイク１８の出力音声の周波数特性を示す。横軸が周波数〔Ｈｚ〕を示し、縦軸が音圧〔ｄＢ〕を示す。周波数５Ｈｚから１０ｋＨｚにわたって一定の音圧で騒音を発生している。 FIG. 8C shows the frequency characteristics of the output sound of the microphone 18 when noise generated when the zoom lens group driving mechanism 44 drives the zoom lens group 32 is captured by the microphone 18. The horizontal axis represents frequency [Hz], and the vertical axis represents sound pressure [dB]. Noise is generated at a constant sound pressure over a frequency of 5 Hz to 10 kHz.

図８（ｄ）は、ＮＤフィルタ進退駆動機構５６がＮＤフィルタを駆動した際に発生する騒音をマイク１８で取り込んだときの、マイク１８の出力音声の周波数特性を示す。横軸が周波数〔Ｈｚ〕を示し、縦軸が音圧〔ｄＢ〕を示す。周波数２０Ｈｚ近辺に第１のピークがあり、周波数２ｋＨｚ近辺に第２のピークがある。 FIG. 8D shows the frequency characteristics of the output sound of the microphone 18 when the microphone 18 captures noise generated when the ND filter advance / retreat drive mechanism 56 drives the ND filter. The horizontal axis represents frequency [Hz], and the vertical axis represents sound pressure [dB]. There is a first peak around a frequency of 20 Hz, and a second peak around a frequency of 2 kHz.

図９及び図１０を参照して、本実施例の録音時の騒音低減処理の動作を詳細に説明する。図９及び図１０は、本実施例の、動画撮影モードにおける騒音低減処理の動作フローチャートを示す。 With reference to FIG. 9 and FIG. 10, the operation of noise reduction processing during recording according to the present embodiment will be described in detail. 9 and 10 show an operational flowchart of the noise reduction processing in the moving image shooting mode of the present embodiment.

全体制御部６０は、シャッタボタン２０の半押しを検出すると（Ｓ１）、測距及び測光する（Ｓ２）。即ち、被写体までの距離をコントラスト方式ＡＦ（オートフォーカス）で測距し、撮像素子４０に入射する光量から適正露光となるシャッタ速度を決定する。そして、全体制御部６０は、測距結果に従い、フォーカスレンズ群駆動回路５０に指示して、フォーカスレンズ群３６を被写体にピントが合う位置まで移動させる（Ｓ３）。また、全体制御部６０は、測光結果に従い、被写体が明るすぎる場合は、減光の為、ＮＤフィルタ駆動回路５４に指示してＮＤフィルタを光路中に進入させる（Ｓ４）。 When detecting that the shutter button 20 is half-pressed (S1), the overall control unit 60 performs distance measurement and photometry (S2). That is, the distance to the subject is measured by contrast AF (autofocus), and the shutter speed for proper exposure is determined from the amount of light incident on the image sensor 40. Then, the overall control unit 60 instructs the focus lens group driving circuit 50 according to the distance measurement result to move the focus lens group 36 to a position where the subject is in focus (S3). Further, according to the photometric result, the overall control unit 60 instructs the ND filter drive circuit 54 to cause the ND filter to enter the optical path for dimming if the subject is too bright (S4).

被写体認識部７８は、画面内の人物の存在不在を認識し（Ｓ５）、存在しない場合には、被写体に依存しない騒音低減処理を含む通常の撮影処理（Ｓ６）を実行する。図１１は、ステップＳ６の詳細なフローチャートを示す。 The subject recognizing unit 78 recognizes the presence / absence of a person on the screen (S5), and if not, executes a normal photographing process (S6) including a noise reduction process independent of the subject. FIG. 11 shows a detailed flowchart of step S6.

In the flow shown in FIG. 11, the filter gain is reduced over the entire frequency range of the noise generated by the imaging apparatus. When the shutter button 20 is fully pressed (S31), moving image shooting (recording) starts (S32).

While the zoom button 24 is pressed (S33), the recording level in the 5 to 1000 Hz band, where noise is generated while the zoom lens group is driven, is lowered by 15 dB and recording continues (S34).

While the focus lens group is being driven (S35), the recording level in the 100 to 1000 Hz band, where noise is generated while the focus lens group is driven, is lowered by 13 dB and recording continues (S36).

While the ND filter is being advanced or retracted (S37), the recording level around 30 Hz and around 2000 Hz, where noise is generated during the advance/retreat drive of the ND filter, is lowered by 8 dB and recording continues (S38).

シャッタボタン２０が再度、全押しされるまで（Ｓ３９）、以上のステップＳ３３〜Ｓ３８を繰り返し実行する。シャッタボタン２０が再度、全押しされると（Ｓ３９）、動画撮影を終了する（Ｓ４０）。 Until the shutter button 20 is fully pressed again (S39), the above steps S33 to S38 are repeated. When the shutter button 20 is fully pressed again (S39), the moving image shooting is terminated (S40).
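The subject-independent reduction of steps S34, S36, and S38 can be summarized as a small table plus a decibel-to-gain conversion. The sketch below is illustrative only: the names are hypothetical, the 20·log10 amplitude convention is assumed, and the band widths around 30 Hz and 2000 Hz for the ND filter are placeholders because the description only says "around".

```python
# Bands (low_hz, high_hz) and attenuation in dB per noise source (S34, S36, S38).
DEFAULT_REDUCTION = {
    "zoom": {"bands": [(5.0, 1000.0)], "attenuation_db": 15.0},
    "focus": {"bands": [(100.0, 1000.0)], "attenuation_db": 13.0},
    # Assumption: the widths of these two bands are hypothetical placeholders.
    "nd_filter": {"bands": [(20.0, 40.0), (1500.0, 2500.0)], "attenuation_db": 8.0},
}


def db_to_gain(attenuation_db: float) -> float:
    """Convert an attenuation in dB to a linear amplitude gain (20*log10 convention)."""
    return 10.0 ** (-attenuation_db / 20.0)


def default_gain(source: str, frequency_hz: float) -> float:
    """Return the linear gain applied at `frequency_hz` while `source` is active."""
    entry = DEFAULT_REDUCTION[source]
    for low, high in entry["bands"]:
        if low <= frequency_hz <= high:
            return db_to_gain(entry["attenuation_db"])
    return 1.0  # frequencies outside the listed bands are left unchanged
```

For example, `default_gain("zoom", 500.0)` gives the gain corresponding to a 15 dB cut, while `default_gain("zoom", 2000.0)` leaves the signal unchanged.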

Returning to FIG. 9: when a person is included in the frame (S5), the number of persons in the frame is detected (S7), and the position of each person within the frame is detected (S8). Suppose, for example, that two persons are detected, one at the center and one at the right edge. The person to be treated as the important subject is determined automatically from the size and placement of the persons in the frame (S9). If person A appears relatively large and facing forward at the center of the frame, and person B appears smaller than person A at the right corner of the frame, person A is judged to be the important subject person on the basis of both position and size. The gender and age of the important subject person thus determined are then estimated (S10). For example, gender and age can be estimated from the ratios of the distances between the eyes, nose, and mouth. Here, person A is assumed to have been judged to be an adult woman.
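One way to picture the selection in step S9 is a score that grows with face size and with closeness to the frame center. The description does not give a formula, so the `Face` type, the `importance` score, and its weighting below are entirely hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Face:
    label: str
    center_x: float  # horizontal center, normalized to 0..1
    center_y: float  # vertical center, normalized to 0..1
    area: float      # fraction of the frame area covered by the face


def importance(face: Face) -> float:
    """Hypothetical score: larger faces nearer the frame center rank higher."""
    distance = ((face.center_x - 0.5) ** 2 + (face.center_y - 0.5) ** 2) ** 0.5
    return face.area * (1.0 - distance)


def pick_important_subject(faces: List[Face]) -> Optional[Face]:
    """Return the highest-scoring face, or None when no face was detected."""
    return max(faces, key=importance) if faces else None
```

With person A large at the center and person B small at the right edge, as in the example above, this score selects person A.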

シャッタボタン２０の全押しを待ち（Ｓ１１）、全押しに応じて、動画撮影（録画）を開始する（Ｓ１２）。 The camera waits for the shutter button 20 to be fully pressed (S11), and starts moving image recording (recording) in response to the full press (S12).

When the zoom button 24 is operated during recording (S13), the sound pressure level reduction frequency bands and the reduction rate corresponding to the gender and age of the important subject person are read from the utterance frequency band data storage unit 90 (S14). Since the important subject person is an adult woman, as described above, the control data for an adult woman shown in FIG. 4 is read (reduction frequency bands = 10 to 200 Hz and 1700 to 24000 Hz, reduction rate = 40%).

According to the temperature detected by the temperature sensor 100 and the posture detected by the posture sensor 98, the temperature correction coefficients are read from the temperature correction coefficient storage unit 94 and the posture correction coefficients are read from the posture correction coefficient storage unit 92 (S15). For example, when the temperature sensor 100 measures 32 °C, a reduction frequency band correction coefficient of 1.1 and a reduction rate correction coefficient of 0.8 are read from the temperature correction coefficient storage unit 94, as shown in FIG. 6. When the current posture detected by the posture sensor 98 is the normal posture, a reduction frequency band correction coefficient of 1 and a reduction rate correction coefficient of 1 are read, as shown in FIG. 5.

The input audio is filtered with filter characteristics that follow the correction coefficients, thereby reducing noise (S16). As shown in FIG. 12, "Noise reduction recording" is displayed at the upper left of the image display unit (LCD) 6 to notify the photographer that correction of the recorded audio for zoom noise has started. Of the subject audio input via the microphone 18 and the audio processing unit 68, the zoom filter processing unit 84 processes the band from (10 × 1.1 × 1) Hz to (200 × 1.1 × 1) Hz, that is, 11 to 220 Hz, at a reduction rate of (40 × 0.8 × 1)% = 32%. It likewise processes the band from (1700 × 1.1 × 1) Hz to (24000 × 1.1 × 1) Hz, that is, 1870 to 26400 Hz, at a reduction rate of (40 × 0.8 × 1)% = 32%. The audio processing control unit 82 then records the processed audio data on the recording medium 72.
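The arithmetic in this step can be reproduced with a short sketch. The function name `correct_reduction` and the representation of each coefficient set as a `(band_coefficient, rate_coefficient)` pair are assumptions made for illustration; the input numbers are the ones given in the example (adult woman, 32 °C, normal posture).

```python
def correct_reduction(bands_hz, reduction_percent, temp_coeff, posture_coeff):
    """Scale reduction bands and reduction rate by temperature and posture coefficients.

    temp_coeff and posture_coeff are (band_coefficient, rate_coefficient) pairs.
    """
    band_scale = temp_coeff[0] * posture_coeff[0]
    rate_scale = temp_coeff[1] * posture_coeff[1]
    corrected_bands = [(low * band_scale, high * band_scale) for low, high in bands_hz]
    return corrected_bands, reduction_percent * rate_scale


# Adult woman (FIG. 4), 32 degrees C (FIG. 6), normal posture (FIG. 5).
bands, rate = correct_reduction(
    [(10.0, 200.0), (1700.0, 24000.0)], 40.0, (1.1, 0.8), (1.0, 1.0)
)
```

Up to floating-point rounding, `bands` comes out as 11 to 220 Hz and 1870 to 26400 Hz, and `rate` as 32%.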

If a woman happens to be speaking toward the camera, then, averaged over a certain period, the microphone 18 picks up sound with the female speech frequency characteristics shown in FIG. 8A together with sound having the zoom noise frequency characteristics shown in FIG. 8C. As shown in FIG. 13A, the frequency characteristics of the combined sound have a sound pressure that is one step higher between 200 and 2000 Hz. As shown in FIG. 13B, the zoom filter processing unit 84 leaves the band from 220 to 1870 Hz unprocessed and reduces the sound pressure by 32% at 220 Hz and below and at 1870 Hz and above. In other words, outside the utterance frequency band corresponding to the gender and age of the subject person, noise is reduced with the noise reduction characteristics corresponding to the noise source. As a result, as shown in FIG. 10C, the noise can be reduced at the other frequencies (220 Hz and below, 1870 Hz and above) with almost no effect on the adult woman's utterance frequencies.
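The processing amount described for FIG. 13B can be expressed as a per-frequency gain. In the sketch below, the function names are hypothetical, and "reduced by 32%" is interpreted as multiplying the magnitude by 0.68, which is an assumption; the description does not state how the reduction rate maps to a filter gain.

```python
def zoom_noise_gain(frequency_hz, passband=(220.0, 1870.0), reduction_percent=32.0):
    """Gain of 1.0 inside the protected speech band, reduced gain at and outside its edges."""
    low, high = passband
    if low < frequency_hz < high:
        return 1.0
    # Assumption: an N% reduction scales the magnitude by (1 - N/100).
    return 1.0 - reduction_percent / 100.0


def apply_gain(spectrum):
    """Apply the gain to a spectrum given as (frequency_hz, magnitude) pairs."""
    return [(freq, mag * zoom_noise_gain(freq)) for freq, mag in spectrum]
```

A component at 1000 Hz passes unchanged, while components at 100 Hz or 5000 Hz are scaled down.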

フォーカスレンズ群３６が駆動開始している場合には（Ｓ１７）、全体制御部６０は、発声周波数帯域データ記憶部９０から、重要被写体人物の性別と年齢に応じた音圧レベル低減周波数帯域と低減率を読み出す（Ｓ１８）。図４に示すように、成人女性の場合の制御データ（低減周波数帯域＝１０〜２００Ｈｚと１７００〜２４０００Ｈｚ、低減率＝４０％）を読み出すことになる。 When the focus lens group 36 has started driving (S17), the overall control unit 60 reads the sound pressure level reduction frequency band and reduction according to the gender and age of the important subject person from the utterance frequency band data storage unit 90. The rate is read (S18). As shown in FIG. 4, control data (reduced frequency bands = 10 to 200 Hz and 1700 to 24000 Hz, reduction rate = 40%) in the case of an adult woman is read out.

According to the temperature detected by the temperature sensor 100 and the posture detected by the posture sensor 98, the temperature correction coefficients are read from the temperature correction coefficient storage unit 94 and the posture correction coefficients are read from the posture correction coefficient storage unit 92 (S19). For example, when the temperature sensor 100 measures 32 °C, a reduction frequency band correction coefficient of 1.1 and a reduction rate correction coefficient of 0.8 are read from the temperature correction coefficient storage unit 94, as shown in FIG. 6. When the current posture detected by the posture sensor 98 is the normal posture, a reduction frequency band correction coefficient of 1 and a reduction rate correction coefficient of 1 are read, as shown in FIG. 5.

As shown in FIG. 12, "Noise reduction recording" is displayed at the upper left of the image display unit (LCD) 6 to notify the photographer that correction of the recorded audio for zoom noise has started (S20). Of the subject audio input via the microphone 18 and the audio processing unit 68, the focus filter processing unit 86 processes the band from (10 × 1.1 × 1) Hz to (200 × 1.1 × 1) Hz at a reduction rate of (40 × 0.8 × 1)%. It likewise processes the band from (1700 × 1.1 × 1) Hz to (24000 × 1.1 × 1) Hz at a reduction rate of (40 × 0.8 × 1)%. The audio processing control unit 82 then records the processed audio data on the recording medium 72.

When the ND filter has started to advance or retract (S21), the sound pressure level reduction frequency bands and the reduction rate corresponding to the gender and age of the important subject person are read from the utterance frequency band data storage unit 90 (S22). Since the important subject person is an adult woman, as described above, the control data for an adult woman shown in FIG. 4 is read (reduction frequency bands = 10 to 200 Hz and 1700 to 24000 Hz, reduction rate = 40%).

According to the temperature detected by the temperature sensor 100 and the posture detected by the posture sensor 98, the temperature correction coefficients are read from the temperature correction coefficient storage unit 94 and the posture correction coefficients are read from the posture correction coefficient storage unit 92 (S23). For example, when the temperature sensor 100 measures 32 °C, a reduction frequency band correction coefficient of 1.1 and a reduction rate correction coefficient of 0.8 are read from the temperature correction coefficient storage unit 94, as shown in FIG. 6. When the current posture detected by the posture sensor 98 is the normal posture, a reduction frequency band correction coefficient of 1 and a reduction rate correction coefficient of 1 are read, as shown in FIG. 5.

The input audio is filtered with filter characteristics that follow the correction coefficients, thereby reducing noise (S24). As shown in FIG. 12, "Noise reduction recording" is displayed at the upper left of the image display unit 22 to notify the photographer that correction of the recorded audio for the noise accompanying advance/retreat of the ND filter has started. Of the subject audio input via the microphone 18 and the audio processing unit 68, the ND filter processing unit 88 processes the band from (10 × 1.1 × 1) Hz to (200 × 1.1 × 1) Hz at a reduction rate of (40 × 0.8 × 1)%. It likewise processes the band from (1700 × 1.1 × 1) Hz to (24000 × 1.1 × 1) Hz at a reduction rate of (40 × 0.8 × 1)%. The audio processing control unit 82 then records the processed audio data on the recording medium 72.

シャッタボタン２０が再度、全押しされるまで（Ｓ２５）、以上のステップＳ１３〜Ｓ２４を繰り返し実行する。シャッタボタン２０が再度、全押しされると（Ｓ２５）、動画撮影を終了する（Ｓ２６）。 Until the shutter button 20 is fully pressed again (S25), the above steps S13 to S24 are repeated. When the shutter button 20 is fully pressed again (S25), moving image shooting is terminated (S26).

In this embodiment, when a person is in the frame, the noise generated by the imaging apparatus (zoom sound, focus sound, and ND advance/retreat sound) can be reduced during recording without adversely affecting the speech that the person is likely to produce.

ズーム音、フォーカス音及びＮＤ進退音の３つの騒音を低減したが、必要に応じて一部の騒音のみの低減（例えば、ズーム音のみの低減）に限定してもよいことは明らかである。 Although the three noises of the zoom sound, the focus sound, and the ND advance / retreat sound have been reduced, it is obvious that the noise may be limited to only a part of the noise (for example, reduction of only the zoom sound) if necessary.

A second embodiment of the present invention will now be described. In the second embodiment, noise reduction is enabled only while the important subject person is speaking. FIG. 14 shows a control flowchart in which FIG. 10 is modified accordingly. A conditional branch step S15-2 is inserted between steps S15 and S16, and a conditional branch step S19-2 is inserted between steps S19 and S20.

すなわち、被写体認識の１つとして、重要被写体の口が動いているか動いていないかを識別する。図１５（ａ），（ｂ）に破線で示すように、重要被写体人物の口の動きをモニタし、発声をしているか否かを判別する。図１５（ａ）は口を閉じている状態を示し、図１５（ｂ）は口を開いている状態を示す。撮影画像上、重要被写体人物の口が動いていて、且つマイク１８に音声が入力されている場合、被写体認識部７８は、重要被写体人物が発声していると判断する。重要被写体人物が発声している場合に（Ｓ１５−２，Ｓ１９−２）、騒音低減処理を実行する（Ｓ１６，Ｓ２０）。 That is, as one of subject recognition, it is identified whether the mouth of an important subject is moving or not. As shown by broken lines in FIGS. 15A and 15B, the movement of the mouth of the important subject person is monitored to determine whether or not he / she is speaking. FIG. 15A shows a state where the mouth is closed, and FIG. 15B shows a state where the mouth is opened. If the mouth of the important subject person is moving on the photographed image and the sound is input to the microphone 18, the subject recognition unit 78 determines that the important subject person is speaking. When an important subject person is speaking (S15-2, S19-2), noise reduction processing is executed (S16, S20).
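The speaking decision used at steps S15-2 and S19-2 combines two conditions: the mouth is moving in the image, and audio is arriving at the microphone. A minimal sketch follows; the function name, the representation of mouth state as a sequence of normalized opening measurements, and both thresholds are hypothetical.

```python
def is_speaking(mouth_openings, audio_level, motion_threshold=0.05, audio_threshold=0.01):
    """Return True when the mouth is moving and audio is being input.

    mouth_openings: recent mouth-opening measurements (normalized, hypothetical units).
    audio_level: current microphone input level (normalized, hypothetical units).
    """
    if len(mouth_openings) < 2:
        return False  # movement cannot be judged from a single frame
    mouth_moving = max(mouth_openings) - min(mouth_openings) > motion_threshold
    audio_present = audio_level > audio_threshold
    return mouth_moving and audio_present
```

Requiring both conditions avoids treating background sound as the subject's speech when the mouth is closed, and avoids treating silent mouth movement as speech.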

なお、ＮＤフィルタ進退時の騒音は瞬間的な騒音であるので、この実施例では、発声の有無による騒音低減処理の有効化制御は行っていない。 Since the noise during the ND filter advance / retreat is instantaneous noise, in this embodiment, the activation control of the noise reduction processing based on the presence or absence of utterance is not performed.

In this embodiment, even when a person is at the center of the frame, drive noise is reduced effectively while the person is not speaking or conversing, and noise is reduced without impairing the voice while the person is speaking or conversing. Because the noise generated by the imaging apparatus is reduced only while the subject person is speaking, noise-reduced recording that sounds even more natural than in the first embodiment can be achieved.

A third embodiment of the present invention will now be described. In this embodiment, the basic noise waveform storage unit 96 stores representative waveforms of the noise generated by the imaging apparatus. During recording, the representative noise waveform is corrected according to temperature and posture and then subtracted from the audio input to the microphone 18 (subject audio plus noise).

図１６は、基本騒音波形記憶部９６に記憶される基本騒音波形に対する温度補正係数表を示す。図１６に示す温度補正係数が、温度補正係数記憶部９４に記憶されている。図示例では、温度範囲が３段階に分類され、各温度範囲に対して、時間軸を係数倍する時間軸補正係数と、振幅を係数倍する振幅補正係数が格納されている。例えば、温度センサ１００により、撮像装置の周囲温度が８℃と測定された場合、温度範囲−１０〜１０℃の欄から、時間軸方向の補正係数として１．１が読み出される。これは、１秒間の騒音が１．１秒かかって発生すると予測して、基本騒音波形の騒音を補正することに相当する。また、振幅は１．２倍に補正される。 FIG. 16 shows a temperature correction coefficient table for the basic noise waveform stored in the basic noise waveform storage unit 96. The temperature correction coefficient shown in FIG. 16 is stored in the temperature correction coefficient storage unit 94. In the illustrated example, the temperature range is classified into three stages, and for each temperature range, a time axis correction coefficient for multiplying the time axis by a factor and an amplitude correction coefficient for multiplying the amplitude by a factor are stored. For example, when the ambient temperature of the imaging apparatus is measured by the temperature sensor 100 as 8 ° C., 1.1 is read out as the correction coefficient in the time axis direction from the column of the temperature range −10 to 10 ° C. This is equivalent to correcting the noise of the basic noise waveform by predicting that the noise for one second takes 1.1 seconds. The amplitude is corrected to 1.2 times.

図１７は、基本騒音波形記憶部９６に記憶される基本騒音波形に対する姿勢補正係数表を示す。図１７に示す姿勢補正係数が、姿勢補正係数記憶部９２に記憶されている。図１７に示す例では、カメラの姿勢が３段階に分類され、各姿勢に対して、時間軸を係数倍する時間軸補正係数と、振幅を係数倍する振幅補正係数が格納されている。例えば、姿勢センサ９８により撮像装置の姿勢が上向きと測定された場合、図１７に示す姿勢補正係数表の「撮像装置上向き」の欄から、時間軸方向の姿勢補正係数として１．２が読み出される。これは、１秒間の騒音が１．２秒かかって発生すると予測して、基本騒音波形の騒音を補正することに相当する。また、振幅は１．１倍に補正される。 FIG. 17 shows a posture correction coefficient table for the basic noise waveform stored in the basic noise waveform storage unit 96. The posture correction coefficient illustrated in FIG. 17 is stored in the posture correction coefficient storage unit 92. In the example shown in FIG. 17, the posture of the camera is classified into three stages, and for each posture, a time axis correction coefficient that multiplies the time axis by a coefficient and an amplitude correction coefficient that multiplies the amplitude by a coefficient are stored. For example, when the orientation sensor 98 measures the orientation of the imaging device as upward, 1.2 is read out as the orientation correction factor in the time axis direction from the “imaging device upward” column of the orientation correction coefficient table shown in FIG. . This is equivalent to correcting the noise of the basic noise waveform by predicting that the noise for one second takes 1.2 seconds. The amplitude is corrected to 1.1 times.

FIG. 18 is a flowchart of the portion of the noise processing that is changed in the third embodiment; it corresponds to FIG. 9.

When the zoom button 24 is operated during recording (S113), the overall control unit 60 reads the zoom basic noise waveform data (FIG. 7(a)) from the basic noise waveform storage unit 96 (S114). It then reads, from the temperature correction coefficient storage unit 94, the time-axis and amplitude correction coefficients for the temperature range that contains the temperature detected by the temperature sensor 100. It also reads, from the posture correction coefficient storage unit 92, the time-axis and amplitude correction coefficients for the posture detected by the posture sensor 98 (S115).

The audio processing control unit 82 multiplies the zoom basic noise waveform by the temperature and posture time-axis correction coefficients and amplitude correction coefficients, and subtracts the corrected waveform from the sound picked up by the microphone 18, thereby reducing the noise (S116). The audio processing control unit 82 records the noise-reduced audio data on the recording medium 72.
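A minimal Python sketch of this correct-then-subtract step is given below. The text does not say how the time axis is rescaled or how the predicted noise is aligned with the microphone signal, so the linear-interpolation resampling, the assumption that the noise starts at sample 0, and the names `stretch` and `subtract_predicted_noise` are all illustrative choices rather than the method of the embodiment.

```python
import math
from typing import List, Sequence


def stretch(waveform: Sequence[float], time_scale: float) -> List[float]:
    """Resample so the waveform lasts time_scale times as long.

    Uses linear interpolation (an assumed method); time_scale must be > 0.
    """
    if not waveform:
        return []
    n_out = max(1, round(len(waveform) * time_scale))
    last = len(waveform) - 1
    out = []
    for i in range(n_out):
        pos = min(i / time_scale, float(last))  # position in the source
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(waveform[lo] * (1.0 - frac) + waveform[hi] * frac)
    return out


def subtract_predicted_noise(
    mic: Sequence[float],
    base_noise: Sequence[float],
    time_scales: Sequence[float],
    amplitude_scales: Sequence[float],
) -> List[float]:
    """Scale the basic noise waveform and subtract it from the mic signal.

    The temperature and posture coefficients are multiplied together, as the
    text describes. Aligning the noise onset with mic[0] is an assumption.
    """
    time_scale = math.prod(time_scales)
    amplitude = math.prod(amplitude_scales)
    predicted = [amplitude * s for s in stretch(base_noise, time_scale)]
    cleaned = list(mic)
    for i in range(min(len(cleaned), len(predicted))):
        cleaned[i] -= predicted[i]
    return cleaned
```

For the example values in the tables (8 °C, upward posture), the call would pass `time_scales=[1.1, 1.2]` and `amplitude_scales=[1.2, 1.1]`, so a 100-sample basic waveform is stretched to 132 samples before subtraction.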

When driving of the focus lens group has started (S117), the overall control unit 60 reads the focus basic noise waveform data (FIG. 7(b)) from the basic noise waveform storage unit 96 (S118). It then reads, from the temperature correction coefficient storage unit 94, the time-axis and amplitude correction coefficients for the temperature range that contains the temperature detected by the temperature sensor 100. It also reads, from the posture correction coefficient storage unit 92, the time-axis and amplitude correction coefficients for the posture detected by the posture sensor 98 (S119).

The audio processing control unit 82 multiplies the focus basic noise waveform by the temperature and posture time-axis correction coefficients and amplitude correction coefficients, and subtracts the corrected waveform from the sound picked up by the microphone 18, thereby reducing the noise (S120). The audio processing control unit 82 records the noise-reduced audio data on the recording medium 72.

When the ND filter has started to advance or retreat (S121), the overall control unit 60 reads the ND filter advance/retreat basic noise waveform data (FIG. 7(c)) from the basic noise waveform storage unit 96 (S122). It then reads, from the temperature correction coefficient storage unit 94, the time-axis and amplitude correction coefficients for the temperature range that contains the temperature detected by the temperature sensor 100. It also reads, from the posture correction coefficient storage unit 92, the time-axis and amplitude correction coefficients for the posture detected by the posture sensor 98 (S123).

The audio processing control unit 82 multiplies the ND filter advance/retreat basic noise waveform by the temperature and posture time-axis correction coefficients and amplitude correction coefficients, and subtracts the corrected waveform from the sound picked up by the microphone 18, thereby reducing the noise (S124). The audio processing control unit 82 records the noise-reduced audio data on the recording medium 72.

In this embodiment as well, while the input sound is being recorded with the noise reduction processing active, "Noise reduction recording" is displayed at the upper left of the image display unit (LCD) 6, as shown in FIG. 12.

Steps S113 to S124 are repeated until the shutter button 20 is fully pressed again (S125). When the shutter button 20 is fully pressed again (S125), moving image shooting ends (S126).

In this embodiment, the basic waveform of each noise is corrected according to the usage environment (posture and temperature) before being subtracted from the input sound, so ambient sound can be recorded with the noise generated by the imaging apparatus reduced.

A fourth embodiment of the present invention will now be described. In this embodiment, the directivity of the microphone 18 is changed according to the arrangement of persons in the frame. This reduces the noise emitted by the imaging apparatus as far as possible.

FIG. 19(a) shows an example of the directivity of the microphone 18 when no person is in the shooting field of view, or when the person being shot has their mouth closed. FIG. 19(b) shows an example of the directivity of the microphone 18 while the person being shot is speaking. In each polar plot, the top is the direction straight ahead toward the subject, the right side corresponds to the right side of the shooting frame, and the left side corresponds to the left side of the shooting frame. The radial direction represents the sensitivity of the microphone 18. With the frontal sensitivity normalized to 0 dB, the outermost circle indicates 0 dB, the next circle inward indicates −10 dB, and the second circle inward indicates −20 dB.

In FIG. 19(a), the sensitivity curve drawn as a solid line is left-right symmetric, and the sensitivity drops sharply at 70 degrees to the left and right. With this wide directivity of 70 degrees to either side, sound within and around the shooting frame can be picked up evenly.

In FIG. 19(b), the sensitivity curve drawn as a solid line is left-right symmetric, and the sensitivity drops sharply at 45 degrees to the left and right. With this narrow directivity of 45 degrees to either side, sound from the left and right sides of the microphone 18 and from behind it is not picked up. Therefore, even if a noise source of the imaging apparatus itself is located in those positions, its noise is not picked up by the microphone 18.

FIGS. 20 and 21 show a flowchart of the moving image recording operation of this embodiment, including the noise reduction processing.

When the overall control unit 60 detects that the shutter button 20 has been half-pressed (S201), it performs distance measurement and photometry (S202). That is, it measures the distance to the subject by contrast-detection AF (autofocus) and determines the shutter speed that gives proper exposure from the amount of light incident on the image sensor 40. In accordance with the distance measurement result, the overall control unit 60 instructs the focus lens group drive circuit 50 to move the focus lens group 36 to the position at which the subject is in focus (S203). In accordance with the photometry result, if the subject is too bright, the overall control unit 60 instructs the ND filter drive circuit 54 to insert the ND filter into the optical path to reduce the light (S204).

The overall control unit 60 waits for the shutter button 20 to be fully pressed (S205) and, in response to the full press, starts moving image shooting (recording) (S206).

When the zoom button 24 is operated during recording (S207), the subject recognition unit 78 determines whether a person is present in the shooting frame (S208). If no person is present (S208), the overall control unit 60 continues recording with the recording level of the 5 to 1000 Hz band lowered by 15 dB, and returns to step S207.
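To give a sense of scale for the 15 dB reduction, the following Python sketch converts a level change in decibels into a linear amplitude factor. It assumes the figure refers to amplitude (sound pressure) level, so that the factor is 10^(dB/20); the text does not state this. Isolating the 5 to 1000 Hz band is not shown, and the names `db_to_amplitude_gain` and `attenuate` are hypothetical.

```python
from typing import List, Sequence


def db_to_amplitude_gain(level_db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor.

    Assumes an amplitude (sound pressure) ratio, i.e. 20 * log10(gain).
    """
    return 10.0 ** (level_db / 20.0)


def attenuate(samples: Sequence[float], attenuation_db: float) -> List[float]:
    """Lower the level of already band-limited samples by attenuation_db."""
    gain = db_to_amplitude_gain(-abs(attenuation_db))
    return [gain * s for s in samples]
```

Under that assumption, lowering the band by 15 dB scales its amplitude to roughly 0.18 of the original.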

If a person is present in the shooting frame (S208), the number of persons in the frame is detected (S210), and it is detected whether each detected person is at the center of the frame or where in the periphery they are located (S210). For example, suppose here that two persons are detected, one at the center and one at the right edge.

The person who is the important subject is determined automatically from the size and arrangement of the persons in the frame (S212). If person A appears fairly large and facing forward at the center of the frame, and person B appears smaller than person A in the right corner of the frame, person A is judged to be the important subject person on the basis of both position and size. The gender and age of the determined important subject person are then judged (S213). For example, gender and age can be judged from the ratios of the distances between the eyes, nose, and mouth; here, person A is assumed to be judged an adult woman.
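The selection in S212 could be sketched in Python as below. The text states only that size and position in the frame are both used; the scoring formula, the normalized coordinates, and the names `Face` and `pick_important_subject` are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Face:
    label: str
    center_x: float  # 0.0 (left edge) to 1.0 (right edge) of the frame
    center_y: float  # 0.0 (top edge) to 1.0 (bottom edge) of the frame
    area: float      # fraction of the frame area covered by the face


def pick_important_subject(faces: Sequence[Face]) -> Optional[Face]:
    """Pick the face that is largest and closest to the frame center.

    The score (area weighted by centrality) is a hypothetical heuristic.
    """
    if not faces:
        return None

    max_dist = math.hypot(0.5, 0.5)  # distance from center to a corner

    def score(face: Face) -> float:
        dist = math.hypot(face.center_x - 0.5, face.center_y - 0.5)
        centrality = 1.0 - dist / max_dist  # 1 at the center, 0 at a corner
        return face.area * (1.0 + centrality)

    return max(faces, key=score)
```

For the scene in the text, a large face A at the center and a smaller face B near the right corner, this heuristic returns A.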

It is then determined whether the important subject person is speaking (S214). When the mouth of the important subject person is moving in the captured image and sound is being input to the microphone 18, the subject recognition unit 78 judges that the important subject person is speaking.

If the important subject person is speaking (S214), the overall control unit 60 switches the directivity of the microphone 18 from the wide-directivity (or omnidirectional) state shown in FIG. 19(a) to the narrow-directivity state shown in FIG. 19(b) (S215). Then, as shown in FIG. 15(b), "Noise reduction recording" is displayed at the upper left of the image display unit (LCD) 6.
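The speaking decision of S214 and the directivity switch of S215 can be sketched in Python as follows. The 70 and 45 degree half-angles come from FIG. 19 as described in the text; the mouth-openness measure, both thresholds, and the function names are hypothetical.

```python
from typing import Sequence

WIDE_HALF_ANGLE_DEG = 70    # FIG. 19(a): no person, or mouth closed
NARROW_HALF_ANGLE_DEG = 45  # FIG. 19(b): important subject speaking


def mouth_is_moving(openness: Sequence[float], min_change: float = 0.05) -> bool:
    """True if mouth openness changes by min_change between any two frames.

    The openness measure and the threshold are assumed for illustration.
    """
    return any(abs(b - a) >= min_change for a, b in zip(openness, openness[1:]))


def is_speaking(
    openness: Sequence[float],
    audio_level: float,
    audio_threshold: float = 0.01,
    min_change: float = 0.05,
) -> bool:
    """Speaking requires both a moving mouth and sound at the microphone."""
    return mouth_is_moving(openness, min_change) and audio_level >= audio_threshold


def select_half_angle_deg(speaking: bool) -> int:
    """Choose the microphone directivity half-angle for the current state."""
    return NARROW_HALF_ANGLE_DEG if speaking else WIDE_HALF_ANGLE_DEG
```

Requiring both conditions means that a moving mouth with no sound input, or sound with a still mouth, leaves the microphone in its wide setting.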

The overall control unit 60 reads, from the utterance frequency band data storage unit 90, the sound pressure level reduction frequency bands and the reduction rate that correspond to the gender and age of the important subject person (S216). Here, because the important subject person is an adult woman, the control data for an adult woman shown in FIG. 4 is read (reduction frequency bands = 10 to 200 Hz and 1700 to 24000 Hz; reduction rate = 40%).

In accordance with the temperature detected by the temperature sensor 100 and the posture detected by the posture sensor 98, a temperature correction coefficient is read from the temperature correction coefficient storage unit 94 and a posture correction coefficient is read from the posture correction coefficient storage unit 92 (S217). For example, when the temperature sensor 100 measures 32 °C, a reduction frequency band correction coefficient of 1.1 and a reduction rate correction coefficient of 0.8 are read from the temperature correction coefficient storage unit 94, as shown in FIG. 6. When the current posture detected by the posture sensor 98 is the normal posture, a reduction frequency band correction coefficient of 1 and a reduction rate correction coefficient of 1 are read, as shown in FIG. 5.

The input sound is filtered with filter characteristics that reflect the correction coefficients, thereby reducing the noise (S218). As shown in FIG. 12, "Noise reduction recording" is displayed at the upper left of the image display unit 22 to notify the photographer that correction of the recorded sound for zoom noise has started. Of the subject's sound input via the microphone 18 and the audio processing unit 68, the zoom filter processing unit 84 processes the sound in the (10 × 1.1 × 1) to (200 × 1.1 × 1) Hz band at a reduction rate of (40 × 0.8 × 1)%. It likewise processes the sound in the (1700 × 1.1 × 1) to (24000 × 1.1 × 1) Hz band at a reduction rate of (40 × 0.8 × 1)%. The audio data processed in this way is recorded on the recording medium 72.
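The arithmetic in this step can be checked with a short Python sketch. The band edges, the reduction rate, and the coefficients are the values quoted in the text; the function name `corrected_filter` and its signature are hypothetical, and the filtering itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Band = Tuple[float, float]  # (low_hz, high_hz)


def corrected_filter(
    bands_hz: Sequence[Band],
    reduction_percent: float,
    band_coeffs: Sequence[float],
    rate_coeffs: Sequence[float],
) -> Tuple[List[Band], float]:
    """Apply the temperature and posture coefficients to the stored data.

    Band edges are multiplied by every band coefficient, and the reduction
    rate by every rate coefficient, as in the worked example in the text.
    """
    band_k = math.prod(band_coeffs)
    rate_k = math.prod(rate_coeffs)
    bands = [(low * band_k, high * band_k) for low, high in bands_hz]
    return bands, reduction_percent * rate_k


# Adult woman at 32 degC in the normal posture (values from the text).
bands, rate = corrected_filter(
    bands_hz=[(10, 200), (1700, 24000)],
    reduction_percent=40,
    band_coeffs=[1.1, 1],
    rate_coeffs=[0.8, 1],
)
```

Up to floating-point rounding, this gives bands of 11 to 220 Hz and 1870 to 26400 Hz with a reduction rate of 32%.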

When the zoom button 24 is released (S217), or when the important subject person is not speaking (S214), the overall control unit 60 ends the zoom noise reduction processing (S219).

Steps S207 to S219 are repeated until the shutter button 20 is fully pressed again (S220). When the shutter button 20 is fully pressed again (S220), moving image shooting ends (S221).

In this embodiment, the directivity of the microphone and the filter characteristics for noise reduction are controlled according to whether a person is present as a subject and whether that person is speaking. This reduces the influence of the zoom noise emitted by the imaging apparatus while limiting the effect on the person's voice.

Although embodiments applied to a video camera have been described, the present invention is also applicable to reducing noise in audio reproduced from an already recorded moving image. That is, a person is detected from the reproduced image and, according to the detection result, noise is reduced in the reproduced audio through the same processing as in the embodiments described above. In this case, the temperature and posture at the time of shooting are multiplexed with the captured image and audio when recorded, and are separated and output during playback. The means for separating the image, audio, posture data, and temperature data corresponds to the image input means, audio input means, posture information input means, and temperature information input means.

FIG. 1 is a front view of one embodiment of the present invention.
FIG. 2 is a rear view of the embodiment.
FIG. 3 is a schematic block diagram of the embodiment.
FIG. 4 shows an example of the data stored in the utterance frequency band data storage unit 90.
FIG. 5 shows an example of the posture correction coefficients stored in the posture correction coefficient storage unit 92.
FIG. 6 shows an example of the temperature correction coefficients stored in the temperature correction coefficient storage unit 94.
FIG. 7 shows examples of the basic noise waveforms stored in the basic noise waveform storage unit 96.
FIG. 8 shows examples of the frequency characteristics of male and female voices, zoom noise, and ND filter advance/retreat noise.
FIG. 9 shows part of the flowchart of the noise reduction processing of the embodiment.
FIG. 10 shows part of the flowchart of the noise reduction processing of the embodiment.
FIG. 11 is a detailed flowchart of step S6.
FIG. 12 shows a display example during noise reduction processing.
FIG. 13 shows the frequency characteristics of the input sound, the filter processing amount, and the processed sound.
FIG. 14 is a flowchart of the portion of the noise reduction processing that is changed in the second embodiment.
FIG. 15 is a diagram explaining the operation of the flow shown in FIG. 14.
FIG. 16 shows the temperature correction coefficient table for the basic noise waveforms stored in the basic noise waveform storage unit 96.
FIG. 17 shows the posture correction coefficient table for the basic noise waveforms stored in the basic noise waveform storage unit 96.
FIG. 18 is a flowchart of the portion of the noise processing that is changed in the third embodiment.
FIG. 19 shows examples of the change in microphone directivity in the fourth embodiment.
FIG. 20 shows part of the flowchart of the moving image recording operation of the fourth embodiment, including the noise reduction processing.
FIG. 21 shows part of the flowchart of the moving image recording operation of the fourth embodiment, including the noise reduction processing.

Explanation of symbols

10: Imaging apparatus
12: Camera body
14: Lens barrel
16: Flash
18: Microphone (sound input means)
20: Shutter button
22: Image display unit
24: Zoom button
24a: Wide button
24b: Tele button
26: Power button
28: Mode setting button
30: Speaker (sound output means)
32: Zoom lens group
34: Shutter mechanism
36: Focus lens group
38: ND filter advance/retreat mechanism
40: Image sensor
42: Zoom lens group drive circuit
44: Drive mechanism
46: Shutter drive circuit
48: Drive mechanism
50: Focus lens group drive circuit
52: Drive mechanism
54: ND filter drive circuit
56: Drive mechanism
60: Overall control unit
62: Image processing unit
64: Image memory
66: VRAM
68: Audio processing unit
70: Input operation unit
72: Recording medium
74: Imaging control unit
76: Encoding/decoding processing unit
77: Drive control unit
78: Subject recognition unit
80: Automatic person selection unit
82: Audio processing control unit
84: Zoom filter processing unit
86: Focus filter processing unit
88: ND filter processing unit
90: Utterance frequency band data storage unit
92: Posture correction coefficient storage unit
94: Temperature correction coefficient storage unit
96: Basic noise waveform storage unit
98: Posture sensor
100: Temperature sensor

Claims

An imaging apparatus comprising:
imaging means;
audio input means;
a noise source including any of zoom lens group driving means, focus lens group driving means, ND filter driving means, and shutter driving means;
subject recognition means for recognizing a subject from an image input by the imaging means;
noise reduction means for reducing noise from the noise source in the sound input by the audio input means; and
control means for controlling the noise reduction means in accordance with the recognition by the subject recognition means,
wherein the subject recognition means is means for recognizing a person in the image input by the imaging means, and
the control means controls the noise reduction means so as to reduce the band of the noise outside the person's utterance frequency band while the subject recognition means recognizes the person.

An imaging apparatus comprising:
imaging means;
audio input means;
a noise source including any of zoom lens group driving means, focus lens group driving means, ND filter driving means, and shutter driving means;
subject recognition means for recognizing a subject from an image input by the imaging means;
noise reduction means for reducing noise from the noise source in the sound input by the audio input means; and
control means for controlling the noise reduction means in accordance with the recognition by the subject recognition means,
wherein the subject recognition means is means for recognizing a person in the image input by the imaging means and the movement of the person's mouth, and
the control means enables the noise reduction means while the subject recognition means recognizes a person whose mouth is moving.

An imaging apparatus comprising:
imaging means;
audio input means;
a noise source including any of zoom lens group driving means, focus lens group driving means, ND filter driving means, and shutter driving means;
subject recognition means for recognizing a subject from an image input by the imaging means;
noise reduction means for reducing noise from the noise source in the sound input by the audio input means;
control means for controlling the noise reduction means in accordance with the recognition by the subject recognition means; and
utterance frequency band data storage means for storing representative utterance frequency band values for each gender and age,
wherein the subject recognition means is means for recognizing a person in the image input by the imaging means and the gender and age of the person, and
the control means controls the noise reduction means so as to reduce the noise outside the utterance frequency band corresponding to the gender and age of the person while the subject recognition means recognizes the person.

The imaging apparatus according to any one of claims 1 to 3, further comprising a temperature sensor,
wherein the control means corrects the noise reduction characteristics of the noise reduction means in accordance with temperature information from the temperature sensor.

The imaging apparatus according to any one of claims 1 to 4, further comprising a posture sensor,
wherein the control means corrects the noise reduction characteristics of the noise reduction means in accordance with posture information from the posture sensor.

An imaging apparatus comprising:
imaging means;
audio acquisition means; and
audio processing means for reducing, in the sound obtained by the audio acquisition means, noise caused by driving of driving means associated with the imaging means,
wherein, for the sound obtained by the audio acquisition means while a person is detected in the image being captured by the imaging means, the audio processing means reduces the noise caused by driving of the driving means in a frequency band other than the person's utterance frequency band.

The imaging apparatus according to claim 6, wherein, for the sound obtained by the audio acquisition means while movement of the mouth of a person detected in the image being captured by the imaging means is detected, the audio processing means reduces the noise caused by driving of the driving means in a frequency band other than the person's utterance frequency band.

An imaging apparatus comprising:
imaging means;
audio acquisition means; and
audio processing means for reducing, in the sound obtained by the audio acquisition means, noise caused by driving of driving means associated with the imaging means,
wherein, for the sound obtained by the audio acquisition means while a person is detected in the image being captured by the imaging means, the audio processing means reduces the noise caused by driving of the driving means in a frequency band corresponding to the gender of the detected person.

The imaging apparatus according to claim 8, wherein, when a plurality of persons are detected in the image being captured by the imaging means, the audio processing means reduces the noise caused by driving of the driving means in a frequency band corresponding to the gender of a person determined according to the positions of the detected persons.

The imaging apparatus according to claim 8, wherein, when a plurality of persons are detected in the image being captured by the imaging means, the audio processing means reduces the noise caused by driving of the driving means in a frequency band corresponding to the gender of a person determined according to the sizes of the detected persons in the captured image.

An imaging apparatus comprising:
imaging means;
audio acquisition means; and
audio processing means for reducing, in the sound obtained by the audio acquisition means, noise caused by driving of driving means associated with the imaging means,
wherein, for the sound obtained by the audio acquisition means while a person is detected in the image being captured by the imaging means, the audio processing means reduces the noise caused by driving of the driving means in a frequency band corresponding to the age of the detected person.

The imaging apparatus according to claim 11, wherein, when a plurality of persons are detected in the image being captured by the imaging means, the audio processing means reduces the noise caused by driving of the driving means in a frequency band corresponding to the age of a person determined according to the positions of the detected persons.

The imaging apparatus according to claim 11, wherein, when a plurality of persons are detected in the image being captured by the imaging means, the audio processing means reduces the noise caused by driving of the driving means in a frequency band corresponding to the age of a person determined according to the sizes of the detected persons in the captured image.

The imaging apparatus according to any one of claims 6 to 13, wherein the audio processing means reduces sound at a frequency corresponding to the type of the driving means that is being driven among the driving means.

The imaging apparatus according to any one of claims 6 to 14, wherein the driving means includes a zoom lens.

The imaging apparatus according to claim 15, wherein the audio processing means reduces sound at a frequency corresponding to driving of the zoom lens.

The imaging apparatus according to any one of claims 6 to 14, wherein the driving means includes a focus lens.

The imaging apparatus according to claim 17, wherein the audio processing means reduces sound at a frequency corresponding to driving of the focus lens.

The imaging apparatus according to any one of claims 6 to 14, wherein the driving means includes an ND filter.

The imaging apparatus according to claim 19, wherein the audio processing means reduces sound at a frequency corresponding to driving of the ND filter.

The imaging apparatus according to any one of claims 6 to 20, further comprising notification means for notifying that the audio processing means is reducing the noise caused by driving of the driving means.