JP2021002770A

JP2021002770A - Imaging apparatus and control method therefor, program

Info

Publication number: JP2021002770A
Application number: JP2019115745A
Authority: JP
Inventors: 真宏会見; Masahiro Aimi; 信行堀江; Nobuyuki Horie; 文裕梶村; Fumihiro Kajimura; 峻川田; Shun Kawada; 太郎松野; Taro Matsuno
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-06-21
Filing date: 2019-06-21
Publication date: 2021-01-07
Anticipated expiration: 2039-06-21
Also published as: JP7365793B2

Abstract

To make it easy to tell that voice recognition is in progress. SOLUTION: An imaging apparatus has imaging means, voice input means, control means that performs an imaging process in response to an imaging start instruction from a user, and setting means that changes a setting used for imaging according to the user's voice input through the voice input means. The control means performs control so that no imaging start instruction from the user is accepted while voice recognition processing is being performed on the voice input through the voice input means. SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for controlling an imaging apparatus by a user's voice.

Patent Document 1 describes a camera that is equipped with a voice recognition function and is controlled by the user's voice. This allows the user to operate the camera hands-free without performing complicated operations.

Patent Document 1: JP-A-2000-231142

When a camera is equipped with a voice recognition function as in Patent Document 1, recognizing and understanding the voice may take time, because the input voice data is compared against voice information in a huge voice database. Recently, there is also a method, typified by AI speakers, of transmitting voice data over a network and analyzing it in the cloud. Because the cloud, as an external device, can host a voice recognition system with high recognition accuracy even for complex voice commands, it can recognize speech accurately and carry out the operation the user intends. When a camera has its voice data analyzed in this way, however, sending and receiving the data may also take time depending on the state of network communication. Furthermore, if the user gives an instruction to start shooting while voice recognition is still in progress in the cloud, the shooting process is performed with the settings before the change, and the user may not obtain the intended shot.

The present invention has been made in view of the above problem, and its object is to realize a technique that makes it easy to tell that voice recognition is in progress and enables shooting as the user intends.

In order to solve the above problem and achieve the object, an imaging apparatus of the present invention includes imaging means; voice input means; control means that performs a shooting process in response to a user's instruction to start shooting; and setting means that changes a setting used for shooting according to the user's voice input through the voice input means. The control means performs control so as not to accept the user's instruction to start shooting while voice recognition processing is being performed on the voice input through the voice input means.

According to the present invention, it is easy to tell that voice recognition is in progress, and shooting as the user intends becomes possible.

FIG. 1 is a block diagram showing the apparatus configuration of Embodiment 1.
FIG. 2 is a flowchart showing the processing of the imaging apparatus of Embodiment 1.
FIG. 3 is a flowchart showing the processing of the voice recognition server of Embodiment 1.
FIG. 4 is a flowchart showing the processing of the imaging apparatus of Embodiment 2.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although multiple features are described in the embodiments, not all of them are necessarily essential to the invention, and the features may be combined arbitrarily. In the accompanying drawings, the same or similar components are given the same reference numerals, and duplicate description is omitted.

[Embodiment 1] The first embodiment will now be described.

The following describes an example in which the imaging apparatus of the present embodiment is applied to a digital camera capable of communicating with an external device via a network. The imaging apparatus of the present embodiment is not limited to a digital camera; it is also applicable to other devices having a camera function, such as a mobile phone, a smartphone (a type of mobile phone), a tablet, a personal computer (PC), or a PDA (Personal Digital Assistant).

<Device Configuration> First, the configuration and functions of the imaging apparatus 1 of the present embodiment will be described with reference to FIG. 1.

The imaging apparatus 1 of the present embodiment is, for example, a single-lens reflex digital camera that has a lens unit 100 and a camera body 200, with the lens unit 100 detachably attached to the camera body 200, or a compact digital camera in which the lens unit 100 and the camera body 200 are integrated. The camera body 200 can connect to a server device 300 on a network by wireless or wired communication. The server device 300 is, for example, a voice recognition server having a voice recognition function.

The lens unit 100 constitutes the photographing optical system of the imaging apparatus 1. The lens unit 100 includes an aperture 11, a lens group 12 such as an image stabilization lens, a lens group 13 such as a focus lens and a zoom lens, and so on, and guides an optical image of a subject to the camera body 200.

The camera body 200 includes an image sensor 21 that photoelectrically converts the optical image formed by the lens unit 100 to generate an image signal, and a mechanical shutter 22 that adjusts the exposure time of the image sensor 21. Based on the values of a plurality of setting items (shooting settings), the camera body 200 controls the aperture 11 and the lens groups 12 and 13 of the lens unit 100, and also controls the drive timing of the image sensor 21 and the shutter speed of the mechanical shutter 22, so as to capture an image with appropriate exposure.

The camera body 200 includes a rear display unit 23 that can display images captured by the image sensor 21 and various setting values used for shooting. The rear display unit 23 is a display device such as a liquid crystal panel or an organic EL display, and is provided on the rear surface of the camera body 200, opposite the lens unit 100.

If the image sensor 21 has an electronic shutter function that can adjust the exposure time by controlling its signal accumulation time and signal readout time, the mechanical shutter 22 is unnecessary. When both the mechanical shutter 22 and the electronic shutter function are provided and the exposure time is adjusted by the electronic shutter, the mechanical shutter 22 is kept fully open.

The camera body 200 includes an electric circuit 20. The electric circuit 20 includes an arithmetic processing circuit 20a, a memory circuit 20b, an image processing circuit 20c, an image compression circuit 20d, a state detection circuit 20e, an audio reproduction circuit 20f, a drive control circuit 20g, and so on.

The arithmetic processing circuit 20a includes a hardware processor, such as a CPU or MPU, that performs various arithmetic processes for controlling the operation of the lens unit 100 and the camera body 200. The arithmetic processing circuit 20a controls each part of the lens unit 100 and the camera body 200 by executing programs stored in the storage unit 29. These programs include a program that performs the control processing of the present embodiment.

The memory circuit 20b is used as a work memory into which programs read from the storage unit 29 are loaded, as a buffer memory that temporarily holds image data captured by the image sensor 21, and as an image display memory for the rear display unit 23.

The image processing circuit 20c converts the image signal generated by the image sensor 21 into digital data and performs various kinds of image processing. The image data output from the image processing circuit 20c is either output to the rear display unit 23, or compressed into a predetermined data format by the image compression circuit 20d and output to the storage unit 29 for recording.

The image compression circuit 20d compresses and encodes the image data output from the image processing circuit 20c into a predetermined data format to generate an image file.

The state detection circuit 20e can detect the state of voice recognition by the voice recognition server 300, and outputs the detection result to the arithmetic processing circuit 20a.

The audio reproduction circuit 20f reproduces audio data from audio files read from the storage unit 29.

Based on the results of computation by the arithmetic processing circuit 20a, the drive control circuit 20g controls drive circuits, actuators, and the like (not shown) to control the aperture 11 and the lens groups 12 and 13 of the lens unit 100 and the mechanical shutter 22 of the camera body 200.

The camera body 200 includes an operation unit 24, such as switches, buttons, and a touch panel, that accepts user operations. In the present embodiment, the operation unit 24 includes a shutter switch (SW) 24a for instructing shooting preparation or shooting start. Lightly pressing the shutter switch 24a to the first stage, a so-called "half press" (shooting preparation), starts operations such as AF (autofocus), AE (automatic exposure), AWB (auto white balance), and EF (flash pre-emission) processing. Pressing the shutter switch 24a further, from the half-pressed position to the second stage, a so-called "full press" (shooting start), activates the mechanical shutter 22 or the electronic shutter function of the image sensor 21 and starts a series of shooting operations, from reading out the signal from the image sensor 21 to writing the image data into the storage unit 29.
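The two-stage behavior of the shutter switch 24a can be pictured as a small event model. The sketch below is a minimal illustration under stated assumptions, not the camera's firmware: the `ShutterPosition` states, the `shutter_events` function, and the event names "prepare" and "capture" are hypothetical labels for the half-press (AF/AE/AWB/EF) and full-press (shooting sequence) operations described above.

```python
from enum import Enum, auto
from typing import Iterable, List


class ShutterPosition(Enum):
    """Hypothetical positions of a two-stage shutter switch such as 24a."""
    RELEASED = auto()
    HALF = auto()   # first stage: shooting preparation (AF, AE, AWB, EF)
    FULL = auto()   # second stage: shooting start


def shutter_events(positions: Iterable[ShutterPosition]) -> List[str]:
    """Turn a sequence of sampled switch positions into events.

    "prepare" is emitted when the switch leaves RELEASED (a full press passes
    through the half-press stage first), and "capture" is emitted each time
    the switch newly reaches FULL.
    """
    events: List[str] = []
    previous = ShutterPosition.RELEASED
    for position in positions:
        if previous is ShutterPosition.RELEASED and position is not ShutterPosition.RELEASED:
            events.append("prepare")
        if position is ShutterPosition.FULL and previous is not ShutterPosition.FULL:
            events.append("capture")
        previous = position
    return events
```

For example, the sequence released, half, half, full, half, released yields `["prepare", "capture"]`: preparation starts once, and a single capture is triggered when the second stage is reached.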

The shooting start instruction signal generated by a full press of the shutter switch 24a is output to the state detection circuit 20e. The state detection circuit 20e outputs the result of detecting the voice recognition state of the voice recognition server 300 to the arithmetic processing circuit 20a. Based on this detection result, the arithmetic processing circuit 20a executes a program stored in the storage unit 29 to perform the control processing described later. The operation unit 24 may also include a switch that allows the user to turn the voice recognition function provided by the voice recognition server 300 (described later) on or off.

The camera body 200 includes a communication unit 25. The communication unit 25 includes an interface circuit for communicably connecting the camera body 200 to external devices via a network such as the Internet. Through the communication unit 25, the camera body 200 can send and receive data to and from external devices connected to a wired or wireless network. For example, the camera body 200 can control the communication unit 25 to output voice data input from the voice input unit 27 to the voice recognition server 300 on the network, and to receive commands relating to the shooting settings of the camera body 200 from the voice recognition server 300.

The camera body 200 includes a voice input unit 27. The voice input unit 27 includes a microphone or the like, converts input sound into an electric signal, and outputs it to the electric circuit 20 as voice data. The voice data output to the electric circuit 20 is output to the audio output unit 28, or attached to image data and output to the storage unit 29 for recording. In the present embodiment, for example, the voice input unit 27 receives the voice uttered by the user and outputs the voice data to the electric circuit 20. The electric circuit 20 can have the user's voice recognized by the voice recognition server 300 (described later) and configure the shooting settings of the camera body 200 based on the recognition result. The voice input unit 27 may be built into the camera body 200 or connected to an external terminal (not shown).

The camera body 200 includes an audio output unit 28. The audio output unit 28 includes a speaker or the like and outputs audio data reproduced by the audio reproduction circuit 20f. If the audio output unit 28 is a speaker built into the camera body 200, audio can be played back directly; if it is an audio output terminal to which earphones or the like can be connected by wire or wirelessly, audio can be played back via that terminal.

The camera body 200 includes a storage unit 29 such as a memory card or a hard disk. The storage unit 29 stores the programs executed by the arithmetic processing circuit 20a. Image files compressed into a predetermined format by the image compression circuit 20d are recorded in the storage unit 29, and previously recorded image files are read from it. The storage unit 29 may be detachable from the camera body 200 or built into it.

Next, the configuration and functions of the voice recognition server 300 of the present embodiment will be described with reference to FIG. 1.

The voice recognition server 300 includes a control unit 30, a communication unit 31, a voice recognition unit 32, and a command generation unit 33.

The control unit 30 includes a hardware processor, such as a CPU or MPU, that performs various arithmetic processes for controlling the operation of the voice recognition server 300. The control unit 30 controls each part of the voice recognition server 300 by executing a predetermined program. This program includes a program that performs the voice recognition processing of the present embodiment.

The communication unit 31 connects to the communication unit 25 of the camera body 200 via the network and can exchange data with the camera body 200. The communication unit 31 outputs voice data transmitted from the communication unit 25 of the camera body 200 to the voice recognition unit 32. The voice recognition unit 32 analyzes the voice data and outputs it as text data to the command generation unit 33. The voice recognition unit 32 includes, for example, a GPU (Graphics Processing Unit). Because a GPU can compute efficiently by processing more data in parallel, it is effective to use a GPU when a learning model, such as a deep learning model, is trained over many iterations. Therefore, in the present embodiment, when executing an inference program that includes a learning model, the control unit 30 and the GPU cooperate in computation to perform inference for voice recognition. This inference may instead be computed by the control unit 30 alone or by the GPU alone. When a learning model is used for voice recognition, the model is trained in advance with voice data as input data and text data transcribing the content of that voice data as teacher data. During inference, the voice data transmitted from the communication unit 25 of the camera body 200 is used as input data, and the inferred text data is output. The command generation unit 33 converts the text data into a command relating to the shooting settings of the camera body 200 and transmits it to the communication unit 25 of the camera body 200 via the communication unit 31. The camera body 200 configures its shooting settings based on the command transmitted from the communication unit 31 of the voice recognition server 300. In this way, the voice recognition server 300 can convert voice data input by the user into a command for configuring the shooting settings of the imaging apparatus 1.

Control signals are omitted from FIG. 1; only the flow of data between components is indicated by arrows.

<Processing of the Imaging Apparatus 1> Next, the setting process and control process performed by the imaging apparatus 1 of Embodiment 1 during shooting will be described with reference to FIG. 2. The processing of FIG. 2 is implemented by the arithmetic processing circuit 20a (hereinafter, the control unit 20a) of the electric circuit 20 of the camera body 200 executing a program stored in the storage unit 29. The processing of FIG. 2 starts when the imaging apparatus 1 transitions to a state in which shooting is possible, for example when its power is turned on or its operation mode is switched from the display mode to the shooting mode. In the following, it is assumed that the voice input unit 27 collects all the voice uttered by the user and that the collected voice data is transmitted to the voice recognition server 300 for voice recognition. The same applies to FIG. 4, described later.

In S201, after the shooting settings have been made, either through the operation unit 24 or automatically, the control unit 20a advances the process to S202. Here, for example, the shutter speed is set to 1/30 second.

In S202, the control unit 20a determines whether the shutter switch 24a included in the operation unit 24 has been half-pressed. If the control unit 20a determines that the shutter switch 24a has been half-pressed, it advances the process to S203; otherwise, it repeats the determination of S202.

In S203, the control unit 20a determines whether the voice input unit 27 has received the user's voice. If so, the control unit 20a advances the process to S204; if not, it advances the process to S209. Here, for example, the voice input unit 27 receives an utterance such as "change the shutter speed to 1/60 second".

In S204, the control unit 20a uses the state detection circuit 20e to determine whether the voice input unit 27 is still collecting sound. If sound collection is in progress, the control unit 20a continues accepting the user's voice input in S204; if sound collection has ended, it advances the process to S205. If an instruction to start shooting by a full press of the shutter switch 24a is received while the voice input unit 27 is collecting sound, the control unit 20a starts the shooting process with the settings in effect before the change made by the voice input described later.

In S205, since sound collection by the voice input unit 27 has ended and voice recognition processing by the voice recognition server 300 is about to start, the control unit 20a disables the instruction to start shooting by a full press of the shutter switch 24a, and advances the process to S206. In this case, the shutter switch may instead be configured so that it mechanically cannot be fully pressed.

In S206, the control unit 20a transmits the voice data input from the voice input unit 27 in S203 and S204 to the voice recognition server 300 via the communication unit 25, and advances the process to S207. The voice data is, for example, a WAV file or an MP3 file.
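The document does not specify how the collected voice is packaged for transmission beyond naming WAV and MP3 as examples. As a minimal sketch, assuming the voice input unit delivers signed 16-bit mono PCM samples at 16 kHz (both assumptions), Python's standard `wave` module can wrap the samples into an in-memory WAV file ready to send; the function name `pcm16_to_wav_bytes` is hypothetical, and the network transport itself is left out.

```python
import io
import struct
import wave
from typing import Sequence


def pcm16_to_wav_bytes(samples: Sequence[int], sample_rate: int = 16000) -> bytes:
    """Wrap signed 16-bit mono PCM samples in a WAV container held in memory.

    The sample format and rate are assumptions for illustration only.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono microphone input (assumed)
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        # Pack as little-endian signed shorts, the byte order WAV expects.
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    # Closing the wave writer finalizes the header but leaves the BytesIO open.
    return buffer.getvalue()
```

The resulting bytes begin with the standard `RIFF`/`WAVE` header and can be read back with `wave.open(io.BytesIO(data), "rb")`, which is a convenient way to check the packaging before sending.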

In S207, the control unit 20a receives the command transmitted from the voice recognition server 300 via the communication unit 25, and advances the process to S208. The command is, for example, a command that changes the shutter speed to 1/60 second.

In S208, the control unit 20a applies the shooting setting value contained in the command received in S207, which corresponds to the user's voice input through the voice input unit 27, and advances the process to S209. Here, for example, the shutter speed set in S201 is changed from 1/30 second to 1/60 second.

In S209, the control unit 20a determines, based on the voice recognition state detected by the state detection circuit 20e, that the voice recognition processing has been completed, re-enables the instruction to start shooting by a full press of the shutter switch 24a that was disabled in S205, and advances the process to S210.

In S210, the control unit 20a determines whether an instruction to start shooting by a full press of the shutter switch 24a has been received. If so, it advances the process to S211; if not, it returns the process to S202.

In S211, the control unit 20a executes the shooting process based on the settings changed in S208, and advances the process to S212.

In S212, the control unit 20a stores the image data generated in S211 in the storage unit 29, and advances the process to S213.

In S213, the control unit 20a determines whether to end the shooting mode. If it determines to end the shooting mode, it ends the process; otherwise, it returns the process to S201.

In S211 and S212, the captured image may be displayed on the rear display unit 23 together with a message or the like notifying the user that the captured image reflects the settings changed by voice input.
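The shutter-gating logic of FIG. 2 can be summarized as a small state holder. The sketch below is a simplified, single-threaded model under stated assumptions, not the patented implementation: the class `VoiceGatedCamera` and its methods are hypothetical names, the network round trip of S206 and S207 is represented only by the command passed to `finish_recognition`, and a capture is modeled as recording a copy of the settings in effect.

```python
from fractions import Fraction
from typing import Dict, List, Optional


class VoiceGatedCamera:
    """Hypothetical model of the Embodiment 1 control flow (FIG. 2)."""

    def __init__(self, settings: Dict[str, object]) -> None:
        self.settings = dict(settings)      # S201: initial shooting settings
        self.capture_enabled = True         # full press accepted by default
        self.captured: List[Dict[str, object]] = []

    def start_recognition(self) -> None:
        """S205: sound collection has ended and recognition begins."""
        self.capture_enabled = False

    def finish_recognition(self, command: Optional[Dict[str, object]]) -> None:
        """S207-S209: apply the returned setting values, then re-enable capture."""
        if command:
            self.settings.update(command)   # S208
        self.capture_enabled = True         # S209

    def full_press(self) -> bool:
        """S210-S212: capture with the current settings unless capture is disabled.

        While sound is still being collected (S204), capture is enabled, so a
        full press then uses the settings from before the voice command.
        """
        if not self.capture_enabled:
            return False                    # instruction ignored during recognition
        self.captured.append(dict(self.settings))  # S211 shoot, S212 store
        return True
```

For example, with an initial shutter speed of `Fraction(1, 30)`, calling `start_recognition()` and then `full_press()` returns `False`; after `finish_recognition({"shutter_speed": Fraction(1, 60)})`, a further `full_press()` returns `True` and records the 1/60 second setting, matching the S201 to S211 walk-through above.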

<Processing of the Voice Recognition Server 300> Next, the voice recognition processing performed by the voice recognition server 300 of the present embodiment will be described with reference to FIG. 3. The processing of FIG. 3 is implemented by the control unit 30 of the voice recognition server 300 executing a program stored in a memory (not shown).

In S301, the control unit 30 receives the voice data transmitted from the camera body 200 in S206 of FIG. 2.

In S302, the control unit 30 controls the voice recognition unit 32 to perform voice recognition processing on the voice data received in S301. The voice recognition unit 32 converts the voice data into text, performs language understanding, and so on, and outputs the result of the voice recognition processing to the command generation unit 33.

In S303, the control unit 30 controls the command generation unit 33 to generate a command relating to the camera's shooting settings based on the result of the voice recognition processing, and outputs the command to the communication unit 31. For example, if the text obtained from the voice data in S302 is "set the shutter speed to 1/60", the command generation unit 33 converts it into a command that changes the shutter speed setting of the camera body 200 and outputs the generated command to the communication unit 31. Items that can be set by voice input are not limited to shutter speed; they also include ISO sensitivity, aperture, continuous/single shooting, movie/still image, long/short exposure, recording format, development color, recording destination, and so on. When the user changes any of these items by voice input, the user's voice is interpreted through language understanding and a command that changes the setting is generated.

In S304, the control unit 30 controls the communication unit 31 to transmit the command generated in S303 to the camera body 200.
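The conversion performed by the command generation unit 33 in S303 is not specified in detail. As a minimal, rule-based sketch, assuming the recognized text is English and the command is a simple dictionary of setting names and values (both assumptions; the function `text_to_command` and the keys `shutter_speed`, `iso`, and `aperture` are hypothetical), regular expressions can map a few phrasings to setting changes. A real system would more likely rely on the language-understanding step described above than on fixed patterns.

```python
import re
from fractions import Fraction
from typing import Dict, Optional, Union

Value = Union[Fraction, int, float]

# Hypothetical phrase patterns; each maps recognized text to one setting.
_SHUTTER = re.compile(r"shutter speed to (\d+)\s*/\s*(\d+)")
_ISO = re.compile(r"iso(?: sensitivity)? to (\d+)")
_APERTURE = re.compile(r"aperture to f/?(\d+(?:\.\d+)?)")


def text_to_command(text: str) -> Optional[Dict[str, Value]]:
    """Convert recognized text into a setting-change command, or None if unrecognized."""
    lowered = text.lower()
    match = _SHUTTER.search(lowered)
    if match:
        numerator, denominator = int(match.group(1)), int(match.group(2))
        if denominator == 0:
            return None                 # reject a nonsensical exposure time
        # Keep the exposure time exact, e.g. 1/60 second.
        return {"shutter_speed": Fraction(numerator, denominator)}
    match = _ISO.search(lowered)
    if match:
        return {"iso": int(match.group(1))}
    match = _APERTURE.search(lowered)
    if match:
        return {"aperture": float(match.group(1))}
    return None
```

With these patterns, "Set the shutter speed to 1/60" becomes `{"shutter_speed": Fraction(1, 60)}`, which the camera side would then apply as in S208. Returning `None` for unrecognized text leaves the decision of how to report a failure to the caller.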

As described above, according to the first embodiment, when the user changes the camera's shooting settings by voice input, any instruction to start shooting given by fully pressing the shutter switch 24a while the voice recognition process is in progress is invalidated. This lets the user easily tell that voice recognition is in progress and when the voice-input setting has been applied. Once the voice-input setting has been applied and the camera has returned to a state in which a full press of the shutter switch 24a can be accepted, the user can shoot as intended with the voice-input setting in effect.
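
The gating rule of the first embodiment can be modeled as a small state machine. This is a minimal Python sketch under assumed names (`Phase`, `ShutterGate`, and its methods are hypothetical). It accepts a full press while sound is being collected and after recognition ends, as claim 2 describes, and rejects it only while recognition is in progress.

```python
from enum import Enum, auto
from typing import Dict, List


class Phase(Enum):
    IDLE = auto()         # no voice input in progress
    COLLECTING = auto()   # voice input unit 27 is collecting sound
    RECOGNIZING = auto()  # voice data sent; waiting for the command


class ShutterGate:
    """Embodiment 1 sketch: a full press is ignored while recognition runs."""

    def __init__(self, settings: Dict[str, str]) -> None:
        self.phase = Phase.IDLE
        self.settings = dict(settings)
        self.captures: List[Dict[str, str]] = []  # settings used per accepted shot

    def start_voice_input(self) -> None:
        # e.g. triggered by a half press of the shutter switch
        self.phase = Phase.COLLECTING

    def end_sound_collection(self) -> None:
        # voice data is now being recognized (on the server or in the body)
        self.phase = Phase.RECOGNIZING

    def apply_command(self, item: str, value: str) -> None:
        # recognition finished: apply the setting and accept full presses again
        self.settings[item] = value
        self.phase = Phase.IDLE

    def full_press(self) -> bool:
        """Return True if the instruction to start shooting is accepted."""
        if self.phase is Phase.RECOGNIZING:
            return False  # invalidated: the voice-input setting is not applied yet
        self.captures.append(dict(self.settings))
        return True
```

A typical sequence is `start_voice_input()`, `end_sound_collection()`, then `apply_command(...)`; any `full_press()` between the last two calls returns `False`, and the next accepted press records the new setting.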

[Embodiment 2] Next, the second embodiment will be described.

In the first embodiment, an instruction to start shooting by fully pressing the shutter switch 24a during the voice recognition process is invalidated. In contrast, in the second embodiment, when such an instruction is received during the voice recognition process, the shooting process is started with the shooting settings as they were before the change. Since the rest is the same as in the first embodiment, the following description focuses on the differences. The configurations of the imaging apparatus 1 and the voice recognition server 300 of the second embodiment are the same as in FIG. 1 of the first embodiment, and the processing of the voice recognition server 300 of the second embodiment is the same as in FIG. 3 of the first embodiment, so their description is omitted.

<Processing of Imaging Apparatus 1> The setting processing and control processing performed at the time of shooting by the imaging apparatus 1 of the second embodiment are described below with reference to FIG. 4.

The processing of S401 to S404, S406 to S408, and S410 to S413 in FIG. 4 is the same as that of S201 to S204, S206 to S208, and S210 to S213 in FIG. 2.

FIG. 4 differs from FIG. 2 of the first embodiment in the processing after sound collection by the voice input unit 27 ends in S404. That is, whereas in the first embodiment the instruction to start shooting by fully pressing the shutter switch 24a was invalidated after sound collection ended in S204, in the second embodiment it is not invalidated, and the voice data is transmitted to the voice recognition server 300 in S406. After S404, the state detection circuit 20e detects that voice recognition processing by the voice recognition server 300 is in progress.

While the voice recognition server 300 is performing the voice recognition process, in S420 the control unit 20a determines whether an instruction to start shooting by fully pressing the shutter switch 24a has been received. If it has, the process proceeds to S421; otherwise, the process proceeds to S407.
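
The branch at S420 and the two paths that follow (S407 to S408, or S421 to S423) can be sketched in Python as below. The sketch is sequential for simplicity: whether a full press arrived during recognition is passed in as a flag, and `receive_command` and `capture` stand in for the communication unit 25 and the imaging pipeline. All names here are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Settings = Dict[str, str]
Command = Tuple[str, str]  # hypothetical (item, value) pair sent by the server


def recognition_period(
    settings: Settings,
    full_press_during_recognition: bool,
    receive_command: Callable[[], Command],
    capture: Callable[[Settings], str],
) -> Tuple[Settings, Optional[str]]:
    """Return (settings after the command, image shot early or None)."""
    early_image = None
    if full_press_during_recognition:            # S420: instruction received
        early_image = capture(dict(settings))    # S421: shoot with pre-change settings
    item, value = receive_command()              # S407 / S422: command from server
    updated = dict(settings)
    updated[item] = value                        # S408 / S423: apply the setting
    return updated, early_image
```

Both paths end with the voice-input setting applied; they differ only in whether an image was already captured with the earlier settings.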

In S407, the control unit 20a receives the command transmitted from the voice recognition server 300 via the communication unit 25, and the process proceeds to S408.

In S408, the control unit 20a applies the shooting setting value contained in the command received in S407, which corresponds to the user's voice input through the voice input unit 27, and the process proceeds to S410.

In S421, the control unit 20a starts the shooting process with the shooting settings as they were before any change made by voice input through the voice input unit 27, and the process proceeds to S422.

In S422, the control unit 20a receives the command transmitted from the voice recognition server 300 via the communication unit 25, and the process proceeds to S423.

In S423, the control unit 20a applies the shooting setting value contained in the command received in S422, which corresponds to the user's voice input through the voice input unit 27, and the process proceeds to S424.

In S424, the control unit 20a determines, based on the command applied in S423, whether re-shooting is necessary. If it determines that re-shooting is necessary, the process proceeds to S411; if not, the process proceeds to S412. Re-shooting is necessary, for example, when the shooting setting changed by voice input cannot be reflected after the shooting process, as with changes to shutter speed, ISO sensitivity, aperture, continuous/single shooting, or movie/still image. Re-shooting is also necessary when the ongoing shooting must be stopped, for example for changes between long and short exposure or between movie and still image. Re-shooting is unnecessary when the setting can be reflected in the development or recording process after shooting, for example for changes to the recording format, development color, or recording destination.

If re-shooting is necessary in S424, the control unit 20a changes the setting and performs the shooting process again in S411. If re-shooting is unnecessary, in S412 the control unit 20a reflects the voice-input shooting setting applied in S423 in the result captured in S421 and records it.
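
The S424 decision and the S411/S412 branches can be sketched in Python as follows. The setting identifiers are hypothetical, the captured result is modeled simply as the dictionary of settings it was taken with, and treating unknown items as requiring a reshoot is a conservative assumption not stated in the source.

```python
from typing import Callable, Dict

Settings = Dict[str, str]

# Hypothetical identifiers for the settings that S424 says can be reflected in
# development or recording after capture.
POST_CAPTURE_ITEMS = {"recording_format", "development_color", "recording_destination"}


def needs_reshoot(item: str) -> bool:
    """S424 sketch: capture-time settings (shutter speed, ISO, aperture, ...)
    cannot be applied to an existing exposure. Unknown items are treated
    conservatively as needing a reshoot (an assumption)."""
    return item not in POST_CAPTURE_ITEMS


def finish_after_early_shot(
    image: Settings,
    item: str,
    value: str,
    shoot: Callable[[Settings], Settings],
) -> Settings:
    """Return the recorded result; `image` holds the settings used in S421."""
    updated = {**image, item: value}
    if needs_reshoot(item):
        return shoot(updated)  # S411: shoot again with the changed setting
    return updated             # S412: keep the exposure, apply at development/recording
```

For instance, a change of `recording_format` keeps the S421 exposure and only alters how it is recorded, whereas a change of `shutter_speed` triggers `shoot` again.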

As described above, according to the second embodiment, when an instruction to start shooting by fully pressing the shutter switch 24a is received while the voice recognition server 300 is performing the voice recognition process, the shooting process is started with the shooting settings as they were before the change. This allows the user to shoot as intended without missing a photo opportunity during voice recognition.

Note that even when a command is received in S422, the setting corresponding to that command may not be applicable, for example when voice recognition fails or when the requested value exceeds the camera's specifications. In such cases, in S411 or S412, the captured image may be displayed together with a message or the like on the rear display unit 23 notifying the user whether the captured image reflects the settings before the voice-input change or the settings specified by voice input.
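
A minimal Python sketch of such a notification could look like the following. The supported shutter-speed range, the `(item, value)` command shape, and the message wording are invented for illustration; a real camera would check the command against its own specifications.

```python
from typing import Optional, Tuple

# Hypothetical supported shutter-speed range in seconds (illustration only).
SHUTTER_RANGE = (1 / 8000, 30.0)


def parse_shutter(value: str) -> Optional[float]:
    """Parse "1/60" or "2" into seconds; return None if unparsable."""
    try:
        if "/" in value:
            numerator, denominator = value.split("/")
            return float(numerator) / float(denominator)
        return float(value)
    except (ValueError, ZeroDivisionError):
        return None


def notification(command: Optional[Tuple[str, str]]) -> str:
    """Message for the rear display stating which settings the image reflects."""
    if command is None:  # recognition failed
        return "Voice input not recognized: image uses the settings before the change."
    item, value = command
    if item == "shutter_speed":
        seconds = parse_shutter(value)
        low, high = SHUTTER_RANGE
        if seconds is None or not low <= seconds <= high:
            return f"{value} is outside the camera's range: image uses the settings before the change."
    return f"Image reflects the voice-input setting {item} = {value}."
```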

In the above-described embodiments, voice recognition is performed by the external voice recognition server 300, but it may instead be performed in the camera body 200. Also, although voice input is accepted when the shutter switch 24a is half-pressed, the trigger is not limited to this; for example, voice input may be accepted when a specific voice is detected. Furthermore, although shooting starts when the shutter switch 24a is fully pressed, the trigger is not limited to this; for example, shooting may start when the rear display unit 23 is touched.

[Other Embodiments]
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

The invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the invention. Accordingly, the following claims are appended to make the scope of the invention public.

100 ... lens unit, 200 ... camera body, 300 ... voice recognition server, 20 ... electric circuit, 21 ... image sensor, 24 ... operation unit, 25 ... communication unit, 27 ... voice input unit, 30 ... control unit, 31 ... communication unit, 32 ... voice recognition unit, 33 ... command generation unit

Claims

An imaging apparatus comprising:
imaging means;
voice input means;
control means for performing shooting processing in response to a user's instruction to start shooting; and
setting means for changing a setting used at the time of shooting according to the user's voice input by the voice input means,
wherein the control means performs control so as not to accept the user's instruction to start shooting while voice recognition processing is being performed on the voice input by the voice input means.

The imaging apparatus according to claim 1, wherein the control means performs control so as to accept the instruction to start shooting while the voice input means is collecting sound or after the voice recognition processing has ended.

An imaging apparatus comprising:
imaging means;
voice input means;
control means for performing shooting processing in response to a user's instruction to start shooting; and
setting means for changing a setting used at the time of shooting according to the user's voice input by the voice input means,
wherein, when an instruction to start shooting is received from the user while voice recognition processing is being performed on the voice input by the voice input means, the control means performs control so as to start the shooting processing with the settings as they were before being changed by the setting means.

The imaging apparatus according to claim 3, wherein, when the voice recognition processing ends after the control means has started the shooting processing with the settings before being changed by the setting means, the control means reflects, after the shooting processing, the settings changed by the setting means according to the user's voice input by the voice input means.

The imaging apparatus according to claim 4, wherein, when the voice recognition processing ends after the control means has started the shooting processing with the settings before being changed by the setting means, and the settings changed by the setting means according to the user's voice input by the voice input means cannot be reflected after the shooting processing, the control means performs control so as to change the settings and then perform the shooting processing again.

The imaging apparatus according to any one of claims 1 to 5, further comprising communication means capable of communicating with an external device having a voice recognition function,
wherein the control means transmits the user's voice input by the voice input means to the external device via the communication means, and
the setting means changes the setting used at the time of shooting based on a command received from the external device.

The imaging apparatus according to any one of claims 3 to 6, further comprising notification means for notifying whether the result of the shooting processing reflects the settings before being changed by the setting means or the settings changed by the setting means.

A control method for an imaging apparatus having imaging means, voice input means, and control means for performing shooting processing in response to a user's instruction to start shooting, the method comprising:
not accepting the user's instruction to start shooting while voice recognition processing is being performed on the voice input by the voice input means; and
accepting the instruction to start shooting after the voice recognition processing has ended and a setting used at the time of shooting has been changed according to the user's voice input by the voice input means.

A control method for an imaging apparatus having imaging means, voice input means, control means for performing shooting processing in response to a user's instruction to start shooting, and setting means for changing a setting used at the time of shooting according to the user's voice input by the voice input means, the method comprising:
when an instruction to start shooting is received from the user while voice recognition processing is being performed on the voice input by the voice input means, starting the shooting processing with the settings before being changed by the setting means; and
changing the setting used at the time of shooting according to the user's voice input by the voice input means after the voice recognition processing has ended.

A computer-readable program for causing a computer to function as each means of the imaging apparatus according to any one of claims 1 to 7.