JP2018121134A - Image forming apparatus

Info

Publication number: JP2018121134A
Application number: JP2017009872A
Authority: JP
Inventors: 国晃大山; Kuniaki Oyama
Original assignee: Kyocera Document Solutions Inc
Current assignee: Kyocera Document Solutions Inc
Priority date: 2017-01-23
Filing date: 2017-01-23
Publication date: 2018-08-02
Anticipated expiration: 2037-01-23
Also published as: JP6598033B2

Abstract

PROBLEM TO BE SOLVED: To reduce the possibility that another user's voice is falsely recognized. SOLUTION: A user detection part 31 determines whether a person is detected from an image photographed by a camera 13 and, when a person is detected, specifies the person as a command user and specifies the position of the face of the command user. When the command user is specified by the user detection part 31, a voice recognition part 32 (a) controls a microphone operation part 16 to direct a sound collection range of a microphone 15 toward the face of the command user on the basis of the position of the face of the command user, and (b) executes voice recognition processing on a voice collected by the microphone 15 to specify a command corresponding to the voice collected by the microphone 15. A control part 33 controls an internal device according to the specified command. SELECTED DRAWING: Figure 1

Description

The present invention relates to an image forming apparatus.

In recent years, voice recognition technology has made it possible to operate a device by a user's voice. In one known image forming apparatus, when a voice operation is started, the apparatus notifies neighboring devices of the start of the voice operation so that voice operation on those neighboring devices is disabled, thereby preventing voice recognition by non-target devices (see, for example, Patent Document 1).

Patent Document 1: JP 2006-23636 A

However, when multiple users are in the vicinity of the device to be operated, that device may recognize the voices of those multiple users and malfunction by following the voice of a user other than the user who is operating it.

The present invention has been made in view of the above problem, and an object thereof is to obtain an image forming apparatus that reduces the possibility of erroneously recognizing another user's voice.

An image forming apparatus according to the present invention includes: an internal device; a camera that captures a predetermined range and outputs a captured image; a microphone that collects sound; a microphone operation unit that moves the sound collection range of the microphone; a user detection unit that determines whether a person is detected from the captured image and, when the person is detected, identifies the person as a command user and identifies the position of the command user's face; a voice recognition unit that, when the command user is identified by the user detection unit, (a) controls the microphone operation unit based on the position of the command user's face to direct the sound collection range of the microphone toward the command user's face, and (b) executes voice recognition processing on the voice collected by the microphone to identify a command corresponding to that voice; and a control unit that controls the internal device in accordance with the identified command.

According to the present invention, it is possible to obtain an image forming apparatus that reduces the possibility of erroneously recognizing another user's voice.

These and other objects, features, and advantages of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of an image forming apparatus according to an embodiment of the present invention.

FIG. 2 is a flowchart for explaining the operation of the image forming apparatus 1 shown in FIG. 1.

FIG. 3 is a perspective view for explaining the collection of the command user's voice by the image forming apparatus 1 shown in FIG. 1.

FIG. 4 is a diagram for explaining the identification of the command user by the user detection unit 31 in FIG. 1.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an image forming apparatus according to an embodiment of the present invention. The image forming apparatus 1 shown in FIG. 1 is a multifunction peripheral having a printing function, an image reading function, a facsimile function, and the like.

The image forming apparatus 1 includes an operation panel 11, a human sensor 12, a camera 13, a speaker 14, a microphone 15, a microphone operation unit 16, a communication device 21, a modem 22, a printing device 23, an image reading device 24, and a controller 25.

The operation panel 11 includes a display device 11a such as a liquid crystal display and an input device 11b such as a touch panel, and displays an operation screen for the user and detects the user's input operations. The display device 11a displays the operation screen for the user, and the input device 11b receives user operations input by the user.

The human sensor 12 is a sensor that detects a person within a predetermined range in front of the image forming apparatus 1 using infrared rays, ultrasonic waves, or the like.

The camera 13 captures a predetermined range in front of the image forming apparatus 1 and outputs a captured image. In this embodiment, the camera 13 starts capturing the predetermined range when a person is detected by the human sensor 12.

The speaker 14 outputs voice guidance and the like.

The microphone 15 collects sound and outputs it as an audio signal. The microphone operation unit 16 moves the sound collection range of the microphone 15.

For example, the microphone 15 is a super-directional microphone, and the microphone operation unit 16 mechanically adjusts the orientation of the microphone 15 with a motor, an actuator, or the like to move the sound collection range of the microphone 15.

Alternatively, for example, a microphone array may be used as the microphone 15, and the microphone operation unit 16 may move the sound collection range by performing signal processing on the plurality of audio signals from the microphone array.
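The publication does not say which signal processing is applied to the microphone array signals. As one possible illustration only, the following Python sketch uses delay-and-sum beamforming, a common technique that is swapped in here as an assumption. It further assumes a uniform linear array and a far-field sound source; the function names, the parameters, and the speed-of-sound constant are all hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air (about 20 degrees C)


def steering_delays(num_mics, spacing_m, angle_deg):
    """Per-microphone delays (seconds) that steer a uniform linear array.

    angle_deg is measured from broadside (0 = straight ahead); positive
    angles point toward the higher-index microphones. The delays are
    shifted so that the smallest one is zero.
    """
    theta = math.radians(angle_deg)
    raw = [i * spacing_m * math.sin(theta) / SPEED_OF_SOUND_M_S
           for i in range(num_mics)]
    base = min(raw)
    return [d - base for d in raw]


def delay_and_sum(channels, delays_s, sample_rate_hz):
    """Delay each channel by a whole number of samples, then average.

    Sound from the steered direction adds coherently, while sound from
    other directions is attenuated.
    """
    shifts = [round(d * sample_rate_hz) for d in delays_s]
    start = max(shifts)
    length = min(len(ch) for ch in channels)
    return [
        sum(ch[n - s] for ch, s in zip(channels, shifts)) / len(channels)
        for n in range(start, length)
    ]
```

In such a sketch, moving the sound collection range amounts to recomputing the delays for a new angle; no mechanical movement of the microphones is involved.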

The communication device 21 is an internal device (such as a network interface) that is connected to a computer network and performs data communication, using a predetermined communication protocol, with other devices connected to that computer network.

The modem 22 is an internal device that is connected to a private branch exchange or the like and, via the private branch exchange or the like, performs voice communication with other devices connected to the public switched telephone network. The modem 22 is used for facsimile transmission and reception.

The printing device 23 is an internal device that prints a document image on printing paper by, for example, an electrophotographic method. The image reading device 24 is an internal device that optically reads a document image from a document and generates image data of the document image.

The controller 25 includes a computer having a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like (not shown). The controller 25 loads a program stored in the ROM or in a storage device (not shown) into the RAM and executes the program on the CPU, thereby operating as various processing units. The controller 25 may also include an ASIC (Application Specific Integrated Circuit) and execute specific processing on the ASIC.

Here, the controller 25 operates as a user detection unit 31, a voice recognition unit 32, and a control unit 33.

The user detection unit 31 determines whether a person is detected from the captured image obtained by the camera 13 and, when a person is detected, identifies that person as the command user and identifies the position of the command user's face.

When a plurality of persons are detected from the captured image, the user detection unit 31, for example, identifies the face region of each of those persons. Because a person closer to the camera 13 (that is, to the image forming apparatus 1) has a face region with a larger area, the user detection unit 31 identifies the person whose face region has the largest area as the command user.
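As a non-limiting illustration, the largest-face-area rule can be sketched in Python as follows. The sketch assumes that a separate face detector has already produced rectangular face regions in pixel coordinates; the names FaceRegion and select_command_user are hypothetical and do not appear in this publication.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class FaceRegion:
    """Axis-aligned face rectangle in image pixels (hypothetical type)."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int

    @property
    def area(self) -> int:
        return self.width * self.height

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def select_command_user(faces: Sequence[FaceRegion]) -> Optional[FaceRegion]:
    """Return the face region with the largest area, or None if no face."""
    if not faces:
        return None
    return max(faces, key=lambda face: face.area)
```

The center of the selected region can then serve as the position of the command user's face when the sound collection range is directed toward it.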

When the command user is identified by the user detection unit 31, the voice recognition unit 32 (a) controls the microphone operation unit 16 based on the position of the command user's face to direct the sound collection range of the microphone 15 toward the command user's face, and (b) executes voice recognition processing on the voice collected by the microphone 15 to identify a command corresponding to that voice.

When a plurality of persons are detected from the captured image by the user detection unit 31, the voice recognition unit 32 may control the microphone operation unit 16 to move the sound collection range of the microphone 15 so that the command user's face falls within the sound collection range of the microphone 15 and the faces of persons other than the command user do not.
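The publication does not describe how such a sound collection range is chosen. The following Python sketch shows one possible approach under simplifying assumptions: faces are reduced to horizontal angles in degrees, the sound collection range is modeled as a symmetric beam of fixed half-width, and candidate directions are searched on a fixed grid. The function choose_beam_angle and all of its parameters are hypothetical.

```python
def choose_beam_angle(user_deg, others_deg, half_width_deg, step_deg=0.5):
    """Pick a steering angle that covers the command user and no one else.

    Candidate angles are taken within +/- half_width_deg of the user, so
    the user is always inside the beam. A candidate is valid when every
    other face lies strictly outside the beam. Among valid candidates the
    one closest to the user is returned; None means no valid angle exists.
    """
    steps = int(half_width_deg / step_deg)
    valid = []
    for k in range(-steps, steps + 1):
        angle = user_deg + k * step_deg
        if all(abs(angle - other) > half_width_deg for other in others_deg):
            valid.append(angle)
    if not valid:
        return None
    return min(valid, key=lambda a: abs(a - user_deg))
```

When no valid angle exists, a caller could fall back to pointing the beam directly at the command user, which corresponds to the basic behavior described for the voice recognition unit 32.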

The control unit 33 controls the internal device in accordance with the command identified by the voice recognition unit 32.

In this embodiment, the voice recognition unit 32 identifies a voiceprint from the voice collected by the microphone 15 and temporarily holds the voiceprint of the voice of the voice operation start command as the voiceprint of the command user. The control unit 33 then (a) until the voice operation ends, controls the internal device in accordance with commands corresponding to voices whose voiceprints match the voiceprint of the command user and rejects commands corresponding to voices whose voiceprints do not match it, and (b) discards the voiceprint of the command user when the voice operation ends.

Furthermore, when the command identified by the voice recognition unit 32 is a voice operation end command, the control unit 33 may end the voice operation and discard the voiceprint of the command user regardless of whether the voiceprint of that command matches the voiceprint of the command user.

Note that this voiceprint matching may be omitted.

Next, the operation of the image forming apparatus 1 will be described. FIG. 2 is a flowchart for explaining the operation of the image forming apparatus 1 shown in FIG. 1. FIG. 3 is a perspective view for explaining the collection of the command user's voice by the image forming apparatus 1 shown in FIG. 1.

As shown in FIG. 3, the user detection unit 31 monitors whether a person has been detected by the human sensor 12 within the voice acquirable range (step S1). When a person is detected by the human sensor 12, the user detection unit 31 starts image capture by the camera 13 (step S2). Note that the voice acquirable range is the range over which the sound collection range of the microphone 15 can be moved.

The user detection unit 31 executes face recognition processing on the captured image, detects the face region of a person to identify the command user, and identifies the position of the command user's face (step S3). FIG. 4 is a diagram for explaining the identification of the command user by the user detection unit 31 in FIG. 1. At this time, as shown in FIG. 4, when the face regions of a plurality of persons are detected, the person whose face region has the largest area is identified as the command user.

Next, based on the identified position of the command user's face, the voice recognition unit 32 controls the microphone operation unit 16 to direct the sound collection range of the microphone 15 toward the command user's face (step S4). For example, the correspondence between the vertex positions of the image captured by the camera 13 and the orientation of the sound collection range of the microphone 15 is specified in advance, and the orientation of the sound collection range corresponding to the position of the command user's face in the captured image is determined by linear interpolation or the like.
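The interpolation in step S4 can be sketched as follows. This is a minimal sketch, assuming that the orientation of the sound collection range is expressed as a (pan, tilt) pair in degrees, that one such pair has been calibrated in advance for each of the four corners of the captured image, and that bilinear interpolation is used between them. The function names, the dictionary keys, and the numeric values in the usage note are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def face_to_mic_direction(face_x, face_y, image_width, image_height, corner_dirs):
    """Map a face position in the image to a (pan_deg, tilt_deg) orientation.

    corner_dirs holds the pre-calibrated (pan, tilt) orientation for each
    image corner under the keys 'top_left', 'top_right', 'bottom_left',
    and 'bottom_right'.
    """
    u = face_x / image_width    # 0 at the left edge, 1 at the right edge
    v = face_y / image_height   # 0 at the top edge, 1 at the bottom edge
    result = []
    for axis in range(2):       # axis 0 = pan, axis 1 = tilt
        top = lerp(corner_dirs["top_left"][axis], corner_dirs["top_right"][axis], u)
        bottom = lerp(corner_dirs["bottom_left"][axis], corner_dirs["bottom_right"][axis], u)
        result.append(lerp(top, bottom, v))
    return tuple(result)
```

For instance, with corner orientations of (-30, 20), (30, 20), (-30, -10), and (30, -10) degrees for the top-left, top-right, bottom-left, and bottom-right corners of a 640 x 480 image, a face centered at pixel (320, 240) maps to a pan of 0 degrees and a tilt of 5 degrees.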

Thereafter, the voice recognition unit 32 executes voice recognition processing on the voice collected by the microphone 15 and identifies a command corresponding to that voice (step S5). At this time, because the sound collection range of the microphone 15 is directed toward the command user's face, the command user's voice is collected well, while the directivity of the microphone 15 makes the voices of other persons less likely to be collected.

The voice recognition unit 32 then determines whether the voiceprint of the command user is being held (step S6). When the voiceprint of the command user is not being held, the voice recognition unit 32 determines whether the identified command is a voice operation start command (step S7). When the identified command is a voice operation start command, the voice recognition unit 32 identifies the voiceprint of the voice of that command and holds it as the voiceprint of the command user (step S8).

On the other hand, when the voiceprint of the command user is being held, the voice recognition unit 32 determines whether the identified command is a voice operation end command (step S9). When the identified command is not a voice operation end command, the voice recognition unit 32 determines whether the voiceprint of the voice of the identified command matches the voiceprint of the command user (step S10). When the voiceprints match, the control unit 33 controls the internal device to execute the processing instructed by the identified command (step S11). When the voiceprints do not match, the control unit 33 does not execute the processing instructed by the identified command.

When the identified command is a voice operation end command (step S9), the voice recognition unit 32 discards the held voiceprint of the command user (step S12).

When the identified command is not a voice operation start command in step S7, the processing returns to step S1, and steps S1 to S7 are executed until a voice operation start command is detected.

After the voiceprint of the command user is held in step S8, the processing returns to step S1, and steps S1 to S6, S9, and S10 are executed until the next command spoken by the command user is detected.

After the voiceprint of the command user is discarded in step S12, the processing returns to step S1, and steps S1 to S7 are executed until a voice operation start command is detected.
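The decision logic of steps S6 to S12 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: voiceprints are represented as plain strings compared for equality, whereas a real system would presumably compare acoustic features against a similarity threshold. The class VoiceSession, the command constants, and the returned outcome labels are hypothetical.

```python
from typing import Callable, Optional

START_COMMAND = "start_voice_operation"   # hypothetical command identifier
END_COMMAND = "end_voice_operation"       # hypothetical command identifier


class VoiceSession:
    """Tracks the command user's voiceprint for one voice operation."""

    def __init__(self, execute: Callable[[str], None]):
        self._execute = execute                        # controls the internal device
        self._owner_voiceprint: Optional[str] = None   # held only during a session

    @property
    def active(self) -> bool:
        return self._owner_voiceprint is not None

    def handle(self, command: str, voiceprint: str) -> str:
        """Process one recognized command and return an outcome label."""
        if self._owner_voiceprint is None:             # S6: no voiceprint held
            if command == START_COMMAND:               # S7
                self._owner_voiceprint = voiceprint    # S8: hold the voiceprint
                return "started"
            return "ignored"
        if command == END_COMMAND:                     # S9: checked before S10
            self._owner_voiceprint = None              # S12: discard the voiceprint
            return "ended"
        if voiceprint == self._owner_voiceprint:       # S10
            self._execute(command)                     # S11
            return "executed"
        return "rejected"
```

Because the end command is checked in step S9 before the voiceprint comparison in step S10, any speaker can end the voice operation in this sketch, which matches the order of the steps in the flow described above.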

As described above, according to the above embodiment, the user detection unit 31 determines whether a person is detected from the image captured by the camera 13 and, when a person is detected, identifies that person as the command user and identifies the position of the command user's face. When the command user is identified by the user detection unit 31, the voice recognition unit 32 (a) controls the microphone operation unit 16 based on the position of the command user's face to direct the sound collection range of the microphone 15 toward the command user's face, and (b) executes voice recognition processing on the voice collected by the microphone 15 to identify a command corresponding to that voice. The control unit 33 controls the internal device in accordance with the identified command.

This reduces the possibility of erroneously recognizing the voice of a user other than the command user.

Various changes and modifications to the above-described embodiment will be apparent to those skilled in the art. Such changes and modifications may be made without departing from the spirit and scope of the subject matter and without diminishing its intended advantages. That is, such changes and modifications are intended to be covered by the claims.

For example, in the above embodiment, when a person is detected by the human sensor 12, the speaker 14 may output a voice message prompting that person to utter the specific phrase of the voice operation start command (for example, "start voice operation").

In the above embodiment, the voice operation end command is input by voice; in addition, the voice operation may be ended by operating the operation panel 11.

The present invention is applicable to, for example, an image forming apparatus such as a multifunction peripheral.

DESCRIPTION OF SYMBOLS
1 Image forming apparatus
13 Camera
15 Microphone
16 Microphone operation unit
21 Communication device (an example of an internal device)
22 Modem (an example of an internal device)
23 Printing device (an example of an internal device)
24 Image reading device (an example of an internal device)
31 User detection unit
32 Voice recognition unit
33 Control unit

Claims

1. An image forming apparatus comprising:
an internal device;
a camera that captures a predetermined range and outputs a captured image;
a microphone that collects sound;
a microphone operation unit that moves a sound collection range of the microphone;
a user detection unit that determines whether a person is detected from the captured image and, when the person is detected, identifies the person as a command user and identifies the position of the command user's face;
a voice recognition unit that, when the command user is identified by the user detection unit, (a) controls the microphone operation unit based on the position of the command user's face to direct the sound collection range of the microphone toward the command user's face, and (b) executes voice recognition processing on the voice collected by the microphone to identify a command corresponding to the voice collected by the microphone; and
a control unit that controls the internal device in accordance with the identified command.

2. The image forming apparatus according to claim 1, wherein, when a plurality of persons are detected from the captured image, the user detection unit identifies a face region of each of the plurality of persons and identifies the person whose face region has the largest area as the command user.

3. The image forming apparatus according to claim 2, wherein, when a plurality of persons are detected from the captured image by the user detection unit, the voice recognition unit controls the microphone operation unit to move the sound collection range of the microphone so that the command user's face falls within the sound collection range of the microphone and the face of any person other than the command user does not fall within the sound collection range of the microphone.

4. The image forming apparatus according to any one of claims 1 to 3, wherein
the microphone is a super-directional microphone, and
the microphone operation unit mechanically adjusts the orientation of the microphone to move the sound collection range of the microphone.

5. The image forming apparatus according to claim 1, wherein
the voice recognition unit identifies a voiceprint from the voice collected by the microphone and sets the voiceprint of the voice of a voice operation start command as the voiceprint of the command user, and
the control unit (a) until the voice operation ends, controls the internal device in accordance with a command corresponding to a voice whose voiceprint matches the voiceprint of the command user and rejects a command corresponding to a voice whose voiceprint does not match the voiceprint of the command user, and (b) discards the voiceprint of the command user when the voice operation ends.

6. The image forming apparatus according to claim 5, wherein, when the command identified by the voice recognition unit is a voice operation end command, the control unit ends the voice operation and discards the voiceprint of the command user regardless of whether the voiceprint of that command matches the voiceprint of the command user.