JP2007174269A - Image processor, processing method and program

Info

Publication number: JP2007174269A
Application number: JP2005369196A
Authority: JP
Inventors: Shiro Omori; 士郎大森
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2005-12-22
Filing date: 2005-12-22
Publication date: 2007-07-05

Abstract

PROBLEM TO BE SOLVED: To detect and process the background region easily.

SOLUTION: In a picture captured by a videophone, the size of a subject region P is calculated based on the size of a detected face region W on a frame. The subject region P has an inverted-T shape consisting of a region P1 and a region P2 that is wider than the region P1. On the picture, the region P1 is arranged in the upper portion and the region P2 in the lower portion. The region P1 has a width equal to the width x of the face region W multiplied by a predetermined coefficient a1, and a height equal to the height y of the face region W multiplied by a coefficient b1. The region P2 has a width equal to the width x multiplied by a coefficient a2, and a height equal to the height y multiplied by (b2 - b1).

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an image processing apparatus, an image processing method, and a program, and in particular to an image processing apparatus, method, and program that make it easy to identify the portion showing a person in an image that includes the person.

With advances in image processing and communication technology, videophone systems that let users talk while viewing an image of the other party are becoming widespread.

In a videophone system, however, once the connection between the callers' videophone devices is established, the image captured by the camera of each device (an image that includes the caller) is normally displayed on the other party's videophone device regardless of the caller's wishes.

For this reason, methods have been proposed from the viewpoint of privacy protection that, for example, keep the caller's own face from being displayed clearly on the other party's videophone device.

The face region can be identified relatively easily, for example by thresholding the luminance and color signals of the image to binarize them (Patent Document 1).

Methods of concealing the background portion have also been proposed, for cases such as a user who makes a video call from their own room and does not mind being seen but does not want the other party to see the state of the room.

The background portion is identified with high accuracy by, for example, emitting infrared light and receiving the light reflected back from the subject.

Patent Document 1: JP 7-327213 A

However, the background is concealed in order to, for example, keep the other party from seeing the state of the room, so it does not need to be identified precisely. Nevertheless, conventional approaches require large-scale equipment, such as infrared hardware, and the background portion could not be identified easily.

The present invention has been made in view of this situation, and makes it possible to easily identify the user's figure and the remaining background portion in, for example, an image exchanged in a videophone system.

An image processing apparatus according to one aspect of the present invention includes: image input means for inputting an image; detection means for detecting a human face region in the image input by the image input means; first specifying means for specifying, based on the face region detected by the detection means, a subject region covering the person's entire figure in the image input by the image input means; second specifying means for specifying, based on the subject region specified by the first specifying means, a background region other than the person's figure in the image input by the image input means; processing means for performing predetermined image processing on at least one of the subject region and the background region; and output means for outputting the image processed by the processing means.

The second specifying means can set the region other than the subject region as the background region.

The subject region is a region of a predetermined shape, and the first specifying means can determine the size or position of the subject region on the frame based on the size or position of the face region.

The subject region consists of a first region and a second region that is wider than the first region, arranged on the frame with the first region above and the second region below, and the first specifying means can specify the subject region so that the face region is contained in the subject region.

The first specifying means can change the position of the second region relative to the first region based on the position of the face region on the frame.

The first specifying means can change the horizontal position of the second region based on the horizontal position of the face region on the frame.

An image processing method according to one aspect of the present invention includes: an image input step of inputting an image; a detection step of detecting a human face region in the image input in the image input step; a first specifying step of specifying, based on the face region detected in the detection step, a subject region covering the person's entire figure in the image input in the image input step; a second specifying step of specifying, based on the subject region specified in the first specifying step, a background region other than the person's figure in the image input in the image input step; a processing step of performing predetermined image processing on at least one of the subject region and the background region; and an output step of outputting the image processed in the processing step.

A program according to one aspect of the present invention causes a computer to execute processing including: an image input step of inputting an image; a detection step of detecting a human face region in the image input in the image input step; a first specifying step of specifying, based on the face region detected in the detection step, a subject region covering the person's entire figure in the image input in the image input step; a second specifying step of specifying, based on the subject region specified in the first specifying step, a background region other than the person's figure in the image input in the image input step; a processing step of performing predetermined image processing on at least one of the subject region and the background region; and an output step of outputting the image processed in the processing step.

In the image processing apparatus, image processing method, and program according to one aspect of the present invention, an image is input; a human face region is detected in the input image; based on the detected face region, a subject region covering the person's entire figure is specified in the input image; based on the specified subject region, a background region other than the person's figure is specified in the input image; predetermined image processing is performed on at least one of the subject region and the background region; and the processed image is output.

According to the present invention, the user's figure or the remaining background portion can be easily identified in, for example, an image exchanged in a videophone system, and can then be processed.

Embodiments of the present invention are described below. The correspondence between the constituent features of the present invention and the embodiments described in the specification or drawings is illustrated as follows. This description is intended to confirm that embodiments supporting the present invention are described in the specification or drawings. Accordingly, even if an embodiment is described in the specification or drawings but is not listed here as corresponding to a constituent feature of the present invention, that does not mean the embodiment does not correspond to that constituent feature. Conversely, even if an embodiment is listed here as corresponding to a constituent feature, that does not mean the embodiment does not correspond to constituent features other than that one.

An image processing apparatus according to one aspect of the present invention includes: image input means for inputting an image (for example, the transmission processing unit 53 in FIG. 2, which performs the processing of step S1 in FIG. 3); detection means for detecting a human face region in the image input by the image input means (for example, the transmission processing unit 53 in FIG. 2, which performs the processing of step S3 in FIG. 3); first specifying means for specifying, based on the face region detected by the detection means, a subject region covering the person's entire figure in the image input by the image input means (for example, the transmission processing unit 53 in FIG. 2, which performs the processing of step S4 in FIG. 3); second specifying means for specifying, based on the subject region specified by the first specifying means, a background region other than the person's figure in the image input by the image input means (for example, the transmission processing unit 53 in FIG. 2, which performs the processing of step S5 in FIG. 3); processing means for performing predetermined image processing on at least one of the subject region and the background region (for example, the transmission processing unit 53 in FIG. 2, which performs the processing of step S6 in FIG. 3); and output means for outputting the image processed by the processing means (for example, the transmission processing unit 53 in FIG. 2, which performs the processing of steps S7 through S10 in FIG. 3).

The second specifying means can set the region other than the subject region as the background region (for example, the background region Q in FIG. 5).

The subject region is a region of a predetermined shape, and the first specifying means can determine the size or position of the subject region on the frame based on the size or position of the face region (for example, step S4 in FIG. 3).

The subject region consists of a first region and a second region that is wider than the first region, arranged on the frame with the first region above and the second region below, and the first specifying means can specify the subject region so that the face region is contained in the subject region (for example, FIG. 4).

The first specifying means can change the horizontal position of the second region based on the horizontal position of the face region on the frame (for example, FIG. 11).

An image processing method or program according to one aspect of the present invention includes: an image input step of inputting an image (for example, step S1 in FIG. 3); a detection step of detecting a human face region in the image input in the image input step (for example, step S3 in FIG. 3); a first specifying step of specifying, based on the face region detected in the detection step, a subject region covering the person's entire figure in the image input in the image input step (for example, step S4 in FIG. 3); a second specifying step of specifying, based on the subject region specified in the first specifying step, a background region other than the person's figure in the image input in the image input step (for example, step S5 in FIG. 3); a processing step of performing predetermined image processing on at least one of the subject region and the background region (for example, step S6 in FIG. 3); and an output step of outputting the image processed in the processing step (for example, steps S7 through S10 in FIG. 3).

FIG. 1 shows a configuration example of a videophone device 1 to which the present invention is applied. The videophone device 1 has an ordinary videophone function that lets the user talk with a videophone device 3 over a network 2, which includes a telephone network, while viewing the other party's video. In addition, it can conceal, for example, the region (hereinafter, the background region) other than the region of the user's entire figure (hereinafter, the subject region) in the image displayed on the videophone device 3.

The nonvolatile memory 21 through the DSP 25 of the videophone device 1 are interconnected by a bus 32. An input/output interface 33 is connected to the bus 32, and the camera 26 through the communication unit 31 are connected to the input/output interface 33.

For example, when an operation for making a videophone call to a specific party is performed on the remote controller 11 and the resulting signal is received by the light receiving unit 30, the CPU (Central Processing Unit) 24 executes a videophone calling program stored in advance in the nonvolatile memory 21 and controls each unit. The communication unit 31 thereby connects to the videophone device 3 via the network 2.

Once the connection with the videophone device 3 is established, the image and audio signals captured by the camera 26 and the microphone 28 are encoded by the DSP (Digital Signal Processor) 25, the resulting stream is transmitted to the videophone device 3 via the communication unit 31, and the video is displayed on the display 51 of the videophone device 3.

At this time, the videophone device 1 executes processing for concealing the background region, that is, the region other than the subject region, of the video displayed on the display 51 of the videophone device 3 (described later).

The stream transmitted from the videophone device 3 is decoded by the DSP 25. The resulting image signal and graphic data such as a GUI (Graphical User Interface) are transferred to the video memory 23 and thereby displayed on the display 27, and the audio signal is supplied to the speaker 29 and output.

FIG. 2 shows a functional configuration example of the videophone device 1.

For example, when a calling operation on the remote controller 11 invokes an address book function and a specific call partner is selected, the user interface unit 51 accepts the command and outputs a call instruction to the control unit 52.

On receiving the call instruction from the user interface unit 51, the control unit 52 controls the transmission processing unit 53 and the reception processing unit 54 to perform call control with the call partner (the videophone device 3). The transmission processing unit 53 and the reception processing unit 54 thereby exchange call control information with the videophone device 3 via the network 2.

When a call with the videophone device 3 is established by the call control, the digitized image and audio captured by the camera 26 and the microphone 28 are input to the transmission processing unit 53.

The transmission processing unit 53 performs image and audio encoding and channel encoding, such as error correction, on the input image and audio signals, packetizes the result, and transmits it to the videophone device 3 via the network 2.

When generating the image to be transmitted, the transmission processing unit 53 applies, as necessary, predetermined image processing so that the background region, that is, the region other than the subject region showing the user, is displayed in a form the call partner cannot recognize (the background region is concealed).

The transmission processing unit 53 also supplies the digitized image and audio data to the user interface unit 51.

When the image and audio signals are supplied from the transmission processing unit 53, the user interface unit 51 displays the image signal on the display 27 as a monitor of the user's own image and outputs the audio from the speaker 29.

Meanwhile, when packetized image and audio information is received from the videophone device 3, that data is input to the reception processing unit 54.

The reception processing unit 54 performs channel decoding and image and audio decoding on the input data, and supplies the resulting image and audio data to the user interface unit 51.

The user interface unit 51 displays the image data supplied from the reception processing unit 54 on the display 27 as the image of the call partner, and outputs the audio from the speaker 29.

Next, the processing in the transmission processing unit 53 that causes the background region to be displayed in a form the call partner cannot recognize (hereinafter, the concealment processing) is described.

When a videophone system is used at home or in a video conference, the user is usually sitting or standing (cases such as lying down, doing a handstand, or turning away without showing the face are rare). In most cases, therefore, the user's face and torso appear one above the other in the image displayed on the other party's videophone device. In addition, given the relative sizes of a face and a torso, the torso portion of the image can be expected to be larger than the face portion.

The present invention exploits this relationship between the face portion and the torso portion, as they appear on a videophone device in ordinary use, to easily identify the portion showing the user's entire figure (the subject region) and the remaining portion (the background region).

The details of the concealment processing in the present invention are shown in the flowchart of FIG. 3.

In step S1, the transmission processing unit 53 receives the digitized image and audio signals captured by the camera 26 and the microphone 28.

In step S2, the transmission processing unit 53 determines whether to perform the background concealment processing, and if so, proceeds to step S3. Whether the background concealment processing is performed can be set freely by the user, for example by operating the remote controller 11.

In step S3, the transmission processing unit 53 detects, from one frame of image data, a rectangular region W as the region of the user's face, for example as shown in FIG. 4.

Any detection algorithm may be used, including the conventional technique of Patent Document 1, as long as the face portion is detected as a result.
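Because step S3 accepts any detector that yields a rectangular face region W, the following is only a minimal sketch of the threshold-and-binarize idea mentioned in connection with Patent Document 1. The function name `detect_face_region`, the YCbCr input layout, the threshold values, and the bounding-box step are all assumptions made for illustration; none of them is taken from the patent or from the cited document.

```python
def detect_face_region(ycbcr, y_min=40, cb_range=(77, 127), cr_range=(133, 173)):
    """Return a rectangle (left, top, width, height) around skin-like pixels.

    ycbcr is a 2D list of (Y, Cb, Cr) tuples. The thresholds are illustrative
    assumptions, not values from the patent. Returns None if nothing matches.
    """
    rows, cols = [], []
    for r, line in enumerate(ycbcr):
        for c, (lum, cb, cr) in enumerate(line):
            # Binarize: a pixel is "face" only if luminance and both
            # chrominance components pass their thresholds.
            is_face = (
                lum >= y_min
                and cb_range[0] <= cb <= cb_range[1]
                and cr_range[0] <= cr <= cr_range[1]
            )
            if is_face:
                rows.append(r)
                cols.append(c)
    if not rows:
        return None  # no face region detected in this frame
    left, top = min(cols), min(rows)
    # Bounding box of all matching pixels, used here as the face region W.
    return (left, top, max(cols) - left + 1, max(rows) - top + 1)
```

A production detector would normally be far more robust; the only property the later steps rely on is that the result is a rectangle, or nothing when no face is found.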

Next, in step S4, the transmission processing unit 53 specifies the subject region based on the face region detected in step S3.

Specifically, the transmission processing unit 53 first calculates the size of the subject region P based on the size of the face region W on the frame, as shown in FIG. 4. In the example of FIG. 4, the subject region P has the shape of the character 凸 (an inverted T), with a small upper rectangle as region P1 and a large lower rectangle as region P2. In other words, the subject region P consists of region P1 arranged above region P2.

For example, the transmission processing unit 53 sets the width of region P1 to the width x of the face region W multiplied by a predetermined coefficient a1 (a real number greater than 1), and its height to the height y of the face region W multiplied by a coefficient b1 (a real number greater than 1). It sets the width of region P2 to the width x multiplied by a coefficient a2 (a real number greater than a1), and its height to the height y multiplied by (b2 - b1), where b2 is a real number greater than b1.
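The size rule above can be sketched as follows. This is a minimal illustration, assuming pixel units; the names `Size` and `subject_region_sizes` and the default coefficient values are hypothetical, since the text only requires a1 > 1, b1 > 1, a2 > a1, and b2 > b1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    width: float
    height: float


def subject_region_sizes(x, y, a1=1.5, b1=1.5, a2=3.0, b2=4.0):
    """Return the sizes of regions P1 and P2 for a face region of size x by y.

    The default coefficients are illustrative assumptions only.
    """
    if not (1 < a1 < a2 and 1 < b1 < b2):
        raise ValueError("coefficients must satisfy 1 < a1 < a2 and 1 < b1 < b2")
    p1 = Size(width=a1 * x, height=b1 * y)         # upper, narrower rectangle
    p2 = Size(width=a2 * x, height=(b2 - b1) * y)  # lower, wider rectangle
    return p1, p2


p1, p2 = subject_region_sizes(x=40, y=50)
# p1 is 60 x 75 and p2 is 120 x 125, so the total height is b2 * y = 200.
```

Note that the two heights always sum to b2 × y, which is why the text can refer to b2 × y as the height of the whole subject region P.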

Next, the transmission processing unit 53 determines the position on the frame of the subject region P (the 凸-shaped region), whose size was determined as described above, based on the position of the face region W on the frame.

For example, the transmission processing unit 53 sets the horizontal position (the position on the X axis) of the subject region P so that the X-axis center of the subject region P coincides with the X-axis center Cwx of the face region W. It sets the vertical position (the position on the Y axis) so that the upper edge of the subject region P lies above the Y-axis center Cwy of the face region W by 1/c of the overall height b2 × y of the subject region P. Within the subject region P, regions P1 and P2 are positioned so that their X-axis centers coincide.

Here, the coefficients a1, b1, b2, and c are real numbers greater than 1, and the size and position of the subject region P as a whole are determined so that it contains the face region W.

The subject region P with the size and position determined in this way is thus specified.

If the subject region P specified in this way extends beyond the frame, only the part that lies within the frame is used as the subject region P.
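The placement and clipping rules can be sketched as below. This is an illustration under stated assumptions: image coordinates with the origin at the top-left and Y increasing downward, rectangles written as (left, top, right, bottom), and hypothetical names and coefficient values. The example values are chosen so that the face region falls inside P1.

```python
def place_subject_region(face, frame_w, frame_h,
                         a1=1.5, b1=1.5, a2=3.0, b2=4.0, c=4.0):
    """Return regions P1 and P2 as (left, top, right, bottom), clipped to the frame.

    face is (left, top, width, height) of the face region W. The coefficient
    defaults are illustrative assumptions, not values from the patent.
    """
    face_left, face_top, x, y = face
    cwx = face_left + x / 2  # X-axis center of W
    cwy = face_top + y / 2   # Y-axis center of W

    # Upper edge of P sits 1/c of the total height (b2 * y) above Cwy.
    top = cwy - (b2 * y) / c

    # P1 and P2 share the X-axis center Cwx; P2 starts where P1 ends.
    p1 = (cwx - a1 * x / 2, top, cwx + a1 * x / 2, top + b1 * y)
    p2 = (cwx - a2 * x / 2, top + b1 * y, cwx + a2 * x / 2, top + b2 * y)

    def clip(rect):
        left, t, right, bottom = rect
        # Keep only the part of the rectangle that lies within the frame.
        return (max(0, left), max(0, t), min(frame_w, right), min(frame_h, bottom))

    return clip(p1), clip(p2)


p1, p2 = place_subject_region((140, 60, 40, 50), frame_w=320, frame_h=240)
# p1 == (130, 35, 190, 110) and p2 == (100, 110, 220, 235)
```

With a shorter frame, for example `frame_h=200`, the lower edge of P2 is cut off at 200, which corresponds to the rule that only the part inside the frame is used.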

In this way, by exploiting the relationship between the face region and the torso region as they appear on a videophone device in typical use, the position of the subject region P is aligned with the position of the face region W, and region P2 of the subject region P, located below the face region W, is made larger than the face region W. As a result, most of the image of the user's figure can be covered by the subject region P, as shown in FIG. 5.

Next, in step S5, the transmission processing unit 53 specifies the background region.

Specifically, the region other than the subject region P is taken as the background region (the background region Q shown in FIG. 5).
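Step S5 amounts to taking the complement of the subject region within the frame. A minimal sketch, assuming the subject region is given as a list of rectangles (left, top, right, bottom) with exclusive right and bottom edges, and using a hypothetical function name:

```python
def background_mask(frame_w, frame_h, subject_rects):
    """Return a 2D list of booleans that is True where a pixel belongs to Q.

    subject_rects holds the rectangles that make up the subject region P,
    for example [P1, P2]. The rectangle convention is an assumption.
    """
    # Start with every pixel marked as background.
    mask = [[True] * frame_w for _ in range(frame_h)]
    for left, top, right, bottom in subject_rects:
        for row in range(max(0, int(top)), min(frame_h, int(bottom))):
            for col in range(max(0, int(left)), min(frame_w, int(right))):
                mask[row][col] = False  # pixel lies inside the subject region P
    return mask


mask = background_mask(8, 6, [(3, 1, 5, 3), (2, 3, 6, 6)])
```

In this toy 8 × 6 frame, the two rectangles cover 16 pixels, so the remaining 32 pixels form the background region Q.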

Next, in step S6, the transmission processing unit 53 applies predetermined image processing so that the specified background region is displayed in a form the call partner cannot recognize.

Specifically, as shown in FIG. 6, a filter with a blurring effect can be applied so that the background region Q is displayed blurred.

Alternatively, as shown in FIG. 7, a predetermined image prepared in advance (in the example of FIG. 7, an image filled in black) can be displayed as the background region Q.
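Both concealment options can be sketched on a grayscale frame stored as a 2D list. The function name, the `mode` parameter, and the choice of a simple box filter for the blur are assumptions; the text does not say which blurring filter is used.

```python
def conceal_background(image, mask, mode="fill", fill_value=0, radius=1):
    """Return a copy of image with the background pixels concealed.

    image is a 2D list of 8-bit gray levels and mask is True for background
    pixels. mode "fill" paints the background with fill_value (FIG. 7 style);
    mode "blur" replaces each background pixel with a box-filter average
    (one possible blurring filter, FIG. 6 style).
    """
    if mode not in ("fill", "blur"):
        raise ValueError("mode must be 'fill' or 'blur'")
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # subject pixels are left untouched
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            if mode == "fill":
                out[r][c] = fill_value
            else:
                # Average over the neighborhood, truncated at the frame edge.
                rows = range(max(0, r - radius), min(height, r + radius + 1))
                cols = range(max(0, c - radius), min(width, c + radius + 1))
                window = [image[rr][cc] for rr in rows for cc in cols]
                out[r][c] = sum(window) // len(window)
    return out
```

A larger `radius` gives a stronger blur; a radius of 1 is used here only to keep the example small and would rarely hide a room in practice.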

ステップＳ６で、背景領域に対して画像処理が行われたとき、またはステップＳ２で、背景隠蔽処理を行わないと判定されたとき、ステップＳ７に進み、送信処理部５３は、画像音声データ（画像処理が施された画像または画像処理が施されていない画像の画像データ、および音声データ）を、ユーザインタフェース部５１に供給する。 When image processing is performed on the background area in step S6 or when it is determined not to perform background concealment processing in step S2, the process proceeds to step S7, and the transmission processing unit 53 performs image / audio data (image Image data and sound data of an image that has been processed or an image that has not been subjected to image processing are supplied to the user interface unit 51.

次に、ステップＳ８において、送信処理部５３は、画像処理が施されたまたは画像処理が施されてない画像データ、および入力された音声データをエンコードするとともに、ステップＳ９において、誤り訂正などのチャンネルエンコードおよびパケット化処理を行い、ステップＳ１０において、その結果得られたストリームを、テレビ電話装置３に送信する。 Next, in step S8, the transmission processing unit 53 encodes image data that has been subjected to image processing or has not been subjected to image processing, and input audio data, and in step S9, a channel for error correction or the like. Encoding and packetization processing are performed, and the stream obtained as a result is transmitted to the videophone device 3 in step S10.

その後、送信処理部５３は、ステップＳ１に戻りそれ以降の処理を同様に実行する。 Thereafter, the transmission processing unit 53 returns to Step S1 and similarly performs the subsequent processing.

以上のように、被写体領域を、顔領域の大きさおよび位置と、所定の係数との演算により求めるようにしたので、被写体領域および背景領域の特定を簡単に行うことができ、また背景領域の隠蔽を簡単に行うことができる。 As described above, since the subject area is obtained by calculating the size and position of the face area and a predetermined coefficient, it is possible to easily identify the subject area and the background area. Concealment can be performed easily.

In addition, blurring the background region, filling it with a single color, or replacing the subject with a simple image can reduce the amount of data transmitted to the other party.

The above description assumes that a face region is detected in step S3. If no face region is detected, no subject region is specified in step S4 either. In that case, in step S5, for example, the entire frame is treated as the background region, and the processing from step S6 onward is performed.

Next, other examples of the subject region specifying process in step S4 will be described.

In the above, the subject region P has the shape of the character 凸, that is, an inverted T (FIGS. 4 and 5). However, any large region that includes the face region W and extends below it may be used, so shapes such as those shown in FIGS. 8A to 8C can also serve as the subject region P.

FIG. 9 shows an example in which the subject region P has the shape of FIG. 8A. Angling the parts of the subject region P that correspond to the left and right sides of the head and to both shoulders in this way yields a subject region P that fits the human body more closely.

FIG. 10 shows an example in which the subject region P has the shape of FIG. 8C. The subject region P may also be a vertically elongated ellipse.

In the above, the widths of the subject region P (a1 × x and a2 × x) are obtained from the width x of the face region W, and its heights (b1 × y and (b2 - b1) × y) are obtained from the height y of the face region W. The invention is not limited to this: with the area xy of the face region W as the reference, the widths of the subject region P may be a1 × xy and a2 × xy, and the heights b1 × xy and (b2 - b1) × xy. In that case, the constants are changed as appropriate.
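The width-and-height-based calculation restated above can be sketched as follows. The sizes of P1 and P2 follow the text; everything else is an assumption made for illustration: the `Rect` type, the function name `subject_region`, the default coefficient values, and in particular the placement rule (both rectangles share the horizontal center of the face region, P1 is centered vertically on the face, and P2 begins where P1 ends).

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    width: float
    height: float


def subject_region(face: Rect, a1=1.5, b1=1.5, a2=3.0, b2=4.0):
    """Return (P1, P2) for a detected face region W.

    Sizes follow the description: P1 is (a1*x) by (b1*y) and
    P2 is (a2*x) by ((b2 - b1)*y), where x and y are the width and
    height of W. The coefficient defaults are made-up values, and the
    placement of the rectangles is an assumption of this sketch.
    """
    x, y = face.width, face.height
    cx = face.left + x / 2  # horizontal center of the face region
    cy = face.top + y / 2   # vertical center of the face region
    p1 = Rect(cx - a1 * x / 2, cy - b1 * y / 2, a1 * x, b1 * y)
    p2 = Rect(cx - a2 * x / 2, p1.top + p1.height, a2 * x, (b2 - b1) * y)
    return p1, p2


p1, p2 = subject_region(Rect(left=100, top=50, width=40, height=60))
```

For the area-based variant, x and y in the size expressions would both be replaced by the area x * y, with the coefficients rescaled accordingly, as the text notes.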

Furthermore, rather than fixing the subject region solely by the simple geometric relationship to the detected face region W described above, a conventionally known method such as background subtraction may be used as well. Difference information relative to an image captured before the person who is the subject appeared in the frame is combined with the simple method above, so that the subject region is determined with simple arithmetic that does not burden the CPU 24 or the memory 23.

In the examples of FIGS. 4 and 5, the small upper rectangular region P1 and the large lower rectangular region P2 of the 凸 shape are positioned so that their centers in the left-right direction (on the X axis) coincide. However, depending on the position of the face region W on the frame (for example, its position on the X axis), the X-axis centers of the region P1 and the region P2 may be offset from each other, as shown in FIG. 11. In the example of FIG. 11, the X-axis center Cp2x of the region P2 is shifted to the left in the figure relative to the center Cwx of the region P1 (face region W).

For example, as shown in FIG. 12, when the face Ua of the user U is located on the right side in the figure with respect to the imaging direction of the camera 26 and the user turns to the left to look at the camera 26, the right shoulder UR of the user U is turned toward the camera 26 and the left shoulder UL is turned away from it. In the captured image, as shown in FIG. 13, the face region W of the user therefore lies to the left (in the figure) of the center Cc of the frame, and the image region of the right side of the body (left in the figure) is larger than that of the left side of the body (right in the figure).

Exploiting this characteristic, when, for example, the X-axis center Cwx of the face region W is offset by a distance d from the X-axis center Cc of the frame, the X-axis center Cp2x of the region P2 is shifted by k × d, where k is a coefficient, in the direction of that offset. This allows the subject region P to match the user's figure more accurately.
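The offset rule above amounts to Cp2x = Cwx + k × d, with d taken as a signed offset. A minimal sketch follows; the function name `shifted_p2_center` and the value k = 0.5 are assumptions for illustration, and the description does not give a concrete value for k.

```python
def shifted_p2_center(cwx, cc, k=0.5):
    """Return the X-axis center Cp2x of region P2.

    cwx : X-axis center of the face region W
    cc  : X-axis center of the frame
    k   : coefficient (hypothetical default; not specified in the text)
    """
    d = cwx - cc        # signed offset of the face from the frame center
    return cwx + k * d  # P2 is shifted further in the same direction


# Face to the left of the frame center: P2 is shifted further left.
cp2x = shifted_p2_center(cwx=100, cc=160)
```

Using a signed d means a face to the left of the frame center moves P2 further left and a face to the right moves it further right, which is the behavior illustrated in FIGS. 11 and 13. With k = 0 the centers of P1 and P2 coincide, as in FIGS. 4 and 5.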

The above description takes the case of a single person as the subject as an example, but the present invention can also be applied when there are several people, as shown in FIG. 14.

That is, as shown in FIG. 14, when there are three people as subjects, the region that belongs to none of the subject regions Pa to Pc is taken as the background region Q. The background can then be replaced by applying a filter with a blurring effect to the background region Q (FIG. 15), or by simply filling the background region with some color or pasting an image over it (FIG. 16).
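Taking the background region Q as everything outside all of the subject regions can be sketched as follows. For brevity each subject region is approximated here by a list of axis-aligned rectangles given as (left, top, width, height) in integer pixels; this representation and the function name `background_mask` are assumptions of the sketch.

```python
def background_mask(width, height, subject_rects):
    """Return a mask that is True where a pixel is background.

    A pixel is background when it lies outside every rectangle in
    `subject_rects`. Rectangles are (left, top, width, height) tuples
    and are clipped to the frame.
    """
    mask = [[True] * width for _ in range(height)]
    for left, top, w, h in subject_rects:
        for yy in range(max(0, top), min(height, top + h)):
            for xx in range(max(0, left), min(width, left + w)):
                mask[yy][xx] = False  # pixel belongs to some subject region
    return mask


# Two subjects in a 6 x 4 frame; the second rectangle runs past the right edge.
q = background_mask(6, 4, [(0, 0, 2, 2), (4, 1, 5, 2)])
# No subject region at all: the whole frame is background.
q_all = background_mask(6, 4, [])
```

An empty list yields a mask that is True everywhere, which corresponds to treating the entire frame as the background region when no subject region has been specified.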

The above description takes the case where image processing is applied to the background region as an example, but image processing such as blurring may instead be applied to the subject region.

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting that software is installed from a program recording medium onto a computer built into dedicated hardware or onto, for example, a general-purpose personal computer that can execute various functions when various programs are installed.

FIG. 17 is a block diagram showing a configuration example of a personal computer that executes the series of processes described above by means of a program. A CPU (Central Processing Unit) 201 executes various processes according to programs stored in a ROM (Read Only Memory) 202 or a storage unit 208. A RAM (Random Access Memory) 203 stores programs executed by the CPU 201, data, and the like as appropriate. The CPU 201, the ROM 202, and the RAM 203 are interconnected by a bus 204.

As shown in FIG. 17, the program recording medium that stores the program to be installed on the computer and made executable by it comprises removable media 211, which is packaged media such as a magnetic disk (including a flexible disk), an optical disc (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory; the ROM 202, in which the program is stored temporarily or permanently; or a hard disk constituting the storage unit 208. The program is stored on the program recording medium, as necessary, via a communication unit 209, which is an interface such as a router or a modem, using a wired or wireless communication medium such as a local area network, the Internet, or digital satellite broadcasting.

In this specification, the steps describing the program stored on the program recording medium include not only processes performed in time series in the order described, but also processes that are executed in parallel or individually without necessarily being processed in time series.

In this specification, a system refers to an entire apparatus made up of a plurality of devices.

FIG. 1 shows a configuration example of a videophone device 1 to which the present invention is applied.
FIG. 2 shows a functional configuration example of the videophone device 1 of FIG. 1.
FIG. 3 is a flowchart explaining the operation of the transmission processing unit 53 of FIG. 2.
FIG. 4 is a diagram explaining a subject region based on a face region.
FIG. 5 is a diagram showing a display example of the subject region.
FIG. 6 is a diagram showing a display example of the background region.
FIG. 7 is a diagram showing another display example of the background region.
FIG. 8 is a diagram explaining other subject regions based on a face region.
FIG. 9 is a diagram showing another display example of the subject region.
FIG. 10 is a diagram showing another display example of the subject and background regions.
FIG. 11 is a diagram explaining another subject region based on a face region.
FIG. 12 is a diagram explaining the subject region of FIG. 11.
FIG. 13 is a diagram showing an example of the subject region of FIG. 11.
FIG. 14 is a diagram explaining other subject regions based on face regions.
FIG. 15 is a diagram showing another display example of the background region.
FIG. 16 is a diagram showing another display example of the background region.
FIG. 17 is a block diagram showing a configuration example of a computer.

Explanation of symbols

1 videophone device, 51 user interface unit, 52 control unit, 53 transmission processing unit, 54 reception processing unit

Claims

An image processing apparatus comprising:
image input means for inputting an image;
detection means for detecting a face region of a person from the image input by the image input means;
first specifying means for specifying, from the image input by the image input means, a subject region covering the entire figure of the person, based on the face region detected by the detection means;
second specifying means for specifying, from the image input by the image input means, a background region other than the figure of the person, based on the subject region specified by the first specifying means;
processing means for performing predetermined image processing on at least one of the subject region and the background region; and
output means for outputting the image processed by the processing means.

The image processing apparatus according to claim 1, wherein the second specifying means sets the region other than the subject region as the background region.

The image processing apparatus according to claim 1, wherein
the subject region is a region of a predetermined shape, and
the first specifying means determines the size or position of the subject region on a frame based on the size or position of the face region.

The image processing apparatus according to claim 3, wherein
the subject region consists of a first region and a second region that is wider than the first region, the two regions being arranged on the frame with the first region above and the second region below, and
the first specifying means specifies the subject region so that the face region is included in the subject region.

The image processing apparatus according to claim 4, wherein the first specifying means changes the position of the second region relative to the first region based on the position of the face region on the frame.

The image processing apparatus according to claim 5, wherein the first specifying means changes the horizontal position of the second region based on the horizontal position of the face region on the frame.

An image processing method comprising:
an image input step of inputting an image;
a detection step of detecting a face region of a person from the image input in the image input step;
a first specifying step of specifying, from the image input in the image input step, a subject region covering the entire figure of the person, based on the face region detected in the detection step;
a second specifying step of specifying, from the image input in the image input step, a background region other than the figure of the person, based on the subject region specified in the first specifying step;
a processing step of performing predetermined image processing on at least one of the subject region and the background region; and
an output step of outputting the image processed in the processing step.

A program for causing a computer to execute processing comprising:
an image input step of inputting an image;
a detection step of detecting a face region of a person from the image input in the image input step;
a first specifying step of specifying, from the image input in the image input step, a subject region covering the entire figure of the person, based on the face region detected in the detection step;
a second specifying step of specifying, from the image input in the image input step, a background region other than the figure of the person, based on the subject region specified in the first specifying step;
a processing step of performing predetermined image processing on at least one of the subject region and the background region; and
an output step of outputting the image processed in the processing step.