JP2004064102A

JP2004064102A - Virtual video phone and image generating method in virtual video phone

Info

Publication number: JP2004064102A
Application number: JP2002215379A
Authority: JP
Inventors: Noritaka Shimizu; 清水　規貴
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-07-24
Filing date: 2002-07-24
Publication date: 2004-02-26

Abstract

PROBLEM TO BE SOLVED: To provide a virtual video phone capable of realizing the facial expression of a person using an outline image in real time and in a realistic way by producing the outline image from a moving picture of the imaged person.

SOLUTION: The virtual video phone is provided with: an expression analysis means 103 for analyzing the expression of the face from the moving picture of the person; an outline image generation means 109 for generating the outline image of the face of the person from the moving picture of the person and storing the outline image to a storage device 112; an outline image selection means 107 for selecting the concerned outline image among outline images stored in the storage device 112 according to a result of the analysis of expression; and a special effect processing means 108 for applying a special effect to the selected outline image in response to the expression analyzed by the expression analysis means 103.

COPYRIGHT: (C)2004,JPO

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a virtual videophone device that uses a communication infrastructure such as a mobile phone network, a wireless local area network (LAN), or the Internet to convey a sender's emotions and facial expressions in real time to an unspecified number of remote recipients, and to an image generation method in such a virtual videophone device.
[0002]
[Prior art]
One conventional virtual videophone device selects, from animations registered in a database, a mouth shape corresponding to the sender's voice, and thereby displays on the screen of a remote receiver an animation image in which the movement of the mouth is synchronized with the voice.
[0003]
FIG. 2 is a block diagram showing the configuration of a conventional virtual videophone device. In FIG. 2, when the sender's voice, input through a microphone, is fed to a voice analysis means 201, the voice analysis means 201 extracts voice parameters corresponding to the sound pressure level and frequency of the voice and sends them to an image information acquisition means 202. The image information acquisition means 202 selects a corresponding animation image from an animation database 203 based on the voice parameters. As a result, an image of a mouth that moves in accordance with the voice can be output.
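The prior-art flow (voice parameters from sound pressure level and frequency, then a database lookup) can be illustrated with a minimal sketch. It is not the implementation of FIG. 2: the per-frame RMS level in dBFS, the zero-crossing frequency estimate, the level thresholds, and the `ANIMATION_DB` file names are all hypothetical, and for simplicity the selection here uses only the level.

```python
import math

# Hypothetical animation "database": mouth frames from 0 (closed) to 3 (wide open).
ANIMATION_DB = {0: "mouth_closed.png", 1: "mouth_slight.png",
                2: "mouth_half.png", 3: "mouth_wide.png"}


def voice_parameters(samples, sample_rate):
    """Return (level in dBFS, rough frequency in Hz) for one audio frame in [-1, 1]."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
    # Zero crossings per second / 2 approximates the dominant frequency.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    freq_hz = crossings * sample_rate / (2 * len(samples))
    return level_db, freq_hz


def select_animation(level_db, thresholds=(-40.0, -25.0, -12.0)):
    """Louder speech selects a wider mouth frame (level-only selection)."""
    index = sum(1 for t in thresholds if level_db >= t)
    return ANIMATION_DB[index]
```

Because the mouth frame depends only on the audio, the rest of the face never changes, which is the limitation the next paragraph describes.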
[0004]
[Problems to be solved by the invention]
However, because the conventional virtual videophone device selects and outputs an image whose mouth shape changes in accordance with the voice, the receiver can see the sender's mouth movements, but the subtle expressions and emotions that appear on the sender's face cannot be conveyed to the receiver.
[0005]
The present invention solves the above conventional problem, and an object thereof is to provide a virtual videophone device, and an image generation method in a virtual videophone device, capable of giving a sense of realism to the animation image displayed on a screen.
[0006]
[Means for Solving the Problems]
A virtual videophone device according to claim 1 comprises: expression analysis means for analyzing the facial expression of a person from a moving image of the person captured by an imaging device; simplified image creation means for creating a simplified image of the person's face from the moving image of the person captured by the imaging device and storing it in a storage device; simplified image selection means for selecting a corresponding simplified image from the simplified images stored in the storage device in accordance with the expression analysis result obtained from the expression analysis means; and special effect processing means for applying a special effect to the simplified image selected by the simplified image selection means in accordance with the facial expression analyzed by the expression analysis means.
[0007]
An image generation method in a virtual videophone device according to claim 6 comprises: an expression analysis step of analyzing the facial expression of a person from a captured moving image of the person; a simplified image creation step of creating a simplified image of the person's face from the moving image and storing it in a storage device; a simplified image selection step of selecting a simplified image from the storage device based on the expression parameters extracted in the expression analysis step; and a special effect processing step of applying a special effect corresponding to the expression parameters to the simplified image selected in the simplified image selection step.
[0008]
With the above configuration, facial expression information obtained from a moving image of the sender can be extracted in real time and used to create an expressive facial animation image.
[0009]
A virtual videophone device according to claim 2 comprises: expression analysis means for analyzing the facial expression of a person from a moving image of the person captured by an imaging device; simplified image creation means for creating a simplified image of the person's face from the moving image captured by the imaging device and storing it in a storage device; mouth shape estimation means for estimating the person's mouth shape based on voice input from a voice input device; simplified image selection means for selecting a corresponding simplified image from the simplified images stored in the storage device in accordance with the expression analysis result and the mouth shape estimation result obtained from the expression analysis means and the mouth shape estimation means; and special effect processing means for applying a special effect to the simplified image selected by the simplified image selection means in accordance with the facial expression analyzed by the expression analysis means.
[0010]
An image generation method in a virtual videophone device according to claim 7 comprises: an expression analysis step of analyzing the facial expression of a person from a captured moving image of the person; a mouth shape estimation step of estimating the person's mouth shape from the person's voice; a simplified image creation step of creating a simplified image of the person's face from the moving image and storing it in a storage device; a simplified image selection step of selecting a simplified image from the storage device based on the expression parameters and mouth shape parameters extracted in the expression analysis step and the mouth shape estimation step; and a special effect addition step of applying a special effect corresponding to the expression parameters to the simplified image selected in the simplified image selection step.
[0011]
With the above configuration, facial expression information obtained from the sender's moving image and mouth shape information obtained from the sender's voice are extracted in real time and used to create an even more expressive facial animation image.
[0012]
In a virtual videophone device according to claim 3, which is the virtual videophone device according to claim 1 or 2, the expression analysis means comprises: a face region image extraction unit that extracts the person's face region from the captured moving image of the person; a face component determination unit that identifies facial components from the image of the face region extracted by the face region image extraction unit; and an expression determination unit that determines the facial expression from the result of the face component determination by the face component determination unit and converts it into parameters.
[0013]
With the above configuration, only the face information, which is one part of the moving image, is extracted, and the characteristic expression of each part of the face can be parameterized with high accuracy.
[0014]
In a virtual videophone device according to claim 4, which is the virtual videophone device according to claim 1 or 2, the special effect processing means has an image processing function of deforming the simplified image selected by the simplified image selection means, or changing its background, in accordance with the expression parameters extracted by the expression analysis means.
[0015]
With the above configuration, the facial animation can be deformed freely and effectively in accordance with the expression parameters, so that expressions of joy, anger, sorrow, and pleasure can be emphasized or changed.
[0016]
In a virtual videophone device according to claim 5, which is the virtual videophone device according to any one of claims 1 to 4, the simplified image creation means is provided with a three-dimensional position measurement unit that measures the three-dimensional position of each part of the face in the moving image obtained from the imaging device, and a three-dimensional image creation unit that creates a three-dimensional image based on the three-dimensional position information measured by the three-dimensional position measurement unit.
[0017]
With the above configuration, the face in the simplified image can be changed three-dimensionally, so that the sender's expression can be conveyed to the other party in detail even though the system is a virtual videophone system.
[0018]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing a virtual videophone device according to one embodiment of the present invention. The virtual videophone device according to the present embodiment analyzes the expression of a person and applies a special effect to a simplified image selected according to the expression.
[0019]
In FIG. 1, reference numeral 101 denotes an imaging device serving as an image input device; here, a video camera or the like that captures a moving image of a person is used. Reference numeral 102 denotes an image processing means that processes a composite video signal, in which the video signal of the captured moving image is mixed with a synchronization signal, and inputs it to the expression analysis means 103 in the next stage. The person being imaged is the person on the transmitting side.
[0020]
The expression analysis means 103 comprises a face region image extraction unit 104 that identifies and extracts the image of the person's face region from the input moving image; a face component determination unit 105 that identifies, within the identified face region, the individual parts that make up the person's face, such as the eyes, nose, and mouth; and an expression determination unit 106 that extracts expression parameters, such as joy, anger, sorrow, and pleasure, from the deformation, positional relationships, and amounts of change of the identified facial parts. The expression analysis means 103 outputs these expression parameters, which quantify the expression, to the simplified image selection means 107 and the special effect processing means 108.
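The patent does not say how the face region image extraction unit 104 locates the face. One common, simple technique is skin-colour segmentation, sketched below under stated assumptions: frames are lists of rows of `(r, g, b)` tuples, colours are converted to the Cb/Cr chroma components (ITU-R BT.601, full range), and the widely cited skin ranges Cb 77-127 and Cr 133-173 are used. All function names are hypothetical.

```python
def rgb_to_cbcr(r, g, b):
    """Chroma components of an RGB pixel (BT.601, full range)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel, cb_range=(77, 127), cr_range=(133, 173)):
    """True if the pixel's chroma falls inside the assumed skin-colour box."""
    cb, cr = rgb_to_cbcr(*pixel)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def extract_face_region(frame):
    """Bounding box (top, left, bottom, right) of skin-coloured pixels, or None."""
    rows = [y for y, row in enumerate(frame) if any(is_skin(p) for p in row)]
    if not rows:
        return None
    cols = [x for row in frame for x, p in enumerate(row) if is_skin(p)]
    return min(rows), min(cols), max(rows), max(cols)


def crop(frame, box):
    """Cut the face region out of the frame for the later analysis stages."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]
```

A bounding box of all skin pixels is crude (hands or a skin-toned background would enlarge it); a practical system would add connected-component analysis or a trained face detector.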
[0021]
Reference numeral 109 denotes a simplified image creation means that creates, in real time, a simplified animation image of the person's face based on the moving image obtained from the imaging device 101 via the image processing means 102. Reference numeral 112 denotes a simplified image database serving as a storage device that stores these animation images. The simplified image creation means 109 is provided with a three-dimensional position measurement unit 110 that measures the three-dimensional position of each part of the moving image, and a three-dimensional image creation unit 111 that creates a three-dimensional image based on the three-dimensional position information measured by the three-dimensional position measurement unit 110. Not only animations of the person's face but also various other characters, such as animals or other people, can be registered in the simplified image database 112, which makes it possible to create original animation images.
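How the simplified image creation means 109 reduces the face to a simplified image is also left open. As one possible reading, a simplified image could be a line drawing obtained by thresholding the image gradient. The sketch below assumes a grayscale face crop given as lists of 0-255 values; `simplify_face` and its `threshold` parameter are hypothetical.

```python
def simplify_face(gray, threshold=40):
    """Turn a grayscale face crop into a 0/1 line drawing (1 = stroke)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    # Interior pixels only: central differences need both neighbours.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]   # horizontal intensity change
            gy = gray[y + 1][x] - gray[y - 1][x]   # vertical intensity change
            if abs(gx) + abs(gy) >= threshold:     # L1 gradient magnitude
                out[y][x] = 1
    return out
```

Frames produced this way could be appended to the database alongside pre-registered character drawings, which is what lets the selection stage choose between the two.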
[0022]
Reference numeral 113 denotes a sound-collecting microphone serving as a voice input device. Connected to the microphone 113 is a voice processing means 114 that amplifies the voice spoken by the person on the transmitting side, removes unnecessary frequency components and noise, and inputs the result to a mouth shape estimation means 115. The mouth shape estimation means 115 estimates the person's mouth shape based on the input voice signal: by analyzing the input voice, it estimates in real time which sound is being uttered and outputs the result to the simplified image selection means 107.
[0023]
Based on the expression parameters output from the expression analysis means 103 and the mouth shape analysis result, the simplified image selection means 107 selects an image matching the voice and expression from the facial animations created by the simplified image creation means 109 and stored in the storage device 112, or from character animations registered in advance. The special effect processing means 108 comprises a face image deformation processing unit 116 that deforms or emphasizes the selected facial animation in accordance with the expression parameters output from the expression analysis means 103, and a background processing unit 117 that switches the background image; it outputs the composited simplified image with the expression applied.
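The selection described above can be read as a nearest-neighbour lookup. The sketch below assumes that each database entry is tagged with a four-value expression vector (joy, anger, sorrow, pleasure) and a mouth-shape label; this schema, the squared-distance metric, and all names are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Expression = Tuple[float, float, float, float]   # (joy, anger, sorrow, pleasure), assumed 0..1


@dataclass(frozen=True)
class SimplifiedImage:
    name: str              # generated face frame or pre-registered character frame
    expression: Expression
    mouth: str             # mouth-shape label, e.g. "a", "o", "closed"


def select_image(db: Sequence[SimplifiedImage], expression: Expression,
                 mouth: Optional[str] = None) -> SimplifiedImage:
    """Entry with the nearest stored expression, restricted to a matching mouth shape."""
    # mouth=None gives the expression-only variant; an unseen mouth label
    # falls back to searching every entry.
    candidates = [e for e in db if mouth is None or e.mouth == mouth] or list(db)

    def distance(entry: SimplifiedImage) -> float:
        return sum((a - b) ** 2 for a, b in zip(entry.expression, expression))

    return min(candidates, key=distance)
```

Filtering on the mouth shape first and ranking by expression second keeps lip sync exact whenever a matching frame exists.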
[0024]
In the virtual videophone device configured as above, the imaging device 101 (the image input device) captures the sender (caller), who is the person on the transmitting side, and the captured moving image of the person is input to the expression analysis means 103 through the image processing means 102. In the expression analysis means 103, the face region image extraction unit 104 first identifies and extracts the person's face region from the moving image information; the face component determination unit 105 then identifies facial parts such as the eyes, nose, and mouth in the moving image of the extracted face region; and the expression determination unit 106 judges the person's joy, anger, sorrow, and pleasure from the deformation, positional relationships, and amounts of change of these facial parts and outputs expression parameters that quantify these expressions. The expression parameters are input to the simplified image selection means 107 and the special effect processing means 108.
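The step from facial parts to numeric expression parameters could look like the following sketch. It assumes the face component stage already supplies named landmark coordinates (image y grows downward) and that a neutral-face reference of the same sender is available; the landmark names, the displacement-based rules, and `gain` are hypothetical, and only three of the four emotions are covered.

```python
def _corner_lift(lm):
    # Positive when the mouth corners sit above the lip centre line.
    centre_y = (lm["mouth_top"][1] + lm["mouth_bottom"][1]) / 2
    corners_y = (lm["mouth_left"][1] + lm["mouth_right"][1]) / 2
    return centre_y - corners_y


def _brow_height(lm):
    # Average vertical gap between each eyebrow and the eye below it.
    return ((lm["eye_left"][1] - lm["brow_left"][1]) +
            (lm["eye_right"][1] - lm["brow_right"][1])) / 2


def _clamp(v):
    return max(0.0, min(1.0, v))


def expression_parameters(landmarks, neutral, gain=5.0):
    """Scores in [0, 1] from landmark displacement relative to the neutral face."""
    # Normalise by inter-ocular distance so scores do not depend on face size.
    iod = abs(neutral["eye_right"][0] - neutral["eye_left"][0])
    lift = (_corner_lift(landmarks) - _corner_lift(neutral)) / iod
    brow = (_brow_height(landmarks) - _brow_height(neutral)) / iod
    return {"joy": _clamp(gain * lift),       # corners raised
            "sorrow": _clamp(-gain * lift),   # corners lowered
            "anger": _clamp(-gain * brow)}    # brows drawn down
```

Measuring change against the sender's own neutral face mirrors the text's use of the amount of change of each facial part, rather than absolute shapes that vary between people.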
[0025]
Meanwhile, the moving image information output from the image processing means 102 is also input to the simplified image creation means 109. The simplified image creation means 109 creates, in real time, a simplified animation of the person's face, the basic element of the person, from the moving image and registers it in the simplified image database 112 serving as the storage device. Because the simplified image creation means 109 is provided with the three-dimensional position measurement unit 110, three-dimensional positions in the simplified image can be identified, and the three-dimensional image creation unit 111 can create a three-dimensional image based on those positions. In this way, a three-dimensional simplified image is created.
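The patent does not state how the three-dimensional position measurement unit 110 measures depth. One standard option, sketched here purely as an assumption, is triangulation with two calibrated, rectified cameras, using depth Z = f·B/d (focal length f in pixels, baseline B, disparity d) and pinhole back-projection. The function names and the landmark dictionary format are hypothetical.

```python
def triangulate(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """3-D point (metres, left-camera frame) from a rectified stereo match."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("a visible point needs positive disparity")
    z = focal_px * baseline_m / disparity    # depth: Z = f * B / d
    x = (u_left - cx) * z / focal_px         # back-project through the pinhole model
    y = (v - cy) * z / focal_px
    return x, y, z


def measure_face(left_pts, right_pts, **camera):
    """Triangulate every named facial point seen in both images ({name: (u, v)})."""
    return {name: triangulate(left_pts[name][0], right_pts[name][0],
                              left_pts[name][1], **camera)
            for name in left_pts if name in right_pts}
```

The resulting per-point depths are the kind of input a three-dimensional image creation stage would need to render the simplified face with relief.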
[0026]
When the sender speaks into the microphone 113 (the voice input device), the voice signal from the microphone 113 is amplified by the voice processing means 114, filtered to remove noise and the like, and input to the mouth shape estimation means 115. The mouth shape estimation means 115 analyzes the level and frequency of the voice signal and, from the analysis result, estimates the mouth shape and its changes. That is, it determines the shape of the mouth producing the voice, and how that shape changes, from the voice uttered by the person, and inputs mouth shape parameters representing the result to the simplified image selection means 107.
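A minimal sketch of a mouth shape parameter derived from level and frequency follows. It is a crude stand-in, assuming float audio frames in [-1, 1]: the level sets how far the mouth opens, and a zero-crossing frequency estimate chooses between hypothetical "spread" and "round" lip shapes. The thresholds and labels are assumptions; a real system would more likely use formant analysis or phoneme recognition.

```python
import math


def mouth_shape_parameter(samples, sample_rate, silence_db=-45.0,
                          loud_db=-10.0, split_hz=2000.0):
    """Crude mouth-shape estimate for one audio frame."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
    if level_db < silence_db:
        return {"openness": 0.0, "shape": "closed"}   # no speech: mouth closed
    # Louder speech opens the mouth wider, scaled linearly between the two levels.
    openness = min(1.0, (level_db - silence_db) / (loud_db - silence_db))
    # Zero-crossing rate as a rough dominant-frequency estimate.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    freq_hz = crossings * sample_rate / (2 * len(samples))
    # Assumption: high-frequency sounds (e.g. "i", "s") -> spread lips; low -> rounded.
    shape = "spread" if freq_hz >= split_hz else "round"
    return {"openness": openness, "shape": shape}
```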
[0027]
The simplified image selection means 107 then selects and retrieves the corresponding simplified image information from the simplified image information stored in the storage device 112, based on the mouth shape parameters and the expression parameters from the expression analysis means 103, and inputs it to the special effect processing means 108. The special effect processing means 108 applies a special effect to the selected simplified image in accordance with the expression parameters obtained from the expression analysis means 103. For example, the face image deformation processing unit 116 can deform the facial animation of the simplified image, and the background processing unit 117 can switch and composite the background image. The resulting simplified image information, with the expression and special effects applied, is written into an image memory via an image control means (not shown), and the simplified image information is read out from the image memory sequentially and transmitted to the receiving side. The receiving side can thus see a simplified image that expresses the sender's facial expression.
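The special effect and transmission stages can be sketched as follows, under stated assumptions: the selected simplified face is represented as a list of outline control points, the dominant emotion above a hypothetical `threshold` is exaggerated by scaling about the face centre, the background is picked from a hypothetical `BACKGROUNDS` table, and the image memory is modelled as a queue drained in order.

```python
from collections import deque

# Hypothetical background assets keyed by the dominant emotion.
BACKGROUNDS = {"joy": "sunny_sky", "anger": "flames", "sorrow": "rain", "pleasure": "flowers"}


def deform(points, center, sx, sy):
    """Scale outline control points about the face centre."""
    cx, cy = center
    return [(cx + (x - cx) * sx, cy + (y - cy) * sy) for x, y in points]


def apply_special_effect(points, center, expression, threshold=0.5, max_gain=0.3):
    """Exaggerate the dominant emotion and choose a matching background."""
    emotion, strength = max(expression.items(), key=lambda kv: kv[1])
    if strength < threshold:
        # Weak expression: keep the selected face and use a neutral background.
        return {"points": list(points), "background": "plain"}
    gain = 1.0 + max_gain * strength
    if emotion in ("joy", "pleasure"):
        sx, sy = gain, 1.0      # widen the face
    elif emotion == "anger":
        sx, sy = gain, gain     # enlarge the whole face
    else:
        sx, sy = 1.0, gain      # lengthen the face for sorrow
    return {"points": deform(points, center, sx, sy), "background": BACKGROUNDS[emotion]}


def transmit(frame_memory, send):
    """Read processed frames out of the image memory in order and send each one."""
    while frame_memory:
        send(frame_memory.popleft())
```

Creating the image memory as `deque(maxlen=...)` is one design choice: if the link stalls, the oldest frames are dropped instead of delaying the call.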
[0028]
In the present embodiment, the simplified image selection means 107 selects the corresponding simplified image from those stored in the storage device 112 (the simplified image database) based on both the expression parameters from the expression analysis means 103 and the mouth shape parameters derived from the voice information by the mouth shape estimation means 115, and special effect processing is then applied to it. Alternatively, the corresponding simplified image may be selected from those stored in the storage device 112 based on the expression parameters alone; with special effect processing applied, expressions of joy, anger, sorrow, and pleasure in each part of the face, such as the eyes, nose, and cheeks, can still be rendered on the animation image and shown to the receiver.
[0029]
[Effects of the Invention]
As described above, according to the present invention, a virtual videophone device can extract facial expression information from the sender's moving image in real time and use it to create an expressive facial animation image.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a virtual videophone device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration of a conventional virtual videophone device.
[Explanation of symbols]
101 Imaging device
103 Expression analysis means
104 Face region image extraction unit
105 Face component determination unit
106 Expression determination unit
107 Simplified image selection means
108 Special effect processing means
109 Simplified image creation means
110 Three-dimensional position measurement unit
111 Three-dimensional image creation unit
112 Simplified image database (storage device)
113 Microphone (voice input device)
115 Mouth shape estimation means
116 Face image deformation processing unit
117 Background processing unit

Claims

1. A virtual videophone device comprising:
expression analysis means for analyzing the facial expression of a person from a moving image of the person captured by an imaging device;
simplified image creation means for creating a simplified image of the person's face from the moving image of the person captured by the imaging device and storing the simplified image in a storage device;
simplified image selection means for selecting a corresponding simplified image from the simplified images stored in the storage device in accordance with the expression analysis result obtained from the expression analysis means; and
special effect processing means for applying a special effect to the simplified image selected by the simplified image selection means in accordance with the facial expression analyzed by the expression analysis means.

2. A virtual videophone device comprising:
expression analysis means for analyzing the facial expression of a person from a moving image of the person captured by an imaging device;
simplified image creation means for creating a simplified image of the person's face from the moving image of the person captured by the imaging device and storing the simplified image in a storage device;
mouth shape estimation means for estimating the person's mouth shape based on voice input from a voice input device;
simplified image selection means for selecting a corresponding simplified image from the simplified images stored in the storage device in accordance with the expression analysis result and the mouth shape estimation result obtained from the expression analysis means and the mouth shape estimation means; and
special effect processing means for applying a special effect to the simplified image selected by the simplified image selection means in accordance with the facial expression analyzed by the expression analysis means.

3. The virtual videophone device according to claim 1 or 2, wherein the expression analysis means comprises:
a face region image extraction unit that extracts the person's face region from the captured moving image of the person;
a face component determination unit that determines facial components from the image of the face region extracted by the face region image extraction unit; and
an expression determination unit that determines the facial expression from the facial component determination result of the face component determination unit and parameterizes the facial expression.

4. The virtual videophone device according to claim 1 or 2, wherein the special effect processing means has an image processing function of deforming the simplified image selected by the simplified image selection means, or changing the background, in accordance with the expression parameters extracted by the expression analysis means.

5. The virtual videophone device according to any one of claims 1 to 4, wherein the simplified image creation means comprises:
a three-dimensional position measurement unit that measures the three-dimensional position of each part of the face in the moving image obtained from the imaging device; and
a three-dimensional image creation unit that creates a three-dimensional image based on the three-dimensional position information measured by the three-dimensional position measurement unit.

6. An image generation method in a virtual videophone device, comprising:
an expression analysis step of analyzing the facial expression of a person from a captured moving image of the person;
a simplified image creation step of creating a simplified image of the person's face from the moving image and storing the simplified image in a storage device;
a simplified image selection step of selecting a simplified image from the storage device based on the expression parameters extracted in the expression analysis step; and
a special effect processing step of applying a special effect corresponding to the expression parameters to the simplified image selected in the simplified image selection step.

7. An image generation method in a virtual videophone device, comprising:
an expression analysis step of analyzing the facial expression of a person from a captured moving image of the person;
a mouth shape estimation step of estimating the person's mouth shape from the person's voice;
a simplified image creation step of creating a simplified image of the person's face from the moving image and storing the simplified image in a storage device;
a simplified image selection step of selecting a simplified image from the storage device based on the expression parameters and mouth shape parameters extracted in the expression analysis step and the mouth shape estimation step; and
a special effect addition step of applying a special effect corresponding to the expression parameters to the simplified image selected in the simplified image selection step.