WO2024009721A1 - Image processing device, and image processing method - Google Patents

Image processing device, and image processing method Download PDF

Info

Publication number
WO2024009721A1
WO2024009721A1 (PCT/JP2023/022231)
Authority
WO
WIPO (PCT)
Prior art keywords
image
clothing
clothed
avatar
human body
Prior art date
Application number
PCT/JP2023/022231
Other languages
French (fr)
Japanese (ja)
Inventor
倫晶 有定
新吾 堀内
大暉 市原
祥彦 静野
浩之 木村
裕貴 中山
喜貴 千賀
Original Assignee
株式会社Nttデータ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Nttデータ
Publication of WO2024009721A1 publication Critical patent/WO2024009721A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Definitions

  • the present invention relates to an image processing device and an image processing method. More specifically, the present invention dresses a 3D avatar with 3D clothing data created by a user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs the result as a 2D clothed image.
  • GAN Generative Adversarial Network
  • An avatar is a character used as a user's alter ego on a network, and includes two-dimensional (2D) image avatars and three-dimensional (3D) avatars.
  • GAN is a method of generating images by learning using two neural networks. GAN technology has made it possible for computers to generate an unlimited number of similar images as if they were taken of the real thing.
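The GAN mentioned above is background rather than part of the claimed image pipeline; purely as an illustration of "learning using two neural networks", a toy generator/discriminator loop might look like the following (all dimensions and hyperparameters are assumptions, and the data is a synthetic 1-D distribution rather than images):

```python
# Illustrative only: a toy GAN that learns a 1-D Gaussian distribution.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                # noise -> fake sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # sample -> P(real)

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0          # "real" data drawn from N(2, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator: distinguish real samples from generated ones.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to fool the discriminator.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```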
  • By using 3D avatars and GAN technology, efforts are being made in the apparel and advertising industries to utilize non-existent models (virtual models) in sales promotions, marketing, and other business operations.
  • In particular, in the apparel industry, 3D models of clothing are created, and the 3D clothing data is converted into 2D data and composited with the model's image. However, simple compositing can look unnatural because of the front-back relationship between the model's body and the clothing, so corrections have been made manually to make the image look natural.
  • The technique disclosed in Patent Document 1 was known as an image synthesis technique for synthesizing clothing data with a mannequin image. However, even if the technique of Patent Document 1 is used, there is a problem in that the combination of the clothing image and the model image still looks unnatural. For example, when an image of a human body from the neck up and an image of clothing are combined, the lining at the back of the neck is displayed, leaving the combined image unnatural.
  • the present invention was made to solve such problems. An object of the present invention is to provide an image processing device and an image processing method that dress a 3D avatar with 3D clothing data created by a user in a 3DCG environment, convert only the clothing image into 2D data, automatically synthesize the 2D data of the clothing image and the 2D data of the human, and output the result as a 2D clothed image.
  • An image processing device that is one aspect of the present invention includes: means for generating a first 3D clothed avatar, a second 3D clothed avatar, and a synthesis clothing image using setting data associated with a synthesis model image, a 3D avatar, and 3D clothing data; and means for generating a synthesis exposed human body part image in which the front-back relationship between the clothing and the human body has been determined, by dividing the image of the entire exposed human body part of the synthesis model image into regions surrounded by edges based on edge information of a mask image of the exposed human body part associated with the synthesis model image and edge information of the synthesis clothing image, and calculating, for each divided region, the matching rate of the corresponding portion between the second 3D clothed avatar and the image of the entire exposed human body part of the synthesis model image divided into the regions.
  • According to the present invention, by using two types of edge images, 3D image data, and model images, the unnaturalness that conventionally occurs in a composite image is reduced, and manual correction by humans is no longer necessary.
  • FIG. 1 is an overall configuration diagram of an image processing system 1 including an image processing device 10, a user terminal 11, a 3D scanner 12, and an imaging device 13 according to an embodiment of the present invention.
  • FIG. 2 is a diagram illustrating an overview of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to an embodiment of the present invention.
  • FIG. 3 is a system configuration diagram of the image processing device 10 according to an embodiment of the present invention.
  • FIG. 4 is a diagram illustrating an example of the data structure of live-action shooting data 106 according to an embodiment of the present invention.
  • FIG. 5 is a diagram illustrating an example of the data structure of a 3D avatar 108 according to an embodiment of the present invention.
  • FIG. 6 is a diagram illustrating an example of the data structure of a 3D clothed avatar 109 according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating an example of the data structure of a synthesis clothing image 110 according to an embodiment of the present invention.
  • FIG. 8 is a diagram showing a processing flow in which the image processing device 10 generates a 3D clothed avatar and a synthesis clothing image.
  • FIG. 9 is a diagram showing a processing flow in which the image processing device 10 generates a synthesis face/hand image in which the front-back relationship between clothing and the human body has been determined.
  • FIG. 10 is a diagram showing a processing flow in which the image processing device 10 generates a shadow image.
  • FIG. 11 is a diagram showing a processing flow in which the image processing device 10 generates a final clothed image.
  • FIG. 12(a) is a diagram illustrating how edge information is extracted from the face and hand mask images and from the synthesis clothing image, and FIG. 12(b) is a diagram illustrating how the two pieces of edge information are combined.
  • FIG. 13(a) is a diagram showing an example in which the hand part of the synthesis model image is divided into several regions surrounded by edges, and FIG. 13(b) is a diagram showing examples of the generated synthesis face/hand images according to the conventional technique and the present invention.
  • In this specification, the image processing device 10 will be described as one device or system, but the various processes executed by the image processing device 10 may be distributed and executed by multiple devices or systems.
  • the image processing device 10 executes the image composition processing described in this specification. More specifically, the image processing device 10 dresses a 3D avatar with the 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs a 2D clothed image.
  • the user terminal 11 can be any type of device capable of operating in a wired or wireless environment used by the user (e.g., a PC, a tablet terminal, etc.), and is not limited to a specific device.
  • the user terminal 11 can generate 3D clothing data using a third-party application, and can generate 3D avatar data using a 3D scanner 12 or the like.
  • the user terminal 11 transmits 3D clothing data and 3D avatar data to the image processing device 10, transmits various instructions regarding image composition processing via an application provided by the image processing device 10, and performs image processing on the composition results. It can be received from the device 10.
  • the 3D scanner 12 is a device that has a function of generating 3D avatar data in response to instructions from the user terminal 11.
  • the imaging device 13 is a device that takes a live photograph of a real model, and is a device that takes an image of the model using one or more cameras, and can include any studio device.
  • the floor, wall, etc. of the imaging location may be any background such as a blue background or a green background.
  • the image processing device 10 uses images of real models and various setting data to dress a 3D avatar with 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs a 2D clothed image.
  • FIG. 2 is a diagram illustrating an overview of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to the embodiment of the present invention.
  • S1 in FIG. 2 is executed by the imaging device 13
  • S2 is executed by the user terminal 11
  • S3 to S6 are executed by the image processing device 10.
  • a user uses the imaging device 13 to photograph an actual model.
  • a model image of an actual model is used as a model image for synthesis in image synthesis processing that will be described later.
  • the model image for synthesis is 2D image data.
  • the imaging device 13 transmits the synthesis model image, camera setting data (camera angle, distance, etc.) and illumination setting data (brightness, etc.) at the time of imaging to the image processing device 10.
  • the image processing device 10 stores the compositing model image, camera setting data, and illumination setting data received from the imaging device 13 in the live photographing data 106.
  • the user terminal 11 has a third-party application, generates 3D clothing data, and also communicates with the 3D scanner 12 to generate a 3D avatar with the same pose as the real model.
  • the pose of the 3D avatar may be the same as that of the real model at this stage, or it may be set as a basic pose and changed to the same pose during processing in S3, which will be described later.
  • the user terminal 11 transmits the 3D clothing data and 3D avatar to the image processing device 10, and the image processing device 10 stores the received 3D clothing data and 3D avatar in 3D clothing data 107 and 3D avatar 108, respectively.
  • the image processing device 10 provides the user terminal 11 with an application for generating a 3D clothed avatar and a synthetic clothing image. In response to an instruction from the user terminal 11, the image processing device 10 outputs a 2D image of only clothes (synthesis clothing image) and two types of 3D clothed avatars for use in the compositing process.
  • the image processing device 10 reads 3D clothing data from the 3D clothing data 107 and reads 3D avatars from the 3D avatar 108 in response to instructions from the user terminal 11.
  • the image processing device 10 can also read out a model image from the live-action photography data 106 and change the pose of the 3D avatar to the same pose as the pose of the read model image.
  • In response to an instruction from the user terminal 11, the image processing device 10 superimposes the 3D clothing data on the 3D avatar in a 3DCG (computer graphics) space, performs predetermined position calculations to adjust the size and placement position of the 3D clothing data, and places the 3D clothing data at the appropriate position on the 3D avatar. Through this process, the 3D clothing data is put on the 3D avatar.
  • 3DCG computer graphics
  • the image processing device 10 executes a cloth simulation of the 3D clothing data on the 3D avatar wearing the 3D clothing data in the 3DCG space according to the body shape and pose of the 3D avatar, and stores the resulting first 3D clothed avatar in the 3D clothed avatar 109.
  • the image processing device 10 generates a 2D clothing image (also referred to as a "synthesis clothing image") based on the 3D clothing data excluding the 3D avatar, and stores it in the synthesis clothing image 110.
  • the user uses any application to mechanically or manually generate face and hand mask images from the synthesis model image using a method such as binarization. That is, in response to a mask image generation instruction from the user terminal 11, the image processing device 10 generates mask images of the face and hands from the synthesis model image received from the imaging device 13. The image processing device 10 extracts edge information of a face and hand mask image that has been generated in advance using an arbitrary filter.
  • the described embodiment is explained using tops as an example of the type of clothing, so the parts of the human body that are exposed in the clothed state (exposed human body parts) are the face and/or hands. It should be understood that in the case of other types of clothing, the exposed body parts may vary depending on the type of clothing (e.g., feet and/or ankles in the case of bottoms).
  • the image processing device 10 extracts key points of the face and hands from the synthesis model image, and sets a search range using the extracted key points as a bounding box.
  • the image processing device 10 searches for edges while moving from one side of the face or hand to the other side (for example, from the left outer part to the right outer part), and continues searching for edges until it reaches the other side of the face or hand and comes back around.
  • the image processing device 10 extracts an image of the entire face and hand of the synthesis model image from the searched edge range.
  • the image processing device 10 executes depth information extraction processing to extract depth information in the first 3D clothed avatar.
  • the image processing device 10 extracts edge information of the clothing image for synthesis using an arbitrary filter.
  • Because extracting the edges of the synthesis clothing image as they are would turn wrinkles caused by the texture into edge noise, the image processing device 10 can extract the edge information using the depth information.
  • the image processing device 10 combines the edge information of the face and hand mask images with the edge information of the synthesis clothing image, and divides the image of the entire face and hands of the synthesis model image into several regions based on the combined edge information. For each divided region of the entire face and hands of the synthesis model image, the image processing device 10 calculates the matching rate with the corresponding area in the second 3D clothed avatar, and thereby extracts the face and hand images to be finally synthesized (synthesis face/hand images).
  • the image processing device 10 reflects the camera setting data and lighting setting data used when generating the synthesis model image on the first 3D clothed avatar read from the 3D clothed avatar 109, generates the shadow (shade) information produced when rendering is performed as a shadow image, and stores it in the 3D clothed avatar 109.
  • the image processing device 10 executes a first clothing composition process of superimposing a composition clothing image on a composition model image and outputting a first clothed model image.
  • the image processing device 10 executes a second clothed image generation process that generates the final clothed model image by superimposing the synthesis face and hand images on the first clothed model image and further superimposing the shadow image.
  • FIG. 3 is a system configuration diagram of the image processing device 10 according to the embodiment of the present invention.
  • the image processing device 10 may be configured to be placed on a cloud system or on an in-house network.
  • Like a general computer, the image processing device 10 includes a control section 101, a main storage section 102, an auxiliary storage section 103, an interface (IF) section 104, and an output section 105, which are interconnected by a bus 120 or the like.
  • the auxiliary storage unit 103 stores programs that implement each function of the image processing device 10 and data handled by the programs.
  • the auxiliary storage unit 103 includes live-action shooting data 106, 3D clothing data 107, a 3D avatar 108, a 3D clothed avatar 109, and a synthetic clothing image 110 in a file/database format.
  • the image processing device 10 can read or update information stored in the live-action photography data 106, 3D clothing data 107, 3D avatar 108, 3D clothed avatar 109, and synthetic clothing image 110.
  • Each program stored in the auxiliary storage unit 103 is executed by the image processing device 10.
  • the control unit 101, also called a central processing unit (CPU), controls each component of the image processing device 10 and performs data calculations, and reads various programs stored in the auxiliary storage unit 103 into the main storage unit 102 and executes them.
  • the main storage unit 102 is also called main memory, and stores various received data, computer-executable instructions, and data after arithmetic processing using the instructions.
  • the auxiliary storage unit 103 is a storage device such as a hard disk (HDD) or SSD (Solid State Drive), and stores data and programs for a long period of time.
  • HDD hard disk
  • SSD Solid State Drive
  • Although FIG. 3 describes an embodiment in which the control unit 101, the main storage unit 102, and the auxiliary storage unit 103 are provided inside the same computer, as another embodiment, the image processing apparatus 10 can also implement parallel distributed processing by a plurality of computers by using a plurality of main storage units 102 and auxiliary storage units 103.
  • a plurality of servers for the image processing apparatus 10 may be installed, and one auxiliary storage unit 103 may be shared by the plurality of servers.
  • the IF unit 104 serves as an interface for transmitting and receiving data with other systems and devices, and also provides an interface for receiving various commands and input data (various masters, tables, etc.) from the system operator.
  • the output unit 105 provides a display screen for displaying processed data, a printing means for printing the data, and the like.
  • Components similar to the control unit 101, main storage unit 102, auxiliary storage unit 103, IF unit 104, and output unit 105 also exist in the user terminal 11 and the imaging device 13.
  • the live-action shooting data 106 stores a model image (2D image data) of a real model, a mask image of the face and hands of the real model, and camera setting data and lighting setting data at the time of live-action shooting.
  • FIG. 4 is a diagram illustrating an example of the data structure of the live-action photography data 106 according to the embodiment of the present invention.
  • the live-action shooting data 106 can include a live-action shooting ID 401, a model image 402, a mask image 403, camera setting data 404, and lighting setting data 405, but is not limited to these data items and can also include other data items.
  • the live-action shooting ID 401 is an identifier that identifies a model at the time of live-action shooting and data associated with the model.
  • the model image 402 is 2D model image data of a real model, and is also called a "synthesis model image.”
  • the mask image 403 is a mask image of the model's face and hands generated from the synthesis model image.
  • Camera setting data 404 indicates camera setting data at the time of live-action photography, such as camera angle and distance.
  • Lighting setting data 405 indicates lighting setting data, such as brightness, at the time of live-action photography.
  • the 3D clothing data 107 stores 3D clothing data generated by the user.
  • the 3D clothing data may be stored in association with attribute information (e.g., clothing category, color, shape, etc.) to facilitate image selection.
  • the 3D avatar 108 stores 3D avatar data generated by the user.
  • the 3D avatar is created by the user so as to have the same pose as the model image at the time of live-action photography.
  • FIG. 5 is a diagram showing an example of the data structure of the 3D avatar 108 according to the embodiment of the present invention.
  • the 3D avatar 108 can include a 3D avatar ID 501, a 3D avatar 502, and a live-action shooting ID 401, but is not limited to these data items and can also include other data items.
  • the 3D clothed avatar 109 stores image data of a 3D clothed avatar obtained by superimposing 3D clothing data on the 3D avatar and performing predetermined processing.
  • FIG. 6 is a diagram showing an example of the data structure of the 3D clothed avatar 109 according to the embodiment of the present invention.
  • the 3D clothed avatar 109 can include a 3D clothed avatar ID 601, a first 3D clothed avatar 602, a second 3D clothed avatar 603, shadow information 604, a shadow image 605, a 3D avatar ID 501, and a live-action shooting ID 401, but is not limited to these data items and can also include other data items.
  • the 3D clothed avatar ID 601 is an identifier that identifies the 3D clothed avatar generated by the image processing device 10.
  • the first 3D clothed avatar 602 shows image data of a 3D clothed avatar that has undergone 3D cloth simulation.
  • a second 3D clothed avatar 603 represents image data of a 3D clothed avatar that has been subjected to 3D rendering processing by reflecting the camera setting data and lighting setting data on the first 3D clothed avatar.
  • the shadow information 604 and the shadow image 605 respectively indicate the shadow information and the shadow image generated when rendering is executed by reflecting, on the first 3D clothed avatar, the camera setting data and lighting setting data used when generating the synthesis model image.
  • the 3D avatar ID 501 is an identifier for identifying the 3D avatar from which the 3D clothed avatar is generated
  • the live-action shooting ID 401 is an identifier for identifying the live-action shooting associated with the 3D avatar.
  • the 3D avatar ID 501 and live-action shooting ID 401 make it easier to acquire various setting data and the like when shooting a real model.
  • the synthetic clothing image 110 stores 2D clothing data generated based on the 3D clothing data of the second 3D clothed avatar.
  • FIG. 7 is a diagram illustrating an example of a data structure of a clothing image for synthesis 110 according to an embodiment of the present invention.
  • the synthetic clothing image 110 can include a synthetic clothing image ID 701, a synthetic clothing image 702, a live-action photography ID 401, and a 3D clothed avatar ID 601, but is not limited to these data items and can also include other data items.
  • the compositing clothing image ID 701 is an identifier that identifies 2D clothing image data used in the image compositing process according to the embodiment of the present invention.
  • a clothing image for composition 702 indicates 2D clothing image data used for image composition processing.
  • the live-action shooting ID 401 is an identifier that identifies the live-action shooting associated with the 3D clothed avatar from which the synthetic clothing image is generated.
  • the 3D clothed avatar ID 601 is an identifier of the 3D clothed avatar associated with the 3D clothing data that is the original data of the clothing image for synthesis.
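The patent describes these records only as tables of data items; purely as a rough, non-authoritative sketch, the tables above and the ID fields linking them could be represented in code as follows (class names and types are assumptions, field comments follow the reference numerals in the text):

```python
# Illustrative only: one possible in-memory representation of the tables
# described above and the ID fields that link them together.
from dataclasses import dataclass
import numpy as np

@dataclass
class LiveActionShooting:              # live-action shooting data 106
    shooting_id: str                   # live-action shooting ID 401
    model_image: np.ndarray            # model image 402 (2D synthesis model image)
    mask_image: np.ndarray             # mask image 403 (face/hand mask)
    camera_settings: dict              # camera setting data 404 (angle, distance, ...)
    lighting_settings: dict            # lighting setting data 405 (brightness, ...)

@dataclass
class ClothedAvatar:                   # 3D clothed avatar 109
    clothed_avatar_id: str             # 3D clothed avatar ID 601
    first_clothed_avatar: object       # 602: after cloth simulation
    second_clothed_avatar: object      # 603: after 3D rendering
    shadow_info: np.ndarray            # shadow information 604
    shadow_image: np.ndarray           # shadow image 605
    avatar_id: str                     # 3D avatar ID 501 (link to 3D avatar 108)
    shooting_id: str                   # live-action shooting ID 401 (link to 106)

@dataclass
class SynthesisClothingImage:          # synthesis clothing image 110
    clothing_image_id: str             # synthesis clothing image ID 701
    clothing_image: np.ndarray         # synthesis clothing image 702 (2D clothing-only image)
    shooting_id: str                   # live-action shooting ID 401
    clothed_avatar_id: str             # 3D clothed avatar ID 601
```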
  • Next, the processing flow in which the image processing device 10 generates a final clothed model image using a synthesis clothing image (2D), a synthesis model image (2D), various setting data, a 3D avatar, and 3D clothing data will be explained. FIGS. 8 to 11 show the processing contents of S3 to S6 in FIG. 2, respectively. Either of S4 and S5 may be performed first.
  • FIG. 8 shows a processing flow in which the image processing device 10 generates a 3D clothed avatar and a synthesis clothing image using the data generated through the processing described above with reference to S1 and S2 of FIG. 2.
  • the image processing device 10 provides the user terminal 11 with an application for generating a 3D clothed avatar image, and performs the process based on user instructions received via the user terminal 11.
  • the user terminal 11 selects a synthesis model image, 3D clothing data, and 3D avatar to be processed through the provided application, and sends a selection instruction to the image processing device 10.
  • the image processing device 10 reads the selected model image from the live-action photography data 106, reads the selected 3D clothing data from the 3D clothing data 107, and reads the selected 3D avatar from the 3D avatar 108.
  • the image processing device 10 can change the pose of the 3D avatar to match the pose of the read model image. Through this processing, the pose of the model image and the pose of the 3D avatar match, the model image and the 3D avatar are associated, and the selected live-action shooting ID 401 is stored in the 3D avatar 108.
  • the user terminal 11 transmits a placement instruction to the image processing device 10 to place the 3D clothing data at an appropriate position of the 3D avatar on the 3DCG space of the application.
  • In accordance with the placement instruction from the user terminal 11, the image processing device 10 superimposes the 3D clothing data on the 3D avatar in the 3DCG space, performs predetermined position calculations to adjust the size and placement position of the 3D clothing data, and places the 3D clothing data at an appropriate position on the 3D avatar. Through this process, the 3D clothing data is put on the 3D avatar.
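The patent does not specify the position calculation itself; as a minimal illustrative sketch only, one way such an adjustment could be approximated is to fit the clothing mesh's bounding box to the avatar's torso bounding box (the function name and the bounding-box approach are assumptions, not the claimed method):

```python
# Illustrative only: scale and translate clothing vertices onto an avatar's torso
# by matching bounding boxes. Collision handling, draping, etc. are omitted.
import numpy as np

def fit_clothing_to_avatar(clothing_vertices: np.ndarray,
                           torso_vertices: np.ndarray) -> np.ndarray:
    """Both inputs are (N, 3) arrays of 3D vertex positions."""
    c_min, c_max = clothing_vertices.min(0), clothing_vertices.max(0)
    t_min, t_max = torso_vertices.min(0), torso_vertices.max(0)

    # Uniform scale so the clothing roughly spans the torso height (y axis).
    scale = (t_max[1] - t_min[1]) / (c_max[1] - c_min[1])

    # Move the clothing's centre onto the torso's centre.
    c_center = (c_min + c_max) / 2.0
    t_center = (t_min + t_max) / 2.0
    return (clothing_vertices - c_center) * scale + t_center
```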
  • Cloth simulation refers to a technology that physically simulates the movement of cloth such as clothing. For example, physical calculations are performed on the cloth, such as simulating the wrinkles in clothing when a 3D avatar wears it.
  • the image processing device 10 executes a cloth simulation of the 3D clothing data in the 3DCG space on the 3D avatar wearing the 3D clothing data, according to the body shape and pose of the 3D avatar. The cloth-simulated 3D clothed avatar is stored in the first 3D clothed avatar 602 of the 3D clothed avatar 109.
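The patent only states that a cloth simulation is executed; to give a rough idea of the kind of physical calculation involved, here is a minimal mass-spring step using Verlet integration. This is an illustrative sketch, not the simulator actually used, and avatar collision handling is omitted:

```python
# Illustrative only: a minimal mass-spring cloth step (Verlet integration).
import numpy as np

def cloth_step(pos, prev_pos, springs, rest_len,
               dt=0.016, gravity=(0.0, -9.8, 0.0), iters=10):
    """pos, prev_pos: (N, 3) particle positions; springs: (M, 2) index pairs;
    rest_len: (M,) rest lengths of the springs."""
    # Verlet integration under gravity.
    new_pos = pos + (pos - prev_pos) + np.asarray(gravity) * dt * dt

    # Iteratively enforce spring (distance) constraints between particles,
    # which is what produces cloth-like wrinkles and draping.
    for _ in range(iters):
        d = new_pos[springs[:, 1]] - new_pos[springs[:, 0]]
        length = np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
        correction = 0.5 * (length - rest_len[:, None]) * d / length
        np.add.at(new_pos, springs[:, 0], correction)
        np.add.at(new_pos, springs[:, 1], -correction)

    return new_pos, pos  # (new positions, previous positions for the next step)
```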
  • the image processing device 10 reads camera setting data and lighting setting data associated with the model image selected in S801 from the live-action photography data 106.
  • the image processing device 10 reflects the read camera setting data and lighting setting data on the cloth-simulated 3D clothed avatar (first 3D clothed avatar) in the 3DCG space of the application, executes 3D rendering processing using predetermined shader setting parameters, and stores the result as the second 3D clothed avatar in the 3D clothed avatar 109.
  • the image processing device 10 extracts the 3D clothing data by removing the 3D avatar from the 3D clothed avatar (second 3D clothed avatar) that has undergone 3D rendering processing.
  • the image processing device 10 generates a 2D clothing image (herein referred to as a "synthesis clothing image") based on the extracted 3D clothing data, and stores it in the synthesis clothing image 110.
  • FIG. 9 shows a processing flow in which the image processing device 10 uses the face and hand mask images, the synthesis clothing image, and the 3D clothed avatars to generate a synthesis face/hand image in which the front-back relationship between the clothing and the human body has been determined. Note that this processing flow assumes that the image processing device 10 communicates with the user terminal 11 through an arbitrary application to generate the mask images of the face and hands from the synthesis model image.
  • the term "hand" is used to indicate any of the wrist, palm, and fingers of the human body from the shoulder to the fingertips, but these may vary depending on the design of the clothing.
  • the image processing device 10 extracts key points of the face and hands from the synthesis model image associated with the live-action shooting ID 401 to be processed, and sets a search range using the extracted key points as a bounding box.
  • the image processing device 10 searches for edges while moving from one side of the face or hand to the other side (for example, from the left outer part to the right outer part), and continues searching for edges until it reaches the other side of the face or hand and comes back around.
  • the image processing device 10 extracts the entire face and hand image of the synthesis model image from the searched edge range.
  • the image processing device 10 extracts edge information of a face and hand mask image that has been generated in advance using an arbitrary filter.
  • the upper part of FIG. 12(a) shows an image of extracting edge information from mask images of faces and hands.
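As a rough illustration only, the binarization-based mask generation and the extraction of its edge information (S901) could be done with OpenCV as sketched below. The skin-colour threshold and the use of Canny as the "arbitrary filter" are assumptions for the sketch, not the method claimed in the patent:

```python
# Illustrative only: produce a face/hand mask by binarization and extract its edges.
import cv2
import numpy as np

def mask_and_edges(model_image_bgr: np.ndarray,
                   skin_lo=(0, 30, 60), skin_hi=(25, 180, 255)):
    """Returns (mask, edge_map), both uint8 images the same size as the input."""
    hsv = cv2.cvtColor(model_image_bgr, cv2.COLOR_BGR2HSV)

    # Binarization: a crude skin-colour threshold stands in for the mechanical
    # or manual mask generation described in the text.
    mask = cv2.inRange(hsv, np.array(skin_lo), np.array(skin_hi))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

    # Edge information of the mask image (Canny as one example filter).
    edges = cv2.Canny(mask, 50, 150)
    return mask, edges
```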
  • the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109, and performs depth information extraction processing on the read first 3D clothed avatar 602. Through this process, the image processing device 10 can acquire depth information at each position of the first 3D clothed avatar 602. Depth information makes it possible to distinguish between wrinkles and contour lines in clothing.
  • the image processing device 10 extracts edge information of the clothing image for synthesis using an arbitrary filter. If the edge information of the clothing image for synthesis is extracted as is, wrinkles due to the texture of the clothing may become noise, so the image processing device 10 can extract the edge information of the clothing image for synthesis using the depth information acquired in S902.
  • the lower part of FIG. 12(a) shows an image of extracting edge information from a clothing image for synthesis.
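To make the role of the depth information concrete, here is a minimal sketch of extracting clothing edges from a depth map rather than from the textured image, so that printed patterns and fine texture wrinkles do not become edge noise. The normalisation step and Canny thresholds are assumptions, not values from the patent:

```python
# Illustrative only: clothing edge extraction guided by depth information.
import cv2
import numpy as np

def clothing_edges_from_depth(depth: np.ndarray, low=30, high=100) -> np.ndarray:
    """depth: float array (H, W) taken from the first 3D clothed avatar."""
    # Normalise depth to 0..255 so a standard edge filter can be applied.
    d = cv2.normalize(depth.astype(np.float32), None, 0, 255, cv2.NORM_MINMAX)
    d = d.astype(np.uint8)

    # Large depth discontinuities correspond to garment contours and deep folds;
    # shallow texture wrinkles mostly disappear.
    return cv2.Canny(d, low, high)
```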
  • the order of the processing in S901 and the processing in S902 and S903 is not particularly limited. That is, the processing in S902 and S903 may be performed after the processing in S901, or the processing in S901 may be performed after the processing in S902 and S903. Alternatively, both may be processed in parallel.
  • the image processing device 10 combines the edge information of the clothing image for synthesis with the edge information of the face and hand mask images.
  • FIG. 12(b) shows an image in which two pieces of edge information are combined.
  • the image processing device 10 divides the entire face and hand image of the synthesis model image extracted in S901 into regions surrounded by edges, based on the edge information combined in S904.
  • FIG. 13A is an example showing that the hand portion of the synthesis model image is divided into several regions surrounded by edges.
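The patent does not prescribe a particular segmentation method for this division; purely as an illustrative sketch, the two edge maps could be combined and the face/hand area split into edge-bounded regions with connected components, for example:

```python
# Illustrative only: combine the two edge maps (S904) and divide the face/hand
# area into regions surrounded by edges (S905) using connected components.
import cv2
import numpy as np

def divide_into_regions(mask_edges: np.ndarray,
                        clothing_edges: np.ndarray,
                        face_hand_mask: np.ndarray) -> np.ndarray:
    # Combine the edge information of the mask image and of the clothing image.
    combined = cv2.bitwise_or(mask_edges, clothing_edges)
    combined = cv2.dilate(combined, np.ones((3, 3), np.uint8))  # close small gaps

    # Pixels of the face/hand area that are NOT edges fall apart into connected
    # regions, each bounded by edges.
    non_edge = cv2.bitwise_and(face_hand_mask, cv2.bitwise_not(combined))
    num_labels, labels = cv2.connectedComponents(non_edge)
    return labels  # labels[y, x] == region index (0 = background/edges)
```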
  • the image processing device 10 reads the second 3D clothed avatar 603 from the 3D clothed avatar 109, and compares the read second 3D clothed avatar 603 with the image of the entire face and hands of the synthesis model image that has been divided into several regions, comparing the corresponding parts (for example, the thumbs of both left hands) for each divided region.
  • the image processing device 10 calculates the matching rate between the two, and determines that a portion where the matching rate is equal to or higher than a predetermined threshold value (X) is an actually visible portion. Based on the determination results for each region, the image processing device 10 determines each region of the entire face and hand image of the synthesis model image as a visible part or an invisible part, and extracts the face and hand images to be finally synthesized.
  • In this example, since the matching rate for the thumb portion of the left hand was less than the threshold value (X), the image processing device 10 performs processing so as not to include the image of the thumb portion of the left hand in the face and hand images to be finally combined.
  • the threshold (X) can be changed based on the depth information. That is, the image processing apparatus 10 can change the threshold value (X) of each position based on the depth information of each position acquired in S902. Therefore, the value of the threshold (X) can change for each divided region surrounded by edges.
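As a minimal sketch of this visibility decision, the per-region matching rate and a depth-adjusted threshold could be computed as below. The matching measure (agreement of skin silhouettes between the photo and the rendered avatar) and the specific depth adjustment are assumptions used only for illustration:

```python
# Illustrative only: decide per region whether a face/hand part is actually visible.
import numpy as np

def visible_regions(labels: np.ndarray,           # region index per pixel (from the division step)
                    model_body_mask: np.ndarray,  # True where skin is visible in the photo
                    avatar_body_mask: np.ndarray, # True where skin is visible in the rendered avatar
                    depth: np.ndarray,
                    base_threshold: float = 0.8) -> np.ndarray:
    visible = np.zeros_like(labels, dtype=bool)
    for region in range(1, labels.max() + 1):
        idx = labels == region
        if not idx.any():
            continue

        # Matching rate: fraction of the region where photo and render agree
        # that the body part is visible (threshold X in the text).
        match_rate = (model_body_mask[idx] & avatar_body_mask[idx]).mean()

        # The threshold may vary with the depth acquired in S902; here it is
        # simply relaxed a little for regions farther from the camera.
        threshold = base_threshold - 0.1 * (depth[idx].mean() / (depth.max() + 1e-9))

        if match_rate >= threshold:
            visible[idx] = True  # keep this part in the synthesis face/hand image
    return visible
```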
  • FIG. 13(b) shows an example of the generated synthetic face/hand images of the prior art and the hand image of the present invention.
  • In the conventional example, the thumb portion is visible because the conventional general image synthesis process does not perform the above-described matching determination; in the actual pose, however, the thumb is hidden behind the folds of the clothing and cannot be seen, resulting in an unnatural image.
  • In the example of the present invention, the thumb portion is hidden behind the folds of the clothing and cannot be seen.
  • the image processing device 10 determines that the matching rate for this thumb portion is less than the threshold (X), and does not include this thumb portion in the face/hand image for synthesis since it is an invisible portion.
  • FIG. 10 shows a processing flow in which the image processing device 10 generates a shadow image based on the shadow information generated when rendering the cloth-simulated 3D clothed avatar (first 3D clothed avatar) by reflecting the camera setting data and lighting setting data used when generating the synthesis model image.
  • the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109 based on the live-action shooting ID 401 to be processed.
  • the image processing device 10 also queries the live-action shooting data 106 based on the live-action shooting ID 401 and reads out the corresponding camera setting data 404 and illumination setting data 405.
  • the image processing device 10 performs rendering on the read first 3D clothed avatar 602 by reflecting the corresponding camera setting data 404 and lighting setting data 405, calculates whether or not light is shining on each part, and performs shading.
  • In S1003, the image processing device 10 generates a shadow image based on the shadow information that is the result of the shading calculation.
  • the image processing device 10 stores the shadow information and the shadow image as the shadow information 604 and the shadow image 605 of the 3D clothed avatar 109, respectively.
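As a rough illustration of what such a shading calculation can produce, the sketch below turns "how directly does light hit this point?" into a shade image usable as a multiply layer. A real renderer would use the actual camera and lighting settings (404, 405); the Lambertian model and the fixed light direction here are assumptions:

```python
# Illustrative only: a simple Lambertian shading pass producing a shadow (shade) image.
import numpy as np

def shade_image(normals: np.ndarray,             # (H, W, 3) surface normals of the rendered avatar
                light_dir=(0.3, 1.0, 0.5),
                ambient: float = 0.35) -> np.ndarray:
    light = np.asarray(light_dir, dtype=np.float32)
    light = light / np.linalg.norm(light)

    # Lambert term: how directly the light hits each pixel's surface.
    lambert = np.clip((normals * light).sum(axis=2), 0.0, 1.0)

    # Darker where little light arrives; the result can be multiplied onto the
    # composited image in the final step.
    return np.clip(ambient + (1.0 - ambient) * lambert, 0.0, 1.0)
```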
  • FIG. 11 shows a processing flow in which the image processing device 10 generates the final clothed image by executing a first clothing composition process that outputs a first clothed model image by superimposing the synthesis clothing image on the synthesis model image associated with the live-action shooting ID 401 to be processed, and a second clothing composition process that generates the final clothed model image by superimposing the synthesis face/hand image on the first clothed model image output by the first clothing composition process and further superimposing the shadow image.
  • the image processing device 10 executes the first clothing composition process. More specifically, the image processing device 10 reads the model image 402 from the live-action photography data 106 based on the live-action photography ID 401 to be processed, and queries the clothing image for composition 110 using the live-action photography ID 401 to read out the synthesis clothing image 702. The image processing device 10 generates a first clothed model image by superimposing the synthesis clothing image on the synthesis model image.
  • the image processing device 10 executes the second clothing composition process. More specifically, the image processing device 10 generates the final clothed model image by superimposing the synthesis face/hand image and the shadow image on the generated first clothed model image. The image processing device 10 provides the generated final clothed model image to the user terminal 11.
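The layering order described above (clothing over the model image, then the synthesis face/hand image, then the shadow image) can be expressed as straightforward alpha compositing; the sketch below is illustrative only, and the multiply-style use of the shade image is an assumption:

```python
# Illustrative only: the final composition step as simple alpha compositing.
import numpy as np

def over(fg_rgb, fg_alpha, bg_rgb):
    """Alpha-composite a foreground layer over a background (float arrays in 0..1)."""
    a = fg_alpha[..., None]
    return fg_rgb * a + bg_rgb * (1.0 - a)

def final_clothed_image(model_rgb, clothing_rgb, clothing_alpha,
                        face_hand_rgb, face_hand_alpha, shade):
    # First clothing composition: synthesis clothing image over the model image.
    first_clothed = over(clothing_rgb, clothing_alpha, model_rgb)

    # Second composition: put back only the face/hand parts judged visible.
    clothed = over(face_hand_rgb, face_hand_alpha, first_clothed)

    # Finally apply the shadow image so the lighting looks consistent.
    return clothed * shade[..., None]
```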
  • the above-described processing enables the image processing device 10 to perform image synthesis processing while estimating the front-back relationship between a person and clothing more precisely. According to the present invention, problems such as the difficulty of identifying the front-back relationship between a person and clothing, and the low accuracy of output images in which the clothing area that should normally be on the person becomes invisible, can be solved.
  • the principles of the present invention can also be applied to parts of the human body other than the face and hands, depending on the type of clothing, to generate a composite image.
  • the parts of the human body that are exposed when wearing tops and bottoms are different.
  • In the case of tops, the exposed body parts may be the face and/or hands, and in the case of bottoms, the exposed body parts may be the feet and/or ankles.
  • “exposed human body parts” refer to the face, hands, feet, ankles, etc., depending on the type of clothing.
  • the objects are not limited to the human body and clothing.
  • the target objects may be a human body and a vehicle (car, motorcycle, bicycle, etc.).
  • the number of objects may be three or more.
  • the present invention can be implemented as, for example, a system, device, method, program, storage medium, or the like.
  • 1 Image processing system, 10 Image processing device, 11 User terminal, 12 3D scanner, 13 Imaging device, 14 Network, 101 Control unit, 102 Main storage unit, 103 Auxiliary storage unit, 104 Interface (IF) unit, 105 Output unit, 106 Live-action shooting data, 107 3D clothing data, 108 3D avatar, 109 3D clothed avatar, 110 Clothing image for synthesis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention reduces unnaturalness in a synthesized image and obviates the need for manual correction by a person. This image processing device: divides a face and complete hands in a model image for synthesis into regions bounded by edges, on the basis of edge information pertaining to a mask image of a face and hands in the model image for synthesis and edge information pertaining to a clothing image for synthesis; calculates, for each divided region, the rate of consistency between a second 3D clothed avatar and portions that correspond to images of the face and complete hands in the model image for synthesis, the face and hands having been divided into the regions; and generates a face/hand image for synthesis for which a before/after relationship between clothing and a person has already been assessed. The image processing device also: generates, for a first 3D clothed avatar, a shadow image in which rendering is executed to reflect settings data that is associated with the model image for synthesis; outputs a first clothed model image in which the clothing image is superposed on the model image for synthesis; superposes the face/hand image for synthesis on the first clothed model image; and furthermore superposes the shadow image on the resultant image to generate a final clothed model image.

Description

Image processing device and image processing method
 The present invention relates to an image processing device and an image processing method. More specifically, the present invention dresses a 3D avatar with 3D clothing data created by a user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs the result as a 2D clothed image.
 Traditionally, in the apparel and advertising industries, real models have been photographed wearing clothing, and the images have been used for sales promotions and marketing. However, such a method requires a large amount of human resources, money, and time to generate one image, and is not suitable for generating many images.
 For this reason, 3D avatars and a technology called GAN (Generative Adversarial Network) are being used. An avatar is a character used as a user's alter ego on a network, and includes two-dimensional (2D) image avatars and three-dimensional (3D) avatars. GAN is a method of generating images by learning using two neural networks. GAN technology has made it possible for computers to generate an unlimited number of similar images as if they were taken of the real thing.
 By using 3D avatars and GAN technology, efforts are being made in the apparel and advertising industries to utilize non-existent models (virtual models) in sales promotions, marketing, and other business operations. In particular, in the apparel industry, 3D models of clothing are created, and the 3D clothing data is converted into 2D data and composited with the model's image. However, simple compositing can look unnatural because of the front-back relationship between the model's body and the clothing, so corrections have been made manually to make the image look natural.
Japanese Patent Application Publication No. 2011-186774
 Because there is a limit to the number of images that can be created using methods that perform image synthesis by hand, many companies are researching image synthesis methods that automatically dress human images with 3D clothing data.
 The technique disclosed in Patent Document 1 was known as an image synthesis technique for synthesizing clothing data with a mannequin image. However, even if the technique of Patent Document 1 is used, there is a problem in that the combination of the clothing image and the model image still looks unnatural. For example, when an image of a human body from the neck up and an image of clothing are combined, the lining at the back of the neck is displayed, leaving the combined image unnatural.
 In addition, when automatically dressing a 2D image of a person with 3D clothing data, conventional technology frequently suffers from accuracy problems, such as the clothing area that should originally be on the person becoming invisible, and technical issues remained, such as the need for manual correction in the end.
 The present invention was made to solve such problems. An object of the present invention is to provide an image processing device and an image processing method that dress a 3D avatar with 3D clothing data created by a user in a 3DCG environment, convert only the clothing image into 2D data, automatically synthesize the 2D data of the clothing image and the 2D data of the human, and output the result as a 2D clothed image.
 An image processing device that is one aspect of the present invention includes:
 means for generating a first 3D clothed avatar, a second 3D clothed avatar, and a synthesis clothing image using setting data associated with a synthesis model image, a 3D avatar, and 3D clothing data;
 means for generating a synthesis exposed human body part image in which the front-back relationship between the clothing and the human body has been determined, by dividing the image of the entire exposed human body part of the synthesis model image into regions surrounded by edges based on edge information of a mask image of the exposed human body part associated with the synthesis model image and edge information of the synthesis clothing image, and calculating, for each divided region, the matching rate of the corresponding portion between the second 3D clothed avatar and the image of the entire exposed human body part of the synthesis model image divided into the regions;
 means for generating a shadow image obtained when rendering is performed on the first 3D clothed avatar by reflecting the setting data associated with the synthesis model image;
 means for superimposing the synthesis clothing image on the synthesis model image to output a first clothed model image; and
 means for superimposing the synthesis exposed human body part image on the first clothed model image, and further superimposing the shadow image, to generate a final clothed model image.
 According to the present invention, by using two types of edge images, 3D image data, and model images, the unnaturalness that conventionally occurs in a composite image is reduced, and manual correction by humans is no longer necessary.
 A detailed understanding of the embodiments disclosed herein can be obtained from the following description, illustrated in conjunction with the accompanying drawings.
 FIG. 1 is an overall configuration diagram of an image processing system 1 including an image processing device 10, a user terminal 11, a 3D scanner 12, and an imaging device 13 according to an embodiment of the present invention. FIG. 2 is a diagram illustrating an overview of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to an embodiment of the present invention. FIG. 3 is a system configuration diagram of the image processing device 10 according to an embodiment of the present invention. FIG. 4 is a diagram illustrating an example of the data structure of live-action shooting data 106 according to an embodiment of the present invention. FIG. 5 is a diagram illustrating an example of the data structure of a 3D avatar 108 according to an embodiment of the present invention. FIG. 6 is a diagram illustrating an example of the data structure of a 3D clothed avatar 109 according to an embodiment of the present invention. FIG. 7 is a diagram illustrating an example of the data structure of a synthesis clothing image 110 according to an embodiment of the present invention. FIG. 8 is a diagram showing a processing flow in which the image processing device 10 generates a 3D clothed avatar and a synthesis clothing image. FIG. 9 is a diagram showing a processing flow in which the image processing device 10 generates a synthesis face/hand image in which the front-back relationship between clothing and the human body has been determined. FIG. 10 is a diagram showing a processing flow in which the image processing device 10 generates a shadow image. FIG. 11 is a diagram showing a processing flow in which the image processing device 10 generates a final clothed image. FIG. 12(a) is a diagram illustrating how edge information is extracted from the face and hand mask images and from the synthesis clothing image, and FIG. 12(b) is a diagram illustrating how the two pieces of edge information are combined. FIG. 13(a) is a diagram showing an example in which the hand part of the synthesis model image is divided into several regions surrounded by edges, and FIG. 13(b) is a diagram showing examples of the generated synthesis face/hand images according to the conventional technique and the present invention.
 (Overall configuration)
 FIG. 1 is an overall configuration diagram of an image processing system 1 including an image processing device 10, a user terminal 11, a 3D scanner 12, and an imaging device 13 according to an embodiment of the present invention. The image processing device 10 is connected to the user terminal 11 and the imaging device 13 via a network 14 so that they can communicate with each other. The user terminal 11 is connected to the 3D scanner 12 via an arbitrary network such as a LAN or WAN so that they can communicate with each other. Although FIG. 1 shows only one image processing device 10, one user terminal 11, one 3D scanner 12, and one imaging device 13 for simplicity of explanation, there may be a plurality of each of these devices.
 In this specification, the image processing device 10 will be described as one device or system, but the various processes executed by the image processing device 10 may be distributed and executed by multiple devices or systems.
 The image processing device 10 executes the image composition processing described in this specification. More specifically, the image processing device 10 dresses a 3D avatar with the 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs a 2D clothed image.
 The user terminal 11 can be any type of device capable of operating in a wired or wireless environment used by the user (e.g., a PC, a tablet terminal, etc.), and is not limited to a specific device. The user terminal 11 can generate 3D clothing data using a third-party application, and can generate 3D avatar data using the 3D scanner 12 or the like. The user terminal 11 transmits the 3D clothing data and 3D avatar data to the image processing device 10, transmits various instructions regarding image composition processing via an application provided by the image processing device 10, and can receive the composition results from the image processing device 10.
 The 3D scanner 12 is a device that has a function of generating 3D avatar data in response to instructions from the user terminal 11.
 The imaging device 13 is a device that takes live photographs of a real model using one or more cameras, and can include any studio equipment. In order to make it easier to identify the captured image data of the model, the floor, walls, etc. of the imaging location may have any background, such as a blue background or a green background.
 The network 14 is any communication network responsible for communication between the image processing device 10, the user terminal 11, the 3D scanner 12, and the imaging device 13, and includes the Internet, an intranet, a leased line, any network system, etc., and is not particularly limited.
 (Functional configuration of the image processing device 10)
 The image processing device 10 uses images of real models and various setting data to dress a 3D avatar with 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs a 2D clothed image.
 Hereinafter, various functions provided by the image processing device 10 will be explained with reference to FIG. 2. FIG. 2 is a diagram illustrating an overview of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to the embodiment of the present invention. S1 in FIG. 2 is executed by the imaging device 13, S2 is executed by the user terminal 11, and S3 to S6 are executed by the image processing device 10. Note that the embodiments described in this specification are explained using tops (upper garments) as an example of the type of clothing, but it should be understood that the present invention is also applicable to other types of clothing (e.g., bottoms).
 (S1: Processing when the imaging device 13 photographs a real model)
 The user uses the imaging device 13 to photograph a real model. As will be described later, the model image of the real model is used as a synthesis model image in the image synthesis processing described later. The synthesis model image is 2D image data.
 The imaging device 13 transmits the synthesis model image, as well as the camera setting data (camera angle, distance, etc.) and lighting setting data (brightness, etc.) at the time of imaging, to the image processing device 10. The image processing device 10 stores the synthesis model image, camera setting data, and lighting setting data received from the imaging device 13 in the live-action shooting data 106.
 (S2: Processing of 3D clothing data and 3D avatar generation by the user terminal 11)
 The user terminal 11 has a third-party application, generates 3D clothing data, and also communicates with the 3D scanner 12 to generate a 3D avatar with the same pose as the real model. The pose of the 3D avatar may be made the same as that of the real model at this stage, or it may be set as a basic pose and changed to the same pose during the processing in S3, which will be described later.
 The user terminal 11 transmits the 3D clothing data and the 3D avatar to the image processing device 10, and the image processing device 10 stores the received 3D clothing data and 3D avatar in the 3D clothing data 107 and the 3D avatar 108, respectively.
 (S3: Processing for generating 3D clothed avatars and a clothing image for synthesis)
 The image processing device 10 provides the user terminal 11 with an application for generating 3D clothed avatars and a clothing image for synthesis. In response to instructions from the user terminal 11, the image processing device 10 outputs a 2D image of the clothing alone (the clothing image for synthesis) and two kinds of 3D clothed avatars for use in the compositing processing.
 In response to an instruction from the user terminal 11, the image processing device 10 reads the 3D clothing data from the 3D clothing data 107 and the 3D avatar from the 3D avatar 108. In response to an instruction from the user terminal 11, the image processing device 10 can further read a model image from the live-action photography data 106 and change the pose of the 3D avatar to match the pose of the read model image. In response to an instruction from the user terminal 11, the image processing device 10 superimposes the 3D clothing data on the 3D avatar in a 3DCG (computer graphics) space, performs predetermined position calculations to adjust the size and placement of the 3D clothing data, and places the 3D clothing data at the appropriate position on the 3D avatar. Through this processing, the 3D avatar is dressed with the 3D clothing data.
 In response to an instruction from the user terminal 11, the image processing device 10 executes, in the 3DCG space, a cloth simulation of the 3D clothing data on the 3D avatar dressed with the 3D clothing data, in accordance with the body shape and pose of the 3D avatar, and stores the resulting first 3D clothed avatar in the 3D clothed avatar 109.
 The image processing device 10 applies the camera setting data and lighting setting data read from the live-action photography data 106 to the first 3D clothed avatar in the 3DCG space, executes 3D rendering processing using predetermined shader setting parameters, and stores the resulting second 3D clothed avatar in the 3D clothed avatar 109.
 The image processing device 10 generates a 2D clothing image (also referred to as the "clothing image for synthesis") based on the 3D clothing data with the 3D avatar excluded, and stores it in the clothing image for synthesis 110.
 (S4: Processing for generating face and hand images for synthesis)
 The user uses any application to generate mask images of the face and hands from the model image for synthesis, either mechanically by a method such as binarization or manually. That is, in response to a mask image generation instruction from the user terminal 11, the image processing device 10 generates mask images of the face and hands from the model image for synthesis received from the imaging device 13. The image processing device 10 extracts edge information from the previously generated face and hand mask images using any filter. Because the described embodiment uses tops as the example type of clothing, the parts of the human body that remain visible in the clothed state (exposed human body parts) are the face and/or hands; it should be understood that for other types of clothing, the exposed human body parts may differ according to the type (for example, feet and/or ankles in the case of bottoms).
 The image processing device 10 extracts key points of the face and hands from the model image for synthesis and sets a search range using the extracted key points as a bounding box. The image processing device 10 searches for edges while moving from one side of the face or hand toward the other side (for example, from the left contour to the right contour), and continues the edge search until it reaches the other side of the face or hand and turns back. The image processing device 10 extracts the image of the entire face and hands of the model image for synthesis from the range of the searched edges.
 The image processing device 10 executes depth information extraction processing to extract depth information for the first 3D clothed avatar.
 The image processing device 10 extracts edge information from the clothing image for synthesis using any filter. If the edges of the clothing image for synthesis were extracted as they are, wrinkles and other texture features would become edge noise, so the image processing device 10 can extract the edge information using the depth information.
 The image processing device 10 combines the edge information of the face and hand mask images with the edge information of the clothing image for synthesis, and, based on the combined result, divides the entire face and hands of the model image for synthesis into several regions. For each divided region of the face and hands of the model image for synthesis, the image processing device 10 calculates the matching rate with the corresponding region of the second 3D clothed avatar and extracts the face and hand images to be finally composited (the face and hand images for synthesis).
 (S5: Shade image generation processing)
 The image processing device 10 applies the camera setting data and lighting setting data used when the model image for synthesis was captured to the first 3D clothed avatar read from the 3D clothed avatar 109, generates the shade information produced when rendering is executed as a shade image, and stores it in the 3D clothed avatar 109.
 (S6: Clothing composition processing that generates the final clothed image)
 The image processing device 10 executes a first clothing composition process that superimposes the clothing image for synthesis on the model image for synthesis and outputs a first clothed model image. The image processing device 10 then executes a second clothed image generation process that generates the final clothed model image by superimposing the face and hand images for synthesis on the first clothed model image and further superimposing the shade image.
 (System configuration of the image processing device 10)
 Next, the system configuration of the image processing device 10 is described. FIG. 3 is a system configuration diagram of the image processing device 10 according to the embodiment of the present invention. The image processing device 10 may be configured to reside on a cloud system or on an in-house network. As shown in FIG. 3, like a general computer, the image processing device 10 includes a control unit 101, a main storage unit 102, an auxiliary storage unit 103, an interface (IF) unit 104, and an output unit 105, which are interconnected by a bus 120 or the like. The auxiliary storage unit 103 stores programs that implement the functions of the image processing device 10 and the data handled by those programs. The auxiliary storage unit 103 holds, in the form of files or databases, the live-action photography data 106, the 3D clothing data 107, the 3D avatar 108, the 3D clothed avatar 109, and the clothing image for synthesis 110. The image processing device 10 can read and update the information stored in the live-action photography data 106, the 3D clothing data 107, the 3D avatar 108, the 3D clothed avatar 109, and the clothing image for synthesis 110. Each program stored in the auxiliary storage unit 103 is executed by the image processing device 10.
 The control unit 101, also called a central processing unit (CPU), controls the components of the image processing device 10 and performs data operations, and reads the various programs stored in the auxiliary storage unit 103 into the main storage unit 102 and executes them. The main storage unit 102, also called main memory, stores received data, computer-executable instructions, and the data resulting from processing with those instructions. The auxiliary storage unit 103 is a storage device typified by a hard disk drive (HDD) or a solid state drive (SSD), and stores data and programs over the long term.
 Although the embodiment of FIG. 3 describes a configuration in which the control unit 101, the main storage unit 102, and the auxiliary storage unit 103 are provided inside the same computer, as another embodiment the image processing device 10 may be configured to realize parallel distributed processing by a plurality of computers by using a plurality of control units 101, main storage units 102, and auxiliary storage units 103. As yet another embodiment, a plurality of servers for the image processing device 10 may be installed, with the plurality of servers sharing a single auxiliary storage unit 103.
 The IF unit 104 serves as an interface for transmitting and receiving data to and from other systems and devices, and also provides an interface for receiving various commands and input data (various masters, tables, etc.) from the system operator. The output unit 105 provides a display screen for displaying processed data, printing means for printing the data, and the like.
 Components similar to the control unit 101, the main storage unit 102, the auxiliary storage unit 103, the IF unit 104, and the output unit 105 are also present in the user terminal 11 and the imaging device 13.
 The live-action photography data 106 stores the model image (2D image data) of the real model, the mask images of the real model's face and hands, and the camera setting data and lighting setting data at the time of the live-action shoot. FIG. 4 is a diagram showing an example of the data structure of the live-action photography data 106 according to the embodiment of the present invention. The live-action photography data 106 can include a live-action shooting ID 401, a model image 402, a mask image 403, camera setting data 404, and lighting setting data 405, but is not limited to these data items and can also include other data items.
 The live-action shooting ID 401 is an identifier that identifies the model at the time of the live-action shoot and the data associated with that model. The model image 402 is the 2D model image data of the real model, also called the "model image for synthesis." The mask image 403 is the mask image of the model's face and hands generated from the model image for synthesis. The camera setting data 404 indicates the camera settings at the time of the live-action shoot, for example the camera angle and distance. The lighting setting data 405 indicates the lighting settings at the time of the live-action shoot, for example the brightness.
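 Purely by way of illustration of how one record of the live-action photography data 106 might be held in code, a minimal Python sketch is shown below. The class and field names (LiveActionShot, shot_id, and so on) are hypothetical and are not taken from the embodiment, which only specifies the data items 401 to 405.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LiveActionShot:
    """One record of the live-action photography data 106 (illustrative sketch only)."""
    shot_id: str                 # live-action shooting ID 401
    model_image: np.ndarray      # model image 402 (H x W x 3), the model image for synthesis
    mask_image: np.ndarray       # mask image 403 (H x W), face/hand mask
    camera_settings: dict        # camera setting data 404, e.g. {"angle_deg": 5.0, "distance_m": 3.2}
    lighting_settings: dict      # lighting setting data 405, e.g. {"brightness": 0.8}
```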
 Returning to FIG. 3, the 3D clothing data 107 stores the 3D clothing data generated by the user. The 3D clothing data may be stored in association with attribute information (for example, clothing category, color, and shape) to facilitate image selection.
 The 3D avatar 108 stores the data of the 3D avatar generated by the user. The 3D avatar is created by the user so as to take the same pose as the model image at the time of the live-action shoot. FIG. 5 is a diagram showing an example of the data structure of the 3D avatar 108 according to the embodiment of the present invention. The 3D avatar 108 can include a 3D avatar ID 501, a 3D avatar 502, and a live-action shooting ID 401, but is not limited to these data items and can also include other data items.
 The 3D avatar ID 501 is an identifier that identifies the 3D avatar. The 3D avatar 502 is the data of the 3D avatar. The live-action shooting ID 401 is an identifier that identifies the live-action shoot associated with the 3D avatar. Through the live-action shooting ID 401, the pose, mask images, and camera and lighting setting data of the corresponding live-action model are associated with the 3D avatar.
 Returning to FIG. 3, the 3D clothed avatar 109 stores image data of 3D clothed avatars obtained by superimposing the 3D clothing data on the 3D avatar and applying predetermined processing. FIG. 6 is a diagram showing an example of the data structure of the 3D clothed avatar 109 according to the embodiment of the present invention. The 3D clothed avatar 109 can include a 3D clothed avatar ID 601, a first 3D clothed avatar 602, a second 3D clothed avatar 603, shade information 604, a shade image 605, a 3D avatar ID 501, and a live-action shooting ID 401, but is not limited to these data items and can also include other data items.
 The 3D clothed avatar ID 601 is an identifier that identifies a 3D clothed avatar generated by the image processing device 10. The first 3D clothed avatar 602 is the image data of the 3D clothed avatar after the 3D cloth simulation. The second 3D clothed avatar 603 is the image data of the 3D clothed avatar obtained by applying the camera setting data and lighting setting data to the first 3D clothed avatar and performing the 3D rendering processing. The shade information 604 and the shade image 605 are, respectively, the shade information and the shade image generated when rendering is executed on the first 3D clothed avatar with the camera setting data and lighting setting data used when the model image for synthesis was captured. The 3D avatar ID 501 is an identifier specifying the 3D avatar from which the 3D clothed avatar was generated, and the live-action shooting ID 401 is an identifier identifying the live-action shoot associated with that 3D avatar. The 3D avatar ID 501 and the live-action shooting ID 401 make it easy to retrieve the various setting data recorded when the real model was photographed.
 Returning to FIG. 3, the clothing image for synthesis 110 stores 2D clothing data generated based on the 3D clothing data of the second 3D clothed avatar. FIG. 7 is a diagram showing an example of the data structure of the clothing image for synthesis 110 according to the embodiment of the present invention. The clothing image for synthesis 110 can include a clothing image for synthesis ID 701, a clothing image for synthesis 702, a live-action shooting ID 401, and a 3D clothed avatar ID 601, but is not limited to these data items and can also include other data items.
 The clothing image for synthesis ID 701 is an identifier that identifies the 2D clothing image data used in the image synthesis processing according to the embodiment of the present invention. The clothing image for synthesis 702 is the 2D clothing image data used in the image synthesis processing. The live-action shooting ID 401 is an identifier identifying the live-action shoot associated with the 3D clothed avatar from which the clothing image for synthesis was generated. The 3D clothed avatar ID 601 is the identifier of the 3D clothed avatar associated with the 3D clothing data that is the source of the clothing image for synthesis.
 (Description of the various processing flows)
 With reference to FIGS. 8 to 11, the processing flows by which the image processing device 10 generates the final clothed model image using the clothing image for synthesis (2D), the model image for synthesis (2D), the various setting data, the 3D avatar, and the 3D clothing data are described. FIGS. 8 to 11 show the processing of S3 to S6 in FIG. 2, respectively. Either of S4 and S5 may be performed first.
 (S3: Processing for generating 3D clothed avatars and a clothing image for synthesis)
 FIG. 8 shows the processing flow in which the image processing device 10 generates the 3D clothed avatars and the clothing image for synthesis using the data generated by the processing described above with reference to S1 and S2 of FIG. 2, namely the model image for synthesis, the various setting data, the 3D avatar, and the 3D clothing data.
 In the processing of FIG. 8, the image processing device 10 provides the user terminal 11 with an application for generating the 3D clothed avatar images and performs the processing based on user instructions received via the user terminal 11.
 In S801, the user terminal 11 selects, through the provided application, the model image for synthesis, the 3D clothing data, and the 3D avatar to be processed, and transmits a selection instruction to the image processing device 10. In response to the selection instruction from the user terminal 11, the image processing device 10 reads the selected model image from the live-action photography data 106, the 3D clothing data from the 3D clothing data 107, and the selected 3D avatar from the 3D avatar 108. The image processing device 10 can change the pose of the 3D avatar so as to match the pose of the read model image. Through this processing, the pose of the model image and the pose of the 3D avatar match, the model image and the 3D avatar are associated with each other, and the selected live-action shooting ID 401 is stored in the 3D avatar 108.
 The user terminal 11 transmits to the image processing device 10 a placement instruction for placing the 3D clothing data at the appropriate position on the 3D avatar in the application's 3DCG space. In response to the placement instruction from the user terminal 11, the image processing device 10 superimposes the 3D clothing data on the 3D avatar in the 3DCG space, performs predetermined position calculations to adjust the size and placement of the 3D clothing data, and places the 3D clothing data at the appropriate position on the 3D avatar. Through this processing, the 3D avatar is dressed with the 3D clothing data.
 In S802, the user terminal 11 transmits to the image processing device 10 a simulation instruction for executing a cloth simulation on the 3D avatar dressed with the 3D clothing data. Cloth simulation refers to a technique that physically simulates the movement of cloth such as clothing. For example, physics calculations are performed on the cloth, such as simulating the wrinkles that form in the clothing when the 3D avatar wears it.
 In response to the simulation instruction from the user terminal 11, the image processing device 10 executes, in the 3DCG space, a cloth simulation of the 3D clothing data on the 3D avatar dressed with the 3D clothing data, in accordance with the body shape and pose of the 3D avatar, and stores the cloth-simulated 3D clothed avatar in the first 3D clothed avatar 602 of the 3D clothed avatar 109.
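 The embodiment does not specify a particular cloth solver; purely as an illustration of the kind of physics calculation involved, the sketch below advances a toy mass-spring cloth model by one explicit time step. All function names, constants, and the choice of integrator are assumptions made for illustration, not the disclosed simulator.

```python
import numpy as np

def cloth_step(positions, velocities, springs, rest_lengths,
               dt=0.016, stiffness=500.0, damping=0.98, gravity=(0.0, -9.8, 0.0)):
    """One explicit integration step of a toy mass-spring cloth (illustrative sketch only).

    positions, velocities: (N, 3) arrays of particle states.
    springs: (M, 2) array of particle index pairs; rest_lengths: (M,) array.
    """
    forces = np.tile(np.asarray(gravity, dtype=np.float64), (positions.shape[0], 1))
    i, j = springs[:, 0], springs[:, 1]
    delta = positions[j] - positions[i]                      # spring vectors
    length = np.linalg.norm(delta, axis=1, keepdims=True) + 1e-9
    # Hooke's law along each spring, applied with opposite signs to the two endpoints.
    f = stiffness * (length - rest_lengths[:, None]) * (delta / length)
    np.add.at(forces, i, f)
    np.add.at(forces, j, -f)
    velocities = (velocities + dt * forces) * damping         # unit particle mass assumed
    positions = positions + dt * velocities
    return positions, velocities
```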
 In S803, the image processing device 10 reads from the live-action photography data 106 the camera setting data and lighting setting data associated with the model image selected in S801. In the application's 3DCG space, the image processing device 10 applies the read camera setting data and lighting setting data to the cloth-simulated 3D clothed avatar (the first 3D clothed avatar), executes 3D rendering processing using predetermined shader setting parameters, and stores the rendered 3D clothed avatar (the second 3D clothed avatar) in the second 3D clothed avatar 603 of the 3D clothed avatar 109.
 In S804, the image processing device 10 extracts the 3D clothing data from the rendered 3D clothed avatar (the second 3D clothed avatar) by excluding the 3D avatar. The image processing device 10 generates a 2D clothing image (referred to herein as the "clothing image for synthesis") based on the extracted 3D clothing data and stores it in the clothing image for synthesis 110.
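 The embodiment leaves the exact mechanism of isolating the clothing open; one simple reading, sketched below under that assumption, is to keep the rendered pixels only where clothing geometry was rasterized and make everything else transparent, so that the result can later be laid over the model image. The function name and the idea of a precomputed clothing mask are illustrative, not part of the disclosure.

```python
import numpy as np

def clothing_only_rgba(rendered_rgb, clothing_mask):
    """Build a 2D clothing image for synthesis from a rendered frame (illustrative sketch only).

    rendered_rgb: (H, W, 3) uint8 render of the second 3D clothed avatar.
    clothing_mask: (H, W) bool array, True where a clothing surface was rasterized.
    Returns an (H, W, 4) RGBA image whose non-clothing pixels are fully transparent.
    """
    h, w, _ = rendered_rgb.shape
    rgba = np.zeros((h, w, 4), dtype=np.uint8)
    rgba[..., :3] = rendered_rgb
    rgba[..., 3] = np.where(clothing_mask, 255, 0).astype(np.uint8)
    return rgba
```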
 (S4: Processing for generating face and hand images for synthesis)
 FIG. 9 shows the processing flow in which the image processing device 10 uses the face and hand mask images, the clothing image for synthesis, and the 3D clothed avatars to generate face and hand images for synthesis in which the front-to-back relationship between the clothing and the human body has already been determined. As a premise of this processing flow, the image processing device 10 is assumed to have communicated with the user terminal 11 through any application and generated the face and hand mask images from the model image for synthesis. In this specification, the term "hand" is used to denote any of the portion of the human body from the shoulder to the fingertips, the wrist, the palm, and the fingers, and what it covers may vary depending on the design of the clothing.
 In S901, the image processing device 10 extracts key points of the face and hands from the model image for synthesis associated with the live-action shooting ID 401 to be processed, and sets a search range using the extracted key points as a bounding box. The image processing device 10 searches for edges while moving from one side of the face or hand toward the other side (for example, from the left contour to the right contour), and continues the edge search until it reaches the other side of the face or hand and turns back. The image processing device 10 extracts the image of the entire face and hands of the model image for synthesis from the range of the searched edges.
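 A minimal sketch of this step is given below, assuming the key points are already available as pixel coordinates. Scanning each row of the bounding box from one side to the other and keeping everything between the outermost edge pixels is one possible reading of the described search; the helper name and the edge filter thresholds are assumptions.

```python
import numpy as np
import cv2

def extract_part_region(image_gray, keypoints):
    """Extract the whole face/hand region inside a keypoint bounding box (illustrative sketch only).

    image_gray: (H, W) uint8 model image for synthesis, grayscale.
    keypoints: (K, 2) array of (x, y) key points of the face or hand.
    Returns a boolean (H, W) mask of the extracted region.
    """
    x0, y0 = keypoints.min(axis=0).astype(int)     # bounding box from the key points
    x1, y1 = keypoints.max(axis=0).astype(int)
    edges = cv2.Canny(image_gray, 50, 150)         # any edge filter may be used
    region = np.zeros(image_gray.shape, dtype=bool)
    for y in range(y0, y1 + 1):
        xs = np.flatnonzero(edges[y, x0:x1 + 1])   # edge pixels along this row
        if xs.size >= 2:
            # keep the span between the outermost edges (one side to the other and back)
            region[y, x0 + xs[0]: x0 + xs[-1] + 1] = True
    return region
```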
 The image processing device 10 extracts edge information from the previously generated face and hand mask images using any filter. The upper part of FIG. 12(a) illustrates extracting edge information from the face and hand mask images.
 In S902, the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109 and executes depth information extraction processing on the read first 3D clothed avatar 602. Through this processing, the image processing device 10 can acquire depth information for each position of the first 3D clothed avatar 602. The depth information makes it possible to distinguish wrinkles from contour lines in the clothing.
 In S903, the image processing device 10 extracts edge information from the clothing image for synthesis using any filter. If the edge information of the clothing image for synthesis were extracted as is, wrinkles caused by the clothing texture could become noise, so the image processing device 10 can extract the edge information of the clothing image for synthesis using the depth information acquired in S902. The lower part of FIG. 12(a) illustrates extracting edge information from the clothing image for synthesis.
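 As a sketch of why the depth information suppresses wrinkle noise, the example below runs the same kind of edge filter on a normalized depth map instead of the textured clothing image: shallow texture wrinkles barely change the depth, while silhouette and overlap boundaries do. The normalization and thresholds are illustrative assumptions.

```python
import numpy as np
import cv2

def clothing_edges_from_depth(depth_map, low=30, high=100):
    """Edge information of the clothing image for synthesis via depth (illustrative sketch only).

    depth_map: (H, W) float array of per-pixel depth of the first 3D clothed avatar.
    Returns a uint8 edge map in which texture wrinkles are largely suppressed.
    """
    d = depth_map.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-9)   # normalize to [0, 1]
    depth_u8 = (d * 255).astype(np.uint8)
    return cv2.Canny(depth_u8, low, high)            # contour/overlap edges remain
```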
 Note that the order of the processing of S901 and the processing of S902 and S903 is not particularly limited; either may be performed first. That is, the processing of S902 and S903 may follow the processing of S901, or the processing of S901 may follow the processing of S902 and S903. Alternatively, the two may be processed in parallel.
 In S904, the image processing device 10 combines the edge information of the clothing image for synthesis with the edge information of the face and hand mask images. FIG. 12(b) shows an image in which the two sets of edge information are combined.
 In S905, based on the edge information combined in S904, the image processing device 10 divides the image of the entire face and hands of the model image for synthesis extracted in S901 into regions bounded by edges. FIG. 13(a) is an example showing the hand portion of the model image for synthesis divided into several regions bounded by edges.
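 One way to realize S904 and S905, sketched below as an assumption rather than the disclosed implementation, is to take the union of the two edge maps and then label the connected components of the non-edge pixels inside the extracted face/hand region.

```python
import numpy as np
import cv2

def split_into_edge_bounded_regions(mask_edges, clothing_edges, part_region):
    """Divide the face/hand region into regions bounded by edges (illustrative sketch only).

    mask_edges, clothing_edges: (H, W) uint8 edge maps (outputs of S901/S903).
    part_region: (H, W) bool mask of the whole face/hand image from S901.
    Returns an (H, W) int32 label map; label 0 marks edges or background.
    """
    combined = cv2.bitwise_or(mask_edges, clothing_edges)          # S904: merge the edge information
    fillable = (combined == 0) & part_region                        # pixels not lying on any edge
    num_labels, labels = cv2.connectedComponents(fillable.astype(np.uint8))
    return labels                                                    # S905: edge-bounded regions
```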
 In S906, the image processing device 10 reads the second 3D clothed avatar 603 from the 3D clothed avatar 109 and compares, region by region, the corresponding portions of the read second 3D clothed avatar 603 and the image of the entire face and hands of the model image for synthesis that has been divided into regions (for example, the thumbs of the two left hands). The image processing device 10 calculates the matching rate between the two and determines that a portion whose matching rate is equal to or greater than a predetermined threshold (X) is a portion that is actually visible. Based on the determination result for each region, the image processing device 10 treats each region of the image of the entire face and hands of the model image for synthesis as either a visible portion or an invisible portion, and extracts the face and hand images to be finally composited. In the example of FIG. 13(b), the thumb portion of the left hand had a matching rate below the threshold (X), so the image processing device 10 does not include the image of this thumb portion in the face and hand images to be finally composited.
 The threshold (X) can be varied based on the depth information. For this reason, the image processing device 10 can change the threshold (X) for each position based on the depth information for each position acquired in S902. Accordingly, the value of the threshold (X) can differ for each edge-bounded region.
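 The embodiment does not specify how the matching rate is computed; the sketch below assumes a simple pixel-overlap ratio between each region of the model image and the corresponding body-part pixels of the second 3D clothed avatar, with a threshold (X) that is shifted per region according to its mean depth. The overlap measure, the threshold formula, and all parameter names are assumptions made for illustration.

```python
import numpy as np

def visible_region_labels(labels, avatar_skin_mask, depth_map,
                          base_threshold=0.6, depth_gain=0.2):
    """Decide per region whether the body part is actually visible (illustrative sketch only).

    labels: (H, W) int label map of edge-bounded regions (0 = edges/background).
    avatar_skin_mask: (H, W) bool mask of body-part pixels in the second 3D clothed avatar.
    depth_map: (H, W) float depth of the first 3D clothed avatar, normalized to [0, 1].
    Returns the set of region labels judged visible (matching rate >= threshold X).
    """
    visible = set()
    for label in range(1, labels.max() + 1):
        region = labels == label
        area = region.sum()
        if area == 0:
            continue
        match_rate = (region & avatar_skin_mask).sum() / area      # overlap with the avatar's exposed part
        threshold_x = base_threshold + depth_gain * depth_map[region].mean()
        if match_rate >= threshold_x:                               # region is visible over the clothing
            visible.add(label)
    return visible
```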
 In S907, the image processing device 10 generates the face and hand images for synthesis based on the face and hand images to be finally composited that were extracted for each region in S906. FIG. 13(b) shows examples of the hand image produced by a conventional technique and by the present invention. As shown in FIG. 13(b), a conventional general image synthesis process does not perform the matching determination described above, so the thumb portion remains visible. In the actual pose, the thumb is hidden behind a fold of the clothing and should not be visible, so the result is an unnatural image. In contrast, when the matching determination according to the present invention is performed, the thumb portion is hidden behind the fold of the clothing and is not visible. The image processing device 10 determines that the matching rate for this thumb portion is below the threshold (X), judges it to be an invisible portion, and does not include it in the face and hand images for synthesis.
 (S5: Shade image generation processing)
 FIG. 10 shows the processing flow in which the image processing device 10 generates a shade image based on the shade information produced when rendering is executed on the cloth-simulated 3D clothed avatar (the first 3D clothed avatar) while reflecting the camera setting data and lighting setting data used when the model image for synthesis was captured.
 In S1001, the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109 based on the live-action shooting ID 401 to be processed. The image processing device 10 also queries the live-action photography data 106 based on that live-action shooting ID 401 and reads the corresponding camera setting data 404 and lighting setting data 405.
 In S1002, the image processing device 10 executes rendering on the read first 3D clothed avatar 602 while reflecting the corresponding camera setting data 404 and lighting setting data 405, and performs shading by calculating which surfaces are and are not lit.
 In S1003, the image processing device 10 generates a shade image based on the shade information resulting from the shading calculation. The image processing device 10 stores the shade information and the shade image in the shade information 604 and the shade image 605 of the 3D clothed avatar 109, respectively.
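 Purely as an illustration of the shading calculation in S1002 and S1003, the sketch below computes Lambertian shading from per-pixel surface normals and a light direction taken from the lighting settings, and keeps the darkened side as a shade image. The actual renderer and shader setting parameters of the embodiment are not specified, so everything here is an assumption.

```python
import numpy as np

def make_shade_image(normal_map, light_dir, brightness=1.0):
    """Generate a shade image from per-pixel normals (illustrative sketch only).

    normal_map: (H, W, 3) float array of unit surface normals of the first 3D clothed avatar.
    light_dir: length-3 light direction derived from the lighting setting data 405.
    Returns shade, an (H, W) float array in [0, 1]; larger values mean deeper shade.
    """
    l = np.asarray(light_dir, dtype=np.float32)
    l = l / (np.linalg.norm(l) + 1e-9)
    lit = np.clip(normal_map @ l, 0.0, 1.0) * brightness   # Lambertian term: how strongly each pixel is lit
    shade = 1.0 - np.clip(lit, 0.0, 1.0)                    # shade information = lack of light
    return shade
```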
 (S6: Clothing composition processing that generates the final clothed image)
 FIG. 11 shows the processing flow in which the image processing device 10 generates the final clothed image by executing a first clothing composition process that superimposes the clothing image for synthesis on the model image for synthesis associated with the live-action shooting ID 401 to be processed and outputs a first clothed model image, and a second clothing composition process that superimposes the face and hand images for synthesis on the first clothed model image output by the first clothing composition process, and further superimposes the shade image, to generate the final clothed model image.
 In S1101, the image processing device 10 executes the first clothing composition process. More specifically, the image processing device 10 reads the model image 402 from the live-action photography data 106 based on the live-action shooting ID 401 to be processed, queries the clothing image for synthesis 110 using that live-action shooting ID 401, and reads the clothing image for synthesis 702. The image processing device 10 superimposes the clothing image for synthesis on the model image for synthesis to generate the first clothed model image.
 In S1102, the image processing device 10 executes the second clothing composition process. More specifically, the image processing device 10 generates the final clothed model image by superimposing the face and hand images for synthesis on the generated first clothed model image and further superimposing the shade image. The image processing device 10 provides the generated final clothed model image to the user terminal 11.
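 The layering order described for S1101 and S1102 can be summarized as a sequence of alpha-over operations; the sketch below assumes RGBA layers and a multiplicative application of the shade image, which is one plausible reading rather than a formula disclosed by the embodiment.

```python
import numpy as np

def alpha_over(dst_rgb, src_rgba):
    """Composite an RGBA layer over an RGB image (illustrative sketch only)."""
    alpha = src_rgba[..., 3:4].astype(np.float32) / 255.0
    return (src_rgba[..., :3] * alpha + dst_rgb * (1.0 - alpha)).astype(np.uint8)

def compose_final_image(model_image, clothing_rgba, face_hand_rgba, shade):
    """S1101 + S1102: model image -> clothing -> face/hand -> shade (illustrative sketch only)."""
    first_clothed = alpha_over(model_image, clothing_rgba)            # first clothed model image
    with_parts = alpha_over(first_clothed, face_hand_rgba)            # add the visible face/hand portions
    final = with_parts.astype(np.float32) * (1.0 - 0.5 * shade[..., None])  # darken by the shade image
    return np.clip(final, 0, 255).astype(np.uint8)
```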
 (Advantages of the present invention)
 The processing described above enables the image processing device 10 to perform the image synthesis processing while estimating the front-to-back relationship between the person and the clothing more precisely. The present invention solves problems such as the low accuracy of output images that arise when the front-to-back relationship between the person and the clothing is difficult to determine, for example a clothing region that should appear over the person becoming invisible.
 (Other embodiments)
 Although a human has been used as the example above, the principles of the present invention are also applicable to animals other than humans. Clothing for animals kept as pets has recently gone on sale. By using the principles of the present invention to create composite images of such pet clothing as well, the composite images can be used for advertising and marketing.
 Also, although the human face and hands have been used as the example above, depending on the type of clothing, the principles of the present invention can be applied to parts of the human body other than the face and hands to generate composite images. For example, the parts of the human body that remain visible in the clothed state differ between tops and bottoms. In the case of tops, the exposed human body parts may be the face and/or hands, and in the case of bottoms, the exposed human body parts may be the feet and/or ankles. In this specification, "exposed human body part" denotes the face, hands, feet, ankles, and so on, depending on the type of clothing.
 Furthermore, although the principles of the present invention have been described above with the human body and clothing as the target objects, the target objects are not limited to the human body and clothing. For example, the target objects may be a human body and a vehicle (a car, a motorcycle, a bicycle, etc.). The number of objects may also be three or more. In relation to the above example, it is also possible to generate a composite image with an accessory, a bag, or another small item as a third object. That is, the present invention can increase the accuracy of the output composite image even when the front-to-back relationships among a plurality of objects are intricately intertwined.
 Although the principles of the present invention have been described above with reference to exemplary embodiments, those skilled in the art will understand that various embodiments with changes in configuration and detail can be realized without departing from the gist of the present invention. That is, the present invention can be embodied as, for example, a system, a device, a method, a program, a storage medium, or the like.
 1 Image processing system
 10 Image processing device
 11 User terminal
 12 3D scanner
 13 Imaging device
 14 Network
 101 Control unit
 102 Main storage unit
 103 Auxiliary storage unit
 104 Interface (IF) unit
 105 Output unit
 106 Live-action photography data
 107 3D clothing data
 108 3D avatar
 109 3D clothed avatar
 110 Clothing image for synthesis

Claims (7)

  1.  An image processing device comprising:
     means for generating a first 3D clothed avatar, a second 3D clothed avatar, and a clothing image for synthesis, using setting data associated with a model image for synthesis, a 3D avatar, and 3D clothing data;
     means for generating an exposed-human-body-part image for synthesis in which the front-to-back relationship between clothing and the human body has been determined, by dividing an image of the entire exposed human body part of the model image for synthesis into regions bounded by edges based on edge information of a mask image of the exposed human body part associated with the model image for synthesis and edge information of the clothing image for synthesis, and calculating, for each of the divided regions, a matching rate between corresponding portions of the second 3D clothed avatar and the image of the entire exposed human body part of the model image for synthesis divided into the regions;
     means for generating a shade image produced when rendering is executed on the first 3D clothed avatar while reflecting the setting data associated with the model image for synthesis;
     means for superimposing the clothing image for synthesis on the model image for synthesis and outputting a first clothed model image; and
     means for generating a final clothed model image by superimposing the exposed-human-body-part image for synthesis on the first clothed model image and further superimposing the shade image.
  2.  The image processing device according to claim 1, wherein the means for generating the exposed-human-body-part image for synthesis in which the front-to-back relationship between clothing and the human body has been determined further comprises:
     means for determining, when the calculated matching rate is equal to or greater than a threshold, that the corresponding portion of the image of the entire exposed human body part of the model image for synthesis divided into the regions is a visible portion, the visible portion being a portion visible over the clothing.
  3.  The image processing device according to claim 2, further comprising means for acquiring depth information of the first 3D clothed avatar, wherein the edge information of the clothing image for synthesis is extracted using the depth information.
  4.  The image processing device according to claim 3, wherein the threshold associated with each region bounded by edges differs based on the depth information.
  5.  The image processing device according to claim 1, further comprising:
     means for extracting key points of the exposed human body part from the model image for synthesis and setting a search range using the extracted key points as a bounding box;
     means for searching for edges in the model image for synthesis while moving from one side of the exposed human body part toward the other side, and continuing the edge search until reaching the other side of the exposed human body part and turning back; and
     means for extracting the image of the entire exposed human body part of the model image for synthesis from the range of the searched edges.
  6.  The image processing device according to claim 1, wherein, when the type of clothing is a top, the exposed human body part is a face and/or a hand, and when the type of clothing is a bottom, the exposed human body part is a foot and/or an ankle.
  7.  An image processing method executed by an image processing device, the method comprising:
     generating a first 3D clothed avatar, a second 3D clothed avatar, and a clothing image for synthesis, using setting data associated with a model image for synthesis, a 3D avatar, and 3D clothing data;
     generating an exposed-human-body-part image for synthesis in which the front-to-back relationship between clothing and the human body has been determined, by dividing an image of the entire exposed human body part of the model image for synthesis into regions bounded by edges based on edge information of a mask image of the exposed human body part associated with the model image for synthesis and edge information of the clothing image for synthesis, and calculating, for each of the divided regions, a matching rate between corresponding portions of the second 3D clothed avatar and the image of the entire exposed human body part of the model image for synthesis divided into the regions;
     generating a shade image produced when rendering is executed on the first 3D clothed avatar while reflecting the setting data associated with the model image for synthesis;
     superimposing the clothing image for synthesis on the model image for synthesis and outputting a first clothed model image; and
     generating a final clothed model image by superimposing the exposed-human-body-part image for synthesis on the first clothed model image and further superimposing the shade image.
PCT/JP2023/022231 2022-07-08 2023-06-15 Image processing device, and image processing method WO2024009721A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-110527 2022-07-08
JP2022110527A JP2024008557A (en) 2022-07-08 2022-07-08 Image processing device, image processing method, and program

Publications (1)

Publication Number Publication Date
WO2024009721A1 true WO2024009721A1 (en) 2024-01-11

Family

ID=89453185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/022231 WO2024009721A1 (en) 2022-07-08 2023-06-15 Image processing device, and image processing method

Country Status (2)

Country Link
JP (1) JP2024008557A (en)
WO (1) WO2024009721A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09259252A (en) * 1996-03-22 1997-10-03 Hitachi Ltd Picture processing method
JP2017037637A (en) * 2015-07-22 2017-02-16 アディダス アーゲー Method and apparatus for generating artificial picture
US20170372515A1 (en) * 2014-12-22 2017-12-28 Reactive Reality Gmbh Method and system for generating garment model data
US10540757B1 (en) * 2018-03-12 2020-01-21 Amazon Technologies, Inc. Method and system for generating combined images utilizing image processing of multiple images
WO2021063829A1 (en) * 2019-09-30 2021-04-08 Reactive Reality Ag Method and computer program product for processing model data of a set of garments
JP2022530710A (en) * 2020-02-24 2022-06-30 深▲チェン▼市商▲湯▼科技有限公司 Image processing methods, devices, computer equipment and storage media


Also Published As

Publication number Publication date
JP2024008557A (en) 2024-01-19

Similar Documents

Publication Publication Date Title
US10685454B2 (en) Apparatus and method for generating synthetic training data for motion recognition
JP7370527B2 (en) Method and computer program for generating three-dimensional model data of clothing
US10628666B2 (en) Cloud server body scan data system
US7663648B1 (en) System and method for displaying selected garments on a computer-simulated mannequin
CN106373178B (en) Apparatus and method for generating artificial image
US9167155B2 (en) Method and system of spacial visualisation of objects and a platform control system included in the system, in particular for a virtual fitting room
US9639635B2 (en) Footwear digitization system and method
JP2019510297A (en) Virtual try-on to the user's true human body model
Li et al. In-home application (App) for 3D virtual garment fitting dressing room
JP2013235537A (en) Image creation device, image creation program and recording medium
KR20130089649A (en) Method and arrangement for censoring content in three-dimensional images
KR101586010B1 (en) Apparatus and method for physical simulation of cloth for virtual fitting based on augmented reality
KR100828935B1 (en) Method of Image-based Virtual Draping Simulation for Digital Fashion Design
KR20150124518A (en) Apparatus and method for creating virtual cloth for virtual fitting based on augmented reality
WO2018182938A1 (en) Method and system for wireless ultra-low footprint body scanning
WO2024009721A1 (en) Image processing device, and image processing method
JP2012120080A (en) Stereoscopic photography apparatus
JP2017188071A (en) Pattern change simulation device, pattern change simulation method and program
Siegmund et al. Virtual Fitting Pipeline: Body Dimension Recognition, Cloth Modeling, and On-Body Simulation.
JP7388751B2 (en) Learning data generation device, learning data generation method, and learning data generation program
WO2018151612A1 (en) Texture mapping system and method
WO2015144563A1 (en) Image processing system and method
CA2289413C (en) System and method for displaying selected garments on a computer-simulated mannequin
KR101803064B1 (en) Apparatus and method for 3d model reconstruction
JP2023153534A (en) Image processing apparatus, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23835255

Country of ref document: EP

Kind code of ref document: A1