WO2024009721A1 - Image processing device, and image processing method - Google Patents

Image processing device, and image processing method Download PDF

Info

Publication number
WO2024009721A1
WO2024009721A1 (PCT/JP2023/022231)
Authority
WO
WIPO (PCT)
Prior art keywords
image
clothing
clothed
avatar
human body
Prior art date
Application number
PCT/JP2023/022231
Other languages
French (fr)
Japanese (ja)
Inventor
倫晶 有定
新吾 堀内
大暉 市原
祥彦 静野
浩之 木村
裕貴 中山
喜貴 千賀
Original Assignee
株式会社Nttデータ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社Nttデータ
Publication of WO2024009721A1 publication Critical patent/WO2024009721A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Definitions

  • the present invention relates to an image processing device and an image processing method. More specifically, the present invention dresses a 3D avatar with 3D clothing data created by a user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs the result as a 2D clothed image.
  • GAN Generative Adversarial Network
  • An avatar is a character used as a user's alter ego on a network, and includes two-dimensional (2D) image avatars and three-dimensional (3D) avatars.
  • GAN is a method of generating images by learning using two neural networks. GAN technology has made it possible for computers to generate an unlimited number of similar images as if they were taken of the real thing.
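The GAN mentioned above is background rather than part of the claimed image pipeline; purely as an illustration of "learning using two neural networks", a toy generator/discriminator loop might look like the following (all dimensions and hyperparameters are assumptions, and the data is a synthetic 1-D distribution rather than images):

```python
# Illustrative only: a toy GAN that learns a 1-D Gaussian distribution.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                # noise -> fake sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # sample -> P(real)

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0          # "real" data drawn from N(2, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator: distinguish real samples from generated ones.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to fool the discriminator.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```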
  • By using 3D avatars and GAN technology, efforts are being made in the apparel and advertising industries to utilize non-existent models (virtual models) in sales promotions, marketing, and other business operations.
  • In particular, in the apparel industry, 3D models of clothing are created, and the 3D clothing data is converted into 2D data and composited with the model's image. However, simple compositing can look unnatural because of the front-back relationship between the model's body and the clothing, so corrections have been made manually to make the image look natural.
  • The technique disclosed in Patent Document 1 was known as an image synthesis technique for synthesizing clothing data with a mannequin image. However, even if the technique of Patent Document 1 is used, there is a problem in that the combination of the clothing image and the model image still looks unnatural. For example, when an image of a human body from the neck up and an image of clothing are combined, the lining at the back of the neck is displayed, leaving the combined image unnatural.
  • the present invention was made to solve such problems. An object of the present invention is to provide an image processing device and an image processing method that dress a 3D avatar with 3D clothing data created by a user in a 3DCG environment, convert only the clothing image into 2D data, automatically synthesize the 2D data of the clothing image and the 2D data of the human, and output the result as a 2D clothed image.
  • An image processing device that is one aspect of the present invention includes: means for generating a first 3D clothed avatar, a second 3D clothed avatar, and a synthesis clothing image using setting data associated with a synthesis model image, a 3D avatar, and 3D clothing data; and means for generating a synthesis exposed human body part image in which the front-back relationship between the clothing and the human body has been determined, by dividing the image of the entire exposed human body part of the synthesis model image into regions surrounded by edges based on edge information of a mask image of the exposed human body part associated with the synthesis model image and edge information of the synthesis clothing image, and calculating, for each divided region, the matching rate of the corresponding portion between the second 3D clothed avatar and the image of the entire exposed human body part of the synthesis model image divided into the regions.
  • According to the present invention, by using two types of edge images, 3D image data, and model images, the unnaturalness that conventionally occurs in a composite image is reduced, and manual correction by humans is no longer necessary.
  • FIG. 1 is an overall configuration diagram of an image processing system 1 including an image processing device 10, a user terminal 11, a 3D scanner 12, and an imaging device 13 according to an embodiment of the present invention.
  • FIG. 2 is a diagram illustrating an overview of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to an embodiment of the present invention.
  • FIG. 3 is a system configuration diagram of the image processing device 10 according to an embodiment of the present invention.
  • FIG. 4 is a diagram illustrating an example of the data structure of live-action shooting data 106 according to an embodiment of the present invention.
  • FIG. 5 is a diagram illustrating an example of the data structure of a 3D avatar 108 according to an embodiment of the present invention.
  • FIG. 6 is a diagram illustrating an example of the data structure of a 3D clothed avatar 109 according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating an example of the data structure of a synthesis clothing image 110 according to an embodiment of the present invention.
  • FIG. 8 is a diagram showing a processing flow in which the image processing device 10 generates a 3D clothed avatar and a synthesis clothing image.
  • FIG. 9 is a diagram showing a processing flow in which the image processing device 10 generates a synthesis face/hand image in which the front-back relationship between clothing and the human body has been determined.
  • FIG. 10 is a diagram showing a processing flow in which the image processing device 10 generates a shadow image.
  • FIG. 11 is a diagram showing a processing flow in which the image processing device 10 generates a final clothed image.
  • FIG. 12(a) is a diagram illustrating how edge information is extracted from the face and hand mask images and from the synthesis clothing image, and FIG. 12(b) is a diagram illustrating how the two pieces of edge information are combined.
  • FIG. 13(a) is a diagram showing an example in which the hand part of the synthesis model image is divided into several regions surrounded by edges, and FIG. 13(b) is a diagram showing examples of the generated synthesis face/hand images according to the conventional technique and the present invention.
  • In this specification, the image processing device 10 will be described as one device or system, but the various processes executed by the image processing device 10 may be distributed and executed by multiple devices or systems.
  • the image processing device 10 executes the image composition processing described in this specification. More specifically, the image processing device 10 dresses a 3D avatar with the 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs a 2D clothed image.
  • the user terminal 11 can be any type of device capable of operating in a wired or wireless environment used by the user (e.g., a PC, a tablet terminal, etc.), and is not limited to a specific device.
  • the user terminal 11 can generate 3D clothing data using a third-party application, and can generate 3D avatar data using a 3D scanner 12 or the like.
  • the user terminal 11 transmits 3D clothing data and 3D avatar data to the image processing device 10, transmits various instructions regarding image composition processing via an application provided by the image processing device 10, and performs image processing on the composition results. It can be received from the device 10.
  • the 3D scanner 12 is a device that has a function of generating 3D avatar data in response to instructions from the user terminal 11.
  • the imaging device 13 is a device that takes a live photograph of a real model, and is a device that takes an image of the model using one or more cameras, and can include any studio device.
  • the floor, wall, etc. of the imaging location may be any background such as a blue background or a green background.
  • the image processing device 10 uses images of real models and various setting data to dress a 3D avatar with 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs a 2D clothed image.
  • FIG. 2 is a diagram illustrating an overview of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to the embodiment of the present invention.
  • S1 in FIG. 2 is executed by the imaging device 13
  • S2 is executed by the user terminal 11
  • S3 to S6 are executed by the image processing device 10.
  • a user uses the imaging device 13 to photograph an actual model.
  • a model image of an actual model is used as a model image for synthesis in image synthesis processing that will be described later.
  • the model image for synthesis is 2D image data.
  • the imaging device 13 transmits the synthesis model image, camera setting data (camera angle, distance, etc.) and illumination setting data (brightness, etc.) at the time of imaging to the image processing device 10.
  • the image processing device 10 stores the compositing model image, camera setting data, and illumination setting data received from the imaging device 13 in the live photographing data 106.
  • the user terminal 11 has a third-party application, generates 3D clothing data, and also communicates with the 3D scanner 12 to generate a 3D avatar with the same pose as the real model.
  • the pose of the 3D avatar may be the same as that of the real model at this stage, or it may be set as a basic pose and changed to the same pose during processing in S3, which will be described later.
  • the user terminal 11 transmits the 3D clothing data and 3D avatar to the image processing device 10, and the image processing device 10 stores the received 3D clothing data and 3D avatar in 3D clothing data 107 and 3D avatar 108, respectively.
  • the image processing device 10 provides the user terminal 11 with an application for generating a 3D clothed avatar and a synthetic clothing image. In response to an instruction from the user terminal 11, the image processing device 10 outputs a 2D image of only clothes (synthesis clothing image) and two types of 3D clothed avatars for use in the compositing process.
  • the image processing device 10 reads 3D clothing data from the 3D clothing data 107 and reads 3D avatars from the 3D avatar 108 in response to instructions from the user terminal 11.
  • the image processing device 10 can also read out a model image from the live-action photography data 106 and change the pose of the 3D avatar to the same pose as the pose of the read model image.
  • In response to an instruction from the user terminal 11, the image processing device 10 superimposes the 3D clothing data on the 3D avatar in a 3DCG (computer graphics) space, performs predetermined position calculations to adjust the size and placement position of the 3D clothing data, and places the 3D clothing data at the appropriate position on the 3D avatar. Through this process, the 3D clothing data is put on the 3D avatar.
  • 3DCG computer graphics
  • the image processing device 10 executes a cloth simulation of the 3D clothing data on the 3D avatar wearing the 3D clothing data in the 3DCG space according to the body shape and pose of the 3D avatar, and stores the resulting first 3D clothed avatar in the 3D clothed avatar 109.
  • the image processing device 10 generates a 2D clothing image (also referred to as a "synthesis clothing image") based on the 3D clothing data excluding the 3D avatar, and stores it in the synthesis clothing image 110.
  • the user uses any application to mechanically or manually generate face and hand mask images from the synthesis model image using a method such as binarization. That is, in response to a mask image generation instruction from the user terminal 11, the image processing device 10 generates mask images of the face and hands from the synthesis model image received from the imaging device 13. The image processing device 10 extracts edge information of a face and hand mask image that has been generated in advance using an arbitrary filter.
  • the described embodiment is explained using tops as an example of the type of clothing, so the parts of the human body that are exposed in the clothed state (exposed human body parts) are the face and/or hands. It should be understood that in the case of other types of clothing, the exposed body parts may vary depending on the type of clothing (e.g., feet and/or ankles in the case of bottoms).
  • the image processing device 10 extracts key points of the face and hands from the synthesis model image, and sets a search range using the extracted key points as a bounding box.
  • the image processing device 10 searches for edges while moving from one side of the face or hand to the other side (for example, from the left outer part to the right outer part), and continues searching for edges until it reaches the other side of the face or hand and comes back around.
  • the image processing device 10 extracts an image of the entire face and hand of the synthesis model image from the searched edge range.
  • the image processing device 10 executes depth information extraction processing to extract depth information in the first 3D clothed avatar.
  • the image processing device 10 extracts edge information of the clothing image for synthesis using an arbitrary filter.
  • Because extracting the edges of the synthesis clothing image as they are would turn wrinkles caused by the texture into edge noise, the image processing device 10 can extract the edge information using the depth information.
  • the image processing device 10 combines the edge information of the face and hand mask images with the edge information of the synthesis clothing image, and divides the image of the entire face and hands of the synthesis model image into several regions based on the combined edge information. For each divided region of the entire face and hands of the synthesis model image, the image processing device 10 calculates the matching rate with the corresponding area in the second 3D clothed avatar, and thereby extracts the face and hand images to be finally synthesized (synthesis face/hand images).
  • the image processing device 10 reflects the camera setting data and lighting setting data used when generating the synthesis model image on the first 3D clothed avatar read from the 3D clothed avatar 109, generates the shadow (shade) information produced when rendering is performed as a shadow image, and stores it in the 3D clothed avatar 109.
  • the image processing device 10 executes a first clothing composition process of superimposing a composition clothing image on a composition model image and outputting a first clothed model image.
  • the image processing device 10 executes a second clothed image generation process that generates the final clothed model image by superimposing the synthesis face and hand images on the first clothed model image and further superimposing the shadow image.
  • FIG. 3 is a system configuration diagram of the image processing device 10 according to the embodiment of the present invention.
  • the image processing device 10 may be configured to be placed on a cloud system or on an in-house network.
  • Like a general computer, the image processing device 10 includes a control section 101, a main storage section 102, an auxiliary storage section 103, an interface (IF) section 104, and an output section 105, which are interconnected by a bus 120 or the like.
  • the auxiliary storage unit 103 stores programs that implement each function of the image processing device 10 and data handled by the programs.
  • the auxiliary storage unit 103 includes live-action shooting data 106, 3D clothing data 107, a 3D avatar 108, a 3D clothed avatar 109, and a synthetic clothing image 110 in a file/database format.
  • the image processing device 10 can read or update information stored in the live-action photography data 106, 3D clothing data 107, 3D avatar 108, 3D clothed avatar 109, and synthetic clothing image 110.
  • Each program stored in the auxiliary storage unit 103 is executed by the image processing device 10.
  • the control unit 101, also called a central processing unit (CPU), controls each component of the image processing device 10 and performs data calculations, and reads various programs stored in the auxiliary storage unit 103 into the main storage unit 102 and executes them.
  • the main storage unit 102 is also called main memory, and stores various received data, computer-executable instructions, and data after arithmetic processing using the instructions.
  • the auxiliary storage unit 103 is a storage device such as a hard disk (HDD) or SSD (Solid State Drive), and stores data and programs for a long period of time.
  • HDD hard disk
  • SSD Solid State Drive
  • Although FIG. 3 describes an embodiment in which the control unit 101, the main storage unit 102, and the auxiliary storage unit 103 are provided inside the same computer, as another embodiment, the image processing apparatus 10 can also implement parallel distributed processing by a plurality of computers by using a plurality of main storage units 102 and auxiliary storage units 103.
  • a plurality of servers for the image processing apparatus 10 may be installed, and one auxiliary storage unit 103 may be shared by the plurality of servers.
  • the IF unit 104 serves as an interface for transmitting and receiving data with other systems and devices, and also provides an interface for receiving various commands and input data (various masters, tables, etc.) from the system operator.
  • the output unit 105 provides a display screen for displaying processed data, a printing means for printing the data, and the like.
  • Components similar to the control unit 101, main storage unit 102, auxiliary storage unit 103, IF unit 104, and output unit 105 also exist in the user terminal 11 and the imaging device 13.
  • the live-action shooting data 106 stores a model image (2D image data) of a real model, a mask image of the face and hands of the real model, and camera setting data and lighting setting data at the time of live-action shooting.
  • FIG. 4 is a diagram illustrating an example of the data structure of the live-action photography data 106 according to the embodiment of the present invention.
  • the live-action shooting data 106 can include a live-action shooting ID 401, a model image 402, a mask image 403, camera setting data 404, and lighting setting data 405, but is not limited to these data items and can also include other data items.
  • the live-action shooting ID 401 is an identifier that identifies a model at the time of live-action shooting and data associated with the model.
  • the model image 402 is 2D model image data of a real model, and is also called a "synthesis model image.”
  • the mask image 403 is a mask image of the model's face and hands generated from the synthesis model image.
  • Camera setting data 404 indicates camera setting data at the time of live-action photography, such as camera angle and distance.
  • Lighting setting data 405 indicates lighting setting data, such as brightness, at the time of live-action photography.
  • the 3D clothing data 107 stores 3D clothing data generated by the user.
  • the 3D clothing data may be stored in association with attribute information (e.g., clothing category, color, shape, etc.) to facilitate image selection.
  • the 3D avatar 108 stores 3D avatar data generated by the user.
  • the 3D avatar is created by the user so as to have the same pose as the model image at the time of live-action photography.
  • FIG. 5 is a diagram showing an example of the data structure of the 3D avatar 108 according to the embodiment of the present invention.
  • the 3D avatar 108 can include a 3D avatar ID 501, a 3D avatar 502, and a live-action shooting ID 401, but is not limited to these data items and can also include other data items.
  • the 3D clothed avatar 109 stores image data of a 3D clothed avatar obtained by superimposing 3D clothing data on the 3D avatar and performing predetermined processing.
  • FIG. 6 is a diagram showing an example of the data structure of the 3D clothed avatar 109 according to the embodiment of the present invention.
  • the 3D clothed avatar 109 can include a 3D clothed avatar ID 601, a first 3D clothed avatar 602, a second 3D clothed avatar 603, shadow information 604, a shadow image 605, a 3D avatar ID 501, and a live-action shooting ID 401, but is not limited to these data items and can also include other data items.
  • the 3D clothed avatar ID 601 is an identifier that identifies the 3D clothed avatar generated by the image processing device 10.
  • the first 3D clothed avatar 602 shows image data of a 3D clothed avatar that has undergone 3D cloth simulation.
  • a second 3D clothed avatar 603 represents image data of a 3D clothed avatar that has been subjected to 3D rendering processing by reflecting the camera setting data and lighting setting data on the first 3D clothed avatar.
  • the shadow information 604 and the shadow image 605 respectively indicate the shadow information and the shadow image generated when rendering is executed by reflecting, on the first 3D clothed avatar, the camera setting data and lighting setting data used when generating the synthesis model image.
  • the 3D avatar ID 501 is an identifier for identifying the 3D avatar from which the 3D clothed avatar is generated
  • the live-action shooting ID 401 is an identifier for identifying the live-action shooting associated with the 3D avatar.
  • the 3D avatar ID 501 and live-action shooting ID 401 make it easier to acquire various setting data and the like when shooting a real model.
  • the synthetic clothing image 110 stores 2D clothing data generated based on the 3D clothing data of the second 3D clothed avatar.
  • FIG. 7 is a diagram illustrating an example of a data structure of a clothing image for synthesis 110 according to an embodiment of the present invention.
  • the synthetic clothing image 110 can include a synthetic clothing image ID 701, a synthetic clothing image 702, a live-action photography ID 401, and a 3D clothed avatar ID 601, but is not limited to these data items and can also include other data items.
  • the compositing clothing image ID 701 is an identifier that identifies 2D clothing image data used in the image compositing process according to the embodiment of the present invention.
  • a clothing image for composition 702 indicates 2D clothing image data used for image composition processing.
  • the live-action shooting ID 401 is an identifier that identifies the live-action shooting associated with the 3D clothed avatar from which the synthetic clothing image is generated.
  • the 3D clothed avatar ID 601 is an identifier of the 3D clothed avatar associated with the 3D clothing data that is the original data of the clothing image for synthesis.
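The patent describes these records only as tables of data items; purely as a rough, non-authoritative sketch, the tables above and the ID fields linking them could be represented in code as follows (class names and types are assumptions, field comments follow the reference numerals in the text):

```python
# Illustrative only: one possible in-memory representation of the tables
# described above and the ID fields that link them together.
from dataclasses import dataclass
import numpy as np

@dataclass
class LiveActionShooting:              # live-action shooting data 106
    shooting_id: str                   # live-action shooting ID 401
    model_image: np.ndarray            # model image 402 (2D synthesis model image)
    mask_image: np.ndarray             # mask image 403 (face/hand mask)
    camera_settings: dict              # camera setting data 404 (angle, distance, ...)
    lighting_settings: dict            # lighting setting data 405 (brightness, ...)

@dataclass
class ClothedAvatar:                   # 3D clothed avatar 109
    clothed_avatar_id: str             # 3D clothed avatar ID 601
    first_clothed_avatar: object       # 602: after cloth simulation
    second_clothed_avatar: object      # 603: after 3D rendering
    shadow_info: np.ndarray            # shadow information 604
    shadow_image: np.ndarray           # shadow image 605
    avatar_id: str                     # 3D avatar ID 501 (link to 3D avatar 108)
    shooting_id: str                   # live-action shooting ID 401 (link to 106)

@dataclass
class SynthesisClothingImage:          # synthesis clothing image 110
    clothing_image_id: str             # synthesis clothing image ID 701
    clothing_image: np.ndarray         # synthesis clothing image 702 (2D clothing-only image)
    shooting_id: str                   # live-action shooting ID 401
    clothed_avatar_id: str             # 3D clothed avatar ID 601
```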
  • Next, the processing flow in which the image processing device 10 generates a final clothed model image using a synthesis clothing image (2D), a synthesis model image (2D), various setting data, a 3D avatar, and 3D clothing data will be explained. FIGS. 8 to 11 show the processing contents of S3 to S6 in FIG. 2, respectively. Either of S4 and S5 may be performed first.
  • FIG. 8 shows a processing flow in which the image processing device 10 generates a 3D clothed avatar and a synthesis clothing image using the data generated through the processing described above with reference to S1 and S2 of FIG. 2.
  • the image processing device 10 provides the user terminal 11 with an application for generating a 3D clothed avatar image, and performs the process based on user instructions received via the user terminal 11.
  • the user terminal 11 selects a synthesis model image, 3D clothing data, and 3D avatar to be processed through the provided application, and sends a selection instruction to the image processing device 10.
  • the image processing device 10 reads the selected model image from the live-action photography data 106, reads the selected 3D clothing data from the 3D clothing data 107, and reads the selected 3D avatar from the 3D avatar 108.
  • the image processing device 10 can change the pose of the 3D avatar to match the pose of the read model image. Through this processing, the pose of the model image and the pose of the 3D avatar match, the model image and the 3D avatar are associated, and the selected live-action shooting ID 401 is stored in the 3D avatar 108.
  • the user terminal 11 transmits a placement instruction to the image processing device 10 to place the 3D clothing data at an appropriate position of the 3D avatar on the 3DCG space of the application.
  • In accordance with the placement instruction from the user terminal 11, the image processing device 10 superimposes the 3D clothing data on the 3D avatar in the 3DCG space, performs predetermined position calculations to adjust the size and placement position of the 3D clothing data, and places the 3D clothing data at an appropriate position on the 3D avatar. Through this process, the 3D clothing data is put on the 3D avatar.
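The patent does not specify the position calculation itself; as a minimal illustrative sketch only, one way such an adjustment could be approximated is to fit the clothing mesh's bounding box to the avatar's torso bounding box (the function name and the bounding-box approach are assumptions, not the claimed method):

```python
# Illustrative only: scale and translate clothing vertices onto an avatar's torso
# by matching bounding boxes. Collision handling, draping, etc. are omitted.
import numpy as np

def fit_clothing_to_avatar(clothing_vertices: np.ndarray,
                           torso_vertices: np.ndarray) -> np.ndarray:
    """Both inputs are (N, 3) arrays of 3D vertex positions."""
    c_min, c_max = clothing_vertices.min(0), clothing_vertices.max(0)
    t_min, t_max = torso_vertices.min(0), torso_vertices.max(0)

    # Uniform scale so the clothing roughly spans the torso height (y axis).
    scale = (t_max[1] - t_min[1]) / (c_max[1] - c_min[1])

    # Move the clothing's centre onto the torso's centre.
    c_center = (c_min + c_max) / 2.0
    t_center = (t_min + t_max) / 2.0
    return (clothing_vertices - c_center) * scale + t_center
```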
  • Cloth simulation refers to a technology that physically simulates the movement of cloth such as clothing. For example, physical calculations are performed on the cloth, such as simulating the wrinkles in clothing when a 3D avatar wears it.
  • the image processing device 10 executes a cloth simulation of the 3D clothing data in the 3DCG space on the 3D avatar wearing the 3D clothing data, according to the body shape and pose of the 3D avatar. The cloth-simulated 3D clothed avatar is stored in the first 3D clothed avatar 602 of the 3D clothed avatar 109.
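The patent only states that a cloth simulation is executed; to give a rough idea of the kind of physical calculation involved, here is a minimal mass-spring step using Verlet integration. This is an illustrative sketch, not the simulator actually used, and avatar collision handling is omitted:

```python
# Illustrative only: a minimal mass-spring cloth step (Verlet integration).
import numpy as np

def cloth_step(pos, prev_pos, springs, rest_len,
               dt=0.016, gravity=(0.0, -9.8, 0.0), iters=10):
    """pos, prev_pos: (N, 3) particle positions; springs: (M, 2) index pairs;
    rest_len: (M,) rest lengths of the springs."""
    # Verlet integration under gravity.
    new_pos = pos + (pos - prev_pos) + np.asarray(gravity) * dt * dt

    # Iteratively enforce spring (distance) constraints between particles,
    # which is what produces cloth-like wrinkles and draping.
    for _ in range(iters):
        d = new_pos[springs[:, 1]] - new_pos[springs[:, 0]]
        length = np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
        correction = 0.5 * (length - rest_len[:, None]) * d / length
        np.add.at(new_pos, springs[:, 0], correction)
        np.add.at(new_pos, springs[:, 1], -correction)

    return new_pos, pos  # (new positions, previous positions for the next step)
```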
  • the image processing device 10 reads camera setting data and lighting setting data associated with the model image selected in S801 from the live-action photography data 106.
  • the image processing device 10 reflects the read camera setting data and lighting setting data on the cloth-simulated 3D clothed avatar (first 3D clothed avatar) in the 3DCG space of the application, executes 3D rendering processing using predetermined shader setting parameters, and stores the result as the second 3D clothed avatar in the 3D clothed avatar 109.
  • the image processing device 10 extracts the 3D clothing data by removing the 3D avatar from the 3D clothed avatar (second 3D clothed avatar) that has undergone 3D rendering processing.
  • the image processing device 10 generates a 2D clothing image (herein referred to as a "synthesis clothing image") based on the extracted 3D clothing data, and stores it in the synthesis clothing image 110.
  • FIG. 9 shows a processing flow in which the image processing device 10 uses the face and hand mask images, the synthesis clothing image, and the 3D clothed avatars to generate a synthesis face/hand image in which the front-back relationship between the clothing and the human body has been determined. Note that this processing flow assumes that the image processing device 10 communicates with the user terminal 11 through an arbitrary application to generate the mask images of the face and hands from the synthesis model image.
  • the term "hand" is used to indicate any of the wrist, palm, and fingers of the human body from the shoulder to the fingertips, but these may vary depending on the design of the clothing.
  • the image processing device 10 extracts key points of the face and hands from the synthesis model image associated with the live-action shooting ID 401 to be processed, and sets a search range using the extracted key points as a bounding box.
  • the image processing device 10 searches for edges while moving from one side of the face or hand to the other side (for example, from the left outer part to the right outer part), and continues searching for edges until it reaches the other side of the face or hand and comes back around.
  • the image processing device 10 extracts the entire face and hand image of the synthesis model image from the searched edge range.
  • the image processing device 10 extracts edge information of a face and hand mask image that has been generated in advance using an arbitrary filter.
  • the upper part of FIG. 12(a) shows an image of extracting edge information from mask images of faces and hands.
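As a rough illustration only, the binarization-based mask generation and the extraction of its edge information (S901) could be done with OpenCV as sketched below. The skin-colour threshold and the use of Canny as the "arbitrary filter" are assumptions for the sketch, not the method claimed in the patent:

```python
# Illustrative only: produce a face/hand mask by binarization and extract its edges.
import cv2
import numpy as np

def mask_and_edges(model_image_bgr: np.ndarray,
                   skin_lo=(0, 30, 60), skin_hi=(25, 180, 255)):
    """Returns (mask, edge_map), both uint8 images the same size as the input."""
    hsv = cv2.cvtColor(model_image_bgr, cv2.COLOR_BGR2HSV)

    # Binarization: a crude skin-colour threshold stands in for the mechanical
    # or manual mask generation described in the text.
    mask = cv2.inRange(hsv, np.array(skin_lo), np.array(skin_hi))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

    # Edge information of the mask image (Canny as one example filter).
    edges = cv2.Canny(mask, 50, 150)
    return mask, edges
```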
  • the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109, and performs depth information extraction processing on the read first 3D clothed avatar 602. Through this process, the image processing device 10 can acquire depth information at each position of the first 3D clothed avatar 602. Depth information makes it possible to distinguish between wrinkles and contour lines in clothing.
  • the image processing device 10 extracts edge information of the clothing image for synthesis using an arbitrary filter. If the edge information of the clothing image for synthesis is extracted as is, wrinkles due to the texture of the clothing may become noise, so the image processing device 10 can extract the edge information of the clothing image for synthesis using the depth information acquired in S902.
  • the lower part of FIG. 12(a) shows an image of extracting edge information from a clothing image for synthesis.
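To make the role of the depth information concrete, here is a minimal sketch of extracting clothing edges from a depth map rather than from the textured image, so that printed patterns and fine texture wrinkles do not become edge noise. The normalisation step and Canny thresholds are assumptions, not values from the patent:

```python
# Illustrative only: clothing edge extraction guided by depth information.
import cv2
import numpy as np

def clothing_edges_from_depth(depth: np.ndarray, low=30, high=100) -> np.ndarray:
    """depth: float array (H, W) taken from the first 3D clothed avatar."""
    # Normalise depth to 0..255 so a standard edge filter can be applied.
    d = cv2.normalize(depth.astype(np.float32), None, 0, 255, cv2.NORM_MINMAX)
    d = d.astype(np.uint8)

    # Large depth discontinuities correspond to garment contours and deep folds;
    # shallow texture wrinkles mostly disappear.
    return cv2.Canny(d, low, high)
```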
  • the order of the processing in S901 and the processing in S902 and S903 is not particularly limited. That is, the processing in S902 and S903 may be performed after the processing in S901, or the processing in S901 may be performed after the processing in S902 and S903. Alternatively, both may be processed in parallel.
  • the image processing device 10 combines the edge information of the clothing image for synthesis with the edge information of the face and hand mask images.
  • FIG. 12(b) shows an image in which two pieces of edge information are combined.
  • the image processing device 10 divides the entire face and hand image of the synthesis model image extracted in S901 into regions surrounded by edges, based on the edge information combined in S904.
  • FIG. 13A is an example showing that the hand portion of the synthesis model image is divided into several regions surrounded by edges.
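The patent does not prescribe a particular segmentation method for this division; purely as an illustrative sketch, the two edge maps could be combined and the face/hand area split into edge-bounded regions with connected components, for example:

```python
# Illustrative only: combine the two edge maps (S904) and divide the face/hand
# area into regions surrounded by edges (S905) using connected components.
import cv2
import numpy as np

def divide_into_regions(mask_edges: np.ndarray,
                        clothing_edges: np.ndarray,
                        face_hand_mask: np.ndarray) -> np.ndarray:
    # Combine the edge information of the mask image and of the clothing image.
    combined = cv2.bitwise_or(mask_edges, clothing_edges)
    combined = cv2.dilate(combined, np.ones((3, 3), np.uint8))  # close small gaps

    # Pixels of the face/hand area that are NOT edges fall apart into connected
    # regions, each bounded by edges.
    non_edge = cv2.bitwise_and(face_hand_mask, cv2.bitwise_not(combined))
    num_labels, labels = cv2.connectedComponents(non_edge)
    return labels  # labels[y, x] == region index (0 = background/edges)
```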
  • the image processing device 10 reads the second 3D clothed avatar 603 from the 3D clothed avatar 109, and compares the read second 3D clothed avatar 603 with the image of the entire face and hands of the synthesis model image that has been divided into several regions, comparing the corresponding parts (for example, the thumbs of both left hands) for each divided region.
  • the image processing device 10 calculates the matching rate between the two, and determines that a portion where the matching rate is equal to or higher than a predetermined threshold value (X) is an actually visible portion. Based on the determination results for each region, the image processing device 10 determines each region of the entire face and hand image of the synthesis model image as a visible part or an invisible part, and extracts the face and hand images to be finally synthesized.
  • In this example, since the matching rate for the thumb portion of the left hand was less than the threshold value (X), the image processing device 10 performs processing so as not to include the image of the thumb portion of the left hand in the face and hand images to be finally combined.
  • the threshold (X) can be changed based on the depth information. That is, the image processing apparatus 10 can change the threshold value (X) of each position based on the depth information of each position acquired in S902. Therefore, the value of the threshold (X) can change for each divided region surrounded by edges.
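As a minimal sketch of this visibility decision, the per-region matching rate and a depth-adjusted threshold could be computed as below. The matching measure (agreement of skin silhouettes between the photo and the rendered avatar) and the specific depth adjustment are assumptions used only for illustration:

```python
# Illustrative only: decide per region whether a face/hand part is actually visible.
import numpy as np

def visible_regions(labels: np.ndarray,           # region index per pixel (from the division step)
                    model_body_mask: np.ndarray,  # True where skin is visible in the photo
                    avatar_body_mask: np.ndarray, # True where skin is visible in the rendered avatar
                    depth: np.ndarray,
                    base_threshold: float = 0.8) -> np.ndarray:
    visible = np.zeros_like(labels, dtype=bool)
    for region in range(1, labels.max() + 1):
        idx = labels == region
        if not idx.any():
            continue

        # Matching rate: fraction of the region where photo and render agree
        # that the body part is visible (threshold X in the text).
        match_rate = (model_body_mask[idx] & avatar_body_mask[idx]).mean()

        # The threshold may vary with the depth acquired in S902; here it is
        # simply relaxed a little for regions farther from the camera.
        threshold = base_threshold - 0.1 * (depth[idx].mean() / (depth.max() + 1e-9))

        if match_rate >= threshold:
            visible[idx] = True  # keep this part in the synthesis face/hand image
    return visible
```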
  • FIG. 13(b) shows an example of the generated synthetic face/hand images of the prior art and the hand image of the present invention.
  • In the conventional example, the thumb portion is visible because the conventional general image synthesis process does not perform the above-described matching determination; in the actual pose, however, the thumb is hidden behind the folds of the clothing and cannot be seen, resulting in an unnatural image.
  • In the example of the present invention, the thumb portion is hidden behind the folds of the clothing and cannot be seen.
  • the image processing device 10 determines that the matching rate for this thumb portion is less than the threshold (X), and does not include this thumb portion in the face/hand image for synthesis since it is an invisible portion.
  • FIG. 10 shows a processing flow in which the image processing device 10 generates a shadow image based on the shadow information generated when rendering the cloth-simulated 3D clothed avatar (first 3D clothed avatar) by reflecting the camera setting data and lighting setting data used when generating the synthesis model image.
  • the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109 based on the live-action shooting ID 401 to be processed.
  • the image processing device 10 also queries the live-action shooting data 106 based on the live-action shooting ID 401 and reads out the corresponding camera setting data 404 and illumination setting data 405.
  • the image processing device 10 performs rendering on the read first 3D clothed avatar 602 by reflecting the corresponding camera setting data 404 and lighting setting data 405, calculates whether or not light is shining on each part, and performs shading.
  • In S1003, the image processing device 10 generates a shadow image based on the shadow information that is the result of the shading calculation.
  • the image processing device 10 stores the shadow information and the shadow image as the shadow information 604 and the shadow image 605 of the 3D clothed avatar 109, respectively.
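As a rough illustration of what such a shading calculation can produce, the sketch below turns "how directly does light hit this point?" into a shade image usable as a multiply layer. A real renderer would use the actual camera and lighting settings (404, 405); the Lambertian model and the fixed light direction here are assumptions:

```python
# Illustrative only: a simple Lambertian shading pass producing a shadow (shade) image.
import numpy as np

def shade_image(normals: np.ndarray,             # (H, W, 3) surface normals of the rendered avatar
                light_dir=(0.3, 1.0, 0.5),
                ambient: float = 0.35) -> np.ndarray:
    light = np.asarray(light_dir, dtype=np.float32)
    light = light / np.linalg.norm(light)

    # Lambert term: how directly the light hits each pixel's surface.
    lambert = np.clip((normals * light).sum(axis=2), 0.0, 1.0)

    # Darker where little light arrives; the result can be multiplied onto the
    # composited image in the final step.
    return np.clip(ambient + (1.0 - ambient) * lambert, 0.0, 1.0)
```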
  • FIG. 11 shows a processing flow in which the image processing device 10 generates the final clothed image by executing a first clothing composition process that outputs a first clothed model image by superimposing the synthesis clothing image on the synthesis model image associated with the live-action shooting ID 401 to be processed, and a second clothing composition process that generates the final clothed model image by superimposing the synthesis face/hand image on the first clothed model image output by the first clothing composition process and further superimposing the shadow image.
  • the image processing device 10 executes the first clothing composition process. More specifically, the image processing device 10 reads the model image 402 from the live-action photography data 106 based on the live-action photography ID 401 to be processed, and queries the clothing image for composition 110 using the live-action photography ID 401 to read out the synthesis clothing image 702. The image processing device 10 generates a first clothed model image by superimposing the synthesis clothing image on the synthesis model image.
  • the image processing device 10 executes the second clothing composition process. More specifically, the image processing device 10 generates the final clothed model image by superimposing the synthesis face/hand image and the shadow image on the generated first clothed model image. The image processing device 10 provides the generated final clothed model image to the user terminal 11.
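The layering order described above (clothing over the model image, then the synthesis face/hand image, then the shadow image) can be expressed as straightforward alpha compositing; the sketch below is illustrative only, and the multiply-style use of the shade image is an assumption:

```python
# Illustrative only: the final composition step as simple alpha compositing.
import numpy as np

def over(fg_rgb, fg_alpha, bg_rgb):
    """Alpha-composite a foreground layer over a background (float arrays in 0..1)."""
    a = fg_alpha[..., None]
    return fg_rgb * a + bg_rgb * (1.0 - a)

def final_clothed_image(model_rgb, clothing_rgb, clothing_alpha,
                        face_hand_rgb, face_hand_alpha, shade):
    # First clothing composition: synthesis clothing image over the model image.
    first_clothed = over(clothing_rgb, clothing_alpha, model_rgb)

    # Second composition: put back only the face/hand parts judged visible.
    clothed = over(face_hand_rgb, face_hand_alpha, first_clothed)

    # Finally apply the shadow image so the lighting looks consistent.
    return clothed * shade[..., None]
```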
  • the above-described processing enables the image processing device 10 to perform image synthesis processing while estimating the front-back relationship between a person and clothing more precisely. According to the present invention, problems such as the difficulty of identifying the front-back relationship between a person and clothing, and the low accuracy of output images in which the clothing area that should normally be on the person becomes invisible, can be solved.
  • the principles of the present invention can also be applied to parts of the human body other than the face and hands, depending on the type of clothing, to generate a composite image.
  • the parts of the human body that are exposed when wearing tops and bottoms are different.
  • In the case of tops, the exposed body parts may be the face and/or hands, and in the case of bottoms, the exposed body parts may be the feet and/or ankles.
  • “exposed human body parts” refer to the face, hands, feet, ankles, etc., depending on the type of clothing.
  • the objects are not limited to the human body and clothing.
  • the target objects may be a human body and a vehicle (car, motorcycle, bicycle, etc.).
  • the number of objects may be three or more.
  • the present invention can be implemented as, for example, a system, device, method, program, storage medium, or the like.
  • 1 Image processing system, 10 Image processing device, 11 User terminal, 12 3D scanner, 13 Imaging device, 14 Network, 101 Control unit, 102 Main storage unit, 103 Auxiliary storage unit, 104 Interface (IF) unit, 105 Output unit, 106 Live-action shooting data, 107 3D clothing data, 108 3D avatar, 109 3D clothed avatar, 110 Clothing image for synthesis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention reduces unnaturalness in a synthesized image and obviates the need for manual correction by a person. This image processing device: divides a face and complete hands in a model image for synthesis into regions bounded by edges, on the basis of edge information pertaining to a mask image of a face and hands in the model image for synthesis and edge information pertaining to a clothing image for synthesis; calculates, for each divided region, the rate of consistency between a second 3D clothed avatar and portions that correspond to images of the face and complete hands in the model image for synthesis, the face and hands having been divided into the regions; and generates a face/hand image for synthesis for which a before/after relationship between clothing and a person has already been assessed. The image processing device also: generates, for a first 3D clothed avatar, a shadow image in which rendering is executed to reflect settings data that is associated with the model image for synthesis; outputs a first clothed model image in which the clothing image is superposed on the model image for synthesis; superposes the face/hand image for synthesis on the first clothed model image; and furthermore superposes the shadow image on the resultant image to generate a final clothed model image.

Description

Image processing device and image processing method
 The present invention relates to an image processing device and an image processing method. More specifically, the present invention dresses a 3D avatar with 3D clothing data created by a user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs the result as a 2D clothed image.
 Traditionally, in the apparel and advertising industries, real models have been photographed wearing clothing, and the images have been used for sales promotions and marketing. However, such a method requires a large amount of human resources, money, and time to generate one image, and is not suitable for generating many images.
 For this reason, 3D avatars and a technology called GAN (Generative Adversarial Network) are being used. An avatar is a character used as a user's alter ego on a network, and includes two-dimensional (2D) image avatars and three-dimensional (3D) avatars. GAN is a method of generating images by learning using two neural networks. GAN technology has made it possible for computers to generate an unlimited number of similar images as if they were taken of the real thing.
 By using 3D avatars and GAN technology, efforts are being made in the apparel and advertising industries to utilize non-existent models (virtual models) in sales promotions, marketing, and other business operations. In particular, in the apparel industry, 3D models of clothing are created, and the 3D clothing data is converted into 2D data and composited with the model's image. However, simple compositing can look unnatural because of the front-back relationship between the model's body and the clothing, so corrections have been made manually to make the image look natural.
Japanese Patent Application Publication No. 2011-186774
 Because there is a limit to the number of images that can be created using methods that perform image synthesis by hand, many companies are researching image synthesis methods that automatically dress human images with 3D clothing data.
 The technique disclosed in Patent Document 1 was known as an image synthesis technique for synthesizing clothing data with a mannequin image. However, even if the technique of Patent Document 1 is used, there is a problem in that the combination of the clothing image and the model image still looks unnatural. For example, when an image of a human body from the neck up and an image of clothing are combined, the lining at the back of the neck is displayed, leaving the combined image unnatural.
 In addition, when automatically dressing a 2D image of a person with 3D clothing data, conventional technology frequently suffers from accuracy problems, such as the clothing area that should originally be on the person becoming invisible, and technical issues remained, such as the need for manual correction in the end.
 The present invention was made to solve such problems. An object of the present invention is to provide an image processing device and an image processing method that dress a 3D avatar with 3D clothing data created by a user in a 3DCG environment, convert only the clothing image into 2D data, automatically synthesize the 2D data of the clothing image and the 2D data of the human, and output the result as a 2D clothed image.
 An image processing device that is one aspect of the present invention includes:
 means for generating a first 3D clothed avatar, a second 3D clothed avatar, and a synthesis clothing image using setting data associated with a synthesis model image, a 3D avatar, and 3D clothing data;
 means for generating a synthesis exposed human body part image in which the front-back relationship between the clothing and the human body has been determined, by dividing the image of the entire exposed human body part of the synthesis model image into regions surrounded by edges based on edge information of a mask image of the exposed human body part associated with the synthesis model image and edge information of the synthesis clothing image, and calculating, for each divided region, the matching rate of the corresponding portion between the second 3D clothed avatar and the image of the entire exposed human body part of the synthesis model image divided into the regions;
 means for generating a shadow image obtained when rendering is performed on the first 3D clothed avatar by reflecting the setting data associated with the synthesis model image;
 means for superimposing the synthesis clothing image on the synthesis model image to output a first clothed model image; and
 means for superimposing the synthesis exposed human body part image on the first clothed model image, and further superimposing the shadow image, to generate a final clothed model image.
 According to the present invention, by using two types of edge images, 3D image data, and model images, the unnaturalness that conventionally occurs in a composite image is reduced, and manual correction by humans is no longer necessary.
 A detailed understanding of the embodiments disclosed herein can be obtained from the following description, illustrated in conjunction with the accompanying drawings.
 FIG. 1 is an overall configuration diagram of an image processing system 1 including an image processing device 10, a user terminal 11, a 3D scanner 12, and an imaging device 13 according to an embodiment of the present invention. FIG. 2 is a diagram illustrating an overview of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to an embodiment of the present invention. FIG. 3 is a system configuration diagram of the image processing device 10 according to an embodiment of the present invention. FIG. 4 is a diagram illustrating an example of the data structure of live-action shooting data 106 according to an embodiment of the present invention. FIG. 5 is a diagram illustrating an example of the data structure of a 3D avatar 108 according to an embodiment of the present invention. FIG. 6 is a diagram illustrating an example of the data structure of a 3D clothed avatar 109 according to an embodiment of the present invention. FIG. 7 is a diagram illustrating an example of the data structure of a synthesis clothing image 110 according to an embodiment of the present invention. FIG. 8 is a diagram showing a processing flow in which the image processing device 10 generates a 3D clothed avatar and a synthesis clothing image. FIG. 9 is a diagram showing a processing flow in which the image processing device 10 generates a synthesis face/hand image in which the front-back relationship between clothing and the human body has been determined. FIG. 10 is a diagram showing a processing flow in which the image processing device 10 generates a shadow image. FIG. 11 is a diagram showing a processing flow in which the image processing device 10 generates a final clothed image. FIG. 12(a) is a diagram illustrating how edge information is extracted from the face and hand mask images and from the synthesis clothing image, and FIG. 12(b) is a diagram illustrating how the two pieces of edge information are combined. FIG. 13(a) is a diagram showing an example in which the hand part of the synthesis model image is divided into several regions surrounded by edges, and FIG. 13(b) is a diagram showing examples of the generated synthesis face/hand images according to the conventional technique and the present invention.
 (Overall configuration)
 FIG. 1 is an overall configuration diagram of an image processing system 1 including an image processing device 10, a user terminal 11, a 3D scanner 12, and an imaging device 13 according to an embodiment of the present invention. The image processing device 10 is connected to the user terminal 11 and the imaging device 13 via a network 14 so that they can communicate with each other. The user terminal 11 is connected to the 3D scanner 12 via an arbitrary network such as a LAN or WAN so that they can communicate with each other. Although FIG. 1 shows only one image processing device 10, one user terminal 11, one 3D scanner 12, and one imaging device 13 for simplicity of explanation, there may be a plurality of each of these devices.
 In this specification, the image processing device 10 will be described as one device or system, but the various processes executed by the image processing device 10 may be distributed and executed by multiple devices or systems.
 The image processing device 10 executes the image composition processing described in this specification. More specifically, the image processing device 10 dresses a 3D avatar with the 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs a 2D clothed image.
 The user terminal 11 can be any type of device capable of operating in a wired or wireless environment used by the user (e.g., a PC, a tablet terminal, etc.), and is not limited to a specific device. The user terminal 11 can generate 3D clothing data using a third-party application, and can generate 3D avatar data using the 3D scanner 12 or the like. The user terminal 11 transmits the 3D clothing data and 3D avatar data to the image processing device 10, transmits various instructions regarding image composition processing via an application provided by the image processing device 10, and can receive the composition results from the image processing device 10.
 The 3D scanner 12 is a device that has a function of generating 3D avatar data in response to instructions from the user terminal 11.
 The imaging device 13 is a device that takes live photographs of a real model using one or more cameras, and can include any studio equipment. In order to make it easier to identify the captured image data of the model, the floor, walls, etc. of the imaging location may have any background, such as a blue background or a green background.
 The network 14 is any communication network responsible for communication between the image processing device 10, the user terminal 11, the 3D scanner 12, and the imaging device 13, and includes the Internet, an intranet, a leased line, any network system, etc., and is not particularly limited.
 (Functional configuration of the image processing device 10)
 The image processing device 10 uses images of real models and various setting data to dress a 3D avatar with 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image and the 2D data of the human, and outputs a 2D clothed image.
 Hereinafter, various functions provided by the image processing device 10 will be explained with reference to FIG. 2. FIG. 2 is a diagram illustrating an overview of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to the embodiment of the present invention. S1 in FIG. 2 is executed by the imaging device 13, S2 is executed by the user terminal 11, and S3 to S6 are executed by the image processing device 10. Note that the embodiments described in this specification are explained using tops (upper garments) as an example of the type of clothing, but it should be understood that the present invention is also applicable to other types of clothing (e.g., bottoms).
 (S1: Processing when the imaging device 13 photographs a real model)
 The user uses the imaging device 13 to photograph a real model. As will be described later, the model image of the real model is used as a synthesis model image in the image synthesis processing described later. The synthesis model image is 2D image data.
 The imaging device 13 transmits the synthesis model image, as well as the camera setting data (camera angle, distance, etc.) and lighting setting data (brightness, etc.) at the time of imaging, to the image processing device 10. The image processing device 10 stores the synthesis model image, camera setting data, and lighting setting data received from the imaging device 13 in the live-action shooting data 106.
 (S2: Processing of 3D clothing data and 3D avatar generation by the user terminal 11)
 The user terminal 11 has a third-party application, generates 3D clothing data, and also communicates with the 3D scanner 12 to generate a 3D avatar with the same pose as the real model. The pose of the 3D avatar may be made the same as that of the real model at this stage, or it may be set as a basic pose and changed to the same pose during the processing in S3, which will be described later.
 The user terminal 11 transmits the 3D clothing data and the 3D avatar to the image processing device 10, and the image processing device 10 stores the received 3D clothing data and 3D avatar in the 3D clothing data 107 and the 3D avatar 108, respectively.
 (S3: Processing for generating 3D clothed avatars and a clothing image for synthesis)
 The image processing device 10 provides the user terminal 11 with an application for generating 3D clothed avatars and a clothing image for synthesis. In response to instructions from the user terminal 11, the image processing device 10 outputs a 2D image of the clothing alone (the clothing image for synthesis) and two kinds of 3D clothed avatars for use in the compositing processing.
 In response to an instruction from the user terminal 11, the image processing device 10 reads the 3D clothing data from the 3D clothing data 107 and the 3D avatar from the 3D avatar 108. In response to an instruction from the user terminal 11, the image processing device 10 can further read a model image from the live-action photography data 106 and change the pose of the 3D avatar to match the pose of the read model image. In response to an instruction from the user terminal 11, the image processing device 10 superimposes the 3D clothing data on the 3D avatar in a 3DCG (computer graphics) space, performs predetermined position calculations to adjust the size and placement of the 3D clothing data, and places the 3D clothing data at the appropriate position on the 3D avatar. Through this processing, the 3D avatar is dressed with the 3D clothing data.
 In response to an instruction from the user terminal 11, the image processing device 10 executes, in the 3DCG space, a cloth simulation of the 3D clothing data on the 3D avatar dressed with the 3D clothing data, in accordance with the body shape and pose of the 3D avatar, and stores the resulting first 3D clothed avatar in the 3D clothed avatar 109.
 The image processing device 10 applies the camera setting data and lighting setting data read from the live-action photography data 106 to the first 3D clothed avatar in the 3DCG space, executes 3D rendering processing using predetermined shader setting parameters, and stores the resulting second 3D clothed avatar in the 3D clothed avatar 109.
 The image processing device 10 generates a 2D clothing image (also referred to as the "clothing image for synthesis") based on the 3D clothing data with the 3D avatar excluded, and stores it in the clothing image for synthesis 110.
 (S4: Processing for generating face and hand images for synthesis)
 The user uses any application to generate mask images of the face and hands from the model image for synthesis, either mechanically by a method such as binarization or manually. That is, in response to a mask image generation instruction from the user terminal 11, the image processing device 10 generates mask images of the face and hands from the model image for synthesis received from the imaging device 13. The image processing device 10 extracts edge information from the previously generated face and hand mask images using any filter. Because the described embodiment uses tops as the example type of clothing, the parts of the human body that remain visible in the clothed state (exposed human body parts) are the face and/or hands; it should be understood that for other types of clothing, the exposed human body parts may differ according to the type (for example, feet and/or ankles in the case of bottoms).
 The image processing device 10 extracts key points of the face and hands from the model image for synthesis and sets a search range using the extracted key points as a bounding box. The image processing device 10 searches for edges while moving from one side of the face or hand toward the other side (for example, from the left contour to the right contour), and continues the edge search until it reaches the other side of the face or hand and turns back. The image processing device 10 extracts the image of the entire face and hands of the model image for synthesis from the range of the searched edges.
 The image processing device 10 executes depth information extraction processing to extract depth information for the first 3D clothed avatar.
 The image processing device 10 extracts edge information from the clothing image for synthesis using any filter. If the edges of the clothing image for synthesis were extracted as they are, wrinkles and other texture features would become edge noise, so the image processing device 10 can extract the edge information using the depth information.
 The image processing device 10 combines the edge information of the face and hand mask images with the edge information of the clothing image for synthesis, and, based on the combined result, divides the entire face and hands of the model image for synthesis into several regions. For each divided region of the face and hands of the model image for synthesis, the image processing device 10 calculates the matching rate with the corresponding region of the second 3D clothed avatar and extracts the face and hand images to be finally composited (the face and hand images for synthesis).
 (S5: Shade image generation processing)
 The image processing device 10 applies the camera setting data and lighting setting data used when the model image for synthesis was captured to the first 3D clothed avatar read from the 3D clothed avatar 109, generates the shade information produced when rendering is executed as a shade image, and stores it in the 3D clothed avatar 109.
 (S6: Clothing composition processing that generates the final clothed image)
 The image processing device 10 executes a first clothing composition process that superimposes the clothing image for synthesis on the model image for synthesis and outputs a first clothed model image. The image processing device 10 then executes a second clothed image generation process that generates the final clothed model image by superimposing the face and hand images for synthesis on the first clothed model image and further superimposing the shade image.
 (System configuration of the image processing device 10)
 Next, the system configuration of the image processing device 10 is described. FIG. 3 is a system configuration diagram of the image processing device 10 according to the embodiment of the present invention. The image processing device 10 may be configured to reside on a cloud system or on an in-house network. As shown in FIG. 3, like a general computer, the image processing device 10 includes a control unit 101, a main storage unit 102, an auxiliary storage unit 103, an interface (IF) unit 104, and an output unit 105, which are interconnected by a bus 120 or the like. The auxiliary storage unit 103 stores programs that implement the functions of the image processing device 10 and the data handled by those programs. The auxiliary storage unit 103 holds, in the form of files or databases, the live-action photography data 106, the 3D clothing data 107, the 3D avatar 108, the 3D clothed avatar 109, and the clothing image for synthesis 110. The image processing device 10 can read and update the information stored in the live-action photography data 106, the 3D clothing data 107, the 3D avatar 108, the 3D clothed avatar 109, and the clothing image for synthesis 110. Each program stored in the auxiliary storage unit 103 is executed by the image processing device 10.
 The control unit 101, also called a central processing unit (CPU), controls the components of the image processing device 10 and performs data operations, and reads the various programs stored in the auxiliary storage unit 103 into the main storage unit 102 and executes them. The main storage unit 102, also called main memory, stores received data, computer-executable instructions, and the data resulting from processing with those instructions. The auxiliary storage unit 103 is a storage device typified by a hard disk drive (HDD) or a solid state drive (SSD), and stores data and programs over the long term.
 Although the embodiment of FIG. 3 describes a configuration in which the control unit 101, the main storage unit 102, and the auxiliary storage unit 103 are provided inside the same computer, as another embodiment the image processing device 10 may be configured to realize parallel distributed processing by a plurality of computers by using a plurality of control units 101, main storage units 102, and auxiliary storage units 103. As yet another embodiment, a plurality of servers for the image processing device 10 may be installed, with the plurality of servers sharing a single auxiliary storage unit 103.
 The IF unit 104 serves as an interface for transmitting and receiving data to and from other systems and devices, and also provides an interface for receiving various commands and input data (various masters, tables, etc.) from the system operator. The output unit 105 provides a display screen for displaying processed data, printing means for printing the data, and the like.
 Components similar to the control unit 101, the main storage unit 102, the auxiliary storage unit 103, the IF unit 104, and the output unit 105 are also present in the user terminal 11 and the imaging device 13.
 The live-action photography data 106 stores the model image (2D image data) of the real model, the mask images of the real model's face and hands, and the camera setting data and lighting setting data at the time of the live-action shoot. FIG. 4 is a diagram showing an example of the data structure of the live-action photography data 106 according to the embodiment of the present invention. The live-action photography data 106 can include a live-action shooting ID 401, a model image 402, a mask image 403, camera setting data 404, and lighting setting data 405, but is not limited to these data items and can also include other data items.
 The live-action shooting ID 401 is an identifier that identifies the model at the time of the live-action shoot and the data associated with that model. The model image 402 is the 2D model image data of the real model, also called the "model image for synthesis." The mask image 403 is the mask image of the model's face and hands generated from the model image for synthesis. The camera setting data 404 indicates the camera settings at the time of the live-action shoot, for example the camera angle and distance. The lighting setting data 405 indicates the lighting settings at the time of the live-action shoot, for example the brightness.
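 Purely by way of illustration of how one record of the live-action photography data 106 might be held in code, a minimal Python sketch is shown below. The class and field names (LiveActionShot, shot_id, and so on) are hypothetical and are not taken from the embodiment, which only specifies the data items 401 to 405.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LiveActionShot:
    """One record of the live-action photography data 106 (illustrative sketch only)."""
    shot_id: str                 # live-action shooting ID 401
    model_image: np.ndarray      # model image 402 (H x W x 3), the model image for synthesis
    mask_image: np.ndarray       # mask image 403 (H x W), face/hand mask
    camera_settings: dict        # camera setting data 404, e.g. {"angle_deg": 5.0, "distance_m": 3.2}
    lighting_settings: dict      # lighting setting data 405, e.g. {"brightness": 0.8}
```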
 Returning to FIG. 3, the 3D clothing data 107 stores the 3D clothing data generated by the user. The 3D clothing data may be stored in association with attribute information (for example, clothing category, color, and shape) to facilitate image selection.
 The 3D avatar 108 stores the data of the 3D avatar generated by the user. The 3D avatar is created by the user so as to take the same pose as the model image at the time of the live-action shoot. FIG. 5 is a diagram showing an example of the data structure of the 3D avatar 108 according to the embodiment of the present invention. The 3D avatar 108 can include a 3D avatar ID 501, a 3D avatar 502, and a live-action shooting ID 401, but is not limited to these data items and can also include other data items.
 The 3D avatar ID 501 is an identifier that identifies the 3D avatar. The 3D avatar 502 is the data of the 3D avatar. The live-action shooting ID 401 is an identifier that identifies the live-action shoot associated with the 3D avatar. Through the live-action shooting ID 401, the pose, mask images, and camera and lighting setting data of the corresponding live-action model are associated with the 3D avatar.
 Returning to FIG. 3, the 3D clothed avatar 109 stores image data of 3D clothed avatars obtained by superimposing the 3D clothing data on the 3D avatar and applying predetermined processing. FIG. 6 is a diagram showing an example of the data structure of the 3D clothed avatar 109 according to the embodiment of the present invention. The 3D clothed avatar 109 can include a 3D clothed avatar ID 601, a first 3D clothed avatar 602, a second 3D clothed avatar 603, shade information 604, a shade image 605, a 3D avatar ID 501, and a live-action shooting ID 401, but is not limited to these data items and can also include other data items.
 The 3D clothed avatar ID 601 is an identifier that identifies a 3D clothed avatar generated by the image processing device 10. The first 3D clothed avatar 602 is the image data of the 3D clothed avatar after the 3D cloth simulation. The second 3D clothed avatar 603 is the image data of the 3D clothed avatar obtained by applying the camera setting data and lighting setting data to the first 3D clothed avatar and performing the 3D rendering processing. The shade information 604 and the shade image 605 are, respectively, the shade information and the shade image generated when rendering is executed on the first 3D clothed avatar with the camera setting data and lighting setting data used when the model image for synthesis was captured. The 3D avatar ID 501 is an identifier specifying the 3D avatar from which the 3D clothed avatar was generated, and the live-action shooting ID 401 is an identifier identifying the live-action shoot associated with that 3D avatar. The 3D avatar ID 501 and the live-action shooting ID 401 make it easy to retrieve the various setting data recorded when the real model was photographed.
 Returning to FIG. 3, the clothing image for synthesis 110 stores 2D clothing data generated based on the 3D clothing data of the second 3D clothed avatar. FIG. 7 is a diagram showing an example of the data structure of the clothing image for synthesis 110 according to the embodiment of the present invention. The clothing image for synthesis 110 can include a clothing image for synthesis ID 701, a clothing image for synthesis 702, a live-action shooting ID 401, and a 3D clothed avatar ID 601, but is not limited to these data items and can also include other data items.
 The clothing image for synthesis ID 701 is an identifier that identifies the 2D clothing image data used in the image synthesis processing according to the embodiment of the present invention. The clothing image for synthesis 702 is the 2D clothing image data used in the image synthesis processing. The live-action shooting ID 401 is an identifier identifying the live-action shoot associated with the 3D clothed avatar from which the clothing image for synthesis was generated. The 3D clothed avatar ID 601 is the identifier of the 3D clothed avatar associated with the 3D clothing data that is the source of the clothing image for synthesis.
 (Description of the various processing flows)
 With reference to FIGS. 8 to 11, the processing flows by which the image processing device 10 generates the final clothed model image using the clothing image for synthesis (2D), the model image for synthesis (2D), the various setting data, the 3D avatar, and the 3D clothing data are described. FIGS. 8 to 11 show the processing of S3 to S6 in FIG. 2, respectively. Either of S4 and S5 may be performed first.
 (S3: Processing for generating 3D clothed avatars and a clothing image for synthesis)
 FIG. 8 shows the processing flow in which the image processing device 10 generates the 3D clothed avatars and the clothing image for synthesis using the data generated by the processing described above with reference to S1 and S2 of FIG. 2, namely the model image for synthesis, the various setting data, the 3D avatar, and the 3D clothing data.
 In the processing of FIG. 8, the image processing device 10 provides the user terminal 11 with an application for generating the 3D clothed avatar images and performs the processing based on user instructions received via the user terminal 11.
 In S801, the user terminal 11 selects, through the provided application, the model image for synthesis, the 3D clothing data, and the 3D avatar to be processed, and transmits a selection instruction to the image processing device 10. In response to the selection instruction from the user terminal 11, the image processing device 10 reads the selected model image from the live-action photography data 106, the 3D clothing data from the 3D clothing data 107, and the selected 3D avatar from the 3D avatar 108. The image processing device 10 can change the pose of the 3D avatar so as to match the pose of the read model image. Through this processing, the pose of the model image and the pose of the 3D avatar match, the model image and the 3D avatar are associated with each other, and the selected live-action shooting ID 401 is stored in the 3D avatar 108.
 The user terminal 11 transmits to the image processing device 10 a placement instruction for placing the 3D clothing data at the appropriate position on the 3D avatar in the application's 3DCG space. In response to the placement instruction from the user terminal 11, the image processing device 10 superimposes the 3D clothing data on the 3D avatar in the 3DCG space, performs predetermined position calculations to adjust the size and placement of the 3D clothing data, and places the 3D clothing data at the appropriate position on the 3D avatar. Through this processing, the 3D avatar is dressed with the 3D clothing data.
 In S802, the user terminal 11 transmits to the image processing device 10 a simulation instruction for executing a cloth simulation on the 3D avatar dressed with the 3D clothing data. Cloth simulation refers to a technique that physically simulates the movement of cloth such as clothing. For example, physics calculations are performed on the cloth, such as simulating the wrinkles that form in the clothing when the 3D avatar wears it.
 In response to the simulation instruction from the user terminal 11, the image processing device 10 executes, in the 3DCG space, a cloth simulation of the 3D clothing data on the 3D avatar dressed with the 3D clothing data, in accordance with the body shape and pose of the 3D avatar, and stores the cloth-simulated 3D clothed avatar in the first 3D clothed avatar 602 of the 3D clothed avatar 109.
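 The embodiment does not specify a particular cloth solver; purely as an illustration of the kind of physics calculation involved, the sketch below advances a toy mass-spring cloth model by one explicit time step. All function names, constants, and the choice of integrator are assumptions made for illustration, not the disclosed simulator.

```python
import numpy as np

def cloth_step(positions, velocities, springs, rest_lengths,
               dt=0.016, stiffness=500.0, damping=0.98, gravity=(0.0, -9.8, 0.0)):
    """One explicit integration step of a toy mass-spring cloth (illustrative sketch only).

    positions, velocities: (N, 3) arrays of particle states.
    springs: (M, 2) array of particle index pairs; rest_lengths: (M,) array.
    """
    forces = np.tile(np.asarray(gravity, dtype=np.float64), (positions.shape[0], 1))
    i, j = springs[:, 0], springs[:, 1]
    delta = positions[j] - positions[i]                      # spring vectors
    length = np.linalg.norm(delta, axis=1, keepdims=True) + 1e-9
    # Hooke's law along each spring, applied with opposite signs to the two endpoints.
    f = stiffness * (length - rest_lengths[:, None]) * (delta / length)
    np.add.at(forces, i, f)
    np.add.at(forces, j, -f)
    velocities = (velocities + dt * forces) * damping         # unit particle mass assumed
    positions = positions + dt * velocities
    return positions, velocities
```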
 In S803, the image processing device 10 reads from the live-action photography data 106 the camera setting data and lighting setting data associated with the model image selected in S801. In the application's 3DCG space, the image processing device 10 applies the read camera setting data and lighting setting data to the cloth-simulated 3D clothed avatar (the first 3D clothed avatar), executes 3D rendering processing using predetermined shader setting parameters, and stores the rendered 3D clothed avatar (the second 3D clothed avatar) in the second 3D clothed avatar 603 of the 3D clothed avatar 109.
 In S804, the image processing device 10 extracts the 3D clothing data from the rendered 3D clothed avatar (the second 3D clothed avatar) by excluding the 3D avatar. The image processing device 10 generates a 2D clothing image (referred to herein as the "clothing image for synthesis") based on the extracted 3D clothing data and stores it in the clothing image for synthesis 110.
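 The embodiment leaves the exact mechanism of isolating the clothing open; one simple reading, sketched below under that assumption, is to keep the rendered pixels only where clothing geometry was rasterized and make everything else transparent, so that the result can later be laid over the model image. The function name and the idea of a precomputed clothing mask are illustrative, not part of the disclosure.

```python
import numpy as np

def clothing_only_rgba(rendered_rgb, clothing_mask):
    """Build a 2D clothing image for synthesis from a rendered frame (illustrative sketch only).

    rendered_rgb: (H, W, 3) uint8 render of the second 3D clothed avatar.
    clothing_mask: (H, W) bool array, True where a clothing surface was rasterized.
    Returns an (H, W, 4) RGBA image whose non-clothing pixels are fully transparent.
    """
    h, w, _ = rendered_rgb.shape
    rgba = np.zeros((h, w, 4), dtype=np.uint8)
    rgba[..., :3] = rendered_rgb
    rgba[..., 3] = np.where(clothing_mask, 255, 0).astype(np.uint8)
    return rgba
```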
 (S4: Processing for generating face and hand images for synthesis)
 FIG. 9 shows the processing flow in which the image processing device 10 uses the face and hand mask images, the clothing image for synthesis, and the 3D clothed avatars to generate face and hand images for synthesis in which the front-to-back relationship between the clothing and the human body has already been determined. As a premise of this processing flow, the image processing device 10 is assumed to have communicated with the user terminal 11 through any application and generated the face and hand mask images from the model image for synthesis. In this specification, the term "hand" is used to denote any of the portion of the human body from the shoulder to the fingertips, the wrist, the palm, and the fingers, and what it covers may vary depending on the design of the clothing.
 In S901, the image processing device 10 extracts key points of the face and hands from the model image for synthesis associated with the live-action shooting ID 401 to be processed, and sets a search range using the extracted key points as a bounding box. The image processing device 10 searches for edges while moving from one side of the face or hand toward the other side (for example, from the left contour to the right contour), and continues the edge search until it reaches the other side of the face or hand and turns back. The image processing device 10 extracts the image of the entire face and hands of the model image for synthesis from the range of the searched edges.
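 A minimal sketch of this step is given below, assuming the key points are already available as pixel coordinates. Scanning each row of the bounding box from one side to the other and keeping everything between the outermost edge pixels is one possible reading of the described search; the helper name and the edge filter thresholds are assumptions.

```python
import numpy as np
import cv2

def extract_part_region(image_gray, keypoints):
    """Extract the whole face/hand region inside a keypoint bounding box (illustrative sketch only).

    image_gray: (H, W) uint8 model image for synthesis, grayscale.
    keypoints: (K, 2) array of (x, y) key points of the face or hand.
    Returns a boolean (H, W) mask of the extracted region.
    """
    x0, y0 = keypoints.min(axis=0).astype(int)     # bounding box from the key points
    x1, y1 = keypoints.max(axis=0).astype(int)
    edges = cv2.Canny(image_gray, 50, 150)         # any edge filter may be used
    region = np.zeros(image_gray.shape, dtype=bool)
    for y in range(y0, y1 + 1):
        xs = np.flatnonzero(edges[y, x0:x1 + 1])   # edge pixels along this row
        if xs.size >= 2:
            # keep the span between the outermost edges (one side to the other and back)
            region[y, x0 + xs[0]: x0 + xs[-1] + 1] = True
    return region
```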
 The image processing device 10 extracts edge information from the previously generated face and hand mask images using any filter. The upper part of FIG. 12(a) illustrates extracting edge information from the face and hand mask images.
 In S902, the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109 and executes depth information extraction processing on the read first 3D clothed avatar 602. Through this processing, the image processing device 10 can acquire depth information for each position of the first 3D clothed avatar 602. The depth information makes it possible to distinguish wrinkles from contour lines in the clothing.
 In S903, the image processing device 10 extracts edge information from the clothing image for synthesis using any filter. If the edge information of the clothing image for synthesis were extracted as is, wrinkles caused by the clothing texture could become noise, so the image processing device 10 can extract the edge information of the clothing image for synthesis using the depth information acquired in S902. The lower part of FIG. 12(a) illustrates extracting edge information from the clothing image for synthesis.
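 As a sketch of why the depth information suppresses wrinkle noise, the example below runs the same kind of edge filter on a normalized depth map instead of the textured clothing image: shallow texture wrinkles barely change the depth, while silhouette and overlap boundaries do. The normalization and thresholds are illustrative assumptions.

```python
import numpy as np
import cv2

def clothing_edges_from_depth(depth_map, low=30, high=100):
    """Edge information of the clothing image for synthesis via depth (illustrative sketch only).

    depth_map: (H, W) float array of per-pixel depth of the first 3D clothed avatar.
    Returns a uint8 edge map in which texture wrinkles are largely suppressed.
    """
    d = depth_map.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-9)   # normalize to [0, 1]
    depth_u8 = (d * 255).astype(np.uint8)
    return cv2.Canny(depth_u8, low, high)            # contour/overlap edges remain
```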
 Note that the order of the processing of S901 and the processing of S902 and S903 is not particularly limited; either may be performed first. That is, the processing of S902 and S903 may follow the processing of S901, or the processing of S901 may follow the processing of S902 and S903. Alternatively, the two may be processed in parallel.
 In S904, the image processing device 10 combines the edge information of the clothing image for synthesis with the edge information of the face and hand mask images. FIG. 12(b) shows an image in which the two sets of edge information are combined.
 In S905, based on the edge information combined in S904, the image processing device 10 divides the image of the entire face and hands of the model image for synthesis extracted in S901 into regions bounded by edges. FIG. 13(a) is an example showing the hand portion of the model image for synthesis divided into several regions bounded by edges.
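 One way to realize S904 and S905, sketched below as an assumption rather than the disclosed implementation, is to take the union of the two edge maps and then label the connected components of the non-edge pixels inside the extracted face/hand region.

```python
import numpy as np
import cv2

def split_into_edge_bounded_regions(mask_edges, clothing_edges, part_region):
    """Divide the face/hand region into regions bounded by edges (illustrative sketch only).

    mask_edges, clothing_edges: (H, W) uint8 edge maps (outputs of S901/S903).
    part_region: (H, W) bool mask of the whole face/hand image from S901.
    Returns an (H, W) int32 label map; label 0 marks edges or background.
    """
    combined = cv2.bitwise_or(mask_edges, clothing_edges)          # S904: merge the edge information
    fillable = (combined == 0) & part_region                        # pixels not lying on any edge
    num_labels, labels = cv2.connectedComponents(fillable.astype(np.uint8))
    return labels                                                    # S905: edge-bounded regions
```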
 In S906, the image processing device 10 reads the second 3D clothed avatar 603 from the 3D clothed avatar 109 and compares, region by region, the corresponding portions of the read second 3D clothed avatar 603 and the image of the entire face and hands of the model image for synthesis that has been divided into regions (for example, the thumbs of the two left hands). The image processing device 10 calculates the matching rate between the two and determines that a portion whose matching rate is equal to or greater than a predetermined threshold (X) is a portion that is actually visible. Based on the determination result for each region, the image processing device 10 treats each region of the image of the entire face and hands of the model image for synthesis as either a visible portion or an invisible portion, and extracts the face and hand images to be finally composited. In the example of FIG. 13(b), the thumb portion of the left hand had a matching rate below the threshold (X), so the image processing device 10 does not include the image of this thumb portion in the face and hand images to be finally composited.
 The threshold (X) can be varied based on the depth information. For this reason, the image processing device 10 can change the threshold (X) for each position based on the depth information for each position acquired in S902. Accordingly, the value of the threshold (X) can differ for each edge-bounded region.
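 The embodiment does not specify how the matching rate is computed; the sketch below assumes a simple pixel-overlap ratio between each region of the model image and the corresponding body-part pixels of the second 3D clothed avatar, with a threshold (X) that is shifted per region according to its mean depth. The overlap measure, the threshold formula, and all parameter names are assumptions made for illustration.

```python
import numpy as np

def visible_region_labels(labels, avatar_skin_mask, depth_map,
                          base_threshold=0.6, depth_gain=0.2):
    """Decide per region whether the body part is actually visible (illustrative sketch only).

    labels: (H, W) int label map of edge-bounded regions (0 = edges/background).
    avatar_skin_mask: (H, W) bool mask of body-part pixels in the second 3D clothed avatar.
    depth_map: (H, W) float depth of the first 3D clothed avatar, normalized to [0, 1].
    Returns the set of region labels judged visible (matching rate >= threshold X).
    """
    visible = set()
    for label in range(1, labels.max() + 1):
        region = labels == label
        area = region.sum()
        if area == 0:
            continue
        match_rate = (region & avatar_skin_mask).sum() / area      # overlap with the avatar's exposed part
        threshold_x = base_threshold + depth_gain * depth_map[region].mean()
        if match_rate >= threshold_x:                               # region is visible over the clothing
            visible.add(label)
    return visible
```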
 In S907, the image processing device 10 generates the face and hand images for synthesis based on the face and hand images to be finally composited that were extracted for each region in S906. FIG. 13(b) shows examples of the hand image produced by a conventional technique and by the present invention. As shown in FIG. 13(b), a conventional general image synthesis process does not perform the matching determination described above, so the thumb portion remains visible. In the actual pose, the thumb is hidden behind a fold of the clothing and should not be visible, so the result is an unnatural image. In contrast, when the matching determination according to the present invention is performed, the thumb portion is hidden behind the fold of the clothing and is not visible. The image processing device 10 determines that the matching rate for this thumb portion is below the threshold (X), judges it to be an invisible portion, and does not include it in the face and hand images for synthesis.
 (S5: Shade image generation processing)
 FIG. 10 shows the processing flow in which the image processing device 10 generates a shade image based on the shade information produced when rendering is executed on the cloth-simulated 3D clothed avatar (the first 3D clothed avatar) while reflecting the camera setting data and lighting setting data used when the model image for synthesis was captured.
 In S1001, the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109 based on the live-action shooting ID 401 to be processed. The image processing device 10 also queries the live-action photography data 106 based on that live-action shooting ID 401 and reads the corresponding camera setting data 404 and lighting setting data 405.
 In S1002, the image processing device 10 executes rendering on the read first 3D clothed avatar 602 while reflecting the corresponding camera setting data 404 and lighting setting data 405, and performs shading by calculating which surfaces are and are not lit.
 In S1003, the image processing device 10 generates a shade image based on the shade information resulting from the shading calculation. The image processing device 10 stores the shade information and the shade image in the shade information 604 and the shade image 605 of the 3D clothed avatar 109, respectively.
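 Purely as an illustration of the shading calculation in S1002 and S1003, the sketch below computes Lambertian shading from per-pixel surface normals and a light direction taken from the lighting settings, and keeps the darkened side as a shade image. The actual renderer and shader setting parameters of the embodiment are not specified, so everything here is an assumption.

```python
import numpy as np

def make_shade_image(normal_map, light_dir, brightness=1.0):
    """Generate a shade image from per-pixel normals (illustrative sketch only).

    normal_map: (H, W, 3) float array of unit surface normals of the first 3D clothed avatar.
    light_dir: length-3 light direction derived from the lighting setting data 405.
    Returns shade, an (H, W) float array in [0, 1]; larger values mean deeper shade.
    """
    l = np.asarray(light_dir, dtype=np.float32)
    l = l / (np.linalg.norm(l) + 1e-9)
    lit = np.clip(normal_map @ l, 0.0, 1.0) * brightness   # Lambertian term: how strongly each pixel is lit
    shade = 1.0 - np.clip(lit, 0.0, 1.0)                    # shade information = lack of light
    return shade
```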
 (S6: Clothing composition processing that generates the final clothed image)
 FIG. 11 shows the processing flow in which the image processing device 10 generates the final clothed image by executing a first clothing composition process that superimposes the clothing image for synthesis on the model image for synthesis associated with the live-action shooting ID 401 to be processed and outputs a first clothed model image, and a second clothing composition process that superimposes the face and hand images for synthesis on the first clothed model image output by the first clothing composition process, and further superimposes the shade image, to generate the final clothed model image.
 In S1101, the image processing device 10 executes the first clothing composition process. More specifically, the image processing device 10 reads the model image 402 from the live-action photography data 106 based on the live-action shooting ID 401 to be processed, queries the clothing image for synthesis 110 using that live-action shooting ID 401, and reads the clothing image for synthesis 702. The image processing device 10 superimposes the clothing image for synthesis on the model image for synthesis to generate the first clothed model image.
 In S1102, the image processing device 10 executes the second clothing composition process. More specifically, the image processing device 10 generates the final clothed model image by superimposing the face and hand images for synthesis on the generated first clothed model image and further superimposing the shade image. The image processing device 10 provides the generated final clothed model image to the user terminal 11.
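 The layering order described for S1101 and S1102 can be summarized as a sequence of alpha-over operations; the sketch below assumes RGBA layers and a multiplicative application of the shade image, which is one plausible reading rather than a formula disclosed by the embodiment.

```python
import numpy as np

def alpha_over(dst_rgb, src_rgba):
    """Composite an RGBA layer over an RGB image (illustrative sketch only)."""
    alpha = src_rgba[..., 3:4].astype(np.float32) / 255.0
    return (src_rgba[..., :3] * alpha + dst_rgb * (1.0 - alpha)).astype(np.uint8)

def compose_final_image(model_image, clothing_rgba, face_hand_rgba, shade):
    """S1101 + S1102: model image -> clothing -> face/hand -> shade (illustrative sketch only)."""
    first_clothed = alpha_over(model_image, clothing_rgba)            # first clothed model image
    with_parts = alpha_over(first_clothed, face_hand_rgba)            # add the visible face/hand portions
    final = with_parts.astype(np.float32) * (1.0 - 0.5 * shade[..., None])  # darken by the shade image
    return np.clip(final, 0, 255).astype(np.uint8)
```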
 (Advantages of the present invention)
 The processing described above enables the image processing device 10 to perform the image synthesis processing while estimating the front-to-back relationship between the person and the clothing more precisely. The present invention solves problems such as the low accuracy of output images that arise when the front-to-back relationship between the person and the clothing is difficult to determine, for example a clothing region that should appear over the person becoming invisible.
 (Other embodiments)
 Although a human has been used as the example above, the principles of the present invention are also applicable to animals other than humans. Clothing for animals kept as pets has recently gone on sale. By using the principles of the present invention to create composite images of such pet clothing as well, the composite images can be used for advertising and marketing.
 Also, although the human face and hands have been used as the example above, depending on the type of clothing, the principles of the present invention can be applied to parts of the human body other than the face and hands to generate composite images. For example, the parts of the human body that remain visible in the clothed state differ between tops and bottoms. In the case of tops, the exposed human body parts may be the face and/or hands, and in the case of bottoms, the exposed human body parts may be the feet and/or ankles. In this specification, "exposed human body part" denotes the face, hands, feet, ankles, and so on, depending on the type of clothing.
 Furthermore, although the principles of the present invention have been described above with the human body and clothing as the target objects, the target objects are not limited to the human body and clothing. For example, the target objects may be a human body and a vehicle (a car, a motorcycle, a bicycle, etc.). The number of objects may also be three or more. In relation to the above example, it is also possible to generate a composite image with an accessory, a bag, or another small item as a third object. That is, the present invention can increase the accuracy of the output composite image even when the front-to-back relationships among a plurality of objects are intricately intertwined.
 Although the principles of the present invention have been described above with reference to exemplary embodiments, those skilled in the art will understand that various embodiments with changes in configuration and detail can be realized without departing from the gist of the present invention. That is, the present invention can be embodied as, for example, a system, a device, a method, a program, a storage medium, or the like.
 1 Image processing system
 10 Image processing device
 11 User terminal
 12 3D scanner
 13 Imaging device
 14 Network
 101 Control unit
 102 Main storage unit
 103 Auxiliary storage unit
 104 Interface (IF) unit
 105 Output unit
 106 Live-action photography data
 107 3D clothing data
 108 3D avatar
 109 3D clothed avatar
 110 Clothing image for synthesis

Claims (7)

  1.  An image processing device comprising:
     means for generating a first 3D clothed avatar, a second 3D clothed avatar, and a clothing image for synthesis, using setting data associated with a model image for synthesis, a 3D avatar, and 3D clothing data;
     means for generating an exposed-human-body-part image for synthesis in which the front-to-back relationship between clothing and the human body has been determined, by dividing an image of the entire exposed human body part of the model image for synthesis into regions bounded by edges based on edge information of a mask image of the exposed human body part associated with the model image for synthesis and edge information of the clothing image for synthesis, and calculating, for each of the divided regions, a matching rate between corresponding portions of the second 3D clothed avatar and the image of the entire exposed human body part of the model image for synthesis divided into the regions;
     means for generating a shade image produced when rendering is executed on the first 3D clothed avatar while reflecting the setting data associated with the model image for synthesis;
     means for superimposing the clothing image for synthesis on the model image for synthesis and outputting a first clothed model image; and
     means for generating a final clothed model image by superimposing the exposed-human-body-part image for synthesis on the first clothed model image and further superimposing the shade image.
  2.  The image processing device according to claim 1, wherein the means for generating the exposed-human-body-part image for synthesis in which the front-to-back relationship between clothing and the human body has been determined further comprises:
     means for determining, when the calculated matching rate is equal to or greater than a threshold, that the corresponding portion of the image of the entire exposed human body part of the model image for synthesis divided into the regions is a visible portion, the visible portion being a portion visible over the clothing.
  3.  The image processing device according to claim 2, further comprising means for acquiring depth information of the first 3D clothed avatar, wherein the edge information of the clothing image for synthesis is extracted using the depth information.
  4.  The image processing device according to claim 3, wherein the threshold associated with each region bounded by edges differs based on the depth information.
  5.  The image processing device according to claim 1, further comprising:
     means for extracting key points of the exposed human body part from the model image for synthesis and setting a search range using the extracted key points as a bounding box;
     means for searching for edges in the model image for synthesis while moving from one side of the exposed human body part toward the other side, and continuing the edge search until reaching the other side of the exposed human body part and turning back; and
     means for extracting the image of the entire exposed human body part of the model image for synthesis from the range of the searched edges.
  6.  The image processing device according to claim 1, wherein, when the type of clothing is a top, the exposed human body part is a face and/or a hand, and when the type of clothing is a bottom, the exposed human body part is a foot and/or an ankle.
  7.  An image processing method executed by an image processing device, the method comprising:
     generating a first 3D clothed avatar, a second 3D clothed avatar, and a clothing image for synthesis, using setting data associated with a model image for synthesis, a 3D avatar, and 3D clothing data;
     generating an exposed-human-body-part image for synthesis in which the front-to-back relationship between clothing and the human body has been determined, by dividing an image of the entire exposed human body part of the model image for synthesis into regions bounded by edges based on edge information of a mask image of the exposed human body part associated with the model image for synthesis and edge information of the clothing image for synthesis, and calculating, for each of the divided regions, a matching rate between corresponding portions of the second 3D clothed avatar and the image of the entire exposed human body part of the model image for synthesis divided into the regions;
     generating a shade image produced when rendering is executed on the first 3D clothed avatar while reflecting the setting data associated with the model image for synthesis;
     superimposing the clothing image for synthesis on the model image for synthesis and outputting a first clothed model image; and
     generating a final clothed model image by superimposing the exposed-human-body-part image for synthesis on the first clothed model image and further superimposing the shade image.
PCT/JP2023/022231 2022-07-08 2023-06-15 Image processing device, and image processing method WO2024009721A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-110527 2022-07-08
JP2022110527A JP2024008557A (en) 2022-07-08 2022-07-08 Image processing device, image processing method, and program

Publications (1)

Publication Number Publication Date
WO2024009721A1 true WO2024009721A1 (en) 2024-01-11

Family

ID=89453185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/022231 WO2024009721A1 (en) 2022-07-08 2023-06-15 Image processing device, and image processing method

Country Status (2)

Country Link
JP (1) JP2024008557A (en)
WO (1) WO2024009721A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09259252A (en) * 1996-03-22 1997-10-03 Hitachi Ltd Picture processing method
JP2017037637A (en) * 2015-07-22 2017-02-16 アディダス アーゲー Method and apparatus for generating artificial picture
US20170372515A1 (en) * 2014-12-22 2017-12-28 Reactive Reality Gmbh Method and system for generating garment model data
US10540757B1 (en) * 2018-03-12 2020-01-21 Amazon Technologies, Inc. Method and system for generating combined images utilizing image processing of multiple images
WO2021063829A1 (en) * 2019-09-30 2021-04-08 Reactive Reality Ag Method and computer program product for processing model data of a set of garments
JP2022530710A (en) * 2020-02-24 2022-06-30 深▲チェン▼市商▲湯▼科技有限公司 Image processing methods, devices, computer equipment and storage media


Also Published As

Publication number Publication date
JP2024008557A (en) 2024-01-19

Similar Documents

Publication Publication Date Title
US10685454B2 (en) Apparatus and method for generating synthetic training data for motion recognition
JP7370527B2 (en) Method and computer program for generating three-dimensional model data of clothing
US10628666B2 (en) Cloud server body scan data system
US7663648B1 (en) System and method for displaying selected garments on a computer-simulated mannequin
CN106373178B (en) Apparatus and method for generating artificial image
US9167155B2 (en) Method and system of spacial visualisation of objects and a platform control system included in the system, in particular for a virtual fitting room
US9639635B2 (en) Footwear digitization system and method
JP2019510297A (en) Virtual try-on to the user's true human body model
Li et al. In-home application (App) for 3D virtual garment fitting dressing room
JP2013235537A (en) Image creation device, image creation program and recording medium
KR20130089649A (en) Method and arrangement for censoring content in three-dimensional images
KR101586010B1 (en) Apparatus and method for physical simulation of cloth for virtual fitting based on augmented reality
KR100828935B1 (en) Method of Image-based Virtual Draping Simulation for Digital Fashion Design
KR20150124518A (en) Apparatus and method for creating virtual cloth for virtual fitting based on augmented reality
WO2018182938A1 (en) Method and system for wireless ultra-low footprint body scanning
WO2024009721A1 (en) Image processing device, and image processing method
JP2012120080A (en) Stereoscopic photography apparatus
JP2017188071A (en) Pattern change simulation device, pattern change simulation method and program
Siegmund et al. Virtual Fitting Pipeline: Body Dimension Recognition, Cloth Modeling, and On-Body Simulation.
JP7388751B2 (en) Learning data generation device, learning data generation method, and learning data generation program
WO2018151612A1 (en) Texture mapping system and method
WO2015144563A1 (en) Image processing system and method
CA2289413C (en) System and method for displaying selected garments on a computer-simulated mannequin
KR101803064B1 (en) Apparatus and method for 3d model reconstruction
JP2023153534A (en) Image processing apparatus, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23835255

Country of ref document: EP

Kind code of ref document: A1