US11068699B2 - Image processing device, image processing method, and telecommunication system to generate an output image for telecommunication - Google Patents

Image processing device, image processing method, and telecommunication system to generate an output image for telecommunication

Info

Publication number
US11068699B2
Authority
US
United States
Prior art keywords
image
user
high fidelity
unit
captured
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US16/609,043
Other languages
English (en)
Other versions
US20200151427A1 (en)
Inventor
Seiji Kimura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp filed Critical Sony Corp
Assigned to SONY CORPORATION. Assignment of assignors interest (see document for details). Assignors: KIMURA, SEIJI
Publication of US20200151427A1 publication Critical patent/US20200151427A1/en
Application granted granted Critical
Publication of US11068699B2 publication Critical patent/US11068699B2/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • G06K9/00281
    • G06K9/00597
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • the present disclosure relates to an image processing device, an image processing method, a program, and a telecommunication system, and particularly relates to an image processing device, an image processing method, a program, and a telecommunication system for achieving more realistic telecommunication.
  • Telecommunication systems have conventionally been used in which users located at remote places can converse as if facing each other.
  • However, since the arrangement positions of a capture device and a display device are limited, the gazes of the users cannot be matched, and eye contact may not be established, for example.
  • Patent Document 1 discloses an image generation method of generating a video of an object subjectively viewed from the front using a plurality of cameras arranged outside and inside a display (behind a semi-transparent display in the case of the semi-transparent display).
  • Patent Document 2 discloses image processing for video conferencing that performs 3D modeling of a face and maps the texture of the face with the orientation of the model rotated to match the gaze, thereby generating a video with a coincident gaze.
  • Patent Document 1 Japanese Patent Application Laid-Open No. 2011-165081
  • Patent Document 2 Japanese PCT National Publication No. 2015-513833
  • However, with the techniques of Patent Documents 1 and 2, in a case of using a large display device, for example, an unnatural image is expected even if image processing is performed using an image obtained by capturing an object with a capture device arranged around the large display device. Therefore, achieving more realistic telecommunication in which the gazes of users coincide with each other has been difficult.
  • the present disclosure has been made in view of such a situation, and enables achievement of more realistic telecommunication.
  • An image processing device includes: a high fidelity display region setting unit configured to set a predetermined region including at least an eye region in which an eye of a first user is captured in an image in which the first user is captured, as a high fidelity display region; a high fidelity image generation unit configured to perform first image generation processing using at least a part of a plurality of captured images having the first user respectively captured by a plurality of capture devices arranged outside a display device, and generate a high fidelity image in which the first user looks captured from a virtual capture position that is obtained by setting a viewpoint position of a second user displayed on the display device as the virtual capture position, the high fidelity image having an appearance with higher fidelity; a low fidelity image generation unit configured to perform second image generation processing using at least a part of the plurality of captured images in each of which the first user is captured, and generate a low fidelity image in which the first user looks captured from the virtual capture position and having lower fidelity than the high fidelity image; and an image superimposing unit configured to superimpose the high fidelity image on the high fidelity display region in the low fidelity image to generate an output image to be output as an image processing result.
  • An image processing method or a program includes the steps of: setting a predetermined region including at least an eye region in which an eye of a first user is captured in an image in which the first user is captured, as a high fidelity display region; performing first image generation processing using at least a part of a plurality of captured images having the first user respectively captured by a plurality of capture devices arranged outside a display device, and generating a high fidelity image in which the first user looks captured from a virtual capture position that is obtained by setting a viewpoint position of a second user displayed on the display device as the virtual capture position, the high fidelity image having an appearance with higher fidelity; performing second image generation processing using at least a part of the plurality of captured images in each of which the first user is captured, and generating a low fidelity image in which the first user looks captured from the virtual capture position and having lower fidelity than the high fidelity image; and superimposing the high fidelity image on the high fidelity display region in the low fidelity image to generate an output image to be output as an image processing result.
  • the predetermined region including at least an eye region in which an eye of a first user is captured in an image in which the first user is captured is set as a high fidelity display region
  • the first image generation processing is performed using at least a part of a plurality of captured images having the first user respectively captured by a plurality of capture devices arranged outside a display device, and the high fidelity image, in which the first user looks captured from a virtual capture position that is obtained by setting a viewpoint position of a second user displayed on the display device as the virtual capture position and which has an appearance with higher fidelity, is generated
  • the second image generation processing is performed using at least a part of the plurality of captured images in each of which the first user is captured, and the low fidelity image in which the first user looks captured from the virtual capture position and having lower fidelity than the high fidelity image is generated
  • the high fidelity image is superimposed on the high fidelity display region in the low fidelity image to generate an output image to be output as an image processing result.
  • a telecommunication system is configured to have a first user-side telecommunication apparatus and a second user-side telecommunication apparatus connected via a network, the first user-side telecommunication apparatus including a first image processing device including at least a high fidelity display region setting unit configured to set a predetermined region including at least an eye region in which an eye of a first user is captured in an image in which the first user is captured, as a high fidelity display region, a high fidelity image generation unit configured to perform first image generation processing using at least a part of a plurality of captured images having the first user respectively captured by a plurality of capture devices arranged outside a display device, and generate a high fidelity image in which the first user looks captured from a virtual capture position that is obtained by setting a viewpoint position of a second user displayed on the display device as the virtual capture position, the high fidelity image having an appearance with higher fidelity, a low fidelity image generation unit configured to perform second image generation processing using at least a part of the plurality of captured images in each of which the first user is captured, and generate a low fidelity image in which the first user looks captured from the virtual capture position and having lower fidelity than the high fidelity image, and an image superimposing unit configured to superimpose the high fidelity image on the high fidelity display region in the low fidelity image to generate an output image to be output as an image processing result.
  • the predetermined region including at least an eye region in which an eye of a first user is captured in an image in which the first user is captured is set as a high fidelity display region
  • the first image generation processing is performed using at least a part of a plurality of captured images having the first user respectively captured by a plurality of capture devices arranged outside a display device, and the high fidelity image, in which the first user looks captured from a virtual capture position that is obtained by setting a viewpoint position of a second user displayed on the display device as the virtual capture position and which has an appearance with higher fidelity, is generated
  • the second image generation processing is performed using at least a part of the plurality of captured images in each of which the first user is captured, and the low fidelity image in which the first user looks captured from the virtual capture position and having lower fidelity than the high fidelity image is generated, and the high fidelity image is superimposed on the high fidelity display region in the low fidelity image to generate an output image to be output as an image processing result.
  • the display image for displaying the first user with a specific size at a specific position is generated from the output image in which the first user is captured on the basis of the viewpoint position of the first user in a three-dimensional space.
  • FIG. 1 is a diagram illustrating a configuration example of an embodiment of a telecommunication system to which the present technology is applied.
  • FIG. 2 is a block diagram illustrating a first configuration example of an image processing unit.
  • FIG. 3 is a block diagram illustrating a configuration example of an object viewpoint information setting unit in FIG. 2 .
  • FIG. 4 is a diagram illustrating an example of characteristic points of each part of a face on an image.
  • FIG. 5 is a diagram for describing corresponding points of three captured images.
  • FIG. 6 is a block diagram illustrating a configuration example of a high fidelity display region setting unit in FIG. 2 .
  • FIGS. 7A and 7B are diagrams for describing a mask image for specifying a high fidelity display region in FIG. 2 .
  • FIG. 8 is a block diagram illustrating a configuration example of a high fidelity image generation unit in FIG. 2 .
  • FIG. 9 is a diagram illustrating an example of a virtual capture position.
  • FIG. 10 is a diagram for describing viewpoint interpolation processing.
  • FIG. 11 is a block diagram illustrating a configuration example of a low fidelity image generation unit in FIG. 2 .
  • FIGS. 12A and 12B are diagrams for describing a person image having an object captured by a capture device arranged on an upper side.
  • FIGS. 13A and 13B are diagrams for describing projective transformation in a case where a virtual capture position is at the same height as an object viewpoint.
  • FIGS. 14A and 14B are diagrams for describing projective transformation in a case where the virtual capture position is higher than the object viewpoint.
  • FIGS. 15A and 15B are diagrams for describing projective transformation in a case where the virtual capture position is lower than the object viewpoint.
  • FIG. 16 is a block diagram illustrating a configuration example of a pseudo gaze coincidence image generation unit in FIG. 2 .
  • FIG. 17 is a block diagram illustrating a configuration example of an encoding unit in FIG. 2 .
  • FIG. 18 is a block diagram illustrating a configuration example of a decoding unit in FIG. 2 .
  • FIG. 19 is a block diagram illustrating a configuration example of a pseudo gaze coincidence image display unit in FIG. 2 .
  • FIG. 20 is a flowchart for describing processing of outputting a pseudo gaze coincidence image in which a principal user is captured.
  • FIG. 21 is a flowchart for describing processing of displaying a pseudo gaze coincidence image in which the other party's user is captured.
  • FIG. 22 is a block diagram illustrating a second configuration example of the image processing unit.
  • FIG. 23 is a block diagram illustrating a third configuration example of the image processing unit.
  • FIG. 24 is a diagram illustrating an example of object viewpoint information set in a fixed manner.
  • FIG. 25 is a block diagram illustrating a configuration example of a high fidelity display region setting unit in FIG. 23 .
  • FIG. 26 is a block diagram illustrating a configuration example of an encoding unit in FIG. 23 .
  • FIG. 27 is a block diagram illustrating a configuration example of a decoding unit in FIG. 23 .
  • FIG. 28 is a block diagram illustrating a configuration example of a pseudo gaze coincidence image display unit in FIG. 23 .
  • FIG. 29 is a diagram for describing a geometric correction parameter including a scaling component.
  • FIG. 30 is a block diagram illustrating a fourth configuration example of the image processing unit.
  • FIGS. 31A and 31B illustrate PTZ control by a capture means control unit.
  • FIG. 32 is a block diagram illustrating a fifth configuration example of the image processing unit.
  • FIG. 33 is a block diagram illustrating a configuration example of an object viewpoint information setting unit in FIG. 32 .
  • FIG. 34 is a block diagram illustrating a configuration example of a high fidelity display region setting unit in FIG. 32 .
  • FIGS. 35A and 35B are diagrams for describing a high fidelity display region set to avoid a portion where a rim of glasses exists from a face.
  • FIG. 36 is a block diagram illustrating a sixth configuration example of the image processing unit.
  • FIG. 37 is a block diagram illustrating a configuration example of a pseudo gaze coincidence image generation unit in FIG. 36 .
  • FIGS. 38A and 38B are diagrams for describing determination processing by a high fidelity determination unit.
  • FIG. 39 is a block diagram illustrating a seventh example of the image processing unit.
  • FIG. 40 is a block diagram illustrating a configuration example of a low fidelity image generation parameter generation unit in FIG. 39 .
  • FIG. 41 is a block diagram illustrating a configuration example of an encoding unit in FIG. 39 .
  • FIG. 42 is a block diagram illustrating a configuration example of a decoding unit in FIG. 39 .
  • FIG. 43 is a block diagram illustrating a configuration example of a low fidelity image generation unit in FIG. 39 .
  • FIG. 44 is a block diagram illustrating an eighth example of the image processing unit.
  • FIG. 45 is a block diagram illustrating a configuration example of a pseudo gaze coincidence image generation unit in FIG. 44 .
  • FIG. 46 is a diagram for describing removal of a signal interfering with gaze coincidence.
  • FIG. 47 is a block diagram illustrating a configuration example of an interference signal removal unit in FIG. 45 .
  • FIG. 48 is a diagram illustrating an example of a blend ratio of a low fidelity image.
  • FIG. 49 is a block diagram illustrating a ninth example of the image processing unit.
  • FIG. 50 is a diagram for describing deviation of a gaze in a perception direction.
  • FIG. 51 is a block diagram illustrating a configuration example of a high fidelity image generation unit in FIG. 49 .
  • FIG. 52 is a block diagram illustrating a configuration example of a pseudo gaze coincidence image display unit in FIG. 49 .
  • FIG. 53 is a diagram for describing viewpoint interpolation position.
  • FIGS. 54A, 54B, and 54C are diagrams illustrating examples of an upward correction amount, a leftward correction amount, and a rightward correction amount.
  • FIG. 55 is a diagram for describing a perception direction of a gaze after correction.
  • FIG. 56 is a block diagram illustrating a tenth example of the image processing unit.
  • FIG. 57 is a block diagram illustrating a configuration example of an object viewpoint information setting unit in FIG. 56 .
  • FIG. 58 is a block diagram illustrating a configuration example of a pseudo gaze coincidence image generation unit in FIG. 56 .
  • FIG. 59 is a block diagram illustrating a configuration example of a catch light emphasizing unit in FIG. 58 .
  • FIG. 60 is a diagram for describing detection of a pupil region.
  • FIGS. 61A and 61B are diagrams illustrating an example of luminance occurrence probability distribution in the pupil region.
  • FIGS. 62A, 62B, and 62C are diagrams illustrating an example of parameters used in catch light emphasizing processing.
  • FIG. 63 is a block diagram illustrating an eleventh example of the image processing unit.
  • FIG. 64 is a block diagram illustrating a configuration example of a pseudo gaze coincidence image display unit in FIG. 63 .
  • FIG. 65 is a diagram for describing an error between a gaze direction of eyes and face orientation.
  • FIGS. 66A and 66B are diagrams illustrating examples of upward and downward correction amounts and rightward and leftward correction amounts.
  • FIG. 67 is a block diagram illustrating a configuration example of an embodiment of a computer to which the present technology is applied.
  • FIG. 1 is a diagram illustrating a configuration example of an embodiment of a telecommunication system to which the present technology is applied.
  • a telecommunication system 11 is configured such that two telecommunication apparatuses 12 a and 12 b are connected via a network 13 such as the Internet.
  • the telecommunication system 11 can provide a telecommunication service in which a user of the telecommunication apparatus 12 a and a user of the telecommunication apparatus 12 b can perform interactive communication.
  • a user of the telecommunication apparatus 12 a is also referred to as a principal user
  • the user of the telecommunication apparatus 12 b , who is the other party performing telecommunication with the principal user, is also referred to as the other party's user, as appropriate.
  • the telecommunication apparatus 12 a includes a plurality of capture devices 21 a , a display device 22 a , and an information processing device 23 a .
  • the information processing device 23 a includes an image processing unit 24 a and a communication unit 25 a.
  • the telecommunication apparatus 12 a includes three capture devices 21 a - 1 to 21 a - 3 .
  • the capture device 21 a - 1 is arranged above the display device 22 a
  • the capture device 21 a - 2 is arranged on a left side of the display device 22 a
  • the capture device 21 a - 3 is arranged on a right side of the display device 22 a .
  • the number of the capture devices 21 a is not limited to three, and may be two or four or more, and the arrangement of the capture devices 21 a is not limited to the example illustrated in FIG. 1 .
  • a state in which the three capture devices 21 a - 1 to 21 a - 3 capture a user standing alone in front of the display device 22 a as an object will be described.
  • Each of the capture devices 21 a - 1 to 21 a - 3 includes an imaging element such as a complementary metal oxide semiconductor (CMOS) image sensor, for example, and the capture devices 21 a - 1 to 21 a - 3 capture the user as the object and supply obtained three captured images to the information processing device 23 a .
  • the capture devices 21 a - 1 to 21 a - 3 will be simply referred to as “capture devices 21 a ” as appropriate unless distinguishing the capture devices 21 a - 1 to 21 a - 3 is required.
  • the display device 22 a includes, for example, a display device such as a liquid crystal panel or an organic electro luminescence (EL) panel, and displays an image transmitted from the telecommunication apparatus 12 b such that the user of the telecommunication apparatus 12 b captured in the image appears with a life size.
  • the information processing device 23 a can be configured by, for example, a computer including a central processing unit (CPU), a read only memory (ROM), a random access memory (RAM), and the like. Then, when the information processing device 23 a executes an application for realizing telecommunication, the image processing unit 24 a performs image processing and the communication unit 25 a performs communication processing.
  • the image processing unit 24 a performs image processing of causing the principal user to recognize that a gaze of the principal user coincides with a gaze of the other party's user displayed on the display device 22 a in a pseudo manner. For example, the image processing unit 24 a performs the image processing of generating an image as if the principal user is captured from a virtual viewpoint set at a position of an eye of the other party's user displayed on the display device 22 a (hereinafter, the image will be referred to as a pseudo gaze coincidence image), using the three captured images supplied from the capture devices 21 a - 1 to 21 a - 3 .
  • Furthermore, the image processing unit 24 a performs image processing of displaying the other party's user with the size and at the position (at the height of the eyes) for life-size display, and displays the other party's user on the display device 22 a.
  • the communication unit 25 a can perform communication via the network 13 .
  • the communication unit 25 a transmits a coded stream output from the image processing unit 24 a to the telecommunication apparatus 12 b , receives a coded stream transmitted from the telecommunication apparatus 12 b , and supplies the coded stream to the image processing unit 24 a.
  • the telecommunication apparatus 12 a configured as described above can generate the pseudo gaze coincidence image as if the principal user is captured from the viewpoint of the other party's user set on the display device 22 a , using the captured images of the principal user captured by the capture devices 21 a - 1 to 21 a - 3 . Furthermore, the telecommunication apparatus 12 a can display the other party's user with the size at the position for life size, using the pseudo gaze coincidence image transmitted from the telecommunication apparatus 12 b . Similarly, the telecommunication apparatus 12 b can generate a pseudo gaze coincidence image in which the principal user is captured, and display a pseudo gaze coincidence image in which the other party's user is captured.
  • the users using the telecommunication system 11 can perform telecommunication in a state where the user turns the gaze toward the eyes of the other party displayed in a life size manner and the mutual gazes coincide with each other. Thereby, the users can perform more realistic communication by the telecommunication system 11 .
  • the telecommunication apparatus 12 b is configured similarly to the telecommunication apparatus 12 a .
  • the configuration of the telecommunication apparatus 12 a will be described and description of the configuration of the telecommunication apparatus 12 b is omitted.
  • the telecommunication apparatuses 12 a and 12 b will be referred to as telecommunication apparatuses 12 unless distinguishing the telecommunication apparatuses 12 is required, and the respective constituent elements will also be referred to in a similar manner.
  • a first configuration example of the image processing unit 24 will be described with reference to FIGS. 2 to 19 .
  • FIG. 2 is a block diagram illustrating a first configuration example of the image processing unit 24 .
  • the image processing unit 24 includes an object viewpoint information setting unit 31 , a high fidelity display region setting unit 32 , a high fidelity image generation unit 33 , a low fidelity image generation unit 34 , a pseudo gaze coincidence image generation unit 35 , an encoding unit 36 , a transmission unit 37 , a reception unit 38 , a decoding unit 39 , and a pseudo gaze coincidence image display unit 40 .
  • the blocks arranged above the broken line apply image processing of generating the pseudo gaze coincidence image as if the gaze of the principal user looks at the eyes of the other party's user as viewed from the other party's user, using a plurality of images in which the principal user is captured.
  • the blocks arranged below the broken line apply image processing of displaying the pseudo gaze coincidence image in which the other party's user is captured in such a manner that the gaze of the other party's user looks at the principal user as viewed from the viewpoint of the principal user.
  • An input capture signal, in which the captured images obtained by capturing the principal user from three directions by the capture devices 21 - 1 to 21 - 3 in FIG. 1 and a signal indicating, for example, depth information indicating a distance in a depth direction detected by a depth sensor (not illustrated) are multiplexed, is input to the image processing unit 24 .
  • the input capture signal is supplied to the object viewpoint information setting unit 31 , the high fidelity image generation unit 33 , and the low fidelity image generation unit 34 .
  • Furthermore, decoded object viewpoint information, which is object viewpoint information indicating the viewpoint position of a user in a three-dimensional space and is obtained by decoding the object viewpoint information of the other party's user after it has been encoded and transmitted, is input to the image processing unit 24 .
  • the decoded object viewpoint information is supplied to the high fidelity image generation unit 33 and the low fidelity image generation unit 34 .
  • the object viewpoint information setting unit 31 analyzes the face of the principal user to be captured by the capture devices 21 on the basis of the three captured images and the depth information obtained from the input capture signal. Thereby, the object viewpoint information setting unit 31 acquires analysis information including coordinates indicating characteristic points of parts of the face on the images, and supplies the analysis information to the high fidelity display region setting unit 32 . Moreover, the object viewpoint information setting unit 31 obtains the viewpoint position of the principal user in the three-dimensional space on the basis of the three captured images and the depth information obtained from the input capture signal, acquires the object viewpoint information indicating the viewpoint position, and supplies the object viewpoint information to the encoding unit 36 .
  • the object viewpoint information is used when the other party's image processing unit 24 generates a pseudo gaze coincidence display image from the pseudo gaze coincidence image in which the principal user is captured. Note that detailed processing in the object viewpoint information setting unit 31 will be described with reference to FIGS. 3 to 5 .
  • the high fidelity display region setting unit 32 sets a high fidelity display region to serve as a region where a high fidelity image to be described below is displayed, of a region where the face of the principal user is captured in the pseudo gaze coincidence image, on the basis of the analysis information supplied from the object viewpoint information setting unit 31 . Then, the high fidelity display region setting unit 32 supplies high fidelity display region information indicating the high fidelity display region to the high fidelity image generation unit 33 and the low fidelity image generation unit 34 . Note that detailed processing in the high fidelity display region setting unit 32 will be described below with reference to FIGS. 6, 7A and 7B .
  • the high fidelity image generation unit 33 generates a high fidelity image in which the principal user looks captured from a virtual capture position that is obtained by setting the viewpoint position of the other party's user indicated by the decoded object viewpoint information as the virtual capture position, and the high fidelity image having an appearance with higher fidelity.
  • the high fidelity image generation unit 33 can generate a high fidelity image that reproduces how the user looks from the virtual capture position at a high level by using a viewpoint interpolation technology or the like for at least a part of the three captured images captured by the capture devices 21 - 1 to 21 - 3 .
  • the high fidelity image generation unit 33 generates the high fidelity image, limiting the display region to the high fidelity display region indicated by the high fidelity display region information supplied from the high fidelity display region setting unit 32 . Then, the high fidelity image generation unit 33 supplies the generated high fidelity image to the pseudo gaze coincidence image generation unit 35 . Note that detailed processing in the high fidelity image generation unit 33 will be described below with reference to FIGS. 8 to 10 .
  • the low fidelity image generation unit 34 generates a low fidelity image in which the principal user looks captured from a virtual capture position that is obtained by setting the viewpoint position of the other party's user indicated by the decoded object viewpoint information as the virtual capture position, the low fidelity image having lower fidelity than the high fidelity image.
  • the low fidelity image generation unit 34 can generate a low fidelity image that reproduces how the user looks from the virtual capture position at a certain level by performing projective transformation for at least a part of the three captured images captured by the capture devices 21 - 1 to 21 - 3 .
  • the low fidelity image generation unit 34 applies correction for reflecting an influence of the projective transformation of when generating the low fidelity image to the high fidelity display region indicated by the high fidelity display region information supplied from the high fidelity display region setting unit 32 . Then, the low fidelity image generation unit 34 supplies corrected high fidelity display region information indicating the high fidelity display region to which the correction has been applied to the pseudo gaze coincidence image generation unit 35 together with the generated low fidelity image. Note that detailed processing in the low fidelity image generation unit 34 will be described below with reference to FIGS. 11, 12A, 12B, 13A, 13B, 14A, 14B, 15A, and 15B .
  • the pseudo gaze coincidence image generation unit 35 superimposes the high fidelity image supplied from the high fidelity image generation unit 33 on the low fidelity image supplied from the low fidelity image generation unit 34 in the corrected high fidelity display region indicated by the corrected high fidelity display region information. Thereby, the pseudo gaze coincidence image generation unit 35 can generate the pseudo gaze coincidence image in which the mutual gazes coincide with each other in a pseudo manner as if the gaze of the principal user looks at the eyes of the other party's user as viewed from the virtual capture position, and supply the pseudo gaze coincidence image to the encoding unit 36 .
  • the encoding unit 36 encodes the object viewpoint information of the principal user supplied from the object viewpoint information setting unit 31 and the pseudo gaze coincidence image supplied from the pseudo gaze coincidence image generation unit 35 . Thereby, the encoding unit 36 generates a coded stream in which the object viewpoint information and the pseudo gaze coincidence image are encoded and supplies the coded stream to the transmission unit 37 .
  • the transmission unit 37 outputs the coded stream supplied from the encoding unit 36 to the communication unit 25 as a transmission stream to be transmitted via the network 13 in FIG. 1 , and the communication unit 25 transmits the transmission stream to the other party's telecommunication apparatus 12 .
  • the transmission unit 37 can multiplex a separately coded audio stream with the coded stream supplied from the encoding unit 36 and output the multiplexed coded stream as a transmission stream.
  • the reception unit 38 receives the transmission stream transmitted from the other party's telecommunication apparatus 12 via the network 13 in FIG. 1 , returns the transmission stream to the coded stream, and supplies the coded stream to the decoding unit 39 . At this time, in the case where the audio stream is multiplexed with the received transmission stream, the reception unit 38 inversely multiplexes the transmission stream into the audio stream and the coded stream, and outputs the coded stream to the decoding unit 39 .
  • the decoding unit 39 supplies the decoded object viewpoint information and decoded pseudo gaze coincidence image obtained by decoding the coded stream supplied from the reception unit 38 to the pseudo gaze coincidence image display unit 40 .
  • the decoded object viewpoint information is the object viewpoint information indicating the viewpoint position of the other party's user
  • the decoded pseudo gaze coincidence image is the pseudo gaze coincidence image in which the other party's user is captured.
  • the pseudo gaze coincidence image display unit 40 generates the pseudo gaze coincidence display image for displaying the other party's user with the size at the position for life size on the display device 22 , for example, on the basis of the decoded object viewpoint information and the decoded pseudo gaze coincidence image supplied from the decoding unit 39 . Then, the pseudo gaze coincidence image display unit 40 outputs the generated pseudo gaze coincidence display image to the display device 22 .
  • FIG. 3 is a block diagram illustrating a configuration example of the object viewpoint information setting unit 31 in FIG. 2 .
  • the object viewpoint information setting unit 31 includes a face part detection unit 51 , an eye region corresponding point detection unit 52 , a viewpoint distance calculation unit 53 , and an object viewpoint information generation unit 54 .
  • the face part detection unit 51 performs face part detection (facial landmark detection) for the three captured images captured by the capture devices 21 - 1 to 21 - 3 .
  • the face part detection unit 51 performs the face part detection using a technology disclosed in the non-patent document "One Millisecond Face Alignment with an Ensemble of Regression Trees" by Vahid Kazemi and Josephine Sullivan, CVPR 2014, or the like.
  • the face part detection unit 51 can obtain the coordinates indicating the characteristic points of parts of the face included in the captured images, and outputs the coordinates as the analysis information to the high fidelity display region setting unit 32 in FIG. 2 and supplies them to the eye region corresponding point detection unit 52 .
  • FIG. 4 illustrates an example of the analysis information obtained by the face part detection unit 51 .
  • In the example illustrated in FIG. 4 , sixty-eight characteristic points are arranged for the eyes, nose, mouth, eyebrows, and face contour detected as the face parts.
  • the eye region corresponding point detection unit 52 extracts the characteristic points arranged for regions of the eyes from the analysis information supplied from the face part detection unit 51 , and detects, for the characteristic points, corresponding points corresponding among the three images captured by the capture devices 21 - 1 to 21 - 3 . Specifically, the eye region corresponding point detection unit 52 detects the characteristic points with the same numbers given among the three images as the corresponding points from among the characteristic points (37th to 48th characteristic points, or 28th characteristic point may be added to the 37th to 48th characteristic points) in the regions of the eyes illustrated in FIG. 4 . Furthermore, the eye region corresponding point detection unit 52 may detect the corresponding points for part of the characteristic points, for example, in addition to detecting the corresponding points for all the characteristic points of the regions of the eyes.
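As a rough illustration of this detection, the following Python sketch uses dlib, whose pretrained 68-point shape predictor implements the Kazemi-Sullivan regression-tree method cited above. The model file name is dlib's standard pretrained model (obtained separately), and the eye indices correspond to the 37th to 48th characteristic points of FIG. 4 (0-indexed 36 to 47 in dlib); this is a sketch under those assumptions, not the patent's own implementation.

```python
# Hedged sketch of face part detection with dlib's 68-point predictor,
# which implements the regression-tree method of Kazemi and Sullivan.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Pretrained model file; must be downloaded separately from dlib.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_face_parts(image_bgr):
    """Return the 68 (x, y) characteristic points of the first detected face."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]

def eye_region_points(landmarks):
    """Characteristic points of the eye regions (37th to 48th points in the
    numbering of FIG. 4, i.e. indices 36-47 in dlib's 0-indexed convention)."""
    return landmarks[36:48]
```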
  • FIG. 5 illustrates an example in which the eye region corresponding point detection unit 52 detects the characteristic points arranged for a right eye as the corresponding points.
  • the characteristic points arranged for the right eye are detected as the corresponding points among a captured image P 1 captured by the capture device 21 - 1 , a captured image P 2 captured by the capture device 21 - 2 , and a captured image P 3 captured by the capture device 21 - 3 .
  • the viewpoint distance calculation unit 53 calculates a distance to an eye of the object as a viewpoint distance on the basis of the corresponding points detected by the eye region corresponding point detection unit 52 .
  • the viewpoint distance calculation unit 53 rectifies the captured images from the capture devices 21 into a parallelized state as needed and uses the principle of triangulation, thereby obtaining the viewpoint distance.
  • the viewpoint distance calculation unit 53 may calculate the viewpoint distance using only the corresponding points detected from two captured images, of the three captured images captured by the capture devices 21 - 1 to 21 - 3 . Note that the viewpoint distance calculation unit 53 may calculate the viewpoint distance using all the corresponding points detected from the three captured images, using a plane sweep technique or the like.
  • the object viewpoint information generation unit 54 transforms the viewpoint distance calculated by the viewpoint distance calculation unit 53 into a coordinate value of a world coordinate system, generates the object viewpoint information indicating the viewpoint position of the object, and outputs the object viewpoint information.
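A minimal sketch of this triangulation and back-projection, assuming captures already rectified into a parallelized state, a focal length in pixels, and a known baseline between two capture devices; the function names and parameters are illustrative assumptions, and the plane sweep variant is not shown.

```python
import numpy as np

def viewpoint_distance(pt_left, pt_right, focal_px, baseline_m):
    """Triangulate the depth of one corresponding eye point pair:
    Z = f * B / d for rectified cameras, where d is the disparity."""
    disparity = float(pt_left[0] - pt_right[0])  # horizontal shift in pixels
    if disparity <= 0.0:
        raise ValueError("corresponding points must yield a positive disparity")
    return focal_px * baseline_m / disparity

def to_world_coordinates(pt, depth_m, focal_px, principal_point):
    """Back-project an image point at the triangulated depth into a 3D
    coordinate value, as done by the object viewpoint information
    generation unit (alignment to world axes omitted for brevity)."""
    x = (pt[0] - principal_point[0]) * depth_m / focal_px
    y = (pt[1] - principal_point[1]) * depth_m / focal_px
    return np.array([x, y, depth_m])
```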
  • FIG. 6 is a block diagram illustrating a configuration example of the high fidelity display region setting unit 32 in FIG. 2 .
  • the high fidelity display region setting unit 32 includes a high fidelity display mask generation unit 61 .
  • the analysis information output from the object viewpoint information setting unit 31 is supplied to the high fidelity display mask generation unit 61 .
  • the high fidelity display mask generation unit 61 generates a mask image for specifying the high fidelity display region that serves as the region for displaying the high fidelity image generated by the high fidelity image generation unit 33 , in the pseudo gaze coincidence image generated by the pseudo gaze coincidence image generation unit 35 , on the basis of the analysis information. Then, the high fidelity display mask generation unit 61 outputs the mask image for specifying the high fidelity display region as the high fidelity display region information.
  • For example, as illustrated in FIG. 7A , the high fidelity display mask generation unit 61 can generate a polygon involving all the characteristic points included in the analysis information, in other words, a polygon covering all the face parts, as the mask image indicating the high fidelity display region. Furthermore, as illustrated in FIG. 7B , the high fidelity display mask generation unit 61 may generate a polygon covering the region of the eyes, limiting the characteristic points to only those arranged in the eyes, of the characteristic points included in the analysis information, as the mask image indicating the high fidelity display region.
  • the high fidelity display mask generation unit 61 may generate a predetermined region other than the above-described regions and including at least an eye region in which the eyes of the principal user are captured, as the mask image indicating the high fidelity display region.
  • the mask image may be a binary image or an image with continuous tone.
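A minimal sketch of the mask generation with OpenCV, assuming landmarks in the FIG. 4 layout as produced by the earlier detection sketch; the convex hull yields a polygon involving the chosen characteristic points, covering both the whole-face variant (FIG. 7A) and the eyes-only variant (FIG. 7B).

```python
import cv2
import numpy as np

def make_high_fidelity_mask(landmarks, image_shape, eyes_only=False):
    """Binary mask image specifying the high fidelity display region."""
    h, w = image_shape[:2]
    # Whole face (FIG. 7A) or eye region only (FIG. 7B).
    points = np.array(landmarks[36:48] if eyes_only else landmarks, np.int32)
    mask = np.zeros((h, w), np.uint8)
    hull = cv2.convexHull(points)        # polygon involving all chosen points
    cv2.fillConvexPoly(mask, hull, 255)  # continuous tone is also possible
    return mask
```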
  • FIG. 8 is a block diagram illustrating a configuration example of the high fidelity image generation unit 33 in FIG. 2 .
  • the high fidelity image generation unit 33 includes a high fidelity display region cropping unit 71 and a viewpoint interpolation image generation unit 72 .
  • the high fidelity display region cropping unit 71 crops a portion corresponding to the high fidelity display region (mask image) indicated by the high fidelity display region information from the three captured images captured by the capture devices 21 - 1 to 21 - 3 . Then, the high fidelity display region cropping unit 71 supplies three images respectively cropped from the three captured images to the viewpoint interpolation image generation unit 72 .
  • The viewpoint interpolation image generation unit 72 first sets, as the virtual capture position, the position of the eyes of the other party's user when the other party's user is displayed with a life size, according to the viewpoint position of the other party's user in the three-dimensional real space indicated by the decoded object viewpoint information.
  • In FIG. 9 , the other party's user displayed with the life size on the display device 22 in FIG. 1 is illustrated with the broken line, and the viewpoint interpolation image generation unit 72 sets the virtual capture position to a midpoint of both eyes of the other party's user.
  • the viewpoint interpolation image generation unit 72 applies viewpoint interpolation processing of interpolating the three images cropped as the high fidelity display region by the high fidelity display region cropping unit 71 to generate a viewpoint interpolation image as if the principal user is viewed from the virtual capture position, and outputs the viewpoint interpolation image as the high fidelity image.
  • The viewpoint interpolation processing performed with the virtual capture position set at a midpoint (a point indicated with the cross mark) between a midpoint (a point indicated with the triangle mark) of the capture devices 21 - 2 and 21 - 3 arranged right and left, and the capture device 21 - 1 arranged above the capture devices 21 - 2 and 21 - 3 , as illustrated in FIG. 10 , will be described.
  • the viewpoint interpolation image generation unit 72 generates a virtual viewpoint intermediate image that is an interpolation image in a horizontal direction of the position of the triangle mark from the two captured images captured by the capture devices 21 - 2 and 21 - 3 such that the degrees of influence from the two captured images equally affect the virtual viewpoint intermediate image. Then, the viewpoint interpolation image generation unit 72 generates an interpolation image in a vertical direction such that the degrees of influence from the virtual viewpoint intermediate image and the captured image captured by the capture device 21 - 1 equally affect the interpolation image.
  • the interpolation image generated in this manner is a viewpoint interpolation image viewed from the virtual capture position (the point indicated by the cross mark) illustrated in FIG. 10 , in other words, the high fidelity image.
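The two-stage order of this processing can be sketched as below. The sketch is heavily simplified: real viewpoint interpolation warps pixels according to disparity before blending, so the warping is abstracted behind a hypothetical warp_to_virtual_view() helper supplied by the caller, and only the equal-influence blending structure of FIG. 10 (horizontal stage, then vertical stage) is shown.

```python
import cv2

def interpolate_viewpoint(img_left, img_right, img_top, warp_to_virtual_view):
    """Sketch of the two-stage interpolation of FIG. 10; the warp helper is a
    hypothetical stand-in for disparity-based view warping."""
    # Horizontal stage: intermediate image at the midpoint (triangle mark) of
    # the left and right capture devices, influenced equally by both.
    left_w = warp_to_virtual_view(img_left, axis="horizontal", weight=0.5)
    right_w = warp_to_virtual_view(img_right, axis="horizontal", weight=0.5)
    intermediate = cv2.addWeighted(left_w, 0.5, right_w, 0.5, 0.0)
    # Vertical stage: blend toward the upper capture device's image with equal
    # degrees of influence, giving the view from the cross mark.
    top_w = warp_to_virtual_view(img_top, axis="vertical", weight=0.5)
    return cv2.addWeighted(intermediate, 0.5, top_w, 0.5, 0.0)
```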
  • FIG. 11 is a block diagram illustrating a configuration example of the low fidelity image generation unit 34 in FIG. 2 .
  • the low fidelity image generation unit 34 includes a projective transformation parameter estimation unit 81 and a projective transformation processing unit 82 .
  • the projective transformation parameter estimation unit 81 estimates a parameter for performing projective transformation to make an image close to an image viewed from the virtual capture position according to the viewpoint position of the other party's user in the three-dimensional real space indicated by the decoded object viewpoint information. Then, the projective transformation parameter estimation unit 81 supplies a projective transformation parameter indicating the estimated parameter to the projective transformation processing unit 82 .
  • the projective transformation processing unit 82 applies projective transformation using the parameter indicated by the projective transformation parameter supplied from the projective transformation parameter estimation unit 81 to the captured image captured by the capture device 21 a - 1 to generate a low fidelity image. Moreover, the projective transformation processing unit 82 applies projective transformation using the parameter used to generate the low fidelity image to the mask image (see FIGS. 7A and 7B ) that is the high fidelity display region indicated by the high fidelity display region information supplied from the high fidelity display region setting unit 32 . Thereby, the projective transformation processing unit 82 corrects the mask image to correspond to the low fidelity image, and sets the mask image as corrected high fidelity display region information. Then, the projective transformation processing unit 82 outputs the low fidelity image and the corrected high fidelity display region information.
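A minimal sketch of this step, assuming the estimated projective transformation parameter takes the form of a 3x3 homography matrix H; the essential point is that the identical parameter is applied to the captured image and to the mask image so that the corrected high fidelity display region stays aligned with the low fidelity image.

```python
import cv2

def apply_projective_transform(captured_image, mask_image, H):
    """Warp the captured image into the low fidelity image and correct the
    mask (high fidelity display region) with the same parameter."""
    h, w = captured_image.shape[:2]
    low_fidelity = cv2.warpPerspective(captured_image, H, (w, h))
    corrected_mask = cv2.warpPerspective(mask_image, H, (w, h))
    return low_fidelity, corrected_mask
```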
  • FIG. 12A illustrates a state where the object stands in front of the display device 22 . When the object is assumed to be an object approximate plane approximating a planar plate without a thickness in the depth direction, the geometric correction is realized by projective transformation.
  • FIG. 12B illustrates a schematic diagram of a person image obtained by capturing the object by the capture device 21 - 1 .
  • Next, the projective transformation in a case where the virtual capture position is higher than the object viewpoint will be described with reference to FIGS. 14A and 14B .
  • In this case, a parameter of projective transformation equivalent to rotation by an angle b, which is the angle made by the straight line connecting the capture device 21 - 1 and the object viewpoint and the straight line connecting the virtual capture position and the object viewpoint, is estimated.
  • Thereby, a low fidelity image spreading upward, as if looking down at the object from above, can be generated, as illustrated in FIG. 14B .
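Under the object approximate plane assumption of FIGS. 12A and 12B, one standard way to build a projective transformation equivalent to such a rotation is the homography induced by a pure camera rotation, H = K Rx(b) K^-1, with K the 3x3 intrinsic matrix; the intrinsic values below are placeholders, not values from the patent.

```python
import numpy as np

def rotation_homography(angle_b_rad, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0):
    """Homography equivalent to rotating the view by angle b about the
    horizontal axis: H = K @ Rx(b) @ K^-1 (intrinsics are placeholders)."""
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    c, s = np.cos(angle_b_rad), np.sin(angle_b_rad)
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0,   c,  -s],
                   [0.0,   s,   c]])
    return K @ Rx @ np.linalg.inv(K)
```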
  • Similarly, the projective transformation in a case where the virtual capture position is lower than the object viewpoint will be described with reference to FIGS. 15A and 15B .
  • In this case, a low fidelity image spreading downward, as if looking up at the object from below, can be generated, as illustrated in FIG. 15B .
  • the low fidelity image generation unit 34 can generate the low fidelity image close to how the object looks (facing, looking down, or looking up) corresponding to the viewpoint of the other party's user, using the viewpoint of the other party's user as the virtual capture position.
  • FIG. 16 is a block diagram illustrating a configuration example of the pseudo gaze coincidence image generation unit 35 in FIG. 2 .
  • the pseudo gaze coincidence image generation unit 35 includes a mask image filter processing unit 91 and a high fidelity display region blending processing unit 92 .
  • the mask image filter processing unit 91 applies filter processing with a morphology filter, a lowpass filter, or the like to the high fidelity display region (corrected mask image) indicated by the corrected high fidelity display region information output from the low fidelity image generation unit 34 . Thereby, the mask image filter processing unit 91 generates a blend map image in which a value (blend ratio) in a boundary of the mask image gradually changes and the boundary is less noticeable in subsequent blending processing, and supplies the blend map image to the high fidelity display region blending processing unit 92 .
  • the high fidelity display region blending processing unit 92 performs alpha blending processing for the high fidelity image and the low fidelity image according to the blend ratio set in the blend map image supplied from the mask image filter processing unit 91 . Thereby, the high fidelity display region blending processing unit 92 generates and outputs a pseudo gaze coincidence image in which the above-described portions of the face, as illustrated in FIGS. 7A and 7B , are replaced with the high fidelity image with respect to the low fidelity image.
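A minimal sketch of the blend map generation and alpha blending, assuming 8-bit images of equal size; a Gaussian lowpass filter stands in for the morphology filter or lowpass filter processing, and the kernel size is an arbitrary assumption.

```python
import cv2
import numpy as np

def blend_pseudo_gaze_image(high_fidelity, low_fidelity, corrected_mask):
    """Soften the mask boundary into a blend map, then alpha blend the
    high fidelity image onto the low fidelity image."""
    blend_map = cv2.GaussianBlur(corrected_mask, (31, 31), 0)
    alpha = blend_map.astype(np.float32)[..., None] / 255.0  # per-pixel ratio
    out = (alpha * high_fidelity.astype(np.float32)
           + (1.0 - alpha) * low_fidelity.astype(np.float32))
    return out.astype(np.uint8)
```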
  • FIG. 17 is a block diagram illustrating a configuration example of the encoding unit 36 in FIG. 2 .
  • the encoding unit 36 includes an object viewpoint information encoding unit 101 , a video codec encoding unit 102 , and a stream integration unit 103 .
  • the object viewpoint information encoding unit 101 encodes the object viewpoint information by an arbitrary encoding method consistent with the decoding side, and supplies an additional stream, which is generated by encoding the object viewpoint information, to the stream integration unit 103 .
  • the object viewpoint information encoding unit 101 can adopt an encoding method using general lossless encoding such as Lempel-Ziv (LZ) encoding.
  • the video codec encoding unit 102 encodes the pseudo gaze coincidence image using a generally used arbitrary video codec such as moving picture experts group (MPEG)-2, H.264, or high efficiency video coding (HEVC) to generate a video stream. Then, the video codec encoding unit 102 supplies the generated video stream to the stream integration unit 103 .
  • the stream integration unit 103 integrates the additional stream supplied from the object viewpoint information encoding unit 101 and the video stream supplied from the video codec encoding unit 102 , and outputs the integrated streams from the encoding unit 36 as a coded stream.
  • the stream integration unit 103 can adopt an integration method of embedding the additional stream generated in the object viewpoint information encoding unit 101 into a header portion where user information of the video stream is recordable.
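Abstractly, the integration can be sketched as a length-prefixed container, as below; this illustrative format is an assumption for exposition and is not the user-data header mechanism of any particular video codec.

```python
import struct

def integrate_streams(additional_stream: bytes, video_stream: bytes) -> bytes:
    """Prefix the additional stream with its length so the decoding side can
    separate the two streams again."""
    return struct.pack(">I", len(additional_stream)) + additional_stream + video_stream

def separate_streams(coded_stream: bytes):
    """Inverse of integrate_streams(): recover the additional and video streams."""
    (n,) = struct.unpack(">I", coded_stream[:4])
    return coded_stream[4:4 + n], coded_stream[4 + n:]
```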
  • FIG. 18 is a block diagram illustrating a configuration example of the decoding unit 39 in FIG. 2 .
  • the decoding unit 39 includes a stream separation unit 111 , an object viewpoint information decoding unit 112 , and a video codec decoding unit 113 .
  • the stream separation unit 111 separates the coded stream supplied from the reception unit 38 in FIG. 2 into the additional stream and the video stream. Then, the stream separation unit 111 supplies the additional stream to the object viewpoint information decoding unit 112 and supplies the video stream to the video codec decoding unit 113 .
  • the object viewpoint information decoding unit 112 decodes the additional stream supplied from the stream separation unit 111 into the decoded object viewpoint information and outputs the decoded object viewpoint information.
  • the decoded object viewpoint information is decoded after the object viewpoint information indicating the viewpoint position of the other party's user is encoded on the other party side and transmitted.
  • the video codec decoding unit 113 decodes the video stream supplied from the stream separation unit 111 into the decoded pseudo gaze coincidence image and outputs the decoded pseudo gaze coincidence image.
  • the decoded pseudo gaze coincidence image is decoded after the pseudo gaze coincidence image in which the other party's user is captured is encoded on the other party side and transmitted.
  • FIG. 19 is a block diagram illustrating a configuration example of the pseudo gaze coincidence image display unit 40 in FIG. 2 .
  • the pseudo gaze coincidence image display unit 40 includes a life-size-display geometric correction parameter estimation unit 121 and a life-size-display geometric correction processing unit 122 .
  • the life-size-display geometric correction parameter estimation unit 121 estimates a life-size-display geometric correction parameter with which the size of the face and the positions of the eyes of the other party's user displayed on the display device 22 are displayed with actual sizes on the basis of the viewpoint position (defined with world coordinates) of the other party's user in the three-dimensional real space indicated by the decoded object viewpoint information. At this time, the life-size-display geometric correction parameter estimation unit 121 estimates the life-size-display geometric correction parameter in consideration of a resolution and a size of the display device 22 , a resolution of the decoded pseudo gaze coincidence image, and the like, and supplies the life-size-display geometric correction parameter to the life-size-display geometric correction processing unit 122 .
  • the life-size-display geometric correction processing unit 122 applies geometric correction using the life-size-display geometric correction parameter supplied from the life-size-display geometric correction parameter estimation unit 121 to the decoded pseudo gaze coincidence image. Thereby, the life-size-display geometric correction processing unit 122 generates the pseudo gaze coincidence display image to be displayed on the display device 22 with the size at the position (at the height of the eyes) where the other party's user is displayed in a life-size manner. Then, the life-size-display geometric correction processing unit 122 outputs the pseudo gaze coincidence display image to the display device 22 in FIG. 1 and causes the display device 22 to display the pseudo gaze coincidence display image.
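• the core of such life-size display can be pictured as a scale estimation; the following Python sketch assumes the display geometry and the real width of the face (obtainable from the measured viewpoint and depth information) are known, and all names and parameters are illustrative.

```python
def estimate_life_size_scale(display_px_w: float, display_m_w: float,
                             face_px_w: float, face_m_w: float) -> float:
    """Scale factor that makes a face whose real width is face_m_w meters
    appear at actual size on a display that maps display_px_w pixels onto
    display_m_w meters."""
    display_px_per_m = display_px_w / display_m_w
    target_face_px = face_m_w * display_px_per_m   # pixel width for life size
    return target_face_px / face_px_w              # scale to apply to the image
```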
• the blocks included in the image processing unit 24 are configured as described above; for example, a video communication experience in which the gaze coincides with the other party's user can be provided in interactive communication performed with the other party's user displayed in a life-size manner.
  • Image processing performed by the image processing unit 24 will be described with reference to the flowcharts illustrated in FIGS. 20 and 21 .
  • FIG. 20 illustrates a flowchart for describing image processing of outputting the pseudo gaze coincidence image in which the principal user is captured.
  • the processing is started.
• In step S11, the object viewpoint information setting unit 31 acquires the analysis information indicating the coordinates of the parts of the face on the images on the basis of the captured images and the depth information obtained from the input capture signal, and supplies the analysis information to the high fidelity display region setting unit 32. Furthermore, the object viewpoint information setting unit 31 acquires the object viewpoint information indicating the viewpoint position of the principal user in the three-dimensional space on the basis of the captured images and the depth information, and supplies the object viewpoint information to the encoding unit 36.
• In step S12, the high fidelity display region setting unit 32 sets the high fidelity display region for displaying the high fidelity image on the basis of the analysis information supplied from the object viewpoint information setting unit 31 in step S11. Then, the high fidelity display region setting unit 32 supplies the high fidelity display region information indicating the high fidelity display region to the high fidelity image generation unit 33 and the low fidelity image generation unit 34.
• In step S13, the high fidelity image generation unit 33 generates the high fidelity image from the captured images, using the viewpoint interpolation technology or the like with the viewpoint position of the other party's user as the virtual capture position, limiting the display region to the high fidelity display region set by the high fidelity display region setting unit 32 in step S12. Then, the high fidelity image generation unit 33 supplies the high fidelity image to the pseudo gaze coincidence image generation unit 35.
• In step S14, the low fidelity image generation unit 34 performs the geometric correction on the captured images, using the viewpoint position of the other party's user as the virtual capture position, to generate the low fidelity image. Moreover, the low fidelity image generation unit 34 applies a correction that reflects the influence of this geometric correction to the high fidelity display region set by the high fidelity display region setting unit 32 in step S12. Then, the low fidelity image generation unit 34 supplies the low fidelity image and the corrected high fidelity display region information to the pseudo gaze coincidence image generation unit 35.
• In step S15, the pseudo gaze coincidence image generation unit 35 superimposes the high fidelity image supplied from the high fidelity image generation unit 33 in step S13 on the low fidelity image supplied from the low fidelity image generation unit 34 in step S14 in the corrected high fidelity display region.
• Thereby, the pseudo gaze coincidence image generation unit 35 generates the pseudo gaze coincidence image in which the gaze appears to coincide in a pseudo manner as viewed from the other party's user when the principal user looks at the other party's user displayed on the display device 22, and supplies the pseudo gaze coincidence image to the encoding unit 36.
• In step S16, the encoding unit 36 encodes the object viewpoint information of the principal user supplied from the object viewpoint information setting unit 31 in step S11 and the pseudo gaze coincidence image supplied from the pseudo gaze coincidence image generation unit 35 in step S15, and supplies the coded stream to the transmission unit 37.
• In step S17, the transmission unit 37 outputs the coded stream supplied from the encoding unit 36 in step S16 to the communication unit 25 as the transmission stream to be transmitted via the network 13 in FIG. 1. Then, after the communication unit 25 transmits the transmission stream to the other party's telecommunication apparatus 12, the processing returns to step S11.
  • similar processing is repeatedly performed until the telecommunication is terminated.
• in this manner, the image processing unit 24 can transmit to the other party the object viewpoint information of the principal user, together with the pseudo gaze coincidence image in which the gaze of the principal user appears coincident in a pseudo manner as viewed from the other party's user.
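• as a summary of the FIG. 20 flow (steps S11 to S17), the transmit side can be sketched as the following loop; the `capture`, `comm`, and `units` objects are hypothetical stand-ins for the blocks described above, not APIs from the patent.

```python
def transmit_side_loop(capture, comm, units):
    """Hypothetical per-frame loop corresponding to steps S11 to S17."""
    while comm.active():                                   # until the call ends
        frames, depth = capture.read()                     # input capture signal
        analysis, viewpoint = units.set_object_viewpoint(frames, depth)   # S11
        region = units.set_high_fidelity_region(analysis)                 # S12
        hi = units.generate_high_fidelity(frames, region)                 # S13
        lo, region_c = units.generate_low_fidelity(frames, region)        # S14
        image = units.blend_pseudo_gaze(hi, lo, region_c)                 # S15
        stream = units.encode(viewpoint, image)                           # S16
        comm.send(stream)                                                 # S17
```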
  • FIG. 21 illustrates a flowchart for describing image processing of displaying the pseudo gaze coincidence image in which the other party's user is captured.
  • the processing is started.
• In step S21, the reception unit 38 receives the transmission stream, converts it back into the coded stream, and supplies the coded stream to the decoding unit 39.
• In step S22, the decoding unit 39 decodes the coded stream supplied from the reception unit 38 in step S21, acquires the decoded object viewpoint information and the decoded pseudo gaze coincidence image, and supplies them to the pseudo gaze coincidence image display unit 40.
• In step S23, the pseudo gaze coincidence image display unit 40 generates the pseudo gaze coincidence display image on the basis of the decoded object viewpoint information and the decoded pseudo gaze coincidence image supplied from the decoding unit 39 in step S22, and outputs the pseudo gaze coincidence display image to the display device 22.
• In other words, the pseudo gaze coincidence image display unit 40 generates the pseudo gaze coincidence display image in which the gaze appears to coincide in a pseudo manner as viewed from the principal user when the other party's user looks at the principal user displayed on the other party's display device 22.
• Then, the processing returns to step S21.
  • similar processing is repeatedly performed until the telecommunication is terminated.
  • the image processing unit 24 can display the pseudo gaze coincidence display image having the gaze coincident in a pseudo manner with the other party's user as viewed from the principal user.
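• correspondingly, the FIG. 21 flow (steps S21 to S23) on the receiving side reduces to a short loop; the object names are again hypothetical.

```python
def receive_side_loop(comm, units, display):
    """Hypothetical per-frame loop corresponding to steps S21 to S23."""
    while comm.active():
        coded = comm.receive()                       # S21: back to the coded stream
        viewpoint, image = units.decode(coded)       # S22: decode both streams
        display.show(units.to_life_size(viewpoint, image))   # S23: display
```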
  • a second configuration example of the image processing unit 24 will be described with reference to FIG. 22 .
  • FIG. 22 is a block diagram illustrating the second configuration example of the image processing unit 24 . Note that, in an image processing unit 24 A illustrated in FIG. 22 , configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 A has a configuration common to the image processing unit 24 in FIG. 2 in including the object viewpoint information setting unit 31 , the high fidelity display region setting unit 32 , the high fidelity image generation unit 33 , the low fidelity image generation unit 34 , the pseudo gaze coincidence image generation unit 35 , and the pseudo gaze coincidence image display unit 40 .
  • the image processing unit 24 A includes a mirror image display processing unit 41 .
• the image processing unit 24 A is configured for use as an electronic mirror that displays the principal user, rather than for interactive telecommunication.
  • the image processing unit 24 A is configured to supply the object viewpoint information of the principal user from the object viewpoint information setting unit 31 to the high fidelity image generation unit 33 and the low fidelity image generation unit 34 , instead of the decoded object viewpoint information described with reference to FIG. 2 . Therefore, the high fidelity image generation unit 33 and the low fidelity image generation unit 34 respectively generate the high fidelity image and the low fidelity image, using the viewpoint position of the principal user as the virtual capture position.
  • the image processing unit 24 A is configured such that the object viewpoint information of the principal user is directly supplied from the object viewpoint information setting unit 31 to the pseudo gaze coincidence image display unit 40 and the pseudo gaze coincidence image is directly supplied from the pseudo gaze coincidence image generation unit 35 to the pseudo gaze coincidence image display unit 40 . Therefore, the pseudo gaze coincidence image display unit 40 generates the pseudo gaze coincidence display image for displaying the principal user with a size at a position for life size on the display device 22 in consideration of the viewpoint position of the principal user. Then, the pseudo gaze coincidence image display unit 40 supplies the generated pseudo gaze coincidence display image to the mirror image display processing unit 41 .
  • the mirror image display processing unit 41 performs mirror image display processing of horizontally reversing the pseudo gaze coincidence display image supplied from the pseudo gaze coincidence image display unit 40 , assuming the use as an electronic mirror, and outputs the pseudo gaze coincidence display image to the display device 22 .
• thereby, the pseudo gaze coincidence display image, in which the principal user is captured horizontally reversed as if looking into a mirror, is displayed on the display device 22.
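• the mirror image display processing itself is a horizontal reversal of the display image; a minimal sketch, assuming the image is an H x W x 3 NumPy array:

```python
import numpy as np

def mirror_image_display(pseudo_gaze_display_image: np.ndarray) -> np.ndarray:
    """Horizontally reverse the image so the principal user sees themselves
    as in a mirror."""
    return pseudo_gaze_display_image[:, ::-1, :].copy()
```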
  • the image processing unit 24 A configured as described above can perform the viewpoint interpolation processing in the high fidelity image generation unit 33 and the geometric correction in the pseudo gaze coincidence image display unit 40 in consideration of the viewpoint position of the principal user when performing electronic mirror display for the principal user with life size.
• the principal user can thus check, for example, their facial expression with the gaze coinciding, just as when looking into a mirror.
  • a third configuration example of the image processing unit 24 will be described with reference to FIGS. 23 to 29 .
  • FIG. 23 is a block diagram illustrating the third configuration example of the image processing unit 24 . Note that, in an image processing unit 24 B illustrated in FIG. 23 , configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 B has a configuration common to the image processing unit 24 in FIG. 2 in including the high fidelity image generation unit 33 , the low fidelity image generation unit 34 , the pseudo gaze coincidence image generation unit 35 , the transmission unit 37 , and the reception unit 38 .
  • the image processing unit 24 B includes an object viewpoint information setting unit 31 B, a high fidelity display region setting unit 32 B, an encoding unit 36 B, a decoding unit 39 B, a pseudo gaze coincidence image display unit 40 B, a high fidelity display information setting unit 42 , and an object viewpoint information setting unit 43 .
• in the image processing unit 24 in FIG. 2, the three-dimensionally measured viewpoint position of the other party's user is used as the virtual capture position, whereas the image processing unit 24 B uses a simple fixed virtual capture position.
  • the object viewpoint information setting unit 31 B is configured to set and supply fixed object viewpoint information to the high fidelity image generation unit 33 and the low fidelity image generation unit 34 . Then, the high fidelity image generation unit 33 and the low fidelity image generation unit 34 respectively generate the high fidelity image and the low fidelity image on the basis of the fixed object viewpoint information. Furthermore, the fixed object viewpoint information is also output to the pseudo gaze coincidence image display unit 40 B included in the other party's image processing unit 24 B.
  • the fixed object viewpoint information set by the object viewpoint information setting unit 31 B is information indicating a relative positional relationship between the three capture devices 21 - 1 to 21 - 3 and the display device 22 .
  • the fixed object viewpoint information can be determined from an average value of the height of the user who uses the telecommunication apparatus 12 and the distance from the display device 22 to a user's standing position.
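• a minimal sketch of such a fixed viewpoint, with illustrative default values for the average height and the standing distance (actual values would be chosen per installation):

```python
def fixed_object_viewpoint(avg_user_height_m: float = 1.65,
                           standing_dist_m: float = 1.5):
    """Fixed viewpoint used instead of measurement: roughly eye height at the
    average standing distance in front of the display. World coordinates are
    assumed with the display center at the origin and z toward the user."""
    eye_height_m = avg_user_height_m - 0.10   # assumed drop from head top to eyes
    return (0.0, eye_height_m, standing_dist_m)
```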
  • the high fidelity display information setting unit 42 outputs a representative position of a mask region of the corrected high fidelity display region information (for example, a position of the center of gravity or coordinates of a position corresponding to the eyes) and an area of the mask region to the encoding unit 36 B as the high fidelity display information.
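• computing a representative position (center of gravity) and area from a binary mask is straightforward; a sketch with illustrative names:

```python
import numpy as np

def high_fidelity_display_info(mask: np.ndarray):
    """Center of gravity (x, y) and area of a binary mask (H x W, nonzero
    inside the high fidelity display region)."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None, 0                     # empty region
    centroid = (float(xs.mean()), float(ys.mean()))
    return centroid, int(xs.size)          # area in pixels
```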
  • the object viewpoint information setting unit 43 is configured to set and supply the fixed object viewpoint information to the pseudo gaze coincidence image display unit 40 B, similarly to the object viewpoint information setting unit 31 B.
  • the object viewpoint information is also supplied to the high fidelity image generation unit 33 and the low fidelity image generation unit 34 included in the other party's image processing unit 24 B.
  • FIG. 25 is a block diagram illustrating a configuration example of the high fidelity display region setting unit 32 B in FIG. 23 .
  • the input video signal is supplied to the high fidelity display region setting unit 32 B, unlike the high fidelity display region setting unit 32 in FIG. 2 .
  • the high fidelity display region setting unit 32 B has a common configuration to the high fidelity display region setting unit 32 in FIG. 6 in including the high fidelity display mask generation unit 61 , and further includes a face part detection unit 62 .
  • the input video signal is supplied to the face part detection unit 62 .
  • the face part detection unit 62 can obtain coordinates indicating the characteristic points of parts of a face included in the captured images, similarly to the face part detection unit 51 included in the object viewpoint information setting unit 31 illustrated in FIG. 3 , and supplies the coordinates to the high fidelity display mask generation unit 61 as the analysis information.
  • the analysis information is used as an internal signal of the high fidelity display region setting unit 32 B.
  • FIG. 26 is a block diagram illustrating a configuration example of the encoding unit 36 B in FIG. 23 .
  • the high fidelity display information is supplied from the high fidelity display information setting unit 42 to the encoding unit 36 B.
  • the encoding unit 36 B has a common configuration to the encoding unit 36 in FIG. 17 in including the video codec encoding unit 102 and the stream integration unit 103 , and further includes a high fidelity display information encoding unit 104 .
  • the high fidelity display information encoding unit 104 encodes the high fidelity display information supplied from the high fidelity display information setting unit 42 in FIG. 23 , and supplies the high fidelity display information to the stream integration unit 103 as the additional stream. Therefore, the stream integration unit 103 integrates the additional stream in which the high fidelity display information is encoded, and the video stream supplied from the video codec encoding unit 102 , and outputs the integrated streams from the encoding unit 36 B as the coded stream.
  • FIG. 27 is a block diagram illustrating a configuration example of the decoding unit 39 B in FIG. 23 .
  • the coded stream encoded by the encoding unit 36 B is supplied to the decoding unit 39 B.
  • the decoding unit 39 B has a common configuration to the decoding unit 39 in FIG. 18 in including the stream separation unit 111 and the video codec decoding unit 113 , and further includes a high fidelity display information decoding unit 114 .
  • the additional stream separated from the coded stream by the stream separation unit 111 is supplied to the high fidelity display information decoding unit 114 .
  • the high fidelity display information decoding unit 114 decodes the additional stream to the decoded high fidelity display information and outputs the decoded high fidelity display information.
• the decoded high fidelity display information is obtained by decoding the high fidelity display information of the other party's user that was encoded on the other party's side and transmitted.
  • FIG. 28 is a block diagram illustrating a configuration example of the pseudo gaze coincidence image display unit 40 B in FIG. 23 .
  • the decoded high fidelity display information output from the decoding unit 39 B and the decoded pseudo gaze coincidence image are supplied to the pseudo gaze coincidence image display unit 40 B, and the object viewpoint information is supplied from the object viewpoint information setting unit 43 to the pseudo gaze coincidence image display unit 40 B.
  • the pseudo gaze coincidence image display unit 40 B has a common configuration to the pseudo gaze coincidence image display unit 40 in FIG. 19 in including the life-size-display geometric correction processing unit 122 , and further includes a life-size-display geometric correction parameter estimation unit 131 .
  • the life-size-display geometric correction parameter estimation unit 131 estimates the life-size-display geometric correction parameter with which the size of the face and the positions of the eyes of the other party's user displayed on the display device 22 are displayed with actual sizes, similarly to the life-size-display geometric correction parameter estimation unit 121 in FIG. 19 .
• at this time, the virtual capture position in the pseudo gaze coincidence display image of the principal user and the positions of the eyes (viewpoint) displayed in the pseudo gaze coincidence display image of the other party's user need to be displayed so as to coincide with each other (or lie at proximate positions).
• therefore, a geometric correction parameter is estimated that includes a parallel translation component according to the difference between the object viewpoint indicated by the fixed object viewpoint information and the position indicated by the decoded high fidelity display information, and a scaling component with which the area indicated by the decoded high fidelity display information becomes life size.
  • the life-size-display geometric correction processing unit 122 applies the geometric correction using the geometric correction parameter supplied from the life-size-display geometric correction parameter estimation unit 131 to generate the pseudo gaze coincidence display image, and outputs and displays the pseudo gaze coincidence display image on the display device 22 in FIG. 1 .
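• a sketch of estimating such a parameter from the fixed object viewpoint, the decoded representative position, and the decoded area; the pixel-space formulation and the names are assumptions for illustration.

```python
import math

def gaze_promotion_correction(fixed_viewpoint_px, decoded_rep_pos_px,
                              decoded_area_px: float, life_size_area_px: float):
    """Parallel translation that moves the displayed eye position onto the
    assumed viewpoint, plus a scale that brings the mask area to life size."""
    tx = fixed_viewpoint_px[0] - decoded_rep_pos_px[0]
    ty = fixed_viewpoint_px[1] - decoded_rep_pos_px[1]
    scale = math.sqrt(life_size_area_px / decoded_area_px)  # area grows with scale squared
    return (tx, ty), scale
```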
• the image processing unit 24 B configured as described above does not use measured object viewpoint information as the image processing unit 24 in FIG. 2 does, so the degree to which the gazes are made to coincide declines in comparison with the image processing unit 24 in FIG. 2.
  • the image processing unit 24 B has an advantage that the processing of measuring the object viewpoint is not necessary and the effect of causing the gazes to coincide with each other does not depend on calibration accuracy and the like. Therefore, the image processing unit 24 B can realize more robust operation while maintaining the effect of performing telecommunication with coincident gazes in a case where change in the viewpoint positions of the respective users is small, for example.
  • a fourth configuration example of the image processing unit 24 will be described with reference to FIGS. 30, 31A, and 31B .
  • FIG. 30 is a block diagram illustrating the fourth configuration example of the image processing unit 24 . Note that, in an image processing unit 24 C illustrated in FIG. 30 , configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 C has a configuration common to the image processing unit 24 in FIG. 2 in including the object viewpoint information setting unit 31 , the high fidelity display region setting unit 32 , the high fidelity image generation unit 33 , the low fidelity image generation unit 34 , the pseudo gaze coincidence image generation unit 35 , the encoding unit 36 , the transmission unit 37 , the reception unit 38 , the decoding unit 39 , and the pseudo gaze coincidence image display unit 40 .
  • the image processing unit 24 C includes a capture means control unit 44 .
  • the image processing unit 24 C has a configuration in which the capture means control unit 44 is newly added to the image processing unit 24 in FIG. 2 .
• the captured images output from the capture devices 21 are input to the capture means control unit 44, and the capture means control unit 44 can output the input capture signal. Moreover, the capture means control unit 44 feeds back the high fidelity display region information output from the high fidelity display region setting unit 32, thereby changing the focal length, posture, and the like of the capture devices 21 and controlling pan, tilt, and zoom (PTZ control).
  • the PTZ control by the capture means control unit 44 will be described with reference to FIGS. 31A and 31B .
• an input capture signal in which the captured image P 1 captured by the capture device 21 - 1, the captured image P 2 captured by the capture device 21 - 2, and the captured image P 3 captured by the capture device 21 - 3 are multiplexed is input to the image processing unit 24 C.
• here, the high fidelity image generation unit 33 generates the high fidelity image using the captured images P 2 and P 3, and the low fidelity image generation unit 34 generates the low fidelity image using the captured image P 1.
• FIG. 31A illustrates the captured images P 1 to P 3 captured in an initial capture state; in the captured images P 1 to P 3, the regions where the high fidelity display region is set in the high fidelity display region setting unit 32 are hatched.
  • the capture means control unit 44 obtains a ratio of the high fidelity display region to the entire area of the captured images P 2 and P 3 . Then, the capture means control unit 44 performs PTZ control for the capture devices 21 - 2 and 21 - 3 such that the ratio becomes a predetermined value in a case where the ratio of the high fidelity display region to the entire area of the captured images P 2 and P 3 is the predetermined value or less. In other words, the capture means control unit 44 performs zoom control to make the high fidelity display region wide (pan or tilt control as needed) in a case where the high fidelity display region is narrow in the captured images P 2 and P 3 .
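• the zoom side of this control can be sketched as follows; the target ratio and the assumption that zoom scales the region area quadratically are illustrative.

```python
def zoom_factor_for_region(region_area_px: float, frame_area_px: float,
                           target_ratio: float = 0.25) -> float:
    """Return a zoom-in factor that brings the high fidelity display region's
    share of the frame up to target_ratio (1.0 means leave the zoom as is)."""
    ratio = region_area_px / frame_area_px
    if ratio >= target_ratio:
        return 1.0
    # Linear zoom scales the captured area quadratically, hence the square root.
    return (target_ratio / ratio) ** 0.5
```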
• as a result, in the captured images P 2 ′ and P 3 ′ for which the zoom control has been performed, the high fidelity display regions are captured widely relative to the respective entire areas so that their ratios reach the predetermined value.
  • the captured images P 2 ′ and P 3 ′ with the high fidelity display regions zoomed by the capture means control unit 44 are supplied to the high fidelity image generation unit 33 .
• thereby, the high fidelity image generation unit 33 can generate the high fidelity image with a higher resolution, and the pseudo gaze coincidence image generation unit 35 can generate the pseudo gaze coincidence display image having the high fidelity display region with an enhanced resolution.
• the image processing unit 24 C configured as described above can generate the high fidelity image with a higher resolution, and therefore the pseudo gaze coincidence display image having the high fidelity display region with an enhanced resolution, so that more realistic telecommunication can be performed.
  • a fifth configuration example of the image processing unit 24 will be described with reference to FIGS. 32, 33, 34, 35A, and 35B .
  • FIG. 32 is a block diagram illustrating the fifth configuration example of the image processing unit 24 . Note that, in an image processing unit 24 D illustrated in FIG. 32 , configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 D has a configuration common to the image processing unit 24 in FIG. 2 in including the high fidelity image generation unit 33 , the low fidelity image generation unit 34 , the pseudo gaze coincidence image generation unit 35 , the encoding unit 36 , the transmission unit 37 , the reception unit 38 , the decoding unit 39 , and the pseudo gaze coincidence image display unit 40 .
  • the image processing unit 24 D includes an object viewpoint information setting unit 31 D and a high fidelity display region setting unit 32 D.
  • FIG. 33 is a block diagram illustrating a configuration example of the object viewpoint information setting unit 31 D in FIG. 32 .
  • the object viewpoint information setting unit 31 D has a common configuration to the object viewpoint information setting unit 31 in FIG. 3 in including the eye region corresponding point detection unit 52 , the viewpoint distance calculation unit 53 , and the object viewpoint information generation unit 54 , and further includes a face part detection unit 51 D and a glasses wearing recognition unit 55 .
• the face part detection unit 51 D detects the reliability of the face part detection in addition to the coordinates of the characteristic points of the parts of the face included in the captured images, similarly to the face part detection unit 51 in FIG. 3, and outputs the analysis information including the reliability to the high fidelity display region setting unit 32 D.
• the glasses wearing recognition unit 55 recognizes whether or not a pair of glasses is worn on the face captured in the captured image. Then, in a case where the glasses wearing recognition unit 55 recognizes that a pair of glasses is worn, it outputs glasses wearing information indicating that recognition to the high fidelity display region setting unit 32 D. Note that glasses wearing recognition is generally available as attribute information in common face recognition technologies.
  • FIG. 34 is a block diagram illustrating a configuration example of the high fidelity display region setting unit 32 D in FIG. 32 .
  • the high fidelity display region setting unit 32 D includes a high fidelity display mask generation unit 61 D, and the analysis information and the glasses wearing information are supplied to the high fidelity display mask generation unit 61 D.
• the high fidelity display mask generation unit 61 D sets the high fidelity display region while avoiding the portion of the face captured in the captured image where a rim of the glasses is present. For example, artifacts are determined to be likely to occur in the portion where the rim of the glasses is present, so by setting the high fidelity display region to avoid that portion, data errors, distortion of signals, and the like can be avoided.
  • the high fidelity display region is set in a region avoiding the portion where the rim of the glasses is present, as compared with the above-described mask image in FIG. 7A .
  • the high fidelity display mask generation unit 61 D may set the high fidelity display region in only regions of the eyes, as illustrated in FIG. 35B , in a case where the reliability of the face parts such as a contour portion of the face is determined to be low on the basis of the analysis information.
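• a sketch of such mask generation, assuming boolean masks for the face region, the glasses rim, and the eyes are available from the analysis information and the glasses wearing information:

```python
import numpy as np

def glasses_aware_mask(face_mask: np.ndarray, rim_mask: np.ndarray,
                       eye_masks, low_reliability: bool) -> np.ndarray:
    """Shrink the high fidelity display region: drop the glasses-rim pixels,
    and fall back to the eye regions alone when face part reliability is low."""
    if low_reliability:
        region = np.zeros_like(face_mask, dtype=bool)
        for eye_mask in eye_masks:       # FIG. 35B style: eyes only
            region |= eye_mask
        return region
    return face_mask & ~rim_mask         # FIG. 35A style: avoid the rim
```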
  • the image processing unit 24 D configured as described above can generate the high fidelity image, avoiding the region where the possibility of occurrence of artifacts is high, in the subsequent high fidelity image generation unit 33 , by setting the high fidelity display region information using the glasses wearing information, the reliability, and the like. Therefore, the fidelity of the high fidelity image can be enhanced and more realistic telecommunication can be performed.
• a sixth configuration example of the image processing unit 24 will be described with reference to FIGS. 36, 37, 38A, and 38B.
  • FIG. 36 is a block diagram illustrating the sixth configuration example of the image processing unit 24 . Note that, in an image processing unit 24 E illustrated in FIG. 36 , configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 E has a configuration common to the image processing unit 24 in FIG. 2 in including the object viewpoint information setting unit 31 , the high fidelity display region setting unit 32 , the high fidelity image generation unit 33 , the low fidelity image generation unit 34 , the encoding unit 36 , the transmission unit 37 , the reception unit 38 , the decoding unit 39 , and the pseudo gaze coincidence image display unit 40 .
  • the image processing unit 24 E includes a pseudo gaze coincidence image generation unit 35 E.
  • FIG. 37 is a block diagram illustrating a configuration example of the pseudo gaze coincidence image generation unit 35 E in FIG. 36 .
  • the pseudo gaze coincidence image generation unit 35 E has a common configuration to the pseudo gaze coincidence image generation unit 35 in FIG. 16 in including the mask image filter processing unit 91 and the high fidelity display region blending processing unit 92 , and further includes a high fidelity determination unit 93 .
  • the high fidelity determination unit 93 determines similarity of image data of the high fidelity image and the low fidelity image in the corrected high fidelity display region indicated by the corrected high fidelity display region information supplied from the low fidelity image generation unit 34 .
• for example, the high fidelity determination unit 93 can obtain the similarity of the image data according to the ratio of coincidence of the positions of the parts of the faces between the high fidelity image and the low fidelity image: the similarity of the image data is high where the positions of the parts of the faces coincide between the two images, and low where they do not.
  • the high fidelity determination unit 93 generates a blend ratio map image in which the blend ratio is set such that the blend ratio of the high fidelity image becomes higher as the similarity is higher, and the blend ratio of the high fidelity image becomes lower as the similarity is lower, and supplies the blend ratio map image to the mask image filter processing unit 91 .
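• a sketch of generating such a blend ratio map, using per-pixel absolute error as a simple stand-in for the part-position-based similarity described above; the error scale is an illustrative tuning constant.

```python
import numpy as np

def blend_ratio_map(hi: np.ndarray, lo: np.ndarray, region: np.ndarray,
                    error_scale: float = 32.0) -> np.ndarray:
    """Per-pixel blend ratio of the high fidelity image inside the corrected
    region: near 1.0 where the two images agree, near 0.0 where they diverge."""
    err = np.abs(hi.astype(np.float32) - lo.astype(np.float32))
    if err.ndim == 3:
        err = err.mean(axis=2)                 # average over color channels
    ratio = np.clip(1.0 - err / error_scale, 0.0, 1.0)
    return np.where(region, ratio, 0.0)        # zero outside the region
```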
• in FIGS. 38A and 38B, the corrected high fidelity display regions indicated by the corrected high fidelity display region information are lightly hatched in the low fidelity image and the high fidelity image. Furthermore, in the blend ratio map image, the blend ratio of the high fidelity image is made higher (darker hatching) in regions with high similarity and lower in regions with low similarity.
  • FIG. 38A illustrates an example of high similarity of the image data of the high fidelity image and the low fidelity image in the corrected high fidelity display regions. Therefore, the blend ratio map image in which the blend ratio of the high fidelity image is set to be high is generated for the entire corrected high fidelity display region.
  • FIG. 38B illustrates an example of low similarity of the image data of the high fidelity image and the low fidelity image in the corrected high fidelity display region, in which noses, mouths, and the like are shifted and synthesized. Therefore, the similarity is low in the regions of the noses, mouths, and the like, and the blend ratio map image in which the blend ratio of the high fidelity image in the region is set to be low is generated.
  • the determination processing by the high fidelity determination unit 93 is performed, and the blend ratio map image according to the similarity is supplied to the mask image filter processing unit 91 .
• the subsequent processing from the mask image filter processing unit 91 onward is performed similarly to the image processing unit 24 in FIG. 2.
• in a case where the quality of the high fidelity image generated by the viewpoint interpolation processing is poor, the image processing unit 24 E configured as described above can display the image without generating artifacts, although the effect of causing the gazes to coincide with each other decreases.
  • a seventh configuration example of the image processing unit 24 will be described with reference to FIGS. 39 to 43 .
  • FIG. 39 is a block diagram illustrating the seventh configuration example of the image processing unit 24 . Note that, in an image processing unit 24 F illustrated in FIG. 39 , configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 F has a configuration common to the image processing unit 24 in FIG. 2 in including the object viewpoint information setting unit 31 , the high fidelity display region setting unit 32 , the high fidelity image generation unit 33 , the transmission unit 37 , the reception unit 38 , and the pseudo gaze coincidence image display unit 40 .
  • the image processing unit 24 F includes an encoding unit 36 F, a decoding unit 39 F, a low fidelity image generation parameter generation unit 45 , an object viewpoint information setting unit 46 , a low fidelity image generation unit 47 , and a pseudo gaze coincidence image generation unit 48 .
• the processing differs from that of the image processing unit 24 in FIG. 2 in that the low fidelity image is constructed as a computer graphics (CG) avatar animation.
  • the low fidelity image generation parameter generation unit 45 arranged on the transmission side generates a parameter for generating the low fidelity image with the CG avatar animation.
  • the low fidelity image generation unit 47 arranged on the reception side generates the low fidelity image with the CG avatar animation.
  • the viewpoint position of the other party's user (the decoded object viewpoint information described with reference to FIG. 2 , for example) is used in the high fidelity image generation unit 33 on the transmission side.
  • the low fidelity image generation unit 47 is arranged after the reception and thus uses the information of the viewpoint position of the principal user set in the object viewpoint information setting unit 46 , unlike the low fidelity image generation unit 34 in FIG. 2 .
  • FIG. 40 is a block diagram illustrating a configuration example of the low fidelity image generation parameter generation unit 45 in FIG. 39 .
  • the low fidelity image generation parameter generation unit 45 includes a person skeleton analysis unit 141 , a person body model parameter extraction unit 142 , a person body model parameter motion estimation unit 143 , a face modeling parameter extraction unit 144 , a face model parameter motion estimation unit 145 , and a model parameter information integration unit 146 .
  • the person skeleton analysis unit 141 generates person skeleton information for a part of the captured images obtained from the input video signal, and supplies the person skeleton information to the person body model parameter extraction unit 142 .
  • the person body model parameter extraction unit 142 generates person mesh information on the basis of the person skeleton information supplied from the person skeleton analysis unit 141 , and supplies the person mesh information to the person body model parameter motion estimation unit 143 and the model parameter information integration unit 146 .
• the person body model parameter motion estimation unit 143 obtains person mesh motion information corresponding to a motion of the object and indicating a motion of a vertex of each mesh of the person mesh information (or a mesh geometric transformation parameter), and supplies the person mesh motion information to the model parameter information integration unit 146.
  • the face modeling parameter extraction unit 144 generates face mesh information according to face part positions indicated by the analysis information, using the analysis information obtained from the input video signal.
  • the face model parameter motion estimation unit 145 obtains face mesh motion information corresponding to a motion of the face and indicating a motion of a vertex of each mesh of the face mesh information (or a mesh geometric transformation parameter), and supplies the face mesh motion information to the model parameter information integration unit 146 .
  • the model parameter information integration unit 146 integrates the person mesh information, the person mesh motion information, the face mesh information, and the face mesh motion information, and outputs the integrated information as object mesh information. Moreover, the model parameter information integration unit 146 labels a mesh corresponding to the high fidelity display region information, of the meshes configured by the object mesh information, and outputs the labeled mesh as high fidelity display mesh label information.
  • FIG. 41 is a block diagram illustrating a configuration example of an encoding unit 36 F in FIG. 39 .
  • the encoding unit 36 F has a common configuration to the encoding unit 36 in FIG. 17 in including the object viewpoint information encoding unit 101 , the video codec encoding unit 102 , and the stream integration unit 103 , and further includes an object mesh encoding unit 105 and a high fidelity display mesh label encoding unit 106 .
  • the object viewpoint information encoding unit 101 encodes the object viewpoint information and supplies the encoded object viewpoint information to the stream integration unit 103 as the additional stream.
  • the video codec encoding unit 102 encodes the high fidelity image using various codecs as described above, and supplies the encoded high fidelity image to the stream integration unit 103 as the video stream.
  • the object mesh encoding unit 105 encodes the object mesh information, and supplies the encoded object mesh information to the stream integration unit 103 as an object mesh stream.
• the high fidelity display mesh label encoding unit 106 encodes the high fidelity display mesh label information, and supplies the encoded high fidelity display mesh label information to the stream integration unit 103 as a high fidelity display mesh label stream.
  • the stream integration unit 103 integrates the additional stream, the video stream, the object mesh stream, and the high fidelity display mesh label stream, and outputs the integrated streams to the transmission unit 37 as the coded stream.
  • FIG. 42 is a block diagram illustrating a configuration example of a decoding unit 39 F in FIG. 39 .
  • the decoding unit 39 F has a common configuration to the decoding unit 39 in FIG. 18 in including the stream separation unit 111 , the object viewpoint information decoding unit 112 , and the video codec decoding unit 113 , and further includes an object mesh decoding unit 115 and a high fidelity display mesh label decoding unit 116 .
  • the stream separation unit 111 separates the coded stream supplied from the reception unit 38 in FIG. 39 into the additional stream, the video stream, the object mesh stream, and the high fidelity display mesh label stream. Then, the stream separation unit 111 supplies the object mesh stream to the object mesh decoding unit 115 and supplies the high fidelity display mesh label stream to the high fidelity display mesh label decoding unit 116 .
  • the object mesh decoding unit 115 decodes the object mesh stream supplied from the stream separation unit 111 to decoded object mesh information, and outputs the decoded object mesh information.
  • the high fidelity display mesh label decoding unit 116 decodes the high fidelity display mesh label stream supplied from the stream separation unit 111 to decoded high fidelity display mesh label information, and outputs the decoded high fidelity display mesh label information.
  • FIG. 43 is a block diagram illustrating a configuration example of the low fidelity image generation unit 47 in FIG. 39 .
  • the low fidelity image generation unit 47 includes an animation rendering unit 151 and a database 152 . Then, the low fidelity image generation unit 47 renders the CG avatar animation to generate the low fidelity image.
  • the animation rendering unit 151 performs rendering such that the object is displayed in a life size manner on the display device 22 as viewed from the viewpoint of the other party's user indicated by the object viewpoint information.
  • the animation rendering unit 151 can perform rendering for a 3D mesh structure configured by the object mesh information with the other party's user as the object by acquiring various types of information (texture information, actual size information, background CG information, light source information, and the like) registered in advance in the database 152 .
  • the animation rendering unit 151 reproduces animation according to motion information included in the object mesh information, and outputs the animation as the low fidelity image.
  • the animation rendering unit 151 generates the mask image corresponding to a region indicated by the decoded high fidelity display mesh label information, and outputs the mask image as the corrected high fidelity display region information.
• the pseudo gaze coincidence image generation unit 48 generates the pseudo gaze coincidence image using the corrected high fidelity display region information, the low fidelity image, and the decoded high fidelity image in place of the high fidelity image, similarly to the processing performed by the pseudo gaze coincidence image generation unit 35 in FIG. 2.
  • the pseudo gaze coincidence image display unit 40 generates and outputs the pseudo gaze coincidence display image to the display device 22 , similarly to FIG. 2 .
  • the image processing unit 24 F configured as described above transmits the parameter for generating the low fidelity image with the CG avatar animation to the other party side, and can generate the low fidelity image with the CG avatar animation on the basis of the parameter transmitted from the other party side.
  • the users of the telecommunication apparatus 12 can perform more realistic telecommunication using life-size video and live-action-based avatar animation.
  • a video communication experience for causing mutual gazes to coincide with each other in consideration of the viewpoint positions of the users, and the like can be provided without arranging the capture devices inside the display device 22 .
• an eighth configuration example of the image processing unit 24 will be described with reference to FIGS. 44 to 48.
• FIG. 44 is a block diagram illustrating the eighth configuration example of the image processing unit 24. Note that, in an image processing unit 24 G illustrated in FIG. 44, configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 G has a configuration common to the image processing unit 24 in FIG. 2 in including the object viewpoint information setting unit 31 , the high fidelity display region setting unit 32 , the high fidelity image generation unit 33 , the low fidelity image generation unit 34 , the encoding unit 36 , the transmission unit 37 , the reception unit 38 , the decoding unit 39 , and the pseudo gaze coincidence image display unit 40 .
  • the image processing unit 24 G includes a pseudo gaze coincidence image generation unit 35 G.
  • the image processing unit 24 G is configured to supply the analysis information output from the object viewpoint information setting unit 31 to the pseudo gaze coincidence image generation unit 35 G.
  • FIG. 45 is a block diagram illustrating a configuration example of the pseudo gaze coincidence image generation unit 35 G in FIG. 44 .
  • the pseudo gaze coincidence image generation unit 35 G has a common configuration to the pseudo gaze coincidence image generation unit 35 in FIG. 16 in including the mask image filter processing unit 91 and the high fidelity display region blending processing unit 92 . Furthermore, the pseudo gaze coincidence image generation unit 35 G has a common configuration to the pseudo gaze coincidence image generation unit 35 E in FIG. 37 in including the high fidelity determination unit 93 , and further includes an interference signal removal unit 94 .
• the analysis information is supplied from the object viewpoint information setting unit 31, the high fidelity image is supplied from the high fidelity image generation unit 33, and the low fidelity image is supplied from the low fidelity image generation unit 34 to the interference signal removal unit 94. Then, the interference signal removal unit 94 removes a signal interfering with gaze coincidence included in the high fidelity image, using the analysis information and the low fidelity image, and supplies the interference-removed high fidelity image, from which the interference signal has been removed, to the high fidelity display region blending processing unit 92 and the high fidelity determination unit 93.
  • the interference signal removal unit 94 removes an element interfering with the gaze coincidence from the high fidelity image according to an error amount between the high fidelity image and the low fidelity image in a region near both eyes of the user before the alpha blending processing is performed by the high fidelity display region blending processing unit 92 .
• for example, the interference signal removal unit 94 specifies a region assumed to interfere with the gaze coincidence on the basis of the analysis information, and, in the specified region, removes the deformed rim of the glasses that interferes with the gaze coincidence, using the undeformed rim of the glasses captured in the low fidelity image.
• for example, as illustrated on the left side in FIG. 46, the alpha blending processing is performed, on the basis of the blend ratio map image that avoids the portion where the rim of the glasses is present (as in the mask image illustrated in FIG. 35A), on the low fidelity image in which the gaze does not coincide and the high fidelity image with the deformed rim of the glasses.
• at this time, a part of the distorted rim of the glasses in the vicinity of the region of the eyes of the high fidelity image is smoothed by the mask image filter processing unit 91 ( FIG. 16 ) and thus may be mixed into the pseudo gaze coincidence image as an element interfering with the gaze coincidence (an interference signal).
  • the interference signal removal unit 94 removes the interference signal on the pseudo gaze coincidence image and outputs the interference-removed high fidelity image as illustrated on the right side in FIG. 46 . Therefore, the high fidelity display region blending processing unit 92 can generate the pseudo gaze coincidence display image for enabling further coincidence of the gaze.
• the region from which the interference signal is removed by the interference signal removal unit 94 is the region near both eyes illustrated with the thick broken line in FIG. 46, excluding the eye regions respectively corresponding to the right eye and the left eye, which are hatched in gray in FIG. 46.
  • FIG. 47 is a block diagram illustrating a configuration example of the interference signal removal unit 94 in FIG. 44 .
  • the interference signal removal unit 94 includes an interference signal removal target region setting unit 161 , an eye region setting unit 162 , an interference signal removal blending unit 163 , and a remaining interference signal removal smoothing unit 164 .
  • the interference signal removal target region setting unit 161 specifies a region involving both eyes as an interference signal removal target region, as described with reference to FIG. 46 , on the basis of the analysis information supplied from the object viewpoint information setting unit 31 . Then, the interference signal removal target region setting unit 161 sets the interference signal removal target region for the interference signal removal blending unit 163 .
  • the eye region setting unit 162 specifies the regions respectively corresponding to the right eye and the left eye as the eye region, as described with reference to FIG. 46 , on the basis of the analysis information supplied from the object viewpoint information setting unit 31 . Then, the eye region setting unit 162 sets the eye region for the interference signal removal blending unit 163 and the remaining interference signal removal smoothing unit 164 .
  • the interference signal removal blending unit 163 obtains an error amount between the high fidelity image and the low fidelity image in a region other than the eye region set by the eye region setting unit 162 , of the interference signal removal target region set by the interference signal removal target region setting unit 161 . Then, the interference signal removal blending unit 163 performs the alpha blending processing using the blend ratio of the low fidelity image, the blend ratio becoming larger in value with an increase in the obtained error amount, as illustrated in FIG. 48 , in the interference signal removal target region excluding the eye region.
• on the other hand, the interference signal removal blending unit 163 uses the high fidelity image as it is for the eye region set by the eye region setting unit 162; that is, it performs the alpha blending processing with the blend ratio of the low fidelity image in the eye region set to 0.
  • the interference signal removal blending unit 163 generates the interference signal-removed blend image from which most of the portion of the distorted rim of the glasses in the high fidelity image is removed as the interference signal, and supplies the interference signal-removed blend image to the remaining interference signal removal smoothing unit 164 .
• however, even after this blending, an edge of the distorted rim of the glasses may not be removed and may remain as a linear interference signal, as illustrated in the center in FIG. 46.
• the remaining interference signal removal smoothing unit 164 applies smoothing processing, with an edge preserving nonlinear filter such as a median filter for removing impulsive signals, to the interference signal remaining in the interference signal-removed blend image supplied from the interference signal removal blending unit 163. Thereby, the remaining interference signal removal smoothing unit 164 generates the interference-removed high fidelity image from which the interference signals remaining in the interference signal-removed blend image have been removed, and supplies the interference-removed high fidelity image to the subsequent high fidelity display region blending processing unit 92 and high fidelity determination unit 93 ( FIG. 45 ).
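• the blending and smoothing of FIG. 47 can be sketched as follows for single-channel float images; the error-to-blend-ratio mapping (FIG. 48 specifies only that it increases with the error amount) and the 3 x 3 filter window are illustrative.

```python
import numpy as np
from scipy.ndimage import median_filter

def remove_interference(hi: np.ndarray, lo: np.ndarray,
                        target_mask: np.ndarray, eye_mask: np.ndarray,
                        error_scale: float = 16.0) -> np.ndarray:
    """Inside the removal target region (but not over the eyes), blend toward
    the low fidelity image where the error is large, then median-filter that
    region to suppress remaining linear artifacts."""
    err = np.abs(hi - lo)
    alpha = np.clip(err / error_scale, 0.0, 1.0)   # low fidelity weight
    alpha[~target_mask] = 0.0                      # leave the rest untouched
    alpha[eye_mask] = 0.0                          # keep the eyes high fidelity
    blended = alpha * lo + (1.0 - alpha) * hi
    smoothed = median_filter(blended, size=3)      # impulsive-signal removal
    return np.where(target_mask & ~eye_mask, smoothed, blended)
```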
  • the mask image filter processing unit 91 performs processing similar to the processing in the pseudo gaze coincidence image generation unit 35 E described with reference to FIG. 37 .
• thereby, the edge of the rim portion of the glasses in the low fidelity image is not blurred, and the reproducibility of the rim portion of the glasses in the pseudo gaze coincidence image undergoing the alpha blending processing by the high fidelity display region blending processing unit 92 is maintained.
  • the image processing unit 24 G configured as described above can display the image without generating artifacts in the vicinity of the eye region.
• a ninth configuration example of the image processing unit 24 will be described with reference to FIGS. 49 to 55.
  • FIG. 49 is a block diagram illustrating the ninth configuration example of the image processing unit 24 . Note that, in an image processing unit 24 H illustrated in FIG. 49 , configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 H has a configuration common to the image processing unit 24 in FIG. 2 in including the object viewpoint information setting unit 31 , the high fidelity display region setting unit 32 , the low fidelity image generation unit 34 , the pseudo gaze coincidence image generation unit 35 , the transmission unit 37 , and the reception unit 38 .
  • the image processing unit 24 H includes a high fidelity image generation unit 33 H, an encoding unit 36 H, a decoding unit 39 H, and a pseudo gaze coincidence image display unit 40 H.
• in the image processing unit 24 in FIG. 2, the size of the face and the positions of the eyes of the other party's user displayed on the display device 22 have been displayed to be equivalent to actual size on the basis of the viewpoint position (defined in world coordinates) of the other party's user in the three-dimensional real space.
• in contrast, the image processing unit 24 H performs display that makes it easier for the gazes to coincide with each other, taking into account a difference in processing characteristics, which depends on the portion of the face, when a plurality of captured images is used.
• for example, the gaze perception direction, illustrated with the alternate long and short dashed line, deviates from the true eye direction, illustrated with the dotted line, toward the face orientation direction, illustrated with the broken line; the gaze is perceived in such a direction.
• unlike the high fidelity image generation unit 33 in FIG. 2, the high fidelity image generation unit 33 H supplies, to the encoding unit 36 H, virtual capture position information related to the difference in processing characteristics, which depends on the portion of the face, when a plurality of captured images is used.
  • the high fidelity image generation unit 33 H includes the high fidelity display region cropping unit 71 , similarly to the high fidelity image generation unit 33 in FIG. 8 , and further includes a viewpoint interpolation image generation unit 72 H.
  • the viewpoint interpolation image generation unit 72 H sets the virtual capture position, and outputs the virtual capture position information indicating the virtual capture position.
• the encoding unit 36 H encodes the object viewpoint information of the principal user supplied from the object viewpoint information setting unit 31 and the pseudo gaze coincidence image supplied from the pseudo gaze coincidence image generation unit 35, as in the encoding unit 36 in FIG. 2, and newly encodes the virtual capture position information in addition to them.
  • the encoding unit 36 H generates the coded stream in which the object viewpoint information, the pseudo gaze coincidence image, and the virtual capture position information are encoded, and supplies the coded stream to the transmission unit 37 .
  • the decoding unit 39 H newly supplies decoded virtual capture position information in addition to the decoded object viewpoint information and the decoded pseudo gaze coincidence image obtained by decoding the coded stream supplied from the reception unit 38 to the pseudo gaze coincidence image display unit 40 H.
  • FIG. 52 is a block diagram illustrating a configuration example of the pseudo gaze coincidence image display unit 40 H in FIG. 49 .
  • the decoded virtual capture position information in addition to the decoded object viewpoint information and the decoded pseudo gaze coincidence image output from the decoding unit 39 H are supplied to the pseudo gaze coincidence image display unit 40 H.
  • the pseudo gaze coincidence image display unit 40 H includes a gaze coincidence promotion display geometric correction parameter estimation unit 121 H and a gaze coincidence promotion display geometric correction processing unit 122 H, in place of the life-size-display geometric correction parameter estimation unit 121 and the life-size-display geometric correction processing unit 122 of the pseudo gaze coincidence image display unit 40 in FIG. 19 .
  • the gaze coincidence promotion display geometric correction parameter estimation unit 121 H obtains a parameter with which the size of the face and the positions of the eyes of the other party's user displayed on the display device 22 are displayed with actual sizes on the basis of the viewpoint position (defined with world coordinates) of the other party's user in the three-dimensional real space indicated by the decoded object viewpoint information, similarly to the life-size-display geometric correction parameter estimation unit 121 in FIG. 19 .
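  • As a rough sketch of how such a life-size display parameter can be derived (the function, its arguments, and the face-width constant in the usage example are illustrative assumptions, not values taken from the embodiment):

```python
def life_size_scale(face_width_px: float, face_width_mm: float,
                    display_width_px: int, display_width_mm: float) -> float:
    """Scale factor that makes a face occupying face_width_px pixels in the
    decoded image appear with its physical width face_width_mm on a display
    that is display_width_px pixels and display_width_mm millimetres wide."""
    mm_per_pixel = display_width_mm / display_width_px  # physical pixel pitch
    target_px = face_width_mm / mm_per_pixel            # pixels needed for life size
    return target_px / face_width_px

# Usage: a face 240 px wide, assumed to be about 150 mm wide in reality,
# shown on a 1920 px / 600 mm wide display -> scale factor 2.0.
scale = life_size_scale(240.0, 150.0, 1920, 600.0)
```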
  • however, the gaze coincidence promotion display geometric correction parameter estimation unit 121 H does not use the parameter as it is, but adds a correction using the life-size display state as a reference (hereinafter referred to as the reference state).
  • that is, the gaze coincidence promotion display geometric correction parameter estimation unit 121 H determines a correction amount with respect to the reference state, taking into account the viewpoint interpolation position that was set when the decoded pseudo gaze coincidence image was generated and that is obtained from the decoded virtual capture position information, and sets a gaze coincidence promotion display geometric correction parameter.
  • for example, the left cross mark illustrated in FIG. 53 represents the viewpoint interpolation position, obtained from the decoded virtual capture position information, that was set when the decoded pseudo gaze coincidence image was generated.
  • the left cross mark illustrated in FIG. 53 divides the vertical direction into r_a and (1.0 − r_a), where r_a is a numerical value from 0.0 to 1.0 (0.0 ≤ r_a ≤ 1.0).
  • similarly, the left cross mark illustrated in FIG. 53 divides the horizontal direction on the left side into s_a and (1.0 − s_a), where s_a is a numerical value from 0.0 to 1.0 (0.0 ≤ s_a ≤ 1.0).
  • the upward correction amount is obtained such that it becomes larger as the viewpoint interpolation position is farther from the capture device 21-1, that is, as the value of r_a is larger, as illustrated in the graph in FIG. 54A.
  • in that case, the consistency between the face orientation (looking slightly downward) of the low fidelity image, which is based on the image captured by the capture device 21-1, and the eye orientation (looking at the camera, seen from the front) of the high fidelity image, which is faithfully generated at the viewpoint interpolation position from the three capture devices 21-1 to 21-3, becomes low. Since the gaze is therefore perceived as slightly shifted downward, the upward correction amount is increased.
  • the leftward correction amount is obtained such that it becomes larger as the viewpoint interpolation position is farther from the central position between the capture devices 21-2 and 21-3, that is, as the value of s_a is larger, as illustrated in the graph in FIG. 54B.
  • similarly to the upward correction amount setting method, the consistency between the face orientation (looking slightly rightward) of the low fidelity image and the eye orientation (looking at the camera, seen from the front) of the high fidelity image becomes low. Since the gaze is therefore perceived as slightly shifted rightward, the leftward correction amount is increased.
  • the final correction amount for the left cross mark illustrated in FIG. 53 is expressed by a two-dimensional vector (DXL, DY), and the image is corrected and displayed at a position shifted by the amount of the vector.
  • for the right cross mark illustrated in FIG. 53, a two-dimensional vector (DXR, DY) is determined similarly to the case of the left cross mark, as illustrated in the graph in FIG. 54C, except that the horizontal correction amount is in the rightward direction.
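  • The determination described above can be summarized by the following sketch; the linear ramps stand in for the actual characteristics in the graphs of FIGS. 54A to 54C, and the maximum correction amounts and the sign convention are illustrative assumptions.

```python
def correction_vector(r_a: float, s_a: float, side: str,
                      max_dy_px: float = 20.0, max_dx_px: float = 20.0):
    """Correction amount for one cross mark of FIG. 53 as a 2D vector.

    r_a, s_a: normalized viewpoint interpolation position (0.0 to 1.0).
    side: 'left' returns (DXL, DY) and 'right' returns (DXR, DY).
    """
    if not (0.0 <= r_a <= 1.0 and 0.0 <= s_a <= 1.0):
        raise ValueError("r_a and s_a must lie in [0.0, 1.0]")
    dy = max_dy_px * r_a   # upward correction grows with r_a (FIG. 54A)
    dx = max_dx_px * s_a   # horizontal correction grows with s_a (FIG. 54B)
    if side == 'right':
        dx = -dx           # rightward instead of leftward for the right cross mark
    return dx, dy
```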
  • the gaze coincidence promotion display geometric correction parameter estimation unit 121 H estimates the gaze coincidence promotion display geometric correction parameter, reflecting the correction amount determined by the above determination method, and supplies the gaze coincidence promotion display geometric correction parameter to the gaze coincidence promotion display geometric correction processing unit 122 H.
  • the gaze coincidence promotion display geometric correction processing unit 122 H applies the geometric correction using the gaze coincidence promotion display geometric correction parameter supplied from the gaze coincidence promotion display geometric correction parameter estimation unit 121 H to the decoded pseudo gaze coincidence image. Thereby, the gaze coincidence promotion display geometric correction processing unit 122 H generates the pseudo gaze coincidence display image to be displayed on the display device 22 such that the gazes more easily coincide with each other on the basis of the above-described viewpoint interpolation position from the state where the size and the position (height of the eyes) for life size of the other party's user are set as the reference. Then, the gaze coincidence promotion display geometric correction processing unit 122 H outputs and displays the pseudo gaze coincidence display image on the display device 22 in FIG. 1 .
  • the blocks included in the image processing unit 24 H are configured as described above, and the display position is corrected with respect to the actual size display, as illustrated in FIG. 55, in the interactive communication performed with the other party's user displayed, whereby a video communication experience in which the gazes easily coincide with the other party's user can be provided.
  • next, a tenth configuration example of the image processing unit 24 will be described. FIG. 56 is a block diagram illustrating the tenth configuration example of the image processing unit 24. Note that, in an image processing unit 24 J illustrated in FIG. 56, configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 J has a configuration common to the image processing unit 24 in FIG. 2 in including the high fidelity display region setting unit 32 , the high fidelity image generation unit 33 , the low fidelity image generation unit 34 , the encoding unit 36 , the transmission unit 37 , the reception unit 38 , the decoding unit 39 , and the pseudo gaze coincidence image display unit 40 .
  • the image processing unit 24 J includes an object viewpoint information setting unit 31 J and a pseudo gaze coincidence image generation unit 35 J.
  • FIG. 57 is a block diagram illustrating a configuration example of the object viewpoint information setting unit 31 J in FIG. 56 .
  • the object viewpoint information setting unit 31 J has a configuration common to the object viewpoint information setting unit 31 in FIG. 3 in including the face part detection unit 51, the eye region corresponding point detection unit 52, the viewpoint distance calculation unit 53, and the object viewpoint information generation unit 54, and further includes a gaze direction detection unit 56.
  • the input capture signal is supplied to the gaze direction detection unit 56 and the analysis information is supplied from the face part detection unit 51 to the gaze direction detection unit 56 .
  • the gaze direction detection unit 56 detects the gaze direction of the pupils of both eyes on the basis of at least one of the three captured images captured by the capture devices 21-1 to 21-3 and the analysis information, output by the face part detection unit 51, indicating the coordinates of the characteristic points of the parts of the face.
  • for example, the gaze direction detection unit 56 can detect the gaze direction using the technology disclosed in the non-patent document "Rendering of Eyes for Eye-Shape Registration and Gaze Estimation" by Erroll Wood, et al., ICCV 2015, or the like.
  • the gaze direction detection unit 56 supplies a detection result to the pseudo gaze coincidence image generation unit 35 J as gaze direction information, and the gaze direction information is output from the object viewpoint information setting unit 31 J together with the analysis information.
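  • As a minimal illustration of what such a detector outputs (the cited non-patent document uses a far more principled, learned estimator), the sketch below derives a coarse gaze direction for one eye from the iris center's offset relative to the eye-corner landmarks; all names and the 30-degree range are assumptions.

```python
import numpy as np

def coarse_gaze_direction(outer_corner, inner_corner, iris_center):
    """Very coarse (yaw, pitch) gaze estimate, in radians, from 2D landmarks
    of one eye: the two eye corners and the iris (pupil) center."""
    outer = np.asarray(outer_corner, dtype=float)
    inner = np.asarray(inner_corner, dtype=float)
    iris = np.asarray(iris_center, dtype=float)
    eye_width = np.linalg.norm(inner - outer) + 1e-6
    center = (outer + inner) / 2.0
    dx, dy = (iris - center) / eye_width    # normalized iris offset in the eye
    max_angle = np.deg2rad(30.0)            # assumed rotation range of the eye
    return dx * max_angle, -dy * max_angle  # image y axis grows downward
```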
  • FIG. 58 is a block diagram illustrating a configuration example of the pseudo gaze coincidence image generation unit 35 J in FIG. 56 .
  • the pseudo gaze coincidence image generation unit 35 J has a configuration common to the pseudo gaze coincidence image generation unit 35 G in FIG. 45 in including the mask image filter processing unit 91, the high fidelity display region blending processing unit 92, the high fidelity determination unit 93, and the interference signal removal unit 94, and further includes a catch light emphasizing unit 95.
  • the analysis information and the gaze direction information are supplied from the object viewpoint information setting unit 31 J, and the interference-removed high fidelity image is supplied from the interference signal removal unit 94, to the catch light emphasizing unit 95. Then, before the alpha blending processing is performed by the high fidelity display region blending processing unit 92, the catch light emphasizing unit 95 emphasizes in advance the catch light portion reflected in the pupil in the interference-removed high fidelity image, using the analysis information and the gaze direction information.
  • FIG. 59 is a block diagram illustrating a configuration example of the catch light emphasizing unit 95 in FIG. 58 .
  • the catch light emphasizing unit 95 includes a pupil region detection unit 171 , a catch light saliency determination unit 172 , and a catch light emphasizing processing unit 173 .
  • processing performed in the catch light emphasizing unit 95 will be described with reference to FIGS. 60, 61A, and 61B .
  • the pupil region detection unit 171 outputs, as pupil region information, a rectangular region obtained by connecting four characteristic points close to the boundary of the pupil (pupil and iris), as illustrated in FIG. 60, from the eye part of the face in the analysis information supplied from the object viewpoint information setting unit 31 J.
  • the catch light saliency determination unit 172 determines whether or not the catch light is noticeable in the rectangular region indicated by the pupil region information supplied from the pupil region detection unit 171, with respect to the interference-removed high fidelity image supplied from the interference signal removal unit 94.
  • for example, the catch light saliency determination unit 172 first obtains an occurrence probability distribution (histogram) of luminance for the luminance signal of the rectangular region indicated by the pupil region information, as illustrated in FIGS. 61A and 61B. FIG. 61A illustrates an example of the occurrence probability distribution when brighter catch light has occurred, and FIG. 61B illustrates an example of the occurrence probability distribution when darker catch light has occurred.
  • when determining from the occurrence probability distribution that the catch light is noticeable, the catch light saliency determination unit 172 sets the catch light saliency CLS to 1.0, for example. Then, the catch light saliency determination unit 172 supplies the catch light saliency CLS obtained as described above to the catch light emphasizing processing unit 173.
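  • One plausible realization of this determination (the luminance threshold, the area thresholds, and the linear mapping are illustrative assumptions, not values from the embodiment) is to measure how much probability mass the histogram places at high luminance:

```python
import numpy as np

def catch_light_saliency(pupil_luma: np.ndarray,
                         bright_level: int = 200,
                         area_lo: float = 0.01,
                         area_hi: float = 0.15) -> float:
    """Catch light saliency CLS in [0.0, 1.0] for a pupil-region luminance patch.

    A noticeable catch light appears as a small mass of pixels much brighter
    than the rest of the pupil (cf. FIGS. 61A and 61B); the fraction of such
    pixels is mapped linearly to CLS between the two area thresholds.
    """
    hist, _ = np.histogram(pupil_luma, bins=256, range=(0, 256))
    prob = hist.astype(np.float64) / max(pupil_luma.size, 1)  # occurrence probability
    bright_fraction = prob[bright_level:].sum()
    if bright_fraction <= area_lo:
        return 0.0                   # no visible highlight in the pupil
    if bright_fraction >= area_hi:
        return 1.0                   # catch light is clearly noticeable
    return float((bright_fraction - area_lo) / (area_hi - area_lo))
```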
  • the catch light saliency CLS and the gaze direction information are supplied to the catch light emphasizing processing unit 173 .
  • the catch light emphasizing processing unit 173 transforms the catch light saliency CLS into a catch light unclear degree Clr with the characteristic illustrated in FIG. 62A .
  • the catch light emphasizing processing unit 173 obtains magnitude of deviation of the gaze direction from the front as a front gaze error GE from the gaze direction information, and transforms the front gaze error GE into a front gaze degree Fgr with the characteristic illustrated in FIG. 62B .
  • the catch light emphasizing processing unit 173 performs emphasizing processing for the interference-removed high fidelity image, using the catch light unclear degree Clr and the front gaze degree Fgr.
  • the emphasizing processing is intensified as the gaze direction is closer to the front and as the catch light is more unclear, to make the catch light more noticeable.
  • an unsharp mask, super resolution, contrast enhancement, color enhancement, or the like can be used as the emphasizing processing by the catch light emphasizing processing unit 173 .
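  • For instance, assuming the unsharp mask variant mentioned above, the emphasizing step might look like the following sketch; the blur radius, the base gain, and the multiplicative combination of Clr and Fgr are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def emphasize_catch_light(pupil_region: np.ndarray, clr: float, fgr: float,
                          sigma: float = 1.5, base_gain: float = 1.0) -> np.ndarray:
    """Unsharp-mask the pupil region with a gain scaled by the catch light
    unclear degree Clr and the front gaze degree Fgr, so the emphasis is
    strongest for an unclear catch light seen from near the front."""
    region = pupil_region.astype(np.float32)
    gain = base_gain * clr * fgr                    # both degrees lie in [0.0, 1.0]
    blurred = gaussian_filter(region, sigma)
    sharpened = region + gain * (region - blurred)  # unsharp mask
    return np.clip(sharpened, 0.0, 255.0).astype(pupil_region.dtype)
```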
  • the image processing unit 24 J configured as described above can enhance the gaze coincidence by the clue of the catch light even in a poor illumination environment.
  • an eleventh configuration example of the image processing unit 24 will be described with reference to FIGS. 63, 64, 65, 66A, and 66B.
  • FIG. 63 is a block diagram illustrating the eleventh configuration example of the image processing unit 24. Note that, in an image processing unit 24 K illustrated in FIG. 63, configurations common to the image processing unit 24 in FIG. 2 are denoted with the same reference numerals and detailed description of the configurations is omitted.
  • the image processing unit 24 K has a configuration common to the image processing unit 24 in FIG. 2 in including the object viewpoint information setting unit 31 , the high fidelity display region setting unit 32 , the high fidelity image generation unit 33 , the low fidelity image generation unit 34 , the pseudo gaze coincidence image generation unit 35 , the encoding unit 36 , the transmission unit 37 , the reception unit 38 , and the decoding unit 39 .
  • on the other hand, the image processing unit 24 K includes a pseudo gaze coincidence image display unit 40 K.
  • similarly to the image processing unit 24 H described with reference to FIG. 49, the image processing unit 24 K displays the size of the face and the positions of the eyes of the other party's user on the display device 22 so that the gazes more easily coincide with each other, by a correction based on the Wollaston illusion that takes into consideration the difference in processing characteristics, which depends on the portion of the face, of using a plurality of captured images, with a display equivalent to the actual size as a reference.
  • however, the image processing unit 24 K is configured such that the virtual capture position information is not transmitted as additional information, and the difference in processing characteristics, which depends on the portion of the face, of using a plurality of captured images is instead detected by the pseudo gaze coincidence image display unit 40 K.
  • FIG. 64 is a block diagram illustrating a configuration example of the pseudo gaze coincidence image display unit 40 K in FIG. 63 .
  • the decoded object viewpoint information and the decoded pseudo gaze coincidence image output from the decoding unit 39 are supplied to the pseudo gaze coincidence image display unit 40 K as illustrated in FIG. 64 , similarly to the pseudo gaze coincidence image display unit 40 in FIG. 19 .
  • the pseudo gaze coincidence image display unit 40 K includes a gaze coincidence promotion display geometric correction processing unit 122 K, similarly to the pseudo gaze coincidence image display unit 40 H in FIG. 52 .
  • the pseudo gaze coincidence image display unit 40 K includes a gaze coincidence promotion display geometric correction parameter estimation unit 121 K, a face part detection unit 51 K, a gaze direction detection unit 132 , and a face orientation detection unit 133 .
  • the face part detection unit 51 K obtains coordinates indicating the characteristic points of the parts of the face included in the decoded pseudo gaze coincidence image, similarly to the face part detection unit 51 included in the object viewpoint information setting unit 31 illustrated in FIG. 3 .
  • the gaze direction detection unit 132 detects the gaze direction of both eyes from the analysis information detected by the preceding face part detection unit 51 K and the decoded pseudo gaze coincidence image, similarly to the above-described gaze direction detection unit 56 in FIG. 57.
  • the face orientation detection unit 133 detects the face orientation in the decoded pseudo gaze coincidence image, using the analysis information other than that of the eyes detected by the preceding face part detection unit 51 K.
  • for example, the face orientation detection unit 133 detects the face orientation using the technology disclosed in the non-patent document "OpenFace: an open source facial behavior analysis toolkit" by Tadas Baltrušaitis, et al., IEEE Winter Conference on Applications of Computer Vision, 2016, or the like.
  • the detected directions are expressed as angles of vectors (roll, pitch, yaw) on three-dimensional space axes as illustrated in FIG. 65 .
  • the face orientation includes head orientation.
  • the gaze direction information indicating the gaze direction of both eyes detected by the gaze direction detection unit 132 and face orientation information indicating the face orientation detected by the face orientation detection unit 133 are supplied to the gaze coincidence promotion display geometric correction parameter estimation unit 121 K. Then, the gaze coincidence promotion display geometric correction parameter estimation unit 121 K estimates the gaze coincidence promotion display geometric correction parameter on the basis of an error between the gaze direction of both eyes and the face orientation, as illustrated in FIG. 65 .
  • the gaze coincidence promotion display geometric correction parameter estimation unit 121 K estimates the gaze coincidence promotion display geometric correction parameter, reflecting the correction amount determined by the above determination method, and supplies the gaze coincidence promotion display geometric correction parameter to the gaze coincidence promotion display geometric correction processing unit 122 K.
  • the gaze coincidence promotion display geometric correction processing unit 122 K performs similar operation to the gaze coincidence promotion display geometric correction processing unit 122 H described with reference to FIG. 52 .
  • that is, the gaze coincidence promotion display geometric correction processing unit 122 K generates the pseudo gaze coincidence display image to be displayed on the display device 22, corrected such that the gazes more easily coincide with each other on the basis of the correction amount based on the above-described angle error, from the state where the size and the position (height of the eyes) for the life size of the other party's user are set as the reference.
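  • As an illustrative sketch of deriving such a correction amount from the angle error of FIG. 65 (the proportionality constant and the choice to ignore roll are assumptions, not part of the embodiment):

```python
import numpy as np

def display_offset_from_angle_error(gaze_rpy, face_rpy,
                                    px_per_radian: float = 400.0):
    """Display-position correction (dx, dy) in pixels from the error between
    the gaze direction and the face orientation, each given as
    (roll, pitch, yaw) angles in radians as in FIG. 65."""
    error = np.asarray(gaze_rpy, dtype=float) - np.asarray(face_rpy, dtype=float)
    _, pitch_err, yaw_err = error   # roll does not shift the perceived gaze
    dx = px_per_radian * yaw_err    # horizontal correction amount
    dy = px_per_radian * pitch_err  # vertical correction amount
    return dx, dy
```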
  • the blocks included in the image processing unit 24 K are configured as described above, and the display position is corrected with respect to the actual size display, similarly to the image processing unit 24 H in FIG. 49, whereby a video communication experience in which the gazes easily coincide with the other party's user can be provided without increasing the additional information.
  • note that each processing described with reference to the above-described flowcharts does not necessarily need to be processed chronologically in the order described in the flowcharts, and includes processing executed in parallel or individually (for example, parallel processing or processing by objects).
  • the program may be processed by a single CPU or may be processed in a distributed manner by a plurality of CPUs.
  • the above-described series of processing can be executed by hardware or software.
  • a program constituting the software is installed from a program recording medium in which the program is recorded into a computer incorporated in dedicated hardware, a general-purpose personal computer capable of executing various functions by installing various programs, or the like.
  • FIG. 67 is a block diagram illustrating a configuration example of hardware of a computer that executes the above-described series of processing by a program.
  • in the computer, a central processing unit (CPU) 201, a read only memory (ROM) 202, and a random access memory (RAM) 203 are mutually connected by a bus 204.
  • an input/output interface 205 is connected to the bus 204 .
  • An input unit 206 including a keyboard, a mouse, a microphone, and the like, an output unit 207 including a display, a speaker, and the like, a storage unit 208 including a hard disk, a nonvolatile memory, and the like, a communication unit 209 including a network interface and the like, and a drive 210 for driving a removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory are connected to the input/output interface 205.
  • the CPU 201, for example, loads a program stored in the storage unit 208 into the RAM 203 via the input/output interface 205 and the bus 204 and executes the program, thereby performing the above-described series of processing.
  • the program to be executed by the computer (CPU 201 ) is provided by being recorded on the removable medium 211 that is a package medium including a magnetic disk (including a flexible disk), an optical disk (compact disc-read only memory (CD-ROM), a digital versatile disc (DVD), or the like), a magneto-optical disk, or a semiconductor memory, or by being provided via a wired or wireless transmission medium, such as a local area network, the Internet, or digital broadcasting.
  • the program can be installed to the storage unit 208 via the input/output interface 205 by attaching the removable medium 211 to the drive 210 . Furthermore, the program can be received by the communication unit 209 via a wired or wireless transmission medium and installed in the storage unit 208 . Other than the above method, the program can be installed in the ROM 202 or the storage unit 208 in advance.
  • An image processing device including:
  • a high fidelity display region setting unit configured to set a predetermined region including at least an eye region in which an eye of a first user is captured in an image in which the first user is captured, as a high fidelity display region;
  • a high fidelity image generation unit configured to perform first image generation processing using at least a part of a plurality of captured images having the first user respectively captured by a plurality of capture devices arranged outside a display device, and generate a high fidelity image in which the first user looks captured from a virtual capture position that is obtained by setting a viewpoint position of a second user displayed on the display device as the virtual capture position, and the high fidelity image having an appearance with higher fidelity;
  • a low fidelity image generation unit configured to perform second image generation processing using at least a part of the plurality of captured images in each of which the first user is captured, and generate a low fidelity image in which the first user looks captured from the virtual capture position and having lower fidelity than the high fidelity image;
  • an image superimposing unit configured to superimpose the high fidelity image on the high fidelity display region in the low fidelity image to generate an output image to be output as an image processing result.
  • the image processing device further including
  • a display image generation unit configured to generate a display image for displaying the second user with a specific size at a specific position on the display device from the output image in which the second user is captured according to the viewpoint position of the second user in a three-dimensional space.
  • the display image generation unit performs geometric correction using a parameter based on the viewpoint position of the second user in the three-dimensional space, a resolution of the output image in which the second user is captured, and a resolution and a size of the display device, and generates the display image such that the second user is displayed with a substantially same size as the real second user.
  • the image processing device according to (2) or (3), further including
  • an object viewpoint information setting unit configured to analyze a face of the first user captured in the plurality of captured images having the first user as an object and obtain a coordinate indicating a characteristic point of each part of the face on each of the captured images, and acquire object viewpoint information indicating a viewpoint position of the first user on a basis of the coordinates, in which
  • the object viewpoint information is used when generating the display image from the output image in which the first user is captured on a side of the second user who becomes a party to perform telecommunication with the first user.
  • the high fidelity display region setting unit generates a mask image for specifying the high fidelity display region, using analysis information including the coordinates of the characteristic point obtained by the object viewpoint information setting unit.
  • the high fidelity image generation unit crops a portion corresponding to the high fidelity display region from the plurality of captured images in which the first user is captured, and performs viewpoint interpolation processing according to the virtual capture position, for a plurality of cropped images to generate the high fidelity image.
  • the low fidelity image generation unit is configured to apply projective transformation processing using a projective transformation parameter estimated such that a captured image becomes close to an image as viewed from the virtual capture position according to the viewpoint position of the second user to the captured image in which the first user is captured to generate the low fidelity image.
  • the low fidelity image generation unit applies the projective transformation processing using the projective transformation parameter to a mask image for specifying the high fidelity display region to perform correction to reflect an influence of projective transformation to the low fidelity image.
  • the image superimposing unit generates a blend map image obtained by applying filter processing to a mask image for specifying the high fidelity display region and performs alpha blending processing for blending the high fidelity image and the low fidelity image according to a blend ratio set for the blend map image to generate the output image.
  • the image processing device further including:
  • an encoding unit configured to encode the object viewpoint information indicating the viewpoint position of the first user and the output image in which the first user is captured to generate a coded stream
  • a transmission unit configured to output the coded stream as a transmission stream to be transmitted via a network.
  • the image processing device according to any one of (2) to (10), further including:
  • a reception unit configured to receive a transmission stream obtained by encoding object viewpoint information indicating the viewpoint position of the second user and the output image in which the second user is captured and transmitted via a network, and restore the transmission stream to a coded stream;
  • a decoding unit configured to decode the coded stream and supply the object viewpoint information indicating the viewpoint position of the second user and the output image in which the second user is captured to the display image generation unit.
  • the first user and the second user are a same person
  • the high fidelity image generation unit generates the high fidelity image, using a viewpoint position of the person itself as a virtual capture position
  • the low fidelity image generation unit generates the low fidelity image, using a viewpoint position of the person itself as a virtual capture position
  • the image processing device further includes a mirror image display processing unit configured to perform mirror image display processing of horizontally reversing the output image.
  • the high fidelity image generation unit and the low fidelity image generation unit respectively generate the high fidelity image and the low fidelity image on a basis of fixed viewpoint information
  • the image processing device further includes a high fidelity display information setting unit configured to output a representative position and an area of the mask image as high fidelity display information.
  • a capture control unit configured to control a plurality of the capture devices to zoom and capture the first user as the captured image to be used by the high fidelity image generation unit to generate the high fidelity image, and to capture the first user at a wide angle as the captured image to be used by the low fidelity image generation unit to generate the low fidelity image.
  • the high fidelity display region setting unit sets the high fidelity display region, avoiding an existing portion of a rim of the pair of glasses from a face captured in the image.
  • the image superimposing unit generates a blend ratio map image in which the blend ratio of the high fidelity image becomes higher as similarity is higher on a basis of the similarity between the high fidelity image and the low fidelity image, and blends the high fidelity image and the low fidelity image according to the blend ratio map image.
  • an animation rendering unit configured to generate a computer graphic (CG) avatar image on a basis of a parameter generated from an image in which the second user is captured, as the low fidelity image.
  • the image superimposing unit includes a removal unit that removes an element that interferes with gaze coincidence from the high fidelity image according to an error amount between the high fidelity image and the low fidelity image in a region near eyes of the first user before blending the high fidelity image and the low fidelity image according to the blend ratio map image.
  • an object viewpoint information setting unit including a gaze direction detection unit that detects a gaze direction of the first user on a basis of at least one piece of the captured image capturing the first user as an object, and analysis information including a coordinate indicating a characteristic point of each part of a face of the first user, in which
  • the image superimposing unit includes a catch light emphasizing unit that emphasizes a catch light of a pupil region of the high fidelity image using the analysis information and the gaze direction in advance before blending the high fidelity image and the low fidelity image according to the blend ratio map image.
  • the display image generation unit displays the display image generated such that the second user is displayed with a substantially same size as the real second user at a display position according to a correction amount for correcting a deviation in a direction in which the first user perceives a gaze of the second user, on a basis of a viewpoint interpolation position set when generating the high fidelity image.
  • the display image generation unit displays the display image at a display position according to a correction amount for correcting a deviation in a direction in which the first user perceives a gaze of the second user, on a basis of an error of an angle made by a three-dimensional vector indicating a gaze direction of the second user in the output image and a three-dimensional vector indicating a face orientation of the second user in the output image when generating the display image in which the second user is displayed with a substantially same size as the real second user.
  • An image processing method including the steps of:
  • setting a predetermined region including at least an eye region in which an eye of a first user is captured in an image in which the first user is captured, as a high fidelity display region;
  • a program for causing a computer to execute image processing including the steps of:
  • setting a predetermined region including at least an eye region in which an eye of a first user is captured in an image in which the first user is captured, as a high fidelity display region;
  • a telecommunication system configured to have a first user-side telecommunication apparatus and a second user-side telecommunication apparatus connected via a network
  • the first user-side telecommunication apparatus including a first image processing device including at least
  • the second user-side telecommunication apparatus including a second image processing device including at least

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Ophthalmology & Optometry (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)
US16/609,043 2017-06-07 2018-05-24 Image processing device, image processing method, and telecommunication system to generate an output image for telecommunication Active US11068699B2 (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
JP2017112488 2017-06-07
JP2017-112488 2017-06-07
JP2018003139 2018-01-12
JP2018-003139 2018-01-12
PCT/JP2018/019953 WO2018225518A1 (ja) 2017-06-07 2018-05-24 Image processing device, image processing method, program, and telecommunication system

Publications (2)

Publication Number Publication Date
US20200151427A1 US20200151427A1 (en) 2020-05-14
US11068699B2 (en) 2021-07-20

Family

ID=64567049

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/609,043 Active US11068699B2 (en) 2017-06-07 2018-05-24 Image processing device, image processing method, and telecommunication system to generate an output image for telecommunication

Country Status (2)

Country Link
US (1) US11068699B2 (ja)
WO (1) WO2018225518A1 (ja)



Also Published As

Publication number Publication date
US20200151427A1 (en) 2020-05-14
WO2018225518A1 (ja) 2018-12-13


Legal Events

FEPP (Fee payment procedure): ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
STPP (Information on status: patent application and granting procedure in general): RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP (Information on status: patent application and granting procedure in general): NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS
STPP (Information on status: patent application and granting procedure in general): PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED
STPP (Information on status: patent application and granting procedure in general): PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED
STCF (Information on status: patent grant): PATENTED CASE