JP7485221B2

JP7485221B2 - Image processing device, image processing method, and image processing program

Info

Publication number: JP7485221B2
Application number: JP2023526771A
Authority: JP
Inventors: 弘員柿沼; 誉宗巻口; 秀信長田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2021-06-10
Filing date: 2021-06-10
Publication date: 2024-05-16
Anticipated expiration: 2041-06-10
Also published as: JPWO2022259480A1; WO2022259480A1

Description

The embodiments relate to a video processing device, a video processing method, and a video processing program.

When a user watches a concert, a play, or a sporting event through video of a remote venue shown on a television or other display, the user may find it difficult to feel a sense of unity with the venue. One reason is that the user cannot sense the presence of other spectators around them.

In response, Non-Patent Documents 1 and 2 propose techniques that make a user feel the presence of others by displaying the behavior and comments of the audience on the screen.

Non-Patent Document 1: Nathan, Mukesh, et al. "CollaboraTV: making television viewing social again." Proceedings of the 1st International Conference on Designing Interactive User Experiences for TV and Video, 2008.
Non-Patent Document 2: 代蔵巧, 棟方渚, 小野哲雄, & 松原仁. "ExciTV: 他者を感じる動画鑑賞システム" (ExciTV: a video viewing system that lets users sense other people). インタラクション (Interaction), 2011: 433-434.

Another reason users may find it difficult to feel a sense of unity with the venue is that the video is shown on a television or other display with a sense of distance that differs from reality. As a result, even if the audience is presented, the user perceives something as off and finds it hard to feel immersed as if actually at the venue.

The embodiments provide a video processing device, a video processing method, and a video processing program that, when a user watches an event such as a concert or a play, or a sporting event, through video of a remote venue, give the user the sensation of watching it at the actual venue.

The video processing device of the embodiment includes a receiving unit, a distance estimation unit, a video processing unit, and a transmitting unit. The receiving unit receives video. The distance estimation unit estimates the viewing distance on the assumption that a person is seeing a scene of the same extent as the video. The video processing unit processes the video so that, at that viewing distance, a reference object shown in the video appears at the reference object's real size. The transmitting unit transmits the processed video to a display.

According to the embodiments, there are provided a video processing device, a video processing method, and a video processing program that, when a user watches an event such as a concert or a play, or a sporting event, through video of a remote venue, give the user the sensation of watching it at the actual venue.

FIG. 1 is a diagram showing a schematic configuration of a video distribution system including a video processing device according to an embodiment.
FIG. 2 is a diagram illustrating an example of a hardware configuration of the video processing device.
FIG. 3 is a functional block diagram of the video processing device.
FIG. 4 is a flowchart showing the operation of the video processing device.
FIG. 5 is a diagram showing the relationship between the height of a performer and the vertical shooting range of a camera.
FIG. 6 is a diagram showing the distance between a user and a performer on the assumption that the user is watching the performer at the venue.
FIG. 7A is a diagram showing a schematic configuration of a video distribution system in which a see-through wearable display is used.
FIG. 7B is a diagram showing a schematic configuration of a video distribution system in which a see-through wearable display is used.

An embodiment will now be described with reference to the drawings. FIG. 1 is a diagram showing a schematic configuration of a video distribution system including a video processing device according to the embodiment. The video distribution system 1 includes a camera 10, a video processing device 20, and a display 30.

The camera 10 is installed, for example, in the audience seating AS of a venue where an event such as a concert or a play is held. Specifically, the audience seating AS is arranged to face, for example, a stage S, and space for the camera 10 is left free within part of the audience seating AS. The camera 10 may be movable or fixed in position. The camera 10 captures a performer P at a predetermined frame rate and generates video data of the performer P. The video data may be resized according to, for example, the video size the display 30 can show: if the display 30 shows full HD (High Definition) video, the video data may be resized to 1920 × 1080 pixels. The camera 10 is connected so as to communicate with the video processing device 20, and the video data captured by the camera 10 is transmitted to the video processing device 20.

Here, "performer P" in the embodiment is a general term for a person who performs some activity at the event, such as a musician at a music event or an actor at a theater event. In the embodiment, the performer P is not limited to a person engaged in a particular form of artistic expression.

In FIG. 1, a single camera 10 is shown. However, the number of cameras 10 is not limited to one; for example, cameras 10 may be installed at multiple positions within the venue.

The video processing device 20 processes the video transmitted from the camera 10. Through the display 30, the user U perceives a virtual display surface placed in a three-dimensional space. For example, the video processing device 20 processes the video so that the distance from the user U to this virtual display surface, and the height of the performer P on it, equal the distance and height the user would perceive if actually watching the performer P at the venue. The video processing device 20 then transmits the processed video to the display 30. The video processing device 20 may be installed inside or outside the venue. Inside the venue, it may be incorporated into the camera 10; outside the venue, it may be incorporated into the display 30. Of course, the video processing device 20 may also be separate from both the camera 10 and the display 30. Details of the video processing device 20 are described later.

The display 30 is configured to communicate with the video processing device 20 and displays the video transmitted from it. The display 30 is, for example, a non-see-through, eyeglass-type wearable display worn on the head of a user U who is at a location remote from the venue. The display 30 is capable of three-dimensional display. For example, the display 30 has a display unit in front of each eye and shows on each display unit a video in which the video of the camera 10 sent from the video processing device 20 has been combined with an avatar video of the audience. This makes the user U perceive a virtual image Pi of the performer P and virtual images Ai of the audience avatars at a predetermined viewing distance from the user U. Here, an avatar is an image representing an audience member, for example a two-dimensional or three-dimensional illustration. The avatar may also be a computer-graphics (CG) image created from live-action footage. The configuration the display 30 uses for three-dimensional display is not limited to any particular one. The user U may be at any location remote from the venue, such as the user U's home or a public viewing venue set up separately from the venue.

In addition to the camera 10, a camera 40 and a large service monitor 50 may be installed at the venue. The camera 40 is connected so as to communicate with the large service monitor 50. The camera 40 is placed, for example, on the stage S and captures the performer P on the stage S. The number of cameras 40 is not limited to one; for example, a camera 40 may also be installed in the audience seating AS. The large service monitor 50 is installed, for example, on the stage S facing the audience seating AS and shows the video captured by the camera 40 on a large screen.

In the video distribution system 1 shown in FIG. 1, for example, a spectator A1 seated in the audience seating AS close to the stage S and a spectator A2 seated in the audience seating AS facing the performer P can see the performer P on the stage S with the naked eye. However, the spectator A1 near the stage S and the spectator A2 far from the stage S perceive different views: the performer P appears smaller in the view v2 perceived by the spectator A2 than in the view v1 perceived by the spectator A1. A spectator A3 seated in the audience seating AS far from the stage S watches the performer P through the video on the large service monitor 50 and therefore perceives a view v3 that differs from those of the spectators A1 and A2. Thus, even at the same venue, people perceive different views of the performer P depending on where they watch from. This is one of the reasons the user U feels that something is off when viewing the video captured by the camera 10 at a remote location.

FIG. 2 is a diagram showing an example of the hardware configuration of the video processing device 20. The video processing device 20 may be configured as a computer; it need not be a single computer and may be made up of multiple computers. As shown in FIG. 2, the video processing device 20 has a processor 201, a read-only memory (ROM) 202, a random access memory (RAM) 203, a storage 204, an input device 205, and a communication module 206. The video processing device 20 may further have a display or the like.

The processor 201 is a processing circuit capable of executing various programs and controls the overall operation of the video processing device 20. The processor 201 may be a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit). The processor 201 may also be an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or the like. Furthermore, the processor 201 may consist of a single CPU or the like, or of multiple CPUs or the like.

The ROM 202 is a non-volatile semiconductor memory that holds programs, control data, and the like for controlling the video processing device 20.

The RAM 203 is, for example, a volatile semiconductor memory and is used as a work area for the processor 201.

The storage 204 is a non-volatile storage device such as a hard disk drive (HDD) or a solid-state drive (SSD). The storage 204 holds a program 2041, variables 2042, and audience data 2043.

The program 2041 is a program for processing the video of the camera 10. It causes the processor 201 to execute a process of estimating the shooting range from the video of the camera 10, a process of estimating a virtual viewing distance on the assumption that a human views a scene of the same extent as the estimated shooting range, and a process of processing the video based on the estimated viewing distance.

The variables 2042 are various variables used in processing the video. In the embodiment, the variables 2042 include the length of a reference object, the vertical size of an image sensor, a focal length, and the number of vertical pixels of the video.

The length of the reference object is the real length of the reference object used to estimate the shooting range. The reference object may be any object of known length that can appear in the video captured by the camera 10. For example, the performer P may serve as the reference object, in which case the length of the reference object may be the height of the performer P. The height of the performer P may be entered into the video processing device 20 by, for example, the event organizer before the event is held.

The vertical size of the image sensor is the vertical size obtained when the retina of the human eye is regarded as the image sensor of a camera. For example, if the retina is regarded as equivalent to a full-size image sensor, the vertical size of the image sensor is 24 mm. If the retina is regarded as equivalent to an APS-C size image sensor, the vertical size of the image sensor is 16.7 mm.

The focal length is the focal length obtained when the crystalline lens of the human eye is regarded as a camera lens. It is generally said that, if the human eye is regarded as a camera consisting of a lens and a full-size image sensor, its focal length is equivalent to 10 to 12 mm. In practice, however, humans do not process all the light entering the eye from a field of view equivalent to a 10 to 12 mm focal length; they process only the light entering from part of that field, and this part is said to correspond to a focal length of about 50 mm. In the embodiment, the storage 204 holds 50 mm as the focal length. If the human eye is instead regarded as a camera consisting of a lens and an APS-C size image sensor, the focal length is about 35 mm, and in that case the storage 204 holds 35 mm as the focal length.
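Equation 2 in the distance estimation process uses the sensor size S and the focal length f only through their ratio S/f, so the two eye models above should yield almost the same viewing distance. The quick arithmetic check below uses only the values stated in this description; the dictionary and function names are illustrative.

```python
# Eye models described above: (vertical sensor size S in mm, focal length f in mm)
EYE_MODELS = {
    "full_size": (24.0, 50.0),
    "aps_c": (16.7, 35.0),
}


def sensor_to_focal_ratio(sensor_mm: float, focal_mm: float) -> float:
    """Return S/f, the only combination of S and f that Equation 2 depends on."""
    return sensor_mm / focal_mm


for name, (s, f) in EYE_MODELS.items():
    # Prints 0.4800 for the full-size model and 0.4771 for the APS-C model
    print(f"{name}: S/f = {sensor_to_focal_ratio(s, f):.4f}")
```

The two ratios differ by less than 1 %, so the choice between the full-size and APS-C models has little effect on the estimated viewing distance D.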

The number of vertical pixels of the video is the number of vertical pixels after the video has been resized in the camera 10. For example, if the video has been resized to full HD (High Definition), the number of vertical pixels is 1080.

The storage 204 also holds the audience data 2043. The audience data 2043 includes video data representing the audience, for example video data of audience avatars. The video data representing the audience may also be video of actual audience members filmed in advance. Furthermore, the audience data 2043 may include, as metadata, biometric information such as the heart rate, movement state, and emotions of audience members watching the event at the venue. Such metadata is collected continuously at the venue while the event is in progress and transmitted to the video processing device 20.

The input device 205 is an interface device through which an administrator of the video processing device 20 operates it. The input device 205 may include, for example, a touch panel, a keyboard, a mouse, various operation buttons, and various operation switches. The input device 205 may be used, for example, to enter the variables 2042.

The communication module 206 is a module including circuitry used for communication between the video processing device 20 and other devices. The communication module 206 may be, for example, a module conforming to a wired LAN standard, or one conforming to a wireless LAN standard.

FIG. 3 is a functional block diagram of the video processing device 20. As shown in FIG. 3, the video processing device 20 has a receiving unit 2011, a distance estimation unit 2012, a video processing unit 2013, a three-dimensional processing unit 2014, and a transmitting unit 2015. By executing the program 2041, the processor 201 of the video processing device 20 can operate as these five units. The receiving unit 2011, the distance estimation unit 2012, the video processing unit 2013, the three-dimensional processing unit 2014, and the transmitting unit 2015 may also be implemented by hardware other than the processor 201.

The receiving unit 2011 acquires the video received from the camera 10 via the communication module 206 and splits the acquired video into individual frames. The receiving unit 2011 then passes the frames one by one to the distance estimation unit 2012 and the video processing unit 2013. For example, when the frame rate of the video is 60 fps (frames per second), the receiving unit 2011 splits each second of video into 60 frames.

From the video passed on by the receiving unit 2011, the distance estimation unit 2012 estimates the virtual viewing distance between the user U and the performer P on the assumption that the user U is seeing, at the venue, a view of the same extent as the video captured by the camera 10. The operation of the distance estimation unit 2012 is described in detail later.

The video processing unit 2013 processes the video captured by the camera 10 based on the viewing distance estimated by the distance estimation unit 2012 and the height of the performer P. For example, the video processing unit 2013 enlarges or reduces the region of the performer P in the video so that the height of the performer P in the three-dimensional space perceived by the user U through the three-dimensional display matches the real height of the performer P. The operation of the video processing unit 2013 is described in detail later.

The three-dimensional processing unit 2014 performs rendering for three-dimensional display on the display 30. For example, the three-dimensional processing unit 2014 places the video plane of the performer P at the viewing distance estimated by the distance estimation unit 2012 in the three-dimensional space perceived by the user U. Based on that viewing distance, it also places video representing the audience in front of the video plane of the performer P. It then renders the three-dimensional video data, taking into account, for example, reflections from a virtual light source in the three-dimensional space. Next, it obtains the right-eye video and the left-eye video that would result from capturing the three-dimensional space, including the video plane of the performer P and the video plane of the audience, with a virtual stereo camera at the position of the user U. Finally, it passes the three-dimensional video data including the right-eye and left-eye video to the transmitting unit 2015. The operation of the three-dimensional processing unit 2014 is described in detail later.
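The virtual stereo camera can be illustrated with a simple pinhole model. The sketch below is an assumption, not the renderer used by the three-dimensional processing unit 2014: it places two pinhole cameras at the user U's position, separated by a hypothetical interpupillary distance of 64 mm, and computes how far apart a point on the performer's video plane lands in the left-eye and right-eye images. The disparity shrinks as the distance grows, which is the depth cue that lets the user perceive the plane at the intended distance.

```python
IPD_M = 0.064  # hypothetical interpupillary distance in metres (not specified in this description)


def project_x(point_x: float, depth: float, eye_x: float, focal: float) -> float:
    """Pinhole projection of a horizontal coordinate for a camera at eye_x looking along +z."""
    return focal * (point_x - eye_x) / depth


def stereo_disparity(depth: float, focal: float, ipd: float = IPD_M) -> float:
    """Left-image x minus right-image x for a point straight ahead at the given depth.

    The result is in the same unit as focal (equal to focal * ipd / depth).
    """
    left = project_x(0.0, depth, -ipd / 2, focal)   # left eye sits at -ipd/2
    right = project_x(0.0, depth, +ipd / 2, focal)  # right eye sits at +ipd/2
    return left - right
```

For example, `stereo_disparity(5.98, 0.05)` gives the disparity on a 50 mm model eye for a plane at the 5.98 m viewing distance estimated later in this description; a nearer plane yields a larger value.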

The transmitting unit 2015 transmits the three-dimensional video data received from the three-dimensional processing unit 2014 to the display 30 via the communication module 206.

Next, the operation of the video distribution system 1 in the embodiment is described. FIG. 4 is a flowchart showing the operation of the video processing device 20. The process of FIG. 4 is executed by the processor 201 at regular intervals, for example from the start to the end of the event. The interval is, for example, the interval at which video data is transmitted from the camera 10.

In step S1, the processor 201 obtains the variables 2042 from the storage 204. As described above, the variables 2042 include the length of the reference object, the focal length, the vertical size of the image sensor, and the number of vertical pixels of the video. In the following example, the length of the reference object is the height of the performer P.

In step S2, the processor 201 obtains video data, for example video data transmitted from the camera 10 and accumulated in the RAM 203, and splits the video into frames. The processor 201 also obtains the audience data 2043 from the storage 204. In the following example, the video representing the audience in the audience data 2043 is an avatar video of the audience.

In step S3, the processor 201 executes the distance estimation process described below. For this description, the variables 2042 are assumed to take, for example, the following values:
Performer P's height Hp = 1.7 (m)
Focal length f = 50 (mm)
Image sensor vertical size S = 24 (mm)
Number of vertical pixels of the video h = 1080 (pixels)

In the distance estimation process, the processor 201 first performs object detection on each frame of the video. Object detection in the embodiment is a process of detecting an object of known length in the video; here, the processor 201 detects the performer P. Object detection may be based on an object detection algorithm such as Mask R-CNN or YOLO. In the following, it is assumed that object detection estimates the number of vertical pixels hp of the region occupied by the performer P in the video as 640 (pixels).
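The description does not say how hp is read off the detector output. One plausible sketch, assuming a segmentation detector such as Mask R-CNN has already produced a per-pixel boolean mask for the performer P, takes hp as the number of rows between the topmost and bottommost foreground pixels. The function name and mask format are hypothetical; with a bounding-box detector such as YOLO, hp would simply be the box height.

```python
from typing import Sequence


def vertical_extent_px(mask: Sequence[Sequence[bool]]) -> int:
    """Return the height in pixels of the foreground region in a boolean mask.

    Rows are indexed from top to bottom. Returns 0 when no pixel is set.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return 0
    return rows[-1] - rows[0] + 1
```

Applied to a full-HD frame in which the performer's mask spans rows 300 through 939, this sketch would return hp = 640, the value assumed above.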

After object detection, the processor 201 estimates, based on the height Hp of the performer P, the vertical shooting range H (m) of the camera 10 at the position where the performer P stands. FIG. 5 is a diagram showing the relationship between the height Hp of the performer P and the shooting range H. The ratio of the height Hp of the performer P to the shooting range H shown in FIG. 5 equals the ratio of the number of vertical pixels hp of the performer P to the number of vertical pixels h of the video. Therefore, the following relationship (Equation 1) holds.
Hp/H = hp/h (Equation 1)
Therefore, H = h/hp × Hp = 1080/640 × 1.7 ≈ 2.87 (m).
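Equation 1 solved for H can be written as a one-line helper. This is a minimal sketch using the example values above; the function name is illustrative, and the guard against a zero pixel count is an added assumption for the case where detection finds nothing.

```python
def shooting_range_height(frame_px: int, performer_px: int, performer_height_m: float) -> float:
    """Equation 1 solved for H: H = h / hp * Hp, in metres at the performer's position."""
    if performer_px <= 0:
        raise ValueError("performer_px must be positive (performer not detected?)")
    return frame_px / performer_px * performer_height_m


H = shooting_range_height(1080, 640, 1.7)
print(f"H = {H:.2f} m")  # prints H = 2.87 m (exactly 2.86875 before rounding)
```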

The shooting range H is the range captured by the camera 10, which has some focal length and is placed at some distance from the performer P. Therefore, as shown in FIG. 6, if the eye of the user U is regarded as a camera having a lens of focal length f and an image sensor of vertical size S, the viewing distance D at which the user U would see a view of the same extent as the shooting range H is calculated from the following (Equation 2).
S/f = H/D (Equation 2)
Therefore, D = H × f / S = 2.87 × 50 / 24 ≈ 5.98 (m). This completes the distance estimation process, and the processor 201 proceeds to step S4.
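Equation 2 solved for D is equally short. The sketch below chains it with Equation 1 using the example values from step S3; because f and S enter only as a ratio, any common unit (here millimetres) cancels out. The function name is illustrative.

```python
def viewing_distance(shooting_range_m: float, focal_mm: float, sensor_mm: float) -> float:
    """Equation 2 solved for D: D = H * f / S (f and S share a unit, so it cancels)."""
    return shooting_range_m * focal_mm / sensor_mm


H = 1080 / 640 * 1.7                 # shooting range from Equation 1, metres
D = viewing_distance(H, 50.0, 24.0)  # model eye: f = 50 mm, S = 24 mm
print(f"D = {D:.2f} m")              # prints D = 5.98 m
```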

Returning to FIG. 4, in step S4 the processor 201 processes the video. That is, the processor 201 enlarges or reduces the video so that the height of the performer P on the virtual display surface perceived by the user U through the three-dimensional display equals the real height Hp.
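Step S4 does not state how the enlargement ratio is computed. One consistent reading, sketched here as an assumption, gives each pixel of the frame a physical size of Hp/hp metres on the virtual display surface at distance D. The performer then spans exactly Hp, and the whole frame spans h × Hp/hp, which equals the shooting range H of Equation 1. Because S/f = H/D, that plane also subtends the same vertical angle as the model eye. The function names are hypothetical.

```python
import math


def plane_height_m(frame_px: int, performer_px: int, performer_height_m: float) -> float:
    """Physical height of the virtual display plane that makes the performer life-size."""
    metres_per_px = performer_height_m / performer_px
    return frame_px * metres_per_px


def vertical_angle_deg(height_m: float, distance_m: float) -> float:
    """Full vertical angle subtended by a plane of the given height at the given distance."""
    return math.degrees(2 * math.atan(height_m / (2 * distance_m)))


plane_h = plane_height_m(1080, 640, 1.7)  # 2.86875 m, the same value as H
D = plane_h * 50.0 / 24.0                 # Equation 2 with f = 50 mm, S = 24 mm
eye_angle = math.degrees(2 * math.atan(24.0 / (2 * 50.0)))
print(round(vertical_angle_deg(plane_h, D), 3), round(eye_angle, 3))  # the two angles agree
```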

In step S5, the processor 201 generates the three-dimensional image. Specifically, the processor 201 places the image plane of the performer P at the viewing distance D from the user U, and places the image plane of the audience avatars between the user U and the image plane of the performer P. The processor 201 then obtains a right-eye image and a left-eye image by capturing the three-dimensional space with a virtual stereo camera located at the position of the user U. The number and size of the audience avatars in the avatar image superimposed on the image of the performer P may be determined by the viewing distance D. Specifically, the number of audience avatars decreases as the viewing distance D becomes shorter, and becomes zero when D is equal to or less than a threshold. This reproduces the fact that, at a real venue, the closer a seat is to the stage S, the fewer audience members sit in front of it. The threshold for the viewing distance D can therefore be set to, for example, the distance between the front row of audience seats and the stage S.

In addition, the audience avatars are drawn larger the closer they are to the user U in the three-dimensional space perceived through the three-dimensional display, and smaller the farther away they are. This reproduces the way nearer audience members look larger at a real venue. In step S5, the processor 201 also uses existing gloss reproduction technology to calculate how the performer P and the avatars appear at the viewing distance D, taking into account reflection, scattering, and similar effects from virtual light sources that illuminate the performer P and the avatars respectively. The processor 201 may additionally apply processing such as making the avatars move in accordance with the audience's biometric data.
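
The distance-dependent avatar rules of step S5 can be sketched as follows. The seating model is an assumption for illustration only: rows are taken to start `front_row_distance_m` from the stage S (the threshold described above) and repeat every `row_pitch_m`; the function names are hypothetical. Multiplying the number of rows by a seats-per-row value would give an avatar count. The visual-angle helper only illustrates why nearer avatars look larger; in the described system this follows from rendering the scene with the virtual stereo camera.

```python
import math


def avatar_row_distances(viewing_distance_m: float, front_row_distance_m: float,
                         row_pitch_m: float) -> list[float]:
    """Distances from the user U to each audience row seated in front of them.

    Rows are assumed (hypothetically) to start front_row_distance_m from the
    stage S and repeat every row_pitch_m. Returns an empty list when the
    viewing distance D is at or below the front-row threshold.
    """
    if viewing_distance_m <= front_row_distance_m:
        return []
    # Rows strictly closer to the stage than the user: front + k * pitch < D.
    n_rows = math.ceil((viewing_distance_m - front_row_distance_m) / row_pitch_m)
    return sorted(viewing_distance_m - (front_row_distance_m + k * row_pitch_m)
                  for k in range(n_rows))


def angular_height_deg(height_m: float, distance_m: float) -> float:
    """Visual angle of an avatar: nearer avatars subtend a larger angle."""
    return math.degrees(2 * math.atan(height_m / (2 * distance_m)))


rows = avatar_row_distances(10.0, 4.0, 1.0)
print(rows, [round(angular_height_deg(1.2, d), 1) for d in rows])
```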

In step S6, the processor 201 transmits the three-dimensional image data, comprising the right-eye image and the left-eye image, to the display 30 via the communication module 206. Based on the received data, the display 30 shows the right-eye image and the left-eye image on the respective display units for the two eyes. The user U thus perceives the performer P, with height Hp, at the viewing distance D.

In step S7, the processor 201 determines whether to end the process. For example, the process is determined to end when the event is over and communication with the camera 10 has been disconnected. If it is not determined in step S7 that the process should end, the process returns to step S2. If it is determined that the process should end, the processor 201 ends the process of FIG. 4.
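
The control flow of FIG. 4 is a loop that repeats until the end condition of step S7 holds. The sketch below is a hypothetical skeleton, not the device's actual structure: `receive` stands in for obtaining the next frame and returns `None` once the connection to the camera 10 is closed, `render` stands in for the per-frame work ending with step S5, and `send` stands in for the transmission of step S6.

```python
from typing import Any, Callable, Optional


def run_pipeline(receive: Callable[[], Optional[Any]],
                 render: Callable[[Any], Any],
                 send: Callable[[Any], None]) -> int:
    """Process frames until receive() reports the end of the stream.

    receive: returns the next frame, or None once the event is over and the
             connection to camera 10 is closed (the end condition of step S7)
    render: per-frame processing ending with step S5
    send: transmission to the display 30 (step S6)
    Returns the number of frames processed.
    """
    count = 0
    while True:
        frame = receive()
        if frame is None:
            return count
        send(render(frame))
        count += 1


frames = iter(["f1", "f2", "f3"])
sent = []
n = run_pipeline(lambda: next(frames, None), str.upper, sent.append)
print(n, sent)
```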

As described above, according to the embodiment, the shooting range of the camera 10 at the standing position of the performer P is estimated from the image captured by the camera 10 placed in the venue and from the height of the performer P. Based on this estimated shooting range, the viewing distance D is estimated on the assumption that the user U sees a view covering the same range as the shooting range of the camera 10. The virtual display surface is then arranged so that, in the three-dimensional space perceived by the user U through the three-dimensional display, the performer P of height Hp stands at the viewing distance D, and the size of the performer P on the virtual display surface is adjusted accordingly. The three-dimensional image displayed in this way gives the user U the sensation of watching the performer P at the venue.
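
The two estimation steps can be written out with a pinhole model, following the approach in the claims of regarding the human as a camera with a lens and an image sensor. First the real extent covered by the frame is scaled up from the reference object; then similar triangles give D = range × focal length / sensor size. The eye parameters below (a 50 mm lens and a 24 mm tall sensor, i.e. a 35 mm-format "standard" view) are illustrative assumptions, not values taken from this document, and the function names are hypothetical.

```python
def shooting_range_m(reference_px: float, frame_px: float, reference_m: float) -> float:
    """Real extent covered by the frame at the performer's position.

    reference_px: reference object's length in the image, in pixels
    frame_px: image length along the same axis, in pixels
    reference_m: reference object's real length (e.g. the height Hp), in metres
    """
    return reference_m * frame_px / reference_px


def viewing_distance_m(range_m: float, focal_length_mm: float, sensor_mm: float) -> float:
    """Pinhole model of the eye: range / D = sensor size / focal length."""
    return range_m * focal_length_mm / sensor_mm


# Hypothetical numbers: a 1.7 m performer spanning 540 of 1080 pixels, and an
# eye modelled as a 50 mm lens in front of a 24 mm tall sensor.
h = shooting_range_m(540, 1080, 1.7)     # 3.4 m
d = viewing_distance_m(h, 50.0, 24.0)    # about 7.08 m
print(round(h, 2), round(d, 2))
```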

Furthermore, superimposing the audience avatar images on the image of the performer P according to the viewing distance D strengthens the user U's sense of actually being at the venue.

[Modification 1]
Modifications of the embodiment are described below. In the embodiment described above, the display 30 is a non-transmissive wearable display. However, the display 30 may instead be a transmissive wearable display, either video see-through or optical see-through. Two methods are known for projecting images to the user U with a transmissive wearable display: virtual image projection and retinal projection. A wearable display of either type may be used in the embodiment.

FIGS. 7A and 7B schematically show the configuration of a video distribution system 1 that uses a transmissive wearable display. FIGS. 7A and 7B show only the configuration at the remote location; the rest of the configuration is the same as that shown in FIG. 1.

FIG. 7A shows a first example. In the first example, the user U wears on the head a display 30a, which is a transmissive wearable display, and a wall W stands in front of the user U. With a transmissive wearable display, the user U perceives the real scene of the outside world in addition to the virtual image Pi of the performer P and the virtual images Ai of the audience avatars shown by the display units for both eyes. Therefore, if the point at the viewing distance D from the user U lies beyond the wall W, the user U perceives the virtual image Pi of the performer P and the virtual images Ai of the audience avatars through the wall. In other words, the user U feels as if the wall W were transparent and a space extended behind it. Perceiving the virtual images Pi and Ai through the wall in this way gives the user U a stronger sense of reality than perceiving them floating in empty space.

In the first example, the user U perceives the virtual image Pi of the performer P and the virtual images Ai of the audience avatars behind the wall W. If the wall W has pronounced irregularities or patterns, the user U perceives them together with the virtual images Pi and Ai, which impairs the sense of reality. For this reason, the wall W is preferably flat and plain. For the same reason, it is preferable that no unnecessary objects be placed in front of the wall W.

FIG. 7B shows a second example. In the second example, the user U wears the display 30a, a transmissive wearable display, on the head, and a monitor M that is switched off is in front of the user U. If the point at the viewing distance D from the user U lies beyond the screen of the monitor M, the user U perceives the virtual image Pi of the performer P and the virtual images Ai of the audience avatars through the screen of the monitor M. In other words, the user U feels as if a space extended beyond the frame of the monitor M. Perceiving the virtual images Pi and Ai through the monitor in this way gives the user U a stronger sense of reality than perceiving them floating in empty space. In the second example, the user U also feels as if the image were being shown on the monitor M, even though the monitor M is switched off.

In the second example, the monitor M is switched off. This is because, if an image were displayed on the monitor M, the user would perceive the virtual image Pi of the performer P and the virtual images Ai of the audience avatars together with that displayed image. That said, information related to the venue may be displayed on the monitor M.

[Modification 2]
In the embodiment described above, processing is performed so that the height of the performer P is reproduced in the three-dimensional space. Alternatively, processing may be performed so that the size of the large service monitor 50 is reproduced in the three-dimensional space. In this case, the processor 201 acquires the image from the camera 40. In the embodiment, the size of the virtual display surface is determined from the height of the performer P in the image; in Modification 2, the processor 201 instead sets the size of the virtual display surface so that it matches the size of the large service monitor 50 in the three-dimensional space. Furthermore, in Modification 2, the processor 201 places the image plane of the large service monitor 50 at a viewing distance sufficiently far from the user U, and places an image representing the audience between the user U and that image plane. The image thus keeps its impact and legibility but appears farther from the user U, with an audience between the image plane and the user.

[Modification 3]
In the embodiment described above, the image processing device 20 processes images captured in real time by the camera 10. Alternatively, the image processing device 20 may perform its processing when playing back images previously recorded on a recording medium. The recorded images are not limited to images captured by the camera 10; they may be, for example, CG. In Modification 3, the recorded images are first sent to the image processing device 20. The processor 201 of the image processing device 20 then performs the process shown in FIG. 4 and transmits the three-dimensional image data to the display 30. The audience biometric data used as the audience data 2043 is the data collected when the images were recorded. The processor 201 updates the avatar images using the biometric data in synchronization with the playback timing of the images.
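
Synchronizing recorded biometric data with playback amounts to looking up, for each playback time, the most recent sample recorded at or before it. The sketch below assumes, hypothetically, that each sample carries a timestamp in seconds from the start of the recording and that the list is sorted; it uses the standard-library `bisect` module for the lookup. The heart-rate values are made-up illustration data.

```python
from bisect import bisect_right
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_at(timestamps: Sequence[float], samples: Sequence[T], t: float) -> Optional[T]:
    """Most recent audience sample recorded at or before playback time t.

    timestamps: recording times in seconds from the start, sorted ascending
    samples: the biometric samples aligned with timestamps
    Returns None if playback has not yet reached the first sample.
    """
    i = bisect_right(timestamps, t)
    return samples[i - 1] if i > 0 else None


# Hypothetical heart-rate readings captured while the event was recorded.
times = [0.0, 1.0, 2.0]
heart_rate = [72, 85, 90]
print(sample_at(times, heart_rate, 1.5))
```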

[Modification 4]
In the embodiment described above, the display 30 is capable of three-dimensional display. Alternatively, the display 30 may be a display that cannot perform three-dimensional display. In this case, the processor 201 need not change the size of the performer P when processing the image, although, if the screen of the display 30 is large enough, the processor 201 may enlarge or reduce the performer P in the image so that the performer's actual height is reproduced. The avatars, on the other hand, are enlarged or reduced by the processor 201 to the size at which they would be perceived at the viewing distance D from the user U.
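
The text does not say how the avatar size is computed for a flat display. One plausible approach, assumed here, is ordinary perspective projection: an object at depth z seen through a screen at distance s is drawn at s / z of its real size. The screen distance and pixel density are hypothetical inputs that the system would need to know or assume.

```python
def on_screen_height_px(object_height_m: float, object_distance_m: float,
                        screen_distance_m: float, pixels_per_m: float) -> float:
    """Pixel height at which an object should be drawn on a flat 2D screen.

    object_distance_m: the object's depth in the virtual venue, from the user U
    screen_distance_m: distance from the user U to the physical screen
    pixels_per_m: pixel density of the display 30
    """
    if object_distance_m <= 0:
        raise ValueError("object must be in front of the viewer")
    # Similar triangles: the screen cuts the line of sight at screen_distance_m.
    return object_height_m * screen_distance_m / object_distance_m * pixels_per_m


# A hypothetical 1.2 m avatar 6 m away, viewed on a screen 2 m from the user
# with 1000 px per metre, is drawn about 400 px tall.
print(round(on_screen_height_px(1.2, 6.0, 2.0, 1000.0)))
```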

[Modification 5]
The embodiment described above has been applied to events such as music and theater. The embodiment can also be applied to events such as sports. In this case, the reference object need not be a person. For example, at a soccer match, the reference object may be a person such as a player, or an object such as a soccer ball. When the reference object is an object such as a soccer ball, the image may be enlarged or reduced with that object as the reference. Depending on the reference object, its horizontal length may be used instead of its vertical length.
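
Using a horizontal rather than a vertical length changes only which axis of the reference object's bounding box is compared with its real size. The sketch assumes square pixels, so a single metres-per-pixel value applies to both axes; the function name is hypothetical, and the ball diameter of roughly 0.22 m (a size 5 ball) is used only as an illustration.

```python
def shooting_range(ref_box_px: tuple[float, float], frame_px: tuple[float, float],
                   ref_length_m: float, axis: str = "vertical") -> tuple[float, float]:
    """(width_m, height_m) of the scene covered by the frame.

    ref_box_px: (width, height) of the reference object's bounding box, in pixels
    frame_px: (width, height) of the frame, in pixels
    ref_length_m: real length of the reference along `axis`
    Assumes square pixels, so one metres-per-pixel value holds on both axes.
    """
    index = {"horizontal": 0, "vertical": 1}[axis]
    metres_per_px = ref_length_m / ref_box_px[index]
    return frame_px[0] * metres_per_px, frame_px[1] * metres_per_px


# A ball about 0.22 m across spanning 24 px of a 1920 x 1080 frame.
w, h = shooting_range((24, 24), (1920, 1080), 0.22, axis="horizontal")
print(round(w, 2), round(h, 2))
```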

[Modification 6]
In the embodiment described above, the shooting range of the camera 10 is estimated from the length of the reference object. This estimation is particularly effective when the camera 10 is movable and its shooting range may change over time. When the camera 10 is fixed, the shooting range may instead be estimated from information such as the focal length of the camera 10 and the shooting distance.
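
For a fixed camera, the same pinhole relation used for the eye can be applied to the camera 10 itself: the extent covered at the subject is the shooting distance times sensor size divided by focal length. This is an approximation that holds when the shooting distance is much larger than the focal length; the sensor size is an additional input that the text does not mention, and the numbers below are hypothetical.

```python
def fixed_camera_range_m(shooting_distance_m: float, sensor_mm: float,
                         focal_length_mm: float) -> float:
    """Extent covered at the subject by a fixed camera (pinhole approximation,
    valid when the shooting distance is much larger than the focal length)."""
    return shooting_distance_m * sensor_mm / focal_length_mm


# Hypothetical: subject 20 m away, 24 mm sensor height, 100 mm lens.
print(fixed_camera_range_m(20.0, 24.0, 100.0))
```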

[Modification 7]
In the embodiment described above, the three-dimensional image perceived by the user reproduces the sense of distance and size that the user U would have when viewing a scene of the same range as the shooting range of the camera 10. Alternatively, the viewing distance D may be set to the distance from a specific audience seat to the performer P. A region corresponding to the shooting range H that the user U would see from this viewing distance D is then trimmed from the image captured by the camera 10. Based on the trimmed image, the virtual display surface is arranged so that the performer P of height Hp stands at the viewing distance D, and the size of the performer P on the virtual display surface is adjusted. So that the image can be trimmed to various regions, the camera 10 preferably captures a wide area at high resolution. This gives the user U the feeling of watching the performer P from a specific seat in the venue.
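
The trimming in Modification 7 can be sketched by computing the range H seen from the chosen seat with the same pinhole model of the eye, converting it to pixels using the camera's known vertical coverage, and clamping a crop of that size around the performer. The eye parameters, the camera coverage, and the function name are hypothetical inputs for illustration.

```python
def trim_box(center_px: tuple[int, int], seat_distance_m: float, camera_range_m: float,
             frame_px: tuple[int, int], focal_length_mm: float,
             sensor_mm: float) -> tuple[int, int, int, int]:
    """(x0, y0, width, height) of the crop matching the view from a given seat.

    center_px: where to centre the crop, e.g. the performer P, in pixels
    seat_distance_m: viewing distance D from the chosen seat to the performer P
    camera_range_m: vertical extent covered by camera 10 at the performer
    frame_px: (width, height) of the camera 10 frame
    focal_length_mm, sensor_mm: the eye modelled as a camera (assumed values)
    """
    frame_w, frame_h = frame_px
    # Vertical extent H a viewer at distance D would see (pinhole model).
    seat_range_m = seat_distance_m * sensor_mm / focal_length_mm
    if seat_range_m > camera_range_m:
        raise ValueError("camera 10 does not cover the view from this seat")
    h = round(frame_h * seat_range_m / camera_range_m)
    w = round(h * frame_w / frame_h)  # keep the frame's aspect ratio
    # Clamp so the crop stays inside the frame.
    x0 = min(max(center_px[0] - w // 2, 0), frame_w - w)
    y0 = min(max(center_px[1] - h // 2, 0), frame_h - h)
    return x0, y0, w, h


# Hypothetical 8K camera covering 19.2 m vertically; seat 10 m from the stage.
print(trim_box((3840, 2160), 10.0, 19.2, (7680, 4320), 50.0, 24.0))
```

A wide, high-resolution source frame matters here because the crop for a distant seat can still be large, while the crop for a near seat is small and must retain enough pixels to look sharp.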

The present invention is not limited to the embodiment described above and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may be combined as appropriate, in which case the combined effects are obtained. Furthermore, the embodiment includes various inventions, and various inventions can be extracted by combinations selected from the plurality of disclosed components. For example, if the problem can be solved and the effect obtained even when some components are removed from all the components shown in the embodiment, the configuration without those components can be extracted as an invention.

Reference Signs List
1: Video distribution system
10: Camera
20: Video processing device
30: Display
40: Camera
50: Large service monitor
201: Processor
202: ROM
203: RAM
204: Storage
205: Input device
206: Communication module
2011: Receiving unit
2012: Distance estimation unit
2013: Image processing unit
2014: Three-dimensional processing unit
2015: Transmitting unit
2041: Program
2042: Variables
2043: Audience data

Claims

A video processing device comprising:
a receiving unit that receives a video;
a distance estimation unit that estimates a viewing distance on the assumption that a human is viewing a scene covering the same range as the video;
an image processing unit that processes the video so that, at the viewing distance, the size of a reference object shown in the video matches the actual size of the reference object; and
a transmission unit that transmits the processed video to a display.

The video processing device according to claim 1, further comprising:
a three-dimensional processing unit that processes the video so that a virtual display surface, on which a user perceives the video in a three-dimensional space, is located at the viewing distance from the user,
wherein the image processing unit enlarges or reduces the video so that the size of the reference object on the virtual display surface matches its actual size.

The video processing device according to claim 2, wherein the three-dimensional processing unit processes the video so that a video representing an audience is positioned at a distance from the user shorter than the viewing distance.

The video processing device according to any one of claims 1 to 3, wherein the distance estimation unit:
estimates the actual size of the range captured in the video from the size of the reference object captured in the video; and
estimates the viewing distance based on the actual size of the range captured in the video and on the focal length of the lens and the size of the image sensor when the human is regarded as a camera having a lens and an image sensor.

The video processing device according to any one of claims 1 to 4, wherein the reference object is a human, and the size of the reference object is the height of the human.

A video processing method executed by a video processing device, the method comprising:
receiving a video by a receiving unit of the video processing device;
estimating, by a distance estimation unit of the video processing device, a viewing distance on the assumption that a human is viewing a scene covering the same range as the video;
processing the video, by an image processing unit of the video processing device, so that, at the viewing distance, the size of a reference object shown in the video matches the actual size of the reference object; and
transmitting the processed video to a display by a transmission unit of the video processing device.

A video processing program for causing a computer to function as the receiving unit, the distance estimation unit, the image processing unit, and the transmission unit of the video processing device according to any one of claims 1 to 5.