JP2018026677A - Encoding device, encoding method, and encoding program

Info

Publication number: JP2018026677A
Application number: JP2016156723A
Authority: JP
Inventors: Yutaka Kunida; Daisuke Ochi; Akio Kameda; Ai Isogai; Akira Kojima
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-08-09
Filing date: 2016-08-09
Publication date: 2018-02-15

Abstract

PROBLEM TO BE SOLVED: To provide an encoding device that, for a single video stream, can improve the image quality of the area an observer pays attention to within a limited transmission band, and can limit the increase in load on the distribution server side as the number of users grows.

SOLUTION: An encoding device comprises: a video input part that receives an input of a video to be distributed; a weighting part 33 that assigns a different weight distribution to each area of attention in the video; and an encoding part 34 that generates a plurality of bit streams, each encoded so that the amount of code follows the weight distribution.

SELECTED DRAWING: Figure 3

Description

The present invention relates to an encoding device, an encoding method, and an encoding program.

Video distribution systems are known that convert an input video into multiple resolutions, divide each resolution into partial areas called tiles, and encode the tiles in advance, so that both the entire video and a high-resolution video can be distributed over a limited band. In response to a viewer's request for a region of interest, such a system distributes a low-resolution tile covering the entire video and a high-resolution tile covering the region of interest. When the playback terminal displays them, it replaces only the pixels in the region-of-interest portion of the low-resolution tile with the pixels of the high-resolution tile, which prevents the video from being interrupted even when the region of interest changes.

Terms relating to areas are defined here. The region of interest (ROI) is the part of a large video that the viewer is paying attention to. The viewer can change the size of the region of interest, and when it is set to its maximum size it may coincide with the full video size. A tile is one of the small partial areas into which the video of the entire area is divided. A partial area is one of a predetermined number of small rectangular areas into which the entire area is divided, and a tile may also be formed by combining several partial areas. A minimum range area is a set of partial areas of the video, or a single partial area, that does not need to be divided further when obtaining the bitstream corresponding to a region of interest.

FIG. 14 is a diagram showing the configuration of a video distribution system. In this figure, reference numeral 1 denotes a distribution server that distributes video. Reference numeral 21 denotes a terminal device configured as a head-mounted display. Reference numeral 22 denotes a desktop terminal device configured with a liquid crystal display or the like. Reference numeral 23 denotes a terminal device that forms a simple head-mounted display when a smartphone is inserted into it.

As shown in FIG. 14, the distribution server 1 converts the input video into multiple resolutions, divides each into areas called tiles, and encodes them in advance. In response to a tile distribution request from any of the terminal devices 21 to 23, the distribution server 1 distributes two tiles. When the two tiles are displayed, only the pixels in the region-of-interest portion of the low-resolution tile B are replaced with the pixels of the high-resolution tile A, and the result is shown on the screen of the terminal devices 21 to 23.
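The pixel replacement performed on the terminal side can be illustrated with a minimal sketch. It assumes frames are plain 2-D lists of pixel values and that the low-resolution tile B has already been upscaled to display size; the function name `composite_roi` and the `top`/`left` offsets are hypothetical and do not come from the patent.

```python
def composite_roi(base, roi, top, left):
    """Return a copy of `base` with the ROI block overwritten by `roi`.

    base: 2-D list of pixels (tile B, assumed already upscaled).
    roi:  2-D list of pixels (high-resolution tile A).
    top, left: position of the ROI inside `base` (hypothetical parameters).
    """
    out = [row[:] for row in base]  # leave the decoded base frame untouched
    for dy, roi_row in enumerate(roi):
        for dx, pixel in enumerate(roi_row):
            out[top + dy][left + dx] = pixel
    return out


base = [[0] * 4 for _ in range(3)]   # stand-in for tile B
roi = [[9, 9], [9, 9]]               # stand-in for tile A
frame = composite_roi(base, roi, top=1, left=2)
```

Because the base frame is always available, the terminal can keep displaying it while a new high-resolution tile is still in transit.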

In FIG. 14, tile A is a high-resolution tile and corresponds to the region of interest. Tile B is a low-resolution tile that contains the region of interest and covers a wider range than it; when the region of interest changes, tile B also serves as the displayed video until the new high-resolution tile is delivered. In this way, the video can be displayed without interruption even when the region of interest changes.

Next, the detailed configuration and operation of the video distribution system shown in FIG. 14 will be described. FIG. 15 is a block diagram showing the detailed configuration of the video distribution system. In this figure, reference numeral 1 denotes a distribution server that distributes video. Reference numeral 25 denotes a terminal device of no particular form; the terminal device 25 is, for example, a head-mounted display.

Reference numeral 11 denotes a video distribution unit that selects and distributes an ROI bitstream in order to deliver the bitstream corresponding to the region of interest. Reference numeral 12 denotes a bitstream storage unit that stores a high-quality video bitstream for each region of interest (ROI). Reference numeral 13 denotes an entire-area bitstream storage unit that stores a low-quality video bitstream of the entire area. The video distribution unit 11 selects and distributes the bitstream of an ROI candidate area from among the bitstreams stored in the bitstream storage unit 12, and distributes the bitstream of the entire area together with it.

Reference numeral 26 denotes a video stream request unit that issues a distribution request for the region of interest based on information identifying the viewer's region of interest. Reference numeral 27 denotes an ROI decoding unit that decodes the bitstream of the region of interest to obtain the video of the region of interest. Reference numeral 28 denotes an entire-area decoding unit that decodes the bitstream of the entire area to obtain the video of the region of interest. Reference numeral 29 denotes a video composition and display unit that composes and displays a video in which only the pixels in the region-of-interest portion of the low-resolution tile are replaced with the pixels of the high-resolution tile.

Next, an encoding device that outputs encoded bitstreams to the bitstream storage unit 12 will be described. FIG. 16 is a block diagram showing a configuration in which the encoding device 3 is connected to the distribution server 1. In FIG. 16, parts identical to those shown in FIG. 15 are given the same reference numerals, and their description is omitted. The encoding device 3 receives the video of the entire area, which consists of, for example, eight partial areas. The entire-area encoding unit 31 encodes the video (that is, the video of the entire area) with a known encoding method (for example, one of the various standards) to obtain the bitstream corresponding to the video. The ROI encoding unit 31 obtains, from the video of the entire area, a bitstream for each of a plurality of predetermined ROIs: it encodes the video of each ROI with a known encoding method (for example, one of the various standards), obtains the bitstream corresponding to each ROI, and stores it in the ROI bitstream storage unit. For example, the ROI encoding unit 31 obtains three bitstreams, each corresponding to an ROI made up of four partial areas (partial areas 1, 2, 5, and 6; partial areas 2, 3, 6, and 7; and partial areas 3, 4, 7, and 8), and stores them in the ROI bitstream storage unit.
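The three example ROIs can be reproduced with a short enumeration. This is only a sketch under an assumption the text does not state explicitly: the eight partial areas are taken to form a grid of 4 columns by 2 rows, numbered row by row from the top left, and each ROI is a 2-by-2 block. The function name `enumerate_rois` is hypothetical.

```python
def enumerate_rois(cols, rows, roi_cols, roi_rows):
    """List every ROI as a tuple of 1-based partial-area numbers.

    Partial areas are assumed to be numbered row by row, left to right.
    """
    rois = []
    for top in range(rows - roi_rows + 1):
        for left in range(cols - roi_cols + 1):
            rois.append(tuple(
                (top + r) * cols + (left + c) + 1
                for r in range(roi_rows)
                for c in range(roi_cols)
            ))
    return rois


rois = enumerate_rois(cols=4, rows=2, roi_cols=2, roi_rows=2)
# rois == [(1, 2, 5, 6), (2, 3, 6, 7), (3, 4, 7, 8)]
```

Under this layout the number of stored ROI bitstreams grows with the number of possible block positions, which is why the candidates are fixed in advance.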

As described above, the conventional technique encodes in advance a plurality of tiles that differ in resolution and position, distributes a low-resolution tile of the entire video together with a high-resolution tile chosen according to the observer's requested region of interest, and superimposes them for display on the user-side terminal, thereby distributing wide-range, high-resolution video over a limited band (see, for example, Patent Document 1).

There is also a method that re-encodes (transcodes) high-definition video according to the user's requested region of interest and distributes the result (see, for example, Patent Document 2).

Patent Document 1: Japanese Patent No. 5449241
Patent Document 2: Japanese Patent No. 3936708

In Patent Document 1, however, two or more video streams are distributed and then combined for display on the user terminal side. To keep the displayed content of the streams from drifting apart, the terminal device must perform synchronization processing that aligns the playback times of the two video streams. The two video streams passing through the transmission path do not necessarily reach the terminal device at the same time. The terminal therefore needs a buffer that holds the streams for a certain period and must wait for all video streams to arrive before displaying them, which introduces extra delay.

In addition, the terminal device needs a mechanism for receiving two streams, decoding them, and combining the videos for display, so the scheme cannot run on a general terminal device that receives and encodes a single stream. The computational load on the terminal device side also increases.

Furthermore, some network configurations and devices do not anticipate delivering two or more streams to a single terminal device, so distribution may not be possible at all. In Patent Document 2, hierarchically encoded data is transcoded and transmitted in response to each request from the user side (client side), so the processing load increases as the number of users grows.

The present invention has been made in view of these circumstances. Its object is to provide an encoding device, an encoding method, and an encoding program that, for a video stream, improve the image quality of the area the observer pays attention to within a limited transmission band, and that limit the increase in load on the distribution server side even when there are many users.

One aspect of the present invention is an encoding device comprising: a video input unit that inputs a video to be distributed; a weighting unit that assigns a different weight distribution to each region of interest of the video; and an encoding unit that generates a plurality of bitstreams, each encoded so that the code amount follows the weight distribution.

One aspect of the present invention is the encoding device described above, wherein the weighting unit assigns the weight distribution using at least one of: a first weighting that weights each predetermined area of the video; a second weighting that extracts a region of interest from the video and weights it; a third weighting that extracts a region of interest by feeding back the judgments of other users and weights it; a fourth weighting that extracts a region of interest manually based on the content of the video and weights it; and a fifth weighting that extracts a region of interest according to the geographical position from which the video is viewed and weights it.

One aspect of the present invention is an encoding method performed by an encoding device having a video input unit that inputs a video to be distributed, the method comprising: a weighting step of assigning a different weight distribution to each region of interest of the video; and an encoding step of generating a plurality of bitstreams, each encoded so that the code amount follows the weight distribution.

One aspect of the present invention is an encoding program for causing a computer to function as the encoding device.

According to the present invention, a video stream can be encoded so that the image quality of the area the observer pays attention to is improved within a limited transmission band, while the increase in load on the distribution server side is suppressed even when there are many users.

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.
FIG. 2 is a block diagram showing the detailed configuration of the video distribution system shown in FIG. 1.
FIG. 3 is a block diagram showing a configuration in which an encoding device that stores bitstreams in the bitstream storage unit 14 shown in FIG. 2 is connected to the distribution server 1.
FIG. 4 is a block diagram showing the detailed configuration of the terminal device 25 shown in FIG. 2.
FIG. 5 is an explanatory diagram showing the operation of determining local weights.
FIG. 6 is an explanatory diagram showing examples of weight distributions and videos.
FIG. 7 is an explanatory diagram showing an example of a table that stores, for each video stream, the center coordinates and variance of a weight distribution following a normal distribution.
FIG. 8 is an explanatory diagram showing an example of distributing omnidirectional video captured with a camera fitted with a fisheye lens or with multiple cameras.
FIG. 9 is an explanatory diagram showing an example of a weight distribution.
FIG. 10 is an explanatory diagram showing an example of a weight distribution.
FIG. 11 is a block diagram showing the configuration of the video distribution system according to the third embodiment.
FIG. 12 is a block diagram showing the configuration of the local weight determination unit 33.
FIG. 13 is a block diagram showing the configuration of the local weight determination unit 33.
FIG. 14 is a diagram showing the configuration of a video distribution system.
FIG. 15 is a block diagram showing the detailed configuration of a video distribution system.
FIG. 16 is a block diagram showing a configuration in which the encoding device 3 is connected to the distribution server 1.

<First Embodiment>
Hereinafter, a video distribution system according to a first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a diagram showing the configuration of the video distribution system according to the present embodiment. In this figure, reference numeral 1 denotes a distribution server that distributes video. Reference numeral 21 denotes a terminal device configured as a head-mounted display. Reference numeral 22 denotes a desktop terminal device configured with a liquid crystal display or the like. Reference numeral 23 denotes a terminal device that forms a simple head-mounted display when a smartphone is inserted into it.

The video distribution system according to the present embodiment distributes a single video bitstream in which the region being viewed (the region of interest, ROI) has higher image quality than the rest of the frame, so that the viewer sees the region of interest at higher quality than the other regions. In response to a distribution request from the terminal devices 21, 22, and 23, the distribution server 1 streams a video in which the region of interest (ROI), which is a part of the low-quality entire-area video B, has been made into the high-quality video B.

To realize this video distribution system, for each of a plurality of assumed patterns, one bitstream of one video is generated in which a predetermined region of interest (ROI) has higher image quality than the other regions. One bitstream is then selected from the bitstreams of the plurality of patterns and distributed.

Next, the detailed configuration of the video distribution system shown in FIG. 1 will be described with reference to FIG. 2. FIG. 2 is a block diagram showing the detailed configuration of the video distribution system shown in FIG. 1. In FIG. 2, reference numeral 1 denotes a distribution server that distributes video. The distribution server 1 consists of a video distribution unit 11 and a bitstream storage unit 14. The video distribution unit 11 selects, from the bitstreams stored in the bitstream storage unit 14, the entire-area bitstream weighted toward the desired region of interest, and distributes it. The bitstream storage unit 14 stores multiple entire-area video bitstreams, each weighted toward a predetermined region of interest (that is, the predetermined region of interest has higher image quality than the other regions), with a different predetermined region of interest for each bitstream.

Reference numeral 25 denotes a terminal device of no particular form. The terminal device 25 consists of a video stream request unit 251, a decoding unit 252, and a display unit 253. The video stream request unit 251 sends, as a video stream distribution request, a request that includes information on the viewer's region of interest. The decoding unit 252 decodes the bitstream of the entire-area video in which the region of interest is weighted, and obtains a video in which the region of interest has higher image quality than the other regions. The display unit 253 displays that video.

Next, a configuration in which an encoding device that stores bitstreams in the bitstream storage unit 14 shown in FIG. 2 is connected to the distribution server 1 will be described with reference to FIG. 3. FIG. 3 is a block diagram showing that configuration. The encoding device 3 includes a local weight determination unit 33 and an encoding unit 34. The local weight determination unit 33 assigns local weights within the video of the entire area. The details of the local weight determination unit 33 are described later. The encoding unit 34 obtains, from the video (that is, the video of the entire area), bitstreams in which each of a plurality of predetermined areas is weighted. To do so, it encodes the video in accordance with the weights using a known encoding method (for example, one of the various standards). The encoding unit 34 stores, for example, three entire-area bitstreams, each weighted toward one of three ROIs, in the bitstream storage unit 14.

The local weight determination unit 33 determines local weights by one of the following methods. The methods may also be combined.
(1). The video is divided into areas regardless of its content, and a weight is determined for each area.
(2). The video is divided into areas based on the user's region of interest, and a weight is determined for each area.
(2)-1. The region of interest is extracted from the video alone (saliency values, etc.).
(2)-2. The region of interest is extracted by feeding back the judgments of other users (logs).
(2)-3. The video is divided into areas manually based on its content (for example, a venue entrance area).
(3). The video is divided into areas according to the geographical position from which it is viewed, and a weight is determined for each area.

Next, the details of the terminal device 25 shown in FIG. 2 will be described with reference to FIG. 4. FIG. 4 is a block diagram showing the detailed configuration of the terminal device 25 shown in FIG. 2. The terminal device 25 in FIG. 4 differs from the one in FIG. 2 in that the video stream request unit 251 consists of a request method setting unit 254, a region-of-interest detection unit 255, a region-of-interest information determination unit 256, and a tile-specific weight information storage unit 257. The request method setting unit 254 selects and inputs the motion vector of the user's head. The region-of-interest detection unit 255 receives the motion vector of the user's head as the user's viewing information and estimates the region of interest by predicting the head movement. The region-of-interest information determination unit 256 sends the tile weights based on the estimated region of interest as the region-of-interest information. The tile-specific weight information storage unit 257 stores weight information for each tile. In the following description, a tile is assumed to have the same size as a partial area.

The region-of-interest detection unit 255 detects the region of interest by one of the following methods.
(a). Movement of the region of interest (motion vector)
(a)-1. Head movement (using an external sensor)
(a)-2. Gaze movement
(b). Relative size of the area after geometric transformation (geographical position)
(c). Region of interest
(c)-1. Input via touch panel or mouse
(c)-2. Position of the detected gaze (viewpoint) (using an external sensor)
(c)-3. Feedback of viewing tendencies from logs
(d). Substance of the content

The region-of-interest information determination unit 256 determines the weight for each tile using one or more of (a), (b), (c), and (d) above.

Next, the operation of the video distribution system described above will be explained with a concrete example. FIG. 5 shows an example of the video distribution system according to the present embodiment. The encoding unit takes the original video as input and encodes a plurality of video streams. Each stream is encoded on the basis of a weight distribution (local weight distribution) in which the weighting of the code amount varies locally across the area recorded in the video, according to the region of interest.

In FIG. 5, weights are shown in grayscale: the whiter an area, the larger its weight and the more code amount it is allocated. The weights correspond to the user's region of interest, and a plurality (n) of video streams are prepared in advance, each allocating more code amount to one candidate for the area the user may be paying attention to.

FIG. 6 shows examples of weight distributions and videos. The original video is the same in each case, but it is encoded with the code amount allocated according to the weight distribution. For example, in the video A3 encoded with A1 as the weight distribution, more code amount is allocated to the portion A4. Likewise, in the video A3 encoded with A2 as the weight distribution, more code amount is allocated to the portion A5, and its image quality is improved.

The region of interest may be, for example, rectangular or circular, but it is not limited to any particular shape. The regions may be arranged regularly at equal intervals or irregularly. The weights may be specified as either discrete or continuous values.

The allocation of code amount based on the weights can be controlled, for example, by varying the quantization step. In an encoding method such as the H.264 standard, this can be specified by setting the quantization parameter (QP).
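As an illustration of how a weight might be turned into a quantization setting, the following sketch maps a normalized weight to a QP value. It is not taken from the patent: the linear mapping, the bounds `qp_min=22` and `qp_max=40`, and the function names are assumptions chosen for the example. The only fixed fact relied on is that H.264 QP values for 8-bit video lie in the range 0 to 51 and that a lower QP means finer quantization and more bits.

```python
def qp_for_weight(weight, qp_min=22, qp_max=40):
    """Map a weight in [0, 1] to an integer QP (higher weight -> lower QP).

    The linear mapping and the default bounds are illustrative assumptions.
    """
    w = min(max(weight, 0.0), 1.0)  # clamp out-of-range weights
    return round(qp_max - w * (qp_max - qp_min))


def qp_map(weights, qp_min=22, qp_max=40):
    """Convert a 2-D weight map (one weight per block) into a QP map."""
    return [[qp_for_weight(w, qp_min, qp_max) for w in row] for row in weights]


example = qp_map([[0.0, 0.5, 1.0]])
# example == [[40, 31, 22]]
```

A real encoder would receive such a map through its own rate-control or per-block QP interface, which differs between implementations.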

The video distribution unit selects from among the plurality of video streams and distributes the selection to the user. In response to a request based on the area the user is paying attention to, one of the n streams is distributed. If the video streams are delivered in segments of a certain time interval, the requested stream can be changed at each interval.

The request for a video stream is made on the user side. One possible way to realize this is the configuration shown in FIG. 4. By having the region-of-interest detection unit 255, the region-of-interest information determination unit 256, the decoding unit 252, and the display unit 253, the user-side terminal device 25 lets the user view the part of the video being attended to at high image quality.

The region-of-interest detection unit 255 detects which part of the video the user is paying attention to. When the user's operation causes only a part of the entire video to be viewed, the displayed area can be used as the region of interest.

The user's operation may be, for example, a mouse operation or a touch-panel operation. In combination with a head-mounted display device, the operation can also be performed through the movement or orientation of the head, measured with a position sensor, an acceleration sensor, or the like. As another way of detecting the region of interest, gaze detection based on image processing or on myoelectric potential can be used.

The video stream request unit 251 takes the region-of-interest information as input and requests from the video distribution unit 11 a high-quality video in which a large code amount is allocated to that region of interest. The correspondence that determines which video stream to request for a given region of interest is established by sharing a table or rules in advance.

For example, the table shown in FIG. 7 lists, for each video stream, the center coordinates and variance of a weight distribution that follows a normal distribution. The weight distribution is of course not limited to a normal distribution; a rectangular distribution or any other distribution can be used.
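
The table-based correspondence can be sketched as follows. This is a minimal illustration only: `STREAM_TABLE` is a hypothetical stand-in with made-up values (the contents of FIG. 7 are not reproduced in this text), and the rule "request the stream whose weight center is nearest to the attention-area center" is an assumption, not a rule stated in the source.

```python
import math

# Hypothetical stand-in for the FIG. 7 table: one normal-distribution
# weight (center and variance) per pre-encoded stream. Values are made up.
STREAM_TABLE = {
    "stream_0": {"center": (480.0, 270.0), "variance": 100.0 ** 2},
    "stream_1": {"center": (1440.0, 270.0), "variance": 100.0 ** 2},
    "stream_2": {"center": (960.0, 810.0), "variance": 100.0 ** 2},
}


def select_stream_by_center(attention_center, table=STREAM_TABLE):
    """Return the id of the stream whose weight center is closest.

    The variance column is carried in the table but not used by this rule.
    """
    ax, ay = attention_center
    return min(
        table,
        key=lambda sid: math.hypot(table[sid]["center"][0] - ax,
                                   table[sid]["center"][1] - ay),
    )
```

Because both sides hold the same table, the terminal only has to send the chosen stream id in its request.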

The distribution does not have to be parameterized; the weight distribution information itself may be shared between the video distribution unit 11 and the video stream request unit 251. In that case, for example, each weight distribution may be integrated over the attention area, and the video stream whose weight distribution gives the largest value may be identified and requested. The tables and rules describing the shape of the weight distributions, or the weight distribution information itself, may be delivered over a communication path or distributed on a recording device.
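
The integration-based selection can be illustrated with a small sketch. It assumes isotropic Gaussian weights, a rectangular attention area, and a coarse grid sum as the integral; the function names and the `step` parameter are hypothetical and chosen only for this example.

```python
import math


def gaussian_weight(x, y, center, sigma):
    """Unnormalized isotropic normal-distribution weight at pixel (x, y)."""
    cx, cy = center
    return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))


def integrate_over_region(center, sigma, region, step=8):
    """Approximate the integral of the weight over a rectangle by a grid sum.

    region is (x0, y0, x1, y1) in pixels; step is the grid spacing.
    """
    x0, y0, x1, y1 = region
    total = 0.0
    for y in range(y0, y1, step):
        for x in range(x0, x1, step):
            total += gaussian_weight(x, y, center, sigma)
    return total * step * step


def select_stream_by_integral(streams, region):
    """Pick the stream whose weight distribution has the largest integral."""
    return max(streams, key=lambda sid: integrate_over_region(
        streams[sid]["center"], streams[sid]["sigma"], region))
```

The weights are left unnormalized here, which favors wider distributions when several share a center; whether to normalize is a design choice the source does not specify.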

When a video stream is requested, the video distribution unit 11 distributes the corresponding video stream to the user via the transmission path. In the user-side terminal device 25, the video stream is decoded by the decoding unit 252 and displayed by the display unit 253. The display unit 253 may display the decoded video as-is on a flat display, or may cut out a part of it and display that part on, for example, the display of a tablet terminal. Instead of a rectangular display, the video may also be deformed by geometric processing (for example, into a circle) before display. In that case, the distributed video can be used as the texture image for texture mapping in computer graphics.

As an example, as shown in FIG. 8, video captured in all directions with a camera equipped with a fisheye lens or with multiple cameras is distributed. The video 54 decoded by the decoding unit 252 is pasted as a texture onto a spherical surface 51 arranged in computer space so as to surround the user's viewpoint. The position and direction of the field of view are set according to the movement of the user's head, and the scene is drawn on a drawing surface 52 placed in front of the field of view.

The video drawn in this way is displayed on a head-mounted display, and the user's head movement in real space is detected by a rotation sensor so that the drawn video changes with that movement. This lets the user experience the video as if looking around in all directions. In this case, the user's field of view 53 on the spherical surface can be regarded as the attention area. By requesting from the video distribution unit a video stream encoded with a large code amount for the corresponding area 55 on the video 54, the quality of the video within the user's field of view, near the point of attention, becomes higher than that of the surrounding areas.

However, the user-side terminal implementation shown here is only an example. Any other method may be used as long as it provides an encoding and distribution method that allocates a large code amount to the location the user is paying attention to and thereby raises the user's quality of experience within the same distribution bandwidth.

In this embodiment, the computational cost of the encoding process is determined by the number of attention areas n and does not depend on the number of users m. Therefore, even if the number of users m grows to a large scale, such as several million, the computational load required for encoding on the server side stays constant and only the distribution load increases.
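
The independence from the number of users can be made concrete with a toy sketch. `CountingEncoder` is a hypothetical stand-in for the encoding unit that only counts how often it is called; the point is that n streams are encoded once and every later request is a lookup.

```python
class CountingEncoder:
    """Hypothetical stand-in for the encoding unit that counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, weight_distribution):
        self.calls += 1
        return f"bitstream<{weight_distribution}>"


def encode_all_streams(weight_distributions, encode):
    """Encode one stream per weight distribution, once, ahead of any request."""
    return {i: encode(w) for i, w in enumerate(weight_distributions)}


def serve_requests(prepared, requested_ids):
    """Serving only looks up an already-encoded stream; nothing is re-encoded."""
    return [prepared[i] for i in requested_ids]
```

With three weight distributions and a thousand requests, the encoder in this sketch is still invoked only three times.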

The distribution load can be handled by distributed processing using, for example, a mechanism called a CDN (content delivery network), which is a known technique.

<Second Embodiment>
Next, a video distribution system according to a second embodiment of the present invention will be described. In this embodiment, the user-side terminal device requests a video stream according to the movement of the user's viewpoint, and the video distribution unit 11 distributes it. This reduces the effect of delays in the transmission path and buffers between the user's request for a video stream and its arrival.

In the first embodiment described above, there is no problem when the attention area is stationary. When it is moving, however, the user is not necessarily looking, at display time, at the location that was weighted and allocated a large code amount. For example, let Δt be the delay from making a request until the stream is delivered and displayed. If the attention area moves during this period, the user is not necessarily looking at the area to which a large code amount was allocated.

For example, if the center point (x, y) of the attention area is moving with velocity vector (Vx, Vy), then by display time it has moved from the requested position (x, y) to (x + VxΔt, y + VyΔt). The encoding unit 34 of this embodiment therefore prepares a plurality of video streams corresponding to movement patterns of the attention area. As one example, encoding is performed using a plurality of weight distributions that share the same attention-area center but differ in spread and shape.

For example, FIG. 9 shows weight distributions 61 and 62 that follow normal distributions with the same attention-area center but different variances. The terminal requests a video stream encoded with a weight distribution whose variance σ corresponds to the speed of the attention area, V = √(Vx² + Vy²). When the speed is low, the user is presented with a video stream whose code amount is concentrated more locally; when the speed is high, a video stream that allocates code amount over a wide range is distributed. This suppresses the extreme drop in image quality that occurs when the attention area moves away from the center of the weights.
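
A minimal sketch of choosing the spread from the speed follows. The thresholds and spread values in `SIGMA_BY_SPEED` are invented for illustration; the source only states that a faster-moving attention area should be served by a wider weight distribution.

```python
import math

# Hypothetical (max_speed, sigma) pairs: streams encoded with wider weight
# distributions are used for faster attention-area movement.
SIGMA_BY_SPEED = [(50.0, 80.0), (200.0, 160.0), (math.inf, 320.0)]


def speed(vx, vy):
    """V = sqrt(Vx^2 + Vy^2), in pixels per second."""
    return math.hypot(vx, vy)


def sigma_for_velocity(vx, vy, table=SIGMA_BY_SPEED):
    """Return the sigma of the first bucket whose speed limit is not exceeded."""
    v = speed(vx, vy)
    for max_speed, sigma in table:
        if v <= max_speed:
            return sigma
    raise ValueError("speed table must end with an unbounded bucket")
```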

A tilted weight distribution such as the one shown in FIG. 10 can also be used; it can be expressed, for example, by expression (1).

Here, ρ is a parameter related to the tilt of the distribution, and (x0, y0) is the center point of the distribution. By requesting a stream whose distribution direction matches the current direction of movement, the likelihood that the attention area after Δt falls on a heavily weighted location can be increased.
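
Expression (1) itself is not reproduced in this text. Purely as an illustration of a tilted distribution with a tilt parameter ρ and center (x0, y0), the sketch below uses an unnormalized bivariate normal with correlation ρ. This form, and the parameters `sigma_x` and `sigma_y`, are assumptions and should not be read as the patent's expression.

```python
import math


def tilted_weight(x, y, x0, y0, sigma_x, sigma_y, rho):
    """Unnormalized bivariate-normal weight with correlation rho (|rho| < 1).

    rho = 0 gives an axis-aligned distribution; rho > 0 stretches it along
    the direction in which x and y increase together.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    dx = (x - x0) / sigma_x
    dy = (y - y0) / sigma_y
    q = (dx * dx - 2.0 * rho * dx * dy + dy * dy) / (1.0 - rho * rho)
    return math.exp(-0.5 * q)
```

With a positive ρ, a point displaced along the tilt direction keeps a much larger weight than a point displaced by the same distance across it, which is the property the paragraph above relies on.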

Moreover, Δt can be determined by measuring the time from making a request until delivery. Assuming movement at constant velocity, the terminal can predict that the center point will have moved to (x + VxΔt, y + VyΔt) and request a video stream encoded with a weight distribution that has large weights near that point.
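
The constant-velocity prediction can be sketched as below. The mapping `centers` and the nearest-center rule are hypothetical choices made for the example; the source leaves the prediction and selection method open.

```python
import math


def predict_center(x, y, vx, vy, dt):
    """Constant-velocity prediction of the attention-area center after dt."""
    return (x + vx * dt, y + vy * dt)


def select_stream_for_prediction(x, y, vx, vy, dt, centers):
    """Pick the stream whose weight center is nearest the predicted center.

    centers maps a stream id to the (x, y) center of its weight distribution.
    """
    px, py = predict_center(x, y, vx, vy, dt)
    return min(centers, key=lambda sid: math.hypot(centers[sid][0] - px,
                                                   centers[sid][1] - py))
```

In practice `dt` would come from timing earlier request and delivery round trips, as the paragraph above describes.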

In this way, a drop in image quality in the attention area can be suppressed even while the user is moving the attention area. The approach does not, however, depend on any particular weight distribution or prediction method.

<Third Embodiment>
Next, a video distribution system according to a third embodiment of the present invention will be described. This embodiment provides a means for identifying the stream to request from the user's attention area. In the first and second embodiments, the user-side terminal device 25 identified the stream to request and made the request to the distribution server 1. With the configuration of this embodiment, the computational load on the terminal device 25 can be reduced.

FIG. 11 is a block diagram showing the configuration of the video distribution system according to the third embodiment. The stream can be identified, for example, from the attention area information as input, as described for the video stream request unit 251 of the first embodiment, or from the movement speed and direction information as input, as described for the video stream request unit 251 of the second embodiment.

In FIG. 11 the video stream determination unit 4 is placed on the communication path, but it can also be incorporated into the same device as the video distribution unit.

<Fourth Embodiment>
Next, a video distribution system according to a fourth embodiment of the present invention will be described. In this embodiment, separately from the attention area of each individual user, the weights of areas that many users are likely to look at are increased in advance and combined with the weights that respond to each user's request. This increases the code amount allocated both to parts that are important to users in general (statistical weights) and to parts that are important to the individual user.

FIG. 12 is a block diagram showing the configuration of the local weight determination unit 33. In the local weight determination unit 33, the weight map synthesis unit 331 creates a new local weight distribution from the attention-area weight distribution and the statistical weight distribution. For example, a new local weight distribution can be obtained by multiplying the attention-area weight distribution by the statistical weight distribution, but the method is not limited to this.
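
The product-based synthesis can be sketched with plain nested lists standing in for weight maps. The function name and the final rescaling to a sum of 1 are assumptions added for the example; the source only mentions the product as one possible operation.

```python
def combine_weight_maps(attention, statistical):
    """Element-wise product of two equally sized 2-D weight maps,
    rescaled so that the combined weights sum to 1."""
    if len(attention) != len(statistical) or any(
            len(a) != len(s) for a, s in zip(attention, statistical)):
        raise ValueError("weight maps must have the same shape")
    product = [[a * s for a, s in zip(row_a, row_s)]
               for row_a, row_s in zip(attention, statistical)]
    total = sum(sum(row) for row in product)
    if total == 0:
        raise ValueError("combined weight map is all zero")
    return [[v / total for v in row] for row in product]
```

Rescaling is convenient if the combined weights are later turned into per-region shares of a fixed bit budget.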

In this way, the quality of the parts important to the individual user is raised, and even when distribution cannot keep up with a sudden viewpoint movement, the generally important parts are still reproduced with high fidelity, so the loss in user experience can be compensated for.

As a method of determining the statistical weights, estimation of easily noticed regions using a model of human vision called a saliency map, as in Reference 1, can be used. In addition to low-level features such as edges and contrast, high-level features such as human faces can also be used.
Reference 1: L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE TPAMI, 20(11):1254-1259, 1998.

Observation frequency information can also be used to determine the statistical weights. For example, this information is recorded as distribution logs on the distribution server, so those logs can be used. As yet another method, when it is known from experience where important subjects are located, such as near the stage at a concert venue, the weights there can be increased. The methods of determining the statistical weights are not limited to those described here; any other means of determining locations that many users are likely to look at may be applied.

<Fifth Embodiment>
Next, a video distribution system according to a fifth embodiment of the present invention will be described. In this embodiment, the final weight distribution is calculated by combining a weight distribution based on the geometric transformation applied when the video is displayed (geographic weights) with the attention-area weight distribution, and a plurality of video streams are encoded with code amounts allocated according to the result.

Whereas the attention-area weight distribution has multiple variations, the geographic weight distribution is a fixed distribution that does not depend on the individual attention area. The final weight distribution is obtained, for example, by multiplying it with each attention-area weight distribution. The means of obtaining the final weight distribution from the attention-area and geographic weight distributions is not limited to multiplication. In addition to these two, the statistical weights of the fourth embodiment can also be included in the combining operation that determines the weight distribution.

FIG. 13 is a block diagram showing the configuration of the local weight determination unit 33. The geographic weight distribution is determined, for example, as follows. When video captured in all directions (360 degrees) is displayed on the user-side terminal device according to the movement of the user's head or line of sight, the display can be realized by distributing the video as a texture for texture mapping and having the user-side terminal device 25 draw it with computer graphics according to the head movement.

Specifically, the omnidirectional video is recorded in equirectangular projection and, at display time, pasted and drawn using the texture mapping technique of computer graphics so that it is viewed from inside a sphere. In this case, a region near the poles of the equirectangular projection (around θ = 90 and −90, for example reference numeral 55 in FIG. 8) occupies a larger image area relative to its actual area on the sphere than a region near the standard parallel (around θ = 0, for example reference numeral 56 in FIG. 8). At the same resolution, the polar region is therefore displayed more finely.

To correct this unevenness in image quality, code amount is allocated with small geographic weights near the poles and large geographic weights near the standard parallel 56, giving a distribution like reference numeral 1001 in FIG. 13.
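
One weighting that has this shape is the cosine of the latitude, which is proportional to the sphere area covered by each pixel row of an equirectangular frame. The sketch below uses it as an assumption; the source only states that the weights are small near the poles and large near the standard parallel.

```python
import math


def geographic_row_weights(height):
    """Weight for each pixel row of an equirectangular frame.

    Row centers are mapped to latitudes from +90 deg (top) to -90 deg
    (bottom); the weight is cos(latitude).
    """
    weights = []
    for row in range(height):
        latitude = math.pi / 2.0 - math.pi * (row + 0.5) / height
        weights.append(math.cos(latitude))
    return weights
```

Every pixel in a row shares the same weight, so a full weight map is this column repeated across the frame width.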

The geographic weight distribution is not limited to the case of viewing a sphere from the inside; it can be determined from the angle between the line of sight and the surface normal. It is therefore effective not only for spheres but for any object shape, such as a combination of polygons, any number of objects, and any viewpoint position.
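
As a hedged illustration of an angle-based weight, the sketch below uses the absolute cosine of the angle between the viewing direction and the patch normal, so a patch seen head-on gets weight 1 and a patch seen edge-on gets weight 0. The choice of the cosine is an assumption; the source only says the weight is determined by that angle.

```python
import math


def view_angle_weight(view_dir, normal):
    """Weight a surface patch by |cos| of the angle between the viewing
    direction and the patch normal (both 3-D vectors, any length)."""
    dot = sum(v * n for v, n in zip(view_dir, normal))
    norm = (math.sqrt(sum(v * v for v in view_dir))
            * math.sqrt(sum(n * n for n in normal)))
    if norm == 0.0:
        raise ValueError("vectors must be non-zero")
    return abs(dot) / norm
```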

Furthermore, the weights need not be continuous values or multiple discrete values. By assigning binary (for example, 0 or 1) visibility information indicating whether a region is visible or not, it is also possible to allocate no code amount to regions that are not visible.
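
The effect of a binary visibility flag can be sketched as a mask applied before the code amount is shared out. Both helper functions, and the proportional split of a total bit budget, are hypothetical simplifications; a real encoder would apply the weights through its own rate control.

```python
def apply_visibility(weights, visible):
    """Zero the weight of every region whose visibility flag is 0.

    weights and visible are equally long flat lists, one entry per region.
    """
    if len(weights) != len(visible):
        raise ValueError("weights and visible must have the same length")
    return [w if v else 0.0 for w, v in zip(weights, visible)]


def allocate_bits(weights, total_bits):
    """Split total_bits across regions in proportion to their weights."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one region must have positive weight")
    return [total_bits * w / total_weight for w in weights]
```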

This prevents code amount from being allocated unnecessarily, and a large transmission bandwidth from being required, even when the video is reduced in size or shown at low resolution at display time.

As described above, weighted encoding reduces the code amount for locations the user is not paying attention to and saves transmission bandwidth. At the same time, weighting allocates code amount to the location being watched and improves its image quality, which improves the user's quality of experience. The video can also be transmitted as a single stream. In addition, because the streams are encoded in advance, the time required for encoding and the computational load when the number of requests increases can both be kept low, compared with assigning weights and encoding after a request arrives.

Furthermore, using movement information makes it possible to predict the attention area, so that the high-quality area does not lag behind changes in the attention area.

The user-side terminal device no longer needs to perform the computation that determines the required video stream, so the amount of computation on the terminal device can be reduced.

By performing the weighting operation, the code amount for locations that most users are unlikely to pay attention to is reduced, saving transmission bandwidth. At the same time, weighting allocates code amount to locations that are likely to attract attention and improves their image quality, which improves the user's quality of experience.

Also, even when the user's attention area moves quickly, the image quality of locations that many users tend to look at remains high, which prevents a drop in each user's quality of experience. In addition, a high-resolution display device is not required for display.

All or part of the video distribution system in the embodiments described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into a computer system and executed. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may also include media that hold a program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and media that hold a program for a fixed time, such as volatile memory inside a computer system acting as the server or client in that case. The program may realize only part of the functions described above, may realize them in combination with a program already recorded in the computer system, or may be realized using hardware such as a PLD (Programmable Logic Device) or FPGA (Field Programmable Gate Array).

Embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely examples of the present invention, and it is clear that the present invention is not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention is applicable to uses in which it is essential, for a single video stream over a limited transmission bandwidth, to raise the image quality of the area the viewer is paying attention to while keeping the increase in load on the distribution server small as the number of users grows.

Description of reference numerals: 1 ... distribution server, 11 ... video distribution unit, 14 ... bitstream storage unit, 21, 22, 23, 25 ... terminal device, 251 ... video stream request unit, 252 ... decoding unit, 253 ... display unit, 254 ... request method setting unit, 255 ... attention area detection unit, 256 ... attention area information determination unit, 257 ... per-tile weight information storage unit, 3 ... encoding device, 33 ... local weight determination unit, 331 ... weight map synthesis unit, 332 ... weight map synthesis unit, 34 ... encoding unit, 4 ... video stream identification unit

Claims

An encoding device comprising:
a video input unit that receives input of a video to be distributed;
a weighting unit that assigns a different weight distribution to each attention area of the video; and
an encoding unit that generates a plurality of bitstreams encoded so as to have a code amount corresponding to the weight distribution.

The encoding device according to claim 1, wherein the weighting unit assigns the weight distribution using at least one of:
a first weighting that weights each predetermined area of the video;
a second weighting that extracts an attention area from the video and weights it;
a third weighting that extracts an attention area by feeding back the judgments of other users and weights it;
a fourth weighting that extracts an attention area manually based on the content of the video and weights it; and
a fifth weighting that extracts an attention area according to the geographic position from which the video is viewed and weights it.

An encoding method performed by an encoding device having a video input unit that receives input of a video to be distributed, the method comprising:
a weighting step of assigning a different weight distribution to each attention area of the video; and
an encoding step of generating a plurality of bitstreams encoded so as to have a code amount corresponding to the weight distribution.

An encoding program for causing a computer to function as the encoding device according to claim 1 or 2.