WO2015053287A1 - Image decoding device, image encoding device, and encoded data conversion device - Google Patents
Image decoding device, image encoding device, and encoded data conversion device
- Publication number
- WO2015053287A1 (PCT/JP2014/076853)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- layer
- encoded data
- picture
- information
- unit
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/33—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/174—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/187—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/40—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Definitions
- The present invention relates to an image decoding apparatus that decodes hierarchically encoded data in which an image is hierarchically encoded, an image encoding apparatus that generates hierarchically encoded data by hierarchically encoding an image, and an encoded data conversion apparatus that converts such hierarchically encoded data.
- Images and moving images are among the kinds of information transmitted in communication systems or recorded in storage devices. Techniques for encoding images (hereinafter including moving images) in order to transmit and store them are conventionally known.
- As video encoding methods, AVC (H.264/MPEG-4 Advanced Video Coding) and its successor codec HEVC (High-Efficiency Video Coding) are known (Non-patent Document 1).
- In such encoding methods, a predicted image is usually generated based on a locally decoded image obtained by encoding/decoding the input image, and the prediction residual obtained by subtracting the predicted image from the input image (original image) (sometimes referred to as a "difference image" or "residual image") is encoded.
- Examples of methods for generating a predicted image include inter-picture prediction (inter prediction) and intra-picture prediction (intra prediction).
- In intra prediction, predicted images in a picture are sequentially generated based on locally decoded images in the same picture.
- In inter prediction, a predicted image is generated by motion compensation between pictures.
- A decoded picture used for predicted image generation in inter prediction is called a reference picture.
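- As a rough illustration of the prediction-plus-residual structure described above, the following Python sketch (illustrative only, not taken from this specification) forms a predicted block from already-decoded samples, encodes only the difference from the original block, and reconstructs the block by adding the residual back to the same prediction; the function names and the 4-sample "block" are hypothetical.

```python
# Minimal sketch of prediction + residual coding (illustrative only).
# A real encoder would transform, quantize, and entropy-code the residual.

def predict_dc(decoded_neighbors):
    """Toy intra-style prediction: predict every sample as the mean of
    already-decoded neighboring samples (a simplified DC prediction)."""
    mean = sum(decoded_neighbors) // len(decoded_neighbors)
    return [mean] * 4  # predicted 4-sample "block"

def encode_block(original, decoded_neighbors):
    pred = predict_dc(decoded_neighbors)
    residual = [o - p for o, p in zip(original, pred)]  # what actually gets coded
    return residual

def decode_block(residual, decoded_neighbors):
    pred = predict_dc(decoded_neighbors)
    return [p + r for p, r in zip(pred, residual)]      # reconstructed block

neighbors = [100, 102, 98, 101]   # previously decoded samples
original = [103, 105, 99, 100]    # block to be coded
residual = encode_block(original, neighbors)
assert decode_block(residual, neighbors) == original
print("residual:", residual)
```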
- A technique of generating encoded data from a plurality of mutually related moving images by encoding them in layers (hierarchies) is also known; it is called a hierarchical encoding technique.
- the encoded data generated by the hierarchical encoding technique is also referred to as hierarchical encoded data.
- SHVC Scalable HEVC
- As a representative hierarchical encoding technique, SHVC (Scalable HEVC) based on HEVC is known (Non-patent Document 2).
- SHVC supports spatial scalability, temporal scalability, and SNR scalability.
- In spatial scalability, hierarchically encoded data is generated by layering a plurality of moving images having different resolutions. For example, an image obtained by downsampling the original image to a desired resolution is encoded as the lower layer. The original image is then encoded as the upper layer after applying inter-layer prediction in order to remove redundancy between the layers.
- MV-HEVC Multi-View HEVC
- As another representative hierarchical coding technique, MV-HEVC (Multi-View HEVC) based on HEVC is known (Non-patent Document 3).
- MV-HEVC supports view scalability.
- In view scalability, moving images corresponding to a plurality of different viewpoints (views) are layered and encoded to generate hierarchically encoded data.
- a moving image corresponding to a basic viewpoint (base view) is encoded as a lower layer.
- a moving image corresponding to a different viewpoint is encoded as an upper layer after applying inter-layer prediction.
- Inter-layer prediction in SHVC and MV-HEVC includes inter-layer image prediction and inter-layer motion prediction.
- In inter-layer image prediction, a predicted image is generated using a decoded image of a lower layer.
- In inter-layer motion prediction, motion information prediction values are derived using motion information of a lower layer.
- a picture used for prediction in inter-layer prediction is called an inter-layer reference picture.
- a layer including an inter-layer reference picture is called a reference layer.
- reference pictures used for inter prediction and reference pictures used for inter-layer prediction are generically referred to simply as reference pictures.
- any of inter prediction, intra prediction, and inter-layer image prediction can be used to generate a predicted image.
- One of the applications that use SHVC and MV-HEVC is a video application that takes into account the attention area.
- a video playback terminal normally plays back video in the entire area with a relatively low resolution.
- When the viewer of the video playback terminal designates a part of the displayed video as the attention area, the attention area is displayed on the playback terminal at high resolution.
- The video application considering the attention area as described above can be realized using hierarchically encoded data in which a relatively low resolution video of the entire area is encoded as lower layer encoded data and a high resolution video of the attention area is encoded as upper layer encoded data.
- That is, when reproducing the entire region, only the encoded data of the lower layer is decoded and reproduced, and when reproducing the high-resolution video of the region of interest, the encoded data of the upper layer is decoded in addition to the encoded data of the lower layer.
- The application can thus be realized with a smaller transmission band than when both encoded data for the low-resolution video and encoded data for the high-resolution video are sent.
- The present invention has been made in view of the above-described problems, and an object thereof is to realize an image encoding device and an image decoding device capable of encoding/decoding, in a hierarchical coding scheme, either upper layer encoded data corresponding to the entire region or upper layer encoded data corresponding to the region of interest.
- Another object of the present invention is to realize an image encoding device and an image decoding device capable of encoding/decoding upper layer encoded data that includes only the region of interest and in which the positional relationship between the pixels of the upper layer and the pixels of the lower layer is correctly associated.
- A further object of the present invention is to provide a data structure of encoded data from which upper layer encoded data corresponding to the region of interest can be generated from upper layer encoded data corresponding to the entire region without generating a decoded image, and an encoded data conversion apparatus that generates the upper layer encoded data corresponding to the region of interest from the upper layer encoded data corresponding to the entire region.
- An image decoding apparatus according to the present invention is an image decoding apparatus that decodes upper layer encoded data included in hierarchically encoded data and restores a decoded picture of an upper layer that is a target layer, comprising: a parameter set decoding unit that decodes a parameter set; and a predicted image generation unit that generates a predicted image by inter-layer prediction with reference to decoded pixels of a reference layer picture, wherein the parameter set decoding unit decodes inter-layer phase correspondence information, which is information relating a target layer pixel to the position on the reference layer picture corresponding to that target layer pixel.
- An image encoding device according to the present invention is an image encoding device that generates upper layer encoded data from an input image, comprising: a parameter set decoding unit that decodes a parameter set; and a predicted image encoding unit that generates a predicted image by inter-layer prediction with reference to decoded pixels of a reference layer picture, wherein the parameter set decoding unit encodes inter-layer phase correspondence information, which is information relating a target layer pixel to the position on the reference layer picture corresponding to that target layer pixel, and the predicted image encoding unit executes a corresponding reference position derivation process of deriving the reference layer position corresponding to a prediction target pixel based on the inter-layer phase correspondence information when performing inter-layer prediction.
- A hierarchically encoded data conversion apparatus according to the present invention converts input hierarchically encoded data based on input attention area information and outputs the converted hierarchically encoded data, comprising: a parameter set decoding unit that decodes an uncorrected parameter set from the input hierarchically encoded data; a parameter set correction unit that generates a corrected parameter set by correcting the uncorrected parameter set based on the input attention area information; and a NAL selection unit that selects video coding layer NAL units to be included in the output hierarchically encoded data based on tile information and the attention area information, wherein the NAL selection unit sets tiles overlapping at least a part of the region indicated by the attention area information as extraction target tiles and selects the video coding layer NAL units corresponding to the slices included in the extraction target tiles as the video coding layer NAL units to be included in the converted hierarchically encoded data, and the parameter set correction unit corrects the picture size and the tile information.
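- The conversion described above can be pictured with the following hedged Python sketch: tiles, VCL NAL units, and the parameter set are modeled as plain dictionaries (the real PPS tile syntax and slice segment headers are far more involved), and the function convert_roi and its field names are assumptions made for illustration.

```python
# Illustrative sketch of region-of-interest extraction from tiled encoded data.
# Tiles, NAL units, and the parameter set are modeled as plain dictionaries;
# the real syntax (PPS tile syntax, slice segment headers) is far more involved.

def rects_overlap(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def convert_roi(vcl_nals, param_set, tiles, roi):
    """Keep only VCL NAL units whose tile overlaps the ROI and shrink the
    parameter set to the bounding box of the extracted tiles."""
    extracted = [t for t in tiles if rects_overlap(t["rect"], roi)]
    keep_ids = {t["id"] for t in extracted}
    out_nals = [n for n in vcl_nals if n["tile_id"] in keep_ids]

    x0 = min(t["rect"][0] for t in extracted)
    y0 = min(t["rect"][1] for t in extracted)
    x1 = max(t["rect"][2] for t in extracted)
    y1 = max(t["rect"][3] for t in extracted)
    new_ps = dict(param_set, pic_width=x1 - x0, pic_height=y1 - y0,
                  tiles=[t["id"] for t in extracted])
    return new_ps, out_nals

tiles = [{"id": i + 3 * 0, "rect": (c * 640, r * 540, (c + 1) * 640, (r + 1) * 540)}
         for i, (r, c) in enumerate((r, c) for r in range(2) for c in range(3))]
vcl_nals = [{"tile_id": t["id"], "payload": b""} for t in tiles]   # 3x2 grid, 1920x1080
ps, nals = convert_roi(vcl_nals, {"pic_width": 1920, "pic_height": 1080}, tiles,
                       roi=(600, 400, 1000, 800))
print(ps["pic_width"], ps["pic_height"], [n["tile_id"] for n in nals])
```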
- An image decoding apparatus according to the present invention is an image decoding apparatus that decodes upper layer encoded data included in hierarchically encoded data and restores a decoded picture of an upper layer that is a target layer, comprising: a parameter set decoding unit that decodes a parameter set; and a predicted image generation unit that generates a predicted image by inter-layer prediction with reference to decoded pixels of a reference layer picture, wherein the parameter set decoding unit decodes inter-layer phase correspondence information, which is information relating to the position on the reference layer picture corresponding to a target layer pixel.
- According to the above configuration, the image decoding apparatus can derive an accurate position on the reference layer picture corresponding to a prediction target pixel using the inter-layer phase correspondence information, so that the accuracy of inter-layer prediction improves. Therefore, it is possible to decode encoded data having a smaller code amount than before and to output the decoded picture of the upper layer.
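- A corresponding reference position derivation of the kind described above might look like the following Python sketch; the fixed-point scale factors and phase offsets are modeled loosely after scalable-extension resampling, and the constants, 1/16-pel output units, and function name are illustrative assumptions rather than the actual SHVC derivation.

```python
# Hedged sketch of a "corresponding reference position" derivation.
# Fixed-point scale factors and phase offsets are illustrative only.

SHIFT = 16  # fixed-point precision of the scale factor

def derive_reference_position(x_el, y_el, el_size, bl_size, phase_x=0, phase_y=0):
    """Map an enhancement-layer pixel (x_el, y_el) to a position on the
    reference-layer picture, in 1/16-pel units, including a phase offset."""
    scale_x = ((bl_size[0] << SHIFT) + el_size[0] // 2) // el_size[0]
    scale_y = ((bl_size[1] << SHIFT) + el_size[1] // 2) // el_size[1]
    # 16 sub-pel positions per integer sample; phase_x/phase_y shift the grid
    ref_x16 = (x_el * scale_x * 16 >> SHIFT) - phase_x
    ref_y16 = (y_el * scale_y * 16 >> SHIFT) - phase_y
    return ref_x16, ref_y16

# 2x spatial scalability: 1920x1080 enhancement layer over a 960x540 base layer.
print(derive_reference_position(100, 60, (1920, 1080), (960, 540)))
# With a non-zero phase, the sub-pel position used for interpolation shifts.
print(derive_reference_position(100, 60, (1920, 1080), (960, 540), phase_x=8))
```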
- (c) shows the slice layer that defines the slice S, (d) shows the CTU layer that defines the coding tree unit CTU, and (e) shows the CU layer that defines the coding unit (Coding Unit; CU) included in the coding tree unit CTU.
- CU Coding Unit
- The hierarchical moving picture decoding apparatus 1, the hierarchical moving picture encoding apparatus 2, and the encoded data conversion apparatus 3 according to an embodiment of the present invention are described below with reference to the figures.
- a hierarchical video decoding device (image decoding device) 1 decodes encoded data that has been hierarchically encoded by a hierarchical video encoding device (image encoding device) 2.
- Hierarchical coding is a coding scheme that hierarchically encodes moving images from low quality to high quality.
- Hierarchical coding is standardized in SVC and SHVC, for example.
- The quality of a moving image here broadly means elements that affect the subjective and objective appearance of the moving image.
- the quality of the moving image includes, for example, “resolution”, “frame rate”, “image quality”, and “pixel representation accuracy”.
- When moving images are said to differ in quality, it typically means, for example, that their "resolution" differs, but it is not limited thereto; moving images encoded with different settings can also differ in quality from each other.
- Hierarchical coding technology is classified into (1) spatial scalability, (2) temporal scalability, (3) SNR (Signal to Noise Ratio) scalability, and (4) view scalability from the viewpoint of the type of information layered.
- Spatial scalability is a technique for hierarchizing resolution and image size.
- Temporal scalability is a technique for layering by frame rate (the number of frames per unit time).
- SNR scalability is a technique for layering by coding noise.
- view scalability is a technique for hierarchizing at the viewpoint position associated with each image.
- The encoded data conversion device 3 converts encoded data that has been hierarchically encoded by the hierarchical moving image encoding device 2 to generate encoded data related to a predetermined attention region (attention region encoded data).
- the attention area encoded data can be decoded by the hierarchical moving picture decoding apparatus 1 according to the present embodiment.
- Prior to a detailed description of the hierarchical video encoding device 2, the hierarchical video decoding device 1, and the hierarchical encoded data conversion device 3 according to the present embodiment, (1) the layer structure of hierarchically encoded data generated by the hierarchical video encoding device 2 or the encoded data conversion device 3 and decoded by the hierarchical video decoding device 1 is described first, and then (2) a specific example of the data structure that can be adopted in each layer is described.
- FIG. 2 is a diagram schematically illustrating a case where a moving image is hierarchically encoded / decoded by three layers of a lower layer L3, a middle layer L2, and an upper layer L1. That is, in the example shown in FIGS. 2A and 2B, of the three layers, the upper layer L1 is the highest layer and the lower layer L3 is the lowest layer.
- In the following, a decoded image of a specific quality that can be decoded from hierarchically encoded data is referred to as a decoded image of a specific hierarchy (or a decoded image corresponding to a specific hierarchy) (for example, the decoded image POUT#A of the upper hierarchy L1).
- FIG. 2A shows hierarchical moving image encoding apparatuses 2#A to 2#C that generate encoded data DATA#A to DATA#C by hierarchically encoding input images PIN#A to PIN#C, respectively.
- FIG. 2B shows hierarchical moving picture decoding apparatuses 1#A to 1#C that generate decoded images POUT#A to POUT#C by decoding the hierarchically encoded data DATA#A to DATA#C, respectively.
- the input images PIN # A, PIN # B, and PIN # C that are input on the encoding device side have the same original image but different image quality (resolution, frame rate, image quality, and the like).
- the image quality decreases in the order of the input images PIN # A, PIN # B, and PIN # C.
- The hierarchical video encoding device 2#C of the lower hierarchy L3 encodes the input image PIN#C of the lower hierarchy L3 to generate encoded data DATA#C of the lower hierarchy L3.
- The encoded data DATA#C includes basic information necessary for decoding the decoded image POUT#C of the lower layer L3 (indicated by "C" in FIG. 2). Since the lower layer L3 is the lowest layer, the encoded data DATA#C of the lower layer L3 is also referred to as basic encoded data.
- The hierarchical video encoding apparatus 2#B of the middle hierarchy L2 encodes the input image PIN#B of the middle hierarchy L2 with reference to the encoded data DATA#C of the lower hierarchy to generate encoded data DATA#B of the middle hierarchy L2.
- In addition to the basic information "C", the encoded data DATA#B of the middle hierarchy L2 includes additional information (indicated by "B" in FIG. 2) necessary for decoding the decoded image POUT#B of the middle hierarchy.
- The hierarchical video encoding apparatus 2#A of the upper hierarchy L1 encodes the input image PIN#A of the upper hierarchy L1 with reference to the encoded data DATA#B of the middle hierarchy L2 to generate encoded data DATA#A of the upper hierarchy L1.
- In addition to the basic information "C" necessary for decoding the decoded image POUT#C of the lower layer L3 and the additional information "B" necessary for decoding the decoded image POUT#B of the middle layer L2, the encoded data DATA#A of the upper layer L1 includes additional information (indicated by "A" in FIG. 2) necessary for decoding the decoded image POUT#A of the upper layer.
- the encoded data DATA # A of the upper layer L1 includes information related to decoded images of different qualities.
- Next, the decoding device side will be described with reference to FIG. 2B.
- On the decoding device side, the decoding devices 1#A, 1#B, and 1#C corresponding to the upper layer L1, the middle layer L2, and the lower layer L3 decode the encoded data DATA#A, DATA#B, and DATA#C, respectively, and output the decoded images POUT#A, POUT#B, and POUT#C.
- For example, the hierarchy decoding apparatus 1#B of the middle hierarchy L2 may extract, from the hierarchically encoded data DATA#A of the upper hierarchy L1, the information necessary for decoding the decoded image POUT#B (that is, "B" and "C" included in the hierarchically encoded data DATA#A) and decode the decoded image POUT#B.
- the decoded images POUT # A, POUT # B, and POUT # C can be decoded based on information included in the hierarchically encoded data DATA # A of the upper hierarchy L1.
- Note that the hierarchically encoded data is not limited to the above three-layer example; hierarchically encoded data may be encoded with two layers or with more than three layers.
- Hierarchically encoded data may also be configured in other ways. For example, in the example described above with reference to FIGS. 2A and 2B, it was explained that "C" and "B" are referred to for decoding the decoded image POUT#B, but the configuration is not limited thereto. It is also possible to configure the hierarchically encoded data so that the decoded image POUT#B can be decoded using only "B". For example, it is possible to configure a hierarchical video decoding apparatus that receives hierarchically encoded data composed only of "B", together with the decoded image POUT#C, for decoding the decoded image POUT#B.
- When SNR scalability is realized, hierarchically encoded data can also be generated so that the decoded images POUT#A, POUT#B, and POUT#C have different image qualities. In that case, the hierarchical video encoding device of a lower layer generates hierarchically encoded data by quantizing the prediction residual using a larger quantization width than the hierarchical video encoding device of an upper layer.
- Upper layer A layer located above a certain layer is referred to as an upper layer.
- the upper layers of the lower layer L3 are the middle layer L2 and the upper layer L1.
- the decoded image of the upper layer means a decoded image with higher quality (for example, high resolution, high frame rate, high image quality, etc.).
- Lower layer A layer located below a certain layer is referred to as a lower layer.
- the lower layers of the upper layer L1 are the middle layer L2 and the lower layer L3.
- the decoded image of the lower layer refers to a decoded image with lower quality.
- Target layer A layer that is the target of decoding or encoding.
- a decoded image corresponding to the target layer is referred to as a target layer picture.
- pixels constituting the target layer picture are referred to as target layer pixels.
- Reference layer A specific lower layer referred to for decoding a decoded image corresponding to the target layer is referred to as a reference layer.
- a decoded image corresponding to the reference layer is referred to as a reference layer picture.
- pixels constituting the reference layer are referred to as reference layer pixels.
- the reference layers of the upper hierarchy L1 are the middle hierarchy L2 and the lower hierarchy L3.
- the hierarchically encoded data can be configured so that it is not necessary to refer to all of the lower layers in decoding of the specific layer.
- the hierarchical encoded data can be configured such that the reference layer of the upper hierarchy L1 is either the middle hierarchy L2 or the lower hierarchy L3.
- Base layer A layer located at the lowest layer is called a base layer.
- the decoded image of the base layer is the lowest quality decoded image that can be decoded from the encoded data, and is referred to as a basic decoded image.
- the basic decoded image is a decoded image corresponding to the lowest layer.
- the partially encoded data of the hierarchically encoded data necessary for decoding the basic decoded image is referred to as basic encoded data.
- the basic information “C” included in the hierarchically encoded data DATA # A of the upper hierarchy L1 is the basic encoded data.
- Extension layer The upper layer of the base layer is called the extension layer.
- the layer identifier is for identifying the hierarchy, and corresponds to the hierarchy one-to-one.
- the hierarchically encoded data includes a hierarchical identifier used for selecting partial encoded data necessary for decoding a decoded image of a specific hierarchy.
- a subset of hierarchically encoded data associated with a layer identifier corresponding to a specific layer is also referred to as a layer representation.
- In decoding a decoded image of a specific layer, the layer representation of that layer and/or the layer representations of layers lower than that layer are used. That is, in decoding the decoded image of the target layer, the layer representation of the target layer and/or the layer representations of one or more layers lower than the target layer are used.
- Inter-layer prediction is prediction of syntax element values of the target layer, coding parameters used for decoding the target layer, and the like, based on syntax element values included in the layer representation of a layer (reference layer) different from the layer representation of the target layer, values derived from those syntax element values, and decoded images of the reference layer. Inter-layer prediction that predicts information related to motion prediction from reference layer information is sometimes referred to as motion information prediction. Inter-layer prediction from a lower layer decoded image is sometimes referred to as inter-layer image prediction (or inter-layer texture prediction). Note that the layer used for inter-layer prediction is, for example, a lower layer of the target layer. Performing prediction within the target layer without using a reference layer is sometimes referred to as intra-layer prediction.
- the lower layer and the upper layer may be encoded by different encoding methods.
- the encoded data of each layer may be supplied to the hierarchical video decoding device 1 via different transmission paths, or may be supplied to the hierarchical video decoding device 1 via the same transmission path. .
- For example, when transmitting ultra-high-definition video (moving image, 4K video data) using scalable coding with a base layer and one enhancement layer, the base layer may encode interlaced video data obtained by downscaling the 4K video data with MPEG-2 or H.264/AVC and transmit it over a television broadcast network, while the enhancement layer may encode the 4K video (progressive) with HEVC and transmit it over the Internet.
- FIG. 3 is a diagram illustrating a data structure of encoded data (hierarchically encoded data DATA # C in the example of FIG. 2) that can be employed in the base layer.
- Hierarchically encoded data DATA # C illustratively includes a sequence and a plurality of pictures constituting the sequence.
- FIG. 3 shows a hierarchical structure of data in the hierarchical encoded data DATA # C.
- FIGS. 3A to 3E respectively show a sequence layer that defines a sequence SEQ, a picture layer that defines a picture PICT, a slice layer that defines a slice S, a CTU layer that defines a coding tree unit CTU, and a CU layer that defines a coding unit (CU) included in the coding tree unit CTU.
- CTU coding tree unit
- In the sequence layer, a set of data referred to by the hierarchical video decoding device 1 for decoding the sequence SEQ to be processed (hereinafter also referred to as a target sequence) is defined.
- As shown in FIG. 3A, the sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), pictures PICT1 to PICTNP (NP is the total number of pictures included in the sequence SEQ), and supplemental enhancement information (SEI).
- In the video parameter set VPS, the number of layers included in the encoded data and the dependency relationships between the layers are defined.
- the sequence parameter set SPS defines a set of encoding parameters that the hierarchical video decoding device 1 refers to in order to decode the target sequence.
- a plurality of SPSs may exist in the encoded data.
- an SPS used for decoding is selected from a plurality of candidates for each target sequence.
- An SPS used for decoding a specific sequence is also called an active SPS. In the following, unless otherwise specified, it means an active SPS for the target sequence.
- In the picture parameter set PPS, a set of encoding parameters referred to by the hierarchical video decoding device 1 for decoding each picture in the target sequence is defined.
- A plurality of PPSs may exist in the encoded data. In that case, one of the plurality of PPSs is selected for each picture in the target sequence.
- a PPS used for decoding a specific picture is also called an active PPS. In the following, unless otherwise specified, PPS means active PPS for the current picture.
- The active SPS and the active PPS may be set to a different SPS or PPS for each layer.
- Picture layer: In the picture layer, a set of data referred to by the hierarchical video decoding device 1 in order to decode a picture PICT to be processed (hereinafter also referred to as a target picture) is defined. As shown in FIG. 3B, the picture PICT includes slice headers SH1 to SHNS and slices S1 to SNS (NS is the total number of slices included in the picture PICT).
- The slice header SHk includes a coding parameter group referred to by the hierarchical video decoding device 1 in order to determine the decoding method for the corresponding slice Sk.
- an SPS identifier (seq_parameter_set_id) that specifies SPS and a PPS identifier (pic_parameter_set_id) that specifies PPS are included.
- the slice type designation information (slice_type) for designating the slice type is an example of an encoding parameter included in the slice header SH.
- Examples of slice types that can be designated by the slice type designation information include (1) an I slice that uses only intra prediction at the time of encoding, (2) a P slice that uses unidirectional prediction or intra prediction at the time of encoding, and (3) a B slice that uses unidirectional prediction, bidirectional prediction, or intra prediction at the time of encoding.
- Slice layer: In the slice layer, a set of data referred to by the hierarchical video decoding device 1 in order to decode a slice S to be processed (also referred to as a target slice) is defined. As shown in FIG. 3C, the slice S includes coding tree units CTU1 to CTUNC (NC is the total number of CTUs included in the slice S).
- CTU layer: In the CTU layer, a set of data referred to by the hierarchical video decoding device 1 for decoding a coding tree unit CTU to be processed (hereinafter also referred to as a target CTU) is defined.
- the coding tree unit may be referred to as a coding tree block (CTB) or a maximum coding unit (LCU).
- CTB coding tree block
- LCU maximum coding unit
- The coding tree unit CTU includes a CTU header CTUH and coding unit information CU1 to CUNL (NL is the total number of pieces of coding unit information included in the CTU).
- the coding tree unit CTU is divided into units for specifying a block size for each process of intra prediction or inter prediction and conversion.
- The coding tree unit CTU is divided into the above units by recursive quadtree division.
- the tree structure obtained by this recursive quadtree partitioning is hereinafter referred to as a coding tree.
- a unit corresponding to a leaf that is a node at the end of a coding tree is referred to as a coding node.
- the encoding node is a basic unit of the encoding process, hereinafter, the encoding node is also referred to as an encoding unit (CU).
- CU encoding unit
- coding unit information (hereinafter referred to as CU information)
- The coding unit information CU1 to CUNL is information corresponding to each coding node (coding unit) obtained by recursively quadtree-partitioning the coding tree unit CTU.
- the root of the coding tree is associated with the coding tree unit CTU.
- the coding tree unit CTU is associated with the highest node of the tree structure of the quadtree partition that recursively includes a plurality of coding nodes.
- The size of each coding node is half the size, both horizontally and vertically, of the coding node that is its parent node (that is, the node one level above the coding node).
- The size of the coding tree unit CTU and the sizes that each coding unit can take depend on the size designation information of the minimum coding node and on the difference in hierarchical depth between the maximum coding node and the minimum coding node, which are included in the sequence parameter set SPS.
- For example, when the size of the minimum coding node is 8×8 pixels and the difference in hierarchical depth between the maximum coding node and the minimum coding node is 3, the size of the coding tree unit CTU is 64×64 pixels, and the size of a coding node can take any of four sizes, that is, 64×64 pixels, 32×32 pixels, 16×16 pixels, and 8×8 pixels.
- The CTU header CTUH includes encoding parameters referred to by the hierarchical video decoding device 1 in order to determine the decoding method for the target CTU. Specifically, as shown in FIG. 3(d), it includes CTU division information SP_CTU designating the division pattern of the target CTU into CUs, and a quantization parameter difference Δqp (qp_delta) designating the quantization step size.
- The CTU division information SP_CTU is information representing a coding tree for dividing the CTU; specifically, it is information specifying the shape and size of each CU included in the target CTU and its position within the target CTU.
- the CTU partition information SP_CTU does not need to explicitly include the shape or size of the CU.
- the CTU division information SP_CTU may be a set of flags indicating whether or not the entire target CTU or a partial region of the CTU is to be divided into four. In that case, the shape and size of each CU can be specified by using the shape and size of the CTU together.
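- The recursive quadtree division carried by the CTU division information can be visualized with the Python sketch below; the flat list of split flags, the z-scan traversal, and the default 64×64/8×8 sizes are simplifying assumptions, not the actual CTU syntax.

```python
# Illustrative sketch: derive CU positions/sizes from per-node split flags.
# Flags are consumed in depth-first (z-scan) order; a real decoder reads them
# from the bitstream and also enforces the minimum CU size from the SPS.

def parse_ctu(split_flags, ctu_size=64, min_cu=8):
    flags = iter(split_flags)
    cus = []

    def parse_node(x, y, size):
        split = size > min_cu and next(flags) == 1
        if not split:
            cus.append((x, y, size))          # this node is a coding unit
            return
        half = size // 2
        for dy in (0, half):                  # four children in z-scan order
            for dx in (0, half):
                parse_node(x + dx, y + dy, half)

    parse_node(0, 0, ctu_size)
    return cus

# Split the 64x64 CTU once, then split only its top-left 32x32 child again.
print(parse_ctu([1, 1, 0, 0, 0, 0, 0, 0, 0]))
```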
- The quantization parameter difference Δqp is the difference qp - qp' between the quantization parameter qp in the target CTU and the quantization parameter qp' in the CTU encoded immediately before the target CTU.
- CU layer: In the CU layer, a set of data referred to by the hierarchical video decoding device 1 for decoding a CU to be processed (hereinafter also referred to as a target CU) is defined.
- the encoding node is a node at the root of a prediction tree (PT) and a transformation tree (TT).
- PT prediction tree
- TT transformation tree
- the encoding node is divided into one or a plurality of prediction blocks, and the position and size of each prediction block are defined.
- the prediction block is one or a plurality of non-overlapping areas constituting the encoding node.
- the prediction tree includes one or a plurality of prediction blocks obtained by the above division.
- Prediction processing is performed for each prediction block.
- a prediction block that is a unit of prediction is also referred to as a prediction unit (PU).
- There are roughly two types of partitioning in the prediction tree (hereinafter abbreviated as PU partitioning): the case of intra prediction and the case of inter prediction.
- In the case of inter prediction, 2N×2N (the same size as the coding node), 2N×N, 2N×nU, 2N×nD, N×2N, nL×2N, nR×2N, and the like are used as partitioning methods.
- the encoding node is divided into one or a plurality of transform blocks, and the position and size of each transform block are defined.
- the transform block is one or a plurality of non-overlapping areas constituting the encoding node.
- the conversion tree includes one or a plurality of conversion blocks obtained by the above division.
- The division in the transform tree includes the case where an area having the same size as the coding node is assigned as a transform block, and the case of recursive quadtree division, as in the division of the tree block described above.
- transform processing is performed for each conversion block.
- the transform block which is a unit of transform is also referred to as a transform unit (TU).
- Specifically, the CU information CU includes a skip flag SKIP, prediction tree information (hereinafter abbreviated as PT information) PTI, and transform tree information (hereinafter abbreviated as TT information) TTI.
- PT information prediction tree information
- TT information conversion tree information
- the skip flag SKIP is a flag indicating whether or not the skip mode is applied to the target PU.
- When the value of the skip flag SKIP is 1, that is, when the skip mode is applied to the target CU, part of the PT information PTI and the TT information TTI in the CU information CU is omitted. Note that the skip flag SKIP is omitted for I slices.
- the PT information PTI is information related to a prediction tree (hereinafter abbreviated as PT) included in the CU.
- PT prediction tree
- the PT information PTI is a set of information related to each of one or a plurality of PUs included in the PT, and is referred to when a predicted image is generated by the hierarchical video decoding device 1.
- the PT information PTI includes prediction type information PType and prediction information PInfo.
- Prediction type information PType is information that specifies a predicted image generation method for the target PU. In the base layer, it is information that specifies whether intra prediction or inter prediction is used.
- the prediction information PInfo is prediction information used in the prediction method specified by the prediction type information PType.
- intra prediction information PP_Intra is included in the case of intra prediction.
- inter prediction information PP_Inter is included in the case of inter prediction.
- Inter prediction information PP_Inter includes prediction information that is referred to when the hierarchical video decoding device 1 generates an inter predicted image by inter prediction. More specifically, the inter prediction information PP_Inter includes inter PU division information that specifies the division pattern of the target CU into inter PUs, and inter prediction parameters (motion compensation parameters) for each inter PU. Examples of the inter prediction parameters include a merge flag (merge_flag), a merge index (merge_idx), an estimated motion vector index (mvp_idx), a reference picture index (ref_idx), an inter prediction flag (inter_pred_flag), and a motion vector residual (mvd).
- merge_flag merge flag
- merge_idx merge index
- mvp_idx estimated motion vector index
- ref_idx reference picture index
- inter_pred_flag inter prediction flag
- mvd motion vector residual
- the intra prediction information PP_Intra includes an encoding parameter that is referred to when the hierarchical video decoding device 1 generates an intra predicted image by intra prediction. More specifically, the intra prediction information PP_Intra includes intra PU division information that specifies a division pattern of the target CU into each intra PU, and intra prediction parameters for each intra PU.
- the intra prediction parameter is a parameter for designating an intra prediction method (prediction mode) for each intra PU.
- the intra prediction parameter is a parameter for restoring intra prediction (prediction mode) for each intra PU.
- Parameters for restoring the prediction mode include mpm_flag, which is a flag related to the MPM (Most Probable Mode; the same applies below), mpm_idx, which is an index for selecting an MPM, and rem_idx, which is an index for designating a prediction mode other than the MPM.
- MPM is an estimated prediction mode that is highly likely to be selected in the target partition.
- the MPM may include an estimated prediction mode estimated based on prediction modes assigned to partitions around the target partition, and a DC mode or Planar mode that generally has a high probability of occurrence.
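- The following Python sketch illustrates how a prediction mode could be restored from mpm_flag, mpm_idx, and rem_idx; the MPM candidate construction here is deliberately simplified (HEVC's actual derivation handles equal or angular neighbor modes specially), so treat it as a conceptual example only.

```python
# Hedged sketch of restoring an intra prediction mode from mpm_flag /
# mpm_idx / rem_idx.  The MPM candidate construction here is simplified.

PLANAR, DC = 0, 1

def build_mpm_list(left_mode, above_mode):
    """Toy most-probable-mode list from neighboring PU modes."""
    cands = []
    for m in (left_mode, above_mode, PLANAR, DC, 26):   # 26: vertical, as filler
        if m not in cands:
            cands.append(m)
    return cands[:3]

def restore_mode(mpm_flag, mpm_idx, rem_idx, left_mode, above_mode):
    mpm = build_mpm_list(left_mode, above_mode)
    if mpm_flag:
        return mpm[mpm_idx]
    # rem_idx indexes the modes that are NOT in the MPM list, in ascending order
    non_mpm = [m for m in range(35) if m not in mpm]
    return non_mpm[rem_idx]

print(restore_mode(1, 0, 0, left_mode=10, above_mode=26))   # -> 10 (from MPM list)
print(restore_mode(0, 0, 5, left_mode=10, above_mode=26))   # -> 6th non-MPM mode
```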
- When simply described as "prediction mode", it means the luminance prediction mode unless otherwise specified.
- the color difference prediction mode is described as “color difference prediction mode” and is distinguished from the luminance prediction mode.
- the parameter for restoring the prediction mode includes chroma_mode that is a parameter for designating the color difference prediction mode.
- the TT information TTI is information regarding a conversion tree (hereinafter abbreviated as TT) included in the CU.
- TT conversion tree
- the TT information TTI is a set of information regarding each of one or a plurality of transform blocks included in the TT, and is referred to when the hierarchical video decoding device 1 decodes residual data.
- The TT information TTI includes TT division information SP_TT that designates the division pattern of the target CU into transform blocks, and quantized prediction residuals QD1 to QDNT (NT is the total number of transform blocks included in the target CU).
- TT division information SP_TT is information for determining the shape of each transformation block included in the target CU and the position in the target CU.
- the TT division information SP_TT can be realized from information (split_transform_unit_flag) indicating whether or not the target node is divided and information (trafoDepth) indicating the division depth.
- Each transform block obtained by the division can take a size from 32×32 pixels to 4×4 pixels.
- Each quantization prediction residual QD is encoded data generated by the hierarchical video encoding device 2 performing the following processes 1 to 3 on a target block that is a conversion block to be processed.
- Process 1: The prediction residual obtained by subtracting the predicted image from the encoding target image is subjected to a frequency transform (for example, a DCT (Discrete Cosine Transform) or a DST (Discrete Sine Transform));
- Process 2: The transform coefficients obtained in Process 1 are quantized;
- Process 3: The transform coefficients quantized in Process 2 are variable-length encoded.
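- Conceptually, Processes 1 to 3 can be sketched as follows in Python for a tiny 1-D residual; real HEVC uses integer 2-D transforms, rate-distortion-tuned quantization, and CABAC entropy coding, so every function below is a simplified stand-in.

```python
# Illustrative sketch of Processes 1-3 on a 1-D residual (conceptual pipeline only).
import math

def dct(block):                       # Process 1: frequency transform (DCT-II)
    n = len(block)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(block))
        out.append(s * (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)))
    return out

def quantize(coeffs, qstep):          # Process 2: quantization
    return [round(c / qstep) for c in coeffs]

def entropy_code(levels):             # Process 3: variable-length coding (stub)
    return "".join(format(abs(v), "b") + ("1" if v < 0 else "0") for v in levels)

residual = [4, 3, -2, -1]             # prediction residual of a tiny block
levels = quantize(dct(residual), qstep=2.0)
print(levels, entropy_code(levels))
```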
- The PU partition types specified by the PU partition information include the following eight patterns in total, assuming that the size of the target CU is 2N×2N pixels: four symmetric splittings of 2N×2N pixels, 2N×N pixels, N×2N pixels, and N×N pixels, and four asymmetric splittings of 2N×nU pixels, 2N×nD pixels, nL×2N pixels, and nR×2N pixels.
- Note that N = 2^m (m is an arbitrary integer of 1 or more).
- a prediction unit obtained by dividing the target CU is referred to as a prediction block or a partition.
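- The eight PU partition patterns listed above can be written out explicitly; the following Python sketch returns each pattern as (x, y, width, height) rectangles inside a 2N×2N CU, with the quarter-size dimension q = N/2 used for the asymmetric patterns (illustrative helper, not part of the specification).

```python
# Illustrative sketch: the eight PU partition patterns of a 2N x 2N CU,
# expressed as lists of (x, y, width, height) rectangles.

def pu_partitions(two_n):
    n, q = two_n // 2, two_n // 4
    return {
        "2Nx2N": [(0, 0, two_n, two_n)],
        "2NxN":  [(0, 0, two_n, n), (0, n, two_n, n)],
        "Nx2N":  [(0, 0, n, two_n), (n, 0, n, two_n)],
        "NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        "2NxnU": [(0, 0, two_n, q), (0, q, two_n, two_n - q)],
        "2NxnD": [(0, 0, two_n, two_n - q), (0, two_n - q, two_n, q)],
        "nLx2N": [(0, 0, q, two_n), (q, 0, two_n - q, two_n)],
        "nRx2N": [(0, 0, two_n - q, two_n), (two_n - q, 0, q, two_n)],
    }

for name, parts in pu_partitions(32).items():   # a 32x32 CU (N = 16)
    print(name, parts)
```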
- Enhancement layer encoded data: For encoded data included in the layer representation of the enhancement layer (hereinafter, enhancement layer encoded data), for example, a data structure substantially similar to the data structure shown in FIG. 3 can be adopted. However, in the enhancement layer encoded data, additional information may be added or parameters may be omitted as follows.
- Hierarchy identification information for spatial scalability, temporal scalability, SNR scalability, and view scalability may be encoded.
- the prediction type information PType included in the CU information CU is information that specifies whether the prediction image generation method for the target CU is intra prediction, inter prediction, or inter-layer image prediction.
- the prediction type information PType includes a flag (inter-layer image prediction flag) that specifies whether or not to apply the inter-layer image prediction mode.
- the inter-layer image prediction flag may be referred to as texture_rl_flag, inter_layer_pred_flag, or base_mode_flag.
- the CU type of the target CU is an intra CU, an inter-layer CU, an inter CU, or a skip CU.
- the intra CU can be defined in the same manner as the intra CU in the base layer.
- In the intra CU, the inter-layer image prediction flag is set to "0" and the prediction mode flag is set to "0".
- An inter-layer CU can be defined as a CU that uses a decoded image of a picture in a reference layer for generating a predicted image.
- In the inter-layer CU, the inter-layer image prediction flag is set to "1" and the prediction mode flag is set to "0".
- the skip CU can be defined in the same manner as in the HEVC method described above. For example, in the skip CU, “1” is set in the skip flag.
- the inter CU may be defined as a CU that applies non-skip and motion compensation (MC).
- MC non-skip and motion compensation
- the encoded data of the enhancement layer may be generated by an encoding method different from the encoding method of the lower layer. That is, the encoding / decoding process of the enhancement layer does not depend on the type of the lower layer codec.
- the lower layer may be encoded by, for example, MPEG-2 or H.264 / AVC format.
- the VPS may be extended to include a parameter representing a reference structure between layers.
- In addition, the SPS, the PPS, and the slice header may be extended to include information related to the decoded image of a reference layer used for inter-layer image prediction (for example, syntax for directly or indirectly deriving an inter-layer reference picture set, an inter-layer reference picture list, or base control information, described later).
- the parameters described above may be encoded independently, or a plurality of parameters may be encoded in combination.
- an index is assigned to the combination of parameter values, and the assigned index is encoded.
- the encoding of the parameter can be omitted.
- FIG. 4 is a diagram for explaining the relationship between pictures and tile slices in hierarchically encoded data.
- a tile is associated with a rectangular partial area in a picture and encoded data relating to the partial area.
- a slice is associated with a partial area in a picture and encoded data related to the partial area, that is, a slice header and slice data related to the partial area.
- FIG. 4A illustrates a divided area when a picture is divided by tile slices.
- the picture is divided into six rectangular tiles (T00, T01, T02, T10, T11, T12).
- Each of the tile T00, the tile T02, the tile T10, and the tile T12 includes one slice (in order, a slice S00, a slice S02, a slice S10, and a slice S12).
- the tile T01 includes two slices (slice S01a and slice S01b)
- the tile T11 includes two slices (slice S11a and slice S11b).
- FIG. 4B illustrates the relationship between tiles and slices in the configuration of encoded data.
- encoded data includes a plurality of VCL (Video Coding Layer) NAL units and non-VCL (non-VCL) NAL units.
- the encoded data of the video encoding layer corresponding to one picture is composed of a plurality of VCL NALs.
- the encoded data corresponding to the picture includes encoded data corresponding to the tiles in the tile raster order. That is, as shown in FIG. 4A, when a picture is divided into tiles, encoded data corresponding to tiles is included in the order of tiles T00, T01, T02, T10, T11, and T12.
- When a tile includes a plurality of slices, the encoded data corresponding to the slices is included in the encoded data corresponding to the tile in order of the slices, starting from the slice whose head CTU comes first in the CTU raster scan order within the tile.
- For example, the encoded data corresponding to the slices S01a and S01b is included in the encoded data corresponding to the tile T01 in that order.
- encoded data corresponding to a specific tile in a picture is associated with encoded data corresponding to one or more slices. Therefore, if a decoded image of a slice associated with a tile can be generated, a decoded image of a partial region in a picture corresponding to the tile can be generated.
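- The tile raster order of FIG. 4 can be illustrated with the Python sketch below, which derives the tile rectangles from assumed column widths and row heights in CTU units (the 12×6-CTU picture and 3×2 tile grid are made-up numbers chosen to mirror tiles T00 to T12).

```python
# Illustrative sketch of tile raster order: tile rectangles are derived from
# column widths and row heights (in CTUs), and the encoded data of a picture
# contains the tiles' VCL NAL units in this order.

def tile_grid(col_widths, row_heights):
    """Return (tile_name, rect) pairs in tile raster order; rect is
    (x, y, width, height) in CTU units."""
    tiles = []
    y = 0
    for r, h in enumerate(row_heights):
        x = 0
        for c, w in enumerate(col_widths):
            tiles.append((f"T{r}{c}", (x, y, w, h)))
            x += w
        y += h
    return tiles

# A picture of 12x6 CTUs split into 3 tile columns and 2 tile rows,
# mirroring the six tiles T00..T12 of FIG. 4.
for name, rect in tile_grid([4, 4, 4], [3, 3]):
    print(name, rect)
```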
- FIG. 5 exemplifies a system SYS_ROI1 that performs transmission and reproduction of a hierarchical video that can be realized by combining the hierarchical video decoding device 1, the hierarchical video encoding device 2, and the encoded data conversion device 3.
- The system SYS_ROI1 hierarchically encodes the input low-quality input image PIN#L and high-quality input image PIN#H with the hierarchical video encoding devices 2#L and 2#H to generate hierarchically encoded data BSALL.
- Hierarchically encoded data BSALL includes encoded data corresponding to the entire high-quality input image PIN # H as hierarchically encoded data of an upper layer (enhancement layer).
- the hierarchically encoded data BSALL includes encoded data corresponding to the entire low-quality input image PIN # L as hierarchically encoded data of the lower layer (base layer).
- The encoded data conversion device 3 converts the hierarchically encoded data BSALL based on the input region of interest ROI to generate hierarchically encoded data BSROI.
- the hierarchically encoded data BSROI includes encoded data of a portion corresponding to the attention area ROI of the high-quality input image PIN # H as hierarchically encoded data of the upper layer (enhancement layer).
- the hierarchically encoded data BSROI includes encoded data corresponding to the entire low-quality input image PIN # L as hierarchically encoded data of the lower layer (base layer).
- A decoded image DROI#H, which corresponds to the region of interest ROI of the high-quality input image PIN#H, is output.
- In addition, a decoded image DOUT#L corresponding to the low-quality input image PIN#L is output.
- In the following, the description may assume that the system SYS_ROI1 is used.
- the usage of the apparatus is not limited to the system SYS_ROI1.
- FIG. 6 is a functional block diagram showing a schematic configuration of the hierarchical video decoding device 1.
- The hierarchical video decoding device 1 decodes hierarchically encoded data DATA (hierarchically encoded data DATAF provided from the hierarchical video encoding device 2 or hierarchically encoded data DATAAR provided from the encoded data conversion device 3) to generate a decoded image POUT#T of the target layer.
- the target layer is an extension layer having the base layer as a reference layer. Therefore, the target layer is also an upper layer with respect to the reference layer. Conversely, the reference layer is also a lower layer with respect to the target layer.
- the hierarchical video decoding device 1 includes a NAL demultiplexing unit 11, a parameter set decoding unit 12, a tile setting unit 13, a slice decoding unit 14, a base decoding unit 15, and a decoded picture management unit 16.
- the NAL demultiplexing unit 11 demultiplexes the hierarchical encoded data DATA transmitted in units of NAL units in NAL (Network Abstraction Layer).
- NAL is a layer provided to abstract communication between VCL (Video Coding Layer) and lower systems that transmit and store encoded data.
- VCL Video Coding Layer
- VCL is a layer that performs video encoding processing, and encoding is performed in VCL.
- the lower system here corresponds to the H.264 / AVC and HEVC file formats and the MPEG-2 system.
- In the NAL, the bit stream generated by the VCL is divided into units called NAL units and transmitted to the destination lower system.
- the NAL unit includes encoded data encoded by the VCL and a header for appropriately delivering the encoded data to a destination lower system.
- the encoded data in each layer is stored in NAL units, is NAL multiplexed, and is transmitted to the hierarchical video decoding device 1.
- Hierarchically encoded data DATA includes, in addition to the NAL units generated by the VCL, NAL units containing parameter sets (VPS, SPS, PPS), SEI, and the like. These NAL units are called non-VCL NAL units, as opposed to VCL NAL units.
- the NAL demultiplexing unit 11 demultiplexes the hierarchical encoded data DATA, and extracts the target layer encoded data DATA # T and the reference layer encoded data DATA # R. Further, the NAL demultiplexing unit 11 supplies non-VCL NAL to the parameter set decoding unit 12 and VCL NAL to the slice decoding unit 14 among NALs included in the target layer encoded data DATA # T.
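- A minimal sketch of this demultiplexing step is shown below, assuming simplified NAL-unit records; the routing by nuh_layer_id and by VCL versus non-VCL type follows the HEVC numbering (VCL types 0-31, VPS/SPS/PPS = 32/33/34), but real parsing, layer-id handling of shared parameter sets, and SEI are omitted.

```python
# Hedged sketch of the NAL demultiplexing step: units are routed by layer id
# and by whether their type is VCL or non-VCL.  The NAL-unit records are
# simplified stand-ins for real bitstream parsing.

def demultiplex(nal_units, target_layer_id):
    target_vcl, target_non_vcl, reference_layer = [], [], []
    for nal in nal_units:
        if nal["nuh_layer_id"] != target_layer_id:
            reference_layer.append(nal)    # handed to the base decoding unit
        elif nal["nal_unit_type"] < 32:
            target_vcl.append(nal)         # slice data -> slice decoding unit
        else:
            target_non_vcl.append(nal)     # VPS/SPS/PPS/SEI -> parameter set decoding unit
    return target_vcl, target_non_vcl, reference_layer

stream = [
    {"nal_unit_type": 32, "nuh_layer_id": 0},  # VPS
    {"nal_unit_type": 33, "nuh_layer_id": 1},  # SPS of the enhancement layer
    {"nal_unit_type": 1,  "nuh_layer_id": 0},  # base-layer slice
    {"nal_unit_type": 1,  "nuh_layer_id": 1},  # enhancement-layer slice
]
vcl, non_vcl, ref = demultiplex(stream, target_layer_id=1)
print(len(vcl), len(non_vcl), len(ref))
```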
- the parameter set decoding unit 12 decodes the parameter set, that is, VPS, SPS, and PPS, from the input non-VCL NAL and supplies them to the tile setting unit 13 and the slice decoding unit 14. Details of processing highly relevant to the present invention in the parameter set decoding unit 12 will be described later.
- the tile setting unit 13 derives the tile information of the picture based on the input parameter set and supplies it to the slice decoding unit 14.
- the tile information includes at least tile division information of a picture. A detailed description of the tile setting unit 13 will be given later.
- the slice decoding unit 14 generates a decoded picture, or a partial area of a decoded picture, based on the input VCL NAL, parameter set, tile information, and reference pictures, and records it in a buffer in the decoded picture management unit 16.
- a detailed description of the slice decoding unit 14 will be given later.
- the decoded picture management unit 16 records the input decoded pictures and base decoded pictures in an internal decoded picture buffer (DPB: Decoded Picture Buffer), and performs reference picture list generation and output picture determination. The decoded picture management unit 16 also outputs the decoded pictures recorded in the DPB to the outside as output pictures POUT#T at predetermined timings.
- the base decoding unit 15 decodes the base decoded picture from the reference layer encoded data DATA # R.
- the base decoded picture is a decoded picture of the reference layer used when decoding the decoded picture of the target layer.
- the base decoding unit 15 records the decoded base decoded picture in the DPB in the decoded picture management unit 16.
- FIG. 7 is a functional block diagram illustrating the configuration of the base decoding unit 15.
- the base decoding unit 15 includes a base NAL demultiplexing unit 151, a base parameter set decoding unit 152, a base tile setting unit 153, a base slice decoding unit 154, and a base decoded picture management unit 156.
- the base NAL demultiplexing unit 151 demultiplexes the reference layer encoded data DATA # R to extract VCL NAL and non-VCL NAL, non-VCL NAL to the base parameter set decoding unit 152, and VCL NAL to base slice Each is supplied to the decryption unit 154.
- the base parameter set decoding unit 152 decodes the parameter set, that is, VPS, SPS, and PPS, from the input non-VCL NAL and supplies them to the base tile setting unit 153 and the base slice decoding unit 154.
- the base tile setting unit 153 derives picture tile information based on the input parameter set and supplies it to the base slice decoding unit 154.
- the base slice decoding unit 154 generates a decoded picture or a partial area of the decoded picture based on the input VCL NAL, parameter set, tile information, and reference picture, and stores the decoded picture in the buffer in the base decoded picture management unit 156. Record.
- the base decoded picture management unit 156 records the input decoded picture in the internal DPB, and performs reference picture list generation and output picture determination. Further, the base decoded picture management unit 156 outputs the decoded picture recorded in the DPB as a base decoded picture at a predetermined timing.
- the parameter set decoding unit 12 decodes and outputs a parameter set (VPS, SPS, PPS) used for decoding the target layer from the input encoded data of the target layer.
- the decoding of the parameter set is performed based on a predetermined syntax table. That is, a bit string is read from the encoded data according to the procedure defined by the syntax table, and the syntax value of the syntax included in the syntax table is decoded. Further, if necessary, a variable derived based on the decoded syntax value may be derived and included in the output parameter set.
- the parameter set output from the parameter set decoding unit 12 is a syntax value of syntax related to the parameter set (VPS, SPS, PPS) included in the encoded data, and a variable derived from the syntax value. It can also be expressed as a set of
- the parameter set decoding unit 12 decodes picture information from input target layer encoded data.
- the picture information is information that determines the size of the decoded picture of the target layer.
- the picture information includes information indicating the width and height of the decoded picture of the target layer.
- the picture information is included in the SPS, for example, and is decoded according to the syntax table shown in FIG. 8. FIG. 8 is a part of the syntax table that the parameter set decoding unit 12 refers to when decoding the SPS, and is the part related to the picture information.
- the picture information decoded from the SPS includes the width of the decoded picture (pic_width_in_luma_samples) and the height of the decoded picture (pic_height_in_luma_samples).
- the value of the syntax pic_width_in_luma_samples corresponds to the width of the decoded picture in luminance pixel units.
- the value of the syntax pic_height_in_luma_samples corresponds to the height of the decoded picture in luminance pixel units.
- the parameter set decoding unit 12 decodes the display area information from the input target layer encoded data.
- the display area information is included in the SPS, for example, and is decoded according to the syntax table shown in FIG. FIG. 9 is a part of a syntax table that the parameter set decoding unit 12 refers to when performing SPS decoding, and is a part related to display area information.
- the display area information decoded from the SPS includes a display area flag (conformance_flag).
- the display area flag indicates whether information indicating the position of the display area (display area position information) is additionally included in the SPS. That is, when the display area flag is 1, it indicates that the display area position information is additionally included, and when the display area flag is 0, it indicates that the display area position information is not additionally included.
- when the display area flag is 1, the display area information decoded from the SPS further includes the display area left offset (conf_win_left_offset), the display area right offset (conf_win_right_offset), the display area top offset (conf_win_top_offset), and the display area bottom offset (conf_win_bottom_offset).
- when the display area flag is 0, the entire picture is set as the display area.
- when the display area flag is 1, the partial area in the picture indicated by the display area position information is set as the display area.
- the display area is also referred to as a conformance window.
- FIG. 10 is a diagram illustrating a relationship between a display area which is a partial area in a picture and display area position information.
- the display area is included in the picture.
- the display area top offset represents the distance between the top edge of the picture and the top edge of the display area, the display area left offset represents the distance between the left edge of the picture and the left edge of the display area, the display area right offset represents the distance between the right edge of the picture and the right edge of the display area, and the display area bottom offset represents the distance between the bottom edge of the picture and the bottom edge of the display area.
- therefore, the position and size of the display area in the picture can be uniquely specified by the display area position information.
- the display area information may be other information that can uniquely identify the position and size of the display area in the picture.
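- As an illustrative sketch (not part of the original description, and assuming all offsets are given in luma samples), the position and size of the display area can be recovered from the conformance window offsets as follows:

```python
def display_area(pic_width_in_luma_samples, pic_height_in_luma_samples,
                 conf_win_left_offset, conf_win_right_offset,
                 conf_win_top_offset, conf_win_bottom_offset,
                 conformance_flag=1):
    """Derive the display area rectangle (left, top, width, height) in luma samples.

    Sketch only: when the display area flag is 0 the whole picture is the display area.
    """
    if conformance_flag == 0:
        return 0, 0, pic_width_in_luma_samples, pic_height_in_luma_samples
    left = conf_win_left_offset
    top = conf_win_top_offset
    width = pic_width_in_luma_samples - conf_win_left_offset - conf_win_right_offset
    height = pic_height_in_luma_samples - conf_win_top_offset - conf_win_bottom_offset
    return left, top, width, height

# example: a 1920x1080 picture whose display area is the central 1280x720 region
print(display_area(1920, 1080, 320, 320, 180, 180))   # (320, 180, 1280, 720)
```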
- the parameter set decoding unit 12 decodes the inter-layer position correspondence information from the input target layer encoded data.
- the inter-layer position correspondence information schematically indicates the positional relationship between corresponding areas of the target layer and the reference layer. For example, when a certain object (object A) is included in both a picture of the target layer and a picture of the reference layer, the area corresponding to object A on the target layer picture and the area corresponding to object A on the reference layer picture are an example of corresponding areas of the target layer and the reference layer.
- the inter-layer position correspondence information does not necessarily indicate the exact positional relationship between the corresponding areas of the target layer and the reference layer; however, in order to improve the accuracy of inter-layer prediction, it generally indicates the correct positional relationship between the corresponding areas of the target layer and the reference layer.
- the inter-layer position correspondence information includes inter-layer pixel correspondence information and inter-layer phase correspondence information.
- the inter-layer pixel correspondence information is information indicating a positional relationship between a pixel on the reference layer picture and a pixel on the corresponding target layer picture.
- the inter-layer phase correspondence information is information representing the phase difference of the pixels whose correspondence is indicated by the inter-layer pixel correspondence information.
- the inter-layer pixel correspondence information is included in, for example, the SPS extension (sps_extension), which is a part of the SPS of the upper layer, and is decoded according to the syntax table shown in FIG. 11.
- FIG. 11 is a part of a syntax table that the parameter set decoding unit 12 refers to when performing SPS decoding, and is a part related to inter-layer pixel correspondence information.
- the inter-layer pixel correspondence information decoded from the SPS includes the number of pieces of inter-layer pixel correspondence information included in the SPS extension (num_scaled_ref_layer_offsets), and includes that number of inter-layer pixel correspondence offsets.
- each inter-layer pixel correspondence offset consists of the scaled reference layer left offset (scaled_ref_layer_left_offset[i]), the scaled reference layer top offset (scaled_ref_layer_top_offset[i]), the scaled reference layer right offset (scaled_ref_layer_right_offset[i]), and the scaled reference layer bottom offset (scaled_ref_layer_bottom_offset[i]).
- FIG. 12 is a diagram illustrating a relationship among a picture of the target layer, a picture of the reference layer, and an inter-layer pixel corresponding offset.
- FIG. 12A shows an example in which the entire picture of the reference layer corresponds to a part of the picture of the target layer.
- in FIG. 12(a), the area on the target layer corresponding to the entire reference layer picture (the reference layer corresponding region) is included in the target layer picture.
- FIG. 12(b) illustrates an example in which a part of the reference layer picture corresponds to the entire picture of the target layer; in this case, the target layer picture is included inside the reference layer corresponding region.
- the scaled reference layer left offset (SRL left offset in the figure) represents the offset of the left side of the reference layer corresponding region with respect to the left side of the target layer picture. When the SRL left offset is larger than 0, the left side of the reference layer corresponding region is located to the right of the left side of the target layer picture.
- the scaled reference layer top offset (SRL top offset in the figure) represents the offset of the top side of the reference layer corresponding region with respect to the top side of the target layer picture. When the SRL top offset is larger than 0, the top side of the reference layer corresponding region is located below the top side of the target layer picture.
- the scaled reference layer right offset (SRL right offset in the figure) represents the offset of the right side of the reference layer corresponding region with respect to the right side of the target layer picture. When the SRL right offset is larger than 0, the right side of the reference layer corresponding region is located to the left of the right side of the target layer picture.
- the scaled reference layer bottom offset (SRL bottom offset in the figure) represents the offset of the bottom side of the reference layer corresponding region with respect to the bottom side of the target layer picture. When the SRL bottom offset is larger than 0, the bottom side of the reference layer corresponding region is located above the bottom side of the target layer picture.
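- The following sketch (illustrative only; the function and variable names are assumptions) shows how the reference layer corresponding region on the target layer picture could be obtained from the four scaled reference layer offsets; positive offsets place the region inside the picture as in FIG. 12(a), and negative offsets extend it outside the picture as in FIG. 12(b):

```python
def ref_layer_corresponding_region(target_pic_w, target_pic_h,
                                   srl_left, srl_top, srl_right, srl_bottom):
    """Return (left, top, width, height) of the reference layer corresponding
    region, measured in the coordinate system of the target layer picture."""
    left = srl_left                                  # left side relative to the picture's left side
    top = srl_top                                    # top side relative to the picture's top side
    width = target_pic_w - srl_left - srl_right      # picture width minus left and right offsets
    height = target_pic_h - srl_top - srl_bottom     # picture height minus top and bottom offsets
    return left, top, width, height

# FIG. 12(a)-like case: all offsets positive, region lies inside the picture
print(ref_layer_corresponding_region(1920, 1080, 320, 180, 320, 180))   # (320, 180, 1280, 720)
```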
- the inter-layer phase correspondence information is included in, for example, the SPS extension that is a part of the SPS of the higher layer, and is decoded according to the syntax table shown in FIG. FIG. 13 is a part of a syntax table that the parameter set decoding unit 12 refers to when performing SPS decoding, and is a part related to the phase correspondence information between layers.
- the inter-layer phase correspondence information decoded from the SPS includes the number of reference layer phase offsets (num_ref_layer_phase_offsets), and includes that number of reference layer phase offsets.
- the reference layer phase offset is expressed by a combination of a left phase offset (ref_layer_left_phase_offset) and an upper phase offset (ref_layer_top_phase_offset).
- the left phase offset represents a horizontal phase offset between the upper left pixel of the reference layer corresponding area and the upper left pixel of the reference layer picture.
- the upper phase offset represents a vertical phase offset between the upper left pixel of the reference layer corresponding region and the upper left pixel of the reference layer picture.
- the upper left pixel of the reference layer corresponding area is a pixel in the target layer picture.
- the phase offset between a pixel in the target layer picture (target layer pixel) and a pixel in the reference layer picture (reference layer pixel) is a quantity representing the displacement, of less than one pixel unit, between the point on the reference layer corresponding to the target layer pixel and the reference layer pixel corresponding to that target layer pixel.
- FIG. 14 is a diagram illustrating the relationship between the correspondence between the target layer pixel and the reference layer pixel and the phase difference.
- FIG. 14 shows, in one dimension (a dimension corresponding to either the horizontal or the vertical direction), a part of a reference layer picture and the corresponding reference layer corresponding region on the target layer when spatial scalability with an enlargement ratio of 1.5 times is used.
- the 6 pixels on the target layer are PEL1, PEL2, PEL3, PEL4, PEL5, and PEL6 in order from the left.
- the 4 pixels on the reference layer are PRL1, PRL2, PRL3, and PRL4 in order from the left.
- pixel PEL1 and pixel PRL1, and pixel PEL6 and pixel PRL4, are at corresponding positions.
- the phase offset of the pixel PEL2 is the positional shift, on the reference layer, between the point PEL2' corresponding to the pixel PEL2 and the reference layer pixel (pixel PRL1) corresponding to the pixel PEL2.
- in the example of the figure, the phase offset of the pixel PEL2 is 3/5 in reference layer pixel units.
- that is, for a target layer pixel PEL, the value obtained by adding the phase offset to the position of the reference layer pixel corresponding to PEL matches the position of the point PEL' on the reference layer corresponding to PEL.
- the exact value of the reference layer phase offset does not necessarily need to be included in the parameter set; an approximate value may be included.
- likewise, the unit of the reference layer phase offset does not necessarily need to be a pixel unit of the reference layer; for example, a value obtained by approximating, with integer precision, a value expressed in 1/16-pixel units of the reference layer may be used as the reference layer phase offset.
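- As an informal check of the FIG. 14 example (a sketch assuming only the pixel counts given above), the phase offset of each target layer pixel can be computed from the ratio of the pixel spacings; for PEL2 it evaluates to 3/5 of a reference layer pixel:

```python
from fractions import Fraction
from math import floor

num_target_pixels = 6   # PEL1 .. PEL6
num_ref_pixels = 4      # PRL1 .. PRL4

# ratio of reference-layer spacing to target-layer spacing (PEL1<->PRL1, PEL6<->PRL4)
scale = Fraction(num_ref_pixels - 1, num_target_pixels - 1)   # 3/5

for i in range(num_target_pixels):
    ref_pos = i * scale                 # point on the reference layer for pixel PEL(i+1)
    ref_pixel = floor(ref_pos)          # reference layer pixel at or left of that point
    phase = ref_pos - ref_pixel         # phase offset in reference layer pixel units
    print(f"PEL{i+1}: reference position {ref_pos}, pixel PRL{ref_pixel+1}, phase offset {phase}")
# PEL2 prints a phase offset of 3/5, matching the example in the text.
```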
- (Inter-layer phase correspondence information 2) In the example described with reference to FIG. 13 above, the reference layer phase offset is directly included in the SPS, but this is not a limitation; for example, another parameter from which the reference layer phase offset can be derived may be included instead. Such an example will be described with reference to the syntax table shown in FIG. 15. FIG. 15 is another example of the part, related to the inter-layer phase correspondence information, of the syntax table that the parameter set decoding unit 12 refers to when decoding the SPS.
- the inter-layer phase correspondence information decoded from the SPS includes the reference layer crop offset number (num_cropped_ref_layer_offsets).
- the inter-layer phase correspondence information includes that number of reference layer crop offsets.
- the reference layer phase offset is expressed by a combination of a left crop offset (cropped_ref_layer_left_offset) and a top crop offset (cropped_ref_layer_top_offset).
- the left crop offset represents a shift in the horizontal position of the upper left pixel of the reference layer corresponding area with respect to the reference target layer pixel.
- the reference target layer pixel is a pixel that is located at the upper left of the upper left pixel of the reference layer corresponding region (its horizontal position is the same or to the left, and its vertical position is the same or above), and whose corresponding position on the reference layer is an integer position in reference layer pixel units.
- the pixel PEL1 can be used as the reference target layer pixel of the pixel PEL2.
- the upper crop offset represents a shift in the vertical position of the upper left pixel of the reference layer corresponding region with respect to the reference target layer pixel.
- let Int(PELTL) be the position of the reference layer pixel corresponding to the upper left pixel PELTL of the target layer, PhaseELTL be the phase offset of the target layer pixel PELTL, PELBASE be the reference target layer pixel, and Int(PELBASE) be the position of the pixel on the reference layer corresponding to the reference target layer pixel PELBASE. Then the following relationship holds:
- scale * (PELTL - PELBASE) = PhaseELTL + Int(PELTL) - Int(PELBASE)
- here, scale is the scaling factor of the spatial scalability, and the region obtained by enlarging the reference layer picture by the scaling factor is the reference layer corresponding region.
- that is, the value obtained by multiplying the distance on the target layer between the pixel PELTL and the pixel PELBASE by the spatial scalability scaling factor is equal to the value obtained by adding the phase offset to the distance, on the reference layer, between the pixel corresponding to the pixel PELTL and the pixel corresponding to the pixel PELBASE. From this relationship, the phase offset PhaseELTL can be derived based on the position of the pixel PELBASE. The above relationship holds because the point PELBASE' on the reference layer corresponding to the pixel PELBASE on the target layer coincides with the pixel Int(PELBASE) on the reference layer corresponding to the pixel PELBASE.
- the tile setting unit 13 derives and outputs tile information of a picture based on the input parameter set.
- the tile information generated by the tile setting unit 13 roughly includes tile structure information and tile dependency information.
- the tile structure information is information indicating the number of tiles in the picture and the size of each tile.
- the number of tiles in the picture is equal to the product of the number of tiles included in the horizontal direction and the number of tiles included in the vertical direction.
- Tile dependency information is information indicating dependency upon decoding of tiles in a picture.
- the dependency at the time of decoding the tile indicates the degree to which the tile depends on the decoded pixel and the syntax value related to the area outside the tile.
- the area outside the tile includes an area outside the tile on the target picture, an area outside the tile on the reference picture, and an area outside the tile on the base decoded picture.
- the tile information generated by the tile setting unit 13 will now be described, including the process of deriving it from the input parameter set.
- Tile information is derived based on syntax values related to tile information included in SPS and PPS included in the parameter set. The syntax related to tile information will be described with reference to FIG.
- FIG. 16 shows a part of the syntax table that the parameter set decoding unit 12 refers to when decoding the PPS included in the parameter set, and is the part related to the tile information.
- the syntax related to tile information included in the PPS (PPS tile information) includes a multiple tiles enabled flag (tiles_enabled_flag).
- when the value of the multiple tiles enabled flag is 1, the picture is composed of two or more tiles; when the value of the flag is 0, the picture is composed of one tile, that is, the picture and the tile coincide.
- when the value of tiles_enabled_flag is 1, the PPS tile information additionally includes information indicating the number of tile columns (num_tile_columns_minus1), information indicating the number of tile rows (num_tile_rows_minus1), and a flag indicating the uniformity of the tile sizes (uniform_spacing_flag).
- num_tile_columns_minus1 is a syntax element corresponding to the value obtained by subtracting 1 from the number of tile columns included in the horizontal direction of the picture, and num_tile_rows_minus1 is a syntax element corresponding to the value obtained by subtracting 1 from the number of tile rows included in the vertical direction of the picture.
- therefore, the number of tiles NumTilesInPic included in the picture is calculated by the following equation:
- NumTilesInPic = (num_tile_columns_minus1 + 1) * (num_tile_rows_minus1 + 1)
- a uniform_spacing_flag value of 1 indicates that the tile size included in the picture is uniform, that is, the width and height of each tile are equal.
- a uniform_spacing_flag value of 0 indicates that the tile sizes included in the picture are uneven, that is, the width and height of the tiles included in the picture do not necessarily match.
- the PPS tile information additionally includes information indicating the tile width (column_width_minus1[i]) for each tile column included in the picture and information indicating the tile height (row_height_minus1[i]) for each tile row included in the picture.
- the PPS tile information additionally includes a flag (loop_filter_across_tiles_enabled_flag) indicating whether or not to apply a loop filter that crosses tile boundaries.
- FIG. 17 is a diagram illustrating tile rows and tile columns when a picture is divided into tiles.
- in FIG. 17, the picture is divided into four tile columns and three tile rows, and includes a total of 12 tiles.
- tile column 0 (TileCol0) includes tiles T00, T10, and T20.
- tile row 0 (TileRow0) includes tiles T00, T01, T02, and T03.
- the width of tile column i is expressed as ColWidth[i] in CTU units.
- the height of tile row j is expressed as RowHeight[j] in CTU units. Therefore, the tile belonging to tile column i and tile row j has width ColWidth[i] and height RowHeight[j].
- the tile setting unit 13 derives the following tile structure information:
- an array for deriving a tile scan CTB address from a raster scan CTB address (CtbAddrRsToTs[ctbAddrRs]),
- an array for deriving a raster scan CTB address from a tile scan CTB address (CtbAddrTsToRs[ctbAddrTs]),
- the tile identifier for each tile scan CTB address (TileId[ctbAddrTs]),
- the width of each tile column (ColumnWidthInLumaSamples[i]), and
- the height of each tile row (RowHeightInLumaSamples[j]).
- the width of each tile column is calculated based on the picture size and the number of tiles in the picture. For example, the width of the i-th tile column in CTU units (ColWidth[i]) is calculated by the following equation, where PicWidthInCtbsY represents the number of CTUs included in the horizontal direction of the picture:
- ColWidth[i] = ((i + 1) * PicWidthInCtbsY) / (num_tile_columns_minus1 + 1) - (i * PicWidthInCtbsY) / (num_tile_columns_minus1 + 1)
- that is, ColWidth[i], the width of the i-th tile column in CTU units, is calculated as the difference between the (i + 1)-th and i-th boundary positions obtained by dividing the picture width equally by the number of tile columns (the divisions above are integer divisions).
- the value of ColumnWidthInLumaSamples[i] is set to ColWidth[i] multiplied by the width of the CTU in pixel units.
- the height RowHeight[j] of each tile row in CTU units is calculated in the same manner as the tile column width, using PicHeightInCtbsY (the number of CTUs included in the vertical direction of the picture) instead of PicWidthInCtbsY, and num_tile_rows_minus1 and row_height_minus1[j] instead of num_tile_columns_minus1 and column_width_minus1[i].
- RowHeightInLumaSamples[j] is set to RowHeight[j] multiplied by the height of the CTU in pixel units.
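- The following sketch (illustrative only) implements the above width and height derivation for the case where uniform_spacing_flag is 1; in the non-uniform case, the widths and heights would instead be taken from column_width_minus1[i] and row_height_minus1[i] in the PPS tile information:

```python
def uniform_tile_sizes(pic_width_in_ctbs_y, pic_height_in_ctbs_y,
                       num_tile_columns_minus1, num_tile_rows_minus1):
    """Derive tile column widths and tile row heights in CTU units when the
    tile sizes are uniform (sketch of the derivation described above)."""
    n_cols = num_tile_columns_minus1 + 1
    n_rows = num_tile_rows_minus1 + 1
    # width of tile column i = difference of the (i+1)-th and i-th boundary
    # positions obtained by dividing the picture width equally by the column count
    col_width = [((i + 1) * pic_width_in_ctbs_y) // n_cols
                 - (i * pic_width_in_ctbs_y) // n_cols for i in range(n_cols)]
    row_height = [((j + 1) * pic_height_in_ctbs_y) // n_rows
                  - (j * pic_height_in_ctbs_y) // n_rows for j in range(n_rows)]
    return col_width, row_height

# a 20x12-CTU picture split into 4 tile columns and 3 tile rows, as in FIG. 17
col_width, row_height = uniform_tile_sizes(20, 12, 3, 2)
print(col_width, row_height)   # [5, 5, 5, 5] [4, 4, 4]
```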
- colBd[i], which indicates the boundary position of the i-th tile column, and rowBd[j], which indicates the boundary position of the j-th tile row, are calculated by the following equations, where the values of colBd[0] and rowBd[0] are 0: colBd[i + 1] = colBd[i] + ColWidth[i] and rowBd[j + 1] = rowBd[j] + RowHeight[j].
- the tile scan CTU address associated with the CTU identified by the raster scan CTU address (ctbAddrRs) included in the picture is derived by the following procedure.
- first, the position (tbX, tbY) of the CTU in the picture is obtained by tbX = ctbAddrRs % PicWidthInCtbsY and tbY = ctbAddrRs / PicWidthInCtbsY, and the tile column tileX and the tile row tileY containing the CTU are identified from the tile boundary positions.
- then, CtbAddrRsToTs[ctbAddrRs] is set to the sum of the number of CTUs contained in the tiles that precede the tile at (tileX, tileY) in tile scan order, plus the raster scan order position, within that tile, of the CTU located at (tbX - colBd[tileX], tbY - rowBd[tileY]).
- TileId[ctbAddrTs] is set to the tile identifier of the tile to which the CTU indicated by the tile scan CTB address ctbAddrTs belongs.
- the tile identifier tileId(tileX, tileY) of the tile located at position (tileX, tileY) in the picture is calculated by the following equation: tileId(tileX, tileY) = (tileY * (num_tile_columns_minus1 + 1)) + tileX
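- A compact sketch of the raster scan to tile scan address conversion and tile identifier assignment described above (the helper name build_tile_maps is an assumption, not part of the specification):

```python
def build_tile_maps(col_width, row_height):
    """Build CtbAddrRsToTs and TileId for a picture whose tile column widths
    and tile row heights are given in CTU units (sketch of the derivation above)."""
    pic_w, pic_h = sum(col_width), sum(row_height)      # PicWidthInCtbsY, PicHeightInCtbsY
    n_cols, n_rows = len(col_width), len(row_height)

    # tile boundary positions: colBd[0] = rowBd[0] = 0, then cumulative sums
    col_bd, row_bd = [0], [0]
    for w in col_width:
        col_bd.append(col_bd[-1] + w)
    for h in row_height:
        row_bd.append(row_bd[-1] + h)

    ctb_rs_to_ts = [0] * (pic_w * pic_h)
    tile_id = [0] * (pic_w * pic_h)
    for ctb_addr_rs in range(pic_w * pic_h):
        tb_x, tb_y = ctb_addr_rs % pic_w, ctb_addr_rs // pic_w
        tile_x = max(i for i in range(n_cols) if col_bd[i] <= tb_x)
        tile_y = max(j for j in range(n_rows) if row_bd[j] <= tb_y)
        # CTUs contained in tiles preceding the (tile_x, tile_y) tile in tile scan order
        preceding = sum(col_width[i] * row_height[j]
                        for j in range(n_rows) for i in range(n_cols)
                        if j < tile_y or (j == tile_y and i < tile_x))
        # raster scan order position of the CTU inside its own tile
        in_tile = (tb_y - row_bd[tile_y]) * col_width[tile_x] + (tb_x - col_bd[tile_x])
        ctb_addr_ts = preceding + in_tile
        ctb_rs_to_ts[ctb_addr_rs] = ctb_addr_ts
        tile_id[ctb_addr_ts] = tile_y * n_cols + tile_x   # tileId(tileX, tileY)
    return ctb_rs_to_ts, tile_id

# example: the 4 x 3 tile layout of FIG. 17 with 5x4-CTU tiles
rs_to_ts, tid = build_tile_maps([5, 5, 5, 5], [4, 4, 4])
```
- (Slice decoding unit 14)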
- the slice decoding unit 14 generates and outputs a decoded picture based on the input VCL NAL, parameter set, and tile information.
- FIG. 18 is a functional block diagram illustrating a schematic configuration of the slice decoding unit 14.
- the slice decoding unit 14 includes a slice header decoding unit 141, a slice position setting unit 142, and a CTU decoding unit 144.
- the CTU decoding unit 144 further includes a prediction residual restoration unit 1441, a predicted image generation unit 1442, and a CTU decoded image generation unit 1443.
- the slice header decoding unit 141 decodes the slice header based on the input VCL NAL and the parameter set, and outputs the decoded slice header to the slice position setting unit 142 and the CTU decoding unit 144.
- the slice header includes information related to the slice position in the picture (SH slice position information).
- a syntax table that the slice header decoding unit 141 refers to when decoding the slice header will be described as an example.
- FIG. 19 shows a part of the syntax table referred to by the slice header decoding unit 141 at the time of decoding the slice header, and is a part related to the slice position information.
- the slice header includes an in-picture first slice flag (first_slice_segment_in_pic_flag) as slice position information.
- the slice header includes a slice PPS identifier (slice_pic_parameter_set_id) as slice position information.
- the slice PPS identifier is an identifier of a PPS associated with the target slice, and tile information to be associated with the target slice is specified via the PPS identifier.
- the slice position setting unit 142 specifies the slice position in the picture based on the input slice header and tile information, and outputs the slice position to the CTU decoding unit 144.
- the tile scan CTB address and the in-picture position of the first CTU of the slice are derived from the slice address (slice_segment_address) decoded from the slice header by the following equations:
- CtbAddrTs[0] = CtbAddrRsToTs[slice_segment_address]
- ctbX[0] = slice_segment_address % PicWidthInCtbsY
- ctbY[0] = slice_segment_address / PicWidthInCtbsY
- here, CtbAddrRsToTs[X] is an array for converting a raster scan CTB address into a tile scan CTB address, and is included in the tile information input to the slice position setting unit 142.
- the position (ctbX[i], ctbY[i]) in the picture of the i-th (i > 0) CTU of the slice is then calculated in the same way from the raster scan CTB address ctbAddrRs = CtbAddrTsToRs[CtbAddrTs[0] + i], that is, ctbX[i] = ctbAddrRs % PicWidthInCtbsY and ctbY[i] = ctbAddrRs / PicWidthInCtbsY.
- in this way, the slice position setting unit 142 calculates and outputs the position in the picture of each CTU included in the slice.
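- As a small illustration (a sketch; num_ctus_in_slice and the helper name are assumptions), the CTU positions of a slice can be obtained by walking the CTUs in tile scan order starting from the slice address:

```python
def ctu_positions_in_slice(slice_segment_address, num_ctus_in_slice,
                           ctb_addr_rs_to_ts, ctb_addr_ts_to_rs, pic_width_in_ctbs_y):
    """Return the (ctbX, ctbY) picture positions of the CTUs of a slice,
    walking the CTUs in tile scan order from the slice start (sketch)."""
    positions = []
    first_ts = ctb_addr_rs_to_ts[slice_segment_address]     # tile scan address of the first CTU
    for i in range(num_ctus_in_slice):
        ctb_addr_rs = ctb_addr_ts_to_rs[first_ts + i]        # back to a raster scan address
        positions.append((ctb_addr_rs % pic_width_in_ctbs_y,
                          ctb_addr_rs // pic_width_in_ctbs_y))
    return positions
```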
- the CTU decoding unit 144 decodes a slice by decoding a decoded image in a region corresponding to each CTU included in the slice based on the input slice header, slice data, and parameter set. Generate an image. The decoded image of the slice is output as a part of the decoded picture at the position indicated by the input slice position.
- a decoded image of the CTU is generated by the prediction residual restoration unit 1441, the predicted image generation unit 1442, and the CTU decoded image generation unit 1443 inside the CTU decoding unit 144.
- the prediction residual restoration unit 1441 decodes prediction residual information (TT information) included in the input slice data to generate and output a prediction residual of the target CTU.
- the predicted image generation unit 1442 generates and outputs a predicted image based on the prediction method and the prediction parameters indicated by the prediction information (PT information) included in the input slice data. At that time, the decoded pictures of reference pictures and coding parameters are used as necessary.
- the CTU decoded image generation unit 1443 adds the input predicted image and the prediction residual to generate and output a decoded image of the target CTU.
- the generation process of the predicted pixel value of the target pixel included in the target CTU to which the inter-layer image prediction is applied is executed according to the following procedure.
- a reference picture position derivation process is executed to derive a corresponding reference position.
- the corresponding reference position is a position on the reference layer corresponding to the target pixel on the target layer picture. Since the pixels of the target layer and the reference layer do not necessarily correspond one-to-one, the corresponding reference position is expressed with an accuracy of less than the pixel unit in the reference layer.
- the prediction pixel value of the target pixel is generated by executing the interpolation filter process using the derived corresponding reference position as an input.
- in the corresponding reference position derivation process, the corresponding reference position is derived based on the picture information, the inter-layer pixel correspondence information, and the inter-layer phase correspondence information included in the parameter set.
- the detailed procedure of the corresponding reference position deriving process will be described with reference to FIG.
- FIG. 1 is a flowchart of the corresponding reference position derivation process.
- the corresponding reference position deriving process is realized by sequentially executing the following processes of S101 to S104.
- first (S101), the reference layer corresponding region size and the inter-layer size ratio are calculated based on the target layer picture size, the reference layer picture size, and the inter-layer pixel correspondence information.
- the width SRLW and height SRLH of the reference layer corresponding region and the horizontal component scaleX and vertical component scaleY of the inter-layer size ratio are calculated by the following equations: SRLW = currPicW - SRLLeftOffset - SRLRightOffset, SRLH = currPicH - SRLTopOffset - SRLBottomOffset, scaleX = refPicW / SRLW, and scaleY = refPicH / SRLH.
- here, currPicW and currPicH are the width and height of the target picture; when the target of the corresponding reference position derivation process is a luminance pixel, they coincide with the syntax values of pic_width_in_luma_samples and pic_height_in_luma_samples included in the SPS picture information of the target layer.
- when the target is a chrominance pixel, values obtained by converting those syntax values according to the color format are used.
- refPicW and refPicH are the width and height of the reference picture; when the target is a luminance pixel, they coincide with the syntax values of pic_width_in_luma_samples and pic_height_in_luma_samples included in the SPS picture information of the reference layer.
- SRLLeftOffset, SRLRightOffset, SRLTopOffset, and SRLBottomOffset are the inter-layer pixel correspondence offsets described with reference to FIG. 12.
- next (S102), the provisional reference position is calculated based on the inter-layer pixel correspondence information and the inter-layer size ratio.
- the horizontal component xRefTmp and the vertical component yRefTmp of the provisional reference position corresponding to the target layer pixel are calculated by the following equations, where xRefTmp and yRefTmp represent the horizontal and vertical positions, in pixel units of the reference layer picture, with respect to the upper left pixel of the reference layer picture:
- xRefTmp = (xP - SRLLeftOffset) * scaleX
- yRefTmp = (yP - SRLTopOffset) * scaleY
- here, xP and yP represent the horizontal and vertical components of the target layer pixel, in pixel units of the target layer picture, with respect to the upper left pixel of the target layer picture.
- Floor(X), for a real number X, denotes the maximum integer that does not exceed X.
- that is, the provisional reference position is the value obtained by scaling, by the inter-layer size ratio, the position of the target pixel relative to the upper left pixel of the reference layer corresponding region.
- note that scaleX and scaleY may instead be calculated as integer values obtained by multiplying the actual scale factors by a predetermined value (for example, 16), and xRefTmp and yRefTmp may then be calculated using those integer values.
- in addition, when the target is a chrominance pixel, a correction that takes into account the phase difference between luminance and chrominance may be applied.
- next (S103), the phase offset is calculated based on the inter-layer phase correspondence information included in the parameter set.
- the horizontal component phaseOffsetX and the vertical component phaseOffsetY of the phase offset are calculated from the reference layer phase offsets ref_layer_left_phase_offset[i] and ref_layer_top_phase_offset[i] included in the inter-layer phase correspondence information by the following equations:
- phaseOffsetX = ref_layer_left_phase_offset[rlIdx] / 8
- phaseOffsetY = ref_layer_top_phase_offset[rlIdx] / 8
- here, rlIdx is an index for selecting the reference layer in the corresponding reference position derivation process.
- the phase offsets phaseOffsetX and phaseOffsetY are expressed in pixel units, whereas the reference layer phase offsets ref_layer_left_phase_offset[rlIdx] and ref_layer_top_phase_offset[rlIdx] are expressed in 1/8-pixel units, so the value obtained by dividing the latter by 8 is set as the phase offset.
- when the phase offset and the reference layer phase offset are expressed in other units, the adjustment should be made according to the difference between the units as appropriate, and it is not always necessary to set the phase offset exactly as described above.
- finally (S104), the corresponding reference position is calculated by the following equations: xRef = xRefTmp + phaseOffsetX and yRef = yRefTmp + phaseOffsetY. That is, the value obtained by adding the phase offset to the provisional reference position is derived as the corresponding reference position.
- when the provisional reference position, the phase offset, and the corresponding reference position are expressed in different units, it is not always necessary to calculate the corresponding reference position exactly according to the above equations; an adjustment that matches the units should be performed as appropriate.
- in the above description, the corresponding reference position is calculated in pixel units, but the present invention is not limited to this.
- for example, the corresponding reference position may be calculated as an integer-represented value in 1/16-pixel units (xRef16, yRef16) by the following equations:
- xRef16 = Floor((xRefTmp + phaseOffsetX) * 16)
- yRef16 = Floor((yRefTmp + phaseOffsetY) * 16)
- by the above process, the position on the reference layer picture corresponding to the target pixel on the target layer picture can be derived as the corresponding reference position.
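- Pulling steps S101 to S104 together, the following is a minimal sketch of the corresponding reference position derivation (floating-point arithmetic is used for readability, and the function name is an assumption; an actual decoder would use the fixed-point integer forms mentioned above):

```python
from math import floor

def corresponding_reference_position(xP, yP,
                                     curr_pic_w, curr_pic_h, ref_pic_w, ref_pic_h,
                                     srl_left, srl_top, srl_right, srl_bottom,
                                     ref_layer_left_phase_offset=0,
                                     ref_layer_top_phase_offset=0):
    """Derive the corresponding reference position of target layer pixel (xP, yP)
    in 1/16-pixel integer units (sketch of S101-S104)."""
    # S101: reference layer corresponding region size and inter-layer size ratio
    srl_w = curr_pic_w - srl_left - srl_right
    srl_h = curr_pic_h - srl_top - srl_bottom
    scale_x = ref_pic_w / srl_w
    scale_y = ref_pic_h / srl_h
    # S102: provisional reference position relative to the reference layer picture
    x_ref_tmp = (xP - srl_left) * scale_x
    y_ref_tmp = (yP - srl_top) * scale_y
    # S103: phase offset (reference layer phase offsets are signalled in 1/8-pixel units)
    phase_offset_x = ref_layer_left_phase_offset / 8
    phase_offset_y = ref_layer_top_phase_offset / 8
    # S104: corresponding reference position in 1/16-pixel integer representation
    x_ref16 = floor((x_ref_tmp + phase_offset_x) * 16)
    y_ref16 = floor((y_ref_tmp + phase_offset_y) * 16)
    return x_ref16, y_ref16

# example: 2x spatial scalability with no offsets
print(corresponding_reference_position(100, 60, 1920, 1080, 960, 540, 0, 0, 0, 0))  # (800, 480)
```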
- when the reference layer crop offset described with reference to FIG. 15 is used as the inter-layer phase correspondence information, procedure S103a described below is performed instead of procedure S103 of the above corresponding reference position derivation process.
- FIG. 20 is a diagram illustrating the relationship between the horizontal components of the points and quantities used in the calculation when the phase offset is calculated using the reference layer crop offset.
- on the target layer are the target pixel xP, the upper left pixel xO of the reference layer corresponding region, and the reference target layer pixel xBase.
- xBase is located the reference layer crop offset croppedOffsetX to the left of xO.
- on the reference layer are the pixel xBaseRef corresponding to xBase, the integer pixel position xRefInt, and the corresponding reference position xRef corresponding to xP.
- in S103a, the distance D on the reference layer between xRef and xBaseRef is first calculated.
- the distance D is derived by multiplying the distance on the target layer between xP and xBase, which is determined from xP and the reference layer crop offset croppedOffsetX, by the horizontal component scaleX of the inter-layer size ratio.
- since xBaseRef lies at a pixel, that is, an integer position, the distance between xRefInt and xBaseRef is the integer part of the distance D (Floor(D)).
- the phase offset phaseOffsetX, that is, the distance between xRef and xRefInt, is less than one pixel, and the fractional part of the distance D (Frac(D) = D - Floor(D)) is set as the value of phaseOffsetX.
- when the phase offset is derived by the processing of S103a using the reference layer crop offset included in the inter-layer phase correspondence information, the phase offset is calculated for each target pixel. Therefore, compared with the processing of S103, which derives the phase offset from the value of the reference layer phase offset and does not depend on the target pixel position, the processing amount of the corresponding reference position derivation process increases; however, there is an advantage that a more accurate phase offset can be derived, particularly when the phase offset is represented by an approximate value in integer representation.
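- A sketch of the per-pixel phase derivation of S103a for the horizontal direction (the helper takes the distance between xP and xBase as an input, since only that distance and scaleX enter the calculation; the names are illustrative):

```python
from math import floor

def phase_offset_s103a(dist_xp_to_xbase, scale_x):
    """Horizontal phase offset of procedure S103a (sketch).

    dist_xp_to_xbase : distance, on the target layer, between the target pixel xP
                       and the reference target layer pixel xBase (obtained from xP
                       and the reference layer crop offset croppedOffsetX).
    scale_x          : horizontal component of the inter-layer size ratio.
    """
    d = dist_xp_to_xbase * scale_x   # distance D between xRef and xBaseRef on the reference layer
    int_part = floor(d)              # Floor(D): distance between xRefInt and xBaseRef
    return d - int_part              # Frac(D): the phase offset phaseOffsetX (< 1 pixel)

print(phase_offset_s103a(4, 0.6))    # D = 2.4 -> phase offset of approximately 0.4
```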
- the pixel value at the corresponding reference position derived by the corresponding reference position derivation process is then generated by applying an interpolation filter to the decoded pixels near the corresponding reference position on the reference layer picture, and this value is used as the predicted pixel value of the target pixel.
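- As a simple illustration of this step (bilinear interpolation is used here only as a stand-in for the actual interpolation filter, and the function name is an assumption), a predicted pixel value can be computed from a 1/16-pixel corresponding reference position as follows:

```python
def interpolate(ref_plane, x_ref16, y_ref16):
    """Predicted pixel value at a 1/16-pixel corresponding reference position,
    using bilinear interpolation as a simple stand-in for the actual filter."""
    xi, yi = x_ref16 >> 4, y_ref16 >> 4                       # integer pixel position
    fx, fy = (x_ref16 & 15) / 16.0, (y_ref16 & 15) / 16.0     # sub-pixel phase
    h, w = len(ref_plane), len(ref_plane[0])
    def px(x, y):                                             # clamp to the picture boundary
        return ref_plane[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
    top = (1 - fx) * px(xi, yi) + fx * px(xi + 1, yi)
    bot = (1 - fx) * px(xi, yi + 1) + fx * px(xi + 1, yi + 1)
    return (1 - fy) * top + fy * bot

plane = [[0, 64], [128, 192]]
print(interpolate(plane, x_ref16=8, y_ref16=8))   # halfway between the four pixels -> 96.0
```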
- as described above, the hierarchical video decoding device 1 (hierarchical image decoding device) according to the present embodiment is an image decoding device that decodes the encoded data of the upper layer included in hierarchically encoded data and restores the decoded picture of the upper layer, which is the target layer, and includes the parameter set decoding unit 12 that decodes a parameter set and the predicted image generation unit 1442 that generates a predicted image by inter-layer prediction with reference to the decoded pixels of a reference layer picture.
- the parameter set decoding unit 12 decodes inter-layer phase correspondence information, which is information relating a target layer pixel to the position on the reference layer picture corresponding to the target layer pixel.
- therefore, the hierarchical video decoding device 1 can derive an accurate position on the reference layer picture corresponding to the prediction target pixel using the inter-layer phase correspondence information, so the accuracy of the predicted pixels generated by the interpolation process improves. Accordingly, it is possible to decode encoded data having a smaller code amount than before and to output the decoded picture of the upper layer.
- FIG. 21 is a functional block diagram showing a schematic configuration of the hierarchical video encoding device 2.
- the hierarchical video encoding device 2 encodes the input image PIN # T of the target layer with reference to the reference layer encoded data DATA # R to generate hierarchical encoded data DATA of the target layer. It is assumed that the reference layer encoded data DATA # R has been encoded in the hierarchical video encoding apparatus corresponding to the reference layer.
- the hierarchical video encoding device 2 includes a NAL multiplexing unit 21, a parameter set encoding unit 22, a tile setting unit 23, a slice encoding unit 24, a decoded picture management unit 16, and a base decoding unit 15.
- the NAL multiplexing unit 21 generates NAL-multiplexed hierarchical moving image encoded data DATA by storing the input target layer encoded data DATA#T and reference layer encoded data DATA#R in NAL units, and outputs it to the outside.
- the parameter set encoding unit 22 sets the parameter sets (VPS, SPS, and PPS) used for encoding the input image based on the input tile information and the input image, packetizes them in the format of non-VCL NAL as a part of the target layer encoded data DATA#T, and supplies them to the NAL multiplexing unit 21.
- the parameter set encoded by the parameter set encoding unit 22 includes at least the picture information, the display area information, and the inter-layer pixel correspondence information described in relation to the hierarchical video decoding device 1.
- the tile setting unit 23 sets the tile information of a picture based on the input image, and supplies it to the parameter set encoding unit 22 and the slice encoding unit 24.
- for example, tile information indicating that the picture is divided into M × N tiles is set, where M and N are arbitrary positive integers.
- alternatively, the tile information may be set so that the picture is divided into tiles of a predetermined size (for example, tiles of 128 pixels × 128 pixels).
- the slice encoding unit 24 encodes the part of the input image corresponding to a slice constituting the picture to generate encoded data of that part, and supplies the encoded data to the NAL multiplexing unit 21 as a part of the target layer encoded data DATA#T. A detailed description of the slice encoding unit 24 will be given later.
- the decoded picture management unit 16 is the same constituent element as the decoded picture management unit 16 included in the hierarchical video decoding device 1 already described. However, since the decoded picture management unit 16 included in the hierarchical video encoding device 2 does not need to output the pictures recorded in its internal DPB as output pictures, that output can be omitted. The descriptions given for the decoded picture management unit 16 of the hierarchical video decoding device 1 also apply to the decoded picture management unit 16 of the hierarchical video encoding device 2 if "decoding" is read as "encoding".
- the base decoding unit 15 is the same constituent element as the base decoding unit 15 included in the hierarchical video decoding device 1 already described, and detailed description thereof is omitted.
- FIG. 22 is a functional block diagram showing a schematic configuration of the slice encoding unit 24.
- the slice encoding unit 24 includes a slice header setting unit 241, a slice position setting unit 242, and a CTU encoding unit 244.
- the CTU encoding unit 244 includes a prediction residual encoding unit 2441, a prediction image encoding unit 2442, and a CTU decoded image generation unit 1443 therein.
- the slice header setting unit 241 generates a slice header used for encoding an input image input in units of slices based on the input parameter set and slice position information.
- the generated slice header is output as a part of the slice encoded data, and is supplied to the CTU encoding unit 244 together with the input image.
- the slice header generated by the slice header setting unit 241 includes at least the SH slice position information described with reference to FIG.
- the slice position setting unit 242 determines the slice position in the picture based on the input tile information and supplies the slice position to the slice header setting unit 241.
- the CTU encoding unit 244 encodes an input image (target slice portion) in units of CTU based on the input parameter set and slice header, and generates slice data and a decoded image (decoded picture) related to the target slice. Output.
- the CTU encoding is performed inside the CTU encoding unit 244 by the predicted image encoding unit 2442, the prediction residual encoding unit 2441, and the CTU decoded image generation unit 1443.
- the predicted image encoding unit 2442 determines the prediction method and prediction parameters of the target CTU included in the target slice, generates a predicted image based on the determined prediction method, and outputs it to the prediction residual encoding unit 2441 and the CTU decoded image generation unit 1443.
- Information on the prediction method and prediction parameters is variable-length encoded as prediction information (PT information) and output as a part of slice data included in the slice encoded data.
- the prediction methods that can be selected by the predicted image encoding unit 2442 include at least inter-layer image prediction.
- when inter-layer image prediction is selected, the predicted image encoding unit 2442 executes the corresponding reference position derivation process to determine the reference layer pixel position corresponding to the prediction target pixel, and determines the predicted pixel value by interpolation processing based on that position.
- for the corresponding reference position derivation process, each of the processes described for the predicted image generation unit 1442 of the hierarchical video decoding device 1 can be applied; for example, the process described with reference to FIG. 1 can be applied.
- the prediction residual encoding unit 2441 outputs the quantized transform coefficients (TT information), obtained by transforming and quantizing the difference image between the input image and the predicted image, as a part of the slice data included in the slice encoded data.
- the prediction residual is restored by applying inverse transformation / inverse quantization to the quantized transform coefficient, and the restored prediction residual is output to the CTU decoded image generation unit 1443.
- since the CTU decoded image generation unit 1443 has the same function as the component of the same name in the hierarchical video decoding device 1, the same reference numeral is assigned and its description is omitted.
- as described above, the hierarchical video encoding device 2 according to the present embodiment is an image encoding device that generates encoded data of an upper layer from an input image, and includes the parameter set encoding unit 22 that encodes a parameter set and the predicted image encoding unit 2442 that generates a predicted image by inter-layer prediction with reference to the decoded pixels of a reference layer picture.
- the parameter set encoding unit 22 encodes inter-layer phase correspondence information, which is information relating a target layer pixel to the position on the reference layer picture corresponding to the target layer pixel, and, when performing inter-layer prediction, the predicted image encoding unit 2442 executes the corresponding reference position derivation process that derives the reference layer position corresponding to the prediction target pixel based on the inter-layer phase correspondence information.
- therefore, since the hierarchical video encoding device 2 can derive an accurate position on the reference layer picture corresponding to the prediction target pixel using the inter-layer phase correspondence information, the accuracy of the predicted pixels generated by the interpolation process increases, and it is possible to generate and output encoded data with a smaller code amount than before.
- FIG. 23 is a functional block diagram showing a schematic configuration of the hierarchically encoded data conversion device 3.
- the hierarchical encoded data conversion device 3 converts the input hierarchical encoded data DATA to generate hierarchical encoded data DATA-ROI related to the input attention area information.
- the hierarchically encoded data DATA is hierarchically encoded data generated by the hierarchical moving image encoding device 2. Also, by inputting the hierarchically encoded data DATA-ROI to the hierarchical video decoding device 1, it is possible to reproduce the upper layer video related to the attention area information.
- the hierarchically encoded data conversion device 3 includes a NAL demultiplexing unit 11, a NAL multiplexing unit 21, a parameter set decoding unit 12, a tile setting unit 13, a parameter set correction unit 32, and a NAL selection unit 34.
- since the NAL demultiplexing unit 11, the parameter set decoding unit 12, and the tile setting unit 13 each have the same function as the component of the same name included in the hierarchical video decoding device 1, the same reference numerals are assigned and their descriptions are omitted.
- the NAL multiplexing unit 21 has the same function as the component of the same name included in the hierarchical video encoding device 2, the same reference numeral is given and the description is omitted.
- the parameter set correction unit 32 corrects and outputs the input parameter set information based on the input attention area information and tile information.
- the parameter set correction unit 32 mainly corrects the picture information, the display area information, the inter-layer pixel correspondence information, the inter-layer phase correspondence information, and the PPS tile information included in the parameter set.
- the attention area is a partial area of a picture that a user (for example, a viewer of the reproduced video) designates in the pictures constituting the video, and the attention area information indicates this area.
- the attention area information is specified, for example, by a rectangular area.
- for example, the offsets of the positions of the top, bottom, left, and right sides of the rectangle representing the attention area from the corresponding sides (top, bottom, left, and right) of the entire picture can be designated as the attention area information.
- an area having a shape other than a rectangle (for example, a circle, a polygon, or an area indicating an object extracted by object extraction) can also be designated; however, a rectangular attention area is assumed in the following description.
- for a non-rectangular area, the rectangle with the smallest area that includes the area can be regarded as the attention area in the following description.
- FIG. 24 is a diagram illustrating a relationship among pictures, attention areas, and tiles in hierarchically encoded data before and after conversion.
- FIG. 24A shows the relationship among pictures, attention areas, and tiles in hierarchically encoded data before conversion.
- the parameter set of the hierarchically encoded data before conversion indicates that the picture (before conversion) is composed of nine tiles in total, three in each of the vertical and horizontal directions (tiles T00, T01, T02, T10, T11, T12, T20, T21, and T22 in raster scan order from the upper left).
- the attention area is set in the upper right part of the picture and has an area overlapping with the tiles T01, T02, T11, and T12.
- FIG. 24B shows the relationship among pictures, attention areas, and tiles in the hierarchically encoded data after conversion.
- the parameter set of the hierarchically encoded data after conversion indicates that the picture (after conversion) is composed of four tiles in total, two in each of the vertical and horizontal directions (tiles T01, T02, T11, and T12). That is, the tiles in the picture before conversion that have no area overlapping the attention area (tiles T00, T10, T20, T21, and T22) are not included in the converted picture.
- as illustrated in FIG. 24, the hierarchically encoded data conversion device 3 removes, from the input hierarchically encoded data before conversion, the tiles that have no area overlapping the attention area, and corrects the related parameter sets to generate the hierarchically encoded data after conversion.
- the hierarchical video decoding device can generate a decoded image related to the attention area by taking the converted hierarchically encoded data as input.
- the parameter set correction unit 32 refers to the input attention area information and tile information, and updates the PPS tile information so that the converted picture is composed only of tiles having an area overlapping the attention area (extraction target tiles).
- the PPS tile information is updated based on the extraction target tiles as follows. First, when there is only one extraction target tile, tiles_enabled_flag is corrected to 0; when there are two or more extraction target tiles, this correction can be omitted. Next, the number of tile columns (num_tile_columns_minus1) and the number of tile rows (num_tile_rows_minus1) are corrected based on the numbers of extraction target tiles included in the horizontal and vertical directions of the picture.
- in addition, the bit strings corresponding to the syntax elements for the widths of tile columns that do not include an extraction target tile and the heights of tile rows that do not include an extraction target tile are removed from the parameter set.
- the parameter set correction unit 32 corrects the picture information, using the area corresponding to the set of extraction target tiles as the converted picture size.
- in the example of FIG. 24, the sum of the widths of the tile columns including tiles T01 and T02, respectively, is set as the corrected picture width pic_width_in_luma_samples of the target layer, and the sum of the heights of the tile rows including tiles T01 and T11, respectively, is set as the corrected picture height pic_height_in_luma_samples of the target layer.
- the parameter set correction unit 32 corrects the inter-layer pixel correspondence information included in the parameter set based on the change in picture size; specifically, all the inter-layer pixel correspondence offsets included in the inter-layer pixel correspondence information are corrected.
- to the scaled reference layer left offset (scaled_ref_layer_left_offset[i]) constituting the inter-layer pixel correspondence offset, the sum of the widths of the tile columns that are to the left of the attention area and do not include an extraction target tile is added; for example, in FIG. 24, the width of the tile column including tile T00 is added.
- likewise, to the scaled reference layer top offset (scaled_ref_layer_top_offset[i]), the sum of the heights of the tile rows that are above the attention area and do not include an extraction target tile is added; to the scaled reference layer right offset (scaled_ref_layer_right_offset[i]), the sum of the widths of the tile columns that are to the right of the attention area and do not include an extraction target tile is added; and to the scaled reference layer bottom offset (scaled_ref_layer_bottom_offset[i]), the sum of the heights of the tile rows that are below the attention area and do not include an extraction target tile is added.
- the parameter set correction unit 32 corrects the inter-layer phase correspondence information included in the parameter set based on the change in the picture size.
- the inter-layer phase correspondence information is corrected so that the phase of the upper left pixel of the converted picture, as derived in the corresponding reference position derivation process, matches the phase of the same pixel before conversion, and the corrected reference layer phase offset is used as the inter-layer phase correspondence information.
- a specific correction procedure is as follows. First, the corresponding reference position (xL0Ref, yL0Ref) for the upper left pixel (xL0, yL0) of the extraction target region of the upper layer picture is calculated; for this derivation, for example, the corresponding reference position derivation process described for the predicted image generation unit 1442 of the hierarchical video decoding device can be applied with reference to the parameter set before correction. Next, the provisional corresponding reference position (xLARefTmp, yLARefTmp) for the upper left pixel (xLA, yLA) of the converted upper layer picture is derived with reference to the corrected parameter set in which the reference layer phase offset is set to 0. With the corrected reference layer left phase offset denoted phaseLAft and the corrected reference layer top phase offset denoted phaseTAft, the corrected reference layer phase offsets can be determined by the following equations.
- phaseLAft = Frac(Frac(xL0Ref) - Frac(xLARefTmp))
- phaseTAft = Frac(Frac(yL0Ref) - Frac(yLARefTmp))
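Here Frac(x) denotes the fractional part x - Floor(x). A small sketch of this phase offset correction; the corresponding reference positions and provisional reference positions passed in are assumed to have been derived as described above:

```python
import math

def frac(x):
    """Fractional part: Frac(x) = x - Floor(x)."""
    return x - math.floor(x)

def corrected_phase_offsets(xl0_ref, yl0_ref, xla_ref_tmp, yla_ref_tmp):
    """Return (phaseLAft, phaseTAft), the corrected reference layer phase offsets."""
    phase_l_aft = frac(frac(xl0_ref) - frac(xla_ref_tmp))
    phase_t_aft = frac(frac(yl0_ref) - frac(yla_ref_tmp))
    return phase_l_aft, phase_t_aft
```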
- the parameter set correction unit 32 rewrites the display area information of the SPS included in the input parameter set so as to match the attention area indicated by the input attention area information.
- the display area information of SPS is rewritten by the following steps S301 to S303.
- Each of the display area offsets is set to the offset of the position of the corresponding side of the rectangle representing the attention area with respect to the corresponding side of the picture. For example, the position offset of the upper side of the attention area with respect to the upper side of the picture is set as the value of the display area top offset (conf_win_top_offset). If the value of the display area flag before rewriting is 1, the original display area offset values are overwritten with the attention area offset values set above. If the value of the display area flag before rewriting is 0, the flag is set to 1 and the set attention area offsets are inserted immediately after the SPS display area flag. (A sketch of this rewriting is given below.)
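A minimal sketch of this display area rewriting, assuming the offsets are held in the units used by the SPS; apart from conf_win_top_offset, which appears in the text, the field names follow the usual HEVC conformance window syntax and are assumptions here:

```python
def rewrite_display_area(sps, roi, pic_width, pic_height):
    """Rewrite the SPS display area so that it matches the attention area.

    sps -- dict-like view of the relevant SPS fields (illustrative representation)
    roi -- (left, top, right, bottom) of the attention area in the corrected picture,
           expressed in the units used by the display area offsets
    """
    left, top, right, bottom = roi
    sps['conf_win_left_offset'] = left                    # ROI left side relative to the picture left side
    sps['conf_win_top_offset'] = top                      # ROI top side relative to the picture top side
    sps['conf_win_right_offset'] = pic_width - right      # ROI right side relative to the picture right side
    sps['conf_win_bottom_offset'] = pic_height - bottom   # ROI bottom side relative to the picture bottom side
    sps['conformance_window_flag'] = 1                    # ensure the display area flag signals the offsets
    return sps
```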
- the NAL selection unit 34 selects input video coding layer NALs (VCL NALs) based on the input attention area information and tile information. The selected VCL NALs are sequentially output to the NAL multiplexing unit 21, and the VCL NALs that are not selected are discarded.
- the VCL NALs selected by the NAL selection unit 34 are the VCL NALs containing a slice header and slice data related to a slice included in an extraction target tile. As described with reference to FIG. 24, the extraction target tiles are determined based on the attention area information and the tile information. The NAL selection unit 34 determines from the slice address included in the slice header and from the tile information whether the slice is included in an extraction target tile; if it is, the NAL selection unit 34 selects the VCL NAL containing the slice, and otherwise it discards that VCL NAL.
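The tile membership test used by the NAL selection unit 34 can be sketched as follows: the raster-scan slice address is mapped to a tile index using the tile information, and the VCL NAL is kept only if that tile is an extraction target. The conversion array and tile index array are assumed to be available from the tile information, as in the HEVC tile scan (names are illustrative):

```python
def keep_vcl_nal(slice_segment_address, ctb_addr_rs_to_ts, tile_id, extract_tiles):
    """Return True if the VCL NAL containing this slice should be kept.

    slice_segment_address -- raster-scan CTU address of the first CTU of the slice
    ctb_addr_rs_to_ts     -- raster-scan to tile-scan CTU address conversion array
    tile_id               -- tile index for each tile-scan CTU address
    extract_tiles         -- set of tile indices selected as extraction targets
    """
    tile_of_slice = tile_id[ctb_addr_rs_to_ts[slice_segment_address]]
    return tile_of_slice in extract_tiles
```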
- Hierarchical encoded data conversion process flow: the hierarchical encoded data conversion process by the hierarchical encoded data conversion device 3 is realized by sequentially executing the procedures shown in S501 to S506.
- the NAL demultiplexing unit 11 demultiplexes the input hierarchical encoded data DATA.
- the part related to the parameter sets (non-VCL NAL) is output to the parameter set decoding unit 12, and the video coding layer NAL (VCL NAL), which is the part related to the slice layer (slice header and slice data), is output to the NAL selection unit 34.
- the obtained reference layer encoded data DATA # R is output to the NAL multiplexing unit 21.
- the parameter set decoding unit 12 decodes the parameter set (VPS, SPS, PPS) from the input non-VCL NAL and outputs it to the parameter set correction unit 32 and the tile setting unit 13.
- the tile setting unit 13 derives tile information from the input parameter set, and outputs the tile information to the parameter set correction unit 32 and the NAL selection unit 34.
- the parameter set correction unit 32 corrects and outputs the parameter set input based on the input attention area information and tile information.
- the NAL selection unit 34 selects a part of the input VCL NAL based on the input tile information and attention area information, and outputs the selected VCL NAL to the NAL multiplexing unit 21.
- the NAL multiplexing unit 21 multiplexes the corrected parameter set, the corrected slice header, and the slice data as the encoded data of the target layer, together with the input reference layer encoded data DATA # R, and outputs the result to the outside as hierarchically encoded data DATA-ROI.
- the hierarchically encoded data conversion device 3 according to the present embodiment includes a NAL selection unit 34 that modifies, based on attention area information, part of the video coding layer encoded data (VCL NAL) included in the encoded data of the target layer (upper layer), and a parameter set correction unit 32.
- the NAL selection unit 34 selects, as extraction target tiles, tiles having an area overlapping with the attention area indicated by the attention area information, and the video coding layer encoded data related to the slices included in the selected extraction target tiles is included in the converted hierarchically encoded data.
- the parameter set correcting unit 32 corrects the picture information, the PPS tile information, the display information, the inter-layer pixel correspondence information, and the inter-layer phase correspondence information based on the attention area information and the tile information.
- according to the above configuration, the input hierarchically encoded data is converted so that, in the upper layer, the VCL NALs related to the extraction target tiles (the tiles having an area overlapping the attention area) are extracted and the hierarchically encoded data after conversion is composed of them. Since the VCL NALs related to tiles having no overlap with the attention area are discarded, the code amount of the hierarchically encoded data after conversion is smaller than that of the hierarchically encoded data before conversion.
- in addition, since the picture information, the PPS tile information, and the display information in the parameter set are modified to match the extraction target tiles, the hierarchically encoded data after conversion can be decoded by the hierarchical moving image decoding device, and a decoded picture related to the region of interest can be displayed.
- the inter-layer pixel correspondence information and the inter-layer phase correspondence information are corrected, the correspondence relationship between the upper layer pixel and the reference layer pixel is maintained in the encoded data before and after the conversion. Therefore, the prediction image of inter-layer prediction generated from the encoded data before conversion and the prediction image of inter-layer prediction generated from the encoded data after conversion can be maintained at the same level.
- a system that displays attention area information can be configured by combining the above-described hierarchical moving picture decoding apparatus 1, hierarchical moving picture encoding apparatus 2, and hierarchical encoded data conversion apparatus 3.
- FIG. 25 is a block diagram illustrating a configuration of a region of interest display system that is a combination of the hierarchical video decoding device 1, the hierarchical video encoding device 2, and the hierarchical encoded data conversion device 3.
- roughly speaking, the attention area display system SYS hierarchically encodes and stores input images having different qualities, converts the stored hierarchically encoded data according to attention area information from the user and provides it, and decodes the converted hierarchically encoded data, thereby displaying a high-quality reproduced image related to the region of interest (ROI).
- the attention area display system SYS includes, as constituent elements, a hierarchical video encoding unit SYS1A, a hierarchical video encoding unit SYS1B, a hierarchical encoded data storage unit SYS2, a hierarchical encoded data conversion unit SYS3, a hierarchical video decoding unit SYS4, a display control unit SYS5, an ROI display unit SYS6, a whole display unit SYS7, and an ROI notification unit SYS8.
- the hierarchical video encoding device 2 described above can be used for the hierarchical video encoding units SYS1A and SYS1B.
- the hierarchically encoded data storage unit SYS2 stores hierarchically encoded data and supplies the hierarchically encoded data as required.
- a computer having a recording medium (memory, hard disk, optical disk) can be used as the hierarchically encoded data storage unit SYS2.
- the hierarchical encoded data conversion device 3 described above can be used as the hierarchical encoded data conversion unit SYS3.
- the hierarchical video decoding device 1 described above can be used for the hierarchical video decoding unit SYS4.
- the display control unit SYS5 supplies decoded pictures, based on the attention area information, to the ROI display unit SYS6 as the ROI display image and to the whole display unit SYS7 as the whole display image.
- when an attention area is specified in the attention area information, the display control unit SYS5 supplies the decoded picture of the upper layer, which is a decoded picture input from the hierarchical moving image decoding unit, to the ROI display unit SYS6 as the ROI display image, and supplies the decoded picture of the lower layer, likewise input from the hierarchical moving image decoding unit, to the whole display unit SYS7 as the whole display image.
- when no attention area is specified in the attention area information, the display control unit SYS5 supplies the decoded picture of the lower layer to the whole display unit SYS7 as the whole display image, and no ROI display image is supplied to the ROI display unit SYS6.
- the display control unit SYS5 may also supply to the ROI display unit SYS6, as the ROI display image, the partial region corresponding to the region of interest of the decoded picture of the lower layer of the hierarchically encoded data, instead of the decoded picture of the upper layer of the hierarchically encoded data related to the attention area information supplied from the hierarchical video decoding unit SYS4. Note that this partial region of the lower-layer decoded picture corresponding to the attention area has a lower image quality than the decoded picture of the upper layer related to the attention area.
- ROI display unit SYS6 displays the ROI display image at a predetermined display position in a predetermined display area.
- for example, the display area is a television screen and the display position is a partial area thereof (for example, a rectangular area in the upper right corner); in another example, the display area is the display of a portable terminal (smartphone or tablet computer) and the display position is the entire display.
- the whole display unit SYS7 displays the whole display image at a predetermined display position in a predetermined display area.
- the display area is a television screen, and the display position is the whole.
- when the display areas of the whole display unit SYS7 and the ROI display unit SYS6 are the same, it is preferable to display the ROI display image superimposed on the whole display image.
- the ROI display unit SYS6 and the entire display unit SYS7 may display the input image enlarged or reduced to a size that matches the size of the display area.
- the ROI notification unit SYS8 notifies attention area information designated by the user by a predetermined method. For example, the user can inform the ROI notification unit of the attention area by designating an area corresponding to the attention area on the display area where the entire display image is displayed. Note that the ROI notification unit SYS8 notifies information indicating that there is no attention area as attention area information when there is no user designation.
- Processing by the attention area display system can be divided into hierarchical encoded data generation and accumulation processing and attention area data generation and reproduction processing.
- in the hierarchical encoded data generation and accumulation process, hierarchical encoded data is generated from input images of different qualities and accumulated.
- the hierarchically encoded data generation / accumulation process is executed in the sequence from T101 to T103.
- the hierarchical moving image encoding unit SYS1B encodes the input low-quality input image and supplies the generated hierarchical encoded data to the hierarchical moving image encoding unit SYS1A. That is, the hierarchical moving image encoding unit SYS1B generates and outputs hierarchical encoded data used as a reference layer (lower layer) in the hierarchical moving image encoding unit SYS1A from the input image.
- the hierarchical moving image encoding unit SYS1A encodes the input high-quality input image using the input hierarchical encoded data as the encoded data of the reference layer, generates hierarchically encoded data, and outputs it to the hierarchically encoded data storage unit SYS2.
- the hierarchically encoded data storage unit SYS2 attaches an appropriate index to the input hierarchically encoded data and records it on an internal recording medium.
- in the attention area data generation and reproduction process, the hierarchically encoded data is read from the hierarchically encoded data storage unit SYS2, converted into hierarchically encoded data corresponding to the attention area, and the converted hierarchically encoded data is decoded, reproduced, and displayed.
- the attention area data generation / reproduction processing is executed in the following steps T201 to T207.
- T201 The hierarchically encoded data related to the moving image selected by the user is supplied from the hierarchically encoded data storage unit SYS2 to the hierarchically encoded data conversion unit SYS3.
- the ROI notification unit SYS8 notifies the user-specified region-of-interest information to the hierarchically encoded data conversion unit SYS3 and the display control unit SYS5.
- the hierarchical encoded data conversion unit SYS3 converts the input hierarchical encoded data based on the input attention area information, and outputs the converted hierarchical encoded data to the hierarchical video decoding unit SYS4.
- the hierarchical video decoding unit SYS4 decodes the input hierarchical video encoded data (after conversion), and outputs the reproduced decoded pictures of the upper layer and the lower layer to the display control unit SYS5.
- the display control unit SYS5 outputs the input decoded picture to the ROI display unit SYS6 and the entire display unit SYS7 based on the input attention area information.
- the entire display unit SYS7 displays the input entire display image.
- the ROI display unit SYS6 displays the input ROI display image.
- the attention area display system SYS according to the present embodiment described above includes an attention area notification unit (ROI notification unit SYS8) that supplies attention area information, a hierarchical encoded data conversion unit SYS3 that converts the hierarchically encoded data based on the attention area information to generate converted hierarchically encoded data, a hierarchical moving image decoding unit SYS4 that decodes the converted hierarchically encoded data and outputs decoded pictures of an upper layer and a lower layer, a display control unit SYS5, an attention area display unit (ROI display unit SYS6), and a whole display unit SYS7.
- the display control unit SYS5 supplies the lower layer decoded picture to the entire display unit SYS7, and supplies the upper layer decoded picture to the attention area display unit.
- the entire decoded picture of the lower layer can be displayed, and the decoded picture of the area specified by the attention area information can be displayed.
- the decoded picture of the area specified by the attention area information is decoded using the encoded data of the upper layer of the hierarchically encoded data, so that the image quality is high.
- the hierarchically encoded data converted based on the attention area has a smaller code amount than the hierarchically encoded data before conversion. Therefore, by using the attention area display system SYS described above, it is possible to reproduce a decoded picture with high image quality related to the attention area while reducing the bandwidth required for transferring the hierarchically encoded data.
- the above-described hierarchical video encoding device 2 and hierarchical video decoding device 1 can be used by being mounted on various devices that perform transmission, reception, recording, and reproduction of moving images.
- the moving image may be a natural moving image captured by a camera or the like, or may be an artificial moving image (including CG and GUI) generated by a computer or the like.
- FIG. 26A is a block diagram illustrating a configuration of a transmission device PROD_A in which the hierarchical moving image encoding device 2 is mounted.
- the transmission device PROD_A includes an encoding unit PROD_A1 that obtains encoded data by encoding a moving image, a modulation unit PROD_A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit PROD_A1, and a transmission unit PROD_A3 that transmits the modulated signal obtained by the modulation unit PROD_A2.
- the hierarchical moving image encoding apparatus 2 described above is used as the encoding unit PROD_A1.
- the transmission device PROD_A may further include, as supply sources of the moving image to be input to the encoding unit PROD_A1, a camera PROD_A4 that captures a moving image, a recording medium PROD_A5 on which the moving image is recorded, an input terminal PROD_A6 for inputting the moving image from the outside, and an image processing unit A7 that generates or processes images.
- FIG. 26A illustrates a configuration in which the transmission apparatus PROD_A includes all of these, but some of them may be omitted.
- the recording medium PROD_A5 may record a non-encoded moving image, or may record a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. In the latter case, a decoding unit (not shown) for decoding the encoded data read from the recording medium PROD_A5 according to the recording encoding scheme may be interposed between the recording medium PROD_A5 and the encoding unit PROD_A1.
- FIG. 26B is a block diagram illustrating a configuration of the receiving device PROD_B in which the hierarchical video decoding device 1 is mounted.
- the receiving device PROD_B includes a receiving unit PROD_B1 that receives the modulated signal, a demodulation unit PROD_B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit PROD_B1, and a decoding unit PROD_B3 that obtains a moving image by decoding the encoded data obtained by the demodulation unit PROD_B2.
- the above-described hierarchical video decoding device 1 is used as the decoding unit PROD_B3.
- the receiving device PROD_B may further include, as supply destinations of the moving image output by the decoding unit PROD_B3, a display PROD_B4 that displays the moving image, a recording medium PROD_B5 for recording the moving image, and an output terminal PROD_B6 for outputting the moving image to the outside.
- FIG. 26B illustrates a configuration in which all of these are provided in the receiving device PROD_B, but some of them may be omitted.
- the recording medium PROD_B5 may be used for recording a non-encoded moving image, or may be used for recording a moving image encoded by a recording encoding method different from the transmission encoding method. In the latter case, an encoding unit (not shown) for encoding the moving image acquired from the decoding unit PROD_B3 according to the recording encoding method may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.
- the transmission medium for transmitting the modulation signal may be wireless or wired.
- the transmission mode for transmitting the modulated signal may be broadcasting (here, a transmission mode in which the transmission destination is not specified in advance) or communication (here, a transmission mode in which the transmission destination is specified in advance). That is, the transmission of the modulated signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.
- a terrestrial digital broadcast broadcasting station (broadcasting equipment or the like) / receiving station (such as a television receiver) is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by wireless broadcasting.
- a broadcasting station (such as broadcasting equipment) / receiving station (such as a television receiver) of cable television broadcasting is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by cable broadcasting.
- a server (such as a workstation) / client (such as a television receiver, personal computer, or smartphone) for a VOD (Video On Demand) service or a video sharing service using the Internet is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives a modulated signal by communication (usually, either a wireless or wired transmission medium is used in a LAN, and a wired transmission medium is used in a WAN).
- the personal computer includes a desktop PC, a laptop PC, and a tablet PC.
- the smartphone also includes a multi-function mobile phone terminal.
- the video sharing service client has a function of encoding a moving image captured by the camera and uploading it to the server. That is, the client of the video sharing service functions as both the transmission device PROD_A and the reception device PROD_B.
- FIG. 27A is a block diagram illustrating a configuration of a recording apparatus PROD_C in which the above-described hierarchical video encoding apparatus 2 is mounted.
- the recording device PROD_C includes an encoding unit PROD_C1 that obtains encoded data by encoding a moving image, and a writing unit that writes the encoded data obtained by the encoding unit PROD_C1 on the recording medium PROD_M.
- the hierarchical moving image encoding device 2 described above is used as the encoding unit PROD_C1.
- the recording medium PROD_M may be (1) of a type built into the recording device PROD_C, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), (2) of a type connected to the recording device PROD_C, such as an SD memory card or a USB (Universal Serial Bus) flash memory, or (3) loaded into a drive device (not shown) built into the recording device PROD_C, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc: registered trademark).
- the recording device PROD_C may further include, as supply sources of the moving image to be input to the encoding unit PROD_C1, a camera PROD_C3 that captures moving images, an input terminal PROD_C4 for inputting moving images from the outside, a receiving unit PROD_C5 for receiving moving images, and an image processing unit C6 that generates or processes images.
- FIG. 27A illustrates a configuration in which the recording apparatus PROD_C includes all of these, but some of them may be omitted.
- the receiving unit PROD_C5 may receive a non-encoded moving image, or may receive encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. In the latter case, a transmission decoding unit (not shown) that decodes encoded data encoded by the transmission encoding scheme may be interposed between the receiving unit PROD_C5 and the encoding unit PROD_C1.
- Examples of such a recording device PROD_C include a DVD recorder, a BD recorder, and an HDD (Hard Disk Drive) recorder (in this case, the input terminal PROD_C4 or the receiving unit PROD_C5 is the main supply source of moving images), a camcorder (in this case, the camera PROD_C3 is the main supply source of moving images), a personal computer (in this case, the receiving unit PROD_C5 or the image processing unit C6 is the main supply source of moving images), and a smartphone (in this case, the camera PROD_C3 or the receiving unit PROD_C5 is the main supply source of moving images).
- FIG. 27B is a block diagram showing a configuration of the playback device PROD_D in which the above-described hierarchical video decoding device 1 is mounted.
- the playback device PROD_D includes a reading unit PROD_D1 that reads encoded data written to the recording medium PROD_M, and a decoding unit PROD_D2 that obtains a moving image by decoding the encoded data read by the reading unit PROD_D1.
- the hierarchical moving image decoding apparatus 1 described above is used as the decoding unit PROD_D2.
- the recording medium PROD_M may be (1) of a type built into the playback device PROD_D, such as an HDD or an SSD, (2) of a type connected to the playback device PROD_D, such as an SD memory card or a USB flash memory, or (3) loaded into a drive device (not shown) built into the playback device PROD_D, such as a DVD or a BD.
- the playback device PROD_D may further include, as supply destinations of the moving image output by the decoding unit PROD_D2, a display PROD_D3 that displays the moving image, an output terminal PROD_D4 that outputs the moving image to the outside, and a transmission unit PROD_D5 that transmits the moving image.
- FIG. 27B illustrates a configuration in which the playback apparatus PROD_D includes all of these, but a part of the configuration may be omitted.
- the transmission unit PROD_D5 may transmit a non-encoded moving image, or may transmit encoded data encoded by a transmission encoding method different from the recording encoding method. In the latter case, it is preferable to interpose an encoding unit (not shown) that encodes the moving image with the encoding method for transmission between the decoding unit PROD_D2 and the transmission unit PROD_D5.
- Examples of such a playback device PROD_D include a DVD player, a BD player, and an HDD player (in this case, the output terminal PROD_D4 to which a television receiver or the like is connected is the main supply destination of moving images), a television receiver (in this case, the display PROD_D3 is the main supply destination of moving images), a digital signage (also referred to as an electronic signboard or an electronic bulletin board; the display PROD_D3 or the transmission unit PROD_D5 is the main supply destination of moving images), a desktop PC (in this case, the output terminal PROD_D4 or the transmission unit PROD_D5 is the main supply destination of moving images), a laptop or tablet PC (in this case, the display PROD_D3 or the transmission unit PROD_D5 is the main supply destination of moving images), and a smartphone (in this case, the display PROD_D3 or the transmission unit PROD_D5 is the main supply destination of moving images).
- each block of the hierarchical video decoding device 1 and the hierarchical video encoding device 2 may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (Central Processing Unit).
- each of the devices includes a CPU that executes the instructions of a control program realizing each function, a ROM (Read Only Memory) that stores the program, a RAM (Random Access Memory) into which the program is expanded, and a storage device (recording medium) such as a memory that stores the program and various data.
- An object of the present invention is to provide a recording medium in which a program code (execution format program, intermediate code program, source program) of a control program for each of the above devices, which is software that realizes the above-described functions, is recorded in a computer-readable manner This can also be achieved by supplying each of the above devices and reading and executing the program code recorded on the recording medium by the computer (or CPU or MPU (Micro Processing Unit)).
- Examples of the recording medium include tapes such as magnetic tapes and cassette tapes; disks including magnetic disks such as floppy (registered trademark) disks / hard disks and optical discs such as CD-ROM (Compact Disc Read-Only Memory) / MO (Magneto-Optical disc) / MD (Mini Disc) / DVD (Digital Versatile Disc) / CD-R (CD Recordable); cards such as IC cards (including memory cards) / optical cards; semiconductor memories such as mask ROM / EPROM (Erasable Programmable Read-Only Memory) / EEPROM (registered trademark) (Electrically Erasable Programmable Read-Only Memory) / flash ROM; and logic circuits such as PLD (Programmable Logic Device) and FPGA (Field Programmable Gate Array).
- each of the above devices may be configured to be connectable to a communication network, and the program code may be supplied via the communication network.
- the communication network is not particularly limited as long as it can transmit the program code.
- the Internet intranet, extranet, LAN (Local Area Network), ISDN (Integrated Services Digital Network), VAN (Value-Added Network), CATV (Community Area Antenna Television) communication network, Virtual Private Network (Virtual Private Network), A telephone line network, a mobile communication network, a satellite communication network, etc. can be used.
- the transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type.
- for example, wired media such as IEEE (Institute of Electrical and Electronic Engineers) 1394, USB, power line carrier, cable TV line, telephone line, and ADSL (Asymmetric Digital Subscriber Line) line, and wireless media such as infrared rays like IrDA (Infrared Data Association) or remote control, Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance) (registered trademark), a mobile phone network, a satellite line, and a terrestrial digital network can be used.
- the present invention can also be realized in the form of a computer data signal embedded in a carrier wave in which the program code is embodied by electronic transmission.
- In order to solve the above problems, an image decoding apparatus according to the present invention is an image decoding apparatus that decodes encoded data of an upper layer included in hierarchically encoded data and restores a decoded picture of the upper layer, which is the target layer, the apparatus comprising: a parameter set decoding unit that decodes a parameter set; and a prediction image generation unit that generates a prediction image by inter-layer prediction with reference to decoded pixels of a reference layer picture, wherein the parameter set decoding unit decodes inter-layer phase correspondence information, which is information relating to a target layer pixel and the position on the reference layer picture corresponding to the target layer pixel.
- the inter-layer phase correspondence information is a reference layer that is an amount representing a difference between an upper left pixel of the reference layer corresponding region and a corresponding reference position corresponding to the upper left pixel of the reference layer corresponding region. It preferably includes a phase offset.
- the inter-layer phase correspondence information preferably includes a reference layer phase offset count, which is an amount indicating the number of reference layer phase offsets included in the parameter set.
- each value of the reference layer phase offset is set to 0 when the reference layer phase offset is not decoded from the parameter set.
- the inter-layer phase correspondence information includes a reference pixel offset that is an amount representing a position of the reference pixel on the upper layer with respect to the upper left pixel of the reference layer corresponding region.
- the reference pixel indicated by the reference pixel offset preferably has a horizontal position that is the same as or to the left of the upper left pixel of the reference layer corresponding area and a vertical position that is the same as or above the upper left pixel of the reference layer corresponding area, and the reference layer position corresponding to the reference pixel is preferably an integer position in pixel units.
- the prediction image generation unit performs, when performing inter-layer prediction, a corresponding reference position derivation process for deriving the reference layer position corresponding to a prediction target pixel, and the corresponding reference position derivation process preferably derives the reference layer position based on the inter-layer phase correspondence information.
- the corresponding reference position derivation process preferably includes a process of deriving a provisional reference position corresponding to the position of the reference layer pixel corresponding to the prediction target pixel, and a process of deriving a phase offset based on the inter-layer phase correspondence information.
- the corresponding reference position derivation process preferably derives the corresponding reference position by adding the phase offset to the provisional reference position after applying a conversion so that the units of the two match.
- an image encoding device according to the present invention is an image encoding device that generates encoded data of an upper layer from an input image, comprising: a parameter set decoding unit that decodes a parameter set; and a prediction image encoding unit that generates a prediction image by inter-layer prediction with reference to decoded pixels of a reference layer picture, wherein the parameter set decoding unit encodes inter-layer phase correspondence information, which is information relating to a target layer pixel and the position on the reference layer picture corresponding to the target layer pixel, and the prediction image encoding unit executes, when performing inter-layer prediction, a corresponding reference position derivation process that derives the reference layer position corresponding to a prediction target pixel based on the inter-layer phase correspondence information.
- an encoded data conversion apparatus according to the present invention is an encoded data conversion apparatus that converts input hierarchically encoded data based on input attention area information and outputs the hierarchically encoded data after conversion, comprising: a parameter set decoding unit that decodes a pre-correction parameter set from the input hierarchically encoded data; a parameter set correction unit that corrects the pre-correction parameter set based on the input attention area information to generate a post-correction parameter set; and a NAL selection unit that selects, based on tile information and the attention area information, the coding layer NALs to be included in the output hierarchically encoded data, wherein the NAL selection unit sets tiles at least part of whose region overlaps with the attention area indicated by the attention area information as extraction target tiles, selects the video coding layer NALs corresponding to the slices included in the extraction target tiles as video coding layer NALs to be included in the hierarchically encoded data after conversion, and the parameter set correction unit corrects the picture size and tile information included in the parameter set based on the extraction target tiles.
- preferably, the parameter set correction unit corrects display area information included in the parameter set so as to match the attention area information.
- preferably, the parameter set further includes inter-layer pixel correspondence information and inter-layer phase correspondence information, and the parameter set correction unit corrects the inter-layer pixel correspondence information and the inter-layer phase correspondence information so that the position on the reference layer corresponding to an upper layer pixel in the converted hierarchically encoded data is close to the reference layer position corresponding to that upper layer pixel in the hierarchically encoded data before conversion.
- the present invention can be suitably applied to a hierarchical image decoding device that decodes encoded data in which image data is hierarchically encoded, and to a hierarchical image encoding device that generates encoded data in which image data is hierarchically encoded. The present invention can also be suitably applied to the data structure of hierarchically encoded data that is generated by the hierarchical image encoding device and referenced by the hierarchical image decoding device.
- 1 Hierarchical video decoding device (image decoding device), 11 NAL demultiplexing unit, 12 Parameter set decoding unit, 13 Tile setting unit, 14 Slice decoding unit, 141 Slice header decoding unit, 142 Slice position setting unit, 144 CTU decoding unit, 1441 Prediction residual restoring unit, 1442 Prediction image generation unit, 1443 CTU decoded image generation unit, 15 Base decoding unit, 151 Variable length decoding unit, 152 Base parameter set decoding unit, 153 Base picture decoding unit, 154 Base slice decoding unit, 156 Base decoded picture management unit, 16 Decoded picture management unit, 2 Hierarchical video encoding device (image encoding device), 21 NAL multiplexing unit, 22 Parameter set encoding unit, 23 Tile setting unit, 24 Slice encoding unit, 241 Slice header setting unit, 242 Slice position setting unit, 244 CTU encoding unit, 2441 Prediction residual encoding unit, 2442 Prediction image encoding unit, 3 Hierarchical encoded data conversion device (encoded data conversion device), 32 Parameter set correction unit, 34 NAL selection unit
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
The hierarchical moving image decoding device (image decoding device) 1 according to the present embodiment decodes encoded data that has been hierarchically encoded by the hierarchical moving image encoding device (image encoding device) 2. Hierarchical coding is a coding scheme that encodes a moving image hierarchically from low quality to high quality, and is standardized, for example, in SVC and SHVC. The quality of a moving image here broadly means any element that affects the subjective and objective appearance of the moving image; it includes, for example, "resolution", "frame rate", "image quality", and "pixel representation accuracy". Accordingly, when moving images are said to differ in quality below, this illustratively means that they differ in "resolution" or the like, but it is not limited to this; for example, moving images quantized with different quantization steps (that is, encoded with different coding noise) also differ in quality from each other.
Encoding and decoding of hierarchically encoded data are described with reference to FIG. 2, which schematically illustrates the case where a moving image is hierarchically encoded/decoded in three layers: a lower layer L3, a middle layer L2, and an upper layer L1. In the examples shown in FIGS. 2(a) and 2(b), the upper layer L1 is the highest layer and the lower layer L3 is the lowest layer of the three.
In the following, the case where HEVC and its extensions are used as the coding scheme for generating the encoded data of each layer is described as an example. However, this is not limiting; the encoded data of each layer may be generated by a coding scheme such as MPEG-2 or H.264/AVC.
FIG. 3 illustrates the data structure of encoded data that can be adopted in the base layer (in the example of FIG. 2, hierarchically encoded data DATA#C). The hierarchically encoded data DATA#C illustratively includes a sequence and a plurality of pictures constituting the sequence.
The sequence layer defines a set of data referred to by the hierarchical moving image decoding device 1 in order to decode the sequence SEQ to be processed (hereinafter also referred to as the target sequence). As shown in FIG. 3(a), the sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), pictures PICT1 to PICTNP (NP is the total number of pictures included in the sequence SEQ), and supplemental enhancement information SEI (Supplemental Enhancement Information).
The picture layer defines a set of data referred to by the hierarchical moving image decoding device 1 in order to decode the picture PICT to be processed (hereinafter also referred to as the target picture). As shown in FIG. 3(b), the picture PICT includes slice headers SH1 to SHNS and slices S1 to SNS (NS is the total number of slices included in the picture PICT).
The slice layer defines a set of data referred to by the hierarchical moving image decoding device 1 in order to decode the slice S to be processed (also referred to as the target slice). As shown in FIG. 3(c), the slice S includes coding tree units CTU1 to CTUNC (NC is the total number of CTUs included in the slice S).
The CTU layer defines a set of data referred to by the hierarchical moving image decoding device 1 in order to decode the coding tree unit CTU to be processed (hereinafter also referred to as the target CTU). The coding tree unit is also called a coding tree block (CTB) or a largest coding unit (LCU).
The CTU header CTUH contains coding parameters referred to by the hierarchical moving image decoding device 1 in order to determine how to decode the target CTU. Specifically, as shown in FIG. 3(d), it contains CTU split information SP_CTU specifying the split pattern of the target CTU into CUs, and a quantization parameter difference Δqp (qp_delta) specifying the size of the quantization step.
The CU layer defines a set of data referred to by the hierarchical moving image decoding device 1 in order to decode the CU to be processed (hereinafter also referred to as the target CU).
Next, the specific contents of the data included in the CU information CU are described with reference to FIG. 3(e). As shown in FIG. 3(e), the CU information CU specifically includes a skip flag SKIP, prediction tree information (hereinafter abbreviated as PT information) PTI, and transform tree information (hereinafter abbreviated as TT information) TTI.
The PT information PTI is information about the prediction tree (hereinafter abbreviated as PT) included in the CU. In other words, the PT information PTI is a set of information about each of one or more PUs included in the PT, and is referred to when the hierarchical moving image decoding device 1 generates a prediction image. As shown in FIG. 3(e), the PT information PTI includes prediction type information PType and prediction information PInfo.
The TT information TTI is information about the transform tree (hereinafter abbreviated as TT) included in the CU. In other words, the TT information TTI is a set of information about each of one or more transform blocks included in the TT, and is referred to when the hierarchical moving image decoding device 1 decodes residual data.
Process 2: quantize the transform coefficients obtained in process 1;
Process 3: variable-length encode the transform coefficients quantized in process 2;
The quantization parameter qp mentioned above represents the size of the quantization step QP used when the hierarchical moving image encoding device 2 quantizes the transform coefficients (QP = 2^(qp/6)).
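As a small illustration of the relation above, the quantization step doubles each time the quantization parameter qp increases by 6; this minimal sketch simply evaluates the formula QP = 2^(qp/6) as given in the text:

```python
import math

def quantization_step(qp):
    """Quantization step size corresponding to quantization parameter qp (QP = 2^(qp/6))."""
    return 2 ** (qp / 6)

# The step size doubles whenever qp increases by 6:
assert math.isclose(quantization_step(28), 2 * quantization_step(22))
```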
The PU partition types specified by the PU partition information include the following eight patterns in total, where the size of the target CU is 2N×2N pixels: four symmetric splittings of 2N×2N pixels, 2N×N pixels, N×2N pixels, and N×N pixels, and four asymmetric splittings of 2N×nU pixels, 2N×nD pixels, nL×2N pixels, and nR×2N pixels. Here, N = 2^m (m is an arbitrary integer of 1 or more). Hereinafter, a prediction unit obtained by partitioning the target CU is referred to as a prediction block or a partition. (A sketch enumerating these patterns is given below.)
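A small sketch enumerating the block dimensions of the eight PU partition types for a 2N×2N CU; the helper is illustrative, and n denotes N/2 as used in the asymmetric partitions:

```python
def pu_partitions(two_n):
    """Return, for each of the eight PU partition types of a 2N x 2N CU,
    the list of (width, height) pairs of its prediction blocks (n = N/2)."""
    N = two_n // 2
    n = N // 2
    return {
        '2Nx2N': [(two_n, two_n)],
        '2NxN':  [(two_n, N)] * 2,
        'Nx2N':  [(N, two_n)] * 2,
        'NxN':   [(N, N)] * 4,
        '2NxnU': [(two_n, n), (two_n, two_n - n)],
        '2NxnD': [(two_n, two_n - n), (two_n, n)],
        'nLx2N': [(n, two_n), (two_n - n, two_n)],
        'nRx2N': [(two_n - n, two_n), (n, two_n)],
    }

print(pu_partitions(32)['2NxnU'])  # [(32, 8), (32, 24)]
```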
For the encoded data included in the layer representation of the enhancement layer (hereinafter, enhancement layer encoded data), a data structure substantially similar to, for example, the data structure shown in FIG. 3 can be adopted. However, in the enhancement layer encoded data, additional information may be added and parameters may be omitted, as described below.
Next, pictures, tiles, and slices, which are important concepts in the present invention, their mutual relations, and their relations to the encoded data are described with reference to FIG. 4. FIG. 4 illustrates the relation between pictures and tiles/slices in hierarchically encoded data. A tile is associated with a rectangular partial region within a picture and with the encoded data related to that partial region. A slice is associated with a partial region within a picture and with the encoded data related to that partial region, that is, the slice header and slice data related to that partial region.
Prior to the description of the hierarchical moving image decoding device 1, the hierarchical moving image encoding device 2, and the encoded data conversion device 3 according to the present embodiment, an example of a system that can be realized by combining them is described with reference to FIG. 5. FIG. 5 illustrates a system SYS_ROI1 that transmits and reproduces hierarchical moving images and that can be realized by combining the hierarchical moving image decoding device 1, the hierarchical moving image encoding device 2, and the encoded data conversion device 3.
In the following, the configuration of the hierarchical moving image decoding device 1 according to the present embodiment is described with reference to FIGS. 1 to 20.
A schematic configuration of the hierarchical moving image decoding device 1 is described with reference to FIG. 6, which is a functional block diagram showing the schematic configuration of the hierarchical moving image decoding device 1. The hierarchical moving image decoding device 1 decodes hierarchically encoded data DATA (hierarchically encoded data DATAF provided from the hierarchical moving image encoding device 2, or hierarchically encoded data DATAR provided from the encoded data conversion device 3) to generate a decoded image POUT#T of the target layer. In the following description, the target layer is an enhancement layer whose reference layer is the base layer. Therefore, the target layer is also an upper layer with respect to the reference layer; conversely, the reference layer is also a lower layer with respect to the target layer.
The parameter set decoding unit 12 decodes the parameter sets (VPS, SPS, PPS) used for decoding the target layer from the input encoded data of the target layer and outputs them. In general, decoding of a parameter set is performed based on a predefined syntax table; that is, a bit string is read from the encoded data according to the procedure defined by the syntax table, and the syntax values of the syntax elements included in the syntax table are decoded. Variables derived based on the decoded syntax values may also be derived as necessary and included in the output parameter set. Therefore, the parameter set output from the parameter set decoding unit 12 can also be expressed as the set of the syntax values of the syntax elements related to the parameter sets (VPS, SPS, PPS) included in the encoded data, together with the variables derived from those syntax values.
The parameter set decoding unit 12 decodes picture information from the input target layer encoded data. Roughly speaking, the picture information is information that determines the size of the decoded picture of the target layer. For example, the picture information includes information representing the width and height of the decoded picture of the target layer.
The parameter set decoding unit 12 decodes display area information from the input target layer encoded data. The display area information is included, for example, in the SPS and is decoded according to the syntax table shown in FIG. 9. FIG. 9 is the part of the syntax table referred to by the parameter set decoding unit 12 when decoding the SPS that relates to the display area information.
The parameter set decoding unit 12 decodes inter-layer position correspondence information from the input target layer encoded data. Roughly speaking, the inter-layer position correspondence information indicates the positional relation between corresponding regions of the target layer and the reference layer. For example, when an object (object A) is included in the picture of the target layer and in the picture of the reference layer, the region corresponding to object A on the target layer picture and the region corresponding to object A on the reference layer picture correspond to the corresponding regions of the target layer and the reference layer mentioned above. The inter-layer position correspondence information does not necessarily have to indicate the positional relation of the corresponding regions of the target layer and the reference layer exactly, but in general it indicates that positional relation accurately in order to improve the accuracy of inter-layer prediction.
The inter-layer pixel correspondence information is included, for example, in the SPS extension (sps_extension), which is part of the SPS of the upper layer, and is decoded according to the syntax table shown in FIG. 11. FIG. 11 is the part of the syntax table referred to by the parameter set decoding unit 12 when decoding the SPS that relates to the inter-layer pixel correspondence information.
The inter-layer phase correspondence information is included, for example, in the SPS extension, which is part of the SPS of the upper layer, and is decoded according to the syntax table shown in FIG. 13. FIG. 13 is the part of the syntax table referred to by the parameter set decoding unit 12 when decoding the SPS that relates to the inter-layer phase correspondence information.
That is, the value obtained by adding the phase offset to the position of the reference layer pixel corresponding to the target layer pixel PEL coincides with the position of the point on the reference layer corresponding to the target layer pixel.
In the example described with reference to FIG. 13 above, the reference layer phase offset is directly included in the SPS, but this is not limiting. For example, another parameter from which the reference layer phase offset can be derived may be included instead. Such an example is described with reference to the syntax table shown in FIG. 15. FIG. 15 is another example of the part of the syntax table referred to by the parameter set decoding unit 12 when decoding the SPS that relates to the inter-layer phase correspondence information.
Here, scale is the magnification factor of spatial scalability, and the region obtained by enlarging the picture of the reference layer by the factor indicated by scale is the reference layer corresponding region.
The tile setting unit 13 derives tile information of the picture based on the input parameter set and outputs it.
FIG. 16 is the part of the syntax table that relates to tile information and is referred to by the parameter set decoding unit 12 when decoding the PPS included in the parameter set.
When the value of uniform_spacing_flag is 1, it indicates that the tile sizes in the picture are uniform, that is, the widths and heights of the tiles are equal. When the value of uniform_spacing_flag is 0, it indicates that the tile sizes in the picture are non-uniform, that is, the widths and heights of the tiles included in the picture do not necessarily coincide.
That is, ColWidth[i], the width in CTU units of the i-th tile column, is computed as the difference between the (i+1)-th and the i-th boundary positions obtained by dividing the picture equally by the number of tile columns.
rowBd[j+1] = rowBd[j] + rowHeight[j]
Next, the tile-scan CTU address associated with the CTU identified by a raster-scan CTU address (ctbAddrRs) in the picture is derived by the following procedure.
tbY = ctbAddrRs / PicWidthInCtbsY
Next, the position (tileX, tileY) in tile units within the picture of the tile containing the target CTU is derived. tileX is set to the largest value of i for which the evaluation expression (tbX >= colBd[i]) is true. Similarly, tileY is set to the largest value of j for which the evaluation expression (tbY >= rowBd[j]) is true. (A sketch of these derivations follows.)
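The derivations above, namely the tile column widths under uniform spacing, the cumulative tile boundaries, and the tile containing a given CTU, can be sketched as follows; this is a Python rendering of the pseudocode in the text with illustrative helper names:

```python
def uniform_col_widths(pic_width_in_ctbs, num_tile_columns):
    """ColWidth[i] when uniform_spacing_flag == 1: difference of equal-division boundary positions."""
    return [((i + 1) * pic_width_in_ctbs) // num_tile_columns
            - (i * pic_width_in_ctbs) // num_tile_columns
            for i in range(num_tile_columns)]

def boundaries(sizes):
    """colBd / rowBd: cumulative boundaries, e.g. rowBd[j+1] = rowBd[j] + rowHeight[j]."""
    bd = [0]
    for s in sizes:
        bd.append(bd[-1] + s)
    return bd

def tile_of_ctu(ctb_addr_rs, pic_width_in_ctbs, col_bd, row_bd):
    """(tileX, tileY): largest i / j with tbX >= colBd[i] and tbY >= rowBd[j]."""
    tb_x = ctb_addr_rs % pic_width_in_ctbs
    tb_y = ctb_addr_rs // pic_width_in_ctbs
    tile_x = max(i for i in range(len(col_bd) - 1) if tb_x >= col_bd[i])
    tile_y = max(j for j in range(len(row_bd) - 1) if tb_y >= row_bd[j])
    return tile_x, tile_y
```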
(Slice decoding unit 14)
The slice decoding unit 14 generates and outputs a decoded picture based on the input VCL NAL, parameter set, and tile information.
The slice header decoding unit 141 decodes the slice header based on the input VCL NAL and parameter set, and outputs it to the slice position setting unit 142 and the CTU decoding unit 144.
The slice position setting unit 142 identifies the slice position within the picture based on the input slice header and tile information, and outputs it to the CTU decoding unit 144.
ctbX[0] = slice_segment_address % PicWidthInCtbsY
ctbY[0] = slice_segment_address / PicWidthInCtbsY
Here, CtbAddrRsToTs[X] is an array that converts a raster-scan address into a tile-scan address, and is included in the tile information input to the slice position setting unit.
ctbX[i] = CtbAddrTsToRs[ctbAddrTs[i]] % PicWidthInCtbsY
ctbY[i] = CtbAddrTsToRs[ctbAddrTs[i]] / PicWidthInCtbsY
That is, the tile-scan address of the target CTU is set to the value obtained by adding 1 to the tile-scan address of the immediately preceding CTU. The obtained tile-scan address is then converted into a raster-scan address using the conversion array CtbAddrTsToRs included in the tile information. The position (ctbX[i], ctbY[i]) of the CTU within the picture is derived from the raster-scan address and the picture width in CTU units.
ctbYInLumaPixels[i] = ctbY[i] << CtbLog2SizeY
By the above processing, the slice position setting unit 142 computes and outputs the position within the picture of each CTU included in the slice. (A sketch of this derivation follows.)
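A minimal sketch of this CTU position derivation; the conversion arrays CtbAddrRsToTs and CtbAddrTsToRs are assumed to be provided by the tile information, as stated above:

```python
def ctu_positions(slice_segment_address, num_ctus_in_slice,
                  pic_width_in_ctbs, ctb_addr_rs_to_ts, ctb_addr_ts_to_rs, ctb_log2_size):
    """Return the (x, y) position in luma samples of each CTU of the slice."""
    positions = []
    addr_ts = ctb_addr_rs_to_ts[slice_segment_address]   # tile-scan address of the first CTU
    for _ in range(num_ctus_in_slice):
        addr_rs = ctb_addr_ts_to_rs[addr_ts]              # tile scan -> raster scan
        ctb_x = addr_rs % pic_width_in_ctbs
        ctb_y = addr_rs // pic_width_in_ctbs
        positions.append((ctb_x << ctb_log2_size, ctb_y << ctb_log2_size))
        addr_ts += 1                                       # next CTU in tile scan order
    return positions
```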
Roughly speaking, the CTU decoding unit 144 generates the decoded image of the slice by decoding, based on the input slice header, slice data, and parameter set, the decoded image of the region corresponding to each CTU included in the slice. The decoded image of the slice is output, as part of the decoded picture, at the position indicated by the input slice position. The decoded image of a CTU is generated by the prediction residual restoring unit 1441, the prediction image generation unit 1442, and the CTU decoded image generation unit 1443 inside the CTU decoding unit 144. The prediction residual restoring unit 1441 decodes the prediction residual information (TT information) included in the input slice data, and generates and outputs the prediction residual of the target CTU. The prediction image generation unit 1442 generates and outputs a prediction image based on the prediction method and prediction parameters indicated by the prediction information (PT information) included in the input slice data; at that time, decoded images of reference pictures and coding parameters are used as necessary. The CTU decoded image generation unit 1443 adds the input prediction image and prediction residual to generate and output the decoded image of the target CTU.
Of the prediction image generation processing by the prediction image generation unit 1442 described above, the details of the prediction image generation processing when inter-layer image prediction is selected are as follows.
SRLH = currPicH - SRLTopOffset - SRLBottomOffset
scaleX = refPicW ÷ SRLW
scaleY = refPicH ÷ SRLH
Here, currPicW and currPicH are the width and height of the target picture; when the target of the corresponding reference position derivation process is a luma pixel, they coincide with the respective syntax values of pic_width_in_luma_samples and pic_height_in_luma_samples included in the picture information of the SPS of the target layer. When the target is chroma, values obtained by converting those syntax values according to the type of color format are used; for example, when the color format is 4:2:2, half of each syntax value is used. refPicW and refPicH are the width and height of the reference picture; when the target is a luma pixel, they coincide with the respective syntax values of pic_width_in_luma_samples and pic_height_in_luma_samples included in the picture information of the SPS of the reference layer. SRLLeftOffset, SRLRightOffset, SRLTopOffset, and SRLBottomOffset are the inter-layer pixel correspondence offsets described with reference to FIG. 12.
yRefTmp = (yP - SRLTopOffset) * scaleY
Here, xP and yP represent the horizontal and vertical components of the target layer pixel with respect to the top-left pixel of the target layer picture, in pixel units of the target layer picture. For a real number X, Floor(X) means the largest integer not exceeding X.
phaseOffsetY = ref_layer_top_phase_offset[rlIdx] ÷ 8
Here, rlIdx is an index that selects the reference layer at the time of the corresponding reference position derivation process. In the above equations, the phase offsets phaseOffsetX and phaseOffsetY are in pixel units while the reference layer phase offsets ref_layer_left_phase_offset[rlIdx] and ref_layer_top_phase_offset[rlIdx] are in units of 1/8 pixel, so the value obtained by dividing the latter by 8 is set as the phase offset. When the phase offset and the reference layer phase offset are expressed in different units, an adjustment matching the difference in units should be made as appropriate, and it is not always necessary to set the phase offset exactly according to the above equations.
yRef = yRefTmp + phaseOffsetY
That is, the value obtained by adding the phase offset to the provisional reference pixel position is derived as the corresponding reference position. When the provisional reference pixel position, the phase offset, and the corresponding reference position are expressed in different units, it is not always necessary to compute the corresponding reference position according to the above equation, and an adjustment to align the units should be made as appropriate.
yRef16 = Floor((yRefTmp + phaseOffsetY) * 16)
In general, it is preferable to derive the corresponding reference position in a unit or representation suitable for applying the filter processing; for example, it is preferable to derive the target reference position in an integer representation whose precision matches the minimum unit referred to by the interpolation filter.
croppedOffsetY = (- cropped_ref_layer_top_offset[i] << 1)
phaseOffsetX = Frac((xP - croppedOffsetX) * scaleX)
phaseOffsetY = Frac((yP - croppedOffsetY) * scaleY)
Here, Frac(X) means the fractional part of X, with the relation Frac(X) = X - Floor(X). (A consolidated sketch of the whole derivation follows.)
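Putting the above formulas together, a minimal sketch of the corresponding reference position derivation for a luma pixel follows, using the variant in which the reference layer phase offsets are signalled in units of 1/8 pixel (FIG. 13). This is a floating-point rendering for illustration only; the actual process uses the integer arithmetic and unit conventions described above, with the final positions expressed in 1/16 luma sample units as in the xRef16/yRef16 equations:

```python
import math

def corresponding_ref_position(xp, yp, curr_w, curr_h, ref_w, ref_h,
                               srl_left, srl_right, srl_top, srl_bottom,
                               phase_left_eighths, phase_top_eighths):
    """Return (xRef16, yRef16): the reference layer position in 1/16 luma sample units."""
    srl_w = curr_w - srl_left - srl_right          # width of the reference layer corresponding region
    srl_h = curr_h - srl_top - srl_bottom          # height of the reference layer corresponding region
    scale_x = ref_w / srl_w                        # inter-layer size ratio (horizontal)
    scale_y = ref_h / srl_h                        # inter-layer size ratio (vertical)
    x_ref_tmp = (xp - srl_left) * scale_x          # provisional reference position
    y_ref_tmp = (yp - srl_top) * scale_y
    phase_x = phase_left_eighths / 8               # reference layer phase offset, 1/8 pel -> pel
    phase_y = phase_top_eighths / 8
    x_ref16 = math.floor((x_ref_tmp + phase_x) * 16)
    y_ref16 = math.floor((y_ref_tmp + phase_y) * 16)
    return x_ref16, y_ref16
```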
The hierarchical moving image decoding device 1 (hierarchical image decoding device) according to the present embodiment described above is an image decoding device that decodes the encoded data of the upper layer included in hierarchically encoded data and restores a decoded picture of the upper layer, which is the target layer, and includes a parameter set decoding unit 12 that decodes a parameter set and a prediction image generation unit 1442 that generates a prediction image by inter-layer prediction with reference to decoded pixels of a reference layer picture, and the parameter set decoding unit 12 decodes inter-layer phase correspondence information, which is information relating to a target layer pixel and the position on the reference layer picture corresponding to the target layer pixel.
A schematic configuration of the hierarchical moving image encoding device 2 is described with reference to FIG. 21, which is a functional block diagram showing the schematic configuration of the hierarchical moving image encoding device 2. The hierarchical moving image encoding device 2 encodes an input image PIN#T of the target layer while referring to reference layer encoded data DATA#R, and generates hierarchically encoded data DATA of the target layer. It is assumed that the reference layer encoded data DATA#R has already been encoded in the hierarchical moving image encoding device corresponding to the reference layer.
Next, the details of the configuration of the slice encoding unit 24 are described with reference to FIG. 22, which is a functional block diagram showing the schematic configuration of the slice encoding unit 24.
The hierarchical moving image encoding device 2 according to the present embodiment described above is an image encoding device that generates encoded data of the upper layer from an input image, and includes a parameter set encoding unit 22 that encodes a parameter set and a prediction image encoding unit 2442 that generates a prediction image by inter-layer prediction with reference to decoded pixels of a reference layer picture; the parameter set encoding unit 22 encodes inter-layer phase correspondence information, which is information relating to a target layer pixel and the position on the reference layer picture corresponding to the target layer pixel, and the prediction image encoding unit 2442 executes, when performing inter-layer prediction, a corresponding reference position derivation process that derives the reference layer position corresponding to the prediction target pixel based on the inter-layer phase correspondence information.
A schematic configuration of the hierarchical encoded data conversion device 3 is described with reference to FIG. 23, which is a functional block diagram showing the schematic configuration of the hierarchical encoded data conversion device 3. The hierarchical encoded data conversion device 3 converts input hierarchically encoded data DATA and generates hierarchically encoded data DATA-ROI related to input attention area information. The hierarchically encoded data DATA is hierarchically encoded data generated by the hierarchical moving image encoding device 2. By inputting the hierarchically encoded data DATA-ROI to the hierarchical moving image decoding device 1, a moving image of the upper layer related to the attention area information can be reproduced.
First, an outline of the parameter set modification performed by the conversion processing of the hierarchical encoded data conversion device 3 is described with reference to FIG. 24, which illustrates the relation between pictures, the attention area, and tiles in the hierarchically encoded data before and after conversion. FIG. 24(a) shows the relation between the picture, the attention area, and the tiles in the hierarchically encoded data before conversion. According to the parameter set of the hierarchically encoded data before conversion, the picture (before conversion) is composed of nine tiles in total, three in each of the vertical and horizontal directions (tiles T00, T01, T02, T10, T11, T12, T20, T21, T22 in raster scan order from the top left). The attention area is set in the upper right part of the picture and overlaps with the tiles T01, T02, T11, and T12. FIG. 24(b) shows the relation between the picture, the attention area, and the tiles in the hierarchically encoded data after conversion. According to the parameter set of the hierarchically encoded data after conversion, the picture (after conversion) is composed of four tiles in total, two in each of the vertical and horizontal directions (tiles T01, T02, T11, T12). That is, the tiles that were in the picture before conversion but have no region overlapping with the attention area (tiles T00, T10, T20, T21, T22) are not included in the picture after conversion.
The parameter set correction unit 32 refers to the input attention area information and tile information, and updates the PPS tile information so that it includes only the tiles (extraction target tiles) part of whose corresponding region overlaps with the attention area. The PPS tile information is updated based on the information of the extraction target tiles. First, when there is one extraction target tile, tiles_enabled_flag is corrected to 0; when there are two or more extraction target tiles, this correction can be omitted. Next, the number of tile columns (num_tile_columns_minus1) and the number of tile rows (num_tile_rows_minus1) are corrected based on the numbers of extraction target tiles included in the horizontal and vertical directions of the picture. Next, when the tile sizes are non-uniform (uniform_spacing_flag is 0), the bit strings corresponding to the syntax elements related to the widths of tile columns not containing an extraction target tile and to the heights of tile rows not containing an extraction target tile are deleted from the parameter set.
The parameter set correction unit 32 corrects the picture information, using the region corresponding to the set of extraction target tiles as the picture size after conversion. In the example shown in FIG. 24, the sum of the widths of the tile columns containing the tiles T01 and T02, respectively, is set as the corrected picture width pic_width_in_luma_samples of the target layer, and the sum of the heights of the tile rows containing the tiles T01 and T11, respectively, is set as the picture height pic_height_in_luma_samples of the target layer.
Based on the change in picture size, the parameter set correction unit 32 corrects the inter-layer pixel correspondence information included in the parameter set. Specifically, all the inter-layer pixel correspondence offsets included in the inter-layer pixel correspondence information are corrected. To the scaled reference layer left offset (scaled_ref_layer_left_offset[i]) constituting the inter-layer pixel correspondence offset, the sum of the widths of the tile columns that are to the left of the attention area and do not contain an extraction target tile is added; for example, in the example of FIG. 24, the width of the tile column containing the tile T00 is added. Similarly, to the scaled reference layer top offset (scaled_ref_layer_top_offset[i]), the sum of the heights of the tile rows that are above the attention area and do not contain an extraction target tile is added. Similarly, to the scaled reference layer right offset (scaled_ref_layer_right_offset[i]), the sum of the widths of the tile columns that are to the right of the attention area and do not contain an extraction target tile is added. Similarly, to the scaled reference layer bottom offset (scaled_ref_layer_bottom_offset[i]), the sum of the heights of the tile rows that are below the attention area and do not contain an extraction target tile is added.
Based on the change in picture size, the parameter set correction unit 32 corrects the inter-layer phase correspondence information included in the parameter set. Roughly speaking, the inter-layer phase correspondence information is corrected so that the phase of the top-left pixel of the picture after conversion matches the phase of the same pixel before conversion. In other words, the correction is made so that the phase derived by the corresponding reference position derivation process for the top-left pixel of the extraction target region before conversion (the top-left pixel of the most top-left tile among the extraction target tiles) matches the phase derived by the corresponding reference position derivation process for the top-left pixel of the picture after conversion. The correction does not necessarily have to make them match exactly; the effect of the present invention is obtained as long as the correction brings the phases closer than when no correction is made.
phaseTAft = Frac(Frac(yL0Ref) - Frac(yLARefTmp))
The above equation is derived from the fact that the fractional part of the sum of the corrected reference layer phase offset and the corresponding reference position of the top-left pixel of the converted picture when the reference layer phase offset is 0 matches the fractional part of the corresponding reference position of the top-left pixel of the extraction target region before conversion.
The parameter set correction unit 32 rewrites the display area information of the SPS included in the input parameter set so as to match the attention area indicated by the input attention area information. When the syntax described with reference to FIG. 9 is used as the display area information of the SPS, the display area information is rewritten by the following steps S301 to S303.
The hierarchical encoded data conversion process by the hierarchical encoded data conversion device 3 is realized by sequentially executing the procedures shown in S501 to S506.
The hierarchical encoded data conversion device 3 according to the present embodiment described above includes a NAL selection unit 34 that modifies, based on attention area information, part of the video coding layer encoded data (VCL NAL) included in the encoded data of the target layer (upper layer), and a parameter set correction unit 32. The NAL selection unit 34 selects, based on the attention area indicated by the attention area information, tiles having a region overlapping with the attention area as extraction target tiles, and the video coding layer encoded data related to the slices included in the selected extraction target tiles is included in the hierarchically encoded data after conversion. The parameter set correction unit 32 corrects the picture information, the PPS tile information, the display information, the inter-layer pixel correspondence information, and the inter-layer phase correspondence information based on the attention area information and the tile information.
By combining the hierarchical moving image decoding device 1, the hierarchical moving image encoding device 2, and the hierarchical encoded data conversion device 3 described above, a system for displaying an attention area based on attention area information (attention area display system SYS) can be configured.
The processing by the attention area display system can be divided into a hierarchical encoded data generation and accumulation process and an attention area data generation and reproduction process.
The attention area display system SYS according to the present embodiment described above includes an attention area notification unit (ROI notification unit SYS8) that supplies attention area information, a hierarchical encoded data conversion unit SYS3 that converts hierarchically encoded data based on the attention area information to generate converted hierarchically encoded data, a hierarchical moving image decoding unit SYS4 that decodes the converted hierarchically encoded data and outputs decoded pictures of the upper layer and the lower layer, a display control unit SYS5, an attention area display unit (ROI display unit SYS6), and a whole display unit SYS7. The display control unit SYS5 supplies the decoded picture of the lower layer to the whole display unit SYS7, and supplies the decoded picture of the upper layer to the attention area display unit.
The hierarchical moving image encoding device 2 and the hierarchical moving image decoding device 1 described above can be mounted on and used in various devices that perform transmission, reception, recording, and reproduction of moving images. The moving image may be a natural moving image captured by a camera or the like, or an artificial moving image (including CG and GUI) generated by a computer or the like.
Finally, each block of the hierarchical moving image decoding device 1 and the hierarchical moving image encoding device 2 may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (Central Processing Unit).
In order to solve the above problems, an image decoding device according to the present invention is an image decoding device that decodes the encoded data of the upper layer included in hierarchically encoded data and restores a decoded picture of the upper layer, which is the target layer, and includes a parameter set decoding unit that decodes a parameter set and a prediction image generation unit that generates a prediction image by inter-layer prediction with reference to decoded pixels of a reference layer picture, wherein the parameter set decoding unit decodes inter-layer phase correspondence information, which is information relating to a target layer pixel and the position on the reference layer picture corresponding to the target layer pixel.
11 NAL demultiplexing unit
12 Parameter set decoding unit
13 Tile setting unit
14 Slice decoding unit
141 Slice header decoding unit
142 Slice position setting unit
144 CTU decoding unit
1441 Prediction residual restoring unit
1442 Prediction image generation unit
1443 CTU decoded image generation unit
15 Base decoding unit
151 Variable length decoding unit
152 Base parameter set decoding unit
153 Base picture decoding unit
154 Base slice decoding unit
156 Base decoded picture management unit
16 Decoded picture management unit
2 Hierarchical moving image encoding device (image encoding device)
21 NAL multiplexing unit
22 Parameter set encoding unit
23 Tile setting unit
24 Slice encoding unit
241 Slice header setting unit
242 Slice position setting unit
244 CTU encoding unit
2441 Prediction residual encoding unit
2442 Prediction image encoding unit
3 Hierarchical encoded data conversion device (encoded data conversion device)
32 Parameter set correction unit
34 NAL selection unit
Claims (8)
- An image decoding device that decodes encoded data of an upper layer included in hierarchically encoded data and restores a decoded picture of the upper layer, which is the target layer, comprising: a parameter set decoding unit that decodes a parameter set; and a prediction image generation unit that generates a prediction image by inter-layer prediction with reference to decoded pixels of a reference layer picture, wherein the parameter set decoding unit decodes inter-layer phase correspondence information, which is information relating to a target layer pixel and a position on the reference layer picture corresponding to the target layer pixel.
- The image decoding device according to claim 1, wherein the parameter set decoding unit decodes inter-layer pixel correspondence offsets indicating the offsets of the sides of the reference layer corresponding region with respect to the respective sides of the target layer picture, and the prediction image generation unit derives a provisional reference position by scaling, using an inter-layer size ratio, the position of the target layer pixel with respect to a first pixel of the reference layer corresponding region derived using the inter-layer pixel correspondence offsets, and derives a corresponding reference position using the reference layer phase offset derived from the inter-layer phase correspondence information and the provisional reference position.
- The image decoding device according to claim 2, wherein the inter-layer phase correspondence information includes a reference layer phase offset count, which is an amount indicating the number of reference layer phase offsets included in the parameter set.
- The image decoding device according to claim 2, wherein the parameter set decoding unit decodes a reference layer crop offset as the inter-layer phase correspondence information, and the prediction image generation unit derives the reference layer phase offset using the reference layer crop offset and the inter-layer size ratio.
- An image encoding device that generates encoded data of an upper layer from an input image, comprising: a parameter set decoding unit that decodes a parameter set; and a prediction image encoding unit that generates a prediction image by inter-layer prediction with reference to decoded pixels of a reference layer picture, wherein the parameter set decoding unit encodes inter-layer phase correspondence information, which is information relating to a target layer pixel and a position on the reference layer picture corresponding to the target layer pixel, and the prediction image encoding unit executes, when performing inter-layer prediction, a corresponding reference position derivation process that derives a reference layer position corresponding to a prediction target pixel based on the inter-layer phase correspondence information.
- A hierarchical encoded data conversion device that converts input hierarchically encoded data based on input attention area information and outputs hierarchically encoded data after conversion, comprising: a parameter set decoding unit that decodes a pre-correction parameter set from the input hierarchically encoded data; a parameter set correction unit that corrects the pre-correction parameter set based on the input attention area information to generate a post-correction parameter set; and a NAL selection unit that selects, based on tile information and the attention area information, coding layer NALs to be included in the output hierarchically encoded data, wherein the NAL selection unit sets tiles at least part of whose region overlaps with the attention area indicated by the attention area information as extraction target tiles, and selects video coding layer NALs corresponding to the slices included in the extraction target tiles as video coding layer NALs to be included in the hierarchically encoded data after conversion, and the parameter set correction unit corrects the picture size and tile information included in the parameter set based on the extraction target tiles.
- The hierarchical encoded data conversion device according to claim 6, wherein the parameter set correction unit corrects display area information included in the parameter set so as to match the attention area information.
- The hierarchical encoded data conversion device according to claim 6 or 7, wherein the parameter set further includes inter-layer pixel correspondence information and inter-layer phase correspondence information, and the parameter set correction unit corrects the inter-layer pixel correspondence information and the inter-layer phase correspondence information so that the position on the reference layer corresponding to an upper layer pixel in the hierarchically encoded data after conversion becomes close to the reference layer position corresponding to that upper layer pixel in the hierarchically encoded data before conversion.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/027,455 US10225567B2 (en) | 2013-10-08 | 2014-10-07 | Image decoder, image encoder, and encoded data converter |
CN201480052141.6A CN105580370A (zh) | 2013-10-08 | 2014-10-07 | 图像解码装置、图像编码装置以及编码数据变换装置 |
JP2015541595A JP6363088B2 (ja) | 2013-10-08 | 2014-10-07 | 画像復号装置、画像復号方法、画像符号化装置、および画像符号化方法 |
EP14851834.3A EP3057327A4 (en) | 2013-10-08 | 2014-10-07 | Image decoder, image encoder, and encoded data converter |
HK16111660.8A HK1223471A1 (zh) | 2013-10-08 | 2016-10-07 | 圖像解碼裝置、圖像編碼裝置以及編碼數據變換裝置 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013-211231 | 2013-10-08 | ||
JP2013211231 | 2013-10-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015053287A1 true WO2015053287A1 (ja) | 2015-04-16 |
Family
ID=52813103
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/076853 WO2015053287A1 (ja) | 2013-10-08 | 2014-10-07 | 画像復号装置、画像符号化装置、および、符号化データ変換装置 |
Country Status (6)
Country | Link |
---|---|
US (1) | US10225567B2 (ja) |
EP (1) | EP3057327A4 (ja) |
JP (1) | JP6363088B2 (ja) |
CN (1) | CN105580370A (ja) |
HK (1) | HK1223471A1 (ja) |
WO (1) | WO2015053287A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2015083575A1 (ja) * | 2013-12-06 | 2017-03-16 | ホアウェイ・テクノロジーズ・カンパニー・リミテッド | 画像復号装置、画像符号化装置、および、符号化データ変換装置 |
JP2017507548A (ja) * | 2014-01-02 | 2017-03-16 | ヴィド スケール インコーポレイテッド | インターレースおよびプログレッシブ混合のコンテンツを用いるスケーラブルビデオコーディングのための方法およびシステム |
JPWO2018123608A1 (ja) * | 2016-12-27 | 2019-10-31 | ソニー株式会社 | 画像処理装置および方法 |
JP2022514558A (ja) * | 2018-12-20 | 2022-02-14 | テレフオンアクチーボラゲット エルエム エリクソン(パブル) | ピクチャにおける均一なセグメントスプリットを使用したビデオコーディングのための方法および装置 |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106105208B (zh) * | 2014-01-09 | 2020-04-07 | 三星电子株式会社 | 可伸缩视频编码/解码方法和设备 |
US10051281B2 (en) * | 2014-05-22 | 2018-08-14 | Apple Inc. | Video coding system with efficient processing of zooming transitions in video |
US10178394B2 (en) * | 2016-06-10 | 2019-01-08 | Apple Inc. | Transcoding techniques for alternate displays |
EP3293981A1 (en) * | 2016-09-08 | 2018-03-14 | Koninklijke KPN N.V. | Partial video decoding method, device and system |
CN108156459A (zh) * | 2016-12-02 | 2018-06-12 | 北京中科晶上科技股份有限公司 | 可伸缩视频传输方法及*** |
KR20230117492A (ko) * | 2017-04-11 | 2023-08-08 | 브이아이디 스케일, 인크. | 면 연속성을 사용하는 360 도 비디오 코딩 |
CN110022481B (zh) | 2018-01-10 | 2023-05-02 | 中兴通讯股份有限公司 | 视频码流的解码、生成方法及装置、存储介质、电子装置 |
WO2019230904A1 (ja) * | 2018-06-01 | 2019-12-05 | シャープ株式会社 | 画像復号装置、および画像符号化装置 |
US10848768B2 (en) * | 2018-06-08 | 2020-11-24 | Sony Interactive Entertainment Inc. | Fast region of interest coding using multi-segment resampling |
CN109525842B (zh) * | 2018-10-30 | 2022-08-12 | 深圳威尔视觉科技有限公司 | 基于位置的多Tile排列编码方法、装置、设备和解码方法 |
CN118200565A (zh) | 2018-12-07 | 2024-06-14 | 松下电器(美国)知识产权公司 | 编码装置、解码装置和非暂时性的计算机可读介质 |
US11659201B2 (en) * | 2019-08-16 | 2023-05-23 | Qualcomm Incorporated | Systems and methods for generating scaling ratios and full resolution pictures |
WO2021061492A1 (en) * | 2019-09-24 | 2021-04-01 | Futurewei Technologies, Inc. | Layer based parameter set nal unit constraints |
EP4122207A4 (en) * | 2020-03-20 | 2024-05-08 | HFI Innovation Inc. | METHOD AND APPARATUS FOR SIGNALING TILE AND SLICE PARTITION INFORMATION IN IMAGE AND VIDEO CODING |
CN116469159B (zh) * | 2022-11-16 | 2023-11-14 | 北京理工大学 | 一种获取人体运动数据的方法及电子设备 |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009540666A (ja) * | 2006-11-09 | 2009-11-19 | エルジー エレクトロニクス インコーポレイティド | ビデオ信号のデコーディング/エンコーディング方法及び装置 |
EP1980107A4 (en) * | 2006-11-17 | 2010-01-13 | Lg Electronics Inc | METHOD AND APPARATUS FOR DECODING / ENCODING A VIDEO SIGNAL |
US9774927B2 (en) * | 2012-12-21 | 2017-09-26 | Telefonaktiebolaget L M Ericsson (Publ) | Multi-layer video stream decoding |
US9426468B2 (en) * | 2013-01-04 | 2016-08-23 | Huawei Technologies Co., Ltd. | Signaling layer dependency information in a parameter set |
US9992493B2 (en) * | 2013-04-01 | 2018-06-05 | Qualcomm Incorporated | Inter-layer reference picture restriction for high level syntax-only scalable video coding |
US9578328B2 (en) * | 2013-07-15 | 2017-02-21 | Qualcomm Incorporated | Cross-layer parallel processing and offset delay parameters for video coding |
KR101712108B1 (ko) * | 2013-07-16 | 2017-03-03 | 삼성전자 주식회사 | 비트 뎁스 및 컬러 포맷의 변환을 동반하는 업샘플링 필터를 이용하는 스케일러블 비디오 부호화 방법 및 장치, 스케일러블 비디오 복호화 방법 및 장치 |
CN105519119B (zh) * | 2013-10-10 | 2019-12-17 | 夏普株式会社 | 图像解码装置 |
DE102014115310A1 (de) * | 2014-10-21 | 2016-04-21 | Infineon Technologies Ag | Bilderzeugungsvorrichtungen und ein Laufzeit-Bilderzeugungsverfahren |
2014
- 2014-10-07 JP JP2015541595A patent/JP6363088B2/ja not_active Expired - Fee Related
- 2014-10-07 WO PCT/JP2014/076853 patent/WO2015053287A1/ja active Application Filing
- 2014-10-07 EP EP14851834.3A patent/EP3057327A4/en not_active Withdrawn
- 2014-10-07 US US15/027,455 patent/US10225567B2/en not_active Expired - Fee Related
- 2014-10-07 CN CN201480052141.6A patent/CN105580370A/zh active Pending

2016
- 2016-10-07 HK HK16111660.8A patent/HK1223471A1/zh unknown
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014025741A2 (en) * | 2012-08-06 | 2014-02-13 | Vid Scale, Inc. | Sampling grid information for spatial layers in multi-layer video coding |
WO2014169156A1 (en) * | 2013-04-10 | 2014-10-16 | General Instrument Corporation | Re-sampling with phase offset adjustment for luma and chroma in scalable video coding |
Non-Patent Citations (10)
Title |
---|
"MV-HEVC Draft Text 5", JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSION DEVELOPMENT OF ITU-T SG16 WP3 AND ISO/IEC JTC 1/SC 29/WG 115TH MEETING, 27 July 2013 (2013-07-27) |
"Recommendation H.265 (04/13", ITU-T, 7 June 2013 (2013-06-07) |
"SHVC Draft 3", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG16 WP3 AND ISO/IEC JTC 1/SC 29/WG 11 14TH MEETING, 25 July 2013 (2013-07-25) |
DO-KYOUNG KWON ET AL.: "Reference-layer cropping offsets signaling in SHVC", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCTVC-M0219-R2, 13TH MEETING, April 2013 (2013-04-01), INCHEON, KR, pages 1 - 5, XP030114176 * |
JIE DONG ET AL.: "Upsampling based on sampling grid information for aligned inter layer prediction", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCTVC-M0188_R1, 13TH MEETING, April 2013 (2013-04-01), INCHEON, KR, pages 1 - 11, XP030114145 * |
KAZUSHI SATO ET AL.: "Support of Field Coding for Signalling of Chroma Phase for Upsampling", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCTVC-N0248_R1, 14TH MEETING, July 2013 (2013-07-01), VIENNA, AT, pages 1 - 4, XP030114765 * |
KEMAL UGUR ET AL.: "AHG13: Signaling phase offset for upsampling in SHVC", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCTVC-M0231_V2, 13TH MEETING, April 2013 (2013-04-01), INCHEON, KR, pages 1 - 3, XP030114188 * |
KOOHYAR MINOO ET AL.: "AHG13: SHVC Upsampling with phase offset adjustment", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCTVC-M0263-R1, 13TH MEETING, April 2013 (2013-04-01), INCHEON, KR, pages 1 - 7, XP030114220 *
See also references of EP3057327A4 * |
TOMOYUKI YAMAMOTO ET AL.: "MV-HEVC/SHVC HLS: On conversion to ROI-oriented multi-layer bitstream", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCTVC-O0056-v3, 15TH MEETING, October 2013 (2013-10-01), GENEVA, CH, pages 1 - 5, XP030115029 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2015083575A1 (ja) * | 2013-12-06 | 2017-03-16 | ホアウェイ・テクノロジーズ・カンパニー・リミテッド | 画像復号装置、画像符号化装置、および、符号化データ変換装置 |
US10142653B2 (en) | 2013-12-06 | 2018-11-27 | Huawei Technologies Co., Ltd. | Image decoding apparatus, image coding apparatus, and coded data transformation apparatus |
JP2017507548A (ja) * | 2014-01-02 | 2017-03-16 | ヴィド スケール インコーポレイテッド | インターレースおよびプログレッシブ混合のコンテンツを用いるスケーラブルビデオコーディングのための方法およびシステム |
JPWO2018123608A1 (ja) * | 2016-12-27 | 2019-10-31 | ソニー株式会社 | 画像処理装置および方法 |
JP7052732B2 (ja) | 2016-12-27 | 2022-04-12 | ソニーグループ株式会社 | 画像処理装置および方法 |
US11336909B2 (en) | 2016-12-27 | 2022-05-17 | Sony Corporation | Image processing apparatus and method |
JP2022514558A (ja) * | 2018-12-20 | 2022-02-14 | テレフオンアクチーボラゲット エルエム エリクソン(パブル) | ピクチャにおける均一なセグメントスプリットを使用したビデオコーディングのための方法および装置 |
JP7182006B2 (ja) | 2018-12-20 | 2022-12-01 | テレフオンアクチーボラゲット エルエム エリクソン(パブル) | ピクチャにおける均一なセグメントスプリットを使用したビデオコーディングのための方法および装置 |
Also Published As
Publication number | Publication date |
---|---|
EP3057327A1 (en) | 2016-08-17 |
CN105580370A (zh) | 2016-05-11 |
JPWO2015053287A1 (ja) | 2017-03-09 |
EP3057327A4 (en) | 2017-05-17 |
US10225567B2 (en) | 2019-03-05 |
JP6363088B2 (ja) | 2018-07-25 |
US20160255354A1 (en) | 2016-09-01 |
HK1223471A1 (zh) | 2017-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6363088B2 (ja) | 画像復号装置、画像復号方法、画像符号化装置、および画像符号化方法 | |
JP6229904B2 (ja) | 画像復号装置、画像符号化装置、および、符号化データ変換装置 | |
JP6542201B2 (ja) | 画像復号装置および画像復号方法 | |
US10841600B2 (en) | Image decoding device, an image encoding device and a decoding method | |
US20160249056A1 (en) | Image decoding device, image coding device, and coded data | |
US10136161B2 (en) | DMM prediction section, image decoding device, and image coding device | |
WO2014162954A1 (ja) | 画像復号装置、および画像符号化装置 | |
JP2015073213A (ja) | 画像復号装置、画像符号化装置、符号化データ変換装置、および、注目領域表示システム | |
AU2020380731B2 (en) | High level syntax signaling method and device for image/video coding | |
CN114762350A (zh) | 基于切片类型的图像/视频编译方法和设备 | |
WO2013161689A1 (ja) | 動画像復号装置、および動画像符号化装置 | |
WO2015098713A1 (ja) | 画像復号装置および画像符号化装置 | |
JP2016143962A (ja) | 領域分割画像生成装置、画像復号装置、および符号化装置。 | |
WO2012147947A1 (ja) | 画像復号装置、および画像符号化装置 | |
JP2015177318A (ja) | 画像復号装置、画像符号化装置 | |
JP2016072941A (ja) | Dmm予測装置、画像復号装置、および画像符号化装置 | |
JP2015126508A (ja) | 画像復号装置、画像符号化装置、符号化データ変換装置、領域再生装置 | |
CN115668940A (zh) | 基于与画面输出相关的信息的图像或视频编码 | |
CN116982318A (zh) | 媒体文件处理方法及设备 | |
JP2015076807A (ja) | 画像復号装置、画像符号化装置、および符号化データのデータ構造 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 201480052141.6; Country of ref document: CN |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14851834; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2015541595; Country of ref document: JP; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 15027455; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| REEP | Request for entry into the european phase | Ref document number: 2014851834; Country of ref document: EP |
| WWE | Wipo information: entry into national phase | Ref document number: 2014851834; Country of ref document: EP |