GB2512563A - Method and apparatus for encoding an image into a video bitstream and decoding corresponding video bitstream with weighted residual predictions


Info

Publication number
GB2512563A
Authority
GB
United Kingdom
Prior art keywords
layer
block
residual block
coding unit
predictor
Prior art date
Legal status
Granted
Application number
GB1300149.0A
Other versions
GB2512563B (en)
GB201300149D0 (en)
Inventor
Christophe Gisquet
Edouard Francois
Guillaume Laroche
Patrice Onno
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc
Priority to GB1300149.0A
Publication of GB201300149D0
Publication of GB2512563A
Application granted
Publication of GB2512563B
Expired - Fee Related


Classifications

    • H04N19/39 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability, involving multiple description coding [MDC], i.e. with separate layers being structured as independently decodable descriptions of input picture data
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/187 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a scalable video layer
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability


Abstract

Method and device for encoding an image according to a scalable encoding (SVC) scheme having an enhancement layer and a reference (or base) layer. The method comprises obtaining a block predictor candidate for predicting a coding unit within the enhancement layer and an associated first residual block corresponding to the prediction 5.1. A block predictor is determined in the reference layer that is co-located with the obtained block predictor candidate in the enhancement layer 5.2. The determined block predictor in the reference layer is used to determine a second residual block that is associated with a coding unit in the reference layer that is co-located with the coding unit in the enhancement layer 5.3. Finally, a residual block is determined for the coding unit of the enhancement layer that corresponds to the difference between the first and the second residual blocks. A weighting factor is applied to at least one element among the obtained block predictor candidate, the determined block predictor and the reference layer residual block. A corresponding decoding method and device are also disclosed. The invention provides better handling of differences between the enhancement layer motion and the motion of the co-located block in the reference layer.

Description

METHOD AND APPARATUS FOR ENCODING AN IMAGE INTO A VIDEO
BITSTREAM AND DECODING CORRESPONDING VIDEO BITSTREAM WITH
WEIGHTED RESIDUAL PREDICTIONS
The present invention concerns a method for encoding an image of pixels and for decoding a corresponding bit stream and associated devices.
More particularly, it concerns weighted residual prediction according to a spatial scalable encoding scheme.
Scalable Video Coding (SVC) is the name of the Annex G extension of the H.264/MPEG-4 AVC video compression standard. Scalable video coding is based on the principle of encoding a base layer in low quality and an enhancement layer with complementary data allowing an enhanced version of this base layer to be encoded or decoded. The bitrate needed for encoding the base layer is kept relatively small, while each enhancement layer allows a better quality at the cost of extra bandwidth. The image within a sequence to be encoded or decoded is considered as several picture representations, one for each layer: the base layer and each of the actual enhancement layers. A coded picture within a given scalability layer is called a picture representation. Often, the base layer picture representation of an image corresponds to a low-resolution version of the image, while the picture representations of successive layers correspond to higher-resolution versions of the image. This is illustrated in Figure 1, which shows two successive images having two layers. Image 1.1 corresponds to the base layer picture representation of the image at time t. Image 1.2 corresponds to the base layer picture representation of the image at time t-1. Image 1.3 corresponds to the enhancement layer picture representation of the image at time t. Image 1.4 corresponds to the enhancement layer picture representation of the image at time t-1. In the following, to emphasize the fact that in scalable encoding the encoding of an enhancement layer is made relative to another layer used as a reference, and that this reference layer is not necessarily the base layer, the term reference layer will be used instead of base layer.
Typically the image is divided into coding units, usually of square shape, often called blocks, such as coding units 2.3 or 2.7 in Figure 2. The coding units are encoded using predictive encoding. Predictive encoding is based on determining data whose value approximates the pixel data to be encoded; this data is called a predictor of the coding unit. The difference between this predictor and the coding unit to encode is called the residual.
Encoding consists, in this case, of encoding the location of the predictor and the residual. A good predictor is one whose value is close to the value of the coding unit, leading to a residual with small values that can be efficiently encoded.
Each coding unit may be encoded based on predictors from previously encoded images in a coding mode called "inter" coding. It may be noted that the term previous does not refer exclusively to a previous image in the temporal sequence of the video. It refers to the sequential encoding or decoding scheme, and means that the "previous" image has been encoded or decoded previously and may therefore be used as a reference image for the encoding of the current image. For example, in Figure 2a, block 2.4 in previous image 2.2 is used as a predictor of coding unit 2.3 in image 2.1. In this case, the location is indicated by a vector 2.5 giving the location of the predictor in the previous image relative to the location of the coding unit in the image to encode. This predictive mode is called "inter" coding. A coding unit may also be encoded based on information already encoded and decoded in the image to encode. In this case, illustrated by Figure 2b, the predictor is obtained from the left and above border pixels 2.6 of the coding unit 2.7, together with a vector giving the prediction direction. This predictive mode is called "intra" coding.
Figure 3a illustrates scalable encoding as implemented, for example, in SVC. The image to be encoded at time t has two picture representations, a picture representation 3.3 in the reference layer and a picture representation 3.1 in the enhancement layer. The previous image, typically already encoded or decoded, has picture representations 3.4 in the reference layer and 3.2 in the enhancement layer. In the reference layer, the coding unit 3.8 has been encoded using the predictor 3.7 and the motion vector 3.9. In the enhancement layer, the coding unit 3.5, co-located with the coding unit 3.8 of the reference layer, is encoded using the predictor 3.6 and the motion vector 3.10. The motion vectors 3.9 and 3.10 are illustrated as being very different, as they result from independent block matching procedures. In Figure 3b, motion vectors 3.20 in the enhancement layer and 3.19 in the base layer are strongly correlated. This leads to residual data in the base and enhancement layers that are correlated.
However, note that the motion vector 3.20 associated with a current enhancement coding unit 3.15 may differ strongly from the motion vector of the co-located coding unit 3.16 in the reference layer. Indeed, the motion vectors are selected on the encoder side according to a rate/distortion criterion. The rate-distortion optimized motion vector selection aims at finding a good predictor of a current coding unit 3.15 in the reference picture, while keeping the coding cost of the resulting motion vector and residual data acceptable. This may lead to quite different results in two different scalability layers, especially since the quality parameters used to code each layer differ from each other.
The term co-located in this document refers to pixels or sets of pixels having the same spatial location within two different picture representations. It is mainly used to designate two blocks of pixels, one in the enhancement layer and one in the reference layer, having the same spatial location in the two layers (up to a scaling factor in case of a resolution change between the two layers). It may also be used for two successive images in time, or refer to entities related to co-located data, for example a co-located residual.
It is to be noted that, at decoding time, when decoding a particular picture representation, the only data that can be used are the picture representations already decoded. To ensure a perfect match between encoding and decoding, the encoding of a particular picture representation is based on a decoded version of previously encoded picture representations. This is known as the principle of causal coding.
In SVC, the encoding of the enhancement layer assumes that "single loop decoding" is applied, meaning that the decoding of the reference layer is not brought to completion when dealing with the enhancement layer. The reference layer is subject to a partial decoding only. The obtained partially decoded reference image is then composed of inter blocks that are partially decoded until their residual is obtained but to which motion compensation is not applied, and of intra blocks that are fully decoded. Note that during encoding, constrained intra prediction may be applied, preventing the usage of neighbouring inter-predicted blocks for performing the intra prediction. Therefore, due to the principle of causal coding, when encoding the enhancement picture representation of an image, such as image 3.1, the only available data are the previously decoded enhancement picture representation 3.2 and an image of the residual used in the reference layer.
The encoding of the enhancement layer is predictive, meaning that a predictor 3.6 is found in the previous image 3.2 to encode the coding unit 3.5 in the original picture representation 3.1. This encoding leads to the computation of a residual, called the first order residual block, being the difference between the coding unit 3.5 and its predictor 3.6. The encoding may be improved by a second order prediction, namely by using predictive encoding of this first order residual block itself. The SVC standard offered the possibility of predicting the residual of a temporally predicted block in the enhancement layer from the residual of a co-located temporally predicted block in the reference layer. This inter-layer residual prediction (ILRP) mode was mainly based on the assumption that the enhancement and reference layer motions are strongly correlated. As can be seen in Figure 3b, predicted blocks 3.15 in the enhancement layer and 3.18 in the reference layer have similar motions 3.20 and 3.19. In that condition, it can be assumed that the residual of block 3.18 is close to the residual of block 3.15, so the first order residual block of block 3.18 offers a good predictor for the first order residual block of block 3.15.
In that case the enhancement layer block is coded in the form of a mode indicator indicating the ILRP mode and a second order residual corresponding to the difference between the two first order residual blocks.
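The ILRP second-order prediction described above can be sketched as follows. This is an illustrative example under the assumption of correlated motion between layers; the sample values and names are assumptions, not from the patent or the SVC standard text:

```python
# Hypothetical sketch of inter-layer residual prediction (ILRP): the
# first order residual of the reference layer serves as a predictor
# for the first order residual of the enhancement layer.

def first_order(unit, predictor):
    """First order residual: coding unit minus its temporal predictor."""
    return [u - p for u, p in zip(unit, predictor)]

# First order residuals in each layer (blocks flattened for brevity)
r_enh = first_order([22, 30, 18, 25], [20, 28, 17, 24])   # enhancement layer
r_ref = first_order([11, 15,  9, 12], [10, 14,  9, 12])   # reference layer

# Second order residual actually coded for the enhancement block
r2 = [a - b for a, b in zip(r_enh, r_ref)]
print(r2)  # close to zero when the two layers' motions are correlated
```

Only the mode indicator and the (small) second-order residual `r2` need to be coded for the enhancement block.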
In practice, the assumption that co-located enhancement and reference layer blocks share the same motion is rarely verified. As already explained, the motion vector choice in the enhancement layer depends on the rate-distortion properties of each candidate motion vector considered during the motion estimation process. These rate-distortion properties may strongly differ from one layer to another, since each layer is encoded with its own resolution and quality level.
In HEVC, as in the previous standard H.264/AVC, the temporal prediction signal can be weighted, for instance to better deal with fading or cross-fading images. Weighted prediction modes are therefore specified to enable weighting of the predictions based on the reference pictures.
Weighted prediction may be used in uni-prediction and bi-prediction. These modes may apply to any layer in case of scalability.
In HEVC, as in previous standards, in the uni-prediction case, a weighting factor w0 and an offset o0 may be signalled in the slice header.
Conceptually, the prediction signal is defined by the following simplified equation, where rounding aspects are not taken into account:

PRED = MC[REF, MV] * w0 + o0

In HEVC, as in previous standards, in the bi-prediction case, two weighting factors w0 and w1 and two offsets o0 and o1 are signalled in the slice header. Conceptually, the prediction signal is defined by the following simplified equation, where rounding aspects are not taken into account:

PRED = (MC[REF0, MV0] * w0 + MC[REF1, MV1] * w1 + o0 + o1) / 2

Besides, a difference in luminosity may occur between the reference image used for the prediction and the image to be encoded.
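The simplified weighted uni- and bi-prediction equations can be sketched as follows. MC[REF, MV] is modelled here as a precomputed list of motion-compensated samples; the weights, offsets and sample values are illustrative assumptions:

```python
# Hypothetical sketch of HEVC-style weighted prediction (rounding ignored,
# as in the simplified equations of the text).

def weighted_uni(mc, w0, o0):
    # PRED = MC[REF, MV] * w0 + o0
    return [s * w0 + o0 for s in mc]

def weighted_bi(mc0, mc1, w0, w1, o0, o1):
    # PRED = (MC[REF0, MV0] * w0 + MC[REF1, MV1] * w1 + o0 + o1) / 2
    return [(a * w0 + b * w1 + o0 + o1) / 2 for a, b in zip(mc0, mc1)]

mc0 = [100, 102, 98]   # motion-compensated samples from reference list 0
mc1 = [104, 100, 96]   # motion-compensated samples from reference list 1
print(weighted_uni(mc0, 2, 3))            # [203, 207, 199]
print(weighted_bi(mc0, mc1, 1, 1, 2, 2))  # [104.0, 103.0, 99.0]
```

With w0 = w1 = 1 and zero offsets, bi-prediction reduces to the plain average of the two motion-compensated signals.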
The present invention has been devised to address one or more of the foregoing concerns. It is also proposed to extend the weighted prediction concept in the context of second order prediction.
According to a first aspect of the invention there is provided a method for encoding an image of pixels according to a scalable encoding scheme having an enhancement layer and a reference layer, the method comprising for the encoding of said enhancement layer: (a) obtaining a block predictor candidate for predicting a coding unit within the enhancement layer and an associated enhancement-layer residual block corresponding to said prediction; (b) determining a block predictor in the reference layer co-located with the determined block predictor candidate within the enhancement layer; (c) determining a reference-layer residual block associated with the coding unit in the reference layer that is co-located with the coding unit in the enhancement layer using the determined block predictor in the reference layer; and (d) determining for the coding unit of the enhancement layer a further residual block corresponding, at least partly, to the difference between the enhancement-layer residual block and the reference-layer residual block wherein the method further comprises applying at least a weighting factor to at least one element among the obtained block predictor candidate, the determined block predictor and the reference-layer residual block.
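Steps (a) to (d) of the first aspect, with a weighting factor applied to the reference-layer residual block, can be sketched as follows. All blocks are flattened sample lists; the weight value, sample values and function name are illustrative assumptions, not a definitive implementation of the claimed method:

```python
# Hypothetical sketch of the weighted second-order residual computation.

def encode_further_residual(unit_enh, pred_enh, unit_ref, pred_ref, w):
    # (a) enhancement-layer residual for the chosen predictor candidate
    r_enh = [u - p for u, p in zip(unit_enh, pred_enh)]
    # (b)/(c) reference-layer residual from the co-located block predictor
    r_ref = [u - p for u, p in zip(unit_ref, pred_ref)]
    # (d) further residual, with weight w applied to the reference residual
    return [a - w * b for a, b in zip(r_enh, r_ref)]

r2 = encode_further_residual([20, 24], [18, 21], [10, 12], [9, 10], 1.0)
print(r2)  # [1.0, 1.0]
```

The weight `w` here plays the role of the claimed weighting factor on the reference-layer residual block; per the other embodiments, weights could equally be applied to the predictors themselves, with an optional offset.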
Accordingly, the differences in luminosity between the reference image and the image to be encoded may be compensated.
According to an embodiment, the method comprises applying a first weighting factor to the obtained predictor candidate and applying a second weighting factor to the determined block predictor.
According to an embodiment, the method comprises applying a dedicated weighting factor to the reference-layer residual block.
According to an embodiment, the method further comprises adding an offset to at least one element multiplied by a weighting factor.
According to an embodiment, the method comprises selecting an encoding mode for a coding unit of the enhancement layer from among a plurality of available encoding modes, one of the available modes involving said steps (a) to (d).
According to an embodiment, when a reconstructed version of the reference layer picture representations is available: the reference-layer residual block is determined as the difference between the co-located coding unit in the reference layer and the determined block predictor in the reference layer and each sample of said further residual block corresponds to a difference between a sample of the enhancement-layer block and a corresponding sample of the reference-layer residual block.
According to an embodiment, when an image of residual data used for the encoding of the reference layer is available, determining the reference-layer residual block comprises: determining the overlap in the image of residual data between the determined block predictor and the block predictor used in the encoding of the block co-located with the coding unit in the reference layer and using the part in the image of residual data corresponding to this overlap, if any, to compute a part of said further residual block, wherein the samples of said further residual block corresponding to this overlap each correspond to a difference between a sample of the enhancement-layer residual block and a corresponding sample of the reference-layer residual block.
According to an embodiment, the obtained block predictor candidate of the coding unit is in a previously encoded image.
According to an embodiment, the obtained predictor candidate of the coding unit is obtained from a previously encoded part of the same image the coding unit belongs to.
According to an embodiment, the method further comprises transmitting said dedicated weighting factor as a syntax element.
According to an embodiment, the method further comprises transmitting said dedicated weighting factor as a syntax element added in the signalling table corresponding to the weighted prediction signalling.
According to an embodiment, the method further comprises transmitting said dedicated weighting factor as a syntax element in the signalling table corresponding to the weighted prediction signalling usually used for the transmission of usual weighted prediction for the enhancement layer, a high level flag being added to discriminate both usages.
According to an embodiment, the method further comprises transmitting said dedicated weighting factor as a syntax element in the signalling table corresponding to the weighted prediction signalling usually used for the transmission of usual weighted prediction for slices of B type, a high level flag being added to discriminate both usages for this type of slice.
According to another aspect of the invention there is provided a method for decoding a bit stream comprising data representing an image encoded according to a scalable encoding scheme having an enhancement layer and a reference layer, the method comprising for the decoding of said enhancement layer: obtaining from the bit stream the location of a block predictor of a coding unit within the enhancement layer to be decoded and a residual block comprising difference information between enhancement layer residual information and reference layer residual information; determining the block predictor in the reference layer co-located with the block predictor in the enhancement layer; determining a reference-layer residual block corresponding to the difference between the block of the reference layer co-located with the coding unit to be decoded and the determined block predictor in the reference layer; reconstructing an enhancement-layer residual block using the determined reference-layer residual block and said residual block obtained from the bit stream and reconstructing the coding unit using the block predictor and the enhancement-layer residual block; wherein the method further comprises applying at least a weighting factor to at least one element among the obtained block predictor candidate, the determined block predictor and the reference-layer residual block.
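The decoder-side reconstruction of this aspect can be sketched as follows: the reference-layer residual is rebuilt, weighted, added to the transmitted further residual, and the coding unit is then recovered from its predictor. The weight value, sample values and names are illustrative assumptions:

```python
# Hypothetical sketch of decoding a coding unit coded with weighted
# second-order residual prediction.

def decode_unit(pred_enh, r2, unit_ref, pred_ref, w):
    # Rebuild the reference-layer residual from the co-located block
    r_ref = [u - p for u, p in zip(unit_ref, pred_ref)]
    # Reconstruct the enhancement-layer residual: transmitted difference
    # plus the weighted reference-layer residual
    r_enh = [d + w * b for d, b in zip(r2, r_ref)]
    # Reconstruct the coding unit from its predictor and residual
    return [p + r for p, r in zip(pred_enh, r_enh)]

unit = decode_unit([18, 21], [1.0, 1.0], [10, 12], [9, 10], 1.0)
print(unit)  # [20.0, 24.0]
```

This mirrors the encoder: with matching weights and residuals, the decoder exactly inverts the weighted second-order prediction.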
According to an embodiment, the method comprises applying a first weighting factor to the obtained predictor candidate and applying a second weighting factor to the determined block predictor.
According to an embodiment, the method comprises applying a dedicated weighting factor to the reference-layer residual block.
According to an embodiment, the method further comprises adding an offset to at least one element multiplied by a weighting factor.
According to an embodiment, the determination of a predictor of the coding unit is made using a cost function adapted to take into account the prediction of the enhancement-layer residual block to determine a rate distortion cost.
According to an embodiment, determining a reference-layer residual block comprises on-demand upsampling of needed blocks in the reference layer.
According to an embodiment, when the reference-layer residual block is determined at the reference layer resolution, the method further comprises up-sampling the determined reference-layer residual block.
According to an embodiment, picture representations used in the reference layer to compute the reference-layer residual block correspond to some of the reference picture representations stored in the decoded picture buffer of the reference layer.
According to an embodiment, the method further comprises obtaining said dedicated weighting factor as a syntax element.
According to an embodiment, the method further comprises selecting several dedicated weighting factors for the same coding unit from among a plurality of available dedicated weighting factors, each of said available dedicated weighting factors being defined for a particular reference frame.
According to another aspect of the invention there is provided a device for encoding an image of pixels according to a scalable encoding scheme having an enhancement layer and a reference layer, the device comprising for the encoding of said enhancement layer: (a) an obtaining module for obtaining a block predictor candidate for predicting a coding unit within the enhancement layer and an associated enhancement-layer residual block corresponding to said prediction; (b) a determining module for determining a block predictor in the reference layer co-located with the determined block predictor candidate within the enhancement layer; (c) a determining module for determining a reference-layer residual block associated with the coding unit in the reference layer that is co-located with the coding unit in the enhancement layer using the determined block predictor in the reference layer; (d) a determining module for determining for the coding unit of the enhancement layer a further residual block corresponding, at least partly, to the difference between the enhancement-layer residual block and the reference-layer residual block wherein the device further comprises a multiplying module for applying at least a weighting factor to at least one element among the obtained block predictor candidate, the determined block predictor and the reference-layer residual block.
According to an embodiment, the multiplying module is adapted for applying a first weighting factor to the obtained predictor candidate and for applying a second weighting factor to the determined block predictor.
According to an embodiment, the multiplying module is adapted for applying a dedicated weighting factor to the reference-layer residual block.
According to an embodiment, the multiplying module is adapted for adding an offset to at least one element multiplied by a weighting factor.
According to an embodiment, the device comprises a selecting module for selecting an encoding mode for a coding unit of the enhancement layer from among a plurality of available encoding modes, one of the available modes involving said modules (a) to (d).
According to an embodiment, when a reconstructed version of the reference layer picture representations is available the reference-layer residual block is determined as the difference between the co-located coding unit in the reference layer and the determined block predictor in the reference layer and each sample of said further residual block corresponds to a difference between a sample of the enhancement-layer residual block and a corresponding sample of the reference-layer residual block.
According to an embodiment, the determining module for determining the reference-layer residual block is operable, when an image of residual data used for the encoding of the reference layer is available, to determine the overlap in the image of residual data between the determined block predictor and the block predictor used in the encoding of the block co-located with the coding unit in the reference layer and to compute a part of said further residual block using the part in the image of residual data corresponding to this overlap, if any, wherein the samples of said further residual block of the enhancement layer corresponding to this overlap each correspond to a difference between a sample of the enhancement-layer residual block and a corresponding sample of the reference-layer residual block.
According to an embodiment, the obtained block predictor candidate of the coding unit is in a previously encoded image.
According to an embodiment, the obtained predictor candidate of the coding unit is obtained from a previously encoded part of the same image the coding unit belongs to.
According to an embodiment, the device further comprises a transmitter for transmitting said dedicated weighting factor as a syntax element.
According to an embodiment, said transmitter is adapted for transmitting said dedicated weighting factor as a syntax element added in the signalling table corresponding to the weighted prediction signalling.
According to an embodiment, said transmitter is adapted for transmitting said dedicated weighting factor as a syntax element in the signalling table corresponding to the weighted prediction signalling usually used for the transmission of usual weighted prediction for the enhancement layer, a high level flag being added to discriminate both usages.
According to an embodiment, said transmitter is adapted for transmitting said dedicated weighting factor as a syntax element in the signalling table corresponding to the weighted prediction signalling usually used for the transmission of usual weighted prediction for slices of B type, a high level flag being added to discriminate both usages for this type of slice.
According to another aspect of the invention there is provided a device for decoding a bit stream comprising data representing an image encoded according to a scalable encoding scheme having an enhancement layer and a reference layer, the device comprising for the decoding of said enhancement layer: an obtaining module for obtaining from the bit stream the location of a block predictor of a coding unit within the enhancement layer to be decoded and a residual block comprising difference information between enhancement layer residual information and reference layer residual information; a determining module for determining the block predictor in the reference layer co-located with the block predictor in the enhancement layer; a determining module for determining a reference-layer residual block corresponding to the difference between the block of the reference layer co-located with the coding unit to be decoded and the determined block predictor in the reference layer; a reconstructing module for reconstructing an enhancement-layer residual block using the determined reference-layer residual block and said residual block obtained from the bit stream and a reconstructing module for reconstructing the coding unit using the block predictor and the enhancement-layer residual block wherein the device further comprises a multiplying module for applying at least a weighting factor to at least one element among the obtained block predictor candidate, the determined block predictor and the reference-layer residual block.
According to an embodiment, the multiplying module is adapted for applying a first weighting factor to the obtained predictor candidate and for applying a second weighting factor to the determined block predictor.
According to an embodiment, the multiplying module is adapted for applying a dedicated weighting factor to the reference-layer residual block.
According to an embodiment, the multiplying module is adapted for adding an offset to at least one element multiplied by a weighting factor.
According to an embodiment, the determination of a predictor of the coding unit is made using a cost function adapted to take into account the prediction of the enhancement-layer residual block to determine a rate distortion cost.
According to an embodiment, said determining module for determining a reference-layer residual block comprises an up-sampling module for on-demand upsampling of needed blocks in the reference layer.
According to an embodiment, the device further comprises an up-sampling module operable, when the reference-layer residual block is determined at the reference layer resolution, to up-sample the determined reference-layer residual block.
According to an embodiment, picture representations used in the reference layer to compute the reference-layer residual block correspond to some of the reference picture representations stored in the decoded picture buffer of the reference layer.
According to an embodiment, the device further comprises an obtaining module for obtaining said dedicated weighting factor as a syntax element.
According to an embodiment, the device further comprises a selecting module for selecting several dedicated weighting factors for the same coding unit among a plurality of dedicated weighting factors available, each of said available dedicated weighting factors being defined for a particular reference frame.
According to a further aspect, the invention concerns a computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to the invention when loaded into and executed by the programmable apparatus.
According to a further aspect, the invention concerns a computer-readable storage medium storing instructions of a computer program for implementing a method, according to the invention.
According to a further aspect of the invention there is provided a method for encoding an image of pixels according to a scalable encoding scheme having an enhancement layer and a reference layer, the method comprising, for the encoding of said enhancement layer: (a) obtaining a first block predictor candidate for predicting a coding unit within the enhancement layer; (b) obtaining a second block predictor candidate for predicting a coding unit within the enhancement layer; (c) obtaining a resulting block predictor based on the first and the second block predictor and an associated enhancement-layer residual block corresponding to said prediction; (d) determining a first block predictor in the reference layer co-located with the determined first block predictor candidate within the enhancement layer; (e) determining a second block predictor in the reference layer co-located with the determined second block predictor candidate within the enhancement layer; (f) determining a resulting block predictor in the reference layer; (g) determining a reference-layer residual block associated with the coding unit in the reference layer that is co-located with the coding unit in the enhancement layer using the determined resulting block predictor in the reference layer; and (h) determining for the coding unit of the enhancement layer a further residual block corresponding, at least partly, to the difference between the enhancement-layer resulting residual block and the reference-layer residual block.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figure 1 illustrates the relations between the different picture representations of images in a scalable encoding architecture;
Figures 2a and 2b illustrate the principle of inter and intra coding;
Figures 3a and 3b illustrate scalable encoding as implemented in the prior art;
Figure 4 illustrates the residual prediction in an embodiment of the invention;
Figure 5 illustrates the method used for residual prediction in an embodiment of the invention;
Figure 6 illustrates the method used for decoding in an embodiment of the invention;
Figure 7 illustrates an alternative embodiment in the context of single loop decoding;
Figure 8 illustrates an alternative embodiment in the context of intra coding;
Figure 9 illustrates the architecture of an example of device for the implementation of the invention;
Figure 10 illustrates a block diagram of a typical scalable video coder generating 2 scalability layers.
Figure 11 illustrates a block diagram of a decoder which may be used to receive data from an encoder according to an embodiment of the invention.
The prediction of the residual will now be described in relation with Figure 4 and Figure 5. The image to encode, or decode, is the picture representation 4.1 in the enhancement layer. This image is constituted by the original pixels.
Picture representation 4.2 in the enhancement layer is available in its reconstructed version. Regarding the reference layer, what is available depends on the type of scalable video decoder. In the case of a single loop decoding approach, meaning that the reference layer reconstruction is not brought to completion, the picture representation 4.4 is composed of inter blocks decoded only up to their residual, on which motion compensation is not applied, and of intra blocks which may be either integrally decoded, as in SVC, or, as will be described later, partially decoded up to their intra prediction residual.
Note that in Figure 4, both layers are represented at the same resolution, as in SNR (Signal-to-Noise Ratio) scalability. In spatial scalability, two different layers will have different resolutions, which requires an up-sampling of the residual before performing the prediction of the residual.
In a multiple loop decoding case, a complete reconstruction of the reference layer is conducted. In this case, picture representation 4.4 of the previous image and picture representation 4.3 of the current image both in the reference layer are available in their reconstructed version.
Figure 10 provides a block diagram of a typical scalable video coder generating two scalability layers. This diagram is organized in two stages 10.0, 10.30, respectively dedicated to the coding of each of the scalability layers generated. The numerical references of similar functions are incremented by 30 between the successive stages. Each stage takes, as an input, the original sequence of images to be compressed, respectively 10.2 and 10.32, possibly subsampled at the spatial resolution of the scalability layer. Within each stage a motion-compensated temporal prediction loop is implemented.
The first stage 10.0 in Figure 10 corresponds to the encoding diagram of an H.264/AVC or HEVC non-scalable video coder and is known to persons skilled in the art. It successively performs the following steps for coding the base layer. A current image 10.2 to be compressed at the input of the coder is divided into coding units by the function 10.4. Each coding unit first undergoes a motion estimation step 10.16, comprising a block matching algorithm which attempts to find, among reference images stored in a buffer 10.12, reference prediction units for best predicting the current coding unit. This motion estimation function 10.16 supplies one or more indices of reference images containing the reference prediction units found, as well as the corresponding motion vectors. A motion compensation function 10.18 applies the estimated motion vectors to the reference prediction units found and copies the blocks thus obtained, which provides a temporal prediction block. In addition, an INTRA prediction function 10.20 determines the spatial prediction mode of the current coding unit that would provide the best performance for the coding of the current coding unit in INTRA mode. Next, a coding mode choice function 10.14 determines, among the temporal and spatial predictions, the coding mode that provides the best rate-distortion compromise in the coding of the current coding unit. The difference between the current coding unit and the prediction coding unit thus selected is calculated by the function 10.26, so as to provide a residue (temporal or spatial) to be compressed. This residual coding unit then undergoes spatial transform (such as the discrete cosine transform or DCT) and quantization functions 10.6 to produce quantized transform coefficients. Entropy coding of these coefficients is then performed by a function not shown in Figure 10 and supplies the compressed texture data of the current coding units.
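By way of illustration, the residual computation, transform and quantization performed by functions 10.26, 10.6 and 10.8/10.10 can be sketched as follows. This is a minimal Python sketch with an orthonormal DCT and a hypothetical scalar quantization step, not the normative H.264/AVC or HEVC transform and quantization design:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix, so that M @ M.T is the identity.
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0, :] /= np.sqrt(2)
    return M * np.sqrt(2.0 / n)

def encode_coding_unit(cu, predictor, q_step):
    # Function 10.26: residual between the coding unit and its predictor.
    residual = cu - predictor
    # Function 10.6: separable 2-D transform then scalar quantization
    # (q_step is an illustrative parameter, not a standard quantizer).
    M = dct_matrix(residual.shape[0])
    coeffs = M @ residual @ M.T
    return np.round(coeffs / q_step)

def reconstruct_coding_unit(quantized, predictor, q_step):
    # Function 10.8: reverse quantization and reverse transform,
    # function 10.10: addition of the prediction coding unit.
    M = dct_matrix(quantized.shape[0])
    residual = M.T @ (quantized * q_step) @ M
    return residual + predictor
```

With a small quantization step the reconstruction closely approximates the original coding unit; a larger step trades fidelity for bitrate, which is the compromise arbitrated by the rate-distortion selection of function 10.14.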
This function is in charge of encoding all information related to the block, i.e. the coding mode, the motion information and the coefficient information.
Finally, the current coding unit is reconstructed by means of a reverse quantization and reverse transformation 10.8, and an addition 10.10 of the residue after reverse transformation and the prediction coding unit of the current coding unit. Once the current image is thus reconstructed, it is stored in a buffer 10.12, called the DPB standing for Decoded Picture Buffer, in order to serve as a reference for the temporal prediction of future images to be coded.
Function 10.24 performs post-filtering operations comprising a deblocking filter, a Sample Adaptive Offset (SAO) filter and optionally an Adaptive Loop Filter (ALF). These post-filtering operations aim at reducing encoding artifacts.
The second stage in Figure 10 illustrates the coding of a first enhancement layer 10.30 of the scalable stream. This stage 10.30 is similar to the coding scheme of the base layer, except that, for each coding of a current image in the course of compression, additional prediction modes, compared to the coding of the base layer, may be chosen by the coding mode selection function 10.44. These prediction modes called "inter-layer prediction modes" may comprise several modes. These modes consist of reusing the coded data in a reference layer below the enhancement layer currently being coded as prediction data of the current coding unit.
In the case where the reference layer contains an image that coincides in time with the current image, then referred to as the "base image" of the current image, the co-located coding unit, may serve as a reference for predicting the current coding unit. More precisely, the coding mode, the coding unit partitioning, the motion data (if present) and the texture data (residue in the case of a temporally predicted coding unit, reconstructed texture in the case of a coding unit coded in INTRA) of the co-located coding unit can be used to predict the current coding unit. In the case of a spatial enhancement layer, (not shown) up-sampling operations are applied on texture and motion data of the reference layer. These inter layer prediction modes comprise the Generalized Residual Inter Layer Prediction Mode corresponding to the invention.
In addition to the inter-layer prediction modes, each coding unit of the enhancement layer can be encoded using usual H.264/AVC or HEVC modes based on temporal or spatial prediction. The mode providing the best rate-distortion compromise is then selected by block 10.44.
Figure 11 is a block diagram of a scalable decoding method for application on a scalable bit-stream comprising two scalability layers, e.g. comprising a base layer and an enhancement layer. The decoding process may thus be considered as corresponding to reciprocal processing of the scalable coding process of Figure 10. The scalable bit stream being decoded, as shown in Figure 10, is made of one base layer and one spatial enhancement layer on top of the base layer, which are demultiplexed in step 11.11 into their respective layers. It will be appreciated that the process may be applied to a bit stream with any number of enhancement layers.
The first stage of Figure 11 concerns the base layer decoding process.
The decoding process starts in step 11.12 by entropy decoding each coding unit of each coded image in the base layer. The entropy decoding process 11.12 provides the coding mode, the motion data (reference image indexes and motion vectors of INTER coded coding units in case of INTER prediction, the INTRA prediction direction in case of intra prediction) and residual data. This residual data includes quantized and transformed DCT coefficients. Next, these quantized DCT coefficients undergo inverse quantization (scaling) and inverse transform operations in step 11.13. The decoded residual is then added in step 11.16 to a temporal prediction area from motion compensation 11.14 or an Intra prediction area from Intra prediction step 11.15 to reconstruct the coding unit.
Loop filtering is effected in step 11.17. The so-reconstructed image data is then stored in the frame buffer 11.60. The decoded motion and temporal residual for INTER coding units may also be stored in the frame buffer. The stored frames contain the data that can be used as reference data to predict an upper scalability layer. Decoded base images 11.70 are obtained.
The second stage of Figure 11 performs the decoding of a spatial enhancement layer on top of the base layer decoded by the first stage. This spatial enhancement layer decoding includes entropy decoding of the enhancement layer in step 11.52, which provides the coding modes, motion information as well as the transformed and quantized residual information of coding units of the enhancement layer.
A subsequent step of the decoding process involves predicting coding units in the enhancement image. The choice 11.53 between different types of coding unit prediction (INTRA, INTER, Intra BL or Base mode) depends on the prediction mode obtained from the entropy decoding step 11.52. In the same way as on the encoder side, these prediction modes consist in the set of prediction modes of HEVC, which are enriched with some additional inter-layer prediction modes.
The prediction of each enhancement coding unit thus depends on the coding mode signalled in the bit stream. According to the CU coding mode the coding units are processed as follows: In the case of an inter-layer predicted INTRA coding unit, the enhancement coding unit is reconstructed by undergoing inverse quantization and inverse transform in step 11.54 to obtain residual data and adding in step 11.55 the resulting residual data to Intra prediction data from step 11.57 to obtain the fully reconstructed coding unit. Loop filtering is then effected in step 11.58 and the result stored in frame memory 11.80.
-In the case of an INTER coding unit, the reconstruction involves the motion compensated temporal prediction 11.56, the residual data decoding in step 11.54 and then the addition of the decoded residual information to the temporal predictor in step 11.55. In such an INTER coding unit decoding process, inter-layer prediction can be used in two ways. First, the temporal residual data associated with the considered enhancement layer coding unit may be predicted from the temporal residual of the co-sited coding unit in the base layer by means of generalized residual inter-layer prediction. Second, the motion vectors of prediction units of a considered enhancement layer coding unit may be decoded in a predictive way, as a refinement of the motion vector of the co-located coding unit in the base layer.
-In the case of an Intra-BL coding mode, the result of the entropy decoding of step 11.52 undergoes inverse quantization and inverse transform in step 11.54, and then is added in step 11.55 to the co-located coding unit of the current coding unit in the base image, in its decoded, post-filtered and up-sampled (in case of spatial scalability) version.
In the case of Base-Mode prediction the result of the entropy decoding of step 11.52 undergoes inverse quantization and inverse transform in step 11.54, and then is added to the co-located area of current CU in the Base Mode prediction picture in step 11.55.
As mentioned previously, it may be noted that the Intra BL prediction coding mode is allowed for every CU in the enhancement image, regardless of the coding mode that was employed in the co-sited coding unit(s) of a considered enhancement CU. Therefore, the proposed approach consists of a multiple loop decoding system, i.e. the motion compensated temporal prediction loop is involved in each scalability layer on the decoder side.
A method of deriving prediction information, in a base-mode prediction mode, for encoding or decoding at least part of an image of an enhancement layer of video data, in accordance with an embodiment of the invention will now be described.
As already mentioned above, two types of decoding processes are envisioned in the proposed invention. In single loop decoding, the decoding process of a reference layer described in Figure 11 is performed partially.
Concerning Inter coded blocks, only steps 11.12 and 11.13 are applied in order to obtain motion and residual information. Concerning Intra predicted blocks, two solutions are possible. Intra blocks could be integrally decoded until obtaining their reconstructed pixel values or partially reconstructed to obtain prediction directions and residual data only.
In case of multi-loop decoding, the complete decoding process described in Figure 11 is applied to the base layer. It would be obvious to a person skilled in the art that in order to avoid any drift between the encoder and the decoder the same decoding process (either single loop or multi-loop) is applied by the encoder and the decoder to allow identical inter layer prediction.
As already mentioned above, one of the modes in competition for encoding a block in an enhancement layer in SVC was the ILRP mode. This mode consisted in predicting the residual of an enhancement-layer block from the residual of the co-located block in a reference layer. A generalization of this mode, called GRILP (Generalized Residual Inter Layer Prediction), has been proposed, which better takes into account the difference between the enhancement-layer motion and the motion of the co-located block in the reference layer.
The GRILP mode will now be described in relation with Figure 4 and Figure 5. The image to encode or decode is the picture representation 4.1 in the enhancement layer. This image is constituted of the original pixels. Picture representation 4.2 in the enhancement layer is available in its reconstructed version. Regarding the reference layer, what is available depends on the scalable decoder architecture considered. If the encoding mode is single loop, meaning that the reference layer reconstruction is not brought to completion, the picture representation 4.4 is composed of inter blocks decoded only up to their residual, on which motion compensation is not applied, and of intra blocks which may be either integrally decoded as in SVC or partially decoded up to their intra prediction residual and a prediction direction. Note that in Figure 4, both layers are represented at the same resolution, as in SNR scalability. In spatial scalability, two different layers will have different resolutions, which requires an up-sampling of the residual and motion information before performing the prediction of the residual.
Where the encoding mode is multi loop, a complete reconstruction of the reference layer is conducted. In this case, picture representation 4.4 of the previous image and picture representation 4.3 of the current image both in the reference layer are available in their reconstructed version.
As already seen with reference to step 10.44 in Figure 10, a competition is performed between all modes available in the enhancement layer to determine the mode optimizing a rate-distortion trade-off. The GRILP mode is one of the modes in competition for encoding a block of an enhancement layer.
In a first embodiment we describe a first version of the GRILP mode adapted to temporal prediction in the enhancement layer. This embodiment starts with the determination of the best temporal GRILP predictor in a set comprising several potential temporal GRILP predictors obtained using a block matching algorithm.
In a first step 5.1, a predictor candidate contained in the search area of the motion estimation algorithm is obtained for block 4.5. This predictor candidate represents an area of pixels 4.6 in the reconstructed reference image 4.2 in the enhancement layer pointed to by a motion vector 4.10. A difference between block 4.5 and block 4.6 is then computed to obtain a first order residual block in the enhancement layer. For the considered reference area 4.6 in the enhancement layer, the corresponding co-located area 4.12 in the reconstructed reference layer image 4.4 in the base layer is identified in step 5.2. In step 5.3 a difference is computed between block 4.8 and block 4.12 to obtain a first order residual block for the base layer. In step 5.4, a prediction of the first order residual block of the enhancement layer by the first order residual block of the reference layer is performed. During this prediction, the difference between the first order residual block of the enhancement layer and the first order residual block of the reference layer is computed. This last prediction allows a second order residual to be obtained. It is to be noted that the first order residual block of the reference layer does not correspond to the residual used in the predictive encoding of the reference layer, which is based on the predictor 4.7. This first order residual block is a kind of virtual residual obtained by reporting in the reference layer the motion vector obtained by the motion estimation conducted in the enhancement layer. Accordingly, by being obtained from co-located pixels, it is expected to be a good predictor for the residual obtained in the enhancement layer. To emphasize this distinction and the fact that it is obtained from co-located pixels, it will be called the co-located residual in the following.
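The arithmetic of steps 5.1 to 5.4 can be sketched as follows. Function and argument names are illustrative; blocks are assumed to be same-resolution numeric arrays, as in SNR scalability:

```python
import numpy as np

def grilp_second_order_residual(cu_enh, pred_enh, cu_ref, pred_ref):
    # Step 5.1: first order residual in the enhancement layer
    # (block 4.5 minus predictor block 4.6).
    r1_enh = cu_enh - pred_enh
    # Step 5.3: co-located residual in the reference layer
    # (block 4.8 minus co-located area 4.12), obtained by reporting the
    # enhancement-layer motion vector in the reference layer.
    r1_ref = cu_ref - pred_ref
    # Step 5.4: the second order residual is the difference of the two.
    return r1_enh - r1_ref
```

The second order residual is the signal actually transmitted when the GRILP mode is selected.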
In step 5.5, the rate distortion cost of the GRILP mode under consideration is evaluated. This evaluation is based on a cost function depending on several factors. An example of such a cost function is: C = D + λ(Rm + Rmv + Rr), where C is the obtained cost and D is the distortion between the original coding unit to encode and its reconstructed version after encoding and decoding. Rm + Rmv + Rr represents the bitrate of the encoding, where Rm is the component for the size of the syntax element representing the coding mode, Rmv is the component for the size of the encoding of the motion information, and Rr is the component for the size of the second order residual. λ is the usual Lagrange parameter.
In step 5.6, a test is performed to determine if all predictor candidates contained in the search area have been tested. If some predictor candidates remain, the process loops back to step 5.1 with a new predictor candidate.
Otherwise, all costs are compared during step 5.7 and the predictor candidate minimizing the rate distortion cost is selected.
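Steps 5.5 to 5.7 amount to an exhaustive search minimizing the cost function over the predictor candidates. A minimal sketch, assuming a hypothetical data layout where each candidate carries its distortion and rate components, could read:

```python
def select_best_candidate(candidates, lam):
    # Each candidate (illustrative dict layout) carries the distortion D and
    # the rate components Rm (coding mode), Rmv (motion information) and
    # Rr (second order residual); the cost is C = D + lambda*(Rm + Rmv + Rr).
    best, best_cost = None, float("inf")
    for cand in candidates:
        cost = cand["D"] + lam * (cand["Rm"] + cand["Rmv"] + cand["Rr"])
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```

The winning GRILP candidate is then itself put in competition with the other enhancement-layer coding modes, as described below.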
The cost of the best GRILP predictor will be then compared to the costs of other predictors available for blocks in an enhancement layer to select the best prediction mode. If the GRILP mode is finally selected, a mode identifier, the motion information and the encoded residual are inserted in the bit stream.
The decoding of the GRILP mode is illustrated by Figure 6. The bit stream comprises the means to locate the predictor and the second order residual. In a first step 6.1, the location of the predictor used for the prediction of the coding unit and the associated residual are obtained from the bit stream.
This residual corresponds to the second order residual obtained at encoding. In a step 6.2, similarly to encoding, the co-located predictor is determined. It is the location in the reference layer of the pixels corresponding to the predictor obtained from the bit stream. In a step 6.3, the co-located residual is determined. This determination may vary according to the particular embodiment, similarly to what is done in encoding. In the context of multi-loop and inter encoding it is defined by the difference between the co-located coding unit and the co-located predictor in the reference layer. In a step 6.4, the first order residual block is reconstructed by adding the residual obtained from the bit stream, which corresponds to the second order residual, and the co-located residual. Once the first order residual block has been reconstructed, it is used with the predictor whose location has been obtained from the bit stream to reconstruct the coding unit in a step 6.5.
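The decoder-side reconstruction of steps 6.3 to 6.5 mirrors the encoder arithmetic. A minimal sketch (illustrative names; the second order residual is the one decoded from the bit stream) could read:

```python
import numpy as np

def grilp_decode(second_order, pred_enh, cu_ref, pred_ref):
    # Step 6.3: co-located residual in the reference layer.
    r1_ref = cu_ref - pred_ref
    # Step 6.4: first order residual = decoded second order residual
    #           + co-located residual.
    r1_enh = second_order + r1_ref
    # Step 6.5: reconstruct the coding unit from the predictor whose
    #           location was obtained from the bit stream.
    return pred_enh + r1_enh
```

Provided the encoder and decoder derive the same co-located residual, this reconstruction exactly inverts the second-order residual computation (up to quantization of the transmitted residual, ignored here).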
In an alternative embodiment allowing a reduction of the complexity of the determination of the best GRILP predictor, it is possible to perform the motion estimation in the enhancement layer without considering the prediction of the first order residual block. The motion estimation becomes classical and provides a best temporal predictor in the enhancement layer. In Figure 5, this embodiment consists in replacing step 5.1 by a complete motion estimation step determining the best temporal predictor among the predictor candidates in the enhancement layer and by removing steps 5.6, 5.7 and 5.8. All other steps remain identical and the cost of the GRILP mode is then compared to the costs of other modes.
By default, during the computation, the following picture representations are stored in memory: the picture representation of the current image to encode in the enhancement layer; the picture representation of the previous image in the enhancement layer in its reconstructed version; the picture representation of the current image in the reference layer in its reconstructed version; and the picture representation of the previous image in the reference layer in its reconstructed version. The reference layer picture representations are typically upsampled to fit the resolution of the enhancement layer.
Advantageously, the blocks in the reference layer are upsampled only when needed instead of upsampling the whole picture representation at once.
The encoder and the decoder may be provided with on-demand block upsampling means to achieve the upsampling. Alternatively, to save some computation, the upsampling is done on the block data only. The decoder must use the same upsampling function to ensure proper decoding. It is to be noted that the blocks of a picture representation are typically not all encoded using the same coding mode. Therefore, at decoding, only some of the blocks are to be decoded using the GRILP mode described herein. Using on-demand block upsampling means is then particularly advantageous at decoding, as only some of the blocks of a picture representation have to be upsampled during the process.
In another embodiment, which is advantageous in terms of memory saving, the first order residual block in the reference layer may be computed between reconstructed pictures which are not up-sampled, being thus stored in memory at the spatial resolution of the reference layer.
The computation of the first order residual block in the reference layer then includes a down-sampling of the motion vector considered in the enhancement layer, towards the spatial resolution of the reference layer. The motion compensation is then performed at reduced resolution level in the reference layer, which provides a first order residual block predictor at reduced resolution.
The last inter-layer residual prediction step then consists in up-sampling the so-obtained first order residual block predictor, through bi-linear interpolation filtering for instance. Any spatial interpolation filter could be considered at this step of the process (examples: 8-tap DCT-IF, 6-tap DCT-IF, 4-tap SVC filter, bi-linear). This last embodiment may lead to slightly reduced coding efficiency in the overall scalable video coding process, but does not need additional reference picture storage compared to standard approaches that do not implement the present embodiment.
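By way of illustration, the down-sampling of the motion vector and a bi-linear up-sampling of the reduced-resolution residual predictor might be sketched as follows. This is a minimal sketch assuming a dyadic ratio of 2; the actual interpolation filter may be any of those listed above:

```python
import numpy as np

def downscale_mv(mv_enh, ratio=2):
    # Down-sample the enhancement-layer motion vector towards the
    # reference layer resolution (dyadic ratio assumed for the sketch).
    return (mv_enh[0] / ratio, mv_enh[1] / ratio)

def bilinear_upsample_x2(block):
    # 2x bi-linear interpolation of the reduced-resolution residual
    # predictor, with edge samples clamped at the block borders.
    h, w = block.shape
    ys = (np.arange(2 * h) + 0.5) / 2 - 0.5
    xs = (np.arange(2 * w) + 0.5) / 2 - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :]
    return (block[np.ix_(y0, x0)] * (1 - wy) * (1 - wx)
            + block[np.ix_(y0, x1)] * (1 - wy) * wx
            + block[np.ix_(y1, x0)] * wy * (1 - wx)
            + block[np.ix_(y1, x1)] * wy * wx)
```

Performing the motion compensation at the reference layer resolution and up-sampling only the resulting residual block is what removes the need for storing up-sampled reference pictures.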
An alternative embodiment is now described in the context of single loop decoding in relation with Figure 7. Single loop decoding means that the reconstruction process of the reference layer is not brought to completion. The only data available for the reference layer is therefore the image of the residual 7.4 used for the encoding. Figure 7 is adapted from Figure 4 in which picture representation 7.4 is composed of residual instead of the reconstructed version of the previous image in the reference layer. References with the same secondary number correspond to the same element as in Figure 4. The process is globally the same as the one conducted for multi loop decoding configuration.
When coming to step 5.3 in Figure 5, the way the co-located residual is determined needs to be adapted. Block 7.12 is the co-located predictor in the picture representation 7.4 corresponding to the predictor 7.6 in the picture representation 7.2 of the previous image in the enhancement layer. Block 7.7 is the block corresponding to the location of the predictor found in the reference layer. The case where these two blocks overlap is considered. The overlap defines a common part 7.13, represented as dashed in Figure 7. The residual values pertaining to this common part 7.13 are composed with the residual of the prediction of the corresponding dashed part 7.14 in the picture representation of the current image in the reference layer. This prediction has been done according to the motion vector 7.9 found in the reference layer. The dashed part 7.14 that has been predicted pertains to the block 7.8 which is co-located with the coding unit 7.5 in the enhancement layer.
This common part 7.13 is co-located with the dashed part 7.16 of the predictor 7.6, this dashed part 7.16 of the predictor being used for the prediction of the dashed part 7.15 of the coding unit. The prediction of the coding unit 7.5 based on the predictor 7.6 generates a first order residual block. It is to be noted that residuals in the co-located predictor 7.12 outside the common part are related to pixels outside the co-located coding unit 7.8. In the same manner, residuals in the predictor 7.7 outside the common part are related to pixels outside the predictor 7.6. It follows that the only part of the residuals available in the picture representation 7.4 that is relevant for the prediction of the first order residual block obtained in the enhancement layer is the common part.
This common part constitutes the co-located residual that is used for the prediction of the co-located part of the first order residual block in the enhancement layer. In this embodiment, the prediction of the residual is realized partially for the part corresponding to the overlap in the residual image in the reference layer between the co-located predictor and the predictor used in the reference layer when this overlap exists. Other parts of coding unit 7.5 not corresponding to the part 7.15 corresponding to the overlap are predicted directly by predictor 7.6.
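The determination of the common part 7.13 reduces to a rectangle intersection between the two block positions. A minimal sketch, assuming square blocks located by their top-left corners (coordinate names are illustrative), could be:

```python
def block_overlap(x1, y1, x2, y2, size):
    # Intersect the co-located predictor block (7.12) at (x1, y1) with the
    # block actually predicted in the reference layer (7.7) at (x2, y2),
    # both square of side `size`. Returns (left, top, width, height) of the
    # common part (7.13), or None when the blocks do not overlap, in which
    # case the whole coding unit is predicted directly by the predictor 7.6.
    left = max(x1, x2)
    top = max(y1, y2)
    right = min(x1 + size, x2 + size)
    bottom = min(y1 + size, y2 + size)
    if left >= right or top >= bottom:
        return None
    return (left, top, right - left, bottom - top)
```

Only the residual samples inside the returned rectangle take part in the partial residual prediction; the remaining samples of the coding unit are predicted directly.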
Figure 8 illustrates an embodiment of the invention in the context of intra coding. In intra coding the current image is the only one used for encoding. It is composed of a picture representation 8.5 in the reference layer and a picture representation 8.1 in the enhancement layer. The coding unit to be encoded is the block 8.2. The process of the invention works according to the same principle, except that temporal predictors are replaced by spatial predictors.
Predictors are blocks of the same size as the coding unit to encode, obtained using a set of neighbour pixels as explained with reference to Figure 2b. As illustrated in Figure 8, the prediction in the enhancement layer, taking into account a spatial GRILP prediction, has determined predictor pixels 8.3 and a prediction direction 8.4. The prediction direction plays the role of the motion vector in inter coding. Both constitute means to locate the predictor.
The encoding of the reference layer has determined, for the co-located coding unit 8.6, pixel predictors 8.7 and a prediction direction 8.8. The co-located predictor 8.9 is determined in the reference layer with the corresponding prediction direction 8.10. Similarly to inter coding, the prediction direction and the predictor obtained in the different layers may or may not be correlated. For the sake of clarity, Figure 8 illustrates a case where both the predictor and the prediction direction are clearly different.
Similarly to the method described for inter coding, the co-located residual is computed in the reference layer as the difference between the co-located coding unit 8.6 and the predictor obtained from the co-located border pixels 8.9 using the prediction direction 8.10 determined in the enhancement layer. This co-located residual is used as a predictor for the first order residual block obtained in the enhancement layer. This prediction of the first order residual block leads to a second order residual which is embedded in the stream as the result of the encoding of the coding unit.
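The computation of the second order residual described above can be sketched as follows, assuming for simplicity that blocks are flattened lists of samples and that both predictors have already been formed from the border pixels using the enhancement-layer prediction direction (all names are hypothetical):

```python
def second_order_residual(coding_unit, predictor, rl_unit, rl_predictor):
    """Intra GRILP sketch.

    coding_unit / predictor:      enhancement-layer samples
    rl_unit / rl_predictor:       co-located reference-layer samples
    Returns the second order residual embedded in the stream.
    """
    # first order residual obtained in the enhancement layer
    first_order = [c - p for c, p in zip(coding_unit, predictor)]
    # co-located residual computed in the reference layer
    colocated = [c - p for c, p in zip(rl_unit, rl_predictor)]
    # prediction of the first order residual by the co-located residual
    return [f - r for f, r in zip(first_order, colocated)]
```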
Considering the single loop mode, two cases occur.
In a first case the following conditions are considered. First, the constrained intra prediction mode is selected. Then INTRA prediction is allowed only from INTRA blocks that are completely decoded by the decoder. If the blocks containing the pixels used for intra prediction in the enhancement layer are co-located with the blocks containing the pixels used for intra prediction in the reference layer, then all pixels used for intra prediction are reconstructed pixels and the process is the same as in the multi-loop decoding.
In a second case, when these conditions are not fulfilled, meaning that some blocks containing the pixels used for intra prediction in the enhancement layer are not co-located with the blocks containing the pixels used for intra prediction in the reference layer, or that the constrained intra prediction mode is not used, inter-layer prediction of the residual will be possible only if the prediction direction in the enhancement layer and the prediction direction in the reference layer are the same.
The GRILP mode has been described in the context of a so-called enhancement layer and reference layer; it should be understood that it may apply to any picture representations encoded based on another picture representation in the same relationship. It has been described focusing on the encoding; the person skilled in the art will understand that the very same mechanism applies to the decoding.
In the specific case of an enhancement layer coding unit coded as inter, the coding unit can use uni-prediction or bi-prediction. Uni-prediction consists of predicting the coding unit using one single motion vector referring to one enhancement layer reference picture representation and, when the GRILP mode applies, to one corresponding reference picture representation in the reference layer. Bi-prediction consists of predicting the coding unit using two motion vectors referring to two enhancement layer reference picture representations and, when the GRILP mode applies, to two corresponding reference picture representations in the reference layer.
In case of inter uni-prediction, the invention can be summarized using the following equation to generate the enhancement layer prediction signal PRED_EL:

PRED_EL = MC[REF_EL, MV_EL] + {UPS[REC_RL] - MC[UPS[REF_RL], MV_EL]}

where:
* PRED_EL corresponds to the prediction of the enhancement layer coding unit being processed,
* REC_RL is the block in the reconstructed reference layer picture representation collocated with the current coding unit in the enhancement layer picture representation,
* MV_EL is the motion vector used for the temporal prediction in the enhancement layer,
* REF_EL is the enhancement layer picture representation of the reference image,
* REF_RL is the reference layer picture representation of the reference image,
* UPS[x] is the upsampling operator performing the upsampling of samples from picture x; it applies to the reference layer samples,
* MC[x, y] is the operator performing the motion compensated prediction from the picture representation x using the motion vector y.
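The uni-prediction equation reduces to a per-sample operation once motion compensation and upsampling have produced same-sized blocks; the function name and list-based block representation below are illustrative only:

```python
def grilp_uni_prediction(mc_ref_el, ups_rec_rl, mc_ups_ref_rl):
    """PRED_EL = MC[REF_EL, MV_EL] + { UPS[REC_RL] - MC[UPS[REF_RL], MV_EL] }

    mc_ref_el:      motion compensated enhancement-layer reference block
    ups_rec_rl:     upsampled reconstructed co-located reference-layer block
    mc_ups_ref_rl:  motion compensated, upsampled reference-layer reference block
    All three inputs are assumed already compensated/upsampled, same size.
    """
    return [p + (r - q)
            for p, r, q in zip(mc_ref_el, ups_rec_rl, mc_ups_ref_rl)]
```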
The bi-prediction mode may be summarized by the following equation:

PRED_EL = (MC[REF_EL^0, MV_EL^0] + MC[REF_EL^1, MV_EL^1])/2 + {UPS[REC_RL] - (MC[UPS[REF_RL^0], MV_EL^0] + MC[UPS[REF_RL^1], MV_EL^1])/2}

where the upper indices 0 or 1 relate to the referred reference picture.
The proposed HEVC standard defines two prediction lists, the list L0 and the list L1. Each list corresponds to one set of reference pictures. MV_EL^i refers to pictures from list Li, with i = 0, 1.
These weighted prediction modes may be used in combination with the GRILP mode evoked above. In the uni-prediction case, the equation to generate the enhancement layer prediction signal can therefore be written as:

PRED_EL = w_0.MC[REF_EL, MV_EL] + o_0 + {UPS[REC_RL] - (w_0'.MC[UPS[REF_RL], MV_EL] + o_0')}

where w_0' and o_0' are the weighting factor and the offset for the reference layer picture. Preferably, these parameters are identical to w_0 and o_0.
In the bi-prediction case, the equation to generate the enhancement layer prediction signal can therefore be written as:
PRED_EL = (w_0.MC[REF_EL^0, MV_EL^0] + w_1.MC[REF_EL^1, MV_EL^1] + o_0 + o_1)/2 + {UPS[REC_RL] - (w_0'.MC[UPS[REF_RL^0], MV_EL^0] + w_1'.MC[UPS[REF_RL^1], MV_EL^1] + o_0' + o_1')/2}

According to some embodiments of the invention, a dedicated weighting factor α and a dedicated offset β are proposed, to be applied specifically to the reference layer residual block determined in the GRILP mode. We call this coding mode the reference layer residual weighted prediction mode.
Advantageously, this weighting factor α is a power of two. Accordingly, the computations are simpler, as the multiplication may be implemented by a shift operator. This dedicated weighting factor and offset may be transmitted as a high level syntax element in the signalling of the encoded signal. They may be defined for the whole sequence, an image, a slice or a single block. The weighting factor may also be selected from a list of a plurality of available weighting factors. Accordingly, the differences in luminosity between the reference image and the image to be encoded may be compensated.
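A possible sketch of the shift-based implementation follows, assuming the weighting factor is expressed as alpha = 2**log2_alpha with integer log2_alpha (a negative value giving factors such as 1/2 or 1/4); the names and the rounding convention are illustrative, not mandated by the description:

```python
def weight_residual(residual, log2_alpha, beta=0):
    """Apply alpha = 2**log2_alpha and an offset beta to a reference-layer
    residual block, using shifts instead of a multiplication.

    residual is a 2-D block given as a list of rows of integer samples.
    """
    out = []
    for row in residual:
        new_row = []
        for r in row:
            if log2_alpha >= 0:
                v = r << log2_alpha          # alpha >= 1: left shift
            else:
                shift = -log2_alpha
                # rounding right shift, as commonly done in integer codecs
                v = (r + (1 << (shift - 1))) >> shift
            new_row.append(v + beta)
        out.append(new_row)
    return out
```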
The introduction of this dedicated weighting factor for uni-prediction may be summarized by the following equation:

PRED_EL = MC[REF_EL, MV_EL] + α.{UPS[REC_RL] - MC[UPS[REF_RL], MV_EL]} + β

The introduction of this dedicated weighting factor for bi-prediction may be summarized by the following equation:

PRED_EL = (MC[REF_EL^0, MV_EL^0] + MC[REF_EL^1, MV_EL^1])/2 + α.{UPS[REC_RL] - (MC[UPS[REF_RL^0], MV_EL^0] + MC[UPS[REF_RL^1], MV_EL^1])/2} + β

In an alternative embodiment, two dedicated weighting factors for bi-prediction can be used, one for each prediction list:
PRED_EL = (MC[REF_EL^0, MV_EL^0] + α_0.{UPS[REC_RL] - MC[UPS[REF_RL^0], MV_EL^0]} + β_0)/2 + (MC[REF_EL^1, MV_EL^1] + α_1.{UPS[REC_RL] - MC[UPS[REF_RL^1], MV_EL^1]} + β_1)/2

This reference layer residual weighted prediction mode may be combined with the previously described weighted modes. In other words, the previous weighted modes focused on applying a weight to the predictor, while the reference layer residual weighted prediction mode focuses on applying a dedicated weighting factor to the prediction of the residual.
According to an embodiment, the selection of the dedicated weighting factor may be based on decoded data from the enhancement layer.
Several dedicated weighting factors can be considered for the same coding unit, the best dedicated weighting factor being selected among said several dedicated factors, for instance based on a rate/distortion criterion. A coding mode could be defined for each dedicated weighting factor, and written in the coding unit bitstream by the encoder to allow the decoder to retrieve the selected dedicated weighting factor.
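The rate/distortion selection of the dedicated weighting factor can be sketched as follows; the cost model J = D + lambda*R, the candidate list and all names are assumptions for illustration, not mandated by the description:

```python
def select_weighting_factor(el_residual, rl_residual, candidates, lmbda=10.0):
    """Pick the dedicated weighting factor minimising J = D + lambda * R.

    el_residual: enhancement-layer (first order) residual samples
    rl_residual: co-located reference-layer residual samples
    candidates:  list of (alpha, rate_bits) pairs, rate_bits being the
                 signalling cost of the coding mode for that factor
    """
    best = None
    for alpha, rate_bits in candidates:
        # distortion: energy of the second order residual left after
        # weighted residual prediction
        dist = sum((o - alpha * r) ** 2
                   for o, r in zip(el_residual, rl_residual))
        cost = dist + lmbda * rate_bits
        if best is None or cost < best[1]:
            best = (alpha, cost)
    return best[0]
```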
According to an embodiment, all or part of the coding modes defined by the different dedicated weighting factors can be deactivated by a higher level syntax element, for instance a specific flag to deactivate them, or as part of profile information that deactivates several tools at once.
According to an embodiment, the weighted prediction mechanism is generalized to control the weight to be applied to the reference layer residual. In this generalization of weighted prediction, the weight α, and possibly an offset β, can be signalled in the stream similarly to the usual weighted prediction parameters as specified for instance in the HEVC standard.
For example, new syntax elements for the reference layer residual weighted prediction mode can be added. The following table 1 corresponds to the weighted prediction signalling in HEVC. The part in bold font corresponds to the additional syntax added for the dedicated weighted prediction signalling of the reference layer residual.
pred_weight_table( ) {
    luma_log2_weight_denom
    if( chroma_format_idc != 0 )
        delta_chroma_log2_weight_denom
    for( i = 0; i <= num_ref_idx_l0_active_minus1; i++ )
        luma_weight_l0_flag[ i ]
    if( chroma_format_idc != 0 )
        for( i = 0; i <= num_ref_idx_l0_active_minus1; i++ )
            chroma_weight_l0_flag[ i ]
    for( i = 0; i <= num_ref_idx_l0_active_minus1; i++ ) {
        if( luma_weight_l0_flag[ i ] ) {
            delta_luma_weight_l0[ i ]
            luma_offset_l0[ i ]
        }
        if( chroma_weight_l0_flag[ i ] )
            for( j = 0; j < 2; j++ ) {
                delta_chroma_weight_l0[ i ][ j ]
                delta_chroma_offset_l0[ i ][ j ]
            }
    }
    if( slice_type = = B ) {
        for( i = 0; i <= num_ref_idx_l1_active_minus1; i++ )
            luma_weight_l1_flag[ i ]
        if( chroma_format_idc != 0 )
            for( i = 0; i <= num_ref_idx_l1_active_minus1; i++ )
                chroma_weight_l1_flag[ i ]
        for( i = 0; i <= num_ref_idx_l1_active_minus1; i++ ) {
            if( luma_weight_l1_flag[ i ] ) {
                delta_luma_weight_l1[ i ]
                luma_offset_l1[ i ]
            }
            if( chroma_weight_l1_flag[ i ] )
                for( j = 0; j < 2; j++ ) {
                    delta_chroma_weight_l1[ i ][ j ]
                    delta_chroma_offset_l1[ i ][ j ]
                }
        }
    }
    if( slice_GRP_enabled ) {
        for( i = 0; i <= num_ref_idx_GRP_active_minus1; i++ )
            luma_weight_GRP_flag[ i ]
        if( chroma_format_idc != 0 )
            for( i = 0; i <= num_ref_idx_GRP_active_minus1; i++ )
                chroma_weight_GRP_flag[ i ]
        for( i = 0; i <= num_ref_idx_GRP_active_minus1; i++ ) {
            if( luma_weight_GRP_flag[ i ] ) {
                delta_luma_weight_GRP[ i ]
                luma_offset_GRP[ i ]
            }
            if( chroma_weight_GRP_flag[ i ] )
                for( j = 0; j < 2; j++ ) {
                    delta_chroma_weight_GRP[ i ][ j ]
                    delta_chroma_offset_GRP[ i ][ j ]
                }
        }
    }
}
Table 1
According to an embodiment, as the enhancement layer can use the reference layer weighted prediction parameters, the weighted prediction parameters signalled in the enhancement layer can be used exclusively for either the usual weighted prediction or for the reference layer residual weighted prediction mode. A high-level flag, that may be called for example ordinary_weighted_pred_flag, can be added to indicate if the enhancement layer weighted prediction syntax is used for the usual weighted prediction, or for the reference layer residual weighted prediction mode.
According to an embodiment, the existing HEVC syntax elements for the L1 list of P-Slices or B-Slices, which are the elements in table 1 above that contain '_l1' as opposed to '_l0' for the other list, are re-used to signal the dedicated weighted prediction parameters of the reference layer residual:
* In P-slices, using only uni-prediction, the L1 parameters are used for encoding the reference layer residual weighting parameters;
* In B-slices, using both uni-prediction and bi-prediction, one slice-level syntax element indicates if the L1 parameters in the EL apply to usual weighted prediction, to the reference layer residual weighted prediction mode, or to both.
This can be controlled by a high-level flag, for example at the slice level, indicating if the L1 parameters apply to the usual prediction or to the reference layer residual weighted prediction mode. There may be one flag for the B-slice case and one flag for the P-slice case.
For instance, the following syntax table 2 may apply, where changes compared to the HEVC design are in bold font:
pred_weight_table( ) {
    luma_log2_weight_denom
    if( chroma_format_idc != 0 )
        delta_chroma_log2_weight_denom
    for( i = 0; i <= num_ref_idx_l0_active_minus1; i++ )
        luma_weight_l0_flag[ i ]
    if( chroma_format_idc != 0 )
        for( i = 0; i <= num_ref_idx_l0_active_minus1; i++ )
            chroma_weight_l0_flag[ i ]
    for( i = 0; i <= num_ref_idx_l0_active_minus1; i++ ) {
        if( luma_weight_l0_flag[ i ] ) {
            delta_luma_weight_l0[ i ]
            luma_offset_l0[ i ]
        }
        if( chroma_weight_l0_flag[ i ] )
            for( j = 0; j < 2; j++ ) {
                delta_chroma_weight_l0[ i ][ j ]
                delta_chroma_offset_l0[ i ][ j ]
            }
    }
    if( slice_type = = B || ( slice_type = = P && slice_GRP_enabled ) ) {
        for( i = 0; i <= num_ref_idx_l1_active_minus1; i++ )
            luma_weight_l1_flag[ i ]
        if( chroma_format_idc != 0 )
            for( i = 0; i <= num_ref_idx_l1_active_minus1; i++ )
                chroma_weight_l1_flag[ i ]
        for( i = 0; i <= num_ref_idx_l1_active_minus1; i++ ) {
            if( luma_weight_l1_flag[ i ] ) {
                delta_luma_weight_l1[ i ]
                luma_offset_l1[ i ]
            }
            if( chroma_weight_l1_flag[ i ] )
                for( j = 0; j < 2; j++ ) {
                    delta_chroma_weight_l1[ i ][ j ]
                    delta_chroma_offset_l1[ i ][ j ]
                }
        }
    }
}
Table 2
Based on this table, if the type of the slice is B and the flag ordinary_weighted_pred_flag is false, then the L1 parameters apply to the reference layer residual weighted prediction mode, and not to the L1 prediction.
Also, if the type of the slice is P and if the slice_GRP_enabled flag is enabled, the L1 parameters, normally not signalled in P-slices, are signalled anyway for the reference layer residual weighted prediction mode.
In relation with Figure 10, function 10.54 in particular may be in charge of determining the prediction. In particular, the reference layer coded data may be used in relation with a weighting factor, selected among a list of available weighting factors transmitted by the above mechanism.
The entropy coding function of the base layer 10.0 in that second part takes into account these inter-layer prediction modes to encode information specifically for them. For instance, given a list of three weighting factors (e.g. 1, 1/2 and 1/4 in that order), codewords '0', '10' and '11' can be written in addition to the inter mode information. This function may take into account the fact that all or part of these modes are deactivated, e.g. by only writing codewords '0' and '1' for weighting factors 1 and 1/2, or even no codeword at all if all such inter-layer prediction modes are deactivated.
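The codeword assignment described above can be sketched as follows; the helper function is illustrative, with the three-factor mapping following the codewords '0', '10' and '11' given in the text:

```python
def factor_codeword(index, num_factors):
    """Return the codeword signalling the chosen weighting factor.

    For a three-entry list (e.g. 1, 1/2, 1/4) this yields '0', '10'
    and '11'; for a two-entry list '0' and '1'; with one factor or all
    such modes deactivated, no bits are written at all.
    """
    if num_factors <= 1:
        return ''                 # nothing to signal
    if num_factors == 2:
        return str(index)         # single bit: '0' or '1'
    # three factors: one bit selects the first factor, else a second bit
    return '0' if index == 0 else '1' + str(index - 1)
```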
Figure 9 is a schematic block diagram of a computing device 9.0 for implementation of one or more embodiments of the invention. The computing device 9.0 may be a device such as a micro-computer, a workstation or a light portable device. The computing device 9.0 comprises a communication bus connected to: -a central processing unit 9.1, such as a microprocessor, denoted CPU; -a random access memory 9.2, denoted RAM, for storing the executable code of the method of embodiments of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method for encoding or decoding at least part of an image according to embodiments of the invention, the memory capacity thereof can be expanded by an optional RAM connected to an expansion port for example; -a read only memory 9.3, denoted ROM, for storing computer programs for implementing embodiments of the invention; -a network interface 9.4 is typically connected to a communication network over which digital data to be processed are transmitted or received.
The network interface 9.4 can be a single network interface, or composed of a set of different network interfaces (for instance wired and wireless interfaces, or different kinds of wired or wireless interfaces). Data packets are written to the network interface for transmission or are read from the network interface for reception under the control of the software application running in the CPU 9.1; -a user interface 9.5 for receiving inputs from a user or to display information to a user may be provided; -a hard disk 9.6 denoted HD may be used as a mass storage device; -an I/O module 9.7 may be used for receiving/sending data from/to external devices such as a video source or display.
The executable code may be stored either in read only memory 9.3, on the hard disk 9.6 or on a removable digital medium such as for example a disk.
According to a variant, the executable code of the programs can be received by means of a communication network, via the network interface 9.4, in order to be stored in one of the storage means of the communication device 9.0, such as the hard disk 9.6, before being executed.
The central processing unit 9.1 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to embodiments of the invention, which instructions are stored in one of the aforementioned storage means. After powering on, the CPU 9.1 is capable of executing instructions from main RAM memory 9.2 relating to a software application after those instructions have been loaded from the program ROM 9.3 or the hard-disc (HD) 9.6 for example. Such a software application, when executed by the CPU 9.1, causes the steps of the flowcharts shown in Figures 5 or 6 to be performed.
Any step of the algorithm shown in Figure 5 or 6 may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC ("Personal Computer"), a DSP ("Digital Signal Processor") or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit").
Although the present invention has been described hereinabove with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications will be apparent to a skilled person in the art which lie within the scope of the present invention.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims (49)

  1. CLAIMS 1. A method for encoding an image of pixels according to a scalable encoding scheme having an enhancement layer and a reference layer, the method comprising for the encoding of said enhancement layer: -(a) obtaining a block predictor candidate for predicting a coding unit within the enhancement layer and an associated enhancement-layer residual block corresponding to said prediction; -(b) determining a block predictor in the reference layer co-located with the determined block predictor candidate within the enhancement layer; -(c) determining a reference-layer residual block associated with the coding unit in the reference layer that is co-located with the coding unit in the enhancement layer using the determined block predictor in the reference layer; and -(d) determining for the coding unit of the enhancement layer a further residual block corresponding, at least partly, to the difference between the enhancement-layer residual block and the reference-layer residual block; wherein the method further comprises: -applying at least a weighting factor to at least one element among the obtained block predictor candidate, the determined block predictor and the reference-layer residual block.
  2. 2. A method according to claim 1, wherein the method comprises: -applying a first weighting factor to the obtained predictor candidate; and -applying a second weighting factor to the determined block predictor.
  3. 3. A method according to claim 1 or 2, wherein the method comprises: -applying a dedicated weighting factor to the reference-layer residual block.
  4. 4. A method according to any one of claims 1 to 3, wherein the method further comprises: -adding an offset to at least one element multiplied by a weighting factor.
  5. 5. A method according to any one of claims 1 to 4, wherein the method comprises: -selecting an encoding mode for a coding unit of the enhancement layer from among a plurality of available encoding modes, one of the available modes involving said steps (a) to (d).
  6. 6. A method according to any one of claims 1 to 5, wherein, when a reconstructed version of the reference layer picture representations is available: -the reference-layer residual block is determined as the difference between the co-located coding unit in the reference layer and the determined block predictor in the reference layer; and -each sample of said further residual block corresponds to a difference between a sample of the enhancement-layer residual block and a corresponding sample of the reference-layer residual block.
  7. 7. A method according to claim 5, wherein, when an image of residual data used for the encoding of the reference layer is available, determining the reference-layer residual block comprises: -determining the overlap in the image of residual data between the determined block predictor and the block predictor used in the encoding of the block co-located with the coding unit in the reference layer; and -using the part in the image of residual data corresponding to this overlap, if any, to compute a part of said further residual block, wherein the samples of said further residual block corresponding to this overlap each correspond to a difference between a sample of the enhancement-layer residual block and a corresponding sample of the reference-layer residual block.
  8. 8. A method according to any one of claims 5 to 7, wherein the obtained block predictor candidate of the coding unit is in a previously encoded image.
  9. 9. A method according to any one of claims 5 to 7, wherein the obtained predictor candidate of the coding unit is obtained from a previously encoded part of the same image to which the coding unit belongs.
  10. 10.A method according to claim 3, wherein the method further comprises: -transmitting said dedicated weighting factor as a syntax element.
  11. 11.A method according to claim 10, wherein the method further comprises: -transmitting said dedicated weighting factor as a syntax element added in the signalling table corresponding to the weighted prediction signalling.
  12. 12.A method according to claim 10, wherein the method further comprises: -transmitting said dedicated weighting factor as a syntax element in the signalling table corresponding to the weighted prediction signalling usually used for the transmission of usual weighted prediction for the enhancement layer, a high level flag being added to discriminate both usages.
  13. 13. A method according to claim 10, wherein the method further comprises: -transmitting said dedicated weighting factor as a syntax element in the signalling table corresponding to the weighted prediction signalling usually used for the transmission of usual weighted prediction for slices of B type, a high level flag being added to discriminate both usages for this type of slice.
  14. 14.A method for decoding a bit stream comprising data representing an image encoded according to a scalable encoding scheme having an enhancement layer and a reference layer, the method comprising for the decoding of said enhancement layer: -obtaining from the bit stream the location of a block predictor of a coding unit within the enhancement layer to be decoded and a residual block comprising difference information between enhancement layer residual information and reference layer residual information; -determining the block predictor in the reference layer co-located with the block predictor in the enhancement layer; -determining a reference-layer residual block corresponding to the difference between the block of the reference layer co-located with the coding unit to be decoded and the determined block predictor in the reference layer; -reconstructing an enhancement-layer residual block using the determined reference-layer residual block and said residual block obtained from the bit stream; reconstructing the coding unit using the block predictor and the enhancement-layer residual block; wherein the method further comprises: -applying at least a weighting factor to at least one element among the obtained block predictor candidate, the determined block predictor and the reference-layer residual block.
  15. 15.A method according to claim 14, wherein the method comprises: -applying a first weighting factor to the obtained predictor candidate; and -applying a second weighting factor to the determined block predictor.
  16. 16.A method according to claim 14 or 15, wherein the method comprises: -applying a dedicated weighting factor to the reference-layer residual block.
  17. 17. A method according to any one of claims 14 to 16, wherein the method further comprises: -adding an offset to at least one element multiplied by a weighting factor.
  18. 18. A method according to any one of claims 1 to 17, wherein the determination of a predictor of the coding unit is made using a cost function adapted to take into account the prediction of the enhancement-layer residual block to determine a rate-distortion cost.
  19. 19. A method according to any one of claims 1 to 18, wherein the step of determining a reference-layer residual block comprises: -on-demand upsampling of needed blocks in the reference layer.
  20. 20. A method according to any one of claims 1 to 19, wherein, when the reference-layer residual block is determined at the reference layer resolution, the method further comprises: -up-sampling the determined reference-layer residual block.
  21. 21.A method according to claim 20, wherein picture representations used in the reference layer to compute the reference-layer residual block correspond to some of the reference picture representations stored in the decoded picture buffer of the reference layer.
  22. 22.A method according to claim 16, wherein the method further comprises: -obtaining said dedicated weighting factor as a syntax element.
  23. 23. A method according to claim 16, wherein the method further comprises: -selecting several dedicated weighting factors for the same coding unit among a plurality of available dedicated weighting factors, each of said available dedicated weighting factors being defined as a particular reference frame.
  24. 24.A device for encoding an image of pixels according to a scalable encoding scheme having an enhancement layer and a reference layer, the device comprising for the encoding of said enhancement layer: -(a) an obtaining module for obtaining a block predictor candidate for predicting a coding unit within the enhancement layer and an associated enhancement-layer residual block corresponding to said prediction; -(b) a determining module for determining a block predictor in the reference layer co-located with the determined block predictor candidate within the enhancement layer; -(c) a determining module for determining a reference-layer residual block associated with the coding unit in the reference layer that is co-located with the coding unit in the enhancement layer using the determined block predictor in the reference layer; -(d) a determining module for determining for the coding unit of the enhancement layer a further residual block corresponding, at least partly, to the difference between the enhancement-layer residual block and the reference-layer residual block; wherein the device further comprises: -a multiplying module for applying at least a weighting factor to at least one element among the obtained block predictor candidate, the determined block predictor and the reference-layer residual block.
  25. 25.A device according to claim 24, wherein the multiplying module is adapted for: -applying a first weighting factor to the obtained predictor candidate; and for -applying a second weighting factor to the determined block predictor.
  26. 26.A device according to claim 24 or 25, wherein the multiplying module is adapted for: -applying a dedicated weighting factor to the reference-layer residual block.
  27. 27. A device according to any one of claims 24 to 26, wherein the multiplying module is adapted for: -adding an offset to at least one element multiplied by a weighting factor.
  28. 28. A device according to any one of claims 24 to 27, wherein the device comprises: -a selecting module for selecting an encoding mode for a coding unit of the enhancement layer from among a plurality of available encoding modes, one of the available modes involving said modules (a) to (d).
  29. 29. A device according to any one of claims 24 to 28, wherein, when a reconstructed version of the reference layer picture representations is available: -the reference-layer residual block is determined as the difference between the co-located coding unit in the reference layer and the determined block predictor in the reference layer; and -each sample of said further residual block corresponds to a difference between a sample of the enhancement-layer residual block and a corresponding sample of the reference-layer residual block.
  30. 30. A device according to claim 28, wherein the determining module for determining the reference-layer residual block is operable, when an image of residual data used for the encoding of the reference layer is available, to: -determine the overlap in the image of residual data between the determined block predictor and the block predictor used in the encoding of the block co-located with the coding unit in the reference layer; and -compute a part of said further residual block using the part in the image of residual data corresponding to this overlap, if any, wherein the samples of said further residual block of the enhancement layer corresponding to this overlap each correspond to a difference between a sample of the enhancement-layer residual block and a corresponding sample of the reference-layer residual block.
  31. 31.A device according to any one of claims 24 to 30, wherein the obtained block predictor candidate of the coding unit is in a previously encoded image.
  32. 32. A device according to any one of claims 24 to 30, wherein the obtained predictor candidate of the coding unit is obtained from a previously encoded part of the same image to which the coding unit belongs.
  33. 33.A device according to claim 26, wherein the device further comprises: -a transmitter for transmitting said dedicated weighting factor as a syntax element.
  34. 34.A device according to claim 33, wherein said transmitter is adapted for: -transmitting said dedicated weighting factor as a syntax element added in the signalling table corresponding to the weighted prediction signalling.
35. A device according to claim 33, wherein said transmitter is adapted for: -transmitting said dedicated weighting factor as a syntax element in the signalling table corresponding to the weighted prediction signalling usually used for the transmission of usual weighted prediction for the enhancement layer, a high-level flag being added to discriminate between both usages.
36. A device according to claim 33, wherein said transmitter is adapted for: -transmitting said dedicated weighting factor as a syntax element in the signalling table corresponding to the weighted prediction signalling usually used for the transmission of usual weighted prediction for slices of B type, a high-level flag being added to discriminate between both usages for this type of slice.
37. A device for decoding a bit stream comprising data representing an image encoded according to a scalable encoding scheme having an enhancement layer and a reference layer, the device comprising for the decoding of said enhancement layer: -an obtaining module for obtaining from the bit stream the location of a block predictor of a coding unit within the enhancement layer to be decoded and a residual block comprising difference information between enhancement layer residual information and reference layer residual information; -a determining module for determining the block predictor in the reference layer co-located with the block predictor in the enhancement layer; -a determining module for determining a reference-layer residual block corresponding to the difference between the block of the reference layer co-located with the coding unit to be decoded and the determined block predictor in the reference layer; -a reconstructing module for reconstructing an enhancement-layer residual block using the determined reference-layer residual block and said residual block obtained from the bit stream; and -a reconstructing module for reconstructing the coding unit using the block predictor and the enhancement-layer residual block; -wherein the device further comprises: -a multiplying module for applying at least a weighting factor to at least one element among the obtained block predictor candidate, the determined block predictor and the reference-layer residual block.
38. A device according to claim 37, wherein the multiplying module is adapted for: -applying a first weighting factor to the obtained predictor candidate; and -applying a second weighting factor to the determined block predictor.
39. A device according to claim 37 or 38, wherein the multiplying module is adapted for: -applying a dedicated weighting factor to the reference-layer residual block.
40. A device according to any one of claims 37 to 39, wherein the multiplying module is adapted for: -adding an offset to at least one element multiplied by a weighting factor.
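The decoder-side path of claims 37 to 40 can be sketched as below. This is a hedged illustration, not the claimed device: the function signature, parameter names, and integer defaults are assumptions, and a real decoder would operate on clipped fixed-point samples rather than Python numbers.

```python
def reconstruct_coding_unit(pred_el, coded_residual, r_rl,
                            w_pred=1, w_res=1, offset=0):
    """Sketch of claims 37-40: rebuild the enhancement-layer residual from
    the residual decoded from the bit stream plus the weighted
    reference-layer residual (claim 39), then reconstruct the coding unit
    from the weighted block predictor (claim 38) plus that residual,
    optionally shifted by an offset (claim 40)."""
    h, w = len(pred_el), len(pred_el[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Enhancement-layer residual: coded difference + weighted RL residual.
            r_el = coded_residual[y][x] + w_res * r_rl[y][x]
            # Coding unit: weighted predictor + residual + optional offset.
            row.append(w_pred * pred_el[y][x] + r_el + offset)
        out.append(row)
    return out
```

With unit weights and a zero offset this degenerates to the unweighted inter-layer residual prediction; the dedicated weights let the decoder compensate for fading or brightness changes between layers.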
41. A device according to any one of claims 24 to 40, wherein the determination of a predictor of the coding unit is made using a cost function adapted to take into account the prediction of the enhancement-layer residual block to determine a rate-distortion cost.
42. A device according to any one of claims 24 to 41, wherein said determining module for determining a reference-layer residual block comprises: -an up-sampling module for on-demand up-sampling of needed blocks in the reference layer.
43. A device according to any one of claims 24 to 42, further comprising an up-sampling module operable, when the reference-layer residual block is determined at the reference-layer resolution, to up-sample the determined reference-layer residual block.
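For spatial scalability, the up-sampling step of claim 43 can be illustrated with the simplest possible filter. The patent does not mandate a particular filter; nearest-neighbour replication is used here only to keep the sketch short, whereas an actual scalable codec would typically apply an interpolation filter.

```python
def upsample_residual(block, factor=2):
    """Illustrative up-sampling of a reference-layer residual block to the
    enhancement-layer resolution by nearest-neighbour sample replication
    (filter choice is an assumption, not taken from the patent)."""
    return [[block[y // factor][x // factor]
             for x in range(len(block[0]) * factor)]
            for y in range(len(block) * factor)]
```

Determining the residual at reference-layer resolution and up-sampling afterwards, as claim 43 allows, avoids computing the reference-layer difference on full-resolution blocks.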
44. A device according to claim 43, wherein picture representations used in the reference layer to compute the reference-layer residual block correspond to some of the reference picture representations stored in the decoded picture buffer of the reference layer.
45. A device according to claim 39, wherein the device further comprises: -an obtaining module for obtaining said dedicated weighting factor as a syntax element.
46. A device according to claim 39, wherein the device further comprises: -a selecting module for selecting several dedicated weighting factors for the same coding unit from among a plurality of available dedicated weighting factors, each of said available dedicated weighting factors being defined for a particular reference frame.
47. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to any one of claims 1 to 23, when loaded into and executed by the programmable apparatus.
48. A computer-readable storage medium storing instructions of a computer program for implementing a method according to any one of claims 1 to 23.
49. A method of encoding or decoding substantially as hereinbefore described with reference to, and as shown in, Figures 5, 6, 10 and 11.
GB1300149.0A 2013-01-04 2013-01-04 Method and apparatus for encoding an image into a video bitstream and decoding corresponding video bitstream with weighted residual predictions Expired - Fee Related GB2512563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1300149.0A GB2512563B (en) 2013-01-04 2013-01-04 Method and apparatus for encoding an image into a video bitstream and decoding corresponding video bitstream with weighted residual predictions

Publications (3)

Publication Number Publication Date
GB201300149D0 GB201300149D0 (en) 2013-02-20
GB2512563A true GB2512563A (en) 2014-10-08
GB2512563B GB2512563B (en) 2015-10-14

Family

ID=47747992

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1300149.0A Expired - Fee Related GB2512563B (en) 2013-01-04 2013-01-04 Method and apparatus for encoding an image into a video bitstream and decoding corresponding video bitstream with weighted residual predictions

Country Status (1)

Country Link
GB (1) GB2512563B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070014348A1 (en) * 2005-04-12 2007-01-18 Nokia Corporation Method and system for motion compensated fine granularity scalable video coding with drift control
US20090129474A1 (en) * 2005-07-22 2009-05-21 Purvin Bibhas Pandit Method and apparatus for weighted prediction for scalable video coding



Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20230104