Summary of the invention
The scalable extension of H.264/AVC (SVC) also provides other scalability types, for example spatial scalability. In spatial scalability, the BL and the EL differ in pixel count. Therefore, the problem arises of how bit-depth scalability can be combined with other scalability types, in particular spatial scalability. The invention provides a solution to this problem.
Claim 1 discloses an encoding method that allows bit-depth scalability to be combined with other scalability types. Claim 6 discloses a corresponding decoding method.
An apparatus that utilizes the encoding method is disclosed in claim 10, and an apparatus that utilizes the decoding method is disclosed in claim 11.
According to the invention, look-up table (LUT) based inverse tone mapping is employed in inter-layer prediction in order to improve coding efficiency. The LUT-based inverse tone mapping technique is applied to those EL image units whose collocated BL image units are intra-coded. In general, an image unit is a macroblock (MB), block, slice, image or group of images. For example, on the slice level, a LUT is created at the encoder based on a reconstructed BL I-slice and the collocated original EL slice. In particular, the LUTs can be inserted into the bit-stream in a hierarchical manner. For example, in an AVC-compatible bit-stream, one LUT is generated as a "base" LUT based on the whole sequence; based on individual frames, lower-level LUTs can additionally be generated; further, slice-level LUTs can be transmitted in the bit-stream if desired. To reduce the overhead introduced by the LUTs, at each LUT level only its difference from the immediately higher LUT level is encoded. The whole proposal can be implemented within the SVC structure, and supports compatibility with the other scalability types, namely temporal, spatial and SNR scalability.
In one embodiment, the BL information is up-sampled in two logical steps: one step is texture up-sampling and the other is bit-depth up-sampling. Texture up-sampling is a process that increases the number of pixels, while bit-depth up-sampling is a process that increases the number of values that each pixel can have. The value corresponds to the (color) intensity of the pixel. The up-sampled BL image unit is used to predict the collocated EL image unit. The encoder generates a residual from the EL video data, and this residual may be further encoded (usually entropy-coded) and transmitted. The BL information to be up-sampled can have any granularity, e.g. a single pixel, a pixel block, a MB, a slice, a whole image or a group of images. It is also possible to perform the two logical up-sampling steps in a single step. The BL information is up-sampled at the encoder side and in the same manner at the decoder side, wherein the up-sampling refers to spatial and bit-depth characteristics.
Further, the combined spatial and bit-depth up-sampling can generally be performed for intra-coded as well as for inter-coded images. However, the hierarchical LUTs according to the invention are only defined and used if the collocated BL is intra-coded.
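The two logical up-sampling steps described above can be illustrated with a minimal sketch on a toy 2x2 8-bit block. The helper names, the pixel-replication filter and the plain left-shift LUT are illustrative assumptions only, not the filters or tables actually mandated by SVC:

```python
def texture_upsample(block):
    """Spatial up-sampling by simple pixel replication (factor 2 per dimension)."""
    out = []
    for row in block:
        wide = [p for p in row for _ in (0, 1)]   # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                    # repeat each row vertically
    return out

def bit_depth_upsample(block, lut):
    """Inverse tone mapping: map each M-bit level to an N-bit level via the LUT."""
    return [[lut[p] for p in row] for row in block]

# Toy LUT from 8 bit (256 levels) to 10 bit: here a plain left-shift;
# a real LUT is created from reconstructed BL and original EL data.
lut = [v << 2 for v in range(256)]

bl_rec = [[10, 20],
          [30, 40]]
prediction = bit_depth_upsample(texture_upsample(bl_rec), lut)
# 'prediction' is 4x4 pixels, each with a 10-bit value range.
```

The order of the two steps here (texture first, then bit-depth) matches the preferred order described below for the encoder of Fig. 2.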
In particular, the invention discloses a method for encoding video data having a base layer and an enhancement layer, wherein base layer pixels have a lower bit-depth and a lower spatial resolution than enhancement layer pixels, the method comprising the steps of:
encoding base layer data on a first granularity level, the first granularity level being e.g. GOP level, multi-image level or slice level, wherein the base layer data are intra-coded,
reconstructing the encoded base layer data,
generating for the intra-coded base layer data (being a first predicted version of enhancement layer data) a first tone mapping table, the table defining an individual mapping between the reconstructed base layer data and the corresponding original enhancement layer data,
generating for a fragment of the intra-coded base layer data a different, second tone mapping table, the table defining an individual mapping between said fragment of the reconstructed base layer data and the corresponding fragment of the original enhancement layer data,
generating a difference table, the table representing the difference between the first and second tone mapping tables (i.e. the deviation of the second from the first tone mapping table),
bit-depth up-sampling the base layer data based on said first and second tone mapping tables, wherein for said fragment of the base layer data only the second tone mapping table is used, and wherein a second predicted version of the corresponding enhancement layer data is obtained that has a higher bit-depth resolution than the first predicted version of the enhancement layer data,
generating an enhancement layer residual, being the difference between the original enhancement layer data and the second predicted version of the corresponding enhancement layer data, and
encoding the enhancement layer residual, the first tone mapping table and said difference table, wherein the encoded first tone mapping table is associated with the encoded base layer or enhancement layer data, and the difference table is associated with said fragment of the encoded base layer or enhancement layer data.
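The encoding steps listed above can be sketched in a few lines. All names are hypothetical, and the LUT training rule (mapping each BL level to the mean of the co-located original EL values, with a left-shift fallback for unseen levels) is an assumption chosen only for illustration:

```python
def train_lut(bl_pixels, el_pixels, levels=256):
    """Map each BL level to the mean of the co-located original EL values;
    levels that do not occur fall back to a plain left-shift (an assumption)."""
    sums, counts = [0] * levels, [0] * levels
    for b, e in zip(bl_pixels, el_pixels):
        sums[b] += e
        counts[b] += 1
    return [sums[i] // counts[i] if counts[i] else i << 2 for i in range(levels)]

# Reconstructed BL and original EL samples of a whole picture ...
bl_pic = [10, 10, 20, 20, 30, 30]
el_pic = [42, 44, 81, 83, 120, 122]
# ... and of one fragment (e.g. a slice) of it.
bl_slc, el_slc = bl_pic[4:], el_pic[4:]

lut1 = train_lut(bl_pic, el_pic)             # first tone mapping table
lut2 = train_lut(bl_slc, el_slc)             # second table, for the fragment
delta = [b - a for a, b in zip(lut1, lut2)]  # difference table to be encoded

# Bit-depth up-sampling of the fragment uses only the second table:
pred_slc = [lut2[p] for p in bl_slc]
residual = [e - p for e, p in zip(el_slc, pred_slc)]
```

Only `lut1`, `delta` and `residual` need to be transmitted; the decoder rebuilds `lut2` from the first two.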
In one embodiment, the reconstructed base layer data are spatially, temporally or SNR up-sampled before the bit-depth up-sampling, wherein a first predicted version of the corresponding enhancement layer data is obtained that has a higher spatial, temporal or SNR resolution than the base layer data. The first mapping table then defines the tone mapping between the up-sampled, reconstructed BL data and the corresponding original EL data, and the second tone mapping table defines the mapping between a fragment of the up-sampled, reconstructed BL data and the corresponding fragment of the original EL data. Further, in this embodiment the first predicted version of the EL data, to which the bit-depth up-sampling refers, is different from the BL data, since this first predicted version of the EL data has been up-sampled.
According to one aspect of the invention, a method for decoding video data is disclosed, the method comprising the steps of:
extracting from encoded EL data or encoded BL data first and second tone mapping data relating to intra-coded EL data,
reconstructing a first tone mapping table from the extracted tone mapping data,
reconstructing a second tone mapping table from the extracted tone mapping data and the reconstructed first tone mapping table, wherein the utilized extracted tone mapping data represent the difference between said first and second tone mapping tables,
determining a first coding unit to which the first tone mapping table relates and a second coding unit to which the second tone mapping table relates, wherein the second coding unit is a fraction of said first coding unit,
performing inverse quantization and inverse transformation on the received BL data and EL data, wherein the inversely quantized and inversely transformed EL data comprise a residual,
reconstructing the intra-coded BL data,
up-sampling the reconstructed BL data, wherein the value depth per pixel is increased, and wherein for pixels within said second coding unit the second tone mapping table is used and for the remaining pixels of the first coding unit the first tone mapping table is used, and wherein predicted EL data are obtained, and
reconstructing from the predicted EL data and the inversely quantized and inversely transformed EL data reconstructed EL video data.
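The table handling of the decoding steps above can be sketched as follows; the tables and pixel lists are toy values, and the left-shift first table is an illustrative assumption:

```python
# Rebuild the second tone mapping table from the first table plus the
# transmitted difference data, then up-sample each pixel with the table
# of its coding unit.
lut1 = [v << 2 for v in range(256)]          # first table (8 -> 10 bit, toy)
delta = [0] * 256
delta[50] = 3                                # extracted tone mapping data

lut2 = [a + d for a, d in zip(lut1, delta)]  # reconstructed second table

unit1_pixels = [50, 60]      # remaining pixels of the first coding unit
unit2_pixels = [50, 70]      # pixels of the second (sub-)coding unit
predicted = ([lut1[p] for p in unit1_pixels] +
             [lut2[p] for p in unit2_pixels])
```

Note that the same BL level (50) maps to different EL levels depending on which coding unit it belongs to.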
The principle employed can be understood as a LUT-based general rule with an exception: in general, a first LUT is valid for a specified range, e.g. a slice, except for a specified sub-range within said range, e.g. a MB within the slice. Within the specified sub-range, a second LUT is valid. In principle, the second tone mapping table overwrites the first tone mapping table within the specified sub-range. This principle can be extended to some or all of the available coding levels.
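The rule-with-exception principle described above amounts to a tiny selector; this sketch (hypothetical names, MB indices standing in for the sub-range) is only an illustration:

```python
def pick_lut(mb_index, exception_mbs, lut1, lut2):
    """Within the specified sub-range (a set of MB indices) the second
    LUT overwrites the first one; everywhere else the first LUT applies."""
    return lut2 if mb_index in exception_mbs else lut1

# MBs 3 and 4 form the exception sub-range of this slice:
table = pick_lut(3, {3, 4}, "LUT1", "LUT2")   # selects "LUT2"
```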
According to a further aspect of the invention, a signal is disclosed that comprises base layer video data and enhancement layer video data, the base layer having a lower color bit-depth than the enhancement layer, wherein the base layer data comprise intra-coded video data, and wherein said signal further comprises first tone mapping data relating to a first level (e.g. an image) of the intra-coded video data, and further comprises second tone mapping data relating to a defined fraction (e.g. a particular slice or MB) within said first level of the video data. The first tone mapping data represent a first table that is used for the bit-depth up-sampling of base layer pixels of said first level outside said fraction, and the second tone mapping data represent the difference between a second table and said first table, wherein said second table is used for the bit-depth up-sampling of the pixels of said fraction. Herein, the term "fraction" generally refers to an image unit, e.g. a MB, image, GOP or image sequence.
According to further aspects, corresponding apparatuses are disclosed.
In one embodiment of the invention, an apparatus for encoding or decoding video data is proposed, the encoding or decoding apparatus further comprising means for performing spatial (residual or texture) up-sampling and means for performing color bit-depth up-sampling, wherein the means for spatial up-sampling increases the number of values within the BL information, and the means for color bit-depth up-sampling increases the color range of the values, and wherein spatially and color bit-depth up-sampled BL data are obtained.
Various embodiments of the presented coding scheme are compatible with H.264/AVC and all kinds of scalability defined in the scalable extension of H.264/AVC (SVC).
Advantageous embodiments of the invention are disclosed in the dependent claims, the following description and the figures.
Embodiment
As shown in Fig. 1, two videos are used as input to the video encoder: an N-bit original video and an M-bit video (M < N, usually M = 8). The M-bit video can be decomposed from the N-bit original video, or provided in another way. The scalable scheme reduces the redundancy between the two layers by using pictures of the BL. The two video streams, one with 8-bit color and the other with N-bit color (N > 8), are input to the encoder, and the output of the encoder is a scalable bit-stream. It is also possible to input only a single N-bit color data stream, from which an M-bit (M < N) color data stream for the BL is generated internally. The M-bit video is encoded as the BL using the included H.264/AVC encoder. The BL information can be used to improve the coding efficiency of the EL; this is called inter-layer prediction herein. Each image, i.e. a group of MBs, has two access units, one for the BL and one for the EL. The coded bit-streams are multiplexed to form the scalable bit-stream. The BL encoder comprises e.g. an H.264/AVC encoder, and the reconstruction is used to predict the N-bit color video that will be used for the EL encoding.
As shown in Fig. 1, the scalable bit-stream exemplarily comprises an AVC-compliant BL bit-stream that can be decoded by a BL decoder (a conventional AVC decoder). The decoder side then performs the same prediction as the encoder (after evaluating the respective indications) to obtain the predicted N-bit video. Using this N-bit predicted video, the EL decoder then produces the final N-bit video for a high-quality display HQ.
The term "color bit-depth" as used herein denotes the bit-depth, i.e. the number of bits per value. This usually corresponds to color intensity, but may also refer to gray values in the luminance channel Y.
In one embodiment, the invention is based on the current structure of SVC spatial, temporal and quality scalability, and is enhanced by bit-depth scalability for enhanced color bit-depth. Hence, this embodiment is fully compatible with the current SVC standard. However, it will be easy for the skilled person to adapt it to other standards. The key of bit-depth scalability is the bit-depth inter-layer prediction. By using inter-layer prediction, the difference between the N-bit and M-bit video is encoded as the EL.
The invention uses a LUT-based inverse tone mapping technique for the inter-layer prediction of bit-depth scalable coding, which improves the coding efficiency. The LUT is created at the encoder based on the relation between a reconstructed BL coding unit (GOP, image, slice or MB) and the collocated original EL coding unit.
Usually, one LUT is created for each luma/chroma channel: Y, Cb and Cr. In practice, two or all three of these different channels can share the same LUT. If two or more different LUTs are applied to the same coding level, they can additionally be differentially encoded, e.g. LUT_Y, LUT_Cb-Y, LUT_Cr-Y. Then, in the inter-layer prediction process, the LUT that was created at the encoder is used to de-correlate the redundancy between the BL and the EL. The LUTs are inserted into the bit-stream and can be recovered at the decoder end. The decoder uses the same LUTs in the inter-layer prediction, so that the EL can be reconstructed with high quality.
The BL and EL data to which a tone mapping LUT refers can be on any level, e.g. image sequence, image, slice, macroblock (MB) or block (in descending order). To de-correlate the LUTs of the different levels, for each level (except the highest) only its difference from the immediately higher level is encoded. Such a difference look-up table is called a "delta LUT". For example, one LUT is generated for the highest level, such as the GOP (group of pictures) level. Another LUT can be generated for a sub-group level of e.g. four images. Then a difference table can be generated that represents the difference between this sub-group LUT and the group/GOP LUT. Another LUT can be generated for a single image; then a corresponding delta LUT is generated that represents the difference between the sub-group LUT and this image LUT. In the same manner, further LUTs can be generated on the slice level and the MB level. For each of these levels, a delta LUT relative to its immediately higher-level LUT is generated. This is shown in Fig. 5. However, LUTs need not be generated for every level; e.g. the image level can be skipped. A slice-level delta LUT then refers back to the next higher level, e.g. the GOP-level LUT. It may also happen that more than one LUT and delta LUT are generated for the same level. For example, a first LUT/delta LUT refers to a first image in a GOP (or sub-group), and a second LUT/delta LUT refers to another, second image in the same GOP (or sub-group). Both delta LUTs then refer back to the same GOP (or sub-group) LUT.
To further reduce the overhead of the LUTs in the bit-stream, differential encoding is used in one embodiment to encode the lower-level LUTs and/or delta LUTs. The mathematical expression of the encoding and decoding process of the LUTs is as follows.
Assume that NB and NE denote the bit-depths of the base layer (BL) and the enhancement layer (EL), respectively. For an individual channel, the LUT by which the EL signal is predicted from the BL signal is denoted as LUT = {V(0), V(1), ..., V(2^NB - 1)}, where the BL has levels from 0 to 2^NB - 1 and the EL has levels from 0 to 2^NE - 1. Thus, according to the LUT, level i in the BL is mapped to level V(i) in the EL in the inter-layer bit-depth prediction process.

At the encoder, the highest-level LUT is encoded by differences of consecutive values. Only the following values are entropy-encoded:

V(0), V(1) - V(0), V(2) - V(1), ..., V(2^NB - 1) - V(2^NB - 2)    (1)

The total number of entries is 2^NB. For the lower-level LUTs, we first calculate for each level i a delta LUT according to

ΔLUT_i = LUT_i - LUT_{i-1} ≡ {V_i(0) - V_{i-1}(0), V_i(1) - V_{i-1}(1), ..., V_i(2^NB - 1) - V_{i-1}(2^NB - 1)}    (2)

The delta LUTs can also be encoded using the method of equation (1). Moreover, since many of the values V_i(k) - V_{i-1}(k) are zero, Huffman-type run-length coding can be advantageous.
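Equations (1) and (2) can be sketched directly in code. The function names are hypothetical, and the toy left-shift LUT only stands in for a trained table:

```python
def encode_lut(lut):
    """Equation (1): transmit V(0) and the differences of consecutive entries."""
    return [lut[0]] + [lut[i] - lut[i - 1] for i in range(1, len(lut))]

def decode_lut(diffs):
    """Inverse of equation (1): integrate the differences back into the LUT."""
    lut = [diffs[0]]
    for d in diffs[1:]:
        lut.append(lut[-1] + d)
    return lut

NB = 8
lut = [v << 2 for v in range(1 << NB)]   # 2**NB entries, toy 8->10 bit table
encoded = encode_lut(lut)                # the values that get entropy-coded

# Equation (2): a delta LUT is the entry-wise difference between a LUT and
# the LUT of the next higher level (all zero here, for identical tables).
delta_lut = [a - b for a, b in zip(lut, lut)]
```

For a smooth table the encoded differences are small and repetitive, which is why run-length coding of the delta LUTs can pay off.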
According to one aspect of the invention, the LUT-based inverse tone mapping technique is used only when the BL data are intra-coded. This has the following advantage: the technique is suitable for the single-loop decoding of intra-coded images and fragments, as used e.g. in the current SVC standard, and it is compatible with the other scalability types that are also supported in the current SVC standard.
Fig. 2 shows an encoder that extends the intra-texture inter-layer prediction, as used with the spatial scalability of the current SVC standard, to bit-depth scalability. The bit-depth up-sampling block BDUp, the look-up table (LUT) generation block LUTGEN and the LUT entropy coding block EC_LUT together represent the extension towards bit-depth scalability, while the other blocks are also used for spatial scalability. These blocks BDUp, LUTGEN, EC_LUT and their connections are the difference between a conventional SVC intra encoder and an intra encoder according to the invention.

It is to be noted, however, that bit-depth up-sampling does not necessarily require spatial (texture), temporal or SNR up-sampling. Yet it is an advantage of the invention that the different types of scalability can be combined.

In Fig. 2, a base layer MB is input to the encoder with M bits, and an N-bit enhancement layer MB is input to the EL encoder (N > M). In the current SVC standard, texture up-sampling was designed for spatial intra-texture inter-layer prediction. In Fig. 2, the input of the texture up-sampling TUp is the reconstructed BL macroblock BL_rec, and the output is the spatially (texture) predicted version Pre_t{BL_rec} of the EL macroblock. Bit-depth scalability is achieved by a bit-depth up-sampling step BDUp that (in this example) directly follows the texture up-sampling TUp. In practice, it is usually advantageous to first apply the texture up-sampling as spatial inter-layer prediction and then apply the bit-depth up-sampling BDUp as bit-depth inter-layer prediction; however, the reverse order of the prediction steps is possible. With the texture up-sampling TUp and the bit-depth up-sampling BDUp, a predicted version Pre_c{Pre_t{BL_rec}} of the N-bit EL macroblock is obtained. For each MB, one of the at least two defined LUTs is used. The LUTs are generated in the LUT generation block LUTGEN, based on the characteristics of the reconstructed BL and the original EL image data. The bit-depth up-sampling block BDUp uses the LUTs and also outputs them to the encoder, since the LUTs are necessary for decoding and must therefore be sent to the decoder. As mentioned above, the LUTs are encoded in the LUT entropy coding unit EC_LUT.
A difference generator D_EL obtains the residual EL'_res between the original N-bit EL macroblock EL_org and its predicted version Pre_c{Pre_t{BL_rec}}. In one embodiment of the invention, this residual is further transformed T, quantized Q and entropy-encoded EC_EL to form the EL bit-stream, as in SVC. In mathematical expression, the residual of intra color bit-depth up-sampling is

EL'_res = EL_org - Pre_c{Pre_t{BL_rec}}    (3)

where Pre_t{} denotes the texture up-sampling operator.
Different variants of the encoding process are possible and can be controlled by control parameters. Fig. 2 shows an exemplary flag base_mode_flag, which decides whether the EL residual is predicted based on reconstructed EL information or based on up-sampled BL information.
In the following, an exemplary embodiment of the proposed solution is shown, for implementing hierarchical-LUT-based inverse tone mapping in SVC bit-depth scalability. In detail, some new syntax elements are added to the sequence parameter set in the scalable extension, as exemplarily shown in rows 25-41 of Table 1. The following semantics are used:
inv_tone_map_flag equal to 1 specifies that inverse tone mapping may be invoked in the inter-layer prediction process. inv_tone_map_flag equal to 0 specifies that inverse tone mapping is not invoked in the inter-layer prediction process (default).
level_lookup_table_luma_minus8 plus 8 specifies the number of levels of the look-up table for the Y channel.
offset_val_lookup_table_luma[i] specifies the value s[i] to which level i in the look-up table of the Y channel is mapped, as follows:
If i is not equal to 0, then s[i] is equal to s[i-1] plus offset_val_lookup_table_luma[i], where s[i-1] is the value to which level i-1 is mapped in the Y channel.
If i is equal to 0, then s[i] is equal to offset_val_lookup_table_luma[i].
chroma_inv_tone_map_flag equal to 1 specifies that inverse tone mapping may be invoked in the inter-layer prediction process of the Cb and Cr channels.
level_lookup_table_chroma_minus8 plus 8 specifies the number of levels of the LUT for the Cb and Cr channels.
offset_val_lookup_table_cb[i] specifies the value s[i] to which level i in the look-up table of the Cb channel is mapped, as follows:
If i is not equal to 0, then s[i] is equal to s[i-1] plus offset_val_lookup_table_cb[i], where s[i-1] is the value to which level i-1 is mapped in the Cb channel. If i is equal to 0, then s[i] is equal to offset_val_lookup_table_cb[i].
cr_inv_tone_map_flag equal to 0 specifies that the LUT of the Cb channel is reused in the inter-layer prediction of the Cr channel. cr_inv_tone_map_flag equal to 1 specifies that a look-up table different from the LUT of the Cb channel is used in the inter-layer prediction of the Cr channel.
offset_val_lookup_table_cr[i] specifies the value s[i] to which level i in the LUT of the Cr channel is mapped, as follows:
If i is not equal to 0, then s[i] is equal to s[i-1] plus offset_val_lookup_table_cr[i], where s[i-1] is the value to which level i-1 is mapped in the Cr channel. If i is equal to 0, then s[i] is equal to offset_val_lookup_table_cr[i].
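The mapping rule defined above is a plain prefix sum over the offset values; this small sketch (hypothetical function name, toy offsets for a 4-level table) illustrates it:

```python
def build_mapping(offsets):
    """Reconstruct s[i] from offset_val_lookup_table_* values: s[0] is the
    first offset, every further s[i] is s[i-1] plus the i-th offset."""
    s, acc = [], 0
    for i, off in enumerate(offsets):
        acc = off if i == 0 else acc + off
        s.append(acc)
    return s

offsets = [12, 4, 4, 5]      # toy offset values for 4 luma levels
s = build_mapping(offsets)   # the resulting mapped values
```

Transmitting offsets instead of absolute values keeps the entropy-coded se(v) numbers small for smooth tables.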
#  |                                                                  |   |
1  | seq_parameter_set_svc_extension(){                               | C | Descriptor
2  |   interlayer_deblocking_filter_control_present_flag              | 0 | u(1)
3  |   extended_spatial_scalability                                   | 0 | u(2)
4  |   if(chroma_format_idc==1||chroma_format_idc==2)                 |   |
5  |     chroma_phase_x_plus1                                         | 0 | u(1)
6  |   if(chroma_format_idc==1)                                       |   |
7  |     chroma_phase_y_plus1                                         | 0 | u(2)
8  |   if(extended_spatial_scalability==1){                           |   |
9  |     if(chroma_format_idc>0){                                     |   |
10 |       base_chroma_phase_x_plus1                                  | 0 | u(1)
11 |       base_chroma_phase_y_plus1                                  | 0 | u(2)
12 |     }                                                            |   |
13 |     scaled_base_left_offset                                      | 0 | se(v)
14 |     scaled_base_top_offset                                       | 0 | se(v)
15 |     scaled_base_right_offset                                     | 0 | se(v)
16 |     scaled_base_bottom_offset                                    | 0 | se(v)
17 |   }                                                              |   |
18 |   if(extended_spatial_scalability==0){                           |   |
19 |     avc_rewrite_flag                                             | 0 | u(1)
20 |     if(avc_rewrite_flag){                                        |   |
21 |       avc_adaptive_rewrite_flag                                  | 0 | u(1)
22 |     }                                                            |   |
23 |   }                                                              |   |
24 |   avc_header_rewrite_flag                                        | 0 | u(1)
25 |   inv_tone_map_flag                                              | 1 | u(1)
26 |   if(inv_tone_map_flag){                                         |   |
27 |     level_lookup_table_luma_minus8                               | 1 | u(v)
28 |     for(i=0;i<(1<<(8+level_lookup_table_luma_minus8));i++){      |   |
29 |       offset_val_lookup_table_luma[i]                            |   | se(v)
30 |     }                                                            |   |
31 |     chroma_inv_tone_map_flag                                     | 1 | u(1)
32 |     if(chroma_inv_tone_map_flag){                                |   |
33 |       level_lookup_table_chroma_minus8                           | 1 | u(v)
34 |       for(i=0;i<(1<<(8+level_lookup_table_chroma_minus8));i++){  |   |
35 |         offset_val_lookup_table_cb[i]                            | 1 | se(v)
36 |       }                                                          |   |
37 |       cr_inv_tone_map_flag                                       | 1 | u(1)
38 |       if(cr_inv_tone_map_flag){                                  |   |
39 |         for(i=0;i<(1<<(8+level_lookup_table_chroma_minus8));i++){|   |
40 |           offset_val_lookup_table_cr[i]                          | 1 | se(v)
41 |         }                                                        |   |
42 |       }}}}                                                       |   |
Table 1: exemplary implementation in the sequence parameter set in the scalable extension
Table 2 shows a picture parameter set modified according to one embodiment of the invention. The additions of the invention are contained in rows 49-68 of Table 2.
#  | pic_parameter_set_rbsp(){                                                  | C | Descriptor
1  |   pic_parameter_set_id                                                     | 1 | ue(v)
2  |   seq_parameter_set_id                                                     | 1 | ue(v)
3  |   entropy_coding_mode_flag                                                 | 1 | u(1)
4  |   pic_order_present_flag                                                   | 1 | u(1)
5  |   num_slice_groups_minus1                                                  | 1 | ue(v)
6  |   if(num_slice_groups_minus1>0){                                           |   |
7  |     slice_group_map_type                                                   | 1 | ue(v)
8  |     if(slice_group_map_type==0)                                            |   |
9  |       for(iGroup=0;iGroup<=num_slice_groups_minus1;iGroup++)               |   |
10 |         run_length_minus1[iGroup]                                          | 1 | ue(v)
11 |     else if(slice_group_map_type==2)                                       |   |
12 |       for(iGroup=0;iGroup<num_slice_groups_minus1;iGroup++){               |   |
13 |         top_left[iGroup]                                                   | 1 | ue(v)
14 |         bottom_right[iGroup]                                               | 1 | ue(v)
15 |       }                                                                    |   |
16 |     else if(slice_group_map_type==3||slice_group_map_type==4||slice_group_map_type==5){ | |
17 |       slice_group_change_direction_flag                                    | 1 | u(1)
18 |       slice_group_change_rate_minus1                                       | 1 | ue(v)
19 |     }else if(slice_group_map_type==6){                                     |   |
20 |       pic_size_in_map_units_minus1                                         | 1 | ue(v)
21 |       for(i=0;i<=pic_size_in_map_units_minus1;i++)                         |   |
22 |         slice_group_id[i]                                                  | 1 | u(v)
23 |     }                                                                      |   |
24 |   }                                                                        |   |
25 |   num_ref_idx_l0_active_minus1                                             | 1 | ue(v)
26 |   num_ref_idx_l1_active_minus1                                             | 1 | ue(v)
27 |   weighted_pred_flag                                                       | 1 | u(1)
28 |   weighted_bipred_idc                                                      | 1 | u(2)
29 |   pic_init_qp_minus26 /*relative to 26*/                                   | 1 | se(v)
30 |   pic_init_qs_minus26 /*relative to 26*/                                   | 1 | se(v)
31 |   chroma_qp_index_offset                                                   | 1 | se(v)
32 |   deblocking_filter_control_present_flag                                   | 1 | u(1)
33 |   constrained_intra_pred_flag                                              | 1 | u(1)
34 |   redundant_pic_cnt_present_flag                                           | 1 | u(1)
35 |   if(more_rbsp_data()){                                                    |   |
36 |     transform_8x8_mode_flag                                                | 1 | u(1)
37 |     pic_scaling_matrix_present_flag                                        | 1 | u(1)
38 |     if(pic_scaling_matrix_present_flag)                                    |   |
39 |       for(i=0;i<6+((chroma_format_idc!=3)?2:6)*transform_8x8_mode_flag;i++){ | |
40 |         pic_scaling_list_present_flag[i]                                   | 1 | u(1)
41 |         if(pic_scaling_list_present_flag[i])                               |   |
42 |           if(i<6)                                                          |   |
43 |             scaling_list(ScalingList4x4[i],16,UseDefaultScalingMatrix4x4Flag[i]) | 1 |
44 |           else                                                             |   |
45 |             scaling_list(ScalingList8x8[i-6],64,UseDefaultScalingMatrix8x8Flag[i-6]) | 1 |
46 |       }                                                                    |   |
47 |     second_chroma_qp_index_offset                                          | 1 | se(v)
48 |   }                                                                        |   |
49 |   inv_tone_map_delta_flag                                                  | 1 | u(1)
50 |   if(inv_tone_map_delta_flag){                                             |   |
51 |     level_lookup_table_luma_minus8                                         | 1 | u(v)
52 |     for(i=0;i<(1<<(8+level_lookup_table_luma_minus8));i++){                |   |
53 |       offset_val_lookup_table_luma_delta[i]                                |   | se(v)
54 |     }                                                                      |   |
55 |     chroma_inv_tone_map_delta_flag                                         | 1 | u(1)
56 |     if(chroma_inv_tone_map_delta_flag){                                    |   |
57 |       level_lookup_table_chroma_minus8                                     | 1 | u(v)
58 |       for(i=0;i<(1<<(8+level_lookup_table_chroma_minus8));i++){            |   |
59 |         offset_val_lookup_table_cb_delta[i]                                | 1 | se(v)
60 |       }                                                                    |   |
61 |       cr_inv_tone_map_delta_flag                                           | 1 | u(1)
62 |       if(cr_inv_tone_map_delta_flag){                                      |   |
63 |         for(i=0;i<(1<<(8+level_lookup_table_chroma_minus8));i++){          |   |
64 |           offset_val_lookup_table_cr_delta[i]                              | 1 | se(v)
65 |         }                                                                  |   |
66 |       }                                                                    |   |
67 |     }                                                                      |   |
68 |   }                                                                        |   |
69 |   rbsp_trailing_bits()                                                     | 1 |
70 | }                                                                          |   |

Table 2: exemplary implementation in the picture parameter set
inv_tone_map_delta_flag equal to 1 specifies that delta values, to be added to the inverse tone mapping specified in the sequence parameter set (SPS), are present for the inter-layer prediction.
level_lookup_table_luma_minus8 plus 8 specifies the number of levels of the look-up table for the Y channel.
offset_val_lookup_table_luma_delta[i] specifies the delta value s[i] to be added to the value specified in the SPS, where level i in the look-up table of the Y channel is mapped to this delta value s[i] as follows:
If i is not equal to 0, then s[i] is equal to s[i-1] plus offset_val_lookup_table_luma_delta[i]. Otherwise, s[i] is equal to offset_val_lookup_table_luma_delta[i].
chroma_inv_tone_map_delta_flag equal to 1 specifies that delta values, to be added to the inverse tone mapping specified in the SPS, are present for the inter-layer prediction of the Cb and Cr channels.
level_lookup_table_chroma_minus8 plus 8 specifies the number of levels of the LUT for the Cb and Cr channels.
offset_val_lookup_table_cb_delta[i] specifies the delta value s[i] to be added to the value specified in the SPS, where level i in the look-up table of the Cb channel is mapped to this delta value s[i] as follows:
If i is not equal to 0, then s[i] is equal to s[i-1] plus offset_val_lookup_table_cb_delta[i]. Otherwise, s[i] is equal to offset_val_lookup_table_cb_delta[i].
cr_inv_tone_map_delta_flag equal to 0 specifies that the delta values of the Cb channel are reused for the Cr channel. cr_inv_tone_map_delta_flag equal to 1 specifies that delta values different from those of the Cb channel are used.
offset_val_lookup_table_cr_delta[i] specifies the delta value s[i] to be added to the value specified in the SPS, where level i in the look-up table of the Cr channel is mapped to this delta value s[i] as follows:
If i is not equal to 0, then s[i] is equal to s[i-1] plus offset_val_lookup_table_cr_delta[i]. Otherwise, s[i] is equal to offset_val_lookup_table_cr_delta[i].
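The *_delta semantics above can be read as: integrate the delta offsets with the same prefix-sum rule as the SPS offsets, then add them to the SPS-specified mapping. This is an illustrative interpretation with toy numbers and a hypothetical helper name:

```python
def integrate(offsets):
    """Prefix-sum rule shared by the SPS offsets and the delta offsets."""
    s, acc = [], 0
    for i, off in enumerate(offsets):
        acc = off if i == 0 else acc + off
        s.append(acc)
    return s

sps_map = integrate([12, 4, 4, 5])    # mapping specified in the SPS
deltas  = integrate([1, 0, -1, 0])    # increments from this PPS/slice header
refined = [v + d for v, d in zip(sps_map, deltas)]
```

A slice can thus refine only a few table entries while inheriting the rest of the SPS mapping unchanged.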
Table 3 shows an exemplarily extended slice header in the scalable extension. The additions of the invention are contained in rows 46-67.
#   | slice_header_in_scalable_extension(){                                     | C | Descriptor
1   |   first_mb_in_slice                                                       | 2 | ue(v)
2   |   slice_type                                                              | 2 | ue(v)
3   |   pic_parameter_set_id                                                    | 2 | ue(v)
4   |   frame_num                                                               | 2 | u(v)
5   |   if(!frame_mbs_only_flag){                                               |   |
6   |     field_pic_flag                                                        | 2 | u(1)
7   |     if(field_pic_flag)                                                    |   |
8   |       bottom_field_flag                                                   | 2 | u(1)
9   |   }                                                                       |   |
10  |   if(nal_unit_type==21)                                                   |   |
11  |     idr_pic_id                                                            | 2 | ue(v)
12  |   if(pic_order_cnt_type==0){                                              |   |
13  |     pic_order_cnt_lsb                                                     | 2 | u(v)
14  |     if(pic_order_present_flag && !field_pic_flag)                         |   |
15  |       delta_pic_order_cnt_bottom                                          | 2 | se(v)
16  |   }                                                                       |   |
17  |   if(pic_order_cnt_type==1 && !delta_pic_order_always_zero_flag){         |   |
18  |     delta_pic_order_cnt[0]                                                | 2 | se(v)
19  |     if(pic_order_present_flag && !field_pic_flag)                         |   |
20  |       delta_pic_order_cnt[1]                                              | 2 | se(v)
21  |   }                                                                       |   |
22  |   if(redundant_pic_cnt_present_flag)                                      |   |
23  |     redundant_pic_cnt                                                     | 2 | ue(v)
24  |   if(slice_type==EB)                                                      |   |
25  |     direct_spatial_mv_pred_flag                                           | 2 | u(1)
26  |   if(quality_id==0){                                                      |   |
27  |     if(slice_type==EP||slice_type==EB){                                   |   |
28  |       num_ref_idx_active_override_flag                                    | 2 | u(1)
29  |       if(num_ref_idx_active_override_flag){                               |   |
30  |         num_ref_idx_l0_active_minus1                                      | 2 | ue(v)
31  |         if(slice_type==EB)                                                |   |
32  |           num_ref_idx_l1_active_minus1                                    | 2 | ue(v)
33  |       }                                                                   |   |
34  |     }                                                                     |   |
35  |     ref_pic_list_reordering()                                             | 2 |
36  |     if(!layer_base_flag){                                                 |   |
37  |       base_id                                                             | 2 | ue(v)
38  |       adaptive_prediction_flag                                            | 2 | u(1)
39  |       if(!adaptive_prediction_flag){                                      |   |
40  |         default_base_mode_flag                                            | 2 | u(1)
41  |         if(!default_base_mode_flag){                                      |   |
42  |           adaptive_motion_prediction_flag                                 | 2 | u(1)
43  |           if(!adaptive_motion_prediction_flag)                            |   |
44  |             default_motion_prediction_flag                                | 2 | u(1)
45  |         }                                                                 |   |
46  |         inv_tone_map_delta_flag                                           | 1 | u(1)
47  |         if(inv_tone_map_delta_flag){                                      |   |
48  |           level_lookup_table_luma_minus8                                  | 1 | u(v)
49  |           for(i=0;i<(1<<(8+level_lookup_table_luma_minus8));i++){         |   |
50  |             offset_val_lookup_table_luma_delta[i]                         | 1 | se(v)
51  |           }                                                               |   |
52  |           chroma_inv_tone_map_delta_flag                                  | 1 | u(1)
53  |           if(chroma_inv_tone_map_delta_flag){                             |   |
54  |             level_lookup_table_chroma_minus8                              | 1 | u(v)
55  |             for(i=0;i<(1<<(8+level_lookup_table_chroma_minus8));          |   |
56  |                 i++){                                                     |   |
57  |               offset_val_lookup_table_cb_delta[i]                         | 1 | se(v)
58  |             }                                                             |   |
59  |             cr_inv_tone_map_delta_flag                                    | 1 | u(1)
60  |             if(cr_inv_tone_map_delta_flag){                               |   |
61  |               for(i=0;i<(1<<(8+level_lookup_table_chroma_minus8));        |   |
62  |                   i++){                                                   |   |
63  |                 offset_val_lookup_table_cr_delta[i]                       | 1 | se(v)
64  |               }                                                           |   |
65  |             }                                                             |   |
66  |           }                                                               |   |
67  |         }                                                                 |   |
68  |       }                                                                   |   |
69  |       adaptive_residual_prediction_flag                                   | 2 | u(1)
70  |     }                                                                     |   |
71  |     if((weighted_pred_flag && slice_type==EP)||(weighted_bipred_idc==1 && slice_type==EB)){ | |
72  |       if(adaptive_prediction_flag)                                        |   |
73  |         base_pred_weight_table_flag                                       | 2 | u(1)
74  |       if(layer_base_flag||base_pred_weight_table_flag==0)                 |   |
75  |         pred_weight_table()                                               |   |
76  |     }                                                                     |   |
77  |     if(nal_ref_idc!=0){                                                   |   |
78  |       dec_ref_pic_marking()                                               | 2 |
79  |       if(use_base_prediction_flag && nal_unit_type!=21)                   |   |
80  |         dec_ref_pic_marking_base()                                        |   |
81  |     }                                                                     |   |
82  |   }                                                                       |   |
83  |   if(entropy_coding_mode_flag && slice_type!=EI)                          |   |
84  |     cabac_init_idc                                                        | 2 | ue(v)
85  |   slice_qp_delta                                                          | 2 | se(v)
86  |   if(deblocking_filter_control_present_flag){                             |   |
87  |     disable_deblocking_filter_idc                                         | 2 | ue(v)
88  |     if(disable_deblocking_filter_idc!=1){                                 |   |
89  |       slice_alpha_c0_offset_div2                                          | 2 | se(v)
90  |       slice_beta_offset_div2                                              | 2 | se(v)
91  |     }                                                                     |   |
92  |   }                                                                       |   |
93  |   if(interlayer_deblocking_filter_control_present_flag){                  |   |
94  |     disable_interlayer_deblocking_filter_idc                              | 2 | ue(v)
95  |     if(disable_interlayer_deblocking_filter_idc!=1){                      |   |
96  |       interlayer_slice_alpha_c0_offset_div2                               | 2 | se(v)
97  |       interlayer_slice_beta_offset_div2                                   | 2 | se(v)
98  |     }                                                                     |   |
99  |   }                                                                       |   |
100 |   constrained_intra_upsampling_flag                                       | 2 | u(1)
101 |   if(quality_id==0)                                                       |   |
102 |     if(num_slice_groups_minus1>0 && slice_group_map_type>=3 && slice_group_map_type<=5) | |
103 |
slice_group_change_cycle |
2 |
u(v) |
104 |
if(quality_id==0 && extended_spatial_scalability>0){ |
|
|
105 |
if(chroma_format_idc>0){ |
|
|
106 |
base_chroma_phase_x_plus1 |
2 |
u(2) |
107 |
base_chroma_phase_y_plus1 |
2 |
u(2) |
108 |
} |
|
|
109 |
if(extended_spatial_scalability==2){ |
|
|
110 |
scaled_base_left_offset |
2 |
se(v) |
111 |
scaled_base_top_offset |
2 |
se(v) |
112 |
scaled_base_right_offset |
2 |
se(v) |
113 |
scaled_base_bottom_offset |
2 |
se(v) |
114 |
} |
|
|
115 |
} |
|
|
116 |
if(use_base_prediction_flag) |
|
|
117 |
store_base_rep_flag |
2 |
u(1) |
118 |
if(quality_id==0){ |
|
|
119 |
if(BaseFrameMbsOnlyFlag && !frame_mbs_only_flag && !field_pic_flag) |
|
|
120 |
base_frame_and_bottom_field_coincided_flag |
2 |
u(1) |
121 |
else if(frame_mbs_only_flag && !BaseFrameMbsOnlyFlag && !BaseFie ldPicFlag) |
|
|
122 |
base_bottom_field_coincided_flag |
2 |
u(1) |
123 |
} |
|
|
124 |
SpatialScalabilityType=spatial_scalability_type ()/* [Ed.: should be moved to semanteme and delete grammatic function] */ |
|
|
125 |
} |
|
|
Table 3: the exemplary picture bar head in the scalable expansion
In one embodiment, the BL picture bar based on behind original EL picture bar and the up-sampling, after the reconstruct produces a LUT.The MB of the correspondence position of the BL picture bar behind the one or more MB and up-sampling based on original EL picture bar, after the reconstruct produces the 2nd LUT.As mentioned above, be LUT/ increment LUT with these two LUT differential codings.Therefore, can use a LUT at the decoder place, (except that those MB that the 2nd LUT is consulted) are mapped to EL picture bar with the BL picture bar behind the up-sampling, after the reconstruct, and can use the 2nd LUT at the decoder place, shine upon those MB that it is consulted.The method of this generation LUT has following advantage: can optimize decoding, this is because LUT has defined at the obtainable picture bar in decoder place (the BL picture bar behind the up-sampling, after the reconstruct) and had mapping between the highest EL picture bar (being original EL picture bar) that obtains quality.Use the advantage of hierarchical luts to be: LUT collects best-fit in actual video data, and this is owing to all be isomorphism usually as the most of of bar, and in the picture bar some are than possible the difference in zonule.Advantageously, these zones are defined difference LUT respectively.Therefore, this method best-fit is in the needs of decoder and the reconstruct of first water.
In one embodiment, LUT is encoded with the EL data and send.At the encoder place, use these LUT, by the BL data prediction EL data after the reconstruct, and residual error carried out intraframe coding and send.Then, at the decoder place, LUT is applied to BL data after the reconstruct, and adds residual error.The result is the decoded EL image with high color bit-depth.
Advantageously, with added, support to insert in the head, for example for picture bar level interpolation slice_header_in_scalable_extension based on the syntactic element of the contrary tone mapping of LUT.
In fact, different units (image, as bar, MB) can have Different L UT.Adding new syntactic element in the head of level separately makes and adopts contrary tone mapping to have flexibility.For example, in the situation of object-based picture bar segmentation, different has different characteristics as bar, and in the middle of different picture bars, and the relation between the EL picture bar of BL picture bar and correspondence position can be very different.Therefore, it can be useful creating different LUT for difference as bar.On the other hand, to tie up on the sequence of a plurality of images can be constant to the pass between the EL picture bar of characteristic and BL picture bar and correspondence position.In this case, can produce higher LUT for higher level (for example sequence or GOP level), be one, interior zone (for example as bar, MB group, MB) the more rudimentary LUT of generation of some or all these images.In one embodiment, the specific region that defines in this more rudimentary LUT and each image is associated.In another embodiment, separately zone association in each image of single more rudimentary LUT and sequence can be got up.In one embodiment, MB has related increment LUT, and the next MB in the sequence has the indication that identical increment LUT is applied as once more last MB.Can on the code level except that MB, use identical principle.
Fig. 3 shows and utilizes exemplary decoder inter-layer prediction, that be used for the BL image after the intraframe coding.BL and EL information BL after receiving the coding that has the back LUT that encodes according to the present invention
Enc, EL
EncAfterwards, for example in multiplex packet bit stream, isolate BL, EL and LUT information, BL information, EL information and LUT are carried out the entropy decoding.In this example, LUT is included in the EL information.Then, with inverse quantization Q
-1With inverse transformation T
-1Be applied to video data, in LUT decoding unit LUTdec to hierarchical luts---LUT
1, LUT
2Decode.The LUT decoding unit reconstructs more senior LUT, increment LUT and final more rudimentary LUT, and provides two or more decoded look-up tables for bit depth prediction unit B DUp.Reconstruct according to the highest LUT behind 1 pair of coding of equation can be used (V
EncBe the value behind the coding):
V(0)=V
enc(0),
V(1)=V(0)-V
enc(1),
V(2)=V(1)-V
enc(2),
…,
V(2
NB-1)=V(2
NB-2)-V
enc(2
NB-1) (4)
Reconstruct according to the more rudimentary LUT behind 2 pairs of codings of equation can be used:
LUT
i-1≡LUT
i-ΔLUT
i={V
i(0)-dV
i(0),V
i(1)-dV
i(1),…,V
i(2
NB-1)-dV
i(2
NB-1)}
(5)
Wherein, common most of dV
i(k) be zero.
For BL, to treatment of picture after the intraframe coding with identical for traditional SVC: the usage space infra-frame prediction comes reconstructed image, promptly based on the last reconfiguration information of identical image.After deblocking, can be with the BL signal BL that obtains
RecBe presented on the standard SVC display of 8 bit color depth.Can also use this signal to produce the predicted version Pre of the EL image of correspondence position
c{ Pre
t{ BL
Rec}: for this reason, this signal is carried out texture up-sampling TUp, wherein obtain the texture prediction version Pre of EL image
t{ BL
Rec, then, the look-up table behind the coding that use is extracted carries out bit-depth up-sampling BDUp to this texture prediction version.Then, use texture and bit-depth up-sampling, BL image Pre after the reconstruct
c{ Pre
t{ BL
Rec, upgrade A
2, ELEL residual error EL ' after improved, inverse quantization and the inverse transformation
ResThereby, obtain after deblocking, to can be used as EL video EL
RecExport to the signal of HQ display.
Certainly, the decoder of operating under the EL pattern also can produce BL video BL in inside
Rec, this is to predict owing to needing it to be used for EL, but the BL video must not obtain in decoder output place.In one embodiment, decoder has two outputs, and one is used for BL video BL
Rec, one is used for EL video EL
Rec, and in another embodiment, decoder only has the EL of being used for video EL
RecOutput.
EL MB for correspondence position BL MB is wherein carried out interframe encode does not have following constraint: must use and the identical inter-layer prediction based on LUT of situation that correspondence position BL MB is carried out intraframe coding.For example, correspondence position BL MB is being carried out under the situation of interframe encode, linear scale can be served as the method for bit-depth up-sampling.
As mentioned above, for the intra encoder of Fig. 2, this decoder also can work in and encode under the corresponding different mode.Therefore, from bit stream, extract sign separately and sign estimated, for example determine whether to use the indication base_mode_flag of inter-layer prediction.If do not use, then use deblock, spatial prediction and to the renewal A of spatial prediction image
1, EL, come reconstruct EL image traditionally.
In one embodiment of the invention, proposed a kind of equipment to coding video data with basic unit and enhancement layer, wherein basic unit's pixel has littler color bit-depth than enhancement layer pixels, and described equipment comprises:
Be used for the code device T, the Q that on first particle size fraction, base layer data are encoded, wherein base layer data carried out intraframe coding,
The device T that is used for the base layer data after reconstruct is encoded
-1, Q
-1,
Be used for producing the first tone mapping table LUT at the base layer data after the intraframe coding
GOPDevice, the table LUT
GOPDefined the base layer data Pre after original enhancement layer data and the corresponding reconstruct
t{ BL
RecBetween tone mapping,
Be used for producing the second different tone mapping table LUT at the base layer data fragment after the intraframe coding (for example MB)
MBDevice, the table LUT
MBDefined original enhancement layer data EL
OrgDescribed fragment and reconstruct after base layer data Pre
t{ BL
RecHomologous segment between tone mapping,
Be used to produce difference table dLUT
MBDevice, the table dLUT
MBRepresent first and second tone mapping table LUT
GOP, LUT
MBBetween difference,
Be used for based on the described first and second tone mapping tables, base layer data after the reconstruct is carried out the device BDUp of bit-depth up-sampling, wherein, only use the second tone mapping table, and obtain the predicted version Pre of corresponding enhancement data for the described fragment of the basic unit after the reconstruct
c{ Pre
t{ BL
Rec, it has higher bit-depth resolution than base layer data,
Be used to produce enhancement layer residual EL '
ResDevice, enhancement layer residual EL '
ResBe the corresponding predicted version Pre of original enhancement layer data with enhancement data
c{ Pre
t{ BL
RecBetween difference, and
Be used for enhancement layer residual, the first tone mapping table LUT
GOPWith described difference table dLUT
MBCarry out apparatus for encoding, wherein the first tone mapping table behind the coding is associated with basic unit or enhancement data behind the coding, and difference table with encode after base layer data or the described fragment of enhancement data be associated.
In one embodiment, this encoding device further comprises: device TUp was used for before described bit-depth up-sampling the base layer data BL after the reconstruct
RecCarry out up-sampling, wherein obtained the first predicted version Pre of corresponding enhancement data
t{ BL
Rec, it has higher space, time or SNR resolution and is used to described bit-depth up-sampling step than base layer data.
In one embodiment of the invention, proposed a kind of equipment that video data with basic unit and enhancement layer is decoded, described equipment comprises:
Be used for from the coding after enhancement data EL
EncOr base layer data BL
ENCExtract the device of the first and second tone mapping (enum) datas relevant with enhancement data after the intraframe coding,
Be used for from the tone mapping (enum) data reconstruct first tone mapping table LUT that extracts
GOPDevice,
Be used for from the tone mapping (enum) data of extraction and the first tone mapping table reconstruct, the second tone mapping table LUT after the described reconstruct
MBDevice, wherein, the tone mapping (enum) data of the extraction that is utilized is represented the difference dLUT between described first and second tone mapping table
MB,
The device of second coding unit that is used for determining first coding unit relevant and is correlated with the second tone mapping table with the first tone mapping table, wherein, second coding unit is the sub-fraction of described first coding unit,
Be used for the base layer data that receives and enhancement data are carried out the device T of inverse quantization and inverse transformation
-1, Q
-1, wherein, the enhancement data after inverse quantization and the inverse transformation comprises residual error EL '
Res,
The device A that is used for the base layer data after the reconstruct intraframe coding
1, BL, PR
I, DBL
I,
Be used for the base layer data BL after the reconstruct
RecCarry out the device BDUp of up-sampling, wherein increased the numerical value degree of depth of every pixel, and for the pixel in described second coding unit, use the second tone mapping table, and, use the first tone mapping table, and obtain the enhancement data Pre of prediction for the residual pixel of first coding unit
c{ Pre
t{ BL
Rec, and
Be used for from the enhancement data Pre of prediction
c{ Pre
t{ BL
RecAnd inverse quantization and inverse transformation after enhancement data reconstruct the device A of the enhancement layer video data of reconstruct
2, EL
Exemplarily, in one embodiment, proposed a kind of equipment that video data with basic unit and enhancement layer is decoded, described equipment comprises:
Be used for enhancement data or the base layer data device that extracts the first and second tone mapping (enum) datas behind the coding, one or more heads of the enhancement data of the first and second tone mapping (enum) datas after from intraframe coding,
Be used for from the device of the tone mapping (enum) data reconstruct first tone mapping table that extracts,
Be used for from the tone mapping (enum) data of extraction and the device of the first tone mapping table reconstruct, the second tone mapping table after the described reconstruct, wherein, the tone mapping (enum) data of the extraction that is utilized is represented the difference between described first and second tone mapping table,
Be used for the base layer data that receives and enhancement data are carried out the device of inverse quantization and inverse transformation, wherein, the enhancement data after inverse quantization and the inverse transformation comprises residual error,
The device that is used for the base layer data after the reconstruct intraframe coding,
Be used for the base layer data after the reconstruct is carried out the device of up-sampling, wherein increase pixel count and increased the value degree of depth of every pixel, wherein for the first intra-coding data unit, use the first tone mapping table, and for the second intra-coding data unit that is included in first data cell, use the second tone mapping table, and obtain the enhancement data of prediction, and
Be used for reconstructing the device of the EL video information of reconstruct from the EL data of prediction and the EL information after inverse quantization and the inverse transformation.
It is noted that term " tone mapping " describes identical process with " contrary tone mapping " from different viewpoints.Therefore, use them with the free burial ground for the destitute herein.For example in JVT, use term " contrary tone mapping ", describe by of the prediction of low bit-depth (being BL) to the higher bit degree of depth (being EL).Yet, the term that herein uses can be interpreted as and get rid of the practicality of the present invention JVT.Identical situation is also applicable to other standards.
In addition, the part after the not all intraframe coding of BL image all needs to use the contrary tone mapping based on LUT.Can pass through some distortion measurement technology, determine whether to use contrary tone mapping based on LUT.If determine to use contrary tone mapping techniques, then for example, will select the INTRA_BL pattern based on LUT; If determine not use, then can use shared AVC instrument, come current EL MB coding.
Because number of colors possible among BL and the EL is different, so each BL color can be mapped to different EL colors.Usually, these different EL colors are closely similar, therefore in colour code or colour gamut " adjacent ".
Fig. 4 shows at GOP, as an exemplary collection of the classification look-up table of the tone on bar and MB level mapping.GOP comprises a plurality of image I that have similar characteristic about higher bit degree of depth color
1, I
2..., I
nFor example, use particular color more continually than its " adjacent " color.Exemplarily, at least one in the image, for example I
2, comprise a plurality of picture bar SL
1, SL
2, SL
3, at these as one of bar SL
2EL in, more often do not use this special neighbourhood color than another second adjacent color.In addition, at one of picture bar SL
3In, include one or more macro blocks, wherein also be more often not use this special neighbourhood color than described second (or another the 3rd) adjacent color.First tone mapping look-up table LUT that is sent
GOPDefined the general mapping on the GOP level between BL and the EL.In addition, second tone mapping look-up table LUT
SLDefine the different mappings of the described color on the picture bar level, only consulted picture bar SL separately
2With this particular B L color.With second tone mapping look-up table LUT
SLDifferential coding is " increment LUT " dLUT
SL, then with dLUT
SLSend.Two tables are all consulted with them, and their zone separately (being GOP and picture bar) is associated, for example by indication or by inserting head separately.In addition, produce another three color scheme mapping look-up table LUT
MBAnd use it for as the one or more macro block MB in one of bar
1, MB
2, MB
3, MB
4To this three color scheme mapping look-up table LUT
MBAlso carrying out differential coding, (is LUT in this example with respect to five-star table promptly
GOP) come differential coding.Then, with increment LUT dLUT
MBBe associated with its MB separately that consults or a plurality of MB, and with dLUT
MBSend.
Fig. 5 shows at GOP, as another exemplary collection of the classification tone mapping look-up table of the tone on bar and MB level mapping.It and Fig. 4 are similar, except to more rudimentary tone mapping look-up table LUT
MB, be right after higher one-level with respect to it and (be LUT in this example
SL) encode.Because the characteristic of natural video frequency, this coding can to consult highest LUT more suitable than returning in Fig. 4.In addition, the MB that consulted of MB level tone mapping LUT is positioned at the picture bar SL that has independent related tone mapping LUT
2In.Picture bar level table LUT
SLOnly be SL
2Domination GOP level table LUT
GOP, MB level table is MB
2Arrange GOP level table LUT simultaneously
GOPWith picture bar level table LUT
SLIn addition, can be for example MB
3Produce another MB level LUT.In one embodiment, MB level look-up table can be consulted more than a macro block, for example can consult MB
1And MB
2
Usually, in the inapplicable zone of more rudimentary tone mapping table, ignore this table (for example, for the MB among Fig. 5
1, ignore LUT
MB).In addition, can implicitly produce more rudimentary tone mapping table, for example by carrying out the mapping step in two sub-steps: at first in more rudimentary LUT, inquire about specific input value,, then use this output valve if determine that more rudimentary LUT has defined output valve for this specific input value.Yet, if more rudimentary LUT is not this specific input value definition output valve, for example:, in more senior LUT, inquire about input value because more rudimentary LUT is partial L UT.If have above, then begin to search for continuously two or more more senior LUT continuously, up to there being one to provide output valve for input value from being right after higher one-level more than two levels.
The advantage that being used for of being presented expands to spatial scalability the classification look-up table method of bit-depth scalability is: the data volume that send is very low, and this is because look-up table is suitable for the content of image individually and is compressed.Therefore, the control data (being the LUT data) and the amount of actual video data are minimized.In addition, do not need new predictive mode to realize expansion to the color bit-depth scalability.
Additional advantages of the present invention are: to the complete compatibility of other type scalability, robustness and to the extensibility of advanced technology.Especially, the present invention is still keeping the single loop decode structures, so that when will only be applied to image after basic unit's intraframe coding or image section based on the contrary tone mapping of LUT, has improved code efficiency.
Because the BL data after the use reconstruct are used for the generation of up-sampling and look-up table, the prediction of encoder one side adapts to the prediction of decoder one side better, so that residual error is more suitable for and can obtain better prediction and reconstruction result in decoder one side, this also is an advantage.
The present invention can be used to ges forschung device, scalable decoding device and scalable signal, especially can be used to vision signal or have the layer of different quality and other type signal of high interlayer redundancy.
Should be appreciated that and intactly described the present invention for example, and in the case without departing from the scope of the present invention, can make modification details of the present invention.Can be independently or with suitable combination arbitrarily furnish an explanation book and appropriate location and claim and disclosed each feature of accompanying drawing.Appropriate location in hardware, software or both combinations is realized these features.The reference number that occurs in the claim is only as example, and the system of being limited in scope to claim does not influence.