GB2521349A - Data encoding and decoding - Google Patents

Data encoding and decoding

Info

Publication number
GB2521349A
GB2521349A
Authority
GB
United Kingdom
Prior art keywords
data
bits
precision
transform
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1321523.1A
Other versions
GB201321523D0 (en)
Inventor
Karl James Sharman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp filed Critical Sony Corp
Priority to GB1321523.1A
Publication of GB201321523D0
Publication of GB2521349A
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/48 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using compressed domain processing techniques other than decoding, e.g. modification of transform coefficients, variable length coding [VLC] data or run-length data
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/46 Embedding additional information in the video signal during the compression process

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An array of input image data having a data precision of n bits is encoded and subsequently decoded. The encoding comprises compressing (1500) (e.g. rounding, bit shift, mapping function) the input n-bit image data values to a data precision lower than n bits. The compressed values are frequency-transformed (1510) to generate an array of frequency transformed coefficients by a matrix multiplication process, according to a maximum dynamic range of the transform process and using transform matrices having a data precision lower than n bits. The resulting image coefficients are then decompressed (1520) (e.g. bit shifted, mapping function applied) to a data precision of n bits. The precision of the data is therefore reduced before frequency transformation, the resulting image coefficients being restored to the increased precision. Transformation does not have to be done at such high precision, thereby reducing hardware complexity in some systems where the highest accuracy is either not required or not possible.

Description

DATA ENCODING AND DECODING
Field of the Invention
This disclosure relates to data encoding and decoding.
Description of the Related Art
The "background" description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
There are several video data compression and decompression systems which involve transforming video data into a frequency domain representation, quantising the frequency domain coefficients and then applying some form of entropy encoding to the quantised coefficients.
It is noted that different instances of video data can have different respective bit depths or data precisions.
Summary
This disclosure provides a data encoding method according to claim 1.
Further respective aspects and features are defined in the appended claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary, but not restrictive of, the present disclosure.
Brief Description of the Drawings
A more complete appreciation of the disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description of embodiments, when considered in connection with the accompanying drawings, wherein:
Figure 1 schematically illustrates an audio/video (A/V) data transmission and reception system using video data compression and decompression;
Figure 2 schematically illustrates a video display system using video data decompression;
Figure 3 schematically illustrates an audio/video storage system using video data compression and decompression;
Figure 4 schematically illustrates a video camera using video data compression;
Figure 5 provides a schematic overview of a video data compression and decompression apparatus;
Figure 6 schematically illustrates the generation of predicted images;
Figure 7 schematically illustrates a largest coding unit (LCU);
Figure 8 schematically illustrates a set of four coding units (CU);
Figures 9 and 10 schematically illustrate the coding units of Figure 8 sub-divided into smaller coding units;
Figure 11 schematically illustrates an array of prediction units (PU);
Figure 12 schematically illustrates an array of transform units (TU);
Figure 13 is a schematic diagram showing an overview of an encoding system;
Figure 14 is a table providing examples of encoding profiles;
Figure 15 schematically illustrates a part of the functionality of Figure 13, using a compression and decompression arrangement;
Figure 16a schematically illustrates a rounding unit;
Figure 16b schematically illustrates a shift unit;
Figure 17a schematically illustrates a mapping unit;
Figure 17b schematically illustrates an inverse mapping unit;
Figure 18 schematically illustrates a mapping function;
Figure 19 is a schematic flow chart illustrating some of the functionality of an encoder; and
Figure 20 is a schematic flow chart illustrating some of the functionality of a decoder.
Description of the Embodiments
Referring now to the drawings, Figures 1-4 are provided to give schematic illustrations of apparatus or systems making use of the compression and/or decompression apparatus to be described below in connection with embodiments.
All of the data compression and/or decompression apparatus to be described below may be implemented in hardware, in software running on a general-purpose data processing apparatus such as a general-purpose computer, as programmable hardware such as an application specific integrated circuit (ASIC) or field programmable gate array (FPGA) or as combinations of these. In cases where the embodiments are implemented by software and/or firmware, it will be appreciated that such software and/or firmware, and non-transitory machine-readable data storage media by which such software and/or firmware are stored or otherwise provided, are considered as embodiments.
Figure 1 schematically illustrates an audio/video data transmission and reception system using video data compression and decompression.
An input audio/video signal 10 is supplied to a video data compression apparatus 20 which compresses at least the video component of the audio/video signal 10 for transmission along a transmission route 30 such as a cable, an optical fibre, a wireless link or the like. The compressed signal is processed by a decompression apparatus 40 to provide an output audio/video signal 50. For the return path, a compression apparatus 60 compresses an audio/video signal for transmission along the transmission route 30 to a decompression apparatus 70.
The compression apparatus 20 and decompression apparatus 70 can therefore form one node of a transmission link. The decompression apparatus 40 and compression apparatus 60 can form another node of the transmission link. Of course, in instances where the transmission link is uni-directional, only one of the nodes would require a compression apparatus and the other node would only require a decompression apparatus.
Figure 2 schematically illustrates a video display system using video data decompression. In particular, a compressed audio/video signal 100 is processed by a decompression apparatus 110 to provide a decompressed signal which can be displayed on a display 120. The decompression apparatus 110 could be implemented as an integral part of the display 120, for example being provided within the same casing as the display device.
Alternatively, the decompression apparatus 110 might be provided as (for example) a so-called set top box (STB), noting that the expression "set-top" does not imply a requirement for the box to be sited in any particular orientation or position with respect to the display 120; it is simply a term used in the art to indicate a device which is connectable to a display as a peripheral device.
Figure 3 schematically illustrates an audio/video storage system using video data compression and decompression. An input audio/video signal 130 is supplied to a compression apparatus 140 which generates a compressed signal for storing by a store device 150 such as a magnetic disk device, an optical disk device, a magnetic tape device, a solid state storage device such as a semiconductor memory or other storage device. For replay, compressed data is read from the store device 150 and passed to a decompression apparatus 160 for decompression to provide an output audio/video signal 170.
It will be appreciated that the compressed or encoded signal, and a storage medium or data carrier storing that signal, are considered as embodiments.
Figure 4 schematically illustrates a video camera using video data compression. In Figure 4, an image capture device 180, such as a charge coupled device (CCD) image sensor and associated control and read-out electronics, generates a video signal which is passed to a compression apparatus 190. A microphone (or plural microphones) 200 generates an audio signal to be passed to the compression apparatus 190. The compression apparatus 190 generates a compressed audio/video signal 210 to be stored and/or transmitted (shown generically as a schematic stage 220).
The techniques to be described below relate primarily to video data compression. It will be appreciated that many existing techniques may be used for audio data compression in conjunction with the video data compression techniques which will be described, to generate a compressed audio/video signal. Accordingly, a separate discussion of audio data compression will not be provided. It will also be appreciated that the data rate associated with video data, in particular broadcast quality video data, is generally very much higher than the data rate associated with audio data (whether compressed or uncompressed). It will therefore be appreciated that uncompressed audio data could accompany compressed video data to form a compressed audio/video signal. It will further be appreciated that although the present examples (shown in Figures 1-4) relate to audio/video data, the techniques to be described below can find use in a system which simply deals with (that is to say, compresses, decompresses, stores, displays and/or transmits) video data. That is to say, the embodiments can apply to video data compression without necessarily having any associated audio data handling at all.
Figure 5 provides a schematic overview of a video data compression and decompression apparatus.
Successive images of an input video signal 300 are supplied to an adder 310 and to an image predictor 320. The image predictor 320 will be described below in more detail with reference to Figure 6. The adder 310 in fact performs a subtraction (negative addition) operation, in that it receives the input video signal 300 on a "+" input and the output of the image predictor 320 on a "-" input, so that the predicted image is subtracted from the input image. The result is to generate a so-called residual image signal 330 representing the difference between the actual and predicted images.
One reason why a residual image signal is generated is as follows. The data coding techniques to be described, that is to say the techniques which will be applied to the residual image signal, tend to work more efficiently when there is less "energy" in the image to be encoded. Here, the term "efficiently" refers to the generation of a small amount of encoded data; for a particular image quality level, it is desirable (and considered "efficient") to generate as little data as is practicably possible. The reference to "energy" in the residual image relates to the amount of information contained in the residual image. If the predicted image were to be identical to the real image, the difference between the two (that is to say, the residual image) would contain zero information (zero energy) and would be very easy to encode into a small amount of encoded data. In general, if the prediction process can be made to work reasonably well, the expectation is that the residual image data will contain less information (less energy) than the input image and so will be easier to encode into a small amount of encoded data.
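The relationship between prediction quality and residual energy can be sketched numerically. This is a minimal illustration using NumPy; the block size, the predictions and the `residual_energy` helper are hypothetical, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4 luma block and two candidate predictions of it.
original = rng.integers(0, 256, size=(4, 4)).astype(np.int16)
good_prediction = original + rng.integers(-2, 3, size=(4, 4)).astype(np.int16)
poor_prediction = rng.integers(0, 256, size=(4, 4)).astype(np.int16)

def residual_energy(block, prediction):
    """Sum of squared residual samples: a simple proxy for the 'energy'
    (information content) the downstream coding stages must represent."""
    residual = block.astype(np.int32) - prediction.astype(np.int32)
    return int(np.sum(residual * residual))

# A good predictor leaves far less energy in the residual image,
# which is therefore easier to encode into a small amount of data.
assert residual_energy(original, good_prediction) < residual_energy(original, poor_prediction)
```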
The residual image data 330 is supplied to a transform unit 340 which generates a discrete cosine transform (DCT) representation of the residual image data. The OCT technique itself is well known and will not be described in detail here. There are however aspects of the techniques used in the present apparatus which will be described in more detail below, in particular relating to the selection of different blocks of data to which the OCT operation is applied. These will be discussed with reference to Figures 7-12 below.
Note that in some embodiments, a discrete sine transform (DST) is used instead of a DCT. In other embodiments, no transform might be used. This can be done selectively, so that the transform stage is, in effect, bypassed, for example under the control of a "transform skip" command or mode.
The output of the transform unit 340, which is to say, a set of transform coefficients for each transformed block of image data, is supplied to a quantiser 350. Various quantisation techniques are known in the field of video data compression, ranging from a simple multiplication by a quantisation scaling factor through to the application of complicated lookup tables under the control of a quantisation parameter. The general aim is twofold. Firstly, the quantisation process reduces the number of possible values of the transformed data. Secondly, the quantisation process can increase the likelihood that values of the transformed data are zero. Both of these can make the entropy encoding process, to be described below, work more efficiently in generating small amounts of compressed video data.
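Both effects of quantisation can be seen with a toy scalar quantiser. This is a deliberately simplified sketch (divide by a step and truncate toward zero); the actual quantiser 350 may use scaling factors or lookup tables under a quantisation parameter, as the text notes:

```python
import numpy as np

def quantise(coeffs, qstep):
    """Simple scalar quantisation: divide by a quantisation step and round
    toward zero. This reduces the number of distinct values and pushes
    small transform coefficients to exactly zero."""
    return np.fix(coeffs / qstep).astype(np.int32)

# Hypothetical transform coefficients for one block.
coeffs = np.array([812, -3, 95, -40, 7, 2, -1, 0])
q = quantise(coeffs, qstep=16)

# Small coefficients quantise to zero, which helps the entropy coder.
assert (q == 0).sum() > (coeffs == 0).sum()
```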
A data scanning process is applied by a scan unit 360. The purpose of the scanning process is to reorder the quantised transformed data so as to gather as many as possible of the non-zero quantised transformed coefficients together, and of course therefore to gather as many as possible of the zero-valued coefficients together. These features can allow so-called run-length coding or similar techniques to be applied efficiently. So, the scanning process involves selecting coefficients from the quantised transformed data, and in particular from a block of coefficients corresponding to a block of image data which has been transformed and quantised, according to a "scanning order" so that (a) all of the coefficients are selected once as part of the scan, and (b) the scan tends to provide the desired reordering. Techniques for selecting a scanning order will be described below. One example scanning order which can tend to give useful results is a so-called zigzag scanning order.
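One common convention for a zigzag order can be sketched as follows. This is an illustrative generator only, not necessarily the specific scan used by the apparatus: it walks the anti-diagonals of the block, alternating direction, so that low-frequency coefficients (which tend to be non-zero) come first and zeros cluster at the end:

```python
def zigzag_order(n):
    """Return the (row, col) coordinates of an n x n block visited in a
    zigzag order: traverse anti-diagonals, alternating direction on each."""
    coords = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        coords.extend(diag if d % 2 else diag[::-1])
    return coords

# First few positions for a 3x3 block: DC first, then the low frequencies.
assert zigzag_order(3)[0] == (0, 0)
```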
The scanned coefficients are then passed to an entropy encoder (EE) 370. Again, various types of entropy encoding may be used. Two examples which will be described below are variants of the so-called CABAC (Context Adaptive Binary Arithmetic Coding) system and variants of the so-called CAVLC (Context Adaptive Variable-Length Coding) system. In general terms, CABAC is considered to provide a better efficiency, and in some studies has been shown to provide a 10-20% reduction in the quantity of encoded output data for a comparable image quality compared to CAVLC. However, CAVLC is considered to represent a much lower level of complexity (in terms of its implementation) than CABAC.
Note that the scanning process and the entropy encoding process are shown as separate processes, but in fact can be combined or treated together. That is to say, the reading of data into the entropy encoder can take place in the scan order. Corresponding considerations apply to the respective inverse processes to be described below.
The output of the entropy encoder 370, along with additional data, for example defining the manner in which the predictor 320 generated the predicted image, provides a compressed output video signal 380.
However, a return path is also provided because the operation of the predictor 320 itself depends upon a decompressed version of the compressed output data.
The reason for this feature is as follows. At the appropriate stage in the decompression process (to be described below) a decompressed version of the residual data is generated. This decompressed residual data has to be added to a predicted image to generate an output image (because the original residual data was the difference between the input image and a predicted image). In order that this process is comparable, as between the compression side and the decompression side, the predicted images generated by the predictor 320 should be the same during the compression process and during the decompression process. Of course, at decompression, the apparatus does not have access to the original input images, but only to the decompressed images. Therefore, at compression, the predictor 320 bases its prediction (at least, for inter-image encoding) on decompressed versions of the compressed images.
The entropy encoding process carried out by the entropy encoder 370 is considered to be "lossless", which is to say that it can be reversed to arrive at exactly the same data which was first supplied to the entropy encoder 370. So, the return path can be implemented before the entropy encoding stage. Indeed, the scanning process carried out by the scan unit 360 is also considered lossless, but in the present embodiment the return path 390 is from the output of the quantiser 350 to the input of a complementary inverse quantiser 420.
In general terms, an entropy decoder 410, the reverse scan unit 400, an inverse quantiser 420 and an inverse transform unit 430 provide the respective inverse functions of the entropy encoder 370, the scan unit 360, the quantiser 350 and the transform unit 340. For now, the discussion will continue through the compression process; the process to decompress an input compressed video signal will be discussed separately below.
In the compression process, the quantised coefficients are passed by the return path 390 from the quantiser 350 to the inverse quantiser 420. An inverse quantisation and inverse transformation process are carried out by the units 420, 430 to generate a compressed-decompressed residual image signal 440.
The image signal 440 is added, at an adder 450, to the output of the predictor 320 to generate a reconstructed output image 460. This forms one input to the image predictor 320, as will be described below.
A controller 345 controls the operation of the arrangement shown in Figure 5.
Turning now to the process applied to a received compressed video signal 470, the signal is supplied to the entropy decoder 410 and from there to the chain of the reverse scan unit 400, the inverse quantiser 420 and the inverse transform unit 430 before being added to the output of the image predictor 320 by the adder 450. In straightforward terms, the output 460 of the adder 450 forms the output decompressed video signal 480. In practice, further filtering may be applied before the signal is output.
Figure 6 schematically illustrates the generation of predicted images, and in particular the operation of the image predictor 320.
There are two basic modes of prediction: so-called intra-image prediction and so-called inter-image, or motion-compensated (MC), prediction.
Intra-image prediction bases a prediction of the content of a block of the image on data from within the same image. This corresponds to so-called I-frame encoding in other video compression techniques. In contrast to I-frame encoding, where the whole image is intra-encoded, in the present embodiments the choice between intra- and inter-encoding can be made on a block-by-block basis, though in other embodiments the choice is still made on an image-by-image basis.
Motion-compensated prediction makes use of motion information which attempts to define the source, in another adjacent or nearby image, of image detail to be encoded in the current image. Accordingly, in an ideal example, the contents of a block of image data in the predicted image can be encoded very simply as a reference (a motion vector) pointing to a corresponding block at the same or a slightly different position in an adjacent image.
Returning to Figure 6, two image prediction arrangements (corresponding to intra- and inter-image prediction) are shown, the results of which are selected by a multiplexer 500 under the control of a mode signal 510 so as to provide blocks of the predicted image for supply to the adders 310 and 450. The choice is made in dependence upon which selection gives the lowest "energy" (which, as discussed above, may be considered as information content requiring encoding), and the choice is signalled to the decoder within the encoded output datastream.
Image energy, in this context, can be detected, for example, by carrying out a trial subtraction of an area of the two versions of the predicted image from the input image, squaring each pixel value of the difference image, summing the squared values, and identifying which of the two versions gives rise to the lower mean squared value of the difference image relating to that image area.
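This trial-subtraction procedure might be sketched as follows. The function name, the inputs and the use of the mean squared value are illustrative assumptions consistent with the description above, not the apparatus's literal implementation:

```python
import numpy as np

def lower_energy_mode(input_block, intra_pred, inter_pred):
    """Trial-subtract each candidate prediction from the input block,
    square the difference samples, and pick the mode whose difference
    image has the lower mean squared value (lower 'energy')."""
    def mse(pred):
        diff = input_block.astype(np.int64) - pred.astype(np.int64)
        return float(np.mean(diff * diff))
    return "intra" if mse(intra_pred) <= mse(inter_pred) else "inter"
```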
The actual prediction, in the intra-encoding system, is made on the basis of image blocks received as part of the signal 460, which is to say, the prediction is based upon encoded-decoded image blocks in order that exactly the same prediction can be made at a decompression apparatus. However, data can be derived from the input video signal 300 by an intra-mode selector 520 to control the operation of the intra-image predictor 530.
For inter-image prediction, a motion compensated (MC) predictor 540 uses motion information such as motion vectors derived by a motion estimator 550 from the input video signal 300. Those motion vectors are applied to a processed version of the reconstructed image 460 by the motion compensated predictor 540 to generate blocks of the inter-image prediction.
The processing applied to the signal 460 will now be described. Firstly, the signal is filtered by a filter unit 560. This involves applying a "deblocking" filter to remove or at least tend to reduce the effects of the block-based processing carried out by the transform unit 340 and subsequent operations. Also, an adaptive loop filter is applied using coefficients derived by processing the reconstructed signal 460 and the input video signal 300. The adaptive loop filter is a type of filter which, using known techniques, applies adaptive filter coefficients to the data to be filtered. That is to say, the filter coefficients can vary in dependence upon various factors.
Data defining which filter coefficients to use is included as part of the encoded output datastream.
The filtered output from the filter unit 560 in fact forms the output video signal 480. It is also buffered in one or more image stores 570; the storage of successive images is a requirement of motion compensated prediction processing, and in particular the generation of motion vectors. To save on storage requirements, the stored images in the image stores 570 may be held in a compressed form and then decompressed for use in generating motion vectors. For this particular purpose, any known compression / decompression system may be used. The stored images are passed to an interpolation filter 580 which generates a higher resolution version of the stored images; in this example, intermediate samples (sub-samples) are generated such that the resolution of the interpolated image output by the interpolation filter 580 is 8 times (in each dimension) that of the images stored in the image stores 570. The interpolated images are passed as an input to the motion estimator 550 and also to the motion compensated predictor 540.
In embodiments, a further optional stage is provided, which is to multiply the data values of the input video signal by a factor of four using a multiplier 600 (effectively just shifting the data values left by two bits), and to apply a corresponding divide operation (shift right by two bits) at the output of the apparatus using a divider or right-shifter 610. So, the shifting left and shifting right changes the data purely for the internal operation of the apparatus. This measure can provide for higher calculation accuracy within the apparatus, as the effect of any data rounding errors is reduced.
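The optional shift stage can be sketched as a pair of helpers. The shifts themselves follow the description above; the round-to-nearest offset on the way back out is an assumption for illustration (it is what makes a small internal rounding error of ±1 disappear on output):

```python
def to_internal(sample):
    """Multiply by four (left shift by two bits) on the way into the
    apparatus, giving two extra bits of internal calculation headroom."""
    return sample << 2

def from_internal(value):
    """Corresponding divide (right shift by two bits) at the output,
    with a rounding offset (an illustrative assumption) so that small
    internal rounding errors do not perturb the output sample."""
    return (value + 2) >> 2

# An internal error of +/-1 is absorbed by the final right shift.
assert from_internal(to_internal(100) + 1) == 100
```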
The way in which an image is partitioned for compression processing will now be described. At a basic level, an image to be compressed is considered as an array of blocks of samples. For the purposes of the present discussion, the largest such block under consideration is a so-called largest coding unit (LCU) 700 (Figure 7), which represents a square array of 64 x 64 samples. Here, the discussion relates to luminance samples. Depending on the chrominance mode, such as 4:4:4, 4:2:2, 4:2:0 or 4:4:4:4 (GBR plus key data), there will be differing numbers of corresponding chrominance samples corresponding to the luminance block.
Three basic types of blocks will be described: coding units, prediction units and transform units. In general terms, the recursive subdividing of the LCUs allows an input picture to be partitioned in such a way that both the block sizes and the block coding parameters (such as prediction or residual coding modes) can be set according to the specific characteristics of the image to be encoded.
The LCU may be subdivided into so-called coding units (CU). Coding units are always square and have a size between 8x8 samples and the full size of the LCU 700. The coding units can be arranged as a kind of tree structure, so that a first subdivision may take place as shown in Figure 8, giving coding units 710 of 32x32 samples; subsequent subdivisions may then take place on a selective basis so as to give some coding units 720 of 16x16 samples (Figure 9) and potentially some coding units 730 of 8x8 samples (Figure 10). Overall, this process can provide a content-adapting coding tree structure of CU blocks, each of which may be as large as the LCU or as small as 8x8 samples. Encoding of the output video data takes place on the basis of the coding unit structure.
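The selective LCU-to-CU subdivision can be sketched as a quadtree split. This is illustrative only: the `should_split` callback stands in for whatever content-dependent decision the encoder actually makes, and the coordinate convention is an assumption:

```python
def split_cu(x, y, size, should_split, min_size=8):
    """Recursively subdivide a square coding unit: each block is either
    kept whole, or split into four half-size quadrants, down to the
    8x8 minimum CU size."""
    if size == min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += split_cu(x + dx, y + dy, half, should_split, min_size)
    return blocks

# Never splitting keeps the whole 64x64 LCU as a single CU.
assert split_cu(0, 0, 64, lambda x, y, s: False) == [(0, 0, 64)]
```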
Figure 11 schematically illustrates an array of prediction units (PU). A prediction unit is a basic unit for carrying information relating to the image prediction processes, or in other words the additional data added to the entropy encoded residual image data to form the output video signal from the apparatus of Figure 5. In general, prediction units are not restricted to being square in shape. They can take other shapes, in particular rectangular shapes forming half of one of the square coding units, as long as the coding unit is greater than the minimum (8x8) size. The aim is to allow the boundary of adjacent prediction units to match (as closely as possible) the boundary of real objects in the picture, so that different prediction parameters can be applied to different real objects. Each coding unit may contain one or more prediction units.
Figure 12 schematically illustrates an array of transform units (TU). A transform unit is a basic unit of the transform and quantisation process. Transform units are always square and can take a size from 4x4 up to 32x32 samples. Each coding unit can contain one or more transform units. The acronym SDIP-P in Figure 12 signifies a so-called short distance intra-prediction partition. In this arrangement only one-dimensional transforms are used, so a 4xN block is passed through N transforms with input data to the transforms being based upon the previously decoded neighbouring blocks and the previously decoded neighbouring lines within the current SDIP-P.
A simplified schematic diagram illustrating a flow of data through an encoder of the types discussed above, such as a HEVC encoder, is shown in Figure 13. A purpose of summarising the process in the form shown in Figure 13 is to indicate the potential limitations on operating resolution within the system. Note that for this reason, not all of the encoder functionality is shown in Figure 13. Note also that Figure 13 provides an example of an apparatus for encoding input data values of a data set (which may be video data values). Further, note that (as discussed above) techniques used in a forward encoding path such as that shown in Figure 13 may also be used in the complementary reverse decoding path of the encoder and may also be used in a forward decoding path of a decoder.
Input data 1300 of a certain bit depth is supplied to a prediction stage 1310 which performs either intra- or inter-prediction and subtracts the predicted version from the actual input image, generating residual data 1320 of a certain bit depth. So, the stage 1310 generally corresponds to the items 320 and 310 of Figure 5.
The residual data 1320 is frequency-transformed by a transform stage 1330 which involves multiple stages of transform processing (labelled as stage 1 and stage 2), corresponding to left and right matrix multiplications in a 2D transform equation (the transforms can be implemented by a matrix multiplication process), and operates according to one or more sets of transform matrices 1340 having a certain resolution. A maximum dynamic range 1350 of the transform process, referred to as MAX_TR_DYNAMIC_RANGE, applies to the calculations used in this process. The output of the transform stage is a set of transform coefficients 1360 according to the MAX_TR_DYNAMIC_RANGE. The transform stage 1330 corresponds generally to the transform unit 340 of Figure 5.
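The two-stage, left-and-right matrix-multiplication view of the 2D transform can be sketched with a floating-point DCT. This is illustrative only: HEVC uses fixed-point integer transform matrices, whereas the orthonormal matrix here is exact, so the inverse reconstructs the input perfectly:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (floating point, for illustration;
    an integer-approximated version would carry fractional bits instead)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def transform_2d(block):
    """Two-stage 2D transform: stage 1 transforms the columns, stage 2 the
    rows, i.e. the left and right matrix multiplications T @ X @ T^T."""
    t = dct_matrix(block.shape[0])
    return t @ block @ t.T
```

With an exact orthonormal matrix, applying T^T on both sides inverts the transform; in the integer implementation, rounding and the per-stage shifts discussed below make this reconstruction only approximate.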
The coefficients 1360 are then passed to a quantising stage 1370 generally corresponding to the quantiser 350 of Figure 5. The quantising stage may use a multiply-shift mechanism under the control of quantisation coefficients and scaling lists 1380, including clipping to the maximum dynamic range ENTROPY_CODING_DYNAMIC_RANGE (which is, in embodiments, the same as MAX_TR_DYNAMIC_RANGE). The output of the quantising stage is a set 1390 of quantised coefficients according to ENTROPY_CODING_DYNAMIC_RANGE which is then (in the full encoder, not shown here) passed to an entropy encoding stage such as that represented by the scan unit 360 and entropy encoder 370 of Figure 5.
Using the notation introduced in respect of Figure 13, the main sources of calculation noise in HEVC, ignoring (for the purposes of this discussion) noise shaping caused by the various predictions and any RQT (residual quad-tree) and RDOQ (rate-distortion optimised quantisation) decision processes, are discussed below: Transform matrix coefficient values Ideally, the inverse transform applied to transformed coefficients will reproduce the original input values. However, this is limited by the integer nature of the calculations. In HEVC, the transform matrix coefficients have 6 fractional bits (i.e. they have an inherent left-shift of 6).
Shifting results to MAX_TR_DYNAMIC_RANGE after each stage of the transform The forward transform will result in values that are bitDepth+log2(size) bits in size. After the first stage of the transform, the coefficients' width in bits should be at least bitDepth+log2(size) (though additional bits will help maintain more accuracy). However, in HEVC, these intermediates are shifted in the forward (encoder only) transform so that they never exceed MAX_TR_DYNAMIC_RANGE; similarly for the second stage. In the inverse transform, the values at the output of each stage are clipped to MAX_TR_DYNAMIC_RANGE.
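The per-stage shifts implied by the description above can be sketched as follows. The formulas assume (as stated earlier) that the matrices carry 6 fractional bits and each stage grows the width by log2(size); the function name is a hypothetical convenience, not taken from the text.

```python
TRANSFORM_MATRIX_SHIFT = 6   # fractional bits in the transform matrices

def forward_transform_shifts(bit_depth, log2_size, max_tr_dynamic_range=15):
    """Right-shifts applied after each forward-transform stage so that
    intermediates never exceed MAX_TR_DYNAMIC_RANGE."""
    # Stage 1 raw width is bitDepth + log2(size) + 6 bits; shift the
    # excess over MAX_TR_DYNAMIC_RANGE away.
    shift_stage1 = (bit_depth + log2_size + TRANSFORM_MATRIX_SHIFT
                    - max_tr_dynamic_range)
    # Stage 2 input is already at MAX_TR_DYNAMIC_RANGE, so only stage 2's
    # own growth (log2(size) + 6 bits) needs removing.
    shift_stage2 = log2_size + TRANSFORM_MATRIX_SHIFT
    return shift_stage1, shift_stage2

# 8-bit video, 8x8 TU, HEVC version 1 range of 15:
print(forward_transform_shifts(8, 3))       # (2, 9)
# 16-bit video with the extended MAX_TR_DYNAMIC_RANGE = 21 proposed below:
print(forward_transform_shifts(16, 3, 21))  # (4, 9)
```

With the extended dynamic range, the stage-1 shift stays small even at 16-bit input, which is precisely the accuracy benefit argued for in this document.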
If MAX_TR_DYNAMIC_RANGE is less than bitDepth+log2(size), then the values out of the forward transform will actually be shifted left (instead of right) in the quantising stage, and then clipped to 15-bit (ENTROPY_CODING_DYNAMIC_RANGE). Further, if ENTROPY_CODING_DYNAMIC_RANGE is less than bitDepth+log2(size)+1, clipping will occur when QP is less than 4 - (6 * (bitDepth - 8)).
In HEVC, a MAX_TR_DYNAMIC_RANGE (and ENTROPY_CODING_DYNAMIC_RANGE) of 15 is used for up to 10-bit operation, although coefficients in 32x32 blocks may be clipped for QP < -8. In addition, the lack of headroom for internal accuracy may also introduce errors for low QPs.
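The QP threshold quoted in the two paragraphs above is simple arithmetic; a one-line helper (hypothetical name, condition taken from the text) makes the connection between the general formula and the 10-bit figure explicit:

```python
def qp_clipping_threshold(bit_depth):
    """QP values below this threshold may cause coefficient clipping when
    ENTROPY_CODING_DYNAMIC_RANGE is less than bitDepth+log2(size)+1."""
    return 4 - 6 * (bit_depth - 8)

# For 10-bit operation this reproduces the "QP < -8" figure for 32x32 blocks:
print(qp_clipping_threshold(10))  # -8
# At 8 bits the threshold is QP < 4, i.e. only the very lowest QPs:
print(qp_clipping_threshold(8))   # 4
```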
Noise added during quantisation Although the quantiser and inverse quantiser of an encoder and decoder will add noise when quantising, additional noise may be inadvertently added when the scaling lists are applied, and because the quantisation coefficients defined in the arrays 'quantScales' and 'invQuantScales' are not necessarily perfect reciprocals.
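The imperfect-reciprocal point can be checked numerically. The array values below are reproduced from the HEVC reference implementation (one entry per QP remainder, QP % 6) and should be treated as illustrative: the design intent is quantScales[r] * invQuantScales[r] ≈ 2^20, but the product is only exact for one remainder, so even a quantise/inverse-quantise round trip at the finest step adds a little noise.

```python
# Per-(QP % 6) scaling arrays as used by HEVC-style multiply-shift
# quantisation (values from the reference implementation; illustrative).
quant_scales     = [26214, 23302, 20560, 18396, 16384, 14564]
inv_quant_scales = [40, 45, 51, 57, 64, 72]

for r, (q, iq) in enumerate(zip(quant_scales, inv_quant_scales)):
    product = q * iq
    error = product - (1 << 20)   # deviation from a perfect reciprocal pair
    print(f"QP%6={r}: product={product}, error={error}")
```

Only the r == 4 pair (16384 * 64) hits 2^20 exactly; every other pair is a few tens away, which is the "not necessarily perfect reciprocals" noise source described above.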
The effects of transform matrix precision and MAX_TR_DYNAMIC_RANGE are discussed below.
Empirical data was obtained by analysis (under the so-called intra coding profile) of the coding of five video sequences from the so-called SVT test set (1920x1080 50p at 16 bit, scaled down from 4K video). Of these sequences, only the first 150 frames have been used in the tests. A sixth sequence, referred to as Traffic_RGB (2560x1600 30p at 12 bit), is defined by the standard Range Extension test conditions applicable to HEVC at the time of filing the present application.
In the empirical tests, if the file (input data) bit depth was less than the internal bit depth being tested (the codec's input bit depth), then the samples were padded (with the LSBs set to 0); if the file bit depth was more than the internal bit depth, the samples were scaled and rounded.
In the discussion below, bitDepth is used to describe the internal bit depth rather than the bit depth of the input data. Systems with internal bit depth (bitDepth) up to 16 are considered.
Empirical results have shown that in at least some instances, the transform matrix precision should be at least bitDepth-2.
In embodiments, MAX_TR_DYNAMIC_RANGE should be at least 5 (which is the minimum value of log2(size)) more than bitDepth. Additional accuracy has been shown to further improve coding efficiency.
In embodiments, ENTROPY_CODING_DYNAMIC_RANGE should be at least 6 more than the bitDepth (1 for the "quantisation" factor applied by QPs less than 4 - (6 * (bitDepth - 8)), plus 5 for the maximum value of log2(size)). In other embodiments, where the clipping for the lowest QP values is not a concern, the ENTROPY_CODING_DYNAMIC_RANGE should be at least 5 (the minimum value of log2(size)) more than bitDepth.
For the 16-bit system, the transform matrix precision should be set to 14, MAX_TR_DYNAMIC_RANGE should be set to 21, and ENTROPY_CODING_DYNAMIC_RANGE should be set to 22. Since having more internal accuracy is rarely considered harmful, these parameters have also been tested at different bitDepths, producing results which demonstrate that, for the same number of bits, significantly higher SNRs are achievable, and that the increased-accuracy system has PSNR/MSE operating points that are suitable for bitDepths of up to 16.
If Range Extensions is intended to produce a single new profile for all bit depths, then the system described above is suitable. However, if different profiles are to be described for different maximum bitDepths, then having different parameter values may be useful for reducing hardware complexity in systems that do not require the highest profiles. In some embodiments, the different profiles may define different values for transform matrix precision, MAX_TR_DYNAMIC_RANGE and ENTROPY_CODING_DYNAMIC_RANGE.
In other embodiments, the profile would allow the values of some or all of transform matrix precision, MAX_TR_DYNAMIC_RANGE and ENTROPY_CODING_DYNAMIC_RANGE to be chosen from a list of permissible values by the encoder (with the cost of implementation being a selection criterion), or as a function of side information such as the bitDepth. However, this may require multiple sets of transform matrices if the transform matrix precision is to be varied, and for this reason, in further embodiments only one transform matrix precision is defined for a profile, with that transform matrix precision corresponding to the recommended value for the maximum bit depth for which the profile is designed. A set of possible profiles is proposed below with reference to Figure 14.
Example values of transform matrix precision, MAX_TR_DYNAMIC_RANGE, ENTROPY_CODING_DYNAMIC_RANGE and bitDepth are shown in the following table:

bitDepth                        16  15  14  13  12  11  10   9   8
Transform Matrix Precision      14  13  12  11  10   9   8‡  7‡  6
MAX_TR_DYNAMIC_RANGE            21  20  19  18  17  16  15  15*  15
ENTROPY_CODING_DYNAMIC_RANGE    22  21  20  19  18  17  16†  15  15

In the table, values marked with '*' are clipped to a minimum of 15, in line with the current description of HEVC. The values marked with '†' and '‡' are greater than those specified for the current description of HEVC, those being 15 and 6 respectively.
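The table's values follow three simple rules derived from the minima stated above (precision at least bitDepth-2, dynamic ranges at least bitDepth+5 and bitDepth+6, each floored at the version-1 values). The helper below is a sketch that reproduces the table rows; the function name is invented for illustration.

```python
def recommended_params(bit_depth):
    """(transform matrix precision, MAX_TR_DYNAMIC_RANGE,
    ENTROPY_CODING_DYNAMIC_RANGE) for a given internal bit depth,
    clipped to the HEVC version-1 minima of 6 and 15."""
    transform_matrix_precision   = max(6, bit_depth - 2)
    max_tr_dynamic_range         = max(15, bit_depth + 5)
    entropy_coding_dynamic_range = max(15, bit_depth + 6)
    return (transform_matrix_precision,
            max_tr_dynamic_range,
            entropy_coding_dynamic_range)

print(recommended_params(16))  # (14, 21, 22) -- the 16-bit column
print(recommended_params(10))  # (8, 15, 16)
print(recommended_params(8))   # (6, 15, 15)
```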
If different profiles are to be used, then in embodiments of the disclosure these specifications may be used as minima (noting that the HEVC version 1 10-bit system does not quite meet these targets). Using values less than these indicated minima is possible, although this will degrade the PSNR for higher bit rates (lower QPs).
Accordingly, the table above gives an example of an arrangement in which the precision and/or dynamic range of one or more coding (and complementary decoding) stages is set according to the bit depth of the video data.
Although the discussions above provide for a variable transform dynamic range, some arrangements may not allow this parameter to be varied. For example, in some arrangements, the variable MAX_TR_DYNAMIC_RANGE and the data precision of the transform matrices may be fixed at 16 bits. However, it may still be appropriate to handle high bit depth (n bit, where n might be 11-24, for example) video data using a greater data precision at other parts of the processing chain.
In other arrangements, there may be a lower maximum, such as 11 bits into the first stage of the forward transform, which is then increased to 16 bits by maintaining internal accuracies of the transform, and optionally applying an appropriate shift and optional rounding stage that causes the second stage to have 16 bits at its input.
In some embodiments, a data compression and decompression arrangement is provided to allow for this.
Figure 15 schematically illustrates a part of the functionality of Figure 13, using such a compression and decompression arrangement.
In this example, a compression unit 1500 is provided at the input to the transform unit 1510 (which corresponds to the transform unit 1330 of Figure 13) to data compress the input n-bit data values to a data precision lower than n bits; and a complementary decompression unit 1520 is provided at the output of the transform unit 1510 to decompress the transformed data back to a data precision of n bits. The compression unit 1500 receives data at a data precision which is higher than the data precision of the transform unit (16 bits in this example). The compression unit converts the input, higher resolution data into 16 bit data which is then handled in the normal way by the transform unit 1510. The output of the transform unit is transformed data at a 16 bit resolution, which the decompression unit 1520 converts back to data at (for example) the original, higher, resolution corresponding to the resolution of the data supplied to the compression unit 1500. In this context, the transform unit acts to frequency-transform input frequency-transformed image data to generate an array of output image data by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision lower than n bits; a complementary operation takes place at an inverse transform unit in a decoder.
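A minimal sketch of this compress/transform/decompress arrangement follows, assuming shift-based compression (one of several options discussed below; the text does not mandate this choice). The transform is represented by a pass-through placeholder so that only the precision handling is shown.

```python
def compress(values, n_bits, transform_bits=16):
    """Compression unit 1500: round n-bit input down to the transform's
    precision by a right shift with round-to-nearest."""
    shift = max(0, n_bits - transform_bits)
    half = (1 << shift) >> 1 if shift else 0
    return [(v + half) >> shift for v in values], shift

def decompress(values, shift):
    """Decompression unit 1520: shift transformed data back up to the
    original precision."""
    return [v << shift for v in values]

def transform_16bit(values):
    # Placeholder for the fixed-precision transform unit 1510.
    return values

n = 20                           # high-bit-depth input (n > 16)
data = [123456, -654321]
compressed, shift = compress(data, n)
restored = decompress(transform_16bit(compressed), shift)
print(shift, restored)           # restored values differ from the input
                                 # by less than one compression step
```

Note the round trip is lossy by at most one step of 2^shift; for n-bit input at or below the transform precision, shift is 0 and the units are effectively bypassed, matching the bypass behaviour described later.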
In some embodiments, the compression unit converts the input, higher resolution data into a lower bit width than 16, such as 11 bits, thereby providing a similar input to the transform as provided by lower bit video.
Note that a corresponding process (which is to say, a process as just described) can take place at the decoder, and/or in the decoding side of the encoder, in which instance the transform unit 1510 would be substituted by an inverse transform unit. But this does not mean to say that all examples of an encoder will use the same process in its forward encoding path (of the type described) as that used in its reverse decoding path and in the forward decoding path of a decoder. In some embodiments, the encoder's forward process will be different to the reverse decoding path process and that of the decoder, although the encoder would model the decoder's behaviour and supply any required control data for the purpose of restricting the maximum values through the transform.
Of course, if the input data (to be handled by the transform process) is below a specific bit depth resolution already, such as 16 bit, then either the compression unit 1500 and the decompression unit 1520 can be bypassed or otherwise rendered non-functional for the purposes of that data, or the compression and decompression functions applied by the compression and decompression units can be arranged so as not to change (or at least, not to decrease) the resolution of the input data and not to change (or at least not to increase) the resolution of the output data.
Various possibilities are available for the compression and decompression functions. In one example, Figure 16a schematically illustrates a rounding unit 1530 acting as the compression unit 1500 and Figure 16b schematically illustrates a shift unit 1540 acting as the decompression unit 1520. These cooperate to provide complementary compression and decompression functions in which, at the rounding unit 1530, the input data is rounded down to 16 bit resolution (using a fixed or adaptive rounding technique) and, at the shift unit 1540, the data output by the transform unit 1510 is bit-shifted back to the resolution of the original input data.
It will be appreciated that a bit shift can provide a simple example of a rounding operation. It will also be appreciated that rounding can take place using a more advanced rounding algorithm than that provided simply by discarding LSBs. For example, a value may be rounded to the nearest integer, or to the next-higher integer.
As another example option, a simple shift unit can be used in place of (or after) the rounding unit.
As a further example option, a clipping operation can be applied after the rounding unit.
In general examples, the data compressing step may apply a bit shift function and the decompressing step may apply a bit shift function. In other general examples, the data compressing step may apply a rounding function before the bit shift function and the decompressing step may apply a bit shift function.
As another example option, Figure 17a schematically illustrates a mapping unit 1550 acting as a compression unit 1500 and Figure 17b schematically illustrates an inverse mapping unit 1560 acting as a decompression unit 1520. Here, a mapping is applied as between input data and compressed data (in the mapping unit 1550) and a complementary inverse mapping (using a complementary mapping function) is applied in the inverse mapping unit 1560 as between data output by the transform unit 1510 and data output by the inverse mapping unit 1560.
Figure 18 schematically illustrates a mapping function 1570, which is just provided as one example of a possible function. The function, as drawn, assumes that the data supplied to the map function (to the mapping unit 1550) has a higher resolution than the data to be output by the mapping unit 1550. Very small values of input data are rounded to zero by this example mapping function. Higher values of input data are mapped to corresponding mapped data values at the resolution of the transform unit 1510, but (in this example) according to a non-linear relationship between input and output of the mapping function. The mapping function can be implemented by means of a look-up table of values stored in a read-only memory, for example.
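A toy version of the Figure 18 idea is sketched below: a non-linear compression mapping with very small inputs rounded to zero, and a complementary (approximate) inverse. The deadzone width and square-root law are invented for illustration; the text specifies only the general shape of the curve, and a real implementation could realise either direction as a ROM look-up table.

```python
DEADZONE = 4  # inputs with magnitude below this map to zero (assumed value)

def map_value(x):
    """Mapping unit 1550: non-linear compression with a zero deadzone.
    Larger magnitudes are represented with coarser steps."""
    if abs(x) < DEADZONE:
        return 0
    sign = 1 if x > 0 else -1
    return sign * int((abs(x) - DEADZONE) ** 0.5)

def inverse_map_value(y):
    """Inverse mapping unit 1560: complementary (approximate) expansion."""
    if y == 0:
        return 0
    sign = 1 if y > 0 else -1
    return sign * (abs(y) ** 2 + DEADZONE)

for x in (2, 100, -1000):
    y = map_value(x)
    print(x, "->", y, "->", inverse_map_value(y))
```

Small inputs vanish, mid-range inputs survive with small error, and large inputs are reconstructed with an error that grows with magnitude, mirroring the non-linear relationship drawn in Figure 18.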
Figure 19 is a schematic flow chart illustrating some of the functionality of an encoder.
At a step 1600, a compression and decompression function is selected. The selection can be fixed (such as using a "round to 16 bits" function). The function can be selected according to, for example, the bit depth of the input video data, according to a predetermined relationship between bit depth and compression/decompression function. Or as another option, a compression/decompression function could be selected according to the results of one or more trial encodings. The controller 345 handles the selection of the function. At a step 1610, the selected function is applied by the compression unit 1500 and the decompression unit 1520.
Optionally, and if required (that is to say, if the decoder cannot derive the function itself from predetermined information and/or aspects of the data stream), indicator data can be added to the data stream at a step 1620 to indicate the function which has been used.
Figure 20 is a schematic flow chart illustrating some of the functionality of a decoder.
Optionally (as discussed above) at a step 1630, indicator data is detected so as to indicate the compression / decompression function used. At a step 1640, the appropriate compression / decompression function to match or complement that used at the encoder is selected. At a step 1650, the function is applied to compression and decompression units at the decoder.
It will be appreciated that a decoder may be defined as equivalent to the decoding path of the apparatus of Figure 5.
It is important to note that this technique can also be applied during the calculation of the prediction data, and in particular the inter-predicted data. In inter predicted data, a two dimensional filter is applied to previously reconstructed video data, in order to provide a prediction with sub-pixel motion vector accuracy. The prediction itself will be of the same number of bits as the source data, although intermediate calculations may be higher. Again, in order to reduce the overhead in a system, the encoder could inspect the source data and calculate the most suitable compression and decompression function that would minimise or at least reduce the accuracy required for the intermediate calculations. The minimisation (reduction) of the accuracy could be in consideration of the decoder and/or the encoder.
A corresponding compression and decompression process would occur in the decoder.
As with the transform, such a process would not be needed in some circumstances, such as when the input data bit depth is low, or when the sub-pixel vector positions are 0 (for example if the first stage of the prediction is a horizontal interpolator, and the horizontal vector has a sub-pixel value of 0, no horizontal prediction is required, leading to a single stage prediction with a cropping on the output, which would not necessarily require any pre-compression techniques). However, when the process is used, the decoder would either infer the type of compression using other information in the stream, such as the type of compression used by the transform, or would decode an indicator from the bit-stream.
Respective features of embodiments of the present disclosure are defined by the following numbered clauses: 1. A method of decoding input image data having a data precision of n bits, the method comprising the steps of: data compressing the input n-bit data values to a data precision lower than n bits; frequency-transforming input frequency-transformed image data to generate an array of output image data by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision lower than n bits; and decompressing the output image data to a data precision of n bits.
2. A method according to clause 1, in which the maximum dynamic range of the transformed data and the transform matrices data precision have a predetermined, fixed value.
3. A method according to clause 2, in which the maximum dynamic range of the transformed data and the coefficients passing through the stages of the transform are equal to 16 bits, and the transform matrices data precision has 6 fractional bits.
4. A method according to any one of the preceding clauses, comprising the steps of: detecting indicator data associated with the input image data; and selecting data compression and decompression functions according to the indicator data.
5. A method according to any one of the preceding clauses, in which the data compressing step applies a bit shift function and the decompressing step applies a bit shift function.
6. A method according to clause 5, in which the data compressing step applies a rounding function before the bit shift function and the decompressing step applies a bit shift function.
7. A method according to any one of clauses 1 to 4, in which the data compressing step and the decompressing step apply complementary mapping functions.
8. A data encoding method for encoding an array of n-bit data values, the method comprising the steps of: data compressing the input n-bit data values to a data precision lower than n bits; frequency-transforming input image data to generate an array of frequency-transformed input image coefficients by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision lower than n bits; and decompressing the image coefficients to a data precision of n bits.
9. A method according to clause 8, comprising the steps of: selecting data compression and decompression functions; and associating indicator data with the encoded data to indicate the selected functions.
10. Computer software which, when executed by a computer, causes the computer to carry out the method of any one of the preceding clauses.
11. A non-transitory machine-readable storage medium by which computer software according to clause 10 is stored.
12. Image data decoding apparatus for decoding input image data having a data precision of n bits, the apparatus comprising: a data compressor configured to compress the input n-bit data values to a data precision lower than n bits; a frequency transformer configured to frequency-transform input frequency-transformed image data to generate an array of output image data by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision lower than n bits; and a data decompressor configured to decompress the output image data to a data precision of n bits.
13. Image data encoding apparatus for encoding an array of n-bit data values, the apparatus comprising: a data compressor configured to compress the input n-bit data values to a data precision lower than n bits; a frequency transformer configured to frequency-transform input image data to generate an array of frequency-transformed input image coefficients by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision lower than n bits; and a data decompressor configured to decompress the image coefficients to a data precision of n bits.
14. Video data capture, transmission, display and/or storage apparatus comprising apparatus according to clause 12 or clause 13.
As discussed earlier, it will be appreciated that apparatus features of the above clauses may be implemented by respective features of the encoder or decoder.

Claims (15)

  CLAIMS
  1. A method of decoding input image data having a data precision of n bits, the method comprising the steps of: data compressing the input n-bit data values to a data precision lower than n bits; frequency-transforming input frequency-transformed image data to generate an array of output image data by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision lower than n bits; and decompressing the output image data to a data precision of n bits.
  2. A method according to claim 1, in which the maximum dynamic range of the transformed data and the transform matrices data precision have a predetermined, fixed value.
  3. A method according to claim 2, in which the maximum dynamic range of the transformed data and the coefficients passing through the stages of the transform are equal to 16 bits, and the transform matrices data precision has 6 fractional bits.
  4. A method according to claim 1, comprising the steps of: detecting indicator data associated with the input image data; and selecting data compression and decompression functions according to the indicator data.
  5. A method according to claim 1, in which the data compressing step applies a bit shift function and the decompressing step applies a bit shift function.
  6. A method according to claim 5, in which the data compressing step applies a rounding function before the bit shift function and the decompressing step applies a bit shift function.
  7. A method according to claim 1, in which the data compressing step and the decompressing step apply complementary mapping functions.
  8. A data encoding method for encoding an array of n-bit data values, the method comprising the steps of: data compressing the input n-bit data values to a data precision lower than n bits; frequency-transforming input image data to generate an array of frequency-transformed input image coefficients by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision lower than n bits; and decompressing the image coefficients to a data precision of n bits.
  9. A method according to claim 8, comprising the steps of: selecting data compression and decompression functions; and associating indicator data with the encoded data to indicate the selected functions.
  10. Computer software which, when executed by a computer, causes the computer to carry out the method of claim 1.
  11. A non-transitory machine-readable storage medium by which computer software according to claim 10 is stored.
  12. Image data decoding apparatus for decoding input image data having a data precision of n bits, the apparatus comprising: a data compressor configured to compress the input n-bit data values to a data precision lower than n bits; a frequency transformer configured to frequency-transform input frequency-transformed image data to generate an array of output image data by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision lower than n bits; and a data decompressor configured to decompress the output image data to a data precision of n bits.
  13. Image data encoding apparatus for encoding an array of n-bit data values, the apparatus comprising: a data compressor configured to compress the input n-bit data values to a data precision lower than n bits; a frequency transformer configured to frequency-transform input image data to generate an array of frequency-transformed input image coefficients by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision lower than n bits; and a data decompressor configured to decompress the image coefficients to a data precision of n bits.
  14. Video data capture, transmission, display and/or storage apparatus comprising apparatus according to claim 12.
  15. Video data capture, transmission, display and/or storage apparatus comprising apparatus according to claim 13.
GB1321523.1A 2013-12-05 2013-12-05 Data encoding and decoding Withdrawn GB2521349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1321523.1A GB2521349A (en) 2013-12-05 2013-12-05 Data encoding and decoding

Publications (2)

Publication Number Publication Date
GB201321523D0 GB201321523D0 (en) 2014-01-22
GB2521349A true GB2521349A (en) 2015-06-24

Family

ID=50000239

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1321523.1A Withdrawn GB2521349A (en) 2013-12-05 2013-12-05 Data encoding and decoding

Country Status (1)

Country Link
GB (1) GB2521349A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080198935A1 (en) * 2007-02-21 2008-08-21 Microsoft Corporation Computational complexity and precision control in transform-based digital media codec
US20110243219A1 (en) * 2010-04-05 2011-10-06 Samsung Electronics Co., Ltd. Method and apparatus for encoding video based on internal bit depth increment, and method and apparatus for decoding video based on internal bit depth increment
WO2011142645A2 (en) * 2010-05-14 2011-11-17 Samsung Electronics Co., Ltd. Method and apparatus for encoding video signal and method and apparatus for decoding video signal
US20120014455A1 (en) * 2010-07-15 2012-01-19 Qualcomm Incorporated Variable Localized Bit-Depth Increase for Fixed-Point Transforms in Video Coding
WO2012157541A1 (en) * 2011-05-19 2012-11-22 ソニー株式会社 Image processing device and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019128716A1 (en) * 2017-12-31 2019-07-04 华为技术有限公司 Image prediction method, apparatus, and codec
CN109996080A (en) * 2017-12-31 2019-07-09 华为技术有限公司 Prediction technique, device and the codec of image
CN109996080B (en) * 2017-12-31 2023-01-06 华为技术有限公司 Image prediction method and device and coder-decoder

Also Published As

Publication number Publication date
GB201321523D0 (en) 2014-01-22

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)