GB2488798A - Video encoding and decoding with improved error resilience - Google Patents

Video encoding and decoding with improved error resilience

Info

Publication number
GB2488798A
Authority
GB
United Kingdom
Prior art keywords
motion vector
predictors
image areas
image
encoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1103925.2A
Other versions
GB2488798B (en)
GB201103925D0 (en)
Inventor
Guillaume Laroche
Patrice Onno
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB201103925A priority Critical patent/GB2488798B/en
Publication of GB201103925D0 publication Critical patent/GB201103925D0/en
Priority to PCT/EP2012/001011 priority patent/WO2012119766A1/en
Publication of GB2488798A publication Critical patent/GB2488798A/en
Application granted granted Critical
Publication of GB2488798B publication Critical patent/GB2488798B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/89 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/164 Feedback from the receiver or from the transmission channel
    • H04N19/166 Feedback from the receiver or from the transmission channel concerning the amount of transmission errors, e.g. bit error rate [BER]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An image area to encode belongs to a first group of image areas that are to be encoded into an encoding unit which can be decoded independently of any other encoding unit. An encoding method comprises determining a first set of motion vector predictors for the image area to encode from motion vectors that are associated with other image areas belonging to the first group of image areas. A second set of motion vector predictors for the image area to encode is determined from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units. The number of predictors in the second set is obtainable by a decoder even if at least one of the further encoding units is lost. A motion vector predictor is selected from among the motion vector predictors of the first and second sets, and information representative of the selected motion vector predictor is encoded in dependence upon the number of motion vector predictors in the first and second sets. Such a method enables the second set of predictors to be used to improve the coding efficiency but, in the event that one of the further encoding units is lost, the number of second-set predictors is still obtainable by the decoder. This avoids parsing problems on the decoder side.

Description

Video encoding and decoding with improved error resilience
Field of the invention
The invention relates to a method and device for encoding a sequence of digital images into a bitstream and a method and device for decoding the corresponding bitstream.
The invention belongs to the field of digital signal processing, and in particular to the field of video compression using motion compensation to reduce spatial and temporal redundancies in video streams.
Description of the prior art
Many video compression formats, for example H.263, H.264, MPEG-1, MPEG-2, MPEG-4 and SVC, use block-based discrete cosine transform (DCT) and motion compensation to remove spatial and temporal redundancies. They can be referred to as predictive video formats. Each frame or image of the video signal is divided into slices which are encoded and can be decoded independently. A slice is typically a rectangular portion of the frame, or more generally, a portion of a frame or an entire frame. Further, each slice is divided into macroblocks (MBs), and each macroblock is further divided into blocks, typically blocks of 8x8 pixels. The encoded frames are of two types: temporally predicted frames (either predicted from one reference frame, called P-frames, or predicted from two reference frames, called B-frames) and non-temporally predicted frames (called Intra frames or I-frames).
Temporal prediction consists in finding in a reference frame, either a frame preceding or following the frame to encode in the video sequence, a reference area which is the closest to the block to encode.
This step is known as motion estimation. Next, the difference between the block to encode and the reference area is encoded (motion compensation), along with an item of motion information relative to the motion vector which indicates the reference area to use for motion compensation.
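Conceptually, the motion estimation step can be sketched as an exhaustive search minimizing the sum of absolute differences (SAD) over a small window. This is an illustrative toy, not an actual encoder's search strategy; the frame representation as nested lists is an assumption for the example.

```python
def best_motion_vector(block, ref, x, y, search_range=2):
    """Find the offset (dx, dy) in the reference frame `ref` that minimizes
    the SAD with `block`, whose top-left corner is at (x, y)."""
    bh, bw = len(block), len(block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            # Skip candidate areas falling outside the reference frame.
            if ry < 0 or rx < 0 or ry + bh > len(ref) or rx + bw > len(ref[0]):
                continue
            sad = sum(abs(block[i][j] - ref[ry + i][rx + j])
                      for i in range(bh) for j in range(bw))
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    return best[1]
```

The returned offset is the motion vector; the residual block (block minus reference area) is what motion compensation then encodes.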
In order to further reduce the cost of encoding motion information, it is known to encode a motion vector by difference with a motion vector predictor, typically computed from the motion vectors of the blocks surrounding the block to encode.
In H.264, motion vectors are encoded with respect to a median predictor computed from the motion vectors of the blocks situated in a causal neighborhood of the block to encode, for example from the blocks situated above and to the left of the block to encode.
Only the difference, also called residual motion vector, between the median predictor and the current block motion vector is encoded.
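The median prediction described above can be sketched as follows (a minimal illustration of the principle; the actual H.264 procedure also handles unavailable neighbours and reference-index mismatches):

```python
def median_predictor(mv_left, mv_top, mv_top_right):
    """Component-wise median of three neighbouring motion vectors."""
    def med3(a, b, c):
        return sorted([a, b, c])[1]
    return (med3(mv_left[0], mv_top[0], mv_top_right[0]),
            med3(mv_left[1], mv_top[1], mv_top_right[1]))

# Only the residual (difference to the predictor) is written to the bitstream.
mv_current = (5, -2)
pred = median_predictor((4, -1), (6, -3), (3, -2))
residual = (mv_current[0] - pred[0], mv_current[1] - pred[1])
```

Here the predictor is (4, -2), so the residual motion vector (1, 0) is encoded instead of the full vector.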
The encoding using residual motion vectors saves some bitrate, but necessitates that the decoder performs the same computation of the motion vector predictor in order to decode the value of the motion vector of a block to decode.
Recently, further improvements have been proposed, such as using a plurality of possible motion vector predictors. Again, motion vector predictors are motion vectors associated with blocks generally close to the block to predict. This method, called motion vector competition, consists in selecting, from several motion vector predictors or candidates, the one that minimizes an encoding cost, typically a rate-distortion (RD) cost, of the residual motion information.
The residual motion information comprises the residual motion vector, i.e. the difference between the actual motion vector of the block to encode and the selected motion vector predictor, and an item of information indicating the selected motion vector predictor, such as for example an encoded value of the index of the selected motion vector predictor.
In High Efficiency Video Coding (HEVC), currently in the course of standardization, a new motion vector prediction scheme called AMVP (Advanced Motion Vector Prediction) has been proposed.
This scheme proposes to use a plurality of motion vector predictors, as schematically illustrated in figure 1a. In the current version of the standard under development, 0 to 3 motion vector predictors can be extracted from a predefined set of blocks in the spatial and temporal neighborhood of the block to encode: a left predictor, a top predictor and a temporal predictor.
The left predictor is extracted from the blocks I, H, C or F depending on the availability of the vector in these blocks. Here a vector is considered as available if this vector exists and if the reference frame index of the associated block is the same as the reference frame index of the block to encode (if the vector predictor and vector of the block to encode point to the same reference frame). The non-existence of a motion vector means that the related block was Intra coded.
The search is performed from bottom to top, starting at the block I and finishing at the block F. The first vector predictor which meets the availability criterion is selected as the left predictor. If no predictor meets the availability criterion, the left predictor is considered unavailable.
The top predictor is extracted from the blocks E, D, C, B or A depending on the same availability criterion. Here the search is performed from right to left, starting at the block E and finishing at the block A. Again, if no predictor meets the availability criterion, the top predictor is considered unavailable.
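The spatial search described above can be sketched as a scan over an ordered candidate list, stopping at the first available vector. The dictionary representation of a block is an assumption made for the example, not the patent's data model:

```python
def first_available_predictor(candidates, current_ref_idx):
    """Scan candidate blocks in order; return the first motion vector that
    exists (the block was not Intra coded) and whose reference frame index
    matches that of the block to encode; None if unavailable."""
    for block in candidates:
        mv = block.get("mv")  # None models a non-existent vector (Intra block)
        if mv is not None and block.get("ref_idx") == current_ref_idx:
            return mv
    return None  # the predictor is considered unavailable

# Left predictor: candidates listed bottom to top; top predictor: right to left.
left = first_available_predictor(
    [{"mv": None}, {"mv": (2, 1), "ref_idx": 1}, {"mv": (3, 0), "ref_idx": 0}],
    current_ref_idx=0)
```

In this toy scan the first block is Intra coded and the second points to a different reference frame, so the third block's vector (3, 0) is selected.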
The temporal predictor is generally extracted from the collocated block. Note that a collocated block is a block situated at the same position as the block to encode but in another image. Two cases are considered for the temporal predictor depending on the acceptable decoding delay: low delay decoding case and hierarchical case.
In the low delay decoding case, the temporal motion predictor comes from the collocated block extracted from the nearest reference frame. However, in the case of B frames, two collocated blocks could be available: one in the temporally closest frame from a list "L0" of reference frames and one in the temporally closest frame from a list "L1" of reference frames. The construction of these lists is left to the implementer, who can decide, for instance, to insert past frames in list "L0" and future frames in list "L1". If both collocated blocks contain a motion vector, the predictor contained in the collocated block of the reference frame which is at the shortest temporal distance from the frame to encode is selected. If both reference frames are at the same temporal distance, the predictor provided by the list "L0" is selected.
The selected predictor is then scaled, if needed, according to the temporal distance between its originating reference frame and the frame to encode. If no collocated predictor exists, the temporal predictor is considered unavailable.
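The scaling step can be illustrated as below. This is only the principle: the vector is stretched by the ratio of temporal distances; the actual HEVC scaling uses fixed-point integer arithmetic with specific rounding, which this sketch does not reproduce.

```python
def scale_mv(mv, dist_collocated, dist_current):
    """Scale a collocated motion vector by the ratio between the temporal
    distance of the current frame to its reference and the temporal distance
    spanned by the collocated vector."""
    if dist_collocated == 0:
        return mv  # nothing to scale
    return (round(mv[0] * dist_current / dist_collocated),
            round(mv[1] * dist_current / dist_collocated))

# A vector spanning 2 frames, reused for a 1-frame prediction, is halved.
scaled = scale_mv((8, -4), dist_collocated=2, dist_current=1)
```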
In the random access case (also called the hierarchical case), which involves more decoding delay, the collocated block is extracted from the future reference frame and contains 2 motion vectors, since this block is bi-predicted. The motion vector which crosses the frame to encode is selected. If both predictors cross the frame to encode, the motion vector which points to the temporally closest reference frame is selected. If both predictors are associated with the same temporal distance, the motion vector from the first list "L0" is selected. The selected predictor is then scaled, if needed, according to the temporal distance between its originating reference frame and the frame to encode. If no collocated predictor exists, the temporal predictor is considered unavailable.
For the low delay case and hierarchical case, when the collocated block is divided into a plurality of partitions (potentially, the collocated block contains a plurality of motion vectors), the partition selected for providing a motion vector predictor is a centered partition.
One can note that the contribution "JCTVC-D125: Improved Advanced Motion Vector Prediction, Jian-Liang Lin, Yu-Pao Tsai, Yu-Wen Huang, Shawmin Lei, 4th JCTVC meeting, Daegu, KR, 20-28 January, 2011, http://phenix.int-evry.fr/jct/" proposed to use multiple temporal predictors instead of only one as in the current version. At the end of this vector predictors' derivation, the set of predictors generated can contain 0, 1, 2 or 3 predictors. If no predictor could be defined during the derivation, the motion vector is not predicted: both vertical and horizontal components are coded without prediction (this corresponds to a prediction by a predictor equal to the zero value). In the current HEVC implementation, the index of the predictor is then equal to 0.
The ordering of the motion vector predictors obtained after the vector predictors' derivation is important to reduce the overhead of signaling the best motion vector predictor in the set. The ordering in the set is adapted depending on the prediction mode of the block to encode, so as to position the most probable motion vector predictor in the first position. Indeed, in that case a minimum overhead occurs if the first candidate is chosen as the best predictor. In the current implementation of HEVC, the temporal predictor is in the first position.
The overhead of signaling the index of the best predictor can be further reduced by a suppression process which minimizes the number of predictors in the set. Duplicated motion vectors are simply removed from the set.
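The suppression process amounts to order-preserving deduplication of the predictor set, as in this minimal sketch (motion vectors modelled as tuples, an assumed representation):

```python
def suppress_duplicates(predictors):
    """Remove duplicated motion vectors from the predictor set, keeping the
    first occurrence of each and preserving the set's ordering."""
    seen, reduced = set(), []
    for mv in predictors:
        if mv not in seen:
            seen.add(mv)
            reduced.append(mv)
    return reduced
```

A smaller reduced set means fewer (possibly zero) bits are needed to signal the selected predictor's index.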
A particular case arises with the merge mode. The Merge mode is a particular Inter coding mode, similar to the usual Skip mode well known to persons skilled in the art. The main difference compared to the usual Skip mode is that the Merge mode propagates the value of the reference frame index, the direction (bi-directional or uni-directional) and the list (for the uni-directional direction) of the motion vector predictor to the predicted block. The Merge mode uses a motion vector predictor and its reference frame index, unless the predictor is a temporal predictor, in which case the reference frame considered is always the closest preceding reference frame, also called Ref0 (and always bi-prediction for B frames). So the block predictors (the copied blocks) come from the reference frames pointed to by the motion vector predictors.
In the case of the merge mode, the suppression process takes into account the values of the motion vector and its reference frame. So the two components of the motion vector and its reference index are compared to each other, and only if these three values are equal is the predictor removed from the set. For B frames, this criterion is extended to the direction and the lists. So, predictors are considered as duplicated predictors if they use the same direction, the same lists (L0, L1, or L0 and L1), the same reference frame indexes and have the same motion vector components (MV_L0 and MV_L1 for bi-prediction).
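The extended merge-mode criterion can be sketched as deduplication on a composite key; the dictionary fields are an assumed representation of a candidate's motion information, not the patent's:

```python
def merge_suppress(candidates):
    """Merge-mode suppression: two candidates are duplicates only if the
    direction, the list(s), the reference frame index and all motion vector
    components match; a shared motion vector alone is not enough."""
    reduced, seen = [], set()
    for c in candidates:
        key = (c["direction"], c["lists"], c["ref_idx"], c["mv"])
        if key not in seen:
            seen.add(key)
            reduced.append(c)
    return reduced
```

For example, two candidates with the same vector but different reference frame indexes are both kept, whereas the plain suppression of the previous paragraph would have merged them.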
In AMVP, the index signaling depends on the result of the motion vector predictor suppression process described above. Indeed, the number of bits allocated to the signaling depends on the number of motion vectors remaining after the suppression. For instance, if at the end of this algorithm, only one motion vector remains, no overhead is required to signal the motion vector predictor index, since the index can be easily retrieved by the decoder. The following Table 1 gives the codeword for each index coding according to the number of predictors after the suppression process.
Codeword when the number of predictors in the set is N:

Index   N=1          N=2   N=3   N=4   N=5
0       (inferred)   0     0     0     0
1       -            1     10    10    10
2       -            -     11    110   110
3       -            -     -     111   1110
4       -            -     -     -     1111

Table 1: codewords for the motion vector predictor index according to the number N of predictors after the suppression process.
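The codewords of Table 1 follow a truncated unary pattern: the index is written as that many "1" bits, followed by a terminating "0" unless the index is the last possible one, and no bits at all when a single predictor remains. A sketch of this index-to-codeword conversion:

```python
def index_codeword(index, num_predictors):
    """Truncated unary coding of a predictor index, as in Table 1.
    With a single predictor, the index is inferred and no bits are written."""
    assert 0 <= index < num_predictors
    code = "1" * index
    if index < num_predictors - 1:
        code += "0"  # terminating bit, dropped for the last index
    return code
```

For instance, with 4 predictors, index 2 is coded "110" and index 3 is coded "111", matching the N=4 column of Table 1.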
Note that in the following, to facilitate the reading of the figures, we prefix a reference by the letter S when it represents a step, and by the letter D when it represents some data.
Figure 3 illustrates a block diagram of a video encoder in which AMVP is implemented (in step S317). The encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by a processor, a corresponding step of a method implementing a video encoding process involving AMVP.
An original sequence of digital images i0 to in D301 is received as an input to an encoder 30. Each digital image is represented by a set of samples, known as pixels.
A bitstream D310 is output by the encoder 30.
The bitstream D310 comprises a plurality of encoding units or slices, each slice comprising a slice header for encoding values of encoding parameters used to encode the slice and a slice body, comprising encoded video data.
The input digital images are divided into blocks in step S302; these blocks are image areas and may be of variable sizes (e.g. 4x4, 8x8, 16x16, 32x32). A coding mode is selected for each input block. There are two families of coding modes: spatial prediction coding, or Intra coding, and temporal prediction coding, or Inter coding. The possible coding modes are tested.
Step S303 implements Intra prediction, in which the given block to encode is predicted by a predictor computed from pixels of the neighborhood of said block to encode. An indication of the Intra predictor selected and the difference between the given block and its predictor is encoded if the Intra coding is selected.
Temporal prediction is implemented by steps S304 and S305.
Firstly a reference image among a set of reference images D316 is selected, and an area of the reference image, also called a reference area, which is the closest area to the given block to encode, is selected by the motion estimation module S304. The difference between the selected reference area and the given block, also called a residual block, is computed by the motion compensation module S305. The selected reference area is indicated by a motion vector.
Information relative to the motion vector and the residual block is encoded if the Inter prediction is selected. To further reduce the bitrate, the motion vector is encoded by difference with respect to a motion vector predictor. A set of motion vector predictors, also called motion information predictors, is obtained from the motion vectors field D318 by the motion vector prediction and coding module S317.
The index of the selected motion vector predictor, which is an item of information representative of the selected motion vector predictor, is encoded.
The encoder 30 further comprises a module S306 for selection of the coding mode, which uses an encoding cost criterion, such as a rate-distortion criterion, to determine which is the best mode among the spatial prediction mode and the temporal prediction mode. A transform S307 is applied to the residual block, and the transformed data obtained is then quantized by module S308 and entropy encoded by module S309. Finally, the encoded residual block of the current block to encode is inserted in the bitstream D310, along with the information relative to the predictor used. For the blocks encoded in 'SKIP' mode, only a reference to the predictor is encoded in the bitstream, without any residual block.
The encoder 30 further performs the decoding of the encoded image in order to produce a reference image for the motion estimation of the subsequent images. The module S311 performs inverse quantization of the quantized data, followed by an inverse transform S312. The reverse motion prediction module S314 uses the prediction information to determine which predictor to use for a given block, and performs the reverse motion compensation by actually adding the residual obtained from module S312 to the reference area obtained from the set of reference images D316. Optionally, a deblocking filter S315 is applied to remove the blocking effects and enhance the visual quality of the decoded image. The same deblocking filter is applied at the decoder, so that, if there is no transmission loss, the encoder and the decoder apply the same processing.
Figure 1b shows the current implementation of the AMVP scheme in HEVC at the encoder side, which corresponds to module S317 in figure 3 when AMVP is used.
In figure 1b the value of each stored encoded vector, also called the motion vector field, is denoted by module D1b01, which corresponds to the motion vectors field D318 in figure 3. As a consequence, the neighboring motion vectors used for the prediction can be extracted from the motion vectors field D1b01. Step S1b12 in figure 1b corresponds to the entropy encoding step S309 in figure 3.
At step S1b03 the motion vector predictors derivation process generates the motion vector predictors set D1b04, taking into account the current reference frame index D1b13. Then the suppression step S1b05 is applied, as described above. Note that with this process the value of the motion vector to be encoded needs to be known. This step S1b05 produces a reduced motion vector predictors set D1b06. The step S1b07 of RD (rate-distortion) selection of the best motion vector predictor is applied to the reduced motion vector predictors set D1b06 with respect to the motion vector to be encoded D1b02. This generates a motion vector predictor index D1b08 (if there is one) depending on the selected motion vector predictor D1b09. Then, the motion vector D1b02 is predicted in step S1b10, which produces a motion vector residual D1b11. This motion vector residual is then encoded in step S1b12 using entropy coding. The motion vector predictor index D1b08 is converted in step S1b14 into a codeword D1b15 according to the number of predictors D1b16 in the reduced motion vector predictors set D1b06, as depicted in Table 1 above.
As described above, if this set contains only one predictor, no index is transmitted to the decoder side. This codeword, if it exists, is then entropy coded in step S1b12.
Figure 4 illustrates a flow chart of a video decoder implementing AMVP. The decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by a processor, a corresponding step of a method implementing a video decoding process involving AMVP.
The decoder 40 receives a bitstream D401 comprising encoding units or slices, each one being composed of a header containing information on encoding parameters and a body containing the encoded video data. The received encoded video data is entropy decoded during step S402, dequantized in step S403, and then a reverse transform is applied in step S404.
In particular, when the received encoded video data corresponds to a residual block of a current block to decode, the decoder also decodes motion prediction information from the bitstream, so as to find the reference area used by the encoder.
The step S410 applies the motion vector decoding for each current block encoded by motion prediction, comprising determining the number N of motion vector predictors considered by the encoder and retrieving the motion vector predictor index encoded on a number of bits dependent on N. If the bitstream is received without losses, the decoder generates exactly the same set of motion vector predictors as the encoder. In case of losses, it may not be possible to generate the set of motion vector predictors and therefore to correctly decode the motion vector associated with the current block. In the current implementation of AMVP in HEVC, correct parsing of the bitstream is not ensured in case of errors.
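A toy illustration of this parsing hazard, assuming the truncated-unary index coding of Table 1: if, after a loss, the decoder derives a different predictor count N than the encoder used, it consumes the wrong number of bits for the index and every subsequent syntax element is read from the wrong position.

```python
def bits_consumed(index, n):
    """Number of bits a truncated-unary codeword occupies for `index`
    out of `n` predictors (0 bits when n == 1, index inferred)."""
    return index + (1 if index < n - 1 else 0)

# Encoder signalled index 1 out of N=3 predictors: codeword "10" (2 bits).
# A decoder that derives N=2 after a loss reads only 1 bit for index 1,
# leaving the bitstream misaligned from that point on.
encoder_bits = bits_consumed(1, 3)  # 2
decoder_bits = bits_consumed(1, 2)  # 1
```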
Once the index of the motion vector predictor for the current block has been obtained, if no losses have occurred, the actual value of the motion vector associated with the current block can be decoded. The reference area indicated by the decoded motion vector is extracted from a reference image during step S408, in order finally to apply the reverse motion compensation in step S406.
In case an Intra prediction has been applied, an inverse Intra prediction is applied in step S405.
Finally, a decoded block is obtained. A deblocking filter is applied during step S407, similarly to the deblocking filter of step S315 applied at the encoder. A decoded video signal D409 is finally provided by the decoder 40.
Figure 1c shows in detail the flow chart of the AMVP scheme at the decoder side. In this flow chart, reference D1c01 is the same as D411 in figure 4, reference D1c05 is the same as reference D401, and step S1c06 is the same as step S402. This flow chart shows the algorithm applied in step S410 when the AMVP scheme is used.
Similarly to the encoder, in step S1c02 the decoder applies the motion vector predictors' derivation and the suppression process to generate the motion vector predictors set D1c03, based on the motion vectors field D1c01 of the current frame and of the previous frames. The motion vector residual D1c07 is extracted from the bitstream D1c05 and decoded by the entropy decoder in step S1c06. The number of predictors in the reduced set D1c16 is then used in the entropy decoding step S1c06 to extract (if needed) the motion vector predictor codeword D1c14. This codeword (if it exists) is converted during step S1c15, according to the value of D1c16, into a predictor index D1c09. The motion vector predictor D1c10 is then extracted from the reduced set D1c08 according to this predictor index value D1c09. The motion vector predictor is added to the motion residual in step S1c11 in order to produce the decoded motion vector D1c12.
As can be seen, the decoder needs to be able to apply the same derivation and suppression as the encoder to determine the set of possible motion vector predictors considered by the encoder. Indeed, otherwise, it cannot deduce the amount of bits used for indicating the selected motion vector predictor and decode the index of the motion vector predictor and finally decode the motion vector using the motion vector residual received.
As a consequence, in case of errors in a reference frame containing a motion vector predictor used in AMVP (due to a channel error, for instance, in a video streaming application), there is a high risk of the decoder suffering de-synchronization. Referring to the example of figure 1a, if frame N-1 is lost, one cannot determine the existence or the value of the motion vector predictor contained in the collocated block. As a consequence the derivation and the suppression process cannot be applied identically on the decoder side as on the encoder side. This drift between the encoder and the decoder processes induces an uncertainty in the number of bits used for encoding the motion vector predictor index, and so it is impossible to parse the bitstream correctly.
Such an error may propagate causing the decoder's de-synchronization until a following synchronization image, encoded without prediction, is received by the decoder.
Therefore, the fact that the number of bits used for signaling the motion vector predictors depends on the type of the collocated block and on the values taken by the motion vector predictors makes the method very vulnerable to transmission errors, when the bitstream is transmitted to a decoder on a lossy communication network. It would be desirable to at least be able to parse an encoded bitstream at a decoder even in case of packet losses, so that some re-synchronization or error concealment can be subsequently applied.
It was proposed, in the document "JCTVC-C166r1, TE11: Study on motion vector coding (experiment 3.3a and 3.3c)" by K. Sato, published at the 3rd meeting of the Joint Collaborative Team on Video Coding (JCT-VC), Guangzhou, 7-15 October 2010, to use only the spatial motion vector predictors coming from the same slice in the predictor set. This solution solves the problem of parsing at the decoder in case of slice losses. However, the coding efficiency is significantly decreased, since the temporal motion vector predictor is no longer used. Therefore, this solution is not satisfactory in terms of compression performance.
A second solution is to encode the predictor with a unary code. The unary code consists in signaling the predictor index in a way that does not depend on the size of the set of predictors. Generally, the encoder writes several successive "1" corresponding to the number of the predictor and the final bit is always "0". The following Table 2 gives an example of the unary code with 3 possible indexes. With this coding, a bit is always written in the bitstream even if no predictor or one predictor was in the set after the suppression process. Consequently, one bit at minimum is sent to the decoder to signal the predictor index.
This is not the case with the usual index coding.
Index   Unary Codeword
0       0
1       10
2       110

Table 2: unary codewords for the predictor index.
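The unary code of Table 2 can be sketched in one line: the index determines the number of "1" bits, and a final "0" is always written, independently of the size of the predictor set.

```python
def unary_codeword(index):
    """Unary coding of a predictor index, as in Table 2: `index` ones
    followed by a terminating zero, regardless of the set size."""
    return "1" * index + "0"
```

Because the code does not depend on the number of predictors, the decoder can always parse it, but at least one bit is spent even when the set contains zero or one predictor.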
In the JCTVC contribution "JCTVC-D197, proposition for robust parsing with temporal predictor, J. Jung, G. Clare, 4th JCTVC meeting: Daegu, KR, 20-28 January, 2011, http://phenix.int-evry.fr/jct/" it was considered that the maximum number of available predictors can be known by the decoder, because the decoder can receive it in the slice header or infer this information. Consequently, the last "0" bit does not need to be written when the index of the predictor is the last predictor, as depicted in the following Table 3:

Index   Unary Max Codeword
0       0
1       10
2       11

Table 3: unary max codewords for the predictor index.
So at the maximum, when 3 predictors are used, only 2 bits need to be sent. This truncation has only a small impact on the coding efficiency, because the probability that all predictors exist, are all different, and that the third predictor is selected is very low. This special coding is known as the "unary max code".
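The truncation of the last "0" bit can be sketched as follows (an illustrative example under the assumption, stated above, that the decoder already knows the maximum number of predictors):

```python
def unary_max_encode(index, max_index):
    """Unary max code: when the index equals the last possible
    predictor (max_index), the terminating "0" is omitted because
    the decoder already knows the maximum number of predictors."""
    if index == max_index:
        return "1" * index
    return "1" * index + "0"

def unary_max_decode(bits, max_index):
    """Count "1" bits, but stop once max_index ones have been read:
    in that case no terminating "0" follows in the bitstream."""
    index = 0
    while index < max_index and bits[index] == "1":
        index += 1
    return index

# Reproduces Table 3 for 3 predictors (max_index = 2):
print([unary_max_encode(i, 2) for i in range(3)])
```

The decoder can only apply `unary_max_decode` if `max_index` is recoverable even after a loss, which is exactly the parsing issue the invention addresses.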
Figure 5 shows the flow chart of the encoding when the unary max code is used. Figure 5 is the same as Figure 1b, except that the step S614 receives the maximum possible number of predictors D616. In that case, the number of predictors in the reduced set D1b16 is not used for the conversion of the index to a codeword in S614. This number can be determined if transmitted in the slice header, or inferred if this maximum number is always the same.
Figure 6 shows a flow chart of the decoding when the unary max code is used. Figure 6 is the same as Figure 1c, except that the step S715 knows or receives the maximum possible number of motion vector predictors D716 that the encoder can use. So the number of predictors in the reduced set D1c16 of figure 1c is used neither for the entropy decoding in step S706 nor for the conversion of the codeword to an index in step S715. Only the maximum possible number is used.
Encoding the maximum number of predictors in the slice header is efficient when all blocks in a slice have the same number.
Generally, this is not the case and this solution reduces the compression efficiency compared to a scheme adapting the maximum number at the CU level.
Some alternative solutions already proposed in HEVC, which infer the maximum number, suffer from the same parsing problem in case of errors as mentioned above.
SUMMARY OF THE INVENTION
It is desirable to address one or more of the prior art drawbacks. It is also desirable to provide a method allowing correct parsing at the decoder even in the case of a bitstream corrupted by transmission losses while keeping good compression efficiency.
To that end and according to a first aspect of the present invention there is provided a method of encoding a sequence of digital images in which at least one area of an image is encoded by motion compensation with respect to at least one reference image area belonging to at least one reference image, wherein, for an image area to encode belonging to a first group of image areas that are to be encoded into an encoding unit which can be decoded independently of any other encoding unit, the method comprises:
- determining a first set of motion vector predictors from motion vectors that are associated with other image areas belonging to said first group of image areas,
- determining a second set of motion vector predictors from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units,
- selecting a motion vector predictor from among the motion vector predictors of the first and second sets, and
- encoding information representative of the selected motion vector predictor in dependence upon the number of motion vector predictors in the first and second sets.
Advantageously, the method of the invention allows the systematic determination of a target number of motion vector predictors to be used for encoding the motion information, such as a motion vector, associated with the image area to encode. This feature makes it possible to avoid any parsing issue on the decoder side when an error occurs.
In addition, the compression is advantageously improved since this number of motion vector predictors is adapted at image area level.
In one embodiment the number of predictors in said second set is obtainable by a decoder even if at least one said further encoding unit is lost.
In another embodiment the number of predictors in said second set is independent of the information contained in said one or more further encoding units. We thus ensure that the number of motion vector predictors for a received image area is systematically obtained by a decoder, even in case of loss of said one or more further encoding units.
In another embodiment the number of predictors in said second set is obtainable by a decoder without using the information contained in the or each said further encoding unit.
In another embodiment the number of predictors in said second set is obtainable by a decoder from information contained in the encoding unit in which said image area to encode is encoded.
In another embodiment the number of predictors in said second set is obtainable by a decoder exclusively from information contained in the encoding unit in which said image area to encode is encoded.
In another embodiment the number of predictors in said second set is obtainable by the decoder using at least one piece of information already contained in the encoding unit in which said image area to encode is encoded for a purpose other than obtaining the number of predictors in said second set. This avoids any overhead for obtaining the number of predictors in said second set.
Advantageously, the information from which the decoder is able to obtain the number of predictors in said second set includes at least one of the following pieces of information:
- index of the reference image;
- indexes of one or more reference images used for the encoding of the other image areas of the group of image areas;
- number of motion vector predictors in the first set;
- encoding mode of the image area to encode;
- size of the image area to encode;
- encoding mode of already encoded image areas in said group of image areas; and/or
- size of already encoded image areas in said group of image areas.
Information that can be used is not limited to the above and can be related to any encoding parameter, such as for example, if the images are divided into variable size macroblocks for processing, the size of the macroblock to which the image area or block to encode belongs.
Information that can be used can also extend to pieces of information contained in the encoding unit in which said image area to encode is encoded for the exclusive purpose of obtaining the number of predictors in said second set, such as for example a dedicated data field.
Any combination of pieces of information can also be considered such as for example the number of motion vector predictors in the first set in combination with the size of the image area to encode.
In an embodiment, if the number of motion vector predictors in the first set is below a threshold, the number of predictors in said second set is set to a predetermined value. This further avoids any overhead for obtaining the number of predictors in said second set.
More generally, the number of predictors in said second set can be obtained from said information following a predetermined rule.
This makes it possible to adapt the number of predictors in said second set to the coding conditions of the image area to encode. This may lead to different numbers of predictors in the second sets corresponding to different image areas to encode.
In another embodiment the number of predictors in said second set is obtainable by a decoder from information contained in the encoding unit in which said image area to encode is encoded and from another source of information. The information from that other source is preferably independent of the information contained in said one or more further encoding units.
In another embodiment the number of predictors in the second set is independent of information that can only be derived from the or each said further encoding unit.
In some embodiments the number of predictors in the second set is permitted to be zero, at least on some occasions, for example as may be appropriate for the coding conditions. In this case, of course, the second set is empty and there is no need to take account of motion vectors that are associated with image areas belonging to the further group(s) of image areas when carrying out the selection of the motion vector predictor and the encoding of the selected motion vector predictor.
In another embodiment the number of predictors in said second set is a predetermined number. This embodiment reduces the complexity of determining said number.
According to an embodiment, the information representative of the selected motion vector predictor is determined by ordering the motion vector predictors of the first and second sets and indexing the motion vector predictors sequentially according to the established order. Thus the information representative of the selected motion vector predictor to be encoded is the index assigned to said selected motion vector predictor. The index is then encoded, by means of entropy coding for example, in dependence upon the number of motion vector predictors in the first and second sets, this latter information being available to the decoder.
Advantageously, motion vector predictor duplicates are indexed only once. This leads to better coding efficiency (fewer bits in the encoded index).
Advantageously, indexing motion vector predictor duplicates only once, which is similar to suppressing motion vector predictor duplicates, further increases the compression performance of the encoder. In addition, this process does not prevent parsing of the bitstream representing the sequence of digital images, since it is performed after obtaining the number of motion vector predictors in the first and second sets used at the encoding step.
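The ordering and single indexing of duplicates can be sketched as follows (a minimal illustration; motion vectors are modelled as (x, y) tuples and the function name is chosen for this example, not taken from the HEVC derivation):

```python
def index_predictors(set1, set2):
    """Order the predictors of the first and second sets and assign
    an index to each distinct motion vector only once: a duplicate
    appearing later in the order is skipped rather than re-indexed.
    Note: the codeword length used for parsing is still based on the
    total number of predictors in the two sets, obtained beforehand,
    so suppression here does not affect parsability."""
    ordered = []
    for mv in list(set1) + list(set2):
        if mv not in ordered:  # a duplicate is indexed only once
            ordered.append(mv)
    return {mv: i for i, mv in enumerate(ordered)}

# Example: the predictor (3, 1) from set 2 duplicates a predictor
# of set 1, so only two indexes are needed instead of three.
table = index_predictors([(3, 1), (0, 2)], [(3, 1)])
print(table)
```

With fewer distinct indexes, the selected predictor can often be signaled with a shorter codeword, which is the coding-efficiency gain mentioned above.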
Advantageously, the motion vector predictors of the second set are determined from motion vectors associated with an image area having the same size as the image area to encode and positioned, in the reference image, at the same position as the image area to encode.
According to an embodiment, said determined second set includes motion vector predictors generated by default if the number of motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units is less than a target number.
According to the first aspect of the present invention a motion vector of the image area to encode is encoded with respect to the selected motion vector predictor which results in a residual motion vector.
According to a second aspect of the present invention there is provided a method of encoding a sequence of digital images in which at least one area of an image is encoded by motion compensation with respect to at least one reference image area belonging to at least one reference image, wherein, for an image area to encode belonging to a first group of image areas that are to be encoded into an encoding unit which can be decoded independently of any other encoding unit, the method comprises:
- determining a first set of motion vector predictors from motion vectors that are associated with other image areas belonging to said first group of image areas,
- obtaining a number of motion vector predictors to be determined for forming a second set from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units,
- selecting a motion vector predictor from among the motion vector predictors of the first set and the second set formed according to the obtained number of motion vector predictors, if any, and
- encoding information representative of the selected motion vector predictor in dependence upon the number of motion vector predictors in the first and second sets.
In one embodiment the obtained number is obtainable by a decoder even if at least one said further encoding unit is lost.
In another embodiment the obtained number is independent of the information contained in said one or more further encoding units.
In another embodiment the obtained number is obtainable by a decoder without using the information contained in the or each said further encoding unit.
In another embodiment the obtained number is obtainable by a decoder from information contained in the encoding unit in which said image area to encode is encoded.
In another embodiment the obtained number is obtainable by a decoder exclusively from information contained in the encoding unit in which said image area to encode is encoded.
In another embodiment the obtained number is obtainable by the decoder using at least one piece of information already contained in the encoding unit in which said image area to encode is encoded for a purpose other than obtaining said number. This avoids any overhead for obtaining the number of motion vector predictors that would form the second set.
Advantageously, the information from which the decoder is able to obtain the number of predictors to form said second set includes at least one of the following pieces of information:
- index of the reference image;
- indexes of one or more reference images used for the encoding of the other image areas of the group of image areas;
- number of motion vector predictors in the first set;
- encoding mode of the image area to encode;
- size of the image area to encode;
- encoding mode of already encoded image areas in said group of image areas; and/or
- size of already encoded image areas in said group of image areas.
Information that can be used is not limited to the above and can be related to any encoding parameter, such as for example, if the images are divided into variable size macroblocks for processing, the size of the macroblock to which the image area or block to encode belongs.
Information that can be used can also extend to pieces of information contained in the encoding unit in which said image area to encode is encoded for the exclusive purpose of obtaining the number of predictors to form said second set, such as for example a dedicated data field.
Any combination of pieces of information can also be considered such as for example the number of motion vector predictors in the first set in combination with the size of the image area to encode.
In an embodiment, if the number of motion vector predictors in the first set is below a threshold, the obtained number is set to a predetermined value. More generally, the number of predictors forming said second set can be obtained from said information following a predetermined rule. This makes it possible to adapt the number of predictors in said second set to the coding conditions of the image area to encode. This may lead to different numbers of predictors in the second sets corresponding to different image areas to encode.
In another embodiment the obtained number is obtainable by a decoder from information contained in the encoding unit in which said image area to encode is encoded and from another source of information. The information from that other source is preferably independent of the information contained in said one or more further encoding units.
In another embodiment the number of predictors in the second set is independent of information that can only be derived from the or each said further encoding unit.
In some embodiments the obtained number is permitted to be zero, at least on some occasions, for example as may be appropriate for the coding conditions. In this case, of course, the second set is empty and there is no need to take account of motion vectors that are associated with image areas belonging to the further group(s) of image areas when carrying out the selection of the motion vector predictor and the encoding of the selected motion vector predictor.
In another embodiment the number of predictors forming said second set is a predetermined number.
In one embodiment, if the number of motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units is less than the obtained number, default motion vector predictors are generated for forming a second set having said obtained number of motion vector predictors.
According to a third aspect of the present invention there is provided a method of encoding a sequence of digital images into a bitstream according to a predictive video format adapted to encode a motion vector of an image area to encode relative to a motion vector predictor, said image area to encode belonging to a first group of image areas that are to be encoded into an encoding unit which can be decoded independently of any other encoding unit, the method being characterized in that it comprises the steps of:
- determining a number of motion vector predictors from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units, wherein said number is obtainable by a decoder even if at least one said further encoding unit is lost; and
- encoding information representative of the motion vector predictor, relative to which the motion vector of said image area is encoded, in dependence upon the determined number of motion vector predictors.
According to a fourth aspect of the present invention there is provided a method of decoding a bitstream representative of a sequence of digital images in which at least one area of an image is encoded by motion compensation with respect to at least one reference image area belonging to at least one reference image, wherein, for an image area to decode belonging to a first group of image areas that are encoded into a received encoding unit which can be decoded independently of any other encoding unit, the method comprises:
- determining a first set of motion vector predictors from motion vectors that are associated with other image areas belonging to said first group of image areas,
- determining a second set of motion vector predictors from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units, and
- decoding information representative of a selected motion vector predictor from among the motion vector predictors of the first and second sets in dependence upon the number of motion vector predictors in the first and second sets.
In one embodiment, the number of predictors, if any, in said second set is obtainable even if at least one said further encoding unit is lost.
In another embodiment the selected motion vector predictor, along with a residual motion vector obtained from the received encoding unit, is used to decode a motion vector of the image area to decode.
According to a fifth aspect of the present invention there is provided a method of decoding a bitstream representative of a sequence of digital images according to a predictive video format adapted to decode a motion vector of an image area to decode relative to a motion vector predictor, said image area belonging to a first group of image areas to be decoded from an encoding unit which can be decoded independently of any other encoding unit, the method being characterized in that it comprises the steps of:
- determining a number of motion vector predictors from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units, wherein said number is obtainable even if at least one said further encoding unit is lost; and
- decoding information representative of the motion vector predictor, relative to which the motion vector of said image area is to be decoded, in dependence upon the determined number of motion vector predictors.
According to a sixth aspect of the present invention there is provided a device for encoding a sequence of digital images in which at least one area of an image is encoded by motion compensation with respect to at least one reference image area belonging to at least one reference image, wherein, for an image area to encode belonging to a first group of image areas that are to be encoded into an encoding unit which can be decoded independently of any other encoding unit, the device comprises:
- means for determining a first set of motion vector predictors from motion vectors that are associated with other image areas belonging to said first group of image areas,
- means for determining a second set of motion vector predictors from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units, wherein the number of predictors, if any, in said second set is obtainable by a decoder even if at least one said further encoding unit is lost,
- means for selecting a motion vector predictor from among the motion vector predictors of the first and second sets, and
- means for encoding information representative of the selected motion vector predictor in dependence upon the number of motion vector predictors in the first and second sets.
According to a seventh aspect of the present invention there is provided a device for decoding a bitstream representative of a sequence of digital images in which at least one area of an image is encoded by motion compensation with respect to at least one reference image area belonging to at least one reference image, wherein, for an image area to decode belonging to a first group of image areas that are encoded into a received encoding unit which can be decoded independently of any other encoding unit, the device comprises:
- means for determining a first set of motion vector predictors from motion vectors that are associated with other image areas belonging to said first group of image areas,
- means for determining a second set of motion vector predictors from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units, wherein the number of predictors, if any, in said second set is obtainable by said device even if at least one said further encoding unit is lost, and
- means for decoding information representative of a selected motion vector predictor from among the motion vector predictors of the first and second sets in dependence upon the number of motion vector predictors in the first and second sets.
According to yet another aspect, the invention also relates to a computer program product that can be loaded into a programmable apparatus, comprising sequences of instructions for implementing a method of encoding a sequence of digital images or decoding a bitstream as briefly described above, when the program is loaded into and executed by the programmable apparatus. Such a computer program may be transitory or non-transitory. In an implementation, the computer program can be stored on a non-transitory computer-readable carrier medium.
In all of the above embodiments, an encoding unit can be preferably chosen equal to a slice and an image area to encode equal to an image block.
The particular characteristics and advantages of the devices for encoding a sequence of digital images and for decoding a bitstream being similar to those of the method of encoding a sequence of digital images and the method of decoding a bitstream, they are not repeated here.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages will appear in the following description, which is given solely by way of non-limiting example and made with reference to the accompanying drawings, in which:
- Figure 1a illustrates schematically a set of motion vector predictors used in a motion vector prediction scheme.
- Figure 1b illustrates an implementation of the AMVP scheme in HEVC on the encoder side.
- Figure 1c illustrates the current implementation of the AMVP scheme in HEVC on the decoder side.
-Figure 2 is a diagram of a processing device adapted to implement an embodiment of the present invention.
-Figure 3 is a block diagram of an implementation of a HEVC encoder.
-Figure 4 is a block diagram of an implementation of a HEVC decoder.
- Figure 5 illustrates a particular implementation of the AMVP scheme on the encoder side, where the maximum possible number of motion vector predictors is considered as given.
- Figure 6 illustrates a particular implementation of the AMVP scheme on the decoder side, where the maximum possible number of motion vector predictors is considered as given.
-Figure 7 illustrates schematically the motion vector predictor derivation process on the encoder side according to an embodiment of the invention.
-Figure 8 illustrates schematically the motion vector predictor derivation process on the decoder side according to an embodiment of the invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
Figure 2 illustrates a diagram of a processing device 1000 adapted to implement one embodiment of the present invention. The apparatus 1000 is for example a micro-computer, a workstation or a light portable device.
The apparatus 1000 comprises a communication bus 1113 to which there are preferably connected:
- a central processing unit 1111, such as a microprocessor, denoted CPU;
- a read only memory 1107, denoted ROM, able to contain computer programs for implementing the invention;
- a random access memory 1112, denoted RAM, able to contain the executable code of the method of the invention as well as the registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream; and
- a communication interface 1102 connected to a communication network 1103 over which digital data to be processed are transmitted.
Optionally, the apparatus 1000 may also have the following components:
- a data storage means 1104 such as a hard disk, able to contain the programs implementing the invention and data used or produced during the implementation of the invention;
- a disk drive 1105 for a disk 1106, the disk drive being adapted to read data from the disk 1106 or to write data onto said disk;
- a screen 1109 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 1110 or any other pointing means.
The apparatus 1000 can be connected to various peripherals, such as for example a digital camera 1100 or a microphone 1108, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 1000.
The communication bus affords communication and interoperability between the various elements included in the apparatus 1000 or connected to it. The representation of the bus is not limiting and in particular the central processing unit is able to communicate instructions to any element of the apparatus 1000 directly or by means of another element of the apparatus 1000.
The disk 1106 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to the invention to be implemented.
The executable code may be stored either in read only memory 1107, on the hard disk 1104 or on a removable digital medium such as for example a disk 1106 as described previously. According to a variant, the executable code of the programs can be received by means of the communication network 1103, via the interface 1102, in order to be stored in one of the storage means of the apparatus 1000 before being executed, such as the hard disk 1104.
The central processing unit 1111 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means. On powering up, the program or programs that are stored in a non-volatile memory, for example on the hard disk 1104 or in the read only memory 1107, are transferred into the random access memory 1112, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention.
In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).
The invention could be implemented in the encoder described by figure 3 for the encoder part and in the decoder described by figure 4 for the parts related to the decoder. For the encoder part we replace in step S317 the AMVP version currently implemented in HEVC and described by figure 1b by a version according to the invention and represented by figure 7. For the decoder part we replace in step S410 the AMVP version currently implemented in HEVC and described in figure 1c by a version according to the invention and represented by figure 8.
Figure 7 shows the flow chart describing the motion vector predictor derivation process. This figure is based on Figure 1b. The main differences are the derivation process of the predictors and the information used to convert the index into a codeword. In this embodiment, two derivation processes are used instead of one: one for the predictors which come from the same slice as the block to encode, and one for the motion vector predictors which come from another slice (spatial or temporal). We will see later that the specific criteria used to determine the number of predictors in the set of motion vector predictors which come from another slice depend only on information associated with the current slice. Consequently, the number of motion vector predictors considered by the encoder in set D820 is known by the decoder even if the other slice is lost or unavailable. The conversion of the index into a codeword depends on this number of motion vector predictors instead of depending on the number of motion vector predictors of the reduced set (after suppression), because the number of predictors in the reduced set cannot be determined at the decoder side if some relevant other slices are lost.
The module D801 contains the motion vectors field and all other data related to the blocks of the current slice already coded.
The derivation process of the motion vector predictors of the current slice in step S803 receives this motion vector field as well as the reference frame index of the motion vector of the block to be encoded, and generates the related motion vector predictors set 1 D816. The same derivation algorithm, but without suppression, implemented in the current version of AMVP and described above with reference to figure 1a could be used to derive the number of motion vector predictors in set 1 as well as the values of the motion vector predictors. So, this derivation process can still take into account the reference frame indexes of the neighboring blocks, as long as they belong to the same slice, the reference frame index of the block to encode, and also the existence of the associated neighboring motion vectors in this slice.
In a particular embodiment, the number of motion vector predictors and the value of the predictors can depend on the encoding mode of the neighboring blocks from the current slice.
As a consequence the derivation process may result in 0, 1 or 2 spatial motion vector predictors. It can be noted that the number of predictors in set 1 only depends on information which can be obtained on the decoder side from the current slice. We suppose here that a slice is transported in a single transport unit (packet), which is either received or lost by a receiver. In that condition, a slice cannot be partially received or received with errors by a decoder.
In figure 7, the motion vector field and all other data related to the blocks of the other slices already coded are represented by reference D817. This includes the slices of previously encoded frames (the temporal slices) and the slices already encoded of the current frame.
The derivation process of the motion vector predictors from other slices generates the motion vector predictors set 2 D819. The derivation process for set 2 can be divided into 2 independent functions: one to determine the number of predictors and one to determine the values of the predictors.
The number of predictors in set 2 must depend only on information which can be determined by the decoder in case a slice other than the current slice is lost. By contrast, the values of the predictors can depend on all information available at the decoder side at the time of decoding the current block. The main difference with the prior art is that the number of predictors in set 2 doesn't depend on information associated with the slices from which the motion vector predictors of set 2 are derived, but only on information associated with the current slice, which will always be available at the decoder as long as the data unit transporting the current encoded slice is not lost.
As mentioned above, the number of predictors in set 2 must depend only on information which can be known by the decoder in case of slice loss. So, in a preferred embodiment this number of predictors can depend for instance on:
* The reference frame index of the current motion vector to predict.
* The reference frame indexes of the motion vectors of already encoded blocks of the current slice.
* The number of predictors in set 1.
* The existence of the spatial neighboring motion vectors (spatial predictors of the current slice). Neighboring blocks may not include a motion vector if they are coded in INTRA, for instance.
* The encoding mode of the current block to encode.
* The size of the current block to encode.
* The modes and/or the sizes of the neighboring spatial blocks of the current slice.
In a particular embodiment, in cases where the reference frame index of the blocks in the current slice could be false due to slice loss, the criteria fixing the number of predictors must not depend on the reference frame index. This situation can occur when the reference frame index is used in the suppression process, for instance in the case of the Merge mode. The same comment applies to the direction and the list.
One can note that in a preferred embodiment the number of predictors in set 2 must not depend on:
* Any data of the other slices.
* The values of the motion vectors (as these values could be false in case of slice loss).
* Any value which could be false in case of slice loss (in particular the reference index can be false if the Merge mode uses a suppression process).
One embodiment to determine the number of predictors in set 2 is to take into account the number of predictors in set 1 and the size of the block to encode. For example, if the block size is small (4x4, 8x8, 16x16) and the number of predictors in set 1 reaches its maximum (2 in the current implementation), the number of predictors in set 2 could be equal to 0. For large block sizes (32x32, 64x64, etc.), the number of predictors could always be equal to the maximum possible number of predictors in set 2, because for large block sizes a higher number of predictors is generally more efficient and the temporal motion vectors are more efficient than the spatial predictors.
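A minimal sketch of this first embodiment, assuming at most one temporal predictor in set 2 and a 32-pixel threshold between small and large blocks (both values are illustrative assumptions, not fixed by the description):

```python
MAX_SET2 = 1  # assumed maximum number of set-2 (temporal) predictors

def num_set2_predictors(block_size, num_set1):
    """Number of set-2 predictors, from current-slice information only."""
    if block_size >= 32:
        # Large blocks: always allow the maximum number of set-2 predictors.
        return MAX_SET2
    if num_set1 == 2:
        # Small block with set 1 already full: no set-2 predictor.
        return 0
    return MAX_SET2
```

Both inputs (block size and the size of set 1) are known to the decoder from the current slice alone, so this count survives the loss of any other slice.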
The criterion on the block size could be determined according to the sequence resolution. This rule can be sent in the sequence parameter set, the picture parameter set or the slice header.
In a second embodiment, if the mode of the predictors in set 1 is the Inter mode, the number of predictors in set 2 can be equal to 0.
Indeed, the motion vectors which come from Inter mode have a residual, so they are (theoretically) more precise than motion vectors which come from a mode without motion vector residual.
In a third embodiment, the number of predictors for set 2 can be fixed (encoder and decoder side) or transmitted in the bitstream (Sequence parameter set, picture parameter set, slice header).
In a fourth embodiment, the number of predictors in set 2 could be given by:

Card(set 2) = Max_pred - Card(set 1)

where 'Max_pred' is the maximum number of predictors (D820) in set D804 and Card(x) is the cardinality of set x. 'Max_pred' could be fixed or sent in the bitstream (Sequence Parameter Set / Picture Parameter Set / slice header).
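This fourth embodiment can be written directly; the floor at zero is a defensive assumption of ours for the case where set 1 already reaches 'Max_pred':

```python
def card_set2(max_pred, card_set1):
    """Card(set 2) = Max_pred - Card(set 1), floored at zero."""
    return max(0, max_pred - card_set1)
```

For example, with 'Max_pred' equal to 3 and two predictors in set 1, one predictor remains for set 2.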
In a fifth embodiment, if only one predictor is contained in set 1, the number of predictors in set 2 is zero. Indeed, in that case no index of the motion vector predictor is transmitted.
In a sixth embodiment, if the number of predictors in set 1 is zero, the number of predictors in set 2 is 1, for the same reasons as in the previous embodiment.
Note that some combinations of these embodiments are also possible.
In embodiments of the present invention, the values of the predictors in set 2 can depend on any information even if this information may be unavailable at decoder side when a slice loss occurs. This is possible because the parsing of the bitstream doesn't depend on the value of the motion vector predictors.
For example, the value of the predictors in set 2 can depend on:
* the value of the reference frame index of the current motion vector;
* the encoded modes of the blocks of these other slices;
* the reference frame indexes of the blocks of these other slices;
* the existence of the motion vectors of these other slices;
* the reference frame indexes of the predictors of set 1;
* the encoded modes of the predictors of set 1;
* the size of the current block and the sizes of the blocks of the predictors of set 1;
* more generally, the data contained in D801 and D817.
In a particular embodiment, the derivation process of the values of the motion vector predictors can be the same as the one currently implemented in HEVC and described above with reference to figure 1b. In embodiments of the present invention, the number of predictors in set 2 is independent of the existence of the motion vectors.
Consequently, in embodiments of the invention, some particular determinations of the predictors could be used when a predictor doesn't exist.
The number of predictors in set 2, obtained from the number derivation process applied to information of the current slice, may exceed the number of predictors existing in the other slices; in that case, missing predictors could be generated.
In a first embodiment, if the temporal motion vector predictor doesn't exist but the derivation of the number of predictors in set 2 determines that at least one motion vector predictor is needed in this set, then the value of the at least one temporal motion vector predictor could be set to a default value. For example, the value (0, 0), or a value sent in the bitstream (in the Sequence Parameter Set, Picture Parameter Set, or slice header) could be used as the default value. For the particular case of the collocated block, if the collocated block has no motion vector, a predictor can be extracted from one block surrounding the collocated block (for instance the block on the top, the block on the left, the block on the right and the block below the collocated block).
A second embodiment concerns bi-predicted blocks. In that case, two collocated blocks could be available: one in the temporally closest frame in the list "L0" of reference frames and one in the temporally closest frame in the list "L1" of reference frames. As explained before, one collocated block is chosen for the motion vector prediction. If the motion vector predictor associated with the selected collocated block doesn't exist, a motion vector predictor can be extracted from the collocated block of the other list, if it exists. The value could be scaled according to the temporal distance and the direction of the lists.
In a third embodiment, if the motion vector predictor doesn't exist, this motion vector predictor can be replaced by a motion vector predictor extracted from a block surrounding it if at least one exists. If these motion vector predictors don't exist, the value of the predictor can be equal to the value of one available motion vector in the slice containing this motion vector predictor.
Any other means of deriving a predictor when the temporal motion vector predictor is missing can also be envisaged.
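The fallback chain of the embodiments above (collocated motion vector, then blocks surrounding the collocated block, then a default value) can be sketched as follows; the function name, the ordering of the surrounding blocks, and the (0, 0) default are assumptions taken from the examples in the text:

```python
def temporal_predictor(collocated_mv, surrounding_mvs, default=(0, 0)):
    """Return a temporal predictor even when the collocated MV is missing."""
    if collocated_mv is not None:
        return collocated_mv
    # Try the blocks surrounding the collocated block
    # (e.g. top, left, right, below), in order.
    for mv in surrounding_mvs:
        if mv is not None:
            return mv
    # Default value: (0, 0) or a value sent in the bitstream
    # (Sequence Parameter Set, Picture Parameter Set, or slice header).
    return default
```

Whatever branch is taken, a predictor value is always produced, so the number of predictors in set 2 never has to shrink because a motion vector is missing.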
Then set I D816 and set 2 D819 are merged in the motion vector predictors set D804 which will be used for the coding of the current motion vector D802.
The best predictor index is selected in the RD selection in step S807 among the reduced set D806 which was generated by the suppression step S805. Note that the number of predictors in sets 1 and 2 could be zero; in that case the suppression step is not applied. As in the state of the art, the current motion vector of the block to be encoded D808 is then predicted in step S810 by the best predictor D809, and the generated residual D811 is entropy coded in step S812.
The index of the best predictor D808 is converted in step S814 into the codeword D815. The conversion in S814 depends on the number of predictors D820 in the motion vector predictors set D804.
The conversion process uses the following Table 4 when a maximum of 3 predictors are used.
Codeword when the number of predictors in the set is N:

    Index | N=1        | N=2 | N=3
    ------+------------+-----+-----
    0     | (inferred) | 0   | 0
    1     |            | 1   | 10
    2     |            |     | 11
Table 4:
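Table 4 is a truncated unary code: index i is written as i ones followed by a terminating zero, except that the last index needs no terminator and a lone predictor needs no bits at all. A minimal sketch (the function name is ours):

```python
def index_to_codeword(index, n):
    """Truncated unary codeword for 'index' among n predictors (Table 4)."""
    if n == 1:
        return ""                  # index 0 is inferred: nothing is coded
    if index == n - 1:
        return "1" * index         # last index: terminating '0' omitted
    return "1" * index + "0"
```

For instance, with three predictors the codewords are '0', '10' and '11', matching the N=3 column of Table 4.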
This conversion table is exactly the same as in the current implementation of AMVP in HEVC. However, it is important to note that the conversion process depends on the number of predictors in set D804 and not on the reduced motion vector predictors set D806 as in the current implementation of HEVC. Nevertheless, as seen before, the value of the index can depend on this reduced set D806, which allows the coding efficiency to be improved further. In this table, if the number of predictors (D820) is equal to 1, the index of the predictor can be inferred and no de-synchronization of the bitstream occurs in case of errors.
This is because in embodiments of the present invention this conversion process only depends on information associated with the current slice.
Indeed, we consider here that the status of the collocated block (availability of a temporal motion vector predictor or not) will always be known, and that this information could be used to determine the final number of motion vector predictors in set D804, since this information depends only on information available in the current slice.
We have seen that the number of predictors in set 2 obtained from the number derivation process applied to information of the current slice could exceed the number of predictors existing in the other slices. In that case, missing predictors could be generated. The situation is different for set 1, since in that case the left and top predictors may not be available. However, the information on the availability of the top and left predictors will always be available on the decoder side, since it depends on the current slice only.
The suppression process in step S805 is then applied to the motion vector predictors set D804. The suppression process can be the same as described above, based on the suppression of duplicate values. This suppression process is particularly interesting in terms of coding efficiency. When some duplicate motion vector predictors are suppressed, the index values for the remaining motion vector predictors change. Thanks to this process, the probability of obtaining a shorter index binary code is increased. For example, let us consider that 4 predictors are used for the current set of predictors D804. Among these predictors, the first predictor (index 0) is equal to the second predictor (index 1). The third predictor (index 2) and the fourth predictor (index 3) are different. The best predictor is the third predictor (index 2). If the suppression process is not used, the codeword associated with index 2 is '110', so 3 bits need to be coded, as indicated in Table 4b below. If the suppression process is used, the second predictor is removed and the third predictor takes index 1.
The codeword related to this index is 10', so only 2 bits need to be coded.
Codeword when the number of predictors in the set is N:

    Index | N=3 (suppression used) | N=4 (suppression not used)
    ------+------------------------+---------------------------
    0     | 0                      | 0
    1     | 10                     | 10
    2     | 11                     | 110
    3     |                        | 111
Table 4b:
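The bit saving of this example can be reproduced with a small sketch; the motion vector values are invented purely for illustration:

```python
def suppress(predictors):
    """Remove duplicate values, keeping the first occurrence of each."""
    reduced = []
    for p in predictors:
        if p not in reduced:
            reduced.append(p)
    return reduced

def index_to_codeword(index, n):
    """Truncated unary code, as in Table 4."""
    if n == 1:
        return ""
    return "1" * index + ("0" if index < n - 1 else "")

preds = [(1, 0), (1, 0), (3, -2), (0, 5)]   # first two predictors are equal
best = (3, -2)                              # the third predictor is selected

no_sup = index_to_codeword(preds.index(best), len(preds))        # 3 bits
reduced = suppress(preds)
with_sup = index_to_codeword(reduced.index(best), len(reduced))  # 2 bits
```

Without suppression the selected predictor keeps index 2 among 4 predictors and costs 3 bits ('110'); after suppression it moves to index 1 among 3 predictors and costs only 2 bits ('10').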
Moreover, this suppression process can have an impact on the coding efficiency when arithmetic coding is used, even if the codeword of the best predictor has the same number of bits before and after the suppression process. Indeed, the probability of obtaining different binary values for the index coding decreases. As a consequence, the probability of obtaining a good prediction of a binary value from its context during the arithmetic coding increases.
Note that in another embodiment it is possible not to use the suppression process, in order to reduce the computational complexity of the motion vector coding.
Note that a particular case arises with the suppression process used in the Merge mode. Some de-synchronization of the bitstream can occur in case of losses, because the Merge mode propagates the reference frame index.
For example, let us consider a case where the Merge mode has 3 predictors in the initial set D804. The first and the second predictors have the same reference index '0' and the same motion vector value.
The third predictor points to reference frame 1 (index 1). At the encoder side, the Merge mode suppression process removes the second predictor and the third predictor takes its position (and its index '1') in the reduced set. The RD selection step S807 selects the third predictor and sends the predictor index '1' to the decoder. Consequently, the reference index for this block is reference frame 1, and the following blocks take this reference frame index into account in their derivation process of motion vector predictors. At the decoder side, if a slice loss occurs, the values of the 3 predictors are false for this Merge block. The Merge suppression process finds that the reduced set contains 3 predictors instead of 2. The decoder reads the predictor index '1' in the bitstream and considers that the second predictor, with a reference frame index equal to '0', is the selected predictor.
Consequently, the reference frame value is equal to '0' for this block instead of '1' as at the encoder side. Then, the derivation process of the motion vector predictors for the following blocks can't find the same number of predictors as the encoder, because this number of predictors depends on this reference frame index.
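This de-synchronization can be illustrated with a toy model in which each predictor is a (motion vector, reference index) pair; the concrete values are invented for the illustration:

```python
def suppress(predictors):
    """Remove duplicate values, keeping first occurrences."""
    reduced = []
    for p in predictors:
        if p not in reduced:
            reduced.append(p)
    return reduced

# Encoder side: the first two predictors are true duplicates.
enc = [("mvA", 0), ("mvA", 0), ("mvB", 1)]  # (motion vector, reference index)
enc_reduced = suppress(enc)                 # only 2 predictors remain
sent_index = enc_reduced.index(("mvB", 1))  # index 1 is written to the stream

# Decoder side after a slice loss: the three values are corrupted and all
# look distinct, so the suppression process removes nothing.
dec = [("x", 0), ("y", 0), ("z", 1)]
chosen = dec[sent_index]                    # reference index 0, not 1
```

The decoder thus propagates reference index 0 where the encoder used 1, corrupting the predictor derivation of the following blocks, which is why the preferred embodiment disables suppression in the Merge mode.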
In a preferred embodiment of the invention, in order to avoid any issue with the Merge mode, the suppression process is not applied in the Merge mode. This can also be extended to the case of B frames, where the Merge mode propagates the direction (bi-directional or unidirectional) and the list (L0, L1, or L0 and L1).
Figure 8 shows the flow chart of the processing at the decoder side in an embodiment of the invention. This flow chart is similar to the flow chart of figure 1c. One difference is in the generation modules, which are the same as at the encoder. In this flow chart, the number of predictors in the set D920 is used to determine the number of bits which need to be decoded in the entropic decoding step S906.
This step S906 decodes the codeword D914. The number of predictors D920 is then used to convert the codeword into the index in step S915. The main difference compared to the state of the art is that the entropic decoding step S906 and the codeword-to-index conversion step S915 depend on the number of predictors in set D902 and not on the number of predictors in the reduced set D909.
When a slice loss occurs, the number of predictors in set D920 depends only on information associated with the current slice, which is systematically correctly decoded. So, no parsing issue can happen in the entropic decoding, because the same values (number of predictors) are used at the encoder and decoder sides. However, the motion vector value decoded could be false in case of slice loss.
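Because the bit count depends only on D920, the entropic decoder always knows how many bits to read. A sketch of the codeword-to-index conversion, the inverse of Table 4 (the function name is ours):

```python
def decode_index(bits, n):
    """Decode a truncated unary codeword; return (index, bits consumed)."""
    if n == 1:
        return 0, 0            # a lone predictor: the index is inferred
    ones = consumed = 0
    while ones < n - 1:
        bit = bits[consumed]
        consumed += 1
        if bit == "0":
            break              # terminating zero reached
        ones += 1
    return ones, consumed
```

The number of bits consumed is fully determined by n and the bits themselves, so encoder and decoder stay synchronized even when the decoded motion vector value turns out to be wrong.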
The embodiments described above are based on block partitions of input images, but more generally, any type of image areas to encode or decode can be considered, in particular rectangular areas or more generally geometrical areas.
More generally, any modification or improvement of the above-described embodiments, that a person skilled in the art may easily conceive should be considered as falling within the scope of the invention.

Claims (16)

  1. A method of encoding a sequence of digital images in which at least one area of an image is encoded by motion compensation with respect to at least one reference image area belonging to at least one reference image, wherein, for an image area to encode, belonging to a first group of image areas that are to be encoded into an encoding unit which can be decoded independently of any other encoding unit, the method comprises: -determining a first set of motion vector predictors from motion vectors that are associated with other image areas belonging to said first group of image areas, -determining a second set of motion vector predictors from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units, the number of predictors, if any, in said second set being obtainable by a decoder even if at least one said further encoding unit is lost, -selecting a motion vector predictor from among the motion vector predictors of the first and second sets, and -encoding information representative of the selected motion vector predictor in dependence upon the number of motion vector predictors in the first and second sets.
  2. Method according to claim 1, wherein the number of predictors in said second set is independent of the information contained in said one or more further encoding units.
  3. Method according to claim 1, wherein the number of predictors in said second set is obtainable by a decoder from information contained in the encoding unit in which said image area to encode is encoded.
  4. Method according to claim 1, wherein the number of predictors in said second set is obtainable by the decoder using at least one piece of information already contained in the encoding unit in which said image area to encode is encoded for a purpose other than obtaining the number of predictors in said second set.
  5. Method according to claim 3 or 4, wherein information from which the decoder is able to obtain the number of predictors in said second set includes at least one of the following pieces of information: -an index of the reference image; -indices of one or more reference images used for the encoding of the other image areas of the group of image areas; -a number of motion vector predictors in the first set; -an encoding mode of the image area to encode; -a size of the image area to encode; -an encoding mode of already encoded image areas in said group of image areas; and -a size of already encoded image areas in said group of image areas.
  6. Method according to claim 1, wherein the number of predictors in said second set is a predetermined number.
  7. Method according to claim 1, wherein the information representative of the selected motion vector predictor is determined by ordering the motion vector predictors of the first and second sets and indexing sequentially to motion vector predictors of the first and second sets according to the established order.
  8. Method according to claim 7, wherein motion vector predictor duplicates are indexed only once.
  9. A method of decoding a bitstream representative of a sequence of digital images in which at least one area of an image is encoded by motion compensation with respect to at least one reference image area belonging to at least one reference image, wherein, for an image area to decode, belonging to a first group of image areas that are encoded into a received encoding unit which can be decoded independently of any other encoding unit, the method comprises: -determining a first set of motion vector predictors from motion vectors that are associated with other image areas belonging to said first group of image areas, -determining a second set of motion vector predictors from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units, the number of predictors, if any, in said second set being obtainable even if at least one said further encoding unit is lost, and -decoding information representative of a selected motion vector predictor from among the motion vector predictors of the first and second sets in dependence upon the number of motion vector predictors in the first and second sets.
  10. Method according to claim 9, wherein the selected motion vector predictor along with a residual motion vector obtained from the received encoding unit are used to decode a motion vector of the image area to decode.
  11. An encoding device for encoding a sequence of digital images in which at least one area of an image is encoded by motion compensation with respect to at least one reference image area belonging to at least one reference image, the device comprising: -means for determining a first set of motion vector predictors for an image area to encode, the image area to encode belonging to a first group of image areas that are to be encoded into an encoding unit which can be decoded independently of any other encoding unit, and the first set of motion vector predictors being determined from motion vectors that are associated with other image areas belonging to said first group of image areas, -means for determining a second set of motion vector predictors for the image area to encode from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more further encoding units, the number of predictors, if any, in said second set being obtainable by a decoder even if at least one said further encoding unit is lost, -means for selecting a motion vector predictor from among the motion vector predictors of the first and second sets, and -means for encoding information representative of the selected motion vector predictor in dependence upon the number of motion vector predictors in the first and second sets.
  12. A decoding device for decoding a bitstream representative of a sequence of digital images in which at least one area of an image is encoded by motion compensation with respect to at least one reference image area belonging to at least one reference image, the device comprising: -means for determining a first set of motion vector predictors for an image area to decode, the image area to decode belonging to a first group of image areas that are encoded into a received encoding unit which can be decoded independently of any other encoding unit, and the first set of motion vector predictors being determined from motion vectors that are associated with other image areas belonging to said first group of image areas, -means for determining a second set of motion vector predictors for said image area to decode from motion vectors that are associated with image areas belonging to one or more further groups of image areas different from said first group of image areas and that are encoded into one or more received further encoding units, the number of predictors, if any, in said second set being obtainable by the decoding device even if at least one said further encoding unit is lost, and -means for decoding information representative of a selected motion vector predictor from among the motion vector predictors of the first and second sets in dependence upon the number of motion vector predictors in the first and second sets.
  13. A computer program which, when run on a computer, causes the implementation of a method of encoding a sequence of digital images according to any one of claims 1 to 8 or a method of decoding a bitstream according to claim 9 or 10.
  14. A computer-readable storage medium storing a program according to claim 13.
  15. A method, device or computer program for encoding a sequence of digital images substantially as hereinbefore described with reference to any one of Figures 2, 7 and 8 of the accompanying drawings.
  16. A method, device or computer program for decoding a bitstream substantially as hereinbefore described with reference to any one of Figures 2, 7 and 8 of the accompanying drawings.
GB201103925A 2011-03-08 2011-03-08 Video encoding and decoding with improved error resillience Active GB2488798B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB201103925A GB2488798B (en) 2011-03-08 2011-03-08 Video encoding and decoding with improved error resillience
PCT/EP2012/001011 WO2012119766A1 (en) 2011-03-08 2012-03-07 Video encoding and decoding with improved error resilience

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB201103925A GB2488798B (en) 2011-03-08 2011-03-08 Video encoding and decoding with improved error resillience

Publications (3)

Publication Number Publication Date
GB201103925D0 GB201103925D0 (en) 2011-04-20
GB2488798A true GB2488798A (en) 2012-09-12
GB2488798B GB2488798B (en) 2015-02-11

Family

ID=43923384

Family Applications (1)

Application Number Title Priority Date Filing Date
GB201103925A Active GB2488798B (en) 2011-03-08 2011-03-08 Video encoding and decoding with improved error resillience

Country Status (2)

Country Link
GB (1) GB2488798B (en)
WO (1) WO2012119766A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0658057A2 (en) * 1993-12-13 1995-06-14 Sharp Kabushiki Kaisha Moving image coder
GB2332115A (en) * 1997-12-01 1999-06-09 Samsung Electronics Co Ltd Motion vector prediction method
US20050053137A1 (en) * 2003-09-07 2005-03-10 Microsoft Corporation Predicting motion vectors for fields of forward-predicted interlaced video frames
WO2008117158A1 (en) * 2007-03-27 2008-10-02 Nokia Corporation Method and system for motion vector predictions
US20100020877A1 (en) * 2008-07-23 2010-01-28 The Hong Kong University Of Science And Technology Multiple reference frame motion estimation in video coding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009115901A2 (en) * 2008-03-19 2009-09-24 Nokia Corporation Combined motion vector and reference index prediction for video coding


Also Published As

Publication number Publication date
GB2488798B (en) 2015-02-11
GB201103925D0 (en) 2011-04-20
WO2012119766A1 (en) 2012-09-13

Similar Documents

Publication Publication Date Title
US11943465B2 (en) Video encoding and decoding
US11968390B2 (en) Method and device for encoding a sequence of images and method and device for decoding a sequence of images
WO2012095467A1 (en) Video encoding and decoding with low complexity
WO2012095466A1 (en) Video encoding and decoding with improved error resilience
US9648341B2 (en) Video encoding and decoding with improved error resilience
GB2492778A (en) Motion compensated image coding by combining motion information predictors
US11095878B2 (en) Method and device for encoding a sequence of images and method and device for decoding a sequence of image
WO2012089773A1 (en) Video encoding and decoding with improved error resilience
GB2488798A (en) Video encoding and decoding with improved error resilience