US20060062300A1 - Method and device for encoding/decoding video signals using base layer - Google Patents

Method and device for encoding/decoding video signals using base layer

Info

Publication number
US20060062300A1
Authority
US
United States
Prior art keywords
image block
sequence
frames
frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/231,868
Inventor
Seung Wook Park
Ji Ho Park
Byeong Moon Jeon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LG Electronics Inc
Original Assignee
LG Electronics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LG Electronics Inc filed Critical LG Electronics Inc
Priority to US11/231,868 (published as US20060062300A1)
Assigned to LG ELECTRONICS INC. Assignors: PARK, SEUNG WOOK; JEON, BYEONG MOON; PARK, JI HO
Publication of US20060062300A1
Priority to US13/067,497 (published as US8885710B2)
Priority to US14/510,496 (published as US9338453B2)
Current legal status: Abandoned

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176: Coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/30: Using hierarchical techniques, e.g. scalability
    • H04N19/31: Hierarchical techniques in the temporal domain
    • H04N19/33: Hierarchical techniques in the spatial domain
    • H04N19/42: Characterised by implementation details or hardware specially adapted for video compression or decompression
    • H04N19/46: Embedding additional information in the video signal during the compression process
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/61: Transform coding in combination with predictive coding
    • H04N19/615: Transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • H04N19/63: Transform coding using sub-band based transform, e.g. wavelets
    • H04N19/70: Characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • The MCTF encoder 100 transmits ‘reference block selection mode’ information, which has the structure shown in FIG. 7, to the texture coding unit 110 after inserting/writing it into a ‘Ref_Sel_mode’ field at a specified position of the header area of the corresponding macroblock, as shown in FIG. 8.
  • Alternatively, the ‘Ref_Sel_mode’ field can be inserted in the header of a frame (or slice) so that the same two reference pictures can be used throughout that frame (or slice).
  • In FIG. 7, ‘flag_use_BL’ indicates whether or not the base layer is used for the reference block, and ‘reference_selection_code’ is a field in which a value identifying one of the reference block selection modes of FIG. 6 is written; this value indicates which one or two of the five candidate frames are used to produce the image difference of the current macroblock.
  • The MCTF encoder 100 also transmits information identifying the level (i.e., the MCTF level) of the L frame sequence from which the base layer picture sequence has been subtracted, writing this level information, which has the structure shown in FIG. 9, into a ‘BL_subtraction’ field at a specified position of the header area of the corresponding GOP, as shown in FIG. 8.
  • In FIG. 9, ‘flag_use_BL’ indicates whether or not the base layer is used for the corresponding GOP, and ‘BL_subtraction_level’ indicates the level of the L frame sequence from which the base layer picture sequence has been subtracted. Both header structures are sketched below.
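  • As a rough illustration, the two header structures might be modeled as follows; the field names come from FIGS. 7 and 9, but the class names and Python types are assumptions (the patent specifies bit-level syntax, not an API).

```python
from dataclasses import dataclass

@dataclass
class MacroblockRefSelInfo:
    """'Ref_Sel_mode' information in a macroblock (or frame/slice) header,
    per FIG. 7 and FIG. 8."""
    flag_use_BL: bool               # is the base layer used for the reference block?
    reference_selection_code: int   # which one or two of the five frames are used

@dataclass
class GopBLSubtractionInfo:
    """'BL_subtraction' information in a GOP header, per FIG. 8 and FIG. 9."""
    flag_use_BL: bool          # is the base layer used for this GOP?
    BL_subtraction_level: int  # L-frame level from which the base layer was subtracted
```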
  • The data stream encoded by the method described above is transmitted to a decoding device by wire or wirelessly, or alternatively delivered via recording media.
  • The decoding device restores the original video signal of the enhanced and/or base layer according to the method described below.
  • FIG. 10 is a block diagram of a device for decoding a data stream encoded by the device of FIG. 1.
  • The decoding device of FIG. 10 includes a demuxer (or demultiplexer) 200, a texture decoding unit 210, a motion decoding unit 220, an MCTF decoder 230, and a base layer decoder 240.
  • The demuxer 200 separates a received data stream into a compressed motion vector stream, a compressed macroblock information stream, and a base layer stream.
  • The texture decoding unit 210 decompresses the macroblock information stream.
  • The motion decoding unit 220 decodes the compressed motion vector information.
  • The MCTF decoder 230 decodes the macroblock information and the motion vectors according to an MCTF scheme.
  • The base layer decoder 240 decodes the base layer stream according to a specified scheme, for example, the MPEG-4 or H.264 standard.
  • The base layer decoder 240 includes a scaler 240a that enlarges the small-screen picture sequence of the base layer to the enhanced layer picture size.
  • The scaler 240a may alternatively be provided as a separate unit outside the base layer decoder 240.
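  • For illustration, a toy demuxer for a length-prefixed container is sketched below; the container format and the names are assumptions (a matching toy muxer appears with the encoder description later in this document), not the patent's transmission format.

```python
def demux(stream: bytes):
    # Split a length-prefixed byte stream back into its substreams, e.g.
    # the compressed macroblock (texture) stream, the compressed motion
    # vector stream, and the base layer stream.
    parts, i = [], 0
    while i < len(stream):
        n = int.from_bytes(stream[i:i + 4], "big")
        parts.append(stream[i + 4:i + 4 + n])
        i += 4 + n
    return parts

# Usage: texture, motion, base = demux(received_bitstream); each substream
# then goes to its own decoding unit (210, 220, 240 in FIG. 10).
```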
  • The MCTF decoder 230 includes, as an internal element, an inverse filter having the structure shown in FIG. 11 for decoding an input bitstream into a frame sequence.
  • The inverse filter of FIG. 11 includes a front processor 236, an inverse updater 231, an inverse predictor 232, a motion vector decoder 235, and an arranger 234.
  • The front processor 236 divides an input enhanced layer stream into H frames and L frames, and analyzes the information in each header in the stream.
  • The inverse updater 231 subtracts the pixel difference values of input H frames from the corresponding pixel values of input L frames.
  • The inverse predictor 232 restores input H frames to frames having original images with reference to the L frames, from which the image differences of the H frames have been subtracted, and/or with reference to enlarged pictures output from the scaler 240a.
  • The motion vector decoder 235 decodes an input motion vector stream into motion vector information for each block and provides it to the inverse predictor 232.
  • The arranger 234 interleaves the frames completed by the inverse predictor 232 between the L frames output from the inverse updater 231, thereby producing a normal video frame sequence.
  • The inverse updaters 231 and inverse predictors 232 are provided in multiple stages corresponding to the MCTF encoding levels described above. As denoted by 239 in FIG. 11, image values of the enlarged pictures from the scaler 240a are added to the corresponding image values of the L frames output from the inverse updater 231 of one of those stages. Based on the value of ‘BL_subtraction_level’ (FIG. 9) carried within the enhanced layer stream, the MCTF decoder 230 determines the stage (encoding level) of L frames to which the base layer frames are to be added, as sketched below.
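  • The staged inverse operation can be pictured with the toy, motion-free lifting scheme below (a matching forward decomposition is sketched with the encoder description later in this document). The plain Haar step, the '// 2' normalization, and the alignment of base layer frames to L frames are illustrative assumptions, not the patent's exact algorithm.

```python
def inverse_mctf_level(L, H):
    # Invert one temporal decomposition stage: undo the 'U' operation,
    # then the 'P' operation, recovering twice as many frames.
    frames = []
    for l, h in zip(L, H):
        even = l - h // 2   # inverse update
        odd = h + even      # inverse prediction
        frames += [even, odd]
    return frames

def mctf_decode_gop(L, h_per_level, base_frames=None, bl_subtraction_level=2):
    """Run the inverse stages from the deepest level back to level 1; at the
    stage named by 'BL_subtraction_level', first add the enlarged base layer
    frames back to the L frames (the role of adder 239 in FIG. 11)."""
    for level in range(len(h_per_level), 0, -1):
        if base_frames is not None and level == bl_subtraction_level:
            L = [l + b for l, b in zip(L, base_frames[::2 ** level])]
        L = inverse_mctf_level(L, h_per_level[level - 1])
    return L
```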
  • The front processor 236 analyzes an input enhanced layer stream and divides it into an L frame sequence and an H frame sequence. In addition, it uses the information in each header of the stream to notify the inverse predictor 232 of which frame or frames were used to produce the macroblocks of each H frame; the frames used can be determined from the received ‘reference_selection_code’ information shown in FIG. 7.
  • The inverse predictor 232 specifies the L frame of the enhanced layer and/or the enlarged frame of the base layer that were used to produce the predicted image of a macroblock of an H frame, determines a reference block in the specified frame(s) based on a motion vector provided by the motion vector decoder 235, and then adds the pixel values of the reference block (or the average pixel values of the two reference blocks) to the pixel difference values of the macroblock, thereby restoring its original image.
  • To determine whether a base layer frame can serve as a temporally coincident reference for the frame being decoded, the inverse predictor 232 refers to the timing information shown in FIG. 4; the temporal coincidence of each base layer frame is determined based on both the time difference between the two frames and the value ‘THR_temporal_coincident’.
  • The MCTF decoding is performed in specified units, for example in units of slices, in a parallel fashion, so that all macroblocks of a frame approximately restore their original images, and the restored images are combined to constitute a complete video frame.
  • The above decoding method thus restores an MCTF-encoded data stream to a complete video frame sequence.
  • In the case where the prediction and update operations were performed N times in the MCTF encoding procedure, a video frame sequence with the original image quality is obtained if the inverse prediction and update operations are performed N times during decoding, whereas a video frame sequence with lower image quality and at a lower bitrate is obtained if they are performed fewer than N times.
  • Accordingly, the decoding device is designed to perform the inverse prediction and update operations to the extent suitable for its performance, or to decode only the base layer stream.
  • The decoding device described above can be incorporated into a mobile communication terminal or the like, or into a recording media playback device.
  • A method and device for encoding/decoding video signals according to the present invention thus has the advantage that a base layer provided for low-performance decoders is used, in addition to the enhanced layer, in the MCTF encoding procedure to produce the H and L frames, thereby reducing the total amount of coded data and improving the MCTF coding efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention relates to encoding and decoding a video signal by motion compensated temporal filtering. In one embodiment, a first sequence of frames is decoded by inverse motion compensated temporal filtering, by selectively adding image information to a first image block in the first sequence, the image information being based on at least one of (1) a second image block from the first sequence and (2) a third image block from an auxiliary sequence of frames.

Description

    DOMESTIC PRIORITY INFORMATION
  • This application claims priority under 35 U.S.C. §119 to U.S. provisional application No. 60/612,180, filed Sep. 23, 2004, the entire contents of which are hereby incorporated by reference.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to a method and a device for encoding and decoding video signals.
  • 2. Description of the Related Art
  • A number of standards have been suggested for compressing video signals. One well-known standard is MPEG, which has been adopted as a standard for recording movie content, etc., on a recording medium such as a DVD and is now in widespread use. Another well-known standard is H.264, which is expected to be used as a standard for high-quality TV broadcast signals in the future.
  • While TV broadcast signals require high bandwidth, it is difficult to allocate such high bandwidth for the type of wireless transmissions/receptions performed by mobile phones and notebook computers, for example. Thus, video compression standards for such devices must have high video signal compression efficiencies.
  • Such mobile devices have a variety of processing and presentation capabilities, so compressed video must be prepared in a variety of forms corresponding to the many combinations of variables such as the number of frames transmitted per second, the resolution, and the number of bits per pixel. This imposes a great burden on content providers.
  • In view of the above, content providers prepare high-bitrate compressed video signals for each video source and, when receiving a request from a mobile device, decode the compressed video signals and encode them back into video signals suited to the video processing capabilities of the requesting device before providing them. However, this method entails a transcoding procedure (decoding, scaling, and encoding) and causes some time delay in providing the requested signals to the mobile device. The transcoding procedure also requires complex hardware and algorithms to cope with the wide variety of target encoding formats.
  • A Scalable Video Codec (SVC) has been developed in an attempt to overcome these problems. In this scheme, video signals are encoded into a sequence of pictures with the highest image quality while ensuring that a part of the encoded picture sequence (specifically, a partial sequence of pictures intermittently selected from the total sequence of pictures) can be used to represent the video signals with a low image quality.
  • Motion Compensated Temporal Filtering (MCTF) is an encoding and decoding scheme that has been suggested for use in the scalable video codec. Because MCTF is highly likely to be applied to mobile communication, where bandwidth is limited as described above, it requires high compression efficiency (i.e., a high coding rate) to reduce the number of bits transmitted per second.
  • Although it is possible to represent low image-quality video signals by receiving and processing part of the sequence of pictures encoded in the scalable MCTF coding scheme as described above, there is still a problem in that the image quality is significantly reduced when the bitrate is lowered.
  • SUMMARY OF THE INVENTION
  • Therefore, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a method and a device for encoding video signals in a scalable scheme by additionally using a base layer provided for a lower transfer rate.
  • The present invention relates to encoding and decoding a video signal by motion compensated temporal filtering.
  • In one embodiment, a first sequence of frames is decoded by inverse motion compensated temporal filtering, by selectively adding image information to a first image block in the first sequence, the image information being based on at least one of (1) a second image block from the first sequence and (2) a third image block from an auxiliary sequence of frames.
  • In another embodiment, a frame in a current frame interval is decoded, wherein the second image block is in a frame of the first sequence that is either prior to or subsequent to the frame including the first image block.
  • In another embodiment, a frame in a current frame interval is decoded, wherein the third image block is from a frame in the auxiliary sequence of frames that is temporally aligned with the frame including the first image block.
  • In another embodiment, a frame in a current frame interval is decoded by adding, to the first image block, one of (1) an adjacent image block positioned prior to, or an adjacent image block positioned subsequent to, the first image block in the first sequence, and (2) a third image block from the auxiliary sequence of frames, taken from at least one of an image block in the auxiliary sequence temporally aligned with the first image block and image blocks before and after that temporally aligned image block.
  • In another embodiment, a frame in a current frame interval is decoded by adding, to the first image block, two of (1) an adjacent image block positioned prior to, or an adjacent image block positioned subsequent to, the first image block in the first sequence, and (2) a third image block from the auxiliary sequence of frames, taken from at least one of an image block in the auxiliary sequence temporally aligned with the first image block and image blocks before and after that temporally aligned image block.
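  • To make the wording above concrete, the sketch below shows the core selective addition in Python with numpy; the function name and the use of a plain average when two references are present are illustrative assumptions, not the claimed syntax.

```python
import numpy as np

def inverse_predict_block(first_block, ref_from_first_seq=None, ref_from_aux_seq=None):
    """Restore a predicted ('first') image block by selectively adding image
    information based on a second block from the same sequence and/or a
    third block from the auxiliary (base layer) sequence of frames."""
    refs = [r for r in (ref_from_first_seq, ref_from_aux_seq) if r is not None]
    if not refs:
        return first_block.copy()        # nothing to add (intra case)
    prediction = np.mean(refs, axis=0)   # one reference, or their average
    return first_block + prediction
```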
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
  • FIG. 1 is a block diagram of a video signal encoding device to which a video signal compression method according to the present invention is applied;
  • FIG. 2 is a block diagram of a filter that performs image estimation/prediction and update operations in the MCTF encoder as shown in FIG. 1;
  • FIG. 3 illustrates how L frames and H frames having image differences are produced from a picture sequence in a group of pictures (GOP) according to an embodiment of the present invention;
  • FIG. 4 illustrates the structure of timing information according to an embodiment of the present invention, which indicates a temporal correlation between main frames of an enhanced layer and auxiliary frames of a base layer and which is inserted and transmitted in a bitstream of the enhanced layer;
  • FIGS. 5a and 5b illustrate the relationship between frames of the enhanced and base layers which can be used as references to produce an H frame having a predicted image according to an embodiment of the present invention;
  • FIG. 6 illustrates limited examples of various reference block selection modes of a macroblock produced by the filter of FIG. 2;
  • FIG. 7 illustrates the structure of reference block selection mode information carried in macroblock header information according to an embodiment of the present invention;
  • FIG. 8 illustrates the structure of information required due to the use of the base layer, which is carried in the enhanced layer bitstream, according to an embodiment of the present invention;
  • FIG. 9 illustrates the structure of information of the encoding level of L frames of the enhanced layer from which images of auxiliary frames of the base layer have been subtracted;
  • FIG. 10 is a block diagram of a device for decoding a bitstream encoded by the device of FIG. 1; and
  • FIG. 11 is a block diagram of an inverse filter that performs inverse prediction and update operations in an MCTF decoder shown in FIG. 10.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • Example embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
  • FIG. 1 is a block diagram of a video signal encoding device to which a scalable video signal compression method according to the present invention is applied.
  • The video signal encoding device shown in FIG. 1 comprises an MCTF encoder 100, a texture coding unit 110, a motion coding unit 120, a base layer encoder 150, and a muxer (or multiplexer) 130. The MCTF encoder 100 encodes an input video signal in units of macroblocks in an MCTF scheme, and generates suitable management information. The texture coding unit 110 converts information of encoded macroblocks into a compressed bitstream. The motion coding unit 120 encodes motion vectors of macroblocks obtained by the MCTF encoder 100 into a compressed bitstream according to a specified scheme. The base layer encoder 150 encodes an input video signal according to a specified scheme, for example, according to the MPEG-1, 2 or 4 standard or the H.261, H.263 or H.264 standard, and may produce a small-screen picture sequence, for example, a sequence of pictures scaled down to 25% of their original size if necessary. The muxer 130 encapsulates output data from the texture coding unit 110, the small-screen picture sequence output from the base layer encoder 150, and motion vector data of the motion coding unit 120 into a predetermined format. The muxer 130 then multiplexes and outputs the encapsulated data into a set transmission format.
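  • As a rough sketch of this dataflow (not the patent's implementation), the toy pipeline below wires the four outputs together; zlib stands in for the real texture and motion coders, and the length-prefixed mux format is an assumption.

```python
import zlib
import numpy as np

def base_layer_pictures(frames):
    # Stand-in for the base layer encoder's small-screen sequence
    # (e.g. pictures scaled down to 25% of their original area).
    return [f[::2, ::2].copy() for f in frames]

def code_stream(arrays):
    # Stand-in for the texture/motion coding units: pack and compress.
    return zlib.compress(np.stack(arrays).astype(np.int16).tobytes())

def mux(*substreams):
    # Toy 'set transmission format': length-prefixed concatenation.
    out = bytearray()
    for s in substreams:
        out += len(s).to_bytes(4, "big") + s
    return bytes(out)

# Usage: the residuals and motion vectors would come from the MCTF encoder
# 100; random frames and zero vectors serve as placeholders here.
frames = [np.random.randint(0, 256, (16, 16)) for _ in range(4)]
bitstream = mux(code_stream(frames),                        # texture coding unit 110
                code_stream([np.zeros((2, 2))]),            # motion coding unit 120
                code_stream(base_layer_pictures(frames)))   # base layer encoder 150
```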
  • In the following description, the small-screen picture sequence is referred to as a base layer sequence, and the output frame/picture sequence of the MCTF encoder 100 is referred to as an enhanced layer sequence. The base layer sequence is a sequence of auxiliary frames provided for selective use by devices whose performance is lower than that of devices that decode the sequence of main frames of the enhanced layer.
  • The MCTF encoder 100 performs motion estimation and prediction operations on each target macroblock in a frame. The MCTF encoder 100 also performs an update operation in which an image difference of the target macroblock from a corresponding macroblock in a neighbor frame is added to the corresponding macroblock in the neighbor frame. FIG. 2 is a block diagram of a filter for carrying out these operations.
  • As shown in FIG. 2, the filter includes a splitter 101, an estimator/predictor 102, an updater 103, and a decoder 105. The splitter 101 splits an input video frame sequence into the earlier and later frames of each pair of successive frames (for example, into odd and even frames). The decoder 105 decodes the sequence of encoded small-screen pictures received from the base layer encoder 150, and reconstructs the pictures to their original size using an internal scaler 105a. The estimator/predictor 102 performs motion estimation and prediction on each macroblock of the current frame that will be converted to a predicted frame. Specifically, the estimator/predictor 102 searches for a reference block of each macroblock of the current frame in neighbor frames of the enhanced layer prior to or subsequent to the current frame, or in frames of the base layer whose size has been restored by the scaler 105a. The estimator/predictor 102 then calculates an image difference (i.e., a pixel-to-pixel difference) of each macroblock of the current frame from the reference block in the neighbor frames of the enhanced layer, and a motion vector from each macroblock to that reference block. Alternatively, the estimator/predictor 102 calculates an image difference of each macroblock of the current frame from the corresponding macroblock of the base layer frame in the same time as the current frame, whose size has been restored by the scaler 105a. The updater 103 performs an update operation on a macroblock whose reference block has been found by the motion estimation, normalizing the calculated image difference of the macroblock from the reference block and adding the normalized value to the reference block. Here, the scaler 105a may be provided as a separate unit outside the decoder 105. The operation carried out by the updater 103 is referred to as a ‘U’ operation, and a frame produced by the ‘U’ operation is referred to as an ‘L’ (“low”) frame. The updater 103 may also selectively subtract an enlarged base layer frame in the same time as the updated frame from the updated frame, outputting the corresponding L frame produced by the subtraction.
  • The filter of FIG. 2 may perform its operations on a plurality of slices produced by dividing a single frame, simultaneously and in parallel, instead of operating on the whole video frame. A frame (or slice) having an image difference, which is produced by the estimator/predictor 102, is referred to as an ‘H’ (“high”) frame (or slice), since the difference value data in the ‘H’ frame (or slice) reflects high frequency components of the video signal. In the following description of the embodiments, the term ‘frame’ is used in a broad sense that includes a ‘slice’.
  • The estimator/predictor 102 divides each of the input video frames into macroblocks of a set size. For each divided macroblock, the estimator/predictor 102 searches for the block whose image is most similar to that of the macroblock, in previous/next neighbor frames of the enhanced layer and/or in corresponding base layer frames enlarged by the scaler 105a. That is, the estimator/predictor 102 searches for a macroblock temporally correlated with each divided macroblock. The block having the most similar image to a target image block has the smallest image difference from it, where the image difference of two image blocks is defined, for example, as the sum or average of their pixel-to-pixel differences. Accordingly, among the macroblocks in a previous/next neighbor frame and/or in a corresponding frame enlarged by the scaler 105a whose pixel-to-pixel difference sum (or average) from a target macroblock of the current frame is at or below a predetermined threshold, the macroblock having the smallest difference sum (or average) (i.e., the smallest image difference) is referred to as a reference block. For each macroblock of the current frame, two reference blocks may be present: one in a frame (including a base layer frame) prior to the current frame and one in a frame (including a base layer frame) subsequent to it.
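  • A minimal block-matching sketch of this search follows; the SAD cost, the search radius, and the acceptance threshold are illustrative assumptions (the patent fixes neither a cost function nor a search strategy), and frames are assumed to be single-channel numpy arrays.

```python
import numpy as np

def image_difference(a, b):
    # Image difference of two blocks: sum of pixel-to-pixel differences (SAD).
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def find_reference_block(target, frame, top, left, radius=16, threshold=4096):
    """Search `frame` around (top, left) for the block most similar to
    `target`. Returns (motion_vector, block), or None when no candidate
    meets the threshold, in which case base layer prediction may be tried."""
    n, best = target.shape[0], None
    h, w = frame.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= h - n and 0 <= x <= w - n:
                cost = image_difference(target, frame[y:y + n, x:x + n])
                if cost <= threshold and (best is None or cost < best[0]):
                    best = (cost, (dy, dx), frame[y:y + n, x:x + n])
    return None if best is None else (best[1], best[2])
```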
  • If the reference block is found, the estimator/predictor 102 calculates and outputs a motion vector from the current block to the reference block, and also calculates and outputs pixel error values (i.e., pixel difference values) of the current block from pixel values of the reference block, which is present in either the prior frame or the subsequent frame, or from average pixel values of the two reference blocks, which are present in the prior and subsequent frames.
  • If no macroblock providing a predetermined threshold image difference or less from the current macroblock is found in the two neighbor frames (including base layer frames) via the motion estimation operation, the estimator/predictor 102 determines whether or not a frame in the same time as the current frame (hereinafter also referred to as a “temporally coincident frame”) or a frame in a close time to the current frame (hereinafter also referred to as a “temporally close frame”) is present in the base layer sequence. If such a frame is present in the base layer sequence, the estimator/predictor 102 obtains the image difference of the current macroblock from a corresponding macroblock in the temporally coincident or temporally close frame based on pixel values of the two macroblocks, and does not obtain a motion vector of the current macroblock. A close time to the current frame corresponds to a time interval including frames that can be regarded as having the same image as the current frame. Information of this time interval is carried within an encoded stream, which will be described later.
  • The corresponding macroblock in the same or a close time in the base layer may be used even when a reference block is found for the current macroblock. Specifically, the pixel value differences of the current macroblock can be calculated based on, for example, the average pixel values of the found reference block and the corresponding base layer macroblock (a sketch of this combined prediction follows). In this case, a motion vector is determined for the current macroblock whose reference block was found, and information indicating that a base layer frame has been used is recorded in the header of the current macroblock.
  • Such an operation of the estimator/predictor 102 is referred to as a ‘P’ operation.
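  • As a small illustration of the combined case above (a sketch; the integer averaging and the function name are assumptions, not the patent's exact normalization):

```python
import numpy as np

def residual_with_base_layer(current_mb, reference_block, bl_block):
    # Pixel differences of the current macroblock taken against the average
    # of the found reference block and the temporally coincident (enlarged)
    # base layer macroblock.
    prediction = (reference_block.astype(np.int32) + bl_block.astype(np.int32)) // 2
    return current_mb.astype(np.int32) - prediction
```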
  • The MCTF encoder 100 generates a sequence of H frames and a sequence of L frames, respectively, by performing the ‘P’ and ‘U’ operations described above on a certain-length sequence of pictures, for example, on a group of pictures (GOP). Then, an estimator/predictor and an updater at a next temporal decomposition stage (not shown) generates a sequence of H frames and a sequence of L frames by repeating the ‘P’ and ‘U’ operations on the generated L frame sequence. The ‘P’ and ‘U’ operations are performed an appropriate number of times to produce a final enhanced layer sequence.
  • FIG. 3 shows an example of such a procedure in which the ‘P’ and ‘U’ operations are performed three times (i.e., up to a 3rd encoding level) on one GOP until two L frames remain. In the example of FIG. 3, the updater 103 in the MCTF encoder 100 generates a 2nd-level sequence of L frames from a 1st-level sequence of L frames by subtracting a sequence of temporally-coincident enlarged frames received from the scaler 105a from the 1st-level sequence of L frames. It is also possible to generate a next-level sequence of L frames by subtracting the temporally-coincident enlarged frames from L frames of a level other than the 1st level. For example, in the case where the enlarged base-layer pictures provided from the scaler 105a are not synchronized with the 1st-level L frames, the level of the L frame sequence from which the base layer pictures will be subtracted is increased to reduce the time difference between the video signals of the two layers, and the enlarged base-layer pictures are subtracted from the L frames of that increased level, as in the sketch below.
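  • The sketch below mirrors this procedure with the same toy, motion-free Haar lifting used in the decoder sketch earlier in this document: repeat ‘P’ and ‘U’ per level, and subtract the enlarged base layer pictures from the L frames of one chosen level. The names and the base-layer alignment are illustrative assumptions.

```python
def mctf_level(frames):
    # One temporal decomposition level: 'P' makes H (difference) frames
    # from odd frames, 'U' updates even frames into L (low-pass) frames.
    L, H = [], []
    for even, odd in zip(frames[0::2], frames[1::2]):
        h = odd - even       # 'P' operation (motion compensation omitted)
        l = even + h // 2    # 'U' operation (normalized difference added)
        L.append(l); H.append(h)
    return L, H

def mctf_encode_gop(gop, base_frames=None, bl_subtraction_level=2, levels=3):
    # Repeat 'P'/'U' on the L sequence of each level; at the chosen level,
    # subtract the temporally coincident enlarged base layer frames. With
    # levels=3, a 16-frame GOP collapses to two L frames, as in FIG. 3.
    h_per_level, L = [], list(gop)
    for level in range(1, levels + 1):
        L, H = mctf_level(L)
        h_per_level.append(H)
        if base_frames is not None and level == bl_subtraction_level:
            L = [l - b for l, b in zip(L, base_frames[::2 ** level])]
    return L, h_per_level
```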
  • If an enhanced layer sequence is produced by subtracting a sequence of small-screen frames provided in the base layer from a sequence of L frames of an appropriate level as described above, image redundancy is removed from the enhanced layer sequence, thereby reducing the amount of coded data and increasing coding gain.
  • While performing scalable encoding in the above manner, the MCTF encoder 100 incorporates, into a bitstream of the enhanced layer, timing information having the structure shown in FIG. 4, which is used to synchronize the enhanced layer with the base layer. The MCTF encoder 100 receives the information required to construct the timing information of FIG. 4 from the base layer encoder 150 and/or obtains it from externally input and set values. The timing information of FIG. 4 is inserted into the enhanced layer bitstream and transmitted either periodically or once, at the initial transmission of the bitstream.
  • In the timing information structure of FIG. 4, a field ‘flag_BL_fixed_frame_rate’ indicates whether or not the base layer bitstream is encoded at a fixed frame rate in the base layer encoder 150. A field ‘BL_time_increment_resolution’ carries the ‘resolution’ of the time value recorded in a field ‘BL_time_increment’. For example, if ‘1’ (second) is recorded in the field ‘BL_time_increment’ and ‘5’ is recorded in the field ‘BL_time_increment_resolution’, base layer frames are transmitted at 5 frames per second. A field ‘THR_temporal_coincident’ indicates the maximum time difference within which an enhanced layer frame and a base layer frame are regarded as having the same time; this field may, for example, hold a value in milliseconds. Specifically, when this value is 10, the decoder regards an enhanced layer frame and a base layer frame as having the same image (i.e., as being coincident) if the difference between the time value of the enhanced layer frame, which is inserted in the frame during encoding, and the time value of the base layer frame is less than 0.01 second, where the time value of a base layer frame = the number of received base layer frames * ‘BL_time_increment’ / ‘BL_time_increment_resolution’.
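  • As a worked example of this timing arithmetic (the function names are assumptions of this sketch):

```python
def base_layer_time(frame_count, bl_time_increment, bl_time_increment_resolution):
    """Time, in seconds, of the frame_count-th received base layer frame."""
    return frame_count * bl_time_increment / bl_time_increment_resolution

def is_temporally_coincident(t_enhanced, t_base, thr_temporal_coincident_ms):
    """True if the two frames are regarded as having the same image."""
    return abs(t_enhanced - t_base) < thr_temporal_coincident_ms / 1000.0

# BL_time_increment = 1 (second) and BL_time_increment_resolution = 5
# correspond to 5 base layer frames per second:
assert base_layer_time(5, 1, 5) == 1.0
# With THR_temporal_coincident = 10 (ms), frames 4 ms apart coincide:
assert is_temporally_coincident(0.204, 0.200, 10)
```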
  • When the estimator/predictor 102 performs the ‘P’ operation to produce an H frame, i.e., when it searches for a reference block for each macroblock in the current frame and converts each macroblock to a predicted image block, it can selectively use enlarged pictures of the base layer received from the scaler 105a, in addition to the neighbor L frames of the enhanced layer prior to and subsequent to the current frame, as shown in FIG. 5a.
  • In an example embodiment of the present invention, five frames are used to produce each H frame. FIG. 5b shows five frames that can be used to produce an H frame. Specifically, L frames 401 and 402 are in the same MCTF level as a current L frame 400L and are positioned prior to and subsequent to the L frame 400L, respectively. A frame 404 of the base layer is at the same time as the L frame 400L. Frames 403 and 405, positioned prior to and subsequent to the frame 404, respectively, are also used to produce an H frame 400H from the current L frame 400L.
  • FIG. 6 shows some examples of reference block selection modes according to an example embodiment of the present invention, in which one or two of the five frames are selected to convert a macroblock to image difference data. In FIG. 6:
    ‘Fwd_BL_mode’ uses a reference block present in a past picture in the base layer sequence.
    ‘Bwd_BL_mode’ uses a reference block present in a future picture in the base layer sequence.
    ‘Bid_BL_mode’ uses two reference blocks present in a past picture and in a future picture in the base layer sequence.
    ‘Fwd_BL_Bwd_EL_mode’ uses two reference blocks present in a past picture in the base layer and in a future picture in the enhanced layer.
    ‘Fwd_EL_Bwd_BL_mode’ uses two reference blocks present in a past picture in the enhanced layer and in a future picture in the base layer.
    ‘TC_pred_mode’ uses the pixel values of a corresponding block in the base layer picture at the same time as the current frame.
    ‘TC_pred_Bwd_BL_mode’ uses that temporally coincident base layer block together with a reference block present in a future picture in the base layer.
    ‘TC_pred_Fwd_BL_mode’ uses that temporally coincident base layer block together with a reference block present in a past picture in the base layer.
    ‘TC_pred_Bwd_EL_mode’ uses that temporally coincident base layer block together with a reference block present in a future picture in the enhanced layer.
    ‘TC_pred_Fwd_EL_mode’ uses that temporally coincident base layer block together with a reference block present in a past picture in the enhanced layer.
  • There are various other modes not shown in FIG. 6. To inform the decoder of which of the modes shown in FIG. 6, or of the various other modes not shown therein, is employed, the MCTF encoder 100 writes ‘reference block selection mode’ information, having the structure shown in FIG. 7, into a ‘Ref_Sel_mode’ field at a specified position of the header area of the corresponding macroblock, as shown in FIG. 8, and transmits it to the texture coding unit 110. The ‘Ref_Sel_mode’ field can also be inserted in the header of a frame (or slice) so that the same two reference pictures are used throughout that frame (or slice).
  • In the reference block selection mode information structure of FIG. 7, ‘flag_use_BL’ indicates whether or not the base layer is used for the reference block, and ‘reference_selection_code’ is a field in which a value identifying one of the above-mentioned modes is written. The value in the field ‘reference_selection_code’ indicates which one or two of the five frames described above are used to produce the image difference of the current macroblock.
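  • How a decoder might map a received ‘reference_selection_code’ to the frames used for prediction is sketched below for a subset of the modes. The numeric code values are assumptions of this sketch, since the text does not fix them.

```python
from enum import Enum

class RefSelMode(Enum):
    FWD_BL = 0         # past base layer picture
    BWD_BL = 1         # future base layer picture
    BID_BL = 2         # past and future base layer pictures
    FWD_BL_BWD_EL = 3  # past base layer + future enhanced layer
    FWD_EL_BWD_BL = 4  # past enhanced layer + future base layer
    TC_PRED = 5        # temporally coincident base layer block only

def reference_frames(mode, past_el, future_el, past_bl, coincident_bl, future_bl):
    """Return the one or two frames this mode predicts from."""
    table = {
        RefSelMode.FWD_BL: [past_bl],
        RefSelMode.BWD_BL: [future_bl],
        RefSelMode.BID_BL: [past_bl, future_bl],
        RefSelMode.FWD_BL_BWD_EL: [past_bl, future_el],
        RefSelMode.FWD_EL_BWD_BL: [past_el, future_bl],
        RefSelMode.TC_PRED: [coincident_bl],
    }
    return table[mode]
```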
  • The MCTF encoder 100 also transmits information on the level (i.e., the MCTF level) of the L frame sequence from which the base layer picture sequence has been subtracted, writing the level information, which has the structure shown in FIG. 9, into a ‘BL_subtraction’ field at a specified position of the header area of the corresponding GOP, as shown in FIG. 8.
  • In the information structure shown in FIG. 9, ‘flag_use_BL’ denotes information indicating whether or not the base layer is used for the corresponding GOP, and ‘BL_subtraction_level’ denotes information indicating the level of an L frame sequence from which the base layer picture sequence has been subtracted.
  • The data stream encoded by the method described above is transmitted to a decoding device by wire or wirelessly, or delivered via recording media. The decoding device restores the original video signal in the enhanced and/or base layer according to the method described below.
  • FIG. 10 is a block diagram of a device for decoding a data stream encoded by the device of FIG. 1. The decoding device of FIG. 10 includes a demuxer (or demultiplexer) 200, a texture decoding unit 210, a motion decoding unit 220, an MCTF decoder 230, and a base layer decoder 240. The demuxer 200 separates a received data stream into a compressed motion vector stream, a compressed macroblock information stream, and a base layer stream. The texture decoding unit 210 decompresses the macroblock information stream. The motion decoding unit 220 decodes the compressed motion vector information. The MCTF decoder 230 decodes the bitstream containing the macroblock information and the motion vectors according to an MCTF scheme. The base layer decoder 240 decodes the base layer stream according to a specified scheme, for example, the MPEG-4 or H.264 standard, and includes a scaler 240a that enlarges the small-screen picture sequence of the base layer to the enhanced layer picture size. The scaler 240a may instead be provided as a separate unit outside the base layer decoder 240.
  • The MCTF decoder 230 includes, as an internal element, an inverse filter that has a structure as shown in FIG. 11 for decoding an input bitstream into a frame sequence.
  • The inverse filter of FIG. 11 includes a front processor 236, an inverse updater 231, an inverse predictor 232, a motion vector decoder 235, and an arranger 234. The front processor 236 divides an input enhanced layer stream into H frames and L frames, and analyzes the information in each header in the enhanced layer stream. The inverse updater 231 subtracts the pixel difference values of input H frames from the corresponding pixel values of input L frames. The inverse predictor 232 restores input H frames to frames having original images with reference to the L frames, from which the image differences of the H frames have been subtracted, and/or with reference to enlarged pictures output from the scaler 240a. The motion vector decoder 235 decodes an input motion vector stream into motion vector information for each block and provides it to the inverse predictor 232. The arranger 234 interleaves the frames completed by the inverse predictor 232 between the L frames output from the inverse updater 231, thereby producing a normal video frame sequence.
  • Although a single inverse updater 231 and a single inverse predictor 232 are illustrated above, they are provided in multiple stages corresponding to the MCTF encoding levels described above. As denoted by “239” in FIG. 11, image values of the enlarged pictures from the scaler 240a are added to the corresponding image values of the L frames output from the inverse updater 231 of one of the multiple stages. Based on the value of the ‘BL_subtraction_level’ information shown in FIG. 9 and carried within the enhanced layer stream, the MCTF decoder 230 determines the stage (encoding level) of the L frames to which the base layer frames are to be added.
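  • A minimal sketch of this multi-stage inverse operation, mirroring the Haar-style encoding sketch given earlier (again, the lifting step stands in for full motion-compensated inverse prediction, and all names are assumptions of this sketch):

```python
def inverse_level(h_frames, l_frames, enlarged_bl=None):
    """Undo one MCTF level; optionally re-add enlarged base layer pictures."""
    frames = []
    for i, (h, l) in enumerate(zip(h_frames, l_frames)):
        if enlarged_bl is not None:
            l = l + enlarged_bl[i]  # adder "239": re-add the subtracted base layer image
        even = l - 0.5 * h          # inverse 'U' operation (inverse updater 231)
        odd = h + even              # inverse 'P' operation (inverse predictor 232)
        frames.extend([even, odd])  # arranger 234: interleave the results
    return frames

def mctf_decode(h_per_level, final_l, bl_subtraction_level=None, enlarged_bl=None):
    """Run the inverse stages, adding base layer frames at the signaled level."""
    frames = final_l
    for level in range(len(h_per_level), 0, -1):
        bl = enlarged_bl if level == bl_subtraction_level else None
        frames = inverse_level(h_per_level[level - 1], frames, bl)
    return frames
```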
  • The front processor 236 analyzes and divides an input enhanced layer stream into an L frame sequence and an H frame sequence. In addition, the front processor 236 uses information in each header in the enhanced layer stream to notify the inverse predictor 232 of which frame or frames have been used to produce macroblocks in the H frame. The used frame or frames can be determined from received ‘reference_selection_code’ information as shown in FIG. 7.
  • For each macroblock of an H frame, the inverse predictor 232 specifies an L frame in the enhanced layer and/or an enlarged frame in the base layer used to produce the predicted image of the macroblock, and determines a reference block in the specified frame(s) based on a motion vector provided from the motion vector decoder 235. It then adds the pixel values of the reference block, or the average pixel values of the two reference blocks, to the pixel difference values of the macroblock, thereby restoring the original image of the macroblock. When a base layer frame is used, the inverse predictor 232 refers to the timing information of FIG. 4, analyzed by the front processor 236, to specify an auxiliary frame in the base layer prior to, subsequent to, or temporally coincident with the current H frame. By counting the output frames of the base layer decoder 240, the time of each base layer frame can be determined from the information of FIG. 4, so that it is possible to determine whether a base layer frame is prior or subsequent to the current H frame. Whether a base layer frame is at the same time as the current H frame is determined from the time difference between the two frames and the value ‘THR_temporal_coincident’, as described above.
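  • The per-macroblock restore step described above reduces to adding the reference pixels, or the average of two reference blocks, back onto the residual. A minimal NumPy sketch (the function name is an assumption):

```python
import numpy as np

def restore_macroblock(residual, reference_blocks):
    """Add the reference block, or the average of two, to the residual."""
    pred = np.mean([r.astype(np.float64) for r in reference_blocks], axis=0)
    return np.clip(residual.astype(np.float64) + pred, 0, 255).astype(np.uint8)
```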
  • For one H frame, the MCTF decoding is performed in specified units, for example in units of slices, in a parallel fashion, so that all macroblocks in the frame approximately restore their original images, which are then combined to constitute a complete video frame.
  • The above decoding method restores an MCTF-encoded data stream to a complete video frame sequence. When the estimation/prediction and update operations have been performed N times on a GOP in the MCTF encoding procedure described above, performing the inverse prediction and update operations N times yields a video frame sequence with the original image quality, while performing them fewer than N times yields a video frame sequence with lower image quality at a lower bitrate. However, it may be possible to achieve higher image quality by decoding and outputting the base layer frame sequence instead of obtaining a low-bitrate video frame sequence according to the MCTF scheme. Accordingly, the decoding device is designed to perform the inverse prediction and update operations to the extent suitable for its performance, or to decode only the base layer stream.
  • The decoding device described above can be incorporated into a mobile communication terminal or the like or into a recording media playback device.
  • As is apparent from the above description, a method and device for encoding/decoding video signals according to the present invention have the advantage that a base layer provided for low-performance decoders is used, in addition to the enhanced layer, in the MCTF encoding procedure to produce H and L frames, thereby reducing the total amount of coded data and thus improving the MCTF coding efficiency.
  • Although this invention has been described with reference to the preferred embodiments, it will be apparent to those skilled in the art that various improvements, modifications, replacements, and additions can be made in the invention without departing from the scope and spirit of the invention. Thus, it is intended that the invention cover the improvements, modifications, replacements, and additions of the invention, provided they come within the scope of the appended claims and their equivalents.

Claims (19)

1. A method of decoding a first sequence of frames in an encoded video signal by inverse motion compensated temporal filtering, comprising:
selectively adding to a first image block in the first sequence image information, the image information being based on at least one of (1) a second image block from the first sequence and (2) a third image block from an auxiliary sequence of frames.
2. The method of claim 1, wherein the second image block is in a frame of the first sequence that is one of prior to and subsequent to a frame including the first image block.
3. The method of claim 2, wherein the third image block is from a frame in the auxiliary sequence of frames that is temporally aligned with a frame including the first image block.
4. The method of claim 3, wherein more than one frame of the auxiliary sequence is temporally aligned with the frame including the first image block.
5. The method of claim 1, where the selectively adding step adds to the first image block the third image block from the auxiliary sequence of frames from at least one of an image block temporally aligned with, before, and after the first image block.
6. The method of claim 4, where the selectively adding step adds to the first image block the third image block from the auxiliary sequence of frames from a combination of more than one of an image block temporally aligned with, before, and after the first image block.
7. The method of claim 1, wherein the selectively adding step adds to the first image block (1) the second image block from the first sequence and (2) the third image block from the auxiliary sequence of frames.
8. The method of claim 1, further comprising:
obtaining a decoding mode of the first image block based on information in the encoded video signal; and wherein
the selectively adding step determines the image information based on the obtained decoding mode.
9. The method of claim 8, wherein the obtaining step obtains the decoding mode from a header of the first image block.
10. The method of claim 9, wherein the obtaining step obtains information in the header of the first image block to synchronize the timing of the first sequence of frames with the auxiliary sequence of frames.
11. The method of claim 10, wherein the obtaining step obtains a frame rate from the information in the header of the first image block to synchronize the timing of the first sequence of frames with the auxiliary sequence of frames.
12. The method of claim 11, wherein the obtaining step obtains a frame rate and resolution of the auxiliary sequence of frames from the information in the header of the first image block to synchronize the timing of the first sequence of frames with the auxiliary sequence of frames.
13. The method of claim 9, wherein the obtaining step obtains information indicating a fixed frame rate of the auxiliary sequence of frames in the header of the first image block to synchronize the timing of the first sequence of frames with the auxiliary sequence of frames.
14. The method of claim 1, where the selectively adding step adds to the first image block one of (1) an adjacent image block positioned prior to the first image block or an adjacent image block positioned subsequent to the first image block in the first sequence, and (2) the third image block from the auxiliary sequence of frames from at least one of an image block temporally aligned with, before, and after an image block in the auxiliary sequence temporally aligned with the first image block.
15. The method of claim 1, where the selectively adding step adds to the first image block two of (1) an adjacent image block positioned prior to the first image block or an adjacent image block positioned subsequent to the first image block in the first sequence, and (2) the third image block from the auxiliary sequence of frames from at least one of an image block temporally aligned with, before, and after an image block in the auxiliary sequence temporally aligned with the first image block.
16. A device for decoding a first sequence of frames in an encoded video signal by inverse motion compensated temporal filtering, comprising:
an inverse estimator/predictor for selectively adding to a first image block in the first sequence image information, the image information being based on at least one of (1) a second image block from the first sequence and (2) a third image block from an auxiliary sequence of frames.
17. A device for encoding a first sequence of frames in a video signal by motion compensated temporal filtering, comprising:
an estimator for selectively subtracting from a first image block in the first sequence at least one of (1) a second image block from the first sequence and (2) a third image block from an auxiliary sequence of frames.
18. A method for encoding a first sequence of frames by motion compensated temporal filtering, comprising:
selectively subtracting a first image block in the first sequence from image information, the image information being based on at least one of (1) a second image block from the first sequence and (2) a third image block from an auxiliary sequence of frames.
19. A method of encoding a first sequence of frames by motion compensated temporal filtering, comprising:
adding encoding mode information to the first sequence indicating image information added to a first image block in a frame of the first sequence of frames.
US11/231,868 2004-09-23 2005-09-22 Method and device for encoding/decoding video signals using base layer Abandoned US20060062300A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US11/231,868 US20060062300A1 (en) 2004-09-23 2005-09-22 Method and device for encoding/decoding video signals using base layer
US13/067,497 US8885710B2 (en) 2004-09-23 2011-06-06 Method and device for encoding/decoding video signals using base layer
US14/510,496 US9338453B2 (en) 2004-09-23 2014-10-09 Method and device for encoding/decoding video signals using base layer

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US61218004P 2004-09-23 2004-09-23
KR10-2004-0116898 2004-12-30
KR1020040116898A KR100878809B1 (en) 2004-09-23 2004-12-30 Method of decoding for a video signal and apparatus thereof
US11/231,868 US20060062300A1 (en) 2004-09-23 2005-09-22 Method and device for encoding/decoding video signals using base layer

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/067,497 Continuation US8885710B2 (en) 2004-09-23 2011-06-06 Method and device for encoding/decoding video signals using base layer

Publications (1)

Publication Number Publication Date
US20060062300A1 (en) 2006-03-23

Family

ID=37138731

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/231,868 Abandoned US20060062300A1 (en) 2004-09-23 2005-09-22 Method and device for encoding/decoding video signals using base layer
US13/067,497 Active 2027-12-20 US8885710B2 (en) 2004-09-23 2011-06-06 Method and device for encoding/decoding video signals using base layer
US14/510,496 Expired - Fee Related US9338453B2 (en) 2004-09-23 2014-10-09 Method and device for encoding/decoding video signals using base layer

Country Status (2)

Country Link
US (3) US20060062300A1 (en)
KR (1) KR100878809B1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI376645B (en) * 2008-07-29 2012-11-11 Sonix Technology Co Ltd Image processing methof of noise reduction and apparatus thereof
US20130188718A1 (en) * 2012-01-20 2013-07-25 Qualcomm Incorporated Motion prediction in svc without including a temporally neighboring block motion vector in a candidate list
WO2013168952A1 (en) * 2012-05-08 2013-11-14 엘지전자 주식회사 Inter-layer prediction method and apparatus using same
WO2013176495A1 (en) * 2012-05-25 2013-11-28 엘지전자 주식회사 Interlayer prediction method and apparatus using same
KR101353655B1 (en) * 2012-06-22 2014-01-21 한국방송공사 Method and apparatus of scalable video coding using heterogeneous video codecs
WO2014017809A1 (en) * 2012-07-24 2014-01-30 한국전자통신연구원 Method of decoding images and device using same
US20150312579A1 (en) * 2012-12-04 2015-10-29 Intellectual Discovery Co., Ltd. Video encoding and decoding method and device using said method
CN104885454B (en) 2013-01-04 2017-09-15 维迪奥股份有限公司 Method, device and the system decoded for video
US20150350671A1 (en) * 2013-01-04 2015-12-03 Samsung Electronics Co., Ltd. Motion compensation method and device for encoding and decoding scalable video
CA2909595A1 (en) * 2013-04-17 2014-10-23 Wilus Institute Of Standards And Technology Inc. Method and apparatus for processing video signal
US9635371B2 (en) * 2013-05-31 2017-04-25 Qualcomm Incorporated Determining rounding offset using scaling factor in picture resampling
GB2539197B (en) * 2015-06-08 2019-10-30 Imagination Tech Ltd Complementary vectors

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050195900A1 (en) * 2004-03-04 2005-09-08 Samsung Electronics Co., Ltd. Video encoding and decoding methods and systems for video streaming service
US20090168872A1 (en) * 2005-01-21 2009-07-02 Lg Electronics Inc. Method and Apparatus for Encoding/Decoding Video Signal Using Block Prediction Information
US20090310680A1 (en) * 2006-11-09 2009-12-17 Lg Electronic Inc. Method and Apparatus for Decoding/Encoding a Video Signal

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6263022B1 (en) * 1999-07-06 2001-07-17 Philips Electronics North America Corp. System and method for fine granular scalable video with selective quality enhancement
US7929610B2 (en) * 2001-03-26 2011-04-19 Sharp Kabushiki Kaisha Methods and systems for reducing blocking artifacts with reduced complexity for spatially-scalable video coding
CN1253008C (en) * 2001-10-26 2006-04-19 皇家飞利浦电子股份有限公司 Spatial scalable compression
US20050195896A1 (en) * 2004-03-08 2005-09-08 National Chiao Tung University Architecture for stack robust fine granularity scalability
KR100587561B1 (en) * 2004-04-08 2006-06-08 삼성전자주식회사 Method and apparatus for implementing motion scalability
CN100588257C (en) * 2004-06-23 2010-02-03 新加坡科技研究局 Scalable video coding with grid motion estimation and compensation

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050271140A1 (en) * 2000-11-27 2005-12-08 Tsuyoshi Hanamura Bit stream separating and merging system, apparatus, method and computer program product
US20060062299A1 (en) * 2004-09-23 2006-03-23 Park Seung W Method and device for encoding/decoding video signals using temporal and spatial correlations between macroblocks
US8594191B2 (en) * 2008-01-03 2013-11-26 Broadcom Corporation Video processing system and transcoder for use with layered video coding and methods for use therewith
US20090175358A1 (en) * 2008-01-03 2009-07-09 Broadcom Corporation Video processing system and transcoder for use with layered video coding and methods for use therewith
US20090180761A1 (en) * 2008-01-11 2009-07-16 Martin Austin Wand System and method for recording high frame rate video, replaying slow-motion and replaying normal speed with audio-video synchronization
US8045836B2 (en) * 2008-01-11 2011-10-25 Texas Instruments Incorporated System and method for recording high frame rate video, replaying slow-motion and replaying normal speed with audio-video synchronization
US20110182354A1 (en) * 2010-01-26 2011-07-28 Wonkap Jang Low Complexity, High Frame Rate Video Encoder
US9426462B2 (en) 2012-09-21 2016-08-23 Qualcomm Incorporated Indication and activation of parameter sets for video coding
US9554146B2 (en) 2012-09-21 2017-01-24 Qualcomm Incorporated Indication and activation of parameter sets for video coding
CN105379278A (en) * 2013-07-18 2016-03-02 高通股份有限公司 Device and method for scalable coding of video information
US10212437B2 (en) 2013-07-18 2019-02-19 Qualcomm Incorporated Device and method for scalable coding of video information
US20160097858A1 (en) * 2014-10-06 2016-04-07 The Boeing Company Backfilling clouds of 3d coordinates
US9772405B2 (en) * 2014-10-06 2017-09-26 The Boeing Company Backfilling clouds of 3D coordinates

Also Published As

Publication number Publication date
US20110235714A1 (en) 2011-09-29
KR100878809B1 (en) 2009-01-14
US9338453B2 (en) 2016-05-10
US20150023428A1 (en) 2015-01-22
KR20060027778A (en) 2006-03-28
US8885710B2 (en) 2014-11-11

Legal Events

Date Code Title Description
AS Assignment

Owner name: LG ELECTRONICS INC., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PARK, SEUNG WOOK;PARK, JI HO;JEON, BYEONG MOON;REEL/FRAME:017085/0248;SIGNING DATES FROM 20051128 TO 20051129

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE