MXPA99007435A

MXPA99007435A - Video objects coded by keyregions

Info

Publication number: MXPA99007435A
Application number: MXPA/A/1999/007435A
Authority: MX
Inventors: Barin Geoffry Haskell; Atul Puri; Robert Lewis Schmidt
Original assignee: AT&T Corporation
Priority date: 1997-02-14
Filing date: 1999-08-11
Publication date: 2000-01-01

Abstract

A coding protocol provides for coding video data that has been organized as video objects. The protocol provides a keyregion to permit coding of a region of data within the video object that has common attributes. According to the protocol, a keyregion is identified by a keyregion header, which includes a resync marker that uniquely identifies the keyregion header; a keyregion position signal indicating an origin and a size of the keyregion; and data of the common attribute. Data following the keyregion header is coded according to the common attribute.

Description

VIDEO OBJECTS CODED BY KEY REGIONS

BACKGROUND OF THE INVENTION

The present invention relates to video coding and, more particularly, to the use of video objects in combination with key regions to improve coding efficiency and image quality.

The emergence of video objects and video object planes (VOPs) in video coding allows significant coding savings by selectively distributing bits between portions of a frame that require a relatively large number of bits and other portions that require relatively few. VOPs also allow additional functionality such as object manipulation.

As an example, Figure 1(a) illustrates a frame for coding that includes the head and shoulders of a narrator, a logo suspended within the frame and a background. The logo may be static, with no motion and no animation. In that case, bit savings can be achieved by coding the logo only once. For display, the coded logo can be decoded and displayed continuously from the single coded representation. Similarly, it may be desirable to code the background at a low refresh rate to save bits and yet create an illusion of movement in the reconstructed image. The bit savings achieved by coding the logo and background at lower rates may allow the narrator to be coded at a higher rate, where the perceptual significance of the image may reside. VOPs are suitable for these applications. Figures 1(b)-(d) illustrate the frame of Figure 1(a) decomposed into three VOPs. By convention, a background is generally assigned VOP0. The narrator and the logo may be assigned VOP1 and VOP2, respectively. Within each VOP, all image data is coded and decoded identically.

Not all data within a VOP merits identical treatment. For example, certain regions of a VOP may require animation, while others are relatively static. Consider the example of the narrator. The perceptually significant area of VOP1 centers on the figure's facial features.
The narrator's clothing and hair may not require animation to the same extent as the facial features. Accordingly, there is a need in the art for a coding system that emphasizes certain areas of a VOP over others.

In addition, regions of a VOP may possess similar characteristics. For example, some image data within the VOP may exhibit the same motion vector or may be quantized according to the same quantization parameters. Certain regions of a VOP may require greater resilience against channel errors. Coding efficiencies can be obtained by coding the shared attributes only once for the region. These efficiencies are lost unless the coding system provides a means to code the common attributes of a region differently from other regions of the VOP that do not share them.

Finally, it may be preferable to embed functionalities in certain regions of a VOP. For example, images may be superimposed over regions of a VOP. Consider an example where it is desirable to impose a logo image on the narrator's clothing in VOP1 and allow a viewer to selectively enable or disable display of the logo. Accordingly, there is a need in the art for associating functionalities with certain regions of a VOP.

SUMMARY OF THE INVENTION

The present invention alleviates the aforementioned needs in the art to a large extent by providing key regions for VOPs. The key regions exhibit one or more of the following properties:

• They are optional,
• They consist of a sequence of macroblocks,
• They are two-dimensional but need not be rectangular,
• A VOP may be divided into key regions, but not all macroblocks of a VOP must belong to a key region,
• A macroblock that is not a member of a key region may be a member of a background key region,
• Key regions begin and end in the same VOP,
• A macroblock that belongs to a key region belongs to only one key region, and
• Macroblocks of a key region share at least some common attribute.
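The membership properties listed above can be checked mechanically. The following Python sketch is illustrative only: the function name, the representation of a VOP as a set of macroblock numbers and the region names are assumptions made for the example, and the sample data borrows the macroblock numbers that the description uses for VOP1 in Figure 4.

```python
from typing import Dict, List, Set


def check_key_regions(vop: Set[int], regions: Dict[str, List[int]]) -> List[str]:
    """Return violations of the membership properties (empty list if none).

    `vop` holds the macroblock numbers inside the VOP; `regions` maps a
    hypothetical key region name to its macroblocks. Both representations
    are assumptions of this sketch, not part of the described bitstream.
    """
    problems: List[str] = []
    owner: Dict[int, str] = {}
    for name, members in regions.items():
        for mb in members:
            if mb not in vop:
                # A key region starts and ends inside a single VOP.
                problems.append(f"MB{mb} in {name} is outside the VOP")
            elif mb in owner and owner[mb] != name:
                # A macroblock may belong to only one key region.
                problems.append(f"MB{mb} is in both {owner[mb]} and {name}")
            else:
                owner[mb] = name
    # Macroblocks left without an owner are allowed: key regions are optional.
    return problems


vop1 = set(range(1, 17)) - {1, 2, 5}
k1 = [3, 4, 6, 7, 8, 9, 10, 11, 13, 14, 15]
print(check_key_regions(vop1, {"K1": k1}))                  # no violations
print(check_key_regions(vop1, {"K1": k1, "K2": [15, 16]}))  # MB15 claimed twice
```

Macroblocks MB12 and MB16 belong to no key region in the first call, which the properties permit.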
The key region is defined in the coded video information by a key region header that identifies the location and width of the key region. Data of the macroblocks in the key region is decoded by a decoder and placed sequentially within the key region until the decoder receives another key region header.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1(a) illustrates a video frame to be coded in accordance with the present invention; Figures 1(b)-1(d) represent video objects of the frame of Figure 1(a) to be coded in accordance with the present invention.

Figure 2 is a block diagram of the present invention.

Figure 3 represents the structure of a key region header generated in accordance with the present invention.

Figure 4 illustrates a video object coded by key region in accordance with the present invention.

Figure 5 illustrates the operation of a decoder operating in accordance with the present invention.

DETAILED DESCRIPTION

The present invention provides key regions for coding areas of VOPs at lower bit rates and with improved image quality. A key region is a collection of macroblocks within a VOP that are related according to one or more attributes. For example, the macroblocks within a key region may have been quantized according to the same quantization parameter, may exhibit the same motion vector and/or may have the same priority. Typically, however, the macroblocks do not merit coding as a separate VOP, since the bit costs associated with VOP coding would result in coding inefficiencies. Based on similarities among the macroblocks, coding efficiencies are obtained by organizing the macroblocks into key regions and coding the common information only once.

In accordance with the present invention, as illustrated in Figure 2, an encoder 100 receives a video signal representative of a frame or frames to be coded. The video signal is sampled and organized into macroblocks, which are spatial areas of each frame.
The encoder 100 codes the macroblocks and outputs a coded bitstream to the channel 150. The bitstream may identify some macroblocks as having been organized and coded as VOPs. The channel 150 may be a radio channel, a computer network or a storage medium such as a memory or a magnetic or optical disk. A decoder 200 retrieves the bitstream from the channel 150 and reconstructs a video signal from it for display.

The encoder 100 defines a VOP in the bitstream by generating a VOP header. The VOP header defines the position, shape and size of the VOP. As is known, the shape of a VOP can be defined down to a one-pixel or two-pixel level. After decoding a VOP header, the decoder 200 knows which macroblocks or portions of macroblocks are members of the VOP and which are not. When implemented with the present invention, the VOP header contains a key region enable signal indicating whether the VOP contains one or more key regions. The key region enable signal may be as short as a single bit at a predetermined position in the VOP header.

In the bitstream, a key region is defined by an overhead signal called a "key region header" followed by data for the macroblocks of the key region. Figure 3 illustrates the data structure of the key region header 300. To indicate the occurrence of a key region, the encoder 100 generates a resync marker 310, a code having a unique predetermined bit sequence. The resync marker sequence cannot occur naturally in the VOP. The encoder 100 also generates a macroblock number signal 320 that identifies the macroblock that is the origin of the key region. For example, the macroblock number 320 may define the address of the upper left corner of the key region. The macroblock number 320 is a code whose length is determined by the size of the VOP in which the key region resides. The encoder 100 also generates a key region width signal 330, which defines the width of the key region in macroblocks.
The width field 330 likewise has a length determined by the size of the VOP in which the key region resides. The macroblock number field 320 and the width field 330 define a bounding box that circumscribes the key region. For example, to code the key region K1 within VOP1 (shown in Figure 4), the macroblock number field identifies macroblock MB1 as the origin of the key region. The width field 330 specifies that the key region is four macroblocks wide. These fields define a bounding box B1 bounded by lines L1, L2 and L3. The bottom edge of the bounding box B1 is not defined by the macroblock number and width fields. By default, the key region is defined as occupying the entire area of the bounding box B1 that falls within the area of VOP1. Thus, the default key region includes macroblocks MB3-4, MB6-16, etc. In the VOP header, macroblocks MB1, MB2 and MB5 would be defined as excluded from VOP1.

If the key region takes an irregular shape, as the key region K1 does in Figure 4, the shape is defined by a shape refine field 340. The shape refine field 340 follows the width field 330 in the key region header 300. It contains a shape refine flag 342, a one-bit code that, when set, indicates that the key region takes an arbitrary shape. If the flag 342 is set, it is followed by a shape code 344 that identifies which of the macroblocks contained within the bounding box lie within the key region. The shape code 344 provides one bit for each macroblock contained in the bounding box, provided the macroblock falls within the area of the VOP. The state of the bit determines whether the associated macroblock is included in the key region. If the flag 342 is cleared, the shape code 344 is omitted from the shape refine field 340.

Consider again VOP1 in Figure 4.
As noted, the macroblock number field 320 and the width field 330 define the bounding box as every macroblock of VOP1 that falls within a column starting at macroblock MB1, extending laterally four macroblocks from MB1 and extending vertically to the bottom of VOP1. However, the key region K1 is irregular: it includes only macroblocks MB3, MB4, MB6-11 and MB13-15. To define the irregular shape of the key region, the shape code 344 is a 13-bit code that identifies which macroblocks are part of the irregular key region. The following table demonstrates how the shape code 344 defines the membership of each macroblock in the key region K1:

Macroblock     Member of K1
MB3, MB4       yes
MB6-MB11       yes
MB12           no
MB13-MB15      yes
MB16           no

Again, data for macroblocks MB1, MB2 and MB5 is not provided in the shape code 344 because they were defined as not being members of VOP1. If a shape code 344 is included in the key region header 300, the shape code 344 identifies how many macroblocks are contained in the key region.

The key region header 300 also identifies data that is common throughout the key region. For example, a quantization parameter field 350, a motion compensation field 360 and a priority field 370 may be provided for any key region. Preferably, each is identified in the key region header 300 by a one-bit flag which, if set, is followed by a code representing the attribute value. The key region may have more than one common attribute. The decoder 200 uses the common attribute information to decode the macroblocks that follow the key region header 300.

The key region header 300 is followed by a variable-length sequence of macroblock data (not shown) representing image data of the macroblocks within the key region. For key region macroblocks that overlap the edge of the VOP, the decoder interprets the coded data as representing only the portion of the macroblock that falls within the VOP, according to conventional coding.

At the time of this writing, the MPEG-4 video standard is being drafted. The key region coding scheme of the present invention has been proposed for integration into the MPEG-4 video standard. Under this proposal, the resync marker 310 is defined as a sequence of sixteen zeros followed by a one, "0000 0000 0000 0000 1". The macroblock number code 320 is a 1 to 12 bit code representing the address of the upper left corner of the bounding box.
The code length is determined by the following formula:

Length = (VOP width × VOP height) / (16 × 16)

The width field 330 is a 1 to 7 bit code that represents the width of the key region in macroblock units. The length of the width field likewise depends on the width of the VOP. The shape refine field 340 is a one-bit code. The quantization parameter value, the priority value and the motion vector values are each two-bit codes.

Figure 5 illustrates a method of operation of the decoder 200. The decoder 200 detects a key region when it detects the resync marker (step 1010). The decoder 200 decodes the key region header 300 to construct the key region. The decoder detects the macroblock number field 320 and the width field 330, which define the bounding box B1 that circumscribes the key region K1 (steps 1020 and 1030). By default, the decoder 200 defines the key region to include every macroblock that falls within the intersection of the VOP and the bounding box B1 (step 1040). The decoder 200 then receives the shape refine field (step 1050). If the shape refine flag 342 is set (step 1060), the decoder 200 decodes the shape code 344 (step 1070) to identify macroblocks of the bounding box B1 that are excluded from the key region (step 1080).

The decoder 200 receives and decodes the common attribute data (step 1090). Using the attribute data, the decoder 200 receives and decodes macroblock data and places each macroblock sequentially in position according to a raster scan direction (left to right, then down) over the key region (steps 1100 and 1110). The decoder 200 does not place decoded data in any position that is not included within the key region.

After receiving and decoding the key region header 300, the decoder 200 treats all successive data as macroblock data until it is interrupted. For example, following the macroblock data of the key region, the bitstream may include another resync marker indicating the start of another key region.
Alternatively, the bitstream may include a subsequent VOP header or another data pattern indicating the occurrence of another type of data. When the decoder 200 detects such a data pattern in the bitstream, it stops decoding data as macroblock data associated with the key region.
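The decoding steps of Figure 5 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the invention: the function names, the use of '0'/'1' strings for bits, the numbering of VOP1's macroblocks as a 4 x 4 grid in raster order, and the polarity of the shape code (a 1 bit keeps the macroblock) are all choices made for the example.

```python
from typing import Dict, List, Optional, Set

RESYNC = "0" * 16 + "1"  # marker proposed in the text: sixteen zeros, then a one


def find_resync(bits: str, start: int = 0) -> int:
    """Step 1010: index of the next resync marker in a '0'/'1' string, or -1."""
    return bits.find(RESYNC, start)


def bounding_box(origin: int, width: int, cols: int, rows: int) -> List[int]:
    """Steps 1020-1030: macroblock numbers (1-based, raster order) in the box
    that starts at `origin`, spans `width` columns and runs to the bottom row."""
    row0, col0 = divmod(origin - 1, cols)
    return [r * cols + c + 1
            for r in range(row0, rows)
            for c in range(col0, min(col0 + width, cols))]


def key_region(origin: int, width: int, cols: int, rows: int,
               vop: Set[int], shape_code: Optional[str] = None) -> List[int]:
    """Steps 1040-1080: default region, optionally narrowed by a shape code."""
    # Default key region: bounding-box macroblocks that also lie in the VOP.
    default = [mb for mb in bounding_box(origin, width, cols, rows) if mb in vop]
    if shape_code is None:  # shape refine flag not set
        return default
    if len(shape_code) != len(default):
        raise ValueError("shape code needs one bit per default macroblock")
    # Assumed polarity: '1' means the macroblock is part of the key region.
    return [mb for mb, bit in zip(default, shape_code) if bit == "1"]


def place(members: List[int], payloads: List[str]) -> Dict[int, str]:
    """Steps 1100-1110: assign decoded macroblocks to positions in raster order."""
    return dict(zip(members, payloads))


# VOP1 of Figure 4: MB1, MB2 and MB5 are outside the VOP.
vop1 = set(range(1, 17)) - {1, 2, 5}
default_k1 = key_region(1, 4, 4, 4, vop1)
k1 = key_region(1, 4, 4, 4, vop1, shape_code="1111111101110")
placed = place(k1, [f"data{i}" for i in range(len(k1))])
print(len(default_k1), k1)
```

With the 13-bit code shown, MB12 and MB16 are dropped from the default region, leaving MB3, MB4, MB6-11 and MB13-15, and the first decoded macroblock lands at MB3.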

The present invention provides a system for coding and decoding key regions in video object planes. Key regions achieve efficient coding of VOP data when a portion of the data shares common attributes that are not shared throughout the VOP. For example, when a specific region of a VOP requires coding at a higher resolution than the rest of the VOP, a single quantization parameter can be set for the region using the key region of the present invention. Coding of the high-resolution image segment uses a greater number of bits than the rest of the VOP. In this way, bits are conserved in coding the rest of the VOP. Likewise, motion information or priority information can be coded for a key region, yielding coding efficiencies that would not be achieved if the attribute data were either applied to the entire VOP or established on a macroblock-by-macroblock basis.

It is noted that, as of this date, the best method known to the applicant for carrying out the aforementioned invention is that which is clear from the present description of the invention.

Claims

Having described the invention as above, the content of the following claims is claimed as property:

1. A method for encoding video information within a video object as a key region, characterized in that it comprises the steps of: generating a key region header, the header comprising: a resync marker that uniquely identifies the key region header, and a key region position signal indicating an origin and a size of the key region; and encoding the video information within a bounding box defined by the origin and size of the key region.

2. The method according to claim 1, characterized in that the key region position signal includes a signal indicating an origin of the key region.

3. The method according to claim 1, characterized in that the key region position signal includes a signal indicating a width of the key region.

4. The method according to claim 3, characterized in that the data is representative of video information contained in an area where the key region and the video object overlap.

5. The method according to claim 1, characterized in that it further comprises a step of generating a shape refine signal that represents a shape of the key region.

6. The method according to claim 1, characterized in that the key region position signal defines a default key region.

7. The method according to claim 6, characterized in that it further comprises a step of generating a shape refine signal that represents a shape of the key region, the signal comprising a shape refine flag that indicates whether the key region takes an irregular shape and, when the shape refine flag is set, a shape code that identifies which data within the default key region is contained within the irregular key region.

8. A method for decoding video object data, characterized in that it comprises the steps of: receiving the data; when the data includes a resync marker, receiving a key region position signal from the encoded data; identifying a key region area from the key region position signal; receiving encoded image data associated with the key region; decoding the encoded image data; and placing the decoded video data within the key region area.

9. The decoding method according to claim 8, characterized in that the key region position signal includes a signal representative of an origin of the key region and a width signal representative of a width of the key region, wherein the identifying step includes a step of identifying the default key region area as an area starting at the origin, extending laterally from the origin as determined by the width signal and extending to a bottom of the video object.

10. The decoding method according to claim 8, characterized in that the placing step includes a step of placing the decoded video data in an area where the key region and the video object overlap.

11. The decoding method according to claim 8, characterized in that the video data is placed in the area according to a raster scan direction.

12. The decoding method according to claim 8, characterized in that it further comprises the step of detecting an attribute signal that identifies an attribute value that is common to the video data of the key region.

13. The decoding method according to claim 12, characterized in that the decoding step decodes the video data according to the attribute value.

14. A bitstream that represents video information of a video object, the bitstream being produced by the process of: generating a key region header, the header comprising: a key region start code that identifies the key region header, a key region position signal indicating the position of the key region, and a key region width signal representing the width of the key region; and generating data representative of video information of the video object in an area bounded by the key region.

15. A method for decoding encoded data of a video object, the video object being represented as a plurality of key regions, comprising the steps of: detecting a key region header from the encoded data; detecting a key region position signal from the encoded data; constructing a key region from the key region position signal; detecting, from the encoded data, data representative of video information of the key region; decoding the video data; and placing the video data within the key region.