WO2008056878A1 - Method for determining packet type for svc video bitstream, and rtp packetizing apparatus and method using the same

Info

Publication number: WO2008056878A1
Application number: PCT/KR2007/004413
Authority: WO
Inventors: Soon-Heung Jung; Kwang-Deok Seo; Jae-Gon Kim; Jin-Woo Hong
Original assignee: Electronics And Telecommunications Research Institute
Priority date: 2006-11-09
Filing date: 2007-09-12
Publication date: 2008-05-15

Abstract

Provided are a method for determining the packet type for a Scalable Video Coded (SVC) video bitstream, and a Real-time Transport Protocol (RTP) packetizing apparatus and method using the same. The method for determining a packet type for a Scalable Video Coded (SVC) video bitstream, which includes the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the Scalable Video Coding (SVC); and c) determining a Real-time Transport Protocol (RTP) packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.

Description

METHOD FOR DETERMINING PACKET TYPE FOR SVC VIDEO BITSTREAM, AND RTP PACKETIZING APPARATUS AND METHOD USING THE SAME

Technical Field

[1] The present invention relates to a method for determining the packet type for a Scalable Video Coded (SVC) video bitstream, and a Real-time Transport Protocol (RTP) packetizing apparatus and method using the same; and, more particularly, to a method for determining the packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.

Background Art

[2] Scalable Video Coding (SVC), which is a scalable coding technique of H.264, is a new scalable coding technique that is developed to solve the problems of low compression efficiency, unsupportability of combined scalability, and high implementation complexity, which are caused by layered coding-based scalability attempted in existing Moving Picture Experts Group 2 (MPEG-2), MPEG-4, etc.

[3] SVC encodes multiple video layers into a single bit sequence. The layers of SVC include one base layer and scalable layers that can be continuously stacked over the base layer.

[4] Each scalable layer is able to express the maximum bit rate, frame rate and resolution that are given to itself based on low-order layer information.

[5] The more the SVC continuously stacks scalable layers, the more diverse bit rates, frame rates, and resolutions it is possible to support. Thus, the SVC is a coding technique suitable for multimedia contents service in a Universal Multimedia Access (UMA) environment that can solve the problem of variability in bandwidth that occurs in a heterogeneous network environment, the problem of variability in receiving terminal performance and resolution, the problem of various preferences of contents consumers and so on in a complex way.

[6] A Video Coding Layer (VCL) of an SVC encoder generates base layer encoding information and scalability encoding information of the scalable layers in slices.

[7] Each slice is generated in Network Abstraction Layer (NAL) units in an NAL and stored in an SVC bitstream.

[8] Although an RTP payload format for loading the NAL units of the SVC is currently disclosed in an internet draft document "draft-wenger-avt-rtp-svc-02.txt", the SVC is of a complicated structure that stores encoding information of SNR scalability and temporal and spatial scalability, as well as base layer encoding information that is compatible with H.264, in a single bit stream. Thus, no research has provided a result yet on an effective RTP packetizing method that can support the RTP payload format of the SVC.

[9] There are a total of seven RTP packet types for the NAL units of the SVC: Single NAL Unit (SNU), Single-Time Aggregation Packet-A (STAP-A), STAP-B, Multi-Time Aggregation Packet 16 (MTAP16), MTAP24, Fragmentation Unit-A (FU-A), and FU-B.

[10] The SNU type can load only one NAL unit in one RTP packet, and the STAP can simultaneously load multiple NAL units that belong to the same presentation time instant in one RTP packet. The STAP is divided into an STAP-A type that loads NAL units in an RTP packet in decoding order and an STAP-B type that loads NAL units in an RTP packet without regard to decoding order, for interleaving purposes.

[11] The MTAP can load multiple NAL units belonging to different presentation time instants in one RTP packet at a time and basically supports interleaving. The MTAP is divided into an MTAP16 type supporting a 16-bit time offset and an MTAP24 type supporting a 24-bit time offset, depending on the size of the time offset field that indicates the difference in presentation time instant between the NAL units.

[12] Of these seven RTP packet types, only those required by a given application field are grouped together into three RTP packetization modes. Fig. 1 shows the RTP packet types that can be supported by the three modes: an SNU mode, a non-interleaved mode, and an interleaved mode.

[13] The SNU mode of Fig. 1 supports only the SNU type, which loads a single NAL unit whose "NAL_unit_type" is in the range 1 to 23 (Fig. 2) in an RTP packet, so its field of application is restricted.

[14] On the other hand, the non-interleaved mode is able to support the STAP-A and the FU-A as well as the SNU type, and thus its practical range of application is wide.

[15] The interleaved mode adds an interleaving function to the non-interleaved mode, and has the drawback that it cannot support the SNU type. Because the interleaving function loads the NAL units in the RTP packet in an order different from the decoding order, a burst error in a channel can be dealt with effectively, but RTP packetization, de-packetization, and the SVC decoding procedure become very complicated.
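The mode descriptions above can be summarized as a small lookup table. This is only an illustrative sketch: Fig. 1 is not reproduced in this text, so the set listed for the interleaved mode and the type numbers other than 24 and 28 are taken from the H.264 RTP payload format (RFC 3984), and the names `MODES`, `PACKET_TYPE_NUMBERS`, and `is_allowed` are hypothetical.

```python
# Payload "Type" values of the aggregation and fragmentation packets.
# 24 (STAP-A) and 28 (FU-A) are named in the description; the others are
# assumed from the H.264 RTP payload format (RFC 3984).
PACKET_TYPE_NUMBERS = {
    "STAP-A": 24, "STAP-B": 25, "MTAP16": 26,
    "MTAP24": 27, "FU-A": 28, "FU-B": 29,
}

# Packet types permitted in each packetization mode.
MODES = {
    "snu": {"SNU"},
    "non-interleaved": {"SNU", "STAP-A", "FU-A"},
    # Assumption: set taken from RFC 3984, since Fig. 1 is not available here.
    "interleaved": {"STAP-B", "MTAP16", "MTAP24", "FU-A", "FU-B"},
}


def is_allowed(mode, packet_type):
    """Return True if packet_type may be used in the given mode."""
    return packet_type in MODES[mode]
```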

[16] Therefore, in view of the implementation complexity and the range of application, the non-interleaved mode is suitable as the RTP packetization mode that must be supported in a commercial SVC streaming service, and the interleaved mode can be considered as an option for a service in an environment with a high channel error rate.

[17] The SNU type of the non-interleaved mode loads one NAL unit whose "NAL_unit_type" is in the range 1 to 23 (Fig. 2) in one RTP packet.

[18] The STAP-A type of the non-interleaved mode has an RTP payload format structure as shown in Fig. 3, and aggregates several NAL units corresponding to the same presentation time instant and loads them in one RTP packet.

[19] The STAP-A type of the non-interleaved mode, as shown in Fig. 3, has a 1-byte RTP payload header (STAP-A NAL HDR) additionally inserted therein, unlike the SNU type. The F field of the payload header is set to "1" if at least one of the NAL units to be loaded together has an F field value of "1" in its header.

[20] The NRI field of the payload header is set to the maximum value of the NRI field values indicated in each of the headers of the NAL units to be loaded together.

[21] In the "Type" field of the payload header, "NAL_unit_type" of No.24 in Fig. 3 is set in order to show that this is a STAP-A type.

[22] In addition, separately from the payload header information, a 2-byte "NALU_Size" field representing the size of each NAL unit to be loaded is inserted in front of each NAL unit.

[23] The FU-A type of the non-interleaved mode is used when the size of one NAL unit exceeds the Maximum Transmission Unit (MTU) of a network. It divides the NAL unit into two or more parts, none of which exceeds the MTU size, and loads the divided parts in respective RTP packets in order to prevent packet fragmentation in a router or gateway during transmission.

[24] Fig. 4 illustrates the structure of an RTP payload format for the FU-A type. The RTP payload header is composed of a total of 2 bytes including one byte of "FU_indicator" and one byte of "FU_header".

[25] The values indicated in the header of the NAL unit are applied to the F field and NRI field of "FU_indicator" as they are.

[26] "NAL_unit_type" of "No.28" in Fig. 1 is set in the "Type" field of "FU_indicator" in order to show that this is the FU-A type.

[27] The S field and E field of "FU_header" are used to show that the part being loaded is the start part or the end part of a NAL unit, respectively.

[28] In the "Type" field of the "FU_header", the "NAL_unit_type" value indicating the encoding contents contained in the NAL unit is set, as shown in Fig. 2.

[29] That is, as described above, although the RTP packet types for the NAL units stored in an SVC bitstream are standardized, no criterion or method has been suggested for determining a suitable packet type for a given NAL unit.
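The STAP-A and FU-A payload layouts described in paragraphs [19] to [28] can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names are hypothetical, each NAL unit is assumed to be a `bytes` object whose first octet holds the F (1 bit), NRI (2 bits), and Type (5 bits) fields, and the RTP header itself is left out.

```python
import struct

STAP_A_TYPE = 24  # "Type" value marking a STAP-A payload
FU_A_TYPE = 28    # "Type" value marking an FU-A payload


def build_stap_a(nal_units):
    """Aggregate complete NAL units into one STAP-A payload."""
    f_bit = max(nal[0] >> 7 for nal in nal_units)        # 1 if any unit has F = 1
    nri = max((nal[0] >> 5) & 0x3 for nal in nal_units)  # maximum NRI value
    payload = bytearray([(f_bit << 7) | (nri << 5) | STAP_A_TYPE])
    for nal in nal_units:
        payload += struct.pack("!H", len(nal))  # 2-byte NALU_Size, network order
        payload += nal
    return bytes(payload)


def fragment_fu_a(nal, max_payload):
    """Split one NAL unit into FU-A payloads of at most max_payload bytes."""
    chunk = max_payload - 2  # room left after FU_indicator and FU_header
    if chunk < 1:
        raise ValueError("max_payload must leave room for fragment data")
    header, body = nal[0], nal[1:]
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    if len(pieces) < 2:
        raise ValueError("FU-A is only used when the unit needs two or more parts")
    fu_indicator = (header & 0xE0) | FU_A_TYPE  # copy F and NRI, Type = 28
    fragments = []
    for index, piece in enumerate(pieces):
        start = 0x80 if index == 0 else 0               # S field
        end = 0x40 if index == len(pieces) - 1 else 0   # E field
        fu_header = start | end | (header & 0x1F)       # original NAL_unit_type
        fragments.append(bytes([fu_indicator, fu_header]) + piece)
    return fragments
```

In this sketch the first header octet of the fragmented NAL unit is not transmitted as payload data, because its F, NRI, and Type values are carried by "FU_indicator" and "FU_header"; any further header octets of a scalable layer NAL unit simply travel as part of the first fragment.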

[30] Consequently, the present invention proposes a practical RTP packetization algorithm which can effectively load NAL units of an SVC in an RTP payload while maintaining the specification of the RTP payload format.

Disclosure of Invention

Technical Problem

[31] It is, therefore, an object of the present invention to provide a method for determining a packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.

[32] Other objects and advantages of the present invention can be understood by the following description, and become apparent with reference to the embodiments of the present invention. Also, it is obvious to those skilled in the art of the present invention that the objects and advantages of the present invention can be realized by the means as claimed and combinations thereof.

Technical Solution

[33] In accordance with an aspect of the present invention, there is provided a method for determining a packet type for a Scalable Video Coded (SVC) video bitstream, which includes the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and c) determining a Real-time Transport Protocol (RTP) packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.

[34] The temporal and spatial hierarchy information derivation step of a) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.

[35] The encoding information type detection step of b) is carried out by analyzing "NAL_unit_type" values of the NAL units belonging to the base layer and the scalable layers, which are the "NAL_unit_type" values indicating encoding information. The packet type determined in the step c) is any one among Single NAL Unit (SNU), Fragmentation Unit-A (FU-A), and Single-Time Aggregation Packet-A (STAP-A) types of a non-interleaved mode.

[36] In accordance with another aspect of the present invention, there is provided a packetizing method for an SVC video bitstream, which includes the steps of: a) determining a packet type for the SVC video bitstream; and b) fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the determined packet type and loading the fragments in RTP packets.

[37] In accordance with another aspect of the present invention, there is provided an apparatus for packetizing an SVC video bitstream, which includes: a packet type determiner for determining a packet type for the SVC video bitstream; and a packet generator for generating a packet by fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the packet type determined by the packet type determiner and loading the fragments in RTP packets.

Advantageous Effects

[38] As mentioned above and as will be described below, the present invention can efficiently determine the packet type for an SVC bitstream and perform RTP packetization using the same.

[39] As a result, the present invention can more efficiently transmit an SVC video bitstream through an IP network such as the internet.

Brief Description of the Drawings

[40] Fig. 1 is a table showing a packet type supportable for each RTP packetization mode.

[41] Fig. 2 is a table summarizing contents contained in NAL units by NAL_unit_types.

[42] Fig. 3 is an explanatory view showing an RTP payload format structure for a STAP-A type.

[43] Fig. 4 is an explanatory view showing an RTP payload format structure for an FU-A type.

[44] Figs. 5 and 6 are explanatory views showing the header structures of NAL units used in a base layer and scalable layers of an SVC in accordance with the present invention.

[45] Fig. 7 is an explanatory view showing a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.

[46] Fig. 8 is an explanatory view of the encoding order of SVC screens and of NAL units of the base layer and scalable layers belonging to each screen in accordance with the present invention.

[47] Fig. 9 is a detailed flow chart illustrating an RTP packetizing method in accordance with a preferred embodiment of the present invention.

[48] Fig. 10 is a block diagram illustrating the structure of an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.

Best Mode for Carrying Out the Invention

[49] The advantages, features and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter. Thus, the present invention will be easily practiced by those skilled in the art. Further, in the following description, well-known arts will not be described in detail if it seems that they could obscure the invention in unnecessary detail. Hereinafter, specific embodiments of the present invention will be set forth in detail with reference to the accompanying drawings.

[50] Figs. 5 and 6 are explanatory views showing header structures of NAL units used in the base layer and scalable layers of an SVC in accordance with the present invention.

[51] Encoding information generated by SVC encoding is stored in a bit stream in NAL units. As shown in Figs. 5 and 6, the header structure of the NAL unit generated in the base layer and that of the NAL unit generated in the scalable layers are different from each other.

[52] Fig. 5 shows the header structure of the NAL unit generated in the base layer compatible with H.264, and Fig. 6 depicts the header structure of the NAL unit generated in the scalable layers.

[53] In Figs. 5 and 6, the "Type" field means "NAL_unit_type" representing information on contents of encoding information contained in the NAL unit, and shows the type of encoding information contained in the NAL unit for each of the aforementioned "NAL_unit_types" of Fig. 2.

[54] The "NAL_unit_types" of Nos. 1 to 6 are usable in the NAL units of the base layer, while the "NAL_unit_types" of Nos. 20 and 21 are usable in the scalable layers. The other "NAL_unit_types" are used to indicate NAL units that contain additional information rather than encoding information.

[55] In the present invention, temporal and spatial hierarchy for each of the NAL units can be derived from (TL, DID, and QL) field information defined in the header of each NAL unit of the scalable layers.

[56] The (TL, DID, QL) field, which is the last octet, in Fig. 6 represents the inter-layer hierarchy in the temporal and spatial SNR scalability. That is, TL (temporal_level) represents the hierarchy between temporal layers for temporal scalability, DID (dependency_id) indicates the dependency hierarchy between higher/lower scalable layers in the inter-layer prediction of spatial scalability, and QL (quality_level) represents the hierarchy between FGS layers for support of SNR scalability.

[57] The TL, DID, and QL values are all integers greater than or equal to "0", and the temporal and spatial hierarchy of the NAL units can be derived from a combination of these values.
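Reading these header fields can be sketched as below. The split of the first octet into F (1 bit), NRI (2 bits), and Type (5 bits) follows the H.264 NAL unit header. The bit widths of TL, DID, and QL are not stated in this description, so the 3/3/2-bit layout used here is an assumption for illustration only, and the names `Hierarchy`, `parse_first_octet`, and `parse_hierarchy` are hypothetical.

```python
from typing import NamedTuple


class Hierarchy(NamedTuple):
    tl: int   # temporal_level
    did: int  # dependency_id
    ql: int   # quality_level


def parse_first_octet(octet):
    """Split the first NAL unit header octet into (F, NRI, Type)."""
    return octet >> 7, (octet >> 5) & 0x3, octet & 0x1F


def parse_hierarchy(last_octet):
    """Extract (TL, DID, QL) from the last header octet of a scalable layer NAL unit.

    ASSUMPTION: 3 bits of TL, then 3 bits of DID, then 2 bits of QL,
    most significant bits first. The description does not give the widths.
    """
    return Hierarchy(last_octet >> 5, (last_octet >> 2) & 0x7, last_octet & 0x3)
```

For example, under the assumed layout the octet `0b10000101` would decode to (TL, DID, QL) = (4, 1, 1), the combination that paragraphs [83] and [84] give for the FGS NAL units of the spatial scalable layer at the highest temporal level.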

[58] Based on the (TL, DID, QL) information and the "NAL_unit_types" thus analyzed, a practical RTP packetization algorithm that can be effectively applied to combined scalability of the SVC is proposed.

[59] Fig. 7 is a view showing an example of a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.

[60] In Fig. 7, only an Instantaneous Decoding Refresh (IDR) screen, which is the start part of an SVC stream, and a first Group Of Pictures (GOP) are shown. One GOP consists of 16 screens, and the other GOPs not shown in the drawing also have a GOP size of 16.

[61] The screen resolution that can be supported by the base layer is QCIF, and the screen resolution that can be supported by the spatial scalable layers is CIF.

[62] In order to indicate different resolutions in different spatial layers, the DID value in the (TL, DID, QL) field of Fig. 6 is used.

[63] That is, in Fig. 7, the NAL unit with DID=0 means a screen with a resolution of QCIF, and the NAL unit with DID=1 represents a screen with a resolution of CIF.

[64] A hierarchical B-picture approach is applied for provision of temporal scalability, and the TL value in the (TL, DID, QL) field is used in order to indicate a supportable frame rate.

[65] In Fig. 7, the TL value is displayed in the middle part of each screen indicated in a rectangle. In case of transmitting only a key picture with TL=0, the frame rate can be supported up to 1.875 fps, and in case of also transmitting a B-picture with TL=1, the frame rate can be supported up to 3.75 fps.

[66] In case of additionally transmitting a B-picture with TL=2, the frame rate can be supported up to 7.5 fps, and in case of additionally transmitting B-pictures with TL=3 and TL=4, the frame rate can be supported up to 15 fps and 30 fps, respectively.

[67] In Fig. 7, as the maximum TL value in the base layer is 3, the frame rate can be supported up to 15 fps in QCIF standard, and as the maximum TL value in the spatial scalable layer is 4, the frame rate can be supported up to 30 fps in CIF standard.

[68] If the screens at the same point of time belonging to the base layer and the spatial scalable layer have the same TL value, inter-layer prediction encoding is executed in the direction of an arrow indicated by dotted lines in the drawing. The base layer screen of the QCIF standard where DID=0 is upsampled to be utilized for the prediction encoding of the scalable layer screen of the CIF standard where DID=1.

[69] Meanwhile, in Fig. 7, since each screen generates one FGS layer for support of SNR scalability, the NAL units containing encoding information of each FGS layer are all set to QL=1.

[70] Fig. 8 sequentially shows the encoding order of the screens and the NAL_unit_types and (TL, DID, QL) field information for NAL units of the base layer and scalable layers belonging to each screen in a case where SVC combined scalability encoding is applied to the screen and layer structure of Fig. 7.

[71] Referring to the drawing, it can be seen that the encoding of an IDR picture occurs first. One base layer NAL unit having the header structure of Fig. 5 is generated in the base layer, and three scalable layer NAL units having the header structure of Fig. 6 are generated in the scalable layers.
[72] The three NAL units generated in the scalable layers include one NAL unit for FGS scaling for the base layer, one NAL unit for the spatial scalable layer, and one NAL unit for FGS scaling for the spatial scalable layer.

[73] The NAL unit firstly generated in the IDR picture is generated in the base layer, and the header of the NAL unit conforms to the structure of (a) of Fig. 5. Because the base layer is an IDR picture, it can be seen from Fig. 9 that the "NAL_unit_type" of the header is set to "5" by Fig. 2 described above.

[74] The NAL unit secondly generated in the IDR picture is the one for FGS scaling for the base layer. As the "NAL_unit_type" is set to "21" by Fig. 2, and QL is set to 1 (QL=1), (TL, DID, QL) becomes (0,0,1).

[75] The NAL unit thirdly generated in the IDR picture is the one for the spatial scalable layer. As the "NAL_unit_type" is set to "21" and DID is set to 1 (DID=1), (TL, DID, QL) becomes (0,1,0).

[76] The NAL unit lastly generated in the IDR picture is the one for FGS scaling for the spatial scalable layer. As the "NAL_unit_type" is set to "21" and QL is set to 1 (QL=1), (TL, DID, QL) becomes (0,1,1).

[77] When the encoding of the IDR picture is finished as above, a screen with screen number 16, which is an I- or P-picture, is encoded. Because this picture is a non-IDR picture, "NAL_unit_type" is set to "1" in the base layer and to "20" in the scalable layer.

[78] After completion of the encoding of the screen number 16, a screen with screen number 8 is encoded. The TL values for four NAL units generated at this time are all set to "1", thereby making it possible to support the frame rate of 3.75 fps.

[79] Next, as the screens with screen numbers 4 and 12 are set to TL=2, the frame rate of 7.5 fps can be supported.

[80] The screen numbers 2, 6, 10, and 14 are set to TL=3 with respect to all the four NAL units generated for support of the frame rate of 15 fps.

[81] Meanwhile, the screen numbers 1, 3, 5, 7, 13, and 15 are encoded in the spatial scalable layer in order to support 30 fps only by the CIF standard.

[82] As shown in Fig. 8, for these screens there exists no NAL unit belonging to the base layer, but only two NAL units belonging to the scalable layers exist.

[83] To support 30 fps, TL is set to 4 (TL=4) and all the NAL units belong to the spatial scalable layer, and thus, they are commonly set to DID=1.

[84] The NAL units for the spatial scalable layer are set to QL=0, and the NAL units for FGS scaling for the spatial scalable layer are set to QL=1.

[85] As shown in Fig. 8, for the combined scalability of the SVC, analyzing the "NAL_unit_types" and (TL, DID, QL) field of the NAL units can detect the type of encoding information contained in the NAL units through the "NAL_unit_type" values and derive the temporal and spatial hierarchy between the NAL units through the (TL, DID, QL) values.
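The relationship between screen number, TL, and supportable frame rate in the example of Figs. 7 and 8 can be reproduced with a short calculation. This is a sketch under the stated example parameters only (GOP size 16, maximum TL of 4, 30 fps at full rate); the function names are hypothetical and the closed-form rule is inferred from the values quoted above rather than taken from the figures.

```python
GOP_SIZE = 16           # screens per GOP in the example of Fig. 7
MAX_TL = 4              # highest temporal level in the example
FULL_FRAME_RATE = 30.0  # fps when all temporal levels are transmitted


def temporal_level(screen_number):
    """TL of a screen in the dyadic hierarchical B-picture structure."""
    position = screen_number % GOP_SIZE
    if position == 0:
        return 0  # key pictures: the IDR screen, screen 16, and so on
    # Number of trailing zero bits, i.e. how often the position is divisible by 2.
    trailing_zeros = (position & -position).bit_length() - 1
    return MAX_TL - trailing_zeros


def max_frame_rate(highest_tl):
    """Frame rate supported when all screens up to highest_tl are transmitted."""
    return FULL_FRAME_RATE / 2 ** (MAX_TL - highest_tl)
```

Each additional temporal level doubles the number of transmitted screens, which is why the supportable rates form the sequence 1.875, 3.75, 7.5, 15, and 30 fps.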

[86] Such information can be very usefully utilized in effectively designing the RTP packetization scheme for cutting an SVC stream to a proper size and loading the same in an RTP packet.

[87] In the non-interleaved mode, three packet types such as SNU, FU-A, and STAP-A are supported, as shown in Fig. 1.

[88] Generally, the NAL units belonging to the base layer have a higher priority order in transmission than the NAL units belonging to the scalable layers, and are processed to be strong against an error through channel encoding, separately from scalable layer information. Therefore, the NAL units of the base layer are not loaded in an RTP packet by mixing with the NAL units of the scalable layers, but loaded independently in an RTP packet.

[89] Accordingly, the STAP-A packet type that can be aggregated with the NAL units of the scalable layers is not applied to the NAL units of the base layer, but either SNU or FU-A is selected and loaded in an RTP packet by considering the length of the NAL units.

[90] Applied to the NAL units belonging to the scalable layers are all the three packet types including SNU, FU-A, and STAP-A.

[91] Among them, SNU and FU-A are selectively applied depending on the length of the NAL units, and STAP-A is applied in such a manner that several NAL units of the scalable layers belonging to the same screen number are aggregated as one within the range that does not exceed the MTU size and loaded in an RTP packet.
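The layer-dependent rules of paragraphs [88] to [91] can be sketched as two small helpers. This is an illustration only: the function names are hypothetical, the default MTU of 1500 bytes is the internet value mentioned later in the description, and RTP/UDP/IP header overhead is ignored for simplicity.

```python
def candidate_types(is_base_layer):
    """Packet types that may be applied to a NAL unit of the given layer."""
    if is_base_layer:
        return ("SNU", "FU-A")        # base layer units are never aggregated
    return ("SNU", "FU-A", "STAP-A")  # scalable layer units may also be aggregated


def choose_unaggregated_type(nal_size, mtu=1500):
    """Pick SNU or FU-A for a NAL unit that is sent on its own."""
    # Assumption: the comparison ignores RTP/UDP/IP header overhead.
    return "FU-A" if nal_size > mtu else "SNU"
```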

[92] Hereinafter, an algorithm based on a look-ahead scheme for identifying scalable layer NAL units, to which the STAP-A type is to be applied, will be described.

[93] The (TL, DID, QL) information of NU_i, which is the NAL unit being inputted to the loop of the present algorithm, is indicated by (T_i, D_i, Q_i). The next NAL unit, which is analyzed one step in advance by the look-ahead scheme, is designated by NU_{i+1}, and its (TL, DID, QL) information is indicated by (T_{i+1}, D_{i+1}, Q_{i+1}).

[94] In order to determine whether to apply the STAP-A type, the (T_{i+1}, D_{i+1}, Q_{i+1}) information of NU_{i+1} is extracted in advance and compared. The sequential conditions that should be satisfied in order to aggregate NU_i and NU_{i+1} and add them to one RTP payload are as follows:

[95] i. NU_{i+1} should not be a NAL unit belonging to the base layer.

[96] ii. NU_{i+1} should have the same TL value as NU_i.

[97] iii. The sum of the size of the NAL units accumulated until NU_i in an RTP payload plus the size of NU_{i+1} should be smaller than the size of an MTU (in case of the internet, the general size of the MTU is 1500 bytes). In case of transmitting an RTP packet greater than the MTU, the RTP packet is fragmented into several packets by the fragmentation function of a router or gateway during transmission through a network, thereby causing a burden to the network and the client.

[98] iv. The following conditions should be satisfied depending on the magnitude relation of Q_i and Q_{i+1}:

[99] (a) If Q_{i+1} > Q_i, this means that the quality level of a FGS layer increases. This phenomenon occurs only to the NAL units belonging to the same screen number, and thus, the condition of STAP-A is satisfied. Therefore, NU_{i+1} and NU_i can be loaded together in an RTP payload.

[100] (b) If Q_{i+1} ≤ Q_i, this means that the quality level of a FGS layer does not increase. The situation where this phenomenon occurs can be divided into the situation of D_{i+1} > D_i and vice versa. The situation of D_{i+1} > D_i occurs only to the NAL units that always exist within the same screen number, and thus, NU_i and NU_{i+1} can be targets of STAP-A. However, the situation of D_{i+1} ≤ D_i occurs between the NAL units having different screen numbers, i.e., different presentation time instants, and thus NU_{i+1} and NU_i cannot be targets of STAP-A.

[101] In conclusion, in order to perform RTP packetization in the STAP-A type, NU_i and NU_{i+1} should sequentially satisfy all of i, ii, iii, and iv-(a) among the above conditions, or should sequentially satisfy all of i, ii, iii, and iv-(b).
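Conditions i to iv can be written as a single predicate. The following is a minimal sketch, not the flowchart of Fig. 9: the `NalUnit` record and the function name are hypothetical, a NAL unit is treated as belonging to the scalable layers when its "NAL_unit_type" is 20 or 21, condition iv-(b) is taken to be satisfied only in its D_{i+1} > D_i sub-case, and the STAP-A header and "NALU_Size" overhead is ignored in the size check, as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    nal_unit_type: int
    size: int      # size in bytes
    tl: int = 0    # temporal_level
    did: int = 0   # dependency_id
    ql: int = 0    # quality_level

    @property
    def is_base_layer(self):
        # Assumption: types 20 and 21 mark scalable layer NAL units.
        return self.nal_unit_type not in (20, 21)


def can_aggregate(current, nxt, accumulated_size, mtu=1500):
    """Return True if nxt (NU_{i+1}) may join current (NU_i) in a STAP-A payload.

    accumulated_size is the total size of the NAL units already in the
    payload, including current.
    """
    if nxt.is_base_layer:                    # condition i
        return False
    if nxt.tl != current.tl:                 # condition ii
        return False
    if accumulated_size + nxt.size >= mtu:   # condition iii
        return False
    if nxt.ql > current.ql:                  # condition iv-(a)
        return True
    return nxt.did > current.did             # condition iv-(b)
```

The predicate only inspects NU_{i+1} for base layer membership, as condition i does; keeping a base layer NU_i out of aggregation is left to the caller.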

[102] Fig. 9 shows a flowchart of the algorithm proposed in order to perform RTP packetization by determining the SNU, FU-A, and STAP-A packet types based on the above conditions for determining the packet type as STAP-A.

[103] The flowchart shown in Fig. 9 is performed for every GOP unit, and the RTP packet type is determined based on NUType_i, which is the "NAL_unit_type" of each NAL unit existing within one GOP, and (T_i, D_i, Q_i), which is the (TL, DID, QL) information, as explained with reference to Fig. 8.

[104] i, ii, iii, iv-(a), and iv-(b), which are the conditions for determining the packet type as STAP-A, are indicated on the corresponding blocks of Fig. 9, respectively.

[105] In Fig. 9, N implies that the packetizing process that is currently in progress is in the process of loading an N-th NAL unit in an RTP payload. The algorithm shown in the drawing operates in the look-ahead scheme of investigating NU_{i+1} in advance in order to check the STAP-A type conditions. Therefore, if the packet type is determined as STAP-A when N=1, NU_i and NU_{i+1} are simultaneously loaded in the RTP payload, while if the packet type is determined as STAP-A when N>1, only NU_{i+1} is loaded in the RTP payload.

[106] If the packet type is not determined as STAP-A when N=1, the packet type is determined as SNU or FU-A by checking whether the size of NU_i exceeds that of the MTU. On the other hand, if the packet type is not determined as STAP-A when N>1, the N NAL units accumulated in the RTP payload so far are loaded and transmitted in one RTP packet, and then parameters N and I are updated to generate a new RTP packet, followed by repeating the entire process.

[107] Herein, parameters I and J are used to indicate the start position and end position of the N NAL units to be loaded in the RTP payload. Meanwhile, S_i means the size of NU_i, and P_i means the total size accumulated in the RTP payload including NU_i, and is used to check whether or not the total size accumulated in the RTP payload exceeds that of the MTU.
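The look-ahead loop described in paragraphs [103] to [107] can be sketched as follows. Fig. 9 is not reproduced in this text, so this is an approximation built from the written description only: the names `NalUnit`, `stap_a_ok`, and `packetize` are hypothetical, scalable layer NAL units are identified by "NAL_unit_type" 20 or 21, base layer units are always sent on their own as stated in [88], and header overhead is ignored.

```python
from dataclasses import dataclass

MTU = 1500  # bytes, the general internet value quoted in the description


@dataclass(frozen=True)
class NalUnit:
    nal_unit_type: int
    size: int
    tl: int = 0
    did: int = 0
    ql: int = 0


def is_base_layer(nu):
    return nu.nal_unit_type not in (20, 21)


def stap_a_ok(nu, nxt, accumulated):
    """Conditions i, ii, iii and then iv-(a) or iv-(b)."""
    return (not is_base_layer(nxt)
            and nxt.tl == nu.tl
            and accumulated + nxt.size < MTU
            and (nxt.ql > nu.ql or nxt.did > nu.did))


def packetize(units):
    """Return (packet_type, [NAL unit indices]) pairs for the units of one GOP."""
    packets = []
    i = 0
    while i < len(units):
        group = [i]                  # N = 1: the payload holds only NU_i
        accumulated = units[i].size  # P_i, total size accumulated in the payload
        if not is_base_layer(units[i]):
            # Look ahead at NU_{i+1} and keep appending while STAP-A applies.
            while (group[-1] + 1 < len(units)
                   and stap_a_ok(units[group[-1]], units[group[-1] + 1], accumulated)):
                group.append(group[-1] + 1)
                accumulated += units[group[-1]].size
        if len(group) > 1:
            packets.append(("STAP-A", group))
        elif units[i].size > MTU:
            packets.append(("FU-A", group))  # to be split into several RTP packets
        else:
            packets.append(("SNU", group))
        i = group[-1] + 1            # start a new RTP packet after the group
    return packets
```

Applied to the IDR picture of Fig. 8 with NAL unit sizes chosen for illustration, the base layer unit would go out as its own packet and the three scalable layer units (0,0,1), (0,1,0), and (0,1,1) would be aggregated into one STAP-A packet, provided their total size stays below the MTU.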

[108] Fig. 10 is a block diagram illustrating an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.

[109] Referring to Fig. 10, the inventive RTP packetizing apparatus 120 for an SVC bitstream includes a packet type determiner 130 for determining a packet type for an input SVC bitstream and a packet generator 140 for generating an RTP packet by fragmenting the SVC bitstream so as to correspond to the packet type determined by the packet type determiner 130 and loading the fragments in an RTP packet.

[110] A description of the detailed functions of the components such as the packet type determiner 130 and the packet generator 140 will be substituted by the above description of Figs. 5 to 9.

[111] Reference numeral 110, not explained above, represents an SVC encoder which provides the SVC bitstream to the packet type determiner 130 by encoding an input video sequence.

[112] The method of the present invention as mentioned above may be implemented by a software program that is stored in a computer-readable storage medium such as CD-ROM, RAM, ROM, floppy disk, hard disk, optical magnetic disk, or the like. This procedure may be readily carried out by those skilled in the art; and therefore, details thereof are omitted here.

[113] The present application contains subject matter related to Korean Patent Application Nos. 2006-0110714 and 2006-0125144, filed in the Korean Intellectual Property Office on November 9, 2006, and December 8, 2006, respectively, the entire contents of which are incorporated herein by reference.

[114] While the present invention has been described with respect to the specific embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims

[1] A method for determining a packet type for a Scalable Video Coded (SVC) video bitstream, comprising the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the Scalable Video Coding (SVC); and c) determining a Real-time Transport Protocol (RTP) packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.

[2] The method of claim 1, wherein the step a) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.

[3] The method of claim 1, wherein the step b) is carried out by analyzing "NAL_unit_type" values of the NAL units belonging to the base layer and the scalable layers, which are the "NAL_unit_type" values indicating encoding information.

[4] The method of claim 1, wherein the packet type determined in the step c) is any one among Single NAL Unit (SNU), Fragmentation Unit-A (FU-A), and Single-Time Aggregation Packet-A (STAP-A) types of a non-interleaved mode.

[5] An RTP packetizing method for an SVC video bitstream, comprising the steps of: a) determining a packet type for the SVC video bitstream; and b) fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the determined packet type and loading the fragments in RTP packets.

[6] The method of claim 5, wherein the step a) includes the steps of: a1) deriving temporal and spatial hierarchy information between NAL units from field information defined in the NAL unit headers of scalable layers; a2) detecting a type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and a3) determining an RTP packet type for the SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.

[7] The method of claim 6, wherein the step a1) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.

[8] The method of claim 6, wherein the step a2) is performed by analyzing "NAL_unit_type" values of the NAL units belonging to the base layer and the scalable layers, which are the "NAL_unit_type" values indicating encoding information.

[9] The method of claim 6, wherein the packet type determined in the step a3) is any one among SNU, FU-A, and STAP-A types of a non-interleaved mode.

[10] An apparatus for packetizing an SVC video bitstream, comprising: a packet type determiner for determining a packet type for the SVC video bitstream; and a packet generator for generating a packet by fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the packet type determined by the packet type determiner and loading the fragments in RTP packets.