WO2008056878A1 - Method for determining packet type for SVC video bitstream, and RTP packetizing apparatus and method using the same

Info

Publication number
WO2008056878A1
Authority
WO
WIPO (PCT)
Prior art keywords
nal
type
svc
rtp
packet
Application number
PCT/KR2007/004413
Inventor
Soon-Heung Jung
Kwang-Deok Seo
Jae-Gon Kim
Jin-Woo Hong
Original Assignee
Electronics And Telecommunications Research Institute
Priority claimed from KR1020060125144A (patent KR100776680B1)
Application filed by Electronics And Telecommunications Research Institute
Priority to US12/513,542 (patent US8761203B2)
Publication of WO2008056878A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643 Communication protocols
    • H04N21/6437 Real-time Transport Protocol [RTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/31 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/34 Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/36 Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234327 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into layers, e.g. base layer and one or more enhancement layers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/238 Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H04N21/2381 Adapting the multiplex stream to a specific network, e.g. an Internet Protocol [IP] network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8451 Structuring of content, e.g. decomposing content into time segments using Advanced Video Coding [AVC]

Definitions

  • The present invention relates to a method for determining the packet type for a Scalable Video Coded (SVC) video bitstream, and a Real-time Transport Protocol (RTP) packetizing apparatus and method using the same; and, more particularly, to a method for determining the packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.
  • RTP Real-time Transport Protocol
  • Scalable Video Coding (SVC), the scalable extension of H.264, is a coding technique developed to solve the problems of low compression efficiency, lack of support for combined scalability, and high implementation complexity caused by the layered coding-based scalability attempted in the existing Moving Picture Experts Group 2 (MPEG-2), MPEG-4, etc.
  • MPEG-2 Moving Picture Experts Group 2
  • SVC encodes multiple video layers into a single bit sequence.
  • the layers of SVC include one base layer and scalable layers that can be continuously stacked over the base layer.
  • Each scalable layer is able to express the maximum bit rate, frame rate and resolution that are given to itself based on low-order layer information.
  • The SVC is a coding technique suitable for multimedia content services in a Universal Multimedia Access (UMA) environment, because it can jointly address the variability in bandwidth that occurs in a heterogeneous network environment, the variability in receiving terminal performance and resolution, the varied preferences of content consumers, and so on.
  • UMA Universal Multimedia Access
  • a Video Coding Layer (VCL) of an SVC encoder generates base layer encoding information and scalability encoding information of the scalable layers in slices.
  • Each slice is packaged into Network Abstraction Layer (NAL) units in the NAL and stored in an SVC bitstream.
  • NAL Network Abstraction Layer
  • As RTP packet types for the NAL units of the SVC, there are a total of seven types: Single NAL Unit (SNU), Single-Time Aggregation Packet-A (STAP-A), STAP-B, Multi-Time Aggregation Packet 16 (MTAP16), MTAP24, Fragmentation Unit-A (FU-A), and FU-B.
  • SNU Single NAL Unit
  • STAP-A Single-Time Aggregation Packet-A
  • STAP-B Single-Time Aggregation Packet-B
  • MTAP24 Multi-Time Aggregation Packet 24
  • FU-A Fragmentation Unit-A
  • FU-B Fragmentation Unit-B
  • The SNU type can load only one NAL unit in one RTP packet, and the STAP can load multiple NAL units that belong to the same presentation time instant in one RTP packet.
  • The STAP is divided into a STAP-A type, which loads NAL units in an RTP packet in decoding order, and a STAP-B type, which loads NAL units in an RTP packet without regard to decoding order for interleaving purposes.
  • The MTAP can load multiple NAL units belonging to different presentation time instants in one RTP packet and inherently supports interleaving.
  • The MTAP is divided into an MTAP16 type supporting a 16-bit time offset and an MTAP24 type supporting a 24-bit time offset, depending on the size of the time offset field that indicates the difference in presentation time instant between the NAL units.
  • Fig. 1 shows the RTP packet types that can be supported by the three RTP packetization modes: an SNU mode, a non-interleaved mode, and an interleaved mode.
  • The SNU mode of Fig. 1 is able to support only the SNU type, which loads only one NAL unit having one of the "NAL_unit_types" 1 to 23 shown in Fig. 2 in an RTP packet, and its application field is restrictive.
  • The non-interleaved mode is able to support the STAP-A and the FU-A as well as the SNU type, and thus its practical application range is wide.
  • The interleaved mode adds an interleaving function to the non-interleaved mode, and has the drawback that it cannot support the SNU type.
  • As the order of the NAL units loaded in the RTP packet by the interleaving function of the interleaved mode differs from the decoding order, a burst error in a channel can be dealt with effectively, but RTP packetization, de-packetization, and the SVC decoding procedure become very complicated.
  • The non-interleaved mode is suitable as the RTP packetization mode that must be supported in a commercial SVC streaming service, and the interleaved mode can be considered as an option for a service in an environment with a high channel error rate.
  • the SNU type of the non-interleaved mode is supposed to load one NAL unit having
  • The STAP-A type of the non-interleaved mode has an RTP payload format structure as shown in Fig. 3; it aggregates several NAL units corresponding to the same presentation time instant and loads them in one RTP packet.
  • Unlike the SNU type, the STAP-A type of the non-interleaved mode has a 1-byte RTP payload header (STAP-A NAL HDR) additionally inserted therein.
  • The F field of the payload header is set to "1" if at least one of the NAL units to be loaded together has an F field of "1" in its header.
  • The NRI field of the payload header is set to the maximum of the NRI field values indicated in the headers of the NAL units to be loaded together.
  • "Type" field of the payload header "NAL_unit_type" of No. 24 in Fig.
  • The FU-A type of the non-interleaved mode is used when the size of one NAL unit exceeds the MTU (Maximum Transmission Unit) of the network: it divides the NAL unit into two or more parts, none exceeding the MTU size, and loads the parts in separate RTP packets in order to prevent packet fragmentation in a router or gateway during transmission.
  • MTU Maximum Transmission Unit
  • The RTP payload header is composed of a total of 2 bytes: one byte of "FU_indicator" and one byte of "FU_header". The values indicated in the headers of the NAL units are applied to the F field and
  • "NAL_unit_type" No. 28 in Fig. 1 is set in the "Type" field of "FU_indicator" in order to show that this is the FU-A type.
  • The S field and E field of "FU_header" indicate whether the part being loaded is the start part or the end part of a NAL unit, respectively.
  • The "NAL_unit_type" value indicating the encoding contents contained in the NAL unit is set, as shown in Fig. 2.
  • The present invention proposes a practical RTP packetization algorithm that can effectively load the NAL units of an SVC in an RTP payload while conforming to the specification of the RTP payload format.
  • It is an object of the present invention to provide a method for determining a packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.
  • a method for determining a packet type for a Scalable Video Coded (SVC) video bitstream which includes the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and c) determining a Real-time Transport Protocol (RTP) packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
  • the temporal and spatial hierarchy information derivation step of a) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.
  • The encoding information type detection step of b) is carried out by analyzing the "NAL_unit_type" values, which indicate encoding information, of the NAL units belonging to the base layer and the scalable layers.
  • The packet type determined in the step c) is any one among the Single NAL Unit (SNU), Fragmentation Unit-A (FU-A), and Single-Time Aggregation Packet-A (STAP-A) types of a non-interleaved mode.
  • a packetizing method for an SVC video bitstream which includes the steps of: a) determining a packet type for the SVC video bitstream; and b) fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the determined packet type and loading the fragments in RTP packets.
  • an apparatus for packetizing an SVC video bitstream which includes: a packet type determiner for determining a packet type for the SVC video bitstream; and a packet generator for generating a packet by fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the packet type determined by the packet type determiner and loading the fragments in RTP packets.
  • The present invention can efficiently determine the packet type for an SVC bitstream and perform RTP packetization using the same. As a result, the present invention can more efficiently transmit an SVC video bitstream through an IP network such as the Internet.
  • Fig. 1 is a table showing a packet type supportable for each RTP packetization mode.
  • Fig. 2 is a table summarizing contents contained in NAL units by NAL_unit_types.
  • Fig. 3 is an explanatory view showing an RTP payload format structure for a STAP-A type.
  • Fig. 4 is an explanatory view showing an RTP payload format structure for an FU-A type.
  • Figs. 5 and 6 are explanatory views showing the header structures of NAL units used in a base layer and scalable layers of an SVC in accordance with the present invention.
  • Fig. 7 is an explanatory view showing a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.
  • Fig. 8 is an explanatory view of the encoding order of SVC screens and of NAL units of the base layer and scalable layers belonging to each screen in accordance with the present invention.
  • FIG. 9 is a detailed flow chart illustrating an RTP packetizing method in accordance with a preferred embodiment of the present invention.
  • Fig. 10 is a block diagram illustrating the structure of an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.
  • FIGs. 5 and 6 are explanatory views showing header structures of NAL units used in the base layer and scalable layers of an SVC in accordance with the present invention.
  • Encoding information generated by SVC encoding is stored in a bit stream in NAL units. As shown in Figs. 5 and 6, the header structure of the NAL unit generated in the base layer and that of the NAL unit generated in the scalable layers are different from each other.
  • Fig. 5 shows the header structure of the NAL unit generated in the base layer compatible with H.264
  • Fig. 6 depicts the header structure of the NAL unit generated in the scalable layers.
  • the "Type” field means "NAL_unit_type” representing information on contents of encoding information contained in the NAL unit, and shows the type of encoding information contained in the NAL unit for each of the aforementioned "NAL_unit_types" of Fig. 2.
  • The "NAL_unit_types" of Nos. 1 to 6 are usable in the NAL units of the base layer, while the "NAL_unit_types" of Nos. 20 and 21 are usable in the scalable layers.
  • the other "NAL_unit_types" are used to indicate the NAL units containing not encoding information but additional information.
  • temporal and spatial hierarchy for each of the NAL units can be derived from (TL, DID, and QL) field information defined in the header of each NAL unit of the scalable layers.
  • the (TL, DID, QL) field which is the last octet, in Fig. 6 represents the inter-layer hierarchy in the temporal and spatial SNR scalability. That is, TL (temporal_level) represents the hierarchy between temporal layers for temporal scalability, DID (dependency_id) indicates the dependency hierarchy between higher/lower scalable layers in the inter-layer prediction of spatial scalability, and QL (quality_level) represents the hierarchy between FGS layers for support of SNR scalability.
  • The TL, DID, and QL values are all integers greater than or equal to "0", and the temporal and spatial hierarchy of the NAL units can be derived from a combination of these values.
  • Fig. 7 is a view showing an example of a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.
  • IDR Instantaneous Decoding Refresh
  • GOP Group Of Picture
  • A GOP consists of 16 screens, and the other GOPs not shown in the drawing also have a GOP size of 16.
  • the screen resolution that can be supported by the base layer is QCIF
  • the screen resolution that can be supported by the spatial scalable layers is CIF.
  • the DID value in the (TL, DID, QL) field of Fig. 6 is used.
  • a hierarchical B -picture approach is applied for provision of temporal scalability, and the TL value in the (TL, DID, QL) field is used in order to display a supportable frame rate.
  • the TL value is displayed in the middle part of each screen indicated in a rectangle.
  • the frame rate can be supported up to 1.875 fps, and in case of transmitting it, including a B-picture with
  • the frame rate can be supported up to 3.75 fps.
  • the frame rate can be supported up to 15 fps in QCIF standard, and as the maximum TL value in the spatial scalable layer is 4, the frame rate can be supported up to 30 fps in CIF standard.
  • the three NAL units generated in the scalable layers include one NAL unit for FGS scaling for the base layer, one NAL unit for the spatial scalable layer, and one NAL unit for FGS scaling for the spatial scalable layer.
  • The NAL unit generated first in the IDR picture belongs to the base layer, and the header of the NAL unit conforms to the structure of (a) of Fig. 5. Because the base layer is an IDR picture, it can be seen from Fig. 9 that the "NAL_unit_type" of the header is set to "5" according to Fig. 2 described above.
  • The NAL unit generated second in the IDR picture is the one for FGS scaling for the base layer. As the "NAL_unit_type" is set to "21" by Fig. 2, and QL is set to 1
  • The NAL unit generated third in the IDR picture is the one for the spatial scalable layer.
  • The NAL unit generated last in the IDR picture is the one for FGS scaling for the spatial scalable layer.
  • NAL_unit_type is set to "21" and QL is set to 1
  • the screen numbers 1, 3, 5, 7, 13, and 15 are encoded in the spatial scalable layer in order to support 30 fps only by the CIF standard.
  • analyzing the "NAL_unit_types" and (TL, DID, QL) field of the NAL units can detect the type of encoding information contained in the NAL units through the "NAL_unit_type” values and derive the temporal and spatial hierarchy between the NAL units through the (TL, DID, QL) values.
  • Such information is very useful for designing an RTP packetization scheme that cuts an SVC stream to a proper size and loads it in RTP packets.
  • The NAL units belonging to the base layer have a higher transmission priority than the NAL units belonging to the scalable layers, and are protected against errors through channel encoding, separately from scalable layer information. Therefore, the NAL units of the base layer are not mixed with the NAL units of the scalable layers in an RTP packet, but are loaded independently in an RTP packet.
  • The STAP-A packet type, which could aggregate them with the NAL units of the scalable layers, is not applied to the NAL units of the base layer; instead, either SNU or FU-A is selected according to the length of the NAL unit, and the NAL unit is loaded in an RTP packet.
  • All three packet types, SNU, FU-A, and STAP-A, are applied to the NAL units belonging to the scalable layers.
  • NAL units, and STAP-A is applied in such a manner that several NAL units of the scalable layers belonging to the same screen number are aggregated as one within the range that does not exceed the MTU size and loaded in an RTP packet.
  • The NAL unit being input to the loop of the present algorithm is designated by NU_i, and its (TL, DID, QL) information is indicated by (T_i, D_i, Q_i); the next NAL unit, which is analyzed one step in advance by the look-ahead scheme, is designated by NU_{i+1}, and its (TL, DID, QL) information is indicated by (T_{i+1}, D_{i+1}, Q_{i+1}).
  • RTP payload is as follows:
  • NU_{i+1} should not be the NAL unit belonging to the base layer.
  • NU_{i+1} should have the same TL value as NU_i.
  • NU_i and NU_{i+1} should sequentially satisfy all of i, ii, iii, and iv-(a) among the above conditions, or should sequentially satisfy all of i, ii, iii, and iv-(b).
  • RTP packetization by determining the SNU, FU-A, and STAP-A packet types based on the above conditions for determining the packet type as STAP-A.
  • N implies that a packetizing process that is currently in progress is in the process of loading an N-th NAL unit in an RTP payload.
  • the packet type is determined as STAP-A when N>1, only NU is loaded in the RTP payload.
  • the packet type is determined as SNU or FU-A by checking whether the size of NU exceeds that of the
  • Parameters I and J are used to indicate the start position and end position of the N NAL units to be loaded in the RTP payload.
  • S_i means the size of NU_i.
  • P_i means the total size of the packets accumulated in the RTP payload including NU_i, and is used to check whether or not the total size accumulated in the RTP payload exceeds the MTU.
  • FIG. 10 is a block diagram illustrating an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.
  • The inventive RTP packetizing apparatus 120 for an SVC bitstream includes a packet type determiner 130 for determining a packet type for an input SVC bitstream and a packet generator 140 for generating RTP packets by fragmenting the SVC bitstream so as to conform to the packet type determined by the packet type determiner 130 and loading the fragments in RTP packets.
  • Reference numeral 110, not explained above, represents an SVC encoder, which provides the SVC bitstream to the packet type determiner 130 by encoding an input video sequence.
  • The method of the present invention as mentioned above may be implemented by a software program that is stored in a computer-readable storage medium such as a CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, or the like. This procedure may be readily carried out by those skilled in the art; therefore, details thereof are omitted here.
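
The STAP-A and FU-A payload header rules listed in the definitions above can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation. The function names are hypothetical, the 2-byte size field placed before each aggregated NAL unit is taken from the H.264 RTP payload format on which the SVC draft builds (Figs. 3 and 4 are not reproduced here), and `max_payload` is an assumed byte budget, i.e. the MTU minus whatever lower-layer headers apply.

```python
from typing import List

STAP_A_TYPE = 24  # "NAL_unit_type" No. 24 marks a STAP-A payload
FU_A_TYPE = 28    # "NAL_unit_type" No. 28 marks an FU-A payload


def stap_a_payload(nal_units: List[bytes]) -> bytes:
    """Aggregate NAL units into one STAP-A payload (hypothetical helper)."""
    f = 0
    nri = 0
    for nal in nal_units:
        f |= (nal[0] >> 7) & 0x1             # F is 1 if any aggregated unit has F = 1
        nri = max(nri, (nal[0] >> 5) & 0x3)  # NRI is the maximum over all units
    payload = bytes([(f << 7) | (nri << 5) | STAP_A_TYPE])
    for nal in nal_units:
        # Assumption: a 16-bit big-endian size precedes each NAL unit.
        payload += len(nal).to_bytes(2, "big") + nal
    return payload


def fu_a_payloads(nal: bytes, max_payload: int) -> List[bytes]:
    """Split one oversized NAL unit into FU-A payloads (hypothetical helper)."""
    if len(nal) <= max_payload:
        raise ValueError("NAL unit fits in one packet; use the SNU type instead")
    if max_payload < 3:
        raise ValueError("max_payload must leave room for 2 header bytes and data")
    header, body = nal[0], nal[1:]
    fu_indicator = (header & 0xE0) | FU_A_TYPE  # F and NRI copied from the NAL header
    nal_type = header & 0x1F                    # original NAL_unit_type goes in FU_header
    chunk = max_payload - 2                     # 2 bytes: FU_indicator + FU_header
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    out = []
    for idx, piece in enumerate(pieces):
        s = 1 if idx == 0 else 0                # S marks the start part
        e = 1 if idx == len(pieces) - 1 else 0  # E marks the end part
        fu_header = (s << 7) | (e << 6) | nal_type
        out.append(bytes([fu_indicator, fu_header]) + piece)
    return out
```

A receiver can rebuild the original NAL unit by taking F and NRI from "FU_indicator", the type from "FU_header", and concatenating the fragment bodies in order.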

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Provided are a method for determining the packet type for a Scalable Video Coded (SVC) video bitstream, and a Real-time Transport Protocol (RTP) packetizing apparatus and method using the same. The method for determining a packet type for an SVC video bitstream includes the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and c) determining an RTP packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
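
The three steps of the abstract can be sketched in Python as below. This is a simplified sketch under stated assumptions, not the claimed algorithm: the `Nal` record and its `screen` field are hypothetical, the 3/3/2-bit split of the (TL, DID, QL) octet is an assumed layout, `mtu` is treated as the payload byte budget, the 1-byte STAP-A header and 2-byte per-unit size fields are assumptions borrowed from the H.264 RTP payload format, and the full set of STAP-A conditions (i to iv) is reduced to "scalable layer, same screen, same TL, fits in the MTU".

```python
from dataclasses import dataclass
from typing import List, Tuple

SCALABLE_TYPES = (20, 21)  # NAL_unit_types used in the scalable layers


@dataclass
class Nal:
    nal_type: int        # NAL_unit_type (step b looks at this value)
    size: int            # size of the NAL unit in bytes
    screen: int          # screen (picture) number, hypothetical field
    last_octet: int = 0  # (TL, DID, QL) octet, scalable layers only


def hierarchy(last_octet: int) -> Tuple[int, int, int]:
    """Step a: derive (TL, DID, QL). Assumed layout: 3 + 3 + 2 bits."""
    return (last_octet >> 5) & 0x7, (last_octet >> 2) & 0x7, last_octet & 0x3


def determine_packet_types(nals: List[Nal], mtu: int) -> List[Tuple[str, List[int]]]:
    """Step c: return (packet type, indices of the NAL units it carries)."""
    result = []
    i = 0
    while i < len(nals):
        nal = nals[i]
        if nal.nal_type in SCALABLE_TYPES:
            group = [i]
            total = 1 + 2 + nal.size  # STAP-A header + size field + NAL unit
            j = i + 1
            # Look ahead: aggregate following scalable-layer units while they
            # share the screen and TL value and the payload stays within mtu.
            while (j < len(nals)
                   and nals[j].nal_type in SCALABLE_TYPES
                   and nals[j].screen == nal.screen
                   and hierarchy(nals[j].last_octet)[0] == hierarchy(nal.last_octet)[0]
                   and total + 2 + nals[j].size <= mtu):
                total += 2 + nals[j].size
                group.append(j)
                j += 1
            if len(group) > 1:
                result.append(("STAP-A", group))
                i = j
                continue
        # Base layer units, and scalable units that cannot be aggregated,
        # are sent alone: fragmented if too large, otherwise as a single unit.
        result.append(("FU-A" if nal.size > mtu else "SNU", [i]))
        i += 1
    return result
```

For example, an oversized base layer IDR unit followed by two small scalable-layer units of the same screen would yield one FU-A decision and one STAP-A decision covering both scalable units.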

Description

METHOD FOR DETERMINING PACKET TYPE FOR SVC VIDEO BITSTREAM, AND RTP PACKETIZING APPARATUS AND METHOD USING THE SAME
Technical Field
[1] The present invention relates to a method for determining the packet type for a Scalable Video Coded (SVC) video bitstream, and a Real-time Transport Protocol (RTP) packetizing apparatus and method using the same; and, more particularly, to a method for determining the packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.
Background Art
[2] Scalable Video Coding (SVC), the scalable extension of H.264, is a coding technique developed to solve the problems of low compression efficiency, lack of support for combined scalability, and high implementation complexity caused by the layered coding-based scalability attempted in the existing Moving Picture Experts Group 2 (MPEG-2), MPEG-4, etc.
[3] SVC encodes multiple video layers into a single bit sequence. The layers of SVC include one base layer and scalable layers that can be continuously stacked over the base layer.
[4] Each scalable layer is able to express the maximum bit rate, frame rate and resolution that are given to itself based on low-order layer information.
[5] The more scalable layers the SVC stacks, the more diverse the bit rates, frame rates, and resolutions it can support. Thus, the SVC is a coding technique suitable for multimedia content services in a Universal Multimedia Access (UMA) environment, because it can jointly address the variability in bandwidth that occurs in a heterogeneous network environment, the variability in receiving terminal performance and resolution, the varied preferences of content consumers, and so on.
[6] A Video Coding Layer (VCL) of an SVC encoder generates base layer encoding information and scalability encoding information of the scalable layers in slices.
[7] Each slice is packaged into Network Abstraction Layer (NAL) units in the NAL and stored in an SVC bitstream.
[8] Although an RTP payload format for loading the NAL units of the SVC is currently disclosed in the Internet draft document "draft-wenger-avt-rtp-svc-02.txt", the SVC has a complicated structure that stores encoding information of SNR scalability and of temporal and spatial scalability, as well as base layer encoding information compatible with H.264, in a single bitstream. Thus, no study has yet reported an effective RTP packetizing method that supports the RTP payload format of the SVC.
[9] As RTP packet types for the NAL units of the SVC, there are a total of seven types: Single NAL Unit (SNU), Single-Time Aggregation Packet-A (STAP-A), STAP-B, Multi-Time Aggregation Packet 16 (MTAP16), MTAP24, Fragmentation Unit-A (FU-A), and FU-B.
[10] The SNU type can load only one NAL unit in one RTP packet, and the STAP can load multiple NAL units that belong to the same presentation time instant in one RTP packet. The STAP is divided into a STAP-A type, which loads NAL units in an RTP packet in decoding order, and a STAP-B type, which loads NAL units in an RTP packet without regard to decoding order for interleaving purposes.
[11] The MTAP can load multiple NAL units belonging to different presentation time instants in one RTP packet and inherently supports interleaving. The MTAP is divided into an MTAP16 type supporting a 16-bit time offset and an MTAP24 type supporting a 24-bit time offset, depending on the size of the time offset field that indicates the difference in presentation time instant between the NAL units.
[12] Among these seven RTP packet types, only the packet types required by an application field are grouped into three RTP packetization modes. Fig. 1 shows the RTP packet types that can be supported by the three RTP packetization modes: an SNU mode, a non-interleaved mode, and an interleaved mode.
[13] The SNU mode of Fig. 1 is able to support only the SNU type, which can load only one NAL unit whose "NAL_unit_type" is one of Nos. 1 to 23 shown in Fig. 2 in an RTP packet, and its application field is restrictive.
[14] On the other hand, the non-interleaved mode is able to support the STAP-A and the FU-A as well as the SNU type, and thus its range of practical applications is wide.
[15] The interleaved mode is a mode that adds an interleaving function to the non-interleaved mode, and has a drawback that it cannot support the SNU type. As the order of the NAL units to be loaded in the RTP packet by the interleaving function of the interleaved mode is different from the order of decoding, a burst error in a channel can be effectively dealt with, but RTP packetization and de-packetization and an SVC decoding procedure become very complicated.
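The relationship between the three packetization modes and the seven packet types described above (and tabulated in Fig. 1) can be written as a small lookup. This is only a sketch: the Python names are illustrative, and the interleaved-mode entry follows the H.264 RTP payload format (RFC 3984), which is an assumption here because the text above states only that this mode excludes the SNU type.

```python
from enum import Enum


class PacketType(Enum):
    SNU = "SNU"
    STAP_A = "STAP-A"
    STAP_B = "STAP-B"
    MTAP16 = "MTAP16"
    MTAP24 = "MTAP24"
    FU_A = "FU-A"
    FU_B = "FU-B"


# Mode names are hypothetical keys chosen for this sketch.
SUPPORTED_TYPES = {
    "snu": {PacketType.SNU},
    "non_interleaved": {PacketType.SNU, PacketType.STAP_A, PacketType.FU_A},
    # Assumed from RFC 3984; the text only says SNU is not supported here.
    "interleaved": {
        PacketType.STAP_B,
        PacketType.MTAP16,
        PacketType.MTAP24,
        PacketType.FU_A,
        PacketType.FU_B,
    },
}


def is_supported(mode: str, packet_type: PacketType) -> bool:
    """Return True if the given mode can carry the given packet type."""
    return packet_type in SUPPORTED_TYPES[mode]
```

For example, `is_supported("non_interleaved", PacketType.STAP_A)` holds, while `is_supported("interleaved", PacketType.SNU)` does not.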
[16] Therefore, in view of the implementation complexity and the applicable application range, the non-interleaved mode is suitable as the RTP packetization mode that must be supported in a commercial SVC streaming service, and the interleaved mode can be considered as an option for a service in an environment with high channel error.
[17] The SNU type of the non-interleaved mode loads one NAL unit whose "NAL_unit_type" is one of Nos. 1 to 23 shown in Fig. 2 in one RTP packet.
[18] The STAP-A type of the non-interleaved mode, on the other hand, has an RTP payload format structure as shown in Fig. 3, and aggregates several NAL units corresponding to the same presentation time instant and loads them in one RTP packet.
[19] The STAP-A type of the non-interleaved mode, as shown in Fig. 3, has a 1-byte RTP payload header (STAP-A NAL HDR) additionally inserted therein, unlike the SNU type. The value of the F field of the payload header is set to "1" if at least one of the NAL units to be loaded together has an F field value of "1" in its header.
[20] The NRI field of the payload header is set to the maximum value of the NRI field values indicated in each of the headers of the NAL units to be loaded together.
[21] In the "Type" field of the payload header, "NAL_unit_type" of No. 24 in Fig. 3 is set in order to show that this is a STAP-A type.
[22] In addition, a 2-byte "NALU_Size" field representing the size of each NAL unit to be loaded is inserted in front of each NAL unit, separately from the payload header information.
[23] The FU-A type of the non-interleaved mode is used if the size of one NAL unit exceeds the MTU (Maximum Transmission Unit) of a network. It divides the NAL unit into two or more parts, none of which exceeds the MTU size, and loads the divided parts in respective RTP packets in order to prevent packet fragmentation in a router or gateway during transmission.
[24] Fig. 4 illustrates the structure of an RTP payload format for the FU-A type. The RTP payload header is composed of a total of 2 bytes including one byte of "FU_indicator" and one byte of "FU_header".
[25] The values indicated in the header of the NAL unit are applied as they are to the F field and NRI field of "FU_indicator".
[26] "NAL_unit_type" of No. 28 in Fig. 1 is set in the "Type" field of "FU_indicator" in order to show that this is the FU-A type.
[27] The S field and E field of "FU_header" are used to show that the part being loaded is the start part or the end part of a NAL unit, respectively.
[28] In the "Type" field of the "FU_header", the "NAL_unit_type" value indicating the encoding contents contained in the NAL unit is set, as shown in Fig. 2.
[29] That is, as described above, although the RTP packet types for the NAL units stored in an SVC bitstream are defined by the standard, no criterion or method has been suggested for determining the suitable packet type for a given NAL unit.
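The STAP-A and FU-A payload layouts described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: the function names and the `max_payload` parameter are hypothetical, each NAL unit is assumed to begin with a one-octet H.264-style header (F, NRI, Type), the "NALU_Size" field is written in network byte order, and the FU-A sketch drops the original header octet from the fragmented data as RFC 3984 does for H.264. How the longer scalable-layer header of Fig. 6 is carried is not covered.

```python
STAP_A_TYPE = 24  # "NAL_unit_type" value signalling STAP-A
FU_A_TYPE = 28    # "NAL_unit_type" value signalling FU-A


def build_stap_a(nal_units):
    """Aggregate NAL units (bytes objects) into one STAP-A payload."""
    f_bit = max(nu[0] >> 7 for nu in nal_units)        # 1 if any unit has F == 1
    nri = max((nu[0] >> 5) & 0x3 for nu in nal_units)  # maximum NRI of all units
    payload = bytearray([(f_bit << 7) | (nri << 5) | STAP_A_TYPE])
    for nu in nal_units:
        payload += len(nu).to_bytes(2, "big")  # 2-byte NALU_Size before each unit
        payload += nu
    return bytes(payload)


def build_fu_a(nal_unit, max_payload):
    """Split one oversized NAL unit into FU-A payloads of at most max_payload bytes."""
    chunk = max_payload - 2  # room left after FU_indicator and FU_header
    if chunk <= 0:
        raise ValueError("max_payload must exceed the 2-byte FU-A header")
    header = nal_unit[0]
    fu_indicator = (header & 0xE0) | FU_A_TYPE  # F and NRI copied as they are
    nal_type = header & 0x1F                    # original NAL_unit_type
    body = nal_unit[1:]                         # assumption: header octet not repeated
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    fragments = []
    for index, piece in enumerate(pieces):
        s_bit = 1 if index == 0 else 0                # start of the NAL unit
        e_bit = 1 if index == len(pieces) - 1 else 0  # end of the NAL unit
        fu_header = (s_bit << 7) | (e_bit << 6) | nal_type
        fragments.append(bytes([fu_indicator, fu_header]) + piece)
    return fragments
```

`build_fu_a` is meant to be called only for NAL units that do not fit in a single packet, so that at least two fragments result.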
[30] Consequently, the present invention proposes a practical RTP packetization algorithm which can effectively load NAL units of an SVC in an RTP payload while maintaining the specification of the RTP payload format.
Disclosure of Invention
Technical Problem
[31] It is, therefore, an object of the present invention to provide a method for determining a packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.
[32] Other objects and advantages of the present invention can be understood by the following description, and become apparent with reference to the embodiments of the present invention. Also, it is obvious to those skilled in the art of the present invention that the objects and advantages of the present invention can be realized by the means as claimed and combinations thereof.
Technical Solution
[33] In accordance with an aspect of the present invention, there is provided a method for determining a packet type for a Scalable Video Coded (SVC) video bitstream, which includes the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and c) determining a Real-time Transport Protocol (RTP) packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
[34] The temporal and spatial hierarchy information derivation step of a) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.
[35] The encoding information type detection step of b) is carried out by analyzing the "NAL_unit_type" values of the NAL units belonging to the base layer and the scalable layers, which are the "NAL_unit_type" values indicating encoding information. The packet type determined in the step c) is any one among the Single NAL Unit (SNU), Fragmentation Unit-A (FU-A), and Single-Time Aggregation Packet-A (STAP-A) types of a non-interleaved mode.
[36] In accordance with another aspect of the present invention, there is provided a packetizing method for an SVC video bitstream, which includes the steps of: a) determining a packet type for the SVC video bitstream; and b) fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the determined packet type and loading the fragments in RTP packets.
[37] In accordance with another aspect of the present invention, there is provided an apparatus for packetizing an SVC video bitstream, which includes: a packet type determiner for determining a packet type for the SVC video bitstream; and a packet generator for generating a packet by fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the packet type determined by the packet type determiner and loading the fragments in RTP packets.
Advantageous Effects
[38] As mentioned above and as will be described below, the present invention can efficiently determine the packet type for an SVC bitstream and perform RTP packetization using the same.
[39] As a result, the present invention can more efficiently transmit an SVC video bitstream through an IP network such as the internet.
Brief Description of the Drawings
[40] Fig. 1 is a table showing a packet type supportable for each RTP packetization mode.
[41] Fig. 2 is a table summarizing contents contained in NAL units by NAL_unit_types.
[42] Fig. 3 is an explanatory view showing an RTP payload format structure for a STAP-A type.
[43] Fig. 4 is an explanatory view showing an RTP payload format structure for an FU-A type.
[44] Figs. 5 and 6 are explanatory views showing the header structures of NAL units used in a base layer and scalable layers of an SVC in accordance with the present invention.
[45] Fig. 7 is an explanatory view showing a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.
[46] Fig. 8 is an explanatory view of the encoding order of SVC screens and of NAL units of the base layer and scalable layers belonging to each screen in accordance with the present invention.
[47] Fig. 9 is a detailed flow chart illustrating an RTP packetizing method in accordance with a preferred embodiment of the present invention.
[48] Fig. 10 is a block diagram illustrating the structure of an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.
Best Mode for Carrying Out the Invention
[49] The advantages, features and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter. Thus, the present invention will be easily practiced by those skilled in the art. Further, in the following description, well-known arts will not be described in detail if it seems that they could obscure the invention in unnecessary detail. Hereinafter, specific embodiments of the present invention will be set forth in detail with reference to the accompanying drawings.
[50] Figs. 5 and 6 are explanatory views showing header structures of NAL units used in the base layer and scalable layers of an SVC in accordance with the present invention.
[51] Encoding information generated by SVC encoding is stored in a bit stream in NAL units. As shown in Figs. 5 and 6, the header structure of the NAL unit generated in the base layer and that of the NAL unit generated in the scalable layers are different from each other.
[52] Fig. 5 shows the header structure of the NAL unit generated in the base layer compatible with H.264, and Fig. 6 depicts the header structure of the NAL unit generated in the scalable layers.
[53] In Figs. 5 and 6, the "Type" field means "NAL_unit_type" representing information on contents of encoding information contained in the NAL unit, and shows the type of encoding information contained in the NAL unit for each of the aforementioned "NAL_unit_types" of Fig. 2.
[54] The "NAL_unit_types" of Nos. 1 to 6 are usable in the NAL units of the base layer, while the "NAL_unit_types" of Nos. 20 and 21 are usable in the scalable layers. The other "NAL_unit_types" are used to indicate NAL units that contain additional information rather than encoding information.
[55] In the present invention, temporal and spatial hierarchy for each of the NAL units can be derived from (TL, DID, and QL) field information defined in the header of each NAL unit of the scalable layers.
[56] The (TL, DID, QL) field, which is the last octet, in Fig. 6 represents the inter-layer hierarchy in the temporal and spatial SNR scalability. That is, TL (temporal_level) represents the hierarchy between temporal layers for temporal scalability, DID (dependency_id) indicates the dependency hierarchy between higher/lower scalable layers in the inter-layer prediction of spatial scalability, and QL (quality_level) represents the hierarchy between FGS layers for support of SNR scalability.
[57] The TL, DID, and QL values are all integers greater than or equal to "0", and the temporal and spatial hierarchy of the NAL units can be derived from a combination of these values.
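Deriving the hierarchy values from the last header octet can be sketched as a simple bit-field split. The bit widths used here (3 bits for TL, 3 bits for DID, 2 bits for QL, most significant bits first) are an assumption made for illustration, since the text above does not state the exact layout shown in Fig. 6; the type and function names are likewise hypothetical.

```python
from typing import NamedTuple


class Hierarchy(NamedTuple):
    tl: int   # temporal_level
    did: int  # dependency_id
    ql: int   # quality_level


def parse_hierarchy(last_octet: int) -> Hierarchy:
    """Split the last header octet into (TL, DID, QL).

    ASSUMED layout, not taken from the text: TL(3 bits) | DID(3 bits) | QL(2 bits).
    """
    if not 0 <= last_octet <= 0xFF:
        raise ValueError("expected a single octet")
    return Hierarchy(
        tl=(last_octet >> 5) & 0x7,
        did=(last_octet >> 2) & 0x7,
        ql=last_octet & 0x3,
    )
```

Under this assumed layout, the octet `0b100_001_01` would decode to (TL, DID, QL) = (4, 1, 1).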
[58] Based on the (TL, DID, QL) information and the "NAL_unit_types" thus analyzed, a practical RTP packetization algorithm that can be effectively applied to combined scalability of the SVC is proposed.
[59] Fig. 7 is a view showing an example of a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.
[60] In Fig. 7, only an Instantaneous Decoding Refresh (IDR) screen, which is the start part of an SVC stream, and a first Group of Pictures (GOP) are shown. One GOP consists of 16 screens, and the other GOPs not shown in the drawing also have a GOP size of 16.
[61] The screen resolution that can be supported by the base layer is QCIF, and the screen resolution that can be supported by the spatial scalable layers is CIF.
[62] In order to indicate different resolutions in different spatial layers, the DID value in the (TL, DID, QL) field of Fig. 6 is used.
[63] That is, in Fig. 7, the NAL unit with DID=0 means a screen with a resolution of QCIF, and the NAL unit with DID=1 represents a screen with a resolution of CIF.
[64] A hierarchical B-picture approach is applied for provision of temporal scalability, and the TL value in the (TL, DID, QL) field is used in order to indicate a supportable frame rate.
[65] In Fig. 7, the TL value is displayed in the middle part of each screen indicated in a rectangle. In case of transmitting only a key picture with TL=0, the frame rate can be supported up to 1.875 fps, and in case of transmitting it together with a B-picture with TL=1, the frame rate can be supported up to 3.75 fps.
[66] In case of additionally transmitting a B-picture with TL=2, the frame rate can be supported up to 7.5 fps, and in case of additionally transmitting B-pictures with TL=3 and TL=4, the frame rate can be supported up to 15 fps and 30 fps, respectively.
[67] In Fig. 7, as the maximum TL value in the base layer is 3, the frame rate can be supported up to 15 fps in QCIF standard, and as the maximum TL value in the spatial scalable layer is 4, the frame rate can be supported up to 30 fps in CIF standard.
[68] If the screens at the same point of time belonging to the base layer and the spatial scalable layer have the same TL value, inter-layer prediction encoding is executed in the direction of an arrow indicated by dotted lines in the drawing. The base layer screen of the QCIF standard where DID=0 is upsampled to be utilized for the prediction encoding of the scalable layer screen of the CIF standard where DID=1.
[69] Meanwhile, in Fig. 7, since each screen generates one FGS layer for support of SNR scalability, the NAL units containing encoding information of each FGS layer are all set to QL=1.
[70] Fig. 8 sequentially shows the encoding order of the screens and the NAL_unit_types and (TL, DID, QL) field information for NAL units of the base layer and scalable layers belonging to each screen in a case where SVC combined scalability encoding is applied to the screen and layer structure of Fig. 7.
[71] Referring to the drawing, it can be seen that the encoding of an IDR picture occurs first. One base layer NAL unit having the header structure of Fig. 5 is generated in the base layer, and three scalable layer NAL units having the header structure of Fig. 6 are generated in the scalable layers.
[72] The three NAL units generated in the scalable layers include one NAL unit for FGS scaling for the base layer, one NAL unit for the spatial scalable layer, and one NAL unit for FGS scaling for the spatial scalable layer.
[73] The first NAL unit generated for the IDR picture is generated in the base layer, and the header of the NAL unit conforms to the structure of (a) of Fig. 5. Because the base layer is an IDR picture, it can be seen from Fig. 9 that the "NAL_unit_type" of the header is set to "5" by Fig. 2 described above.
[74] The second NAL unit generated for the IDR picture is the one for FGS scaling for the base layer. As the "NAL_unit_type" is set to "21" by Fig. 2, and QL is set to 1 (QL=1), (TL, DID, QL) becomes (0,0,1).
[75] The third NAL unit generated for the IDR picture is the one for the spatial scalable layer. As the "NAL_unit_type" is set to "21" and DID is set to 1 (DID=1), (TL, DID, QL) becomes (0,1,0).
[76] The last NAL unit generated for the IDR picture is the one for FGS scaling for the spatial scalable layer. As the "NAL_unit_type" is set to "21" and QL is set to 1 (QL=1), (TL, DID, QL) becomes (0,1,1).
[77] When the encoding of the IDR picture is finished as above, a screen with screen number 16, which is an I- or P-picture, is encoded. Because this picture is a non-IDR picture, "NAL_unit_type" is set to "1" in the base layer and to "20" in the scalable layer.
[78] After completion of the encoding of the screen number 16, a screen with screen number 8 is encoded. The TL values for the four NAL units generated at this time are all set to "1", thereby making it possible to support the frame rate of 3.75 fps.
[79] Next, as the screens with screen numbers 4 and 12 are set to TL=2, the frame rate of 7.5 fps can be supported.
[80] The screen numbers 2, 6, 10, and 14 are set to TL=3 with respect to all the four NAL units generated for support of the frame rate of 15 fps.
[81] Meanwhile, the screen numbers 1, 3, 5, 7, 13, and 15 are encoded only in the spatial scalable layer in order to support 30 fps only in the CIF standard.
[82] As shown in Fig. 8, for these screens there is no NAL unit belonging to the base layer; only two NAL units belonging to the scalable layers exist.
[83] To support 30 fps, TL is set to 4 (TL=4) and all the NAL units belong to the spatial scalable layer, and thus, they are commonly set to DID=1.
[84] The NAL units for the spatial scalable layer are set to QL=0, and the NAL units for FGS scaling for the spatial scalable layer are set to QL=1.
[85] As shown in Fig. 8, for the combined scalability of the SVC, analyzing the "NAL_unit_types" and the (TL, DID, QL) field of the NAL units makes it possible to detect the type of encoding information contained in the NAL units through the "NAL_unit_type" values and to derive the temporal and spatial hierarchy between the NAL units through the (TL, DID, QL) values.
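The TL values and frame rates quoted above for the example of Figs. 7 and 8 follow a regular dyadic pattern, which the sketch below reproduces. It assumes the 30 fps full rate and GOP size of 16 from the example, a GOP size that is a power of two, and an IDR screen numbered 0; the function names are hypothetical.

```python
FULL_FRAME_RATE = 30.0  # fps when all temporal levels are transmitted
GOP_SIZE = 16           # screens per GOP in the example of Fig. 7
MAX_TL = 4              # highest temporal level in the example


def temporal_level(screen_number: int, gop_size: int = GOP_SIZE) -> int:
    """TL of a screen in a hierarchical B-picture GOP (gop_size a power of two)."""
    position = screen_number % gop_size
    if position == 0:
        return 0  # key pictures: the IDR screen (assumed number 0), screen 16, ...
    level = gop_size.bit_length() - 1  # log2(gop_size), the deepest level
    while position % 2 == 0:
        position //= 2
        level -= 1
    return level


def max_frame_rate(highest_tl: int) -> float:
    """Frame rate supported when screens up to highest_tl are transmitted."""
    return FULL_FRAME_RATE / 2 ** (MAX_TL - highest_tl)
```

With these definitions, screen 8 maps to TL=1, screens 4 and 12 to TL=2, screens 2, 6, 10, and 14 to TL=3, and the odd-numbered screens to TL=4, while the supported frame rate doubles with each additional temporal level starting from 1.875 fps.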
[86] Such information can be very usefully utilized in effectively designing the RTP packetization scheme for cutting an SVC stream to a proper size and loading the same in an RTP packet.
[87] In the non-interleaved mode, three packet types such as SNU, FU-A, and STAP-A are supported, as shown in Fig. 1.
[88] Generally, the NAL units belonging to the base layer have a higher priority order in transmission than the NAL units belonging to the scalable layers, and are processed to be strong against an error through channel encoding, separately from scalable layer information. Therefore, the NAL units of the base layer are not loaded in an RTP packet by mixing with the NAL units of the scalable layers, but loaded independently in an RTP packet.
[89] Accordingly, the STAP-A packet type that can be aggregated with the NAL units of the scalable layers is not applied to the NAL units of the base layer, but either SNU or FU-A is selected and loaded in an RTP packet by considering the length of the NAL units.
[90] Applied to the NAL units belonging to the scalable layers are all the three packet types including SNU, FU-A, and STAP-A.
[91] Among them, SNU and FU-A are selectively applied depending on the length of the NAL units, and STAP-A is applied in such a manner that several NAL units of the scalable layers belonging to the same screen number are aggregated as one within the range that does not exceed the MTU size and loaded in an RTP packet.
[92] Hereinafter, an algorithm based on a look-ahead scheme for identifying scalable layer NAL units, to which the STAP-A type is to be applied, will be described.
[93] (TL, DID, QL) information of NU_i, which is the NAL unit being inputted to the loop of the present algorithm, is indicated by (T_i, D_i, Q_i), and the next NAL unit to be analyzed one step in advance by the look-ahead scheme is designated by NU_{i+1}, and (TL, DID, QL) information of NU_{i+1} is indicated by (T_{i+1}, D_{i+1}, Q_{i+1}).
[94] In order to determine whether to apply the STAP-A type, (T_{i+1}, D_{i+1}, Q_{i+1}) information of NU_{i+1} is extracted in advance and compared. The sequential condition that should be satisfied in order to aggregate NU_i and NU_{i+1} and add the same to one RTP payload is as follows:
[95] i. NU_{i+1} should not be the NAL unit belonging to the base layer.
[96] ii. NU_{i+1} should have the same TL value as NU_i.
[97] iii. The sum of the size of the NAL units accumulated until NU_i in an RTP payload plus the size of NU_{i+1} should be smaller than the size of an MTU (in case of the internet, the general size of the MTU is 1500 bytes). In case of transmitting an RTP packet greater than the MTU, the RTP packet is fragmented into several packets by the fragmentation function of a router or gateway during transmission through a network, thereby causing a burden to the network and the client.
[98] iv. The following conditions should be satisfied depending on the magnitude correlation of Q_i and Q_{i+1}:
[99] (a) If Q_{i+1} > Q_i, this means that the quality level of a FGS layer increases. This phenomenon occurs only to the NAL units belonging to the same screen number, and thus, the condition of STAP-A is satisfied. Therefore, NU_{i+1} and NU_i can be loaded together in an RTP payload.
[100] (b) If Q_{i+1} ≤ Q_i, this means that the quality level of a FGS layer does not increase. The situation where this phenomenon occurs can be divided into the situation of D_{i+1} > D_i and vice versa. The situation of D_{i+1} > D_i occurs only to the NAL units that always exist within the same screen number, and thus, NU_i and NU_{i+1} can be targets of STAP-A. However, the situation of D_{i+1} ≤ D_i occurs between the NAL units having different screen numbers, i.e., different presentation time instants, and thus NU_{i+1} and NU_i cannot be targets of STAP-A.
[101] In conclusion, in order to perform RTP packetization in the STAP-A type, NU_i and NU_{i+1} should sequentially satisfy all of i, ii, iii, and iv-(a) among the above conditions, or should sequentially satisfy all of i, ii, iii, and iv-(b).
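Conditions i to iv above can be collected into a single look-ahead predicate, sketched below. The `NalUnit` record, its field names, and the function name are hypothetical, and case iv-(b) is read as "the quality level does not increase but the dependency level does". The sketch additionally refuses to start an aggregate from a non-scalable NAL unit, reflecting the statement that base-layer NAL units are loaded independently.

```python
from dataclasses import dataclass

SCALABLE_LAYER_TYPES = (20, 21)  # "NAL_unit_type" values of the scalable layers


@dataclass(frozen=True)
class NalUnit:
    nal_unit_type: int
    tl: int    # T_i
    did: int   # D_i
    ql: int    # Q_i
    size: int  # S_i, in bytes


def can_aggregate(nu_i: NalUnit, nu_next: NalUnit,
                  accumulated_size: int, mtu: int = 1500) -> bool:
    """Decide whether NU_{i+1} may join the STAP-A payload that ends with NU_i.

    accumulated_size is the number of bytes loaded so far, including NU_i.
    """
    if nu_i.nal_unit_type not in SCALABLE_LAYER_TYPES:
        return False  # base-layer units are never aggregated
    if nu_next.nal_unit_type not in SCALABLE_LAYER_TYPES:
        return False  # condition i
    if nu_next.tl != nu_i.tl:
        return False  # condition ii
    if accumulated_size + nu_next.size >= mtu:
        return False  # condition iii: total must stay smaller than the MTU
    if nu_next.ql > nu_i.ql:
        return True   # condition iv-(a): FGS quality level increases
    return nu_next.did > nu_i.did  # condition iv-(b): same screen only if DID rises
```

For the IDR picture of Fig. 8, the units (0,0,1), (0,1,0), and (0,1,1) chain together under this predicate, whereas a (4,1,1) unit followed by the (4,1,0) unit of the next screen does not.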
[102] There is shown in Fig. 9 a flowchart of the algorithm proposed in order to perform RTP packetization by determining the SNU, FU-A, and STAP-A packet types based on the above conditions for determining the packet type as STAP-A.
[103] The flowchart as shown in Fig. 9 is performed for every GOP unit, and the RTP packet type is determined based on NUType_i, which is the "NAL_unit_type" of all the NAL units existing within one GOP, and (T_i, D_i, Q_i), which is (TL, DID, QL) information, as explained in Fig. 8.
[104] i, ii, iii, iv-(a), and iv-(b), which are the conditions for determining the packet type as STAP-A, are indicated on the corresponding blocks of Fig. 9, respectively.
[105] In Fig. 9, N implies that a packetizing process that is currently in progress is in the process of loading an N-th NAL unit in an RTP payload. The algorithm shown in the drawing is operated in the look-ahead scheme of investigating NU_{i+1} in advance in order to compare the STAP-A type condition. Therefore, if the packet type is determined as STAP-A when N=1, NU_i and NU_{i+1} are simultaneously loaded in the RTP payload, while if the packet type is determined as STAP-A when N>1, only NU_{i+1} is loaded in the RTP payload.
[106] If the packet type is not determined as STAP-A when N=1, the packet type is determined as SNU or FU-A by checking whether the size of NU_i exceeds that of the MTU. On the other hand, if the packet type is not determined as STAP-A when N>1, an N-number of NAL units accumulated in an RTP payload up to present are loaded and transmitted in one RTP packet, and then parameters N and I are updated to generate a new RTP packet, followed by repeating the entire process.
[107] Herein, parameters I and J are used so as to indicate the start position and end position of the N-number of NAL units to be loaded in the RTP payload. Meanwhile, S_i means the size of NU_i, and P_i means the size of total packets accumulated in the RTP payload including NU_i and is used to check whether or not the size of the total packets accumulated in the RTP payload exceeds that of the MTU.
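The look-ahead loop described above can be sketched as follows. This is a simplified reading of the flowchart, not a transcription of Fig. 9: the function and variable names are hypothetical, the STAP-A test is passed in as a callable so that the loop stays independent of how conditions i to iv are evaluated, and the demo uses a stand-in rule based on a hypothetical `screen` field instead of the (TL, DID, QL) comparison.

```python
from collections import namedtuple


def packetize(nal_units, can_aggregate, mtu=1500):
    """Return a list of (packet_type, indices) decisions for one GOP.

    can_aggregate(nu_i, nu_next, accumulated_size) -> bool is the STAP-A test.
    """
    decisions = []
    start = 0  # parameter I: first NAL unit of the packet being built
    while start < len(nal_units):
        end = start                          # parameter J: last NAL unit loaded
        accumulated = nal_units[start].size  # P_i: bytes loaded so far
        # Look ahead one NAL unit at a time while the STAP-A test holds.
        while (end + 1 < len(nal_units)
               and can_aggregate(nal_units[end], nal_units[end + 1], accumulated)):
            end += 1
            accumulated += nal_units[end].size
        if end > start:
            decisions.append(("STAP-A", list(range(start, end + 1))))
        elif nal_units[start].size > mtu:
            decisions.append(("FU-A", [start]))
        else:
            decisions.append(("SNU", [start]))
        start = end + 1
    return decisions


# Demo data: hypothetical record and stand-in rule, for illustration only.
Nu = namedtuple("Nu", "screen scalable size")


def demo_rule(nu_i, nu_next, accumulated, mtu=1500):
    return (nu_i.scalable and nu_next.scalable
            and nu_i.screen == nu_next.screen
            and accumulated + nu_next.size < mtu)


gop = [Nu(0, False, 900), Nu(0, True, 400), Nu(0, True, 500), Nu(0, True, 300),
       Nu(16, False, 2000), Nu(16, True, 700), Nu(16, True, 900)]
decisions = packetize(gop, demo_rule)
```

In the demo, the first base-layer unit is sent alone, the three scalable-layer units of the same screen are aggregated, the 2000-byte base-layer unit is fragmented, and the last two scalable-layer units are sent separately because together they would reach the MTU.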
[108] Fig. 10 is a block diagram illustrating an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.
[109] Referring to Fig. 10, the inventive RTP packetizing apparatus 120 for an SVC bitstream includes a packet type determiner 130 for determining a packet type for an input SVC bitstream and a packet generator 140 for generating an RTP packet by fragmenting the SVC bitstream so as to correspond to the packet type determined by the packet type determiner 130 and loading the fragments in an RTP packet.
[110] A description of the detailed functions of the components such as the packet type determiner 130 and the packet generator 140 will be substituted by the above description of Figs. 5 to 9.
[111] Reference numeral 110, not explained above, denotes an SVC encoder, which provides the SVC bitstream to the packet type determiner 130 by encoding an input video sequence.
[112] The method of the present invention as mentioned above may be implemented by a software program that is stored in a computer-readable storage medium such as CD-ROM, RAM, ROM, floppy disk, hard disk, optical magnetic disk, or the like. This procedure may be readily carried out by those skilled in the art; and therefore, details thereof are omitted here.
[113] The present application contains subject matter related to Korean Patent Application Nos. 2006-0110714 and 2006-0125144, filed in the Korean Intellectual Property Office on November 9, 2006, and December 8, 2006, the entire contents of which are incorporated herein by reference.
[114] While the present invention has been described with respect to the specific embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims
[1] A method for determining a packet type for a Scalable Video Coded (SVC) video bitstream, comprising the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the Scalable Video Coding (SVC); and c) determining a Real-time Transport Protocol (RTP) packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
[2] The method of claim 1, wherein the step a) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.
[3] The method of claim 1, wherein the step b) is carried out by analyzing "NAL_unit_type" values of the NAL units belonging to the base layer and the scalable layers, which are the "NAL_unit_type" values indicating encoding information.
[4] The method of claim 1, wherein the packet type determined in the step c) is any one among Single NAL Unit (SNU), Fragmentation Unit-A (FU-A), and Single-Time Aggregation Packet-A (STAP-A) types of a non-interleaved mode.
[5] An RTP packetizing method for an SVC video bitstream, comprising the steps of: a) determining a packet type for the SVC video bitstream; and b) fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the determined packet type and loading the fragments in RTP packets.
[6] The method of claim 5, wherein the step a) includes the steps of: a1) deriving temporal and spatial hierarchy information between NAL units from field information defined in the NAL unit headers of scalable layers; a2) detecting a type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and a3) determining an RTP packet type for the SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
[7] The method of claim 6, wherein the step a1) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.
[8] The method of claim 6, wherein the step a2) is performed by analyzing "NAL_unit_type" values of the NAL units belonging to the base layer and the scalable layers, which are the "NAL_unit_type" values indicating encoding information.
[9] The method of claim 6, wherein the packet type determined in the step a3) is any one among SNU, FU-A, and STAP-A types of a non-interleaved mode.
[10] An apparatus for packetizing an SVC video bitstream, comprising: a packet type determiner for determining a packet type for the SVC video bitstream; and a packet generator for generating a packet by fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the packet type determined by the packet type determiner and loading the fragments in RTP packets.
PCT/KR2007/004413 2006-11-09 2007-09-12 Method for determining packet type for svc video bitstream, and rtp packetizing apparatus and method using the same WO2008056878A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/513,542 US8761203B2 (en) 2006-11-09 2007-09-12 Method for determining packet type for SVC video bitstream, and RTP packetizing apparatus and method using the same

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2006-0110714 2006-11-09
KR20060110714 2006-11-09
KR1020060125144A KR100776680B1 (en) 2006-11-09 2006-12-08 Method for packet type classification to svc coded video bitstream, and rtp packetization apparatus and method
KR10-2006-0125144 2006-12-08

Publications (1)

Publication Number Publication Date
WO2008056878A1 true WO2008056878A1 (en) 2008-05-15

Family

ID=39364668

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2007/004413 WO2008056878A1 (en) 2006-11-09 2007-09-12 Method for determining packet type for svc video bitstream, and rtp packetizing apparatus and method using the same

Country Status (1)

Country Link
WO (1) WO2008056878A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1311125A2 (en) * 2001-11-12 2003-05-14 Sony Corporation Data communication system and method, data transmission apparatus and method, data receiving apparatus, received-data processing method and computer program
WO2004036916A1 (en) * 2002-10-15 2004-04-29 Koninklijke Philips Electronics N.V. System and method for transmitting scalable coded video over an ip network
EP1631088A1 (en) * 2004-08-23 2006-03-01 LG Electronics Inc. Apparatus for transmitting video signal and method thereof

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101783935B (en) * 2009-01-19 2013-07-24 财团法人工业技术研究院 Method for encapsulating scalable video coding bitstreams
WO2010108416A1 (en) * 2009-03-24 2010-09-30 华为技术有限公司 Method, device and communication system for forwarding scalable video coding data messages
EP2509359A2 (en) * 2009-12-01 2012-10-10 Samsung Electronics Co., Ltd. Method and apparatus for transmitting a multimedia data packet using cross-layer optimization
EP2509359A4 (en) * 2009-12-01 2014-03-05 Samsung Electronics Co Ltd Method and apparatus for transmitting a multimedia data packet using cross-layer optimization

Similar Documents

Publication Publication Date Title
US8761203B2 (en) Method for determining packet type for SVC video bitstream, and RTP packetizing apparatus and method using the same
EP2919453B1 (en) Video stream switching
AU2009211932B2 (en) Method and device for reordering and multiplexing multimedia packets from multimedia streams pertaining to interrelated sessions
US9456209B2 (en) Method of multiplexing H.264 elementary streams without timing information coded
US8798145B2 (en) Methods for error concealment due to enhancement layer packet loss in scalable video coding (SVC) decoding
US20140133489A1 (en) Method for transmitting packet-based media data having header in which overhead is minimized
JP2006087125A (en) Method of encoding sequence of video frames, encoded bit stream, method of decoding image or sequence of images, use including transmission or reception of data, method of transmitting data, coding and/or decoding apparatus, computer program, system, and computer readable storage medium
KR20100071688A (en) A streaming service system and method for universal video access based on scalable video coding
US20100046552A1 (en) Time-stamping apparatus and method for rtp packetization of svc coded video, and rtp packetization system using the same
EP2404451B1 (en) Processing of multimedia data
WO2008056878A1 (en) Method for determining packet type for svc video bitstream, and rtp packetizing apparatus and method using the same
US8565083B2 (en) Thinning of packet-switched video data
KR100849495B1 (en) Method for producing bit-rate based on RTP packet mode
Seo et al. A practical RTP packetization scheme for SVC video transport over IP networks
WO2007035151A1 (en) Media stream scaling
Kordelas et al. On the performance of H.264/MVC over lossy IP-based networks
Hannuksela et al. Congestion-aware transmission rate control using medium grain scalability of scalable video coding
Klemets RTP payload format for video codec 1 (VC-1)
Klemets RFC 4425: RTP Payload Format for Video Codec 1 (VC-1)

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 07808205

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 12513542

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 07808205

Country of ref document: EP

Kind code of ref document: A1