WO2008056878A1 - Method for determining packet type for svc video bitstream, and rtp packetizing apparatus and method using the same - Google Patents
- Publication number
- WO2008056878A1 (application PCT/KR2007/004413)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nal
- type
- svc
- rtp
- packet
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/643—Communication protocols
- H04N21/6437—Real-time Transport Protocol [RTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/31—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/33—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/34—Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/36—Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/234327—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by decomposing into layers, e.g. base layer and one or more enhancement layers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
- H04N21/2381—Adapting the multiplex stream to a specific network, e.g. an Internet Protocol [IP] network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8451—Structuring of content, e.g. decomposing content into time segments using Advanced Video Coding [AVC]
Definitions
- the present invention relates to a method for determining the packet type for a Scalable Video Coded (SVC) video bitstream and a Real-time Transport Protocol (RTP) packetizing apparatus and method using the same; and, more particularly, to a method for determining the packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.
- RTP Real-time Transport Protocol
- Scalable Video Coding (SVC), which is the scalable extension of H.264, is a new scalable coding technique developed to solve the problems of low compression efficiency, lack of support for combined scalability, and high implementation complexity, which are caused by the layered coding-based scalability attempted in existing Moving Picture Experts Group 2 (MPEG-2), MPEG-4, etc.
- MPEG-2 Moving Picture Experts Group 2
- SVC encodes multiple video layers into a single bit sequence.
- the layers of SVC include one base layer and scalable layers that can be continuously stacked over the base layer.
- Each scalable layer is able to express the maximum bit rate, frame rate and resolution that are given to itself based on low-order layer information.
- the SVC is a coding technique suitable for multimedia contents service in a Universal Multimedia Access (UMA) environment that can solve the problem of variability in bandwidth that occurs in a heterogeneous network environment, the problem of variability in receiving terminal performance and resolution, the problem of various preferences of contents consumers and so on in a complex way.
- UMA Universal Multimedia Access
- a Video Coding Layer (VCL) of an SVC encoder generates base layer encoding information and scalability encoding information of the scalable layers in slices.
- Each slice is generated in Network Abstraction Layer (NAL) units in an NAL and stored in an SVC bitstream.
- NAL Network Abstraction Layer
- as RTP packet types for the NAL units of the SVC, there are a total of seven types, including a Single NAL Unit (SNU), a Simple-Time Aggregation Packet-A (STAP-A), STAP-B, Multi-Time Aggregation Packet 16 (MTAP16), MTAP24, Fragmentation Unit-A (FU-A), and FU-B.
- SNU Single NAL Unit
- STAP-A Simple-Time Aggregation Packet-A
- STAP-B Simple-Time Aggregation Packet-B
- MTAP24 Multi-Time Aggregation Packet 24
- FU-A Fragmentation Unit-A
- FU-B Fragmentation Unit-B
- the SNU type can load only one NAL unit in one RTP packet, and the STAP can simultaneously load multiple NAL units that belong to the same presentation time instant in one RTP packet.
- This STAP is divided into an STAP-A type that loads NAL units in an RTP packet in decoding order and an STAP-B type that loads NAL units in an RTP packet without regard to decoding order, for interleaving purposes.
- the MTAP can load multiple NAL units belonging to different presentation time instants in one RTP packet at a time and basically supports interleaving.
- This MTAP is divided into an MTAP16 type supporting a 16-bit time offset and an MTAP24 type supporting a 24-bit time offset, depending on the size of the time offset field that indicates the difference in presentation time instant between the NAL units.
- Fig. 1 shows the RTP packet types supported by the three RTP packet modes: an SNU mode, a non-interleaved mode, and an interleaved mode.
- the SNU mode of Fig. 1 supports only the SNU type, which loads a single NAL unit having one of the "NAL_unit_types" 1 to 23 shown in Fig. 2 in an RTP packet, so its field of application is limited.
- the non-interleaved mode supports the STAP-A and FU-A types as well as the SNU type, and thus has a wide range of practical application.
- the interleaved mode adds an interleaving function to the non-interleaved mode, and has the drawback that it cannot support the SNU type.
- because the interleaving function of the interleaved mode loads NAL units in the RTP packet in an order different from the decoding order, a burst error in a channel can be dealt with effectively, but RTP packetization, de-packetization, and the SVC decoding procedure become very complicated.
- the non-interleaved mode is suitable as the RTP packetization mode that must be necessarily supported in a commercial SVC streaming service, and the interleaved mode can be considered as an option for a service in an environment with high channel error.
- the SNU type of the non-interleaved mode is supposed to load one NAL unit having
- the STAP-A type of the non-interleaved mode has an RTP payload format structure as shown in Fig. 3, and is of the type that aggregates several NAL units corresponding to the same presentation time instant and loads the same in one
- the STAP-A type of the non-interleaved mode has a 1-byte RTP payload header (STAP-A NAL HDR) additionally inserted therein, unlike the SNU type.
- the value of the F field of the payload header is set to "1" if at least one of the NAL units to be loaded together has an F field of "1" in its header.
- the NRI field of the payload header is set to the maximum value of the NRI field values indicated in each of the headers of the NAL units to be loaded together.
- "Type" field of the payload header "NAL_unit_type” of No.24 in Fig.
- the FU-A type of the non-interleaved mode is a type that divides a NAL unit into two or more so that it does not exceed an MTU (Maximum Transmission Unit) size and loads the divided units in respective corresponding RTP packets in order to prevent the occurrence of packet fragmentation in a router or gateway during transmission if the size of one NAL unit exceeds that of the MTU of a network.
- MTU Maximum Transmission Unit
- RTP payload header is composed of a total of 2 bytes including one byte of "FU_indicator" and one byte of "FU_header".
- The values indicated in the headers of the NAL units are applied to the F field and
- "NAL_unit_type” of "No.28” in Fig. 1 is set in the "Type” field of "FU_indicator” in order to show that this is the FU-A type.
- the S field and E field of "FU_header” are used in order to show that the parts to be divided and loaded are the start part of an NAL unit or the end part thereof, respectively.
- the "NAL_unit_type” value indicating encoding contents contained in the NAL unit is set, as shown in Fig. 2.
- the present invention proposes a practical RTP packetization algorithm which can effectively load NAL units of an SVC in an RTP payload while maintaining the specification of the RTP payload format.
- it is an object of the present invention to provide a method for determining a packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.
- a method for determining a packet type for a Scalable Video Coded (SVC) video bitstream which includes the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and c) determining a Real-time Transport Protocol (RTP) packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
- the temporal and spatial hierarchy information derivation step of a) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.
- the encoding information type detection step of b) is carried out by analyzing NAL_unit_type values of the NAL units belonging to the base layer and the scalable layers, which are the "NAL_unit_type" values indicating encoding information.
- the packet type determined in the step c) is any one among Single NAL Unit (SNU), Fragmentation Unit-A (FU-A), and Simple-Time Aggregation Packet-A (STAP-A) types of a non-interleaved mode.
- a packetizing method for an SVC video bitstream which includes the steps of: a) determining a packet type for the SVC video bitstream; and b) fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the determined packet type and loading the fragments in RTP packets.
- an apparatus for packetizing an SVC video bitstream which includes: a packet type determiner for determining a packet type for the SVC video bitstream; and a packet generator for generating a packet by fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the packet type determined by the packet type determiner and loading the fragments in RTP packets.
- the present invention can efficiently determine the packet type for an SVC bitstream and perform RTP packetization using the same.
- As a result, the present invention can more efficiently transmit an SVC video bitstream through an IP network such as the Internet.
- Fig. 1 is a table showing a packet type supportable for each RTP packetization mode.
- Fig. 2 is a table summarizing contents contained in NAL units by NAL_unit_types.
- FIG. 3 is an explanatory view showing an RTP payload format structure for a STAP-A type.
- FIG. 4 is an explanatory view showing an RTP payload format structure for an FU-A type.
- Figs. 5 and 6 are explanatory views showing the header structures of NAL units used in a base layer and scalable layers of an SVC in accordance with the present invention.
- Fig. 7 is an explanatory view showing a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.
- Fig. 8 is an explanatory view of the encoding order of SVC screens and of NAL units of the base layer and scalable layers belonging to each screen in accordance with the present invention.
- FIG. 9 is a detailed flow chart illustrating an RTP packetizing method in accordance with a preferred embodiment of the present invention.
- Fig. 10 is a block diagram illustrating the structure of an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.
- FIGs. 5 and 6 are explanatory views showing header structures of NAL units used in the base layer and scalable layers of an SVC in accordance with the present invention.
- Encoding information generated by SVC encoding is stored in a bit stream in NAL units. As shown in Figs. 5 and 6, the header structure of the NAL unit generated in the base layer and that of the NAL unit generated in the scalable layers are different from each other.
- Fig. 5 shows the header structure of the NAL unit generated in the base layer compatible with H.264
- Fig. 6 depicts the header structure of the NAL unit generated in the scalable layers.
- the "Type” field means "NAL_unit_type” representing information on contents of encoding information contained in the NAL unit, and shows the type of encoding information contained in the NAL unit for each of the aforementioned "NAL_unit_types" of Fig. 2.
- "NAL_unit_types" of Nos. 1 to 6 are usable in the NAL units of the base layer, while the "NAL_unit_types" of Nos. 20 and 21 are usable in the scalable layers.
- the other "NAL_unit_types" are used to indicate the NAL units containing not encoding information but additional information.
- temporal and spatial hierarchy for each of the NAL units can be derived from (TL, DID, and QL) field information defined in the header of each NAL unit of the scalable layers.
- the (TL, DID, QL) field, which is the last octet in Fig. 6, represents the inter-layer hierarchy in temporal, spatial, and SNR scalability. That is, TL (temporal_level) represents the hierarchy between temporal layers for temporal scalability, DID (dependency_id) indicates the dependency hierarchy between higher/lower scalable layers in the inter-layer prediction of spatial scalability, and QL (quality_level) represents the hierarchy between FGS layers for support of SNR scalability.
- TL, DID, and QL values are all integers greater than or equal to "0", and the temporal and spatial hierarchy of the NAL units can be derived from a combination of these values.
- Fig. 7 is a view showing an example of a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.
- IDR Instantaneous Decoding Refresh
- GOP Group Of Picture
- the GOP consists of 16 screens, and the other GOPs not shown in the drawing also have a GOP size of 16.
- the screen resolution that can be supported by the base layer is QCIF
- the screen resolution that can be supported by the spatial scalable layers is CIF.
- the DID value in the (TL, DID, QL) field of Fig. 6 is used.
- a hierarchical B-picture approach is applied for provision of temporal scalability, and the TL value in the (TL, DID, QL) field is used to indicate a supportable frame rate.
- the TL value is displayed in the middle part of each screen indicated in a rectangle.
- the frame rate can be supported up to 1.875 fps, and in case of transmitting it, including a B-picture with
- the frame rate can be supported up to 3.75 fps.
- the frame rate can be supported up to 15 fps in QCIF standard, and as the maximum TL value in the spatial scalable layer is 4, the frame rate can be supported up to 30 fps in CIF standard.
- the three NAL units generated in the scalable layers include one NAL unit for FGS scaling for the base layer, one NAL unit for the spatial scalable layer, and one NAL unit for FGS scaling for the spatial scalable layer.
- the NAL unit firstly generated in the IDR picture is done in the base layer, and the header of the NAL unit conforms to the structure of (a) of Fig. 5. Because the base layer is an IDR picture, it can be seen from Fig. 9 that the "NAL_unit_type" of the header is set to "5" by Fig. 2 described above.
- the NAL unit secondly generated in the IDR picture is the one for FGS scaling for the base layer. As the "NAL_unit_type" is set to "21" by Fig. 2, and QL is set to 1
- the NAL unit thirdly generated in the IDR picture is the one for spatial scalable layer.
- NAL unit lastly generated in the IDR picture is the one for FGS scaling for spatial scalable layer.
- NAL_unit_type is set to "21” and QL is set to 1
- the screen numbers 1, 3, 5, 7, 13, and 15 are encoded in the spatial scalable layer in order to support 30 fps only by the CIF standard.
- analyzing the "NAL_unit_types" and (TL, DID, QL) field of the NAL units can detect the type of encoding information contained in the NAL units through the "NAL_unit_type” values and derive the temporal and spatial hierarchy between the NAL units through the (TL, DID, QL) values.
- Such information can be very useful in designing an effective RTP packetization scheme for cutting an SVC stream to a proper size and loading the same in an RTP packet.
- the NAL units belonging to the base layer have a higher priority order in transmission than the NAL units belonging to the scalable layers, and are processed to be strong against an error through channel encoding, separately from scalable layer information. Therefore, the NAL units of the base layer are not loaded in an RTP packet by mixing with the NAL units of the scalable layers, but loaded independently in an RTP packet.
- the STAP-A packet type that can be aggregated with the NAL units of the scalable layers is not applied to the NAL units of the base layer, but either SNU or FU-A is selected and loaded in an RTP packet by considering the length of the NAL units.
- Applied to the NAL units belonging to the scalable layers are all the three packet types including SNU, FU-A, and STAP-A.
- NAL units, and STAP-A is applied in such a manner that several NAL units of the scalable layers belonging to the same screen number are aggregated as one within the range that does not exceed the MTU size and loaded in an RTP packet.
- (TL, DID, QL) information of NU_i, which is the NAL unit being inputted to the loop of the present algorithm, is indicated by (T_i, D_i, Q_i); the next NAL unit, analyzed one step in advance by the look-ahead scheme, is designated NU_{i+1}, and its (TL, DID, QL) information is indicated by (T_{i+1}, D_{i+1}, Q_{i+1}).
- RTP payload is as follows:
- NU_{i+1} should not be the NAL unit belonging to the base layer.
- NU_{i+1} should have the same TL value as NU_i.
- NU_i and NU_{i+1} should sequentially satisfy all of i, ii, iii, and iv-(a) among the above conditions, or should sequentially satisfy all of i, ii, iii, and iv-(b).
- RTP packetization by determining the SNU, FU-A, and STAP-A packet types based on the above conditions for determining the packet type as STAP-A.
- N implies that a packetizing process that is currently in progress is in the process of loading an N-th NAL unit in an RTP payload.
- the packet type is determined as STAP-A when N>1, only NU is loaded in the RTP payload.
- the packet type is determined as SNU or FU-A by checking whether the size of NU exceeds that of the
- parameters I and J are used so as to indicate the start position and end position of the N-number of NAL units to be loaded in the RTP payload.
- S_i means the size of NU_i
- P_i means the size of total packets accumulated in the RTP payload including NU_i and is used to check whether or not the size of the total packets accumulated in the RTP payload exceeds that of the MTU.
- FIG. 10 is a block diagram illustrating an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.
- the inventive RTP packetizing apparatus 120 for an SVC bitstream includes a packet type determiner 130 for determining a packet type for an input SVC bitstream and a packet generator 140 for generating an RTP packet by fragmenting the SVC bitstream so as to correspond to the packet type determined by the packet type determiner 130 and loading the same in an RTP packet.
- Reference numeral 110, not yet explained, denotes an SVC encoder which provides the SVC bitstream to the packet type determiner 130 by encoding an input video sequence.
- the method of the present invention as mentioned above may be implemented by a software program that is stored in a computer-readable storage medium such as CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, or the like. This procedure may be readily carried out by those skilled in the art; and therefore, details thereof are omitted here.
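As a rough illustration of what such a software program might contain, the sketch below builds the three non-interleaved payload forms discussed above. It is a minimal sketch under stated assumptions, not the patented algorithm: it assumes the one-byte NAL header layout of RFC 3984 (F in 1 bit, NRI in 2 bits, Type in 5 bits), a 16-bit size prefix per aggregated unit in STAP-A, and a hypothetical `max_payload` budget standing in for the MTU. The function names are illustrative, and the look-ahead conditions for choosing STAP-A are not modelled.

```python
import struct

STAP_A, FU_A = 24, 28  # "Type" values cited in the text for STAP-A and FU-A


def make_snu_or_fu_a(nal: bytes, max_payload: int) -> list:
    """Return one SNU payload, or several FU-A payloads if the unit is too large.

    `max_payload` is a hypothetical byte budget standing in for the MTU.
    """
    if len(nal) <= max_payload:
        return [nal]  # SNU: the NAL unit is carried unchanged
    chunk = max_payload - 2  # leave room for FU_indicator + FU_header
    if chunk < 1:
        raise ValueError("max_payload too small for FU-A")
    header, body = nal[0], nal[1:]
    fu_indicator = (header & 0xE0) | FU_A  # copy F and NRI, set Type = 28
    nal_type = header & 0x1F               # original NAL_unit_type
    payloads = []
    for off in range(0, len(body), chunk):
        s_bit = 0x80 if off == 0 else 0                  # start of NAL unit
        e_bit = 0x40 if off + chunk >= len(body) else 0  # end of NAL unit
        fu_header = s_bit | e_bit | nal_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[off:off + chunk])
    return payloads


def make_stap_a(nals: list) -> bytes:
    """Aggregate NAL units of one time instant into a STAP-A payload."""
    # F is set if any aggregated unit has F = 1; NRI is the maximum NRI.
    f_bit = 0x80 if any(n[0] & 0x80 for n in nals) else 0
    nri = max((n[0] >> 5) & 0x03 for n in nals)
    out = bytes([f_bit | (nri << 5) | STAP_A])
    for n in nals:
        out += struct.pack("!H", len(n)) + n  # 16-bit size, network byte order
    return out
```

A caller would apply `make_snu_or_fu_a` to each base-layer NAL unit on its own, and use `make_stap_a` only for scalable-layer units of the same screen whose combined size stays within the budget.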
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Provided are a method for determining the packet type for a Scalable Video Coded (SVC) video bitstream, and a Real-time Transport Protocol (RTP) packetizing apparatus and method using the same. The method for determining a packet type for an SVC video bitstream includes the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and c) determining an RTP packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
Description
METHOD FOR DETERMINING PACKET TYPE FOR SVC VIDEO BITSTREAM, AND RTP PACKETIZING APPARATUS AND METHOD USING THE SAME
Technical Field
[1] The present invention relates to a method for determining the packet type for a Scalable Video Coded (SVC) video bitstream, and a Real-time Transport Protocol (RTP) packetizing apparatus and method using the same; and, more particularly, to a method for determining the packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.
Background Art
[2] Scalable Video Coding (SVC), which is a scalable coding technique of H.264, is a new scalable coding technique that is developed to solve the problems of low compression efficiency, unsupportability of combined scalability, and high implementation complexity, which are caused by layered coding-based scalability attempted in existing Moving Picture Experts Group 2 (MPEG-2), MPEG-4, etc.
[3] SVC encodes multiple video layers into a single bit sequence. The layers of SVC include one base layer and scalable layers that can be continuously stacked over the base layer.
[4] Each scalable layer is able to express the maximum bit rate, frame rate and resolution that are given to itself based on low-order layer information.
[5] The more the SVC continuously stacks scalable layers, the more diverse bit rates, frame rates, and resolutions it is possible to support. Thus, the SVC is a coding technique suitable for multimedia contents service in a Universal Multimedia Access (UMA) environment that can solve the problem of variability in bandwidth that occurs in a heterogeneous network environment, the problem of variability in receiving terminal performance and resolution, the problem of various preferences of contents consumers and so on in a complex way.
[6] A Video Coding Layer (VCL) of an SVC encoder generates base layer encoding information and scalability encoding information of the scalable layers in slices.
[7] Each slice is generated in Network Abstraction Layer (NAL) units in an NAL and stored in an SVC bitstream.
[8] Although an RTP payload format for loading the NAL units of the SVC is currently disclosed in an internet draft document "draft-wenger-avt-rtp-svc-02.txt", the SVC is of a complicated structure that stores encoding information of SNR scalability and temporal and spatial scalability, as well as base layer encoding information that is compatible with H.264, in a single bit stream. Thus, no research has provided a result yet on an effective RTP packetizing method that can support the RTP payload format of the SVC.
[9] As RTP packet types for the NAL units of the SVC, there are a total of seven types, including a Single NAL Unit (SNU), a Simple-Time Aggregation Packet-A (STAP-A), STAP-B, Multi-Time Aggregation Packet 16 (MTAP16), MTAP24, Fragmentation Unit-A (FU-A), and FU-B.
[10] The SNU type can load only one NAL unit in one RTP packet, and the STAP can simultaneously load multiple NAL units that belong to the same presentation time instant in one RTP packet. This STAP is divided into an STAP-A type that loads NAL units in an RTP packet in decoding order and an STAP-B type that loads NAL units in an RTP packet without regard to decoding order, for interleaving purposes.
[11] The MTAP can load multiple NAL units belonging to different presentation time instants in one RTP packet at a time and basically supports interleaving. This MTAP is divided into an MTAP16 type supporting a 16-bit time offset and an MTAP24 type supporting a 24-bit time offset, depending on the size of the time offset field that indicates the difference in presentation time instant between the NAL units.
[12] Among these seven RTP packet types, only the packet types required for a given application field are grouped into three RTP packetization modes. Fig. 1 shows the RTP packet types that can be supported by the three modes: an SNU mode, a non-interleaved mode, and an interleaved mode.
[13] The SNU mode of Fig. 1 supports only the SNU type, which loads in an RTP packet a single NAL unit whose "NAL_unit_type" shown in Fig. 2 is in the range of 1 to 23, and its field of application is therefore restricted.
[14] On the other hand, the non-interleaved mode supports the STAP-A and FU-A types as well as the SNU type, and thus its practical range of application is wide.
[15] The interleaved mode adds an interleaving function to the non-interleaved mode, and has the drawback that it cannot support the SNU type. Because the interleaving function loads the NAL units in the RTP packet in an order different from the decoding order, a burst error in a channel can be dealt with effectively, but RTP packetization, de-packetization, and the SVC decoding procedure become very complicated.
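The relation between the three modes and the seven packet types can be sketched as a small lookup table. This is only an illustration: the "Type" values 24 (STAP-A) and 28 (FU-A) and the range 1 to 23 for SNU come from this description, while the remaining values and the packet types listed for the interleaved mode are assumptions taken from the H.264 RTP payload format (RFC 3984), since the contents of Fig. 1 are not reproduced here. All identifiers are hypothetical.

```python
# Payload "Type" values. 24 and 28 are stated in the description; the others
# are assumed from RFC 3984 and are not confirmed by this document.
PACKET_TYPE_CODES = {
    "STAP-A": 24,
    "STAP-B": 25,
    "MTAP16": 26,
    "MTAP24": 27,
    "FU-A": 28,
    "FU-B": 29,
}

# Packet types per packetization mode (interleaved set assumed from RFC 3984).
MODE_PACKET_TYPES = {
    "snu": {"SNU"},
    "non-interleaved": {"SNU", "STAP-A", "FU-A"},
    "interleaved": {"STAP-B", "MTAP16", "MTAP24", "FU-A", "FU-B"},
}


def is_supported(mode: str, packet_type: str) -> bool:
    """Return True if the given mode can carry the given packet type."""
    return packet_type in MODE_PACKET_TYPES[mode]


def packet_type_of(type_value: int) -> str:
    """Map the 5-bit "Type" field of an RTP payload to a packet type name."""
    if 1 <= type_value <= 23:
        return "SNU"  # a single NAL unit carried as-is
    for name, code in PACKET_TYPE_CODES.items():
        if code == type_value:
            return name
    raise ValueError(f"unsupported Type value: {type_value}")
```

For example, `is_supported("interleaved", "SNU")` is false, which matches the drawback noted for the interleaved mode.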
[16] Therefore, in view of the implementation complexity and the range of application, the non-interleaved mode is suitable as the RTP packetization mode that must be supported in a commercial SVC streaming service, and the interleaved mode can be considered as an option for a service in an environment with high channel error.
[17] The SNU type of the non-interleaved mode loads in one RTP packet one NAL unit whose "NAL_unit_type" shown in Fig. 2 is in the range of 1 to 23.
[18] The STAP-A type of the non-interleaved mode has an RTP payload format structure as shown in Fig. 3, and aggregates several NAL units corresponding to the same presentation time instant and loads them in one RTP packet.
[19] The STAP-A type of the non-interleaved mode, as shown in Fig. 3, has a 1-byte RTP payload header (STAP-A NAL HDR) additionally inserted therein, unlike the SNU type. The value of the F field of the payload header is set to "1" if the F field in the header of at least one of the NAL units to be loaded together has a value of "1".
[20] The NRI field of the payload header is set to the maximum value of the NRI field values indicated in each of the headers of the NAL units to be loaded together.
[21] In the "Type" field of the payload header, "NAL_unit_type" of No. 24 in Fig. 3 is set in order to show that this is a STAP-A type.
[22] In addition, a 2-byte "NALU_Size" field representing the size of each NAL unit to be loaded is inserted in front of each NAL unit, separately from the payload header information.
[23] The FU-A type of the non-interleaved mode is used when the size of one NAL unit exceeds the MTU (Maximum Transmission Unit) of a network. It divides the NAL unit into two or more parts, none of which exceeds the MTU size, and loads the divided parts in respective RTP packets in order to prevent packet fragmentation in a router or gateway during transmission.
[24] Fig. 4 illustrates the structure of an RTP payload format for the FU-A type. The RTP payload header is composed of a total of 2 bytes including one byte of "FU_indicator" and one byte of "FU_header".
[25] The values indicated in the header of the NAL unit are applied as they are to the F field and NRI field of "FU_indicator".
[26] "NAL_unit_type" of No. 28 in Fig. 1 is set in the "Type" field of "FU_indicator" in order to show that this is the FU-A type.
[27] The S field and E field of "FU_header" indicate that the part being loaded is the start part or the end part of a NAL unit, respectively.
[28] In the "Type" field of the "FU_header", the "NAL_unit_type" value indicating the encoding contents contained in the NAL unit is set, as shown in Fig. 2.
[29] That is, as described above, although the RTP packet types for the NAL units stored in an SVC bitstream are defined in the standard, no criterion or method has been suggested for determining which packet type is suitable for a given NAL unit.
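The STAP-A and FU-A payload headers described above can be illustrated with a minimal sketch. It assumes that the first octet of every NAL unit holds the F (1 bit), NRI (2 bits), and Type (5 bits) fields as in H.264, that "FU_header" is laid out as S, E, one reserved bit, and a 5-bit Type, and that the original NAL header octet is not repeated in the fragments (as in RFC 3984). The function names are hypothetical and the RTP header itself is not modeled.

```python
import struct

STAP_A_TYPE = 24  # "Type" value for STAP-A, as stated in the description
FU_A_TYPE = 28    # "Type" value for FU-A, as stated in the description


def nal_fields(nal: bytes):
    """Split the first NAL header octet into (F, NRI, Type)."""
    first = nal[0]
    return first >> 7, (first >> 5) & 0x3, first & 0x1F


def build_stap_a(nals):
    """Build a STAP-A payload: 1-byte header, then (2-byte size, NAL unit) pairs."""
    f = max(nal_fields(n)[0] for n in nals)    # 1 if any aggregated unit has F=1
    nri = max(nal_fields(n)[1] for n in nals)  # maximum NRI of the aggregated units
    payload = bytearray([(f << 7) | (nri << 5) | STAP_A_TYPE])
    for n in nals:
        payload += struct.pack("!H", len(n)) + n  # NALU_Size in network byte order
    return bytes(payload)


def build_fu_a(nal: bytes, max_payload: int):
    """Split one NAL unit into FU-A payloads of at most max_payload bytes each."""
    f, nri, nal_type = nal_fields(nal)
    indicator = (f << 7) | (nri << 5) | FU_A_TYPE  # F and NRI copied unchanged
    chunk = max_payload - 2                        # room left after the 2 header bytes
    if chunk <= 0:
        raise ValueError("max_payload too small for the FU-A headers")
    body = nal[1:]  # assumption: the NAL header octet is carried only via the FU headers
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    if len(pieces) < 2:
        raise ValueError("FU-A is only meant for NAL units that need splitting")
    payloads = []
    for index, piece in enumerate(pieces):
        start = 1 if index == 0 else 0
        end = 1 if index == len(pieces) - 1 else 0
        fu_header = (start << 7) | (end << 6) | nal_type
        payloads.append(bytes([indicator, fu_header]) + piece)
    return payloads
```

A caller would pass the MTU minus the IP, UDP, and RTP header sizes as `max_payload`; that overhead is left out of the sketch.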
[30] Consequently, the present invention proposes a practical RTP packetization algorithm which can effectively load NAL units of an SVC in an RTP payload while maintaining the specification of the RTP payload format.
Disclosure of Invention
Technical Problem
[31] It is, therefore, an object of the present invention to provide a method for determining a packet type for RTP packetization in a procedure of applying RTP packetization to an SVC bitstream, and an RTP packetizing method and apparatus including an RTP packet generating method based on the packet type determining method.
[32] Other objects and advantages of the present invention can be understood by the following description, and become apparent with reference to the embodiments of the present invention. Also, it is obvious to those skilled in the art of the present invention that the objects and advantages of the present invention can be realized by the means as claimed and combinations thereof.
Technical Solution
[33] In accordance with an aspect of the present invention, there is provided a method for determining a packet type for a Scalable Video Coded (SVC) video bitstream, which includes the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and c) determining a Real-time Transport Protocol (RTP) packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
[34] The temporal and spatial hierarchy information derivation of step a) is performed using a combination of the related hierarchy values (TL, DID, and QL) between the layers in temporal, spatial, and SNR scalability, which are defined in the last octet of the scalable layer NAL unit headers.
[35] The encoding information type detection of step b) is carried out by analyzing the "NAL_unit_type" values of the NAL units belonging to the base layer and the scalable layers, which indicate the type of encoding information. The packet type determined in step c) is any one of the Single NAL Unit (SNU), Fragmentation Unit-A (FU-A), and Single-Time Aggregation Packet-A (STAP-A) types of a non-interleaved mode.
[36] In accordance with another aspect of the present invention, there is provided a packetizing method for an SVC video bitstream, which includes the steps of: a) determining a packet type for the SVC video bitstream; and b) fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the determined packet type and loading the fragments in RTP packets.
[37] In accordance with another aspect of the present invention, there is provided an apparatus for packetizing an SVC video bitstream, which includes: a packet type determiner for determining a packet type for the SVC video bitstream; and a packet generator for generating a packet by fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the packet type determined by the packet type determiner and loading the fragments in RTP packets.
Advantageous Effects
[38] As mentioned above and as will be described below, the present invention can efficiently determine the packet type for an SVC bitstream and perform RTP packetization using the same.
[39] As a result, the present invention can more efficiently transmit an SVC video bitstream through an IP network such as the internet.
Brief Description of the Drawings
[40] Fig. 1 is a table showing the packet types supported by each RTP packetization mode.
[41] Fig. 2 is a table summarizing contents contained in NAL units by NAL_unit_types.
[42] Fig. 3 is an explanatory view showing an RTP payload format structure for a STAP-A type.
[43] Fig. 4 is an explanatory view showing an RTP payload format structure for an FU-A type.
[44] Figs. 5 and 6 are explanatory views showing the header structures of NAL units used in a base layer and scalable layers of an SVC in accordance with the present invention.
[45] Fig. 7 is an explanatory view showing a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.
[46] Fig. 8 is an explanatory view of the encoding order of SVC screens and of NAL units of the base layer and scalable layers belonging to each screen in accordance with the present invention.
[47] Fig. 9 is a detailed flow chart illustrating an RTP packetizing method in accordance with a preferred embodiment of the present invention.
[48] Fig. 10 is a block diagram illustrating the structure of an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.
Best Mode for Carrying Out the Invention
[49] The advantages, features and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter. Thus, the present invention will be easily practiced by those skilled in the art. Further, in the following description, well-known arts will not be described in detail if it seems that they could obscure the invention in unnecessary detail. Hereinafter, specific embodiments of the present invention will be set forth in detail with reference to the accompanying drawings.
[50] Figs. 5 and 6 are explanatory views showing header structures of NAL units used in the base layer and scalable layers of an SVC in accordance with the present invention.
[51] Encoding information generated by SVC encoding is stored in a bit stream in NAL units. As shown in Figs. 5 and 6, the header structure of the NAL unit generated in the base layer and that of the NAL unit generated in the scalable layers are different from each other.
[52] Fig. 5 shows the header structure of the NAL unit generated in the base layer compatible with H.264, and Fig. 6 depicts the header structure of the NAL unit generated in the scalable layers.
[53] In Figs. 5 and 6, the "Type" field means "NAL_unit_type" representing information on contents of encoding information contained in the NAL unit, and shows the type of encoding information contained in the NAL unit for each of the aforementioned "NAL_unit_types" of Fig. 2.
[54] The "NAL_unit_types" of Nos. 1 to 6 are usable in the NAL units of the base layer, while the "NAL_unit_types" of Nos. 20 and 21 are usable in the scalable layers. The other "NAL_unit_types" are used to indicate NAL units that contain additional information rather than encoding information.
[55] In the present invention, temporal and spatial hierarchy for each of the NAL units can be derived from (TL, DID, and QL) field information defined in the header of each NAL unit of the scalable layers.
[56] The (TL, DID, QL) field, which is the last octet in Fig. 6, represents the inter-layer hierarchy in temporal, spatial, and SNR scalability. That is, TL (temporal_level) represents the hierarchy between temporal layers for temporal scalability, DID (dependency_id) indicates the dependency hierarchy between higher and lower scalable layers in the inter-layer prediction of spatial scalability, and QL (quality_level) represents the hierarchy between FGS layers for support of SNR scalability.
[57] The TL, DID, and QL values are all integers greater than or equal to "0", and the temporal and spatial hierarchy of the NAL units can be derived from a combination of these values.
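Reading the (TL, DID, QL) values out of the last header octet might look like the sketch below. The bit widths are an assumption: the description only says that the three fields share the last octet, and the layout used here (3 bits TL, 3 bits DID, 2 bits QL, most significant bits first) is the one recalled from the SVC draft NAL unit header, not something this document states. The names `Hierarchy`, `parse_hierarchy`, and `pack_hierarchy` are hypothetical.

```python
from typing import NamedTuple


class Hierarchy(NamedTuple):
    tl: int   # temporal_level
    did: int  # dependency_id
    ql: int   # quality_level


def parse_hierarchy(last_octet: int) -> Hierarchy:
    """Unpack (TL, DID, QL) from one octet, assuming a 3/3/2 bit layout."""
    if not 0 <= last_octet <= 0xFF:
        raise ValueError("expected a single octet")
    return Hierarchy(
        tl=(last_octet >> 5) & 0x7,
        did=(last_octet >> 2) & 0x7,
        ql=last_octet & 0x3,
    )


def pack_hierarchy(h: Hierarchy) -> int:
    """Inverse of parse_hierarchy under the same assumed layout."""
    if not (0 <= h.tl <= 7 and 0 <= h.did <= 7 and 0 <= h.ql <= 3):
        raise ValueError("field value out of range for the assumed layout")
    return (h.tl << 5) | (h.did << 2) | h.ql
```

Under this layout, a NAL unit of the spatial scalable layer with (TL, DID, QL) = (4, 1, 1) would carry the octet 0x85.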
[58] Based on the (TL, DID, QL) information and the "NAL_unit_types" thus analyzed, a practical RTP packetization algorithm that can be effectively applied to combined scalability of the SVC is proposed.
[59] Fig. 7 is a view showing an example of a screen and hierarchical structure for combined scalability encoding of the SVC in accordance with the present invention.
[60] In Fig. 7, only an Instantaneous Decoding Refresh (IDR) screen, which is the start part of an SVC stream, and the screens of the first Group Of Pictures (GOP) are shown. One GOP consists of 16 screens, and the other GOPs not shown in the drawing also have a GOP size of 16.
[61] The screen resolution that can be supported by the base layer is QCIF, and the screen resolution that can be supported by the spatial scalable layer is CIF.
[62] In order to indicate different resolutions in different spatial layers, the DID value in the (TL, DID, QL) field of Fig. 6 is used.
[63] That is, in Fig. 7, a NAL unit with DID=0 belongs to a screen with a resolution of QCIF, and a NAL unit with DID=1 belongs to a screen with a resolution of CIF.
[64] A hierarchical B-picture approach is applied to provide temporal scalability, and the TL value in the (TL, DID, QL) field is used to indicate the supportable frame rate.
[65] In Fig. 7, the TL value is displayed in the middle of each screen, which is drawn as a rectangle. When only key pictures with TL=0 are transmitted, a frame rate of up to 1.875 fps can be supported, and when B-pictures with TL=1 are also transmitted, a frame rate of up to 3.75 fps can be supported.
[66] When B-pictures with TL=2 are additionally transmitted, a frame rate of up to 7.5 fps can be supported, and when B-pictures with TL=3 and TL=4 are additionally transmitted, frame rates of up to 15 fps and 30 fps can be supported, respectively.
[67] In Fig. 7, as the maximum TL value in the base layer is 3, a frame rate of up to 15 fps can be supported at QCIF resolution, and as the maximum TL value in the spatial scalable layer is 4, a frame rate of up to 30 fps can be supported at CIF resolution.
[68] If the screens at the same point of time belonging to the base layer and the spatial scalable layer have the same TL value, inter-layer prediction encoding is executed in the direction of the arrows indicated by dotted lines in the drawing. The base layer screen of QCIF resolution where DID=0 is upsampled and used for the prediction encoding of the scalable layer screen of CIF resolution where DID=1.
[69] Meanwhile, in Fig. 7, since each screen generates one FGS layer for support of SNR scalability, the NAL units containing encoding information of each FGS layer are all set to QL=1.
[70] Fig. 8 sequentially shows the encoding order of the screens, and the "NAL_unit_types" and (TL, DID, QL) field information for the NAL units of the base layer and scalable layers belonging to each screen, in a case where SVC combined scalability encoding is applied to the screen and layer structure of Fig. 7.
[71] Referring to the drawing, it can be seen that the encoding of an IDR picture occurs first. One base layer NAL unit having the header structure of Fig. 5 is generated in the base layer, and three scalable layer NAL units having the header structure of Fig. 6 are generated in the scalable layers.
[72] The three NAL units generated in the scalable layers include one NAL unit for FGS scaling for the base layer, one NAL unit for the spatial scalable layer, and one NAL unit for FGS scaling for the spatial scalable layer.
[73] The NAL unit generated first for the IDR picture is that of the base layer, and the header of the NAL unit conforms to the structure of (a) of Fig. 5. Because the base layer is an IDR picture, it can be seen from Fig. 8 that the "NAL_unit_type" of the header is set to "5" in accordance with Fig. 2 described above.
[74] The NAL unit generated second for the IDR picture is the one for FGS scaling for the base layer. As the "NAL_unit_type" is set to "21" in accordance with Fig. 2, and QL is set to 1 (QL=1), (TL, DID, QL) becomes (0,0,1).
[75] The NAL unit generated third for the IDR picture is the one for the spatial scalable layer. As the "NAL_unit_type" is set to "21" and DID is set to 1 (DID=1), (TL, DID, QL) becomes (0,1,0).
[76] The NAL unit generated last for the IDR picture is the one for FGS scaling for the spatial scalable layer. As the "NAL_unit_type" is set to "21" and QL is set to 1 (QL=1), (TL, DID, QL) becomes (0,1,1).
[77] When the encoding of the IDR picture is finished as above, the screen with screen number 16, which is an I- or P-picture, is encoded. Because this picture is a non-IDR picture, "NAL_unit_type" is set to "1" in the base layer and to "20" in the scalable layers.
[78] After completion of the encoding of screen number 16, the screen with screen number 8 is encoded. The TL values for the four NAL units generated at this time are all set to "1", thereby making it possible to support the frame rate of 3.75 fps.
[79] Next, as the screens with screen numbers 4 and 12 are set to TL=2, the frame rate of 7.5 fps can be supported.
[80] For screen numbers 2, 6, 10, and 14, all four generated NAL units are set to TL=3 in order to support the frame rate of 15 fps.
[81] Meanwhile, screen numbers 1, 3, 5, 7, 9, 11, 13, and 15 are encoded only in the spatial scalable layer, so that 30 fps is supported only at CIF resolution.
[82] As shown in Fig. 8, for these screens there is no NAL unit belonging to the base layer; only two NAL units belonging to the scalable layers exist.
[83] To support 30 fps, TL is set to 4 (TL=4), and since all the NAL units belong to the spatial scalable layer, they are commonly set to DID=1.
[84] The NAL units for the spatial scalable layer are set to QL=0, and the NAL units for FGS scaling for the spatial scalable layer are set to QL=1.
[85] As shown in Fig. 8, for the combined scalability of the SVC, analyzing the "NAL_unit_types" and (TL, DID, QL) field of the NAL units makes it possible to detect the type of encoding information contained in the NAL units through the "NAL_unit_type" values and to derive the temporal and spatial hierarchy between the NAL units through the (TL, DID, QL) values.
[86] Such information is useful for designing an RTP packetization scheme that cuts an SVC stream to a proper size and loads it in RTP packets.
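The temporal levels and frame rates quoted for the Fig. 7 example (GOP size 16, full rate 30 fps) follow a dyadic pattern that can be checked with a short sketch. The closed-form expressions below are a derivation consistent with the values in the description, not formulas given in the source, and the function names are hypothetical.

```python
GOP_SIZE = 16           # screens per GOP in the Fig. 7 example
FULL_FRAME_RATE = 30.0  # fps when every temporal level is transmitted
MAX_TL = 4              # log2(GOP_SIZE): highest temporal level in the example


def temporal_level(screen_number: int) -> int:
    """TL of a screen within one GOP (screen numbers 1..16)."""
    if not 1 <= screen_number <= GOP_SIZE:
        raise ValueError("screen number outside the GOP")
    # Number of trailing zero bits: 16 -> 4, 8 -> 3, 4 and 12 -> 2, odd -> 0.
    trailing_zeros = (screen_number & -screen_number).bit_length() - 1
    return MAX_TL - trailing_zeros


def frame_rate(max_tl: int) -> float:
    """Frame rate supported when all screens with TL <= max_tl are transmitted."""
    if not 0 <= max_tl <= MAX_TL:
        raise ValueError("temporal level out of range")
    return FULL_FRAME_RATE / 2 ** (MAX_TL - max_tl)
```

Each additional temporal level doubles the number of transmitted screens per GOP, which is why the supported frame rate doubles from 1.875 fps at TL=0 up to 30 fps at TL=4.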
[87] In the non-interleaved mode, three packet types, namely SNU, FU-A, and STAP-A, are supported, as shown in Fig. 1.
[88] Generally, the NAL units belonging to the base layer have a higher transmission priority than the NAL units belonging to the scalable layers, and are made robust against errors through channel encoding, separately from scalable layer information. Therefore, the NAL units of the base layer are not mixed with the NAL units of the scalable layers in an RTP packet, but are loaded independently in an RTP packet.
[89] Accordingly, the STAP-A packet type that can be aggregated with the NAL units of the scalable layers is not applied to the NAL units of the base layer, but either SNU or FU-A is selected and loaded in an RTP packet by considering the length of the NAL units.
[90] All three packet types, SNU, FU-A, and STAP-A, are applied to the NAL units belonging to the scalable layers.
[91] Among them, SNU and FU-A are selectively applied depending on the length of the NAL units, and STAP-A is applied in such a manner that several NAL units of the scalable layers belonging to the same screen number are aggregated into one, within a range that does not exceed the MTU size, and loaded in an RTP packet.
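The per-layer rule above can be written down as a minimal sketch. It assumes an MTU of 1500 bytes, the typical internet value quoted in this description, and ignores the IP, UDP, and RTP header overhead; both function names are hypothetical.

```python
MTU_BYTES = 1500  # typical internet MTU quoted in the description


def candidate_types(is_base_layer: bool):
    """Packet types that may be applied to a NAL unit of the given layer."""
    if is_base_layer:
        return ("SNU", "FU-A")        # base layer units are never aggregated
    return ("SNU", "FU-A", "STAP-A")  # scalable layer units may also be aggregated


def single_unit_type(nal_size: int, mtu: int = MTU_BYTES) -> str:
    """Choose between SNU and FU-A from the NAL unit length alone."""
    return "FU-A" if nal_size > mtu else "SNU"
```

Whether a scalable layer NAL unit actually ends up in a STAP-A packet depends on its neighbors in the bitstream, which is what the look-ahead conditions decide.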
[92] Hereinafter, an algorithm based on a look-ahead scheme for identifying scalable layer NAL units, to which the STAP-A type is to be applied, will be described.
[93] The (TL, DID, QL) information of NU_i, which is the NAL unit being input to the loop of the present algorithm, is denoted by (T_i, D_i, Q_i). The next NAL unit, which is analyzed one step in advance by the look-ahead scheme, is denoted by NU_{i+1}, and the (TL, DID, QL) information of NU_{i+1} is denoted by (T_{i+1}, D_{i+1}, Q_{i+1}).
[94] In order to determine whether to apply the STAP-A type, the (T_{i+1}, D_{i+1}, Q_{i+1}) information of NU_{i+1} is extracted in advance and compared. The conditions that should be satisfied in sequence in order to aggregate NU_i and NU_{i+1} and add them to one RTP payload are as follows:
[95] i. NU_{i+1} should not be a NAL unit belonging to the base layer.
[96] ii. NU_{i+1} should have the same TL value as NU_i.
[97] iii. The sum of the size of the NAL units accumulated up to NU_i in an RTP payload plus the size of NU_{i+1} should be smaller than the size of an MTU (in case of the internet, the general size of the MTU is 1500 bytes). If an RTP packet greater than the MTU is transmitted, the RTP packet is fragmented into several packets by the fragmentation function of a router or gateway during transmission through a network, thereby burdening the network and the client.
[98] iv. The following conditions should be satisfied depending on the magnitude relation between Q_i and Q_{i+1}:
[99] (a) If Q_{i+1} > Q_i, this means that the quality level of a FGS layer increases. This occurs only between NAL units belonging to the same screen number, and thus the condition of STAP-A is satisfied. Therefore, NU_{i+1} and NU_i can be loaded together in an RTP payload.
[100] (b) If Q_{i+1} = Q_i, this means that the quality level of a FGS layer does not increase. This situation can be divided into the case of D_{i+1} > D_i and the case of D_{i+1} = D_i. The case of D_{i+1} > D_i occurs only between NAL units that exist within the same screen number, and thus NU_i and NU_{i+1} can be targets of STAP-A. However, the case of D_{i+1} = D_i occurs between NAL units having different screen numbers, i.e., different presentation time instants, and thus NU_{i+1} and NU_i cannot be targets of STAP-A.
[101] In conclusion, in order to perform RTP packetization in the STAP-A type, NU_i and NU_{i+1} should sequentially satisfy all of i, ii, iii, and iv-(a) among the above conditions, or should sequentially satisfy all of i, ii, iii, and iv-(b) with D_{i+1} > D_i.
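Conditions i to iv can be collected into a single predicate, sketched below under stated assumptions. The `NalUnit` record and `can_aggregate` are hypothetical names; the check that NU_i itself is not a base layer unit is added here to reflect the rule that base layer units are loaded independently; the STAP-A header and size-field overhead is ignored as in the description; and a transition where the quality level decreases, which the conditions as written do not cover, is treated as not aggregable.

```python
from dataclasses import dataclass


@dataclass
class NalUnit:
    size: int            # NAL unit size in bytes
    is_base_layer: bool  # True for base layer units (no TL/DID/QL octet)
    tl: int = 0          # temporal_level
    did: int = 0         # dependency_id
    ql: int = 0          # quality_level


def can_aggregate(cur: NalUnit, nxt: NalUnit, accumulated: int, mtu: int = 1500) -> bool:
    """Decide whether nxt (NU_{i+1}) may join the STAP-A payload ending with cur (NU_i).

    accumulated is the total size already in the payload, including cur.
    """
    if cur.is_base_layer:
        return False              # assumption: base layer units are never mixed
    if nxt.is_base_layer:
        return False              # condition i
    if nxt.tl != cur.tl:
        return False              # condition ii
    if accumulated + nxt.size >= mtu:
        return False              # condition iii: sum must stay smaller than the MTU
    if nxt.ql > cur.ql:
        return True               # condition iv-(a): FGS quality level increases
    if nxt.ql == cur.ql:
        return nxt.did > cur.did  # condition iv-(b): only a DID increase qualifies
    return False                  # quality level decreases: not covered, assumed False
```

For instance, a spatial scalable layer unit (0,1,0) followed by its FGS unit (0,1,1) passes via iv-(a), while two consecutive units with (4,1,1) fail iv-(b) because they belong to different screens.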
[102] Fig. 9 shows a flowchart of the algorithm proposed to perform RTP packetization by determining the SNU, FU-A, and STAP-A packet types based on the above conditions for determining the packet type as STAP-A.
[103] The flowchart shown in Fig. 9 is performed for every GOP unit, and the RTP packet type is determined based on NUType_i, which is the "NAL_unit_type" of each of the NAL units existing within one GOP, and (T_i, D_i, Q_i), which is the (TL, DID, QL) information, as explained in Fig. 8.
[104] Conditions i, ii, iii, iv-(a), and iv-(b) for determining the packet type as STAP-A are indicated on the corresponding blocks of Fig. 9, respectively.
[105] In Fig. 9, N indicates that the packetizing process currently in progress is loading the N-th NAL unit in an RTP payload. The algorithm shown in the drawing operates in the look-ahead scheme of examining NU_{i+1} in advance in order to check the STAP-A type conditions. Therefore, if the packet type is determined as STAP-A when N=1, NU_i and NU_{i+1} are loaded together in the RTP payload, while if the packet type is determined as STAP-A when N>1, only NU_{i+1} is loaded in the RTP payload.
[106] If the packet type is not determined as STAP-A when N=1, the packet type is determined as SNU or FU-A by checking whether the size of NU_i exceeds that of the MTU. On the other hand, if the packet type is not determined as STAP-A when N>1, the N NAL units accumulated in the RTP payload so far are loaded and transmitted in one RTP packet, and then parameters N and I are updated to generate a new RTP packet, after which the entire process is repeated.
[107] Herein, parameters I and J indicate the start position and end position of the N NAL units to be loaded in the RTP payload. Meanwhile, S_i means the size of NU_i, and P_i means the total size accumulated in the RTP payload including NU_i, and is used to check whether the total size accumulated in the RTP payload exceeds that of the MTU.
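The loop described for Fig. 9 can be approximated by the following sketch, which is reconstructed from the prose only, since the flowchart itself is not reproduced here. `Unit`, `can_aggregate`, and `packetize` are hypothetical names; `start` and `end` play the roles of parameters I and J, and `payload` the role of P_i. The predicate repeats conditions i to iv in compact form so that the block is self-contained, with the same assumptions: base layer units are never aggregated, header overhead is ignored, and a decreasing quality level is treated as not aggregable.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    size: int
    is_base_layer: bool
    tl: int = 0
    did: int = 0
    ql: int = 0


def can_aggregate(cur: Unit, nxt: Unit, payload: int, mtu: int) -> bool:
    """Compact form of conditions i to iv for the look-ahead check."""
    if cur.is_base_layer or nxt.is_base_layer or nxt.tl != cur.tl:
        return False
    if payload + nxt.size >= mtu:
        return False
    return nxt.ql > cur.ql or (nxt.ql == cur.ql and nxt.did > cur.did)


def packetize(units, mtu: int = 1500):
    """Return (packet_type, [unit indices]) tuples for the units of one GOP."""
    packets = []
    start = 0
    while start < len(units):
        end = start
        payload = units[start].size
        # Look ahead: extend the payload while the next unit meets the STAP-A conditions.
        while end + 1 < len(units) and can_aggregate(units[end], units[end + 1], payload, mtu):
            end += 1
            payload += units[end].size
        if end > start:
            packets.append(("STAP-A", list(range(start, end + 1))))
        elif units[start].size > mtu:
            packets.append(("FU-A", [start]))  # too large for one packet: fragment
        else:
            packets.append(("SNU", [start]))
        start = end + 1  # begin a new RTP packet after the last loaded unit
    return packets
```

Applied to the four NAL units of the IDR picture in Fig. 8, this sketch keeps the base layer unit and its FGS unit in separate SNU packets and aggregates the two spatial scalable layer units into one STAP-A packet, provided they fit within the MTU.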
[108] Fig. 10 is a block diagram illustrating an RTP packetizing apparatus for an SVC bitstream in accordance with another preferred embodiment of the present invention.
[109] Referring to Fig. 10, the inventive RTP packetizing apparatus 120 for an SVC bitstream includes a packet type determiner 130 for determining a packet type for an input SVC bitstream, and a packet generator 140 for generating an RTP packet by fragmenting the SVC bitstream so as to correspond to the packet type determined by the packet type determiner 130 and loading the fragments in an RTP packet.
[110] A detailed description of the functions of the packet type determiner 130 and the packet generator 140 is omitted here, as it is covered by the above description of Figs. 5 to 9.
[111] Reference numeral 110, not yet explained, denotes an SVC encoder which provides the SVC bitstream to the packet type determiner 130 by encoding an input video sequence.
[112] The method of the present invention as mentioned above may be implemented by a software program that is stored in a computer-readable storage medium such as CD-ROM, RAM, ROM, floppy disk, hard disk, optical magnetic disk, or the like. This procedure may be readily carried out by those skilled in the art; and therefore, details thereof are omitted here.
[113] The present application contains subject matter related to Korean Patent Application Nos. 2006-0110714 and 2006-0125144, filed in the Korean Intellectual Property Office on November 9, 2006, and December 8, 2006, the entire contents of which are incorporated herein by reference.
[114] While the present invention has been described with respect to the specific embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.
Claims
[1] A method for determining a packet type for a Scalable Video Coded (SVC) video bitstream, comprising the steps of: a) deriving temporal and spatial hierarchy information between Network Abstraction Layer (NAL) units from field information defined in the NAL unit headers of scalable layers; b) detecting the type of encoding information by applying combined scalability encoding to the hierarchical structure of the Scalable Video Coding (SVC); and c) determining a Real-time Transport Protocol (RTP) packet type for the corresponding SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
[2] The method of claim 1, wherein the step a) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.
[3] The method of claim 1, wherein the step b) is carried out by analyzing "NAL_unit_type" values of the NAL units belonging to the base layer and the scalable layers, which are the "NAL_unit_type" values indicating encoding information.
[4] The method of claim 1, wherein the packet type determined in the step c) is any one among Single NAL Unit (SNU), Fragmentation Unit-A (FU-A), and Single-Time Aggregation Packet-A (STAP-A) types of a non-interleaved mode.
[5] An RTP packetizing method for an SVC video bitstream, comprising the steps of: a) determining a packet type for the SVC video bitstream; and b) fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the determined packet type and loading the fragments in RTP packets.
[6] The method of claim 5, wherein the step a) includes the steps of: a1) deriving temporal and spatial hierarchy information between NAL units from field information defined in the NAL unit headers of scalable layers; a2) detecting a type of encoding information by applying combined scalability encoding to the hierarchical structure of the SVC; and a3) determining an RTP packet type for the SVC video bitstream by using the derived temporal and spatial hierarchy information between the NAL units and the detected type of encoding information.
[7] The method of claim 6, wherein the step a1) is performed by a combination of related hierarchy values (TL, DID, and QL) between the layers in the temporal and spatial SNR scalability defined in the last octet of the scalable layer NAL unit headers.
[8] The method of claim 6, wherein the step a2) is performed by analyzing "NAL_unit_type" values of the NAL units belonging to the base layer and the scalable layers, which are the "NAL_unit_type" values indicating encoding information.
[9] The method of claim 6, wherein the packet type determined in the step a3) is any one among SNU, FU-A, and STAP-A types of a non-interleaved mode.
[10] An apparatus for packetizing an SVC video bitstream, comprising: a packet type determiner for determining a packet type for the SVC video bitstream; and a packet generator for generating a packet by fragmenting the SVC video bitstream so as to conform the SVC video bitstream to the packet type determined by the packet type determiner and loading the fragments in RTP packets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/513,542 US8761203B2 (en) | 2006-11-09 | 2007-09-12 | Method for determining packet type for SVC video bitstream, and RTP packetizing apparatus and method using the same |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2006-0110714 | 2006-11-09 | ||
KR20060110714 | 2006-11-09 | ||
KR1020060125144A KR100776680B1 (en) | 2006-11-09 | 2006-12-08 | Method for packet type classification to svc coded video bitstream, and rtp packetization apparatus and method |
KR10-2006-0125144 | 2006-12-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008056878A1 (en) | 2008-05-15 |
Family
ID=39364668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2007/004413 WO2008056878A1 (en) | 2006-11-09 | 2007-09-12 | Method for determining packet type for svc video bitstream, and rtp packetizing apparatus and method using the same |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2008056878A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010108416A1 (en) * | 2009-03-24 | 2010-09-30 | 华为技术有限公司 | Method, device and communication system for forwarding scalable video coding data messages |
EP2509359A2 (en) * | 2009-12-01 | 2012-10-10 | Samsung Electronics Co., Ltd. | Method and apparatus for transmitting a multimedia data packet using cross-layer optimization |
CN101783935B (en) * | 2009-01-19 | 2013-07-24 | 财团法人工业技术研究院 | Method for encapsulating scalable video coding bitstreams |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1311125A2 (en) * | 2001-11-12 | 2003-05-14 | Sony Corporation | Data communication system and method, data transmission apparatus and method, data receiving apparatus, received-data processing method and computer program |
WO2004036916A1 (en) * | 2002-10-15 | 2004-04-29 | Koninklijke Philips Electronics N.V. | System and method for transmitting scalable coded video over an ip network |
EP1631088A1 (en) * | 2004-08-23 | 2006-03-01 | LG Electronics Inc. | Apparatus for transmitting video signal and method thereof |
- 2007-09-12: WO application PCT/KR2007/004413 (publication WO2008056878A1), status active, Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101783935B (en) * | 2009-01-19 | 2013-07-24 | 财团法人工业技术研究院 | Method for encapsulating scalable video coding bitstreams |
WO2010108416A1 (en) * | 2009-03-24 | 2010-09-30 | 华为技术有限公司 | Method, device and communication system for forwarding scalable video coding data messages |
EP2509359A2 (en) * | 2009-12-01 | 2012-10-10 | Samsung Electronics Co., Ltd. | Method and apparatus for transmitting a multimedia data packet using cross-layer optimization |
EP2509359A4 (en) * | 2009-12-01 | 2014-03-05 | Samsung Electronics Co Ltd | Method and apparatus for transmitting a multimedia data packet using cross-layer optimization |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US8761203B2 (en) | Method for determining packet type for SVC video bitstream, and RTP packetizing apparatus and method using the same | |
EP2919453B1 (en) | Video stream switching | |
AU2009211932B2 (en) | Method and device for reordering and multiplexing multimedia packets from multimedia streams pertaining to interrelated sessions | |
US9456209B2 (en) | Method of multiplexing H.264 elementary streams without timing information coded | |
US8798145B2 (en) | Methods for error concealment due to enhancement layer packet loss in scalable video coding (SVC) decoding | |
US20140133489A1 (en) | Method for transmitting packet-based media data having header in which overhead is minimized | |
JP2006087125A (en) | Method of encoding sequence of video frames, encoded bit stream, method of decoding image or sequence of images, use including transmission or reception of data, method of transmitting data, coding and/or decoding apparatus, computer program, system, and computer readable storage medium | |
KR20100071688A (en) | A streaming service system and method for universal video access based on scalable video coding | |
US20100046552A1 (en) | Time-stamping apparatus and method for rtp packetization of svc coded video, and rtp packetization system using the same | |
EP2404451B1 (en) | Processing of multimedia data | |
WO2008056878A1 (en) | Method for determining packet type for svc video bitstream, and rtp packetizing apparatus and method using the same | |
US8565083B2 (en) | Thinning of packet-switched video data | |
KR100849495B1 (en) | Method for producing bit-rate based on RTP packet mode | |
Seo et al. | A practical RTP packetization scheme for SVC video transport over IP networks | |
WO2007035151A1 (en) | Media stream scaling | |
Kordelas et al. | On the performance of H.264/MVC over lossy IP-based networks | |
Hannuksela et al. | Congestion-aware transmission rate control using medium grain scalability of scalable video coding | |
Klemets | RTP payload format for video codec 1 (VC-1) | |
Klemets | RFC 4425: RTP Payload Format for Video Codec 1 (VC-1) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 07808205; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | WIPO information: entry into national phase | Ref document number: 12513542; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 07808205; Country of ref document: EP; Kind code of ref document: A1 |