WO2017127816A1 - Omnidirectional video encoding and streaming - Google Patents
- Publication number
- WO2017127816A1 (PCT/US2017/014588)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- segments
- joining
- spherical
- encoding
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/698—Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/90—Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/08—Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division
- H04N7/0806—Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division the signals being two or more video signals
Definitions
- the current disclosure relates to encoding and streaming of video and in particular to encoding and streaming omnidirectional video.
- Omnidirectional video provides a 360° view of an environment.
- Omnidirectional video allows a viewer to view any desired portion of the 360° environment.
- Encoding omnidirectional video may use existing encoding techniques used for 2-dimensional (2D) video, by projecting the omnidirectional video from a sphere to one or more rectangles.
- FIG. 1 depicts projecting the omnidirectional video from a sphere 100 onto one or more rectangles 102, 104a, 104b, 104c, 104d, 104e, 104f using equirectangular projection and cubic projection. In both cases of equirectangular projection and cubic projection, the resulting 2D projections have wasted pixels.
- the area of the omnidirectional video is that of the sphere 100.
- if the omnidirectional video's sphere has a radius of r, the omnidirectional video covers an area of 4πr².
- in equirectangular projection, the sphere's area is projected onto a rectangle having an area of 2π²r², which is 157% of the area of the sphere.
- in cubic projection, the sphere's area is projected onto six squares having a combined area of 6πr², which is 150% of the area of the sphere. Accordingly, both projection techniques result in a relatively large amount of unnecessary information being encoded.
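The wasted-area figures above can be checked with simple arithmetic; the following sketch works on a unit sphere since the ratios are radius-independent:

```python
import math

r = 1.0                                        # unit sphere; ratios are radius-independent
sphere_area = 4 * math.pi * r ** 2             # 4*pi*r^2

# Equirectangular projection: a 2*pi*r wide by pi*r tall rectangle
erp_area = (2 * math.pi * r) * (math.pi * r)   # = 2*pi^2*r^2
erp_ratio = erp_area / sphere_area             # pi/2, about 157%

# Cubic projection with a combined face area of 6*pi*r^2, as stated above
cmp_area = 6 * math.pi * r ** 2
cmp_ratio = cmp_area / sphere_area             # exactly 150%
```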
- the present disclosure provides a new encoding method that uses a nearly equal-area projection.
- the encoding may also use ROI-targeted encoding to provide the encoded omnidirectional videos.
- the present disclosure provides adaptive streaming techniques for omnidirectional videos.
- the present disclosure further provides video capture devices and stitching techniques for capturing panoramic and omnidirectional video.
- a method of encoding omnidirectional video comprising: receiving spherical omnidirectional video data; segmenting a frame of the spherical omnidirectional video data into a north pole segment formed by mapping a top spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, a south pole segment formed by mapping a bottom spherical dome portion of the frame of the spherical omnidirectional video data onto a circle and at least one joining segment formed by mapping a spherical joining portion of the frame of the spherical omnidirectional video data joining the top spherical dome portion and the bottom spherical dome portion onto at least one rectangle; stacking the north pole segment, the south pole segment and the at least one joining segment together to form a 2-dimensional (2D) frame corresponding to the frame of the spherical omnidirectional video data; and encoding the 2D frame.
- the method further comprises segmenting a plurality of frames of the spherical omnidirectional video data into a plurality of north pole segments, south pole segments and joining segments; stacking the plurality of north pole segments, south pole segments and joining segments into a plurality of 2D frames; and encoding the plurality of 2D frames.
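As an illustration of the segmenting step, the following sketch splits one equirectangular source frame into pole and joining regions by pixel rows. The 45° cut latitude and the frame size are assumptions for this sketch, and the two cap regions would still need to be warped into circles:

```python
def segment_frame(width, height, cut_lat_deg=45):
    """Split one equirectangular source frame (width x height pixels)
    into north pole, joining and south pole regions, returned as pixel
    row ranges. cut_lat_deg is the latitude where the sphere is cut;
    the cap regions would subsequently be warped into circles."""
    cap_rows = int(height * (90 - cut_lat_deg) / 180)
    return {
        "north": (0, cap_rows),
        "joining": (cap_rows, height - cap_rows),
        "south": (height - cap_rows, height),
    }
```

For a hypothetical 4096x2048 frame cut at 45°, each cap covers 512 rows and the joining band the middle 1024 rows.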
- the at least one joining segment comprises a plurality of joining segments, each mapping a portion of the spherical joining portion onto a respective rectangle.
- the circular poles are placed within squares.
- each of the north pole segment, the south pole segment and the at least one joining segment comprise overlapping pixel data.
- the at least one joining segment comprises between 2 and 4 segments.
- the method further comprises tracking one or more regions of interest (ROI) before encoding the 2D frames.
- encoding the 2D frame comprises: encoding one or more view points into a first stream; and for each view point encoding additional streams comprising an intracoded frame stream, a predictive frame stream, and a bi-predictive frame stream.
- the method further comprises streaming at least one of the encoded view points.
- a system for encoding omnidirectional video comprising: a processor for executing instructions; and a memory for storing instructions, which when executed by the processor, configure the system to provide a method comprising: receiving spherical omnidirectional video data; segmenting a frame of the spherical omnidirectional video data into a north pole segment formed by mapping a top spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, a south pole segment formed by mapping a bottom spherical dome portion of the frame of the spherical omnidirectional video data onto a circle and at least one joining segment formed by mapping a spherical joining portion of the frame of the spherical omnidirectional video data joining the top spherical dome portion and the bottom spherical dome portion onto at least one rectangle; stacking the north pole segment, the south pole segment and the at least one joining segment together to form a 2-dimensional (2D) frame; and encoding the 2D frame.
- the instructions further configure the system to: segment a plurality of frames of the spherical omnidirectional video data into a plurality of north pole segments, south pole segments and joining segments; stack the plurality of north pole segments, south pole segments and joining segments into a plurality of 2D frames; and encode the plurality of 2D frames.
- the at least one joining segment comprises a plurality of joining segments, each mapping a portion of the spherical joining portion onto a respective rectangle.
- each of the north pole segment, the south pole segment and the at least one joining segment comprise overlapping pixel data.
- the at least one joining segment comprises between 2 and 4 segments.
- the instructions further configure the system to track one or more regions of interest (ROI) before encoding the 2D frames.
- encoding the 2D frame comprises: encoding one or more view points into a first stream; and for each view point encoding additional streams comprising an intracoded frame stream, a predictive frame stream, and a bi-predictive frame stream.
- the instructions further configure the system to stream at least one of the encoded view points.
- a device for use in capturing panoramic video comprising: a frame for holding a mobile device; a first fisheye lens mounted on the frame and arranged to be located over a front facing camera of the mobile device when the mobile device is held by the frame; and a second fisheye lens mounted on the frame and arranged to be located over a rear facing camera of the mobile device when the mobile device is held by the frame.
- a method of stitching multiple videos captured from one or more mobile devices comprising: generating a stitching template for each camera capturing the videos; synchronizing frames of the captured video using timestamps of the frames; remapping the multiple videos onto a sphere using the stitching template; and blending the remapped images to provide a panoramic video.
- FIG. 1 depicts equirectangular projection and cubic projection of a sphere
- FIGs. 2A and 2B depict segmented sphere projection (SSP) of a sphere
- FIG. 3 depicts stacking of segments from a segmented sphere projection
- FIG. 4 is a graph of a ratio of segmented area to the spherical area based on the number of segments for both circular pole and square pole segments;
- FIG. 5 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for circular pole segments;
- FIG. 6 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for square pole segments;
- FIG. 7 depicts segmented sphere projection using a single equatorial tile segment and square poles;
- FIG. 8 depicts segmented sphere projection using multiple equatorial tile segments and square poles
- FIG. 9 depicts segmented sphere projection using equally sized equatorial tile segments and square poles
- FIG. 10 depicts stacking of overlapping tile segments
- FIG. 11 depicts further stacking of overlapping tile segments
- FIG. 12 depicts further stacking of overlapping tile segments
- FIG. 13 depicts stacking of overlapping tile segments for stereoscopic omnidirectional video
- FIG. 14 depicts region of interest (ROI) encoding
- FIG. 15 depicts further ROI encoding
- FIG. 16 depicts an ROI heat map
- FIG. 17 depicts ROI temporal encoding
- FIG. 18 depicts view point encoding of omnidirectional video
- FIG. 19 depicts view point encoding of a view of omnidirectional video for adaptive view point streaming
- FIG. 20 depicts adaptive view point streaming of omnidirectional video
- FIG. 21 depicts a system for encoding and streaming omnidirectional video
- FIGs. 22A, 22B and 22C depict devices for capturing panoramic and/or omnidirectional video
- FIGs. 23A, 23B, and 23C depict stitching video together; and
- FIGs. 24A and 24B depict brightness mapping.
DETAILED DESCRIPTION
- Omnidirectional video can be encoded using regular encoding techniques by first projecting the video from a sphere to 2-dimensional (2D) tiles.
- Segmented sphere projection projects the spherical video from a top dome or cap portion of the sphere, a bottom dome or cap portion of the sphere and a middle equatorial portion of the sphere joining the top and bottom cap portions.
- the top and bottom cap segments may be mapped to circular tiles or to a circular portion of a square tile.
- the equatorial portion of the sphere may be mapped to one or more rectangular tiles.
- the tiles may then be stacked together into a single frame for subsequent encoding.
- the total area of tiles resulting from SSP may be smaller than the total area resulting from either equirectangular projection or cubic projection.
- the tile area for SSP is close to that of the area of the sphere of the omnidirectional video.
- the encoding efficiency of omnidirectional video may be further improved by encoding particular region of interest (ROI) portions of the omnidirectional video with a higher bitrate while encoding non-ROI portions of the omnidirectional video using a lower bitrate.
- FIGs. 2A and 2B depict segmented sphere projection (SSP) of a sphere.
- the sphere is segmented and mapped to tiles using an improved projection based on Sinusoidal projection.
- the sphere 200 is cut along its latitude into several segments including a north pole segment 202a, a south pole segment 202b (referred to collectively as pole segments 202) and one or more equatorial joining segments 204a-f (referred to collectively as joining segments 204) between the two poles.
- the segments may then be mapped to tiles, and in particular the pole segments may be mapped to circular tiles 206a, 206b and the joining segments 204 may be mapped to rectangular tiles 208a-208f.
- the number of joining segments can vary.
- the sphere 200 may be cut into two pole segments 210a, 210b and 3 equatorial joining segments 212a, 212b, 212c.
- Each of the two pole segments 210a, 210b are mapped to respective circles contained by squares 214a, 214b and the joining segments 212a-c are mapped to respective rectangles 216a-c.
- the individual tiles may overlap with each other a certain amount in order to maintain video quality during further processing. Once segmented into tiles, the individual tiles may be stacked together to form a frame that may be encoded using various encoding techniques.
- FIG. 3 depicts stacking of segments from a segmented sphere projection.
- individual joining segment tiles 304a-c may be stacked together with the square pole tiles 302a, 302b and arranged in a rectangular frame 300.
- the rectangular frame 300 may then be encoded using, for example, an H.264 encoder, although other encoding techniques may be used.
- FIG. 4 is a graph of a ratio of segmented area to the spherical area based on the number of segments for both circular pole and square pole segments. As depicted in the graph of FIG. 4, as the number of equatorial joining segments increases, the total area of the segmented tiles approaches the area of the sphere. As can be seen from FIG. 4, the segmented tile area is greater when using square poles compared to circular poles. As the number of segmented tiles increases, the tile-pole segmentation latitude, that is, the latitude where the sphere is cut to form the segments, is pushed toward the poles.
- FIG. 5 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for circular pole segments.
- FIG. 6 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for square pole segments.
- Table 1 shows segmentation latitudes for varying amounts of segment overlap, varying number of joining segments, and the use of circular or square pole tiles.
- the ratio of tile area to spherical area when segmenting each hemisphere of the sphere into 1 segment and 1 pole may be described by: min over θ of ( θ + (π/2 − θ)²/2 ), where θ is the tile-pole segmentation latitude.
- where there are multiple joining segments, θ1 is the tile-tile segmentation latitude and θ2 is the tile-pole segmentation latitude.
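A numeric sketch of finding the best segmentation latitude, under the assumption that a pole cap maps to a circle whose radius equals its arc length from the pole and the joining band maps equirectangularly (plausible readings of the projection described here, not a quote of the patent's own formula):

```python
import math

def tile_to_sphere_ratio(theta):
    """Ratio of tile area to hemisphere area for one circular pole tile
    plus one joining band, cut at latitude theta (radians), on a unit
    sphere: the cap of angular radius (pi/2 - theta) becomes a circle of
    that arc-length radius, the band [0, theta] a 2*pi-wide rectangle."""
    cap = math.pi * (math.pi / 2 - theta) ** 2
    band = 2 * math.pi * theta
    return (cap + band) / (2 * math.pi)        # hemisphere area = 2*pi

# Sample latitudes between the equator and the pole and pick the minimum
thetas = [i * (math.pi / 2) / 10000 for i in range(10001)]
best_theta = min(thetas, key=tile_to_sphere_ratio)
# Analytically the minimum falls at theta = pi/2 - 1 (about 32.7 degrees)
```

At theta = pi/2 (no pole tiles) the ratio recovers the equirectangular 157%, which is a useful sanity check on the assumptions.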
- Table 2 provides examples of coding levels and the corresponding HEVC-supported resolution, equivalent equirectangular resolution, and equivalent resolution displayed to a single eye (90° FOV) when using different tiles.
- FIG. 7 depicts segmented sphere projection using a single equatorial tile segment and square poles.
- a sphere 700 may be segmented into north pole 702 and south pole 704 joined by a joining segment 706.
- the poles 702, 704 are mapped to circles within squares 708, 710 and the joining segment 706 is mapped to a rectangle 712.
- the pole tiles 708, 710 and the equator tile 712 can be vertically stacked to form a frame for encoding.
- FIG. 8 depicts segmented sphere projection using multiple equatorial tile segments and square poles. As depicted, a sphere 800 may be segmented into two pole segments and a plurality of equatorial joining segments, which are mapped to square pole tiles and multiple rectangular tiles.
- the layout of the tiles may be vertically arranged when forming a frame as shown in FIGs. 7 and 8.
- the formulas for the SSP are shown below.
- Equation (3) indicates how to map a point on the cap (θ′, φ′) into a point in the circle (x′, y′). Note that the sign differs between the north and south poles.
- Equation (4) indicates how to map the equator to the middle rectangle. It uses the same equation as Equirectangular Projection (ERP) to convert the equator area into the rectangle.
- Equations (5) and (6) indicate how to map the rest of the segments to rectangles. It also uses the same equation as Equirectangular Projection (ERP) to map to rectangles.
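The cap-to-circle and equator mappings can be sketched as follows. Equations (3) to (6) themselves are not reproduced in this text, so the exact formulas below are assumptions consistent with the description: an arc-length radius for the caps, with a sign flip between poles, and plain ERP for the equator:

```python
import math

def cap_to_circle(theta_p, phi, north=True):
    """Map a point on a pole cap (angular distance theta_p from the pole,
    longitude phi) to circle coordinates (x', y') on a unit sphere. The
    circle radius equals the arc length from the pole, and the y' sign
    flips between the north and south poles."""
    rho = theta_p
    x = rho * math.cos(phi)
    y = rho * math.sin(phi) * (1.0 if north else -1.0)
    return x, y

def equator_to_rect(lat, lon):
    """Equirectangular (ERP) mapping of the joining band on a unit
    sphere: longitude maps linearly to x, latitude to y."""
    return lon, lat
```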
- FIG. 9 depicts segmented sphere projection using equally sized equatorial tile segments and square poles.
- the projection depicted in FIG. 9 may be similar to that depicted in FIG. 7; however, rather than mapping the single joining segment to a single rectangle, the projection depicted in FIG. 9 breaks the single rectangle into 4 squares. That is, the sphere 900 is segmented into two poles 902, 904 and a joining segment 906 and mapped to circles on squares 908, 910 and squares 912a-d.
- the Segmented Sphere Projection (SSP) of FIG. 9 segments the sphere into 3 tiles: north pole, equator and south pole. The boundaries of the 3 segments are at 45°N and 45°S.
- the north and south pole are mapped into 2 circles, and the projection of the equatorial segment is the same as ERP.
- the diameter of the circle is equal to the height of the equatorial segment since both the pole segments and the equatorial segment have a 90° latitude span.
- the segment tiles may be packed together to form a frame for encoding.
- the packing process attempts to put each region of the SSP segments into one 2D image with the least wasted area.
- in the first type, depicted in FIG. 10, the two circles on squares 1002 are placed vertically on top of the rectangles 1004. The circles are centered horizontally on the center of the equator rectangle, and all the other rectangles are centered vertically on the center of the equator rectangle.
- in the second type, depicted in FIG. 11, two circles 1102 are placed horizontally on top of the rectangles 1104. The circles are also centered horizontally on the center of the equator rectangle, and all the other rectangles are also centered vertically on the center of the equator rectangle.
- in the third type, depicted in FIG. 12, two circles 1202 are placed on the left side and the right side of the equator rectangle 1204.
- the highest point of the circle is as high as the top edge of the rectangle of the equator. All the other rectangles are placed so that the bottom edges of all rectangles are at the same height.
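The three packing layouts differ only in where the bounding boxes land, so the resulting frame sizes can be sketched directly. The layout names and the rule for each are assumptions drawn from the descriptions above, not terms from the specification:

```python
def frame_size(layout, circle_d, rect_w, rect_h):
    """Bounding box (width, height) in pixels of the packed 2D frame for
    the three packing types described above.
    'stacked'      - FIG. 10: the two circles one above the other, above
                     the equator rectangle
    'side_by_side' - FIG. 11: the two circles in a row above the equator
                     rectangle
    'flanking'     - FIG. 12: the circles to the left and right of the
                     equator rectangle"""
    if layout == "stacked":
        return max(circle_d, rect_w), 2 * circle_d + rect_h
    if layout == "side_by_side":
        return max(2 * circle_d, rect_w), circle_d + rect_h
    if layout == "flanking":
        return 2 * circle_d + rect_w, max(circle_d, rect_h)
    raise ValueError(layout)
```

With hypothetical 512-pixel circles and a 2048x512 equator rectangle, the flanking layout yields the shortest but widest frame.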
- FIG. 13 depicts stacking of overlapping tile segments for stereoscopic omnidirectional video.
- stereoscopic video there are two views.
- the segmented tiles of each of the views 1302, 1304 are packed side by side.
- FIG. 13 shows a layout of a 1-tile, 1-pole SSP that supports stereoscopic video.
- an SSP Video Information box is defined, containing fields including an unsigned int(8) geometry_type.
- is_stereoscopic indicates whether stereoscopic media rendering is used or not.
- the value of this field is equal to 1 to indicate that the video in the referenced track is divided into two parts to provide different texture data for left eye and right eye separately according to the composition type specified by stereoscopic_type.
- geometry_type indicates the type of geometry for rendering of omnidirectional media. It may be GEOMETRY_ERP indicating that an
- GEOMETRY_CMP indicating a cube map projection
- GEOMETRY_SSP indicating a segmented sphere projection
- stereoscopic_type indicates the type of composition for the stereoscopic video in the referenced track.
- ssp_theta_num indicates how many θ values are used. The total number of SSP segments, including the north pole and south pole, will then be 2 × ssp_theta_num + 1; the default value is 1.
- ssp_theta_id indicates the identifier of the theta.
- ssp_theta contains θ values in degrees, ranging from 0 to 180. The default value is 45.
- ssp_overlap_width indicates the width, in pixels, of the overlap.
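A hypothetical serialization of the box, using the field names from the text above; the byte layout, the field widths beyond geometry_type, and the GEOMETRY_SSP enum value are all assumptions for illustration:

```python
import struct

GEOMETRY_SSP = 2  # hypothetical enum value; the text defines only the name

def pack_ssp_info(is_stereoscopic, stereoscopic_type, thetas, overlap_width):
    """Serialize an illustrative SSP Video Information box. Field names
    follow the text above; the byte layout and widths are guesses."""
    buf = struct.pack("<BBBB", GEOMETRY_SSP,
                      1 if is_stereoscopic else 0,
                      stereoscopic_type,
                      len(thetas))                      # ssp_theta_num
    for theta_id, theta_deg in enumerate(thetas):
        buf += struct.pack("<BH", theta_id, theta_deg)  # ssp_theta_id, ssp_theta
    buf += struct.pack("<H", overlap_width)             # ssp_overlap_width
    return buf
```

For example, pack_ssp_info(True, 0, [45], 16) yields a 9-byte box for the default single-theta configuration.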
- FIG. 14 depicts region of interest (ROI) encoding.
- an ROI target encoding process 1400 uses ROI information 1406, which may comprise a mask 1408 specifying the ROI portion of the raw video 1402 being encoded.
- the raw video 1402 is depicted as a video frame 1404 having a person and a tree, with the person being the ROI.
- the raw video 1402 and the ROI information 1406 may be used to lower the encoding quality of the non-ROI areas of the raw video by the encoder 1410.
- the reduced quality of the non-ROI areas allows optimized bitrate allocation, in order to achieve the highest quality encoding of ROI areas at a constant bitrate.
- the encoder 1410 provides an ROI optimized video 1412.
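ROI-targeted bitrate allocation is commonly realized as a per-block QP map. A minimal sketch follows; the block size, the two QP values, and the mask format are assumptions, and a real encoder would consume such a map through its rate-control API:

```python
def roi_qp_map(mask, base_qp=30, roi_qp=22, block=16):
    """Build a per-block QP map from a binary ROI mask (2D list of 0/1):
    blocks that overlap the ROI get the lower (higher-quality) QP, all
    other blocks keep the base QP."""
    h, w = len(mask), len(mask[0])
    qp_map = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            overlaps = any(mask[y][x]
                           for y in range(by, min(by + block, h))
                           for x in range(bx, min(bx + block, w)))
            row.append(roi_qp if overlaps else base_qp)
        qp_map.append(row)
    return qp_map
```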
- FIG. 15 depicts further ROI encoding.
- the process 1500 is similar to that described above with reference to FIG. 14; however, the process tracks ROIs across the raw video.
- the raw video 1402 is provided to an ROI analysis and tracking functionality 1506.
- for ROI tracking, the user may point out objects in the first frame, or any subsequent frames, on which the ROIs are based.
- the tracking scheme uses an image segmentation algorithm to estimate an ROI corresponding to the selected objects.
- the image segmentation algorithm is tuned specifically for omnidirectional videos such that it automatically adjusts the area allocation to achieve better efficiency when the resulting ROI is applied to the projected frames.
- an optic flow tracking algorithm is used to generate the ROIs for successive frames based on prior frames.
- the number of feature points, the fineness of the optic flow vector field and other parameters are chosen by the algorithm to maximize its efficiency for the projection scheme. Users can pause the optic flow tracking algorithm at any point, and manually define the ROI for a specific frame with the same image segmentation algorithm.
- the optic flow tracking algorithm will use the newest manually specified mask as its reference once it is resumed.
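The flow-based propagation step can be sketched as shifting the ROI by the mean flow measured inside it. Computing the dense flow field itself is left to an optical-flow library, and the bounding-box ROI representation here is an assumption:

```python
def propagate_roi(roi_box, flow_vectors):
    """Shift an ROI bounding box (x, y, w, h) into the next frame by the
    mean optic-flow vector measured inside it. flow_vectors is a list of
    (dx, dy) samples; a dense optical-flow algorithm would supply them."""
    if not flow_vectors:
        return roi_box                  # no motion information: keep the box
    dx = sum(v[0] for v in flow_vectors) / len(flow_vectors)
    dy = sum(v[1] for v in flow_vectors) / len(flow_vectors)
    x, y, w, h = roi_box
    return (x + dx, y + dy, w, h)
```

When the user pauses tracking and redraws the ROI, the manually specified box would simply replace roi_box before propagation resumes.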
- FIG. 16 depicts an ROI heat map.
- the heat map 1600 depicts the most common locations of ROIs, indicated by brightness, with the most common area depicted in white 1602.
- the heat map 1600 provides information on the most common locations within the pole tiles 1604, 1606 and the equatorial tiles 1608.
- the ROI expansion size margin is relatively low and the ROI border is sharp.
- the segmentation iteration is high and the number of feature points is small.
- the fineness of the optic flow field is low. In the low frequency regions there is a smaller ROI region with a sharp transition tuned for static video.
- the ROI expansion size margin is relatively high and the ROI border is smooth.
- the segmentation iteration is low and the number of feature points is large.
- the fineness of the optic flow field is high. In the high frequency regions there is a larger ROI region with a smooth transition that is motion-sensitive.
- the first is adjusting QP utilization. Regular video encoders treat every block (coding unit, CU) in the video stream equally. However, for omnidirectional video with ROI information, it is possible to tune the quantization parameters to give ROI areas higher quality.
- the second is resolution utilization. As described above, an omnidirectional video will be cut and reshaped, and some of the tiles may not contain any ROI area. There is therefore no need to keep the same resolution ratio for those tiles, and tiles that do not contain an ROI area can be downscaled to a lower resolution and encoded with tuned QP parameters in order to save bitrate.
- the ROI areas' resolution 1704, 1708 will be enhanced.
- the extra pixel information will be stored in even-frames 1706 while the original frames become all odd-frames 1702.
- abrupt changes of resolution may be uncomfortable for the viewer, and as such, the resolution may be adjusted slowly while there is limited motion in the video.
- FIG. 18 depicts view point encoding of omnidirectional video.
- the omnidirectional video 1800 is encoded to provide a plurality of different view points 1802, 1804, 1806.
- Each view point stream is encoded into different time blocks 1808.
- the streaming of different view points can be switched between each other at the different clip starting time blocks.
- the encoded time blocks form a 2D caching scheme that allows different time blocks for different view points to be cached.
- the view points are encoded to include additional streams of I-frames, P-frames and B-frames that allow a smart assembler to quickly recover the decoded stream when switching between the view points.
- FIG. 19 depicts view point encoding of a view of omnidirectional video for adaptive view point streaming.
- an original view point stream 1902 is further encoded into additional streams 1904, 1906, 1908 for the different time clips.
- the additional streams 1904, 1906, 1908 encode different frames into I-frames, P-frames and B-frames.
- the original stream and the additional streams 1910 are transmitted to allow quick view point switching at any time point.
- As shown in FIG. 19, after encoding one whole video stream, several additional streams are encoded. All frames after the I-frame in a GOP in the original stream 1902 are encoded as I-frames, forming Stream 0 1904.
- This view point encoding can shorten the average waiting time during temporal random access. Combined with the spatial division achieved by encoding the different view points described above, it is possible to achieve high spatial and temporal random access during omnidirectional video streaming.
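The switching logic can be sketched as follows: on a view point switch, the all-I-frame side stream supplies an immediate entry point, after which the original stream's predictive frames decode normally. The GOP length and the stream labels are assumptions for this sketch:

```python
GOP = 8  # frames per group of pictures in the original stream (assumed)

def switch_plan(switch_frame):
    """Decide which stream supplies each remaining frame of the current
    GOP after a view point switch at switch_frame: the all-I-frame side
    stream provides an immediate entry point, and the original stream's
    predictive frames can then be decoded from it as usual."""
    gop_end = (switch_frame // GOP) * GOP + GOP
    return [(f, "i_stream" if f == switch_frame else "original")
            for f in range(switch_frame, gop_end)]
```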
- FIG. 20 depicts adaptive view point streaming of omnidirectional video.
- each view point is encoded with additional streams of data, which provides improved adaptive streaming for omnidirectional video.
- previous adaptive streaming techniques encoded a video into a plurality of different quality streams 2002.
- the different quality streams allowed the streaming of a video to adapt to network conditions.
- the adaptive streaming for omnidirectional video allows the adaptive streaming of multiple view points.
- the additional streams allow the quick switching between view points.
- FIG. 21 depicts a system for encoding and streaming omnidirectional video.
- the system is depicted as a server 2100 for processing omnidirectional video that may be provided to the server 2100 from a video system 2102 that captures 360 ° video.
- the server 2100 comprises a processing unit 2104 for executing instructions.
- An input/output (I/O) 2106 interface allows additional components to be operatively coupled to the processing unit 2104.
- the server 2100 further comprises a non-volatile (NV) memory 2108 for storing data and instructions and a memory unit 2110, such as RAM, for storing instructions for execution by the processing unit 2104.
- the instructions stored in the memory unit 2110, when executed by the processing unit 2104, configure the server 2100 to provide an omnidirectional video encoding functionality 2112 in accordance with the functionality described above.
- the encoding functionality 2112 comprises functionality for segmenting and mapping 2114 spherical omnidirectional video data to a number of pole and equatorial joining segments.
- Tile stacking functionality 2116 arranges the segments into a single frame for subsequent encoding.
- the functionality further comprises ROI tracking functionality 2118 that tracks ROIs across frames of the omnidirectional video. The stacked images and ROI information are then used by encoding functionality 2120 to encode the video data.
- FIGs. 22A, 22B and 22C depict devices for capturing panoramic and/or omnidirectional video.
- the devices depicted in FIGs. 22A, 22B, 22C each comprise fisheye lenses that are mounted over the cameras of the device.
- the fisheye devices may be used with a single mobile phone or with additional mobile phones.
- FIG. 22A depicts single phone 2200a that has a front facing camera and a rear facing camera.
- a panoramic video capture device 2202a fits over the phone 2200a and places a first fisheye lens 2204a over a front facing camera and a second fisheye lens 2206a over a back facing camera.
- FIG. 22B depicts a similar panoramic video capture device 2202b; however, rather than placing fisheye lenses over the front and back cameras of a single device, the device 2202b is designed for holding two mobile devices 2200b-1, 2200b-2 back-to-back and places the fisheye lenses 2204b, 2206b over the front facing cameras of the mobile devices.
- FIG. 22C depicts a further device 2202c that is designed for holding three mobile devices 2200c-1, 2200c-2, 2200c-3.
- the device 2202c holds the three mobile devices and arranges fisheye lenses 2204c, 2206c, 2208c over the front facing cameras of the devices.
- the devices 2202a, 2202b, 2202c allow panoramic video to be captured using common mobile devices.
- Each of the fisheye lenses may provide a 180° field of view.
- two or more fisheye video streams are captured simultaneously.
- one of the capture devices acts as the master capture device and may make connections with the other devices to receive the video streams, stitch the videos together and output the panoramic video.
- the two video streams captured by the front and back cameras of the device can be stitched together by the mobile device, which then streams out the resulting video.
- stitching can be done in a player, which is suitable for low-power capture devices. In that case, all capture devices stream video directly to the player.
- the devices depicted in FIGs. 22A, 22B, 22C may be used to stream panoramic video in a video chat system.
- the video streaming process, both between capture devices and from capture device to player, may use Real-time Transport Protocol to transfer real-time video and use Session
- Timestamps for each frame may be added to the stream for synchronization.
- the stitching process may be performed as follows:
- Static photos are captured for every camera and key points are extracted using algorithms like SIFT. After matching the key points from the different cameras, each camera's parameters and rotation can be generated. More details about stitching are described below.
- each frame from each camera can be remapped onto a sphere.
- A linear or multi-band blending algorithm is used to blend the remapped frames from different cameras to produce a 360 degree panorama frame, which is usually projected into a rectangular image as described above.
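By way of illustration, the remapping step above can be sketched in Python under an assumed equidistant fisheye lens model (r = f·θ); the function name and parameters are illustrative and not part of the disclosure:

```python
import math

def fisheye_to_sphere(x, y, cx, cy, f):
    """Map pixel (x, y) of an equidistant fisheye image (r = f * theta) to
    spherical coordinates (theta, phi) about the lens axis. (cx, cy) is the
    center of the fisheye circle and f the focal length in pixels; both are
    assumed, approximate camera parameters. Returns None outside a
    180-degree field of view."""
    dx, dy = x - cx, y - cy
    theta = math.hypot(dx, dy) / f      # angle from the optical axis
    if theta > math.pi / 2:             # beyond the 180-degree FOV
        return None
    phi = math.atan2(dy, dx)            # azimuth around the axis
    return theta, phi

# The center pixel lies on the optical axis:
print(fisheye_to_sphere(512, 512, 512, 512, 326))  # → (0.0, 0.0)
```

Inverting this mapping per pixel of the output sphere is what the stitching template precomputes, so that per-frame remapping is a table lookup.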
- Generating the stitching template is illustrated in FIGs. 23A, 23B, 23C.
- Using key points directly extracted from fisheye images to perform matching may produce bad results as depicted in FIG. 23A.
- the mismatch between the fisheye videos 2302a, 2302b may result because distortion effects of the fisheye lens make objects far from the image center hard to recognize by algorithms like SIFT, and because most parts of the images do not overlap.
- Generating the stitching template uses predefined approximate camera parameters to remap the fisheye images to flattened images 2304a, 2304b before extracting key points, as depicted in FIG. 23B. Based on the predefined camera parameters, key points at certain areas 2306a, 2306b can be ignored safely. After a correct match is found in the remapped images 2306a, 2306b, those key points are then un-mapped into the original fisheye images 2308a, 2308b to compute the final camera parameters and rotation to provide a proper matching between the captured videos.
- Another important step in generating the template is calculating a brightness map as depicted in FIGs. 24A and 24B.
- As a fisheye lens is used, the brightness of each pixel varies greatly near the border as depicted in FIG. 24A.
- the brightness map, which provides brightness values for each pixel in an image, can be calculated as depicted in FIG. 24B and used later to correct image brightness before blending.
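As an illustrative sketch of such a brightness map, the classic cos⁴ vignetting model can stand in for the empirically measured falloff (the disclosure computes the map from captured images; the model and constants here are assumptions):

```python
import math

def brightness_map(w, h, cx, cy, f):
    """Per-pixel brightness falloff of a fisheye lens approximated by the
    cos^4 vignetting model (a stand-in: the disclosure derives the map from
    captured images). Values lie in (0, 1], with 1.0 at the image center."""
    gain = [[0.0] * w for _ in range(h)]
    for yy in range(h):
        for xx in range(w):
            theta = math.hypot(xx - cx, yy - cy) / f
            gain[yy][xx] = max(math.cos(min(theta, math.pi / 2)), 1e-3) ** 4
    return gain

def correct(pixel, gain):
    """Divide out the falloff so border pixels match the center before blending."""
    return min(255, int(round(pixel / gain)))

g = brightness_map(8, 8, 4, 4, 5)
print(g[4][4], round(g[0][0], 3))  # center is brightest; corners are attenuated
```

Applying `correct` to each pixel with its map value equalizes brightness across the overlap region before the blend.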
- the stitching process may also involve audio reconstruction. Audio streams captured by several devices at different positions can be combined.
- the player (usually smart devices or headsets) receives the stitched panorama video (or multiple original video streams and then does the stitching) and displays it.
- a user can look at different angles using rotation sensors in the player such as gyroscopes.
Abstract
Systems, devices and methods for capturing, encoding and streaming 360° video. Devices allow a fisheye lens to be placed over mobile device cameras, allowing two or more cameras to capture a full 360° video. Omnidirectional video may be segmented into a plurality of poles and one or more equatorial tiles. The segmented tiles may be stacked into a frame for encoding. Multiple view points may be encoded in order to provide adaptive view point streaming.
Description
OMNIDIRECTIONAL VIDEO ENCODING AND STREAMING
TECHNICAL FIELD
[1] The current disclosure relates to encoding and streaming of video and in particular to encoding and streaming omnidirectional video.
BACKGROUND
[2] Omnidirectional video provides a 360° view of an environment.
Omnidirectional video allows a viewer to view any desired portion of the 360° environment. Encoding omnidirectional video may use existing encoding techniques used for 2-dimensional (2D) video, by projecting the omnidirectional video from a sphere to one or more rectangles. FIG. 1 depicts projecting the omnidirectional video from a sphere 100 onto one or more rectangles 102, 104a, 104b, 104c, 104d, 104e, 104f using equirectangular projection and cubic projection. In both cases of equirectangular projection and cubic projection, the resulting 2D projections have wasted pixels. As depicted in FIG. 1 the area of the omnidirectional video is that of a sphere 100. If the omnidirectional video's sphere has a radius of r, the omnidirectional video covers an area of 4πr². However, in equirectangular projection, the sphere's area is projected onto a rectangle having an area of 2π²r², which is 157% the area of the sphere. Similarly, in cubic projection, the sphere's area is projected to six squares having a combined area of 6πr², which is 150% the area of the sphere. Accordingly, both projection techniques result in a relatively large amount of unnecessary information being encoded.
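The stated area ratios follow directly from the formulas above and can be verified:

```python
import math

r = 1.0
sphere = 4 * math.pi * r ** 2         # area of the spherical video
equirect = 2 * math.pi ** 2 * r ** 2  # equirectangular: 2*pi*r wide, pi*r tall
cubic = 6 * math.pi * r ** 2          # six faces as sized in the disclosure

print(round(equirect / sphere, 3))    # 1.571, i.e. about 157% of the sphere
print(round(cubic / sphere, 2))       # 1.5, i.e. 150% of the sphere
```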
[3] In 2D videos, photographers tend to use the whole frame to capture all regions of interest (ROI). However, in omnidirectional videos, a relatively high percentage of pixels are used to render the environment of the scene. Regular encoding methods treat these non-ROI regions the same as ROI regions. Accordingly, the non-ROI areas may utilize bit rate unnecessarily, resulting in a lower available bit rate for encoding ROI areas. The encoded video may be streamed to a viewer for interactive viewing of the 360° video.
[4] An additional, alternative and/or improved encoding technique for encoding omnidirectional video as well as an improved streaming technique for streaming the encoded omnidirectional video is desirable.
SUMMARY
[5] The present disclosure provides a new encoding method that uses a nearly equal-area projection. The encoding may also use ROI-targeted encoding to provide the encoded omnidirectional videos. Further, the present disclosure provides adaptive streaming techniques for omnidirectional videos. The present disclosure further provides video capture devices and stitching techniques for capturing panoramic and omnidirectional video.
[6] In accordance with the present disclosure, there is provided a method of encoding omnidirectional video comprising: receiving spherical omnidirectional video data; segmenting a frame of the spherical omnidirectional video data into a north pole segment formed by mapping a top spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, a south pole segment formed by mapping a bottom spherical dome portion of the frame of the spherical omnidirectional video data onto a circle and at least one joining segment formed by mapping a spherical joining portion of the frame of the spherical omnidirectional video data joining the top spherical dome portion and the bottom spherical dome portion onto at least one rectangle; stacking the north pole segment, the south pole segment and the at least one joining segment together to form a 2-dimensional (2D) frame corresponding to the frame of the spherical
omnidirectional video data; and encoding the 2D frame. [7] In a further embodiment, the method further comprises segmenting a plurality of frames of the spherical omnidirectional video data into a plurality of north pole segments, south pole segments and joining segments; stacking the plurality of north pole segments, south pole segments and joining segments into a plurality of 2D frames; and encoding the plurality of 2D frames.
[8] In a further embodiment of the method, the at least one joining segment comprises a plurality of joining segments, each mapping a portion of the spherical joining portion onto a respective rectangle.
[9] In a further embodiment of the method, the circular poles are placed within squares.
[10] In a further embodiment of the method, each of the north pole segment, the south pole segment and the at least one joining segment comprise overlapping pixel data.
[11] In a further embodiment of the method, there is up to 5% overlap between the segments.
[12] In a further embodiment of the method, the at least one joining segment comprises between 2 and 4 segments.
[13] In a further embodiment, the method further comprises tracking one or more regions of interest (ROI) before encoding the 2D frames. [14] In a further embodiment of the method, encoding the 2D frame comprises: encoding one or more view points into a first stream; and for each view point encoding additional streams comprising an intracoded frame stream, a predictive frame stream, and a bi-predictive frame stream.
[15] In a further embodiment, the method further comprises streaming at least one of the encoded view points.
[16] In accordance with the present disclosure there is further provided a system for encoding omnidirectional video comprising: a processor for executing instructions; and a memory for storing instructions, which when executed by the processor, configure the system to provide a method comprising: receiving spherical omnidirectional video data; segmenting a frame of the spherical omnidirectional video data into a north pole segment formed by mapping a top spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, a south pole segment formed by
mapping a bottom spherical dome portion of the frame of the spherical omnidirectional video data onto a circle and at least one joining segment formed by mapping a spherical joining portion of the frame of the spherical omnidirectional video data joining the top spherical dome portion and the bottom spherical dome portion onto at least one rectangle; stacking the north pole segment, the south pole segment and the at least one joining segment together to form a 2-dimensional (2D) frame corresponding to the frame of the spherical omnidirectional video data; and encoding the 2D frame.
[17] In a further embodiment of the system, the instructions further configure the system to: segment a plurality of frames of the spherical omnidirectional video data into a plurality of north pole segments, south pole segments and joining segments; stack the plurality of north pole segments, south pole segments and joining segments into a plurality of 2D frames; and encode the plurality of 2D frames. [18] In a further embodiment of the system, the at least one joining segment comprises a plurality of joining segments, each mapping a portion of the spherical joining portion onto a respective rectangle.
[19] In a further embodiment of the system, the circular poles are placed within squares. [20] In a further embodiment of the system, each of the north pole segment, the south pole segment and the at least one joining segment comprise overlapping pixel data.
[21] In a further embodiment of the system, there is up to 5% overlap between the segments. [22] In a further embodiment of the system, the at least one joining segment comprises between 2 and 4 segments.
[23] In a further embodiment of the system, the instructions further configure the system to track one or more regions of interest (ROI) before encoding the 2D frames.
[24] In a further embodiment of the system, encoding the 2D frame comprises: encoding one or more view points into a first stream; and for each view point encoding additional streams comprising an intracoded frame stream, a predictive frame stream, and a bi-predictive frame stream. [25] In a further embodiment of the system, the instructions further configure the system to stream at least one of the encoded view points.
[26] In accordance with the present disclosure there is further provided a device for use in capturing panoramic video comprising: a frame for holding a mobile device; a first fisheye lens mounted on the frame and arranged to be located over a front facing camera of the mobile device when the mobile device is held by the frame; and a second fisheye lens mounted on the frame and arranged to be located over a rear facing camera of the mobile device when the mobile device is held by the frame.
[27] In accordance with the present disclosure there is further provided a method of stitching multiple videos captured from one or more mobile devices comprising: generating a stitching template for each camera capturing the videos; synchronizing frames of the captured video using timestamps of the frames; remapping the multiple videos onto a sphere using the stitching template; and blending the remapped images to provide a panoramic video.
BRIEF DESCRIPTION OF THE DRAWINGS
[28] Features, aspects and advantages of the present disclosure will become better understood with regard to the following description and accompanying drawings in which:
[29] FIG. 1 depicts equirectangular projection and cubic projection of a sphere;
[30] FIGs. 2A and 2B depict segmented sphere projection (SSP) of a sphere;
[31] FIG. 3 depicts stacking of segments from a segmented sphere projection;
[32] FIG. 4 is a graph of a ratio of segmented area to the spherical area based on the number of segments for both circular pole and square pole segments;
[33] FIG. 5 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for circular pole segments;
[34] FIG. 6 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for square pole segments; [35] FIG. 7 depicts segmented sphere projection using a single equatorial tile segment and square poles;
[36] FIG. 8 depicts segmented sphere projection using multiple equatorial tile segments and square poles;
[37] FIG. 9 depicts segmented sphere projection using equally sized equatorial tile segments and square poles;
[38] FIG. 10 depicts stacking of overlapping tile segments;
[39] FIG. 11 depicts further stacking of overlapping tile segments;
[40] FIG. 12 depicts further stacking of overlapping tile segments;
[41] FIG. 13 depicts stacking of overlapping tile segments for stereoscopic omnidirectional video;
[42] FIG. 14 depicts region of interest (ROI) encoding;
[43] FIG. 15 depicts further ROI encoding;
[44] FIG. 16 depicts an ROI heat map;
[45] FIG. 17 depicts ROI temporal encoding;
[46] FIG. 18 depicts view point encoding of omnidirectional video;
[47] FIG. 19 depicts view point encoding of a view of omnidirectional video for adaptive view point streaming;
[48] FIG. 20 depicts adaptive view point streaming of omnidirectional video; [49] FIG. 21 depicts a system for encoding and streaming omnidirectional video;
[50] FIGs. 22A, 22B and 22C depict devices for capturing panoramic and/or omnidirectional video;
[51] FIGs. 23A, 23B, and 23C depict stitching video together; and [52] FIGs. 24A and 24B depict brightness mapping.
DETAILED DESCRIPTION
[53] Omnidirectional video can be encoded using regular encoding techniques by first projecting the video from a sphere to 2-dimensional (2D) tiles. Segmented sphere projection (SSP) projects the spherical video from a top dome or cap portion of the sphere, a bottom dome or cap portion of the sphere and a middle equatorial portion of the sphere joining the top and bottom cap portions. The top and bottom cap segments may be mapped to circular tiles or to a circular portion of a square tile. The equatorial portion of the sphere may be mapped to one or more rectangular tiles. The tiles may then be stacked together into a single frame for subsequent encoding. The total area of tiles resulting from SSP may be smaller than the total area resulting from either equirectangular projection or cubic projection. The tile area for SSP is close to that of the area of the sphere of the omnidirectional video. [54] In addition to segmenting the sphere into tiles having a lower total area than other projection techniques such as equirectangular projection or cubic projection, the encoding efficiency of omnidirectional video may be further improved by encoding particular region of interest (ROI) portions of the
omnidirectional video with a higher bitrate while encoding non-ROI portions of the omnidirectional video using a lower bitrate.
[55] FIGs. 2A and 2B depict segmented sphere projection (SSP) of a sphere. As depicted in FIG. 2A, the sphere is segmented and mapped to tiles using an improved projection based on Sinusoidal projection. As depicted, the sphere 200 is cut along its latitude into several segments including a north pole segment 202a, a south pole segment 202b (referred to collectively as pole segments 202) and one or more equatorial joining segments 204a-f (referred to collectively as joining segments 204) between the two poles. The segments may then be mapped to tiles, and in particular the pole segments may be mapped to circular tiles 206a, 206b and the joining segments 204 may be mapped to rectangular tiles 208a-208f. As depicted in FIG. 2B, the number of joining segments can vary. The sphere 200 may be cut into two pole segments 210a, 210b and 3 equatorial joining segments 212a, 212b, 212c. Each of the two pole segments 210a, 210b is mapped to a respective circle contained by squares 214a, 214b and the joining segments 212a-c are mapped to respective rectangles 216a-c. The individual tiles may overlap with each other a certain amount in order to maintain video quality during further processing. Once segmented into tiles, the individual tiles may be stacked together to form a frame that may be encoded using various encoding techniques.
[56] FIG. 3 depicts stacking of segments from a segmented sphere projection. As depicted, individual joining segment tiles 304a-c may be stacked together with the square pole tiles 302a, 302b and arranged in a rectangular frame 300. The rectangular frame 300 may then be encoded using, for example, an H.264 encoder, although other encoding techniques may be used.
[57] FIG. 4 is a graph of a ratio of segmented area to the spherical area based on the number of segments for both circular pole and square pole segments. As depicted in the graph of FIG. 4, as the number of equatorial joining segments increases, the total area of the segmented tiles approaches the area of the sphere. As can be seen from FIG. 4, the segmented tile area
is greater when using square poles when compared to circular poles. As the number of segmented tiles increases, the tile-pole segmentation latitude, that is the latitude where the sphere is cut to form the segments, will be pushed toward the poles. [58] FIG. 5 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for circular pole segments. FIG. 6 is a graph of a ratio of segmented area to the spherical area based on the number of segments and different amounts of segment overlap for square pole segments. As can be seen from the graphs of FIGs. 5 and 6, if there is overlap between the segmented tiles, then as the number of segments increases, so does the total area. In contrast, when there is no overlap between the segmented tiles, the total area decreases as the number of segments increases. Table 1 below shows segmentation latitudes for varying amounts of segment overlap, varying numbers of joining segments, and the use of circular or square pole tiles.
Table 1 showing segmentation latitudes for different segmentations and overlap
[59] As described above, the number of segment tiles may vary. The segmentation of each hemisphere of the sphere into 1 segment and 1 pole may be described by:

min_{θ ∈ (0, π/2)} 4πr²((π/2 − θ)²/2 + θ)    (1)

[60] Where θ is the tile-pole segmentation latitude. The minimum total area is about 107.1% of the sphere's area of 4πr² when θ = 32.70°.
[61] The segmentation of each hemisphere of the sphere into 2 segments and 1 pole may be described by:

min_{θ1 < θ2, θ1, θ2 ∈ (0, π/2)} 4πr²((π/2 − θ2)²/2 + θ1 + (θ2 − θ1)cos θ1)    (2)

[62] Where θ1 is the tile-tile segmentation latitude and θ2 is the tile-pole segmentation latitude. The minimum total area is about 105.4% when θ1 = 25.34° and θ2 = 38.22°.
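The stated optima can be checked numerically. The sketch below brute-force minimizes the total tile area relative to the sphere's area for the 1-segment and 2-segment cases (the objective functions are reconstructed from the stated results, not quoted from the disclosure):

```python
import math

def one_segment_ratio(t):
    # Total tile area / sphere area: 1 joining segment + 1 pole per hemisphere,
    # circular poles; t is the tile-pole segmentation latitude in radians.
    return (math.pi / 2 - t) ** 2 / 2 + t

def two_segment_ratio(t1, t2):
    # 2 joining segments + 1 pole per hemisphere, circular poles.
    return (math.pi / 2 - t2) ** 2 / 2 + t1 + (t2 - t1) * math.cos(t1)

# Brute-force search over candidate latitudes (radians).
grid = [i * math.pi / 2 / 2000 for i in range(1, 2000)]
t_best = min(grid, key=one_segment_ratio)
t1_best, t2_best = min(((a, b) for a in grid[::10] for b in grid[::10] if a < b),
                       key=lambda p: two_segment_ratio(*p))

print(round(math.degrees(t_best), 1))                 # ≈ 32.7 degrees
print(round(one_segment_ratio(t_best), 3))            # ≈ 1.071, i.e. 107.1%
print(round(two_segment_ratio(t1_best, t2_best), 3))  # ≈ 1.054, i.e. 105.4%
```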
[63] As described above, the poles may be mapped to circles; however, when encoding the resulting tiles, the circles are placed in squares. Placing the circles of the poles in squares will increase the total area of the segments to about 117.8% for 1 tile and 1 pole per hemisphere when θ = 45°. When each hemisphere is segmented into 2 tiles and 1 pole, the total area is approximately 113.4% when θ1 = 35.07° and θ2 = 53.16°. [64] Different hardware decoders have different decoding abilities. Taking HEVC as an example, Table 2 provides examples of coding levels and the corresponding HEVC-supported resolution, equivalent equirectangular resolution and equivalent resolution displayed on a single eye (FOV 90°) when using different tiles.
Table 2 showing resolutions for different segmentations and HEVC encoding levels
[65] FIG. 7 depicts segmented sphere projection using a single equatorial tile segment and square poles. As depicted, a sphere 700 may be segmented into a north pole 702 and a south pole 704 joined by a joining segment 706. The poles 702, 704 are mapped to circles within squares 708, 710 and the joining segment 706 is mapped to a rectangle 712. As depicted, the pole tiles 708, 710 and the equator tile 712 can be vertically stacked to form a frame for encoding.
[66] FIG. 8 depicts segmented sphere projection using multiple equatorial tile segments and square poles. As depicted, a sphere 800 may be
segmented into a north pole and a south pole joined by a number of joining segments. The poles are mapped to circles within squares 802, 804 and the joining segments are mapped to rectangles 806. As depicted, the pole tiles 808 and the equator tiles 810 can be vertically stacked to form a frame for encoding. [67] As is shown in both FIGs. 7 and 8, Segmented Sphere Projection
(SSP) segments the sphere into several segments: north pole, south pole and the rest. The boundaries of all segments in the north and south are
symmetric. The north and south poles are mapped into 2 circles, and the rest of the segments are projected onto one or more rectangles. [68] The layout of the tiles may be vertically arranged when forming a frame as shown in FIGs. 7 and 8. The formulas for the SSP are shown below.
Assuming there are k segmentation latitudes θ, namely α1, α2, …, αk, then there will be 2k + 1 segments, where φ ∈ (−π, π] and i = 1, 2, …, k − 1.
[69] The origin is in the upper left corner of the image. The initial side of θ' is located in the equatorial plane; θ' is positive in the north hemisphere and negative in the south hemisphere. Equation (3) indicates how to map a point on the cap (θ', φ) into a point in the circle (x', y'). It should be noticed that there are differences in sign between the north and south poles. Equation (4) indicates how to map the equator to the middle rectangle. It uses the same equation as Equirectangular Projection (ERP) to convert the equator area into the rectangle. Equations (5) and (6) indicate how to map the rest of the segments to rectangles. They also use the same equation as Equirectangular Projection (ERP) to map to rectangles.
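As a sketch of the ERP-style mapping applied to each joining segment (the function and its parameters are illustrative; the disclosure's exact equations (3)-(6) are not reproduced here):

```python
import math

def erp_map(theta, phi, width, height, theta_lo, theta_hi):
    """ERP-style mapping for a joining segment: longitude phi in (-pi, pi]
    maps linearly to x, latitude theta in [theta_lo, theta_hi] maps linearly
    to y, with y = 0 at the top of the tile. Names are illustrative."""
    x = (phi + math.pi) / (2 * math.pi) * width
    y = (theta_hi - theta) / (theta_hi - theta_lo) * height
    return x, y

# The point at latitude 0, longitude 0 lands at the center of a tile
# spanning latitudes -45 to 45 degrees:
print(erp_map(0.0, 0.0, 3600, 900, -math.pi / 4, math.pi / 4))
```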
[70] FIG. 9 depicts segmented sphere projection using equally sized equatorial tile segments and square poles. The projection depicted in FIG. 9 may be similar to that depicted in FIG. 7; however, rather than mapping the single joining segment to a single rectangle, the projection depicted in FIG. 9 breaks the single rectangle into 4 squares. That is, the sphere 900 is segmented into two poles 902, 904 and a joining segment 906 and mapped to circles on squares 908, 910 and squares 912a-d. As depicted, the
Segmented Sphere Projection (SSP) of FIG. 9 segments the sphere into 3 tiles: north pole, equator and south pole. The boundaries of the 3 segments are 45°N and 45°S. The north and south poles are mapped into 2 circles, and the projection of the equatorial segment is the same as ERP. The diameter of the circle is equal to the height of the equatorial segments since both the pole segments and the equatorial segment have a 90° latitude span.
[71 ] The equatorial segment is split into 4 squares in order to get "faces" of same size. The frame packing structure is depicted in FIG. 9. The corners of the circular poles are filled with "null" values to form the square. Points on the sphere are mapped to the respective tiles according to:
[72]
[73] The segment tiles may be packed together to form a frame for encoding. The packing process attempts to put each region of the SSP segments into one 2D image with the least wasted area.
[74] There are three packing types for SSP. The particular packing method may be selected at the encoder side in order to minimize the wasted area. For the 1st type, depicted in FIG. 10, the two circles on squares 1002 are placed vertically on top of the rectangles 1004. The circles are centered horizontally on the center of the equatorial rectangle, and all the other rectangles are centered vertically on the center of the equatorial rectangle. For the 2nd type, depicted in FIG. 11, two circles 1102 are placed horizontally on top of the rectangles 1104; the circles are likewise centered horizontally, and the other rectangles centered vertically, on the center of the equatorial rectangle. For the 3rd type, depicted in FIG. 12, two circles 1202 are put on the left side and the right side of the rectangle 1204 of the equator. The highest point of each circle is as high as the top edge of the equatorial rectangle, and all the other rectangles are placed so that their bottom edges are at the same height.
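One plausible formalization of choosing among the three packing types is to compute the wasted area of each candidate layout and keep the smallest (the exact tile offsets in the disclosure differ slightly; the dimensions below are illustrative):

```python
def best_packing(w, h, s):
    """Wasted area of three candidate SSP packings: pole tiles stacked above
    the equator rectangle (type 1), side by side above it (type 2), or on
    its left and right (type 3). A plausible formalization only.
    w, h: equator rectangle; s: side of each square pole tile."""
    content = w * h + 2 * s * s
    frames = {
        1: (max(w, s), h + 2 * s),
        2: (max(w, 2 * s), h + s),
        3: (w + 2 * s, max(h, s)),
    }
    waste = {k: fw * fh - content for k, (fw, fh) in frames.items()}
    return min(waste, key=waste.get), waste

# A wide equator tile favors placing the poles at its sides (type 3):
print(best_packing(3600, 900, 900))
```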
[75] The overlap of pixels, depicted by reference numbers 1006, 1106, 1206 in FIGs. 10-12 above, for the segments will also be put in the 2D image, and the width of the overlap area can be indicated by syntax and semantics communicated with the encoded image data to decoders.
[76] FIG. 13 depicts stacking of overlapping tile segments for stereoscopic omnidirectional video. For stereoscopic video, there are two views. The segmented tiles of each of the views 1302, 1304 are packed side by side. FIG. 13 shows a layout of 1 tile 1 pole SSP that supports stereoscopic video.
[77] New syntax is provided below that allows the existing MP4 format to support the SSP format. An SSP Video Information box is defined as below. Although a specific syntax and semantics are described below, it will be appreciated that other implementations are possible.
Syntax
aligned(8) class SSPVideoInfoBox extends FullBox('ssp', version = 0, 0)
{
    bit(8) reserved = 0;
    unsigned int(1) is_stereoscopic;
    if (is_stereoscopic)
        unsigned int(8) stereoscopic_type;
    unsigned int(8) geometry_type;
    if (geometry_type == GEOMETRY_SSP) {
        unsigned int(8) ssp_theta_num;
        for (ssp_theta_id = 0; ssp_theta_id < ssp_theta_num; ssp_theta_id++) {
            unsigned int(8) ssp_theta[ssp_theta_id];
            unsigned int(8) ssp_overlap_pixel[ssp_theta_id];
        }
    }
}
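A minimal serializer for the box defined above might look as follows (the numeric value of GEOMETRY_SSP, the byte packing of the 1-bit is_stereoscopic field and the padding of the three-character box type to a four-character code are assumptions, since the disclosure does not assign them):

```python
import struct

def ssp_video_info_box(thetas, overlaps, is_stereo=False, stereo_type=0,
                       geometry_type=2):
    """Serialize an 'ssp' FullBox (version 0, flags 0) per the syntax above.
    Assumptions: GEOMETRY_SSP is taken as 2, is_stereoscopic occupies a full
    byte, and the box type is padded to four characters."""
    payload = struct.pack(">B", 0)                 # bit(8) reserved = 0
    payload += struct.pack(">B", 1 if is_stereo else 0)
    if is_stereo:
        payload += struct.pack(">B", stereo_type)  # stereoscopic_type
    payload += struct.pack(">B", geometry_type)
    if geometry_type == 2:                         # GEOMETRY_SSP
        payload += struct.pack(">B", len(thetas))  # ssp_theta_num
        for theta, overlap in zip(thetas, overlaps):
            payload += struct.pack(">BB", theta, overlap)
    version_flags = struct.pack(">I", 0)           # FullBox version/flags
    size = 8 + len(version_flags) + len(payload)
    return struct.pack(">I4s", size, b"ssp ") + version_flags + payload

box = ssp_video_info_box([45], [8])
print(len(box))  # → 18
```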
Semantics
Box Type: ssp
Container: Scheme Information box ('vrvd')
Mandatory: Yes
Quantity: One
[78] is_stereoscopic indicates whether stereoscopic media rendering is used or not. The value of this field is equal to 1 to indicate that the video in the referenced track is divided into two parts to provide different texture data for left eye and right eye separately according to the composition type specified by stereoscopic_type.
[79] geometry_type indicates the type of geometry for rendering of omnidirectional media. It may be GEOMETRY_ERP indicating that an
Equirectangular projection is used, GEOMETRY_CMP indicating a cube map projection is used or GEOMETRY_SSP indicating a segmented sphere projection.
[80] stereoscopic_type indicates the type of composition for the stereoscopic video in the referenced track.
[81] ssp_theta_num indicates how many θ values are used. The number of segments of the SSP, including the north pole and south pole, will then be 2*ssp_theta_num + 1 in total; the default value is 1.
[82] ssp_theta_id indicates the identifier of the theta.
[83] ssp_theta contains θ values in terms of degrees, ranging from 0-180. The default value is 45.
[84] ssp_overlap_width indicates the width, in pixels, of the overlap. [85] The above has described the segmenting and mapping of the spherical omnidirectional video into a number of segments that are packed together into a frame and encoded. It will be appreciated that while the segmenting and mapping of a single frame is described, the process will map each of the frames of the omnidirectional video. In addition to the efficient mapping provided by the segmented sphere projection, the encoding efficiency for omnidirectional video may be improved by encoding regions of interest with a higher bitrate while encoding non-ROIs with a lower bitrate.
[86] FIG. 14 depicts region of interest (ROI) encoding. As shown in FIG. 14, an ROI target encoding process 1400 uses ROI information 1406, which may comprise a mask 1408 specifying the ROI portion of the raw video 1402 being encoded. The raw video 1402 is depicted as a video frame 1404 having a person and a tree, with the person being the ROI. The raw video 1402 and the ROI information 1406 may be used by the encoder 1410 to lower the encoding quality of the non-ROI areas of the raw video. The reduced quality of the non-ROI areas allows an optimized bitrate allocation, in order to acquire the highest quality encoding of ROI areas with a constant bitrate. The encoder 1410 provides an ROI optimized video 1412. The output is depicted as having a frame 1414 with a high quality encoding of the person, while the tree is a low quality encoding. [87] FIG. 15 depicts further ROI encoding. The process 1500 is similar to that described above with reference to FIG. 14; however, the process tracks ROIs across the raw video. The raw video 1402 is provided to an ROI analysis and tracking functionality 1506.
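A sketch of deriving per-block quality control from such an ROI mask: blocks touching the ROI receive a negative QP offset (higher quality) and the rest a positive one (the delta magnitudes below are illustrative assumptions, not values from the disclosure):

```python
def qp_offsets(mask, block, roi_delta=-6, background_delta=6):
    """Per-block QP offset map from a binary ROI mask: blocks that touch the
    ROI get a negative offset (higher quality), all others a positive one.
    The delta magnitudes are illustrative; the disclosure only requires
    higher quality for ROI blocks."""
    h, w = len(mask), len(mask[0])
    rows, cols = (h + block - 1) // block, (w + block - 1) // block
    out = []
    for by in range(rows):
        row = []
        for bx in range(cols):
            touches_roi = any(mask[y][x]
                              for y in range(by * block, min((by + 1) * block, h))
                              for x in range(bx * block, min((bx + 1) * block, w)))
            row.append(roi_delta if touches_roi else background_delta)
        out.append(row)
    return out

# 4x8 mask with the ROI in the left half, 4-pixel blocks:
mask = [[1 if x < 4 else 0 for x in range(8)] for _ in range(4)]
print(qp_offsets(mask, 4))  # → [[-6, 6]]
```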
[88] For ROI tracking, the user may point out objects in the first frame, or any subsequent frames, that the ROIs are based on. The tracking scheme uses an image segmentation algorithm to estimate an ROI corresponding to the selected objects. The image segmentation algorithm is tuned specifically for omnidirectional videos such that it automatically adjusts the area allocation to achieve better efficiency when the resulting ROI is applied to
omnidirectional encoding. Users can further correct the estimation by pointing out the misclassified region and the ROI will be optimized.
[89] Once the ROI for the first frame is decided, an optic flow tracking algorithm is used to generate the ROIs for successive frames based on prior frames. The number of feature points, the fineness of the optic flow vector field and other parameters are chosen by the algorithm to maximize its efficiency for the projection scheme. Users can pause the optic flow tracking algorithm at any point, and manually define the ROI for a specific frame with the same image segmentation algorithm. The optic flow tracking algorithm will use the newest manually specified mask as its reference once it is resumed.
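As a toy stand-in for the optic flow tracking step (a real implementation would use a pyramidal Lucas-Kanade flow over the chosen feature points), an ROI bounding box can be propagated between frames by exhaustive block matching:

```python
def track_roi(prev, curr, box, search=2):
    """Propagate an ROI bounding box from frame `prev` to frame `curr` by
    exhaustive block matching (sum of absolute differences) over a small
    search window. Frames are 2D lists of intensities; `box` is
    (x0, y0, x1, y1). A toy stand-in for optic flow tracking."""
    x0, y0, x1, y1 = box

    def sad(dx, dy):
        return sum(abs(prev[y][x] - curr[y + dy][x + dx])
                   for y in range(y0, y1) for x in range(x0, x1))

    dx, dy = min(((i, j) for i in range(-search, search + 1)
                  for j in range(-search, search + 1)), key=lambda d: sad(*d))
    return x0 + dx, y0 + dy, x1 + dx, y1 + dy

# A bright 2x2 patch moves one pixel to the right between frames:
prev = [[0] * 10 for _ in range(10)]
curr = [[0] * 10 for _ in range(10)]
for y in (4, 5):
    for x in (4, 5):
        prev[y][x] = 9
        curr[y][x + 1] = 9
print(track_roi(prev, curr, (4, 4, 6, 6)))  # → (5, 4, 7, 6)
```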
[90] FIG. 16 depicts an ROI heat map. The heat map 1600 depicts the most common locations, depicted by brightness with the most common area being depicted by white 1602, of ROIs. The heat map 1600 provides information on the most common locations within the pole tiles 1604, 1606 and the equatorial tiles 1608. For the lower observed frequency of ROIs, the ROI expansion size margin is relatively low and the ROI border is sharp. The segmentation iteration is high and the number of feature points is small. The fineness of the optic flow field is low. In the low frequency regions there is a smaller ROI region with a sharp transition tuned for static video. For the higher observed frequency of ROIs, the ROI expansion size margin is relatively high and the ROI border is smooth. The segmentation iteration is low and the number of feature points is large. The fineness of the optic flow field is high. In the high frequency regions there is a larger ROI region with a smooth transition and is motion-sensitive.
[91] For these extracted and tracked ROIs, there are two general ways to control the encoding quality. The first is adjusting the QP utilization. Regular video encoders treat every block (CU) in the video stream equally. However, in omnidirectional video, and with the information on the ROI, it is possible to tune the parameter to give ROI areas higher quality. The second is resolution utilization. As described above, an omnidirectional video will be cut and reshaped. Some of the tiles may not contain any ROI area, and there is therefore no need to keep the same resolution ratio for those tiles. Hence, those tiles which do not contain an ROI area can be downscaled to a certain resolution and encoded with tuned QP parameters in order to save bitrate.
[92] For the ROI area, it is possible to simply upscale the whole tile to a higher resolution, or a temporal resolution enhancement may be used as shown in FIG. 17. With the temporal resolution enhancement, only the resolution of the ROI areas 1704, 1708 is enhanced. The extra pixel information is stored in even frames 1706 while the original frames become the odd frames 1702. Abrupt changes of resolution may be uncomfortable for the viewer, and as such, the resolution may be adjusted slowly while there is limited motion in the video.
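The temporal resolution enhancement can be sketched as an interleaving of the original frames (odd positions) with frames carrying the extra ROI pixel information (even positions); the data representation is a hypothetical simplification:

```python
def interleave_enhancement(frames, roi_extras):
    """Interleave original frames (odd positions) with ROI enhancement
    data (even positions), mirroring the scheme of FIG. 17 where extra
    ROI pixels are stored in even frames and the originals become the
    odd frames."""
    out = []
    for frame, extra in zip(frames, roi_extras):
        out.append(("odd", frame))    # original frame
        out.append(("even", extra))   # extra ROI pixel information
    return out
```

A decoder reconstructing the enhanced ROI would combine each odd/even pair; non-ROI pixels in the even frames carry no new information.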
[93] The above has described the segmenting and subsequent encoding of the entire omnidirectional video. However, it may be advantageous to encode particular view points of the omnidirectional video. Encoding the different view points as separate streams may allow for more efficient streaming, as only the stream of the particular view point being displayed to the user needs to be transmitted. If the user navigates through the omnidirectional video, different view point streams can be retrieved.
[94] FIG. 18 depicts view point encoding of omnidirectional video. As depicted, the omnidirectional video 1800 is encoded to provide a plurality of different view points 1802, 1804, 1806. Each view point stream is encoded into different time blocks 1808. The streams of different view points can be switched between at the clip starting time blocks. However, as depicted, if each clip starting block is 5 seconds long, switching between view points may take up to 5 seconds before the new view point can be properly decoded. The encoded time blocks form a 2D caching scheme that allows different time blocks of different view points to be cached.
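The 2D caching scheme can be sketched as a cache keyed by (view point, time block); the class and its 5-second default block length are illustrative assumptions:

```python
class ViewpointCache:
    """Cache indexed on two dimensions, (view point id, time-block
    index), so segments of several view points can be stored and looked
    up independently."""

    def __init__(self, block_seconds=5):
        self.block_seconds = block_seconds
        self._store = {}

    def _block(self, t_seconds):
        # Map a playback time to its time-block index.
        return int(t_seconds // self.block_seconds)

    def put(self, viewpoint, t_seconds, segment):
        self._store[(viewpoint, self._block(t_seconds))] = segment

    def get(self, viewpoint, t_seconds):
        # Returns None on a cache miss; the segment must then be fetched.
        return self._store.get((viewpoint, self._block(t_seconds)))
```

When the user switches view point, only the (new view point, current block) entry needs to be fetched if it is not already cached.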
[95] As described further below, it is possible to encode additional streams for each view point. The view points are encoded to include additional streams of I-frames, P-frames and B-frames that allow a smart assembler to quickly recover the decoded stream when switching between view points.
[96] FIG. 19 depicts view point encoding of a view of omnidirectional video for adaptive view point streaming. As depicted in FIG. 19, an original view point stream 1902 is further encoded into additional streams 1904, 1906, 1908 for the different time clips. The additional streams 1904, 1906, 1908 encode different frames into I-frames, P-frames and B-frames. The original stream and the additional streams 1910 are transmitted to allow quick view point switching at any time point. [97] As depicted in FIG. 19, after encoding one whole video stream, several additional streams are encoded. All frames after the I-frame in a GOP of the original stream 1902 are encoded as I-frames, forming Stream 0 1904. Several frames after the first I-frame in Stream 0 1904 are selected and encoded as P-frames, forming Stream 1 1906. Several frames after the first I-frame in Stream 0 1904 are selected and encoded as B-frames, forming Stream 2 1908.
[98] When streaming through a network, a smart assembler can pick up those I, P and B frames whose position is between the random access point (inclusive) and the I-frame of the next GOP of the original stream to form a standard decodable stream according to their frame dependencies. This view point encoding can shorten the average waiting time during temporal random access. Combined with the spatial division feature of encoding the different view points described above, it is possible to achieve high spatial and temporal random access ability during omnidirectional video streaming.
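The assembler's frame selection can be sketched as follows; the data layout (dicts mapping frame index to an encoded frame) is a hypothetical simplification of the streams in FIG. 19, and the full reference-dependency resolution is omitted:

```python
def assemble(access_idx, next_gop_start, stream0, stream1, stream2, original):
    """Build a decodable sequence from a random-access index inside a
    GOP: start with the Stream 0 I-frame at the access point, then for
    each following position prefer a Stream 1 P-frame, else a Stream 2
    B-frame, else the original frame, up to the next GOP's I-frame."""
    out = [stream0[access_idx]]  # an I-frame is always decodable alone
    for i in range(access_idx + 1, next_gop_start):
        out.append(stream1.get(i, stream2.get(i, original[i])))
    return out
```

Because Stream 0 holds an I-frame at every position, any frame index can serve as the random access point.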
[99] FIG. 20 depicts adaptive view point streaming of omnidirectional video. As described above, it is possible to encode each view point with additional streams of data, which provides improved adaptive streaming for omnidirectional video. As depicted in FIG. 20, previous adaptive streaming techniques encoded a video into a plurality of different quality streams 2002. The different quality streams allowed the streaming of a video to adapt to network conditions. In contrast to regular adaptive streaming, the adaptive streaming of omnidirectional view points allows the adaptive streaming of multiple view points. The additional streams allow quick switching between view points.
[100] FIG. 21 depicts a system for encoding and streaming omnidirectional video. The system is depicted as a server 2100 for processing omnidirectional video that may be provided to the server 2100 from a video system 2102 that captures 360° video. The server 2100 comprises a processing unit 2104 for executing instructions. An input/output (I/O) interface 2106 allows additional components to be operatively coupled to the processing unit 2104. The server 2100 further comprises a non-volatile (NV) memory 2108 for storing data and instructions, and a memory unit 2110, such as RAM, for storing instructions for execution by the processing unit 2104. The instructions stored in the memory unit 2110, when executed by the processing unit 2104, configure the server 2100 to provide an omnidirectional video encoding functionality 2112 in accordance with the functionality described above.
[101] The encoding functionality 2112 comprises functionality for segmenting and mapping 2114 spherical omnidirectional video data to a number of pole and equatorial joining segments. Tile stacking functionality 2116 arranges the segments into a single frame for subsequent encoding. The functionality further comprises ROI tracking functionality 2118 that tracks ROIs across frames of the omnidirectional video. The stacked images and ROI information are then used by encoding functionality 2120 to encode the video data.
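The tile stacking step can be illustrated by computing placement rectangles for the segments within the single 2D frame; the particular layout (two pole squares side by side on top, equatorial tiles stacked below) is one plausible arrangement assumed for illustration:

```python
def stack_layout(pole_side, n_equatorial, eq_w, eq_h):
    """Return (x, y, w, h) placement rectangles: two pole squares side
    by side on the top row, then n_equatorial joining tiles stacked
    beneath them."""
    rects = [
        (0, 0, pole_side, pole_side),          # north pole square
        (pole_side, 0, pole_side, pole_side),  # south pole square
    ]
    y = pole_side
    for _ in range(n_equatorial):
        rects.append((0, y, eq_w, eq_h))       # one equatorial tile
        y += eq_h
    return rects
```

The encoder then treats the stacked frame as an ordinary 2D frame; the same rectangles are used on the player side to pick the segments back apart.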
[102] The above has described the encoding and streaming of omnidirectional video. As described further below, 360° panoramic video may be captured using existing devices such as smart phones.
[103] FIGs. 22A, 22B and 22C depict devices for capturing panoramic and/or omnidirectional video. The devices depicted in FIGs. 22A, 22B and 22C each comprise fisheye lenses that are mounted over the cameras of a device. The fisheye devices may be used with a single mobile phone or with additional mobile phones. FIG. 22A depicts a single phone 2200a that has a front facing camera and a rear facing camera. A panoramic video capture device 2202a fits over the phone 2200a and places a first fisheye lens 2204a over the front facing camera and a second fisheye lens 2206a over the back facing camera. FIG. 22B depicts a similar panoramic video capture device 2202b; however, rather than placing fisheye lenses over the front and back cameras of one device, the device 2202b is designed to hold two mobile devices 2200b-1, 2200b-2 back-to-back and places the fisheye lenses 2204b, 2206b over the front facing cameras of the mobile devices. FIG. 22C depicts a further device 2202c that is designed to hold three mobile devices 2200c-1, 2200c-2, 2200c-3. The device 2202c holds the three mobile devices and arranges fisheye lenses 2204c, 2206c, 2208c over the front facing cameras of the devices. The devices 2202a, 2202b, 2202c allow panoramic video to be captured using common mobile devices. Each of the fisheye lenses may provide a 180° field of view. [104] With each of the devices described above, two or more fisheye video streams are captured simultaneously. When the video streams are captured from separate devices, one of the capture devices acts as the master capture device and may make connections with the other devices to receive their video streams, stitch the videos together and output the panoramic video. When capturing video from a single mobile device, the two video streams captured by the front and back cameras of the device can be stitched together on the mobile device, which then streams out the resulting video.
Alternatively, stitching can be done in a player, which is suitable for low-power capture devices. In that case, all capture devices stream video directly to the player.
[105] The devices depicted in FIGs. 22A, 22B and 22C may be used to stream panoramic video in a video chat system. The video streaming process, both between capture devices and from capture device to player, may use the Real-time Transport Protocol (RTP) to transfer real-time video and the Session Description Protocol (SDP) to negotiate the parameters. Additionally, a timestamp for each frame may be added to the stream for synchronization.
[106] The stitching process may be performed as follows:
1. Generate a stitching template. This is necessary every time a fisheye lens is re-mounted. Static photos are captured for every camera and key points are extracted using algorithms such as SIFT. After matching the key points from the different cameras, each camera's parameters and rotation can be generated. More details about stitching are described below.
2. Synchronize frames. Using the timestamp of each video frame, every video frame is synchronized with the frames captured by the other devices.
3. Remap. Using the generated template (each camera's parameters and rotation), each frame from the different cameras can be remapped onto a sphere.
4. Blend. A linear or multi-band blending algorithm is used to blend the remapped frames from the different cameras to produce a 360 degree panorama frame, which is usually projected into a rectangular image as described above.
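The remap step can be sketched for an ideal equidistant fisheye model; the model and its parameters are assumptions for illustration, not the calibrated camera parameters the stitching template would actually hold:

```python
import math

def fisheye_to_sphere(u, v, cx, cy, f):
    """Map a pixel (u, v) of an equidistant fisheye image with optical
    center (cx, cy) and focal length f to a unit-sphere direction."""
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0, 1.0)       # pixel on the optical axis
    theta = r / f                    # equidistant model: r = f * theta
    s = math.sin(theta)
    return (s * dx / r, s * dy / r, math.cos(theta))
```

A pixel a quarter-turn from the axis (r = f·π/2) lands on the sphere's equator; in the real pipeline the per-camera rotation from the template would then rotate these directions into a common sphere before blending.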
[107] Generating the stitching template is illustrated in FIGs. 23A, 23B and 23C. Using key points extracted directly from fisheye images to perform matching may produce bad results, as depicted in FIG. 23A. The mismatch between the fisheye videos 2302a, 2302b may result from the distortion of the fisheye lens, which makes objects far from the image center hard to recognize by algorithms such as SIFT, and from the fact that most parts of the images do not overlap. Generating the stitching template instead uses predefined approximate camera parameters to remap the fisheye images to flattened images 2304a, 2304b before extracting key points, as depicted in FIG. 23B. Based on the approximate positions of the two or three cameras, key points in certain areas 2306a, 2306b can be safely ignored. After a correct match is found in the remapped images 2304a, 2304b, the matched key points are un-mapped into the original fisheye images 2308a, 2308b to compute the final camera parameters and rotation, providing a proper matching between the captured videos.
[108] Another important step in generating the template is calculating a brightness map, as depicted in FIGs. 24A and 24B. Because a fisheye lens is used, the brightness of each pixel varies greatly near the border, as depicted in FIG. 24A. Using multiple overlapped images captured while rotating the device, together with position data from sensors such as gyroscopes to detect the rotation of the device, the brightness map, which provides a brightness value for each pixel in an image, can be calculated as depicted in FIG. 24B and used later to correct image brightness before blending.
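Applying the brightness map before blending can be sketched as a per-pixel gain correction; pure-Python nested lists stand in for image buffers here, and a real implementation would operate on pixel arrays:

```python
def apply_brightness_map(image, gain_map):
    """Divide each pixel by its relative brightness from the calibrated
    map, flattening fisheye vignetting before the blend step.

    image, gain_map: row-major 2D lists of the same shape; a gain of
    1.0 means the pixel needs no correction, gains below 1.0 brighten
    darkened border pixels.
    """
    return [
        [px / g if g > 0 else px for px, g in zip(row, gains)]
        for row, gains in zip(image, gain_map)
    ]
```

With matched brightness across the overlap regions, the subsequent linear or multi-band blend produces seams that are far less visible.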
[109] The stitching process may also involve audio reconstruction. Audio streams captured by several devices at different positions can be
reconstructed to provide stereo audio.
[110] The player (usually a smart device or headset) receives the stitched panorama video (or multiple original video streams, which it then stitches) and displays it. A user can look at different angles using rotation sensors in the player, such as gyroscopes.
Claims
1. A method of encoding omnidirectional video comprising: receiving spherical omnidirectional video data;
segmenting a frame of the spherical omnidirectional video data into a north pole segment formed by mapping a top spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, a south pole segment formed by mapping a bottom spherical dome portion of the frame of the spherical omnidirectional video data onto a circle and at least one joining segment formed by mapping a spherical joining portion of the frame of the spherical omnidirectional video data joining the top spherical dome portion and the bottom spherical dome portion onto at least one rectangle; stacking the north pole segment, the south pole segment and the at least one joining segment together to form a 2-dimensional (2D) frame corresponding to the frame of the spherical omnidirectional video data; and
encoding the 2D frame.
2. The method of claim 1, further comprising: segmenting a plurality of frames of the spherical omnidirectional video data into a plurality of north pole segments, south pole segments and joining segments;
stacking the plurality of north pole segments, south pole segments and joining segments into a plurality of 2D frames; and
encoding the plurality of 2D frames.
3. The method of claim 1, wherein the at least one joining segment comprises a plurality of joining segments, each mapping a portion of the spherical joining portion onto a respective rectangle.
4. The method of claim 1, wherein the circular poles are placed within squares.
5. The method of claim 1, wherein each of the north pole segment, the south pole segment and the at least one joining segment comprises overlapping pixel data.
6. The method of claim 5, wherein there is up to 5% overlap between the segments.
7. The method of claim 6, wherein the at least one joining segment comprises between 2 and 4 segments.
8. The method of claim 2, further comprising tracking one or more regions of interest (ROI) before encoding the 2D frames.
9. The method of claim 1, wherein encoding the 2D frame comprises: encoding one or more view points into a first stream; and
for each view point, encoding additional streams comprising an intracoded frame stream, a predictive frame stream, and a bi-predictive frame stream.
10. The method of claim 9, further comprising streaming at least one of the encoded view points.
11. A system for encoding omnidirectional video comprising: a processor for executing instructions; and
a memory for storing instructions, which when executed by the processor, configure the system to provide a method comprising: receiving spherical omnidirectional video data;
segmenting a frame of the spherical omnidirectional video data into a north pole segment formed by mapping a top spherical dome portion of the frame of the spherical omnidirectional video data onto a circle, a south pole segment formed by mapping a bottom spherical dome portion of the frame of the spherical omnidirectional video data onto a circle and at least one joining segment formed by mapping a spherical joining portion of the frame of the spherical omnidirectional video data joining the top spherical dome portion and the bottom spherical dome portion onto at least one rectangle;
stacking the north pole segment, the south pole segment and the at least one joining segment together to form a 2-dimensional (2D) frame corresponding to the frame of the spherical omnidirectional video data; and
encoding the 2D frame.
12. The system of claim 11, wherein the instructions further configure the system to: segment a plurality of frames of the spherical omnidirectional video data into a plurality of north pole segments, south pole segments and joining segments;
stack the plurality of north pole segments, south pole segments and joining segments into a plurality of 2D frames; and
encode the plurality of 2D frames.
13. The system of claim 11, wherein the at least one joining segment comprises a plurality of joining segments, each mapping a portion of the spherical joining portion onto a respective rectangle.
14. The system of claim 11, wherein the circular poles are placed within squares.
15. The system of claim 11, wherein each of the north pole segment, the south pole segment and the at least one joining segment comprises overlapping pixel data.
16. The system of claim 15, wherein there is up to 5% overlap between the segments.
17. The system of claim 16, wherein the at least one joining segment comprises between 2 and 4 segments.
18. The system of claim 12, wherein the instructions further configure the system to track one or more regions of interest (ROI) before encoding the 2D frames.
19. The system of claim 11, wherein encoding the 2D frame comprises: encoding one or more view points into a first stream; and
for each view point, encoding additional streams comprising an intracoded frame stream, a predictive frame stream, and a bi-predictive frame stream.
20. The system of claim 19, wherein the instructions further configure the system to stream at least one of the encoded view points.
21. A device for use in capturing panoramic video comprising: a frame for holding a mobile device;
a first fisheye lens mounted on the frame and arranged to be located over a front facing camera of the mobile device when the mobile device is held by the frame; and
a second fisheye lens mounted on the frame and arranged to be
located over a rear facing camera of the mobile device when the mobile device is held by the frame.
22. A method of stitching multiple videos captured from one or more mobile devices comprising: generating a stitching template for each camera capturing the videos; synchronizing frames of the captured video using timestamps of the frames;
remapping the multiple videos onto a sphere using the stitching
template; and
blending the remapped images to provide a panoramic video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201780007828.1A CN109121466B (en) | 2016-01-22 | 2017-01-23 | Omnidirectional video coding and streaming |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662286252P | 2016-01-22 | 2016-01-22 | |
US62/286,252 | 2016-01-22 | ||
US201662286516P | 2016-01-25 | 2016-01-25 | |
US62/286,516 | 2016-01-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017127816A1 true WO2017127816A1 (en) | 2017-07-27 |
Family
ID=59362365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2017/014588 WO2017127816A1 (en) | 2016-01-22 | 2017-01-23 | Omnidirectional video encoding and streaming |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109121466B (en) |
WO (1) | WO2017127816A1 (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018002425A3 (en) * | 2016-06-30 | 2018-02-08 | Nokia Technologies Oy | An apparatus, a method and a computer program for video coding and decoding |
US20180349705A1 (en) * | 2017-06-02 | 2018-12-06 | Apple Inc. | Object Tracking in Multi-View Video |
WO2019038433A1 (en) * | 2017-08-24 | 2019-02-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Characteristics signaling for omnidirectional content |
WO2019120638A1 (en) * | 2017-12-22 | 2019-06-27 | Huawei Technologies Co., Ltd. | Scalable fov+ for vr 360 video delivery to remote end users |
EP3518087A1 (en) * | 2018-01-29 | 2019-07-31 | Thomson Licensing | Method and network equipment for tiling a sphere representing a spherical multimedia content |
EP3531703A1 (en) * | 2018-02-26 | 2019-08-28 | Thomson Licensing | Method and network equipment for encoding an immersive video spatially tiled with a set of tiles |
WO2019190197A1 (en) * | 2018-03-27 | 2019-10-03 | 주식회사 케이티 | Method and apparatus for video signal processing |
US10484621B2 (en) * | 2016-02-29 | 2019-11-19 | Gopro, Inc. | Systems and methods for compressing video content |
EP3618442A1 (en) * | 2018-08-27 | 2020-03-04 | Axis AB | An image capturing device, a method and computer program product for forming an encoded image |
US10666863B2 (en) | 2018-05-25 | 2020-05-26 | Microsoft Technology Licensing, Llc | Adaptive panoramic video streaming using overlapping partitioned sections |
US10754242B2 (en) | 2017-06-30 | 2020-08-25 | Apple Inc. | Adaptive resolution and projection format in multi-direction video |
US10764494B2 (en) | 2018-05-25 | 2020-09-01 | Microsoft Technology Licensing, Llc | Adaptive panoramic video streaming using composite pictures |
JPWO2020235034A1 (en) * | 2019-05-22 | 2020-11-26 | ||
US10924747B2 (en) | 2017-02-27 | 2021-02-16 | Apple Inc. | Video coding techniques for multi-view video |
US10999602B2 (en) | 2016-12-23 | 2021-05-04 | Apple Inc. | Sphere projected motion estimation/compensation and mode decision |
US11259046B2 (en) | 2017-02-15 | 2022-02-22 | Apple Inc. | Processing of equirectangular object data to compensate for distortion by spherical projections |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060034067A1 (en) * | 2004-08-11 | 2006-02-16 | Weiliang Kui | Illumination pen |
US20140003523A1 (en) * | 2012-06-30 | 2014-01-02 | Divx, Llc | Systems and methods for encoding video using higher rate video sequences |
US20140132598A1 (en) * | 2007-01-04 | 2014-05-15 | Hajime Narukawa | Method of mapping image information from one face onto another continous face of different geometry |
WO2014162324A1 (en) * | 2013-04-04 | 2014-10-09 | Virtualmind Di Davide Angelelli | Spherical omnidirectional video-shooting system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003141562A (en) * | 2001-10-29 | 2003-05-16 | Sony Corp | Image processing apparatus and method for nonplanar image, storage medium, and computer program |
US7011625B1 (en) * | 2003-06-13 | 2006-03-14 | Albert Shar | Method and system for accurate visualization and measurement of endoscopic images |
CN103247020A (en) * | 2012-02-03 | 2013-08-14 | 苏州科泽数字技术有限公司 | Fisheye image spread method based on radial characteristics |
-
2017
- 2017-01-23 WO PCT/US2017/014588 patent/WO2017127816A1/en active Application Filing
- 2017-01-23 CN CN201780007828.1A patent/CN109121466B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060034067A1 (en) * | 2004-08-11 | 2006-02-16 | Weiliang Kui | Illumination pen |
US20140132598A1 (en) * | 2007-01-04 | 2014-05-15 | Hajime Narukawa | Method of mapping image information from one face onto another continous face of different geometry |
US20140003523A1 (en) * | 2012-06-30 | 2014-01-02 | Divx, Llc | Systems and methods for encoding video using higher rate video sequences |
WO2014162324A1 (en) * | 2013-04-04 | 2014-10-09 | Virtualmind Di Davide Angelelli | Spherical omnidirectional video-shooting system |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10484621B2 (en) * | 2016-02-29 | 2019-11-19 | Gopro, Inc. | Systems and methods for compressing video content |
US10979727B2 (en) | 2016-06-30 | 2021-04-13 | Nokia Technologies Oy | Apparatus, a method and a computer program for video coding and decoding |
US20190297339A1 (en) * | 2016-06-30 | 2019-09-26 | Nokia Technologies Oy | An Apparatus, A Method and A Computer Program for Video Coding and Decoding |
WO2018002425A3 (en) * | 2016-06-30 | 2018-02-08 | Nokia Technologies Oy | An apparatus, a method and a computer program for video coding and decoding |
US11818394B2 (en) | 2016-12-23 | 2023-11-14 | Apple Inc. | Sphere projected motion estimation/compensation and mode decision |
US10999602B2 (en) | 2016-12-23 | 2021-05-04 | Apple Inc. | Sphere projected motion estimation/compensation and mode decision |
US11259046B2 (en) | 2017-02-15 | 2022-02-22 | Apple Inc. | Processing of equirectangular object data to compensate for distortion by spherical projections |
US10924747B2 (en) | 2017-02-27 | 2021-02-16 | Apple Inc. | Video coding techniques for multi-view video |
US20180349705A1 (en) * | 2017-06-02 | 2018-12-06 | Apple Inc. | Object Tracking in Multi-View Video |
US11093752B2 (en) * | 2017-06-02 | 2021-08-17 | Apple Inc. | Object tracking in multi-view video |
US10754242B2 (en) | 2017-06-30 | 2020-08-25 | Apple Inc. | Adaptive resolution and projection format in multi-direction video |
WO2019038433A1 (en) * | 2017-08-24 | 2019-02-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Characteristics signaling for omnidirectional content |
WO2019120638A1 (en) * | 2017-12-22 | 2019-06-27 | Huawei Technologies Co., Ltd. | Scalable fov+ for vr 360 video delivery to remote end users |
US11706274B2 (en) | 2017-12-22 | 2023-07-18 | Huawei Technologies Co., Ltd. | Scalable FOV+ for VR 360 video delivery to remote end users |
US11546397B2 (en) | 2017-12-22 | 2023-01-03 | Huawei Technologies Co., Ltd. | VR 360 video for remote end users |
CN111567052B (en) * | 2017-12-22 | 2022-01-14 | 华为技术有限公司 | Scalable FOV + for issuing VR 360 video to remote end user |
CN111567052A (en) * | 2017-12-22 | 2020-08-21 | 华为技术有限公司 | Scalable FOV + for issuing VR360 video to remote end user |
EP3518087A1 (en) * | 2018-01-29 | 2019-07-31 | Thomson Licensing | Method and network equipment for tiling a sphere representing a spherical multimedia content |
CN112088352A (en) * | 2018-01-29 | 2020-12-15 | 交互数字Ce专利控股公司 | Method and network device for chunking spheres representing spherical multimedia content |
WO2019145296A1 (en) * | 2018-01-29 | 2019-08-01 | Interdigital Ce Patent Holdings | Method and network equipment for tiling a sphere representing a spherical multimedia content |
EP3531704A1 (en) * | 2018-02-26 | 2019-08-28 | InterDigital CE Patent Holdings | Method and network equipment for encoding an immersive video spatially tiled with a set of tiles |
US11076162B2 (en) | 2018-02-26 | 2021-07-27 | Interdigital Ce Patent Holdings | Method and network equipment for encoding an immersive video spatially tiled with a set of tiles |
EP3531703A1 (en) * | 2018-02-26 | 2019-08-28 | Thomson Licensing | Method and network equipment for encoding an immersive video spatially tiled with a set of tiles |
WO2019190197A1 (en) * | 2018-03-27 | 2019-10-03 | 주식회사 케이티 | Method and apparatus for video signal processing |
US10764494B2 (en) | 2018-05-25 | 2020-09-01 | Microsoft Technology Licensing, Llc | Adaptive panoramic video streaming using composite pictures |
US10666863B2 (en) | 2018-05-25 | 2020-05-26 | Microsoft Technology Licensing, Llc | Adaptive panoramic video streaming using overlapping partitioned sections |
US10972659B2 (en) | 2018-08-27 | 2021-04-06 | Axis Ab | Image capturing device, a method and a computer program product for forming an encoded image |
EP3618442A1 (en) * | 2018-08-27 | 2020-03-04 | Axis AB | An image capturing device, a method and computer program product for forming an encoded image |
KR102172276B1 (en) | 2018-08-27 | 2020-10-30 | 엑시스 에이비 | An image capturing device, a method and a computetr program product for forming an encoded image |
TWI716960B (en) * | 2018-08-27 | 2021-01-21 | 瑞典商安訊士有限公司 | An image capturing device, a method and a computer program product for forming an encoded image |
KR20200024095A (en) * | 2018-08-27 | 2020-03-06 | 엑시스 에이비 | An image capturing device, a method and a computetr program product for forming an encoded image |
JP7259947B2 (en) | 2019-05-22 | 2023-04-18 | 日本電信電話株式会社 | Video distribution device, video distribution method and program |
JPWO2020235034A1 (en) * | 2019-05-22 | 2020-11-26 | ||
WO2020235034A1 (en) * | 2019-05-22 | 2020-11-26 | 日本電信電話株式会社 | Video distribution device, video distribution method, and program |
Also Published As
Publication number | Publication date |
---|---|
CN109121466A (en) | 2019-01-01 |
CN109121466B (en) | 2022-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109121466B (en) | Omnidirectional video coding and streaming | |
US11228749B2 (en) | Systems, methods and apparatus for compressing video content | |
US10652558B2 (en) | Apparatus and methods for video compression using multi-resolution scalable coding | |
US11166047B2 (en) | Apparatus and methods for video compression | |
KR102191875B1 (en) | Method for transmitting 360 video, method for receiving 360 video, 360 video transmitting device, and 360 video receiving device | |
US20190373245A1 (en) | 360 video transmission method, 360 video reception method, 360 video transmission device, and 360 video reception device | |
US20180310010A1 (en) | Method and apparatus for delivery of streamed panoramic images | |
WO2018175493A1 (en) | Adaptive perturbed cube map projection | |
CN115150617A (en) | Encoding method and decoding method | |
WO2018132317A1 (en) | Adjusting field of view of truncated square pyramid projection for 360-degree video | |
KR20190095430A (en) | 360 video processing method and apparatus therefor | |
EP3434021B1 (en) | Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices | |
US20210312588A1 (en) | Immersive video bitstream processing | |
Hu et al. | Mobile edge assisted live streaming system for omnidirectional video | |
US12003692B2 (en) | Systems, methods and apparatus for compressing video content | |
WO2019034803A1 (en) | Method and apparatus for processing video information | |
WO2017220851A1 (en) | Image compression method and technical equipment for the same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17742114 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.12.2018) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17742114 Country of ref document: EP Kind code of ref document: A1 |