WO2018120474A1 - Information processing method and apparatus - Google Patents

Information processing method and apparatus

Info

Publication number
WO2018120474A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
spatial
target
space
spatial information
Prior art date
Application number
PCT/CN2017/078585
Other languages
English (en)
French (fr)
Inventor
邸佩云
谢清鹏
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/CN2017/070863 (WO2018120294A1)
Application filed by Huawei Technologies Co., Ltd.
Priority to MX2019007926A
Priority to EP17888252.8A (EP3557534A4)
Priority to EP23172704.1A (EP4287637A1)
Priority to CN201780081717.5A (CN110121734B)
Priority to JP2019535932A (JP7058273B2)
Priority to BR112019013609A (BR112019013609A8)
Priority to KR1020197022156A (KR102261559B1)
Priority to CN201710214606.9A (CN108271044B)
Publication of WO2018120474A1
Priority to US16/458,734 (US20190325652A1)


Classifications

    • H04N 21/26258 Content or additional data distribution scheduling, e.g. generating a list of items to be played back in a given order (playlist)
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • H04N 21/23439 Processing of video elementary streams involving reformatting operations of video signals for generating different versions
    • H04N 21/41407 Specialised client platforms embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • H04N 21/4728 End-user interface for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H04N 21/816 Monomedia components involving special video data, e.g. 3D video
    • H04N 21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments

Definitions

  • the present invention relates to the field of streaming media processing, and in particular, to a method and an apparatus for processing information.
  • the DASH technical specification is mainly composed of two major parts: the media presentation description (English: Media Presentation Description, MPD) and the media file format (English: file format).
  • the media file format belongs to a file format.
  • the server prepares multiple versions of the code stream for the same video content.
  • Each version of the code stream is called a representation in the DASH standard (English: representation).
  • Representation is a collection and encapsulation of one or more codestreams in a transport format, one representation containing one or more segments.
  • Different versions of the code stream may have different coding parameters such as bit rate and resolution, and each code stream is divided into multiple small files, each of which is called a segment (English: segment). The client may switch between different media representations while requesting media segment data.
  • the segment may be packaged in accordance with the standard ISO/IEC 14496-12 (ISO BMFF (Base Media File Format)) or may be encapsulated in accordance with ISO/IEC 13818-1 (MPEG-2 TS).
  • the media presentation description is called MPD
  • the MPD can be an xml file.
  • the information in the file is described in a hierarchical manner; as shown in FIG. 1, the information of an upper level is completely inherited by the next level. The file describes some media metadata, which allows the client to understand the media content information on the server and to use this information to construct the HTTP URL for requesting a segment.
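  • As an illustration of the segment-request step above, the following Python sketch expands a SegmentTemplate-style URL pattern into a request URL. The base URL, template pattern, and identifiers are hypothetical values chosen for the example, not values defined by this patent or by the DASH standard text:

      # Minimal sketch of constructing a segment HTTP URL from MPD-style
      # metadata. All concrete values below are illustrative.
      def build_segment_url(base_url: str, template: str, rep_id: str, number: int) -> str:
          """Expand a SegmentTemplate-like pattern into a request URL."""
          path = template.replace("$RepresentationID$", rep_id).replace("$Number$", str(number))
          return base_url.rstrip("/") + "/" + path

      # Example: request segment 42 of a hypothetical 1080p representation.
      url = build_segment_url(
          base_url="http://example.com/vod/movie",
          template="$RepresentationID$/segment-$Number$.m4s",
          rep_id="video-1080p",
          number=42,
      )
      print(url)  # http://example.com/vod/movie/video-1080p/segment-42.m4s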
  • media presentation is a collection of structured data for presenting media content
  • media presentation description (English: media presentation description): a standardized description of a media presentation file, used to provide a streaming media service
  • period (English: period)
  • representation (English: representation): a collection and encapsulation of one or more code streams in the transmission format; one representation contains one or more segments
  • adaptation set (English: AdaptationSet)
  • subset (English: subset)
  • segment information: the media unit referenced by an HTTP uniform resource locator in the media presentation description; the segment information describes a segment of the media data, and the segments of the media data may be stored in one file or may be stored separately
  • for the technical concepts related to the MPEG-DASH technology of the present invention, refer to the relevant provisions in ISO/IEC 23009-1, Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats, or to the relevant provisions in historical versions of that standard, such as ISO/IEC 23009-1:2013 or ISO/IEC 23009-1:2012.
  • Virtual reality (VR) technology is a computer simulation technology that can create virtual worlds and let users experience them. It uses a computer to generate a simulation environment: an interactive, three-dimensional, dynamic scene that fuses multi-source information and simulates entity behavior, in which the user is immersed.
  • VR mainly includes simulation environment, perception, natural skills and sensing equipment.
  • the simulation environment consists of computer-generated, real-time, dynamic, three-dimensional realistic images. Perception means that an ideal VR system should have all the kinds of perception a human has: in addition to the visual perception generated by computer graphics technology, there are also hearing, touch, force, and motion, and even smell and taste, which is also known as multi-perception.
  • Natural skills refer to the rotation of the person's head, eyes, gestures, or other human behaviors.
  • a sensing device is a three-dimensional interactive device.
  • when a VR video (or 360-degree video, or omnidirectional video) is presented, only the video image and the associated audio corresponding to the orientation portion of the user's head are presented.
  • the difference between a VR video and an ordinary video is that the entire content of an ordinary video is presented to the user, whereas for a VR video only a subset of the entire video is presented to the user (English: in VR typically only a subset of the entire video region represented by the video pictures).
  • "a Spatial Object is defined as a spatial part of a content component (e.g. a region of interest, or a tile) and represented by either an Adaptation Set or a Sub-Representation."
  • Spatial information is the spatial relationship between spatial objects (Spatial Objects).
  • a spatial object is defined as a part of a content component, such as an existing region of interest (ROI) and tiles; spatial relationships can be described in Adaptation Set and Sub-Representation.
  • the spatial information of the spatial object can be described in the MPD.
  • in the file format, a file is composed of a number of boxes (Box) and full boxes (FullBox).
  • Each Box consists of a header (Header) and data (Data).
  • Data is the actual data of the Box, which can be pure data or more sub-Boxes.
  • the reference type (reference_type) used for the association between a track of media content and a track of metadata is 'cdsc'. For example, when an associated track is parsed from the track of a video and the reference type is 'cdsc', the associated track is a metadata track used to describe the video track.
  • when processing a file, the client needs to parse all the tracks included in the file, and then determine, according to the reference type between the track of the media content and the track of the metadata, which metadata track describes which media track.
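  • To make the box and track-reference structure above concrete, the following Python sketch walks ISOBMFF-style boxes (each box: a 4-byte big-endian size, a 4-byte type, then data that may itself contain sub-boxes) and collects the track IDs referenced with type 'cdsc' from a 'tref' payload. It is a simplified illustration, not a complete ISO/IEC 14496-12 parser (64-bit box sizes and full-box version/flags fields are omitted):

      import struct

      # Each box begins with a 4-byte size and a 4-byte type, followed by its
      # data (which may itself be more boxes). Simplified for illustration.
      def iter_boxes(buf, offset=0, end=None):
          end = len(buf) if end is None else end
          while offset + 8 <= end:
              size, box_type = struct.unpack_from(">I4s", buf, offset)
              if size < 8:
                  break  # malformed or extended-size box; not handled here
              yield box_type.decode("ascii", "replace"), buf[offset + 8:offset + size]
              offset += size

      def find_cdsc_references(tref_payload):
          """Return track IDs referenced with type 'cdsc' inside a 'tref' box,
          i.e. the tracks that this metadata track describes."""
          ids = []
          for ref_type, body in iter_boxes(tref_payload):
              if ref_type == "cdsc":
                  ids.extend(struct.unpack(">%dI" % (len(body) // 4), body))
          return ids

      # A hand-built 'tref' payload containing one 'cdsc' reference to track 1.
      cdsc = struct.pack(">I4sI", 12, b"cdsc", 1)
      print(find_cdsc_references(cdsc))  # [1]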
  • VR video viewing applications, such as applications with a 360-degree field of view, are increasingly presented to users.
  • while watching, the user may change the field of view (English: field of view, FOV) at any time. Each field of view corresponds to the video data of a spatial object (which can be understood as a region in the VR video), and when the field of view is switched, the VR video image presented within the field of view should be switched as well.
  • when the VR video is presented, the video data of the spatial objects that cover the perspective of the human eye is presented.
  • the spatial object viewed by the user may be the region of interest that most users choose to view, or may be a region specified by the video creator,
  • and the region will change constantly over time.
  • to identify the position of a spatial object in the VR video, spatial information describing the location of the spatial object needs to be encapsulated in a corresponding file. Because the video data corresponds to a large number of images, the large amount of spatial information for those images leads to an excessive data volume.
  • the embodiments of the invention provide a method and an apparatus for processing streaming media information, which can reduce the data volume of the spatial information.
  • a first aspect provides a method for processing streaming media information, the method comprising:
  • acquiring target space information of a target space object, where the target space object is one of two spatial objects, the two spatial objects are associated with data of two images included in target video data, the target space information includes same-attribute space information, the same-attribute space information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the spatial object other than the target space object also includes the same-attribute space information;
  • determining, according to the target space information, the video data to be played.
  • the target video data may be a code stream, that is, a target video code stream, or may be original data that has not been encoded, or reconstructed data obtained through reconstruction, that is, a target video sequence.
  • when the target video data is the target video code stream, the data of the two images may be the encoded data of the two images.
  • when the target video data is the target video sequence, the data of the two images is the original data or the reconstructed data of the two images.
  • the two images and the two spatial objects may have a one-to-one correspondence.
  • the same-attribute space information is one group of information that is shared by the respective spatial information of the two spatial objects.
  • the acquiring target space information of the target space object may include receiving target space information of the target space object from the server.
  • the two images may be two frames in the video sequence, that is, the two images correspond to different moments; the two images may also be sub-images of the same frame in the video sequence, that is, the two images correspond to the same moment; or the two images may be sub-images of different frames in the video sequence.
  • in this way, the redundancy between the pieces of spatial information is reduced, thereby reducing the data volume of the spatial information.
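  • To make the saving concrete, the following Python sketch factors the spatial information of a sequence of spatial objects into one shared ("same-attribute") part, stored once, and a small per-object ("different-attribute") part. The field names are illustrative and correspond to the case where the width and height stay constant while the position changes:

      from dataclasses import dataclass

      @dataclass
      class SpatialInfo:
          x: int       # abscissa of the upper-left point
          y: int       # ordinate of the upper-left point
          width: int
          height: int

      def factor_spatial_info(infos):
          # Shared part: stored once for the whole sequence.
          shared = {"width": infos[0].width, "height": infos[0].height}
          assert all(i.width == shared["width"] and i.height == shared["height"]
                     for i in infos), "this sketch assumes a constant size"
          # Per-object part: only what actually differs between objects.
          per_object = [{"x": i.x, "y": i.y} for i in infos]
          return shared, per_object

      # Instead of 4 values per image we store 2 shared values once plus
      # 2 values per image, roughly halving the spatial-information volume.
      shared, per_object = factor_spatial_info(
          [SpatialInfo(0, 0, 640, 480), SpatialInfo(100, 40, 640, 480)]
      )
      print(shared, per_object)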
  • the determining, according to the target space information, of the video data to be played may include: determining, according to the target space information, whether the target space object includes all or part of the spatial object corresponding to the picture to be played; when the target space object includes all or part of the spatial object corresponding to the picture to be played, the target video data is the video data to be played.
  • the determining, according to the target space information, of the video data to be played may further include: determining, according to the target space information and the spatial relationship of the spatial object corresponding to the picture to be played relative to the target space object (or the track of the view switching), the spatial information of the spatial object corresponding to the picture to be played (or the spatial object after the view switch), and thereby further determining the video data to be played.
  • the video data to be played may be a video code stream to be played. After the target space information of the target space object is acquired, the relative position of the target space object in the panoramic space (or panoramic space object) can be determined; then, during video playback, the position of the spatial object after the view switch is determined according to the target space information of the target space object and the track of the view switching, and the video code stream to be played, corresponding to the spatial object of the picture to be played, is requested from the server.
  • the request for acquiring the video code stream to be played may be sent to the server according to the URL, described in the MPD, of the code stream of each spatial object, so as to obtain the video code stream to be played, and to decode and play it.
  • the target space information further includes different-attribute space information of the target space object,
  • and the spatial information of the other spatial object also includes its own different-attribute space information; the different-attribute space information of the target space object is different from the different-attribute space information of the other spatial object.
  • the difference between the different-attribute space information of the target space object and that of the other spatial object may mean that the two pieces of different-attribute space information have different values.
  • the target space information includes location information of the center point of the target space object or location information of the upper left point of the target space object, and the target space information further includes the width of the target space object and the height of the target space object.
  • the target space information may also include location information of another location point (a lower left point, an upper right point, a lower right point, or a preset point) in the target space object, instead of the location information of the center point of the target space object or the location information of the upper left point of the target space object.
  • the upper left point is a point at which the value of the abscissa and the value of the ordinate are the smallest in the target space object.
  • when the coordinate system corresponding to the target space information is an angular coordinate system, the position information of the center point or the position information of the upper left point may be a pitch angle θ (pitch) and a yaw angle ψ (yaw), or may be a pitch angle θ (pitch), a yaw angle ψ (yaw), and a roll angle Φ (roll).
  • when the coordinate system corresponding to the target space information is a pixel coordinate system, the position information of the center point or the position information of the upper left point may be an abscissa in units of pixels and an ordinate in units of pixels.
  • the target space information includes location information of the upper left point of the target space object and location information of the lower right point of the target space object.
  • the target space information may also include location information of an upper right point of the target spatial object, and location information of a lower left point of the target spatial object.
  • the target spatial information includes spatial rotation information of the target spatial object.
  • the spatial rotation information of the target space object may be used to indicate a degree of rotation of the target space object relative to a horizontal coordinate axis or a longitudinal coordinate axis of the panoramic space object, where the target space object is located in the panoramic space object.
  • when the coordinate system corresponding to the target space information is an angular coordinate system, the spatial rotation information may be a roll angle Φ (roll); when the coordinate system corresponding to the target space information is a pixel coordinate system, the spatial rotation information may be represented by a motion vector, in units of pixels, of a position point in the target space object, the motion vector being obtained by converting the roll angle.
  • the target space information is encapsulated in spatial information data or in a spatial information track, where the spatial information data is a code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track is a track independent of the target video data.
  • the file independent of the target video data may be a spatial information file for describing spatial information.
  • the track independent of the target video data may be a spatial information track used to describe spatial information.
  • when the target space information is encapsulated in the code stream of the target video data, the target space information may be encapsulated in a supplemental enhancement information unit or a parameter set unit in the code stream of the target video data; the target space information may also be encapsulated in a segment of the representation in which the target video data is located, specifically in a specified box in the segment (for example, a trun box or a tfhd box).
  • the same-attribute space information and the different-attribute space information of the target space object may be encapsulated in the same box or in different boxes; specifically, the same-attribute space information may be encapsulated in a 3dsc box, and the different-attribute space information of the target space object may be encapsulated in an mdat box.
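  • As an illustration of this encapsulation split, the Python sketch below writes the same-attribute space information once into a '3dsc' box and appends the per-sample different-attribute information to an 'mdat' payload. The byte layout is invented for the example; this passage of the patent does not fix a normative box syntax:

      import struct

      def make_box(box_type, payload):
          # ISOBMFF-style framing: 4-byte size, 4-byte type, then payload.
          return struct.pack(">I4s", 8 + len(payload), box_type) + payload

      def encode_same_attribute(width, height):
          # Shared (same-attribute) information: stored once.
          return make_box(b"3dsc", struct.pack(">II", width, height))

      def encode_different_attribute(positions):
          # Per-sample (different-attribute) information: one (x, y) per sample.
          payload = b"".join(struct.pack(">II", x, y) for x, y in positions)
          return make_box(b"mdat", payload)

      stream = encode_same_attribute(640, 480) + encode_different_attribute(
          [(0, 0), (100, 40), (200, 80)]
      )
      print(len(stream), "bytes")  # 16 + 32 = 48 bytes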
  • the spatial information data or the spatial information track further includes a spatial information type identifier used to indicate the type of the same-attribute space information; the spatial information type identifier indicates the information, in the target space information, that belongs to the same-attribute space information.
  • the spatial information type identifier can also be used to indicate the information, in the target space information, that belongs to the different-attribute space information of the target space object.
  • the spatial information type identifier may also be used to indicate the spatial information type of the same-attribute space information or of the different-attribute space information of the target space object.
  • the optional spatial information types may include, but are not limited to: spatial information that includes the location information of a spatial object but not the width and height information of the spatial object; spatial information that includes the width and height information of a spatial object but not its location information; and spatial information that includes both the width and height information and the location information of a spatial object.
  • the spatial information type identifier may also be used to indicate the spatial object type of the two spatial objects.
  • the optional spatial object types may include, but are not limited to: spatial objects whose position and whose width and height are all unchanged; spatial objects whose position changes while the width and height are unchanged; spatial objects whose position is unchanged while the width and height change; and spatial objects whose position and width and height all change.
  • when the spatial information type identifier is a first preset value, it indicates that the information, in the target space information, belonging to the same-attribute space information is: the location information of the center point of the target space object or the location information of the upper left point of the target space object, and the width of the target space object and the height of the target space object.
  • when the spatial information type identifier is a second preset value, it indicates that the information, in the target space information, belonging to the same-attribute space information is the width of the target space object and the height of the target space object.
  • when the spatial information type identifier is a third preset value, it indicates that the target space information contains no information belonging to the same-attribute space information.
  • when the spatial information type identifier is the first preset value, the spatial information type identifier further indicates that no different-attribute space information exists.
  • when the spatial information type identifier is the second preset value, the spatial information type identifier further indicates that the different-attribute space information of the target space object is the location information of the center point of the target space object or the location information of the upper left point of the target space object.
  • when the spatial information type identifier is the third preset value, the spatial information type identifier further indicates that the different-attribute space information of the target space object is the location information of the center point of the target space object or the location information of the upper left point of the target space object, together with the width of the target space object and the height of the target space object.
  • when the spatial information type identifier is a fourth preset value, it indicates that the information, in the target space information, belonging to the same-attribute space information is the location information of the upper left point of the target space object and the location information of the lower right point of the target space object.
  • when the spatial information type identifier is a fifth preset value, it indicates that the information, in the target space information, belonging to the same-attribute space information is the location information of the lower right point of the target space object.
  • when the spatial information type identifier is a sixth preset value, it indicates that the target space information contains no information belonging to the same-attribute space information. It should be noted that the location information of the upper left point of the target space object or the location information of the lower right point of the target space object may be replaced by the width of the target space object and the height of the target space object.
  • when the spatial information type identifier is the fourth preset value, the spatial information type identifier further indicates that no different-attribute space information exists.
  • when the spatial information type identifier is the fifth preset value, the spatial information type identifier further indicates that the different-attribute space information of the target space object is the location information of the center point of the target space object or the location information of the upper left point of the target space object.
  • when the spatial information type identifier is the sixth preset value, the spatial information type identifier further indicates that the different-attribute space information of the target space object is the location information of the upper left point of the target space object and the location information of the lower right point of the target space object. It should be noted that the location information of the upper left point of the target space object or the location information of the lower right point of the target space object may be replaced by the width of the target space object and the height of the target space object.
  • when the spatial information type identifier indicates that the target space information contains no information belonging to the same-attribute space information, the same-attribute space information includes a minimum width value of the target space object, a minimum height value of the target space object, a maximum width value of the target space object, and a maximum height value of the target space object.
  • the spatial information type identifier is encapsulated in the same box as the same-attribute space information.
  • the spatial information data or the spatial information track further includes a coordinate system identifier used to indicate the coordinate system corresponding to the target space information, the coordinate system being a pixel coordinate system or an angular coordinate system.
  • when the coordinate system is a pixel coordinate system, the position information is represented by pixel coordinates, and the width and the height are also expressed in units of pixels.
  • when the coordinate system is an angular coordinate system, the position information is represented by angles, for example, a pitch angle θ (pitch) and a yaw angle ψ (yaw), or a pitch angle θ (pitch), a yaw angle ψ (yaw), and a roll angle Φ (roll), and the width and the height are represented by angular ranges.
  • the coordinate system identifier is encapsulated in the same box as the same-attribute space information.
  • with reference to any one of the foregoing implementation manners of the first aspect, in another possible implementation manner, the spatial information data or the spatial information track further includes a spatial rotation information identifier used to indicate whether the target space information includes the spatial rotation information of the target space object.
  • the spatial rotation information identifier may be encapsulated in the same box as the same-attribute space information, or in the same box as the different-attribute space information of the target space object. Specifically, when the spatial rotation information identifier and the different-attribute space information of the target space object are encapsulated in the same box, and the spatial rotation information identifier indicates that the target space information includes the spatial rotation information of the target space object, the different-attribute space information of the target space object includes the spatial rotation information.
  • the spatial rotation information identifier indicates that the target spatial information includes spatial rotation information of the target spatial object, it indicates that the target spatial object has a rotation.
  • the spatial rotation information identifier indicates that the target spatial information does not include the spatial rotation information of the target spatial object, it indicates that the target spatial object has no rotation.
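  • Pulling the identifier semantics above together, the following Python sketch shows how a parser might merge the same-attribute and different-attribute parts for the center/width/height representation. The concrete preset values (0, 1, 2) and tuple layouts are assumptions chosen for the example; the patent defines only their semantics, not an encoding:

      SAME_ALL = 0        # first preset value: position and size both shared
      SAME_SIZE_ONLY = 1  # second preset value: only width/height shared
      SAME_NONE = 2       # third preset value: nothing shared

      def decode_spatial_info(type_id, same_attr, diff_attr):
          """Return (x, y, width, height) for one spatial object."""
          if type_id == SAME_ALL:
              return same_attr              # full (x, y, w, h), stored once
          if type_id == SAME_SIZE_ONLY:
              w, h = same_attr              # shared size
              x, y = diff_attr              # per-object position
              return (x, y, w, h)
          if type_id == SAME_NONE:
              return diff_attr              # full (x, y, w, h) per object
          raise ValueError("unknown spatial information type identifier")

      # Example: the size is shared across objects, the position varies.
      print(decode_spatial_info(SAME_SIZE_ONLY, (640, 480), (100, 40)))
      # -> (100, 40, 640, 480)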
  • a second aspect provides a method for processing streaming media information, where the method includes:
  • acquiring respective spatial information of two spatial objects associated with data of two images in target video data; and determining target space information of a target space object according to the respective spatial information of the two spatial objects, where the target space object is one of the two spatial objects, the target space information includes same-attribute space information,
  • the same-attribute space information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the spatial object other than the target space object also includes the same-attribute space information.
  • the method may further include: sending the target space information to the client.
  • the target space information may further include different-attribute space information of the target space object, and the spatial information of the other spatial object also includes its own different-attribute space information; the different-attribute space information of the target space object is different from the different-attribute space information of the other spatial object.
  • the target space information may include location information of the center point of the target space object or location information of the upper left point of the target space object.
  • the target space information may further include a width of the target space object and a height of the target space object.
  • the spatial information of the two spatial objects may include location information of respective center points of the two spatial objects or position information of respective upper left points of the two spatial objects, and spatial information of each of the two spatial objects is further The respective widths of the two spatial objects and the respective heights of the two spatial objects may be included.
  • the target space information may include location information of the upper left point of the target space object and location information of the lower right point of the target space object.
  • the spatial information of each of the two spatial objects may include location information of respective upper left points of the two spatial objects and position information of respective lower right points of the two spatial objects.
  • the target space information may include spatial rotation information of the target space object.
  • the spatial information of each of the two spatial objects may include spatial rotation information of the two spatial objects.
  • the target space information may be encapsulated in spatial information data or in a spatial information track, where the spatial information data may be a code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track may be a track independent of the target video data.
  • the spatial information data or the spatial information track may further include a spatial information type identifier used to indicate the type of the same-attribute space information; the spatial information type identifier indicates the information, in the target space information, that belongs to the same-attribute space information.
  • when the spatial information type identifier indicates that the target space information contains no information belonging to the same-attribute space information, the same-attribute space information may include a minimum width value of the target space object, a minimum height value of the target space object, a maximum width value of the target space object, and a maximum height value of the target space object.
  • the spatial information type identifier may be encapsulated in the same box as the same-attribute space information.
  • the spatial information data or the spatial information track may further include a coordinate system identifier used to indicate the coordinate system corresponding to the target space information, the coordinate system being a pixel coordinate system or an angular coordinate system.
  • the coordinate system identifier may be encapsulated in the same box as the same-attribute space information.
  • with reference to any one of the foregoing implementation manners of the second aspect, in another possible implementation manner, the spatial information data or the spatial information track may further include a spatial rotation information identifier used to indicate whether the target space information includes the spatial rotation information of the target space object.
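  • On the server side, the same-attribute space information can be derived by comparing the spatial information of the two spatial objects field by field, as in this Python sketch (the field names are the same illustrative ones used earlier):

      def split_same_and_different(info_a, info_b):
          # Fields with identical values in both objects form the
          # same-attribute space information, stored/sent only once.
          same = {k: v for k, v in info_a.items() if info_b.get(k) == v}
          diff_a = {k: v for k, v in info_a.items() if k not in same}
          diff_b = {k: v for k, v in info_b.items() if k not in same}
          return same, diff_a, diff_b

      same, diff_a, diff_b = split_same_and_different(
          {"x": 0, "y": 0, "width": 640, "height": 480},
          {"x": 100, "y": 40, "width": 640, "height": 480},
      )
      print(same)    # {'width': 640, 'height': 480} -> stored once
      print(diff_a)  # {'x': 0, 'y': 0}
      print(diff_b)  # {'x': 100, 'y': 40}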
  • a third aspect provides an apparatus for processing streaming media information, the apparatus comprising:
  • an acquiring module, configured to acquire target space information of a target space object, where the target space object is one of two spatial objects, the two spatial objects are associated with data of two images included in target video data, the target space information includes same-attribute space information, the same-attribute space information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the spatial object other than the target space object also includes the same-attribute space information;
  • a determining module configured to determine video data to be played according to the target space information acquired by the acquiring module.
  • the obtaining module may be configured to receive the target space information from the server.
  • the target space information further includes different-attribute space information of the target space object, and the spatial information of the other spatial object also includes its own different-attribute space information; the different-attribute space information of the target space object is different from the different-attribute space information of the other spatial object.
  • the target space information includes location information of the center point of the target space object or location information of the upper left point of the target space object, and the target space information further includes the width of the target space object and the height of the target space object.
  • the target space information includes location information of the upper left point of the target space object and location information of the lower right point of the target space object.
  • the target space information includes spatial rotation information of the target space object.
  • the target space information is encapsulated in spatial information data or in a spatial information track, where the spatial information data is a code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track is a track independent of the target video data.
  • the spatial information data or the spatial information track further includes a spatial information type identifier used to indicate the type of the same-attribute space information; the spatial information type identifier indicates the information, in the target space information, that belongs to the same-attribute space information.
  • when the spatial information type identifier indicates that the target space information contains no information belonging to the same-attribute space information, the same-attribute space information includes a minimum width value of the target space object, a minimum height value of the target space object, a maximum width value of the target space object, and a maximum height value of the target space object.
  • the spatial information type identifier is encapsulated in the same box as the same-attribute space information.
  • the spatial information data or the spatial information track further includes a coordinate system identifier used to indicate the coordinate system corresponding to the target space information, the coordinate system being a pixel coordinate system or an angular coordinate system.
  • the coordinate system identifier is encapsulated in the same box as the same-attribute space information.
  • with reference to any one of the foregoing implementation manners of the third aspect, in another possible implementation manner, the spatial information data or the spatial information track further includes a spatial rotation information identifier used to indicate whether the target space information includes the spatial rotation information of the target space object.
  • a fourth aspect provides an apparatus for processing streaming media information, the apparatus comprising:
  • an acquiring module, configured to acquire respective spatial information of two spatial objects associated with data of two images in the target video data;
  • a determining module, configured to determine target space information of a target space object according to the respective spatial information of the two spatial objects acquired by the acquiring module, where the target space object is one of the two spatial objects, the target space information includes same-attribute space information, the same-attribute space information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the spatial object other than the target space object also includes the same-attribute space information.
  • the device may further include a sending module, configured to send, to the client, the target space information determined by the determining module.
  • the target space information may further include different-attribute space information of the target space object, and the spatial information of the other spatial object also includes its own different-attribute space information; the different-attribute space information of the target space object is different from the different-attribute space information of the other spatial object.
  • the target space information may include location information of the center point of the target space object or location information of the upper left point of the target space object.
  • the target space information may further include a width of the target space object and a height of the target space object.
  • the spatial information of the two spatial objects may include location information of respective center points of the two spatial objects or position information of respective upper left points of the two spatial objects, and spatial information of each of the two spatial objects is further The respective widths of the two spatial objects and the respective heights of the two spatial objects may be included.
  • the target space information may include location information of the upper left point of the target space object and location information of the lower right point of the target space object.
  • the spatial information of each of the two spatial objects may include location information of respective upper left points of the two spatial objects and position information of respective lower right points of the two spatial objects.
  • the target space information may include spatial rotation information of the target space object.
  • the spatial information of each of the two spatial objects may include spatial rotation information of the two spatial objects.
  • the target space information may be encapsulated in spatial information data or in a spatial information track, where the spatial information data may be a code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track may be a track independent of the target video data.
  • the spatial information data or the spatial information track may further include a spatial information type identifier used to indicate the type of the same-attribute space information; the spatial information type identifier indicates the information, in the target space information, that belongs to the same-attribute space information.
  • when the spatial information type identifier indicates that the target space information contains no information belonging to the same-attribute space information, the same-attribute space information may include a minimum width value of the target space object, a minimum height value of the target space object, a maximum width value of the target space object, and a maximum height value of the target space object.
  • the spatial information type identifier may be encapsulated in the same box as the same-attribute space information.
  • the spatial information data or the spatial information track may further include a coordinate system identifier used to indicate the coordinate system corresponding to the target space information, where the coordinate system is a pixel coordinate system or an angular coordinate system.
  • the coordinate system identifier may be encapsulated in the same box as the same-attribute space information.
  • with reference to any one of the foregoing implementation manners of the fourth aspect, in another possible implementation manner, the spatial information data or the spatial information track may further include a spatial rotation information identifier used to indicate whether the target space information includes the spatial rotation information of the target space object.
  • a fifth aspect provides an apparatus for processing streaming media information, the apparatus comprising a processor and a memory, where the memory is configured to store code, and the processor reads the code stored in the memory to perform the method provided in the first aspect.
  • a sixth aspect provides a computer storage medium configured to store computer software instructions to be executed by the processor of the fifth aspect, for performing the method provided in the first aspect.
  • a seventh aspect provides an apparatus for processing streaming media information, the apparatus comprising a processor and a memory, where the memory is configured to store code, and the processor reads the code stored in the memory to perform the method provided in the second aspect.
  • an eighth aspect provides a computer storage medium configured to store computer software instructions to be executed by the processor of the seventh aspect, for performing the method provided in the second aspect.
  • a description manner of a reference type between media data and metadata is disclosed.
  • different reference types are defined for different uses of the metadata, which makes it easier for the client to process the data according to the reference type.
  • in a possible implementation manner, the reference type between the media data and the metadata is stored in a track of the media data or in a track of the metadata; in a possible implementation manner, the reference type between the media data and the metadata is transmitted in the form of a box.
  • the definitions of Track and box can refer to the existing MPEG-DASH standard and the relevant provisions of ISO/IEC 14496-12, and will not be described here.
  • information about the reference type can be stored in the 'tref' box.
  • the media data is video data
  • the type of association between the media data and the metadata is stored in the video track.
  • the tref box is stored in the metadata describing the video track.
  • the track that includes the tref box is the referenced video track, and it is associated with the metadata track by the reference type, carried in the tref box, between the media data and the metadata.
  • the associated metadata track can be determined by way of track id.
  • the reference type may be used to describe one or more of the following information:
  • a region of interest (ROI) in the media data; a spatial region covered by the media data; quality information associated with the ROI in the media data; and quality information associated with the spatial region covered by the media data.
  • the reference type can be used to describe one or more of the following information:
  • spatial position information of the ROI, within the spatial object corresponding to the media data, on the spherical surface, in the 2D plane, or in the mapped image;
  • spatial position information of the region covered by the media data on the spherical surface, in the 2D plane, or in the mapped image;
  • the above ROI information is included in a timed metadata track of the ROI, and the quality information is included in a timed metadata track of quality.
  • the 'tref' box of the track of the media data includes a reference type indicating the association between the media data and the metadata.
  • this reference type can be used to describe one or more of the following information:
  • 2D spatial position information of the ROI in the spatial object corresponding to the media data, or spatial position information of the ROI on the spherical surface corresponding to the media data, or spatial position information of the ROI in the mapped image corresponding to the media data, or 2D spatial position information of the spatial object corresponding to the media data, or spatial position information of that spatial object on the spherical surface, or spatial position information of that spatial object in the mapped image, or quality information of the 2D spatial position of the ROI in the spatial object corresponding to the media data, or quality information of the spatial position of the ROI on the spherical surface, or quality information of the spatial position of the ROI in the mapped image.
  • the value of the reference type is rois, indicating that the associated track includes the region information of the ROI (this track contains the region information of the ROI on the sphere).
  • the information is a spatial area that describes an image corresponding to a sample in the associated video track.
  • the client can obtain the ROI region information by parsing the samples in the timed metadata track of the ROI, and use the ROI information to present the ROI of the image corresponding to the sample in the associated video track (the client can use the samples in this track to render the ROI on the sphere).
  • the value of the reference type is roiq, indicating that the associated track includes the quality information of the ROI of the image corresponding to the sample in the associated video track on the spherical surface (this track contains the quality information of the ROI on the sphere for the referenced video track).
  • by parsing the samples in the timed metadata track of quality, the client can learn the quality of the ROI object on the sphere (the client can use the samples in this track to know the quality of the ROI on the sphere).
  • the value of the reference type is conc, which indicates that the associated track includes the coverage information of the image corresponding to the sample in the video track on the spherical surface (this track provides information on the area on the spherical surface for The referenced video track).
  • the associated metadata track may be a recommended viewport timed metadata track (a track may be a recommended viewport timed metadata track), and the associated video track may be a director's-cut video track (the referenced video track may be a director's cut video track).
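  • A client encountering these reference types might dispatch as in the Python sketch below; the handler bodies are placeholders, and the mapping simply restates the semantics of 'rois', 'roiq', and 'conc' given above:

      def handle_roi_region(sample):   # 'rois': ROI region on the sphere
          print("render ROI region:", sample)

      def handle_roi_quality(sample):  # 'roiq': quality of the ROI on the sphere
          print("record ROI quality:", sample)

      def handle_coverage(sample):     # 'conc': coverage area on the sphere
          print("apply coverage info:", sample)

      HANDLERS = {"rois": handle_roi_region,
                  "roiq": handle_roi_quality,
                  "conc": handle_coverage}

      def process_metadata_track(reference_type, samples):
          handler = HANDLERS.get(reference_type)
          if handler is None:
              return  # unknown reference type: safely ignore the track
          for sample in samples:
              handler(sample)

      process_metadata_track("rois", [{"center_yaw": 0, "center_pitch": 0,
                                       "hor_range": 9000, "ver_range": 6000}])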
  • in the track of the media data, the reference type describing the associated metadata is encapsulated.
  • when the client parses the track of the media data, the information described by the reference type between the media data and the metadata lets the client know what processing can be performed on the video data; this method decouples the parsing of the tracks from one another and reduces the complexity of the client implementation.
  • the client can request corresponding track data according to different processing needs.
  • the ROI in the embodiment of the present invention may be a viewing angle, or may be a recommended viewing angle, such as an author's perspective.
  • the viewing angle or the recommended viewing angle may also refer to a coverage area, where the coverage area refers to a spatial range of spatial objects corresponding to the media data.
  • the track of the media data includes a reference type that describes the metadata associated with the media data; the reference type may describe the 2D spatial position of the ROI in the spatial object corresponding to the media data, or the spatial position of the ROI on the spherical surface corresponding to the media data, or the spatial position of the ROI in the mapped image corresponding to the media data.
  • the track of the media data includes a 'tref' box, where
  • the reference_type in 'tref'box takes the value ri2d, which indicates that the media data is associated with a timed metadata track of an ROI.
  • the sample in the track of the ROI is the 2D spatial location information of the ROI in the spatial object corresponding to the media data.
  • the 2D position information may be a position defined in the existing standard ISO/IEC 23001-10;
  • the reference_type value is rois
  • the sample in the ROI track is the spatial position information of the ROI in the spherical space corresponding to the media data
  • the spatial position information on the spherical surface may be a sample in the timed metadata track on the sphere defined in the existing standard ISO/IEC 23000-20;
  • the reference_type value is ri2p, where the value indicates that the media data is associated with a timed metadata track of an ROI, and the sample (sample) in the track of the ROI is the spatial position information of the ROI in the mapped image in the spatial object corresponding to the media data.
  • the client can handle the following:
  • the client parses the track of the media data.
  • the track id of the track of the media data is 1 (it can be any non-zero integer); the client obtains the 'tref' box in the track, and from the 'tref' box obtains the reference_type value 'ri2d', 'rois', or 'ri2p' together with the associated track, whose track id is 2 (it can be any non-zero integer);
  • the client determines, according to ‘ri2d’, that the track whose track id is 2 describes the 2D spatial position information of the ROI in the corresponding spatial object of the media data.
  • the client determines, according to 'rois', that the track whose track id is 2 describes the spatial position information of the ROI on the spherical surface corresponding to the media data.
  • the client determines, according to 'ri2p', that the track whose track id is 2 describes the spatial position information of the ROI in the mapped image corresponding to the media data;
  • the client can provide an ROI option on the user interface, and the user selects whether to view the content of the ROI area. If the user selects to view the content of the ROI area, the client presents the ROI area content; or the client directly presents the ROI content;
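  • For the 'ri2d' case just described, the ROI sample carries a 2D position, so presenting the ROI content can amount to cropping that rectangle out of the decoded frame, as in this Python sketch (the sample field names are illustrative; ISO/IEC 23001-10 defines the actual 2D position format):

      def crop_roi(frame, roi_sample):
          """frame: 2D list of pixel rows; roi_sample: dict with x, y, w, h."""
          x, y = roi_sample["x"], roi_sample["y"]
          w, h = roi_sample["w"], roi_sample["h"]
          return [row[x:x + w] for row in frame[y:y + h]]

      def present(frame, roi_sample, user_wants_roi):
          # Mirrors the UI choice described above: present only the ROI
          # region, or fall back to the full picture.
          return crop_roi(frame, roi_sample) if user_wants_roi else frame

      frame = [[(c, r) for c in range(8)] for r in range(6)]  # toy 8x6 "image"
      roi = {"x": 2, "y": 1, "w": 3, "h": 2}
      print(present(frame, roi, user_wants_roi=True))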
  • the type of the metadata referenced by the track is clearly described, so that the parsing of the tracks by the client is decoupled, which reduces the complexity of the client implementation.
  • the coverage information describes the region that the media content of the current track occupies within the entire source video content; the region may be a region on the VR sphere, a region in a 2D image, or a region mapped from the sphere onto a 2D image. As shown in FIG. 17, the coverage information of the small image on the right side of FIG. 17(a) is its spatial position information within the large image on the left side;
  • in the spherical image of FIG. 17, the region that the gray area occupies on the sphere is the coverage area of the gray region.
  • the reference type in the video track adds a description in the 'tref' box, the description information indicating that the metadata is the 2D spatial position information of the spatial object corresponding to the media data, or the spatial position information of the spatial object on the spherical surface, or the spatial position information of the spatial object in the mapped image.
  • the coverage information is described by using a box.
  • a specific example is as follows:
  • the box provides information on the area, on the spherical surface, represented by the projected frame associated with the container ProjectedOmnidirectionalVideoBox (this box provides information on the area on the spherical surface that is represented by the projected frame associated with the container ProjectedOmnidirectionalVideoBox). If no box indicating the coverage information is present in the data, the projected frame is a representation of the full frame.
  • when the projection format is the equirectangular projection, the spherical region represented by the projected frame is the region specified by two yaw circles and two pitch circles, as illustrated in FIG. 10.
  • the coverage information is described as follows:
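• a minimal sketch, in ISO BMFF SDL style, of what such a coverage information box could look like; the box name and field layout are illustrative assumptions, while the field semantics are those given below:

    aligned(8) class CoverageInformationBox extends FullBox('covi', version = 0, flags) {
        unsigned int(1) dynamic_range_flag;   // 0: ranges fixed for all samples of this sample entry
        bit(7) reserved;
        if (dynamic_range_flag == 0) {
            signed int(32)   center_yaw;      // center point of the covered region
            signed int(32)   center_pitch;
            unsigned int(32) hor_range;       // horizontal range on the sphere, units of 0.01 degrees
            unsigned int(32) ver_range;       // vertical range on the sphere, units of 0.01 degrees
        }
    }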
• the hor_range element and the ver_range element specify the horizontal and vertical range, on the spherical surface, of the image corresponding to the samples in the video track; the unit may be 0.01 degrees;
• hor_range and ver_range specify the range through the center point of the region;
• hor_range shall be in the range of 1 to 36000, inclusive;
• ver_range shall be in the range of 1 to 36000, inclusive;
• center_pitch + ver_range / 2 shall not be greater than 18000;
• center_pitch - ver_range / 2 shall not be less than -18000;
• dynamic_range_flag equal to 0 specifies that the horizontal and vertical ranges of the region remain unchanged in all samples referring to this sample entry;
• dynamic_range_flag equal to 1 specifies that the horizontal and vertical ranges of the region are indicated in the sample format.
• the coverage area, on the spherical surface, of the image corresponding to the samples in the video track may also be described by a coverage information track.
  • the coverage timed metadata track indicates the coverage area of the video content on the sphere.
  • the entry type of the sample in the coverage timed metadata track is 'covg'.
  • the RegionOnSphereSample element can be used to describe the sample syntax in the coverage timed metadata track.
• for specific examples of the RegionOnSphereSample element, reference can be made to the relevant provisions in existing standards, for example the relevant examples in ISO/IEC 23000-20.
  • the value of shape_type in the RegionOnSphereConfigBox in the sample entry is 0.
• the static_hor_range element and the static_ver_range element, or the hor_range element and the ver_range element, indicate the horizontal and vertical coverage, respectively, and the center_yaw element and the center_pitch element indicate the center point of the coverage.
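• as a sketch, the sample syntax along the lines of ISO/IEC 23000-20 can be pictured as follows; the exact normative syntax should be taken from the standard itself, so treat this layout as illustrative:

    aligned(8) class RegionOnSphereSample() {
        signed int(32) center_yaw;            // center point of the coverage area
        signed int(32) center_pitch;
        // present only when dynamic_range_flag in the RegionOnSphereConfigBox
        // of the sample entry is 1; otherwise static_hor_range/static_ver_range apply:
        unsigned int(32) hor_range;           // horizontal coverage
        unsigned int(32) ver_range;           // vertical coverage
    }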
• when the reference_type in the 'tref' box takes the value cv2d, the semantics are that the media data is associated with a timed metadata track of a coverage area, and the samples in the coverage track are the 2D spatial position information of the spatial object corresponding to the media data; the 2D spatial position information may be the position information defined in the existing standard ISO/IEC 23001-10;
• when the reference_type value is cvsp, the semantics are that the media data is associated with a timed metadata track of a coverage area, and the samples in the coverage track are the spatial position information, on the spherical surface, of the spatial object corresponding to the media data; the spherical information may be the samples of the timed metadata track on the sphere defined in the existing standard ISO/IEC 23000-20;
• when the reference_type value is cv2p, the semantics are that the media data is associated with a timed metadata track of a coverage area, and the samples in the coverage track are the spatial position information, in the mapped image, of the spatial object corresponding to the media data.
  • the client can handle the following:
  • the client parses the track of the media data.
• the track id of the media data track is 1 (it can be any non-zero integer); the 'tref' box of the media track is obtained, and from the 'tref' box a reference_type value of 'cv2d', 'cvsp' or 'cv2p' is obtained, together with the track id of the associated track, which is 2 (it can be any non-zero integer);
• according to 'cv2d', the client determines that the track whose track id is 2 describes the 2D spatial position information of the spatial object corresponding to the media data; according to 'cvsp', that it describes the spatial position information of that spatial object on the spherical surface; and according to 'cv2p', that it describes the spatial position information of that spatial object in the mapped image;
• according to the coverage information and the rendering capability of the device, the client can decide whether to present all of the media content or to crop out part of it, and, when the user's viewing angle changes, whether data outside the current view needs to be acquired.
• the track whose track id is 2 is a spatial information description track of the coverage area; the sample entry type of the track indicates that the current timed metadata track is a spatial information description track of the coverage area, and the sample entry type may take the value 'cvvp' (coverage viewport).
• the coverage information of the media may be described by an independent track, such as a timed metadata track whose sample entry type value is 'cvvp'; however, if the coverage information of the media is static (the coverage information of every frame in the video is the same), the media does not need to be associated with a coverage timed metadata track; in that case, the specific description information is carried in the 'covi' (coverage information) box of ISO/IEC 23000-20, which describes the shape of the coverage area on the spherical surface or on the 2D plane.
• the value of the association type describes the manner in which the association relationship between the metadata track of the quality information and the media data track is expressed.
• because the type of the metadata referenced by the track is clearly described, the parsing of the tracks on the client side is decoupled, and the complexity of the client implementation is reduced.
  • FIG. 1 is a schematic structural diagram of an MPD transmitted by a DASH standard used for system layer video streaming media transmission;
  • FIG. 2 is a schematic diagram of an example of a framework for DASH standard transmission used in system layer video streaming media transmission
  • FIG. 3 is a schematic diagram of switching of a code stream segment according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a segmentation storage manner in code stream data
• FIG. 5 is another schematic diagram of a segmentation storage manner in code stream data
  • FIG. 6 is a schematic diagram of a perspective corresponding to a change in viewing angle
  • Figure 7 is another schematic diagram of the spatial relationship of spatial objects
  • FIG. 8 is a schematic flowchart diagram of a method for processing information of a streaming media according to an embodiment of the present disclosure
  • FIG. 9 is a schematic diagram of relative positions of target space objects in a panoramic space
  • FIG. 10 is a schematic diagram of a coordinate system according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of another coordinate system according to an embodiment of the present invention.
  • Figure 12 is a schematic illustration of another coordinate system in accordance with an embodiment of the present invention.
  • FIG. 13 is a schematic flowchart diagram of a method for processing information of a streaming media according to an embodiment of the present disclosure
  • FIG. 14 is a schematic diagram showing the logical structure of a device for processing information of streaming media according to an embodiment of the present invention.
  • FIG. 15 is a schematic diagram showing the logical structure of a device for processing information of streaming media according to an embodiment of the present invention.
  • FIG. 16 is a schematic structural diagram of hardware of a computer device according to an embodiment of the present invention.
  • Figure 17 is a schematic illustration of a coverage area in accordance with an embodiment of the present invention.
  • the current client-side system layer video streaming media transmission scheme can adopt the DASH standard framework, as shown in FIG. 2, and FIG. 2 is a schematic diagram of a framework example of DASH standard transmission used in system layer video streaming media transmission.
• the data transmission process of the system layer video streaming media transmission scheme includes two processes: a server side (such as an HTTP server or a media content preparation server, hereinafter referred to as a server) generates media data for video content and responds to client requests, and a client (such as an HTTP streaming client) requests and obtains media data from the server.
  • the media data includes a media presentation description (MPD) and a media stream (eg, a video stream to be played).
  • the MPD on the server includes a plurality of representations (also called presentations, English: representation), each representation describing a plurality of segments.
• the client's HTTP streaming request control module obtains the MPD sent by the server, analyzes the MPD, determines the information of each segment of the video code stream described in the MPD, thereby determines the segment to be requested, sends the corresponding segment HTTP request to the server, and decodes and plays the received segments through the media player.
  • the media data generated by the server for the video content includes a video stream corresponding to different versions of the same video content, and an MPD of the code stream.
• for example, for the video content of the same episode, the server generates a low-resolution, low-bit-rate, low-frame-rate code stream (such as 360p resolution, 300 kbps bit rate, 15 fps frame rate), a medium-resolution, medium-bit-rate, high-frame-rate code stream (such as 720p resolution, 1200 kbps bit rate, 25 fps frame rate), and a high-resolution, high-bit-rate, high-frame-rate code stream (such as 1080p resolution, 3000 kbps bit rate, 25 fps frame rate).
  • FIG. 1 is a schematic structural diagram of an MPD of a system transmission scheme DASH standard.
• in the MPD, each representation describes the information of a plurality of segments in time series, for example Initialization Segment, Media Segment 1, Media Segment 2, ..., Media Segment 20, and so on.
  • the representation may include segmentation information such as a playback start time, a playback duration, and a network storage address (for example, a network storage address expressed in the form of a Uniform Resource Locator (URL)).
• in the process of the client requesting and obtaining the media data from the server, when the user selects a video for playback, the client obtains the corresponding MPD from the server according to the video content requested by the user; the client then sends, according to the network storage address of a code stream segment described in the MPD, a request for downloading that code stream segment to the server, and the server sends the code stream segment to the client according to the received request; after the client obtains the code stream segments sent by the server, it can decode and play them through the media player.
  • FIG. 3 is a schematic diagram of switching of a code stream segment according to an embodiment of the present invention.
  • the server can prepare three different versions of code stream data for the same video content (such as a movie), and describe the three different versions of the code stream data in the MPD using three Representations.
• the above three Representations (hereinafter referred to as rep) can be assumed to be rep1, rep2, and rep3.
  • rep1 is a high-definition video with a code rate of 4mbps (megabits per second)
  • rep2 is a standard-definition video with a code rate of 2mbps
  • rep3 is a normal video with a code rate of 1mbps.
  • Each rep segment contains a video stream within a time period.
  • each rep describes the segments of each time segment according to the time series, and the segment lengths of the same time period are the same, thereby enabling content switching of segments on different reps.
• the segments marked as shaded in the figure are the segment data requested by the client, of which the first 3 segments are segments of rep3; when requesting the 4th segment, the client may switch to rep2 and play the 4th segment of rep2 after the 3rd segment of rep3 ends;
• the playback end point of the 3rd segment of rep3 (corresponding to the end moment of playback) is the playback start point of the 4th segment (corresponding to the start moment of playback), and it is likewise the playback start point of the 4th segment of rep2 or rep1, so that segments on different reps are aligned; after requesting the 4th segment of rep2, the client switches to rep1 and requests the 5th and 6th segments of rep1; it can then switch to rep3 and request the 7th segment of rep3, and then switch back to rep1 and request the 8th segment of rep1.
• the segments of each rep can be stored end to end in one file, or can each be stored as a separate small file; the segments may be encapsulated according to the standard ISO/IEC 14496-12 (ISO BMFF, Base Media File Format) or according to ISO/IEC 13818-1 (MPEG-2 TS), which can be determined according to the requirements of the actual application scenario and is not limited here.
• FIG. 4 is a schematic diagram of a segment storage manner in the code stream data: in FIG. 4, all the segments of the same rep are stored in one file;
• FIG. 5 is another schematic diagram of the segment storage manner in the code stream data: in FIG. 5, each segment of repA is stored as a separate file, and each segment of repB is likewise stored as a separate file.
  • the server may describe information such as the URL of each segment in the form of a template or a list in the MPD of the code stream.
  • the server may use an index segment (English: index segment, that is, sidx in FIG. 5) in the MPD of the code stream to describe related information of each segment.
• the index segment describes the byte offset of each segment in the file in which it is stored, the size of each segment, and the duration of each segment (also known as the length of each segment); a sketch of this index structure follows.
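• the index segment is typically the segment index box 'sidx' defined in ISO/IEC 14496-12; abridged here as a sketch, its syntax shows where the byte offsets, sizes and durations mentioned above are carried:

    aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
        unsigned int(32) reference_ID;
        unsigned int(32) timescale;
        if (version == 0) {
            unsigned int(32) earliest_presentation_time;
            unsigned int(32) first_offset;        // byte offset of the first referenced segment
        } else {
            unsigned int(64) earliest_presentation_time;
            unsigned int(64) first_offset;
        }
        unsigned int(16) reserved;
        unsigned int(16) reference_count;
        for (i = 1; i <= reference_count; i++) {
            bit(1)           reference_type;
            unsigned int(31) referenced_size;     // size of each referenced segment, in bytes
            unsigned int(32) subsegment_duration; // duration of each segment, in the timescale
            bit(1)           starts_with_SAP;
            unsigned int(3)  SAP_type;
            unsigned int(28) SAP_delta_time;
        }
    }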
• when watching a VR video, such as a 360-degree video, the spatial region of the VR video (the spatial region may also be called a spatial object) is a 360-degree panoramic space (or omnidirectional space, or panoramic space object), which exceeds the normal visual range of the human eye; therefore, the user changes the viewing angle (that is, the field of view, FOV) at any time while watching the video.
  • FIG. 6 is a schematic diagram of a perspective corresponding to a change in viewing angle.
  • Box 1 and Box 2 are two different perspectives of the user, respectively.
  • the video image viewed when the user's perspective is box 1 is a video image presented by the one or more spatial objects corresponding to the perspective at the moment.
• when the user's perspective is switched to box 2, the video image viewed by the user should also switch, at that moment, to the video image presented by the spatial object corresponding to box 2.
• the server may divide a panoramic space (or called a panoramic space object) within the 360-degree viewing angle range into a plurality of spatial objects, each spatial object corresponding to a sub-view of the user, with the splicing of multiple sub-views forming a complete human-eye view; that is, the human eye view (hereinafter referred to as the view) may correspond to one or more of the divided spatial objects, and the spatial objects corresponding to the view are all the spatial objects corresponding to the content objects within the range of the human eye view.
• the viewing angle of the human eye can change dynamically, but the viewing angle range is generally about 120 degrees * 120 degrees; the spatial object corresponding to the content objects within this 120 degrees * 120 degrees range may include one or more divided spatial objects, for example view 1 corresponding to box 1 in FIG. 6 and view 2 corresponding to box 2.
• the client may obtain, through the MPD, the spatial information of the video code stream prepared by the server for each spatial object, and may then request, according to the current view requirements, the video code stream segments corresponding to one or more spatial objects in a certain period of time, and output the corresponding spatial objects according to the view requirements;
• if the client outputs, for the same time period, the video code stream segments corresponding to all the spatial objects within the 360-degree viewing angle range, it can display the complete video image of the entire 360-degree panoramic space for that time period.
  • the server may first map the spherical surface into a plane, and divide the spatial object on the plane. Specifically, the server may map the spherical surface into a latitude and longitude plan by using a latitude and longitude mapping manner.
  • FIG. 7 is a schematic diagram of a spatial object according to an embodiment of the present invention. The server can map the spherical surface into a latitude and longitude plan, and divide the latitude and longitude plan into a plurality of spatial objects such as A to I.
  • the server may also map the spherical surface into a cube, expand the plurality of faces of the cube to obtain a plan view, or map the spherical surface to other polyhedrons, and expand the plurality of faces of the polyhedron to obtain a plan view or the like.
• the server can also map the spherical surface to a plane in other ways, which can be determined according to the requirements of the actual application scenario and is not limited here; the following description uses the latitude and longitude mapping manner in conjunction with FIG. 7; as shown in FIG. 7, after the server divides the spherical panoramic space into a plurality of spatial objects such as A to I, a set of DASH video code streams can be prepared for each spatial object.
• each spatial object then corresponds to a set of DASH video code streams; when the client user switches the viewing angle, the client can obtain the code stream corresponding to the new spatial object according to the new view selected by the user, and present the video content of the new spatial object's code stream in the new view.
  • the method and apparatus for processing information provided by the embodiments of the present invention will be described below with reference to FIG. 8 to FIG.
  • the system layer video streaming media transmission scheme adopts the DASH standard, and realizes the transmission of video data by analyzing the MPD by the client, requesting the video data to the server as needed, and receiving the data sent by the server.
  • a main plot route may be designed for the video playback according to the storyline requirement of the video.
• the user only needs to watch the video image corresponding to the main plot route to understand the storyline, and may or may not watch the other video images;
• the client can therefore selectively play the video images corresponding to the storyline and not present the others, which can save transmission resources and storage space for the video data and improve the processing efficiency of the video data.
• the video image to be presented to the user at each playing time during video playback can be set according to the above-mentioned main plot route, and stringing together the video images of the individual playing times in time series yields the storyline of the main plot route;
• the video image to be presented to the user at each playing time is the video image presented on the spatial object corresponding to that playing time, that is, the video image that the spatial object is to present during that time period.
  • the angle of view corresponding to the video image to be presented at each of the playing times may be set as the author's perspective
  • the spatial object that presents the video image in the perspective of the author may be set as the author space object.
  • the code stream corresponding to the author view object can be set as the author view code stream.
• the author view code stream contains the video frame data of multiple video frames (the encoded data of the multiple video frames), and each video frame can be presented as one image, so the author view code stream corresponds to multiple images.
  • the image presented by the author's perspective is only part of the panoramic image (or VR image or omnidirectional image) that the entire video is to present.
• the spatial information of the spatial objects associated with the images corresponding to the author view code stream may differ from one another or be the same; that is, the spatial information of the spatial objects associated with the video data of the author view code stream can vary over time.
  • the corresponding code stream can be prepared by the server for the author perspective of each play time.
  • the code stream corresponding to the author view may be set as the author view code stream.
• the server may encode the author view code stream and transmit it to the client; after the client decodes it, the story scene picture corresponding to the author view code stream can be presented to the user.
• the server does not need to transmit to the client the code streams of views other than the author's view (set as the non-author views, that is, the static view code streams), which can save resources such as the transmission bandwidth of the video data.
• since the author's view is a preset spatial object set by the author according to the video storyline, and the author space objects at different playing times may be different or the same, the author's view is a view that keeps changing with the playing time, and the author space object is a dynamic spatial object whose position keeps changing;
• the position in the panoramic space of the author space object corresponding to each playing time is not necessarily the same.
• each of the spatial objects shown in FIG. 7 is a spatial object divided according to a preset rule and fixed at a relative position in the panoramic space, whereas the author space object corresponding to any playing time is not necessarily one of the fixed spatial objects shown in FIG. 7; it may instead be a spatial object whose relative position in the global space keeps changing.
• the content of the video that the client obtains from the server is strung together from the author views and does not contain the spatial objects corresponding to the non-author views; the author view code stream only contains the content of the author space objects, and the MPD obtained from the server does not contain the spatial information of the author space objects of the author views; therefore, the client can only decode and present the code stream of the author's view, and if the user switches the viewing angle to a non-author view during video viewing, the client cannot present the corresponding video content to the user.
• when generating the media presentation description, the server may add, to the media presentation description, identifier information for identifying the author view code stream of the video.
• the identifier information may be carried in the attribute information, described in the media presentation description, of the code stream set in which the author view code stream is carried; that is, the identifier information may be carried in the information of the adaptation set in the media presentation description, or in the information of the representation contained in the media presentation description; further, the identifier information may also be carried in the information of a descriptor in the media presentation description.
• by parsing the MPD and obtaining the syntax elements added to it, the client can quickly identify the author view code stream and the non-author view code streams; if the spatial information related to the author view code stream is encapsulated in a separate metadata file, the client can parse the MPD, obtain the metadata of the spatial information according to the codec identifier, and thereby parse the spatial information.
  • the server may also add spatial information for one or more author space objects in the author view stream.
  • each author space object corresponds to one or more images, that is, one or more images may be associated with the same spatial object, or each image may be associated with one spatial object.
  • the server can add the spatial information of each author space object in the author view code stream, and can also use the space information as a sample and independently encapsulate it in a track or file.
  • the spatial information of an author space object is the spatial relationship between the author space object and its associated content component, that is, the spatial relationship between the author space object and the panoramic space. That is, the space described by the spatial information of the author space object may specifically be a partial space in the panoramic space, such as any one of the spatial objects in FIG. 7 above.
• when adding the spatial information, the server may add it to the trun box or the tfhd box included in a segment of the author view code stream in the existing file format, so as to describe the spatial information of the spatial object associated with each frame of image corresponding to the video frame data of the author view code stream;
• since the spatial information of the spatial objects associated with the individual frames of image may partly be the same, the spatial information of the multiple author space objects contains duplication and redundancy, which affects the efficiency of data transmission.
  • the modification of the file format provided by the present invention can also be applied to the file format of the ISOBMFF or the MPEG2-TS, and can be determined according to the requirements of the actual application scenario, and is not limited herein.
  • FIG. 8 is a schematic flowchart diagram of a method for processing information of a streaming media according to an embodiment of the present invention.
  • the information processing method of the streaming media provided by the embodiment of the present invention can be applied to the DASH field, and can also be applied to other streaming media fields, such as streaming media transmission based on the RTP protocol.
• the executor of the method may be a client, such as a terminal, a user equipment, or a computer device, or may be a network device, such as a gateway or a proxy server; as shown in FIG. 8, the method may include the following steps:
• obtain the target space information of a target space object, where the target space object is one of two spatial objects associated with the data of two images included in the target video data; the target space information includes same attribute spatial information, the same attribute spatial information is identical across the respective spatial information of the two spatial objects, and the spatial information of the spatial object other than the target space object among the two spatial objects also includes the same attribute spatial information.
  • the target video data may be the target video code stream or the uncoded video data.
  • the data of the two images may be the encoded data of the two images.
  • the target video code stream may be an author view code stream or a non-author view code stream.
  • the acquiring the target space information of the target space object may be receiving the target space information from the server.
  • the two images may correspond to the two spatial objects one by one, or one spatial object may correspond to two images.
  • the spatial information of a target space object is a spatial relationship between the target space object and its associated content component, that is, the spatial relationship between the target space object and the panoramic space. That is, the space described by the target space information of the target space object may specifically be a partial space in the panoramic space.
  • the target video data may be the above-mentioned author view code stream or a non-author view code stream.
  • the target space object may be the author space object described above, or may not be.
• after the target space information is obtained, the video data to be played can further be determined and displayed.
• the target space information may further include the hetero-attribute spatial information of the target space object, and the spatial information of the other spatial object further includes the hetero-attribute spatial information of the other spatial object, where the hetero-attribute spatial information of the target space object is different from the hetero-attribute spatial information of the other spatial object.
• the target space information may include the location information of the center point of the target space object or the location information of the upper left point of the target space object, and the target space information may further include the width of the target space object and the height of the target space object.
• when the coordinate system corresponding to the target space information is an angular coordinate system, the target space information may be described by yaw angles; when the coordinate system is a pixel coordinate system, the target space information may be described by coordinates in a latitude and longitude map, or by other geometric solid figures, and no limitation is imposed here;
• in the yaw-angle manner, the description uses the pitch angle θ (pitch), the yaw angle ψ (yaw), and the roll angle Φ (roll), together with angles used to indicate the width of the angular range and the height of the angular range.
• FIG. 9 is a schematic diagram of the relative position of the center point of a target space object in the panoramic space;
• in FIG. 9, the point O is the center of the sphere corresponding to the 360-degree VR panoramic video spherical image, and can be considered as the position of the human eye when viewing the VR panoramic image; point A is the center point of the target space object; C and F are the boundary points of the target space object, passing through point A, along its horizontal coordinate axis; E and D are the boundary points of the target space object, passing through point A, along its longitudinal coordinate axis; B is the projection point of point A along the spherical meridian line onto the equator line; and I is the starting coordinate point of the horizontal direction on the equator line.
• the meanings of the elements are as follows:
• pitch angle: the deflection angle, in the vertical direction, of the point on the panoramic spherical (that is, global space) image to which the center position of the image of the target space object is mapped, such as ∠AOB in FIG. 9;
• yaw angle: the deflection angle, in the horizontal direction, of the point on the panoramic spherical image to which the center position of the image of the target space object is mapped, such as ∠IOB in FIG. 9;
• roll angle: the rotation angle, around the line connecting the sphere center and the mapped point, of the point on the panoramic spherical image to which the center position of the image of the target space object is mapped, such as ∠DOB in FIG. 9;
• the height of the angular range (the height of the target space object in the angular coordinate system): the maximum vertical field-of-view angle of the image of the target space object on the panoramic spherical image, such as ∠DOE in FIG. 9;
• the width of the angular range (the width of the target space object in the angular coordinate system): the maximum horizontal field-of-view angle of the image of the target space object on the panoramic spherical image, such as ∠COF in FIG. 9.
  • the target spatial information may include location information of an upper left point of the target spatial object, and location information of a lower right point of the target spatial object.
• when the target space object is not a rectangle, the target space information may include at least one of a shape type, a radius, and a perimeter of the target space object.
  • the target spatial information can include spatial rotation information for the target spatial object.
• the target space information may be encapsulated in spatial information data or in a spatial information track, where the spatial information data may be the code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track may be a track independent of the target video data.
• the spatial information data or the spatial information track may further include a spatial information type identifier for indicating the type of the same attribute spatial information, where the spatial information type identifier indicates which information in the target space information belongs to the same attribute spatial information.
• the same attribute spatial information may include the minimum value of the width of the target space object, the minimum value of its height, the maximum value of its width, and the maximum value of its height.
  • the spatial information type identifier and the same attribute space information may be encapsulated in the same box.
• when the target space information is encapsulated in a file independent of the target video data (a spatial information file) or in a track independent of the target video data (a spatial information track), the server may add the same attribute spatial information in a 3dsc box in the existing file format, and add the hetero-attribute spatial information of the target space object in an mdat box in the existing file format.
• the same attribute spatial information may be some, but not all, of yaw, pitch, roll, reference_width and reference_height, for example without roll; roll may then belong to the hetero-attribute spatial information of the target space object, or may not be included in the target space information at all.
• a spatial information type identifier regionType is also added to the 3dsc box; this sample is an example in the angular coordinate system;
• when the spatial information type identifier is 0, it indicates that the information belonging to the same attribute spatial information in the target space information is the location information of the center point of the target space object, or the location information of its upper left point, together with the width of the target space object and the height of the target space object;
• here the location information is represented by the pitch angle θ (pitch), the yaw angle ψ (yaw), and the roll angle Φ (roll), and the width and the height can also be expressed as angles.
• when the spatial information type identifier is 1, it indicates that the information belonging to the same attribute spatial information in the target space information is the width of the target space object and the height of the target space object; in other words, the two spatial objects have the same size but different positions.
• when the spatial information type identifier is 2, it indicates that no information in the target space information belongs to the same attribute spatial information; in other words, both the size and the position of the two spatial objects differ.
• when the spatial information type identifier is 0, it may indicate that no hetero-attribute spatial information exists; when the spatial information type identifier is 1, it further indicates that the hetero-attribute spatial information of the target space object is the location information of the center point of the target space object or the location information of its upper left point; when the spatial information type identifier is 2, it further indicates that the hetero-attribute spatial information of the target space object is the location information of the center point (or upper left point) of the target space object together with the width of the target space object and the height of the target space object; a sketch of this layout follows.
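• Sample 1 can accordingly be sketched in ISO BMFF SDL style as follows; the field names and bit widths are illustrative assumptions consistent with the semantics above, with the same attribute part in the 3dsc box and the hetero-attribute part in the mdat samples:

    // same attribute spatial information, carried in the '3dsc' box (angular coordinate system)
    aligned(8) class SphericalCoordinatesBox extends FullBox('3dsc', version = 0, flags) {
        unsigned int(8) regionType;            // spatial information type identifier: 0, 1 or 2
        if (regionType == 0) {                 // position and size identical for both objects
            signed int(16)   yaw;              // center (or upper left) point
            signed int(16)   pitch;
            signed int(16)   roll;             // may instead be left to the per-sample information
            unsigned int(16) reference_width;  // width of the spatial object, as an angle
            unsigned int(16) reference_height; // height of the spatial object, as an angle
        }
        if (regionType == 1) {                 // size identical, position differs per sample
            unsigned int(16) reference_width;
            unsigned int(16) reference_height;
        }
    }
    // hetero-attribute spatial information, one sample per spatial object (in the mdat box)
    aligned(8) class SphericalCoordinatesSample() {
        if (regionType == 1) {                 // position only
            signed int(16) yaw;
            signed int(16) pitch;
            signed int(16) roll;
        }
        if (regionType == 2) {                 // position and size
            signed int(16) yaw;
            signed int(16) pitch;
            signed int(16) roll;
            unsigned int(16) width;
            unsigned int(16) height;
        }
    }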
• in a further example of adding spatial information (Sample 2), the sample is a sample in a pixel coordinate system;
• when the spatial information type identifier is 0, it indicates that the information belonging to the same attribute spatial information in the target space information is the location information of the upper left point of the target space object together with the width of the target space object and the height of the target space object; the location information is represented by an abscissa and an ordinate in units of pixels, and the width and the height can also be expressed in units of pixels; the abscissa and the ordinate may be the coordinates of the location point in the latitude and longitude plan view in FIG. 7, or coordinates in the panoramic space (or the panoramic space object); in other words, the position and the size of the two spatial objects are the same; the location information of the center point of the target space object may be used instead of the location information of the upper left point.
• when the spatial information type identifier is 1, it indicates that the information belonging to the same attribute spatial information in the target space information is the width of the target space object and the height of the target space object; in other words, the two spatial objects have the same size but different positions.
• when the spatial information type identifier is 2, it indicates that no information in the target space information belongs to the same attribute spatial information; in other words, both the size and the position of the two spatial objects differ.
• when the spatial information type identifier is 0, it may indicate that no hetero-attribute spatial information exists; when the spatial information type identifier is 1, it further indicates that the hetero-attribute spatial information of the target space object is the location information of its upper left point; when the spatial information type identifier is 2, it further indicates that the hetero-attribute spatial information of the target space object is the location information of the upper left point together with the width of the target space object and the height of the target space object; the location information of the center point of the target space object may be used instead of the location information of the upper left point.
  • Example of adding spatial information (Sample 3):
• the sample is a sample in a pixel coordinate system;
• when the spatial information type identifier is 0, it indicates that the information belonging to the same attribute spatial information in the target space information is the location information of the upper left point of the target space object and the location information of its lower right point; the location information is represented by an abscissa and an ordinate in units of pixels, which may be the coordinates of the location point in the latitude and longitude plan view in FIG. 7, or coordinates in the panoramic space (or the panoramic space object); the location information of the lower right point of the target space object may be replaced by the width and the height of the target space object.
• when the spatial information type identifier is 1, it indicates that the information belonging to the same attribute spatial information in the target space information is the location information of the lower right point of the target space object; in other words, the two spatial objects have the same size but different positions; the location information of the lower right point of the target space object may be replaced by the width and the height of the target space object.
• when the spatial information type identifier is 2, it indicates that no information in the target space information belongs to the same attribute spatial information; in other words, both the size and the position of the two spatial objects differ.
• when the spatial information type identifier is 0, it may indicate that no hetero-attribute spatial information exists; when the spatial information type identifier is 1, it further indicates that the hetero-attribute spatial information of the target space object is the location information of its upper left point; when the spatial information type identifier is 2, it further indicates that the hetero-attribute spatial information of the target space object is the location information of the upper left point of the target space object and the location information of its lower right point; the location information of the lower right point may be replaced by the width and the height of the target space object.
• the spatial information data or the spatial information track may further include a coordinate system identifier for indicating the coordinate system corresponding to the target space information, where the coordinate system is a pixel coordinate system or an angular coordinate system.
  • the coordinate system identifier and the same attribute space information may be encapsulated in the same box.
• when the target space information is encapsulated in a file independent of the target video data (a spatial information file) or in a track independent of the target video data (a spatial information track), the server may add the coordinate system identifier in the 3dsc box; when the coordinate system identifier is 0, the coordinate system is an angular coordinate system, and when the coordinate system identifier is 1, the coordinate system is a pixel coordinate system.
• the spatial information data or the spatial information track may further include a spatial rotation information identifier for indicating whether the target space information includes the spatial rotation information of the target space object.
• the spatial rotation information identifier may be encapsulated in the same box as the same attribute spatial information (for example, the 3dsc box), or in the same box as the hetero-attribute spatial information of the target space object (for example, the mdat box); specifically, when the spatial rotation information identifier and the hetero-attribute spatial information are encapsulated in the same box, and the identifier indicates that the target space information includes spatial rotation information, the hetero-attribute spatial information of the target space object includes the spatial rotation information;
• the server may further encapsulate the spatial rotation information identifier and the hetero-attribute spatial information of the target space object in the same sample of the same box (for example, an mdat box), where one sample can encapsulate the hetero-attribute spatial information corresponding to one spatial object.
• Example of adding a spatial rotation information identifier (Example 1):
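• a sketch of such a sample format, with the spatial rotation information identifier carried together with the per-sample hetero-attribute information; the names and bit widths are illustrative assumptions:

    aligned(8) class SphericalCoordinatesSample() {
        unsigned int(1) rotation_flag;    // 1: this sample carries spatial rotation information
        bit(7) reserved;
        signed int(16)   yaw;             // hetero-attribute position information
        signed int(16)   pitch;
        unsigned int(16) width;           // hetero-attribute size information
        unsigned int(16) height;
        if (rotation_flag == 1)
            signed int(16) rotation;      // rotation (roll) of the target space object
    }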
• the same attribute spatial information and the hetero-attribute spatial information of the target space object may also be encapsulated in the track metadata of the video, for example in the same box, such as the trun box, the tfhd box, or a newly defined box.
  • Example of adding spatial information (Sample 4):
• the spatial information of one spatial object constitutes one sample; the sample quantity above is used to indicate the number of spatial objects, and each spatial object corresponds to its own set of hetero-attribute spatial information; a sketch of this layout follows.
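• a sketch of Sample 4 in the same SDL style; the box name and four-character code are hypothetical, and entry_count plays the role of the sample quantity described above:

    aligned(8) class SpatialInfoEntryBox extends FullBox('spie', version = 0, flags) {  // fourcc hypothetical
        unsigned int(32) entry_count;          // sample quantity = number of spatial objects
        for (i = 0; i < entry_count; i++) {    // one set of hetero-attribute information per object
            signed int(16)   yaw;
            signed int(16)   pitch;
            unsigned int(16) width;
            unsigned int(16) height;
        }
    }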
• the parsing steps are as follows:
• Obtain the spatial information file, or the spatial information track (the spatial information may be called timed metadata), or the spatial information metadata of the video (or, the metadata of the target video data);
• Parse the box tagged 3dsc (the spatial information description box) and parse the spatial information type identifier; the spatial information type identifier can indicate the spatial object types of the two spatial objects, and the optional spatial object types can include, but are not limited to: spatial objects whose position and size are both unchanged, spatial objects whose position changes and whose size is unchanged, spatial objects whose position is unchanged and whose size changes, and spatial objects whose position and size both change;
• if the parsed spatial object type is a spatial object whose position and size are unchanged (that is, both the spatial position and the size of the spatial object stay the same), the same attribute spatial information parsed in the 3dsc box can be used directly as the target space information; this spatial object type indicates that the spatial information of the two spatial objects is identical, with values equal to the parsed same attribute spatial information; in this case there is no need, in subsequent parsing, to parse the box in which the hetero-attribute spatial information of the target space object is located;
• if the parsed spatial object type is a spatial object whose position changes and whose size is unchanged, the same attribute spatial information in the 3dsc box carries the size information of the spatial object, such as its width and height, and the information carried in the hetero-attribute spatial information of the target space object obtained by subsequent parsing is the position information of each spatial object;
• if the parsed spatial object type is a spatial object whose position and size both change, the information carried in the hetero-attribute spatial information of the target space object obtained by subsequent parsing is the position information of each spatial object (for example, the position information of the center point) together with its size information, such as the width and the height of the spatial object;
• after the target space information is parsed, the content object to be presented is selected in the obtained VR video according to the spatial object (the target space object) described by the target space information; or the video data corresponding to the spatial object described by the target space information is requested, decoded and presented; or the position of the currently viewed video content in the VR video space (or, the panoramic space) is determined according to the target space information.
• the manner in which the spatial information is carried may be described by adding a carryType to the MPD, indicating whether it is carried in the spatial information file, in the spatial information track, or in the metadata of the target video data.
  • the specific MPD sample is as follows:
• Example 1: the spatial information is carried in the metadata of the target video data, with value = "1,0", where 1 is the identity of the source and 0 indicates that the spatial information is carried in the metadata of the track of the target video data (or, the metadata of the target video data).
• Example 2: the spatial information is carried in the spatial information track, with value = "1,1", where the first 1 is the identity of the source and the second 1 indicates that the spatial information is carried in an independent spatial information track.
• Example 3: the spatial information is carried in a separate spatial information file, with value = "1,2", where 1 is the identity of the source and 2 indicates that the spatial information is carried in a separate spatial information file, the spatial information file being associated with the target video representation (or, the target video).
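• as a hedged reconstruction, the descriptor of Example 3 could appear in the MPD roughly as follows; the schemeIdUri and the element layout are assumptions for illustration, and only the value string comes from the description above:

    <AdaptationSet segmentAlignment="true">
        <!-- schemeIdUri is hypothetical; value="1,2": source identity 1,
             spatial information carried in a separate spatial information file -->
        <SupplementalProperty schemeIdUri="urn:example:spatial-info:2017" value="1,2"/>
        <Representation id="video" mimeType="video/mp4" bandwidth="3000000">
            <BaseURL>video.mp4</BaseURL>
        </Representation>
    </AdaptationSet>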
• the client can obtain the spatial information carrying manner by parsing the MPD, and thereby obtain the spatial information according to that carrying manner.
• the spatial information data or the spatial information track may further include a width and height type identifier of the target space object, where the width and height type identifier may be used to indicate the coordinate system for describing the width and the height of the target space object, or to indicate the coordinate system for describing the boundary of the target space object.
• the width and height type identifier may be a single identifier, or may include a width type identifier and a height type identifier.
• the width and height type identifier may be encapsulated in the same box as the same attribute spatial information (for example, the 3dsc box), or in the same box as the hetero-attribute spatial information of the target space object (for example, the mdat box).
• specifically, the server may encapsulate the width and height type identifier and the same attribute spatial information in the same box (for example, the 3dsc box); further, when the target space information is encapsulated in a file independent of the target video data (a spatial information file) or in a track independent of the target video data (a spatial information track), the server may add the width and height type identifier in the 3dsc box.
• the same attribute spatial information and the hetero-attribute spatial information of the target space object may also be encapsulated in the track metadata of the video, for example in the same box, such as the trun box, the tfhd box, or a newly defined box.
• when the width and height type identifier is 0, the coordinate system for describing the width and the height of the target space object is as shown in FIG. 10: the shaded portion of the spherical surface is the target space object, and the vertices of its four corners are B, E, G and I;
• in FIG. 10, O is the center of the sphere corresponding to the 360-degree VR panoramic video spherical image; the vertices B, E, G and I are the points where circles passing through the z axis (each circle is centered on the sphere center O, with a radius equal to that of the sphere corresponding to the 360-degree VR panoramic video spherical image; there are two such circles, one passing through points B, A, I and O, one through points E, F, G and O) intersect, on the spherical surface, circles parallel to the x axis and the y axis (these circles are not centered on the sphere center O; there are two of them, parallel to each other, one passing through points B, D and E, one through points I, H and G);
• C is the center point of the target space object; the angle corresponding to side DH represents the height of the target space object, and the angle corresponding to side AF represents the width of the target space object;
• the vertex of the angle corresponding to side IG is the intersection of the circle on which I, H and G lie with the z axis (and, similarly, the vertex of the angle corresponding to side BE is the intersection of the circle on which B, D and E lie with the z axis); the vertex of the angle corresponding to side AF is the point O, and the vertices of the angles corresponding to sides BI, EG and DH are also the point O;
• the target space object may also be obtained by intersecting two circles passing through the x axis with two circles that are parallel to the y axis and the z axis but do not pass through the sphere center, or by intersecting two circles passing through the y axis with two circles that are parallel to the x axis and the z axis but do not pass through the sphere center.
• when the width and height type identifier is 1, the coordinate system for describing the width and the height of the target space object is as shown in FIG. 11: the shaded portion of the spherical surface is the target space object, and the vertices of its four corners are B, E, G and I;
• in FIG. 11, O is the sphere center corresponding to the 360-degree VR panoramic video spherical image; the vertices B, E, G and I are the points where circles passing through the z axis (each circle is centered on the sphere center O, with a radius equal to that of the sphere corresponding to the 360-degree VR panoramic video spherical image; there are two such circles, one passing through points B, A and I, one through points E, F and G) intersect, on the spherical surface, circles passing through the y axis (likewise centered on the sphere center O, with the radius of the sphere; there are two such circles, one passing through points B, D and E, one through points I, H and G);
• C is the center point of the target space object; the angle corresponding to side DH represents the height of the target space object, and the angle corresponding to side AF represents the width of the target space object, the DH side and the AF side passing through point C; the sides BI, EG and DH correspond to the same angle, and the sides BE, IG and AF correspond to the same angle;
• the vertex of the angle corresponding to side BE is point J, the intersection with the z axis of the circle that passes through points B and E and is parallel to the x axis and the y axis; the vertex of the angle corresponding to side IG is the intersection with the z axis of the circle that passes through points I and G and is parallel to the x axis and the y axis; the vertex of the angle corresponding to side AF is the point O; the vertex of the angle corresponding to side BI is point L, the intersection with the y axis of the circle that passes through points B and I and is parallel to the z axis and the x axis; the vertex of the angle corresponding to side EG is the intersection with the y axis of the circle that passes through points E and G and is parallel to the z axis and the x axis; and the vertex of the angle corresponding to side DH is also the point O;
• the target space object may also be obtained by intersecting two circles passing through the x axis with two circles passing through the z axis, or by intersecting two circles passing through the x axis with two circles passing through the y axis.
  • In another case, the coordinate system for describing the width and height of the target space object is shown in FIG. 12. The shaded portion of the spherical surface is the target space object, and the vertices of its four corners are B, E, G, and I. In FIG. 12, O is the sphere center corresponding to the 360-degree VR panoramic video spherical image. The vertices B, E, G, and I are the intersections, on the spherical surface, of circles parallel to the coordinate x-axis and z-axis (these circles are not centered on the sphere center O; there are two of them, parallel to each other, one passing through points B, A, and I and one through points E, F, and G) with circles parallel to the coordinate x-axis and y-axis (likewise not centered on O; there are two of them, parallel to each other, one passing through points B, D, and E and one through points I, H, and G). C is the center point of the target space object; the angle corresponding to the DH side represents the height of the target space object, and the angle corresponding to the AF side represents its width.
  • The target space object may also be obtained by intersecting two circles that are parallel to the y-axis and the z-axis but not centered on the sphere center with two circles that are parallel to the y-axis and the x-axis but not centered on the sphere center, or by intersecting two circles that are parallel to the y-axis and the z-axis but not centered on the sphere center with two circles that are parallel to the z-axis and the x-axis but not centered on the sphere center.
  • The J point and the L point in FIG. 11 are obtained in the same way as the J point in FIG. 10: the vertex of the angle corresponding to the BE side is the point J, and the vertex of the angle corresponding to the BI side is the point L. In FIG. 12, the vertices of the angles corresponding to the BE side and the BI side are both the point O.
  • The same-attribute spatial information and the different-attribute spatial information of the target space object may also include description information of the target space object, for example description information used to describe the target space object as a field-of-view region (for example, a spatial object corresponding to a field-of-view code stream) or as a region of interest, or description information used to describe quality information of the target space object.
  • The description information may be added in the tfhd box or through a new box syntax, or the description information (content_type) may be added in the SphericalCoordinatesSample, to implement one or more of the following functions: describing the target space object as a field-of-view region, describing the target space object as a region of interest, and describing quality information of the target space object.
  • FIG. 13 is a schematic flowchart of a method for processing streaming media information according to an embodiment of the present invention. The method may be performed by a server, which may specifically be a computer device. As shown in FIG. 13, the method may include the following steps:
  • S1001: Acquire the respective spatial information of two spatial objects associated with data of two images in target video data.
  • S1002: Determine, according to the respective spatial information of the two spatial objects, target spatial information of a target spatial object, where the target spatial object is one of the two spatial objects, the target spatial information includes same-attribute spatial information, the same-attribute spatial information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the other spatial object, other than the target spatial object, among the two spatial objects also includes the same-attribute spatial information.
  • The method may further include transmitting the target spatial information to the client.
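  • As a worked illustration of steps S1001 and S1002, the following is a minimal sketch (in Python, with illustrative field names that the embodiments do not prescribe) of how a server might factor the respective spatial information of two spatial objects into one shared group of same-attribute spatial information plus per-object different-attribute spatial information:

```python
# Hypothetical sketch: the field names (x, y, width, height) are illustrative
# only; the embodiments define the concepts, not a concrete API.
FIELDS = ("x", "y", "width", "height")

def split_spatial_info(a: dict, b: dict):
    """Factor two spatial-information dicts into the same-attribute part
    (values identical in both objects, written once) and each object's
    different-attribute part (the remaining values)."""
    shared = {f: a[f] for f in FIELDS if a[f] == b[f]}       # same-attribute
    diff_a = {f: a[f] for f in FIELDS if f not in shared}    # per-object
    diff_b = {f: b[f] for f in FIELDS if f not in shared}
    return shared, diff_a, diff_b

# Example: two objects of equal size at different positions -> width/height
# are carried once as same-attribute spatial information.
shared, da, db = split_spatial_info(
    {"x": 0, "y": 0, "width": 640, "height": 480},
    {"x": 100, "y": 50, "width": 640, "height": 480})
assert shared == {"width": 640, "height": 480}
```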
  • The target spatial information may further include different-attribute spatial information of the target spatial object, and the spatial information of the other spatial object further includes different-attribute spatial information of the other spatial object; the different-attribute spatial information of the target spatial object differs from the different-attribute spatial information of the other spatial object.
  • The target spatial information may include location information of the center point of the target spatial object or location information of the upper-left point of the target spatial object, and the target spatial information may further include the width of the target spatial object and the height of the target spatial object.
  • Correspondingly, the respective spatial information of the two spatial objects may include location information of their respective center points or location information of their respective upper-left points, and may further include the respective widths and the respective heights of the two spatial objects.
  • the target spatial information may include location information of an upper left point of the target spatial object, and location information of a lower right point of the target spatial object.
  • the spatial information of each of the two spatial objects may include location information of respective upper left points of the two spatial objects and position information of respective lower right points of the two spatial objects.
  • the target spatial information can include spatial rotation information for the target spatial object.
  • the spatial information of each of the two spatial objects may include spatial rotation information of the two spatial objects.
  • The target spatial information may be encapsulated in spatial information data or in a spatial information track, where the spatial information data may be a code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track may be a track independent of the target video data.
  • The spatial information data or the spatial information track may further include a spatial information type identifier for indicating the type of the same-attribute spatial information, where the spatial information type identifier is used to indicate the information, in the target spatial information, that belongs to the same-attribute spatial information.
  • When the spatial information type identifier indicates that no information in the target spatial information belongs to the same-attribute spatial information, the same-attribute spatial information may include the minimum value of the width of the target spatial object, the minimum value of the height of the target spatial object, the maximum value of the width of the target spatial object, and the maximum value of the height of the target spatial object.
  • The spatial information type identifier may be encapsulated in the same box as the same-attribute spatial information.
  • The spatial information data or the spatial information track may further include a coordinate system identifier for indicating the coordinate system corresponding to the target spatial information, where the coordinate system is a pixel coordinate system or an angular coordinate system.
  • The coordinate system identifier may be encapsulated in the same box as the same-attribute spatial information.
  • The spatial information data or the spatial information track may further include a spatial rotation information identifier, where the spatial rotation information identifier is used to indicate whether the target spatial information includes spatial rotation information of the target spatial object.
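  • One way a client might act on these identifiers is sketched below (Python); it assumes the three spatial information type values illustrated by the examples later in this description (0: position and size shared, 1: only size shared, 2: nothing shared), which are preset values rather than a normative encoding:

```python
def shared_fields(spatial_info_type: int):
    """Map the spatial information type identifier to the fields carried in
    the same-attribute spatial information (assumed 0/1/2 semantics)."""
    if spatial_info_type == 0:
        return ("position", "width", "height")   # position and size shared
    if spatial_info_type == 1:
        return ("width", "height")               # only size shared
    if spatial_info_type == 2:
        return ()                                # nothing shared
    raise ValueError("unknown spatial information type identifier")

def rebuild(same_attr: dict, diff_attr: dict, has_rotation: bool):
    """Rebuild one object's full spatial information from the shared part and
    its different-attribute part; rotation is kept only when the spatial
    rotation information identifier says it is present."""
    info = {**same_attr, **diff_attr}
    if not has_rotation:
        info.pop("rotation", None)
    return info
```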
  • FIG. 14 shows a processing apparatus 1100 for streaming media information according to an embodiment of the present invention.
  • The information processing apparatus 1100 may be a server, and may specifically be a computer device.
  • The apparatus 1100 includes an obtaining module 1101 and a determining module 1102, where:
  • The obtaining module 1101 is configured to acquire target spatial information of a target spatial object, where the target spatial object is one of two spatial objects associated with data of two images included in target video data, the target spatial information includes same-attribute spatial information, the same-attribute spatial information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the other spatial object, other than the target spatial object, among the two spatial objects also includes the same-attribute spatial information.
  • The determining module 1102 is configured to determine, according to the target spatial information acquired by the obtaining module, the video data to be played.
  • The information processing apparatus 1100 may further include a display module (or a display) for displaying the video data to be played.
  • The obtaining module 1101 may be configured to receive the target spatial information from a server.
  • The obtaining module may be a receiving module (or a receiver, or a transceiver).
  • The target spatial information may further include different-attribute spatial information of the target spatial object, and the spatial information of the other spatial object further includes different-attribute spatial information of the other spatial object; the different-attribute spatial information of the target spatial object differs from the different-attribute spatial information of the other spatial object.
  • The target spatial information may include location information of the center point of the target spatial object or location information of the upper-left point of the target spatial object, and the target spatial information may further include the width of the target spatial object and the height of the target spatial object.
  • Correspondingly, the respective spatial information of the two spatial objects may include location information of their respective center points or location information of their respective upper-left points, and may further include the respective widths and the respective heights of the two spatial objects.
  • the target spatial information may include location information of an upper left point of the target spatial object, and location information of a lower right point of the target spatial object.
  • the spatial information of each of the two spatial objects may include location information of respective upper left points of the two spatial objects and position information of respective lower right points of the two spatial objects.
  • the target spatial information can include spatial rotation information for the target spatial object.
  • the spatial information of each of the two spatial objects may include spatial rotation information of the two spatial objects.
  • The target spatial information may be encapsulated in spatial information data or in a spatial information track.
  • the spatial information data may be a code stream of the target video data, metadata of the target video data, or a file independent of the target video data
  • the spatial information track may be a track independent of the target video data.
  • The spatial information data or the spatial information track may further include a spatial information type identifier for indicating the type of the same-attribute spatial information, where the spatial information type identifier is used to indicate the information, in the target spatial information, that belongs to the same-attribute spatial information.
  • When the spatial information type identifier indicates that no information in the target spatial information belongs to the same-attribute spatial information, the same-attribute spatial information may include the minimum value of the width of the target spatial object, the minimum value of the height of the target spatial object, the maximum value of the width of the target spatial object, and the maximum value of the height of the target spatial object.
  • The spatial information type identifier may be encapsulated in the same box as the same-attribute spatial information.
  • The spatial information data or the spatial information track may further include a coordinate system identifier for indicating the coordinate system corresponding to the target spatial information, where the coordinate system is a pixel coordinate system or an angular coordinate system.
  • The coordinate system identifier may be encapsulated in the same box as the same-attribute spatial information.
  • The spatial information data or the spatial information track may further include a spatial rotation information identifier, where the spatial rotation information identifier is used to indicate whether the target spatial information includes spatial rotation information of the target spatial object.
  • the functions of the obtaining module 1101 and the determining module 1102 can be implemented by means of software programming, hardware programming, or circuit implementation, which is not limited herein.
  • FIG. 15 shows a processing apparatus 1200 for streaming media information according to an embodiment of the present invention, where the apparatus includes an obtaining module 1201 and a determining module 1202, where:
  • The obtaining module 1201 is configured to acquire the respective spatial information of two spatial objects associated with data of two images in target video data.
  • The determining module 1202 is configured to determine, according to the respective spatial information of the two spatial objects acquired by the obtaining module, target spatial information of a target spatial object, where the target spatial object is one of the two spatial objects, the target spatial information includes same-attribute spatial information, the same-attribute spatial information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the other spatial object, other than the target spatial object, among the two spatial objects also includes the same-attribute spatial information.
  • The apparatus 1200 may further include a sending module (or a transmitter, or a transceiver) for transmitting the target spatial information determined by the determining module to the client.
  • The target spatial information may further include different-attribute spatial information of the target spatial object, and the spatial information of the other spatial object further includes different-attribute spatial information of the other spatial object; the different-attribute spatial information of the target spatial object differs from the different-attribute spatial information of the other spatial object.
  • The target spatial information may include location information of the center point of the target spatial object or location information of the upper-left point of the target spatial object, and the target spatial information may further include the width of the target spatial object and the height of the target spatial object.
  • Correspondingly, the respective spatial information of the two spatial objects may include location information of their respective center points or location information of their respective upper-left points, and may further include the respective widths and the respective heights of the two spatial objects.
  • the target spatial information may include location information of an upper left point of the target spatial object, and location information of a lower right point of the target spatial object.
  • the spatial information of each of the two spatial objects may include location information of respective upper left points of the two spatial objects and position information of respective lower right points of the two spatial objects.
  • the target spatial information can include spatial rotation information for the target spatial object.
  • the spatial information of each of the two spatial objects may include spatial rotation information of the two spatial objects.
  • The target spatial information may be encapsulated in spatial information data or in a spatial information track, where the spatial information data may be a code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track may be a track independent of the target video data.
  • The spatial information data or the spatial information track may further include a spatial information type identifier for indicating the type of the same-attribute spatial information, where the spatial information type identifier is used to indicate the information, in the target spatial information, that belongs to the same-attribute spatial information.
  • When the spatial information type identifier indicates that no information in the target spatial information belongs to the same-attribute spatial information, the same-attribute spatial information may include the minimum value of the width of the target spatial object, the minimum value of the height of the target spatial object, the maximum value of the width of the target spatial object, and the maximum value of the height of the target spatial object.
  • The spatial information type identifier may be encapsulated in the same box as the same-attribute spatial information.
  • The spatial information data or the spatial information track may further include a coordinate system identifier for indicating the coordinate system corresponding to the target spatial information, where the coordinate system is a pixel coordinate system or an angular coordinate system.
  • The coordinate system identifier may be encapsulated in the same box as the same-attribute spatial information.
  • The spatial information data or the spatial information track may further include a spatial rotation information identifier, where the spatial rotation information identifier is used to indicate whether the target spatial information includes spatial rotation information of the target spatial object.
  • the functions of the obtaining module 1201 and the determining module 1202 can be implemented by means of software programming, and can also be implemented by hardware programming or by a circuit, which is not limited herein.
  • FIG. 16 is a schematic diagram showing the hardware structure of a computer device 1300 according to an embodiment of the present invention.
  • The computer device 1300 can be used as an implementation of the processing apparatus 1100 for streaming media information, and can also be used as an implementation of the processing apparatus 1200 for streaming media information.
  • The computer device 1300 includes a processor 1302, a memory 1304, an input/output interface 1306, a communication interface 1308, and a bus 1310.
  • The processor 1302, the memory 1304, the input/output interface 1306, and the communication interface 1308 are communicatively connected to one another through the bus 1310.
  • the processor 1302 may be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), or one or more integrated circuits for executing related programs.
  • The processor 1302 may be an integrated circuit chip with signal processing capabilities. In an implementation process, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 1302 or by instructions in the form of software.
  • The processor 1302 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The methods, steps, and logical block diagrams disclosed in the embodiments of the present invention may be implemented or carried out.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • The steps of the methods disclosed in the embodiments of the present invention may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor.
  • The software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 1304; the processor 1302 reads the information in the memory 1304 and, in combination with its hardware, performs the functions of the modules included in the processing apparatus 1100 or the processing apparatus 1200 for streaming media information provided by the embodiments of the present invention, or performs the method for processing streaming media information corresponding to FIG. 8 or FIG. 13 provided by the method embodiments of the present invention.
  • The memory 1304 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1304 may store an operating system as well as other application programs.
  • The memory 1304 stores the program code for executing the technical solutions provided by the embodiments of the present invention; the processor 1302 executes this code to perform the operations performed by the modules included in the processing apparatus 1100 or the processing apparatus 1200 for streaming media information, or to perform the method for processing streaming media information provided by the method embodiments of the present invention.
  • The input/output interface 1306 is configured to receive input data and information and to output data such as operation results. It can serve as the obtaining module 1101 in the apparatus 1100, or as the obtaining module 1201 or the sending module in the apparatus 1200.
  • The communication interface 1308 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the computer device 1300 and other devices or communication networks. It can serve as the obtaining module 1101 in the apparatus 1100, or as the obtaining module 1201 or the sending module in the apparatus 1200.
  • Bus 1310 can include a path for communicating information between various components of computer device 1300, such as processor 1302, memory 1304, input/output interface 1306, and communication interface 1308.
  • Although the computer device 1300 shown in FIG. 16 shows only the processor 1302, the memory 1304, the input/output interface 1306, the communication interface 1308, and the bus 1310, persons skilled in the art should understand that, in a specific implementation process, the computer device 1300 further includes other components necessary for normal operation; for example, when used as an implementation of the processing apparatus 1100 for streaming media information, it may further include a display for displaying the video data to be played.
  • Depending on particular needs, the computer device 1300 may also include hardware devices that implement other additional functions.
  • Moreover, the computer device 1300 may include only the components necessary to implement the embodiments of the present invention, and does not necessarily include all of the devices shown in FIG. 16.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A method and apparatus for processing streaming media information, where the method includes: acquiring target spatial information of a target spatial object, the target spatial object being one of two spatial objects associated with data of two images included in target video data, the target spatial information including same-attribute spatial information, the same-attribute spatial information including information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the other spatial object, other than the target spatial object, among the two spatial objects including the same-attribute spatial information; and determining, according to the target spatial information, video data to be played. By substituting one group of same-attribute spatial information for the repeated parts of the respective spatial information of the two spatial objects, redundancy in the spatial information is reduced, thereby reducing the data volume of the spatial information.

Description

INFORMATION PROCESSING METHOD AND APPARATUS
TECHNICAL FIELD
The present invention relates to the field of streaming media processing, and in particular, to an information processing method and apparatus.
BACKGROUND
I. Introduction to the MPEG-DASH technology
In November 2011, the MPEG organization approved the DASH standard. The DASH standard is a technical specification for transmitting media streams based on the HTTP protocol (hereinafter referred to as the DASH technical specification). The DASH technical specification mainly consists of two major parts: the media presentation description (Media Presentation Description, MPD) and the media file format (file format).
1. Media file format
The media file format is a type of file format. In DASH, the server prepares multiple versions of code streams for the same video content, and each version of the code stream is called a representation in the DASH standard. A representation is a collection and encapsulation of one or more code streams in a transmission format, and one representation contains one or more segments. The encoding parameters, such as bit rate and resolution, of different versions of the code streams may differ; each code stream is divided into multiple small files, each of which is called a segment. The client may switch between different media representations while requesting media segment data. A segment may be encapsulated in the format of the standard ISO/IEC 14496-12 (ISO BMFF, Base Media File Format) or in the format of ISO/IEC 13818-1 (MPEG-2 TS).
2. Media presentation description
In the DASH standard, the media presentation description is called the MPD. The MPD may be an XML file, and the information in the file is described hierarchically, as shown in FIG. 1; the information at an upper level is fully inherited by the next level. The file describes media metadata that enables the client to learn the media content information on the server and to use this information to construct the http-URLs for requesting segments.
In the DASH standard, a media presentation is a collection of structured data that presents media content; a media presentation description is a file that describes a media presentation in a standardized way and is used to provide a streaming media service. A period is such that a group of consecutive periods forms the entire media presentation, and periods are continuous and non-overlapping. In the MPD, a representation is a collection and encapsulation of the description information of one or more code streams in the transmission format, and one representation contains one or more segments. An adaptation set (AdaptationSet) represents a collection of multiple mutually interchangeable encoded versions of the same media content component, and one adaptation set contains one or more representations. A subset is a combination of a group of adaptation sets; when a player plays all the adaptation sets in it, the corresponding media content can be obtained. Segment information is the media unit referenced by an HTTP uniform resource locator in the media presentation description; it describes the segments of the media data, which may be stored in one file or stored separately; in one possible manner, the MPD stores the segments of the media data.
For the technical concepts related to the MPEG-DASH technology in the present invention, reference may be made to the relevant provisions in ISO/IEC 23009-1, Information technology--Dynamic adaptive streaming over HTTP (DASH)--Part 1: Media presentation description and segment formats, or to the relevant provisions in historical versions of that standard, such as ISO/IEC 23009-1:2013 or ISO/IEC 23009-1:2012.
II. Introduction to virtual reality (VR) technology
Virtual reality technology is a computer simulation system that can create and let users experience a virtual world. It uses a computer to generate a simulated environment and is an interactive, system-level simulation of three-dimensional dynamic scenes and entity behaviors with multi-source information fusion, immersing the user in that environment. VR mainly covers the simulated environment, perception, natural skills, and sensing devices. The simulated environment is a computer-generated, real-time, dynamic, three-dimensional, photorealistic image. Perception means that an ideal VR system should have all the kinds of perception a human has: in addition to the visual perception generated by computer graphics technology, there are also hearing, touch, force, motion, and even smell and taste, which is also called multi-perception. Natural skills refer to the turning of a person's head, the eyes, gestures, or other human behaviors; the computer processes data adapted to the participant's actions, responds to the user's input in real time, and feeds the responses back to the user's senses. Sensing devices are three-dimensional interaction devices. When a VR video (or a 360-degree video, or an omnidirectional video) is presented on a head-mounted device or a handheld device, only the video image corresponding to the orientation of the user's head, together with the associated audio, is presented.
The difference between a VR video and a normal video is that the entire video content of a normal video is presented to the user, whereas only a subset of the entire VR video is presented to the user (in VR typically only a subset of the entire video region represented by the video pictures).
III. Introduction to spatial information
In the existing standard, the original description of spatial information is: "The SRD scheme allows Media Presentation authors to express spatial relationships between Spatial Objects. A Spatial Object is defined as a spatial part of a content component (e.g. a region of interest, or a tile) and represented by either an Adaptation Set or a Sub-Representation."
In other words, spatial information describes the spatial relationships between spatial objects. A spatial object is defined as a part of the space of a content component, for example an existing region of interest (ROI) or a tile; the spatial relationships may be described in an Adaptation Set and a Sub-Representation. In the existing standard, the spatial information of spatial objects can be described in the MPD.
In the ISO/IEC 14496-12 (2012) standard document, a file consists of many boxes and full boxes (FullBox). Each box consists of a header and data. A FullBox is an extension of a box. The header contains the length (size) and the type of the entire box. When size == 0, the box is the last one in the file; when size == 1, the box length needs more bits to describe. The data is the actual data of the box, which may be pure data or further child boxes.
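As an illustration of the box layout just described, the following minimal Python sketch (a reading aid under the stated size/type rules, not part of the standard) walks the top-level boxes of an ISO BMFF file:

```python
import struct

def iter_boxes(f, file_size):
    """Yield (type, offset, size) for each top-level ISO BMFF box: a box
    starts with a 32-bit size and a 4-byte type; size == 1 means a 64-bit
    largesize follows; size == 0 means the box extends to end of file."""
    while f.tell() < file_size:
        start = f.tell()
        header = f.read(8)
        if len(header) < 8:
            return
        size, box_type = struct.unpack(">I4s", header)
        if size == 1:                       # 64-bit largesize follows
            size = struct.unpack(">Q", f.read(8))[0]
        elif size == 0:                     # last box: runs to end of file
            size = file_size - start
        yield box_type.decode("ascii", "replace"), start, size
        f.seek(start + size)                # skip payload / child boxes

# Usage (assumed file name): with open("a.mp4", "rb") as f:
#     boxes = list(iter_boxes(f, file_size))
```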
In the ISO/IEC 14496-12 (2012) standard document, a "tref box" is used to describe the relationship between tracks. For example, an MP4 file may contain three video tracks with IDs 2, 3, and 4, and three audio tracks with IDs 6, 7, and 8. Track 2 and track 6 can be bound together for playback by specifying this in the tref box of track 2 or track 6.
Under current standards such as ISO/IEC 23000-20, the reference type (reference_type) used to link a media content track and a metadata track is 'cdsc'; for example, if an associated track is parsed in a video track and the association type is 'cdsc', the associated track is a metadata track used to describe the video track. In practice, however, there are many kinds of metadata describing media content, and different kinds offer the user different usage methods. With the existing method, the client has to finish parsing all the tracks contained in the file before it can determine, from the reference type linking the media content track and the metadata track, the attributes of the tracks associated with the media content, and thus determine which attributes the video tracks have and which experiences the different attributes can offer the user. In other words, with the existing method, determining what operations a client can perform when presenting a media track depends on all tracks in the file having been parsed, which increases the complexity of the client implementation process.
With the continuous development and improvement of virtual reality (VR) technology, applications for viewing VR video with a 360-degree field of view and the like are increasingly presented to users. During VR video viewing, the user may change the field of view (FOV) at any time; each field of view corresponds to the video data of one spatial object (which can be understood as a region in the VR video), and when the field of view is switched, the VR video image presented within the user's field of view should switch accordingly.
In the prior art, when a VR video is presented, the video data of spatial objects that can cover the field of view of the human eye is presented. The spatial object viewed by a user may be a region of interest that most users choose to watch, or a region designated by the video producer, and that region changes continuously over time. To identify the position, in the VR video, of the spatial object associated with the image data, the spatial information describing that position needs to be encapsulated in a corresponding file. Because the image data in video data corresponds to a large number of images, the large amount of spatial information for these many images results in an excessive data volume.
SUMMARY
Embodiments of the present invention provide a method and apparatus for processing streaming media information, which can reduce the data volume of spatial information.
The foregoing objects and other objects are achieved by the features of the independent claims. Further implementations are embodied in the dependent claims, the description, and the accompanying drawings.
According to a first aspect, a method for processing streaming media information is provided, where the method includes:
acquiring target spatial information of a target spatial object, where the target spatial object is one of two spatial objects associated with data of two images included in target video data, the target spatial information includes same-attribute spatial information, the same-attribute spatial information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the other spatial object, other than the target spatial object, among the two spatial objects includes the same-attribute spatial information; and
determining, according to the target spatial information, video data to be played.
The target video data may be a code stream, that is, a target video code stream, or may be original data that has not been encoded or reconstructed data, that is, a target video sequence. When the target video data is a target video code stream, the data of the two images may be encoded data of the two images; when the target video data is a target video sequence, the data of the two images is original data or reconstructed data of the two images.
The two images may be in one-to-one correspondence with the two spatial objects.
The same-attribute spatial information is one group of information shared by the respective spatial information of the two spatial objects.
Acquiring the target spatial information of the target spatial object may include receiving the target spatial information of the target spatial object from a server.
The two images may be two frames in a video sequence, that is, the two images correspond to different moments; the two images may also be sub-images of a same frame in the video sequence, that is, the two images correspond to the same moment; or the two images may be sub-images of different frames in the video sequence.
By using one group of same-attribute spatial information to represent the repeated parts of the respective spatial information of the two spatial objects, redundancy in the spatial information is reduced, thereby reducing the data volume of the spatial information.
Determining, according to the target spatial information, the video data to be played may include: determining, according to the target spatial information, whether the target spatial object includes all or part of the spatial object corresponding to the picture to be played; when the target spatial object includes all or part of the spatial object corresponding to the picture to be played, the target video data is the video data to be played.
Determining, according to the target spatial information, the video data to be played may also include: determining, according to the target spatial information and the spatial relationship of the spatial object corresponding to the picture to be played relative to the target spatial object (in other words, the trajectory of field-of-view switching), the spatial information of the spatial object corresponding to the picture to be played (in other words, the spatial object after field-of-view switching), so as to further determine the video data to be played. Specifically, the video data to be played may be a video code stream to be played. After the target spatial information of the target spatial object is acquired, the relative position of the target spatial object in the panoramic space (or panoramic spatial object) can be determined; then, during video playback, the position of the spatial object after field-of-view switching can be determined according to the target spatial information of the target spatial object and the trajectory of field-of-view switching, and the video code stream to be played corresponding to the spatial object of the picture to be played is further requested from the server. In a specific implementation, a request for acquiring the video code stream to be played may be sent to the server according to information described in the MPD, such as the URL of the code stream of each spatial object, so that the video code stream to be played is acquired, decoded, and played.
According to the first aspect, in a first possible implementation of the first aspect, the target spatial information further includes different-attribute spatial information of the target spatial object, the spatial information of the other spatial object also includes different-attribute spatial information of the other spatial object, and the different-attribute spatial information of the target spatial object differs from the different-attribute spatial information of the other spatial object.
The difference may mean that the values of the two pieces of different-attribute spatial information differ.
According to the first aspect or the first implementation of the first aspect, in a second possible implementation of the first aspect, the target spatial information includes location information of the center point of the target spatial object or location information of the upper-left point of the target spatial object, and the target spatial information further includes the width of the target spatial object and the height of the target spatial object.
The target spatial information may also include location information of another position point in the target spatial object (the lower-left point, the upper-right point, the lower-right point, or a preset point) instead of the location information of the center point or the upper-left point of the target spatial object.
The upper-left point is the point in the target spatial object whose horizontal coordinate value and vertical coordinate value are both minimum.
When the coordinate system corresponding to the target spatial information is an angular coordinate system, the location information of the center point or the upper-left point may be a pitch angle θ (pitch) and a yaw angle ψ (yaw), or a pitch angle θ (pitch), a yaw angle ψ (yaw), and a roll angle Φ (roll); when the coordinate system corresponding to the target spatial information is a pixel coordinate system, the location information may be a horizontal coordinate in pixels and a vertical coordinate in pixels.
According to the first aspect or the first implementation of the first aspect, in a third possible implementation of the first aspect, the target spatial information includes location information of the upper-left point of the target spatial object and location information of the lower-right point of the target spatial object.
The target spatial information may also include location information of the upper-right point of the target spatial object and location information of the lower-left point of the target spatial object.
According to the first aspect or any one of the first to third implementations of the first aspect, in a fourth possible implementation of the first aspect, the target spatial information includes spatial rotation information of the target spatial object.
The spatial rotation information of the target spatial object may be used to indicate the degree to which the target spatial object is rotated relative to the horizontal coordinate axis or the vertical coordinate axis of a panoramic spatial object, where the target spatial object is located within the panoramic spatial object.
When the coordinate system corresponding to the target spatial information is an angular coordinate system, the spatial rotation information may be a roll angle Φ (roll); when the coordinate system is a pixel coordinate system, the spatial rotation information may be represented by a motion vector, in pixels, of a position point in the target spatial object, converted from the roll angle.
According to the first aspect or any one of the first to fourth implementations of the first aspect, in a fifth possible implementation of the first aspect, the target spatial information is encapsulated in spatial information data or in a spatial information track, where the spatial information data is the code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track is a track independent of the target video data.
The file independent of the target video data may be a spatial information file used to describe spatial information. The track independent of the target video data may be a spatial information track used to describe spatial information.
When the target spatial information is encapsulated in the code stream of the target video data, it may be encapsulated in a supplemental enhancement information unit or a parameter set unit in the code stream of the target video data, or in a segment of the representation in which the target video data is located; specifically, it may be encapsulated in a specified box in the segment (for example, a trun box or a tfhd box).
The same-attribute spatial information and the different-attribute spatial information of the target spatial object may be encapsulated in the same box or in different boxes. Specifically, the same-attribute spatial information may be encapsulated in a 3dsc box, and the different-attribute spatial information of the target spatial object may be encapsulated in an mdat box.
According to the fifth implementation of the first aspect, in a sixth possible implementation of the first aspect, the spatial information data or the spatial information track further includes a spatial information type identifier used to indicate the type of the same-attribute spatial information, where the spatial information type identifier is used to indicate the information, in the target spatial information, that belongs to the same-attribute spatial information.
By definition, the same-attribute spatial information and the different-attribute spatial information of the target spatial object are complementary; therefore, when the spatial information type identifier indicates the information that belongs to the same-attribute spatial information, it can likewise indicate the information, in the target spatial information, that belongs to the different-attribute spatial information of the target spatial object.
The spatial information type identifier may also be used to indicate the spatial information type of the same-attribute spatial information or of the different-attribute spatial information of the target spatial object. Optional spatial information types may include, but are not limited to: spatial information that includes the location information of a spatial object but not its width and height information, spatial information that includes the width and height information of a spatial object but not its location information, and spatial information that includes both the width and height information and the location information of a spatial object.
The spatial information type identifier may also be used to indicate the spatial object types of the two spatial objects. Optional spatial object types may include, but are not limited to: a spatial object whose location and whose width and height are unchanged, a spatial object whose location changes and whose width and height are unchanged, a spatial object whose location is unchanged and whose width and height change, and a spatial object whose location and whose width and height both change.
For the second implementation of the first aspect, when the spatial information type identifier is a first preset value, it indicates that the information, in the target spatial information, belonging to the same-attribute spatial information is the location information of the center point of the target spatial object or the location information of its upper-left point, together with the width and the height of the target spatial object. When the identifier is a second preset value, it indicates that the information belonging to the same-attribute spatial information is the width and the height of the target spatial object. When the identifier is a third preset value, it indicates that no information in the target spatial information belongs to the same-attribute spatial information.
Correspondingly, for the second implementation of the first aspect, when the spatial information type identifier is the first preset value, the identifier also indicates that no different-attribute spatial information exists. When it is the second preset value, the identifier also indicates that the different-attribute spatial information of the target spatial object is the location information of the center point or of the upper-left point of the target spatial object. When it is the third preset value, the identifier also indicates that the different-attribute spatial information of the target spatial object is the location information of the center point or of the upper-left point of the target spatial object, together with the width and the height of the target spatial object.
For the third implementation of the first aspect, when the spatial information type identifier is a fourth preset value, it indicates that the information, in the target spatial information, belonging to the same-attribute spatial information is the location information of the upper-left point and the location information of the lower-right point of the target spatial object. When the identifier is a fifth preset value, it indicates that the information belonging to the same-attribute spatial information is the location information of the lower-right point of the target spatial object. When the identifier is a sixth preset value, it indicates that no information in the target spatial information belongs to the same-attribute spatial information. It should be noted that the location information of the upper-left point or of the lower-right point of the target spatial object may be replaced by the width and the height of the target spatial object.
Correspondingly, for the third implementation of the first aspect, when the spatial information type identifier is the fourth preset value, the identifier also indicates that no different-attribute spatial information exists. When it is the fifth preset value, the identifier also indicates that the different-attribute spatial information of the target spatial object is the location information of the center point or of the upper-left point of the target spatial object. When it is the sixth preset value, the identifier also indicates that the different-attribute spatial information of the target spatial object is the location information of the upper-left point and the location information of the lower-right point of the target spatial object. It should be noted that the location information of the upper-left point or of the lower-right point of the target spatial object may be replaced by the width and the height of the target spatial object.
According to the sixth implementation of the first aspect, in a seventh possible implementation of the first aspect, when the spatial information type identifier indicates that no information in the target spatial information belongs to the same-attribute spatial information, the same-attribute spatial information includes the minimum value of the width of the target spatial object, the minimum value of its height, the maximum value of its width, and the maximum value of its height.
According to the sixth or seventh implementation of the first aspect, in an eighth possible implementation of the first aspect, the spatial information type identifier and the same-attribute spatial information are encapsulated in the same box.
According to any one of the fifth to eighth implementations of the first aspect, in a ninth possible implementation of the first aspect, the spatial information data or the spatial information track further includes a coordinate system identifier used to indicate the coordinate system corresponding to the target spatial information, where the coordinate system is a pixel coordinate system or an angular coordinate system.
When the coordinate system is a pixel coordinate system, the location information is expressed with coordinates in pixels, and the width and height are likewise expressed in pixels. When the coordinate system is an angular coordinate system, the location information is expressed with angles, for example a pitch angle θ (pitch) and a yaw angle ψ (yaw), or a pitch angle θ (pitch), a yaw angle ψ (yaw), and a roll angle Φ (roll); the width and height express angular ranges in angles.
According to the ninth implementation of the first aspect, in a tenth possible implementation of the first aspect, the coordinate system identifier and the same-attribute spatial information are encapsulated in the same box.
According to any one of the fifth to tenth implementations of the first aspect, in an eleventh possible implementation of the first aspect, the spatial information data or the spatial information track further includes a spatial rotation information identifier, used to indicate whether the target spatial information includes the spatial rotation information of the target spatial object.
The spatial rotation information identifier may be encapsulated in the same box as the same-attribute spatial information, or in the same box as the different-attribute spatial information of the target spatial object. Specifically, when the spatial rotation information identifier and the different-attribute spatial information of the target spatial object are encapsulated in the same box, and the identifier indicates that the target spatial information includes the spatial rotation information of the target spatial object, the different-attribute spatial information of the target spatial object includes the spatial rotation information.
When the spatial rotation information identifier indicates that the target spatial information includes the spatial rotation information of the target spatial object, the target spatial object has rotation; when the identifier indicates that the target spatial information does not include the spatial rotation information of the target spatial object, the target spatial object has no rotation.
According to a second aspect, a method for processing streaming media information is provided, where the method includes:
acquiring the respective spatial information of two spatial objects associated with data of two images in target video data; and
determining, according to the respective spatial information of the two spatial objects, target spatial information of a target spatial object, where the target spatial object is one of the two spatial objects, the target spatial information includes same-attribute spatial information, the same-attribute spatial information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the other spatial object, other than the target spatial object, among the two spatial objects includes the same-attribute spatial information.
The method may further include: sending the target spatial information to a client.
According to the second aspect, in a first possible implementation of the second aspect, the target spatial information may further include different-attribute spatial information of the target spatial object, the spatial information of the other spatial object also includes different-attribute spatial information of the other spatial object, and the different-attribute spatial information of the target spatial object differs from the different-attribute spatial information of the other spatial object.
According to the second aspect or the first implementation of the second aspect, in a second possible implementation of the second aspect, the target spatial information may include location information of the center point or of the upper-left point of the target spatial object, and the target spatial information may further include the width and the height of the target spatial object.
Correspondingly, the respective spatial information of the two spatial objects may include location information of their respective center points or of their respective upper-left points, and may further include their respective widths and their respective heights.
According to the second aspect or the first implementation of the second aspect, in a third possible implementation of the second aspect, the target spatial information may include location information of the upper-left point of the target spatial object and location information of the lower-right point of the target spatial object.
Correspondingly, the respective spatial information of the two spatial objects may include location information of their respective upper-left points and location information of their respective lower-right points.
According to the second aspect or any one of the first to third implementations of the second aspect, in a fourth possible implementation of the second aspect, the target spatial information may include spatial rotation information of the target spatial object.
Correspondingly, the respective spatial information of the two spatial objects may include their respective spatial rotation information.
According to the second aspect or any one of the first to fourth implementations of the second aspect, in a fifth possible implementation of the second aspect, the target spatial information may be encapsulated in spatial information data or in a spatial information track, where the spatial information data may be the code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track may be a track independent of the target video data.
According to the fifth implementation of the second aspect, in a sixth possible implementation of the second aspect, the spatial information data or the spatial information track may further include a spatial information type identifier used to indicate the type of the same-attribute spatial information, where the spatial information type identifier is used to indicate the information, in the target spatial information, that belongs to the same-attribute spatial information.
According to the sixth implementation of the second aspect, in a seventh possible implementation of the second aspect, when the spatial information type identifier indicates that no information in the target spatial information belongs to the same-attribute spatial information, the same-attribute spatial information may include the minimum value of the width of the target spatial object, the minimum value of its height, the maximum value of its width, and the maximum value of its height.
According to the sixth or seventh implementation of the second aspect, in an eighth possible implementation of the second aspect, the spatial information type identifier may be encapsulated in the same box as the same-attribute spatial information.
According to any one of the fifth to eighth implementations of the second aspect, in a ninth possible implementation of the second aspect, the spatial information data or the spatial information track may further include a coordinate system identifier used to indicate the coordinate system corresponding to the target spatial information, where the coordinate system is a pixel coordinate system or an angular coordinate system.
According to the ninth implementation of the second aspect, in a tenth possible implementation of the second aspect, the coordinate system identifier may be encapsulated in the same box as the same-attribute spatial information.
According to any one of the fifth to tenth implementations of the second aspect, in an eleventh possible implementation of the second aspect, the spatial information data or the spatial information track may further include a spatial rotation information identifier, used to indicate whether the target spatial information includes the spatial rotation information of the target spatial object.
According to a third aspect, an apparatus for processing streaming media information is provided, where the apparatus includes:
an obtaining module, configured to acquire target spatial information of a target spatial object, where the target spatial object is one of two spatial objects associated with data of two images included in target video data, the target spatial information includes same-attribute spatial information, the same-attribute spatial information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the other spatial object, other than the target spatial object, among the two spatial objects includes the same-attribute spatial information; and
a determining module, configured to determine, according to the target spatial information acquired by the obtaining module, video data to be played.
The obtaining module may be configured to receive the target spatial information from a server.
According to the third aspect, in a first possible implementation of the third aspect, the target spatial information further includes different-attribute spatial information of the target spatial object, the spatial information of the other spatial object also includes different-attribute spatial information of the other spatial object, and the different-attribute spatial information of the target spatial object differs from the different-attribute spatial information of the other spatial object.
According to the third aspect or the first implementation of the third aspect, in a second possible implementation of the third aspect, the target spatial information includes location information of the center point or of the upper-left point of the target spatial object, and the target spatial information further includes the width and the height of the target spatial object.
According to the third aspect or the first implementation of the third aspect, in a third possible implementation of the third aspect, the target spatial information includes location information of the upper-left point of the target spatial object and location information of the lower-right point of the target spatial object.
According to the third aspect or any one of the first to third implementations of the third aspect, in a fourth possible implementation of the third aspect, the target spatial information includes spatial rotation information of the target spatial object.
According to the third aspect or any one of the first to fourth implementations of the third aspect, in a fifth possible implementation of the third aspect, the target spatial information is encapsulated in spatial information data or in a spatial information track, where the spatial information data is the code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track is a track independent of the target video data.
According to the fifth implementation of the third aspect, in a sixth possible implementation of the third aspect, the spatial information data or the spatial information track further includes a spatial information type identifier used to indicate the type of the same-attribute spatial information, where the spatial information type identifier is used to indicate the information, in the target spatial information, that belongs to the same-attribute spatial information.
According to the sixth implementation of the third aspect, in a seventh possible implementation of the third aspect, when the spatial information type identifier indicates that no information in the target spatial information belongs to the same-attribute spatial information, the same-attribute spatial information includes the minimum value of the width of the target spatial object, the minimum value of its height, the maximum value of its width, and the maximum value of its height.
According to the sixth or seventh implementation of the third aspect, in an eighth possible implementation of the third aspect, the spatial information type identifier and the same-attribute spatial information are encapsulated in the same box.
According to any one of the fifth to eighth implementations of the third aspect, in a ninth possible implementation of the third aspect, the spatial information data or the spatial information track further includes a coordinate system identifier used to indicate the coordinate system corresponding to the target spatial information, where the coordinate system is a pixel coordinate system or an angular coordinate system.
According to the ninth implementation of the third aspect, in a tenth possible implementation of the third aspect, the coordinate system identifier and the same-attribute spatial information are encapsulated in the same box.
According to any one of the fifth to tenth implementations of the third aspect, in an eleventh possible implementation of the third aspect, the spatial information data or the spatial information track further includes a spatial rotation information identifier, used to indicate whether the target spatial information includes the spatial rotation information of the target spatial object.
According to a fourth aspect, an apparatus for processing streaming media information is provided, where the apparatus includes:
an obtaining module, configured to acquire the respective spatial information of two spatial objects associated with data of two images in target video data; and
a determining module, configured to determine, according to the respective spatial information of the two spatial objects acquired by the obtaining module, target spatial information of a target spatial object, where the target spatial object is one of the two spatial objects, the target spatial information includes same-attribute spatial information, the same-attribute spatial information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the other spatial object, other than the target spatial object, among the two spatial objects includes the same-attribute spatial information.
The apparatus may further include a sending module, configured to send the target spatial information determined by the determining module to a client.
According to the fourth aspect, in a first possible implementation of the fourth aspect, the target spatial information may further include different-attribute spatial information of the target spatial object, the spatial information of the other spatial object also includes different-attribute spatial information of the other spatial object, and the different-attribute spatial information of the target spatial object differs from the different-attribute spatial information of the other spatial object.
According to the fourth aspect or the first implementation of the fourth aspect, in a second possible implementation of the fourth aspect, the target spatial information may include location information of the center point or of the upper-left point of the target spatial object, and the target spatial information may further include the width and the height of the target spatial object.
Correspondingly, the respective spatial information of the two spatial objects may include location information of their respective center points or of their respective upper-left points, and may further include their respective widths and their respective heights.
According to the fourth aspect or the first implementation of the fourth aspect, in a third possible implementation of the fourth aspect, the target spatial information may include location information of the upper-left point of the target spatial object and location information of the lower-right point of the target spatial object.
Correspondingly, the respective spatial information of the two spatial objects may include location information of their respective upper-left points and location information of their respective lower-right points.
According to the fourth aspect or any one of the first to third implementations of the fourth aspect, in a fourth possible implementation of the fourth aspect, the target spatial information may include spatial rotation information of the target spatial object.
Correspondingly, the respective spatial information of the two spatial objects may include their respective spatial rotation information.
According to the fourth aspect or any one of the first to fourth implementations of the fourth aspect, in a fifth possible implementation of the fourth aspect, the target spatial information may be encapsulated in spatial information data or in a spatial information track, where the spatial information data may be the code stream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track may be a track independent of the target video data.
According to the fifth implementation of the fourth aspect, in a sixth possible implementation of the fourth aspect, the spatial information data or the spatial information track may further include a spatial information type identifier used to indicate the type of the same-attribute spatial information, where the spatial information type identifier is used to indicate the information, in the target spatial information, that belongs to the same-attribute spatial information.
According to the sixth implementation of the fourth aspect, in a seventh possible implementation of the fourth aspect, when the spatial information type identifier indicates that no information in the target spatial information belongs to the same-attribute spatial information, the same-attribute spatial information may include the minimum value of the width of the target spatial object, the minimum value of its height, the maximum value of its width, and the maximum value of its height.
According to the sixth or seventh implementation of the fourth aspect, in an eighth possible implementation of the fourth aspect, the spatial information type identifier may be encapsulated in the same box as the same-attribute spatial information.
According to any one of the fifth to eighth implementations of the fourth aspect, in a ninth possible implementation of the fourth aspect, the spatial information data or the spatial information track may further include a coordinate system identifier used to indicate the coordinate system corresponding to the target spatial information, where the coordinate system is a pixel coordinate system or an angular coordinate system.
According to the ninth implementation of the fourth aspect, in a tenth possible implementation of the fourth aspect, the coordinate system identifier may be encapsulated in the same box as the same-attribute spatial information.
According to any one of the fifth to tenth implementations of the fourth aspect, in an eleventh possible implementation of the fourth aspect, the spatial information data or the spatial information track may further include a spatial rotation information identifier, used to indicate whether the target spatial information includes the spatial rotation information of the target spatial object.
According to a fifth aspect, an apparatus for processing streaming media information is provided, including a processor and a memory, where the memory is configured to store code, and the processor reads the code stored in the memory to perform the method provided by the first aspect.
According to a sixth aspect, a computer storage medium is provided for storing computer software instructions to be executed by the processor of the fifth aspect to perform the method provided by the first aspect.
According to a seventh aspect, an apparatus for processing streaming media information is provided, including a processor and a memory, where the memory is configured to store code, and the processor reads the code stored in the memory to perform the method provided by the second aspect.
According to an eighth aspect, a computer storage medium is provided for storing computer software instructions to be executed by the processor of the seventh aspect to perform the method provided by the second aspect.
In a possible implementation of the embodiments of the present invention, a manner of describing the reference type between media data and metadata is disclosed. On the basis of the reference types specified in the existing standard draft, different reference types are defined for different usage methods of the metadata, which makes it convenient for the client to perform corresponding processing according to the reference type.
In a possible implementation of the embodiments of the present invention, the reference type between the media data and the metadata is stored in the track of the media data or in the track of the metadata; in a possible implementation, the reference type between the media data and the metadata is conveyed in the form of a box. For the definitions of track and box, reference may be made to the relevant provisions of the existing MPEG-DASH standard and ISO/IEC 14496-12, which are not repeated here. In an example, the reference type information may be stored in a 'tref' box.
In a possible implementation of the embodiments of the present invention, the media data is video data, and the reference type between the media data and the metadata is stored in the video track; for example, the tref box is stored in the metadata describing the video track.
In an example, the track that includes the tref box is the referenced video track, and it is associated with a metadata track through the reference type between the media data and the metadata in the tref box. The associated metadata track may be determined by its track id.
In a possible implementation of the embodiments of the present invention, the reference type may be used to describe one or more of the following information:
a region of interest (ROI) in the media data, the spatial region covered by the media data, quality information associated with the ROI in the media data, and quality information associated with the spatial region covered by the media data.
In an example of the embodiments of the present invention, the reference type may be used to describe one or more of the following information:
the spatial position information, on the sphere, on the 2D plane, or in the mapped image, of the ROI in the spatial object corresponding to the media data;
or the spatial position information, on the sphere, on the 2D plane, or in the mapped image, of the region covered by the media data;
or the quality information of the spatial region of the foregoing ROI or covered region.
The foregoing ROI information is contained in a timed metadata track of the ROI, and the quality information is contained in a timed metadata track of quality.
In an example of the embodiments of the present invention, the 'tref' box of the track of the media data includes the reference type between the media data and the metadata. The reference type may be used to describe one or more of the following information:
the 2D spatial position information of the ROI in the spatial object corresponding to the media data, the spatial position information of that ROI on the sphere, the spatial position information of that ROI in the mapped image, the 2D spatial position information of the spatial object corresponding to the media data, the spatial position information of that spatial object on the sphere, the spatial position information of that spatial object in the mapped image, the quality information of the 2D spatial position of the ROI in the spatial object corresponding to the media data, the quality information of the spatial position of that ROI on the sphere, the quality information of the spatial position of that ROI in the mapped image, the quality information of the 2D spatial position of the spatial object corresponding to the media data, the quality information of the spatial position of that spatial object on the sphere, or the quality information of the spatial position of that spatial object in the mapped image.
In an example of the embodiments of the present invention, a reference_type value of rois means that the associated track contains the region information of the ROI on the sphere; this ROI region information describes the spatial region of the images corresponding to the samples in the referenced video track. The client can obtain the ROI region information by parsing the samples in the timed metadata track of the ROI, and use the ROI information to present the ROI of the images corresponding to the samples in the referenced video track (the client can use the sample in this track to render the ROI on the sphere).
In an example of the embodiments of the present invention, a reference_type value of roiq means that the associated track contains the quality information, on the sphere, of the ROI of the images corresponding to the samples in the referenced video track (this track contains the quality information of the ROI on the sphere for the referenced video track). The client can learn the quality of the ROI of the associated video data by parsing the samples in the timed metadata track of quality (the client can use the sample in this track to know the quality of the ROI object on the sphere).
In an example of the embodiments of the present invention, a reference_type value of conc means that the associated track contains the coverage information, on the sphere, of the images corresponding to the samples in the video track (this track provides information on the area on the spherical surface for the referenced video track). The associated metadata track may be a recommended viewport timed metadata track, and the referenced video track may be a director's cut video track.
In an example of the embodiments of the present invention, the correspondence between the values of the reference type (reference_type) and the described information is shown in Table 1:
[Table 1 is provided as an image in the original publication: Figure PCTCN2017078585-appb-000001]
In the embodiments of the present invention, on the server side, when the media data encapsulates the syntax of the 'tref' box, the reference type of the metadata associated with the media data is encapsulated. After parsing the data of the media track, the client can learn, from the information described by the reference type between the media data and the metadata, what processing it can perform on the video data. This method decouples the parsing between the individual tracks and reduces the complexity of the client implementation process. The client can request the corresponding track data according to different processing needs.
The ROI in the embodiments of the present invention may be a field of view, or a recommended field of view, for example an author's field of view.
In an embodiment of the present invention, the field of view or recommended field of view may also refer to a coverage region, where the coverage region is the spatial range of the spatial object corresponding to the media data.
In an implementation of the embodiments of the present invention, the track of the media data includes the reference type of the metadata associated with the media data, and the reference type may describe the 2D spatial position of the ROI in the spatial object corresponding to the media data, or the spatial position of that ROI on the sphere, or the spatial position of that ROI in the mapped image.
In a specific example, the track of the media data includes a 'tref' box, where:
the reference_type in the 'tref' box takes the value ri2d, which indicates that the media data is associated with a timed metadata track of an ROI, and the samples in the track of the ROI are the 2D spatial position information of the ROI in the spatial object corresponding to the media data; the 2D position information may be a position defined in the existing standard ISO/IEC 23001-10;
or the reference_type takes the value rois, and the samples in the track of the ROI are the spatial position information, on the sphere, of the ROI in the spatial object corresponding to the media data; the spatial position information on the sphere may be a sample in a timed metadata track on the sphere as defined in the existing standard ISO/IEC 23000-20;
or the reference_type takes the value ri2p, which indicates that the media data is associated with a timed metadata track of an ROI, and the samples in the track of the ROI are the spatial position information, in the mapped image, of the ROI in the spatial object corresponding to the media data.
In a corresponding example, the client may process as follows:
1. The client parses the track of the media data, whose track id is 1 (it may be any nonzero integer), and obtains the 'tref' box of the track; from the 'tref' box it obtains the associated track whose reference_type value is 'ri2d', 'rois', or 'ri2p', and the track id of the associated track is 2 (it may be any nonzero integer).
2. From 'ri2d', the client determines that the track with track id 2 describes the 2D spatial position information of the ROI in the spatial object corresponding to the media data; or, from 'rois', that it describes the spatial position information of that ROI on the sphere; or, from 'ri2p', that it describes the spatial position information of that ROI in the mapped image.
3. The client may provide an ROI option on the user interface for the user to choose whether to watch the content of the ROI region; if the user chooses to watch it, the client presents the ROI region content; alternatively, the client may present the ROI content directly.
4. Before presenting the ROI content, the client parses the timed metadata track with track id = 2, obtains the spatial position information of the ROI from that track, and crops the ROI content out of the video content for presentation according to the spatial position information. Because the type of the metadata referenced by a track is clearly described in the tref metadata of the track, parsing between the client's tracks is decoupled, reducing the complexity of the client implementation process.
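A compact sketch of the dispatch in steps 1 to 4 follows (Python; the player methods are hypothetical stand-ins for the client's own box parser and rendering pipeline):

```python
ROI_REFERENCE_TYPES = {"ri2d": "2D position of the ROI",
                       "rois": "position of the ROI on the sphere",
                       "ri2p": "position of the ROI in the mapped image"}

def handle_tref(reference_type: str, metadata_track_id: int, player):
    """Decide from the 'tref' reference type alone how to treat the linked
    timed metadata track, without parsing every track in the file first."""
    if reference_type not in ROI_REFERENCE_TYPES:
        return                              # other types handled elsewhere
    # Parse the referenced timed metadata track only if the user opts in
    # to watching the ROI content (steps 3 and 4 above).
    if player.user_wants_roi():
        roi = player.parse_timed_metadata(metadata_track_id)
        player.crop_and_render(roi)         # crop the ROI out of the video
```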
In a possible implementation of the embodiments of the present invention, as shown in FIG. 17, the coverage information describes the region that the media content of the current track crops out of the entire source video content. The region may be a region in the VR sphere, a region in a 2D image, or a region cropped after the sphere is mapped to a 2D image. As illustrated in FIG. 17, the spatial position information of the small image on the right of FIG. 17(a) is its coverage information within the large image on the left; in the spherical diagram of FIG. 17(a), the region that the gray area crops out of the sphere is the coverage region of the gray area. The reference type of the media track of the coverage region is additionally described in the 'tref' box, and the description information indicates that the metadata is the 2D spatial position information of the spatial object corresponding to the media data, or the spatial position information of that spatial object on the sphere, or the spatial position information of that spatial object in the mapped image.
In a possible manner of the embodiments of the present invention, the coverage information is described in the form of a box. A specific example is as follows:
[Syntax shown as an image in the original publication: Figure PCTCN2017078585-appb-000002]
In an example, this box provides information on the area on the spherical surface that is represented by the projected frame associated with the container ProjectedOmnidirectionalVideoBox. The absence of this box indicates that the projected frame is a representation of the full sphere.
When the projection format is the equirectangular projection, the spherical region represented by the projected frame is the region specified by two yaw circles and two pitch circles, as illustrated in FIG. 10.
In an example of the embodiments of the present invention, the coverage information is described in the following manner:
[Syntax shown as images in the original publication: Figure PCTCN2017078585-appb-000003, Figure PCTCN2017078585-appb-000004]
The hor_range element and the ver_range element specify the horizontal and vertical ranges, on the sphere, of the images corresponding to the samples in the video track; the unit may be 0.01 degrees.
In an example, hor_range and ver_range specify the range through the center point of the region. hor_range shall be in the range of 1 to 36000, inclusive; ver_range shall be in the range of 1 to 36000, inclusive; center_pitch + ver_range ÷ 2 shall not be greater than 18000; and center_pitch − ver_range ÷ 2 shall not be less than −18000.
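The quoted constraints translate directly into a validity check; the sketch below (Python) simply restates them in the 0.01-degree units used above:

```python
def valid_coverage(center_pitch: int, hor_range: int, ver_range: int) -> bool:
    """hor_range/ver_range must lie in [1, 36000]; the pitch interval
    center_pitch +/- ver_range/2 must stay within [-18000, 18000]."""
    return (1 <= hor_range <= 36000 and
            1 <= ver_range <= 36000 and
            center_pitch + ver_range / 2 <= 18000 and
            center_pitch - ver_range / 2 >= -18000)

# Example: a region centered on the equator covering 90 x 60 degrees is valid.
assert valid_coverage(center_pitch=0, hor_range=9000, ver_range=6000)
```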
In an example of the embodiments of the present invention, a dynamic_range_flag value equal to 0 specifies that the horizontal and vertical ranges of the region remain unchanged in all samples referring to this sample entry. In this case, the horizontal and vertical ranges, on the sphere, of the images corresponding to these samples may be described in the data of the sample entry.
In an example of the embodiments of the present invention, a dynamic_range_flag value equal to 1 specifies that the horizontal and vertical ranges of the region are indicated in the sample format.
In an example of the embodiments of the present invention, a coverage information track is used to describe the coverage region, on the sphere, of the images corresponding to the samples in the video track.
In an example, a coverage timed metadata track indicates the coverage region of the video content on the sphere.
In an example, the entry type of the samples in the coverage timed metadata track is 'covg'.
In an example, the RegionOnSphereSample element may be used to describe the sample syntax in the coverage timed metadata track. For specific examples of the RegionOnSphereSample element, reference may be made to the relevant provisions in existing standards, for example the relevant examples of ISO/IEC 23000-20.
In an example, the value of shape_type in the RegionOnSphereConfigBox in the sample entry is 0.
In an example, the static_hor_range element and the static_ver_range element, or the hor_range element and the ver_range element, are used to indicate the horizontal and vertical coverage of the corresponding viewpoint, respectively, and the center_yaw element and the center_pitch element are used to indicate the center point of the coverage.
In an example, the reference_type in the 'tref' box takes the value cv2d, which means that the media data is associated with a timed metadata track of a coverage region, and the samples in the track of the coverage region are the 2D spatial position information in the spatial object corresponding to the media data; the 2D position information may be the position information defined in the existing standard ISO/IEC 23001-10;
or the reference_type takes the value cvsp, which means that the media data is associated with a timed metadata track of a coverage region, and the samples in that track are the spatial position information, on the sphere, of the spatial object corresponding to the media data; the information on the sphere may be a sample in a timed metadata track on the sphere as defined in the existing standard ISO/IEC 23000-20;
or the reference_type takes the value cv2p, which means that the media data is associated with a timed metadata track of a coverage region, and the samples in that track are the spatial position information, in the mapped image, of the spatial object corresponding to the media data.
In a corresponding example, the client may process as follows:
1. The client parses the track of the media data, whose track id is 1 (any nonzero integer), obtains the 'tref' box of the media track, and from it obtains the associated track whose reference_type value is 'cv2d', 'cvsp', or 'cv2p'; the track id of that track is 2 (any nonzero integer).
2. From 'cv2d', the client determines that the track with track id 2 describes the 2D spatial position information of the spatial object corresponding to the media data; or, from 'cvsp', that it describes the spatial position information of that spatial object on the sphere; or, from 'cv2p', that it describes the spatial position information of that spatial object in the mapped image.
3. Based on the coverage information and the presentation capability of the device, the client can decide whether to present all of the media content or to crop part of it for presentation, and how the user acquires data outside the field of view when the user's field of view changes.
In an example, the track with track id 2 is a spatial information description track of a coverage region; the sample entry type of that track indicates that the current timed metadata track is a spatial information description track of a coverage region, and the sample entry type may take the value 'cvvp' (coverage viewport).
In an example, the coverage information of the media may be described in an independent track, for example a timed metadata track whose sample entry type takes the value 'cvvp'. However, if the coverage information of the media is static (the coverage information of every frame in the video is the same), the media does not need an associated metadata track of the coverage region; in that case, the coverage information of the media may be described in the metadata of the media track (track id = 1), with the specific description information in 'covi' (the coverage information box) of ISO/IEC 23000-20, in which the shape of the coverage region on the sphere or the 2D plane is described.
In a possible implementation of the embodiments of the present invention, the value of the reference type describes the manner of association between a metadata track of quality information and a media data track.
In an example, the specific correspondence is shown in the following table:
Table 2
[Table 2 is provided as an image in the original publication: Figure PCTCN2017078585-appb-000005]
In a corresponding example, the client parses the track of the media data (track id = 1) and obtains, from the 'tref' box, a reference_type value of 'r2dq', 'risq', 'ri2p', 'c2dq', 'cspq', or 'c2pq'; the client can thereby learn that there is a metadata track with the quality information of the ROI spatial region, or a metadata track with the quality description of the video in the region covered by the media content. If the client needs the quality information of the current video, it can parse the track corresponding to the quality information.
In the metadata of a track, the type of the metadata referenced by the track is clearly described, so that parsing between the client's tracks is decoupled, reducing the complexity of the client implementation process.
BRIEF DESCRIPTION OF DRAWINGS
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and persons of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic structural diagram of an MPD transmitted according to the DASH standard used in system-layer video streaming media transmission;
FIG. 2 is a schematic diagram of an example framework of DASH-standard transmission used in system-layer video streaming media transmission;
FIG. 3 is a schematic diagram of switching between code stream segments according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a segment storage manner in code stream data;
FIG. 5 is another schematic diagram of a segment storage manner in code stream data;
FIG. 6 is a schematic diagram of fields of view corresponding to a change of field of view;
FIG. 7 is another schematic diagram of the spatial relationship between spatial objects;
FIG. 8 is a schematic flowchart of a method for processing streaming media information according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the relative position of the center point of a target spatial object in panoramic space;
FIG. 10 is a schematic diagram of a coordinate system according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of another coordinate system according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of another coordinate system according to an embodiment of the present invention;
FIG. 13 is a schematic flowchart of a method for processing streaming media information according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of the logical structure of an apparatus for processing streaming media information according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of the logical structure of an apparatus for processing streaming media information according to an embodiment of the present invention;
FIG. 16 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present invention;
FIG. 17 is a schematic diagram of a coverage region according to an embodiment of the present invention.
DESCRIPTION OF EMBODIMENTS
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention.
The current client-led system-layer video streaming media transmission scheme may use the DASH standard framework, as shown in FIG. 2, which is a schematic diagram of an example framework of DASH-standard transmission used in system-layer video streaming media transmission. The data transmission process of the system-layer video streaming scheme includes two processes: the process in which the server side (such as an HTTP server or a media content preparation server, hereinafter referred to as the server) generates media data for video content and responds to client requests, and the process in which the client (such as an HTTP streaming media client) requests and acquires media data from the server. The media data includes a media presentation description (Media Presentation Description, MPD) and a media code stream (for example, the video code stream to be played). The MPD on the server includes multiple representations (also called presentations), and each representation describes multiple segments. The HTTP streaming request control module of the client obtains the MPD sent by the server, analyzes the MPD, determines the information of each segment of the video code stream described in the MPD, can thereby determine the segments to be requested, sends the corresponding HTTP segment requests to the server, and decodes and plays the segments through the media player.
(1) In the process in which the server generates media data for the video content, the media data generated by the server includes video code streams of different versions corresponding to the same video content, and the MPD of the code streams. For example, for the video content of the same episode of a TV series, the server generates a low-resolution, low-bit-rate, low-frame-rate code stream (such as 360p resolution, 300 kbps bit rate, 15 fps frame rate), a medium-resolution, medium-bit-rate, high-frame-rate code stream (such as 720p resolution, 1200 kbps bit rate, 25 fps frame rate), a high-resolution, high-bit-rate, high-frame-rate code stream (such as 1080p resolution, 3000 kbps bit rate, 25 fps frame rate), and so on.
In addition, the server may generate an MPD for the video content of the episode. FIG. 1 is a schematic structural diagram of the MPD of the DASH standard of the system transmission scheme. The MPD of the code streams contains multiple periods; for example, the period start=100s part of the MPD in FIG. 1 may contain multiple adaptation sets, and each adaptation set may contain multiple representations, such as Representation1, Representation2, and so on. Each representation describes one or more segments of a code stream.
In an embodiment of the present invention, each representation describes the information of several segments in time order, for example an initialization segment, Media Segment 1, Media Segment 2, ..., Media Segment 20, and so on. The representation may include segment information such as the playback start moment, the playback duration, and the network storage address (for example, a network storage address expressed in the form of a uniform resource locator, URL).
(2) In the process in which the client requests and acquires media data from the server, when the user chooses to play a video, the client obtains the corresponding MPD from the server according to the video content requested by the user. The client sends, to the server according to the network storage address of a code stream segment described in the MPD, a request to download the code stream segment corresponding to that network storage address, and the server sends the code stream segment to the client according to the received request. After obtaining the code stream segments sent by the server, the client can perform operations such as decoding and playback through the media player.
Referring to FIG. 3, which is a schematic diagram of switching between code stream segments according to an embodiment of the present invention: the server may prepare three different versions of code stream data for the same video content (for example a movie) and describe the three versions with three representations in the MPD, hereinafter called rep1, rep2, and rep3. Rep1 is a high-definition video with a bit rate of 4 Mbps (megabits per second), rep2 is a standard-definition video with a bit rate of 2 Mbps, and rep3 is an ordinary video with a bit rate of 1 Mbps. The segments of each rep contain the video code stream within a time period, and within the same time period, the segments of the different reps are aligned with one another. That is, each rep describes the segments of each time period in time order, and segments of the same time period have the same length, so that content switching between segments of different reps is possible. In the figure, the segments marked with shading are the segment data the client requests to play; the first three segments requested by the client are segments of rep3; when requesting the fourth segment, the client may request the fourth segment of rep2 and then switch to playing the fourth segment of rep2 after the third segment of rep3 finishes. The playback end point of the third segment of rep3 (corresponding in time to the playback end moment) is the playback start point of the fourth segment (corresponding in time to the playback start moment), and is also the playback start point of the fourth segment of rep2 or rep1, achieving alignment of the segments across the reps. After requesting the fourth segment of rep2, the client switches to rep1 and requests the fifth and sixth segments of rep1, and so on; it can then switch to rep3, request the seventh segment of rep3, and then switch back to rep1 and request the eighth segment of rep1. The segments of each rep may be stored end-to-end in one file or stored independently as small files. The segments may be encapsulated in the format of the standard ISO/IEC 14496-12 (ISO BMFF, Base Media File Format) or in the format of ISO/IEC 13818-1 (MPEG-2 TS), which may be determined according to the requirements of the actual application scenario and is not limited here.
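A minimal sketch of one bandwidth-driven selection policy behind this switching follows (Python; the bitrates follow the rep1/rep2/rep3 example above, and the selection rule itself is an illustrative assumption, since the embodiments do not prescribe one). Because the segments of all representations are time-aligned, the switch simply takes effect at the next segment boundary:

```python
REPS = {"rep1": 4_000_000, "rep2": 2_000_000, "rep3": 1_000_000}  # bits/s

def choose_representation(bandwidth_bps: int, reps=REPS) -> str:
    """Pick the highest-bitrate representation the measured bandwidth can
    sustain, falling back to the lowest-bitrate one when none fits; the
    client then requests the next aligned segment from the chosen rep."""
    affordable = {name: bps for name, bps in reps.items() if bps <= bandwidth_bps}
    if not affordable:
        return min(reps, key=reps.get)
    return max(affordable, key=affordable.get)

assert choose_representation(2_500_000) == "rep2"
```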
As mentioned in the DASH media file format, there are two storage manners for the above segments: one is that each segment is stored separately and independently, as shown in FIG. 4, which is a schematic diagram of a segment storage manner in code stream data; the other is that all the segments of the same rep are stored in one file, as shown in FIG. 5, which is another schematic diagram of a segment storage manner in code stream data. As shown in FIG. 4, each segment of repA is stored as a separate file, and each segment of repB is likewise stored as a separate file. Correspondingly, in the storage manner shown in FIG. 4, the server may describe information such as the URL of each segment in the MPD of the code stream in the form of a template or a list. As shown in FIG. 5, all the segments of rep1 are stored in one file, and all the segments of rep2 are stored in one file. Correspondingly, in the storage manner shown in FIG. 5, the server may use an index segment (the sidx in FIG. 5) in the MPD of the code stream to describe the relevant information of each segment. The index segment describes information such as the byte offset of each segment in the file in which it is stored, the size of each segment, and the duration of each segment.
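For the single-file storage manner of FIG. 5, the index-segment bookkeeping can be illustrated with a small sketch (Python; a simplified model of the real sidx box, with assumed field names):

```python
def segment_byte_range(sizes, first_offset, index):
    """Return the inclusive byte range of segment `index` when all segments
    of one representation sit in a single file: the index segment records
    each segment's size, so offsets are running sums from first_offset."""
    start = first_offset + sum(sizes[:index])
    return start, start + sizes[index] - 1

# Example: request the third segment via an HTTP Range header.
sizes = [500_000, 480_000, 520_000, 495_000]   # bytes per segment (assumed)
s, e = segment_byte_range(sizes, first_offset=1_200, index=2)
range_header = f"bytes={s}-{e}"                # used in the segment GET request
```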
With the growing popularity of viewing applications for VR video such as 360-degree video, more and more users are joining the experience of large-field-of-view VR video viewing. This new video viewing application brings users a new viewing mode and visual experience, and also brings new technical challenges. During the viewing of video with a large field of view such as 360 degrees (the embodiments of the present invention are described using 360 degrees as an example), the spatial region of the VR video (a spatial region may also be called a spatial object) is a 360-degree panoramic space (or omnidirectional space, or panoramic spatial object), which exceeds the normal visual range of the human eye. Therefore, the user changes the viewing angle (that is, the field of view, FOV) at any time while watching the video. The video image the user sees differs with the field of view, so the content presented by the video needs to change with the user's field of view. As shown in FIG. 6, which is a schematic diagram of the fields of view corresponding to a change of field of view, box 1 and box 2 are two different fields of view of the user. While watching a video, the user can switch the field of view from box 1 to box 2 by turning the eyes or head, or by switching the picture of the video viewing device. The video image watched when the user's field of view is box 1 is the video image presented at that moment by the one or more spatial objects corresponding to that field of view. At the next moment, the user's field of view switches to box 2, and the video image watched by the user should then switch to the video image presented at that moment by the spatial objects corresponding to box 2.
In some feasible implementations, for the output of a video image with a 360-degree large field of view, the server may divide the panoramic space (or panoramic spatial object) within the 360-degree field-of-view range to obtain multiple spatial objects, each corresponding to one sub-field-of-view of the user; the splicing of multiple sub-fields-of-view forms a complete human-eye viewing field. That is, the human-eye field of view (hereinafter the field of view) may correspond to one or more of the divided spatial objects, and the spatial objects corresponding to the field of view are all the spatial objects corresponding to the content objects within the range of the human-eye field of view. The human-eye viewing field may change dynamically, but typically the field-of-view range may be 120 degrees × 120 degrees, and the spatial objects corresponding to the content objects within the 120 degrees × 120 degrees human-eye field of view may include one or more divided spatial objects, for example field of view 1 corresponding to box 1 and field of view 2 corresponding to box 2 in FIG. 6 above (a sketch of this selection appears after this paragraph). Further, the client may obtain, through the MPD, the spatial information of the video code stream prepared by the server for each spatial object, and may then request from the server, according to the field-of-view requirement, the video code stream segments corresponding to one or more spatial objects in a certain time period, and output the corresponding spatial objects according to the field-of-view requirement. If the client outputs, within the same time period, the video code stream segments corresponding to all the spatial objects within the 360-degree field-of-view range, it can output and display the complete video image within the entire 360-degree panoramic space for that time period.
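A rough sketch of this field-of-view-to-spatial-object selection (Python; it ignores wrap-around at 360 degrees and assumes each divided object is described by its yaw/pitch origin and angular width/height, a layout the embodiments do not prescribe):

```python
def overlaps(a0, alen, b0, blen):
    """1-D interval overlap test, reused for the yaw and pitch extents."""
    return a0 < b0 + blen and b0 < a0 + alen

def objects_for_viewport(view_yaw, view_pitch, objects, fov_w=120.0, fov_h=120.0):
    """Pick every divided spatial object intersecting the (roughly
    120 x 120 degree) viewport; their union covers the picture shown.
    `objects` holds (object_id, yaw, pitch, width, height) in degrees."""
    return [oid for oid, yaw, pitch, w, h in objects
            if overlaps(view_yaw, fov_w, yaw, w) and
               overlaps(view_pitch, fov_h, pitch, h)]
```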
In a specific implementation, when dividing the 360-degree spatial object, the server may first map the sphere to a plane and divide the spatial objects on the plane. Specifically, the server may map the sphere to a latitude-longitude plan view using a latitude-longitude mapping manner. As shown in FIG. 7, which is a schematic diagram of spatial objects according to an embodiment of the present invention, the server may map the sphere to a latitude-longitude plan view and divide the plan view into multiple spatial objects A to I. Further, the server may also map the sphere to a cube and unfold the faces of the cube to obtain a plan view, or map the sphere to another polyhedron and unfold the faces of the polyhedron to obtain a plan view, and so on. The server may also map the sphere to a plane using more mapping manners, which may be determined according to the requirements of the actual application scenario and are not limited here. The following description is given with reference to FIG. 7 in the latitude-longitude mapping manner. As shown in FIG. 7, after dividing the panoramic space of the sphere into multiple spatial objects such as A to I, the server may prepare one group of DASH video code streams for each spatial object, with each spatial object corresponding to one group of DASH video code streams. When the client user switches the field of view of video viewing, the client can obtain, according to the new field of view selected by the user, the code stream corresponding to the new spatial object, and can then present the video content of the new spatial object's code stream within the new field of view. The information processing method and apparatus provided by the embodiments of the present invention are described below with reference to FIG. 8 to FIG. 10.
The system-layer video streaming media transmission scheme uses the DASH standard and implements the transmission of video data in a manner in which the client analyzes the MPD, requests video data from the server on demand, and receives the data sent by the server.
In some embodiments, when the producer of a video (hereinafter referred to as the author) produces the video, the author may design a main plot line for video playback according to the plot requirements of the video. During playback, the user only needs to watch the video images corresponding to the main plot line to understand the plot, and the other video images may or may not be watched. It can thus be seen that, during playback, the client may selectively play the video images corresponding to the plot, and the other video images need not be presented, which can save transmission resources and storage space resources for the video data and improve the processing efficiency of the video data. After the author designs the main plot, the video images to be presented to the user at each playback moment during playback can be set according to the main plot line, and stringing the video images of the playback moments together in time order yields the plot of the main plot line. The video image to be presented to the user at each playback moment is the video image presented on the spatial object corresponding to that playback moment, that is, the video image that the spatial object is to present during that time period. In a specific implementation, the field of view corresponding to the video image to be presented at each playback moment may be set as the author's field of view, and the spatial object presenting the video image in the author's field of view may be set as the author's spatial object. The code stream corresponding to the author's field-of-view object may be set as the author's field-of-view code stream. The author's field-of-view code stream contains the video frame data of multiple video frames (the encoded data of multiple video frames); each video frame, when presented, may be one image, that is, the author's field-of-view code stream corresponds to multiple images. During video playback, at each playback moment, the image presented in the author's field of view is only a part of the panoramic image (or VR image, or omnidirectional image) to be presented by the whole video. At different playback moments, the spatial information of the spatial objects associated with the images corresponding to the author's video code stream may differ or may be the same; that is, the spatial information of the spatial objects associated with the video data of the author's field-of-view code stream differs over time.
In some feasible implementations, after the author designs the author's field of view for each playback moment, the corresponding code stream may be prepared for the author's field of view of each playback moment through the server. The code stream corresponding to the author's field of view may be set as the author's field-of-view code stream. The server may encode the author's field-of-view code stream and transmit it to the client; after the client decodes the author's field-of-view code stream, it can present the plot picture corresponding to that code stream to the user. The server need not transmit to the client the code streams of fields of view other than the author's (set as non-author fields of view, that is, static field-of-view code streams), which can save resources such as the transmission bandwidth of the video data.
In some feasible implementations, since the author's field of view presents images of preset spatial objects set by the author according to the video plot, the author's spatial objects at different playback moments may differ or be the same; it can thus be seen that the author's field of view is a field of view that changes continuously with the playback moment, and the author's spatial object is a dynamic spatial object whose position changes continuously, that is, the positions, in the panoramic space, of the author's spatial objects corresponding to the playback moments are not all the same. The spatial objects shown in FIG. 7 above are spatial objects divided according to a preset rule, whose relative positions in the panoramic space are fixed; the author's spatial object corresponding to any playback moment is not necessarily one of the fixed spatial objects shown in FIG. 7, but is rather a spatial object whose relative position in the global space changes continuously. The content presented by the video obtained by the client from the server is strung together by the author's fields of view and does not contain the spatial objects corresponding to non-author fields of view; the author's field-of-view code stream contains only the content of the author's spatial objects; and if the MPD obtained from the server does not contain the spatial information of the author's spatial objects of the author's field of view, the client can only decode and present the code stream of the author's field of view. If the user switches the viewing field of view to a non-author field of view while watching the video, the client cannot present the corresponding video content to the user.
In some embodiments, when generating the media presentation description, the server may add identification information in the media presentation description for identifying the author's field-of-view code stream of the video. In a specific implementation, the identification information may be carried in the attribute information, carried in the media presentation description, of the code stream set in which the author's field-of-view code stream is located; that is, the identification information may be carried in the information of an adaptation set in the media presentation description, or in the information of a representation contained in the media presentation description. Further, the identification information may also be carried in the information of a descriptor in the media presentation description. The client can quickly identify the author's field-of-view code stream and the non-author field-of-view code streams by parsing the syntax elements added in the MPD. If the spatial information related to the author's field-of-view stream is encapsulated in an independent metadata file, the client can parse the MPD and obtain the metadata of the spatial information according to the codec identifier, thereby parsing out the spatial information.
In some embodiments, the server may also add the spatial information of one or more author's spatial objects in the author's field-of-view code stream. Each author's spatial object corresponds to one or more images; that is, one or more images may be associated with the same spatial object, or each image may be associated with one spatial object. The server may add the spatial information of each author's spatial object in the author's field-of-view code stream, and may also encapsulate the spatial information as samples, independently, in a track or a file. The spatial information of an author's spatial object is the spatial relationship between the author's spatial object and its associated content component, that is, the spatial relationship between the author's spatial object and the panoramic space. That is, the space described by the spatial information of the author's spatial object may specifically be a partial space in the panoramic space, such as any one of the spatial objects in FIG. 7 above. In a specific implementation, for a DASH code stream, the server may add the above spatial information in the trun box or tfhd box contained in the segments of the author's field-of-view code stream in the existing file format, for describing the spatial information of the spatial object associated with each frame of image corresponding to the video frame data of the author's field-of-view code stream.
Further, since there may be identical information among the spatial information of the spatial objects associated with the individual frames of images, there is repetition and redundancy in the spatial information of the multiple author's spatial objects, which affects the efficiency of data transmission.
The embodiments of the present invention reduce the repetition and redundancy in the spatial information of multiple author's spatial objects by modifying the video file format provided in the DASH standard.
The modification to the file format provided by the present invention may also be applied to the file format of ISOBMFF or MPEG2-TS, which may be determined according to the requirements of the actual application scenario and is not limited here.
FIG. 8 is a schematic flowchart of a method for processing streaming media information according to an embodiment of the present invention. The method for processing streaming media information provided by the embodiments of the present invention may be applied in the DASH field, and may also be applied in other streaming media fields, for example streaming media transmission based on the RTP protocol. The method may be performed by a client, which may specifically be a terminal, user equipment, or a computer device, or by a network device such as a gateway or a proxy server. As shown in FIG. 8, the method may include the following steps:
S801: Acquire target spatial information of a target spatial object, where the target spatial object is one of two spatial objects associated with data of two images included in target video data, the target spatial information includes same-attribute spatial information, the same-attribute spatial information includes information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the other spatial object, other than the target spatial object, among the two spatial objects includes the same-attribute spatial information.
The target video data may be a target video code stream or unencoded video data; when the target video data is a target video code stream, the data of the two images may be the encoded data of the two images. Further, the target video code stream may be an author's field-of-view code stream or a non-author field-of-view code stream.
Acquiring the target spatial information of the target spatial object may be receiving the target spatial information from a server.
The two images may be in one-to-one correspondence with the two spatial objects, or one spatial object may correspond to the two images.
The spatial information of a target spatial object is the spatial relationship between the target spatial object and its associated content component, that is, the spatial relationship between the target spatial object and the panoramic space. That is, the space described by the target spatial information of the target spatial object may specifically be a partial space in the panoramic space. The target video data may be the author's field-of-view code stream above, or a non-author field-of-view code stream. The target spatial object may or may not be the author's spatial object above.
S802,根据该目标空间信息,确定要播放的视频数据。
进一步的,在确定要播放的视频数据后,还可以显示要播放的视频数据。
在一些实施例中,该目标空间信息还可以包括该目标空间对象的异属性空间信息,该其他空间对象的空间信息还包括该其他空间对象的异属性空间信息,该目标空间对象的异属性空间信息与该其他空间对象的异属性空间信息不同。
在目标空间信息一种可能的实现方式中,该目标空间信息可以包括该目标空间对象的中心点的位置信息或者该目标空间对象的左上点的位置信息,该目标空间信息还可以包括该目标空间对象的宽和该目标空间对象的高。
其中,在目标空间信息对应的坐标系为角度坐标系时,该目标空间信息可以采用偏航角来描述;在目标空间信息对应的坐标系为像素坐标系时,该目标空间信息可以采用经纬图的空间位置描述,或者采用其他几何立体图形来描述,在此不做限制。采用偏航角方式描述时,可使用俯仰角θ(pitch)、偏航角ψ(yaw)、滚转角Φ(roll),以及用于表示角度范围的宽和用于表示角度范围的高。如图9,图9是目标空间对象的中心点在全景空间中的相对位置的示意图。在图9中,O点为360度VR全景视频球面图像对应的球心,可认为是观看VR全景图像时人眼的位置。A点为目标空间对象的中心点,C、F为目标空间对象中过A点的沿该目标空间对象横向坐标轴的边界点,E、D为目标空间对象中过A点的沿该目标空间对象纵向坐标轴的边界点,B为A点沿球面经线在赤道线的投影点,I为赤道线上水平方向的起始坐标点。各个元素的含义解释如下:
俯仰角:目标空间对象的图像的中心位置映射到全景球面(即全局空间)图像上的点的竖直方向的偏转角,如图9中的∠AOB;
偏航角:目标空间对象的图像的中心位置映射到全景球面图像上的点的水平方向的偏转角,如图9中的∠IOB;
滚转角:目标空间对象的图像的中心位置映射到全景球面图像上的点与球心连线方向的旋转角,如图9中的∠DOB;
用于表示角度范围的高(在角度坐标系中的目标空间对象的高):目标空间对象的图像在全景球面图像的视场高度,以视场纵向最大角度表示,如图9中∠DOE;用于表示角度范围的宽(在角度坐标系中的目标空间对象的宽):目标空间对象的图像在全景球面图像的视场宽度,以视场横向最大角度表示,如图9中∠COF。
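基于上述各角度量的定义,下面给出一个简化的Python示例,用中心点的偏航角、俯仰角以及角度范围的宽高近似判断某一观看方向是否落在目标空间对象内(该示例忽略滚转角,按矩形视场近似,仅用于说明各角度量的用法,并非专利定义的算法):
    # 示意代码:角度坐标系下的简化视场判断(单位:度;忽略滚转角)
    def in_region(view_yaw, view_pitch, c_yaw, c_pitch, width, height):
        d_yaw = (view_yaw - c_yaw + 180.0) % 360.0 - 180.0  # 偏航差折回到[-180,180]
        d_pitch = view_pitch - c_pitch
        return abs(d_yaw) <= width / 2 and abs(d_pitch) <= height / 2

    print(in_region(170, 10, -175, 0, 60, 40))  # True:偏航差折回后为15度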
在目标空间信息另一种可能的实现方式中,该目标空间信息可以包括该目标空间对象的左上点的位置信息,和该目标空间对象的右下点的位置信息。
在目标空间信息另一种可能的实现方式中,在目标空间对象不是矩形时,该目标空间信息可以包括该目标空间对象的形状类型、半径、周长中至少一种。
在一些实施例中,该目标空间信息可以包括该目标空间对象的空间旋转信息。
在一些实施例中,该目标空间信息可以封装在空间信息数据或者空间信息轨迹(track)中,该空间信息数据可以为该目标视频数据的码流、该目标视频数据的元数据或者独立于该目标视频数据的文件,该空间信息轨迹可以为独立于该目标视频数据的轨迹。
其中,该空间信息数据或者该空间信息轨迹中还可以包括用于指示该同属性空间信息的类型的空间信息类型标识,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息。
其中,在该空间信息类型标识指示目标空间信息中没有属于该同属性空间信息的信息时,该同属性空间信息可以包括该目标空间对象的宽的最小值、该目标空间对象的高的最小值、该目标空间对象的宽的最大值和该目标空间对象的高的最大值。
其中,该空间信息类型标识与该同属性空间信息可以封装在同一个box中。
具体实现中,在目标空间信息封装在独立于该目标视频数据的文件(空间信息文件)或者独立于该目标视频数据的轨迹(空间信息轨迹)中时,服务器可以在现有文件格式中的3dsc box中添加同属性空间信息,在现有文件格式中的mdat box中添加该目标空间对象的异属性空间信息。
添加空间信息的样例(样例一):
Figure PCTCN2017078585-appb-000006
Figure PCTCN2017078585-appb-000007
在该样例中,同属性空间信息可以是yaw、pitch、roll、reference_width和reference_height中的部分,而并非全部,例如没有roll。roll可以属于该目标空间对象的异属性空间信息,也可以不包括在目标空间信息内。3dsc box中还添加了空间信息类型标识regionType,该样例为角度坐标系下的样例。其中,在空间信息类型标识为0时,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息为该目标空间对象的中心点的位置信息或者该目标空间对象的左上点的位置信息,以及该目标空间对象的宽和该目标空间对象的高。在样例中,位置信息用俯仰角θ(pitch)、偏航角ψ(yaw)和滚转角Φ(roll)来表示,宽和高同样可以用角度来表示。换一种方式来理解就是,在空间信息类型标识为0时,该两个空间对象的位置和大小(例如但不限于宽高)都是相同的。
在空间信息类型标识为1时,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息为该目标空间对象的宽和该目标空间对象的高。换一种方式来理解就是,在空间信息类型标识为1时,该两个空间对象的大小(例如但不限于宽高)相同,位置不同。
在空间信息类型标识为2时,该空间信息类型标识用于指示该目标空间信息中没有属于该同属性空间信息的信息。换一种方式来理解就是,在空间信息类型标识为2时,该两个空间对象的大小和位置均不同。
相应的,在空间信息类型标识为0时,则可以指示不存在异属性空间信息。在空间信息类型标识为1时,空间信息类型标识还指示该目标空间对象的异属性空间信息为该目标空间对象的中心点的位置信息或者该目标空间对象的左上点的位置信息。在空间信息类型标识为2时,空间信息类型标识还指示该目标空间对象的异属性空间信息为该目标空间对象的中心点的位置信息或者该目标空间对象的左上点的位置信息,以及该目标空间对象的宽和该目标空间对象的高。
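为说明样例一中同属性空间信息与异属性空间信息的拆分思路,下面给出服务器侧的一段示意性Python代码(字段宽度、字节序与字段顺序均为本文示例的假设,并非专利附图语法的原文):根据各空间对象的空间信息是否相同选取空间信息类型标识,并分别生成同属性部分与异属性部分。
    # 示意代码:服务器侧按regionType拆分同属性/异属性空间信息(字段宽度为假设)
    import struct

    def pack_sample_one(regions):
        # regions: [(yaw, pitch, roll, width, height), ...],整数,单位为度
        same_pos = len({r[:3] for r in regions}) == 1
        same_size = len({r[3:] for r in regions}) == 1
        if same_pos and same_size:
            # regionType=0:位置和大小都相同,全部写入同属性空间信息(3dsc box)
            config = struct.pack(">B5h", 0, *regions[0])
            samples = b""  # 不存在异属性空间信息
        elif same_size:
            # regionType=1:大小相同、位置不同,宽高写入同属性空间信息
            config = struct.pack(">B2h", 1, *regions[0][3:])
            samples = b"".join(struct.pack(">3h", *r[:3]) for r in regions)
        else:
            # regionType=2:没有同属性空间信息,每个样本携带完整空间信息
            config = struct.pack(">B", 2)
            samples = b"".join(struct.pack(">5h", *r) for r in regions)
        return config, samples

    regions = [(10, 20, 0, 60, 40), (30, -15, 0, 60, 40)]
    config, samples = pack_sample_one(regions)
    print(config.hex(), samples.hex())  # regionType=1:宽高只写一次,消除冗余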
添加空间信息的样例(样例二):
Figure PCTCN2017078585-appb-000008
该样例为像素坐标系下的样例,在空间信息类型标识为0时,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息为该目标空间对象的左上点的位置信息,以及该目标空间对象的宽和该目标空间对象的高。在样例中,位置信息用以像素为单位的横坐标和以像素为单位的纵坐标来表示,宽和高同样可以以像素为单位来表示。其中,横坐标和纵坐标可以是位置点在图7中的经纬平面图中的坐标,也可以是在全景空间(或者,全景空间对象)中的坐标。换一种方式来理解就是,在空间信息类型标识为0时,该两个空间对象的位置和大小都是相同的。需要说明的是,可以用该目标空间对象的中心点的位置信息替代该目标空间对象的左上点的位置信息。
在空间信息类型标识为1时,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息为该目标空间对象的宽和该目标空间对象的高。换一种方式来理解就是,在空间信息类型标识为1时,该两个空间对象的大小相同,位置不同。
在空间信息类型标识为2时,该空间信息类型标识用于指示该目标空间信息中没有属于该同属性空间信息的信息。换一种方式来理解就是,在空间信息类型标识为2时,该两个空间对象的大小和位置均不同。
相应的,在空间信息类型标识为0时,则可以指示不存在异属性空间信息。在空间信息类型标识为1时,空间信息类型标识还指示该目标空间对象的异属性空间信息为该目标空间对象的左上点的位置信息。在空间信息类型标识为2时,空间信息类型标识还指示该目标空间对象的异属性空间信息为该目标空间对象的左上点的位置信息,以及该目标空间对象的宽和该目标空间对象的高。需要说明的是,可以用该目标空间对象的中心点的位置信息替代该目标空间对象的左上点的位置信息。
添加空间信息的样例(样例三):
Figure PCTCN2017078585-appb-000009
Figure PCTCN2017078585-appb-000010
该样例为像素坐标系下的样例,在空间信息类型标识为0时,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息为该目标空间对象的左上点的位置信息,和该目标空间对象的右下点的位置信息。在该样例中,位置信息用以像素为单位的横坐标和以像素为单位的纵坐标来表示。其中,横坐标和纵坐标可以是位置点在图7中的经纬平面图中的坐标,也可以是在全景空间(或者,全景空间对象)中的坐标。换一种方式来理解就是,在空间信息类型标识为0时,该两个空间对象的位置和大小都是相同的。需要说明的是,可以用该目标空间对象的宽和高替代该目标空间对象的右下点的位置信息。
在空间信息类型标识为1时,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息为该目标空间对象的右下点的位置信息。换一种方式来理解就是,在空间信息类型标识为1时,该两个空间对象的大小相同,位置不同。需要说明的是,可以用该目标空间对象的宽和高替代该目标空间对象的右下点的位置信息。
在空间信息类型标识为2时,该空间信息类型标识用于指示该目标空间信息中没有属于该同属性空间信息的信息。换一种方式来理解就是,在空间信息类型标识为2时,该两个空间对象的大小和位置均不同。
相应的,在空间信息类型标识为0时,则可以指示不存在异属性空间信息。在空间信息类型标识为1时,空间信息类型标识还指示该目标空间对象的异属性空间信息为该目标空间对象的左上点的位置信息。在空间信息类型标识为2时,空间信息类型标识还指示该目标空间对象的异属性空间信息为该目标空间对象的左上点的位置信息,和该目标空间对象的右下点的位置信息。需要说明的是,可以用该目标空间对象的宽和高替代该目标空间对象的右下点的位置信息。
在一些实施例中,该空间信息数据或者该空间信息轨迹中还可以包括用于指示该目标空间信息对应的坐标系的坐标系标识,该坐标系为像素坐标系或者角度坐标系。
其中,该坐标系标识与该同属性空间信息可以封装在同一个box中。
具体实现中,在目标空间信息封装在独立于该目标视频数据的文件(空间信息文件)或者独立于该目标视频数据的轨迹(空间信息轨迹)中时,服务器可以在现有文件格式中的3dsc box中添加坐标系标识。
添加坐标系标识的样例(样例一):
Figure PCTCN2017078585-appb-000011
在该样例中,在坐标系标识Coordinate_system为0时,坐标系为角度坐标系,在坐标系标识为1时,坐标系为像素坐标系。
在一些实施例中,该空间信息数据或者该空间信息轨迹还可以包括空间旋转信息标识,该空间旋转信息标识用于指示该目标空间信息是否包括该目标空间对象的空间旋转信息。
其中,空间旋转信息标识可以与该同属性空间信息封装在同一个box(例如,3dsc box)中,空间旋转信息标识也可以与该目标空间对象的异属性空间信息封装在同一个box(例如,mdat box)中。具体的,对于空间旋转信息标识与该目标空间对象的异属性空间信息封装在同一个box的情况,在该空间旋转信息标识指示该目标空间信息包括该目标空间对象的空间旋转信息时,则该目标空间对象的异属性空间信息包括空间旋转信息。
具体实现中,服务器可以将空间旋转信息标识与该目标空间对象的异属性空间信息封装在同一个box(例如,mdat box)中。进一步的,服务器可以将空间旋转信息标识与该目标空间对象的异属性空间信息封装在同一个box的同一个样本中。其中,一个样本可以封装一个空间对象对应的异属性空间信息。
添加空间旋转信息标识的样例(样例一):
Figure PCTCN2017078585-appb-000012
在一些实施例中,该同属性空间信息和该目标空间对象的异属性空间信息也可以封装在视频的空间信息元数据(track metadata)中,比如封装在同一个box中,例如,trun box或者tfhd box或者是新的box中。
添加空间信息的样例(样例四):
Figure PCTCN2017078585-appb-000013
Figure PCTCN2017078585-appb-000014
其中,一个空间对象的一个空间信息为一个样本,上述样本数量用于指示空间对象的数量,每个空间对象对应各自的一组异属性空间信息。作为本发明实施例提供的一种流媒体的信息的处理方法的一种实现方式,步骤如下:
1、获得空间信息文件或者空间信息轨迹(空间信息可以称之为timed metadata)或者视频的空间信息元数据(或称为目标视频数据的元数据);
2、解析空间信息文件或者空间信息track;
3、解析到tag为3dsc的box(空间信息描述box),解析空间信息类型标识,该空间信息类型标识可以用于指示两个空间对象的空间对象类型,可选的空间对象类型可以包括但不限于:位置和大小不变的空间对象、位置变化和大小不变的空间对象、位置不变和大小变化的空间对象、以及位置和大小均变化的空间对象;
4、如果解析到的空间对象类型是位置和大小不变的空间对象,位置和大小不变的空间对象是指空间对象的空间位置和空间大小不变,那么在3dsc box中解析到的同属性空间信息可以作为目标空间信息,该空间对象类型表示两个空间对象的空间信息相同,取值都和解析到的同属性空间信息一致;如果是该类型的同属性空间信息,在后续的解析中,不需要解析该目标空间对象的异属性空间信息所在的box;
5、如果解析到的空间对象类型是位置变化和大小不变的空间对象,那么该3dsc box中的同属性空间信息会携带空间对象的大小信息,比如空间对象的宽和高;后续解析得到的该目标空间对象的异属性空间信息中所携带的信息就是每个空间对象的位置信息;
6、如果解析到的空间对象类型是位置和大小均变化的空间对象,那么后续解析得到的该目标空间对象的异属性空间信息中所携带的信息就是每个空间对象的位置信息(例如,中心点的位置信息)和空间对象的大小信息,例如空间对象的宽和高;
7、解析到目标空间信息后,根据目标空间信息描述的空间对象(目标空间对象),在得到的VR视频中选择呈现的内容对象;或者请求目标空间信息描述的空间对象对应的视频数据进行解码呈现,或者根据该目标空间信息确定当前所观看的视频内容在VR视频空间(或者称,全景空间)中的位置(一种示意性的解析流程见下面的代码草图)。
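下面的Python草图按上述步骤1至7中的解析逻辑给出客户端侧的一种示意实现,与前文服务器侧的打包示例相对应(字段宽度与布局同样是本文示例的假设):
    # 示意代码:客户端侧按上述步骤解析空间信息(与前述打包示例对应)
    import struct

    def parse_sample_one(config, samples, num_objects):
        region_type = config[0]  # 步骤3:解析3dsc box中的空间信息类型标识
        if region_type == 0:
            # 步骤4:位置和大小不变,同属性空间信息即目标空间信息,无需再解析异属性box
            return [struct.unpack_from(">5h", config, 1)] * num_objects
        if region_type == 1:
            # 步骤5:大小不变,宽高来自同属性空间信息,位置逐样本解析
            w, h = struct.unpack_from(">2h", config, 1)
            return [struct.unpack_from(">3h", samples, 6 * i) + (w, h)
                    for i in range(num_objects)]
        # 步骤6:位置和大小均变化,逐样本解析完整空间信息
        return [struct.unpack_from(">5h", samples, 10 * i) for i in range(num_objects)]

    config = bytes.fromhex("01003c0028")                    # regionType=1,宽60,高40
    samples = bytes.fromhex("000a00140000001efff10000")     # 两个空间对象的位置信息
    print(parse_sample_one(config, samples, 2))
    # [(10, 20, 0, 60, 40), (30, -15, 0, 60, 40)]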
在一些实施例中,可以通过在MPD中增加携带方式标识(carryType)来描述空间信息的携带方式:携带在空间信息文件、空间信息轨迹或者目标视频数据的元数据中。
以MPD的基本属性描述子(Essential Property)或者补充属性描述子(Supplemental Property)为例:
语法表:EssentialProperty@value and/or SupplementalProperty@value attributes for the scheme
Figure PCTCN2017078585-appb-000015
或者,
语法表:EssentialProperty@value and/or SupplementalProperty@value attributes for the scheme
Figure PCTCN2017078585-appb-000016
具体的MPD样例如下:
样例一:空间信息携带在目标视频数据的元数据中
Figure PCTCN2017078585-appb-000017
Figure PCTCN2017078585-appb-000018
在该样例中,value="1,0",1是源的标识,0表示空间信息携带在目标视频数据的轨迹的元数据(或称为目标视频数据的元数据)中。
样例二:空间信息携带在空间信息轨迹中
Figure PCTCN2017078585-appb-000019
Figure PCTCN2017078585-appb-000020
在该样例中,value="1,1",1是源的标识,1表示空间信息携带在独立的空间信息轨迹中。
样例三:空间信息携带在独立的空间信息文件中
Figure PCTCN2017078585-appb-000021
在该样例中,value="1,2",1是源的标识,2表示空间信息携带在独立的空间信息文件中,空间信息文件所关联的目标视频表示(或称为目标视频码流)由associationId="zoomed"表示,空间信息文件和representation id为zoomed的目标视频表示关联。
客户端通过解析MPD可以获得空间信息的携带方式,从而根据携带方式获得空间信息。
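作为示意,下面的Python代码演示了从基本属性描述子或补充属性描述子的value中解析携带方式标识的一种可能实现(示例中的schemeIdUri取值与XML片段均为假设):
    # 示意代码:从描述子的value中解析空间信息的携带方式
    import xml.etree.ElementTree as ET

    CARRY = {"0": "目标视频数据的元数据", "1": "独立的空间信息轨迹", "2": "独立的空间信息文件"}

    def carry_type(mpd_xml):
        root = ET.fromstring(mpd_xml)
        for prop in root.iter():
            tag = prop.tag.rsplit("}", 1)[-1]  # 去掉可能存在的命名空间前缀
            if tag in ("EssentialProperty", "SupplementalProperty"):
                parts = (prop.get("value") or "").split(",")
                if len(parts) >= 2:
                    source_id, carry = parts[0].strip(), parts[1].strip()
                    return source_id, CARRY.get(carry, "未知")
        return None

    sample = '<MPD><EssentialProperty schemeIdUri="urn:example:si" value="1,2"/></MPD>'
    print(carry_type(sample))  # ('1', '独立的空间信息文件')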
在一些实施例中,该空间信息数据或者该空间信息轨迹中还可以包括用于指示该目标空间对象的宽高类型的宽高类型标识,该宽高类型标识可以用于指示用于描述所述目标空间对象的宽高的坐标系,或者,该宽高类型标识可以用于指示用于描述所述目标空间对象的边界的坐标系。该宽高类型标识可以是一个标识,也可以包括宽类型标识和高类型标识。
其中,该宽高类型标识与该同属性空间信息可以封装在同一个box(例如,3dsc box)中,该宽高类型标识也可以与该目标空间对象的异属性空间信息封装在同一个box(例如,mdat box)中。
具体实现中,服务器可以将宽高类型标识与该同属性空间信息封装在同一个box(例如,3dsc box)中。进一步的,在目标空间信息封装在独立于该目标视频数据的文件(空间信息文件)或者独立于该目标视频数据的轨迹(空间信息轨迹)中时,服务器可以在3dsc box中添加宽高类型标识。
添加宽高类型标识的样例(样例一):
Figure PCTCN2017078585-appb-000022
在一些实施例中,该同属性空间信息和该目标空间对象的异属性空间信息也可以封装在视频的空间信息元数据(track metadata)中,比如封装在同一个box中,例如,trun box或者tfhd box或者是新的box中。
添加空间信息的样例(样例二):
Figure PCTCN2017078585-appb-000023
在该样例中,在宽高类型标识为0时,用于描述目标空间对象的宽高的坐标系如图10所示,球面的阴影部分是目标空间对象,目标空间对象的四个角的顶点分别是B,E,G,I;在图10中,O为360度VR全景视频球面图像对应的球心,顶点BEGI分别为过球心的圆(该圆以球心O为圆心,并且该圆的半径为360度VR全景视频球面图像对应的球体的半径,该圆过z轴,该圆的数量为两个,一个经过点BAI,一个经过点EFG),和平行于坐标轴x轴和y轴的圆(该圆不以球心O为圆心,该圆的数量为两个,且两个圆互相平行,一个经过点BDE,一个经过点IHG)在球面上的交点,C为目标空间对象的中心点,DH边对应的角度表示为目标空间对象的高度,AF边对应的角度表示为目标空间对象的宽度,DH边和AF边过C点,其中BI边、EG边和DH边对应的角度相同;BE边、IG边和AF边对应的角度相同;BE边对应的角的顶点是J,J是上述圆中BDE所在圆和z轴的交点,相应的,IG边对应的角的顶点为上述圆中IHG所在的圆和z轴的交点,AF边对应的角的顶点为O点,BI边、EG边和DH边对应的角的顶点也为O点。
需要说明的是,以上只是一种示例,目标空间对象也可以是过x轴的两个圆和平行于y轴和z轴的且不过球心两个圆相交获得,目标空间对象也可以是过y轴的两个圆和平行于x轴和z轴的且不过球心两个圆相交获得。
在宽高类型标识为1时,用于描述目标空间对象的宽高的坐标系如图11所示,球面的阴影部分是目标空间对象,目标空间对象的四个角的顶点分别是B,E,G,I;在图11中,O为360度VR全景视频球面图像对应的球心,顶点BEGI分别为过z轴的圆(该圆以球心O为圆心,并且该圆的半径为360度VR全景视频球面图像对应的球体的半径,该圆的数量为两个,一个经过点BAI,一个经过点EFG),和过y轴的圆(该圆以球心O为圆心,并且该圆的半径为360度VR全景视频球面图像对应的球体的半径,该圆的数量为两个,一个经过点BDE,一个经过点IHG)在球面上的交点,C为目标空间对象的中心点,DH边对应的角度表示为目标空间对象的高度,AF边对应的角度表示为目标空间对象的宽度,DH边和AF边过C点,其中BI边、EG边和DH边对应的角度相同;BE边、IG边和AF边对应的角度相同;BE边对应的角的顶点为J点,J点为过BE两点并与x轴和y轴平行的圆与z轴的交点,IG边对应的角的顶点为过IG两点并与x轴和y轴平行的圆与z轴的交点,AF边对应的角的顶点为O点,BI边对应的角的顶点为L点,L点为过BI两点并与z轴和x轴平行的圆与y轴的交点,EG边对应的角的顶点为过EG两点并与z轴和x轴平行的圆与y轴的交点,DH边对应的角的顶点也为O点。
需要说明的是,以上只是一种示例,目标空间对象也可以是过x轴的两个圆和过z轴的两个圆相交获得,目标空间对象也可以是过x轴的两个圆和过y轴的两个圆相交获得。
在宽高类型标识为2时,用于描述目标空间对象的宽高的坐标系如图12所示,球面的阴影部分是目标空间对象,目标空间对象的四个角的顶点分别是B,E,G,I;在图12中,O为360度VR全景视频球面图像对应的球心,顶点BEGI分别为平行于坐标轴x轴和z轴的圆(该圆不以球心O为圆心,该圆的数量为两个,且两个圆互相平行,一个经过点BAI,一个经过点EFG),和平行于坐标轴x轴和y轴的圆(该圆不以球心O为圆心,该圆的数量为两个,且两个圆互相平行,一个经过点BDE,一个经过点IHG)在球面上的交点,C为目标空间对象的中心点,DH边对应的角度表示为目标空间对象的高度,AF边对应的角度表示为目标空间对象的宽度,DH边和AF边过C点,其中BI边、EG边和DH边对应的角度相同;BE边、IG边和AF边对应的角度相同;BE边、IG边和AF边对应的角的顶点为O点,BI边、EG边和DH边对应的角的顶点也为O点。
需要说明的是,以上只是一种示例,目标空间对象也可以是平行于y轴和z轴的且不过球心两个圆和平行于y轴和x轴的且不过球心两个圆相交获得,目标空间对象也可以是平行于y轴和z轴的且不过球心两个圆和平行于z轴和x轴的且不过球心两个圆相交获得。
图11中的J点和L点与图10中的J点获取方式相同,BE边对应的角的顶点是J点,BI边对应的角的顶点是L点;在图12中,BE边和BI边对应的角的顶点都是O点。
在一些实施例中,该同属性空间信息和该目标空间对象的异属性空间信息中也可以包含目标空间对象的描述信息,比如该描述信息用于将目标空间对象描述为视角区域(举例来说,可以是视角码流对应的空间对象),或者感兴趣区域,或者该描述信息用于描述目标空间对象的质量信息。可以通过在上述实施例中的3dsc box或者trun box或者tfhd box或者是新的box的语法(syntax)中添加该描述信息,或者在SphericalCoordinatesSample中添加该描述信息(content_type),以用于实现以下作用的一种或多种:将目标空间对象描述为视角区域,将目标空间对象描述为感兴趣区域,以及用于描述目标空间对象的质量信息。
图13是本发明实施例提供的一种流媒体的信息的处理方法的流程示意图。该方法的执行主体可以是服务器,具体可以为计算机设备。如图13所示,该方法可以包括如下步骤:
S1001,获取目标视频数据中的两个图像的数据所关联的两个空间对象各自的空间信息;
S1002,根据两个空间对象各自的空间信息,确定目标空间对象的目标空间信息,该目标空间对象为该两个空间对象中的空间对象,该目标空间信息包括同属性空间信息,该同属性空间信息包括该两个空间对象各自的空间信息之间相同的信息,该两个空间对象中除该目标空间对象外的其他空间对象的空间信息包括该同属性空间信息。
在一些实施例中,该方法还可以包括:向客户端发送该目标空间信息。
在一些实施例中,该目标空间信息还可以包括该目标空间对象的异属性空间信息,该其他空间对象的空间信息还包括该其他空间对象的异属性空间信息,该目标空间对象的异属性空间信息与该其他空间对象的异属性空间信息不同。
在一些实施例中,该目标空间信息可以包括该目标空间对象的中心点的位置信息或者该目标空间对象的左上点的位置信息,该目标空间信息还可以包括该目标空间对象的宽和该目标空间对象的高。
相应的,该两个空间对象各自的空间信息可以包括该两个空间对象各自的中心点的位置信息或者该两个空间对象各自的左上点的位置信息,该两个空间对象各自的空间信息还可以包括该两个空间对象各自的宽和该两个空间对象各自的高。
在一些实施例中,该目标空间信息可以包括该目标空间对象的左上点的位置信息,和该目标空间对象的右下点的位置信息。
相应的,该两个空间对象各自的空间信息可以包括该两个空间对象各自的左上点的位置信息和该两个空间对象各自的右下点的位置信息。
在一些实施例中,该目标空间信息可以包括该目标空间对象的空间旋转信息。
相应的,该两个空间对象各自的空间信息可以包括该两个空间对象各自的空间旋转信息。
在一些实施例中,该目标空间信息可以封装在空间信息数据或者空间信息轨迹(track)中,该空间信息数据可以为该目标视频数据的码流、该目标视频数据的元数据或者独立于该目标视频数据的文件,该空间信息轨迹可以为独立于该目标视频数据的轨迹。
其中,该空间信息数据或者该空间信息轨迹中还可以包括用于指示该同属性空间信息的类型的空间信息类型标识,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息。
其中,在该空间信息类型标识指示目标空间信息中没有属于该同属性空间信息的信息时,该同属性空间信息可以包括该目标空间对象的宽的最小值、该目标空间对象的高的最小值、该目标空间对象的宽的最大值和该目标空间对象的高的最大值。
其中,该空间信息类型标识可以与该同属性空间信息封装在同一个box中。
其中,该空间信息数据或者该空间信息轨迹中还可以包括用于指示该目标空间信息对应的坐标系的坐标系标识,该坐标系为像素坐标系或者角度坐标系。
其中,该坐标系标识可以与该同属性空间信息封装在同一个box中。
其中,该空间信息数据或者该空间信息轨迹还可以包括空间旋转信息标识,该空间旋转信息标识用于指示该目标空间信息是否包括该目标空间对象的空间旋转信息。
可以理解的是,本实施例的流媒体的信息的处理方法的具体实现过程或者实现方案可以参照上述图8对应的方法实施例的相关描述,此处不再赘述。
图14是本发明实施例提供的一种流媒体的信息的处理装置1100的示意图,信息的处理装置1100可以为服务器,具体可以为计算机设备。该装置1100包括获取模块1101、确定模块1102,其中,
获取模块1101,用于获取目标空间对象的目标空间信息,该目标空间对象为两个空间对象中的空间对象,该两个空间对象与目标视频数据中包括的两个图像的数据关联,该目标空间信息包括同属性空间信息,该同属性空间信息包括该两个空间对象各自的空间信息之间相同的信息,该两个空间对象中除该目标空间对象外的其他空间对象的空间信息包括该同属性空间信息。
确定模块1102,用于根据该获取模块获取的该目标空间信息,确定要播放的视频数据。
在一些实施例中,信息的处理装置1100还可以包括显示模块(或者称为显示器),用于显示该要播放的视频数据。
在一些实施例中,获取模块1101用于接收来自服务器的该目标空间信息。在获取模块1101用于接收来自服务器的该目标空间信息时,获取模块可以为接收模块(或者称为接收器,或者收发器)。
在一些实施例中,该目标空间信息还可以包括该目标空间对象的异属性空间信息,该其他空间对象的空间信息还包括该其他空间对象的异属性空间信息,该目标空间对象的异属性空间信息与该其他空间对象的异属性空间信息不同。
在一些实施例中,该目标空间信息可以包括该目标空间对象的中心点的位置信息或者该目标空间对象的左上点的位置信息,该目标空间信息还可以包括该目标空间对象的宽和该目标空间对象的高。
相应的,该两个空间对象各自的空间信息可以包括该两个空间对象各自的中心点的位置信息或者该两个空间对象各自的左上点的位置信息,该两个空间对象各自的空间信息还可以包括该两个空间对象各自的宽和该两个空间对象各自的高。
在一些实施例中,该目标空间信息可以包括该目标空间对象的左上点的位置信息,和该目标空间对象的右下点的位置信息。
相应的,该两个空间对象各自的空间信息可以包括该两个空间对象各自的左上点的位置信息和该两个空间对象各自的右下点的位置信息。
在一些实施例中,该目标空间信息可以包括该目标空间对象的空间旋转信息。
相应的,该两个空间对象各自的空间信息可以包括该两个空间对象各自的空间旋转信息。
在一些实施例中,该目标空间信息可以封装在空间信息数据或者空间信息轨迹(track)中,该空间信息数据可以为该目标视频数据的码流、该目标视频数据的元数据或者独立于该目标视频数据的文件,该空间信息轨迹可以为独立于该目标视频数据的轨迹。
其中,该空间信息数据或者该空间信息轨迹中还可以包括用于指示该同属性空间信息的类型的空间信息类型标识,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息。
其中,在该空间信息类型标识指示目标空间信息中没有属于该同属性空间信息的信息时,该同属性空间信息可以包括该目标空间对象的宽的最小值、该目标空间对象的高的最小值、该目标空间对象的宽的最大值和该目标空间对象的高的最大值。
其中,该空间信息类型标识可以与该同属性空间信息封装在同一个box中。
其中,该空间信息数据或者该空间信息轨迹中还可以包括用于指示该目标空间信息对应的坐标系的坐标系标识,该坐标系为像素坐标系或者角度坐标系。
其中,该坐标系标识可以与该同属性空间信息封装在同一个box中。
其中,该空间信息数据或者该空间信息轨迹还可以包括空间旋转信息标识,该空间旋转信息标识用于指示该目标空间信息是否包括该目标空间对象的空间旋转信息。
可以理解的是,获取模块1101、确定模块1102的功能可以通过软件编程的方式实现,也可以通过硬件编程实现,也可以通过电路实现,在此不做限定。
可以理解的是,本实施例的流媒体的信息的处理装置1100的各个模块的功能可以根据上述方法实施例中的方法具体实现,其具体实现过程可以参照上述方法实施例的相关描述,此处不再赘述。
图15是本发明实施例提供的一种流媒体的信息的处理装置1200的示意图,该装置包括获取模块1201、确定模块1202,其中,
获取模块1201,用于获取目标视频数据中的两个图像的数据所关联的两个空间对象各自的空间信息。
确定模块1202,用于根据该获取模块获取的该两个空间对象各自的空间信息,确定目标空间对象的目标空间信息,该目标空间对象为该两个空间对象中的空间对象,该目标空间信息包括同属性空间信息,该同属性空间信息包括该两个空间对象各自的空间信息之间相同的信息,该两个空间对象中除该目标空间对象外的其他空间对象的空间信息包括该同属性空间信息。
在一些实施例中,该装置1200还可以包括发送模块(或称为发送器,或者收发器),用于向客户端发送该确定模块确定的该目标空间信息。
在一些实施例中,该目标空间信息还可以包括该目标空间对象的异属性空间信息,该其他空间对象的空间信息还包括该其他空间对象的异属性空间信息,该目标空间对象的异属性空间信息与该其他空间对象的异属性空间信息不同。
在一些实施例中,该目标空间信息可以包括该目标空间对象的中心点的位置信息或者该目标空间对象的左上点的位置信息,该目标空间信息还可以包括该目标空间对象的宽和该目标空间对象的高。
相应的,该两个空间对象各自的空间信息可以包括该两个空间对象各自的中心点的位置信息或者该两个空间对象各自的左上点的位置信息,该两个空间对象各自的空间信息还可以包括该两个空间对象各自的宽和该两个空间对象各自的高。
在一些实施例中,该目标空间信息可以包括该目标空间对象的左上点的位置信息,和该目标空间对象的右下点的位置信息。
相应的,该两个空间对象各自的空间信息可以包括该两个空间对象各自的左上点的位置信息和该两个空间对象各自的右下点的位置信息。
在一些实施例中,该目标空间信息可以包括该目标空间对象的空间旋转信息。
相应的,该两个空间对象各自的空间信息可以包括该两个空间对象各自的空间旋转信息。
在一些实施例中,该目标空间信息可以封装在空间信息数据或者空间信息轨迹(track)中,该空间信息数据可以为该目标视频数据的码流、该目标视频数据的元数据或者独立于该目标视频数据的文件,该空间信息轨迹可以为独立于该目标视频数据的轨迹。
其中,该空间信息数据或者该空间信息轨迹中还可以包括用于指示该同属性空间信息的类型的空间信息类型标识,该空间信息类型标识用于指示该目标空间信息中属于该同属性空间信息的信息。
其中,在该空间信息类型标识指示目标空间信息中没有属于该同属性空间信息的信息时,该同属性空间信息可以包括该目标空间对象的宽的最小值、该目标空间对象的高的最小值、该目标空间对象的宽的最大值和该目标空间对象的高的最大值。
其中,该空间信息类型标识可以与该同属性空间信息封装在同一个box中。
其中,该空间信息数据或者该空间信息轨迹中还可以包括用于指示该目标空间信息对应的坐标系的坐标系标识,该坐标系为像素坐标系或者角度坐标系。
其中,该坐标系标识可以与该同属性空间信息封装在同一个box中。
其中,该空间信息数据或者该空间信息轨迹还可以包括空间旋转信息标识,该空间旋转信息标识用于指示该目标空间信息是否包括该目标空间对象的空间旋转信息。
可以理解的是,获取模块1201、确定模块1202的功能可以通过软件编程的方式实现,也可以通过硬件编程实现,也可以通过电路实现,在此不做限定。
可以理解的是,本实施例的流媒体的信息的处理装置1200的各个模块的功能可以根据上述方法实施例中的方法具体实现,其具体实现过程可以参照上述方法实施例的相关描述,此处不再赘述。
图16是本发明实施例提供的计算机设备1300的硬件结构示意图。如图16所示,计算机设备1300可以作为流媒体的信息的处理装置1100的一种实现方式,也可以作为流媒体的信息的处理装置1200的一种实现方式,计算机设备1300包括处理器1302、存储器1304、输入/输出接口1306、通信接口1308和总线1310。其中,处理器1302、存储器1304、输入/输出接口1306和通信接口1308通过总线1310实现彼此之间的通信连接。
处理器1302可以采用通用的中央处理器(Central Processing Unit,CPU),微处理器,应用专用集成电路(Application Specific Integrated Circuit,ASIC),或者一个或多个集成电路,用于执行相关程序,以实现本发明实施例所提供的流媒体的信息的处理装置1100或者流媒体的信息的处理装置1200中包括的模块所需执行的功能,或者执行本发明方法实施例提供的图8或者图13对应的流媒体的信息的处理方法。处理器1302可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1302中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1302可以是通用处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现成可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本发明实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本发明实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1304,处理器1302读取存储器1304中的信息,结合其硬件完成本发明实施例所提供的流媒体的信息的处理装置1100或者流媒体的信息的处理装置1200中包括的模块所需执行的功能,或者执行本发明方法实施例提供的图8或者图13对应的流媒体的信息的处理方法。
存储器1304可以是只读存储器(Read Only Memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(Random Access Memory,RAM)。存储器1304可以存储操作系统以及其他应用程序。在通过软件或者固件来实现本发明实施例提供的流媒体的信息的处理装置1100或者流媒体的信息的处理装置1200中包括的模块所需执行的功能,或者执行本发明方法实施例提供的图8或者图13对应的流媒体的信息的处理方法时,用于实现本发明实施例提供的技术方案的程序代码保存在存储器1304中,并由处理器1302来执行流媒体的信息的处理装置1100或者流媒体的信息的处理装置1200中包括的模块所需执行的操作,或者执行本发明方法实施例提供的图8或者图13对应的流媒体的信息的处理方法。
输入/输出接口1306用于接收输入的数据和信息,输出操作结果等数据。可以作为装置1100中的获取模块1101,或者装置1200中的获取模块1201或者发送模块。
通信接口1308使用例如但不限于收发器一类的收发装置,来实现计算机设备1300与其他设备或通信网络之间的通信。可以作为装置1100中的获取模块1101,或者装置1200中的获取模块1201或者发送模块。
总线1310可包括在计算机设备1300各个部件(例如处理器1302、存储器1304、输入/输出接口1306和通信接口1308)之间传送信息的通路。
应注意,尽管图16所示的计算机设备1300仅仅示出了处理器1302、存储器1304、输入/输出接口1306、通信接口1308以及总线1310,但是在具体实现过程中,本领域的技术人员应当明白,计算机设备1300还包含实现正常运行所必须的其他器件,例如在作为流媒体的信息的处理装置1100实现方式时,流媒体的信息的处理装置1100还可以包括显示器,用于显示要播放的视频数据。同时,根据具体需要,本领域的技术人员应当明白,计算机设备1300还可包含实现其他附加功能的硬件器件。此外,本领域的技术人员应当明白,计算机设备1300也可仅仅包含实现本发明实施例所必须的器件,而不必包含图16中所示的全部器件。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本发明所必须的。本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,上述的程序可存储于一种计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,上述的存储介质可为磁碟、光盘、只读存储记忆体(ROM:Read-Only Memory)或随机存储记忆体(RAM:Random Access Memory)等。
尽管在此结合各实施例对本发明进行了描述,然而,在实施所要保护的本发明的过程中,本领域技术人员通过查看所述附图、公开内容、以及所附权利要求书,可理解并实现所述公开实施例的其它变化。在权利要求中,"包括"(comprising)一词不排除其它组成部分或步骤,"一"或"一个"不排除多个的可能性。单个处理器或其它单元可以实现权利要求中列举的若干项功能。互相不同的从属权利要求中记载了某些措施,但这并不代表这些措施不能组合起来产生良好的效果。计算机程序可以存储/分布在合适的介质中,例如:光存储介质或固态介质,与其它硬件一起提供或作为硬件的一部分,也可以采用其它分布形式,如通过Internet或其它有线或无线电信系统。
本文中应用了具体个例对本发明的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本发明的方法及其思想;同时,对于本领域的一般技术人员,依据本发明的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本发明的限制。

Claims (15)

  1. 一种流媒体的信息的处理方法,其特征在于,所述方法包括:
    获取目标空间对象的目标空间信息,所述目标空间对象为两个空间对象中的空间对象,所述两个空间对象与目标视频数据中包括的两个图像的数据关联,所述目标空间信息包括同属性空间信息,所述同属性空间信息包括所述两个空间对象各自的空间信息之间相同的信息,所述两个空间对象中除所述目标空间对象外的其他空间对象的空间信息包括所述同属性空间信息;
    根据所述目标空间信息,确定要播放的视频数据。
  2. 根据权利要求1所述的方法,其特征在于,所述目标空间信息还包括所述目标空间对象的异属性空间信息,所述其他空间对象的空间信息还包括所述其他空间对象的异属性空间信息,所述目标空间对象的异属性空间信息与所述其他空间对象的异属性空间信息不同。
  3. 根据权利要求1或2所述的方法,其特征在于,所述目标空间信息包括所述目标空间对象的中心点的位置信息或者所述目标空间对象的左上点的位置信息,所述目标空间信息还包括所述目标空间对象的宽和所述目标空间对象的高。
  4. 根据权利要求1或2所述的方法,其特征在于,所述目标空间信息包括所述目标空间对象的左上点的位置信息,和所述目标空间对象的右下点的位置信息。
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述目标空间信息包括所述目标空间对象的空间旋转信息。
  6. 根据权利要求1至5任一项所述的方法,其特征在于,所述目标空间信息封装在空间信息数据或者空间信息轨迹(track)中,所述空间信息数据为所述目标视频数据的码流、所述目标视频数据的元数据或者独立于所述目标视频数据的文件,所述空间信息轨迹为独立于所述目标视频数据的轨迹。
  7. 根据权利要求6所述的方法,其特征在于,所述空间信息数据或者所述空间信息轨迹中还包括用于指示所述同属性空间信息的类型的空间信息类型标识,所述空间信息类型标识用于指示所述目标空间信息中属于所述同属性空间信息的信息。
  8. 根据权利要求7所述的方法,其特征在于,在所述空间信息类型标识指示目标空间信息中没有属于所述同属性空间信息的信息时,所述同属性空间信息包括所述目标空间对象的宽的最小值、所述目标空间对象的高的最小值、所述目标空间对象的宽的最大值和所述目标空间对象的高的最大值。
  9. 根据权利要求7或8所述的方法,其特征在于,所述空间信息类型标识与所述同属性空间信息封装在同一个box中。
  10. 根据权利要求6至9任一项所述的方法,其特征在于,所述空间信息数据或者所述空间信息轨迹中还包括用于指示所述目标空间信息对应的坐标系的坐标系标识,所述坐标系为像素坐标系或者角度坐标系。
  11. 根据权利要求10所述的方法,其特征在于,所述坐标系标识与所述同属性空间信息封装在同一个box中。
  12. 根据权利要求6至11任一项所述的方法,其特征在于,所述空间信息数据或者所述空间信息轨迹还包括空间旋转信息标识,所述空间旋转信息标识用于指示所述目标空间信息是否包括所述目标空间对象的空间旋转信息。
  13. 一种流媒体的信息的处理装置,其特征在于,所述装置包括:
    获取模块,用于获取目标空间对象的目标空间信息,所述目标空间对象为两个空间对象中的空间对象,所述两个空间对象与目标视频数据中包括的两个图像的数据关联,所述目标空间信息包括同属性空间信息,所述同属性空间信息包括所述两个空间对象各自的空间信息之间相同的信息,所述两个空间对象中除所述目标空间对象外的其他空间对象的空间信息包括所述同属性空间信息;
    确定模块,用于根据所述获取模块获取的所述目标空间信息,确定要播放的视频数据。
  14. 根据权利要求13所述的装置,其特征在于,所述目标空间信息还包括所述目标空间对象的异属性空间信息,所述其他空间对象的空间信息还包括所述其他空间对象的异属性空间信息,所述目标空间对象的异属性空间信息与所述其他空间对象的异属性空间信息不同。
  15. 根据权利要求13或14所述的装置,其特征在于,所述目标空间信息包括所述目标空间对象的中心点的位置信息或者所述目标空间对象的左上点的位置信息,所述目标空间信息还包括所述目标空间对象的宽和所述目标空间对象的高。
PCT/CN2017/078585 2016-12-30 2017-03-29 一种信息的处理方法及装置 WO2018120474A1 (zh)

Priority Applications (9)

Application Number Priority Date Filing Date Title
MX2019007926A MX2019007926A (es) 2016-12-30 2017-03-29 Método y aparato de procesamiento de información.
EP17888252.8A EP3557534A4 (en) 2016-12-30 2017-03-29 INFORMATION PROCESSING METHOD AND APPARATUS
EP23172704.1A EP4287637A1 (en) 2016-12-30 2017-03-29 Information processing method and apparatus
CN201780081717.5A CN110121734B (zh) 2016-12-30 2017-03-29 一种信息的处理方法及装置
JP2019535932A JP7058273B2 (ja) 2016-12-30 2017-03-29 情報処理方法および装置
BR112019013609A BR112019013609A8 (pt) 2016-12-30 2017-03-29 Método e aparelho de processamento de informação
KR1020197022156A KR102261559B1 (ko) 2016-12-30 2017-03-29 정보 처리 방법 및 장치
CN201710214606.9A CN108271044B (zh) 2016-12-30 2017-04-01 一种信息的处理方法及装置
US16/458,734 US20190325652A1 (en) 2016-12-30 2019-07-01 Information Processing Method and Apparatus

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201611252815.4 2016-12-30
CN201611252815 2016-12-30
CNPCT/CN2017/070863 2017-01-11
PCT/CN2017/070863 WO2018120294A1 (zh) 2016-12-30 2017-01-11 一种信息的处理方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/458,734 Continuation US20190325652A1 (en) 2016-12-30 2019-07-01 Information Processing Method and Apparatus

Publications (1)

Publication Number Publication Date
WO2018120474A1 true WO2018120474A1 (zh) 2018-07-05

Family

ID=62706718

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/078585 WO2018120474A1 (zh) 2016-12-30 2017-03-29 一种信息的处理方法及装置

Country Status (1)

Country Link
WO (1) WO2018120474A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11532128B2 (en) * 2017-03-23 2022-12-20 Qualcomm Incorporated Advanced signaling of regions of interest in omnidirectional visual media

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090132931A1 (en) * 2007-11-15 2009-05-21 International Business Machines Corporation Method, device and program for automatically generating reference mark in virtual shared space
CN102930047A (zh) * 2012-11-15 2013-02-13 中国科学院深圳先进技术研究院 虚拟地球用户化身节点检索方法及系统
CN103390057A (zh) * 2013-07-26 2013-11-13 国家测绘地理信息局卫星测绘应用中心 一种历史信息的空间化建模存储方法
CN104735464A (zh) * 2015-03-31 2015-06-24 华为技术有限公司 一种全景视频交互传输方法、服务器和客户端
CN105898254A (zh) * 2016-05-17 2016-08-24 亿唐都科技(北京)有限公司 节省带宽的VR全景视频布局方法、装置及展现方法、系统

Similar Documents

Publication Publication Date Title
WO2018120294A1 (zh) 一种信息的处理方法及装置
WO2018214698A1 (zh) 一种视频信息的呈现方法和装置
KR102241082B1 (ko) 복수의 뷰포인트들에 대한 메타데이터를 송수신하는 방법 및 장치
WO2018058773A1 (zh) 一种视频数据的处理方法及装置
KR102305633B1 (ko) 퀄리티 기반 360도 비디오를 송수신하는 방법 및 그 장치
TWI670973B (zh) 在iso基本媒體檔案格式推導虛擬實境投影、填充、感興趣區域及視埠相關軌跡並支援視埠滾動訊號之方法及裝置
US20200389640A1 (en) Method and device for transmitting 360-degree video by using metadata related to hotspot and roi
CN109644262A (zh) 发送全向视频的方法、接收全向视频的方法、发送全向视频的装置和接收全向视频的装置
CN109218755B (zh) 一种媒体数据的处理方法和装置
WO2018068213A1 (zh) 一种视频数据的处理方法及装置
WO2018126702A1 (zh) 一种应用于虚拟现实技术的流媒体的传输方法和客户端
CN107888993B (zh) 一种视频数据的处理方法及装置
US20210176446A1 (en) Method and device for transmitting and receiving metadata about plurality of viewpoints
US20190199921A1 (en) Method for transmitting 360-degree video, method for receiving 360-degree video, 360-degree video transmitting device, and 360-degree video receiving device
US20200145716A1 (en) Media information processing method and apparatus
CN110798707B (zh) 传输媒体数据的方法、客户端和服务器
CN107959861B (zh) 一种数据处理方法、相关设备及***
WO2018058993A1 (zh) 一种视频数据的处理方法及装置
WO2018120474A1 (zh) 一种信息的处理方法及装置
CN116248947A (zh) 处理媒体数据的方法、客户端和服务器
KR20200008631A (ko) 360도 비디오를 전송하는 방법, 360도 비디오를 수신하는 방법, 360도 비디오 전송 장치, 360도 비디오 수신 장치
CN108271084B (zh) 一种信息的处理方法及装置
WO2023169003A1 (zh) 点云媒体的解码方法、点云媒体的编码方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17888252

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019535932

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112019013609

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 2017888252

Country of ref document: EP

Effective date: 20190715

Ref document number: 20197022156

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 112019013609

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20190628