WO2022269163A1 - Procédé de construction d'une image de profondeur d'une vidéo multi-vues, procédé de décodage d'un flux de données représentatif d'une vidéo multi-vues, procédé de codage, dispositifs, système, équipement terminal, signal et programmes d'ordinateur correspondants - Google Patents
- Publication number
- WO2022269163A1 (application PCT/FR2022/051126)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- block
- view
- depth image
- depth
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Definitions
- Method for constructing a depth image of a multi-view video, method for decoding a data stream representative of a multi-view video, encoding method, devices, system, terminal equipment, signal and corresponding computer programs
- the present invention relates generally to the field of 3D image processing, and more specifically to the decoding of multi-view image sequences and, in particular, to the construction of a depth image associated with a current view, from information coded in the data stream and representative of a texture image of the same current view.
- a device such as a Virtual Reality Headset
- Such a viewpoint is either a viewpoint captured by a camera or a viewpoint that has not been captured by any camera.
- A view that has not been captured by a camera is also called a virtual view or an intermediate view, because it lies between views captured by cameras and must be synthesized from the captured views in order to render the scene to the viewer.
- the scene is conventionally captured by a set of cameras, as shown in Figure 1.
- These cameras can be of the 2D type (cameras C1, C2, ..., CN, with N a non-zero integer, in Figure 1), that is to say each of them captures a view according to one point of view, or of the 360 type (camera C360 in Figure 1), i.e. they capture the whole scene at 360 degrees around the camera, and thus from several different points of view.
- the cameras can be arranged in an arc, a rectangle, or any other configuration that provides good coverage of the scene.
- a set of images representing the scene according to different views is obtained at a given instant.
- For videos, a temporal sampling of the captured images is carried out (for example at 30 frames per second) in order to produce an original multi-view video, as shown in Figure 3.
- the information captured by these cameras is encoded in a data stream and transmitted to a decoder, which will decode all or part of these views. Then, a view synthesis is applied in order to synthesize the view requested by the user at a given moment, according to his position and his angle of view of the scene.
- DIBR (Depth-Image-Based Rendering), a free-viewpoint rendering technique
- each view contains a texture component (i.e. the image in the classical sense) and a depth component (i.e. a depth map, for which the intensity of each pixel is associated with the depth of the scene at that location).
- Techniques are also known for estimating the depth of a pixel of a texture image from one or more other texture images, such as for example the DERS technique ("Depth Estimation Reference Software"), described in particular in the document entitled "Enhanced Depth Estimation Reference Software (DERS) for Free-viewpoint Television", by Stankiewicz et al., published in October 2013 by the ISO and available at the following link: https://www.researchgate.net/publication/271851694_Enhanced_Depth_Estimation_Reference_Software_DERS_for_Free-viewpoint_Television, and the IVDE technique ("Immersive Video Depth Estimation"), described in the document entitled "Depth Map Refinement for Immersive Video", by D.
- The MIV standard describes in particular a coding and decoding profile called "Geometry Absent", according to which (FIG. 4, where only the views at a time T are represented) only the original texture components or images TO are coded. No depth information is therefore transmitted in the coded data stream.
- a decoded version T D of the texture images is obtained from the information coded in the data stream.
- the depth components P E of one or more views are then estimated from the decoded texture images T D .
- the decoded texture images and the estimated depth images are available for any subsequent DIBR synthesis of a view V, to respond to the user's request at time T.
- the depth component is not transmitted, which reduces the quantity of data to be processed (encoding, transmission and decoding) and makes it possible to save calculation resources and bandwidth;
- the depth component is not captured during the acquisition of the multi-view video of the scene, so it is not necessary to resort to specific detection and distance estimation devices such as LIDARs (for “Laser Imaging Detection And Ranging”, in English), based on the analysis of the properties of a beam of light returned to its emitter.
- However, a major drawback of this approach is that it requires an estimation of this depth component in the decoding device.
- the invention meets this need by proposing a method for constructing a depth image associated with a view of a multi-view video, called current view, from a data stream representative of said video, said stream comprising information representative of the motion vectors of a texture image associated with said current view with respect to at least one reference texture image, said texture image having been cut into blocks.
- Said method comprises:
- the motion compensation of a block of the depth image, co-located with the current block, from said at least one motion vector and from at least one available reference depth image, said reference depth image being associated with the same view as said reference texture image.
- The invention proposes an entirely new and inventive approach for constructing a depth image of a view of a multi-view video when the latter has not been transmitted in the data stream. It consists in exploiting the motion vectors transmitted in the stream for the texture image associated with the same view, in order to motion-compensate at least part of this depth image from an available reference depth image (already decoded or constructed in accordance with the construction method according to the invention) associated with the same view as the reference texture image.
- Such motion compensation is much less complex to implement than an estimation of the depth image from the texture images according to one of the aforementioned prior art techniques, such as DERS, IVDE, GaNet, etc.
- the resources of the receiving terminal are therefore preserved.
- the invention finds particular application in the case of the “Geometry Absent” profile defined by the MIV coding standard, according to which no depth information is transmitted in the coded data stream.
- When no motion vector has been decoded for said at least one block of the texture image, for example because it is coded according to the INTRA coding mode or another coding mode which does not use a motion vector, the method does not trigger said motion compensation of said at least one block and instead comprises estimating said at least one block of the depth image from at least one previously processed texture image.
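- As an illustration, this block-wise construction can be sketched as follows (a minimal sketch, not the claimed method itself: the container names `texture_blocks`, `ref_depth_images` and the fallback estimator `estimate_depth_block` are hypothetical, and the helper `motion_compensate_block` is sketched further below).

```python
import numpy as np

def construct_depth_image(texture_blocks, ref_depth_images, block_size,
                          height, width, estimate_depth_block):
    """Block-wise construction of the current depth image Pc (sketch).

    texture_blocks: for each block position (i, j), the decoded prediction mode
    and, for INTER blocks, the motion vector and the reference texture image
    identifier ID_T_R (hypothetical container).
    ref_depth_images: maps ID_T_R to the reference depth image P_R already
    available for the same view as the reference texture image T_R.
    estimate_depth_block: fallback estimator (e.g. DERS/IVDE-like), used when
    no motion vector is available for the co-located texture block.
    """
    depth = np.zeros((height, width), dtype=np.uint16)
    for (i, j), blk in texture_blocks.items():
        y, x = i * block_size, j * block_size
        if blk.get("mv") is not None:                    # INTER texture block
            ref_depth = ref_depth_images[blk["ref_id"]]  # P_R, same view as T_R
            mvx, mvy = blk["mv"]
            depth[y:y + block_size, x:x + block_size] = motion_compensate_block(
                ref_depth, y, x, mvx, mvy, block_size)
        else:                                            # INTRA or other mode
            depth[y:y + block_size, x:x + block_size] = estimate_depth_block(i, j)
    return depth
```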
- The method comprises obtaining a motion compensation indicator from information coded in the stream, said indicator being associated with said block of the depth image, and deciding to implement said motion compensation when the indicator is set to a predetermined value.
- An advantage is that it is possible to decide, on the encoder side, for which elements of the depth image reconstruction by motion compensation is authorized, and to transmit this decision by means of this indicator.
- This embodiment applies advantageously to the case where the depth image of the current view has actually been captured on the encoder side and is then used to evaluate the performance of motion compensation for this image, for example by comparing the motion-compensated depth image to the actually captured depth image. Such a comparison makes it possible to calculate an error, for example a quantity of energy of a residue between the depth image actually captured and the depth image motion-compensated from the motion vectors of the texture image associated with the current view.
- When the error criterion is satisfied, the indicator is set to the predetermined value, for example equal to 1; otherwise the indicator is set to another value, for example equal to 0.
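- One possible formulation of this criterion is sketched below (a sketch only: the sum-of-squared-differences measure and the threshold T are assumptions; the text above only speaks of a quantity of energy of a residue and a predetermined error criterion). For a block B of the captured depth image P_O and its motion-compensated counterpart:

```latex
E_B = \sum_{(x,y) \in B} \bigl( P_O(x,y) - \hat{P}_{CM}(x,y) \bigr)^2 ,
\qquad
F_{cm} =
\begin{cases}
1 & \text{if } E_B \le T,\\
0 & \text{otherwise.}
\end{cases}
```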
- The method comprises obtaining an identifier of the reference texture image from information encoded in the data stream and obtaining the reference depth image from said identifier.
- the invention also relates to a device for constructing a depth image associated with a view of a multi-view video, called current view, from a data stream representing said video, said stream comprising coded information representative of the motion vectors of a texture image associated with said current view with respect to at least one reference texture image, said texture image having been cut into blocks.
- Said device is configured to implement:
- the motion compensation of a block of the depth image co-located with the current block, from said at least one motion vector and at least one available reference depth image, said reference depth image being associated with the same view as said reference texture image.
- said device is configured to implement the steps of the construction method as described previously in its various embodiments.
- the construction device has in combination all or part of the characteristics set out throughout this document.
- The invention also relates to a method for decoding a data stream representative of a multi-view video, said stream comprising coded information representative of motion vectors of a texture image of a current view with respect to a reference texture image, said texture image having been cut into blocks, said method comprising:
- the motion compensation of a block of the depth image, co-located with the current block, from said at least one motion vector and from at least one available reference depth image, said reference depth image being associated with the same view as said reference texture image.
- The invention relates to a method for decoding a data stream representative of a multi-view video, said stream comprising coded information representative of motion vectors of a texture image of a current view with respect to a reference texture image, said texture image having been cut into blocks, said method comprising:
- the decoding method comprises the characteristics of the aforementioned construction method.
- the decoding method further comprises the decoding of coded information representing a motion compensation indicator of said at least one block of said depth image, said construction being implemented for said block when the indicator is positioned at a predetermined value.
- The invention also relates to a device for decoding a data stream representing a multi-view video, said stream comprising coded information representative of motion vectors of a texture image of a current view with respect to a reference texture image, said texture image having been cut into blocks, said device being configured to implement:
- said device is configured to implement the steps of the decoding method as described previously in its different embodiments.
- the construction device is itself integrated into the decoding device.
- the aforementioned construction and decoding devices are integrated into a free navigation system in a multi-view video of a scene.
- said system further comprises a module for synthesizing a view according to a point of view chosen by a user from the decoded texture images and the constructed depth images.
- the aforementioned free navigation system is integrated into terminal equipment configured to receive a stream of coded data representative of a multi-view video.
- the terminal equipment and the free navigation system have at least the same advantages as those conferred by the aforementioned construction and decoding methods.
- the invention also relates to a method for coding a data stream representative of a multi-view video and comprising:
- Obtaining a depth image associated with said current view, captured by a depth camera, called the captured depth image;
- the motion compensation of at least one block of a depth image associated with the current view, called the constructed depth image, said block being co-located with said block of the texture image, from said at least one motion vector and from at least one available reference depth image, said reference depth image being associated with the same view as said reference texture image;
- the invention also relates to a device for coding a data stream representative of a multi-view video and configured to implement:
- Obtaining a depth image associated with said current view, captured by a depth camera, called the captured depth image;
- the motion compensation of at least one block of a depth image associated with the current view, called the constructed depth image, said block being co-located with said block of the texture image, from said at least one motion vector and from at least one available reference depth image, said reference depth image being associated with the same view as said reference texture image;
- said device is configured to implement the steps of the coding method as described previously in its various embodiments.
- The invention also relates to a signal carrying a stream of coded data representative of a multi-view video, said stream comprising coded data representative of motion vectors of a texture image of a current view with respect to a reference texture image, said texture image having been cut into blocks.
- said stream comprises coded data representative of a motion compensation indicator, said indicator being associated with said at least one block of a depth image associated with said current view and said indicator is intended, when it is positioned at a predetermined value, to be used to implement a motion compensation of said block of the depth image, from at least one decoded motion vector and from at least one available reference depth image, said reference depth image being associated with the same view as said reference texture image.
- the invention also relates to computer program products comprising program code instructions for implementing the methods as described previously, when they are executed by a processor.
- a program may use any programming language, and be in the form of source code, object code, or intermediate code between source code and object code, such as in partially compiled form, or in any other desirable form.
- the invention also relates to a recording medium readable by a computer on which are recorded computer programs comprising program code instructions for the execution of the steps of the methods according to the invention as described above.
- Such recording medium can be any entity or device capable of storing the program.
- The medium may include a storage medium, such as a ROM, for example a CD-ROM or a microelectronic circuit ROM, or else a magnetic recording medium, for example a removable medium (memory card), a hard drive or an SSD.
- Such a recording medium may be a transmissible medium such as an electrical or optical signal, which may be conveyed via an electrical or optical cable, by radio or by other means, so that the computer program it contains can be executed remotely.
- The program according to the invention can in particular be downloaded over a network, for example the Internet.
- the recording medium may be an integrated circuit in which the program is incorporated, the circuit being adapted to execute or to be used in the execution of the aforementioned construction, coding and/or decoding methods.
- the present technique is implemented by means of software and/or hardware components.
- the term "module" may correspond in this document to a software component, a hardware component or a set of hardware and software components.
- a software component corresponds to one or more computer programs, one or more sub-programs of a program, or more generally to any element of a program or software capable of implementing a function or a set of functions, as described below for the module concerned.
- Such a software component is executed by a data processor of a physical entity (terminal, server, gateway, set-top box, router, etc.) and is likely to access the hardware resources of this physical entity (memories, recording media, communication buses, electronic input/output cards, user interfaces, etc.).
- The term "resources" means any set of hardware and/or software elements supporting a function or a service, whether unitary or combined.
- A hardware component corresponds to any element of a hardware assembly able to implement a function or a set of functions, according to what is described below for the module concerned. It can be a programmable hardware component or a component with an integrated processor for executing software, for example an integrated circuit, a smart card, a memory card, an electronic card for executing firmware, etc.
- Figure 1 shows an example of an arrangement of a plurality of cameras forming a system for acquiring multi-view video of a scene, according to the prior art
- Figure 2 schematically illustrates a plurality of images of the scene, captured by the plurality of cameras at a given instant, according to the prior art
- FIG. 3 schematically illustrates a sequence of the plurality of images, captured by the plurality of cameras at several successive instants, forming the original multi-view video, according to the prior art
- FIG. 4 schematically illustrates an example of processing of a coded data stream representative of a multi-view video by terminal equipment according to the prior art
- Figure 5 schematically illustrates an example of architecture of a terminal equipment comprising a system for free navigation in a multi-view video comprising a device for decoding a stream of coded data representative of a multi-view video and a device for constructing a depth image associated with a view of said video according to one embodiment of the invention
- FIG. 6 describes in the form of a flowchart the steps of a method for decoding a stream of coded data representative of a multi-view video, according to an example embodiment of the invention
- FIG. 7 describes in the form of a flowchart the steps of a method for constructing a depth image of a view of a multi-view video according to one embodiment of the invention
- FIG. 8 describes in the form of a flowchart the steps of a method for encoding a stream of encoded data representative of a multi-view video according to one embodiment of the invention
- Figure 9 details an example of implementation of the aforementioned methods according to one embodiment of the invention.
- FIG. 10 describes an example of the hardware structure of a device for constructing a depth image according to the invention.
- FIG. 11 describes an example of hardware structure of a device for decoding a multi-view video according to the invention.
- FIG. 12 describes an example of hardware structure of a multi-view video coding device according to the invention.
- The principle of the invention is based on the decoding of motion vectors of a texture image associated with a current view of a multi-view video with respect to a reference texture image, and on the construction of at least one block of a depth image associated with said current view, by motion compensation of this block from the decoded motion vector for a co-located block of the texture image and an available reference depth image (decoded conventionally or constructed according to the invention), said reference depth image being associated with the same view as said reference texture image.
- the invention finds particular application in a free navigation system within a multi-view video, for example embedded in terminal equipment, for example of the mobile telephone or virtual reality headset type.
- the depth images associated with the views are used in association with the decoded texture images to synthesize the view desired by the user.
- In FIG. 5, an example is presented of the architecture of terminal equipment UE, UE′ comprising a virtual reality headset HMD according to one embodiment of the invention.
- Such terminal equipment is configured to receive and process a coded data stream F_D representative of a multi-view video of a scene, as well as to display on a display device DISP, for example the screen of the HMD headset, any view of the scene chosen by the user UT.
- the terminal equipment UE, UE′ integrates a system S, S′ of free navigation in the multi-view video of the scene according to the invention.
- the terminal equipment UE comprises a device 100 for constructing a depth image associated with the current view, a device 200, 200' for decoding a coded data stream representative of the multi-view video and a SYNT synthesis module for a view chosen by the user.
- The device 100 is configured to obtain from the data stream F_D motion vectors of a texture image associated with the current view with respect to at least one reference texture image, and to motion-compensate at least one block of the depth image from at least one decoded motion vector and at least one already constructed reference depth image, said reference depth image being associated with the same view as said reference texture image.
- the device 100 thus implements the method for constructing a depth image according to the invention which will be detailed below in relation to FIG. 7.
- the device 200, 200' for decoding the coded data stream F D representative of the multi-view video is configured to decode coded information representative of the motion vectors of the texture image associated with said current view and to transmit at least said decoded information to the aforementioned construction device 100.
- the device 200, 200' thus implements the method for decoding a coded data stream representative of a multi-view video according to the invention which will be detailed in relation to FIG. 6.
- the data stream F D has been coded by a coding device 300, for example integrated into a remote server equipment ES which has transmitted it via its transmission-reception module E/R to that of the terminal equipment UE or UE' via a communication network RC.
- The SYNT synthesis module is configured to generate the view chosen by the user from the decoded texture images and the constructed depth images, when it does not correspond to any of the views of the multi-view video transmitted in the data stream F_D.
- the terminal equipment UE′ which integrates a free navigation system S′ in the multi-view video comprising the device 200′ for decoding a stream of coded data and a SYNT synthesis module.
- the device 200′ incorporates the aforementioned device 100.
- The coding device 300 is configured to code information representative of motion vectors of a texture image of a current view with respect to a reference texture image, to obtain a depth image associated with said current view, captured by a depth camera, called the captured depth image, to motion-compensate at least one block of a depth image associated with the current view, called the constructed depth image, from at least one decoded motion vector and at least one already constructed reference depth image, said reference depth image being associated with the same view as said reference texture image, to evaluate said constructed depth image by comparison with the captured depth image, a compensation error being obtained, and to encode information representative of a motion compensation indicator of said at least one element of said depth image as a function of a predetermined error criterion, said indicator being set to a first value when the error criterion is satisfied.
- the device 300 thus implements the coding method according to the invention which will be detailed below in relation to FIG. 8.
- An image is conventionally represented by an array of pixels, generally rectangular. Such an image can associate with each pixel of the array a texture value or a depth value.
- The term "view" designates the image or images acquired by a camera according to a particular point of view of the scene.
- a view can be represented by a texture image and/or a depth image, which form the components of this view.
- The texture component TO is conventionally divided into blocks of pixels, for example of dimensions 16×16, 8×8, 4×4 or others, and the coding is carried out block by block in a conventional manner.
- the invention is not limited to this particular case and also applies to another division or to coding per pixel.
- the encoder chooses in a manner known per se for each block of a current texture image, if it is going to be coded according to a so-called INTER mode, that is to say by motion compensation (in which case a reference image and at least one motion vector are signaled in the coded data stream) or according to a so-called INTRA mode or any other mode which does not include a motion vector.
- The motion vector or vectors are encoded in the data stream F_D, as well as an identifier ID_T_R of the reference texture image.
- We now describe the operation of terminal equipment UE, UE′ and of a device 200, 200′ for decoding a stream of coded data F_D received by this terminal equipment.
- FIG. 6 presents, in the form of a flowchart, an example of implementation of a method for decoding the data stream F_D according to one embodiment of the invention.
- this method is implemented by the aforementioned device 200 or 200'.
- a current view Vc and a current block Bc of the texture image TOc associated with this view Vc are considered.
- A decoding, for example of HEVC type, is carried out. It comprises at 60 the reading and decoding of syntax elements included in the information of the stream F_D.
- These syntax elements comprise a prediction mode MP used for the current block Bc. This mode can be of the INTER, INTRA or another type. From this prediction mode information, it is deduced whether a motion vector MV is encoded in the stream F_D.
- the prediction mode MP is of the INTER type.
- the motion vector MV associated with the current block Bc is decoded. This motion vector is representative of a motion of the current texture image TOc relative to a reference image T R .
- This motion vector MV and the reference texture image identifier are stored in a memory M2, then transmitted at 64 to the device 100 for constructing a depth image associated with the current view Vc according to the invention.
- This device 100 may or may not be integrated into the decoding device 200, 200', as illustrated by FIG. 5 already described.
- A motion compensation indicator Fcm is decoded at 63. It can take at least a first value V1, for example equal to 1, to indicate that motion compensation of the depth image Pc is to be implemented, or a second value V2, for example equal to zero, to indicate on the contrary that motion compensation should not be implemented.
- This indicator, when it is decoded, is then transmitted at 64 to the device 100 according to the invention.
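- A minimal sketch of this decode-and-forward step (60 to 64) is given below; the bitstream reader API (`read_prediction_mode`, `read_motion_vector`, etc.) is entirely hypothetical, the real syntax being that of the underlying codec.

```python
def decode_block_and_forward(bitstream, constructor, block_pos):
    """Sketch of steps 60-64 for one texture block Bc (hypothetical reader API)."""
    mp = bitstream.read_prediction_mode()        # 60: prediction mode MP of Bc
    mv, ref_id, fcm = None, None, None
    if mp == "INTER":
        mv = bitstream.read_motion_vector()      # 62: MV of Bc w.r.t. T_R
        ref_id = bitstream.read_reference_id()   # identifier ID_T_R
    if bitstream.has_motion_compensation_flag():
        fcm = bitstream.read_flag()              # 63: indicator Fcm, if present
    # 64: hand the decoded information over to the construction device 100
    constructor.on_texture_block(block_pos, mode=mp, mv=mv, ref_id=ref_id, fcm=fcm)
```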
- the motion vector MV of the current block Bc is obtained. For example, it is received from the decoding device 200 or 200'.
- the identifier ID_T R of the reference texture image used to estimate said motion vector MV is obtained. For example, it is received from the decoding device 200 or 200'.
- This reference texture image is associated with a view V R .
- a motion compensation indicator Fcm is obtained and stored in memory M1. For example, it is encoded in the stream F D and received from the decoding device 200 or 200'.
- The indicator Fcm can take a first predetermined value, for example equal to 1, or a second predetermined value, for example equal to 0.
- an information field Mij is filled in according to the information previously obtained. It is for example positioned at 1 when a motion vector MV has been obtained for the current block and when the indicator Fcm is received with the first predetermined value.
- A decision is made whether the current depth block BPc is to be motion compensated. Such a decision is made based on whether or not a motion vector MV has been obtained for the current texture block Bc, or, when the Fcm flag has been received, based on the value of the Fcm flag, or again, in the embodiment where the Mij information field is used, based on the value of Mij.
- a reference depth image P R is obtained from the identifier ID_T R .
- This is a depth image previously constructed by the device 100 according to the invention and in particular that which is associated with the same view V R as the reference texture image T R .
- the current block BPc of the depth image Pc is constructed by motion compensation CM of the block of the reference depth image P R pointed to by the motion vector MV.
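- The lookup of the reference depth image from the identifier ID_T_R can be sketched as a simple buffer keyed by texture image identifier (names and structure are assumptions; in practice this plays the role of a decoded-picture buffer for the constructed depth images):

```python
class DepthReferenceBuffer:
    """Stores, for each texture image identifier, the depth image already
    constructed for the same view, so that P_R can be retrieved from ID_T_R."""

    def __init__(self):
        self._by_texture_id = {}

    def store(self, texture_id, depth_image):
        # called once the depth image associated with a texture image is built
        self._by_texture_id[texture_id] = depth_image

    def get(self, ref_texture_id):
        # P_R: the depth image associated with the same view V_R as T_R
        return self._by_texture_id[ref_texture_id]
```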
- the device 100 constructs the depth image Pc of the current view Vc block by block, as it obtains the decoded information from the data stream F D from the device 200 or 200'.
- Alternatively, it waits until it has received all the decoded information before constructing the current depth image Pc.
- FIG. 8 we now present, in the form of a flowchart, an example of implementation of a method for coding a depth image of a current view according to an embodiment of the invention.
- this method is implemented by the aforementioned device 300.
- It is assumed that this image is divided into blocks and that the blocks of the image are scanned according to a predetermined scanning order, for example a so-called zigzag mode (which corresponds to the lexicographic order).
- the prediction mode MP to be used for this current block is determined.
- The encoding device chooses whether the current block of the image is going to be encoded in INTER mode, by motion compensation (in which case a reference texture image T_R and a motion vector MV are signaled in the coded data stream F_D), or in INTRA mode or any other mode which does not use a motion vector.
- the prediction mode MP chosen for the current block Bc is a mode for which a motion vector MV is calculated.
- the motion vector MV of the current block is calculated with respect to a reference texture image T R . It is associated with a view V R .
- the motion vector is coded and the coded information obtained is inserted into a stream of coded data F D .
- an original depth image POc is captured by a depth camera associated with the camera arranged to capture the texture image TOc of the current view Vc.
- a previously constructed depth image referred to as the reference depth image P R , and associated with the same view V R as the reference texture image T R is obtained, for example from a memory M3.
- A block BPc, corresponding to the current block Bc, in a depth image Pc associated with the current view Vc is constructed by motion compensation of the block of the reference depth image P_R pointed to by the motion vector MV.
- The coding device performs an identical motion compensation of the co-located depth block in the depth image associated with the same view Vc.
- The current texture block BTOc, of coordinates (i,j) in the texture image TOc at time t, is motion compensated by a block of the reference image T_R at time t′, with a motion vector of components (MVx, MVy).
- The coding device performs at 84 a motion compensation of the depth block BPc, of coordinates (i,j), of the current depth image Pc, using the depth block BP_R of the depth image P_R associated with the reference texture image TO_R at time t′, with the motion vector of components (MVx, MVy).
- the same motion compensation is applied to the current depth block BPc as to the co-located texture block associated with the same current view Vc and a compensated depth block is thus obtained.
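- At pixel level, this block copy can be sketched as follows (integer-pel displacement and border clamping are simplifying assumptions; a real codec would reuse the same sub-pel interpolation as for the texture component):

```python
import numpy as np

def motion_compensate_block(ref_depth, y, x, mvx, mvy, block_size):
    """Return the block of the reference depth image P_R pointed to by the
    texture motion vector (MVx, MVy) for the block at rows y.., columns x.."""
    h, w = ref_depth.shape
    rows = np.clip(np.arange(y, y + block_size) + mvy, 0, h - 1)
    cols = np.clip(np.arange(x, x + block_size) + mvx, 0, w - 1)
    return ref_depth[np.ix_(rows, cols)]
```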
- the compensated block BPc is evaluated by comparison with the block co-located at i,j of the original depth image POc. For example, a quantity of energy of a residue between the actually captured depth image POc and the motion-compensated depth image PCc is calculated.
- a motion compensation indicator Fcm is determined as a function of a positioning of this quantity of energy with respect to a predetermined threshold.
- the indicator is positioned at the first value, for example equal to 1, to indicate that the current block is motion compensated for its depth component. Otherwise the indicator is positioned at a second value, for example equal to 0, which means that the current block is not motion compensated for its depth component.
- the indicator Fcm is then encoded in the stream F D .
- the depth indicator Fcm is not transmitted for a depth block associated with a texture block which has no motion vector (for example, because it is coded according to the intra mode).
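- The encoder-side derivation of Fcm for one block can thus be sketched as follows (the sum-of-squared-differences measure and the threshold value remain assumptions, and `motion_compensate_block` is the helper sketched above):

```python
import numpy as np

def encoder_fcm_for_block(captured_depth, ref_depth, y, x, mvx, mvy,
                          block_size, threshold):
    """Sketch of steps 84-86: compensate the depth block with the texture
    motion vector, compare it with the co-located block of the captured depth
    image POc, and derive the indicator Fcm (1 = compensate, 0 = do not)."""
    compensated = motion_compensate_block(ref_depth, y, x, mvx, mvy, block_size)
    original = captured_depth[y:y + block_size, x:x + block_size].astype(np.int64)
    residual_energy = float(np.sum((original - compensated.astype(np.int64)) ** 2))
    return 1 if residual_energy <= threshold else 0  # Fcm, then coded in F_D
```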
- the data stream F D obtained is for example stored in memory M3 or transmitted in a signal to one or more terminal equipment via the communication network RC.
- FIG. 9 presents an example of decoding of a data stream representative of a multi-view video and of construction of a depth image Pc of a current view Vc of this video, according to one embodiment of the invention.
- a data stream F D is obtained. For example, it is received by the decoding device 200 via the communication network RC. It is assumed that the depth image associated with the same current view Vc is not coded in the stream F D .
- The block DEC decodes the current texture image Tc and information relating to the processing of the current depth image Pc. It is considered that the current texture image Tc is, for example, divided into blocks. The blocks of this image are processed according to a predetermined processing order, for example in zigzag mode, which corresponds to the lexicographic order.
- the information coded in the data stream F D is read at 60-63.
- the prediction mode MP of this block if applicable the motion vector MV and an identifier ID_T R of the reference texture image are obtained.
- A motion compensation indicator Fcm of the co-located block in the depth image Pc is read.
- The current block Bc of the current texture image Tc is decoded.
- a motion vector MV of the current texture block is obtained, as well as at 71, an identifier ID_T R of the reference texture image.
- an information field Mij is populated with a first value indicating that the current depth block should be motion compensated.
- the motion vector and the identifier of the reference texture image are stored in memory.
- the information field Mij is filled with a second value indicating that the current depth block should not be motion compensated.
- a motion compensation identifier Fcm is also obtained at 72 and stored in memory.
- a decision to implement or not to implement motion compensation of the current depth block is made based on previously obtained information.
- the indicator Fcm prevails, that is to say that, when it has been obtained, the decision is made according to its value. In other words, if the indicator is positioned at the first value, it is decided to compensate in motion the current depth block. Otherwise, the decision depends on the presence or absence of a motion vector for the co-located texture block associated with the current view.
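- This precedence rule can be summarized by a small decision function (a sketch under the assumption that Fcm equal to 1 means "compensate" and that the flag, when absent, is represented by None):

```python
def should_motion_compensate(fcm, has_motion_vector):
    """Decision of the construction step: the indicator Fcm prevails when it
    has been decoded; otherwise the decision falls back on the presence of a
    motion vector for the co-located texture block of the current view."""
    if fcm is not None:
        return fcm == 1
    return has_motion_vector
```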
- a reference depth image P R is obtained at 75 from the identifier ID_T R of the reference texture image. This is the depth image associated with the same view as the reference texture image T R .
- The current depth block BPc is motion compensated according to a conventional motion compensation technique, from the block of the reference depth image co-located with that pointed to by the motion vector in the reference texture image T_R.
- a conventional estimation is implemented at 77 for example using one of the techniques DERS, IVDE or GANet mentioned above.
- The decoded texture images T_D and the constructed depth images are then advantageously exploited by a synthesis module to generate, for example according to a DIBR-type technique, the view chosen by the user UT of the multi-view video rendering system according to the invention, for example according to their point of view of the scene.
- FIG. 10 presents an example of the hardware structure of a device 100 for constructing a depth image of a current view of a multi-view video, comprising at least one module for obtaining motion vectors of a texture image associated with the current view from a coded data stream representative of the multi-view video, and a module for motion compensation of at least one element of said depth image, configured to be implemented when at least one motion vector has been obtained for said at least one element.
- The device 100 further comprises a module for obtaining a motion compensation indicator from information coded in the stream, said indicator being associated with said at least one element of the depth image, and a decision module for implementing said motion compensation when the indicator is set to a first value.
- it also comprises a module for obtaining an identifier of the reference texture image and a module for obtaining the reference depth image P R from said identifier.
- The term "module" can correspond both to a software component and to a hardware component or a set of hardware and software components, a software component itself corresponding to one or more computer programs or subprograms or, more generally, to any element of a program capable of implementing a function or a set of functions.
- A device 100 comprises a random access memory 103 (for example a RAM memory) and a processing unit 102 equipped for example with a processor and controlled by a computer program Pg1, representative of the obtaining, decision and motion compensation modules, stored in a read-only memory 101 (for example a ROM memory or a hard disk).
- the code instructions of the computer program are for example loaded into the random access memory 103 before being executed by the processor of the processing unit 102.
- The random access memory 103 can also contain the motion vector, the reference texture image identifier, the motion compensation indicator, etc.
- FIG. 10 only illustrates one particular way, among several possible, of making the device 100 so that it carries out the steps of the method for constructing a depth image as detailed above, in relation to FIGS. 7 and 9 in its different embodiments. Indeed, these steps can be carried out either on a reprogrammable calculation machine (a PC computer, a DSP processor or a microcontroller) executing a program comprising a sequence of instructions, or on a dedicated calculation machine (for example a set of logic gates like an FPGA or an ASIC, or any other hardware module).
- the corresponding program (that is to say the sequence of instructions) could be stored in a removable storage medium (such as for example an SD card , a USB key, a CD-ROM or a DVD-ROM) or not, this storage medium being partially or totally readable by a computer or a processor.
- Such a device 100 can be integrated in an item of terminal equipment, for example a mobile telephone or a virtual reality headset.
- FIG. 11 presents an example of the hardware structure of a device 200, 200' for decoding a stream of coded data, comprising at least one module for decoding coded information representative of motion vectors of a texture image associated with said current view, and a module for transmitting said information to an aforementioned device 100 for constructing a depth image associated with the current view.
- the device 200, 200' comprises a module for decoding coded information representative of an identifier of a reference texture image associated with the motion vectors and a module for decoding a motion compensation indicator of at least one element of the depth image.
- the transmission module is replaced by the aforementioned device 100.
- the device 100 is integrated into the device 200' and connected to its decoding module.
- The term "module" can correspond both to a software component and to a hardware component or a set of hardware and software components, a software component itself corresponding to one or more computer programs or subprograms or, more generally, to any element of a program capable of implementing a function or a set of functions.
- Such a device 200, 200' comprises a random access memory 203 (for example a RAM memory) and a processing unit 202 equipped for example with a processor and controlled by a computer program Pg2, representative of the decoding and transmission modules, stored in a read-only memory 201 (for example a ROM memory or a hard disk).
- the code instructions of the computer program are for example loaded into the random access memory 203 before being executed by the processor of the processing unit 202.
- the random access memory 203 can also contain the decoded information.
- FIG. 11 only illustrates one particular way, among several possible ones, of making the device 200, 200' so that it performs the steps of the decoding method as detailed above, in relation to FIGS. 6 and 9 in its different embodiments. Indeed, these steps can be carried out either on a reprogrammable calculation machine (a PC computer, a DSP processor or a microcontroller) executing a program comprising a sequence of instructions, or on a dedicated calculation machine (for example a set of logic gates like an FPGA or an ASIC, or any other hardware module).
- the corresponding program (that is to say the sequence of instructions) could be stored in a removable storage medium (such as for example an SD card, a USB key, a CD-ROM or a DVD-ROM) or not, this storage medium being partially or totally readable by a computer or a processor.
- FIG. 12 presents an example of the hardware structure of a device 300 for coding a data stream representative of a multi-view video, comprising a module for determining motion vectors of a texture image associated with a view of the multi-view video, called the current view, with respect to a reference texture image, a module for coding the motion vectors in the data stream, a module for obtaining a depth image associated with said current view, captured by a depth camera, called the captured depth image, a motion compensation module for at least one block of a depth image associated with the current view, called the constructed depth image, configured to be implemented when at least one motion vector has been obtained for at least one block of the texture image, said motion compensation being implemented from said at least one motion vector and from at least one available reference depth image, said reference depth image being associated with the same view as said reference texture image, a module for evaluating a motion-compensated block of said constructed depth image by comparison with the co-located block of the captured depth image, a compensation error being obtained, and a module for coding information representative of a motion compensation indicator as a function of a predetermined error criterion.
- The term "module" can correspond both to a software component and to a hardware component or a set of hardware and software components, a software component itself corresponding to one or more computer programs or subprograms or, more generally, to any element of a program capable of implementing a function or a set of functions.
- Such a device 300 comprises a random access memory 303 (for example a RAM memory) and a processing unit 302 equipped for example with a processor and controlled by a computer program Pg3, representative of the coding, motion compensation and evaluation modules, stored in a read-only memory 301 (for example a ROM memory or a hard disk).
- the code instructions of the computer program are for example loaded into the RAM 303 before being executed by the processor of the processing unit 302.
- FIG. 12 only illustrates one particular way, among several possible, of making the device 300 so that it performs the steps of the method of coding a data stream representative of a multi-view video as detailed above, in relation to Figures 8 and 9 in its various embodiments. Indeed, these steps can be carried out either on a reprogrammable calculation machine (a PC computer, a DSP processor or a microcontroller) executing a program comprising a sequence of instructions, or on a dedicated calculation machine (for example a set of logic gates like an FPGA or an ASIC, or any other hardware module).
- the corresponding program (that is to say the sequence of instructions) can be stored in a removable storage medium (such as for example an SD card , a USB key, a CD-ROM or a DVD-ROM) or not, this storage medium being partially or totally readable by a computer or a processor.
- the invention which has just been described in its various embodiments has numerous advantages. By offering an alternative solution to conventional techniques for estimating a depth image from one or more decoded texture images, it contributes to reducing the complexity of the processing of a data stream representing a multi-view video by a receiver terminal equipment. This advantage is made possible by resorting to motion compensation of the depth image of a current view which reuses the motion vectors transmitted in the data stream for the corresponding texture image, associated with the same view.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020237044582A KR20240026942A (ko) | 2021-06-25 | 2022-06-13 | 멀티뷰 비디오로부터 깊이 이미지를 구성하기 위한 방법, 멀티뷰 비디오를 표현하는 데이터 스트림을 디코딩하기 위한 방법, 인코딩 방법, 디바이스들, 시스템, 단말 장비, 신호 및 이에 대응하는 컴퓨터 프로그램들 |
EP22735220.0A EP4360319A1 (fr) | 2021-06-25 | 2022-06-13 | Procédé de construction d'une image de profondeur d'une vidéo multi-vues, procédé de décodage d'un flux de données représentatif d'une vidéo multi-vues, procédé de codage, dispositifs, système, équipement terminal, signal et programmes d'ordinateur correspondants |
CN202280044686.7A CN117561716A (zh) | 2021-06-25 | 2022-06-13 | 用于从多视图视频构建深度图像的方法、用于对表示多视图视频的数据流进行解码的方法、编码方法、设备、***、终端设备、信号以及与其对应的计算机程序 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR2106867A FR3124301A1 (fr) | 2021-06-25 | 2021-06-25 | Procédé de construction d’une image de profondeur d’une vidéo multi-vues, procédé de décodage d’un flux de données représentatif d’une vidéo multi-vues, procédé de codage, dispositifs, système, équipement terminal, signal et programmes d’ordinateur correspondants. |
FR2106867 | 2021-06-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022269163A1 true WO2022269163A1 (fr) | 2022-12-29 |
Family
ID=78820837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2022/051126 WO2022269163A1 (fr) | 2021-06-25 | 2022-06-13 | Procédé de construction d'une image de profondeur d'une vidéo multi-vues, procédé de décodage d'un flux de données représentatif d'une vidéo multi-vues, procédé de codage, dispositifs, système, équipement terminal, signal et programmes d'ordinateur correspondants |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP4360319A1 (fr) |
KR (1) | KR20240026942A (fr) |
CN (1) | CN117561716A (fr) |
FR (1) | FR3124301A1 (fr) |
WO (1) | WO2022269163A1 (fr) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009091383A2 (fr) * | 2008-01-11 | 2009-07-23 | Thomson Licensing | Codage vidéo et de profondeur |
-
2021
- 2021-06-25 FR FR2106867A patent/FR3124301A1/fr active Pending
-
2022
- 2022-06-13 EP EP22735220.0A patent/EP4360319A1/fr active Pending
- 2022-06-13 CN CN202280044686.7A patent/CN117561716A/zh active Pending
- 2022-06-13 WO PCT/FR2022/051126 patent/WO2022269163A1/fr active Application Filing
- 2022-06-13 KR KR1020237044582A patent/KR20240026942A/ko unknown
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009091383A2 (fr) * | 2008-01-11 | 2009-07-23 | Thomson Licensing | Codage vidéo et de profondeur |
Non-Patent Citations (4)
Title |
---|
ANONYMOUS: "Test Model under Consideration for AVC based 3D video coding", no. n12349, 3 December 2011 (2011-12-03), XP030018844, Retrieved from the Internet <URL:http://phenix.int-evry.fr/mpeg/doc_end_user/documents/98_Geneva/wg11/w12349.zip w12349 (3DV TMuC AVC).doc> [retrieved on 20111203] * |
GIANLUCA CERNIGLIARO ET AL: "Low Complexity Mode Decision and Motion Estimation for H.264/AVC Based Depth Maps Encoding in Free Viewpoint Video", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE, USA, vol. 23, no. 5, 1 May 2013 (2013-05-01), pages 769 - 783, XP011506459, ISSN: 1051-8215, DOI: 10.1109/TCSVT.2012.2223632 * |
WENXIU SUN, LINGFENG XU, OSCAR C. AU, SUNG HIM CHUI, CHUN WING KWOK (The Hong Kong University of Science and Technology): "An overview of free viewpoint Depth-Image-Based Rendering (DIBR)", published in the Proceedings of the Second APSIPA Annual Summit and Conference, pages 1023 - 1030
YING CHEN ET AL: "Description of 3D video coding technology proposal by Qualcomm Incorporated", 98. MPEG MEETING; 28-11-2011 - 2-12-2011; GENEVA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. m22583, 21 November 2011 (2011-11-21), pages 1 - 21, XP030051146 * |
Also Published As
Publication number | Publication date |
---|---|
KR20240026942A (ko) | 2024-02-29 |
CN117561716A (zh) | 2024-02-13 |
EP4360319A1 (fr) | 2024-05-01 |
FR3124301A1 (fr) | 2022-12-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI504220B (zh) | 於三維視訊之深度轉變的有效編碼 | |
EP3061246B1 (fr) | Procédé de codage et de décodage d'images, dispositif de codage et de décodage d'images et programmes d'ordinateur correspondants | |
FR2907575A1 (fr) | Procede et dispositif de codage d'images representant des vues d'une meme scene | |
EP3788789A2 (fr) | Procede et dispositif de traitement d'images et procede et dispositif de decodage d'une video multi-vue adaptés | |
WO2012150407A1 (fr) | Procédé de codage et de décodage d'images intégrales, dispositif de codage et de décodage d'images intégrales et programmes d'ordinateur correspondants | |
FR3012004A1 (fr) | Procede de codage et de decodage d'images, dispositif de codage et de decodage d'images et programmes d'ordinateur correspondants | |
WO2010043809A1 (fr) | Prediction d'une image par compensation en mouvement en avant | |
EP3649778B1 (fr) | Procédé de codage et décodage d'images, dispositif de codage et décodage et programmes d'ordinateur correspondants | |
EP4360319A1 (fr) | Procédé de construction d'une image de profondeur d'une vidéo multi-vues, procédé de décodage d'un flux de données représentatif d'une vidéo multi-vues, procédé de codage, dispositifs, système, équipement terminal, signal et programmes d'ordinateur correspondants | |
EP3725080B1 (fr) | Procédés et dispositifs de codage et de décodage d'une séquence vidéo multi-vues représentative d'une vidéo omnidirectionnelle | |
EP3158749B1 (fr) | Procédé de codage et de décodage d'images, dispositif de codage et de décodage d'images et programmes d'ordinateur correspondants | |
EP3360328A1 (fr) | Codage et décodage multi-vues | |
EP4140136A1 (fr) | Procédés et dispositifs de codage et de décodage d'une séquence vidéo multi-vues | |
EP3861751A1 (fr) | Codage et décodage d'une vidéo omnidirectionnelle | |
WO2015044581A1 (fr) | Codage et décodage vidéo par héritage d'un champ de vecteurs de mouvement | |
WO2019008253A1 (fr) | Procédé de codage et décodage d'images, dispositif de codage et décodage et programmes d'ordinateur correspondants | |
EP4104446A1 (fr) | Procédé et dispositif de traitement de données de vidéo multi-vues | |
EP4222950A1 (fr) | Codage et decodage d'une video multi-vues | |
EP2962459B1 (fr) | Dérivation de vecteur de mouvement de disparité, codage et décodage vidéo 3d utilisant une telle dérivation | |
WO2010086562A1 (fr) | Procede et dispositif de codage d'images mettant en oeuvre des modes de codage distincts, procede et dispositif de decodage, et programmes d'ordinateur correspondants | |
WO2021136895A1 (fr) | Synthese iterative de vues a partir de donnees d'une video multi-vues | |
FR3137240A1 (fr) | Procédé de segmentation d’une pluralité de données, procédé de codage, procédé de décodage, dispositifs, systèmes et programme d’ordinateur correspondants | |
EP3542533A1 (fr) | Procédé et dispositif de codage et de décodage d'une séquence multi-vues |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22735220; Country of ref document: EP; Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 18573015; Country of ref document: US |
WWE | Wipo information: entry into national phase | Ref document number: 202280044686.7; Country of ref document: CN |
WWE | Wipo information: entry into national phase | Ref document number: 2022735220; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 2022735220; Country of ref document: EP; Effective date: 20240125 |