WO2023009392A1 - Neural networks for dynamic range conversion and display management of images - Google Patents

Info

Publication number
WO2023009392A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
dynamic range
intensity
input
display
Prior art date
Application number
PCT/US2022/037991
Other languages
English (en)
Inventor
Robert Wanat
Anustup Kumar Atanu Choudhury
Robin Atkins
Original Assignee
Dolby Laboratories Licensing Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corporation
Priority to CN202280052320.4A (CN117716385A)
Priority to EP22757411.8A (EP4377879A1)
Publication of WO2023009392A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G06T5/92 Dynamic range modification of images or parts thereof based on global image properties
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20172 Image enhancement details
    • G06T2207/20208 High dynamic range [HDR] image processing

Definitions

  • the present invention relates generally to images. More particularly, an embodiment of the present invention relates to the dynamic range conversion and display management of standard-dynamic range (SDR) images onto high-dynamic range (HDR) displays.
  • the term 'dynamic range' may relate to a capability of the human visual system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest grays (blacks) to brightest whites (highlights).
  • DR relates to a 'scene-referred' intensity.
  • DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to a 'display-referred' intensity.
  • the term may be used in either sense, e.g. interchangeably.
  • high dynamic range relates to a DR breadth that spans the some 14-15 orders of magnitude of the human visual system (HVS).
  • EDR enhanced dynamic range
  • VDR visual dynamic range
  • images where n > 10 (with n denoting the number of bits per pixel used to represent each color component) may be considered images of enhanced dynamic range.
  • EDR and HDR images may also be stored and distributed using high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.
  • Metadata relates to any auxiliary information that is transmitted as part of the coded bitstream and assists a decoder to render a decoded image.
  • metadata may include, but are not limited to, minimum, average, and maximum luminance values in an image, color space or gamut information, reference display parameters, and auxiliary signal parameters, as those described herein.
  • LDR lower dynamic range
  • HDR content may be color graded and displayed on HDR displays that support higher dynamic ranges (e.g., from 1,000 nits to 5,000 nits or more).
  • the methods of the present disclosure relate to any dynamic range higher than SDR.
  • display management refers to processes that are performed on a receiver to render a picture for a target display.
  • processes may include tone-mapping, gamut-mapping, color management, frame-rate conversion, and the like.
  • legacy content may be available only in standard dynamic range (SDR) and the broadcast infrastructure may not allow transmitting metadata to convert such content to a format suitable to take full advantage of the capabilities of an HDR display.
  • improved techniques for the up-conversion and display management of SDR images to HDR displays are developed.
  • the approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.
  • FIG. 1 depicts an example process for a video delivery pipeline;
  • FIG. 2A depicts a dynamic range up-conversion and display management pipeline according to a first example embodiment of the present invention with a single neural-network processing unit;
  • FIG. 2B depicts a dynamic range up-conversion and display management pipeline according to a second example embodiment of the present invention with two neural-network processing units;
  • FIG. 3 depicts an example neural-network architecture to predict luminance metadata according to an example embodiment of the present invention;
  • FIG. 4A depicts a processing pipeline in a Residual Network (ResNet) block being used in a neural network to predict a detail layer residual image according to an example embodiment of the present invention; and
  • FIG. 4B depicts a processing pipeline in a neural network to predict a detail layer residual image according to an example embodiment of the present invention.
  • Example embodiments described herein relate to methods for the dynamic range conversion and display management of SDR images onto HDR displays.
  • a processor receives an input image (202) in a first dynamic range and a first spatial resolution. It generates an intensity image (207) based on the input image; applies the intensity image to a first neural network (210) to generate predicted statistics of the intensity image when mapped in a second dynamic range higher than the first dynamic range; generates (215) a tone-mapping curve based on statistics of the intensity image and the predicted statistics; and applies the tone-mapping curve to display the input image on a display with a target dynamic range different than the second dynamic range.
  • a method for dynamic range conversion and display mapping comprises: accessing an input image in a first dynamic range and a first spatial resolution; generating an intensity image based on the input image; applying the intensity image to a first neural network to generate predicted statistics of the intensity image when mapped in a second dynamic range higher than the first dynamic range; and generating a tone-mapping curve based on statistics of the intensity image and the predicted statistics.
  • the method may in an embodiment comprise generating, based on the input image and the tone-mapping curve, a mapped output image for display on a display with a target dynamic range.
  • the target dynamic range may be different than the second dynamic range.
  • the statistics of the intensity image comprise intensity values of the intensity image in the first dynamic range
  • the predicted statistics comprise predicted intensity values in the second dynamic range (e.g. corresponding predicted intensity values in the second dynamic range).
  • the first neural network may be trained on pairs of reference images in the second dynamic range (e.g. HDR) and reference images in the first dynamic range (e.g. SDR).
  • the reference images in the first dynamic range may be generated by mapping each reference image in the second dynamic range to the first dynamic range using a tone mapping operation.
  • the first neural network may be trained on the pairs of reference images in the second and first dynamic range to learn a relationship (e.g. minimize an error) between predicted statistics for the reference images in the first dynamic range (e.g. predicted statistics of the respective intensity images generated based on the reference images in the first dynamic range) and statistics of the reference images in the second dynamic range.
  • this may be done by iteratively calculating, for pairs of reference images in the first and second dynamic range, predicted statistics for the reference image in the first dynamic range using the first neural network and back-propagating the error between the predicted statistics and the statistics of the corresponding reference image in the second dynamic range into the first neural network.
  • the training may be terminated when the error between reference and predicted statistics is within a small threshold or reaches a non-decreasing plateau.
  • the predicted statistics may be applied to up-convert the reference image in the first dynamic range into its corresponding predicted image in the second dynamic range.
  • the corresponding reference image in the second dynamic range and the predicted image in the second dynamic range may be compared, and errors between the predicted image and the reference image in the second dynamic range may be back-propagated into the first neural network.
  • the method may further comprise: generating a base layer image and a detail layer image based on the intensity image; and applying the tone-mapping curve to the base layer image to generate a tone-mapped base layer image in the second dynamic range.
  • the method may further comprise: applying the intensity image and the detail layer image into a second neural network to generate a residual layer image in the second dynamic range; adding the residual layer image to the detail layer image to generate a second detail layer image; and adding the second detail layer image to the tone-mapped base layer image to generate an output image in the second dynamic range.
  • the second neural network may be trained on pairs of reference images in the second dynamic range (e.g. HDR) and reference images in the first dynamic range (e.g. SDR).
  • the reference images in the first dynamic range may be generated by mapping each reference image in the second dynamic range to the first dynamic range using a tone mapping operation.
  • the second neural network may be trained on the pairs of reference images in the second and first dynamic range to learn a relationship (e.g. minimize an error) between a predicted image in the second dynamic range and a corresponding reference image in the second dynamic range.
  • Each pair of reference images in the second and first dynamic range may be processed by the second neural network, wherein the errors between the reference image in the second dynamic range and the corresponding predicted image in the second dynamic range are back-propagated into the second neural network.
  • the predicted image of each pair may be generated by: applying the intensity image generated based on the reference image in the first dynamic range, and the corresponding detail layer image into the second neural network to generate a residual layer image in the second dynamic range; adding the residual layer image to the detail layer image to generate a second detail layer image; and adding the second detail layer image to a tone-mapped base layer image generated by applying the tone-mapping curve to the intensity image, to generate the predicted output image in the second dynamic range.
  • FIG. 1 depicts an example process of a conventional video delivery pipeline (100) showing various stages from video capture to video content display.
  • a sequence of video frames (102) is captured or generated using image generation block (105).
  • Video frames (102) may be digitally captured (e.g. by a digital camera) or generated by a computer (e.g. using computer animation) to provide video data (107).
  • video frames (102) may be captured on film by a film camera. The film is converted to a digital format to provide video data (107).
  • in a production phase (110), video data (107) is edited to provide a video production stream (112).
  • Block (115) post-production editing may include adjusting or modifying colors or brightness in particular areas of an image to enhance the image quality or achieve a particular appearance for the image in accordance with the video creator's creative intent. This is sometimes called “color timing” or “color grading.”
  • Other editing e.g. scene selection and sequencing, image cropping, addition of computer-generated visual special effects, etc.
  • video images are viewed on a reference display (125).
  • video data of final production may be delivered to encoding block (120) for delivering downstream to decoding and playback devices such as television sets, set-top boxes, movie theaters, and the like.
  • coding block (120) may include audio and video encoders, such as those defined by ATSC, DVB, DVD, Blu-Ray, and other delivery formats, to generate coded bit stream (122).
  • the coded bit stream (122) is decoded by decoding unit (130) to generate a decoded signal (132) representing an identical or close approximation of signal (117).
  • the receiver may be attached to a target display (140) which may have completely different characteristics than the reference display (125).
  • a display management block (135) may be used to map the dynamic range of decoded signal (132) to the characteristics of the target display (140) by generating display-mapped signal (137).
  • Examples of display management processes are described in Refs. [1] and [2].
  • the mapping algorithm applies a sigmoid-like function (for example, see Refs. [3] and [4]) to map the input dynamic range to the dynamic range of the target display.
  • mapping functions may be represented as piece-wise linear or non-linear polynomials characterized by anchor points, pivots, and other polynomial parameters generated using characteristics of the input source and the target display.
  • the mapping functions use anchor points based on luminance characteristics (e.g., the minimum, medium (average), and maximum luminance) of the input images and the display.
  • other mapping functions may use different statistical data, such as luminance-variance or luminance-standard deviation values at a block level or for the whole image.
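A mapping function characterized by anchor points can be illustrated with a piecewise-linear curve through matching minimum, mid, and maximum luminance anchors. This is only a minimal sketch: the function name `piecewise_linear_curve` and the anchor values are hypothetical, and the curves of Refs. [3] and [4] are sigmoid-like rather than piecewise linear.

```python
from bisect import bisect_right


def piecewise_linear_curve(src_anchors, dst_anchors):
    """Return f(x) interpolating linearly between matching anchor points."""
    if len(src_anchors) != len(dst_anchors) or len(src_anchors) < 2:
        raise ValueError("need at least two matching anchor points")
    if any(b <= a for a, b in zip(src_anchors, src_anchors[1:])):
        raise ValueError("source anchors must be strictly increasing")

    def curve(x):
        # Clamp inputs outside the anchored range to the end points.
        if x <= src_anchors[0]:
            return dst_anchors[0]
        if x >= src_anchors[-1]:
            return dst_anchors[-1]
        i = bisect_right(src_anchors, x) - 1
        x0, x1 = src_anchors[i], src_anchors[i + 1]
        y0, y1 = dst_anchors[i], dst_anchors[i + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return curve


# Hypothetical normalized (min, mid, max) luminance anchors.
source_stats = [0.0, 0.4, 1.0]
target_stats = [0.05, 0.3, 0.8]
tone_curve = piecewise_linear_curve(source_stats, target_stats)
```

Calling `tone_curve(0.4)` returns the mid anchor of the target, and inputs beyond the source maximum are clamped to the target maximum.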
  • the process may also be assisted by additional metadata which are either transmitted as part of the video or computed by the decoder or the display.
  • a source may use both versions to generate metadata (such as piece-wise linear approximations of forward or backward reshaping functions) to assist the decoder in converting incoming SDR images to HDR images.
  • FIG. 2A depicts a dynamic-range up-conversion and display management pipeline (200A) according to an example embodiment.
  • input video (202) may include video received from a video decoder and/or video received from a graphical processing unit (say, from a set-top box), and/or other video inputs (say, from a camera, an HDMI port in the TV or the set-top box, a graphical processing unit (GPU), and the like).
  • input video 202 may be characterized as “SDR” video to be upconverted to “HDR” video to be displayed on an HDR display.
  • process 200A includes a neural network (NN) (210) to generate a set of predicted HDR statistics (or metadata) to facilitate the generation of an optimized SDR-to-HDR mapping.
  • the NN unit (210) may be preceded by a preprocessing unit (205) to translate the input image 202 to a suitable image in terms of color format and resolution.
  • the output (212) of the NN unit (210) is used by mapping unit (215) to generate an optimized mapping curve (217), which, together with the original input (202), is fed to the display mapping unit (220) to generate the mapped output (222). Details of each component are described next.
  • the input image is converted to a format suitable for processing by the NN unit 210.
  • this process comprises two steps: a) extracting the intensity or luminance of the input signal, and b) adjusting its resolution.
  • input RGB images may be converted to a luma-chroma color format, such as YCbCr, ICtCp, and the like, using color-transformation techniques known in the art, such as ITU-R Rec. BT.2100 and the like.
  • intensity may be characterized as the per-pixel maximum value of its R, G, and B components.
  • the intensity extraction step may be bypassed if the source image is already represented as a single-channel intensity image.
  • pixel values may also be normalized to [0, 1] according to a predefined standard dynamic range, e.g., between 0.005 and 100 nits, to facilitate the computation of the image statistics.
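The intensity extraction and normalization steps can be sketched as follows. The per-pixel maximum of R, G, and B comes from the text; the linear rescaling of luminance values in nits between 0.005 and 100 is an assumption (the text does not say whether normalization is linear or perceptual), and the function names are hypothetical.

```python
def intensity_image(rgb_image):
    """Per-pixel intensity as max(R, G, B); rgb_image is rows of (r, g, b)."""
    return [[max(pixel) for pixel in row] for row in rgb_image]


def normalize(intensity, low=0.005, high=100.0):
    """Linearly rescale luminance (assumed to be in nits) to [0, 1]."""
    span = high - low
    return [[min(1.0, max(0.0, (v - low) / span)) for v in row]
            for row in intensity]


# Hypothetical 2x2 image with linear-light RGB values in nits.
rgb = [[(10.0, 50.0, 20.0), (100.0, 0.0, 0.0)],
       [(0.005, 0.005, 0.005), (200.0, 1.0, 1.0)]]
normalized = normalize(intensity_image(rgb))
```

Values below the assumed black level map to 0 and values above the assumed peak are clamped to 1.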
  • the global metadata generation neural network (210) typically operates on fixed image dimensions, but the input dimensions of the image may vary based on the source content (e.g., 480p, 720p, 1080i, and the like).
  • unit 205 may resample the image size to dimensions used to train and operate the NN metadata generator (e.g., 960 x 540). For example, the 960 x 540 resolution has been found to provide a good trade-off between complexity and resolution with state of the art neural networks.
  • if the input image is larger than the supported resolution of the NN, then it is down-sampled repeatedly by a factor of two until both the width and height are less than or equal to the desired resolution.
  • the down-sampling operation may be performed by 4-tap separable horizontal and vertical low-pass filters (e.g., [1 3 3 1]/8), followed by discarding every other pixel in both the horizontal and vertical dimensions.
  • the width and height are then padded with a padding value symmetrically on all four sides to obtain the desired image dimensions (e.g., 960 x 540).
  • the neural network can be trained for different image dimensions and this resampling step can be adjusted accordingly.
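The resampling rule above (halve until the image fits, then pad symmetrically) can be sketched in two small helpers. The names `downsample_row` and `resample_plan` are hypothetical, and the border handling (edge replication), the rounding of odd sizes, and the split of an odd padding amount are assumptions not stated in the text.

```python
def downsample_row(row, taps=(1, 3, 3, 1)):
    """Low-pass filter a 1-D row with the taps, then keep every other sample."""
    norm = sum(taps)
    last = len(row) - 1
    out = []
    for start in range(0, len(row), 2):
        acc = 0.0
        for k, tap in enumerate(taps):
            # Indices past the border are clamped (edge replication, assumed).
            idx = min(max(start + k - 1, 0), last)
            acc += tap * row[idx]
        out.append(acc / norm)
    return out


def resample_plan(width, height, target_w=960, target_h=540):
    """Return (number of halvings, size after halving, padding per side)."""
    halvings = 0
    while width > target_w or height > target_h:
        width, height = (width + 1) // 2, (height + 1) // 2
        halvings += 1
    pad_w, pad_h = target_w - width, target_h - height
    # Split the padding as evenly as possible between opposite sides.
    padding = {"left": pad_w // 2, "right": pad_w - pad_w // 2,
               "top": pad_h // 2, "bottom": pad_h - pad_h // 2}
    return halvings, (width, height), padding
```

Because the filter is separable, a 2-D image would be processed by applying `downsample_row` to every row and then to every column. For a 1280 x 720 input, `resample_plan` reports one halving to 640 x 360 followed by padding of 160 pixels left and right and 90 pixels top and bottom.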
  • the predicted HDR statistics neural network (210) takes as input a single channel (its luminance) of the SDR image (202) and predicts statistics of the corresponding HDR image as needed to generate an SDR-to-HDR mapping curve (such as the minimum, average, and maximum luminance values).
  • the predicted HDR metadata (212) may be temporally filtered to ensure temporal consistency among pictures in a video scene. These values may also be adjusted for inconsistent results to ensure they can be used for mapping, e.g. by clamping the results between 0 and 1 or by ensuring monotonicity of the resulting image statistics.
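A minimal sketch of this post-processing, assuming the predicted statistics are a normalized (min, mid, max) triple. Clamping to [0, 1] and enforcing monotonicity follow the text; the exponential smoothing used for temporal filtering, its `weight` parameter, and the function names are assumptions chosen for illustration.

```python
def sanitize_stats(stats):
    """Clamp (min, mid, max) to [0, 1] and force a non-decreasing order."""
    clamped = [min(1.0, max(0.0, v)) for v in stats]
    ordered = []
    running = 0.0
    for v in clamped:
        running = max(running, v)  # each value is at least the previous one
        ordered.append(running)
    return tuple(ordered)


def temporal_filter(previous, current, weight=0.25):
    """Blend the current frame's statistics into the running estimate."""
    if previous is None:  # first frame of a scene
        return tuple(current)
    return tuple((1.0 - weight) * p + weight * c
                 for p, c in zip(previous, current))


# Two hypothetical frames of raw network output.
state = None
for raw in [(-0.1, 0.4, 1.2), (0.1, 0.05, 0.9)]:
    state = temporal_filter(state, sanitize_stats(raw))
```

Resetting `state` to `None` at a scene cut would keep statistics from one scene from leaking into the next.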
  • the neural network 210 is defined as a set of 4-dimensional convolutions, each of which is followed by adding a constant bias value to all results. In some layers, the convolution is followed by clamping negative values to 0.
  • the convolutions are defined by their size in pixels (M x N), how many image channels (C) they operate on, and how many such kernels are in the filter bank (K). In that sense, each convolution can be described by the size of the filter bank MxNxCxK.
  • a filter bank of the size 3x3x1x2 is composed of 2 convolution kernels, each of which operates on one channel and has a size of 3 pixels by 3 pixels.
  • Some filter banks may also have a stride, meaning that some results of the convolution are discarded.
  • a stride of 1 means every input pixel produces an output pixel.
  • a stride of 2 means that only every second pixel in each dimension produces an output, and the like.
  • a filter bank with a stride of 2 will produce an output with (M/2)x(N/2) pixels, where MxN is the input image size. All inputs except the ones to fully connected kernels are padded so that setting the stride of 1 would produce an output with the same number of pixels as the input.
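The MxNxCxK notation and the effect of the stride can be checked with a little arithmetic. The helper names below are hypothetical, and rounding up for input sizes that are not a multiple of the stride is an assumption (the text only gives the (M/2)x(N/2) case).

```python
from math import ceil


def filter_bank_info(m, n, c, k, biases=True):
    """Count the weights and biases in an MxNxCxK filter bank."""
    weights = m * n * c * k
    return {"weights": weights, "biases": k if biases else 0}


def strided_output_size(width, height, stride):
    """Output size when padding makes a stride of 1 preserve the input size."""
    return ceil(width / stride), ceil(height / stride)


small_bank = filter_bank_info(3, 3, 1, 2)            # the 3x3x1x2 example
halved = strided_output_size(960, 540, stride=2)     # every second pixel kept
```

The 3x3x1x2 bank holds 18 weights and 2 biases, and a stride of 2 on a 960 x 540 input yields a 480 x 270 output.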
  • the output of each convolution bank feeds as an input into the next convolution layer.
  • the neural network (210) is composed of four such convolution layers:
  • a fourth filter bank (320) sized 48x27x16x3, fully connected, with 3 biases and one 1x3 output (212) representing the estimated minimum, mid, and maximum luminance levels of an HDR image corresponding to the SDR input.
  • NN (210) is trained on pairs of HDR and SDR images. For example, a large collection of HDR images is mapped to corresponding SDR images using a tone mapping operation, such as the one described in Refs. [1] and [2].
  • This process includes the analysis of the reference HDR metadata (e.g., min, mid, and max luminance values) from the HDR source images being used during the tone mapping process.
  • the goal of the network is to learn the relationship between metadata from the estimated HDR image and the reference HDR image. In one embodiment, this is done by iteratively calculating predicted HDR metadata using the neural network architecture and minimizing the error between the predicted HDR metadata and the reference HDR metadata, by propagating the error back into the network weights.
  • the training terminates when the error between reference and predicted metadata is within a small threshold or reaches a non-decreasing plateau.
  • the predicted metadata is applied to up-convert the input SDR image into its corresponding HDR image.
  • the source HDR image and the predicted HDR image are compared, and errors are back-propagated to the network. It has been observed that training based on errors between original and predicted images, rather than on errors between original and predicted metadata, yields a better performing neural network.
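The termination rule for training (stop when the error is within a small threshold or reaches a non-decreasing plateau) can be sketched on a toy problem. The one-weight model below is only a stand-in for the neural network, and the function name `train`, the learning rate, the `patience` parameter, and the data pairs are all hypothetical.

```python
def train(pairs, learning_rate=0.1, threshold=1e-6, patience=5,
          max_steps=10_000):
    """Fit y ~= w * x by gradient descent on the mean squared error.

    Stops when the error drops below `threshold` or has not improved
    for `patience` consecutive steps (a plateau).
    """
    w = 0.0
    error = float("inf")
    best_error = float("inf")
    stalled = 0
    for _ in range(max_steps):
        error = sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)
        if error < threshold:
            return w, error, "threshold"
        if error < best_error:
            best_error, stalled = error, 0
        else:
            stalled += 1
            if stalled >= patience:
                return w, error, "plateau"
        # Propagate the error back: gradient of the loss with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= learning_rate * grad
    return w, error, "max_steps"


# Hypothetical (predicted-side statistic, reference HDR statistic) pairs.
weight, final_error, reason = train([(0.1, 0.2), (0.5, 1.0), (0.9, 1.8)])
```

In the image-based variant described above, the error would be measured between the reference HDR image and the image up-converted with the predicted metadata, while the stopping rule stays the same.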
  • given the predicted HDR metadata (212), step 215 generates an optimal mapping curve to be used by the display mapping process (220). It is noted that neural net 210 does not generate a mapping for a specific display. As a result, the output of such SDR-to-HDR mapping may exceed the capabilities of the target display, thus requiring a second HDR (predicted image)-to-HDR (display) mapping that takes into consideration the characteristics of the target display. This second HDR-to-HDR mapping may be skipped if the generated HDR data are simply stored off-line or are transmitted to be displayed by another device downstream.
  • the predicted HDR metadata (212) is processed to generate a “forward-mapping” curve to map an HDR image with the predicted HDR metadata to an SDR signal range (see Ref. [3] or Ref. [4]).
  • the forward mapping curve may be inverted to generate an “inverse-mapping” curve that would convert the SDR signal range of the source image to the HDR signal range of the predicted HDR image.
  • This inverse mapping curve is then further adjusted to map the predicted HDR image dynamic range according to the characteristics of the target display (such as its minimum and maximum luminance) or other parameters, such as desired contrast or surrounding ambient light.
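Inverting a forward-mapping curve can be done numerically when the curve is strictly increasing: sample it, then interpolate with the roles of input and output swapped. The sketch below assumes signals normalized to [0, 1]; the square-root forward curve is a hypothetical stand-in, not the curve of Refs. [3] or [4], and `invert_curve` is an illustrative name.

```python
from bisect import bisect_left


def invert_curve(forward, samples=1024):
    """Return an approximate inverse of a strictly increasing curve on [0, 1]."""
    xs = [i / (samples - 1) for i in range(samples)]
    ys = [forward(x) for x in xs]
    if any(b <= a for a, b in zip(ys, ys[1:])):
        raise ValueError("forward curve must be strictly increasing")

    def inverse(y):
        # Clamp values outside the range covered by the forward curve.
        if y <= ys[0]:
            return xs[0]
        if y >= ys[-1]:
            return xs[-1]
        i = bisect_left(ys, y)
        # Linear interpolation between the two bracketing samples.
        t = (y - ys[i - 1]) / (ys[i] - ys[i - 1])
        return xs[i - 1] + t * (xs[i] - xs[i - 1])

    return inverse


def forward_curve(hdr):
    """Hypothetical forward (HDR-to-SDR) curve used only for illustration."""
    return hdr ** 0.5


inverse_curve = invert_curve(forward_curve)
```

The further adjustment for the target display (its minimum and maximum luminance, contrast, ambient light) would be applied to the output of `inverse_curve` and is not modeled here.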
  • in step 220, using the input SDR image (202) and the mapping curve (217) derived in step 215, the display mapping process generates the final HDR image (222) for the target display (e.g., see Refs. [1-2]).
  • the up-conversion process 200A may be considered a global dynamic-range mapping process.
  • the display mapping process 220 may be further improved by taking into consideration local contrast and details information of the input image. For example, as described in the Appendix, a down-sampling and up-sampling/filtering process may be used to split the input image into two layers: a filtered base layer image and a detail layer image. By applying the tone-mapping curve (217) to the filtered base layer, and then adding back the detail layer to the result, the original contrast of the image can be preserved both globally as well as locally. This may be referred to as “detail preservation” or as “precision rendering.”
  • display-mapping can be performed as a multi-stage operation: a) Generate a base layer (BL) image to guide the SDR to HDR mapping; b) Perform the tone-mapping to the base layer image; c) Add the detail layer image to the tone-mapped base layer image.
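The three stages a) to c) can be sketched on a one-dimensional signal. Two simplifications are assumptions made for illustration: a plain moving average replaces the edge-preserving pyramid filter of the Appendix, and the detail layer is taken as the difference between the intensity and the base layer. The function names are hypothetical.

```python
def box_blur(signal, radius=1):
    """Stand-in base-layer filter: moving average with edge replication."""
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        window = [signal[min(max(j, 0), last)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def map_with_detail_preservation(intensity, tone_curve, radius=1):
    """Tone-map the base layer only, then add the detail layer back."""
    base = box_blur(intensity, radius)                    # a) base layer
    detail = [i - b for i, b in zip(intensity, base)]     # assumed: DL = I - BL
    mapped_base = [tone_curve(b) for b in base]           # b) tone-map the base
    return [m + d for m, d in zip(mapped_base, detail)]   # c) add detail back


# Hypothetical intensity row and a simple gain standing in for the tone curve.
row = [0.1, 0.4, 0.2, 0.9]
mapped = map_with_detail_preservation(row, lambda v: 2.0 * v)
```

With an identity tone curve the input is returned unchanged, which is a quick check that the base and detail layers sum back to the original signal.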
  • the generated base layer represents a spatially-blurred, edge-preserved, version of the original image. That is, it maintains important edges but blurs finer details. More specifically, generating the BL image may include:
  • FIG. 2B depicts an example embodiment of inverse mapping and display management process (200B) using a second neural net (230) that takes advantage of a pyramid representation of the input image and precision rendering.
  • process 200B includes a new block (225), which, given the intensity (I) of the original image, generates a base layer image (I_BL) (BL) and a detail layer image (I_DL) (DL).
  • the predicted HDR detail neural network (230) takes as input two channels: the detail layer (DL) of the SDR image and the intensity (I) channel of the source SDR image. It generates a single channel predicted detail layer (PDL) image, with the same resolution as the detail layer image, containing residual values to be added to the detail layer image.
  • the detail layer residuals stretch the local contrast of the output image to increase its perceived contrast and dynamic range.
  • block 205 in 200A can be simplified by performing only the appropriate down-sampling of I to the appropriate input resolution of neural-network 210 (e.g., 960x540).
  • Neural network 230 is composed of convolutional layers and Residual Network (ResNet) layers.
  • each ResNet block (410) contains two convolution layers (405a, 405b) with ReLU units, with the input (402) to each ResNet unit being added to the output of the second convolutional layer (405b) to generate the ResNet output (407).
  • each convolutional layer (405) has a 3x3x32x32 filter bank, with no biases, and a stride of 1.
  • the predicted HDR detail layer neural network 230 is composed of an input convolution (420), followed by five ResNet blocks (410) (each one depicted in FIG.
  • the output of the network forms an MxN image, the same size as the input detail layer image. This output is then added to the input detail layer image to form the final detail layer image.
  • instead of using the input images (I, DL) at full resolution (MxN), one may use sub-sampled versions to reduce complexity. Then, the output residual image (PDL) may be upscaled to the full resolution.
  • convolution network 420 has: input MxNx2, a filter bank of: 3x3x2x32, stride 1, no biases, and an output of MxNx32.
  • convolution network 430 has: input MxNx32, a filter bank of: 3x3x32x1, stride 1, no biases, and an output of MxNx1.
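The filter-bank sizes quoted above allow a rough count of the trainable weights in network 230. The tally below assumes the network consists only of the input convolution (420), five ResNet blocks (410) with two 3x3x32x32 layers each, and the output convolution (430), all without biases; the helper name `conv_weights` is hypothetical.

```python
def conv_weights(m, n, c, k):
    """Number of weights in an MxNxCxK filter bank (no biases)."""
    return m * n * c * k


input_conv = conv_weights(3, 3, 2, 32)          # convolution 420
resnet_block = 2 * conv_weights(3, 3, 32, 32)   # two layers per block 410
output_conv = conv_weights(3, 3, 32, 1)         # convolution 430
total_weights = input_conv + 5 * resnet_block + output_conv
```

Under these assumptions nearly all of the weights sit in the ResNet blocks, with the input and output convolutions contributing fewer than a thousand between them.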
  • the network may be trained on pairs of HDR and SDR images. In an embodiment, a large collection of HDR images is mapped to SDR using a tone mapping operation, such as described in Ref. [2].
  • This pair is then processed by the HDR detail layer prediction NN, where the error signal between the reference and predicted HDR images is propagated to the weights of the neural network. Training terminates when the error is below a threshold or reaches a non-decreasing plateau.
  • the dg scaler in equation (1) may be set to 1.
  • $I_B = \alpha \cdot I_{BL} + (1 - \alpha) \cdot I$, where $\alpha$ is a scaler in [0, 1].
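The blend of the filtered base layer I_BL with the original intensity I, weighted by a scaler in [0, 1], can be written as a short per-pixel helper. This is a sketch on flat lists of pixel values; the function name and sample values are hypothetical.

```python
def blend_base_layer(i_bl, intensity, alpha):
    """Return alpha * I_BL + (1 - alpha) * I for each pixel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * b + (1.0 - alpha) * i for b, i in zip(i_bl, intensity)]


# Hypothetical filtered base layer and intensity values.
filtered = [0.2, 0.6]
original = [0.4, 0.2]
blended = blend_base_layer(filtered, original, alpha=0.5)
```

Setting `alpha` to 1 keeps the filtered base layer unchanged, while 0 falls back to the unfiltered intensity.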
  • process 200B may be simplified by bypassing (removing) the NN for HDR detail layer prediction (230) and by using only the original detail layer (DL).
  • process 200B may be adjusted as follows:
  • Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components.
  • the computer and/or IC may perform, control, or execute instructions related to image transformations, such as those described herein.
  • the computer and/or IC may compute any of a variety of parameters or values that relate to image up-converting and display mapping processes described herein.
  • the image and video embodiments may be implemented in hardware, software, firmware and various combinations thereof.
  • Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the invention.
  • processors in a display, an encoder, a set top box, a transcoder or the like may implement methods related to image up-converting and display mapping as described above by executing software instructions in a program memory accessible to the processors.
  • the invention may also be provided in the form of a program product.
  • the program product may comprise any tangible and non-transitory medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of the invention.
  • Program products according to the invention may be in any of a wide variety of tangible forms.
  • the program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAM, or the like.
  • the computer-readable signals on the program product may optionally be compressed or encrypted.
  • where a component (e.g. a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.
  • The Base Layer (BL) image may be constructed by a combination of down-sampling and up-sampling operations on the intensity image.
  • Layers of the pyramid may be skipped in order to reduce the memory bandwidth.
  • For example, the first layer (say, at 2K resolution) may be skipped.
  • During up-sampling, the quarter resolution image would then simply be doubled twice.
  • The pyramid may generate the following layers: 1024 x 576, 512 x 288, 256 x 144, 128 x 72, 64 x 36, 32 x 18, and 16 x 9.
  • Both the half and quarter resolution layers may be skipped. This ensures that no matter the input image size, the subsequent layers of the pyramid will have the same dimensions.
  • Filtering is performed using either a separable low-pass 2 x 2 filter (e.g., with filter coefficients [1 1]/2) or a separable 4 x 4 low-pass filter (e.g., with filter coefficients [1 3 3 1]/8).
  • The 4 x 4 filter results in better alignment between the pyramid levels but requires additional line buffers.
  • One may apply different filters in the horizontal and vertical directions, e.g., a 4-tap horizontal filter and a 2-tap vertical filter, or vice versa.
  • the input image may be padded to:
  • A processor receives the down-sampled pyramid data and reconstructs the original image in its original resolution using, at each layer, an edge-aware up-sampling filter.
  • The smallest level of the pyramid is up-sampled first; then additional levels are up-sampled, up to the resolution of the highest pyramid level.
  • P(i): the pyramid image at layer i.
  • The lowest resolution pyramid image (e.g., P(7)) is processed by an edge-preserving filter which generates two coefficient “images”, to be denoted as Ima(7) and Imb(7) (defined below).
  • Both Ima(7) and Imb(7) are up-sampled by a factor of two to generate up-sampled coefficient images ImaU(7) and ImbU(7).
  • Ima(1) and Imb(1) can be upscaled by 2.
  • Ima(1) and Imb(1) can be upscaled by 4.
  • The two upscaled coefficient images (ImaU(1) and ImbU(1)), combined with the intensity image (I) of the input video, may be used to generate a base layer image, as
  • PScov = PSout - Sout * Pout
  • Ima(i) = PScov / (Pvar + PW(i))
  • Imb(i) = Sout - Ima(i) * Pout
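
The three coefficient relations above can be sketched per pixel as follows. This is a minimal illustration, not the patented implementation: it reads the relations as PScov = PSout - Sout * Pout, Ima(i) = PScov / (Pvar + PW(i)) and Imb(i) = Sout - Ima(i) * Pout. The function name `edge_coefficients`, the assumption that Pout, Sout and PSout are locally averaged values of P, S and P*S, and the assumption that Pvar is the local variance of P are all hypothetical and not stated in the text above.

```python
def edge_coefficients(p_out, s_out, ps_out, p_var, pw):
    """Per-pixel coefficients from locally averaged statistics.

    p_out, s_out, ps_out: local means of P, S and P*S (assumed to be
    precomputed by some smoothing filter, which is not shown here).
    p_var: local variance of P (assumed: mean(P*P) - mean(P)**2).
    pw: the weight PW(i) for this pyramid layer; p_var + pw must be
    non-zero for the division to be defined.
    """
    ps_cov = ps_out - s_out * p_out   # PScov = PSout - Sout * Pout
    ima = ps_cov / (p_var + pw)       # Ima(i) = PScov / (Pvar + PW(i))
    imb = s_out - ima * p_out         # Imb(i) = Sout - Ima(i) * Pout
    return ima, imb


# Textured pixel where S equals P: the covariance matches the variance,
# so with a very small weight the slope is close to 1 and the offset
# is close to 0.
print(edge_coefficients(p_out=0.5, s_out=0.5, ps_out=0.29, p_var=0.04, pw=1e-6))

# Flat pixel: variance and covariance are both zero, so the slope is 0
# and the offset falls back to the local mean Sout.
print(edge_coefficients(p_out=0.5, s_out=0.5, ps_out=0.25, p_var=0.0, pw=0.01))
```

Under these assumptions, a larger PW(i) pulls Ima(i) toward zero and Imb(i) toward the local mean, so the weight controls how strongly a layer is smoothed versus how closely it follows local structure.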

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

Methods and systems are described for dynamic range conversion and display mapping of standard dynamic range (SDR) images onto high dynamic range (HDR) displays. Given an SDR input image, a processor generates an intensity (luminance) image and, optionally, a base layer image and a detail layer image. A first neural network (NN) uses the intensity image to predict statistics of the SDR image in a higher dynamic range. These predicted statistics are used together with the original image statistics of the input image to derive an optimal tone-mapping curve for mapping the input SDR image onto an HDR display. Optionally, a second neural network may generate, using the intensity image and the detail layer image, a residual detail layer image in a higher dynamic range to improve the tone mapping of the base layer image in the higher dynamic range.
PCT/US2022/037991 2021-07-29 2022-07-22 Neural networks for dynamic range conversion and display management of images WO2023009392A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202280052320.4A CN117716385A (zh) 2021-07-29 2022-07-22 用于图像的动态范围转换和显示管理的神经网络
EP22757411.8A EP4377879A1 (fr) 2021-07-29 2022-07-22 Neural networks for dynamic range conversion and display management of images

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163226847P 2021-07-29 2021-07-29
EP21188516.5 2021-07-29
US63/226,847 2021-07-29
EP21188516 2021-07-29

Publications (1)

Publication Number Publication Date
WO2023009392A1 (fr) 2023-02-02

Family

ID=82942845

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/037991 WO2023009392A1 (fr) 2021-07-29 2022-07-22 Neural networks for dynamic range conversion and display management of images

Country Status (2)

Country Link
EP (1) EP4377879A1 (fr)
WO (1) WO2023009392A1 (fr)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8593480B1 (en) 2011-03-15 2013-11-26 Dolby Laboratories Licensing Corporation Method and apparatus for image data transformation
US9961237B2 (en) 2015-01-19 2018-05-01 Dolby Laboratories Licensing Corporation Display management for high dynamic range video
US20190108621A1 (en) * 2017-10-04 2019-04-11 Fotonation Limited System and method for estimating optimal parameters
EP3503019A1 (fr) * 2017-12-21 2019-06-26 Thomson Licensing Improved inverse tone mapping method and corresponding device
US10600166B2 (en) 2017-02-15 2020-03-24 Dolby Laboratories Licensing Corporation Tone curve mapping for high dynamic range images
WO2020131731A1 (fr) * 2018-12-18 2020-06-25 Dolby Laboratories Licensing Corporation Machine learning based dynamic composing in enhanced standard dynamic range video (SDR+)
WO2020219341A1 (fr) 2019-04-23 2020-10-29 Dolby Laboratories Licensing Corporation Display management for high dynamic range images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BIST CAMBODGE ET AL: "Tone expansion using lighting style aesthetics", COMPUTERS AND GRAPHICS, ELSEVIER, GB, vol. 62, 15 December 2016 (2016-12-15), pages 77 - 86, XP029900476, ISSN: 0097-8493, DOI: 10.1016/J.CAG.2016.12.006 *
WANG CHAO ET AL: "Deep Inverse Tone Mapping for Compressed Images", IEEE ACCESS, vol. 7, 20 June 2019 (2019-06-20), pages 74558 - 74569, XP011731162, DOI: 10.1109/ACCESS.2019.2920951 *

Also Published As

Publication number Publication date
EP4377879A1 (fr) 2024-06-05

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22757411; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2024504814; Country of ref document: JP; Kind code of ref document: A)
WWE WIPO information: entry into national phase (Ref document number: 202280052320.4; Country of ref document: CN)
WWE WIPO information: entry into national phase (Ref document number: 2022757411; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2022757411; Country of ref document: EP; Effective date: 20240229)