CN118044189A - Encoding and decoding multi-intent images and video using metadata - Google Patents


Info

Publication number
CN118044189A
CN118044189A (Application CN202280066734.2A)
Authority
CN
China
Prior art keywords
image
metadata
intent
adjustment
applying
Prior art date
Legal status
Pending
Application number
CN202280066734.2A
Other languages
Chinese (zh)
Inventor
R. Atkins
J. A. Pytlarz
R. Wanat
J. W. Zuena
Current Assignee
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date
Application filed by Dolby Laboratories Licensing Corp filed Critical Dolby Laboratories Licensing Corp
Priority claimed from PCT/US2022/044899 external-priority patent/WO2023055736A1/en
Publication of CN118044189A publication Critical patent/CN118044189A/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/70 Characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/85 Using pre-processing or post-processing specially adapted for video compression

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

Systems and methods for encoding and decoding multi-intent images and video using metadata. When an image is encoded as a multi-intent image, at least one appearance adjustment may be applied to the image. Metadata characterizing the at least one appearance adjustment may be included in, or transmitted along with, the encoded multi-intent image. When decoding the multi-intent image, the system may obtain a selection of a desired rendering intent and, based on the selection, either render the multi-intent image with the appearance adjustment applied, or use the metadata to reverse the appearance adjustment and restore the image to its state before the adjustment.

Description

Encoding and decoding multi-intent images and video using metadata
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application No. 63/251427, filed on October 1, 2021, and from European patent application No. 21208445.3, filed on November 16, 2021, both of which are incorporated herein by reference in their entirety.
Technical Field
The present application relates generally to systems and methods for image encoding and decoding.
Background
Pierre Andrivon et al.: "SEI message for Colour Mapping Information", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 17th Meeting: Valencia, Spain, 27 March to 4 April 2014, document no. JCTVC-Q0074, 2 April 2014, XP030239839, proposes color mapping side information in an SEI message intended to ensure smooth color space transitions in connection with upcoming HDTV and multi-phase UHDTV service deployments. The proposed mapping is said to help preserve the artistic intent of studio-produced content while preserving differentiation among television manufacturers. The idea was first presented in JCTVC-N0180. JCTVC-O0363 clarified the intent of the proposed SEI message. In addition, complexity concerns are addressed, and the color mapping model of JCTVC-P0126 is simplified. Finally, the proposal also addresses editing and layer-synchronization issues. Software that derives the proposed model parameters is provided, with an implementation in the HM-13.0+RExt-6.0 encoder and decoder. When the proposed color mapping information SEI message is present, the decoded output pictures are color mapped.
US 2016/261889 A1 discloses an image processing apparatus and method capable of easily improving encoding efficiency. A setting unit is configured to set additional information including packing information related to a packing process that rearranges the pixel data of raw data (image data before a demosaicing process is performed) according to their degree of correlation; and an encoding unit is configured to encode the raw data subjected to the packing process and to generate a bitstream including the resulting encoded data and the additional information set by the setting unit.
"Study Group Report: High-Dynamic-Range (HDR) Imaging Ecosystem", SMPTE Technology Committee (TC) 10E SG, 19 September 2015, XP055250336, sets forth definitions of High Dynamic Range (HDR) and related technologies, describes the gaps that currently exist in the various ecosystems for creating, delivering, and displaying HDR-related content, identifies existing standards that may be affected by the HDR ecosystem, including Wide Color Gamut (WCG), and identifies areas where implementation issues may require further study. The report focuses on professional applications and does not explicitly address implementation in the home.
US 2016/254028 A1 discloses methods and systems for generating and applying scene-stable metadata for a video data stream. The video data stream is divided or partitioned into scenes, and a first set of metadata may be generated for a given scene of the video data. The first set of metadata may be any known metadata that is a function of a desired feature of the video content (e.g., luminance), and may be generated on a frame-by-frame basis. Scene-stable metadata, which may differ from the first set of metadata for the scene, is then generated by monitoring a desired feature within the scene and is used to keep that feature within an acceptable range of values. This may help avoid noticeable and possibly objectionable visual artifacts when rendering the video data.
WO 2020/264409 A1 discloses an apparatus and method directed to preserving the original creative intent for video played on a target display. The video bitstream includes metadata with a flag indicating the creative intent for the target display. The metadata includes a number of fields representing characteristics such as content type, content subtype, intended white point, whether the video is to be used in reference mode, intended sharpness, intended noise reduction, intended MPEG noise reduction, intended frame rate conversion, intended average picture level, and intended color. The metadata is designed to make it easy for content creators to tag their content. Metadata may be added to the video content at multiple points, with the status of the flag set to TRUE or FALSE to indicate whether the metadata was added by the content creator or by a third party.
Disclosure of Invention
The invention is defined by the independent claims. The dependent claims concern optional features of some embodiments of the invention. When encoding images of a scene captured using a digital device, it is common practice to adjust the captured images, for example by adapting them for viewing in a reference viewing environment and by applying aesthetic adjustments such as enhanced contrast and color saturation. It is desirable to be able to transmit the originally captured or preprocessed image from the imaging sensor, representing the "real" scene, and then apply these operations at playback. This would allow multiple rendering intents to be realized: at playback, the device may render the originally captured "real" image, or alternatively, the device may modify the originally captured "real" image to form a "pleasing" image. Accordingly, techniques for encoding and decoding multi-intent images have been developed.
Aspects of the present disclosure relate to devices, systems, and methods for encoding and decoding one or more multi-intent images.
In one exemplary aspect of the present disclosure, a method for encoding a multi-purpose image is provided. The method includes obtaining an image for encoding as a multi-intent image, applying at least one appearance adjustment to the image, generating metadata characterizing the at least one appearance adjustment, and encoding the image and metadata as a multi-intent image.
In another exemplary aspect of the present disclosure, a method for decoding a multi-intent image is provided. The method includes obtaining a multi-intent image and metadata characterizing at least one appearance adjustment between the multi-intent image and a substitute version of the multi-intent image, obtaining a selection of the substitute version of the multi-intent image, and applying an inverse adjustment of the at least one appearance adjustment to the multi-intent image using the metadata to recover the substitute version of the multi-intent image.
In another exemplary aspect of the present disclosure, a method of providing a multi-intent image is provided. The method includes obtaining an original image for encoding as a multi-intent image, generating metadata characterizing at least one appearance adjustment applied to the original image, encoding the original image and the metadata as the multi-intent image, and providing the multi-intent image.
In another exemplary aspect of the disclosure, a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations comprising obtaining an image for encoding as a multi-intent image, applying at least one appearance adjustment to the image, generating metadata characterizing the at least one appearance adjustment, and encoding the image and the metadata as a multi-intent image.
In another exemplary aspect of the disclosure, a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations including obtaining a multi-intent image and metadata characterizing at least one appearance adjustment between the multi-intent image and a substitute version of the multi-intent image, obtaining a selection of the substitute version of the multi-intent image, and applying an inverse adjustment of the at least one appearance adjustment to the multi-intent image using the metadata to recover the substitute version of the multi-intent image is provided.
In another exemplary aspect of the disclosure, a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations including obtaining an original image for encoding into a multi-intent image, generating metadata characterizing at least one appearance adjustment applied to the original image, encoding the original image and the metadata into the multi-intent image, and providing the multi-intent image.
In this manner, aspects of the present disclosure provide encoding and decoding of multi-intent images and video, and realize improvements in at least the technical fields of image encoding, image decoding, image projection, image display, holography, signal processing, and the like.
Drawings
These and other more detailed and specific features of the various embodiments are more fully disclosed in the following description, with reference to the accompanying drawings, in which:
fig. 1 illustrates an exemplary process of an image encoding and decoding flow.
Fig. 2 illustrates an exemplary process of encoding and decoding multi-intent images and video.
Fig. 3 illustrates an exemplary process of encoding multi-intent images and video.
Fig. 4 illustrates an exemplary process of decoding multi-intent images and video.
Detailed Description
The present disclosure and aspects thereof may be embodied in various forms, including hardware, devices or circuits controlled by computer-implemented methods, computer program products, computer systems and networks, user interfaces, and application programming interfaces; as well as hardware-implemented methods, signal processing circuits, memory arrays, Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), and the like. The foregoing is intended merely to give a general idea of the various aspects of the disclosure, and is not intended to limit the scope of the disclosure in any way.
In the following description, numerous details are set forth, such as optical device configurations, adaptations, operations, etc., in order to provide an understanding of one or more aspects of the present disclosure. It will be apparent to one skilled in the art that these specific details are merely exemplary details and are not intended to limit the scope of the application.
Fig. 1 illustrates an exemplary process of an image transfer flow (100), showing various stages from image capture to image content display. An image generation module (105) is used to capture or generate an image (102), which may include a series of video frames. The image (102) may be captured digitally (e.g., by a digital camera) or generated by a computer (e.g., using computer animation) to provide image data (107). Alternatively, the image (102) may be captured on film by a film camera, in which case the film is converted to a digital format to provide the image data (107). In a production phase (110), the image data (107) is edited to provide an image production stream (112).
The image data of the production stream (112) is then provided to a processor (or one or more processors, such as a Central Processing Unit (CPU)) at a module (115) for post-production editing. The post-production editing of module (115) may include adjusting or modifying the color or brightness of particular regions of the image, in accordance with the creative intent of the image creator, to enhance image quality or achieve a particular image appearance. This is sometimes referred to as "color timing" or "color grading", and the methods described herein may be performed by the processor at block (115). Other editing (e.g., scene selection and sequencing, image cropping, addition of computer-generated visual effects, etc.) may be performed at module (115) to produce a final version (117) of the production for distribution. During post-production editing (115), the image or video images are viewed on a reference display (125). The reference display (125) may, if desired, be a consumer-level display or projector.
After the post-production (115) is completed, the image data of the final product (117) may be transferred to an encoding module (120) for downstream transfer to a decoding device and a playback device, such as a computer display, television, set-top box, movie theatre, etc. In some embodiments, the encoding module (120) may include audio and video encoders, such as those defined by ATSC, DVB, DVD, blu-ray, and other transport formats, for generating the encoded bitstream (122). In the receiver, a decoding unit (130) decodes the encoded bitstream (122) to generate a decoded signal (132), which is identical or similar to the signal (117). The receiver may be attached to a target display (140) that may have entirely different features than the reference display (125). In this case, the display management module (135) is operable to map the dynamic range of the decoded signal (132) to the characteristics of the target display (140) by generating a display mapped signal (137). Additional methods described herein may be performed by the decoding unit (130) or the display management module (135). Both the decoding unit (130) and the display management module (135) may comprise respective processors, or may be integrated into a single processing unit. While the present disclosure relates to a target display (140), it should be understood that this is merely an example. It should also be appreciated that the target display (140) may include any device configured to display or project light; such as computer displays, televisions, OLED displays, LCD displays, quantum dot displays, movie theatres, consumer and other commercial projection systems, heads-up displays, virtual reality displays, and the like.
When a scene is acquired using a digital device, the radiometric measurements of the real reference scene are rarely transmitted directly to generate an image. Instead, it is common practice for the Original Equipment Manufacturer (OEM) or software application designer of the device to adjust the image, for example, adapting it for viewing in a reference viewing environment such as a dim surround with D65 illumination, and applying aesthetic adjustments such as enhanced contrast and color saturation. These and other adjustments create a preferred rendering that is considered pleasing to the consumer.
Currently, two kinds of loss occur in these operations. First, the parameters of the applied operations are not transmitted; second, the pixel operations themselves may be lossy, owing to nonlinear clipping and quantization, irreversible operations, unknown algorithms, or an unknown order of operations.
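As a minimal illustration of the second kind of loss, the sketch below (not taken from the patent) shows why a baked-in pixel operation cannot simply be undone at playback: clipping and 8-bit quantization both discard information.

```python
# Sketch: clipping and quantization make a baked-in adjustment irreversible.

def apply_gain_8bit(x, gain):
    """Apply a gain, clip to [0, 1], and quantize to 8 bits."""
    y = min(max(x * gain, 0.0), 1.0)   # nonlinear clipping
    return round(y * 255) / 255        # quantization

def naive_undo(y, gain):
    return y / gain

original = 0.9
encoded = apply_gain_8bit(original, 1.5)   # clips at 1.0
restored = naive_undo(encoded, 1.5)
print(restored)  # ~0.667, not 0.9: the clipped highlight detail is gone
```

Transmitting the "real" image plus adjustment parameters, as proposed here, avoids baking such losses into the distributed pixels.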
Instead, it is desirable to be able to transmit the originally captured or preprocessed image from the imaging sensor, representing the "real" scene, and then apply these operations at playback. This would allow multiple rendering intents: at playback, the device may render the originally captured "real" image, or the device may form a "pleasing" image by modifying the originally captured "real" image.
It is also desirable to be able to transmit such content in a backward compatible manner. In this approach, modifications that form a "pleasing" image may be applied during capture and appropriate parameters transmitted to the playback device so that it can reverse the modifications to recover the originally captured "real" image.
Fig. 2 provides a method (200) that allows images with multiple intents to be encoded and decoded using metadata. The method (200) may be performed, for example, by a processor as part of the module (115) and/or module (120) for encoding and as part of the module (130) and/or module (135) for decoding.
At step (202), an image is captured. In a digital capture device, the exposed scene is transformed into raw sensor values in a single-channel representation. The single-channel image representation is then expanded, by a process called demosaicing, into a three-channel color representation, for example red, green, and blue (RGB). There are many methods of demosaicing, any of which is sufficient for the embodiments disclosed herein.
To capture the chromaticity of the scene perfectly, the spectral sensitivities of the capture device would have to match those of the viewer. In practice they are typically not perfectly matched, but are instead approximated using a 3x3 matrix transform that converts the sensor sensitivities to some set of desired RGB primaries. Traditionally, the camera spectral sensitivities are not transmitted with the content during this step, which makes the process lossy. In one embodiment of the invention, the camera spectral sensitivities are transmitted with the content, together with the previously applied 3x3 matrix transform, which allows the playback device to apply or reverse the conversion of the sensor output to the specified RGB primaries. As a non-limiting example, step (202) may include reading single-channel values from the sensor, applying a demosaicing scheme to form a three-color-channel (e.g., RGB) image, and optionally applying a 3x3 transform to conform the image sensitivities to those of the desired three (e.g., RGB) primaries. Step (202) may also include measuring the capture surround brightness (e.g., the level of ambient light in the capture environment).
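The 3x3 primary conversion of step (202) can be sketched as follows. The matrix values are illustrative only; a real matrix would be derived from the sensor's spectral sensitivities. Because the matrix step is linear, transmitting the matrix as metadata lets a playback device apply or invert it exactly (in floating point, before any clipping or quantization).

```python
import numpy as np

# Illustrative 3x3 matrix mapping camera-native RGB to desired primaries.
M = np.array([[ 1.10, -0.05, -0.05],
              [-0.10,  1.20, -0.10],
              [ 0.00, -0.15,  1.15]])

def convert_primaries(rgb_image, matrix):
    """Apply a 3x3 primary conversion to an (H, W, 3) image."""
    return np.einsum('ij,hwj->hwi', matrix, rgb_image)

img = np.random.rand(4, 4, 3)
out = convert_primaries(img, M)                      # encoder applies M
back = convert_primaries(out, np.linalg.inv(M))      # decoder inverts via metadata
print(np.allclose(img, back))  # True: the matrix step is invertible in float
```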
Once the desired RGB encoding of the captured image is determined, these values may be brought into agreement with a specified reference white point. The image can be matched to one of the standardized white points (D50, D65, etc.) by a von Kries adaptation transform. The process involves: (a) estimating the white point and surround brightness of the capture environment, and (b) applying a correction to the image to achieve a color match for an observer in a specified reference viewing environment (e.g., an environment with a known white point and surround brightness). PCT application No. PCT/US2021/027826, filed in April 2021, and PCT application No. PCT/US2021/029476, also filed in 2021, outline methods for adjusting images to account for the adaptation state of an observer in a colored surround environment, and both are incorporated herein by reference for all purposes. At step (204), one or more optional source appearance adjustments may be applied to the captured image, including but not limited to white balance adjustments, color correction adjustments, and opto-optical transfer function (OOTF) adjustments. Step (204) may include calculating a nonlinear OOTF to map the measured capture surround brightness to a reference viewing environment. The order of the white point adjustment and the 3x3 matrix may be interchanged. Calculating and applying the OOTF may establish a rendering intent for the image on a standard display device. In practice, the OOTF is applied to map an image from the viewing environment at capture to display in a reference viewing environment. Today, applying the OOTF is a lossy operation, which makes it difficult to reverse the OOTF at playback.
As with the white point adjustment, in a first step (a) the surround brightness of the capture environment may be estimated, and in a second step (b) the image may be corrected to achieve a match for an observer in the reference environment.
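A hedged sketch of the von Kries white-point correction discussed above: the adaptation is modeled as a per-channel gain that maps the estimated source white to the reference white. Real implementations apply the gains in a cone-like LMS space (via, e.g., a Bradford or CAT02 matrix); this illustrative version applies them directly in RGB and skips that transform. Transmitting the two white points as metadata makes the step reversible.

```python
import numpy as np

def von_kries_gains(src_white_rgb, ref_white_rgb):
    """Per-channel gains mapping the source white to the reference white."""
    return np.asarray(ref_white_rgb) / np.asarray(src_white_rgb)

def adapt_white_point(rgb_image, src_white_rgb, ref_white_rgb):
    return rgb_image * von_kries_gains(src_white_rgb, ref_white_rgb)

# A warm capture white mapped to an equal-energy reference white:
img = np.full((2, 2, 3), 0.5)
out = adapt_white_point(img, src_white_rgb=[1.0, 0.9, 0.8],
                        ref_white_rgb=[1.0, 1.0, 1.0])
# With the white points carried as metadata, the playback device can undo it:
back = adapt_white_point(out, [1.0, 1.0, 1.0], [1.0, 0.9, 0.8])
print(np.allclose(img, back))  # True
```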
At step (206), one or more optional source preference adjustments may be applied to the captured image, including but not limited to contrast adjustments, color saturation adjustments (both overall and individual color saturation adjustments), slope-offset-power Tmid adjustments of the tone curve, and other tone curve clipping and adjustments. As used herein, "mid" refers to the average of the maxRGB values of a perceptually quantized (PQ) encoded image, where each pixel has its own maxRGB value equal to the maximum color component value (R, G, or B) of that pixel. In other words, each pixel's largest color component value is that pixel's maxRGB value, and the average of the individual maxRGB values over the entire PQ-encoded image is the "mid" of the image. "T-mid" may refer to a "target mid", which may be the "mid" value that a user or content creator wants to achieve in the final image. In some embodiments, the individual color saturation adjustments may include saturation adjustments for six different colors, which may be referred to as "six-vector adjustments".
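The "mid" definition above can be computed directly. The values below are illustrative PQ code values in [0, 1]; the function simply takes each pixel's maxRGB and averages it over the image.

```python
import numpy as np

def image_mid(pq_image):
    """Mean of per-pixel maxRGB for an (H, W, 3) PQ-encoded image."""
    max_rgb = pq_image.max(axis=-1)   # per-pixel maxRGB
    return max_rgb.mean()

pq = np.array([[[0.2, 0.5, 0.3], [0.7, 0.1, 0.4]],
               [[0.6, 0.6, 0.6], [0.9, 0.2, 0.8]]])
print(image_mid(pq))  # (0.5 + 0.7 + 0.6 + 0.9) / 4 = 0.675
```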
Steps (206) and (208) may involve receiving an intent selection from the user in step (208), wherein the intent selection specifies which source appearance adjustments and source preference adjustments are to be made, the coefficients of such adjustments, which portions of the image the adjustments are applied to, and so on.
It is common practice for OEMs or software applications to apply source preference adjustments to captured images. These changes are purely aesthetic and are typically used to render images with higher contrast and color saturation. In various embodiments of the present disclosure, these preference changes, as determined by the OEM, are transmitted as metadata with the content and applied at playback in the same manner as the source appearance metadata. In each case, there is a first step (a) of calculating or specifying the desired amount of correction to apply, and a second step (b) of applying the correction using a parameterized function. Both (a) and (b) are transmitted as metadata, giving the playback device the flexibility to render either a "pleasing" or a "real" image, and giving the capture device the flexibility to transmit either a "pleasing" or a "real" image.
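A hedged sketch of one such parameterized function: a slope-offset-power (SOP) adjustment, one of the adjustment families the patent names, with its parameters carried as metadata so the playback device can reapply or invert it. The specific parameter values are illustrative, and the inversion is exact only where no clipping or quantization occurred.

```python
def sop_adjust(x, slope, offset, power):
    """Apply out = (slope * x + offset) ** power to a normalized value."""
    return (slope * x + offset) ** power

def sop_invert(y, slope, offset, power):
    """Invert the SOP adjustment (valid where no clipping occurred)."""
    return (y ** (1.0 / power) - offset) / slope

params = dict(slope=1.1, offset=0.02, power=1.2)   # illustrative metadata
x = 0.4
y = sop_adjust(x, **params)
print(abs(sop_invert(y, **params) - x) < 1e-9)  # True: metadata enables reversal
```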
As described herein, one benefit of the various embodiments disclosed herein is that all adjustments made to the three-channel image can be encoded as metadata and sent with the content to the playback device for application. In one embodiment, the OEM or encoding device may decide not to apply the appearance and preference adjustments, in order to produce a "real" image.
At step (210), the image as modified in steps (204) and (206) may be encoded. Step (210) may include encoding the image for downstream transmission to decoding and playback devices such as computer displays, televisions, set-top boxes, movie theaters, etc. In some embodiments, the encoding step (210) may employ audio and video encoders, such as those defined by ATSC, DVB, DVD, Blu-ray, and other transport formats, to generate an encoded bitstream. In addition to encoding the image, step (210) may also include forming and/or encoding metadata characterizing the source appearance adjustments applied in step (204) and the source preference adjustments applied in step (206). The metadata may include metadata associated with the source appearance adjustments, such as the scene white point specified in x, y coordinates (or some other system); the scene surround light intensity (e.g., information about the estimated capture environment) specified in lux (or some other system); the coefficients of the applied white point adjustment matrix; the coefficients of the applied 3x3 color matrix; the coefficients of the applied parameterized OOTF; the sensor spectral sensitivities used to calculate the 3x3 matrix; and coefficients of other enhancements or other information applied in step (204). Further, the metadata may include metadata associated with the source preference adjustments, such as coefficients for contrast enhancement, e.g., the slope-offset-power Tmid contrast adjustment; coefficients for saturation enhancement; coefficients for individual color saturation adjustments; coefficients for tone curve clipping; and coefficients of other enhancements applied in step (206).
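The metadata fields step (210) enumerates could be collected in a container like the one below. The field names and defaults are our own illustration, not a normative syntax from the patent or from any bitstream standard.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MultiIntentMetadata:
    # Source appearance adjustment parameters (all hypothetical field names)
    scene_white_xy: Optional[tuple] = None         # scene white point (x, y)
    scene_surround_lux: Optional[float] = None     # capture surround, in lux
    white_point_matrix: Optional[List[List[float]]] = None
    color_matrix_3x3: Optional[List[List[float]]] = None
    ootf_params: Optional[dict] = None             # parameterized OOTF coefficients
    sensor_spectral_sensitivity: Optional[list] = None
    # Source preference adjustment parameters
    sop_contrast: Optional[dict] = None            # slope/offset/power/Tmid
    saturation_gain: Optional[float] = None
    six_vector_saturation: Optional[List[float]] = None
    tone_curve_clip: Optional[dict] = None
    # Default rendering intent: 0.0 = "real", 1.0 = "pleasing"
    desired_rendering_intent: float = 1.0

meta = MultiIntentMetadata(scene_white_xy=(0.3127, 0.3290),
                           scene_surround_lux=500.0,
                           saturation_gain=1.15)
print(meta.desired_rendering_intent)  # 1.0: default is the "pleasing" image
```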
At step (212), the encoded image and metadata may be decoded. At step (214), a selection of a desired rendering intent may be obtained. As a first example, a selection may be obtained to render the image as modified by both the source appearance adjustments of step (204) and the source preference adjustments of step (206). As second and third examples, a selection may be obtained to render the image as if it had been modified by the source appearance adjustments of step (204) but not by the source preference adjustments of step (206), or vice versa. As a fourth example, a selection may be obtained to render the image as if it had been modified by neither the source appearance adjustments of step (204) nor the source preference adjustments of step (206); in this fourth example, the image captured in step (202) may be partially or fully restored. In some embodiments, the selection of the rendering intent obtained in step (214) may be based on a user selection at the playback device. In some embodiments, a default rendering intent may be specified during encoding and may be selected absent user input. In some embodiments, the default rendering intent may involve rendering the image with both the source appearance adjustments of step (204) and the source preference adjustments of step (206) applied.
At optional step (216), the metadata may be used to calculate a reversed source preference adjustment. When applied, the reversed source preference adjustment of step (216) may undo some or all of the source preference adjustment of step (206), wherein the user selection and default rendering intent identify which source preference adjustments are reversed.
At optional step (218), the metadata may be used to calculate a reversed source appearance adjustment. When applied, the reversed source appearance adjustment of step (218) may undo some or all of the source appearance adjustments of step (204), wherein the user selection and default rendering intent identify which source appearance adjustments are reversed.
At optional step (220), a target appearance adjustment may be calculated and applied. As a non-limiting example, the target appearance adjustment may include measuring the display surround brightness (e.g., the level of ambient light in the display environment) and then calculating and applying a nonlinear opto-optical transfer function (OOTF) to map from the reference viewing environment to the measured display surround brightness (e.g., the actual viewing environment).
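A hedged sketch of step (220): choose a system gamma from the measured display surround and apply it as a toy OOTF. The power-law form is loosely inspired by surround-dependent gamma adjustments such as the one in ITU-R BT.2100's HLG OOTF, but the formula, exponent, and reference surround value here are purely illustrative assumptions, not the patent's method.

```python
REF_SURROUND_NITS = 5.0   # assumed reference viewing surround (illustrative)

def surround_gamma(display_surround_nits, base_gamma=1.2):
    """Toy model: brighter surrounds call for a lower effective gamma."""
    return base_gamma * (display_surround_nits / REF_SURROUND_NITS) ** -0.076

def target_ootf(x, display_surround_nits):
    """Remap a normalized value from the reference surround to the actual one."""
    return x ** (surround_gamma(display_surround_nits) / 1.2)

print(round(target_ootf(0.5, REF_SURROUND_NITS), 3))  # 0.5: no change at reference
```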
At optional step (222), target preference adjustments may be calculated and applied. As non-limiting examples, target preference adjustments may include contrast adjustments, color saturation adjustments, slope-offset-power Tmid adjustments, individual color saturation adjustments, and tone curve clipping.
At step (224), the image may be rendered. For example, the image may be projected, displayed, saved to a storage device, transferred to another device, or otherwise used.
In some embodiments, reversing the source adjustments and applying the target adjustments are combined into a single processing step, and the adjustment is calculated accordingly. In other words, some or all of steps (216), (218), (220), and (222) may be combined.
In some embodiments, the rendering intent selected in step (208) is the "real" image, and steps (204) and (206) are substantially bypassed. This corresponds to distribution of the "real" image. The metadata in such embodiments would indicate that no source appearance adjustments and no source preference adjustments were made.
In some other embodiments, some source appearance and preference adjustments are applied (e.g., in steps (204) and (206)), resulting in a "pleasing" image. The metadata in such embodiments may indicate the amount and type of the source appearance and preference adjustments that have been applied. The metadata may include a plurality of values, each corresponding to a parameter controlling a particular function applied as a source appearance and/or preference adjustment. By knowing the exact functions applied, the order in which they were applied, and the parameters controlling their strength, the playback device may reverse (or nearly reverse) them. The metadata may be configured to include the information the playback device requires to reverse (or approximately reverse) these functions.
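The reversal logic just described can be sketched as follows: the encoder records each applied function, its parameters, and the order of application; the decoder undoes them in reverse order. The function names and parameter values are illustrative stand-ins for the patent's appearance and preference adjustments.

```python
# Toy adjustment functions and their inverses (illustrative only).
def gain(x, g):    return x * g
def ungain(y, g):  return y / g
def lift(x, b):    return x + b
def unlift(y, b):  return y - b

# (forward, inverse, params) in the order the encoder applied them:
chain = [(gain, ungain, {'g': 1.2}),
         (lift, unlift, {'b': 0.05})]

def apply_chain(x, chain):
    for fwd, _, p in chain:
        x = fwd(x, **p)
    return x

def reverse_chain(y, chain):
    for _, inv, p in reversed(chain):   # reversing in reverse order is essential
        y = inv(y, **p)
    return y

x = 0.5
print(abs(reverse_chain(apply_chain(x, chain), chain) - x) < 1e-9)  # True
```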
If desired, the metadata formed in step (210) may be used to transmit the "desired rendering intent" of the content, which specifies a default for how the image is handled at playback (whether to display a "real" image or a "pleasing" image). This may be a boolean value or a ratio that varies continuously between the two. The playback device interprets this metadata as the "desired rendering intent", reverses the source appearance adjustments and preference adjustments according to the source adjustment metadata, and also applies the target appearance adjustment according to the viewing environment. If desired, the "desired rendering intent" specified in the metadata may be overridden upon receipt of user input.
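The boolean-or-ratio intent and the user override can be sketched as follows; the key name `desired_intent` and the 0.0-real / 1.0-pleasing convention are assumptions for illustration:

```python
def choose_intent(metadata, user_intent=None):
    """The metadata default governs unless user input overrides it."""
    return user_intent if user_intent is not None else metadata.get("desired_intent", 1.0)

def render_with_intent(real_px, pleasing_px, intent):
    """intent may be boolean-like (0 or 1) or a continuous ratio blending
    the 'real' (0.0) and 'pleasing' (1.0) renderings of a pixel."""
    t = min(max(float(intent), 0.0), 1.0)
    return (1.0 - t) * real_px + t * pleasing_px
```

A continuous ratio lets the playback device apply the determined adjustments only in part rather than all-or-nothing.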
Fig. 3 provides a method (300) that allows images with multiple intents to be encoded using metadata. The method (300) may be performed by, for example, a processor as part of the module (115) and/or the module (120) for encoding.
At step (302), an image is captured by exposing a scene to a sensor. At step (304), raw sensor values for each color channel are collected. At step (306), the raw sensor values for each color channel may be converted into a multi-channel color image (e.g., a three-channel color image having three primary colors) using a demosaicing algorithm or process. At step (308), a 3x3 matrix transform may be applied to the multi-channel color image to convert the raw sensor values to a desired set of primary colors, such as RGB primaries. The 3x3 matrix transform of step (308) may be used to account for differences in the sensor's sensitivity across color channels. At step (310), the image may be brought into agreement with the reference white point by one or more white balance adjustments, color correction adjustments, or the like. At step (312), an opto-optical transfer function (OOTF) may be applied, for example, to map from the ambient brightness of the capture environment to the brightness of the reference viewing environment. At step (314), one or more source preference adjustments may be applied, including but not limited to contrast adjustment, color saturation adjustment, slope-offset-power Tmid adjustment, individual color saturation adjustment, and tone curve clipping. After step (314), the image may be encoded and metadata generated to enable potential reversal of any source preference adjustments and source appearance adjustments made during the method (300).
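Steps (308) through (314) can be sketched per pixel as below. The parameter names, their packaging into a metadata dictionary, and the use of a luma-relative saturation as the stand-in preference adjustment are assumptions for the sketch, not this disclosure's metadata format:

```python
def encode_multi_intent(raw_rgb, matrix, white_gain, gamma, saturation):
    """Illustrative per-pixel sketch of steps (308)-(314), recording each
    parameter so the adjustments can later be reversed."""
    metadata = {"matrix": matrix, "white_gain": white_gain,
                "ootf_gamma": gamma, "saturation": saturation}
    # Step (308): 3x3 matrix transform to the desired primaries.
    r, g, b = raw_rgb
    rgb = tuple(m[0] * r + m[1] * g + m[2] * b for m in matrix)
    # Step (310): white balance toward the reference white point.
    rgb = tuple(c * w for c, w in zip(rgb, white_gain))
    # Step (312): OOTF mapping capture surround to the reference environment.
    rgb = tuple(min(max(c, 0.0), 1.0) ** gamma for c in rgb)
    # Step (314): a source preference adjustment (saturation about luma).
    y = 0.2126 * rgb[0] + 0.7152 * rgb[1] + 0.0722 * rgb[2]
    rgb = tuple(y + saturation * (c - y) for c in rgb)
    return rgb, metadata

identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
encoded, meta = encode_multi_intent((0.5, 0.4, 0.3), identity,
                                    (1.0, 1.0, 1.0), 1.0, 1.0)
```

With identity parameters the pipeline passes the raw values through unchanged, which corresponds to distributing a "real" image whose metadata indicates no adjustments were made.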
Fig. 4 provides a method (400) that allows decoding an image with multiple intents using metadata. The method (400) may be performed, for example, by a processor as part of the module (130) and/or the module (135) for decoding.
At step (402), the multi-intent image and its corresponding metadata are decoded.
After the image and metadata are decoded on the playback device, there are several options for the rendering intent of the displayed image. In one embodiment, the selected (or preferred) intent resides within the metadata as a flag or profile that guides the target/receiving device in applying adjustments in both the appearance domain and the preference domain. In another embodiment, the final rendered image may involve no appearance or preference adjustments at all. Yet another embodiment adapts the rendered image for appearance phenomena but not for preferences (or vice versa). These intents are not necessarily binary, as the determined appearance and preference adjustments may be applied in part.
At step (404), the desired rendering intent is obtained, for example, from a default value specified in the metadata, from user input, and so forth.
Once the intent has been established for the target device, it may be necessary to reverse the adjustments made to the source image. Both the appearance adjustments and the preference adjustments applied on the source side of the pipeline have been decoded from the accompanying metadata. If desired, the reversal may be determined from the applied adjustments known from the metadata. In embodiments where no image adjustments were applied at the source, no reversal needs to be computed and the target adjustments may be applied directly. In all other embodiments, if the source-side adjustments are not wanted (i.e., if they are to be reversed), an inverse adjustment may be calculated.
At step (406), inverted source preference adjustments and appearance adjustments are calculated, e.g., based on metadata.
Since the source preference adjustments are applied last before encoding, they may need to be reversed first after decoding. The inverse preference adjustments undo any additional image processing specified by the metadata for aesthetic purposes, e.g., in one embodiment, changes to image contrast and saturation. After this, the source appearance adjustments are reversed using the metadata describing the source-to-display OOTF, along with any adjustments made to correct for ambient and/or colored light.
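The last-in, first-out reversal order can be sketched as a chain of invertible operations whose names and parameters are recorded as metadata; the operation names and the gamma/gain pair below are illustrative assumptions:

```python
def gain_fwd(x, k): return x * k
def gain_inv(x, k): return x / k
def gamma_fwd(x, p): return x ** p
def gamma_inv(x, p): return x ** (1.0 / p)

# Appearance adjustment (OOTF-like gamma) first, preference gain last;
# the names and parameter values are illustrative.
OPS = [("ootf", gamma_fwd, gamma_inv, 1.2),
       ("preference_gain", gain_fwd, gain_inv, 0.9)]

def encode_chain(x, ops):
    meta = []
    for name, fwd, _inv, param in ops:
        x = fwd(x, param)
        meta.append((name, param))          # metadata records each step
    return x, meta

def decode_chain(x, ops, meta):
    inv_by_name = {name: inv for name, _fwd, inv, _p in ops}
    # Reverse in LIFO order: the preference adjustment, applied last
    # before encoding, is undone first after decoding.
    for name, param in reversed(meta):
        x = inv_by_name[name](x, param)
    return x
```

Running the chain forward and then backward recovers the original value, which is the round-trip property the metadata is designed to enable.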
Once the source adjustments have been reversed, the target adjustments may be applied. Similar to the source appearance adjustment, the target appearance adjustment uses information about the target viewing environment and the adaptation state of a standard viewer to change the image white point, brightness, and color saturation so that the image renders properly. The viewer's proximity to the screen determines how much the screen, as opposed to the surrounding environment, contributes to the viewer's adaptation state (exemplary techniques are described in PCT patent application No. PCT/US2021/027826, filed in April 2021, which is incorporated herein in its entirety for all purposes). Alternatively, the effect of screen size on adaptation may be calculated using viewing distances recommended by standards. In one embodiment, additional adjustments may be applied to personalize the appearance for individual viewers. These adjustments include correcting for the individual's contrast sensitivity function, accounting for metamerism, and potentially for a degree of color vision deficiency. Image enhancement may further be applied at the target to accommodate OEM preferences.
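One way to picture the screen-versus-environment contribution is a weighted mix of the two luminances, with the weight driven by how much of the visual field the screen occupies at the viewing distance. This linear mix is an assumption for the sketch, not the method of the referenced PCT application:

```python
def adapting_luminance(screen_nits, surround_nits, screen_fraction):
    """Illustrative mix: screen_fraction (0..1), driven by how close the
    viewer sits, weights the screen against the surround when estimating
    the viewer's adapting luminance."""
    f = min(max(screen_fraction, 0.0), 1.0)
    return f * screen_nits + (1.0 - f) * surround_nits
```

A viewer close to a large screen (fraction near 1) adapts mostly to the screen; a distant viewer adapts mostly to the room.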
At step (408), target appearance adjustments and preference adjustments are calculated, for example, based on the desired rendering intent and on information about the target display environment, such as surround brightness.
At step (410), the inverted source preference adjustment and appearance adjustment are applied to the decoded image, e.g., to undo the source preference adjustment and appearance adjustment made during the method (300).
At step (412), target appearance adjustments and preference adjustments are applied to the decoded image.
At step (414), the decoded image, with the target appearance adjustments and preference adjustments applied, is displayed, saved to disk, transferred to another device or another party, or otherwise used.
The above describes encoding systems, decoding systems, and methods for encoding and decoding multi-intent images and video using metadata. Systems, methods, and devices according to the present disclosure may employ any one or more of the following configurations.
(1) A method of encoding a multi-intent image, the method comprising: obtaining an image for encoding as a multi-intent image, applying at least one appearance adjustment to the image, generating metadata characterizing the at least one appearance adjustment, and encoding the image and metadata as the multi-intent image.
(2) The method of (1), wherein the metadata characterizes at least one appearance adjustment to a degree sufficient to make the metadata available to reverse the at least one appearance adjustment.
(3) The method of (1) or (2), wherein applying at least one appearance adjustment comprises converting a sensor value to a color value.
(4) The method of any one of (1) to (3), wherein applying at least one appearance adjustment comprises converting sensor values to color values using a 3x3 matrix, and wherein the metadata comprises coefficients of the 3x3 matrix.
(5) The method of any one of (1) to (4), wherein applying at least one appearance adjustment comprises estimating a captured ambient surround luminance and a white point, and applying a white point correction based on the estimated captured ambient surround luminance and white point.
(6) The method of (5), wherein the metadata comprises estimated captured ambient surround brightness and white point.
(7) The method of any one of (1) to (4), wherein applying at least one appearance adjustment comprises estimating a capture environment surround brightness, and applying an opto-optical transfer function (OOTF) based in part on the estimated capture environment surround brightness to prepare the image for rendering on a reference display device.
(8) The method of (7), wherein the metadata comprises estimated ambient brightness of the captured environment.
(9) The method according to (7) or (8), wherein the metadata includes coefficients of the opto-optical transfer function.
(10) The method of any one of (1) to (9), wherein applying at least one appearance adjustment comprises applying saturation enhancement, and wherein the metadata comprises coefficients of saturation enhancement.
(11) The method of any one of (1) to (10), wherein applying at least one appearance adjustment comprises applying contrast enhancement, and wherein the metadata comprises coefficients of contrast enhancement.
(12) The method of any one of (1) to (11), wherein applying at least one appearance adjustment comprises applying a separate color saturation adjustment, and wherein the metadata comprises coefficients of the separate color saturation adjustment.
(13) The method of any one of (1) to (12), wherein applying at least one appearance adjustment comprises applying slope-offset-power Tmid enhancement, and wherein the metadata comprises coefficients of slope-offset-power Tmid enhancement.
(14) The method of any one of (1) to (13), wherein applying at least one appearance adjustment comprises applying enhancement, and wherein the metadata comprises coefficients of enhancement.
(15) The method of any one of (1) to (14), wherein applying at least one appearance adjustment comprises applying tone curve clipping, and wherein the metadata comprises coefficients of the tone curve clipping.
(16) The method of any one of (1) to (15), wherein the multi-intent image comprises a video frame in a video.
(17) A method of decoding a multi-intent image, the method comprising: obtaining a multi-intent image along with metadata characterizing at least one appearance adjustment between the multi-intent image and a substitute version of the multi-intent image, obtaining a selection of the substitute version of the multi-intent image, and applying an inverse adjustment of the at least one appearance adjustment to the multi-intent image using the metadata to recover the substitute version of the multi-intent image.
(18) A method, the method comprising: obtaining an original image for encoding into a multi-intent image, generating metadata characterizing at least one appearance adjustment made to the original image, encoding the original image and the metadata into the multi-intent image, and providing the multi-intent image.
(19) The method of (18), the method further comprising: receiving the multi-intent image at a decoder, obtaining a selection of the first rendering intent based on the selection of the first rendering intent at the decoder, decoding the multi-intent image by applying at least one appearance adjustment to the original image, and providing the original image to which the at least one appearance adjustment is applied.
(20) The method of (18) or (19), the method further comprising: obtaining, at the decoder, a selection of the second rendering intent based on the selection of the second rendering intent, decoding the multi-intent image without applying at least one appearance adjustment to the original image, and providing the original image without applying at least one appearance adjustment.
(21) The method of (18), wherein the metadata characterizes at least one appearance adjustment to a degree sufficient to make the metadata available to reverse the at least one appearance adjustment.
(22) The method of any one of (18) to (21), wherein at least one appearance adjustment comprises converting a sensor value to a color value.
(23) The method of any one of (18) to (22), wherein at least one appearance adjustment comprises converting sensor values to color values using a 3x3 matrix, and wherein the metadata comprises coefficients of the 3x3 matrix.
(24) The method of any of (18) to (23), wherein the at least one appearance adjustment includes estimating a capture environment surrounding luminance and a white point, and applying a white point correction based on the estimated capture environment surrounding luminance and white point.
(25) The method of (24), wherein the metadata includes estimated captured ambient surround brightness and white point.
(26) The method of any one of (18) to (23), wherein at least one appearance adjustment comprises estimating a capture environment surround brightness, and applying an opto-optical transfer function (OOTF) based in part on the estimated capture environment surround brightness to prepare the image for rendering on a reference display device.
(27) The method of (26), wherein the metadata comprises estimated ambient brightness of the captured environment.
(28) The method of (26) or (27), wherein the metadata comprises coefficients of the opto-optical transfer function.
(29) The method of any one of (18) to (28), wherein at least one appearance adjustment comprises applying saturation enhancement, and wherein the metadata comprises coefficients of saturation enhancement.
(30) The method of any one of (18) to (29), wherein at least one appearance adjustment comprises applying contrast enhancement, and wherein the metadata comprises coefficients of contrast enhancement.
(31) The method of any one of (18) to (30), wherein at least one appearance adjustment comprises applying individual color saturation adjustments, and wherein the metadata comprises coefficients of the individual color saturation adjustments.
(32) The method of any one of (18) to (31), wherein at least one appearance adjustment comprises applying a slope-offset-power Tmid enhancement, and wherein the metadata comprises coefficients of the slope-offset-power Tmid enhancement.
(33) The method of any one of (18) to (32), wherein at least one appearance adjustment comprises applying enhancements, and wherein the metadata comprises coefficients of the enhancements.
(34) The method of any one of (18) to (33), wherein at least one appearance adjustment comprises applying tone curve clipping, and wherein the metadata comprises coefficients of the tone curve clipping.
(35) The method of any one of (18) to (34), wherein the multi-intent image comprises a video frame in a video.
(36) A non-transitory computer-readable medium storing instructions that, when executed by an electronic processor, cause the electronic processor to perform operations according to any one of (1) to (35).
(37) An image transmission system for transmitting a multi-intent image, the image transmission system comprising a processor configured to encode the multi-intent image according to any one of (1) to (16) and (18) to (35).
(38) An image decoding system for receiving and decoding a multi-intent image, the image decoding system comprising a processor configured to decode the multi-intent image according to (17).
With respect to the processes, systems, methods, heuristics, etc. described herein, it should be understood that, while the steps of such processes, etc. have been described as occurring according to some ordered sequence, such processes may be practiced with the described steps performed in an order other than that described herein. It should further be appreciated that certain steps may be performed concurrently, other steps may be added, or certain steps described herein may be omitted. In other words, for the purpose of illustrating particular embodiments, a description of the processes herein is provided and should in no way be construed as limiting the claims.
Accordingly, it is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and applications other than the examples provided will be apparent upon reading the above description. The scope should be determined not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. It is contemplated and reserved that the technology discussed herein will evolve in the future and that the disclosed systems and methods will be incorporated into such future embodiments. In summary, it should be understood that the application is capable of modification and variation.
All terms used in the claims are intended to be given their broadest reasonable constructions and their ordinary meanings as understood by those familiar with the art described herein unless an explicit indication to the contrary is made herein. In particular, use of the singular articles such as "a," "an," "the," and the like should be understood to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.
The Abstract of the disclosure is provided to enable the reader to quickly ascertain the nature of the technical disclosure. The abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Furthermore, in the foregoing detailed description, it can be seen that various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments cover more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the detailed description, with each claim standing on its own as a separate claimed subject matter.

Claims (13)

1. A method of decoding a multi-intent image including a representation of the image in a reference viewing environment and metadata for transforming the included representation into an alternate version of the image, the method comprising:
Obtaining the multi-intent image and metadata characterizing at least one appearance adjustment of the image between the representation in the reference viewing environment and an alternate version of the image, the metadata indicating a surrounding brightness and white point in a capture environment when the image has been captured by an image sensor;
Obtaining a selection of the alternate version of the multi-intent image, wherein the alternate version selected approximates the image captured by the image sensor; and
Using the metadata, applying an inverse of the at least one appearance adjustment to the representation of the image in the reference viewing environment to recover the alternate version of the multi-intent image based on the obtained selection.
2. The method of claim 1, wherein applying an inverse of the at least one appearance adjustment to the representation of the image in the reference viewing environment to recover the alternate version of the multi-intent image based on the obtained selection comprises:
Mapping the image from the white point in the reference viewing environment to the white point in the capture environment; and
An opto-optical transfer function (OOTF) is applied to the image to map from the ambient brightness in the reference viewing environment to the ambient brightness of the capture environment.
3. The method of claim 2, wherein the metadata is further indicative of a spectral sensitivity of the image sensor that has captured the image, and is indicative of coefficients of a 3x3 matrix transform, the 3x3 matrix transform being applied to raw sensor values from the image sensor to correct for differences in spectral sensitivity of the image sensor between color channels; and
Wherein applying an inverse of the at least one appearance adjustment to the representation of the image in the reference viewing environment to recover the alternate version of the multi-intent image based on the obtained selection further comprises applying an inverse of the 3x3 matrix transform to the image to retrieve original sensor values.
4. A method of encoding a multi-intent image including a representation of the image in a reference viewing environment and metadata for transforming the reference representation into an alternate version of the image, the method comprising:
Obtaining an image for encoding as the multi-intent image, comprising:
Capturing a multi-channel color image by exposing a scene to an image sensor in a capture environment, and collecting raw sensor values from the image sensor for each color channel; and
Determining a surrounding luminance and white point in the capture environment;
Applying at least one appearance adjustment to the image to transform a captured image into the representation of the image in the reference viewing environment, comprising:
mapping the image from the determined white point in the capture environment to a preferred white point in the reference viewing environment; and
Applying an opto-optical transfer function (OOTF) to the image to map from the ambient brightness in the capture environment to a preferred ambient brightness of the reference viewing environment;
generating metadata characterizing the at least one appearance adjustment, the metadata indicating the determined surrounding brightness and white point in the capture environment; and
The transformed image and metadata are encoded into the multi-intent image.
5. The method of claim 4, wherein applying at least one appearance adjustment to the image further comprises 3x3 matrix transforming the captured multi-channel color image to convert the raw collected sensor values to a set of desired primary colors, the 3x3 matrix transforming accounting for differences in spectral sensitivity of the image sensor between the color channels; and
Wherein the metadata is further indicative of a spectral sensitivity of the image sensor that has captured the image and coefficients of the 3x3 matrix transform for correcting differences in spectral sensitivity of the image sensor between the color channels such that the metadata is capable of transforming the reference representation to an image that approximates the captured image.
6. The method of claim 4 or claim 5, wherein applying the at least one appearance adjustment comprises applying an individual color saturation adjustment, and wherein the metadata comprises coefficients of the individual color saturation adjustment.
7. The method of any of claims 4-6, wherein applying the at least one appearance adjustment comprises applying a slope-offset-power-Tmid adjustment, and wherein the metadata comprises coefficients of the slope-offset-power Tmid adjustment.
8. The method of any of claims 4-7, wherein applying the at least one appearance adjustment comprises applying a tone curve adjustment, and wherein the metadata comprises coefficients of the tone curve adjustment.
9. The method of any of claims 4 to 8, wherein the multi-intent image comprises a video frame in a video.
10. The method of any of claims 4 to 9, wherein the metadata characterizes the at least one appearance adjustment to a degree sufficient to enable the metadata to be used to reverse the at least one appearance adjustment.
11. A decoder for decoding a multi-intent image, the multi-intent image comprising a representation of the image in a reference viewing environment and metadata for transforming the included representation into an alternative version of the image, the decoder comprising a processor configured to decode the multi-intent image according to any of claims 1 to 3.
12. An image transmission system for transmitting a multi-intent image comprising a representation of the image in a reference viewing environment and metadata for transforming the reference representation into an alternative version of the image, the image transmission system comprising a processor configured to encode the multi-intent image according to any of claims 4 to 10.
13. A non-transitory computer-readable medium storing instructions that, when executed by an electronic processor, cause the electronic processor to perform operations according to any one of claims 1-10.
CN202280066734.2A 2021-10-01 2022-09-27 Encoding and decoding multi-intent images and video using metadata Pending CN118044189A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163251427P 2021-10-01 2021-10-01
US63/251,427 2021-10-01
EP21208445.3 2021-11-16
PCT/US2022/044899 WO2023055736A1 (en) 2021-10-01 2022-09-27 Encoding and decoding multiple-intent images and video using metadata

Publications (1)

Publication Number Publication Date
CN118044189A true CN118044189A (en) 2024-05-14

Family

ID=78676313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280066734.2A Pending CN118044189A (en) 2021-10-01 2022-09-27 Encoding and decoding multi-intent images and video using metadata

Country Status (1)

Country Link
CN (1) CN118044189A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination