WO2023010749A1 - Hdr video conversion method and apparatus, and device and computer storage medium - Google Patents

Hdr video conversion method and apparatus, and device and computer storage medium Download PDF

Info

Publication number
WO2023010749A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
frames
frame
sdr
hdr video
Prior art date
Application number
PCT/CN2021/137979
Other languages
French (fr)
Chinese (zh)
Inventor
陈翔宇
章政文
董超
乔宇
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 filed Critical 中国科学院深圳先进技术研究院
Publication of WO2023010749A1 publication Critical patent/WO2023010749A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234309Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/92Dynamic range modification of images or parts thereof based on global image properties
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20208High dynamic range [HDR] image processing

Definitions

  • the present application belongs to the technical field of video processing, and in particular relates to an HDR video conversion method, device, equipment and computer storage medium.
  • HDR video has also gradually developed.
  • SDR Standard Dynamic Range
  • HDR video has a larger dynamic range and a wider color gamut, and can show higher contrast and richer colors. Therefore, many consumer ultra-high definition (UHD) TVs provide an HDR video playback function.
  • UHD ultra-high definition
  • the HDR picture conversion method needs to first convert each video frame from a video encoding format to an image encoding format, perform HDR conversion on the image-format data, and then convert it back to the video encoding format to obtain the HDR video; this processing method is also relatively complicated.
  • Embodiments of the present application provide an HDR video conversion method, device, terminal device, and storage medium, which can solve the problems of complex methods and high calculation costs in the HDR video conversion process.
  • the embodiment of the present application provides an HDR video conversion method. The method includes: performing frame extraction processing on the SDR video to be processed to obtain J frames of SDR video frames contained in the SDR video, where J is an integer greater than 1;
  • the SDR video frames of J frames are input into the trained full convolution model for processing, and J frames of HDR video frames are output.
  • the full convolution model includes N convolutional layers with a convolution kernel size of 1×1, N-1 activation functions are interspersed among the N convolutional layers, and N is an integer greater than or equal to 3;
  • the full convolution model provided by this application is used to perform the HDR video conversion task. Since the full convolution model consists of N convolutional layers with a convolution kernel size of 1×1 and N-1 interspersed activation functions, the model structure is simple and the number of parameters is relatively small, which can effectively reduce the computational cost of HDR video conversion tasks, improve computational efficiency, and speed up video processing.
  • the activation function is a non-linear activation function.
  • performing frame extraction processing on the SDR video to be processed to obtain the J frames of SDR video frames contained in the SDR video includes: using a frame extraction tool to perform frame extraction processing on the SDR video to obtain the J frames of SDR video frames;
  • combining the J frames of HDR video frames to obtain the HDR video corresponding to the SDR video includes:
  • the training method of the full convolution model includes:
  • the training set includes a plurality of SDR video frame samples and HDR video frame samples corresponding to the SDR video frame samples.
  • the preset loss function is used to describe the L2 loss between the predicted HDR video frame and the HDR video frame sample, where the predicted HDR video frame is obtained by processing the SDR video frame sample with the full convolution model.
  • an HDR video conversion device is provided, which includes: a frame extraction unit configured to perform frame extraction processing on the SDR video to be processed to obtain J frames of SDR video frames contained in the SDR video, where J is an integer greater than 1;
  • a processing unit configured to input the J frames of SDR video frames into the trained full convolution model for processing and output J frames of HDR video frames, where the full convolution model includes N convolutional layers with a convolution kernel size of 1×1, N-1 activation functions are interspersed among the N convolutional layers, and N is an integer greater than or equal to 3;
  • a frame merging unit configured to perform frame merging processing on the J frames of HDR video frames to obtain an HDR video corresponding to the SDR video.
  • the activation function is a non-linear activation function.
  • the embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and operable on the processor; when the processor executes the computer program, the method of any one of the above-mentioned first aspects is implemented.
  • an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method according to any one of the above-mentioned first aspects is implemented.
  • an embodiment of the present application provides a computer program product, which, when the computer program product is run on a terminal device, causes the terminal device to execute the method in any one of the foregoing first aspects.
  • Figure 1 is a schematic diagram of the HDR and SDR color gamut ranges provided by an embodiment of the present application
  • Fig. 2 is a flow chart of an embodiment of an HDR video conversion method provided by an embodiment of the present application
  • Fig. 3 is the architecture diagram of the full convolution model of the HDR video conversion provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of an HDR video conversion device provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • high dynamic range (HDR) has a larger dynamic range and a wider color gamut than SDR; owing to this wide color gamut and large dynamic range, HDR video can show higher contrast and richer colors.
  • FIG 1 is a schematic diagram of the range of HDR and SDR color gamuts.
  • BT.709 and BT.2020 are television parameter standards issued by the ITU (International Telecommunication Union), while DCI-P3 is a color gamut standard formulated by the American film industry for digital cinema and is mostly used to test the color range that a projector can cover.
  • among the three, BT.2020 covers the largest gamut, followed by DCI-P3, and BT.709 covers the smallest.
  • SDR video usually adopts the BT.709 color gamut.
  • HDR video adopts the wider BT.2020 color gamut.
  • some HDR video also adopts the DCI-P3 color gamut.
  • accordingly, the contrast and color of HDR video are better than those of SDR video.
  • SDR video usually adopts 8-bit encoding.
  • HDR video adopts 10-bit or 16-bit encoding.
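As a concrete illustration of this encoding difference (the helper below is ours, not part of the patent), the number of distinct code values per channel grows quickly with bit depth:

```python
def code_levels(bits):
    """Number of distinct code values per channel at a given bit depth."""
    return 2 ** bits

sdr_levels = code_levels(8)     # 8-bit SDR: 256 levels per channel
hdr10_levels = code_levels(10)  # 10-bit HDR: 1024 levels
hdr16_levels = code_levels(16)  # 16-bit HDR: 65536 levels
```

The extra levels are what allow HDR to represent a larger brightness range without visible banding.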
  • neural network-based methods or HDR image conversion algorithms are used to convert SDR video to HDR video.
  • the proposed neural network model has a high complexity and a large amount of calculation.
  • generative adversarial networks (GANs) are often used to convert SDR video into HDR video.
  • the network parameters of such generative adversarial networks have reached 1.06M (where M abbreviates the order of magnitude "million"), and some have even reached 2.87M; the more network parameters, the more complex the designed network and the greater the amount of calculation.
  • the HDR picture conversion method needs to convert the video frame from the video encoding format to the image encoding format first, then perform HDR conversion on the data in the image encoding format and then convert it back to the video encoding format to obtain the HDR video.
  • the processing method is also relatively complicated.
  • this application provides a full convolution model, which can realize the HDR video conversion task.
  • the full convolution model is composed of N convolution layers with a convolution kernel size of 1 ⁇ 1 and N-1 activation functions interspersed.
  • the model structure is simple and the number of parameters is relatively small, which can effectively reduce the computational cost of HDR video conversion tasks, improve computational efficiency, and speed up video processing.
  • a convolutional layer with a 1×1 kernel is generally used for dimensionality increase/decrease in complex neural network models that implement a specific function, that is, to increase or decrease the number of channels of the feature map and thereby improve the computational efficiency of the model.
  • it was found through experiments that a fully convolutional model constructed by simply stacking 1×1 convolutional layers and activation functions can accomplish the HDR video conversion task and achieve a good conversion effect.
  • FIG. 2 it is a flowchart of an embodiment of an HDR video conversion method provided by the present application.
  • the subject of execution of the method may be a video processing device.
  • the video processing device may be a mobile terminal device such as a smart phone, a tablet computer, or a video camera, or may be a terminal device capable of processing video data such as a desktop computer, a robot, or a server.
  • the method includes:
  • the SDR video to be processed may be a complete video that is shot, downloaded, or read from a local storage area, or an SDR video segment clipped from a complete video.
  • a frame extraction tool may be used to extract frames from the SDR video to be processed.
  • FFmpeg Fast Forward mpeg
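As an illustrative sketch of the frame extraction and frame merging steps, the FFmpeg commands could be assembled as below. The file names and the 25 fps rate are hypothetical; the patent does not prescribe specific FFmpeg options:

```python
# Sketch: build (but do not run) FFmpeg command lines for frame
# extraction and frame merging.  Paths and frame rate are invented
# examples, not values taken from the patent.
import subprocess  # only needed if the commands are actually executed

def build_extract_cmd(video_path, frame_pattern):
    """Command that decodes every frame of a video into numbered images."""
    return ["ffmpeg", "-i", video_path, frame_pattern]

def build_merge_cmd(frame_pattern, fps, video_path):
    """Command that re-encodes numbered frames back into a video."""
    return ["ffmpeg", "-framerate", str(fps), "-i", frame_pattern,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", video_path]

extract = build_extract_cmd("sdr_input.mp4", "frames/%05d.png")
merge = build_merge_cmd("hdr_frames/%05d.png", 25, "hdr_output.mp4")
# To actually execute: subprocess.run(extract, check=True)
```

Note that `yuv420p` is an 8-bit pixel format; a real HDR encode would need a 10-bit format such as `yuv420p10le` together with an encoder build that supports it.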
  • the full convolution model includes N convolutional layers with a convolution kernel size of 1×1, N-1 activation functions are interspersed among the N convolutional layers, and N is an integer greater than or equal to 3.
  • the activation function may be a nonlinear activation function, which can increase the nonlinear fitting ability of the full convolution model and improve the flexibility of the full convolution model.
  • the activation function may be a ReLU activation function.
  • the value of N can be set according to actual precision requirements.
  • the full convolution model includes 3 convolutional layers with a convolution kernel size of 1×1 and 2 ReLU activation functions interspersed among the 3 convolutional layers.
  • the J frames of SDR video frames are respectively input into the full convolution model shown in Figure 3 for processing, and the corresponding J frames of HDR video frames can be obtained.
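The forward pass just described can be sketched in plain Python: a 1×1 convolution is simply a per-pixel linear map across channels, so the model is three such maps with ReLUs between them. The layer widths, random weights, and 4×4 test frame below are illustrative assumptions, not values from the patent:

```python
import random

def conv1x1(frame, weights, biases):
    """1x1 convolution: a per-pixel linear map across channels.
    frame: H x W x C_in nested lists; weights: C_out x C_in; biases: C_out."""
    return [[[sum(w[k] * px[k] for k in range(len(px))) + b
              for w, b in zip(weights, biases)]
             for px in row]
            for row in frame]

def relu(frame):
    return [[[max(0.0, v) for v in px] for px in row] for row in frame]

def fully_conv_model(frame, params):
    """N 1x1 conv layers with N-1 ReLUs interspersed (here N = 3)."""
    out = frame
    for i, (w, b) in enumerate(params):
        out = conv1x1(out, w, b)
        if i < len(params) - 1:   # activation between layers only
            out = relu(out)
    return out

# Toy parameters: 3 layers mapping 3 -> 8 -> 8 -> 3 channels.
random.seed(0)
def rand_layer(c_out, c_in):
    return ([[random.uniform(-0.1, 0.1) for _ in range(c_in)]
             for _ in range(c_out)],
            [0.0] * c_out)

params = [rand_layer(8, 3), rand_layer(8, 8), rand_layer(3, 8)]
sdr_frame = [[[0.2, 0.5, 0.8] for _ in range(4)] for _ in range(4)]  # 4x4 RGB
hdr_frame = fully_conv_model(sdr_frame, params)
```

Because every operation is 1×1, the spatial resolution of the output frame is identical to the input, which is what lets the model process frames of any size.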
  • the residual network (ResNet), the cycle-consistent generative adversarial network (CycleGAN) and the pixel-to-pixel generative network (Pix2Pix) are algorithm models for image-to-image translation.
  • High Dynamic Range Network High Dynamic Range Net, HDRNet
  • Conditional Sequential Retouching Network Conditional Sequential Retouching Network, CSRNet
  • Ada-3DLUT Adaptive 3D lookup table
  • the deep super-resolution inverse tone-mapping method (Deep SR-ITM) and the GAN-based joint super-resolution and inverse tone-mapping network (JSI-GAN) are algorithm models for SDR-to-HDR video conversion.
  • the fully convolutional network model also achieves good experimental results on performance indicators such as peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), spectral residual based similarity index measure (SR-SIM), color fidelity ΔE ITP, and the high dynamic range visible difference predictor (HDR-VDP3).
  • PSNR Peak Signal to Noise Ratio
  • SSIM structural similarity index measure
  • SR-SIM spectral residual based similarity index measure
  • HDR-VDP3 High Dynamic Range Visible Difference Predictor
  • N may specifically have a value between 3 and 10.
  • compared with the algorithm models listed in Table 1, the full convolution model generally has an advantage in parameter quantity and belongs to the class of efficient models.
  • the training method of the full convolution model in this embodiment includes: using the preset training set and the preset loss function to iteratively train the initial full convolution model to obtain the above full convolution model; wherein,
  • the training set includes a plurality of SDR video frame samples and HDR video frame samples corresponding to the SDR video frame samples.
  • the SDR video sample and its corresponding HDR video sample can be acquired from public video websites. It is also possible to perform SDR and HDR processing on videos in the same RAW data format, respectively, to obtain SDR video samples and corresponding HDR video samples. It is also possible to use the SDR camera and the HDR camera respectively to shoot corresponding SDR video samples and HDR video samples in the same scene.
  • the SDR video samples and their corresponding HDR video samples are frame-extracted to obtain a plurality of SDR video frame samples and corresponding HDR video frame samples, retaining the temporal and spatial relations among the video frames.
  • L2 is used as the preset loss function for full convolution model training.
  • the preset loss function is used to describe the loss between the predicted HDR video frame and the HDR video frame sample, wherein the predicted HDR video frame is obtained by processing the SDR video frame sample by the full convolution model.
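A minimal sketch of the L2 (mean squared error) loss over flattened frames follows; the toy pixel values are invented for illustration:

```python
def l2_loss(pred, target):
    """Mean squared error between a predicted HDR frame and its sample,
    with frames flattened to a list of pixel values (toy illustration)."""
    diffs = [(p - t) ** 2 for p, t in zip(pred, target)]
    return sum(diffs) / len(diffs)

# Identical frames give zero loss.
assert l2_loss([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]) == 0.0
# One pixel off by 1.0 out of four values gives 1/4 = 0.25.
assert abs(l2_loss([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]) - 0.25) < 1e-12
```

During training, this scalar would be minimized over the training set by gradient descent on the convolution weights.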
  • the J frames of HDR video frames obtained after processing by the above full convolution model are HDR-format image data corresponding to the J frames of SDR-format image data extracted from the SDR video to be processed. The HDR format accommodates higher contrast and a wider color gamut and uses 16-bit or 10-bit encoding; therefore, the J frames of HDR video frames output by the full convolution model are 16-bit or 10-bit encoded image data. Compared with 8-bit encoded image data, 16-bit or 10-bit encoded image data can display brighter, richer images.
  • the J frames of HDR video frames obtained after processing by the HDR video conversion model may be combined using a frame merging tool; for example, the J frames of HDR video frames may be combined using the FFmpeg tool.
  • when performing HDR video conversion tasks, existing models often need to convert the SDR video frames obtained by frame extraction into 8-bit YUV encoded frames before processing.
  • the YUV sequence file obtained after processing then also needs format conversion to obtain the HDR video.
  • the embodiment of the present application provides a fully convolutional neural network, and the SDR video frames obtained by using the frame extraction tool to extract frames can be directly input into the fully convolutional neural network for processing, and the obtained HDR video frames can also be directly combined to obtain an HDR video.
  • the method provided by this application does not need to convert the SDR video to be processed into data of other formats multiple times, nor does the neural network model need to convert data of other formats into HDR data multiple times, which reduces the complexity of the method.
  • an embodiment of the present application provides an HDR video conversion device.
  • the embodiment of the device corresponds to the embodiment of the method described above.
  • details that are the same as in the foregoing method embodiments are not described one by one in this device embodiment, but it should be clear that the device in this embodiment can correspondingly implement all the content of the foregoing method embodiments.
  • the present application provides an HDR video conversion device, and the above-mentioned device 200 includes:
  • the frame extraction unit 201 is used to perform frame extraction processing on the SDR video to be processed, so as to obtain J frames of SDR video frames contained in the SDR video, where J is an integer greater than 1;
  • the processing unit 202 is used to input J frames of SDR video frames into the trained full convolution model for processing, and output J frames of HDR video frames.
  • the full convolution model includes N convolutional layers with a convolution kernel size of 1×1, N-1 activation functions are interspersed among the N convolutional layers, and N is an integer greater than or equal to 3.
  • the frame merging unit 203 is configured to perform frame merging processing on the J frames of HDR video frames to obtain the HDR video corresponding to the SDR video.
  • the activation function is a non-linear activation function.
  • performing frame extraction processing on the SDR video to be processed to obtain the J frames of SDR video frames contained in the SDR video includes: using the FFmpeg tool to perform frame extraction processing on the SDR video to obtain the J frames of SDR video frames;
  • combining the J frames of HDR video frames to obtain the HDR video corresponding to the SDR video includes: using the FFmpeg tool to perform frame merging processing on the J frames of HDR video frames to obtain the HDR video corresponding to the SDR video.
  • the training methods of the full convolution model include:
  • the training set includes a plurality of SDR video frame samples and HDR video frame samples corresponding to the SDR video frame samples.
  • the preset loss function is used to describe the L2 loss between the predicted HDR video frame and the HDR video frame sample, and the predicted HDR video frame is obtained by processing the SDR video frame sample by the full convolution model.
  • FIG. 5 is a schematic diagram of a terminal device provided in an embodiment of the present application.
  • the terminal device 300 provided in this embodiment includes a memory 302 and a processor 301; the memory 302 is used to store a computer program, and the processor 301 is used to execute the methods described in the above method embodiments when the computer program is invoked, for example, steps S101 to S103 shown in FIG. 2.
  • when the processor 301 executes the computer program, it realizes the functions of the modules/units in the above-mentioned device embodiments, for example, the functions of units 201 to 203 shown in FIG. 4.
  • the computer program may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 302 and executed by the processor 301 to complete this application.
  • the one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program in the terminal device.
  • FIG. 5 is only an example of a terminal device and does not constitute a limitation on the terminal device; it may include more or fewer components than shown in the figure, combine certain components, or use different components.
  • for example, the terminal device may also include input and output devices, network access devices, a bus, and the like.
  • the processor 301 can be a central processing unit (Central Processing Unit, CPU), and can also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 302 may be an internal storage unit of the terminal device, for example, a hard disk or memory of the terminal device.
  • the memory 302 may also be an external storage device of the terminal device, such as a plug-in hard disk equipped on the terminal device, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, Flash card (Flash Card), etc. Further, the memory 302 may also include both an internal storage unit of the terminal device and an external storage device.
  • the memory 302 is used to store the computer program and other programs and data required by the terminal device.
  • the memory 302 can also be used to temporarily store data that has been output or will be output.
  • the terminal device provided in this embodiment can execute the foregoing method embodiment, and its implementation principle and technical effect are similar, and details are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method described in the foregoing method embodiment is implemented.
  • the embodiment of the present application further provides a computer program product, which, when the computer program product runs on a terminal device, enables the terminal device to implement the method described in the foregoing method embodiments when executed.
  • An embodiment of the present application further provides a chip system, including a processor, the processor is coupled to a memory, and the processor executes a computer program stored in the memory, so as to implement the method described in the above method embodiment.
  • the chip system may be a single chip, or a chip module composed of multiple chips.
  • if the above integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the procedures in the methods of the above embodiments can be completed by instructing related hardware through computer programs, and the computer programs can be stored in a computer-readable storage medium.
  • the computer program When executed by a processor, the steps in the above-mentioned various method embodiments can be realized.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable storage medium may at least include: any entity or device capable of carrying computer program code to a photographing device/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
  • in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
  • references to "one embodiment” or “some embodiments” or the like in this application means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically stated otherwise.
  • the terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless specifically stated otherwise.
  • the terms "first" and "second" are used for description purposes only and cannot be interpreted as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. It should also be understood that the term "and/or" used in the description of the present application and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
  • the terms "connection" and "connected" should be understood in a broad sense; for example, a connection may be mechanical or electrical, and may be direct or indirect through an intermediate medium; it may also be internal communication between two elements or an interaction relationship between two elements. Unless otherwise clearly defined, those of ordinary skill in the art can understand the specific meanings of the above terms in this application according to the specific situation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The present application relates to the technical field of video processing, and provides an HDR video conversion method and apparatus, and a device and a computer storage medium. The method comprises: performing frame extraction processing on an SDR video to be processed, so as to obtain J SDR video frames included in the SDR video, wherein J is an integer greater than 1; inputting the J SDR video frames into a trained fully convolutional model for processing, and outputting J HDR video frames, wherein the fully convolutional model comprises N convolutional layers with a convolution kernel size being 1×1, N-1 activation functions are alternately arranged in the N convolutional layers, and N is an integer greater than or equal to 3; and performing frame combination processing on the J HDR video frames to obtain an HDR video corresponding to the SDR video. The model used in the present application is simple in structure and has a relatively small amount of parameters, so that the calculation cost of an HDR video conversion task can be effectively reduced, the calculation efficiency is improved, and the speed of video processing is increased.

Description

HDR Video Conversion Method, Apparatus, Device, and Computer Storage Medium

Technical Field
The present application belongs to the technical field of video processing, and in particular relates to an HDR video conversion method, apparatus, device, and computer storage medium.
Background
As high dynamic range (HDR) capture and display technologies have matured, HDR video has gradually developed. Compared with traditional standard dynamic range (SDR) video, HDR video has a larger dynamic range and a wider color gamut, and can present higher contrast and richer colors. For this reason, many consumer-grade ultra-high-definition (UHD) televisions now provide an HDR video playback function. However, capture devices that can shoot video truly conforming to HDR standards are not yet widespread, so how to convert the large volume of existing SDR video into HDR video has become a topic of great current interest.
At present, SDR video is generally converted to HDR video using neural-network-based methods or HDR image conversion algorithms. However, the neural network models proposed so far are highly complex and computationally expensive. HDR image conversion methods must first convert each video frame from a video encoding format into an image encoding format, perform HDR conversion on the image-format data, and then convert the result back into a video encoding format to obtain the HDR video; this processing pipeline is also relatively complicated.
Summary
Embodiments of the present application provide an HDR video conversion method, apparatus, terminal device, and storage medium, which can solve the problems of method complexity and high computational cost in the HDR video conversion process.
In a first aspect, an embodiment of the present application provides an HDR video conversion method, the method comprising: performing frame extraction on an SDR video to be processed to obtain J SDR video frames contained in the SDR video, where J is an integer greater than 1;
inputting each of the J SDR video frames into a trained fully convolutional model for processing and outputting J HDR video frames, where the fully convolutional model comprises N convolutional layers with a 1×1 kernel size, N-1 activation functions are interspersed among the N convolutional layers, and N is an integer greater than or equal to 3; and
combining the J HDR video frames to obtain an HDR video corresponding to the SDR video.
In the HDR video conversion method provided by the present application, the fully convolutional model provided herein is used to carry out the HDR video conversion task. Because the fully convolutional model consists of N convolutional layers with 1×1 kernels and N-1 interspersed activation functions, the model structure is simple and uses relatively few parameters, which can effectively reduce the computational cost of the HDR video conversion task, improve computational efficiency, and speed up video processing.
Optionally, 3 ≤ N ≤ 10.
Optionally, the activation function is a nonlinear activation function.
Optionally, performing frame extraction on the SDR video to be processed to obtain the J SDR video frames contained in the SDR video comprises: using a frame extraction tool to extract frames from the SDR video to obtain the J SDR video frames.
Combining the J HDR video frames to obtain the HDR video corresponding to the SDR video comprises:
using the frame extraction tool to combine the J HDR video frames to obtain the HDR video corresponding to the SDR video.
Optionally, the training of the fully convolutional model comprises:
iteratively training an initial fully convolutional model with a preset training set and a preset loss function to obtain the fully convolutional model;
where the training set comprises a plurality of SDR video frame samples and HDR video frame samples corresponding to the SDR video frame samples.
The preset loss function is used to describe the L2 loss between a predicted HDR video frame and the corresponding HDR video frame sample, where the predicted HDR video frame is obtained by processing the SDR video frame sample with the fully convolutional model.
In a second aspect, an embodiment of the present application provides an HDR video conversion apparatus, the apparatus comprising: a frame extraction unit configured to perform frame extraction on an SDR video to be processed to obtain J SDR video frames contained in the SDR video, where J is an integer greater than 1;
a processing unit configured to input each of the J SDR video frames into a trained fully convolutional model for processing and to output J HDR video frames, where the fully convolutional model comprises N convolutional layers with a 1×1 kernel size, N-1 activation functions are interspersed among the N convolutional layers, and N is a positive integer greater than or equal to 3; and
a frame combining unit configured to combine the J HDR video frames to obtain an HDR video corresponding to the SDR video.
Optionally, 3 ≤ N ≤ 10.
Optionally, the activation function is a nonlinear activation function.
In a third aspect, an embodiment of the present application provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method of any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the method of any one of the first aspect.
It can be understood that, for the beneficial effects of the second through fifth aspects, reference may be made to the relevant description of the first aspect and its possible implementations; details are not repeated here.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of the HDR and SDR color gamut ranges provided by an embodiment of the present application;
FIG. 2 is a flowchart of an embodiment of an HDR video conversion method provided by an embodiment of the present application;
FIG. 3 is an architecture diagram of a fully convolutional model for HDR video conversion provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an HDR video conversion apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
Compared with standard dynamic range (SDR), high dynamic range (HDR) has a larger dynamic range and a wider color gamut. Because of this, HDR video can present content with higher contrast and richer colors.
FIG. 1 is a schematic diagram of the HDR and SDR color gamut ranges. BT.709 and BT.2020 are television parameter standards issued by the ITU (International Telecommunication Union), while DCI-P3 is a color gamut standard established by the American film industry for digital cinema and is often used to test the color range a projector can cover. Among the gamuts shown in FIG. 1, BT.2020 covers the largest range, DCI-P3 the next largest, and BT.709 the smallest.
SDR video usually uses the BT.709 color gamut, whereas HDR video uses the wider BT.2020 gamut; in practice, HDR video may also use the DCI-P3 gamut. For the same video content, whether the HDR version uses BT.2020 or DCI-P3, it exhibits better contrast and color than the SDR version.
In addition, in terms of encoding format, SDR video usually uses 8-bit encoding, while HDR video uses 10-bit or 16-bit encoding; the more bits the encoding format uses, the higher the contrast and the wider the color gamut the video can present.
As HDR video capture and display technologies gradually mature, more and more playback devices support HDR video playback. Therefore, how to convert SDR video to HDR video has become a topic of great current interest.
At present, neural-network-based methods or HDR image conversion algorithms are used to convert SDR video to HDR video, but the neural network models proposed so far are highly complex and computationally expensive. For example, generative adversarial networks are often used for SDR-to-HDR video conversion; such networks use 1.06M parameters (where M abbreviates the order of magnitude "million"), and some use as many as 2.87M. The more parameters a network has, the more complex its design and the greater its computational load. HDR image conversion methods, in turn, must first convert each video frame from a video encoding format into an image encoding format, perform HDR conversion on the image-format data, and then convert the result back into a video encoding format to obtain the HDR video; this processing is also relatively complicated.
To address the complexity and high computational cost of existing HDR video conversion methods, this application provides a fully convolutional model that can carry out the HDR video conversion task. The fully convolutional model consists of N convolutional layers with 1×1 kernels and N-1 interspersed activation functions; the model structure is simple and uses relatively few parameters, which effectively reduces the computational cost of the HDR video conversion task, improves computational efficiency, and speeds up video processing.
Generally, convolutional layers with 1×1 kernels are used in complex neural network models that implement a specific function to raise or lower dimensionality, that is, to increase or decrease the number of channels of a feature map, thereby improving the model's computational efficiency. In this application, however, experiments show that a fully convolutional model built by simply stacking 1×1 convolutional layers and activation functions can carry out the HDR video conversion task and achieve good conversion results.
The technical solutions of the present application are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
Referring to FIG. 2, which is a flowchart of an embodiment of an HDR video conversion method provided by the present application. The method may be executed by a video processing device, which may be a mobile terminal such as a smartphone, tablet, or camera, or a terminal device capable of processing video data such as a desktop computer, robot, or server. As shown in FIG. 2, the method includes:
S101: Perform frame extraction on the SDR video to be processed to obtain the J SDR video frames contained in the SDR video, where J is an integer greater than 1.
The SDR video to be processed may be a complete video that is captured, downloaded, or read from a local storage area, or an SDR video clip cut from a complete video.
For example, a frame extraction tool may be used to extract frames from the SDR video to be processed, such as the FFmpeg (Fast Forward MPEG) tool.
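As a concrete illustration, the frame extraction step can be scripted around the FFmpeg command line. The sketch below only builds the command; the input filename and output pattern are hypothetical examples, and running it requires FFmpeg to be installed on the system.

```python
def build_extract_cmd(sdr_video: str, out_dir: str) -> list:
    """FFmpeg command that dumps every frame of the SDR video as a
    numbered PNG file (out_dir/frame_00001.png, frame_00002.png, ...)."""
    return ["ffmpeg", "-i", sdr_video, f"{out_dir}/frame_%05d.png"]

cmd = build_extract_cmd("input_sdr.mp4", "frames")
# To actually run the extraction (requires FFmpeg on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```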
S102: Input each of the J SDR video frames into the trained fully convolutional model for processing, and output J HDR video frames. The fully convolutional model includes N convolutional layers with a 1×1 kernel size; N-1 activation functions are interspersed among the N convolutional layers, and N is an integer greater than or equal to 3.
The activation function may be a nonlinear activation function, which increases the nonlinear fitting capacity of the fully convolutional model and improves its flexibility. For example, the activation function may be the ReLU activation function.
N may be set according to the actual accuracy requirements.
For example, with N = 3, as shown in FIG. 3, the fully convolutional model includes three convolutional layers with 1×1 kernels and two ReLU activation functions interspersed among them. Inputting each of the J SDR video frames into the fully convolutional model shown in FIG. 3 yields the corresponding J HDR video frames.
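Because every kernel is 1×1, each layer acts on each pixel independently, so the whole model is a small per-pixel network applied at every spatial location. The NumPy sketch below mirrors the N = 3 structure (conv, ReLU, conv, ReLU, conv); the 64-channel hidden width and the random weights are assumptions for illustration only, not values stated in the application.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w, b):
    """A 1x1 convolution on an (H, W, C_in) tensor is just a per-pixel
    linear map: (H, W, C_in) @ (C_in, C_out) + bias."""
    return x @ w + b

rng = np.random.default_rng(0)
C = 64  # hidden channel width (assumed; not specified in the application)

# Three 1x1 conv layers: 3 -> C -> C -> 3 channels.
w1, b1 = rng.standard_normal((3, C)) * 0.1, np.zeros(C)
w2, b2 = rng.standard_normal((C, C)) * 0.1, np.zeros(C)
w3, b3 = rng.standard_normal((C, 3)) * 0.1, np.zeros(3)

def model(sdr_frame):
    """Map an (H, W, 3) SDR frame to an (H, W, 3) HDR frame."""
    h = relu(conv1x1(sdr_frame, w1, b1))
    h = relu(conv1x1(h, w2, b2))
    return conv1x1(h, w3, b3)

frame = rng.random((4, 6, 3))      # a tiny stand-in for an SDR frame
out = model(frame)
assert out.shape == frame.shape    # spatial size and channel count preserved
```

Since the map is purely per-pixel, the same weights apply to a frame of any resolution, which is what makes the model cheap to run on full video frames.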
The performance of the fully convolutional model shown in FIG. 3 is described below with reference to Table 1:
Table 1

| Model | Params | PSNR | SSIM | SR-SIM | ΔE_ITP | HDR-VDP3 |
|---|---|---|---|---|---|---|
| ResNet | 1.37M | 37.32 | 0.9720 | 0.9950 | 9.02 | 8.391 |
| Pixel2Pixel | 11.38M | 25.80 | 0.8777 | 0.9871 | 44.25 | 7.136 |
| CycleGAN | 11.38M | 21.33 | 0.8496 | 0.9595 | 77.74 | 6.941 |
| HDRNet | 482K | 35.73 | 0.9664 | 0.9957 | 11.52 | 8.462 |
| CSRNet | 36K | 35.04 | 0.9625 | 0.9955 | 14.28 | 8.400 |
| Ada-3DLUT | 594K | 36.22 | 0.9658 | 0.9967 | 10.89 | 8.423 |
| Deep SR-ITM | 2.87M | 37.10 | 0.9686 | 0.9950 | 9.24 | 8.233 |
| JSI-GAN | 1.06M | 37.01 | 0.9694 | 0.9928 | 9.36 | 8.169 |
| Fully convolutional model | 5K | 36.14 | 0.9643 | 0.9961 | 10.43 | 8.035 |
In Table 1, ResNet (residual network), CycleGAN (cycle-consistent generative adversarial network), and Pixel2Pixel (pixel-to-pixel generation network) are algorithm models for image-to-image translation. HDRNet (High Dynamic Range Net), CSRNet (Conditional Sequential Retouching Network), and the Ada-3DLUT (adaptive 3D lookup table) network are algorithm models for photo retouching. Deep SR-ITM (deep super-resolution inverse tone-mapping) and JSI-GAN (GAN-based joint super-resolution and inverse tone-mapping) are algorithm models for SDR-to-HDR video conversion.
As can be seen from Table 1, when N = 3 and the ReLU activation function is used, the fully convolutional network model provided by this application has fewer than 5K parameters (where K abbreviates the order of magnitude "thousand"), far fewer than the other algorithm models listed in Table 1. It therefore offers highly efficient processing.
Moreover, the fully convolutional network model also achieves good experimental results on performance metrics such as peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM), the spectral-residual-based similarity index measure (SR-SIM), the color fidelity metric ΔE_ITP, and the High Dynamic Range Visible Difference Predictor (HDR-VDP3).
Of course, N may take any value from 3 to 10. For example, when N = 5, the fully convolutional model has fewer than 13K parameters, which is still an advantage in parameter count over the algorithm models listed in Table 1, so it remains an efficient model. When N = 5, the fully convolutional model achieves PSNR = 36.15, SSIM = 0.9642, SR-SIM = 0.9963, ΔE_ITP = 10.43, and HDR-VDP = 38.032, showing good experimental results on the other performance metrics as well.
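The quoted orders of magnitude can be sanity-checked with a short calculation: each 1×1 convolutional layer contributes in_channels × out_channels weights plus one bias per output channel. The hidden width of 64 channels below is an assumption chosen to reproduce the quoted counts, not a value stated in the application.

```python
def conv1x1_params(c_in, c_out):
    # weights (c_in * c_out) plus one bias per output channel
    return c_in * c_out + c_out

def model_params(n_layers, hidden=64, io_channels=3):
    """Total parameters of n_layers stacked 1x1 convolutions laid out as
    io_channels -> hidden -> ... -> hidden -> io_channels."""
    widths = [io_channels] + [hidden] * (n_layers - 1) + [io_channels]
    return sum(conv1x1_params(a, b) for a, b in zip(widths, widths[1:]))
```

Under these assumptions, model_params(3) gives 4,611 (under 5K) and model_params(5) gives 12,931 (under 13K), consistent with the parameter counts reported above.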
It should be noted that in this embodiment the fully convolutional model is trained as follows: an initial fully convolutional model is iteratively trained with a preset training set and a preset loss function to obtain the above fully convolutional model, where the training set includes a plurality of SDR video frame samples and the HDR video frame samples corresponding to them. For example, SDR video samples and their corresponding HDR video samples may be obtained from public video websites; alternatively, the same RAW-format video may be processed separately into SDR and HDR versions to obtain SDR video samples and their corresponding HDR video samples; or an SDR camera and an HDR camera may be used to shoot corresponding SDR and HDR video samples of the same scene.
After the SDR video samples and their corresponding HDR video samples are obtained, frames are extracted from each, yielding a plurality of SDR video frame samples and HDR video frame samples that correspond to them one-to-one in both time and space.
In this embodiment, the L2 loss is used as the preset loss function for training the fully convolutional model. The preset loss function describes the loss between a predicted HDR video frame and the corresponding HDR video frame sample, where the predicted HDR video frame is obtained by processing the SDR video frame sample with the fully convolutional model.
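The L2 training loss described above is the mean squared error between the model's predicted HDR frame and the ground-truth HDR frame sample. A minimal NumPy version:

```python
import numpy as np

def l2_loss(predicted_hdr, hdr_sample):
    """Mean squared (L2) error between a predicted HDR frame and the
    corresponding ground-truth HDR frame sample, both shaped (H, W, 3)."""
    diff = predicted_hdr.astype(np.float64) - hdr_sample.astype(np.float64)
    return float(np.mean(diff ** 2))

# Identical frames give zero loss; any deviation increases it.
target = np.ones((2, 2, 3))
assert l2_loss(target, target) == 0.0
assert l2_loss(target + 0.5, target) == 0.25
```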
S103: Combine the J HDR video frames to obtain the HDR video corresponding to the SDR video.
It should be noted that the J HDR video frames obtained from the above fully convolutional model are HDR-format image data corresponding to the J SDR-format frames extracted from the SDR video to be processed. Because HDR video uses 16-bit encoding to accommodate higher contrast and a wider color gamut, the J HDR video frames output by the fully convolutional model are 16-bit- or 10-bit-encoded image data; compared with 8-bit-encoded image data, 16-bit- or 10-bit-encoded image data can display brighter content.
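The effect of encoding bit depth can be quantified directly: a b-bit channel can distinguish 2^b code values, which is why 10- or 16-bit HDR frames can represent far finer gradations than 8-bit SDR frames.

```python
def levels_per_channel(bits):
    """Distinct code values a channel with the given bit depth can represent."""
    return 2 ** bits

assert levels_per_channel(8) == 256        # SDR
assert levels_per_channel(10) == 1024      # 10-bit HDR
assert levels_per_channel(16) == 65536     # 16-bit HDR
```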
It should be understood that a frame extraction tool may be used to combine the J HDR video frames obtained from the HDR video conversion model; for example, the FFmpeg tool may be used to combine the J HDR video frames.
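As with extraction, the frame-combining step can be scripted around the FFmpeg command line. The sketch below only builds the command; the frame-rate value, frame-name pattern, and output filename are hypothetical, and writing a true HDR bitstream would in practice also require color-space and transfer-characteristic options not shown here.

```python
def build_merge_cmd(frame_pattern: str, fps: int, hdr_video: str) -> list:
    """FFmpeg command that reassembles numbered HDR frames
    (e.g. hdr_%05d.png) into a video at the given frame rate."""
    return ["ffmpeg", "-framerate", str(fps), "-i", frame_pattern, hdr_video]

cmd = build_merge_cmd("hdr_%05d.png", 25, "output_hdr.mp4")
# Run with: import subprocess; subprocess.run(cmd, check=True)
```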
When performing an HDR video conversion task, existing models often require that, after frame extraction, the extracted SDR video frames first be converted into 8-bit YUV-encoded frames before processing; the resulting YUV sequence files must then undergo another format conversion to obtain the HDR video.
By contrast, with the fully convolutional neural network provided by the embodiments of this application, the SDR video frames obtained by the frame extraction tool can be fed directly into the network for processing, and the resulting HDR video frames can be combined directly into the HDR video. Unlike existing models, the method provided by this application does not need to repeatedly convert the SDR video to be processed into data of other formats; correspondingly, after processing by the neural network model, it does not need to repeatedly convert data of other formats into HDR data, which reduces the complexity of the method.
Based on the same inventive concept, as an implementation of the above method, an embodiment of the present application provides an HDR video conversion apparatus. This apparatus embodiment corresponds to the foregoing method embodiment; for ease of reading, details already given in the method embodiment are not repeated one by one here, but it should be clear that the apparatus in this embodiment can correspondingly implement all the content of the foregoing method embodiment.
As shown in FIG. 4, the present application provides an HDR video conversion apparatus 200, which includes:
a frame extraction unit 201 configured to perform frame extraction on the SDR video to be processed to obtain the J SDR video frames contained in the SDR video, where J is an integer greater than 1;
a processing unit 202 configured to input each of the J SDR video frames into the trained fully convolutional model for processing and to output J HDR video frames, where the fully convolutional model includes N convolutional layers with a 1×1 kernel size, N-1 activation functions are interspersed among the N convolutional layers, and N is a positive integer greater than or equal to 3; and
a frame combining unit 203 configured to combine the J HDR video frames to obtain the HDR video corresponding to the SDR video.
Optionally, 3 ≤ N ≤ 10.
Optionally, the activation function is a nonlinear activation function.
Optionally, performing frame extraction on the SDR video to be processed to obtain the J SDR video frames contained in the SDR video includes: using the FFmpeg tool to extract frames from the SDR video to obtain the J SDR video frames;
and combining the J HDR video frames to obtain the HDR video corresponding to the SDR video includes: using the FFmpeg tool to combine the J HDR video frames to obtain the HDR video corresponding to the SDR video.
Optionally, the training of the fully convolutional model includes:
iteratively training an initial fully convolutional model with a preset training set and a preset loss function to obtain the fully convolutional model;
where the training set includes a plurality of SDR video frame samples and the HDR video frame samples corresponding to the SDR video frame samples.
The preset loss function describes the L2 loss between a predicted HDR video frame and the corresponding HDR video frame sample, where the predicted HDR video frame is obtained by processing the SDR video frame sample with the fully convolutional model.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used for illustration. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or of software functional units. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of this application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Based on the same inventive concept, an embodiment of the present application further provides a terminal device. FIG. 5 is a schematic diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 5, the terminal device 300 provided by this embodiment includes a memory 302 and a processor 301. The memory 302 is configured to store a computer program; the processor 301 is configured to execute, when invoking the computer program, the methods described in the above method embodiments, for example, steps S101 to S103 shown in FIG. 2. Alternatively, when executing the computer program, the processor 301 implements the functions of the modules/units in the above apparatus embodiments, for example, the functions of units 201 to 203 shown in FIG. 4.
Exemplarily, the computer program may be divided into one or more modules/units, which are stored in the memory 302 and executed by the processor 301 to complete this application. The one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, the instruction segments being used to describe the execution process of the computer program in the terminal device.
Those skilled in the art will understand that FIG. 5 is merely an example of a terminal device and does not constitute a limitation on the terminal device, which may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device may further include input/output devices, network access devices, a bus, and the like.
The processor 301 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 302 may be an internal storage unit of the terminal device, such as a hard disk or memory of the terminal device. The memory 302 may also be an external storage device of the terminal device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal device. Further, the memory 302 may include both an internal storage unit of the terminal device and an external storage device. The memory 302 is configured to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been or is to be output.
The terminal device provided by this embodiment can execute the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the methods described in the above method embodiments are implemented.
An embodiment of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device, when executing it, to implement the methods described in the above method embodiments.
An embodiment of the present application further provides a chip system, including a processor coupled to a memory; the processor executes a computer program stored in the memory to implement the methods described in the above method embodiments. The chip system may be a single chip or a chip module composed of multiple chips.
If the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, may implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable storage medium may include at least: any entity or apparatus capable of carrying the computer program code to a photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
References in this application to "one embodiment", "some embodiments", and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of this application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in various places in this specification do not necessarily all refer to the same embodiment, but rather mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having", and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
In the description of this application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. It should also be understood that the term "and/or" used in the specification and appended claims of this application refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
In addition, in this application, unless otherwise expressly specified and limited, the terms "connected", "coupled", and the like should be understood broadly; for example, a connection may be mechanical or electrical, direct or indirect through an intermediate medium, and may be an internal communication between two elements or an interaction between two elements. Unless otherwise expressly limited, those of ordinary skill in the art can understand the specific meanings of the above terms in this application according to the specific circumstances.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some or all of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (10)

  1. An HDR video conversion method, characterized by comprising:
    performing frame extraction on an SDR video to be processed to obtain J SDR video frames contained in the SDR video, J being an integer greater than 1;
    inputting the J SDR video frames into a trained fully convolutional model for processing and outputting J HDR video frames, the fully convolutional model comprising N convolutional layers with 1×1 convolution kernels, with N−1 activation functions interspersed among the N convolutional layers, N being an integer greater than or equal to 3;
    combining the J HDR video frames to obtain an HDR video corresponding to the SDR video.
  2. The method according to claim 1, characterized in that 3≤N≤10.
  3. The method according to claim 1, characterized in that the activation functions are nonlinear activation functions.
  4. The method according to claim 1, characterized in that performing frame extraction on the SDR video to be processed to obtain the J SDR video frames contained in the SDR video comprises:
    performing frame extraction on the SDR video with a frame extraction tool to obtain the J SDR video frames;
    and that combining the J HDR video frames to obtain the HDR video corresponding to the SDR video comprises:
    performing frame combination on the J HDR video frames with the frame extraction tool to obtain the HDR video corresponding to the SDR video.
  5. The method according to any one of claims 1-3, characterized in that the training of the fully convolutional model comprises:
    iteratively training an initial fully convolutional model with a preset training set and a preset loss function to obtain the fully convolutional model;
    the training set comprising a plurality of SDR video frame samples and HDR video frame samples corresponding to the SDR video frame samples;
    the preset loss function describing the L2 loss between a predicted HDR video frame and the corresponding HDR video frame sample, the predicted HDR video frame being obtained by processing the SDR video frame sample with the fully convolutional model.
  6. An HDR video conversion apparatus, characterized by comprising:
    a frame extraction unit configured to perform frame extraction on an SDR video to be processed to obtain J SDR video frames contained in the SDR video, J being an integer greater than 1;
    a processing unit configured to input the J SDR video frames into a trained fully convolutional model for processing and to output J HDR video frames, the fully convolutional model comprising N convolutional layers with 1×1 convolution kernels, with N−1 activation functions interspersed among the N convolutional layers, N being an integer greater than or equal to 3;
    a frame combination unit configured to combine the J HDR video frames to obtain an HDR video corresponding to the SDR video.
  7. The apparatus according to claim 6, characterized in that 3≤N≤10.
  8. The apparatus according to claim 6, characterized in that the activation functions are nonlinear activation functions.
  9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the method according to any one of claims 1 to 5.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
PCT/CN2021/137979 2021-08-02 2021-12-14 Hdr video conversion method and apparatus, and device and computer storage medium WO2023010749A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110883118.3A CN113784175B (en) 2021-08-02 2021-08-02 HDR video conversion method, device, equipment and computer storage medium
CN202110883118.3 2021-08-02

Publications (1)

Publication Number Publication Date
WO2023010749A1 true WO2023010749A1 (en) 2023-02-09

Family

ID=78836564

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137979 WO2023010749A1 (en) 2021-08-02 2021-12-14 Hdr video conversion method and apparatus, and device and computer storage medium

Country Status (2)

Country Link
CN (1) CN113784175B (en)
WO (1) WO2023010749A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113784175B (en) * 2021-08-02 2023-02-28 中国科学院深圳先进技术研究院 HDR video conversion method, device, equipment and computer storage medium
CN113781319A (en) * 2021-08-02 2021-12-10 中国科学院深圳先进技术研究院 HDR video conversion method, device, equipment and computer storage medium
CN113781322A (en) * 2021-08-02 2021-12-10 中国科学院深圳先进技术研究院 Color gamut mapping method and device, terminal equipment and storage medium
CN114422718B (en) * 2022-01-19 2022-12-13 北京百度网讯科技有限公司 Video conversion method and device, electronic equipment and storage medium
CN116704926A (en) * 2022-02-28 2023-09-05 荣耀终端有限公司 Frame data display method, electronic device and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN108681991A (en) * 2018-04-04 2018-10-19 上海交通大学 Based on the high dynamic range negative tone mapping method and system for generating confrontation network
US20190080440A1 (en) * 2017-09-08 2019-03-14 Interdigital Vc Holdings, Inc. Apparatus and method to convert image data
CN111145097A (en) * 2019-12-31 2020-05-12 华为技术有限公司 Image processing method, device and image processing system
CN112102166A (en) * 2020-08-26 2020-12-18 上海交通大学 Method and device for combining super-resolution, color gamut expansion and inverse tone mapping
CN112200719A (en) * 2020-09-27 2021-01-08 咪咕视讯科技有限公司 Image processing method, electronic device and readable storage medium
CN113784175A (en) * 2021-08-02 2021-12-10 中国科学院深圳先进技术研究院 HDR video conversion method, device, equipment and computer storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
WO2019112085A1 (en) * 2017-12-06 2019-06-13 Korea Advanced Institute Of Science And Technology Method and apparatus for inverse tone mapping
CN109447907B (en) * 2018-09-20 2020-06-16 宁波大学 Single image enhancement method based on full convolution neural network
US10997690B2 (en) * 2019-01-18 2021-05-04 Ramot At Tel-Aviv University Ltd. Method and system for end-to-end image processing
CN111292264B (en) * 2020-01-21 2023-04-21 武汉大学 Image high dynamic range reconstruction method based on deep learning


Also Published As

Publication number Publication date
CN113784175B (en) 2023-02-28
CN113784175A (en) 2021-12-10


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21952612

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE