WO2021109846A1 - Method and device for processing code stream data - Google Patents

Method and device for processing code stream data

Info

Publication number
WO2021109846A1
Authority
WO
WIPO (PCT)
Prior art keywords
code stream
neural network
training
network model
type information
Prior art date
Application number
PCT/CN2020/128960
Other languages
English (en)
French (fr)
Inventor
陈昱志
蒋忠林
陈尚松
王洋
任学亮
刘建
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021109846A1 publication Critical patent/WO2021109846A1/zh


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2347 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving video stream encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • This application relates to the field of image processing, and more specifically to a method and device for processing bitstream data, and a method and device for obtaining a neural network model.
  • This application provides a method and device for processing bitstream data, which can improve the image quality and resolution of video images under given bandwidth, delay, and connection-rate constraints.
  • a method for processing code stream data, including: obtaining a first code stream; determining a first neural network model according to a first correspondence and type information of the first code stream, where the first neural network model is the neural network model corresponding to the first code stream, and the first correspondence indicates the neural network models corresponding to different types of code streams; parsing the first code stream to obtain first image data; and processing the first image data with the first neural network model to obtain a processed image.
  • For each code stream, the neural network model corresponding to its type is used for processing, so the processing is more targeted and the resulting image quality is closer to that of the standard image, approaching optimal image-quality processing.
  • the type information of the code stream includes the provider of the code stream and the media metadata of the code stream.
  • the media metadata includes any one or more of the film subject matter, the film type, the film scene, the color gamut of the code stream, the resolution of the code stream, or the bit rate of the code stream.
  • the code stream is classified according to the type information of the code stream to obtain a neural network model that matches the type information of the code stream.
  • the first neural network model is obtained by training on a training code stream having the same type information as the first code stream.
  • Because the neural network model used to process a code stream is trained on a training code stream with the same type information, the resulting model closely matches the corresponding code stream and therefore performs well in image processing.
  • the first correspondence includes the correspondence between code stream type information and neural network models; determining the first neural network model according to the first correspondence and the type information of the first code stream specifically includes: determining the supplier of the first code stream and the media metadata of the first code stream, and determining, in the first correspondence, the neural network model corresponding to both the supplier of the first code stream and the media metadata of the first code stream as the first neural network model.
  • In this way, the code stream and the neural network model are matched according to the first correspondence, so the neural network model corresponding to the first code stream can be obtained quickly.
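  • As an illustration only (not part of the application), the first correspondence can be thought of as a lookup table keyed by the type information; the sketch below is a minimal Python version, in which StreamTypeInfo, MODEL_TABLE, and select_model are hypothetical names and the table entries are placeholders.

```python
# Minimal sketch of the "first correspondence": type information -> model.
# StreamTypeInfo, MODEL_TABLE, and select_model are hypothetical names;
# the string values stand in for trained neural network models.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen -> hashable, usable as a dict key
class StreamTypeInfo:
    supplier: str      # code stream provider
    subject: str       # film subject matter
    video_type: str    # film type
    scene: str         # film scene
    gamut: str         # color gamut of the code stream
    resolution: str    # resolution class ("low"/"medium"/"high")
    bitrate: str       # bit rate class ("low"/"medium"/"high")

MODEL_TABLE = {
    StreamTypeInfo("iQIYI", "comedy", "movie", "indoor", "BT601", "medium", "low"): "model_a",
    StreamTypeInfo("iQIYI", "comedy", "movie", "indoor", "BT709", "low", "medium"): "model_b",
}

def select_model(info: StreamTypeInfo):
    """Return the neural network model matching the code stream's type info."""
    return MODEL_TABLE.get(info)  # None if no model was trained for this type
```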
  • a method for obtaining a neural network model, including: obtaining multiple sets of training data, where each set includes a standard code stream and a training code stream, the training code streams in different sets correspond to different type information, and each training code stream is obtained by preprocessing the corresponding standard code stream; training multiple initial neural network models separately according to the multiple sets of training data to obtain multiple neural network models, where one set of training data yields one neural network model; and storing the multiple neural network models in correspondence with the type information of the training data used to train them.
  • the multiple sets of training data include a first set of training data, where the first set includes a first standard code stream and a first training code stream, and the type information of the first training code stream is first type information; training the multiple initial neural network models separately according to the multiple sets of training data includes: training an initial neural network model according to the first standard code stream and the first training code stream, and obtaining the first neural network model when the difference between the first training code stream processed by the model and the first standard code stream meets a preset condition; and storing the models in correspondence with the type information includes: storing the first neural network model in correspondence with the first type information.
  • The images of different code stream providers, and the preprocessed results of different images, differ. Attempting to obtain neural network models for all these images with a single algorithm and training method makes the algorithm too complex to converge, or makes its computing-power requirements too high.
  • The solution provided in this application trains different types of code streams from different code stream providers separately, which avoids the convergence problem and saves training time. Multiple neural network models adapted to different code stream providers are obtained, and the corresponding model is then selected adaptively according to the code stream type information, which is also more effective at improving the image quality of a specific image.
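  • A minimal sketch of this per-type training strategy, under the assumption that training data arrives as (type information, standard stream, training stream) triples; group_by_type and train_per_type are illustrative helpers, and train_one_model stands for whatever training routine is used (a sketch of one appears later in this document).

```python
# Sketch: train one model per type-information group rather than a single
# global model over all providers' streams.
from collections import defaultdict

def group_by_type(training_triples):
    """training_triples: iterable of (type_info, standard_stream, training_stream)."""
    groups = defaultdict(list)
    for type_info, standard, training in training_triples:
        groups[type_info].append((training, standard))
    return groups

def train_per_type(training_triples, train_one_model):
    """Return the first correspondence: type_info -> trained model."""
    return {
        type_info: train_one_model(pairs)
        for type_info, pairs in group_by_type(training_triples).items()
    }
```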
  • the type information includes the supplier of the training code stream and the media metadata of the training code stream.
  • the media metadata includes any one or more of movie subject matter, movie type, movie scene, bitstream color gamut, bitstream resolution, or bit rate.
  • Neural network models are trained separately for code streams with different type information. The more detailed the type information, the more targeted the trained neural network model and the better the corresponding image-processing effect.
  • the method further includes: transmitting the neural network model and its corresponding type information to the code stream provider indicated in the type information.
  • The trained neural network model and its corresponding type information can be stored in the terminal device; when the media player of the corresponding code stream provider is downloaded on the terminal device, the stored neural network model can be invoked.
  • The trained neural network model and its corresponding type information can also be transmitted to the code stream provider; in that case, when the media player of that provider is downloaded on the terminal device, the terminal device can also download the corresponding neural network model parameters to establish a neural network model for processing the code streams provided by that provider, as illustrated by the sketch below.
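  • As a hypothetical illustration of storing models together with their type information for later download, the sketch below writes a simple JSON manifest; the layout and names (export_manifest, the ".bin" weight files) are assumptions, not anything specified by the application.

```python
# Hypothetical manifest: each entry pairs a model's type information with
# the file holding its parameters, so a terminal can download the match.
import json

def export_manifest(correspondence, path):
    """correspondence: {type_info (tuple of (field, value) pairs): model name}."""
    manifest = [
        {"type_info": dict(type_info), "weights": model_name + ".bin"}
        for type_info, model_name in correspondence.items()
    ]
    with open(path, "w") as f:
        json.dump(manifest, f, ensure_ascii=False, indent=2)

# Example usage with a single illustrative entry:
corr = {(("supplier", "Tencent"), ("subject", "comedy")): "model_tencent_comedy"}
export_manifest(corr, "manifest.json")
```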
  • a device for processing code stream data including: an obtaining unit, configured to obtain a first code stream;
  • a processing unit configured to determine a first neural network model according to a first correspondence and the type information of the first code stream, where the first neural network model is the neural network model corresponding to the first code stream and the first correspondence indicates the neural network models corresponding to different types of code streams; the processing unit is further configured to parse the first code stream to obtain first image data, and to process the first image data with the first neural network model to obtain a processed image.
  • the type information of the code stream includes the supplier of the code stream and the media metadata of the code stream.
  • the media metadata includes any one or more of the movie subject matter, the movie type, the movie scene, the color gamut of the code stream, the resolution of the code stream, or the bit rate of the code stream.
  • the first neural network model is obtained by training on a training code stream having the same type information as the first code stream.
  • the first correspondence includes the correspondence between code stream type information and neural network models, and the processing unit is specifically configured to: determine the supplier of the first code stream and the media metadata of the first code stream; and determine, in the first correspondence, the neural network model corresponding to both the supplier of the first code stream and the media metadata of the first code stream as the first neural network model.
  • a device for processing code stream data, including a processor and a transmission interface, where the transmission interface is used to receive or send data and the processor is configured to perform the method in the first aspect or any possible implementation of the first aspect.
  • a device for obtaining a neural network model, including: an obtaining unit configured to obtain multiple sets of training data, where each set includes a standard code stream and a training code stream, the training code streams in different sets correspond to different type information, and each training code stream is obtained by preprocessing the corresponding standard code stream; and a processing unit configured to train multiple initial neural network models based on the multiple sets of training data to obtain multiple neural network models, where one set of training data yields one neural network model, and further configured to store the multiple neural network models in correspondence with the type information of the training data used to train them.
  • the multiple sets of training data include a first set of training data, where the first set includes a first standard code stream and a first training code stream, and the type information of the first training code stream is first type information; the processing unit is specifically configured to: train an initial neural network model according to the first standard code stream and the first training code stream; obtain the first neural network model when the difference between the first training code stream processed by the model and the first standard code stream satisfies a preset condition; and store the first neural network model in correspondence with the first type information.
  • the type information includes the supplier of the training code stream and the media metadata of the training code stream.
  • the media metadata includes any one or more of the film subject matter, the film type, the film scene, the color gamut of the code stream, the resolution of the code stream, or the bit rate of the code stream.
  • the device further includes: a sending unit configured to transmit the neural network model and its corresponding type information to the code stream provider indicated in the type information.
  • an apparatus for obtaining a neural network model, including a processor and a transmission interface, where the transmission interface is used to receive or send data and the processor is configured to execute the method in the second aspect or any possible implementation of the second aspect.
  • a computer program product, including a computer program (also called code or instructions) that, when run on a computer, causes the computer to execute the method in the first aspect or any possible implementation of the first aspect, or in the second aspect or any possible implementation of the second aspect.
  • a computer-readable storage medium for storing a computer program, where the computer program includes instructions for executing the method in the first aspect or any possible implementation of the first aspect, or in the second aspect or any possible implementation of the second aspect.
  • Fig. 1 is a schematic block diagram of an encoding and decoding system provided by an embodiment of the present application.
  • Fig. 2 is a schematic flowchart of a method for obtaining a neural network model provided by an embodiment of the present application.
  • Fig. 3 is a schematic flowchart of an image processing process provided by an embodiment of the present application.
  • Fig. 4 is a schematic flowchart of a method for training a neural network model provided by an embodiment of the present application.
  • Fig. 5 is a schematic flowchart of a method for processing code stream data provided by an embodiment of the present application.
  • Fig. 6 is a schematic flowchart of a player using a neural network model to process a code stream according to an embodiment of the present application.
  • Fig. 7 is a schematic flowchart of updating a neural network model provided by an embodiment of the present application.
  • Fig. 8 is a schematic block diagram of a device for obtaining a neural network model provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an apparatus for processing code stream data provided by an embodiment of the present application.
  • FIG. 1 is a conceptual or schematic block diagram illustrating an exemplary encoding system 10, for example, a video encoding system 10 that can utilize the technology of the present application (this disclosure).
  • The encoding system 10 includes a source device 12 for providing encoded data 13, for example an encoded picture 13, to, for example, a destination device 14 that decodes the encoded data 13.
  • The source device 12 includes an encoder 20, and may additionally (optionally) include a picture source 16, a preprocessing unit 18 (for example, a picture preprocessing unit 18), and a communication interface or communication unit 22.
  • The picture source 16 may include or be any type of picture capture device, for example for capturing real-world pictures; any type of picture or comment generating device (for screen content encoding, some text on the screen is also considered part of the picture or image to be encoded), for example a computer graphics processor for generating computer-animated pictures; or any type of device for obtaining and/or providing real-world pictures or computer-animated pictures (for example, screen content or virtual reality (VR) pictures), and/or any combination thereof (for example, augmented reality (AR) pictures).
  • the (digital) picture is or can be regarded as a two-dimensional array or matrix of sampling points with brightness values.
  • the sampling points in the array may also be called pixels (short for picture element) or pels.
  • the number of sampling points of the array or picture in the horizontal and vertical directions (or axis) defines the size and/or resolution of the picture.
  • three color components are usually used, that is, pictures can be represented as or contain three sample arrays.
  • a picture includes corresponding red, green, and blue sample arrays.
  • Each pixel is usually expressed in a luminance/chrominance format or color space, for example YCbCr, which includes the luminance component indicated by Y (sometimes also indicated by L) and the two chrominance components indicated by Cb and Cr.
  • The luma (luminance) component Y represents brightness or gray-level intensity (for example, the two are identical in a grayscale picture), while the two chroma (chrominance) components Cb and Cr represent the chrominance or color-information components.
  • a picture in the YCbCr format includes a luminance sample array of luminance sample values (Y), and two chrominance sample arrays of chrominance values (Cb and Cr).
  • Pictures in RGB format can be converted or transformed into YCbCr format, and vice versa; this process is also called color conversion or transformation. If a picture is black and white, it may include only the luminance sample array.
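  • The RGB-to-YCbCr conversion mentioned above can be written out concretely. The sketch below uses the BT.601 full-range coefficients; this is one common variant, and the application does not commit to a specific conversion matrix.

```python
# RGB <-> YCbCr conversion with BT.601 full-range coefficients (one common
# variant; the application does not fix a specific matrix).
def rgb_to_ycbcr(r, g, b):
    """r, g, b in [0, 255]; returns (y, cb, cr) in [0, 255]."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion; a black-and-white picture would carry only y."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b
```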
  • The picture source 16 may be, for example, a camera for capturing pictures, a memory such as a picture memory that includes or stores previously captured or generated pictures, and/or any type of (internal or external) interface for obtaining or receiving pictures.
  • the camera may be, for example, an integrated camera that is local or integrated in the source device, and the memory may be local or, for example, an integrated memory that is integrated in the source device.
  • the interface can be, for example, an external interface for receiving pictures from an external video source.
  • The external video source is, for example, an external picture capture device such as a camera, an external memory, or an external picture generation device; the external picture generation device is, for example, an external computer graphics processor, a computer, or a server.
  • the interface can be any type of interface according to any proprietary or standardized interface protocol, such as a wired or wireless interface, and an optical interface.
  • the interface for acquiring the picture data 17 may be the same interface as the communication interface 22 or a part of the communication interface 22.
  • the picture or picture data 17 may also be referred to as the original picture or the original picture data 17.
  • the pre-processing unit 18 is used to receive (original) picture data 17 and perform pre-processing on the picture data 17 to obtain pre-processed pictures 19 or pre-processed picture data 19.
  • the preprocessing performed by the preprocessing unit 18 may include trimming, color format conversion (for example, conversion from RGB to YCbCr), toning, or denoising. It can be understood that the pre-processing unit 18 may be an optional component.
  • the encoder 20 (for example, the video encoder 20) is used to receive the pre-processed picture data 19 and provide the encoded picture data 21 (details will be further described below, for example, based on FIG. 2 or FIG. 4).
  • the communication interface 22 of the source device 12 can be used to receive the encoded picture data 21 and transmit it to other devices, for example, the destination device 14 or any other device for storage or direct reconstruction, or for storing the encoded image data 21 accordingly.
  • Alternatively, the encoded data 13 and/or the encoded picture data 21 may be processed before the encoded data 13 is transmitted to other devices, such as the destination device 14 or any other device, for decoding or storage.
  • the destination device 14 includes a decoder 30 (for example, a video decoder 30), and in addition, that is, optionally, may include a communication interface or communication unit 28, a post-processing unit 32, and a display device 34.
  • the communication interface 28 of the destination device 14 is used, for example, to directly receive the encoded picture data 21 or the encoded data 13 from the source device 12 or any other source.
  • Any other source is, for example, a storage device, such as a storage device for encoded picture data.
  • The communication interface 22 and the communication interface 28 can be used to transmit or receive the encoded picture data 21 or the encoded data 13 through a direct communication link between the source device 12 and the destination device 14, or through any type of network; the link is, for example, a direct wired or wireless connection, and the network is, for example, a wired or wireless network or any combination thereof, or any type of private or public network, or any combination thereof.
  • the communication interface 22 may be used, for example, to encapsulate the encoded picture data 21 into a suitable format, such as a packet, for transmission on a communication link or a communication network.
  • the communication interface 28 forming the corresponding part of the communication interface 22 may be used, for example, to decapsulate the encoded data 13 to obtain the encoded picture data 21.
  • Both the communication interface 22 and the communication interface 28 can be configured as a one-way communication interface, as indicated by the arrow pointing from the source device 12 to the destination device 14 for the encoded picture data 13 in FIG. 1, or as a two-way communication interface, and may be used, for example, to send and receive messages to establish a connection, and to confirm and exchange any other information related to the communication link and/or data transmission, such as the transmission of encoded picture data.
  • The communication interface may also be called a transmission interface, and may be any type of interface based on any proprietary or standardized interface protocol, for example, a high-definition multimedia interface (HDMI), a Mobile Industry Processor Interface (MIPI), the MIPI-standardized Display Serial Interface (DSI), the Video Electronics Standards Association (VESA)-standardized Embedded DisplayPort (eDP), or a V-By-One interface.
  • The V-By-One interface is a digital interface standard developed for image transmission; various wired or wireless interfaces, optical interfaces, and the like may also be used.
  • the decoder 30 is used to receive the encoded picture data 21 and provide the decoded picture data 31 or the decoded picture 31 (details will be further described below, for example, based on FIG. 3 or FIG. 5).
  • The post-processor 32 of the destination device 14 is used to post-process the decoded picture data 31 (also referred to as reconstructed picture data), for example the decoded picture 131, to obtain post-processed picture data 33, for example a post-processed picture 33.
  • The post-processing performed by the post-processing unit 32 may include, for example, color format conversion (for example, from YCbCr to RGB), toning, trimming, or resampling, or any other processing for preparing the decoded picture data 31 for display by, for example, the display device 34.
  • the display device 34 of the destination device 14 is used to receive the post-processed picture data 33 to display the picture to, for example, a user or a viewer.
  • the display device 34 may be or may include any type of display for presenting reconstructed pictures, for example, an integrated or external display or monitor.
  • The display may include a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, a plasma display, a projector, a micro-LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.
  • Although FIG. 1 shows the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14, or the functionality of both, that is, the source device 12 or its corresponding functionality and the destination device 14 or its corresponding functionality.
  • In such embodiments, the same hardware and/or software, separate hardware and/or software, or any combination thereof may be used to implement the source device 12 or its corresponding functionality and the destination device 14 or its corresponding functionality.
  • Both the encoder 20 (for example, the video encoder 20) and the decoder 30 (for example, the video decoder 30) can be implemented as any of various suitable circuits, for example, one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof.
  • If the technology is partially implemented in software, the device can store the software instructions in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors, so as to perform the technology of the present disclosure.
  • Any of the foregoing can be regarded as one or more processors.
  • Each of the video encoder 20 and the video decoder 30 may be included in one or more encoders or decoders, and any of the encoders or decoders may be integrated as a combined encoder/decoder in the corresponding device Part of the device (codec).
  • the source device 12 may be referred to as a video encoding device or a video encoding device.
  • the destination device 14 may be referred to as a video decoding device or a video decoding device.
  • The source device 12 and the destination device 14 may be examples of video coding devices or video coding apparatuses.
  • The source device 12 and the destination device 14 may include any of a variety of devices, including any type of handheld or stationary device, for example, a notebook or laptop computer, a mobile phone, a smartphone, a tablet or tablet computer, a video camera, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (such as a content service server or a content distribution server), a broadcast receiver device, or a broadcast transmitter device, and may use no operating system or any type of operating system.
  • source device 12 and destination device 14 may be equipped for wireless communication. Therefore, the source device 12 and the destination device 14 may be wireless communication devices.
  • The video encoding system 10 shown in FIG. 1 is only an example, and the technology of the present application can be applied to video encoding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between encoding and decoding devices.
  • the data can be retrieved from local storage, streamed on the network, etc.
  • the video encoding device can encode data and store the data to the memory, and/or the video decoding device can retrieve the data from the memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other but only encode data to and/or retrieve data from the memory and decode the data.
  • video decoder 30 may be used to perform the reverse process.
  • the video decoder 30 can be used to receive and parse such syntax elements, and decode related video data accordingly.
  • video encoder 20 may entropy encode one or more syntax elements into an encoded video bitstream. In such instances, video decoder 30 may parse such syntax elements and decode related video data accordingly.
  • Fig. 2 shows a flowchart of a method 200 for obtaining a neural network model provided by an embodiment of the present application.
  • the method shown in FIG. 2 includes steps 201 to 203, and steps 201 to 203 are described in detail below.
  • S201: Obtain multiple sets of training data. Each set of training data includes a standard code stream and a training code stream; the training code streams included in the multiple sets correspond to different type information, and each training code stream is obtained by preprocessing the corresponding standard code stream.
  • the image processing flow includes steps 301 to 306, which are respectively six stages of recording, segmentation, preprocessing, encoding, code stream packaging and encryption, and sending.
  • the high-quality image obtained after recording is called a standard code stream, and the image corresponding to the standard code stream has high resolution.
  • the standard code stream is preprocessed in step 303 to obtain the training code stream referred to in the embodiment of the present application.
  • The preprocessing of step 303 includes a pre-processing process and an encoding process. In the pre-processing stage, the segmented standard code stream undergoes color gamut conversion and down-sampling according to different stream requirements, generating images with different color gamuts and resolutions; for example, high dynamic range (HDR) images are converted into standard dynamic range (SDR) images, and ultra high definition (UHD) is converted into full high definition (FHD).
  • HDR provides greater dynamic range and more image detail, and the image-quality improvement it brings is intuitively perceptible.
  • HDR technology requires the display of an electronic device to have high contrast and strong color expression, so not all electronic devices can use HDR. Therefore, to accommodate the full range of devices, HDR must be converted to SDR. The International Telecommunication Union (ITU) refers to resolutions above 3840x2160 as ultra HD and to 1920x1080 as full HD; similarly, because not all electronic devices support UHD, UHD must be converted to the more widely applicable FHD. In the encoding process, a compression algorithm is used to compress the video.
  • The lower the bit rate of the code stream, the smaller the file size; therefore, in some cases, to make the file smaller, the bit rate of the code stream is reduced in the encoding process.
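  • A toy sketch of this preprocessing step, showing only the resolution change (2x2-average down-sampling, e.g. UHD to FHD); the color-gamut/HDR conversion and the actual codec are left as comments, since the application does not specify them.

```python
# Toy preprocessing sketch: standard stream -> training stream.
# Only the down-sampling is shown; gamut/HDR conversion and encoding are
# indicated by comments because the application does not specify them.
def downsample_2x(frame):
    """frame: 2D list of pixel values; returns a frame at half resolution
    (e.g. UHD -> FHD) by averaging 2x2 blocks."""
    h, w = len(frame), len(frame[0])
    return [
        [(frame[y][x] + frame[y][x + 1]
          + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]

def preprocess_standard_stream(frames):
    # 1. color gamut / HDR-to-SDR conversion would happen here
    # 2. resolution reduction:
    low_res = [downsample_2x(f) for f in frames]
    # 3. encoding at a lower bit rate would happen here
    return low_res
```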
  • the training code stream obtained by preprocessing the standard code stream has the first type information, that is, the specific type information of the training code stream.
  • The type information includes the supplier and the media metadata of the training code stream, and the media metadata includes any one or more of the movie theme, type, scene, color gamut of the code stream, resolution of the code stream, or bit rate of the code stream.
  • Code streams with specific media metadata from different code stream providers are trained separately, yielding multiple neural network models for code streams with different type information.
  • The corresponding neural network model is then selected from the multiple models according to the type information to process the code stream, which is more flexible and gives a better processing effect.
  • S202 Train multiple initial neural network models separately according to multiple sets of training data to obtain multiple neural network models, where one set of training data is trained to obtain a neural network model.
  • FIG. 4 is a schematic flowchart of training a neural network model provided by an embodiment of the application.
  • One set of training data includes a standard code stream and a training code stream.
  • The preprocessed training code stream differs from the standard code stream in color gamut and resolution.
  • In training, multiple types of information can be collected first. The type information includes the code stream provider and the media metadata, where the media metadata includes movie subject matter, movie type, movie scene, bitstream color gamut, bitstream resolution, bit rate, and so on. The training data corresponding to each type of information is then obtained.
  • For example, if a standard code stream is preprocessed into a training code stream whose supplier is Tencent, film subject is comedy, film type is movie, and film scene is outdoor, then the standard code stream and the training code stream constitute a set of training data, and the type information of the code stream corresponding to the neural network model trained on this set is: supplier Tencent, movie subject comedy, movie type movie, movie scene outdoor.
  • Table 1 shows the possible situations of type information.
  • the type information of the training code stream includes but is not limited to the situation shown in Table 1.
  • the type information of the training code stream may be any one or a combination of the possible situations in Table 1.
  • Table 2 shows the type information of four possible code streams.
                          First type info   Second type info   Third type info   Fourth type info
      Streaming provider  iQIYI             iQIYI              Youku             Youku
      Film subject        comedy            comedy             love              —
      Video type          movie             movie              TV series         sporting event
      Movie scene         indoor            indoor             city              outdoor
      Color gamut         BT601             BT709              BT2020            BT2020
      Resolution          medium            low                high              low
      Bit rate            low               medium             medium            low
  • constructing multiple sets of training data includes:
  • the first group of training data includes training code stream A
  • the second group of training data includes training code stream B
  • the third group of training data includes training code stream C
  • the fourth group of training data includes training code stream D.
  • the standard code stream is preprocessed according to the type information to obtain the corresponding training code stream. For example, Table 3 shows the corresponding four training code streams.
  • Each group of training data includes a standard code stream and a training code stream, that is, each group of training code stream has a group of standard code streams corresponding to it.
  • the standard code streams of different groups may be the same.
  • Training code stream A and training code stream B may result from different preprocessing of the same standard code stream, giving the training code streams different color gamuts, resolutions, and bit rates, but both correspond to the same standard code stream.
  • For training code stream A, training code stream C, and training code stream D, the three code stream providers differ, as do the film subject, type, and scene, so the standard code streams of the three differ.
  • the neural network model is established by setting initial parameters, and the parameters can be any parameters.
  • the parameters of the neural network model are adjusted to train the neural network model until the difference between the code stream obtained after the training code stream is processed by the adjusted neural network model and the standard code stream meets the preset condition.
  • the training code stream A, the training code stream B, the training code stream C, the training code stream D and their corresponding standard code streams are respectively transmitted to the server, and then the neural network model is trained separately.
  • For example, an initial neural network model a is established for training code stream A and its standard code stream 1, and the parameters of model a are adjusted until the difference between the processed training code stream A and standard code stream 1 meets the expected value; the neural network model a' corresponding to the first type information is then obtained.
  • The difference can be evaluated with a corresponding evaluation function.
  • When the adjusted neural network model a' makes the difference between training code stream A and standard code stream 1 satisfy the preset condition, the model a' and the first type information of training code stream A are stored in correspondence.
  • For training code stream B, training code stream C, and training code stream D, the training method of neural network model a' can be followed to obtain the corresponding neural network models b', c', and d'.
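  • A minimal training-loop sketch for one such group, assuming PyTorch, a tiny CNN, and mean-squared error as the "evaluation function"; the architecture, loss, and stopping threshold are all illustrative assumptions, and the sketch assumes training and standard frames share a resolution (a super-resolution variant would add an upsampling layer).

```python
# Minimal training-loop sketch for one (training stream, standard stream)
# group, assuming PyTorch. The architecture, the MSE "evaluation function",
# and the stopping threshold are illustrative assumptions.
import torch
import torch.nn as nn

class EnhanceNet(nn.Module):
    """Tiny CNN standing in for the image-quality-enhancement model."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

def train_one_model(pairs, threshold=1e-3, max_steps=10_000):
    """pairs: list of (training_frame, standard_frame) tensors of shape (3, H, W)."""
    model = EnhanceNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()  # stands in for the "evaluation function"
    for step in range(max_steps):
        total = 0.0
        for train_frame, standard_frame in pairs:
            opt.zero_grad()
            out = model(train_frame.unsqueeze(0))
            loss = loss_fn(out, standard_frame.unsqueeze(0))
            loss.backward()
            opt.step()
            total += loss.item()
        if total / len(pairs) < threshold:  # "difference meets the preset condition"
            break
    return model
```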
  • The AI deep learning framework is used to implement image-quality enhancement computation. During training, the image quality generated by the AI neural network gradually approaches that of the input high-quality image; after training, this result lets the AI neural network improve the image quality and resolution of an image. However, the images of different stream providers, and the preprocessed results of different images, are not the same; attempting to obtain neural network models for all these images with a single algorithm and training method makes the algorithm too complex to converge, or makes the computing-power requirements too high. In this embodiment of the application, different types of code streams from different code stream providers are trained separately, which avoids the convergence problem and saves training time. Multiple neural network models adapted to different code stream providers are obtained, and the corresponding model is then selected adaptively according to the code stream type information, which is also more effective at improving the image quality of a specific image.
  • the method 200 for obtaining neural network parameters further includes:
  • S203 Store multiple neural network models and multiple types of information of training data used for training the multiple neural network models in correspondence.
  • In an embodiment, the trained neural network model and the first type information of its corresponding training code stream can be stored in the terminal device; when the media player of the corresponding code stream provider is downloaded on the terminal device, the stored neural network model can be invoked.
  • The trained neural network model and the first type information of its training code stream can also be transmitted to the code stream provider; when the media player of that provider is downloaded on the terminal device, the terminal device can also download the corresponding neural network model parameters to establish a neural network model for processing the code streams provided by that provider.
  • FIG. 5 is a flowchart of a method 500 for processing code stream data provided by an embodiment of the present application.
  • the method shown in FIG. 5 includes steps 501 to 504, and steps 501 to 504 are described in detail below.
  • S501 Acquire a first code stream.
  • The first code stream can be any image code stream that the player obtains from a code stream provider.
  • S502 Determine a first neural network model according to the first correspondence and the type information of the first code stream, where the first neural network model is a neural network model corresponding to the first code stream, and the first correspondence is used to indicate different types of The neural network model corresponding to the code stream.
  • The type information includes the first code stream's provider and media metadata, where the media metadata includes any one or more of the film type, the film theme, the film scene, the color gamut of the code stream, the resolution of the code stream, and the bit rate of the code stream.
  • the type information of the first code stream includes but is not limited to the possible situations shown in Table 1.
  • a neural network model matching the first code stream is selected from the results obtained by pre-training according to the first correspondence.
  • the first correspondence is used to indicate neural network models corresponding to different types of code streams, that is, the training code stream type information for the selected neural network model is the same as the type information of the first code stream.
  • Table 4 shows the correspondence between the code streams of different types of information and the neural network model.
  • Table 4 only partially shows the correspondence between code streams of different types of information and neural network models.
  • According to the first correspondence, the first neural network model corresponding to the first code stream can be obtained.
  • For example, if the type information of the obtained code stream is: supplier iQIYI, film subject comedy, film type movie, color gamut BT709, resolution low, and bit rate medium, then the neural network model corresponding to the code stream is neural network model c'.
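  • Using the hypothetical lookup sketched earlier (and assuming an indoor film scene, which the worked example leaves unstated), the selection step becomes:

```python
# Continuing the earlier lookup sketch for the worked example above.
info = StreamTypeInfo(
    supplier="iQIYI", subject="comedy", video_type="movie",
    scene="indoor",  # assumed; the worked example does not state the scene
    gamut="BT709", resolution="low", bitrate="medium",
)
model = select_model(info)  # -> the model stored for this type information
```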
  • the image playback method provided in the embodiment of the present application further includes:
  • S503 Parse the first code stream to obtain first image data.
  • the method of decoding and processing the code stream to obtain an image may be any existing feasible method of parsing the code stream to obtain an image, so as to obtain the first image data.
  • S504 Use the first neural network model to process the first image data to obtain a processed image.
  • the first image data is processed according to the determined first neural network model to achieve the purpose of image quality optimization.
  • Figure 6 shows a schematic flow diagram of the player using the neural network model to process the code stream during video playback.
  • the code stream provider sends the first code stream and its corresponding type information to the terminal device.
  • the terminal device includes a player and an image generator.
  • The first code stream can be any image code stream sent by the code stream provider. If pre-trained neural network models are preset in the terminal device, the first neural network model corresponding to the first code stream can be obtained directly according to the first correspondence. It should be understood that a neural network model resource pool containing multiple neural network models is stored in the terminal device; after obtaining the type information of the code stream, the terminal device looks up, in the first correspondence, the neural network model corresponding to that type information.
  • If the pre-trained neural network models are stored at the code stream provider, the method further includes: the player sends feedback information to the code stream provider.
  • the feedback information may be the type information of the first code stream received by the player.
  • The code stream provider receives the feedback information sent by the player and, according to the first correspondence, sends the parameters of the first neural network model corresponding to the first code stream to the player, which establishes the first neural network model from these parameters.
  • The feedback information may also include the computing power of the player, in which case the code stream provider may send the player the parameters of a neural network model suited to that computing power. The player decodes the acquired code stream to obtain the image source.
  • The neural network model is set up in the image generator according to the model parameters corresponding to the type information of the code stream, or the selected neural network model corresponding to that type information is loaded into the image generator, and the image source is then passed to the image generator.
  • The image generator processes the image source according to the neural network model (that is, the model runs on the image generator to process the input image data) and finally outputs the optimized image, as in the sketch below.
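  • A sketch of this player-side flow, with decode_stream, build_model, and provider.fetch_model_params as illustrative stubs rather than real APIs:

```python
# Sketch of the player-side flow: decode the stream, obtain the matching
# model (locally or from the provider), run it in the image generator.
def decode_stream(stream):
    """Stub decoder: assume the stream is already a sequence of frames."""
    return iter(stream)

def build_model(params):
    """Stub: build a model from downloaded parameters (identity here)."""
    return lambda frame: frame

def play(stream, type_info, local_models, provider):
    if type_info in local_models:              # model preset on the terminal
        model = local_models[type_info]
    else:                                      # feedback info -> provider
        params = provider.fetch_model_params(type_info)
        model = build_model(params)
    for frame in decode_stream(stream):        # code stream -> image data
        yield model(frame)                     # image generator output
```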
  • The pre-trained neural network models can be preset in the player of the terminal device, for example stored in the storage unit of the terminal device, or returned to the code stream provider; when the corresponding code stream provider's player is downloaded on the terminal device, the terminal device can also download the corresponding neural network model parameters to establish a neural network model for processing the code streams provided by that provider.
  • A basic setting can also be loaded; the basic setting may be a manually specified neural network model, or any neural network model obtained in the training process, which is not limited in the embodiments of the present application.
  • Fig. 7 shows a schematic flow chart of updating the neural network model during the video playback process.
  • During video playback, the player first sets the first neural network model according to the type information of the code stream, and then processes the code stream according to that model.
  • If at least one of the color gamut, resolution, or bit rate of the code stream, or the film type, theme, or scene, changes, the player obtains the neural network model matching the changed type information according to the changed code stream and processes the changed code stream with it; if none of these change, the first neural network model continues to be used to process the code stream, as in the sketch below.
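  • A sketch of this update flow, reusing the hypothetical select_model lookup from earlier; segments stands for consecutive stream parts with their type information:

```python
# Sketch: re-select the model whenever the stream's type information
# changes mid-playback; otherwise keep using the current model.
def process_with_updates(segments, select_model):
    """segments: iterable of (type_info, frames) for consecutive parts."""
    current_info, model = None, None
    for type_info, frames in segments:
        if type_info != current_info:          # type information changed
            model = select_model(type_info)    # switch to the matching model
            current_info = type_info
        for frame in frames:
            yield model(frame)
```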
  • the embodiments of the present application may be combined with dynamic adaptive streaming technology to ensure the smoothness and high quality of the images viewed by the viewer.
  • The essence of dynamic adaptive streaming is that the player monitors or reports the terminal's network connection rate; when the connection rate drops enough to affect the user experience, the player actively fetches a lower-resolution code stream, sacrificing image quality to guarantee smooth playback.
  • In this embodiment, when dynamic adaptive streaming fetches a lower-resolution code stream, a neural network model matching the lower resolution and bit rate is determined, and the image corresponding to the current code stream is processed with that matched model to improve image quality. This ensures both the smoothness and the high quality of video playback.
  • FIG. 8 is a device 800 for obtaining a neural network model provided by an embodiment of the present application.
  • the apparatus 800 for obtaining a neural network model shown in FIG. 8 (the apparatus 800 may specifically be a computer device) includes a memory 801, a processor 802, a communication interface 803, and a bus 804. Among them, the memory 801, the processor 802, and the communication interface 803 realize the communication connection between each other through the bus 804.
  • the memory 801 may be a read only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 801 may store a program. When the program stored in the memory 801 is executed by the processor 802, the processor 802 and the communication interface 803 are used to execute each step of the method 200 for obtaining a neural network model in an embodiment of the present application.
  • The processor 802 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the functions required by the units in the apparatus for obtaining a neural network model in this embodiment of the present application, or to perform the method for obtaining a neural network model in the method embodiments of the present application.
  • the processor 802 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the method for obtaining a neural network model of the present application can be completed by an integrated logic circuit of hardware in the processor 802 or instructions in the form of software.
  • the aforementioned processor 802 may also be a general-purpose processor, a digital signal processor (Digital Signal Processing, DSP), an application specific integrated circuit (ASIC), an off-the-shelf programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices , Discrete gates or transistor logic devices, discrete hardware components.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • The software module can be located in a mature storage medium in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • The storage medium is located in the memory 801; the processor 802 reads the information in the memory 801 and, in combination with its hardware, completes the functions required by the units included in the apparatus for obtaining a neural network model in the embodiments of the present application, or performs the method for obtaining a neural network model in the method embodiments of the present application.
  • the communication interface 803 uses a transceiving device such as but not limited to a transceiver to implement communication between the device 800 and other devices or a communication network.
  • the training data (such as the training code stream in the embodiment of the present application) can be obtained through the communication interface 803.
  • the bus 804 may include a path for transferring information between various components of the device 800 (for example, the memory 801, the processor 802, and the communication interface 803).
  • FIG. 9 is a schematic diagram of the hardware structure of a device for processing code stream data provided by an embodiment of the present application.
  • the video playback device 900 shown in FIG. 9 includes a memory 901, a processor 902, a communication interface 903, and a bus 904.
  • the memory 901, the processor 902, and the communication interface 903 implement communication connections between each other through the bus 904.
  • the memory 901 may be a read only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 901 may store a program. When the program stored in the memory 901 is executed by the processor 902, the processor 902 and the communication interface 903 are used to execute each step of the code stream data processing method 500 of the embodiment of the present application.
  • The processor 902 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the functions required by the units in the video playback device of this embodiment of the present application, or to perform the video playback method in the method embodiments of the present application.
  • the processor 902 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the video playback method of the present application can be completed by an integrated logic circuit of hardware in the processor 902 or instructions in the form of software.
  • the aforementioned processor 902 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed in the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor.
  • the software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 901; the processor 902 reads the information in the memory 901 and, in combination with its hardware, implements the functions required by the units included in the video playback device of the embodiments of the present application, or executes the method for processing code stream data of the method embodiments of the present application.
  • the communication interface 903 uses a transceiving device such as but not limited to a transceiver to implement communication between the device 900 and other devices or a communication network.
  • the neural network model or the code stream to be processed can be obtained through the communication interface 903.
  • the bus 904 may include a path for transferring information between various components of the device 900 (for example, the memory 901, the processor 902, and the communication interface 903).
  • although the devices 800 and 900 shown in FIG. 8 and FIG. 9 only show a memory, a processor, and a communication interface, in the specific implementation process those skilled in the art should understand that the devices 800 and 900 also include other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the devices 800 and 900 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the devices 800 and 900 may also include only the components necessary to implement the embodiments of the present application, and not necessarily all the components shown in FIG. 8 or FIG. 9.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of a software functional unit and sold or used as an independent product, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application essentially, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides a method for processing code stream data, which can improve image quality. The method includes: obtaining a first code stream; determining a first neural network model according to a first correspondence and type information of the first code stream, where the first neural network model is the neural network model corresponding to the first code stream, and the first correspondence indicates the neural network models corresponding to code streams of different types; parsing the first code stream to obtain first image data; and processing the first image data with the first neural network model to obtain a processed image.

Description

Method and apparatus for processing code stream data
This application claims priority to Chinese patent application No. 201911237986.3, entitled "Method and apparatus for processing code stream data", filed with the China National Intellectual Property Administration on December 6, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of image processing, and more specifically, to a method and apparatus for processing code stream data and to a method and apparatus for obtaining a neural network model.
Background
As network bandwidth grows year by year and video codec technology continues to evolve, on-demand multimedia services have become the mainstream multimedia service. Because of the multi-source, multi-point nature of on-demand services, it is difficult to maintain a stable terminal connection rate even with sufficient bandwidth and a content delivery network in place. As users keep pursuing high-quality visual effects, the requirements on bandwidth, latency and connection rate grow ever higher. Therefore, how to improve the picture quality and resolution of video images under given bandwidth, latency and connection-rate conditions has become an urgent problem.
Summary
This application provides a method and apparatus for processing code stream data, which can improve the picture quality and resolution of video images under given bandwidth, latency and connection-rate conditions.
According to a first aspect, a method for processing code stream data is provided, including: obtaining a first code stream; determining a first neural network model according to a first correspondence and the type information of the first code stream, where the first neural network model is the neural network model corresponding to the first code stream, and the first correspondence indicates the neural network models corresponding to code streams of different types; parsing the first code stream to obtain first image data; and processing the first image data with the first neural network model to obtain a processed image.
Compared with processing with a single unified neural network model, the solution provided in this application processes each kind of code stream with the neural network model corresponding to its type. The processing is more precise, so the resulting picture quality is closer to that of the standard image, achieving optimal image-quality processing.
With reference to the first aspect, in some implementations of the first aspect, the type information of a code stream includes the provider of the code stream and the media metadata of the code stream.
With reference to the first aspect, in some implementations of the first aspect, the media metadata includes any one or more of the film genre, film type, film scene, color gamut of the code stream, resolution of the code stream, or bit rate of the code stream.
In the solution provided in this application, code streams are classified according to their type information so as to obtain a neural network model matching that type information. The finer the type information, the better the obtained neural network model matches and the better the image processing effect.
With reference to the first aspect, in some implementations of the first aspect, the first neural network model is trained from training code streams having the same type information as the first code stream.
In the solution provided in this application, the neural network model that processes a code stream is trained from training code streams with the same type information as that code stream. The model obtained in this way matches its corresponding code stream extremely well and therefore performs extremely well in image processing.
With reference to the first aspect, in some implementations of the first aspect, the first correspondence includes a correspondence between code stream type information and neural network models, and determining the first neural network model according to the first correspondence and the type information of the first code stream specifically includes: determining the provider of the first code stream and the media metadata of the first code stream; and determining, in the first correspondence, the neural network model corresponding to both the provider of the first code stream and the media metadata of the first code stream as the first neural network model.
In the solution provided in this application, matching a code stream to a neural network model according to the first correspondence makes it possible to quickly obtain the neural network model corresponding to the first code stream.
According to a second aspect, a method for obtaining a neural network model is provided, including: obtaining multiple groups of training data, where each group includes a standard code stream and a training code stream, the training code streams in the multiple groups correspond to different type information, and the training code streams are obtained by preprocessing the standard code streams; training multiple initial neural network models separately with the multiple groups of training data to obtain multiple neural network models, where one group of training data yields one neural network model; and storing the multiple neural network models in correspondence with the type information of the training data used to train them.
With reference to the second aspect, in some implementations of the second aspect, the multiple groups of training data include a first group that includes a first standard code stream and a first training code stream, the type information of the first training code stream being first type information. Training the multiple initial neural network models with the multiple groups of training data to obtain multiple neural network models includes: training an initial neural network model with the first standard code stream and the first training code stream; and obtaining a first neural network model when the difference between the code stream produced by processing the first training code stream with the neural network model and the first standard code stream satisfies a preset condition. Storing the multiple neural network models in correspondence with the type information of their training data includes: storing the first neural network model in correspondence with the first type information.
Images from different code stream providers, and different images after preprocessing, all differ. Trying to obtain neural network models for different images with a single algorithm and training scheme would make the algorithm too complex to converge or would demand too much compute. The solution provided in this application trains separately for the different types of code streams of different providers, avoiding convergence difficulties and saving training time. Multiple neural network models adapted to multiple different providers are obtained, and the corresponding model can then be selected adaptively according to the code stream type information, which also gives better results when enhancing the picture quality of a specific image.
With reference to the second aspect, in some implementations of the second aspect, the type information includes the provider of the training code stream and the media metadata of the training code stream.
With reference to the second aspect, in some implementations of the second aspect, the media metadata includes any one or more of the film genre, film type, film scene, color gamut of the code stream, resolution of the code stream, or bit rate of the code stream.
In the solution provided in this application, neural network models are trained separately for code streams with different type information; the finer the type information, the more targeted the trained model and the better the effect on the corresponding images.
With reference to the second aspect, in some implementations of the second aspect, after storing the multiple neural network models in correspondence with the type information of their training data, the method further includes: transmitting a neural network model and its corresponding type information to the code stream provider included in that type information.
In the solution provided in this application, the trained neural network models and their corresponding type information may be stored on the terminal device; when the media player of the corresponding provider is downloaded to the terminal device, the corresponding stored neural network model can be invoked. Optionally, a trained neural network model and its corresponding type information may also be transmitted to the code stream provider; when the terminal device downloads that provider's media player, it can also download the corresponding neural network model parameters and build the neural network model to process the code streams supplied by that provider.
According to a third aspect, an apparatus for processing code stream data is provided, including: an obtaining unit configured to obtain a first code stream;
and a processing unit configured to determine a first neural network model according to a first correspondence and the type information of the first code stream, where the first neural network model is the neural network model corresponding to the first code stream, and the first correspondence indicates the neural network models corresponding to code streams of different types. The processing unit is further configured to parse the first code stream to obtain first image data, and to process the first image data with the first neural network model to obtain a processed image.
With reference to the third aspect, in some implementations of the third aspect, the type information of the code stream includes the provider of the code stream and the media metadata of the code stream.
With reference to the third aspect, in some implementations of the third aspect, the media metadata includes any one or more of the film genre, film type, film scene, color gamut of the code stream, resolution of the code stream, or bit rate of the code stream.
With reference to the third aspect, in some implementations of the third aspect, the first neural network model is trained from training code streams having the same type information as the first code stream.
With reference to the third aspect, in some implementations of the third aspect, the first correspondence includes a correspondence between code stream type information and neural network models, and the processing unit is specifically configured to: determine the provider of the first code stream and the media metadata of the first code stream; and determine, in the first correspondence, the neural network model corresponding to both the provider of the first code stream and the media metadata of the first code stream as the first neural network model.
According to a fourth aspect, an apparatus for processing code stream data is provided, including a processor and a transmission interface, where the transmission interface is configured to receive or send data, and the processor is configured to perform the method in the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, an apparatus for obtaining a neural network model is provided, including: an obtaining unit configured to obtain multiple groups of training data, where each group includes a standard code stream and a training code stream, the training code streams in the multiple groups correspond to different type information, and the training code streams are obtained by preprocessing the standard code streams; and a processing unit configured to train multiple initial neural network models separately with the multiple groups of training data to obtain multiple neural network models, where one group of training data yields one neural network model. The processing unit is further configured to store the multiple neural network models in correspondence with the type information of the training data used to train them.
With reference to the fifth aspect, in some implementations of the fifth aspect, the multiple groups of training data include a first group that includes a first standard code stream and a first training code stream, the type information of the first training code stream being first type information, and the processing unit is specifically configured to: train an initial neural network model with the first standard code stream and the first training code stream; obtain a first neural network model when the difference between the code stream produced by processing the first training code stream with the neural network model and the first standard code stream satisfies a preset condition; and store the first neural network model in correspondence with the first type information.
With reference to the fifth aspect, in some implementations of the fifth aspect, the type information includes the provider of the training code stream and the media metadata of the training code stream.
With reference to the fifth aspect, in some implementations of the fifth aspect, the media metadata includes any one or more of the film genre, film type, film scene, color gamut of the code stream, resolution of the code stream, or bit rate of the code stream.
With reference to the fifth aspect, in some implementations of the fifth aspect, the apparatus further includes: a sending unit configured to transmit a neural network model and the type information corresponding to the neural network model to the code stream provider included in that type information.
According to a sixth aspect, an apparatus for obtaining a neural network model is provided, including a processor and a transmission interface, where the transmission interface is configured to receive or send data, and the processor is configured to perform the method in the second aspect or any possible implementation of the second aspect.
According to a seventh aspect, a computer program product is provided. The computer program product includes a computer program (which may also be called code or instructions) that, when run on a computer, causes the computer to perform the method in the first aspect, any possible implementation of the first aspect, the second aspect, or any possible implementation of the second aspect.
According to an eighth aspect, a computer-readable storage medium is provided for storing a computer program, where the computer program includes instructions for performing the method in the first aspect, any possible implementation of the first aspect, the second aspect, or any possible implementation of the second aspect.
Brief Description of the Drawings
FIG. 1 is a schematic block diagram of an encoding and decoding system provided by an embodiment of this application.
FIG. 2 is a schematic flowchart of a method for obtaining a neural network model provided by an embodiment of this application.
FIG. 3 is a schematic flowchart of an image processing pipeline provided by an embodiment of this application.
FIG. 4 is a schematic flowchart of a method for training a neural network model provided by an embodiment of this application.
FIG. 5 is a schematic flowchart of a method for processing code stream data provided by an embodiment of this application.
FIG. 6 is a schematic flowchart of a player processing a code stream with a neural network model, provided by an embodiment of this application.
FIG. 7 is a schematic flowchart of updating a neural network model, provided by an embodiment of this application.
FIG. 8 is a schematic block diagram of an apparatus for obtaining a neural network model provided by an embodiment of this application.
FIG. 9 is a schematic block diagram of an apparatus for processing code stream data provided by an embodiment of this application.
Detailed Description
The technical solutions in this application are described below with reference to the accompanying drawings.
To give a preliminary understanding of the video encoding and decoding process, an embodiment of the encoding and decoding system 10 is described below with reference to FIG. 1.
FIG. 1 is a conceptual or schematic block diagram of an exemplary coding system 10, for example a video coding system 10 that can use the techniques of this application (this disclosure). Encoder 20 (e.g., video encoder 20) and decoder 30 (e.g., video decoder 30) of video coding system 10 represent examples of devices that can be used to perform techniques for video encoding or video decoding methods according to the various examples described in this application. As shown in FIG. 1, the coding system 10 includes a source device 12 configured to provide encoded data 13, for example an encoded picture 13, to a destination device 14 that, for example, decodes the encoded data 13.
The source device 12 includes an encoder 20 and, additionally or optionally, may include a picture source 16, a preprocessing unit 18 such as a picture preprocessing unit 18, and a communication interface or communication unit 22.
The picture source 16 may include or be any kind of picture capture device, for example for capturing a real-world picture, and/or any kind of device for generating pictures or comments (for screen content coding, some text on the screen is also considered part of the picture or image to be encoded), for example a computer graphics processor for generating computer-animated pictures, or any kind of device for obtaining and/or providing real-world pictures or computer-animated pictures (for example screen content or virtual reality (VR) pictures), and/or any combination thereof (for example augmented reality (AR) pictures).
A (digital) picture is, or can be regarded as, a two-dimensional array or matrix of sample points with luminance values. A sample point in the array may also be called a pixel (short for picture element) or a pel. The number of sample points in the horizontal and vertical directions (or axes) of the array or picture defines the size and/or resolution of the picture. To represent color, three color components are usually employed, i.e. the picture can be represented as, or contain, three sample arrays. In the RGB format or color space, a picture includes corresponding red, green and blue sample arrays. In video coding, however, each pixel is usually represented in a luminance/chrominance format or color space, for example YCbCr, which includes a luminance component indicated by Y (sometimes also indicated by L) and two chrominance components indicated by Cb and Cr. The luminance (luma for short) component Y represents brightness or gray-level intensity (the two are the same in, for example, a gray-scale picture), while the two chrominance (chroma for short) components Cb and Cr represent the chrominance or color information components. Accordingly, a picture in YCbCr format includes a luminance sample array of luminance sample values (Y) and two chrominance sample arrays of chrominance values (Cb and Cr). A picture in RGB format can be converted or transformed to YCbCr format and vice versa; this process is also called color transformation or conversion. If a picture is monochrome, it may include only a luminance sample array.
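As an illustrative aside (not part of the original disclosure), the color conversion just described can be sketched in a few lines. The function below converts 8-bit RGB samples to full-range YCbCr using the widely published BT.601 coefficients; the function name and the use of NumPy are our own assumptions, and other gamuts named in this application (BT.709, BT.2020) would use different coefficients.

```python
import numpy as np

def rgb_to_ycbcr_bt601(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 array of 8-bit RGB samples to full-range
    YCbCr using the standard BT.601 conversion coefficients."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.clip(np.stack([y, cb, cr], axis=-1), 0, 255).astype(np.uint8)

# A pure red pixel maps to roughly Y=76, Cb=85, Cr=255.
print(rgb_to_ycbcr_bt601(np.array([[[255, 0, 0]]], dtype=np.uint8)))
```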
The picture source 16 (e.g., video source 16) may be, for example, a camera for capturing pictures, a memory such as a picture memory that includes or stores previously captured or generated pictures, and/or any kind of (internal or external) interface for obtaining or receiving pictures. The camera may be, for example, a local camera or a camera integrated in the source device, and the memory may be a local memory or, for example, a memory integrated in the source device. The interface may be, for example, an external interface for receiving pictures from an external video source; the external video source is, for example, an external picture capture device such as a camera, an external memory, or an external picture generation device such as an external computer graphics processor, computer or server. The interface may be any kind of interface according to any proprietary or standardized interface protocol, for example a wired or wireless interface or an optical interface. The interface for obtaining the picture data 17 may be the same interface as the communication interface 22 or part of the communication interface 22.
As distinguished from the preprocessing unit 18 and the processing performed by the preprocessing unit 18, the picture or picture data 17 (e.g., video data 16) may also be called the original picture or original picture data 17.
The preprocessing unit 18 is configured to receive the (original) picture data 17 and preprocess it to obtain a preprocessed picture 19 or preprocessed picture data 19. The preprocessing performed by the preprocessing unit 18 may include, for example, trimming, color format conversion (e.g., from RGB to YCbCr), color grading, or denoising. It is understood that the preprocessing unit 18 may be an optional component.
The encoder 20 (e.g., video encoder 20) is configured to receive the preprocessed picture data 19 and provide encoded picture data 21 (details are described further below, for example based on FIG. 2 or FIG. 4).
The communication interface 22 of the source device 12 may be used to receive the encoded picture data 21 and transmit it to another device, for example the destination device 14 or any other device, for storage or direct reconstruction, or to process the encoded picture data 21 before correspondingly storing the encoded data 13 and/or transmitting the encoded data 13 to another device, for example the destination device 14 or any other device for decoding or storage.
The destination device 14 includes a decoder 30 (e.g., video decoder 30) and, additionally or optionally, may include a communication interface or communication unit 28, a post-processing unit 32 and a display device 34.
The communication interface 28 of the destination device 14 is used, for example, to receive the encoded picture data 21 or the encoded data 13 directly from the source device 12 or from any other source, for example a storage device such as an encoded picture data storage device.
The communication interface 22 and the communication interface 28 may be used to transmit or receive the encoded picture data 21 or encoded data 13 over a direct communication link between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or over any kind of network, such as a wired or wireless network or any combination thereof, or any kind of private or public network, or any combination thereof.
The communication interface 22 may be used, for example, to encapsulate the encoded picture data 21 into a suitable format, such as packets, for transmission over a communication link or network.
The communication interface 28, which forms the counterpart of the communication interface 22, may be used, for example, to decapsulate the encoded data 13 to obtain the encoded picture data 21.
Both the communication interface 22 and the communication interface 28 may be configured as unidirectional communication interfaces, as indicated in FIG. 1 by the arrow for the encoded picture data 13 pointing from the source device 12 to the destination device 14, or as bidirectional communication interfaces, and may be used, for example, to send and receive messages to set up a connection and to acknowledge and exchange any other information related to the communication link and/or to data transmission such as transmission of encoded picture data. In an optional solution, the communication interface may also be called a transmission interface; the interface may be any kind of interface according to any proprietary or standardized interface protocol, for example the high definition multimedia interface (HDMI), the Mobile Industry Processor Interface (MIPI), the MIPI-standardized Display Serial Interface (DSI), the Video Electronics Standards Association (VESA)-standardized Embedded Display Port (eDP), or the V-By-One interface, a digital interface standard developed for image transmission, as well as various wired or wireless interfaces or optical interfaces.
The decoder 30 is configured to receive the encoded picture data 21 and provide decoded picture data 31 or a decoded picture 31 (details are described further below, for example based on FIG. 3 or FIG. 5).
The post-processor 32 of the destination device 14 is configured to post-process the decoded picture data 31 (also called reconstructed picture data), for example the decoded picture 131, to obtain post-processed picture data 33, for example a post-processed picture 33. The post-processing performed by the post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), color grading, trimming or resampling, or any other processing, for example to prepare the decoded picture data 31 for display by the display device 34.
The display device 34 of the destination device 14 is configured to receive the post-processed picture data 33 to display the picture, for example to a user or viewer. The display device 34 may be or include any kind of display for presenting the reconstructed picture, for example an integrated or external display or monitor. For example, the display may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro-LED display, liquid crystal on silicon (LCoS), a digital light processor (DLP), or any other kind of display.
Although FIG. 1 depicts the source device 12 and the destination device 14 as separate devices, device embodiments may also include both the source device 12 and the destination device 14, or the functionality of both, i.e. the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, using separate hardware and/or software, or any combination thereof.
It is evident to a person skilled in the art from the description that the existence and (exact) division of the functionality of the different units, or of the functionality of the source device 12 and/or destination device 14 shown in FIG. 1, may vary with the actual device and application.
Both the encoder 20 (e.g., video encoder 20) and the decoder 30 (e.g., video decoder 30) may be implemented as any of a variety of suitable circuits, for example one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof. If the techniques are implemented partially in software, a device may store the software instructions in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be regarded as one or more processors. Each of the video encoder 20 and the video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
The source device 12 may be called a video encoding device or video encoding apparatus. The destination device 14 may be called a video decoding device or video decoding apparatus. The source device 12 and the destination device 14 may be examples of video coding devices or video coding apparatuses.
The source device 12 and the destination device 14 may include any of a variety of devices, including any kind of handheld or stationary device, for example a notebook or laptop computer, mobile phone, smartphone, tablet or tablet computer, video camera, desktop computer, set-top box, television, display device, digital media player, video game console, video streaming device (such as a content service server or content distribution server), broadcast receiver device, broadcast transmitter device, etc., and may use no operating system or any kind of operating system.
In some cases, the source device 12 and the destination device 14 may be equipped for wireless communication. Thus, the source device 12 and the destination device 14 may be wireless communication devices.
In some cases, the video coding system 10 shown in FIG. 1 is merely an example, and the techniques of this application may apply to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices. In other examples, data may be retrieved from local memory, streamed over a network, and so on. A video encoding device may encode data and store it to memory, and/or a video decoding device may retrieve data from memory and decode it. In some examples, encoding and decoding are performed by devices that do not communicate with each other but merely encode data to memory and/or retrieve data from memory and decode it.
It should be understood that, for each of the examples described above with reference to video encoder 20, video decoder 30 may be used to perform the reverse process. With regard to signaling syntax elements, video decoder 30 may be used to receive and parse such syntax elements and decode the associated video data accordingly. In some examples, video encoder 20 may entropy-encode one or more syntax elements into the encoded video bitstream. In such examples, video decoder 30 may parse such syntax elements and decode the associated video data accordingly.
FIG. 2 shows a flowchart of a method 200 for obtaining a neural network model provided by an embodiment of this application. The method shown in FIG. 2 includes steps 201 to 203, which are described in detail below.
S201: Obtain multiple groups of training data, where each group of training data includes a standard code stream and a training code stream, the training code streams in the multiple groups correspond to different type information, and the training code streams are obtained by preprocessing the standard code streams.
For a given code stream provider, from acquiring an image to delivering it to the client, the image generally goes through the processing pipeline shown in FIG. 3. As shown in FIG. 3, the pipeline includes steps 301 to 306, namely six stages: recording, segmentation, preprocessing, encoding, code stream packaging and encryption, and sending.
In the embodiments of this application, the high-quality image obtained after recording is called the standard code stream, and the image corresponding to the standard code stream has high resolution.
The training code stream referred to in the embodiments of this application is obtained by applying the preprocessing of step 303 to the standard code stream; this preprocessing includes a pre-processing stage and an encoding stage. After the standard code stream is segmented into suitable lengths, the pre-processing stage must serve different streaming requirements, so the segmented standard code stream undergoes color gamut conversion and downsampling to produce images with different color gamuts and resolutions. For example, a high dynamic range (HDR) image is converted to a standard dynamic range (SDR) image, and ultra high definition (UHD) is converted to full high definition (FHD). HDR provides more dynamic range and image detail, and the quality improvement it brings is directly perceptible, discernible even to the untrained eye; however, HDR requires the display of the electronic device to have high contrast and strong color rendition, so not all electronic devices can use HDR. Therefore, to accommodate the range of electronic devices in use, HDR needs to be converted to SDR. The International Telecommunication Union (ITU) refers to resolutions above 3840×2160 as ultra high definition and to 1920×1080 as full high definition; similarly to the above, because not all electronic devices support UHD, UHD needs to be converted to the more widely applicable FHD. In the encoding stage, the video is compressed with a compression algorithm; the smaller the compressed code stream, the more information is discarded during compression encoding, the larger the compression ratio, and the lower the quality of the decoded image. However, a smaller code stream also means a smaller file, so in some cases the code stream is reduced during encoding to keep the file small.
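To make the resolution-reduction part of this preprocessing concrete, the following is a minimal sketch, not the provider pipeline itself: it merely downsamples a UHD frame to FHD. The use of OpenCV, the file name, and the choice of INTER_AREA interpolation are assumptions of this illustration; a real pipeline would also apply color gamut conversion and lossy encoding.

```python
import cv2  # OpenCV, assumed available

def preprocess_frame(standard_frame, size=(1920, 1080)):
    """Downsample a UHD (3840x2160) frame to FHD, mimicking the
    resolution reduction of the pre-processing stage."""
    return cv2.resize(standard_frame, size, interpolation=cv2.INTER_AREA)

# Usage sketch: degrade the first frame of a (hypothetical) standard stream.
cap = cv2.VideoCapture("standard_stream.mp4")
ok, frame = cap.read()
if ok:
    training_frame = preprocess_frame(frame)
cap.release()
```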
The training code stream obtained by preprocessing the standard code stream has first type information, i.e. type information specific to that training code stream. The type information includes the provider of the training code stream and its media metadata, where the media metadata further includes any one or more of the film genre, film type, film scene, color gamut of the code stream, resolution of the code stream, or bit rate of the code stream.
It follows that there is a difference between the image quality of the training code stream obtained through pre-processing and encoding and that of the standard code stream; this difference is what causes the degradation in image quality. Although artificial intelligence can be used to improve the picture quality or resolution of images, different code stream providers use different preprocessing methods. If one tried to process the differently preprocessed code streams of different providers with a single neural network model, training that model would make the algorithm too complex to converge or would demand too much compute, so a conventional neural network cannot meet the needs of many different providers. In the embodiments of this application, training is performed separately for the specific media metadata of different providers to obtain multiple neural network models for code streams with different type information. After a code stream is obtained, the corresponding neural network model can then be selected from the multiple models according to the type information of the code stream, which is more flexible and gives better processing results.
S202: Train multiple initial neural network models separately with the multiple groups of training data to obtain multiple neural network models, where one group of training data yields one neural network model.
FIG. 4 is a schematic flowchart of the training of a neural network model provided by an embodiment of this application.
S401: Construct multiple groups of training data.
Multiple groups of training data are constructed according to the type information of different code streams. For example, the standard code stream can be preprocessed in different ways to obtain a variety of training code streams, where one group of training data includes one standard code stream and one training code stream. The preprocessed training code stream differs from the standard code stream in color gamut, resolution and so on. As an example, multiple kinds of type information can first be collected; the type information includes the code stream provider and the media metadata, where the media metadata includes the film genre, film type, film scene, color gamut of the code stream, resolution of the code stream, bit rate of the code stream, etc. Further, the training data corresponding to each kind of type information is obtained. As an example, the standard code stream is preprocessed to obtain a training code stream whose provider is Tencent, whose film genre is comedy, whose film type is movie, and whose film scene is outdoor; this standard code stream and this training code stream then form one group of training data, and the type information of the code streams corresponding to the neural network model trained on this group is: provider Tencent, film genre comedy, film type movie, film scene outdoor.
Table 1 shows the possible cases of the type information.
Table 1: Possible cases of the type information
[Table 1 appears as an image in the original publication (PCTCN2020128960-appb-000001); its contents are not reproduced in the text.]
It should be understood that the type information of a training code stream includes but is not limited to the cases shown in Table 1. In the embodiments of this application, the type information of a training code stream may be any one of the possible cases in Table 1, or a combination of several of them. For example, Table 2 shows four possible kinds of code stream type information.
Table 2: Four possible kinds of type information

                 Type info 1   Type info 2   Type info 3   Type info 4
  Provider       iQIYI         iQIYI         Youku         Youku
  Film genre     Comedy        Comedy        Romance       -
  Film type      Movie         Movie         TV series     Sports event
  Film scene     Indoor        Indoor        City          Outdoor
  Color gamut    BT601         BT709         BT2020        BT2020
  Resolution     High          Medium        Medium        Low
  Bit rate       High          High          Medium        Low
Correspondingly, constructing the multiple groups of training data includes:
constructing a first group of training data corresponding to the first type information, a second group corresponding to the second type information, a third group corresponding to the third type information, and a fourth group corresponding to the fourth type information, where the first group includes training code stream A, the second group includes training code stream B, the third group includes training code stream C, and the fourth group includes training code stream D. The standard code streams are preprocessed according to the type information to obtain the corresponding training code streams. As an example, Table 3 shows the corresponding four training code streams.
Table 3: Four possible training code streams

                 Stream A      Stream B      Stream C      Stream D
  Provider       iQIYI         iQIYI         Youku         Youku
  Film genre     Comedy        Comedy        Romance       -
  Film type      Movie         Movie         TV series     Sports event
  Film scene     Indoor        Indoor        City          Outdoor
  Color gamut    BT601         BT709         BT2020        BT2020
  Resolution     High          Medium        Medium        Low
  Bit rate       High          High          Medium        Low
Each group of training data includes a standard code stream and a training code stream, i.e. each training code stream has a standard code stream corresponding to it. Optionally, the standard code streams of different groups may be the same. For example, training code stream A and training code stream B may come from the same standard code stream put through different preprocessing, which leaves the training code streams with different color gamuts, resolutions and bit rates, but both correspond to the same standard code stream. Training code stream A, on the other hand, differs from training code streams C and D in provider and in film genre, type and scene, so their standard code streams are not the same.
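The grouping of training data by type information described above (and itemized in Tables 2 and 3) might be represented as follows. This is a sketch under our own assumptions; the field names and file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TypeInfo:
    """Type information of a code stream: provider plus media metadata."""
    provider: str
    genre: Optional[str]
    film_type: str
    scene: str
    gamut: str
    resolution: str
    bitrate: str

# One group of training data = (standard stream, training stream),
# keyed by the type information of the training code stream.
training_groups = {
    TypeInfo("iQIYI", "comedy", "movie", "indoor", "BT601", "high", "high"):
        ("standard_1.mp4", "train_A.mp4"),
    TypeInfo("iQIYI", "comedy", "movie", "indoor", "BT709", "medium", "high"):
        ("standard_1.mp4", "train_B.mp4"),  # same standard stream, different preprocessing
    TypeInfo("Youku", "romance", "tv_series", "city", "BT2020", "medium", "medium"):
        ("standard_2.mp4", "train_C.mp4"),
    TypeInfo("Youku", None, "sports_event", "outdoor", "BT2020", "low", "low"):
        ("standard_3.mp4", "train_D.mp4"),
}
```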
S402: Train multiple neural network models from the multiple groups of training data, where one group of training data yields one neural network model.
Specifically, a neural network model is built by setting initial parameters, which may be arbitrary. The parameters of the neural network model are adjusted to train it until the difference between the code stream obtained by processing the training code stream with the adjusted neural network model and the standard code stream satisfies a preset condition. For example, training code streams A, B, C and D and their corresponding standard code streams are each sent to the server, and the neural network models are then trained separately. For example, for training code stream A and its standard code stream 1, an initial neural network model a is built and its parameters are then adjusted; when the difference between the result of processing training code stream A with the neural network model and standard code stream 1 satisfies the preset condition, the neural network model a' corresponding to the first type information is obtained. Optionally, whether the difference between training code stream A and standard code stream 1 satisfies the preset condition can be evaluated with a suitable evaluation function. If the parameter-adjusted neural network model a' makes the difference between training code stream A and standard code stream 1 satisfy the preset condition, neural network model a' is stored in correspondence with the first type information of training code stream A. Correspondingly, for training code streams B, C and D, the corresponding neural network models b', c' and d' can be obtained by following the training method of neural network model a'.
An AI deep learning framework is used to implement the image-quality enhancement computation; through training, the image quality produced by the AI neural network gradually approaches that of the input high-quality image. After training, this result can be used to let the AI neural network improve the picture quality and resolution of images. However, images from different code stream providers, and different images after preprocessing, all differ; trying to obtain neural network models for different images with a single algorithm and training scheme would make the algorithm too complex to converge or would demand too much compute. The embodiments of this application train separately for the different types of code streams of different providers, avoiding convergence problems and saving training time. Multiple neural network models adapted to multiple different providers are obtained, and the corresponding model can then be selected adaptively according to the code stream type information, which also gives better results when enhancing the picture quality of a specific image.
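One possible realization of this per-group training, assuming PyTorch, a tiny placeholder convolutional architecture, MSE as the evaluation function, and an arbitrary threshold as the preset condition (none of which are specified by this application), is sketched below.

```python
import torch
import torch.nn as nn

def train_one_model(train_frames, standard_frames, threshold=1e-3, max_epochs=100):
    """Train one enhancement model for one group of training data.
    Both arguments are float tensors of shape (N, 3, H, W) holding
    decoded frames of the training and standard code streams."""
    model = nn.Sequential(                      # placeholder architecture
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1),
    )
    loss_fn = nn.MSELoss()                      # one possible evaluation function
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(max_epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(train_frames), standard_frames)
        loss.backward()
        optimizer.step()
        if loss.item() < threshold:             # "difference satisfies a preset condition"
            break
    return model

# One model per group, e.g. models[type_info] = train_one_model(...)
```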
The method 200 for obtaining a neural network model provided by an embodiment of this application further includes:
S203: Store the multiple neural network models in correspondence with the type information of the training data used to train them.
The trained neural network models and the first type information of their corresponding training code streams may be stored on the terminal device; when the media player of the corresponding code stream provider is downloaded to the terminal device, the corresponding stored neural network model can be invoked.
Optionally, a trained neural network model and the first type information of its corresponding training code stream may also be transmitted to the code stream provider; when the terminal device downloads that provider's media player, the terminal device can also download the corresponding neural network model parameters and build the neural network model to process the code streams supplied by that provider.
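S203 can be made concrete as follows: a minimal sketch that saves each model's parameters and writes a type-information-to-weights index. The on-disk layout, file names, and use of torch.save are our own assumptions, and TypeInfo/asdict refer to the earlier training-data sketch.

```python
import json
import torch
from dataclasses import asdict

def store_models(models, index_path="model_index.json"):
    """Persist each trained model and record the type-info-to-weights
    mapping, i.e. one concrete form of the first correspondence (S203)."""
    index = []
    for i, (type_info, model) in enumerate(models.items()):
        weight_file = f"model_{i}.pt"
        torch.save(model.state_dict(), weight_file)
        index.append({"type_info": asdict(type_info), "weights": weight_file})
    with open(index_path, "w") as f:
        json.dump(index, f, indent=2)
```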
FIG. 5 is a flowchart of a method 500 for processing code stream data provided by an embodiment of this application. The method shown in FIG. 5 includes steps 501 to 504, which are described in detail below.
S501: Obtain a first code stream.
The code stream may be any image that a player obtains from some code stream provider.
S502: Determine a first neural network model according to a first correspondence and the type information of the first code stream, where the first neural network model is the neural network model corresponding to the first code stream, and the first correspondence indicates the neural network models corresponding to code streams of different types.
The type information includes the provider of the first code stream, media metadata and so on, where the media metadata includes any one or more of the film type, film genre, film scene, color gamut of the code stream, resolution of the code stream, or bit rate of the code stream. Optionally, the type information of the first code stream includes but is not limited to the possible cases shown in Table 1.
After the type information of the first code stream is obtained, the neural network model matching the first code stream is selected from the previously trained results according to the first correspondence. The first correspondence indicates the neural network models corresponding to code streams of different types, i.e. the type information of the training code streams targeted by the selected neural network model is the same as the type information of the first code stream. For example, Table 4 shows the correspondence between code streams with different type information and neural network models.
Table 4: Correspondence between code streams with different type information and neural network models
[Table 4 appears as an image in the original publication (PCTCN2020128960-appb-000002); its contents are not reproduced in the text.]
It should be understood that Table 4 shows the correspondence between code streams with different type information and neural network models only in part. According to the type information of the first code stream and a correspondence such as the one shown in Table 4, the first neural network model corresponding to the first code stream can be obtained. As an example, if the type information of the obtained code stream is: provider iQIYI, film genre comedy, film type movie, color gamut BT709, resolution low, bit rate medium, then the neural network model corresponding to this code stream is neural network model c'.
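As a sketch of this lookup (assuming the index file written in the storage sketch above; the dictionary keys are hypothetical), S502 reduces to matching the incoming stream's type information against the stored first correspondence:

```python
import json

def select_model(index_path, stream_type_info):
    """Look up, in the stored first correspondence, the weights of the
    model whose type information matches the incoming code stream (S502).
    Returns None when no match exists, so a basic setting can be loaded."""
    with open(index_path) as f:
        index = json.load(f)
    for entry in index:
        if entry["type_info"] == stream_type_info:
            return entry["weights"]
    return None

weights = select_model("model_index.json", {
    "provider": "iQIYI", "genre": "comedy", "film_type": "movie",
    "scene": "indoor", "gamut": "BT709", "resolution": "low",
    "bitrate": "medium",
})
```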
After the first code stream and the first neural network model corresponding to it are obtained, the image playback method provided by this embodiment of the application further includes:
S503: Parse the first code stream to obtain first image data.
The obtained first code stream is decoded to obtain the image. It should be understood that, in the embodiments of this application, the method of decoding the code stream to obtain the image may be any existing feasible method of parsing a code stream to obtain an image, thereby obtaining the first image data.
S504: Process the first image data with the first neural network model to obtain a processed image.
The first image data is processed according to the determined first neural network model to achieve image-quality optimization.
FIG. 6 shows a schematic flowchart of a player processing a code stream with a neural network model during image playback.
The code stream provider sends the first code stream and its corresponding type information to the terminal device, which includes a player and an image generator. The first code stream may be any image sent by the provider. If the pre-trained neural network models are preset on the terminal device, the first neural network model corresponding to the first code stream can be obtained directly from the first correspondence. It should be understood that the terminal device stores a neural network model resource pool containing multiple neural network models; after obtaining the type information of the code stream, the terminal device obtains from the first correspondence the neural network model corresponding to that type information. Optionally, if the pre-trained neural network models are stored at the code stream provider, the flow further includes: the player sends feedback information to the provider. The feedback information may be the type information of the first code stream received by the player. On receiving the feedback information sent by the player, the provider sends, according to the first correspondence, the parameters of the first neural network model corresponding to the first code stream to the player, and the player can build the first neural network model from the obtained parameters. Optionally, the feedback information may also include the computing power of the player; after receiving feedback information containing the player's computing power, the provider can send the player the parameters of a neural network model suited to that computing power. The player decodes the obtained code stream to get the image source and, at the same time, configures the neural network model in the image generator according to the parameters of the model corresponding to the code stream's type information, i.e. loads the selected neural network model corresponding to the code stream's type information into the image generator, and then feeds the image source to the image generator. The image generator processes the image source according to the neural network model, i.e. the neural network model runs on the image generator to process the input image data, and finally outputs the optimized image.
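The image-generator side of this flow might look like the following sketch, which reuses the placeholder architecture from the training sketch; the function names and the frame-tensor format are assumptions of ours, not the patent's implementation.

```python
import torch
import torch.nn as nn

def build_model():
    """Rebuild the (assumed) architecture used in the training sketch."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1),
    )

def enhance_frames(weight_file, decoded_frames):
    """Image-generator side of FIG. 6: load the model selected for the
    code stream, then run every decoded frame through it (S503/S504).
    decoded_frames: iterable of (3, H, W) float tensors."""
    model = build_model()
    model.load_state_dict(torch.load(weight_file))
    model.eval()
    with torch.no_grad():
        return [model(f.unsqueeze(0)).squeeze(0) for f in decoded_frames]
```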
Optionally, the pre-trained neural network models may be preset in the player of the terminal device, for example stored in a storage unit of the terminal device, or returned to the code stream provider; when the terminal device downloads the media player of the corresponding provider, it can also download the corresponding neural network model parameters and build the neural network model to process the code streams supplied by that provider.
Optionally, if no neural network model matching the code stream can be found among the pre-trained models, a basic setting can be loaded. The basic setting may be a manually specified neural network model or any neural network model obtained during training; the embodiments of this application do not limit this.
FIG. 7 shows a schematic flowchart of updating the neural network model during image playback.
As shown in FIG. 7, the player first configures the first neural network model according to the type information of the code stream and then processes the code stream with that model. During playback, if at least one of the color gamut, resolution or bit rate of the code stream, or the type, genre or scene of the film, changes, the player obtains the neural network model matching the changed type information and then processes the changed code stream with that matching model. If none of the color gamut, resolution or bit rate of the code stream, or the type, genre or scene of the film, changes, the first neural network model continues to be used to process the code stream.
Optionally, the embodiments of this application can be combined with dynamic adaptive streaming to guarantee both the smoothness and the high quality of the image watched by the viewer. The essence of dynamic adaptive streaming is that the player monitors or reports the terminal's network connection rate and proactively fetches a lower-resolution code stream before a drop in the connection rate affects the user experience, sacrificing picture quality for smooth playback. Combined with the method of the embodiments of this application, when dynamic adaptive streaming fetches a lower-resolution code stream, the neural network model matching that lower-resolution code stream is determined, and the images corresponding to the current code stream are then processed with that matching model to improve picture quality. This guarantees both smooth and high-quality playback.
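Combining FIG. 7 with the sketches above, mid-playback model switching can be illustrated as follows; `segments`, the fallback helper, and the frame format are our own assumptions (select_model and build_model come from the earlier sketches).

```python
import torch

def load_or_default(weights):
    """Build the model for the given weights; with no match (None),
    fall back to the basic setting, here an untrained default model."""
    model = build_model()                 # from the FIG. 6 sketch above
    if weights is not None:
        model.load_state_dict(torch.load(weights))
    model.eval()
    return model

def playback_loop(segments, index_path="model_index.json"):
    """`segments` yields (type_info, frames) pairs whose type information
    may change mid-playback, e.g. when adaptive streaming lowers the
    resolution; the model is switched whenever the type info changes."""
    current_info, model = None, None
    for type_info, frames in segments:
        if type_info != current_info:     # type information changed
            model = load_or_default(select_model(index_path, type_info))
            current_info = type_info
        with torch.no_grad():
            for frame in frames:          # frame: (3, H, W) float tensor
                yield model(frame.unsqueeze(0)).squeeze(0)
```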
FIG. 8 shows an apparatus 800 for obtaining a neural network model provided by an embodiment of this application. The apparatus 800 for obtaining a neural network model shown in FIG. 8 (the apparatus 800 may specifically be a computer device) includes a memory 801, a processor 802, a communication interface 803 and a bus 804, where the memory 801, the processor 802 and the communication interface 803 are communicatively connected to one another through the bus 804.
The memory 801 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 801 may store a program; when the program stored in the memory 801 is executed by the processor 802, the processor 802 and the communication interface 803 are used to perform the steps of the method 200 for obtaining a neural network model of the embodiments of this application.
The processor 802 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the functions to be performed by the units in the apparatus for obtaining a neural network model of the embodiments of this application, or to perform the method for obtaining a neural network model of the method embodiments of this application.
The processor 802 may also be an integrated circuit chip with signal processing capability. In implementation, the steps of the method for obtaining a neural network model of this application may be completed by integrated logic circuits of hardware in the processor 802 or by instructions in the form of software. The processor 802 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be embodied directly as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 801; the processor 802 reads the information in the memory 801 and, in combination with its hardware, implements the functions to be performed by the units included in the apparatus for obtaining a neural network model of the embodiments of this application, or performs the method for obtaining a neural network model of the method embodiments of this application.
The communication interface 803 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the apparatus 800 and other devices or communication networks. For example, the training data (such as the training code streams in the embodiments of this application) can be obtained through the communication interface 803.
The bus 804 may include a path for transferring information between the components of the apparatus 800 (for example, the memory 801, the processor 802 and the communication interface 803).
FIG. 9 is a schematic diagram of the hardware structure of an apparatus for processing code stream data provided by an embodiment of this application. The image playback apparatus 900 shown in FIG. 9 (the apparatus 900 may specifically be a computer device) includes a memory 901, a processor 902, a communication interface 903 and a bus 904, where the memory 901, the processor 902 and the communication interface 903 are communicatively connected to one another through the bus 904.
The memory 901 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 901 may store a program; when the program stored in the memory 901 is executed by the processor 902, the processor 902 and the communication interface 903 are used to perform the steps of the method 500 for processing code stream data of the embodiments of this application.
The processor 902 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the functions to be performed by the units in the image playback apparatus of the embodiments of this application, or to perform the image playback method of the method embodiments of this application.
The processor 902 may also be an integrated circuit chip with signal processing capability. In implementation, the steps of the image playback method of this application may be completed by integrated logic circuits of hardware in the processor 902 or by instructions in the form of software. The processor 902 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be embodied directly as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 901; the processor 902 reads the information in the memory 901 and, in combination with its hardware, implements the functions to be performed by the units included in the image playback apparatus of the embodiments of this application, or performs the method for processing code stream data of the method embodiments of this application.
The communication interface 903 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the apparatus 900 and other devices or communication networks. For example, the neural network model or the code stream to be processed can be obtained through the communication interface 903.
The bus 904 may include a path for transferring information between the components of the apparatus 900 (for example, the memory 901, the processor 902 and the communication interface 903).
It should be noted that, although the apparatuses 800 and 900 shown in FIG. 8 and FIG. 9 show only a memory, a processor and a communication interface, in specific implementation a person skilled in the art should understand that the apparatuses 800 and 900 also include other components necessary for normal operation. Meanwhile, depending on specific needs, a person skilled in the art should understand that the apparatuses 800 and 900 may also include hardware components implementing other additional functions. In addition, a person skilled in the art should understand that the apparatuses 800 and 900 may also include only the components necessary for implementing the embodiments of this application, and need not include all the components shown in FIG. 8 or FIG. 9.
A person of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (22)

  1. A method for processing code stream data, comprising:
    obtaining a first code stream;
    determining a first neural network model according to a first correspondence and type information of the first code stream, wherein the first neural network model is a neural network model corresponding to the first code stream, and the first correspondence is used to indicate neural network models corresponding to code streams of different types;
    parsing the first code stream to obtain first image data; and
    processing the first image data by using the first neural network model, to obtain a processed image.
  2. The method according to claim 1, wherein the type information of the code stream comprises a provider of the code stream and media metadata of the code stream.
  3. The method according to claim 2, wherein the media metadata comprises any one or more of a film genre, a film type, a film scene, a color gamut of the code stream, a resolution of the code stream, or a bit rate of the code stream.
  4. The method according to any one of claims 1 to 3, wherein the first neural network model is trained from a training code stream having the same type information as the first code stream.
  5. The method according to any one of claims 1 to 4, wherein the first correspondence comprises a correspondence between type information of code streams and neural network models, and determining the first neural network model according to the first correspondence and the type information of the first code stream specifically comprises:
    determining the provider of the first code stream and the media metadata of the first code stream; and
    determining, in the first correspondence, a neural network model corresponding to both the provider of the first code stream and the media metadata of the first code stream as the first neural network model.
  6. A method for obtaining a neural network model, comprising:
    obtaining multiple groups of training data, wherein each group of training data comprises a standard code stream and a training code stream, the training code streams comprised in the multiple groups of training data correspond to different type information, and the training code streams are obtained by preprocessing the standard code streams;
    training multiple initial neural network models separately with the multiple groups of training data, to obtain multiple neural network models, wherein one group of training data is used to train one neural network model; and
    storing the multiple neural network models in correspondence with multiple pieces of type information of the training data used to train the multiple neural network models.
  7. The method according to claim 6, wherein the multiple groups of training data comprise a first group of training data, the first group of training data comprises a first standard code stream and a first training code stream, the type information of the first training code stream is first type information, and the training of the multiple initial neural network models separately with the multiple groups of training data to obtain the multiple neural network models comprises:
    training an initial neural network model according to the first standard code stream and the first training code stream; and
    obtaining a first neural network model when a difference between a code stream obtained by processing the first training code stream with the neural network model and the first standard code stream satisfies a preset condition; and
    the storing of the multiple neural network models in correspondence with the multiple pieces of type information of the training data used to train the multiple neural network models comprises:
    storing the first neural network model in correspondence with the first type information.
  8. The method according to claim 6 or 7, wherein the type information comprises a provider of the training code stream and media metadata of the training code stream.
  9. The method according to claim 8, wherein the media metadata comprises any one or more of a film genre, a film type, a film scene, a color gamut of the code stream, a resolution of the code stream, or a bit rate of the code stream.
  10. The method according to any one of claims 6 to 9, wherein after the storing of the multiple neural network models in correspondence with the multiple pieces of type information of the training data used to train the multiple neural network models, the method further comprises:
    transmitting a neural network model and the type information corresponding to the neural network model to the code stream provider comprised in the type information.
  11. An apparatus for processing code stream data, comprising a processor and a transmission interface, wherein
    the transmission interface is configured to obtain a first code stream; and
    the processor is configured to perform the following steps:
    determining a first neural network model according to a first correspondence and type information of the first code stream, wherein the first neural network model is a neural network model corresponding to the first code stream, and the first correspondence is used to indicate neural network models corresponding to code streams of different types;
    parsing the first code stream to obtain first image data; and
    processing the first image data by using the first neural network model, to obtain a processed image.
  12. The apparatus according to claim 11, wherein the type information of the code stream comprises a provider of the code stream and media metadata of the code stream.
  13. The apparatus according to claim 12, wherein the media metadata comprises any one or more of a film genre, a film type, a film scene, a color gamut of the code stream, a resolution of the code stream, or a bit rate of the code stream.
  14. The apparatus according to any one of claims 11 to 13, wherein the first neural network model is trained from a training code stream having the same type information as the first code stream.
  15. The apparatus according to any one of claims 11 to 14, wherein the first correspondence comprises a correspondence between type information of code streams and neural network models, and the processor is specifically configured to:
    determine the provider of the first code stream and the media metadata of the first code stream; and
    determine, in the first correspondence, a neural network model corresponding to both the provider of the first code stream and the media metadata of the first code stream as the first neural network model.
  16. An apparatus for obtaining a neural network model, comprising a processor and a transmission interface, wherein
    the transmission interface is configured to obtain multiple groups of training data, wherein each group of training data comprises a standard code stream and a training code stream, the training code streams comprised in the multiple groups of training data correspond to different type information, and the training code streams are obtained by preprocessing the standard code streams; and
    the processor is configured to perform the following steps:
    training multiple initial neural network models separately with the multiple groups of training data, to obtain multiple neural network models, wherein one group of training data is used to train one neural network model; and
    storing the multiple neural network models in correspondence with multiple pieces of type information of the training data used to train the multiple neural network models in a memory.
  17. The apparatus according to claim 16, wherein the multiple groups of training data comprise a first group of training data, the first group of training data comprises a first standard code stream and a first training code stream, and the processor is specifically configured to:
    train an initial neural network model according to the first standard code stream and the first training code stream;
    obtain a first neural network model when a difference between a code stream obtained by processing the first training code stream with the neural network model and the first standard code stream satisfies a preset condition; and
    store the first neural network model in correspondence with the first type information in the memory.
  18. The apparatus according to claim 16 or 17, wherein the type information comprises a provider of the training code stream and media metadata of the training code stream.
  19. The apparatus according to claim 18, wherein the media metadata comprises any one or more of a film genre, a film type, a film scene, a color gamut of the code stream, a resolution of the code stream, or a bit rate of the code stream.
  20. The apparatus according to any one of claims 16 to 19, wherein the transmission interface is further configured to:
    transmit a neural network model and the type information corresponding to the neural network model to the code stream provider comprised in the type information.
  21. An electronic device, comprising the apparatus according to any one of claims 11 to 15 and/or the apparatus according to any one of claims 16 to 20.
  22. A computer-readable storage medium, wherein the computer-readable storage medium stores program instructions that, when run on a computer or a processor, cause the computer or the processor to perform the method according to any one of claims 1-5 or 6-10.
PCT/CN2020/128960 2019-12-06 2020-11-16 Method and apparatus for processing code stream data WO2021109846A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911237986.3 2019-12-06
CN201911237986.3A 2019-12-06 Method and apparatus for processing code stream data

Publications (1)

Publication Number Publication Date
WO2021109846A1 true WO2021109846A1 (zh) 2021-06-10

Family

ID=76161317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128960 WO2021109846A1 (zh) 2019-12-06 2020-11-16 码流数据的处理方法和装置

Country Status (2)

Country Link
CN (1) CN112929703A (zh)
WO (1) WO2021109846A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116170426A (zh) * 2021-11-23 2023-05-26 Huawei Technologies Co., Ltd. Method and apparatus for data transmission

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107172428A (zh) * 2017-06-06 2017-09-15 Image transmission method, apparatus and system
CN107197260A (zh) * 2017-06-12 2017-09-22 Video coding post-filtering method based on a convolutional neural network
CN108965920A (zh) * 2018-08-08 2018-12-07 Video content splitting method and apparatus
CN110166828A (zh) * 2019-02-19 2019-08-23 Video processing method and apparatus
WO2019197715A1 (en) * 2018-04-09 2019-10-17 Nokia Technologies Oy An apparatus, a method and a computer program for running a neural network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106534942A (zh) * 2016-11-04 2017-03-22 Video picture quality adjustment method and apparatus
CN108462876B (zh) * 2018-01-19 2021-01-26 Video decoding optimization adjustment apparatus and method
CN108712674A (zh) * 2018-05-17 2018-10-26 Video playback control method, playback device and storage medium
CN109769143A (zh) * 2019-02-03 2019-05-17 Video image processing method and apparatus, video system, device and storage medium
CN110009048B (zh) * 2019-04-10 2021-08-24 Neural network model construction method and device
CN110532871B (zh) * 2019-07-24 2022-05-10 Image processing method and apparatus
KR20190110965A (ko) * 2019-09-11 2019-10-01 Method and apparatus for enhancing image resolution

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107172428A (zh) * 2017-06-06 2017-09-15 Image transmission method, apparatus and system
CN107197260A (zh) * 2017-06-12 2017-09-22 Video coding post-filtering method based on a convolutional neural network
WO2019197715A1 (en) * 2018-04-09 2019-10-17 Nokia Technologies Oy An apparatus, a method and a computer program for running a neural network
CN108965920A (zh) * 2018-08-08 2018-12-07 Video content splitting method and apparatus
CN110166828A (zh) * 2019-02-19 2019-08-23 Video processing method and apparatus

Also Published As

Publication number Publication date
CN112929703A (zh) 2021-06-08

Similar Documents

Publication Publication Date Title
CN107147942B (zh) Video signal transmission method, device, apparatus and storage medium
US10839565B1 (en) Decoding apparatus and operating method of the same, and artificial intelligence (AI) up-scaling apparatus and operating method of the same
WO2019210822A1 (zh) Video encoding and decoding method, apparatus, system and storage medium
JP2019193269A (ja) Transmission device, transmission method, reception device and reception method
WO2020135357A1 (zh) Data compression method and apparatus, and data encoding/decoding method and apparatus
KR20220020367A (ko) Image processing method and apparatus
KR20210028654A (ko) Method and apparatus for processing a medium dynamic range video signal in SL-HDR2 format
JP6980054B2 (ja) Method and apparatus for processing image data
CN112686810A (zh) Image processing method and apparatus
US20170279866A1 (en) Adaptation of streaming data based on the environment at a receiver
WO2021109846A1 (zh) Method and apparatus for processing code stream data
CN108737877B (zh) Image processing method and apparatus, and terminal device
US11659223B2 (en) System, device and method for displaying display-dependent media files
WO2021168624A1 (zh) Video image encoding method, device and movable platform
JP2018507618A (ja) Method and apparatus for encoding and decoding color pictures
US11792359B2 (en) Efficient electro-optical transfer function (EOTF) curve for standard dynamic range (SDR) content
JP2019097013A (ja) Method and device for reconstructing a display-adapted HDR image
KR102589858B1 (ko) Decoding apparatus and operating method thereof, and AI up-scaling apparatus and operating method thereof
CN113747099B (zh) Video transmission method and device
CN117149123A (zh) Data processing method, apparatus and electronic device
KR20170032605A (ko) Video signal decoding method and apparatus using transmission of image color component sampling position information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20897591

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20897591

Country of ref document: EP

Kind code of ref document: A1