WO2020062074A1 - Reconstructing distorted images using convolutional neural network - Google Patents

Publication number
WO2020062074A1
Authority
WO
WIPO (PCT)
Prior art keywords
filter
cnn
image
side information
distorted image
Prior art date
Application number
PCT/CN2018/108441
Other languages
French (fr)
Inventor
Jiabao YAO
Xiaoyang Wu
Xiaodan Song
Li Wang
Original Assignee
Hangzhou Hikvision Digital Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co., Ltd. filed Critical Hangzhou Hikvision Digital Technology Co., Ltd.
Priority to PCT/CN2018/108441
Publication of WO2020062074A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80: Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation

Definitions

  • This disclosure relates to reconstructing distorted images.
  • CNN convolutional neural network
  • A CNN uses artificial neurons that each respond to a part of the surrounding cells within their coverage area, which improves performance for large-scale image processing.
  • the present disclosure describes reconstructing distorted images.
  • a computer-implemented method implemented by a video codec includes receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, wherein the type of filter is selected by using a first convolutional neural network (CNN) ; and using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
  • a first feature combinable with any of the following features, where the image data represents a portion of the at least one distorted image.
  • A second feature, combinable with any of the previous or following features, where the plurality of filter types comprises a deblocking (DBK) type, a sample adaptive offset (SAO) type, an Adaptive Loop Filter (ALF) type, or a CNN type.
  • a third feature combinable with any of the previous or following features, where the selected type of filter is the CNN type.
  • a fourth feature combinable with any of the previous or following features, where the type of filter is selected further based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  • a fifth feature combinable with any of the previous or following features, further comprising: generating controlling coefficients to adjust weights or biases of the filter by using a second CNN.
  • a sixth feature combinable with any of the previous or following features, where the controlling coefficients adjust more than one convolution kernel in a same channel with a same value.
  • a seventh feature combinable with any of the previous or following features, where the controlling coefficients adjust different convolution kernels in a same channel with different values.
  • an eighth feature combinable with any of the previous or following features, where the controlling coefficients are generated based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  • a ninth feature combinable with any of the previous or following features, where the controlling coefficients are generated based on a preconfigured computation boundary or a target quality factor.
  • a tenth feature combinable with any of the previous or following features, where the controlling coefficients are used to determine whether to omit a convolutional layer in generating a reconstructed image.
  • An eleventh feature combinable with any of the previous or following features, where the image data comprises data for at least one of a luminance component or a color component.
  • a computer-implemented method includes receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, a layer path of a convolutional neural network (CNN) filter based on the image data, wherein the layer path of the CNN filter is selected by using a first CNN; and generating, by the at least one processor, a reconstructed image corresponding to the at least one distorted image by using the selected layer path of the CNN filter.
  • a first feature combinable with any of the following features, where the image data represents a portion of the at least one distorted image.
  • a third feature combinable with any of the previous or following features, where the layer path is selected based on a preconfigured computation boundary or a target quality factor.
  • a fourth feature combinable with any of the previous or following features, further comprising: generating controlling coefficients to adjust weights or biases of the CNN filter by using a second CNN.
  • a fifth feature combinable with any of the previous or following features, where the controlling coefficients are generated based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  • a sixth feature combinable with any of the previous or following features, where the image data comprises data for at least one of a luminance component or a color component.
  • a computer-readable medium storing computer instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to perform operations including: receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, wherein the type of filter is selected by using a first convolutional neural network (CNN); and using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
  • the previously described implementation is implementable using a computer-implemented method; a non-transitory, computer-readable medium storing computer-readable instructions to perform the computer-implemented method; and a computer-implemented system comprising a computer memory interoperably coupled with a hardware processor configured to perform the computer-implemented method and the instructions stored on the non-transitory, computer-readable medium.
  • FIG. 1 is an example communication system 100 that reconstructs distorted images, according to an implementation.
  • FIG. 2 is a flow diagram illustrating an example process for reconstructing images, according to an implementation.
  • FIG. 3 is a schematic diagram illustrating using a CNN to select a filter to reconstruct a distorted image, according to an implementation.
  • FIG. 4 is a schematic diagram illustrating using a CNN to generate coefficients of a filter used to reconstruct a distorted image, according to an implementation.
  • FIG. 5 includes schematic diagrams that illustrate different levels of controlling coefficients, according to an implementation.
  • FIG. 6 is a flowchart illustrating an example method for reconstructing an image, according to an implementation.
  • FIG. 7 is a block diagram of an example computer system used to provide computational functionalities associated with described algorithms, methods, functions, processes, flows, and procedures, as described in the instant disclosure, according to an implementation.
  • FIG. 8 is a schematic diagram illustrating an example structure of an electronic circuit that reconstructs images as described in the present disclosure, according to an implementation.
  • FIG. 9 is a schematic diagram illustrating the construct of an example CNN filter that reconstructs an image, according to an implementation.
  • FIG. 10 is a flowchart illustrating another example method for reconstructing an image, according to an implementation.
  • CNN simplifies the complexity because it can reduce or avoid complex pre-processing of the image and can directly use the original image for end-to-end learning. Furthermore, traditional neural networks are fully connected, meaning that every input neuron is connected to every hidden neuron. Such configurations result in a large number of parameters, complicating the training process and consuming a large amount of computation resources. By using local connections and weight sharing, CNN saves computation resources and improves computation efficiency.
  • an image de-distortion filter can be used to post-process the distorted images to reconstruct the images by, for example, restoring the pixel intensity offset and reducing visual loss.
  • Examples of such de-distortion filters include deblocking (DBK) filters.
  • filter coefficients and structures can be adjusted based on statistical information of local image regions.
  • adaptive filters include Sample Adaptive Offset (SAO) filter and Adaptive Loop Filter (ALF) .
  • For these adaptive filters, different filter parameters (e.g., filter coefficients) corresponding to different local statistical information are included in the software code so that they can be chosen based on the local statistical information for different images. As a result, the size and complexity of the software code for these adaptive filters increase.
  • deep learning technology can be applied to image processing.
  • Using CNN filters to reconstruct distorted images can provide a significant improvement in the quality of the reconstructed images.
  • Such an approach can reduce or avoid the process of image pre-processing and manually designing filter coefficients. It learns the image distortion characteristics and compensation methods through data-driven training, and is simpler to use, more general, and more accurate. The improvement is more pronounced when applied to image and video compression that incorporates multiple distortions.
  • Some CNN filters, e.g., the Variable-filter-size Residue-learning Convolutional Neural Network (VRCNN), have been used for this purpose.
  • However, conventional convolutional neural network filters such as VRCNN only provide a single high-dimensional filter and do not include a decision-making filtering process for different texture inputs. This may limit the generalization ability of these filters and cause performance loss.
  • the process of reconstructing distorted images can be improved by incorporating a CNN based decision-making filtering process.
  • the decision-making filtering process can be used to select filters, determine filter coefficients, or a combination thereof.
  • a side information guide map can be generated based on distortion parameters, and used as an input to the decision-making filtering process. FIGS. 1-8 and associated descriptions provide additional details of these implementations.
  • FIG. 1 is an example communication system 100 that reconstructs distorted images, according to an implementation.
  • the example communication system 100 includes an electronic device 102 and a server 120, that are communicatively coupled with a network 110.
  • the server 120 represents an application, a set of applications, software, software modules, hardware, or any combination thereof that can be configured to provide trained CNN models.
  • the server 120 can perform training of the CNN models based on a set of training data.
  • the server 120 can receive data from different electronic devices that perform image processing to construct additional sets of training data.
  • the electronic device 102 represents an electronic device that can reconstruct distorted images.
  • the electronic device 102 can be a video codec or include a video codec.
  • the video codec can perform an image reconstruction process during encoding and decoding operations of video images.
  • the electronic device 102 can be a graphics-processing unit (GPU) or include a GPU that reconstructs distorted image data.
  • the electronic device 102 can use a first CNN to select a filter based on the distorted image.
  • the electronic device 102 can use a second CNN to generate controlling coefficients based on the distorted image, and use the controlling coefficients to adjust the weights of the filter.
  • the electronic device 102 can use the filter to reconstruct the distorted image.
  • the electronic device 102 can generate a side information guide map and use the side information guide map as additional input to the first and the second CNN. In some cases, the electronic device 102 can also receive parameters of trained CNN models from the server 120. FIGS. 2-8 and associated descriptions provide additional details of these implementations.
  • the electronic device 102 may include, without limitation, any of the following: endpoint, computing device, mobile device, mobile electronic device, user device, mobile station, subscriber station, portable electronic device, mobile communications device, wireless modem, wireless terminal, or other electronic device.
  • an endpoint may include an IoT (Internet of Things) device, EoT (Enterprise of Things) device, cellular phone, personal data assistant (PDA) , smart phone, laptop, tablet, personal computer (PC) , pager, portable computer, portable gaming device, wearable electronic device, health/medical/fitness device, camera, vehicle, or other mobile communications devices having components for communicating voice or data via a wireless communication network.
  • the wireless communication network may include a wireless link over at least one of a licensed spectrum and an unlicensed spectrum.
  • the term “mobile device” can also refer to any hardware or software component that can terminate a communication session for a user.
  • the terms “user equipment,” “UE,” “user equipment device,” “user agent,” “UA,” “user device,” and “mobile device” can be used interchangeably herein.
  • the example communication system 100 includes the network 110.
  • the network 110 represents an application, set of applications, software, software modules, hardware, or combination thereof, that can be configured to transmit data messages between the entities in the system 100.
  • the network 110 can include a wireless network, a wireline network, the Internet, or a combination thereof.
  • the network 110 can include one or a plurality of radio access networks (RANs) , core networks (CNs) , and the Internet.
  • the RANs may comprise one or more radio access technologies.
  • the radio access technologies may be Global System for Mobile communication (GSM), Interim Standard 95 (IS-95), Universal Mobile Telecommunications System (UMTS), CDMA2000 (Code Division Multiple Access), Evolved Universal Mobile Telecommunications System (E-UMTS), Long Term Evolution (LTE), LTE-Advanced, the fifth generation (5G), or any other radio access technologies.
  • the core networks may be evolved packet cores (EPCs) .
  • While elements of FIG. 1 are shown as including various component parts, portions, or modules that implement the various features and functionality, these elements may instead include a number of sub-modules, third-party services, components, libraries, and such, as appropriate. Furthermore, the features and functionality of various components can be combined into fewer components, as appropriate.
  • FIG. 2 is a flow diagram illustrating an example process 200 for reconstructing images, according to an implementation.
  • the process 200 can be performed by an electronic device that reconstructs images, e.g., the electronic device 102 as illustrated in FIG. 1.
  • process 200 may be performed, for example, by any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate.
  • various steps of process 200 can be run in parallel, in combination, in loops, or in any order.
  • a prediction process is performed.
  • input data is compared with a predicted frame to generate residual data.
  • the predicted frame is generated based on one or more prediction models.
  • the prediction models include temporal prediction models and spatial prediction models.
  • the temporal prediction models provide inter-frame predictions.
  • the spatial prediction models provide intra-frame predictions.
  • the prediction process can be performed continuously.
  • the input data represents the raw image data of the n-th frame.
  • a predicted frame for the n-th frame can be generated by applying the prediction models on a reference frame of the previous frame, i.e., the (n-1) -th frame.
  • the reference frame can be generated based on information of multiple previous frames, e.g., the (n-1)-th frame, the (n-2)-th frame, etc.
  • the input data of the n-th frame is compared with the predicted frame for the n-th frame to generate residual data of the n-th frame.
  • the residual data of the n-th frame is processed through other steps in FIG. 2, e.g., transformation, quantization, and entropy encoding to generate output data of the n-th frame.
  • the residual data is also processed through reverse quantization and transformation and a filtering process to generate a reference frame of the n-th frame.
  • the reference frame of the n-th frame will be used to generate the predicted frame for the next frame, i.e., the (n+1) -th frame, which is used for encoding the (n+1) -th frame.
  • the input data can be provided in different formats.
  • the input data can be provided in the YCrCb format.
  • the input data includes the luminance component Y, and the color components Cr and Cb.
  • Other formats e.g., RGB, can also be used.
  • the residual data is transformed.
  • transformation techniques that can be used in this step include the Karhunen-Loeve Transform (KLT), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), etc.
  • the transformed data is quantized.
  • quantization techniques that can be used in this step include scalar quantizers, vector quantizers, etc.
  • the transformed data can have a large range of values. Quantization can reduce the range of values, thereby obtaining a better compression effect.
  • the quantization is a main factor that causes image distortion.
  • the quantization process is configured by one or more quantization parameters, which can be used to configure the degree of quantization. The following equation represents an example calculation of the quantization process:
  • QP represents a quantization parameter.
  • QP can be an integer value between 0 and 51.
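  • The quantization equation itself is not reproduced in this text. As a point of reference only, a typical relation in H.264/HEVC-style codecs (an assumption here, not necessarily the exact equation of this disclosure) is Q_step = 2^{(QP - 4)/6} and level = round(c / Q_step), where c is a transform coefficient; a larger QP therefore gives a larger quantization step and more distortion.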
  • entropy encoding is performed on the quantized data.
  • the inputs to the entropy encoder can include the quantized data generated at step 206.
  • the inputs to the entropy encoder can also include motion vectors, headers of the raw image of the current frame, supplementary information, side information, and any combinations thereof.
  • the entropy encoder converts these inputs to an encoded frame, and outputs the encoded frame as output data for storage or transmission. In some cases, reordering can also be performed on the quantized data prior to the entropy encoding.
  • the quantized data is also used to generate a reference frame for the next frame.
  • a reverse quantization and transformation process is performed on the quantized data to generate image data of a distorted image.
  • the reverse quantization and transformation process is implemented using techniques that correspond to those used in the transformation and quantization process. For example, if the quantization process uses vector scaling, the reverse quantization process uses vector rescaling; if the transformation process uses DCT, the reverse transformation process uses the Inverse DCT (IDCT).
  • the quantized data after being reverse quantized and transformed, can be added to the predicted frame used in the prediction step (e.g., step 202) to generate the image data of the distorted image.
  • the distorted image can include multiple portions.
  • the distorted image can be divided into multiple pieces. Accordingly, multiple groups of image data can be generated. Each group represents one or more portions of the distorted images.
  • each group of image data can be processed separately in the following steps.
  • the distorted image discussed in the subsequent steps refers to the group of image data corresponding to the portion of the distorted image that is processed together.
  • one group of image data can represent the entire distorted image and can be processed together.
  • the distorted image discussed in the subsequent steps refers to the image data corresponding to the entire distorted image.
  • image data of more than one distorted image can be processed together in selecting the filter and generating controlling coefficients. In these cases, the distorted image discussed in the subsequent steps refers to the image data corresponding to multiple distorted images.
  • the image data of the distorted image can be grouped into different components. These components can include luminance components and color components.
  • the distorted image can be represented in a YCrCb format, and the image data can have the Y, Cr, and Cb components. In these cases, different components of the image data can be processed separately or jointly.
  • the quantization parameter is used as input to generate a side information guide map.
  • side information of an image refers to prior knowledge that can be used to assist the processing of one or more images.
  • conventional CNN filters such as VRCNN may rely on side information from the sensor that produces the image to determine suitable filter coefficients for the image, and thus their performance may suffer if such side information is not available.
  • a side information guide map can be generated based on the quantization parameter used in the quantization step. Because the quantization parameter indicates the extent of distortion introduced in the encoding process, using the quantization parameter to generate a side information guide map can provide additional input to the CNN to improve performance.
  • the side information guide map can be generated by two steps: obtaining distortion parameters and normalization.
  • the distortion parameters can be generated based on the quantization parameter discussed previously.
  • the distortion parameters can also be generated based on other information that indicates the extent of image distortion, such as block size and sampling parameters used in the encoding process, or the resolution restoration factor in a high-resolution image restoration process.
  • the side information guide map can have the same dimensions (e.g., width and height) as the portion of the distorted image represented by the image data.
  • Each pixel on the side information guide map can be represented by a distortion parameter that indicates the degree of distortion for the corresponding pixel in the distorted image.
  • the distortion parameter can be obtained based on the side information discussed previously, e.g., the quantization parameters or the resolution restoration factor.
  • the information regarding the degree of distortion may not be readily available.
  • an image may be subjected to multiple digital image processing such as scaling and compression, which may not be known by the electronic device that performs the image reconstruction process.
  • non-reference image evaluation techniques can be used to determine the distortion parameter for the side information guide map.
  • fuzzy degree evaluation can be used as such a non-reference image evaluation. The following equation represents an example fuzzy degree evaluation technique:
  • f(x, y) represents the value of the pixel at coordinate (x, y)
  • D(f) can represent the fuzzy degree in some circumstances.
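  • The evaluation equation itself is not reproduced in this text. One common gradient-based form, given here only as an assumption and not necessarily the exact formula of this disclosure, is D(f) = Σ_x Σ_y [ (f(x+1, y) - f(x, y))^2 + (f(x, y+1) - f(x, y))^2 ], where a smaller gradient energy D(f) indicates a blurrier (fuzzier) image.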
  • the distortion parameters can be normalized so that the range of values of each distortion parameter in the guide map is consistent with the range of values of the corresponding pixel in the distorted image.
  • the range of values for the quantization parameters is [QP_MIN, QP_MAX]
  • the range of values for pixels in the distorted image is [PIXEL_MIN, PIXEL_MAX] .
  • the normalized quantization parameters norm (x) can be obtained using the following equation:
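  • The normalization equation is not reproduced in this text. A straightforward linear (min-max) mapping consistent with the ranges defined above, given here as an assumption, is norm(x) = (x - QP_MIN) / (QP_MAX - QP_MIN) × (PIXEL_MAX - PIXEL_MIN) + PIXEL_MIN.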
  • a first CNN is used to select a type of a filter.
  • different filters can be used as candidate filters for reconstructing the images.
  • the type of the filter selected by the first CNN can be a non-CNN type, such as DBK, SAO, ALF, or a CNN type.
  • a CNN filter can outperform the non-CNN filters.
  • a non-CNN filter can outperform the CNN filters.
  • Using a CNN to select an appropriate filter can improve the performance of the image reconstruction process.
  • the first CNN can be used to select more than one filter.
  • the candidate filters can include more than one type of CNN filter. These CNN filters can have different constructs (e.g., layers) or filter parameters (e.g., weights or biases).
  • the candidate filters may not include non-CNN filters, e.g., non-CNN filters may not be included for image reconstruction that is not part of a video coding process.
  • the first CNN can be used to select one or more CNN filters among the different CNN filters.
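  • As a rough illustration only, the following sketch shows how a small CNN could produce a filter-type decision from a distorted block and its side information guide map. All names, layer sizes, and the pooling-plus-linear decision head are assumptions for illustration; the actual selection network (the first CNN 310) is described with FIG. 3 below.

    import torch
    import torch.nn as nn

    FILTER_TYPES = ["DBK", "SAO", "ALF", "CNN"]  # candidate filter types

    class FilterSelectionCNN(nn.Module):
        def __init__(self, in_channels=2, num_types=len(FILTER_TYPES)):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),              # pool features to one value per channel
            )
            self.classifier = nn.Linear(32, num_types)

        def forward(self, distorted, guide_map):
            x = torch.cat([distorted, guide_map], dim=1)   # channel merging
            x = self.features(x).flatten(1)
            return self.classifier(x)                      # one score per filter type

    # Example: pick the filter type with the highest score for a 64x64 luma block.
    model = FilterSelectionCNN()
    distorted = torch.rand(1, 1, 64, 64)
    guide_map = torch.full((1, 1, 64, 64), 0.5)            # normalized QP guide map
    selected = FILTER_TYPES[model(distorted, guide_map).argmax(dim=1).item()]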
  • FIG. 3 is a schematic diagram 300 illustrating using a CNN to select a filter to reconstruct a distorted image, according to an implementation.
  • the diagram 300 includes a first CNN 310 and a first trained CNN model 320.
  • the first CNN 310 represents a CNN that is configured to select a filter to reconstruct a distorted image.
  • the first CNN 310 includes an input layer 312, a hidden layer 314, and an output layer 316.
  • the first CNN 310 can be implemented by using any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate.
  • the input layer 312 is configured to receive input data to the first CNN 310.
  • the input data can include the distorted image, as discussed previously. In some cases, if the side information guide map can be generated as discussed previously, the side information guide map can also be included as input data. If the side information guide map is not available, e.g., due to lack of information regarding the degree of distortion, the first CNN 310 can proceed without the side information guide map.
  • features of the distorted image can also be used as input to the first CNN 310.
  • these features include linear features or non-linear features.
  • Linear features, also referred to as textural features, include features generated by linear transformations, e.g., gradient features.
  • Non-linear features include features generated by non-linear transformations, e.g., frequency features generated by Fourier transformation such as Fast Fourier Transform (FFT) , wavelet features generated by Wavelet Transform (WT) , or features generated by non-linear activations.
  • the input layer 312 can perform a channel merging operation and a convolution filtering operation.
  • the channel merging operation combines the side information guide map with the distorted image for each channel to generate a combined input data, represented as I.
  • the convolution filtering operation performs a convolutional filtering on the combined input data I, as illustrated in the following equation:
  • W_1 represents the weighting coefficients of the convolutional filter used in the input layer 312
  • B_1 represents the bias coefficients of the convolutional filter
  • g() represents a non-linear mapping function
  • F_1(I) represents the output of the input layer 312.
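  • The equation itself is not reproduced in this text; consistent with the symbol definitions above, its intended form is presumably F_1(I) = g(W_1 * I + B_1), i.e., a convolution with W_1, addition of the bias B_1, and a non-linear mapping g().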
  • each convolutional filter can have a kernel of size c1 × f1 × f1, where c1 represents the number of channels in the input data, and f1 represents the spatial size of each kernel.
  • the hidden layer 314 performs additional high-dimensional mapping on the sparse representations of image segments extracted by the input layer 312.
  • the hidden layer 314 includes one or more convolution layers.
  • F_i(I) = g(W_i * F_{i-1}(I) + B_i), i ∈ {2, 3, …, N}
  • F_i(I) represents the output of the i-th convolutional layer
  • W_i represents the weighting coefficients of the convolutional filter used in the i-th convolutional layer
  • B_i represents the bias coefficients of the convolutional filter used in the i-th convolutional layer
  • g() represents a non-linear mapping function
  • each convolutional filter can have a kernel of size c2 × f2 × f2, where c2 represents the number of channels of the input data to the convolutional layer, and f2 represents the spatial size of each kernel.
  • the output layer 316 processes the high-dimensional image output from the hidden layer 314 and generates the filter selection decisions.
  • the selected filter can be a CNN filter or a non-CNN filter, such as a DBK, ALF, or SAO filter. In some cases, more than one type of filter can be selected.
  • the output layer 316 can further perform a convolutional operation on the distorted image to generate the reconstruction image.
  • the output layer 316 can include one reconstruction layer.
  • the operation of the reconstruction layer can be represented by the following equation:
  • F(I) represents the output of the reconstruction layer
  • F_{N-1}(I) represents the output of the hidden layer 314
  • W_N represents the weighting coefficients of the convolutional filter used in the reconstruction layer
  • B_N represents the bias coefficients of the convolutional filter used in the reconstruction layer.
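  • The equation itself is not reproduced in this text; consistent with the symbol definitions above, its presumable form is F(I) = W_N * F_{N-1}(I) + B_N (the reconstruction layer is typically linear, i.e., without the non-linear mapping g(); this is an assumption rather than a statement of the disclosure).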
  • each convolutional filter can have a kernel of size cN × fN × fN, where cN represents the number of channels of the input data to the reconstruction layer, and fN represents the spatial size of each kernel.
  • the first CNN 310 uses the first trained CNN model 320 to make filter selection decisions.
  • the parameters related to the network structure of the first CNN 310 including e.g., the number of convolutional layers, concatenation of convolutional layers, the number of convolution filters per convolutional layer, and the size of the convolution kernel, can be fixed, while the filter coefficients, e.g., the weighting coefficients and bias coefficients, can be configured based on the first trained CNN model 320.
  • the parameter set of the first CNN 310 can be stored on the electronic device that performs the image reconstruction process.
  • the parameter set can be downloaded or updated from a server that performs training based on data collected from multiple electronic devices.
  • training for the first CNN 310 can be performed in the following steps:
  • side information guide maps are generated for a large number of undistorted natural images, based on different noise sources.
  • Corresponding distorted images can also be generated to form a training set.
  • the parameters of the CNN are initialized as θ_0, and the training-related hyperparameters, such as the learning rate and the weight-updating algorithm, are set.
  • the loss function can be adjusted to improve the converging process.
  • the third and fourth steps (the forward computation of the loss and the backward updating of the parameters) are repeated until the loss function converges, at which point the final parameter θ_final is output.
  • the final parameter θ_final can then be used as the first trained CNN model 320 to configure the first CNN 310.
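  • A minimal sketch of such a training loop, assuming a PyTorch-style setup and an MSE loss (the exact network structure, loss function, and weight-updating algorithm of the disclosure may differ):

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    def train(model, loader, epochs=10, lr=1e-4):
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # weight-updating algorithm
        loss_fn = nn.MSELoss()                                     # example loss function
        for _ in range(epochs):
            for inputs, targets in loader:
                outputs = model(inputs)            # forward calculation
                loss = loss_fn(outputs, targets)
                optimizer.zero_grad()
                loss.backward()                    # backward calculation of gradients
                optimizer.step()                   # parameter update
        return model.state_dict()                  # the final parameters (theta_final)

    # Toy usage: inputs stack a distorted image with its guide map; targets are undistorted images.
    inputs = torch.rand(8, 2, 32, 32)
    targets = torch.rand(8, 1, 32, 32)
    loader = DataLoader(TensorDataset(inputs, targets), batch_size=4)
    model = nn.Conv2d(2, 1, kernel_size=3, padding=1)   # stand-in for the CNN being trained
    theta_final = train(model, loader, epochs=2)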
  • If a non-CNN filter (e.g., SAO, ALF, or DBK) is selected, the process 200 proceeds from 222 to 230, where the selected non-CNN filter is used in the filtering process to generate the reference frame.
  • If a CNN filter is selected, the process 200 proceeds from 222 to 224, where filter coefficients of the CNN filter are generated.
  • a second CNN is used to determine a set of controlling coefficients based on the distorted image, the side information guide map, or a combination thereof.
  • the controlling coefficients can be used to adjust the configured weights of the CNN filter to generate the filter coefficients for the CNN filter to be used in the filtering process.
  • FIG. 4 is a schematic diagram 400 illustrating using a CNN to generate coefficients of a filter used to reconstruct a distorted image, according to an implementation.
  • the diagram 400 includes a second CNN 410 and a second trained CNN model 420.
  • the second CNN 410 represents a CNN that is configured to generate controlling coefficients that can be used to adjust the configured filter weights or filter biases of the filter.
  • the second CNN 410 includes an input layer 412, a hidden layer 414, and an output layer 416.
  • the second CNN 410 can be implemented by using any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate.
  • the input layer 412 is configured to receive input data to the second CNN 410.
  • the input data can include the distorted image, as discussed previously.
  • if the side information guide map can be generated as discussed previously, the side information guide map can also be included as input data. If the side information guide map is not available, e.g., due to lack of information regarding the degree of distortion, the second CNN 410 can proceed without it.
  • features of the distorted image can also be used as input to the second CNN 410. Examples of these features include linear features or non-linear features discussed previously.
  • a preconfigured computation boundary or a target quality factor can also be used as input to the second CNN 410 to generate the controlling coefficient.
  • the target quality factor can be a Peak Signal to Noise Ratio (PSNR) value. The following equation represents an example PSNR calculation:
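  • The equation itself is not reproduced in this text; the standard PSNR definition is PSNR = 10 · log_10(MAX_I^2 / MSE), where MSE is the mean squared error between the reconstructed image and the original image, and MAX_I is the maximum possible pixel value (e.g., 255 for 8-bit images).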
  • the input layer 412 can perform a channel merging operation and a convolution filtering operation.
  • the channel merging operation combines the side information guide map with the distorted image for each channel to generate a combined input data, represented as I.
  • the convolution filtering operation performs a convolutional filtering on the combined input data I, as illustrated in the following equation (of the same form as for the input layer 312):
  • W_1 represents the weighting coefficients of the convolutional filter used in the input layer 412
  • B_1 represents the bias coefficients of the convolutional filter
  • g() represents a non-linear mapping function
  • F_1(I) represents the output of the input layer 412.
  • each convolutional filter can have a kernel of size c1 × f1 × f1, where c1 represents the number of channels in the input data, and f1 represents the spatial size of each kernel.
  • the input data can also be extracted from the feature map of a convolutional layer in the CNN filter.
  • the hidden layer 414 performs additional high-dimensional mapping on the sparse representations of image segments extracted by the input layer 412.
  • the hidden layer 414 includes one or more convolution layers.
  • F_i(I) = g(W_i * F_{i-1}(I) + B_i), i ∈ {2, 3, …, N}
  • F_i(I) represents the output of the i-th convolutional layer
  • W_i represents the weighting coefficients of the convolutional filter used in the i-th convolutional layer
  • B_i represents the bias coefficients of the convolutional filter used in the i-th convolutional layer
  • g() represents a non-linear mapping function
  • the outputted controlling coefficients are used to adjust the configured CNN filter weights.
  • the adjustment can be performed by a multiplication operation.
  • the controlling coefficients can be multiplied with the configured filter weights to generate the coefficients of the CNN filter.
  • Other operations e.g., addition or convolution, can also be used to perform the adjustment.
  • the coefficients can be controlled at a channel level or a pixel level.
  • FIG. 5 includes schematic diagrams 510 and 520 that illustrate different levels of controlling coefficients, according to an implementation.
  • the schematic diagram 510 illustrates controlling coefficients at a pixel level, according to an implementation.
  • F_{i-j} represents the j-th feature map of the i-th convolutional layer.
  • Cf_{i-j} represents a set of controlling coefficients that are applied to different convolution kernels in the j-th feature map of the i-th convolutional layer.
  • F_{i-1}, F_{i-2}, and F_{i-3} represent the first, second, and third feature maps of the i-th convolutional layer, respectively.
  • Cf_{i-1}, Cf_{i-2}, and Cf_{i-3} represent the sets of controlling coefficients for the convolution kernels in these feature maps, respectively.
  • the dimension of the controlling coefficients is the same as the dimension of the feature map. Accordingly, different controlling coefficients in the set of controlling coefficients can be applied to different convolution kernels in the same channel of the CNN filter.
  • the schematic diagram 520 illustrates controlling coefficients at a channel level, according to an implementation.
  • the same controlling coefficient is applied to different convolution kernels in the feature map.
  • S_{i-1} represents the controlling coefficient that is applied to all the convolution kernels in the first feature map of the i-th convolutional layer.
  • S_{i-2} and S_{i-3} represent the controlling coefficients that are applied to all the convolution kernels in the second and third feature maps of the i-th convolutional layer, respectively.
  • the level of the controlling coefficients can be configured to be the same for the whole CNN filter, or configured differently for different feature maps of the CNN filter.
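  • As a rough illustration only (names and sizes are assumptions, and the disclosure applies the coefficients to the configured filter weights rather than directly to the outputs), the following sketch contrasts channel-level and pixel-level controlling coefficients by scaling a layer's output feature maps:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=16, out_channels=3, kernel_size=3, padding=1)
    feature_in = torch.rand(1, 16, 64, 64)
    feature_maps = conv(feature_in)                 # three feature maps F_{i-1}, F_{i-2}, F_{i-3}

    # Channel level (diagram 520): one coefficient S_{i-j} per feature map, applied uniformly.
    s = torch.tensor([0.9, 1.1, 1.0]).view(1, 3, 1, 1)
    channel_level = feature_maps * s

    # Pixel level (diagram 510): a coefficient map Cf_{i-j} with the same spatial dimensions
    # as the feature map, so different positions of the same channel get different coefficients.
    cf = torch.rand(1, 3, 64, 64)
    pixel_level = feature_maps * cf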
  • each convolutional filter can have a kernel of size c2 × f2 × f2, where c2 represents the number of channels of the input data to the convolutional layer, and f2 represents the spatial size of each kernel.
  • the output layer 416 processes the high-dimensional image output from the hidden layer 414 and generates the controlling coefficients.
  • the second CNN 410 uses the second trained CNN model 420 to generate the controlling coefficients. Similarly to the first CNN 310, the parameters related to the network structure, the filter coefficients, or any combinations thereof can be configured based on the second trained CNN model 420.
  • the parameter set of the second CNN 410 can be stored on the electronic device that performs the image reconstruction process. Alternatively or additionally, the parameter set can be downloaded or updated from a server.
  • training for the second CNN 410 can be performed by constructing a training set, initializing the parameter θ_0, and performing forward and backward calculations in an iterative process.
  • the training for the second CNN 410 can be performed jointly with or separately from the first CNN 310.
  • the loss function for the controlling coefficient training can be represented as follows:
  • I_n represents the input data based on the combination of the side information guide map and the distorted image.
  • F(I_n | θ_i) represents the reconstructed image corresponding to the current parameter θ_i, and X_n represents the undistorted image.
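  • The loss function itself is not reproduced in this text; a mean-squared-error form consistent with the symbol definitions above, given here as an assumption, is L(θ_i) = (1/N) Σ_{n=1}^{N} ‖ F(I_n | θ_i) - X_n ‖^2.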
  • the distorted image is reconstructed using the selected filter, the generated filter coefficients, or a combination thereof.
  • the reconstructed image is used as the reference frame for the prediction operation of the next frame.
  • the controlling coefficients generated by the second CNN can also be used to simplify the reconstruction process. For example, if a control coefficient for a particular layer in the CNN filter is below a configured threshold, the particular layer can be skipped in the reconstruction operation. This approach can increase the processing speed and save computation resources.
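  • A minimal sketch of this layer-skipping idea, assuming per-layer controlling coefficients and a configured threshold (all names and values are illustrative assumptions):

    import torch
    import torch.nn as nn

    def run_cnn_filter(x, layers, controlling_coeffs, threshold=0.05):
        for layer, coeff in zip(layers, controlling_coeffs):
            if coeff < threshold:
                continue                            # omit this convolutional layer entirely
            x = torch.relu(layer(x)) * coeff        # otherwise apply the layer, scaled by its coefficient
        return x

    layers = nn.ModuleList([nn.Conv2d(1, 1, kernel_size=3, padding=1) for _ in range(4)])
    coeffs = [1.0, 0.01, 0.8, 0.9]                  # per-layer controlling coefficients (0.01 -> skipped)
    reconstructed = run_cnn_filter(torch.rand(1, 1, 64, 64), layers, coeffs)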
  • the quantization and prediction step discussed previously can be skipped.
  • the input image may be the distorted image that is used as the input to step 220, 224, and 230 discussed previously.
  • While the process 200 illustrates an example encoding process, other image reconstruction processes can also use CNNs to select filter types and generate controlling coefficients for CNN filters in a similar fashion.
  • In a decoding process, distorted images are generated based on encoded image data, and reconstructed images are generated based on the distorted images and reference frames.
  • a first CNN can be used to select a filter type used for the reconstruction process, and a second CNN can be used to generate controlling coefficients that are used to adjust filter weights of the reconstruction filter.
  • An image restoration can also be performed on distorted images that are generated or received in other processes.
  • FIG. 6 is a flowchart illustrating an example method 600 for reconstructing an image, according to an implementation.
  • the method 600 can be implemented by an electronic device shown in FIG. 1.
  • the method 600 can also be implemented using additional, fewer, or different entities.
  • the method 600 can also be implemented using additional, fewer, or different operations, which can be performed in the order shown or in a different order. In some instances, an operation or a group of operations can be iterated or repeated, for example, for a specified number of iterations or until a terminating condition is reached.
  • the example method 600 begins at 602, where image data of at least one distorted image is received.
  • at least one type of filter is selected from a plurality of filter types based on the image data.
  • the filter type selection is performed using a first CNN.
  • the plurality of filter types can include a deblocking (DBK) type, a sample adaptive offset (SAO) type, an Adaptive Loop Filter (ALF) type, or a CNN type.
  • the CNN type can be selected.
  • the filter is selected further based on a side information guide map of the distorted image or features of the distorted image.
  • controlling coefficients that adjust weights or biases of the reconstruction filter are generated by using a second CNN.
  • the second CNN uses the distorted image, the side information guide map of the distorted image, features of the distorted image, a preconfigured computation boundary, a target quality factor, or any combinations thereof to generate the controlling coefficients.
  • steps 604 and 606 can be performed together or separately.
  • a filter of the selected type is used to generate a reconstructed image corresponding to the distorted image.
  • the controlling coefficients are used to determine whether to omit a convolutional layer in generating a reconstructed image.
  • FIG. 9 is a schematic diagram 900 illustrating the construct of an example CNN filter 910 that reconstructs an image, according to an implementation. While the CNN filter 910 is illustrated as including 4 layers (represented horizontally) and 3 scale levels (represented vertically), additional layers and scale levels can be included in the CNN filter 910. As illustrated, each layer includes one or more features at each scale level, denoted as features 911, 921, 931 for layer 1; features 912, 922, 932 for layer 2; features 913, 923, 933 for layer 3; and features 914, 924, 934 for layer 4, respectively. As illustrated, the arrows represent convolutional (regular or strided) operations.
  • concatenations 941, 942, and 943 represent concatenation operations at scale level 1; concatenations 951, 952, 953, 954, and 955 represent concatenation operations at scale level 2; and concatenations 961, 962, 963, 964, 965, and 966 represent concatenation operations at scale level 3.
  • the diagram 900 includes a distorted image 902.
  • the distorted image 902 is a blurred image of a cat.
  • Different layer paths in the CNN filter 910 can be taken to process the image data of the distorted image 902 to obtain a reconstructed image.
  • the CNN filter 910 includes layer paths 920 and 930.
  • the layer path 920 takes the image data, performs a convolutional operation with features 911, then another convolutional operation with features 912, a concatenation operation at 941 (concatenating the convolutional operation outputs of features 911 and 912), another convolutional operation at 913 and another concatenation operation at 942, and then another convolutional operation at 914 and another concatenation operation at 943 to generate a reconstructed image as an output.
  • the layer path 930 takes concatenation operations 951, 952, 953, 954, 955, and 956 to generate a reconstructed image as an output. As illustrated, these concatenation operations concatenate the outputs of convolutional operations with features at scale level 1 and scale level 2. Therefore, while the layer path 930 may generate a different and better image than the layer path 920, the layer path 930 involves more computations and can therefore be more time and resource consuming.
  • a CNN can be used to select the layer path to process the image data of the distorted image.
  • the CNN can take input of the image data of the at least one distorted image, a side information guide map of the at least one distorted image, features of the at least one distorted image, or any combinations thereof, and output a selected layer path.
  • the input of the CNN can further include a preconfigured computation boundary or a target quality factor.
  • a different CNN can take input of the image data of the at least one distorted image, a side information guide map of the at least one distorted image, features of the at least one distorted image, a preconfigured computation boundary, a target quality factor or any combinations thereof, and generate controlling coefficients to adjust the weights or biases of the filtering operations (e.g., the convolutional operations) on the selected layer path.
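  • As a rough illustration only (the path structures, names, and selection mechanism below are assumptions, not the structures of FIG. 9), the following sketch shows how a score per candidate layer path could drive which path of the CNN filter is executed:

    import torch
    import torch.nn as nn

    # Two candidate layer paths through the CNN filter: a shallow, cheaper path and a deeper,
    # more expensive path. The structures are illustrative, not the structures of FIG. 9.
    paths = nn.ModuleList([
        nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1)),
        nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 3, padding=1)),
    ])

    def reconstruct(distorted, path_scores):
        # path_scores would come from the first CNN (the path-selection network), possibly taking
        # a computation boundary or target quality factor into account; here it is a given tensor.
        selected = int(path_scores.argmax())
        return paths[selected](distorted)           # only the selected layer path is executed

    output = reconstruct(torch.rand(1, 1, 64, 64), torch.tensor([0.2, 0.8]))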
  • FIG. 10 is a flowchart illustrating another example method 1000 for reconstructing an image, according to an implementation.
  • the method 1000 can be implemented by an electronic device shown in FIG. 1.
  • the method 1000 can also be implemented using additional, fewer, or different entities.
  • the method 1000 can also be implemented using additional, fewer, or different operations, which can be performed in the order shown or in a different order. In some instances, an operation or a group of operations can be iterated or repeated, for example, for a specified number of iterations or until a terminating condition is reached.
  • the example method 1000 begins at 1002, where image data of at least one distorted image is received.
  • a layer path of a convolutional neural network (CNN) filter is selected based on the image data, where the layer path of the CNN filter is selected by using a first CNN.
  • the selected layer path of the CNN filter is used to generate a reconstructed image corresponding to the distorted image.
  • FIG. 7 is a block diagram of an example computer system 700 used to provide computational functionalities associated with described algorithms, methods, functions, processes, flows, and procedures, as described in the instant disclosure, according to an implementation.
  • the computer system 700 or more than one computer system 700, can be used to implement the electronic device that reconstructs the image, and the server that trains and provides the CNN models.
  • the illustrated computer 702 is intended to encompass any computing device, such as a server, desktop computer, laptop/notebook computer, wireless data port, smart phone, personal data assistant (PDA) , tablet computing device, one or more processors within these devices, or any other suitable processing device, including physical or virtual instances (or both) of the computing device. Additionally, the computer 702 may comprise a computer that includes an input device, such as a keypad, keyboard, touch screen, or other device that can accept user information, and an output device that conveys information associated with the operation of the computer 702, including digital data, visual, or audio information (or a combination of information) , or a graphical user interface (GUI) .
  • the computer 702 can serve in a role as a client, network component, a server, a database or other persistency, or any other component (or a combination of roles) of a computer system for performing the subject matter described in the instant disclosure.
  • the illustrated computer 702 is communicably coupled with a network 730.
  • one or more components of the computer 702 may be configured to operate within environments, including cloud-computing-based, local, global, or other environment (or a combination of environments) .
  • the computer 702 is an electronic computing device operable to receive, transmit, process, store, or manage data and information associated with the described subject matter. According to some implementations, the computer 702 may also include, or be communicably coupled with, an application server, e-mail server, web server, caching server, streaming data server, or other server (or a combination of servers) .
  • the computer 702 can receive requests over network 730 from a client application (for example, executing on another computer 702) and respond to the received requests by processing the received requests using an appropriate software application (s) .
  • requests may also be sent to the computer 702 from internal users (for example, from a command console or by other appropriate access methods) , external or third-parties, other automated applications, as well as any other appropriate entities, individuals, systems, or computers.
  • Each of the components of the computer 702 can communicate using a system bus 703.
  • any or all of the components of the computer 702, hardware or software (or a combination of both hardware and software) may interface with each other or the interface 704 (or a combination of both) , over the system bus 703 using an application programming interface (API) 712 or a service layer 713 (or a combination of the API 712 and service layer 713) .
  • the API 712 may include specifications for routines, data structures, and object classes.
  • the API 712 may be either computer-language independent or dependent and refer to a complete interface, a single function, or even a set of APIs.
  • the service layer 713 provides software services to the computer 702 or other components (whether or not illustrated) that are communicably coupled to the computer 702.
  • the functionality of the computer 702 may be accessible for all service consumers using this service layer.
  • Software services, such as those provided by the service layer 713 provide reusable, defined functionalities through a defined interface.
  • the interface may be software written in JAVA, C++, or other suitable language providing data in extensible markup language (XML) format or other suitable formats.
  • alternative implementations may illustrate the API 712 or the service layer 713 as stand-alone components in relation to other components of the computer 702 or other components (whether or not illustrated) that are communicably coupled to the computer 702.
  • any or all parts of the API 712 or the service layer 713 may be implemented as child or sub-modules of another software module, enterprise application, or hardware module without departing from the scope of this disclosure.
  • the computer 702 includes an interface 704. Although illustrated as a single interface 704 in FIG. 7, two or more interfaces 704 may be used according to particular needs, desires, or particular implementations of the computer 702.
  • the interface 704 is used by the computer 702 for communicating with other systems that are connected to the network 730 (whether illustrated or not) in a distributed environment.
  • the interface 704 includes logic encoded in software or hardware (or a combination of software and hardware) and is operable to communicate with the network 730. More specifically, the interface 704 may include software supporting one or more communication protocols associated with communications such that the network 730 or interface’s hardware is operable to communicate physical signals within and outside of the illustrated computer 702.
  • the computer 702 includes a processor 705. Although illustrated as a single processor 705 in FIG. 7, two or more processors may be used according to particular needs, desires, or particular implementations of the computer 702. Generally, the processor 705 executes instructions and manipulates data to perform the operations of the computer 702 and any algorithms, methods, functions, processes, flows, and procedures as described in the instant disclosure.
  • the computer 702 also includes a database 706 that can hold data for the computer 702 or other components (or a combination of both) that can be connected to the network 730 (whether illustrated or not) .
  • database 706 can be an in-memory, conventional, or other type of database storing data consistent with this disclosure.
  • database 706 can be a combination of two or more different database types (for example, a hybrid in-memory and conventional database) according to particular needs, desires, or particular implementations of the computer 702 and the described functionality.
  • two or more databases can be used according to particular needs, desires, or particular implementations of the computer 702 and the described functionality.
  • database 706 is illustrated as an integral component of the computer 702, in alternative implementations, database 706 can be external to the computer 702.
  • the computer 702 also includes a memory 707 that can hold data for the computer 702 or other components (or a combination of both) that can be connected to the network 730 (whether illustrated or not) .
  • memory 707 can be Random Access Memory (RAM) , Read-Only Memory (ROM) , optical, magnetic, and the like, storing data consistent with this disclosure.
  • memory 707 can be a combination of two or more different types of memory (for example, a combination of RAM and magnetic storage) according to particular needs, desires, or particular implementations of the computer 702 and the described functionality. Although illustrated as a single memory 707 in FIG. 7, two or more memories 707 can be used according to particular needs, desires, or particular implementations of the computer 702 and the described functionality. While memory 707 is illustrated as an integral component of the computer 702, in alternative implementations, memory 707 can be external to the computer 702.
  • the application 708 is an algorithmic software engine providing functionality according to particular needs, desires, or particular implementations of the computer 702, particularly with respect to functionality described in this disclosure.
  • application 708 can serve as one or more components, modules, or applications.
  • the application 708 may be implemented as multiple applications 708 on the computer 702.
  • the application 708 can be external to the computer 702.
  • the computer 702 can also include a power supply 714.
  • the power supply 714 can include a rechargeable or non-rechargeable battery that can be configured to be either user- or non-user-replaceable.
  • the power supply 714 can include power-conversion or management circuits (including recharging, standby, or other power management functionality) .
  • the power supply 714 can include a power plug to allow the computer 702 to be plugged into a wall socket or other power source to, for example, power the computer 702 or recharge a rechargeable battery.
  • there may be any number of computers 702 associated with, or external to, a computer system containing computer 702, each computer 702 communicating over network 730.
  • the terms “client, ” “user, ” and other appropriate terminology may be used interchangeably, as appropriate, without departing from the scope of this disclosure.
  • this disclosure contemplates that many users may use one computer 702, or that one user may use multiple computers 702.
  • FIG. 8 is a schematic diagram illustrating an example structure of an electronic circuit 800 that reconstructs images as described in the present disclosure, according to an implementation.
  • the electronic circuit 800 can be a component or a functional block of a codec, e.g., a video codec.
  • the electronic circuit 800 can also be a component or a functional block of a graphic processing unit.
  • the electronic circuit 800 includes a receiving circuit 802, a filter selection circuit 804, a filter coefficient determination circuit 806, a storage circuit 808, and a processing circuit 810 that is coupled to, or capable of communicating with, the receiving circuit 802, the filter selection circuit 804, the filter coefficient determination circuit 806, and the storage circuit 808.
  • the electronic circuit 800 can further include one or more circuits for performing any one or a combination of steps described in the present disclosure. In some implementations, some or all of these component circuits can be combined into fewer components.
  • the receiving circuit 802 is configured to receive image data that represents a distorted image.
  • the filter selection circuit 804 is configured to select a type of filter from a plurality of filter types based on the image data by using a first CNN.
  • the processing circuit 810 is configured to use a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
  • the filter coefficient determination circuit 806 is configured to generate controlling coefficients to adjust weights of the filter by using a second CNN.
  • the storage circuit 808 is configured to store training models used by the first CNN and the second CNN.
  • Implementations of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Implementations of the subject matter described in this specification can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory, computer-readable computer-storage medium for execution by, or to control the operation of, data processing apparatus.
  • the program instructions can be encoded in/on an artificially generated propagated signal, for example, a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • the computer-storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of computer-storage mediums.
  • the terms “real-time, ” “real time, ” “realtime, ” “real (fast) time (RFT) , ” “near (ly) real-time (NRT) , ” “quasi real-time, ” or similar terms mean that an action and a response are temporally proximate such that an individual perceives the action and the response occurring substantially simultaneously.
  • the time difference for a response to display (or for an initiation of a display) of data following the individual’s action to access the data may be less than 1 ms, less than 1 sec., or less than 5 secs.
  • the term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example, a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can also be or further include special purpose logic circuitry, for example, a Central Processing Unit (CPU) , a Field Programmable Gate Array (FPGA) , or an Application-specific Integrated Circuit (ASIC) .
  • the data processing apparatus or special purpose logic circuitry may be hardware- or software-based (or a combination of both hardware- and software-based) .
  • the apparatus can optionally include code that creates an execution environment for computer programs, for example, code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of execution environments.
  • the present disclosure contemplates the use of data processing apparatuses with or without conventional operating systems, for example LINUX, UNIX, WINDOWS, MAC OS, ANDROID, IOS, or any other suitable conventional operating system.
  • a computer program which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program may, but need not, correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data, for example, one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, for example, files that store one or more modules, sub-programs, or portions of code.
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. While portions of the programs illustrated in the various figures are shown as individual modules that implement the various features and functionality through various objects, methods, or other processes, the programs may instead include a number of sub-modules, third-party services, components, libraries, and such, as appropriate. Conversely, the features and functionality of various components can be combined into single components, as appropriate. Thresholds used to make computational determinations can be statically, dynamically, or both statically and dynamically determined.
  • the methods, processes, or logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
  • the methods, processes, or logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, for example, a CPU, an FPGA, or an ASIC.
  • Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors, both, or any other kind of CPU.
  • a CPU will receive instructions and data from a ROM or a Random Access Memory (RAM) , or both.
  • the essential elements of a computer are a CPU, for performing or executing instructions, and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to, receive data from or transfer data to, or both, one or more mass storage devices for storing data, for example, magnetic, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, for example, a mobile telephone, a Personal Digital Assistant (PDA) , a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, for example, a Universal Serial Bus (USB) flash drive, to name just a few.
  • Computer-readable media suitable for storing computer program instructions and data includes non-volatile memory, media and memory devices, including by way of example, semiconductor memory devices, for example, Erasable Programmable Read-Only Memory (EPROM) , Electrically Erasable Programmable Read-Only Memory (EEPROM) , and flash memory devices; magnetic disks, for example, internal hard disks or removable disks; magneto-optical disks; and CD-ROM, DVD+/-R, DVD-RAM, and DVD-ROM disks.
  • the memory may store various objects or data, including caches, classes, frameworks, applications, backup data, jobs, web pages, web page templates, database tables, repositories storing dynamic information, and any other appropriate information including any parameters, variables, algorithms, instructions, rules, constraints, or references thereto. Additionally, the memory may include any other appropriate data, such as logs, policies, security or access data, reporting files, as well as others.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • implementations of the subject matter described in this specification can be implemented on a computer having a display device, for example, a Cathode Ray Tube (CRT) , Liquid Crystal Display (LCD) , Light Emitting Diode (LED) , or plasma monitor, for displaying information to the user and a keyboard and a pointing device, for example, a mouse, trackball, or trackpad by which the user can provide input to the computer.
  • Input may also be provided to the computer using a touchscreen, such as a tablet computer surface with pressure sensitivity, a multi-touch screen using capacitive or electric sensing, or another type of touchscreen.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s client device in response to requests received from the web browser.
  • the term “graphical user interface, ” or “GUI, ” may be used in the singular or the plural to describe one or more graphical user interfaces and each of the displays of a particular graphical user interface. Therefore, a GUI may represent any graphical user interface, including but not limited to, a web browser, a touch screen, or a Command Line Interface (CLI) that processes information and efficiently presents the information results to the user.
  • a GUI may include a plurality of User Interface (UI) elements, some or all associated with a web browser, such as interactive fields, pull-down lists, and buttons. These and other UI elements may be related to or represent the functions of the web browser.
  • Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, for example, as a data server, or that includes a middleware component, for example, an application server, or that includes a front-end component, for example, a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of wireline or wireless digital data communication (or a combination of data communication) , for example, a communication network.
  • Examples of communication networks include a Local Area Network (LAN) , a Radio Access Network (RAN) , a Metropolitan Area Network (MAN) , a Wide Area Network (WAN) , Worldwide Interoperability for Microwave Access (WIMAX) , a Wireless Local Area Network (WLAN) using, for example, 802.11 a/b/g/n or 802.20 (or a combination of 802.11x and 802.20 or other protocols consistent with this disclosure) , all or a portion of the Internet, or any other communication system or systems at one or more locations (or a combination of communication networks) .
  • the network may communicate with, for example, Internet Protocol (IP) packets, Frame Relay frames, Asynchronous Transfer Mode (ATM) cells, voice, video, data, or other suitable information (or a combination of communication types) between network addresses.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • any claimed implementation is considered to be applicable to at least a computer-implemented method; a non-transitory, computer-readable medium storing computer-readable instructions to perform the computer-implemented method; and a computer system comprising a computer memory interoperably coupled with a hardware processor configured to perform the computer-implemented method or the instructions stored on the non-transitory, computer-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A computer-implemented method for reconstructing digital images includes receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, wherein the type of filter is selected by using a first convolutional neural network (CNN); and using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.

Description

[Title established by the ISA under Rule 37.2] RECONSTRUCTING DISTORTED IMAGES USING CONVOLUTIONAL NEURAL NETWORK
TECHNICAL FIELD
This disclosure relates to reconstructing distorted images.
BACKGROUND
In the context of imaging processing technology, convolutional neural network (CNN) refers to a class of deep learning, feed-forward artificial neural networks, which can be used to analyze visual imagery. CNN uses artificial neurons that respond to a part of the surrounding cells within the coverage area to improve performance for large-scale image processing.
SUMMARY
The present disclosure describes reconstructing distorted images.
In a first example implementation, a computer-implemented method implemented by a video codec includes receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, wherein the type of filter is selected by using a first convolutional neural network (CNN) ; and using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
The foregoing and other described implementations can each, optionally, include one or more of the following features:
A first feature, combinable with any of the following features, where the image data represents a portion of the at least one distorted image.
A second feature, combinable with any of the previous or following features, where the plurality of filter types comprises a deblocking (DBK) type, a sample adaptive offset (SAO) type, an Adaptive Loop Filter (ALF) type, or a CNN type.
A third feature, combinable with any of the previous or following features, where the selected type of filter is the CNN type.
A fourth feature, combinable with any of the previous or following features, where the type of filter is selected further based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side  information comprises a side information guide map.
A fifth feature, combinable with any of the previous or following features, further comprising: generating controlling coefficients to adjust weights or biases of the filter by using a second CNN.
A sixth feature, combinable with any of the previous or following features, where the controlling coefficients adjust more than one convolution kernel in a same channel with a same value.
A seventh feature, combinable with any of the previous or following features, where the controlling coefficients adjust different convolution kernels in a same channel with different values.
An eighth feature, combinable with any of the previous or following features, where the controlling coefficients are generated based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
A ninth feature, combinable with any of the previous or following features, where the controlling coefficients are generated based on a preconfigured computation boundary or a target quality factor.
A tenth feature, combinable with any of the previous or following features, where the controlling coefficients are used to determine whether to omit a convolutional layer in generating a reconstructed image.
An eleventh feature, combinable with any of the previous or following features, where the image data comprises data for at least one of a luminance component or a color component.
In a second example implementation, a computer-implemented method includes receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, a layer path of a convolutional neural network (CNN) filter based on the image data, wherein the layer path of the CNN filter is selected by using a first CNN; and generating, by the at least one processor, a reconstructed image corresponding to the at least one distorted image by using the selected layer path of the CNN filter.
The foregoing and other described implementations can each, optionally, include one or more of the following features:
A first feature, combinable with any of the following features, where the image data represents a portion of the at least one distorted image.
A second feature, combinable with any of the previous or following features, where the layer path is selected further based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
A third feature, combinable with any of the previous or following features, where the layer path is selected based on a preconfigured computation boundary or a target quality factor.
A fourth feature, combinable with any of the previous or following features, further comprising: generating controlling coefficients to adjust weights or biases of the CNN filter by using a second CNN.
A fifth feature, combinable with any of the previous or following features, where the controlling coefficients are generated based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
A sixth feature, combinable with any of the previous or following features, where the image data comprises data for at least one of a luminance component or a color component.
In a third example implementation, a computer-readable medium storing computer instructions, that when executed by one or more hardware processors, cause the one or more hardware processors to perform operations including: receiving, by at least one processor, image data of at least one distorted image; selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, wherein the type of filter is selected by using a first convolutional neural network (CNN) ; and using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
The previously described implementation is implementable using a computer-implemented method; a non-transitory, computer-readable medium storing computer-readable instructions to perform the computer-implemented method; and a computer-implemented system comprising a computer memory interoperably coupled with a hardware processor configured to perform the computer-implemented method and the instructions stored on the non-transitory, computer-readable medium.
The details of one or more implementations of the subject matter of this specification are set forth in the accompanying drawings and the description. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
DESCRIPTION OF DRAWINGS
FIG. 1 is an example communication system 100 that reconstructs distorted images, according to an implementation.
FIG. 2 is a flow diagram illustrating an example process for reconstructing images, according to an implementation.
FIG. 3 is a schematic diagram illustrating using a CNN to select a filter to reconstruct a distorted image, according to an implementation.
FIG. 4 is a schematic diagram illustrating using a CNN to generate coefficients of a filter used to reconstruct a distorted image, according to an implementation.
FIG. 5 includes schematic diagrams that illustrate different levels of controlling coefficients, according to an implementation.
FIG. 6 is a flowchart illustrating an example method for reconstructing an image, according to an implementation.
FIG. 7 is a block diagram of an example computer system used to provide computational functionalities associated with described algorithms, methods, functions, processes, flows, and procedures, as described in the instant disclosure, according to an implementation.
FIG. 8 is a schematic diagram illustrating an example structure of an electronic circuit that reconstructs images as described in the present disclosure, according to an implementation.
FIG. 9 is a schematic diagram illustrating the construct of an example CNN filter that reconstructs an image, according to an implementation.
FIG. 10 is a flowchart illustrating another example method for reconstructing an image, according to an implementation.
Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
The following detailed description describes reconstructing distorted images using a convolutional neural network and is presented to enable any person skilled in the art to make and use the disclosed subject matter in the context of one or more particular implementations.
Various modifications, alterations, and permutations of the disclosed implementations can be made and will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other implementations and applications, without departing from the scope of the disclosure. In some instances, details unnecessary to obtain an understanding of the described subject matter may be omitted so as to not obscure one or more described implementations with unnecessary detail, inasmuch as such details are within the skill of one of ordinary skill in the art. The present disclosure is not intended to be limited to the described or illustrated implementations, but to be accorded the widest scope consistent with the described principles and features.
Compared to traditional image processing algorithms, a CNN reduces complexity because it can reduce or avoid complex pre-processing of the image and can directly use the original image for end-to-end learning. Furthermore, traditional neural networks are fully connected, which means that input-to-hidden neurons are all connected. Such configurations result in a large number of parameters, thus complicating the training process and consuming a large amount of computation resources. By using local connections and weight sharing, a CNN saves computation resources and improves computation efficiency.
In digital image processing, filtering, data rounding, quantization, or any other processing steps may cause pixel values to shift in intensity, which results in distorted images that may have visual obstructions or artifacts. In some implementations, an image de-distortion filter can be used to post-process the distorted images to reconstruct the images by, for example, restoring the pixel intensity offset and reducing visual loss.
Traditional image reconstruction filters can be designed based on experiments and experience. Examples of these types of filters include Deblocking (DBK) filters, which may be used in video encoding and decoding processes and may have fixed filter coefficients and structures. Alternatively, filter coefficients and structures can be adjusted based on statistical information of local image regions. Examples of these types of adaptive filters include Sample Adaptive Offset (SAO) filter and Adaptive Loop Filter (ALF) . In adaptive filters, different filter parameters, e.g., filter coefficients, corresponding to different local statistical information are  included in the software code so that they can be chosen based on the local statistical information for different images. As a result, the size and complexity of the software code for these adaptive filters would increase.
In some implementations, deep learning technology can be applied to image processing. Using CNN filters to reconstruct distorted images can provide a significant improvement in the quality of the reconstructed images. Such an approach can reduce or avoid image pre-processing and the manual design of filter coefficients. A CNN filter learns the image distortion characteristics and compensation methods through data-driven training, and is simpler to use, more generalizable, and more accurate. The improvement is more pronounced when applied to image and video compression that incorporates multiple distortions.
Some CNN filters, e.g., the Variable-filter-size Residue-learning Convolutional Neural Network (VRCNN) , use different filter weights and filtering power for images that are generated from different sensors to compensate for the different noise generated by these sensors. When there is not sufficient side information associated with these images, it may be difficult to obtain corresponding reference parameters to guide image filtering of these filters. Conventional convolutional neural network filters such as VRCNN only provide a single high-dimensional filter and do not include a decision-making filtering process for different texture inputs, which can limit the generalization ability of these filters and cause performance loss.
In some implementations, the process of reconstructing distorted images can be improved by incorporating a CNN based decision-making filtering process. The decision making filtering process can be used to select filters, determine filter coefficients, or a combination thereof. Furthermore, a side information guide map can be generated based on distorted parameters, and used as an input to the decision making filtering process. FIGS. 1-8 and associated descriptions provide additional details of these implementations.
FIG. 1 is an example communication system 100 that reconstructs distorted images, according to an implementation. At a high level, the example communication system 100 includes an electronic device 102 and a server 120, that are communicatively coupled with a network 110.
The server 120 represents an application, a set of applications, software, software modules, hardware, or any combination thereof that can be configured to provide trained CNN models. In some implementations, the server 120 can perform training of the CNN models based  on a set of training data. In some cases, the server 120 can receive data from different electronic devices that perform image processing to construct additional sets of training data.
The electronic device 102 represents an electronic device that can reconstruct distorted images. In some cases, the electronic device 102 can be a video codec or include a video codec. The video codec can perform an image reconstruction process during encoding and decoding operations of video images. Alternatively or additionally, the electronic device 102 can be a graphics-processing unit (GPU) or include a GPU that reconstructs distorted image data. In operation, the electronic device 102 can use a first CNN to select a filter based on the distorted image. The electronic device 102 can use a second CNN to generate controlling coefficients based on the distorted image, and use the controlling coefficients to adjust the weights of the filter. The electronic device 102 can use the filter to reconstruct the distorted image. In some cases, the electronic device 102 can generate a side information guide map and use the side information guide map as additional input to the first and the second CNN. In some cases, the electronic device 102 can also receive parameters of trained CNN models from the server 120. FIGS. 2-8 and associated descriptions provide additional details of these implementations.
Turning to a general description, the electronic device 102 may include, without limitation, any of the following: endpoint, computing device, mobile device, mobile electronic device, user device, mobile station, subscriber station, portable electronic device, mobile communications device, wireless modem, wireless terminal, or other electronic device. Examples of an endpoint may include an IoT (Internet of Things) device, EoT (Enterprise of Things) device, cellular phone, personal data assistant (PDA) , smart phone, laptop, tablet, personal computer (PC) , pager, portable computer, portable gaming device, wearable electronic device, health/medical/fitness device, camera, vehicle, or other mobile communications devices having components for communicating voice or data via a wireless communication network. The wireless communication network may include a wireless link over at least one of a licensed spectrum and an unlicensed spectrum. The term “mobile device” can also refer to any hardware or software component that can terminate a communication session for a user. In addition, the terms “user equipment, ” “UE, ” “user equipment device, ” “user agent, ” “UA, ” “user device, ” and “mobile device” can be used interchangeably herein.
The example communication system 100 includes the network 110. The network 110 represents an application, set of applications, software, software modules, hardware, or combination thereof, that can be configured to transmit data messages between the entities in the system 100. The network 110 can include a wireless network, a wireline network, the Internet, or a combination thereof. For example, the network 110 can include one or a plurality of radio access networks (RANs) , core networks (CNs) , and the Internet. The RANs may comprise one or more radio access technologies. In some implementations, the radio access technologies may be Global System for Mobile communication (GSM) , Interim Standard 95 (IS-95) , Universal Mobile Telecommunications System (UMTS) , CDMA2000 (Code Division Multiple Access) , Evolved Universal Mobile Telecommunications System (E-UMTS) , Long Term Evolution (LTE) , LTE-Advanced, the fifth generation (5G) , or any other radio access technologies. In some instances, the core networks may be evolved packet cores (EPCs) .
While elements of FIG. 1 are shown as including various component parts, portions, or modules that implement the various features and functionality, nevertheless, these elements may instead include a number of sub-modules, third-party services, components, libraries, and such, as appropriate. Furthermore, the features and functionality of various components can be combined into fewer components, as appropriate.
FIG. 2 is a flow diagram illustrating an example process 200 for reconstructing images, according to an implementation. For clarity of presentation, the description that follows generally describes process 200 in the context of the other figures in this description. The process 200 can be performed by an electronic device that reconstructs images, e.g., the electronic device 102 as illustrated in FIG. 1. However, it will be understood that process 200 may be performed, for example, by any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate. In some implementations, various steps of process 200 can be run in parallel, in combination, in loops, or in any order.
At 202, a prediction process is performed. During the prediction process, input data is compared with a predicted frame to generate residual data. The predicted frame is generated based on one or more prediction models. Examples of the prediction models include temporal prediction models and spatial prediction models. The temporal prediction models provide inter-frame predictions. The spatial prediction models provide intra-frame predictions. The prediction process can be performed continuously. For example, in a video encoding process, when processing the n-th frame, the input data represents the raw image data of the n-th frame. A predicted frame for the n-th frame can be generated by applying the prediction models on a reference frame of the previous frame, i.e., the (n-1) -th frame. In some examples, the reference frame can be generated based on information of multiple previous frames, e.g., the (n-1) -th frame, the (n-2) -th frame, etc. The input data of the n-th frame is compared with the predicted frame for the n-th frame to generate residual data of the n-th frame. The residual data of the n-th frame is processed through other steps in FIG. 2, e.g., transformation, quantization, and entropy encoding, to generate output data of the n-th frame. In addition, as discussed subsequently, the residual data is also processed through reverse quantization and transformation and a filtering process to generate a reference frame of the n-th frame. The reference frame of the n-th frame will be used to generate the predicted frame for the next frame, i.e., the (n+1) -th frame, which is used for encoding the (n+1) -th frame. The input data can be provided in different formats. In one example, the input data can be provided in the YCrCb format. In this example, the input data includes the luminance component Y, and the color components Cr and Cb. Other formats, e.g., RGB, can also be used.
At 204, the residual data is transformed. Examples of transformation techniques that can be used in this step include the Karhunen-Loeve Transform (KLT) , the Discrete Cosine Transform (DCT) , and the Discrete Wavelet Transform (DWT) .
At 206, the transformed data is quantized. Examples of quantization techniques that can be used in this step include scalar quantizers and vector quantizers. The transformed data can have a large range of values. Quantization can reduce the range of values, thereby obtaining a better compression effect. Quantization is a main factor that causes image distortion. The quantization process is configured by one or more quantization parameters, which can be used to configure the degree of quantization. The following equation represents an example calculation of the quantization process:
l_i = floor (c_i / Q_step + f)
where
Q_step = 2^ ( (QP - 4) / 6)
c_i represents the transformation coefficient before the quantization, l_i represents the coefficient after the quantization, floor (·) represents a floor operation that returns an integer, f represents a rounding off parameter, and QP represents a quantization parameter. In some implementations, e.g., in an H.265/High Efficiency Video Coding (HEVC) coding process, QP can be an integer value between 0 and 51.
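As a rough illustration of the quantization step described above, the following Python sketch applies scalar quantization to a block of transform coefficients. The step size derived from QP and the rounding offset f mirror typical H.265/HEVC behavior and are assumptions for illustration, not a normative implementation.

```python
import numpy as np

def quantize(coeffs: np.ndarray, qp: int, f: float = 0.5) -> np.ndarray:
    """Scalar quantization of transform coefficients c_i -> l_i.

    Assumes the HEVC-style step size Qstep = 2^((QP - 4) / 6); the rounding
    offset f is a free parameter (0.5 corresponds to round-to-nearest).
    """
    q_step = 2.0 ** ((qp - 4) / 6.0)
    return np.floor(coeffs / q_step + f).astype(np.int32)

def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
    """Reverse quantization: rescale quantized levels back to the coefficient range."""
    q_step = 2.0 ** ((qp - 4) / 6.0)
    return levels.astype(np.float64) * q_step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.normal(scale=50.0, size=(4, 4))   # example transform coefficients
    l = quantize(c, qp=32)                    # quantized levels (lossy)
    c_hat = dequantize(l, qp=32)              # distorted reconstruction
    print("max abs error:", np.abs(c - c_hat).max())
```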
At 208, entropy encoding is performed on the quantized data. The inputs to the entropy encoder can include the quantized data generated at step 206. The inputs to the entropy encoder can also include motion vectors, headers of the raw image of the current frame, supplementary information, side information, and any combinations thereof. The entropy encoder converts these inputs to an encoded frame, and outputs the encoded frame as output data for storage or transmission. In some cases, reordering can also be performed on the quantized data prior to the entropy encoding.
As discussed previously, the quantized data is also used to generate a reference frame for the next frame. At 210, a reverse quantization and transformation process is performed on the quantized data to generate image data of a distorted image. The reverse quantization and transformation process is implemented using techniques that correspond to those used in the transformation and quantization process. For example, if the quantization process uses vector scaling, the reverse quantization process uses vector rescaling; if the transformation process uses DCT, the reverse transformation process uses the Inverse DCT (IDCT) . Furthermore, the quantized data, after being reverse quantized and transformed, can be added to the predicted frame used in the prediction step (e.g., step 202) to generate the image data of the distorted image. In some cases, the distorted image can include multiple portions. For example, the distorted image can be divided into multiple pieces. Accordingly, multiple groups of image data can be generated. Each group represents one or more portions of the distorted image. In some cases, each group of image data can be processed separately in the following steps. In these cases, the distorted image discussed in the subsequent steps refers to the group of image data corresponding to the portion of the distorted image that is processed together. Alternatively, one group of image data can represent the entire distorted image and can be processed together. In these cases, the distorted image discussed in the subsequent steps refers to the image data corresponding to the entire distorted image. In some implementations, image data of more than one distorted image can be processed together in selecting the filter and generating controlling coefficients. In these cases, the distorted image discussed in the subsequent steps refers to the image data corresponding to multiple distorted images.
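The sketch below illustrates, under the same assumptions as the previous sketch, how quantized data can be reverse quantized, inverse transformed (IDCT matching a DCT forward transform), and added back to the predicted frame to obtain image data of a distorted image. The function and variable names are illustrative.

```python
import numpy as np
from scipy.fft import dctn, idctn   # forward/inverse 2-D DCT

def encode_decode_block(block: np.ndarray, predicted: np.ndarray, qp: int) -> np.ndarray:
    """Round-trip a block: residual -> DCT -> quantize -> dequantize -> IDCT -> add prediction.

    Returns the (distorted) reconstructed block that a decoder-side filter would see.
    The quantization step size is the illustrative HEVC-style 2^((QP - 4) / 6).
    """
    residual = block - predicted
    coeffs = dctn(residual, norm="ortho")
    q_step = 2.0 ** ((qp - 4) / 6.0)
    levels = np.floor(coeffs / q_step + 0.5)
    rec_residual = idctn(levels * q_step, norm="ortho")
    return predicted + rec_residual

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    raw = rng.integers(0, 256, size=(8, 8)).astype(np.float64)
    pred = np.full((8, 8), 128.0)               # trivial "predicted frame" block
    distorted = encode_decode_block(raw, pred, qp=37)
    print("distortion (MSE):", np.mean((raw - distorted) ** 2))
```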
Alternatively or additionally, the image data of the distorted image can be grouped into different components. These components can include luminance components and color components. For example, the distorted image can be represented in a YCrCb format, and the image data can have the Y, Cr, and Cb components. In these cases, different components of the image data can be processed separately or jointly.
At 212, the quantization parameter is used as input to generate a side information guide map. In the context of image processing, side information of an image refers to prior knowledge that can be used to assist the processing of one or more images. As discussed previously, conventional CNN filters such as VRCNN may rely on side information of the sensor that produces the image to determine suitable filter coefficients for the image, and thus their performance may suffer if such side information is not available. In the illustrated example, a side information guide map can be generated based on the quantization parameter used in the quantization step. Because the quantization parameter indicates an extent of distortion introduced in the encoding process, using the quantization parameter to generate a side information guide map can provide additional input to the CNN to improve the performance.
In some implementations, the side information guide map can be generated in two steps: obtaining distortion parameters and normalization. The distortion parameters can be generated based on the quantization parameter discussed previously. The distortion parameters can also be generated based on other information that indicates the extent of image distortion, such as the block size and sampling parameters used in the encoding process, or the resolution restoration factor in a high-resolution image restoration process. The side information guide map can have the same dimensions (e.g., width and height) as the portion of the distorted image represented by the image data. Each pixel on the side information guide map can be represented by a distortion parameter that indicates the degree of distortion for the corresponding pixel in the distorted image. The distortion parameter can be obtained based on the side information parameters discussed previously, e.g., the quantization parameters or the resolution restoration factor.
In some image reconstruction processes other than encoding/decoding operations, the information regarding the degree of distortion may not be readily available. For example, in a social media application, an image may be subjected to multiple digital image processing operations, such as scaling and compression, which may not be known by the electronic device that performs the image reconstruction process. In these or other cases, non-reference image evaluation techniques can be used to determine the distortion parameter for the side information guide map. In one example, fuzzy degree evaluation can be used as such a non-reference image evaluation. The following equation represents an example fuzzy degree evaluation technique:
D (f) = Σ_y Σ_x |f (x+2, y) - f (x, y) |^2
where f (x, y) represents the value of the pixel at the coordinate (x, y) , and D (f) can represent the fuzzy degree in some circumstances.
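A direct transcription of the fuzzy degree measure above might look like the following sketch. The function name and the choice of image axis for x are assumptions made for illustration.

```python
import numpy as np

def fuzzy_degree(image: np.ndarray) -> float:
    """D(f) = sum over x, y of |f(x+2, y) - f(x, y)|^2.

    Computed along one spatial axis of the image array; larger values
    suggest sharper content, smaller values suggest blur.
    """
    image = image.astype(np.float64)
    diff = image[2:, :] - image[:-2, :]   # f(x+2, y) - f(x, y)
    return float(np.sum(diff ** 2))
```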
After the distortion parameters are obtained, the distortion parameters can be normalized so that the range of values of each distortion parameter in the guide map is consistent with the range of values of the corresponding pixel in the distorted image. In one example, the range of values for the quantization parameters is [QP_MIN, QP_MAX] , and the range of values for pixels in the distorted image is [PIXEL_MIN, PIXEL_MAX] . Using x to represent the quantization parameters, the normalized quantization parameters norm (x) can be obtained using the following equation:
norm (x) = (x - QP_MIN) / (QP_MAX - QP_MIN) × (PIXEL_MAX - PIXEL_MIN) + PIXEL_MIN
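The following sketch builds a side information guide map with the same width and height as the distorted image by filling it with a quantization parameter normalized into the pixel value range, following the normalization above. The parameter ranges and the single-QP-per-block assumption are illustrative.

```python
import numpy as np

QP_MIN, QP_MAX = 0, 51            # assumed QP range (H.265/HEVC-style)
PIXEL_MIN, PIXEL_MAX = 0, 255     # assumed 8-bit pixel range

def normalize_qp(qp: float) -> float:
    """Map a distortion parameter from [QP_MIN, QP_MAX] into [PIXEL_MIN, PIXEL_MAX]."""
    return (qp - QP_MIN) / (QP_MAX - QP_MIN) * (PIXEL_MAX - PIXEL_MIN) + PIXEL_MIN

def side_info_guide_map(distorted: np.ndarray, qp: float) -> np.ndarray:
    """Guide map with the same spatial dimensions as the distorted image.

    A single QP is assumed for the whole block here; per-pixel distortion
    parameters could be filled into the map in the same way.
    """
    return np.full(distorted.shape[:2], normalize_qp(qp), dtype=np.float32)
```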
At 220, a first CNN is used to select a type of filter. In the process 200, different filters can be used as candidate filters for reconstructing the images. The type of filter selected by the first CNN can be a non-CNN type, such as DBK, SAO, or ALF, or a CNN type. For some images, a CNN filter can outperform the non-CNN filters. For other images, a non-CNN filter can outperform the CNN filters. Using a CNN to select an appropriate filter can improve the performance of the image reconstruction process. In some cases, the first CNN can be used to select more than one filter.
In some cases, the candidate filters can include more than one type of CNN filter. These CNN filters can have different constructs (e.g., layers) or filter parameters (e.g., weights or biases) . In addition, the candidate filters may not include non-CNN filters, e.g., non-CNN filters may not be included for image reconstruction that is not part of a video coding process. In these or other cases, the first CNN can be used to select one or more CNN filters among the different CNN filters.
FIG. 3 is a schematic diagram 300 illustrating using a CNN to select a filter to reconstruct a distorted image, according to an implementation. The diagram 300 includes a first CNN 310 and a first trained CNN model 320.
The first CNN 310 represents a CNN that is configured to select a filter to reconstruct a distorted image. The first CNN 310 includes an input layer 312, a hidden layer 314, and an output layer 316. The first CNN 310 can be implemented by using any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate.
The input layer 312 is configured to receive input data to the first CNN 310. The input data can include the distorted image, as discussed previously. In some cases, if the side information guide map can be generated as discussed previously, the side information guide map can also be included as the input data. If the side information guide map is not available, e.g., due to lack of information regarding the degree of distortion, the first CNN 310 can proceed without the side information guide map.
In some cases, in addition to or as an alternative to the side information guide map, features of the distorted image can also be used as input to the first CNN 310. Examples of these features include linear features and non-linear features. Linear features, also referred to as textural features, include features generated by linear transformations, e.g., gradient features. Non-linear features include features generated by non-linear transformations, e.g., frequency features generated by a Fourier transformation such as the Fast Fourier Transform (FFT) , wavelet features generated by the Wavelet Transform (WT) , or features generated by non-linear activations.
In some implementations, the input layer 312 can perform a channel merging operation and a convolution filtering operation.
The channel merging operation combines the side information guide map with the distorted image for each channel to generate combined input data, represented as I. The convolution filtering operation performs a convolutional filtering on the combined input data I, as illustrated in the following equation:
F_1 (I) = g (W_1 * I + B_1)
where W_1 represents the weighting coefficients of the convolutional filters used in the input layer 312, B_1 represents the bias coefficients of the convolutional filters, g () represents a non-linear mapping function, * represents the convolution operation, and F_1 (I) represents the output of the input layer 312.
In some implementations, there can be n_1 convolutional filters used in the input layer 312, and W_1 represents the coefficients of the set of the n_1 convolutional filters. Each convolutional filter can have a kernel of size c_1 x f_1 x f_1, where c_1 represents the number of channels in the input data, and f_1 represents the spatial size of each kernel. In one implementation example, with c_1 = 2, f_1 = 5, and n_1 = 64, and using a Rectified Linear Unit (ReLU) as the non-linear mapping function g () , represented by g (x) = max (0, x) , the convolution filtering operation can be represented using the following equation:
F_1 (I) = max (0, W_1 * I + B_1)
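A minimal PyTorch sketch of the input layer described above (channel merging followed by a 5x5 convolution with 64 filters and a ReLU) could look like the following. The layer sizes follow the example values given here; everything else is an assumption made to keep the example self-contained.

```python
import torch
import torch.nn as nn

class InputLayer(nn.Module):
    """Channel merging + convolution filtering: F_1(I) = max(0, W_1 * I + B_1)."""

    def __init__(self, in_channels: int = 2, n1: int = 64, f1: int = 5):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, n1, kernel_size=f1, padding=f1 // 2)
        self.relu = nn.ReLU()

    def forward(self, distorted: torch.Tensor, guide_map: torch.Tensor) -> torch.Tensor:
        merged = torch.cat([distorted, guide_map], dim=1)   # channel merging -> I
        return self.relu(self.conv(merged))                 # F_1(I)

# Example usage with a 64x64 luminance block and its guide map.
layer = InputLayer()
y = torch.rand(1, 1, 64, 64)          # distorted image data (one channel)
m = torch.rand(1, 1, 64, 64)          # side information guide map
print(layer(y, m).shape)              # torch.Size([1, 64, 64, 64])
```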
The hidden layer 314 performs additional high-dimensional mapping on the image segmentation of sparse representations extracted by the input layer 312. The hidden layer 314 includes one or more convolution layers. In one example, the hidden layer 314 can include N-1 (N >= 2) convolution layers, represented by the following equation:
F_i (I) = g (W_i * F_{i-1} (I) + B_i) , i ∈ {2, 3, …, N}
where F_i (I) represents the output of the i-th convolutional layer, W_i represents the weighting coefficients of the convolutional filters used in the i-th convolutional layer, B_i represents the bias coefficients of the convolutional filters used in the i-th convolutional layer, and g () represents a non-linear mapping function.
Similarly to the input layer 312 discussed previously, there can be n_i convolutional filters used in each convolutional layer, and W_i represents the coefficients of the set of the n_i convolutional filters. Each convolutional filter can have a kernel of size c_2 x f_2 x f_2, where c_2 represents the number of channels for input data to the convolutional layer, and f_2 represents the spatial size of each kernel. In one implementation example, the hidden layer includes one convolutional layer, with c_2 = 64, f_2 = 1, and n_2 = 32; using ReLU as the non-linear mapping function g () , the convolution filtering operation can be represented using the following equation:
F_2 (I) = max (0, W_2 * F_1 (I) + B_2)
The output layer 316 processes the high-dimensional image output from the hidden layer 314 and generates the filter selection decisions. The filter decision can be a CNN filter or a non-CNN filter, such as a DBK, ALF, or SAO filter. In some cases, more than one type of filter can be selected.
In one implementation example, the output layer 316 can further perform a convolutional operation on the distorted image to generate the reconstructed image. In this example, the output layer 316 can include one reconstruction layer. The operation of the reconstruction layer can be represented by the following equation:
F (I) = W_N * F_{N-1} (I) + B_N
where F (I) represents the output of the reconstruction layer, F_{N-1} (I) represents the output of the hidden layer 314, W_N represents the weighting coefficients of the convolutional filters used in the reconstruction layer, and B_N represents the bias coefficients of the convolutional filters used in the reconstruction layer.
Similarly to the input layer 312 and the hidden layer 314 discussed previously, there can be n_N convolutional filters used in the reconstruction layer, and W_N represents the coefficients of the set of the n_N convolutional filters. Each convolutional filter can have a kernel of size c_N x f_N x f_N, where c_N represents the number of channels for input data to the reconstruction layer, and f_N represents the spatial size of each kernel. In one implementation example, with c_N = 32, f_N = 3, and n_N = 1, the convolution filtering operation can be represented using the following equation:
F (I) = W_3 * F_2 (I) + B_3 + Y
where Y represents the distorted image input (i.e., a residual connection) .
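The reconstruction layer with the residual addition of the distorted input can be sketched as follows, again in PyTorch and with the example sizes c_N = 32, f_N = 3, and n_N = 1. This is an illustrative sketch, not the claimed structure.

```python
import torch
import torch.nn as nn

class ReconstructionLayer(nn.Module):
    """F(I) = W_N * F_{N-1}(I) + B_N + Y (residual connection to the distorted input)."""

    def __init__(self, in_channels: int = 32, f_n: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=f_n, padding=f_n // 2)

    def forward(self, hidden_out: torch.Tensor, distorted: torch.Tensor) -> torch.Tensor:
        return self.conv(hidden_out) + distorted

# hidden_out: output of the last hidden layer; distorted: the distorted block Y.
layer = ReconstructionLayer()
out = layer(torch.rand(1, 32, 64, 64), torch.rand(1, 1, 64, 64))
print(out.shape)   # torch.Size([1, 1, 64, 64])
```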
The first CNN 310 uses the first trained CNN model 320 to make filter selection decisions. In some implementations, the parameters related to the network structure of the first CNN 310, including e.g., the number of convolutional layers, concatenation of convolutional layers, the number of convolution filters per convolutional layer, and the size of the convolution kernel, can be fixed, while the filter coefficients, e.g., the weighting coefficients and bias coefficients, can be configured based on the first trained CNN model 320. Alternatively, one or more parameters related to the network structure of the first CNN 310 can also be trained. In some implementations, the parameter set of the first CNN 310 can be stored on the electronic device that performs the image reconstruction process. Alternatively or additionally, the parameter set can be downloaded or updated from a server that performs training based on data collected from multiple electronic devices.
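Configuring the fixed network structure with coefficients from a trained model, for example a parameter set downloaded from the server 120, could be sketched as follows. The file path and the helper name are assumptions.

```python
import torch

def configure_from_trained_model(network: torch.nn.Module, model_path: str) -> torch.nn.Module:
    """Load trained weighting and bias coefficients into a fixed network structure."""
    state_dict = torch.load(model_path, map_location="cpu")  # e.g., downloaded parameter set
    network.load_state_dict(state_dict)
    network.eval()                                           # inference mode for filtering
    return network
```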
In some implementations, training for the first CNN 310 can be performed in the following steps:
First, side information guide maps are generated for a large number of undistorted natural images based on different noise sources, and corresponding distorted images are also generated to form a training set.
Second, the parameters of the CNN are initialized as Θ_0, and the training-related high-level parameters, such as the learning rate and the weight-updating algorithm, are set.
Third, for the i-th iteration, a forward calculation is performed on the training set using the CNN having the parameters Θ_i to obtain an output F (Y) . Accordingly, the loss function L (Θ_i) can be calculated. For example, the selection probabilities can be obtained with a softmax and the loss can be calculated as a cross-entropy:
a_j = exp (z_j) / Σ_{k=1}^{T} exp (z_k)
L (Θ_i) = -Σ_{j=1}^{T} y_j log (a_j)
where z_j represents the network output for filter selection decision j, a_j represents a probability of a filter selection decision j, T represents the number of candidate filter types, and y_j represents the actual selected filter decision. In some implementations, the loss function can be adjusted to improve the converging process.
Fourth, a backward calculation is performed to adjust Θ_i for the next iteration.
Fifth, the third and fourth steps are repeated until the loss function converges, at which point the final parameter Θ_final is outputted. The final parameter Θ_final can be represented by the first trained CNN model 320 to configure the first CNN 310. A minimal training-loop sketch following these steps is shown below.
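The following compressed training-loop sketch follows the five steps above. The tiny selection network, the optimizer choice, and the synthetic training data are all assumptions made to keep the example self-contained.

```python
import torch
import torch.nn as nn

NUM_FILTER_TYPES = 4   # e.g., DBK, SAO, ALF, CNN

class FilterSelector(nn.Module):
    """Tiny stand-in for the first CNN: input layer, one hidden layer, classification output."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 32, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, NUM_FILTER_TYPES)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))   # logits z_j per filter type

model = FilterSelector()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # step 2: learning rate, update rule
loss_fn = nn.CrossEntropyLoss()                             # softmax + cross-entropy loss

for i in range(100):                                        # steps 3-5: iterate until convergence
    inputs = torch.rand(8, 2, 64, 64)                       # distorted block + guide map channels
    labels = torch.randint(0, NUM_FILTER_TYPES, (8,))       # actual selected filter decisions y_j
    logits = model(inputs)                                  # forward calculation
    loss = loss_fn(logits, labels)                          # L(theta_i)
    optimizer.zero_grad()
    loss.backward()                                         # backward calculation
    optimizer.step()                                        # adjust theta_i for the next iteration
```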
Returning to FIG. 2, at 222, whether a CNN type is selected is determined. If a non-CNN filter (e.g., SAO, ALF, or DBK) is selected by the filter selection process at 220, the process 200 proceeds from 222 to 230, where the selected non-CNN filter is used to generate the reference frame.
If a CNN filter is selected, or if a CNN filter and one or more non-CNN filters are selected, the process 200 proceeds from 222 to 224, where filter coefficients of the CNN filter are generated. In some cases, a second CNN is used to determine a set of controlling coefficients based on the distorted image, the side information guide map, or a combination thereof. The controlling coefficients can be used to adjust the configured weights of the CNN filter to generate the filter coefficients for the CNN filter to be used in the filtering process.
FIG. 4 is a schematic diagram 400 illustrating using a CNN to generate coefficients of a filter used to reconstruct a distorted image, according to an implementation. The diagram 400 includes a second CNN 410 and a second trained CNN model 420.
The second CNN 410 represents a CNN that is configured to generate controlling coefficients that can be used to adjust the configured filter weights or filter biases of the filter. The second CNN 410 includes an input layer 412, a hidden layer 414, and an output layer 416. The second CNN 410 can be implemented by using any suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware, as appropriate.
The input layer 412 is configured to receive input data to the second CNN 410. The input data can include the distorted image, as discussed previously. In some cases, if the side information guide map can be generated as discussed previously, the side information guide map can also be included as the input data. If the side information guide map is not available, e.g., due to lack of information regarding the degree of distortion, the second CNN 410 can proceed without the side information guide map. In addition to or alternative to the side information guide map, features of the distorted image can also be used as input to the second CNN 410. Examples of these features include linear features or non-linear features discussed previously. In some cases, a preconfigured computation boundary or a target quality factor can also be used as input to the second CNN 410 to generate the controlling coefficient. In one example, the target quality factor can be a Peak Signal to Noise Ratio (PSNR) value. Following equation represents an example PSNR value calculation:
MSE = (1/ (m·n) ) ∑ i∑ j (X (i, j) -Y (i, j) ) 2
PSNR = 10·log 10 (MAX 2/MSE)
where X represents the undistorted image, Y represents the reconstructed image, m and n represent the image dimensions, and MAX represents the maximum possible pixel value of the image.
As discussed previously with respect to the first CNN 310, the input layer 412 can perform a channel merging operation and a convolution filtering operation.
The channel merging operation combines the side information guide map with the distorted image for each channel to generate a combined input data, represented as I. The convolution filtering operation performs convolutional filtering on the combined input data I, as illustrated in the following equation:
F 1 (I) =g (W 1*I+B 1)
where W 1 represents the weighting coefficients of the convolutional filter used in the input layer 412, B 1 represents the bias coefficients of the convolutional filter, g () represents a non-linear mapping function, *represents the convolution operation, and F 1 (I) represents the output of the input layer 412.
In some implementations, there can be n 1 convolutional filters used in the input layer 412, and W 1 represents coefficients of the set of the n 1 convolutional filters. Each convolutional filter can have a kernel of a size C1 x f1 x f1, where C1 represents the number of channels in the input data, and f1 represents the spatial size of each kernel. In one implementation example, with C1=2, f1=5, and n1=64, and using a Rectified Linear Unit (ReLU) as the non-linear mapping function g () , represented by g (x) =max (0, x) , the convolution filtering operation can be represented using the following equation:
F 1 (I) =max (0, W 1*I+B 1)
In some cases, for the controlling coefficient generation, the input data can also be extracted from the feature map of a convolutional layer in the CNN filter.
The hidden layer 414 performs additional high-dimensional mapping on the sparse representations extracted by the input layer 412. The hidden layer 414 includes one or more convolution layers. In one example, the hidden layer 414 can include N-1 (N >= 2) convolution layers, represented by the following equation:
F i (I) =g (W i*F i-1 (I) +B i) , i∈ {2, 3, …, N}
where F i (I) represents the output of the i-th convolutional layer, W i represents the weighting coefficients of the convolutional filter used in the i-th convolutional layer, B i represents the bias coefficients of the convolutional filter used in the i-th convolutional layer, and g () represents a non-linear mapping function.
The outputted controlling coefficients are used to adjust the configured CNN filter weights. In the illustrated example, the adjustment can be performed by a multiplication operation. The controlling coefficients can be multiplied with the configured filter weights to generate the coefficients of the CNN filter. Other operations, e.g., addition or convolution, can also be used to perform the adjustment.
In some implementations, the coefficients can be controlled at a channel level or a pixel level. FIG. 5 includes schematic diagrams 510 and 520 that illustrate different levels of controlling coefficients, according to an implementation. The schematic diagram 510 illustrates controlling coefficients at a pixel level, according to an implementation. F i-j represents the j-th feature map of the i-th convolutional layer. Cf i-j represents a set of controlling coefficients that are applied to different convolution kernels in the j-th feature map of the i-th convolutional layer. In the illustrated example, F i-1, F i-2, and F i-3 represent the first, second, and third feature map of the i-th convolution layer, respectively. Cf i-1, Cf i-2, and Cf i-3 represent the sets of controlling coefficients for the convolution kernels in these feature maps, respectively. In this example, the dimension of the controlling coefficients is the same as the dimension of the feature map. Accordingly, different controlling coefficients in the set of controlling coefficients can be applied to different convolution kernels in the same channel of the CNN filter.
The schematic diagram 520 illustrates controlling coefficients at a channel level, according to an implementation. In this example, for the same feature map (channel) , the same controlling coefficient is applied to different convolution kernels in the feature map. As illustrated, S i-1 represents the controlling coefficient that is applied to all the convolution kernels in the first feature map of the i-th convolution layer. S i-2 and S i-3 represent the controlling coefficients that are applied to all the convolution kernels in the second and third feature maps of the i-th convolution layer, respectively.
The level of the controlling coefficients can be configured to be the same for the entire CNN filter, or configured differently for different layers or different feature maps of the CNN filter.
Returning to FIG. 4, similarly to the input layer 412 discussed previously, there can be n i convolutional filters used in each convolutional layer, and W i represents coefficients of the set of the n i convolutional filters. Each convolutional filter can have a kernel of a size C2 x f2 x f2, where C2 represents the number of channels for input data to the convolutional layer, and f2 represents the spatial size of each kernel. In one implementation example, the hidden layer includes one convolutional layer, and with C2=64, f2=1, and n2=32, and using ReLU as the non-linear mapping function g () , the convolution filtering operation can be represented using the following equation:
F 2 (I) =max (0, W 2*F 1 (I) +B 2)
The output layer 416 processes the high-dimensional image output from the hidden layer 414 and generates the controlling coefficients.
The second CNN 410 uses the second trained CNN model 420 to generate the controlling coefficients. Similarly to the first CNN 310, the parameters related to the network structure, the filter coefficients, or any combinations thereof can be configured based on the second trained CNN model 420. In some implementations, the parameter set of the second CNN 410 can be stored on the electronic device that performs the image reconstruction process. Alternatively or additionally, the parameter set can be downloaded or updated from a server.
Similarly to the first CNN 310, training for the second CNN 410 can be performed by constructing a training set, initializing parameter Θ 0, and performing forward and backward calculations in an iterative process.
The training for the second CNN 410 can be performed jointly with or separately from the first CNN 310. In one implementation example, the loss function for the controlling coefficient training can be represented as follows:
L (Θ i) = (1/N) ∑ n‖F (I n|Θ i) -X n‖ 2, n∈ {1, 2, …, N}
where N represents the number of training pairs, I n represents the input data based on the combination of the side information guide map and the distorted image, F (I n|Θ i) represents the reconstructed image corresponding to the current parameter Θ i, and X n represents the undistorted image.
Returning to FIG. 2, at 230, the distorted image is reconstructed using the selected filter, the generated filter coefficients, or a combination thereof. The reconstructed image is used as the reference frame for the prediction operation of the next frame.
In some implementations, the controlling coefficients generated by the second CNN can also be used to simplify the reconstruction process. For example, if a controlling coefficient for a particular layer in the CNN filter is below a configured threshold, the particular layer can be skipped in the reconstruction operation. This approach can increase the processing speed and save computation resources.
In some implementations, for example in image reconstruction processes that are not part of video encoding or decoding, the quantization and prediction steps discussed previously can be skipped. In these or other cases, the input image may be the distorted image that is used as the input to steps 220, 224, and 230 discussed previously.
While the process 200 illustrates an example encoding process, other image reconstruction processes can also use CNNs to select the filter type and generate controlling coefficients for CNN filters in a similar fashion. For example, in a decoding process, distorted images are generated based on encoded image data, and reconstructed images are generated based on the distorted images and reference frames. Accordingly, a first CNN can be used to select a filter type used for the reconstruction process, and a second CNN can be used to generate controlling coefficients that are used to adjust filter weights of the reconstruction filter. An image restoration can also be performed on distorted images that are generated or received in other processes.
FIG. 6 is a flowchart illustrating an example method 600 for reconstructing an image, according to an implementation. The method 600 can be implemented by an electronic device shown in FIG. 1. The method 600 can also be implemented using additional, fewer, or different entities. Furthermore, the method 600 can also be implemented using additional, fewer, or different operations, which can be performed in the order shown or in a different order. In some instances, an operation or a group of operations can be iterated or repeated, for example, for a specified number of iterations or until a terminating condition is reached.
The example method 600 begins at 602, where image data of at least one distorted image is received. At 604, at least one type of filter is selected from a plurality of filter types based on the image data. The filter type selection is performed using a first CNN. In some implementations, the plurality of filter types can include a deblocking (DBK) type, a sample adaptive offset (SAO) type, an Adaptive Loop Filter (ALF) type, or a CNN type. In some cases, the CNN type can be selected. In some cases, the filter is selected further based on a side information guide map of the distorted image or features of the distorted image.
At 606, controlling coefficients that adjust weights or biases of the reconstruction filter are generated by using a second CNN. In some implementations, the second CNN uses the distorted image, the side information guide map of the distorted image, features of the distorted image, a preconfigured computation boundary, a target quality factor, or any combinations thereof to generate the controlling coefficients. Steps 604 and 606 can be performed together or separately.
At 608, a filter of the selected type is used to generate a reconstructed image corresponding to the distorted image. In some cases, the controlling coefficients are used to determine whether to omit a convolutional layer in generating a reconstructed image.
In some cases, a CNN can be used to select a layer path of the CNN filter. FIG. 9 is a schematic diagram 900 illustrating the structure of an example CNN filter 910 that reconstructs an image, according to an implementation. As illustrated, the CNN filter 910 includes 4 layers (represented horizontally) and 3 scale levels (represented vertically) ; additional layers and scale levels can be included in the CNN filter 910. As illustrated, each layer includes one or more features at each scale level, denoted as features 911, 921, 931 for layer 1, features 912, 922, 932 for layer 2, features 913, 923, 933 for layer 3, and features 914, 924, 934 for layer 4, respectively. As illustrated, the arrows represent convolutional (regular or strided) operations. The circled C symbols represent concatenation. In the illustrated example, concatenations 941, 942, 943 represent concatenation operations at scale level 1; concatenations 951, 952, 953, 954, 955 represent concatenation operations at scale level 2; and concatenations 961, 962, 963, 964, 965, 966 represent concatenation operations at scale level 3.
The diagram 900 includes a distorted image 902. The distorted image 902 is a blurred image of a cat. Different layer paths in the CNN filter 910 can be taken to process the image data of the distorted image 902 to obtain a reconstructed image. For example, the CNN filter 910 includes layer paths 920 and 930. The layer path 920 takes the image data, performs a convolutional operation with features 911, then another convolutional operation with features 912 and a concatenation operation at 941 (concatenating the convolutional operation outputs of features 911 and 912) , then another convolutional operation at 913 and another concatenation operation at 942, and then another convolutional operation at 914 and another concatenation operation at 943 to generate a reconstructed image as an output. The layer path 930 takes concatenation operations 951, 952, 953, 954, 955, and 956 to generate a reconstructed image as an output. As illustrated, these concatenation operations concatenate the outputs of convolutional operations with features at scale level 1 and scale level 2. Therefore, while the layer path 930 may generate a different and better image than the layer path 920, the layer path 930 involves more computations and therefore can be more time and resource consuming.
In some cases, a CNN can be used to select the layer path to process the image data of the distorted image. Similarly to the first CNN as discussed previously in FIGS. 2-3, the CNN can take as input the image data of the at least one distorted image, a side information guide map of the at least one distorted image, features of the at least one distorted image, or any combinations thereof, and output a selected layer path. In some implementations, the input of the CNN can further include a preconfigured computation boundary or a target quality factor.
In addition, similarly to the second CNN as discussed previously in FIGS. 2 and 4, a different CNN can take as input the image data of the at least one distorted image, a side information guide map of the at least one distorted image, features of the at least one distorted image, a preconfigured computation boundary, a target quality factor, or any combinations thereof, and generate controlling coefficients to adjust the weights or biases of the filtering operations (e.g., the convolutional operations) on the selected layer path.
FIG. 10 is a flowchart illustrating another example method 1000 for reconstructing an image, according to an implementation. The method 1000 can be implemented by an electronic device shown in FIG. 1. The method 1000 can also be implemented using additional, fewer, or different entities. Furthermore, the method 1000 can also be implemented using additional, fewer, or different operations, which can be performed in the order shown or in a different order. In some instances, an operation or a group of operations can be iterated or repeated, for example, for a specified number of iterations or until a terminating condition is reached.
The example method 1000 begins at 1002, where image data of at least one distorted image is received. At 1004, a layer path of a convolutional neural network (CNN) filter is selected based on the image data, where the layer path of the CNN filter is selected by using a first CNN. At 1006, the selected layer path of the CNN filter is used to generate a reconstructed image corresponding to the distorted image.
FIG. 7 is a block diagram of an example computer system 700 used to provide computational functionalities associated with described algorithms, methods, functions, processes, flows, and procedures, as described in the instant disclosure, according to an implementation. The computer system 700, or more than one computer system 700, can be used to implement the electronic device that reconstructs the image, and the server that trains and provides the CNN models.
The illustrated computer 702 is intended to encompass any computing device, such as a server, desktop computer, laptop/notebook computer, wireless data port, smart phone, personal digital assistant (PDA) , tablet computing device, one or more processors within these devices, or any other suitable processing device, including physical or virtual instances (or both) of the computing device. Additionally, the computer 702 may comprise a computer that includes an input device, such as a keypad, keyboard, touch screen, or other device that can accept user information, and an output device that conveys information associated with the operation of the computer 702, including digital data, visual, or audio information (or a combination of information) , or a graphical user interface (GUI) .
The computer 702 can serve in a role as a client, network component, a server, a database or other persistency, or any other component (or a combination of roles) of a computer system for performing the subject matter described in the instant disclosure. The illustrated computer 702 is communicably coupled with a network 730. In some implementations, one or more components of the computer 702 may be configured to operate within environments, including cloud-computing-based, local, global, or other environment (or a combination of environments) .
At a high level, the computer 702 is an electronic computing device operable to receive, transmit, process, store, or manage data and information associated with the described subject matter. According to some implementations, the computer 702 may also include, or be communicably coupled with, an application server, e-mail server, web server, caching server, streaming data server, or other server (or a combination of servers) .
The computer 702 can receive requests over network 730 from a client application (for example, executing on another computer 702) and respond to the received requests by processing the received requests using an appropriate software application (s) . In addition, requests may also be sent to the computer 702 from internal users (for example, from a command console or by other appropriate access methods) , external or third-parties, other automated applications, as well as any other appropriate entities, individuals, systems, or computers.
Each of the components of the computer 702 can communicate using a system bus 703. In some implementations, any or all of the components of the computer 702, hardware or software (or a combination of both hardware and software) , may interface with each other or the interface 704 (or a combination of both) , over the system bus 703 using an application programming interface (API) 712 or a service layer 713 (or a combination of the API 712 and service layer 713) . The API 712 may include specifications for routines, data structures, and object classes. The API 712 may be either computer-language independent or dependent and  refer to a complete interface, a single function, or even a set of APIs. The service layer 713 provides software services to the computer 702 or other components (whether or not illustrated) that are communicably coupled to the computer 702. The functionality of the computer 702 may be accessible for all service consumers using this service layer. Software services, such as those provided by the service layer 713, provide reusable, defined functionalities through a defined interface. For example, the interface may be software written in JAVA, C++, or other suitable language providing data in extensible markup language (XML) format or other suitable formats. While illustrated as an integrated component of the computer 702, alternative implementations may illustrate the API 712 or the service layer 713 as stand-alone components in relation to other components of the computer 702 or other components (whether or not illustrated) that are communicably coupled to the computer 702. Moreover, any or all parts of the API 712 or the service layer 713 may be implemented as child or sub-modules of another software module, enterprise application, or hardware module without departing from the scope of this disclosure.
The computer 702 includes an interface 704. Although illustrated as a single interface 704 in FIG. 7, two or more interfaces 704 may be used according to particular needs, desires, or particular implementations of the computer 702. The interface 704 is used by the computer 702 for communicating with other systems that are connected to the network 730 (whether illustrated or not) in a distributed environment. Generally, the interface 704 includes logic encoded in software or hardware (or a combination of software and hardware) and is operable to communicate with the network 730. More specifically, the interface 704 may include software supporting one or more communication protocols associated with communications such that the network 730 or interface’s hardware is operable to communicate physical signals within and outside of the illustrated computer 702.
The computer 702 includes a processor 705. Although illustrated as a single processor 705 in FIG. 7, two or more processors may be used according to particular needs, desires, or particular implementations of the computer 702. Generally, the processor 705 executes instructions and manipulates data to perform the operations of the computer 702 and any algorithms, methods, functions, processes, flows, and procedures as described in the instant disclosure.
The computer 702 also includes a database 706 that can hold data for the computer 702 or other components (or a combination of both) that can be connected to the  network 730 (whether illustrated or not) . For example, database 706 can be an in-memory, conventional, or other type of database storing data consistent with this disclosure. In some implementations, database 706 can be a combination of two or more different database types (for example, a hybrid in-memory and conventional database) according to particular needs, desires, or particular implementations of the computer 702 and the described functionality. Although illustrated as a single database 706 in FIG. 7, two or more databases (of the same or combination of types) can be used according to particular needs, desires, or particular implementations of the computer 702 and the described functionality. While database 706 is illustrated as an integral component of the computer 702, in alternative implementations, database 706 can be external to the computer 702.
The computer 702 also includes a memory 707 that can hold data for the computer 702 or other components (or a combination of both) that can be connected to the network 730 (whether illustrated or not) . For example, memory 707 can be Random Access Memory (RAM) , Read-Only Memory (ROM) , optical, magnetic, and the like, storing data consistent with this disclosure. In some implementations, memory 707 can be a combination of two or more different types of memory (for example, a combination of RAM and magnetic storage) according to particular needs, desires, or particular implementations of the computer 702 and the described functionality. Although illustrated as a single memory 707 in FIG. 7, two or more memories 707 (of the same or a combination of types) can be used according to particular needs, desires, or particular implementations of the computer 702 and the described functionality. While memory 707 is illustrated as an integral component of the computer 702, in alternative implementations, memory 707 can be external to the computer 702.
The application 708 is an algorithmic software engine providing functionality according to particular needs, desires, or particular implementations of the computer 702, particularly with respect to functionality described in this disclosure. For example, application 708 can serve as one or more components, modules, or applications. Further, although illustrated as a single application 708, the application 708 may be implemented as multiple applications 708 on the computer 702. In addition, although illustrated as integral to the computer 702, in alternative implementations, the application 708 can be external to the computer 702.
The computer 702 can also include a power supply 714. The power supply 714 can include a rechargeable or non-rechargeable battery that can be configured to be either user-or  non-user-replaceable. In some implementations, the power supply 714 can include power-conversion or management circuits (including recharging, standby, or other power management functionality) . In some implementations, the power supply 714 can include a power plug to allow the computer 702 to be plugged into a wall socket or other power source to, for example, power the computer 702 or recharge a rechargeable battery.
There may be any number of computers 702 associated with, or external to, a computer system containing computer 702, each computer 702 communicating over network 730. Further, the term “client, ” “user, ” and other appropriate terminology may be used interchangeably, as appropriate, without departing from the scope of this disclosure. Moreover, this disclosure contemplates that many users may use one computer 702, or that one user may use multiple computers 702.
FIG. 8 is a schematic diagram illustrating an example structure of an electronic circuit 800 that reconstructs images as described in the present disclosure, according to an implementation. The electronic circuit 800 can be a component or a functional block of a codec, e.g., a video codec. The electronic circuit 800 can also be a component or a functional block of a graphic processing unit.
The electronic circuit 800 includes a receiving circuit 802, a filter selection circuit 804, a filter coefficient determination circuit 806, a storage circuit 808, and a processing circuit 810 that is coupled to, or capable of communicating with, the receiving circuit 802, the filter selection circuit 804, the filter coefficient determination circuit 806, and the storage circuit 808. In some implementations, the electronic circuit 800 can further include one or more circuits for performing any one or a combination of steps described in the present disclosure. In some implementations, some or all of these component circuits can be combined into fewer components.
The receiving circuit 802 is configured to receive image data that represents a distorted image. The filter selection circuit 804 is configured to select a type of filter from a plurality of filter types based on the image data by using a first CNN. The processing circuit 810 is configured to use a filter of the selected type to generate a reconstructed image corresponding to the distorted image. The filter coefficient determination circuit 806 is configured to generate controlling coefficients to adjust weights of the filter by using a second CNN. The storage circuit 808 is configured to store training models used by the first CNN and the second CNN.
Implementations of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory, computer-readable computer-storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively, or additionally, the program instructions can be encoded in/on an artificially generated propagated signal, for example, a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer-storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of computer-storage mediums.
The term “real-time, ” “real time, ” “realtime, ” “real (fast) time (RFT) , ” “near (ly) real-time (NRT) , ” “quasi real-time, ” or similar terms (as understood by one of ordinary skill in the art) , means that an action and a response are temporally proximate such that an individual perceives the action and the response occurring substantially simultaneously. For example, the time difference for a response to display (or for an initiation of a display) of data following the individual’s action to access the data may be less than 1 ms, less than 1 sec., or less than 5 secs. While the requested data need not be displayed (or initiated for display) instantaneously, it is displayed (or initiated for display) without any intentional delay, taking into account processing limitations of a described computing system and time required to, for example, gather, accurately measure, analyze, process, store, or transmit the data.
The terms “data processing apparatus, ” “computer, ” or “electronic computer device” (or equivalent as understood by one of ordinary skill in the art) refer to data processing hardware and encompass all kinds of apparatus, devices, and machines for processing data, including by way of example, a programmable processor, a computer, or multiple processors or computers. The apparatus can also be or further include special purpose logic circuitry, for example, a Central Processing Unit (CPU) , a Field Programmable Gate Array (FPGA) , or an  Application-specific Integrated Circuit (ASIC) . In some implementations, the data processing apparatus or special purpose logic circuitry (or a combination of the data processing apparatus or special purpose logic circuitry) may be hardware-or software-based (or a combination of both hardware-and software-based) . The apparatus can optionally include code that creates an execution environment for computer programs, for example, code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of execution environments. The present disclosure contemplates the use of data processing apparatuses with or without conventional operating systems, for example LINUX, UNIX, WINDOWS, MAC OS, ANDROID, IOS, or any other suitable conventional operating system.
A computer program, which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, for example, one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, for example, files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. While portions of the programs illustrated in the various figures are shown as individual modules that implement the various features and functionality through various objects, methods, or other processes, the programs may instead include a number of sub-modules, third-party services, components, libraries, and such, as appropriate. Conversely, the features and functionality of various components can be combined into single components, as appropriate. Thresholds used to make computational determinations can be statically, dynamically, or both statically and dynamically determined.
The methods, processes, or logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The methods, processes,  or logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, for example, a CPU, an FPGA, or an ASIC.
Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors, both, or any other kind of CPU. Generally, a CPU will receive instructions and data from a ROM or a Random Access Memory (RAM) , or both. The essential elements of a computer are a CPU, for performing or executing instructions, and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to, receive data from or transfer data to, or both, one or more mass storage devices for storing data, for example, magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, for example, a mobile telephone, a Personal Digital Assistant (PDA) , a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, for example, a Universal Serial Bus (USB) flash drive, to name just a few.
Computer-readable media (transitory or non-transitory, as appropriate) suitable for storing computer program instructions and data includes non-volatile memory, media and memory devices, including by way of example, semiconductor memory devices, for example, Erasable Programmable Read-Only Memory (EPROM) , Electrically Erasable Programmable Read-Only Memory (EEPROM) , and flash memory devices; magnetic disks, for example, internal hard disks or removable disks; magneto-optical disks; and CD-ROM, DVD+/-R, DVD-RAM, and DVD-ROM disks. The memory may store various objects or data, including caches, classes, frameworks, applications, backup data, jobs, web pages, web page templates, database tables, repositories storing dynamic information, and any other appropriate information including any parameters, variables, algorithms, instructions, rules, constraints, or references thereto. Additionally, the memory may include any other appropriate data, such as logs, policies, security or access data, reporting files, as well as others. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device, for example, a Cathode Ray Tube (CRT) , Liquid Crystal Display (LCD) , Light Emitting Diode (LED) , or plasma monitor, for displaying information to the user and a keyboard and a pointing  device, for example, a mouse, trackball, or trackpad by which the user can provide input to the computer. Input may also be provided to the computer using a touchscreen, such as a tablet computer surface with pressure sensitivity, a multi-touch screen using capacitive or electric sensing, or other type of touchscreen. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, for example, visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s client device in response to requests received from the web browser.
The term “graphical user interface, ” or “GUI, ” may be used in the singular or the plural to describe one or more graphical user interfaces and each of the displays of a particular graphical user interface. Therefore, a GUI may represent any graphical user interface, including but not limited to, a web browser, a touch screen, or a Command Line Interface (CLI) that processes information and efficiently presents the information results to the user. In general, a GUI may include a plurality of User Interface (UI) elements, some or all associated with a web browser, such as interactive fields, pull-down lists, and buttons. These and other UI elements may be related to or represent the functions of the web browser.
Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, for example, as a data server, or that includes a middleware component, for example, an application server, or that includes a front-end component, for example, a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of wireline or wireless digital data communication (or a combination of data communication) , for example, a communication network. Examples of communication networks include a Local Area Network (LAN) , a Radio Access Network (RAN) , a Metropolitan Area Network (MAN) , a Wide Area Network (WAN) , Worldwide Interoperability for Microwave Access (WIMAX) , a Wireless Local Area Network (WLAN) using, for example, 802.11 a/b/g/n or 802.20 (or a combination of 802.11x and 802.20 or other protocols consistent with this  disclosure) , all or a portion of the Internet, or any other communication system or systems at one or more locations (or a combination of communication networks) . The network may communicate with, for example, Internet Protocol (IP) packets, Frame Relay frames, Asynchronous Transfer Mode (ATM) cells, voice, video, data, or other suitable information (or a combination of communication types) between network addresses.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular implementations of particular inventions. Certain features that are described in this specification in the context of separate implementations can also be implemented, in combination, in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations, separately, or in any suitable sub-combination. Moreover, although previously described features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can, in some cases, be excised from the combination, and the claimed combination may be directed to a sub-combination or variation of a sub-combination.
Particular implementations of the subject matter have been described. Other implementations, alterations, and permutations of the described implementations are within the scope of the following claims as will be apparent to those skilled in the art. While operations are depicted in the drawings or claims in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed (some operations may be considered optional) , to achieve desirable results. In certain circumstances, multitasking or parallel processing (or a combination of multitasking and parallel processing) may be advantageous and performed as deemed appropriate.
Moreover, the separation or integration of various system modules and components in the previously described implementations should not be understood as requiring  such separation or integration in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Accordingly, the previously described example implementations do not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure.
Furthermore, any claimed implementation is considered to be applicable to at least a computer-implemented method; a non-transitory, computer-readable medium storing computer-readable instructions to perform the computer-implemented method; and a computer system comprising a computer memory interoperably coupled with a hardware processor configured to perform the computer-implemented method or the instructions stored on the non-transitory, computer-readable medium.

Claims (20)

  1. A method implemented by a video codec, comprising:
    receiving, by at least one processor, image data of at least one distorted image;
    selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, wherein the type of filter is selected by using a first convolutional neural network (CNN) ; and
    using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.
  2. The method of claim 1, wherein the image data represents a portion of the at least one distorted image.
  3. The method of claim 1, wherein the plurality of filter types comprises a deblocking (DBK) type, a sample adaptive offset (SAO) type, an Adaptive Loop Filter (ALF) type, or a CNN type.
  4. The method of claim 3, wherein the selected type of filter is the CNN type.
  5. The method of claim 1, wherein the type of filter is selected further based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  6. The method of claim 1, further comprising: generating controlling coefficients to adjust weights or biases of the filter by using a second CNN.
  7. The method of claim 6, wherein the controlling coefficients adjust more than one convolution kernel in a same channel with a same value.
  8. The method of claim 6, wherein the controlling coefficients adjust different convolution kernels in a same channel with different values.
  9. The method of claim 6, wherein the controlling coefficients are generated based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  10. The method of claim 9, wherein the controlling coefficients are generated based on a preconfigured computation boundary or a target quality factor.
  11. The method of claim 6, wherein the controlling coefficients are used to determine whether to omit a convolutional layer in generating a reconstructed image.
  12. The method of claim 1, wherein the image data comprises data for at least one of a luminance component or a color component.
  13. A computer-implemented method, comprising:
    receiving, by at least one processor, image data of at least one distorted image;
    selecting, by the at least one processor, a layer path of a convolutional neural network (CNN) filter based on the image data, wherein the layer path of the CNN filter is selected by using a first CNN; and
    generating, by the at least one processor, a reconstructed image corresponding to the at least one distorted image by using the selected layer path of the CNN filter.
  14. The method of claim 13, wherein the image data represents a portion of the at least one distorted image.
  15. The method of claim 13, wherein the layer path is selected further based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  16. The method of claim 13, wherein the layer path is selected based on a preconfigured computation boundary or a target quality factor.
  17. The method of claim 13, further comprising: generating controlling coefficients to adjust weights or biases of the CNN filter by using a second CNN.
  18. The method of claim 17, wherein the controlling coefficients are generated based on one or more features of the at least one distorted image or side information of the at least one distorted image, wherein the side information comprises a side information guide map.
  19. The method of claim 13, wherein the image data comprises data for at least one of a luminance component or a color component.
  20. A computer-readable medium storing computer instructions, that when executed by one or more hardware processors, cause the one or more hardware processors to perform operations comprising:
    receiving, by at least one processor, image data of at least one distorted image;
    selecting, by the at least one processor, at least one type of filter from a plurality of filter types based on the image data, wherein the type of filter is selected by using a first convolutional neural network (CNN) ; and
    using a filter of the selected type to generate a reconstructed image corresponding to the distorted image.