WO2021068175A1 - Method and apparatus for video clip compression

Info

Publication number: WO2021068175A1
Application number: PCT/CN2019/110470
Authority: WO
Inventors: David Jones Brady; Xuefei YAN; Weiping Zhang; Changzhi YU
Original assignee: Suzhou Aqueti Technology Co., Ltd.
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2021-04-15
Also published as: CN113196779B; CN113196779A

Abstract

The present invention discloses a compressing method and apparatus. The compressing method includes the steps of: reading out a plurality of groups of raw pixel values from a camera head, wherein each group of raw pixel values corresponds to a frame of the video clip; performing intra-frame compression by compressing each group of raw pixel values into an intra-frame-compressive-sampling (ICS) frame with a compressing kernel; quantizing the first ICS-frame, and quantizing the remaining ICS frames (R-ICS frames) into QR-ICS frames, wherein the quantized first ICS-frame comprises Ncomp sub-quantized first ICS-frames and each sub-quantized first ICS-frame corresponds to an ICS-channel of the quantized first ICS-frame; for the sub-QR-ICS frames corresponding to each ICS-channel, performing patch-matching subtraction in the sub-QR-ICS frames relative to the sub-quantized first ICS-frame corresponding to the same ICS-channel, and generating patch-subtracted ICS-frames; for the patch-subtracted ICS-frames corresponding to each ICS-channel, grouping the patch-subtracted ICS-frames into stacks, wherein each stack includes a pre-defined number of patch-subtracted ICS-frames; and for the patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, determining shared data, wherein the shared data represent similar data among the patch-subtracted ICS-frames, and determining stack-residual frames based on the shared data.

Description

Method and Apparatus for Video Clip Compression

TECHNICAL FIELD

The present invention relates to a method and an apparatus for video clip compression, and particularly to a method for low-power video clip compression.

BACKGROUND

The general function of a camera is to transform parallel optical data into compressed serial electronic formats for transmission or information storage. In some embodiments, the parallel optical data may correspond to a video clip.

A video clip consists of multiple frames, and the frames may have a high resolution. Large frames slow down transmission, so the video clip needs to be compressed before it is transmitted.

Operating power is a key limitation on camera pixel capacity. A conventional camera consists of a focal plane and a “system on chip” image processing platform. The image signal processing (ISP) chip implements various image processing functions, such as demosaicing and nonuniformity correction, as well as image compression.

While standard compression algorithms are often implemented in hardware application circuits to improve power efficiency and speed, these algorithms still require substantial power and memory resources. In addition, ISP chips typically implement image processing algorithms that also require power and chip area. In view of the multiple digital steps per pixel required in these processes, ISP chips commonly use substantially more power than image sensor capture and read-out.

Image quality after reconstruction is another metric for evaluating a compression method.

In view of the above, a compression method that balances operating power, reconstruction quality, and related factors is needed.

SUMMARY

One aspect of the present disclosure is directed to a method for compressing a video clip. The compressing method may include one or more of the following operations:

reading out a plurality of groups of raw pixel values from a camera head, wherein each group of raw pixel values corresponds to a frame of the video clip;

performing intra-frame compression by compressing each group of raw pixel values into an intra-frame-compressive-sampling (ICS) frame with a compressing kernel, wherein the ICS frames include a first ICS-frame and a number of remaining ICS frames (R-ICS frames) after the first ICS-frame in time, and wherein the compressing kernel has Ncomp ICS-channels and Ncomp is an integer not smaller than 1;

quantizing the first ICS-frame, and quantizing the R-ICS frames into QR-ICS frames, wherein the quantized first ICS-frame comprises Ncomp sub-quantized first ICS-frames and each sub-quantized first ICS-frame corresponds to an ICS-channel of the quantized first ICS-frame, and wherein each QR-ICS frame comprises Ncomp sub-QR-ICS frames and each sub-QR-ICS frame corresponds to an ICS-channel of the QR-ICS frame;

for the sub-QR-ICS frames corresponding to each ICS-channel, performing patch-matching subtraction in the sub-QR-ICS frames relative to the sub-quantized first ICS-frame corresponding to the same ICS-channel, and generating patch-subtracted ICS-frames, wherein one or more motion vectors correspond to a sub-QR-ICS frame and a motion vector represents a relative location between a matched patch in one of the sub-QR-ICS frames corresponding to an ICS-channel and a reference patch in the sub-quantized first ICS-frame corresponding to the same ICS-channel, and wherein the motion vectors corresponding to one ICS-channel are shared among the other ICS-channels;

for the patch-subtracted ICS-frames corresponding to each ICS-channel, grouping the patch-subtracted ICS-frames into stacks, wherein each stack includes a pre-defined number of patch-subtracted ICS-frames; and

for the patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, determining shared data, wherein the shared data represent similar data among the patch-subtracted ICS-frames, and determining stack-residual frames based on the shared data.

In some embodiments, the step of performing intra-frame compression by compressing each group of raw pixel values into an ICS frame with a compressing kernel comprises: for each frame of the video clip, compressing each portion of the group of raw pixel values into an integer with the compressing kernel, wherein each portion of the group of raw pixel values corresponds to a section of the frame.
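The section-by-section compression described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: it assumes non-overlapping sections whose size equals the kernel size, a single ICS-channel, and an integer-valued kernel. The function name `compress_frame` and the kernel values are hypothetical.

```python
def compress_frame(frame, kernel):
    """Compress a 2D frame into one integer per non-overlapping section.

    Assumption: each section has the same size as the kernel, and the
    compressed value is the weighted sum of the section's pixel values.
    """
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(frame), len(frame[0])
    if rows % kh or cols % kw:
        raise ValueError("frame dimensions must be multiples of the kernel size")
    compressed = []
    for r in range(0, rows, kh):
        out_row = []
        for c in range(0, cols, kw):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * frame[r + i][c + j]
            out_row.append(acc)
        compressed.append(out_row)
    return compressed


# Hypothetical 4x4 frame and 2x2 integer kernel (illustrative values only).
frame = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [[1, 0], [0, 1]]
ics_frame = compress_frame(frame, kernel)
print(ics_frame)  # [[7, 11], [23, 27]]
```

With a kernel that has Ncomp ICS-channels, the same loop would be repeated once per channel, giving Ncomp integers per section.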

In some embodiments, the step of performing patch-matching subtraction in the sub-QR-ICS frames relative to the sub-quantized first ICS-frame comprises: for each sub-QR-ICS frame corresponding to an ICS-channel, performing patch-matching-based motion prediction for patches of the sub-QR-ICS frame relative to the sub-quantized first ICS-frame corresponding to the same ICS-channel, wherein the sub-QR-ICS frame is divided into patches for patch searching in the sub-quantized first ICS-frame, with neither gaps nor overlaps between the patches in the sub-QR-ICS frame; and for each sub-QR-ICS frame, determining a patch-subtracted ICS-frame by subtracting one or more matched patches from the sub-QR-ICS frame.

In some embodiments, the step of performing patch-matching-based motion prediction for patches of each sub-QR-ICS frame relative to the sub-quantized first ICS-frame corresponding to the same ICS-channel comprises: defining the sub-QR-ICS frame corresponding to an ICS-channel as a matching frame and the sub-quantized first ICS-frame corresponding to the same ICS-channel as a searching frame, and performing hierarchical patch searching in the searching frame, wherein the matching frame is divided into relative patches. The hierarchical patch searching comprises:

for each relative patch in the matching frame, performing patch searching in the searching frame with a stride size, wherein the stride size is a defined integer not smaller than 1;

during the patch searching, calculating a square difference between each patch in the searching frame and the relative patch in the matching frame;

if the lowest square difference is smaller than a defined threshold, determining the target patch in the searching frame with the lowest square difference as a reference patch and determining the relative patch as the matched patch;

if the lowest square difference is not smaller than the defined threshold and the patch size of the relative patch is larger than a defined minimal patch size, defining the relative patch as the matching frame and the target patch as the searching frame, and performing the hierarchical patch searching again in that searching frame; and

repeating the hierarchical patch searching until a reference patch corresponding to the relative patch is found with a square difference smaller than the defined threshold, or until the patch size of the relative patch is not larger than the defined minimal patch size.
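The hierarchical patch searching above may be easier to follow as a short recursive sketch. Several details here are assumptions that the text does not fix: square patches with power-of-two sizes, halving the patch size at each level, a sum of squared differences as the "square difference", and a motion vector defined as the reference-patch position minus the relative-patch position. All function names are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized 2D patches."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def crop(frame, top, left, size):
    """Return the size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_match(patch, search, stride):
    """Slide `patch` over `search` with the given stride; return (ssd, top, left)."""
    size = len(patch)
    best = None
    for top in range(0, len(search) - size + 1, stride):
        for left in range(0, len(search[0]) - size + 1, stride):
            d = ssd(patch, crop(search, top, left, size))
            if best is None or d < best[0]:
                best = (d, top, left)
    return best


def hierarchical_search(matching, search, size, stride, threshold, min_size,
                        m_off=(0, 0), s_off=(0, 0)):
    """Return (patch_position, patch_size, motion_vector) for each matched patch."""
    matches = []
    # Divide the matching frame into relative patches without gaps or overlaps.
    for top in range(0, len(matching) - size + 1, size):
        for left in range(0, len(matching[0]) - size + 1, size):
            patch = crop(matching, top, left, size)
            d, ref_top, ref_left = best_match(patch, search, stride)
            pos = (m_off[0] + top, m_off[1] + left)
            ref = (s_off[0] + ref_top, s_off[1] + ref_left)
            if d < threshold:
                # Assumed convention: vector = reference position - patch position.
                matches.append((pos, size, (ref[0] - pos[0], ref[1] - pos[1])))
            elif size > min_size:
                # The relative patch becomes the matching frame and the
                # target patch becomes the searching frame (assumed halving).
                target = crop(search, ref_top, ref_left, size)
                matches += hierarchical_search(patch, target, size // 2, stride,
                                               threshold, min_size, pos, ref)
            # Otherwise the patch is left unmatched.
    return matches


# Illustrative data: the bright 2x2 block moves from (2, 2) to (0, 0).
search = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 9, 8], [0, 0, 7, 6]]
matching = [[9, 8, 0, 0], [7, 6, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
matches = hierarchical_search(matching, search, size=2, stride=1,
                              threshold=1, min_size=1)
print(matches[0])  # ((0, 0), 2, (2, 2))
```

A patch-subtracted frame would then be obtained by subtracting each reference patch from its matched patch. That step is omitted here to keep the sketch short.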

In some embodiments, every stack contains the same number of patch-subtracted ICS-frames.

In some embodiments, the step of determining shared data and determining stack-residual frames based on the shared data comprises, for each stack corresponding to an ICS-channel: determining shared data by convolving the patch-subtracted ICS-frames in the stack with a first kernel; determining quantized shared data by quantizing the values in the shared data to integers of a pre-defined bit depth; rescaling the quantized shared data to RQ-shared data; reshaping the RQ-shared data to RRQ-shared data by performing deconvolution with a second kernel; and, for each patch-subtracted ICS-frame in the stack, determining a stack-residual frame by subtracting the RRQ-shared data from the patch-subtracted ICS-frame.
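The quantizing and rescaling operations can be illustrated with a uniform quantizer. The text does not specify the quantization rule, so the uniform mapping, the value range `lo`/`hi`, and the function names `quantize` and `rescale` are assumptions made only for this sketch.

```python
def quantize(values, bit_depth, lo, hi):
    """Map floats in [lo, hi] to integer codes in [0, 2**bit_depth - 1].

    Assumption: uniform quantization with clipping to the code range.
    """
    levels = (1 << bit_depth) - 1
    scale = levels / (hi - lo)
    return [min(levels, max(0, round((v - lo) * scale))) for v in values]


def rescale(codes, bit_depth, lo, hi):
    """Map integer codes back to approximate float values in [lo, hi]."""
    levels = (1 << bit_depth) - 1
    step = (hi - lo) / levels
    return [lo + q * step for q in codes]


# Illustrative shared-data values and a hypothetical 4-bit depth.
shared = [-1.0, -0.5, 0.2, 1.0]
codes = quantize(shared, bit_depth=4, lo=-1.0, hi=1.0)
restored = rescale(codes, bit_depth=4, lo=-1.0, hi=1.0)
print(codes)  # [0, 4, 9, 15]
```

Under this assumption, the error introduced by quantizing and rescaling is at most half a quantization step for values inside the range.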

In some embodiments, the step of determining shared data and determining stack-residual frames based on the shared data comprises, for each stack corresponding to an ICS-channel: compressing the patch-subtracted ICS-frames into a weighted-sum frame by performing a weighted summation of the values at the same location in the patch-subtracted ICS-frames with weighted-sum parameters; determining shared data by convolving the weighted-sum frame with a first kernel; determining quantized shared data by quantizing the values in the shared data to integers of a pre-defined bit depth; rescaling the quantized shared data to RQ-shared data; reshaping the RQ-shared data to RRQ-shared data by performing deconvolution with a second kernel; and, for each patch-subtracted ICS-frame in the stack, determining a stack-residual frame by subtracting the RRQ-shared data from the patch-subtracted ICS-frame.
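A simplified sketch of the weighted-sum variant follows. To keep it short, the convolution with the first kernel and the deconvolution with the second kernel are treated as identity operations, and quantization is reduced to rounding. These simplifications, the equal weights, and the function names are assumptions for illustration and are not taken from the disclosure.

```python
def weighted_sum_frame(stack, weights):
    """Combine the frames of a stack by a per-location weighted sum."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[sum(w * f[r][c] for w, f in zip(weights, stack))
             for c in range(cols)] for r in range(rows)]


def stack_residuals(stack, shared):
    """Subtract the shared data from every frame in the stack."""
    return [[[f[r][c] - shared[r][c] for c in range(len(shared[0]))]
             for r in range(len(shared))] for f in stack]


# Two hypothetical 2x2 patch-subtracted ICS-frames forming one stack.
stack = [
    [[10, 12], [8, 6]],
    [[12, 12], [10, 6]],
]
weights = [0.5, 0.5]  # hypothetical weighted-sum parameters

summed = weighted_sum_frame(stack, weights)
# Stand-in for convolve -> quantize -> rescale -> deconvolve (assumed identity).
shared = [[round(v) for v in row] for row in summed]
residuals = stack_residuals(stack, shared)
print(residuals)  # [[[-1, 0], [-1, 0]], [[1, 0], [1, 0]]]
```

The residuals are small where the frames of a stack resemble each other, which is the motivation for separating the shared data from the stack-residual frames.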

In some embodiments, the method further comprises, for each stack corresponding to the ICS-channel: compressing each stack-residual frame with a third kernel; and determining quantized-compressed stack-residual frames by quantizing the values in each compressed stack-residual frame to integers of a pre-defined bit depth. The method further comprises performing entropy encoding on the quantized shared data corresponding to each ICS-channel, on the quantized-compressed stack-residual frames corresponding to each ICS-channel, and on the motion vectors in each stack shared among the ICS-channels, respectively. The entropy-encoded quantized shared data corresponding to each ICS-channel, the entropy-encoded quantized-compressed stack-residual frames corresponding to each ICS-channel, and the entropy-encoded motion vectors in each stack shared among the ICS-channels are stored for decoding. The entropy encoding operations are performed based on a global dictionary, which is pre-constructed from a large amount of data of the same type.

In some embodiments, the method further comprises, for each stack corresponding to the ICS-channel: performing entropy decoding on the entropy-encoded quantized shared data, the entropy-encoded quantized-compressed stack-residual frames, and the corresponding entropy-encoded motion vectors; rescaling each quantized-compressed stack-residual frame to an RQ-compressed stack-residual frame; decompressing each RQ-compressed stack-residual frame into a first decompressed ICS-frame by performing deconvolution with a fourth kernel; reshaping the RQ-shared data to RRQ-shared data by performing deconvolution with the second kernel; for each first decompressed ICS-frame in the stack, determining a second decompressed ICS-frame by adding the RRQ-shared data and one or more corresponding matched patches, located with the stored motion vectors, to the first decompressed ICS-frame; for each second decompressed ICS-frame in each stack, determining a third decompressed ICS-frame by stacking the second decompressed ICS-frames corresponding to all ICS-channels together; and for each third decompressed ICS-frame in each stack, determining a reconstructed frame by performing intra-frame decompression on the third decompressed ICS-frame with a decompressing kernel and a neural network for quality improvements (QINN); wherein the first to fourth kernels are shared among the stacks corresponding to the ICS-channel.

In some embodiments, a temporal module comprises the first to fourth kernels, or the temporal module comprises the first to fourth kernels and the weighted-sum parameters. The parameters in the compressing kernel, the temporal module, the decompressing kernel and the QINN are determined by sample-based training. The sample-based training procedure comprises:

reading out a plurality of groups of sample raw pixel values, wherein each group of sample raw pixel values corresponds to a frame;

performing intra-frame compression by compressing each group of sample raw pixel values into a sample ICS frame with an initial compressing kernel, wherein the sample ICS frames include a sample first ICS-frame and a number of sample R-ICS frames after the sample first ICS-frame in time, and wherein the initial compressing kernel has Ncomp ICS-channels and Ncomp is an integer not smaller than 1;

quantizing the sample first ICS-frame, and quantizing the sample R-ICS frames into sample QR-ICS frames, wherein the sample quantized first ICS-frame comprises Ncomp sample sub-quantized first ICS-frames and each sample sub-quantized first ICS-frame corresponds to an ICS-channel of the sample quantized first ICS-frame, and wherein each sample QR-ICS frame comprises Ncomp sample sub-QR-ICS frames and each sample sub-QR-ICS frame corresponds to an ICS-channel of the sample QR-ICS frame;

for the sample sub-QR-ICS frames corresponding to each ICS-channel, performing patch-matching subtraction in the sample sub-QR-ICS frames relative to the sample sub-quantized first ICS-frame, and generating sample patch-subtracted ICS-frames, wherein one or more motion vectors correspond to a sample sub-QR-ICS frame and a motion vector represents a relative location between a matched patch in one of the sample sub-QR-ICS frames and a reference patch in the sample sub-quantized first ICS-frame, and wherein the motion vectors corresponding to one ICS-channel are shared among the other ICS-channels;

for the sample patch-subtracted ICS-frames corresponding to each ICS-channel, grouping the sample patch-subtracted ICS-frames into stacks, wherein each stack includes a pre-defined number of sample patch-subtracted ICS-frames;

for the sample patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, determining sample shared data, sample compressed stack-residual frames, and sample first decompressed ICS-frames with an initial temporal module, wherein the initial temporal module comprises a fifth kernel, a sixth kernel, a seventh kernel and an eighth kernel, or the initial temporal module comprises the fifth to eighth kernels and initial weighted-sum parameters;

for each stack, determining sample reconstructed frames with an initial decompressing kernel and an initial QINN;

training the initial compressing kernel into the compressing kernel, and training the initial decompressing kernel and the initial QINN into an intermediate decompressing kernel and an intermediate QINN; and

training the parameters of the initial temporal module via multi-graph-combined-loss training.

In some embodiments, the step of training the parameters of the initial temporal module via multi-graph-combined-loss training comprises:

determining four computation graphs, each of which is a procedure using the initial temporal module, wherein the first computation graph G1 represents a procedure keeping both the first and the second quantization points, the second computation graph G2 represents a procedure keeping only the first quantization point, the third computation graph G3 represents a procedure keeping only the second quantization point, and the fourth computation graph G4 represents a procedure keeping neither quantization point; wherein the first quantization point represents quantizing the output data of the fifth kernel, and the second quantization point represents quantizing the output data of the seventh kernel;

determining three optimizers that are run sequentially during iterative training; wherein the first optimizer is configured to train the parameters before the first quantization point to minimize a first total loss, which includes the DA_E of the first quantization point from G1, the DA_E of the second quantization point from G3, and the reconstruction loss from G4; wherein the second optimizer is configured to train the parameters between the first and second quantization points to minimize a second total loss, which includes the DA_E of the second quantization point from G1 and the reconstruction loss from G2; wherein the third optimizer is configured to train the parameters after the second quantization point to minimize a third total loss, which includes the reconstruction loss from G1; and wherein DA_E represents a differentiable approximation of entropy; and

training the parameters in the initial temporal module by iteratively running the first, the second and the third optimizers, wherein the fifth to eighth kernels may be trained into fifth to eighth intermediate kernels, and the parameters in the fifth to eighth intermediate kernels are floating-point numbers.
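The assignment of loss terms to optimizers can be summarized in a small lookup structure. The sketch below only restates the bookkeeping given in the text. The dictionary names, the term labels such as `DA_E_Q1`, the placeholder loss values, and the use of an unweighted sum for each total loss are assumptions, since the text says only that each total loss "includes" the listed terms.

```python
# For each graph: (keeps first quantization point, keeps second quantization point).
GRAPHS = {
    "G1": (True, True),
    "G2": (True, False),
    "G3": (False, True),
    "G4": (False, False),
}

# Loss terms per optimizer as (term, graph) pairs, following the text.
OPTIMIZER_LOSSES = {
    "first": [("DA_E_Q1", "G1"), ("DA_E_Q2", "G3"), ("reconstruction", "G4")],
    "second": [("DA_E_Q2", "G1"), ("reconstruction", "G2")],
    "third": [("reconstruction", "G1")],
}

# Parameter group trained by each optimizer, as described in the text.
OPTIMIZER_PARAMS = {
    "first": "before the first quantization point",
    "second": "between the first and second quantization points",
    "third": "after the second quantization point",
}

QUANT_POINT_INDEX = {"DA_E_Q1": 0, "DA_E_Q2": 1}


def entropy_terms_are_consistent():
    """Check that each entropy term is taken from a graph keeping that point."""
    return all(GRAPHS[graph][QUANT_POINT_INDEX[term]]
               for terms in OPTIMIZER_LOSSES.values()
               for term, graph in terms if term in QUANT_POINT_INDEX)


def total_loss(optimizer, loss_values):
    """Assumed combination rule: unweighted sum of the listed loss terms."""
    return sum(loss_values[key] for key in OPTIMIZER_LOSSES[optimizer])


# Placeholder loss values, one per (term, graph) pair, for illustration only.
loss_values = {
    ("DA_E_Q1", "G1"): 1.0,
    ("DA_E_Q2", "G3"): 2.0,
    ("reconstruction", "G4"): 4.0,
    ("DA_E_Q2", "G1"): 8.0,
    ("reconstruction", "G2"): 16.0,
    ("reconstruction", "G1"): 32.0,
}
print(total_loss("first", loss_values))  # 7.0
```

In an actual training loop, each optimizer would update only its own parameter group using its own total loss, and the three optimizers would be run in turn at every iteration.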

In some embodiments, the first graph G1 comprises: determining data T2 by inputting data T1 into a first convolution layer with parameters Para (bQ1), wherein data T1 corresponds to the sample patch-subtracted ICS-frames for all stacks corresponding to each channel; determining data T2_Q by quantizing data T2 at the first quantization point; determining data T3 based on data T2_Q, wherein the parameters to be trained in the process from data T2_Q to T3 include a first deconvolution layer and a second convolution layer with parameters Para (aQ1, bQ2), and wherein the process further includes a rescaling operation before the first deconvolution layer and a subtracting operation after the second convolution layer; determining data T3_Q by quantizing data T3 at the second quantization point, wherein data T3_Q corresponds to the quantized-compressed stack-residual frames; and determining data T4 based on data T3_Q, wherein the parameters to be trained in the process from data T3_Q to T4 include a second deconvolution layer with parameters Para (aQ2), and wherein the process further includes a rescaling operation before the second deconvolution layer; wherein Para (bQ1) is the parameters in the fifth kernel, or Para (bQ1) is the weighted-sum parameters and the parameters in the fifth kernel; and wherein Para (aQ1, bQ2) is the parameters in the sixth kernel and the seventh kernel, and Para (aQ2) is the parameters in the eighth kernel.

In some embodiments, the second graph G2 comprises: determining data T2 by inputting data T1 into the first convolution layer with parameters Para (bQ1); determining data T2_Q by quantizing data T2 at the first quantization point; determining data T3 based on data T2_Q, wherein the parameters to be trained in the process from data T2_Q to T3 include the first deconvolution layer and the second convolution layer with parameters Para (aQ1, bQ2); and determining data T4 (2) based on data T3, wherein the parameters to be trained in the process from data T3 to T4 (2) include the second deconvolution layer with parameters Para (aQ2).

In some embodiments, the third graph G3 comprises: determining data T2 by inputting data T1 into the first convolution layer with parameters Para (bQ1); determining data T3 (3) based on data T2, wherein the parameters to be trained in the process from data T2 to T3 (3) include the first deconvolution layer and the second convolution layer with parameters Para (aQ1, bQ2), and wherein the process further includes a subtracting operation after the second convolution layer; determining data T3_Q (3) by quantizing data T3 (3) at the second quantization point; and determining data T4 (3) based on data T3_Q (3), wherein the parameters to be trained in the process from data T3_Q (3) to T4 (3) include the second deconvolution layer with parameters Para (aQ2), and wherein the process further includes a rescaling operation before the second deconvolution layer.

In some embodiments, the fourth graph G4 comprises: determining data T2 by inputting data T1 into the first convolution layer with parameters Para (bQ1); determining data T3 (4) based on data T2, wherein the parameters to be trained in the process from data T2 to T3 (4) include the first deconvolution layer and the second convolution layer with parameters Para (aQ1, bQ2), and wherein the process further includes a subtracting operation after the second convolution layer; and determining data T4 (4) based on data T3 (4), wherein the parameters to be trained in the process from data T3 (4) to T4 (4) include the second deconvolution layer with parameters Para (aQ2).

In some embodiments, the method further comprises: determining the first kernel by integerizing the parameters in the fifth intermediate kernel, determining the second kernel by integerizing the parameters in the sixth intermediate kernel, and determining the third kernel by integerizing the parameters in the seventh intermediate kernel; determining the fourth kernel by fine-tuning the parameters in the eighth intermediate kernel; and determining the decompressing kernel and the QINN by fine-tuning the parameters in the intermediate decompressing kernel and the intermediate QINN.

Another aspect of the present disclosure is directed to an apparatus for compressing a video clip. The apparatus includes a reading-out unit configured to read out a plurality of groups of raw pixel values from a camera head, wherein each group of raw pixel values corresponds to a frame of the video clip, and a processor configured to perform compression on the multiple frames of the video clip. The compression comprises:

performing intra-frame compression by compressing each group of raw pixel values into an intra-frame-compressive-sampling (ICS) frame with a compressing kernel, wherein the ICS frames include a first ICS-frame and a number of remaining ICS frames (R-ICS frames) after the first ICS-frame in time, and wherein the compressing kernel has Ncomp ICS-channels and Ncomp is an integer not smaller than 1;

quantizing the first ICS-frame, and quantizing the R-ICS frames into QR-ICS frames, wherein the quantized first ICS-frame comprises Ncomp sub-quantized first ICS-frames and each sub-quantized first ICS-frame corresponds to an ICS-channel of the quantized first ICS-frame, and wherein each QR-ICS frame comprises Ncomp sub-QR-ICS frames and each sub-QR-ICS frame corresponds to an ICS-channel of the QR-ICS frame;

for the sub-QR-ICS frames corresponding to each ICS-channel, performing patch-matching subtraction in the sub-QR-ICS frames relative to the sub-quantized first ICS-frame corresponding to the same ICS-channel, and generating patch-subtracted ICS-frames, wherein one or more motion vectors correspond to a sub-QR-ICS frame and a motion vector represents a relative location between a matched patch in one of the sub-QR-ICS frames and a reference patch in the sub-quantized first ICS-frame, and wherein the motion vectors corresponding to one ICS-channel are shared among the other ICS-channels;

for the patch-subtracted ICS-frames corresponding to each ICS-channel, grouping the patch-subtracted ICS-frames into stacks, wherein each stack includes a pre-defined number of patch-subtracted ICS-frames; and

for the patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, determining shared data, wherein the shared data represent similar data among the patch-subtracted ICS-frames, and determining stack-residual frames based on the shared data.

Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:

FIG. 1 shows an example of an original raw-bayer picture according to some embodiments of the present disclosure;

FIG. 2 illustrates an intra-frame compressing method according to some embodiments of the present disclosure;

FIG. 3 shows a convolution process as described in 204 according to some embodiments of the present disclosure;

FIG. 4 illustrates an example of intra-frame compression process with frame strategy one as described in 204 according to some embodiments of the present disclosure;

FIG. 5 is an example of a compressing kernel during a single-layer convolutional 2D compression process of raw-bayer data according to some embodiments of the present disclosure;

FIG. 6 is an integer array of the shape [256, 480, 4] after compression of the input pixel values with the compressing kernel;

FIG. 7 illustrates a shared data determining method after the intra-frame compression process according to some embodiments of the present disclosure;

FIG. 8 shows an exemplary procedure of FIG. 7 according to some embodiments of the present disclosure;

FIG. 9 illustrates an exemplary patch-matching subtraction method according to some embodiments of the present disclosure;

FIG. 10 is an exemplary patch-matching-based motion prediction method according to some embodiments of the present disclosure;

FIG. 11 is an exemplary shared data-based compressing method according to some embodiments of the present disclosure;

FIG. 12 illustrates an exemplary procedure of FIG. 11 according to some embodiments of the present disclosure;

FIG. 13 illustrates a reconstruction method of the video clip according to some embodiments of the present disclosure;

FIG. 14 illustrates an exemplary procedure of FIG. 13 according to some embodiments of the present disclosure;

FIG. 15 illustrates a sample-based training method according to some embodiments of the present disclosure;

FIG. 16 illustrates an exemplary multi-graph-combined-loss training according to some embodiments of the present disclosure;

FIG. 17 illustrates an exemplary procedure of the first computation graph G1 according to some embodiments of the present disclosure;

FIG. 18 illustrates an exemplary procedure of the second computation graph G2 according to some embodiments of the present disclosure;

FIG. 19 illustrates an exemplary procedure of the third computation graph G3 according to some embodiments of the present disclosure;

FIG. 20 illustrates an exemplary procedure of the fourth computation graph G4 according to some embodiments of the present disclosure;

FIG. 21 illustrates four exemplary computation graphs according to some embodiments of the present disclosure;

FIG. 22 illustrates an exemplary temporal module determining method according to some embodiments of the present disclosure;

FIG. 23 is an exemplary diagram of a compressing apparatus according to some embodiments of the present disclosure; and

FIG. 24 is an exemplary diagram of a videoing system according to some embodiments of the present disclosure.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of example in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well-known methods, procedures, systems, components, and/or circuitry have been described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.

It will be understood that the terms “system,” “engine,” “unit,” “module,” and/or “block” used herein are one way to distinguish different components, elements, parts, sections or assemblies of different levels in ascending order. However, the terms may be replaced by other expressions if they achieve the same purpose.

Generally, the word “module,” “unit,” or “block,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions. A module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device. In some embodiments, a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules/units/blocks configured for execution on computing devices may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution). Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules/units/blocks may be included in connected logic components, such as gates and flip-flops, and/or can be comprised of programmable units, such as programmable gate arrays or processors. The modules/units/blocks or computing device functionality described herein may be implemented as software modules/units/blocks, but may be represented in hardware or firmware. In general, the modules/units/blocks described herein refer to logical modules/units/blocks that may be combined with other modules/units/blocks or divided into sub-modules/sub-units/sub-blocks despite their physical organization or storage. The description may be applicable to a system, an engine, or a portion thereof.

It will be understood that when a unit, engine, module or block is referred to as being “on,” “connected to” or “coupled to” another unit, engine, module, or block, it may be directly on, connected or coupled to, or communicate with the other unit, engine, module, or block, or an intervening unit, engine, module, or block may be present, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

These and other features, and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure. It is understood that the drawings are not to scale.

The terminology used herein is for the purposes of describing particular examples and embodiments only, and is not intended to be limiting. As used herein, the singular forms “a,” “an,” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “include,” and/or “comprise,” when used in this disclosure, specify the presence of integers, devices, behaviors, stated features, steps, elements, operations, and/or components, but do not exclude the presence or addition of one or more other integers, devices, behaviors, features, steps, elements, operations, components, and/or groups thereof.

The present disclosure provided herein relates to a compressing method, apparatus and an imaging system. Detailed descriptions will be illustrated in the following embodiments.

In some embodiments, light received by a camera may be read out as raw-bayer data by an image signal processing (ISP) chip. Raw-bayer data must be read out of the camera using parallel or serial streaming, as shown in FIG. 1. FIG. 1 shows an example of an original raw-bayer picture according to some embodiments of the present disclosure. As shown in FIG. 1, the raw-bayer picture has a shape of [2048, 3840], and each pixel may have a corresponding pixel value. The pixel values may be read out in sequence by the camera head after capturing the frame.

For each electronic sensor array, the read-out data from a focal plane may be in a raster format, meaning rows are read out in sequence. The input pixel values (raw-bayer data) have a shape of [2048, 3840]. In some embodiments, the [2048, 3840] original raw-bayer picture is compressed to an integer array of shape [256, 480, 4], which will be described with reference to FIG. 6. Pixels also correspond to different colors, typically red, green and blue, and the color values are typically mosaicked across the sensor so that a given pixel corresponds to a given known color.

Pixel values in conventional cameras may be buffered for demosaicing, image processing (nonuniformity correction, color space transformation, denoising, sharpening, white balance and black level adjustment, etc. ) and compression. Since compression is implemented as a two-dimensional transformation, multiple rows (typically 8) must be buffered and each pixel value must be accumulated in several transformation buffers. In addition, division of pixel values by quantization matrices and compressive (Huffman) coding must be implemented on each image block.

In some embodiments, intra-frame compression may be performed on the raw-bayer data streams. FIG. 2 illustrates an intra-frame compressing method according to some embodiments of the present disclosure. In some embodiments, the intra-frame compressing method may be implemented in electronics connected to the camera head.

In 202, a plurality of groups of raw pixel values may be read out in sequence from a camera head. In some embodiments, the raw pixel values may be read out in sequence as raw-bayer data. For example, a camera head may capture a video clip comprising a plurality of groups of raw pixel values, wherein each group of raw pixel values corresponds to a frame of the video clip. Each pixel may be represented by a pixel value. The pixel value may be transmitted in binary form.

In 204, intra-frame compression may be performed by compressing each group of raw pixel values into an intra-frame-compressive-sampling (ICS) frame with a compressing kernel. For each frame, each portion of the group of raw pixel values may be compressed into an integer with the compressing kernel, wherein each portion of the group of raw pixel values corresponds to a section of the frame. In some embodiments, a section may be a patch or a segment, as described in the frame strategies below. In some embodiments, the ICS frames include a first ICS-frame and a number of remaining ICS frames (R-ICS frames) after the first ICS-frame in time. In some embodiments, the compressing kernel may have Ncomp ICS-channels, where Ncomp is an integer not smaller than 1.

In some embodiments, elements in the compressing kernel may be integers for easy application on hardware such as an FPGA. For example, the elements in the compressing kernel may be binary numbers with a bit depth of 12, 10, 8, 6, 4 or 2 bits. Further, when the elements are 2-bit binaries, the elements may be -1 or +1; or the elements may be 0 or 1.

Frame strategy one

In some embodiments, a group of raw pixel values may correspond to pixels in a frame, and the compressing kernel may be a 2D kernel, wherein the frame may be divided into multiple 2D patches. A 2D patch and the 2D kernel may have the same dimension. For example, the 2D kernel may have a dimension [k_x, k_y], and the frame of shape [N_X, N_Y] may be divided into [N_x, N_y] 2D patches, wherein N_x = N_X/k_x and N_y = N_Y/k_y. Pixel values corresponding to the pixels in a certain patch with shape [k_x, k_y, 1] may be multiplied by a 2D kernel with shape [k_x, k_y, Ncomp], and the pixel values in the 2D patch may be compressed into Ncomp numbers (Ncomp is a preset integer defined manually). As a result, the input raw pixel values of the frame may be compressed into COMP, wherein COMP is an array of integers with a dimension of [N_x, N_y, Ncomp], and Ncomp represents the number of ICS-channels of COMP. The intra-frame compression process may be a 2D convolution operation described by equation (1) below:
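By way of illustration, one form of equation (1) that is consistent with the dimensions given above may be written as follows. The symbols P (the raw pixel values of the frame), K (the compressing kernel) and the patch indices m and n (running from 1 to N_x and from 1 to N_y respectively) are notation assumed here and are not fixed by the present disclosure.

```latex
\mathrm{COMP}[m, n, k] \;=\; \sum_{i=1}^{k_x} \sum_{j=1}^{k_y}
  P\big[(m-1)\,k_x + i,\; (n-1)\,k_y + j\big]\; K[i, j, k]
\tag{1}
```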

where indices i and j loop through k_x and k_y respectively, and index k is from 1 to Ncomp.

The compression ratio, without considering the difference between the bit depth of the input pixel values (raw pixel values are usually 8-bit or 10-bit) and the bit depth of the array of numbers after compression (8-bit), can be expressed as Ncomp/(k_x*k_y). In some embodiments, different compression ratios can be achieved by using various settings of [k_x, k_y, Ncomp]. For example, compression ratios of 1/16, 1/16, 1/32 and 1/256 can be achieved with 2D kernels [16, 16, 16], [8, 8, 4], [16, 16, 8] and [16, 16, 1] respectively.
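For illustration only, frame strategy one and the compression ratio above may be sketched in Python with NumPy as follows. The function names `compress_frame_2d` and `compression_ratio` are hypothetical, and the sketch assumes that the frame dimensions are exact multiples of the kernel dimensions; it is not a description of any particular hardware implementation.

```python
from fractions import Fraction

import numpy as np


def compress_frame_2d(frame, kernel):
    """Compress a [N_X, N_Y] frame with a [k_x, k_y, Ncomp] kernel.

    Patches do not overlap and leave no gap; the result has shape
    [N_x, N_y, Ncomp] with N_x = N_X / k_x and N_y = N_Y / k_y.
    """
    k_x, k_y, _ = kernel.shape
    n_big_x, n_big_y = frame.shape
    if n_big_x % k_x or n_big_y % k_y:
        # Assumption of this sketch: no padding of partial patches.
        raise ValueError("frame dimensions must be multiples of the kernel dimensions")
    n_x, n_y = n_big_x // k_x, n_big_y // k_y
    # View the frame as a grid of patches: [N_x, k_x, N_y, k_y].
    patches = np.asarray(frame, dtype=np.int64).reshape(n_x, k_x, n_y, k_y)
    # Elementwise product of each patch with each kernel channel, then sum.
    return np.einsum("xiyj,ijk->xyk", patches, np.asarray(kernel, dtype=np.int64))


def compression_ratio(k_x, k_y, n_comp):
    """Ratio Ncomp / (k_x * k_y), ignoring any change in bit depth."""
    return Fraction(n_comp, k_x * k_y)
```

Under these assumptions, a [2048, 3840] frame compressed with an [8, 8, 4] kernel yields an array of shape [256, 480, 4], and `compression_ratio(8, 8, 4)` evaluates to 1/16.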

Frame strategy two

In some embodiments, a group of raw pixel values may correspond to pixels in a frame, and the compressing kernel may be a 1D kernel. As the pixels in a frame are transmitted in sequence, the sequence of pixels corresponding to the frame may be divided into segments. The 1D compressing kernel may be an integer vector with the same dimension as the segment, and the intra-frame compression process may be a 1D convolution operation that combines the pixels in a 1D segment of the frame into an integer with the 1D kernel.

In some embodiments, each element in the integer vector may be -1 or +1. In some embodiments, each element in the integer vector may be 0 or 1. For example, 16 incoming pixel values may be combined together using an integer vector [0, 1, 0, 0, 1, 0, ... 1] with length of 16 into one number. As another example, 16 incoming pixel values may be combined together using an integer vector [-1, 1, -1, -1, 1, -1, ... 1] with length of 16 into one number.
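For illustration only, the combination of each segment into one number may be sketched as follows. The function name `compress_segments_1d` is hypothetical, and the example vectors used with it are arbitrary choices, since the full vectors are not given above.

```python
import numpy as np


def compress_segments_1d(pixels, vector):
    """Combine each consecutive segment of pixel values into one integer.

    `vector` is the 1D integer kernel; the segment length equals its length.
    """
    vector = np.asarray(vector, dtype=np.int64)
    pixels = np.asarray(pixels, dtype=np.int64)
    if pixels.size % vector.size:
        # Assumption of this sketch: the sequence splits into whole segments.
        raise ValueError("pixel count must be a multiple of the segment length")
    segments = pixels.reshape(-1, vector.size)
    # Dot product of every segment with the integer vector.
    return segments @ vector
```

A vector of 0 and 1 elements selects and sums a subset of the pixels in the segment, whereas a vector of -1 and +1 elements forms a signed sum of all of them.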

Specifically, the sequence may be divided row by row. Various 1D kernels or 1D integer vectors have been developed, including [128, 1, 4] and [32, 1, 4]. Combinations of different convolutional-1D kernels for different rows in the raw-bayer data may be used to control the total compression ratio of a frame.

Dividing the sequence of pixels in this way requires a smaller buffer than the 2D patch division, because pixel values from different rows/segments do not need to be buffered and the incoming pixel values can be processed segment by segment.

As described above, raw pixel values corresponding to a patch or a segment may be compressed into an integer, and the group of raw pixel values may be compressed into integers. The integers may be stored or buffered during the compression/decompression process.

It should be noted that the intra-frame compression process is provided only for illustration purpose, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, compressing kernel with other bit depth may also be applied to compress the raw pixel values.

FIG. 3 shows a convolution process as described in 204 according to some embodiments of the present disclosure. As shown in FIG. 3, an array of pixels with dimension of 4*4 may be compressed into one integer with a compressing kernel. The compressing kernel may also have a dimension of 4*4.

In some embodiments, the compressed data (ICS-frames) may be quantized to n-bit integers for storage and/or transmission. For example, the first ICS-frame may be quantized to quantized first ICS-frame, and R-ICS frames may be quantized into quantized R-ICS frames (QR-ICS frames) . In some embodiments, the quantized first ICS-frame may comprise Ncomp sub-quantized first ICS-frames and each sub-quantized first ICS-frame may correspond to an ICS-channel of the quantized first ICS-frame; and each QR-ICS frame may comprise Ncomp sub-QR-ICS frames and each sub-QR-ICS frame may correspond to an ICS-channel of the QR-ICS frame.

The quantization process in the present invention is much simpler than that of JPEG. The output of the convolutional operation may be scaled (reducing bit depth) to suit the range of 8-bit integers. Although no complicated entropy-coding calculations are used to reduce quality loss, and the quantization loss directly affects the overall quality of compression/decompression, the compression/decompression process keeps an overall quality similar to that of JPEG.

While the intra-frame compression process is already simple, it may also provide a way to reduce the necessary buffer size, as illustrated in 204. To apply a convolutional-2D kernel with dimension [4, 4, 1] to a patch of pixels of shape [4, 4], one does not need to put all the pixels (16 in total) in a buffer and carry out the elementwise product and summation at once. Instead, one can process the pixels with the proper kernel weight elements row by row as the pixels are read in, and keep the output value (array of numbers) in a buffer until a single convolutional operation is finished. After each convolutional operation, the buffered numbers can be output to storage, and the buffer can be cleared.

For an incoming raw-bayer picture of dimension [N_X, N_Y] being processed by a convolutional-2D kernel of dimension [k_x, k_y, Ncomp], the necessary buffer size is k_x rows of raw-bayer pixels when the method above is carried out.
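For illustration only, the row-by-row processing described above may be sketched as follows. The function name `compress_streaming` is hypothetical. In this sketch the partial sums for one band of patches are accumulated while the k_x rows of that band stream through, and the accumulator is output and cleared once the band is complete; how buffering is actually arranged in hardware is not specified here.

```python
import numpy as np


def compress_streaming(rows, kernel, n_big_y):
    """Consume raster rows one at a time and emit [N_x, N_y, Ncomp] output.

    `rows` is any iterable of rows of length N_Y (= n_big_y); `kernel` has
    shape [k_x, k_y, Ncomp].
    """
    kernel = np.asarray(kernel, dtype=np.int64)
    k_x, k_y, n_comp = kernel.shape
    n_y = n_big_y // k_y
    # Partial sums for the band of patches currently being read in.
    acc = np.zeros((n_y, n_comp), dtype=np.int64)
    out = []
    for r, row in enumerate(rows):
        i = r % k_x  # row index inside the current band of patches
        row = np.asarray(row, dtype=np.int64).reshape(n_y, k_y)
        # Apply kernel row i (shape [k_y, Ncomp]) to every patch in the band.
        acc += row @ kernel[i]
        if i == k_x - 1:
            out.append(acc.copy())  # band finished: output the results
            acc[:] = 0              # and clear the buffer
    return np.stack(out)
```

The result is identical to applying the kernel to whole patches at once; only the order in which the products are accumulated differs.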

FIG. 4 illustrates an example of the intra-frame compression process with frame strategy one as described in 204 according to some embodiments of the present disclosure. As shown in FIG. 4, a frame may be compressed patch by patch. In some embodiments, the frame may be processed by hardware such as an FPGA: the convolutional kernel is applied to each patch of pixels and then moves to the next patch, with no overlap and no gap, until the last patch. Patch 1 shown in FIG. 4 represents a patch that has been processed, and patch 2 represents the patch being processed.

FIG. 5 is an example of a compressing kernel used in a single-layer convolutional 2D compression process of raw-bayer data according to some embodiments of the present disclosure. A single-layer convolutional-2D operation may be implemented to compress [N_x, N_y]-pixel raw-bayer data. The compressing kernel is of the shape [k_x, k_y, Ncomp] = [8, 8, 4] and is quantized from a trained float-weight neural network to 4-bit signed integers in the range [-7, 7]. The compressing kernel is shown in FIG. 5, where each of the four panels presents one [8, 8, i] part of the compressing kernel (i=0, 1, 2, 3).

FIG. 6 is an integer array of the shape [256, 480, 4] after compression of the input pixel values shown in FIG. 1 with the compressing kernel. The intra-frame compression may be implemented with the compressing kernel shown in FIG. 5. The i-th panel shows the matrix [256, 480, i], where index i=0, 1, 2, 3.

FIG. 7 illustrates a shared data determining method after the intra-frame compression process according to some embodiments of the present disclosure. The shared data determining method may lead to a further compression of transmission/buffer data.

As described in FIG. 2, multiple frames in the video clip may be compressed into ICS frames. The ICS frames may include a first ICS-frame and a number of remaining ICS frames (R-ICS frames) after the first ICS-frame in time.

In 702, for sub-QR-ICS frames corresponding to each ICS-channel, patch-matching subtraction may be performed in the sub-QR-ICS frames relative to the sub-quantized first ICS-frame corresponding to the same ICS-channel, generating patch-subtracted ICS-frames. In a patch-subtracted ICS-frame, there may be one or more matched patches relative to the quantized first ICS-frame, and there is a motion vector for each matched patch relative to a reference patch in the first ICS-frame. A motion vector represents a relative location between a matched patch in one of the sub-QR-ICS frames and a reference patch in the sub-quantized first ICS-frame. In some embodiments, the motion vectors may be shared among ICS-channels, and therefore the motion vectors for each QR-ICS frame may be determined only once for all ICS-channels in that QR-ICS frame.

In 704, for patch-subtracted ICS-frames corresponding to each ICS-channel, the patch-subtracted ICS-frames may be grouped into stacks, wherein each stack includes a pre-defined number of patch-subtracted ICS-frames. In some embodiments, the numbers of patch-subtracted ICS-frames in each stack may be equal to each other. For example, a video clip consisting of 100 frames may be compressed into 100 ICS frames (a first ICS frame and 99 R-ICS frames), and quantization may be performed on the 100 ICS frames (yielding a quantized first ICS frame and 99 QR-ICS frames). Then, for each ICS-channel, patch-matching subtraction may be performed in the 99 sub-QR-ICS frames relative to the sub-quantized first ICS-frame to generate 99 patch-subtracted ICS-frames, and the 99 patch-subtracted ICS-frames may be grouped into 33 stacks (the total number of stacks is 33*Ncomp), wherein each stack includes 3 patch-subtracted ICS-frames.
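For illustration only, the grouping in 704 for one ICS-channel may be sketched as follows. The function name `group_into_stacks` is hypothetical, and rejecting a frame count that does not divide evenly is an assumption of this sketch rather than a requirement stated above.

```python
def group_into_stacks(frames, frames_per_stack):
    """Split the patch-subtracted ICS-frames of one ICS-channel into stacks."""
    if len(frames) % frames_per_stack:
        # Assumption of this sketch: every stack holds the same number of frames.
        raise ValueError("frame count must be a multiple of frames_per_stack")
    return [
        frames[start:start + frames_per_stack]
        for start in range(0, len(frames), frames_per_stack)
    ]
```

With 99 patch-subtracted ICS-frames and 3 frames per stack, this yields 33 stacks per ICS-channel, and therefore 33*Ncomp stacks over all ICS-channels.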

In 706, for the patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, shared data may be determined, and stack-residual frames may be determined based on the shared data. Shared data may represent similar data among the patch-subtracted ICS-frames corresponding to the ICS-channel.

FIG. 8 shows an exemplary procedure of FIG. 7 according to some embodiments of the present disclosure.

As shown in FIG. 8, input raw pixel values of a frame of shape [N_X, N_Y] may be compressed into COMP with a dimension of [N_x, N_y, Ncomp]. Then COMP may be quantized to COMP_Q. Patch-matching subtraction may be performed in COMP_Q, generating a patch-subtracted frame COMP_SMP with a dimension of [N_x, N_y, Ncomp], wherein a patch-subtracted frame comprises Ncomp patch-subtracted ICS-frames (patch-matching subtraction is performed ICS-channel by ICS-channel as described in FIG. 7). Then, corresponding to each ICS-channel, patch-subtracted ICS-frames (COMP_SMP in each ICS-channel, with dimension [N_x, N_y]) may be grouped into stacks, with Nfp patch-subtracted ICS-frames in each stack. The pixel value data in each stack may be expressed as COMP_SMPS.

FIG. 9 illustrates an exemplary patch-matching subtraction method according to some embodiments of the present disclosure. The patch-matching subtraction may be performed in a sub-QR-ICS frame relative to the sub-quantized first ICS-frame.

In 902, for each sub-QR-ICS frame corresponding to an ICS-channel, patch-matching-based motion prediction may be performed for patches of the sub-QR-ICS frame relative to the sub-quantized first ICS-frame corresponding to the same ICS-channel, wherein the sub-QR-ICS frame is divided into patches for patch searching in the sub-quantized first ICS-frame, and there is no gap and no overlap between the patches in the sub-QR-ICS frame.

In 904, for each sub-QR-ICS frame, a patch-subtracted ICS-frame may be determined by subtracting matched patches from the sub-QR-ICS frame. As a result, multiple patch-subtracted ICS-frames corresponding to the R-ICS frames may be determined.

In some embodiments, patch-matching-based motion prediction may be performed based on hierarchical patch searching. FIG. 10 is an exemplary patch-matching-based motion prediction method according to some embodiments of the present disclosure.

In 1002, the sub-QR-ICS frame corresponding to an ICS-channel may be defined as a matching frame and the sub-quantized first ICS-frame corresponding to the same ICS-channel may be defined as a searching frame. In 1004, hierarchical patch searching may be performed in the searching frame. The matching frame is divided into relative patches.

For each relative patch in the matching frame, the hierarchical patch searching may be described as the following steps 1006-1012, and the hierarchical patch searching may be performed in each relative patch.

For a relative patch, patch searching may be performed in the searching frame with a stride size. In some embodiments, the stride size is a defined integer not smaller than 1. In some embodiments, during the patch searching in the searching frame, a square difference between each patch in the searching frame and the relative patch in the matching frame may be determined. As a result, for the relative patch, multiple square differences may be determined, one for each patch in the searching frame.

In 1006, for a relative patch, it may be determined whether the lowest square difference is smaller than a defined threshold. The lowest square difference is the smallest of the multiple square differences.

In 1008, if the lowest square difference is smaller than the defined threshold, a target patch in the searching frame with the lowest square difference may be determined as a reference patch, and the relative patch may be determined as a matched patch.

In 1010, if the lowest square difference is not smaller than the defined threshold, it may be determined whether the patch size of the relative patch is larger than a defined minimal patch size.

In 1012, if patch size of the relative patch is larger than the defined minimal patch size, the relative patch may be defined as the matching frame and the target patch may be defined as the searching frame, then repeat 1004 to do hierarchical patch searching.

For a relative patch, the hierarchical patch searching may end if a reference patch is determined or if the patch size of the relative patch is not larger than the defined minimal patch size. Steps 1006-1012 may be performed for all relative patches. As a result, one or more matched patches in a sub-QR-ICS frame may be determined, and for each matched patch, a corresponding motion vector may be determined.
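For illustration only, a single level of the search (the stride scan and the threshold test of 1006 and 1008, without the further subdivision of 1010 and 1012) may be sketched as follows. The function names `find_reference_patch` and `motion_vector` are hypothetical, and the sign convention of the motion vector (reference position minus matched-patch position) is an assumption of this sketch.

```python
import numpy as np


def find_reference_patch(relative_patch, searching_frame, stride, threshold):
    """Scan the searching frame and apply the threshold test.

    Returns ((row, col), lowest_sq_diff) when the lowest square difference is
    smaller than the threshold, otherwise (None, lowest_sq_diff).
    """
    rel = np.asarray(relative_patch, dtype=np.int64)
    frame = np.asarray(searching_frame, dtype=np.int64)
    p_h, p_w = rel.shape
    s_h, s_w = frame.shape
    best_pos, best_diff = None, None
    for r in range(0, s_h - p_h + 1, stride):
        for c in range(0, s_w - p_w + 1, stride):
            candidate = frame[r:r + p_h, c:c + p_w]
            diff = int(np.sum((candidate - rel) ** 2))
            if best_diff is None or diff < best_diff:
                best_pos, best_diff = (r, c), diff
    if best_diff is not None and best_diff < threshold:
        return best_pos, best_diff  # target patch becomes the reference patch
    return None, best_diff          # no match at this patch size


def motion_vector(matched_pos, reference_pos):
    """Relative location of the reference patch with respect to the matched patch."""
    return (reference_pos[0] - matched_pos[0], reference_pos[1] - matched_pos[1])
```

When no reference patch is returned, the hierarchical procedure would continue at a smaller patch size as set out in 1010 and 1012, which this sketch does not implement.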

As the relative patch may be further divided, the patch size decreases during the hierarchical patch searching. In some embodiments, the stride size and the patch size of the relative patch may be positively correlated. For example, when the relative patch has a large patch size, the stride size may be large, while when the relative patch has a small patch size, the stride size may be small. Generally, the stride size is a defined integer not smaller than 2, and the stride size may be 1 only when the relative patch cannot be divided any further.

FIG. 11 is an exemplary shared data-based compressing method according to some embodiments of the present disclosure. Specifically, FIG. 11 shows a process of step 706.

In 1102, for each stack corresponding to an ICS-channel, shared data may be determined by convoluting patch-subtracted ICS-frames in the ICS-channel with a first kernel. Shared data may represent similar data among the patch-subtracted ICS-frames in a stack.

In 1104, for each stack corresponding to the ICS-channel, quantized shared data may be determined by quantizing values in the shared data to integers of pre-defined bit depth with a first scaling factor. In some embodiments, the quantization may include two steps. Firstly, values in the shared data may be scaled (reduce bit depth) to suit the range of n-bit integers with a first scaling factor. For example, values in the shared data may be multiplied by the first scaling factor. The first scaling factor may be an integer or a proper fraction. Secondly, scaled values of the shared data may be integerized to integers.

In 1106, for each stack corresponding to the ICS-channel, the quantized shared data may be rescaled to rescaled-quantized shared data (RQ-shared data) . In some embodiments, the quantized shared data may be rescaled by dividing values of the quantized shared data with the first scaling factor. In some embodiments, the rescaling operation may bring quantization loss. For example, a pixel with value 23 in the shared data may be scaled to 11.5 with a first scaling factor 1/2, and then be integerized to integer 12. Integer 12 then may be rescaled to integer 24 with the first scaling factor 1/2, which brings quantization loss between integer 24 and integer 23, wherein the error of the convolution and deconvolution has not been taken into account.
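For illustration only, the quantization of 1104 and the rescaling of 1106 may be sketched for a single value as follows. The function names are hypothetical, and rounding halves upward is an assumption chosen to agree with the example above, in which 11.5 is integerized to 12.

```python
import math


def quantize(value, scaling_factor):
    """Scale a value and integerize it (halves are rounded up in this sketch)."""
    return math.floor(value * scaling_factor + 0.5)


def rescale(quantized_value, scaling_factor):
    """Undo the scaling by dividing by the same scaling factor."""
    return quantized_value / scaling_factor


q = quantize(23, 0.5)           # 23 * 1/2 = 11.5, integerized to 12
restored = rescale(q, 0.5)      # 12 / (1/2) = 24
quantization_loss = restored - 23
```

Here the quantization loss is 1, the difference between the restored value 24 and the original value 23.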

In 1108, the RQ-shared data may be reshaped to reshaped RQ-shared data (RRQ-shared data) by performing deconvolution with a second kernel.

In 1110, for each stack corresponding to the ICS-channel, a stack-residual frame may be determined by subtracting RRQ-shared data from the patch-subtracted ICS-frame.
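For illustration only, steps 1102 to 1110 for one stack of one ICS-channel may be sketched as follows. The function name `shared_data_residuals` is hypothetical. The sketch assumes that the convolution and the deconvolution act on non-overlapping blocks (stride equal to the kernel size) and that halves are rounded up during integerization; the first and second kernels would in practice be obtained by training, which is not shown.

```python
import numpy as np


def shared_data_residuals(stack, first_kernel, second_kernel, scaling_factor):
    """stack: [N_x, N_y, Nfp]; first_kernel: [ksmx, ksmy, Nfp, ncomp_sm];
    second_kernel: [ksmx, ksmy, ncomp_sm].

    Returns the quantized shared data and the stack-residual frames.
    """
    n_x, n_y, n_fp = stack.shape
    ksmx, ksmy, _, _ = first_kernel.shape
    a, b = n_x // ksmx, n_y // ksmy
    blocks = stack.reshape(a, ksmx, b, ksmy, n_fp)
    # 1102: shared data by convolution over the whole stack.
    shared = np.einsum("aibjf,ijfc->abc", blocks, first_kernel)
    # 1104: scale, then integerize.
    quantized = np.floor(shared * scaling_factor + 0.5).astype(np.int64)
    # 1106: rescale by dividing by the same scaling factor.
    rq_shared = quantized / scaling_factor
    # 1108: reshape back to [N_x, N_y] by deconvolution.
    rrq_shared = np.einsum("abc,ijc->aibj", rq_shared, second_kernel).reshape(n_x, n_y)
    # 1110: subtract the RRQ-shared data from every frame in the stack.
    residuals = stack - rrq_shared[:, :, None]
    return quantized, residuals
```

As a simple check of the idea, if the first kernel averages each block over all frames and the second kernel copies that average back to every position, a stack of identical constant frames leaves stack-residual frames of zero.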

In some embodiments, the method in FIG. 11 may be further simplified. Shared data in each stack corresponding to each ICS-channel may be determined by convoluting a weighted sum frame with a first kernel. The weighted sum frame may be determined by a weighted summation of the values at the same location in the patch-subtracted ICS-frames, using weighted-sum parameters. Convolution of one weighted sum frame takes much less power and memory than convolution of multiple patch-subtracted ICS-frames.
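For illustration only, the simplification may be sketched as follows. The function names are hypothetical, a single-channel first kernel applied to non-overlapping blocks is assumed, and the weights shown in any usage are arbitrary rather than trained values.

```python
import numpy as np


def conv_blocks(frame, kernel2d):
    """Non-overlapping 2D convolution of a [N_x, N_y] frame with a [k, l] kernel."""
    k, l = kernel2d.shape
    a, b = frame.shape[0] // k, frame.shape[1] // l
    return np.einsum("aibj,ij->ab", frame.reshape(a, k, b, l), kernel2d)


def shared_from_weighted_sum(stack, weights, kernel2d):
    """Form one weighted sum frame from a [N_x, N_y, Nfp] stack, then convolve once."""
    # Weighted summation of the values at the same location in every frame.
    weighted_sum_frame = np.tensordot(stack, weights, axes=([2], [0]))
    return conv_blocks(weighted_sum_frame, kernel2d)
```

Because both operations are linear, the result equals what would be obtained by convolving each frame of the stack separately and then combining the outputs with the same weights, while only one convolution is carried out.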

In some embodiments, stack-residual frames may be further compressed, quantized, etc. For each stack corresponding to the ICS-channel, each stack-residual frame in the stack may be compressed with a third kernel. Then, for each stack corresponding to the ICS-channel, quantized-compressed stack-residual frames may be determined by quantizing values in each compressed stack-residual frame to integers of pre-defined bit depth with a second scaling factor. In some embodiments, the quantization process may be the same as that in 1104. In some embodiments, the second scaling factor used in the quantization may or may not be equal to the first scaling factor.

In some embodiments, entropy encoding may be performed on the quantized shared data after step 1102. Rescaling may then be performed as described in 1108. In some embodiments, entropy encoding may be performed on the quantized-compressed stack-residual frames after step 1112. The entropy-encoded quantized-compressed stack-residual frames may be stored for decoding. In some embodiments, entropy encoding may be performed on the motion vectors shared among ICS-channels. Entropy encoding is an operation performed before transmission or storage, and the entropy-encoded quantized shared data corresponding to each ICS-channel, the entropy-encoded quantized-compressed stack-residual frames corresponding to each ICS-channel, and the entropy-encoded motion vectors corresponding to each stack, shared among ICS-channels, are stored for decoding/decompression. In some embodiments, the quantized first ICS-frame may also be stored for adding matched patches to reconstruct the multiple frames of the video clip.

Entropy encoding may be performed based on a global dictionary, wherein the global dictionary is pre-constructed based on a large amount of data of the same type. In some embodiments, the global dictionary may be determined based on the codec building process of Huffman coding.

Firstly, a large amount of data of the same type may be used to construct a general dictionary that is applicable for each type of data. In some embodiments, data of the same type may be quantized shared data, quantized-compressed stack-residual frames, motion vectors, etc. While data of the same type originating from different frames differ, they share a statistical similarity in their distribution of values (like different Gaussian-distribution peaks that overlap and/or are close to each other). Then, in the entropy encoding process of input data of a certain type, the global dictionary may be used to code the values.
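For illustration only, building a global dictionary once from training data of one type and reusing it to code new values may be sketched with a textbook Huffman construction as follows. The function names are hypothetical, and the sketch assumes that every value to be encoded also occurred in the training data; handling of unseen values is not addressed.

```python
import heapq
from collections import Counter


def build_global_dictionary(training_values):
    """Build a Huffman code table (value -> bit string) from training data."""
    counts = Counter(training_values)
    if not counts:
        raise ValueError("training data must not be empty")
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(n, idx, {sym: ""}) for idx, (sym, n) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n0, _, codes0 = heapq.heappop(heap)
        n1, _, codes1 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes0.items()}
        merged.update({sym: "1" + code for sym, code in codes1.items()})
        heapq.heappush(heap, (n0 + n1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(values, dictionary):
    """Code values with the pre-built dictionary (no per-input table is built)."""
    return "".join(dictionary[v] for v in values)


def decode(bits, dictionary):
    """Invert `encode`; Huffman codes are prefix-free, so greedy matching works."""
    inverse = {code: sym for sym, code in dictionary.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Since the same dictionary is held on both the compression side and the decompression side, no code table needs to accompany each encoded input in this sketch.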

FIG. 12 illustrates an exemplary procedure of FIG. 11 according to some embodiments of the present disclosure (also including the quantization and entropy encoding processes). As shown in FIG. 12, shared data Smem (with a dimension of [N_x/ksmx, N_y/ksmy, ncomp_sm]) may be determined by convoluting COMP_SMPS (with a dimension of [N_x, N_y, Nfp]) with a first kernel (with a dimension of [ksmx, ksmy, Nfp, ncomp_sm]). Smem may be quantized to Smem_Q. Then Smem_Q may be rescaled to Smem_Q_rsc (with a dimension of [N_x/kresx, N_y/kresy, ncomp_sm]), and Smem_Q_rsc may be reshaped to SMem_rs (with a dimension of [N_x, N_y]) by performing deconvolution on Smem_Q_rsc with a second kernel (with a dimension of [ksmx, ksmy, ncomp_sm]). Then, in each ICS-channel and in each stack, SMem_rs may be subtracted from each patch-subtracted ICS-frame COMP_SMPS_i to determine stack-residual frames COMP_sres_i. Each stack-residual frame in each stack may be further compressed into comp_sres_i (with a dimension of [N_x/kresx, N_y/kresy, ncomp_res]) with a third kernel (with a dimension of [kresx, kresy, ncomp_res]). Finally, comp_sres_i may be quantized into comp_sres_i_Q, and comp_sres_i_Q may be entropy-encoded into comp_res_i_Q_EC.

FIG. 13 illustrates a reconstruction method of the video clip according to some embodiments of the present disclosure.

In 1302, for each stack corresponding to the ICS-channel, entropy decoding may be performed on the entropy-encoded quantized shared data, the entropy-encoded quantized-compressed stack-residual frames and the corresponding entropy-encoded motion vectors. In some embodiments, as the decompression and compression may be performed on different sides, entropy encoding may be performed before transmission. When the decompression side receives the entropy-encoded data, entropy decoding may be performed first.

In 1304, for each stack corresponding to the ICS-channel, each quantized-compressed stack-residual frame may be rescaled to RQ-compressed stack-residual frame, and quantized shared data may be rescaled to RQ-shared data. In some embodiments, the RQ-shared data may be determined based on the first scaling factor as described in 1106. Each RQ-compressed stack-residual frame may be determined by dividing quantized-compressed stack-residual frame with the second scaling factor.

In 1306, for each stack corresponding to the ICS-channel, each RQ-compressed stack-residual frame may be decompressed into a first decompressed ICS-frame by performing deconvolution with a fourth kernel.

In 1308, for each stack corresponding to the ICS-channel, the RQ-shared data may be reshaped to RRQ-shared data by performing deconvolution with the second kernel.

In 1310, for each first decompressed ICS-frame in each stack corresponding to the ICS-channel, a second decompressed ICS-frame may be determined by adding corresponding RRQ-shared data and one or more corresponding matched patches with stored motion vectors to the first decompressed ICS-frame.

In 1312, for each second decompressed ICS-frame in each stack, a third decompressed ICS-frame may be determined by stacking the second decompressed ICS-frames corresponding to all ICS-channels together. The third decompressed ICS-frame may be an ICS-channel-stack of the second decompressed ICS-frames. The third decompressed ICS-frame may correspond to a R-ICS frame and there may be a pre-defined number of third decompressed ICS-frames in each stack.

In 1314, for each third decompressed ICS-frame in each stack, a reconstructed frame may be determined by performing intra-frame decompression on the third decompressed ICS-frame with a decompressing kernel and a neural network for quality improvements (QINN). In some embodiments, the process with the decompressing kernel may be seen as two steps: a rescaling step and a decompressing step. The rescaling step may correspond to the quantization process after the intra-frame compression as described in FIG. 3, and the rescaling may be performed based on the same scaling factor as that in FIG. 3. In some embodiments, step 1314 may be divided into two steps: rescaling the third decompressed ICS-frame to a fourth decompressed ICS-frame, and decompressing the fourth decompressed ICS-frame to a reconstructed frame.

In some embodiments, the first-fourth kernels may be shared among stacks corresponding to the ICS-channel. Further, corresponding to each ICS-channel, there may be a set of first-fourth kernels, and each set of first-fourth kernels may be determined by sample-based training.

Frames in a stack may be reconstructed with corresponding RQ-shared data. Further, all frames of the video clip may be reconstructed.

FIG. 14 illustrates an exemplary procedure of FIG. 13 according to some embodiments of the present disclosure. As shown in FIG. 14, the quantized-compressed stack-residual frame comp_sres_i_Q may be rescaled into comp_sres_i_rsc (with a dimension of [N_x/kresx, N_y/kresy, Ncomp_res]), and comp_sres_i_rsc may be decompressed into COMP_sres_i_D with a fourth kernel, wherein COMP_sres_i_D has a dimension of [N_x, N_y] and the fourth kernel has a dimension of [kresx, kresy, Ncomp_res]. Then SMem_rs may be added to COMP_sres_i_D to determine COMP_D, wherein COMP_D has a dimension of [N_x, N_y]. SMem_rs is determined based on entropy decoding (with the global dictionary), rescaling (with the first scaling factor) and reshaping (with the second kernel). Then the reconstructed frame may be determined by performing deconvolution on COMP_D_all_chans with a decompressing kernel and a quality-improved neural network (QINN), wherein COMP_D_all_chans is a collection of values in all ICS-channels and has a dimension of [N_x, N_y, Ncomp], the decompressing kernel has a dimension of [k_x, k_y, Ncomp], and the reconstructed frame has a dimension of [N_X, N_Y].

In some embodiments, a temporal module comprises the first-fourth kernels for each of the Ncomp ICS-channels, or the temporal module comprises the first-fourth kernels and the weighted-sum parameters. In some embodiments, the parameters in the compressing kernel, the temporal module, the decompressing kernel and the QINN may be determined by sample-based training. Further, temporal modules for all ICS-channels may be determined based on the sample-based training.

FIG. 15 illustrates a sample-based training method according to some embodiments of the present disclosure.

In 1502, a plurality of groups of sample raw pixel values may be read out, wherein each group of sample raw pixel values correspond to a frame. The plurality of groups of sample raw pixel values may be used for sample-based training.

In 1504, intra-frame compression may be performed by compressing each group of sample raw pixel values into a sample ICS frame with an initial compressing kernel, wherein the sample ICS frames include a sample first ICS-frame and sample R-ICS frames.

In 1506, the sample first ICS-frame may be quantized into a sample quantized first ICS-frame, and the sample R-ICS frames may be quantized into sample QR-ICS frames. In some embodiments, the sample quantized first ICS-frame may comprise Ncomp sample sub-quantized first ICS-frames and each sample sub-quantized first ICS-frame may correspond to an ICS-channel of the sample quantized first ICS-frame, and each sample QR-ICS frame may comprise Ncomp sample sub-QR-ICS frames and each sample sub-QR-ICS frame may correspond to an ICS-channel of the sample QR-ICS frame.

In 1508, for sample sub-QR-ICS frames corresponding to each ICS-channel, patch-matching subtraction may be performed in the sample sub-QR-ICS frames relative to the sample sub-quantized first ICS-frame to generate sample patch-subtracted ICS-frames, wherein one or more motion vectors correspond to a sample sub-QR-ICS frame and a motion vector represents a relative location between a matched patch in one of the sample sub-QR-ICS frames and a reference patch in the sample sub-quantized first ICS-frame, and wherein motion vectors in an ICS-channel are shared among other ICS-channels.
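As a non-limiting illustration, patch-matching subtraction for one ICS-channel may be sketched as below. The sketch is a simplification under stated assumptions: it performs a single-level exhaustive search with a sum of squared differences, applies no threshold and no hierarchical refinement, requires the frame size to be a multiple of the patch size, and defines the motion vector as the reference-patch position minus the matched-patch position. The function name patch_match_subtract and the toy data are hypothetical.

```python
def patch_match_subtract(frame, ref, p, stride=1):
    """Subtract the best-matching reference patch from each p x p patch.

    frame: one sub-QR-ICS frame (list of rows); ref: the sub-quantized
    first ICS-frame of the same channel. Patches tile the frame with no
    gap and no overlap. Returns (patch-subtracted frame, motion vectors).
    """
    H, W = len(frame), len(frame[0])
    residual = [row[:] for row in frame]
    vectors = {}
    for top in range(0, H, p):
        for left in range(0, W, p):
            best = None
            # Exhaustive search over the reference frame with the given stride.
            for r in range(0, H - p + 1, stride):
                for c in range(0, W - p + 1, stride):
                    ssd = sum(
                        (frame[top + a][left + b] - ref[r + a][c + b]) ** 2
                        for a in range(p)
                        for b in range(p)
                    )
                    if best is None or ssd < best[0]:
                        best = (ssd, r, c)
            _, r, c = best
            # Assumed convention: vector = reference position - patch position.
            vectors[(top, left)] = (r - top, c - left)
            for a in range(p):
                for b in range(p):
                    residual[top + a][left + b] -= ref[r + a][c + b]
    return residual, vectors


# Toy reference frame and a frame whose top-left patch moved by (1, 1).
ref = [[4 * r + c for c in range(4)] for r in range(4)]
frame = [
    [5, 6, 2, 3],
    [9, 10, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
residual, vectors = patch_match_subtract(frame, ref, p=2)
```

Because every patch of the toy frame has an exact counterpart in the reference frame, the patch-subtracted frame is all zeros and only the motion vectors carry information.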

In 1510, for sample patch-subtracted ICS-frames corresponding to each ICS-channel, the sample patch-subtracted ICS-frames may be grouped into stacks, wherein each stack includes a pre-defined number of sample patch-subtracted ICS-frames.
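For illustration only, the grouping into stacks may be sketched as follows. The disclosure does not state how a remainder is handled when the number of frames is not a multiple of the pre-defined stack size, so the sketch simply rejects that case; the function name group_into_stacks is hypothetical.

```python
def group_into_stacks(frames, stack_size):
    """Split a time-ordered list of frames into consecutive equal stacks."""
    if stack_size < 1:
        raise ValueError("stack_size must be at least 1")
    if len(frames) % stack_size != 0:
        # Assumption of this sketch: incomplete stacks are not allowed.
        raise ValueError("frame count must be a multiple of stack_size")
    return [
        frames[i:i + stack_size] for i in range(0, len(frames), stack_size)
    ]


# Six placeholder frames grouped with a pre-defined number of three per stack.
stacks = group_into_stacks(["f1", "f2", "f3", "f4", "f5", "f6"], 3)
```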

In 1512, for sample patch-subtracted ICS-frames in each stack in each ICS-channel, sample shared data, sample compressed stack-residual frames and sample first decompressed ICS-frames may be determined with an initial temporal module, wherein the initial temporal module comprises a fifth kernel, a sixth kernel, a seventh kernel and an eighth kernel, or the initial temporal module comprises the fifth-eighth kernels and initial weighted-sum parameters.

In 1514, for each stack, sample reconstructed frames may be determined with an initial decompressing kernel and an initial QINN.

In 1516, the initial compressing kernel may be trained into the compressing kernel, and the initial decompressing kernel and the initial QINN may be trained into an intermediate decompressing kernel and an intermediate QINN.

In 1518, parameters of the initial temporal module may be trained via multi-graph-combined-loss training.

FIG. 16 illustrates an exemplary multi-graph-combined-loss training according to some embodiments of the present disclosure.

In 1602, four computation graphs may be determined. The four graphs are procedures using the initial temporal module. In some embodiments, the first computation graph G1 represents a procedure keeping both the first and the second quantization points, the second computation graph G2 represents a procedure keeping only the first quantization point, the third computation graph G3 represents a procedure keeping only the second quantization point, and the fourth computation graph G4 represents a procedure keeping neither quantization point. In some embodiments, the first quantization point represents quantizing output data of the fifth kernel, and the second quantization point represents quantizing output data of the seventh kernel. The four computation graphs may also be shown in FIG. 21 according to some embodiments of the present disclosure.

In 1604, three optimizers to be run in a sequential manner during iterative training may be determined. In some embodiments, the first optimizer is configured to train parameters before the first quantization point to minimize a first total loss which includes DA_E of the first quantization point from G1, DA_E of the second quantization point from G3, and the reconstruction loss from G4. In some embodiments, the second optimizer is configured to train parameters between the first and the second quantization points to minimize a second total loss which includes DA_E of the second quantization point from G1, and the reconstruction loss from G2. In some embodiments, the third optimizer is configured to train parameters after the second quantization point to minimize a third total loss which includes the reconstruction loss from G1. In some embodiments, DA_E represents a differentiable approximation of entropy.
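By way of example only, the composition of the three total losses may be written out as follows. The disclosure states which terms each total loss includes but not how they are weighted, so the sketch assumes a plain unweighted sum. The function name total_losses, the labels "Q1" and "Q2" for the two quantization points, and the numeric values are hypothetical.

```python
def total_losses(da_e, recon):
    """Combine per-graph loss terms into the three optimizer objectives.

    da_e[(graph, point)]: DA_E term of a quantization point in a graph.
    recon[graph]: reconstruction loss of a graph.
    Assumption: each total loss is an unweighted sum of its terms.
    """
    # First optimizer: parameters before the first quantization point.
    first_total = da_e[("G1", "Q1")] + da_e[("G3", "Q2")] + recon["G4"]
    # Second optimizer: parameters between the two quantization points.
    second_total = da_e[("G1", "Q2")] + recon["G2"]
    # Third optimizer: parameters after the second quantization point.
    third_total = recon["G1"]
    return first_total, second_total, third_total


# Placeholder loss values, chosen only to make the sums easy to check.
da_e = {("G1", "Q1"): 1.0, ("G1", "Q2"): 2.0, ("G3", "Q2"): 4.0}
recon = {"G1": 8.0, "G2": 16.0, "G4": 32.0}
losses = total_losses(da_e, recon)
```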

In 1606, parameters in the initial temporal module may be trained by iteratively running the first, the second and the third optimizers, wherein the fifth-eighth kernels may be trained into fifth-eighth intermediate kernels. In some embodiments, parameters in the fifth-eighth intermediate kernels are floating-point numbers. In some embodiments, the weighted-sum parameters may also be trained.

FIG. 17 illustrates an exemplary procedure of the first computation graph G1 according to some embodiments of the present disclosure.

In 1702, data T2 may be determined by inputting data T1 into a first convolution layer with parameters Para (bQ1). In some embodiments, data T1 corresponds to the sample patch-subtracted ICS-frames for all stacks, which are the input of the initial temporal module. In some embodiments, Para (bQ1) denotes the parameters in the fifth kernel, or the weighted-sum parameters together with the parameters in the fifth kernel.

In 1704, data T2_Q may be determined by quantizing data T2 at the first quantization point.

In 1706, data T3 may be determined based on data T2_Q. In some embodiments, the process from data T2_Q to T3 may include a first deconvolution layer and a second convolution layer, whose parameters to be trained are Para (aQ1, bQ2). In some embodiments, Para (aQ1, bQ2) denotes the parameters in the sixth kernel and the seventh kernel. The process may further include a rescaling operation before the first deconvolution layer and a subtracting operation after the second convolution layer.

In 1708, data T3_Q may be determined by quantizing data T3 at the second quantization point. In some embodiments, data T3_Q may correspond to quantized-compressed stack-residual frames.

In 1710, data T4 may be determined based on data T3_Q. In some embodiments, the process from data T3_Q to T4 may include a second deconvolution layer, whose parameters to be trained are Para (aQ2). In some embodiments, Para (aQ2) denotes the parameters in the eighth kernel. The process may further include a rescaling operation before the second deconvolution layer.

As described above, the processing in FIG. 17 may correspond to the content described in FIGS. 11-13.

FIG. 18 illustrates an exemplary procedure of the second computation graph G2 according to some embodiments of the present disclosure.

In 1802, data T2 may be determined by inputting data T1 into a first convolution layer with parameters Para (bQ1).

In 1804, data T2_Q may be determined by quantizing data T2 at the first quantization point.

In 1806, data T3 may be determined based on data T2_Q. In some embodiments, the process from data T2_Q to T3 may include the first deconvolution layer and the second convolution layer in sequence, whose parameters to be trained are Para (aQ1, bQ2). As in step 1706, the process further includes a rescaling operation before the first deconvolution layer and a subtracting operation after the second convolution layer.

In 1808, data T4 (2) may be determined based on data T3. In some embodiments, the process from data T3 to T4 (2) may include the second deconvolution layer, whose parameters to be trained are Para (aQ2). As data T3 has not been quantized, the rescaling operation described in 1710 is not necessary.

FIG. 19 illustrates an exemplary procedure of the third computation graph G3 according to some embodiments of the present disclosure.

In 1902, data T2 may be determined by inputting data T1 into the first convolution layer with parameters Para (bQ1).

In 1904, data T3 (3) may be determined based on data T2. In some embodiments, the process from data T2 to T3 (3) may include the first deconvolution layer and the second convolution layer in sequence, whose parameters to be trained are Para (aQ1, bQ2). In some embodiments, the process may further include a subtracting operation after the second convolution layer. As data T2 has not been quantized, the rescaling operation described in 1706 is not necessary.

In 1906, data T3_Q (3) may be determined by quantizing data T3 (3) at the second quantization point.

In 1908, data T4 (3) may be determined based on data T3_Q (3). In some embodiments, the process from data T3_Q (3) to T4 (3) may include the second deconvolution layer, whose parameters to be trained are Para (aQ2). In some embodiments, the process may further include a rescaling operation before the second deconvolution layer.

FIG. 20 illustrates an exemplary procedure of the fourth computation graph G4 according to some embodiments of the present disclosure.

In 2002, data T2 may be determined by inputting data T1 into the first convolution layer with parameters Para (bQ1).

In 2004, data T3 (4) may be determined based on data T2. In some embodiments, the process from data T2 to T3 (4) may include the first deconvolution layer and the second convolution layer in sequence, whose parameters to be trained are Para (aQ1, bQ2). In some embodiments, the process may further include a subtracting operation after the second convolution layer.

In 2006, data T4 (4) may be determined based on data T3 (4). In some embodiments, the process from data T3 (4) to T4 (4) may include the second deconvolution layer, whose parameters to be trained are Para (aQ2).
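Merely as an illustrative sketch, the procedures of FIGS. 17-20 may be viewed as one forward routine with two optional quantization points, where the rescaling operation is applied only after a kept quantization point. In the sketch below the convolution, deconvolution and subtracting operations are replaced by hypothetical stand-in callables (before_q1, between, after_q2), quantization is assumed to be round-half-up division by a scaling factor, and all names and values are assumptions rather than part of the disclosure.

```python
import math

# Which quantization points each computation graph keeps: (first, second).
GRAPHS = {
    "G1": (True, True),
    "G2": (True, False),
    "G3": (False, True),
    "G4": (False, False),
}


def quantize(values, scale):
    """Assumed quantizer: divide by scale and round half up to an integer."""
    return [math.floor(v / scale + 0.5) for v in values]


def rescale(codes, scale):
    """Assumed rescaling: multiply the integer codes by the same scale."""
    return [c * scale for c in codes]


def run_graph(t1, before_q1, between, after_q2, keep_q1, keep_q2,
              scale1=1.0, scale2=1.0):
    """Run T1 -> T2 -> T3 -> T4, optionally quantizing T2 and T3."""
    t2 = before_q1(t1)                      # stand-in for Para (bQ1) layer
    if keep_q1:
        t2 = rescale(quantize(t2, scale1), scale1)
    t3 = between(t2)                        # stand-in for Para (aQ1, bQ2) layers
    if keep_q2:
        t3 = rescale(quantize(t3, scale2), scale2)
    return after_q2(t3)                     # stand-in for Para (aQ2) layer


# Stand-in layers: simple element-wise maps instead of real kernels.
before_q1 = lambda xs: [0.25 * x for x in xs]
between = lambda xs: [x + 0.5 for x in xs]
after_q2 = lambda xs: [2.0 * x for x in xs]

outputs = {
    name: run_graph([1.0, 5.0], before_q1, between, after_q2, q1, q2)
    for name, (q1, q2) in GRAPHS.items()
}
```

With these stand-ins, G4 returns the unquantized result, while G1 to G3 show how each kept quantization point perturbs the output. This is the difference the reconstruction losses from the four graphs are meant to expose.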

As described above with reference to FIGS. 15-20, the fifth-eighth kernels may be trained into fifth-eighth intermediate kernels. In some embodiments, the fifth-eighth intermediate kernels may be further processed to determine the temporal module.

FIG. 22 illustrates an exemplary temporal module determining method according to some embodiments of the present disclosure.

In 2202, the first kernel may be determined by integerizing parameters in the fifth intermediate kernel, and the second kernel may be determined by integerizing parameters in the sixth intermediate kernel, and the third kernel may be determined by integerizing parameters in the seventh intermediate kernel.

In 2204, the fourth kernel may be determined by fine-tuning parameters in the eighth intermediate kernel.

In some embodiments, parameters in the first-third kernels may be integers and parameters in the fourth kernel may still be floating-point numbers.

In 2206, the decompressing kernel and the QINN may be determined by fine-tuning parameters in the intermediate decompressing kernel and the intermediate QINN.
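For illustration only, the integerizing of step 2202 may be sketched as a scale-and-round operation. The disclosure does not specify the integerizing rule, so the scaling by a fixed factor and the round-half-up behavior below are assumptions of this sketch, and the function name integerize is hypothetical.

```python
import math


def integerize(kernel, scale=1):
    """Turn floating-point kernel parameters into integers.

    Assumption: multiply by a fixed scale, then round half up.
    """
    return [[math.floor(w * scale + 0.5) for w in row] for row in kernel]


# A toy 2 x 2 floating-point intermediate kernel.
fifth_intermediate = [[0.26, -1.4], [2.5, 0.0]]
first_kernel = integerize(fifth_intermediate, scale=4)
```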

FIG. 23 is an exemplary diagram of a compressing apparatus according to some embodiments of the present disclosure. As shown in FIG. 23, the compressing apparatus 2300 may include a reading-out unit 2310, a compressing unit 2320 and a storage 2330. In some embodiments, the compressing apparatus 2300 may be configured to compress raw-Bayer data corresponding to a video clip from a focal plane (sensor array) of a camera.

The reading-out unit 2310 may be configured to read out a plurality of groups of raw pixel values in sequence. In some embodiments, each group of raw pixel values may correspond to a frame of a video clip.

The compressing unit 2320 may be configured to perform compression as described in FIG. 2 and FIGS. 7-12, wherein intra-frame compression and shared-data-based compression may be performed.

FIG. 24 is an exemplary diagram of a videoing system according to some embodiments of the present disclosure. The videoing system 2400 may include a compressing module 2410, a storage 2420 and a decompressing module 2430.

The compressing module 2410 may be configured to read out a plurality of groups of raw pixel values in sequence by a reading-out unit 2411, and may be configured to perform compression operations by a compressing unit 2412. It should be noted that the compressing module may be the same as the compressing apparatus described in FIG. 23.

The storage 2420 may be configured to store the quantized first ICS-frame, the entropy encoded-quantized-shared data corresponding to each ICS-channel, the entropy encoded quantized-compressed stack-residual frames corresponding to each ICS-channel and the entropy encoded-motion vectors in each stack shared among ICS-channels. In some embodiments, the stored data may be used to reconstruct multiple frames in the video clip.
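As a non-limiting sketch, the items held by the storage may be organized in a record such as the one below. The class name StoredClip, the field names, the use of byte strings for entropy-encoded data and the consistency check are all hypothetical; only the four kinds of stored items come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredClip:
    # Quantized first ICS-frame, indexed as [row][column][ICS-channel].
    quantized_first_ics_frame: List[List[List[int]]]
    # ICS-channel -> one entropy-encoded shared-data blob per stack.
    shared_data: Dict[int, List[bytes]]
    # ICS-channel -> per stack -> one encoded stack-residual blob per frame.
    stack_residuals: Dict[int, List[List[bytes]]]
    # One entropy-encoded motion-vector blob per stack, shared by all channels.
    motion_vectors: List[bytes]

    def num_stacks(self) -> int:
        return len(self.motion_vectors)

    def is_consistent(self) -> bool:
        """Check that every channel stores data for every stack."""
        n = self.num_stacks()
        return (
            self.shared_data.keys() == self.stack_residuals.keys()
            and all(len(v) == n for v in self.shared_data.values())
            and all(len(v) == n for v in self.stack_residuals.values())
        )


# Toy record: two ICS-channels, one stack of two frames, placeholder bytes.
clip = StoredClip(
    quantized_first_ics_frame=[[[3, 7]]],
    shared_data={0: [b"s0"], 1: [b"s1"]},
    stack_residuals={0: [[b"r00", b"r01"]], 1: [[b"r10", b"r11"]]},
    motion_vectors=[b"mv"],
)
```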

The decompressing module 2430 may be configured to reconstruct multiple frames of the video clip. The reconstruction operations may be performed as described in FIGS. 13-14.

The compression and decompression procedures have been described in the method part of the present disclosure and are not elaborated again here for the compressing apparatus and the videoing system.

Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of this disclosure.

Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment,” “an embodiment,” and/or “some embodiments” mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment” or “one embodiment” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.

Further, it will be appreciated by one skilled in the art that aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of software and hardware that may all generally be referred to herein as a “block,” “module,” “engine,” “unit,” “component,” or “system”. Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.

Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefor, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software-only solution, e.g., an installation on an existing processing device or mobile device.

Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, inventive embodiments lie in less than all features of a single foregoing disclosed embodiment.

Claims

A method for compressing a video clip, comprising:

reading out a plurality of groups of raw pixel values from a camera head, wherein each group of raw pixel values correspond to a frame of the video clip;

doing intra-frame compression by compressing each group of raw pixel values into an intra-frame-compressive-sampling (ICS) frame with a compressing kernel, wherein ICS frames include a first ICS-frame and a number of remaining ICS frames (R-ICS frames) after the first ICS-frame in time, and wherein the compressing kernel has Ncomp ICS-channels and Ncomp is an integer not smaller than 1;

quantizing the first ICS-frame, and quantizing R-ICS frames into QR-ICS frames, wherein the quantized first ICS-frame comprises Ncomp sub-quantized first ICS-frames and each sub-quantized first ICS-frame corresponds to an ICS-channel of the quantized first ICS-frame, wherein each QR-ICS frame comprises Ncomp sub-QR-ICS frames and each sub-QR-ICS frame corresponds to an ICS-channel of the QR-ICS frame;

for sub-QR-ICS frames corresponding to each ICS-channel, doing patch-matching subtraction in sub-QR-ICS frames relative to a sub-quantized first ICS-frame corresponding to the same ICS-channel, and generating patch-subtracted ICS-frames, wherein one or more motion vectors correspond to a sub-QR-ICS frame and a motion vector represents a relative location between a matched patch in one of the sub-QR-ICS frames corresponding to an ICS-channel and a reference patch in a sub-quantized first ICS-frame corresponding to the same ICS-channel, and wherein motion vectors corresponding to an ICS-channel are shared among other ICS-channels;

for patch-subtracted ICS-frames corresponding to each ICS-channel, grouping the patch-subtracted ICS-frames into stacks, wherein each stack includes a pre-defined number of patch-subtracted ICS-frames;

for patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, determining shared data, wherein the shared data represent similar data among the patch-subtracted ICS-frames, and determining stack-residual frames based on the shared data.
The method according to claim 1, wherein doing intra-frame compression by compressing each group of raw pixel values into a compressed frame with a compressing kernel, comprising:

for each frame of the video clip, compressing each portion of the group of raw pixel values into an integer with the compressing kernel, wherein each portion of the group of raw pixel values corresponds to a section of the frame.
The method according to claim 1, wherein for sub-QR-ICS frames corresponding to each ICS-channel, doing patch-matching subtraction in the QR-ICS frames relative to the quantized first ICS-frame, comprising:

for each sub-QR-ICS frame corresponding to an ICS-channel, doing patch-matching-based motion prediction for patches of the sub-QR-ICS frame relative to the sub-quantized first ICS-frame corresponding to the same ICS-channel, wherein the sub-QR-ICS-frame is divided into patches for patch-searching in the sub-quantized first ICS frames, there is no gap, nor overlap between the patches in the sub-QR-ICS-frame;

for each sub-QR-ICS frame, determining a patch-subtracted ICS-frame by subtracting one or more matched patches from the sub-QR-ICS frame.
The method according to claim 3, wherein for each sub-QR-ICS frame corresponding to an ICS-channel, doing patch-matching-based motion prediction for patches of the sub-QR-ICS frame relative to the sub-quantized first ICS-frame corresponding to the same ICS-channel, comprising:

defining a sub-QR-ICS frame corresponding to an ICS-channel as a matching frame and a sub-quantized first ICS-frame corresponding to the same ICS-channel as a searching frame, and doing hierarchical patch searching in the searching frame, wherein the matching frame is divided into relative patches and the hierarchical patch searching comprising:

for each relative patch in the matching frame, doing patch searching in the searching frame with a stride size, wherein the stride size is a defined integer not smaller than 1;

during the patch searching in the searching frame, calculating a square difference between each patch in the searching frame and the relative patch in the matching frame;

if the lowest square difference is smaller than a defined threshold, determining a target patch in the searching frame with lowest square difference as a reference patch and determining the relative patch as the matched patch;

if the lowest square difference is not smaller than the defined threshold and if patch size of the relative patch is larger than a defined minimal patch size, defining the relative patch as the matching frame and the target patch as the searching frame, and doing hierarchical patch searching in the searching frame;

doing the hierarchical patch searching repeatedly until a reference patch corresponding to the relative patch is found with a square difference smaller than the defined threshold, or patch size of the relative patch is not larger than the defined minimal patch size.
The method according to claim 1, wherein the numbers of patch-subtracted ICS-frames in each stack are equal to each other.
The method according to claim 1, wherein for patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, determining shared data, and for each patch-subtracted ICS-frame, determining a stack-residual frame based on the shared data, comprising:

for each stack corresponding to an ICS-channel, determining shared data by convoluting the patch-subtracted ICS-frames in each stack with a first kernel;

for each stack corresponding to the ICS-channel, determining quantized shared data for each stack by quantizing values in shared data to integers of pre-defined bit depth;

for each stack corresponding to the ICS-channel, rescaling the quantized shared data to RQ-shared data;

for each stack corresponding to the ICS-channel, reshaping the RQ-shared data to RRQ-shared data by performing deconvolution with a second kernel;

for each patch-subtracted ICS-frame in each stack corresponding to the ICS-channel, determining a stack-residual frame by subtracting RRQ-shared data from the patch-subtracted ICS-frame.
The method according to claim 1, wherein for patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, determining shared data, and determining stack-residual frames based on the shared data, comprising:

for each stack corresponding to an ICS-channel, compressing the patch-subtracted ICS-frames into a weighted sum frame by doing weighted summation of values in a same location of the patch-subtracted ICS-frames with weighted-sum parameters;

for each stack corresponding to the ICS-channel, determining shared data by convoluting the weighted sum frame with a first kernel;

for each stack corresponding to the ICS-channel, determining quantized shared data by quantizing values in shared data to integers of pre-defined bit depth;

for each stack corresponding to the ICS-channel, rescaling the quantized shared data to RQ-shared data;

for each stack corresponding to the ICS-channel, reshaping the RQ-shared data to RRQ-shared data by performing deconvolution with a second kernel;

for each patch-subtracted ICS-frame in each stack corresponding to the ICS-channel, determining a stack-residual frame by subtracting RRQ-shared data from the patch-subtracted ICS-frame.
The method according to claim 6 or 7, wherein the method further comprising:

for each stack corresponding to the ICS-channel, compressing each stack-residual frame with a third kernel;

for each stack corresponding to the ICS-channel, determining quantized-compressed stack-residual frames by quantizing values in each compressed stack-residual frame to integers of pre-defined bit depth;

doing entropy encoding to quantized-shared data corresponding to each ICS-channel, the quantized-compressed stack-residual frames corresponding to each ICS-channel, and motion vectors in each stack shared among ICS-channels respectively;

wherein the entropy encoded-quantized-shared data corresponding to each ICS-channel, the entropy encoded quantized-compressed stack-residual frames corresponding to each ICS-channel and entropy encoded-motion vectors in each stack shared among ICS-channels are stored for decoding;

wherein the entropy encoding operations are performed based on a global dictionary, and the global dictionary is pre-constructed based on large amount of data in the same type.
The method according to claim 8, wherein the method further comprising:

for each stack corresponding to the ICS-channel, doing entropy decoding to the entropy encoded-quantized-shared data, the entropy encoded-quantized-compressed stack-residual frames and corresponding entropy-encoded motion vectors;

for each stack corresponding to the ICS-channel, rescaling each quantized-compressed stack-residual frame to a RQ-compressed stack-residual frame;

for each stack corresponding to the ICS-channel, decompressing each RQ-compressed stack-residual frame into a first decompressed ICS-frame by performing deconvolution with a fourth kernel;

for each stack corresponding to the ICS-channel, reshaping the RQ-shared data to RRQ-shared data by performing deconvolution with the second kernel;

for each first decompressed ICS-frame in each stack corresponding to the ICS-channel, determining a second decompressed ICS-frame by adding RRQ-shared data and one or more corresponding matched patches with stored motion vectors to a first decompressed ICS-frame;

for each second decompressed ICS-frame in each stack, determining a third decompressed ICS-frame by stacking second decompressed ICS-frames corresponding to all ICS-channels together;

for each third decompressed ICS-frame in each stack, determining a reconstructed frame by performing intra-frame decompression to the third decompressed ICS-frame with a decompressing kernel and a neural network for quality improvements (QINN);

wherein the first-fourth kernel are shared among stacks corresponding to the ICS-channel.
The method according to claim 9, wherein a temporal module comprises the first-fourth kernels, or the temporal module comprises the first-fourth kernels and the weighted-sum parameters, and parameters in the compressing kernel, the temporal module, the decompressing kernel and the QINN are determined by sample-based training, the sample-based training procedure comprising:

reading out a plurality of groups of sample raw pixel values, wherein each group of sample raw pixel values correspond to a frame;

doing intra-frame compression by compressing each group of sample raw pixel values into a sample ICS frame with an initial compressing kernel, wherein the sample ICS frames including a sample first ICS-frame and a number of sample R-ICS frames after the sample first ICS-frame in time, and wherein the initial compressing kernel has Ncomp ICS-channels and Ncomp is an integer not smaller than 1;

quantizing the sample first ICS-frame, and quantizing sample R-ICS frames into sample QR-ICS frames, wherein sample quantized first ICS-frame comprises Ncomp sample sub-quantized first ICS-frames and each sample sub-quantized first ICS-frame corresponds to an ICS-channel of the sample quantized first ICS-frame, wherein each sample QR-ICS frame comprises Ncomp sample sub-QR-ICS frames and each sample sub-QR-ICS frame corresponds to an ICS-channel of the sample QR-ICS frame;

for sample sub-QR-ICS frames corresponding to each ICS-channel, doing patch-matching subtraction in the sample sub-QR-ICS frames relative to the sample sub-quantized first ICS-frame, and generating sample patch-subtracted ICS-frames, wherein one or more motion vectors correspond to a sample sub-QR-ICS frame and a motion vector represents a relative location between a matched patch in one of the sample sub-QR-ICS frames and a reference patch in the sample sub-quantized first ICS-frame, and wherein motion vectors corresponding to an ICS-channel are shared among other ICS-channels;

for sample patch-subtracted ICS-frames corresponding to each ICS-channel, grouping the sample patch-subtracted ICS-frames into stacks, wherein each stack includes a pre-defined number of sample patch-subtracted sample ICS-frames;

for patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, determining sample shared data and sample compressed stack-residual frames, sample first decompressed ICS-frames with an initial temporal module, wherein the initial temporal module comprises a fifth kernel, a sixth kernel, a seventh kernel and an eighth kernel, or the temporal module comprises the fifth-eighth kernel and initial weighted sum parameters;

for each stack, determining sample reconstructed frames with an initial decompressing kernel and an initial QINN;

training the initial compressing kernel into the compressing kernel; and training the initial decompressing kernel and the initial QINN into an intermediate decompressing kernel and an intermediate QINN;

training parameters of the initial temporal module via multi-graph-combined-loss training.
The method according to claim 10, wherein training parameters of the initial temporal module via multi-graph-combined-loss training, comprising:

determining four computation graphs, wherein the four graphs are procedures using the initial temporal module, wherein the first computation graph G1 represents a procedure keeping both first and second quantization points, the second computation graph G2 represents a procedure keeping first quantization point, the third computation graph G3 represents a procedure keeping second quantization point, and the fourth computation graph G4 represents a procedure keeping neither quantization point;

wherein the first quantization point represents quantizing output data of the fifth kernel, and the second quantization point represents quantizing output data of the seventh kernel;

determining three optimizers in sequential manner during iterative training;

wherein the first optimizer is configured to train parameters before the first quantization point to minimize a first total loss which includes DA_E of the first quantization point from G1, DA_E of the second quantization point from G3, and a reconstruction loss from G4;

wherein the second optimizer is configured to train parameters between the first and the second quantization points to minimize a second total loss which includes DA_E of the second quantization point from G1, and a reconstruction loss from G2;

wherein the third optimizer is configured to train parameters after the second quantization point to minimize a third total loss which includes a reconstruction loss from G1;

wherein DA_E represents a differentiable approximation of entropy;

training parameters in the initial temporal module by iteratively running the first, the second and the third optimizers, wherein the fifth to eighth kernels may be trained into fifth to eighth intermediate kernels, and wherein parameters in the fifth to eighth intermediate kernels are floating-point numbers.
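
The assignment of loss terms to optimizers recited above can be tabulated as in the following minimal Python sketch. The graph flags and the (loss term, graph) pairs are taken from the claim; the names `opt1` to `opt3`, `DA_E_q1`, `DA_E_q2`, `recon` and the helper functions are hypothetical labels introduced only for illustration. The sketch also checks one possible reading of the scheme, which the claim itself does not state: assuming DA_E is evaluated on the input of a quantizer, every loss term is taken from a graph in which no quantization point lies between the trained parameters and that term.

```python
# Which quantization points each computation graph keeps (from the claim).
GRAPHS = {
    "G1": {"q1": True, "q2": True},
    "G2": {"q1": True, "q2": False},
    "G3": {"q1": False, "q2": True},
    "G4": {"q1": False, "q2": False},
}

# (loss term, source graph) pairs per optimizer, as listed in the claim.
# The keys and term labels are hypothetical names for this sketch.
OPTIMIZER_TERMS = {
    "opt1": [("DA_E_q1", "G1"), ("DA_E_q2", "G3"), ("recon", "G4")],
    "opt2": [("DA_E_q2", "G1"), ("recon", "G2")],
    "opt3": [("recon", "G1")],
}

# Stage of the trained parameters:
# 0 = before point 1, 1 = between the points, 2 = after point 2.
PARAM_STAGE = {"opt1": 0, "opt2": 1, "opt3": 2}

# Assumption: DA_E is measured on the quantizer input, so DA_E of point 1
# sits at stage 0, DA_E of point 2 at stage 1, reconstruction at stage 2.
TERM_STAGE = {"DA_E_q1": 0, "DA_E_q2": 1, "recon": 2}

POINTS = ["q1", "q2"]


def crossed_points(optimizer, term):
    """Quantization points between the trained parameters and the term."""
    return POINTS[PARAM_STAGE[optimizer]:TERM_STAGE[term]]


def gradient_path_is_unquantized(optimizer):
    """True if every crossed point is disabled in the term's source graph."""
    return all(
        not GRAPHS[graph][point]
        for term, graph in OPTIMIZER_TERMS[optimizer]
        for point in crossed_points(optimizer, term)
    )


def total_loss(optimizer, losses):
    """Sum the terms of one optimizer; losses maps (term, graph) to a float."""
    return sum(losses[key] for key in OPTIMIZER_TERMS[optimizer])
```

Under these assumptions, `gradient_path_is_unquantized` holds for all three optimizers, which is consistent with using four graphs so that each group of parameters can be trained without differentiating through a rounding step.
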
The method according to claim 11, wherein the first graph G1 comprises:

determining data T2 by inputting data T1 into a first convolution layer with parameters Para (bQ1) , wherein data T1 corresponds to sample patch-subtracted ICS-frames for all stacks corresponding to each channel;

determining data T2_Q by quantizing data T2 at the first quantization point;

determining data T3 based on data T2_Q, wherein parameters to be trained in the process from data T2_Q to T3 include a first deconvolution layer and a second convolution layer with parameters Para (aQ1, bQ2) , wherein the process further includes a rescaling operation before the first deconvolution layer and a subtracting operation after the second convolution layer;

determining data T3_Q by quantizing data T3 at the second quantization point, wherein data T3_Q corresponds to quantized-compressed stack-residual frames;

determining data T4 based on data T3_Q, wherein parameters to be trained in the process from data T3_Q to T4 include a second deconvolution layer with parameters Para (aQ2) , wherein the process further includes a rescaling operation before the second deconvolution layer;

wherein Para (bQ1) denotes the parameters in the fifth kernel, or Para (bQ1) denotes the weighted sum parameters and the parameters in the fifth kernel;

wherein Para (aQ1, bQ2) denotes the parameters in the sixth kernel and the seventh kernel, and Para (aQ2) denotes the parameters in the eighth kernel.
The method according to claim 12, wherein the second graph G2 comprises:

determining data T2 by inputting data T1 into the first convolution layer with parameters Para (bQ1) ;

determining data T2_Q by quantizing data T2 at the first quantization point;

determining data T3 based on data T2_Q, wherein parameters to be trained in the process from data T2_Q to T3 include the first deconvolution layer and the second convolution layer with parameters Para (aQ1, bQ2) ;

determining data T4 (2) based on data T3, wherein parameters to be trained in the process from data T3 to T4 (2) include the second deconvolution layer with parameters Para (aQ2) .
The method according to claim 13, wherein the third graph G3 comprises:

determining data T2 by inputting data T1 into the first convolution layer with parameters Para (bQ1) ;

determining data T3 (3) based on data T2, wherein parameters to be trained in the process from data T2 to T3 (3) include the first deconvolution layer and the second convolution layer with parameters Para (aQ1, bQ2) , wherein the process further includes a subtracting operation after the second convolution layer;

determining data T3_Q (3) by quantizing data T3 (3) at the second quantization point;

determining data T4 (3) based on data T3_Q (3) , wherein parameters to be trained in the process from data T3_Q (3) to T4 (3) include the second deconvolution layer with parameters Para (aQ2) , wherein the process further includes a rescaling operation before the second deconvolution layer.
The method according to claim 14, wherein the fourth graph G4 comprises:

determining data T2 by inputting data T1 into the first convolution layer with parameters Para (bQ1) ;

determining data T3 (4) based on data T2, wherein parameters to be trained in the process from data T2 to T3 (4) include the first deconvolution layer and the second convolution layer with parameters Para (aQ1, bQ2) , wherein the process further includes a subtracting operation after the second convolution layer;

determining data T4 (4) based on data T3 (4) , wherein parameters to be trained in the process from data T3 (4) to T4 (4) include the second deconvolution layer with parameters Para (aQ2) .
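
The four graphs differ only in which quantization points are kept, and the data names recited above follow from that. The Python sketch below, offered as an illustration rather than as the claimed procedure, lists the operations of each graph and reproduces the data names used in the claims. The function name `graph_trace` and the operation strings are hypothetical. It assumes that a rescaling operation accompanies each kept quantization point and that the subtracting operation is present in every graph, although the claim for G2 does not recite either explicitly.

```python
def graph_trace(index, q1, q2):
    """Return ordered (operation, output data name) pairs for graph G<index>.

    q1 and q2 say whether the first and second quantization points are kept.
    A data name gets the suffix " (index)" once the path differs from G1,
    that is, once a quantization point that G1 keeps has been skipped.
    """
    diverged = False

    def name(base):
        return f"{base} ({index})" if diverged else base

    # First convolution layer: identical in all four graphs.
    trace = [("conv1 [Para(bQ1)]", "T2")]

    if q1:
        trace.append(("quantize at point 1", "T2_Q"))
    else:
        diverged = True

    # Assumption: rescaling is only needed after a kept quantization point.
    ops = ("rescale, " if q1 else "") + "deconv1, conv2 [Para(aQ1, bQ2)], subtract"
    trace.append((ops, name("T3")))

    if q2:
        trace.append(("quantize at point 2", name("T3_Q")))
    else:
        diverged = True

    ops = ("rescale, " if q2 else "") + "deconv2 [Para(aQ2)]"
    trace.append((ops, name("T4")))
    return trace


if __name__ == "__main__":
    flags = {1: (True, True), 2: (True, False), 3: (False, True), 4: (False, False)}
    for index, (q1, q2) in flags.items():
        print(f"G{index}:")
        for operation, output in graph_trace(index, q1, q2):
            print(f"  {operation} -> {output}")
```
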
The method according to claim 11, the method further comprising:

determining the first kernel by integerizing parameters in the fifth intermediate kernel, and determining the second kernel by integerizing parameters in the sixth intermediate kernel, and determining the third kernel by integerizing parameters in the seventh intermediate kernel;

determining the fourth kernel by fine-tuning parameters in the eighth intermediate kernel;

determining the decompressing kernel and the QINN by fine-tuning parameters in the intermediate decompressing kernel and the intermediate QINN.
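
The claim does not specify how the floating-point parameters are integerized. One common approach, shown here purely as an assumed example, is symmetric scaling followed by rounding. The function name `integerize`, the `bits` parameter and the choice of a single scale per kernel are hypothetical and not taken from the source.

```python
def integerize(kernel, bits=8):
    """Map a 2-D list of floats to signed integers of the given bit width.

    Assumed scheme: one scale per kernel, chosen so that the largest
    magnitude maps to the largest representable positive integer.
    Returns (integer kernel, scale) with kernel ~= integer kernel / scale.
    """
    max_abs = max(abs(w) for row in kernel for w in row)
    if max_abs == 0.0:
        # An all-zero kernel needs no scaling.
        return [[0 for _ in row] for row in kernel], 1.0
    scale = (2 ** (bits - 1) - 1) / max_abs
    ints = [[round(w * scale) for w in row] for row in kernel]
    return ints, scale


if __name__ == "__main__":
    float_kernel = [[0.5, -1.0], [0.25, 0.0]]
    int_kernel, scale = integerize(float_kernel)
    print(int_kernel, scale)
```

Because rounding changes the integerized kernels slightly, fine-tuning the remaining floating-point parts afterwards, as the claim recites for the eighth intermediate kernel, the intermediate decompressing kernel and the intermediate QINN, gives them a chance to compensate for the rounding error.
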
An apparatus for compressing a video clip, comprising:

a reading-out unit, wherein the reading-out unit is configured to read out a plurality of groups of raw pixel values from a camera head;

a processor, wherein the processor is configured to perform compression on multiple frames of the video clip, wherein the compression comprises:

doing intra-frame compression by compressing each group of raw pixel values into an intra-frame-compressive-sampling (ICS) frame with a compressing kernel, wherein ICS frames include a first ICS-frame and a number of remaining ICS frames (R-ICS frames) after the first ICS-frame in time, and wherein the compressing kernel has Ncomp ICS-channels and Ncomp is an integer not smaller than 1;

quantizing the first ICS-frame, and quantizing R-ICS frames into QR-ICS frames, wherein the quantized first ICS-frame comprises Ncomp sub-quantized first ICS-frames and each sub-quantized first ICS-frame corresponds to an ICS-channel of the quantized first ICS-frame, wherein each QR-ICS frame comprises Ncomp sub-QR-ICS frames and each sub-QR-ICS frame corresponds to an ICS-channel of the QR-ICS frame;

for sub-QR-ICS frames corresponding to each ICS-channel, doing patch-matching subtraction in sub-QR-ICS frames relative to a sub-quantized first ICS-frame corresponding to the same ICS-channel, and generating patch-subtracted ICS-frames, wherein one or more motion vectors correspond to a sub-QR-ICS frame and a motion vector represents a relative location between a matched patch in one of the sub-QR-ICS frames and a reference patch in the sub-quantized first ICS-frame, and wherein motion vectors corresponding to an ICS-channel are shared among other ICS-channels;

for patch-subtracted ICS-frames corresponding to each ICS-channel, grouping the patch-subtracted ICS-frames into stacks, wherein each stack includes a pre-defined number of patch-subtracted ICS-frames;

for patch-subtracted ICS-frames in each stack corresponding to each ICS-channel, determining shared data, wherein the shared data represent similar data among the patch-subtracted ICS-frames, and determining stack-residual frames based on the shared data;

wherein each group of raw pixel values corresponds to a frame of the video clip.
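
The patch-matching subtraction step recited above can be illustrated with a minimal Python sketch. It assumes an exhaustive search over a small window and a sum-of-absolute-differences cost, neither of which is specified in the claim. The function names, the `size` and `search` parameters and the representation of a frame as a 2-D list of integers are hypothetical.

```python
def get_patch(frame, top, left, size):
    """Cut a size x size patch out of a 2-D list."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def find_motion_vector(frame, ref, top, left, size, search):
    """Find the offset (dy, dx) into ref that best matches a patch of frame.

    frame: one sub-QR-ICS frame; ref: the sub-quantized first ICS-frame of
    the same ICS-channel. Offsets leaving the reference frame are skipped.
    """
    target = get_patch(frame, top, left, size)
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue
            cost = sad(target, get_patch(ref, r, c, size))
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def subtract_patch(frame, ref, top, left, size, motion_vector):
    """Subtract the reference patch at the given offset from the frame patch."""
    dy, dx = motion_vector
    target = get_patch(frame, top, left, size)
    matched = get_patch(ref, top + dy, left + dx, size)
    return [[x - y for x, y in zip(rt, rm)] for rt, rm in zip(target, matched)]


if __name__ == "__main__":
    ref = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    # Same content shifted one pixel to the right.
    frame = [[0] + row[:-1] for row in ref]
    mv = find_motion_vector(frame, ref, top=1, left=1, size=2, search=1)
    print(mv, subtract_patch(frame, ref, 1, 1, 2, mv))
```

Since the claim states that motion vectors corresponding to one ICS-channel are shared among the other ICS-channels, `find_motion_vector` would be run on a single channel in this sketch, and `subtract_patch` would then be called with the same vector on the remaining channels.
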