WO2021097771A1 - ICS-frame transformation method and apparatus for CV analysis


Info

Publication number
WO2021097771A1
Authority
WO
WIPO (PCT)
Prior art keywords
ics, frame, sample, frames, training
Application number
PCT/CN2019/120031
Other languages
French (fr)
Inventor
David Jones Brady
Xuefei YAN
Yulin JIANG
Original Assignee
Suzhou Aqueti Technology Co., Ltd.
Application filed by Suzhou Aqueti Technology Co., Ltd. filed Critical Suzhou Aqueti Technology Co., Ltd.
Priority to CN201980066175.3A priority Critical patent/CN113170160B/en
Priority to PCT/CN2019/120031 priority patent/WO2021097771A1/en
Publication of WO2021097771A1 publication Critical patent/WO2021097771A1/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/90: coding/decoding digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/42: implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/182: adaptive coding in which the coding unit is a pixel
    • H04N19/186: adaptive coding in which the coding unit is a colour or a chrominance component
    • H04N19/44: decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/59: predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/593: predictive coding involving spatial prediction techniques
    • H04N23/843: camera processing pipelines; demosaicing, e.g. interpolating colour pixel values
    • H04N23/88: camera processing pipelines; processing colour signals for colour balance, e.g. white-balance circuits or colour temperature control

Definitions

  • the present invention relates to CV analysis, and particularly to a data processing method applied during compression for CV analysis.
  • the general function of a camera is to transform optical data into compressed serial electronic formats for transmission or information storage.
  • the optical data may correspond to one or more raw-bayer frames.
  • A raw-bayer frame generally has high resolution, which may decrease the transmission speed, so a compression method before transmission is necessary; a corresponding decompression operation is therefore also needed.
  • a conventional camera consists of a focal plane and a “system on chip” image processing platform.
  • the chip may implement demosaicing, white-balance tuning, color-mixing tuning, gamma correction, compression and decompression in sequence, and a frame or image can only be viewed or analyzed after reconstruction (decompression).
  • NNs are used for computer vision (CV) analysis.
  • CV analysis may be object detection/classification, face recognition, etc., and frames (mostly in the usual RGB format) are input into the NNs either for NN applications (object detection/classification, face recognition, etc.) or for NN training.
  • Operating power and computation are key limitations on camera pixel capacity, and the operating power and computation consumption involve demosaicing, white-balance tuning, color-mixing tuning, gamma correction, compression, decompression and down-sampling. A method is needed to reduce the operating power and computation without requiring a large amount of new labeled training data to retrain the CV-analysis NNs existing on the market.
  • the ICS-frame transformation method may include one or more of the following operations.
  • One or more ICS-frames of dimension [NX/kx, NY/ky, Ncomp] may be read out.
  • One or more transformed ICS-frames of dimension [NX/kx, NY/ky, 3] may be determined by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames.
  • the one or more transformed ICS-frames may be output to a neural network for CV analysis.
  • the one or more ICS-frames may be determined by performing intra-frame compression to one or more raw-bayer frames with a compressing kernel, wherein the one or more raw-bayer frames may be captured by one or more camera heads.
  • each raw-bayer frame may be of dimension [NX, NY]
  • the compressing kernel may be of dimension [kx, ky, Ncomp]
  • NX, NY, kx, ky, NX/kx, NY/ky and Ncomp may be positive integers
  • Ncomp represents the number of ICS-channels of the compressing kernel.
  • a corresponding transformed ICS-frame may be determined by summing pixel values at the same XY-plane position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2.
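The channel-weighted summation described above can be sketched in NumPy; the function name and random test data below are illustrative assumptions, not taken from the disclosure:

```python
import numpy as np

def transform_ics_frame(ics_frame: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Linearly transform an ICS-frame [NX/kx, NY/ky, Ncomp] into a
    3-channel frame [NX/kx, NY/ky, 3]: for each output channel j,
    pixel values at the same XY-position across the Ncomp ICS-channels
    are summed with the weighting factors in weights[:, j]."""
    assert ics_frame.shape[2] == weights.shape[0]  # Ncomp must match
    # out[x, y, j] = sum over c of ics[x, y, c] * weights[c, j]
    return np.einsum('xyc,cj->xyj', ics_frame, weights)

rng = np.random.default_rng(0)
ics = rng.random((256, 480, 4))   # an ICS-frame with Ncomp = 4
w = rng.random((4, 3))            # the 2D array of dimension [Ncomp, 3]
out = transform_ics_frame(ics, w)
assert out.shape == (256, 480, 3)
```

The transformed output can then be fed to an existing RGB-input CV network without retraining it.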
  • the parameters in the 2D array may be determined based on sample training, and the sample training comprises: reading out one or more first sample raw-bayer frames, wherein each first sample raw-bayer frame is of dimension [NX, NY]; determining one or more demosaiced first sample raw-bayer frames by performing demosaicing to each first sample raw-bayer frame, wherein each demosaiced first sample raw-bayer frame is of dimension [NX, NY, 3]; determining one or more trans-training-label frames by performing down-sampling to each demosaiced first sample raw-bayer frame, wherein each trans-training-label frame is of dimension [NX/kx, NY/ky, 3]; determining one or more first sample ICS-frames by performing intra-frame compression to each first sample raw-bayer frame with the compressing kernel; determining one or more first sample transformed ICS-frames of dimension [NX/kx, NY/ky, 3] by using initial parameters in a 2D array of dimension [Ncomp, 3] to linearly transform each first sample ICS-frame; and determining the parameters in the 2D array by tuning the initial parameters based on machine learning to minimize the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames.
  • each first sample transformed ICS-frame corresponds to a trans-training-label frame
  • training loss of a first sample transformed ICS-frame is mean square difference between the first sample transformed ICS-frame and its corresponding trans-training-label frame
  • the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames is the sum of the individual training loss values.
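Because the transform is linear in the [Ncomp, 3] parameters, minimizing the summed mean-square training loss admits a closed-form least-squares solution; the sketch below is an illustrative assumption (the disclosure describes training generically, and gradient-based tuning would converge to the same minimum), with toy random frames standing in for real samples:

```python
import numpy as np

rng = np.random.default_rng(1)
Ncomp = 4
# First sample ICS-frames and their trans-training-label frames (toy data)
ics_frames = [rng.random((64, 64, Ncomp)) for _ in range(3)]
label_frames = [rng.random((64, 64, 3)) for _ in range(3)]

# Stack every pixel as one row: X is [Npixels, Ncomp], Y is [Npixels, 3]
X = np.concatenate([f.reshape(-1, Ncomp) for f in ics_frames])
Y = np.concatenate([f.reshape(-1, 3) for f in label_frames])

# Minimize the total training loss ||X @ W - Y||^2 over the [Ncomp, 3] array
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert W.shape == (Ncomp, 3)
```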
  • the intra-frame compression comprises: for each raw-bayer frame, compressing each group of pixel values in the raw-bayer frame into an integer with the compressing kernel, wherein the pixels in each raw-bayer frame are divided into multiple groups and each group of pixels corresponds to a 2D or a 1D raw-pixel array of the raw-bayer frame.
  • the compressing kernel may be determined based on sample training, and the sample training comprises: reading out one or more second sample raw-bayer frames, wherein each second sample raw-bayer frame is of dimension [NX, NY]; for each second sample raw-bayer frame, determining a corresponding ICS-training-label frame by performing linear transformations and combinations to the R, B, Gb and Gr pixels in the second sample raw-bayer frame, wherein each ICS-training-label frame is of dimension [NX′, NY′, Nlabel]; determining one or more second sample ICS-frames of dimension [NX/kx, NY/ky, Ncomp] by performing intra-frame compression to each second sample raw-bayer frame with an initial compressing kernel, wherein the initial compressing kernel is of dimension [kx, ky, Ncomp]; determining one or more second sample decompressed ICS-frames of dimension [NX′, NY′, Nlabel] by performing decompression to each second sample ICS-frame with an initial decompressing kernel; and determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames.
  • the process of determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames may comprise: determining a floating-number compressing kernel by tuning parameters in the initial compressing kernel based on machine learning to minimize the total quality loss between the one or more second sample decompressed ICS-frames and the corresponding one or more ICS-training-label frames; and determining the compressing kernel by integerizing the parameters in the floating-number compressing kernel.
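The integerizing step might look like the following sketch, assuming a simple scale-and-round scheme into signed 8-bit range; the scale factor and bit depth are hypothetical choices, not specified by the disclosure:

```python
import numpy as np

rng = np.random.default_rng(2)
float_kernel = rng.standard_normal((8, 8, 4))  # trained [kx, ky, Ncomp] kernel
scale = 127.0 / np.abs(float_kernel).max()     # map into signed 8-bit range
int_kernel = np.round(float_kernel * scale).astype(np.int8)

assert int_kernel.dtype == np.int8
assert np.abs(int_kernel.astype(int)).max() <= 127
```

A fixed integer kernel like this is what makes the compression cheap to implement on hardware such as an FPGA.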
  • each second sample decompressed ICS-frame corresponds to an ICS-training-label frame
  • quality loss of a second sample decompressed ICS-frame is mean square difference between the second sample decompressed ICS-frame and its corresponding ICS-training-label frame
  • the total quality loss between the one or more second sample decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
  • the method further comprises: determining one or more second sample intermediate ICS-frames by performing intra-frame compression to each second sample raw-bayer frame with the compressing kernel; determining one or more second sample intermediate decompressed ICS-frames by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel; determining the decompressing kernel by tuning parameters in the initial decompressing kernel based on machine learning to minimize total quality loss between the one or more second sample intermediate decompressed frames and the corresponding one or more ICS-training-label frames.
  • each second sample intermediate decompressed ICS-frame corresponds to an ICS-training-label frame
  • quality loss of a second sample intermediate decompressed ICS-frame is mean square difference between the second sample intermediate decompressed ICS-frame and its corresponding ICS-training-label frame
  • the total quality loss between the one or more second sample intermediate decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
  • the method further comprises: determining one or more second sample intermediate ICS-frames by performing intra-frame compression to each second sample raw-bayer frame with the compressing kernel; determining one or more second sample intermediate decompressed ICS-frames by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel; determining one or more second sample reconstructed frames by inputting each second sample intermediate decompressed ICS-frame into an initial QINN; determining the decompressing kernel and a QINN by tuning parameters in the initial decompressing kernel and the initial QINN based on machine learning to minimize total quality loss between the one or more second sample reconstructed frames and the one or more ICS-training-label frames.
  • each second sample reconstructed frame corresponds to an ICS-training-label frame
  • quality loss of a second sample reconstructed ICS-frame is mean square difference between the second sample reconstructed ICS-frame and its corresponding ICS-training-label frame
  • the total quality loss between the one or more second sample reconstructed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
  • an ICS-frame transformation apparatus for CV analysis including a reading-out module, a processor and an output port.
  • the reading-out module may be configured to read out one or more raw-bayer frames, and each raw-bayer frame is of dimension [NX, NY].
  • the processor may be configured to determine one or more transformed ICS-frames by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames.
  • the output port may be configured to output the one or more transformed ICS-frames to a neural network for CV analysis.
  • the one or more ICS-frames may be determined by performing intra-frame compression to one or more raw-bayer frames with a compressing kernel, wherein the one or more raw-bayer frames may be captured by one or more camera heads.
  • each raw-bayer frame may be of dimension [NX, NY]
  • the compressing kernel may be of dimension [kx, ky, Ncomp]
  • NX, NY, kx, ky, NX/kx, NY/ky and Ncomp may be positive integers
  • Ncomp represents the number of ICS-channels of the compressing kernel.
  • the processor determines a corresponding transformed ICS-frame by summing pixel values at the same XY-plane position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2.
  • the parameters in the 2D array may be determined based on sample training, and the sample training comprises: reading out one or more first sample raw-bayer frames, wherein each first sample raw-bayer frame is of dimension [NX, NY]; determining one or more demosaiced first sample raw-bayer frames by performing demosaicing to each first sample raw-bayer frame, wherein each demosaiced first sample raw-bayer frame is of dimension [NX, NY, 3]; determining one or more trans-training-label frames by performing down-sampling to each demosaiced first sample raw-bayer frame, wherein each trans-training-label frame is of dimension [NX/kx, NY/ky, 3]; determining one or more first sample ICS-frames by performing intra-frame compression to each first sample raw-bayer frame with the compressing kernel; determining one or more first sample transformed ICS-frames of dimension [NX/kx, NY/ky, 3] by using initial parameters in a 2D array of dimension [Ncomp, 3] to linearly transform each first sample ICS-frame; and determining the parameters in the 2D array by tuning the initial parameters based on machine learning to minimize the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames.
  • each first sample transformed ICS-frame corresponds to a trans-training-label frame
  • training loss of a first sample transformed ICS-frame is mean square difference between the first sample transformed ICS-frame and its corresponding trans-training-label frame
  • the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames is the sum of the individual training loss values.
  • the intra-frame compression comprises: for each raw-bayer frame, compressing each group of pixel values in the raw-bayer frame into an integer with the compressing kernel, wherein the pixels in each raw-bayer frame are divided into multiple groups and each group of pixels corresponds to a 2D or a 1D raw-pixel array of the raw-bayer frame.
  • the compressing kernel may be determined based on sample training, and the sample training comprises: reading out one or more second sample raw-bayer frames, wherein each second sample raw-bayer frame is of dimension [NX, NY]; for each second sample raw-bayer frame, determining a corresponding ICS-training-label frame by performing linear transformations and combinations to the R, B, Gb and Gr pixels in the second sample raw-bayer frame, wherein each ICS-training-label frame is of dimension [NX′, NY′, Nlabel]; determining one or more second sample ICS-frames of dimension [NX/kx, NY/ky, Ncomp] by performing intra-frame compression to each second sample raw-bayer frame with an initial compressing kernel, wherein the initial compressing kernel is of dimension [kx, ky, Ncomp]; determining one or more second sample decompressed ICS-frames of dimension [NX′, NY′, Nlabel] by performing decompression to each second sample ICS-frame with an initial decompressing kernel; and determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames.
  • the process of determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames may comprise: determining a floating-number compressing kernel by tuning parameters in the initial compressing kernel based on machine learning to minimize the total quality loss between the one or more second sample decompressed ICS-frames and the corresponding one or more ICS-training-label frames; and determining the compressing kernel by integerizing the parameters in the floating-number compressing kernel.
  • each second sample decompressed ICS-frame corresponds to an ICS-training-label frame
  • quality loss of a second sample decompressed ICS-frame is mean square difference between the second sample decompressed ICS-frame and its corresponding ICS-training-label frame
  • the total quality loss between the one or more second sample decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
  • FIG. 1 shows an example of original raw-bayer picture according to some embodiments of the present disclosure
  • FIG. 2 illustrates an intra-frame compressing and decompressing method according to some embodiments of the present disclosure
  • FIG. 3 shows a convolution process as described in 204 according to some embodiments of the present disclosure
  • FIG. 4 illustrates an example of intra-frame compression process with frame strategy as described in 204 according to some embodiments of the present disclosure
  • FIG. 5 is an example of a compressing kernel during a single-layer convolutional 2D compression process of raw-bayer data according to some embodiments of the present disclosure
  • FIG. 6 is an integer array of the shape [256, 480, 4] after compression of the input pixel values with the compressing kernel;
  • FIG. 7 illustrates an ICS-frame transformation method for CV analysis according to some embodiments of the present disclosure
  • FIG. 8 shows an exemplary transformed ICS-frame according to some embodiments of the present disclosure
  • FIG. 9 shows a CV analysis result of stacking RGB format of the transformed ICS-frame in FIG. 8 according to some embodiments of the present disclosure
  • FIG. 10 shows an exemplary training method of parameters in the 2D array according to some embodiments of the present disclosure
  • FIG. 11 shows two widely-used formats of block-matrices according to some embodiments of the present disclosure
  • FIG. 12 illustrates an exemplary pre-training method of the compressing kernel according to some embodiments of the present disclosure
  • FIG. 13 illustrates an exemplary training method of the compressing kernel according to some embodiments of the present disclosure
  • FIG. 14 is an exemplary training method for a decompressing kernel according to some embodiments of the present disclosure.
  • FIG. 15 shows another training method of the decompressing kernel according to some embodiments of the present disclosure.
  • FIG. 16 illustrates an ICS-frame transformation apparatus for CV analysis.
  • The terms “system, ” “engine, ” “unit, ” “module, ” and/or “block” used herein are one way to distinguish different components, elements, parts, sections or assemblies at different levels in ascending order. However, these terms may be replaced by other expressions that achieve the same purpose.
  • The terms “module, ” “unit, ” or “block, ” as used herein, refer to logic embodied in hardware or firmware, or to a collection of software instructions.
  • a module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device.
  • a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts.
  • Software modules/units/blocks configured for execution on computing devices may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution) .
  • Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device.
  • Software instructions may be embedded in a firmware, such as an EPROM.
  • modules/units/blocks may be included in connected logic components, such as gates and flip-flops, and/or may be implemented with programmable units, such as programmable gate arrays or processors.
  • the modules/units/blocks or computing device functionality described herein may be implemented as software modules/units/blocks, but may be represented in hardware or firmware.
  • the modules/units/blocks described herein refer to logical modules/units/blocks that may be combined with other modules/units/blocks or divided into sub-modules/sub-units/sub-blocks despite their physical organization or storage. The description may be applicable to a system, an engine, or a portion thereof.
  • the present disclosure provided herein relates to a CV analysis method and apparatus. Detailed descriptions will be illustrated in the following embodiments.
  • the CV analysis method may be applied together with a new compression/decompression method which will be described in FIGs. 1-6.
  • the new compression/decompression method differs from the current ones described in the background section, in that demosaicing and the other operations in the new method may be implemented after compression and decompression.
  • FIG. 1 shows an example of raw-bayer frame corresponding to the raw-bayer data according to some embodiments of the present disclosure.
  • a raw-bayer frame is of shape [2048, 3840], and each pixel may have a corresponding pixel value (raw pixel value).
  • the raw pixel values may be read out in sequence by the chip after capturing the frame.
  • the read-out data from a focal plane may be in a raster format, meaning rows are read out in sequence.
  • the input pixel values (raw pixel values) are of shape [2048, 3840].
  • the [2048, 3840] raw-bayer frame can be compressed to an integer array of shape [256, 480, 4], which will be described in FIG. 6. Pixels also correspond to different colors, typically red, green and blue; the color values are typically mosaicked across the sensor so that a given pixel corresponds to a given known color.
  • Intra-frame compression may be performed on the raw-bayer data streams.
  • FIG. 2 illustrates an intra-frame compression/decompression method according to some embodiments of the present disclosure.
  • a plurality of groups of raw pixel values may be read out in sequence from a camera head.
  • the raw pixel values may be read out in sequence as raw-bayer data.
  • a camera head may capture a raw-bayer frame comprising a plurality of groups of raw pixel values, wherein each group of raw pixel values corresponds to a 2D or a 1D raw-pixel array of the raw-bayer frame.
  • intra-frame compression may be performed by compressing each group of raw pixel values into an integer with a compressing kernel, and as a result, the raw-bayer frame may be compressed into an ICS-frame.
  • the compressing kernel may have Ncomp ICS-channels and Ncomp may be an integer not smaller than 1. The compressing may be described in the frame strategies.
  • elements in the compressing kernel may be integers for easy application on hardware such as FPGA.
  • the elements in the compressing kernel may be binary numbers, and the bit depths of the elements may be 12-bit, 10-bit, 8-bit, 6-bit, 4-bit or 2-bit. Further, when the elements are 2-bit binaries, the elements may be -1 or +1, or the elements may be 0 or 1.
  • a group of raw pixel values may correspond to pixels in a 2D patch of the raw-bayer frame
  • the compressing kernel may be a 2D kernel, wherein the raw-bayer frame may be divided into multiple 2D patches.
  • a 2D patch and the 2D kernel may have a same dimension.
  • the 2D kernel may have a dimension [kx, ky]
  • Pixel values corresponding to the pixels in a certain patch of shape [kx, ky, 1] may be multiplied by a 2D kernel of shape [kx, ky, Ncomp], and the pixel values in the 2D patch may be compressed into Ncomp numbers.
  • Ncomp is a preset integer defined manually
  • the input raw pixel values of the frame may be compressed into COMP, an integer array of dimension [Nx, Ny, Ncomp], where Ncomp represents the number of ICS-channels of COMP.
  • the intra-frame compression process may be a 2D convolution operation described by equation (1) below:
  • Output(k) = Σ_{i,j} (pixel_{i,j} × weight_{i,j,k})    (1)
  • where the index k runs from 0 to Ncomp-1.
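Equation (1), applied independently to each non-overlapping [kx, ky] patch of the frame, can be sketched as follows; the function and variable names, and the toy frame and all-ones kernel, are illustrative assumptions:

```python
import numpy as np

def intra_frame_compress(frame: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply equation (1) to each non-overlapping [kx, ky] patch:
    Output(k) = sum over i, j of pixel[i, j] * weight[i, j, k]."""
    kx, ky, ncomp = kernel.shape
    nx, ny = frame.shape
    # Tile the frame into non-overlapping [kx, ky] patches
    patches = frame.reshape(nx // kx, kx, ny // ky, ky).transpose(0, 2, 1, 3)
    # One Ncomp-vector of weighted sums per patch
    return np.einsum('XYij,ijk->XYk', patches, kernel)

frame = np.arange(64, dtype=float).reshape(8, 8)  # toy 8x8 raw-bayer frame
kernel = np.ones((4, 4, 2))                       # [kx, ky, Ncomp] = [4, 4, 2]
comp = intra_frame_compress(frame, kernel)
assert comp.shape == (2, 2, 2)                    # [NX/kx, NY/ky, Ncomp]
```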
  • the compression ratio, without considering the difference between the bit depth of the input pixel values (raw pixel values are usually 8-bit or 10-bit) and the bit depth of the array of numbers after compression (8-bit), can be expressed as Ncomp/(kx*ky).
  • Different compression ratios can be achieved by using various settings of [kx, ky, Ncomp].
  • compression ratios of 1/16, 1/16, 1/32 and 1/256 can be achieved with the 2D kernels [16, 16, 16], [8, 8, 4], [16, 16, 8] and [16, 16, 1], respectively.
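The listed ratios follow directly from Ncomp/(kx*ky); a quick check:

```python
# Each ratio equals Ncomp / (kx * ky) for the corresponding [kx, ky, Ncomp] kernel
kernels = [(16, 16, 16), (8, 8, 4), (16, 16, 8), (16, 16, 1)]
ratios = [ncomp / (kx * ky) for kx, ky, ncomp in kernels]
assert ratios == [1 / 16, 1 / 16, 1 / 32, 1 / 256]
```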
  • a group of raw pixel values may correspond to pixels in a 1D segment of the raw-bayer frame, and the compressing kernel may be a 1D kernel, wherein the raw-bayer frame may be divided into multiple 1D segments.
  • a 1D segment and the 1D kernel may have a same dimension.
  • each element in the compressing kernel may be -1 or +1; or each element in the compressing kernel may be 0 or 1.
  • 16 incoming pixel values (a 1D raw pixels array) may be combined into one number using a compressing kernel [0, 1, 0, 0, 1, 0, ... 1] of length 16.
  • likewise, 16 incoming pixel values may be combined into one number using a compressing kernel [-1, 1, -1, -1, 1, -1, ... 1] of length 16.
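The two binary-kernel variants above can be illustrated as a dot product; the specific alternating kernel pattern here is a hypothetical example, not one from the disclosure:

```python
import numpy as np

pixels = np.arange(16)                       # 16 incoming raw pixel values
kernel_01 = np.array([0, 1] * 8)             # hypothetical 0/1 kernel of length 16
kernel_pm = np.where(kernel_01 == 0, -1, 1)  # the corresponding -1/+1 kernel

out_01 = int(pixels @ kernel_01)  # sum of the selected (odd-index) pixels
out_pm = int(pixels @ kernel_pm)  # signed sum of all 16 pixels
assert out_01 == 64               # 1 + 3 + ... + 15
assert out_pm == 8                # (1 + 3 + ... + 15) - (0 + 2 + ... + 14)
```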
  • the sequence may be divided row by row.
  • Various 1D compressing kernels have been developed, including [128, 1, 4] and [32, 1, 4], and combinations of different convolutional-1D kernels for different rows in the raw-bayer data may be used to control the total compression ratio of a picture/frame.
  • This way of dividing the sequence of pixels requires a smaller buffer than the 2D patch division, because pixel values from different rows/segments do not need to be buffered; the incoming pixel values can be processed segment by segment.
  • each group of raw pixel values may be compressed into an integer, and the raw-bayer frame may be compressed into a plurality of integers.
  • the plurality of integers may be stored or buffered during the compression/decompression process.
  • decompression may be performed to determine a decompressed ICS-frame with a decompressing kernel.
  • the decompression may be performing deconvolution to the plurality of integers of the ICS-frame.
  • quantization and entropy encoding may be performed after compression, which means that entropy decoding, rescaling and integerizing (rescaling and integerizing correspond to the quantization operation) may be performed before decompression.
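The quantization round trip mentioned above (rescaling and integerizing after compression, and the inverse rescaling before decompression) can be sketched as follows; the step size is a hypothetical choice and entropy coding is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
comp = rng.standard_normal((256, 480, 4)) * 1000  # compressed ICS values
step = 16.0                                       # hypothetical quantization step

# Quantization after compression: rescale, then round to integers
quantized = np.round(comp / step).astype(np.int32)
# Before decompression: entropy-decode (omitted), then rescale back
restored = quantized.astype(np.float64) * step

# Round-to-nearest bounds the quantization error by half a step
assert np.abs(restored - comp).max() <= step / 2 + 1e-9
```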
  • intra-frame compression process is provided only for illustration purpose, and not intended to limit the scope of the present disclosure.
  • multiple variations and modifications may be made under the teachings of the present disclosure.
  • those variations and modifications do not depart from the scope of the present disclosure.
  • compressing kernel with other bit depth may also be applied to compress the raw pixel values.
  • FIG. 3 shows a convolution process as described in 204 according to some embodiments of the present disclosure.
  • an array of pixel values with dimension of 4*4 may be compressed into one integer with a compressing kernel.
  • the compressing kernel may also have a dimension of 4*4.
  • although the intra-frame compression process is already simple, it may provide a way to further reduce the necessary buffer size, as illustrated in 204.
  • when applying a convolutional-2D kernel with dimension [4, 4, 1] to a patch of pixels of shape [4, 4] , one does not need to put all the pixels (16 in total) in the buffer and carry out the elementwise product and summation at once. Instead, one can process the pixels with the proper kernel weight elements row by row as the pixels are read in, and keep the output value (an array of numbers) in the buffer until a single convolutional operation is finished. After each convolutional operation, the buffered numbers can be output to storage, and the buffer can be cleared.
  • the necessary buffer size is kx rows of raw-bayer pixels when the method above is carried out.
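One way to realize the row-by-row scheme above can be sketched as follows (NumPy assumed; the all-ones kernel is illustrative). A full [4, 4] patch is never held at once; only a running row of partial output sums is kept until each convolutional operation finishes:

```python
import numpy as np

def compress_streaming(frame, kernel):
    """Patch-wise [kx, ky] compression processing one pixel row at a time.

    Each incoming row is multiplied by the matching kernel row and
    accumulated; the accumulator is flushed to the output after every
    kx rows, so a full [kx, ky] patch is never buffered at once.
    """
    kx, ky = kernel.shape
    H, W = frame.shape
    out = np.zeros((H // kx, W // ky))
    acc = np.zeros(W // ky)              # partial sums for one output row
    for r in range(H):
        segs = frame[r].reshape(-1, ky)  # this row, split into ky-wide segments
        acc += segs @ kernel[r % kx]     # apply the matching kernel row
        if r % kx == kx - 1:             # patch rows finished: flush the buffer
            out[r // kx] = acc
            acc[:] = 0
    return out

frame = np.arange(8 * 8).reshape(8, 8).astype(float)
kernel = np.ones((4, 4))                 # illustrative all-ones kernel
print(compress_streaming(frame, kernel)) # for this kernel: plain 4x4 block sums
```

With the illustrative all-ones kernel, each output number is simply the sum of its 4×4 block, which makes the sketch easy to verify by hand.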
  • FIG. 4 illustrates an example of intra-frame compression process with frame strategy one as described in 204 according to some embodiments of the present disclosure.
  • a frame may be compressed patch by patch.
  • when the frame is processed by hardware such as an FPGA, the convolutional kernel is applied to each patch of pixels and moves to the next patch, with no overlap or gap, until the last one.
  • patch 1 shown in FIG. 4 represents the patch that has been processed, and patch 2 represents the patch being processed.
  • FIG. 5 is an example of a compressing kernel during a single-layer convolutional 2D compression process of a raw-bayer frame according to some embodiments of the present disclosure.
  • a single-layer convolutional-2D operation may be implemented to compress a [NX, NY] -pixel raw-bayer frame (FIG. 1 for example) .
  • FIG. 6 shows an integer array of shape [256, 480, 3] obtained after compression of the raw pixel values shown in FIG. 1 with the compressing kernel shown in FIG. 5.
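The single-layer convolutional-2D compression can be sketched as a strided, non-overlapping convolution (NumPy assumed; the random [4, 4, 3] kernel and the [1024, 1920] frame are illustrative stand-ins for the kernel of FIG. 5 and the frame of FIG. 1, chosen so the output shape matches the stated [256, 480, 3]):

```python
import numpy as np

def compress_frame(frame, kernel):
    """Strided (non-overlapping) 2D convolution: [NX, NY] -> [NX/kx, NY/ky, Ncomp]."""
    kx, ky, ncomp = kernel.shape
    H, W = frame.shape
    # Split the frame into non-overlapping [kx, ky] patches.
    patches = frame.reshape(H // kx, kx, W // ky, ky).transpose(0, 2, 1, 3)
    # Elementwise product and summation against each of the Ncomp kernel channels.
    return np.einsum('pqxy,xyc->pqc', patches, kernel)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(1024, 1920)).astype(float)   # stand-in raw-bayer frame
kernel = rng.integers(-2, 3, size=(4, 4, 3)).astype(float)      # illustrative integer kernel
ics = compress_frame(frame, kernel)
print(ics.shape)   # (256, 480, 3)
```

Each ICS-channel of the output is one compressed view of the frame; stacking Ncomp such channels gives the ICS-frame.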
  • the integer array after compression may be reconstructed with a decompressing kernel (or further with a QINN) .
  • demosaicing and other operation such as white-balance tuning, color mixing tuning and gamma correction may be performed after the reconstruction.
  • because CV analysis needs frames in RGB format, and frames in RGB format are produced by the demosaicing operation (which follows decompression) , the operating power and computation before CV analysis involve the compression, the decompression, the demosaicing and the down-sampling (for the reason described in the background, CV-analysis NNs generally have small input XY-plane dimensions) .
  • the general way to perform CV analysis is to use the existing CV-analysis NNs and input frames after decompression, demosaicing and other operations.
  • another way is to design one's own CV-analysis NNs, but this requires a large amount of new labeled training data to train the CV-analysis NNs to achieve purposes similar to those covered by the existing ones on the market, which also has a big cost.
  • the decompression may be viewed as an up-sampling operation, while a down-sampling operation may be performed on the demosaiced frame before it is input into the NNs for CV analysis. It may therefore be possible to skip both decompression and down-sampling to reduce power and computation.
  • FIG. 7 illustrates an ICS-frame transformation method for CV analysis according to some embodiments of the present disclosure.
  • each ICS-frame may be read out, wherein each ICS-frame is of dimension [NX/kx, NY/ky, Ncomp] .
  • the one or more ICS-frames may be determined as described in 204 of FIG. 2, wherein each raw-bayer frame of dimension [NX, NY] may be compressed into an ICS-frame of dimension [NX/kx, NY/ky, Ncomp] with a compressing kernel of dimension [kx, ky, Ncomp] .
  • one or more transformed ICS-frames of dimension [NX/kx, NY/ky, 3] may be determined by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames.
  • the linear transformation may be summing pixel values at the same XY-plane position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2.
  • a pixel in a transformed ICS-frame may be determined as equation (2) shown below:

    RGB_trans [i_x, i_y, j] = Σ_{i = 0, ..., Ncomp-1} COMP [i_x, i_y, i] × trans_w [i, j]    (2)

  • wherein j is the RGB-channel index of RGB_trans, and j is from 0 to 2; and wherein i is the ICS-channel index, and i is from 0 to Ncomp-1; and COMP is the ICS-frame and trans_w is the parameters in the 2D array; and wherein i_x is the X-axis index of a pixel, and i_x is from 0 to NX/kx-1; and i_y is the Y-axis index of a pixel, and i_y is from 0 to NY/ky-1.
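The per-pixel linear transformation of equation (2) can be sketched in NumPy as a single einsum over the ICS-channels (shapes and parameter values are illustrative; Ncomp = 8 is an assumption):

```python
import numpy as np

def transform_ics(comp, trans_w):
    """Linearly transform an ICS-frame [H, W, Ncomp] into [H, W, 3].

    Per equation (2): RGB_trans[x, y, j] = sum_i COMP[x, y, i] * trans_w[i, j].
    """
    return np.einsum('xyi,ij->xyj', comp, trans_w)

rng = np.random.default_rng(1)
comp = rng.random((256, 480, 8))    # ICS-frame with Ncomp = 8 ICS-channels (illustrative)
trans_w = rng.random((8, 3))        # the parameters in the 2D array [Ncomp, 3]
rgb_trans = transform_ics(comp, trans_w)
print(rgb_trans.shape)              # (256, 480, 3)
```

The same operation can equivalently be written as a matrix product `comp @ trans_w`, since every XY-plane position uses the same [Ncomp, 3] weights.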
  • An exemplary transformed ICS-frame is shown in FIG. 8. As shown in FIG. 8, the three channels of the transformed ICS-frame represent RGB channels which can be used for CV analysis. Further, the RGB format of the transformed ICS-frame may be determined by stacking the three channels of FIG. 8. And in FIG. 9, a CV analysis result may be determined by putting the RGB format of the transformed ICS-frame into Yolo_v3 (an existing CV analysis NN) .
  • the one or more transformed ICS-frames can be used directly for CV analysis without down-sampling.
  • NX, NY, kx, ky, NX/kx, NY/ky and Ncomp are positive integers, and Ncomp represents number of ICS-channels of the compressing kernel.
  • the parameters in the 2D array may be determined based on sample training, which will be described in FIG. 10.
  • FIG. 10 shows an exemplary training method of parameters in the 2D array according to some embodiments of the present disclosure.
  • one or more first sample raw-bayer frames may be read out, wherein each first sample raw-bayer frame is of dimension [NX, NY] .
  • one or more demosaiced first sample raw-bayer frames may be determined by performing demosaicing to each first sample raw-bayer frame, wherein each demosaiced first sample raw-bayer frame is of dimension [NX, NY, 3] .
  • a raw-bayer frame may be viewed as a tiling of block-matrices of size [2, 2] , with no gap or overlap, wherein each block-matrix contains 1 R, 1 B and 2 G pixels; the number of G pixels is doubled due to a convention (the origin of this convention is that human eyes are more sensitive to green than to any other color) .
  • FIG. 11 shows two widely-used formats of block-matrices according to some embodiments of the present disclosure.
  • Gb and Gr pixels are used to replace their nearest R and B pixels, and after a fixed-size interpolation, a single-color green image with the same XY dimension as the raw-bayer frame is obtained.
  • the three single-color images (red, green and blue) are stacked together as the viewable RGB image with the same XY dimension as the raw-bayer frame.
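The R/Gr/Gb/B extraction and channel stacking described above can be sketched with a deliberately naive demosaicing scheme (NumPy assumed; real pipelines use better interpolation than this block-replication, which is only meant to show the structure of an RGGB block):

```python
import numpy as np

def naive_demosaic(bayer):
    """Very naive RGGB demosaic: every pixel in a [2, 2] block takes that
    block's R value, the mean of its two G values, and its B value.
    Output has the same XY dimension as the raw-bayer frame, with 3 channels."""
    r  = bayer[0::2, 0::2]   # R  at the top-left of each [2, 2] block
    gr = bayer[0::2, 1::2]   # Gr at the top-right
    gb = bayer[1::2, 0::2]   # Gb at the bottom-left
    b  = bayer[1::2, 1::2]   # B  at the bottom-right
    g = (gr + gb) / 2.0
    rgb = np.stack([r, g, b], axis=-1)               # [H/2, W/2, 3]
    return rgb.repeat(2, axis=0).repeat(2, axis=1)   # back to [H, W, 3]

bayer = np.arange(16, dtype=float).reshape(4, 4)     # tiny stand-in raw-bayer frame
print(naive_demosaic(bayer).shape)                   # (4, 4, 3)
```

The RGGB layout here is an assumption; the other widely-used block-matrix format of FIG. 11 would only swap the slice offsets.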
  • one or more trans-training-label frames may be determined by performing down-sampling to each demosaiced first sample raw-bayer frame, wherein each trans-training-label frame is of dimension [NX/kx, NY/ky, 3] .
  • the down-sampling may use a well-established existing down-sampling method, or a custom-designed one.
  • one or more first sample ICS-frames may be determined by performing intra-frame compression to each first sample raw-bayer frame with the compressing kernel.
  • the intra-frame compression may be similar to the method in FIG. 2.
  • one or more first sample transformed ICS-frames of dimension [NX/kx, NY/ky, 3] may be determined by using initial parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more first sample ICS-frames.
  • the linear transformation may be similar to that in 704.
  • the parameters in the 2D array may be determined by tuning the initial parameters in the 2D array to minimize total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames.
  • Each first sample transformed ICS-frame corresponds to a trans-training-label frame
  • training loss of a first sample transformed ICS-frame is mean square difference between the first sample transformed ICS-frame and its corresponding trans-training-label frame
  • the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames is the sum of the individual training loss values.
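Since the transformation is linear, tuning the parameters in the 2D array to minimize the summed mean-square loss is a quadratic minimization; a gradient-descent sketch with synthetic stand-in data (NumPy assumed; the frame sizes, Ncomp = 8, learning rate and step count are all illustrative choices, not values from the disclosure):

```python
import numpy as np

rng = np.random.default_rng(2)
ncomp = 8
# Synthetic stand-ins for first sample ICS-frames and trans-training-label frames.
ics = rng.random((2, 32, 32, ncomp))               # 2 first-sample ICS-frames
true_w = rng.random((ncomp, 3))
labels = np.einsum('nxyi,ij->nxyj', ics, true_w)   # pretend label frames

w = rng.random((ncomp, 3))                         # initial parameters in the 2D array
lr = 0.5
for _ in range(2000):
    pred = np.einsum('nxyi,ij->nxyj', ics, w)      # first sample transformed ICS-frames
    err = pred - labels                            # total loss = sum of per-frame MSE
    grad = np.einsum('nxyi,nxyj->ij', ics, err) / err[..., 0].size
    w -= lr * grad                                 # tune the 2D-array parameters

print(np.abs(w - true_w).max())                    # near zero after tuning
```

With noiseless synthetic labels the tuned parameters recover the generating weights; with real trans-training-label frames the fit would instead converge to the least-squares optimum.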
  • the order of the process in FIG. 10 may be changed; specifically, steps 1004 and 1006 may be performed after step 1008 or step 1010.
  • the compression kernel may be determined based on sample training.
  • FIG. 12 illustrates an exemplary pre-training method of the compressing kernel according to some embodiments of the present disclosure.
  • one or more second sample raw-bayer frames may be read out, wherein each second sample raw-bayer frame is of dimension [NX, NY] .
  • the training sample in FIG. 12 and the training sample in FIG. 10 may overlap with each other.
  • at least one first sample raw-bayer frame in the one or more first sample raw-bayer frames can be found in the one or more second sample raw-bayer frames.
  • the training sample in FIG. 12 and the training sample in FIG. 10 may not overlap with each other.
  • the one or more first sample raw-bayer frames are same as the one or more second sample raw-bayer frames.
  • none of the one or more first sample raw-bayer frames can be found in the one or more second sample raw-bayer frames.
  • a corresponding ICS-training-label frame may be determined by performing linear transformations and combinations of the R, B, Gb and Gr pixels in the second sample raw-bayer frame, wherein each ICS-training-label frame is of dimension [NX′, NY′, Nlabel] , and NX, NY, NX′, NY′and Nlabel are positive integers.
  • demosaicing may be used for determining the one or more ICS-training-label frames.
  • the dimension of the decompressing kernel may be [kx/2, ky/2, Ncomp, 3] .
  • one or more second sample ICS-frames of dimension [NX/kx, NY/ky, Ncomp] may be determined by performing intra-frame compression to each second sample raw-bayer frame with an initial compressing kernel, wherein the initial compressing kernel is of dimension [kx, ky, Ncomp] .
  • the decompression may be a deconvolution operation with the decompressing kernel.
  • the decompressing kernel with dimension [kx′, ky′, Ncomp, Nlabel] can determine one or more second sample decompressed ICS-frames with a same dimension ( [NX′, NY′, Nlabel] ) as the one or more ICS-training-label frames.
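With non-overlapping strides, the decompression can be sketched as a transpose convolution in which each ICS pixel vector expands into one output block per label channel (NumPy assumed; kx′ = ky′ = 2, Ncomp = 8 and Nlabel = 3 are illustrative choices):

```python
import numpy as np

def decompress(ics, dkernel):
    """Transpose convolution: [H, W, Ncomp] x [kx', ky', Ncomp, Nlabel]
    -> [H*kx', W*ky', Nlabel]. Each ICS pixel vector expands into one
    non-overlapping [kx', ky'] output block per label channel."""
    kx, ky, ncomp, nlabel = dkernel.shape
    H, W, _ = ics.shape
    # blocks[h, w, x, y, l] = sum_c ics[h, w, c] * dkernel[x, y, c, l]
    blocks = np.einsum('hwc,xycl->hwxyl', ics, dkernel)
    return blocks.transpose(0, 2, 1, 3, 4).reshape(H * kx, W * ky, nlabel)

rng = np.random.default_rng(3)
ics = rng.random((256, 480, 8))      # second sample ICS-frame, Ncomp = 8 (illustrative)
dkernel = rng.random((2, 2, 8, 3))   # decompressing kernel [kx', ky', Ncomp, Nlabel]
rec = decompress(ics, dkernel)
print(rec.shape)                      # (512, 960, 3)
```

This matches the dimension bookkeeping above: with kx′ = NX′·kx/NX the output has the same [NX′, NY′, Nlabel] shape as the ICS-training-label frames.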
  • the compressing kernel may be determined by training the initial compressing kernel based on the one or more ICS-training-label frames.
  • the training process may be described in FIG. 13.
  • the order of the process in FIG. 12 may be changed; specifically, step 1204 may be performed after step 1206 or step 1208.
  • FIG. 13 illustrates an exemplary training method of the compressing kernel according to some embodiments of the present disclosure.
  • a floating-number compressing kernel may be determined by tuning parameters in the initial compressing kernel based on machine learning to minimize total quality loss between the one or more second sample decompressed ICS-frames and the one or more ICS-training-label frames.
  • Each second sample decompressed ICS-frame corresponds to an ICS-training-label frame
  • quality loss of a second sample decompressed ICS-frame is mean square difference between the second sample decompressed ICS-frame and its corresponding ICS-training-label frame
  • the total quality loss between the one or more second sample decompressed ICS-frames and the corresponding one or more ICS-training-label frames is the sum of the individual quality loss values.
  • the compressing kernel may be determined by integerizing parameters in the floating-number compressing kernel.
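The integerizing step above can be sketched as scaling the floating-number kernel into a signed integer range and rounding (the symmetric 8-bit scheme below is an assumption for illustration; the disclosure does not fix a particular bit depth or scaling rule):

```python
import numpy as np

def integerize_kernel(float_kernel, bit_depth=8):
    """Map a floating-number compressing kernel to signed integers.

    The kernel is scaled so its largest magnitude fills the signed range
    for the given bit depth, then rounded elementwise. The scale factor
    is returned so decompression can undo it.
    """
    max_val = 2 ** (bit_depth - 1) - 1                 # e.g. 127 for 8-bit
    scale = max_val / np.abs(float_kernel).max()
    return np.round(float_kernel * scale).astype(np.int32), scale

# Illustrative floating-number kernel values (not trained ones).
float_kernel = np.array([[0.91, -0.33], [0.12, 0.58]])
int_kernel, scale = integerize_kernel(float_kernel)
print(int_kernel)
```

Rounding introduces a small quantization error relative to the floating-number kernel, which is part of the quality loss the subsequent decompressing-kernel training (and the QINN) can compensate for.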
  • a decompressing kernel may be determined based on further training as shown in FIG. 14.
  • FIG. 14 is an exemplary training method for a decompressing kernel according to some embodiments of the present disclosure.
  • one or more second sample intermediate ICS-frames may be determined by performing intra-frame compression to each second sample raw-bayer frame with the compressing kernel.
  • the compressing kernel is derived from the step 1304.
  • one or more second sample intermediate decompressed ICS-frames may be determined by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel.
  • the decompressing kernel may be determined by tuning parameters in the initial decompressing kernel based on machine learning to minimize total quality loss between the one or more second sample intermediate decompressed frames and the corresponding one or more ICS-training-label frames.
  • Each second sample intermediate decompressed ICS-frame corresponds to an ICS-training-label frame
  • quality loss of a second sample intermediate decompressed ICS-frame is mean square difference between the second sample intermediate decompressed ICS-frame and its corresponding ICS-training-label frame
  • the total quality loss between the one or more second sample intermediate decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
  • a neural network for quality improvement (QINN) may be applied to reduce quality loss during compression and decompression.
  • FIG. 15 shows another training method of the decompressing kernel according to some embodiments of the present disclosure.
  • one or more second sample intermediate ICS-frames may be determined by performing intra-frame compression to each first sample raw-bayer frame with the compressing kernel.
  • the compressing kernel is derived from the step 1304.
  • one or more second sample intermediate decompressed ICS-frames may be determined by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel.
  • further, in 1506, one or more second sample reconstructed frames may be determined by inputting each second sample intermediate decompressed ICS-frame into an initial QINN.
  • in 1508, the decompressing kernel and a QINN may be determined by tuning parameters in the initial decompressing kernel and the initial QINN based on machine learning to minimize total quality loss between the one or more second sample reconstructed frames and the corresponding one or more ICS-training-label frames.
  • Each second sample reconstructed frame corresponds to an ICS-training-label frame
  • quality loss of a second sample reconstructed ICS-frame is mean square difference between the second sample reconstructed ICS-frame and its corresponding ICS-training-label frame
  • the total quality loss between the one or more second sample reconstructed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
  • FIG. 16 illustrates an ICS-frame transformation apparatus for CV analysis.
  • the apparatus may include a reading out module 1610, a processor 1620 and an output port 1630.
  • the reading out module 1610 may be configured to read out one or more raw-bayer frames, and each raw-bayer frame is of dimension [NX, NY] .
  • the processor 1620 may be configured to determine one or more transformed ICS-frames by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames.
  • the output port 1630 may be configured to output the one or more transformed ICS-frames to a neural network for CV analysis.
  • the one or more ICS-frames are determined by performing intra-frame compression to one or more raw-bayer frames with a compressing kernel, wherein the one or more raw-bayer frames are captured by one or more camera heads.
  • each raw-bayer frame is of dimension [NX, NY] and the compressing kernel is of dimension [kx, ky, Ncomp] .
  • NX, NY, kx, ky, NX/kx, NY/ky and Ncomp are positive integers, and Ncomp represents number of ICS-channels of the compressing kernel.
  • the processor determines a corresponding transformed ICS-frame by summing pixel values at the same XY-plane position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2.
  • the parameters in the 2D array may be determined based on sample training, and the sample training process may be same as that described in FIG. 10.
  • the intra-frame compression may be the same as that described in FIGs. 1-6; likewise, the compressing kernel is determined based on sample training, and the sample training process may be the same as that described in FIGs. 12-15.
  • aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc. ) or in a combination of software and hardware, implementations that may all generally be referred to herein as a “block, ” “module, ” “engine, ” “unit, ” “component, ” or “system” . Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An ICS-frame transformation method and apparatus. The method includes steps of: reading out one or more ICS-frames of dimension [NX/kx, NY/ky, Ncomp] (702); determining one or more transformed ICS-frames of dimension [NX/kx, NY/ky, 3] by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames (704); outputting the one or more transformed ICS-frames to a neural network for CV analysis (706); wherein the one or more ICS-frames are determined by performing intra-frame compression to one or more raw-bayer frames with a compressing kernel, wherein the one or more raw-bayer frames are captured by one or more camera heads; wherein each raw-bayer frame is of dimension [NX, NY] and the compressing kernel is of dimension [kx, ky, Ncomp]; wherein NX, NY, kx, ky, NX/kx, NY/ky and Ncomp are positive integers, and Ncomp represents number of ICS-channels of the compressing kernel.

Description

ICS-frame Transformation Method and Apparatus for CV Analysis TECHNICAL FIELD
The present invention relates to CV analysis, particularly to a data processing method during compression for CV analysis.
BACKGROUND
The general function of a camera is to transform optical data into compressed serial electronic formats for transmission or information storage. The optical data may correspond to one or more raw-bayer frames. A raw-bayer frame generally has high resolution, which may decrease the transmission speed, so a compression method before transmission is necessary; a corresponding decompression operation is therefore also needed.
A conventional camera consists of a focal plane and a “system on chip” image processing platform. The chip may implement demosaicing, white-balance tuning, color mixing tuning, gamma correction, compressing and decompressing in sequence, and a frame or image can only be viewed or analyzed after reconstruction (decompression) .
Generally, neural networks (NNs) are used for computer vision (CV) analysis. CV analysis may be object detection/classification, face recognition, etc., and frames (mostly in usual RGB format) are inputted into the NNs for either NN applications (object detection/classification, face recognition, etc. ) or NNs training.
Most existing CV analysis NNs have small input XY-plane dimensions, such as [416, 416] , and high-resolution demosaiced frames must be down-sampled (and followed by zero-padding in most cases) before being input into the NNs.
Operating power and computation are key limitations on camera pixel capacity, and the operating power and computation consumption involve demosaicing, white-balance tuning, color mixing tuning, gamma correction, compression, decompression and down-sampling. Some method may be needed to reduce the operating power and computation without requiring a large amount of new labeled training data to retrain the CV-analysis NNs existing on the market.
SUMMARY
One aspect of the present disclosure is directed to an ICS-frame transformation method for CV analysis. The ICS-frame transformation method may include one or more of the following operations. One or more ICS-frames of dimension [NX/kx, NY/ky, Ncomp] may be read out. One or more transformed ICS-frames of dimension [NX/kx, NY/ky, 3] may be determined by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames. The one or more transformed ICS-frames may be output to a neural network for CV analysis. In some embodiments, the one or more ICS-frames may be determined by performing intra-frame compression to one or more raw-bayer frames with a compressing kernel, wherein the one or more raw-bayer frames may be captured by one or more camera heads. In some embodiments, each raw-bayer frame may be of dimension [NX, NY] , and the compressing kernel may be of dimension [kx, ky, Ncomp] , and NX, NY, kx, ky, NX/kx, NY/ky and Ncomp may be positive integers, and Ncomp represents number of ICS-channels of the compressing kernel.
In some embodiments, for each ICS-frame, a corresponding transformed ICS-frame may be determined by summing pixel values at the same XY-plane position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2.
In some embodiments, the parameters in the 2D array may be determined based on sample training, and the sample training comprises: reading out one or more first sample raw-bayer frames, wherein each first sample raw-bayer frame is of dimension [NX, NY] ; determining one or more demosaiced first sample raw-bayer frames by performing demosaicing to each first sample raw-bayer frame, wherein each demosaiced first sample raw-bayer frame is of dimension [NX, NY, 3] ; determining one or more trans-training-label frames by performing down-sampling to each demosaiced first sample raw-bayer frame, wherein each trans-training-label frame is of dimension [NX/kx, NY/ky, 3] ; determining one or more first sample ICS-frames by performing intra-frame compression to each first sample raw-bayer  frame with the compressing kernel; determining one or more first sample transformed ICS-frames of dimension [NX/kx, NY/ky, 3] by using initial parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more first sample ICS-frames; determining the parameters in the 2D array by tuning the initial parameters in the 2D array to minimize total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames.
In some embodiments, each first sample transformed ICS-frame corresponds to a trans-training-label frame, and training loss of a first sample transformed ICS-frame is mean square difference between the first sample transformed ICS-frame and its corresponding trans-training-label frame, and the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames is the sum of the individual training loss values.
In some embodiments, the intra-frame compression comprises: for each raw-bayer frame, compressing each group of pixel values in the raw-bayer frame into an integer with the compressing kernel, wherein pixels in each raw-bayer frame are divided into multi-groups and each group of pixels corresponds to a 2D or a 1D raw pixels array of the raw-bayer frame.
In some embodiments, the compressing kernel may be determined based on sample training, and the sample training comprises: reading out one or more second sample raw-bayer frames, wherein each second sample raw-bayer frame is of dimension [NX, NY] ; for each second sample raw-bayer frame, determining a corresponding ICS-training-label frame by performing linear transformations and combinations to the R, B, Gb and Gr pixels in the second sample raw-bayer frame, wherein each ICS-training-label frame is of dimension [NX′, NY′, Nlabel] ; determining one or more second sample ICS-frames of dimension [NX/kx, NY/ky, Ncomp] by performing intra-frame compression to each second sample raw-bayer frame with an initial compressing kernel, wherein the initial compressing kernel is of dimension [kx, ky, Ncomp] ; determining one or more second sample  decompressed ICS-frames of dimension [NX′, NY′, Nlabel] by performing decompression to each second sample ICS-frame with an initial decompressing kernel of dimension [kx′, ky′, Ncomp, Nlabel] , wherein kx′=NX′*kx/NX, ky′=NY′*ky/NY; determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames; wherein NX′, NY′, kx′, ky′, and Nlabel are positive integers.
In some embodiments, the process of determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames may comprise: determining a floating-number compressing kernel by tuning parameters in the initial compressing kernel based on machine learning to minimize total quality loss between the one or more second sample decompressed ICS-frames and the corresponding one or more ICS-training-label frames; determining the compressing kernel by integerizing parameters in the floating-number compressing kernel.
In some embodiments, each second sample decompressed ICS-frame corresponds to an ICS-training-label frame, and quality loss of a second sample decompressed ICS-frame is mean square difference between the second sample decompressed ICS-frame and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
In some embodiments, the method further comprises: determining one or more second sample intermediate ICS-frames by performing intra-frame compression to each second sample raw-bayer frame with the compressing kernel; determining one or more second sample intermediate decompressed ICS-frames by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel; determining the decompressing kernel by tuning parameters in the initial decompressing kernel based on machine learning to minimize total quality loss between the one or more second sample intermediate decompressed frames and the corresponding one or more ICS-training-label frames.
In some embodiments, each second sample intermediate decompressed ICS-frame corresponds to an ICS-training-label frame, and quality loss of a second sample intermediate decompressed ICS-frame is mean square difference between the second sample intermediate decompressed ICS-frame and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample intermediate decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
In some embodiments, the method further comprises: determining one or more second sample intermediate ICS-frames by performing intra-frame compression to each first sample raw-bayer frame with the compressing kernel; determining one or more second sample intermediate decompressed ICS-frames by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel; determining one or more second sample reconstructed frames by inputting each second sample intermediate decompressed ICS-frame into an initial QINN; determining the decompressing kernel and a QINN by tuning parameters in the initial decompressing kernel and the initial QINN based on machine learning to minimize total quality loss between the one or more second sample reconstructed frames and the one or more ICS-training-label frames.
In some embodiments, each second sample reconstructed frame corresponds to an ICS-training-label frame, and quality loss of a second sample reconstructed ICS-frame is mean square difference between the second sample reconstructed ICS-frame and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample reconstructed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
Another aspect of the present disclosure is directed to an ICS-frame transformation apparatus for CV analysis including a reading-out module, a processor and an output port. The reading out module may be configured to read out one or more raw-bayer frames, and each raw-bayer frame is of dimension [NX, NY] . The processor may be configured to determine one or more transformed ICS-frames by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the  one or more ICS-frames. The output port may be configured to output the one or more transformed ICS-frames to a neural network for CV analysis. In some embodiments, the one or more ICS-frames may be determined by performing intra-frame compression to one or more raw-bayer frames with a compressing kernel, wherein the one or more raw-bayer frames may be captured by one or more camera heads. In some embodiments, each raw-bayer frame may be of dimension [NX, NY] , and the compressing kernel may be of dimension [kx, ky, Ncomp] , and NX, NY, kx, ky, NX/kx, NY/ky and Ncomp may be positive integers, and Ncomp represents number of ICS-channels of the compressing kernel.
In some embodiments, for each ICS-frame, the processor determines a corresponding transformed ICS-frame by summing pixel values at the same XY-plane position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2.
In some embodiments, the parameters in the 2D array may be determined based on sample training, and the sample training comprises: reading out one or more first sample raw-bayer frames, wherein each first sample raw-bayer frame is of dimension [NX, NY] ; determining one or more demosaiced first sample raw-bayer frames by performing demosaicing to each first sample raw-bayer frame, wherein each demosaiced first sample raw-bayer frame is of dimension [NX, NY, 3] ; determining one or more trans-training-label frames by performing down-sampling to each demosaiced first sample raw-bayer frame, wherein each trans-training-label frame is of dimension [NX/kx, NY/ky, 3] ; determining one or more first sample ICS-frames by performing intra-frame compression to each first sample raw-bayer frame with the compressing kernel; determining one or more first sample transformed ICS-frames of dimension [NX/kx, NY/ky, 3] by using initial parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more first sample ICS-frames; determining the parameters in the 2D array by tuning the initial parameters in the 2D array to minimize total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames.
In some embodiments, each first sample transformed ICS-frame corresponds to a trans-training-label frame, and training loss of a first sample transformed ICS-frame is mean square difference between the first sample transformed ICS-frame and its corresponding trans-training-label frame, and the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames is the sum of the individual training loss values.
In some embodiments, the intra-frame compression comprises: for each raw-bayer frame, compressing each group of pixel values in the raw-bayer frame into an integer with the compressing kernel, wherein pixels in each raw-bayer frame are divided into multi-groups and each group of pixels corresponds to a 2D or a 1D raw pixels array of the raw-bayer frame.
In some embodiments, the compressing kernel may be determined based on sample training, and the sample training comprises: reading out one or more second sample raw-bayer frames, wherein each second sample raw-bayer frame is of dimension [NX, NY] ; for each second sample raw-bayer frame, determining a corresponding ICS-training-label frame by performing linear transformations and combinations to the R, B, Gb and Gr pixels in the second sample raw-bayer frame, wherein each ICS-training-label frame is of dimension [NX′, NY′, Nlabel] ; determining one or more second sample ICS-frames of dimension [NX/kx, NY/ky, Ncomp] by performing intra-frame compression to each second sample raw-bayer frame with an initial compressing kernel, wherein the initial compressing kernel is of dimension [kx, ky, Ncomp] ; determining one or more second sample decompressed ICS-frames of dimension [NX′, NY′, Nlabel] by performing decompression to each second sample ICS-frame with an initial decompressing kernel of dimension [kx′, ky′, Ncomp, Nlabel] , wherein kx′=NX′*kx/NX, ky′=NY′*ky/NY; determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames; wherein NX′, NY′, kx′, ky′, and Nlabel are positive integers.
In some embodiments, the process of determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames may comprise: determining a floating-number compressing kernel by tuning parameters in the initial compressing kernel based on machine learning to minimize total quality loss between the one or more second sample decompressed ICS-frames and the corresponding one or more ICS-training-label frames; determining the compressing kernel by integerizing parameters in the floating-number compressing kernel.
In some embodiments, each second sample decompressed ICS-frame corresponds to an ICS-training-label frame, and quality loss of a second sample decompressed ICS-frame is mean square difference between the second sample decompressed ICS-frame and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
FIG. 1 shows an example of original raw-bayer picture according to some embodiments of the present disclosure;
FIG. 2 illustrates an intra-frame compressing and decompressing method  according to some embodiments of the present disclosure;
FIG. 3 shows a convolution process as described in 204 according to some embodiments of the present disclosure;
FIG. 4 illustrates an example of intra-frame compression process with frame strategy one as described in 204 according to some embodiments of the present disclosure;
FIG. 5 is an example of a compressing kernel during a single-layer convolutional 2D compression process of raw-bayer data according to some embodiments of the present disclosure;
FIG. 6 is an integer array of the shape [256, 480, 3] after compression of the input pixel values with the compressing kernel;
FIG. 7 illustrates an ICS-frame transformation method for CV analysis according to some embodiments of the present disclosure;
FIG. 8 shows an exemplary transformed ICS-frame according to some embodiments of the present disclosure;
FIG. 9 shows a CV analysis result of stacking RGB format of the transformed ICS-frame in FIG. 8 according to some embodiments of the present disclosure;
FIG. 10 shows an exemplary training method of parameters in the 2D array according to some embodiments of the present disclosure;
FIG. 11 shows two widely-used formats of block-matrices according to some embodiments of the present disclosure;
FIG. 12 illustrates an exemplary pre-training method of the compressing kernel according to some embodiments of the present disclosure;
FIG. 13 illustrates an exemplary training method of the compressing kernel according to some embodiments of the present disclosure;
FIG. 14 is an exemplary training method for a decompressing kernel according to some embodiments of the present disclosure;
FIG. 15 shows another training method of the decompressing kernel according to some embodiments of the present disclosure.
FIG. 16 illustrates an ICS-frame transformation apparatus for CV analysis.
DETAILED DESCRIPTION
In the following detailed description, numerous specific details are set forth by way of example in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well known methods, procedures, systems, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but to be accorded the widest scope consistent with the claims.
It will be understood that the terms “system, ” “engine, ” “unit, ” “module, ” and/or “block” used herein are one method to distinguish different components, elements, parts, sections or assemblies of different levels in ascending order. However, the terms may be replaced by other expressions if they achieve the same purpose.
Generally, the word “module, ” “unit, ” or “block, ” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions. A module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device. In some embodiments, a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules/units/blocks configured for execution on computing devices may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs  installation, decompression, or decryption prior to execution) . Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device. Software instructions may be embedded in a firmware, such as an EPROM. It will be further appreciated that hardware modules/units/blocks may be included in connected logic components, such as gates and flip-flops, and/or can be included of programmable units, such as programmable gate arrays or processors. The modules/units/blocks or computing device functionality described herein may be implemented as software modules/units/blocks, but may be represented in hardware or firmware. In general, the modules/units/blocks described herein refer to logical modules/units/blocks that may be combined with other modules/units/blocks or divided into sub-modules/sub-units/sub-blocks despite their physical organization or storage. The description may be applicable to a system, an engine, or a portion thereof.
It will be understood that when a unit, engine, module or block is referred to as being “on, ” “connected to” or “coupled to” another unit, engine, module, or block, it may be directly on, connected or coupled to, or communicate with the other unit, engine, module, or block, or an intervening unit, engine, module, or block may be present, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
These and other features, and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure. It is understood that the drawings are not to scale.
The terminology used herein is for the purposes of describing particular examples and embodiments only, and is not intended to be limiting. As used herein,  the singular forms “a” , “an” , and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “include, ” and/or “comprise, ” when used in this disclosure, specify the presence of integers, devices, behaviors, stated features, steps, elements, operations, and/or components, but do not exclude the presence or addition of one or more other integers, devices, behaviors, features, steps, elements, operations, components, and/or groups thereof.
The present disclosure provided herein relates to a CV analysis method and apparatus. Detailed descriptions will be illustrated in the following embodiments. The CV analysis method may be applied together with a new compression/decompression method, which will be described in FIGs. 1-6. The new compression/decompression method differs from the current methods described in the background in that demosaicing and other operations may be implemented after compression and decompression.
In a camera, received light may be read out as raw-bayer data by a chip. Raw-bayer data must be read out of the camera using parallel or serial streaming as shown in FIG. 1. FIG. 1 shows an example of a raw-bayer frame corresponding to the raw-bayer data according to some embodiments of the present disclosure. As shown in FIG. 1, a raw-bayer frame has a shape of [2048, 3840] , and each pixel may have a corresponding pixel value (raw pixel value) . The raw pixel values may be read out in sequence by the chip after capturing the frame.
For an electronic sensor array of the camera, the read-out data from a focal plane may be in a raster format, meaning rows are read out in sequence. The input pixel values (raw pixel values) are in shape of [2048, 3840] . In some embodiments, the [2048, 3840] raw-bayer frame can be compressed to an integer array of the shape [256, 480, 3] , which will be described in FIG. 6. Pixels also correspond to different colors, typically red, green and blue, but the color values are typically mosaicked across the sensor so that a given pixel corresponds to a given known color.
Intra-frame compression may be performed on the raw-bayer data streams. FIG. 2 illustrates an intra-frame compression/decompression method according to some embodiments of the present disclosure.
In 202, a plurality of groups of raw pixel values may be read out in sequence from a camera head. In some embodiments, the raw pixel values may be read out in sequence as raw-bayer data. For example, a camera head may capture a raw-bayer frame comprising a plurality of groups of raw pixel values, wherein each group of raw pixel values correspond to a 2D or a 1D raw pixels array of the raw-bayer frame.
In 204, intra-frame compression may be performed by compressing each group of raw pixel values into an integer with a compressing kernel, and as a result, the raw-bayer frame may be compressed into an ICS-frame. In some embodiments, the compressing kernel may have Ncomp ICS-channels, where Ncomp may be an integer not smaller than 1. The compression is described in the frame strategies below.
In some embodiments, elements in the compressing kernel may be integers for easy application on hardware such as an FPGA. For example, the elements in the compressing kernel may be binary numbers, and the bit depths of the elements may be 12-bit, 10-bit, 8-bit, 6-bit, 4-bit or 2-bit. Further, when the elements are 2-bit binaries, the elements may be -1 or +1; or the elements may be 0 or 1.
Frame strategy one
In the raw-bayer frame, a group of raw pixel values may correspond to pixels in a 2D patch of the raw-bayer frame, and the compressing kernel may be a 2D kernel, wherein the raw-bayer frame may be divided into multiple 2D patches. A 2D patch and the 2D kernel may have the same dimension. For example, the 2D kernel may have a dimension [kx, ky] , and the frame of shape [NX, NY] may be divided into [Nx, Ny] 2D patches, wherein Nx=NX/kx, Ny=NY/ky. Pixel values corresponding to the pixels in a certain patch with shape [kx, ky, 1] may be multiplied by a 2D kernel with shape [kx, ky, Ncomp] , and the pixel values in the 2D patch may be compressed into Ncomp numbers (Ncomp is a preset integer defined manually) . As a result, the input raw pixel values of the frame may be compressed into COMP, wherein COMP is an array of Ncomp numbers (an array of integers) with a dimension of [Nx, Ny, Ncomp] ; Ncomp may represent the number of ICS-channels of COMP. The intra-frame compression process may be a 2D convolution operation described by equation (1) below:
Output (k) = sum_{i, j} (pixel (i, j) × weight (i, j, k) )   (1)
where indices i and j loop through kx and ky respectively, and index k is from 0 to Ncomp-1.
The compression ratio, without considering the difference between the bit depth of the input pixel values (raw pixel values are usually 8-bit or 10-bit) and the bit depth of the array of numbers after compression (8-bit) , can be expressed as Ncomp/ (kx*ky) . Different compression ratios can be achieved by using various settings of [kx, ky, Ncomp] . For example, compression ratios of 1/16, 1/16, 1/32 and 1/256 can be achieved with 2D kernels [16, 16, 16] , [8, 8, 4] , [16, 16, 8] and [16, 16, 1] respectively.
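Frame strategy one can be sketched in a few lines of NumPy. The function name, the toy frame and the random ±1 kernel below are illustrative only, not part of the disclosure; the sketch simply applies equation (1) patch by patch with no overlap or gap.

```python
import numpy as np

def compress_2d(frame, kernel):
    """Compress a raw-bayer frame patch-by-patch with a 2D compressing kernel.

    frame:  [NX, NY] array of raw pixel values
    kernel: [kx, ky, Ncomp] compressing kernel
    returns COMP: [NX/kx, NY/ky, Ncomp] integer array (equation (1))
    """
    kx, ky, ncomp = kernel.shape
    nx, ny = frame.shape[0] // kx, frame.shape[1] // ky
    # split the frame into non-overlapping [kx, ky] patches
    patches = frame.reshape(nx, kx, ny, ky).transpose(0, 2, 1, 3)
    # elementwise product and summation over each patch, per ICS-channel
    return np.einsum('xyij,ijk->xyk', patches, kernel)

# toy example: a [16, 16] frame with an [8, 8, 3] kernel -> [2, 2, 3] ICS-frame
frame = np.arange(16 * 16).reshape(16, 16)
kernel = np.random.choice([-1, 1], size=(8, 8, 3))
comp = compress_2d(frame, kernel)
print(comp.shape)  # (2, 2, 3)
```

With [kx, ky, Ncomp] = [8, 8, 3] the ratio is 3/64, matching the Ncomp/(kx*ky) formula above.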
Frame strategy two
In the raw-bayer frame, a group of raw pixel values may correspond to pixels in a 1D segment of the raw-bayer frame, and the compressing kernel may be a 1D kernel, wherein the raw-bayer frame may be divided into multiple 1D segments. A 1D segment and the 1D kernel may have the same dimension. In some embodiments, each element in the compressing kernel may be -1 or +1; or each element in the compressing kernel may be 0 or 1. For example, 16 incoming pixel values (a 1D raw pixels array) may be combined into one number using a compressing kernel [0, 1, 0, 0, 1, 0, ... 1] with length of 16. As another example, 16 incoming pixel values may be combined into one number using a compressing kernel [-1, 1, -1, -1, 1, -1, ... 1] with length of 16.
Specifically, the sequence may be divided row by row. Various 1D compressing kernels have been developed, including [128, 1, 4] and [32, 1, 4] . Combinations of different convolutional-1D kernels for different rows in the raw-bayer data may be used to control the total compression ratio of a picture/frame.
This way of dividing the pixel sequence requires a smaller buffer than the 2D-patch division, because pixel values from different rows do not need to be buffered; the incoming pixel values can be processed segment by segment.
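Frame strategy two can be sketched as a matrix product over non-overlapping segments of a single row. The helper name and the toy ±1 kernel are illustrative assumptions, not from the disclosure.

```python
import numpy as np

def compress_1d(row, kernel):
    """Compress one row of raw pixels segment-by-segment with a 1D kernel.

    row:    [NY] raw pixel values from a single raw-bayer row
    kernel: [klen, Ncomp] kernel with +/-1 (or 0/1) integer weights
    returns [NY/klen, Ncomp] compressed numbers, one per segment per channel
    """
    klen, ncomp = kernel.shape
    segments = row.reshape(-1, klen)   # non-overlapping 1D segments
    return segments @ kernel           # weighted sum of each segment

row = np.arange(32)                            # 32 incoming pixel values
kernel = np.random.choice([-1, 1], size=(16, 4))
out = compress_1d(row, kernel)
print(out.shape)  # (2, 4)
```

Because each segment lies within one row, only the current segment of pixels need ever be buffered.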
As described above, each group of raw pixel values may be compressed into an integer, and the raw-bayer frame may be compressed into a plurality of integers. The plurality of integers may be stored or buffered during the compression/decompression process.
In 206, decompression may be performed to determine a decompressed ICS-frame with a decompressing kernel. In some embodiments, the decompression may be performed by applying deconvolution to the plurality of integers of the ICS-frame. In some embodiments, quantization and entropy encoding may be performed after compression, which means that entropy decoding, rescaling and integerizing (rescaling and integerizing correspond to the quantization operation) may be performed before decompression.
It should be noted that the intra-frame compression process is provided only for illustration purpose, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, compressing kernel with other bit depth may also be applied to compress the raw pixel values.
FIG. 3 shows a convolution process as described in 204 according to some embodiments of the present disclosure. As shown in FIG. 3, an array of pixels values with dimension of 4*4 may be compressed into one integer with a compressing kernel. The compressing kernel may also have a dimension of 4*4.
While the intra-frame compression process is already simple, the necessary buffer size can be reduced further, as illustrated in 204. To apply a convolutional-2D kernel of dimension [4, 4, 1] to a patch of pixels of shape [4, 4] , one does not need to put all 16 pixels in the buffer and carry out the elementwise product and summation at once. Instead, one can process the pixels with the proper kernel weight elements row by row as the pixels are read in, and keep the partial output value (array of numbers) in the buffer until a single convolutional operation is finished. After each convolutional operation, the buffered numbers can be written to storage, and the buffer can be cleared.
For an incoming raw-bayer frame of dimension [NX, NY] being processed by a convolutional-2D kernel of dimension [kx, ky, Ncomp] , the necessary buffer size is kx rows of raw-bayer pixels when the method above is carried out.
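The kx-row buffering above can be sketched as a streaming loop. The names compress_streaming and read_row are hypothetical; a real FPGA pipeline would accumulate partial sums per patch rather than stack whole rows, but the buffer bound (kx rows) is the same.

```python
import numpy as np

def compress_streaming(read_row, n_rows, kernel):
    """Compress a raster stream while buffering at most kx rows at a time."""
    kx, ky, ncomp = kernel.shape
    outputs, buf = [], []
    for _ in range(n_rows):
        buf.append(read_row())             # rows arrive one at a time
        if len(buf) == kx:                 # enough rows for one band of patches
            band = np.stack(buf)           # [kx, NY]
            ny = band.shape[1] // ky
            patches = band.reshape(kx, ny, ky).transpose(1, 0, 2)
            outputs.append(np.einsum('pij,ijk->pk', patches, kernel))
            buf.clear()                    # free the buffer for the next band
    return np.stack(outputs)               # [NX/kx, NY/ky, Ncomp]

frame = np.arange(8 * 8).reshape(8, 8).astype(float)
rows = iter(frame)
kernel = np.ones((4, 4, 2))
comp = compress_streaming(lambda: next(rows), 8, kernel)
print(comp.shape)  # (2, 2, 2)
```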
FIG. 4 illustrates an example of intra-frame compression process with frame strategy one as described in 204 according to some embodiments of the present disclosure. As shown in FIG. 4, a frame may be compressed patch by patch. In some embodiments, the frame may be processed by hardware such as an FPGA; the convolutional kernel is applied to each patch of pixels and moves to the next patch with no overlap or gap until the last one. Patch 1 shown in FIG. 4 represents a patch that has already been processed, and patch 2 represents the patch being processed.
FIG. 5 is an example of a compressing kernel during a single-layer convolutional-2D compression process of a raw-bayer frame according to some embodiments of the present disclosure. A single-layer convolutional-2D operation may be implemented to compress an [NX, NY] -pixel raw-bayer frame (FIG. 1 for example) . The compressing kernel is of the shape [kx, ky, Ncomp] = [8, 8, 3] . The compressing kernel is shown in FIG. 5 as three panels; each panel presents the [8, 8, i] part of the compressing kernel (i=0, 1, 2) and represents one channel of the compressing kernel.
FIG. 6 is an integer array of the shape [256, 480, 3] after compression of the raw pixel values shown in FIG. 1 with the compressing kernel shown in FIG. 5. The i-th panel shows the matrix [256, 480, i] , where index i=0, 1, 2.
In some embodiments, the integer array after compression may be reconstructed with a decompressing kernel (or further with a QINN) . Further, demosaicing and other operations such as white-balance tuning, color-mixing tuning and gamma correction may be performed after the reconstruction. Most existing CV analysis needs frames in RGB format, and frames in RGB format are determined by the demosaicing operation (which follows the decompression) ; thus operating power and computation are spent on the compression, the decompression, the demosaicing and the down-sampling (for the reason described in the background, CV-analysis NNs generally have small input XY-plane dimensions) before the CV analysis. The general way to perform CV analysis is to use the existing CV-analysis NNs and input frames after decompression, demosaicing and the other operations. Another way is to design one's own CV-analysis NNs, but that requires a large amount of newly labeled training data to train CV-analysis NNs achieving purposes similar to those already covered on the market, which also has a big cost. In a sense, the decompression may be viewed as an up-sampling operation, while a down-sampling operation is performed on the demosaiced frame before it is input into the NNs for CV analysis. It may therefore be possible to skip both the decompression and the down-sampling to reduce power and computation.
FIG. 7 illustrates an ICS-frame transformation method for CV analysis according to some embodiments of the present disclosure.
In 702, one or more ICS-frames may be read out, wherein each ICS-frame is of dimension [NX/kx, NY/ky, Ncomp] . In some embodiments, the one or more ICS-frames may be determined as described in 204 of FIG. 2, wherein each raw-bayer frame of dimension [NX, NY] may be compressed into an ICS-frame of dimension [NX/kx, NY/ky, Ncomp] with a compressing kernel of dimension [kx, ky, Ncomp] .
In 704, one or more transformed ICS-frames of dimension [NX/kx, NY/ky, 3] may be determined by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames.
In some embodiments, for each ICS-frame, the linear transformation may be summing pixel values at the same XY-plane position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2. A pixel in a transformed ICS-frame may be determined as equation (2) shown below:
RGB_trans (i_x, i_y, j) = sum_i (COMP (i_x, i_y, i) × trans_w (i, j) )   (2)
where j is the RGB-channel index of RGB_trans, and j is from 0 to 2; i is the ICS-channel index, and i is from 0 to Ncomp-1; COMP is the ICS-frame and trans_w is the parameters in the 2D array; i_x is the X-axis index of a pixel, and i_x is from 0 to NX/kx - 1; and i_y is the Y-axis index of a pixel, and i_y is from 0 to NY/ky - 1.
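Equation (2) is a per-pixel channel mixing, so it reduces to a single matrix product over the last axis. The shapes and names below (an 8-channel ICS-frame, trans_w) are illustrative assumptions.

```python
import numpy as np

# Every XY position's Ncomp channel values are mixed into 3 pseudo-RGB
# channels by the [Ncomp, 3] parameter array trans_w (equation (2)).
ncomp = 8
comp = np.random.randint(0, 256, size=(256, 480, ncomp))  # an ICS-frame
trans_w = np.random.rand(ncomp, 3)                        # learned 2D array

# broadcasting matmul: [256, 480, Ncomp] @ [Ncomp, 3] -> [256, 480, 3]
rgb_trans = comp @ trans_w
print(rgb_trans.shape)  # (256, 480, 3)
```

Equivalently, np.einsum('xyi,ij->xyj', comp, trans_w) computes the same transformed ICS-frame.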
An exemplary transformed ICS-frame is shown in FIG. 8. As shown in FIG. 8, the three channels of the transformed ICS-frame represent RGB channels which can be used for CV analysis. Further, an RGB format of the transformed ICS-frame may be determined by stacking the three channels of FIG. 8. In FIG. 9, a CV analysis result is determined by putting the RGB format of the transformed ICS-frame into Yolo_v3 (an existing CV-analysis NN) .
In 706, the one or more transformed ICS-frames may be output to a neural network for CV analysis. In some embodiments, the one or more transformed ICS-frames can be used directly for CV analysis without down-sampling.
It should be noted that NX, NY, kx, ky, NX/kx, NY/ky and Ncomp are positive integers, and Ncomp represents number of ICS-channels of the compressing kernel.
To make each transformed ICS-frame represent its corresponding RGB format well enough for CV analysis, the parameters in the 2D array may be determined based on sample training, which will be described in FIG. 10. FIG. 10 shows an exemplary training method of parameters in the 2D array according to some embodiments of the present disclosure.
In 1002, one or more first sample raw-bayer frames may be read out, wherein each first sample raw-bayer frame is of dimension [NX, NY] .
In 1004, one or more demosaiced first sample raw-bayer frames may be determined by performing demosaicing to each first sample raw-bayer frame, wherein each demosaiced first sample raw-bayer frame is of dimension [NX, NY, 3] .
A raw-bayer frame may be viewed as a tiling of block-matrices of size [2, 2] , with no gap or overlap, wherein each block-matrix contains 1 R, 1 B and 2 G pixels; the number of G pixels is doubled by convention (this convention originates from the fact that human eyes are more sensitive to green than to other colors) . FIG. 11 shows two widely-used formats of block-matrices according to some embodiments of the present disclosure.
There are various demosaicing methods, and a fundamental one is based on spatial interpolation. First, for R (or B) pixels, a sub-frame may be extracted that is purely made of pixels of this type, without changing the spatial relations between them; the size of each of the X and Y dimensions of this sub-frame is half that of the raw-bayer frame. This small single-color sub-frame is then up-sampled using usual interpolation methods to obtain a single-color R (or B) image with the same XY dimensions as the raw-bayer frame. Second, the Gb and Gr pixels are used to replace their nearest R and B pixels, and after a fixed-size interpolation, a single-color green image with the same XY dimensions as the raw-bayer frame is obtained. Third, the three single-color images (red, green and blue) are stacked together as the viewable RGB image with the same XY dimensions as the raw-bayer frame.
In 1006, one or more trans-training-label frames may be determined by performing down-sampling to each demosaiced first sample raw-bayer frame, wherein each trans-training-label frame is of dimension [NX/kx, NY/ky, 3] . In some embodiments, the down-sampling may use a well-established existing down-sampling method, or one may design one's own down-sampling method.
In 1008, one or more first sample ICS-frames may be determined by performing intra-frame compression to each first sample raw-bayer frame with the compressing kernel. In some embodiments, the intra-frame compression may be like the method in FIG. 2.
In 1010, one or more first sample transformed ICS-frames of dimension [NX/kx, NY/ky, 3] may be determined by using initial parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more first sample ICS-frames. The linear transformation may be performed as described in 704.
In 1012, the parameters in the 2D array may be determined by tuning the initial parameters in the 2D array to minimize total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames.
Each first sample transformed ICS-frame corresponds to a trans-training-label frame, and training loss of a first sample transformed ICS-frame is mean square difference between the first sample transformed ICS-frame and its corresponding trans-training-label frame, and the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames is the sum of the individual training loss values.
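Because the transform in 1010 is linear and the training loss in 1012 is mean-square, the tuning in this sketch has a closed-form least-squares solution; the gradient-based tuning described above converges to the same minimizer. The synthetic data and names (ics, true_w) are illustrative assumptions.

```python
import numpy as np

# Pixels of the first sample ICS-frames, flattened over the XY plane, form
# the design matrix; the trans-training-label pixels form the targets.
ncomp = 8
ics = np.random.rand(100, ncomp)        # 100 sample ICS pixels
true_w = np.random.rand(ncomp, 3)
labels = ics @ true_w                   # synthetic trans-training-label pixels

# Minimizing total mean-square training loss over the [Ncomp, 3] array:
w_fit, *_ = np.linalg.lstsq(ics, labels, rcond=None)
print(np.allclose(w_fit, true_w))  # True
```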
The description in FIG. 10 is presented to enable any person skilled in the art to make and use the present disclosure. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. For example, the order of the process in FIG. 10 may be changed; specifically, steps 1004 and 1006 may be performed after step 1008 or step 1010.
In some embodiments, the compression kernel may be determined based on sample training. FIG. 12 illustrates an exemplary pre-training method of the compressing kernel according to some embodiments of the present disclosure.
In 1202, one or more second sample raw-bayer frames may be read out, wherein each second sample raw-bayer frame is of dimension [NX, NY] . In some embodiments, the training samples in FIG. 12 and the training samples in FIG. 10 may overlap with each other. For example, at least one first sample raw-bayer frame among the one or more first sample raw-bayer frames can be found in the one or more second sample raw-bayer frames; as another example, the one or more first sample raw-bayer frames may be the same as the one or more second sample raw-bayer frames. In some embodiments, the training samples in FIG. 12 and the training samples in FIG. 10 may not overlap with each other; that is, none of the one or more first sample raw-bayer frames can be found in the one or more second sample raw-bayer frames.
In 1204, for each second sample raw-bayer frame, a corresponding ICS-training-label frame may be determined by performing linear transformations and combinations of the R, B, Gb and Gr pixels in the second sample raw-bayer frame, wherein each ICS-training-label frame is of dimension [NX′, NY′, Nlabel] , and NX, NY, NX′, NY′and Nlabel are positive integers.
It should be noted that the operation on the second sample raw-bayer frame (linear transformations and combinations of the R, B, Gb and Gr pixels) is a generalization of demosaicing. In some cases, demosaicing may be used for determining the one or more ICS-training-label frames. For example, for a raw-bayer frame, a corresponding ICS-training-label frame may be determined with the following steps: first, R, B, Gr and Gb pixels may be extracted from the raw-bayer frame as 2D arrays, each of which is of dimension [NX/2, NY/2] ; second, Gr and Gb may be combined as G= (Gr+Gb) /2 of dimension [NX/2, NY/2] ; third, a usual RGB-to-YUV linear transformation may be applied to the stacked RGB 2D arrays of dimension [NX/2, NY/2, 3] , and the transformed array of dimension [NX/2, NY/2, 3] is the corresponding ICS-training-label frame. In this case, the dimension of the decompressing kernel may be [kx/2, ky/2, Ncomp, 3] .
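The three label-construction steps above can be sketched as follows, assuming an RGGB layout. The BT.601 RGB-to-YUV matrix is shown only as one common choice of "usual" transformation; the disclosure does not fix particular coefficients.

```python
import numpy as np

# Example RGB-to-YUV coefficients (BT.601-style; an assumption, not mandated)
rgb_to_yuv = np.array([[ 0.299,  0.587,  0.114],
                       [-0.147, -0.289,  0.436],
                       [ 0.615, -0.515, -0.100]])

def ics_training_label(bayer):
    """Steps 1-3 of 1204: extract color planes, merge greens, RGB->YUV."""
    r  = bayer[0::2, 0::2]                   # each plane is [NX/2, NY/2]
    gr = bayer[0::2, 1::2]
    gb = bayer[1::2, 0::2]
    b  = bayer[1::2, 1::2]
    g = (gr + gb) / 2.0                      # combine Gr and Gb
    rgb = np.stack([r, g, b], axis=-1)       # [NX/2, NY/2, 3]
    return rgb @ rgb_to_yuv.T                # [NX/2, NY/2, 3] label frame

label = ics_training_label(np.random.rand(8, 8))
print(label.shape)  # (4, 4, 3)
```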
In 1206, one or more second sample ICS-frames of dimension [NX/kx, NY/ky, Ncomp] may be determined by performing intra-frame compression to each second sample raw-bayer frame with an initial compressing kernel, wherein the initial compressing kernel is of dimension [kx, ky, Ncomp] .
In 1208, one or more second sample decompressed ICS-frames of dimension [NX′, NY′, Nlabel] may be determined by performing decompression to each second sample ICS-frame with an initial decompressing kernel of dimension [kx′, ky′, Ncomp, Nlabel] , wherein kx′=NX′*kx/NX, ky′=NY′*ky/NY. In some embodiments, the decompression may be a deconvolution operation with the decompressing kernel. A decompressing kernel of dimension [kx′, ky′, Ncomp, Nlabel] yields one or more second sample decompressed ICS-frames with the same dimension ( [NX′, NY′, Nlabel] ) as the one or more ICS-training-label frames.
In 1210, the compressing kernel may be determined by training the initial compressing kernel based on the one or more ICS-training-label frames. The training process may be described in FIG. 13.
The description in FIG. 12 is presented to enable any person skilled in the art to make and use the present disclosure. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. For example, the order of the process in FIG. 12 may be changed; specifically, step 1204 may be performed after step 1206 or step 1208.
FIG. 13 illustrates an exemplary training method of the compressing kernel according to some embodiments of the present disclosure.
In 1302, a floating-number compressing kernel may be determined by tuning parameters in the initial compressing kernel based on machine learning to minimize the total quality loss between the one or more second sample decompressed ICS-frames and the one or more ICS-training-label frames. Each second sample decompressed ICS-frame corresponds to an ICS-training-label frame; the quality loss of a second sample decompressed ICS-frame is the mean square difference between it and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample decompressed ICS-frames and the corresponding one or more ICS-training-label frames is the sum of the individual quality loss values.
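The loss that step 1302 minimizes can be written down directly: per-frame mean square difference, summed over all training frames.

```python
import numpy as np

def total_quality_loss(decompressed_frames, label_frames):
    """Sum of per-frame mean-square differences, as defined in step 1302."""
    return sum(np.mean((d - l) ** 2)
               for d, l in zip(decompressed_frames, label_frames))

labels  = [np.zeros((4, 4, 3)), np.zeros((4, 4, 3))]
decoded = [np.ones((4, 4, 3)), 2 * np.ones((4, 4, 3))]
print(total_quality_loss(decoded, labels))  # 1.0 + 4.0 = 5.0
```

Any gradient-based optimizer can then tune the compressing-kernel parameters against this scalar; the text does not specify which optimizer is used.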
In 1304, the compressing kernel may be determined by integerizing parameters in the floating-number compressing kernel.
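One plausible reading of "integerizing" in step 1304 is fixed-point quantization: scale the trained floating-point kernel by a power of two and round, so the hardware convolution can run in integer arithmetic. The scale factor and scheme below are assumptions; the text does not specify the quantization method.

```python
import numpy as np

def integerize(kernel, shift=8):
    """Fixed-point quantization sketch: scale by 2**shift and round (assumed scheme)."""
    return np.round(kernel * (1 << shift)).astype(np.int32)

float_kernel = np.array([[0.25, -0.5], [0.125, 1.0]])
print(integerize(float_kernel))  # int32 values 64, -128, 32, 256
```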
A decompressing kernel may be determined based on further training as shown in FIG. 14. FIG. 14 is an exemplary training method for a decompressing  kernel according to some embodiments of the present disclosure.
In 1402, one or more second sample intermediate ICS-frames may be determined by performing intra-frame compression to each second sample raw-bayer frame with the compressing kernel. The compressing kernel is derived from the step 1304.
In 1404, one or more second sample intermediate decompressed ICS-frames may be determined by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel. In 1406, the decompressing kernel may be determined by tuning parameters in the initial decompressing kernel based on machine learning to minimize the total quality loss between the one or more second sample intermediate decompressed ICS-frames and the corresponding one or more ICS-training-label frames.
Each second sample intermediate decompressed ICS-frame corresponds to an ICS-training-label frame; the quality loss of a second sample intermediate decompressed ICS-frame is the mean square difference between it and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample intermediate decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
In some embodiments, a neural network for quality improvements (QINN) may be applied for reducing quality loss during compressing and decompression, and FIG. 15 shows another training method of the decompressing kernel according to some embodiments of the present disclosure.
In 1502, one or more second sample intermediate ICS-frames may be determined by performing intra-frame compression to each second sample raw-bayer frame with the compressing kernel. The compressing kernel is derived from step 1304.
In 1504, one or more second sample intermediate decompressed ICS-frames may be determined by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel. In 1506, one or more second sample reconstructed frames may be determined by inputting each second sample intermediate decompressed ICS-frame into an initial QINN. In 1508, the decompressing kernel and a QINN may be determined by tuning parameters in the initial decompressing kernel and the initial QINN based on machine learning to minimize the total quality loss between the one or more second sample reconstructed frames and the corresponding one or more ICS-training-label frames.
Each second sample reconstructed frame corresponds to an ICS-training-label frame; the quality loss of a second sample reconstructed frame is the mean square difference between it and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample reconstructed frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
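As a toy illustration of the QINN refinement in FIG. 15, the "network" below is a single 1x1 channel-mixing layer with a residual connection. This is purely illustrative; the actual QINN architecture is not specified in the text, and the mixing weights here are hypothetical stand-ins for trained parameters.

```python
import numpy as np

def qinn_refine(frame, mix):
    """Residual refinement sketch. frame: [H, W, C]; mix: [C, C] assumed
    learned 1x1 channel-mixing weights. Output has the same shape as frame."""
    return frame + np.einsum('ijc,cd->ijd', frame, mix)

frame = np.random.rand(4, 4, 3)          # a decompressed ICS-frame
mix = 0.01 * np.random.rand(3, 3)        # placeholder for trained weights
print(qinn_refine(frame, mix).shape)  # (4, 4, 3)
```

In step 1508 the parameters of such a correction network would be tuned jointly with the decompressing kernel against the total quality loss.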
FIG. 16 illustrates an ICS-frame transformation apparatus for CV analysis. As shown in FIG. 16, the apparatus may include a reading out module 1610, a processor 1620 and an output port 1630.
The reading out module 1610 may be configured to read out one or more raw-bayer frames, and each raw-bayer frame is of dimension [NX, NY] . The processor 1620 may be configured to determine one or more transformed ICS-frames by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames. The output port 1630 may be configured to output the one or more transformed ICS-frames to a neural network for CV analysis.
In some embodiments, the one or more ICS-frames are determined by performing intra-frame compression to one or more raw-bayer frames with a compressing kernel, wherein the one or more raw-bayer frames are captured by one or more camera heads. In some embodiments, each raw-bayer frame is of dimension [NX, NY] and the compressing kernel is of dimension [kx, ky, Ncomp] . NX, NY, kx, ky, NX/kx, NY/ky and Ncomp are positive integers, and Ncomp represents number of ICS-channels of the compressing kernel.
In some embodiments, for each ICS-frame, the processor determines a corresponding transformed ICS-frame by summing pixel values at the same XY-plane  position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2.
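The channel-weighted summation described above amounts to a per-pixel matrix multiply by the [Ncomp, 3] array, sketched here with placeholder weights (in the method the weights come from the training of FIG. 10):

```python
import numpy as np

def transform_ics(ics, weights):
    """ics: [NX/kx, NY/ky, Ncomp]; weights: [Ncomp, 3].
    Output channel j is the weighted sum of the Ncomp channel values at each
    XY position, using column j of the weight array (j = 0, 1, 2)."""
    return np.einsum('ijc,cl->ijl', ics, weights)

ics = np.random.rand(4, 4, 8)       # Ncomp = 8
weights = np.random.rand(8, 3)      # placeholder for trained 2D-array parameters
out = transform_ics(ics, weights)
print(out.shape)  # (4, 4, 3)
```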
In some embodiments, the parameters in the 2D array may be determined based on sample training, and the sample training process may be the same as that described in FIG. 10. In some embodiments, the intra-frame compression may be the same as that described in FIGs. 1-6; the compressing kernel is also determined based on sample training, and the sample training process may be the same as that described in FIGs. 12-15.
Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of this disclosure.
Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment, ” “an embodiment, ” and/or “some embodiments” mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment” or “one embodiment” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.
Further, it will be appreciated by one skilled in the art, aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or context including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely hardware,  entirely software (including firmware, resident software, micro-code, etc. ) or combining software and hardware implementation that may all generally be referred to herein as a “block, ” “module, ” “engine, ” “unit, ” “component, ” or “system” . Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefore, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software only solution, e.g., an installation on an existing processing device or mobile device.
Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the various inventive embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, inventive embodiments lie in less than all features of a single foregoing disclosed embodiment.

Claims (20)

  1. An ICS-frame transformation method for CV analysis, comprising:
    reading out one or more ICS-frames of dimension [NX/kx, NY/ky, Ncomp] ;
    determining one or more transformed ICS-frames of dimension [NX/kx, NY/ky, 3] by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames;
    outputting the one or more transformed ICS-frames to a neural network for CV analysis;
    wherein the one or more ICS-frames are determined by performing intra-frame compression to one or more raw-bayer frames with a compressing kernel, wherein the one or more raw-bayer frames are captured by one or more camera heads;
    wherein each raw-bayer frame is of dimension [NX, NY] and the compressing kernel is of dimension [kx, ky, Ncomp] ;
    wherein NX, NY, kx, ky, NX/kx, NY/ky and Ncomp are positive integers, and Ncomp represents number of ICS-channels of the compressing kernel.
  2. The method according to claim 1, wherein determining one or more transformed ICS-frames of dimension [NX/kx, NY/ky, 3] by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames comprises:
    for each ICS-frame, summing pixel values at the same XY-plane position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2.
  3. The method according to claim 1 or 2, wherein the parameters in the 2D array are determined based on sample training, and the sample training comprises:
    reading out one or more first sample raw-bayer frames, wherein each first sample raw-bayer frame is of dimension [NX, NY] ;
    determining one or more demosaiced first sample raw-bayer frames by performing demosaicing to each first sample raw-bayer frame, wherein each  demosaiced first sample raw-bayer frame is of dimension [NX, NY, 3] ;
    determining one or more trans-training-label frames by performing down-sampling to each demosaiced first sample raw-bayer frame, wherein each trans-training-label frame is of dimension [NX/kx, NY/ky, 3] ;
    determining one or more first sample ICS-frames by performing intra-frame compression to each first sample raw-bayer frame with the compressing kernel;
    determining one or more first sample transformed ICS-frames of dimension [NX/kx, NY/ky, 3] by using initial parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more first sample ICS-frames;
    determining the parameters in the 2D array by tuning the initial parameters in the 2D array to minimize total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames.
  4. The method according to claim 3, wherein each first sample transformed ICS-frame corresponds to a trans-training-label frame, and training loss of a first sample transformed ICS-frame is mean square difference between the first sample transformed ICS-frame and its corresponding trans-training-label frame, and the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames is the sum of the individual training loss values.
  5. The method according to claim 1, wherein the intra-frame compression comprises:
    for each raw-bayer frame, compressing each group of pixel values in the raw-bayer frame into an integer with the compressing kernel, wherein pixels in each raw-bayer frame are divided into multi-groups and each group of pixels corresponds to a 2D or a 1D raw pixels array of the raw-bayer frame.
  6. The method according to claim 1, 3, or 5, wherein the compressing kernel is determined based on sample training, and the sample training comprises:
    reading out one or more second sample raw-bayer frames, wherein each second sample raw-bayer frame is of dimension [NX, NY] ;
    for each second sample raw-bayer frame, determining a corresponding ICS-training-label frame by performing linear transformations and combinations to the R, B, Gb and Gr pixels in the second sample raw-bayer frame, wherein each ICS-training-label frame is of dimension [NX′, NY′, Nlabel] ;
    determining one or more second sample ICS-frames of dimension [NX/kx, NY/ky, Ncomp] by performing intra-frame compression to each second sample raw-bayer frame with an initial compressing kernel, wherein the initial compressing kernel is of dimension [kx, ky, Ncomp] ;
    determining one or more second sample decompressed ICS-frames of dimension [NX′, NY′, Nlabel] by performing decompression to each second sample ICS-frame with an initial decompressing kernel of dimension [kx′, ky′, Ncomp, Nlabel] , wherein kx′=NX′*kx/NX, ky′=NY′*ky/NY;
    determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames;
    wherein NX′, NY′, kx′, ky′, and Nlabel are positive integers.
  7. The method according to claim 6, wherein determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames comprises:
    determining a floating-number compressing kernel by tuning parameters in the initial compressing kernel based on machine learning to minimize total quality loss between the one or more second sample decompressed ICS-frames and the corresponding one or more ICS-training-label frames;
    determining the compressing kernel by integerizing parameters in the floating-number compressing kernel.
  8. The method according to claim 7, wherein each second sample decompressed  ICS-frame corresponds to an ICS-training-label frame, and quality loss of a second sample decompressed ICS-frame is mean square difference between the second sample decompressed ICS-frame and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
  9. The method according to claim 7, wherein the method further comprises:
    determining one or more second sample intermediate ICS-frames by performing intra-frame compression to each second sample raw-bayer frame with the compressing kernel;
    determining one or more second sample intermediate decompressed ICS-frames by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel;
    determining the decompressing kernel by tuning parameters in the initial decompressing kernel based on machine learning to minimize total quality loss between the one or more second sample intermediate decompressed frames and the corresponding one or more ICS-training-label frames.
  10. The method according to claim 9, wherein each second sample intermediate decompressed ICS-frame corresponds to an ICS-training-label frame, and quality loss of a second sample intermediate decompressed ICS-frame is mean square difference between the second sample intermediate decompressed ICS-frame and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample intermediate decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
  11. The method according to claim 7, wherein the method further comprises:
    determining one or more second sample intermediate ICS-frames by performing intra-frame compression to each second sample raw-bayer frame with the compressing  kernel;
    determining one or more second sample intermediate decompressed ICS-frames by performing decompression to each second sample intermediate ICS-frame with the initial decompressing kernel;
    determining one or more second sample reconstructed frames by inputting each second sample intermediate decompressed ICS-frame into an initial QINN;
    determining the decompressing kernel and a QINN by tuning parameters in the initial decompressing kernel and the initial QINN based on machine learning to minimize total quality loss between the one or more second sample reconstructed frames and the one or more ICS-training-label frames.
  12. The method according to claim 11, wherein each second sample reconstructed frame corresponds to an ICS-training-label frame, and quality loss of a second sample reconstructed ICS-frame is mean square difference between the second sample reconstructed ICS-frame and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample reconstructed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
  13. An ICS-frame transformation apparatus for CV analysis, comprising:
    a reading out module, wherein the reading out module is configured to read out one or more raw-bayer frames, and each raw-bayer frame is of dimension [NX, NY] ;
    a processor, wherein the processor is configured to determine one or more transformed ICS-frames by using parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more ICS-frames;
    an output port, wherein the output port is configured to output the one or more transformed ICS-frames to a neural network for CV analysis;
    wherein the one or more ICS-frames are determined by performing intra-frame compression to one or more raw-bayer frames with a compressing kernel, wherein the one or more raw-bayer frames are captured by one or more camera heads;
    wherein each raw-bayer frame is of dimension [NX, NY] and the compressing kernel is of dimension [kx, ky, Ncomp] ;
    wherein NX, NY, kx, ky, NX/kx, NY/ky and Ncomp are positive integers, and Ncomp represents number of ICS-channels of the compressing kernel.
  14. The apparatus according to claim 13, wherein for each ICS-frame, the processor determines a corresponding transformed ICS-frame by summing pixel values at the same XY-plane position in Ncomp ICS-channels with weighting factors in three 1D vectors [Ncomp, j] of the 2D array, wherein j is 0, 1 and 2.
  15. The apparatus according to claim 13 or 14, wherein the parameters in the 2D array are determined based on sample training, and the sample training comprises:
    reading out one or more first sample raw-bayer frames, wherein each first sample raw-bayer frame is of dimension [NX, NY] ;
    determining one or more demosaiced first sample raw-bayer frames by performing demosaicing to each first sample raw-bayer frame, wherein each demosaiced first sample raw-bayer frame is of dimension [NX, NY, 3] ;
    determining one or more trans-training-label frames by performing down-sampling to each demosaiced first sample raw-bayer frame, wherein each trans-training-label frame is of dimension [NX/kx, NY/ky, 3] ;
    determining one or more first sample ICS-frames by performing intra-frame compression to each first sample raw-bayer frame with the compressing kernel;
    determining one or more first sample transformed ICS-frames of dimension [NX/kx, NY/ky, 3] by using initial parameters in a 2D array of dimension [Ncomp, 3] to linearly transform the one or more first sample ICS-frames;
    determining the parameters in the 2D array by tuning the initial parameters in the 2D array to minimize total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames.
  16. The apparatus according to claim 15, wherein each first sample transformed ICS-frame corresponds to a trans-training-label frame, and training loss of a first sample transformed ICS-frame is mean square difference between the first sample transformed ICS-frame and its corresponding trans-training-label frame, and the total training loss between the one or more first sample transformed ICS-frames and the corresponding one or more trans-training-label frames is the sum of the individual training loss values.
  17. The apparatus according to claim 15, wherein the intra-frame compression comprises:
    for each raw-bayer frame, compressing each group of pixel values in the raw-bayer frame into an integer with the compressing kernel, wherein pixels in each raw-bayer frame are divided into multi-groups and each group of pixels corresponds to a 2D or a 1D raw pixels array of the raw-bayer frame.
  18. The apparatus according to claim 13, 15 or 17, wherein the compressing kernel is determined based on sample training, and the sample training comprises:
    reading out one or more second sample raw-bayer frames, wherein each second sample raw-bayer frame is of dimension [NX, NY] ;
    for each second sample raw-bayer frame, determining a corresponding ICS-training-label frame by performing linear transformations and combinations to the R, B, Gb and Gr pixels in the second sample raw-bayer frame, wherein each ICS-training-label frame is of dimension [NX′, NY′, Nlabel] ;
    determining one or more second sample ICS-frames of dimension [NX/kx, NY/ky, Ncomp] by performing intra-frame compression to each second sample raw-bayer frame with an initial compressing kernel, wherein the initial compressing kernel is of dimension [kx, ky, Ncomp] ;
    determining one or more second sample decompressed ICS-frames of dimension [NX′, NY′, Nlabel] by performing decompression to each second sample ICS-frame with an initial decompressing kernel of dimension [kx′, ky′, Ncomp, Nlabel] ,  wherein kx′=NX′*kx/NX, ky′=NY′*ky/NY;
    determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames;
    wherein NX′, NY′, kx′, ky′, and Nlabel are positive integers.
  19. The apparatus according to claim 18, wherein determining the compressing kernel by training the initial compressing kernel based on the one or more ICS-training-label frames comprises:
    determining a floating-number compressing kernel by tuning parameters in the initial compressing kernel based on machine learning to minimize total quality loss between the one or more second sample decompressed ICS-frames and the corresponding one or more ICS-training-label frames;
    determining the compressing kernel by integerizing parameters in the floating-number compressing kernel.
  20. The apparatus according to claim 19, wherein each second sample decompressed ICS-frame corresponds to an ICS-training-label frame, and quality loss of a second sample decompressed ICS-frame is mean square difference between the second sample decompressed ICS-frame and its corresponding ICS-training-label frame, and the total quality loss between the one or more second sample decompressed ICS-frames and the one or more ICS-training-label frames is the sum of the individual quality loss values.
PCT/CN2019/120031 2019-11-21 2019-11-21 Ics-frame transformation method and apparatus for cv analysis WO2021097771A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980066175.3A CN113170160B (en) 2019-11-21 2019-11-21 ICS frame transformation method and device for computer vision analysis
PCT/CN2019/120031 WO2021097771A1 (en) 2019-11-21 2019-11-21 Ics-frame transformation method and apparatus for cv analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/120031 WO2021097771A1 (en) 2019-11-21 2019-11-21 Ics-frame transformation method and apparatus for cv analysis

Publications (1)

Publication Number Publication Date
WO2021097771A1 true WO2021097771A1 (en) 2021-05-27

Family

ID=75980323

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/120031 WO2021097771A1 (en) 2019-11-21 2019-11-21 Ics-frame transformation method and apparatus for cv analysis

Country Status (2)

Country Link
CN (1) CN113170160B (en)
WO (1) WO2021097771A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060133502A1 (en) * 2004-11-30 2006-06-22 Yung-Lyul Lee Image down-sampling transcoding method and device
CN104519361A (en) * 2014-12-12 2015-04-15 天津大学 Video steganography analysis method based on space-time domain local binary pattern
CN105791854A (en) * 2016-03-09 2016-07-20 中国人民武装警察部队工程大学 Singular value modification video steganographic algorithm based on combination with improved matrix coding
CN106101713A (en) * 2016-07-06 2016-11-09 武汉大学 A kind of video steganalysis method based on the calibration of window optimum
CN107197297A (en) * 2017-06-14 2017-09-22 中国科学院信息工程研究所 A kind of video steganalysis method of the detection based on DCT coefficient steganography
CN109635791A (en) * 2019-01-28 2019-04-16 深圳大学 A kind of video evidence collecting method based on deep learning
US20190313114A1 (en) * 2018-04-06 2019-10-10 Qatar University System of video steganalysis and a method of using the same
CN110457996A (en) * 2019-06-26 2019-11-15 广东外语外贸大学南国商学院 Moving Objects in Video Sequences based on VGG-11 convolutional neural networks distorts evidence collecting method

Also Published As

Publication number Publication date
CN113170160A (en) 2021-07-23
CN113170160B (en) 2022-06-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19953461

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19953461

Country of ref document: EP

Kind code of ref document: A1