GB2563627A - Image processing - Google Patents

Image processing

Info

Publication number
GB2563627A
GB2563627A
Authority
GB
United Kingdom
Prior art keywords
pixel
window
pixels
image
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1709889.8A
Other versions
GB201709889D0 (en)
Inventor
Pawlik Bartek
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB1709889.8A priority Critical patent/GB2563627A/en
Publication of GB201709889D0 publication Critical patent/GB201709889D0/en
Publication of GB2563627A publication Critical patent/GB2563627A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/70: Denoising; Smoothing

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)

Abstract

This invention relates to a method, apparatus and computer readable medium in the field of image processing and, more specifically, denoising. The method comprises receiving a plurality of pixelated two-dimensional images and identifying 6.2, in each image, a group of pixels representing the same, or similar, content. A denoising algorithm 6.3 is applied to pixels in the identified groups to derive a combined denoised pixel estimate for each pixel position. This joint pixel estimate is then used to replace 6.4 the original pixel in each image of the set. The denoising algorithm may use weighted averages for denoised pixel evaluation or block-matching. It may also be a block matching and three-dimensional filtering (BM3D) or non-local means (NLM) algorithm. The images may be in colour, with the array being represented as an RGB array (Fig. 2), and could be received via a Bayer imaging array. The pixel identification step may comprise a window-based search (85, Fig. 10).

Description

Image Processing
Field of the Invention
This invention relates to methods and systems for image processing, particularly methods and systems for noise reduction in images.
Background
Digital images often comprise noise that was not present in the object or scene imaged. The noise may be random noise exhibited as a variation of brightness or colour information. The image noise may be produced by the sensor and circuitry of the imaging equipment, for example the sensor of a camera or a digital scanner. Image noise is an undesirable by-product of image capture that adds spurious and extraneous information. Types of image noise include Gaussian noise, salt-and-pepper noise, shot noise and quantization noise.
Algorithms for reducing image noise (or denoising) are known. The algorithms typically involve determining whether differences in pixel values constitute noise or real photographic detail, and average out the former whilst attempting to preserve the latter.
One such algorithm is the so-called Non-Local Means (NLM) denoising algorithm. The NLM algorithm takes the mean value of all pixels in the image, weighted by how similar these pixels are to the target pixel.
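By way of non-limiting illustration, the following Python/NumPy sketch computes a single-pixel NLM estimate. It is a minimal, unoptimised reading of the description above: the function and parameter names (patch, search, h) are illustrative rather than taken from this document, and for tractability the weighted average is taken over a local search region rather than literally every pixel in the image.

    import numpy as np

    def nlm_pixel(image, y, x, patch=3, search=10, h=10.0):
        """Non-Local Means estimate of pixel (y, x): a weighted average of
        nearby pixels, each weight decaying with the squared difference
        between the patch around that pixel and the patch around (y, x)."""
        r = patch // 2
        pad = np.pad(image.astype(np.float64), r, mode='reflect')
        ref = pad[y:y + patch, x:x + patch]           # patch centred on target
        num = den = 0.0
        for j in range(max(y - search, 0), min(y + search + 1, image.shape[0])):
            for i in range(max(x - search, 0), min(x + search + 1, image.shape[1])):
                cand = pad[j:j + patch, i:i + patch]  # patch centred on (j, i)
                w = np.exp(-np.sum((ref - cand) ** 2) / (h * h))
                den += w
                num += w * float(image[j, i])
        return num / den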
Denoising may also be extended in the temporal domain, i.e. video denoising. This may involve denoising each frame of video individually (spatial video denoising), denoising between frames (temporal video denoising) or a combination of both (spatial-temporal video denoising, or 3D denoising). The so-called block matching and three-dimensional filtering (BM3D) algorithm is an example of a 3D denoising algorithm.
Denoising algorithms are not necessarily perfect, in that not all noise may be removed from images. References herein to denoising are therefore intended to cover noise reduction as well as removal.
Summary of the Invention
A first aspect of the invention provides a method comprising: receiving a plurality of images comprising a two-dimensional array of pixels; identifying in each received image a group of pixels representing the same or similar content; applying a denoising algorithm to pixels in the pixel groups to derive a collaboratively denoised pixel estimate for each pixel position in the identified group; and using the collaborative pixel estimate for corresponding pixel positions in each received image.
The denoising algorithm may be applied to each pixel in the pixel groups and the resulting denoised pixel estimates for each corresponding pixel position may be combined to derive the collaboratively denoised pixel estimate for each said pixel position.
Combining the resulting denoised pixel estimates may comprise using the weighted average of the resulting denoised pixel estimates.
The denoising algorithm may be a collaborative denoising algorithm, applied to corresponding blocks of pixels in the pixel groups.
The collaborative denoising algorithm may be a block matching and three-dimensional filtering (BM3D) algorithm.
The denoising algorithm may be a non-local means (NLM) algorithm.
The images may be colour images, each pixel in the pixel array being represented as either a red (R), green (G) or blue (B) colour component.
Each pixel in the pixel array may be non-interpolated.
The colour images may be received via a Bayer colour filter array.
The identifying step may comprise performing a window-based search, e.g. a sliding window search.
The window-based search may be performed by correlating different positions of an n x m window with a fixed n x m reference window.
The different positions of the n x m window may be such that the spatial arrangement of RGB colour components in each position corresponds with that of the reference window.
The window-based search may be performed within a larger, a x b search window, the a x b search window being such that the spatial arrangement of RGB colour components is identical for each image.
The n x m reference window may be substantially at the centre of the a x b search window.
The a x b search window may be moved iteratively over different corresponding portions of each image and the window-based search repeated for each different portion.
The window-based search may be performed using the L1-norm and/or L2-norm algorithm.
The plurality of colour images may represent substantially simultaneously-captured content.
The method may further comprise receiving the plurality of colour images from separate image sources having a predetermined spatial relationship to one another such that overlapping parts of the images can be identified.
The a x b search window may be positioned such that it is within the overlapping parts.
The method may further comprise transforming the received colour images by rotation and/or warping.
The method may be performed on a single integrated circuit.
The method may be performed on an FPGA.

A second aspect of the invention provides a computer program comprising instructions that when executed by a computer control it to perform the method of any preceding definition.

A third aspect of the invention provides a non-transitory computer-readable medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method comprising: receiving a plurality of colour images comprising a two-dimensional array of pixels; identifying in each received image a group of pixels representing the same or similar content; applying a denoising algorithm to pixels in the pixel groups to derive a collaboratively denoised pixel estimate for each pixel position in the identified group; and using the collaborative pixel estimate for corresponding pixel positions in each received image.

A fourth aspect of the invention provides an apparatus, the apparatus having at least one processor and at least one memory having computer-readable code stored thereon which when executed controls the at least one processor: to receive a plurality of colour images comprising a two-dimensional array of pixels; to identify in each received image a group of pixels representing the same or similar content; to apply a denoising algorithm to pixels in the pixel groups to derive a collaboratively denoised pixel estimate for each pixel position in the identified group; and to use the collaborative pixel estimate for corresponding pixel positions in each received image.
Brief Description of the Drawings
The invention will now be described, by way of non-limiting example, with reference to the drawings, in which:
Figure 1 is a schematic diagram of components of a digital camera according to embodiments of the invention;
Figure 2 is a partial view of a colour filter array overlaid on a sensor of the Figure 1 digital camera;
Figures 3a - 3c are schematic views of different colour patterns received at photo sensors of the Figure 1 sensor;
Figure 4 is a perspective view of a system comprising a plurality of cameras and an image processing system, according to embodiments;
Figure 5 is a schematic diagram of components of the Figure 4 image processing system, according to embodiments;
Figure 6 is a flow diagram showing processing steps for performing a denoising method according to embodiments;
Figure 7 is a flow diagram showing detailed processing steps for performing an identification step of the Figure 6 method;
Figures 8a - 8c are graphical representations of images from respective cameras in which non-overlapping areas are identified;
Figures 9a - 9c are graphical representations of the Figure 8 images after cropping;
Figures 10a - 10c are graphical representations of the Figure 9 images with a search window defined;
Figure 11 is a graphical representation of one of the Figure 10 images to illustrate alignment with a particular Bayer pixel colour;
Figures 12a - 12c are graphical representations of the Figure 9 images with reference windows and candidate windows shown;
Figure 13 is a schematic diagram which is useful for understanding embodiments of the invention when employing a 3D denoising algorithm; and
Figure 14 is a graphical representation of a method of extending the method into the temporal domain, according to some embodiments.
Detailed Description of Preferred Embodiments
Embodiments herein relate to methods and systems for denoising images.
References herein to denoising are intended to cover noise reduction as well as removal.
Embodiments particularly relate to denoising a colour image. The general approach outlined below is, however, applicable to all image types, including black-and-white images.
For example, the colour image may be received from a digital camera, which can be of any suitable type, including a still image camera, a video camera and/or a multi-camera system such as Nokia’s OZO camera.
The camera may be a dedicated camera, fixed or portable, or may be provided as part of a mobile device such as a mobile telephone, laptop or tablet computer. The camera may also form part of more dedicated equipment, such as medical imaging equipment or thermal imaging devices.
Cameras typically comprise a solid-state sensor for receiving light on a two-dimensional surface after the light passes through a lens. The sensor may, for example, be a charge-coupled device (CCD) sensor or a Complementary Metal Oxide Semiconductor (CMOS) sensor, or a hybrid of both. The sensor converts received light into electrical signals that convey information on the light at respective spatial positions of the sensor’s two-dimensional surface, each spatial position corresponding to a picture element, or pixel, of the captured image.
In some embodiments, a colour-separation filter may be provided in front of the sensor. This is because a digitised colour image typically comprises three channels per pixel, each carrying information of a different light wavelength band, e.g. red, green and blue. Rather than using three separate sensors, each preceded by a different colour filter to capture the information for the respective bands, most cameras employ a single sensor with an overlaid colour filter array (CFA). This is more efficient in terms of size and cost. It follows that only one colour is captured at each spatial position. The data produced at the sensor is sometimes referred to as RAW image data.
The resulting image data, e.g. the RAW data, may comprise noise not present in the object or scene that is imaged. The noise may be random noise exhibited as a variation of brightness or colour information and usually results from electronic noise. The image noise may be produced by the sensor and circuitry of the imaging equipment, for example the sensor of a camera or a digital scanner. Image noise is an undesirable by-product of image capture that adds spurious and extraneous information. Types of image noise include Gaussian noise, salt-and-pepper noise, shot noise and quantization noise. Embodiments to be described below relate to methods and systems for denoising, i.e. reducing, noise in digital images.

A process of reconstructing a full colour image from the RAW image data is referred to as demosaicing. In essence, demosaicing involves deriving a full set of colour values for each spatial position from the spatially under-sampled colour channels. Demosaicing typically involves interpolation.

A commonly-used CFA is the Bayer filter. The Bayer filter employs alternating red and green filters for odd rows and alternating green and blue filters for even rows. There are therefore twice as many green filters as red or blue filters, since human vision is more sensitive to green light. Each pixel location of the sensor is behind a particular filter, and hence the RAW output is an array of pixel values, each indicating the intensity of one of the filter colours.
Preferably, the denoising algorithms to be described below are performed on the non-interpolated image data, for example the RAW data received via a CFA. This is found to be efficient because the noise model can be more accurately estimated as a Poisson-Gaussian distribution and because the amount of data to be processed is smaller. Embodiments herein therefore relate to image processing on the RAW output from an image sensor of a digital camera, which may be of any form, and for any purpose, as outlined above.
Referring to Figure 1, a digital camera 1 comprises a lens 3, a CFA 5 and an image sensor 7 disposed behind the CFA relative to the direction of received light (indicated by the arrow 9). In some embodiments, the lens 3 may not be required or may be provided by an external lens system detachably mounted to the camera 1. In some embodiments, the CFA 5 and sensor 7 may not be aligned with the shown direction of light, in which case a mirror or prism (not shown) may be provided to redirect light towards the filter and sensor, as appropriate.
The camera 1 may further comprise a controller 11, RAM 13, a memory 15, and, optionally, hardware keys 19 and a display 17. The camera 1 may comprise an interface 25, which may be a data port for connecting the camera via a cable to an external terminal such as a computer, television, printer or a mobile device such as a mobile phone or tablet computer. The interface 25 may be one or more of a USB port, micro-USB port or Firewire port, for example. The interface 25 may be used for transmitting and/or receiving data, e.g. image data, to and/or from the external terminal.
The camera 1 may also comprise a radiofrequency receiver and/or transmitter 27 for the wireless reception and/or transmission of data to an external terminal such as a computer, television, printer or a mobile device such as a mobile phone or tablet computer. The camera 1 may also comprise a memory card slot (not shown) for receiving a removable storage device, such as an SD memory card or the like.
The memory 15 may be a non-volatile memory such as read only memory (ROM), a hard disk drive (HDD) or a solid state drive (SSD). The memory 15 stores, amongst other things, an operating system 21 and may store software applications 23. The RAM 13 is used by the controller 11 for the temporary storage of data. The operating system 21 may contain code which, when executed by the controller 11 in conjunction with the RAM 13, controls operation of each of the hardware components of the camera 1.
The controller 11 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.
In some embodiments, the camera 1 may also be associated with external software applications not stored on the camera. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The camera 1 may be in communication with the remote server device in order to utilize the software application stored there.
In some embodiments, the controller 11 may be implemented as a Field Programmable Gate Array (FPGA). An FPGA is an integrated circuit designed to be configured ‘in the field’ by a user after manufacture. FPGAs comprise a matrix of programmable logic blocks and a hierarchy of reconfigurable interconnects that permit the logic blocks to be connected together according to a user’s design. The operation of the FPGA may be defined by means of a hardware description language (HDL) and a design automation tool which generates a netlist for a subsequent place-and-route stage using the FPGA vendor’s proprietary software. FPGAs are useful for implementing image processing applications because their structure is able to exploit spatial and temporal parallelism. FPGAs therefore offer particular advantages in real-time image processing, which is otherwise difficult to achieve on more conventional serial processors, and are often the technology of choice in modern digital cameras.
Figure 2 is a partial view of the CFA 5, shown overlaid on a small, two-dimensional section of the sensor 7 corresponding to an 8x8 array of photo sensors 37. It will be appreciated that both the CFA 5 and sensor 7 will in reality have a much greater area and array size, but similar principles apply. The CFA 5 in the embodiments is a Bayer filter for arranging red, green and blue (RGB) colour filters 31, 33, 35 on the square grid of photo sensors 37. Each individual RGB filter 31, 33, 35 of the mosaic allows light of the corresponding colour wavelength to pass to the underlying photo sensor 37, blocking the other colours.
The arrangement of the RGB filters 31, 33, 35 is as follows. One row comprises alternating green and blue filters 33, 35. The adjacent row comprises alternating red and green filters 31, 33. The pattern then repeats. The filter pattern is therefore 50% green, 25% red and 25% blue, owing to human vision being more sensitive to green light. Effectively, the image is spatially sub-sampled.
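By way of non-limiting illustration, the Python/NumPy sketch below extracts the four colour planes from a RAW mosaic laid out as just described. The function name, and the assumption that the mosaic begins on a green/blue row, are illustrative only.

    import numpy as np

    def split_bayer(raw):
        """Split a RAW Bayer mosaic into its four colour planes, assuming
        rows of alternating G/B filters followed by rows of alternating
        R/G filters, as in Figure 2. Labels follow the Gr/Gb convention
        (green sharing a row with red, and with blue, respectively)."""
        gb = raw[0::2, 0::2]   # green pixels in the blue rows
        b  = raw[0::2, 1::2]   # blue pixels
        r  = raw[1::2, 0::2]   # red pixels
        gr = raw[1::2, 1::2]   # green pixels in the red rows
        return r, gr, gb, b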
Figures 3a - 3c show the resulting patterns received at the photo sensors 37 of the sensor 7. To distinguish between a green colour detected in a row 37 of red photo sensors and a row 39 of blue photo sensors, the convention is to use the label Gr for the former and Gb for the latter.
The photo sensors 37 of the sensor 7 generate an electrical signal responsive to received light; the signal represents the intensity of the light and is passed to an Analog-to-Digital Converter (ADC) to provide a digital representation of the pixel at the corresponding spatial location. The pixel may be represented in, for example, 8, 16, 32 or 64 bits, or even more, depending on the required bit depth for the particular application. It will therefore be appreciated that image processing is by its nature an intensive process.
The collection of pixels for the entire area of the sensor 7 corresponds to an unprocessed RAW image that may be stored and processed. As each photo sensor 37 captures only one colour per spatial position, demosaicing is required to provide data corresponding to the other two colours. There are a number of known methods for demosaicing, including (at the most simple level) interpolation using the values of nearby pixels.
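As a non-limiting illustration of the simple interpolation approach, the sketch below performs bilinear demosaicing for the Bayer layout of Figure 2. The kernel choices and function name are ours, and, as described below, the embodiments deliberately denoise before this step.

    import numpy as np
    from scipy.ndimage import convolve

    def demosaic_bilinear(raw):
        """Bilinear demosaicing sketch for a mosaic whose even rows are
        G B G B ... and odd rows are R G R G ... (as in Figure 2).
        Missing colour values are interpolated from the nearest samples
        of the same colour."""
        h, w = raw.shape
        rows, cols = np.mgrid[0:h, 0:w]
        raw = raw.astype(np.float64)
        # Boolean masks marking where each colour was actually sampled.
        g_mask = (rows % 2) == (cols % 2)
        b_mask = ((rows % 2) == 0) & ((cols % 2) == 1)
        r_mask = ((rows % 2) == 1) & ((cols % 2) == 0)
        # Bilinear interpolation kernels for the two sampling lattices.
        k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
        k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
        rgb = np.zeros((h, w, 3))
        rgb[..., 0] = convolve(raw * r_mask, k_rb, mode='mirror')
        rgb[..., 1] = convolve(raw * g_mask, k_g,  mode='mirror')
        rgb[..., 2] = convolve(raw * b_mask, k_rb, mode='mirror')
        return rgb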
Embodiments herein propose methods and systems for denoising the RAW image data prior to demosaicing. However, alternative embodiments may perform denoising after demosaicing. The embodiments relate to denoising one or more images using data from a plurality of said images which cover or capture at least part of a common scene or object. The images may be captured substantially simultaneously, although this is not essential.
Figure 4 shows a scenario in which a space, e.g. a room 41, comprises a plurality of cameras 43, 45, 47 of the Figure 1 type mounted at spatially separate locations. The sensors (not shown) of the respective cameras 43, 45, 47 are directed towards a common object 51 in the space, such that at least part of each sensor's field-of-view covers an overlapping part of the scene, including, in this case, the object. In some embodiments, only two cameras may be provided. In other embodiments, a greater number of cameras may be provided.
An image processing system 53 is provided external to the cameras 43, 45, 47. As will be explained below, the image processing system 53 is configured to receive the RAW image data from each of the cameras 43, 45, 47 and to perform collaborative denoising on the received RAW image data.
Collaborative denoising means that data from different images, in this case the images captured by the different cameras, is used to estimate a denoised pixel or group of pixels.
Alternatively, however, the functionality of the image processing system 53 may be provided in one or more of the cameras 43, 45, 47 without the need for a dedicated external system.
The RAW image data from each camera 43, 45, 47 is transmitted to the image processing system 53. If the RAW image data corresponds to video, i.e. a series of sequentially captured images, the data may be transmitted in accordance with a predetermined refresh rate, e.g. 25 frames per second. The data transmission may be by means of cables or wires 55 connected between the interface 25 of each camera 43, 45, 47 and a corresponding interface of the image processing system 53.
Alternatively, the data transmission may be by means of a wireless communications protocol.
Referring to Figure 5, the image processing system 53 may comprise a controller 61, RAM 63, a memory 65, and, optionally, hardware keys 67 and a display 69. The image processing system 53 may comprise a network interface 71, which may be a data port for connecting the system to the cameras 43, 45, 47 via a cable. The interface 71 may be one or more of a USB port, micro-USB port or Firewire port, for example. The interface 71 may be used for transmitting and/or receiving data, e.g. image data, to and/or from the cameras.
The network interface 71 may additionally or alternatively comprise a radiofrequency wireless interface for transmitting and/or receiving the image data using a wireless communications protocol, e.g. WiFi or Bluetooth. An antenna 73 may be provided for this purpose.
The memory 65 may be a non-volatile memory such as read only memory (ROM), a hard disk drive (HDD) or a solid state drive (SSD). The memory 65 stores, amongst other things, an operating system 75 and may store software applications 77. The RAM 63 is used by the controller 61 for the temporary storage of data. The operating system 75 may contain code which, when executed by the controller 61 in conjunction with the RAM 63, controls operation of each of the hardware components of the image processing system 53.
The memory 65 may also store a set of camera alignment data 79 which represents intrinsic and extrinsic properties and characteristics of the cameras 43, 45, 47, including their relative spatial positions and other data sufficient to determine the common overlapping area of their respective fields-of-view. The camera alignment data 79 may for example comprise the yaw, roll and pitch angles of the different cameras 43, 45, 47 and their relative orientation angles. The camera alignment data 79 may be derived in a calibration stage.
An example multi-camera calibration model is described in “A Software for Complete Calibration of Multicamera Systems”, Tomas Svoboda, Czech Technical University, Faculty of Electrical Engineering, Center for Machine Perception, http://cmp.felk.cvut.cz/~svoboda/SelfCal/Publ/talk.pdf.
The controller 61 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.
In some embodiments, the image processing system 53 may also be associated with external software applications not stored on the system. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The image processing system 53 may be in communication with the remote server device in order to utilize the software application stored there.
In some embodiments, the controller 61 may be implemented as a Field Programmable Gate Array (FPGA). As mentioned above, FPGAs are useful for implementing image processing applications of the sort described herein, because their structure is able to exploit spatial and temporal parallelism; they offer particular advantages in real-time image processing, which is otherwise difficult to achieve on more conventional serial processors.
It should therefore be appreciated that, in the following, the same functionality is realizable using alternative integrated circuit methods, such as FPGAs or Application-Specific Integrated Circuits (ASICs). In such embodiments, the software functionality described herein may be implemented by the FPGA or ASIC rather than by program code.
Figure 6 is a flow diagram indicating steps performed by a software application 77. A first step 6.1 comprises receiving a plurality of images from the respective cameras 43, 45, 47. In some embodiments, the plurality of images are captured substantially simultaneously, although this is not essential. A subsequent step 6.2 comprises identifying in each of the images a group of pixels representing the same or similar content. For example, in the Figure 4 scenario, the same or similar content may be pixels corresponding to the object 51. In more complex scenarios, the background may also comprise same or similar content between the images, there may be further objects, and so on.
The result will be a group of pixels identified in each captured image so that there is a correspondence between the groups (in the different images) in terms of constituent pixels. A more detailed explanation of the identifying step 6.2 will be given later on.

A subsequent step 6.3 comprises performing denoising to derive a collaboratively denoised pixel estimate for each pixel position in the identified groups. A collaboratively denoised pixel estimate is a pixel value derived from a denoising algorithm which takes into account pixels from multiple images.

A subsequent step 6.4 comprises replacing pixels in one, or each, of the received images with the collaboratively denoised pixel estimate for each pixel position in the identified groups. The result is one or more denoised images produced by collaborative denoising, taking into account data from spatially-separate cameras. The denoised images show significant improvements in quality over denoised images resulting from existing methods.
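The skeleton below restates the Figure 6 flow in code form, by way of non-limiting example. It is a sketch only: find_groups and denoise_stack are hypothetical stand-ins for the Figure 7 search and the collaborative filter of step 6.3, and the fixed n x m block shape is an assumption.

    import numpy as np

    def collaborative_denoise(images, find_groups, denoise_stack, n=8, m=8):
        """Skeleton of steps 6.1-6.4: receive images, find matching pixel
        groups, denoise them collaboratively, and write the joint estimate
        back into every image."""
        # Step 6.1: `images` is a list of equally-sized 2D pixel arrays.
        outputs = [img.astype(np.float64).copy() for img in images]
        # Step 6.2: each group gives one matched (top, left) position per image.
        for positions in find_groups(images, n, m):
            stack = np.stack([images[k][y:y + n, x:x + m]
                              for k, (y, x) in enumerate(positions)])
            # Step 6.3: one collaboratively denoised estimate per pixel position.
            estimate = denoise_stack(stack)
            # Step 6.4: replace the original pixels in each image with it.
            for k, (y, x) in enumerate(positions):
                outputs[k][y:y + n, x:x + m] = estimate
        return outputs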
Figure 7 is a flow diagram indicating one way of identifying in each image a group of pixels representing the same or similar content (step 6.2). A first, optional, step 7.1 may comprise rotating and/or warping each image. In this respect, the intrinsic and extrinsic properties and characteristics of the cameras 43, 45, 47 may be such that their resulting images are at different orientations and/or are distorted. For example, the image from one camera may be upside-down compared with the image from another camera; in this case, rotation may be employed so that the two images are of the same orientation. As another example, one or more of the cameras 43, 45, 47 may employ a wide-angle lens such as a fish-eye lens; in this case, warping may be employed to correct for differences in the lens shapes.

A subsequent, also optional, step 7.2 may comprise removing non-overlapping areas of the received images. This reduces the amount of subsequent processing required, leaving only the overlapping area common to the images captured by the cameras 43, 45, 47.

A subsequent step 7.3 comprises defining a search window in each image. The search window may be a two-dimensional search window of any size. For reasons of efficiency, the search window is defined with a predetermined size of a x b pixels, smaller than the image size, and has an initial position within the image.
The initial position of the search window in each image is such that a reference point of the search window, e.g. the top-left corner, is located on the same Bayer pixel in each image, i.e. one of a red, green or blue pixel.

A subsequent step 7.4 comprises defining a reference window within the search window. The reference window may be a two-dimensional reference window of any size that is smaller than the a x b search window. For example, the reference window may have a predetermined size of n x m pixels. Typically, the reference window may be square and positioned in the centre of the current search window.

A subsequent step 7.5 comprises defining a candidate window within the search window and correlating pixel values of the candidate window with those of the reference window at different positions within the search window, to identify a maximum correlation, or one above a threshold indicative of a block match. In this sense, ‘correlation’ may be defined broadly as measuring the similarity or correspondence between blocks of pixels using, for example, the L1-norm or L2-norm algorithms, and not necessarily correlation in the mathematical sense. The candidate window may have a predetermined size which is the same as that of the reference window, i.e. n x m pixels. Typically, therefore, the candidate window may be square. The initial position of the candidate window may be at the top-left corner of the search window, and it subsequently moves according to a predetermined pattern.
The initial positions of the reference window and the candidate window may have a reference point, e.g. the top-left corner, located on the same Bayer pixel, i.e. one of a red, green or blue pixel.

A subsequent step 7.6 identifies a group of pixels based on the correlation result as the candidate window moves within the search window; if a positive match results, the group of pixels is stored in step 7.7. The method then proceeds to step 7.8.
In step 7.8 it is determined whether the a x b search window has covered all of the overlapping image area. If not, the method moves to step 7.9 in which the a x b search window is moved to an adjacent position and the process repeats from step 7.4. The movement of the search window may be restricted such that a reference point of the search window, e.g. the top-left corner, is located on the same Bayer pixel, i.e. one of a red, green or blue pixel.
If it is determined in step 7.8 that the search window has covered all of the overlapping image area, the process ends in step 7.10. It should be appreciated that the step size of the search window can be one pixel or more than one pixel; for example, the search window may be moved every two or three pixels to decrease computation time. There may be some loss of denoising quality, but within a reasonable margin.
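The following Python/NumPy sketch shows steps 7.4 to 7.6 for one search-window position, again by way of non-limiting example. The function name and arguments are illustrative; the step of 2 pixels encodes the Bayer-phase constraint described above, and an L1 (sum of absolute differences) cost is assumed as the correlation measure.

    import numpy as np

    def match_in_search_window(image, ref, top, left, a, b, step=2):
        """Slide an n x m candidate window inside the a x b search window
        whose top-left corner is at (top, left), and return the position
        whose block differs least from the n x m reference block `ref`.
        A step of 2 keeps every candidate on the same Bayer phase as the
        reference, giving the colour-constrained matching of Figure 11."""
        n, m = ref.shape
        best_pos, best_cost = None, np.inf
        for y in range(top, top + a - n + 1, step):
            for x in range(left, left + b - m + 1, step):
                cand = image[y:y + n, x:x + m].astype(np.float64)
                cost = np.abs(cand - ref).sum()   # L1-norm similarity measure
                if cost < best_cost:
                    best_cost, best_pos = cost, (y, x)
        return best_pos, best_cost

A block match in the sense of step 7.6 may then be declared when the returned cost falls below a threshold.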
It will be appreciated that certain steps of the Figure 7 method may be re-ordered and/or performed in parallel.
The above method will now be graphically illustrated with reference to Figures 8 to 12.
Referring to Figures 8a - 8c, the images respectively captured by the cameras 43, 45, 47 in Figure 4 are shown. In this case, no rotation or warping is needed because the images are of the same orientation and a conventional lens has been used. The broken line 80 indicates a division in the image whereby the shaded region 81 represents a region that does not overlap with the other images. The shaded region 81 can therefore be removed from each image.
The result of the removal or cropping is shown in Figures 9a - 9c, which respectively correspond to Figures 8a - 8c.
Referring to Figures 10a - 10c, a search window 85 is shown located around the object 51 in each of the images. The search window 85 may be, for example, rectangular, or of any other suitable shape and size. The initial position of the search window 85 may be at the top left-hand corner of the image; in the examples shown, it is assumed that the search window has been moved to the current position as part of a predetermined movement pattern.
Figure 11 shows how the top left-hand corner 87 of the search window 85 corresponds with, in this case, a blue Bayer pixel 89. This is the case for all search windows 85 shown in Figures 10a - 10c. Colour-constrained block matching is important to reduce, and possibly eliminate, checkerboard artefacts: if blocks of different colour configurations were grouped, colour information would be mixed during averaging or 3D denoising, which is undesirable. Such artefacts appear in regions with small inter-colour difference.
Referring to Figures 12a - 12c, a reference block 89 is defined within the search window 85, for example at its approximate centre. A candidate window 91 is defined within the search window 85 at an initial position, and the correlation step is performed with respect to the reference block 89 at respective positions for each image as the candidate window is moved within the search window 85.
The correlation step may use known algorithms such as the L1-norm or L2-norm. The L1-norm is also known as least absolute deviations (LAD) or least absolute errors (LAE); it involves minimising the sum of the absolute differences between a target value and estimated values. The L2-norm is also known as least squares; it involves minimising the sum of the squares of the differences between the target value and the estimated values.
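Mapped directly to code, the two block-similarity measures just described look as follows; this is a trivial sketch, and either function could replace the inline cost in the search example above.

    import numpy as np

    def l1_distance(a, b):
        """L1-norm (least absolute deviations): sum of absolute differences."""
        d = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
        return np.abs(d).sum()

    def l2_distance(a, b):
        """L2-norm (least squares): sum of squared differences."""
        d = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
        return (d ** 2).sum()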
As an alternative to the Figure 7 method, step 6.2 may be performed using a so-called PatchMatch algorithm; see, for example, http://gfx.cs.princeton.edu/pubs/Barnes_2009_PAR/patchmatch.pdf, the contents of which are incorporated herein by reference.
Having identified corresponding groups of pixels in each image, the process moves onto the collaborative denoising step 6.3.
As mentioned, the collaborative denoising step 6.3 comprises deriving pixel values using a denoising algorithm which takes into account similar or corresponding pixels from multiple images.
One method for collaborative denoising may comprise using a known denoising algorithm on individual corresponding pixels or blocks of pixels of each image. For example, the blocks of pixels may be those identified as correlating with other blocks of pixels in other images in the previous stage. An algorithm such as the NLM algorithm may be applied to each target pixel or block of pixels in each image, and the denoised pixel or group of pixels may be a weighted average of the (in this case three) denoised pixels or blocks of pixels.
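A hedged sketch of this first method follows: each image's matched block is denoised independently (by any per-image filter, such as the NLM sketch above) and the per-image estimates are then fused by weighted average. Equal weights are assumed here, since the weighting scheme is not specified.

    import numpy as np

    def fuse_estimates(estimates, weights=None):
        """Combine per-image denoised block estimates into one collaborative
        estimate by weighted average. `estimates` is a sequence of equally
        shaped 2D arrays, one per image (three, in the Figure 4 scenario)."""
        stack = np.stack([np.asarray(e, dtype=np.float64) for e in estimates])
        if weights is None:
            weights = np.ones(len(stack))          # assumption: equal weights
        weights = np.asarray(weights, dtype=np.float64)
        weights /= weights.sum()                   # normalise to sum to one
        return np.tensordot(weights, stack, axes=1)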
Another method for collaborative denoising may use a 3D denoising algorithm, for example the BM3D algorithm.
In effect, the similar blocks of pixels identified in the previous stage are stacked together to form a 3D block 101, as shown in Figure 13. The 3D block 101 is applied to a BM3D algorithm 102, as explained in “Image denoising by sparse 3-D transform-domain collaborative filtering”, Dabov, Kostadin et al., IEEE Transactions on Image Processing, Vol. 16, No. 8, August 2007, the contents of which are incorporated herein by reference. The 3D block 101 may be transformed into a selected transform domain, e.g. using Haar 3D decomposition, and collaborative filtering performed by shrinkage in the transform domain. In some embodiments, a 3D transformation may be separated into a 2D + 1D transformation, in which case the former can be different from the latter. For example, a discrete cosine transform (DCT) can be used for the 2D transformation, and the Haar transformation can be performed in one dimension along the vertical (stacking) direction.
Once filtered, the denoised block 103 is separated and the constituent parts transferred back to their original positions in their respective images (as shown in Figure 13), and another block is processed, if needed. As such, this second method acts collaboratively on 3D blocks of corresponding pixels.
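The sketch below illustrates the separable 2D + 1D transform-and-shrink idea on a stack of matched blocks. It is a simplified stand-in for BM3D, not the Dabov et al. algorithm itself: it assumes an even stack depth for the one-level Haar transform, uses a single hard-thresholding pass, and omits BM3D's aggregation and Wiener-filtering stages.

    import numpy as np
    from scipy.fft import dctn, idctn

    def shrink_stack(stack, thresh):
        """Collaborative filtering of a 3D block (images x n x m): 2D DCT
        per block, one-level orthonormal Haar along the stacking axis,
        hard thresholding, then the inverse transforms."""
        spec = dctn(np.asarray(stack, dtype=np.float64),
                    axes=(1, 2), norm='ortho')         # 2D transform per block
        half = spec.shape[0] // 2
        lo = (spec[0::2] + spec[1::2]) / np.sqrt(2.0)  # Haar averages
        hi = (spec[0::2] - spec[1::2]) / np.sqrt(2.0)  # Haar details
        coeffs = np.concatenate([lo, hi])
        coeffs[np.abs(coeffs) < thresh] = 0.0          # shrinkage
        lo, hi = coeffs[:half], coeffs[half:]
        spec[0::2] = (lo + hi) / np.sqrt(2.0)          # inverse Haar
        spec[1::2] = (lo - hi) / np.sqrt(2.0)
        return idctn(spec, axes=(1, 2), norm='ortho')  # back to pixel domain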
In some embodiments, the above methods may be extended into the temporal domain, i.e. taking subsequent frames of a video sequence into account. For example, steps 6.2 and 6.3 may be performed in the temporal as well as the spatial domain. Step 6.2 would then involve performing a many-to-many correlation or correspondence, provided that the search window has the same location in each consecutive frame of the video. This is shown graphically in Figure 14.
It will be appreciated that the above described embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present application.
Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein, or any generalization thereof, and, during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combinations of such features.

Claims (25)

Claims
1. A method comprising: receiving a plurality of images comprising a two-dimensional array of pixels; identifying in each received image a group of pixels representing the same or similar content; applying a denoising algorithm to pixels in the pixel groups to derive a collaboratively denoised pixel estimate for each pixel position in the identified group; and using the collaborative pixel estimate for corresponding pixel positions in each received image.
2. The method of claim 1, wherein the denoising algorithm is applied to each pixel in the pixel groups and the resulting denoised pixel estimates for each corresponding pixel position are combined to derive the collaboratively denoised pixel estimate for each said pixel position.
3. The method of claim 2, wherein combining the resulting denoised pixel estimates comprises using the weighted average of the resulting denoised pixel estimates.
4. The method of claim 1, wherein the denoising algorithm is a collaborative denoising algorithm, applied to corresponding blocks of pixels in the pixel groups.
5. The method of claim 4, wherein the collaborative denoising algorithm is a block matching and three-dimensional filtering (BM3D) algorithm.
6. The method of claim 4, wherein the denoising algorithm is a non-local means (NLM) algorithm.
7. The method of any preceding claim, wherein the images are colour images, each pixel in the pixel array being represented as either a red (R), green (G) or blue (B) colour component.
8. The method of claim 7, wherein each pixel in the pixel array is non-interpolated.
9. The method of claim 7 or claim 8, wherein the colour images are received via a Bayer colour filter array.
10. The method of any preceding claim, wherein the identifying step comprises performing a window-based search, e.g. a sliding window search.
11. The method of claim 10, wherein the window-based search is performed by correlating different positions of an n x m window with a fixed n x m reference window.
12. The method of claim 11, wherein the different positions of the n x m window are such that the spatial arrangement of RGB colour components in each position corresponds with that of the reference window.
13. The method of claim 11 or claim 12, wherein the window-based search is performed within a larger, a x b search window, the a x b search window being such that the spatial arrangement of RGB colour components is identical for each image.
14. The method of claim 13, wherein the n x m reference window is substantially at the centre of the a x b search window.
15. The method of claim 13 or claim 14, wherein the a x b search window is moved iteratively over different corresponding portions of each image and the window-based search repeated for each different portion.
16. The method of any of claims 10 to 15, wherein the window-based search is performed using the L1-norm and/or L2-norm algorithm.
17. The method of any preceding claim, wherein the plurality of colour images represent substantially simultaneously-captured content.
18. The method of any preceding claim, further comprising receiving the plurality of colour images from separate image sources having a predetermined spatial relationship to one another such that overlapping parts of the images can be identified.
19. The method of claim 18, when dependent on any of claims 13 to 15, wherein the a x b search window is positioned such that it is within the overlapping parts.
20. The method of claim 18 or claim 19, further comprising transforming the received colour images by rotation and/or warping.
21. The method of any preceding claim, performed on a single integrated circuit.
22. The method of any preceding claim, performed on an FPGA.
23. A computer program comprising instructions that when executed by a computer control it to perform the method of any preceding claim.
24. A non-transitory computer-readable medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising: receiving a plurality of colour images comprising a two-dimensional array of pixels; identifying in each received image a group of pixels representing the same or similar content; applying a denoising algorithm to pixels in the pixel groups to derive a collaboratively denoised pixel estimate for each pixel position in the identified group; and using the collaborative pixel estimate for corresponding pixel positions in each received image.
25. An apparatus, the apparatus having at least one processor and at least one memory having computer-readable code stored thereon which when executed controls the at least one processor: to receive a plurality of colour images comprising a two-dimensional array of pixels; to identify in each received image a group of pixels representing the same or similar content; to apply a denoising algorithm to pixels in the pixel groups to derive a collaboratively denoised pixel estimate for each pixel position in the identified group; and to use the collaborative pixel estimate for corresponding pixel positions in each received image.
GB1709889.8A 2017-06-21 2017-06-21 Image processing Withdrawn GB2563627A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1709889.8A GB2563627A (en) 2017-06-21 2017-06-21 Image processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1709889.8A GB2563627A (en) 2017-06-21 2017-06-21 Image processing

Publications (2)

Publication Number Publication Date
GB201709889D0 GB201709889D0 (en) 2017-08-02
GB2563627A true GB2563627A (en) 2018-12-26

Family

ID=59462463

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1709889.8A Withdrawn GB2563627A (en) 2017-06-21 2017-06-21 Image processing

Country Status (1)

Country Link
GB (1) GB2563627A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115031651B (en) * 2022-06-07 2023-04-18 Tianjin University Improved BM3D denoising OFDR distributed strain measurement method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110194763A1 (en) * 2010-02-05 2011-08-11 Samsung Electronics Co., Ltd. Apparatus, method and computer-readable medium removing noise of color image
US20130223712A1 (en) * 2012-02-28 2013-08-29 Canon Kabushiki Kaisha Information processing apparatus, information processing method and radiation imaging system

Also Published As

Publication number Publication date
GB201709889D0 (en) 2017-08-02

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)