WO2017214848A1

WO2017214848A1 - Apparatus, method and computer program product for removing object in image

Info

Publication number: WO2017214848A1
Application number: PCT/CN2016/085680
Authority: WO
Inventors: Xuhang LIAN
Original assignee: Nokia Technologies Oy; Nokia Technologies (Beijing) Co., Ltd.
Priority date: 2016-06-14
Filing date: 2016-06-14
Publication date: 2017-12-21

Abstract

Apparatus, method, computer program product and computer readable medium are disclosed for removing object in an image. The apparatus comprises at least one processor; at least one memory including computer program code, the memory and the computer program code configured to, working with the at least one processor, cause the apparatus to determine input intensity of a pixel in an image; and determine output intensity of the pixel based on mapped input intensity and mapped detail intensity.

Description

APPARATUS, METHOD AND COMPUTER PROGRAM PRODUCT FOR REMOVING OBJECT IN AN IMAGE

Field of the Invention

Embodiments of the disclosure generally relate to information technologies, and, more particularly, to removing object in an image.

Background

Image quality plays an important role in most applications based on image. For example, computer vision systems are broadly used for video surveillance, traffic surveillance, driver assistant systems, traffic monitoring, human identification, human-computer interaction, public security, event detection, tracking, frontier guards and the Customs, scenario analysis and classification, object detecting and identification, image indexing and retrieve, etc. However, certain object (such as haze, dark light, etc) in the image greatly influences the performance of the applications based on image. For example, haze and dark light are two common sources of degrading image quality. They hamper the visibility of the scene and its objects. The intensity, hue and saturation of the scene and its objects are also altered by the haze or dark light. Consequently, it gives arise difficulty in extracting haze-invariant or light-invariant features. Because image features are key for computer vision tasks such as image matching, recognition, retrieval, and object detection, the existence of certain object such as haze and dark causes negative effect in the computer vision systems. Therefore, removing certain object in the image is necessary in many applications based on image.

Summary

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

According to one aspect of the disclosure, it is provided an apparatus. The apparatus may comprise at least one processor； and at least one memory including computer program code, the memory and the computer program code configured to, working with the at least one processor, cause the apparatus to perform at least the following: determine input intensity of a pixel in an image； and determine output intensity of the pixel based on mapped input intensity and mapped detail intensity.

According to another aspect of the present disclosure, it is provided a method. The method may comprise determining input intensity of a pixel in an image； and determining output intensity of the pixel based on mapped input intensity and mapped detail intensity.

According to still another aspect of the present disclosure, it is provided a computer program product embodied on a distribution medium readable by a computer and comprising program instructions which, when loaded into a computer, execute at least the following: determine input intensity of a pixel in an image； and determine output intensity of the pixel based on mapped input intensity and mapped detail intensity.

According to still another aspect of the present disclosure, it is provided a non-transitory computer readable medium having encoded thereon statements and instructions to cause a processor to execute at least the following: determine input intensity of a pixel in an image； and determine output intensity of the pixel based on mapped input intensity and mapped detail intensity.

According to still another aspect of the present disclosure, it is provided an apparatus comprising means configured to determine input intensity of a pixel in an image； and means configured to determine output intensity of the pixel based on mapped input intensity and mapped detail intensity.

These and other objects, features and advantages of the disclosure will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

Brief Description of the Drawings

Figure 1 is a simplified block diagram showing an apparatus according to an embodiment；

Figure 2 is a flow chart depicting a process of removing certain object in image in accordance with embodiments of the present disclosure；

Figure 3 is a flow chart depicting a part of process of removing certain object in image in accordance with embodiments of the present disclosure；

Figure 4 is a flow chart depicting a part of process of removing certain object in image in accordance with embodiments of the present disclosure；

Figure 5 is a flow chart depicting a process of removing haze in image in accordance with embodiments of the present disclosure； and

Figure 6 shows some results of methods according to embodiments of the present disclosure and Dark Channel Prior (DCP) .

Detailed Description

For the purpose of explanation, details are set forth in the following description in order to provide a thorough understanding of the embodiments disclosed. It is apparent, however, to those skilled in the art that the embodiments may be implemented without these specific details or with an equivalent arrangement. Various embodiments of the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein； rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms "data, " "content, " "information, " and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present disclosure. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present disclosure.

Additionally, as used herein, the term 'circuitry' refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry) ； (b) combinations of circuits and computer program product (s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein； and (c) circuits, such as, for example, a microprocessor (s) or a portion of a microprocessor (s) , that require software or firmware for operation even if the software or firmware is not physically present. This definition of 'circuitry' applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term 'circuitry' also includes an implementation comprising one or more processors and/or portion (s) thereof and accompanying software and/or firmware. As another example, the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network apparatus, other network apparatus, and/or other computing apparatus.

As defined herein, a "non-transitory computer-readable medium, " which refers to a physical medium (e.g., volatile or non-volatile memory device) , can be differentiated from a "transitory computer-readable medium, " which refers to an electromagnetic signal.

It is noted that though the embodiments are mainly described in the context of image dehazing, they are not limited to this but can be applied to remove any suitable object in the image, such as image dedarking. Moreover, the embodiments can be applied to video, though they are mainly discussed in the context of a single image. It is also noted that the embodiments may be applied to not only the processing of non-real time image or video but also the processing of real time image or video. In addition, it is further noted that the image as used herein may refer to a color image or a gray image.

Dehazing is a process of haze removal and dedarking is a process of dealing with dark (low light) image so that the content of the image is clear. Existing dehazing methods work on the assumption of an idea imaging model. But the model is not guaranteed to perfectly fit the practical situation. In addition, it is difficult to precisely estimate the parameters of the imaging model. Existing dedarking methods conduct dedarking by general and empirical image enhancement techniques which are not guaranteed to have minimum error between the dedarked image and ideal bright one.

State-of-the-art dehazing methods are characterized by estimation of medium transmission and airlight. Representative methods include Dark Channel Prior (DCP) as described in “K. He, J. Sun, X. Tang, Single image haze removal using dark channel prior, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (12) (2011) 2341-2352” , and Haze Relevant Features (HRF) as described in “K. Tang, J. Yang, J. Wang, Investigating Haze-relevant Features in A Learning Framework for Image Dehazing, Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2014” . In DCP, the medium transmission is estimated by using the so-called dark channel prior. HRF utilizes haze-relevant features extracted from synthetic hazy patches to train a regression model (Random Forest) . The output of the regressor is an estimation of the medium transmission. It is noted that the most important haze-relevant feature is dark channel prior.

However, DCP will break when the input image contains certain structures like a white wall. Both HRF and DCP need to estimate global atmospheric light. The brightest pixel value or the median of the 0.1％pixels with largest dark channel values is taken as the estimation. But there are cases where bright pixels do not correspond to atmospheric light. Therefore, in some cases, these methods cannot obtain good results.

Some dedarking methods are characterized by tone mapping framework/function. However, the empirically designed tone-mapping function is not necessary to obtain optimal results. There are some methods using DCP for enhancement of low-light video. However, these methods inherit the drawback of DCP.

Embodiments of the disclosure can at least solve or mitigate one or more of above issues. A reconstruction framework for removing certain object in the image (such as dehazing and dedarking) is provided in the embodiments, where the output intensity is the multiplication of mapped input intensity and mapped detail intensity. Compared with state-of-the-art dehazing methods, the embodiments do not rely on the imaging model so it is not required to estimate media transmission and air-light which are difficult to be exactly estimated. Compared to state-of-the-art dedarking methods, the tone mapping functions provided by the embodiments are leaned according to an objective function and hence the parameters are optimal and are not empirical. The embodiments propose to recover a degraded image by two tone mapping functions for image intensity and one tone mapping function for image saturation. In addition, the embodiments propose an alternative optimization algorithm for estimating the optimal parameters with a reconstruction-error minimization strategy.

Figure 1 is a simplified block diagram showing an apparatus, such as an electronic apparatus 10, in which various embodiments of the disclosure may be applied. It should be understood, however, that the electronic apparatus as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the disclosure and, therefore, should not be taken to limit the scope of the disclosure. While the electronic apparatus 10 is illustrated and will be hereinafter described for purposes of example, other types of apparatuses may readily employ embodiments of the disclosure. The electronic apparatus 10 may be a portable digital assistant (PDAs) , a user equipment, a mobile computer, a desktop computer, a smart television, a gaming apparatus, a laptop computer, a media player, a camera, a video recorder, a mobile phone, a global positioning system (GPS) apparatus, a smart phone, a tablet, a server, a thin client, a cloud computer, a virtual server, a set-top box, a computing device, a distributed system, a smart glass, a vehicle navigation system and/or any other types of electronic systems. The electronic apparatus 10 may run with any kind of operating system including, but not limited to, Windows, Linux, UNIX, Android, iOS and their variants. Moreover, the apparatus of at least one example embodiment need not to be the entire electronic apparatus, but may be a component or group of components of the electronic apparatus in other example embodiments.

Furthermore, the electronic apparatus may readily employ embodiments of the disclosure regardless of their intent to provide mobility. In this regard, it should be understood that embodiments of the disclosure may be utilized in conjunction with a variety of applications, both in the mobile communications industries and outside of the mobile communications industries.

In at least one example embodiment, the electronic apparatus 10 may comprise processor 11 and memory 12. Processor 11 may be any type of processor, controller, embedded controller, processor core, and/or the like. In at least one example embodiment, processor 11 utilizes computer program code to cause an apparatus to perform one or more actions. Memory 12 may comprise volatile memory, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data and/or other memory, for example, non-volatile memory, which may be embedded and/or may be removable. The non-volatile memory may comprise an EEPROM, flash memory and/or the like. Memory 12 may store any of a number of pieces of information, and data. The information and data may be used by the electronic apparatus 10 to implement one or more functions of the electronic apparatus 10, such as the functions described herein. In at least one example embodiment, memory 12 includes computer program code such that the memory and the computer program code are configured to, working with the processor, cause the apparatus to perform one or more actions described herein.

The electronic apparatus 10 may further comprise a communication device 15. In at least one example embodiment, communication device 15 comprises an antenna, (or multiple antennae) , a wired connector, and/or the like in operable communication with a transmitter and/or a receiver. In at least one example embodiment, processor 11 provides signals to a transmitter and/or receives signals from a receiver. The signals may comprise signaling information in accordance with a communications interface standard, user speech, received data, user generated data, and/or the like. Communication device 15 may operate with one or more air interface standards, communication protocols, modulation types, and access types. By way of illustration, the electronic communication device 15 may operate in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA) ) , Global System for Mobile communications (GSM) , and IS-95 (code division multiple access (CDMA) ) , with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS) , CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA) , and/or with fourth-generation (4G) wireless communication protocols, wireless networking protocols, such as 802.11, short-range wireless protocols, such as Bluetooth, and/or the like. Communication device 15 may operate in accordance with wireline protocols, such as Ethernet, digital subscriber line (DSL) , and/or the like.

Processor 11 may comprise means, such as circuitry, for implementing audio, video, communication, navigation, logic functions, and/or the like, as well as for implementing embodiments of the disclosure including, for example, one or more of the functions described herein. For example, processor 11 may comprise means, such as a digital signal processor device, a microprocessor device, various analog to digital converters, digital to analog converters, processing circuitry and other support circuits, for performing various functions including, for example, one or more of the functions described herein. The apparatus may perform control and signal processing functions of the electronic apparatus 10 among these devices according to their respective capabilities. The processor 11 thus may comprise the functionality to encode and interleave message and data prior to modulation and transmission. The processor 11 may additionally comprise an internal voice coder, and may comprise an internal data modem. Further, the processor 11 may comprise functionality to operate one or more software programs, which may be stored in memory and which may, among other things, cause the processor 11 to implement at least one embodiment including, for example, one or more of the functions described herein. For example, the processor 11 may operate a connectivity program, such as a conventional internet browser. The connectivity program may allow the electronic apparatus 10 to transmit and receive internet content, such as location-based content and/or other web page content, according to a Transmission Control Protocol (TCP) , Internet Protocol (IP) , User Datagram Protocol (UDP) , Internet Message Access Protocol (IMAP) , Post Office Protocol (POP) , Simple Mail Transfer Protocol (SMTP) , Wireless Application Protocol (WAP) , Hypertext Transfer Protocol (HTTP) , and/or the like, for example.

The electronic apparatus 10 may comprise a user interface for providing output and/or receiving input. The electronic apparatus 10 may comprise an output device 14. Output device 14 may comprise an audio output device, such as a ringer, an earphone, a speaker, and/or the like. Output device 14 may comprise a tactile output device, such as a vibration transducer, an electronically deformable surface, an electronically deformable structure, and/or the like. Output Device 14 may comprise a visual output device, such as a display, a light, and/or the like. The electronic apparatus may comprise an input device 13. Input device 13 may comprise a light sensor, a proximity sensor, a microphone, a touch sensor, a force sensor, a button, a keypad, a motion sensor, a magnetic field sensor, a camera, a removable storage device and/or the like. A touch sensor and a display may be characterized as a touch display. In an embodiment comprising a touch display, the touch display may be configured to receive input from a single point of contact, multiple points of contact, and/or the like. In such an embodiment, the touch display and/or the processor may determine input based, at least in part, on position, motion, speed, contact area, and/or the like.

The electronic apparatus 10 may include any of a variety of touch displays including those that are configured to enable touch recognition by any of resistive, capacitive, infrared, strain gauge, surface wave, optical imaging, dispersive signal technology, acoustic pulse recognition or other techniques, and to then provide signals indicative of the location and other parameters associated with the touch. Additionally, the touch display may be configured to receive an indication of an input in the form of a touch event which may be defined as an actual physical contact between a selection object (e.g., a finger, stylus, pen, pencil, or other pointing device) and the touch display. Alternatively, a touch event may be defined as bringing the selection object in proximity to the touch display, hovering over a displayed object or approaching an object within a predefined distance, even though physical contact is not made with the touch display. As such, a touch input may comprise any input that is detected by a touch display including touch events that involve actual physical contact and touch events that do not involve physical contact but that are otherwise detected by the touch display, such as a result of the proximity of the selection object to the touch display. A touch display may be capable of receiving information associated with force applied to the touch screen in relation to the touch input. For example, the touch screen may differentiate between a heavy press touch input and a light press touch input. In at least one example embodiment, a display may display two-dimensional information, three-dimensional information and/or the like.

In embodiments including a keypad, the keypad may comprise numeric (for example, 0-9) keys, symbol keys (for example, #, *) , alphabetic keys, and/or the like for operating the electronic apparatus 10. For example, the keypad may comprise a conventional QWERTY keypad arrangement. The keypad may also comprise various soft keys with associated functions. Any keys may be physical keys in which, for example, an electrical connection is physically made or broken, or may be virtual. Virtual keys may be, for example, graphical representations on a touch sensitive surface, whereby the key is actuated by performing a hover or touch gesture on or near the surface. In addition, or alternatively, the electronic apparatus 10 may comprise an interface device such as a joystick or other user input interface.

Input device 13 may comprise a media capturing element. The media capturing element may be any means for capturing an image, video, and/or audio for storage, display or transmission. For example, in at least one example embodiment in which the media capturing element is a camera module, the camera module may comprise a digital camera which may form a digital image file from a captured image. As such, the camera module may comprise hardware, such as a lens or other optical component (s) , and/or software necessary for creating a digital image file from a captured image. Alternatively, the camera module may comprise only the hardware for viewing an image, while a memory device of the electronic apparatus 10 stores instructions for execution by the processor 11 in the form of software for creating a digital image file from a captured image. In at least one example embodiment, the camera module may further comprise a processing element such as a co-processor that assists the processor 11 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to a standard format, for example, a Joint Photographic Experts Group (JPEG) standard format, a moving picture expert group (MPEG) standard format, a Video Coding Experts Group (VCEG) standard format or any other suitable standard formats.

Figure 2 is a flow chart depicting a process 200 of removing object such as haze in the image according to embodiments of the present disclosure, which may be performed at an apparatus such as the electronic apparatus 10 of Figure 1. As such, the electronic apparatus 10 may provide means for accomplishing various parts of the process 200 as well as means for accomplishing other processes in conjunction with other components.

The object to be removed in the image may refer to certain object which may influence image quality. For example, the certain object may be an object in bad weather, such as haze, dark, dust, drizzle, fog, smoke, or other possible objects or particles； may be an object caused by an imaging device, for example object caused by turbid medium on lens of the imaging device； may be an object resulting from imaging condition, for example, the image may be captured in an imaging condition where a camera is behind a glass (such as a window glass) on which a lot of particles (such as fog) are located. In these cases, the irradiance received by the camera from the scene point is attenuated along the line of sight. Furthermore, the incoming light is blended with the airlight (ambient light reflected into the line of sight by particles) . The degraded images may lose the contrast and color fidelity. Therefore, certain object removal (such as dehazing) is highly desired in applications based on image.

As shown in Figure 2, the process 200 may start at block 201 where input intensity of a pixel in an image is determined. The image may be pre-stored in a memory of the electronic apparatus 10, captured in real time by an image sensor, or retrieved from a network location or a local location. By way of example, referring to Figure 1, the processor 11 may obtain the image from the memory 12 if the image is stored in the memory 12； obtain the image from the input device 13 such as from a removable storage device which has stored the image or from a camera； or obtain the image from a network location by means of the communication device 15. As an example, in a transportation monitoring system, the image may be received from a digital camera arranged in a monitoring location.

The image may be a color image or a gray image. In addition, the image may be a single image or an image frame in a video. The color image may be represented by any suitable color model, such as RGB color model, HSL color model, CMYK color model, or the like. The image format may include but not limit to bmp, jpg, jpeg, tiff, gif, pcx, tga, exif, fpx, svg, psd, cdr, pcd, dxf, ufo, eps, ai, raw, or the like.

As an example, let I_i be the input color image represented by RGB color model. Denote the red, green, and blue channels at location (x, y) of the input color image I_i by R_i (x, y) , G_i (x, y) , andB_i (x, y) , respectively. According to the intensity-color decoupling strategy designed for illumination-albedo decoupling, the intensity L_i (x, y) at a location (x, y) on the input color image I_i is a quadratic combination of the three color channels:

L_i(x, y) ＝w_rR_i (x, y) +w_gG_i (x, y) +w_bB_i (x, y) (1)

with the weights w_r, w_g, and w_b proportional to the color channels themselves:

If the image is a gray image, then the gray value of a pixel may be used as the input intensity of the pixel. In other embodiments, if the image is represented by another color model, the input intensity may be determined by any suitable approaches. In addition, it is noted that the input intensity for each pixel in the image can be determined at block 201 and used as input in the following operations of method 200.

After determining the input intensity, the process 200 may proceed to block 212. At block 212, output intensity of the pixel may be determined based on mapped input intensity and mapped detail intensity. For example, a mapping fucntion can be determined, which can map the input intensity to the mapped input intensity, and another mapping fucntion can be determined, which can determine the detail intensity based on the input intensity and map it to the mapped detail intensity, such that the output intensity determined by the mapped input intensity and the mapped detail intensity is equal to or approach to the intensity of the pixel of a corresponding image without the certain object (such as haze) . These mapping fucntions may be learned from a dataset including pairs of images (one with the certain object, and the other one without the certain object) or determined empirically.

In an embodiment, referring to Figure 3, determination of the output intensity may comprise: at block 212-1, determining mapped input intensity based on the input intensity； at block 212-5, determining mapped detail intensity based on the input intensity； and at block 212-8, determining the output intensity of the pixel based on the mapped input intensity and the mapped detail intensity.

Specifically, the output intensity L_o (x, y) may be calculated by multiplying the mapped input intensity L_g (x, y) and the mapped detail intensity D (x, y) . The process may be described as follows.

It is found that nosies (such as haze or low-light) significantly changes image intensity (luminance) . So it necessary to transform the input intensity L_i (x, y) to mapped input intensity L_g (x, y) to compensate the change in intensity. Suppose the intensity mapping function is g: L_i (x, y) →L_g (x, y) . It is noted that many forms of intensity mapping functions may be possible. In an embodiment, the intensity mapping function may be a polynomial function g (which is referred as first polynomial function herein) :

where α_j denote coefficients, and each of the terms comprises respective coefficient α_j (which is referred as first coefficient herein) and the input intensity (L_i (x, y) ) ^j. In an embodiment, the number of terms of the first polynomial function is 4 (i.e., A＝3) . In other embodiments, A may be set as other suitable value for example empirically or by machine learning. The determination of the first coefficients will be discussed in detail in a subsequent section.

Moreover, the image detail may be weaken or eliminated by certain object such as haze or low-light. Therefore, transforming intensity by a simple function g may be not enough for recovering the image detail. To deal with the problem, the output intensity L_o (x, y) may be calculated based on the mapped input intensity I_g (x, y) and the mapped detail intensityD (x, y) . The mapped detail intensity D (x, y) may be calculated by any suitable approaches.

In an embodiment, refering to Figure 4, determination of mapped detail intensityD (x, y) may comprise: at block 212-5-1, smoothing the input intensity to obtain a detail intensity； at block 212-5-5, normalizing the detail intensity； and at block 212-5-8, determining the mapped detail intensity based on the normalized detail intensity by a second polynomial function.

Specifically, the detail intensity L_D (x, y) may be computed by subtracting the input intensity L_i (x, y) by its smoothed version L_s (x, y) at block 212-5-1:

L_D(x, y) ＝L_i (x, y) -L_s (x, y) (4)

where formula (5) is in fact the formula of a bilateral filter where the vector x＝ (x, y) stands for the location (x, y) in the image, the vector x′＝ (x′, y′) is a pixel inside a neighboring region Ω (x) of x, and σ_h and σ_i are predefined parameters. Note that the computation of L_s (x, y) is not limited to formula (5) and other smoothing filters such as guided filter are also possible.

The detail intensity L_D (x, y) may be normalized to L_d (x, y) at block 212-5-5 so that L_d (x, y) is within a standard range such as a range of [0, 255] . The mapped detail intensity D (x, y) may be obtained by a function f: L_d (x, y) →D (x, y) at block 212-5-8. It is noted that many forms of functions may be possible. In an embodiment, similar to fucntiong, fucntion f is also a polynomial function (which is referred as second polynomial function herein) :

where β_j denotes coefficients, and each of the terms comprises respective coefficient β_j (which is referred as second coefficient herein) and the normalized detail intensity (L_d (x, y) ) ^j. In an embodiment, the number of terms of the second polynomial function is 3 (i.e., B＝2) . In other embodiments, B may be set as other suitable value for example empirically or by machine learning. In another embodiment, removal of different type of object can use different first coefficients and second coefficients. The determination of the second coefficients will be discussed in connection with the first coefficients in a subsequent section.

Turn back to Figure 3, after determining the mapped input intensity and the mapped detail intensity, the output intensity of the pixel may be determined based on the mapped input intensity and the mapped detail intensity at block 212-8. It is noted that many forms of functions may be possible, which are used for determining the output intensity based on the mapped input intensity and the mapped detail intensity. In an embodiment, the product of the mapped input intensity L_g (x, y) and the mapped detail intensity D (x, y) may be used as the output intensity L_o (x, y) :

L_o(x, y) ＝D (x, y) L_g (x, y) . (7)

In an embodiment, if the image is a gray image, then the output intensity of the pixel may be used as the gray value of the output pixel. In this way, certain object in the gray image can be removed. In some cases, the image may be a color image. In this case, individual color channel of the output pixel should be determined. To determine the individual color channel of the output pixel, a transform coefficient may be used to determine the individual color channel of the output pixel.

In an embodiment, the transform coefficient may be determined based on the input intensity and the output intensity. For example, the transform coefficient c (x, y) may be obtained by a function c (x, y) ＝F (L_i (x, y) , L_o (x, y) ) , where F (L_i (x, y) , L_o (x, y)) may have any suitable forms. In an embodiment, the transform coefficient c (x, y) may be determined by the following formula:

An output pixel associated with the pixel may be determined based on the transform coefficient. In an embodiment, the image is a color image for example represented by RGB color model and the product of the transform coefficient and individual color channel of the pixel may be used as individual color channel of the output pixel:

where R_o (x, y) , G_o (x, y) , and B_o (x, y) are the red, green, and blue channels of the output image I_o. It is straightforward that the output intensity L_o (x, y) can be obtained by:

L_o(x, y) ＝c (x, y) L_i (x, y) (10)

In an embodiment, the saturation of the output pixel may be adjusted. For example, the saturation of the output pixel may be adjusted empirically or based on a learned saturation adjustment function or by other suitable approaches.

In an embodiment, a saturation mapping function q (S (x, y) ) may be used to correct the saturation S (x, y) of the output pixel. Specifically, suppose the output image is reprsented by RGB color model where R_o (x, y) , G_o (x, y) , and B_o (x, y) are the red, green, and blue channels of the output image I_o, R_o (x, y) , G_o (x, y) , and B_o (x, y) may be transformed into a HSI (Hue, Saturation, Intensity) color space and the resulting components of hue, saturation, and intensity are H (x, y) , S (x, y) , and I (x, y) respectively. The saturation mapping function may be defined as:

where γ_i, μ_i, and

are mapping parameters which may be predifined empirically or learned from a training dataset. q (S (x, y) ) may be referred as saturation coefficient.

The saturation S (x, y) may be mapped to output saturation S_o (x, y) by multiplying q (S (x, y)) :

S_o (x, y) ＝S (x, y) q (S (x, y)) . (12)

Finally, H (x, y) , S_o (x, y) , and I (x, y) may be transformed to RGB color space.

In at least one embodiment, the output pixel may be normalized such that the individual color channel of the output pixel is within a standard range, such as [0-255] . In is noted that the normalization can be performed at any stage of method 200.

The performance of certain object removal may be determined by the first coefficients (α_j, j＝1, ... , A) and the second coefficients (β_j, j＝1, ... , B) . In an embodiment, these coefficients may be determined empirically or by machine learning. In another embodiment, removal of different type of object can use same or different first coefficients and second coefficients. It is noted that removal of same or similar type of object can use same first coefficients and second coefficients. In other words, the first coefficients and second coefficients can be determined or updated for images with same or similar objects to be removed.

In an embodiment, the first coefficients and the second coefficients may be determined based on a minimum-reconstruction-error criterion for learning the optimal parameters. In the following, this embodiment will be discribed for example in the context of hazy images.

To begin with, pairs of images, for example one image with haze and the other one without haze in a pair of images, are collected. An objective function may be calculated across pairs of the images. Let

and

be the intensity of a hazy image

and its haze-free version

k＝1, ... , K.

may be transformed to

according to formula (7) . The objective function

is defined as:

Formula (13) may measure error of estimated

when it is used for approximating the ground-truth haze-free image

Substituting formulas (7) , (6) , and (3) into (13) yields

It is difficult to jointly minimize

with respect to

and

To tackle this problem, it can alternatively optimize

and

Specifically, the following α-step and β-step may be used for updating

and

respectively.

α-step. In this step,

are considered unchanged and

are considered variable. Consequently,

can be viewed as a const

The objective function

becomes

or sake of simplicity, subscript ‘i’ is used to index pixel (x, y) and omit the summation over k. Define

we have

where

and N is the number of pixels.

Compute the derivative of e (a) w. r. t. a and let the result to be zero, we have:

Therefore, the optimal solution is

a＝(BB′) ^-1By. (17)

β-step. In this step,

are considered unchanged and

are considered as variable. Hence,

is viewed as a const

The objective function

becomes

Subscript ‘i’ is also used to index pixel (x, y) and omit the summation over k. Define

we have

where

The solution of minimizing e (b) is

b＝(BB′) ^-1By. (20)

The algorithm iteratively run α-step and β-step until convergence. Finally, it can obtain the optimal first coefficients and the second coefficients.

It is noted that, above method steps such as 201, 212, 212-1, 212-5, 212-8, 212-5-1, 212-5-5, 212-5-8 and/or other steps can be performed on each pixel of the input image respectively to obtain the ouput image.

Figure 5 shows a flow chart depicting a process of dehazing in accordance with embodiments of the present disclosure. As shown in Figure 5, at block 501, obtaining an input image I_i which may be an input color image for example represented by RGB color model. Denote the red, green, and blue channels at location (x, y) of the input image I_i by R_i (x, y) , G_i (x, y) , andB_i (x, y) , respectively.

At block 505, input intensity L_i (x, y) of a pixel at a location (x, y) in the image I_i is determined.

At block 510, mapped input intensity L_g (x, y) may be determined based on the first polynomial function.

At block 515, the input intensity L_i (x, y) may be smoothed by a bilateral filter to obtain its smoothed version L_s (x, y) .

At block 520, detail intensity L_D (x, y) may be computed by subtracting the input intensity L_i (x, y) by its smoothed version L_s (x, y) .

At block 525, L_D (x, y) may be normalized to L_d (x, y) so that L_d (x, y) is in the range of [0, 255] .

At block 530, the mapped detail intensity D (x, y) may be obtained by the second polynomial function.

At block 535, the product of the mapped input intensity L_g (x, y) and the mapped detail intensity D (x, y) may be used as the output intensity L_o (x, y) .

At block 540, the dehazing (transform) coefficient c (x, y) may be determined by dividing the output intensity L_o (x, y) by the input intensity L_i (x, y) .

At block 545, the output pixel may be determined based on the transform coefficient.

At block 550, the saturation of the output pixel may be adjusted by the saturation mapping function.

At block 555, optionally, the output pixel may be normalized such that individual color channel of the output pixel is within a standard range, such as [0-255] .

Figure 6 shows some results of methods according to the embodiments and DCP. Figure 6 (a) and 6 (d) are two different hazy images which are both color images. Figure 6 (b) and 6 (e) are the results of methods according to the embodiments, and Figure 6 (c) and 6 (f) are the results of DCP. Comparing Figure 6 (b) with Figure 6 (c) , we can see that DCP incorrectly recovers greenish buildings in block 601. Figure 6 (e) and (f) show that DCP is unable to correctly recover the green leafs in block 602. Methods according to the embodiments are thus significantly better than DCP.

The proposed methods do not rely on any model (such as imaging or low-light model) whose parameters are difficult to be estimated exactly. It proposes to enhance both intensity and saturation by novel tone mapping functions whose parameters are learned from degraded images and their ground truth. Existing methods do not deal with image saturation with tone mapping. The form of tone mapping functions is different from existing ones and the manner of obtained the parameters of the tone mapping function is also different. The parameters of the proposed tone mapping are optimal whereas those of existing ones are empirically chosen. Both image intensity and saturation are explicitly recovered whereas existing methods merely consider intensity. As shown in Figure 6, the proposed methods are significantly better than existing method such as DCP.

According to an aspect of the disclosure it is provided an apparatus for removing object in an image. For same parts as in the previous embodiments, the description thereof may be omitted as appropriate. The apparatus may comprise means configured to carry out the processes described above. In an embodiment, the apparatus comprises means configured to determine input intensity of a pixel in an image； and means configured to determine output intensity of the pixel based on mapped input intensity and mapped detail intensity.

In an embodiment, the mapped input intensity is determined based on the input intensity and the mapped detail intensity is determined based on the input intensity.

In an embodiment, the mapped input intensity is determined based on the input intensity by a first polynomial function.

In an embodiment, the number of terms of the first polynomial function is 4, and each of the terms comprises respective first coefficient and the input intensity.

In an embodiment, the mapped detail intensity is determined by: smoothing the input intensity to obtain detail intensity； normalizing the detail intensity； and determining the mapped detail intensity based on the normalized detail intensity by a second polynomial function.

In an embodiment, the number of terms of the second polynomial function is 3, and each of the terms comprises respective second coefficient and the normalized detail intensity.

In an embodiment, the apparatus may further comprise means configured to determine the first coefficients and the second coefficients based on a minimum-reconstruction-error criterion.

In an embodiment, determination of the output intensity of the pixel based on the mapped input intensity and the mapped detail intensity comprises: using the product of the mapped input intensity and the mapped detail intensity as the output intensity.

In an embodiment, the image is a color image, and the apparatus may further comprise means configured to determine a transform coefficient based on the input intensity and the output intensity； and means configured to determine an output pixel associated with the pixel based on the transform coefficient.

In an embodiment, determination of the transform coefficient based on the input intensity and the output intensity comprises: using the result of dividing the input intensity by the output intensity as the transform coefficient.

In an embodiment, determination of the output pixel associated with the pixel based on the transform coefficient comprises: using the product of the transform coefficient and individual color channel of the pixel as individual color channel of the output pixel.

In an embodiment, the apparatus may further comprise means configured to adjust the saturation of the output pixel

In an embodiment, wherein adjustment of saturation comprises: transforming the output pixel into a hue, saturation, intensity color space； determining a saturation coefficient by

wherein S (x, y) is the saturation of the output pixel, q (S (x, y) ) is saturation coefficient, γ_i, μ_i, and

are mapping parameters； and determining output saturation S_o (x, y) by S_o (x, y) ＝S (x, y) q (S (x, y) ) .

In an embodiment, the apparatus may further comprise means configured to normalize the output pixel.

It is noted that any of the components of the apparatus described above can be implemented as hardware or software modules. In the case of software modules, they can be embodied on a tangible computer-readable recordable storage medium. All of the software modules (or any subset thereof) can be on the same medium, or each can be on a different medium, for example. The software modules can run, for example, on a hardware processor. The method steps can then be carried out using the distinct software modules, as described above, executing on a hardware processor.

Additionally, an aspect of the disclosure can make use of software running on a general purpose computer or workstation. Such an implementation might employ, for example, a processor, a memory, and an input/output interface formed, for example, by a display and a keyboard. The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor. The term “memory” is intended to include memory associated with a processor or CPU, such as, for example, RAM (random access memory) , ROM (read only memory) , a fixed memory device (for example, hard drive) , a removable memory device (for example, diskette) , a flash memory and the like. The processor, memory, and input/output interface such as display and keyboard can be interconnected, for example, via bus as part of a data processing unit. Suitable interconnections, for example via bus, can also be provided to a network interface, such as a network card, which can be provided to interface with a computer network, and to a media interface, such as a diskette or CD-ROM drive, which can be provided to interface with media.

Accordingly, computer software including instructions or code for performing the methodologies of the disclosure, as described herein, may be stored in associated memory devices (for example, ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (for example, into RAM) and implemented by a CPU. Such software could include, but is not limited to, firmware, resident software, microcode, and the like.

As noted, aspects of the disclosure may take the form of a computer program product embodied in a computer readable medium having computer readable program code embodied thereon. Also, any combination of computer readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (anon-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM) , a read-only memory (ROM) , an erasable programmable read-only memory (EPROM or Flash memory) , an optical fiber, a portable compact disc read-only memory (CD-ROM) , an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Computer program code for carrying out operations for aspects of the disclosure may be written in any combination of at least one programming language, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, component, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function (s) . It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It should be noted that the terms "connected, " "coupled, " or any variant thereof, mean any connection or coupling, either direct or indirect, between two or more elements, and may encompass the presence of one or more intermediate elements between two elements that are "connected" or "coupled" together. The coupling or connection between the elements can be physical, logical, or a combination thereof. As employed herein, two elements may be considered to be "connected" or "coupled" together by the use of one or more wires, cables and/or printed electrical connections, as well as by the use of electromagnetic energy, such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region and the optical region (both visible and invisible) , as several non-limiting and non-exhaustive examples.

In any case, it should be understood that the components illustrated in this disclosure may be implemented in various forms of hardware, software, or combinations thereof, for example, application specific integrated circuit (s) (ASICS) , functional circuitry, an appropriately programmed general purpose digital computer with associated memory, and the like. Given the teachings of the disclosure provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the components of the disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a, ” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising, ” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of another feature, integer, step, operation, element, component, and/or group thereof.

The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims

An apparatus, comprising:

at least one processor；

at least one memory including computer program code, the memory and the computer program code configured to, working with the at least one processor, cause the apparatus to perform at least the following:

determine input intensity of a pixel in an image； and

determine output intensity of the pixel based on mapped input intensity and mapped detail intensity.
The apparatus according to claim 1, wherein the mapped input intensity is determined based on the input intensity and the mapped detail intensity is determined based on the input intensity.
The apparatus according to claim 2, wherein the mapped input intensity is determined based on the input intensity by a first polynomial function.
The apparatus according to claim 3, wherein the number of terms of the first polynomial function is 4, and each of the terms comprises respective first coefficient and the input intensity.
The apparatus according to any one of claims 1-4, wherein the mapped detail intensity is determined by:

smoothing the input intensity to obtain a detail intensity；

normalizing the detail intensity； and

determining the mapped detail intensity based on the normalized detail intensity by a second polynomial function.
The apparatus according to claim 5, wherein the number of terms of the second polynomial function is 3, and each of the terms comprises respective second coefficient and the normalized detail intensity.
The apparatus according to claim 6, wherein the apparatus is further caused to determine the first coefficients and the second coefficients based on a minimum-reconstruction-error criterion.
The apparatus according to any one of claims 1-7, wherein determination of the output intensity of the pixel based on the mapped input intensity and the mapped detail intensity comprises: use the product of the mapped input intensity and the mapped detail intensity as the output intensity.
The apparatus according to any one of claims 1-8, wherein the image is a color image, and the apparatus is further caused to

determine a transform coefficient based on the input intensity and the output intensity； and

determine an output pixel associated with the pixel based on the transform coefficient.
The apparatus according to claim 9, wherein determination of the transform coefficient based on the input intensity and the output intensity comprises:

use the result of dividing the input intensity by the output intensity as the transform coefficient.
The apparatus according to any one of claims 9-10, wherein determination of the output pixel associated with the pixel based on the transform coefficient comprises:

use the product of the transform coefficient and individual color channel of the pixel as individual color channel of the output pixel.
The apparatus according to any one of claims 9-11, wherein the apparatus is further caused to adjust the saturation of the output pixel.
The apparatus according to claim 12, wherein adjustment of saturation comprises:

transform the output pixel into a hue, saturation, intensity color space；

determine a saturation coefficient by
wherein S (x, y) is the saturation of the output pixel, q (S (x, y) ) is saturation coefficient, γ_i, μ_i, and
are mapping parameters； and

determine output saturation S_o (x, y) by S_o (x, y) ＝S (x, y) q (S (x, y) ) .
The apparatus according to any one of claims 9-13, wherein the apparatus is further caused to normalize the output pixel.
A method, comprising:

determining input intensity of a pixel in an image； and

determining output intensity of the pixel based on mapped input intensity and mapped detail intensity.
The method according to claim 15, wherein the mapped input intensity is determined based on the input intensity and the mapped detail intensity is determined based on the input intensity.
The method according to claim 16, wherein the mapped input intensity is determined based on the input intensity by a first polynomial function.
The method according to claim 17, wherein the number of terms of the first polynomial function is 4, and each of the terms comprises respective first coefficient and the input intensity.
The method according to any one of claims 15-18, wherein the mapped detail intensity is determined by:

smoothing the input intensity to obtain a detail intensity；

normalizing the detail intensity； and

determining the mapped detail intensity based on the normalized detail intensity by a second polynomial function.
The method according to claim 19, wherein the number of terms of the second polynomial function is 3, and each of the terms comprises respective second coefficient and the normalized detail intensity.
The method according to claim 20, further comprising determining the first coefficients and the second coefficients based on a minimum-reconstruction-error criterion.
The method according to any one of claims 15-21, wherein determination of the output intensity of the pixel based on the mapped input intensity and the mapped detail intensity comprises: using the product of the mapped input intensity and the mapped detail intensity as the output intensity.
The method according to any one of claims 15-22, wherein the image is a color image, and the method further comprises:

determining a transform coefficient based on the input intensity and the output intensity； and

determine an output pixel associated with the pixel based on the transform coefficient.
The method according to claim 23, wherein determination of the transform coefficient based on the input intensity and the output intensity comprises:

using the result of dividing the input intensity by the output intensity as the transform coefficient.
The method according to any one of claims 23-24, wherein determination of the output pixel associated with the pixel based on the transform coefficient comprises:

using the product of the transform coefficient and individual color channel of the pixel as individual color channel of the output pixel.
The method according to any one of claims 23-25, further comprising adjusting the saturation of the output pixel
The method according to claim 26, wherein adjustment of saturation comprises:

transforming the output pixel into a hue, saturation, intensity color space；

determining a saturation coefficient by
wherein S (x, y) is the saturation of the output pixel, q (S (x, y) ) is saturation coefficient, γ_i, μ_i, and
are mapping parameters； and

determining output saturation S_o (x, y) by S_o (x, y) ＝S (x, y) q (S (x, y) ) .
The method according to any one of claims 23-27, further comprising: normalizing the output pixel.
An apparatus, comprising means configured to carry out the method according to any one of claims 15 to 28.
A computer program product embodied on a distribution medium readable by a computer and comprising program instructions which, when loaded into a computer, execute the method according to any one of claims 15 to 28.
A non-transitory computer readable medium having encoded thereon statements and instructions to cause a processor to execute a method according to any one of claims 15 to 28.