WO2023178542A1 - Image processing apparatus and method - Google Patents

Image processing apparatus and method

Info

Publication number
WO2023178542A1
Authority
WO
WIPO (PCT)
Prior art keywords
detected
objects
determining
image
processing apparatus
Prior art date
Application number
PCT/CN2022/082430
Other languages
French (fr)
Inventor
Marc Patrick ZAPF
Xinrun LI
Fangyun HU
Original Assignee
Robert Bosch Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch Gmbh filed Critical Robert Bosch Gmbh
Priority to PCT/CN2022/082430 priority Critical patent/WO2023178542A1/en
Publication of WO2023178542A1 publication Critical patent/WO2023178542A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/758 Involving statistics of pixels or of feature values, e.g. histogram matching
    • G06V10/759 Region-based matching
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the following disclosure relates to an image processing apparatus, to an image processing method, and to a corresponding computer readable medium.
  • One aspect of the disclosure provides an image processing apparatus including: a receiving module configured to receive an image captured by a mono camera; an object detecting module configured to detect one or more objects in the received image and create an object detection list including the one or more objects; a segmentation module configured to segment the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; a comparing module configured to perform a comparison based on the one or more detected objects and the one or more FG objects; and a determining module configured to determine at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison, and to determine an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  • Another aspect of the disclosure provides an image processing method including: receiving an image captured by a mono camera; detecting one or more objects in the received image; creating an object detection list including the one or more objects; segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; performing a comparison based on the one or more detected objects and the one or more FG objects; determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  • Yet another aspect of the disclosure provides a non-transitory computer readable medium with instructions stored therein which, when executed, cause a processor to carry out the steps including: receiving an image captured by a mono camera; detecting one or more objects in the received image; creating an object detection list including the one or more objects; segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; performing a comparison based on the one or more detected objects and the one or more FG objects; determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  • Embodiments of the disclosure determine FP and TP objects of the detected objects as well as any possible FN object, and therefore provide more accurate information on object detections.
  • Figure 1 is a block diagram of an image processing apparatus according to an example of the disclosure.
  • Figure 2 illustrates an image processing procedure implemented by the image processing apparatus of Figure 1 according to an example of the disclosure.
  • Figures 3-8B illustrate the working principle of the image processing apparatus of Figure 1.
  • Figure 9 is a flowchart of an image processing method according to an example of the disclosure.
  • Object detection has become increasingly important in a variety of technology fields in recent years, as the ability to track objects in images has become increasingly significant in many applications involving security and artificial intelligence technologies (e.g., a collision avoidance function in self-driving vehicles) .
  • Examples of the disclosure relate to apparatus and method for processing images captured by mono cameras to recognize FP and TP objects as well as any possible FN object and modify the object detection list accordingly such that more trustworthy object detections can be acquired. Embodiments of the present disclosure will be described with reference to the figures.
  • One aspect of the disclosure relates to an image processing apparatus (hereafter referred to as “processing apparatus” ) for processing images captured by a mono camera.
  • the processing apparatus may be integrated with the mono camera to form a part of the mono camera.
  • the processing apparatus may also be separated from the mono camera to be an independent apparatus.
  • the mono camera may be situated in an infrastructure or a parked vehicle and thus it is stationary.
  • the mono camera may also be mounted in a travelling vehicle or wearable device carried by a walking person and thus it is moving.
  • “Mono camera” or “mono cameras” refer to single camera (s) as opposed to stereo setups or depth cameras which can provide depth information of a scene.
  • the processing apparatus receives images from a mono camera that is configured for capturing images and/or a storage unit for storing the captured images, and then processes the received images.
  • the processing apparatus may process images at camera run-time or deal with images stored in the storage unit.
  • the captured images are sent to the processing apparatus instantly and processed in real time.
  • the captured images are stored in the storage unit and then obtained and processed by the processing apparatus when the processing is required.
  • Figure 1 illustrates an image processing apparatus 10 according to an example of the disclosure.
  • the processing apparatus 10 includes a receiving module 12, an object detecting module 13, a segmentation module 14, a comparing module 15, and a determining module 16.
  • the processing apparatus 10 further includes an assigning module 17.
  • the processing apparatus 10 further includes a calibrating module 11.
  • Those modules of the processing apparatus 10 are named functionally. Those names are not intended to limit physical positions of the modules.
  • the modules may be provided in the same chip or circuit or provided in different chips or circuits.
  • the processing apparatus 10 may be implemented by means of hardware or software or a combination of hardware and software, including a non-transitory computer readable medium stored in a memory and implemented as instructions executed by a processor.
  • a processor may be implemented in application-specific integrated circuit (ASIC) , digital signal processor (DSP) , data signal processing device (DSPD) , programmable logic device (PLD) , field programmable gate array (FPGA) , processor, controller, microcontroller, microprocessor, electronic unit, or a combination thereof.
  • the part implemented by software may include microcode, program code or code segments.
  • the software may be stored in a machine readable storage medium, such as a memory.
  • the processing apparatus 10 may include a memory and a processor.
  • the memory includes instructions that, when executed by the processor, cause the processor to perform the image processing method according to examples of the disclosure.
  • FIG. 2 illustrates an exemplary image processing procedure 200 which can be implemented by the processing apparatus 10.
  • the image processing procedure 200 is provided in the order shown in Figure 2. However, other orders may be provided and/or steps (blocks) may be repeated or performed in parallel.
  • the calibrating module 11 calibrates camera parameters of a mono camera. For example, after placement of the mono camera at its intended position (e.g., the mono camera is installed on a vehicle or infrastructure), the calibrating module 11 calibrates camera parameters of the mono camera.
  • the calibrating may include estimating parameters of a lens and an imaging sensor of the mono camera and then obtaining intrinsic parameters, extrinsic parameters and distortion coefficients of the mono camera.
  • the calibrating module 11 can be integrated with the lens of the mono camera.
  • the receiving module 12 receives an image (I) captured by the mono camera.
  • an example of the captured image (I) is shown in Figure 3 and examples of the following steps will be described with reference to the image (I) .
  • the object detecting module 13 detects one or more objects in the image (I) and creates an object detection list including the one or more detected objects.
  • the detecting of the one or more objects can include locating the one or more objects in the image (I) and further include predicting a bounding box for each of the one or more objects. As shown in the image (I_1) of Figure 4, the object detecting module 13 locates objects 1-2 and predicts a bounding box for each object.
  • the detecting of the one or more objects includes applying an object detection algorithm to the received image (I) to locate the one or more objects in the image (I) and predict a bounding box for each of the one or more objects.
  • the object detecting module 13 may use one or more neural networks of the object detection algorithm to detect instances of objects from a particular object class (e.g., human beings, bicycles, cars) within the image (I) .
  • the neural networks are trained on data sets of training images.
  • the trained neural networks (e.g., image classifiers) are then fed the image (I) as an input and output a prediction of a bounding box and a class label for each object in the image (I).
  • the bounding box may refer to a set of coordinates of a rectangular box that fully encloses an object.
  • a smaller bounding box for a given object is preferred as it more precisely indicates the location of the object in the image (I) , as compared to a larger bounding box for the same object.
  • the object detection algorithm may rely on a technique referred to as a sliding window. In the sliding window technique, a window moves across the image (I) , and, at various intervals, the region within the window is analyzed using an image classifier to determine if it contains an object.
  • the segmentation module 14 performs a segmentation of the image (I) to acquire a segmented image including a single background (BG) and one or more foreground (FG) objects, i.e., sets of pixels of the single BG and the one or more FG objects.
  • the segmentation module 14 yields a binary image, black being the background and white being the foreground.
  • Each of the instances of the FG objects A~C in the received image (I) is segmented as an individual object.
  • the segmentation module 14 can also yield the segmented image with a continuous value (e.g., float or integer) to describe the probability of a pixel being BG or FG.
  • the segmentation of the image (I) includes identifying, for each pixel of the image (I) , a belonging object of the one or more FG objects and labeling each identified object with an object ID that is a unique identification of the object.
  • the segmentation module 14 can use one or more neural networks (e.g., CNN and FCN) to perform the segmentation of the image (I).
  • the identifying includes determining which pixels belong to the BG and which pixels belong to the one or more FG objects.
  • the segmentation module 14 can use a segmentation mask to determine which pixels are in the BG and which pixels are in the FG.
  • the segmentation module 14 can use a segmentation map to determine which pixels are in the BG and which pixels are in the FG.
  • the segmentation map is formed such that it can indicate the BG and each of the individual FG objects, such as 0 for the BG, 1 for an FG object, 2 for another FG object, and so forth.
  • the segmentation module 14 can perform background-foreground segmentation in which the BG is divided from the FG. This can form a segmentation mask that indicates which pixels are in the background and which are in the foreground.
  • the segmentation mask may be a binary mask with 0 referring to the background and 1 referring to the foreground.
  • the one or more objects are defined to form multiple individual foregrounds, that is to say, each FG object is divided as a single foreground against the single large background.
  • the segmentation module 14 can use two trained classifiers i.e., a first classifier and a second classifier, to segment the image (I) into the BG and the one or more FG objects.
  • the first classifier is used to classify pixels of the image (I) as BG or FG.
  • the pixels of the image (I) are input to the first classifier, and the first classifier outputs classified results such as 1) a pixel with a high background probability and a low foreground probability to be the background; and 2) a pixel with a low background probability and a high foreground probability to be the foreground.
  • the second classifier is used to classify FG pixels (i.e., pixels that are classified as FG by the first classifier) as a respective one of the FG objects.
  • the FG pixels are input to the second classifier, and the second classifier outputs classified results such as an object ID that each of the FG pixels belongs to.
  • segmentation in the disclosure refers to an indication of a plurality of image pixels portraying one or more objects.
  • the segmentation can include a segmentation boundary (e.g., a boundary line or curve indicating an edge of each of the FG objects) or a segmentation mask (e.g., a binary mask identifying pixels corresponding to an FG object) .
  • the comparing module 15 compares the detected one or more objects with the segmented one or more FG objects. In other words, comparing module 15 compares the detection results from the object detecting module 13 and the segmentation results from the segmentation module 14.
  • the determining module 16 determines at least one of the detected objects to be true positive (TP) or false positive (FP) and also determines an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module. That is to say, an FN object is the object that is present in the received image (I) and the detecting module 13 failed to detect the object.
  • the procedure 200 may proceed to block 270.
  • the determining module 16 directly modifies the object detection list according to the TP, FP and FN determinations.
  • the procedure 200 may also proceed to blocks 280 and 290.
  • the assigning module 17 assigns confidence values to the detected objects according to the TP, FP and FN determinations.
  • the determining module 16 modifies the object detection list according to the assigned confidence values.
  • the confidence values of the detected objects are between 0 and 1.
  • the confidence value of a detected object is representative of a probability that the object is present in the image (I) . For example, the higher the confidence value of the detected object, the more we believe the detected object exists in the image (I) .
  • Figure 6 shows exemplary TP, FP and FN determinations according to the comparison of the detected objects 1-2 in the image (I_1) with the segmented FG objects A-C in the image (I_2).
  • the detected object 1 in the image (I) corresponds to the segmented FG object A in the image (I_2) and thus is determined to be TP.
  • the detected object 2 in the image (I) does not correspond to any FG object in the image (I_2) and thus is determined to be FP.
  • the FG objects B and C are present in the image (I) and failed to be detected by the detecting module 13, and thus the determining module 16 determines there are FN objects that exist in the image (I) but are not detected by the detecting module.
  • the description of “the detected object 1 corresponding to the FG object A” means that the object 1 detected by the detecting module 13 has almost the same location and size as that of the FG object A segmented by the segmentation module 14.
  • the TP, FP and FN determinations are implemented by means of the bounding boxes of the detected objects.
  • the comparing module 15 overlays the bounding boxes onto the segmented image to overlay corresponding areas of the BG and FG objects (block 250) .
  • the comparing module 15 overlays the bounding boxes of the detected objects 1 and 2 onto the segmented image (I_2) by locating those bounding boxes at corresponding positions in the segmented image (I_2) .
  • the corresponding positions are a series of coordinates of boundaries of those bounding boxes.
  • one bounding box overlays the FG object A and another bounding box overlays the BG (i.e., the overlaid bounding box contains the BG with the symbol “X” ) .
  • the determining module 16 counts a number of FG pixels and a number of BG pixels within a corresponding bounding box that corresponds to the at least one detected object; and determines the at least one detected object to be TP or FP based on the counted numbers of FG pixels and BG pixels (block 260) . Specifically, the determining module 16 determines the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding bounding box is greater than or equal to a ratio threshold. The determining module 16 determines the at least one detected object to be FP if the ratio is smaller than the ratio threshold.
  • the ratio threshold can be pre-determined by experiences and/or model calculations.
  • the ratio threshold may be adjusted according to scenarios.
  • the determining module 16 also determines that the object detecting module 13 fails to detect an object that is present in the received image (I) and thus determines the object that fails to be detected is an FN object, if an FG object is not overlaid by any bounding box.
  • a ratio of the number of FG pixels to the number of BG pixels is greater than the ratio threshold, and thus the corresponding detected object 1 is determined to be TP.
  • a ratio of the number of FG pixels to the number of BG pixels is smaller than the ratio threshold, and thus the corresponding detected object 2 is determined to be FP.
  • the FG objects B and C are present in the segmented image but not overlaid by any bounding box, and thus were not detected by the detecting module 13, and the determining module 16 determines there are FN objects that exist in the received image but are not detected by the detecting module.
  • the TP, FP and FN determinations are implemented by means of respective sets of position points of the detected objects.
  • each of the detected objects includes a set of position points.
  • the sets of position points are formed by raw data output from the mono camera.
  • the comparing module 15 marks the sets of position points of the one or more detected objects onto the segmented image to overlay corresponding areas of the BG and FG objects.
  • the comparing module 15 marks the sets of position points onto the segmented image by locating those sets of position points at corresponding positions in the segmented image (I_2) .
  • the comparing module 15 sets a central point of each of the sets of position points and a checking area around the central point (block 250) .
  • a central point of a set of position points of an object is the point that is located at the central position of the set of position points.
  • the size of the checking area around the central point may be smaller than that of the object.
  • the boundary of the checking area is within (surrounded by) the boundary of the object.
  • the size of the checking area around the central point may be a little larger than that of the object. At least a part of the boundary of the checking area is beyond the boundary of the object.
  • the comparing module 15 sets a central point P1 of the set of position points of the car and sets a checking area P11 around the central point P1.
  • the boundary of the checking area P11 is within the boundary of the car, and the size of the checking area P11 is smaller than that of the car.
  • the determining module 16 counts a number of FG pixels and a number of BG pixels within a corresponding checking area that corresponds to the at least one detected object; and determines the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels (block 260) . Specifically, the determining module 16 determines the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding checking area is greater than or equal to a ratio threshold. The determining module 16 determines the at least one detected object to be FP if the ratio is smaller than the ratio threshold.
  • the ratio threshold can be pre-determined by experiences and/or model calculations. The ratio threshold may be adjusted according to scenarios.
  • the determining module 16 also determines that the object detecting module 13 fails to detect an object that is present in the received image (I) and thus determines the object that fails to be detected is an FN object that does not appear in the object detection list, if an FG object is marked by position points the number of which is less than a lower limit threshold.
  • the lower limit threshold may be pre-determined by experiences and/or model calculations. The lower limit threshold may be adjusted according to scenarios.
  • the determining module 16 eliminates the at least one detected object from the object detection list if the at least one object is determined to be FP, and keeps the at least one detected object in the object detection list if the at least one detected object is determined to be TP.
  • the determining module 16 also adds any object that is determined to be FN into the object detection list.
  • the assigning module 17 assigns an initial confidence value to each of the detected objects. Then, the assigning module 17 adjusts the initial confidence values of the detected objects according to the TP and FP determinations. In an example, the initial confidence value of each detected object is set to be an intermediate value 0.5 between 0 and 1. The assigning module 17 increases the initial confidence value of a detected object towards 1 or to 1 and acquires a final confidence value of the detected object if the detected object is determined to be TP; and decreases the initial confidence value of the detected object towards 0 or to 0 and acquires the final confidence value of the detected object if the detected object is determined to be FP.
  • the assigning module 17 assigns a first confidence value to each of the detected objects.
  • the first confidence value can be between 0 and 1 and assigned based on how much trust is given beforehand to the detected results, depending on the properties of the algorithms used by the object detecting module 13.
  • the assigning module 17 further assigns a second confidence value to each of the detected objects according to the TP and FP determinations, and then calculates a final confidence value of each of the detected objects based on the assigned first and second confidence values.
  • the second confidence value of a detected object will be between 0.5 and 1 if it is determined to be TP and between 0 and 0.5 if it is determined to be FP.
  • the assigning module 17 calculates a final confidence value of each detected object based on its first and second confidence values.
  • the calculation includes various rules, such as weighted averaging and multiplication.
  • a final confidence value of a detected object is calculated by a weighted combination of the first and second confidence values (a hedged sketch of one possible form is given after this list), where:
  • C i represents a final confidence value of one of the detected objects (OBJ i ) ;
  • C 1i represents the first confidence value of the object (OBJ i ) ;
  • k 1 represents the weight coefficient of the first confidence value C 1i ;
  • C 2i represents the second confidence value of the object (OBJ i ) ;
  • k 2 represents the weight coefficient of the second confidence value C 2i .
  • k 2 is greater than k 1 (e.g., k 2 is three times as much as k 1 ) .
  • the second confidence value is considered more important than the first confidence value in determining the final confidence value, and thus the second confidence value is assigned more weight than the first confidence value and contributes more to the final confidence value.
  • the determining module 16 eliminates the at least one detected object from the detection list if the final confidence value of the at least one detected object is lower than a confidence threshold, and keeps the at least one detected object in the detection list if the final confidence value of the at least one detected object is greater than or equal to the confidence threshold.
  • the determining module 16 also adds any object that is determined to be FN into the detection list.
  • the determining module 16 can determine each of the detected objects in the detection list to be TP or FP using similar solutions as described above. Further, the determining module 16 can determine the elimination or keeping of each detected object using similar solutions as described above.
  • An image processing method 900 according to an example of the disclosure is schematically shown in Figure 9 and comprises the following steps.
  • step S910 an image captured by a mono camera is received.
  • step S920 one or more objects are detected in the received image.
  • step S930 an object detection list is created.
  • the object detection list includes the detected one or more objects.
  • step S940 the received image is segmented into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image.
  • step S950 a comparison is performed based on the one or more detected objects and the one or more FG objects.
  • step S960 at least one detected object of the one or more detected objects is determined to be true positive (TP) or false positive (FP) based on the comparison.
  • step S970 an object is determined to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module.
  • Embodiments of the disclosure may be implemented in a non-transitory computer readable medium.
  • the non-transitory computer readable medium may include instructions that, when executed, cause one or more processors to perform any operation of the method 900 according to examples of the disclosure.
  • the processors can be implemented using electronic hardware, computer software, or any combination thereof. Whether these processors are implemented as hardware or software will depend on the specific application and the overall design constraints imposed on the system.
  • a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as a microprocessor, a micro-controller, a digital signal processor (DSP) , a field programmable gate array (FPGA) , a programmable logic device (PLD) , state machine, gate logic, discrete hardware circuitry, and other suitable processing components configured to perform the various functions described in this disclosure.
  • the functions of a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as software executed by a microprocessor, a micro-controller,
  • Software should be considered broadly to represent instructions, instruction sets, code, code segments, program code, programs, subroutines, software modules, applications, software applications, software packages, routines, subroutines, objects, running threads, processes, functions, and the like. Software can reside on computer readable medium.
  • Computer readable medium may include, for example, a memory, which may be, for example, a magnetic storage device (e.g., a hard disk, a floppy disk, a magnetic strip) , an optical disk, a smart card, a flash memory device, a random access memory (RAM) , a read only memory (ROM) , a programmable ROM (PROM) , an erasable PROM (EPROM) , an electrically erasable PROM (EEPROM) , a register, or a removable disk.
  • a memory is shown as being separate from the processor in various aspects presented in this disclosure, a memory may also be internal to the processor (e.g., a cache or a register) .
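The formula for the final confidence value is referenced above only through its variables. As a hedged sketch, the weighted combination below follows the stated variable definitions (C 1i, C 2i, k 1, k 2) and the statement that k 2 outweighs k 1; the normalisation by k 1 + k 2 and the example values k 1 = 1, k 2 = 3 are assumptions, not values from the disclosure.

```python
# Hedged sketch of blocks 280-290: fuse first and second confidence values into
# a final confidence C_i and filter the detection list against a threshold.
def final_confidence(c1, c2, k1=1.0, k2=3.0):
    # Weighted average; k2 > k1 so the TP/FP-based second value contributes more.
    return (k1 * c1 + k2 * c2) / (k1 + k2)

def filter_by_confidence(detections, first_conf, second_conf, conf_threshold=0.5):
    kept = []
    for det, c1, c2 in zip(detections, first_conf, second_conf):
        if final_confidence(c1, c2) >= conf_threshold:
            kept.append(det)     # keep detections at or above the threshold
    return kept                  # any FN objects would still be added separately
```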

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An image processing apparatus includes a receiving module configured to receive an image; an object detecting module configured to detect one or more objects in the received image and create an object detection list including the one or more objects; a segmentation module configured to segment the received image into a single background and one or more foreground objects to obtain a segmented image; a comparing module configured to perform a comparison based on the one or more detected objects and the one or more foreground objects; and a determining module configured to determine at least one detected object of the one or more detected objects to be true positive or false positive based on the comparison, and to determine an object to be false negative if the object exists in the received image but is not detected by the object detecting module based on the comparison.

Description

IMAGE PROCESSING APPARATUS AND METHOD
FIELD
The following disclosure relates to an image processing apparatus, to an image processing method, and to a corresponding computer readable medium.
BACKGROUND
Cameras are widely used in many applications such as surveillance systems and traffic monitoring systems. In most scenarios, a mono camera is used due to its characteristic properties such as simple structure and low cost. However, the mono camera produces pixel images in 2D image space, and those pixel images do not provide any information on how far away background (BG) features and foreground (FG) features in the images are from the imaging camera. Thus, inferring from the pixel position of an object in the 2D image space to the physical position of the object in the 3D world space is not trivial. In this case, the detection of objects in the 2D image space may be inaccurate.
SUMMARY
One aspect of the disclosure provides an image processing apparatus including: a receiving module configured to receive an image captured by a mono camera; an object detecting module configured to detect one or more objects in the received image and create an object detection list including the one or more objects; a segmentation module configured to segment the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; a comparing module configured to perform a comparison based on the one or more  detected objects and the one or more FG objects; and a determining module configured to determine at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison, and to determine an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
Another aspect of the disclosure provides an image processing method including: receiving an image captured by a mono camera; detecting one or more objects in the received image; creating an object detection list including the one or more objects; segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; performing a comparison based on the one or more detected objects and the one or more FG objects; determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
Yet another aspect of the disclosure provides a non-transitory computer readable medium with instructions stored therein which, when executed, cause a processor to carry out the steps including: receiving an image captured by a mono camera; detecting one or more objects in the received image; creating an object detection list including the one or more objects; segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; performing a comparison based on the one or more detected objects and the one or more FG objects; determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
Embodiments of the disclosure determine FP and TP objects of the detected objects as well as any possible FN object, and therefore provide more accurate information on object detections.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of an image processing apparatus according to an example of the disclosure.
Figure 2 illustrates an image processing procedure implemented by the image processing apparatus of Figure 1 according to an example of the disclosure.
Figures 3-8B illustrate the working principle of the image processing apparatus of Figure 1.
Figure 9 is a flowchart of an image processing method according to an example of the disclosure.
DETAILED DESCRIPTION
Object detection has become increasingly important in a variety of technology fields in recent years, as the ability to track objects in images has become increasingly significant in many applications involving security and artificial intelligence technologies (e.g., a collision avoidance function in self-driving vehicles) . Examples of the disclosure relate to apparatus and method for processing images captured by mono cameras to  recognize FP and TP objects as well as any possible FN object and modify the object detection list accordingly such that more trustworthy object detections can be acquired. Embodiments of the present disclosure will be described with reference to the figures.
One aspect of the disclosure relates to an image processing apparatus (hereafter referred to as “processing apparatus” ) for processing images captured by a mono camera. The processing apparatus may be integrated with the mono camera to form a part of the mono camera. The processing apparatus may also be separated from the mono camera to be an independent apparatus. The mono camera may be situated in an infrastructure or a parked vehicle and thus it is stationary. The mono camera may also be mounted in a travelling vehicle or wearable device carried by a walking person and thus it is moving.
“Mono camera” or “mono cameras” , as used herein, refer to single camera (s) as opposed to stereo setups or depth cameras which can provide depth information of a scene.
The processing apparatus according to an example of the disclosure receives images from a mono camera that is configured for capturing images and/or a storage unit for storing the captured images, and then processes the received images. The processing apparatus may process images at camera run-time or deal with images stored in the storage unit. In an example, the captured images are sent to the processing apparatus instantly and processed in real time. In another example, the captured images are stored in the storage unit and then obtained and processed by the processing apparatus when the processing is required.
Figure 1 illustrates an image processing apparatus 10 according to an example of the disclosure. With reference to Figure 1, the processing apparatus 10 includes a receiving module 12, an object detecting module 13, a segmentation module 14, a comparing module 15, and a determining module 16. In an example, the processing apparatus 10 further includes an assigning module 17. In an example, the processing apparatus 10 further includes a calibrating module 11. Those modules of the processing apparatus 10 are named functionally. Those names are not intended to limit physical positions of the modules. For example, the modules may be provided in the same chip or circuit or provided in different chips or circuits.
The processing apparatus 10 may be implemented by means of hardware or software or a combination of hardware and software, including a non-transitory computer readable medium stored in a memory and implemented as instructions executed by a processor. Regarding the part implemented by means of hardware, it may be implemented in application-specific integrated circuit (ASIC) , digital signal processor (DSP) , data signal processing device (DSPD) , programmable logic device (PLD) , field programmable gate array (FPGA) , processor, controller, microcontroller, microprocessor, electronic unit, or a combination thereof. The part implemented by software may include microcode, program code or code segments. The software may be stored in a machine readable storage medium, such as a memory.
In an example, the processing apparatus 10 may include a memory and a processor. The memory includes instructions that, when executed by the processor, cause the processor to perform the image processing method according to examples of the disclosure.
Figure 2 illustrates an exemplary image processing procedure 200 which  can be implemented by the processing apparatus 10. The image processing procedure 200 is provided in the order shown in Figure 2. However, other orders may be provided and/or steps (blocks) may be repeated or performed in parallel.
In block 210, the calibrating module 11 calibrates camera parameters of a mono camera. For example, after placement of the mono camera at its intended position (e.g., the mono camera is installed on a vehicle or infrastructure), the calibrating module 11 calibrates camera parameters of the mono camera. The calibrating may include estimating parameters of a lens and an imaging sensor of the mono camera and then obtaining intrinsic parameters, extrinsic parameters and distortion coefficients of the mono camera. With this calibration, images provided by the mono camera can reflect the real-world scene more accurately. In an example, the calibrating module 11 can be integrated with the lens of the mono camera.
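The disclosure does not tie the calibration in block 210 to a particular toolchain. As a hedged illustration only, the sketch below assumes an OpenCV-style checkerboard calibration; the function name, board size and square size are hypothetical and not taken from the disclosure.

```python
# Hedged sketch of block 210: estimating intrinsic parameters and distortion
# coefficients of a mono camera from checkerboard images (OpenCV assumed).
import cv2
import numpy as np

def calibrate_mono_camera(image_paths, board_size=(9, 6), square_size=0.025):
    # 3D checkerboard corner coordinates in the board's own frame (Z = 0 plane).
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    objp *= square_size

    obj_points, img_points, image_size = [], [], None
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        image_size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    # Intrinsic matrix K, distortion coefficients, and per-view extrinsics.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    return K, dist, rvecs, tvecs
```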
In block 220, the receiving module 12 receives an image (I) captured by the mono camera. For clarity, an example of the captured image (I) is shown in Figure 3 and examples of the following steps will be described with reference to the image (I) .
In block 230, the object detecting module 13 detects one or more objects in the image (I) and creates an object detection list including the one or more detected objects. The detecting of the one or more objects can include locating the one or more objects in the image (I) and further include predicting a bounding box for each of the one or more objects. As shown in the image (I_1) of Figure 4, the object detecting module 13 locates objects 1-2 and predicts a bounding box for each object.
In an example, the detecting of the one or more objects includes applying an object detection algorithm to the received image (I) to locate the one or more objects in the image (I) and predict a bounding box for each of the one or more objects.
In this example, the object detecting module 13 may use one or more neural networks of the object detection algorithm to detect instances of objects from a particular object class (e.g., human beings, bicycles, cars) within the image (I). The neural networks are trained on data sets of training images. The trained neural networks (e.g., image classifiers) are then fed the image (I) as an input and output a prediction of a bounding box and a class label for each object in the image (I).
The bounding box may refer to a set of coordinates of a rectangular box that fully encloses an object. A smaller bounding box for a given object is preferred as it more precisely indicates the location of the object in the image (I), as compared to a larger bounding box for the same object. In an example of predicting a bounding box, the object detection algorithm may rely on a technique referred to as a sliding window. In the sliding window technique, a window moves across the image (I), and, at various intervals, the region within the window is analyzed using an image classifier to determine if it contains an object.
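As a hedged sketch of the sliding-window variant described above (not the only detection algorithm the disclosure allows), the following assumes a hypothetical `classifier` callable that returns a class label and a score for an image patch; the window size, stride and score threshold are illustrative values only.

```python
# Hedged sketch of block 230: sliding-window object detection producing an
# object detection list of (bounding box, label, score) entries.
def detect_objects(image, classifier, window=(64, 128), stride=16, score_min=0.5):
    h, w = image.shape[:2]
    detection_list = []
    for y in range(0, h - window[1] + 1, stride):
        for x in range(0, w - window[0] + 1, stride):
            patch = image[y:y + window[1], x:x + window[0]]
            label, score = classifier(patch)   # hypothetical classifier interface
            if label is not None and score >= score_min:
                # Bounding box stored as (x_min, y_min, x_max, y_max).
                detection_list.append(((x, y, x + window[0], y + window[1]), label, score))
    return detection_list
```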
In block 240, the segmentation module 14 performs a segmentation of the image (I) to acquire a segmented image including a single background (BG) and one or more foreground (FG) objects, i.e., sets of pixels of the single BG and the one or more FG objects. As shown in the image (I_2) of Figure 5, the segmentation module 14 yields a binary image, black being the  background and white being the foreground. Each of the instances of the FG objects A~C in the received image (I) is segmented as an individual object. The segmentation module 14 can also yield the segmented image with a continuous value (e.g., float or integer) to describe the probability of a pixel being BG or FG.
In an example, the segmentation of the image (I) includes identifying, for each pixel of the image (I), a belonging object of the one or more FG objects and labeling each identified object with an object ID that is a unique identification of the object. The segmentation module 14 can use one or more neural networks (e.g., CNN and FCN) to perform the segmentation of the image (I).
In this example, the identifying includes determining which pixels belong to the BG and which pixels belong to the one or more FG objects. In the case that the segmented image (I_2) includes only one FG object, the segmentation module 14 can use a segmentation mask to determine which pixels are in the BG and which pixels are in the FG. In the case that the segmented image (I_2) includes multiple FG objects, the segmentation module 14 can use a segmentation map to determine which pixels are in the BG and which pixels are in the FG. The segmentation map is formed such that it can indicate the BG and each of the individual FG objects, such as 0 for the BG, 1 for an FG object, 2 for another FG object, and so forth.
In this example, the segmentation module 14 can perform background-foreground segmentation in which the BG is divided from the FG. This can form a segmentation mask that indicates which pixels are in the background and which are in the foreground. For example, the segmentation mask may be a binary mask with 0 referring to the  background and 1 referring to the foreground. The one or more objects are defined to form multiple individual foregrounds, that is to say, each FG object is divided as a single foreground against the single large background.
In this example, the segmentation module 14 can use two trained classifiers, i.e., a first classifier and a second classifier, to segment the image (I) into the BG and the one or more FG objects. The first classifier is used to classify pixels of the image (I) as BG or FG. For example, the pixels of the image (I) are input to the first classifier, and the first classifier outputs classified results such as 1) a pixel with a high background probability and a low foreground probability to be the background; and 2) a pixel with a low background probability and a high foreground probability to be the foreground. The second classifier is used to classify FG pixels (i.e., pixels that are classified as FG by the first classifier) as a respective one of the FG objects. For example, the FG pixels are input to the second classifier, and the second classifier outputs classified results such as an object ID that each of the FG pixels belongs to.
It is noted that the “segmentation” in the disclosure refers to an indication of a plurality of image pixels portraying one or more objects. For example, the segmentation can include a segmentation boundary (e.g., a boundary line or curve indicating an edge of each of the FG objects) or a segmentation mask (e.g., a binary mask identifying pixels corresponding to an FG object) .
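One way the two-stage segmentation described above could be realized, sketched here only as a hedged assumption, is to threshold per-pixel FG probabilities and then assign object IDs by connected-component labelling; the 0.5 threshold and the use of OpenCV are assumptions, not values or tools named in the disclosure.

```python
# Hedged sketch of block 240: from per-pixel FG probabilities to a segmentation
# map in which 0 marks the single BG and 1, 2, ... mark individual FG objects.
import cv2
import numpy as np

def build_segmentation_map(fg_probability, fg_threshold=0.5):
    # First stage: classify each pixel as BG (0) or FG (1).
    binary_mask = (fg_probability >= fg_threshold).astype(np.uint8)
    # Second stage: assign a unique object ID to each connected FG region.
    num_labels, segmentation_map = cv2.connectedComponents(binary_mask)
    return segmentation_map  # background is label 0, FG objects are 1..num_labels-1
```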
In block 250, the comparing module 15 compares the detected one or more objects with the segmented one or more FG objects. In other words, comparing module 15 compares the detection results from the object detecting module 13 and the segmentation results from the segmentation  module 14.
In block 260, the determining module 16 determines at least one of the detected objects to be true positive (TP) or false positive (FP) and also determines an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module. That is to say, an FN object is the object that is present in the received image (I) and the detecting module 13 failed to detect the object.
Then, the procedure 200 may proceed to block 270. In block 270, the determining module 16 directly modifies the object detection list according to the TP, FP and FN determinations. The procedure 200 may also proceed to  blocks  280 and 290. In block 280, the assigning module 17 assigns confidence values to the detected objects according to the TP, FP and FN determinations. In block 290, the determining module 16 modifies the object detection list according to the assigned confidence values. The confidence values of the detected objects are between 0 and 1. The confidence value of a detected object is representative of a probability that the object is present in the image (I) . For example, the higher the confidence value of the detected object, the more we believe the detected object exists in the image (I) .
Figure 6 shows exemplary TP, FP and FN determinations according to the comparison of the detected objects 1-2 in the image (I_1) with the segmented FG objects A-C in the image (I_2). As shown in Figure 6, the detected object 1 in the image (I) corresponds to the segmented FG object A in the image (I_2) and thus is determined to be TP. The detected object 2 in the image (I) does not correspond to any FG object in the image (I_2) and thus is determined to be FP. The FG objects B and C are present in the image (I) but are not detected by the detecting module 13, and thus the determining module 16 determines there are FN objects that exist in the image (I) but are not detected by the detecting module. That is to say, with regard to segmented FG objects B and C, no corresponding detected objects can be found, and thus the determining module 16 suspects FN objects in the detection list. Since a non-detected object does not yet exist in the detection list, we can only infer that one should be there. The detection results incorrectly indicate the absence of the objects B and C that are actually present.
In examples of the disclosure, the description of “the detected object 1 corresponding to the FG object A” means that the object 1 detected by the detecting module 13 has almost the same location and size as that of the FG object A segmented by the segmentation module 14.
Examples of the TP, FP and FN determinations are described below.
In an example, the TP, FP and FN determinations are implemented by means of the bounding boxes of the detected objects. In this example, the comparing module 15 overlays the bounding boxes onto the segmented image to overlay corresponding areas of the BG and FG objects (block 250) .
With reference to Figure 7, the comparing module 15 overlays the bounding boxes of the detected objects 1 and 2 onto the segmented image (I_2) by locating those bounding boxes at corresponding positions in the segmented image (I_2) . The corresponding positions, for example, are a series of coordinates of boundaries of those bounding boxes. As shown in Figure 7, one bounding box overlays the FG object A and another bounding box overlays the BG (i.e., the overlaid bounding box contains the BG with  the symbol “X” ) .
Then, the determining module 16 counts a number of FG pixels and a number of BG pixels within a corresponding bounding box that corresponds to the at least one detected object; and determines the at least one detected object to be TP or FP based on the counted numbers of FG pixels and BG pixels (block 260) . Specifically, the determining module 16 determines the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding bounding box is greater than or equal to a ratio threshold. The determining module 16 determines the at least one detected object to be FP if the ratio is smaller than the ratio threshold. The ratio threshold can be pre-determined by experiences and/or model calculations. The ratio threshold may be adjusted according to scenarios. The determining module 16 also determines that the object detecting module 13 fails to detect an object that is present in the received image (I) and thus determines the object that fails to be detected is an FN object, if an FG object is not overlaid by any bounding box.
With continued reference to Figure 7, within the overlaid bounding box which contains the FG object A, a ratio of the number of FG pixels to the number of BG pixels is greater than the ratio threshold, and thus the corresponding detected object 1 is determined to be TP. Within the overlaid bounding box which contains the BG with symbol “X”, a ratio of the number of FG pixels to the number of BG pixels is smaller than the ratio threshold, and thus the corresponding detected object 2 is determined to be FP. The FG objects B and C are present in the segmented image but are not overlaid by any bounding box, and thus were not detected by the detecting module 13, and the determining module 16 determines there are FN objects that exist in the received image but are not detected by the detecting module.
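A hedged sketch of blocks 250-260 for the bounding-box variant follows: each predicted box is overlaid onto the segmentation map, FG and BG pixels inside it are counted, the ratio test yields TP or FP, and FG objects covered by no box are flagged as suspected FN. The default ratio threshold of 1.0 and the handling of boxes containing no BG pixels are assumptions.

```python
# Hedged sketch of blocks 250-260 (bounding-box variant): TP/FP per detection
# and suspected FN for segmented FG objects not overlaid by any bounding box.
import numpy as np

def classify_detections(detection_list, segmentation_map, ratio_threshold=1.0):
    statuses, covered_ids = [], set()
    for (x0, y0, x1, y1), label, score in detection_list:
        region = segmentation_map[y0:y1, x0:x1]     # overlay the box onto the segmented image
        fg_pixels = int(np.count_nonzero(region))   # pixels belonging to any FG object
        bg_pixels = int(region.size - fg_pixels)    # pixels belonging to the single BG
        ratio = fg_pixels / bg_pixels if bg_pixels else float("inf")
        statuses.append("TP" if ratio >= ratio_threshold else "FP")
        covered_ids.update(int(i) for i in np.unique(region) if i != 0)

    all_fg_ids = {int(i) for i in np.unique(segmentation_map) if i != 0}
    suspected_fn_ids = all_fg_ids - covered_ids     # FG objects with no overlaid box
    return statuses, suspected_fn_ids
```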
In an example, the TP, FP and FN determinations are implemented by means of respective sets of position points of the detected objects. In this example, each of the detected objects includes a set of position points. The sets of position points are formed from raw data output by the mono camera. The comparing module 15 marks the sets of position points of the one or more detected objects onto the segmented image to overlay corresponding areas of the BG and FG objects. For example, the comparing module 15 marks the sets of position points onto the segmented image by locating those sets of position points at corresponding positions in the segmented image (I_2). Then, the comparing module 15 sets a central point of each of the sets of position points and a checking area around the central point (block 250).
It is noted that a central point of a set of position points of an object is the point that is located at the central position of the set of position points. With reference to Figure 8A, the size of the checking area around the central point may be smaller than that of the object. The boundary of the checking area is within (surrounded by) the boundary of the object. With reference to Figure 8B, the size of the checking area around the central point may be a little larger than that of the object. At least a part of the boundary of the checking area is beyond the boundary of the object.
With reference to Figures 8A and 8B, the example of a set of position points of a car is shown. The comparing module 15 sets a central point P1 of the set of position points of the car and sets a checking area P11 around the central point P1. The boundary of the checking area P11 is within the boundary of the car, and the size of the checking area P11 is smaller than  that of the car.
Then, the determining module 16 counts a number of FG pixels and a number of BG pixels within a corresponding checking area that corresponds to the at least one detected object, and determines the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels (block 260). Specifically, the determining module 16 determines the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding checking area is greater than or equal to a ratio threshold, and determines the at least one detected object to be FP if the ratio is smaller than the ratio threshold. The ratio threshold can be pre-determined based on experience and/or model calculations, and may be adjusted according to the scenario. In addition, if an FG object is marked by position points the number of which is less than a lower limit threshold, the determining module 16 determines that the object detecting module 13 fails to detect an object that is present in the received image (I) and thus determines the object that fails to be detected to be an FN object that does not appear in the object detection list. The lower limit threshold may be pre-determined based on experience and/or model calculations, and may be adjusted according to the scenario.
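As a non-limiting illustration, a corresponding sketch for the checking-area variant of blocks 250 and 260 might look as follows. The square checking area, the default half-size and the default lower limit are illustrative assumptions; the disclosure only requires some checking area around the central point and some lower limit threshold.

```python
import numpy as np

def classify_by_checking_area(segmented, points, half_size=10, ratio_threshold=1.0):
    """Label one detection TP or FP from the FG/BG pixel counts inside its checking area.

    `points` is an (N, 2) array of (x, y) position points of the detected object.
    """
    cx, cy = points.mean(axis=0)                       # central point of the position points
    x0, y0 = int(cx) - half_size, int(cy) - half_size  # square checking area around the centre
    x1, y1 = int(cx) + half_size, int(cy) + half_size
    window = segmented[max(y0, 0):y1, max(x0, 0):x1]
    fg = int(np.count_nonzero(window))
    bg = int(window.size - fg)
    if bg == 0:
        return "TP"
    return "TP" if fg / bg >= ratio_threshold else "FP"

def suspect_false_negative(num_marking_points, lower_limit=3):
    """An FG object marked by fewer position points than the lower limit suggests an FN."""
    return num_marking_points < lower_limit
```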
In an example of block 270, the determining module 16 eliminates the at least one detected object from the object detection list if the at least one detected object is determined to be FP, and keeps the at least one detected object in the object detection list if the at least one detected object is determined to be TP. The determining module 16 also adds any object that is determined to be FN into the object detection list.
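A minimal sketch of the list update of block 270, assuming the detections and their TP/FP verdicts are held in parallel Python lists, could be:

```python
def apply_verdicts(detection_list, verdicts, fn_objects):
    """Drop FP detections, keep TP detections and append suspected FN objects (block 270)."""
    kept = [obj for obj, verdict in zip(detection_list, verdicts) if verdict == "TP"]
    return kept + list(fn_objects)
```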
In an example of block 280, the assigning module 17 assigns an initial confidence value to each of the detected objects. Then, the assigning module 17 adjusts the initial confidence values of the detected objects according to the TP and FP determinations. In an example, the initial confidence value of each detected object is set to an intermediate value of 0.5 between 0 and 1. The assigning module 17 increases the initial confidence value of a detected object towards 1 or to 1 to acquire a final confidence value of the detected object if the detected object is determined to be TP, and decreases the initial confidence value of the detected object towards 0 or to 0 to acquire the final confidence value of the detected object if the detected object is determined to be FP.
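For illustration, this adjustment of the initial confidence value could be sketched as follows; the step size of 0.4 is an assumed value and not taken from the disclosure.

```python
def adjust_confidence(verdict, initial=0.5, step=0.4):
    """Push the initial confidence of 0.5 towards 1 for a TP and towards 0 for an FP."""
    if verdict == "TP":
        return min(1.0, initial + step)   # increase towards (or to) 1
    return max(0.0, initial - step)       # decrease towards (or to) 0
```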
In another example of block 280, the assigning module 17 assigns a first confidence value to each of the detected objects. The first confidence value can be between 0 and 1 and is assigned based on how much trust is given beforehand to the detection results, depending on the properties of the algorithms used by the object detecting module 13. The assigning module 17 further assigns a second confidence value to each of the detected objects according to the TP and FP determinations, and then calculates a final confidence value of each of the detected objects based on the assigned first and second confidence values. In an embodiment, the second confidence value of a detected object is between 0.5 and 1 if the object is determined to be TP and between 0 and 0.5 if it is determined to be FP. Then, the assigning module 17 calculates the final confidence value of each detected object based on its first and second confidence values. The calculation can follow various rules, such as a weighted average or a multiplication. In an embodiment, the final confidence value of a detected object is calculated by the following formula:
C_i = k_1 * C_1i + k_2 * C_2i
where C_i represents the final confidence value of one of the detected objects (OBJ_i);
C_1i represents the first confidence value of the object (OBJ_i);
k_1 represents the weight coefficient of the first confidence value C_1i;
C_2i represents the second confidence value of the object (OBJ_i); and
k_2 represents the weight coefficient of the second confidence value C_2i.
In an example, k_2 is greater than k_1 (e.g., k_2 is three times as large as k_1). In this way, the second confidence value is considered more important than the first confidence value in determining the final confidence value, and thus the second confidence value is assigned more weight than the first confidence value and contributes more to the final confidence value.
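As a worked illustration of the weighted-average rule with k_2 = 3 * k_1, the following sketch uses the assumed values k_1 = 0.25 and k_2 = 0.75 (so that the weights sum to 1) and assumed representative second confidence values of 0.75 for TP and 0.25 for FP:

```python
def final_confidence(c1, verdict, k1=0.25, k2=0.75):
    """Weighted average C_i = k_1 * C_1i + k_2 * C_2i with k_2 = 3 * k_1."""
    c2 = 0.75 if verdict == "TP" else 0.25   # assumed values in (0.5, 1) for TP and (0, 0.5) for FP
    return k1 * c1 + k2 * c2

# For instance, a detection with a first confidence value of 0.6 that the comparison
# confirms as TP would receive 0.25 * 0.6 + 0.75 * 0.75 = 0.7125.
```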
In an example of block 290, the determining module 16 eliminates the at least one detected object from the detection list if the final confidence value of the at least one detected object is lower than a confidence threshold, and keeps the at least one detected object in the detection list if the final confidence value of the at least one detected object is greater than or equal to the confidence threshold. The determining module 16 also adds any object that is determined to be FN into the detection list.
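A minimal sketch of the confidence-based list update of block 290, assuming each detection is stored as a Python dictionary with a "final_confidence" entry, could be:

```python
def update_detection_list(detections, fn_objects, confidence_threshold=0.5):
    """Keep detections whose final confidence reaches the threshold and add suspected FN objects."""
    kept = [d for d in detections if d["final_confidence"] >= confidence_threshold]
    return kept + list(fn_objects)
```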
It is noted that some examples of determining a TP or FP object are described with respect to at least one detected object. According to some other examples of the disclosure, the determining module 16 can determine each of the detected objects in the detection list to be TP or FP using solutions similar to those described above. Further, the determining module 16 can determine whether to eliminate or keep each detected object using solutions similar to those described above.
Another aspect of the disclosure relates to an image processing method. The method can be performed by means of the processing apparatus 10 as described above. For this reason, the various features described above with reference to the processing apparatus 10 are also applicable to the method. An image processing method 900 according to an example of the disclosure is schematically shown in Figure 9 and comprises the following steps.
In step S910, an image captured by a mono camera is received.
In step S920, one or more objects are detected in the received image.
In step S930, an object detection list is created. The object detection list includes the detected one or more objects.
In step S940, the received image is segmented into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image.
In step S950, a comparison is performed based on the one or more detected objects and the one or more FG objects.
In step S960, at least one detected object of the one or more detected objects is determined to be true positive (TP) or false positive (FP) based on the comparison.
In step S970, an object is determined to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module.
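For illustration only, the control flow of method 900 could be sketched as follows; the callables passed in are placeholders for the detector, the segmentation model and the comparison logic described above, and are not part of the disclosure.

```python
def process_image(image, detect_objects, segment_image, classify, suspect_false_negatives):
    """Control-flow sketch of method 900 (the image passed in corresponds to step S910)."""
    detections = detect_objects(image)                 # S920: detect objects in the image
    detection_list = list(detections)                  # S930: create the object detection list
    segmented = segment_image(image)                   # S940: segment into a single BG and FG objects
    verdicts = [classify(segmented, obj)               # S950/S960: compare and label TP or FP
                for obj in detection_list]
    fn_objects = suspect_false_negatives(segmented, detection_list)   # S970: suspect FN objects
    return detection_list, verdicts, fn_objects
```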
Embodiments of the disclosure may be implemented in a non-transitory  computer readable medium. The non-transitory computer readable medium may include instructions that, when executed, cause one or more processors to perform any operation of the method 900 according to examples of the disclosure.
It should be appreciated that all the operations in the method described above are merely exemplary, and the disclosure is not limited to any operations in the method or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.
The processors can be implemented using electronic hardware, computer software, or any combination thereof. Whether these processors are implemented as hardware or software will depend on the specific application and the overall design constraints imposed on the system. By way of example, a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as a microprocessor, a micro-controller, a digital signal processor (DSP), a field programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gate logic, discrete hardware circuitry, or other suitable processing components configured to perform the various functions described in this disclosure. The functions of a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as software executed by a microprocessor, a micro-controller, a DSP, or other suitable platforms.
Software should be considered broadly to represent instructions, instruction sets, code, code segments, program code, programs, subroutines, software modules, applications, software applications, software packages, routines, subroutines, objects, running threads,  processes, functions, and the like. Software can reside on computer readable medium. Computer readable medium may include, for example, a memory, which may be, for example, a magnetic storage device (e.g., a hard disk, a floppy disk, a magnetic strip) , an optical disk, a smart card, a flash memory device, a random access memory (RAM) , a read only memory (ROM) , a programmable ROM (PROM) , an erasable PROM (EPROM) , an electrically erasable PROM (EEPROM) , a register, or a removable disk. Although a memory is shown as being separate from the processor in various aspects presented in this disclosure, a memory may also be internal to the processor (e.g., a cache or a register) .
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalent transformations to the elements of the various aspects of the disclosure, which are known or to be apparent to those skilled in the art, are intended to be covered by the claims.

Claims (17)

  1. An image processing apparatus, comprising:
    a receiving module configured to receive an image captured by a mono camera;
    an object detecting module configured to detect one or more objects in the received image and create an object detection list including the one or more objects;
    a segmentation module configured to segment the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image;
    a comparing module configured to perform a comparison based on the one or more detected objects and the one or more FG objects; and
    a determining module configured to determine at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison, and to determine an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  2. The image processing apparatus of claim 1, wherein the determining module is further configured to:
    eliminate the at least one detected object from the object detection list if the at least one object is determined to be FP;
    keep the at least one detected object in the object detection list if the at least one detected object is determined to be TP; and
    add any object that is determined to be FN into the object detection list.
  3. The image processing apparatus of claim 1, further comprising an  assigning module configured to assign a confidence value to the at least one detected object based on determined results from the determining module, the confidence value of the at least one detected object representing an existence probability of the at least one detected object.
  4. The image processing apparatus of claim 3, wherein the assigning comprises:
    assigning an initial confidence value to the at least one detected object; and
    adjusting the initial confidence value of the at least one detected object to acquire a final confidence value of the at least one detected object in response to a TP or FP determination.
  5. The image processing apparatus of claim 3, wherein the assigning comprises:
    assigning a first confidence value to the at least one detected object based on detected results from the object detecting module;
    assigning a second confidence value to the at least one detected object based on a TP or FP determination from the determining module; and
    calculating a final confidence value of the at least one detected object based on the first and second confidence values.
  6. The image processing apparatus of claim 4 or 5, wherein the determining module is further configured to:
    eliminate the at least one detected object from the detection list if the final confidence value of the at least one detected object is lower than a confidence threshold;
    keep the at least one detected object in the detection list if the final confidence value of the at least one detected object is greater than or equal  to the confidence threshold; and
    add any object that is determined to be FN into the detection list.
  7. The image processing apparatus of any one of claims 1-6, wherein the object detecting module is further configured to predict bounding boxes each of which contains a respective one of the one or more detected objects; and
    the comparing module is further configured to overlay the bounding boxes onto the segmented image to overlay corresponding areas of the BG and FG objects.
  8. The image processing apparatus of claim 7, wherein determining the at least one detected object to be TP or FP comprises:
    counting a number of FG pixels and a number of BG pixels within a corresponding bounding box that corresponds to the at least one detected object; and
    determining the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels.
  9. The image processing apparatus of claim 8, wherein determining the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels comprises:
    determining the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding bounding box is greater than or equal to a ratio threshold; and
    determining the at least one detected object to be FP if the ratio is smaller than the ratio threshold.
  10. The image processing apparatus of any one of claims 7-9, wherein determining an object to be FN comprises:
    if an FG object is not overlaid by any bounding box, determining the object detecting module fails to detect an object in the received image and determining the object that fails to be detected is an FN object.
  11. The image processing apparatus of any one of claims 1-10, wherein each of the one or more detected objects comprises a set of position points; and
    wherein the comparing module is further configured to:
    mark sets of position points of the one or more detected objects onto the segmented image to overlay corresponding areas of the BG and FG objects; and
    set a central point of each of the sets of position points and a checking area around the central point.
  12. The image processing apparatus of claim 11, wherein determining at least one detected object of the one or more detected objects to be TP or FP comprises:
    counting a number of FG pixels and a number of BG pixels within a corresponding checking area that corresponds to the at least one detected object; and
    determining the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels.
  13. The image processing apparatus of claim 12, wherein determining the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels comprises:
    determining the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding checking area is greater than or equal to a ratio threshold; and
    determining the at least one detected object to be FP if the ratio is smaller than the ratio threshold.
  14. The image processing apparatus of any one of claims 11-13, wherein determining an object to be FN comprises:
    if an FG object is marked by position points the number of which is less than a lower limit threshold, determining the object detecting module fails to detect an object in the received image and determining the object that fails to be detected is an FN object.
  15. The image processing apparatus of any one of claims 1-14, further comprising a calibrating module configured for calibrating camera parameters of the mono camera.
  16. An image processing method, comprising:
    receiving an image captured by a mono camera;
    detecting one or more objects in the received image;
    creating an object detection list including the one or more objects;
    segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image;
    performing a comparison based on the one or more detected objects and the one or more FG objects;
    determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and
    determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  17. A non-transitory computer readable medium with instructions stored therein which, when executed, cause a processor to carry out the steps comprising:
    receiving an image captured by a mono camera;
    detecting one or more objects in the received image;
    creating an object detection list including the one or more objects;
    segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image;
    performing a comparison based on the one or more detected objects and the one or more FG objects;
    determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and
    determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/082430 WO2023178542A1 (en) 2022-03-23 2022-03-23 Image processing apparatus and method


Publications (1)

Publication Number Publication Date
WO2023178542A1 true WO2023178542A1 (en) 2023-09-28

Family

ID=81325278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/082430 WO2023178542A1 (en) 2022-03-23 2022-03-23 Image processing apparatus and method

Country Status (1)

Country Link
WO (1) WO2023178542A1 (en)
