WO2023178542A1 - Image processing apparatus and method - Google Patents

Image processing apparatus and method

Info

Publication number
WO2023178542A1
Authority
WO
WIPO (PCT)
Prior art keywords
detected
objects
determining
image
processing apparatus
Prior art date
Application number
PCT/CN2022/082430
Other languages
French (fr)
Inventor
Marc Patrick ZAPF
Xinrun LI
Fangyun HU
Original Assignee
Robert Bosch Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch Gmbh filed Critical Robert Bosch Gmbh
Priority to PCT/CN2022/082430 priority Critical patent/WO2023178542A1/en
Publication of WO2023178542A1 publication Critical patent/WO2023178542A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/758 Involving statistics of pixels or of feature values, e.g. histogram matching
    • G06V10/759 Region-based matching
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the following disclosure relates to an image processing apparatus, to an image processing method, and to a corresponding computer readable medium.
  • One aspect of the disclosure provides an image processing apparatus including: a receiving module configured to receive an image captured by a mono camera; an object detecting module configured to detect one or more objects in the received image and create an object detection list including the one or more objects; a segmentation module configured to segment the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; a comparing module configured to perform a comparison based on the one or more detected objects and the one or more FG objects; and a determining module configured to determine at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison, and to determine an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  • Another aspect of the disclosure provides an image processing method including: receiving an image captured by a mono camera; detecting one or more objects in the received image; creating an object detection list including the one or more objects; segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; performing a comparison based on the one or more detected objects and the one or more FG objects; determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  • Yet another aspect of the disclosure provides a non-transitory computer readable medium with instructions stored therein which, when executed, cause a processor to carry out the steps including: receiving an image captured by a mono camera; detecting one or more objects in the received image; creating an object detection list including the one or more objects; segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; performing a comparison based on the one or more detected objects and the one or more FG objects; determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  • Embodiments of the disclosure determine FP and TP objects of the detected objects as well as any possible FN object, and therefore provide more accurate information on object detections.
  • Figure 1 is a block diagram of an image processing apparatus according to an example of the disclosure.
  • Figure 2 illustrates an image processing procedure implemented by the image processing apparatus of Figure 1 according to an example of the disclosure.
  • Figures 3-8B illustrate the working principle of the image processing apparatus of Figure 1.
  • Figure 9 is a flowchart of an image processing method according to an example of the disclosure.
  • Object detection has become increasingly important in a variety of technology fields in recent years, as the ability to track objects in images has become increasingly significant in many applications involving security and artificial intelligence technologies (e.g., a collision avoidance function in self-driving vehicles) .
  • Examples of the disclosure relate to apparatus and method for processing images captured by mono cameras to recognize FP and TP objects as well as any possible FN object and modify the object detection list accordingly such that more trustworthy object detections can be acquired. Embodiments of the present disclosure will be described with reference to the figures.
  • One aspect of the disclosure relates to an image processing apparatus (hereafter referred to as “processing apparatus” ) for processing images captured by a mono camera.
  • the processing apparatus may be integrated with the mono camera to form a part of the mono camera.
  • the processing apparatus may also be separated from the mono camera to be an independent apparatus.
  • the mono camera may be situated in an infrastructure or a parked vehicle and thus it is stationary.
  • the mono camera may also be mounted in a travelling vehicle or wearable device carried by a walking person and thus it is moving.
  • “Mono camera” or “mono cameras” refer to single camera (s) as opposed to stereo setups or depth cameras which can provide depth information of a scene.
  • the processing apparatus receives images from a mono camera that is configured for capturing images and/or a storage unit for storing the captured images, and then processes the received images.
  • the processing apparatus may process images at camera run-time or deal with images stored in the storage unit.
  • the captured images are sent to the processing apparatus instantly and processed in real time.
  • the captured images are stored in the storage unit and then obtained and processed by the processing apparatus when the processing is required.
  • Figure 1 illustrates an image processing apparatus 10 according to an example of the disclosure.
  • the processing apparatus 10 includes a receiving module 12, an object detecting module 13, a segmentation module 14, a comparing module 15, and a determining module 16.
  • the processing apparatus 10 further includes an assigning module 17.
  • the processing apparatus 10 further includes a calibrating module 11.
  • Those modules of the processing apparatus 10 are named functionally. Those names are not intended to limit physical positions of the modules.
  • the modules may be provided in the same chip or circuit or provided in different chips or circuits.
  • the processing apparatus 10 may be implemented by means of hardware or software or a combination of hardware and software, including a non-transitory computer readable medium stored in a memory and implemented as instructions executed by a processor.
  • a processor may be implemented in application-specific integrated circuit (ASIC) , digital signal processor (DSP) , data signal processing device (DSPD) , programmable logic device (PLD) , field programmable gate array (FPGA) , processor, controller, microcontroller, microprocessor, electronic unit, or a combination thereof.
  • the part implemented by software may include microcode, program code or code segments.
  • the software may be stored in a machine readable storage medium, such as a memory.
  • the processing apparatus 10 may include a memory and a processor.
  • the memory includes instructions that, when executed by the processor, cause the processor to perform the image processing method according to examples of the disclosure.
  • FIG. 2 illustrates an exemplary image processing procedure 200 which can be implemented by the processing apparatus 10.
  • the image processing procedure 200 is provided in the order shown in Figure 2. However, other orders may be provided and/or steps (blocks) may be repeated or performed in parallel.
  • the calibrating module 11 calibrates camera parameters of a mono camera. For example, after placement of the mono camera at its intended position (e.g., the mono camera is installed on a vehicle or infrastructure), the calibrating module 11 calibrates camera parameters of the mono camera.
  • the calibrating may include estimating parameters of a lens and an imaging sensor of the mono camera and then obtaining intrinsic parameters, extrinsic parameters and distortion coefficients of the mono camera.
  • the calibrating module 11 can be integrated with the lens of the mono camera.
  • the receiving module 12 receives an image (I) captured by the mono camera.
  • an example of the captured image (I) is shown in Figure 3 and examples of the following steps will be described with reference to the image (I) .
  • the object detecting module 13 detects one or more objects in the image (I) and creates an object detection list including the one or more detected objects.
  • the detecting of the one or more objects can include locating the one or more objects in the image (I) and further include predicting a bounding box for each of the one or more objects. As shown in the image (I_1) of Figure 4, the object detecting module 13 locates objects 1-2 and predicts a bounding box for each object.
  • the detecting of the one or more objects includes applying an object detection algorithm to the received image (I) to locate the one or more objects in the image (I) and predict a bounding box for each of the one or more objects.
  • the object detecting module 13 may use one or more neural networks of the object detection algorithm to detect instances of objects from a particular object class (e.g., human beings, bicycles, cars) within the image (I) .
  • the neural networks are trained on data sets of training images.
  • the trained neural networks (e.g., image classifiers) are then fed the image (I) as an input and output a prediction of a bounding box and a class label for each object in the image (I).
  • the bounding box may refer to a set of coordinates of a rectangular box that fully encloses an object.
  • a smaller bounding box for a given object is preferred as it more precisely indicates the location of the object in the image (I) , as compared to a larger bounding box for the same object.
  • the object detection algorithm may rely on a technique referred to as a sliding window. In the sliding window technique, a window moves across the image (I) , and, at various intervals, the region within the window is analyzed using an image classifier to determine if it contains an object.
  • the segmentation module 14 performs a segmentation of the image (I) to acquire a segmented image including a single background (BG) and one or more foreground (FG) objects, i.e., sets of pixels of the single BG and the one or more FG objects.
  • the segmentation module 14 yields a binary image, black being the background and white being the foreground.
  • Each of the instances of the FG objects A~C in the received image (I) is segmented as an individual object.
  • the segmentation module 14 can also yield the segmented image with a continuous value (e.g., float or integer) to describe the probability of a pixel being BG or FG.
  • the segmentation of the image (I) includes identifying, for each pixel of the image (I) , a belonging object of the one or more FG objects and labeling each identified object with an object ID that is a unique identification of the object.
  • the segmentation module 14 can use one or more neural networks (e.g., CNN and FCN) to perform the segmentation of the image (I).
  • the identifying includes determining which pixels belong to the BG and which pixels belong to the one or more FG objects.
  • the segmentation module 14 can use a segmentation mask to determine which pixels are in the BG and which pixels are in the FG.
  • the segmentation module 14 can use a segmentation map to determine which pixels are in the BG and which pixels are in the FG.
  • the segmentation map is formed such that it can indicate the BG and each of the individual FG objects, such as 0 for the BG, 1 for an FG object, 2 for another FG object, and so forth.
  • the segmentation module 14 can perform background-foreground segmentation in which the BG is divided from the FG. This can form a segmentation mask that indicates which pixels are in the background and which are in the foreground.
  • the segmentation mask may be a binary mask with 0 referring to the background and 1 referring to the foreground.
  • the one or more objects are defined to form multiple individual foregrounds, that is to say, each FG object is divided as a single foreground against the single large background.
  • the segmentation module 14 can use two trained classifiers i.e., a first classifier and a second classifier, to segment the image (I) into the BG and the one or more FG objects.
  • the first classifier is used to classify pixels of the image (I) as BG or FG.
  • the pixels of the image (I) are input to the first classifier, and the first classifier outputs classified results such as 1) a pixel with a high background probability and a low foreground probability to be the background; and 2) a pixel with a low background probability and a high foreground probability to be the foreground.
  • the second classifier is used to classify FG pixels (i.e., pixels that are classified as FG by the first classifier) as a respective one of the FG objects.
  • the FG pixels are input to the second classifier, and the second classifier outputs classified results such as an object ID that each of the FG pixels belongs to.
  • segmentation in the disclosure refers to an indication of a plurality of image pixels portraying one or more objects.
  • the segmentation can include a segmentation boundary (e.g., a boundary line or curve indicating an edge of each of the FG objects) or a segmentation mask (e.g., a binary mask identifying pixels corresponding to an FG object) .
  • the comparing module 15 compares the detected one or more objects with the segmented one or more FG objects. In other words, comparing module 15 compares the detection results from the object detecting module 13 and the segmentation results from the segmentation module 14.
  • the determining module 16 determines at least one of the detected objects to be true positive (TP) or false positive (FP) and also determines an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module. That is to say, an FN object is the object that is present in the received image (I) and the detecting module 13 failed to detect the object.
  • the procedure 200 may proceed to block 270.
  • the determining module 16 directly modifies the object detection list according to the TP, FP and FN determinations.
  • the procedure 200 may also proceed to blocks 280 and 290.
  • the assigning module 17 assigns confidence values to the detected objects according to the TP, FP and FN determinations.
  • the determining module 16 modifies the object detection list according to the assigned confidence values.
  • the confidence values of the detected objects are between 0 and 1.
  • the confidence value of a detected object is representative of a probability that the object is present in the image (I) . For example, the higher the confidence value of the detected object, the more we believe the detected object exists in the image (I) .
  • Figure 6 shows exemplary TP, FP and FN determinations according to the comparison of the detected objects 1-2 in the image (I_1) with the segmented FG objects A-C in the image (I_2).
  • the detected object 1 in the image (I) corresponds to the segmented FG object A in the image (I_2) and thus is determined to be TP.
  • the detected object 2 in the image (I) does not correspond to any FG object in the image (I_2) and thus is determined to be FP.
  • the FG objects B and C are present in the image (I) and failed to be detected by the detecting module 13, and thus the determining module 16 determines there are FN objects that exist in the image (I) but are not detected by the detecting module.
  • the description of “the detected object 1 corresponding to the FG object A” means that the object 1 detected by the detecting module 13 has almost the same location and size as that of the FG object A segmented by the segmentation module 14.
  • the TP, FP and FN determinations are implemented by means of the bounding boxes of the detected objects.
  • the comparing module 15 overlays the bounding boxes onto the segmented image to overlay corresponding areas of the BG and FG objects (block 250) .
  • the comparing module 15 overlays the bounding boxes of the detected objects 1 and 2 onto the segmented image (I_2) by locating those bounding boxes at corresponding positions in the segmented image (I_2) .
  • the corresponding positions are a series of coordinates of boundaries of those bounding boxes.
  • one bounding box overlays the FG object A and another bounding box overlays the BG (i.e., the overlaid bounding box contains the BG with the symbol “X” ) .
  • the determining module 16 counts a number of FG pixels and a number of BG pixels within a corresponding bounding box that corresponds to the at least one detected object; and determines the at least one detected object to be TP or FP based on the counted numbers of FG pixels and BG pixels (block 260) . Specifically, the determining module 16 determines the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding bounding box is greater than or equal to a ratio threshold. The determining module 16 determines the at least one detected object to be FP if the ratio is smaller than the ratio threshold.
  • the ratio threshold can be pre-determined by experiences and/or model calculations.
  • the ratio threshold may be adjusted according to scenarios.
  • the determining module 16 also determines that the object detecting module 13 fails to detect an object that is present in the received image (I) and thus determines the object that fails to be detected is an FN object, if an FG object is not overlaid by any bounding box.
  • a ratio of the number of FG pixels to the number of BG pixels is greater than the ratio threshold, and thus the corresponding detected object 1 is determined to be TP.
  • a ratio of the number of FG pixels to the number of BG pixels is smaller than the ratio threshold, and thus the corresponding detected object 2 is determined to be FP.
  • the FG objects B and C are present in the segmented image but not overlaid by any bounding box, and thus were not detected by the detecting module 13, and the determining module 16 determines there are FN objects that exist in the received image but are not detected by the detecting module.
  • the TP, FP and FN determinations are implemented by means of respective sets of position points of the detected objects.
  • each of the detected objects includes a set of position points.
  • the sets of position points are formed by raw data output from the mono camera.
  • the comparing module 15 marks the sets of position points of the one or more detected objects onto the segmented image to overlay corresponding areas of the BG and FG objects.
  • the comparing module 15 marks the sets of position points onto the segmented image by locating those sets of position points at corresponding positions in the segmented image (I_2) .
  • the comparing module 15 sets a central point of each of the sets of position points and a checking area around the central point (block 250) .
  • a central point of a set of position points of an object is the point that is located at the central position of the set of position points.
  • the size of the checking area around the central point may be smaller than that of the object.
  • the boundary of the checking area is within (surrounded by) the boundary of the object.
  • the size of the checking area around the central point may be a little larger than that of the object. At least a part of the boundary of the checking area is beyond the boundary of the object.
  • the comparing module 15 sets a central point P1 of the set of position points of the car and sets a checking area P11 around the central point P1.
  • the boundary of the checking area P11 is within the boundary of the car, and the size of the checking area P11 is smaller than that of the car.
  • the determining module 16 counts a number of FG pixels and a number of BG pixels within a corresponding checking area that corresponds to the at least one detected object; and determines the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels (block 260) . Specifically, the determining module 16 determines the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding checking area is greater than or equal to a ratio threshold. The determining module 16 determines the at least one detected object to be FP if the ratio is smaller than the ratio threshold.
  • the ratio threshold can be pre-determined by experiences and/or model calculations. The ratio threshold may be adjusted according to scenarios.
  • the determining module 16 also determines that the object detecting module 13 fails to detect an object that is present in the received image (I) and thus determines the object that fails to be detected is an FN object that does not appear in the object detection list, if an FG object is marked by position points the number of which is less than a lower limit threshold.
  • the lower limit threshold may be pre-determined by experiences and/or model calculations. The lower limit threshold may be adjusted according to scenarios.
  • the determining module 16 eliminates the at least one detected object from the object detection list if the at least one object is determined to be FP, and keeps the at least one detected object in the object detection list if the at least one detected object is determined to be TP.
  • the determining module 16 also adds any object that is determined to be FN into the object detection list.
  • the assigning module 17 assigns an initial confidence value to each of the detected objects. Then, the assigning module 17 adjusts the initial confidence values of the detected objects according to the TP and FP determinations. In an example, the initial confidence value of each detected object is set to be an intermediate value 0.5 between 0 and 1. The assigning module 17 increases the initial confidence value of a detected object towards 1 or to 1 and acquires a final confidence value of the detected object if the detected object is determined to be TP; and decreases the initial confidence value of the detected object towards 0 or to 0 and acquires the final confidence value of the detected object if the detected object is determined to be FP.
  • the assigning module 17 assigns a first confidence value to each of the detected objects.
  • the first confidence value can be between 0 and 1 and assigned based on how much trust is given beforehand to the detected results, depending on the properties of the algorithms used by the object detecting module 13.
  • the assigning module 17 further assigns a second confidence value to each of the detected objects according to the TP and FP determinations, and then calculates a final confidence value of each of the detected objects based on the assigned first and second confidence values.
  • the second confidence value of a detected object will be between 0.5 and 1 if it is determined to be TP and between 0 and 0.5 if it is determined to be FP.
  • the assigning module 17 calculates a final confidence value of each detected object based on its first and second confidence values.
  • the calculation includes various rules, such as weighted averaging and multiplication.
  • a final confidence value of a detected object is calculated by a weighted combination of the first and second confidence values (a hedged sketch of one possible form is given after this list), where:
  • C i represents a final confidence value of one of the detected objects (OBJ i ) ;
  • C 1i represents the first confidence value of the object (OBJ i ) ;
  • k 1 represents the weight coefficient of the first confidence value C 1i ;
  • C 2i represents the second confidence value of the object (OBJ i ) ;
  • k 2 represents the weight coefficient of the second confidence value C 2i .
  • k 2 is greater than k 1 (e.g., k 2 is three times as much as k 1 ) .
  • the second confidence value is considered more important than the first confidence value in determining the final confidence value, and thus the second confidence value is assigned more weight than the first confidence value and contributes more to the final confidence value.
  • the determining module 16 eliminates the at least one detected object from the detection list if the final confidence value of the at least one detected object is lower than a confidence threshold, and keeps the at least one detected object in the detection list if the final confidence value of the at least one detected object is greater than or equal to the confidence threshold.
  • the determining module 16 also adds any object that is determined to be FN into the detection list.
  • the determining module 16 can determine each of the detected objects in the detection list to be TP or FP using similar solutions as described above. Further, the determining module 16 can determine the elimination or keeping of each detected object using similar solutions as described above.
  • An image processing method 900 according to an example of the disclosure is schematically shown in Figure 9 and comprises the following steps.
  • step S910 an image captured by a mono camera is received.
  • step S920 one or more objects are detected in the received image.
  • step S930 an object detection list is created.
  • the object detection list includes the detected one or more objects.
  • step S940 the received image is segmented into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image.
  • step S950 a comparison is performed based on the one or more detected objects and the one or more FG objects.
  • step S960 at least one detected object of the one or more detected objects is determined to be true positive (TP) or false positive (FP) based on the comparison.
  • step S970 an object is determined to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module.
  • Embodiments of the disclosure may be implemented in a non-transitory computer readable medium.
  • the non-transitory computer readable medium may include instructions that, when executed, cause one or more processors to perform any operation of the method 900 according to examples of the disclosure.
  • the processors can be implemented using electronic hardware, computer software, or any combination thereof. Whether these processors are implemented as hardware or software will depend on the specific application and the overall design constraints imposed on the system.
  • a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as a microprocessor, a micro-controller, a digital signal processor (DSP) , a field programmable gate array (FPGA) , a programmable logic device (PLD) , state machine, gate logic, discrete hardware circuitry, and other suitable processing components configured to perform the various functions described in this disclosure.
  • the functions of a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as software executed by a microprocessor, a micro-controller,
  • Software should be considered broadly to represent instructions, instruction sets, code, code segments, program code, programs, subroutines, software modules, applications, software applications, software packages, routines, subroutines, objects, running threads, processes, functions, and the like. Software can reside on computer readable medium.
  • Computer readable medium may include, for example, a memory, which may be, for example, a magnetic storage device (e.g., a hard disk, a floppy disk, a magnetic strip) , an optical disk, a smart card, a flash memory device, a random access memory (RAM) , a read only memory (ROM) , a programmable ROM (PROM) , an erasable PROM (EPROM) , an electrically erasable PROM (EEPROM) , a register, or a removable disk.
  • a memory is shown as being separate from the processor in various aspects presented in this disclosure, a memory may also be internal to the processor (e.g., a cache or a register) .
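The formula for the final confidence value is referenced above only through its variables. As a hedged sketch, the weighted combination below follows the stated variable definitions (C 1i, C 2i, k 1, k 2) and the statement that k 2 outweighs k 1; the normalisation by k 1 + k 2 and the example values k 1 = 1, k 2 = 3 are assumptions, not values from the disclosure.

```python
# Hedged sketch of blocks 280-290: fuse first and second confidence values into
# a final confidence C_i and filter the detection list against a threshold.
def final_confidence(c1, c2, k1=1.0, k2=3.0):
    # Weighted average; k2 > k1 so the TP/FP-based second value contributes more.
    return (k1 * c1 + k2 * c2) / (k1 + k2)

def filter_by_confidence(detections, first_conf, second_conf, conf_threshold=0.5):
    kept = []
    for det, c1, c2 in zip(detections, first_conf, second_conf):
        if final_confidence(c1, c2) >= conf_threshold:
            kept.append(det)     # keep detections at or above the threshold
    return kept                  # any FN objects would still be added separately
```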

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An image processing apparatus includes a receiving module configured to receive an image; an object detecting module configured to detect one or more objects in the received image and create an object detection list including the one or more objects; a segmentation module configured to segment the received image into a single background and one or more foreground objects to obtain a segmented image; a comparing module configured to perform a comparison based on the one or more detected objects and the one or more foreground objects; and a determining module configured to determine at least one detected object of the one or more detected objects to be true positive or false positive based on the comparison, and to determine an object to be false negative if the object exists in the received image but is not detected by the object detecting module based on the comparison.

Description

IMAGE PROCESSING APPARATUS AND METHOD
FIELD
The following disclosure relates to an image processing apparatus, to an image processing method, and to a corresponding computer readable medium.
BACKGROUND
Cameras are widely used in many applications such as surveillance systems and traffic monitoring systems. In most scenarios, a mono camera is used due to its characteristic properties such as simple structure and low cost. However, the mono camera produces pixel images in 2D image space, and those pixel images do not provide any information on how far away background (BG) features and foreground (FG) features in the images are from the imaging camera. Thus, inferring from the pixel position of an object in the 2D image space to the physical position of the object in the 3D world space is not trivial. In this case, the detection of objects in the 2D image space may be inaccurate.
SUMMARY
One aspect of the disclosure provides an image processing apparatus including: a receiving module configured to receive an image captured by a mono camera; an object detecting module configured to detect one or more objects in the received image and create an object detection list including the one or more objects; a segmentation module configured to segment the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; a comparing module configured to perform a comparison based on the one or more  detected objects and the one or more FG objects; and a determining module configured to determine at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison, and to determine an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
Another aspect of the disclosure provides an image processing method including: receiving an image captured by a mono camera; detecting one or more objects in the received image; creating an object detection list including the one or more objects; segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; performing a comparison based on the one or more detected objects and the one or more FG objects; determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
Yet another aspect of the disclosure provides a non-transitory computer readable medium with instructions stored therein which, when executed, cause a processor to carry out the steps including: receiving an image captured by a mono camera; detecting one or more objects in the received image; creating an object detection list including the one or more objects; segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image; performing a comparison based on the one or more detected objects and the one or more FG objects; determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
Embodiments of the disclosure determine FP and TP objects of the detected objects as well as any possible FN object, and therefore provide more accurate information on object detections.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of an image processing apparatus according to an example of the disclosure.
Figure 2 illustrates an image processing procedure implemented by the image processing apparatus of Figure 1 according to an example of the disclosure.
Figures 3-8B illustrate the working principle of the image processing apparatus of Figure 1.
Figure 9 is a flowchart of an image processing method according to an example of the disclosure.
DETAILED DESCRIPTION
Object detection has become increasingly important in a variety of technology fields in recent years, as the ability to track objects in images has become increasingly significant in many applications involving security and artificial intelligence technologies (e.g., a collision avoidance function in self-driving vehicles) . Examples of the disclosure relate to apparatus and method for processing images captured by mono cameras to  recognize FP and TP objects as well as any possible FN object and modify the object detection list accordingly such that more trustworthy object detections can be acquired. Embodiments of the present disclosure will be described with reference to the figures.
One aspect of the disclosure relates to an image processing apparatus (hereafter referred to as “processing apparatus” ) for processing images captured by a mono camera. The processing apparatus may be integrated with the mono camera to form a part of the mono camera. The processing apparatus may also be separated from the mono camera to be an independent apparatus. The mono camera may be situated in an infrastructure or a parked vehicle and thus it is stationary. The mono camera may also be mounted in a travelling vehicle or wearable device carried by a walking person and thus it is moving.
“Mono camera” or “mono cameras” , as used herein, refer to single camera (s) as opposed to stereo setups or depth cameras which can provide depth information of a scene.
The processing apparatus according to an example of the disclosure receives images from a mono camera that is configured for capturing images and/or a storage unit for storing the captured images, and then processes the received images. The processing apparatus may process images at camera run-time or deal with images stored in the storage unit. In an example, the captured images are sent to the processing apparatus instantly and processed in real time. In another example, the captured images are stored in the storage unit and then obtained and processed by the processing apparatus when the processing is required.
Figure 1 illustrates an image processing apparatus 10 according to an example of the disclosure. With reference to Figure 1, the processing apparatus 10 includes a receiving module 12, an object detecting module 13, a segmentation module 14, a comparing module 15, and a determining module 16. In an example, the processing apparatus 10 further includes an assigning module 17. In an example, the processing apparatus 10 further includes a calibrating module 11. Those modules of the processing apparatus 10 are named functionally. Those names are not intended to limit physical positions of the modules. For example, the modules may be provided in the same chip or circuit or provided in different chips or circuits.
The processing apparatus 10 may be implemented by means of hardware or software or a combination of hardware and software, including a non-transitory computer readable medium stored in a memory and implemented as instructions executed by a processor. Regarding the part implemented by means of hardware, it may be implemented in application-specific integrated circuit (ASIC) , digital signal processor (DSP) , data signal processing device (DSPD) , programmable logic device (PLD) , field programmable gate array (FPGA) , processor, controller, microcontroller, microprocessor, electronic unit, or a combination thereof. The part implemented by software may include microcode, program code or code segments. The software may be stored in a machine readable storage medium, such as a memory.
In an example, the processing apparatus 10 may include a memory and a processor. The memory includes instructions that, when executed by the processor, cause the processor to perform the image processing method according to examples of the disclosure.
Figure 2 illustrates an exemplary image processing procedure 200 which  can be implemented by the processing apparatus 10. The image processing procedure 200 is provided in the order shown in Figure 2. However, other orders may be provided and/or steps (blocks) may be repeated or performed in parallel.
In block 210, the calibrating module 11 calibrates camera parameters of a mono camera. For example, after placement of the mono camera at its intended position (e.g., the mono camera is installed on a vehicle or infrastructure), the calibrating module 11 calibrates camera parameters of the mono camera. The calibrating may include estimating parameters of a lens and an imaging sensor of the mono camera and then obtaining intrinsic parameters, extrinsic parameters and distortion coefficients of the mono camera. With this calibration, images provided by the mono camera can reflect the real-world scene more accurately. In an example, the calibrating module 11 can be integrated with the lens of the mono camera.
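The disclosure does not tie the calibration in block 210 to a particular toolchain. As a hedged illustration only, the sketch below assumes an OpenCV-style checkerboard calibration; the function name, board size and square size are hypothetical and not taken from the disclosure.

```python
# Hedged sketch of block 210: estimating intrinsic parameters and distortion
# coefficients of a mono camera from checkerboard images (OpenCV assumed).
import cv2
import numpy as np

def calibrate_mono_camera(image_paths, board_size=(9, 6), square_size=0.025):
    # 3D checkerboard corner coordinates in the board's own frame (Z = 0 plane).
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    objp *= square_size

    obj_points, img_points, image_size = [], [], None
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        image_size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    # Intrinsic matrix K, distortion coefficients, and per-view extrinsics.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    return K, dist, rvecs, tvecs
```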
In block 220, the receiving module 12 receives an image (I) captured by the mono camera. For clarity, an example of the captured image (I) is shown in Figure 3 and examples of the following steps will be described with reference to the image (I) .
In block 230, the object detecting module 13 detects one or more objects in the image (I) and creates an object detection list including the one or more detected objects. The detecting of the one or more objects can include locating the one or more objects in the image (I) and further include predicting a bounding box for each of the one or more objects. As shown in the image (I_1) of Figure 4, the object detecting module 13 locates objects 1-2 and predicts a bounding box for each object.
In an example, the detecting of the one or more objects includes applying an object detection algorithm to the received image (I) to locate the one or more objects in the image (I) and predict a bounding box for each of the one or more objects.
In this example, the object detecting module 13 may use one or more neural networks of the object detection algorithm to detect instances of objects from a particular object class (e.g., human beings, bicycles, cars) within the image (I). The neural networks are trained on data sets of training images. The trained neural networks (e.g., image classifiers) are then fed the image (I) as an input and output a prediction of a bounding box and a class label for each object in the image (I).
The bounding box may refer to a set of coordinates of a rectangular box that fully encloses an object. A smaller bounding box for a given object is preferred as it more precisely indicates the location of the object in the image (I), as compared to a larger bounding box for the same object. In an example of predicting a bounding box, the object detection algorithm may rely on a technique referred to as a sliding window. In the sliding window technique, a window moves across the image (I), and, at various intervals, the region within the window is analyzed using an image classifier to determine if it contains an object.
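As a hedged sketch of the sliding-window variant described above (not the only detection algorithm the disclosure allows), the following assumes a hypothetical `classifier` callable that returns a class label and a score for an image patch; the window size, stride and score threshold are illustrative values only.

```python
# Hedged sketch of block 230: sliding-window object detection producing an
# object detection list of (bounding box, label, score) entries.
def detect_objects(image, classifier, window=(64, 128), stride=16, score_min=0.5):
    h, w = image.shape[:2]
    detection_list = []
    for y in range(0, h - window[1] + 1, stride):
        for x in range(0, w - window[0] + 1, stride):
            patch = image[y:y + window[1], x:x + window[0]]
            label, score = classifier(patch)   # hypothetical classifier interface
            if label is not None and score >= score_min:
                # Bounding box stored as (x_min, y_min, x_max, y_max).
                detection_list.append(((x, y, x + window[0], y + window[1]), label, score))
    return detection_list
```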
In block 240, the segmentation module 14 performs a segmentation of the image (I) to acquire a segmented image including a single background (BG) and one or more foreground (FG) objects, i.e., sets of pixels of the single BG and the one or more FG objects. As shown in the image (I_2) of Figure 5, the segmentation module 14 yields a binary image, black being the  background and white being the foreground. Each of the instances of the FG objects A~C in the received image (I) is segmented as an individual object. The segmentation module 14 can also yield the segmented image with a continuous value (e.g., float or integer) to describe the probability of a pixel being BG or FG.
In an example, the segmentation of the image (I) includes identifying, for each pixel of the image (I), a belonging object of the one or more FG objects and labeling each identified object with an object ID that is a unique identification of the object. The segmentation module 14 can use one or more neural networks (e.g., CNN and FCN) to perform the segmentation of the image (I).
In this example, the identifying includes determining which pixels belong to the BG and which pixels belong to the one or more FG objects. In the case that the segmented image (I_2) includes only one FG object, the segmentation module 14 can use a segmentation mask to determine which pixels are in the BG and which pixels are in the FG. In the case that the segmented image (I_2) includes multiple FG objects, the segmentation module 14 can use a segmentation map to determine which pixels are in the BG and which pixels are in the FG. The segmentation map is formed such that it can indicate the BG and each of the individual FG objects, such as 0 for the BG, 1 for an FG object, 2 for another FG object, and so forth.
In this example, the segmentation module 14 can perform background-foreground segmentation in which the BG is divided from the FG. This can form a segmentation mask that indicates which pixels are in the background and which are in the foreground. For example, the segmentation mask may be a binary mask with 0 referring to the  background and 1 referring to the foreground. The one or more objects are defined to form multiple individual foregrounds, that is to say, each FG object is divided as a single foreground against the single large background.
In this example, the segmentation module 14 can use two trained classifiers, i.e., a first classifier and a second classifier, to segment the image (I) into the BG and the one or more FG objects. The first classifier is used to classify pixels of the image (I) as BG or FG. For example, the pixels of the image (I) are input to the first classifier, and the first classifier outputs classified results such as 1) a pixel with a high background probability and a low foreground probability to be the background; and 2) a pixel with a low background probability and a high foreground probability to be the foreground. The second classifier is used to classify FG pixels (i.e., pixels that are classified as FG by the first classifier) as a respective one of the FG objects. For example, the FG pixels are input to the second classifier, and the second classifier outputs classified results such as an object ID that each of the FG pixels belongs to.
It is noted that the “segmentation” in the disclosure refers to an indication of a plurality of image pixels portraying one or more objects. For example, the segmentation can include a segmentation boundary (e.g., a boundary line or curve indicating an edge of each of the FG objects) or a segmentation mask (e.g., a binary mask identifying pixels corresponding to an FG object) .
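One way the two-stage segmentation described above could be realized, sketched here only as a hedged assumption, is to threshold per-pixel FG probabilities and then assign object IDs by connected-component labelling; the 0.5 threshold and the use of OpenCV are assumptions, not values or tools named in the disclosure.

```python
# Hedged sketch of block 240: from per-pixel FG probabilities to a segmentation
# map in which 0 marks the single BG and 1, 2, ... mark individual FG objects.
import cv2
import numpy as np

def build_segmentation_map(fg_probability, fg_threshold=0.5):
    # First stage: classify each pixel as BG (0) or FG (1).
    binary_mask = (fg_probability >= fg_threshold).astype(np.uint8)
    # Second stage: assign a unique object ID to each connected FG region.
    num_labels, segmentation_map = cv2.connectedComponents(binary_mask)
    return segmentation_map  # background is label 0, FG objects are 1..num_labels-1
```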
In block 250, the comparing module 15 compares the detected one or more objects with the segmented one or more FG objects. In other words, comparing module 15 compares the detection results from the object detecting module 13 and the segmentation results from the segmentation  module 14.
In block 260, the determining module 16 determines at least one of the detected objects to be true positive (TP) or false positive (FP) and also determines an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module. That is to say, an FN object is the object that is present in the received image (I) and the detecting module 13 failed to detect the object.
Then, the procedure 200 may proceed to block 270. In block 270, the determining module 16 directly modifies the object detection list according to the TP, FP and FN determinations. The procedure 200 may also proceed to  blocks  280 and 290. In block 280, the assigning module 17 assigns confidence values to the detected objects according to the TP, FP and FN determinations. In block 290, the determining module 16 modifies the object detection list according to the assigned confidence values. The confidence values of the detected objects are between 0 and 1. The confidence value of a detected object is representative of a probability that the object is present in the image (I) . For example, the higher the confidence value of the detected object, the more we believe the detected object exists in the image (I) .
Figure 6 shows exemplary TP, FP and FN determinations according to the comparison of the detected objects 1-2 in the image (I_1) with the segmented FG objects A-C in the image (I_2). As shown in Figure 6, the detected object 1 in the image (I) corresponds to the segmented FG object A in the image (I_2) and thus is determined to be TP. The detected object 2 in the image (I) does not correspond to any FG object in the image (I_2) and thus is determined to be FP. The FG objects B and C are present in the image (I) but are not detected by the detecting module 13, and thus the determining module 16 determines there are FN objects that exist in the image (I) but are not detected by the detecting module. That is to say, with regard to segmented FG objects B and C, no corresponding detected objects can be found, and thus the determining module 16 suspects FN objects in the detection list. Since a non-detected object does not yet exist in the detection list, we can only infer that one should be there. The detection results incorrectly indicate the absence of the objects B and C that are actually present.
In examples of the disclosure, the description of “the detected object 1 corresponding to the FG object A” means that the object 1 detected by the detecting module 13 has almost the same location and size as that of the FG object A segmented by the segmentation module 14.
Examples of the TP, FP and FN determinations are described below.
In an example, the TP, FP and FN determinations are implemented by means of the bounding boxes of the detected objects. In this example, the comparing module 15 overlays the bounding boxes onto the segmented image to overlay corresponding areas of the BG and FG objects (block 250) .
With reference to Figure 7, the comparing module 15 overlays the bounding boxes of the detected objects 1 and 2 onto the segmented image (I_2) by locating those bounding boxes at corresponding positions in the segmented image (I_2) . The corresponding positions, for example, are a series of coordinates of boundaries of those bounding boxes. As shown in Figure 7, one bounding box overlays the FG object A and another bounding box overlays the BG (i.e., the overlaid bounding box contains the BG with  the symbol “X” ) .
Then, the determining module 16 counts a number of FG pixels and a number of BG pixels within a corresponding bounding box that corresponds to the at least one detected object; and determines the at least one detected object to be TP or FP based on the counted numbers of FG pixels and BG pixels (block 260) . Specifically, the determining module 16 determines the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding bounding box is greater than or equal to a ratio threshold. The determining module 16 determines the at least one detected object to be FP if the ratio is smaller than the ratio threshold. The ratio threshold can be pre-determined by experiences and/or model calculations. The ratio threshold may be adjusted according to scenarios. The determining module 16 also determines that the object detecting module 13 fails to detect an object that is present in the received image (I) and thus determines the object that fails to be detected is an FN object, if an FG object is not overlaid by any bounding box.
With continued reference to Figure 7, within the overlaid bounding box which contains the FG object A, a ratio of the number of FG pixels to the number of BG pixels is greater than the ratio threshold, and thus the corresponding detected object 1 is determined to be TP. Within the overlaid bounding box which contains the BG with symbol “X”, a ratio of the number of FG pixels to the number of BG pixels is smaller than the ratio threshold, and thus the corresponding detected object 2 is determined to be FP. The FG objects B and C are present in the segmented image but are not overlaid by any bounding box, and thus were not detected by the detecting module 13, and the determining module 16 determines there are FN objects that exist in the received image but are not detected by the detecting module.
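A hedged sketch of blocks 250-260 for the bounding-box variant follows: each predicted box is overlaid onto the segmentation map, FG and BG pixels inside it are counted, the ratio test yields TP or FP, and FG objects covered by no box are flagged as suspected FN. The default ratio threshold of 1.0 and the handling of boxes containing no BG pixels are assumptions.

```python
# Hedged sketch of blocks 250-260 (bounding-box variant): TP/FP per detection
# and suspected FN for segmented FG objects not overlaid by any bounding box.
import numpy as np

def classify_detections(detection_list, segmentation_map, ratio_threshold=1.0):
    statuses, covered_ids = [], set()
    for (x0, y0, x1, y1), label, score in detection_list:
        region = segmentation_map[y0:y1, x0:x1]     # overlay the box onto the segmented image
        fg_pixels = int(np.count_nonzero(region))   # pixels belonging to any FG object
        bg_pixels = int(region.size - fg_pixels)    # pixels belonging to the single BG
        ratio = fg_pixels / bg_pixels if bg_pixels else float("inf")
        statuses.append("TP" if ratio >= ratio_threshold else "FP")
        covered_ids.update(int(i) for i in np.unique(region) if i != 0)

    all_fg_ids = {int(i) for i in np.unique(segmentation_map) if i != 0}
    suspected_fn_ids = all_fg_ids - covered_ids     # FG objects with no overlaid box
    return statuses, suspected_fn_ids
```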
In an example, the TP, FP and FN determinations are implemented by means of respective sets of position points of the detected objects. In this example, each of the detected objects includes a set of position points. The sets of position points are formed from raw data output by the mono camera. The comparing module 15 marks the sets of position points of the one or more detected objects onto the segmented image to overlay corresponding areas of the BG and FG objects. For example, the comparing module 15 marks the sets of position points onto the segmented image by locating those sets of position points at corresponding positions in the segmented image (I_2). Then, the comparing module 15 sets a central point of each of the sets of position points and a checking area around the central point (block 250).
It is noted that a central point of a set of position points of an object is the point that is located at the central position of the set of position points. With reference to Figure 8A, the size of the checking area around the central point may be smaller than that of the object. The boundary of the checking area is within (surrounded by) the boundary of the object. With reference to Figure 8B, the size of the checking area around the central point may be a little larger than that of the object. At least a part of the boundary of the checking area is beyond the boundary of the object.
With reference to Figures 8A and 8B, the example of a set of position points of a car is shown. The comparing module 15 sets a central point P1 of the set of position points of the car and sets a checking area P11 around the central point P1. The boundary of the checking area P11 is within the boundary of the car, and the size of the checking area P11 is smaller than  that of the car.
Then, the determining module 16 counts a number of FG pixels and a number of BG pixels within a corresponding checking area that corresponds to the at least one detected object, and determines the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels (block 260). Specifically, the determining module 16 determines the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding checking area is greater than or equal to a ratio threshold, and determines the at least one detected object to be FP if the ratio is smaller than the ratio threshold. The ratio threshold can be pre-determined based on experience and/or model calculations, and may be adjusted according to the scenario. In addition, if an FG object is marked by position points the number of which is less than a lower limit threshold, the determining module 16 determines that the object detecting module 13 fails to detect an object that is present in the received image (I) and thus determines the object that fails to be detected to be an FN object that does not appear in the object detection list. The lower limit threshold may be pre-determined based on experience and/or model calculations, and may be adjusted according to the scenario.
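As a non-limiting illustration, a corresponding sketch for the checking-area variant of blocks 250 and 260 might look as follows. The square checking area, the default half-size and the default lower limit are illustrative assumptions; the disclosure only requires some checking area around the central point and some lower limit threshold.

```python
import numpy as np

def classify_by_checking_area(segmented, points, half_size=10, ratio_threshold=1.0):
    """Label one detection TP or FP from the FG/BG pixel counts inside its checking area.

    `points` is an (N, 2) array of (x, y) position points of the detected object.
    """
    cx, cy = points.mean(axis=0)                       # central point of the position points
    x0, y0 = int(cx) - half_size, int(cy) - half_size  # square checking area around the centre
    x1, y1 = int(cx) + half_size, int(cy) + half_size
    window = segmented[max(y0, 0):y1, max(x0, 0):x1]
    fg = int(np.count_nonzero(window))
    bg = int(window.size - fg)
    if bg == 0:
        return "TP"
    return "TP" if fg / bg >= ratio_threshold else "FP"

def suspect_false_negative(num_marking_points, lower_limit=3):
    """An FG object marked by fewer position points than the lower limit suggests an FN."""
    return num_marking_points < lower_limit
```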
In an example of block 270, the determining module 16 eliminates the at least one detected object from the object detection list if the at least one detected object is determined to be FP, and keeps the at least one detected object in the object detection list if the at least one detected object is determined to be TP. The determining module 16 also adds any object that is determined to be FN into the object detection list.
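A minimal sketch of the list update of block 270, assuming the detections and their TP/FP verdicts are held in parallel Python lists, could be:

```python
def apply_verdicts(detection_list, verdicts, fn_objects):
    """Drop FP detections, keep TP detections and append suspected FN objects (block 270)."""
    kept = [obj for obj, verdict in zip(detection_list, verdicts) if verdict == "TP"]
    return kept + list(fn_objects)
```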
In an example of block 280, the assigning module 17 assigns an initial confidence value to each of the detected objects. Then, the assigning module 17 adjusts the initial confidence values of the detected objects according to the TP and FP determinations. In an example, the initial confidence value of each detected object is set to an intermediate value of 0.5 between 0 and 1. The assigning module 17 increases the initial confidence value of a detected object towards 1 or to 1 to acquire a final confidence value of the detected object if the detected object is determined to be TP, and decreases the initial confidence value of the detected object towards 0 or to 0 to acquire the final confidence value of the detected object if the detected object is determined to be FP.
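For illustration, this adjustment of the initial confidence value could be sketched as follows; the step size of 0.4 is an assumed value and not taken from the disclosure.

```python
def adjust_confidence(verdict, initial=0.5, step=0.4):
    """Push the initial confidence of 0.5 towards 1 for a TP and towards 0 for an FP."""
    if verdict == "TP":
        return min(1.0, initial + step)   # increase towards (or to) 1
    return max(0.0, initial - step)       # decrease towards (or to) 0
```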
In another example of block 280, the assigning module 17 assigns a first confidence value to each of the detected objects. The first confidence value can be between 0 and 1 and is assigned based on how much trust is given beforehand to the detection results, depending on the properties of the algorithms used by the object detecting module 13. The assigning module 17 further assigns a second confidence value to each of the detected objects according to the TP and FP determinations, and then calculates a final confidence value of each of the detected objects based on the assigned first and second confidence values. In an embodiment, the second confidence value of a detected object is between 0.5 and 1 if the object is determined to be TP and between 0 and 0.5 if it is determined to be FP. Then, the assigning module 17 calculates the final confidence value of each detected object based on its first and second confidence values. The calculation can follow various rules, such as a weighted average or a multiplication. In an embodiment, the final confidence value of a detected object is calculated by the following formula:
C_i = k_1 * C_1i + k_2 * C_2i
where C_i represents the final confidence value of one of the detected objects (OBJ_i);
C_1i represents the first confidence value of the object (OBJ_i);
k_1 represents the weight coefficient of the first confidence value C_1i;
C_2i represents the second confidence value of the object (OBJ_i); and
k_2 represents the weight coefficient of the second confidence value C_2i.
In an example, k_2 is greater than k_1 (e.g., k_2 is three times as large as k_1). In this way, the second confidence value is considered more important than the first confidence value in determining the final confidence value, and thus the second confidence value is assigned more weight than the first confidence value and contributes more to the final confidence value.
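As a worked illustration of the weighted-average rule with k_2 = 3 * k_1, the following sketch uses the assumed values k_1 = 0.25 and k_2 = 0.75 (so that the weights sum to 1) and assumed representative second confidence values of 0.75 for TP and 0.25 for FP:

```python
def final_confidence(c1, verdict, k1=0.25, k2=0.75):
    """Weighted average C_i = k_1 * C_1i + k_2 * C_2i with k_2 = 3 * k_1."""
    c2 = 0.75 if verdict == "TP" else 0.25   # assumed values in (0.5, 1) for TP and (0, 0.5) for FP
    return k1 * c1 + k2 * c2

# For instance, a detection with a first confidence value of 0.6 that the comparison
# confirms as TP would receive 0.25 * 0.6 + 0.75 * 0.75 = 0.7125.
```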
In an example of block 290, the determining module 16 eliminates the at least one detected object from the detection list if the final confidence value of the at least one detected object is lower than a confidence threshold, and keeps the at least one detected object in the detection list if the final confidence value of the at least one detected object is greater than or equal to the confidence threshold. The determining module 16 also adds any object that is determined to be FN into the detection list.
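A minimal sketch of the confidence-based list update of block 290, assuming each detection is stored as a Python dictionary with a "final_confidence" entry, could be:

```python
def update_detection_list(detections, fn_objects, confidence_threshold=0.5):
    """Keep detections whose final confidence reaches the threshold and add suspected FN objects."""
    kept = [d for d in detections if d["final_confidence"] >= confidence_threshold]
    return kept + list(fn_objects)
```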
It is noted that some examples of determining a TP or FP object are described with respect to at least one detected object. According to some other examples of the disclosure, the determining module 16 can determine each of the detected objects in the detection list to be TP or FP using solutions similar to those described above. Further, the determining module 16 can determine whether to eliminate or keep each detected object using solutions similar to those described above.
Another aspect of the disclosure relates to an image processing method. The method can be performed by means of the processing apparatus 10 as described above. For this reason, the various features described above with reference to the processing apparatus 10 are also applicable to the method. An image processing method 900 according to an example of the disclosure is schematically shown in Figure 9 and comprises the following steps.
In step S910, an image captured by a mono camera is received.
In step S920, one or more objects are detected in the received image.
In step S930, an object detection list is created. The object detection list includes the detected one or more objects.
In step S940, the received image is segmented into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image.
In step S950, a comparison is performed based on the one or more detected objects and the one or more FG objects.
In step S960, at least one detected object of the one or more detected objects is determined to be true positive (TP) or false positive (FP) based on the comparison.
In step S970, an object is determined to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module.
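For illustration only, the control flow of method 900 could be sketched as follows; the callables passed in are placeholders for the detector, the segmentation model and the comparison logic described above, and are not part of the disclosure.

```python
def process_image(image, detect_objects, segment_image, classify, suspect_false_negatives):
    """Control-flow sketch of method 900 (the image passed in corresponds to step S910)."""
    detections = detect_objects(image)                 # S920: detect objects in the image
    detection_list = list(detections)                  # S930: create the object detection list
    segmented = segment_image(image)                   # S940: segment into a single BG and FG objects
    verdicts = [classify(segmented, obj)               # S950/S960: compare and label TP or FP
                for obj in detection_list]
    fn_objects = suspect_false_negatives(segmented, detection_list)   # S970: suspect FN objects
    return detection_list, verdicts, fn_objects
```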
Embodiments of the disclosure may be implemented in a non-transitory  computer readable medium. The non-transitory computer readable medium may include instructions that, when executed, cause one or more processors to perform any operation of the method 900 according to examples of the disclosure.
It should be appreciated that all the operations in the method described above are merely exemplary, and the disclosure is not limited to any operations in the method or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.
The processors can be implemented using electronic hardware, computer software, or any combination thereof. Whether these processors are implemented as hardware or software will depend on the specific application and the overall design constraints imposed on the system. By way of example, a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as a microprocessor, a micro-controller, a digital signal processor (DSP), a field programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gate logic, discrete hardware circuitry, or other suitable processing components configured to perform the various functions described in this disclosure. The functions of a processor, any portion of a processor, or any combination of processors presented in this disclosure may be implemented as software executed by a microprocessor, a micro-controller, a DSP, or other suitable platforms.
Software should be considered broadly to represent instructions, instruction sets, code, code segments, program code, programs, subroutines, software modules, applications, software applications, software packages, routines, subroutines, objects, running threads,  processes, functions, and the like. Software can reside on computer readable medium. Computer readable medium may include, for example, a memory, which may be, for example, a magnetic storage device (e.g., a hard disk, a floppy disk, a magnetic strip) , an optical disk, a smart card, a flash memory device, a random access memory (RAM) , a read only memory (ROM) , a programmable ROM (PROM) , an erasable PROM (EPROM) , an electrically erasable PROM (EEPROM) , a register, or a removable disk. Although a memory is shown as being separate from the processor in various aspects presented in this disclosure, a memory may also be internal to the processor (e.g., a cache or a register) .
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalent transformations to the elements of the various aspects of the disclosure, which are known or to be apparent to those skilled in the art, are intended to be covered by the claims.

Claims (17)

  1. An image processing apparatus, comprising:
    a receiving module configured to receive an image captured by a mono camera;
    an object detecting module configured to detect one or more objects in the received image and create an object detection list including the one or more objects;
    a segmentation module configured to segment the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image;
    a comparing module configured to perform a comparison based on the one or more detected objects and the one or more FG objects; and
    a determining module configured to determine at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison, and to determine an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  2. The image processing apparatus of claim 1, wherein the determining module is further configured to:
    eliminate the at least one detected object from the object detection list if the at least one object is determined to be FP;
    keep the at least one detected object in the object detection list if the at least one detected object is determined to be TP; and
    add any object that is determined to be FN into the object detection list.
  3. The image processing apparatus of claim 1, further comprising an  assigning module configured to assign a confidence value to the at least one detected object based on determined results from the determining module, the confidence value of the at least one detected object representing an existence probability of the at least one detected object.
  4. The image processing apparatus of claim 3, wherein the assigning comprises:
    assigning an initial confidence value to the at least one detected object; and
    adjusting the initial confidence value of the at least one detected object to acquire a final confidence value of the at least one detected object in response to a TP or FP determination.
  5. The image processing apparatus of claim 3, wherein the assigning comprises:
    assigning a first confidence value to the at least one detected object based on detected results from the object detecting module;
    assigning a second confidence value to the at least one detected object based on a TP or FP determination from the determining module; and
    calculating a final confidence value of the at least one detected object based on the first and second confidence values.
  6. The image processing apparatus of claim 4 or 5, wherein the determining module is further configured to:
    eliminate the at least one detected object from the detection list if the final confidence value of the at least one detected object is lower than a confidence threshold;
    keep the at least one detected object in the detection list if the final confidence value of the at least one detected object is greater than or equal  to the confidence threshold; and
    add any object that is determined to be FN into the detection list.
  7. The image processing apparatus of any one of claims 1-6, wherein the object detecting module is further configured to predict bounding boxes each of which contains a respective one of the one or more detected objects; and
    the comparing module is further configured to overlay the bounding boxes onto the segmented image to overlay corresponding areas of the BG and FG objects.
  8. The image processing apparatus of claim 7, wherein determining the at least one detected object to be TP or FP comprises:
    counting a number of FG pixels and a number of BG pixels within a corresponding bounding box that corresponds to the at least one detected object; and
    determining the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels.
  9. The image processing apparatus of claim 8, wherein determining the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels comprises:
    determining the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding bounding box is greater than or equal to a ratio threshold; and
    determining the at least one detected object to be FP if the ratio is smaller than the ratio threshold.
  10. The image processing apparatus of any one of claims 7-9, wherein determining an object to be FN comprises:
    if an FG object is not overlaid by any bounding box, determining the object detecting module fails to detect an object in the received image and determining the object that fails to be detected is an FN object.
  11. The image processing apparatus of any one of claims 1-10, wherein each of the one or more detected objects comprises a set of position points; and
    wherein the comparing module is further configured to:
    mark sets of position points of the one or more detected objects onto the segmented image to overlay corresponding areas of the BG and FG objects; and
    set a central point of each of the sets of position points and a checking area around the central point.
  12. The image processing apparatus of claim 11, wherein determining at least one detected object of the one or more detected objects to be TP or FP comprises:
    counting a number of FG pixels and a number of BG pixels within a corresponding checking area that corresponds to the at least one detected object; and
    determining the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels.
  13. The image processing apparatus of claim 12, wherein determining the at least one detected object to be TP or FP based on the counted numbers of FG and BG pixels comprises:
    determining the at least one detected object to be TP if a ratio of the number of FG pixels to the number of BG pixels within the corresponding checking area is greater than or equal to a ratio threshold; and
    determining the at least one detected object to be FP if the ratio is smaller than the ratio threshold.
  14. The image processing apparatus of any one of claims 11-13, wherein determining an object to be FN comprises:
    if an FG object is marked by position points the number of which is less than a lower limit threshold, determining the object detecting module fails to detect an object in the received image and determining the object that fails to be detected is an FN object.
  15. The image processing apparatus of any one of claims 1-14, further comprising a calibrating module configured for calibrating camera parameters of the mono camera.
  16. An image processing method, comprising:
    receiving an image captured by a mono camera;
    detecting one or more objects in the received image;
    creating an object detection list including the one or more objects;
    segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image;
    performing a comparison based on the one or more detected objects and the one or more FG objects;
    determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and
    determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.
  17. A non-transitory computer readable medium with instructions stored therein which, when executed, cause a processor to carry out the steps comprising:
    receiving an image captured by a mono camera;
    detecting one or more objects in the received image;
    creating an object detection list including the one or more objects;
    segmenting the received image into a single background (BG) and one or more foreground (FG) objects to acquire a segmented image;
    performing a comparison based on the one or more detected objects and the one or more FG objects;
    determining at least one detected object of the one or more detected objects to be true positive (TP) or false positive (FP) based on the comparison; and
    determining an object to be false negative (FN) if the object exists in the received image but is not detected by the object detecting module based on the comparison.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/082430 WO2023178542A1 (en) 2022-03-23 2022-03-23 Image processing apparatus and method


Publications (1)

Publication Number Publication Date
WO2023178542A1 true WO2023178542A1 (en) 2023-09-28

Family

ID=81325278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/082430 WO2023178542A1 (en) 2022-03-23 2022-03-23 Image processing apparatus and method

Country Status (1)

Country Link
WO (1) WO2023178542A1 (en)
