CN113221603A - Method and device for detecting shielding of monitoring equipment by foreign matters - Google Patents

Method and device for detecting shielding of monitoring equipment by foreign matters Download PDF

Info

Publication number
CN113221603A
CN113221603A (application number CN202010080542.XA)
Authority
CN
China
Prior art keywords
image
target pixel
camera
original image
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010080542.XA
Other languages
Chinese (zh)
Inventor
陆晔
罗渝平
周翔
孙晓凯
倪卿元
张驰
陈文强
李梦媛
黄海涛
陆成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202010080542.XA priority Critical patent/CN113221603A/en
Publication of CN113221603A publication Critical patent/CN113221603A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a method and a device for detecting whether a monitoring device is occluded by a foreign object, comprising: obtaining an original image with a camera; processing the original image to detect target pixels associated with a foreign object; if target pixels are detected, extracting them from the original image to obtain an image that retains the target pixels; performing depth estimation on the image of the retained target pixels; calculating an occlusion ratio based on the result of the depth estimation; and judging, based on the occlusion ratio, whether the camera is occluded by the foreign object.

Description

Method and device for detecting shielding of monitoring equipment by foreign matters
Technical Field
The present disclosure relates to the field of video image processing, and in particular, to a method and an apparatus for detecting shielding of a monitoring device by a foreign object.
Background
With the rapid development of modern science and technology and the continuous improvement of image and video acquisition techniques, intelligent security video surveillance is widely used in daily production and life. However, the cameras used for security video surveillance are easily occluded, causing serious loss of monitoring information at monitoring points. This impairs the function of the surveillance lens, weakens the video patrol and investigation capability of public security organs, harms public safety management, and threatens the life and property of residents; at some key monitoring points it may even cause irreparable loss.
At present, there are two main approaches to detecting whether a camera is occluded by foreign matter: manual periodic inspection, and conventional video image processing.
Manual periodic inspection of problem cameras is laborious and incurs high labor cost, and omissions are easy: for example, leaf occlusion in a blind spot of sight may go unnoticed.
Conventional video image processing typically combines threshold segmentation with leaf color feature analysis to segment leaves, introduces color features and segmented-region area features on top of the leaf segmentation, and performs classification modeling with a support vector machine to detect leaf occlusion. Its obvious defect is low detection accuracy: because leaf depth information is unavailable, distant large leaves and other objects with colors similar to leaves, such as playgrounds and lawns, often interfere with the detection, causing misjudgment, so manual re-inspection is still required.
Disclosure of Invention
According to an aspect of the present disclosure, a method for detecting whether a camera is occluded by a foreign object is provided, comprising the following steps: obtaining an original image with the camera; processing the original image to detect target pixels associated with the foreign object; if target pixels are detected, extracting them from the original image to obtain an image that retains the target pixels; performing depth estimation on the image of the retained target pixels; calculating an occlusion ratio based on the result of the depth estimation; and judging, based on the occlusion ratio, whether the camera is occluded by the foreign object.
According to another aspect of the present disclosure, there is provided a non-transitory storage medium for detecting whether a camera is shielded by a foreign object, having a program stored thereon, characterized in that, when the program is executed by a computer, the program causes the computer to execute the method as described above.
According to yet another aspect of the present disclosure, there is provided an apparatus for detecting whether a camera is shielded by a foreign object, comprising a memory and a processor, the memory having stored therein a program, which when executed by the processor, causes the processor to perform the method as described above.
An advantage of embodiments according to the present disclosure is an intelligent method for detecting camera occlusion that combines image semantic segmentation with monocular image depth estimation, solving the high cost of manual inspection and improving occlusion detection accuracy. Accuracy here refers to the ratio of the number of correctly detected occluded cameras to the total number of detected cameras.
Another advantage of embodiments according to the present disclosure is that they overcome the weakness of the conventional video image processing method mentioned in the background above, which has low accuracy and is easily misled by scenes resembling the targeted foreign object. For example, when detecting whether a camera is occluded by leaves, a similar-looking lawn may cause misjudgment. The present disclosure provides a custom semantic segmentation detection model that distinguishes leaves from other objects and extracts large leaf areas in the picture, effectively reducing interference from the features of other objects and minimizing environmental interference factors.
Yet another advantage of embodiments according to the present disclosure is that the distance from a leaf to the camera in the real scene is further calculated with a monocular image depth estimation method, enabling anomaly detection of leaf occlusion. Introducing leaf depth information from the image allows cameras occluded by leaves to be identified accurately and guides the relevant workers in pruning the offending branches and leaves.
Other features of the present disclosure and advantages thereof will become more apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
Fig. 1 illustrates an example of an original image used as a training sample according to one or more exemplary embodiments of the present disclosure.
Fig. 2 illustrates an example of the annotated image obtained by annotating the original training image shown in fig. 1, in accordance with one or more exemplary embodiments of the present disclosure.
Fig. 3 shows an example of an original image taken by a camera of a monitoring device according to one or more exemplary embodiments of the present disclosure.
Fig. 4 illustrates an example of a color probability map resulting from inputting the example original image shown in fig. 3 into an image semantic segmentation model according to one or more example embodiments of the present disclosure.
Fig. 5 illustrates an example of a binarized image obtained after subjecting the color probability map shown in fig. 4 to a binarization process according to one or more exemplary embodiments of the present disclosure.
Fig. 6 illustrates an example of subjecting the binarized image shown in fig. 5 to a morphological open operation to obtain a morphologically opened binarized image, in accordance with one or more exemplary embodiments of the present disclosure.
Fig. 7 illustrates the image of the retained target pixels resulting from combining the morphologically opened binarized image shown in fig. 6 with the original image shown in fig. 3, according to one or more exemplary embodiments of the present disclosure.
Fig. 8 illustrates an example of an image with depth information resulting from inputting the image of the retained target pixel, as illustrated in fig. 7, into a depth estimation model according to one or more exemplary embodiments of the present disclosure.
Fig. 9 shows a flowchart of a method for detecting that a monitoring device is occluded by a foreign object according to one or more exemplary embodiments of the present disclosure.
Fig. 10 shows a block diagram of an apparatus for detecting that a monitoring device is blocked by a foreign object according to one or more exemplary embodiments of the present disclosure.
Note that in the embodiments described below, the same reference numerals are used in common between different drawings to denote the same portions or portions having the same functions, and a repetitive description thereof will be omitted. In some cases, similar reference numbers and letters are used to denote similar items, and thus, once an item is defined in one figure, it need not be discussed further in subsequent figures.
For convenience of understanding, the positions, sizes, ranges, and the like of the structures shown in the drawings sometimes do not represent actual positions, sizes, and ranges. Therefore, the present disclosure is not limited to the positions, dimensions, ranges, and the like disclosed in the drawings.
Detailed Description
Various exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. That is, the structures and methods herein are shown by way of example to illustrate different embodiments of the structures and methods of the present disclosure. Those skilled in the art will understand, however, that they are merely illustrative of exemplary ways in which the disclosure may be practiced and not exhaustive. Furthermore, the figures are not necessarily to scale, some features may be exaggerated to show details of particular components.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
For a monitoring device that is blocked by a foreign object, at least a portion of an image captured by a camera of the device may correspond to the foreign object blocking the camera. Therefore, in the present disclosure, an original image captured by a camera to be detected may be processed to identify whether at least a portion of an image corresponding to a foreign object exists in the original image, thereby determining whether the camera is blocked by the foreign object.
Embodiments of the present disclosure use image semantic segmentation to detect whether a part corresponding to the foreign object exists in the original image shot by the camera. Image semantic segmentation algorithms are important techniques in image processing and computer vision, applied in automatic driving systems (e.g., street view recognition), unmanned aerial vehicles (e.g., landing site determination), wearable devices, and the like. Such an algorithm identifies each pixel in the image and its meaning, groups or segments the pixels according to their meaning, and may color the pixels different colors according to the result of the grouping or segmentation.
In the image semantic segmentation technique, the classes of objects in an image first need to be defined. The class to which the foreign matter to be detected belongs is the target class and is included among these defined classes. For example, when detecting whether a camera is occluded by leaves, the objects in images captured by the camera of the monitoring device may be customized into 14 classes: two-wheeled vehicles, buildings, four-wheeled vehicles, clothing, indoor floors, lawns, leaves, people, playgrounds, outdoor pavements, runways, sky, vegetation, and others. Each class corresponds to a color, and different classes correspond to different colors. When customizing the classes of objects in an image, semantic segmentation annotation software (e.g., PixelAnnotationTool) may be used to label the custom classes. Annotation includes classifying each pixel in the image into one of the custom classes and coloring pixels with the color corresponding to their class, yielding an annotated image. For example, fig. 1 is an original image, and fig. 2 is the annotated image obtained by annotating the original image of fig. 1 according to the custom classes.
Common image semantic segmentation models include UNet, PSPNet, BiSeNet, the DeepLab series, and the like. In an embodiment, for performance reasons, a BiSeNet (Bilateral Segmentation Network for Real-time Semantic Segmentation) model with a ResNet-101 backbone network may preferably be used. A common standard metric in image semantic segmentation is mIoU (mean Intersection over Union), which computes the ratio of the intersection to the union of the predicted region and the ground-truth region, averaged over classes.
To obtain an image semantic segmentation model suited to the application background, a large number of samples are required for training. Each sample may include an original image (e.g., fig. 1) and the annotated image corresponding to it (e.g., fig. 2); a plurality of such samples forms the training set for the custom image semantic segmentation model. When training the model on this set, ResNet-101 weights pre-trained on ImageNet (a public data set) may, for example, be used as the initialization weights, the learning rate may be set to 0.01, and the SGD algorithm may be used as the optimizer for backpropagation, yielding a trained custom image semantic segmentation model.
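Purely as an illustration of the training setup above, a minimal PyTorch sketch might look as follows. Since no concrete BiSeNet implementation is given in this disclosure, torchvision's FCN-ResNet101 is substituted only so the sketch runs; the dummy tensors stand in for real (image, annotation) samples.

```python
import torch
from torch import nn, optim
from torchvision.models.segmentation import fcn_resnet101

# Stand-in model: the disclosure uses BiSeNet with a ResNet-101 backbone;
# torchvision's FCN-ResNet101 (ImageNet-pretrained backbone by default)
# is substituted here only so the sketch is runnable.
model = fcn_resnet101(num_classes=14)

criterion = nn.CrossEntropyLoss()                    # per-pixel classification loss
optimizer = optim.SGD(model.parameters(), lr=0.01)   # learning rate from the text

# Dummy batch standing in for training samples:
# images are (N, 3, H, W); labels are (N, H, W) class indices in [0, 13].
images = torch.randn(2, 3, 256, 256)
labels = torch.randint(0, 14, (2, 256, 256))

model.train()
optimizer.zero_grad()
logits = model(images)["out"]                        # (N, 14, H, W) class scores
loss = criterion(logits, labels)
loss.backward()                                      # backpropagation
optimizer.step()                                     # SGD update
```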
After the trained image semantic segmentation model is obtained, an original image obtained from a camera of the monitoring equipment is processed with it to obtain a color probability map, in which different colors correspond to different classes.
Optionally, before the original image is input into the image semantic segmentation model, it may be resized to meet the input requirements of the model.
For example, fig. 3 is an original frame of size (1920, 1080, 3) taken from a video captured by the camera of a monitoring device. Fig. 3 may be resized to a (960, 720, 3) image. Inputting the resized (960, 720, 3) image into the trained BiSeNet network and running inference with the image semantic segmentation model yields the (960, 720, 3) color probability map shown in fig. 4, in which the green part corresponds to the foreign matter to be detected (i.e., leaves).
In the present disclosure, an image is denoted by a triple (x, y, z), where x is the width of the image, y is the height of the image, and z is the number of channels of the image.
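As an illustration of the resizing and inference step above, a minimal sketch might look as follows, assuming the stand-in segmentation model from the earlier sketch. The class-to-color palette is hypothetical; the disclosure does not specify the actual colors.

```python
import cv2
import numpy as np
import torch

# Hypothetical palette: one RGB color per custom class (14 classes).
PALETTE = np.random.default_rng(0).integers(0, 256, size=(14, 3), dtype=np.uint8)

def color_probability_map(model, frame_bgr):
    """Resize a (1920, 1080, 3) frame and render the per-pixel class color map."""
    resized = cv2.resize(frame_bgr, (960, 720))       # width 960, height 720 -> array (720, 960, 3)
    tensor = torch.from_numpy(resized).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        logits = model(tensor)["out"]                 # (1, 14, 720, 960) class scores
    classes = logits.argmax(dim=1).squeeze(0).numpy() # per-pixel class index
    return PALETTE[classes]                           # (720, 960, 3) color map
```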
After the color probability map is obtained, it can be checked whether a color corresponding to the target class is present in the map. If not, no part of the original image corresponds to the foreign matter to be detected, i.e., the camera is judged not to be occluded by the foreign matter. If a color corresponding to the target class is present, the color probability map is further processed into a binarized image: pixels whose color corresponds to the target class may be colored white, and the remaining pixels colored black. The image obtained by binarizing the color probability map of fig. 4 is shown in fig. 5, in which the part corresponding to the foreign matter to be detected (i.e., leaves) is white and the rest is black.
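A minimal binarization sketch, assuming the color map produced above and a hypothetical `target_color` (e.g., the green assigned to the leaf class):

```python
import numpy as np

def binarize(color_map, target_color):
    """White (255) where the map shows the target class's color, black (0) elsewhere."""
    target = np.asarray(target_color, dtype=color_map.dtype)
    mask = np.all(color_map == target, axis=-1)           # (H, W) boolean
    return np.where(mask, 255, 0).astype(np.uint8)        # single-channel binary image
```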
Note also that, as shown in fig. 5, small isolated blocks appear at the edges of the white regions. Such small blocks at the edges of the binarized image are erroneous pixels caused by incorrect inference by the model.
Optionally, to eliminate the effect of incorrect inference, the binarized image may be further processed to remove the misjudged pixels. This can be achieved, for example, by performing a morphological open operation on the binarized image and then removing, from the opened image, any block composed of target pixels whose area is smaller than an area threshold.
For example, for the binarized image shown in fig. 5, a morphological open operation may be performed with a 5×5 structuring element, and blocks with an area of fewer than 10000 pixels may then be removed, yielding the morphologically opened binarized image shown in fig. 6.
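A minimal OpenCV sketch of this cleanup step, using the 5×5 structuring element and 10000-pixel area threshold from the example above:

```python
import cv2
import numpy as np

def clean_mask(binary, kernel_size=5, area_threshold=10000):
    """Morphological open, then drop connected white blocks below the area threshold."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)   # 5x5 structuring element
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    # Label connected white regions and keep only the sufficiently large ones.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(opened)
    cleaned = np.zeros_like(opened)
    for i in range(1, n):                                    # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= area_threshold:
            cleaned[labels == i] = 255
    return cleaned
```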
After the binarized image (or the morphologically opened binarized image) is obtained, it may be compared with the original image. According to the positions of the pixels colored white in the binarized image, the pixels at the corresponding positions in the original image are extracted, yielding an image that retains the target pixels of the original image. The remaining pixels in the original image (i.e., the non-target pixels, which are colored black in the binarized image) may be discarded, for example by changing them to black.
Optionally, if the size of the original image differs from that of the binarized image, the original image may be scaled to match the binarized image.
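A minimal sketch of this masking step, scaling the original to the mask size as described above:

```python
import cv2

def retain_target_pixels(original, mask):
    """Keep original pixels where the mask is white; everything else becomes black."""
    h, w = mask.shape
    scaled = cv2.resize(original, (w, h))          # match the mask size if needed
    return cv2.bitwise_and(scaled, scaled, mask=mask)
```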
To avoid interference from objects in the image that are far from the camera but resemble the foreign matter to be detected, the present disclosure also performs depth estimation on the image of the retained target pixels, further improving detection accuracy with the depth information of the pixels in that image.
Depth estimation of the image of the retained target pixels includes using a monocular camera depth estimation model to obtain an image with depth information. A monocular camera depth estimation model can estimate the depth of an object (i.e., the distance between the object and the single camera in the real world) from images shot by a single camera alone. Monocular camera depth estimation is an unsupervised method and needs no ground-truth depth data for training. Common monocular camera depth estimation models include FastDepth, SVS, MonoDepth, and the like; in this embodiment, the MonoDepth model ("Unsupervised Monocular Depth Estimation with Left-Right Consistency") may be selected. Since the MonoDepth model does not need to be retrained, the image of the retained target pixels can be input directly into the MonoDepth depth estimation network to estimate the depth of the objects in it. The result of the depth estimation may be, for example, an image with depth information, where the depth information is a numerical value associated with the depth of the object corresponding to each pixel; it may be a pixel value such as a grayscale value.
For example, inputting the image of the retained target pixels shown in fig. 7 into the MonoDepth depth estimation network yields the image with depth information shown in fig. 8. The input may be a (512, 256, 3) image, and the resulting image with depth information a (512, 256) image, as shown in fig. 8. Fig. 8 is a grayscale image with depth information: the closer an object is to the camera, the larger the grayscale value of its pixels. A grayscale value indicates the brightness of a pixel and generally ranges from 0 to 255, white being 255 and black 0; the larger the grayscale value, the brighter the pixel appears. Thus, the brightness (grayscale value) of a pixel in fig. 8 reflects its depth.
In the present disclosure, a single-channel image is denoted by a pair (x, y), where x (e.g., the abscissa in fig. 8) is the width of the image and y (e.g., the ordinate in fig. 8) is the height of the image.
Optionally, before the image of the retained target pixels is input into the monocular camera depth estimation model, it may be resized to meet the input requirements of the model.
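The MonoDepth repository's loading and inference API is not reproduced in this disclosure, so the sketch below assumes a pre-trained monocular depth model wrapped in a hypothetical `depth_model` callable that returns a disparity map (larger disparity = closer object), and converts it into the 0-255 grayscale image described above:

```python
import cv2
import numpy as np

def estimate_depth(depth_model, target_image):
    """Resize to the model's expected input and map disparity to a 0-255 gray image."""
    resized = cv2.resize(target_image, (512, 256))   # width 512, height 256
    disparity = depth_model(resized)                 # hypothetical callable -> (256, 512) floats
    # Normalize so that nearer (higher-disparity) pixels appear brighter.
    gray = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
    return gray.astype(np.uint8)
```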
Based on the image with the depth information, it can be determined whether the camera is blocked by a foreign object to be detected.
First, statistics are gathered over the image with depth information. If a pixel in that image has a pixel value greater than a pixel value threshold, the pixel is judged to correspond to an object close to the camera, i.e., to the foreign matter to be detected; if its pixel value is less than or equal to the threshold, it is judged to correspond to an object far from the camera, i.e., not to the foreign matter. The pixels whose values exceed the threshold are counted, and their number is recorded as Num.
For example, in the grayscale image of fig. 8, when the grayscale value of a pixel exceeds a grayscale threshold of 200, the pixel can be judged to correspond to the foreign matter to be detected (i.e., leaves). This grayscale threshold is merely an example and is not limiting; it may be adjusted to any value in the range of 0 to 255 according to the actual situation.
Based on the above statistics, the occlusion ratio of the image with depth information can be calculated. The total number of pixels, denoted Sum, equals the image height times the image width; for fig. 8, Sum = 512 × 256. The occlusion ratio is then Num/Sum.
Based on the calculated occlusion ratio, it can be judged whether the camera is occluded by the foreign matter to be detected: if the occlusion ratio is greater than or equal to an occlusion ratio threshold, the camera is judged to be occluded by the foreign matter; if it is smaller, the camera is judged not to be occluded.
For example, for the image shown in fig. 8, when the calculated occlusion ratio is greater than or equal to an occlusion ratio threshold of 50%, it can be judged that the camera is occluded by leaves; when it is below 50%, the camera is judged not to be occluded by leaves. This occlusion ratio threshold is merely an example and is not limiting; it may be adjusted to any value in the range of 0 to 100% according to the actual situation.
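A minimal sketch of the statistics and decision rule above, using the example thresholds of 200 (grayscale) and 50% (occlusion ratio):

```python
import numpy as np

def occlusion_ratio(depth_gray, pixel_threshold=200):
    """Fraction of pixels judged 'near' (brighter than the grayscale threshold)."""
    num = int(np.count_nonzero(depth_gray > pixel_threshold))  # Num
    total = depth_gray.size                                    # Sum = height * width
    return num / total

def is_occluded(depth_gray, pixel_threshold=200, ratio_threshold=0.5):
    """Camera judged occluded when the occlusion ratio reaches the threshold."""
    return occlusion_ratio(depth_gray, pixel_threshold) >= ratio_threshold
```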
As shown in fig. 9, the present disclosure provides a method for detecting whether a camera is occluded by a foreign object, including: at 910, obtaining an original image with the camera; at 920, processing the original image to detect target pixels associated with the foreign object; at 930, determining whether target pixels were detected; if so, at 940, extracting the target pixels from the original image to obtain an image of the retained target pixels; at 950, performing depth estimation on the image of the retained target pixels; at 960, calculating an occlusion ratio based on the result of the depth estimation; and finally, at 970, judging whether the camera is occluded by the foreign object based on the occlusion ratio.
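Purely as an illustration, the hypothetical helpers sketched above might be composed into the flow of fig. 9 as follows; `seg_model`, `depth_model`, and `target_color` are the stand-ins introduced earlier, not names given in this disclosure:

```python
def detect_occlusion(frame_bgr, seg_model, depth_model, target_color):
    """End-to-end sketch of the flow in fig. 9, reusing the helpers above."""
    color_map = color_probability_map(seg_model, frame_bgr)   # 920: segment the frame
    binary = binarize(color_map, target_color)                # 930: any target pixels?
    if not binary.any():
        return False                                          # no foreign matter detected
    mask = clean_mask(binary)                                 # remove misjudged pixels
    target_image = retain_target_pixels(frame_bgr, mask)      # 940: keep target pixels
    depth_gray = estimate_depth(depth_model, target_image)    # 950: depth estimation
    return is_occluded(depth_gray)                            # 960-970: ratio and decision
```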
In one embodiment of the method, extracting the target pixels from the original image comprises: binarizing the pixels of the original image, and combining the binarized image with the original image to determine the locations of the target pixels to be extracted from the original image.
In one embodiment, the method further comprises: processing the binarized image to remove misjudged pixels before performing depth estimation on the image of the retained target pixels.
In one embodiment, processing the binarized image comprises: performing a morphological open operation on the binarized image, and removing blocks in the morphologically opened image that are composed of target pixels and have an area smaller than an area threshold.
In one embodiment, the method further comprises: when extracting the target pixels from the original image, changing other pixels in the original image that do not belong to the target pixels to black.
In one embodiment of the method, processing the original image includes utilizing an image semantic segmentation model.
In one embodiment, the method further comprises: processing the original image with the image semantic segmentation model to obtain a color probability map, and obtaining the image of the retained target pixels according to whether the pixels in the color probability map are target pixels.
In one embodiment of the method, depth estimation of the image of the retained target pixels includes using a monocular camera depth estimation model to obtain an image with depth information.
In one embodiment, the method further comprises: counting the number of pixels in the image with depth information whose pixel values are greater than a pixel value threshold, and dividing that number by the total number of pixels in the image with depth information to obtain the occlusion ratio.
In one embodiment, the method further comprises: if no target pixel is detected, judging that the camera is not occluded by the foreign object.
In one embodiment of the method, if the occlusion ratio is greater than or equal to an occlusion ratio threshold, the camera is judged to be occluded by the foreign object; otherwise, the camera is judged not to be occluded by the foreign object.
In one embodiment, in the method, the foreign matter is leaves.
As shown in fig. 10, the present disclosure provides an apparatus for detecting whether a camera is occluded by a foreign object. The apparatus includes a camera 1010, a memory 1020, and a processor 1030 coupled to each other. The memory 1020 stores a program that, when executed by the processor 1030, causes the processor 1030 to perform the method described above.
In summary, the present disclosure provides a method and an apparatus that combine image semantic segmentation with monocular depth estimation to improve the accuracy of detecting camera occlusion by foreign matter in complex scenes. The embodiments of the disclosure use the deep-learning image semantic segmentation method BiSeNet which, compared with traditional image feature segmentation methods, offers good robustness, high mIoU, high speed, and the ability to adapt to camera image quality under various weather changes. The embodiments also use the monocular camera depth estimation method MonoDepth, which can estimate the physical distance between the foreign matter (e.g., leaves) and the camera using only a single camera. This eliminates interference to the algorithm from distant large leaves and from objects in the RGB image with colors similar to leaves, such as playgrounds and lawns, greatly reducing the false alarm rate and improving detection accuracy.
It should be appreciated that reference throughout this specification to "an embodiment" or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrases "in embodiments of the present disclosure" and similar language throughout this specification do not necessarily all refer to the same embodiment.
As used herein, the word "exemplary" means "serving as an example, instance, or illustration," and not as a "model" that is to be replicated accurately. Any implementation exemplarily described herein is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the disclosure is not limited by any expressed or implied theory presented in the preceding technical field, background, brief summary or the detailed description.
It will be further understood that the terms "comprises/comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
One skilled in the art will appreciate that the present disclosure can be implemented as a system, apparatus, method, or computer program product on a computer-readable medium (e.g., a non-transitory storage medium). Accordingly, the present disclosure may be embodied in various forms, such as an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining software and hardware aspects, all of which may be referred to hereinafter as a "circuit," a "module," or a "system." Furthermore, the present disclosure may also be embodied as a computer program product in any tangible medium having computer usable program code stored thereon.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of systems, apparatuses, methods and computer program products according to specific embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and any combination of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be executed by a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, implement the functions or acts specified in the flowchart and/or block diagram block or blocks.
Flowcharts and block diagrams of the architecture, functionality, and operation in which systems, apparatuses, methods and computer program products according to various embodiments of the present disclosure may be implemented are shown in the accompanying drawings. Accordingly, each block in the flowchart or block diagrams may represent a module, segment, or portion of program code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in the drawings may be executed substantially concurrently, or in some cases, in the reverse order from the drawing depending on the functions involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the market technology, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (14)

1. A method for detecting whether a camera is shielded by a foreign object, comprising the following steps:
obtaining an original image with the camera;
processing the original image to detect target pixels associated with the foreign object;
if target pixels are detected, extracting the target pixels from the original image to obtain an image of the retained target pixels;
performing depth estimation on the image of the retained target pixels;
calculating an occlusion ratio based on a result of the depth estimation; and
judging whether the camera is shielded by the foreign object based on the occlusion ratio.
2. The method of claim 1, wherein extracting the target pixel in the original image comprises: the pixels of the original image are binarized, and the binarized image is combined with the original image to determine the location of the target pixel to be extracted in the original image.
3. The method of claim 2, further comprising: processing the binarized image to remove misjudged pixels before performing depth estimation on the image of the retained target pixels.
4. The method as claimed in claim 3, wherein processing the binarized image comprises: performing a morphological open operation on the binarized image, and removing a block which is composed of a target pixel and has an area smaller than an area threshold value in the morphologically open operated image.
5. The method of claim 1, further comprising: when extracting the target pixels from the original image, changing other pixels in the original image that do not belong to the target pixels to black.
6. The method of claim 1, wherein processing the original image comprises utilizing an image semantic segmentation model.
7. The method of claim 6, further comprising: processing the original image with the image semantic segmentation model to obtain a color probability map, and obtaining the image of the retained target pixels according to whether the pixels in the color probability map are target pixels.
8. The method of claim 1, wherein depth estimating the image of the retained target pixel comprises utilizing a monocular camera depth estimation model to obtain an image with depth information.
9. The method of claim 8, further comprising: counting the number of pixels in the image with depth information whose pixel values are greater than a pixel value threshold, and dividing that number by the total number of pixels in the image with depth information to obtain the occlusion ratio.
10. The method of claim 1, further comprising: and if the target pixel is not detected, judging that the camera is not shielded by the foreign matter.
11. The method of claim 1, wherein if the occlusion ratio is greater than or equal to an occlusion ratio threshold, the camera is judged to be shielded by the foreign object; otherwise, the camera is judged not to be shielded by the foreign object.
12. The method of claim 1, wherein the foreign matter is leaves.
13. A non-transitory storage medium for detecting whether a camera is obstructed by a foreign object, on which a program is stored, characterized in that the program, when executed by a computer, causes the computer to perform the method according to any one of claims 1 to 12.
14. An apparatus for detecting whether a camera is obstructed by a foreign object, comprising a memory and a processor, the memory having stored therein a program which, when executed by the processor, causes the processor to carry out the method of any one of claims 1 to 12.
CN202010080542.XA 2020-02-05 2020-02-05 Method and device for detecting shielding of monitoring equipment by foreign matters Pending CN113221603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010080542.XA CN113221603A (en) 2020-02-05 2020-02-05 Method and device for detecting shielding of monitoring equipment by foreign matters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010080542.XA CN113221603A (en) 2020-02-05 2020-02-05 Method and device for detecting shielding of monitoring equipment by foreign matters

Publications (1)

Publication Number Publication Date
CN113221603A true CN113221603A (en) 2021-08-06

Family

ID=77085647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010080542.XA Pending CN113221603A (en) 2020-02-05 2020-02-05 Method and device for detecting shielding of monitoring equipment by foreign matters

Country Status (1)

Country Link
CN (1) CN113221603A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082703A (en) * 2022-07-19 2022-09-20 深圳大学 Concept-associated color extraction method, device, computer device and storage medium
CN117576362A (en) * 2024-01-16 2024-02-20 国科大杭州高等研究院 Low-resolution thermal infrared image aircraft identification method based on shielding ratio
CN117576362B (en) * 2024-01-16 2024-05-24 国科大杭州高等研究院 Low-resolution thermal infrared image aircraft identification method based on shielding ratio


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination