CN112149476B - Target detection method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112149476B
CN112149476B (application CN201910578134.4A)
Authority
CN
China
Prior art keywords
image
detected
target
environment
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910578134.4A
Other languages
Chinese (zh)
Other versions
CN112149476A (en)
Inventor
郁昌存
王德鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd
Priority to CN201910578134.4A
Publication of CN112149476A
Application granted
Publication of CN112149476B
Legal status: Active
Anticipated expiration

Classifications

    • G06V20/10 — Terrestrial scenes (G06V20/00 Scenes; scene-specific elements)
    • G06T5/20 — Image enhancement or restoration using local operators
    • G06T7/11 — Region-based segmentation (G06T7/10 Segmentation; edge detection)
    • G06T7/136 — Segmentation; edge detection involving thresholding
    • G06V10/141 — Control of illumination (G06V10/14 Optical characteristics of the acquisition device or illumination arrangements)
    • G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI] (G06V10/20 Image preprocessing)
    • G06T2207/10016 — Video; image sequence (image acquisition modality)
    • G06T2207/20024 — Filtering details (special algorithmic details)
    • G06V2201/07 — Target detection
    • G06V2201/08 — Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The embodiments of the invention disclose a target detection method, apparatus, device, and storage medium. The method comprises the following steps: determining, from an image to be detected, a current environment index value of the shooting environment corresponding to that image; determining the target shooting environment category of the image according to the current environment index value and a detection environment index threshold; determining, from the mapping relation between shooting environment categories and preset detection models, the target preset detection model corresponding to the target shooting environment category; and performing target detection on the image based on the target preset detection model to obtain each target in the image. Through this technical scheme, targets are detected more accurately and the real-time performance of target detection is improved.

Description

Target detection method, device, equipment and storage medium
Technical Field
Embodiments of the present invention relate to image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting an object.
Background
With the development of technology, many automatic target detection services based on images or video have emerged, such as image-based person detection in traffic analysis services and automatic vehicle detection from captured images or video in intelligent traffic systems; these in turn enable subsequent tasks such as vehicle tracking, license plate recognition, and road traffic statistics.
Taking automatic vehicle detection as an example, existing methods are mainly aimed at vehicle detection under strong illumination. Under weak illumination (for example at night or in overcast weather), however, the appearance of a vehicle in the image changes greatly: contours blur, texture details vanish, or headlights produce strong glare, so the accuracy of existing vehicle detection algorithms is low.
To address this, two approaches are currently adopted: one augments the vehicle training data set with samples covering various influencing factors, in the hope of improving the generalization ability of the detection algorithm; the other increases the complexity of the detection algorithm, in the hope that it can adapt to various lighting conditions.
In implementing the present invention, the inventors found at least the following problems in the prior art: (1) enlarging the training data set can hardly guarantee high detection accuracy under all illumination conditions; (2) increasing algorithm complexity defeats the goal of real-time detection.
Disclosure of Invention
The embodiments of the invention provide a target detection method, apparatus, device, and storage medium, to achieve more accurate target detection and improve its real-time performance.
In a first aspect, an embodiment of the present invention provides a target detection method, including:
determining, according to an image to be detected, a current environment index value of the shooting environment corresponding to the image to be detected;
determining, according to the current environment index value and a detection environment index threshold, a target shooting environment category corresponding to the image to be detected;
determining, according to the target shooting environment category and the mapping relation between shooting environment categories and preset detection models, a target preset detection model corresponding to the target shooting environment category; and
performing target detection on the image to be detected based on the target preset detection model to obtain each target in the image to be detected.
In a second aspect, an embodiment of the present invention further provides an object detection apparatus, including:
a current environment index value determining module, configured to determine, according to an image to be detected, a current environment index value of the shooting environment corresponding to the image to be detected;
a target shooting environment category determining module, configured to determine, according to the current environment index value and a detection environment index threshold, a target shooting environment category corresponding to the image to be detected;
a target preset detection model determining module, configured to determine, according to the target shooting environment category and the mapping relation between shooting environment categories and preset detection models, a target preset detection model corresponding to the target shooting environment category; and
a target detection module, configured to perform target detection on the image to be detected based on the target preset detection model to obtain each target in the image to be detected.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target detection method provided by any embodiment of the present invention.
In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method provided by any of the embodiments of the present invention.
According to the embodiments of the invention, the current environment index value of the shooting environment corresponding to the image to be detected is determined, and the target shooting environment category of the image is determined from the current environment index value and the detection environment index threshold. A target preset detection model corresponding to that category is then determined from the mapping relation between shooting environment categories and preset detection models, and target detection is performed on the image based on that model to obtain each target in the image. A suitable preset detection model is thus scheduled adaptively according to the current environment index value of the image to be detected, so that highly accurate detection results are obtained across different shooting environments. This solves both the low detection accuracy across shooting environments caused by poor algorithm generalization ability and the detection delay caused by high algorithm complexity, improving the accuracy of target detection while reducing algorithm complexity so that targets are detected faster.
Drawings
FIG. 1 is a flow chart of a target detection method according to a first embodiment of the invention;
FIG. 2 is a flow chart of a target detection method in a second embodiment of the invention;
fig. 3 is an image to be detected corresponding to a strong light shooting environment and a weak light shooting environment in the second embodiment of the present invention;
FIG. 4 is a flow chart of a target detection method in a third embodiment of the invention;
Fig. 5 is a schematic diagram of a vehicle detection result of an image to be detected corresponding to a strong light shooting environment and a weak light shooting environment in the third embodiment of the present invention;
fig. 6 is a schematic structural diagram of an object detection device in a fourth embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device in a fifth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
The object detection method provided in the present embodiment may be applied to detection of an object, such as a vehicle or a pedestrian, from images photographed under different photographing environments. The method may be performed by an object detection device, which may be implemented in software and/or hardware, which may be integrated in an electronic device with image processing functionality, such as a mobile phone, a tablet, a desktop computer, a server, etc. Referring to fig. 1, the method of this embodiment specifically includes the following steps:
S110, determining a current environment index value of a shooting environment corresponding to the image to be detected according to the image to be detected.
The image to be detected refers to an image to be subjected to target detection, and the image to be detected can be an image which is independently shot or a frame of image in a video. The imaging environment refers to an environment at the time of imaging an image to be detected, and may be, for example, the intensity of illumination or the degree of fogging at the time of imaging. Different illumination intensities can cause different brightness of the photographed image, and different fogging degrees can cause different blurring degrees of the photographed image. These are environmental factors that affect the quality of the image and thus the accuracy of target detection. The environmental index value refers to a value of an index that characterizes the shooting environment. The index for representing the shooting environment can be the brightness of an image or edge information in the image. The current environmental index value refers to an environmental index value when an image to be detected is photographed.
In the related art, target detection does not distinguish the shooting environment of the image to be detected: the same detection model is applied regardless, so detection accuracy varies across images captured in different shooting environments. To address this, the embodiment of the invention first distinguishes the shooting environment of the image to be detected and then applies a target detection model matched to that environment. Therefore, before detecting targets in the image, the shooting environment corresponding to the image must be determined. In a specific implementation, brightness information or edge information can be extracted from the image to be detected, and the extracted result used as the current environment index value of the corresponding shooting environment.
Illustratively, S110 includes: and determining the brightness information of the image to be detected according to each brightness value corresponding to the brightness channel of the image to be detected, and taking the brightness information as the current environment index value of the shooting environment corresponding to the image to be detected.
The image to be detected is converted from its original color space to a color space containing a luminance channel, such as the Lab or HSV color space, yielding a luminance channel image for the image to be detected. The mean or variance of the gray values in the luminance channel image is then computed to obtain the brightness information, which can serve as the current environment index value of the shooting environment corresponding to the image to be detected. Reflecting the shooting environment through image brightness has the advantage of speeding up determination of the current environment index value.
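As an illustrative sketch of this step (NumPy only; `brightness_index` is a hypothetical helper name, and the HSV V channel, which for RGB data is simply the per-pixel maximum of the three color components, is used as the luminance channel):

```python
import numpy as np

def brightness_index(image_rgb: np.ndarray) -> float:
    """Mean of the HSV V (value) channel, used as the environment index.

    For RGB data the V channel is the per-pixel maximum of the three
    color components, so no color-space library is needed.
    """
    v_channel = image_rgb.max(axis=2)   # H x W luminance channel
    return float(v_channel.mean())      # mean gray value as the index

# A bright and a dark synthetic frame give clearly separated index values.
bright = np.full((4, 4, 3), 220, dtype=np.uint8)
dark = np.full((4, 4, 3), 30, dtype=np.uint8)
assert brightness_index(bright) > brightness_index(dark)
```

The variance of the luminance channel could be added alongside the mean where a single statistic proves too coarse.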
S120, determining a target shooting environment category corresponding to the image to be detected according to the current environment index value and the detection environment index threshold value.
The detection environment index threshold is the demarcation value used to distinguish the environment indices of different shooting environments during target detection. It can be set from human experience, or determined automatically from a set of previously computed environment index values by an automatic threshold segmentation algorithm. If determined automatically, a single threshold may be fixed once and reused throughout subsequent detection, which guarantees its accuracy to some extent and speeds up detection. Alternatively, the threshold may be updated in stages: for example, when detecting targets frame by frame in captured video, a separate threshold is determined automatically for each video segment and used within that segment. This keeps the threshold applicable to a greater extent, which in turn improves the accuracy of the preset detection model scheduled later and thus the detection accuracy. The specific way the threshold is determined can be chosen according to the detection speed, detection accuracy, and other requirements of the particular service.
The photographing environment category refers to a category to which the photographing environment belongs, for example, when the photographing environment is illumination intensity, the photographing environment category may be a strong illumination category, a weak illumination category, or the like. The target shooting environment category refers to a shooting environment category to which a shooting environment corresponding to an image to be detected belongs.
Shooting environments are divided into different shooting environment categories by the detection environment index threshold. The current environment index value is then compared with the threshold to determine which numerical range delimited by the threshold it falls into, and the shooting environment category corresponding to that range is taken as the target shooting environment category of the image to be detected.
S130, determining a target preset detection model corresponding to the target shooting environment category according to the target shooting environment category and the mapping relation between the shooting environment category and the preset detection model.
The preset detection model is a preset target detection model, and may be a model already trained in the related art or a model retrained on a training sample set. The target detection model here may be, for example, the region-based convolutional neural network detector Faster R-CNN, the end-to-end multi-box detector SSD (Single Shot MultiBox Detector), or the end-to-end single-network detector YOLO (You Only Look Once).
To improve detection accuracy on images obtained in various shooting environments, the embodiment of the invention prepares in advance a preset detection model suited to each shooting environment (shooting environment category) and establishes a one-to-one correspondence between each category and its model. Understandably, for an image captured in shooting environment A, detection with the preset model corresponding to A yields more accurate results than detection with a preset model corresponding to a different shooting environment.
In a specific implementation, the target shooting environment category is used as the lookup key, and the preset detection model corresponding to it is retrieved from the mapping relation between shooting environment categories and preset detection models as the target preset detection model.
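A sketch of this lookup (the model names, category labels, and placeholder return values are all hypothetical; a real system would register trained detection models here):

```python
# Hypothetical detectors standing in for models trained per environment.
def detect_strong_light(image):
    return ["strong-light detections"]   # placeholder result

def detect_weak_light(image):
    return ["weak-light detections"]     # placeholder result

# One-to-one mapping from shooting-environment category to preset model.
MODEL_BY_CATEGORY = {
    "strong_light": detect_strong_light,
    "weak_light": detect_weak_light,
}

def select_model(index_value: float, threshold: float):
    """S120 + S130: classify the environment, then look up its model."""
    category = "strong_light" if index_value >= threshold else "weak_light"
    return MODEL_BY_CATEGORY[category]

assert select_model(32.0, threshold=25.0) is detect_strong_light
assert select_model(18.0, threshold=25.0) is detect_weak_light
```

Keeping the mapping as a plain dictionary makes adding further categories (for example, a foggy-weather model) a one-line change.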
And S140, performing target detection on the image to be detected based on a target preset detection model to obtain each target in the image to be detected.
And inputting the image to be detected into a target preset detection model, and detecting each target in the image to be detected through target detection processing of the model.
According to the technical scheme of this embodiment, the current environment index value of the shooting environment corresponding to the image to be detected is determined, and the target shooting environment category of the image is determined from that value and the detection environment index threshold. The target preset detection model corresponding to that category is then determined from the mapping relation between shooting environment categories and preset detection models, and target detection is performed on the image based on that model to obtain each target in the image. A suitable preset detection model is thus scheduled adaptively according to the current environment index value, so that highly accurate detection results are obtained across different shooting environments. This solves both the low detection accuracy across shooting environments caused by poor algorithm generalization ability and the detection delay caused by high algorithm complexity, improving detection accuracy while reducing algorithm complexity so that targets are detected faster.
Example two
The present embodiment further optimizes "determining the current environmental index value of the shooting environment corresponding to the image to be detected according to the image to be detected" based on the first embodiment. On the basis, a step of determining a detection environment index threshold corresponding to the image to be detected is further added. Wherein the explanation of the same or corresponding terms as those of the above embodiments is not repeated herein. Referring to fig. 2, the target detection method provided in this embodiment includes:
s210, filtering the image to be detected based on the guidable filter according to the image to be detected and the set guide image to obtain a filtered image.
The set guide image is a preset image used as the guidance map in filtering, and its shooting scene is consistent with that of the image to be detected. Illustratively, the set guide image is at least one frame of the captured video corresponding to the image to be detected; this further improves the consistency of the shooting scene between the set guide image and the image to be detected. A guidable filter is a filter that performs filtering using a guidance map, for example a guided filter or a joint bilateral filter. It denoises while retaining edge information; its inputs are the image to be filtered and the guide image, and its output is a filtered image whose overall content is consistent with the input image but whose texture information follows the guide image.
Since this embodiment performs target detection according to shooting environment category, a guidance map and a guidable filter are introduced when judging the target shooting environment category of the image to be detected, so that the shooting environment is the only varying factor in the comparison. In a specific implementation, the image to be detected and the set guide image are fed into the guidable filter, and the filtered result of the image to be detected, i.e., the filtered image, is obtained.
The guidable filter in the above process may be a guided filter, whose algorithmic principle is as follows:
the output and input of the guided filter are assumed to satisfy the following linear relationship within a two-dimensional window:

$$q_i = a_k I_i + b_k,\ \forall i \in w_k; \qquad p_i = q_i + n_i \tag{1}$$

where q is the pixel value of the output image, I is the pixel value of the set guide image, p is the pixel value of the input image, n is the noise, i and k are pixel indices, and $a_k$ and $b_k$ are the coefficients of the linear function when the window center is at k (i.e., window $w_k$).

Taking the gradient on both sides of the first formula in equation (1) gives $\nabla q = a_k \nabla I$. This shows that when the set guide image I has a gradient, the output image q has a similar gradient, i.e., edge consistency between the filter output and the guide image is guaranteed.

Solving for the linear coefficients $a_k$ and $b_k$ in equation (1), i.e., minimizing the difference between the fitted output value q and the true value p, is converted into an optimization problem: minimizing the following cost function within the window:

$$E(a_k, b_k) = \sum_{i \in w_k} \left( (a_k I_i + b_k - p_i)^2 + \epsilon\, a_k^2 \right) \tag{2}$$

where $\epsilon$ is a regularization parameter that prevents $a_k$ from becoming too large and also improves numerical stability.
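A minimal self-contained sketch of the guided filter described above (NumPy only, grayscale images; the window radius `r` and regularizer `eps` correspond to $w_k$ and $\epsilon$; this is an illustrative implementation, not the patent's own code — production use would typically call an existing library routine):

```python
import numpy as np

def _box_mean(a: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1)x(2r+1) window, edges padded by replication."""
    k = 2 * r + 1
    p = np.pad(a, r, mode="edge").astype(np.float64)
    c = p.cumsum(0).cumsum(1)
    c = np.pad(c, ((1, 0), (1, 0)))      # prepend a zero row and column
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def guided_filter(p: np.ndarray, I: np.ndarray, r: int = 2,
                  eps: float = 1e-3) -> np.ndarray:
    """q = a*I + b with a, b fitted per window (He et al.'s guided filter)."""
    mean_I, mean_p = _box_mean(I, r), _box_mean(p, r)
    var_I = _box_mean(I * I, r) - mean_I * mean_I
    cov_Ip = _box_mean(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)           # linear coefficients of Eq. (1)
    b = mean_p - a * mean_I
    # Average a and b over all windows covering each pixel, then apply.
    return _box_mean(a, r) * I + _box_mean(b, r)

rng = np.random.default_rng(0)
img = rng.random((16, 16))
out = guided_filter(img, img, r=2, eps=0.1)   # self-guided smoothing
assert out.shape == img.shape
```

Using the image as its own guide, as above, yields edge-preserving smoothing; in this embodiment the guide would instead be the set guide image.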
S220, determining a peak signal-to-noise ratio corresponding to the image to be detected according to the image to be detected and the filtered image, and taking the peak signal-to-noise ratio as a current environment index value of a shooting environment corresponding to the image to be detected.
Peak signal-to-noise ratio (PSNR) is an objective method of evaluating image quality. After guided filtering, the quality of the image differs from that of the original; to measure the degree of change before and after filtering, i.e., between the image to be detected and the filtered image, the PSNR value is used as the metric. PSNR is calculated as:

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^n - 1)^2}{M}$$

where M is the mean square error between the image to be detected X and the filtered image Y:

$$M = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X(i,j) - Y(i,j) \right)^2$$

where H and W are the height and width of the image, and n is the number of bits per pixel, typically n = 8, i.e., 256 gray levels. PSNR is expressed in dB; the larger the value, the smaller the distortion.
Taking illumination intensity as the shooting environment, the peak signal-to-noise ratio can serve as the environment index, reflecting changes in the shooting environment more accurately than a brightness index. As described above, there is an edge difference between the images before and after guided filtering. Referring to fig. 3, an image captured under weak illumination has small gray-value differences between adjacent pixels and little edge information, so it changes little after guided filtering; an image captured under strong illumination has large gray-value differences between adjacent pixels and more edge information, so it changes substantially after guided filtering. Whether the image to be detected was captured in a strong or weak illumination environment can therefore be judged from the degree of change between the images before and after filtering. As the PSNR formula shows, the PSNR value represents the difference between two images, so the PSNR between the image to be detected and its guided-filtered result can serve as the current environment index value of the corresponding shooting environment.
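The PSNR computation described above can be sketched as follows (NumPy; `psnr` is a hypothetical helper name):

```python
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, n_bits: int = 8) -> float:
    """Peak signal-to-noise ratio between two same-sized images, in dB."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images: no distortion
    peak = 2 ** n_bits - 1           # 255 for 8-bit images
    return 10.0 * np.log10(peak ** 2 / mse)

x = np.zeros((4, 4), dtype=np.uint8)
y = np.full((4, 4), 255, dtype=np.uint8)
assert psnr(x, x) == float("inf")    # no change before/after filtering
assert round(psnr(x, y), 2) == 0.0   # maximum possible 8-bit distortion
```

In this scheme, a high PSNR between the image to be detected and its filtered version would indicate a small change (weak-illumination behavior), and a low PSNR a large change (strong-illumination behavior).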
S230, determining an associated environment index value of the shooting environment corresponding to each associated image according to the set number of associated images corresponding to the image to be detected.
The set number is a preset count of images, which can be set manually or determined automatically from the number of frames in the captured video corresponding to the image to be detected. An associated image is an image associated with the image to be detected, for example a frame of the captured video corresponding to it. The associated environment index value is the environment index value corresponding to an associated image.
To make the detection environment index threshold better suited to judging the shooting environment category at the time the image to be detected was captured, this embodiment updates the threshold in real time from the associated images of the image to be detected. In a specific implementation, a set number of associated images is acquired, and an associated environment index value is determined for each. The associated environment index value may be the brightness information of the associated image or its peak signal-to-noise ratio; to stay consistent with the current environment index value of the image to be detected, this embodiment uses the peak signal-to-noise ratio. The peak signal-to-noise ratio of each associated image can be determined as described in S210 to S220.
S240, determining a detection environment index threshold corresponding to the image to be detected based on a preset threshold segmentation algorithm according to each associated environment index value.
The preset threshold segmentation algorithm is a threshold segmentation algorithm configured in advance, for example the maximum inter-class variance method (OTSU), an adaptive threshold segmentation method, a maximum entropy threshold segmentation method, or an iterative threshold segmentation method. In this embodiment, the maximum inter-class variance method is taken as an example. OTSU is an algorithm for obtaining an optimal threshold; after classification using the threshold obtained by OTSU, the inter-class variance between the two classes is the largest.
In order to reduce the influence of human factors in the threshold determination process and improve the accuracy of the threshold, the method of automatically determining the threshold is adopted in the embodiment. In specific implementation, all the associated environment index values can be combined into an associated environment index value two-dimensional matrix, and then the associated environment index value two-dimensional matrix is processed by adopting a maximum inter-class variance method, so that the automatically determined segmentation environment index value can be obtained and used as a detection environment index threshold corresponding to the image to be detected.
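The maximum inter-class variance principle applied to the associated environment index values can be sketched as below. This is an illustrative brute-force Otsu over a one-dimensional list of index values (the patent arranges the values into a two-dimensional matrix before processing, but the selection principle is the same); all names are hypothetical.

```python
import numpy as np

def otsu_threshold(values):
    """Brute-force Otsu: pick the split that maximizes between-class variance.

    `values` plays the role of the associated environment index values (e.g.
    the PSNR of each associated image); the returned threshold serves as the
    detection environment index threshold.
    """
    v = np.sort(np.asarray(values, dtype=np.float64))
    best_t, best_var = v[0], -1.0
    for i in range(1, len(v)):
        t = (v[i - 1] + v[i]) / 2.0        # candidate split between two samples
        lo, hi = v[:i], v[i:]
        w0, w1 = len(lo) / len(v), len(hi) / len(v)
        var_between = w0 * w1 * (lo.mean() - hi.mean()) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two clusters of index values: the threshold lands between the clusters.
t = otsu_threshold([20.1, 21.3, 22.0, 33.5, 34.2, 35.0])
```

As new associated index values accumulate, rerunning this selection yields an updated threshold, which matches the staged updating described below.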
All the associated environment index values include, in addition to the associated environment index values of the associated images corresponding to the image to be detected, historical associated environment index values. In other words, in the process of determining the detection environment index threshold, the amount of associated environment index value data continuously increases, so the determined detection environment index threshold becomes more accurate, an abrupt change in an individual value does not affect the accuracy of the threshold, and the anti-interference capability is strong.
S250, determining a target shooting environment category corresponding to the image to be detected according to the current environment index value and the detection environment index threshold value.
S260, determining a target preset detection model corresponding to the target shooting environment category according to the target shooting environment category and the mapping relation between the shooting environment category and the preset detection model.
And S270, performing target detection on the image to be detected based on a target preset detection model to obtain each target in the image to be detected.
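Steps S250 to S270 can be sketched as a small scheduling routine. All names here are hypothetical, and the comparison direction (index at or below the threshold meaning strong illumination) is an assumption consistent with the PSNR discussion above, where stronger illumination produces a larger change after filtering and hence a lower PSNR.

```python
def detect_targets(image, current_index, threshold, models):
    """S250-S270: judge the shooting-environment category by comparing the
    current environment index value with the detection environment index
    threshold, look up the preset model mapped to that category, and run
    target detection with it."""
    category = "strong_illumination" if current_index <= threshold else "weak_illumination"
    model = models[category]  # mapping: shooting environment category -> preset detection model
    return model(image)

# Stand-ins for two trained preset detection models.
models = {
    "strong_illumination": lambda img: ["vehicle_a"],
    "weak_illumination": lambda img: ["vehicle_b"],
}
result = detect_targets(object(), current_index=25.0, threshold=28.0, models=models)
```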
According to the technical scheme of this embodiment, the image to be detected is filtered based on the guidable filter according to the image to be detected and the set guide image to obtain a filtered image, and the peak signal-to-noise ratio corresponding to the image to be detected is determined according to the image to be detected and the filtered image, to serve as the current environment index value of the shooting environment corresponding to the image to be detected. Determining the current environment index value from the peak signal-to-noise ratio improves the accuracy with which the shooting environment corresponding to the image to be detected is characterized, which in turn improves the accuracy of the subsequently scheduled preset detection model and thus the target detection precision. Further, an associated environment index value of the shooting environment corresponding to each associated image is determined according to a set number of associated images corresponding to the image to be detected, and the detection environment index threshold corresponding to the image to be detected is determined from each associated environment index value based on a preset threshold segmentation algorithm. This realizes staged updating of the detection environment index threshold, ensures to a greater extent that the threshold remains applicable to the image to be detected, and further improves the accuracy of the subsequently scheduled preset detection model and the target detection precision.
Example III
In this embodiment, a training step for the initial detection model is added on the basis of the second embodiment. Explanations of terms that are the same as or correspond to those of the above embodiments are not repeated here.
In this embodiment, the target is a vehicle, the shooting environment is illumination intensity, and the shooting environment categories are a strong illumination category and a weak illumination category.
The automatic detection of vehicles in an intelligent traffic system is the basis for various traffic services, and traffic videos shot in real time on roads are greatly affected by the shooting environment. A fast vehicle detection algorithm that adapts to different shooting environments is therefore desirable. The factor in the shooting environment that most affects video quality is the illumination intensity. When the shooting environment is illumination intensity, the shooting environment category is the shooting light intensity category, specifically divided into the strong illumination category and the weak illumination category. In this scene, the current environment index value, the associated environment index value and the detection environment index threshold are respectively the current light intensity index value, the associated light intensity index value and the detection light intensity index threshold. The target shooting environment category is the target shooting light intensity category.
Referring to fig. 4, the target detection method provided in this embodiment includes:
S310, filtering the image to be detected based on the guidable filter according to the image to be detected and the set guide image to obtain a filtered image.
S320, determining a peak signal-to-noise ratio corresponding to the image to be detected according to the image to be detected and the filtered image, and taking the peak signal-to-noise ratio as a current light intensity index value of the illumination intensity corresponding to the image to be detected.
S330, determining an associated light intensity index value of the illumination intensity corresponding to each associated image according to the set number of associated images corresponding to the image to be detected.
S340, determining a detection light intensity index threshold corresponding to the image to be detected based on a preset threshold segmentation algorithm according to each associated light intensity index value.
S350, determining the type of the target shooting light intensity corresponding to the image to be detected according to the current light intensity index value and the detection light intensity index threshold value.
S360, determining a target preset detection model corresponding to the target shooting light intensity category according to the target shooting light intensity category and the mapping relation between the shooting light intensity category and the preset detection model.
The preset detection models in the embodiment are obtained by retraining the models by using the training sample set, so that each preset detection model is more matched with the corresponding shooting environment, and the target detection precision of the model can be further improved.
Illustratively, each preset detection model is pre-trained by:
A. And determining a training environment index threshold corresponding to the training sample set according to each sample image in the training sample set of target detection.
Before training the model, images obtained by shooting targets in various shooting environments are firstly obtained to serve as a training sample set, wherein the training sample set can be a self-defined image set or a data set provided by each open source platform, such as a public vehicle detection data set. And then, determining a training environment index value corresponding to each sample image in the training sample set, and determining an environment index threshold value (namely a training environment index threshold value) corresponding to the training sample set by utilizing all the training environment index values. The training environment index threshold in this embodiment is the training light intensity index threshold. Specific implementation of this procedure can be seen from the description of determining the detection environment index threshold in the second embodiment.
B. and dividing the training sample set into training sample subsets corresponding to all shooting environments according to the training environment index value and the training environment index threshold value corresponding to each sample image.
C. and aiming at each shooting environment, performing model training on an initial detection model corresponding to the shooting environment by utilizing a training sample subset corresponding to the shooting environment to obtain a preset detection model corresponding to the shooting environment.
The initial detection model refers to an untrained target detection model, and model parameters of the initial detection model are initial model parameter values. The initial detection model corresponding to each shooting environment can be the same or different.
Each training sample subset is used to train the corresponding initial detection model, obtaining the preset detection model corresponding to that subset. Even if all the initial detection models are the same, the preset detection models finally obtained differ because the training data differ. For example, the preset detection model trained with the strong illumination training sample subset performs better at target detection on images captured in a strong illumination shooting environment, while the preset detection model trained with the weak illumination training sample subset performs better on images captured in a weak illumination shooting environment.
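Step B above (partitioning the training sample set by the training environment index threshold) can be sketched as follows. Names and the comparison direction are illustrative, mirroring the strong/weak illumination convention used earlier.

```python
def split_by_threshold(samples, index_values, threshold):
    """Step B: partition the training sample set into per-environment subsets
    using each sample's training environment index value and the training
    environment index threshold."""
    strong, weak = [], []
    for sample, value in zip(samples, index_values):
        (strong if value <= threshold else weak).append(sample)
    return {"strong_illumination": strong, "weak_illumination": weak}

# Each subset would then be used to train its own initial detection model (step C).
subsets = split_by_threshold(["img1", "img2", "img3"], [20.0, 30.0, 25.0], threshold=26.0)
```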
Illustratively, the initial detection model is the third version of the end-to-end single-network object detection model, YOLO V3. YOLO V3 is so far the object detection network with the best balance between running speed and detection accuracy. The shortcomings of earlier YOLO versions (fast but poor at detecting small objects, etc.) are remedied by fusing a number of advanced methods. YOLO divides an input image into S×S grid cells; if the center of an object's ground truth falls into a cell, that cell is responsible for detecting the object, and each cell predicts B bounding boxes with their confidence scores, and C class probabilities. The bbox information (x, y, w, h) is the offset of the object's center relative to the grid cell position together with the width and height, all normalized. The confidence reflects both whether an object is contained and the accuracy of its predicted position, defined as Pr(Object) × IOU^truth_pred, where Pr(Object) ∈ {0, 1}.
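The IOU^truth_pred term in the confidence above is the intersection over union between a predicted box and its ground-truth box. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```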
YOLO V3 predicts using a multi-scale fusion approach. It adopts FPN-like upsampling (upsample) and fusion (three scales are fused in the end; the sizes of the other two scales are 16×16 and 32×32 respectively), and performs detection on feature maps of multiple scales, so the detection effect on small targets is markedly improved. Although the 3 bounding boxes per grid cell in YOLO V3 appear fewer than the 5 per grid cell in YOLO V2, the total number of bounding boxes is much greater than before because YOLO V3 employs feature fusion at multiple scales.
And S370, carrying out vehicle detection on the image to be detected based on the target preset detection model, and obtaining each vehicle in the image to be detected.
Vehicle detection is performed on the two images to be detected shown in fig. 3, and the corresponding vehicle detection results are obtained, as shown in fig. 5. As can be seen from fig. 5, the target detection method of the embodiment of the invention, which combines the model scheduling algorithm with the YOLO V3 model, obtains detection results with high vehicle recognition accuracy in both strong and weak illumination shooting environments, thereby improving vehicle recognition accuracy, reducing the computational complexity of the vehicle detection process and achieving real-time detection.
According to the technical scheme of the embodiment, a training environment index threshold corresponding to a training sample set is determined according to each sample image in the training sample set detected by the target; dividing a training sample set into training sample subsets corresponding to shooting environments according to training environment index values and training environment index thresholds corresponding to each sample image; and aiming at each shooting environment, performing model training on an initial detection model corresponding to the shooting environment by utilizing a training sample subset corresponding to the shooting environment to obtain a preset detection model corresponding to the shooting environment. Training of preset detection models corresponding to different shooting environment categories is achieved, so that each preset detection model is matched with the corresponding shooting environment, and the target detection accuracy of the model can be further improved.
Example IV
The present embodiment provides an object detection apparatus, referring to fig. 6, which specifically includes:
The current environment index value determining module 810 is configured to determine a current environment index value of a shooting environment corresponding to the image to be detected according to the image to be detected;
The target shooting environment category determining module 820 is configured to determine a target shooting environment category corresponding to the image to be detected according to the current environment index value and the detection environment index threshold value;
The target preset detection model determining module 830 is configured to determine a target preset detection model corresponding to the target shooting environment category according to the target shooting environment category and a mapping relationship between the shooting environment category and the preset detection model;
the target detection module 840 is configured to perform target detection on the image to be detected based on a target preset detection model, so as to obtain each target in the image to be detected.
Optionally, the current environmental indicator value determining module 810 is specifically configured to:
And determining the brightness information of the image to be detected according to each brightness value corresponding to the brightness channel of the image to be detected, and taking the brightness information as the current environment index value of the shooting environment corresponding to the image to be detected.
Optionally, the current environmental indicator value determining module 810 is specifically configured to:
According to the image to be detected and the set guide image, filtering the image to be detected based on a guidable filter to obtain a filtered image;
And determining the peak signal-to-noise ratio corresponding to the image to be detected as the current environment index value of the shooting environment corresponding to the image to be detected according to the image to be detected and the filtering image.
Optionally, on the basis of the above apparatus, the apparatus further includes a detection environment index threshold determining module, configured to:
before determining a target shooting environment category corresponding to an image to be detected according to the current environment index value and the detection environment index threshold value, determining an associated environment index value of a shooting environment corresponding to each associated image according to a set number of associated images corresponding to the image to be detected;
and determining a detection environment index threshold corresponding to the image to be detected based on a preset threshold segmentation algorithm according to each associated environment index value.
Optionally, on the basis of the device, the device further comprises a model training module, configured to train each preset detection model in advance by the following manner:
Determining a training environment index threshold corresponding to the training sample set according to each sample image in the training sample set of target detection;
Dividing a training sample set into training sample subsets corresponding to shooting environments according to training environment index values and training environment index thresholds corresponding to each sample image;
And aiming at each shooting environment, performing model training on an initial detection model corresponding to the shooting environment by utilizing a training sample subset corresponding to the shooting environment to obtain a preset detection model corresponding to the shooting environment.
Further, the initial detection model is a third version of an end-to-end single network target detection model.
Optionally, the target is a vehicle, the shooting environment is illumination intensity, and the shooting environment categories are a strong illumination category and a weak illumination category.
According to the target detection device provided by the fourth embodiment of the invention, a suitable target preset detection model is adaptively scheduled according to the current environment index value of the image to be detected, so that detection results with high target detection precision can be obtained in different shooting environments. This solves the problems of low target detection precision across shooting environments and of detection delay caused by low algorithm generalization capability and high algorithm complexity, improving target detection accuracy and reducing algorithm complexity so that targets are detected more rapidly.
The object detection device provided by the embodiment of the invention can execute the object detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the above embodiment of the object detection apparatus, each unit and module included are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Example five
Referring to fig. 7, the present embodiment provides an electronic device, which includes: one or more processors 920; the storage device 910 is configured to store one or more programs, where the one or more programs are executed by the one or more processors 920, so that the one or more processors 920 implement the target detection method provided by the embodiment of the present invention, and the method includes:
determining a current environment index value of a shooting environment corresponding to the image to be detected according to the image to be detected;
Determining a target shooting environment category corresponding to the image to be detected according to the current environment index value and the detection environment index threshold value;
determining a target preset detection model corresponding to the target shooting environment category according to the target shooting environment category and the mapping relation between the shooting environment category and the preset detection model;
and carrying out target detection on the image to be detected based on a target preset detection model to obtain each target in the image to be detected.
Of course, those skilled in the art will appreciate that the processor 920 may also implement the technical solution of the target detection method provided in any embodiment of the present invention.
The electronic device shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present invention. As shown in fig. 7, the electronic device includes a processor 920, a storage device 910, an input device 930, and an output device 940; the number of processors 920 in the electronic device may be one or more, one processor 920 being illustrated in fig. 7; the processor 920, the storage device 910, the input device 930, and the output device 940 in the electronic device may be connected by a bus or other means, for example, by a bus 950 in fig. 7.
The storage device 910 is used as a computer readable storage medium, and may be used to store a software program, a computer executable program, and a module, such as program instructions/modules corresponding to the target detection method in the embodiment of the present invention (for example, a current environmental index value determining module, a target shooting environment category determining module, a target preset detection model determining module, and a target detection module in the target detection device).
The storage device 910 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for a function; the storage data area may store data created according to the use of the terminal, etc. In addition, the storage 910 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage 910 may further include memory remotely located relative to the processor 920, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 930 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 940 may include a display device such as a display screen.
Example six
The present embodiment provides a storage medium containing computer executable instructions which, when executed by a computer processor, are configured to perform a method of object detection, the method comprising:
determining a current environment index value of a shooting environment corresponding to the image to be detected according to the image to be detected;
Determining a target shooting environment category corresponding to the image to be detected according to the current environment index value and the detection environment index threshold value;
determining a target preset detection model corresponding to the target shooting environment category according to the target shooting environment category and the mapping relation between the shooting environment category and the preset detection model;
and carrying out target detection on the image to be detected based on a target preset detection model to obtain each target in the image to be detected.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the above method operations, and may also perform the related operations in the object detection method provided in any embodiment of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk, or an optical disk of a computer, etc., and include several instructions for causing an electronic device (which may be a personal computer, a server, or a network device, etc.) to execute the object detection method provided by the embodiments of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (9)

1. A method of detecting an object, comprising:
Determining a current environment index value of a shooting environment corresponding to an image to be detected according to the image to be detected; the determining the current environment index value of the shooting environment corresponding to the image to be detected according to the image to be detected comprises the following steps: according to the image to be detected and the set guide image, filtering the image to be detected based on a guidable filter to obtain a filtered image; determining a peak signal-to-noise ratio corresponding to the image to be detected as a current environment index value of a shooting environment corresponding to the image to be detected according to the image to be detected and the filtering image;
Determining a target shooting environment category corresponding to the image to be detected according to the current environment index value and the detection environment index threshold value;
determining a target preset detection model corresponding to the target shooting environment category according to the target shooting environment category and the mapping relation between the shooting environment category and the preset detection model;
And carrying out target detection on the image to be detected based on the target preset detection model to obtain each target in the image to be detected.
2. The method according to claim 1, wherein determining the current environment index value of the shooting environment corresponding to the image to be detected according to the image to be detected comprises:
And determining the brightness information of the image to be detected according to each brightness value corresponding to the brightness channel of the image to be detected, and taking the brightness information as the current environment index value of the shooting environment corresponding to the image to be detected.
3. The method according to claim 1, further comprising, before the determining the target capturing environment category corresponding to the image to be detected according to the current environment index value and the detection environment index threshold value:
determining an associated environment index value of a shooting environment corresponding to each associated image according to a set number of associated images corresponding to the image to be detected;
And determining the detection environment index threshold corresponding to the image to be detected based on a preset threshold segmentation algorithm according to each associated environment index value.
4. The method of claim 1, wherein each of the predetermined detection models is pre-trained by:
determining a training environment index threshold corresponding to a training sample set according to each sample image in the training sample set of target detection;
dividing the training sample set into training sample subsets corresponding to shooting environments according to training environment index values and training environment index thresholds corresponding to each sample image;
And aiming at each shooting environment, performing model training on an initial detection model corresponding to the shooting environment by utilizing a training sample subset corresponding to the shooting environment to obtain a preset detection model corresponding to the shooting environment.
5. The method of claim 4, wherein the initial detection model is a third version of an end-to-end single network target detection model.
6. The method of any one of claims 1-5, wherein the target is a vehicle, the capture environment is an illumination intensity, and the capture environment categories are a high illumination category and a low illumination category.
7. An object detection apparatus, comprising:
The current environment index value determining module is used for determining a current environment index value of a shooting environment corresponding to the image to be detected according to the image to be detected; the current environment index value determining module is specifically configured to: according to the image to be detected and the set guide image, filtering the image to be detected based on a guidable filter to obtain a filtered image; determining a peak signal-to-noise ratio corresponding to the image to be detected as a current environment index value of a shooting environment corresponding to the image to be detected according to the image to be detected and the filtering image;
The target shooting environment category determining module is used for determining a target shooting environment category corresponding to the image to be detected according to the current environment index value and the detection environment index threshold value;
The target preset detection model determining module is used for determining a target preset detection model corresponding to the target shooting environment category according to the target shooting environment category and the mapping relation between the shooting environment category and the preset detection model;
And the target detection module is used for carrying out target detection on the image to be detected based on the target preset detection model to obtain each target in the image to be detected.
8. An electronic device, the electronic device comprising:
one or more processors;
storage means for storing one or more programs,
When executed by the one or more processors, causes the one or more processors to implement the target detection method of any of claims 1-6.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the object detection method according to any one of claims 1-6.
CN201910578134.4A 2019-06-28 2019-06-28 Target detection method, device, equipment and storage medium Active CN112149476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910578134.4A CN112149476B (en) 2019-06-28 2019-06-28 Target detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112149476A CN112149476A (en) 2020-12-29
CN112149476B true CN112149476B (en) 2024-06-21

Family

ID=73891107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910578134.4A Active CN112149476B (en) 2019-06-28 2019-06-28 Target detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112149476B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926476B (en) * 2021-03-08 2024-06-18 京东鲲鹏(江苏)科技有限公司 Vehicle identification method, device and storage medium
CN113077422B (en) * 2021-03-22 2023-08-15 浙江大华技术股份有限公司 Foggy image detection method, model training method and device
CN113933294B (en) * 2021-11-08 2023-07-18 中国联合网络通信集团有限公司 Concentration detection method and device
CN114241430A (en) * 2021-12-22 2022-03-25 杭州海康威视***技术有限公司 Event detection method, device and system, electronic equipment and storage medium
CN117649367B (en) * 2024-01-30 2024-04-30 广州敏行数字科技有限公司 Image orientation correction method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108810413A (en) * 2018-06-15 2018-11-13 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment, computer readable storage medium
CN109858381A (en) * 2019-01-04 2019-06-07 深圳壹账通智能科技有限公司 Liveness detection method, device, computer equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902951A (en) * 2012-06-29 2013-01-30 陕西省交通规划设计研究院 System and method for vehicle target location and event detection on basis of high-definition video monitoring images
CN103927734B (en) * 2013-01-11 2016-12-28 华中科技大学 No-reference quality evaluation method for blurred images
JP5820843B2 (en) * 2013-05-29 2015-11-24 富士重工業株式会社 Ambient environment judgment device
US9460522B2 (en) * 2014-10-29 2016-10-04 Behavioral Recognition Systems, Inc. Incremental update for background model thresholds
CN105791814A (en) * 2016-03-09 2016-07-20 中国科学院自动化研究所 Image-processing-technology-based monitoring video quality detection method and apparatus
CN106205488B (en) * 2016-09-21 2019-01-15 深圳市华星光电技术有限公司 Extend the method and display equipment in display of organic electroluminescence service life
CN109871730A (en) * 2017-12-05 2019-06-11 杭州海康威视数字技术股份有限公司 Target recognition method, device and monitoring device
CN109325418A (en) * 2018-08-23 2019-02-12 华南理工大学 Pedestrian recognition method in road traffic environments based on improved YOLOv3

Similar Documents

Publication Publication Date Title
CN112149476B (en) Target detection method, device, equipment and storage medium
CN110276767B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110378945B (en) Depth map processing method and device and electronic equipment
CN107153817B (en) Pedestrian re-identification data labeling method and device
CN111104943B (en) Color image region-of-interest extraction method based on decision-level fusion
US11700457B2 (en) Flicker mitigation via image signal processing
CN110580428A (en) image processing method, image processing device, computer-readable storage medium and electronic equipment
CN109413411B (en) Black screen identification method and device of monitoring line and server
CN110569782A (en) Target detection method based on deep learning
CN111738064A (en) Haze concentration identification method for haze image
CN107563299B (en) Pedestrian detection method using RecNN to fuse context information
CN111695373B (en) Zebra stripes positioning method, system, medium and equipment
WO2022116104A1 (en) Image processing method and apparatus, and device and storage medium
CN108765406A (en) Snow mountain detection method based on infrared remote sensing images
CN112949578B (en) Vehicle lamp state identification method, device, equipment and storage medium
CN110969164A (en) Low-illumination imaging license plate recognition method and device based on deep learning end-to-end
CN112686252A (en) License plate detection method and device
CN112330544A (en) Image smear processing method, device, equipment and medium
Shu et al. Small moving vehicle detection via local enhancement fusion for satellite video
CN111027564A (en) Low-illumination imaging license plate recognition method and device based on deep learning integration
CN110751667A (en) Method for detecting infrared dim small target under complex background based on human visual system
Lashkov et al. Edge-computing-facilitated nighttime vehicle detection investigations with CLAHE-enhanced images
CN116311212B (en) Ship number identification method and device based on high-speed camera and in motion state
CN110633705A (en) Low-illumination imaging license plate recognition method and device
CN112785550B (en) Image quality value determining method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Information Technology Co.,Ltd.

Address before: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Shuke Haiyi Information Technology Co.,Ltd.

Address after: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Shuke Haiyi Information Technology Co.,Ltd.

Address before: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: BEIJING HAIYI TONGZHAN INFORMATION TECHNOLOGY Co.,Ltd.

SE01 Entry into force of request for substantive examination
GR01 Patent grant