CN112949389A - Haze image target detection method based on improved target detection network - Google Patents
- Publication number
- CN112949389A (application CN202110115251.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- haze
- network
- target
- target detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a haze image target detection method based on an improved target detection network. The method first applies a fast defogging algorithm based on optimized contrast enhancement, which removes haze effectively in real time and enhances the contrast information of the image; the defogged image is then sent into an improved YOLOv4 target detection network for detection, and target information under haze weather conditions is output. Meanwhile, to improve the feature extraction capability of the YOLOv4 network, the invention replaces the residual blocks in the YOLOv4 backbone network with dense link blocks; compared with residual blocks, dense link blocks improve the feature expression capability of the network and thus its feature extraction capability. Finally, experiments prove that the method improves target detection capability under haze weather conditions.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a haze image target detection method.
Background
Target detection and identification refers to locating and identifying a target of interest in an image by technical means. Currently there are two main types of target detection methods: sliding-window-based methods and region-proposal-based methods. The former traverses the whole image with a window, extracts image features inside the window at each position, and finally classifies the features with a classifier. The latter extracts candidate target regions using saliency detection or an objectness measure, avoiding the heavy computation of a sliding window; however, since part of a target may be framed or adjacent targets may not be distinguished, the detection result can be affected, and many methods have been proposed to alleviate this problem. For example, the literature proposes region extraction methods with stronger objectness, which reduce the number of candidate windows while improving their quality.
Although deep learning networks have developed greatly in the field of target detection, target identification is basically performed on clear images, and identification accuracy drops sharply when the target is under complex conditions such as haze.
In reality, rain, snow, fog, dust and the like are often encountered. In severe weather, the performance of sensors for city security, highway monitoring and reconnaissance drops rapidly or even fails, so that the images acquired by a computer system suffer degradation such as loss of detail, increased noise and reduced contrast, which greatly affects post-processing such as image feature extraction, detection and segmentation, and target identification. Fog, for example, is the most common severe weather affecting visual performance, and the visual distortion it causes degrades many outdoor vision systems and reduces target recognition accuracy.
Complicated and changeable weather conditions such as haze greatly affect the performance of monitoring and reconnaissance sensors: acquired images show increased noise, lost detail and reduced contrast. Since target identification and detection serve as a preprocessing stage of most computer vision applications, their recognition performance is inevitably disturbed by haze of varying degrees. A method that quickly and accurately removes haze interference from the input image and works stably under various complex conditions is therefore extremely valuable for improving the accuracy of target identification and detection. Traditional defogging algorithms suffer from problems such as halo artifacts in the defogged picture, serious color distortion, haze being removed only locally while thick fog remains in some places, and difficulty in guaranteeing real-time performance.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a haze image target detection method based on an improved target detection network. The method first applies a fast defogging algorithm based on optimized contrast enhancement, which removes haze effectively in real time and enhances the contrast information of the image; the defogged image is then sent into an improved YOLOv4 target detection network for detection, and target information under haze weather conditions is output. Meanwhile, to improve the feature extraction capability of the YOLOv4 network, the invention replaces the residual blocks in the YOLOv4 backbone network with dense link blocks, which improve the feature expression capability of the network and thus its feature extraction capability. Experiments prove that the method improves target detection capability under haze weather conditions.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: acquiring a haze weather image containing a target;
step 2: defining a haze degradation model as shown in formula (1):
I(p)=t(p)J(p)+(1-t(p))A (1)
wherein J(p) = (Jr(p), Jg(p), Jb(p))^T represents the image after haze removal, with Jr(p), Jg(p), Jb(p) the pixel values of the haze-free image in the r, g and b channels respectively; I(p) = (Ir(p), Ig(p), Ib(p))^T represents the image containing haze, with Ir(p), Ig(p), Ib(p) the pixel values of the hazy image in the r, g and b channels respectively; p denotes a pixel position; A represents the atmospheric light value of the image; and t(p) ∈ [0, 1] represents the transmittance;
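For illustration only (this sketch is not part of the patent; the helper name `add_haze` is an assumption), the degradation model of formula (1) can be applied directly in NumPy to synthesize a hazy block from a clear one:

```python
import numpy as np

def add_haze(J, t, A):
    """Apply the haze degradation model of formula (1):
    I(p) = t(p) * J(p) + (1 - t(p)) * A, with a scalar transmission t."""
    J = np.asarray(J, dtype=np.float64)
    A = np.asarray(A, dtype=np.float64)
    return t * J + (1.0 - t) * A

# A uniform mid-grey block drifts toward the (white) atmospheric light:
J = np.full((4, 4, 3), 100.0)
I_hazy = add_haze(J, t=0.5, A=(255.0, 255.0, 255.0))
# every pixel becomes 0.5 * 100 + 0.5 * 255 = 177.5
```

As t approaches 0 the observation is dominated by the atmospheric light A, which is why heavy haze washes out contrast.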
step 3: determine the atmospheric light value of the haze weather image using a hierarchical search method based on quadtree subdivision;
dividing a haze weather image into four rectangular areas, subtracting the standard deviation of pixel values in each rectangular area from the average pixel value of each rectangular area, then selecting the rectangular area with the largest calculation result, further dividing the rectangular area into four smaller rectangular areas, and repeating the process until the size of the selected rectangular area is smaller than a pre-specified threshold;
in the finally selected rectangular area, calculate the distance between each pixel and pure white, i.e. ||(Ir(p), Ig(p), Ib(p)) - (255, 255, 255)||; the pixel with the minimum distance to white is found, and its pixel value is taken as the atmospheric light value of the haze weather image;
step 4: calculate the optimal transmittance based on image blocks;
step 4-1: divide the haze weather image into image blocks of size b × b, assuming that the scene depth within each image block is consistent; with one transmittance t per image block, the haze degradation model for each image block is written as:
J(p) = (I(p) - A)/t + A (2)
step 4-2: the mean square error (MSE) contrast is adopted to represent the contrast of the haze-removed image block, as in formula (3):
CMSE = ∑p∈B (Jc(p) - J̄c)² / N (3)
where c ∈ (r, g, b) represents the color channel index, Jc(p) represents the image block pixel values after haze removal, J̄c is the mean of Jc(p), and N is the number of pixels of the image block; substituting formula (2), CMSE is rewritten as:
CMSE = ∑p∈B (Ic(p) - Īc)² / (t²N) (4)
where Ic(p) represents the haze weather image block pixel values and Īc is the mean of Ic(p);
step 4-3: obtain the optimal transmittance t*;
define the contrast cost Econtrast as the negative sum of the MSE contrasts of the three color channels:
Econtrast = -∑c∈(r,g,b)∑p∈B (Ic(p) - Īc)² / (t²N) (5)
define the information loss cost Eloss:
Eloss = ∑c∈(r,g,b)∑p∈B {(min{0, Jc(p)})² + (max{0, Jc(p) - 255})²} (6)
where min{0, Jc(p)} and max{0, Jc(p) - 255} represent the pixel values in the output defogged image that are truncated by pixel underflow and overflow, respectively;
minimize the overall cost function E to obtain the optimal transmission t*; the overall cost function E is expressed as:
E = Econtrast + λLEloss (7)
where λL is a weighting parameter that controls the relative importance of the contrast cost and the information loss cost; by adjusting λL, a balance is struck between contrast enhancement and information loss;
step 5: after obtaining the atmospheric light value and the optimal transmittance of the haze weather image, recover the haze-free image according to formula (2);
step 6: label the target positions and categories in the haze-removed images, and make a target data set;
step 7: perform data enhancement on the target data set; the enhancement methods include, but are not limited to, rotation, translation, gamma transformation, MixUp, Mosaic and blurring, giving a new target data set;
step 8: replace the residual blocks in the YOLOv4 network with dense link blocks to obtain an improved YOLOv4 network;
a dense link block is made up of a number of 3 × 3 convolutional layers followed by 1 × 1 convolutional layers; each 3 × 3 convolutional layer is directly connected to all the following 1 × 1 convolutional layers, and the output of the dense link block is represented as:
Yi = ∑k WkXk (8)
wherein Xk represents the feature map of the k-th 3 × 3 convolutional layer preceding the 1 × 1 convolutional layer, Wk represents the weight assigned to each feature map connected to the 1 × 1 convolutional layer, and Yi represents the output of the i-th 1 × 1 convolutional layer in the dense link block;
step 9: train the improved YOLOv4 network with the images in the new target data set as samples to obtain a target detection model for haze-removed images;
step 10: steps 2 to 9 constitute the haze image target detection model; input the haze image to be detected into the haze image target detection model, and output the detected targets.
Preferably, b is 32.
The invention has the following beneficial effects:
1. Compared with the traditional atmospheric light calculation method, the method effectively avoids the inaccurate selection of atmospheric light caused by objects that are brighter than the atmospheric light.
2. The method establishes a cost function consisting of a contrast term and an information loss term. By minimizing this cost function, the defogging algorithm enhances contrast while minimizing the information loss caused by pixel value truncation, so the information is preserved in an optimal way; unlike traditional methods, it explicitly accounts for the increased information loss that accompanies contrast enhancement.
3. The method uses the dense link mode to replace a residual link mode in the YOLOV4 network, effectively improves the feature expression capability of the target detection network, and improves the detection capability of the network on complex small targets.
Drawings
FIG. 1 is a flow chart of a haze removal method according to the present invention.
FIG. 2 is a schematic diagram of hierarchical search based on quadtree subdivision according to the method of the present invention.
Fig. 3 is a diagram of the mapping between input (fogged) image pixel values and output (dehazed) image pixel values.
Fig. 4 shows a set of image defogging results under different transmittances according to the embodiment of the invention; from left to right are the original image and the results with transmission values t = 0.1, t = 0.3, t = 0.5 and t = 0.7.
Fig. 5 is a YOLOV4 network residual block diagram.
FIG. 6 is a dense link block diagram.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
In order to improve the accuracy of target detection and identification under the haze weather condition, the main problems to be solved include the following two points: (1) how to improve the image contrast on the basis of minimum image information loss and effectively remove haze; (2) how to effectively improve the accuracy of target detection and identification. The measures adopted by the invention are as follows: (1) establishing a cost function consisting of a contrast item and an information loss item, and reducing the information loss caused by pixel value truncation to the maximum extent while enhancing the contrast by minimizing the cost function through a defogging algorithm, so that the information is stored in an optimal mode; (2) the dense link mode is used for replacing a residual link mode in the YOLOV4 network, the extraction capability of the network on the detail features of the target is effectively improved, the detection capability of the network on the complex small target is effectively improved, and therefore the detection precision of the overall target is effectively improved.
According to the method, the haze image is first defogged, the YOLOv4 residual blocks are replaced by dense links so that fine-grained target features can be better extracted, and finally target identification is performed on the defogged image, realizing the target identification task under haze conditions.
A haze image target detection method based on an improved target detection network comprises the following steps:
step 1: acquiring a haze weather image containing a target;
step 2: defining a haze degradation model as shown in formula (1):
I(p) = t(p)J(p) + (1 - t(p))A (1)
wherein J(p) = (Jr(p), Jg(p), Jb(p))^T represents the image after haze removal, with Jr(p), Jg(p), Jb(p) the pixel values of the haze-free image in the r, g and b channels respectively; I(p) = (Ir(p), Ig(p), Ib(p))^T represents the image containing haze, with Ir(p), Ig(p), Ib(p) the pixel values of the hazy image in the r, g and b channels respectively; p denotes a pixel position; A represents the atmospheric light value of the image; and t(p) ∈ [0, 1] represents the transmittance;
step 3: as shown in fig. 2, determine the atmospheric light value of the haze weather image using a hierarchical search method based on quadtree subdivision;
The atmospheric light in a haze degradation model is generally taken to be the brightest color in the image, because a large amount of haze produces bright colors. In this approach, however, objects brighter than the atmospheric light cause inaccurate selection. To estimate the atmospheric light more reliably, the method exploits the fact that in foggy areas, such as the sky, the variance of the pixel values is typically low.
Dividing a haze weather image into four rectangular areas, subtracting the standard deviation of pixel values in each rectangular area from the average pixel value of each rectangular area, then selecting the rectangular area with the largest calculation result, further dividing the rectangular area into four smaller rectangular areas, and repeating the process until the size of the selected rectangular area is smaller than a pre-specified threshold;
in the finally selected rectangular area, calculate the distance between each pixel and pure white, i.e. ||(Ir(p), Ig(p), Ib(p)) - (255, 255, 255)||; the pixel with the minimum distance to white is found, and its pixel value is taken as the atmospheric light value of the haze weather image, so that atmospheric light as bright as possible is selected. In the traditional method the atmospheric light is taken as the brightest color in the image, yet an object brighter than the atmospheric light may make that selection inaccurate; in contrast, the atmospheric light calculation method provided by the invention effectively avoids this problem;
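The quadtree search of step 3 can be sketched as follows (a minimal NumPy illustration, not the patent's implementation; the stopping threshold and tie-breaking are assumptions):

```python
import numpy as np

def estimate_atmospheric_light(img, min_size=32):
    """Hierarchical quadtree search for the atmospheric light.

    Repeatedly split the current region into four rectangles, keep the one
    whose (mean - standard deviation) of pixel values is largest, and stop
    when the region falls below min_size. Return the pixel in the final
    region closest to pure white (255, 255, 255)."""
    region = np.asarray(img, dtype=np.float64)
    while min(region.shape[0], region.shape[1]) > min_size:
        h, w = region.shape[0] // 2, region.shape[1] // 2
        quads = [region[:h, :w], region[:h, w:], region[h:, :w], region[h:, w:]]
        scores = [q.mean() - q.std() for q in quads]
        region = quads[int(np.argmax(scores))]
    flat = region.reshape(-1, 3)
    dist = np.linalg.norm(flat - np.array([255.0, 255.0, 255.0]), axis=1)
    return flat[int(np.argmin(dist))]

# A bright, low-variance corner (a "sky" patch) wins the search:
img = np.full((64, 64, 3), 50.0)
img[:32, :32] = 200.0
A = estimate_atmospheric_light(img)
```

The mean-minus-standard-deviation score is what steers the search away from bright but high-variance objects toward flat hazy regions.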
step 4: calculate the optimal transmittance based on image blocks;
step 4-1: divide the haze weather image into image blocks of size b × b, assuming that the scene depth within each image block is consistent; with one transmittance t per image block, the haze degradation model for each image block is written as:
J(p) = (I(p) - A)/t + A (2)
After estimation of the atmospheric light A, J(p) depends on the choice of the transmission t(p). Haze generally lowers contrast, and the contrast of the dehazed block increases as the estimated t(p) decreases; it is therefore desirable to estimate the optimal t(p) so that the dehazed block has maximum contrast;
step 4-2: the mean square error (MSE) contrast is adopted to represent the contrast of the haze-removed image block, as in formula (3):
CMSE = ∑p∈B (Jc(p) - J̄c)² / N (3)
where c ∈ (r, g, b) represents the color channel index, Jc(p) represents the image block pixel values after haze removal, J̄c is the mean of Jc(p), and N is the number of pixels of the image block; substituting formula (2), CMSE is rewritten as:
CMSE = ∑p∈B (Ic(p) - Īc)² / (t²N) (4)
where Ic(p) represents the haze weather image block pixel values and Īc is the mean of Ic(p). The MSE contrast is a decreasing function of t(p). Since contrast is inversely related to the transmission t(p), a smaller value of t(p) can be selected to improve the contrast of the recovered block. However, as t(p) becomes smaller, more pixel values are truncated in the restored image; as shown in fig. 4, which displays from left to right the original image and the defogged images with transmission values t = 0.1, t = 0.3, t = 0.5 and t = 0.7, the smaller t is, the more serious the pixel-level truncation. Enhancing contrast and reducing information loss are therefore contradictory;
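The 1/t² dependence in formula (4) can be checked numerically; the following sketch (illustrative only, with an assumed helper name) sums the per-channel MSE contrast of a block:

```python
import numpy as np

def mse_contrast(I_block, t):
    """MSE contrast of the dehazed block computed from the hazy block via
    formula (4): C_MSE = sum_p (Ic(p) - mean)^2 / (t^2 * N), summed over
    the three color channels."""
    I_block = np.asarray(I_block, dtype=np.float64)
    N = I_block.shape[0] * I_block.shape[1]
    total = 0.0
    for c in range(3):
        ch = I_block[..., c]
        total += np.sum((ch - ch.mean()) ** 2) / (t ** 2 * N)
    return total

# Halving the transmission quadruples the contrast measure:
rng = np.random.default_rng(0)
block = rng.uniform(0.0, 255.0, size=(32, 32, 3))
ratio = mse_contrast(block, 0.25) / mse_contrast(block, 0.5)
# ratio == 4 (up to floating point)
```

This is exactly why an unconstrained maximization of contrast would always drive t toward 0, motivating the information-loss term below.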
step 4-3: obtain the optimal transmittance t*;
define the contrast cost Econtrast as the negative sum of the MSE contrasts of the three color channels of each block:
Econtrast = -∑c∈(r,g,b)∑p∈B (Ic(p) - Īc)² / (t²N) (5)
minimizing Econtrast maximizes the MSE contrast;
define the information loss cost Eloss:
Eloss = ∑c∈(r,g,b)∑p∈B {(min{0, Jc(p)})² + (max{0, Jc(p) - 255})²} (6)
where min{0, Jc(p)} and max{0, Jc(p) - 255} represent the pixel values in the output defogged image that are truncated by pixel underflow and overflow, respectively, as shown in fig. 3;
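Formula (6) can be sketched directly (illustrative NumPy only; `information_loss` is an assumed name):

```python
import numpy as np

def information_loss(I_block, t, A):
    """Information-loss cost of formula (6): squared magnitudes of the
    dehazed pixel values truncated by underflow (< 0) or overflow (> 255)."""
    I_block = np.asarray(I_block, dtype=np.float64)
    A = np.asarray(A, dtype=np.float64)
    J = (I_block - A) / t + A          # formula (2)
    under = np.minimum(0.0, J)         # min{0, Jc(p)}
    over = np.maximum(0.0, J - 255.0)  # max{0, Jc(p) - 255}
    return float(np.sum(under ** 2) + np.sum(over ** 2))

# t = 1 leaves the block unchanged, so nothing is truncated:
no_loss = information_loss(np.full((8, 8, 3), 120.0), 1.0, (255.0, 255.0, 255.0))
# an aggressive small t pushes dark pixels far below 0:
big_loss = information_loss(np.full((8, 8, 3), 120.0), 0.1, (255.0, 255.0, 255.0))
```

The squared penalty grows with how far a value overshoots the [0, 255] range, not just with how many pixels are clipped.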
minimize the overall cost function E to obtain the optimal transmission t*; the overall cost function E is expressed as:
E = Econtrast + λLEloss (7)
where λL is a weighting parameter that controls the relative importance of the contrast cost and the information loss cost; by adjusting λL, a balance is struck between contrast enhancement and information loss;
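Since t is a single scalar per block, formula (7) can be minimized by a simple grid search. The sketch below is an illustration under stated assumptions (the candidate grid and λL = 5.0 are not specified by the patent):

```python
import numpy as np

def optimal_transmission(I_block, A, lam=5.0, t_grid=None):
    """Minimize E = Econtrast + lam * Eloss (formula (7)) over a grid of
    candidate transmissions. Grid and lam are illustrative assumptions."""
    if t_grid is None:
        t_grid = np.linspace(0.1, 1.0, 91)
    I = np.asarray(I_block, dtype=np.float64)
    A = np.asarray(A, dtype=np.float64)
    N = I.shape[0] * I.shape[1]
    best_t, best_E = 1.0, np.inf
    for t in t_grid:
        J = (I - A) / t + A                                   # formula (2)
        e_c = -sum(np.sum((I[..., c] - I[..., c].mean()) ** 2)
                   for c in range(3)) / (t * t * N)           # formula (5)
        e_l = (np.sum(np.minimum(0.0, J) ** 2)
               + np.sum(np.maximum(0.0, J - 255.0) ** 2))     # formula (6)
        E = e_c + lam * e_l
        if E < best_E:
            best_t, best_E = float(t), E
    return best_t

# For a flat block (zero contrast), the loss term alone picks the smallest
# t on the grid that keeps all dehazed values inside [0, 255]:
t_star = optimal_transmission(np.full((8, 8, 3), 200.0), (255.0, 255.0, 255.0))
```

The contrast term pulls t downward while the loss term pushes it back up, which is the balance formula (7) encodes.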
and 5: as shown in fig. 1, after obtaining the atmospheric light value and the optimal transmittance of the haze weather image, the haze image is restored according to the formula (2);
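Step 5's recovery via formula (2) can be sketched as follows (illustrative only; the clipping to [0, 255] for display is an assumption):

```python
import numpy as np

def dehaze_block(I_block, t, A):
    """Recover a block via formula (2), J(p) = (I(p) - A)/t + A, then clip
    the result to the displayable range [0, 255]."""
    I_block = np.asarray(I_block, dtype=np.float64)
    A = np.asarray(A, dtype=np.float64)
    return np.clip((I_block - A) / t + A, 0.0, 255.0)

# Inverts the synthesis I = t*J + (1-t)*A: a hazy value of 177.5 with
# t = 0.5 and white atmospheric light recovers the original value 100:
J_rec = dehaze_block(np.full((2, 2, 3), 177.5), 0.5, (255.0, 255.0, 255.0))
```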
step 6: label the target positions and categories in the haze-removed images, and make a target data set;
step 7: perform data enhancement on the target data set; the enhancement methods include, but are not limited to, rotation, translation, gamma transformation, MixUp, Mosaic and blurring, giving a new target data set;
and 8: the YOLOV4 target detection network is optimized and improved on the basis of the YOLOV3 network. The main work comprises the following steps: 1) the data are enhanced by using a Mosaic data enhancement technology, so that the background of a detected object is effectively enriched; 2) using a self-confrontation training data enhancement method that improves the robustness of the model by adding noise to the data; 3) modifying a spatial-wise attention mechanism (spatial-wise attention) of a spatial attention module into a point-wise attention mechanism (point-wise attention), and replacing shortcut connections in a path aggregation network with cascade connections; 4) the Batch Normalization (BN) is replaced by cross-batch normalization (CmBN), 4 mini lots inside the batch are isolated to the outside as a whole, and then statistics of the first three moments are summarized in the mini lots.
In order to improve the network's feature extraction capability, the invention replaces the residual blocks in the YOLOv4 network with dense link blocks. YOLOv4 uses CSPDarknet53 as the backbone network for feature extraction; CSPDarknet53 is formed by adding CSP blocks to the YOLOv3 Darknet53 network. Each CSP block consists of a number of residual blocks, as shown in fig. 5. The residual block improves the propagation efficiency of the gradient during network training, raising both the training speed and the feature propagation efficiency of the network. Suppose Xi represents the output of the i-th layer of the network and Xi+1 is the output of the (i+1)-th layer. The output of a residual layer in the network is:
Yi = (Xi + Xi-1)W
wherein W represents the added feature weight. It can be seen that the feature maps combined by the residual connection share the same weight W. Although this improves the efficiency of network training, sharing one weight between the two layers' outputs limits the diversity of the network's feature expression and its ability to extract tiny target features in complex environments.
To solve this problem, the residual blocks in the YOLOV4 network are replaced with dense link blocks, resulting in an improved YOLOV4 network;
As shown in fig. 6, a dense link block is composed of a number of 3 × 3 convolutional layers followed by 1 × 1 convolutional layers; each 3 × 3 convolutional layer is directly connected to all the following 1 × 1 convolutional layers, and the output of the dense link block is represented as:
Yi = ∑k WkXk (8)
wherein Xk represents the feature map of the k-th 3 × 3 convolutional layer preceding the 1 × 1 convolutional layer, Wk represents the weight assigned to each feature map connected to the 1 × 1 convolutional layer, and Yi represents the output of the i-th 1 × 1 convolutional layer in the dense link block. It can be seen that dense link blocks have stronger feature expression capability than residual blocks, because a dense link block assigns a different weight to each feature map.
And step 9: training the improved YOLOV4 network by taking the image in the new target data set as a sample to obtain a target detection model of the image after haze removal;
step 10: steps 2 to 9 constitute the haze image target detection model; input the haze image to be detected into the haze image target detection model, and output the detected targets.
The specific embodiment is as follows:
in order to verify the effectiveness of the algorithm, 11243 pictures of different scenes are collected under the haze condition, and the algorithm is verified.
To verify the effectiveness of the defogging algorithm and the improved YOLOv4 network, verification was performed on the haze image data set and the defogged image data set using the original YOLOv4 and the improved YOLOv4 respectively; the experimental results are shown in table 1. As can be seen from the table, on the same data set the precision of the improved YOLOv4 is higher than that of the original YOLOv4, because the improved YOLOv4 has better feature extraction capability; with the same model, target recognition precision on the defogged images is higher than on the haze images, showing that defogging effectively improves the detection accuracy of a deep learning network under haze conditions. Combining the defogging algorithm based on optimized contrast enhancement with the improved YOLOv4 network therefore greatly improves the detection and identification rate of targets under haze conditions, and the method is highly effective.
Table 1: comparative experiments of YOLOv4 and the improved YOLOv4 on the haze and defogged data sets
In the table, AP represents the average precision, AP50 the average precision at IoU ≥ 0.5, and AP75 the average precision at IoU ≥ 0.75, a stricter metric.
Claims (2)
1. A haze image target detection method based on an improved target detection network is characterized by comprising the following steps:
step 1: acquiring a haze weather image containing a target;
step 2: defining a haze degradation model as shown in formula (1):
I(p) = t(p)J(p) + (1 - t(p))A (1)
wherein J(p) = (Jr(p), Jg(p), Jb(p))^T represents the image after haze removal, with Jr(p), Jg(p), Jb(p) the pixel values of the haze-free image in the r, g and b channels respectively; I(p) = (Ir(p), Ig(p), Ib(p))^T represents the image containing haze, with Ir(p), Ig(p), Ib(p) the pixel values of the hazy image in the r, g and b channels respectively; p denotes a pixel position; A represents the atmospheric light value of the image; and t(p) ∈ [0, 1] represents the transmittance;
step 3: determine the atmospheric light value of the haze weather image using a hierarchical search method based on quadtree subdivision;
dividing a haze weather image into four rectangular areas, subtracting the standard deviation of pixel values in each rectangular area from the average pixel value of each rectangular area, then selecting the rectangular area with the largest calculation result, further dividing the rectangular area into four smaller rectangular areas, and repeating the process until the size of the selected rectangular area is smaller than a pre-specified threshold;
in the finally selected rectangular area, calculate the distance between each pixel and pure white, i.e. ||(Ir(p), Ig(p), Ib(p)) - (255, 255, 255)||; the pixel with the minimum distance to white is found, and its pixel value is taken as the atmospheric light value of the haze weather image;
step 4: calculate the optimal transmittance based on image blocks;
step 4-1: divide the haze weather image into image blocks of size b × b, assuming that the scene depth within each image block is consistent; with one transmittance t per image block, the haze degradation model for each image block is written as:
J(p) = (I(p) - A)/t + A (2)
step 4-2: representing the contrast of the dehazed image block by the mean squared error (MSE) contrast of formula (3):

C_MSE = Σ_{c∈{r,g,b}} Σ_{p∈B} (J_c(p) - J̄_c)^2 / N  (3)

where c ∈ {r, g, b} is the color channel index, J_c(p) is the dehazed pixel value of the image block in channel c, J̄_c is the mean of J_c(p) over the block, and N is the number of pixels in the image block; substituting formula (2), C_MSE is rewritten as:

C_MSE = Σ_{c∈{r,g,b}} Σ_{p∈B} (I_c(p) - Ī_c)^2 / (t^2 N)  (4)

where Ī_c is the mean of I_c(p) over the block;
step 4-3: obtaining the optimal transmittance t*;

defining the contrast cost E_contrast as the negative of the MSE contrast of formula (4), so that minimizing it maximizes contrast:

E_contrast = -C_MSE = -Σ_{c∈{r,g,b}} Σ_{p∈B} (I_c(p) - Ī_c)^2 / (t^2 N)  (5)

defining the information loss cost E_loss:
E_loss = Σ_{c∈{r,g,b}} Σ_{p∈B} { (min{0, J_c(p)})^2 + (max{0, J_c(p) - 255})^2 }  (6)
where min{0, J_c(p)} and max{0, J_c(p) - 255} represent the pixel values of the output dehazed image that are truncated due to pixel underflow and overflow, respectively;
minimizing the overall cost function E yields the optimal transmittance t*, where E is expressed as:

E = E_contrast + λ_L E_loss  (7)

in which λ_L is a weighting parameter controlling the relative importance of the contrast cost and the information loss cost; adjusting λ_L strikes a balance between contrast enhancement and information loss;
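Steps 4-1 through 4-3 amount to a one-dimensional search per block, which can be sketched as a grid search (a minimal numpy illustration; the candidate grid, the λ_L value, and the function name are assumptions, not taken from the claim):

```python
import numpy as np

def dehaze_block(block, A, lambda_L=5.0, candidates=np.linspace(0.1, 1.0, 10)):
    """Grid-search the per-block transmittance t* minimizing
    E = E_contrast + lambda_L * E_loss, then restore the block via the
    inverted degradation model J(p) = (I(p) - A) / t* + A."""
    N = block[..., 0].size                       # pixels per block
    best_t, best_E = candidates[0], np.inf
    for t in candidates:
        J = (block - A) / t + A
        # contrast cost: negative mean-squared contrast of the restored block
        E_contrast = -np.sum((J - J.mean(axis=(0, 1))) ** 2) / N
        # information-loss cost: squared under/overflow beyond [0, 255]
        E_loss = np.sum(np.minimum(0.0, J) ** 2 + np.maximum(0.0, J - 255.0) ** 2)
        E = E_contrast + lambda_L * E_loss
        if E < best_E:
            best_E, best_t = E, t
    J = np.clip((block - A) / best_t + A, 0.0, 255.0)
    return best_t, J
```

Small t amplifies contrast but drives pixels out of [0, 255]; the loss term keeps the search away from such over-aggressive values.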
step 5: after obtaining the atmospheric light value and the optimal transmittance of the haze weather image, recovering the dehazed image according to formula (2);
step 6: labeling the target positions and classes in the dehazed images to produce a target data set;
step 7: performing data augmentation on the target data set, the augmentation methods including but not limited to rotation, translation, gamma transformation, MixUp, Mosaic and blurring, to obtain a new target data set;
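As one concrete illustration of the augmentations listed in step 7, a detection-style MixUp can be sketched as follows (a hedged sketch under the assumption that box annotations are simply pooled; production pipelines typically also carry the mixing weight into the loss):

```python
import numpy as np

def mixup_detection(img1, boxes1, img2, boxes2, lam=0.5):
    """MixUp for detection data: blend the two images pixel-wise with
    weight lam and keep the union of both images' box annotations."""
    img = lam * img1 + (1.0 - lam) * img2
    return img, boxes1 + boxes2
```

Rotation, translation and blurring would transform both the image and its boxes; MixUp and Mosaic instead combine several labeled samples into one.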
step 8: replacing the residual blocks in the YOLOv4 network with dense link blocks to obtain an improved YOLOv4 network;
a dense link block is made up of a number of 3 × 3 convolutional layers followed by 1 × 1 convolutional layers, where each 3 × 3 convolutional layer is directly connected to all subsequent 1 × 1 convolutional layers; the output of a 1 × 1 convolutional layer in the dense link block is represented as:

Y_i = Σ_k W_k X_k  (8)

where X_k represents the feature map of the k-th 3 × 3 convolutional layer preceding the 1 × 1 convolutional layer, W_k represents the weight assigned to each feature map connected to that 1 × 1 convolutional layer, and Y_i represents the output of the i-th 1 × 1 convolutional layer in the dense link block;
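Because a 1 × 1 convolution over concatenated maps reduces to a per-map weighted combination, the dense-link aggregation described above can be sketched in numpy (the function name and scalar weights are illustrative; real YOLOv4 layers learn per-channel kernels):

```python
import numpy as np

def dense_link_output(feature_maps, weights):
    """Aggregate every preceding 3x3 feature map X_k into a 1x1 layer's
    output as the weighted sum  Y = sum_k W_k * X_k."""
    assert len(feature_maps) == len(weights)
    return sum(w * X for w, X in zip(weights, feature_maps))
```

Unlike a residual block, which adds only one skip path, every earlier feature map contributes directly to every later 1 × 1 layer, which is what improves feature reuse.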
step 9: training the improved YOLOv4 network with the images in the new target data set as samples to obtain a target detection model for dehazed images;
step 10: steps 2 to 9 constitute the haze image target detection model; the haze image to be detected is input into the haze image target detection model, and the detected target is output.
2. The haze image target detection method based on an improved target detection network according to claim 1, wherein b = 32.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110115251.4A CN112949389A (en) | 2021-01-28 | 2021-01-28 | Haze image target detection method based on improved target detection network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112949389A true CN112949389A (en) | 2021-06-11 |
Family
ID=76238342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110115251.4A Pending CN112949389A (en) | 2021-01-28 | 2021-01-28 | Haze image target detection method based on improved target detection network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112949389A (en) |
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN113763261A (en) * | 2021-06-29 | 2021-12-07 | Shenyang Institute of Automation, Chinese Academy of Sciences | Real-time detection method for far and small targets under sea fog meteorological condition
CN113763261B (en) * | 2021-06-29 | 2023-12-26 | Shenyang Institute of Automation, Chinese Academy of Sciences | Real-time detection method for far and small targets under sea fog meteorological condition
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104978719A (en) * | 2015-06-16 | 2015-10-14 | 浙江工业大学 | Self-adaptive traffic video real-time defogging method based on temporal-spatial coherence |
CN105898111A (en) * | 2016-05-06 | 2016-08-24 | 西安理工大学 | Video defogging method based on spectral clustering |
CN105989583A (en) * | 2016-07-19 | 2016-10-05 | 河海大学 | Image defogging method |
CN107248146A (en) * | 2017-05-22 | 2017-10-13 | 哈尔滨工程大学 | A kind of UUV Layer Near The Sea Surfaces visible images defogging method |
CN108647648A (en) * | 2018-05-14 | 2018-10-12 | 电子科技大学 | A kind of Ship Recognition system and method under visible light conditions based on convolutional neural networks |
CN108765302A (en) * | 2018-03-29 | 2018-11-06 | 西安电子科技大学 | The real-time defogging method of image based on GPU |
CN108876855A (en) * | 2018-05-28 | 2018-11-23 | 哈尔滨工程大学 | A kind of sea cucumber detection and binocular visual positioning method based on deep learning |
CN109919879A (en) * | 2019-03-13 | 2019-06-21 | 重庆邮电大学 | A kind of image defogging method based on dark channel prior Yu bright channel prior |
CN110210354A (en) * | 2019-05-23 | 2019-09-06 | 南京邮电大学 | A kind of detection of haze weather traffic mark with know method for distinguishing |
CN110428371A (en) * | 2019-07-03 | 2019-11-08 | 深圳大学 | Image defogging method, system, storage medium and electronic equipment based on super-pixel segmentation |
CN110807496A (en) * | 2019-11-12 | 2020-02-18 | 智慧视通(杭州)科技发展有限公司 | Dense target detection method |
CN110826379A (en) * | 2018-08-13 | 2020-02-21 | 中国科学院长春光学精密机械与物理研究所 | Target detection method based on feature multiplexing and YOLOv3 |
CN110991311A (en) * | 2019-11-28 | 2020-04-10 | 江南大学 | Target detection method based on dense connection deep network |
CN111027511A (en) * | 2019-12-23 | 2020-04-17 | 西安电子科技大学 | Remote sensing image ship detection method based on region of interest block extraction |
CN111161170A (en) * | 2019-12-18 | 2020-05-15 | 江苏科技大学 | Underwater image comprehensive enhancement method for target recognition |
CN111414807A (en) * | 2020-02-28 | 2020-07-14 | 浙江树人学院(浙江树人大学) | Tidal water identification and crisis early warning method based on YO L O technology |
CN111695514A (en) * | 2020-06-12 | 2020-09-22 | 长安大学 | Vehicle detection method in foggy days based on deep learning |
CN112084866A (en) * | 2020-08-07 | 2020-12-15 | 浙江工业大学 | Target detection method based on improved YOLO v4 algorithm |
CN112232351A (en) * | 2020-11-09 | 2021-01-15 | 浙江工业职业技术学院 | License plate recognition system based on deep neural network |
2021-01-28: application CN202110115251.4A filed; published as CN112949389A; status: Pending
Non-Patent Citations (2)
Title |
---|
Wu Renbiao et al.: "License plate detection method based on PLATE-YOLO in haze environment", Journal of Signal Processing * |
Zhang Yue et al.: "Research on the influence of haze image enhancement algorithms on target detection performance", Computer Measurement & Control * |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
WD01 | Invention patent application deemed withdrawn after publication ||

Application publication date: 20210611