CN115861971A - Night vehicle and pedestrian detection method based on improved YOLOv4-tiny - Google Patents

Night vehicle and pedestrian detection method based on improved YOLOv4-tiny

Info

Publication number
CN115861971A
CN115861971A (application CN202211631655.XA)
Authority
CN
China
Prior art keywords
tiny
layer
yolov4
night
fusion
Prior art date
Legal status
Pending
Application number
CN202211631655.XA
Other languages
Chinese (zh)
Inventor
吕圣
周奎
刘瀚文
张伟
王红霞
张友兵
张宇丰
赵毅
阮航
Current Assignee
Hubei University of Automotive Technology
Original Assignee
Hubei University of Automotive Technology
Priority date
Filing date
Publication date
Application filed by Hubei University of Automotive Technology filed Critical Hubei University of Automotive Technology
Priority to CN202211631655.XA priority Critical patent/CN115861971A/en
Publication of CN115861971A publication Critical patent/CN115861971A/en
Pending legal-status Critical Current

Landscapes

  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a night vehicle and pedestrian detection method based on an improved YOLOv4-tiny. The method first establishes a night vehicle and pedestrian data set from the NightOwls data set; the data set is input into a night vehicle and pedestrian detection network model based on an improved YOLOv4-tiny algorithm structure, which is trained on it to obtain training weights; the trained network model is deployed to a local driver-assistance platform; the training weights are then loaded and migrated to an automatic driving platform, and the network model judges whether the image frames acquired by the platform camera contain vehicles and pedestrians. While maintaining detection speed, the method effectively addresses the difficulty of recognizing vehicle and pedestrian targets in night road scenes under weak light, large contrast between light and dark, and similar conditions, improving both the accuracy and the speed of night vehicle and pedestrian detection.

Description

Night vehicle and pedestrian detection method based on improved YOLOv4-tiny
Technical Field
The invention relates to the technical field of automatic driving or unmanned driving, in particular to a night vehicle and pedestrian detection method based on improved YOLOv4-tiny.
Background
As one of the important research fields of computer vision, pedestrian detection has very wide applications. In recent years, computer vision technology based on deep convolutional neural networks has attracted much attention, and its achievements have promoted the development of unmanned driving, surveillance video, internet video retrieval and processing, human-computer interaction, virtual reality, medical care, intelligent security and other fields. Pedestrian detection is an important research topic in automatic driving technology and one of the research hotspots in computer vision.
Automatic driving technology is generally divided into three major parts: environment perception, decision planning and vehicle control. Vision sensors are easily affected by environmental changes and perform poorly in low visibility; being passive sensors, they are sensitive to many uncertain factors such as illumination, glare, viewing angle, scale, shadow, fouling, background interference and target occlusion. At night the overall luminance of the road is low while local illumination from street lamps and other light sources is strong, so overall image resolution is reduced and detecting targets with a vision sensor at night becomes more difficult.
Moreover, in automatic driving applications many complex weak-lighting situations inevitably occur. In an actual night driving scene the vehicle and the detected targets are in relative motion, so the images produced by the vehicle-mounted camera exhibit a certain degree of motion blur and image noise and often contain blur and artifacts, which clearly increases the difficulty of target detection for automatic driving. The lack of an effective night vehicle and pedestrian detection method is therefore one of the main factors hindering the development of automatic driving technology.
In view of the above, it is necessary to intensively study a vehicle and pedestrian detection method under nighttime road conditions to improve the effect thereof in practical applications.
Disclosure of Invention
Aiming at the technical problems in the background art, the invention provides a night vehicle and pedestrian detection method based on improved YOLOv4-tiny, which has reasonable conception, can effectively solve the problem that the targets of the vehicles and the pedestrians are difficult to identify under the conditions of weak light, large light and shade difference and the like in a night road scene on the basis of ensuring the detection speed, and improves the detection precision and speed of the vehicles and the pedestrians at night.
In order to solve the above technical problems, the invention provides a night vehicle and pedestrian detection method based on improved YOLOv4-tiny, which comprises: first establishing a night vehicle and pedestrian data set from the NightOwls data set; inputting the data set into a night vehicle and pedestrian detection network model based on an improved YOLOv4-tiny algorithm structure and training on it to obtain training weights; deploying the trained network model to a local driver-assistance platform; and loading the training weights and migrating them to an automatic driving platform, where the network model judges whether the image frames acquired by the platform camera contain vehicles and pedestrians; if so, the targets are marked with bounding boxes, otherwise no processing is performed.
The night vehicle and pedestrian detection method based on improved YOLOv4-tiny is characterized in that: the training process of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure specifically comprises the following steps:
(2.1) training the YOLOv4-tiny network on the COCO data set to obtain a weight file used as the initial weight file of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure;
(2.2) constructing an initial data set from the NightOwls night data set, labeling the vehicle and pedestrian images in it, and dividing the data set into a training set, a validation set and a test set in the proportion of 7;
(2.3) according to the training set, the validation set and the Focal loss function, initializing the network parameters and then iteratively updating the parameters of the deep convolutional neural network in the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure by gradient-descent back-propagation; taking the network parameters obtained after the maximum set number of iterations as the optimal network parameters, completing training and obtaining the trained night vehicle and pedestrian detection network model;
and (2.4) migrating the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure to the local driver-assistance platform and using it to judge whether the driving-platform images contain vehicles and pedestrians; if so, they are marked with bounding boxes, otherwise no processing is performed.
The night vehicle and pedestrian detection method based on improved YOLOv4-tiny is characterized in that: in the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure, the feature extraction network and the feature fusion network are improved, an SPP pyramid pooling model module is introduced, and a 52 × 52 YOLO detection head is added to form the final improved algorithm structure; the feature fusion network comprises upsampling layers, fusion layers, downsampling layers and convolution layers, connected in sequence according to the data flow direction; the upsampling layers comprise a first upsampling layer and a second upsampling layer; the fusion layers comprise a first fusion layer, a second fusion layer, a third fusion layer and a fourth fusion layer; the downsampling layers comprise a first downsampling layer and a second downsampling layer.
The night vehicle and pedestrian detection method based on the improved YOLOv4-tiny is characterized in that the feature fusion network is connected in sequence according to the data flow direction as follows: the output feature map of the SPP pyramid pooling model module is input into a CBS convolutional layer, the CBS convolutional layer outputs its feature map to the second upsampling layer and the fourth fusion layer, and the output feature map of the fourth fusion layer is finally input into a YOLO detection head of size 13 × 13; the 13 × 13 output feature map of the feature extraction network is input to the second fusion layer through a convolution operation, the output feature map of the second fusion layer is input to the first upsampling layer and the third fusion layer respectively, and the output feature map of the third fusion layer is input to the second downsampling layer and a YOLO detection head of size 26 × 26; the 52 × 52 output feature map of the feature extraction network is input to the first fusion layer, and the output feature map of the first fusion layer is input to the first downsampling layer and the YOLO detection head of size 52 × 52.
The night vehicle and pedestrian detection method based on the improved YOLOv4-tiny is characterized in that the construction process of the night vehicle and pedestrian target detection network model based on the improved YOLOv4-tiny is as follows:
(1.1) adding a 52 x 52 scale feature in a YOLOv4-tiny feature fusion network, fusing deep semantic information from bottom to top, and correspondingly adding a detection head to the newly fused feature layer; in order to fully fuse the feature information of different scales, a top-down sampling path is added in the YOLOv4-tiny feature fusion network to obtain a new feature fusion structure;
(1.2) introducing an SPP pyramid pooling model module before the last convolution layer of the YOLOv4-tiny feature extraction network, extracting and fusing features of different receptive fields to improve the extraction of blurred and occluded image features; the feature maps are then max-pooled with four pooling windows of 1 × 1, 5 × 5, 9 × 9 and 13 × 13 and concatenated, fusing global and local features;
(1.3) optimizing a loss function of the YOLOv4-tiny algorithm model by adopting a Focal loss function, and adjusting the weight of positive and negative samples on the total loss;
(1.4) replacing the Leaky ReLU activation function in the YOLOv4-tiny algorithm model with the SiLU activation function.
The night vehicle and pedestrian detection method based on improved YOLOv4-tiny, wherein in the step (1.1), a 52 × 52 scale feature is newly added in the YOLOv4-tiny feature fusion network, deep semantic information is fused from bottom to top, and a 52 × 52 YOLO detection head is correspondingly added to the newly fused feature layer, with the following specific process: the output feature map of the SPP pyramid pooling model module is input into the CBS convolutional layer, the CBS convolutional layer outputs its feature map to the second upsampling layer and the fourth fusion layer, the second upsampling layer outputs its feature map to the second fusion layer, the second fusion layer outputs its feature map to the first upsampling layer and the third fusion layer, the first upsampling layer outputs its feature map to the first fusion layer, and the output feature map of the first fusion layer is input to the YOLO detection head of size 52 × 52; the CBS convolutional layer is composed of a convolution layer, a batch-normalization layer and an activation function.
The night vehicle and pedestrian detection method based on improved YOLOv4-tiny is characterized in that: the SPP pyramid pooling model module comprises 4 parallel maximum pooling layers (MaxPool) with pooling kernel sizes of 13 × 13, 9 × 9, 5 × 5 and 1 × 1 respectively; it is used to enlarge the receptive field of the 13 × 13 output feature map of the feature extraction network: different pooling kernels yield feature maps with different receptive fields, the feature map is max-pooled with the four pooling windows of 1 × 1, 5 × 5, 9 × 9 and 13 × 13, the results are concatenated along the channel dimension, and global and local features are fused, enriching the feature-information expression capability and yielding the output feature map of the SPP pyramid pooling model module.
The night vehicle and pedestrian detection method based on improved YOLOv4-tiny is characterized in that: in the step (1.1), the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure obtains semantic information and resolution information by fusing the three scale features 13 × 13, 26 × 26 and 52 × 52, and generates YOLO detection heads at the three scales 13 × 13, 26 × 26 and 52 × 52; these detection heads respectively process the three fused feature maps of sizes 13 × 13, 26 × 26 and 52 × 52 produced by the feature fusion module, completing the whole detection process.
The night vehicle and pedestrian detection method based on improved YOLOv4-tiny is characterized in that: the expression of the Focal loss function in the step (1.3) is as follows:
L_{focal}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)    (1)

p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases}    (2)

\alpha_t = \begin{cases} \alpha, & y = 1 \\ 1 - \alpha, & \text{otherwise} \end{cases}    (3)
In the above formulas (1)-(3), y = 1 denotes a positive sample and y = 0 a negative sample; p ∈ [0,1] is the predicted probability that the sample is positive; α ∈ [0,1] is a sample weighting factor used to balance the numbers of positive and negative samples; and γ ≥ 0 is a hard-example focusing factor that attenuates the loss contribution of easily classified samples, making the algorithm focus more on hard samples.
The night vehicle and pedestrian detection method based on improved YOLOv4-tiny is characterized in that: the expression of the SiLU activation function in the step (1.4) is as follows:
\mathrm{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}    (4)
the SiLU activation function is a continuous function, has a lower bound and is not monotonous, and can be continuously differentiated; the SiLU activation function goes to 0 in the negative infinity direction and diverges in the positive infinity direction.
The night vehicle and pedestrian detection method based on improved YOLOv4-tiny is characterized in that: the night vehicle and pedestrian detection network model is obtained by applying the above improvements to the YOLOv4-tiny algorithm structure.
The night vehicle and pedestrian detection method based on improved YOLOv4-tiny is characterized in that: the night vehicle and pedestrian data set consists of 12000 pictures of night vehicles and pedestrians from the NightOwls data set, covering different postures and different degrees of occlusion.
By adopting the technical scheme, the invention has the following beneficial effects:
the night vehicle and pedestrian detection method based on improved YOLOv4-tiny has reasonable conception, and effectively solves the problem that the technical defects of the existing algorithm in night scenes, such as the difficulty in identifying the vehicle and pedestrian targets due to the conditions of weak light, large light and shade difference and the like, are solved on the basis of ensuring the detection speed.
In the night vehicle and pedestrian detection network model based on improved YOLOv4-tiny, to enhance the network's ability to detect occluded and small targets, a new 52 × 52 scale feature is added to the YOLOv4-tiny feature fusion network, deep semantic information is fused from bottom to top, and a corresponding detection head is output; a Spatial Pyramid Pooling (SPP) module is introduced to obtain multi-scale features from the same convolutional layer; the Focal loss function is used to improve detection accuracy; and the SiLU activation function reduces the complexity of network training and improves generalization, raising both the accuracy and the speed of night vehicle and pedestrian detection. The algorithm is extremely lightweight: the weights occupy only 43.32 MB, less than one fifth of the YOLOv4 model, making it suitable for deployment on automatic driving platforms with embedded modules such as the NVIDIA Jetson Xavier.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of the night vehicle and pedestrian detection steps in the night vehicle and pedestrian detection method based on the improved YOLOv4-tiny algorithm of the present invention;
FIG. 2 is a schematic structural composition diagram of a night vehicle and pedestrian detection network model based on an improved YOLOv4-tiny algorithm structure in the night vehicle and pedestrian detection method based on the improved YOLOv4-tiny algorithm structure of the present invention;
FIG. 3 is a flow chart of an SPP pyramid pooling model in a feature extraction network in the nighttime vehicle and pedestrian detection method based on the improved YOLOv4-tiny algorithm of the present invention;
FIG. 4 is a graph of the SiLU activation function in the nighttime vehicle and pedestrian detection method of the present invention based on the improved YOLOv4-tiny algorithm;
FIG. 5 is an mAP value evaluation index of different improved methods in the night vehicle and pedestrian detection method based on the improved YOLOv4-tiny algorithm;
FIG. 6 is a comparison diagram of the detection effects of the six network models in the night vehicle and pedestrian detection method based on the improved YOLOv4-tiny algorithm;
FIG. 7 is a real object diagram of an automatic driving platform deployed in the night vehicle and pedestrian detection method based on the improved YOLOv4-tiny algorithm according to the present invention;
fig. 8 is a test result diagram of the automatic driving platform deployed in the night vehicle and pedestrian detection method based on the improved YOLOv4-tiny algorithm in the night illumination environment.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention will be further explained with reference to specific embodiments.
As shown in fig. 1, the night vehicle and pedestrian detection method based on the improved YOLOv4-tiny algorithm first establishes a night vehicle and pedestrian data set from the NightOwls data set; the data set is input into a night vehicle and pedestrian detection network model based on an improved YOLOv4-tiny algorithm structure and used to train the model and obtain training weights; the trained model is deployed to a local driver-assistance platform; the training weights are then loaded onto the automatic driving platform, and the model judges whether the image frames acquired by the platform camera contain vehicles and pedestrians; if so, they are marked with bounding boxes, otherwise no processing is performed.
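The judge-and-mark deployment step above can be sketched as a small routine that, given the boxes the detector emits for a frame, draws them or leaves the frame untouched. This is an illustrative sketch, not code from the patent: the function name `mark_detections`, the `(x1, y1, x2, y2)` box convention and the plain-NumPy drawing are assumptions; a real deployment would decode the YOLO head outputs and use the platform's drawing API.

```python
import numpy as np

def mark_detections(frame, detections, thickness=2, value=255):
    """Draw bounding-box edges for detected vehicles/pedestrians.

    `detections` is a list of (x1, y1, x2, y2) boxes; an empty list
    leaves the frame unchanged, mirroring the "no processing" branch.
    """
    out = frame.copy()
    for x1, y1, x2, y2 in detections:
        out[y1:y1 + thickness, x1:x2] = value   # top edge
        out[y2 - thickness:y2, x1:x2] = value   # bottom edge
        out[y1:y2, x1:x1 + thickness] = value   # left edge
        out[y1:y2, x2 - thickness:x2] = value   # right edge
    return out
```

The input frame is copied so the camera buffer itself is never modified.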
The night vehicle and pedestrian detection network model is obtained by applying the improvements described below to the YOLOv4-tiny algorithm structure.
The night vehicle and pedestrian data set consists of 12000 pictures of night vehicles and pedestrians from the NightOwls data set, covering different postures and different degrees of occlusion.
As shown in fig. 2, the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure is an improvement on a feature extraction network and a feature fusion network, and an SPP pyramid pooling model module and a 52 × 52 YOLO detection head are added to form a final improved algorithm structure. The feature fusion network includes an up-sampling layer, a fusion layer, a down-sampling layer, and a convolution layer, and is connected in sequence according to the data flow direction shown in fig. 2. The upsampling layers include a first upsampling layer (i.e., "Upsample-11" in fig. 2) and a second upsampling layer (i.e., "Upsample-12" in fig. 2); the fused layers include a first fused layer (i.e., "Concat-21" in FIG. 2), a second fused layer (i.e., "Concat-22" in FIG. 2), a third fused layer (i.e., "Concat-23" in FIG. 2), and a fourth fused layer (i.e., "Concat-24" in FIG. 2); the down-sampling layers include a first down-sampling layer (i.e., "down sample-31" in fig. 2) and a second down-sampling layer (i.e., "down sample-32" in fig. 2).
The above feature fusion network is connected in sequence according to the data flow direction as follows: the output feature map of the SPP pyramid pooling model module is input into a CBS convolutional layer, which outputs its feature map to the second upsampling layer and the fourth fusion layer; the output feature map of the fourth fusion layer is finally input to a YOLO detection head of size 13 × 13. The 13 × 13 output feature map of the feature extraction network is input to the second fusion layer through a convolution operation; the output feature map of the second fusion layer is input to the first upsampling layer and the third fusion layer respectively, and the output feature map of the third fusion layer is input to the second downsampling layer and a YOLO detection head of size 26 × 26. The 52 × 52 output feature map of the feature extraction network is input to the first fusion layer, and the output feature map of the first fusion layer is input to the first downsampling layer and the YOLO detection head of size 52 × 52.
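The three-scale fusion flow just described can be sketched as a PAN-style neck in PyTorch. This is a simplified reading of the text, not the patent's exact layer table: the channel widths (128/256/512), nearest-neighbour upsampling and the `cbs` helper are assumptions, and `f52`, `f26`, `f13` stand for the feature-extraction outputs at the three scales.

```python
import torch
import torch.nn as nn

def cbs(c_in, c_out, k=1, s=1):
    # Conv-BatchNorm-SiLU block, as used throughout the improved model
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class FusionNeck(nn.Module):
    """Top-down upsampling followed by bottom-up downsampling,
    producing the 52x52, 26x26 and 13x13 head inputs."""
    def __init__(self):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.reduce13 = cbs(512, 256)         # CBS after the SPP output
        self.reduce26 = cbs(256 + 256, 128)   # fuse upsampled 13x13 with 26x26
        self.out52 = cbs(128 + 128, 128)      # fuse upsampled 26x26 with 52x52
        self.down1 = cbs(128, 128, k=3, s=2)  # first downsampling layer
        self.out26 = cbs(128 + 128, 256)      # bottom-up fusion at 26x26
        self.down2 = cbs(256, 256, k=3, s=2)  # second downsampling layer
        self.out13 = cbs(256 + 256, 512)      # bottom-up fusion at 13x13

    def forward(self, f52, f26, f13):
        p13 = self.reduce13(f13)
        p26 = self.reduce26(torch.cat([self.up(p13), f26], dim=1))
        p52 = self.out52(torch.cat([self.up(p26), f52], dim=1))
        n26 = self.out26(torch.cat([self.down1(p52), p26], dim=1))
        n13 = self.out13(torch.cat([self.down2(n26), p13], dim=1))
        return p52, n26, n13  # inputs to the 52x52, 26x26, 13x13 heads
```

Each returned map would feed one of the three YOLO detection heads.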
The improvement process of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure comprises the following steps:
(1.1) as shown in fig. 2, a 52 × 52 scale feature is newly added in the YOLOv4-tiny feature fusion network (giving three scale features: 13 × 13, 26 × 26 and 52 × 52), deep semantic information is fused from bottom to top, and a detection head is correspondingly added for the newly fused feature layer; in addition, in order to fully fuse feature information at different scales, a top-down sampling path is added to the YOLOv4-tiny feature fusion network, yielding a new feature fusion structure.
(1.2) an SPP pyramid pooling model module is introduced before the last convolution layer of the YOLOv4-tiny feature extraction network to extract and fuse features of different receptive fields, improving the extraction of blurred and occluded image features; the feature maps are then max-pooled with four pooling windows of 1 × 1, 5 × 5, 9 × 9 and 13 × 13 and concatenated, fusing global and local features, enriching the feature-information expression capability and improving detection accuracy.
(1.3) optimizing a loss function of the YOLOv4-tiny algorithm model by adopting a Focal loss function, and adjusting the weight of the positive and negative samples to the total loss.
And (1.4) replacing a Leaky ReLU activation function in a YOLOv4-tiny algorithm model by a SiLU activation function so as to improve the accuracy of algorithm detection.
In the step (1.1), a 52 × 52 scale feature is newly added in the YOLOv4-tiny feature fusion network, deep semantic information is fused from bottom to top, and a YOLO detection head of size 52 × 52 is correspondingly added for the newly fused feature layer. The specific process is as follows: as shown in fig. 2, the output feature map of the SPP pyramid pooling model module is input into the CBS convolutional layer, which outputs its feature map to the second upsampling layer and the fourth fusion layer; the second upsampling layer outputs its feature map to the second fusion layer; the second fusion layer outputs its feature map to the first upsampling layer and the third fusion layer; the first upsampling layer outputs its feature map to the first fusion layer; and the first fusion layer outputs its feature map to the YOLO detection head of size 52 × 52. The CBS convolutional layer is composed of three network layers: a convolution layer (Convolution), a batch-normalization layer (Batch Normalization) and an activation function. The convolution is consistent with that in the original YOLOv4-tiny algorithm and is used for inter-layer data transfer; it is not described further here.
As shown in fig. 3, the SPP pyramid pooling model module includes 4 parallel maximum pooling layers (MaxPool) with pooling kernel sizes of 13 × 13, 9 × 9, 5 × 5 and 1 × 1. It is used to enlarge the receptive field of the 13 × 13 output feature map of the feature extraction network: different pooling kernels yield feature maps with different receptive fields; the feature map is max-pooled with the four pooling windows of 1 × 1, 5 × 5, 9 × 9 and 13 × 13 and the results are concatenated along the channel dimension, fusing global and local features, enriching the feature-information expression capability and yielding the output feature map of the SPP pyramid pooling model module.
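The four parallel branches map directly onto stride-1 max pooling with same-size padding; a minimal PyTorch sketch (the 512-channel input in the usage below is an assumption, since the patent does not state channel counts):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: four parallel max-pool windows
    (1, 5, 9, 13) with stride 1 and padding k//2, so every branch
    keeps the 13x13 spatial size; outputs are concatenated along
    the channel dimension, growing the channels fourfold."""
    def __init__(self, kernels=(1, 5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels
        )

    def forward(self, x):
        return torch.cat([pool(x) for pool in self.pools], dim=1)
```

The 1 × 1 branch is the identity, so the original local features are preserved alongside the three enlarged receptive fields.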
The newly added 52 × 52 scale feature in the step (1.1) enhances the network's ability to detect occluded and small targets.
In the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure in the step (1.1), multi-scale feature fusion is performed on the three scale features 13 × 13, 26 × 26 and 52 × 52 to obtain stronger semantic information and higher-resolution information, and YOLO detection heads at the three scales 13 × 13, 26 × 26 and 52 × 52 are generated to improve the accuracy of night vehicle and pedestrian detection. The three YOLO detection heads respectively process the fused feature maps of sizes 52 × 52, 26 × 26 and 13 × 13 produced by the feature fusion module, completing the whole detection process.
The expression of the Focal loss function in the step (1.3) is as follows:
L_{focal}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)    (1)

p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases}    (2)

\alpha_t = \begin{cases} \alpha, & y = 1 \\ 1 - \alpha, & \text{otherwise} \end{cases}    (3)
In the above formulas (1)-(3), y = 1 denotes a positive sample and y = 0 a negative sample; p ∈ [0,1] is the predicted probability that the sample is positive; α ∈ [0,1] is a sample weighting factor used to balance the numbers of positive and negative samples; and γ ≥ 0 is a hard-example focusing factor that attenuates the loss contribution of easily classified samples, making the algorithm focus more on hard samples. The Focal loss function increases the influence of hard samples on gradient updates and prevents a large amount of useless information from affecting network training.
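Formulas (1)-(3) can be written directly as a vectorised function. The defaults α = 0.25 and γ = 2 below come from the original Focal loss paper and are assumptions here; the patent does not state the values it uses.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Per-sample focal loss following formulas (1)-(3):
    p is the predicted positive probability, y the 0/1 label."""
    p_t = np.where(y == 1, p, 1.0 - p)           # formula (2)
    a_t = np.where(y == 1, alpha, 1.0 - alpha)   # formula (3)
    return -a_t * (1.0 - p_t) ** gamma * np.log(p_t)  # formula (1)
```

With γ = 0 and α = 0.5 this reduces to a scaled cross-entropy; raising γ shrinks the loss of well-classified samples so hard samples dominate the gradient.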
The expression of the SiLU activation function in the step (1.4) is as follows:
\mathrm{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}    (4)
the SiLU activation function is a continuous function, has a lower bound and is not monotonous, and can be continuously differentiated, so that the gradient optimization effect of the neural network is better when the neural network reversely propagates. As shown in fig. 4, in addition, the sulu activation function tends to 0 in the negative infinite direction, and diverges in the positive infinite direction, which helps to control the network complexity, thereby avoiding overfitting and improving the algorithm generalization capability.
The training process of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure specifically comprises the following steps:
(2.1) training the YOLOv4-tiny network on the COCO data set to obtain a weight file used as the initial weight file of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure;
(2.2) constructing an initial data set from the NightOwls night data set, labeling the vehicle and pedestrian images in it, and dividing the data set into a training set, a validation set and a test set in the proportion of 7;
(2.3) according to the training set, the validation set and the Focal loss function, initializing the network parameters and then iteratively updating the parameters of the deep convolutional neural network in the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure by gradient-descent back-propagation; taking the network parameters obtained after the maximum set number of iterations as the optimal network parameters, completing training and obtaining the trained night vehicle and pedestrian detection network model;
and (2.4) transplanting the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure to the local auxiliary driving platform, using it to judge whether the driving platform image contains vehicles and pedestrians; if so, marking them with bounding boxes, otherwise performing no processing.
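The gradient-descent parameter update of step (2.3) can be illustrated with a minimal, self-contained sketch: a single scalar parameter w, a squared-error loss, and iteration up to a maximum set number of epochs. All names here are illustrative assumptions; none come from the patent:

```python
def sgd_step(w, x, y, lr=0.1):
    """One gradient-descent update for the model y_hat = w * x with
    squared-error loss L = (w*x - y)**2, so dL/dw = 2*(w*x - y)*x."""
    grad = 2.0 * (w * x - y) * x
    return w - lr * grad

def train(w, data, epochs=100, lr=0.1):
    """Iterate the update up to a maximum set number of epochs, after which
    the final parameters are taken as the result, as in step (2.3)."""
    for _ in range(epochs):
        for x, y in data:
            w = sgd_step(w, x, y, lr)
    return w
```

The real model updates millions of convolutional weights against the Focal loss rather than one scalar against a squared error, but the iterate-until-the-set-maximum structure is the same.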
In order to verify and evaluate the detection effect of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure, the commonly used Precision, Recall, mean Average Precision (mAP) and F1 score are selected as evaluation indices. The specific formulas are as follows:
Precision = TP / (TP + FP)    (5)
Recall = TP / (TP + FN)    (6)
AP = ∫ P(R) dR (the area under the precision-recall curve)
mAP = (1/k) · Σ_{i=1}^{k} AP_i    (7)
F1 = 2 · Precision · Recall / (Precision + Recall)    (8)
In the above formulas (5)-(8), TP (True Positive) is the number of positive samples correctly predicted as positive, FP (False Positive) is the number of negative samples incorrectly predicted as positive, FN (False Negative) is the number of positive samples incorrectly predicted as negative, and k is the number of classes.
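The count-based metrics translate directly into code. The following sketch (illustrative only, not from the patent) computes Precision, Recall and F1 from raw TP/FP/FN counts:

```python
def detection_metrics(tp, fp, fn):
    """Precision, Recall and F1 score from raw detection counts.

    tp -- positive samples correctly predicted as positive
    fp -- negative samples incorrectly predicted as positive
    fn -- positive samples incorrectly predicted as negative
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

mAP additionally requires the full precision-recall curve per class (the AP integral) rather than a single operating point, so it is not reproduced here.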
Ablation experiments were designed to compare the impact of the different improvements on detection performance. YOLOv4-tiny denotes the original network model; A-YOLOv4-tiny denotes the network model using the SiLU activation function; B-YOLOv4-tiny denotes the network model using the Focal loss function; C-YOLOv4-tiny denotes the network model with multi-scale detection; D-YOLOv4-tiny denotes the network model with the SPP module introduced; Improved-YOLOv4-tiny denotes the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure. Taking the mAP value as the evaluation index, the mAP values of the six network models at different numbers of training iterations on the two data sets are calculated; the statistical results are shown in FIG. 5. The verification-set test results obtained with the optimal weights after the six network models were trained on the two data sets are shown in Table 1. The detection effects of the six network models are shown in FIG. 6.
TABLE 1 comparison of the effects of different improvements
In addition, five advanced target detection networks, namely MobileNet-YOLOv4, YOLOv3, YOLOv4, SSD and Faster R-CNN, were trained and tested, and their results are compared with those of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure; the test results are shown in Table 2.
The detection precision (mAP) of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure reaches 96.64%, higher than that of the YOLOv4, SSD and MobileNet-YOLOv4 models; its detection speed (FPS) reaches 49.83 frames per second; its detection precision differs from that of the YOLOv4 model by only 0.5%, but its weight file is less than one fifth the size of the YOLOv4 model's, making it suitable for deployment on automatic driving platforms with embedded modules such as the NVIDIA Jetson Xavier.
TABLE 2 comparison of Performance between different test methods
In order to verify the actual detection effect of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure, the model is deployed on an unmanned platform equipped with an NVIDIA Jetson Xavier embedded module and an image acquisition module, as shown in FIG. 7; night road scenes on the campus are detected in real time, and the detection results are shown in FIG. 8. The night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure achieves a good detection effect in the night environment.
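A deployment loop of this kind typically resizes each captured frame to the network's fixed square input before inference. The following NumPy sketch is illustrative only: the 416 x 416 input size, the gray padding value and all names are assumptions, not taken from the patent. It performs an aspect-ratio-preserving "letterbox" resize with nearest-neighbour sampling:

```python
import numpy as np

def letterbox(frame, size=416):
    """Resize an H x W x 3 frame to size x size, preserving aspect ratio
    by scaling the longer side to `size` and padding with gray (128)."""
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour source-index maps for the scaled image
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = frame[rows][:, cols]
    canvas = np.full((size, size, 3), 128, dtype=frame.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```

A production pipeline would normally use a library resize (e.g. bilinear) and then normalize pixel values before feeding the tensor to the detection network; the padding keeps vehicles and pedestrians undistorted regardless of the camera's aspect ratio.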
The method is soundly conceived: on the basis of ensuring the detection speed, it overcomes the technical shortcomings of existing algorithms in night scenes, such as the difficulty of identifying vehicle and pedestrian targets under weak light and large light-dark contrast.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A night vehicle and pedestrian detection method based on improved YOLOv4-tiny, characterized in that: firstly, a night vehicle and pedestrian data set is established according to the NightOwls data set; the data set is input into a night vehicle and pedestrian detection network model based on an improved YOLOv4-tiny algorithm structure, and the night vehicle and pedestrian data set is trained through this model to obtain training weights; the trained night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure is deployed to a local auxiliary driving platform; the training weights are loaded and transplanted to an automatic driving platform, and the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure is used to judge whether the image frames acquired by the automatic driving platform camera contain vehicles and pedestrians; if so, they are marked with bounding boxes, otherwise no processing is performed.
2. The improved YOLOv4-tiny based night vehicle and pedestrian detection method of claim 1, wherein: the training process of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure specifically comprises the following steps:
(2.1) training the YOLOv4-tiny network on the COCO data set to obtain a weight file serving as the initial weight file of the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure;
(2.2) constructing an initial data set according to a NightOwls night data set, labeling images of vehicles and pedestrians in the initial data set, and dividing the data set into a training set, a verification set and a test set according to the proportion of 7;
(2.3) according to the training set, the verification set and the Focal loss function, after initializing the network parameters, iteratively updating the parameters of the deep convolutional neural network in the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure by the gradient-descent back-propagation method; taking the network parameters obtained after the maximum set number of iterations as the optimal network parameters, completing the training, and obtaining the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure;
and (2.4) transplanting the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure to a local auxiliary driving platform, judging whether the driving platform image contains the vehicle and the pedestrian by using the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure, if so, marking by using a boundary frame, and if not, carrying out no processing.
3. The improved YOLOv4-tiny based night vehicle and pedestrian detection method of claim 1, wherein: the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure is characterized in that a feature extraction network and a feature fusion network are improved, and an SPP pyramid pooling model module and a 52 x 52 YOLO detection head are added to form a final improved algorithm structure; the characteristic fusion network comprises an upper sampling layer, a fusion layer, a lower sampling layer and a convolution layer, and is connected in sequence according to the data flow direction; the up-sampling layers include a first up-sampling layer and a second up-sampling layer; the fusion layers comprise a first fusion layer, a second fusion layer, a third fusion layer and a fourth fusion layer; the downsampling layers include a first downsampling layer and a second downsampling layer.
4. The improved YOLOv4-tiny-based night vehicle and pedestrian detection method of claim 3, wherein the sequential connection of the feature fusion network according to the data flow direction comprises: the output feature map of the SPP pyramid pooling model module is input into a CBS convolutional layer; the CBS convolutional layer outputs its feature map to the second upsampling layer and the fourth fusion layer, and the output feature map of the fourth fusion layer is finally input into the YOLO detection head of size 13 × 13; the output feature map of size 13 × 13 of the feature extraction network is input into the second fusion layer through a convolution operation, the output feature map of the second fusion layer is input into the first upsampling layer and the third fusion layer respectively, and the output feature map of the third fusion layer is input into the second downsampling layer and the YOLO detection head of size 26 × 26; the output feature map of size 52 × 52 of the feature extraction network is input into the first fusion layer, and the output feature map of the first fusion layer is input into the first downsampling layer and the YOLO detection head of size 52 × 52.
5. The improved YOLOv4-tiny based night vehicle and pedestrian detection method of claim 3, wherein the improved YOLOv4-tiny based night vehicle and pedestrian target detection network model is constructed by the following steps:
(1.1) adding a 52 x 52 scale feature in a YOLOv4-tiny feature fusion network, fusing deep semantic information from bottom to top, and correspondingly adding a detection head to the newly fused feature layer; in order to fully fuse the feature information of different scales, a top-down sampling path is added in the YOLOv4-tiny feature fusion network to obtain a new feature fusion structure;
(1.2) introducing an SPP pyramid pooling model module before the last layer of convolution of the YOLOv4-tiny feature extraction network, extracting and fusing the features of different receptive fields, and improving the extraction of fuzzy and shielding image features; then adopting four pooling windows of 1 × 1, 5 × 5, 9 × 9 and 13 × 13 to perform maximum pooling on the feature map and then splicing, and fusing global and local features;
(1.3) optimizing a loss function of the YOLOv4-tiny algorithm model by adopting a Focal loss function, and adjusting the weight of positive and negative samples on the total loss;
(1.4) replacing the Leaky ReLU activation function in the YOLOv4-tiny algorithm model with the SiLU activation function.
6. The improved YOLOv4-tiny-based night vehicle and pedestrian detection method according to claim 5, wherein in the step (1.1), a new 52 × 52 scale feature is added to the YOLOv4-tiny feature fusion network, the deep semantic information is fused from bottom to top, and a YOLO detection head of size 52 × 52 is correspondingly added to the newly fused feature layer, the specific process being as follows: the output feature map of the SPP pyramid pooling model module is input into a CBS convolutional layer; the CBS convolutional layer outputs its feature map to the second upsampling layer and the fourth fusion layer; the second upsampling layer outputs its feature map to the second fusion layer; the second fusion layer outputs its feature map to the first upsampling layer and the third fusion layer; the first upsampling layer outputs its feature map to the first fusion layer, and the output feature map is input into the YOLO detection head of size 52 × 52; the CBS convolutional layer consists of three network layers, namely a convolutional layer, a normalization layer and an activation function.
7. The improved YOLOv4-tiny based night vehicle and pedestrian detection method of claim 6, wherein: the SPP pyramid pooling model module comprises 4 parallel maximum pooling layers (MaxPool) with pooling kernel sizes of 13 × 13, 9 × 9, 5 × 5 and 1 × 1; the SPP pyramid pooling model module is used for enlarging the receptive field of the output feature map of size 13 × 13 in the feature extraction network; different pooling kernels yield feature maps with different receptive fields; the feature map is max-pooled by the four pooling windows of 1 × 1, 5 × 5, 9 × 9 and 13 × 13 and then spliced along the channel dimension, fusing global and local features and enriching the feature representation capability, to obtain the output feature map of the SPP pyramid pooling model module.
8. The improved YOLOv4-tiny based night time vehicle and pedestrian detection method of claim 5, wherein: in the step (1.1), the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure obtains semantic information and resolution information by performing feature fusion on three scale features of 13 × 13, 26 × 26 and 52 × 52, and generates three YOLO detection heads of 13 × 13, 26 × 26 and 52 × 52 with different scales; the 13 × 13, 26 × 26 and 52 × 52 YOLO detection heads with different scales are respectively used for improving three fusion feature maps with the sizes of 13 × 13, 26 × 26 and 52 × 52 of the feature fusion module, and the whole detection process is completed.
9. The improved YOLOv4-tiny based night time vehicle and pedestrian detection method of claim 5, wherein: the expression of the Focal loss function in the step (1.3) is as follows:
L_focal(p_t) = -α_t · (1 - p_t)^γ · log(p_t)    (1)
p_t = p, if y = 1;  p_t = 1 - p, if y = 0    (2)
α_t = α, if y = 1;  α_t = 1 - α, if y = 0    (3)
In the above formulas (1)-(3), y = 1 denotes a positive sample and y = 0 a negative sample; p ∈ [0,1] is the predicted probability that the sample is positive; α ∈ [0,1] is a sample weighting factor used to balance the numbers of positive and negative samples; γ ≥ 0 is a hard/easy-sample focusing factor used to attenuate the loss contribution of easily classified samples, so that the algorithm focuses more on hard-to-classify samples.
10. The improved YOLOv4-tiny based night time vehicle and pedestrian detection method of claim 5, wherein: the expression of the SiLU activation function in the step (1.4) is as follows:
SiLU(x) = x · sigmoid(x) = x / (1 + e^(-x))    (4)
The SiLU activation function is continuous, bounded below, non-monotonic, and continuously differentiable; it tends to 0 as x approaches negative infinity and diverges as x approaches positive infinity.
11. The improved YOLOv4-tiny based night vehicle and pedestrian detection method of claim 1, wherein: the night vehicle and pedestrian detection network model based on the improved YOLOv4-tiny algorithm structure is an improved model based on the improved YOLOv4-tiny algorithm structure.
12. The improved YOLOv4-tiny based night vehicle and pedestrian detection method of claim 1, wherein: the night vehicle and pedestrian data set consists of the night vehicle and pedestrian pictures in the NightOwls data set, 12000 pictures in total, with different postures and different degrees of occlusion.
CN202211631655.XA 2022-12-19 2022-12-19 Night vehicle and pedestrian detection method based on improved YOLOv4-tiny Pending CN115861971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211631655.XA CN115861971A (en) 2022-12-19 2022-12-19 Night vehicle and pedestrian detection method based on improved YOLOv4-tiny


Publications (1)

Publication Number Publication Date
CN115861971A true CN115861971A (en) 2023-03-28

Family

ID=85674068


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114283469A (en) * 2021-12-14 2022-04-05 贵州大学 Lightweight target detection method and system based on improved YOLOv4-tiny
CN114495029A (en) * 2022-01-24 2022-05-13 中国矿业大学 Traffic target detection method and system based on improved YOLOv4
CN114565597A (en) * 2022-03-04 2022-05-31 昆明理工大学 Nighttime road pedestrian detection method based on YOLOv3-tiny-DB and transfer learning
CN115019340A (en) * 2022-05-11 2022-09-06 成都理工大学 Night pedestrian detection algorithm based on deep learning
CN115116030A (en) * 2022-06-13 2022-09-27 中国第一汽车股份有限公司 Traffic signboard recognition method and device, storage medium and electronic device
CN115294380A (en) * 2022-01-05 2022-11-04 邵阳学院 Dynamic training method for deep learning target detection


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Cao Xiaoxi et al., "Real-time mask-wearing detection algorithm based on improved YOLOv4-tiny", Journal of Heilongjiang University of Technology, pages 64 - 71 *
Yan Yousan, "Deep Learning Face Image Processing: Core Algorithms and Case Practice", pages 221 - 225 *
Zhao Jiaqi et al., "An improved YOLOv4-tiny vehicle target detection method", Electronic Products World, pages 39 - 43 *

Similar Documents

Publication Publication Date Title
CN106680281B (en) A kind of exhaust gas from diesel vehicle smoke intensity detection method based on depth residual error learning network
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN111127449B (en) Automatic crack detection method based on encoder-decoder
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN109101914A (en) It is a kind of based on multiple dimensioned pedestrian detection method and device
CN112633149B (en) Domain-adaptive foggy-day image target detection method and device
CN109886147A (en) A kind of more attribute detection methods of vehicle based on the study of single network multiple-task
CN109977978A (en) A kind of multi-target detection method, device and storage medium
CN110659601B (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
CN113723377A (en) Traffic sign detection method based on LD-SSD network
CN114049572A (en) Detection method for identifying small target
CN115984543A (en) Target detection algorithm based on infrared and visible light images
CN112434723A (en) Day/night image classification and object detection method based on attention network
CN116740516A (en) Target detection method and system based on multi-scale fusion feature extraction
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN110751005B (en) Pedestrian detection method integrating depth perception features and kernel extreme learning machine
CN114550023A (en) Traffic target static information extraction device
CN118015490A (en) Unmanned aerial vehicle aerial image small target detection method, system and electronic equipment
CN111612803A (en) Vehicle image semantic segmentation method based on image definition
CN116580324A (en) Yolov 5-based unmanned aerial vehicle ground target detection method
Zhao et al. Recognition and Classification of Concrete Cracks under Strong Interference Based on Convolutional Neural Network.
CN113947774B (en) Lightweight vehicle target detection system
CN115861971A (en) Night vehicle and pedestrian detection method based on improved YOLOv4-tiny
CN115100428A (en) Target detection method using context sensing
CN115909192A (en) Pedestrian detection method based on improved EfficientDet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230328