CN113553979B - Safety clothing detection method and system based on improved YOLO V5 - Google Patents


Info

Publication number
CN113553979B
CN113553979B (application CN202110871211.2A)
Authority
CN
China
Prior art keywords
safety clothing
yolo
model
detection
safety
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110871211.2A
Other languages
Chinese (zh)
Other versions
CN113553979A (en)
Inventor
于俊清
张培基
陈刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guodian Hanchuan Power Generation Co ltd
Huazhong University of Science and Technology
Original Assignee
Guodian Hanchuan Power Generation Co ltd
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guodian Hanchuan Power Generation Co ltd, Huazhong University of Science and Technology filed Critical Guodian Hanchuan Power Generation Co ltd
Priority to CN202110871211.2A priority Critical patent/CN113553979B/en
Publication of CN113553979A publication Critical patent/CN113553979A/en
Application granted granted Critical
Publication of CN113553979B publication Critical patent/CN113553979B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing


Abstract

The invention discloses a safety clothing detection method and system based on improved YOLO V5, belonging to the field of target detection. The method comprises: training the improved YOLO V5 on a safety-clothing wearing-state training set, in which each training sample is a picture frame containing a worker and the label is the worker's safety-clothing wearing state, to obtain a trained detection model; and inputting each frame of an industrial monitoring video into the trained detection model to obtain a safety clothing detection result. The invention replaces the Backbone module of the original YOLO V5 algorithm with different neural network structures. With EfficientNet as the Backbone, the width and depth of the network and the resolution of the input image are uniformly scaled through a compound model expansion coefficient, which outperforms manually tuning YOLO V5 parameters. With ResNet50 as the Backbone, the added residual blocks let the network pass extracted feature information intact to the next layer, effectively eliminating gradient dispersion between network layers during forward propagation. With ShuffleNet or MobileNet as the Backbone, the complexity of the network structure and the model volume are reduced, yielding a lightweight model.

Description

Safety clothing detection method and system based on improved YOLO V5
Technical Field
The invention belongs to the field of target detection, and in particular relates to a safety clothing detection method and system based on improved YOLO V5.
Background
In industrial production, safety is critical, and awareness of safe production runs deep. Safety clothing is protective clothing that workers must wear in production operation areas; wearing it properly protects the body and reduces harm to the skin from dangerous chemicals such as acids and alkalis. Detecting in real time, from industrial monitoring video, whether staff are wearing safety clothing is an important guarantee of their safety in industrial scenes, and matters for standardized industrial management and safe production. At present, most industrial management units rely on manual supervision, visually inspecting whether workers entering and leaving the production operation area wear safety clothing; this primitive inspection and supervision mode is inefficient.
Some online detection systems already use target detection algorithms to check whether staff wear safety clothing, but most of their safety-clothing samples are reflective vests collected by web crawling or photographed in daily-life scenes; samples from real industrial scenes are very rare. Such systems can detect and recognize large and medium safety-clothing targets at close range, but recognizing safety clothing occluded by illumination changes and shadows against the complex background of a real industrial scene remains difficult. Improving the detection accuracy for safety-clothing targets in industrial monitoring video is therefore very important.
Current detection methods mainly use computer vision, working on images or video of the work site recorded by cameras. One vision-based approach uses instance segmentation to divide the detected human body into head, upper torso and lower torso, extracts HOG features from the three parts, and classifies with a support vector machine whether a worker wears safety clothing. In recent years, as graphics-processor resources have grown and deep learning research has advanced, deep-learning target detection algorithms have been widely used in computer vision tasks. For example, an improved YOLOv3 algorithm has been used for safety clothing detection, completing multi-scale detection by enlarging the original YOLOv3 input size and achieving higher accuracy at different test resolutions. Another line of work, addressing the feature diversity caused by differences in appearance and material between garments, uses YOLOv4 to detect upper-body clothing, lower-body clothing and carried articles, classifies the objects in detail with transfer learning, and proposes a YOLOv4 two-stage detection algorithm.
There is much research on safety clothing detection, but problems remain, mainly in the following points:
1. the target detection techniques adopted by existing solutions lag behind. Most researchers build on the YOLOv3 and YOLOv4 algorithm models, and research specifically using the newer YOLOv5 algorithm is still rare;
2. most safety-clothing objects in existing research come from reflective vests collected by web crawling or photographed in daily-life scenes; a safety-clothing target detection method for real industrial scenes is lacking;
3. the models in existing research detect safety-clothing targets in real industrial scenes poorly. The background of a real industrial scene is complex, and the target objects are easily disturbed by illumination changes, multiple occlusions, loss of the monitoring picture, motion blur and other problems, placing high demands on model robustness.
Disclosure of Invention
Aiming at the defects and improvement demands of the prior art, the invention provides a safety clothing detection method and system based on improved YOLO V5, and aims to substantially improve the detection accuracy and robustness for safety-clothing targets while keeping the improved algorithm model fast enough for real-time detection.
To achieve the above object, according to a first aspect of the present invention, there is provided a safety wear detection method based on improved YOLO V5, the method comprising:
training the improved YOLO V5 with a safety-clothing wearing-state training set, wherein the training samples are picture frames containing workers and the labels are the workers' safety-clothing wearing states, to obtain a trained detection model; inputting each frame of the industrial monitoring video into the trained detection model to obtain a safety clothing detection result;
the modified YOLO V5 includes Input, modification Backbone, neck and Prediction in series, the modified Backbone being EfficientNet, resNet, shuffleNet or MobileNet.
Preferably, the safety wear state training set is constructed in the following manner:
extracting a picture frame containing a worker from a real industrial scene monitoring video;
marking the workers in each picture frame according to their safety-clothing wearing state, the marked content comprising: whether safety clothing is worn and, if so, its color, to obtain a safety-clothing wearing-state detection data set and a safety-clothing wearing-color detection data set.
Preferably, the picture frames are extracted as follows: the first frame read from the monitoring video stream is set as the background frame, static objects are taken as background, and a background modeling algorithm extracts moving target objects; the difference between each subsequent frame and the current background frame is calculated, and if it exceeds a threshold T the background frame is updated; otherwise the moving contour area of the moving target object is calculated, and if it exceeds a threshold T' the image frame is saved; otherwise the next frame is read, continuing until the video ends.
The beneficial effects are that: the invention provides a key frame extraction algorithm, which converts pedestrian detection in the monitored video interested fragment into detection of a moving target object in a video picture, automatically selects the monitored video data, and greatly improves the data preprocessing efficiency.
Preferably, LabelImg software is used to annotate how workers wear safety clothing in the safety-clothing wearing-state training set: correctly worn safety clothing is marked 'safe Cloth', no safety clothing is marked 'No_safe Cloth', green safety clothing is marked 'Green', white safety clothing is marked 'White', orange safety clothing is marked 'Orange', and a worker who faces the camera wearing safety clothing whose jacket is unbuttoned or unzipped is marked 'SC_Unzip' as non-standard wearing.
Preferably, Random Erase is used at the Input stage to randomly occlude the picture frame: occlusion blocks generated at random within the target object region occlude the target in the image to varying degrees, but never completely.
The beneficial effects are that: because the target area of the safety suit is larger, the safety suit is easy to be shielded by various conditions in the industrial environment. According to the invention, the Random occlusion of the image is simulated by a Random Erase data enhancement mode, multiple occlusions in the industrial environment are selected in the training process, random erasure is selected to expand the data of the positive sample in the safety clothing target detection data set, and the robustness of the algorithm model is improved.
Preferably, GridMask data enhancement is used at the Input stage to randomly occlude the image frames: a Mask with the same resolution as the original image is generated, and the processed image is obtained by multiplication.
The beneficial effects are that: because the safety-clothing target region is large, it is easily occluded in various ways in the industrial environment. GridMask's random occlusion of the image simulates the multiple occlusions of the industrial environment, and the area and density of the occluded region can be controlled through the five parameters x, y, w, h and I. By uniformly occluding regions of the original image and discarding part of its information, the model learns more of the target object's different components, improving the training effect.
To achieve the above object, according to a second aspect of the present invention, there is provided a safety wear detection system based on improved YOLO V5, comprising: a computer readable storage medium and a processor;
the computer-readable storage medium is for storing executable instructions;
the processor is configured to read executable instructions stored in the computer readable storage medium and execute the improved YOLO V5-based security garment detection method according to the first aspect.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
according to the invention, an EfficientNet neural network structure is adopted as a Backbone, and the width and depth of the network structure and the resolution of an input image are uniformly scaled through a composite model expansion coefficient, so that an effect superior to that of manual parameter adjustment of YOLO V5 is obtained. The ResNet50 neural network structure is adopted as a Backbone, and the problem of gradient loss caused by the fact that weights of different convolution network layers cannot be updated in time in the feature transfer process in the YOLO V5 model deep expansion process is optimized due to the fact that a residual block is added. The characteristic information extracted from the network to the input image is completely reserved to the next layer in a jump layer connection mode, so that the gradient dispersion problem between the YOLO V5 network layers can be effectively eliminated in the forward propagation process. By adopting the SheffeNet or MobileNet neural network structure as the backbond, the complexity and the model volume of the YOLO V5 network structure can be reduced, and a lightweight model is realized, so that the requirements of actual industrial detection service requirements on algorithm model precision, speed and model deployment are met.
Drawings
FIG. 1 is a flow chart of a method for detecting safety clothing based on improved YOLO V5;
FIG. 2 is a schematic view of a wearing state and a color detection sample of the safety suit according to the present invention;
FIG. 3 is a schematic diagram of the network structure of the YOLOv5 algorithm before optimization;
FIG. 4 is a schematic representation of the expansion coefficient of the composite model provided by the present invention;
FIG. 5 is a schematic diagram of an optimized YOLOv5+ResNet-50 network provided by the present invention;
FIG. 6 is a schematic diagram of the two-branch channel feature information exchange based on ShuffleNet channel separation used in the present invention, wherein (a) adds the Channel Split operation and (b) does not perform Channel Split;
FIG. 7 is a schematic diagram of an inverse residual network architecture with linear bottlenecks based on MobileNet used in the present invention;
FIG. 8 is a schematic diagram of the Random Erase data enhancement used in the present invention;
FIG. 9 is a schematic diagram of Grid Mask data enhancement used in the present invention;
FIG. 10 is a schematic diagram showing the comparison of the detection accuracy of the YOLOv5 algorithm to the industrial safety suit data set before and after the optimization of the present invention;
FIG. 11 is a schematic diagram showing the comparison of the detection results of the YOLOv5 algorithm to the industrial safety suit dataset before and after the optimization of the present invention;
FIG. 12 is a graph showing the comparison of target detection results of the YOLOv5 algorithm before and after optimization of the present invention, for irregular wearing of safety clothing in an industrial safety clothing data set.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
As shown in fig. 1, the invention provides a safety clothing detection method based on improved YOLO V5, which comprises the following steps:
step S1, collecting monitoring video data in a real industrial scene, judging whether the labeling content of a dataset needs to be updated or not by analyzing the collected data, if so, expanding the labeling of the image content of the dataset, and turning to step S2; otherwise, the operation ends.
And S2, updating an industrial safety clothing data set, and marking the state of wearing the safety clothing by workers in the data set image and the color condition of wearing the safety clothing by using a LabelImg marking tool.
In step S2, preprocessing the industrial monitoring video data comprises: defining the image background, screening the video data, and labeling and classifying the images. The specific preprocessing method of the invention is as follows:
s21, acquiring monitoring video data in a real industrial scene, judging whether the content of the data set needs to be updated or not, if not, ending the operation, and if so, performing the next operation.
S22, defining an image without staff in a monitoring video picture as a detection background, preprocessing the acquired industrial monitoring video data, screening background image frames in the monitoring video data, and reserving the image with staff in the monitoring picture.
S23, LabelImg software is used to annotate how workers wear safety clothing in the safety clothing data set: correctly worn safety clothing is marked 'safe Cloth', no safety clothing is marked 'No_safe Cloth', green safety clothing is marked 'Green', white safety clothing is marked 'White', orange safety clothing is marked 'Orange', and a worker who faces the camera wearing safety clothing whose jacket is unbuttoned or unzipped is marked 'SC_Unzip' as non-standard wearing, as shown in FIG. 2.
And S3, a k-means clustering algorithm is used to re-cluster the data set annotated in step S2, obtaining the number and sizes of Anchors for safety-clothing target detection frames in monitoring video of the real industrial scene, and the number and sizes of the Anchors of the YOLO V5 neural network are adjusted accordingly.
In the step S3, the specific method for re-clustering analysis of the industrial safety service data set by using the k-means clustering algorithm is as follows:
s31, clustering analysis is carried out on original pictures of a data set by adopting a k-means clustering algorithm based on a security clothing data set in an actual industrial scene with the marked S2, so that the number and the size of the novel security applicable to security clothing in the industrial scene are obtained, and 9 groups of security sizes are respectively: (12×57), (18×84), (24×114), (31×144), (38×183), (51×219), (61×291), (75×366), (99×373), as shown in fig. 3, the YOLO V5 neural network structure prediction output is composed of three detection heads, each corresponding to a set of Anchor parameter values, for an input image size of 640×640, # p3/8 network layer detection head scale 80×80 for detecting smaller targets of 8×8, #p4/16 network layer detection head scale 40×40 for detecting medium targets of 16×16, and #p5/32 network layer detection head scale 20×20 for detecting larger targets of 32×32; the matching rule of the YOLO V5 detection head and the Anchor is that the first # P3/8 network layer detection head is matched with a first group of Anchor [12,57,18,84,24,114], the second # P4/16 network layer detection head is matched with a second group of Anchor [31,144,38,183,51,219], and the third # P5/32 network layer detection head is matched with a third group of Anchor [61,291,75,366,99,373 ].
S32, the Anchors generated by the original YOLO V5 network from clustering on the COCO data set are replaced; the original 9 groups, (14×27), (23×46), (28×130), (39×148), (52×186), (62×279), (85×237), (88×360), (145×514), are not suited to safety-clothing target detection in industrial scenes. The new Anchors are applied in training the YOLO V5 network model, and the number and sizes of Anchors in the YOLO V5 network configuration file are corrected to the parameters obtained by the k-means clustering algorithm.
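The S31/S32 clustering step can be sketched as plain Euclidean k-means over labelled box sizes. Note that YOLO V5's own autoanchor uses an IoU/ratio-based metric rather than Euclidean distance, so this is only an illustration of the idea, with names of our choosing:

```python
import numpy as np

def kmeans_anchors(boxes_wh, k=9, iters=50, seed=0):
    """Plain Euclidean k-means over labelled-box (width, height) pairs.
    Returns k anchors sorted by area and, when k == 9, a (3, 3, 2) view
    splitting them across the three detection heads (P3/8, P4/16, P5/32),
    small anchors to the small-target head."""
    rng = np.random.default_rng(seed)
    boxes = np.asarray(boxes_wh, dtype=np.float64)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # distance of every box to every center, then reassign and update
        d = np.linalg.norm(boxes[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = boxes[labels == j].mean(axis=0)
    anchors = centers[np.argsort(centers.prod(axis=1))]  # sort by area
    heads = anchors.reshape(3, 3, 2) if k == 9 else None
    return anchors, heads
```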
And S4, the neural network structure of the Backbone module in the YOLO V5 algorithm model is optimized: the original network structure in the Backbone module is replaced by the deep neural network models EfficientNet and ResNet50 or the lightweight neural network models ShuffleNet and MobileNet, yielding a YOLO V5-based model fusion algorithm with several combinations such as YOLO V5+EfficientNet B8, YOLO V5+ResNet50, YOLO V5+ShuffleNet V2 and YOLO V5+MobileNet V3.
In step S4, the specific method for optimizing the neural network structure of the Backbone module in the YOLO V5 algorithm model is as follows:
s41, considering that a network structure of a Backbone module in a YOLO V5 algorithm is replaced by using a deep neural network model EfficientNet, the original YOLO V5 algorithm is scaled by manually adjusting the depth and the width of a neural network to obtain four models of YOLO V5S, YOLO V5m, YOLO V5l and YOLO V5x, as shown in FIG. 4, a method of using a composite model expansion coefficient in the EfficientNet is used, on the basis of baseine, the scaling of the YOLO V5 model is controlled by adjusting the depth (depth, d), the width (width, w) and the resolution (resolution, r) of the network, wherein the deeper the network is the number of layers of the network structure, the more the network width is the channel number (channel) of the network structure is, the stronger the resolution of an input image is the input image, the lower the resolution is the input image, and the lower the information loss in the image is the image; and a mode of manually adjusting network parameters is replaced by the composite model expansion coefficient to obtain a better model scaling result, and a YOLOV5+EfficientB 8 algorithm combination is provided.
S42, after YOLO V5 is fused with a deep neural network model, the training error no longer falls as the network layers deepen, and the lack of a residual structure in the network makes training harder to fit; the network structure of the Backbone module in the YOLO V5 algorithm is therefore replaced with a residual network structure, and, as shown in FIG. 5, the YOLO V5+ResNet50 algorithm model combination is proposed.
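The residual ("skip") connection that S42 relies on can be shown in a toy forward pass, with dense matrix products standing in for the real 1x1/3x3 convolutions; all names here are illustrative:

```python
import numpy as np

def residual_block(x, w1, w2, act=lambda z: np.maximum(z, 0)):
    """Minimal residual connection as used in ResNet-50: the output is
    F(x) + x, so the input features are carried intact to the next layer
    even when the learned transform F contributes little; this is the
    property that keeps gradients flowing through a deeper backbone."""
    f = act(x @ w1) @ w2  # F(x): two toy "conv" layers
    return act(f + x)     # identity shortcut added before the activation
```

With zero weights F(x) vanishes and the block reduces to the identity on non-negative inputs, which is exactly why deepening with residual blocks cannot hurt the representation.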
S43, considering the accuracy, speed and model-deployment usability demanded by actual industrial detection tasks, the network structure of the Backbone module in the YOLO V5 algorithm is replaced with the lightweight neural network model ShuffleNet; as shown in FIG. 6, a two-branch channel feature information exchange strategy with Channel Split is applied to the original YOLO V5 network, and the YOLO V5+ShuffleNet V2 algorithm model combination is proposed.
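The Channel Split and channel-shuffle operations of ShuffleNet V2 referenced in FIG. 6 reduce to simple array manipulations; this NumPy sketch works on (N, C, ...) tensors:

```python
import numpy as np

def channel_split(x):
    """ShuffleNet V2 Channel Split: halve the channels into an identity
    branch and a transform branch (fig. 6(a))."""
    c = x.shape[1] // 2
    return x[:, :c], x[:, c:]

def channel_shuffle(x, groups=2):
    """Channel shuffle: reshape (N, g, C//g, ...), swap the group axes,
    and flatten back, interleaving the branches' channels so information
    is exchanged between them after concatenation."""
    n, c = x.shape[:2]
    rest = x.shape[2:]
    y = x.reshape(n, groups, c // groups, *rest)
    y = y.transpose(0, 2, 1, *range(3, y.ndim))
    return y.reshape(n, c, *rest)
```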
S44, the network structure of the Backbone module in the YOLO V5 algorithm is replaced with the lightweight neural network model MobileNet: as shown in FIG. 7, an inverted residual network structure with a linear bottleneck is added to the original YOLO V5 Backbone module network, and the YOLO V5+MobileNet V3 algorithm model combination is proposed.
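The inverted residual block with a linear bottleneck referenced in FIG. 7 can likewise be sketched with dense matmuls; the real MobileNet V3 block inserts a depthwise convolution and squeeze-and-excitation between the expansion and projection, both omitted here, and all names are illustrative:

```python
import numpy as np

def inverted_residual(x, w_expand, w_project,
                      relu6=lambda z: np.clip(z, 0, 6)):
    """MobileNet-style inverted residual with a linear bottleneck:
    expand channels, apply the nonlinearity, project back WITHOUT a
    final activation (the 'linear' bottleneck), and add the identity
    shortcut when input and output shapes match."""
    h = relu6(x @ w_expand)  # expand to a wider representation
    out = h @ w_project      # linear bottleneck: no activation here
    if out.shape == x.shape:
        out = out + x        # identity shortcut, as in fig. 7
    return out
```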
And S5, training an optimized YOLO V5 model fusion algorithm to obtain a safety clothing detection algorithm model oriented to the industrial monitoring video.
In step S5, training the improved YOLO V5 model fusion algorithm to obtain a security clothing detection algorithm model for the industrial monitoring video, wherein the specific contents of the security clothing detection algorithm model are as follows:
s51, downloading configuration files of the pre-training model and the network structure from the YOLO V5 functional network, wherein the configuration files comprise default hyper-parameters and related weight values trained by the YOLO V5 model, and loading the default hyper-parameters and the related weight values into the improved YOLO V5 model fusion neural network.
S52, the description of the neural network structure in the YOLO V5 network configuration file is adjusted according to the optimization method of S4. To obtain the best effect of the YOLO V5 algorithm on the safety clothing data set before and after optimization, the data set is expanded during training with Random Erase and Grid Mask data enhancement. Random Erase can operate in two modes, Image-aware Random Erasing (IRE), which perceives the image background, and Object-aware Random Erasing (ORE), which perceives the target object; Grid Mask data enhancement, as shown in FIGS. 8 and 9, controls the area and density of the occluded region by adjusting the five parameters x, y, w, h and I. Training continues until the model converges.
S53, as shown in FIG. 10, the detection accuracy of the YOLO V5 algorithm before and after optimization is tested on the wearing state and color of safety clothing in the industrial safety clothing data set. As shown in FIG. 11, the pre-improvement algorithm shows false and missed detections, while the optimized YOLO V5 model fusion algorithm detects safety clothing disturbed by illumination changes, occlusion and other factors in the complex background of a real industrial scene with higher accuracy and robustness. The detection accuracy before and after optimization is also tested on workers facing the camera with non-standard safety-clothing wearing, giving the results shown in FIG. 12; the target objects obtain correct detection results under illumination changes, shadows and similar conditions.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (4)

1. A method for detecting safety clothing based on improved YOLO V5, comprising:
training the improved YOLO V5 with a safety-clothing wearing-state training set, wherein the training samples are picture frames containing workers and the labels are the workers' safety-clothing wearing states, to obtain a trained detection model; inputting each frame of the industrial monitoring video into the trained detection model to obtain a safety clothing detection result;
the modified YOLO V5 comprises Input, modification Backbone, neck and Prediction in series, the modified backbond being EfficientNet, resNet or ShuffleNet; when the improved Backbone is EfficientNet, uniformly scaling the width and depth of the network structure and the resolution of the input image through the expansion coefficient of the composite model; when the improved Backbone is ResNet50, optimizing gradient loss caused by the fact that weights of different convolution network layers cannot be updated in time in a feature transmission process in a YOLO V5 model depth expansion process through a residual error module, enabling feature information extracted from an input image by a network to be completely reserved to the next layer in a jump layer connection mode, and eliminating gradient dispersion problems among YOLO V5 network layers in a forward propagation process; when the improved Backbone is a ShuffleNet, the complexity and the model volume of a YOLO V5 network structure are reduced, and a lightweight model is realized;
the training set of the wearing state of the safety clothing is constructed by adopting the following modes:
extracting a picture frame containing a worker from a real industrial scene monitoring video;
marking the workers in each picture frame according to their safety clothing wearing state, wherein the marking content comprises whether safety clothing is worn and the color of the safety clothing worn, thereby obtaining a safety clothing wearing state detection dataset and a safety clothing wearing color detection dataset;
wherein the picture frames are extracted as follows: the first frame read from the monitoring video stream is set as the background frame, static objects are taken as background, and moving target objects are extracted with a background modeling algorithm; the difference between each subsequent frame and the current background frame is calculated, and if the difference is greater than a threshold T the background frame is updated, otherwise reading continues until the video ends; if the difference is smaller than the threshold T, the moving contour area of the moving target object is calculated, and if it is greater than a threshold T' the picture frame is saved, otherwise the next frame is read;
marking the workers' safety clothing wearing in the safety clothing wearing state training set using LabelImg software: safety clothing worn correctly is labeled 'safe Cloth', no safety clothing worn is labeled 'No_safe Cloth', green safety clothing is labeled 'Green', white safety clothing is labeled 'White', orange safety clothing is labeled 'Orange', and a worker facing the camera with safety clothing worn but the jacket unfastened or unzipped is labeled 'SC_Unzip', indicating non-standard safety clothing wearing;
expanding the dataset during detection model training with Random Erase and GridMask data enhancement, wherein Random Erase is divided into background-aware random erasure and target-object-aware random erasure, and GridMask controls the area and density of the occluded regions by adjusting the five parameters x, y, w, h, and I, until the detection model converges;
and combining the EfficientNet-improved YOLO V5, the ResNet50-improved YOLO V5, and the ShuffleNet-improved YOLO V5 into a YOLO V5-based model fusion algorithm, wherein the trained model fusion algorithm improves detection accuracy and robustness for safety clothing disturbed by complications against the complex backgrounds of real industrial scenes, while also achieving high detection accuracy for workers facing the camera whose safety clothing is worn non-standardly; the complications include illumination changes, occlusion, monitor picture loss, and motion blur.
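The frame-extraction rule recited in claim 1 (a background frame, a threshold T on the frame difference, and a threshold T' on the moving-contour area) can be sketched as follows. This is an illustrative sketch only: the concrete threshold values, the mean-absolute-difference measure, and the pixel-count approximation of the contour area are assumptions not taken from the patent.

```python
import numpy as np

# Hypothetical thresholds; the patent does not give concrete values.
T = 30.0        # background-update threshold on the mean frame difference
T_PRIME = 500   # minimum moving-contour area (in pixels) to keep a frame

def extract_key_frames(frames, t=T, t_prime=T_PRIME):
    """Sketch of the claimed extraction rule: the first frame is the
    background; a large difference refreshes the background frame, while a
    smaller one keeps the frame only if the moving region is big enough."""
    if not frames:
        return []
    background = frames[0].astype(np.float32)
    kept = []
    for frame in frames[1:]:
        f = frame.astype(np.float32)
        diff = np.abs(f - background)
        if diff.mean() > t:
            background = f          # scene changed: update the background frame
            continue
        # moving-contour area approximated as the count of changed pixels
        moving_area = int((diff > 25).sum())
        if moving_area > t_prime:
            kept.append(frame)      # a worker-sized moving object: keep frame
    return kept
```

In a real pipeline the contour area would come from a background-subtraction and contour-finding step rather than a raw pixel count; the control flow above mirrors only the threshold logic of the claim.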
2. The method of claim 1, wherein the picture frames are randomly occluded in the Input part by Random Erase data enhancement: random occlusion blocks generated in the target object region occlude the target object in the image to varying degrees, but never completely.
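The target-aware Random Erase of claim 2 can be sketched as erasing a random sub-rectangle inside the target's bounding box while capping its size so the target is never fully occluded. The size cap, the fill value, and the box format (x0, y0, x1, y1) are illustrative assumptions, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_erase(img, box, max_frac=0.5):
    """Occlude a random sub-rectangle of the target's bounding box.
    max_frac caps each erased side at a fraction of the box side, so the
    target is occluded 'to varying degrees, but never completely'."""
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0, y1 - y0
    ew = int(rng.integers(1, max(2, int(bw * max_frac))))  # erase width  < box width
    eh = int(rng.integers(1, max(2, int(bh * max_frac))))  # erase height < box height
    ex = int(rng.integers(x0, x1 - ew + 1))                # keep block inside box
    ey = int(rng.integers(y0, y1 - eh + 1))
    out = img.copy()
    out[ey:ey + eh, ex:ex + ew] = 127   # constant fill; random fill is equally valid
    return out
```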
3. The method according to claim 1 or 2, wherein the picture frames are randomly occluded in the Input part by GridMask data enhancement: a Mask with the same resolution as the original image is generated, and the processed image is obtained by element-wise multiplication.
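A minimal sketch of the GridMask-style occlusion of claim 3: a binary mask at the image resolution is generated and applied by element-wise multiplication. The grid period, occluded fraction, and offsets below are illustrative, and their mapping onto the patent's five parameters x, y, w, h, and I is an assumption on our part.

```python
import numpy as np

def grid_mask(height, width, d, ratio=0.4, ox=0, oy=0):
    """Binary mask the same size as the image; zeros mark occluded cells.
    d is the grid period, ratio the occluded fraction of each cell's side,
    and (ox, oy) the grid offset."""
    mask = np.ones((height, width), dtype=np.float32)
    block = int(d * ratio)                     # side length of each occluded square
    for top in range(-d + oy, height, d):
        for left in range(-d + ox, width, d):
            t0, l0 = max(top, 0), max(left, 0)
            t1, l1 = min(top + block, height), min(left + block, width)
            if t1 > t0 and l1 > l0:
                mask[t0:t1, l0:l1] = 0.0       # occlude this grid cell
    return mask

# As in claim 3, the processed image is an element-wise product with the mask.
img = np.random.rand(32, 32).astype(np.float32)
masked = img * grid_mask(32, 32, d=8, ratio=0.5)
```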
4. A safety clothing detection system based on improved YOLO V5, comprising: a computer readable storage medium and a processor;
the computer-readable storage medium is for storing executable instructions;
the processor is configured to read the executable instructions stored in the computer readable storage medium and execute the improved YOLO V5 based safety clothing detection method of any one of claims 1 to 3.
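The EfficientNet compound scaling referenced in claim 1, which uniformly scales network width, depth, and input resolution through one expansion coefficient, can be illustrated with a small sketch. The base coefficients (alpha=1.2, beta=1.1, gamma=1.15) are those reported for EfficientNet-B0 in the original EfficientNet paper; the base depth, width, and resolution values here are hypothetical.

```python
# EfficientNet-B0 compound-scaling coefficients (Tan & Le, 2019).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth=18, base_width=64, base_resolution=224):
    """Return (depth, width, resolution) jointly scaled by the expansion
    coefficient phi: depth ~ alpha^phi, width ~ beta^phi, resolution ~ gamma^phi."""
    depth = round(base_depth * ALPHA ** phi)
    width = round(base_width * BETA ** phi)
    resolution = round(base_resolution * GAMMA ** phi)
    return depth, width, resolution
```

With phi = 0 the base configuration is returned unchanged; raising phi grows all three dimensions together rather than scaling any single one in isolation, which is the point of the compound expansion coefficient.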
CN202110871211.2A 2021-07-30 2021-07-30 Safety clothing detection method and system based on improved YOLO V5 Active CN113553979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110871211.2A CN113553979B (en) 2021-07-30 2021-07-30 Safety clothing detection method and system based on improved YOLO V5

Publications (2)

Publication Number Publication Date
CN113553979A CN113553979A (en) 2021-10-26
CN113553979B (en) 2023-08-08

Family

ID=78104987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110871211.2A Active CN113553979B (en) 2021-07-30 2021-07-30 Safety clothing detection method and system based on improved YOLO V5

Country Status (1)

Country Link
CN (1) CN113553979B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663407A (en) * 2022-03-29 2022-06-24 天地(常州)自动化股份有限公司 Coal gangue target detection method based on improved YOLOv5s model
CN114783000B (en) * 2022-06-15 2022-10-18 成都东方天呈智能科技有限公司 Method and device for detecting dressing standard of worker in bright kitchen range scene
CN115311261A (en) * 2022-10-08 2022-11-08 石家庄铁道大学 Method and system for detecting abnormality of cotter pin of suspension device of high-speed railway contact network
CN115330759B (en) * 2022-10-12 2023-03-10 浙江霖研精密科技有限公司 Method and device for calculating distance loss based on Hausdorff distance
CN115757867A (en) * 2022-12-06 2023-03-07 天翼数字生活科技有限公司 Video information abstract generation method and device, storage medium and computer equipment
CN116681660B (en) * 2023-05-18 2024-04-19 中国长江三峡集团有限公司 Target object defect detection method and device, electronic equipment and storage medium
CN117422696A (en) * 2023-11-08 2024-01-19 河北工程大学 Belt wear state detection method based on improved YOLOv8-Efficient Net

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810711A (en) * 2014-03-03 2014-05-21 郑州日兴电子科技有限公司 Keyframe extracting method and system for monitoring system videos
CN106446926A (en) * 2016-07-12 2017-02-22 重庆大学 Transformer station worker helmet wear detection method based on video analysis
WO2018130016A1 (en) * 2017-01-10 2018-07-19 哈尔滨工业大学深圳研究生院 Parking detection method and device based on monitoring video
CN109447168A (en) * 2018-11-05 2019-03-08 江苏德劭信息科技有限公司 A kind of safety cap wearing detection method detected based on depth characteristic and video object
CN109543542A (en) * 2018-10-24 2019-03-29 杭州叙简科技股份有限公司 A kind of determination method whether particular place personnel dressing standardizes
CN109635697A (en) * 2018-12-04 2019-04-16 国网浙江省电力有限公司电力科学研究院 Electric operating personnel safety dressing detection method based on YOLOv3 target detection
CN110070033A (en) * 2019-04-19 2019-07-30 山东大学 Safety cap wearing state detection method in a kind of power domain dangerous work region
CN110852283A (en) * 2019-11-14 2020-02-28 南京工程学院 Helmet wearing detection and tracking method based on improved YOLOv3
AU2020100705A4 (en) * 2020-05-05 2020-06-18 Chang, Jiaying Miss A helmet detection method with lightweight backbone based on yolov3 network
CN111881730A (en) * 2020-06-16 2020-11-03 北京华电天仁电力控制技术有限公司 Wearing detection method for on-site safety helmet of thermal power plant
CN112287899A (en) * 2020-11-26 2021-01-29 山东捷讯通信技术有限公司 Unmanned aerial vehicle aerial image river drain detection method and system based on YOLO V5
CN112434672A (en) * 2020-12-18 2021-03-02 天津大学 Offshore human body target detection method based on improved YOLOv3
CN112541393A (en) * 2020-11-10 2021-03-23 国网浙江嵊州市供电有限公司 Transformer substation personnel detection method and device based on deep learning
KR20210042275A (en) * 2020-05-27 2021-04-19 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. A method and a device for detecting small target
CN112949633A (en) * 2021-03-05 2021-06-11 中国科学院光电技术研究所 Improved YOLOv 3-based infrared target detection method
CN113011319A (en) * 2021-03-16 2021-06-22 上海应用技术大学 Multi-scale fire target identification method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Safety helmet recognition based on intelligent video image analysis; Liu Xiwen et al.; Computer Engineering and Design; Vol. 41, No. 5; pp. 1464-1471 *

Also Published As

Publication number Publication date
CN113553979A (en) 2021-10-26

Similar Documents

Publication Publication Date Title
CN113553979B (en) Safety clothing detection method and system based on improved YOLO V5
CN110070033B (en) Method for detecting wearing state of safety helmet in dangerous working area in power field
CN106096577B (en) A kind of target tracking method in camera distribution map
CN113553977B (en) Improved YOLO V5-based safety helmet detection method and system
CN109034210A (en) Object detection method based on super Fusion Features Yu multi-Scale Pyramid network
CN104951773A (en) Real-time face recognizing and monitoring system
CN105095866A (en) Rapid behavior identification method and system
CN108648211A (en) A kind of small target detecting method, device, equipment and medium based on deep learning
CN105825233B (en) A kind of pedestrian detection method based on on-line study random fern classifier
CN112541393A (en) Transformer substation personnel detection method and device based on deep learning
CN104680193B (en) Online objective classification method and system based on quick similitude network integration algorithm
CN113158850B (en) Ship driver fatigue detection method and system based on deep learning
Geng et al. An improved helmet detection method for YOLOv3 on an unbalanced dataset
CN109614896A (en) A method of the video content semantic understanding based on recursive convolution neural network
CN105554456B (en) Method for processing video frequency and equipment
Liang et al. Methods of moving target detection and behavior recognition in intelligent vision monitoring.
CN113763424A (en) Real-time intelligent target detection method and system based on embedded platform
CN116092115A (en) Real-time lightweight construction personnel safety dressing detection method
CN104616034B (en) A kind of smog detection method
CN113205060A (en) Human body action detection method adopting circulatory neural network to judge according to bone morphology
CN112487926A (en) Scenic spot feeding behavior identification method based on space-time diagram convolutional network
CN111626197A (en) Human behavior recognition network model and recognition method
CN113822240B (en) Method and device for extracting abnormal behaviors from power field operation video data
CN115564031A (en) Detection network for glass defect detection
CN110427920B (en) Real-time pedestrian analysis method oriented to monitoring environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant