CN112464765B - Safety helmet detection method based on single-pixel characteristic amplification and application thereof - Google Patents

Safety helmet detection method based on single-pixel characteristic amplification and application thereof

Info

Publication number
CN112464765B
CN112464765B (application CN202011282208.9A)
Authority
CN
China
Prior art keywords
feature
characteristic
pixel
safety helmet
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011282208.9A
Other languages
Chinese (zh)
Other versions
CN112464765A (en)
Inventor
姜丽芬
周雍恒
孙华志
马春梅
梁妍
马建扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Normal University
Original Assignee
Tianjin Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Normal University
Publication of CN112464765A
Application granted
Publication of CN112464765B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a safety helmet detection method based on single-pixel feature amplification, and an application thereof. The detection algorithm comprises the following steps: preprocessing and augmenting the safety helmet data set; extracting a feature representation of the target with an Efficientnet-b0 network; filtering the backbone network features with a single-pixel feature scaling module to enhance the foreground elements in the features; performing multi-scale feature fusion on the enhanced features with a BiFPN feature fusion module; and feeding the fused features into a target prediction network that classifies and locates the targets. The SPZ-Det helmet-wearing detection algorithm mainly uses the SPZ module to scale the features, so that small-target features are not lost in the network, improving the algorithm's performance on small targets.

Description

Safety helmet detection method based on single-pixel characteristic amplification and application thereof
Technical Field
The invention relates to the technical field of deep learning and target detection, in particular to a safety helmet detection method based on single-pixel feature amplification and application thereof.
Background
A safety helmet is a mandatory safety precaution on construction sites. Research reports show that hundreds of construction workers in China are injured in construction accidents every year, mostly because on-site safety supervision is inadequate. Wearing a helmet is the most basic protective measure on a construction site, but workers with weak safety and self-protection awareness often remove their helmets for convenience while working, so that their lives are threatened once an accident occurs.
At present, helmet-wearing detection relies mainly on video monitoring and manual patrol, which cannot warn workers who are not wearing helmets in time and consume considerable human resources, so automatic helmet detection technology is very important. Helmet detection is a practical application of target detection. Early helmet detection compared the color distributions of the helmet and the human face to determine their relative positions, and decided from this position information whether a worker was wearing a helmet. Such color-distribution-based detection algorithms depend heavily on the color difference of the helmet and can hardly cope with environments containing many helmet types.
With the development of deep learning, deep neural networks can automatically capture finer-grained feature information, and these adaptively captured features help subsequent detection tasks predict target positions. Helmet detection methods in the deep learning era avoid dependence on a single feature, since the network adaptively acquires more precise feature information for predicting the target. General target detection algorithms fall into two categories: regression-based one-stage detectors represented by YOLO, SSD and RetinaNet; and region-based two-stage detectors such as Faster R-CNN. Some algorithms adopt the two-stage Faster R-CNN to pursue detection accuracy, but its complex computation makes detection very slow and hard to apply in real life.
In addition, helmet-wearing detection faces great challenges: the construction-site background varies widely and the scene is complex; individuals far from the camera are small and hard to distinguish from a cluttered background; and sites are crowded, so several people in one scene often occlude each other. These challenges greatly limit the performance of helmet-wearing detection algorithms.
Disclosure of Invention
The invention aims to provide a safety helmet detection method based on single-pixel feature amplification, addressing the complex detection steps, low detection speed and high recognition difficulty of helmet-wearing detection in the prior art.
In another aspect of the invention, the application of the safety helmet detection method based on single-pixel feature amplification in construction site monitoring is provided.
The technical scheme adopted for realizing the purpose of the invention is as follows:
a safety helmet detection method based on single-pixel feature amplification comprises the following steps:
step 1, preprocessing and enhancing a safety helmet data set to obtain preprocessed and enhanced sample data;
step 2, extracting a characteristic representation form of a target from the preprocessed and enhanced sample data obtained in the step 1 through an Efficientnet-b0 network to obtain a backbone network characteristic;
step 3, performing feature filtering on the backbone network features obtained in the step 2 by using a single-pixel feature scaling module, and enhancing foreground elements in the features to obtain new feature values;
step 4, performing multi-scale feature fusion operation on the new feature value obtained in the step 3 through a BiFPN feature fusion module to obtain a fused feature;
and 5, inputting the fused features obtained in the step 4 into a target prediction network, and classifying and positioning the targets.
In the above technical solution, the preprocessing in step 1 comprises the following steps:
step 1.1, expanding the safety helmet data set by horizontal flipping, so that each sample in the data set exists in both its original and mirrored form;
and 1.2, randomly inserting noise into the sample data to increase sample complexity and improve the robustness of the algorithm at the data level.
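The flip-and-noise augmentation of steps 1.1 and 1.2 can be sketched as follows; this is a minimal illustration, with the image layout, the (x1, y1, x2, y2) box format and the Gaussian noise level chosen as assumptions rather than taken from the patent:

```python
import numpy as np

def augment(image, boxes, noise_std=0.02, rng=None):
    """Horizontal flip (step 1.1) plus random noise (step 1.2).

    image: HxWxC float array in [0, 1]; boxes: Nx4 array of (x1, y1, x2, y2).
    Returns the mirrored, noised image and the mirrored boxes.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    flipped = image[:, ::-1, :]                # mirror the image left-right
    fb = boxes.astype(float).copy()
    fb[:, [0, 2]] = w - boxes[:, [2, 0]]       # mirror and swap x1, x2
    noisy = np.clip(flipped + rng.normal(0.0, noise_std, flipped.shape), 0.0, 1.0)
    return noisy, fb
```

Applying `augment` to every sample doubles the dataset while keeping each bounding box aligned with its mirrored target.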
In the above technical solution, when selecting the backbone feature layers of Efficientnet-b0 in step 2, the top three feature layers, one down-sampling layer feature and one lower-level layer feature are selected;
the backbone network features are extracted as follows:
step 2.1, among the feature maps extracted by Efficientnet-b0, for adjacent feature layers with different resolutions, two feature layers are kept, namely the low-level feature X1 and the high-level feature X2;
and 2.2, for feature layers with the same resolution, the algorithm selects only the high-level feature X2 as the feature representation for subsequent computation.
In the above technical solution, the new feature values in step 3 are obtained as follows:
a spatial attention enhancement is first applied to the backbone network features to obtain the main regions of the foreground elements, i.e. the attention-enhanced feature; within this feature the contribution of each pixel to the overall feature is computed, a feature contribution map is then derived from the contribution values, and different pixels are scaled accordingly, giving the scaled feature.
In the above technical solution, the new feature values in step 3 are obtained specifically through the following steps:
step 3.1, a simple spatial attention is applied once to the backbone feature to obtain the attention-enhanced feature F, as shown in formula (1):

F = S(v(R([max(f_i); mean(f_i)]))) ⊙ f_i   (1)

where max is max pooling, mean is average pooling, v is a 7 × 7 convolution, R and S stand for the ReLU and Sigmoid operations, ⊙ is element-wise multiplication, and f_i is the initial feature;
step 3.2, after obtaining the attention-enhanced feature F, a pixel-level feature amplification is performed on F: first the contribution value of each pixel point to the feature map is computed, then a feature contribution map is obtained from the contribution values, and the primary and secondary feature elements are scaled, specifically:

Feature_i(h, w) = n_i, if S(C_i)(h, w) > 1/(H × W)
Feature_i(h, w) = 1 - n_i, otherwise   (2)

where C_i is the feature of the i-th channel, n_i is the scaling value, f_i is the initial feature, H and W are the height and width of the feature map, and S is the Softmax function. First the Softmax score of each single-channel feature is obtained through S; the score represents the contribution of each pixel position to the channel's overall feature. The score is then compared with the channel's average single-pixel contribution 1/(H × W): if it is larger, the scaling value is set to n_i, and if it is smaller, to (1 - n_i), finally yielding the feature contribution map.
Step 3.3, the feature contribution map is multiplied element-wise with the input feature C_i to obtain the scaled feature; a residual structure is then introduced by adding the initial feature f_i, giving the new feature values used for feature fusion in the multi-scale module.
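Steps 3.1-3.3 can be sketched roughly in NumPy. Everything here is an illustrative assumption: the 7 × 7 convolution v is replaced by a parameter-free channel-pool sum, the ReLU/Sigmoid composition is simplified to a plain Sigmoid, and n = 0.7 is an arbitrary scaling value, not one taken from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spz(feature, n=0.7):
    """Rough sketch of the single-pixel feature zoom on a (C, H, W) map."""
    C, H, W = feature.shape
    # step 3.1: spatial attention from max- and mean-pooled channel maps
    pooled = np.stack([feature.max(axis=0), feature.mean(axis=0)])  # (2, H, W)
    att = sigmoid(pooled.sum(axis=0))      # stand-in for the 7x7 convolution v
    F = feature * att                      # attention-enhanced feature
    # step 3.2: per-channel Softmax gives each pixel's contribution value
    flat = F.reshape(C, -1)
    score = np.exp(flat - flat.max(axis=1, keepdims=True))
    score = score / score.sum(axis=1, keepdims=True)
    avg = 1.0 / (H * W)                    # average single-pixel contribution
    zoom = np.where(score > avg, n, 1.0 - n).reshape(C, H, W)  # contribution map
    # step 3.3: scale the enhanced feature, then add the residual initial feature
    return zoom * F + feature
```

Pixels contributing more than the uniform average are kept at weight n, the rest are suppressed to 1 - n, and the residual addition preserves the original signal.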
In the above technical solution, the multi-scale feature fusion in step 4 proceeds as follows: the new feature values enhanced by the single-pixel scaling module are passed to the BiFPN feature fusion module, which fuses features of different sizes from different levels to compensate for the information lost through downsampling.
In the BiFPN feature fusion module, a three-layer cross-link operation maintains the transfer of the original backbone features, and a control factor adjusts the proportions of the different features.
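The control factors can be read as the fast normalized fusion weights used in BiFPN; below is a hedged sketch of fusing same-resolution features with non-negative, normalized weights (the epsilon value and the plain NumPy formulation are assumptions, not the patent's implementation):

```python
import numpy as np

def weighted_fuse(features, weights, eps=1e-4):
    """Fuse a list of same-shape feature maps with learnable control factors."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # keep factors non-negative
    w = w / (w.sum() + eps)                                # normalize the proportions
    return sum(wi * fi for wi, fi in zip(w, features))
```

With equal weights this reduces to a near-average of the inputs; training would instead learn how much each level contributes.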
In the above technical solution, in step 5, the two detection networks, each a three-layer CNN, classify and locate the targets.
In the above technical solution, the method for classifying and locating the target in step 5 comprises the following steps:
step 5.1, in the classification network, a Focal Loss calculation strategy limits the large number of background elements and keeps positive and negative samples balanced;
and 5.2, in the localization regression network, the smooth L1 function is used as the loss calculation strategy, as shown in formula (3), computing the loss between the predicted position offset and the offset of the sample's real position; the real-position offset is calculated by formula (4):

smooth_L1(x) = 0.5x², if |x| < 1
smooth_L1(x) = |x| - 0.5, otherwise, with x = reg - gt   (3)

dx = (tx - ax)/aw, dy = (ty - ay)/ah, dw = log(tw/aw), dh = log(th/ah)   (4)

where gt is the converted regression offset; reg is the predicted offset of the regression sub-network; and (dx, dy, dw, dh) is the regression label, i.e. the relative position offset between the true annotation box (tx, ty, tw, th) and the anchor box (ax, ay, aw, ah).
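Formulas (3) and (4) can be sketched as follows, assuming the usual (center-x, center-y, width, height) convention for both the annotation box and the anchor box:

```python
import numpy as np

def encode_offsets(gt_box, anchor):
    """Formula (4): encode a ground-truth box (tx, ty, tw, th) relative to an
    anchor (ax, ay, aw, ah) as the regression label (dx, dy, dw, dh)."""
    tx, ty, tw, th = gt_box
    ax, ay, aw, ah = anchor
    return np.array([(tx - ax) / aw, (ty - ay) / ah,
                     np.log(tw / aw), np.log(th / ah)])

def smooth_l1(reg, gt):
    """Formula (3): smooth L1 loss between predicted and target offsets."""
    d = np.abs(reg - gt)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()  # quadratic near 0, linear far out
```

The quadratic region keeps gradients small for near-correct predictions, while the linear region limits the influence of outliers.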
In another aspect, in the application of the safety helmet detection method based on single-pixel feature amplification to construction site monitoring, a foreground terminal monitors the construction site with a camera, the camera transmits real-time footage to a background processing terminal, and the background terminal runs the safety helmet detection method based on single-pixel feature amplification for detection and analysis, returning the result to the foreground terminal to remind workers in real time.
Compared with the prior art, the invention has the beneficial effects that:
1. During preprocessing, horizontal flipping is applied to the open-source helmet detection dataset, doubling the dataset and supplementing its content; noise is also inserted randomly to increase sample complexity and give the model stronger robustness at the data level.
2. The invention adopts Efficientnet-b0 as the backbone for feature extraction. To counter the severe loss of small-target feature information, a low-level feature layer is introduced into the backbone feature layers, increasing the share of small-target features in the network. Specifically, Efficientnet originally selects the top three feature layers plus two further down-sampled layers, five in total; here a lower-level layer is added to the backbone and one down-sampling layer is dropped, still five feature layers in total, meaning one more backbone feature layer is kept and one extra down-sampled layer is removed compared with the original Efficientnet network. To ensure that small-target features still reach the fusion and detection networks, the single-pixel enhancement module controls the features so that foreground elements are not lost.
3. The features extracted by the backbone are scaled at pixel level; the enhanced features are passed to the multi-scale BiFPN fusion module for interactive fusion of upper- and lower-level features, and the detection head network (target prediction network) then performs classification and localization of the targets.
4. For problems such as complex occlusion and small-target detection, the invention proposes the SPZ-Det detection model based on context attention and single-pixel feature scaling. The model introduces detail-rich low-level features into the network, ensuring effective small-target detection and addressing mutual occlusion of personnel, small targets, and the difficulty of extracting accurate features. The single-pixel feature zoom (SPZ) module strengthens the principal information in the features, ensures it is not ignored or replaced by noise features during inference, and alleviates feature loss during propagation.
5. The selection of feature layers and the introduction of the single-pixel feature scaling module resolve the loss of feature information during network propagation; comparative experiments verify the effectiveness of the SPZ module, and the model improves detection accuracy while maintaining detection speed, reaching 94% AP for helmet-wearing detection.
Drawings
FIG. 1 is a diagram of the SPZ-Det network model;
FIG. 2 is a block diagram of the SPZ module, where M is max pooling; A is average pooling; C is concatenation; S is the Sigmoid calculation; Ghost represents the Ghost Module; Feature_i represents formula (2).
Detailed Description
The present invention will be described in further detail with reference to specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
A safety helmet detection method based on single-pixel feature amplification comprises the following steps:
step 1, preprocessing and enhancing a safety helmet data set to obtain preprocessed and enhanced sample data:
step 1.1, expanding the safety helmet data set by horizontal flipping, so that each sample in the data set exists in both its original and mirrored form;
and 1.2, randomly inserting noise into the sample data to increase sample complexity and improve the robustness of the algorithm at the data level.
Step 2, extracting a characteristic representation form of a target through an Efficientnet-b0 network from the preprocessed and enhanced sample data obtained in the step 1 to obtain the characteristics of a backbone network:
step 2.1, among the feature maps extracted by Efficientnet-b0, for adjacent feature layers with different resolutions, two feature layers are kept, namely the low-level feature X1 and the high-level feature X2;
and 2.2, for feature layers with the same resolution, the algorithm selects only the high-level feature X2 as the feature representation for subsequent computation.
And 3, performing feature filtering on the backbone network features obtained in the step 2 by using a single-pixel feature scaling module, enhancing foreground elements in the features, and obtaining a new feature value:
step 3.1, a simple spatial attention is applied once to the backbone feature to obtain the attention-enhanced feature F, as shown in formula (1):

F = S(v(R([max(f_i); mean(f_i)]))) ⊙ f_i   (1)

where max is max pooling, mean is average pooling, v is a 7 × 7 convolution, R and S stand for the ReLU and Sigmoid operations, ⊙ is element-wise multiplication, and f_i is the initial feature;
step 3.2, after obtaining the attention-enhanced feature F, a pixel-level feature amplification is performed on F: first the contribution value of each pixel point to the feature map is computed, then a feature contribution map is obtained from the contribution values, and the primary and secondary feature elements are scaled, specifically:

Feature_i(h, w) = n_i, if S(C_i)(h, w) > 1/(H × W)
Feature_i(h, w) = 1 - n_i, otherwise   (2)

where C_i is the feature of the i-th channel, n_i is the scaling value, f_i is the initial feature, H and W are the height and width of the feature map, and S is the Softmax function. First the Softmax score of each single-channel feature is obtained through S; the score represents the contribution of each pixel position to the channel's overall feature. The score is then compared with the channel's average single-pixel contribution 1/(H × W): if it is larger, the scaling value is set to n_i, and if it is smaller, to (1 - n_i), finally yielding the feature contribution map.
Step 3.3, the feature contribution map is multiplied element-wise with the input feature C_i to obtain the scaled feature; a residual structure is then introduced by adding the initial feature f_i, giving the new feature values used for feature fusion in the multi-scale module. For the residual structure, see: He K, Zhang X, Ren S, et al. Deep residual learning for image recognition // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778, which is not described in detail here.
Step 4, performing multi-scale feature fusion operation on the new feature value obtained in the step 3 through a BiFPN feature fusion module to obtain a fused feature;
and the new characteristic value enhanced by the single-pixel scaling module is transmitted to a BiFPN characteristic fusion module, and the BiFPN characteristic fusion module performs characteristic fusion on the characteristics with different sizes of different levels to compensate the information lost due to downsampling.
In the BiFPN feature fusion module, three-layer cross-link operation is used for maintaining the original features in the backbone network to be transferred, and the proportion between different features is controlled by a control factor. BiFPN feature fusion module feature fusion can refer to Tan M, Pang R, Le Q V.Efficientdet: scalable and effective object detection [ C ]// Proceedings of the IEEE/CVFConreference on Computer Vision and Pattern recognition.2020: 10781-10790.
And 5, inputting the fused features obtained in step 4 into the target prediction network, whose two detection sub-networks, each a three-layer CNN, classify and locate the targets:
step 5.1, in the classification network, a Focal Loss calculation strategy limits the large number of background elements and keeps positive and negative samples balanced;
and 5.2, in the localization regression network, the smooth L1 function is used as the loss calculation strategy, as shown in formula (3), computing the loss between the predicted position offset and the offset of the sample's real position; the real-position offset is calculated by formula (4):

smooth_L1(x) = 0.5x², if |x| < 1
smooth_L1(x) = |x| - 0.5, otherwise, with x = reg - gt   (3)

dx = (tx - ax)/aw, dy = (ty - ay)/ah, dw = log(tw/aw), dh = log(th/ah)   (4)

where gt is the converted regression offset; reg is the predicted offset of the regression sub-network; and (dx, dy, dw, dh) is the regression label, i.e. the relative position offset between the true annotation box (tx, ty, tw, th) and the anchor box (ax, ay, aw, ah).
In another aspect, in the application of the safety helmet detection method based on single-pixel feature amplification to construction site monitoring, a foreground terminal monitors the construction site with a camera, the camera transmits real-time footage to a background processing terminal, and the background terminal runs the safety helmet detection method based on single-pixel feature amplification for detection and analysis, returning the result to the foreground terminal to remind workers in real time.
Example 2
This embodiment adopts the public Safety-Helmet-Wearing-Dataset provided by wensihaihui, comprising 7582 images with 9044 bounding boxes of people wearing helmets (positive class) and 111514 bounding boxes of people not wearing helmets (negative class); most of the negative samples come from the SCUT-HEAD dataset. The annotations contain many small heads and occluded, unclear targets; the data are cluttered and complex, and some annotations do not belong to the detection categories. During data reading, targets with wrong classes or that are too difficult to detect are removed first, yielding the dataset that finally participates in training. The 7582 images are split 8:2 into a training set and a test set, preserving the distribution of the original data.
The intersection-over-union IoU is commonly used in target detection to evaluate whether a prediction locates the position of a real target, as shown in formula (5):

IoU = area(DR ∩ GT) / area(DR ∪ GT)   (5)

where DR is the detection result box predicted by the network and GT is the GroundTruth box of the real sample; the larger the IoU, the better the model's prediction matches the real box. IoU serves as the criterion for accepting a prediction as final; in the experiments the threshold is set to 0.5, above which the predicted location box is considered valid.
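Formula (5) can be sketched with corner-format boxes (x1, y1, x2, y2), an assumed convention here:

```python
def iou(dr, gt):
    """Intersection over union of a detection box DR and a ground-truth box GT,
    both given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(dr[0], gt[0]), max(dr[1], gt[1])
    ix2, iy2 = min(dr[2], gt[2]), min(dr[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # overlap area, 0 if disjoint
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(dr) + area(gt) - inter
    return inter / union if union > 0 else 0.0
```

With the 0.5 threshold used in the experiments, a prediction would be kept when `iou(dr, gt) >= 0.5`.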
Detection performance is usually evaluated with the mAP value, the mean of the AP values over all predicted categories. To obtain the AP of a single category, precision and recall must first be obtained, as shown in formulas (6) and (7):

Precision = TP / (TP + FP)   (6)
Recall = TP / (TP + FN)   (7)

where TP, FP and FN are defined as shown in Table 1.

TABLE 1 TP, FP, FN definitions
TP: a predicted box that matches a real target box (IoU above the threshold)
FP: a predicted box that matches no real target box
FN: a real target box missed by every prediction

A PR curve is constructed from the precision and recall values, and the AP value is then calculated as shown in formula (8):

AP = ∫₀¹ P(R) dR   (8)

The AP value thus equals the area under the PR curve. The AP of each category is calculated by formula (8) and averaged to obtain the final mAP; the larger the mAP value, the better the network's detection performance.
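Formulas (6)-(8) can be sketched as a step-wise area under the PR curve; the inputs (per-detection confidence scores and TP flags) are assumptions about how detections were already matched to ground truth at IoU ≥ 0.5:

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP as the area under the precision-recall curve.

    scores: confidence of each detection; is_tp: 1 if the detection matched a
    ground-truth box, else 0; num_gt: total number of ground-truth boxes.
    """
    order = np.argsort(scores)[::-1]                     # rank by confidence
    flags = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(flags)                                # running TP count
    fp = np.cumsum(1.0 - flags)                          # running FP count
    recall = tp / num_gt                                 # formula (7)
    precision = tp / (tp + fp)                           # formula (6)
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):                  # step integral of P(R)
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

The mAP is then the mean of `average_precision` over all categories.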
Selecting the convolution module: to achieve the best effect with minimal computation, standard convolution, depthwise-separable convolution and Ghost convolution were compared, and Ghost convolution, with good effect and low computation cost, was finally chosen as the convolution scheme of the single-pixel scaling module.
To verify the validity of the proposed model, several baseline models were used for comparative experiments:
(1) Efficientdet-d0: the original Efficientdet model performs poorly on the small-target Person category; its final mAP is only 52.3%.
(2) Efficientdet-change: the improved Efficientdet model adds low-level feature information; its mAP reaches 77.9%.
(3) YOLOv3: tested on the same dataset, YOLOv3 reaches an mAP of 71.4%.
(4) YOLOv3+SPZ: introducing the proposed single-pixel scaling module into YOLOv3 raises the mAP to 73.5%.
(5) SPZ-Det: the final proposed model, built on the Efficientdet structure combined with low-level features and the SPZ single-pixel scaling module; its mAP reaches 80.2% and its AP for helmet-wearing detection reaches 94.6%.
Comparative analysis shows that attending to low-level features improves the representational capability of network features, and that the single-pixel feature scaling module can be embedded in other detection models to improve their performance; the experimental results are shown in Table 2.
TABLE 2 results of the experiment
Detection method AP(hat) AP(person) mAP
Efficientdet-d0 79.6% 24.9% 52.3%
Efficientdet-change 93.9% 61.9% 77.9%
YOLOv3 86.3% 56.4% 71.4%
YOLOv3+SPZ 91.5% 55.5% 73.5%
SPZ-Det 94.6% 65.8% 80.2%
Feature extraction uses the Efficientnet-b0 backbone; important feature layers are reselected to strengthen the representation of the extracted features, and the single-pixel feature scaling module introduced into the model alleviates the disappearance of small-target features during computation. The model finally reaches a helmet-wearing detection accuracy of 94.6% (AP) and an overall mAP of 80.2%.
Example 3
A monitoring system is built on the safety helmet detection method based on single-pixel feature amplification: a foreground terminal monitors the construction site with a camera, the camera transmits real-time footage to a background processing terminal, the background terminal runs the detection method for analysis, and the result is returned to the foreground terminal to remind workers in real time.
The foregoing is only a preferred embodiment of the present invention. It should be noted that, for those skilled in the art, various modifications and refinements can be made without departing from the principle of the present invention, and these modifications and refinements shall also fall within the protection scope of the present invention.

Claims (9)

1. A safety helmet detection method based on single-pixel characteristic amplification is characterized by comprising the following steps:
step 1, preprocessing and enhancing a safety helmet data set to obtain preprocessed and enhanced sample data;
step 2, extracting a characteristic representation form of a target from the preprocessed and enhanced sample data obtained in the step 1 through an Efficientnet-b0 network to obtain a backbone network characteristic;
step 3, performing feature filtering on the backbone network features obtained in the step 2 with a single-pixel feature scaling module, and enhancing the foreground elements in the features to obtain new feature values; the new feature values are obtained as follows: a spatial attention enhancement is first applied to the backbone network features to obtain the main region of the foreground elements, i.e., the attention-enhanced features; within the attention-enhanced features, the contribution of each pixel to the overall feature is computed, a feature contribution map is then obtained from these contribution values, different pixels are scaled accordingly, and the scaled features are obtained;
step 4, performing multi-scale feature fusion operation on the new feature value obtained in the step 3 through a BiFPN feature fusion module to obtain a fused feature;
and 5, inputting the fused features obtained in the step 4 into a target prediction network, and classifying and positioning the targets.
2. The safety helmet detection method based on single-pixel feature amplification as claimed in claim 1, wherein the preprocessing in the step 1 comprises the following steps:
step 1.1, expanding a safety helmet data set by using a horizontal turning method, so that each sample in the safety helmet data set has sample data in positive and negative forms;
and step 1.2, randomly inserting noise into the sample data to increase sample complexity and improve the robustness of the method at the data level.
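Steps 1.1 and 1.2 can be sketched as a small NumPy routine; the noise scale `noise_std` and the Gaussian noise model are illustrative assumptions, as the claim does not fix either:

```python
import numpy as np

def augment(image, noise_std=5.0, rng=None):
    """Sketch of the preprocessing of claim 2: the flip (step 1.1) gives each
    sample a mirrored counterpart, and the noise insertion (step 1.2) raises
    sample complexity. noise_std is an illustrative choice."""
    rng = np.random.default_rng(0) if rng is None else rng
    flipped = image[:, ::-1].copy()                   # step 1.1: horizontal flip (H, W[, C])
    noise = rng.normal(0.0, noise_std, image.shape)   # step 1.2: random noise
    noisy = np.clip(image.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    return flipped, noisy
```

In practice each training sample would yield both augmented copies, doubling the positive/negative-form coverage described in step 1.1.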
3. The method for detecting the safety helmet based on the single-pixel feature amplification of claim 1, wherein in the step 2, when selecting the main feature layer of Efficientnet-b0, the topmost feature, the downsampling feature and the next lower feature are selected;
the backbone network characteristics are extracted by the following method:
step 2.1, in the feature maps extracted by Efficientnet-b0, for feature layers whose resolution differs from that of the layer above, two feature layers are kept: a low-level feature X1 and a high-level feature X2;
and 2.2, for the feature layer with the same resolution, only selecting the high-layer feature X2 as the feature representation of the subsequent calculation.
4. The method for detecting the safety helmet based on the single-pixel feature amplification as claimed in claim 1, wherein the new feature value in the step 3 is obtained by the following steps:
step 3.1, applying a spatial attention computation to the backbone network feature f to obtain the attention-enhanced feature F, as shown in formula (1):

F = S( v( [max(f); mean(f)] ) ) ⊗ f    (1)

wherein max is maximum pooling and mean is average pooling along the channel dimension, [ ; ] denotes concatenation of the two pooled maps, v is a 7 × 7 convolution, ⊗ denotes element-wise multiplication, S stands for the ReLU and Sigmoid operations, f is the initial backbone network feature, and F is the attention-enhanced feature;
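A minimal NumPy sketch of the spatial attention of formula (1) follows; the random 7×7 kernel stands in for the learned convolution v, and applying Sigmoid after ReLU follows the claim's description of S (both are assumptions about the exact layer order):

```python
import numpy as np

def spatial_attention(f, v=None):
    """Sketch of formula (1), assuming f has shape (C, H, W)."""
    C, H, W = f.shape
    mx = f.max(axis=0)     # max pooling over channels
    mn = f.mean(axis=0)    # average pooling over channels
    stacked = np.stack([mx, mn])                      # [max(f); mean(f)] -> (2, H, W)
    if v is None:  # a random kernel stands in for the learned 7x7 convolution
        v = np.random.default_rng(0).normal(0.0, 0.1, (2, 7, 7))
    pad = np.pad(stacked, ((0, 0), (3, 3), (3, 3)))   # same-size 7x7 convolution
    conv = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            conv[i, j] = np.sum(pad[:, i:i+7, j:j+7] * v)
    att = 1.0 / (1.0 + np.exp(-np.maximum(conv, 0)))  # S: ReLU then Sigmoid
    return att * f                                    # element-wise scaling of f
```

Because the attention map lies in (0, 1], the output never exceeds the input feature in magnitude; foreground positions are simply suppressed less than background ones.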
step 3.2, after obtaining the attention-enhanced feature F, performing a pixel-level feature amplification operation on it: the contribution value of each pixel to the feature map is computed first, a feature contribution map is then obtained from the contribution values, and the primary and secondary feature elements are scaled, specifically as formula (2):

m_i(h, w) = n_i        if S(C_i)(h, w) > 1/(H × W)
m_i(h, w) = 1 − n_i    otherwise    (2)

wherein C_i is the feature of the i-th channel, m_i is its feature contribution map, n_i is the scaling value, f_i is the initial feature, H and W are the height and width of the feature map, and S is the Softmax function; the Softmax score of each single-channel feature is first obtained through S, the score representing the contribution of each pixel position to the overall feature of that channel; the contribution is then compared with the channel's average single-pixel contribution 1/(H × W): if it is larger, the scaling value is set to n_i, and if it is smaller, to (1 − n_i), finally yielding the feature contribution map;
step 3.3, performing a dot-product (element-wise) operation between the feature contribution map and the input feature C_i to obtain the scaled feature; finally, a residual structure is introduced and the initial feature f_i is added, giving the new feature values used for multi-scale feature fusion.
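Steps 3.2 and 3.3 can be sketched in NumPy as below; a single scaling value `n` shared by all channels stands in for the patent's per-channel n_i, which is an assumption of this sketch:

```python
import numpy as np

def single_pixel_scaling(f, n=0.7):
    """Sketch of the single-pixel feature scaling of claim 4, steps 3.2-3.3,
    assuming f has shape (C, H, W)."""
    C, H, W = f.shape
    flat = f.reshape(C, H * W)
    # Softmax over the spatial positions of each channel -> contribution scores
    e = np.exp(flat - flat.max(axis=1, keepdims=True))
    score = e / e.sum(axis=1, keepdims=True)
    # positions above the mean contribution 1/(H*W) get n, the rest 1 - n
    contrib = np.where(score > 1.0 / (H * W), n, 1.0 - n).reshape(C, H, W)
    # scale the features, then add the residual (the initial feature f)
    return contrib * f + f
```

With n > 0.5, pixels that contribute more than average are amplified relative to the rest, while the residual term keeps every initial feature value present for the subsequent BiFPN fusion.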
5. The safety helmet detection method based on single-pixel feature amplification as claimed in claim 1, wherein the multi-scale feature fusion operation method in the step 4 is as follows: and the new characteristic value enhanced by the single-pixel scaling module is transmitted to a BiFPN characteristic fusion module, and the BiFPN characteristic fusion module performs characteristic fusion on the characteristics with different sizes of different levels to compensate the information lost due to downsampling.
6. The method as claimed in claim 5, wherein in the BiFPN feature fusion module, a three-layer cross-link operation is used so that the original features in the backbone network continue to be propagated, and a control factor controls the proportion between different features.
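The control-factor idea of claim 6 matches BiFPN-style weighted fusion; a minimal sketch follows, with uniform stand-in weights where the real model would learn them (the `eps` stabilizer follows the usual BiFPN normalization, an assumption here):

```python
import numpy as np

def weighted_fusion(feats, w=None, eps=1e-4):
    """Sketch of weighted fusion with one control factor per input feature,
    assuming all feats share the same shape."""
    feats = [np.asarray(f, dtype=np.float64) for f in feats]
    w = np.ones(len(feats)) if w is None else np.asarray(w, dtype=np.float64)
    w = np.maximum(w, 0.0)        # keep the control factors non-negative
    w = w / (w.sum() + eps)       # normalize so the factors sum to ~1
    return sum(wi * fi for wi, fi in zip(w, feats))
```

Raising one factor lets a cross-linked backbone feature dominate the fused output, which is how the ratio between original and processed features is controlled.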
7. The helmet detection method based on single-pixel feature amplification of claim 1, wherein in the step 5, the target is classified and located by detection heads that are each a three-layer CNN.
8. The safety helmet detection method based on single-pixel feature amplification of claim 1, wherein the method for classifying and positioning the target in the step 5 comprises the following steps:
step 5.1, in the classification network, limiting the large number of background elements with a Focal Loss calculation strategy to keep the positive and negative samples balanced;
step 5.2, in the positioning regression network, using the smooth L1 function of formula (3) as the loss calculation strategy to compute the loss between the predicted position offset and the offset of the sample's true position, where the true-position offset is calculated by formula (4):

smooth_L1(x) = 0.5 x²        if |x| < 1
smooth_L1(x) = |x| − 0.5     otherwise    (3)

gt = ( (tx − ax)/aw, (ty − ay)/ah, log(tw/aw), log(th/ah) )    (4)

wherein gt is the converted regression offset; reg denotes the predicted offset of the regression sub-network, and the loss is smooth_L1(reg − gt); (dx, dy, dw, dh) is the regression label, replaced by the relative position offset between the true annotation box (tx, ty, tw, th) and the anchor box (ax, ay, aw, ah).
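Formulas (3) and (4) can be sketched directly in NumPy; the box layout (center x, center y, width, height) is the standard anchor-offset convention and is an assumption here, as the claim only names the components:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 loss of formula (3), applied elementwise to reg - gt."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x * x, x - 0.5)

def encode_offsets(t, a):
    """Regression targets of formula (4): offsets of the true annotation box
    t = (tx, ty, tw, th) relative to the anchor box a = (ax, ay, aw, ah)."""
    tx, ty, tw, th = t
    ax, ay, aw, ah = a
    return np.array([(tx - ax) / aw, (ty - ay) / ah,
                     np.log(tw / aw), np.log(th / ah)])
```

An anchor that coincides with its ground-truth box encodes to a zero offset vector, so a perfect prediction incurs zero smooth-L1 loss.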
9. Application of the safety helmet detection method based on single-pixel feature amplification as claimed in any one of claims 1 to 8 to construction site monitoring, wherein a front-end terminal monitors the construction site with a camera, the camera transmits real-time footage to a back-end processing terminal, the back-end terminal performs detection and analysis by executing the safety helmet detection method based on single-pixel feature amplification, and the result is returned to the front-end terminal to remind workers in real time.
CN202011282208.9A 2020-09-10 2020-11-16 Safety helmet detection method based on single-pixel characteristic amplification and application thereof Active CN112464765B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010949870 2020-09-10
CN2020109498709 2020-09-10

Publications (2)

Publication Number Publication Date
CN112464765A CN112464765A (en) 2021-03-09
CN112464765B true CN112464765B (en) 2022-09-23

Family

ID=74837081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282208.9A Active CN112464765B (en) 2020-09-10 2020-11-16 Safety helmet detection method based on single-pixel characteristic amplification and application thereof

Country Status (1)

Country Link
CN (1) CN112464765B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011365A (en) * 2021-03-31 2021-06-22 中国科学院光电技术研究所 Target detection method combined with lightweight network
CN114462555B (en) 2022-04-13 2022-08-16 国网江西省电力有限公司电力科学研究院 Multi-scale feature fusion power distribution network equipment identification method based on raspberry group

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN110796009A (en) * 2019-09-29 2020-02-14 航天恒星科技有限公司 Method and system for detecting marine vessel based on multi-scale convolution neural network model


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CBAM: Convolutional Block Attention Module;Sanghyun Woo et al.;《arXiv》;20180718;第1-17页 *
EfficientDet: Scalable and Efficient Object Detection;Mingxing Tan et al.;《arXiv》;20200727;第1-10页 *

Also Published As

Publication number Publication date
CN112464765A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN108053427B (en) Improved multi-target tracking method, system and device based on KCF and Kalman
CN108009473B (en) Video structuralization processing method, system and storage device based on target behavior attribute
CN109670441B (en) Method, system, terminal and computer readable storage medium for realizing wearing recognition of safety helmet
CN108052859B (en) Abnormal behavior detection method, system and device based on clustering optical flow characteristics
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
CN109918971B (en) Method and device for detecting number of people in monitoring video
CN104978567B (en) Vehicle checking method based on scene classification
CN109935080B (en) Monitoring system and method for real-time calculation of traffic flow on traffic line
CN112464765B (en) Safety helmet detection method based on single-pixel characteristic amplification and application thereof
CN102521565A (en) Garment identification method and system for low-resolution video
Ahmad et al. Overhead view person detection using YOLO
CN111401310B (en) Kitchen sanitation safety supervision and management method based on artificial intelligence
CN111753651A (en) Subway group abnormal behavior detection method based on station two-dimensional crowd density analysis
CN112163477B (en) Escalator pedestrian pose target detection method and system based on Faster R-CNN
CN114842397A (en) Real-time old man falling detection method based on anomaly detection
CN112163572A (en) Method and device for identifying object
KR101030257B1 (en) Method and System for Vision-Based People Counting in CCTV
CN113989858B (en) Work clothes identification method and system
CN112184773A (en) Helmet wearing detection method and system based on deep learning
CN114092877A (en) Garbage can unattended system design method based on machine vision
CN114885119A (en) Intelligent monitoring alarm system and method based on computer vision
CN109325426B (en) Black smoke vehicle detection method based on three orthogonal planes time-space characteristics
CN106372566A (en) Digital signage-based emergency evacuation system and method
CN109064444B (en) Track slab disease detection method based on significance analysis
CN114422720A (en) Video concentration method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant