CN117636266B - Method and system for detecting safety behaviors of workers, storage medium and electronic equipment - Google Patents

Method and system for detecting safety behaviors of workers, storage medium and electronic equipment

Info

Publication number
CN117636266B
Authority
CN
China
Prior art keywords
network model
preset
loss function
training
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410103863.5A
Other languages
Chinese (zh)
Other versions
CN117636266A (en)
Inventor
唐洪
罗林枫
夏军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Jiaotong University filed Critical East China Jiaotong University
Priority to CN202410103863.5A priority Critical patent/CN117636266B/en
Publication of CN117636266A publication Critical patent/CN117636266A/en
Application granted granted Critical
Publication of CN117636266B publication Critical patent/CN117636266B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0495 Quantised networks; Sparse networks; Compressed networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Using neural networks
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a worker safety behavior detection method, system, storage medium and electronic device, wherein the method comprises the following steps: acquiring and labeling image samples of worker behaviors at a construction site to obtain a sample data set annotated with safety behaviors, preprocessing the samples in the data set, and dividing the preprocessed data set into a training set and a test set; performing network training with a preset network model on the training set, and judging the training effect of the preset network model through a preset loss function; when the effect of the preset network model is stable, testing the accuracy of the model on the test set and judging whether the accuracy reaches a preset value, and if so, ending the training to obtain a target network model; and deploying the target network model on a mobile terminal with a shooting function. The invention addresses the problems of prior-art detection methods, namely false and missed detections, complex model structures, large parameter counts, and high demands on computing power and energy.

Description

Method and system for detecting safety behaviors of workers, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and a system for detecting safety behaviors of workers, a storage medium, and an electronic device.
Background
In recent years, workers at building construction sites have often lacked safety awareness, and regulations on correctly wearing safety helmets and on no smoking are frequently ignored, causing many avoidable accidents. Detecting and reminding workers about helmet wearing and smoking manually consumes too much manpower and material resources and cannot provide real-time monitoring. Sensor-based approaches, which detect helmet wearing through contact between the helmet and the worker's head and detect smoking through smoke sensors, require installing sensors on every helmet, are costly, make the helmets uncomfortable to wear, and are strongly affected by environmental factors; in open-air construction sites in particular, smoke detection performs very poorly. With the continuous development of deep learning, monitoring worker safety behavior through vision-based target detection has therefore become a popular research direction.
However, although conventional target detection techniques can achieve good results in general, they still cannot reliably detect safety helmet wearing in complex environments such as construction sites, and in particular they struggle with small targets such as cigarette butts in smoking detection, leading to false and missed detections. In addition, existing detection models have complex structures and large numbers of parameters, which hinders their application in industrial production; in particular, they are difficult to deploy widely on terminals with limited computing power and energy budgets.
For detecting the two safety behaviors of helmet wearing and smoking, the following problems currently exist:
First, the apparent angle of the safety helmet varies from person to person, and the shooting distance of the camera also differs, so helmet detection targets appear with different shapes and even different sizes; the model therefore needs to be adaptive and able to detect helmets accurately at various angles and scales.
Second, cigarettes are small, and existing target detection models perform unsatisfactorily on such small targets, falling short of practical industrial application standards.
Finally, existing target detection models with good accuracy have high model complexity and large parameter counts, and therefore place heavy computing-resource demands on the devices that run them, making them difficult to deploy on resource-limited terminals such as surveillance cameras.
Disclosure of Invention
In view of this, the invention aims to provide a method, a system, a storage medium and an electronic device for detecting the safety behavior of workers, so as to solve the prior-art problems of false and missed detections and of detection models with complex structures, large parameter counts, and high computing-power and energy requirements.
According to the embodiment of the invention, the method for detecting the safety behavior of the worker comprises the following steps:
Acquiring an image sample of worker behaviors in a construction site, marking the image sample to obtain a sample data set marked with safety behaviors, preprocessing samples in the sample data set, and dividing the preprocessed sample data set into a training set and a testing set according to a preset proportion;
performing network training through a preset network model according to the training set, and judging the training effect of the preset network model through a preset loss function;
when the change amplitude of the loss function value of the preset network model is smaller than a preset value, importing the test set into the trained preset network model, judging whether the accuracy of the trained preset network model reaches a preset value, and if so, ending the training of the preset network model to obtain a target network model;
And deploying the target network model into a mobile terminal with a shooting function so as to monitor the safety behavior of workers through the mobile terminal.
In addition, the method for detecting the safety behavior of the worker according to the above embodiment of the invention may further have the following additional technical features:
further, the preprocessing the samples in the sample dataset includes:
scaling and unifying the pictures in the sample to a preset pixel size;
carrying out data cleaning on the scaled pictures to remove incomplete or noisy data;
randomly selecting a preset number of cleaned pictures, applying random perturbations to them for data enhancement, and labeling both the enhanced and the cleaned pictures to facilitate subsequent training of the preset network model.
Further, the preset loss function includes a classification loss function and a localization loss function, the localization loss function includes an intersection-over-union (IoU) loss function and a distribution focal loss function, and the preset loss function is:
$$L = \lambda_{1} L_{cls} + \lambda_{2} L_{CIoU} + \lambda_{3} L_{DFL}$$
where $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are weight coefficients, $L$ is the preset loss function value, $L_{cls}$ is the classification loss function value, $L_{CIoU}$ is the IoU loss function value, and $L_{DFL}$ is the distribution focal loss function value.
Further, the classification loss function is:
$$L_{cls} = -\sum_{c=1}^{N}\left[y_{c}\log(p_{c}) + (1-y_{c})\log(1-p_{c})\right]$$
where $L_{cls}$ is the classification loss function value, $y_{c}$ is the true class label in the target box, $p_{c}$ is the predicted score for class $c$ in the preset box, and $N$ is the total number of categories.
Further, the IoU loss function is:
$$L_{CIoU} = 1 - IoU + \frac{\rho^{2}(B, B^{gt})}{c^{2}} + \alpha v, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$
where $L_{CIoU}$ is the IoU loss function value, $B$ is the prediction box, $B^{gt}$ is the ground-truth box, $IoU$ is the degree of coverage (overlap) between the predicted and ground-truth bounding boxes, $\rho^{2}(B, B^{gt})$ is the square of the distance between the center points of the two boxes, $c^{2}$ is the square of the diagonal length of the smallest box enclosing both, $\alpha$ is a weight function, $v$ is a function measuring the aspect-ratio similarity between the ground-truth and predicted bounding boxes, $w^{gt}$ is the width of the ground-truth bounding box, $h^{gt}$ is the height of the ground-truth bounding box, $w$ is the width of the predicted bounding box, and $h$ is the height of the predicted bounding box;
the distribution focal loss function is:
$$L_{DFL} = -\left[(y_{i+1} - y)\log(S_{i}) + (y - y_{i})\log(S_{i+1})\right]$$
where $y_{i}$ is the predicted value obtained by rounding $y$ down, $y_{i+1}$ is the predicted value obtained by rounding $y$ up, $y$ is the scaled (down-mapped) distance between the target center point and one of the top, bottom, left or right sides of the prediction box, $S_{i}$ is the prediction probability corresponding to $y_{i}$, and $S_{i+1}$ is the prediction probability corresponding to $y_{i+1}$.
Further, the step of deploying the target network model to a mobile terminal with a shooting function to monitor the safety behavior of workers through the mobile terminal comprises the following steps:
Acquiring a background picture through the mobile terminal with the shooting function;
Acquiring a detection result picture through the target network model according to the background picture, and judging whether a characteristic behavior exists according to the detection result picture, wherein the characteristic behavior at least comprises a smoking behavior and a safety helmet wearing behavior;
if yes, the early warning information is sent to an administrator for early warning.
Further, the target network model includes a Backbone end, a Neck end and a Head end; the Backbone end includes a convolution layer for preliminary feature extraction, multiple shuffle modules for deep feature extraction, and an SPPF module arranged at the end of the Backbone; the Neck end includes an upsampling module, a splicing module, an EMA module, and a dynamic snake convolution layer; the SPPF module includes several maximum pooling layers; and the step of obtaining a detection result picture through the target network model according to the background picture includes:
performing feature extraction on the background picture through the Backbone end to obtain feature extraction maps;
collecting, through the upsampling module, feature maps that have passed through the maximum pooling layers a different number of times, and splicing, through the splicing module, the feature map that has not been max-pooled with the feature maps that have each passed through one more maximum pooling layer, so as to obtain an output feature map;
obtaining target detection feature maps of different sizes from the output feature map through the EMA module and the dynamic snake convolution layer;
and performing target category prediction and target position prediction on the target detection feature maps through the Head end, so as to output a detection result picture.
It is another object of an embodiment of the present invention to provide a worker safety behavior detection system, the system comprising:
The data processing module is used for obtaining an image sample of the worker behavior in the construction site, marking the image sample to obtain a sample data set of marked safety behavior, preprocessing samples in the sample data set, and dividing the preprocessed sample data set into a training set and a testing set according to a preset proportion;
the model training module is used for carrying out network training through a preset network model according to the training set, and judging the training effect of the preset network model through a preset loss function;
the model detection module is used for importing the test set into the trained preset network model when the change amplitude of the loss function value of the preset network model is smaller than a preset value, judging whether the accuracy of the trained preset network model reaches a preset value, and if so, ending the training of the preset network model to obtain a target network model;
the model deployment module is used for deploying the target network model into the mobile terminal with the shooting function so as to monitor the safety behavior of workers through the mobile terminal.
It is another object of an embodiment of the present invention to provide a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the worker safety behavior detection method described above.
It is another object of an embodiment of the present invention to provide an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the worker safety behavior detection method described above when executing the program.
The invention forms a lightweight network (ShuffleNetV2) from the multi-layer shuffle modules in the network model, so that the computational complexity and parameter count of the network model are reduced while detection accuracy is maintained. Secondly, the EMA module and the dynamic snake convolution layer are combined within the PAFPN path aggregation feature pyramid network, so that effective channel descriptions are learned without channel dimensionality reduction in the convolution operations and better pixel-level attention is generated for the high-level feature maps; this effectively reduces the interference of the noise introduced by multi-scale feature fusion on target detection, especially small-target detection, and ultimately improves the model's ability to detect and recognize safety helmets at different angles and small cigarettes. In addition, the dynamic snake convolution layer strengthens the network model's attention to the elongated, continuous characteristics of tubular structures, enhancing its ability to detect small targets, especially small tubular targets such as cigarettes. The training state of the network model is judged through the loss function and the parameter structure of the network model is optimized; finally, after the network model is stable, its accuracy is tested on the test set data until the expected value is reached, completing model training. The network model structure is thus adjusted and optimized for the detection targets of safety helmets at different angles and sizes and of small tubular targets, improving detection accuracy, while the lightweight modules reduce the parameter count and computation, so the model can be deployed on resource-limited terminals without consuming large amounts of computing power and energy.
Drawings
Fig. 1 is a flowchart of a worker safety behavior detection method in a first embodiment of the invention;
FIG. 2 is a schematic structural diagram of a worker safety behavior detection system in accordance with a second embodiment of this invention;
fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating an execution process of the shuffle module when the convolution step size is 1 in the first embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating an execution process of the shuffle module when the convolution step size is 2 in the first embodiment of the present invention;
the invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Example 1
Referring to fig. 1, a method for detecting safety behavior of a worker according to a first embodiment of the present invention is shown, and specifically includes steps S01-S04.
S01, obtaining an image sample of worker behaviors in a construction site, marking the image sample to obtain a sample data set marked with safety behaviors, preprocessing samples in the sample data set, and dividing the preprocessed sample data set into a training set and a testing set according to a preset proportion;
Specifically, the pictures in the samples are scaled to a unified preset pixel size; data cleaning is performed on the scaled pictures to remove incomplete or noisy data; a preset number of the cleaned pictures are randomly selected and randomly perturbed for data enhancement, and both the enhanced and the cleaned pictures are labeled to facilitate subsequent training of the preset network model. More specifically, random cropping, flipping, occlusion, rotation and color jitter are applied to some of the pictures to complete the enhancement, and the enhanced pictures are labeled and added to the original data set, which strengthens the network model's adaptability to targets of different angles and sizes and its detection capability under different colors and illumination conditions. For data labeling, in pictures where workers wear the safety helmet correctly, the helmet position is boxed and labeled as helmet worn; in pictures where workers do not wear the helmet properly, the top of the head is boxed and labeled as helmet not worn. Likewise, in pictures of workers smoking, the cigarette position is boxed and labeled as smoking behavior, while pictures of workers not smoking are labeled as no smoking behavior. During training, the model's predictions are compared against these labels to directly determine whether a prediction is accurate.
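As a concrete illustration of this preprocessing pipeline, the sketch below (Python with OpenCV and NumPy) shows scaling, cleaning, simple random augmentation, and the train/test split; the 640x640 target size, the augmentation parameters, the 8:2 split ratio and the folder name worksite_samples are illustrative assumptions rather than values fixed by this embodiment:

```python
import glob
import random

import cv2
import numpy as np

IMG_SIZE = 640        # assumed "preset pixel size"
TRAIN_RATIO = 0.8     # assumed "preset proportion" for the train/test split

def load_and_scale(path):
    """Read a picture and scale it to the unified preset pixel size."""
    img = cv2.imread(path)
    if img is None:   # data cleaning: drop unreadable / incomplete files
        return None
    return cv2.resize(img, (IMG_SIZE, IMG_SIZE))

def augment(img):
    """Random flip, small rotation and brightness jitter used to strengthen a picture."""
    if random.random() < 0.5:
        img = cv2.flip(img, 1)                                   # horizontal flip
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-15, 15), 1.0)
    img = cv2.warpAffine(img, m, (w, h))                         # small random rotation
    img = np.clip(img * random.uniform(0.8, 1.2), 0, 255)        # colour/brightness disturbance
    return img.astype(np.uint8)

paths = glob.glob("worksite_samples/*.jpg")                      # assumed sample folder
cleaned = [p for p in paths if load_and_scale(p) is not None]    # keep only valid pictures
random.shuffle(cleaned)
split = int(len(cleaned) * TRAIN_RATIO)
train_set, test_set = cleaned[:split], cleaned[split:]           # preset-proportion split
```

In practice the enhanced copies would be re-labeled and merged back into the original data set, as described above.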
S02, performing network training through a preset network model according to the training set, and judging the training effect of the preset network model through a preset loss function;
Specifically, the pictures of the training set are input into the detection model. Preliminary feature extraction is performed by a convolution layer at the Backbone end, giving a shallow feature map at half the original resolution; the multi-layer shuffle modules then extract deeper features while keeping the network model lightweight. At the end of the Backbone, the SPPF module processes the features: its front and rear ends each consist of a 1x1 convolution layer with batch normalization and SiLU activation, and its middle contains three maximum pooling layers; the feature map that has not been max-pooled and the feature maps obtained after each additional max-pooling are spliced together, which enlarges the receptive field, and the fused result forms the output feature map of the Backbone. At the Neck end, an EMA (efficient multi-scale attention) module and DSConv (dynamic snake convolution) are introduced into the PAFPN path aggregation feature pyramid network, so that detection feature maps of three scales are obtained while reducing the mutual interference between feature maps of different scales; the Head end then performs category and position prediction for targets of different sizes, including very small tubular targets such as cigarettes.
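The SPPF structure described above can be sketched as follows (PyTorch); the halving of the channel count and the 5x5 pooling kernel are assumptions taken from common SPPF implementations rather than values stated in this embodiment:

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """1x1 convolution followed by batch normalization and SiLU activation."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    """1x1 ConvBNSiLU front end, three successive max-pooling layers whose outputs
    are concatenated with the un-pooled map, and a 1x1 ConvBNSiLU rear end."""
    def __init__(self, c_in, c_out, pool_k=5):
        super().__init__()
        c_mid = c_in // 2                       # assumed bottleneck width
        self.front = ConvBNSiLU(c_in, c_mid)
        self.pool = nn.MaxPool2d(kernel_size=pool_k, stride=1, padding=pool_k // 2)
        self.rear = ConvBNSiLU(c_mid * 4, c_out)

    def forward(self, x):
        x = self.front(x)
        y1 = self.pool(x)                       # pooled once
        y2 = self.pool(y1)                      # pooled twice
        y3 = self.pool(y2)                      # pooled three times
        return self.rear(torch.cat((x, y1, y2, y3), dim=1))
```

Because the same pooling layer is applied repeatedly, the concatenated maps correspond to zero, one, two and three max-pooling passes, which is what progressively enlarges the receptive field.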
Furthermore, in an ordinary deformable convolution the offsets that control the shape of a single convolution kernel are all learned at once by the network, with only one range constraint on each offset, namely the receptive-field range. The deformation of all convolutions then depends only on the final loss feedback of the whole network, so the change process is relatively free; but if it is completely free, the model easily loses small structural features that occupy a small proportion of the image, that is, elongated tubular structures are easily lost. Therefore, a continuity constraint is added to the design of the dynamic snake convolution kernel: each convolution position is determined with reference to its previous position, and only the swing direction is chosen freely, so that the receptive field remains continuous while the kernel still deforms freely.
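A compact illustration of that continuity constraint is given below; it only shows how bounded per-step swings are accumulated so that each sampling position references the previous one, and omits the offset-prediction head and the bilinear sampling that a full dynamic snake convolution layer would also need (the function name and tensor layout are assumptions):

```python
import torch

def snake_offsets(raw_offsets):
    """raw_offsets: (B, K, H, W) values predicted for the K sampling positions of
    one snake-shaped kernel along a single axis.

    Each individual swing is bounded to [-1, 1] (the per-offset range constraint),
    and the cumulative sum makes position k depend on position k-1, so the kernel
    stays an elongated, continuous curve instead of deforming completely freely."""
    step = torch.tanh(raw_offsets)          # bounded swing of each position
    return torch.cumsum(step, dim=1)        # chain positions: continuity constraint
```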
For each shuffle module, the incoming channels are split into two branches, one with c′ channels and the other with c−c′ channels. To reduce model fragmentation, when stride = 1 no operation is performed on the left shortcut branch, as shown in fig. 4; when stride = 2, the left shortcut branch performs two convolution operations, a 3x3 Dconv (3x3 depthwise separable convolution) and a 1x1 Conv (1x1 convolution), as shown in fig. 5. The right main branch always contains three convolution layers regardless of stride: a 1x1 Conv, a 3x3 Dconv and a 1x1 Conv. The two branches have the same number of output channels, and after convolution they are spliced by a concat operation so that the consistency of the input and output channel counts is maintained.
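A minimal PyTorch sketch of such a shuffle module, following the two-branch structure just described, might look like this (channel counts are illustrative, and the channel-shuffle step that ShuffleNetV2 applies after concatenation is omitted here for brevity):

```python
import torch
import torch.nn as nn

def dconv3x3(c, stride):
    """3x3 depthwise (depth-separable) convolution with batch normalization."""
    return nn.Sequential(nn.Conv2d(c, c, 3, stride, 1, groups=c, bias=False),
                         nn.BatchNorm2d(c))

def conv1x1(c_in, c_out):
    """1x1 convolution with batch normalization and ReLU."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, 1, bias=False),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ShuffleUnit(nn.Module):
    """Shuffle module: channel split into c' and c - c' with an identity shortcut
    when stride=1; a 3x3 Dconv + 1x1 Conv shortcut when stride=2. The main branch
    is always 1x1 Conv -> 3x3 Dconv -> 1x1 Conv, and the two branches are
    concatenated so that input and output channel counts stay consistent."""
    def __init__(self, c_in, c_out, stride):
        super().__init__()
        self.stride = stride
        branch_c = c_out // 2
        if stride == 2:
            self.shortcut = nn.Sequential(dconv3x3(c_in, 2), conv1x1(c_in, branch_c))
            main_in = c_in
        else:
            self.shortcut = nn.Identity()
            main_in = c_in // 2
        self.main = nn.Sequential(conv1x1(main_in, branch_c),
                                  dconv3x3(branch_c, stride),
                                  conv1x1(branch_c, branch_c))

    def forward(self, x):
        if self.stride == 1:
            c = x.shape[1] // 2
            left, right = x[:, :c], x[:, c:]    # channel split: c' and c - c'
        else:
            left, right = x, x                  # both branches see the full input
        return torch.cat((self.shortcut(left), self.main(right)), dim=1)
```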
In addition, during the forward propagation calculation the target detection loss is computed according to the preset loss function; the model parameters are then updated by back-propagation with stochastic gradient descent, finally completing the training process of minimizing the loss function. Specifically, the preset loss function includes a classification loss function and a localization loss function, the localization loss function includes an intersection-over-union (IoU) loss function and a distribution focal loss function, and the preset loss function is:
$$L = \lambda_{1} L_{cls} + \lambda_{2} L_{CIoU} + \lambda_{3} L_{DFL}$$
where $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are weight coefficients, $L$ is the preset loss function value, $L_{cls}$ is the classification loss function value, $L_{CIoU}$ is the IoU loss function value, and $L_{DFL}$ is the distribution focal loss function value.
The classification loss function is:
$$L_{cls} = -\sum_{c=1}^{N}\left[y_{c}\log(p_{c}) + (1-y_{c})\log(1-p_{c})\right]$$
where $L_{cls}$ is the classification loss function value, $y_{c}$ is the true class label in the target box, $p_{c}$ is the predicted score for class $c$ in the preset box, and $N$ is the total number of categories.
The IoU loss function is:
$$L_{CIoU} = 1 - IoU + \frac{\rho^{2}(B, B^{gt})}{c^{2}} + \alpha v, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$
where $L_{CIoU}$ is the IoU loss function value, $B$ is the prediction box, $B^{gt}$ is the ground-truth box, $IoU$ is the degree of coverage (overlap) between the predicted and ground-truth bounding boxes, $\rho^{2}(B, B^{gt})$ is the square of the distance between the center points of the two boxes, $c^{2}$ is the square of the diagonal length of the smallest box enclosing both, $\alpha$ is a weight function, $v$ is a function measuring the aspect-ratio similarity between the ground-truth and predicted bounding boxes, $w^{gt}$ is the width of the ground-truth bounding box, $h^{gt}$ is the height of the ground-truth bounding box, $w$ is the width of the predicted bounding box, and $h$ is the height of the predicted bounding box;
the distribution focal loss function is:
$$L_{DFL} = -\left[(y_{i+1} - y)\log(S_{i}) + (y - y_{i})\log(S_{i+1})\right]$$
where $y_{i}$ is the predicted value obtained by rounding $y$ down, $y_{i+1}$ is the predicted value obtained by rounding $y$ up, $y$ is the scaled (down-mapped) distance between the target center point and one of the top, bottom, left or right sides of the prediction box, $S_{i}$ is the prediction probability corresponding to $y_{i}$, and $S_{i+1}$ is the prediction probability corresponding to $y_{i+1}$. The training state of the model can be judged from the change in the loss function value, and once the model is stable, the test set data are used to judge whether training is finished.
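For illustration, the combined loss described above could be computed as in the following sketch (PyTorch); the box format (x1, y1, x2, y2), the discrete-bin layout of the DFL predictions, and the weight values are assumptions made for the example, not values prescribed by this embodiment:

```python
import math

import torch
import torch.nn.functional as F

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)
    pw, ph, tw, th = px2 - px1, py2 - py1, tx2 - tx1, ty2 - ty1
    inter = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(0) * \
            (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(0)
    iou = inter / (pw * ph + tw * th - inter + eps)        # coverage between the boxes
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4  # centre distance^2
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)          # enclosing box width
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)          # enclosing box height
    c2 = cw ** 2 + ch ** 2 + eps                            # enclosing diagonal^2
    v = (4 / math.pi ** 2) * (torch.atan(tw / (th + eps)) - torch.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)                         # weight function
    return (1 - iou + rho2 / c2 + alpha * v).mean()

def dfl_loss(pred_dist, target):
    """Distribution focal loss: pred_dist holds (N, bins) logits over discrete
    distances, target is the continuous scaled distance y."""
    tl = target.floor().long().clamp(max=pred_dist.size(1) - 2)   # y_i = floor(y)
    tr = tl + 1                                                   # y_{i+1}
    wl, wr = tr.float() - target, target - tl.float()
    return (F.cross_entropy(pred_dist, tl, reduction="none") * wl +
            F.cross_entropy(pred_dist, tr, reduction="none") * wr).mean()

def total_loss(cls_logits, cls_targets, boxes_p, boxes_t, dist_p, dist_t,
               w_cls=0.5, w_iou=7.5, w_dfl=1.5):                  # weights are assumed values
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)   # classification loss
    return w_cls * l_cls + w_iou * ciou_loss(boxes_p, boxes_t) + w_dfl * dfl_loss(dist_p, dist_t)
```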
S03, when the change amplitude of the loss function value of the preset network model is smaller than a preset value, importing the test set into the trained preset network model, judging whether the accuracy of the trained preset network model reaches a preset value, and if so, ending the training of the preset network model to obtain a target network model;
In this embodiment, model training is completed when the accuracy of the preset network model reaches 95% or more, and the target network model is obtained. When the target network model confirms that a worker has performed unsafe behavior, it stores a picture of the unsafe behavior and sends it to the mobile terminal of the corresponding worker for a penalty deduction and warning. If the worker finds the judgment to be wrong, the worker can provide feedback through the mobile terminal, allowing the target network model to be further improved.
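The stopping rule of steps S02-S03 can be summarized by the following training-loop sketch; loss_fn, eval_fn, the data-loader interfaces and the stability threshold delta are placeholders assumed for illustration:

```python
def train_until_stable(model, train_loader, test_loader, optimizer,
                       loss_fn, eval_fn, delta=1e-3, target_acc=0.95):
    """Repeat epochs of SGD back-propagation; once the epoch loss changes by less
    than `delta` (loss value stabilised), test the accuracy on the test set and
    stop when it reaches `target_acc` (95% here), returning the target model."""
    prev_loss = float("inf")
    while True:
        model.train()
        epoch_loss = 0.0
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)   # forward pass + preset loss
            loss.backward()                          # reverse derivative operation
            optimizer.step()                         # stochastic gradient descent update
            epoch_loss += loss.item()
        epoch_loss /= max(len(train_loader), 1)
        if abs(prev_loss - epoch_loss) < delta:      # change amplitude below preset value
            if eval_fn(model, test_loader) >= target_acc:
                return model                         # target network model obtained
        prev_loss = epoch_loss
```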
S04, deploying the target network model into a mobile terminal with a shooting function so as to monitor the safety behavior of workers through the mobile terminal;
In this embodiment, after the target network model is obtained, it is deployed on a mobile terminal with a shooting function, preferably a camera. Once deployed, the camera is controlled to shoot workers to obtain worker behavior images, which are fed into the target network model to judge the workers' behavior and determine whether any unsafe behavior occurs.
In addition, after the target network model is deployed on the mobile terminal with the shooting function, a background picture is acquired through the mobile terminal; a detection result picture is obtained through the target network model according to the background picture, and whether a characteristic behavior exists is judged from the detection result picture, the characteristic behavior at least including smoking behavior and safety helmet wearing behavior; if yes, early-warning information is sent to an administrator for early warning. Specifically, the target network model includes a Backbone end, a Neck end and a Head end; the Backbone end includes a convolution layer for preliminary feature extraction, multiple shuffle modules for deep feature extraction, and an SPPF module arranged at the end of the Backbone; the Neck end includes an upsampling module, a splicing module, an EMA module and a dynamic snake convolution layer; and the SPPF module includes several maximum pooling layers. The step of obtaining a detection result picture through the target network model according to the background picture includes: performing feature extraction on the background picture through the Backbone end to obtain feature extraction maps; collecting, through the upsampling module, feature maps that have passed through the maximum pooling layers a different number of times, and splicing, through the splicing module, the feature map that has not been max-pooled with the feature maps that have each passed through one more maximum pooling layer, to obtain an output feature map; obtaining target detection feature maps of different sizes from the output feature map through the EMA module and the dynamic snake convolution layer; and performing target category prediction and target position prediction on the target detection feature maps through the Head end, so as to output the detection result picture.
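A deployment-side monitoring loop along the lines described above might look like the following sketch (Python with OpenCV); the class names in UNSAFE, the 0.5 confidence threshold, the model's (label, score, box) output format and the notify callback are all assumptions for illustration:

```python
import cv2

UNSAFE = {"no_helmet", "smoking"}     # assumed class names produced by the model

def monitor(model, camera_id=0, notify=print):
    """Grab frames from the camera, run the deployed target network model, and send
    early-warning information to the administrator when an unsafe behaviour
    (smoking, helmet not worn) is detected."""
    cap = cv2.VideoCapture(camera_id)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        detections = model(frame)     # assumed to return a list of (label, score, box)
        hits = [label for label, score, _ in detections if label in UNSAFE and score > 0.5]
        if hits:
            cv2.imwrite("unsafe_behaviour.jpg", frame)   # keep the evidence picture
            notify(f"Early warning: unsafe behaviour detected: {hits}")
    cap.release()
```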
In summary, the worker safety behavior detection method of this embodiment forms a lightweight network (ShuffleNetV2) from the multi-layer shuffle modules in the network model, reducing the computational complexity and parameter count while maintaining detection accuracy. Secondly, the EMA module and the dynamic snake convolution layer are combined within the PAFPN path aggregation feature pyramid network, so that effective channel descriptions are learned without channel dimensionality reduction in the convolution operations and better pixel-level attention is generated for the high-level feature maps; this effectively reduces the interference of the noise introduced by multi-scale feature fusion on target detection, especially small-target detection, and ultimately improves the model's ability to detect and recognize safety helmets at different angles and small cigarettes. In addition, the dynamic snake convolution layer strengthens the network model's attention to the elongated, continuous characteristics of tubular structures, enhancing its ability to detect small targets, especially small tubular targets such as cigarettes. The training state of the network model is judged through the loss function and the parameter structure of the network model is optimized; after the network model is stable, its accuracy is tested on the test set data until the expected value is reached, completing model training. The network model structure is thus adjusted and optimized for the detection targets of safety helmets at different angles and sizes and of small tubular targets, improving detection accuracy, while the lightweight modules reduce the parameter count and computation, so the model can be deployed on resource-limited terminals without consuming large amounts of computing power and energy.
Example 2
Referring to fig. 2, a block diagram of a worker safety behavior detection system according to a second embodiment of the invention is shown, and the worker safety behavior detection system 200 includes: a data processing module 21, a model training module 22, a model detection module 23, and a model deployment module 24, wherein:
The data processing module 21 is configured to obtain an image sample of a worker's behavior in a worksite, label the image sample to obtain a sample data set labeled with a security behavior, pre-process samples in the sample data set, and divide the pre-processed sample data set into a training set and a testing set according to a preset proportion;
The model training module 22 is configured to perform network training through a preset network model according to the training set, and determine a training effect of the preset network model through a preset loss function;
the model detection module 23 is configured to, when the effect of the preset network model is stable, import the test set into the trained preset network model and determine whether the accuracy of the trained preset network model reaches a preset value, and if so, end the training of the preset network model to obtain a target network model;
the model deployment module 24 is configured to deploy the target network model to a mobile terminal with a shooting function, so as to monitor the safety behavior of workers through the mobile terminal.
Further, in other embodiments of the present invention, the worker safety behavior detection system 200 includes:
The monitoring and early warning module is used for acquiring a background picture through the mobile terminal with the shooting function; acquiring a detection result picture through the target network model according to the background picture, and judging whether a characteristic behavior exists according to the detection result picture, wherein the characteristic behavior at least comprises a smoking behavior and a safety helmet wearing behavior; if yes, the early warning information is sent to an administrator for early warning.
Further, the data processing module 21 includes:
the scaling unit is used for scaling and unifying the pictures in the sample into a preset pixel size;
the denoising unit is used for performing data cleaning on the scaled pictures to remove incomplete or noisy data;
the enhancement unit is used for randomly selecting a preset number of cleaned pictures, applying random perturbations to them for data enhancement, and labeling both the enhanced and the cleaned pictures to facilitate subsequent training of the preset network model.
Further, the target network model includes a Backbone end, a Neck end and a Head end; the Backbone end includes a convolution layer for preliminary feature extraction, multiple shuffle modules for deep feature extraction, and an SPPF module arranged at the end of the Backbone; the Neck end includes an upsampling module, a splicing module, an EMA module and a dynamic snake convolution layer; the SPPF module includes several maximum pooling layers; and the monitoring and early warning module includes:
the feature extraction unit, used for performing feature extraction on the background picture through the Backbone end to obtain feature extraction maps;
the output feature map acquisition unit, used for collecting, through the upsampling module, feature maps that have passed through the maximum pooling layers a different number of times, and splicing, through the splicing module, the feature map that has not been max-pooled with the feature maps that have each passed through one more maximum pooling layer, to obtain an output feature map;
The target detection feature map acquisition unit is used for acquiring the target detection feature maps with different sizes through the EMA module and the dynamic snake-shaped convolution layer according to the output feature map;
And the detection result picture acquisition unit is used for carrying out target category prediction and target position prediction on the target detection feature map through the Head end so as to output a detection result picture.
The functions or operation steps implemented when the above modules are executed are substantially the same as those in the above method embodiments, and are not described herein again.
Example 3
In another aspect, referring to fig. 3, a schematic diagram of an electronic device according to a third embodiment of the present invention is provided, including a memory 20, a processor 10, and a computer program 30 stored in the memory and capable of running on the processor, where the processor 10 implements the above-mentioned method for detecting the safety behavior of a worker when executing the computer program 30.
The processor 10 may be, among other things, a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor or other data processing chip in some embodiments for running program code or processing data stored in the memory 20, e.g. executing an access restriction program or the like.
The memory 20 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 20 may in some embodiments be an internal storage unit of the electronic device, such as a hard disk of the electronic device. The memory 20 may also, in other embodiments, be an external storage device of the electronic device, such as a plug-in hard disk provided on the electronic device, a smart media card (SMC), a Secure Digital (SD) card, a flash card, etc. Further, the memory 20 may also include both internal storage units and external storage devices of the electronic device. The memory 20 may be used not only for storing application software of the electronic device and various types of data, but also for temporarily storing data that has been output or is to be output.
It should be noted that the structure shown in fig. 3 does not constitute a limitation of the electronic device, and in other embodiments the electronic device may comprise fewer or more components than shown, or may combine certain components, or may have a different arrangement of components.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the worker safety behavior detection method as described above.
Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (5)

1. A method for detecting worker safety behavior, the method comprising:
Acquiring an image sample of worker behaviors in a construction site, marking the image sample to obtain a sample data set marked with safety behaviors, preprocessing samples in the sample data set, and dividing the preprocessed sample data set into a training set and a testing set according to a preset proportion;
performing network training through a preset network model according to the training set, and judging the training effect of the preset network model through a preset loss function;
when the change amplitude of the loss function value of the preset network model is smaller than a preset value, importing the test set into the trained preset network model, judging whether the accuracy of the trained preset network model reaches a preset value, and if so, ending the training of the preset network model to obtain a target network model;
Deploying the target network model into a mobile terminal with a shooting function so as to monitor the safety behavior of workers through the mobile terminal;
Acquiring a background picture through the mobile terminal with the shooting function;
Acquiring a detection result picture through the target network model according to the background picture, and judging whether a characteristic behavior exists according to the detection result picture, wherein the characteristic behavior at least comprises a smoking behavior and a safety helmet wearing behavior;
if yes, the early warning information is sent to an administrator for early warning;
the target network model comprises a Backbone end, a Neck end and a Head end; the Backbone end comprises a convolution layer for preliminary feature extraction, multiple shuffle modules for deep feature extraction, and an SPPF module arranged at the end of the Backbone; the Neck end comprises an upsampling module, a splicing module, an EMA module and a dynamic snake convolution layer; the SPPF module comprises several maximum pooling layers; and the step of acquiring a detection result picture through the target network model according to the background picture comprises the following steps:
performing feature extraction on the background picture through the Backbone end to obtain feature extraction maps;
collecting, through the upsampling module, feature maps that have passed through the maximum pooling layers a different number of times, and splicing, through the splicing module, the feature map that has not been max-pooled with the feature maps that have each passed through one more maximum pooling layer, so as to obtain an output feature map;
obtaining target detection feature maps of different sizes from the output feature map through the EMA module and the dynamic snake convolution layer;
performing target category prediction and target position prediction on the target detection feature maps through the Head end, so as to output a detection result picture;
the preset loss function comprises a classification loss function and a localization loss function, the localization loss function comprises an intersection-over-union (IoU) loss function and a distribution focal loss function, and the preset loss function is as follows:
$$L = \lambda_{1} L_{cls} + \lambda_{2} L_{CIoU} + \lambda_{3} L_{DFL}$$
where $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are weight coefficients, $L$ is the preset loss function value, $L_{cls}$ is the classification loss function value, $L_{CIoU}$ is the IoU loss function value, and $L_{DFL}$ is the distribution focal loss function value;
the classification loss function is:
$$L_{cls} = -\sum_{c=1}^{N}\left[y_{c}\log(p_{c}) + (1-y_{c})\log(1-p_{c})\right]$$
where $L_{cls}$ is the classification loss function value, $y_{c}$ is the true class label in the target box, $p_{c}$ is the predicted score for class $c$ in the preset box, and $N$ is the total number of categories;
the IoU loss function is:
$$L_{CIoU} = 1 - IoU + \frac{\rho^{2}(B, B^{gt})}{c^{2}} + \alpha v, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$
where $L_{CIoU}$ is the IoU loss function value, $B$ is the prediction box, $B^{gt}$ is the ground-truth box, $IoU$ is the degree of coverage (overlap) between the predicted and ground-truth bounding boxes, $\rho^{2}(B, B^{gt})$ is the square of the distance between the center points of the two boxes, $c^{2}$ is the square of the diagonal length of the smallest box enclosing both, $\alpha$ is a weight function, $v$ is a function measuring the aspect-ratio similarity between the ground-truth and predicted bounding boxes, $w^{gt}$ is the width of the ground-truth bounding box, $h^{gt}$ is the height of the ground-truth bounding box, $w$ is the width of the predicted bounding box, and $h$ is the height of the predicted bounding box;
the distribution focal loss function is:
$$L_{DFL} = -\left[(y_{i+1} - y)\log(S_{i}) + (y - y_{i})\log(S_{i+1})\right]$$
where $y_{i}$ is the predicted value obtained by rounding $y$ down, $y_{i+1}$ is the predicted value obtained by rounding $y$ up, $y$ is the scaled (down-mapped) distance between the target center point and one of the top, bottom, left or right sides of the prediction box, $S_{i}$ is the prediction probability corresponding to $y_{i}$, and $S_{i+1}$ is the prediction probability corresponding to $y_{i+1}$.
2. The worker safety behavior detection method according to claim 1, wherein the preprocessing of the samples in the sample data set includes:
scaling and unifying the pictures in the sample to a preset pixel size;
carrying out data cleaning on the scaled pictures to remove incomplete or noisy data;
randomly selecting a preset number of cleaned pictures, applying random perturbations to them for data enhancement, and labeling both the enhanced and the cleaned pictures to facilitate subsequent training of the preset network model.
3. A worker safety behavior detection system for realizing the worker safety behavior detection method according to any one of claims 1 to 2, the system comprising:
The data processing module is used for obtaining an image sample of the worker behavior in the construction site, marking the image sample to obtain a sample data set of marked safety behavior, preprocessing samples in the sample data set, and dividing the preprocessed sample data set into a training set and a testing set according to a preset proportion;
the model training module is used for carrying out network training through a preset network model according to the training set, and judging the training effect of the preset network model through a preset loss function;
the model detection module is used for importing the test set into the trained preset network model when the change amplitude of the loss function value of the preset network model is smaller than a preset value, judging whether the accuracy of the trained preset network model reaches a preset value, and if so, ending the training of the preset network model to obtain a target network model;
the model deployment module is used for deploying the target network model into the mobile terminal with the shooting function so as to monitor the safety behavior of workers through the mobile terminal.
4. A computer-readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of the worker safety behavior detection method according to any one of claims 1 to 2.
5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the worker safety behavior detection method of any one of claims 1 to 2 when the program is executed.
CN202410103863.5A 2024-01-25 2024-01-25 Method and system for detecting safety behaviors of workers, storage medium and electronic equipment Active CN117636266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410103863.5A CN117636266B (en) 2024-01-25 2024-01-25 Method and system for detecting safety behaviors of workers, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410103863.5A CN117636266B (en) 2024-01-25 2024-01-25 Method and system for detecting safety behaviors of workers, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117636266A CN117636266A (en) 2024-03-01
CN117636266B true CN117636266B (en) 2024-05-14

Family

ID=90035798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410103863.5A Active CN117636266B (en) 2024-01-25 2024-01-25 Method and system for detecting safety behaviors of workers, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117636266B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022000855A1 (en) * 2020-06-29 2022-01-06 魔门塔(苏州)科技有限公司 Target detection method and device
CN114359606A (en) * 2021-12-17 2022-04-15 西安理工大学 Deep learning-based student classroom behavior detection method, system and terminal
WO2023173598A1 (en) * 2022-03-15 2023-09-21 中国华能集团清洁能源技术研究院有限公司 Fan blade defect detection method and system based on improved ssd model
CN115546630A (en) * 2022-09-14 2022-12-30 国网江苏省电力有限公司无锡供电分公司 Construction site extraction method and system based on remote sensing image characteristic target detection
CN115830533A (en) * 2022-11-25 2023-03-21 淮阴工学院 Helmet wearing detection method based on K-means clustering improved YOLOv5 algorithm
CN116092115A (en) * 2022-11-28 2023-05-09 中国计量大学 Real-time lightweight construction personnel safety dressing detection method
CN116704357A (en) * 2023-08-09 2023-09-05 江西省水利科学院(江西省大坝安全管理中心、江西省水资源管理中心) YOLOv 7-based intelligent identification and early warning method for landslide of dam slope

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
徐锐; 冯瑞. Design of a focal mean-square loss function for convolutional neural networks [卷积神经网络的聚焦均方损失函数设计]. 计算机***应用, 2020, (10), full text. *
肖体刚; 蔡乐才; 汤科元; 高祥; 张超洋. Safety helmet wearing detection method based on improved SSD [改进SSD的安全帽佩戴检测方法]. 四川轻化工大学学报(自然科学版), 2020, (04), full text. *

Also Published As

Publication number Publication date
CN117636266A (en) 2024-03-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant