CN115393905A - Helmet wearing detection method based on attitude correction - Google Patents

Helmet wearing detection method based on posture correction

Info

Publication number
CN115393905A
CN115393905A (application CN202211356734.4A)
Authority
CN
China
Prior art keywords
head
human head
helmet
image
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211356734.4A
Other languages
Chinese (zh)
Inventor
康凯 (Kang Kai)
艾坤 (Ai Kun)
刘海峰 (Liu Haifeng)
王子磊 (Wang Zilei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Zhongke Leinao Intelligent Technology Co ltd
Original Assignee
Hefei Zhongke Leinao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Zhongke Leinao Intelligent Technology Co ltd filed Critical Hefei Zhongke Leinao Intelligent Technology Co ltd
Priority to CN202211356734.4A priority Critical patent/CN115393905A/en
Publication of CN115393905A publication Critical patent/CN115393905A/en
Pending legal-status Critical Current

Classifications

    • G06V 40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 10/242 — Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/764 — Recognition using classification, e.g. of video objects
    • G06V 10/765 — Classification using rules for classification or partitioning the feature space
    • G06V 10/82 — Recognition using neural networks


Abstract

The invention discloses a helmet-wearing detection method based on posture correction, belonging to the technical field of safety-equipment wearing detection. Exploiting the characteristics of the human-head target, the method adds a head-keypoint detection branch to the head detector; the added branch is compatible with current mainstream detectors, and the extra supervision it introduces improves head-detection accuracy. The head posture is then corrected using the detected head keypoints, which weakens the influence of head posture on head-attribute classifiers such as the helmet-wearing classifier and improves classification accuracy. When training the head-attribute classifier, random jitter of the head keypoints is used for data augmentation, which improves the classifier's robustness.

Description

Helmet wearing detection method based on posture correction
Technical Field
The invention relates to the technical field of safety appliance wearing detection, in particular to a safety helmet wearing detection method based on posture correction.
Background
A safety helmet is an indispensable piece of safety equipment for production workers in all industries. However, whether intentionally or by accident, workers frequently fail to wear helmets during actual operations, which creates safety hazards. To protect workers' personal safety, the traditional approach relies on on-site safety supervisors monitoring the scene or watching surveillance video and issuing warnings, but this approach is inefficient, and supervisors tire easily and are prone to missed or false detections.
For example, prior-art head detection simply borrows general-purpose object-detection algorithms and does not tailor the algorithm design to the characteristics of the human head, even though head postures and camera viewpoints in real scenes are complex and variable; moreover, current helmet-wearing attribute classifiers do not account for the influence of head posture on helmet-wearing detection. A helmet-wearing detection method based on posture correction is therefore proposed.
Disclosure of Invention
The technical problem addressed by the invention is how to design a detection algorithm around the characteristics of the human-head target, improve head-detection accuracy, weaken the influence of head posture on head-attribute classifiers such as the helmet-wearing classifier, and improve the classification accuracy of head attributes; a helmet-wearing detection method based on posture correction is provided.
The invention solves this technical problem through the following technical scheme, comprising the following steps:
S1: scale the input image to a set size and send it to a head detector; detection yields head bounding boxes and head keypoints;
S2: for each detected head target, crop the head image from the input image according to the detected head bounding box; compute an alignment transformation matrix from the correspondence between the detected head keypoints and the standard head keypoints, and use it to align the head image to the standard posture and size, producing the aligned head image; send the aligned head image into a head-attribute classifier, which outputs multiple head attributes; and use the output head attributes to decide whether a safety helmet is worn.
Further, in step S1 the head detector comprises a first backbone network module, a detection-head module, and a post-processing module. The detection-head module contains three branches: a head-classification branch, a head-detection branch, and a head-keypoint detection branch. The input image first enters the first backbone network module to obtain image feature representations at multiple spatial scales; the features at each scale are then sent to the three parameter-independent branches of the detection-head module, yielding a head-confidence feature map, a head-bounding-box feature map, and a head-keypoint-coordinate feature map, which the post-processing module processes into the final head bounding boxes and head-keypoint coordinates.
Further, when training the head detector each branch contributes a loss term; for a single sample, the final loss function has the mathematical form:

$$L = L_{cls}(p, y) + y\,\big(\lambda_{box}\,L_{box}(\hat{b}, b) + \lambda_{kpt}\,L_{kpt}(\hat{k}, k)\big)$$

where $L_{cls}$ is the head-classification loss term, which uses a softmax cross-entropy loss function; $p$ is the probability of being predicted as a head and $y$ is the true label of the target. $L_{box}$ is the head-bounding-box regression loss term, which uses a Smooth-L1 loss function; $\hat{b}$ is the predicted head bounding box and $b$ is the ground-truth head bounding box. $L_{kpt}$ is the head-keypoint regression loss term, which also uses a Smooth-L1 loss function; $\hat{k}$ are the predicted head keypoints and $k$ are the ground-truth head keypoints. The $L_{box}$ and $L_{kpt}$ terms are effective only for head targets, and the weights $\lambda_{box}$ and $\lambda_{kpt}$ control the balance between the loss terms.
Further, in step S2 the detected head bounding box is enlarged by a factor of 1.3, and the head image corresponding to the enlarged bounding box is then cropped from the input image.
Further, in step S2, let $\{p_i = (x_i, y_i)\}$ be the detected head keypoints and $\{q_i = (x'_i, y'_i)\}$ the corresponding standard head keypoints, $i = 1, \dots, N$. The relationship between the detected head keypoints and the standard head keypoints is modeled using a similarity transformation, as follows:

$$\begin{pmatrix} x'_i \\ y'_i \end{pmatrix} = s \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$

where $(t_x, t_y)$ are the translation parameters, $s$ is the scale parameter, and $\theta$ is the rotation parameter. Writing $a = s\cos\theta$ and $b = s\sin\theta$, the $N$ corresponding point pairs are stacked into the linear system

$$\begin{pmatrix} x_1 & -y_1 & 1 & 0 \\ y_1 & x_1 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots \\ x_N & -y_N & 1 & 0 \\ y_N & x_N & 0 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ t_x \\ t_y \end{pmatrix} = \begin{pmatrix} x'_1 \\ y'_1 \\ \vdots \\ x'_N \\ y'_N \end{pmatrix}$$

denoted $A u = v$ with $u = (a, b, t_x, t_y)^{\mathsf T}$. Since $2N > 4$, the equation is over-determined, with least-squares solution $u = (A^{\mathsf T} A)^{-1} A^{\mathsf T} v$. From the obtained $u$, the similarity transformation matrix

$$T = \begin{pmatrix} a & -b & t_x \\ b & a & t_y \end{pmatrix}$$

is obtained; applying the transformation matrix $T$ to the head image yields the aligned head image.
Further, in step S2 the head-attribute classifier comprises a second backbone network module and a plurality of classification branches, each connected to the second backbone network module; the input image first enters the second backbone network module to obtain a feature representation of the image, which is then sent to the parameter-independent branches to obtain the head attributes corresponding to the classification branches.
Furthermore, there are at least two classification branches, namely a helmet-wearing detection branch and a helmet-color classification branch, which output the helmet-wearing confidence and the confidence of each helmet color, respectively.
Further, a multi-task loss function is used when training the head-attribute classifier; each branch contributes a loss term, and for a single sample the loss function has the mathematical form:

$$L = L_{wear}(p_w, y_w) + y_w\,L_{color}(p_c, y_c)$$

where $L_{wear}$ is the helmet-wearing classification loss term, which uses a softmax cross-entropy loss function; $p_w$ is the predicted helmet-wearing probability and $y_w$ is the true helmet-wearing label. $L_{color}$ is the helmet-color classification loss term, which also uses a softmax cross-entropy loss function; $p_c$ is the predicted helmet color and $y_c$ is the true helmet-color label. The $L_{color}$ term is effective only for heads that wear a safety helmet.
Furthermore, when training the head-attribute classifier, random jitter of the head keypoints is introduced for data augmentation. The specific operation is: input a head image and its keypoints, add a random offset (the random jitter) to the head keypoints, and then perform the alignment operation on the head image.
Further, in step S2, for a closed construction-site scene the method of deciding whether a safety helmet is worn from the output head attributes comprises the following steps:
A1: preset a threshold;
A2: when the helmet-wearing confidence of the head image is greater than the threshold, judge that the person wears a safety helmet; otherwise, judge that the person does not.
Compared with the prior art, the invention has the following advantages. The posture-correction-based helmet-wearing detection method exploits the characteristics of the human-head target by adding a head-keypoint detection branch to the head detector; the added branch is compatible with current mainstream detectors, and the extra supervision it introduces improves head-detection accuracy. Correcting the head posture with the detected keypoints weakens the influence of head posture on head-attribute classifiers such as the helmet-wearing classifier and improves classification accuracy. Using random jitter of the head keypoints for data augmentation when training the head-attribute classifier improves the classifier's robustness.
Drawings
FIG. 1 is a schematic flow chart of a method for detecting wearing of a helmet based on posture correction according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a human head detector according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a head attribute classifier according to an embodiment of the present invention;
FIG. 4 is a first schematic view of a manner of determining whether a safety helmet is worn according to an embodiment of the present invention;
FIG. 5 is a second schematic view of a manner of determining whether a safety helmet is worn according to an embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
As shown in fig. 1, the present embodiment provides a technical solution: a helmet wearing detection method based on posture correction comprises the following steps:
1) Scale the input image to a fixed size and send it to the head detector, which detects head bounding boxes and head keypoints simultaneously.
2) For each detected head target, perform the following operations:
2.1) Crop the head image from the input image according to the detected head bounding box;
2.2) Compute an alignment transformation matrix from the correspondence between the detected head keypoints and the standard head keypoints, and use it to align the head image to the standard posture and size, producing the aligned head image;
2.3) Send the aligned head image into the head-attribute classifier to obtain head attributes such as the helmet-wearing confidence and the confidence of each helmet color;
2.4) Use the above outputs and other information to determine whether a safety helmet is worn.
In the step S1:
As shown in fig. 2, the input image first enters the backbone network module to obtain image feature representations at multiple spatial scales (fig. 2 shows only 3 scales); the features at each scale are sent to three parameter-independent branches in the detection-head module: the head-classification branch, the head-detection branch, and the head-keypoint detection branch, which predict the head confidence, the head bounding box, and the head-keypoint coordinates, respectively. The results of each branch are processed by the post-processing module (mainly non-maximum suppression) to obtain the final head bounding boxes and head-keypoint coordinates.
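The patent describes post-processing only as "mainly non-maximum suppression". As an illustrative sketch (not the patent's actual implementation), greedy IoU-based NMS over the detected head boxes could look as follows; the kept indices can also be used to gather the corresponding keypoint predictions:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) head confidences.
    Returns the indices of the kept boxes, highest confidence first."""
    order = np.argsort(scores)[::-1]          # process highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # suppress boxes overlapping the kept one
    return keep
```

The 0.5 IoU threshold is a common default, not a value given in the patent.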
The backbone network module is implemented with a convolutional neural network such as ResNet, DenseNet, DarkNet, or MobileNet, with an FPN (Feature Pyramid Network) added on top for multi-scale feature fusion to improve the expressive power of the features. Each branch is itself a small convolutional neural network. Assuming the input feature map of the branches has size $H \times W \times C$, the output feature map of the head-classification branch has size $H \times W \times 2$ (the last dimension holds the head and non-head probabilities); the output feature map of the head-detection branch has size $H \times W \times 4$ (the last dimension holds the top-left corner coordinates and the width and height of the bounding box, or their offsets relative to the corresponding anchor box); and the output feature map of the head-keypoint detection branch has size $H \times W \times 8$ (the last dimension holds the coordinates of the four head keypoints, or their offsets relative to the corresponding anchor box).
When training the head detector each branch contributes a loss term, and the final loss function (for a single sample) has the mathematical form:

$$L = L_{cls}(p, y) + y\,\big(\lambda_{box}\,L_{box}(\hat{b}, b) + \lambda_{kpt}\,L_{kpt}(\hat{k}, k)\big)$$

where $L_{cls}$ is the head-classification loss term; $p$ is the probability of being predicted as a head and $y$ is the true label of the target (1 for a head, 0 for a non-head); this term uses a softmax cross-entropy loss function. $L_{box}$ is the head-bounding-box regression loss term; $\hat{b}$ is the predicted head bounding box and $b$ is the ground-truth head bounding box; this term uses a Smooth-L1 loss function. $L_{kpt}$ is the head-keypoint regression loss term; $\hat{k}$ are the predicted head keypoints and $k$ are the ground-truth head keypoints; this term also uses a Smooth-L1 loss function. The $L_{box}$ and $L_{kpt}$ terms are effective only for head targets, which is why they are preceded by the coefficient $y$; the weights $\lambda_{box}$ and $\lambda_{kpt}$ control the balance between the loss terms.
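A minimal single-sample NumPy sketch of this loss follows. The symbol names ($\lambda_{box}$, $\lambda_{kpt}$) and the Smooth-L1 transition point $\beta = 1$ are assumptions for illustration; the patent does not specify the exact hyper-parameters:

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy for one sample; logits: (num_classes,), label: int."""
    z = logits - logits.max()                       # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def smooth_l1(pred, target):
    """Smooth-L1 (Huber, beta = 1) summed over coordinates."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def detector_loss(cls_logits, box_pred, box_gt, kpt_pred, kpt_gt, y,
                  lambda_box=1.0, lambda_kpt=1.0):
    """L = L_cls + y * (lambda_box * L_box + lambda_kpt * L_kpt).
    y is the true label: 1 for a head target, 0 otherwise, so the regression
    terms are gated off for non-head samples exactly as in the formula above."""
    l_cls = softmax_cross_entropy(cls_logits, y)
    l_box = smooth_l1(box_pred, box_gt)
    l_kpt = smooth_l1(kpt_pred, kpt_gt)
    return l_cls + y * (lambda_box * l_box + lambda_kpt * l_kpt)
```

In a real training loop the same gating is applied per anchor rather than per image, and the loss would normally be written in a framework with automatic differentiation.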
In step 2.1:
considering that the detected human head image is rotated by the subsequent alignment operation, in order to avoid introducing too many invalid regions on the rotated human head image, the detected human head bounding box is enlarged by 1.3 times, and then the human head image is intercepted according to the enlarged human head bounding box.
In step 2.2:
is provided with
Figure 602557DEST_PATH_IMAGE015
Is a key point of the detected human head,
Figure 244891DEST_PATH_IMAGE016
is a key point of the corresponding standard head of a person,
Figure 724414DEST_PATH_IMAGE017
(ii) a Standard headThe key point is the calculated mean over one additional data set.
In this embodiment, four standard head key points of the head, the neck, the left ear and the right ear are adopted, and the relationship between the detected head key points and the standard head key points is modeled by similarity transformation, as follows:
Figure 161212DEST_PATH_IMAGE018
wherein the content of the first and second substances,
Figure 726185DEST_PATH_IMAGE019
is a matrix of similarity transformation, which is,
Figure 906631DEST_PATH_IMAGE020
is a parameter of the translation that is,
Figure 506239DEST_PATH_IMAGE021
in order to be a scale parameter,
Figure 113938DEST_PATH_IMAGE044
is a rotation parameter;
note the book
Figure 166208DEST_PATH_IMAGE046
By using
Figure 150344DEST_PATH_IMAGE024
The corresponding points are listed in the following equation:
Figure 604459DEST_PATH_IMAGE047
denote the above formula as
Figure 383060DEST_PATH_IMAGE026
Wherein
Figure 657046DEST_PATH_IMAGE027
Because of
Figure 179294DEST_PATH_IMAGE048
So the equation is generally an over-determined equation with a least squares solution of
Figure 753495DEST_PATH_IMAGE028
. According to the obtained
Figure 702997DEST_PATH_IMAGE029
A similarity transformation matrix can be obtained
Figure 62866DEST_PATH_IMAGE030
(ii) a Transforming the matrix
Figure 123226DEST_PATH_IMAGE030
And acting on the head image to obtain the aligned head image.
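The least-squares estimate above can be sketched in NumPy. Applying the 2×3 matrix to the image itself would typically use an image-warping routine such as OpenCV's `warpAffine` (an assumption, not shown here), so this sketch only estimates the matrix and applies it to points:

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares similarity transform mapping src -> dst.
    src, dst: (N, 2) keypoint arrays, N >= 2. Returns the 2x3 matrix T."""
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    v = dst.reshape(-1)                        # [x'_1, y'_1, x'_2, y'_2, ...]
    # rows for x': a*x - b*y + tx ; rows for y': b*x + a*y + ty
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)])
    a, b, tx, ty = np.linalg.lstsq(A, v, rcond=None)[0]   # u = (A^T A)^-1 A^T v
    return np.array([[a, -b, tx],
                     [b,  a, ty]])

def apply_transform(T, pts):
    """Apply a 2x3 similarity/affine transform to (N, 2) points."""
    return pts @ T[:, :2].T + T[:, 2]
```

With exact correspondences the recovered matrix reproduces the target points; with noisy detections it is the least-squares fit, which is exactly what the alignment step needs.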
In step 2.3:
As shown in fig. 3, the input image first enters the backbone network module to obtain a feature representation of the image, and is then sent to parameter-independent branch networks, such as the helmet-wearing detection branch and the helmet-color classification branch, which predict the helmet-wearing confidence and the confidences of the helmet colors, respectively. Other branches (e.g. head pose, occlusion degree, age) can be added according to the application scene of the head-attribute classifier.
The backbone network can be implemented with a classical CNN such as ResNet, DenseNet, or MobileNet. Each branch contributes a loss term, and a multi-task loss function is used when training the head-attribute classifier; for a single sample its mathematical form is:

$$L = L_{wear}(p_w, y_w) + y_w\,L_{color}(p_c, y_c)$$

where $L_{wear}$ is the helmet-wearing classification loss term; $p_w$ is the predicted helmet-wearing probability and $y_w$ is the true helmet-wearing label (1 with helmet, 0 without); this term uses a softmax cross-entropy loss function. $L_{color}$ is the helmet-color classification loss term; $p_c$ is the predicted helmet color and $y_c$ is the true helmet-color label; this term also uses a softmax cross-entropy loss function. The $L_{color}$ term is effective only for heads wearing a safety helmet, which is why it is preceded by the coefficient $y_w$.
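A single-sample NumPy sketch of this gated multi-task loss; the two-class wear head and three-class colour head used in the test are illustrative assumptions:

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy for one sample; logits: (num_classes,), label: int."""
    z = logits - logits.max()                  # numerical stability
    return -(z - np.log(np.exp(z).sum()))[label]

def attribute_loss(wear_logits, y_wear, color_logits, y_color):
    """L = L_wear + y_wear * L_color.
    The colour term contributes only when a helmet is actually worn (y_wear = 1),
    matching the gating coefficient in the formula above."""
    l_wear = softmax_cross_entropy(wear_logits, y_wear)
    l_color = softmax_cross_entropy(color_logits, y_color) if y_wear == 1 else 0.0
    return l_wear + l_color
```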
It should be noted that, when training the head-attribute classifier, random-jitter data augmentation of the head keypoints is introduced to weaken the influence of head-keypoint detection errors and improve the model's robustness. The specific operation is: input the head image and its keypoints, add a random offset (the random jitter) to the head keypoints, and then perform the alignment operation on the head image. This augmentation can be performed offline or online.
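The jitter itself can be sketched as an independent uniform offset per keypoint coordinate; the ±3-pixel magnitude below is an assumption, as the patent does not specify the offset distribution:

```python
import numpy as np

def jitter_keypoints(keypoints, max_offset=3.0, rng=None):
    """Add an independent uniform offset in [-max_offset, max_offset] to each
    keypoint coordinate. keypoints: (N, 2) array; returns a jittered copy."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(-max_offset, max_offset, size=keypoints.shape)
    return keypoints + noise
```

The jittered keypoints are then fed to the same alignment step used at inference, so the classifier is trained on imperfectly aligned heads.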
In step 2.4:
and judging whether the safety helmet is worn by using the output of the head attribute classifier and other information.
Mode one (see fig. 4): and presetting a threshold, when the wearing confidence of the safety helmet of the head image is greater than the threshold, judging that the person wears the safety helmet, otherwise, judging that the person does not wear the safety helmet. The method is suitable for closed construction site scenes.
Mode two (see fig. 5): firstly, a worker judging module (mainly comprising a human body detector and a human body attribute classifier) is utilized to judge whether a detected person is a worker, the worker is judged according to a first mode, and is directly ignored in a second non-worker mode, and the method is suitable for a scene that passers-by often pass around a construction site.
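The two decision modes can be sketched as follows; the worker-judging module is abstracted as a boolean input, and the 0.5 default threshold is an example value, not one given in the patent:

```python
def decide_helmet(wear_confidence, threshold=0.5):
    """Mode one: threshold the helmet-wearing confidence."""
    return wear_confidence > threshold

def decide_with_worker_filter(wear_confidence, is_worker, threshold=0.5):
    """Mode two: ignore non-workers (returns None), otherwise apply mode one."""
    if not is_worker:
        return None
    return decide_helmet(wear_confidence, threshold)
```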
It should be noted that a head attribute such as helmet wearing is independent of the head pose angle, but for the head-attribute classifier, changing the pose angle of the head target in the input image may change the prediction; that is, the head-attribute classifier is not rotation-invariant. To weaken the influence of head posture on the classifier, the invention introduces head-keypoint alignment, which aligns head images in different postures to the standard size and upright orientation. In addition, head-keypoint alignment reduces the background area in the head image and weakens the classifier's dependence on any specific head detector: different head detectors produce bounding boxes of different sizes, but keypoint alignment normalizes all head images to the same standard size and style.
In summary, the posture-correction-based helmet-wearing detection method of the above embodiment exploits the characteristics of the human-head target by adding a head-keypoint detection branch to the head detector; the added branch is compatible with current mainstream detectors, and the extra supervision it introduces improves head-detection accuracy. Correcting the head posture with the detected keypoints weakens the influence of head posture on head-attribute classifiers such as the helmet-wearing classifier and improves classification accuracy. Using random jitter of the head keypoints for data augmentation when training the head-attribute classifier improves the classifier's robustness.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. A helmet wearing detection method based on posture correction, characterized by comprising the following steps:
S1: scale the input image to a set size and send it to a head detector; detection yields head bounding boxes and head keypoints;
S2: for each detected head target, crop the head image from the input image according to the detected head bounding box; compute an alignment transformation matrix from the correspondence between the detected head keypoints and the standard head keypoints, and use it to align the head image to the standard posture and size, producing the aligned head image; send the aligned head image into a head-attribute classifier, which outputs multiple head attributes; and use the output head attributes to decide whether a safety helmet is worn.
2. The helmet wearing detection method based on posture correction according to claim 1, characterized in that: in step S1, the head detector comprises a first backbone network module, a detection-head module, and a post-processing module; the detection-head module contains three branches, namely a head-classification branch, a head-detection branch, and a head-keypoint detection branch; the input image first enters the first backbone network module to obtain image feature representations at multiple spatial scales, and the features at each scale are sent to the three parameter-independent branches of the detection-head module to obtain a head-confidence feature map, a head-bounding-box feature map, and a head-keypoint-coordinate feature map, respectively, which are processed by the post-processing module into the final head bounding boxes and head-keypoint coordinates.
3. The helmet wearing detection method based on posture correction according to claim 2, characterized in that: when training the human head detector, each branch contributes one loss term; for a single sample, the final loss function takes the form

$$L = L_{cls}(p, y) + \lambda_1\, y\, L_{box}(\hat{b}, b) + \lambda_2\, y\, L_{kpt}(\hat{k}, k)$$

where $L_{cls}$ is the head classification loss term, which adopts a softmax cross-entropy loss function, $p$ is the predicted head probability, and $y$ is the true label of the target; $L_{box}$ is the head bounding box regression loss term, which adopts a Smooth-L1 loss function, $\hat{b}$ is the predicted head bounding box, and $b$ is the ground-truth head bounding box; $L_{kpt}$ is the head keypoint regression loss term, which also adopts a Smooth-L1 loss function, $\hat{k}$ denotes the predicted head keypoints, and $k$ the ground-truth head keypoints. The $L_{box}$ and $L_{kpt}$ loss terms are effective only for human head targets (hence the factor $y$), and in addition $\lambda_1$ and $\lambda_2$ control the relative weight of the loss terms.
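The per-sample loss described above can be sketched in numpy. All function names, the class layout (index 1 = head), and the default weights are illustrative assumptions, since the claim does not fix an implementation:

```python
import numpy as np

def softmax_ce(logits, label):
    """Softmax cross-entropy for one sample; label is the class index."""
    z = logits - logits.max()                     # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def smooth_l1(pred, target):
    """Smooth-L1 (Huber with delta=1) summed over coordinates."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def head_loss(cls_logits, box_pred, kpt_pred, y, box_gt, kpt_gt,
              lam1=1.0, lam2=1.0):
    """Total per-sample loss: classification plus weighted box and
    keypoint regression; the regression terms only count when the
    target really is a head (y == 1)."""
    loss = softmax_ce(cls_logits, y)
    if y == 1:  # regression terms are effective only for head targets
        loss += lam1 * smooth_l1(box_pred, box_gt)
        loss += lam2 * smooth_l1(kpt_pred, kpt_gt)
    return loss
```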
4. The helmet wearing detection method based on posture correction according to claim 3, characterized in that: in step S2, let $p_i = (x_i, y_i)$, $i = 1, \dots, N$, be the detected head keypoints and $q_i = (x_i', y_i')$ the corresponding standard head keypoints. The relationship between the detected head keypoints and the standard head keypoints is modeled by a similarity transformation, as follows:

$$q_i = M \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}, \qquad M = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \end{bmatrix}$$

where $M$ is the similarity transformation matrix, $(t_x, t_y)$ are the translation parameters, $s$ is the scale parameter, and $\theta$ is the rotation parameter. Writing $a = s\cos\theta$, $b = s\sin\theta$, and $u = (a, b, t_x, t_y)^{\mathsf T}$, each pair of corresponding points yields the equations

$$\begin{bmatrix} x_i & -y_i & 1 & 0 \\ y_i & x_i & 0 & 1 \end{bmatrix} u = \begin{bmatrix} x_i' \\ y_i' \end{bmatrix}.$$

Stacking these equations over all $N$ points and denoting the result $A u = c$, with $A \in \mathbb{R}^{2N \times 4}$ and $c \in \mathbb{R}^{2N}$, the system is overdetermined and has the least-squares solution $u = (A^{\mathsf T} A)^{-1} A^{\mathsf T} c$. From the obtained $u$, the similarity transformation matrix $M$ can be assembled; applying the transformation matrix $M$ to the head image yields the aligned head image.
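The least-squares alignment of claim 4 can be sketched in numpy as follows; the function names are hypothetical, and `np.linalg.lstsq` plays the role of the closed-form solution $u = (A^{\mathsf T} A)^{-1} A^{\mathsf T} c$:

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src -> dst.

    src, dst: (N, 2) arrays of corresponding keypoints, N >= 2.
    Returns the 2x3 matrix M = [[a, -b, tx], [b, a, ty]],
    where a = s*cos(theta) and b = s*sin(theta)."""
    A = np.zeros((2 * len(src), 4))
    c = dst.reshape(-1)                 # interleaved [x0', y0', x1', y1', ...]
    for i, (x, y) in enumerate(src):
        A[2 * i]     = [x, -y, 1.0, 0.0]
        A[2 * i + 1] = [y,  x, 0.0, 1.0]
    # Overdetermined system A @ [a, b, tx, ty] = c; least-squares solution.
    a, b, tx, ty = np.linalg.lstsq(A, c, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])

def apply_similarity(M, pts):
    """Apply a 2x3 similarity matrix to (N, 2) points."""
    return pts @ M[:, :2].T + M[:, 2]
```

In practice the returned 2×3 matrix would be handed to an affine image-warping routine to produce the aligned head crop.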
5. The helmet wearing detection method based on posture correction according to claim 4, characterized in that: in step S2, the head attribute classifier comprises a second backbone network module and a plurality of classification branches, each connected to the second backbone network module. The input image first passes through the second backbone network module to obtain a feature representation of the image, which is then fed to the parameter-independent classification branches to obtain the head attributes corresponding to the respective branches.
6. The helmet wearing detection method based on posture correction according to claim 5, characterized in that: there are at least two classification branches, namely a helmet wearing detection branch and a helmet color classification branch, which output the helmet wearing confidence and the confidence for each helmet color, respectively.
7. The helmet wearing detection method based on posture correction according to claim 5, characterized in that: when training the head attribute classifier, random jitter of the head keypoints is introduced as data augmentation. The specific operation is as follows: given a head image and its keypoints, a random offset (jitter) is added to the head keypoints, and the alignment operation is then performed on the head image using the jittered keypoints.
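The jitter augmentation of claim 7 can be sketched in a few lines of numpy; the function name and the offset amplitude are illustrative assumptions:

```python
import numpy as np

def jitter_keypoints(kpts, max_offset=2.0, rng=None):
    """Add a uniform random offset (jitter) to each head keypoint.

    kpts: (N, 2) array of keypoint coordinates in pixels.
    max_offset: jitter amplitude in pixels (an illustrative default).
    """
    rng = np.random.default_rng() if rng is None else rng
    return kpts + rng.uniform(-max_offset, max_offset, size=kpts.shape)
```

During training, the alignment of claim 4 is then computed from the jittered keypoints, so the classifier sees slightly misaligned crops and becomes robust to imprecise keypoint detections.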
8. The helmet wearing detection method based on posture correction according to claim 7, characterized in that: in step S2, the method of determining whether the safety helmet is worn using the output head attributes comprises the following steps:
a1: presetting a threshold;
a2: when the helmet wearing confidence of the head image is greater than the threshold, judging that the person is wearing a safety helmet; otherwise, judging that the person is not wearing one.
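The decision rule above reduces to a single comparison; the function name and the default threshold are illustrative, since the claim leaves the threshold value open:

```python
def wearing_helmet(confidence, threshold=0.5):
    """Decision rule of claim 8: a person is judged to be wearing a
    safety helmet iff the helmet-wearing confidence exceeds the preset
    threshold (0.5 is an illustrative value, not fixed by the claim)."""
    return confidence > threshold
```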
CN202211356734.4A 2022-11-01 2022-11-01 Helmet wearing detection method based on attitude correction Pending CN115393905A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211356734.4A CN115393905A (en) 2022-11-01 2022-11-01 Helmet wearing detection method based on attitude correction

Publications (1)

Publication Number Publication Date
CN115393905A (en) 2022-11-25

Family

ID=84114918

Country Status (1)

Country Link
CN (1) CN115393905A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263686A * 2019-06-06 2019-09-20 Wenzhou University Deep-learning-based safety helmet detection method for construction site images
CN113052140A * 2021-04-25 2021-06-29 Hefei Zhongke Leinao Intelligent Technology Co., Ltd. Video-based substation personnel and vehicle violation detection method and system
CN113158851A * 2021-04-07 2021-07-23 Zhejiang Dahua Technology Co., Ltd. Safety helmet wearing detection method and device, and computer storage medium
CN113807240A * 2021-09-15 2021-12-17 State Grid Hebei Electric Power Co., Ltd., Hengshui Power Supply Branch Smart substation personnel dress monitoring method based on non-cooperative face recognition
CN114495191A * 2021-11-30 2022-05-13 Zhuhai Yizhi Electronic Technology Co., Ltd. Edge-side combined real-time safety helmet wearing detection method
CN115035088A * 2022-06-28 2022-09-09 China University of Petroleum (East China) Helmet wearing detection method based on YOLOv5 and pose estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination