CN113449609A

CN113449609A - Subway violation early warning method based on improved HigherHRNet model and DNN (deep neural network)

Info

Publication number: CN113449609A
Application number: CN202110643907.XA
Authority: CN
Inventors: 张义红; 蒲安会; 李德敏
Original assignee: Donghua University
Current assignee: Donghua University
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2021-09-28

Abstract

The invention relates to a subway violation early warning method based on an improved HigherHRNet model and a DNN network, which comprises the following steps: 1) for the subway behavior image to be detected, extracting human joint features, including joint point position features, joint point movement features and human body movement features, by constructing a HigherHRNet model; 2) constructing a data set from the extracted human joint features after preprocessing; 3) identifying violations in the data set with the DNN deep neural network, and thereby judging whether a violation occurs. Compared with the prior art, the invention has the advantages of accurate identification, distinguishing real persons from reflections, and effective identification of lying postures and of passengers not wearing masks.

Description

Subway violation early warning method based on improved HigherHRNet model and DNN (deep neural network)

Technical Field

The invention relates to the technical field of public traffic target detection, in particular to a subway violation early warning method based on an improved HigherHRNet model and a DNN (deep neural network).

Background

The HigherHRNet model is currently the most advanced bottom-up algorithm for multi-person joint point identification; it locates joint points more accurately and can identify the joint points of small human figures in a picture. In a localization task, the position of the target needs to be preserved at high resolution and then added back during deconvolution and upsampling, so that the target can be located. For example, PersonLab generates a high-resolution heat map by increasing the input resolution, and Hourglass recovers high resolution through a symmetric low-to-high process. SimpleBaseline uses a small number of transposed convolution layers to generate a high-resolution representation. However, these methods cannot generate a high-resolution heat map, and therefore cannot meet the needs of detection and early warning of current subway violations (such as not wearing a mask).

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provide a subway violation early warning method based on an improved HigherHRNet model and a DNN network.

The purpose of the invention can be realized by the following technical scheme:

a subway violation early warning method based on an improved HigherHRNet model and a DNN network comprises the following steps:

1) for the subway behavior image to be detected, extracting human joint features including joint point position features, joint point movement features and human body movement features by constructing a HigherHRNet model;

2) constructing a data set after preprocessing according to the extracted human body joint features;

3) identifying violations in the data set with the DNN deep neural network, and thereby judging whether a violation occurs.

In the step 1), for the blurred and unclear image to be detected, super-resolution reconstruction processing is adopted to enable the resolution of the reconstructed image to meet the requirements of a HigherHRNet model.

In the step 1), for the case where the subway windows act as mirrors and reflect the figures inside the carriage at night, a double-threshold method is adopted to judge whether a detected figure is a reflection, and the reflected figures are removed.

The double-threshold method judges whether a figure is a reflection through the connecting lines between all the joint points, and specifically comprises the following steps:

11) the n joint points I_1, I_2, I_3, ..., I_n corresponding to a human figure or a reflected figure in the image are connected to form n connecting lines, wherein S_b is a given length threshold: if the sum c_1 of the distances between adjacent joint points is larger than the length threshold, the figure is determined to be a real person, and if it is smaller than the length threshold, the figure is determined to be a reflection;

12) further judging on the judgment result of the step 11), specifically:

W_x＝max(I_1x,I_2x,I_3x...I_nx)-min(I_1x,I_2x,I_3x...I_nx)

W_y＝max(I_1y,I_2y,I_3y...I_ny)-min(I_1y,I_2y,I_3y...I_ny)

S＝W_x*W_y

wherein W_x and W_y are respectively the transverse and longitudinal extents of all the joint points, I_1x and I_1y are the transverse and longitudinal coordinates of joint point I_1 (and likewise for the other joint points), S is the area of the rectangle spanned by all the joint points, and when this area is larger than a set threshold, the target is judged to be a real person.

When the HigherHRNet model regresses the joint points, it identifies the lying posture by lowering the coincidence degree threshold between the vector ab and the vector ba that the two joint points a and b generate toward each other.

In the step 2), the preprocessing is specifically PCA dimension reduction.

In the step 2), the behavior data sets corresponding to different violation behaviors are used as the input of the DNN deep neural network for training.

In the illegal action identification, continuous frame picture data of videos containing multiple illegal actions are labeled to form data sets of different illegal actions.

For face and mask recognition, the positions of a person's eyes are obtained through the joint point detection of the HigherHRNet model, the face is located according to the positions of the eyes, and a MobileNet V2 network is then adopted to recognize whether a mask is worn on the face.

The illegal behaviors specifically comprise running, jumping, lying, boxing, kicking, putting feet on a seat, doing pull-ups, smoking and throwing garbage.

Compared with the prior art, the invention has the following advantages:

The invention gives full play to the respective advantages of the HigherHRNet model and the DNN network. On the basis of the HigherHRNet model accurately identifying the joint points and distinguishing real persons from reflections, the DNN network is combined to identify violations accurately and quickly, which solves specific problems such as mirror images being misidentified in night pictures, pedestrians not being identified when lying down, and recognition when pedestrians wear masks.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a schematic structural diagram of a HigherHRNet model.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments.

As shown in FIG. 1, the invention provides a subway violation early warning method based on an improved HigherHRNet model and a DNN network. The adopted HigherHRNet is an improvement on HRNet and solves the prior-art problem of low precision when detecting the joints of small-scale pedestrians in an image. The key technical point is that the detection task adapts to scale change: the performance on small human bodies is improved and high-quality, high-resolution heat maps are generated without sacrificing the performance on large human bodies, so that the key points of small persons are located accurately.

The invention is set against the background of deep learning and combines artificial intelligence with image recognition to recognize various violations on the subway, including whether a mask is worn, and specifically comprises the following three aspects:

(1) detecting the joint points of a person;

(2) obtaining a feature vector of the person from the position information of the person's joint points, and inputting the feature vector into a DNN (deep neural network) to obtain a result;

(3) locating the face using the MobileNetV2 model.

The invention uses an improved high-resolution network (HigherHRNet) to recognize human body postures. The high-resolution network is taken as one branch and is run in parallel with several low-resolution networks to form a multi-branch parallel network. First, the resolution of the picture on the high-resolution branch is never reduced. Second, the branches are not merely fused at the final output; instead, they interact continuously, with the high-resolution network providing position information to the low-resolution networks and the low-resolution networks providing feature information to the high-resolution network. Features are then extracted from the human joints obtained by HigherHRNet, mainly the position features of the joint points, the movement features of the joint points and the movement features of the human body. A data set is made from these features, and the features are preprocessed before training, specifically by reducing the dimension of the original 334 features with PCA. Finally, the preprocessed features are fed into different models for experimental comparison to obtain the accuracy of the models under different evaluation standards, and a confusion matrix is drawn for each model. In addition, for mask identification, a MobileNet V2 model is selected to locate and recognize the face.

Examples

The invention first uses an improved HigherHRNet model (as shown in FIG. 2) to detect pedestrian joint points and build the model. For blurred and unclear pictures, super-resolution processing is applied so that the reconstructed resolution meets the input requirement of HigherHRNet and the network can detect the joint points smoothly. The invention adopts existing super-resolution reconstruction techniques, namely bilinear interpolation and deep-learning-based super-resolution.
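As an illustration of the bilinear interpolation mentioned above, the following is a minimal sketch that upscales a small grayscale image stored as a list of rows. The function name `bilinear_upscale`, the integer scale factor and the corner-aligned sampling are assumptions made for this example; the patent does not specify how the interpolation is implemented.

```python
def bilinear_upscale(img, scale):
    """Upscale a 2-D grayscale image (list of rows) by an integer factor.

    Assumption: corner-aligned sampling, so the four corner pixels of the
    input are reproduced exactly in the output.
    """
    h, w = len(img), len(img[0])
    out_h, out_w = h * scale, w * scale
    out = []
    for r in range(out_h):
        # Map the output row back to a fractional source row.
        y = r * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(out_w):
            # Map the output column back to a fractional source column.
            x = c * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


small = [[0.0, 10.0], [20.0, 30.0]]
big = bilinear_upscale(small, 2)  # 4 x 4 image
```

In practice an image library would normally be used for this step; the sketch only shows the weighting of the four neighbouring pixels.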

However, the HigherHRNet model is a laboratory product. Its accuracy is currently the best; in particular, it detects small human targets in a picture well and adapts well to lighting. In a real subway car, however, the pictures taken by the in-car camera are not ideal. At night, for example, the subway windows reflect the figures of passengers, and because the model adapts to changes in light, it also predicts joint points for the figures in the mirror image. To address this problem, the invention adopts a double-threshold method that eliminates the figures reflected in the window according to two thresholds. Specifically, the connecting lines between all the joint points are used to judge whether a figure is a mirror reflection, and the judgment is divided into two steps. The first step is to connect the n joint points of a detected figure, denoted I_1, I_2, I_3, ..., I_n, giving n connecting lines in total. If the total length of the connecting lines is larger than the set length threshold S_b, the figure is a real person, and if it is smaller than the set length threshold, the figure is a mirror image. After the first step, the second step of judgment is carried out, with the following criteria:

W_x＝max(I_1x,I_2x,I_3x...I_nx)-min(I_1x,I_2x,I_3x...I_nx)

W_y＝max(I_1y,I_2y,I_3y...I_ny)-min(I_1y,I_2y,I_3y...I_ny)

S＝W_x*W_y, where W_x and W_y denote the extents of all the joint points in the transverse and longitudinal directions, I_1x, ..., I_nx denote the transverse coordinates of the joint points, I_1y, ..., I_ny denote their longitudinal coordinates, and S denotes the area. Only when both the total length of the connecting lines and the area S of all the joint points meet the threshold requirements is the target judged to be a real person.
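The two-step judgment can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not give the exact formula for the connecting lines, so the sketch assumes joint i is linked to joint i+1 and the last joint back to the first (which yields n lines for n joints). The function name `is_real_person` and the threshold values in the usage lines are hypothetical.

```python
import math


def is_real_person(joints, length_threshold, area_threshold):
    """Apply the double-threshold test to one detected skeleton.

    joints: list of (x, y) pixel coordinates I_1 ... I_n.
    Returns True only if both thresholds are exceeded.
    """
    n = len(joints)
    # Step 1: total length of the n connecting lines.
    # Assumption: joints are linked in order and the chain is closed.
    total_length = sum(
        math.dist(joints[i], joints[(i + 1) % n]) for i in range(n)
    )
    if total_length <= length_threshold:
        return False  # too small: treated as a reflection

    # Step 2: area S = W_x * W_y of the rectangle spanned by the joints.
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    w_x = max(xs) - min(xs)
    w_y = max(ys) - min(ys)
    return w_x * w_y > area_threshold


# Hypothetical thresholds and skeletons, for illustration only.
passenger = [(100, 50), (140, 50), (140, 250), (100, 250)]
reflection = [(10, 10), (20, 10), (20, 40), (10, 40)]
thin_line = [(0, 0), (300, 0), (300, 1), (0, 1)]

passenger_ok = is_real_person(passenger, 200, 2000)
reflection_ok = is_real_person(reflection, 200, 2000)
thin_line_ok = is_real_person(thin_line, 200, 2000)
```

The third skeleton passes the length test but fails the area test, which is the situation the second threshold is meant to catch.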

When the HigherHRNet model regresses joint points, the 17 joint points of a target influence one another and are connected. Each time a joint point a is identified, it extends toward the other joint points to form vectors: for example, joint point a extends to joint point b to form the vector ab, and joint point b also extends to joint point a to form the vector ba. When the coincidence degree of the vectors ab and ba is high, the located joint points are considered correct. In practical application, the joint points of a target person are often blocked by objects and cannot be completely detected, so the model is not accurate enough when regressing the joint points. In addition, the people in an ordinary training set are basically in a standing posture, so the model cannot learn the characteristics of a lying person, and the joint points of such a person cannot be regressed accurately. The invention therefore lowers the coincidence degree threshold between the vectors ab and ba so that the lying posture can be identified.
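The effect of the coincidence threshold can be illustrated with a small sketch. The patent does not define how the coincidence degree is computed, so the score used here is purely an assumption: it equals 1 when the vector ba is exactly the reverse of the vector ab and decreases as the two disagree. The names `coincidence` and `accept_pair` and the threshold values are hypothetical.

```python
import math


def coincidence(ab, ba):
    """Hypothetical coincidence score in [0, 1] for two 2-D vectors.

    ab is the vector predicted from joint a toward joint b, and ba the
    vector predicted from joint b toward joint a. If the two predictions
    agree perfectly, ba == -ab and the score is 1.
    """
    norm_sum = math.hypot(*ab) + math.hypot(*ba)
    if norm_sum == 0:
        return 1.0
    residual = math.hypot(ab[0] + ba[0], ab[1] + ba[1])
    return 1.0 - residual / norm_sum


def accept_pair(ab, ba, threshold):
    """Keep the joint pair only if the score reaches the threshold."""
    return coincidence(ab, ba) >= threshold


# A slightly inconsistent pair, as might occur for a lying or occluded person.
ab, ba = (10.0, 0.0), (-8.0, 3.0)
strict = accept_pair(ab, ba, 0.9)    # rejected with a strict threshold
relaxed = accept_pair(ab, ba, 0.75)  # accepted once the threshold is lowered
```

Lowering the threshold admits pairs whose mutual predictions agree less well, at the cost of accepting more false joint pairs.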

After the joint point detection model is established, the action features are preprocessed for the violation recognition model, with PCA adopted for dimension reduction. PCA sorts the components by the magnitude of the eigenvalues of the covariance matrix, retains the most important leading components and discards the trailing dimensions, which reduces the dimension so as to simplify the model or compress the data while preserving the information of the original data to the greatest extent. Because violations are diverse, the data set that is created should also cover the different actions. Because action recognition needs a large amount of continuous-frame picture data, the invention labels a section of video: first, the video must contain several kinds of uncivilized actions; second, each action must last for a period of time; and further, the durations of the actions must not differ greatly. Finally, a fully connected DNN is adopted for training.
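The PCA step described above (eigen-decomposition of the covariance matrix, keeping the leading components) can be sketched with NumPy as follows. The 334-dimensional input matches the feature count given in the description; the number of samples, the number of retained components and the random data are assumptions for illustration only.

```python
import numpy as np


def pca_reduce(features, k):
    """Project the rows of `features` onto the k leading principal components.

    Returns the reduced data and the eigenvalues (variances) that were kept.
    """
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is used because the covariance matrix is symmetric;
    # it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest
    components = eigvecs[:, order]
    return centered @ components, eigvals[order]


# Placeholder data: 200 samples of 334 features (sample count is assumed).
rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 334))
reduced, kept_variance = pca_reduce(samples, 20)
```

The reduced feature matrix would then serve as the input to the fully connected network.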

In the aspect of face mask recognition, the invention does not adopt another face detection model to locate the face, but uses the joint point detection of the improved HigherHRNet model together with a trained MobileNet V2 network. The data set used in the invention contains the joint points of the human face, including the eyes, nose and ears, which can be recognized even when a mask is worn. When the joint point model is trained, the features of the face are used as joint points, so that the important feature points of the face are obtained directly, and a frame for the whole face is then predicted from these feature points. When the face is covered by a mask, the HigherHRNet model can still detect the eyes, and the approximate region of the whole face is judged from the positions of the eyes. The prediction method is specifically as follows:

wherein L_low, L_up, L_left and L_right respectively represent the lower, upper, left and right lines of the face rectangular frame; d_eyel and d_eyer respectively represent the left-eye and right-eye pixel positions; and d_low, d_up, d_left and d_right respectively represent the pixel distances from the center point between the two eyes to the mandible, the vertex of the head, the left ear and the right ear. These pixel positions are derived from the ratio of the pixel distance to the true distance. The distance between the two eyes is taken as the standard d_m: the distance from the left eye to the left frame is 1.5 x d_m, the distance from the right eye to the right frame is 1.5 x d_m, the distance between the vertex of the head and the center point between the two eyes is 1.5 x d_m, and the distance between the mandible and the center point between the two eyes is 2.5 x d_m. Finally, a face positioning picture is obtained in this way, and a MobileNet V2 network is then trained and used to recognize whether a mask is worn.
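The multiples of d_m given above are enough to sketch the face frame computation. The following is a minimal illustration, assuming image coordinates with y increasing downward, an axis-aligned frame, and that the eye with the smaller x coordinate lies nearer the left frame line; the function name `face_box` is hypothetical.

```python
import math


def face_box(eye_a, eye_b):
    """Estimate the face rectangle from two eye positions in pixels.

    Returns (L_left, L_up, L_right, L_low).
    Assumptions: y grows downward and the frame is axis-aligned.
    """
    d_m = math.dist(eye_a, eye_b)      # inter-eye distance, the standard
    cy = (eye_a[1] + eye_b[1]) / 2     # y of the center point between the eyes
    l_left = min(eye_a[0], eye_b[0]) - 1.5 * d_m
    l_right = max(eye_a[0], eye_b[0]) + 1.5 * d_m
    l_up = cy - 1.5 * d_m              # toward the vertex of the head
    l_low = cy + 2.5 * d_m             # toward the mandible
    return l_left, l_up, l_right, l_low


box = face_box((100.0, 80.0), (120.0, 80.0))
# With d_m = 20 the frame is (70.0, 50.0, 150.0, 130.0).
```

The cropped region would then be passed to the MobileNet V2 classifier; that step is not shown here.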

Claims

1. A subway violation early warning method based on an improved HigherHRNet model and a DNN network is characterized by comprising the following steps:

2. The subway violation early warning method based on the improved HigherHRNet model and the DNN network as claimed in claim 1, wherein in step 1), for the blurred and unclear image to be detected, the super-resolution reconstruction processing is adopted to enable the resolution of the reconstructed image to meet the requirement of the HigherHRNet model.

3. The subway violation behavior early warning method based on the improved HigherHRNet model and the DNN network as claimed in claim 1, wherein in step 1), for the case where a subway window acts as a mirror and reflects figures in the subway carriage at night, a double-threshold method is adopted to judge whether a detected figure is a real person or a reflection, and the reflected figure is removed.

4. The subway violation early warning method based on the improved HigherHRNet model and the DNN network as claimed in claim 3, wherein the double-threshold method judges whether a figure is a reflection through the connecting lines between the joint points, specifically comprising the steps of:

12) further judging on the judgment result of the step 11), specifically:

W_x＝max(I_1x,I_2x,I_3x...I_nx)-min(I_1x,I_2x,I_3x...I_nx)

W_y＝max(I_1y,I_2y,I_3y...I_ny)-min(I_1y,I_2y,I_3y...I_ny)

S＝W_x*W_y

5. The subway violation behavior early warning method based on the improved HigherHRNet model and the DNN network as claimed in claim 4, wherein when the HigherHRNet model regresses the joint points, the HigherHRNet model identifies the lying posture by lowering the coincidence degree threshold between the vector ab and the vector ba that the two joint points a and b generate toward each other.

6. The subway violation behavior early warning method based on the improved HigherHRNet model and the DNN network as claimed in claim 1, wherein in the step 2), the preprocessing is specifically PCA dimension reduction.

7. The subway violation early warning method based on the improved HigherHRNet model and the DNN network as claimed in claim 1, wherein in step 2), the behavior data sets corresponding to different violation behaviors are used as the input of the DNN deep neural network for training.

8. The subway violation early warning method based on the improved HigherHRNet model and the DNN network as claimed in claim 7, wherein in violation identification, different violation data sets are formed by labeling continuous frame picture data of videos containing multiple violations simultaneously.

9. The subway violation behavior early warning method based on the improved HigherHRNet model and the DNN network as claimed in claim 1, wherein for face and mask recognition, the positions of a person's eyes are obtained through the joint point detection of the HigherHRNet model, the face is located according to the positions of the eyes, and then the MobileNet V2 network is adopted to recognize whether a mask is worn on the face.

10. The subway violation early warning method based on the improved HigherHRNet model and the DNN network as claimed in claim 1, wherein said violation specifically comprises running, jumping, lying, boxing, kicking, putting feet on a seat, doing pull-ups, smoking and throwing garbage.