CN113516028A - Human body abnormal behavior identification method and system based on mixed attention mechanism - Google Patents
- Publication number
- CN113516028A (CN202110468555.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to a human body abnormal behavior identification method and system based on a mixed attention mechanism. The identification method comprises the following steps: extracting features from the original image to obtain low-level detail features F; screening the low-level detail features F to obtain main significant features F″; inputting the main significant features F″ into a convolution feature extraction module to obtain high-level semantic features; fusing the high-level semantic features and the low-level detail features to obtain fused features; calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value; optimizing the training parameters based on the loss value; training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model; and identifying abnormal human behaviors based on the trained abnormal behavior recognition model. The method can improve the precision with which abnormal human behaviors are identified.
Description
Technical Field
The invention relates to the field of human body abnormal behavior recognition, in particular to a human body abnormal behavior recognition method and system based on a mixed attention mechanism.
Background
Detection of abnormal human behavior is one of the research hotspots in human behavior recognition and has recently attracted wide attention in both academia and industry. With rapid social and economic development, whether explosion-prevention measures are in place at important sites such as gas stations directly affects the safety of nearby people and buildings. According to incomplete statistics, the proportion of smokers in China is as high as 26.92%, and the rate of explosion accidents caused by smoking is as high as 12.2%. Flammable and explosive oil vapor is present in the air at a gas station, so smoking at or near a gas station is more likely to cause an explosion; in addition, violations by drivers while driving, such as smoking and making phone calls, also pose serious safety hazards. It is therefore desirable to analyze human behavior, strengthen prevention in a targeted manner, and issue early warnings before a hazard materializes.
At present, methods for classifying and recognizing abnormal human behavior fall into two types according to how features are extracted: traditional methods that rely on hand-designed features, and methods based on deep learning. Hand-designed-feature methods judge whether abnormal behavior occurs through a series of steps such as target detection and feature extraction, according to the characteristics of specific abnormal behaviors. Traditional abnormal behavior recognition algorithms have both advantages and disadvantages. On the one hand, they require neither heavy computation nor powerful hardware, so they are advantageous for sample data with a small computational load. On the other hand, features are extracted manually only for specific scenes, which makes the methods limited and narrow in scope and gives them poor generalization ability. In contrast, deep-learning-based methods do not require manual feature extraction. Building on human behavior recognition and classification, they train the model either on abnormal behaviors defined manually according to the requirements of the scene or directly from data; the extracted deep features can effectively represent human behavior and improve the model's adaptability to input data.
With the development of deep learning, attention mechanisms have been widely applied in computer vision and other fields. Jaderberg et al. argued that direct pooling is too crude, because merging information directly can make key information unrecognizable, and therefore proposed a spatial transformer module that applies a corresponding spatial transformation to the spatial-domain information in an image so that key information can be extracted. Hu et al. considered that the feature map of each channel contributes with a different weight, and therefore proposed the squeeze-and-excitation network, which adaptively recalibrates channel-wise feature responses by explicitly modeling the interdependencies between channels. Although channel attention shows great potential for improving deep convolutional neural networks, existing methods inevitably increase model complexity while improving performance; to resolve this tension, Wang et al. [7] proposed an efficient channel attention module that maintains performance while significantly reducing model complexity. Fu et al. proposed a dual attention mechanism that, unlike earlier approaches based on multi-scale feature fusion, extracts correlated salient features along the spatial and channel dimensions and adaptively integrates local features with their global dependencies.
Inspired by attention mechanisms, the invention proposes an abnormal behavior recognition method based on a mixed attention mechanism. It exploits the ability of the convolution block attention module to extract spatial and channel information effectively, so as to highlight the salient features of the recognized object. In addition, an improved convolution feature extraction module mines hidden high-level semantic information layer by layer and combines it with low-level information, further improving the classification performance of the network.
Disclosure of Invention
The invention aims to provide a human body abnormal behavior identification method and system based on a mixed attention mechanism, which can realize accurate identification of human body abnormal behaviors.
In order to achieve the purpose, the invention provides the following scheme:
a human body abnormal behavior identification method based on a mixed attention mechanism comprises the following steps:
extracting the features of the original image to obtain low-level detail features F;
screening the low-level detail features F to obtain main significant features F″;
inputting the main significant features F″ into a convolution feature extraction module to obtain high-level semantic features;
fusing the high-level semantic features and the low-level detail features to obtain fused features;
calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value;
optimizing a training parameter based on the loss value;
training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and identifying the abnormal behaviors of the human body based on the trained abnormal behavior identification model.
Optionally, screening the low-level detail features F to obtain the main significant features F″ specifically includes:
inputting the low-level detail features F into a global average pooling layer and a maximum pooling layer over the spatial dimension, and then sending the results into a shared network MLP to obtain a first average pooling feature and a first maximum pooling feature;
splicing the first average pooling feature and the first maximum pooling feature, and obtaining a weight coefficient M_c through a Sigmoid activation function;
multiplying the weight coefficient M_c by the low-level detail features F to obtain a scaled new feature F';
inputting the new feature F' into an average pooling layer and a maximum pooling layer over the channel dimension to obtain a second average pooling feature and a second maximum pooling feature;
splicing the second average pooling feature and the second maximum pooling feature, and obtaining a weight coefficient M_s through a Sigmoid activation function;
and multiplying the weight coefficient M_s by the scaled new feature F' to obtain the main significant features F″.
Optionally, inputting the main significant features F″ into the convolution feature extraction module to obtain high-level semantic features, and fusing the high-level semantic features with the low-level detail features to obtain fused features, specifically includes:
performing point-by-point convolution and depth separable convolution operations on the main significant features F″ to obtain a feature G;
compressing the feature G by global average pooling to obtain a compressed vector L;
performing an excitation operation on the compressed vector L to obtain an output S;
weighting the output S onto the feature G to obtain a recalibrated feature I;
performing a maximum pooling operation and an average pooling operation on the recalibrated feature I to obtain a maximum pooling feature and an average pooling feature;
splicing the maximum pooling feature and the average pooling feature, and generating a feature map Q_s by convolution;
weighting the feature map Q_s onto the feature I to recalibrate the feature, ending with a 1x1 point-by-point convolution that restores the original channel dimension, and applying connection inactivation and an input skip connection, so that the high-level semantic features extracted by the convolution feature extraction module and the low-level detail features are fused at multiple levels to obtain the fused features.
Optionally, the loss between the predicted value and the actual value of the training sample is calculated to obtain the loss value, specifically using the following formula:
y′=(1-ε)×y+ε/k, where k denotes the number of classes in the specific task, y denotes the k-dimensional matrix composed of the k classes, ε denotes the smoothing factor, and y′ denotes the k-dimensional matrix composed of the k classes after label smoothing.
The invention further provides a human body abnormal behavior recognition system based on a mixed attention mechanism, which comprises:
the low-level detail feature extraction module is used for extracting features of the original image to obtain low-level detail features F;
the main significant feature screening module is used for screening the low-level detail features F to obtain main significant features F″;
the high-level semantic feature extraction module is used for inputting the main significant features F″ into the convolution feature extraction module to obtain high-level semantic features;
the feature fusion module is used for fusing the high-level semantic features and the low-level detail features to obtain fused features;
the loss value calculation module is used for calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value;
the optimization module is used for optimizing the training parameters based on the loss values;
the training module is used for training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and the abnormal behavior recognition module is used for recognizing the abnormal behavior of the human body based on the trained abnormal behavior recognition model.
Optionally, the main significant feature screening module specifically includes:
a first average pooling feature and first maximum pooling feature extracting unit, used for inputting the low-level detail features F into a global average pooling layer and a maximum pooling layer over the spatial dimension and then sending the results into a shared network MLP to obtain a first average pooling feature and a first maximum pooling feature;
a weight coefficient M_c calculating unit, used for splicing the first average pooling feature and the first maximum pooling feature and obtaining a weight coefficient M_c through a Sigmoid activation function;
a feature F' determining unit, used for multiplying the weight coefficient M_c by the low-level detail features F to obtain a scaled new feature F';
a second average pooling feature and second maximum pooling feature extracting unit, used for inputting the new feature F' into an average pooling layer and a maximum pooling layer over the channel dimension to obtain a second average pooling feature and a second maximum pooling feature;
a weight coefficient M_s calculating unit, used for splicing the second average pooling feature and the second maximum pooling feature and obtaining a weight coefficient M_s through a Sigmoid activation function;
and a main significant feature F″ determining unit, used for multiplying the weight coefficient M_s by the scaled new feature F' to obtain the main significant features F″.
Optionally, the high-level semantic feature extraction module and the feature fusion module specifically include:
a point-by-point convolution and depth separable convolution operation unit, used for performing point-by-point convolution and depth separable convolution operations on the main significant features F″ to obtain a feature G;
the compression unit is used for performing compression operation on the characteristic G by adopting global average pooling to obtain a compressed vector L;
the excitation operation unit is used for carrying out excitation operation on the compressed vector L to obtain an output S;
the recalibration unit is used for weighting the output S to the characteristic G to obtain a recalibrated characteristic I;
a maximum pooling operation and average pooling operation unit for performing maximum pooling operation and average pooling operation on the re-calibrated feature I to obtain a maximum pooling feature and an average pooling feature;
a splicing unit, used for splicing the maximum pooling feature and the average pooling feature and generating a feature map Q_s by convolution;
a feature fusion unit, used for weighting the feature map Q_s onto the feature I to recalibrate the feature, ending with a 1x1 point-by-point convolution that restores the original channel dimension, and applying connection inactivation and an input skip connection, so that the high-level semantic features extracted by the convolution feature extraction module and the low-level detail features are fused at multiple levels to obtain the fused features.
Optionally, the loss value calculation module specifically uses the following formula:
y′=(1-ε)×y+ε/k, where k denotes the number of classes in the specific task, y denotes the k-dimensional matrix composed of the k classes, ε denotes the smoothing factor, and y′ denotes the k-dimensional matrix composed of the k classes after label smoothing.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The method extracts features from the original image to obtain low-level detail features F; screens the low-level detail features F to obtain main significant features F″; inputs the main significant features F″ into a convolution feature extraction module to obtain high-level semantic features; fuses the high-level semantic features and the low-level detail features to obtain fused features; calculates the loss between the predicted value and the actual value of the training sample to obtain a loss value; optimizes the training parameters based on the loss value; trains the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model; and identifies abnormal human behaviors based on the trained model, thereby improving the precision and effect of abnormal human behavior identification.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an abnormal behavior recognition framework of a hybrid attention mechanism according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for recognizing abnormal human behavior based on a hybrid attention mechanism according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a convolution block attention module according to an embodiment of the present invention;
FIG. 4 is a diagram of a convolution feature extraction module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a human body abnormal behavior recognition system based on a hybrid attention mechanism according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a human body abnormal behavior identification method and system based on a mixed attention mechanism, which can realize accurate identification of human body abnormal behaviors.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a schematic diagram of an abnormal behavior recognition framework of a hybrid attention mechanism according to an embodiment of the present invention, and FIG. 2 is a flowchart of a human body abnormal behavior recognition method based on the hybrid attention mechanism according to an embodiment of the present invention. As shown in FIG. 1 and FIG. 2, the method includes:
Step 101: Perform feature extraction on the original image to obtain low-level detail features F.
Specifically, the original picture is processed by a Stem module to obtain a feature F.
Step 102: Screen the low-level detail features F to obtain the main significant features F″.
In order to enhance the salient features and reduce the attention paid to other information, a convolution attention module is introduced. It scales the low-level detail features F extracted in Step 101 to obtain the new main significant features F″. The structure of the convolution attention module is shown in FIG. 3, and the specific processing flow is as follows:
To focus effectively on meaningful channel features, channel attention is calculated. First, global average pooling and maximum pooling are applied to the feature F over the spatial dimension, and each result is sent into the shared network MLP. The two resulting features are spliced, and a weight coefficient M_c is obtained through a Sigmoid activation function. Finally, the weight coefficient M_c is multiplied by the original input feature F to obtain the scaled new feature F', which is defined as:
where the two pooled terms denote the average pooling feature and the maximum pooling feature, respectively, and σ denotes the Sigmoid activation function.
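The channel attention step described above can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the ReLU inside the shared MLP, and the absence of biases are assumptions. The text says the two MLP outputs are "spliced"; the sketch assumes they are combined by element-wise addition, as in the standard convolutional block attention module, so that M_c has one weight per channel.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def shared_mlp(v, w0, w1):
    """Two-layer shared MLP without biases; the ReLU is an assumption."""
    hidden = [max(0.0, sum(v[i] * w0[i][j] for i in range(len(v))))
              for j in range(len(w0[0]))]
    return [sum(hidden[j] * w1[j][k] for j in range(len(hidden)))
            for k in range(len(w1[0]))]

def channel_attention(F, w0, w1):
    """F is a list of C channels, each an H x W nested list.

    Returns the per-channel weights M_c and the scaled feature F'.
    """
    # Pool every channel over its spatial positions.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in F]
    mx = [max(max(row) for row in ch) for ch in F]
    # Both pooled vectors pass through the same MLP weights.
    a = shared_mlp(avg, w0, w1)
    m = shared_mlp(mx, w0, w1)
    # Assumed combination: element-wise sum followed by Sigmoid.
    Mc = [sigmoid(x + y) for x, y in zip(a, m)]
    # Scale each channel of F by its weight to obtain F'.
    F_prime = [[[Mc[c] * v for v in row] for row in ch]
               for c, ch in enumerate(F)]
    return Mc, F_prime
```

With all-zero MLP weights every channel receives the weight 0.5, which is a convenient sanity check that the scaling is applied channel by channel.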
To focus effectively on meaningful spatial features, spatial attention is calculated. First, average pooling and maximum pooling are applied to the feature F' along the channel dimension, and the two resulting features are spliced together. A convolution layer with a Sigmoid activation function is then used to obtain a weight coefficient M_s. Finally, the weight coefficient M_s is multiplied by the input feature F' to obtain the scaled new feature F″, which is defined as:
where σ denotes the Sigmoid activation function, f denotes the convolution operation, the two pooled terms denote the average pooling feature and the maximum pooling feature, respectively, the product operator denotes point-by-point multiplication of matrices, and F″ is the final output.
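The spatial attention step can be sketched in the same style. The function name, the zero padding, the single bias term, and the kernel layout (one k x k slice for the average map and one for the maximum map) are illustrative assumptions; the text does not state a kernel size.

```python
import math

def spatial_attention(Fp, kernel, bias=0.0):
    """Fp: C x H x W nested lists. kernel: 2 x k x k weights, k odd.

    Returns the H x W weight map M_s and the scaled feature F''.
    """
    C, H, W = len(Fp), len(Fp[0]), len(Fp[0][0])
    # Pool across channels at every spatial position.
    avg = [[sum(Fp[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(Fp[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    maps = [avg, mx]  # the two spliced maps form a 2-channel input
    k = len(kernel[0])
    pad = k // 2
    Ms = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = bias
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # zero padding
                            acc += kernel[m][di][dj] * maps[m][y][x]
            Ms[i][j] = 1.0 / (1.0 + math.exp(-acc))  # Sigmoid
    # Every channel is scaled by the same spatial weight map.
    F_out = [[[Ms[i][j] * Fp[c][i][j] for j in range(W)] for i in range(H)]
             for c in range(C)]
    return Ms, F_out
```

Unlike the channel weights M_c, the map M_s has one weight per spatial position and is shared by all channels.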
Step 103: Input the main significant features F″ into the convolution feature extraction module to obtain high-level semantic features.
In order to mine high-level semantic information further and improve the feature extraction capability of the network model, a convolution feature extraction module is provided. The significant features F″ obtained in Step 102 are input to obtain high-level semantic features, which are fused with the low-level detail features to enhance the interactivity of the network model. The structure of the convolution feature extraction module is shown in FIG. 4, and the specific processing flow is as follows:
First, a 1 × 1 point-by-point convolution is applied to the input F″, changing the output channel dimension according to the expansion ratio; a depth separable convolution is then applied, which effectively reduces the number of parameters while keeping the channels independent. This is defined as:
G = f_2(f_1(F″)) (5)
where F″ denotes the input, G denotes the resulting feature map, f_1(·) denotes the point-by-point convolution, and f_2(·) denotes the depth separable convolution.
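The parameter saving mentioned above can be checked with a short calculation. The sketch below counts weights only (biases ignored) and compares a 1 × 1 expansion followed by a depthwise k × k convolution with a single standard k × k convolution producing the same number of output channels. The function names and the example values (32 input channels, expansion ratio 6, 3 × 3 kernel) are assumptions chosen for illustration, not figures from the patent.

```python
def expanded_depthwise_params(c_in, expansion, k):
    """Weights in a 1x1 pointwise expansion plus a depthwise k x k conv."""
    c_mid = c_in * expansion
    pointwise = c_in * c_mid   # one 1x1 filter per (input, output) channel pair
    depthwise = k * k * c_mid  # one k x k filter per channel, no cross-channel mixing
    return pointwise + depthwise

def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

if __name__ == "__main__":
    print(expanded_depthwise_params(32, 6, 3))  # 7872
    print(standard_conv_params(32, 32 * 6, 3))  # 55296
```

For these example values the factorized form uses roughly one seventh of the weights, because the depthwise stage filters each channel independently.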
Then, in order to obtain the global distribution of responses over the feature channels, global average pooling is used as the compression operation. The feature G is changed into a feature of size 1 × 1 × C by a convolution operation, and the resulting vector has, to some extent, a global receptive field. The formula is as follows:
where U_sq denotes the compression operation, L denotes the compressed vector, and H × W denotes its size.
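As a small worked example of the compression operation, global average pooling reduces each H × W channel of G to a single number, giving one entry of L per channel. The function name `squeeze` and the nested-list layout are illustrative assumptions.

```python
def squeeze(G):
    """Global average pooling: C x H x W nested lists -> vector of length C."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in G]

if __name__ == "__main__":
    G = [[[1.0, 2.0], [3.0, 4.0]],
         [[0.0, 0.0], [0.0, 8.0]]]
    print(squeeze(G))  # [2.5, 2.0]
```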
Then, fully connected layers are arranged in a Bottleneck structure to learn the correlation among channels, and a parameter W is introduced to generate a weight for each feature channel. The parameter W is learnable and is realized by 1x1 convolutions whose number is the activation ratio multiplied by the global feature dimension. The formula is as follows:
S = U_ex(L, W) (7)
where U_ex denotes the excitation operation, S is the output of the operation and can characterize the importance of the different features, and W adjusts the excitation operation based on a scale parameter.
The output weights of the excitation operation are regarded as the importance of each feature channel after selection, and they are applied to the previous feature channel by channel through multiplication, which completes the recalibration of the original feature in the channel dimension. The formula is as follows:
I = U_scale(G, S) = G · S (8)
where · denotes a matrix multiplication operation and U_scale denotes the weight-assignment operation.
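Equations (7) and (8) can be sketched as a bottleneck of two weight matrices followed by a channel-wise rescaling. This is a hedged illustration: the names `excite` and `recalibrate`, the ReLU between the two layers, the missing biases, and the final Sigmoid are assumptions borrowed from the usual squeeze-and-excitation design rather than details stated in the text. The product G · S is applied channel by channel, following the description that precedes equation (8).

```python
import math

def excite(L, W1, W2):
    """S = U_ex(L, W): bottleneck of two layers, here W = (W1, W2).

    W1 has shape C x R (reduce) and W2 has shape R x C (restore).
    """
    hidden = [max(0.0, sum(L[c] * W1[c][r] for c in range(len(L))))
              for r in range(len(W1[0]))]
    logits = [sum(hidden[r] * W2[r][c] for r in range(len(hidden)))
              for c in range(len(W2[0]))]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def recalibrate(G, S):
    """I = U_scale(G, S): multiply every value of channel c by S[c]."""
    return [[[S[c] * v for v in row] for row in ch] for c, ch in enumerate(G)]
```

Because S depends only on the pooled vector L, the same weight is applied at every spatial position of a channel.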
In order to obtain deeper high-level feature information, a maximum pooling operation and an average pooling operation are each applied to the recalibrated feature, effectively extracting the distinctive features of the object. The extracted pooling features are spliced, and a new feature map Q_s is generated by convolution, defined as follows:
Q_s(I) = σ(h([I_avg; I_max])) (9)
where I_avg and I_max denote the average pooling feature and the maximum pooling feature, respectively, σ denotes the Sigmoid activation function, and h(·) denotes the convolution operation.
Finally, the feature map Q_s is weighted onto the previous feature I. After the feature has been recalibrated, the module ends with a 1x1 point-by-point convolution that restores the original channel dimension, and connection inactivation and an input skip connection are applied. The high-level semantic features extracted by the convolution feature extraction module and the low-level detail features are thereby fused at multiple levels, enhancing interactivity. This is defined as:
Z = D(G′(U′_scale(I · Q_s))) (10)
where D(·) denotes the skip connection, · denotes a matrix multiplication operation, G′ denotes the convolution operation, U′_scale denotes the weight-assignment operation, and Z denotes the output feature.
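The last stage of equation (10) can be illustrated with a short sketch of the skip connection D(·). It assumes that "connection inactivation" refers to drop-connect in the stochastic-depth sense, where the whole branch output is randomly zeroed during training and rescaled otherwise; that reading, the function names, the default drop rate, and the flattened feature vectors are all assumptions made for illustration.

```python
import random

def drop_connect(branch, drop_rate, training, rng=random):
    """Randomly drop the whole branch during training (assumed behavior)."""
    if not training or drop_rate == 0.0:
        return list(branch)
    keep = 1.0 - drop_rate
    if rng.random() < keep:
        # Rescale so the expected value of the branch is unchanged.
        return [v / keep for v in branch]
    return [0.0 for _ in branch]

def skip_connection(block_input, branch, drop_rate=0.2, training=False, rng=random):
    """Z = input + drop_connect(branch); both are flattened feature vectors."""
    dropped = drop_connect(branch, drop_rate, training, rng)
    return [a + b for a, b in zip(block_input, dropped)]
```

At inference time the branch is always kept, so the output is simply the sum of the block input (low-level features) and the branch output (high-level features).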
Step 104: Fuse the high-level semantic features and the low-level detail features to obtain fused features.
Step 105: Calculate the loss between the predicted value and the actual value of the training sample to obtain a loss value.
The cross-entropy loss function is corrected with label smoothing. The resulting curve is smooth and easy to differentiate and the gradient is stable, so the network generalizes better, produces more accurate predictions on unseen data, and improves the accuracy of image classification.
y′=(1-ε)×y+ε/k (11)
where k denotes the number of classes in the specific task, y denotes the k-dimensional matrix composed of the k classes, ε denotes the smoothing factor, and y′ denotes the k-dimensional matrix composed of the k classes after label smoothing.
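Equation (11) can be illustrated with a small example that smooths a one-hot label and uses it in a cross-entropy loss. The function names, the smoothing factor 0.1, and the four-class label are illustrative assumptions; the text does not give a value for ε.

```python
import math

def smooth_labels(y, eps):
    """Apply y' = (1 - eps) * y + eps / k to a one-hot label of length k."""
    k = len(y)
    return [(1.0 - eps) * v + eps / k for v in y]

def cross_entropy(target, probs):
    """Cross entropy between a (smoothed) target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))

if __name__ == "__main__":
    y = [0.0, 1.0, 0.0, 0.0]
    y_smooth = smooth_labels(y, eps=0.1)
    print(y_smooth)  # approximately [0.025, 0.925, 0.025, 0.025]
    print(cross_entropy(y_smooth, [0.1, 0.7, 0.1, 0.1]))
```

The smoothed label still sums to 1, but the true class no longer has a target of exactly 1, so the loss penalizes overconfident predictions.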
Step 106: Optimize the training parameters based on the loss value.
Step 107: Train the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model.
Step 108: Recognize abnormal human behaviors based on the trained abnormal behavior recognition model.
The model is trained on the feature information, the model parameters, and all training samples to obtain the trained abnormal behavior recognition model, which is then used to recognize and classify the abnormal behaviors in all test samples.
For every sample, the softmax function computes a probability for each class; the probabilities sum to 1, and the sample is assigned to the abnormal behavior class with the highest probability.
P_i denotes the probability that the predicted object belongs to the i-th behavior class, exp(·) maps the real-valued output to the range from zero to positive infinity, and Σ_i denotes summation over all classes, which normalizes the probabilities so that they sum to 1.
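The softmax formula itself is not reproduced in the text above. The sketch below therefore uses the standard definition P_i = exp(z_i) / Σ_j exp(z_j), where z_i is a hypothetical name for the i-th raw network output, and the logit values are made up for illustration. Subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(z):
    """Standard softmax over the last axis: exp(z_i) / sum_j exp(z_j)."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)  # stability only
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical outputs for three classes
P = softmax(logits)
predicted_class = int(np.argmax(P))  # class with the highest probability
print(P, P.sum(), predicted_class)
```

Since the exponential is monotonic, the predicted class is always the one with the largest raw output; here that is class 0.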
Fig. 5 is a schematic structural diagram of the human body abnormal behavior recognition system based on a mixed attention mechanism according to an embodiment of the present invention. As shown in Fig. 5, the system includes:
a low-level detail feature extraction module 201, configured to perform feature extraction on an original image to obtain low-level detail features F;
a main significant feature screening module 202, configured to screen the low-level detail features F to obtain main significant features F″;
a high-level semantic feature extraction module 203, configured to input the main significant features F″ into the convolution feature extraction module to obtain high-level semantic features;
a feature fusion module 204, configured to fuse the high-level semantic features and the low-level detail features to obtain fused features;
a loss value calculation module 205, configured to calculate the loss between the predicted value and the actual value of the training samples to obtain a loss value;
an optimization module 206, configured to optimize the training parameters based on the loss value;
a training module 207, configured to train the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and an abnormal behavior recognition module 208, configured to recognize abnormal human behaviors based on the trained abnormal behavior recognition model.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief; for the relevant details, refer to the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help in understanding the method and the core concept of the present invention. A person skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the invention.
Claims (8)
1. A human body abnormal behavior identification method based on a mixed attention mechanism, characterized by comprising the following steps:
extracting features from an original image to obtain low-level detail features F;
screening the low-level detail features F to obtain main significant features F″;
inputting the main significant features F″ into a convolution feature extraction module to obtain high-level semantic features;
fusing the high-level semantic features and the low-level detail features to obtain fused features;
calculating the loss between the predicted value and the actual value of the training samples to obtain a loss value;
optimizing the training parameters based on the loss value;
training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and recognizing abnormal human behaviors based on the trained abnormal behavior recognition model.
2. The human body abnormal behavior identification method based on a mixed attention mechanism according to claim 1, wherein screening the low-level detail features F to obtain the main significant features F″ specifically comprises:
passing the low-level detail features F through a global average pooling layer and a maximum pooling layer over the spatial dimension and then through a shared MLP network to obtain a first average-pooled feature and a first max-pooled feature;
concatenating the first average-pooled feature and the first max-pooled feature, and obtaining a weight coefficient Mc through a Sigmoid activation function;
multiplying the weight coefficient Mc by the low-level detail features F to obtain a scaled new feature F′;
inputting the new feature F′ into an average pooling layer and a maximum pooling layer over the channel dimension to obtain a second average-pooled feature and a second max-pooled feature;
concatenating the second average-pooled feature and the second max-pooled feature, and obtaining a weight coefficient Ms through a Sigmoid activation function;
and multiplying the weight coefficient Ms by the scaled new feature F′ to obtain the main significant features F″.
3. The human body abnormal behavior identification method based on a mixed attention mechanism according to claim 1, wherein inputting the main significant features F″ into the convolution feature extraction module to obtain high-level semantic features, and fusing the high-level semantic features and the low-level detail features to obtain fused features, specifically comprises:
performing point-by-point convolution and depthwise separable convolution on the main significant features F″ to obtain a feature G;
compressing the feature G by global average pooling to obtain a compressed vector L;
performing an excitation operation on the compressed vector L to obtain an output S;
weighting the feature G by the output S to obtain a re-calibrated feature I;
performing a maximum pooling operation and an average pooling operation on the re-calibrated feature I to obtain a max-pooled feature and an average-pooled feature;
concatenating the max-pooled feature and the average-pooled feature, and generating a feature map Q_s by convolution;
weighting the feature I by the feature map Q_s to recalibrate the feature, ending with a 1x1 point-by-point convolution to restore the original channel dimension, performing connection inactivation and an input skip connection, and fusing, at multiple levels, the high-level semantic features and the low-level detail features extracted by the convolution feature extraction module to obtain the fused features.
4. The human body abnormal behavior identification method based on a mixed attention mechanism according to claim 1, wherein the loss between the predicted value and the actual value of the training samples is calculated to obtain the loss value specifically using the following formula:
y′ = (1 − ε) × y + ε/k, where k denotes the number of classes in the specific task, y denotes the k-dimensional matrix composed of the k classes, ε denotes the smoothing factor, and y′ denotes the k-dimensional matrix composed of the k classes after label smoothing.
5. A human body abnormal behavior recognition system based on a mixed attention mechanism, characterized in that the recognition system comprises:
a low-level detail feature extraction module, configured to extract features from an original image to obtain low-level detail features F;
a main significant feature screening module, configured to screen the low-level detail features F to obtain main significant features F″;
a high-level semantic feature extraction module, configured to input the main significant features F″ into the convolution feature extraction module to obtain high-level semantic features;
a feature fusion module, configured to fuse the high-level semantic features and the low-level detail features to obtain fused features;
a loss value calculation module, configured to calculate the loss between the predicted value and the actual value of the training samples to obtain a loss value;
an optimization module, configured to optimize the training parameters based on the loss value;
a training module, configured to train the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and an abnormal behavior recognition module, configured to recognize abnormal human behaviors based on the trained abnormal behavior recognition model.
6. The human body abnormal behavior recognition system based on a mixed attention mechanism according to claim 5, wherein the main significant feature screening module specifically comprises:
a first average-pooled feature and first max-pooled feature extraction unit, configured to pass the low-level detail features F through a global average pooling layer and a maximum pooling layer over the spatial dimension and then through a shared MLP network to obtain a first average-pooled feature and a first max-pooled feature;
a weight coefficient Mc calculation unit, configured to concatenate the first average-pooled feature and the first max-pooled feature and obtain a weight coefficient Mc through a Sigmoid activation function;
a feature F′ determination unit, configured to multiply the weight coefficient Mc by the low-level detail features F to obtain a scaled new feature F′;
a second average-pooled feature and second max-pooled feature extraction unit, configured to input the new feature F′ into an average pooling layer and a maximum pooling layer over the channel dimension to obtain a second average-pooled feature and a second max-pooled feature;
a weight coefficient Ms calculation unit, configured to concatenate the second average-pooled feature and the second max-pooled feature and obtain a weight coefficient Ms through a Sigmoid activation function;
and a main significant feature F″ determination unit, configured to multiply the weight coefficient Ms by the scaled new feature F′ to obtain the main significant features F″.
7. The human body abnormal behavior recognition system based on a mixed attention mechanism according to claim 5, wherein the high-level semantic feature extraction module and the feature fusion module specifically comprise:
a point-by-point convolution and depthwise separable convolution unit, configured to perform point-by-point convolution and depthwise separable convolution on the main significant features F″ to obtain a feature G;
a compression unit, configured to compress the feature G by global average pooling to obtain a compressed vector L;
an excitation operation unit, configured to perform an excitation operation on the compressed vector L to obtain an output S;
a recalibration unit, configured to weight the feature G by the output S to obtain a re-calibrated feature I;
a maximum pooling and average pooling unit, configured to perform a maximum pooling operation and an average pooling operation on the re-calibrated feature I to obtain a max-pooled feature and an average-pooled feature;
a concatenation unit, configured to concatenate the max-pooled feature and the average-pooled feature and generate a feature map Q_s by convolution;
and a feature fusion unit, configured to weight the feature I by the feature map Q_s to recalibrate the feature, end with a 1x1 point-by-point convolution to restore the original channel dimension, perform connection inactivation and an input skip connection, and fuse, at multiple levels, the high-level semantic features and the low-level detail features extracted by the convolution feature extraction module to obtain the fused features.
8. The human body abnormal behavior recognition system based on a mixed attention mechanism according to claim 5, wherein the loss value calculation module specifically uses the following formula:
y′ = (1 − ε) × y + ε/k, where k denotes the number of classes in the specific task, y denotes the k-dimensional matrix composed of the k classes, ε denotes the smoothing factor, and y′ denotes the k-dimensional matrix composed of the k classes after label smoothing.
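As an illustration of the screening recited in claims 2 and 6, the sketch below computes Mc, F′, Ms, and F″ on a toy tensor. It rests on several assumptions. The two MLP outputs are merged by element-wise sum so that Mc has one weight per channel (the claims say the features are concatenated but do not say how they are reduced). The two spatial maps are mixed by a simple two-weight combination standing in for a convolution. All weights are random, and the names `shared_mlp`, `w1`, `w2`, and `mix` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shared_mlp(v, w1, w2):
    """Two-layer MLP with ReLU, shared by both pooled vectors."""
    return w2 @ np.maximum(w1 @ v, 0.0)

def channel_weights(F, w1, w2):
    """Mc: one weight per channel from spatially pooled features."""
    avg = F.mean(axis=(1, 2))  # global average pooling over H and W
    mx = F.max(axis=(1, 2))    # global maximum pooling over H and W
    # Assumption: the two MLP outputs are merged by element-wise sum.
    return sigmoid(shared_mlp(avg, w1, w2) + shared_mlp(mx, w1, w2))

def spatial_weights(F1, mix):
    """Ms: one weight per position from channel-wise pooled features."""
    stacked = np.stack([F1.mean(axis=0), F1.max(axis=0)])  # (2, H, W)
    # Assumption: a two-weight mix stands in for the reducing convolution.
    return sigmoid(np.tensordot(mix, stacked, axes=1))     # (H, W)

def screen_features(F, w1, w2, mix):
    """F -> F' (channel-scaled) -> F'' (main significant features)."""
    M_c = channel_weights(F, w1, w2)
    F1 = F * M_c[:, None, None]       # F' = Mc * F
    M_s = spatial_weights(F1, mix)
    return F1 * M_s[None, :, :]       # F'' = Ms * F'

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2               # toy sizes; r is a reduction ratio
F = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
mix = rng.standard_normal(2)
F2 = screen_features(F, w1, w2, mix)
print(F2.shape)  # (8, 4, 4)
```

Both weight coefficients lie in (0, 1), so the screening only attenuates features; it never changes their sign or their shape.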
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110468555.9A CN113516028B (en) | 2021-04-28 | 2021-04-28 | Human body abnormal behavior identification method and system based on mixed attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110468555.9A CN113516028B (en) | 2021-04-28 | 2021-04-28 | Human body abnormal behavior identification method and system based on mixed attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113516028A (en) | 2021-10-19 |
CN113516028B (en) | 2024-01-19 |
Family
ID=78063994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110468555.9A Active CN113516028B (en) | 2021-04-28 | 2021-04-28 | Human body abnormal behavior identification method and system based on mixed attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113516028B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10089556B1 (en) * | 2017-06-12 | 2018-10-02 | Konica Minolta Laboratory U.S.A., Inc. | Self-attention deep neural network for action recognition in surveillance videos |
CN108830157A (en) * | 2018-05-15 | 2018-11-16 | 华北电力大学(保定) | Human bodys' response method based on attention mechanism and 3D convolutional neural networks |
CN109389055A (en) * | 2018-09-21 | 2019-02-26 | 西安电子科技大学 | Video classification methods based on mixing convolution sum attention mechanism |
CN110059582A (en) * | 2019-03-28 | 2019-07-26 | 东南大学 | Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks |
CN110222653A (en) * | 2019-06-11 | 2019-09-10 | 中国矿业大学(北京) | A kind of skeleton data Activity recognition method based on figure convolutional neural networks |
CN110852273A (en) * | 2019-11-12 | 2020-02-28 | 重庆大学 | Behavior identification method based on reinforcement learning attention mechanism |
WO2020113886A1 (en) * | 2018-12-07 | 2020-06-11 | 中国科学院自动化研究所 | Behavior feature extraction method, system and apparatus based on time-space/frequency domain hybrid learning |
CN111626171A (en) * | 2020-05-21 | 2020-09-04 | 青岛科技大学 | Group behavior identification method based on video segment attention mechanism and interactive relation activity diagram modeling |
CN112307982A (en) * | 2020-11-02 | 2021-02-02 | 西安电子科技大学 | Human behavior recognition method based on staggered attention-enhancing network |
CN112307958A (en) * | 2020-10-30 | 2021-02-02 | 河北工业大学 | Micro-expression identification method based on spatiotemporal appearance movement attention network |
Non-Patent Citations (2)
Title |
---|
ZHAI, B: "Research on Detection Method of Abnormal Behavior of People in Video Surveillance", 2018 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL & ELECTRONICS ENGINEERING AND COMPUTER SCIENCE (ICEEECS 2018), pages 289 - 293 * |
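余阿祥, 李承润: "Mask detection network with multiple attention mechanisms" (多注意力机制的口罩检测网络), 南京师范大学学报(工程技术版) [Journal of Nanjing Normal University (Engineering and Technology Edition)], pages 23 - 29 * |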
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114745175A (en) * | 2022-04-11 | 2022-07-12 | 中国科学院信息工程研究所 | Attention mechanism-based network malicious traffic identification method and system |
CN114745175B (en) * | 2022-04-11 | 2022-12-23 | 中国科学院信息工程研究所 | Network malicious traffic identification method and system based on attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN113516028B (en) | 2024-01-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111626350B (en) | Target detection model training method, target detection method and device | |
CN111126258B (en) | Image recognition method and related device | |
CN111061843A (en) | Knowledge graph guided false news detection method | |
CN112016500A (en) | Group abnormal behavior identification method and system based on multi-scale time information fusion | |
CN109190472B (en) | Pedestrian attribute identification method based on image and attribute combined guidance | |
CN111626116A (en) | Video semantic analysis method based on fusion of multi-attention mechanism and Graph | |
CN116994069B (en) | Image analysis method and system based on multi-mode information | |
CN115131638B (en) | Training method, device, medium and equipment for visual text pre-training model | |
CN116699297B (en) | Charging pile detection system and method thereof | |
CN115761900B (en) | Internet of things cloud platform for practical training base management | |
CN115564766B (en) | Preparation method and system of water turbine volute seat ring | |
CN113628059A (en) | Associated user identification method and device based on multilayer graph attention network | |
CN114978613B (en) | Network intrusion detection method based on data enhancement and self-supervision feature enhancement | |
CN111008570B (en) | Video understanding method based on compression-excitation pseudo-three-dimensional network | |
CN113516028A (en) | Human body abnormal behavior identification method and system based on mixed attention mechanism | |
CN117475236B (en) | Data processing system and method for mineral resource exploration | |
CN110458215A (en) | Pedestrian's attribute recognition approach based on multi-time Scales attention model | |
CN115393927A (en) | Multi-modal emotion emergency decision system based on multi-stage long and short term memory network | |
CN114550297A (en) | Pedestrian intention analysis method and system | |
CN114241253A (en) | Model training method, system, server and storage medium for illegal content identification | |
CN116030507A (en) | Electronic equipment and method for identifying whether face in image wears mask | |
CN113205044A (en) | Deep counterfeit video detection method based on characterization contrast prediction learning | |
CN112287929A (en) | Remote sensing image significance analysis method based on feature integration deep learning network | |
CN117576279B (en) | Digital person driving method and system based on multi-mode data | |
CN113012049B (en) | Remote sensing data privacy protection method based on GAN network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||