CN111310689A - Method for recognizing human body behaviors in potential information fusion home security system - Google Patents

Method for recognizing human body behaviors in potential information fusion home security system

Info

Publication number
CN111310689A
Authority
CN
China
Prior art keywords
human body
human
joint
detected
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010116795.8A
Other languages
Chinese (zh)
Other versions
CN111310689B (en)
Inventor
李颀
姜莎莎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Science and Technology
Original Assignee
Shaanxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Science and Technology filed Critical Shaanxi University of Science and Technology
Priority to CN202010116795.8A priority Critical patent/CN111310689B/en
Publication of CN111310689A publication Critical patent/CN111310689A/en
Application granted granted Critical
Publication of CN111310689B publication Critical patent/CN111310689B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

A human behavior recognition method for a home security system with potential information fusion. A human motion time sequence obtained by tracking is taken as the research object, and the correlations between posture features and behaviors, between interactive-object features and behaviors, and among behaviors themselves are treated as potential information. By introducing constraint conditions into the extraction of posture spatio-temporal features and of interactive-object features, the influence of this potential information on human behavior recognition in the home security system is fully exploited, which increases the differences between behavior classes, reduces the differences within behavior classes, and improves the accuracy and generalization of the method. The mutual information of each joint point with respect to the behavior category is used as a constraint: all mutual-information values are sorted, the joint-point group with the largest mutual information that characterizes the specific behavior is retained, and behavior recognition is performed by fusing the screened joint-point group with the interactive-object features, which improves the real-time performance and accuracy of recognition.

Description

Method for recognizing human body behaviors in potential information fusion home security system
Technical Field
The invention relates to the technical field of computer vision, in particular to a human behavior identification method in a home security system with potential information fusion.
Background
At present, many people install video monitoring systems at home to protect their property and personal safety. However, such systems only record what happens at the home and cannot prevent incidents before they occur, and traditional digital monitoring relies mainly on monitoring personnel watching and analysing the monitoring images, which is not only inefficient but also fails to meet ever higher security requirements for real-time performance and effectiveness.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a human behavior recognition method for a home security system with potential information fusion, in which a computer, rather than the family members, automatically analyses human behavior in the monitoring picture, immediately prompts the family members to pay attention to the situation at the door when an abnormal phenomenon is found, and provides reliable 24/7 monitoring, greatly improving real-time performance and effectiveness.
In order to achieve the purpose, the invention adopts the technical scheme that:
the method for providing human behavior identification in the home security system with potential information fusion comprises the following steps;
step one, a camera is used for collecting images;
step two, detecting a human body target in the acquired image by using an illumination-adaptive method based on background subtraction, and then tracking the detected human body target with the Staple method to obtain a human motion time sequence;
step three, recognizing the detected face and judging whether it belongs to a family member; if so, no operation is performed on the motion time sequence obtained in step two; otherwise, human behavior recognition is performed;
step four, extracting the posture space-time characteristics of the human body from the motion time sequence obtained in the step two;
step five, extracting the characteristics of the interactive objects by adopting a clue enhanced deep convolutional neural network;
step six, fusing the global posture space-time characteristics and the local interactive object characteristics extracted in the step four and the step five;
and step seven, inputting the fused feature vectors into an SVM classifier for behavior recognition.
In the second step, a human body entering the detection range is detected with an illumination-adaptive method based on background subtraction, and the background is modelled with the ViBe algorithm. The number of pixels of the human body target detected in the previous frame is recorded and denoted Y, and the number of foreground pixels detected in the current frame is denoted L. At the instant of a sudden illumination change a large white area appears and the system falsely detects background as foreground, so L becomes larger than Y. A threshold (the pixel count of the human target range detected in the previous frame) is therefore set during foreground detection to judge the extent of the foreground: if it exceeds the threshold, a sudden illumination change has occurred, otherwise it has not. When a sudden illumination change occurs, illumination compensation is applied to the background model using the brightness change of the pixels between two adjacent frames, with the compensation formula:
Δ_t(x, y) = |V_t(x, y) - V_{t-1}(x, y)|
wherein:
V_t(x, y) denotes the brightness of image I_t at pixel (x, y) (its defining formula is given as an equation image in the original publication), n is the total number of pixels in the image, n = 1280 × 480 = 614400, and I_t(x, y)_max(R,G,B) and I_t(x, y)_min(R,G,B) denote the maximum and the minimum of the R, G, B components at pixel (x, y), respectively;
After the human body target is detected, it is tracked with the Staple method: during tracking a translation filter and a color filter locate the target, a scale filter then estimates its size, and finally the human motion time sequence is obtained.
In the fourth step, the posture space-time characteristics of the obtained human motion time sequence are extracted, and the specific process comprises the following steps:
1) calculating mutual information of each joint point, judging the response degree of each joint point to a certain specific behavior through the mutual information, and finally reserving a joint point group which can represent the specific behavior and has the maximum mutual information, wherein a formula for calculating the mutual information of each joint point is as follows:
I(f_j, Y) = H(f_j) - H(f_j | Y)
where H(f_j) is the information entropy of the j-th joint point, j = 1, 2, ..., 20, and
f_j = {f_1^j, f_2^j, ..., f_N^j}
denotes the dynamic process of the j-th joint point over time, N is the number of frames of the human motion time sequence, and Y is the human behavior category; in the home security scene the categories mainly recognized are water delivery, express delivery, takeaway, friends, cleaning personnel and others, so Y = 1, 2, 3, 4, 5, 6. The entropy is calculated as:
H(f_j) = -Σ_{i=1}^{N} p(f_i^j) log p(f_i^j)
where p(f_i^j) is the probability density function and i = 1, 2, ..., N indexes the frames of the time sequence.
2) Extracting posture space-time characteristics from the screened joint points, wherein the characteristics in space dimension are as follows:
F_spatial = [T, Θ, D, Ψ, A]
the method comprises the following steps that K represents joint points of human body postures, K is 1,2, 20, N represents the frame number of a human body motion time sequence, human body hip joint points are selected as the mass center of a human body, T represents a joint coordinate track characteristic matrix, theta represents a direction matrix of each deleted joint point relative to the mass center of the human body, D represents a space distance matrix of any two joint points, psi represents a direction matrix of a vector formed by any 2 joints relative to an upward vector of the mass center, and A represents a 3 internal angle size matrix formed by any 3 joint points;
the features in the time dimension are:
F_temporal = [ΔT, ΔΘ, ΔD, ΔΨ, ΔA]
where ΔT is the trajectory displacement matrix of the joint points, ΔΘ is the change in direction of each joint point as it is displaced, ΔD is the matrix of changes in the distance between any two joint points over time, ΔΨ is the change in direction of the vector between any two joint points relative to the upward vector at the centroid, and ΔA is the matrix of changes in the interior angles formed by any three joint points.
The extracted pose spatiotemporal features are represented as:
F_pose = F_spatial + F_temporal
in the fifth step, the detected human body is used as a clue and the effective object interacting with the human is used as a high-level clue; the features of the object interacting with the human are extracted with a convolutional neural network into which the position relation between the object and the detected human body is implicitly integrated;
a mixed loss function is used during training and the parameters are adjusted by back-propagating the loss; the mixed loss function is calculated as:
L(M, D) = L_main(M, D) + α L_hint(M, D)
where L_main(M, D) is the loss function of interactive-object feature extraction, L_hint(M, D) is the loss function of the distance-hint task, M is the network model, D = {(x_i, y_i)}, i = 1, ..., N, is the training set of N sample pictures, x_i denotes the i-th of the N images, y_i denotes the corresponding category label, and α takes a value between 0 and 1.
In the sixth step, because the posture spatio-temporal features and the interactive-object features respond to different degrees to different human behaviors, the two obtained features are fused with weights according to the formula:
F = w_1 F_pose + w_2 F_object
where w_1 is the weighting coefficient of the posture spatio-temporal features, w_2 is the weighting coefficient of the human-interaction-object features, w_1 + w_2 = 1, F_pose denotes the posture spatio-temporal features and F_object the interactive-object features.
And seventhly, inputting the fused feature vectors into an SVM classifier for classification to obtain a final recognition result.
The invention has the beneficial effects that:
the invention uses the machine vision technology to perform behavior detection on a human body in a home security environment, firstly performs face recognition before the behavior detection, judges whether the human body is a family or not, does not perform the behavior detection if the human body is the family, otherwise performs the behavior detection, integrates the interactive object detection in the behavior detection, improves the recognition accuracy, can prevent the human body from getting ill before detecting the behavior before pressing a doorbell, and improves the instantaneity and the effectiveness. The invention combines the body characteristics of the human body interaction object and the human body movement to perform behavior detection, and has important research value for processing human body behaviors identified by the interaction object under different scenes.
Drawings
Fig. 1 is a flowchart of a behavior recognition method according to an embodiment of the present invention.
Fig. 2 is a flowchart of human target detection according to an embodiment of the present invention.
FIG. 3 is a flow chart of pose spatio-temporal feature extraction according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The system addresses the problem of intrusion by outsiders in existing home security systems and the fact that the real-time performance and effectiveness of traditional digital monitoring cannot meet safety requirements. The invention applies machine vision to the security scene so that monitoring personnel are no longer needed to analyse the monitoring images: face recognition is used for family members and human behavior recognition for everyone else, the behavior of a person is detected before the door is knocked, and the family members are informed in time, so that incidents are prevented before they occur and real-time performance and accuracy are improved.
The application principle of the invention is further explained in the following with the attached drawings:
as shown in fig. 1, which is a general flow diagram of the method of the present invention, the method for identifying human behavior in a home security system with latent information fusion according to the present invention comprises the following steps:
step one, a camera installed at a door of a house is used for collecting images in a detection range.
Step two, as shown in fig. 2: real-time human target detection is first performed on the acquired image. When the system detects human targets with the background difference method, a background model must be established and updated; because the home security system has high real-time requirements and suffers from sudden illumination changes, the ViBe background modelling method is adopted to build the model. The number of pixels of the human target range detected in the previous frame is recorded and denoted Y, and the number of foreground pixels detected in the current frame is denoted L. At the instant of a sudden illumination change a large white area appears and the system falsely detects background as foreground, so L becomes larger than Y. A threshold (the pixel count of the human target range detected in the previous frame) is therefore set during foreground detection to judge the extent of the foreground: if it exceeds the threshold, a sudden illumination change has occurred, otherwise it has not. When a sudden illumination change occurs, illumination compensation is applied to the background model using the brightness change of the pixels between two adjacent frames, with the compensation formula:
Δ_t(x, y) = |V_t(x, y) - V_{t-1}(x, y)|
wherein:
V_t(x, y) denotes the brightness of image I_t at pixel (x, y) (its defining formula is given as an equation image in the original publication), n is the total number of pixels in the image, n = 1280 × 480 = 614400, and I_t(x, y)_max(R,G,B) and I_t(x, y)_min(R,G,B) denote the maximum and the minimum of the R, G, B components at pixel (x, y), respectively.
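A minimal sketch of this illumination-adaptive step in Python/NumPy is shown below. It assumes the per-pixel brightness V_t is derived from the maximum and minimum of the R, G, B components (the exact formula is given only as an equation image above), and it returns the compensation map Δ_t rather than updating the ViBe samples, since that update is not detailed in the text.

```python
import numpy as np

def brightness(frame_bgr):
    # Per-pixel brightness approximated from the max and min of the R, G, B
    # components (assumption: the patent gives V_t only as an equation image).
    f = frame_bgr.astype(np.float32)
    return (f.max(axis=2) + f.min(axis=2)) / 2.0

def illumination_compensation(fg_mask, prev_target_pixels, cur_frame, prev_frame):
    """Return the per-pixel brightness change Δ_t used to compensate the
    background model when a sudden illumination change is detected.

    fg_mask            -- binary foreground mask of the current frame
    prev_target_pixels -- Y: pixel count of the human target in the previous frame
    """
    L = int(np.count_nonzero(fg_mask))   # foreground pixel count of the current frame
    if L <= prev_target_pixels:          # L <= Y: no sudden illumination change
        return None
    # Δ_t(x, y) = |V_t(x, y) - V_{t-1}(x, y)|; how this map is applied to the
    # ViBe background samples is not detailed in the text.
    return np.abs(brightness(cur_frame) - brightness(prev_frame))
```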
In practice, some background objects with boundary regions or strong reflection coefficients remain in the captured picture and cannot be completely removed by background differencing; they appear as point-like, small blob-like and linear noise that must be distinguished from the actual moving targets during detection. The binarized image is therefore cleaned with morphological filtering. For multiple targets, the filtered binary image generally contains several regions, and since a multi-target region usually consists of several sub-regions that are not connected to each other, the connectivity of each region must be checked; the regions are then distinguished by labels, each target is framed in the original image according to its label, and the position of each target in every frame is computed.
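A short OpenCV sketch of the morphological filtering and connected-component labelling described above; the structuring-element size and the minimum blob area are illustrative assumptions.

```python
import cv2

def extract_human_targets(fg_mask, min_area=500):
    """Suppress point/blob/line noise with morphological filtering, then label
    connected regions and return one bounding box per candidate target."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    cleaned = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)   # remove specks
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)  # fill small holes
    num, labels, stats, _ = cv2.connectedComponentsWithStats(cleaned, connectivity=8)
    boxes = []
    for i in range(1, num):               # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:              # discard residual noise blobs
            boxes.append((x, y, w, h))
    return cleaned, boxes
```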
Further, a kernel method is adopted to establish a correlation filter model and a color filter template for a first frame of image, for a new frame of image, firstly a translation filter and a color filter are used to find the position of a target, then different candidate target frames are extracted by using a scale filter with the target position as a central point, the scale corresponding to the value with the maximum response value is taken as the final target scale, the position and the size of the target are obtained, and then the correlation filter model and the color filter model are updated.
After the scores of the translation filter and the color histogram are obtained, weighted summation is carried out, and the calculation formula is as follows:
f(x) = γ_tmpl f_tmpl(x) + γ_hist f_hist(x)
where x = T(x_t, p; θ_{t-1}), T is the feature extraction function, x_t denotes the t-th frame image, p denotes a rectangular box in a frame, θ denotes the model parameters and θ_{t-1} the target model parameters built from the first t-1 frames. To combine the gradient features and the color features while satisfying the real-time requirement, the scoring function is formed linearly, where γ_tmpl is the score coefficient of the filter template, γ_hist is the score coefficient of the histogram, and γ_tmpl + γ_hist = 1.
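The score fusion can be sketched in a few lines; the weight value below is only an illustrative choice, and both responses are assumed to be maps over the same candidate positions.

```python
import numpy as np

def fuse_scores(f_tmpl, f_hist, gamma_tmpl=0.7):
    """Linear fusion f(x) = γ_tmpl·f_tmpl(x) + γ_hist·f_hist(x) with
    γ_tmpl + γ_hist = 1 (0.7 is only an illustrative weight)."""
    gamma_hist = 1.0 - gamma_tmpl
    return gamma_tmpl * f_tmpl + gamma_hist * f_hist

# Usage: both responses are maps over candidate translations; the new target
# position is the argmax of the fused map.
# response = fuse_scores(template_response, histogram_response)
# dy, dx = np.unravel_index(np.argmax(response), response.shape)
```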
Step three, face recognition is performed on the detected human body to judge whether the person is a family member. If so, the recognition result is displayed on the human-computer interaction interface; otherwise, behavior recognition is carried out.
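The patent does not name a specific face-recognition method; the sketch below assumes the open-source face_recognition library and a stored set of family-member encodings, with the display and behavior-recognition calls left as hypothetical placeholders.

```python
import face_recognition  # assumed library; the patent does not specify one

def is_family_member(frame_rgb, family_encodings, tolerance=0.6):
    """Return True if any detected face matches a stored family-member encoding."""
    encodings = face_recognition.face_encodings(frame_rgb)
    for enc in encodings:
        matches = face_recognition.compare_faces(family_encodings, enc, tolerance)
        if any(matches):
            return True
    return False

# if is_family_member(frame, family_encodings):
#     show_on_hmi("family member")        # hypothetical display call
# else:
#     run_behavior_recognition(sequence)  # continue with steps four to seven
```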
Step four, as shown in fig. 3: the human body posture space-time characteristics are extracted by taking a human body motion time sequence as a research object, and the specific process comprises the following steps:
1) In the home security scene the person moves forward, so the distance between the camera and the human body changes and the position coordinates of the human joint points can differ considerably; the joint coordinates are therefore normalized first.
Assume the original coordinates of a joint point are (x_0, y_0) and the normalized coordinates are (x, y). The normalization formula (given as an equation image in the original publication) uses d = max{w, h}, where w and h are the width and height of the image, so that x, y ∈ (-1, 1) after normalization.
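Since the normalization formula itself appears only as an equation image, the sketch below assumes a center-and-scale normalization that is consistent with the quantities stated above (d = max{w, h}, output in (-1, 1)).

```python
import numpy as np

def normalize_joints(joints_xy, width, height):
    """Map raw joint coordinates into (-1, 1).

    joints_xy -- array of shape (num_joints, 2) with (x0, y0) pixel coordinates
    Assumption: coordinates are centred on the image centre and scaled by
    d = max{w, h}; the patent gives the exact formula only as an image.
    """
    d = float(max(width, height))
    centre = np.array([width / 2.0, height / 2.0], dtype=np.float32)
    return 2.0 * (np.asarray(joints_xy, dtype=np.float32) - centre) / d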
2) After the normalized human posture coordinates are obtained, the temporal and spatial features of the posture time sequence are extracted; the spatial features describe the positions and mutual positional relations of the joint points within one frame, and the temporal features describe the changes in joint positions caused by posture changes.
Because each joint point of the human body responds to a given behavior to a different degree, treating all joint points equally would let joints with low response introduce noise and degrade recognition, so some noisy points are deliberately discarded: the mutual information of each joint point with respect to the behavior category is computed, and the joint-point group with the largest mutual information that characterizes the specific behavior is retained.
Assuming a behavior time sequence with N frames in total, the dynamic process of the j-th joint point (j = 1, 2, ..., 20) over time can be expressed as:
f_j = {f_1^j, f_2^j, ..., f_N^j}
mutual information of each joint point to the human behavior category is as follows:
I(f_j, Y) = H(f_j) - H(f_j | Y)
where H(f_j) denotes the information entropy of the j-th joint point and Y is the human behavior category; in the home security scene the categories mainly recognized are water delivery, express delivery, takeaway, friends, cleaning personnel and others, so Y = 1, 2, 3, 4, 5, 6. The formula above measures the degree to which each joint responds to a particular behavior category. The entropy is calculated as:
H(f_j) = -Σ_{i=1}^{N} p(f_i^j) log p(f_i^j)
H(f_j | Y) = -Σ_Y Σ_{i=1}^{N} p(Y, f_i^j) log p(f_i^j | Y)
where p(f_i^j) is the probability density function, p(Y, f_i^j) is the joint probability density function, p(f_i^j | Y) is the conditional probability density function, and i = 1, 2, ..., N indexes the frames of the time sequence.
After the mutual information of each joint point with respect to the behavior category has been computed, the values are sorted in descending order and the joint-point group with the largest mutual information that characterizes the specific behavior is selected.
The selection rule for the joint-point group with the largest mutual information is as follows: for human behavior recognition in the home security scene, the joint points of the arms, hands and legs are of main concern for a normal person, whereas for a person exhibiting intrusive behavior all joint points are of concern, so the constraint for selecting from the sorted mutual-information matrix is as follows:
the matrix formed by the mutual information of the joint points obtained by each action is as follows:
M = (I_nk)_{N×K}
where n = 1, 2, ..., 6 indexes the behavior class (N classes) and k = 1, 2, ..., 20 indexes the joint point (K joints). The mutual information obtained for each joint point is sorted, and since the arms, hands and legs are the joints of main concern, the joint-point group composed of these parts that characterizes the specific behavior is selected from the sorted mutual-information set.
The posture matrix after screening the joint points by their degree of response to the behaviors is:
R = (r_ij)_{N×K}
where r_ij = (x_ij, y_ij), i ∈ {1, 2, ..., N}, j ∈ {1, 2, ..., K}, N = 6, K ranges from 4 to 14, and K is the largest index of the retained joint-point group.
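A compact sketch of this screening step, assuming scikit-learn's mutual-information estimator. As a simplification of the per-frame treatment above, each joint is summarized by its total displacement per sequence before estimating I(f_j, Y), and the limb-joint indices are assumptions that depend on the pose estimator used.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Assumed indices of the arm, hand and leg joints in a 20-joint skeleton;
# the actual layout depends on the pose estimator.
LIMB_JOINTS = [4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 17, 18, 19]

def select_joints(sequences, labels, k=10):
    """Keep the joint points whose motion carries the most mutual information
    about the behavior class.

    sequences -- array (num_sequences, N_frames, 20, 2) of normalized joints
    labels    -- array (num_sequences,) with behavior classes 1..6
    Returns the indices of the k retained joints (restricted to limb joints)
    and the per-joint mutual-information estimates.
    """
    seq = np.asarray(sequences, dtype=np.float32)
    disp = np.linalg.norm(np.diff(seq, axis=1), axis=-1).sum(axis=1)  # (num_seq, 20)
    mi = mutual_info_classif(disp, np.asarray(labels))                # I(f_j, Y) per joint
    order = np.argsort(mi)[::-1]                                      # descending MI
    kept = [int(j) for j in order if j in LIMB_JOINTS][:k]
    return kept, mi
```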
After the screened posture matrix is obtained, features are extracted in the spatial dimension and in the time dimension. The features in the spatial dimension are:
F_spatial = [T, Θ, D, Ψ, A]
where the human hip joint point (x_i0, y_i0) is selected as the body centroid, T = (t_ij)_{N×K} is the joint-coordinate trajectory feature matrix with t_ij = (x_ij - x_i0, y_ij - y_i0), Θ is the direction matrix of each screened joint point relative to the body centroid, D is the spatial distance matrix of any two joint points, Ψ is the direction matrix of the vector formed by any two joints relative to the upward vector at the centroid, and A is the matrix of the three interior angles formed by any three joint points (the defining formulas of Θ, D, Ψ and A are given as equation images in the original publication).
The features in the time dimension are:
F_temporal = [ΔT, ΔΘ, ΔD, ΔΨ, ΔA]
where ΔT = (x_{i+s,j} - x_ij, y_{i+s,j} - y_ij)_{(N-s)×2K} is the trajectory displacement matrix of the joint points (s is the frame offset between the compared frames), ΔΘ is the change in direction of each joint point as it is displaced, ΔD is the matrix of changes in the distance between any two joint points over time, ΔΨ is the change in direction of the vector between any two joint points relative to the upward vector at the centroid, and ΔA is the matrix of changes in the interior angles formed by any three joint points (the defining formulas of ΔΘ, ΔD, ΔΨ and ΔA are given as equation images in the original publication).
The posture spatio-temporal features are obtained from the temporal and spatial features and are expressed as:
F_pose = F_spatial + F_temporal
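A minimal sketch of extracting a subset of these spatio-temporal posture features is given below; it covers the T, Θ and D families and their temporal differences (Ψ and A follow the same pattern), and the hip-joint index and the way the spatial and temporal parts are combined are assumptions, since the exact matrix layouts are given only as equation images above.

```python
import numpy as np
from itertools import combinations

HIP = 0  # assumed index of the hip joint used as the body centroid

def pose_spatiotemporal_features(seq, step=1):
    """Sketch of F_pose for one screened joint sequence.

    seq  -- array (N_frames, K_joints, 2) of normalized joint coordinates
    step -- frame offset s used for the temporal differences
    """
    seq = np.asarray(seq, dtype=np.float32)
    centroid = seq[:, HIP:HIP + 1, :]                     # (N, 1, 2)

    rel = seq - centroid
    T = rel.reshape(len(seq), -1)                         # trajectory w.r.t. centroid
    theta = np.arctan2(rel[..., 1], rel[..., 0])          # direction of each joint
    pairs = list(combinations(range(seq.shape[1]), 2))
    D = np.stack([np.linalg.norm(seq[:, a] - seq[:, b], axis=-1)
                  for a, b in pairs], axis=1)             # pairwise joint distances

    F_spatial = np.concatenate([T, theta, D], axis=1)     # per-frame spatial features
    F_temporal = F_spatial[step:] - F_spatial[:-step]     # changes over `step` frames

    # The patent writes F_pose = F_spatial + F_temporal; here the two parts are
    # concatenated after aligning their lengths, which is one way to realise that.
    return np.concatenate([F_spatial[:-step], F_temporal], axis=1)
```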
further, the features of the object interacting with the human are extracted by using a convolutional neural network by taking the detected human body as a clue and the effective object interacting with the human as a high-level clue, and the features of the effective object interacting with the human are extracted by implicitly integrating the position relationship between the object and the human in the detected human body into the convolutional neural network.
In the invention two tasks are executed jointly: a main task of interactive-object recognition and an auxiliary task of distance-hint enhancement. The auxiliary task regularizes the network and enhances its expressive power; its influence on the main task is realized by sharing all convolutional layers before the fully connected layers. To learn the weights of these shared layers jointly, a mixed loss function combining the loss functions of both tasks is used. Specifically, the network model is denoted M, D = {(x_i, y_i)}, i = 1, ..., N, is the training set of N sample pictures, x_i denotes the i-th of the N images, y_i denotes the corresponding category label, and α takes a value between 0 and 1. The mixed loss function is:
L(M, D) = L_main(M, D) + α L_hint(M, D)
(The defining formulas of L_main(M, D) and L_hint(M, D) are given as equation images in the original publication.)
where M_main(·) and M_hint(·) denote the output of the main task and the output of the hint task, respectively. The model parameters are trained and fine-tuned by stochastic gradient descent, which is used to optimize L(M, D). After the gradient is computed, the weight ω is updated according to the following rule:
(The weight-update rule for ω is given as an equation image in the original publication.)
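A sketch of one training step with this mixed loss, written with PyTorch as an assumed framework: the convolutional trunk, the cross-entropy form of L_main and the mean-squared-error form of the distance-hint loss L_hint are all assumptions, since the patent gives the architecture and both loss formulas only as equation images.

```python
import torch
import torch.nn as nn

class HintEnhancedCNN(nn.Module):
    """Shared convolutional trunk with two heads: interactive-object
    classification (main task) and distance-hint regression (auxiliary task)."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.main_head = nn.Linear(64, num_classes)  # M_main(.)
        self.hint_head = nn.Linear(64, 1)            # M_hint(.)

    def forward(self, x):
        h = self.trunk(x)
        return self.main_head(h), self.hint_head(h)

def train_step(model, optimizer, images, labels, distances, alpha=0.5):
    """One SGD step on the mixed loss L = L_main + alpha * L_hint (alpha in (0, 1))."""
    model.train()
    optimizer.zero_grad()
    logits, dist_pred = model(images)
    loss = nn.functional.cross_entropy(logits, labels) + \
           alpha * nn.functional.mse_loss(dist_pred.squeeze(1), distances)
    loss.backward()      # back-propagate the mixed loss
    optimizer.step()     # plain SGD update of the weights
    return loss.item()

# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
```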
Further, because the response degrees of the posture characteristic and the interactive object characteristic to different human behavior recognition are different, the obtained two characteristics are subjected to weighted fusion, and the formula is as follows:
F = w_1 F_pose + w_2 F_object
where w_1 is the weighting coefficient of the posture spatio-temporal features, w_2 is the weighting coefficient of the human-interaction-object features, w_1 + w_2 = 1, F_pose denotes the extracted posture spatio-temporal features and F_object the extracted interactive-object features.
Furthermore, the fused features are classified. The system mainly recognizes behaviors such as water delivery, express delivery, takeaway, friends, cleaning personnel and others, so a multi-class support vector machine is needed. It is built by designing a binary classification model between every pair of classes and combining the binary classifiers into a multi-class classifier, each binary classification still using the method described above. With the 6 categories in this system, one category is taken as the positive sample and another as the negative sample in each pairing, and so on, giving 15 classifiers in total. During classification the 15 classifiers each answer which of their two categories the sample belongs to, and the category with the most votes in the final tally is the recognized class.
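A brief sketch of the weighted fusion and the one-vs-one SVM classification, using scikit-learn as an assumed implementation; the fusion weight and kernel are illustrative, and the two feature vectors are assumed to have been brought to a common length before fusion.

```python
import numpy as np
from sklearn.svm import SVC

def fuse_features(f_pose, f_object, w1=0.6):
    """Weighted fusion F = w1*F_pose + w2*F_object with w1 + w2 = 1
    (w1 = 0.6 is only an illustrative weight)."""
    return w1 * np.asarray(f_pose) + (1.0 - w1) * np.asarray(f_object)

# One-vs-one multi-class SVM: with 6 categories, scikit-learn internally trains
# C(6,2) = 15 binary classifiers and takes a vote, matching the scheme above.
clf = SVC(kernel="rbf", decision_function_shape="ovo")
# clf.fit(np.stack(train_features), train_labels)     # fused training vectors
# predicted_class = clf.predict(fused_feature.reshape(1, -1))
```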
Finally, because the image processing is performed on a cloud server, users cannot see the recognition result directly, so a human-computer interaction module is needed to receive and display it. When no family member is at home and someone is moving about at the door, the recognition result is sent to the family members in the form of a text message.

Claims (6)

1. A method for human behavior recognition in a home security system with potential information fusion, characterized by comprising the following steps:
step one, a camera is used for collecting images;
step two, detecting a human body target in the acquired image by using an illumination-adaptive method based on background subtraction, and then tracking the detected human body target with the Staple method to obtain a human motion time sequence;
step three, recognizing the detected face and judging whether it belongs to a family member; if so, no operation is performed on the motion time sequence obtained in step two; otherwise, human behavior recognition is performed;
step four, extracting the posture space-time characteristics of the human body from the motion time sequence obtained in the step two;
step five, extracting the characteristics of the interactive objects by adopting a clue enhanced deep convolutional neural network;
step six, fusing the global posture space-time characteristics and the local interactive object characteristics extracted in the step four and the step five;
and step seven, inputting the fused feature vectors into an SVM classifier for behavior recognition.
2. The method for human behavior recognition in a home security system with potential information fusion according to claim 1, wherein in the second step a human body entering the detection range is detected with an illumination-adaptive method based on background subtraction and the background is modelled with the ViBe algorithm; the number of pixels of the human body target detected in the previous frame is recorded and denoted Y, and the number of foreground pixels detected in the current frame is denoted L; at the instant of a sudden illumination change a large white area appears and the system falsely detects background as foreground, so L becomes larger than Y; a threshold (the pixel count of the human target range detected in the previous frame) is therefore set during foreground detection to judge the extent of the foreground: if it exceeds the threshold, a sudden illumination change has occurred, otherwise it has not; when a sudden illumination change occurs, illumination compensation is applied to the background model using the brightness change of the pixels between two adjacent frames, with the compensation formula:
Δ_t(x, y) = |V_t(x, y) - V_{t-1}(x, y)|
wherein:
V_t(x, y) denotes the brightness of image I_t at pixel (x, y) (its defining formula is given as an equation image in the original publication), n is the total number of pixels in the image, n = 1280 × 480 = 614400, and I_t(x, y)_max(R,G,B) and I_t(x, y)_min(R,G,B) denote the maximum and the minimum of the R, G, B components at pixel (x, y), respectively;
After the human body target is detected, it is tracked with the Staple method: during tracking a translation filter and a color filter locate the target, a scale filter then estimates its size, and finally the human motion time sequence is obtained.
3. The method for providing human behavior recognition in the home security system with potential information fusion according to claim 1, wherein in the fourth step, the posture space-time feature is extracted from the obtained human motion time sequence, and the specific process comprises:
1) calculating mutual information of each joint point, judging the response degree of each joint point to a certain specific behavior through the mutual information, and finally reserving a joint point group which can represent the specific behavior and has the maximum mutual information, wherein a formula for calculating the mutual information of each joint point is as follows:
I(f_j, Y) = H(f_j) - H(f_j | Y)
where H(f_j) is the information entropy of the j-th joint point, j = 1, 2, ..., 20, and
f_j = {f_1^j, f_2^j, ..., f_N^j}
denotes the dynamic process of the j-th joint point over time, N is the number of frames of the human motion time sequence, and Y is the human behavior category; in the home security scene the categories mainly recognized are water delivery, express delivery, takeaway, friends, cleaning personnel and others, so Y = 1, 2, 3, 4, 5, 6. The entropy is calculated as:
H(f_j) = -Σ_{i=1}^{N} p(f_i^j) log p(f_i^j)
where p(f_i^j) is the probability density function and i = 1, 2, ..., N indexes the frames of the time sequence.
2) Extracting posture space-time characteristics from the screened joint points, wherein the characteristics in space dimension are as follows:
F_spatial = [T, Θ, D, Ψ, A]
the method comprises the following steps that K represents joint points of human body postures, K is 1,2, 20, N represents the frame number of a human body motion time sequence, human body hip joint points are selected as the mass center of a human body, T represents a joint coordinate track characteristic matrix, theta represents a direction matrix of each deleted joint point relative to the mass center of the human body, D represents a space distance matrix of any two joint points, psi represents a direction matrix of a vector formed by any 2 joints relative to an upward vector of the mass center, and A represents a 3 internal angle size matrix formed by any 3 joint points;
the features in the time dimension are:
F_temporal = [ΔT, ΔΘ, ΔD, ΔΨ, ΔA]
where ΔT is the trajectory displacement matrix of the joint points, ΔΘ is the change in direction of each joint point as it is displaced, ΔD is the matrix of changes in the distance between any two joint points over time, ΔΨ is the change in direction of the vector between any two joint points relative to the upward vector at the centroid, and ΔA is the matrix of changes in the interior angles formed by any three joint points.
The extracted pose spatiotemporal features are represented as:
F_pose = F_spatial + F_temporal
4. The method for human behavior recognition in a home security system with potential information fusion as claimed in claim 1, wherein in the fifth step the detected human body is used as a clue and the effective object interacting with the human is used as a high-level clue; the features of the object interacting with the human are extracted with a convolutional neural network into which the position relation between the object and the detected human body is implicitly integrated;
a loss function is used in the training process, parameters are adjusted when loss is propagated reversely, and a mixed loss function calculation formula is as follows:
L(M, D) = L_main(M, D) + α L_hint(M, D)
where L_main(M, D) is the loss function of interactive-object feature extraction, L_hint(M, D) is the loss function of the distance-hint task, M is the network model, D = {(x_i, y_i)}, i = 1, ..., N, is the training set of N sample pictures, x_i denotes the i-th of the N images, y_i denotes the corresponding category label, and α takes a value between 0 and 1.
5. The method for human behavior recognition in a home security system for providing potential information fusion as claimed in claim 1, wherein in the sixth step, since the response degree of the attitude space-time feature and the interactive object feature to different human behavior recognition is different, the two obtained features are weighted and fused, and the formula is as follows:
F = w_1 F_pose + w_2 F_object
where w_1 is the weighting coefficient of the posture spatio-temporal features, w_2 is the weighting coefficient of the human-interaction-object features, w_1 + w_2 = 1, F_pose denotes the posture spatio-temporal features and F_object the interactive-object features.
6. The method for providing human behavior recognition in the home security system with potential information fusion according to claim 1, wherein in the seventh step, the fused feature vector is input into an SVM classifier for classification, and a final recognition result is obtained.
CN202010116795.8A 2020-02-25 2020-02-25 Method for recognizing human body behaviors in potential information fusion home security system Active CN111310689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010116795.8A CN111310689B (en) 2020-02-25 2020-02-25 Method for recognizing human body behaviors in potential information fusion home security system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010116795.8A CN111310689B (en) 2020-02-25 2020-02-25 Method for recognizing human body behaviors in potential information fusion home security system

Publications (2)

Publication Number Publication Date
CN111310689A true CN111310689A (en) 2020-06-19
CN111310689B CN111310689B (en) 2023-04-07

Family

ID=71149293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116795.8A Active CN111310689B (en) 2020-02-25 2020-02-25 Method for recognizing human body behaviors in potential information fusion home security system

Country Status (1)

Country Link
CN (1) CN111310689B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170046567A1 (en) * 2015-04-16 2017-02-16 California Institute Of Technology Systems and Methods for Behavior Detection Using 3D Tracking and Machine Learning
WO2018095082A1 (en) * 2016-11-28 2018-05-31 江苏东大金智信息***有限公司 Rapid detection method for moving target in video monitoring
WO2018130016A1 (en) * 2017-01-10 2018-07-19 哈尔滨工业大学深圳研究生院 Parking detection method and device based on monitoring video
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN110378281A (en) * 2019-07-17 2019-10-25 青岛科技大学 Group Activity recognition method based on pseudo- 3D convolutional neural networks
CN110555387A (en) * 2019-08-02 2019-12-10 华侨大学 Behavior identification method based on local joint point track space-time volume in skeleton sequence
CN110826453A (en) * 2019-10-30 2020-02-21 西安工程大学 Behavior identification method by extracting coordinates of human body joint points

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李海涛: "基于DTW约束的动作行为识别", 《计算机仿真》 *
郑潇等: "基于姿态时空特征的人体行为识别方法", 《计算机辅助设计与图形学学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381072A (en) * 2021-01-11 2021-02-19 西南交通大学 Human body abnormal behavior detection method based on time-space information and human-object interaction
CN112381072B (en) * 2021-01-11 2021-05-25 西南交通大学 Human body abnormal behavior detection method based on time-space information and human-object interaction
CN113487596A (en) * 2021-07-26 2021-10-08 盛景智能科技(嘉兴)有限公司 Working strength determination method and device and electronic equipment

Also Published As

Publication number Publication date
CN111310689B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
KR102462572B1 (en) Systems and methods for training object classifiers by machine learning
Torralba Context-based vision system for place and object recognition
CN106897670B (en) Express violence sorting identification method based on computer vision
EP3092619B1 (en) Information processing apparatus and information processing method
Jalal et al. The state-of-the-art in visual object tracking
CN108805900B (en) Method and device for determining tracking target
Shahzad et al. A smart surveillance system for pedestrian tracking and counting using template matching
CN109583315B (en) Multichannel rapid human body posture recognition method for intelligent video monitoring
Seow et al. Neural network based skin color model for face detection
CN103020992B (en) A kind of video image conspicuousness detection method based on motion color-associations
Zin et al. Fusion of infrared and visible images for robust person detection
Damen et al. Detecting carried objects from sequences of walking pedestrians
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
Guo et al. Improved hand tracking system
García-Martín et al. Robust real time moving people detection in surveillance scenarios
CN111310689B (en) Method for recognizing human body behaviors in potential information fusion home security system
Nosheen et al. Efficient Vehicle Detection and Tracking using Blob Detection and Kernelized Filter
CN115116132A (en) Human behavior analysis method for deep perception in Internet of things edge service environment
Miller et al. Person tracking in UAV video
CN112347967B (en) Pedestrian detection method fusing motion information in complex scene
Panini et al. A machine learning approach for human posture detection in domotics applications
Hernández et al. People counting with re-identification using depth cameras
CN112183287A (en) People counting method of mobile robot under complex background
Kompella et al. Detection and avoidance of semi-transparent obstacles using a collective-reward based approach
CN113763418B (en) Multi-target tracking method based on head and shoulder detection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant