WO2021082112A1 - Neural network training, skeleton graph construction, and abnormal behavior monitoring method and *** - Google Patents


Publication number
WO2021082112A1
WO2021082112A1 · PCT/CN2019/119826 · CN2019119826W
Authority
WO
WIPO (PCT)
Prior art keywords
recognized
branch
image
neural network
convolutional neural
Prior art date
Application number
PCT/CN2019/119826
Other languages
English (en)
French (fr)
Inventor
林孝发
林孝山
胡金玉
于海峰
梁俊奇
Original Assignee
九牧厨卫股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 九牧厨卫股份有限公司
Publication of WO2021082112A1

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING
    • G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology; G06N3/045 Combinations of networks; G06N3/08 Learning methods
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06V10/764 Image or video recognition using classification, e.g. of video objects; G06V10/82 using neural networks
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands; G06V40/172 Human faces: classification, e.g. identification; G06V40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • the embodiments of the present invention relate to, but are not limited to, the computer field, and in particular to a deep convolutional neural network training method, a method for constructing a human skeleton diagram, and a method and system for monitoring abnormal behavior.
  • the embodiments of the application provide a convolutional neural network training method, a human skeleton diagram construction method, an abnormal behavior monitoring method, and an abnormal behavior monitoring system.
  • an embodiment of the present application provides a method for training a deep convolutional neural network.
  • the deep convolutional neural network is a single-stage two-branch convolutional neural network, including a first branch for predicting confidence and a second branch for predicting a local affinity vector field; the method includes:
  • an embodiment of the present application provides a method for constructing a human skeleton map based on a deep convolutional neural network.
  • the deep convolutional neural network is a single-stage two-branch convolutional neural network, including a first branch for predicting confidence and a second branch for predicting a local affinity vector field; the method includes:
  • an embodiment of the present application provides a method for monitoring abnormal behavior based on a deep convolutional neural network, and the monitoring method includes:
  • an embodiment of the present application also provides an abnormal behavior monitoring system based on a deep convolutional neural network, the system including:
  • the image acquisition device is set to acquire the image to be recognized
  • the server side is configured to obtain the image to be recognized sent by the image acquisition device, obtain the skeleton diagram of the human body in the image to be recognized by using the aforementioned method for constructing a human skeleton diagram, perform behavior recognition on the skeleton diagram, and, when it is judged that abnormal behavior exists, send an alarm signal to the client; and
  • the client is configured to receive the alarm signal sent by the server, and trigger an alarm according to the alarm signal.
  • the embodiments of the present application also provide a computer-readable storage medium that stores program instructions.
  • when the program instructions are executed, the aforementioned deep convolutional neural network training method, the method for constructing a human skeleton map based on the deep convolutional neural network, or the method for monitoring abnormal behavior based on the deep convolutional neural network can be implemented.
  • the embodiments of the present application also provide a computer device, including a memory, a processor, and a computer program stored in the memory and running on the processor.
  • when executing the program, the processor implements the steps of the aforementioned deep convolutional neural network training method, or of the method for constructing a human skeleton map based on the deep convolutional neural network, or of the method for monitoring abnormal behavior based on the deep convolutional neural network.
  • Figure 1 is a schematic diagram of the 14-point skeleton annotation method according to the embodiment of the present invention.
  • FIG. 2 is a flowchart of a method according to Embodiment 1 of the present invention.
  • FIG. 3 is a flowchart of obtaining a skeleton diagram through a single-stage dual-branch CNN network according to an embodiment of the present invention.
  • FIGS. 5a-5c are schematic diagrams of the process of connecting key points into a skeleton diagram according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of a method for monitoring abnormal behavior according to an embodiment of the present invention.
  • FIG. 7a-d are schematic diagrams of abnormal behavior of the balcony according to the embodiment of the present invention.
  • FIG. 8 is a deployment diagram of a monitoring system applied to a balcony scene according to an embodiment of the present invention.
  • Fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
  • the specification may have presented the method and/or process as a specific sequence of steps. However, to the extent that the method or process does not depend on the specific order of the steps described herein, the method or process should not be limited to the steps in the specific order described. As those of ordinary skill in the art will understand, other sequences of steps are also possible. Therefore, the specific order of steps set forth in the specification should not be construed as a limitation on the claims. In addition, the claims for the method and/or process should not be limited to performing their steps in the written order. Those skilled in the art can easily understand that these orders can be changed while still remaining within the spirit and scope of the embodiments of the present application.
  • the applicant proposes a method for monitoring abnormal behaviors using a deep convolutional neural network.
  • a method for training a deep convolutional neural network and a method for constructing a human skeleton map are provided, which will be described separately below.
  • This embodiment describes how to train and obtain a deep convolutional neural network (Deep Convolutional Neural Network, referred to herein as a CNN network) for recognizing human posture.
  • the CNN network in this embodiment obtains a skeleton diagram of key points of the human body by recognizing pictures, so as to recognize one or more people present in the image.
  • the skeleton diagram of the key points of the human body is composed of a set of coordinate points, and the posture of the person is described by the connection of the coordinate points.
  • each coordinate point in the skeleton diagram is called a key point (a "part", or joint), and an effective connection between two key points is called a limb (a "pair").
  • the human body key point recognition in this embodiment includes one or more of the following recognition: face key point recognition, body key point recognition, foot key point recognition, and hand key point recognition.
  • face key point recognition is the recognition of the face as the object.
  • the number of key points depends on the design accuracy and the adopted database, and can be selected from 6 to 130.
  • body key point recognition takes the whole torso as the recognition object.
  • a complete skeleton diagram of the key points of the body is shown in Figure 1, including: head (0), neck (1), right shoulder (2), right elbow (3), right wrist (4), left shoulder (5), left elbow (6), left wrist (7), right hip (8), right knee (9), right ankle (10), left hip (11), left knee (12) and left ankle (13).
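For concreteness, the 14-point annotation listed above can be written down as an index table plus a set of limb pairs. This is only an illustrative sketch: the patent enumerates the key points, but the exact pairing scheme below is an assumption.

```python
# Illustrative encoding of the 14-point body annotation described above.
# Index numbers follow the list in the text.
KEYPOINTS = [
    "head", "neck", "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist", "right_hip", "right_knee",
    "right_ankle", "left_hip", "left_knee", "left_ankle",
]

# Each limb ("pair") is a valid connection between two keypoint indices.
# This particular connection scheme is a plausible assumption, not a
# definition taken from the patent.
LIMBS = [
    (0, 1),                        # head - neck
    (1, 2), (2, 3), (3, 4),        # neck - right arm
    (1, 5), (5, 6), (6, 7),        # neck - left arm
    (1, 8), (8, 9), (9, 10),       # neck - right leg
    (1, 11), (11, 12), (12, 13),   # neck - left leg
]
```

With 14 key points this scheme yields 13 limbs, i.e. a tree over the key points.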
  • Hand key point recognition is the recognition of the hand as an object, which can include the recognition of 21 key points of the hand.
  • Foot key point recognition is to recognize the foot as an object, and the number of key points is determined according to needs.
  • the recognition including the above-mentioned face key point recognition, body key point recognition, foot key point recognition and hand key point recognition is the whole body key point recognition.
  • the recognition objects of whole body key point recognition include: human face, body, foot and hand. According to different application scenarios, only part of it can be trained and recognized during training. For example, when it is applied to abnormal behavior recognition, it can only perform body key point recognition, or perform body key point recognition and face key point recognition, or perform body Key point recognition, face key point recognition, hand key point recognition, or whole body key point recognition. In this embodiment, the whole body key point recognition is taken as an example for description.
  • the CNN network training method of this embodiment is shown in FIG. 2 and includes the following steps 10-13.
  • Step 10 input the image to be recognized
  • the image to be recognized may be acquired from an image acquisition device, for example, it may be an image directly acquired by the image acquisition device, or may be an image in a video acquired by the image acquisition device.
  • the image to be recognized can also be acquired from a storage device that stores images or videos.
  • the embodiment of the present invention has no limitation on the image acquisition device used to acquire an image, as long as it can acquire an image.
  • the image may be in color.
  • the person in the image may be single or multiple.
  • Step 11 Perform feature analysis on the image to be identified according to the preset object to be identified to obtain one or more feature atlases containing the object to be identified in the image to be identified;
  • the objects to be recognized include: face, body, feet, and hands, and all faces, bodies, feet, and hands are obtained from the image to be recognized. This process can also be called a pre-training process.
  • a feature atlas is a set of feature maps; that is, each feature atlas includes one or more feature maps.
  • four feature atlases can be obtained, including: a face feature atlas, a body feature atlas, a foot feature atlas, and a hand feature atlas, where each feature atlas includes the feature maps of all corresponding objects to be recognized in the image; for example, the face feature atlas includes all face feature maps in the image, and the hand feature atlas includes all hand feature maps in the image.
  • only the first 10 layers of VGG-19 are used as an example. In other embodiments, the number of layers used may be different from this embodiment.
  • the network used to extract the feature information to obtain the feature atlas F can also be other networks.
  • before extracting a feature map for a body part, such as a face, a foot, or a hand, the resolution of the image to be recognized can be increased as needed, so that among the multiple feature atlases of the objects to be recognized, at least two feature atlases have different resolutions.
  • for example, the resolution of the feature maps obtained by feature analysis of the body is 128*128 ppi (pixels per inch), but when performing feature analysis on the hand, if a resolution of 128*128 ppi is still used, the local recognition accuracy will be too low; therefore the original image can be enlarged to, for example, 960*960 ppi before the hand feature maps are extracted, to ensure the accuracy of local recognition.
  • the resolution of the feature map of each object to be recognized can be different.
  • Step 12 Input the set of feature atlas F to the first branch for predicting confidence to obtain a confidence prediction result
  • a single-stage two-branch CNN network is used to obtain the human skeleton map, as shown in FIG. 3, where the first branch is used to predict confidence (Part Confidence Maps, or confidence maps) and the second branch is used to predict the partial affinity fields (Part Affinity Fields, PAFs, affinity field for short); the confidence is used to predict the locations of key points, and the affinity field is used to indicate the degree of association between key points.
  • a set of feature atlas F is input to the first branch, and a preset confidence loss function is used to constrain the training accuracy of the first branch.
  • prediction training is performed on the feature atlases of all objects to be recognized at the same time, that is, multiple tasks coexist, so that the skeleton map of the whole body can be predicted simultaneously during actual network application, improving the prediction speed.
  • the prediction result will not be affected when the human body is occluded. For example, when a person's body is occluded, it will not affect the recognition of key points of the person's face and hands.
  • the complexity of the algorithm can be greatly reduced, the calculation speed is increased, and the calculation time is reduced.
  • the confidence loss function f_C can be calculated using the following formula:
  • f_C = Σ_{j=1}^{J} Σ_p R(p) · ||C_j(p) - C_j*(p)||₂²
  • where f_C is the confidence loss function; j is a key point index, j ∈ {1, ..., J}; J is the total number of key points; C_j(p) is the predicted confidence that key point j is at coordinate position p in the image; C_j*(p) is the true confidence that key point j is at position p, i.e. the joint position of the person in the real state; and the function R is a mask used to avoid penalizing true positive predictions during training.
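A minimal sketch of a masked squared-error confidence loss of this form. The nested-list representation, the shapes, and the interpretation of the mask R as a per-pixel 0/1 weight are illustrative assumptions:

```python
# Sketch of the confidence (heatmap) loss described above: an R-masked sum
# of squared differences between predicted and true confidence maps.
def confidence_loss(pred, true, mask):
    """pred, true: [J][H][W] confidence maps for J key points;
    mask: [H][W] with 0 where the annotation is missing, so unlabeled
    regions are not penalized (the role of R in the formula)."""
    total = 0.0
    for c_pred, c_true in zip(pred, true):
        for row_p, row_t, row_m in zip(c_pred, c_true, mask):
            for p, t, m in zip(row_p, row_t, row_m):
                total += m * (p - t) ** 2
    return total

# Tiny example: one keypoint map on a 2x2 grid, bottom-right pixel unlabeled.
pred = [[[0.9, 0.1], [0.0, 0.0]]]
true = [[[1.0, 0.0], [0.0, 0.0]]]
mask = [[1, 1], [1, 0]]
loss = confidence_loss(pred, true, mask)   # 0.1**2 + 0.1**2 = 0.02
```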
  • Step 13 input the confidence prediction result and the set of feature atlases into the second branch for predicting the affinity field to obtain the affinity field prediction result;
  • the whole body key point recognition is adopted, and the confidence prediction result is a series set, including 4 sub-sets, namely the face key point sub-set, the body key point sub-set, the foot key point sub-set, and the hand key point sub-set (The order is not limited). In other embodiments, the number of sub-sets in the series set may be different depending on the identification object, which will not be repeated here.
  • Each sub-collection has key points that overlap with one or more other sub-collections, so as to obtain a complete skeleton diagram of the whole body in the follow-up.
  • at least one key point in the face key point sub-set coincides in coordinates with at least one key point in the body key point sub-set; at least one key point in the body key point sub-set coincides with at least one key point in the foot key point sub-set, for example the left ankle key point coincides with a key point in the left foot key point sub-set and the right ankle key point coincides with a key point in the right foot key point sub-set; and at least one key point in the body key point sub-set coincides with at least one key point in the hand key point sub-set, for example a left wrist key point coincides with a key point in the left hand key point sub-set and a right wrist key point coincides with a key point in the right hand key point sub-set.
  • Each subset is used as a unit to calculate the affinity field.
  • a set of feature atlas F and confidence prediction results are input into the second branch, and the corresponding preset affinity field loss function is also used to control the training accuracy.
  • the number of convolution blocks in the second branch can be increased, for example, 10 convolution blocks are set in the second branch. Or according to the calculation speed, the number of convolution blocks can be increased or decreased accordingly.
  • the number of convolution blocks in the second branch may be greater than the number of convolution blocks in the first branch.
  • the width of one or more convolution blocks in the second branch may be increased, and the width of each convolution block may be the same or different.
  • the width of each of the last h convolution blocks can be set greater than the width of the preceding x-h convolution blocks, where x is the total number of convolution blocks, x and h are positive integers greater than 1, and h < x.
  • for example, the width of the preceding convolution blocks is 3*3, and the width of the last convolution blocks can be set to 7*7, 9*9, or 12*12.
  • the width of the convolution block of the first branch and the second branch may be different.
  • the number of network layers of the entire second branch can be reduced to 10-15 layers to ensure the network prediction speed.
  • the affinity field loss function f_Y can be calculated using the following formula:
  • f_Y = Σ_{i=1}^{I} Σ_p R(p) · ||Y_i(p) - Y_i*(p)||₂²
  • where f_Y is the affinity field loss function; i is an affinity field index, i ∈ {1, ..., I}; I is the total number of affinity fields; Y_i(p) is the predicted value of the i-th affinity field at coordinate position p in the image; Y_i*(p) is the true value of the i-th affinity field at position p, i.e. the association between key points in the real state; and the function R is used to avoid penalizing true positive predictions during training.
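The affinity-field loss has the same masked squared-error form as the confidence loss, except that each pixel of an affinity field holds a 2-D vector rather than a scalar. A sketch (shapes and mask semantics are again illustrative assumptions):

```python
# Sketch of the affinity-field loss: an R-masked sum of squared differences
# between predicted and true 2-D vector fields.
def paf_loss(pred, true, mask):
    """pred, true: [I][H][W][2] affinity fields (a 2-D vector per pixel);
    mask: [H][W] with 0 where the annotation is missing."""
    total = 0.0
    for f_pred, f_true in zip(pred, true):
        for row_p, row_t, row_m in zip(f_pred, f_true, mask):
            for v_p, v_t, m in zip(row_p, row_t, row_m):
                total += m * ((v_p[0] - v_t[0]) ** 2 + (v_p[1] - v_t[1]) ** 2)
    return total

# One field on a 1x2 grid; only the first pixel differs from the truth.
pred = [[[[0.6, 0.8], [0.0, 0.0]]]]
true = [[[[1.0, 0.0], [0.0, 0.0]]]]
mask = [[1, 1]]
loss = paf_loss(pred, true, mask)   # 0.4**2 + 0.8**2 = 0.8
```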
  • the total objective loss function can also be calculated, and whether it satisfies a target loss function threshold can be judged, to further comprehensively measure the accuracy of the network prediction results.
  • if the target loss function threshold is not set, then when the confidence loss function value meets the preset confidence loss function threshold and the affinity field loss function value meets the preset local affinity vector field loss function threshold, the training of the deep convolutional neural network used to predict the confidence and the affinity field is completed.
  • the CNN network used to predict the confidence and affinity field can thus be obtained. Since the CNN network used in prediction is the aforementioned single-stage two-branch network and adopts a multi-task coexistence mechanism, it can recognize multiple objects to be recognized at the same time, with fast calculation speed and low computational complexity; the prediction results can be obtained within a few seconds, making it suitable for occasions that require a quick response.
  • the human skeleton map can be constructed based on the CNN network.
  • the method for constructing a human skeleton map includes the following steps 20-24.
  • Steps 20-21 are the same as steps 10-11;
  • Step 22 is similar to step 12, except that the first-branch network parameters have already been determined during training, so there is no need to calculate the confidence loss function; the confidence prediction result is obtained simply by inputting the original feature atlas into the first branch;
  • Step 23 is similar to step 13, except that the second-branch network parameters have already been determined during training, so there is no need to calculate the affinity field loss function; the affinity field prediction result is obtained by inputting the original feature atlas and the confidence prediction result into the second branch;
  • Step 24 Obtain a human skeleton map according to the confidence prediction result and the affinity field prediction result.
  • the affinity field method can detect the correlation between the key points, and can retain the position and rotation information in the entire limb area.
  • the affinity field is a two-dimensional vector field defined for each limb: for each pixel belonging to a specific limb area, a two-dimensional vector encodes the direction pointing from one key point of the limb to the other key point.
  • the quality of a candidate connection can be evaluated by calculating the line integral of the corresponding affinity field: for any two candidate key point positions, the reliability of the line segment between the two points is measured by the integral value.
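The line-integral scoring just described can be sketched by sampling points along the candidate limb and averaging the projection of the affinity field onto the limb direction. The sampling approach and the `paf` field interface are illustrative assumptions:

```python
import math

# Sketch of scoring a candidate limb by a sampled line integral of the
# affinity field between two candidate key point positions.
def limb_score(paf, p1, p2, samples=10):
    """paf: function (x, y) -> (vx, vy) giving the affinity field vector;
    p1, p2: candidate key point coordinates."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0.0
    ux, uy = dx / norm, dy / norm   # unit vector along the candidate limb
    total = 0.0
    for k in range(samples):
        u = k / (samples - 1)
        x, y = p1[0] + u * dx, p1[1] + u * dy
        vx, vy = paf(x, y)
        total += vx * ux + vy * uy  # projection of field onto limb direction
    return total / samples

# A field pointing exactly along +x scores 1.0 for a horizontal limb;
# a perpendicular field scores 0.0.
aligned = limb_score(lambda x, y: (1.0, 0.0), (0, 0), (5, 0))
perpendicular = limb_score(lambda x, y: (0.0, 1.0), (0, 0), (5, 0))
```

A high score thus means the predicted field consistently points along the candidate limb, which is what makes the connection reliable.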
  • the confidence prediction results obtained through the CNN network may contain a+b candidates; combining the affinity field, a candidates are selected from the a+b confidence prediction results and connected to form the whole-body skeleton map.
  • the bipartite matching algorithm can be used for the calculation.
  • the greedy algorithm is introduced into the bipartite graph matching algorithm to obtain a human skeleton graph.
  • both the first branch and the second branch only need one stage to achieve a better prediction result, and there is no need to perform multi-stage prediction.
  • each subset is used as a unit to calculate the affinity field.
  • the bipartite graph matching algorithm that introduces the greedy algorithm in step 24 is described below.
  • the process of calculating the human skeleton graph is shown in FIG. 4 and includes the following steps 241-242.
  • Step 241: Determine the positions of the key points according to the confidence prediction result, calculate the connection of each limb from the key points using the bipartite graph matching method, and obtain the limb connections of each limb (each type of limb) independently, until the limb connections of every limb type have been obtained.
  • each type of key point has its own sub-set; given two sub-sets m and n, the key points in m and the key points in n are matched in pairs, the affinity field between each pair of key points is calculated, and the two key points with the strongest affinity field are chosen and connected, giving the limb connection between the two key points.
  • the bipartite graph matching method can increase the calculation speed. In other embodiments, other algorithms can also be used.
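The greedy pairing described above can be sketched as follows: rank candidate (m, n) connections by affinity score and accept each pair greedily if neither endpoint has been used. The dict-based interface is an illustrative assumption:

```python
# Sketch of the greedy variant of bipartite matching described above.
def greedy_match(scores):
    """scores: dict {(i, j): affinity} between key point i of sub-set m
    and key point j of sub-set n. Returns accepted limb connections."""
    used_m, used_n, limbs = set(), set(), []
    # Visit candidate pairs from strongest to weakest affinity.
    for (i, j), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if i not in used_m and j not in used_n:
            used_m.add(i)
            used_n.add(j)
            limbs.append((i, j))
    return limbs

# Two necks (0, 1) and two right shoulders (0, 1) from two people:
scores = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.8}
limbs = greedy_match(scores)   # [(0, 0), (1, 1)]
```

Because each key point is consumed at most once, each person contributes at most one connection per limb type, which is what keeps the per-limb matching fast.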
  • Figure 5a shows a schematic diagram of the key points of the body obtained after passing the first branch
  • Figure 5b shows the calculated connection from key point 1 to key point 2.
  • Step 242: Connect all the key points of the body: all the possible limb predictions obtained are assembled into a skeleton diagram by sharing the key points at the same positions; in this case the result is a skeleton diagram of the body, as shown in Figure 5c.
  • the above method can be used to obtain the local skeleton diagram of each object to be recognized, and then all the local skeleton diagrams are combined according to the overlapping key point coordinates (that is, the key points sharing the same position) to obtain a skeleton diagram of the whole body.
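Merging local skeleton diagrams by coincident key point coordinates can be sketched like this; the data layout and the keypoint names are illustrative assumptions:

```python
# Sketch of assembling local skeleton graphs (body, face, hands, feet) into
# one whole-body graph by merging key points that share the same coordinates.
def merge_skeletons(parts):
    """parts: list of {'points': {name: (x, y)}, 'limbs': [(name, name)]}."""
    coord_to_name, points, limbs = {}, {}, []
    for part in parts:
        rename = {}
        for name, xy in part["points"].items():
            if xy in coord_to_name:           # coincident key point: merge
                rename[name] = coord_to_name[xy]
            else:
                coord_to_name[xy] = name
                points[name] = xy
        for a, b in part["limbs"]:
            limbs.append((rename.get(a, a), rename.get(b, b)))
    return points, limbs

# The left wrist of the body graph coincides with the root of the hand graph.
body = {"points": {"left_wrist": (3, 4), "left_elbow": (3, 6)},
        "limbs": [("left_elbow", "left_wrist")]}
hand = {"points": {"hand_root": (3, 4), "thumb": (2, 3)},
        "limbs": [("hand_root", "thumb")]}
points, limbs = merge_skeletons([body, hand])
```

After merging, the hand limb hangs off the body's left wrist, so the combined graph is connected through the shared key point.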
  • the image size needs to be unified before assembling.
  • the image to be recognized is input into the CNN network trained by the foregoing embodiment, and then the CNN network calculates and outputs the skeleton map of all people in the image .
  • the skeleton diagram construction method has low complexity and fast calculation speed.
  • FIG. 6 is a flowchart of a method for monitoring abnormal behaviors according to an embodiment of the present invention, including the following steps 31 to 33.
  • Step 31 Obtain an image to be recognized
  • the acquisition of the image to be recognized in this step may be obtained from an image acquisition device, for example, it may be an image directly acquired by the image acquisition device, or an image in a video acquired by the image acquisition device. In addition to acquiring from an image acquisition device, it can also be acquired from a storage device that stores images or videos. The image can be in color or black and white. When the monitoring method is used in a balcony scene, the image to be recognized can be obtained from a camera set on the balcony.
  • the embodiment of the present invention has no limitation on the image acquisition device used to acquire an image, as long as it can acquire an image.
  • Step 32 construct a skeleton diagram of the human body in the image to be recognized
  • the person in the image to be recognized can be one or multiple, that is, a single-person skeleton diagram can be constructed, or a multiple-person skeleton diagram can be constructed.
  • in this way, the posture of the human body can be depicted more accurately, laying a good foundation for subsequent abnormal behavior recognition.
  • the CNN network trained in Example 1 can be used to estimate the multi-person pose.
  • the confidence and affinity field can be obtained through the trained CNN network, and then the bipartite graph matching algorithm incorporating the greedy algorithm is used to analyze the confidence and affinity field, finally obtaining the skeleton diagrams of multiple people.
  • Step 33 Perform behavior recognition on the human skeleton diagram, and trigger an alarm when it is judged to be an abnormal behavior.
  • the abnormal behavior can be, for example, a preset unsafe action, and the unsafe action can be defined by oneself according to the applicable scenario of the monitoring method.
  • unsafe actions may include, but are not limited to, one or more of the following actions: climbing, climbing over, intrusion, falling, etc.
  • the action library can be set up in advance to define abnormal behaviors or real-time recognition of human skeleton diagrams. When the abnormal behavior conditions are met, that is, the characteristics of the abnormal behavior (such as unsafe actions) are met, an alarm is issued.
  • the abnormal behavior monitoring method proposed in the embodiment of the present invention constructs a human skeleton diagram from the acquired image to be recognized, recognizes abnormal actions (such as unsafe actions) on the constructed human skeleton diagram, and triggers an alarm as soon as an abnormal behavior is found. It realizes automatic, intelligent capture of abnormal behaviors with accurate recognition, reduces the misjudgment and missed-detection rates of manual monitoring, and lowers labor costs.
  • the above abnormal behavior monitoring method can be applied to a server or a client terminal that performs abnormal behavior identification and monitoring.
  • the embodiments of the present invention can be applied to various security monitoring scenarios.
  • it can be applied to workplaces such as factories and office buildings, and it can also be applied to home scenes.
  • the CNN network used in the prediction is the aforementioned single-stage dual-branch network and adopts a multi-task coexistence mechanism, its prediction speed is very fast, and the prediction result can be obtained in a few seconds, which is suitable for occasions that require fast response.
  • the monitoring of abnormal behavior on the balcony is taken as an example.
  • climbing behavior and climbing-over behavior judge the same kind of climbing action from two angles. For example, when a person's feet exceed a certain height (such as 0.3 meters), it is considered that climbing behavior exists, and an alarm is triggered. Climbing-over behavior can be defined as a person's head appearing at a place higher than a normal person's height, such as 2 meters; an alarm is triggered when a climbing-over behavior is considered to occur. In a sense, these two behaviors may or may not overlap. For example, if a child climbs to a certain height that is above 0.3 meters but below 2 meters, the climbing alarm will be triggered, but the climbing-over alarm will not.
  • the setting rule for this action can be to designate, from the outdoor direction of the balcony, the area from a certain height (for example 0.3 meters; the user can set this height) up to the ceiling as the warning area. If the limb type judged to be in this area is a leg (or legs and feet are present), it is judged as a climbing action. This type of alarm usually does not cause false positives.
  • the setting of this action can, for example, designate the area of the balcony above the height of a normal person (for example 2 meters; the height can be set by the user) up to the roof as a warning area. If a key point of a person's head or a facial skeleton map is detected in the warning area, the client's warning is triggered.
  • the climbing event is a comprehensive recognition of bone features and human posture, and this type of action alarm is usually correct.
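The climbing-high rule above reduces to a simple geometric check on the head keypoint. A minimal sketch, assuming keypoints have already been mapped to metric heights above the floor; the keypoint index, function name, and thresholds are illustrative defaults, not taken from the patent:

```python
# Illustrative sketch (not the patented implementation): trigger a
# "climbing high" alert when a detected head keypoint lies inside the
# warning zone between a configurable lower bound and the ceiling.

HEAD = 0  # index of the head keypoint in a 14-point skeleton

def climbing_high_alert(keypoints, zone_bottom_m=2.0, ceiling_m=3.0):
    """keypoints: dict mapping keypoint index -> (x_m, height_m) or None.

    Returns True when a detected head lies in the warning zone.
    """
    head = keypoints.get(HEAD)
    if head is None:          # head occluded or not detected: no alert
        return False
    _, height = head
    return zone_bottom_m <= height <= ceiling_m
```

A subset of the same check, with a lower threshold applied to foot keypoints, would express the climbing rule.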
  • the monitoring time period (or the arming time period) can be set according to the needs. For example, if someone breaks into the balcony at night during bedtime, an alarm can be triggered (see Figure 7c).
  • An event in which a person is detected in the monitored picture can be defined as an intrusion event.
  • when setting the rule, an effective monitoring area can be configured; for example, the entire balcony area can be used as the default monitoring area.
  • an arming time period can also be set; when someone enters the effective monitoring area during this period, an alarm is triggered.
  • This type of alarm is a pure skeleton-recognition action, and there is usually no misjudgment.
  • When the CNN network obtained through the training method of Embodiment 1 is applied to the recognition of abnormal behaviors, especially behaviors that affect life safety, a difference of a few seconds may lead to different outcomes.
  • using this CNN network, results can be obtained faster, buying as much time as possible.
  • the embodiment of the present invention proposes an abnormal behavior monitoring system based on a CNN network.
  • an abnormal behavior such as an unsafe behavior
  • the client terminal will immediately receive early warning information and pictures.
  • The deployment of the system applied to the balcony scene is shown in Figure 8, including:
  • an image acquisition device, configured to acquire the image to be recognized;
  • a server side, configured to obtain the image to be recognized sent by the image acquisition device, use the CNN network to obtain the skeleton map of the human body in the image, perform behavior recognition on the skeleton map, and, when abnormal behavior is determined, send an alarm signal to the client; and
  • a client, configured to receive the alarm signal sent by the server and trigger an alarm according to it; if the alarm signal includes an early-warning image, the image is displayed in real time.
  • with the abnormal behavior monitoring system of the embodiment of the present invention, a human skeleton map is constructed from the acquired image to be recognized, abnormal behavior recognition is performed on the constructed map, and an alarm is triggered as soon as an abnormal behavior is found. Abnormal behaviors are captured automatically and intelligently with accurate recognition, avoiding the misjudgment and missed-judgment rates of manual monitoring and reducing labor costs.
  • the monitoring system is a balcony security system
  • cameras can be installed on the balcony of multiple users, and these cameras can collect real-time video of the balcony.
  • the server can receive real-time video sent by multiple users' balcony cameras and perform real-time analysis.
  • the server can be set in the cloud, and when the cloud server determines that there is an abnormal behavior, it sends an alarm signal to the corresponding client.
  • the client can be implemented by downloading the corresponding application program (APP) through the user's handheld terminal.
  • the client can allow the user to set one or more of the following: abnormal behaviors to be monitored (for example, one or more of: climbing high, climbing, intrusion, and falling), the warning area, the monitoring area, the monitoring time period, and the monitoring sensitivity.
  • the main advantage of the abnormal behavior monitoring system described in the embodiment of the present invention is that it can quickly and actively defend and warn in advance. Set up various abnormal behaviors required by users in advance through the client, and alert users to various abnormal behaviors identified by the system. Based on cloud computing and behavior recognition and analysis capabilities, the dilemma of relying on manpower to find abnormal problems is solved.
  • the system can also send on-site photos of various emergencies to the user client, which is convenient for users to deal with and solve problems in the venue in a timely manner.
  • the system of this embodiment is not only applicable to large-scale public places, but also applicable to home security intelligent monitoring.
  • the intelligent behavior recognition in the embodiments of the present invention is based on real-time multi-person human pose recognition. Given an RGB picture, the location information of all keypoints can be obtained, and at the same time each keypoint can be attributed to a particular person in the picture, i.e., the connection information between keypoints.
  • Traditional multi-person human pose estimation algorithms generally use a top-down method. The first major flaw of this method is its reliance on person detection, and the second is that the algorithm's speed is proportional to the number of people in the picture.
  • This system adopts a bottom-up method. It first detects the key points of the human body, then connects these key points by calculating the affinity field, and finally outlines the skeleton diagram of the human body.
  • in addition, the embodiment of the present invention detects each frame of the video in real time.
  • because the trained CNN network can perform multiple tasks at the same time, the system's response to abnormal behavior events is much faster than that of traditional methods.
  • the embodiment of the present invention also provides a computer storage medium storing a computer program; when the computer program is executed, it can implement the deep convolutional neural network training method provided by one or more of the foregoing embodiments, or the method for constructing a human skeleton map based on a deep convolutional neural network, or the method for monitoring abnormal behavior based on a deep convolutional neural network.
  • the computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
  • Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassette, magnetic tape, magnetic disk storage or other magnetic storage device, or Any other medium used to store desired information and that can be accessed by a computer.
  • a computer device may include a processor, a memory, and a computer program stored on the memory and runnable on the processor; when the processor executes the computer program, it implements the deep convolutional neural network training method, the human skeleton map construction method, or the abnormal behavior monitoring method of the embodiments of the present invention.
  • the computer device 40 may include: a processor 410, a memory 420, a bus system 430, and a transceiver 440, wherein the processor 410, the memory 420, and the transceiver 440 are connected through the bus
  • system 430; the memory 420 is configured to store instructions, and the processor 410 is configured to execute the instructions stored in the memory 420 to control the transceiver 440 to send signals.
  • the processor 410 may be a central processing unit ("CPU"); the processor 410 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 420 may include a read-only memory and a random access memory, and provides instructions and data to the processor 410. A part of the memory 420 may also include a non-volatile random access memory.
  • in addition to a data bus, the bus system 430 may also include a power bus, a control bus, a status signal bus, and the like. For clarity of description, the various buses are all labeled as the bus system 430 in Figure 9.
  • the processing performed by the computer device may be completed by an integrated logic circuit of hardware in the processor 410 or instructions in the form of software. That is, the steps of the method disclosed in the embodiments of the present invention may be embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the software module can be located in storage media such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 420, and the processor 410 reads information in the memory 420, and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, it will not be described in detail here.
  • Such software may be distributed on a computer-readable medium, and the computer-readable medium may include a computer storage medium (or a non-transitory medium) and a communication medium (or a transitory medium).
  • the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
  • Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

A training method for a deep convolutional neural network, a human skeleton map construction method, an abnormal behavior monitoring method, and a monitoring system. The deep convolutional neural network is a single-stage, dual-branch convolutional neural network comprising a first branch for predicting confidence and a second branch for predicting part affinity fields. The training method includes: inputting an image to be recognized; performing feature analysis on the image according to preset objects to be recognized, obtaining one or more feature map sets containing the objects to be recognized in the image, each feature map set corresponding to one object; feeding a feature map set into the first branch of the deep convolutional neural network to obtain a confidence prediction result; and feeding the confidence prediction result together with the feature map set into the second branch to obtain an affinity field prediction result.

Description

Neural network training, skeleton map construction, and abnormal behavior monitoring methods and system
Technical Field
The embodiments of the present invention relate to, but are not limited to, the field of computers, and in particular to a deep convolutional neural network training method, a human skeleton map construction method, an abnormal behavior monitoring method, and a monitoring system.
Background Art
A traditional surveillance system requires dedicated on-duty personnel who must watch the monitoring screens at all times; faced with a large number of screens, they cannot see them all. In many cases, therefore, a traditional surveillance system serves mainly as a deterrent and as a source of after-the-fact evidence.
Summary of the Invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The embodiments of the present application provide a convolutional neural network training method, a human skeleton map construction method, an abnormal behavior monitoring method, and an abnormal behavior monitoring system.
In a first aspect, an embodiment of the present application provides a training method for a deep convolutional neural network, the network being a single-stage, dual-branch convolutional neural network comprising a first branch for predicting confidence and a second branch for predicting part affinity fields, the method comprising:
inputting an image to be recognized; performing feature analysis on the image according to preset objects to be recognized, to obtain one or more feature map sets containing the objects to be recognized in the image, each feature map set corresponding to one object; feeding the feature map set into the first branch of the deep convolutional neural network to obtain a confidence prediction result; and feeding the confidence prediction result together with the feature map set into the second branch to obtain a part affinity field prediction result.
In a second aspect, an embodiment of the present application provides a human skeleton map construction method based on a deep convolutional neural network, the network being a single-stage, dual-branch convolutional neural network comprising a first branch for predicting confidence and a second branch for predicting part affinity fields, the method comprising:
inputting an image to be recognized; performing feature analysis on the image according to preset objects to be recognized, to obtain one or more feature map sets containing the objects to be recognized in the image, each feature map set corresponding to one object; feeding the feature map set into the first branch of the deep convolutional neural network to obtain a confidence prediction result; feeding the confidence prediction result together with the feature map set into the second branch to obtain a part affinity field prediction result; and obtaining a human skeleton map from the confidence prediction result and the part affinity field prediction result.
In a third aspect, an embodiment of the present application provides an abnormal behavior monitoring method based on a deep convolutional neural network, the monitoring method comprising:
acquiring an image to be recognized; obtaining a skeleton map of the human body in the image using the aforementioned human skeleton map construction method; and performing behavior recognition on the skeleton map, triggering an alarm when an abnormal behavior is determined.
In a fourth aspect, an embodiment of the present application further provides an abnormal behavior monitoring system based on a deep convolutional neural network, the system comprising:
an image acquisition device configured to acquire an image to be recognized;
a server side configured to obtain the image to be recognized sent by the image acquisition device, obtain a skeleton map of the human body in the image using the aforementioned human skeleton map construction method, perform behavior recognition on the skeleton map, and send an alarm signal to a client when an abnormal behavior is determined; and
a client configured to receive the alarm signal sent by the server side and trigger an alarm according to the alarm signal.
In a fifth aspect, an embodiment of the present application further provides a computer-readable storage medium storing program instructions which, when executed, implement the aforementioned training method for a deep convolutional neural network, the human skeleton map construction method based on a deep convolutional neural network, or the abnormal behavior monitoring method based on a deep convolutional neural network.
In a sixth aspect, an embodiment of the present application further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the program, the processor implements the steps of the aforementioned training method for a deep convolutional neural network, the human skeleton map construction method based on a deep convolutional neural network, or the abnormal behavior monitoring method based on a deep convolutional neural network.
Other features and advantages of the present application will be set forth in the description that follows and will in part become apparent from the description or be learned by practicing the application. Other advantages of the application can be realized and obtained through the solutions described in the specification, claims, and drawings.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief Description of the Drawings
The drawings are provided for an understanding of the technical solution of the present application, constitute a part of the specification, and together with the embodiments serve to explain, without limiting, the technical solution of the application.
Fig. 1 is a schematic diagram of the 14-point skeleton annotation method according to an embodiment of the present invention;
Fig. 2 is a flowchart of the method of Embodiment 1 of the present invention;
Fig. 3 is a flowchart of obtaining a skeleton map through the single-stage, dual-branch CNN network according to an embodiment of the present invention;
Fig. 4 is a flowchart of multi-person skeleton map extraction according to an embodiment of the present invention;
Figs. 5a-c are schematic diagrams of the process of connecting keypoints into a skeleton map according to an embodiment of the present invention;
Fig. 6 is a flowchart of the abnormal behavior monitoring method according to an embodiment of the present invention;
Figs. 7a-d are schematic diagrams of abnormal behaviors on a balcony according to an embodiment of the present invention;
Fig. 8 is a deployment diagram of the monitoring system applied to a balcony scene according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
This application describes a number of embodiments, but the description is illustrative rather than restrictive. Although many possible feature combinations are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are also possible. Unless specifically restricted, any feature or element of any embodiment may be used in combination with, or may replace, any other feature or element of any other embodiment.
This application includes and contemplates combinations with features and elements known to those of ordinary skill in the art. The embodiments, features, and elements already disclosed in this application may also be combined with any conventional feature or element to form a unique inventive solution defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive solutions to form another unique inventive solution defined by the claims. Accordingly, it should be understood that any of the features shown and/or discussed in this application may be implemented individually or in any suitable combination. The embodiments are therefore not restricted other than by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of protection of the appended claims.
Furthermore, in describing representative embodiments, the specification may have presented methods and/or processes as a particular sequence of steps. However, to the extent that a method or process does not depend on the particular order of steps described herein, it should not be limited to that particular sequence. As those of ordinary skill in the art will appreciate, other step sequences are possible. Therefore, the particular order of steps set forth in the specification should not be construed as limiting the claims. In addition, claims directed to the method and/or process should not be limited to performing their steps in the order written; those skilled in the art will readily appreciate that the sequences may be varied while remaining within the spirit and scope of the embodiments of this application.
The terms "first", "second", and the like herein and in the above drawings are used to distinguish similar objects and do not describe a particular order or sequence.
To avoid the drawback of traditional surveillance systems, which require manual monitoring and are prone to missed detections, the applicant proposes a method of monitoring abnormal behavior using a deep convolutional neural network. To enable this network to recognize human posture, the applicant also provides a method for training the deep convolutional neural network and a method for constructing a human skeleton map, each described below.
Embodiment 1
This embodiment describes how to train a deep convolutional neural network (herein referred to as a CNN network) for recognizing human posture. The CNN network of this embodiment obtains a human keypoint skeleton map from an image, so as to perform posture recognition on one or more persons present in the image.
A human keypoint skeleton map consists of a set of coordinate points whose connections describe a person's posture. Each coordinate point in the skeleton map is called a keypoint (or part, or joint), and a valid connection between two keypoints is called a limb (or pair).
The human keypoint recognition in this embodiment includes one or more of the following: face keypoint recognition, body keypoint recognition, foot keypoint recognition, and hand keypoint recognition. Face keypoint recognition targets the face; the number of keypoints ranges from 6 to 130 depending on the design accuracy and the database used. Body keypoint recognition targets the torso as a whole; a complete body keypoint skeleton map, as shown in Fig. 1, includes: head (0), neck (1), right shoulder (2), right elbow (3), right wrist (4), left shoulder (5), left elbow (6), left wrist (7), right hip (8), right knee (9), right ankle (10), left hip (11), left knee (12), and left ankle (13). Hand keypoint recognition targets the hand and may include 21 hand keypoints. Foot keypoint recognition targets the foot, with the number of keypoints determined as needed. Recognition that includes face, body, foot, and hand keypoint recognition is full-body keypoint recognition, whose recognition objects include the face, body, feet, and hands. Depending on the application scenario, only some of these may be trained and recognized; for example, for abnormal behavior recognition, only body keypoints may be recognized, or body and face keypoints, or body, face, and hand keypoints, or full-body keypoints. This embodiment is described taking full-body keypoint recognition as an example.
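The 14-point body annotation above can be written down directly as data. A small sketch; the limb (pair) list is an illustrative choice of adjacent joints, since the text does not enumerate the exact pair set:

```python
# The 14-point body skeleton of Fig. 1, expressed as indices and names.
KEYPOINTS = [
    "head", "neck",
    "r_shoulder", "r_elbow", "r_wrist",
    "l_shoulder", "l_elbow", "l_wrist",
    "r_hip", "r_knee", "r_ankle",
    "l_hip", "l_knee", "l_ankle",
]

# Illustrative limb pairs between adjacent keypoints (an assumption,
# not the patent's exact pair set).
LIMBS = [
    (0, 1),                       # head - neck
    (1, 2), (2, 3), (3, 4),       # right arm
    (1, 5), (5, 6), (6, 7),       # left arm
    (1, 8), (8, 9), (9, 10),      # right leg
    (1, 11), (11, 12), (12, 13),  # left leg
]
```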
The CNN network training method of this embodiment, as shown in Fig. 2, includes the following steps 10-13.
Step 10: input an image to be recognized.
The image to be recognized may be obtained from an image acquisition device, for example an image directly captured by the device or an image from a video captured by the device. It may also be obtained from a storage device storing images or videos. The embodiments of the present invention place no restriction on the image acquisition device used, as long as it can capture images. The image may be in color, and may contain one or more persons.
Step 11: perform feature analysis on the image to be recognized according to preset objects to be recognized, obtaining one or more feature map sets containing the objects to be recognized in the image.
Taking full-body keypoint recognition as an example, the objects to be recognized include the face, body, feet, and hands, and all faces, bodies, feet, and hands are obtained from the image to be recognized. This process may also be called the pre-training process.
For example, the first 10 layers of VGG-19 may be used to perform feature analysis on the input image (e.g., initialization and fine-tuning) to generate one or more feature map sets F, each set corresponding to one object to be recognized. A feature map set contains one or more feature maps. For example, feature analysis of the image may yield four feature map sets: a face set, a body set, a foot set, and a hand set, where each set contains the feature maps of all corresponding objects in the image; for instance, the face set contains all face feature maps in the image and the hand set all hand feature maps. Using the first 10 layers of VGG-19 is only an example; in other embodiments a different number of layers may be used, and a different network may be used to extract the feature information and obtain the feature map sets F.
In an exemplary embodiment, before extracting feature maps for a body part such as the face, feet, or hands, the resolution of the image to be recognized may be increased as needed, so that among the multiple feature map sets obtained, at least two sets have different resolutions. For example, feature analysis of the body may yield feature maps at 128*128 ppi (pixels per inch), but for the hands this resolution would make local recognition too coarse; the original image may therefore first be enlarged to, e.g., 960*960 ppi before extracting the hand feature maps, ensuring local recognition accuracy. The feature map resolution may differ for each object to be recognized.
Step 12: input the feature map set F into the first branch, which predicts confidence, to obtain a confidence prediction result.
This embodiment uses a single-stage, dual-branch CNN network to obtain the human skeleton map, as shown in Fig. 3. The first branch predicts part confidence maps and the second branch predicts part affinity fields (PAFs); the confidence maps predict keypoint positions, and the affinity fields represent the degree of association between keypoints.
Specifically, a feature map set F is input into the first branch, and a preset confidence loss function constrains the training accuracy of this branch. When the training accuracy satisfies the preset confidence loss function threshold, the confidence C = ω(F) is obtained, where ω() corresponds to the network parameters of the first branch.
In this embodiment, prediction training is performed simultaneously on the feature map sets of all objects to be recognized, i.e., multiple tasks coexist, so that in actual use the network can predict the full-body skeleton map at once, increasing prediction speed. Multi-task training and prediction also means that occlusion of part of the body does not affect the prediction for other parts; for example, when a person's body is occluded, keypoint recognition of that person's face and hands is unaffected. When recognizing the skeleton maps of multiple people, this greatly reduces algorithmic complexity, increases computation speed, and reduces computation time.
The confidence loss function f_C can be computed as

f_C = \sum_{j=1}^{J} \sum_{p} R(p)\,\lVert C_j(p) - C_j^{*}(p) \rVert_2^2

where f_C is the confidence loss function; j denotes a keypoint, j ∈ {1, …, J}, with J the total number of keypoints; C_j(p) is the predicted confidence of keypoint j at coordinate position p in the image; C_j^{*}(p) is the true confidence of keypoint j at position p, i.e., the person's actual joint; and R() is a binary (0-or-1) function: R(p) = 0 when position p in the image is not annotated as a keypoint, and R(p) = 1 when it is. The function R avoids penalizing true positive predictions during training.
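The confidence-map loss above can be rendered in a few lines of NumPy. A hedged sketch, not the patented implementation; the (J, H, W) array layout is an assumption for illustration:

```python
import numpy as np

# f_C = sum_j sum_p R(p) * ||C_j(p) - C_j*(p)||^2, where the binary
# mask R zeroes out unannotated locations so that true positive
# predictions are not penalised during training.
def confidence_loss(C_pred, C_true, R):
    """C_pred, C_true: (J, H, W) confidence maps; R: (H, W) 0/1 mask."""
    diff = C_pred - C_true            # per-keypoint residual maps
    return float(np.sum(R[None] * diff ** 2))
```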
Step 13: input the confidence prediction result and the feature map set into the second branch, which predicts affinity fields, to obtain an affinity field prediction result.
This embodiment uses full-body keypoint recognition; the confidence prediction result is a concatenated set comprising four subsets: a face keypoint subset, a body keypoint subset, a foot keypoint subset, and a hand keypoint subset (in any order). In other embodiments, the number of subsets in the concatenated set may differ depending on the recognition objects, which is not repeated here. Each subset has keypoints that coincide with those of one or more other subsets, so that a complete full-body skeleton map can subsequently be assembled. For example, at least one keypoint in the face subset coincides in coordinates with at least one keypoint in the body subset; at least one keypoint in the body subset coincides with at least one in the foot subset (e.g., the left ankle keypoint coincides with a keypoint of the left foot subset, and the right ankle keypoint with one of the right foot subset); and at least one keypoint in the body subset coincides with at least one in the hand subset (e.g., the left wrist keypoint coincides with a keypoint of the left hand subset, and the right wrist keypoint with one of the right hand subset). Each subset computes its affinity fields as one unit.
Specifically, the feature map set F and the confidence prediction result are input into the second branch, with a corresponding preset affinity field loss function controlling the training accuracy. When the training accuracy satisfies the preset affinity field loss function threshold, the affinity field Y = θ(F) is obtained, where θ() corresponds to the network parameters of the second branch.
Because multiple tasks coexist, when the resolution of a body part's feature maps is increased, the number of convolution blocks in the second branch may be increased to maintain detection accuracy, e.g., 10 convolution blocks may be set in the second branch, with the number increased or decreased according to the computation speed. The number of convolution blocks in the second branch may be greater than in the first branch.
In an exemplary embodiment, to improve overall accuracy, the width of one or more convolution blocks in the second branch may also be increased; the widths of the blocks may be the same or different. For example, with x sequentially arranged convolution blocks, the width of each of the last h blocks may be set greater than that of the preceding x-h blocks, where x and h are positive integers greater than 1 and h < x. For instance, if the preceding blocks have width 3*3, the last block's width may be set to 7*7, 9*9, or 12*12. The convolution block widths of the first and second branches may differ.
In an exemplary embodiment, after increasing both the number and the width of the convolution blocks, the total number of network layers in the second branch may be reduced to 10-15 to preserve prediction speed.
The affinity field loss function f_Y can be computed as

f_Y = \sum_{i=1}^{I} \sum_{p} R(p)\,\lVert Y_i(p) - Y_i^{*}(p) \rVert_2^2

where f_Y is the affinity field loss function; i denotes an affinity field, i ∈ {1, …, I}, with I the total number of affinity fields; Y_i(p) is the predicted value of the i-th affinity field at coordinate position p in the image; Y_i^{*}(p) is the true value of the i-th affinity field at position p, i.e., the actual relationship between keypoints; and R() is the same binary (0-or-1) function as above: R(p) = 0 when p is not annotated as a keypoint and R(p) = 1 when it is, avoiding penalization of true positive predictions during training.
In an exemplary embodiment, after obtaining the confidence loss in step 12 and the affinity field loss in step 13, a total objective loss function may also be computed and checked against an objective loss function threshold, further measuring the overall accuracy of the network's predictions. When the confidence loss satisfies the preset confidence loss function threshold, the affinity field loss satisfies the preset part affinity field loss function threshold, and the total objective loss satisfies the preset objective loss function threshold, training of the deep convolutional neural network for predicting confidence and affinity fields is complete. In other embodiments, if no objective loss function threshold is set, training is complete when the confidence loss satisfies the preset confidence loss function threshold and the affinity field loss satisfies the preset part affinity field loss function threshold.
Steps 11-13 above yield a CNN network for predicting confidence and affinity fields. Since this CNN network is the aforementioned single-stage, dual-branch network with a coexisting multi-task mechanism, it can recognize multiple objects simultaneously, with fast computation and low complexity; prediction results are available within seconds, making it suitable for applications requiring a fast response.
In an exemplary embodiment, after the CNN network has been trained, human skeleton map construction can be implemented on top of it. The human skeleton map construction method includes the following steps 20-24.
Steps 20-21 are the same as steps 10-11.
Step 22 is similar to step 12, except that the first-branch network parameters ω() were already determined during training and no confidence loss needs to be computed; inputting the original feature map set into the first branch directly yields the confidence prediction result.
Step 23 is similar to step 13, except that the second-branch network parameters θ() were already determined during training and no affinity field loss needs to be computed; inputting the original feature map set and the confidence prediction result into the second branch directly yields the affinity field prediction result.
Step 24: obtain the human skeleton map from the confidence prediction result and the affinity field prediction result.
The affinity field method detects the association between keypoints and preserves position and rotation information over the entire limb region. An affinity field is a two-dimensional vector field for each limb: for every pixel belonging to a particular limb region, a 2D vector encodes the direction from one keypoint of the limb toward the other. In an exemplary embodiment, during training and testing, the quality of a connection can be evaluated by computing the line integral of the corresponding affinity field; for a pair of candidate keypoint positions, the integral value measures the reliability of the line segment between the two points.
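The line-integral evaluation just described can be approximated by sampling the field along the candidate segment and accumulating the dot product with the segment's unit direction. A hedged sketch; the (H, W, 2) array layout and sampling count are assumptions:

```python
import numpy as np

def limb_score(paf, p1, p2, num_samples=10):
    """Discrete approximation of the PAF line integral between two
    candidate keypoints. paf: (H, W, 2) vector field; p1, p2: (x, y)
    pixel coordinates. Higher scores mean a more reliable connection."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    direction = p2 - p1
    norm = np.linalg.norm(direction)
    if norm == 0:
        return 0.0
    unit = direction / norm
    score = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        # sample the field at evenly spaced points along the segment
        x, y = (p1 + t * direction).round().astype(int)
        score += float(np.dot(paf[y, x], unit))
    return score / num_samples
```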
Assuming there are a true confidence results, the CNN network may produce a+b confidence predictions; combined with the affinity fields, a of the a+b predictions are selected and connected to form the full-body skeleton map.
When computing with the affinity fields, a bipartite matching algorithm may be used. In this embodiment, to increase computation speed and reduce complexity, a greedy algorithm is introduced into the bipartite matching algorithm to obtain the human skeleton map.
In the embodiments of the present invention, both the first branch and the second branch achieve good prediction results with only a single stage; multi-stage prediction is unnecessary.
Since each subset computes its affinity fields as one unit, the bipartite matching with the greedy algorithm in step 24 above is illustrated below using the computation of the body's affinity fields as an example. The process of computing the human skeleton map, shown in Fig. 4, includes the following steps 241-242.
Step 241: determine keypoint positions from the confidence prediction result, compute the connection of one limb from these keypoints using bipartite matching, and obtain the limb connection of each limb (each limb type) independently, until the connections of every limb type are obtained.
The detection candidate set of all body parts in the picture, i.e., the aforementioned concatenated set, is obtained. Only connections between adjacent nodes are considered, and only one limb connection is considered at a time. That is, for the two keypoints connecting a limb l, each keypoint has a subset, giving two subsets m and n; the keypoints in m and n are matched pairwise, the affinity of each pair is computed, and the pair with the strongest affinity is connected, yielding the limb connection between the two keypoints. Bipartite matching increases computation speed; in other embodiments, other algorithms may also be used.
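The pairwise matching step above, with the greedy selection this embodiment introduces, can be sketched as follows. A minimal illustration under assumed data shapes, not the patented implementation:

```python
def greedy_match(pair_scores):
    """Greedy bipartite matching for one limb type.

    pair_scores: dict {(m_idx, n_idx): affinity score} over candidate
    keypoints from subsets m and n. Accepts the strongest-scoring pairs
    first, using each keypoint at most once, and returns the accepted
    (m_idx, n_idx) connections."""
    connections, used_m, used_n = [], set(), set()
    for (m, n), s in sorted(pair_scores.items(),
                            key=lambda kv: kv[1], reverse=True):
        if m not in used_m and n not in used_n:
            connections.append((m, n))
            used_m.add(m)
            used_n.add(n)
    return connections
```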
Fig. 5a shows the body keypoints obtained after the first branch; Fig. 5b shows the computed connection from keypoint 1 to keypoint 2.
Step 242: connect all the body's keypoints. All possible limb predictions obtained are assembled into the skeleton map, in this example the body skeleton map shown in Fig. 5c, by sharing keypoints at identical positions.
For each object to be recognized (each body part), the above method yields the skeleton map of that object; the partial skeleton maps are then combined into a full-body skeleton map according to the coinciding keypoint coordinates (i.e., shared keypoints at identical positions).
If the resolution of a body part's feature maps was increased when input to the CNN network, the picture sizes must be unified before assembly.
After the single-stage, dual-branch CNN network is trained by the above method, in actual use the image to be recognized is input into the CNN network trained by the foregoing embodiment, which then computes and outputs the skeleton maps of everyone in the image. This skeleton map construction method has low complexity and fast computation.
Embodiment 2
The CNN network trained by the method of Embodiment 1 above can be applied to monitoring abnormal behavior. Fig. 6 is a flowchart of the abnormal behavior monitoring method of an embodiment of the present invention, which includes the following steps 31-33.
Step 31: acquire an image to be recognized.
The image to be recognized in this step may be obtained from an image acquisition device, for example an image directly captured by the device or an image from a video captured by the device. It may also be obtained from a storage device storing images or videos. The image may be in color or black and white. When the monitoring method is used in a balcony scene, the image to be recognized may be obtained from a camera installed on the balcony.
The embodiments of the present invention place no restriction on the image acquisition device used, as long as it can capture images.
Step 32: construct the skeleton map of the human body in the image to be recognized.
The image may contain one or more persons, i.e., skeleton maps may be constructed for a single person or for multiple people; a skeleton map depicts the human posture fairly accurately and lays a good foundation for subsequent abnormal behavior recognition. Specifically, the CNN network trained in Embodiment 1 can be used for multi-person pose estimation: the trained CNN network first yields the confidence maps and affinity fields, which are then parsed with the bipartite matching algorithm incorporating the greedy algorithm, finally producing the skeleton maps of multiple people.
Step 33: perform behavior recognition on the human skeleton map, and trigger an alarm when an abnormal behavior is determined.
An abnormal behavior may be, for example, a preset unsafe action, which can be defined according to the scene in which the monitoring method is applied. For example, when the monitoring method is used in a balcony scene, unsafe actions may include, but are not limited to, one or more of: climbing, climbing high, intrusion, falling, etc. An action library may be preset to define abnormal behaviors, or to recognize the human skeleton map in real time. When the conditions of an abnormal behavior are met, i.e., the features of an abnormal behavior (e.g., an unsafe action) are matched, an alarm is raised.
With the abnormal behavior monitoring method proposed by the embodiments of the present invention, a human skeleton map is constructed from the acquired image to be recognized, abnormal actions (such as unsafe actions) are recognized on the constructed skeleton map, and an alarm is triggered as soon as an abnormal behavior is found. Abnormal behaviors are captured automatically and intelligently with accurate recognition, avoiding the misjudgment and missed-judgment rates of manual monitoring while reducing labor costs. The above monitoring method may be applied to a server or a client (also called a user terminal) that performs abnormal behavior recognition and monitoring.
The embodiments of the present invention are applicable to various security monitoring scenarios; for each scenario, only the corresponding abnormal behavior action library needs to be configured. For example, they can be applied to workplaces such as factories and office buildings, as well as to home scenes. Since the CNN network used for prediction is the aforementioned single-stage, dual-branch network with a coexisting multi-task mechanism, prediction is very fast and results are available within seconds, making it suitable for applications requiring a fast response.
In an exemplary embodiment, monitoring of abnormal behavior on a balcony is taken as an example. To judge whether an action is safe, unsafe actions must first be defined.
Here, this embodiment defines four types of unsafe actions: climbing (Fig. 7a), climbing high (Fig. 7b), intrusion (Fig. 7c), and falling (Fig. 7d).
Climbing behavior and climbing-high behavior judge the same kind of ascending action from two angles. For example, when a person's feet exceed a certain height (e.g., 0.3 meters), a climbing behavior is considered to have occurred and an alarm is triggered. A climbing-high behavior may be defined as a person's head appearing above normal human height, e.g., 2 meters, which also triggers an alarm. In a sense the two behaviors may or may not overlap. For example, if a child climbs above 0.3 meters but below 2 meters, the climbing behavior is triggered but not the climbing-high behavior. If an adult climbs to a certain height and the camera, due to occlusion by clothing etc., does not detect the person's feet but detects the head above 2 meters, the climbing-high event is triggered but not the climbing event. If during climbing the feet happen to be above 0.3 meters and the head above 2 meters, both events are triggered, causing an alarm. The four behaviors are introduced below.
(a) Climbing behavior
In the balcony monitoring picture, if someone climbs a railing, window, etc., a climbing event warning can be popped up on the client (e.g., a mobile phone). Fig. 7a illustrates climbing.
In an example, a climbing action may be defined as both feet off the ground with the body in an upward-climbing posture. The rule may designate as the warning area the region on the outdoor side of the balcony from a certain height above the floor (e.g., 0.3 meters; the user can set this height) up to the ceiling; if a limb of type leg (or legs and feet) appears in this area, a climbing action is determined. This type of alarm usually produces no false positives.
(b) Climbing-high behavior
In the balcony monitoring picture, if someone appears at a height above that of a normal person, a climbing-high warning can be popped up on the client; Fig. 7b illustrates climbing high.
Within the height range set by the system, the appearance of a human head is defined as climbing high. The rule may designate as the warning area the region of the balcony from above normal human height (e.g., 2 meters; the user can set this height) up to the roof. If a head keypoint or a facial skeleton map is detected in the warning area, a client warning is triggered. The climbing-high event is a combined recognition of skeletal features and human posture; this type of action alarm is usually free of false positives.
(c) Intrusion behavior
If someone is found intruding into the monitoring picture, an intrusion warning can be popped up on the client. A monitoring time period (or arming period) can be set as needed; for example, if someone enters the balcony during sleeping hours at night, an alarm can be triggered (see Fig. 7c).
An event in which a person is detected in the picture can be defined as an intrusion event. When setting the rule, an effective monitoring area can be configured (e.g., the entire balcony area by default), together with an arming period; an alarm is triggered when someone enters the effective monitoring area during this period. This type of alarm is a skeleton-recognition action and usually produces no misjudgment.
(d) Falling behavior
After the camera's fall recognition is enabled, if someone is found to suddenly faint or fall in the monitoring picture, a fall warning screen can be popped up on the client.
From the skeleton map's perspective, a fall is defined as the person's head, hips, and feet all lying in the same plane parallel to the ground. When setting this rule, no warning area or arming period is required; monitoring can be carried out over the whole area at all times. The user can adjust the sensitivity: the lower the sensitivity, the stricter the recognition rule and the fewer false alarms; the higher the sensitivity, the looser the rule and the more false alarms. A fall time threshold can also be set: for example, if a person falls to the ground and gets up within the threshold (e.g., 2 minutes; the user can set this), no alarm is raised; if the person has still not gotten up beyond the threshold (e.g., 2 minutes; user-settable), an alarm is raised.
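The fall rule and time threshold above reduce to two small checks. A hedged sketch; the tolerance, the 2-minute default, and the function names are illustrative assumptions built on the user-settable values mentioned in the text:

```python
def is_fallen(head_h, hip_h, foot_h, tol_m=0.3):
    """Fall rule: head, hip and foot heights (metres above the floor)
    lie in approximately the same ground-parallel plane."""
    return max(head_h, hip_h, foot_h) - min(head_h, hip_h, foot_h) <= tol_m

def should_alarm(fall_seconds, threshold_s=120.0):
    """Alarm only if the person stays down past the fall time
    threshold (default 2 minutes, user-adjustable)."""
    return fall_seconds > threshold_s
```

Lowering `tol_m` corresponds to lowering the sensitivity: the rule becomes stricter and false alarms decrease.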
When the CNN network obtained by the training method of Embodiment 1 is applied to abnormal behavior recognition, especially to recognizing abnormal behaviors that affect life safety, a difference of a few seconds may lead to different outcomes; using this CNN network, results can be obtained quickly, gaining as much time as possible.
Embodiment 3
An embodiment of the present invention proposes a CNN-based abnormal behavior monitoring system. When abnormal behavior (e.g., unsafe behavior) appears in the monitored area, the client immediately receives warning information and pictures. The deployment of the system in a balcony scene, shown in Fig. 8, includes:
an image acquisition device configured to acquire an image to be recognized;
a server side configured to obtain the image to be recognized sent by the image acquisition device, obtain the skeleton map of the human body in the image using the CNN network, perform behavior recognition on the skeleton map, and send an alarm signal to the client when abnormal behavior is determined; and
a client configured to receive the alarm signal sent by the server side and trigger an alarm according to the alarm signal; if the alarm signal contains a warning image, the warning image is displayed in real time.
With the abnormal behavior monitoring system of the embodiments of the present invention, a human skeleton map is constructed from the acquired image to be recognized, abnormal behavior recognition is performed on the constructed map, and an alarm is triggered as soon as an abnormal behavior is found. Abnormal behaviors are captured automatically and intelligently with accurate recognition, avoiding the misjudgment and missed-judgment rates of manual monitoring while reducing labor costs.
For example, when the monitoring system is a balcony security system, cameras can be installed on the balconies of multiple users to capture real-time balcony video. The server side can receive and analyze in real time the videos sent by the balcony cameras of multiple users; the server may be located in the cloud, and when the cloud server determines that abnormal behavior exists, it sends an alarm signal to the corresponding client. The client can be implemented as an application (APP) downloaded to the user's handheld terminal, and can let the user configure one or more of the following: abnormal behaviors to monitor (e.g., one or more of climbing high, climbing, intrusion, and falling), the warning area, the monitoring area, the monitoring time period, and the monitoring sensitivity.
The main advantage of the abnormal behavior monitoring system of the embodiments of the present invention is fast, proactive defense and advance warning. The various abnormal behaviors the user requires are configured in advance through the client, and the user is alerted to every abnormal behavior the system identifies. Relying on cloud computing and behavior recognition and analysis capability, it resolves the predicament of depending on manpower to discover abnormal problems. The system can also send on-site photos of various emergencies to the user's client, making it convenient for users to handle and resolve on-site problems in time. The system of this embodiment is applicable not only to large public places but also to intelligent home security monitoring.
The intelligent behavior recognition of the embodiments of the present invention is based on real-time multi-person human pose recognition: given an RGB picture, the keypoint positions of all persons can be obtained, and each keypoint can be attributed to a particular person in the picture, i.e., the connection information between keypoints. Traditional multi-person pose estimation algorithms generally adopt a top-down approach, whose first major flaw is dependence on person detection and whose second is that the algorithm's speed is proportional to the number of people in the picture. This system adopts a bottom-up approach: it first detects the human keypoints, then connects these keypoints by computing the affinity fields, and finally outlines the human skeleton maps. Moreover, compared with traditional analysis methods, the embodiments of the present invention detect every frame of the video in real time, and because the trained CNN network can perform multiple tasks simultaneously, the system's response to abnormal behavior events is much faster than that of traditional methods.
An embodiment of the present invention further provides a computer storage medium storing a computer program; when executed, the program can implement the deep convolutional neural network training method, the human skeleton map construction method based on a deep convolutional neural network, or the abnormal behavior monitoring method based on a deep convolutional neural network provided by one or more of the foregoing embodiments. The computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
In an exemplary embodiment of the present application, a computer device is also provided. The device may include a processor, a memory, and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements the operations of the deep convolutional neural network training method, the human skeleton map construction method, or the abnormal behavior monitoring method of the embodiments of the present invention.
As shown in Fig. 9, in one example the computer device 40 may include a processor 410, a memory 420, a bus system 430, and a transceiver 440, where the processor 410, the memory 420, and the transceiver 440 are connected through the bus system 430; the memory 420 is configured to store instructions, and the processor 410 is configured to execute the instructions stored in the memory 420 to control the transceiver 440 to send signals.
It should be understood that the processor 410 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 420 may include read-only memory and random access memory, and provides instructions and data to the processor 410. A part of the memory 420 may also include non-volatile random access memory.
In addition to a data bus, the bus system 430 may include a power bus, a control bus, a status signal bus, etc.; for clarity of description, the various buses are all labeled as the bus system 430 in Fig. 9.
In implementation, the processing performed by the computer device may be completed by integrated logic circuits of hardware in the processor 410 or by instructions in the form of software. That is, the steps of the methods disclosed in the embodiments of the present invention may be embodied as being executed and completed by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 420; the processor 410 reads the information in the memory 420 and completes the steps of the above methods in combination with its hardware. To avoid repetition, details are not described here.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components. Some or all components may be implemented as software executed by a processor such as a digital signal processor or microprocessor, as hardware, or as integrated circuits such as application-specific integrated circuits. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Those of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the embodiments of the present application without departing from the spirit and scope of the technical solutions of the application, and all such modifications shall fall within the scope of the claims of the present application.

Claims (14)

  1. A training method for a deep convolutional neural network, the deep convolutional neural network being a single-stage, dual-branch convolutional neural network comprising a first branch for predicting confidence and a second branch for predicting part affinity fields, the method comprising:
    inputting an image to be recognized; performing feature analysis on the image to be recognized according to preset objects to be recognized, to obtain one or more feature map sets containing the objects to be recognized in the image, each feature map set corresponding to one object to be recognized; inputting the feature map set into the first branch of the deep convolutional neural network to obtain a confidence prediction result; and inputting the confidence prediction result and the feature map set into the second branch of the deep convolutional neural network to obtain a part affinity field prediction result.
  2. The training method according to claim 1, further comprising:
    after the first branch obtains the confidence prediction result, computing a confidence loss function and judging whether a preset confidence loss function threshold is satisfied;
    after the second branch obtains the part affinity field prediction result, computing a part affinity field loss function and judging whether a preset part affinity field loss function threshold is satisfied;
    wherein training of the deep convolutional neural network is complete when the preset confidence loss function threshold and the preset part affinity field loss function threshold are both satisfied.
  3. The training method according to claim 2, further comprising:
    computing the sum of the confidence loss function and the part affinity field loss function and judging whether a preset objective loss function threshold is satisfied;
    wherein training of the deep convolutional neural network is complete when the preset confidence loss function threshold, the preset part affinity field loss function threshold, and the preset objective loss function threshold are all satisfied.
  4. The training method according to claim 1, wherein before the feature analysis is performed on the image to be recognized, the method further comprises: increasing the resolution of the image to be recognized; and among the obtained multiple feature map sets containing the objects to be recognized in the image, at least two feature map sets have different resolutions.
  5. The training method according to claim 1, wherein:
    the number of convolution blocks in the second branch is greater than the number of convolution blocks in the first branch.
  6. The training method according to claim 1, wherein:
    the second branch comprises x sequentially arranged convolution blocks, the width of each of the last h convolution blocks in the second branch being greater than the width of the preceding x-h convolution blocks, x and h being positive integers greater than 1, with h < x.
  7. A human skeleton map construction method based on a deep convolutional neural network, the deep convolutional neural network being a single-stage, dual-branch convolutional neural network comprising a first branch for predicting confidence and a second branch for predicting part affinity fields, the method comprising:
    inputting an image to be recognized; performing feature analysis on the image to be recognized according to preset objects to be recognized, to obtain one or more feature map sets containing the objects to be recognized in the image, each feature map set corresponding to one object to be recognized; inputting the feature map set into the first branch of the deep convolutional neural network to obtain a confidence prediction result; inputting the confidence prediction result and the feature map set into the second branch of the deep convolutional neural network to obtain a part affinity field prediction result; and obtaining a human skeleton map from the confidence prediction result and the part affinity field prediction result.
  8. The human skeleton map construction method according to claim 7, wherein obtaining the human skeleton map from the confidence prediction result and the part affinity field prediction result comprises:
    for each object to be recognized, obtaining keypoint positions from the confidence prediction result, computing the limb connection of each limb type from the keypoints using a bipartite matching method, and composing the skeleton map of the object to be recognized by sharing keypoints at identical positions.
  9. The human skeleton map construction method according to claim 7, wherein before the feature analysis is performed on the image to be recognized, the method further comprises: increasing the resolution of the image to be recognized; and among the obtained multiple feature map sets containing the objects to be recognized in the image, at least two feature map sets have different resolutions.
  10. The human skeleton map construction method according to claim 7, wherein:
    the number of convolution blocks in the second branch is greater than the number of convolution blocks in the first branch.
  11. The human skeleton map construction method according to claim 7, wherein:
    the second branch comprises x sequentially arranged convolution blocks, the width of each of the last h convolution blocks in the second branch being greater than the width of the preceding x-h convolution blocks, x and h being positive integers greater than 1, with h < x.
  12. An abnormal behavior monitoring method based on a deep convolutional neural network, the method comprising:
    acquiring an image to be recognized;
    obtaining a skeleton map of the human body in the image to be recognized using the method of any one of claims 7-11;
    performing behavior recognition on the skeleton map, and triggering an alarm when an abnormal behavior is determined.
  13. The abnormal behavior monitoring method according to claim 12, wherein:
    the abnormal behavior includes, but is not limited to, one or more of the following actions: climbing, climbing high, intrusion, falling.
  14. An abnormal behavior monitoring system based on a deep convolutional neural network, the system comprising:
    an image acquisition device configured to acquire an image to be recognized;
    a server side configured to obtain the image to be recognized sent by the image acquisition device, obtain a skeleton map of the human body in the image using the method of any one of claims 7-11, perform behavior recognition on the skeleton map, and send an alarm signal to a client when an abnormal behavior is determined; and
    a client configured to receive the alarm signal sent by the server side and trigger an alarm according to the alarm signal.
PCT/CN2019/119826 2019-10-28 2019-11-21 Neural network training, skeleton map construction, and abnormal behavior monitoring methods and system WO2021082112A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911034172.X 2019-10-28
CN201911034172.XA CN110929584A (zh) 2019-10-28 2019-10-28 Network training method, monitoring method, system, storage medium, and computer device

Publications (1)

Publication Number Publication Date
WO2021082112A1 true WO2021082112A1 (zh) 2021-05-06

Family

ID=69849636

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119826 WO2021082112A1 (zh) 2019-10-28 2019-11-21 Neural network training, skeleton map construction, and abnormal behavior monitoring methods and system

Country Status (3)

Country Link
US (1) US20210124914A1 (zh)
CN (1) CN110929584A (zh)
WO (1) WO2021082112A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326778A (zh) * 2021-05-31 2021-08-31 中科计算技术西部研究院 Image-recognition-based human posture detection method, apparatus, and storage medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138414B2 (en) * 2019-08-25 2021-10-05 Nec Corporation Of America System and method for processing digital images
CN112131985B (zh) * 2020-09-11 2024-01-09 同济人工智能研究院(苏州)有限公司 Real-time lightweight human pose estimation method based on improved OpenPose
TWI733616B (zh) * 2020-11-04 2021-07-11 財團法人資訊工業策進會 Human posture recognition system, human posture recognition method, and non-transitory computer-readable storage medium
CN113673601B (zh) * 2021-08-23 2023-02-03 北京三快在线科技有限公司 Behavior recognition method and apparatus, storage medium, and electronic device
CN114445851A (zh) * 2021-12-15 2022-05-06 厦门市美亚柏科信息股份有限公司 Video-based conversation-scene anomaly detection method, terminal device, and storage medium
CN114550287B (zh) * 2022-01-27 2024-06-21 福建和盛高科技产业有限公司 Human-keypoint-based abnormal personnel behavior detection method in substation scenes
CN116189311B (zh) * 2023-04-27 2023-07-25 成都愚创科技有限公司 Standardized protective-clothing donning process monitoring system
CN116863638B (zh) * 2023-06-01 2024-02-23 国药集团重庆医药设计院有限公司 Abnormal personnel behavior detection method and security system based on active early warning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170344829A1 (en) * 2016-05-31 2017-11-30 Microsoft Technology Licensing, Llc Skeleton -based action detection using recurrent neural network
CN109460702A (zh) * 2018-09-14 2019-03-12 华南理工大学 Passenger abnormal behavior recognition method based on human skeleton sequences
CN110210323A (zh) * 2019-05-09 2019-09-06 浙江大学 Machine-vision-based online drowning behavior recognition method
CN110378281A (zh) * 2019-07-17 2019-10-25 青岛科技大学 Group behavior recognition method based on pseudo-3D convolutional neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052896B (zh) * 2017-12-12 2020-06-02 广东省智能制造研究所 Human behavior recognition method based on convolutional neural networks and support vector machines
CN110135319B (zh) * 2019-05-09 2022-09-16 广州大学 Abnormal behavior detection method and system
CN110298332A (zh) * 2019-07-05 2019-10-01 海南大学 Behavior recognition method, system, computer device, and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170344829A1 (en) * 2016-05-31 2017-11-30 Microsoft Technology Licensing, Llc Skeleton -based action detection using recurrent neural network
CN109460702A (zh) * 2018-09-14 2019-03-12 华南理工大学 基于人体骨架序列的乘客异常行为识别方法
CN110210323A (zh) * 2019-05-09 2019-09-06 浙江大学 一种基于机器视觉的溺水行为在线识别方法
CN110378281A (zh) * 2019-07-17 2019-10-25 青岛科技大学 基于伪3d卷积神经网络的组群行为识别方法

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326778A (zh) * 2021-05-31 2021-08-31 中科计算技术西部研究院 Image-recognition-based human posture detection method, apparatus, and storage medium

Also Published As

Publication number Publication date
US20210124914A1 (en) 2021-04-29
CN110929584A (zh) 2020-03-27

Similar Documents

Publication Publication Date Title
WO2021082112A1 (zh) Neural network training, skeleton map construction, and abnormal behavior monitoring methods and system
CN108629791B (zh) Pedestrian tracking method and apparatus, and cross-camera pedestrian tracking method and apparatus
CN111383421B (zh) Privacy-preserving fall detection method and system
CN110569772B (zh) Method for detecting the state of persons in a swimming pool
JP6905850B2 (ja) Image processing system, imaging device, learning model creation method, and information processing device
US20220180534A1 (en) Pedestrian tracking method, computing device, pedestrian tracking system and storage medium
CN107256377B (zh) Method, device, and system for detecting objects in video
Othman et al. A new IoT combined body detection of people by using computer vision for security application
CN109190475A (zh) Collaborative training method for a face recognition network and a person re-identification network
JP2010206405A (ja) Image monitoring apparatus
Cardile et al. A vision-based system for elderly patients monitoring
Shalnov et al. Convolutional neural network for camera pose estimation from object detections
CN112541403A (zh) Indoor human fall detection method using an infrared camera
US20240135579A1 (en) Method for human fall detection and method for obtaining feature extraction model, and terminal device
KR102564300B1 (ko) School violence prevention system using body-temperature behavior patterns
JP2012212238A (ja) Article detection device and stationary person detection device
CN111144260A (zh) Method, apparatus, and system for detecting climbing over a turnstile
CN116152745A (zh) Smoking behavior detection method, apparatus, device, and storage medium
Gupta et al. SSDT: distance tracking model based on deep learning
Zhang A Yolo-based Approach for Fire and Smoke Detection in IoT Surveillance Systems.
KR20230064095A (ko) Apparatus and method for detecting abnormal behavior through deep-learning-based video analysis
TW201447825A (zh) Security image recognition system
Rothmeier et al. Comparison of Machine Learning and Rule-based Approaches for an Optical Fall Detection System
TWI820784B (zh) Fall and posture recognition method with safe care and high-privacy processing
Cai et al. A new family monitoring alarm system based on improved yolo network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19951114

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19951114

Country of ref document: EP

Kind code of ref document: A1