CN112766168B

CN112766168B - Personnel fall detection method and device and electronic equipment

Info

Publication number: CN112766168B
Application number: CN202110078426.9A
Authority: CN
Inventors: 刘豹; 李承烈; 刘瑞洁; 李哲山; 司雁天
Original assignee: Greatwood System Integration Co ltd; Beijing Yunyang Technology Co ltd
Current assignee: Greatwood System Integration Co ltd; Beijing Yunyang Technology Co ltd
Filing date: 2021-01-20
Publication date: 2024-06-28
Anticipated expiration: 2041-01-20

Abstract

The invention provides a method and a device for detecting personal tumbling and electronic equipment, wherein the method comprises the following steps: acquiring a monitoring video image; inputting the monitoring video image into a pre-trained neural network to obtain key position features and motion features of a human body in the monitoring video image; and fusing the key position features of the human body with the motion features of the human body, classifying the fused features, and judging whether the fall occurs or not. By implementing the method, classification is carried out according to the characteristics of the key position characteristics and the motion characteristics of the human body, so that the human body gestures similar to the falling gestures but different motion characteristics can be eliminated, and the accuracy of falling detection is improved.

Description

Personnel fall detection method and device and electronic equipment

Technical Field

The invention relates to the field of electronics, in particular to a method and a device for detecting personal fall and electronic equipment.

Background

Based on current research, in the home environment, a fall of an elderly person or a child is one of the main causes of injury to the elderly person or the child. If the old people or children can be timely found to fall down and timely processed according to different conditions, the damage to the old people or children caused by falling down can be greatly reduced.

In the related art, a human body key point in a monitoring image is obtained by inputting the monitoring image into a neural network, and a human body posture formed by the human body key point is matched with a pre-stored human body posture feature, so that whether a human body falls down or not is judged. However, in reality, the activities of the human body are diversified, and some gesture features are relatively close to the falling gesture features, for example, the gesture features of parents sitting on the ground and accompanying with children playing are similar to the falling gesture features, and the parents are likely to be judged to fall, so that the falling detection accuracy is low.

Disclosure of Invention

In view of the above, the embodiment of the invention provides a method and a device for detecting a falling of a person and electronic equipment, so as to solve the defect of low falling detection accuracy in the prior art.

According to a first aspect, an embodiment of the present invention provides a method for detecting a fall of a person, including the steps of: acquiring a monitoring video image; inputting the monitoring video image into a pre-trained neural network to obtain key position features and motion features of a human body in the monitoring video image; and fusing the key position features of the human body with the motion features of the human body, classifying the fused features, and judging whether the fall occurs or not.

Optionally, inputting the monitoring image to a pre-trained neural network to obtain a human body key position feature and a human body motion feature in the monitoring image, including: inputting the monitoring image into a pre-trained Pix2Pose network, and obtaining the predicted coordinates of the human body shielding part when the human body is shielded; and obtaining the key position characteristics of the human body and the motion characteristics of the human body in the monitoring image according to the predicted coordinates of the human body shielding part.

Optionally, the human motion features include: human centroid changes between multiple video frames and human torso changes from horizontal.

Optionally, the angular change of the human torso from horizontal between video frames is determined by the following formula:

Wherein θ represents the angular variation of the human torso from the horizontal direction between video frames, y ₁ represents the ordinate of the target feature point of the human torso in the first video frame, y ₂ represents the ordinate of the target feature point of the human torso in the second video frame, x ₁ represents the abscissa of the target feature point of the human torso in the first video frame, x ₂ represents the abscissa of the target feature point of the human torso in the second video frame, max (y ₁,y₂) represents the maximum value in y ₁,y₂, min (y ₁,y₂) represents the minimum value in the ordinate y ₁,y₂, max (x ₁,x₂) represents the maximum value in x ₁,x₂, and min (x ₁,x₂) represents the minimum value in x ₁,x₂.

Optionally, inputting the monitoring image to a pre-trained Pix2Pose network, when the human body is blocked, obtaining predicted coordinates of a blocked part of the human body, including: acquiring a shielded human body detection frame in the monitoring image; translating the human body detection frame to enable the center of the human body detection frame to coincide with the center of the shielded human body, eliminating background and uncertain pixels, and obtaining an image to be detected, wherein the uncertain pixels are determined according to errors of the pixels and an outlier threshold; and determining the predicted coordinates of the human body shielding part according to the predicted error determined by the image to be detected and the interior point threshold value.

Optionally, when the falling is judged, an alarm is sent out.

Optionally, the monitoring video image is an infrared video image.

According to a second aspect, an embodiment of the present invention provides a personal fall detection apparatus including: the video image acquisition module is used for acquiring a monitoring video image; the feature determining module is used for inputting the monitoring video image into a pre-trained neural network to obtain the key position features and the motion features of the human body in the monitoring video image; and the feature fusion module is used for fusing the key position features of the human body with the motion features of the human body, classifying the fused features and judging whether the human body falls down or not.

According to a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method for detecting a personal fall according to the first aspect or any implementation manner of the first aspect when the processor executes the program.

According to a fourth aspect, an embodiment of the present invention provides a storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method for detecting a fall of a person according to the first aspect or any of the embodiments of the first aspect.

The technical scheme of the invention has the following advantages:

according to the human fall detection method, the human body key position features and the human body motion features in the acquired monitoring video image are extracted through the pre-trained neural network, classification is carried out according to the features fusing the human body key position features and the human body motion features, human body behaviors similar to fall postures but different in motion features can be eliminated, and the fall detection accuracy is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart showing a specific example of a method for detecting a human fall in an embodiment of the present invention;

FIG. 2 is a schematic block diagram of a specific example of a human fall detection device in accordance with an embodiment of the present invention;

fig. 3 is a schematic block diagram of a specific example of an electronic device in an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made apparent and fully in view of the accompanying drawings, in which some, but not all embodiments of the invention are shown. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; the two components can be directly connected or indirectly connected through an intermediate medium, or can be communicated inside the two components, or can be connected wirelessly or in a wired way. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.

In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

The embodiment provides a method for detecting a person falling, as shown in fig. 1, comprising the following steps:

s101, acquiring a monitoring video image;

The surveillance video image may be a natural light surveillance video image, for example, although an infrared surveillance video image may be selected to protect the privacy of the user. The mode of acquiring the monitoring video image can be to install a natural light camera or an infrared camera in a range needing to be monitored. The manner of acquiring the surveillance video image and the type of the surveillance video image are not limited in this embodiment, and may be determined by those skilled in the art according to need.

S102, inputting a monitoring video image into a pre-trained neural network to obtain key position features and motion features of a human body in the monitoring video image;

Illustratively, the pre-trained neural network may be a dense convolutional neural network. The key position features and the motion features of the human body in the monitoring video image are obtained through the dense convolutional neural network, the gradient dissipation problem in the training process is effectively reduced, meanwhile, as the features are largely multiplexed, a large number of features can be generated through a small number of convolutional kernels, the small model size is obtained, the method has good applicability to human body posture judgment with high model fineness requirements, and higher resolution can be obtained.

The dense convolution neural network adopts a Pre-activation strategy to carry out unit design, before BN operation is moved up to a branch from a main branch, (BN- > ReLU- >1x1Conv- > BN- > ReLU- >3x3 Conv), downsampling is carried out after each stage, and characteristic dimensions are compressed to half of the current input through a convolution layer to carry out pooling operation; by adopting the dense block design, the number of output feature maps of each convolution layer is small, each layer is directly connected with input and loss, and different from a residual error network ResNet, the input is directly transmitted into and output from the non-mapping relation is as follows:

x_ι＝H_ι([x₀,x₁,…,x_ι-1])

Wherein x ₀,x₁,…,x_l-1 represents that the feature maps output by each layer are combined in a channel mode, so that the problem of information obstruction possibly occurring in information flow between different layers of information in a residual network is solved. In the training process, the 3×3 convolution of each dense block comprises a1×1 convolution operation, so that the dimension and the calculation amount can be reduced, and the characteristics of each channel can be fused. The dense convolutional neural network determines key position features of a human body according to the generated data, wherein the key position features of the human body can comprise key points of bones such as human head, shoulders and the like.

When training the dense convolutional neural network, the COCO data set can be used as a training sample, or a large number of images shot by the camera equipment can be used as training samples for pre-training to obtain pre-training parameters. In generating the training samples, a picture in the COCO dataset or a large number of images taken by the image capturing apparatus are first subjected to preprocessing, such as matrixing and color processing operations. When the label is set on the sample, key position features in the sample can be marked by using LabelImg tools, so that one-to-one corresponding xml files are generated, and the xml files are sent into the dense convolutional neural network for pre-training.

In order to extract more key position features of the human body, a plurality of parts of the human body trunk can be regarded as multitask learning, multitask S= { S ₁,S₂,…,S_P }, wherein S represents tasks such as tasks corresponding to arms, shoulders, waist, legs, knees and the like, and the recognition rate of key points of the human body trunk and bones is improved by customizing semantic attributes corresponding to the tasks. Through putting together the training sample set of multitasking, study training in parallel, the correlation between each data of deep consideration has effectively avoided the condition of data fracture to take place, has promoted the learning rate of shallow shared layer simultaneously, can great improvement learning rate, improves the learning effect.

The human motion characteristics can be determined by the change in the height of the centroid of the human body between video frames and the change in the angle of the human body with the horizontal direction. The human body can be connected with the diagonal line of the detected human body detection frame, the connection intersection point is extracted to be used as the human body centroid, the centroid height gradually decreases along with gradual falling of the human body from standing, therefore, the height change of the centroid can be used as the human body motion characteristic. The determination of the angular change of the human body from the horizontal direction can be obtained by the following steps: firstly, acquiring the key position characteristics of the same human body in adjacent video frames, such as feet and the like, and establishing a coordinate system with the feet as original points; secondly, acquiring the coordinates of another human body key position feature of the adjacent video frame under the coordinate system, for example, the coordinates of the head under the coordinate system of the current video frame; and finally, according to the head coordinates of two adjacent video frames, determining the angle change between the human body and the ground in the horizontal direction.

According to the head coordinates of two adjacent video frames, the mode of determining the angle change between the human body and the ground in the horizontal direction can be obtained by the following formula:

Where θ represents the angular change of the human torso from the horizontal direction in the adjacent video frames, y ₁ represents the ordinate of the target feature point (head) of the human torso in the first video frame (the ordinate in the coordinate system established with the human foot as the origin), y ₂ represents the ordinate of the target feature point (head) of the human torso in the second video frame (the ordinate in the coordinate system established with the human foot as the origin), x ₁ represents the abscissa of the target feature point (head) of the human torso in the first video frame (the abscissa in the coordinate system established with the human foot as the origin), x ₂ represents the abscissa (the abscissa in the coordinate system established with the human foot as the origin), max (y ₁,y₂) represents the maximum value in y ₁,y₂, min (y ₁,y₂) represents the minimum value in the ordinate y ₁,y₂, max (x ₁,x₂) represents the maximum value in x ₁,x₂, and min (x ₁,x₂) represents the minimum value in x ₁,x₂.

S103, fusing the key position features of the human body and the motion features of the human body, classifying the fused features, and judging whether the fall occurs or not.

The method for fusing the key position features of the human body and the motion features of the human body can be characterized by fusing the key position features of the human body and the motion features of the human body by utilizing a full connection layer, and mapping the key position features and the motion features of the human body to the same feature space; and (3) stretching the convolution layer, entering the full-connection layer to obtain 169-dimensional feature vectors, obtaining 12-dimensional vectors by using the human motion features activated by Sigmoid, and performing feature fusion by using a full-connection neural network with 3 nodes to obtain fusion feature data.

Classifying the fused features, and judging whether the fused feature data fall occurs or not by classifying the fused feature data by utilizing a random forest, setting N samples, wherein each sample has M attributes, randomly selecting the M attributes from the M attributes when each node of the decision tree needs to be split, and meeting the condition M < M. And then selecting 1 attribute from the m attributes based on the coefficient of the Kennel as the splitting attribute of the node until the splitting position cannot be continued, and building more decision trees according to the steps to form a random forest to finish classification.

As an optional implementation manner of this embodiment, inputting the monitoring image to the pre-trained neural network to obtain the key position feature and the motion feature of the human body in the monitoring image, including: inputting the monitoring image into a pre-trained Pix2Pose network, and obtaining the predicted coordinates of the human body shielding part when the human body is shielded; and obtaining the key position characteristics of the human body and the motion characteristics of the human body in the monitoring image according to the predicted coordinates of the human body shielding part.

Illustratively, the monitoring image is input to a pre-trained Pix2Pose network, and the normalized three-dimensional coordinates l3D of each pixel in the target coordinate system and the estimated error l _e of each prediction can be output.

Wherein n represents the number of pixels,Representing the prediction of the error and,Representing the error prediction loss.

The specific step of obtaining the predicted coordinates of the human body shielding part may include: firstly, acquiring a shielded human body detection frame in a monitoring image; and secondly, translating the human body detection frame to enable the center of the human body detection frame to coincide with the center of the shielded human body, eliminating the background and uncertain pixels, and obtaining an image to be detected. The predicted coordinate image l3D designates the object pixel including the occlusion part by acquiring non-zero value pixels, if the error prediction of the pixel is larger than an outlier threshold θ ₀, the object is to include more visible pixels, and the noise pixel is removed by using a training chart of artificial occlusion, wherein θ ₀ is determined by three values, namely, three values of three-dimensional model is converted into a color coordinate model, the coordinates of each point are directly mapped to three values of red, green and blue in the color space, and the outlier threshold is preset to be a smaller value, for example, 0.1. Then, according to the prediction error and the interior point threshold value determined by the image to be detected, determining the prediction coordinate of the human body shielding part, and according to the obtained image to be detected, predicting the final prediction coordinate and the expected error, when the prediction error is larger than the interior point threshold value theta _i, removing the point in the three-dimensional coordinate sample, so that each pixel in the image has a corresponding value of the three-dimensional point in the object coordinate; wherein, the determination method of theta _i is consistent with theta ₀. Setting the iteration times to 15000 times, then using a RANSAC algorithm to maximize the number of interior points, eliminating redundant points, and performing iterative computation to obtain the predicted coordinates of the human body shielding part. The specific content of the manner of obtaining the key position features and the motion features of the human body in the monitored image according to the predicted coordinates of the human body shielding part is referred to the corresponding parts of the above embodiment, and will not be described herein.

According to the personnel fall detection method, the monitoring image is input to the pre-trained Pix2Pose network, when a human body is shielded, the predicted coordinates of the shielded part of the human body are obtained, the key position features of the human body and the motion features of the human body in the monitoring image are obtained according to the predicted coordinates of the shielded part of the human body for subsequent fall judgment, the fall behavior can be identified even if the environment is partially shielded under the condition that the monitoring environment is very complex, and the accuracy of personnel fall detection is further improved.

As an alternative implementation manner of this embodiment, when it is determined that a fall occurs, an alarm is sent.

The alarm may be sent to a terminal device associated with the alarm in a manner of a short message or an application reminder, for example, the alarm may be sent to a mobile phone terminal of a family member in a manner of a short message or a reminder is sent to a corresponding APP/webpage, or the alarm may be sent to an audible and visual alarm, and prompt related personnel to process the alarm.

The embodiment of the invention provides a personnel fall detection device, as shown in fig. 2, comprising:

a video image acquisition module 201, configured to acquire a monitoring video image; the specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

The feature determining module 202 is configured to input the monitoring video image to a pre-trained neural network, so as to obtain a human body key position feature and a human body motion feature in the monitoring video image; the specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

And the feature fusion module 203 is configured to fuse the key position features of the human body with the motion features of the human body, classify the fused features, and determine whether a fall occurs. The specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

According to the personal falling detection device provided by the embodiment, the human body key position features and the human body motion features in the acquired monitoring video image are extracted through the pre-trained neural network, and the human body gestures similar to falling gestures but different in motion features can be eliminated by classifying the features of the human body key position features and the human body motion features, so that the falling detection accuracy is improved.

As an alternative implementation of this embodiment, the feature determining module 202 includes:

the coordinate prediction module is used for inputting the monitoring image into a pre-trained Pix2Pose network, and obtaining the predicted coordinate of the human body shielding part when the human body is shielded; the specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

And the characteristic determination submodule is used for obtaining the key position characteristic and the motion characteristic of the human body in the monitoring image according to the predicted coordinates of the human body shielding part. The specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

As an alternative implementation manner of this embodiment, the feature fusion module 203 includes: and the centroid angle change extraction module is used for human centroid change among a plurality of video frames and angle change of human trunk and horizontal direction. The specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

As an optional implementation manner of this embodiment, the centroid angle change extraction module includes performing:

Wherein θ represents the angular variation of the human torso from the horizontal direction between video frames, y ₁ represents the ordinate of the target feature point of the human torso in the first video frame, y ₂ represents the ordinate of the target feature point of the human torso in the second video frame, x ₁ represents the abscissa of the target feature point of the human torso in the first video frame, x ₂ represents the abscissa of the target feature point of the human torso in the second video frame, max (y ₁,y₂) represents the maximum value in y ₁,y₂, min (y ₁,y₂) represents the minimum value in the ordinate y ₁,y₂, max (x ₁,x₂) represents the maximum value in x ₁,x₂, and min (x ₁,x₂) represents the minimum value in x ₁,x₂. The specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

As an alternative implementation manner of this embodiment, the coordinate prediction module includes:

The human body detection frame acquisition module is used for acquiring the blocked human body detection frame in the monitoring image; the specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

The image to be detected determining module is used for translating the human body detecting frame, enabling the center of the human body detecting frame to coincide with the center of the blocked human body, eliminating background and uncertain pixels, and obtaining an image to be detected, wherein the uncertain pixels are determined according to errors of the pixels and an outlier threshold; the specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

And the coordinate prediction sub-module is used for determining the predicted coordinate of the human body shielding part according to the predicted error and the interior point threshold value determined by the image to be detected. The specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

As an alternative implementation manner of this embodiment, the personal fall detection device further includes: and the alarm module is used for giving an alarm when judging that the falling occurs. The specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

As an alternative implementation of this embodiment, the video image acquisition module 201 includes an infrared video image acquisition module for acquiring an infrared video image. The specific content refers to the corresponding parts of the method embodiments, and will not be described herein.

Embodiments of the present application also provide an electronic device, as shown in fig. 3, a processor 310 and a memory 320, where the processor 310 and the memory 320 may be connected by a bus or other means.

The processor 310 may be a central processing unit (Central Processing Unit, CPU). The Processor 310 may also be other general purpose processors, digital Signal Processors (DSPs), application Specific Integrated Circuits (ASICs), field-Programmable gate arrays (Field-Programmable GATE ARRAY, FPGA) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof.

The memory 320 is used as a non-transitory computer readable storage medium, and can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for detecting a personal fall in an embodiment of the present invention. The processor executes various functional applications of the processor and data processing by running non-transitory software programs, instructions, and modules stored in memory.

Memory 320 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by the processor, etc. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 320 may optionally include memory located remotely from the processor, which may be connected to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 320 and when executed by the processor 310, perform the method of detecting a person falling in the embodiment shown in fig. 1.

The details of the above electronic device may be understood correspondingly with respect to the corresponding related descriptions and effects in the embodiment shown in fig. 1, which are not repeated herein.

The present embodiment also provides a computer storage medium storing computer-executable instructions that can execute the human fall detection method of any of the above method embodiments 1. Wherein the storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a Flash Memory (Flash Memory), a hard disk (HARD DISK DRIVE, abbreviated as HDD), a Solid state disk (Solid-state-STATE DRIVE, SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.

It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. While still being apparent from variations or modifications that may be made by those skilled in the art are within the scope of the invention.

Claims

1. The method for detecting the falling of the personnel is characterized by comprising the following steps of:

Acquiring a monitoring video image;

inputting the monitoring video image into a pre-trained neural network to obtain key position features and motion features of a human body in the monitoring video image;

Fusing the key position features of the human body with the motion features of the human body, classifying the fused features, and judging whether the human body falls down or not;

The human motion features include: the human body centroid changes among a plurality of video frames and the angle changes of human body trunk and horizontal direction, wherein the human body centroid changes comprise human body centroid height changes and human body centroid height change rate, and the human body centroid height changes gradually fall along with the gradual falling of a human body from standing; the centroid height change rate obtains the centroid height change rate among a plurality of video frames through the height of a human body in each frame;

The method for determining the angle change between the human body and the horizontal direction is obtained through the following steps:

acquiring the same human body key position feature in adjacent video frames, and establishing a coordinate system taking the same human body key position feature as an origin;

acquiring coordinates of another human body key position feature of the adjacent video frame under the coordinate system;

According to the coordinates of the other human body key position feature of the two adjacent video frames under the coordinate system, determining the angle change between the human body and the ground in the horizontal direction;

Classifying the fused features, judging whether the fused features fall down to classify the fused feature data by using a random forest, setting N samples, wherein each sample has M attributes, randomly selecting M attributes from the M attributes when each node of the decision tree needs to be split, meeting the condition M < M, then selecting 1 attribute from the M attributes as the splitting attribute of the node based on a coefficient of a foundation, and building more decision trees according to the steps until the node cannot be split, forming a random forest, and finishing classification;

the fusing the key position features of the human body and the motion features of the human body comprises the following steps:

The method comprises the steps of carrying out feature fusion on key position features of a human body and motion features of the human body by using a full-connection layer, mapping the key position features and the motion features of the human body to the same feature space to obtain fused features, carrying out stretching operation on a convolution layer, entering the full-connection layer to obtain 169-dimensional feature vectors, obtaining 12-dimensional vectors by virtue of the motion features of the human body after Sigmoid activation, and carrying out feature fusion on the 12-dimensional vectors by virtue of a full-connection neural network with 3 nodes to obtain fused feature data.

2. The method of claim 1, wherein inputting the surveillance video image to a pre-trained neural network to obtain the human body key location feature and the human body motion feature in the surveillance video image comprises:

Inputting the monitoring video image into a pre-trained Pix2Pose network, and obtaining the predicted coordinates of the human body shielding part when the human body is shielded;

and obtaining the key position characteristics of the human body and the motion characteristics of the human body in the monitoring video image according to the predicted coordinates of the human body shielding part.

3. The method of claim 1, wherein the angular change of the human torso from horizontal between video frames is determined by the following formula:

Wherein, Representing the angular change of the human torso from horizontal between video frames,An ordinate representing a target feature point of the torso of the person within the first video frame,An ordinate representing a target feature point of the torso of the person within the second video frame,The abscissa of the target feature points representing the torso of the human body within the first video frame,The abscissa of the target feature points representing the torso of the person within the second video frame,Representation ofIs selected from the group consisting of a maximum value of (c),Representing the ordinateIs selected from the group consisting of a minimum value of,Representation ofIs selected from the group consisting of a maximum value of (c),Representation ofIs the minimum value of (a).

4. The method of claim 2, wherein inputting the surveillance video image into a pre-trained Pix2Pose network, when the human body is occluded, obtains predicted coordinates of an occluded portion of the human body, comprising:

acquiring a shielded human body detection frame in the monitoring video image;

Translating the human body detection frame to enable the center of the human body detection frame to coincide with the center of the shielded human body, eliminating background and uncertain pixels, and obtaining an image to be detected, wherein the uncertain pixels are determined according to errors of the pixels and an outlier threshold;

and determining the predicted coordinates of the human body shielding part according to the predicted error determined by the image to be detected and the interior point threshold value.

5. The method of claim 1, wherein an alarm is raised when a fall is determined.

6. The method of claim 1, wherein the surveillance video image is an infrared video image.

7. A personal fall detection device, comprising:

The video image acquisition module is used for acquiring a monitoring video image;

The feature determining module is used for inputting the monitoring video image into a pre-trained neural network to obtain the key position features and the motion features of the human body in the monitoring video image;

The feature fusion module is used for fusing the key position features of the human body with the motion features of the human body, classifying the fused features and judging whether the human body falls down or not;

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method for detecting a fall of a person according to any one of claims 1-6 when the program is executed by the processor.

9. A storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method for detecting a fall of a person of any of claims 1-6.