CN111914676A - Human body tumbling detection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111914676A
CN111914676A (application CN202010660416.1A)
Authority
CN
China
Prior art keywords
human body
features
target image
monitored object
fall detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010660416.1A
Other languages
Chinese (zh)
Inventor
杨颜如
李驰
刘岩
贾晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd filed Critical Taikang Insurance Group Co Ltd
Priority to CN202010660416.1A
Publication of CN111914676A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 - Recognition of whole body movements, e.g. for sport training
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application disclose a human body fall detection method and device, an electronic device, and a storage medium. The method addresses the low detection rate of human fall detection in the related art. In the embodiments of the application, multiple classes of features that specifically and comprehensively describe the state of the monitored object can be used, and the fall detection results of multiple classifiers are combined to analyze, for each frame of target image, whether the monitored object is detected to have fallen. On that basis, whether the object has fallen is decided by further combining the detection results of multiple consecutive target images, which improves the detection rate of the fall detection result.

Description

Human body tumbling detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular to a human body fall detection method and apparatus, an electronic device, and a storage medium.
Background
The mobility safety of the elderly has long been a focus of social attention. Falls among the elderly are frequent, have serious consequences, and are a leading cause of death in this population, so human fall detection techniques are attracting increasing attention. With the continuous development of computer vision technologies such as image analysis, pattern recognition and information fusion, intelligent human motion state recognition based on video monitoring has gradually become a main means of home security monitoring, providing a new early-warning means for the home safety of the elderly, especially those living alone.
At present, the means of detecting human falling behavior mainly include wearable devices, arrangements of peripheral sensors, and video monitoring. The first two rely on dedicated equipment to acquire information, depend strongly on the environment, and have a low detection rate. Intelligent recognition of human motion states based on video monitoring has gradually become a main means of home safety monitoring and offers a new early-warning means for the elderly, especially those living alone. In practical applications, however, its detection rate is still too low to meet actual requirements.
Disclosure of Invention
The application aims to provide a human body fall detection method and device, an electronic device, and a storage medium, to address the low detection rate of human fall detection in the prior art.
In a first aspect, an embodiment of the present application provides a method for detecting human body tumbling, where the method includes:
acquiring multi-frame target images of a monitored object within a specified time period;
respectively executing the following steps for each frame of the target image:
respectively acquiring multiple types of features from the target image, wherein the multiple types of features comprise at least one human body posture feature and at least one human body motion feature;
inputting each type of features in the multiple types of features into a corresponding classifier respectively for fall detection to obtain fall detection results corresponding to each type of features;
comprehensively analyzing the fall detection results of each type of features to obtain a final fall detection result of the target image;
and if the final fall detection results of the continuous multiple frames of target images are judged to fall, determining that the monitored object falls.
In some embodiments, the acquiring of multiple frames of target images of the monitored object within the specified time period includes:
acquiring a video of the monitored object within the specified time period;
and performing equal-frequency frame extraction processing on the video to obtain the multiple frames of target images of the monitored object.
In some embodiments, the fall detection result output by each classifier is a fall probability, and the comprehensively analyzing the fall detection result of each type of feature to obtain a final fall detection result of the target image includes:
performing weighted summation on the fall detection results of each class of features;
when the weighted summation result is greater than or equal to a preset value, the final fall detection result of the target image is that a fall has occurred;
and when the weighted summation result is smaller than the preset value, the final fall detection result of the target image is that no fall has occurred.
In some embodiments, the human body posture features of the monitored object include: human body key point position features and human body shape features, where the human body shape features of the monitored object include one or a combination of the following: the aspect ratio of the human body outline, human body moment features, and centroid position information of the human body;
and/or,
the motion features include a motion velocity of the centroid of the monitored object.
In some embodiments, acquiring the centroid position of the human body of the monitored object, the human body moment feature and the aspect ratio of the human body outline for each frame of the target image comprises:
carrying out human body target detection on the target image, and segmenting a human body partial image of the monitored object;
determining the position of the mass center of the human body according to the position information of each pixel point in the human body partial image in the target image;
determining the human body moment characteristics of the monitored object according to the mass center position and the central moment identification method of the human body, wherein the human body moment characteristics comprise: translation moment features, rotation moment features, scale moment features and magnitude moment features;
and acquiring a circumscribed rectangle of the human body partial image, and determining the ratio of the width to the height of the circumscribed rectangle as the aspect ratio of the human body outline.
In some embodiments, acquiring the human body keypoint location features of the monitored object for each frame of the target image includes:
extracting the position information of the human body key points of the monitored object from the target image;
and extracting key points of the hip, the neck and the knee from the human body key point position information, and determining the relative position relationship between the hip and the neck and the relative position relationship between the hip and the knee as the human body key point position characteristics.
In some embodiments, the classifier is an extreme learning machine.
In a second aspect, an embodiment of the present application provides a human body fall detection device, the device includes:
the acquisition module is used for acquiring multi-frame target images of the monitored object within a specified time period;
a single-frame image recognition module, configured to perform, for each frame of the target image:
respectively acquiring multiple types of features from the target image, wherein the multiple types of features comprise at least one human body posture feature and at least one human body motion feature;
inputting each type of features in the multiple types of features into a corresponding classifier respectively for fall detection to obtain fall detection results corresponding to each type of features;
the fusion module is used for comprehensively analyzing the falling detection result of each type of features to obtain the final falling detection result of the target image;
and the result determining module is used for determining that the monitored object falls if the final fall detection results of the continuous multiple frames of the target images are judged to fall.
In some embodiments, the obtaining module is configured to:
acquiring a video of the monitored object within the specified time period;
and performing equal-frequency frame extraction processing on the video to obtain the multiple frames of target images of the monitored object.
In some embodiments, the fall detection result output by each classifier is a fall probability, and the fusion module is configured to:
performing weighted summation on the fall detection results of each class of features;
when the weighted summation result is greater than or equal to a preset value, the final fall detection result of the target image is that a fall has occurred;
and when the weighted summation result is smaller than the preset value, the final fall detection result of the target image is that no fall has occurred.
In some embodiments, the human body posture features of the monitored object include: human body key point position features and human body shape features, where the human body shape features of the monitored object include one or a combination of the following: the aspect ratio of the human body outline, human body moment features, and centroid position information of the human body;
and/or,
the motion features include a motion velocity of the centroid of the monitored object.
In some embodiments, for each frame of the target image, the single-frame image identification module is configured to:
carrying out human body target detection on the target image, and segmenting a human body partial image of the monitored object;
determining the position of the mass center of the human body according to the position information of each pixel point in the human body partial image in the target image;
determining the human body moment characteristics of the monitored object according to the mass center position and the central moment identification method of the human body, wherein the human body moment characteristics comprise: translation moment features, rotation moment features, scale moment features and magnitude moment features;
and acquiring a circumscribed rectangle of the human body partial image, and determining the ratio of the width to the height of the circumscribed rectangle as the aspect ratio of the human body outline.
In some embodiments, for each frame of the target image, the single-frame image identification module is configured to:
extracting the position information of the human body key points of the monitored object from the target image;
and extracting key points of the hip, the neck and the knee from the human body key point position information, and determining the relative position relationship between the hip and the neck and the relative position relationship between the hip and the knee as the human body key point position characteristics.
In some embodiments, the classifier is an extreme learning machine.
In a third aspect, another embodiment of the present application further provides an electronic device, including at least one processor and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any human fall detection method provided by the embodiments of the application.
In a fourth aspect, another embodiment of the present application further provides a non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of the electronic device, cause the electronic device to perform any one of the human fall detection methods in the embodiments of the present application.
According to the embodiments of the application, multiple classes of features that specifically and comprehensively describe the state of the monitored object can be used, and the fall detection results of multiple classifiers are combined to analyze, for each frame of target image, whether the monitored object is detected to have fallen. On that basis, whether the object has fallen is decided by further combining the detection results of multiple consecutive target images, which improves the detection rate of the fall detection result.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic illustration of an application environment according to one embodiment of the present application;
fig. 2 is a schematic flow chart of a method for detecting a human fall according to an embodiment of the present application;
FIGS. 3-6 are schematic diagrams illustrating image processing effects according to one embodiment of the present application;
fig. 7 is a schematic view of a human fall detection apparatus according to an embodiment of the present application;
FIG. 8 is a schematic view of an electronic device according to one embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the description of the present disclosure are used for distinguishing similar objects, and are not necessarily used for describing a particular order or sequence. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the prior art, approaches that use wearable devices or peripheral sensors depend on specific equipment products and are highly environment-dependent. Detection based on surveillance video, as currently applied, still suffers from a low detection rate. In view of this, the present application provides a human body fall detection method. The inventive concept of the method is as follows: human fall detection is realized based on computer vision, multiple human body features are fused in the detection, and whether a fall has occurred is determined as comprehensively as possible from different dimensions and angles. The corresponding detection result is therefore more accurate, and the detection rate can be effectively improved.
In addition, in the embodiments of the application, to speed up detection, the video can be sampled to obtain multiple frames of samples of the monitored object. Analyzing these sampled frames effectively filters out redundant data, and combining multiple detection features both improves efficiency and increases the accuracy of the detection result.
The human body fall detection method provided by the present application is explained with reference to the accompanying drawings. FIG. 1 is a schematic diagram of an application environment according to one embodiment of the present application.
As shown in fig. 1, the application environment may include, for example, at least one server 20 and a plurality of monitoring devices 30. Each monitoring device 30 may be any electronic device capable of network access, including but not limited to a monitoring camera, a computer, a laptop, a smart phone, a tablet computer, or another type of terminal. The server 20 is any server capable of providing, through a network, the information required for an interactive service. The monitoring device 30 can exchange information with the server 20 via the network 10, for example transmitting acquired monitoring images to the server 20 for human fall detection, acquiring corresponding alarm information from the server, and executing instructions issued remotely by the server. The server 20 may access the database 50 to acquire and store information related to the monitoring devices 30. The terminal device 40 may, if necessary, obtain the monitoring picture from the server 20 for presentation. The monitoring devices (e.g., 30_1, 30_2, ..., 30_N) may also communicate with each other via the network 10 to enable linkage between them. Network 10 may be a network for information transfer in a broad sense and may include one or more communication networks, such as a wireless communication network, the internet, a private network, a local area network, a metropolitan area network, a wide area network, or a cellular data network.
In the following description, only a single server or monitoring device is detailed, but it will be understood by those skilled in the art that the single server 20, monitoring device 30 and database 50 shown are intended to represent that the solution of the present application relates to the operation of the monitoring device, server and database. The detailed description of a single monitoring device and a single server and database is for convenience of illustration at least and does not imply limitations on the type or location of the monitoring device and server, etc. It should be noted that the underlying concepts of the example embodiments of the present application may not be altered if additional modules are added or removed from the illustrated environments. In addition, although a bidirectional arrow from the database 50 to the server 20 is shown in the figure for convenience of explanation, it will be understood by those skilled in the art that the above-described data transmission and reception may be realized through the network 10.
To detect from multiple dimensions and angles whether a human body has fallen, the embodiments of the application provide a technical scheme that fuses multiple detection features. As shown in fig. 2, the flow of human fall detection provided by the embodiments of the application includes the following steps:
the monitoring equipment carries out video acquisition on a monitored object and reports an acquisition result to the server in real time, and in step 201, the server acquires a plurality of frames of target images of the monitored object within a specified time period; for example, the detection may be performed every 1 minute or 5 minutes, and the specified time period is 1 minute or 5 minutes. Of course, in specific implementation, the setting may be set according to actual requirements, and the embodiment of the present application does not limit this.
Furthermore, in some possible embodiments, the video frame rate is generally high, and the difference between two adjacent frames is almost negligible. Repeatedly detecting such highly similar images is of little value, because compared with detecting only a subset of them, the extra detections have little influence on the final result. To improve processing speed, in the embodiments of the application the server may sample the received video; to better obtain images of key features, the video is simply subjected to equal-frequency frame extraction, yielding the multiple frames of target images on which fall detection is to be performed.
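As an illustration, the equal-frequency frame extraction might be implemented along the following lines; this is a minimal sketch assuming OpenCV, and the sampling interval and frame count are illustrative values rather than ones fixed by the application:

```python
import cv2

def extract_frames(video_path, interval=10, max_frames=10):
    """Equal-frequency sampling: keep one frame every `interval` frames."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:              # end of video
            break
        if idx % interval == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```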
After obtaining the multiple frames of target images, in step 202 the server extracts multiple types of features from each frame of target image. The types of features may be chosen according to actual requirements, so that the human body state of the monitored object is described as comprehensively as possible from different angles and dimensions. For example, the multiple types of features may include at least one human body posture feature and at least one human body motion feature. Extraction of the human body posture features and the human body motion features is described below in turn:
1) Extracting the human body posture features of the monitored object
The human body posture features of the monitored object may include human body key point position features, human body shape features, and so on. The human body shape features of the monitored object may include one or a combination of the following: the aspect ratio of the human body outline, human body moment features, centroid position information of the human body, and the like.
(1) Extraction of centroid position information of human body
Taking one frame of target image as an example, human body target detection can be performed on the target image and the human body partial image of the monitored object segmented out. As shown in fig. 3, image a in fig. 3 is the original target image; after the human body part is segmented by human body target detection, the result shown in image b of fig. 3 is obtained, from which the body region of the human target can be seen.
Based on the detected position information of the human body region in the target image, the centroid position of the monitored object's body can be calculated. Image b in fig. 3 (a binary image) can be used directly for the calculation, as follows:
First, the abscissa and ordinate distribution characteristics of the human body region can be obtained from the binary image according to formula (1):

$$m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x, y) \qquad (1)$$

where x and y are the position coordinates of each pixel of the human body region and f(x, y) is the binary image. When p = 1 and q = 0, m_pq is the sum of the x coordinates of all points of the human body region in the binary image; when p = 0 and q = 1, m_pq is the sum of the y coordinates of all points of the human body region; when p = 0 and q = 0, m_pq is the total area of the human body region in the binary image.
According to formula (1), the centroid position coordinates of the monitored object's body are computed as in formula (2):

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \qquad (2)$$

where $\bar{x}$ and $\bar{y}$ are the abscissa and the ordinate of the center of mass of the human body, respectively.
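A minimal sketch of the centroid computation in formulas (1) and (2), assuming the segmented human region is available as a NumPy binary mask:

```python
import numpy as np

def body_centroid(mask):
    """Centroid of the binary human-region mask, per formulas (1) and (2).

    mask: 2-D array with nonzero values inside the human region.
    """
    ys, xs = np.nonzero(mask)      # pixel coordinates of the human region
    m00 = xs.size                  # m_00: total area of the region
    m10 = xs.sum()                 # m_10: sum of x coordinates
    m01 = ys.sum()                 # m_01: sum of y coordinates
    return m10 / m00, m01 / m00    # (x-bar, y-bar)
```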
(2) Extraction of human body moment features
The target image is converted into a grayscale image. Treating the coordinates of a pixel as a two-dimensional random variable (X, Y), a grayscale image can be represented by a two-dimensional gray-level density function, so moments describing the distribution of random variables can be used to characterize the grayscale image. The (p + q)-order origin moment and central moment of the probability function are given by formulas (3) and (4):

$$m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} I(x, y) \qquad (3)$$

$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} I(x, y) \qquad (4)$$
In formula (3), m_pq is the (p + q)-order origin moment; in formula (4), μ_pq is the (p + q)-order central moment; the other parameters have the same meanings as above and are not repeated here. Because the central moment μ_pq is invariant to translation but not to scale, it can be normalized according to formula (5):

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\rho}}, \qquad \rho = \frac{p + q}{2} + 1 \qquad (5)$$

In formula (5), η_pq is the normalization result, μ_pq is the (p + q)-order central moment, and μ_00 is the mass (area) of the target region; the normalized moments are scale-invariant. The other parameters have the same meanings as above and are not repeated here.
Then, seven moments that are invariant to translation, scale, rotation, and so on can be constructed from the normalized central moments. In practical applications, the higher the order of the central moment, the more detail in the target image it can reflect, but also the more complex the computation becomes. Therefore, in the embodiments of the application, the four lower-order invariant moments H1, H2, H3 and H4 are used as the feature vector describing the human body region; each moment feature is computed as in formula (6):

$$\begin{aligned} H_1 &= \eta_{20} + \eta_{02} \\ H_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\ H_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\ H_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \end{aligned} \qquad (6)$$
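The moment features could be computed along the following lines; this sketch assumes the region image is a NumPy array and follows the standard Hu-moment normalization for formula (5). In practice, OpenCV's cv2.HuMoments offers an equivalent ready-made computation, of which H1 to H4 are the first four entries:

```python
import numpy as np

def hu_first_four(gray):
    """First four invariant moments H1..H4 of a region image, per formula (6)."""
    gray = gray.astype(float)
    ys, xs = np.mgrid[:gray.shape[0], :gray.shape[1]]
    m00 = gray.sum()
    xbar = (xs * gray).sum() / m00
    ybar = (ys * gray).sum() / m00

    def eta(p, q):
        # normalized central moment, formula (5)
        mu = (((xs - xbar) ** p) * ((ys - ybar) ** q) * gray).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    h3 = (eta(3, 0) - 3 * eta(1, 2)) ** 2 + (3 * eta(2, 1) - eta(0, 3)) ** 2
    h4 = (eta(3, 0) + eta(1, 2)) ** 2 + (eta(2, 1) + eta(0, 3)) ** 2
    return h1, h2, h3, h4
```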
(3) Extracting the aspect ratio of the human body contour
As described above, after the human body partial image is segmented, its circumscribed rectangle can be obtained, and the ratio of the rectangle's width to its height determined as the aspect ratio of the human body contour. This calculation is simple and easy to implement.
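A sketch of this aspect-ratio computation, assuming OpenCV's bounding-rectangle routine over the binary human mask:

```python
import cv2
import numpy as np

def contour_aspect_ratio(mask):
    """Width/height ratio of the circumscribed rectangle of the human region."""
    x, y, w, h = cv2.boundingRect(np.asarray(mask, dtype=np.uint8))
    return w / h
```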
The limb movements of the monitored object during a fall differ greatly from those under normal conditions. Therefore, to describe the state of the monitored object comprehensively, human body key point position information is introduced into human fall detection in the embodiments of the application.
In implementation, the position information of the human body key points of the monitored object can be extracted from the target image as the position characteristics of the human body key points.
Moreover, in a falling posture, the relative positions of key points of certain characteristic body parts show distinctive patterns. Therefore, in the embodiments of the application, the relative position relationships among the extracted human key point positions are used as the human body key point position features. In practice, AlphaPose can be used to detect the coordinates of 18 skeletal points of the human body. The neck, hip and knee are selected: for different human actions, the relative positions of these skeletal points differ markedly, so their relative position information can be extracted as features for human fall detection. For the falling action, the relative positions of the hip and neck and of the hip and knee are measured, as described by formula (7):

$$\begin{aligned} Dx_{hn} &= \frac{x_{hip} - x_{neck}}{width}, & Dy_{hn} &= \frac{y_{hip} - y_{neck}}{height}, \\ Dx_{kh} &= \frac{x_{knee} - x_{hip}}{width}, & Dy_{kh} &= \frac{y_{knee} - y_{hip}}{height} \end{aligned} \qquad (7)$$

where x_hip, y_hip, x_neck, y_neck, x_knee and y_knee are the abscissas and ordinates of the hip, neck and knee respectively, and width and height are the width and height of the human body target frame detected by AlphaPose (i.e., the width and height of the circumscribed rectangle of the human body described in (3)).
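A sketch of the key point position features of formula (7); the dictionary layout of the detected key points is a hypothetical convenience, since AlphaPose's actual output format differs:

```python
def keypoint_position_features(kp, width, height):
    """Relative hip-neck and knee-hip positions, per formula (7).

    kp: mapping from part name to (x, y) pixel coordinates for 'neck',
    'hip' and 'knee'; width/height: size of the detected human target frame.
    """
    dx_hn = (kp["hip"][0] - kp["neck"][0]) / width
    dy_hn = (kp["hip"][1] - kp["neck"][1]) / height
    dx_kh = (kp["knee"][0] - kp["hip"][0]) / width
    dy_kh = (kp["knee"][1] - kp["hip"][1]) / height
    return dx_hn, dy_hn, dx_kh, dy_kh
```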
2) Extracting motion features
The motion features may be expressed by the motion speed of the monitored object; in practice, the motion speed of the object's centroid can be used to simplify calculation.
In some embodiments, the motion information of the human body can be extracted from an image pair formed by two adjacent frames of target images. Feature extraction based on image pairs is an important means of detecting human falling behavior: two frames of images can describe the motion of the human body, and the motion information expressed by the optical flow field can describe the magnitude of the motion amplitude. In implementation, the motion speed v of the centroid in the image optical flow, obtained with the convolutional-neural-network-based optical flow prediction algorithm FlowNet 2.0, can be selected to describe the motion of the human body along the horizontal and vertical coordinate directions.
In other embodiments, the change in the relative positions of the key points between the previous and subsequent frames may be calculated as the motion information of the human body. After the centroid position in the shape features is obtained, the motion speed can also be derived from the change of the centroid position across the equal-frequency extracted frames, as in the sketch below.
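For the centroid-based variant, the motion speed could be computed from the centroids of two sampled frames; the time step dt would follow from the frame rate and the extraction interval:

```python
def centroid_velocity(prev_c, curr_c, dt):
    """Horizontal and vertical motion speed (u, v) of the centroid.

    prev_c, curr_c: (x, y) centroids of two consecutive sampled frames;
    dt: time elapsed between the sampled frames, in seconds.
    """
    u = (curr_c[0] - prev_c[0]) / dt
    v = (curr_c[1] - prev_c[1]) / dt
    return u, v
```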
In summary, the extracted features include:
Class 1 features (human body shape features):
a - the centroid of the body region, expressed as: P_wc = {(x_i, y_i) | i ∈ N}
b - the aspect ratio of the body region, expressed as: P_wh = {w_i / h_i | i ∈ N}
c - the moment features of the body region, expressed as: Hu = {(H1_i, H2_i, H3_i, H4_i) | i ∈ N}
Class 2 features (human body key point position features):
d - key points of the body region: the neck, hip and knee associated with the falling action are selected, and their positions relative to the width and height of the human target frame are calculated, expressed as: P_kp = {(Dx_hn,i, Dy_hn,i, Dx_kh,i, Dy_kh,i) | i ∈ N}
Class 3 features (motion features):
e - the motion speed of the centroid of the monitored object, expressed as: V = {(u_i, v_i) | i ∈ N}
In a-e above, N denotes the number of extracted target image frames.
After the three types of features are obtained, in step 203, for each frame of target image, each type of feature is respectively input into a corresponding classifier for fall detection, so as to obtain a fall detection result corresponding to each type of feature.
Then, in step 204, for each frame of target image, the fall detection results of each type of features of the target image are comprehensively analyzed to obtain the final fall detection result of the target image.
In practice, the fall detection result for each type of features yields a conclusion as to whether a fall occurred. When fusing the fall detection results of the multiple types of features, the mode (majority vote) can represent the final result. For example, if the fall detection results for the class 1 and class 3 features are both "fall" and the result for the class 2 features is "no fall", then "fall" is the mode with 2 votes, so the final detection result is a fall. Alternatively, each classifier can output a probability value describing the likelihood of a fall, and the final result is then determined by weighted summation: a weighted sum greater than or equal to a preset value indicates a fall, and one below the preset value indicates no fall. The final fall detection result of each frame of target image is thus obtained frame by frame, and in step 205, if the final fall detection results of multiple consecutive frames of target images all indicate a fall, it is determined that the monitored object has fallen.
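The two fusion strategies just described (mode voting and weighted summation) might look as follows; the 0.5 default stands in for the preset value and is illustrative:

```python
def fuse_by_mode(decisions):
    """Majority vote over per-class boolean fall decisions, e.g. [True, False, True]."""
    return sum(decisions) * 2 > len(decisions)

def fuse_by_weighted_sum(probs, weights, preset=0.5):
    """Weighted sum of per-classifier fall probabilities against a preset value."""
    score = sum(w * p for w, p in zip(weights, probs))
    return score >= preset
```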
Basing the decision on the detection results of multiple consecutive frames of target images further and effectively improves the detection accuracy.
In summary, the embodiments of the application use multiple classes of features that specifically and comprehensively describe the state of the monitored object, and combine the fall detection results of multiple classifiers to comprehensively analyze, for each frame of target image, whether the monitored object has fallen. On that basis, whether the object has fallen is decided by further combining the detection results of multiple consecutive target images, which improves the detection rate of the fall detection result.
In implementation, each class of features corresponds to its own classifier. To balance the classifier's training speed against its recognition accuracy, the embodiments of the application may use an ELM (Extreme Learning Machine) as the classifier.
How to train the ELM is explained below:
The three classes of features are used to build separate training samples. For each class, the features serve as training samples, and the corresponding classifier outputs a classification prediction label. Suppose the fall detection result corresponding to the human body shape features is Y_sp, the result corresponding to the human body key point position features is Y_kp, and the result corresponding to the motion features is Y_mt, each in the range [0, 1]. Decision-level fusion of the three results then gives the final fall detection result:

$$Y = w_s Y_{sp} + w_k Y_{kp} + w_m Y_{mt}$$

where w_s, w_k and w_m are the weights assigned to the respective classes of features, each in the range [0, 1], with w_s + w_k + w_m = 1. The weight values may be determined experimentally, which the application does not limit.
The ELM classifier used in the application is a feed-forward single-hidden-layer neural network: random weights connect the input layer to the hidden layer, the output weights are solved at the final output layer by a method that adds a regularization term, and falling behavior is classified from the input feature vectors.
First, the classifier is trained on the training samples to obtain the output weights of the hidden layer, as follows:
Set the number of ELM hidden layer nodes and the activation function g(x); in the general case the number of hidden nodes equals the number of input samples, and the sigmoid function is selected as the activation function. Randomly initialize the input weights and biases of the hidden layer in the network, with values in the range [-1, 1], and compute the hidden layer output matrix H. Finally, obtain the least-squares optimal solution

$$\beta = H^{+} T$$

which is the output weight β of the hidden layer, where H^+ denotes the generalized inverse of H and T the training labels. This is realized as follows:
// Input: training samples X with fall detection labels T.
// Output: β, the output weights of the hidden nodes.
The training process is executed as a loop:
For i = 1 to L                                  // L hidden nodes
    randomly generate the hidden node parameters (w_i, b_i)   // w_i: input weight, b_i: bias of node i
For j = 1 to N (N = L)                          // N training samples
    x = x_j                                     // the current input is the j-th training sample
    H_ji = g(w_i · x_j + b_i)                   // hidden layer output matrix H
β = H⁺ T                                        // H⁺ is the generalized inverse of matrix H
Return <w, b, β>                                // hidden node weights, biases and output weights
The optimal output weight β is thus obtained by training on the known training samples. The test data are then classified using β, as follows:
// Input: test samples x and <w, b, β>, the hidden node parameters.
// Output: T, the ELM classification result.
For i = 1 to M                                  // M test samples
    x = x_i                                     // the current input is the i-th test sample
    For j = 1 to L
        H_ij = g(w_j · x_i + b_j)               // hidden layer output matrix
    T = H β                                     // β is the output weight obtained in the training phase
Return T
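A compact NumPy sketch of the ELM training and classification procedure above; for brevity it uses the plain generalized inverse and omits the regularization term mentioned earlier:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, L, seed=0):
    """X: (N, d) feature matrix; T: (N,) fall labels in [0, 1]; L: hidden nodes."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))  # random input weights in [-1, 1]
    b = rng.uniform(-1.0, 1.0, size=L)                # random hidden biases in [-1, 1]
    H = sigmoid(X @ W + b)                            # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T                      # beta = H+ T, H+ the generalized inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Classification output T = H beta for test samples X."""
    return sigmoid(X @ W + b) @ beta
```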
The human fall detection provided by the embodiments of the application is described below with reference to experimental results, taking the detection of a fall video as an example. The video is sampled by extracting one frame every 10 frames, 10 frames in total; the sampled pictures are shown in fig. 4, which depicts key frames of the monitored subject falling from a position in bed. The DeepLabV3+ model is then used to segment the human body part; the segmentation effect is shown in fig. 5, from which it can be seen that the human body part in each frame is effectively identified and segmented. Next, the coordinates of 18 skeletal points of the human body are detected with AlphaPose; the detected skeletal point positions are shown in fig. 6, where the white dots on the human body represent the skeletal points. FlowNet 2.0 is then used to obtain the optical flow between consecutive extracted frames, from which the motion features are obtained.
Then, the position coordinates of the neck, hip and knee are extracted from the skeletal point positions to calculate the relative positions of the neck and hip and of the knee and hip. The three classes of features are then input into the trained ELMs for classification, and the per-class results Y_sp, Y_kp and Y_mt are fused at the decision level to obtain the final classification label Y = w_s Y_sp + w_k Y_kp + w_m Y_mt.
To ensure the accuracy of the test result, the monitored object is judged to have fallen only when the results of two consecutive frames are both 1 (i.e., both frames are judged to show a fall). Among the above 10 sampled images, the 8th, 9th and 10th are all judged to be 1; therefore, it is determined that falling behavior occurred.
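The consecutive-frame decision rule used in this experiment might be expressed as follows; the required run length is a parameter (two in the experiment above):

```python
def fall_detected(frame_results, needed=2):
    """True once `needed` consecutive per-frame results equal 1."""
    run = 0
    for r in frame_results:
        run = run + 1 if r else 0  # extend or reset the run of positives
        if run >= needed:
            return True
    return False

# For the sampled video above, frames 8, 9 and 10 are judged 1:
# fall_detected([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])  ->  True
```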
Based on the same conception, the embodiment of the application also provides a human body falling detection device.
Fig. 7 is a schematic diagram of a human fall detection apparatus according to an embodiment of the present application.
As shown in fig. 7, the human fall detection apparatus 700 may include:
an obtaining module 701, configured to obtain multiple frames of target images of the monitored object within a specified time period;
a single-frame image recognition module 702, configured to perform, for each frame of the target image:
respectively acquiring multiple types of features from the target image, wherein the multiple types of features comprise at least one human body posture feature and at least one human body motion feature;
inputting each type of features in the multiple types of features into a corresponding classifier respectively for fall detection to obtain fall detection results corresponding to each type of features;
a fusion module 703, configured to perform comprehensive analysis on the fall detection result of each type of feature to obtain a final fall detection result of the target image;
a result determining module 704, configured to determine that the monitored object falls if the final fall detection results of the consecutive multiple frames of the target images are all determined to fall.
In some embodiments, the obtaining module is configured to:
acquiring a video of the monitored object within the specified time period;
and performing equal-frequency frame extraction processing on the video to obtain the multiple frames of target images of the monitored object.
In some embodiments, the fall detection result output by each classifier is a fall probability, and the fusion module is configured to:
performing weighted summation on the fall detection results of each class of features;
when the weighted summation result is greater than or equal to a preset value, the final fall detection result of the target image is that a fall has occurred;
and when the weighted summation result is smaller than the preset value, the final fall detection result of the target image is that no fall has occurred.
In some embodiments, the human body posture features of the monitored object include: human body key point position features and human body shape features, where the human body shape features of the monitored object include one or a combination of the following: the aspect ratio of the human body outline, human body moment features, and centroid position information of the human body;
and/or,
the motion features include a motion velocity of the centroid of the monitored object.
In some embodiments, for each frame of the target image, the single-frame image identification module is configured to:
extracting the position information of the human body key points of the monitored object from the target image;
and extracting key points of the hip, the neck and the knee from the human body key point position information, and determining the relative position relationship between the hip and the neck and the relative position relationship between the hip and the knee as the human body key point position characteristics.
In some embodiments, for each frame of the target image, the single-frame image identification module is configured to:
carrying out human body target detection on the target image, and segmenting a human body partial image of the monitored object;
determining the position of the mass center of the human body according to the position information of each pixel point in the human body partial image in the target image;
determining the human body moment characteristics of the monitored object according to the mass center position and the central moment identification method of the human body, wherein the human body moment characteristics comprise: translation moment features, rotation moment features, scale moment features and magnitude moment features;
and acquiring a circumscribed rectangle of the human body partial image according to a human body rectangular frame obtained by target detection in the key point detection process, and determining the ratio of the width to the height of the circumscribed rectangle as the aspect ratio of the human body contour.
In some embodiments, the classifier is an extreme learning machine.
For implementation and beneficial effects of the operations of the human fall detection apparatus, reference is made to the description in the foregoing method, and details are not repeated here.
Having described a human fall detection method and apparatus according to an exemplary embodiment of the present application, an electronic device according to another exemplary embodiment of the present application will be described.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, aspects of the present application may take the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit", a "module" or a "system".
In some possible implementations, an electronic device according to the present application may include at least one processor, and at least one memory. Wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the human fall detection method according to various exemplary embodiments of the present application described above in the present specification.
The electronic device 130 according to this embodiment of the present application is described below with reference to fig. 8. The electronic device 130 shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, the various aspects of a human fall detection method provided herein may also be embodied in the form of a program product including program code for causing a computer device to perform the steps of a human fall detection method according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for human fall detection of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of a remote electronic device, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external electronic device (for example, through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, according to embodiments of the application. Conversely, the features and functions of one unit described above may be further divided into embodiments by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications of those embodiments may occur to those skilled in the art once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all alterations and modifications that fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for detecting a human fall, the method comprising:
acquiring multi-frame target images of a monitored object within a specified time period;
respectively executing the following steps for each frame of the target image:
respectively acquiring multiple types of features from the target image, wherein the multiple types of features comprise at least one human body posture feature and at least one human body motion feature;
inputting each type of features among the multiple types of features into a corresponding classifier for fall detection, to obtain a fall detection result corresponding to each type of features;
comprehensively analyzing the fall detection results of each type of features to obtain a final fall detection result of the target image;
and if the final fall detection results of a plurality of consecutive frames of the target images all indicate a fall, determining that the monitored object has fallen.
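For illustration only, a minimal Python sketch of how the claim 1 pipeline could be organized: per-frame feature extraction, one classifier per feature type, weighted fusion, and a consecutive-frame decision. The function name `detect_fall`, the dictionary-based extractor/classifier wiring, and the `consecutive` window size are assumptions, not details taken from the patent.

```python
from typing import Callable, Dict, Sequence

import numpy as np


def detect_fall(frames: Sequence[np.ndarray],
                extractors: Dict[str, Callable[[np.ndarray], np.ndarray]],
                classifiers: Dict[str, Callable[[np.ndarray], float]],
                weights: Dict[str, float],
                preset: float = 0.5,
                consecutive: int = 3) -> bool:
    """Return True once `consecutive` successive frames are classified as falls."""
    streak = 0
    for frame in frames:
        # One classifier per feature type (posture, motion, ...), as in claim 1.
        fused = sum(weights[name] * classifiers[name](extractors[name](frame))
                    for name in extractors)
        streak = streak + 1 if fused >= preset else 0  # weighted fusion (claim 3)
        if streak >= consecutive:  # consecutive-frame rule of claim 1
            return True
    return False
```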
2. The method according to claim 1, wherein the acquiring multi-frame target images of the monitored object within the specified time period comprises:
acquiring a video of the monitored object within the specified time period;
and performing equal-frequency frame extraction processing on the video to obtain the multi-frame target images of the monitored object.
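A minimal sketch of the equal-frequency frame extraction of claim 2, assuming OpenCV is available; the function name `sample_frames` and the fixed `step` parameter are hypothetical choices.

```python
import cv2


def sample_frames(video_path: str, step: int = 5):
    """Keep every `step`-th frame of the video (equal-frequency sampling)."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:               # end of video (or read failure)
            break
        if index % step == 0:    # equal-frequency: constant frame interval
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```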
3. The method according to claim 1, wherein the fall detection result output by each classifier is a fall probability, and the comprehensively analyzing the fall detection results of each type of features to obtain the final fall detection result of the target image comprises:
performing weighted summation on the fall detection results of each type of features;
when the weighted summation result is greater than or equal to a preset value, the final fall detection result of the target image is that a fall occurs;
and when the weighted summation result is smaller than the preset value, the final fall detection result of the target image is that no fall occurs.
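A sketch of the weighted-summation fusion of claim 3; the weights and the preset value below are placeholder numbers, not values disclosed in the patent.

```python
def fuse_fall_scores(probs, weights, preset=0.5):
    """Weighted sum of per-feature-type fall probabilities (claim 3)."""
    score = sum(w * p for w, p in zip(weights, probs))
    return score >= preset  # True: fall occurs; False: no fall


# Example: posture, shape, and motion classifiers output 0.9, 0.6, 0.4.
print(fuse_fall_scores([0.9, 0.6, 0.4], [0.4, 0.3, 0.3]))  # 0.66 >= 0.5 -> True
```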
4. The method according to claim 1, wherein the human body posture features of the monitored object comprise: human body key point position features and human body shape features, wherein the human body shape features of the monitored object comprise one or a combination of the following: the aspect ratio of the human body contour, the human body moment features, and the centroid position information of the human body;
and/or,
the human body motion features comprise a motion velocity of the centroid of the monitored object.
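For the motion feature of claim 4, a sketch of the centroid motion velocity between two sampled frames; `dt` stands for the (assumed) sampling interval implied by the equal-frequency frame extraction, and all names are hypothetical.

```python
import numpy as np


def centroid_velocity(prev_centroid, curr_centroid, dt: float) -> float:
    """Speed of the body centroid (e.g. pixels/second) across one frame step."""
    prev_c = np.asarray(prev_centroid, dtype=float)
    curr_c = np.asarray(curr_centroid, dtype=float)
    return float(np.linalg.norm(curr_c - prev_c) / dt)
```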
5. The method according to claim 4, wherein acquiring the centroid position of the human body of the monitored object, the human body moment features and the aspect ratio of the human body contour for each frame of the target image comprises:
carrying out human body target detection on the target image, and segmenting a human body partial image of the monitored object;
determining the centroid position of the human body according to the position information, in the target image, of each pixel point in the human body partial image;
determining the human body moment features of the monitored object according to the centroid position by using a central moment identification method, wherein the human body moment features comprise: translation moment features, rotation moment features, scale moment features and magnitude moment features;
and acquiring a circumscribed rectangle of the human body partial image, and determining the ratio of the width to the height of the circumscribed rectangle as the aspect ratio of the human body contour.
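A sketch of the per-frame shape features of claim 5, assuming a binary uint8 person mask from an upstream segmenter. OpenCV's Hu moments are used here as one plausible realization of the translation/rotation/scale-invariant central moments; that substitution is an assumption, not the patent's prescription.

```python
import cv2
import numpy as np


def shape_features(mask: np.ndarray):
    """Centroid, invariant moments, and aspect ratio from a binary person mask.

    Assumes `mask` is a non-empty single-channel uint8 image (person = 255).
    """
    m = cv2.moments(mask, binaryImage=True)
    cx = m["m10"] / m["m00"]             # centroid x from pixel positions
    cy = m["m01"] / m["m00"]             # centroid y
    hu = cv2.HuMoments(m).flatten()      # translation/rotation/scale invariants
    x, y, w, h = cv2.boundingRect(mask)  # circumscribed rectangle of the body
    return (cx, cy), hu, w / h           # width-to-height (aspect) ratio
```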
6. The method according to claim 5, wherein acquiring the human body key point position features of the monitored object for each frame of the target image comprises:
extracting the position information of the human body key points of the monitored object from the target image;
and extracting key points of the hip, the neck and the knee from the human body key point position information, and determining the relative position relationship between the hip and the neck and the relative position relationship between the hip and the knee as the human body key point position features.
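A sketch of the key-point features of claim 6; the joint names and the dictionary format are assumptions about the output of whichever pose estimator supplies the key points.

```python
import numpy as np


def keypoint_features(keypoints: dict) -> np.ndarray:
    """Hip-neck and hip-knee relative positions as a 4-dim feature vector."""
    hip, neck, knee = (np.asarray(keypoints[j], dtype=float)
                       for j in ("hip", "neck", "knee"))
    return np.concatenate([hip - neck, hip - knee])
```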
7. The method of any one of claims 1-6, wherein the classifier is an extreme learning machine.
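Claim 7 names an extreme learning machine as the classifier. The following is a minimal two-class ELM sketch, with fixed random input weights and output weights solved in closed form; the hidden size, tanh activation, and probability clipping are illustrative choices, not values from the patent.

```python
import numpy as np


class ExtremeLearningMachine:
    """Minimal ELM: fixed random hidden layer, least-squares output weights."""

    def __init__(self, n_features: int, n_hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_features, n_hidden))  # random input weights
        self.b = rng.normal(size=n_hidden)                # random hidden biases
        self.beta = None                                  # learned output weights

    def _hidden(self, x: np.ndarray) -> np.ndarray:
        return np.tanh(x @ self.w + self.b)               # hidden activations

    def fit(self, x: np.ndarray, y: np.ndarray) -> "ExtremeLearningMachine":
        # Closed-form least-squares solve via the Moore-Penrose pseudoinverse;
        # y holds 0/1 fall labels.
        self.beta = np.linalg.pinv(self._hidden(x)) @ y
        return self

    def predict_proba(self, x: np.ndarray) -> np.ndarray:
        # Raw regression scores, clipped to [0, 1] for use as fall probabilities.
        return np.clip(self._hidden(x) @ self.beta, 0.0, 1.0)
```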
8. A human fall detection device, characterized in that the device comprises:
an acquisition module, configured to acquire multi-frame target images of a monitored object within a specified time period;
a single-frame image recognition module, configured to perform, for each frame of the target image:
respectively acquiring multiple types of features from the target image, wherein the multiple types of features comprise at least one human body posture feature and at least one human body motion feature;
inputting each type of features among the multiple types of features into a corresponding classifier for fall detection, to obtain a fall detection result corresponding to each type of features;
a fusion module, configured to comprehensively analyze the fall detection results of each type of features to obtain the final fall detection result of the target image;
and a result determination module, configured to determine that the monitored object has fallen if the final fall detection results of a plurality of consecutive frames of the target images all indicate a fall.
9. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the human fall detection method of any one of claims 1-7.
10. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, cause the electronic device to perform the human fall detection method of any one of claims 1-7.
CN202010660416.1A 2020-07-10 2020-07-10 Human body tumbling detection method and device, electronic equipment and storage medium Pending CN111914676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010660416.1A CN111914676A (en) 2020-07-10 2020-07-10 Human body tumbling detection method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111914676A (en) 2020-11-10

Family

ID=73228020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010660416.1A Pending CN111914676A (en) 2020-07-10 2020-07-10 Human body tumbling detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111914676A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722715A (en) * 2012-05-21 2012-10-10 华南理工大学 Tumble detection method based on human body posture state judgment
CN105279483A (en) * 2015-09-28 2016-01-27 华中科技大学 Fall-down behavior real-time detection method based on depth image
EP3418991A1 (en) * 2016-01-22 2018-12-26 Suzhou Ling Wei Technology Co., Ltd. Body fall smart control system and method therefor
CN107657244A (en) * 2017-10-13 2018-02-02 河海大学 A kind of human body tumble behavioral value system and its detection method based on multiple-camera
CN108509938A (en) * 2018-04-16 2018-09-07 重庆邮电大学 A kind of fall detection method based on video monitoring
CN110287923A (en) * 2019-06-29 2019-09-27 腾讯科技(深圳)有限公司 Human body attitude acquisition methods, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Shi Jing: "Research on Video Detection Algorithms for Human Fall Behavior Based on Extreme Learning Machines", China Master's Theses Full-text Database, Information Science and Technology Series (Monthly) *
Chen Qingfeng et al.: "A Human Behavior Recognition Method Based on Multi-Feature and Multi-Classifier Fusion", Journal of Henan University of Science and Technology (Natural Science Edition) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766168A (en) * 2021-01-20 2021-05-07 北京华科德科技有限公司 Personnel tumbling detection method and device and electronic equipment
CN112766168B (en) * 2021-01-20 2024-06-28 北京韵洋科技有限公司 Personnel fall detection method and device and electronic equipment
CN112949417A (en) * 2021-02-05 2021-06-11 杭州萤石软件有限公司 Tumble behavior identification method, equipment and system
CN113420634A (en) * 2021-06-18 2021-09-21 深圳大学 Human body tumbling detection method and device, electronic equipment and storage medium
CN113420634B (en) * 2021-06-18 2023-09-29 深圳大学 Human body tumbling detection method and device, electronic equipment and storage medium
WO2023206236A1 (en) * 2022-04-28 2023-11-02 华为技术有限公司 Method for detecting target and related device
CN114842397A (en) * 2022-05-19 2022-08-02 华南农业大学 Real-time old man falling detection method based on anomaly detection

Similar Documents

Publication Publication Date Title
Bi et al. Graph-based spatio-temporal feature learning for neuromorphic vision sensing
CN111222500B (en) Label extraction method and device
CN111914676A (en) Human body tumbling detection method and device, electronic equipment and storage medium
Jiang et al. Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking
Ghadi et al. Syntactic model-based human body 3D reconstruction and event classification via association based features mining and deep learning
CN111598164A (en) Method and device for identifying attribute of target object, electronic equipment and storage medium
Yao et al. A fall detection method based on a joint motion map using double convolutional neural networks
WO2021066796A1 (en) Modeling human behavior in work environments using neural networks
Ma et al. Human motion gesture recognition based on computer vision
Zhou et al. A study on attention-based LSTM for abnormal behavior recognition with variable pooling
Prakash et al. Accurate hand gesture recognition using CNN and RNN approaches
Fei et al. Flow-pose Net: An effective two-stream network for fall detection
Dwivedi et al. Orientation invariant skeleton feature (oisf): a new feature for human activity recognition
Zhu et al. Falling motion detection algorithm based on deep learning
Bhardwaj et al. Machine Learning-Based Crowd Behavior Analysis and Forecasting
Pandey Automated gesture recognition and speech conversion tool for speech impaired
Srividya et al. Deep learning techniques for physical abuse detection
CN117011932A (en) Running behavior detection method, electronic device and storage medium
Chang et al. [Retracted] Visual Sensing Human Motion Detection System for Interactive Music Teaching
Deng et al. Abnormal behavior recognition based on feature fusion C3D network
Latha et al. Human action recognition using deep learning methods (CNN-LSTM) without sensors
Itano et al. Human actions recognition in video scenes from multiple camera viewpoints
Li et al. [Retracted] Human Sports Action and Ideological and Political Evaluation by Lightweight Deep Learning Model
CN114638973A (en) Target image detection method and image detection model training method
Shubber et al. A review on video violence detection approaches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination