CN108090458B - Human body falling detection method and device

Info

Publication number: CN108090458B
Application number: CN201711468689.0A
Authority: CN
Inventors: 谢阳阳
Original assignee: Nanjing Avatarmind Robot Technology Co ltd
Current assignee: Nanjing Avatarmind Robot Technology Co ltd
Priority date: 2017-12-29
Filing date: 2017-12-29
Publication date: 2020-02-14
Anticipated expiration: 2037-12-29
Also published as: CN108090458A; WO2019128304A1

Abstract

The embodiments of the application provide a human body fall detection method and a human body fall detection device. The method comprises the following steps: acquiring a target image; performing human body detection on the target image through a target detection network to determine whether the target image is an image containing a human body; and, when the target image is determined to be an image containing a human body, performing fall recognition on the target image through a convolutional neural network to determine whether the human body in the target image is in a fallen state.

Description

Human body falling detection method and device

Technical Field

The present application relates to the field of human body detection technologies, and in particular, to a method and an apparatus for detecting human body falling.

Background

As society ages, increasing attention is paid to the everyday safety of the elderly. For example, it is desirable to detect promptly an accident such as a fall suffered by an elderly person who is home alone. How to detect effectively and accurately whether an elderly person has fallen, so that help can be provided in time, has therefore become an important problem.

At present, most existing fall detection methods arrange a plurality of cameras in the human activity area in advance to acquire video stream data, and then judge whether a person has fallen by analyzing how the human body changes in the video stream data. Such methods must process and analyze video stream data, so the workload is large and the efficiency is low. In addition, judging a fall by analyzing changes in the human body is a complex process with relatively large error. In summary, the existing methods suffer from poor accuracy, large error, and low efficiency in fall identification.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the application provides a method and a device for detecting human body falling, which are used for solving the technical problems of poor falling identification accuracy, large error and low efficiency in the existing method and achieving the technical effect of accurately and efficiently identifying a falling state.

The embodiment of the application provides a human body falling detection method, which comprises the following steps:

acquiring a target image;

performing human body detection on the target image through a target detection network to determine whether the target image is an image containing a human body;

and under the condition that the target image is determined to be an image containing a human body, carrying out falling identification on the target image through a convolutional neural network so as to determine whether the human body in the target image is in a falling state.

In one embodiment, the acquiring the target image includes:

collecting sound information in a target area;

determining the target direction according to the sound information;

and moving a camera according to the target direction to acquire the target image.

In one embodiment, the target detection network is established as follows:

acquiring human body image sample data, wherein the human body image sample data comprises a plurality of images containing human body states;

marking a human body region in the image of the human body image sample data;

and training by using the labeled human body image sample data to obtain a target detection network based on a target detection algorithm.

In one embodiment, the human body state comprises: a standing state, a sitting state, a lying state, a squatting state, an inclined state, and a prone (face-down) state.

In one embodiment, in a case where it is determined that the target image is an image not including a human body, the method further includes: and re-acquiring the target image.

In one embodiment, the convolutional neural network is built as follows:

extracting an image meeting the requirements from the human body image sample data to serve as preprocessing sample data;

dividing the images in the pre-processing sample data into positive sample data and negative sample data according to the human body state in each image, wherein the images in the positive sample data comprise at least one of the following: an image of a standing human body, an image of a sitting human body, an image of a squatting human body, and an image of an inclined human body; and the images in the negative sample data comprise at least one of the following: an image of a human body lying down and an image of a human body lying prone;

and training by using the positive sample data and the negative sample data to establish the convolutional neural network for identifying the human body state type.

In one embodiment, the satisfactory image includes: an image having a human body area accounting for more than 80%.

The embodiment of the present application further provides a human fall detection device, including:

the acquisition module is used for acquiring a target image;

the human body detection module is used for carrying out human body detection on the target image through a target detection network so as to determine whether the target image is an image containing a human body;

and the falling identification module is used for carrying out falling identification on the target image through a convolutional neural network under the condition that the target image is determined to be an image containing a human body so as to determine whether the human body in the target image is in a falling state.

In one embodiment, the obtaining module comprises:

the sound collector is used for collecting sound information in the target area;

the locator is used for determining the target direction according to the sound information;

the moving device is used for moving the camera according to the target direction, and the camera is used for acquiring the target image.

In one embodiment, the apparatus further comprises an alarm module for giving an alarm and/or sending warning information if the human body in the target image is determined to be in a falling state.

In the embodiment of the application, a single-frame target image is obtained instead of a video stream for analysis and processing, an image containing a human body is firstly identified by using a target detection network based on a target detection algorithm, and then the human body state in the target image is classified and identified by using a convolutional neural network based on a classification algorithm so as to identify the specific state of the human body in the target image, so that the technical problems of poor identification falling accuracy, large error and low efficiency in the existing method are solved, and the technical effect of accurately and efficiently identifying the falling state is achieved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.

Fig. 1 is a schematic processing flow diagram of a human fall detection method provided according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a human body fall detection device provided according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of an electronic device based on a human fall detection method provided by an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a human body fall detection robot to which the human body fall detection method and apparatus provided by the embodiments of the present application are applied in a scene example;

Fig. 5 is a schematic flow chart of human fall detection using a human fall detection robot in one example scenario.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Existing methods mostly collect and analyze video stream data; because the amount of data to be analyzed is large, they occupy considerable resources and are inefficient. In addition, most existing methods detect a fall by analyzing changes in the human body, an approach that is complex, imprecise, and error-prone. In summary, the existing methods suffer from poor fall identification accuracy and low efficiency in practice. Addressing the root cause of these problems, the present application obtains single-frame image data rather than analyzing video stream data, which effectively reduces the amount of data to be processed. Moreover, exploiting the characteristics of image data, it judges whether a person has fallen by analyzing the human body state in the image rather than changes in the human body. This solves the technical problems of poor fall identification accuracy, large error, and low efficiency in the existing methods, and achieves the technical effect of identifying a fall state accurately and efficiently.

Based on the thought, the embodiment of the application provides a human body falling detection method. Specifically, please refer to fig. 1, which is a schematic processing flow diagram of a human fall detection method according to an embodiment of the present application. The human body fall detection method provided by the embodiment of the application can comprise the following steps when being implemented specifically.

S11: acquiring a target image.

In the embodiment, in order to reduce the amount of calculation and reduce the occupation of calculation resources, in the specific implementation, a single-frame target image may be acquired instead of a video stream acquired by the existing method, and then the subsequent specific analysis and processing may be performed. Compared with video stream, when the target image of a single frame is subsequently analyzed and processed, only a single frame image needs to be analyzed, detected and identified, so that the calculation amount can be effectively reduced, the calculation cost can be reduced, and the identification speed can be improved.

In one embodiment, to further reduce the workload of the subsequent human body detection stage and to avoid repeated image acquisition, a valid (effective) image may be acquired preferentially as the target image. A valid image is an image that contains a human body; accordingly, an image that does not contain a human body is an invalid image. This avoids acquiring the target image many times over merely to obtain a usable image containing a human body, and improves processing efficiency.

In one embodiment, in order to efficiently acquire the effective image, the acquiring of the target image may include the following steps:

S11-1: collecting sound information in a target area;

S11-2: determining the target direction according to the sound information;

S11-3: moving a camera according to the target direction to acquire the target image.

In this embodiment, the target direction may be the direction from which a sound originates. A person is likely to be active in that direction, so an image acquired in the target direction is more likely to contain a human body, that is, to be a valid image, than an image acquired in other directions.

In this embodiment, in specific implementation, a microphone array may be used as a sound collector to collect sound information in a target area; and determining the direction of the sound source according to the collected sound information through the locator, and determining the direction of the sound source as the target direction. Of course, the microphone arrays are listed above only for better illustration of the embodiments of the present application. In specific implementation, other suitable sound collectors can be selected and used according to specific situations.

In this embodiment, the camera may be arranged on a moving device, that is, the camera is movable rather than fixed in place in the target area. For example, the camera may be mounted on a moving device composed of a pulley and a motor. The camera can then move flexibly within the target area by means of the moving device, which effectively expands the area over which target images can be acquired. The way the camera is used in the embodiments of the present application therefore differs from the existing methods. In the existing methods, a camera is fixed at a certain position in the target area to collect video stream data; the range a single camera can cover is limited, so cameras must be installed at several positions in the target area to increase the total detection range, which raises implementation cost. In the embodiments of the present application, the camera is arranged on the moving device and is moved in real time, as the situation requires, to acquire target images at different positions in the target area. Target images over a larger range can thus be acquired with one camera or a small number of cameras, which reduces implementation cost. Because the camera can move, the angle and distance between the camera and the human body can also be adjusted to suit the person's situation, yielding a higher-quality target image and more accurate subsequent fall recognition. Of course, the moving device described above is given only to better illustrate the embodiments of the present application. In specific implementations, other movable structures, such as a mobile robot or a remote-controlled car, may be selected as the moving device according to the specific situation and precision requirements, so that the camera position can be moved flexibly. The present application is not limited thereto.

In this embodiment, the sound information in the target area may be collected by the microphone array; the locator determines the direction from which the sound originates and takes it as the direction in which a person is likely to be active, namely the target direction; the moving device then moves the camera toward the sound source according to the determined target direction, so that a valid image of relatively high quality can be obtained with an ordinary camera.
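The present application does not specify how the locator derives a direction from the microphone signals. As a purely illustrative sketch, the following assumes a two-microphone, far-field time-difference-of-arrival model; the function names, the microphone spacing parameter, and the speed-of-sound constant are hypothetical and are not taken from the application.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 degrees C


def direction_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Return the source bearing in degrees relative to the array broadside.

    delay_s is the arrival-time difference between the two microphones;
    the sign of the result follows the sign of delay_s.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")
    # Path-length difference divided by microphone spacing gives sin(angle).
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp to [-1, 1] to absorb small measurement errors.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


def pan_command(current_pan_deg: float, target_deg: float) -> float:
    """Return the signed rotation in degrees that turns the camera to the target."""
    # Wrap the difference into [-180, 180) so the camera takes the shorter way.
    return (target_deg - current_pan_deg + 180.0) % 360.0 - 180.0
```

With such a model, a zero delay corresponds to a source directly in front of the array, and the resulting bearing can be handed to whatever moving device carries the camera.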

S12: performing human body detection on the target image through a target detection network to determine whether the target image is an image containing a human body.

In this embodiment, after the target image is acquired, human body detection is performed on it to determine whether it is an image containing a human body, that is, a valid image, so that only valid images proceed to fall recognition. An image that does not contain a human body is treated as an invalid image and is not passed on to fall recognition. Images without a human body are thus excluded in advance, which avoids meaningless fall recognition on them, reduces the data processing load of fall recognition, and further improves processing speed.

In one embodiment, in implementation, the target image may be obtained again when it is determined that the target image is an image that does not include a human body, so as to perform real-time monitoring on an area in the target area where human activities may exist.
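The re-acquisition behaviour can be sketched as a simple loop. This is only an illustration: the `acquire` and `contains_human` callables stand in for the camera and the target detection network, and the `max_attempts` bound is a hypothetical safeguard that the application does not mention.

```python
from typing import Callable, Optional, TypeVar

Image = TypeVar("Image")


def acquire_valid_image(
    acquire: Callable[[], Image],
    contains_human: Callable[[Image], bool],
    max_attempts: int = 10,
) -> Optional[Image]:
    """Re-acquire until the detector reports a human, or give up."""
    for _ in range(max_attempts):
        image = acquire()
        if contains_human(image):
            return image  # valid image: hand over to fall recognition
        # invalid image: discard it and acquire again
    return None  # no image containing a human was obtained
```

Returning `None` after a bounded number of attempts lets the caller decide whether to wait, move the camera, or keep monitoring.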

In one embodiment, because the data to be analyzed is a single-frame image, the characteristics of image analysis can be exploited to determine quickly and accurately whether the target image contains a human body. In specific implementation, human body detection may be performed on the acquired target image through a target detection network based on a target detection algorithm, so as to determine whether the target image is an image containing a human body.

In one embodiment, before step S12 is executed, the above target detection network for human body detection may be established in advance by:

S1: collecting human body image sample data, wherein the human body image sample data comprises human body images in different states;

S2: marking a human body region in the human body image sample data;

S3: training with the labeled human body image sample data to obtain a target detection network based on a target detection algorithm.

In this embodiment, the target detection algorithm may be a deep-learning-based detection algorithm, namely the SSD (Single Shot MultiBox Detector) algorithm. The core of the algorithm is to apply convolution kernels to feature maps to predict the category scores and offsets of a series of default bounding boxes, so that whether a target image to be detected is a valid image containing a human body can be detected quickly and accurately.
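To make the notion of offsets relative to default bounding boxes concrete, the following sketch decodes one predicted offset into a box, using the centre-size encoding commonly found in SSD implementations. The variance values (0.1, 0.2) and all function names are assumptions for illustration; the application does not state which encoding its network uses.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def decode_box(default_box: Box, offsets: Box,
               variances: Tuple[float, float] = (0.1, 0.2)) -> Box:
    """Apply predicted offsets to a default box (centre-size encoding)."""
    d_cx, d_cy, d_w, d_h = default_box
    t_cx, t_cy, t_w, t_h = offsets
    # Centre offsets are scaled by the default box size.
    cx = d_cx + t_cx * variances[0] * d_w
    cy = d_cy + t_cy * variances[0] * d_h
    # Size offsets are predicted in log space.
    w = d_w * math.exp(t_w * variances[1])
    h = d_h * math.exp(t_h * variances[1])
    return (cx, cy, w, h)


def to_corners(box: Box) -> Box:
    """Convert (cx, cy, w, h) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

A zero offset leaves the default box unchanged; the category score predicted for the same default box then indicates whether that box contains a human body.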

In this embodiment, to support the subsequent fall recognition, the human body image sample data is required to include a plurality of images showing the human body in different states.

In one embodiment, in order to fully cover a variety of human body states, the human body state may specifically include: a standing state, a sitting state, a lying state, a squatting state, an inclined state, a prone (face-down) state, and the like. In specific implementation, images containing these different human body states can therefore be learned through the target detection algorithm, so that images containing different human body states can all be detected and identified.

In this embodiment, in a specific implementation, the SSD object detection network may be used to identify a human body region in an image of human body image sample data, so that training related to human body region feature recognition may be performed subsequently.

In one embodiment, before training with the labeled human body image sample data, an SSD target detection network, which serves as the initial model for target detection, may be constructed. In specific implementation, the SSD target detection network may be built on the TensorFlow framework, with Inception_v2 used as the feature extractor.

In an embodiment, the training by using the labeled human body image sample data to obtain the target detection network based on the target detection algorithm may include the following steps: training the SSD target detection network, namely an initial model of target detection, by using the labeled human body image sample data as input data to obtain a trained target detection network; and then according to the human body image sample data and the precision requirement, adjusting and optimizing the trained target detection network to obtain an SSD network for human body detection, namely the target detection network based on the target detection algorithm.

S13: when the target image is determined to be an image containing a human body, performing fall recognition on the target image through a convolutional neural network to determine whether the human body in the target image is in a fallen state.

In one embodiment, to identify the human body state quickly and accurately from an image containing a human body, for example to distinguish whether the person has fallen or not, a convolutional neural network may be used to perform fall recognition on the target image and so determine whether the human body in the target image is in a fallen state.

In this embodiment, drawing on image classification with a convolutional neural network (CNN), a trained convolutional neural network may be used as a fall recognition model: the target image determined to contain a human body is taken as input data, and the fall recognition model identifies whether the human body in the target image is in a fallen state, so that a fall can be determined from a single frame of image.
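The two-stage flow of steps S12 and S13 can be sketched as follows. The `contains_human` and `is_fallen` callables are hypothetical stand-ins for the trained target detection network and the trained convolutional neural network; the `FallResult` names are likewise illustrative only.

```python
from enum import Enum
from typing import Callable, TypeVar

Image = TypeVar("Image")


class FallResult(Enum):
    NO_HUMAN = "no_human"      # invalid image: fall recognition skipped
    NOT_FALLEN = "not_fallen"  # human present, not in a fallen state
    FALLEN = "fallen"          # human present, in a fallen state


def detect_fall(
    image: Image,
    contains_human: Callable[[Image], bool],
    is_fallen: Callable[[Image], bool],
) -> FallResult:
    """Run human detection first; classify the state only for valid images."""
    if not contains_human(image):
        return FallResult.NO_HUMAN
    return FallResult.FALLEN if is_fallen(image) else FallResult.NOT_FALLEN
```

Because the classifier is called only after the detector reports a human body, images without a person never reach the fall recognition stage.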

In one embodiment, in implementation, before performing S13, a convolutional neural network with high fall recognition accuracy and high fall recognition speed may be established in advance by:

S1: acquiring human body image sample data, wherein the human body image sample data comprises human body images in different states;

S2: extracting images meeting the requirements from the human body image sample data to serve as pre-processing sample data;

S3: dividing the images in the pre-processing sample data into positive sample data and negative sample data according to the human body state in each image, wherein the images in the positive sample data comprise at least one of the following: an image of a standing human body, an image of a sitting human body, an image of a squatting human body, and an image of an inclined human body; and the images in the negative sample data comprise at least one of the following: an image of a human body lying down and an image of a human body lying prone;

S4: training with the positive sample data and the negative sample data to establish the convolutional neural network for identifying the human body state type.

In one embodiment, the human body image sample data collected to establish an accurate target detection network based on a target detection algorithm already includes a plurality of images containing human body states. Therefore, in this embodiment, images that meet the requirements may be extracted from that human body image sample data as pre-processing sample data.

In this embodiment, after the pre-processing sample data is obtained, its images need to be classified into two states: fallen and not fallen. Specifically, the images in the pre-processing sample data that represent a non-fall, including images of a standing human body, a sitting human body, a squatting human body, an inclined human body, and the like, are assigned to the positive sample data, that is, the positive image data set. The images that represent a fall, including images of a human body lying down, a human body lying prone, and the like, are assigned to the negative sample data, that is, the negative image data set. Targeted training can then be carried out with these two sets of sample data to recognize the fallen and non-fallen states, so as to establish a convolutional neural network with higher recognition accuracy.
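The division into positive and negative sample data can be sketched as below. The state label strings and the `(path, state)` sample format are hypothetical; the application only specifies which human body states fall into which set.

```python
from typing import Iterable, List, Tuple

# Hypothetical label strings for the human body states named in the text.
NON_FALL_STATES = {"standing", "sitting", "squatting", "inclined"}
FALL_STATES = {"lying", "prone"}


def split_samples(
    samples: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Split (image_path, state) pairs into positive and negative sets."""
    positive: List[str] = []  # non-fall images
    negative: List[str] = []  # fall images
    for path, state in samples:
        if state in NON_FALL_STATES:
            positive.append(path)
        elif state in FALL_STATES:
            negative.append(path)
        else:
            raise ValueError(f"unknown human body state: {state!r}")
    return positive, negative
```

Rejecting unknown labels, rather than silently dropping them, is a design choice of this sketch that makes labeling mistakes visible before training.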

In an embodiment, the training is performed by using the positive sample data and the negative sample data to establish the convolutional neural network for identifying the human body state type, and the specific implementation may include the following contents: constructing an initial convolutional neural network; and performing identification training on the falling state and the non-falling state of the human body on the initial convolutional neural network by using the positive sample data and the negative sample data as input data, so as to obtain the convolutional neural network with higher identification precision and higher identification speed. And then whether the human body state in the target image corresponds to the falling state of the human body can be accurately identified by utilizing the convolutional neural network. If the human body state in the identified target image corresponds to the falling state of the human body, the human body can be judged to be in the falling state; if the human body state in the identified target image corresponds to the non-falling state of the human body, it can be determined that the human body is not in the falling state.

In an embodiment, in the process of establishing the convolutional neural network, when implemented specifically, the following may be further included:

S1: acquiring image sample data that does not contain a human body;

S2: performing false-detection training on the convolutional neural network using the image sample data that does not contain a human body.

In this embodiment, through the false-detection training, target images that do not contain a human body can be identified and filtered out, which improves the processing efficiency of the convolutional neural network during fall detection.

In the embodiment of the application, compared with the prior art, the single-frame target image is obtained instead of the video stream for analysis and processing, the target detection network based on the target detection algorithm is used for firstly identifying the image containing the human body, and then the convolutional neural network based on the classification algorithm is used for classifying and identifying the human body state in the target image so as to identify the specific state of the human body in the target image, so that the technical problems of poor tumble identification accuracy, large error and low efficiency in the prior art are solved, and the technical effect of accurately and efficiently identifying the tumble state is achieved.

In one embodiment, in order to extract pre-processing sample data suitable for fall recognition training from human body image sample data, the satisfactory image may specifically include: an image having a human body area accounting for more than 80%. Therefore, the sample data suitable for the fall recognition training can be extracted from the human body image sample data, the sample data for fall recognition is prevented from being collected again, the training cost is reduced, and the learning efficiency is improved.
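One way to apply the 80% criterion is sketched below. It assumes that the human body area is measured by the labeled bounding box and compared with the area of the whole image; the application does not state how the proportion is computed, and the function names are hypothetical.

```python
from typing import Tuple

PixelBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def body_area_ratio(box: PixelBox, image_width: int, image_height: int) -> float:
    """Return the share of the image area covered by the labeled body box."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, x_max, y_max = box
    # Clip the box to the image so out-of-frame parts are not counted.
    x_min, x_max = max(0.0, x_min), min(float(image_width), x_max)
    y_min, y_max = max(0.0, y_min), min(float(image_height), y_max)
    box_area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    return box_area / (image_width * image_height)


def meets_requirement(box: PixelBox, image_width: int, image_height: int,
                      min_ratio: float = 0.8) -> bool:
    """True if the human body area accounts for more than min_ratio."""
    return body_area_ratio(box, image_width, image_height) > min_ratio
```

Images that pass this check can be reused for fall recognition training without collecting a separate data set.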

In one embodiment, the initial convolutional neural network may be an Inception_v3 network. The Inception_v3 network is a convolutional neural network suitable for image recognition. Of course, it should be noted that this convolutional neural network is listed only to better illustrate the embodiments of the present application. In specific implementation, other suitable convolutional neural networks may be selected according to the specific situation and the specific features to be identified. The present application is not limited thereto.

In one embodiment, before training with the positive sample data and the negative sample data to establish the convolutional neural network for identifying the human body state type, the method further includes preprocessing the images in the positive and negative sample data according to the initial convolutional neural network, so that the images match that network. For example, when the initial convolutional neural network is an Inception_v3 network, the preprocessing may include resizing the images in the positive sample data and the negative sample data to a specified size, for example 299 × 299 pixels.
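The resizing step can be illustrated with a dependency-free nearest-neighbour sketch operating on an image stored as nested lists of pixels. The interpolation method and the function name are assumptions; a real pipeline would more likely use an image library, and the application does not say which interpolation is used.

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def resize_nearest(image: Sequence[Sequence[Pixel]],
                   out_h: int = 299, out_w: int = 299) -> List[List[Pixel]]:
    """Resize a row-major image to out_h x out_w by nearest-neighbour sampling."""
    in_h = len(image)
    if in_h == 0 or len(image[0]) == 0:
        raise ValueError("image must not be empty")
    in_w = len(image[0])
    # Map each output coordinate back to the source pixel it falls into.
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
```

Every positive and negative sample would pass through the same transformation so that its dimensions match the input expected by the initial network.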

In one embodiment, it is further considered that in the fall recognition using a convolutional neural network, only two types, namely a falling state of a human body and a non-falling state of the human body, actually need to be distinguished. Therefore, according to the complexity of the classification and identification of the convolutional neural network, in order to improve the processing efficiency and reduce the occupation and waste of computing resources, the convolutional neural network can be simplified and improved firstly when the initial convolutional neural network is established. Wherein, the above simplification improvement may specifically include: reducing the number of layers of the convolutional neural network, and/or reducing the number of convolutional kernels of the convolutional neural network. The convolutional neural network can be simplified and improved by independently reducing the number of layers of the convolutional neural network, or independently reducing the number of convolutional kernels of the convolutional neural network, or simultaneously reducing the number of layers of the convolutional neural network and the number of convolutional kernels of the convolutional neural network, so that the occupation of computing resources is reduced while the recognition precision is considered, and the processing efficiency is improved.

In an embodiment, when the convolutional neural network is an Inception_v3 network, the simplification of the Inception_v3 network may specifically include: reducing the number of layers (Inception structures) of the Inception_v3 network from 11 to 6 or 5, and/or reducing the number of convolution kernels in the Inception_v3 network, so as to obtain a simplified convolutional neural network.

In one embodiment, the simplified convolutional neural network may be implemented in the following manner:

S1: Simplify the existing Inception_v3 network.

In this embodiment, specifically, the last 5 Inception structures of the Inception_v3 network may be deleted to obtain a simplified Inception_v3 network.

S2: Train the simplified Inception_v3 network with the preprocessed sample data to obtain a parameter model Fa1 for fall detection.

S3: Reduce the number of convolution kernels in each convolutional layer of the simplified Inception_v3 network, layer by layer, to two-thirds of the original number, and modify the parameter model Fa1 to fit the network with the reduced number of convolution kernels.

S4: Train the network with the modified parameter model Fa1 on the preprocessed sample data, fine-tuning the modified Fa1 to obtain a parameter model Fa2 for fall detection.

S5: Verify the parameter model Fa2 and, according to the verification result, adjust the parameter model Fa2 through the training and fine-tuning operations of S4 to obtain the simplified convolutional neural network.

In this embodiment, the verifying specifically may include: comparing the accuracy of the network fall detection after the reduction of the convolution kernel with the accuracy of the network fall detection before the reduction, if the accuracy of the fall detection is not obviously reduced, continuing to reduce the convolution kernel, and performing corresponding training and fine adjustment operations to obtain a more simplified convolution neural network; if the accuracy rate of fall detection is obviously reduced, the training and fine tuning operations can be stopped, and the last network and parameter model are determined to be used for fall detection, namely, the last network and parameter model are used as a convolutional neural network for fall detection.
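The shrink, retrain and verify loop described in S3 to S5 can be sketched as follows. This is only an illustration under stated assumptions: `train_and_evaluate` is a hypothetical callback that retrains a network with the given per-layer kernel counts and returns its fall detection accuracy, and `max_accuracy_drop` is an assumed threshold for what counts as an obvious reduction. The sketch also assumes that "the last network" means the network from before the accuracy dropped.

```python
from typing import Callable, List, Tuple


def shrink_kernels(counts: List[int]) -> List[int]:
    """Reduce every layer's kernel count to two-thirds (at least 1)."""
    return [max(1, (2 * c) // 3) for c in counts]


def simplify_network(
    kernel_counts: List[int],
    train_and_evaluate: Callable[[List[int]], float],  # hypothetical callback
    max_accuracy_drop: float = 0.01,                   # assumed threshold
    max_rounds: int = 10,
) -> Tuple[List[int], float]:
    """Keep shrinking while accuracy does not drop noticeably."""
    best_counts = list(kernel_counts)
    best_accuracy = train_and_evaluate(best_counts)
    for _ in range(max_rounds):
        candidate = shrink_kernels(best_counts)
        if candidate == best_counts:
            break  # nothing left to reduce
        accuracy = train_and_evaluate(candidate)
        if best_accuracy - accuracy > max_accuracy_drop:
            break  # obvious drop: keep the previous network
        best_counts, best_accuracy = candidate, accuracy
    return best_counts, best_accuracy
```

Comparing each candidate with the network from the previous round follows the "before the reduction" comparison in the text. Comparing against the original, unreduced network instead would be a stricter alternative.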

In one embodiment, after determining that the human body in the target image is in a falling state, it may be determined that the human body in the target area falls, and an alarm may be issued to prompt that the human body in the target area falls. The alarm sending specifically may include sending an alarm sound through a buzzer to remind a person of falling down; and alarm information (for example, alarm short messages) can be sent to the responsible persons or the peripheral medical personnel in the target area through the communication equipment to request timely treatment and the like. Of course, the above-listed various ways of generating an alarm are merely provided for better illustration of the embodiments of the present application. In specific implementation, other suitable alarm issuing modes can be selected according to specific situations to give an alarm. The present application is not limited thereto.

From the above description, it can be seen that the human body falling detection method provided in the embodiment of the present application acquires a single-frame target image for analysis and processing instead of a video stream, identifies images containing a human body by using a target detection network based on a target detection algorithm, and then classifies the human body state in the target image by using a convolutional neural network based on a classification algorithm to identify the specific state of the human body. This solves the technical problems of poor accuracy, large error and low efficiency of fall identification in the existing method, and achieves the technical effect of identifying the falling state accurately and efficiently. The target position is determined by collecting sound information, and the camera is moved according to the target position to collect an effective target image, which effectively enlarges the detection range of fall detection, improves the accuracy of obtaining an effective target image, and improves the detection effect and the user experience. Images containing various human body states are obtained as sample data to establish the target detection network and the convolutional neural network, which improves the accuracy of identifying a human body fall from a single-frame image. The convolutional neural network is also simplified according to the complexity of the state types to be identified, which improves implementation efficiency and reduces the occupation of computing resources.

Based on the same inventive concept, embodiments of the present invention further provide a human body fall detection apparatus, as described in the following embodiments. Because the principle of solving the problems of the human body falling detection device is similar to that of the human body falling detection method, the implementation of the device can refer to the implementation of the human body falling detection method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated. Please refer to fig. 2, which is a schematic structural diagram of a human body fall detection apparatus provided in an embodiment of the present application, and the apparatus may specifically include: an acquisition module 21, a human body detection module 22, and a fall recognition module 23, and the structure thereof will be described in detail below.

The obtaining module 21 may be specifically configured to obtain a target image;

the human body detection module 22 may be specifically configured to perform human body detection on the target image through a target detection network, so as to determine whether the target image is an image including a human body;

the fall identification module 23 may be specifically configured to, when it is determined that the target image is an image including a human body, perform fall identification on the target image through a convolutional neural network to determine whether the human body in the target image is in a fall state.
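The cooperation of the three modules can be sketched as a two-stage function. The callbacks `contains_human` and `is_fallen` are hypothetical stand-ins for the target detection network and the convolutional neural network; their names and signatures are assumptions made for illustration.

```python
from typing import Any, Callable, Optional


def detect_fall(
    image: Any,
    contains_human: Callable[[Any], bool],  # stand-in for the detection network
    is_fallen: Callable[[Any], bool],       # stand-in for the fall classifier
) -> Optional[bool]:
    """Return None if no human is found, otherwise the fall decision."""
    if not contains_human(image):
        # Fall recognition is only run on images that contain a human body.
        return None
    return bool(is_fallen(image))
```

Returning `None` for images without a human body keeps "no person" distinct from "person present but not fallen", so the caller can decide whether to acquire a new image.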

In the present embodiment, it should be noted that the human body fall detection device may be specifically a human body fall detection robot capable of detecting a fall of a human body. The human body falling detection robot can be particularly applied to various places such as families, hospitals and markets to detect the places in real time, and timely find that people in the places fall, so that the robot can give an alarm in time and timely carry out related rescue.

In one embodiment, in order to expand the detection range and efficiently acquire the effective target image, the acquiring module 21 may specifically include the following structural units:

the sound collector is specifically used for collecting sound information in the target area;

the locator is specifically used for determining the target direction according to the sound information;

the moving device may be specifically configured to move the camera according to the target position, and the camera may be specifically configured to acquire a target image.

In this embodiment, the moving device may specifically include a pulley and a motor. In a specific implementation, the moving device with the pulley and the motor can drive the camera toward the target direction, so that an effective target image can be acquired more reliably. Of course, the moving device described above is only intended to better illustrate the embodiments of the present application. In a specific implementation, the moving device may be another type of movable equipment, such as a movable robot or a remote-controlled car. The present application is not limited thereto.

In this embodiment, the effective target image may be an image including a human body. With the moving device, the camera can be moved according to the target position so that an effective target image is obtained wherever possible, which reduces the workload of the human body detection module 22 and improves working efficiency.
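How a locator might turn sound information into a target direction is not detailed in the text. As a hedged illustration only, the sketch below assumes two microphones a known distance apart and a far-field source, and estimates the bearing from the arrival-time difference using the standard relation sin(theta) = c * dt / d. The function names, the two-microphone setup, and the speed-of-sound constant are all assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air (assumed)


def bearing_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Bearing in degrees from broadside for a far-field sound source."""
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))


def rotation_needed(camera_heading_deg: float, target_bearing_deg: float) -> float:
    """Signed shortest rotation in degrees, in the range [-180, 180)."""
    return (target_bearing_deg - camera_heading_deg + 180.0) % 360.0 - 180.0
```

The second helper shows how the moving device could choose the shorter turning direction once a target bearing is known.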

In one embodiment, in order to alarm in time after detecting that the human body falls so as to provide timely treatment for the fallen person, the device may further include an alarm module for giving an alarm.

In one embodiment, the alarm module may specifically include a buzzer. In a specific implementation, the alarm module may sound an alarm through the buzzer when it is determined that the human body in the target image is in a fallen state.

In one embodiment, the alarm module may further include a communication device such as a signal transmitter. In a specific implementation, when it is determined that the human body in the target image is in a fallen state, the communication device may transmit alarm information to a relevant responsible person (for example, a guardian or a department store security guard) or to nearby medical personnel, to notify them that a person has fallen so that the person can be treated as soon as possible.
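The alarm module can be pictured as a dispatcher over several alarm channels. In this minimal sketch each channel (for example a buzzer driver or a short-message sender) is a hypothetical callable that accepts a message; neither the callables nor the message text come from the source.

```python
from typing import Callable, Iterable


def raise_alarm(
    fallen: bool,
    channels: Iterable[Callable[[str], None]],  # hypothetical alarm channels
    message: str = "A person has fallen in the target area.",
) -> int:
    """Send the message on every channel if a fall was detected.

    Returns the number of channels that were notified.
    """
    if not fallen:
        return 0
    notified = 0
    for send in channels:
        send(message)
        notified += 1
    return notified
```

Keeping the channels as a list makes it straightforward to add other alarm modes, which the text explicitly leaves open.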

In an embodiment, the apparatus may further include a target detection network establishing module, which may be implemented according to the following procedure: acquiring human body image sample data, where the human body image sample data comprises a plurality of images containing human body states; labeling the human body region in each image of the human body image sample data; and training with the labeled human body image sample data to obtain a target detection network based on a target detection algorithm.

In one embodiment, the human body state may specifically include: a standing state, a sitting state, a lying state, a squatting state, an inclined state, a lying-prone state, and the like. Of course, the above-mentioned human body states are only intended to better explain the embodiments of the present application. In a specific implementation, states other than those listed above can be introduced as human body states according to specific situations and requirements. The present application is not limited thereto.

In an embodiment, the human body detection module 22 is connected to the acquisition module 21, and in a specific implementation, the human body detection module 22 may send information to the acquisition module 21 when determining that the target image is an image that does not include a human body, and acquire the target image again through the acquisition module 21.

In an embodiment, the apparatus may further include a convolutional neural network establishing module, configured to establish a convolutional neural network for identifying the human body state type, where the convolutional neural network establishing module may specifically include:

the acquisition unit may be specifically configured to acquire human body image sample data, where the human body image sample data comprises a plurality of images containing human body states;

the extraction unit is specifically used for extracting an image meeting the requirements from the human body image sample data to serve as preprocessing sample data;

the dividing unit may be specifically configured to divide the images in the pre-processing sample data into positive sample data and negative sample data according to the human body state in each image, where the images in the positive sample data include at least one of the following: an image including a standing state of a human body, an image including a sitting state of the human body, an image including a squatting state of the human body, and an image including an inclined state of the human body; and the images in the negative sample data include at least one of the following: an image including a lying state of the human body, and an image including a lying-prone state of the human body;

and the establishing unit is specifically used for training by using the positive sample data and the negative sample data to establish a convolutional neural network for identifying the human body state type.

In one embodiment, the convolutional neural network establishing module may further include:

the false detection training unit can be specifically used for acquiring image sample data which does not contain a human body; and carrying out error detection training on the convolutional neural network by using the image sample data which does not contain the human body.

In this embodiment, in order to establish and train a convolutional neural network with higher accuracy, the satisfactory image may specifically include: images with a human body area accounting for more than 80%, and the like.
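The extraction and dividing units can be sketched together: samples are first filtered by the share of the picture occupied by the human body, then split by state. The `Sample` record, its field names, and the state label strings are illustrative assumptions; only the 80% threshold and the grouping of states come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Labels are hypothetical; the grouping follows the text
# (positive = not fallen, negative = fallen).
NON_FALL_STATES = {"standing", "sitting", "squatting", "leaning"}
FALL_STATES = {"lying", "lying_prone"}
MIN_BODY_AREA_RATIO = 0.8  # human body must occupy more than 80% of the image


@dataclass
class Sample:
    path: str
    state: str
    body_area_ratio: float


def split_samples(samples: Iterable[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Return (positive, negative) samples that meet the area requirement."""
    positive: List[Sample] = []
    negative: List[Sample] = []
    for sample in samples:
        if sample.body_area_ratio <= MIN_BODY_AREA_RATIO:
            continue  # does not meet the extraction requirement
        if sample.state in NON_FALL_STATES:
            positive.append(sample)
        elif sample.state in FALL_STATES:
            negative.append(sample)
    return positive, negative
```

Samples with an unrecognized state are silently dropped here; a real pipeline might log them for manual review instead.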

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

It should be noted that, the systems, devices, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, in the present specification, the above devices are described as being divided into various units by functions, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.

Moreover, in the subject specification, adjectives such as first and second may only be used to distinguish one element or action from another element or action without necessarily requiring or implying any actual such relationship or order. References to an element or component or step (etc.) should not be construed as limited to only one of the element, component, or step, but rather to one or more of the element, component, or step, etc., where the context permits.

From the above description, it can be seen that the human body falling detection device provided in the embodiment of the present application performs analysis processing by acquiring a single frame of target image instead of a video stream, and identifies an image including a human body by using a target detection network based on a target detection algorithm, and then classifies human body states in the target image by using a convolutional neural network based on a classification algorithm to identify specific states of the human body in the target image, thereby solving the technical problems of poor falling identification accuracy and low efficiency in the existing method, and achieving the technical effect of accurately and efficiently identifying a falling state; and the target position is determined by collecting the sound information, and the effective target image is collected by moving the camera according to the target position, so that the detection range of fall detection is effectively enlarged, the accuracy of obtaining the effective target image is improved, and the detection effect is improved.

The embodiment of the present application further provides an electronic device, which may specifically refer to a schematic structural diagram of an electronic device based on the method for detecting a human fall shown in fig. 3, where the electronic device may specifically include an input device 31, a processor 32, and a memory 33. The input device 31 may be specifically configured to receive the acquired target image. The processor 32 may be specifically configured to perform human body detection on the target image through a target detection network, so as to determine whether the target image is an image containing a human body; and under the condition that the target image is determined to be an image containing a human body, carrying out falling identification on the target image through a convolutional neural network so as to determine whether the human body in the target image is in a falling state. The memory 33 may be specifically configured to store the target image, the target detection network, the convolutional neural network, and intermediate data generated in the detection process.

In this embodiment, the input device may be one of the main apparatuses for information exchange between a user and a computer system. The input device may include a keyboard, a mouse, a camera, a scanner, a light pen, a handwriting input board, a voice input device, etc.; the input device is used to input raw data and a program for processing the data into the computer. The input device can also acquire and receive data transmitted by other modules, units and devices. The processor may be implemented in any suitable way. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The memory may in particular be a memory device used in modern information technology for storing information. The memory may include multiple levels, and in a digital system, the memory may be any memory as long as it can store binary data; in an integrated circuit, a circuit without a physical form and with a storage function is also called a memory, such as a RAM, a FIFO and the like; in the system, the storage device in physical form is also called a memory, such as a memory bank, a TF card and the like.

In this embodiment, the functions and effects specifically realized by the electronic device can be explained by comparing with other embodiments, and are not described herein again.

In an embodiment of the present application, there is also provided a computer storage medium based on a human body fall detection method, where the computer storage medium stores computer program instructions that, when executed, implement: acquiring a target image; detecting a human body of the target image through a target detection network to determine whether the target image is an image containing the human body; and under the condition that the target image is determined to be an image containing a human body, carrying out falling identification on the target image through a convolutional neural network so as to determine whether the human body in the target image is in a falling state.

In the present embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard disk (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.

In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be explained by comparing with other embodiments, and are not described herein again.

In a specific implementation scenario example, the human body fall detection method and device provided by the application are applied to design a corresponding human body fall detection robot, and the human body fall detection robot is applied to carry out specific human body fall detection. The following can be referred to as a specific implementation process.

In this embodiment, the human body fall detection robot specifically refers to a schematic structural diagram of a human body fall detection robot that is shown in fig. 4 and is designed by applying the human body fall detection method and apparatus provided by the embodiments of the present application in one scene example. The robot can specifically use a sound source positioning module to position the approximate direction of a human body (namely, a target direction), and then use a camera to acquire data (namely, a target image) to realize human body falling detection based on a single frame image through a deep learning algorithm. The fall detection robot includes a plurality of functional modules, such as a movable robot body 12, a camera module 13, an alarm module 14 (optional), a sound source positioning module 15 (optional), a human body detection module 16, and a fall identification module 17.

In a specific implementation, the sound source positioning module 15 may be specifically configured to determine the approximate direction of a human body, after which the camera module 13 captures a single-frame image. The human body detection module 16 and the fall recognition module 17 may be specifically configured to determine from the captured image whether the human body has fallen and to transmit the result to the movable robot body 12. If the human body has fallen, the movable robot body 12 can raise an alarm by controlling the alarm module 14.

The movable robot body 12 at least comprises structures such as a robot main body, a motor and pulleys. The camera module 13 may be specifically configured to acquire a single image and send it to the human body detection module 16 to determine whether the image includes a human body. The alarm module 14 may at least include a mobile phone communication function and a 110 alarm function. In a specific implementation, the mobile phone communication function can be used to send the fall information and the picture, and the 110 alarm function is used to place a 110 alarm call so that help arrives in time. The sound source positioning module 15 may specifically determine the direction of the sound source through a microphone array, which makes it easier to find the person. The human body detection module 16 may specifically implement human body detection through the SSD target detection algorithm in deep learning. The fall recognition module 17 implements fall state recognition through a convolutional neural network in deep learning.

In the present embodiment, the human body fall detection robot may be regarded as a specific human body fall detection device, and the principle of implementation thereof is the same as that of the human body fall detection device.

In specific implementation, referring to the schematic flow chart shown in fig. 5 for performing human body fall detection by using the human body fall detection robot in one scene example, the human body fall detection robot is used for performing human body fall detection. In specific implementation, the method can comprise the following steps:

S1: Optionally, find the general direction of the person by combining the movable robot with the sound source positioning module;

S2: Acquire a single-frame image through the camera module and transmit it to the movable robot body;

S3: Transmit the acquired single-frame image to the human body detection module through the movable robot body;

S4: Determine, through the human body detection module, whether a person is present in the acquired image. If yes, proceed to S5; if not, return to S1;

S5: Send the detected human body area to the fall identification module to determine whether the human body has fallen;

S6: Transmit the identification result to the movable robot body;

S7: If the person has fallen, proceed to S8; if not, return to S2;

S8: Raise an alarm, and transmit the fall information and the image to a connected mobile phone or other terminal.
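The control flow of steps S1 to S8, including the two return paths, can be sketched as a loop. All five callbacks are hypothetical stand-ins for the robot's modules, and the `max_frames` bound is an assumption added so that the sketch terminates.

```python
from typing import Any, Callable


def patrol(
    locate_person: Callable[[], None],       # S1: sound source positioning
    capture_frame: Callable[[], Any],        # S2-S3: camera module
    contains_human: Callable[[Any], bool],   # S4: human body detection
    is_fallen: Callable[[Any], bool],        # S5-S7: fall identification
    raise_alarm: Callable[[Any], None],      # S8: alarm module
    max_frames: int = 100,                   # assumed bound, not in the source
) -> bool:
    """Run the detection loop; return True if an alarm was raised."""
    relocate = True
    for _ in range(max_frames):
        if relocate:
            locate_person()
        frame = capture_frame()
        if not contains_human(frame):
            relocate = True   # no person in view: return to S1
            continue
        if is_fallen(frame):
            raise_alarm(frame)
            return True
        relocate = False      # person present, not fallen: return to S2
    return False
```

The `relocate` flag captures the difference between the two return paths: the robot searches for the person again only when the last frame contained no human body.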

In this embodiment, the human body detection module is implemented based on an SSD object detection algorithm in deep learning. Before image detection, the detection module may perform SSD algorithm training according to the following procedure:

S1: Collect image data containing a human body (the proportion of the picture occupied by the human body is not limited), i.e. the human body image sample data. Because a human body region must be detected for a human body in any state, the collected image data may specifically include human bodies in different states, such as standing, crouching, lying, or leaning.

S2: Label the collected image data. The SSD target detection network marks the region of the human body during human body detection, so the region of the human body in the image data must be provided during training.

S3: Construct the SSD target detection network. In a specific implementation, the SSD target detection network can be built on the TensorFlow framework, using Inception_v2 as the feature extractor.

S4: Train the SSD target detection network with the processed image data, fine-tuning from an existing trained parameter model, to obtain the SSD network for human body detection (namely, the target detection network).

In this embodiment, the fall detection module may specifically include a convolutional neural network in deep learning. Before the image recognition, the fall recognition module can specifically perform convolutional neural network training through the following procedures:

S1: Collect image data containing a human body in which the human body accounts for more than 80% of the picture, namely the human body region pictures detected by the human body detection module (i.e. the pre-processing sample data).

S2: Construct positive and negative image data samples. The positive samples (i.e. the positive sample data) are all pictures of a human body that has not fallen, i.e. the human body is standing, sitting, leaning, etc.; the negative samples (i.e. the negative sample data) are all pictures of a person after a fall, i.e. the human body is lying, lying prone, etc.

S3: an image in an image data sample is preprocessed. Specifically, all image data may be transformed to a specified size, such as 299 × 299 pixel size.

S4: Construct the convolutional neural network. Specifically, the fall identification module may adopt an Inception_v3 network.

In this embodiment, it should be added that the commonly used Inception_v3 network wastes computing resources relative to the needs of fall recognition. Therefore, the Inception_v3 network is simplified when it is constructed, and the specific simplification includes the following:

S4-1: Reduce the Inception structures, for example the number of layers, while ensuring the identification accuracy. This simplifies the network structure, improves the recognition speed and saves computing resources.

S4-2: Reduce the number of convolution kernels while ensuring the identification accuracy. This reduces the network size, improves the recognition speed and saves computing resources.

S5: Input the preprocessed picture data samples into the Inception_v3 network for training to obtain a fall recognition network (namely, the convolutional neural network).

In this embodiment, when the human body detection module and the fall detection module are used for human body fall detection, the following contents may be specifically included:

S1: Input the acquired picture into the SSD target detection network, detect the area where the human body is located, and store the result.

S2: Transform all detected human body regions to a specified size, such as 299 x 299 pixels.

S3: Input the result obtained in S2 into the trained Inception_v3 model and perform prediction concurrently in a multithreaded manner to give a recognition result.

S4: Display the fall detection result according to the recorded recognition result, and determine whether the human body has fallen.
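The multithreaded prediction in S3 can be sketched with a thread pool from the Python standard library. The `classify` callback is a hypothetical stand-in for running the trained model on one resized human body region, and the worker count is an assumed value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def classify_regions(
    regions: Sequence[Any],
    classify: Callable[[Any], bool],  # hypothetical per-region fall classifier
    max_workers: int = 4,             # assumed value, not from the source
) -> List[bool]:
    """Classify all detected human body regions concurrently.

    Results are returned in the same order as the input regions.
    """
    if not regions:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify, regions))


def any_fall(results: Sequence[bool]) -> bool:
    """A frame is reported as a fall if any detected person has fallen."""
    return any(results)
```

Treating the frame as a fall when any region is classified as fallen is one possible aggregation rule; the text does not state how results for several people in one image are combined.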

After carrying out a plurality of falling detection tests on the human body falling detection robot, the following analysis findings are carried out: the human body falling detection robot can realize high-precision falling detection through a single frame image in a complex scene by using the target detection algorithm SSD and the image classification algorithm CNN, and can implement alarm processing. The problem of inaccurate human body detection in the existing method is solved; meanwhile, the falling detection can be realized only by a single-frame image without analyzing and processing the video stream, so that the calculated amount is reduced, and the detection efficiency is improved; and the movable robot is used as a carrier, so that the omnibearing monitoring can be realized.

Through the scene example, it is verified that the human body falling detection method and the human body falling detection device provided by the embodiment of the application perform analysis processing by acquiring a single-frame target image instead of a video stream, recognize an image containing a human body by using a target detection network based on a target detection algorithm, and classify the human body state in the target image by using a convolutional neural network based on a classification algorithm to recognize the specific state of the human body in the target image, so that the technical problems of poor falling recognition accuracy and low efficiency in the existing method are solved, and the technical effect of accurately and efficiently recognizing a falling state is achieved.

Although various specific embodiments are mentioned in the disclosure of the present application, the present application is not limited to the cases described in industry standards or in the embodiments. Implementations slightly modified from certain industry standards, or from the implementations described in the embodiments or in a custom manner, can also achieve effects that are the same as, equivalent to, or similar to those of the above embodiments, or effects that are predictable after the modification. Embodiments employing such modified or transformed manners of data acquisition, processing, output, and determination may still fall within the scope of alternative embodiments of the present application.

Although the present application provides method steps as described in an embodiment or flowchart, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded.

The devices or modules and the like explained in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, in implementing the present application, the functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules, and the like. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed.

Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing the various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the functions may be regarded as both software modules for performing the method and structures within the hardware component.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.

The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

While the present application has been described by way of embodiments, those of ordinary skill in the art will appreciate that numerous variations and changes may be made without departing from the spirit of the present application, and the appended claims are intended to cover such variations and changes.

Claims

1. A human fall detection method, comprising:

collecting sound information in a target area;

determining a target position according to the sound information;

according to the target position, moving a camera, and adjusting the angle and the distance between the camera and the human body to obtain a target image, wherein the target image is single-frame image data;

acquiring a target image;

2. The method of claim 1, wherein the convolutional neural network is established as follows:

and training by using the positive sample data and the negative sample data to establish a convolutional neural network for identifying the human body state type.

3. The method of claim 2, wherein in establishing the convolutional neural network, the method further comprises:

acquiring image sample data which does not contain a human body;

and performing false detection training on the convolutional neural network by using the image sample data which does not contain the human body.

4. A human fall detection device, comprising:

the acquisition module is used for acquiring a target image;

the acquisition module includes:

a moving device, configured to move the camera according to the target position and to adjust the angle and the distance between the camera and the human body, wherein the camera is configured to acquire the target image;

5. The apparatus of claim 4, further comprising a convolutional neural network building block configured to build a convolutional neural network for identifying a human state type, wherein the convolutional neural network building block comprises:

an acquisition unit, configured to acquire human body image sample data, wherein the human body image sample data comprises a plurality of images containing human body states;

the extraction unit is used for extracting images meeting requirements from the human body image sample data to serve as preprocessing sample data;

a dividing unit, configured to divide the images in the preprocessing sample data into positive sample data and negative sample data according to the human body state in each image, wherein the images in the positive sample data include at least one of: an image of a human body in a standing state, an image of a human body in a sitting state, an image of a human body in a squatting state, and an image of a human body in a tilting state; and the images in the negative sample data include at least one of: an image of a human body in a lying state and an image of a human body in a prone (face-down lying) state;

and the establishing unit is used for training by utilizing the positive sample data and the negative sample data to establish a convolutional neural network for identifying the human body state type.

6. The apparatus of claim 5, wherein the convolutional neural network building block further comprises:

a false detection training unit, configured to acquire image sample data which does not contain a human body, and to perform false detection training on the convolutional neural network by using the image sample data which does not contain the human body.
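The sample division recited for the dividing unit, together with the separate set of images that contain no human body, can be sketched as follows. This is a minimal illustration under stated assumptions: the state label strings, the use of `None` to mark an image without a human body, and the function name `split_samples` are hypothetical and do not come from the source.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed label strings for the human body states named in the claims.
POSITIVE_STATES = {"standing", "sitting", "squatting", "tilting"}
NEGATIVE_STATES = {"lying"}


def split_samples(
    samples: Iterable[Tuple[str, Optional[str]]],
) -> Tuple[List[str], List[str], List[str]]:
    """Divide (image_path, state) pairs into three groups.

    Returns (positive, negative, no_human). A state of None marks an
    image without a human body, kept apart for false detection training.
    """
    positive: List[str] = []
    negative: List[str] = []
    no_human: List[str] = []
    for path, state in samples:
        if state is None:
            no_human.append(path)
        elif state in POSITIVE_STATES:
            positive.append(path)
        elif state in NEGATIVE_STATES:
            negative.append(path)
        else:
            # Unknown labels are rejected rather than silently dropped.
            raise ValueError(f"unrecognized human body state: {state!r}")
    return positive, negative, no_human
```

Keeping the images without a human body in their own group lets the training step use them only for false detection training, separately from the positive and negative samples.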