WO2021051545A1 - Fall action determination method and apparatus based on a behavior recognition model, computer device, and storage medium - Google Patents

Fall action determination method and apparatus based on a behavior recognition model, computer device, and storage medium

Info

Publication number
WO2021051545A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target video
probability
target
falling
Prior art date
Application number
PCT/CN2019/117328
Other languages
English (en)
French (fr)
Inventor
罗郑楠
周俊琨
许扬
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021051545A1 publication Critical patent/WO2021051545A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, computer equipment, and storage medium for determining a fall action based on a behavior recognition model.
  • Traditional fall determination approaches are mainly sensor-based methods and single-picture-based methods.
  • The cost of the sensor-based method is relatively high, and it requires the person to carry a device equipped with the corresponding sensor; the picture-based method is limited to specific scenes and its accuracy is not high. At present, fall behavior cannot be determined accurately.
  • the embodiments of the present application provide a fall action determination method, device, computer equipment, and storage medium based on a behavior recognition model to solve the problem of failing to accurately determine a fall action.
  • an embodiment of the present application provides a method for determining a fall action based on a behavior recognition model, including:
  • acquiring a target video shot by a camera; obtaining a target video to be analyzed from the target video; dividing the target video to be analyzed into N segments and randomly extracting one frame from each segment as an image to be recognized, where N is an integer greater than 1; inputting the images to be recognized into a pre-trained behavior recognition model, and outputting, through the behavior recognition model, a first probability of someone falling in the target video and a second probability of a falling-down accompanying action; obtaining a comprehensive expected probability from the first probability and the second probability; and when the comprehensive expected probability is greater than a preset threshold, determining that a person falls down in the target video.
  • an embodiment of the present application provides a fall action determination device based on a behavior recognition model, including:
  • the first acquisition module is used to acquire the target video shot by the camera
  • the second obtaining module is used to obtain the target video to be analyzed from the target video
  • the third acquisition module is configured to divide the target video to be analyzed into N segments, and randomly extract a frame of image from each of the segments as the image to be recognized, where N is an integer greater than 1;
  • the probability output module is used to input the images to be recognized into a pre-trained behavior recognition model, and output, through the behavior recognition model, the first probability of someone falling in the target video and the second probability of a falling-down accompanying action;
  • a fourth obtaining module configured to obtain a comprehensive expected probability according to the first probability and the second probability
  • the judging module is used for judging that a person falls down in the target video when the comprehensive expected probability is greater than a preset threshold.
  • In a third aspect, a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor implements the steps of the foregoing fall action determination method based on a behavior recognition model when executing the computer-readable instructions.
  • In a fourth aspect, an embodiment of the present application provides a computer non-volatile readable storage medium, including computer-readable instructions which, when executed by a processor, implement the steps of the foregoing fall action determination method based on a behavior recognition model.
  • In the embodiments of the present application, the target video shot by the camera is first acquired, and the target video to be analyzed is obtained from it, so the captured video can be analyzed in a targeted manner, improving the efficiency and effect of the analysis. The target video to be analyzed is then divided into N segments, and one frame is randomly extracted from each segment as an image to be recognized, which preserves the spatio-temporal relationship of the images while reducing the amount of computation and ensures the accuracy of the fall determination. The images to be recognized are then input into the pre-trained behavior recognition model, which outputs the first probability of someone falling in the target video and the second probability of a falling-down accompanying action; this behavior recognition model improves the accuracy of the fall determination. Finally, the comprehensive expected probability is obtained from the first probability and the second probability, and when it is greater than the preset threshold it is determined that someone falls in the target video, which achieves accurate fall behavior determination.
  • FIG. 1 is a flowchart of a method for determining a fall action based on a behavior recognition model in an embodiment of the present application
  • Fig. 2 is a schematic diagram of a fall action judging device based on a behavior recognition model in an embodiment of the present application
  • Fig. 3 is a schematic diagram of a computer device in an embodiment of the present application.
  • Although the terms first, second, third, etc. may be used in the embodiments of the present application to describe preset ranges and the like, these preset ranges should not be limited by these terms; the terms are only used to distinguish the preset ranges from one another.
  • For example, the first preset range may also be referred to as the second preset range, and similarly, the second preset range may also be referred to as the first preset range.
  • Depending on the context, the word "if" as used herein can be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting".
  • Similarly, the phrase "if it is determined" or "if (the stated condition or event) is detected" can be interpreted as "when it is determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
  • FIG. 1 shows a flow chart of the method for determining a fall action based on a behavior recognition model in this embodiment.
  • The fall action determination method based on the behavior recognition model can be applied to a fall determination system, which is used to make the determination whenever a fall determination is needed.
  • the fall determination system can be specifically applied to a computer device, where the computer device is a device that can perform human-computer interaction with a user, including but not limited to devices such as computers, smart phones, and tablets.
  • the method for determining a fall action based on a behavior recognition model includes the following steps:
  • the computer device where the fall determination system is located can have its own camera, or it can be connected to an external device to call the camera of the external device, and capture the target video through the camera.
  • the target video to be analyzed can be obtained from the target video, so as to realize a quasi real-time fall determination based on the target video to be analyzed.
  • the duration of the target video to be analyzed may specifically be a user preset duration.
  • the target video to be analyzed is obtained from the target video, which specifically includes:
  • S21 Determine a critical moment between the old and new images, where the critical moment is used to divide the target video into a first image group and a second image group, and the acquisition time of any image in the first image group is earlier than the acquisition time of any image in the second image group.
  • a video consists of a certain number of frames of images.
  • the computer device where the fall determination system is located maintains two image groups, used respectively to store the first image group, which is older than the critical moment, and the second image group, which is newer than the critical moment.
  • the new and old image critical time may be specifically determined according to a user's preset.
  • the new and old image critical time may specifically be a time corresponding to 2 seconds before the current shooting time.
  • the significance of the critical moment of the old and new images at this time is to connect the video that occurred in the past 2 seconds with the video that occurred in the past 2-4 seconds.
  • Dividing the target video into the first image group and the second image group at the critical moment preserves longer-range information and effectively avoids the loss of long-range semantics when making a fall determination; moreover, the two image groups are updated in real time as time passes, which gives the fall determination system a near-real-time determination capability and improves the practicality of fall determination.
  • S22 Acquire the first target video from the first image group, where the moment of the last frame of the first target video is the critical moment, and the length of the first target video is half of the user preset duration.
  • S23 Acquire the second target video from the second image group, where the moment of the first frame of the second target video is the critical moment, and the length of the second target video is half of the user preset duration.
  • S24 Combine the first target video and the second target video in a time sequence to obtain a target video to be analyzed.
  • In steps S22-S24, half-duration clips taken from the first target video and the second target video are combined into the target video to be analyzed, which reflects the temporal connection between them and helps improve the accuracy of the fall determination.
  • Steps S21-S24 provide a specific implementation for obtaining the target video to be analyzed from the target video: using the critical moment between the old and new images, a video segment that is strongly related to the present moment and still preserves the temporal and spatial relationship is cut from the target video and used as the target video to be analyzed, which helps improve the accuracy of the subsequent fall determination.
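  • A minimal illustrative sketch of steps S21-S24 follows, assuming a fixed frame rate and a 4-second user preset duration; the function names, buffer sizes, and frame rate are assumptions rather than values from the disclosure.

```python
from collections import deque

FPS = 25                        # assumed camera frame rate
HALF_DURATION_S = 2             # assumed: half of a 4-second user preset duration
HALF_FRAMES = FPS * HALF_DURATION_S

# The older (first) image group and the newer (second) image group.
first_group = deque(maxlen=HALF_FRAMES)
second_group = deque(maxlen=HALF_FRAMES)

def push_frame(frame):
    """Append a newly captured frame; the oldest frame of the newer group
    crosses the critical moment and moves into the older group."""
    if len(second_group) == second_group.maxlen:
        first_group.append(second_group.popleft())
    second_group.append(frame)

def target_video_to_analyze():
    """Steps S22-S24: combine the two half-duration clips in chronological order."""
    return list(first_group) + list(second_group)
```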
  • S30 Divide the target video to be analyzed into N segments, and randomly extract one frame from each segment as an image to be recognized, where N is an integer greater than 1.
  • Understandably, the target video to be analyzed still contains a large number of frames, and processing them directly is computationally expensive. Therefore, in one embodiment, the target video to be analyzed can be divided into N segments, and one frame is randomly extracted from each segment as an image to be recognized; this preserves the spatio-temporal relationship of the images while reducing the amount of computation and ensures the accuracy of the subsequent fall determination.
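  • A minimal sketch of step S30, assuming the target video to be analyzed is available as a list of frames; the division into N roughly equal parts and the helper name are illustrative.

```python
import random

def sample_one_frame_per_segment(frames, n_segments):
    """Split the frame list into N segments and draw one random frame from
    each, keeping the segments in temporal order (step S30)."""
    assert n_segments > 1 and len(frames) >= n_segments
    seg_len = len(frames) / n_segments
    images_to_recognize = []
    for i in range(n_segments):
        start = int(i * seg_len)
        end = int((i + 1) * seg_len) if i < n_segments - 1 else len(frames)
        images_to_recognize.append(random.choice(frames[start:end]))
    return images_to_recognize
```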
  • S40 Input the image to be recognized into a pre-trained behavior recognition model, and output the first probability of someone falling in the target video through the behavior recognition model, and the second probability of a falling-down accompanying action.
  • the accompanying actions of falling to the ground refer to the actions that accompany the person at the moment of falling, such as supporting the ground with the hands, landing with the back and other accompanying actions.
  • the function of the pre-trained behavior recognition model is to output, from the input images to be recognized, the first probability of someone falling in the target video and the second probability of a falling-down accompanying action.
  • the behavior recognition model combines the falling action and the falling-down accompanying action to determine comprehensively whether someone has fallen, which is more accurate than a determination based on the falling action alone. Understandably, a determination based only on the falling action or only on the accompanying action is generally made from a single picture, whereas this implementation performs the fall determination on the basis of video and incorporates temporal information, and therefore has higher accuracy.
  • In step S40, the behavior recognition model can be obtained by pre-training using the following steps:
  • S411 Obtain a preset number of falling videos as sample videos, where the duration of the falling video is pre-processed to be equal in length, and the duration of the falling video is the same as the duration of the target to-be-analyzed video.
  • S412 Divide each sample video into N sample segments, and randomly extract a frame of image from each sample segment as the image to be trained, where N is an integer greater than 1.
  • S413 Use a 2D convolutional neural network to extract the features of each image to be trained, obtaining a feature image of each image to be trained. The 2D convolutional neural network is a 2-dimensional convolutional neural network; understandably, the image to be trained is two-dimensional, and the 2D convolutional neural network can effectively extract the spatial features of a static image.
  • the 2D convolutional neural network includes an input layer, a convolutional layer, and a pooling layer. Among them, the convolutional layer and the pooling layer are provided with multiple layers in the network (such as 16-layer convolutional layer + 16-layer pooling layer).
  • the convolutional layer is used to perform convolution operation on the image to be trained input by the input layer.
  • the convolution operation specifically uses a 7 × 7 convolution kernel with a stride of 2; the pooling layer performs pooling operations on the values output by the convolutional layer, where the pooling operations include maximum pooling, minimum pooling, and so on. When maximum pooling is used, the largest value within the pooling window (for example, a window of size 3 with a stride of 1) is taken as the output value of that window.
  • the 2D convolutional neural network is used to perform feature extraction on the training image without further classifying the training image.
  • the 2D convolutional neural network may be shared by each image to be trained, which can effectively improve computing efficiency.
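  • The PyTorch sketch below illustrates a shared 2D convolutional feature extractor of the kind described for step S413 (a stride-2 7 × 7 convolution followed by a size-3, stride-1 max pooling); the channel count K and the single-layer depth are placeholders, not the exact architecture of the disclosure.

```python
import torch.nn as nn

class FrameFeatureExtractor2D(nn.Module):
    """Shared 2D CNN: maps one RGB frame (3, H, W) to a K x A x B feature image."""
    def __init__(self, k_channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, k_channels, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        )

    def forward(self, frame):                                  # frame: (3, H, W)
        return self.features(frame.unsqueeze(0)).squeeze(0)    # (K, A, B)

# The same instance is shared by every image to be trained, which is what
# makes the per-frame feature extraction efficient.
backbone = FrameFeatureExtractor2D()
```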
  • S414 Obtain a spatio-temporal relationship feature map group according to the feature images corresponding to the N sample segments.
  • the feature images obtained from the sample segments can be combined along the temporal dimension to obtain a feature map group with spatio-temporal relationship features, that is, the spatio-temporal relationship feature map group.
  • In step S414, the size of the N-th feature image is expressed as K × A × B, where K is the number of channels obtained by the convolution processing and A × B is the pixel area of the feature image; the N-th feature image can be expressed as {F_N^1, F_N^2, …, F_N^K}, where F_N^1 denotes the first of the K channel feature maps of the N-th sample segment.
  • Obtaining the spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments includes: stacking the feature images corresponding to the N sample segments to obtain the spatio-temporal relationship feature map group expressed as {M_1, M_2, …, M_{N-1}, M_N}, which consists of N elements and whose size is expressed as N × K × A × B, where a stacked element is, for example, M_1 = {F_N^1, F_{N-1}^1, …, F_2^1, F_1^1}.
  • It should be noted that in step S412 each sample video is divided into N sample segments, ordered from the first to the N-th, and one frame is randomly extracted from each sample segment as the image to be trained, so the images to be trained are likewise arranged from the first to the N-th. In this implementation, the spatio-temporal relationship feature map group is assembled in the order from N down to 1: from the first element M_1 = {F_N^1, F_{N-1}^1, …, F_2^1, F_1^1} of the group, it can be seen that F_1^1 denotes the first of the K channel feature maps of the first sample segment and F_2^1 denotes the first of the K channel feature maps of the second sample segment; in the expression of M_1, F_1^1 is placed last, which is the reverse of the order from the first sample segment to the N-th, meaning that the elements of each spatio-temporal relationship feature map group are combined in reverse order.
  • the stacking process combines the feature images along the temporal dimension, stacking feature images with the same index number from different sample segments to obtain a new feature map group.
  • this feature map group is the spatio-temporal relationship feature map group; it combines forward and backward temporal information with the features of the images to be trained, which helps improve the accuracy of the fall determination.
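  • A sketch of the stacking in step S414, assuming each of the N feature images is a tensor of shape (K, A, B) arranged in temporal order; the reverse ordering mirrors the description that the elements are combined from N down to 1.

```python
import torch

def build_spatiotemporal_group(feature_images):
    """feature_images: list of N tensors, each (K, A, B), in temporal order.
    Returns a tensor of shape (N, K, A, B) stacked in reverse temporal order,
    so the feature image of the first segment ends up last."""
    return torch.stack(list(reversed(feature_images)), dim=0)
```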
  • S415 Use a 3D convolutional neural network to extract the spatiotemporal features of the spatiotemporal relationship feature map group.
  • the 3D convolutional neural network is a convolutional neural network improved relative to the 2D convolutional neural network.
  • the 2D convolutional neural network has clear advantages in extracting the spatial features of static images and in tasks such as image classification and detection, but for 3-dimensional objects such as video (which has an additional time-series dimension), because it does not take into account the motion information of objects along the time dimension between images, its performance in extracting temporal features is mediocre. Therefore, for extracting features from 3-dimensional objects such as video, a 3D convolutional neural network can be used.
  • the convolution kernel used in the 3D convolutional neural network will have one more dimension than the convolution kernel used in the 2D convolutional neural network.
  • the convolution kernel used in the 2D convolutional neural network is 7 ⁇ 7 Convolution kernel
  • the convolution kernel used by the 3D convolutional neural network may specifically be a 7 ⁇ 7 ⁇ 64 convolution kernel.
  • the spatio-temporal relationship feature map group obtained from the feature images corresponding to the N sample segments is a feature map group with a time-series dimension and has 3 dimensions; therefore, a 3D convolutional neural network can be used to extract the spatio-temporal features of the spatio-temporal relationship feature maps.
  • the 3D convolutional neural network includes an input layer, a convolutional layer, and a pooling layer.
  • the convolutional layer is used to perform convolution operations on the spatio-temporal relationship feature map group input by the input layer.
  • the convolution operation specifically uses a convolution kernel with a stride of 2 and a size of 7 × 7 × 64.
  • the pooling layer is used to perform pooling operations on the values output in the convolutional layer. Specifically, a pooling window with a window size of 3 ⁇ 3 ⁇ 64 and a step size of 2 can be used for the pooling operation.
  • the spatiotemporal feature relationship graph obtained in step S414 has spatiotemporal features, and the spatiotemporal features are specifically extracted using a 3D convolutional neural network.
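  • The following sketch shows one possible 3D convolutional branch for step S415, treating the K feature channels as input channels and the N segments as the temporal depth; the kernel and pooling sizes here are illustrative and smaller than the 7 × 7 × 64 kernel mentioned in the text.

```python
import torch.nn as nn

class SpatioTemporalBranch3D(nn.Module):
    """3D convolution over the stacked (N, K, A, B) group, producing one
    spatio-temporal feature vector."""
    def __init__(self, k_channels=64, out_channels=128):
        super().__init__()
        self.conv = nn.Conv3d(k_channels, out_channels,
                              kernel_size=(3, 7, 7), stride=2, padding=(1, 3, 3))
        self.pool = nn.AdaptiveAvgPool3d(1)          # collapse depth and space

    def forward(self, group):                        # group: (N, K, A, B)
        x = group.permute(1, 0, 2, 3).unsqueeze(0)   # (1, K, N, A, B)
        return self.pool(self.conv(x)).flatten(1)    # (1, out_channels)
```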
  • S416 Use a 2D convolutional neural network to extract deep features of the spatiotemporal relationship feature map group.
  • It should be noted that this step is a 2D convolution operation on the spatio-temporal relationship feature map group that carries temporal features.
  • the feature extraction performed on the spatio-temporal relationship feature map group with a 2D convolutional neural network extracts the deep features of the group; these deep features are features in the two-dimensional image space and are also valuable for the classification in behavior recognition.
  • the spatio-temporal features extracted from the group by the 3D convolutional neural network and the deep features extracted from the group by the 2D convolutional neural network can both be used as the input features for classification, thereby improving the recognition accuracy of the behavior recognition model.
  • S417 Connect spatiotemporal features and deep features to a preset classifier.
  • the spatiotemporal features and deep features are expressed in the form of vectors.
  • the elements in the vector and the arrangement order between the elements reflect the spatiotemporal features and deep features of the image to be trained.
  • the main function of the 3D convolutional neural network is to extract features in space and time, while the main function of the 2D convolutional neural network is to extract deep spatial features; the emphases and effects of the two different convolutional neural networks can thus be integrated, making the output of the classifier more reliable.
  • a cascade operation can be used to splice the vectors represented by the spatiotemporal features and the deep features, and the classifier can be accessed through a fully connected layer.
  • each neuron in the fully connected layer is fully connected with all the neurons in the previous layer, and the local information with category discrimination in the convolutional layer or the pooling layer is integrated.
  • the output value of the last fully connected layer is passed to an output, which is connected to the preset classifier.
  • a softmax classifier can be used as this classifier; the softmax classifier maps the connected spatio-temporal features and deep features into the (0,1) interval to achieve classification.
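  • A sketch of steps S416-S417: a 2D branch extracts deep features from the stacked group, the two feature vectors are concatenated (the cascade operation), and a fully connected layer feeds a softmax over the three label classes described in step S418 (fall, falling-down accompanying action, normal); all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DeepFeatureBranch2D(nn.Module):
    """2D convolution over the stacked group, treating the N*K maps as channels."""
    def __init__(self, n_segments, k_channels, out_channels=128):
        super().__init__()
        self.conv = nn.Conv2d(n_segments * k_channels, out_channels,
                              kernel_size=7, stride=2, padding=3)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, group):                        # group: (N, K, A, B)
        x = group.flatten(0, 1).unsqueeze(0)         # (1, N*K, A, B)
        return self.pool(self.conv(x)).flatten(1)    # (1, out_channels)

class FallClassifierHead(nn.Module):
    """Concatenate spatio-temporal and deep features, then classify into
    fall / falling-down accompanying action / normal."""
    def __init__(self, feat_dim=256, num_classes=3):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, spatiotemporal_feat, deep_feat):
        fused = torch.cat([spatiotemporal_feat, deep_feat], dim=1)   # cascade
        return torch.softmax(self.fc(fused), dim=1)   # probabilities in (0, 1)
```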
  • S418 Output the first probability of someone falling in the sample video and the second probability of an accompanying action appearing in the sample video through the classifier.
  • the images to be trained obtained in step S412 are labelled and classified in advance, being divided into falling images, falling-down accompanying action images, and normal (neither falling nor accompanying action) images.
  • during training, based on the pre-labelled images to be trained, the softmax classifier outputs the first probability of someone falling in the sample video and the second probability of a falling-down accompanying action.
  • S419 Using a predefined loss function, obtain the loss value generated during the model training process according to the label value of the sample video, and the first probability and the second probability.
  • a loss value is generated during the training of the behavior recognition model, which means that errors occur during training that affect the recognition accuracy of the model.
  • for this, a loss function can be predefined and established; through the loss function, the loss value is computed during model training from the label value of the sample video and the first and second probabilities, the network parameters can be updated according to this loss value, and a behavior recognition model with higher recognition accuracy is obtained.
  • the back-propagation algorithm can be used to update the network parameters of the model according to the loss value until the number of updates reaches a preset threshold or the gradient no longer decreases during the update process, at which point the update process ends and the behavior recognition model is obtained.
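  • An illustrative single training step for steps S418-S41-10, assuming the three-way labels are encoded as integer class indices and the model outputs softmax probabilities; the negative-log-likelihood loss and the chosen optimizer stand in for the predefined loss function and back-propagation update, whose exact form the text does not specify.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, images_to_train, label):
    """One back-propagation update on one sample video.
    images_to_train: the N sampled frames; label: 0 = fall, 1 = accompanying action, 2 = normal."""
    optimizer.zero_grad()
    probs = model(images_to_train)                        # (1, 3) class probabilities
    loss = F.nll_loss(torch.log(probs + 1e-8), torch.tensor([label]))
    loss.backward()                                       # back-propagate the loss value
    optimizer.step()                                      # update the network parameters
    return loss.item()
```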
  • Steps S411 to S41-10 provide a specific implementation for training the behavior recognition model: the temporal and spatial features of the different sample segments are extracted, so that the extracted features better reflect the spatial distribution characteristics of the sample segments and the temporal relationships between them, giving the trained behavior recognition model the ability to identify fall events with a high accuracy rate.
  • In step S40, outputting through the behavior recognition model the first probability of someone falling in the target video and the second probability of a falling-down accompanying action specifically includes:
  • S421 Use a 2D convolutional neural network to separately extract features of each image to be recognized, to obtain a feature image of each image to be recognized.
  • S422 Obtain a target spatiotemporal relationship feature map group according to the feature images corresponding to the N segments.
  • S423 Use a 3D convolutional neural network to extract the target spatiotemporal feature of the target spatiotemporal relationship feature map group.
  • S424 Use the 2D convolutional neural network to extract the target deep features of the target spatiotemporal relationship feature map group.
  • S425 Connect the target spatiotemporal feature and the target deep feature into a preset classifier.
  • S426 Output the first probability of someone falling in the target video and the second probability of the accompanying action of falling to the ground through the classifier.
  • Steps S421-S426 provide a specific implementation for outputting, through the behavior recognition model, the first probability of someone falling in the target video and the second probability of a falling-down accompanying action; in the process of determining a fall event, the spatial and temporal features of the images to be recognized are fully extracted, so that the output first and second probabilities are more accurate.
  • steps S421-S426 are the process of using the behavior recognition model to recognize behavior. There are similar steps in the step of training the behavior recognition model. For details, please refer to steps S411-S41-10, which will not be repeated here.
  • the comprehensive expected probability can be obtained by a weighted calculation method, or can be obtained by a method based on Bayes' theorem, which is not limited here.
  • the comprehensive expected probability also considers the accompanying actions of the person after the fall. Compared with only the first probability for the fall determination, the comprehensive expected probability for the fall determination has a higher accuracy rate.
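  • A sketch of the weighted variant of the combination mentioned above; the weights and the threshold are illustrative values, not figures from the disclosure.

```python
def comprehensive_expected_probability(p_fall, p_accompany, w_fall=0.7, w_accompany=0.3):
    """Weighted combination of the first and second probabilities (step S50)."""
    return w_fall * p_fall + w_accompany * p_accompany

def someone_fell(p_fall, p_accompany, threshold=0.5):
    """Step S60: a fall is determined when the combined probability exceeds the threshold."""
    return comprehensive_expected_probability(p_fall, p_accompany) > threshold
```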
  • In the embodiments of the present application, the target video shot by the camera is first acquired, and the target video to be analyzed is obtained from it, so the captured video can be analyzed in a targeted manner, improving the efficiency and effect of the analysis. The target video to be analyzed is then divided into N segments, and one frame is randomly extracted from each segment as an image to be recognized, which preserves the spatio-temporal relationship of the images while reducing the amount of computation and ensures the accuracy of the fall determination. The images to be recognized are then input into the pre-trained behavior recognition model, which outputs the first probability of someone falling in the target video and the second probability of a falling-down accompanying action; this behavior recognition model improves the accuracy of the fall determination. Finally, the comprehensive expected probability is obtained from the first probability and the second probability, and when it is greater than the preset threshold it is determined that someone falls in the target video, which achieves accurate fall behavior determination.
  • the embodiment of the present application further provides an embodiment of a device that implements each step and method in the above method embodiment.
  • Fig. 2 shows a principle block diagram of a fall action judging device based on a behavior recognition model in one-to-one correspondence with the fall action judging method based on a behavior recognition model in the embodiment.
  • the device for determining a fall action based on a behavior recognition model includes a first acquisition module 10, a second acquisition module 20, a third acquisition module 30, a probability output module 40, a fourth acquisition module 50, and a determination module 60. .
  • the functions implemented by the first acquisition module 10, the second acquisition module 20, the third acquisition module 30, the probability output module 40, the fourth acquisition module 50, and the determination module 60 correspond one-to-one to the corresponding steps of the fall action determination method based on the behavior recognition model in the embodiment; to avoid redundancy, this embodiment does not describe them one by one.
  • the first acquisition module 10 is used to acquire a target video shot by a camera.
  • the second obtaining module 20 is used to obtain the target video to be analyzed from the target video.
  • the third acquisition module 30 is configured to divide the target video to be analyzed into N segments, and randomly extract a frame of image from each segment as the image to be recognized, where N is an integer greater than 1.
  • the probability output module 40 is configured to input the image to be recognized into a pre-trained behavior recognition model, and output the first probability of someone falling in the target video through the behavior recognition model and the second probability of a falling-down accompanying action.
  • the fourth obtaining module 50 is configured to obtain a comprehensive expected probability according to the first probability and the second probability.
  • the judging module 60 is used for judging that someone has fallen down in the target video when the comprehensive expected probability is greater than the preset threshold.
  • the second acquisition module 20 is specifically configured to:
  • determine the critical moment between old and new images, where the critical moment is used to divide the target video into a first image group and a second image group, and the acquisition time of any image in the first image group is earlier than the acquisition time of any image in the second image group;
  • acquire the first target video from the first image group, where the moment of the last frame of the first target video is the critical moment, and the length of the first target video is half of the user preset duration;
  • acquire the second target video from the second image group, where the moment of the first frame of the second target video is the critical moment, and the length of the second target video is half of the user preset duration;
  • combine the first target video and the second target video in chronological order to obtain the target video to be analyzed.
  • the behavior recognition model is obtained by using a training module, and the training module is specifically used for the following:
  • a preset number of fall videos is acquired as sample videos, where the fall videos are pre-processed to have equal duration and the duration of the fall videos is the same as the duration of the target video to be analyzed;
  • each sample video is divided into N sample segments, and one frame is randomly extracted from each sample segment as an image to be trained, where N is an integer greater than 1;
  • the 2D convolutional neural network is used to extract the features of each image to be trained, and the feature image of each image to be trained is obtained.
  • the spatio-temporal relationship feature map group is obtained.
  • the 3D convolutional neural network is used to extract the spatiotemporal features of the spatiotemporal relationship feature map group.
  • a 2D convolutional neural network is used to extract the deep features of the spatiotemporal relationship feature map group.
  • the spatio-temporal features and deep features are connected to the preset classifier.
  • the classifier outputs the first probability of someone falling down in the sample video and the second probability of falling down and accompanying actions.
  • the loss value generated during the model training process is obtained according to the label value of the sample video, and the first probability and the second probability.
  • the back-propagation algorithm is used to update the network parameters of the model, and the behavior recognition model is obtained.
  • the size of the N-th feature image is expressed as K × A × B, where K is the number of feature image channels and A × B is the pixel area of the feature image, and the N-th feature image is expressed as {F_N^1, F_N^2, …, F_N^K}.
  • Obtaining the spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments includes: stacking the feature images corresponding to the N sample segments to obtain the spatio-temporal relationship feature map group expressed as {M_1, M_2, …, M_{N-1}, M_N}, where a stacked element is, for example, M_1 = {F_N^1, F_{N-1}^1, …, F_2^1, F_1^1}.
  • the probability output module 40 is specifically configured to:
  • the 2D convolutional neural network is used to extract the features of each image to be recognized, and the feature image of each image to be recognized is obtained.
  • the target spatiotemporal relationship feature map group is obtained.
  • a 3D convolutional neural network is used to extract the target spatiotemporal features of the target spatiotemporal relationship feature map group.
  • a 2D convolutional neural network is used to extract the target's deep features in the target spatiotemporal relationship feature map group.
  • the target spatiotemporal features and target deep features are connected to the preset classifier.
  • the classifier outputs the first probability of someone falling down in the target video and the second probability of falling down and accompanying actions.
  • In the embodiments of the present application, the target video shot by the camera is first acquired, and the target video to be analyzed is obtained from it, so the captured video can be analyzed in a targeted manner, improving the efficiency and effect of the analysis. The target video to be analyzed is then divided into N segments, and one frame is randomly extracted from each segment as an image to be recognized, which preserves the spatio-temporal relationship of the images while reducing the amount of computation and ensures the accuracy of the fall determination. The images to be recognized are then input into the pre-trained behavior recognition model, which outputs the first probability of someone falling in the target video and the second probability of a falling-down accompanying action; this behavior recognition model improves the accuracy of the fall determination. Finally, the comprehensive expected probability is obtained from the first probability and the second probability, and when it is greater than the preset threshold it is determined that someone falls in the target video, which achieves accurate fall behavior determination.
  • This embodiment provides a computer non-volatile readable storage medium storing computer-readable instructions.
  • when the computer-readable instructions are executed by a processor, the fall action determination method based on the behavior recognition model in the embodiment is implemented; to avoid repetition, it is not repeated here.
  • alternatively, when the computer-readable instructions are executed by the processor, the functions of the modules/units of the device for determining a fall action based on the behavior recognition model in the embodiment are realized; to avoid repetition, they are not repeated here.
  • Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • the computer device 70 of this embodiment includes: a processor 71, a memory 72, and computer-readable instructions 73 stored in the memory 72 and executable on the processor 71. When the computer-readable instructions 73 are executed by the processor 71, the method for determining a fall action based on the behavior recognition model in the embodiment is implemented; to avoid repetition, it is not repeated here.
  • alternatively, when the computer-readable instructions 73 are executed by the processor 71, the functions of the models/units of the device for determining a fall action based on the behavior recognition model in the embodiment are realized; to avoid repetition, they are not repeated here.
  • the computer device 70 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device 70 may include, but is not limited to, a processor 71 and a memory 72.
  • FIG. 3 is only an example of the computer device 70 and does not constitute a limitation on the computer device 70; it may include more or fewer components than shown, or combine certain components, or have different components.
  • computer equipment may also include input and output devices, network access devices, buses, and so on.
  • the so-called processor 71 may be a central processing unit (Central Processing Unit, CPU), other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 72 may be an internal storage unit of the computer device 70, such as a hard disk or memory of the computer device 70.
  • the memory 72 may also be an external storage device of the computer device 70, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 70.
  • the memory 72 may also include both an internal storage unit of the computer device 70 and an external storage device.
  • the memory 72 is used to store computer readable instructions and other programs and data required by the computer equipment.
  • the memory 72 can also be used to temporarily store data that has been output or will be output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A fall action determination method and apparatus based on a behavior recognition model, a computer device, and a storage medium, relating to the field of artificial intelligence technology. The method comprises: acquiring a target video shot by a camera (S10); obtaining a target video to be analyzed from the target video (S20); dividing the target video to be analyzed into N segments and randomly extracting one frame from each segment as an image to be recognized, where N is an integer greater than 1 (S30); inputting the images to be recognized into a pre-trained behavior recognition model, and outputting, through the behavior recognition model, a first probability that someone falls in the target video and a second probability that a falling-down accompanying action appears (S40); obtaining a comprehensive expected probability from the first probability and the second probability (S50); and when the comprehensive expected probability is greater than a preset threshold, determining that someone has fallen in the target video (S60). The fall action determination method based on the behavior recognition model enables accurate determination of fall behavior.

Description

Fall action determination method and apparatus based on a behavior recognition model, computer device, and storage medium
This application is based on, and claims priority from, Chinese invention patent application No. 201910869615.0, filed on September 16, 2019 and entitled "Fall determination method and apparatus, computer device, and storage medium".
[Technical Field]
This application relates to the field of artificial intelligence technology, and in particular to a fall action determination method and apparatus based on a behavior recognition model, a computer device, and a storage medium.
[Background Art]
Traditional fall determination approaches are mainly sensor-based methods and single-picture-based methods. Sensor-based methods are relatively costly and require the person to carry a device equipped with the corresponding sensor; picture-based methods are limited to specific scenes and their accuracy is not high. At present, fall behavior cannot be determined accurately.
[Summary of the Invention]
In view of this, the embodiments of this application provide a fall action determination method and apparatus based on a behavior recognition model, a computer device, and a storage medium, to solve the problem that fall behavior cannot be determined accurately.
In a first aspect, an embodiment of this application provides a fall action determination method based on a behavior recognition model, including:
acquiring a target video shot by a camera;
obtaining a target video to be analyzed from the target video;
dividing the target video to be analyzed into N segments, and randomly extracting one frame from each of the segments as an image to be recognized, where N is an integer greater than 1;
inputting the images to be recognized into a pre-trained behavior recognition model, and outputting, through the behavior recognition model, a first probability that someone falls in the target video and a second probability that a falling-down accompanying action appears;
obtaining a comprehensive expected probability from the first probability and the second probability;
when the comprehensive expected probability is greater than a preset threshold, determining that someone has fallen in the target video.
In a second aspect, an embodiment of this application provides a fall action determination apparatus based on a behavior recognition model, including:
a first acquisition module, configured to acquire a target video shot by a camera;
a second acquisition module, configured to obtain a target video to be analyzed from the target video;
a third acquisition module, configured to divide the target video to be analyzed into N segments and randomly extract one frame from each of the segments as an image to be recognized, where N is an integer greater than 1;
a probability output module, configured to input the images to be recognized into a pre-trained behavior recognition model and output, through the behavior recognition model, a first probability that someone falls in the target video and a second probability that a falling-down accompanying action appears;
a fourth acquisition module, configured to obtain a comprehensive expected probability from the first probability and the second probability;
a determination module, configured to determine, when the comprehensive expected probability is greater than a preset threshold, that someone has fallen in the target video.
In a third aspect, a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor implements the steps of the above fall action determination method based on a behavior recognition model when executing the computer-readable instructions.
In a fourth aspect, an embodiment of this application provides a computer non-volatile readable storage medium, including computer-readable instructions which, when executed by a processor, implement the steps of the above fall action determination method based on a behavior recognition model.
In the embodiments of this application, the target video shot by the camera is first acquired and the target video to be analyzed is obtained from it, so that the captured video can be analyzed in a targeted manner, improving the efficiency and effect of the analysis. The target video to be analyzed is then divided into N segments, and one frame is randomly extracted from each segment as an image to be recognized, which preserves the spatio-temporal relationship of the images while reducing the amount of computation and thus ensures the accuracy of the fall determination. The images to be recognized are then input into the pre-trained behavior recognition model, which outputs the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears; this behavior recognition model improves the accuracy of the fall determination. Finally, the comprehensive expected probability is obtained from the first probability and the second probability, and when it is greater than the preset threshold it is determined that someone has fallen in the target video, enabling accurate determination of fall behavior.
[Description of the Drawings]
To describe the technical solutions of the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a fall action determination method based on a behavior recognition model in an embodiment of this application;
Fig. 2 is a schematic diagram of a fall action determination apparatus based on a behavior recognition model in an embodiment of this application;
Fig. 3 is a schematic diagram of a computer device in an embodiment of this application.
[Detailed Description]
For a better understanding of the technical solutions of this application, the embodiments of this application are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
The terms used in the embodiments of this application are only for the purpose of describing specific embodiments and are not intended to limit this application. The singular forms "a", "said", and "the" used in the embodiments of this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between related objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of this application to describe preset ranges and the like, these preset ranges should not be limited by these terms. These terms are only used to distinguish the preset ranges from one another. For example, without departing from the scope of the embodiments of this application, a first preset range may also be called a second preset range, and similarly a second preset range may also be called a first preset range.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (the stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
Fig. 1 shows a flowchart of the fall action determination method based on a behavior recognition model in this embodiment. The method can be applied to a fall determination system, which can be used to make the determination when a fall determination is needed. The fall determination system can be applied to a computer device, where the computer device is a device capable of human-computer interaction with a user, including but not limited to computers, smart phones, and tablets. As shown in Fig. 1, the fall action determination method based on a behavior recognition model includes the following steps:
S10: Acquire a target video shot by a camera.
Understandably, the computer device where the fall determination system is located may have its own camera, or it may be connected to an external device so as to call the camera of the external device, and the target video is captured through the camera.
S20: Obtain a target video to be analyzed from the target video.
Understandably, in actual shooting the target video grows longer as shooting continues. Obviously, performing fall determination on a long target video is not only computationally expensive but also fails to meet the practical need for determination. Users expect near-real-time fall determination analysis. Therefore, in this embodiment, the target video to be analyzed can be obtained from the target video, so that near-real-time fall determination is realized based on the target video to be analyzed.
Further, the duration of the target video to be analyzed may specifically be a user preset duration. In step S20, obtaining the target video to be analyzed from the target video specifically includes:
S21: Determine a critical moment between old and new images, where the critical moment is used to divide the target video into a first image group and a second image group, and the acquisition time of any image in the first image group is earlier than the acquisition time of any image in the second image group.
Understandably, a video consists of a certain number of frames of images.
Understandably, the computer device where the fall determination system is located maintains two image groups, used respectively to store the first image group, which is older than the critical moment, and the second image group, which is newer than the critical moment. The critical moment may be determined according to a user preset; for example, it may be the moment 2 seconds before the current shooting time. The significance of the critical moment here is to connect the video of the most recent 2 seconds with the video of the preceding 2-4 seconds. Dividing the target video into the first image group and the second image group at the critical moment preserves longer-range information, which effectively avoids the loss of long-range semantics during fall determination; moreover, the first image group and the second image group are updated in real time as time passes, giving the fall determination system a near-real-time determination capability and improving the practicality of fall determination.
S22: Acquire a first target video from the first image group, where the moment of the last frame of the first target video is the critical moment, and the length of the first target video is half of the user preset duration.
S23: Acquire a second target video from the second image group, where the moment of the first frame of the second target video is the critical moment, and the length of the second target video is half of the user preset duration.
S24: Combine the first target video and the second target video in chronological order to obtain the target video to be analyzed.
In steps S22-S24, half-duration videos taken from the first target video and the second target video are combined into the target video to be analyzed. The target video to be analyzed reflects the temporal connection, which helps improve the accuracy of fall determination.
Steps S21-S24 provide a specific implementation for obtaining the target video to be analyzed from the target video: using the critical moment between old and new images, a video segment that is strongly related to the present moment and still preserves the temporal and spatial relationship is cut out of the target video as the target video to be analyzed, which helps improve the accuracy of the subsequent fall determination.
S30: Divide the target video to be analyzed into N segments, and randomly extract one frame from each segment as an image to be recognized, where N is an integer greater than 1.
Understandably, the target video to be analyzed still contains a large number of frames, and processing them directly is computationally expensive. Therefore, in one embodiment, the target video to be analyzed can be divided into N segments, and one frame is randomly extracted from each segment as an image to be recognized; this preserves the spatio-temporal relationship of the images while reducing the amount of computation, which ensures the accuracy of the subsequent fall determination.
S40: Input the images to be recognized into a pre-trained behavior recognition model, and output, through the behavior recognition model, a first probability that someone falls in the target video and a second probability that a falling-down accompanying action appears.
Here, a falling-down accompanying action refers to an action that accompanies a person at the moment of falling, such as supporting the ground with the hands or landing on the back.
The function of the pre-trained behavior recognition model is to output, from the input images to be recognized, the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears. The behavior recognition model combines the falling action and the falling-down accompanying action to determine comprehensively whether someone has fallen, which is more accurate than a determination based on the falling action alone. Understandably, a determination based only on the falling action or only on the accompanying action is generally made from a single picture, whereas this embodiment performs the fall determination on the basis of video and incorporates temporal information, and therefore has higher accuracy.
Further, in step S40, the behavior recognition model can be pre-trained by the following steps:
S411: Acquire a preset number of fall videos as sample videos, where the fall videos are pre-processed to have equal duration, and the duration of the fall videos is the same as the duration of the target video to be analyzed.
S412: Divide each sample video into N sample segments, and randomly extract one frame from each sample segment as an image to be trained, where N is an integer greater than 1.
S413: Use a 2D convolutional neural network to extract the features of each image to be trained, obtaining a feature image of each image to be trained. The 2D convolutional neural network is a 2-dimensional convolutional neural network; understandably, the image to be trained is two-dimensional, and the 2D convolutional neural network can effectively extract the spatial features of a static image. The 2D convolutional neural network includes an input layer, convolutional layers, and pooling layers, where several convolutional and pooling layers are provided in the network (e.g., 16 convolutional layers + 16 pooling layers). In this 2D convolutional neural network, the convolutional layers perform convolution operations on the image to be trained fed by the input layer, specifically using a 7×7 convolution kernel with a stride of 2; the pooling layers perform pooling operations on the values output by the convolutional layers, where the pooling operations include maximum pooling, minimum pooling, and so on. When maximum pooling is used, the largest value within the pooling window (for example, a window of size 3 with a stride of 1) is taken as the output value of that window. The 2D convolutional neural network is used for feature extraction from the images to be trained and does not further classify them.
In this embodiment, the 2D convolutional neural network can be shared by all the images to be trained, which effectively improves computational efficiency.
S414: Obtain a spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments.
In one embodiment, the feature images obtained from the sample segments can be combined along the temporal dimension to obtain a feature map group with spatio-temporal relationship features, that is, the spatio-temporal relationship feature map group.
Further, in step S414, the size of the feature image of the N-th sample segment is expressed as K×A×B, where K is the number of channels obtained by the convolution processing and A×B is the pixel area of the feature image; the N-th feature image can be expressed as {F_N^1, F_N^2, …, F_N^K}, where F_N^1 denotes the first of the K channel feature maps of the N-th sample segment. Further, obtaining the spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments includes: stacking the feature images corresponding to the N sample segments to obtain the spatio-temporal relationship feature map group expressed as {M_1, M_2, …, M_{N-1}, M_N}, which consists of N elements and whose size is expressed as N×K×A×B, where a stacked element is, for example, M_1 = {F_N^1, F_{N-1}^1, …, F_2^1, F_1^1}.
It should be noted that in step S412 each sample video is divided into N sample segments, ordered from the first to the N-th, and one frame is randomly extracted from each sample segment as the image to be trained, so the images to be trained are likewise arranged from the first to the N-th. In this embodiment, the spatio-temporal relationship feature map group is assembled in the order from N down to 1. Specifically, from the first element M_1 = {F_N^1, F_{N-1}^1, …, F_2^1, F_1^1} of the group it can be seen that F_1^1 denotes the first of the K channel feature maps of the first sample segment and F_2^1 denotes the first of the K channel feature maps of the second sample segment; in the expression of M_1, F_1^1 is placed last, which is opposite to the order from the first sample segment to the N-th sample segment, meaning that the elements of the group are combined in the order from N to 1; in other words, the elements of each spatio-temporal relationship feature map group are combined in reverse order.
Understandably, the stacking process combines the feature images along the temporal dimension, stacking feature images with the same index number from different sample segments to obtain a new feature map group. This feature map group is the spatio-temporal relationship feature map group; it combines forward and backward temporal information with the features of the images to be trained, which helps improve the accuracy of the fall determination.
S415: Use a 3D convolutional neural network to extract the spatio-temporal features of the spatio-temporal relationship feature map group.
The 3D convolutional neural network is a convolutional neural network improved relative to the 2D convolutional neural network. Understandably, a 2D convolutional neural network has clear advantages in extracting the spatial features of static images and in tasks such as image classification and detection, but for 3-dimensional objects such as video (which has an additional time-series dimension), because the 2D convolutional neural network does not take into account the motion information of objects along the time dimension between images, its performance in extracting temporal features is mediocre. Therefore, for extracting features from 3-dimensional objects such as video, a 3D convolutional neural network can be used.
Specifically, the convolution kernel used in the 3D convolutional neural network has one more dimension than the kernel used in the 2D convolutional neural network; for example, if the 2D convolutional neural network uses a 7×7 kernel, the 3D convolutional neural network may use a 7×7×64 kernel.
Understandably, the spatio-temporal relationship feature map group obtained in step S414 from the feature images corresponding to the N sample segments is a feature map group with a time-series dimension and has 3 dimensions; therefore, a 3D convolutional neural network can be used to extract the spatio-temporal features of the spatio-temporal relationship feature maps. Specifically, the 3D convolutional neural network includes an input layer, convolutional layers, and pooling layers. In this 3D convolutional neural network, the convolutional layers perform convolution operations on the spatio-temporal relationship feature map group fed by the input layer, specifically using a 7×7×64 convolution kernel with a stride of 2; the pooling layers perform pooling operations on the values output by the convolutional layers, specifically using a pooling window of size 3×3×64 with a stride of 2.
In this embodiment, the spatio-temporal relationship feature map group obtained in step S414 carries spatio-temporal features, and these spatio-temporal features are extracted using the 3D convolutional neural network.
S416: Use a 2D convolutional neural network to extract the deep features of the spatio-temporal relationship feature map group.
It should be noted that this step is a 2D convolution operation on the spatio-temporal relationship feature map group that carries temporal features. The feature extraction performed on the group with a 2D convolutional neural network can extract the deep features of the spatio-temporal relationship feature map group; these deep features are features in the two-dimensional image space and are also valuable for the classification in behavior recognition. The spatio-temporal features extracted from the group by the 3D convolutional neural network and the deep features extracted from the group by the 2D convolutional neural network can both be used as the input features for classification, thereby improving the recognition accuracy of the behavior recognition model.
S417: Connect the spatio-temporal features and the deep features to a preset classifier.
Here, the spatio-temporal features and the deep features are represented as vectors; the elements of the vectors and their ordering reflect the spatio-temporal features and deep features of the images to be trained.
Understandably, the main role of the 3D convolutional neural network is to extract features in space and time, while the main role of the 2D convolutional neural network is to extract deep spatial features. In this embodiment, the emphases and effects of the two different convolutional neural networks can be integrated, making the output of the classifier more reliable.
Specifically, when the spatio-temporal features and the deep features are connected to the classifier, a cascade operation can be used to concatenate the vectors representing them, and the result is connected to the classifier through fully connected layers. Each neuron in a fully connected layer is fully connected to all neurons in the previous layer, integrating the class-discriminative local information from the convolutional or pooling layers. The output value of the last fully connected layer is passed to an output that is connected to the preset classifier. A softmax classifier can be used as this classifier; the softmax classifier maps the connected spatio-temporal and deep features into the (0,1) interval to achieve classification.
S418: Output, through the classifier, the first probability that someone falls in the sample video and the second probability that a falling-down accompanying action appears.
Understandably, the images to be trained obtained in step S412 are labelled in advance, being divided into falling images, falling-down accompanying action images, and normal (neither falling nor accompanying action) images. During training, based on the pre-labelled images to be trained, the softmax classifier outputs the first probability that someone falls in the sample video and the second probability that a falling-down accompanying action appears.
S419: Using a predefined loss function, obtain the loss value generated during model training from the label values of the sample videos and the first and second probabilities.
Understandably, a loss value is generated during the training of the behavior recognition model, which means that errors occur during training that affect the recognition accuracy of the model. For this, a loss function can be predefined and established using mathematical methods for computing loss values. Through the loss function, the loss value is computed during model training from the label values of the sample videos and the first and second probabilities, and the network parameters can be updated according to this loss value, yielding a behavior recognition model with higher recognition accuracy.
S41-10: According to the loss value, update the network parameters of the model with a back-propagation algorithm to obtain the behavior recognition model.
Understandably, for a known loss value, the back-propagation algorithm can be used to update the network parameters of the model according to the loss value until the number of updates reaches a preset threshold or the gradient no longer decreases, at which point the update process ends and the behavior recognition model is obtained.
Steps S411 to S41-10 provide a specific implementation for training the behavior recognition model. During training, the temporal and spatial features of the different sample segments are extracted, so that the extracted features better reflect the spatial distribution characteristics of the sample segments and the temporal relationships between them, giving the trained behavior recognition model the ability to identify fall events with a high accuracy rate.
Further, in step S40, outputting through the behavior recognition model the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears specifically includes:
S421: Use a 2D convolutional neural network to extract the features of each image to be recognized, obtaining a feature image of each image to be recognized.
S422: Obtain a target spatio-temporal relationship feature map group from the feature images corresponding to the N segments.
S423: Use a 3D convolutional neural network to extract the target spatio-temporal features of the target spatio-temporal relationship feature map group.
S424: Use a 2D convolutional neural network to extract the target deep features of the target spatio-temporal relationship feature map group.
S425: Connect the target spatio-temporal features and the target deep features to a preset classifier.
S426: Output, through the classifier, the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears.
Steps S421-S426 provide a specific implementation for outputting, through the behavior recognition model, the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears. In the process of determining a fall event, the spatial and temporal features of the images to be recognized are fully extracted, so that the output first and second probabilities are more accurate.
Understandably, steps S421-S426 describe the process of recognizing behavior with the behavior recognition model; they are similar to the steps for training the behavior recognition model, for which reference may be made to steps S411-S41-10, and are not repeated here.
S50: Obtain a comprehensive expected probability from the first probability and the second probability.
Specifically, the comprehensive expected probability can be obtained by weighted calculation or by a method based on Bayes' theorem, which is not limited here. The comprehensive expected probability also takes into account the person's accompanying actions after the fall; compared with making the fall determination from the first probability alone, making it from the comprehensive expected probability is more accurate.
S60: When the comprehensive expected probability is greater than a preset threshold, determine that someone has fallen in the target video.
In the embodiments of this application, the target video shot by the camera is first acquired and the target video to be analyzed is obtained from it, so that the captured video can be analyzed in a targeted manner, improving the efficiency and effect of the analysis. The target video to be analyzed is then divided into N segments, and one frame is randomly extracted from each segment as an image to be recognized, which preserves the spatio-temporal relationship of the images while reducing the amount of computation and thus ensures the accuracy of the fall determination. The images to be recognized are then input into the pre-trained behavior recognition model, which outputs the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears; this behavior recognition model improves the accuracy of the fall determination. Finally, the comprehensive expected probability is obtained from the first probability and the second probability, and when it is greater than the preset threshold it is determined that someone has fallen in the target video, enabling accurate determination of fall behavior.
It should be understood that the numbering of the steps in the above embodiments does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.
Based on the fall action determination method based on a behavior recognition model provided in the embodiments, an embodiment of this application further provides an apparatus embodiment that implements the steps and methods of the above method embodiment.
Fig. 2 shows a functional block diagram of a fall action determination apparatus based on a behavior recognition model in one-to-one correspondence with the fall action determination method based on a behavior recognition model in the embodiment. As shown in Fig. 2, the apparatus includes a first acquisition module 10, a second acquisition module 20, a third acquisition module 30, a probability output module 40, a fourth acquisition module 50, and a determination module 60. The functions implemented by these modules correspond one-to-one to the corresponding steps of the fall action determination method based on a behavior recognition model in the embodiment; to avoid repetition, this embodiment does not describe each of them in detail.
The first acquisition module 10 is configured to acquire a target video shot by a camera.
The second acquisition module 20 is configured to obtain a target video to be analyzed from the target video.
The third acquisition module 30 is configured to divide the target video to be analyzed into N segments and randomly extract one frame from each segment as an image to be recognized, where N is an integer greater than 1.
The probability output module 40 is configured to input the images to be recognized into a pre-trained behavior recognition model and output, through the behavior recognition model, the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears.
The fourth acquisition module 50 is configured to obtain a comprehensive expected probability from the first probability and the second probability.
The determination module 60 is configured to determine, when the comprehensive expected probability is greater than a preset threshold, that someone has fallen in the target video.
Optionally, the second acquisition module 20 is specifically configured to:
determine a critical moment between old and new images, where the critical moment is used to divide the target video into a first image group and a second image group, and the acquisition time of any image in the first image group is earlier than the acquisition time of any image in the second image group;
acquire a first target video from the first image group, where the moment of the last frame of the first target video is the critical moment and the length of the first target video is half of the user preset duration;
acquire a second target video from the second image group, where the moment of the first frame of the second target video is the critical moment and the length of the second target video is half of the user preset duration;
combine the first target video and the second target video in chronological order to obtain the target video to be analyzed.
Optionally, the behavior recognition model is obtained by a training module, which is specifically configured to:
acquire a preset number of fall videos as sample videos, where the fall videos are pre-processed to have equal duration, and the duration of the fall videos is the same as the duration of the target video to be analyzed;
divide each sample video into N sample segments, and randomly extract one frame from each sample segment as an image to be trained, where N is an integer greater than 1;
use a 2D convolutional neural network to extract the features of each image to be trained, obtaining a feature image of each image to be trained;
obtain a spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments;
use a 3D convolutional neural network to extract the spatio-temporal features of the spatio-temporal relationship feature map group;
use a 2D convolutional neural network to extract the deep features of the spatio-temporal relationship feature map group;
connect the spatio-temporal features and the deep features to a preset classifier;
output, through the classifier, the first probability that someone falls in the sample video and the second probability that a falling-down accompanying action appears;
using a predefined loss function, obtain the loss value generated during model training from the label values of the sample videos and the first and second probabilities;
according to the loss value, update the network parameters of the model with a back-propagation algorithm to obtain the behavior recognition model.
Optionally, the size of the N-th feature image is expressed as K×A×B, where K is the number of feature image channels and A×B is the pixel area of the feature image, and the N-th feature image is expressed as {F_N^1, F_N^2, …, F_N^K}. Obtaining the spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments includes: stacking the feature images corresponding to the N sample segments to obtain the spatio-temporal relationship feature map group expressed as {M_1, M_2, …, M_{N-1}, M_N}, where a stacked element is, for example, M_1 = {F_N^1, F_{N-1}^1, …, F_2^1, F_1^1}.
Optionally, the probability output module 40 is specifically configured to:
use a 2D convolutional neural network to extract the features of each image to be recognized, obtaining a feature image of each image to be recognized;
obtain a target spatio-temporal relationship feature map group from the feature images corresponding to the N segments;
use a 3D convolutional neural network to extract the target spatio-temporal features of the target spatio-temporal relationship feature map group;
use a 2D convolutional neural network to extract the target deep features of the target spatio-temporal relationship feature map group;
connect the target spatio-temporal features and the target deep features to a preset classifier;
output, through the classifier, the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears.
In the embodiments of this application, the target video shot by the camera is first acquired and the target video to be analyzed is obtained from it, so that the captured video can be analyzed in a targeted manner, improving the efficiency and effect of the analysis. The target video to be analyzed is then divided into N segments, and one frame is randomly extracted from each segment as an image to be recognized, which preserves the spatio-temporal relationship of the images while reducing the amount of computation and thus ensures the accuracy of the fall determination. The images to be recognized are then input into the pre-trained behavior recognition model, which outputs the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears; this behavior recognition model improves the accuracy of the fall determination. Finally, the comprehensive expected probability is obtained from the first probability and the second probability, and when it is greater than the preset threshold it is determined that someone has fallen in the target video, enabling accurate determination of fall behavior.
This embodiment provides a computer non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the fall action determination method based on a behavior recognition model in the embodiment; to avoid repetition, details are not repeated here. Alternatively, when the computer-readable instructions are executed by a processor, the functions of the modules/units of the fall action determination apparatus based on a behavior recognition model in the embodiment are realized; to avoid repetition, details are not repeated here.
Fig. 3 is a schematic diagram of a computer device provided by an embodiment of this application. As shown in Fig. 3, the computer device 70 of this embodiment includes a processor 71, a memory 72, and computer-readable instructions 73 stored in the memory 72 and executable on the processor 71. When the computer-readable instructions 73 are executed by the processor 71, the fall action determination method based on a behavior recognition model in the embodiment is implemented; to avoid repetition, details are not repeated here. Alternatively, when the computer-readable instructions 73 are executed by the processor 71, the functions of the models/units of the fall action determination apparatus based on a behavior recognition model in the embodiment are realized; to avoid repetition, details are not repeated here.
The computer device 70 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device 70 may include, but is not limited to, the processor 71 and the memory 72. A person skilled in the art can understand that Fig. 3 is only an example of the computer device 70 and does not constitute a limitation on it; it may include more or fewer components than shown, or combine certain components, or have different components; for example, the computer device may also include input and output devices, network access devices, buses, and so on.
The processor 71 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 72 may be an internal storage unit of the computer device 70, such as a hard disk or memory of the computer device 70. The memory 72 may also be an external storage device of the computer device 70, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 70. Further, the memory 72 may include both an internal storage unit and an external storage device of the computer device 70. The memory 72 is used to store the computer-readable instructions and other programs and data required by the computer device. The memory 72 may also be used to temporarily store data that has been output or is to be output.
A person skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example; in practical applications, the above functions may be allocated to different functional units and modules as required, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to illustrate the technical solutions of this application and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of the technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the protection scope of this application.

Claims (20)

  1. A fall action determination method based on a behavior recognition model, characterized in that the method comprises:
    acquiring a target video shot by a camera;
    obtaining a target video to be analyzed from the target video;
    dividing the target video to be analyzed into N segments, and randomly extracting one frame from each of the segments as an image to be recognized, where N is an integer greater than 1;
    inputting the images to be recognized into a pre-trained behavior recognition model, and outputting, through the behavior recognition model, a first probability that someone falls in the target video and a second probability that a falling-down accompanying action appears;
    obtaining a comprehensive expected probability from the first probability and the second probability;
    when the comprehensive expected probability is greater than a preset threshold, determining that someone has fallen in the target video.
  2. The method according to claim 1, characterized in that the duration of the target video to be analyzed is a user preset duration, and obtaining the target video to be analyzed from the target video comprises:
    determining a critical moment between old and new images, where the critical moment is used to divide the target video into a first image group and a second image group, and the acquisition time of any image in the first image group is earlier than the acquisition time of any image in the second image group;
    acquiring a first target video from the first image group, where the moment of the last frame of the first target video is the critical moment, and the length of the first target video is half of the user preset duration;
    acquiring a second target video from the second image group, where the moment of the first frame of the second target video is the critical moment, and the length of the second target video is half of the user preset duration;
    combining the first target video and the second target video in chronological order to obtain the target video to be analyzed.
  3. The method according to claim 1, characterized in that the behavior recognition model is trained by the following steps:
    acquiring a preset number of fall videos as sample videos, where the fall videos are pre-processed to have equal duration, and the duration of the fall videos is the same as the duration of the target video to be analyzed;
    dividing each sample video into N sample segments, and randomly extracting one frame from each of the sample segments as an image to be trained, where N is an integer greater than 1;
    using a 2D convolutional neural network to extract the features of each image to be trained, obtaining a feature image of each image to be trained;
    obtaining a spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments;
    using a 3D convolutional neural network to extract spatio-temporal features of the spatio-temporal relationship feature map group;
    using a 2D convolutional neural network to extract deep features of the spatio-temporal relationship feature map group;
    connecting the spatio-temporal features and the deep features to a preset classifier;
    outputting, through the classifier, a first probability that someone falls in the sample video and a second probability that a falling-down accompanying action appears;
    using a predefined loss function, obtaining a loss value generated during model training from the label values of the sample videos and the first probability and the second probability;
    according to the loss value, updating the network parameters of the model with a back-propagation algorithm to obtain the behavior recognition model.
  4. The method according to claim 3, characterized in that the size of the N-th feature image is expressed as K×A×B, where K is the number of feature image channels and A×B is the pixel area of the feature image, and the N-th feature image is expressed as {F_N^1, F_N^2, …, F_N^K};
    obtaining the spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments comprises:
    stacking the feature images corresponding to the N sample segments to obtain the spatio-temporal relationship feature map group expressed as {M_1, M_2, …, M_{N-1}, M_N}, where the stacked M_1 = {F_N^1, F_{N-1}^1, …, F_2^1, F_1^1}.
  5. The method according to any one of claims 1-4, characterized in that outputting, through the behavior recognition model, the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears comprises the following steps:
    using a 2D convolutional neural network to extract the features of each image to be recognized, obtaining a feature image of each image to be recognized;
    obtaining a target spatio-temporal relationship feature map group from the feature images corresponding to the N segments;
    using a 3D convolutional neural network to extract target spatio-temporal features of the target spatio-temporal relationship feature map group;
    using a 2D convolutional neural network to extract target deep features of the target spatio-temporal relationship feature map group;
    connecting the target spatio-temporal features and the target deep features to a preset classifier;
    outputting, through the classifier, the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears.
  6. A fall action determination apparatus based on a behavior recognition model, characterized in that the apparatus comprises:
    a first acquisition module, configured to acquire a target video shot by a camera;
    a second acquisition module, configured to obtain a target video to be analyzed from the target video;
    a third acquisition module, configured to divide the target video to be analyzed into N segments and randomly extract one frame from each of the segments as an image to be recognized, where N is an integer greater than 1;
    a probability output module, configured to input the images to be recognized into a pre-trained behavior recognition model and output, through the behavior recognition model, a first probability that someone falls in the target video and a second probability that a falling-down accompanying action appears;
    a fourth acquisition module, configured to obtain a comprehensive expected probability from the first probability and the second probability;
    a determination module, configured to determine, when the comprehensive expected probability is greater than a preset threshold, that someone has fallen in the target video.
  7. The apparatus according to claim 6, characterized in that the duration of the target video to be analyzed is a user preset duration, and the second acquisition module is specifically configured to:
    determine a critical moment between old and new images, where the critical moment is used to divide the target video into a first image group and a second image group, and the acquisition time of any image in the first image group is earlier than the acquisition time of any image in the second image group;
    acquire a first target video from the first image group, where the moment of the last frame of the first target video is the critical moment, and the length of the first target video is half of the user preset duration;
    acquire a second target video from the second image group, where the moment of the first frame of the second target video is the critical moment, and the length of the second target video is half of the user preset duration;
    combine the first target video and the second target video in chronological order to obtain the target video to be analyzed.
  8. The apparatus according to claim 6, characterized in that the behavior recognition model is obtained by a training module, and the training module is specifically configured to:
    acquire a preset number of fall videos as sample videos, where the fall videos are pre-processed to have equal duration, and the duration of the fall videos is the same as the duration of the target video to be analyzed;
    divide each sample video into N sample segments, and randomly extract one frame from each of the sample segments as an image to be trained, where N is an integer greater than 1;
    use a 2D convolutional neural network to extract the features of each image to be trained, obtaining a feature image of each image to be trained;
    obtain a spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments;
    use a 3D convolutional neural network to extract spatio-temporal features of the spatio-temporal relationship feature map group;
    use a 2D convolutional neural network to extract deep features of the spatio-temporal relationship feature map group;
    connect the spatio-temporal features and the deep features to a preset classifier;
    output, through the classifier, a first probability that someone falls in the sample video and a second probability that a falling-down accompanying action appears;
    using a predefined loss function, obtain a loss value generated during model training from the label values of the sample videos and the first probability and the second probability;
    according to the loss value, update the network parameters of the model with a back-propagation algorithm to obtain the behavior recognition model.
  9. The apparatus according to claim 8, characterized in that the size of the N-th feature image is expressed as K×A×B, where K is the number of feature image channels and A×B is the pixel area of the feature image, and the N-th feature image is expressed as {F_N^1, F_N^2, …, F_N^K};
    obtaining the spatio-temporal relationship feature map group from the feature images corresponding to the N sample segments comprises:
    stacking the feature images corresponding to the N sample segments to obtain the spatio-temporal relationship feature map group expressed as {M_1, M_2, …, M_{N-1}, M_N}, where the stacked M_1 = {F_N^1, F_{N-1}^1, …, F_2^1, F_1^1}.
  10. The apparatus according to any one of claims 6-9, characterized in that the probability output module is specifically configured to:
    use a 2D convolutional neural network to extract the features of each image to be recognized, obtaining a feature image of each image to be recognized;
    obtain a target spatio-temporal relationship feature map group from the feature images corresponding to the N segments;
    use a 3D convolutional neural network to extract target spatio-temporal features of the target spatio-temporal relationship feature map group;
    use a 2D convolutional neural network to extract target deep features of the target spatio-temporal relationship feature map group;
    connect the target spatio-temporal features and the target deep features to a preset classifier;
    output, through the classifier, the first probability that someone falls in the target video and the second probability that a falling-down accompanying action appears.
  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring a target video captured by a camera;
    obtaining a target video to be analyzed from the target video;
    dividing the target video to be analyzed into N segments, and randomly extracting one frame of image from each of the segments as an image to be recognized, wherein N is an integer greater than 1;
    inputting the images to be recognized into a pre-trained behavior recognition model, and outputting, through the behavior recognition model, a first probability that a person falls in the target video and a second probability that an action accompanying falling to the ground occurs;
    obtaining a combined expected probability according to the first probability and the second probability;
    when the combined expected probability is greater than a preset threshold, determining that a fall has occurred in the target video.
  12. The computer device according to claim 11, wherein the duration of the target video to be analyzed is a user-preset duration, and when the processor executes the computer-readable instructions to obtain the target video to be analyzed from the target video, the following steps are included:
    determining a critical moment between old and new images, wherein the critical moment between old and new images is used to divide the target video into a first image group and a second image group, and the acquisition moment of any image in the first image group is earlier than the acquisition moment of any image in the second image group;
    obtaining a first target video from the first image group, wherein the moment corresponding to the last frame of the first target video is the critical moment between old and new images, and the length of the first target video is half of the user-preset duration;
    obtaining a second target video from the second image group, wherein the moment corresponding to the first frame of the second target video is the critical moment between old and new images, and the length of the second target video is half of the user-preset duration;
    combining the first target video and the second target video in chronological order to obtain the target video to be analyzed.
  13. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    when pre-training the behavior recognition model, acquiring a preset number of fall videos as sample videos, wherein the durations of the fall videos are pre-processed to be equal in length, and the duration of the fall videos is the same as the duration of the target video to be analyzed;
    dividing each sample video into N sample segments, and randomly extracting one frame of image from each of the sample segments as an image to be trained, wherein N is an integer greater than 1;
    extracting features of each of the images to be trained by using a 2D convolutional neural network, to obtain a feature image of each of the images to be trained;
    obtaining a spatio-temporal relationship feature map group according to the feature images corresponding to the N sample segments;
    extracting spatio-temporal features of the spatio-temporal relationship feature map group by using a 3D convolutional neural network;
    extracting deep features of the spatio-temporal relationship feature map group by using a 2D convolutional neural network;
    feeding the spatio-temporal features and the deep features into a preset classifier;
    outputting, through the classifier, the first probability that a person falls in the sample video and the second probability that an action accompanying falling to the ground occurs;
    using a predefined loss function, obtaining a loss value generated during model training according to the label value of the sample video, the first probability, and the second probability;
    according to the loss value, updating network parameters of the model by using a back-propagation algorithm, to obtain the behavior recognition model.
  14. The computer device according to claim 13, wherein the size of the N-th feature image is denoted as K×A×B, K is the number of channels of the feature image, A×B is the pixel area of the feature image, and the N-th feature image is expressed as
    Figure PCTCN2019117328-appb-100007
    obtaining the spatio-temporal relationship feature map group according to the feature images corresponding to the N sample segments comprises:
    stacking the feature images corresponding to the N sample segments to obtain the spatio-temporal relationship feature map group expressed as {M_1, M_2, …, M_{N-1}, M_N}, wherein, after stacking,
    Figure PCTCN2019117328-appb-100008
  15. The computer device according to any one of claims 11 to 14, wherein when the processor executes the computer-readable instructions to output, through the behavior recognition model, the first probability that a person falls in the target video and the second probability that an action accompanying falling to the ground occurs, the following steps are included:
    extracting features of each of the images to be recognized by using a 2D convolutional neural network, to obtain a feature image of each of the images to be recognized;
    obtaining a target spatio-temporal relationship feature map group according to the feature images corresponding to the N segments;
    extracting target spatio-temporal features of the target spatio-temporal relationship feature map group by using a 3D convolutional neural network;
    extracting target deep features of the target spatio-temporal relationship feature map group by using a 2D convolutional neural network;
    feeding the target spatio-temporal features and the target deep features into a preset classifier;
    outputting, through the classifier, the first probability that a person falls in the target video and the second probability that an action accompanying falling to the ground occurs.
  16. A computer non-volatile readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    acquiring a target video captured by a camera;
    obtaining a target video to be analyzed from the target video;
    dividing the target video to be analyzed into N segments, and randomly extracting one frame of image from each of the segments as an image to be recognized, wherein N is an integer greater than 1;
    inputting the images to be recognized into a pre-trained behavior recognition model, and outputting, through the behavior recognition model, a first probability that a person falls in the target video and a second probability that an action accompanying falling to the ground occurs;
    obtaining a combined expected probability according to the first probability and the second probability;
    when the combined expected probability is greater than a preset threshold, determining that a fall has occurred in the target video.
  17. The computer non-volatile readable storage medium according to claim 16, wherein the duration of the target video to be analyzed is a user-preset duration, and when the computer-readable instructions are executed by one or more processors to obtain the target video to be analyzed from the target video, the following steps are included:
    determining a critical moment between old and new images, wherein the critical moment between old and new images is used to divide the target video into a first image group and a second image group, and the acquisition moment of any image in the first image group is earlier than the acquisition moment of any image in the second image group;
    obtaining a first target video from the first image group, wherein the moment corresponding to the last frame of the first target video is the critical moment between old and new images, and the length of the first target video is half of the user-preset duration;
    obtaining a second target video from the second image group, wherein the moment corresponding to the first frame of the second target video is the critical moment between old and new images, and the length of the second target video is half of the user-preset duration;
    combining the first target video and the second target video in chronological order to obtain the target video to be analyzed.
  18. The computer non-volatile readable storage medium according to claim 16, wherein the computer-readable instructions, when executed by one or more processors, further implement the following steps:
    when pre-training the behavior recognition model, acquiring a preset number of fall videos as sample videos, wherein the durations of the fall videos are pre-processed to be equal in length, and the duration of the fall videos is the same as the duration of the target video to be analyzed;
    dividing each sample video into N sample segments, and randomly extracting one frame of image from each of the sample segments as an image to be trained, wherein N is an integer greater than 1;
    extracting features of each of the images to be trained by using a 2D convolutional neural network, to obtain a feature image of each of the images to be trained;
    obtaining a spatio-temporal relationship feature map group according to the feature images corresponding to the N sample segments;
    extracting spatio-temporal features of the spatio-temporal relationship feature map group by using a 3D convolutional neural network;
    extracting deep features of the spatio-temporal relationship feature map group by using a 2D convolutional neural network;
    feeding the spatio-temporal features and the deep features into a preset classifier;
    outputting, through the classifier, the first probability that a person falls in the sample video and the second probability that an action accompanying falling to the ground occurs;
    using a predefined loss function, obtaining a loss value generated during model training according to the label value of the sample video, the first probability, and the second probability;
    according to the loss value, updating network parameters of the model by using a back-propagation algorithm, to obtain the behavior recognition model.
  19. The computer non-volatile readable storage medium according to claim 18, wherein the size of the N-th feature image is denoted as K×A×B, K is the number of channels of the feature image, A×B is the pixel area of the feature image, and the N-th feature image is expressed as
    Figure PCTCN2019117328-appb-100009
    obtaining the spatio-temporal relationship feature map group according to the feature images corresponding to the N sample segments comprises:
    stacking the feature images corresponding to the N sample segments to obtain the spatio-temporal relationship feature map group expressed as {M_1, M_2, …, M_{N-1}, M_N}, wherein, after stacking,
    Figure PCTCN2019117328-appb-100010
  20. The computer non-volatile readable storage medium according to any one of claims 16 to 19, wherein when the computer-readable instructions are executed by one or more processors to output, through the behavior recognition model, the first probability that a person falls in the target video and the second probability that an action accompanying falling to the ground occurs, the following steps are included:
    extracting features of each of the images to be recognized by using a 2D convolutional neural network, to obtain a feature image of each of the images to be recognized;
    obtaining a target spatio-temporal relationship feature map group according to the feature images corresponding to the N segments;
    extracting target spatio-temporal features of the target spatio-temporal relationship feature map group by using a 3D convolutional neural network;
    extracting target deep features of the target spatio-temporal relationship feature map group by using a 2D convolutional neural network;
    feeding the target spatio-temporal features and the target deep features into a preset classifier;
    outputting, through the classifier, the first probability that a person falls in the target video and the second probability that an action accompanying falling to the ground occurs.
PCT/CN2019/117328 2019-09-16 2019-11-12 Fall action determination method and apparatus based on a behavior recognition model, computer device, and storage medium WO2021051545A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910869615.0 2019-09-16
CN201910869615.0A CN110765860B (zh) 2019-09-16 2019-09-16 Fall determination method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021051545A1 true WO2021051545A1 (zh) 2021-03-25

Family

ID=69329763

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117328 WO2021051545A1 (zh) 2019-09-16 2019-11-12 Fall action determination method and apparatus based on a behavior recognition model, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110765860B (zh)
WO (1) WO2021051545A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598026B (zh) * 2020-05-20 2023-05-30 广州市百果园信息技术有限公司 Action recognition method, apparatus, device, and storage medium
CN111626187B (zh) * 2020-05-25 2023-08-08 京东科技信息技术有限公司 Identity labeling method and apparatus, electronic device, and storage medium
CN111767888A (zh) * 2020-07-08 2020-10-13 北京澎思科技有限公司 Object state detection method, computer device, storage medium, and electronic device
CN111898518A (zh) * 2020-07-28 2020-11-06 中移(杭州)信息技术有限公司 Fall detection method, electronic device, and storage medium
CN111626273B (zh) * 2020-07-29 2020-12-22 成都睿沿科技有限公司 Fall behavior recognition system and method based on temporal characteristics of atomic actions
CN111899470B (zh) * 2020-08-26 2022-07-22 歌尔科技有限公司 Human fall detection method, apparatus, device, and storage medium
CN112580523A (zh) * 2020-12-22 2021-03-30 平安国际智慧城市科技股份有限公司 Behavior recognition method, apparatus, device, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955699B (zh) * 2014-03-31 2017-12-26 北京邮电大学 Real-time fall event detection method based on surveillance video
CN105046882B (zh) * 2015-07-23 2017-09-26 浙江机电职业技术学院 Fall detection method and apparatus
CN106951834B (zh) * 2017-03-03 2020-04-10 沈阳航空航天大学 Fall action detection method based on an elderly-care robot platform
CN108932479A (zh) * 2018-06-06 2018-12-04 上海理工大学 Human abnormal behavior detection method
CN109508638A (zh) * 2018-10-11 2019-03-22 平安科技(深圳)有限公司 Facial emotion recognition method and apparatus, computer device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018210796A1 (en) * 2017-05-15 2018-11-22 Deepmind Technologies Limited Neural network systems for action recognition in videos
CN109522902A (zh) * 2017-09-18 2019-03-26 微软技术许可有限责任公司 Extraction of spatio-temporal feature representations
CN107967441A (zh) * 2017-09-19 2018-04-27 北京工业大学 Video behavior recognition method based on a dual-channel 3D-2D RBM model
CN109726672A (zh) * 2018-12-27 2019-05-07 哈尔滨工业大学 Fall detection method based on human skeleton sequences and convolutional neural networks
CN109886102A (zh) * 2019-01-14 2019-06-14 华中科技大学 Spatio-temporal fall behavior detection method based on depth images
CN110084202A (zh) * 2019-04-29 2019-08-02 东南大学 Video behavior recognition method based on efficient 3D convolution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIZHOU ZHOU, XIAOYAN SUN, ZHENG-JUN ZHA, WENJUN ZENG: "MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 18 June 2018 (2018-06-18), pages 449 - 458, XP033476005 *
YUAN ZHI , HU HUI: "A Fall Detection Method Based on Two-Stream Convolutional Neural Network", JOURNAL OF HENAN NORMAL UNIVERSITY (NATURAL SCIENCE EDITION), vol. 45, no. 3, 8 May 2017 (2017-05-08), pages 96 - 101, XP055792544, ISSN: 1000-2367, DOI: 10.16366/j.cnki.1000-2367.2017.03.014 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128522A (zh) * 2021-05-11 2021-07-16 四川云从天府人工智能科技有限公司 Target recognition method and apparatus, computer device, and storage medium
CN113128522B (zh) * 2021-05-11 2024-04-05 四川云从天府人工智能科技有限公司 Target recognition method and apparatus, computer device, and storage medium
CN113850829A (zh) * 2021-09-28 2021-12-28 深圳万兴软件有限公司 Video shot segmentation method and apparatus based on an efficient deep network, and related components
CN113989938A (zh) * 2021-11-12 2022-01-28 内蒙古科技大学 Behavior recognition method, apparatus, and electronic device
CN114220175A (zh) * 2021-12-17 2022-03-22 广州津虹网络传媒有限公司 Motion pattern recognition method and apparatus, device, medium, and product
CN114067442A (zh) * 2022-01-18 2022-02-18 深圳市海清视讯科技有限公司 Hand-washing action detection method, model training method, apparatus, and electronic device
CN114972419A (zh) * 2022-04-12 2022-08-30 中国电信股份有限公司 Fall detection method, apparatus, medium, and electronic device
CN114972419B (zh) * 2022-04-12 2023-10-03 中国电信股份有限公司 Fall detection method, apparatus, medium, and electronic device
CN116385945A (zh) * 2023-06-06 2023-07-04 山东省人工智能研究院 Video interactive action detection method and system based on random frame interpolation and attention
CN116385945B (zh) * 2023-06-06 2023-08-25 山东省人工智能研究院 Video interactive action detection method and system based on random frame interpolation and attention

Also Published As

Publication number Publication date
CN110765860B (zh) 2023-06-23
CN110765860A (zh) 2020-02-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19945595

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.07.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19945595

Country of ref document: EP

Kind code of ref document: A1