CN110866458A - Multi-user action detection and identification method and device based on three-dimensional convolutional neural network - Google Patents
- Publication number: CN110866458A (application number CN201911032206.1A)
- Authority
- CN
- China
- Prior art keywords: person, sequence, video, neural network, convolutional neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V 40/20: Movements or behaviour, e.g. gesture recognition (G06V 40/00, recognition of biometric, human-related or animal-related patterns in image or video data)
- G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting (G06F 18/21, design or setup of recognition systems or techniques)
- G06N 3/045: Combinations of networks (G06N 3/04, neural network architecture)
- G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (G06V 20/40, scene-specific elements in video content)
- G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames (G06V 20/40, scene-specific elements in video content)
Abstract
The application discloses a multi-person action detection and identification method and device based on a three-dimensional convolutional neural network. The method comprises: preprocessing the videos of a training data set to obtain a body-action sequence for each person; and inputting each extracted body-action sequence into a three-dimensional convolutional neural network model for action detection and identification, wherein the model comprises two 3D convolution-pooling units, two convolutional layers, two max-pooling layers, one flatten layer, two fully connected layers, and an output layer. The device comprises a preprocessing module and a detection and identification module. The method and the device can be applied to multi-person action detection, and the three-dimensional convolutional neural network model shows good generalization capability and higher identification accuracy when applied to different types of video data.
Description
Technical Field
The application relates to the field of video processing, and in particular to a multi-person action detection and identification method and device based on a three-dimensional convolutional neural network.
Background
Human motion detection and recognition is a specific application of video processing that aims to recognize human actions and activities from the behavior of subjects and their surrounding environment, and it is a hot problem in the field of computer vision. Human motion detection is an important task in many applications, such as video surveillance systems, video retrieval, and multimedia applications involving human motion. Human action recognition methods can be divided into two broad categories: detection-and-recognition methods and classification methods. Detection-and-recognition methods first detect a person's motion and then recognize the action. These methods are validated on conventional surveillance data sets such as KTH, Weizmann, IXMAS, UCF-ARG, and PETS. These data sets are recorded under controlled conditions, so that each individual is in the best position relative to the camera, with a simple static background and similar lighting variations. Classification methods, in contrast, classify videos directly according to the actions they contain. These methods are evaluated on newer data sets consisting of videos collected from the web (for example from YouTube) or recorded with mobile cameras under practical conditions, with complex backgrounds and uncontrolled lighting; examples include Hollywood, Hollywood2, UCF Sports, UCF50, UCF101, HMDB51, and HMDB. Such data sets explore the diversity of video content (human body size, changes in body position, and so on), changes in camera motion, and the varied backgrounds of such recordings. To improve human action recognition performance, most recent studies adopt deep learning models of various kinds. Since human behavior is derived from multiple movements of the body or its parts, the recognition process must involve video processing in order to understand the patterns of change in visual appearance.
Many approaches feed several input streams into deep learning models to recognize human activities. For example, one line of work recognizes human behavior using CNN models combined with long short-term memory (LSTM) networks; another uses multiple features with CNN models, taking raw frames, optical flow, and stacked motion difference images as inputs for human behavior recognition; yet another uses six-stream features for general action recognition, with inputs including the full image, a human image containing only the human body, and the optical-flow result of each of the preceding features.
The methods described above have the following drawbacks:
1. most existing methods target single-person motion detection and identification; because multi-person motion detection is considerably more complex, methods aimed at it are scarce;
2. existing multi-person motion detection and identification methods often suffer from low identification accuracy because of the complexity and diversity of multi-person scenes;
3. existing methods are usually developed for one specific type of video data set, for example tested only on conventional surveillance data sets or only on video data sets collected from the web; few models can be applied to both types of data sets at once, i.e., existing models generalize poorly.
Disclosure of Invention
It is an object of the present application to overcome, or at least partially solve or mitigate, the above problems.
According to one aspect of the application, a method for detecting and identifying actions of multiple persons based on a three-dimensional convolutional neural network is provided, and the method comprises the following steps:
preprocessing a video of a training data set to obtain a sequence of body actions of each person;
and inputting the extracted body-action sequence of each person into a three-dimensional convolutional neural network model for action detection and identification, wherein the three-dimensional convolutional neural network model comprises two 3D convolution-pooling units, two convolutional layers, two max-pooling layers, one flatten layer, two fully connected layers, and an output layer.
Optionally, the preprocessing the video of the training data set includes:
separating the human body from the video of the training data set;
and extracting a sequence of the body action of each person from the video after the human body is separated.
Optionally, the separating the human body from the video of the training data set includes:
determining a background frame using a background estimation method that takes the sum of absolute differences (SAD) between every two consecutive images as an initialization value and uses the entropy of each block;
and computing the difference between each frame and the background frame to obtain an absolute difference image, then performing structure-texture decomposition on the absolute difference image to obtain the region of the moving object, thereby completing human body separation.
Optionally, the sequence of the body actions of each person is extracted from the human-separated video using an extended version of a tracking method based on the kernelized correlation filter.
Optionally, the preprocessing further comprises extracting motion history images from the video of the training data set;
and combining the extracted motion history image with the body-action sequence of each person and inputting them together into the three-dimensional convolutional neural network model for action detection and identification.
According to another aspect of the present application, there is provided a multi-person motion detection and recognition apparatus based on a three-dimensional convolutional neural network, comprising:
a pre-processing module configured to pre-process the video of the training data set to obtain a sequence of body movements of each person;
and a detection and identification module configured to input the extracted body-action sequence of each person into a three-dimensional convolutional neural network model for action detection and identification, wherein the three-dimensional convolutional neural network model comprises two 3D convolution-pooling units, two convolutional layers, two max-pooling layers, one flatten layer, two fully connected layers, and an output layer.
Optionally, the preprocessing module includes:
a human body separation submodule configured to separate a human body from a video of a training data set;
a sequence extraction sub-module configured to extract a sequence of body movements of each person from the video of the separated human body.
Optionally, the human body separation submodule includes:
a background frame submodule configured to determine a background frame using a background estimation method that takes the SAD between every two consecutive images as an initialization value and uses the entropy of each block;
and a structure-texture decomposition submodule configured to compute an absolute difference image against the background frame and then perform structure-texture decomposition on it to obtain the region of the moving object and complete human body separation.
Optionally, the sequence of the body actions of each person is extracted from the human-separated video using an extended version of a tracking method based on the kernelized correlation filter.
Optionally, the preprocessing module further includes:
a history image extraction sub-module configured to extract motion history images from the video of the training data set;
and in the detection and identification module, the extracted motion history image is combined with the body-action sequence of each person, and the two are jointly input into the three-dimensional convolutional neural network model for action detection and identification.
The application discloses a multi-person action detection and identification method and device based on a three-dimensional convolutional neural network. Because the training data are preprocessed, a sequence is extracted for every person, and an improved three-dimensional convolutional neural network model is adopted to detect and identify human actions, the method and the device can be applied to multi-person action detection, and the model shows good generalization capability and higher identification accuracy when applied to different types of video data.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a schematic flow chart diagram of a method for multi-user motion detection and recognition based on a three-dimensional convolutional neural network according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a multi-user motion detection and recognition apparatus based on a three-dimensional convolutional neural network according to an embodiment of the present application;
FIG. 3 is a block schematic diagram of a computing device of one embodiment of the present application;
fig. 4 is a schematic block diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Fig. 1 is a schematic flow diagram of a method for multi-person motion detection and recognition based on a three-dimensional convolutional neural network, according to an embodiment of the present application, which may generally include:
s1, preprocessing the video of the training data set to obtain a sequence of the body actions of each person:
Before model training on a training data set, the data set needs to be preprocessed: the human body is separated from each video, and a sequence of body movements during the motion is extracted. Preprocessing is completed using a background modeling technique, specifically as follows. A background frame can be determined quickly and efficiently using the SAD (sum of absolute differences) between every two consecutive images as an initialization value, followed by a background estimation method based on the entropy of each block. To minimize the information in each frame and reduce noise and falsely modeled areas, the cvSub() function in OpenCV may be used to subtract each frame from the background frame, yielding an absolute difference image; a structure-texture decomposition of this image is then computed, and only the structural component, which contains the uniform part of the image, is used in subsequent steps. At this point the region of the moving object has been obtained, and the preprocessed data is then resized.
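The background-differencing step above can be sketched in a few lines of NumPy. This is a hedged illustration, not the patent's implementation: the background candidate is chosen here simply as the frame with the smallest SAD against its neighbour, standing in for the full SAD-plus-block-entropy estimation, and the block-entropy computation is shown separately. All function names and parameters are hypothetical.

```python
import numpy as np

def estimate_background(frames):
    """Pick a background candidate: the frame whose SAD against the next
    frame is smallest (i.e. the pair with the least motion). A simplified
    stand-in for the SAD-initialized background estimation in the text."""
    sads = [np.abs(frames[i].astype(np.int32) - frames[i + 1].astype(np.int32)).sum()
            for i in range(len(frames) - 1)]
    return frames[int(np.argmin(sads))]

def block_entropy(img, block=8):
    """Per-block Shannon entropy of a grayscale image, the second cue the
    described background evaluation method relies on."""
    h, w = img.shape
    ent = np.zeros((h // block, w // block))
    for i in range(ent.shape[0]):
        for j in range(ent.shape[1]):
            patch = img[i * block:(i + 1) * block, j * block:(j + 1) * block]
            p = np.bincount(patch.ravel(), minlength=256) / patch.size
            p = p[p > 0]
            ent[i, j] = -(p * np.log2(p)).sum()
    return ent

def abs_diff_image(frame, background):
    """Absolute difference image between a frame and the background
    (the role played by cvSub()/cvAbsDiff() in OpenCV)."""
    return np.abs(frame.astype(np.int32) - background.astype(np.int32)).astype(np.uint8)
```

Structure-texture decomposition of the resulting difference image is a separate step (e.g. total-variation based) and is not sketched here.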
When there are many moving objects or people in a scene, each person must be detected and tracked so that a sequence can be generated per person. For this purpose, an extended version of a tracking method based on the kernelized correlation filter (KCF) is used, and a sequence representing each person's body actions is extracted from the tracking results. However, these sequences, being RGB video, may contain redundant information such as the static background; for this reason, motion history images (MHI) are extracted from the videos, combined with the sequences, and jointly fed into a three-dimensional convolutional neural network model (3DCNN) for training. Using MHI improves recognition accuracy and reduces recognition time.
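The motion history image idea can be sketched as follows; this is a minimal NumPy illustration under stated assumptions, not the patent's exact MHI formulation. Each MHI pixel records the most recent timestamp at which motion was detected there, and stale entries are cleared after a fixed duration (cv2.motempl.updateMotionHistory in opencv-contrib behaves similarly). The frame-differencing motion mask and all names and thresholds are illustrative.

```python
import numpy as np

def update_mhi(mhi, motion_mask, timestamp, duration):
    """One MHI update step: moving pixels take the current timestamp;
    entries older than `duration` are zeroed."""
    mhi = mhi.copy()
    mhi[motion_mask] = timestamp
    mhi[(~motion_mask) & (mhi < timestamp - duration)] = 0
    return mhi

def mhi_from_frames(frames, thresh=30, duration=5.0):
    """Build an MHI from grayscale frames by thresholding consecutive
    absolute differences to obtain the motion mask at each step."""
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for t in range(1, len(frames)):
        diff = np.abs(frames[t].astype(np.int32) - frames[t - 1].astype(np.int32))
        mhi = update_mhi(mhi, diff > thresh, float(t), duration)
    return mhi
```

Recent motion thus appears bright and older motion fades, which is why an MHI compactly summarizes where and how recently each person moved.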
S2, inputting the extracted sequence of each person's body actions into a three-dimensional convolutional neural network model for action detection and recognition, wherein the three-dimensional convolutional neural network model comprises two 3D convolution-pooling units, two convolutional layers, two max-pooling layers, one flatten layer, two fully connected layers, and an output layer:
A three-dimensional convolutional neural network (3DCNN) is a supervised learning model with a multi-level deep network that can learn many invariant features from an input video; convolution and pooling are its main building blocks. In this embodiment, the 3DCNN model architecture comprises two 3D convolution-pooling units, two convolutional layers, two max-pooling layers, one flatten layer, two fully connected layers, and an output layer; the output layer contains ten neurons, corresponding to the number of action classes, and the activation function is the ReLU function. The MHI obtained in the previous step is combined with each person's sequence, and the two are jointly fed into the three-dimensional convolutional neural network model for training.
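The two building blocks named above, 3D convolution and 3D max pooling, can be illustrated with a minimal NumPy forward pass. This is a sketch of the operations only, not the embodiment's network (which would normally be built with a deep learning framework such as Keras Conv3D/MaxPooling3D layers); shapes and names are assumptions.

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid 3D convolution (cross-correlation, as in CNNs) of a
    (T, H, W) volume with a (t, h, w) kernel."""
    T, H, W = volume.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = (volume[i:i+t, j:j+h, k:k+w] * kernel).sum()
    return out

def max_pool3d(volume, size=2):
    """Non-overlapping 3D max pooling; any trailing remainder is dropped."""
    T, H, W = (d // size for d in volume.shape)
    v = volume[:T*size, :H*size, :W*size]
    return v.reshape(T, size, H, size, W, size).max(axis=(1, 3, 5))

def relu(x):
    """ReLU activation, used throughout the embodiment's model."""
    return np.maximum(x, 0.0)
```

Stacking conv3d, relu, and max_pool3d twice, flattening, and applying two dense layers plus a ten-way output mirrors the layer sequence the embodiment describes.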
The method described in this embodiment uses the KTH, Weizmann, and UCF-ARG data sets, three conventional surveillance data sets, for model training. Model testing uses the PETS, UCF101, Hollywood, and MHAD data sets, that is, one conventional surveillance data set and three video data sets collected from the web; the test data are fed directly into the trained 3DCNN for multi-person detection and recognition, without any preprocessing. The resolution of the data set was 32 × 32, the temporal depth was 10, the batch size during training was 128, and the learning rate was 1. Tests on these different types of data sets show that, compared with the prior art, the generalization capability and accuracy of the 3DCNN model of this embodiment on different data sets are improved.
Fig. 2 is a schematic block diagram of a multi-person motion detection and recognition apparatus based on a three-dimensional convolutional neural network according to an embodiment of the present application, which may generally include:
a pre-processing module 1 configured to pre-process the video of the training data set, obtaining a sequence of body movements of each person:
Before model training on a training data set, the data set needs to be preprocessed: the human body is separated from each video, and a sequence of body movements during the motion is extracted. Preprocessing is completed using a background modeling technique, specifically as follows. A background frame can be determined quickly and efficiently using the SAD (sum of absolute differences) between every two consecutive images as an initialization value, followed by a background estimation method based on the entropy of each block. To minimize the information in each frame and reduce noise and falsely modeled areas, the cvSub() function in OpenCV may be used to subtract each frame from the background frame, yielding an absolute difference image; a structure-texture decomposition of this image is then computed, and only the structural component, which contains the uniform part of the image, is used in subsequent steps. At this point the region of the moving object has been obtained, and the preprocessed data is then resized.
When there are many moving objects or people in a scene, each person must be detected and tracked so that a sequence can be generated per person. For this purpose, an extended version of a tracking method based on the kernelized correlation filter (KCF) is used, and a sequence representing each person's body actions is extracted from the tracking results. However, these sequences, being RGB video, may contain redundant information such as the static background; for this reason, motion history images (MHI) are extracted from the videos, combined with the sequences, and jointly fed into a three-dimensional convolutional neural network model (3DCNN) for training. Using MHI improves recognition accuracy and reduces recognition time.
A detection and recognition module 2 configured to input the extracted sequence of each person's body actions into a three-dimensional convolutional neural network model for action detection and recognition, the three-dimensional convolutional neural network model comprising two 3D convolution-pooling units, two convolutional layers, two max-pooling layers, one flatten layer, two fully connected layers, and an output layer:
A three-dimensional convolutional neural network (3DCNN) is a supervised learning model with a multi-level deep network that can learn many invariant features from an input video; convolution and pooling are its main building blocks. In this embodiment, the 3DCNN model architecture comprises two 3D convolution-pooling units, two convolutional layers, two max-pooling layers, one flatten layer, two fully connected layers, and an output layer; the output layer contains ten neurons, corresponding to the number of action classes, and the activation function is the ReLU function. The MHI obtained in the previous step is combined with each person's sequence, and the two are jointly fed into the three-dimensional convolutional neural network model for training.
The apparatus of this embodiment uses the KTH, Weizmann, and UCF-ARG data sets, three conventional surveillance data sets, for model training. Model testing uses the PETS, UCF101, Hollywood, and MHAD data sets, that is, one conventional surveillance data set and three video data sets collected from the web; the test data are fed directly into the trained 3DCNN for multi-person detection and recognition, without any preprocessing. The resolution of the data set was 32 × 32, the temporal depth was 10, the batch size during training was 128, and the learning rate was 1. Tests on these different types of data sets show that, compared with the prior art, the generalization capability and accuracy of the 3DCNN model of this embodiment on different data sets are improved.
Embodiments of the application also provide a computing device. Referring to fig. 3, the computing device comprises a memory 1120, a processor 1110, and a computer program stored in the memory 1120 and executable by the processor 1110; the computer program is stored in a space 1130 for program code in the memory 1120 and, when executed by the processor 1110, implements the method steps 1131 for performing any of the methods according to the invention.
The embodiment of the application also provides a computer-readable storage medium. Referring to fig. 4, the computer-readable storage medium comprises a storage unit for program code, in which a program 1131' for performing the method steps according to the invention is stored; the program is executed by a processor.
The embodiment of the application also provides a computer program product containing instructions which, when run on a computer, cause the computer to carry out the steps of the method according to the invention.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
Those skilled in the art will further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk), and any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A multi-person action detection and identification method based on a three-dimensional convolutional neural network comprises the following steps:
preprocessing a video of a training data set to obtain a sequence of body actions of each person;
and inputting the extracted body-action sequence of each person into a three-dimensional convolutional neural network model for action detection and identification, wherein the three-dimensional convolutional neural network model comprises two 3D convolution-pooling units, two convolutional layers, two max-pooling layers, one flatten layer, two fully connected layers, and an output layer.
2. The method of claim 1, wherein preprocessing the video of the training data set comprises:
separating the human body from the video of the training data set;
and extracting a sequence of the body action of each person from the video after the human body is separated.
3. The method of claim 2, wherein the separating the human body from the video of the training data set comprises:
determining a background frame using a background estimation method that takes the SAD between every two consecutive images as an initialization value and uses the entropy of each block;
and computing the difference between each frame and the background frame to obtain an absolute difference image, then performing structure-texture decomposition on the absolute difference image to obtain the region of the moving object, thereby completing human body separation.
4. The method of claim 2, wherein the sequence of body actions of each person is extracted from the human-separated video using an extended version of a tracking method based on the kernelized correlation filter.
5. The method according to any one of claims 1 to 4,
the preprocessing further comprises extracting motion history images from the video of the training data set;
and combining the extracted motion history images with the sequence of body actions of each person, and inputting the combination into the three-dimensional convolutional neural network model for action detection and identification.
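A motion history image (MHI) collapses a clip's motion into a single frame whose intensity encodes how recently each pixel moved. The following is a minimal numpy sketch, not taken from the patent; the duration `tau`, decay `delta`, and difference threshold are illustrative assumptions.

```python
import numpy as np

def motion_history(frames, tau=255, delta=32, thresh=30):
    # Pixels that just moved get the maximal value tau; older motion decays
    # by delta per frame, so brighter pixels mean more recent motion.
    mhi = np.zeros(frames[0].shape, np.float32)
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur.astype(np.int16) - prev.astype(np.int16)) > thresh
        mhi = np.where(moving, float(tau), np.maximum(mhi - delta, 0))
    return mhi.astype(np.uint8)

frames = [np.zeros((8, 8), np.uint8) for _ in range(3)]
frames[1][2, 2] = 255   # motion appears between frames 0 and 1
frames[2][2, 2] = 255   # the person stays put, so this motion ages
frames[2][5, 5] = 255   # fresh motion between frames 1 and 2
mhi = motion_history(frames)
print(mhi[5, 5], mhi[2, 2])   # recent motion bright, older motion dimmer
```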
6. A multi-person action detection and identification device based on a three-dimensional convolutional neural network comprises:
a pre-processing module configured to pre-process the video of the training data set to obtain a sequence of body movements of each person;
and the detection and identification module is configured to input the extracted sequence of body actions of each person into a three-dimensional convolutional neural network model for action detection and identification, wherein the three-dimensional convolutional neural network model comprises two 3D convolution-pooling units, two convolutional layers, two max-pooling layers, a flatten layer, two fully connected layers, and an output layer.
7. The apparatus of claim 6, wherein the preprocessing module comprises:
a human body separation submodule configured to separate a human body from a video of a training data set;
a sequence extraction sub-module configured to extract a sequence of body movements of each person from the video of the separated human body.
8. The apparatus of claim 7, wherein the human body separating submodule comprises:
a background frame sub-module configured to determine a background frame using a background estimation method based on the initialization value of the sum of absolute differences (SAD) between each pair of consecutive images and the entropy of each block;
and a structure-texture decomposition sub-module configured to subtract the determined background frame from the video frames to obtain an absolute difference image, and then perform structure-texture decomposition on the absolute difference image to obtain the region of the moving object, thereby completing human body separation.
9. The apparatus of claim 7, wherein the sequence of body actions of each person is extracted from the video after human body separation using an extended version of a kernelized correlation filter (KCF) based tracking method.
10. The apparatus according to any one of claims 6-9,
the preprocessing module further comprises:
a history image extraction sub-module configured to extract motion history images from the video of the training data set;
and in the detection and identification module, the extracted motion history images are combined with the sequence of body actions of each person, and the combination is input into the three-dimensional convolutional neural network model for action detection and identification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911032206.1A CN110866458A (en) | 2019-10-28 | 2019-10-28 | Multi-user action detection and identification method and device based on three-dimensional convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110866458A (en) | 2020-03-06 |
Family
ID=69653544
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911032206.1A Pending CN110866458A (en) | 2019-10-28 | 2019-10-28 | Multi-user action detection and identification method and device based on three-dimensional convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110866458A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919011A (en) * | 2019-01-28 | 2019-06-21 | 浙江工业大学 | A kind of action video recognition methods based on more duration informations |
Non-Patent Citations (2)
Title |
---|
NOOR ALMAADEED等: ""A Novel Approach for Robust Multi Human Action Detection and Recognition based on 3-Dimentional Convolutional Neural Networks"", 《PATTERN RECOGNITION LETTERS》 * |
OMAR ELHARROUSS等: ""Moving object detection zone using a block based background model"", 《IET COMPUTER VISION》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112131995A (en) * | 2020-09-16 | 2020-12-25 | 北京影谱科技股份有限公司 | Action classification method and device, computing equipment and storage medium |
CN112530144A (en) * | 2020-11-06 | 2021-03-19 | 华能国际电力股份有限公司上海石洞口第一电厂 | Method and system for warning violation behaviors of thermal power plant based on neural network |
CN112530144B (en) * | 2020-11-06 | 2022-06-28 | 华能国际电力股份有限公司上海石洞口第一电厂 | Method and system for warning violation behaviors of thermal power plant based on neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Teoh et al. | Face recognition and identification using deep learning approach | |
CN109558832B (en) | Human body posture detection method, device, equipment and storage medium | |
CN112446270B (en) | Training method of pedestrian re-recognition network, pedestrian re-recognition method and device | |
Wang et al. | Hierarchical attention network for action recognition in videos | |
CN111709311B (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion | |
CN109472191B (en) | Pedestrian re-identification and tracking method based on space-time context | |
CN107067413B (en) | A kind of moving target detecting method of time-space domain statistical match local feature | |
Seal et al. | Human face recognition using random forest based fusion of à-trous wavelet transform coefficients from thermal and visible images | |
Allaert et al. | Micro and macro facial expression recognition using advanced local motion patterns | |
CN113378649A (en) | Identity, position and action recognition method, system, electronic equipment and storage medium | |
CN111353399A (en) | Tamper video detection method | |
CN111814690A (en) | Target re-identification method and device and computer readable storage medium | |
CN110866458A (en) | Multi-user action detection and identification method and device based on three-dimensional convolutional neural network | |
Ali et al. | Deep Learning Algorithms for Human Fighting Action Recognition. | |
Sowmyayani et al. | Fall detection in elderly care system based on group of pictures | |
Putra | A Novel Method for Handling Partial Occlusion on Person Re-identification using Partial Siamese Network | |
Al-Dmour et al. | Masked face detection and recognition system based on deep learning algorithms | |
Ghafoor et al. | Egocentric video summarization based on people interaction using deep learning | |
Manssor et al. | TIRFaceNet: thermal IR facial recognition | |
Zhang et al. | ATMLP: Attention and Time Series MLP for Fall Detection | |
Phang et al. | Real-time multi-camera multi-person action recognition using pose estimation | |
Pushparaj et al. | Using 3D convolutional neural network in surveillance videos for recognizing human actions. | |
Wong et al. | Multi-Camera Face Detection and Recognition in Unconstrained Environment | |
Muhamad et al. | A comparative study using improved LSTM/GRU for human action recognition | |
Voronin et al. | Action recognition using the 3D dense microblock difference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200306 |