CN110991278A - Human body action recognition method and device in video of computer vision system

Human body action recognition method and device in video of computer vision system

Info

Publication number
CN110991278A
Authority
CN
China
Prior art keywords
video
action
extracting
data preprocessing
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911142506.5A
Other languages
Chinese (zh)
Inventor
吉长江 (Ji Changjiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Moviebook Technology Corp Ltd
Original Assignee
Beijing Moviebook Technology Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Moviebook Technology Corp Ltd filed Critical Beijing Moviebook Technology Corp Ltd
Priority to CN201911142506.5A
Publication of CN110991278A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a device for recognizing human actions in videos of a computer vision system, and relates to the field of human action recognition. The method comprises the following steps: acquiring a video of a computer vision system and performing data preprocessing on the video; extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm; training a deep learning model for human actions according to the extracted feature information; and classifying and recognizing a video to be recognized using the trained deep learning model. The device comprises an acquisition module, an extraction module, a training module, and a recognition module. The method and the device improve the accuracy of action recognition and of classification.

Description

Human body action recognition method and device in video of computer vision system
Technical Field
The present application relates to the field of human motion recognition, and in particular, to a method and an apparatus for recognizing human motion in a video of a computer vision system.
Background
In earlier research on traditional algorithms, action recognition typically required manually designed rules to extract feature descriptors. Among feature extraction methods, global features contain more information about the human body and describe the moving object as a whole, but this approach is sensitive to noise, occlusion, and similar changes. Local features such as spatio-temporal interest points do not depend on segmentation, localization, and tracking of the moving human body and are less sensitive to noise and occlusion, but stable spatio-temporal interest points are difficult to extract. Algorithms that combine global and local features, fusing the advantages of both, are more common. For example, joint positions are detected, different types of motion models are trained using multi-pose estimation, semantic spatial relations between people are extracted, and human action recognition is achieved in combination with appearance features. Trajectory features in a video sequence can be obtained using an optical flow field, after which histogram-of-optical-flow, motion-direction-histogram, and motion-boundary-histogram features are extracted to build a motion descriptor; during feature extraction along the motion trajectory, the influence of camera motion is removed from the optical flow image, further improving the accuracy of human action recognition. Unlike methods based on RGB image data, there are algorithms based on depth image input including 3D joint points, such as geometric-relation modeling that simulates body parts using rotations and translations in 3D space, and a motion-feature acquisition method based on depth motion maps that uses an L2-regularized collaborative-representation classifier for action classification.
With the development of deep neural networks such as convolutional neural networks and long short-term memory networks, human action recognition has made considerable progress, and the performance of these algorithms has surpassed that of traditional ones. Researchers have proposed a two-stream CNN (Convolutional Neural Network) structure, in which a spatial stream processes single frames and a temporal stream processes dense optical flow information over consecutive frames; the spatial and temporal networks are trained together and their results are finally fused for classification. Building on the two-stream method, a temporal segment network has been proposed to handle the long time spans of video sequences, analyzing the whole action video with uniform sparse sampling and video-level supervision. Combining a CNN with an LSTM (Long Short-Term Memory) network improves classification performance by letting memory units effectively represent the order of frames. A C3D network has also been proposed to extract spatio-temporal features from video, achieving good experimental results; 3D convolution captures temporal information well.
However, recognition of human interactions still faces many problems. First, the space of human interaction features is complex. In real conditions, the spatial complexity of actions makes extracting human action features very difficult, especially for multi-person interaction recognition. Actions are also prone to occlusion: during an action there may be occlusion between a person and the background, mutual occlusion between people, and self-occlusion, all of which make detection of the moving individuals more complex. In addition, dynamic background changes during action execution, together with differing backgrounds, illumination, and resolutions and interference from irrelevant persons, make human action recognition considerably harder. Second, action characteristics differ across time stages. In human interactions, different actions have different completion periods; for example, a handshake action and a charging action take different amounts of time, and the same action performed by different people takes different lengths of time. For the same action, the amount of feature information also differs across its time stages: in the starting and ending stages the feature differences between actions are small, whereas in the execution stage they are pronounced. These stage-dependent feature differences make action recognition harder. Acquiring key frames during action execution is therefore important for feature extraction, and incorporating the action's time stage into video down-sampling, so as to obtain key frames carrying more action feature information, is a challenge for human interaction recognition. Third, the interaction feature information is complex. For single-person action recognition, only the motion features of one person need to be considered, such as global features (optical flow, silhouettes) or local features (local spatio-temporal interest points). For feature extraction in human interactions, not only the motion features of each moving individual but also the interaction information between persons, such as their relative position and orientation during the action, must be considered. The feature extraction process also suffers interference from individual actions unrelated to the interaction.
Disclosure of Invention
It is an object of the present application to overcome the above problems, or at least to partially solve or mitigate them.
According to one aspect of the application, a method for recognizing human actions in a video of a computer vision system is provided, comprising the following steps:
acquiring a video of a computer vision system and performing data preprocessing on the video;
extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm;
training a deep learning model for human actions according to the extracted feature information;
and classifying and recognizing a video to be recognized using the trained deep learning model.
Optionally, extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm comprises:
extracting feature information from the preprocessed video with a parallel multi-feature fusion algorithm based on an Inception network and a ResNet deep residual network.
Optionally, performing data preprocessing on the video comprises:
when two-person interactive action recognition is performed, clipping and dividing the video of the two-person interactive action into two action videos each containing only a single person.
Optionally, extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm comprises:
extracting feature information separately from the whole video and from the segmented single-person videos using the parallel multi-feature fusion network algorithm.
Optionally, the method further comprises:
during feature information extraction, adopting a Gaussian-model down-sampling method that fuses time-stage characteristics, assigning different sampling intervals to the human bodies at different time stages of the interactive action, and removing redundant information.
According to another aspect of the present application, an apparatus for recognizing human actions in a video of a computer vision system is provided, comprising:
an acquisition module configured to acquire a video of a computer vision system and perform data preprocessing on the video;
an extraction module configured to extract feature information from the preprocessed video using a parallel multi-feature fusion network algorithm;
a training module configured to train a deep learning model for human actions according to the extracted feature information;
and a recognition module configured to classify and recognize a video to be recognized using the trained deep learning model.
Optionally, the extraction module is specifically configured to:
extract feature information from the preprocessed video with a parallel multi-feature fusion algorithm based on an Inception network and a ResNet deep residual network.
Optionally, the acquisition module is specifically configured to:
when two-person interactive action recognition is performed, clip and divide the video of the two-person interactive action into two action videos each containing only a single person.
Optionally, the extraction module is specifically configured to:
extract feature information separately from the whole video and from the segmented single-person videos using the parallel multi-feature fusion network algorithm.
Optionally, the extraction module is further configured to:
during feature information extraction, adopt a Gaussian-model down-sampling method that fuses time-stage characteristics, assign different sampling intervals to the human bodies at different time stages of the interactive action, and remove redundant information.
According to yet another aspect of the application, there is provided a computing device comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor implements the method as described above when executing the computer program.
According to yet another aspect of the application, a computer-readable storage medium, preferably a non-volatile readable storage medium, is provided, having stored therein a computer program which, when executed by a processor, implements a method as described above.
According to yet another aspect of the application, there is provided a computer program product comprising computer readable code which, when executed by a computer device, causes the computer device to perform the method described above.
According to the technical solution of the application, a video of a computer vision system is acquired and preprocessed; a parallel multi-feature fusion network algorithm is used to extract feature information from the preprocessed video; a deep learning model is trained for human actions according to the extracted feature information; and the trained deep learning model is used to classify and recognize the video to be recognized, which improves the accuracy of both action recognition and classification. Further, to address the differences in action characteristics across different time stages of the video, video key frames are obtained by a Gaussian-model down-sampling method that fuses time-stage characteristics, removing the influence of a large amount of redundant information and further improving action recognition accuracy.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow diagram of a method for human motion recognition in a video of a computer vision system according to one embodiment of the present application;
FIG. 2 is a flow diagram of a method for human motion recognition in a video of a computer vision system according to another embodiment of the present application;
FIG. 3 is a block diagram of a human motion recognition device in video of a computer vision system according to another embodiment of the present application;
FIG. 4 is a block diagram of a computing device according to another embodiment of the present application;
FIG. 5 is a structural diagram of a computer-readable storage medium according to another embodiment of the present application.
Detailed Description
FIG. 1 is a flow diagram of a method for human motion recognition in a video of a computer vision system according to one embodiment of the present application. Referring to FIG. 1, the method includes:
101: acquiring a video of a computer vision system and performing data preprocessing on the video;
102: extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm;
103: training a deep learning model for human actions according to the extracted feature information;
104: classifying and recognizing a video to be recognized using the trained deep learning model.
In this embodiment, optionally, extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm includes:
extracting feature information from the preprocessed video with a parallel multi-feature fusion algorithm based on an Inception network and a ResNet deep residual network.
In this embodiment, optionally, performing data preprocessing on the video includes:
when two-person interactive action recognition is performed, clipping and dividing the video of the two-person interactive action into two action videos each containing only a single person.
In this embodiment, optionally, extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm includes:
extracting feature information separately from the whole video and from the segmented single-person videos using the parallel multi-feature fusion network algorithm.
In this embodiment, optionally, the method further includes:
during feature information extraction, adopting a Gaussian-model down-sampling method that fuses time-stage characteristics, assigning different sampling intervals to the human bodies at different time stages of the interactive action, and removing redundant information.
According to the method provided by this embodiment, a video of a computer vision system is acquired and preprocessed; feature information is extracted from the preprocessed video using a parallel multi-feature fusion network algorithm; a deep learning model is trained for human actions according to the extracted feature information; and the trained deep learning model is used to classify and recognize a video to be recognized, which improves the accuracy of both action recognition and classification. Further, to address the differences in action characteristics across different time stages of a video, video key frames are obtained by a Gaussian-model down-sampling method that fuses time-stage characteristics, which removes the influence of a large amount of redundant information and further improves action recognition accuracy.
FIG. 2 is a flow diagram of a method for human motion recognition in a video of a computer vision system according to another embodiment of the present application. Referring to FIG. 2, the method includes:
201: acquiring a video of a computer vision system and performing data preprocessing on the video;
In the data preprocessing stage, to address the limitation of insufficient database capacity, the data volume can be enlarged through horizontal video flipping, random cropping, and similar augmentations, as illustrated in the sketch below.
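As a minimal illustration of this augmentation step, the following sketch flips and randomly crops a clip held as a NumPy array. The function name, clip layout, and crop size are illustrative assumptions; the patent does not specify an implementation.

```python
import numpy as np

def augment_clip(frames, crop_hw=(224, 224), rng=None):
    """Enlarge the training data by horizontal flipping and random cropping.

    frames: NumPy array of shape (T, H, W, C) holding one video clip.
    Returns two augmented clips (illustrative helper, not from the patent).
    """
    rng = rng or np.random.default_rng()
    t, h, w, c = frames.shape
    ch, cw = crop_hw
    # Pick one crop offset and apply it to every frame so the clip
    # stays temporally consistent.
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    cropped = frames[:, y:y + ch, x:x + cw, :]
    # Horizontal flip: reverse the width axis of every frame.
    flipped = frames[:, :, ::-1, :]
    return cropped, flipped
```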
In this embodiment, the data preprocessing performed on the video may specifically include:
when two-person interactive action recognition is performed, clipping and dividing the video of the two-person interactive action into two action videos each containing only a single person.
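A sketch of this splitting step follows, assuming per-frame person bounding boxes are already available (for example from any off-the-shelf person detector; the patent does not name one). The helper and its box format are hypothetical.

```python
import cv2  # used only for resizing the per-person crops
import numpy as np

def split_two_person_clip(frames, boxes_a, boxes_b, out_hw=(224, 224)):
    """Cut a two-person interaction clip into two single-person clips.

    frames:  array of shape (T, H, W, C) for the whole video.
    boxes_a, boxes_b: per-frame (x1, y1, x2, y2) boxes, one per person.
    Returns two (T, out_h, out_w, C) arrays, one clip per person.
    """
    clips = ([], [])
    for frame, ba, bb in zip(frames, boxes_a, boxes_b):
        for box, clip in zip((ba, bb), clips):
            x1, y1, x2, y2 = (int(v) for v in box)
            crop = frame[y1:y2, x1:x2]
            # Resize every crop to a fixed size so the clip can be batched.
            clip.append(cv2.resize(crop, (out_hw[1], out_hw[0])))
    return np.stack(clips[0]), np.stack(clips[1])
```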
202: extracting feature information from the preprocessed video with a parallel multi-feature fusion algorithm based on an Inception network and a ResNet deep residual network;
The Inception network improves network performance while reducing the number of network parameters by introducing feature receptive fields of different scales; the ResNet network addresses the network degradation caused by increasing network depth, thereby achieving higher classification accuracy.
In this embodiment, this step may specifically include:
when two-person interactive action recognition is performed, extracting feature information separately from the whole video and from the segmented single-person videos using the parallel multi-feature fusion network algorithm.
Detailed individual action features can be extracted from the segmented single-person videos, while the whole video allows feature information such as the relative position and orientation of the two persons to be learned.
203: during feature information extraction, adopting a Gaussian-model down-sampling method that fuses time-stage characteristics, assigning different sampling intervals to the human bodies at different time stages of the interactive action, and removing redundant information;
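One way to realize such stage-aware down-sampling is sketched below: frame-sampling density follows a Gaussian over time, so the action-execution stage (where, as noted in the background, inter-class feature differences are largest) is sampled densely while the start and end stages are sampled sparsely. Centering the Gaussian at the middle of the clip and the sigma_ratio value are illustrative assumptions, not parameters given by the patent.

```python
import numpy as np

def gaussian_keyframe_indices(num_frames, num_keyframes, sigma_ratio=0.25):
    """Select key-frame indices with Gaussian-weighted sampling density."""
    t = np.arange(num_frames)
    mu = (num_frames - 1) / 2.0            # assume execution stage is mid-clip
    sigma = sigma_ratio * num_frames
    weights = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    weights /= weights.sum()
    # Equal steps along the cumulative weight curve translate into small
    # frame intervals where density is high (middle) and large intervals
    # where it is low (start/end), discarding redundant near-duplicates.
    cdf = np.cumsum(weights)
    targets = np.linspace(cdf[0], cdf[-1], num_keyframes)
    return np.searchsorted(cdf, targets).clip(0, num_frames - 1)

# e.g. gaussian_keyframe_indices(120, 16) picks 16 of 120 frames,
# concentrated around the middle of the clip.
```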
204: training a deep learning model for human actions according to the extracted feature information;
205: classifying and recognizing the video to be recognized using the trained deep learning model.
In this embodiment, when two-person interactive action recognition is performed, the preliminary recognition results obtained from the whole video and from the segmented individual videos are fused at the decision level in the video classification stage, which improves action classification accuracy.
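A minimal sketch of such decision-level fusion follows, assuming each branch outputs class probabilities: the whole-video and per-person predictions are combined by weighted averaging. The fusion rule and the weights are assumptions; the patent does not fix them.

```python
import numpy as np

def decision_level_fusion(p_whole, p_person_a, p_person_b,
                          weights=(0.5, 0.25, 0.25)):
    """Fuse preliminary per-branch class probabilities at decision level.

    Each argument is a (num_classes,) probability vector from one branch.
    Returns the fused class index and the fused probability vector.
    """
    probs = np.stack([p_whole, p_person_a, p_person_b])
    fused = np.average(probs, axis=0, weights=weights)
    return int(np.argmax(fused)), fused
```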
According to the method provided by this embodiment, a video of a computer vision system is acquired and preprocessed; feature information is extracted from the preprocessed video using a parallel multi-feature fusion network algorithm; a deep learning model is trained for human actions according to the extracted feature information; and the trained deep learning model is used to classify and recognize a video to be recognized, which improves the accuracy of both action recognition and classification. Further, to address the differences in action characteristics across different time stages of a video, video key frames are obtained by a Gaussian-model down-sampling method that fuses time-stage characteristics, which removes the influence of a large amount of redundant information and further improves action recognition accuracy.
FIG. 3 is a block diagram of a human motion recognition device in a video of a computer vision system according to another embodiment of the present application. Referring to FIG. 3, the apparatus includes:
an acquisition module 301 configured to acquire a video of a computer vision system and perform data preprocessing on the video;
an extraction module 302 configured to extract feature information from the preprocessed video using a parallel multi-feature fusion network algorithm;
a training module 303 configured to train a deep learning model for human actions according to the extracted feature information;
and a recognition module 304 configured to classify and recognize a video to be recognized using the trained deep learning model.
In this embodiment, optionally, the extraction module is specifically configured to:
extract feature information from the preprocessed video with a parallel multi-feature fusion algorithm based on an Inception network and a ResNet deep residual network.
In this embodiment, optionally, the acquisition module is specifically configured to:
when two-person interactive action recognition is performed, clip and divide the video of the two-person interactive action into two action videos each containing only a single person.
In this embodiment, optionally, the extraction module is specifically configured to:
extract feature information separately from the whole video and from the segmented single-person videos using the parallel multi-feature fusion network algorithm.
In this embodiment, optionally, the extraction module is further configured to:
during feature information extraction, adopt a Gaussian-model down-sampling method that fuses time-stage characteristics, assign different sampling intervals to the human bodies at different time stages of the interactive action, and remove redundant information.
The apparatus provided in this embodiment can perform the method provided in any of the above method embodiments; the details of the process are described in the method embodiments and are not repeated here.
According to the device provided by this embodiment, a video of a computer vision system is acquired and preprocessed; feature information is extracted from the preprocessed video using a parallel multi-feature fusion network algorithm; a deep learning model is trained for human actions according to the extracted feature information; and the trained deep learning model is used to classify and recognize a video to be recognized, which improves the accuracy of both action recognition and classification. Further, to address the differences in action characteristics across different time stages of a video, video key frames are obtained by a Gaussian-model down-sampling method that fuses time-stage characteristics, which removes the influence of a large amount of redundant information and further improves action recognition accuracy.
An embodiment also provides a computing device. Referring to FIG. 4, the computing device comprises a memory 1120, a processor 1110, and a computer program stored in the memory 1120 and executable by the processor 1110; the computer program is stored in a space 1130 for program code in the memory 1120 and, when executed by the processor 1110, implements method steps 1131 for performing any of the methods according to the present application.
An embodiment of the application also provides a computer-readable storage medium. Referring to FIG. 5, the computer-readable storage medium comprises a storage unit for program code, provided with a program 1131' for performing the method steps according to the present application, which program is executed by a processor.
An embodiment of the application also provides a computer program product containing instructions which, when run on a computer, cause the computer to carry out the steps of the method according to the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed by a computer, the procedures or functions described in the embodiments of the application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)), among others.
Those of skill in the art would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid-state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of human action recognition in a video of a computer vision system, comprising:
acquiring a video of a computer vision system and performing data preprocessing on the video;
extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm;
training a deep learning model for human actions according to the extracted feature information;
and classifying and recognizing a video to be recognized using the trained deep learning model.
2. The method according to claim 1, wherein extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm comprises:
extracting feature information from the preprocessed video with a parallel multi-feature fusion algorithm based on an Inception network and a ResNet deep residual network.
3. The method of claim 1, wherein performing data preprocessing on the video comprises:
when two-person interactive action recognition is performed, clipping and dividing the video of the two-person interactive action into two action videos each containing only a single person.
4. The method according to claim 3, wherein extracting feature information from the preprocessed video using a parallel multi-feature fusion network algorithm comprises:
extracting feature information separately from the whole video and from the segmented single-person videos using the parallel multi-feature fusion network algorithm.
5. The method according to any one of claims 1-4, further comprising:
during feature information extraction, adopting a Gaussian-model down-sampling method that fuses time-stage characteristics, assigning different sampling intervals to the human bodies at different time stages of the interactive action, and removing redundant information.
6. An apparatus for human action recognition in a video of a computer vision system, comprising:
an acquisition module configured to acquire a video of a computer vision system and perform data preprocessing on the video;
an extraction module configured to extract feature information from the preprocessed video using a parallel multi-feature fusion network algorithm;
a training module configured to train a deep learning model for human actions according to the extracted feature information;
and a recognition module configured to classify and recognize a video to be recognized using the trained deep learning model.
7. The apparatus of claim 6, wherein the extraction module is specifically configured to:
extract feature information from the preprocessed video with a parallel multi-feature fusion algorithm based on an Inception network and a ResNet deep residual network.
8. The apparatus of claim 6, wherein the acquisition module is specifically configured to:
when two-person interactive action recognition is performed, clip and divide the video of the two-person interactive action into two action videos each containing only a single person.
9. The apparatus of claim 8, wherein the extraction module is specifically configured to:
extract feature information separately from the whole video and from the segmented single-person videos using the parallel multi-feature fusion network algorithm.
10. The apparatus of any one of claims 6-9, wherein the extraction module is further configured to:
during feature information extraction, adopt a Gaussian-model down-sampling method that fuses time-stage characteristics, assign different sampling intervals to the human bodies at different time stages of the interactive action, and remove redundant information.
CN201911142506.5A (priority 2019-11-20, filed 2019-11-20): Human body action recognition method and device in video of computer vision system. Published as CN110991278A (Pending).

Priority Applications (1)

Application Number: CN201911142506.5A (published as CN110991278A) · Priority Date: 2019-11-20 · Filing Date: 2019-11-20 · Title: Human body action recognition method and device in video of computer vision system

Applications Claiming Priority (1)

Application Number: CN201911142506.5A (published as CN110991278A) · Priority Date: 2019-11-20 · Filing Date: 2019-11-20 · Title: Human body action recognition method and device in video of computer vision system

Publications (1)

Publication Number: CN110991278A · Publication Date: 2020-04-10

Family

ID=70085382

Family Applications (1)

Application Number: CN201911142506.5A (Pending, published as CN110991278A) · Title: Human body action recognition method and device in video of computer vision system

Country Status (1)

Country Link
CN (1) CN110991278A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639571A (en) * 2020-05-20 2020-09-08 浙江工商大学 Video motion recognition method based on contour convolution neural network
CN111709291A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Takeaway personnel identity identification method based on fusion information
CN111783520A (en) * 2020-05-18 2020-10-16 北京理工大学 Double-flow network-based laparoscopic surgery stage automatic identification method and device
CN112801061A (en) * 2021-04-07 2021-05-14 南京百伦斯智能科技有限公司 Posture recognition method and system
CN113887516A (en) * 2021-10-29 2022-01-04 北京邮电大学 Feature extraction system and method for human body action recognition
CN117994822A (en) * 2024-04-07 2024-05-07 南京信息工程大学 Cross-mode pedestrian re-identification method based on auxiliary mode enhancement and multi-scale feature fusion

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084202A (en) * 2019-04-29 2019-08-02 东南大学 A kind of video behavior recognition methods based on efficient Three dimensional convolution

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084202A (en) * 2019-04-29 2019-08-02 东南大学 A kind of video behavior recognition methods based on efficient Three dimensional convolution

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
渠畅 (Qu Chang): "Research on Key Technologies of Human Action Recognition in Video Surveillance", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709291A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Takeaway personnel identity identification method based on fusion information
CN111783520A (en) * 2020-05-18 2020-10-16 北京理工大学 Double-flow network-based laparoscopic surgery stage automatic identification method and device
CN111709291B (en) * 2020-05-18 2023-05-26 杭州电子科技大学 Takeaway personnel identity recognition method based on fusion information
CN111639571A (en) * 2020-05-20 2020-09-08 浙江工商大学 Video motion recognition method based on contour convolution neural network
CN111639571B (en) * 2020-05-20 2023-05-23 浙江工商大学 Video action recognition method based on contour convolution neural network
CN112801061A (en) * 2021-04-07 2021-05-14 南京百伦斯智能科技有限公司 Posture recognition method and system
CN113887516A (en) * 2021-10-29 2022-01-04 北京邮电大学 Feature extraction system and method for human body action recognition
CN113887516B (en) * 2021-10-29 2024-05-24 北京邮电大学 Feature extraction system and method for human motion recognition
CN117994822A (en) * 2024-04-07 2024-05-07 南京信息工程大学 Cross-mode pedestrian re-identification method based on auxiliary mode enhancement and multi-scale feature fusion

Similar Documents

Publication Publication Date Title
CN109919031B (en) Human behavior recognition method based on deep neural network
CN108470332B (en) Multi-target tracking method and device
Hong et al. Multimodal GANs: Toward crossmodal hyperspectral–multispectral image segmentation
CN110991278A (en) Human body action recognition method and device in video of computer vision system
Li et al. Attentive contexts for object detection
CN107424171B (en) Block-based anti-occlusion target tracking method
Sheng et al. Siamese denoising autoencoders for joints trajectories reconstruction and robust gait recognition
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
CN113378649A (en) Identity, position and action recognition method, system, electronic equipment and storage medium
JP2022082493A (en) Pedestrian re-identification method for random shielding recovery based on noise channel
Henrio et al. Anomaly detection in videos recorded by drones in a surveillance context
CN105844204B (en) Human behavior recognition method and device
CN111353385B (en) Pedestrian re-identification method and device based on mask alignment and attention mechanism
CN114842391A (en) Motion posture identification method and system based on video
Jin et al. Cvt-assd: convolutional vision-transformer based attentive single shot multibox detector
Chen et al. Efficient activity detection in untrimmed video with max-subgraph search
CN116416503A (en) Small sample target detection method, system and medium based on multi-mode fusion
CN113761282B (en) Video duplicate checking method and device, electronic equipment and storage medium
Sriram et al. Analytical review and study on object detection techniques in the image
Ehsan et al. An accurate violence detection framework using unsupervised spatial–temporal action translation network
CN110598540A (en) Method and system for extracting gait contour map in monitoring video
Shf et al. Review on deep based object detection
CN110956097A (en) Method and module for extracting occluded human body and method and device for scene conversion
CN115719428A (en) Face image clustering method, device, equipment and medium based on classification model
Ingale et al. Deep Learning for Crowd Image Classification for Images Captured Under Varying Climatic and Lighting Condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200410