CN112101102A - Method for acquiring 3D limb movement in RGB video based on artificial intelligence - Google Patents
- Publication number
- CN112101102A (application number CN202010789617.1A)
- Authority
- CN
- China
- Prior art keywords
- human body
- limb
- acquiring
- artificial intelligence
- rgb video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
- G06T13/40 — 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G06V20/40 — Scenes; scene-specific elements in video content
- G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
Abstract
The invention relates to the technical field of human body action recognition and acquisition, in particular to a method for acquiring 3D limb actions in RGB videos based on artificial intelligence. In the method, a server receives RGB video information containing a human body, calculates the position of the human body in the video, standardizes the human body information, and computes feature data from the position of the human body and the positions of the human body key points. The feature data are then input into a locally stored deep learning model, which is trained on a large amount of collected RGB video data containing human bodies and outputs three-dimensional values corresponding to the limb key points. Finally, these three-dimensional values are automatically optimized into the final result, so that detailed limb actions are output.
Description
Technical Field
The invention relates to the technical field of human body action recognition and acquisition, in particular to a method for acquiring 3D limb actions in RGB videos based on artificial intelligence.
Background
With the development of computer vision technology, motion recognition using video capture equipment has become a research focus. In existing action recognition methods, data such as joint positions are extracted from a video stream and fed into a three-layer bidirectional long short-term memory (LSTM) recurrent neural network, which extracts dynamic features from the data. The extracted dynamic features are then input into a classifier network, which finally outputs the action categories corresponding to the video stream.
At present, video analysis technology based on deep learning (pose estimation, motion tracking, facial feature point detection, and so on) is developing rapidly, and computer vision algorithms can extract a large amount of important information from videos and images. For recognizing limb actions from video, however, existing technology generally outputs only a coarse label (such as "standing" or "sitting") and cannot output more detailed limb actions.
Disclosure of Invention
To solve these problems, the invention provides a method for acquiring 3D limb actions in RGB videos based on artificial intelligence. Aiming at practical recognition of limb actions, it develops a deep learning model that directly analyzes the limb actions in an RGB video and outputs three-dimensional values corresponding to the limb key points to express detailed limb actions.
To achieve this purpose, the invention adopts the following technical scheme: a method for acquiring 3D limb actions in an RGB video based on artificial intelligence, comprising the following algorithm steps:
S1, receiving, by a server, RGB video information containing a human body;
S2, calculating the position of the human body from the video: extracting each frame from the video, temporarily storing it in an image format, and inputting each picture into a human body key point detection system to obtain the X and Y coordinates of the key points;
S3, detecting human body feature points from the video: extracting human body features from the obtained key point coordinates and grouping the feature points by body part;
S4, standardizing the human body information: performing data standardization on each feature point group;
S5, extracting feature data from the human body information: the standardized feature point groups become the feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating, with the deep learning model, the three-dimensional values corresponding to the limb key points;
S8, automatically optimizing the output three-dimensional values corresponding to the limb key points.
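The steps above can be sketched end to end in a short script. Every name below is an illustrative assumption rather than part of the patent, and the key point detector and the trained deep model are replaced by simple stand-ins so that the data flow is runnable:

```python
# Sketch of the S1-S8 pipeline (all names are hypothetical). The key point
# detector and the trained model are stubbed with simple placeholders.

def detect_keypoints(frame):
    # S2 stand-in: a real system would run a 2D human key point detector
    # on the frame and return one (x, y) pair per key point.
    return [(float(i), float(2 * i)) for i in range(6)]

def standardize(group):
    # S4: Q = P / (max(P) - min(P)); P' = Q - mean(Q).
    span = max(group) - min(group)
    q = [v / span for v in group]
    mean_q = sum(q) / len(q)
    return [v - mean_q for v in q]

def model_forward(features, M, b):
    # S7 stand-in: bs = P' x M + b, a single learned linear mapping.
    n = len(features)
    return [sum(features[i] * M[i][j] for i in range(n)) + b[j]
            for j in range(len(b))]

def acquire_3d_limb_actions(frames, M, b):
    results = []
    for frame in frames:                                # S1: received video
        pts = detect_keypoints(frame)                   # S2
        xs = standardize([p[0] for p in pts])           # S3-S4
        ys = standardize([p[1] for p in pts])
        features = xs + ys                              # S5
        results.append(model_forward(features, M, b))   # S6-S7
    return results  # S8 (automatic optimization) is omitted in this sketch

M = [[0.1] * 3 for _ in range(12)]   # 12 features -> one 3D value triple
b = [0.0, 0.0, 0.0]
out = acquire_3d_limb_actions([None, None], M, b)
print(len(out), len(out[0]))  # 2 frames, one 3D triple each -> prints "2 3"
```

In the patented method the stubbed stages would be the human body key point detection system (S2) and the trained multilayer network (S7); the sketch only shows how the data moves between the steps.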
Further, in S1, the user uploads the video to the server via a network interface, and the human body information received by the server is the human body information selected by the user.
Further, in S1, the RGB video containing the human body is obtained by shooting or from local storage.
Wherein, in S3, the body parts include the left arm, right arm, left leg, right leg, torso, and head.
Further, in S4, with P = {p1, p2, …, pn} denoting all n feature points of a group, the standardized feature point group P' is calculated as follows:
Q = P / (max(P) - min(P))
P' = Q - mean(Q).
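Applied to a concrete feature point group, this standardization can be sketched as a small runnable example (the function name is an assumption for illustration):

```python
def standardize_group(p):
    """Standardize one feature point group per S4:
    Q = P / (max(P) - min(P));  P' = Q - mean(Q).
    Assumes the group is not constant (max(P) != min(P))."""
    span = max(p) - min(p)
    q = [v / span for v in p]
    mean_q = sum(q) / len(q)
    return [v - mean_q for v in q]

# The result is zero-mean with a value range of exactly 1.
print(standardize_group([2.0, 4.0, 6.0]))  # -> [-0.5, 0.0, 0.5]
```

Standardizing each group this way makes the feature data independent of the subject's absolute position and scale in the frame.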
Further, in S7, the feature data P' are input and the three-dimensional values bs = P' × M + b corresponding to the limb key points are calculated, where M and b are respectively the convolution kernel parameters and bias parameters of the deep network, obtained during deep learning training.
Further, in S7, the deep learning model uses a multilayer neural network to learn, from the training data, the correlation between the feature data of the human body information and the three-dimensional values corresponding to the limb key points.
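Treating M and b as the weight matrix and bias of a single linear mapping, the computation bs = P' × M + b can be sketched as follows; the dimensions and names are assumptions for illustration, not the patent's actual network:

```python
def linear_map(p_prime, M, b):
    """bs = P' x M + b: map n standardized features to len(b) output
    values, e.g. one (x, y, z) triple per limb key point."""
    n = len(p_prime)
    return [sum(p_prime[i] * M[i][j] for i in range(n)) + b[j]
            for j in range(len(b))]

# Toy example: 2 features mapped to one 3D key point.
M = [[1.0, 0.0, 0.5],   # M is n x 3: one column per output coordinate
     [0.0, 1.0, 0.5]]
b = [0.1, 0.2, 0.3]
print(linear_map([2.0, 4.0], M, b))  # approximately [2.1, 4.2, 3.3]
```

As the description states, M and b are not hand-set but are obtained during the deep learning training process.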
The invention has the beneficial effects that: the method receives RGB video information containing a human body through a server, calculates the position of the human body from the video, standardizes the human body information, and computes feature data from the position of the human body and the positions of the human body key points. The feature data are then input into a locally stored deep learning model, which is trained on a large amount of collected RGB video data containing human bodies and outputs three-dimensional values corresponding to the limb key points. Finally, these three-dimensional values are automatically optimized into the final result, so that detailed limb actions are output.
Drawings
Fig. 1 is a block flow diagram of the present embodiment.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. The present application may be embodied in many different forms and is not limited to the embodiments described herein; the following detailed description is provided to facilitate a more thorough understanding of the present disclosure.
Referring to fig. 1, the invention relates to a method for acquiring 3D limb movement in RGB video based on artificial intelligence, comprising the following algorithm steps:
S1, receiving, by the server, RGB video information containing a human body: the user uploads the video to the server via a network interface (for example, a website using HTTP), the human body information received by the server is the human body information selected by the user, and the RGB video containing the human body is obtained by shooting or from local storage;
S2, calculating the position of the human body from the video: extracting each frame from the video, temporarily storing it in an image format, and inputting each picture into a human body key point detection system to obtain the X and Y coordinates of the key points;
S3, detecting human body feature points from the video: extracting human body features from the obtained key point coordinates and grouping the feature points by body part, the parts being the left arm, right arm, left leg, right leg, torso, and head;
S4, standardizing the human body information: performing data standardization on each feature point group; with P = {p1, p2, …, pn} denoting all n feature points of a group, the standardized group P' is calculated as follows:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, extracting feature data from the human body information: the standardized feature point groups become the feature data;
S6, inputting the feature data into the locally stored deep learning model;
S7, calculating, with the deep learning model, the three-dimensional values corresponding to the limb key points: the deep learning model uses a multilayer neural network to learn, from the training data, the correlation between the feature data of the human body information and the three-dimensional values corresponding to the limb key points; the feature data P' are input and the three-dimensional values bs = P' × M + b are calculated, where M and b are respectively the convolution kernel parameters and bias parameters of the deep network, obtained during deep learning training;
S8, automatically optimizing the output three-dimensional values corresponding to the limb key points.
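The per-part grouping used in S3 can be sketched as follows; the key point index layout here is a made-up assumption (real detectors, e.g. COCO-style 17-point models, define their own orderings):

```python
# Hypothetical mapping from key point indices to the six parts named in S3.
PART_INDICES = {
    "head":      [0, 1],
    "torso":     [2, 3],
    "left_arm":  [4, 5],
    "right_arm": [6, 7],
    "left_leg":  [8, 9],
    "right_leg": [10, 11],
}

def group_by_part(keypoints):
    """Split a flat list of (x, y) key points into per-part feature groups."""
    return {part: [keypoints[i] for i in idxs]
            for part, idxs in PART_INDICES.items()}

pts = [(float(i), float(-i)) for i in range(12)]
groups = group_by_part(pts)
print(groups["left_arm"])  # -> [(4.0, -4.0), (5.0, -5.0)]
```

Each of these groups would then be standardized separately in S4, so that, for example, arm motion is expressed relative to the arm's own extent rather than the whole body's.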
In summary, the method of this embodiment for acquiring 3D limb actions in RGB video mainly comprises: calculating the position of the human body from the video; detecting the human body key points from the video; extracting feature data from the human body key points; and inputting the feature data into the deep learning model to calculate the three-dimensional values corresponding to the limb key points. The deep learning model of this embodiment uses a multilayer neural network to learn, from training data, the correlation between the feature data of the human body key points and the three-dimensional values corresponding to the limb key points. In addition, this embodiment collects a large amount of RGB video data containing human bodies and labels each video segment with the three-dimensional values corresponding to the limb key points for training the deep learning model.
Compared with the prior art, the method of this embodiment: first analyzes the small changes and actions of the human body through the limb images in the RGB video and recognizes the limb actions with a deep learning model; when analyzing the small changes of the limb parts, it acquires the human body key point information and extracts feature codes from it; the extracted feature codes are then used as the input of the deep learning model; finally, the deep learning model analyzes the received feature codes and calculates the three-dimensional values corresponding to the limb key points as feedback. The recognition process uses RGB video directly, without additional hardware such as a depth camera or a particular brand of smartphone, and outputs detailed three-dimensional values for the limb key points to express detailed actions, so the method can be applied to movies, 3D animation, virtual characters, and the like.
It should be further noted that, unless otherwise explicitly stated or limited, terms such as "obtaining," "extracting," "outputting," and the like are to be construed broadly, and specific meanings of the above terms in the present application will be understood by those skilled in the art according to specific situations.
The above embodiments are merely illustrative of preferred embodiments of the present invention and are not restrictive. Those skilled in the art may make various changes and modifications to the technical solutions of the present invention without departing from its spirit, and such technical solutions are intended to fall within the scope of the present invention defined by the appended claims.
Claims (7)
1. A method for acquiring 3D limb actions in RGB video based on artificial intelligence, characterized by comprising the following algorithm steps:
S1, receiving, by a server, RGB video information containing a human body;
S2, calculating the position of the human body from the video: extracting each frame from the video, temporarily storing it in an image format, and inputting each picture into a human body key point detection system to obtain the X and Y coordinates of the key points;
S3, detecting human body feature points from the video: extracting human body features from the obtained key point coordinates and grouping the feature points by body part;
S4, standardizing the human body information: performing data standardization on each feature point group;
S5, extracting feature data from the human body information: the standardized feature point groups become the feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating, with the deep learning model, the three-dimensional values corresponding to the limb key points;
S8, automatically optimizing the output three-dimensional values corresponding to the limb key points.
2. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S1, the user uploads the video to the server via the network interface, and the human body information received by the server is the human body information selected by the user.
3. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S1, the RGB video containing the human body is obtained by shooting or from local storage.
4. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S3, the body parts include the left arm, right arm, left leg, right leg, torso, and head.
5. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S4, with P = {p1, p2, …, pn} denoting all n feature points of a group, the standardized feature point group P' is calculated as follows:
Q = P / (max(P) - min(P))
P' = Q - mean(Q).
6. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S7, the feature data P' are input to calculate the three-dimensional values bs = P' × M + b corresponding to the limb key points, where M and b are respectively the convolution kernel parameters and bias parameters of the deep network, obtained during deep learning training.
7. The method for acquiring 3D limb movement in RGB video based on artificial intelligence as claimed in claim 1, wherein: in S7, the deep learning model learns the correlation between the feature data of the human body information and the three-dimensional numerical values corresponding to the limb key points in the training data using the multilayer neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010789617.1A CN112101102A (en) | 2020-08-07 | 2020-08-07 | Method for acquiring 3D limb movement in RGB video based on artificial intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010789617.1A CN112101102A (en) | 2020-08-07 | 2020-08-07 | Method for acquiring 3D limb movement in RGB video based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112101102A true CN112101102A (en) | 2020-12-18 |
Family
ID=73752698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010789617.1A Pending CN112101102A (en) | 2020-08-07 | 2020-08-07 | Method for acquiring 3D limb movement in RGB video based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112101102A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111460945A (en) * | 2020-03-25 | 2020-07-28 | 亿匀智行(深圳)科技有限公司 | Algorithm for acquiring 3D expression in RGB video based on artificial intelligence |
CN111488824A (en) * | 2020-04-09 | 2020-08-04 | 北京百度网讯科技有限公司 | Motion prompting method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135249B | Human behavior identification method based on time attention mechanism and LSTM | |
Du et al. | Representation learning of temporal dynamics for skeleton-based action recognition | |
Zhou et al. | Activity analysis, summarization, and visualization for indoor human activity monitoring | |
WO2019174439A1 (en) | Image recognition method and apparatus, and terminal and storage medium | |
KR102174595B1 (en) | System and method for identifying faces in unconstrained media | |
CN110889672B (en) | Student card punching and class taking state detection system based on deep learning | |
CN112800903B (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
CN112418095A (en) | Facial expression recognition method and system combined with attention mechanism | |
CN111770299B (en) | Method and system for real-time face abstract service of intelligent video conference terminal | |
Murtaza et al. | Analysis of face recognition under varying facial expression: a survey. | |
CN109635727A (en) | A kind of facial expression recognizing method and device | |
Nguyen et al. | Static hand gesture recognition using artificial neural network | |
KR101563297B1 (en) | Method and apparatus for recognizing action in video | |
Rao et al. | Sign Language Recognition System Simulated for Video Captured with Smart Phone Front Camera. | |
CN110458235B (en) | Motion posture similarity comparison method in video | |
CN110633624A (en) | Machine vision human body abnormal behavior identification method based on multi-feature fusion | |
CN113255522A (en) | Personalized motion attitude estimation and analysis method and system based on time consistency | |
CN111460945A (en) | Algorithm for acquiring 3D expression in RGB video based on artificial intelligence | |
CN111898571A (en) | Action recognition system and method | |
CN112489129A (en) | Pose recognition model training method and device, pose recognition method and terminal equipment | |
CN112906520A (en) | Gesture coding-based action recognition method and device | |
CN114120389A (en) | Network training and video frame processing method, device, equipment and storage medium | |
CN113989928B (en) | Motion capturing and redirecting method | |
Megalingam | Human action recognition: a review | |
CN110348395B (en) | Skeleton behavior identification method based on space-time relationship |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: 518000 717, building r2-a, Gaoxin industrial village, No. 020, Gaoxin South seventh Road, Gaoxin community, Yuehai street, Nanshan District, Shenzhen, Guangdong
Applicant after: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.
Address before: 518000 1403a-1005, east block, Coast Building, No. 15, Haide Third Road, Haizhu community, Yuehai street, Nanshan District, Shenzhen, Guangdong
Applicant before: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.