CN111460945A - Algorithm for acquiring 3D expression in RGB video based on artificial intelligence - Google Patents
Algorithm for acquiring 3D expression in RGB video based on artificial intelligence
- Publication number
- CN111460945A CN111460945A CN202010215726.2A CN202010215726A CN111460945A CN 111460945 A CN111460945 A CN 111460945A CN 202010215726 A CN202010215726 A CN 202010215726A CN 111460945 A CN111460945 A CN 111460945A
- Authority
- CN
- China
- Prior art keywords
- deep learning
- rgb video
- face
- expression
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an algorithm for acquiring 3D expressions in RGB video based on artificial intelligence, which comprises the following steps: S1, a server receives RGB video information containing human faces; S2, the position of the face is calculated from the video; S3, face feature points are detected in the video; S4, the face information is standardized; S5, feature data are extracted from the face information; S6, the feature data are input into a locally stored deep learning model; S7, the deep learning model calculates Blend Shape values; and S8, the output Blend Shape values are automatically optimized. The invention has the advantages that it requires no extra hardware equipment, outputs detailed Blend Shape values, and can be applied to 3D animation production.
Description
Technical Field
The invention relates to the field of expression recognition, in particular to an algorithm for acquiring a 3D expression in an RGB video based on artificial intelligence.
Background
With the progress of science and technology, video analysis technology based on deep learning has developed rapidly, for example pose estimation, motion tracking, and face feature point detection; computer vision algorithms can extract a large amount of important information from videos and images.
For recognizing facial expressions from video, current technology generally outputs only coarse information, such as the labels happiness, anger, sadness, or joy for a facial expression, or it is tied to the development API of a particular brand of smartphone, such as ARKit on Apple mobile devices.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an algorithm for acquiring 3D expressions in RGB video based on artificial intelligence that requires no extra hardware equipment, outputs detailed Blend Shape values, and can be applied to 3D animation production.
In order to solve the above technical problems, the technical scheme provided by the invention is as follows: an algorithm for acquiring 3D expressions in RGB video based on artificial intelligence, comprising the following steps:
s1, the user uploads the video to a server through a network interface, and the server receives RGB video information containing human faces;
S2, taking out each frame from the video and temporarily storing it in an image format, and inputting each image into the face key point detection system of Dlib to obtain the X and Y coordinates of the key points;
S3, extracting face features based on the obtained key point coordinates, and dividing the feature points into groups by the different parts of the face;
S4, standardizing the data of each feature point group; taking P = {p1, p2, ..., pn} as the set of all n feature points, the standardized feature point group P' is calculated by the following formulas:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, extracting the feature data of the face information, wherein each standardized feature point group becomes a different item of feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating the Blend Shape value by the deep learning model: the feature data P' is input and the Blend Shape value bs is calculated by the following formula:
bs = P’ * M + b
wherein M and b are obtained from the deep learning training process;
and S8, automatically optimizing the output Blend Shape value.
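The standardization of step S4 and the linear read-out of step S7 can be sketched as follows in NumPy. The landmark values, the weight matrix M, and the bias b here are illustrative stand-ins, not values from the patent's trained model:

```python
import numpy as np

def normalize_points(P):
    """Step S4: Q = P / (max(P) - min(P)); P' = Q - mean(Q)."""
    P = np.asarray(P, dtype=float)
    Q = P / (P.max() - P.min())
    return Q - Q.mean()

def blend_shape(P_norm, M, b):
    """Step S7: bs = P' * M + b, where M and b come from training."""
    return P_norm @ M + b

# Toy example with made-up landmark coordinates and random weights.
P = [10.0, 20.0, 30.0, 40.0]
P_norm = normalize_points(P)
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 2))  # 4 features -> 2 blend-shape channels
b = np.zeros(2)
bs = blend_shape(P_norm, M, b)
```

After standardization, each feature point group is zero-mean with unit range, which makes the subsequent read-out independent of the face's absolute position and scale in the frame.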
Further, the face information received by the server in S1 is the face information selected by the user.
Further, the feature point group in S3 includes a left eyebrow, a right eyebrow, a left eye, a right eye, a nose, and a mouth.
Further, the deep learning model in S7 learns the correlation between the feature data of the face information and the Blend Shape value in the training data by using a multi-layer neural network.
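The multi-layer neural network mentioned above can be sketched with a toy one-hidden-layer regressor trained by plain gradient descent on synthetic data. The data, layer sizes, and hyperparameters are illustrative assumptions and not the patent's actual model:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data: 200 standardized feature vectors (dim 8) and
# target blend-shape values produced by a hidden nonlinear function.
X = rng.normal(size=(200, 8))
y = np.tanh(X @ rng.normal(size=(8, 3)))

# One-hidden-layer network: 8 inputs -> 16 hidden units -> 3 outputs.
W1 = rng.normal(scale=0.1, size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 3)); b2 = np.zeros(3)

mse0 = float(((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2).mean())

lr = 0.05
for _ in range(500):
    H = np.tanh(X @ W1 + b1)      # hidden activations
    pred = H @ W2 + b2            # predicted blend-shape values
    err = pred - y
    # Backpropagate the mean-squared-error gradient.
    gW2 = H.T @ err / len(X); gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H ** 2)
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2).mean())
```

Training drives the mean squared error below its initial value, which is all the sketch is meant to demonstrate: a multi-layer network can learn the correlation between feature data and target values.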
Compared with the prior art, the invention has the advantages that: the server receives RGB video information containing human faces; the position of the face and the face feature points are calculated from the video; the face information is standardized and feature data are calculated from the face positions and face feature point positions; the feature data are input into a locally stored deep learning model, which is trained with a large amount of RGB video data containing human faces collected by the invention; the deep learning model outputs Blend Shape values, which are finally automatically optimized into the final result. In the facial expression recognition process, RGB video is used directly, without other hardware such as a depth camera or a particular brand of smartphone, and the detailed Blend Shape values output by the method express detailed expressions that can be applied to the production of movies, 3D animations, and virtual characters.
Drawings
FIG. 1 is a flow chart of an algorithm for obtaining 3D expressions in RGB video based on artificial intelligence.
Detailed Description
Examples
S1, a user uploads the video to a server through a network interface (for example, a website using the HTTP hypertext transfer protocol), and the server receives RGB video information containing human faces;
S2, taking out each frame from the video and temporarily storing it in an image format, and inputting each image into the face key point detection system of Dlib to obtain the X and Y coordinates of the key points;
S3, extracting face features based on the obtained key point coordinates, and dividing the feature points into groups by the different parts of the face, wherein the feature point groups comprise the left eyebrow, right eyebrow, left eye, right eye, nose, and mouth;
S4, standardizing the data of each feature point group; taking P = {p1, p2, ..., pn} as the set of all n feature points, the standardized feature point group P' is calculated by the following formulas:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, extracting the feature data of the face information, wherein each standardized feature point group becomes a different item of feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating the Blend Shape value by the deep learning model: the feature data P' is input and the Blend Shape value bs is calculated by the following formula:
bs = P’ * M + b
wherein M and b are obtained from the deep learning training process;
and S8, automatically optimizing the output Blend Shape value.
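The per-part grouping of step S3 can be sketched as follows. The index ranges assume the common 68-point landmark layout that Dlib's pretrained shape predictor uses; the exact grouping is an assumption for illustration, since the patent does not specify landmark indices. A stub array stands in for Dlib's detector output so the sketch is self-contained:

```python
import numpy as np

# Hypothetical mapping of the common 68-point landmark layout onto the
# six feature-point groups named in the embodiment (assumed indices).
GROUPS = {
    "left_eyebrow": range(17, 22),
    "right_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "left_eye": range(36, 42),
    "right_eye": range(42, 48),
    "mouth": range(48, 68),
}

def group_landmarks(landmarks):
    """Split a (68, 2) array of X/Y landmark coordinates into the
    per-part feature-point groups of step S3."""
    landmarks = np.asarray(landmarks, dtype=float)
    return {name: landmarks[list(idx)] for name, idx in GROUPS.items()}

# Stub landmarks standing in for detector output on one video frame.
pts = np.random.default_rng(1).uniform(0, 640, size=(68, 2))
groups = group_landmarks(pts)
```

Each group would then be standardized independently (step S4) before being concatenated into the feature vector fed to the deep learning model.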
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. All such modifications and variations are intended to be included herein within the scope of this disclosure and the present invention.
Claims (4)
1. An algorithm for acquiring 3D expression in RGB video based on artificial intelligence is characterized by comprising the following steps:
s1, the user uploads the video to a server through a network interface, and the server receives RGB video information containing human faces;
S2, taking out each frame from the video and temporarily storing it in an image format, and inputting each image into the face key point detection system of Dlib to obtain the X and Y coordinates of the key points;
S3, extracting face features based on the obtained key point coordinates, and dividing the feature points into groups by the different parts of the face;
S4, standardizing the data of each feature point group; taking P = {p1, p2, ..., pn} as the set of all n feature points, the standardized feature point group P' is calculated by the following formulas:
Q = P / (max(P) - min(P))
P' = Q - mean(Q);
S5, extracting the feature data of the face information, wherein each standardized feature point group becomes a different item of feature data;
S6, inputting the feature data into a locally stored deep learning model;
S7, calculating the Blend Shape value by the deep learning model: the feature data P' is input and the Blend Shape value bs is calculated by the following formula:
bs = P’ * M + b
wherein M and b are obtained from the deep learning training process;
and S8, automatically optimizing the output Blend Shape value.
2. The algorithm for acquiring 3D expression in RGB video based on artificial intelligence as claimed in claim 1, wherein: the face information received by the server in S1 is the face information selected by the user.
3. The algorithm for acquiring 3D expression in RGB video based on artificial intelligence as claimed in claim 1, wherein: the feature point group in S3 includes a left eyebrow, a right eyebrow, a left eye, a right eye, a nose, and a mouth.
4. The algorithm for acquiring 3D expression in RGB video based on artificial intelligence as claimed in claim 1, wherein: the deep learning model in the S7 learns the correlation between the feature data of the face information and the Blend Shape value in the training data by using the multilayer neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010215726.2A CN111460945A (en) | 2020-03-25 | 2020-03-25 | Algorithm for acquiring 3D expression in RGB video based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111460945A true CN111460945A (en) | 2020-07-28 |
Family
ID=71685673
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010215726.2A Pending CN111460945A (en) | 2020-03-25 | 2020-03-25 | Algorithm for acquiring 3D expression in RGB video based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111460945A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104217454A (en) * | 2014-08-21 | 2014-12-17 | 中国科学院计算技术研究所 | Video driven facial animation generation method |
CN104794444A (en) * | 2015-04-16 | 2015-07-22 | 美国掌赢信息科技有限公司 | Facial expression recognition method in instant video and electronic equipment |
CN104951743A (en) * | 2015-03-04 | 2015-09-30 | 苏州大学 | Active-shape-model-algorithm-based method for analyzing face expression |
CN106778563A (en) * | 2016-12-02 | 2017-05-31 | 江苏大学 | A kind of quick any attitude facial expression recognizing method based on the coherent feature in space |
CN107610209A (en) * | 2017-08-17 | 2018-01-19 | 上海交通大学 | Human face countenance synthesis method, device, storage medium and computer equipment |
KR20180037419A (en) * | 2016-10-04 | 2018-04-12 | 재단법인대구경북과학기술원 | Apparatus for age and gender estimation using region-sift and discriminant svm classifier and method thereof |
CN108363973A (en) * | 2018-02-07 | 2018-08-03 | 电子科技大学 | A kind of unconfined 3D expressions moving method |
CN108805040A (en) * | 2018-05-24 | 2018-11-13 | 复旦大学 | It is a kind of that face recognition algorithms are blocked based on piecemeal |
CN108876879A (en) * | 2017-05-12 | 2018-11-23 | 腾讯科技(深圳)有限公司 | Method, apparatus, computer equipment and the storage medium that human face animation is realized |
CN109493403A (en) * | 2018-11-13 | 2019-03-19 | 北京中科嘉宁科技有限公司 | A method of human face animation is realized based on moving cell Expression Mapping |
CN110415323A (en) * | 2019-07-30 | 2019-11-05 | 成都数字天空科技有限公司 | A kind of fusion deformation coefficient preparation method, device and storage medium |
- 2020-03-25: application CN202010215726.2A filed (publication CN111460945A, status: Pending)
Non-Patent Citations (1)
Title |
---|
Liu Wenru (刘文如), "Zero-Basics Introduction to Python Deep Learning" (《零基础入门Python深度学习》), Huazhong University of Science and Technology Press, pages 111-114 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101102A (en) * | 2020-08-07 | 2020-12-18 | 亿匀智行(深圳)科技有限公司 | Method for acquiring 3D limb movement in RGB video based on artificial intelligence |
CN112101306A (en) * | 2020-11-10 | 2020-12-18 | 成都市谛视科技有限公司 | Fine facial expression capturing method and device based on RGB image |
CN112101306B (en) * | 2020-11-10 | 2021-02-09 | 成都市谛视科技有限公司 | Fine facial expression capturing method and device based on RGB image |
CN113066155A (en) * | 2021-03-23 | 2021-07-02 | 华强方特(深圳)动漫有限公司 | 3D expression processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110569795B (en) | Image identification method and device and related equipment | |
Zhang et al. | Facial: Synthesizing dynamic talking face with implicit attribute learning | |
Rizwan et al. | An accurate facial expression detector using multi-landmarks selection and local transform features | |
US9805255B2 (en) | Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action | |
CN112800903B (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
CN111460945A (en) | Algorithm for acquiring 3D expression in RGB video based on artificial intelligence | |
CN112418095A (en) | Facial expression recognition method and system combined with attention mechanism | |
CN111770299B (en) | Method and system for real-time face abstract service of intelligent video conference terminal | |
CN113420719B (en) | Method and device for generating motion capture data, electronic equipment and storage medium | |
KR101887637B1 (en) | Robot system | |
Meng et al. | Weakly supervised semantic segmentation by a class-level multiple group cosegmentation and foreground fusion strategy | |
CN111563417A (en) | Pyramid structure convolutional neural network-based facial expression recognition method | |
CN111680550B (en) | Emotion information identification method and device, storage medium and computer equipment | |
CN110555896A (en) | Image generation method and device and storage medium | |
CN113254491A (en) | Information recommendation method and device, computer equipment and storage medium | |
CN111108508A (en) | Facial emotion recognition method, intelligent device and computer-readable storage medium | |
CN113298018A (en) | False face video detection method and device based on optical flow field and facial muscle movement | |
CN112257513A (en) | Training method, translation method and system for sign language video translation model | |
CN110866962A (en) | Virtual portrait and expression synchronization method based on convolutional neural network | |
Kumar et al. | Facial emotion recognition and detection using cnn | |
CN112016592A (en) | Domain adaptive semantic segmentation method and device based on cross domain category perception | |
CN108399358B (en) | Expression display method and system for video chat | |
CN113449564A (en) | Behavior image classification method based on human body local semantic knowledge | |
CN113269068B (en) | Gesture recognition method based on multi-modal feature adjustment and embedded representation enhancement | |
CN112101102A (en) | Method for acquiring 3D limb movement in RGB video based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: 518000 717, building r2-a, Gaoxin industrial village, No. 020, Gaoxin South seventh Road, Gaoxin community, Yuehai street, Nanshan District, Shenzhen, Guangdong
Applicant after: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.
Address before: 518000 1403a-1005, east block, Coast Building, No. 15, Haide Third Road, Haizhu community, Yuehai street, Nanshan District, Shenzhen, Guangdong
Applicant before: Yiyun Zhixing (Shenzhen) Technology Co.,Ltd.