CN102609969A - Method for processing face and speech synchronous animation based on Chinese text drive - Google Patents


Info

Publication number
CN102609969A
CN102609969A (application CN201210037528)
Authority
CN
China
Prior art keywords
chinese
face
animation
people
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100375287A
Other languages
Chinese (zh)
Other versions
CN102609969B (en)
Inventor
赵群飞
杜鹏
樊延峰
邓杰
唐品
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN 201210037528 priority Critical patent/CN102609969B/en
Publication of CN102609969A publication Critical patent/CN102609969A/en
Application granted granted Critical
Publication of CN102609969B publication Critical patent/CN102609969B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method for processing face and speech synchronous animation driven by Chinese text. The method comprises the steps of: classifying all Chinese phonemes into 16 groups of Chinese visemes according to the size of the lip action when the corresponding pinyin is pronounced, and synthesizing the corresponding key frames from an input face image; analyzing an input text to obtain the corresponding Chinese viseme sequence and the key frame sequence of the animation; interposing transition frames between every two neighboring key frames; aligning the key frame sequence with a speech stream; and finally playing the speech stream and the animation stream simultaneously to achieve face and speech synchronous animation. Given any face portrait and any text content as input, the method automatically completes the generation and output of the face animation; it is simple to operate, produces smooth results, and is suitable for applications such as visual human-computer interfaces, computer games, and the teaching of Chinese as a foreign language.

Description

A method for processing face and speech synchronous animation driven by Chinese text
Technical field
The present invention relates to the technical field of face and speech synchronous animation, and specifically to a method for processing face and speech synchronous animation driven by Chinese text.
Background technology
Text, sound and images are the main carriers of human information and knowledge today, and they are also the principal tools with which people learn and communicate. The interaction between these modalities has attracted increasing attention: integrating text, sound and image to form a direct conversion from text to visual speech, i.e. a speech-synchronized face animation system, lets a user see a synchronized talking face while listening to the computer speak, making the human-computer interface friendlier and more natural. Over the past decades, face and speech synchronous animation has advanced rapidly, from the early sequential playback of stored static images to today's real-time synthesis of three-dimensional face animation. A research team at the University of Science and Technology of China implemented a speech-synchronized animation system compatible with the MPEG-4 standard; it uses a neutral three-dimensional head model and two photographs of a real person (frontal and profile) to realize a three-dimensional "talking head", but the resulting animation is rather cartoon-like and still differs noticeably from a real speaking person. A research team at Shanghai Jiao Tong University implemented a face animation system with a neutral three-dimensional head model and a single frontal face photograph, but its insertion of transition frames and its alignment of the animation stream with the speech stream on the time axis are handled crudely, so the generated animation frequently flickers and looks unnatural.
A search of the prior art found the following. Chinese patent application No. 201010263097.7, "Real-time voice-driven face and lip synchronous animation system based on collaborative filtering", is characterized by recording speech in real time and making a character head model produce lip animation synchronized with the input speech. The system can receive the input speech signal in real time through a digital recorder and output face and lip animation synchronized with it; building its multimodal synchronization library requires no manual annotation, and it accepts arbitrary male or female voices for voice-driven lip animation. However, the system needs special multimodal acquisition equipment to record, synchronously, the speaker's voice and the motion of three-dimensional facial feature points during speech, which increases the difficulty of implementation and limits the system's applicability; moreover, because it is voice-driven, the audio must be recorded before the animation can be generated, so it cannot generate animation for an arbitrary text to be read aloud. Chinese patent application No. 200910263558.8, "Method for voice-driven lip animation", requires collecting original audio and video data from several people: each person reads words covering the initials and finals while being filmed with a DV or video camera to obtain audio and video streams. It therefore requires a large amount of data collection and cannot be fully automated.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art by providing a method for face and speech synchronous animation driven by Chinese text. The method is fully automatic: with only a computer equipped with a camera and the text to be read aloud as input, it produces a speech-synchronized animation of any face reading any Chinese text, with realistic and smooth output.
The present invention is realized through the following technical scheme:
A method for processing face and speech synchronous animation driven by Chinese text, characterized in that the method comprises the following steps:
(1) Face image acquisition: a light source illuminates the face of the subject, producing reflected or transmitted light that carries the facial features, which a CCD detector converts into the corresponding electrical signal; alternatively, a face image is read from a storage device.
(2) Face detection: the face image obtained in step (1) is preprocessed, and the face region is then detected with the AdaBoost algorithm.
(3) Facial feature extraction: within the face region detected in step (2), facial feature points are extracted with the ASM algorithm: 32 feature points for the mouth, 20 for the eyes, and 30 for the nose and the outer contour of the face.
(4) Key frame synthesis: using the 32 mouth feature points extracted in step (3), the mouth image is divided into 49 non-overlapping triangular patches; according to the classification and definition of the Chinese visemes, a free-form deformation algorithm controls the in-plane movement and shape deformation of the feature points and triangular patches to synthesize the corresponding face animation key frames.
(5) Transition frame synthesis: first, from the feature points of every two adjacent key frames of step (4), the feature points of the transition frames are computed by linear interpolation with time as the parameter; the mouth is again divided into 49 non-overlapping triangular patches according to the 32 interpolated mouth feature points, and the free-form deformation algorithm synthesizes the corresponding face animation transition frames from these patches. Then, based on the definition and classification of the 16 Chinese viseme groups, a different number of transition frames is inserted between every two adjacent key frames.
(6) Chinese text input: a Chinese text is entered or read from a storage device.
(7) Text analysis: the text content obtained in step (6) is analyzed to obtain the corresponding Chinese viseme stream.
(8) Text-to-speech conversion: the text content obtained in step (6) is converted into a speech stream.
(9) Synchronization of animation stream and speech stream: the key frames synthesized in step (4) are aligned onto the speech stream produced in step (8).
(10) Synchronized output of face and speech, displaying the synthesized face and speech synchronous animation.
Steps (1) to (5) are carried out simultaneously with steps (6) to (8).
The definition and classification of the Chinese visemes means that all Chinese pinyin units are classified into 16 Chinese viseme classes according to the lip action characteristics of their pronunciation.
The preprocessing means smoothing filtering and angle correction of the input face image.
The feature points of the transition frames are computed by the following formula:
P(k, t) = ((t_e − t)/(t_e − t_s)) × P(k, t_s) + ((t − t_s)/(t_e − t_s)) × P(k, t_e),  k = 1, 2, …, 32 and t ∈ [t_s, t_e)
where P(k, t) is the coordinate of the k-th mouth feature point at time t, t_s is the moment at which the pronunciation of a given Chinese viseme begins, and t_e is the moment at which it ends.
The number of transition frames to be inserted between every two adjacent key frames is computed by:
N_i = (W_i / W_sum) × T_w × F_v,  i = 1, 2, …, n
where N_i is the number of transition frames inserted between the i-th and the (i+1)-th Chinese viseme of a given Chinese character, n is the number of Chinese visemes corresponding to that character (n ≤ 3), W_i is the weight of the character's i-th Chinese viseme, W_sum is the sum of the weights of all the character's Chinese visemes, T_w is the duration of the character's pronunciation, and F_v is the animation playback rate in frames per second. Each Chinese viseme of a character corresponds to one key frame in the animation stream, so the i-th and (i+1)-th Chinese visemes of a character correspond to two adjacent key frames in the animation stream.
The whole process is simple to implement and easy to operate, the amount of computation is small, and the generated face and speech synchronous animation is realistic and smooth.
Description of drawings
Fig. 1 is the flow chart of the face and speech synchronous animation driven by Chinese text according to the present invention.
Fig. 2 is a schematic diagram of key frame alignment; Fa, Fb, Fc and Fd in the figure are Chinese viseme key frames.
Embodiment
The technical scheme of the present invention is described in detail below with reference to the accompanying drawings and an embodiment, which should not be taken to limit the protection scope of the invention.
The Chinese pinyin table is divided into 16 groups of Chinese visemes according to the lip action characteristics of pronunciation (see Table 1), and each Chinese viseme is assigned a weight characterizing the size of its lip action (see Table 2). Table 1 gives the Chinese viseme grouping and Table 2 the Chinese viseme weights.
Table 1
(Table 1 is reproduced as an image in the original publication.)
Table 2
(Table 2 is reproduced as an image in the original publication.)
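Tables 1 and 2 appear only as images in this publication, so their actual contents are not recoverable here. The sketch below merely illustrates, in Python, the kind of data structures the two tables define — a pinyin-to-viseme-group mapping and a per-group lip-action weight; every concrete entry is an assumed placeholder, not the patent's real grouping or weights.

```python
# Table 1 (assumed entries): pinyin units -> one of 16 viseme group ids.
# The real grouping is given only as an image in the patent.
VISEME_GROUP = {
    "b": 1, "p": 1, "m": 1,   # bilabial closure (assumed grouping)
    "a": 2,                   # wide-open mouth (assumed)
    "o": 3, "u": 3,           # rounded lips (assumed)
    # ... remaining pinyin units omitted
}

# Table 2 (assumed entries): viseme group id -> lip-action-size weight.
VISEME_WEIGHT = {1: 1.0, 2: 3.0, 3: 2.0}

def visemes_of_syllable(initial, final):
    """Map a pinyin syllable to its viseme-id sequence (at most 3 per character)."""
    seq = []
    for unit in (initial, final):
        if unit in VISEME_GROUP:
            seq.append(VISEME_GROUP[unit])
    return seq

print(visemes_of_syllable("b", "a"))  # [1, 2]
```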
A face image is first acquired; the face region in the image is then detected in the face detection step, and the facial feature points in that region are extracted in the feature extraction step. From these feature points, the Chinese viseme key frames of the face animation are synthesized according to the definition and classification of the Chinese visemes, and transition frames are then inserted between every two adjacent key frames.
The Chinese text to be read aloud is entered or read in, analyzed to obtain the corresponding Chinese viseme sequence, and converted into a speech stream by the text-to-speech step. Finally, the Chinese viseme key frames are aligned onto the speech stream so that the animation stream and the speech stream can be output synchronously, achieving the face and speech synchronous animation.
Fig. 1 is the flow chart of the method for processing face and speech synchronous animation driven by Chinese text. As shown in the figure, the method comprises the following steps:
(1) Face image acquisition: a light source illuminates the face of the subject, producing reflected or transmitted light that carries the facial features, which a CCD detector converts into the corresponding electrical signal; alternatively, a face image is read from a storage device.
(2) Face detection: the face image obtained in step (1) is preprocessed by smoothing filtering and angle correction, and the approximate face region is then detected with the AdaBoost algorithm.
(3) Facial feature extraction: within the approximate face region detected in step (2), facial feature points are extracted with the ASM algorithm: 32 feature points for the mouth, 20 for the eyes, and 30 for the nose and the outer contour of the face.
(4) Key frame synthesis: in this embodiment, the mouth image is first divided into 49 non-overlapping triangular patches according to the 32 mouth feature points extracted in step (3); then, according to the classification and definition of the Chinese visemes in Table 1, a free-form deformation algorithm controls the in-plane movement and shape deformation of the feature points and triangular patches extracted in step (3), thereby synthesizing the corresponding face animation key frames.
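The patent does not spell out its free-form deformation algorithm in detail, but the per-triangle warping that underlies any such patch-based mouth deformation can be sketched as follows: given a triangle of source feature points and the corresponding deformed triangle, solve the six-parameter affine map between them and apply it to points inside the patch. This is a standard illustration only, and all function names are the author's, not the patent's.

```python
def affine_from_triangles(src, dst):
    """Solve the affine map (x, y) -> (a0*x + a1*y + a2, b0*x + b1*y + b2)
    carrying the three src vertices onto the three dst vertices (Cramer's rule)."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if det == 0:
        raise ValueError("degenerate source triangle")

    def solve(c0, c1, c2):
        # Coefficients for one output coordinate given its three target values.
        a = (c0 * (y1 - y2) + c1 * (y2 - y0) + c2 * (y0 - y1)) / det
        b = (c0 * (x2 - x1) + c1 * (x0 - x2) + c2 * (x1 - x0)) / det
        c = (c0 * (x1 * y2 - x2 * y1) + c1 * (x2 * y0 - x0 * y2)
             + c2 * (x0 * y1 - x1 * y0)) / det
        return a, b, c

    (u0, v0), (u1, v1), (u2, v2) = dst
    return solve(u0, u1, u2), solve(v0, v1, v2)

def warp_point(affine, p):
    """Apply the affine map returned above to a single (x, y) point."""
    (a0, a1, a2), (b0, b1, b2) = affine
    x, y = p
    return (a0 * x + a1 * y + a2, b0 * x + b1 * y + b2)
```

Warping every pixel of each of the 49 patches this way moves the mouth from one viseme shape to another while keeping the patch boundaries consistent.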
(5) Transition frame synthesis: first, from the feature points of every two adjacent key frames of step (4), the feature points of the transition frames are computed by linear interpolation with time as the parameter; the mouth is again divided into 49 non-overlapping triangular patches according to the 32 interpolated mouth feature points, and the free-form deformation algorithm synthesizes the corresponding face animation transition frames from these patches.
The feature points of any transition frame are computed by the following formula:
P(k, t) = ((t_e − t)/(t_e − t_s)) × P(k, t_s) + ((t − t_s)/(t_e − t_s)) × P(k, t_e),  k = 1, 2, …, 32 and t ∈ [t_s, t_e)
where P(k, t) is the coordinate of the k-th mouth feature point at time t, t_s is the moment at which the pronunciation of a given Chinese viseme begins, and t_e is the moment at which it ends.
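The interpolation formula above translates directly into code. A minimal sketch, with function and argument names chosen for illustration:

```python
def transition_points(points_start, points_end, t, t_s, t_e):
    """Feature points of the transition frame at time t, per
    P(k,t) = (t_e-t)/(t_e-t_s)*P(k,t_s) + (t-t_s)/(t_e-t_s)*P(k,t_e),
    applied to each of the 32 (x, y) mouth feature points."""
    if not (t_s <= t < t_e):
        raise ValueError("t must lie in [t_s, t_e)")
    w = (t - t_s) / (t_e - t_s)  # blending weight toward the end keyframe
    return [((1 - w) * xs + w * xe, (1 - w) * ys + w * ye)
            for (xs, ys), (xe, ye) in zip(points_start, points_end)]
```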
Then, based on the definition and classification of the 16 Chinese viseme groups, a different number of transition frames is inserted between every two adjacent key frames.
The number of transition frames to insert is determined by the weight of the corresponding Chinese viseme in Table 2; the number needed between any two adjacent key frames is computed by:
N_i = (W_i / W_sum) × T_w × F_v,  i = 1, 2, …, n
where N_i is the number of transition frames to insert between the i-th and the (i+1)-th Chinese viseme of a given Chinese character, n is the number of Chinese visemes corresponding to that character (from Table 1, n ≤ 3), W_i is the weight in Table 2 of the character's i-th Chinese viseme, W_sum is the sum of the weights of all the character's Chinese visemes, T_w is the duration of the character's pronunciation, and F_v is the animation playback rate in frames per second.
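The N_i formula can be sketched as below. The patent does not say how a fractional N_i is rounded, so flooring here is an assumption, and the sample weights come from the caller rather than from the (image-only) Table 2:

```python
import math

def transition_frame_counts(weights, t_w, f_v):
    """N_i = W_i / W_sum * T_w * F_v for each viseme weight W_i of one
    Chinese character, with t_w the pronunciation duration in seconds and
    f_v the animation playback rate in frames per second.
    Flooring the fractional result is an assumption; the patent does not
    specify the rounding rule."""
    w_sum = sum(weights)
    return [math.floor(w / w_sum * t_w * f_v) for w in weights]
```

For a character lasting 1 second at 20 fps whose two visemes have weights 1 and 3, this yields 5 and 15 transition frames, so the mouth lingers longer on the larger lip action.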
(6) Chinese text input: a Chinese text is entered or read from a storage device.
(7) Text analysis: the text content obtained in step (6) is analyzed to obtain the corresponding Chinese viseme stream, i.e. the sequence of Chinese visemes.
(8) Text-to-speech conversion: the text content obtained in step (6) is converted into a speech stream.
(9) Synchronization of animation stream and speech stream: the key frames synthesized in step (4) are aligned onto the speech stream produced in step (8). The concrete method is as follows:
First, the text-to-speech engine emits an event at the start of each Chinese character to indicate that it has begun "reading" that character; the time difference between two successive events is the duration of one character's pronunciation. Then, from the viseme stream (sequence) of the character obtained in step (7), the key frame stream (sequence) of the face animation for that character's pronunciation is obtained, and these key frames are arranged over the duration of the character's pronunciation in the weight ratios given in Table 2.
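A minimal sketch of this alignment step, assuming the TTS engine's per-character onset events have already been collected into a list of times with one final end time appended; the function name and the sample weights are illustrative, not the patent's values:

```python
def align_keyframes(char_onsets, char_visemes, viseme_weight):
    """Place each character's viseme keyframes on the speech time axis.
    char_onsets:   onset time of each character plus one trailing end time,
                   as reported by the TTS engine's start-of-character events.
    char_visemes:  viseme-id sequence of each character (from text analysis).
    viseme_weight: weight of each viseme id (Table 2 in the patent).
    Each character's duration is split among its keyframes in the ratio of
    the viseme weights; returns (time, viseme_id) pairs."""
    timeline = []
    for i, visemes in enumerate(char_visemes):
        t_start, t_end = char_onsets[i], char_onsets[i + 1]
        w_sum = sum(viseme_weight[v] for v in visemes)
        t = t_start
        for v in visemes:
            timeline.append((t, v))
            # this keyframe's segment lasts in proportion to its weight
            t += (t_end - t_start) * viseme_weight[v] / w_sum
    return timeline
```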
(10) Synchronized output of face and speech, realizing the face and speech synchronous animation.

Claims (6)

1. A method for processing face and speech synchronous animation driven by Chinese text, characterized in that the method comprises the following steps:
(1) face image acquisition: a light source illuminates the face of the subject, producing reflected or transmitted light that carries the facial features, which a CCD detector converts into the corresponding electrical signal; alternatively, a face image is read from a storage device;
(2) face detection: the face image obtained in step (1) is preprocessed, and the face region is then detected with the AdaBoost algorithm;
(3) facial feature extraction: within the face region detected in step (2), facial feature points are extracted with the ASM algorithm: 32 feature points for the mouth, 20 for the eyes, and 30 for the nose and the outer contour of the face;
(4) key frame synthesis: the mouth image is divided into 49 non-overlapping triangular patches according to the 32 mouth feature points extracted in step (3); according to the classification and definition of the Chinese visemes, a free-form deformation algorithm controls the in-plane movement and shape deformation of the feature points and triangular patches to synthesize the corresponding face animation key frames;
(5) transition frame synthesis: first, from the feature points of every two adjacent key frames of step (4), the feature points of the transition frames are computed by linear interpolation with time as the parameter; the mouth is again divided into 49 non-overlapping triangular patches according to the 32 interpolated mouth feature points, and the free-form deformation algorithm synthesizes the corresponding face animation transition frames from these patches; then, based on the definition and classification of the 16 Chinese viseme groups, a different number of transition frames is inserted between every two adjacent key frames;
(6) Chinese text input: a Chinese text is entered or read from a storage device;
(7) text analysis: the text content obtained in step (6) is analyzed to obtain the corresponding Chinese viseme stream;
(8) text-to-speech conversion: the text content obtained in step (6) is converted into a speech stream;
(9) synchronization of animation stream and speech stream: the key frames synthesized in step (4) are aligned onto the speech stream produced in step (8);
(10) synchronized output of face and speech.
2. The method for processing face and speech synchronous animation according to claim 1, characterized in that steps (1) to (5) are carried out simultaneously with steps (6) to (8).
3. The method for processing face and speech synchronous animation according to claim 1 or 2, characterized in that the definition and classification of the Chinese visemes means that all Chinese pinyin units are classified into 16 Chinese viseme classes according to the lip action characteristics of Chinese pronunciation.
4. The method for processing face and speech synchronous animation according to claim 1 or 2, characterized in that the preprocessing means smoothing filtering and angle correction of the input face image.
5. The method for processing face and speech synchronous animation according to claim 1 or 2, characterized in that the feature points of the transition frames are computed by the following formula:
P(k, t) = ((t_e − t)/(t_e − t_s)) × P(k, t_s) + ((t − t_s)/(t_e − t_s)) × P(k, t_e),  k = 1, 2, …, 32 and t ∈ [t_s, t_e)
where P(k, t) is the coordinate of the k-th mouth feature point at time t, t_s is the moment at which the pronunciation of a given Chinese viseme begins, and t_e is the moment at which it ends.
6. The method for processing face and speech synchronous animation according to claim 1 or 2, characterized in that the number of transition frames to be inserted between every two adjacent key frames is computed by:
N_i = (W_i / W_sum) × T_w × F_v,  i = 1, 2, …, n
where N_i is the number of transition frames inserted between the i-th and the (i+1)-th Chinese viseme of a given Chinese character, n is the number of Chinese visemes corresponding to that character (n ≤ 3), W_i is the weight of the character's i-th Chinese viseme, W_sum is the sum of the weights of all the character's Chinese visemes, T_w is the duration of the character's pronunciation, and F_v is the animation playback rate in frames per second.
CN 201210037528 2012-02-17 2012-02-17 Method for processing face and speech synchronous animation based on Chinese text drive Expired - Fee Related CN102609969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201210037528 CN102609969B (en) 2012-02-17 2012-02-17 Method for processing face and speech synchronous animation based on Chinese text drive

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201210037528 CN102609969B (en) 2012-02-17 2012-02-17 Method for processing face and speech synchronous animation based on Chinese text drive

Publications (2)

Publication Number Publication Date
CN102609969A true CN102609969A (en) 2012-07-25
CN102609969B CN102609969B (en) 2013-08-07

Family

ID=46527312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201210037528 Expired - Fee Related CN102609969B (en) 2012-02-17 2012-02-17 Method for processing face and speech synchronous animation based on Chinese text drive

Country Status (1)

Country Link
CN (1) CN102609969B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030163315A1 (en) * 2002-02-25 2003-08-28 Koninklijke Philips Electronics N.V. Method and system for generating caricaturized talking heads
US6665643B1 (en) * 1998-10-07 2003-12-16 Telecom Italia Lab S.P.A. Method of and apparatus for animation, driven by an audio signal, of a synthesized model of a human face
CN1971621A (en) * 2006-11-10 2007-05-30 中国科学院计算技术研究所 Generating method of cartoon face driven by voice and text together
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
CN101751692A (en) * 2009-12-24 2010-06-23 四川大学 Method for voice-driven lip animation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Tu Huan, Zhou Jingye, et al.: "A cartoon face animation method jointly driven by speech and text", Journal of Chinese Computer Systems (《小型微型计算机系统》), vol. 28, no. 12, 31 December 2007, pages 2238-2241, relevant to claims 1-6 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103500461A (en) * 2013-09-18 2014-01-08 珠海金山网络游戏科技有限公司 Animation generation method for reducing real-time interpolation calculated amount
CN104268526A (en) * 2014-09-25 2015-01-07 北京航空航天大学 Chinese character image matching and deformation method
CN104268526B (en) * 2014-09-25 2017-09-01 北京航空航天大学 A kind of Chinese character picture match and deformation method
CN104616338B (en) * 2015-01-26 2018-02-27 江苏如意通动漫产业有限公司 The consistent speed change interpolating method of space-time based on 2 D animation
CN104616338A (en) * 2015-01-26 2015-05-13 江苏如意通动漫产业有限公司 Two-dimensional animation-based time-space consistent variable speed interpolation method
CN104834750A (en) * 2015-05-28 2015-08-12 瞬联软件科技(北京)有限公司 Method for generating character curves
US10311133B2 (en) 2015-05-28 2019-06-04 Cienet Technologies (Beijing) Co., Ltd. Character curve generating method and device thereof
WO2016188493A1 (en) * 2015-05-28 2016-12-01 瞬联软件科技(北京)有限公司 Character curve generating method and device thereof
CN104834750B (en) * 2015-05-28 2018-03-02 瞬联软件科技(北京)有限公司 A kind of word curve generation method
CN105390133A (en) * 2015-10-09 2016-03-09 西北师范大学 Tibetan TTVS system realization method
CN105786798B (en) * 2016-02-25 2018-11-02 上海交通大学 Natural language is intended to understanding method in a kind of human-computer interaction
CN105786798A (en) * 2016-02-25 2016-07-20 上海交通大学 Natural language intention understanding method in man-machine interaction
CN107203773A (en) * 2016-03-17 2017-09-26 掌赢信息科技(上海)有限公司 The method and electronic equipment of a kind of mouth expression migration
CN106328163A (en) * 2016-08-16 2017-01-11 新疆大学 Uygur language phoneme-viseme parameter conversion method and system
CN106328163B (en) * 2016-08-16 2019-07-02 新疆大学 Uygur language phoneme-viseme parameter conversion method and system
CN109949390A (en) * 2017-12-21 2019-06-28 腾讯科技(深圳)有限公司 Image generating method, dynamic expression image generating method and device
CN108765528A (en) * 2018-04-10 2018-11-06 南京江大搏达信息科技有限公司 Data-driven 3D face animation synthesis method for game characters
CN110580336A (en) * 2018-06-08 2019-12-17 北京得意音通技术有限责任公司 Lip language word segmentation method and device, storage medium and electronic equipment
CN110853614A (en) * 2018-08-03 2020-02-28 Tcl集团股份有限公司 Virtual object mouth shape driving method and device and terminal equipment
CN110867177A (en) * 2018-08-16 2020-03-06 林其禹 Voice playing system with selectable timbre, playing method thereof and readable recording medium
CN110730389A (en) * 2019-12-19 2020-01-24 恒信东方文化股份有限公司 Method and device for automatically generating interactive question and answer for video program
CN111459452A (en) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 Interactive object driving method, device, equipment and storage medium
CN111460785A (en) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 Interactive object driving method, device, equipment and storage medium
CN113672194A (en) * 2020-03-31 2021-11-19 北京市商汤科技开发有限公司 Method, device and equipment for acquiring acoustic feature sample and storage medium
CN112328076A (en) * 2020-11-06 2021-02-05 北京中科深智科技有限公司 Method and system for driving character gestures through voice
CN113379875A (en) * 2021-03-22 2021-09-10 平安科技(深圳)有限公司 Cartoon character animation generation method, device, equipment and storage medium
CN113379875B (en) * 2021-03-22 2023-09-29 平安科技(深圳)有限公司 Cartoon character animation generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN102609969B (en) 2013-08-07

Similar Documents

Publication Publication Date Title
CN102609969B (en) Method for processing face and speech synchronous animation based on Chinese text drive
CN108447474B (en) Modeling and control method for synchronizing virtual character voice and mouth shape
CN103218842B (en) A kind of method for voice-synchronized driving of three-dimensional mouth-shape and facial pose animation
CN109308731B (en) Speech-driven lip-sync face video synthesis algorithm based on cascaded convolutional LSTM
EP3226245B1 (en) System and method to insert visual subtitles in videos
CN100476877C (en) Method for generating cartoon faces jointly driven by voice and text
US20060012601A1 (en) Method of animating a synthesised model of a human face driven by an acoustic signal
Yargıç et al. A lip reading application on MS Kinect camera
CN107330961A (en) Text-to-audiovisual conversion method and system
CN110096966A (en) Speech recognition method fusing a depth-information Chinese multimodal corpus
JP6095381B2 (en) Data processing apparatus, data processing method, and program
CN100596186C (en) Interactive digital multimedia production method based on video and audio
CN101930619A (en) Collaborative filtering-based real-time voice-driven human face and lip synchronous animation system
CN115511994A (en) Method for quickly cloning real person into two-dimensional virtual digital person
CN104144280A (en) Method and device for synchronous control of voice and action animation in electronic greeting cards
WO2018113649A1 (en) Virtual reality language interaction system and method
Hong et al. iFACE: a 3D synthetic talking face
Petridis et al. Audiovisual laughter detection based on temporal features
Vignoli et al. A text-speech synchronization technique with applications to talking heads
WO2024113701A1 (en) Voice-based video generation method and apparatus, server, and medium
CN117315102A (en) Virtual anchor processing method, device, computing equipment and storage medium
Sui et al. A 3D audio-visual corpus for speech recognition
Karpov et al. A framework for recording audio-visual speech corpora with a microphone and a high-speed camera
Zahedi et al. Robust sign language recognition system using ToF depth cameras
CN101968894A (en) Method for automatically realizing sound and lip synchronization through Chinese characters

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130807

Termination date: 20160217