CN114359450A - Method and device for simulating virtual character speaking - Google Patents

Method and device for simulating virtual character speaking

Info

Publication number
CN114359450A
CN114359450A (application CN202210050718.6A)
Authority
CN
China
Prior art keywords
phoneme
mouth shape
mouth
audio frame
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210050718.6A
Other languages
Chinese (zh)
Inventor
余国军 (Yu Guojun)
耿俊怀 (Geng Junhuai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaoduo Intelligent Technology Beijing Co ltd
Original Assignee
Xiaoduo Intelligent Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaoduo Intelligent Technology Beijing Co ltd filed Critical Xiaoduo Intelligent Technology Beijing Co ltd
Priority to CN202210050718.6A priority Critical patent/CN114359450A/en
Publication of CN114359450A publication Critical patent/CN114359450A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/205 3D [Three Dimensional] animation driven by audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 2015/025 Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the invention discloses a method and a device for simulating a virtual character speaking, wherein the method comprises the following steps: according to a plurality of phoneme classifications, making a mouth shape corresponding to each phoneme classification to obtain a plurality of basic mouth shapes; inputting an audio stream, extracting audio frames of the audio stream, and identifying the phonemes of each audio frame; determining the phoneme classification corresponding to the phoneme of the audio frame from the plurality of phoneme classifications, and selecting the basic mouth shape corresponding to that classification; and synthesizing the selected basic mouth shapes into the mouth shape corresponding to the audio frame. Real-person mouth shapes are classified by phoneme and organized into 14 basic mouth shapes, so that a computer can drive the mouth shape of a virtual digital human in synchronization with speech through phoneme recognition. With this scheme, speech-mouth-shape synchronization for a virtual digital human can be realized quickly and accurately. A standardized mouth-shape production scheme is formulated, which greatly improves the production efficiency and quality of virtual digital human mouth shapes. The virtual digital human is thus closer to a real person, greatly improving the user experience.

Description

Method and device for simulating virtual character speaking
Technical Field
The embodiment of the invention relates to the field of speech recognition processing, in particular to a method and a device for simulating a virtual character speaking.
Background
There are currently three main solutions on the market for the mouth shapes of virtual digital humans:
(1) Fixed mouth shape: the mouth shape stays the same regardless of what the virtual character says, so speech and mouth shape cannot be synchronized;
(2) Volume-driven mouth shape: the size of the virtual character's mouth opening is controlled by the speaking volume, which is very inaccurate and likewise cannot achieve speech-mouth-shape synchronization;
(3) Live-picture sequence-frame animation: the scheme used by iFlytek's virtual digital humans, which achieves speech-mouth-shape synchronization by recognizing speech and invoking picture-sequence frame animations.
Disclosure of Invention
Therefore, embodiments of the present invention provide a method and a device for simulating a virtual character speaking, so as to solve the problem in the prior art that volume-driven and fixed mouth shapes on the market are only suitable for cartoon characters and cannot achieve speech-mouth-shape synchronization.
In order to achieve the above object, an embodiment of the present invention provides the following:
in one aspect of an embodiment of the present invention, there is provided a method of simulating a virtual character speaking, the method comprising:
according to a plurality of phoneme classifications, making a mouth shape corresponding to each phoneme classification to obtain a plurality of basic mouth shapes;
inputting an audio stream, extracting an audio frame of the audio stream, and identifying a phoneme of the audio frame;
determining the phoneme classification corresponding to the phoneme of the audio frame from the plurality of phoneme classifications, and selecting the basic mouth shape corresponding to the phoneme classification;
synthesizing the selected base mouth shape into a corresponding mouth shape of the audio frame.
Further, the plurality of phoneme classifications includes:
(p, b, m), (f, v), (th), (t, d), (k, g), (tS, dZ, S), (s, z), (n, l), (r), (A), (e), (ih), (oh), (ou).
Further, in the audio stream, a data amount in units of 2.5 ms to 60 ms is extracted as one frame of audio.
Further, the method further comprises:
and making a virtual character model, and generating the mouth shape of the virtual character according to the corresponding mouth shape of the audio frame.
Further, the plurality of basic mouth shapes further comprises: a closed-mouth shape and a generic mouth shape.
Further, when a phoneme identified from the audio frame is not in the plurality of phoneme classifications, the generic mouth shape is selected as the basic mouth shape;
when no phoneme is recognized from the audio frame, the closed-mouth shape is selected as the basic mouth shape.
In one aspect of an embodiment of the present invention, there is also provided an apparatus for simulating a virtual character speaking, the apparatus including:
a basic mouth shape generating unit, which is used for making a mouth shape corresponding to each phoneme classification according to a plurality of phoneme classifications to obtain a plurality of basic mouth shapes;
a phoneme extracting unit, which is used for inputting an audio stream, extracting an audio frame of the audio stream and identifying phonemes of the audio frame;
a basic mouth shape determining unit, configured to determine the phoneme classification corresponding to the phoneme of the audio frame from the plurality of phoneme classifications, and select the basic mouth shape corresponding thereto;
and the mouth shape synthesizing unit is used for synthesizing the selected basic mouth shape into a corresponding mouth shape of the audio frame.
In another aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
In another aspect of embodiments of the present invention, there is provided a computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the above method.
The embodiment of the invention has the following advantages:
the embodiment of the invention discloses a method and a device for simulating the speaking of a virtual character, which classifies the mouth shape of a real person by phonemes, arranges the mouth shape of the real person into 14 basic mouth shapes, and can drive the population shape of a virtual number to be synchronous by the phoneme recognition of a computer. Through the virtual digital population type patent, the voice mouth shape synchronization of the virtual digital people can be quickly and accurately realized. Through the fusion and classification of phonemes, the voice mouth shape synchronization of the virtual digital person is realized, and the mouth shape fault tolerance rate of the virtual digital person during speaking reaches 99.9%. A mouth shape standardized mouth shape manufacturing scheme is formulated, and the virtual digital population shape manufacturing efficiency and the mouth shape quality are greatly improved. The virtual digital person is closer to a real person, and the user experience is greatly improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other drawings can be derived from them by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes, and the like shown in this specification are only used to match the content disclosed in the specification, so that those skilled in the art can understand and read it; they are not used to limit the conditions under which the present invention can be implemented and thus have no technical significance. Any structural modification, change in proportional relationship, or adjustment of size that does not affect the effects achievable by the present invention shall still fall within the scope covered by the technical content disclosed herein.
FIG. 1 is a flowchart illustrating a method for simulating a virtual character speaking according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an apparatus for simulating virtual character speaking according to an embodiment of the present invention.
In the figure: 102-basic mouth shape generating unit, 104-phoneme extracting unit, 106-basic mouth shape determining unit and 108-mouth shape synthesizing unit.
Detailed Description
The present invention is described below in terms of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In the present specification, the terms "upper", "lower", "left", "right", "middle", and the like are used for clarity of description, and are not intended to limit the scope of the present invention, and changes or modifications in the relative relationship may be made without substantial changes in the technical content.
Examples
Referring to fig. 1 and 2, an embodiment of the present invention provides a method for simulating a virtual character speaking, including the following steps:
s1: and according to the plurality of phoneme classifications, making a mouth shape corresponding to each phoneme classification to obtain a plurality of basic mouth shapes. Specifically, the phoneme: phonemes are the smallest units of speech that are divided according to the natural properties of the speech. From an acoustic property point of view, a phoneme is the smallest unit of speech divided from a psychoacoustic point of view. From the physiological point of view, a pronunciation action forms a phoneme. If [ ma ] contains [ m ] a ] two pronunciation actions, which are two phonemes. The sounds uttered by the same pronunciation action are the same phoneme, and the sounds uttered by different pronunciation actions are different phonemes. For example, in [ ma-mi ], the two [ m ] pronunciations are identical and are identical phonemes, and [ a ] i is different and is different phoneme. The analysis of phonemes is generally described in terms of pronunciation actions. The pronunciation action [ m ] is: the upper and lower lips are closed, the vocal cords vibrate, and the airflow flows out of the nasal cavity to make sound. In phonetic terms, it is the bicuspid nasal sound. For example, in the present invention, after a number of tests, the phonemes of Mandarin Chinese are arranged into 14 corresponding pronunciation mouth shapes, and the plurality of phoneme classifications includes the following 14 classifications:
(p, b, m), (f, v), (th), (t, d), (k, g), (tS, dZ, S), (S, z), (n, l), (r), (A), (e), (ih), (oh), (ou). Each classified set comprises at least one phoneme, and 14 basic mouth shapes corresponding to the 14 phoneme classified sets are made. The following is a phoneme classification table, which includes 14 phoneme classifications and corresponding pronunciation examples, in which in the example, the Pinyin pronunciation is bold and the English pronunciation is italic.
Phoneme classification    Examples (Pinyin / English word)
p, b, m                   pu, ban, man
f, v                      fan, vat
th                        xing, zan
t, d                      te, da
k, g                      call, gan
tS, dZ, S                 chair, zha, she
s, z                      se, zeal
n, l                      la, na
r                         rui
A                         ka
e                         bed
ih                        tip
oh                        tou
ou                        bu
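The 14-way grouping above amounts to a flat phoneme-to-mouth-shape lookup table. The following sketch shows one way to build it; the variable names and class indices are illustrative, not part of the patent.

```python
# The 14 phoneme classifications from the table above, each group sharing
# one basic mouth shape.
PHONEME_CLASSES = [
    ("p", "b", "m"), ("f", "v"), ("th",), ("t", "d"), ("k", "g"),
    ("tS", "dZ", "S"), ("s", "z"), ("n", "l"), ("r",),
    ("A",), ("e",), ("ih",), ("oh",), ("ou",),
]

# Invert the grouping into a flat phoneme -> class-index table so each
# recognized phoneme resolves directly to one of the 14 basic mouth shapes.
PHONEME_TO_SHAPE = {
    phoneme: idx
    for idx, group in enumerate(PHONEME_CLASSES)
    for phoneme in group
}

print(PHONEME_TO_SHAPE["m"])   # class 0 -> the (p, b, m) mouth shape
print(PHONEME_TO_SHAPE["ou"])  # class 13 -> the (ou) mouth shape
```

Because every phoneme inside a group maps to the same index, recognizing any of "p", "b", or "m" selects the same basic mouth shape.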
S2: inputting an audio stream, extracting audio frames of the audio stream, and identifying phonemes of the audio frames. The audio data is streaming, and there is no clear concept of one frame per se, and in practical applications, for the convenience of audio algorithm processing/transmission, the data amount in units of 2.5ms to 60ms is generally defined as one frame of audio. This time is called the "sampling time" and has no particular criteria for its length, which is determined by the requirements of the codec and the particular application. Specifically, after a segment of audio frame is extracted, the phonemes in the audio frame will be identified by the neural network identification model.
S3: a phoneme classification corresponding to a phoneme of the audio frame is determined from the plurality of phoneme classifications, and a basic mouth shape corresponding thereto is selected. Specifically, the phoneme in the audio frame is compared with the phoneme classification of 14, and the phoneme classification corresponding to the phoneme of the audio frame is determined. For example, after an audio frame is recognized, a plurality of phonemes are obtained, and a plurality of phoneme classifications corresponding to the plurality of phonemes are respectively identified, and a plurality of basic mouth shapes corresponding to the plurality of phoneme classifications are selected.
S4: and synthesizing the selected basic mouth shape into a corresponding mouth shape of the audio frame. Furthermore, a virtual character model is produced, and the mouth shape of the virtual character is generated according to the corresponding mouth shape of the audio frame. The technical scheme of the invention can identify the phonemes in the audio frame through real-time calling, synthesize the image frame corresponding to the audio frame and synthesize the image frame into the animation or the video in real time, and can quickly and accurately realize the voice mouth shape synchronization of the virtual digital person in the super-realistic/realistic manner.
Further, the plurality of basic mouth shapes also includes a closed-mouth shape and a generic mouth shape. When a phoneme identified from an audio frame is not in the plurality of phoneme classifications, the generic mouth shape is selected as the basic mouth shape. When no phoneme is recognized from an audio frame, the closed-mouth shape is selected as the basic mouth shape.
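The selection rule above, including the two fallback shapes, can be sketched as a small function. The string labels and the abbreviated lookup table are illustrative only.

```python
# Sketch of the mouth-shape selection rule: a recognized phoneme maps to
# one of the 14 basic shapes; an unknown phoneme falls back to the
# "generic" shape; a frame with no recognized phoneme falls back to the
# "closed" shape.

CLOSED, GENERIC = "closed", "generic"

# Abbreviated phoneme -> shape-index table (the full table has 14 classes).
PHONEME_TO_SHAPE = {"p": 0, "b": 0, "m": 0, "f": 1, "v": 1}

def select_mouth_shape(phoneme):
    if phoneme is None:                        # nothing recognized in frame
        return CLOSED
    return PHONEME_TO_SHAPE.get(phoneme, GENERIC)  # unknown -> generic

print(select_mouth_shape(None))   # closed
print(select_mouth_shape("m"))    # 0
print(select_mouth_shape("zz"))   # generic
```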
As shown in fig. 2, an embodiment of the present invention further provides an apparatus for simulating a virtual character speaking, the apparatus including: a basic mouth shape generating unit 102, a phoneme extracting unit 104, a basic mouth shape determining unit 106, and a mouth shape synthesizing unit 108.
The basic mouth shape generating unit 102 is configured to create a mouth shape corresponding to each phoneme classification according to the plurality of phoneme classifications, and obtain a plurality of basic mouth shapes. The phoneme extracting unit 104 is used for inputting an audio stream, extracting an audio frame of the audio stream, and identifying phonemes of the audio frame. The base mouth shape determining unit 106 is configured to determine a phoneme classification corresponding to a phoneme of the audio frame from among the plurality of phoneme classifications, and select a base mouth shape corresponding thereto. The mouth shape synthesizing unit 108 is configured to synthesize the selected base mouth shape into a corresponding mouth shape of the audio frame.
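The four units can be wired together as a single pipeline, sketched below. The class and method names are illustrative, and the phoneme recognizer is stubbed out where the patent would use a neural-network model (unit 104).

```python
# Sketch of the apparatus of fig. 2: frame-by-frame, recognize a phoneme
# (unit 104), map it to a basic mouth shape (units 102/106), and emit one
# mouth-shape label per audio frame for synthesis (unit 108).

class MouthShapePipeline:
    def __init__(self, phoneme_to_shape, recognize_phoneme):
        self.phoneme_to_shape = phoneme_to_shape    # unit 102's output
        self.recognize_phoneme = recognize_phoneme  # stands in for unit 104

    def run(self, audio_frames):
        shapes = []
        for frame in audio_frames:
            phoneme = self.recognize_phoneme(frame)
            if phoneme is None:
                shapes.append("closed")             # silence fallback
            else:
                shapes.append(self.phoneme_to_shape.get(phoneme, "generic"))
        return shapes                               # handed to unit 108

# Stub recognizer: pretend every non-empty frame contains an "m".
pipeline = MouthShapePipeline({"m": 0}, lambda f: "m" if f else None)
print(pipeline.run([[1, 2], [], [3]]))  # [0, 'closed', 0]
```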
The technical scheme of the invention realizes speech-mouth-shape synchronization for a virtual digital human through the fusion and classification of phonemes, so that the mouth-shape fault-tolerance rate of the virtual digital human while speaking can reach 99.9%. A standardized mouth-shape production scheme is formulated, greatly improving the production efficiency and quality of virtual digital human mouth shapes. At the same time, the virtual digital human is closer to a real person, greatly improving the user experience.
The functions of each functional module of the device in the above embodiments of the present description may be implemented through each step of the above method embodiments, and therefore, a specific working process of the device provided in one embodiment of the present description is not repeated herein.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 1.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method in conjunction with fig. 1.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a server. Of course, the processor and the storage medium may also reside as discrete components in a server.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (9)

1. A method of simulating a virtual character speaking, the method comprising:
according to a plurality of phoneme classifications, making a mouth shape corresponding to each phoneme classification to obtain a plurality of basic mouth shapes;
inputting an audio stream, extracting an audio frame of the audio stream, and identifying a phoneme of the audio frame;
determining the phoneme classification corresponding to the phoneme of the audio frame from the plurality of phoneme classifications, and selecting the basic mouth shape corresponding to the phoneme classification;
synthesizing the selected base mouth shape into a corresponding mouth shape of the audio frame.
2. The method of claim 1, wherein the plurality of phoneme classifications comprises:
(p, b, m), (f, v), (th), (t, d), (k, g), (tS, dZ, S), (s, z), (n, l), (r), (A), (e), (ih), (oh), (ou).
3. The method of claim 1, wherein
in the audio stream, a data amount in units of 2.5 ms to 60 ms is extracted as one frame of audio.
4. The method of claim 1, further comprising:
and making a virtual character model, and generating the mouth shape of the virtual character according to the corresponding mouth shape of the audio frame.
5. The method of claim 1, wherein
the plurality of basic mouth shapes further comprises: a closed-mouth shape and a generic mouth shape.
6. The method of claim 5, wherein
the generic mouth shape is selected as the basic mouth shape when a phoneme identified from the audio frame is not in the plurality of phoneme classifications; and
the closed-mouth shape is selected as the basic mouth shape when no phoneme is recognized from the audio frame.
7. An apparatus for simulating a virtual character speaking, the apparatus comprising:
a basic mouth shape generating unit (102) for creating a mouth shape corresponding to each phoneme classification according to the plurality of phoneme classifications to obtain a plurality of basic mouth shapes;
a phoneme extraction unit (104) for inputting an audio stream, extracting an audio frame of the audio stream, and identifying phonemes of the audio frame;
a basic mouth shape determining unit (106) for determining the phoneme classification corresponding to the phoneme of the audio frame from the plurality of phoneme classifications, and selecting the basic mouth shape corresponding thereto;
a mouth shape synthesis unit (108) for synthesizing the selected base mouth shape into a corresponding mouth shape of the audio frame.
8. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any one of claims 1-6.
9. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-6.
CN202210050718.6A 2022-01-17 2022-01-17 Method and device for simulating virtual character speaking Pending CN114359450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210050718.6A CN114359450A (en) 2022-01-17 2022-01-17 Method and device for simulating virtual character speaking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210050718.6A CN114359450A (en) 2022-01-17 2022-01-17 Method and device for simulating virtual character speaking

Publications (1)

Publication Number Publication Date
CN114359450A true CN114359450A (en) 2022-04-15

Family

ID=81092194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210050718.6A Pending CN114359450A (en) 2022-01-17 2022-01-17 Method and device for simulating virtual character speaking

Country Status (1)

Country Link
CN (1) CN114359450A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115050083A (en) * 2022-08-15 2022-09-13 南京硅基智能科技有限公司 Mouth shape correcting model, training of model and application method of model

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108447474A * 2018-03-12 2018-08-24 北京灵伴未来科技有限公司 Modeling and control method for synchronizing virtual character speech and mouth shape
CN109215631A (en) * 2017-07-05 2019-01-15 松下知识产权经营株式会社 Audio recognition method, program, speech recognition equipment and robot
CN109377540A (en) * 2018-09-30 2019-02-22 网易(杭州)网络有限公司 Synthetic method, device, storage medium, processor and the terminal of FA Facial Animation
CN111260761A (en) * 2020-01-15 2020-06-09 北京猿力未来科技有限公司 Method and device for generating mouth shape of animation character
CN111698552A (en) * 2020-05-15 2020-09-22 完美世界(北京)软件科技发展有限公司 Video resource generation method and device
CN112734889A (en) * 2021-02-19 2021-04-30 北京中科深智科技有限公司 Mouth shape animation real-time driving method and system for 2D character
CN113763518A (en) * 2021-09-09 2021-12-07 北京顺天立安科技有限公司 Multi-mode infinite expression synthesis method and device based on virtual digital human
CN113781610A (en) * 2021-06-28 2021-12-10 武汉大学 Virtual face generation method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215631A (en) * 2017-07-05 2019-01-15 松下知识产权经营株式会社 Audio recognition method, program, speech recognition equipment and robot
CN108447474A * 2018-03-12 2018-08-24 北京灵伴未来科技有限公司 Modeling and control method for synchronizing virtual character speech and mouth shape
CN109377540A (en) * 2018-09-30 2019-02-22 网易(杭州)网络有限公司 Synthetic method, device, storage medium, processor and the terminal of FA Facial Animation
CN111260761A (en) * 2020-01-15 2020-06-09 北京猿力未来科技有限公司 Method and device for generating mouth shape of animation character
CN111698552A (en) * 2020-05-15 2020-09-22 完美世界(北京)软件科技发展有限公司 Video resource generation method and device
CN112734889A (en) * 2021-02-19 2021-04-30 北京中科深智科技有限公司 Mouth shape animation real-time driving method and system for 2D character
CN113781610A (en) * 2021-06-28 2021-12-10 武汉大学 Virtual face generation method
CN113763518A (en) * 2021-09-09 2021-12-07 北京顺天立安科技有限公司 Multi-mode infinite expression synthesis method and device based on virtual digital human

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115050083A (en) * 2022-08-15 2022-09-13 南京硅基智能科技有限公司 Mouth shape correcting model, training of model and application method of model
US11887403B1 (en) 2022-08-15 2024-01-30 Nanjing Silicon Intelligence Technology Co., Ltd. Mouth shape correction model, and model training and application method

Similar Documents

Publication Publication Date Title
US10789290B2 (en) Audio data processing method and apparatus, and computer storage medium
Czyzewski et al. An audio-visual corpus for multimodal automatic speech recognition
US11908451B2 (en) Text-based virtual object animation generation method, apparatus, storage medium, and terminal
KR20220004737A (en) Multilingual speech synthesis and cross-language speech replication
CN108899009B (en) Chinese speech synthesis system based on phoneme
CN111048064B (en) Voice cloning method and device based on single speaker voice synthesis data set
CN110136687B (en) Voice training based cloned accent and rhyme method
CN111489424A (en) Virtual character expression generation method, control method, device and terminal equipment
JP2008500573A (en) Method and system for changing messages
WO2022048404A1 (en) End-to-end virtual object animation generation method and apparatus, storage medium, and terminal
CN112185363B (en) Audio processing method and device
JP2020034883A (en) Voice synthesizer and program
CN114121006A (en) Image output method, device, equipment and storage medium of virtual character
US20230298564A1 (en) Speech synthesis method and apparatus, device, and storage medium
Salvi et al. SynFace—speech-driven facial animation for virtual speech-reading support
CN115938352A (en) Model obtaining method, mouth shape coefficient generating device, mouth shape coefficient generating equipment and mouth shape coefficient generating medium
CN114359450A (en) Method and device for simulating virtual character speaking
CN115312030A (en) Display control method and device of virtual role and electronic equipment
AU2022203531B1 (en) Real-time speech-to-speech generation (rssg) apparatus, method and a system therefore
CN117275485B (en) Audio and video generation method, device, equipment and storage medium
WO2021169825A1 (en) Speech synthesis method and apparatus, device and storage medium
CN116110370A (en) Speech synthesis system and related equipment based on man-machine speech interaction
JP2002229590A (en) Speech recognition system
CN114359443A (en) Method and device for simulating virtual character speaking
Verma et al. Animating expressive faces across languages

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination