EA202090169A1 - METHOD AND SYSTEM FOR CREATING MIMICS BASED ON TEXT - Google Patents
- Publication number
- EA202090169A1
- Authority
- EA
- Eurasian Patent Office
- Prior art keywords
- sequence
- image
- frames
- face mask
- head model
- Prior art date
Links
- method (title, abstract)
- spectrum (abstract)
- facial effect (abstract)
- facial expression (abstract)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/022—Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Processing Or Creating Images (AREA)
Abstract
This invention relates generally to the field of image data processing, and in particular to a method and system for generating facial expressions from text. The technical result achieved by the invention is the ability to create a video stream containing an animated image of a 3D head model with a dynamic face-mask texture placed on it, generated from speech signal data. This technical result is achieved by a method for processing a speech signal to generate a video stream, performed by at least one computing device, comprising the steps of: obtaining data of at least one speech signal; dividing the sections of the speech signal that contain voice information into time windows; generating a frequency-spectrum image for each time window to obtain a sequence of frequency-spectrum images; determining, from the sequence of frequency-spectrum images, a sequence of coordinate sets for the points that form a face mask; placing the face mask on a 3D head model to form a sequence of frames containing an image of the 3D head model with the face mask placed on it; forming, from the sequence of frequency-spectrum images, a sequence of dynamic face-mask texture frames; forming a sequence of frames containing an image of the resulting 3D head model with the dynamic face-mask texture placed on it, based on the sequence of frames containing the 3D head model with the face mask and on the dynamic face-mask texture frames; forming a sequence of frames showing the resulting 3D head model against a scene background; and combining the sequence of frames obtained in the previous step into a video stream.
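The first two claimed steps, splitting the voiced sections of a speech signal into time windows and computing a frequency-spectrum image for each window, can be sketched as follows. This is a minimal illustrative sketch only: the patent does not specify window sizes, overlap, or the transform used, so the 40 ms Hann windows with 50% overlap and the magnitude FFT below are assumptions, and the function name `speech_to_spectrum_frames` is hypothetical.

```python
import numpy as np

def speech_to_spectrum_frames(signal, sample_rate, win_ms=40, hop_ms=20):
    """Split a speech signal into overlapping time windows and compute a
    magnitude frequency spectrum for each window (illustrative sketch;
    window/hop sizes are assumptions, not taken from the patent)."""
    win = int(sample_rate * win_ms / 1000)   # samples per window
    hop = int(sample_rate * hop_ms / 1000)   # samples between window starts
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        # Apply a Hann window to reduce spectral leakage at the edges
        chunk = signal[start:start + win] * np.hanning(win)
        # Magnitude spectrum of this time window
        frames.append(np.abs(np.fft.rfft(chunk)))
    # One spectrum per time window: shape (num_windows, win // 2 + 1)
    return np.stack(frames)

# Example: 1 second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
spec = speech_to_spectrum_frames(np.sin(2 * np.pi * 440 * t), sr)
```

Each row of `spec` is the frequency-spectrum image for one time window; in the claimed method, this sequence of spectrum images is what drives both the face-mask point coordinates and the dynamic face-mask texture frames.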
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2019144357A RU2723454C1 (en) | 2019-12-27 | 2019-12-27 | Method and system for creating facial expression based on text |
Publications (2)
Publication Number | Publication Date |
---|---|
EA202090169A1 true EA202090169A1 (en) | 2021-06-30 |
EA039495B1 EA039495B1 (en) | 2022-02-03 |
Family
ID=71095938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EA202090169A EA039495B1 (en) | 2019-12-27 | 2020-01-28 | Method and system for creating facial expressions based on text |
Country Status (3)
Country | Link |
---|---|
EA (1) | EA039495B1 (en) |
RU (1) | RU2723454C1 (en) |
WO (1) | WO2021133201A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2748779C1 (en) * | 2020-10-30 | 2021-05-31 | Общество с ограниченной ответственностью "СДН-видео" | Method and system for automated generation of video stream with digital avatar based on text |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6232965B1 (en) * | 1994-11-30 | 2001-05-15 | California Institute Of Technology | Method and apparatus for synthesizing realistic animations of a human speaking using a computer |
US8730231B2 (en) * | 2007-11-20 | 2014-05-20 | Image Metrics, Inc. | Systems and methods for creating personalized media content having multiple content layers |
US20120130717A1 (en) * | 2010-11-19 | 2012-05-24 | Microsoft Corporation | Real-time Animation for an Expressive Avatar |
WO2015145219A1 (en) * | 2014-03-28 | 2015-10-01 | Navaratnam Ratnakumar | Systems for remote service of customers using virtual and physical mannequins |
US20190147838A1 (en) * | 2014-08-22 | 2019-05-16 | Zya, Inc. | Systems and methods for generating animated multimedia compositions |
US10249291B2 (en) * | 2016-05-27 | 2019-04-02 | Asustek Computer Inc. | Animation synthesis system and lip animation synthesis method |
RU2671990C1 (en) * | 2017-11-14 | 2018-11-08 | Евгений Борисович Югай | Method of displaying three-dimensional face of the object and device for it |
CN109857352A (en) * | 2017-11-30 | 2019-06-07 | 富泰华工业(深圳)有限公司 | Cartoon display method and human-computer interaction device |
WO2021034211A1 (en) * | 2019-08-16 | 2021-02-25 | Станислав Игоревич АШМАНОВ | Method and system of transfer of motion of subject from video onto animated character |
- 2019
- 2019-12-27 RU RU2019144357A patent RU2723454C1 (active)
- 2019-12-27 WO PCT/RU2019/001040 patent WO2021133201A1 (Application Filing)
- 2020
- 2020-01-28 EA EA202090169A patent EA039495B1 (status unknown)
Also Published As
Publication number | Publication date |
---|---|
WO2021133201A1 (en) | 2021-07-01 |
RU2723454C1 (en) | 2020-06-11 |
EA039495B1 (en) | 2022-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021073416A1 (en) | Method for generating virtual character video on the basis of neural network, and related device | |
CN108200446B (en) | On-line multimedia interaction system and method of virtual image | |
US20180350123A1 (en) | Generating a layered animatable puppet using a content stream | |
US20150279364A1 (en) | Mouth-Phoneme Model for Computerized Lip Reading | |
KR20220097121A (en) | Mouth shape synthesis device and method using random nulling artificial neural network | |
CN112001992A (en) | Voice-driven 3D virtual human expression sound-picture synchronization method and system based on deep learning | |
US10825224B2 (en) | Automatic viseme detection for generating animatable puppet | |
US7257538B2 (en) | Generating animation from visual and audio input | |
KR20210119441A (en) | Real-time face replay based on text and audio | |
CN113299312B (en) | Image generation method, device, equipment and storage medium | |
CN115049016B (en) | Model driving method and device based on emotion recognition | |
RU2721180C1 (en) | Method for generating an animation model of a head based on a speech signal and an electronic computing device which implements it | |
CN114519880B (en) | Active speaker recognition method based on cross-modal self-supervision learning | |
CN110852965A (en) | Video illumination enhancement method and system based on generation countermeasure network | |
CN110121105B (en) | Clip video generation method and device | |
CN113971828A (en) | Virtual object lip driving method, model training method, related device and electronic equipment | |
CN115578512A (en) | Method, device and equipment for training and using generation model of voice broadcast video | |
CN114255737B (en) | Voice generation method and device and electronic equipment | |
EA202090169A1 (en) | METHOD AND SYSTEM FOR CREATING MIMICS BASED ON TEXT | |
CN117315102A (en) | Virtual anchor processing method, device, computing equipment and storage medium | |
CN111260756B (en) | Method and device for transmitting information | |
CN116761013A (en) | Digital human face image changing method, device, equipment and storage medium | |
CN116071467A (en) | Method and device for generating lip-shaped driving model, electronic equipment and storage medium | |
JP2024502326A (en) | How to train a neural network configured to convert 2D images into 3D models | |
CN115996303B (en) | Video generation method, device, electronic equipment and storage medium |