EA202090169A1 - METHOD AND SYSTEM FOR CREATING MIMICS BASED ON TEXT - Google Patents

METHOD AND SYSTEM FOR CREATING MIMICS BASED ON TEXT

Info

Publication number
EA202090169A1
EA202090169A1
Authority
EA
Eurasian Patent Office
Prior art keywords
sequence
image
frames
face mask
head model
Prior art date
Application number
EA202090169A
Other languages
Russian (ru)
Other versions
EA039495B1 (en)
Inventor
Альберт Рувимович ЕФИМОВ
Алексей Сергеевич ГОННОЧЕНКО
Михаил Александрович ВЛАДИМИРОВ
Original Assignee
Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Publication of EA202090169A1
Publication of EA039495B1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/022 Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

This invention relates generally to the field of image data processing, and in particular to a method and system for creating facial expressions based on text. The technical result achieved by the invention is the ability to create a video stream containing an animated image of a 3D head model with a dynamic face-mask texture placed on it, generated from speech-signal data. This technical result is achieved by a method for processing a speech signal to generate a video stream, performed by at least one computing device and comprising the steps of: obtaining data of at least one speech signal; dividing the portions of the speech signal that contain voice information into time windows; generating a frequency-spectrum image for each time window to obtain a sequence of frequency-spectrum images; determining, on the basis of the sequence of frequency-spectrum images, a sequence of data on the set of coordinates of the points forming the face mask; placing the face mask on the 3D head model to form a sequence of frames containing an image of the 3D head model with the face mask placed on it; forming, on the basis of the sequence of frequency-spectrum images, a sequence of frames of the dynamic face-mask texture; forming a sequence of frames containing an image of the resulting 3D head model with the dynamic face-mask texture placed on it, based on the sequence of frames containing the image of the 3D head model with the face mask and on the frames of the dynamic face-mask texture; forming a sequence of frames showing the resulting 3D head model against the background of a scene; and combining the sequence of frames obtained in the previous step into a video stream.
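The abstract does not disclose an implementation of the early pipeline stages. As a rough illustration only, the first two steps (splitting the speech signal into time windows and computing a frequency-spectrum image per window) could be sketched in Python with NumPy; the function name, window/hop lengths, and log scaling below are assumptions, not details from the patent:

```python
import numpy as np

def spectrogram_frames(signal, sample_rate, window_ms=40, hop_ms=20):
    """Split a speech signal into overlapping time windows and compute
    a magnitude frequency-spectrum slice for each window."""
    win = int(sample_rate * window_ms / 1000)   # samples per window
    hop = int(sample_rate * hop_ms / 1000)      # samples between window starts
    hann = np.hanning(win)                      # taper to reduce spectral leakage
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        chunk = signal[start:start + win] * hann
        spectrum = np.abs(np.fft.rfft(chunk))   # magnitude spectrum of the window
        frames.append(np.log1p(spectrum))       # log scale, common for spectrograms
    return np.stack(frames)                     # shape: (num_windows, win // 2 + 1)

# Example: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)
spec = spectrogram_frames(sig, sr)
print(spec.shape)  # → (49, 321)
```

Each row of the returned array is one "frequency-spectrum image" in the sense of the abstract; in the claimed method, the sequence of such images would then drive both the face-mask point coordinates and the dynamic face-mask texture.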

EA202090169A 2019-12-27 2020-01-28 Method and system for creating facial expressions based on text EA039495B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
RU2019144357A RU2723454C1 (en) 2019-12-27 2019-12-27 Method and system for creating facial expression based on text

Publications (2)

Publication Number Publication Date
EA202090169A1 true EA202090169A1 (en) 2021-06-30
EA039495B1 EA039495B1 (en) 2022-02-03

Family

ID=71095938

Family Applications (1)

Application Number Title Priority Date Filing Date
EA202090169A EA039495B1 (en) 2019-12-27 2020-01-28 Method and system for creating facial expressions based on text

Country Status (3)

Country Link
EA (1) EA039495B1 (en)
RU (1) RU2723454C1 (en)
WO (1) WO2021133201A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2748779C1 (en) * 2020-10-30 2021-05-31 Общество с ограниченной ответственностью "СДН-видео" Method and system for automated generation of video stream with digital avatar based on text

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6232965B1 (en) * 1994-11-30 2001-05-15 California Institute Of Technology Method and apparatus for synthesizing realistic animations of a human speaking using a computer
US8730231B2 (en) * 2007-11-20 2014-05-20 Image Metrics, Inc. Systems and methods for creating personalized media content having multiple content layers
US20120130717A1 (en) * 2010-11-19 2012-05-24 Microsoft Corporation Real-time Animation for an Expressive Avatar
WO2015145219A1 (en) * 2014-03-28 2015-10-01 Navaratnam Ratnakumar Systems for remote service of customers using virtual and physical mannequins
US20190147838A1 (en) * 2014-08-22 2019-05-16 Zya, Inc. Systems and methods for generating animated multimedia compositions
US10249291B2 (en) * 2016-05-27 2019-04-02 Asustek Computer Inc. Animation synthesis system and lip animation synthesis method
RU2671990C1 (en) * 2017-11-14 2018-11-08 Евгений Борисович Югай Method of displaying three-dimensional face of the object and device for it
CN109857352A (en) * 2017-11-30 2019-06-07 富泰华工业(深圳)有限公司 Cartoon display method and human-computer interaction device
WO2021034211A1 (en) * 2019-08-16 2021-02-25 Станислав Игоревич АШМАНОВ Method and system of transfer of motion of subject from video onto animated character

Also Published As

Publication number Publication date
WO2021133201A1 (en) 2021-07-01
RU2723454C1 (en) 2020-06-11
EA039495B1 (en) 2022-02-03

Similar Documents

Publication Publication Date Title
WO2021073416A1 (en) Method for generating virtual character video on the basis of neural network, and related device
CN108200446B (en) On-line multimedia interaction system and method of virtual image
US20180350123A1 (en) Generating a layered animatable puppet using a content stream
US20150279364A1 (en) Mouth-Phoneme Model for Computerized Lip Reading
KR20220097121A (en) Mouth shape synthesis device and method using random nulling artificial neural network
CN112001992A (en) Voice-driven 3D virtual human expression sound-picture synchronization method and system based on deep learning
US10825224B2 (en) Automatic viseme detection for generating animatable puppet
US7257538B2 (en) Generating animation from visual and audio input
KR20210119441A (en) Real-time face replay based on text and audio
CN113299312B (en) Image generation method, device, equipment and storage medium
CN115049016B (en) Model driving method and device based on emotion recognition
RU2721180C1 (en) Method for generating an animation model of a head based on a speech signal and an electronic computing device which implements it
CN114519880B (en) Active speaker recognition method based on cross-modal self-supervision learning
CN110852965A (en) Video illumination enhancement method and system based on generation countermeasure network
CN110121105B (en) Clip video generation method and device
CN113971828A (en) Virtual object lip driving method, model training method, related device and electronic equipment
CN115578512A (en) Method, device and equipment for training and using generation model of voice broadcast video
CN114255737B (en) Voice generation method and device and electronic equipment
EA202090169A1 (en) METHOD AND SYSTEM FOR CREATING MIMICS BASED ON TEXT
CN117315102A (en) Virtual anchor processing method, device, computing equipment and storage medium
CN111260756B (en) Method and device for transmitting information
CN116761013A (en) Digital human face image changing method, device, equipment and storage medium
CN116071467A (en) Method and device for generating lip-shaped driving model, electronic equipment and storage medium
JP2024502326A (en) How to train a neural network configured to convert 2D images into 3D models
CN115996303B (en) Video generation method, device, electronic equipment and storage medium