CN107221344A - A speech emotion transfer method - Google Patents

A speech emotion transfer method

Info

Publication number
CN107221344A
CN107221344A (application CN201710222674.XA)
Authority
CN
China
Prior art keywords
speech
emotional
target
feature
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710222674.XA
Other languages
Chinese (zh)
Inventor
李华康
杜阳阳
金旭
胡晓东
丘添元
张笑源
孙国梓
李涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201710222674.XA
Publication of CN107221344A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Child & Adolescent Psychology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a speech emotion transfer method. A speech emotion data set is first generated from a speech database and labelled; audio features are then extracted from each audio file with a speech feature parameter model to obtain speech feature sets. Next, a machine learning tool is applied to the speech feature sets and the speech emotion labels to build an emotion model library. The target emotion to transfer to is selected, a speech signal is input from a multimedia terminal, the feature set of the current speech signal is obtained, and the current emotion category is determined by emotion classification. If it matches the selected target, the original input speech is output directly as the target emotional speech; otherwise feature-level emotion transfer is performed. Finally, speech synthesis produces the target emotional speech output. The proposed method, based on emotion classification and feature transfer, can change the emotion of speech without losing the vocal characteristics of the original speaker.

Description

A speech emotion transfer method
Technical field
The invention belongs to the technical field of speech recognition and relates to the transfer of speech emotion, and in particular to a speech emotion transfer method that is not based on the models of different speech vendors.
Background technology
With the development of intelligent chip technology, terminal devices have become increasingly intelligent and integrated, and their miniaturization, light weight and networking have made daily life ever more convenient. Users constantly exchange voice and video over networked terminals, accumulating massive amounts of multimedia data. With this accumulation of platform data, intelligent question-answering systems have emerged. Such systems involve cutting-edge technologies including speech recognition, sentiment analysis, information retrieval, semantic matching, sentence generation and speech synthesis.
Speech recognition technology lets a machine convert a speech signal into corresponding text or machine instructions through recognition and understanding, so that the machine can understand what a person expresses. It mainly involves speech-unit selection, speech feature extraction, pattern matching and model training. Speech units include words (sentences), syllables and phonemes, chosen according to the scenario and task: word (or sentence) units are mainly suited to small-vocabulary recognition systems; syllable units are better suited to Chinese speech recognition; and although phonemes describe the basis of speech well, speaker variability makes stable data sets hard to obtain, so they remain under study.
Another research direction is speech emotion recognition, which mainly consists of speech signal acquisition, emotion feature extraction and emotion recognition. Emotion features fall into three groups: prosodic features, spectrum-based features and voice-quality features. They are typically extracted with the frame as the minimum granularity, and emotion recognition is generally performed on global statistics of these features. Recognition algorithms fall into two broad classes: discrete speech emotion classifiers and dimensional speech emotion predictors. Speech emotion recognition is already widely used in fields such as telephone call centers, driver mental-state monitoring and online distance education.
Intelligent agents are described as the crystallization of next-generation artificial intelligence. They must not only perceive environmental factors and understand human behavior and language, but also, when communicating with people, understand human emotions and express human-like emotion in order to achieve more natural interaction. Current research on agent emotion concentrates mainly on virtual-image (avatar) processing and draws on results from computer graphics, psychology, cognitive science, neurophysiology and artificial intelligence. Although more than 90% of human perception of the environment comes from vision, the vast majority of emotion perception comes from speech. How to build a human-like emotion system for an agent from speech has so far not been addressed in published research.
Summary of the invention
The purpose of the present invention is to take machine learning as the main means to propose a human-like speech emotion expression method and, on this basis, to use deep learning and convolutional network algorithms to realize the transfer of speech emotion within the system. This not only provides a reference method for speech recognition and sentiment analysis, but can also find wide application in future human-like intelligent agents.
To achieve the above object, the technical scheme proposed by the present invention is a speech emotion transfer method that specifically comprises the following steps:
Step 1: prepare a speech database and generate a speech emotion data set S = {s_1, s_2, ..., s_n} by standard sampling;
Step 2: label the speech database of step 1 manually, marking the emotion E = {e_1, e_2, ..., e_n} of each speech file;
Step 3: use a speech feature parameter model to extract audio features from each audio file s_i in the speech library, obtaining a basic speech feature set F_i = {f_1^i, f_2^i, ..., f_n^i};
Step 4: apply a machine learning tool to each speech feature set from step 3 and the speech emotion labels from step 2, learn a feature model for each emotion class, and build the emotion model library E_b;
Step 5: through a multimedia terminal, select the target emotion Target_e to which the speech should be transferred;
Step 6: input a speech signal s_t from the multimedia terminal;
Step 7: feed the current input s_t to the speech emotion feature extraction module to obtain the feature set F_t = {f_1^t, f_2^t, ..., f_n^t} of the current speech signal;
Step 8: using the same machine learning algorithm as step 4, classify the feature set F_t of s_t against the emotion model library E_b obtained in step 4 to obtain the current emotion category s_e of s_t;
Step 9: judge whether the s_e obtained in step 8 matches the Target_e selected in step 5; if s_e = Target_e, output the original input speech signal directly as the target emotional speech; if s_e ≠ Target_e, call step 10 to perform feature-level emotion transfer;
Step 10: transfer the principal features of the current speech emotion toward the principal features of the target emotion in the emotion model library;
Step 11: process the transferred features from step 10 with a speech synthesis algorithm and synthesize the final target emotional speech output.
Further, in step 1 above, the sampling frequency of the speech data is 44.1 kHz, the recording length is between 3 and 10 s, and the recordings are saved in WAV format.
Also in step 1, to obtain good performance, the natural attributes of the sampled speakers should not be overly concentrated; the samples should, as far as possible, cover speakers of different ages, sexes and occupations.
In step 6, the input can be entered in real time or recorded and then submitted with a click.
The invention has the following advantages:
1. The invention is the first to propose the concept of speech emotion transfer, which can provide an emotion construction method for future virtual reality.
2. The proposed method, based on emotion classification and feature transfer, can change the emotion of speech without losing the vocal characteristics of the original speaker.
Brief description of the drawings
Fig. 1 is a schematic diagram of the speech emotion transfer method provided by the invention.
Fig. 2 is a spectral feature plot of an original input speech sample.
Fig. 3 is a spectral feature plot of the original speech sample after emotion conversion.
Embodiment
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides a user-driven speech emotion transfer method based on a speech emotion database, as shown in Fig. 1. The modules and functions involved in the method include:
Basic speech library: raw speech data covering different ages, sexes and scenes.
Label library: emotion labels assigned to the basic speech library, e.g. calm, happy, annoyed, angry, sad.
Speech input device: e.g. a microphone, enabling real-time voice input by the user.
Speech emotion feature extraction: an acoustic analysis tool obtains general acoustic features, and the feature set required as speech emotion features is selected according to the characteristics of human speech signals and of emotional expression.
Machine learning: a machine learning algorithm is used, together with the speech emotion label library, to build and train models on the speech emotion feature sets.
Emotion model library: the speech emotion models obtained by machine learning from the speech library, organized along dimensions such as sex, age and emotion.
Emotion selection: before inputting a speech signal, the user selects the emotion model into which the current speech should be converted.
Emotion category judgment: judge whether the emotion of the current user input matches the selected emotion; if it does, the target emotional speech is output directly, and if not, the emotion transfer module is called.
Emotion transfer: when the input speech and the selected emotion differ, the feature distances between the input speech emotion feature set and the selected emotion feature set are compared, the feature-space representation of the input speech emotion is adjusted to realize the transfer, and the adjusted emotional speech is output as the target emotional speech.
An embodiment is now given to illustrate the speech emotion transfer process. It comprises the following steps:
Step 1: prepare a speech database. Preferably the speech data are sampled at the standard 44.1 kHz, each recording is a single sentence between 3 and 10 s long saved in WAV format, and recordings from several testers yield the speech emotion data set S = {s_1, s_2, ..., s_n}. To obtain good performance, the natural attributes of the sampled speakers, such as age, sex and occupation, should not be overly concentrated.
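By way of illustration only (the patent does not prescribe any toolkit), the data preparation of step 1 could be sketched in Python with librosa; the speech_db folder name and file layout are hypothetical assumptions:

```python
# A minimal sketch: load every WAV file at 44.1 kHz and keep only clips whose
# duration falls in the 3-10 s range described in step 1.
import glob
import librosa

SAMPLE_RATE = 44100

def load_speech_dataset(folder="speech_db"):
    dataset = []
    for path in sorted(glob.glob(f"{folder}/*.wav")):
        signal, sr = librosa.load(path, sr=SAMPLE_RATE, mono=True)
        duration = len(signal) / sr
        if 3.0 <= duration <= 10.0:          # keep clips in the 3-10 s range
            dataset.append((path, signal))
    return dataset
```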
Step 2: label the speech database prepared in step 1 manually, marking the emotion E = {e_1, e_2, ..., e_n} of each speech file, e.g. "worried", "surprised", "angry", "disappointed", "sad" and so on.
Step 3: use a speech feature parameter model to extract audio features from each audio file s_i in the speech library, obtaining a basic speech feature set F_i = {f_1^i, f_2^i, ..., f_n^i} (Fig. 2 shows the spectral features of an original speech sample). Typical features include the envelope (env), speech rate (speed), zero-crossing rate (zcr), energy (eng), energy entropy (eoe), spectral centroid (spec_cent), spectral spread (spec_spr), mel-frequency cepstral coefficients (mfccs) and the chroma vector (chroma).
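A minimal sketch of the feature extraction of step 3, assuming librosa as the analysis tool (the patent does not name one); spectral bandwidth stands in for the spectral spread feature, and the envelope, speech-rate and energy-entropy features are omitted for brevity:

```python
import numpy as np
import librosa

def extract_features(signal, sr=44100):
    zcr      = librosa.feature.zero_crossing_rate(signal)          # zero-crossing rate
    energy   = librosa.feature.rms(y=signal)                       # frame energy
    centroid = librosa.feature.spectral_centroid(y=signal, sr=sr)  # spectral centroid
    spread   = librosa.feature.spectral_bandwidth(y=signal, sr=sr) # ~ spectral spread
    mfccs    = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)    # MFCCs
    chroma   = librosa.feature.chroma_stft(y=signal, sr=sr)        # chroma vector
    # Summarize each frame-level feature by its mean over the utterance,
    # matching the global-statistics strategy mentioned in the background.
    frames = np.vstack([zcr, energy, centroid, spread, mfccs, chroma])
    return frames.mean(axis=1)
```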
Step 4: apply a machine learning tool (such as LIBSVM) to the feature set of each speech file from step 3 and the speech emotion labels from step 2, learn a feature model for each emotion class, and build the emotion model library E_b.
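A minimal sketch of step 4. The patent mentions LIBSVM as an example tool; scikit-learn's SVC wraps libsvm, so it is used here. Representing the emotion model library E_b as one mean feature vector per emotion class is an assumption of this sketch, not a requirement of the patent; load_speech_dataset and extract_features are the hypothetical helpers sketched above:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_emotion_classifier(dataset, labels):
    # dataset: list of (path, signal) pairs; labels: one emotion string per file.
    X = np.array([extract_features(sig) for _, sig in dataset])
    y = np.array(labels)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, y)
    return clf

def build_emotion_model_library(dataset, labels):
    # E_b sketched as the mean feature vector of each emotion class,
    # which the step 10 sketch later uses as a transfer target.
    X = np.array([extract_features(sig) for _, sig in dataset])
    y = np.array(labels)
    return {emotion: X[y == emotion].mean(axis=0) for emotion in set(labels)}
```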
Step 5: through a multimedia terminal, select the target emotion Target_e to which the speech should be transferred, e.g. "sad".
Step 6: input a speech signal s_t from the multimedia terminal, either in real time or by recording it and then clicking to submit.
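One possible way to capture the real-time input of step 6, sketched with the sounddevice package; the patent does not prescribe a capture library, and the recording duration is an illustrative assumption:

```python
import sounddevice as sd

SAMPLE_RATE = 44100

def record_input(duration_s=5.0):
    # Record duration_s seconds of mono audio from the default microphone.
    frames = sd.rec(int(duration_s * SAMPLE_RATE),
                    samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()                    # block until recording has finished
    return frames.squeeze()      # 1-D signal s_t, ready for feature extraction
```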
Step 7: feed the current input s_t to the speech emotion feature extraction module to obtain the feature set F_t = {f_1^t, f_2^t, ..., f_n^t} of the current speech signal.
Step 8: using the same machine learning algorithm as step 4, classify the feature set F_t of s_t against the emotion model library E_b obtained in step 4 to obtain the current emotion category s_e of s_t.
Step 9: judge whether the s_e obtained in step 8 matches the Target_e selected in step 5. If s_e = Target_e, output the original input speech signal directly as the target emotional speech. If s_e ≠ Target_e, call step 10 to perform feature-level emotion transfer.
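A sketch of the decision logic of steps 8 and 9, reusing the hypothetical helpers above; transfer_and_synthesize is the step 10-11 sketch given further below:

```python
# Classify the input, then either pass it through unchanged (step 9, match)
# or hand it to the transfer/synthesis stage (steps 10-11, mismatch).
def classify_and_decide(signal, target_emotion, clf, emotion_library):
    features = extract_features(signal)               # step 7: feature set F_t
    current = clf.predict([features])[0]              # step 8: emotion class s_e
    if current == target_emotion:                     # step 9: already the target
        return signal                                 # output the input as-is
    target_profile = emotion_library[target_emotion]
    return transfer_and_synthesize(signal, features, target_profile)
```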
Step 10: transfer the principal features of the current speech emotion toward the principal features of the target emotion in the emotion model library (Fig. 3 shows the spectral features after transfer), e.g. envelope result_env = (s_env + Target_env) / 2 and speech-rate result_speed = (s_speed + Target_speed) / 2.
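The averaging rule of step 10, applied here to a whole feature vector rather than only the envelope and speech rate, can be sketched as follows; treating every feature this way is an assumption of the sketch:

```python
import numpy as np

# Each transferred feature is the mean of the input value and the value stored
# for the target emotion in the emotion model library, the same rule the text
# gives for the envelope and the speech rate.
def transfer_features(input_features, target_profile):
    input_features = np.asarray(input_features, dtype=float)
    target_profile = np.asarray(target_profile, dtype=float)
    return (input_features + target_profile) / 2.0    # result = (s + Target) / 2
```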
Step 11: process the transferred features from step 10 with a speech synthesis algorithm (pitch-synchronous overlap-add, PSOLA) and synthesize the final target emotional speech output.
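The patent names PSOLA for resynthesis. librosa ships no PSOLA implementation, so the following sketch only approximates the effect by shifting pitch and stretching time toward the target with librosa's effects module; the semitone and rate values are illustrative assumptions, and a full system would derive them from the transferred feature vector:

```python
import librosa

def transfer_and_synthesize(signal, features, target_profile, sr=44100,
                            pitch_semitones=-2.0, rate=0.9):
    # Step 10: where the transferred feature vector would be computed.
    shifted = transfer_features(features, target_profile)
    # Step 11 stand-in: lower the pitch and slow the speech rate slightly,
    # instead of a true PSOLA resynthesis driven by `shifted`.
    y = librosa.effects.pitch_shift(signal, sr=sr, n_steps=pitch_semitones)
    y = librosa.effects.time_stretch(y, rate=rate)
    return y
```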
The above describes only preferred embodiments of the invention and is not intended to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or replace some of their technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (4)

1. A speech emotion transfer method, characterised in that it comprises the following steps:
Step 1: prepare a speech database and generate a speech emotion data set S = {s_1, s_2, ..., s_n} by standard sampling;
Step 2: label the speech database of step 1 manually, marking the emotion E = {e_1, e_2, ..., e_n} of each speech file;
Step 3: use a speech feature parameter model to extract audio features from each audio file s_i in the speech library, obtaining a basic speech feature set F_i = {f_1^i, f_2^i, ..., f_n^i};
Step 4: apply a machine learning tool to each speech feature set from step 3 and the speech emotion labels from step 2, learn a feature model for each emotion class, and build the emotion model library E_b;
Step 5: through a multimedia terminal, select the target emotion Target_e to which the speech should be transferred;
Step 6: input a speech signal s_t from the multimedia terminal;
Step 7: feed the current input s_t to the speech emotion feature extraction module to obtain the feature set F_t = {f_1^t, f_2^t, ..., f_n^t} of the current speech signal;
Step 8: using the same machine learning algorithm as step 4, classify the feature set F_t of s_t against the emotion model library E_b obtained in step 4 to obtain the current emotion category s_e of s_t;
Step 9: judge whether the s_e obtained in step 8 matches the Target_e selected in step 5; if s_e = Target_e, output the original input speech signal directly as the target emotional speech; if s_e ≠ Target_e, call step 10 to perform feature-level emotion transfer;
Step 10: transfer the principal features of the current speech emotion toward the principal features of the target emotion in the emotion model library;
Step 11: process the transferred features from step 10 with a speech synthesis algorithm and synthesize the final target emotional speech output.
2. The speech emotion transfer method according to claim 1, characterised in that in step 1 the sampling frequency of the speech data is 44.1 kHz, the recording length is between 3 and 10 s, and the recordings are saved in WAV format.
3. The speech emotion transfer method according to claim 1, characterised in that in step 1, to obtain good performance, the natural attributes of the sampled speakers are not overly concentrated.
4. The speech emotion transfer method according to claim 1, characterised in that the input in step 6 can be entered in real time or recorded and then submitted with a click.
CN201710222674.XA 2017-04-07 2017-04-07 A speech emotion transfer method Pending CN107221344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710222674.XA CN107221344A (en) 2017-04-07 2017-04-07 A speech emotion transfer method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710222674.XA CN107221344A (en) 2017-04-07 2017-04-07 A speech emotion transfer method

Publications (1)

Publication Number Publication Date
CN107221344A 2017-09-29

Family

ID=59928228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710222674.XA Pending CN107221344A (en) A speech emotion transfer method

Country Status (1)

Country Link
CN (1) CN107221344A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1787074A (en) * 2005-12-13 2006-06-14 浙江大学 Method for distinguishing speak person based on feeling shifting rule and voice correction
CN101064104A (en) * 2006-04-24 2007-10-31 中国科学院自动化研究所 Emotion voice creating method based on voice conversion
CN101261832A (en) * 2008-04-21 2008-09-10 北京航空航天大学 Extraction and modeling method for Chinese speech sensibility information
CN102184731A (en) * 2011-05-12 2011-09-14 北京航空航天大学 Method for converting emotional speech by combining rhythm parameters with tone parameters
CN103198827A (en) * 2013-03-26 2013-07-10 合肥工业大学 Voice emotion correction method based on relevance of prosodic feature parameter and emotion parameter
CN103544963A (en) * 2013-11-07 2014-01-29 东南大学 Voice emotion recognition method based on core semi-supervised discrimination and analysis

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019218773A1 (en) * 2018-05-15 2019-11-21 中兴通讯股份有限公司 Voice synthesis method and device, storage medium, and electronic device
CN112786026A (en) * 2019-12-31 2021-05-11 深圳市木愚科技有限公司 Parent-child story personalized audio generation system and method based on voice migration learning
CN112786026B (en) * 2019-12-31 2024-05-07 深圳市木愚科技有限公司 Parent-child story personalized audio generation system and method based on voice transfer learning
CN111951778A (en) * 2020-07-15 2020-11-17 天津大学 Method for synthesizing emotion voice by using transfer learning under low resource
CN111951778B (en) * 2020-07-15 2023-10-17 天津大学 Method for emotion voice synthesis by utilizing transfer learning under low resource
CN113421544A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Singing voice synthesis method and device, computer equipment and storage medium
CN113421544B (en) * 2021-06-30 2024-05-10 平安科技(深圳)有限公司 Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium
CN113555004A (en) * 2021-07-15 2021-10-26 复旦大学 Voice depression state identification method based on feature selection and transfer learning
CN114495988A (en) * 2021-08-31 2022-05-13 荣耀终端有限公司 Emotion processing method of input information and electronic equipment
CN116955572A (en) * 2023-09-06 2023-10-27 宁波尚煦智能科技有限公司 Online service feedback interaction method based on artificial intelligence and big data system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2017-09-29)