CN107221344A - A speech emotion migration method - Google Patents
A speech emotion migration method
- Publication number: CN107221344A
- Application number: CN201710222674.XA
- Authority: China
- Prior art keywords: speech, emotional, target, feature, emotion
- Prior art date: 2017-04-07
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L25/63: Speech or voice analysis techniques specially adapted for estimating an emotional state
- G06N3/084: Learning methods using backpropagation, e.g. gradient descent
- G10L13/02: Methods for producing synthetic speech; speech synthesisers
- G10L15/02: Feature extraction for speech recognition; selection of recognition unit
- G10L15/06: Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/16: Speech classification or search using artificial neural networks
- G10L19/02: Speech or audio signal analysis-synthesis for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
- G10L25/30: Speech or voice analysis techniques characterised by the analysis technique, using neural networks
Abstract
The invention discloses a speech emotion migration method. A speech database is first used to generate a speech emotion data set, and emotion labels are assigned to each sample. A speech feature parameter model then extracts audio features from each audio file, yielding a speech feature set. Next, a machine learning tool is trained on the feature sets and emotion labels to build an emotion model library. To migrate emotion, the user selects a target emotion and inputs a speech signal through a multimedia terminal; the feature set of the input signal is extracted and classified to obtain its current emotion category. If the detected category matches the selected target, the original input signal is output directly as the target emotional speech; otherwise, the emotional features are migrated toward the target model. Finally, speech synthesis produces the target emotional speech output. Because the method is based on emotion classification and feature migration, it can change the emotion of speech without losing the original speaker's vocal identity.
Description
Technical field
The invention belongs to the technical field of speech recognition and relates to a method for migrating speech emotion, and in particular to a speech emotion migration method based on models built from different speech sources.
Background art
With the development of intelligent chip technology, terminal devices have become increasingly intelligent and integrated, and their miniaturization, light weight, and networking have made daily life ever more convenient. Users constantly exchange voice and video through network terminals, accumulating massive amounts of multimedia data. With this accumulation of platform data, intelligent question answering systems have emerged. These systems combine state-of-the-art technologies including speech recognition, sentiment analysis, information retrieval, semantic matching, sentence generation, and speech synthesis.
Speech recognition technology enables a machine to convert a speech signal into the corresponding text or machine instruction through recognition and understanding, allowing the machine to comprehend what a human expresses. It mainly involves technologies such as speech unit selection, speech feature extraction, pattern matching, and model training. Speech units include three kinds: words (or sentences), syllables, and phonemes, selected according to the scenario and task. Word units are mainly suitable for small-vocabulary speech recognition systems; syllable units are better suited to Chinese speech recognition; phonemes can explain the basis of speech well, but the complexity and variability of speakers make it difficult to obtain stable data sets, so phoneme-based recognition is still under study.
Another research direction is speech emotion recognition, which mainly consists of speech signal acquisition, emotional feature extraction, and emotion recognition. Emotional features fall into three categories: prosodic features, spectrum-based correlated features, and voice-quality features. These features are generally extracted with the frame as the minimum granularity, and emotion recognition is carried out on global statistical values of the frame features. Emotion recognition algorithms mainly comprise two major classes: discrete speech emotion classifiers and dimensional speech emotion predictors. Speech emotion recognition technology is widely used in fields such as telephone service centers, driver mental-state monitoring, and online distance education.
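As a concrete illustration of frame-level extraction summarized into global statistics (an editorial sketch, not part of the patent; the frame length, hop size, and choice of statistics are assumptions), short-time energy and zero-crossing rate can be computed per frame and then collapsed to utterance-level values:

```python
import numpy as np
import librosa

def global_stats(path, frame_length=1024, hop_length=512):
    """Frame-level features collapsed into utterance-level global statistics."""
    y, _ = librosa.load(path, sr=None)
    # Short-time energy (RMS) and zero-crossing rate, one value per frame.
    rms = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)[0]
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=frame_length, hop_length=hop_length)[0]
    # Global statistics over the frame sequence, the form typically fed to an emotion recognizer.
    return {
        "rms_mean": float(np.mean(rms)), "rms_std": float(np.std(rms)),
        "zcr_mean": float(np.mean(zcr)), "zcr_std": float(np.std(zcr)),
    }
```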
Intelligent agents are described as the condensation of next-generation artificial intelligence. They must not only perceive environmental factors and understand human behaviour and language, but also, in communicating with people, understand human emotion and reproduce human-like emotional expression, so as to achieve more natural interaction. Current research on agent emotion concentrates mainly on virtual image processing, drawing on results from computer graphics, psychology, cognitive science, neurophysiology, and artificial intelligence. Although more than 90% of human environmental perception comes from vision, the overwhelming majority of emotional perception comes from speech. How to build a human-like emotional system for intelligent agents from the speech domain has, to date, not been publicly addressed.
Summary of the invention
The purpose of the present invention is to propose a method for expressing human speech emotion, with machine learning as the main means, and on this basis to use deep learning and convolutional network algorithms to realise the migration of speech emotion systematically. This not only provides a reference method for speech recognition and sentiment analysis, but can also be widely applied to future human-like intelligent agents.
To achieve the above object, the technical scheme proposed by the present invention is a speech emotion migration method comprising the following steps:

Step 1: prepare a speech database and, through standard sampling, generate a speech emotion data set S = {s_1, s_2, …, s_n};

Step 2: label the speech database of step 1 manually, marking the emotion E = {e_1, e_2, …, e_n} of each voice file;

Step 3: apply a speech feature parameter model to each audio file s_i in the database to extract audio features, obtaining a basic speech feature set F_i = {f_1^i, f_2^i, …, f_n^i};

Step 4: use a machine learning tool to train on each speech feature set from step 3 together with the speech emotion labels from step 2, obtain a feature model for each class of speech emotion, and build the emotion model library E_b;

Step 5: through a multimedia terminal, select the target emotion Target_e to which the speech emotion is to be migrated;

Step 6: input a speech signal s_t from the multimedia terminal;

Step 7: feed the current input s_t to the speech emotion feature extraction module to obtain the feature set F_t = {f_1^t, f_2^t, …, f_n^t} of the current speech signal;

Step 8: using the same machine learning algorithm as step 4, classify the feature set F_t of s_t against the emotion model library E_b obtained in step 4 to obtain the current emotion category s_e of s_t;

Step 9: judge whether the s_e obtained in step 8 is consistent with the Target_e selected in step 5; if s_e = Target_e, output the original input speech signal directly as the target emotional speech; if s_e ≠ Target_e, invoke step 10 to perform feature emotion migration;

Step 10: migrate the principal emotional features of the current speech toward the principal features of the target emotion in the model library;

Step 11: process the migrated speech features from step 10 with a speech synthesis algorithm and synthesise the final target emotional speech output.
Further, in step 1, the sampling frequency of the speech data is 44.1 kHz, the recording length is between 3 and 10 s, and recordings are saved in wav format.

Also in step 1, to obtain good performance, the natural attributes of the sampled data should not be overly concentrated: the data should as far as possible be collected from people of different ages, sexes, and occupations.

In step 6, the input can be real-time, or submitted by clicking after a recording is completed.
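For illustration only (this helper is not part of the patent), the step 1 recording constraints could be verified programmatically, here sketched with the soundfile package:

```python
import soundfile as sf

def check_sample(path, sr_expected=44100, min_s=3.0, max_s=10.0):
    """Check a recording against step 1: wav format, 44.1 kHz, 3-10 s long."""
    info = sf.info(path)
    duration = info.frames / info.samplerate
    return (info.format == "WAV"
            and info.samplerate == sr_expected
            and min_s <= duration <= max_s)
```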
The invention has the following advantages:

1. The invention is the first to propose the concept of speech emotion migration, and can provide an emotion construction method for future virtual reality.

2. Because the method is based on emotion classification and feature migration, it can change the emotion of speech without losing the original speaker's vocal characteristics.
Brief description of the drawings
Fig. 1 is a schematic diagram of the speech emotion migration method provided by the present invention.

Fig. 2 shows the spectral features of an original input speech sample.

Fig. 3 shows the spectral features of the original speech sample after emotion conversion.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings.

The present invention provides a speech emotion migration method based on a speech emotion database. As shown in Fig. 1, the modules and functions involved in the method include:
Basic speech library: stores raw speech data across different ages, sexes, and scenes.

Tag library: emotion tags applied to the basic speech library, such as calm, happy, irritated, angry, sad.

Speech input device: for example a microphone, enabling real-time voice input by the user.

Speech emotion feature extraction: obtains general acoustic features through an acoustic analysis tool and, according to the characteristics of human speech signals and emotional behaviour, selects the required feature set as the speech emotion features.

Machine learning: uses a machine learning algorithm to build and train a model from the speech emotion feature sets, validated against the speech emotion tag library.

Emotion model library: the speech emotion models obtained by applying machine learning to the speech library data, classified along dimensions such as sex, age, and emotion.

Emotion selection: before inputting a speech signal, the user selects in real time the emotion model into which the current speech is to be converted.

Emotion category judgment: judges whether the emotion of the current user input is consistent with the selected emotion. If consistent, the target emotional speech is output directly; if inconsistent, the emotion migration module is called.

Emotion migration: when the user's input speech and the selected emotion are inconsistent, the input speech emotion feature set is compared with the selected emotion feature set by feature distance, and the feature-space representation of the input speech emotion is adjusted to realise the migration. The adjusted emotional speech is then output as the target emotional speech. A control-flow sketch of how these modules compose is given below.
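The following minimal Python sketch is an editorial illustration, not the patent's implementation; the helper names extract_features, classify_emotion, migrate_features, and synthesize are hypothetical stand-ins for the modules listed above:

```python
def transfer_emotion(signal, target_emotion, model_library):
    """Fig. 1 control flow: classify, compare with the selected target, migrate if needed.
    All helper functions are hypothetical stand-ins for the patent's modules."""
    features = extract_features(signal)                  # speech emotion feature extraction
    current = classify_emotion(features, model_library)  # emotion category judgment
    if current == target_emotion:
        return signal                                    # emotions already match: output directly
    target_features = model_library[target_emotion]      # look up the selected emotion model
    migrated = migrate_features(features, target_features)
    return synthesize(signal, migrated)                  # synthesize the target emotional speech
```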
An embodiment is now provided to illustrate the migration process of speech emotion, comprising the following steps:
Step 1: prepare a speech database. Preferably, the speech data are sampled at the standard 44.1 kHz, one sentence at a time, with each recording lasting between 3 and 10 s and saved in wav format. Recording several testers yields the speech emotion data set S = {s_1, s_2, …, s_n}. To obtain good performance, the sampled data should, as far as possible, not be overly concentrated in natural attributes such as age, sex, and occupation.
Step 2: label the speech database prepared in step 1 manually, marking the emotion E = {e_1, e_2, …, e_n} of each voice file, for example "worried", "startled", "angry", "disappointed", "sad".
Step 3: apply a speech feature parameter model to each audio file s_i in the database to extract audio features, obtaining the basic speech feature set F_i = {f_1^i, f_2^i, …, f_n^i} (Fig. 2 shows the spectral features of a raw speech sample). The features include, for example, envelope (env), speech rate (speed), zero-crossing rate (zcr), energy (eng), energy entropy (eoe), spectral centroid (spec_cent), spectral spread (spec_spr), mel-frequency cepstral coefficients (mfccs), and chroma vector (chroma).
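A sketch of how such a feature set could be extracted (an assumption for illustration; the patent does not name a specific tool, and here envelope and speech rate are approximated by RMS statistics and onset rate, while energy entropy is omitted):

```python
import numpy as np
import librosa

def extract_feature_set(path):
    """Approximate the patent's feature list F_i for one audio file."""
    y, sr = librosa.load(path, sr=44100)
    feats = {
        "zcr": float(np.mean(librosa.feature.zero_crossing_rate(y))),
        "eng": float(np.mean(librosa.feature.rms(y=y) ** 2)),           # mean frame energy
        "spec_cent": float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))),
        "spec_spr": float(np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr))),
        "env": float(np.max(librosa.feature.rms(y=y))),                 # crude envelope peak
        # Onsets per second as a rough proxy for speech rate.
        "speed": len(librosa.onset.onset_detect(y=y, sr=sr)) / (len(y) / sr),
    }
    feats.update({f"mfcc_{k}": float(v) for k, v in
                  enumerate(np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13), axis=1))})
    feats.update({f"chroma_{k}": float(v) for k, v in
                  enumerate(np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1))})
    return feats
```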
Step 4: use a machine learning tool (such as LIBSVM) to train on the feature set of each voice file obtained in step 3 and the speech emotion labels obtained in step 2, obtain a feature model for each class of speech emotion, and build the emotion model library E_b.
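For illustration, the training of step 4 and the classification of step 8 could be realised with an SVM; this sketch uses scikit-learn's LIBSVM-backed SVC rather than LIBSVM directly, and the data layout is an assumption:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def build_emotion_model(X, y):
    """Train the emotion classifier underlying the model library E_b.
    X: one feature vector per utterance (rows of F_i); y: emotion labels E."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    model.fit(np.asarray(X), np.asarray(y))
    return model

def classify(model, f_t):
    """Step 8: predict the current emotion category s_e of one utterance F_t."""
    return model.predict(np.asarray(f_t).reshape(1, -1))[0]
```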
Step 5: through a multimedia terminal, select the target emotion Target_e to which the speech emotion is to be migrated, for example "sad".
Step 6: input a speech signal s_t from the multimedia terminal, either in real time or by clicking submit after a recording is completed.
Step 7: feed the current input s_t to the speech emotion feature extraction module to obtain the feature set F_t = {f_1^t, f_2^t, …, f_n^t} of the current speech signal.
Step 8: using the same machine learning algorithm as step 4, classify the feature set F_t of s_t against the emotion model library E_b obtained in step 4 to obtain the current emotion category s_e of s_t.
Step 9: judge whether the s_e obtained in step 8 is consistent with the Target_e selected in step 5. If s_e = Target_e, output the original input speech signal directly as the target emotional speech. If s_e ≠ Target_e, invoke step 10 to perform feature emotion migration.
Step 10: migrate the principal emotional features of the current speech toward the principal features of the target emotion in the model library (Fig. 3 shows the spectral features after migration). For example, the envelope is migrated as result_env = (s_env + Target_env) / 2, and the speech rate is adjusted as result_speed = (s_speed + Target_speed) / 2.
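The averaging rule above generalises directly to a whole feature vector; a minimal sketch (the migrated feature keys and the fixed 1/2 interpolation weight follow the patent's examples, everything else is an assumption):

```python
def migrate_features(f_t, target_model, keys=("env", "speed"), alpha=0.5):
    """Interpolate selected source features toward the target emotion model.

    alpha = 0.5 reproduces result = (s + Target) / 2 from step 10.
    """
    migrated = dict(f_t)
    for k in keys:
        migrated[k] = (1 - alpha) * f_t[k] + alpha * target_model[k]
    return migrated
```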
Step 11: process the migrated speech features from step 10 with a speech synthesis algorithm (pitch-synchronous overlap-add, PSOLA) and synthesise the final target emotional speech output.
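A full PSOLA implementation is beyond a short sketch; as an illustrative stand-in (explicitly not the patent's PSOLA), librosa's phase-vocoder utilities can apply migrated timing and pitch adjustments to the waveform:

```python
import librosa
import soundfile as sf

def resynthesize(path_in, path_out, rate_factor=1.0, pitch_steps=0.0):
    """Apply migrated timing/pitch changes; a phase-vocoder stand-in for PSOLA."""
    y, sr = librosa.load(path_in, sr=None)
    y = librosa.effects.time_stretch(y, rate=rate_factor)           # speech-rate adjustment
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=pitch_steps)  # pitch adjustment
    sf.write(path_out, y, sr)
```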
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. Although the present invention has been described in detail with reference to the foregoing embodiment, those skilled in the art can still improve the technical scheme described in the foregoing embodiment, or replace some of its technical features with equivalents. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.
Claims (4)
1. A speech emotion migration method, characterised in that it comprises the following steps:
Step 1: prepare a speech database and, through standard sampling, generate a speech emotion data set S = {s_1, s_2, …, s_n};
Step 2: label the speech database of step 1 manually, marking the emotion E = {e_1, e_2, …, e_n} of each voice file;
Step 3: apply a speech feature parameter model to each audio file s_i in the database to extract audio features, obtaining a basic speech feature set F_i = {f_1^i, f_2^i, …, f_n^i};
Step 4: use a machine learning tool to train on each speech feature set from step 3 and the speech emotion labels from step 2, obtain a feature model for each class of speech emotion, and build the emotion model library E_b;
Step 5: through a multimedia terminal, select the target emotion Target_e to which the speech emotion is to be migrated;
Step 6: input a speech signal s_t from the multimedia terminal;
Step 7: feed the current input s_t to the speech emotion feature extraction module to obtain the feature set F_t = {f_1^t, f_2^t, …, f_n^t} of the current speech signal;
Step 8: using the same machine learning algorithm as step 4, classify the feature set F_t of s_t against the emotion model library E_b obtained in step 4 to obtain the current emotion category s_e of s_t;
Step 9: judge whether the s_e obtained in step 8 is consistent with the Target_e selected in step 5; if s_e = Target_e, output the original input speech signal directly as the target emotional speech; if s_e ≠ Target_e, invoke step 10 to perform feature emotion migration;
Step 10: migrate the principal emotional features of the current speech toward the principal features of the target emotion in the model library;
Step 11: process the migrated speech features from step 10 with a speech synthesis algorithm and synthesise the final target emotional speech output.
2. The speech emotion migration method according to claim 1, characterised in that in step 1 the sampling frequency of the speech data is 44.1 kHz, the recording length is between 3 and 10 s, and recordings are saved in wav format.
3. The speech emotion migration method according to claim 1, characterised in that in step 1, to obtain good performance, the natural attributes of the sampled data are not overly concentrated.
4. The speech emotion migration method according to claim 1, characterised in that the input in step 6 is either real-time or submitted by clicking after a recording is completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710222674.XA CN107221344A (en) | 2017-04-07 | 2017-04-07 | A speech emotion migration method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710222674.XA CN107221344A (en) | 2017-04-07 | 2017-04-07 | A speech emotion migration method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107221344A true CN107221344A (en) | 2017-09-29 |
Family
ID=59928228
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710222674.XA Pending CN107221344A (en) | 2017-04-07 | 2017-04-07 | A speech emotion migration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107221344A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1787074A (en) * | 2005-12-13 | 2006-06-14 | 浙江大学 | Method for distinguishing speak person based on feeling shifting rule and voice correction |
CN101064104A (en) * | 2006-04-24 | 2007-10-31 | 中国科学院自动化研究所 | Emotion voice creating method based on voice conversion |
CN101261832A (en) * | 2008-04-21 | 2008-09-10 | 北京航空航天大学 | Extraction and modeling method for Chinese speech sensibility information |
CN102184731A (en) * | 2011-05-12 | 2011-09-14 | 北京航空航天大学 | Method for converting emotional speech by combining rhythm parameters with tone parameters |
CN103198827A (en) * | 2013-03-26 | 2013-07-10 | 合肥工业大学 | Voice emotion correction method based on relevance of prosodic feature parameter and emotion parameter |
CN103544963A (en) * | 2013-11-07 | 2014-01-29 | 东南大学 | Voice emotion recognition method based on core semi-supervised discrimination and analysis |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019218773A1 (en) * | 2018-05-15 | 2019-11-21 | 中兴通讯股份有限公司 | Voice synthesis method and device, storage medium, and electronic device |
CN112786026A (en) * | 2019-12-31 | 2021-05-11 | 深圳市木愚科技有限公司 | Parent-child story personalized audio generation system and method based on voice migration learning |
CN112786026B (en) * | 2019-12-31 | 2024-05-07 | 深圳市木愚科技有限公司 | Parent-child story personalized audio generation system and method based on voice transfer learning |
CN111951778A (en) * | 2020-07-15 | 2020-11-17 | 天津大学 | Method for synthesizing emotion voice by using transfer learning under low resource |
CN111951778B (en) * | 2020-07-15 | 2023-10-17 | 天津大学 | Method for emotion voice synthesis by utilizing transfer learning under low resource |
CN113421544A (en) * | 2021-06-30 | 2021-09-21 | 平安科技(深圳)有限公司 | Singing voice synthesis method and device, computer equipment and storage medium |
CN113421544B (en) * | 2021-06-30 | 2024-05-10 | 平安科技(深圳)有限公司 | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium |
CN113555004A (en) * | 2021-07-15 | 2021-10-26 | 复旦大学 | Voice depression state identification method based on feature selection and transfer learning |
CN114495988A (en) * | 2021-08-31 | 2022-05-13 | 荣耀终端有限公司 | Emotion processing method of input information and electronic equipment |
CN116955572A (en) * | 2023-09-06 | 2023-10-27 | 宁波尚煦智能科技有限公司 | Online service feedback interaction method based on artificial intelligence and big data system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170929 |