CN106971703A - A kind of song synthetic method and device based on HMM - Google Patents
- Publication number
- CN106971703A CN106971703A CN201710160104.2A CN201710160104A CN106971703A CN 106971703 A CN106971703 A CN 106971703A CN 201710160104 A CN201710160104 A CN 201710160104A CN 106971703 A CN106971703 A CN 106971703A
- Authority
- CN
- China
- Prior art keywords
- song
- hmm
- model
- speaker
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000010189 synthetic method Methods 0.000 title claims abstract description 24
- 238000003786 synthesis reaction Methods 0.000 claims abstract description 77
- 230000015572 biosynthetic process Effects 0.000 claims abstract description 75
- 238000012549 training Methods 0.000 claims abstract description 38
- 230000006978 adaptation Effects 0.000 claims abstract description 28
- 238000005516 engineering process Methods 0.000 claims abstract description 18
- 238000000034 method Methods 0.000 claims description 32
- 230000003044 adaptive effect Effects 0.000 claims description 29
- 238000001228 spectrum Methods 0.000 claims description 26
- 238000004458 analytical method Methods 0.000 claims description 22
- 238000009826 distribution Methods 0.000 claims description 20
- 230000009466 transformation Effects 0.000 claims description 13
- 239000011159 matrix material Substances 0.000 claims description 12
- 230000008569 process Effects 0.000 claims description 11
- 238000006243 chemical reaction Methods 0.000 claims description 10
- 230000002194 synthesizing effect Effects 0.000 claims description 7
- 238000012417 linear regression Methods 0.000 claims description 6
- 230000008859 change Effects 0.000 claims description 5
- 239000000463 material Substances 0.000 claims description 5
- 238000010835 comparative analysis Methods 0.000 claims description 3
- 238000003066 decision tree Methods 0.000 claims description 3
- 230000005284 excitation Effects 0.000 claims description 3
- 238000005070 sampling Methods 0.000 claims description 3
- 230000003595 spectral effect Effects 0.000 claims description 3
- 238000012731 temporal analysis Methods 0.000 claims description 3
- 230000000737 periodic effect Effects 0.000 claims 1
- 238000011160 research Methods 0.000 abstract description 5
- 230000008451 emotion Effects 0.000 abstract description 2
- 238000010586 diagram Methods 0.000 description 5
- 238000010606 normalization Methods 0.000 description 4
- 230000003993 interaction Effects 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 238000003860 storage Methods 0.000 description 3
- 238000007476 Maximum Likelihood Methods 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000008092 positive effect Effects 0.000 description 1
- 238000013179 statistical model Methods 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0033—Recording/reproducing or transmission of music for electrophonic musical instruments
- G10H1/0041—Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
- G10H1/0058—Transmission between separate instruments or between individual components of a musical system
- G10H1/0066—Transmission between separate instruments or between individual components of a musical system using a MIDI interface
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/148—Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/005—Algorithms for electrophonic musical instruments or musical processing, e.g. for automatic composition or resource allocation
- G10H2250/015—Markov chains, e.g. hidden Markov models [HMM], for musical processing, e.g. musical analysis or musical composition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/471—General musical sound synthesis principles, i.e. sound category-independent synthesis methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Probability & Statistics with Applications (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
The invention discloses a song synthesis method and device based on HMM. Using TTS (text-to-speech) technology, the HTS system (an HMM-based speech synthesis system) and the STRAIGHT algorithm, it establishes an HMM-based speaker-dependent acoustic model for song synthesis and a melody control model for the song, and performs speaker adaptive training, realizing a personalized speech synthesis device that converts lyrics to song in real time based on HMM. The device enriches the research content of speech synthesis and gives the synthesized voice greater expressiveness and emotion; in particular, it offers music lovers an opportunity to learn technical operations such as song production and music processing, and it adds to the social resources available to people, giving it practical value and significance.
Description
Technical field
The present invention relates to the fields of human-computer interaction, text-to-speech conversion and speech synthesis, and in particular to a song synthesis method and device based on HMM.
Background technology
With the continuous innovation and improvement of information technology, music multimedia applications involving human-computer interaction have gradually entered our daily lives, such as requesting songs, composing and arranging music on a computer, and song recognition on mobile phones. How to make computers more human, able to "sing" like people do, that is, to automatically produce a beautiful, pleasant-sounding song given only the numbered musical notation and the lyrics, has become a new demand. The rapid development of multimedia technology in the entertainment field also provides a wider application space for this technology.
At present, the overwhelming majority of music is recorded and distributed in digital formats such as WAV, MP3 and MIDI, as well as various streaming formats. Compared with traditional modes of music, digital music has unmatched advantages in production, storage and distribution. With a computer, a creator can hear the result of a musical work while composing it; any modification to the score is fed back to the creator immediately, without the complex traditional cycle of rehearsal, performance, recording and editing. This greatly reduces the production cycle and labor cost of music making, and also prevents the composer from losing a momentary flash of inspiration during a lengthy production process.
Speech synthesis is an important research topic in human-computer interaction and an important component of embedded research. Song synthesis has now also gradually become a hot topic. Before song synthesis technology emerged, however, speech synthesis technology had already matured considerably. Some scholars have tried to synthesize songs with speech synthesis methods, but songs and speech differ to a certain degree: speech emphasizes content (and can of course express the speaker's intent and emotion), while song emphasizes the interpretation and rise and fall of the melody. This makes it difficult to apply speech synthesis methods directly to song synthesis.
Over a long period of research at home and abroad, song synthesis, like speech synthesis, has gradually formed three mainstream synthesis approaches: 1. waveform concatenation synthesis; 2. parametric synthesis; 3. speech modification synthesis. Concatenative synthesis and parametric synthesis are both corpus-based and their synthesis quality is not high, while speech modification is more flexible: it changes the acoustic parameters of the speech signal according to melody information to synthesize the song. Personalized real-time lyrics-to-song conversion has been proposed at home and abroad. One system produces a song immediately from the score information of the song: it receives continuous speech of the lyrics, uses the Viterbi algorithm to select continuous synthesis units after the speech corresponding to the lyrics is recorded, and realizes real-time conversion of pitch, duration, energy and spectrum through the pitch-synchronous overlap-add (PSOLA) method to synthesize the song. Because that system does not account for the differences between speech and song in acoustic attributes such as pitch and duration, its synthesis quality is unsatisfactory. On this basis, a large-corpus lyrics-to-song system was proposed that achieved fairly good results in both naturalness and sound quality; it built three Mandarin corpora and used the Viterbi algorithm to determine the optimal combination of synthesis units. The drawback of this method is that building the corpora consumes a great deal of time and human effort.
Therefore, those skilled in the art are committed to developing a new HMM-based personalized song synthesis method and device for users with music processing needs.
Summary of the invention
In view of the above drawbacks of the prior art, the invention addresses the problems raised in the background: Chinese song synthesis is under-researched, synthesis quality is not high, and operation is time-consuming and laborious. It provides an HMM-based personalized song synthesis method and device for users with music processing needs.
In order to solve the above technical problems, the technical solution provided by the present invention is as follows:
A song synthesis method based on HMM comprises the following steps:
A. Analyzing the differences between speech and song in acoustic features, and establishing a melody control model for the song;
B. Establishing an HMM-based speaker-dependent acoustic model for song synthesis;
C. Synthesizing the song with an HMM-based speech synthesis system.
Further, the specific steps of analyzing the differences between speech and song in acoustic features in step A are as follows:
a. Performing spectrum analysis on the speech signal with time-domain and frequency-domain analysis methods, and comparing the fundamental frequency of the speech signal and the song signal;
b. Extracting the required score information from the MIDI system using MIDI technology;
c. Reading the melody information of the score extracted from the MIDI file, analyzing the structural features of the score file, and thereby obtaining music parameter information, including channel number, note pitch, key velocity, note onset time and note duration.
Further, the melody control model of the song in step A includes a fundamental frequency control model and a duration control model; the fundamental frequency control model converts the discrete notes in the score into a continuous fundamental frequency curve, and the duration control model obtains the pronunciation duration of each sung note.
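A minimal sketch of the first half of such a fundamental frequency control model: mapping discrete MIDI note numbers to frequencies (equal temperament, A4 = 440 Hz) and expanding them into a frame-level F0 track. A real implementation would also smooth the note boundaries into a truly continuous curve; this piecewise-constant version, with an assumed 5 ms frame shift, only illustrates the mapping.

```python
import math

def midi_to_hz(note):
    # equal-temperament mapping, A4 (MIDI note 69) = 440 Hz
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def note_sequence_to_f0(notes, frame_shift=0.005):
    """notes: list of (midi_note, duration_seconds);
    returns a per-frame F0 track in Hz (piecewise constant)."""
    f0 = []
    for note, dur in notes:
        f0.extend([midi_to_hz(note)] * int(round(dur / frame_shift)))
    return f0

curve = note_sequence_to_f0([(69, 0.05), (71, 0.05)])
print(len(curve), curve[0])
```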
Further, establishing the HMM-based speaker-dependent acoustic model for song synthesis in step B comprises the following steps:
a. Using the speech corpus of the speakers, analyzing the speech data to obtain acoustic parameters including the fundamental frequency F0, duration, spectrum SP and aperiodic index AP; and using the HMM-based speaker adaptive training technique to train an average voice model of the mixed speech;
b. Using a small amount of speech data from the target speaker to be synthesized, obtaining the adaptive acoustic model of the target speaker through speaker adaptation transformation, and revising and updating the adaptive model.
Further, training the average voice model of the mixed speech through HMM-based speaker adaptive training comprises the following steps:
a. Performing speech analysis on the speaker corpora and the target speaker's corpus data, extracting the acoustic parameters (mel-cepstral coefficients) and computing their first-order and second-order differences;
b. Combining the context attribute set, performing HMM model training: training the HMM models of the spectrum and fundamental frequency parameters and the multi-space distribution hidden semi-Markov models (MSD-HSMM) of the state duration parameters;
c. Using a small amount of the target speaker's speech, performing speaker adaptive training to obtain the average voice model of the mixed speech, thereby obtaining context-dependent MSD-HSMM models.
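The first-order and second-order differences of step a can be sketched as follows. The 3-point central-difference window and the toy one-dimensional "mel-cepstrum" track are illustrative assumptions; practical systems compute the deltas with the regression windows of their HMM training toolchain.

```python
def deltas(frames):
    """First-order differences with a 3-point central window:
    delta[t] = 0.5 * (c[t+1] - c[t-1]); edge frames are replicated."""
    n = len(frames)
    padded = [frames[0]] + list(frames) + [frames[-1]]
    return [[0.5 * (padded[t + 2][d] - padded[t][d])
             for d in range(len(frames[0]))]
            for t in range(n)]

mcep = [[1.0], [2.0], [4.0], [4.0]]   # toy 1-dim "mel-cepstrum" track
d1 = deltas(mcep)        # first-order differences
d2 = deltas(d1)          # second-order differences
print(d1, d2)
```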
Further, using a small amount of speech data from the target speaker to be synthesized, obtaining the adaptive acoustic model of the target speaker through speaker adaptation transformation, and revising and updating the adaptive model, comprises the following steps:
a. After speaker adaptation training, the HSMM-based CMLLR adaptation algorithm is used to compute the transformed state output probability distributions and the mean vectors and covariance matrices of the duration probability distributions. Under state i, the transformation equations for the feature vector o and the state duration d are:
b_i(o) = N(o; A*mu_i - b, A*Sigma_i*A^T) = |A^(-1)| N(W*xi; mu_i, Sigma_i)
p_i(d) = N(d; alpha*m_i - beta, alpha*sigma_i^2*alpha) = |alpha^(-1)| N(X*psi; m_i, sigma_i^2)
where xi = [o^T, 1]^T, psi = [d, 1]^T, mu_i is the mean of the state output distribution, m_i is the mean of the duration distribution, Sigma_i is the diagonal covariance matrix, sigma_i^2 is the variance, W = [A^(-1), b^(-1)] is the linear transformation matrix of the target speaker's state output probability density distribution, and X = [alpha^(-1), beta^(-1)] is the transformation matrix of the state duration probability density distribution;
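A toy numeric illustration of the state-output mean transform mu' = A*mu - b appearing in the transformation equations above; the matrix A, bias b, and mean mu are invented example values, and the full CMLLR estimation of the transform itself is not shown.

```python
def transform_mean(A, b, mu):
    """Apply mu' = A*mu - b with a plain matrix-vector product."""
    return [sum(A[i][j] * mu[j] for j in range(len(mu))) - b[i]
            for i in range(len(A))]

A = [[1.1, 0.0], [0.0, 0.9]]   # assumed regression matrix
b = [0.2, -0.1]                # assumed bias vector
mu = [1.0, 2.0]                # average-voice mean (assumed)
print(transform_mean(A, b, mu))
```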
b. Through the HSMM-based adaptive transformation algorithm, the spectrum, fundamental frequency and duration parameters of the speech data can be normalized and transformed; for adaptation data O of length T, a maximum likelihood estimate of the transform Lambda = (W, X) can be obtained;
c. The adaptive model of the voice is revised and updated using the maximum a posteriori (MAP) algorithm. For a given HSMM parameter set lambda, let its forward and backward probabilities be alpha_t(i) and beta_t(i) respectively; the generating probability of the continuous observation sequence o_{t-d+1}...o_t under state i is then:
The MAP estimation is described as follows:
where the transformed mean vectors are obtained by linear regression, omega and tau are the MAP estimation parameters of the state output and duration distributions respectively, and the weighted-average MAP estimates of the adaptive mean vectors serve as the updated means.
Further, the speech analysis and synthesis method used when synthesizing the song with the HMM-based speech synthesis system in step C is based on the STRAIGHT algorithm.
Further, synthesizing the song with the HMM-based speech synthesis system in step C comprises the following steps:
a. Analyzing the input lyrics text with a text analysis tool: the text analysis program converts the given lyrics text into an acoustic label sequence containing context description information; the decision trees obtained by clustering in the training process predict the context-dependent HMM models for each pronunciation and its context, which are then concatenated into a sentence HMM model;
b. Obtaining the pitch and note length of each note of the lyrics from the MIDI file, obtaining the corresponding fundamental frequency and duration through the melody control model, and converting the durations of each syllable's spectrum SP, aperiodic index AP and fundamental frequency F0 according to the note durations;
c. Generating the spectrum SP, aperiodic index AP, duration and fundamental frequency F0 parameter sequences of the sentence HMM model using the speaker-dependent acoustic model and the STRAIGHT algorithm, synthesizing the voice, and adding the musical accompaniment to realize the synthesis of the song.
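The following stand-in sketches only the last link of step c, turning a generated frame-level F0 sequence into a waveform, using simple phase-accumulation sinusoids instead of the STRAIGHT excitation/filter model; the sample rate and frame shift are assumed values, and no spectral envelope is applied.

```python
import math

SR = 16000       # assumed sample rate
FRAME = 0.005    # assumed 5 ms frame shift

def f0_to_waveform(f0_track):
    """Phase-accumulation sinusoid: a crude stand-in for the STRAIGHT
    excitation/filter stage, enough to audition a melody contour."""
    samples, phase = [], 0.0
    per_frame = int(SR * FRAME)
    for f0 in f0_track:
        for _ in range(per_frame):
            phase += 2.0 * math.pi * f0 / SR   # advance phase by f0
            samples.append(math.sin(phase))
    return samples

# 50 ms of A4 followed by 50 ms of B4
wav = f0_to_waveform([440.0] * 10 + [494.0] * 10)
print(len(wav))
```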
Further, the STRAIGHT-based speech analysis and synthesis used in step C proceeds as follows:
The speaker's speech signal is input first, and the fundamental frequency F0 and the spectral envelope of the speech are extracted with the STRAIGHT algorithm. The acoustic parameters are then modulated to generate a new sound source and a time-varying filter, and the speech is synthesized according to the original filter model using the following formula:
where Q represents a set of sampling positions in the synthesis excitation, and G(.) represents the pitch modulation, which can arbitrarily match the F0 of the original speech to the modulated F0. An all-pass filter is used to control the fine pitch and the time structure of the original signal, for example a linear phase shift proportional to frequency for controlling the fine structure of F0. From the modulated amplitude spectrum A(S(u(w), r(t)), u(w), r(t)) in the following formula, the Fourier transform V(w, t_i) of the corresponding minimum-phase pulse can be calculated, where A(.), u(.) and r(.) represent the modulation of the amplitude, frequency and time dimensions respectively:
where q represents frequency.
A song synthesis device based on HMM, characterized by including:
a melody control module for establishing the melody control model of the song;
an HMM-based speaker-dependent acoustic module for establishing the speaker-dependent acoustic model for song synthesis;
an HMM-based song synthesis module for synthesizing the singing voice to be synthesized.
Further, the melody control module includes:
a MIDI analysis unit for analyzing the score information extracted from the MIDI file and obtaining the corresponding music parameter information;
a melody control unit for establishing the melody control model of the song according to the differences between speech and song in acoustic features.
Further, the HMM-based speaker-dependent acoustic module includes:
an acoustic model unit for obtaining the acoustic model of the target speaker;
an acoustic parameter subunit for HMM-based parametric speech synthesis.
Further, the HMM-based song synthesis module includes:
a text analysis unit, which performs text analysis on the input lyrics text to obtain context-dependent labels;
an HMM model training subunit for establishing the HMM model library of the speech data;
a speaker adaptation subunit, which normalizes and transforms the speaker's characteristic parameters during training to obtain the adaptive model;
a speech synthesis unit for synthesizing the singing voice to be synthesized;
a song synthesis unit, which adds the musical accompaniment to the synthesized singing voice to complete the synthesis of the song.
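The module/unit decomposition of the claimed device could be mirrored by a skeleton like the one below; all class and method names are hypothetical, invented purely to show how the three modules compose, and the method bodies are intentionally empty.

```python
class MelodyControlModule:
    """MIDI analysis unit + melody control unit."""
    def parse_midi(self, midi_events): ...
    def build_melody_model(self, score_params): ...

class SpeakerAcousticModule:
    """Acoustic model unit + acoustic parameter subunit."""
    def train_average_voice(self, corpora): ...
    def adapt_to_target(self, target_utterances): ...

class SongSynthesisModule:
    """Text analysis, HMM training, adaptation, synthesis units."""
    def analyze_lyrics(self, text): ...
    def synthesize(self, sentence_hmm, melody): ...

class HmmSongSynthesizer:
    """Top-level device composing the three claimed modules."""
    def __init__(self):
        self.melody = MelodyControlModule()
        self.acoustic = SpeakerAcousticModule()
        self.synth = SongSynthesisModule()

print(isinstance(HmmSongSynthesizer().melody, MelodyControlModule))
```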
The invention has the following advantages and positive effects. The HMM-based song synthesis method and device uses TTS (text-to-speech) technology, HTS (an HMM-based speech synthesis system) and the STRAIGHT algorithm; it establishes the HMM-based speaker-dependent acoustic model for song synthesis and the melody control model of the song, and performs speaker adaptive training, realizing a personalized speech synthesis device that converts lyrics to song in real time based on HMM. Compared with traditional singing synthesis systems, this system bases its speech analysis and synthesis on the STRAIGHT algorithm while adding a speaker adaptive training process in the training stage to obtain the average voice model of the mixed speech. This training process reduces the influence of speaker differences in the speech library, thereby improving the speech quality of the synthesized song. On the basis of the average voice model, the speaker adaptation transformation synthesizes singing voices with good naturalness and melodiousness from only a small amount of the speaker's material. The device enriches the research content of speech synthesis and gives the synthesized voice greater expressiveness and emotion; in particular, it offers music lovers an opportunity to learn technical operations such as song production and music processing, and it adds to the social resources available to people, giving it practical value and significance.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a system flow block diagram of an HMM-based song synthesis method according to a preferred embodiment of the present invention;
Fig. 2 is the MIDI system block diagram of the preferred embodiment;
Fig. 3 is the speaker-adaptive speech synthesis system block diagram of the preferred embodiment;
Fig. 4 is the STRAIGHT analysis-modulation-synthesis system block diagram of the preferred embodiment;
Fig. 5 is a schematic structural diagram of the device realizing HMM-based song synthesis in the preferred embodiment.
Embodiment
The technical solutions of the present invention are described clearly and completely below in conjunction with the drawings. Obviously, what is described is only a part of the embodiments of the invention, not all of them. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, a preferred embodiment of the invention discloses an HMM-based song synthesis method. Using TTS (text-to-speech) technology, HTS (an HMM-based speech synthesis system) and the STRAIGHT algorithm, it establishes the HMM-based speaker-dependent acoustic model for song synthesis and the melody control model of the song, and performs speaker adaptive training, realizing a personalized speech synthesis method that converts lyrics to song in real time based on HMM. It comprises the following steps.
A. Analyzing the differences between speech and song in acoustic features, and establishing the melody control model of the song.
The specific steps of analyzing the differences between speech and song in acoustic features in step A are as follows:
a. Performing spectrum analysis on the speech signal with time-domain and frequency-domain analysis methods, and comparing the fundamental frequency of the speech signal and the song signal;
b. Extracting the required score information from the MIDI system using MIDI technology;
c. Reading the melody information of the score extracted from the MIDI file, analyzing the structural features of the score file, and thereby obtaining music parameter information, including channel number, note pitch, key velocity, note onset time and note duration.
Fig. 2 shows the MIDI system block diagram.
The melody control model of the song in step A includes a fundamental frequency control model and a duration control model; the fundamental frequency control model converts the discrete notes in the score into a continuous fundamental frequency curve, and the duration control model obtains the pronunciation duration of each sung note.
B. Establishing the speaker-dependent acoustic model for song synthesis based on HMM.
As shown in Fig. 3, establishing the speaker-dependent acoustic model for song synthesis based on HMM described in step B comprises the following steps:
a. Using the speech corpus of the speakers, the speech data are analyzed to obtain the acoustic parameters, including the fundamental frequency F0, duration, spectrum SP, and aperiodic index AP; the speaker-adaptive training technique based on HMM is then used to train the average voice model of the mixed speech.
b. Using a small amount of speech data from the target speaker to be synthesized, the adaptive acoustic model of the target speaker is obtained through the speaker-adaptive transformation technique, and the adaptive model is corrected and updated, so that speech with the target speaker's timbre can be synthesized.
As shown in Fig. 3, training the average voice model of the mixed speech by the HMM-based speaker-adaptive training comprises the following steps:
a. Speech analysis is performed on the corpus of the speakers and the corpus data of the target speaker, and the acoustic parameters are extracted: the mel-cepstral coefficients, together with their first-order and second-order differences;
b. Combined with the context attribute set, HMM training is carried out to train the multi-space distribution hidden semi-Markov models (MSD-HSMM) of the spectrum and fundamental-frequency parameters and of the state duration parameters;
c. Using a small speech corpus of the target speaker, speaker-adaptive training is performed to obtain the average voice model of the mixed speech, and thereby the context-dependent MSD-HSMM models, including:
1. Using the constrained maximum-likelihood linear regression (CMLLR) algorithm, the differences between the speech data of the speakers in training and the average voice are represented by linear regression functions;
2. The differences between speakers are trained with a set of linear-regression equations that normalize the state output distributions and the state duration distributions;
3. Training yields the average voice model of the mixed speech, and thereby the context-dependent MSD-HSMM models.
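The first- and second-order differences of step a are computed by sliding a regression window over the per-frame mel-cepstral coefficients. A minimal sketch, assuming simple three-tap windows (HTS-style tools use configurable regression windows that may differ):

```python
def delta(features, weights):
    """Apply a regression window across time; features is a per-frame sequence
    of one coefficient. Edge frames are replicated, as delta tools commonly do."""
    half = len(weights) // 2
    T = len(features)
    out = []
    for t in range(T):
        acc = 0.0
        for j, w in enumerate(weights, start=-half):
            idx = min(max(t + j, 0), T - 1)   # clamp at sequence edges
            acc += w * features[idx]
        out.append(acc)
    return out

# assumed window coefficients; actual training configs may differ
DELTA1 = [-0.5, 0.0, 0.5]   # first-order difference
DELTA2 = [1.0, -2.0, 1.0]   # second-order difference

print(delta([1.0, 2.0, 3.0, 4.0], DELTA1))
```

The static coefficients and both delta streams are stacked into one observation vector per frame before HMM training.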
Obtaining the adaptive acoustic model of the target speaker from a small amount of speech data of the target speaker to be synthesized through the speaker-adaptive transformation technique, and correcting and updating the adaptive model, comprises the following steps:
a. After the speaker-adaptive training, the CMLLR adaptive algorithm based on HSMM is used to compute the mean vectors and covariance matrices of the state output probability distributions and the state duration probability distributions for the voice conversion. Under state i, the transformation equations of the feature vector o and the state duration d are:
b_i(o) = N(o; Aμ_i − b, AΣ_i A^T) = |A^-1| N(Wξ; μ_i, Σ_i)
p_i(d) = N(d; αm_i − β, ασ_i^2 α) = |α^-1| N(αψ; m_i, σ_i^2)
where ξ = [o^T, 1]^T, ψ = [d, 1]^T, μ_i is the mean of the state output distribution, m_i is the mean of the duration distribution, Σ_i is the diagonal covariance matrix, σ_i^2 is the variance, W = [A^-1, b^-1] is the linear transformation matrix of the target speaker's state output probability density distribution, and X = [α^-1, β^-1] is the transformation matrix of the state duration probability density distribution;
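The two sides of the state-output transformation equation can be checked numerically. The scalar sketch below verifies that the model-space Gaussian N(o; aμ − b, a·σ²·a) equals the Jacobian-weighted feature-space Gaussian; the feature-space argument is expanded here as (o + b)/a, which is an assumption about the patent's compact W·ξ notation:

```python
import math

def gauss(x, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def output_density_model_space(o, mu, var, a, b):
    """Model-space form of the state-output equation: N(o; a*mu - b, a*var*a)."""
    return gauss(o, a * mu - b, a * var * a)

def output_density_feature_space(o, mu, var, a, b):
    """Equivalent feature-space form: |a^-1| * N((o + b)/a; mu, var).
    The same identity holds for the duration equation with (alpha, beta)."""
    return abs(1.0 / a) * gauss((o + b) / a, mu, var)
```

Both forms give the same likelihood for any observation, which is what lets CMLLR estimate one shared transform instead of re-estimating every Gaussian.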
b. Through the adaptive transformation algorithm based on HSMM, the spectrum, fundamental-frequency, and duration parameters of the speech data can be normalized and transformed; for adaptation data O of length T, a maximum-likelihood estimate of the transform Λ = (W, X) can be carried out.
c. The adaptive model of the voice is corrected and updated using the maximum a posteriori (MAP) algorithm. For a given HSMM parameter set λ, let its forward and backward probabilities be α_t(i) and β_t(i), respectively; then the generation probability κ_t^d(i) of its continuous observation sequence o_{t−d+1}…o_t under state i is:
The MAP estimation is described as follows:
where ω and τ are the MAP estimation parameters of the state output and duration distributions, respectively; the remaining terms are the mean vectors after the linear-regression transformation and the weighted-average MAP estimates of the adaptive mean vectors.
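The essence of the MAP update is a weighted average between the transformed prior mean and the adaptation-data statistics. A simplified scalar sketch (the state-level HSMM update uses occupancy-weighted sufficient statistics; tau here plays the role of the prior-weight parameter):

```python
def map_update_mean(prior_mean, tau, data):
    """MAP estimate of a Gaussian mean: the adaptation-data mean shrunk toward
    the prior (linear-regression-transformed) mean. With little data the prior
    dominates; with much data the estimate approaches the sample mean."""
    n = len(data)
    sample_mean = sum(data) / n
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```

This is why MAP correction is useful after CMLLR: the few target-speaker utterances refine the transformed means without letting sparse data pull them arbitrarily far.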
C. Synthesizing the song using the HMM-based speech synthesis system.
The speech analysis and synthesis method used for song synthesis with the HMM-based speech synthesis system described in step C is based on the STRAIGHT algorithm.
Synthesizing the song using the HMM-based speech synthesis system described in step C comprises the following steps:
a. The input lyrics text is analyzed with a text-analysis tool: the text-analysis program converts the given lyrics text into an acoustic label sequence containing context description information; the context-dependent HMMs for each pronunciation unit and its context are predicted with the decision trees obtained by clustering during training, and are then concatenated into a sentence HMM;
b. According to the MIDI file, the pitch and duration of each note in the lyrics are obtained, the corresponding fundamental frequency and duration are obtained through the melody control model, and the spectrum SP, aperiodic index AP, and fundamental frequency F0 durations of the syllables are modified using the note durations;
c. The parameter sequences of spectrum SP, aperiodic index AP, duration, and fundamental frequency F0 in the sentence HMM are generated using the speaker-dependent acoustic model and the STRAIGHT algorithm, the voice is synthesized, and the musical accompaniment is added to realize the synthesis of the song.
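Step b above requires stretching or compressing each syllable's parameter tracks so they last exactly as long as the corresponding note. A minimal sketch by linear resampling of frame indices (the frame shift and interpolation scheme are assumptions for illustration):

```python
def scale_to_note_duration(frames, note_dur_sec, frame_shift=0.005):
    """Resample a per-phone parameter track (e.g. the F0 contour or one spectral
    coefficient) so the phone lasts exactly note_dur_sec seconds."""
    target = max(1, round(note_dur_sec / frame_shift))
    out = []
    for k in range(target):
        pos = k * (len(frames) - 1) / max(1, target - 1)  # fractional source index
        i = int(pos)
        frac = pos - i
        j = min(i + 1, len(frames) - 1)
        out.append((1 - frac) * frames[i] + frac * frames[j])
    return out
```

In practice the HSMM duration model distributes the note duration over the states of the syllable rather than resampling uniformly; the sketch shows only the time-scaling idea.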
As shown in Fig. 4, during singing-voice synthesis, the STRAIGHT analysis-modification-synthesis system is used to accurately extract the fundamental-frequency information and to remove the periodic interference from the spectral envelope. The speech analysis and synthesis method based on the STRAIGHT algorithm described in step C proceeds as follows:
First, the speech signal of the speaker is input, and the fundamental frequency F0 and the spectral envelope of the speech are extracted with the STRAIGHT algorithm. The acoustic parameters are then modified to generate a new excitation source and time-varying filter, and the speech is synthesized with the original filter model according to the following formula:
where Q denotes the set of pulse positions in the synthesis excitation and G(·) denotes the pitch modulation, which can arbitrarily match the F0 of the original speech to the modulated F0. An all-pass filter controls the fine pitch structure and the time structure of the original signal; for example, a linear phase shift proportional to frequency controls the fine structure of F0. From the modulated amplitude spectrum A(S(u(w), r(t)), u(w), r(t)), given by the following formula, the Fourier transform V(w, t_i) of the corresponding minimum-phase pulse can be calculated, where A(·), u(·), and r(·) denote the modulations in the amplitude, frequency, and time dimensions, respectively;
where q denotes frequency.
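The pulse-position set Q above can be illustrated by accumulating phase along the per-frame F0 contour: a pulse is emitted each time the accumulated phase crosses an integer number of cycles. A minimal sketch (frame shift and sample rate are assumed values; real STRAIGHT synthesis also shapes each pulse with the minimum-phase response and adds the aperiodic component):

```python
def pulse_positions(f0_curve, frame_shift=0.005, sr=16000):
    """Place excitation pulses so the local pulse spacing follows the per-frame
    F0 contour. Returns sample indices of the pulses."""
    positions, phase = [], 0.0
    for k, f0 in enumerate(f0_curve):
        if f0 <= 0:            # unvoiced frame: no pulses, reset phase
            phase = 0.0
            continue
        phase += frame_shift * f0          # cycles accumulated over this frame
        while phase >= 1.0:
            phase -= 1.0
            # the crossing happened phase/f0 seconds before the frame end
            positions.append(round(((k + 1) * frame_shift - phase / f0) * sr))
    return positions
```

With a constant 100 Hz contour the pulses land every 10 ms, i.e. 160 samples apart at 16 kHz, so raising the F0 curve directly tightens the pulse spacing, which is how the melody control model's contour drives the synthesized pitch.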
Corresponding to the above method, another preferred embodiment of the present invention also discloses an HMM-based song synthesis device. The device is used to establish the speaker-dependent acoustic model for song synthesis based on HMM and the melody control model of the song, to perform speaker-adaptive training, and, by using HTS (the HMM-based speech synthesis system) with the STRAIGHT algorithm in combination with TTS (text-to-speech) technology, to realize the personalized real-time conversion of lyrics into song. In implementation, the functions of the device may be realized by software, by hardware, or by a combination of software and hardware.
As shown in Fig. 5, the song synthesis device includes: a melody control module, an HMM-based speaker-dependent acoustic module, and an HMM-based song synthesis module.
The melody control module is used to establish the melody control model of the song.
The melody control module includes:
a MIDI analysis unit, for analyzing the score information extracted from the MIDI file and obtaining the corresponding music parameter information;
a prosody control unit, for establishing the melody control model of the song according to the differences between speech and song in acoustic features.
Through the MIDI analysis unit, the score information extracted from the MIDI file is analyzed and the corresponding music parameter information is obtained; then, in the melody control module, the melody control model of the song is established according to the differences between speech and song in acoustic features.
The HMM-based speaker-dependent acoustic module is used to establish the speaker-dependent acoustic model for song synthesis.
The HMM-based speaker-dependent acoustic module includes:
an acoustic model unit, for obtaining the acoustic model of the target speaker;
an acoustic parameter subunit, for HMM-based parametric speech synthesis.
The HMM-based song synthesis module is used to synthesize the singing voice to be synthesized.
The HMM-based song synthesis module includes:
a text analysis unit, which performs text analysis on the input lyrics text to obtain context-dependent labels;
an HMM training subunit, which establishes the HMM model library of the speech data: the speaker's acoustic parameters, mainly the fundamental-frequency, spectrum, and duration parameters, are extracted from the speech data in the corpus and, combined with the context label information of the corpus, the statistical acoustic models are trained; the fundamental-frequency, spectrum, and duration parameters are then determined according to the context attribute set;
a speaker adaptation subunit, which normalizes and transforms the characteristic parameters of the speakers in training to obtain the adaptive model: through speaker training, the differences between the speakers and the average voice model in the state output distributions and the state duration distributions are normalized, the average voice model of the multi-speaker mixed speech is determined with the constrained maximum-likelihood linear regression algorithm, and then, using the adaptation data, the mean vectors and covariance matrices of the speaker's state output probability distributions and duration probability distributions are calculated and transformed to the target speaker model, thereby establishing the MSD-HSMM adaptive model of the target speaker;
a speech synthesis unit, which synthesizes the singing voice to be synthesized: using the corrected adaptive model, the speech parameters of the input lyrics text are predicted and the speech acoustic parameters are extracted, and the singing voice is then synthesized by the vocoder based on the STRAIGHT algorithm;
a song synthesis unit, which adds the musical accompaniment to the synthesized singing voice to complete the synthesis of the song.
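The module composition of Fig. 5 can be sketched as a pipeline object; the three callables below stand in for the melody control module, the speaker-dependent acoustic module, and the song synthesis module, and all names and signatures are illustrative rather than the patent's interfaces:

```python
class SongSynthesizer:
    """Composition of the device's three modules: melody control produces the
    F0 curve and durations, the acoustic model predicts parameter tracks from
    the lyrics, and song synthesis renders the singing voice."""

    def __init__(self, melody_control, acoustic_model, song_synthesis):
        self.melody_control = melody_control
        self.acoustic_model = acoustic_model
        self.song_synthesis = song_synthesis

    def synthesize(self, lyrics, midi_notes):
        f0_curve, durations = self.melody_control(midi_notes)
        params = self.acoustic_model(lyrics, f0_curve, durations)
        return self.song_synthesis(params)

# stub modules, purely for demonstrating the data flow
synth = SongSynthesizer(
    melody_control=lambda notes: ([440.0] * len(notes), [0.5] * len(notes)),
    acoustic_model=lambda lyrics, f0, dur: {"f0": f0, "dur": dur, "text": lyrics},
    song_synthesis=lambda p: f"voice({p['text']}, {len(p['f0'])} notes)",
)
print(synth.synthesize("la la", [69, 71]))  # → voice(la la, 2 notes)
```

Keeping the modules as swappable callables mirrors the statement that the device may be realized in software, hardware, or a combination of both.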
The processes described above can be completed by hardware related to program instructions; the program can be stored in a readable storage medium and, when executed, performs the corresponding steps of the above method.
The above is a further detailed description of the present invention in combination with specific preferred embodiments, and it cannot be concluded that the specific implementation of the present invention is limited to these descriptions. For those of ordinary skill in the technical field of the present invention, several simple deductions or substitutions may be made without departing from the concept of the present invention, all of which shall be regarded as falling within the protection scope of the present invention.
Claims (13)
1. An HMM-based song synthesis method, characterized by comprising the following steps:
A. analyzing the differences between speech and song in acoustic features, and establishing the melody control model of the song;
B. establishing the speaker-dependent acoustic model for song synthesis based on HMM;
C. synthesizing the song using the HMM-based speech synthesis system.
2. The HMM-based song synthesis method according to claim 1, characterized in that analyzing the differences between speech and song in acoustic features described in step A comprises the following steps:
a. performing spectrum analysis on the speech signal with time-domain and frequency-domain analysis methods, and performing a comparative fundamental-frequency analysis of the speech signal and the singing signal;
b. extracting the required music-score information from the MIDI system using MIDI technology;
c. reading the melody information of the score extracted from the MIDI file, analyzing the structural features of the score file, and thereby obtaining the music parameter information, the music parameter information including the channel number, note pitch, key velocity, note onset time, and note duration.
3. The HMM-based song synthesis method according to claim 2, characterized in that the melody control model of the song described in step A includes a fundamental-frequency control model and a duration control model; the fundamental-frequency control model converts the discrete pitches of the score into a continuous fundamental-frequency curve, and the duration control model yields the pronunciation duration of each sung note.
4. The HMM-based song synthesis method according to claim 1, characterized in that establishing the speaker-dependent acoustic model for song synthesis based on HMM described in step B comprises the following steps:
a. using the speech corpus of the speakers, analyzing the speech data to obtain the acoustic parameters, including the fundamental frequency F0, duration, spectrum SP, and aperiodic index AP, and training the average voice model of the mixed speech using the speaker-adaptive training technique based on HMM;
b. using a small amount of speech data from the target speaker to be synthesized, obtaining the adaptive acoustic model of the target speaker through the speaker-adaptive transformation technique, and correcting and updating the adaptive model.
5. The HMM-based song synthesis method according to claim 4, characterized in that training the average voice model of the mixed speech by the HMM-based speaker-adaptive training comprises the following steps:
a. performing speech analysis on the corpus of the speakers and the corpus data of the target speaker, and extracting the acoustic parameters: the mel-cepstral coefficients, together with their first-order and second-order differences;
b. combined with the context attribute set, performing HMM training to train the multi-space distribution hidden semi-Markov models (MSD-HSMM) of the spectrum and fundamental-frequency parameters and of the state duration parameters;
c. using a small speech corpus of the target speaker, performing speaker-adaptive training to obtain the average voice model of the mixed speech, and thereby the context-dependent MSD-HSMM models.
6. The HMM-based song synthesis method according to claim 4, characterized in that obtaining the adaptive acoustic model of the target speaker from a small amount of speech data of the target speaker to be synthesized through the speaker-adaptive transformation technique, and correcting and updating the adaptive model, comprises the following steps:
a. after the speaker-adaptive training, using the CMLLR adaptive algorithm based on HSMM to compute the mean vectors and covariance matrices of the state output probability distributions and the state duration probability distributions for the voice conversion, the transformation equations of the feature vector o and the state duration d under state i being:
b_i(o) = N(o; Aμ_i − b, AΣ_i A^T) = |A^-1| N(Wξ; μ_i, Σ_i)
p_i(d) = N(d; αm_i − β, ασ_i^2 α) = |α^-1| N(αψ; m_i, σ_i^2)
where ξ = [o^T, 1]^T, ψ = [d, 1]^T, μ_i is the mean of the state output distribution, m_i is the mean of the duration distribution, Σ_i is the diagonal covariance matrix, σ_i^2 is the variance, W = [A^-1, b^-1] is the linear transformation matrix of the target speaker's state output probability density distribution, and X = [α^-1, β^-1] is the transformation matrix of the state duration probability density distribution;
b. through the adaptive transformation algorithm based on HSMM, the spectrum, fundamental-frequency, and duration parameters of the speech data can be normalized and transformed; for adaptation data O of length T, a maximum-likelihood estimate of the transform Λ = (W, X) can be carried out;
c. the adaptive model of the voice is corrected and updated using the maximum a posteriori (MAP) algorithm: for a given HSMM parameter set λ, let its forward and backward probabilities be α_t(i) and β_t(i), respectively; then the generation probability κ_t^d(i) of its continuous observation sequence o_{t−d+1}…o_t under state i is:
The MAP estimation is described as follows:
where ω and τ are the MAP estimation parameters of the state output and duration distributions, respectively; the remaining terms are the mean vectors after the linear-regression transformation and the weighted-average MAP estimates of the adaptive mean vectors.
7. The HMM-based song synthesis method according to claim 1, characterized in that the speech analysis and synthesis method used for song synthesis with the HMM-based speech synthesis system described in step C is based on the STRAIGHT algorithm.
8. The HMM-based song synthesis method according to claim 7, characterized in that synthesizing the song using the HMM-based speech synthesis system described in step C comprises the following steps:
a. analyzing the input lyrics text with a text-analysis tool: the text-analysis program converts the given lyrics text into an acoustic label sequence containing context description information, the context-dependent HMMs for each pronunciation unit and its context are predicted with the decision trees obtained by clustering during training, and the models are then concatenated into a sentence HMM;
b. according to the MIDI file, obtaining the pitch and duration of each note in the lyrics, obtaining the corresponding fundamental frequency and duration through the melody control model, and modifying the spectrum SP, aperiodic index AP, and fundamental frequency F0 durations of the syllables using the note durations;
c. generating the parameter sequences of spectrum SP, aperiodic index AP, duration, and fundamental frequency F0 in the sentence HMM using the speaker-dependent acoustic model and the STRAIGHT algorithm, synthesizing the voice, and adding the musical accompaniment to realize the synthesis of the song.
9. The HMM-based song synthesis method according to claim 7 or 8, characterized in that the speech analysis and synthesis method used for song synthesis with the HMM-based speech synthesis system described in step C, based on the STRAIGHT algorithm, comprises the following steps:
first, the speech signal of the speaker is input, and the fundamental frequency F0 and the spectral envelope of the speech are extracted with the STRAIGHT algorithm; the acoustic parameters are then modified to generate a new excitation source and time-varying filter, and the speech is synthesized with the original filter model according to the following formula:
where Q denotes the set of pulse positions in the synthesis excitation and G(·) denotes the pitch modulation, which can arbitrarily match the F0 of the original speech to the modulated F0; an all-pass filter controls the fine pitch structure and the time structure of the original signal, e.g. a linear phase shift proportional to frequency controls the fine structure of F0; from the modulated amplitude spectrum A(S(u(w), r(t)), u(w), r(t)), given by the following formula, the Fourier transform V(w, t_i) of the corresponding minimum-phase pulse can be calculated, where A(·), u(·), and r(·) denote the modulations in the amplitude, frequency, and time dimensions, respectively;
where q denotes frequency.
10. An HMM-based song synthesis device, characterized by comprising:
a melody control module, for establishing the melody control model of the song;
an HMM-based speaker-dependent acoustic module, for establishing the speaker-dependent acoustic model for song synthesis;
an HMM-based song synthesis module, for synthesizing the singing voice to be synthesized.
11. The HMM-based song synthesis device according to claim 10, characterized in that the melody control module includes:
a MIDI analysis unit, for analyzing the score information extracted from the MIDI file and obtaining the corresponding music parameter information;
a prosody control unit, for establishing the melody control model of the song according to the differences between speech and song in acoustic features.
12. The HMM-based song synthesis device according to claim 10, characterized in that the HMM-based speaker-dependent acoustic module includes:
an acoustic model unit, for obtaining the acoustic model of the target speaker;
an acoustic parameter subunit, for HMM-based parametric speech synthesis.
13. The HMM-based song synthesis device according to claim 10, characterized in that the HMM-based song synthesis module includes:
a text analysis unit, which performs text analysis on the input lyrics text to obtain context-dependent labels;
an HMM training subunit, for establishing the HMM model library of the speech data;
a speaker adaptation subunit, for normalizing and transforming the characteristic parameters of the speakers in training to obtain the adaptive model;
a speech synthesis unit, for synthesizing the singing voice to be synthesized;
a song synthesis unit, for adding the musical accompaniment to the synthesized singing voice to complete the synthesis of the song.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710160104.2A CN106971703A (en) | 2017-03-17 | 2017-03-17 | A kind of song synthetic method and device based on HMM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106971703A true CN106971703A (en) | 2017-07-21 |
Family
ID=59329007
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710160104.2A Pending CN106971703A (en) | 2017-03-17 | 2017-03-17 | A kind of song synthetic method and device based on HMM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106971703A (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108831437A (en) * | 2018-06-15 | 2018-11-16 | 百度在线网络技术(北京)有限公司 | A kind of song generation method, device, terminal and storage medium |
CN108831435A (en) * | 2018-06-06 | 2018-11-16 | 安徽继远软件有限公司 | A kind of emotional speech synthesizing method based on susceptible sense speaker adaptation |
CN109036370A (en) * | 2018-06-06 | 2018-12-18 | 安徽继远软件有限公司 | A kind of speaker's voice adaptive training method |
CN109068439A (en) * | 2018-07-30 | 2018-12-21 | 上海应用技术大学 | A kind of light coloring control method and its control device based on MIDI theme |
CN109147757A (en) * | 2018-09-11 | 2019-01-04 | 广州酷狗计算机科技有限公司 | Song synthetic method and device |
CN109147809A (en) * | 2018-09-20 | 2019-01-04 | 广州酷狗计算机科技有限公司 | Acoustic signal processing method, device, terminal and storage medium |
CN109192218A (en) * | 2018-09-13 | 2019-01-11 | 广州酷狗计算机科技有限公司 | The method and apparatus of audio processing |
CN109326280A (en) * | 2017-07-31 | 2019-02-12 | 科大讯飞股份有限公司 | Singing synthesis method and device and electronic equipment |
CN109801608A (en) * | 2018-12-18 | 2019-05-24 | 武汉西山艺创文化有限公司 | A kind of song generation method neural network based and system |
CN110164412A (en) * | 2019-04-26 | 2019-08-23 | 吉林大学珠海学院 | A kind of music automatic synthesis method and system based on LSTM |
CN110189741A (en) * | 2018-07-05 | 2019-08-30 | 腾讯数码(天津)有限公司 | Audio synthetic method, device, storage medium and computer equipment |
CN110264984A (en) * | 2019-05-13 | 2019-09-20 | 北京奇艺世纪科技有限公司 | Model training method, music generating method, device and electronic equipment |
CN110364140A (en) * | 2019-06-11 | 2019-10-22 | 平安科技(深圳)有限公司 | Training method, device, computer equipment and the storage medium of song synthetic model |
CN110634460A (en) * | 2018-06-21 | 2019-12-31 | 卡西欧计算机株式会社 | Electronic musical instrument, control method for electronic musical instrument, and storage medium |
CN110634461A (en) * | 2018-06-21 | 2019-12-31 | 卡西欧计算机株式会社 | Electronic musical instrument, control method for electronic musical instrument, and storage medium |
CN110838286A (en) * | 2019-11-19 | 2020-02-25 | 腾讯科技(深圳)有限公司 | Model training method, language identification method, device and equipment |
WO2020140390A1 (en) * | 2019-01-04 | 2020-07-09 | 平安科技(深圳)有限公司 | Vibrato modeling method, device, computer apparatus and storage medium |
CN111402843A (en) * | 2020-03-23 | 2020-07-10 | 北京字节跳动网络技术有限公司 | Rap music generation method and device, readable medium and electronic equipment |
CN111445892A (en) * | 2020-03-23 | 2020-07-24 | 北京字节跳动网络技术有限公司 | Song generation method and device, readable medium and electronic equipment |
CN112037757A (en) * | 2020-09-04 | 2020-12-04 | 腾讯音乐娱乐科技(深圳)有限公司 | Singing voice synthesis method and device and computer readable storage medium |
CN112309410A (en) * | 2020-10-30 | 2021-02-02 | 北京有竹居网络技术有限公司 | Song sound repairing method and device, electronic equipment and storage medium |
CN112420004A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Method and device for generating songs, electronic equipment and computer readable storage medium |
CN113035163A (en) * | 2021-05-11 | 2021-06-25 | 杭州网易云音乐科技有限公司 | Automatic generation method and device of musical composition, storage medium and electronic equipment |
CN113506554A (en) * | 2020-03-23 | 2021-10-15 | 卡西欧计算机株式会社 | Electronic musical instrument and control method for electronic musical instrument |
CN116001664A (en) * | 2022-12-12 | 2023-04-25 | 瑞声声学科技(深圳)有限公司 | Somatosensory type in-vehicle reminding method, system and related equipment |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101030369A (en) * | 2007-03-30 | 2007-09-05 | 清华大学 | Built-in speech discriminating method based on sub-word hidden Markov model |
CN101246685A (en) * | 2008-03-17 | 2008-08-20 | 清华大学 | Pronunciation quality evaluation method of computer auxiliary language learning system |
CN101436403A (en) * | 2007-11-16 | 2009-05-20 | 创新未来科技有限公司 | Method and system for recognizing tone |
CN101516005A (en) * | 2008-02-23 | 2009-08-26 | 华为技术有限公司 | Speech recognition channel selecting system, method and channel switching device |
CN102982803A (en) * | 2012-12-11 | 2013-03-20 | 华南师范大学 | Isolated word speech recognition method based on HRSF and improved DTW algorithm |
CN102982799A (en) * | 2012-12-20 | 2013-03-20 | 中国科学院自动化研究所 | Speech recognition optimization decoding method integrating guide probability |
CN104217713A (en) * | 2014-07-15 | 2014-12-17 | 西北师范大学 | Tibetan-Chinese speech synthesis method and device |
CN105390133A (en) * | 2015-10-09 | 2016-03-09 | 西北师范大学 | Tibetan TTVS system realization method |
CN106128450A (en) * | 2016-08-31 | 2016-11-16 | 西北师范大学 | The bilingual method across language voice conversion and system thereof hidden in a kind of Chinese |
Non-Patent Citations (3)
Title |
---|
FENG Huan: "Research on HMM-Based Lyrics-to-Singing Conversion", CNKI China Master's Theses Full-text Database *
WU Yijian et al.: "HMM-Based Trainable Chinese Speech Synthesis", Journal of Chinese Information Processing *
ZHANG Youwei et al.: "Human-Machine Natural Interaction", 30 September 2004 *
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109326280A (en) * | 2017-07-31 | 2019-02-12 | 科大讯飞股份有限公司 | Singing synthesis method and device and electronic equipment |
CN109326280B (en) * | 2017-07-31 | 2022-10-04 | 科大讯飞股份有限公司 | Singing synthesis method and device and electronic equipment |
CN108831435A (en) * | 2018-06-06 | 2018-11-16 | 安徽继远软件有限公司 | A kind of emotional speech synthesizing method based on susceptible sense speaker adaptation |
CN109036370A (en) * | 2018-06-06 | 2018-12-18 | 安徽继远软件有限公司 | A kind of speaker's voice adaptive training method |
CN108831437A (en) * | 2018-06-15 | 2018-11-16 | 百度在线网络技术(北京)有限公司 | A kind of song generation method, device, terminal and storage medium |
CN110634461A (en) * | 2018-06-21 | 2019-12-31 | 卡西欧计算机株式会社 | Electronic musical instrument, control method for electronic musical instrument, and storage medium |
CN110634460A (en) * | 2018-06-21 | 2019-12-31 | 卡西欧计算机株式会社 | Electronic musical instrument, control method for electronic musical instrument, and storage medium |
WO2020007148A1 (en) * | 2018-07-05 | 2020-01-09 | 腾讯科技(深圳)有限公司 | Audio synthesizing method, storage medium and computer equipment |
CN110189741A (en) * | 2018-07-05 | 2019-08-30 | 腾讯数码(天津)有限公司 | Audio synthetic method, device, storage medium and computer equipment |
CN109068439A (en) * | 2018-07-30 | 2018-12-21 | 上海应用技术大学 | A kind of light coloring control method and its control device based on MIDI theme |
CN109147757A (en) * | 2018-09-11 | 2019-01-04 | 广州酷狗计算机科技有限公司 | Song synthetic method and device |
CN109192218B (en) * | 2018-09-13 | 2021-05-07 | 广州酷狗计算机科技有限公司 | Method and apparatus for audio processing |
CN109192218A (en) * | 2018-09-13 | 2019-01-11 | 广州酷狗计算机科技有限公司 | The method and apparatus of audio processing |
CN109147809A (en) * | 2018-09-20 | 2019-01-04 | 广州酷狗计算机科技有限公司 | Acoustic signal processing method, device, terminal and storage medium |
CN109801608A (en) * | 2018-12-18 | 2019-05-24 | 武汉西山艺创文化有限公司 | A kind of song generation method neural network based and system |
WO2020140390A1 (en) * | 2019-01-04 | 2020-07-09 | 平安科技(深圳)有限公司 | Vibrato modeling method, device, computer apparatus and storage medium |
CN110164412A (en) * | 2019-04-26 | 2019-08-23 | 吉林大学珠海学院 | A kind of music automatic synthesis method and system based on LSTM |
CN110264984A (en) * | 2019-05-13 | 2019-09-20 | 北京奇艺世纪科技有限公司 | Model training method, music generating method, device and electronic equipment |
CN110264984B (en) * | 2019-05-13 | 2021-07-06 | 北京奇艺世纪科技有限公司 | Model training method, music generation method and device and electronic equipment |
CN110364140A (en) * | 2019-06-11 | 2019-10-22 | 平安科技(深圳)有限公司 | Training method and device for a song synthesis model, computer equipment and storage medium |
CN110364140B (en) * | 2019-06-11 | 2024-02-06 | 平安科技(深圳)有限公司 | Singing voice synthesis model training method, singing voice synthesis model training device, computer equipment and storage medium |
CN112420004A (en) * | 2019-08-22 | 2021-02-26 | 北京峰趣互联网信息服务有限公司 | Method and device for generating songs, electronic equipment and computer readable storage medium |
CN110838286A (en) * | 2019-11-19 | 2020-02-25 | 腾讯科技(深圳)有限公司 | Model training method, language identification method, device and equipment |
CN110838286B (en) * | 2019-11-19 | 2024-05-03 | 腾讯科技(深圳)有限公司 | Model training method, language identification method, device and equipment |
CN111402843B (en) * | 2020-03-23 | 2021-06-11 | 北京字节跳动网络技术有限公司 | Rap music generation method and device, readable medium and electronic equipment |
CN113506554A (en) * | 2020-03-23 | 2021-10-15 | 卡西欧计算机株式会社 | Electronic musical instrument and control method for electronic musical instrument |
CN111445892A (en) * | 2020-03-23 | 2020-07-24 | 北京字节跳动网络技术有限公司 | Song generation method and device, readable medium and electronic equipment |
CN111402843A (en) * | 2020-03-23 | 2020-07-10 | 北京字节跳动网络技术有限公司 | Rap music generation method and device, readable medium and electronic equipment |
CN112037757A (en) * | 2020-09-04 | 2020-12-04 | 腾讯音乐娱乐科技(深圳)有限公司 | Singing voice synthesis method and device and computer readable storage medium |
CN112037757B (en) * | 2020-09-04 | 2024-03-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Singing voice synthesizing method, singing voice synthesizing equipment and computer readable storage medium |
CN112309410A (en) * | 2020-10-30 | 2021-02-02 | 北京有竹居网络技术有限公司 | Song pitch correction method and device, electronic equipment and storage medium |
CN113035163A (en) * | 2021-05-11 | 2021-06-25 | 杭州网易云音乐科技有限公司 | Automatic generation method and device of musical composition, storage medium and electronic equipment |
CN113035163B (en) * | 2021-05-11 | 2021-08-10 | 杭州网易云音乐科技有限公司 | Automatic generation method and device of musical composition, storage medium and electronic equipment |
CN116001664A (en) * | 2022-12-12 | 2023-04-25 | 瑞声声学科技(深圳)有限公司 | Somatosensory type in-vehicle reminding method, system and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106971703A (en) | Song synthesis method and device based on HMM | |
CN101308652B (en) | Synthesizing method of personalized singing voice | |
CN101399036B (en) | Device and method for converting voice into rap music | |
US6804649B2 (en) | Expressivity of voice synthesis by emphasizing source signal features | |
CN101178896B (en) | Unit selection voice synthetic method based on acoustics statistical model | |
Kim et al. | Korean singing voice synthesis system based on an LSTM recurrent neural network | |
Umbert et al. | Expression control in singing voice synthesis: Features, approaches, evaluation, and challenges | |
Hono et al. | Recent development of the DNN-based singing voice synthesis system—sinsy | |
Hono et al. | Sinsy: A deep neural network-based singing voice synthesis system | |
CN103915093A (en) | Method and device for realizing voice singing | |
CN102201234A (en) | Speech synthesizing method based on tone automatic tagging and prediction | |
CN105654942A (en) | Speech synthesis method of interrogative sentence and exclamatory sentence based on statistical parameter | |
Cho et al. | A survey on recent deep learning-driven singing voice synthesis systems | |
Gupta et al. | Deep learning approaches in topics of singing information processing | |
Liu et al. | Vibrato learning in multi-singer singing voice synthesis | |
Bonada et al. | Singing voice synthesis combining excitation plus resonance and sinusoidal plus residual models | |
Wada et al. | Sequential generation of singing f0 contours from musical note sequences based on wavenet | |
Chu et al. | MPop600: A Mandarin popular song database with aligned audio, lyrics, and musical scores for singing voice synthesis | |
Bonada et al. | Spectral approach to the modeling of the singing voice | |
Li et al. | A lyrics to singing voice synthesis system with variable timbre | |
Pitrelli et al. | Expressive speech synthesis using American English ToBI: questions and contrastive emphasis | |
Gu et al. | Singing-voice synthesis using demi-syllable unit selection | |
Nose et al. | A style control technique for singing voice synthesis based on multiple-regression HSMM. | |
Khan et al. | Singing Voice Synthesis Using HMM Based TTS and MusicXML | |
Lee et al. | A study of F0 modelling and generation with lyrics and shape characterization for singing voice synthesis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170721 |