CN106971703A - HMM-based song synthesis method and device - Google Patents

HMM-based song synthesis method and device

Info

Publication number
CN106971703A
CN106971703A (application CN201710160104.2A)
Authority
CN
China
Prior art keywords
song
hmm
model
speaker
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710160104.2A
Other languages
Chinese (zh)
Inventor
杨鸿武
赵娜
冯欢
甘振业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest Normal University
Original Assignee
Northwest Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest Normal University filed Critical Northwest Normal University
Priority to CN201710160104.2A priority Critical patent/CN106971703A/en
Publication of CN106971703A publication Critical patent/CN106971703A/en
Pending legal-status Critical Current

Classifications

    • G10H1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H1/0066 Transmission between separate instruments or between individual components of a musical system using a MIDI interface
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/063 Training
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G10L15/148 Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
    • G10L19/02 Speech or audio signal analysis-synthesis for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10H2250/015 Markov chains, e.g. hidden Markov models [HMM], for musical processing, e.g. musical analysis or musical composition
    • G10H2250/471 General musical sound synthesis principles, i.e. sound category-independent synthesis methods
    • G10L2015/0631 Creating reference templates; Clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention discloses an HMM-based song synthesis method and device. Using TTS (text-to-speech) technology, the HTS system (hidden-Markov-model-based speech synthesis) and the STRAIGHT algorithm, the invention establishes a speaker-dependent HMM acoustic model oriented to song synthesis and a melody control model for songs, and performs speaker-adaptive training, realizing a personalized HMM-based system that converts lyrics into songs in real time. The system enriches the research content of speech synthesis and makes synthesized speech more expressive and emotional; in particular, it offers music lovers an opportunity to learn technical operations such as song production and music processing. It adds to the social resources available to people and has practical value and significance.

Description

HMM-based song synthesis method and device
Technical field
The present invention relates to the fields of human-computer interaction, text-to-speech conversion and speech synthesis, and in particular to an HMM-based song synthesis method and device.
Background technology
With the continuous innovation and improvement of information technology, music multimedia applications involving human-computer interaction have gradually entered our daily lives, such as requesting songs, composing and remixing on a computer, or song recognition on a mobile phone. Making computers more human, able to "sing" like people — that is, given a numbered musical notation and lyrics, automatically producing beautiful, pleasant songs — has become a new demand. At the same time, the rapid development of multimedia technology in the entertainment field provides broader application space for this technology.
At present, the vast majority of music is recorded and distributed in digital formats such as WAV, MP3 and MIDI, as well as various real-time streaming formats. Compared with traditional music production, digital music has incomparable advantages in production, storage and distribution. With a computer, a creator can hear the effect of a musical work while composing it; any modification to the score is fed back to the creator immediately, without the complex traditional process of rehearsal, performance, recording and editing. This greatly reduces the cycle and labor cost of music production, and also prevents the composer from losing, over a long production process, inspiration obtained by chance.
Speech synthesis is an important research topic in human-computer interaction and an important component of the embedded research field. Nowadays, song synthesis is gradually becoming a hot topic as well. Before song synthesis technology appeared, however, speech synthesis technology had already developed to relative maturity. Some scholars have attempted to synthesize songs with speech synthesis methods, but songs and speech differ to a certain degree: speech focuses on content (which can of course express the speaker's purpose and emotion), whereas a song focuses on the interpretation of the melody and its rises and falls. This prevents speech synthesis methods from being applied directly to song synthesis.
Over a long period of research at home and abroad, song synthesis, like speech synthesis, has gradually formed three mainstream synthesis approaches: 1. waveform concatenation synthesis; 2. parametric synthesis; 3. speech-modification synthesis. Concatenative and parametric synthesis are both corpus-based and their synthesis quality is not high, while speech modification is more flexible, changing the acoustic parameters of the speech signal according to melody information to synthesize a song. A personalized real-time lyrics-to-song synthesis has been proposed at home and abroad, which produces a song immediately from its score information and can accept continuous speech reading out the lyrics. After recording the speech corresponding to the lyrics, that system selects continuous speech synthesis units with the Viterbi algorithm, and realizes real-time conversion of pitch, duration, energy and spectrum by the pitch-synchronous overlap-add (PSOLA) method to synthesize the song. Because that system does not account for the acoustic differences between speech and singing in pitch and duration, its synthesis results are unsatisfactory. On this basis, a large-corpus lyrics-to-song conversion has also been proposed, which achieved relatively good results in both naturalness and sound quality; it designed three Mandarin corpora and used the Viterbi algorithm to determine the optimal combination of synthesis units. The drawback of this method is that building the corpora consumes a great deal of time and human effort.
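The PSOLA method mentioned above re-spaces pitch-synchronous, windowed speech grains to change pitch without changing duration. The following is a minimal, purely illustrative TD-PSOLA-style sketch (not the patent's implementation): it assumes a constant pitch period and regularly spaced pitch marks, extracts two-period Hann-windowed grains, and overlap-adds them at a new mark spacing.

```python
import math

def psola_shift(signal, period, new_period):
    """Toy TD-PSOLA: overlap-add two-period Hann grains at new pitch marks.

    Assumes a constant pitch period (in samples); real PSOLA tracks
    per-cycle pitch marks. Illustrative only.
    """
    n = len(signal)
    out = [0.0] * n
    width = 2 * period                       # grain = two pitch periods
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (width - 1))
            for i in range(width)]
    s = 0                                    # synthesis pitch mark
    while s + width <= n:
        a = round(s / period) * period       # nearest analysis pitch mark
        if a + width > n:
            break
        for i in range(width):               # overlap-add one grain
            out[s + i] += signal[a + i] * hann[i]
        s += new_period                      # tighter spacing -> higher pitch
    return out

# Raise the pitch of a 100-sample-period sine by ~25% (period 100 -> 80).
x = [math.sin(2 * math.pi * i / 100) for i in range(2000)]
y = psola_shift(x, 100, 80)
```

Duration conversion works the same way, by repeating or skipping grains instead of re-spacing them.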
Therefore, those skilled in the art are working to develop a new HMM-based personalized song synthesis method and device for those who need music processing.
Summary of the invention
In view of the above drawbacks of the prior art, the problems to be solved by the invention are those noted in the background: Chinese song synthesis is under-researched, synthesis quality is not high, and operation is time-consuming and laborious. The invention provides an implementation method and device for HMM-based personalized song synthesis for those who need music processing.
To solve the above technical problems, the technical scheme provided by the present invention is as follows:
An HMM-based song synthesis method, comprising the following steps:
A. analyzing the differences between speech and singing in acoustic features, and establishing a melody control model for songs;
B. establishing a speaker-dependent HMM acoustic model oriented to song synthesis;
C. synthesizing the song with the HMM-based speech synthesis system.
Further, the specific steps of analyzing the differences between speech and singing in acoustic features in step A are as follows:
a. performing spectrum analysis on the speech signal with time-domain and frequency-domain analysis methods, and performing a comparative fundamental-frequency analysis of the speech signal and the song signal;
b. extracting the required score information from the MIDI system using MIDI technology;
c. reading the melody information of the score extracted from the MIDI file and analyzing the structural features of the score file to obtain the music parameter information, which includes the channel number, note pitch, key velocity, note onset time and note duration.
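The parameter extraction in steps b and c above can be sketched as follows. This is a hypothetical, simplified example — the event-tuple layout stands in for a parsed MIDI track, and the tempo/resolution values are assumptions, not the patent's data — showing how channel, pitch, velocity, onset time and duration are recovered from MIDI ticks.

```python
def ticks_to_seconds(ticks, tempo_us_per_beat, ppq):
    """Convert MIDI ticks to seconds (tempo in microseconds per quarter note)."""
    return ticks * tempo_us_per_beat / (ppq * 1_000_000)

def extract_notes(events, tempo_us_per_beat=500_000, ppq=480):
    """events: (abs_tick, kind, channel, note, velocity) tuples,
    a stand-in for a parsed MIDI track. Returns one record per note."""
    onsets, notes = {}, []
    for tick, kind, ch, note, vel in events:
        t = ticks_to_seconds(tick, tempo_us_per_beat, ppq)
        if kind == "note_on" and vel > 0:
            onsets[(ch, note)] = (t, vel)
        else:  # note_off, or note_on with velocity 0
            start, vel0 = onsets.pop((ch, note))
            notes.append({"channel": ch, "pitch": note, "velocity": vel0,
                          "onset": start, "duration": t - start})
    return notes

# Two quarter notes at 120 BPM (500 000 us/beat, 480 ticks per beat):
track = [(0, "note_on", 0, 60, 90), (480, "note_off", 0, 60, 0),
         (480, "note_on", 0, 64, 80), (960, "note_off", 0, 64, 0)]
notes = extract_notes(track)
# each quarter note (480 ticks) lasts 0.5 s at this tempo
```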
Further, the melody control model of the song in step A includes a fundamental frequency control model and a duration control model; the fundamental frequency control model converts the discrete notes of the score into a continuous fundamental frequency curve, and the duration control model obtains the pronunciation duration of the sung notes.
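A minimal sketch of the note-to-curve conversion described above, under assumptions not stated in the patent (equal-temperament tuning, a fixed frame rate, and a short linear glide in log-F0 at note boundaries as a crude stand-in for the fundamental frequency control model):

```python
import math

def midi_to_f0(note):
    """Equal-temperament pitch: MIDI note 69 (A4) = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def f0_contour(notes, frame_s=0.005, glide_s=0.02):
    """notes: list of (midi_note, duration_s) pairs.
    Returns a frame-level F0 track with a short log-F0 glide at
    each note boundary -- illustrative, not the patent's model."""
    targets = [math.log(midi_to_f0(n)) for n, _ in notes]
    track, t, idx, elapsed = [], 0.0, 0, 0.0
    total = sum(d for _, d in notes)
    while t < total:
        while idx + 1 < len(notes) and t >= elapsed + notes[idx][1]:
            elapsed += notes[idx][1]   # advance to the current note
            idx += 1
        cur = targets[idx]
        into = t - elapsed
        if idx > 0 and into < glide_s:  # glide in from the previous note
            w = into / glide_s
            cur = (1 - w) * targets[idx - 1] + w * targets[idx]
        track.append(math.exp(cur))
        t += frame_s
    return track

f0 = f0_contour([(60, 0.5), (64, 0.5)])   # C4 (~261.6 Hz) then E4 (~329.6 Hz)
```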
Further, establishing the speaker-dependent HMM acoustic model oriented to song synthesis in step B includes the following steps:
a. using the speaker's speech corpus, analyzing the speech data to obtain its acoustic parameters, including fundamental frequency F0, duration, spectrum SP and aperiodic index AP; and using the HMM-based speaker-adaptive training technique to train an average voice model from the mixed speech;
b. using a small amount of speech data from the target speaker to be synthesized, obtaining the target speaker's adaptive acoustic model through speaker-adaptive transformation, and correcting and updating the adaptive model.
Further, training the average voice model of mixed speech through HMM-based speaker-adaptive training comprises the following steps:
a. performing speech analysis on the speaker corpus and the target speaker corpus, extracting their acoustic parameters — mel-cepstral coefficients — and computing their first-order and second-order differences;
b. with reference to the context attribute set, performing HMM model training: training the HMM models of the spectrum and fundamental frequency parameters and the multi-space-distribution hidden semi-Markov models (MSD-HSMM) of the state duration parameters;
c. using a small amount of the target speaker's speech, performing speaker-adaptive training to obtain the average voice model of the mixed speech, thereby obtaining context-dependent MSD-HSMM models.
Further, using a small amount of speech data from the target speaker to be synthesized, the target speaker's adaptive acoustic model is obtained through speaker-adaptive transformation and is then corrected and updated, comprising the following steps:
a. After speaker-adaptive training, the HSMM-based CMLLR adaptation algorithm is used to compute the transformed mean vectors and covariance matrices of the state output probability distributions and duration probability distributions. Under state i, the transformation equations for the feature vector o and the state duration d are:
b_i(o) = N(o; Aμ_i − b, AΣ_i Aᵀ) = |A⁻¹| N(Wξ; μ_i, Σ_i)
p_i(d) = N(d; αm_i − β, α²σ_i²) = |α⁻¹| N(Xψ; m_i, σ_i²)
where ξ = [oᵀ, 1]ᵀ, ψ = [d, 1]ᵀ, μ_i is the mean of the state output distribution, m_i is the mean of the duration distribution, Σ_i is the diagonal covariance matrix, σ_i² is the duration variance, W = [A⁻¹, A⁻¹b] is the linear transformation matrix of the target speaker's state output probability density distribution, and X = [α⁻¹, α⁻¹β] is the transformation matrix of the state duration probability density distribution;
b. Through the HSMM-based adaptive transformation algorithm, the spectrum, fundamental frequency and duration parameters of the speech data can be normalized and transformed; for adaptation data O of length T, a maximum-likelihood estimate of the transform Λ = (W, X) is obtained;
c. The maximum a posteriori (MAP) algorithm is used to correct and update the adaptive model of the voice. For a given HSMM parameter set λ, let the forward and backward probabilities be α_t(i) and β_t(i) respectively; the generating probability χ_t^d(i) of the continuous observation sequence o_{t−d+1} … o_t under state i is:
χ_t^d(i) = α_{t−d}(i) p_i(d) ∏_{s=t−d+1}^{t} b_i(o_s) β_t(i)
The MAP estimates are described as follows:
μ̂_i = (ω μ̄_i + Σ_t Σ_d χ_t^d(i) Σ_{s=t−d+1}^{t} o_s) / (ω + Σ_t Σ_d d·χ_t^d(i))
m̂_i = (τ m̄_i + Σ_t Σ_d d·χ_t^d(i)) / (τ + Σ_t Σ_d χ_t^d(i))
where μ̄_i and m̄_i are the mean vectors after the linear regression transformation, ω and τ are the MAP estimation parameters of the state output and duration distributions respectively, and μ̂_i and m̂_i are the weighted-average MAP estimates of the adapted mean vectors μ̄_i and m̄_i.
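The CMLLR transformation identity in step a above can be checked numerically. The scalar sketch below (all values hypothetical) verifies that N(o; Aμ − b, AΣAᵀ) equals |A⁻¹| N(Wξ; μ, Σ) when Wξ = A⁻¹(o + b), i.e. one Gaussian evaluation in feature space matches a scaled evaluation in the transformed model space:

```python
import math

def gauss(x, mean, var):
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Scalar CMLLR: hypothetical transform (a, b), model (mu, var), observation o.
a, b = 2.0, 0.5
mu, var = 1.0, 1.0
o = 3.0

lhs = gauss(o, a * mu - b, a * var * a)      # N(o; A*mu - b, A*Sigma*A^T)
w_xi = (o + b) / a                           # W*xi with W = [A^-1, A^-1 b]
rhs = abs(1 / a) * gauss(w_xi, mu, var)      # |A^-1| N(W xi; mu, Sigma)
assert abs(lhs - rhs) < 1e-9                 # the two sides agree
```

The same change-of-variables identity underlies the duration transform p_i(d) with (α, β) in place of (A, b).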
Further, the speech analysis and synthesis method used in synthesizing the song with the HMM-based speech synthesis system in step C is based on the STRAIGHT algorithm.
Further, synthesizing the song with the HMM-based speech synthesis system in step C comprises the following steps:
a. analyzing the input lyrics text with a text analysis tool: the text analysis program converts the given lyrics text into an acoustic label sequence containing context description information; the decision trees obtained by clustering during training predict the context-dependent HMM models related to each pronunciation and its context, which are then concatenated into a sentence HMM model;
b. obtaining the pitch and duration of each note of the lyrics from the MIDI file and the corresponding fundamental frequency and duration from the melody control model, and transforming the spectrum SP, aperiodic index AP and fundamental frequency F0 durations of the syllables with the note durations;
c. generating the spectrum SP, aperiodic index AP, duration and fundamental frequency F0 parameter sequences of the sentence HMM model with the speaker-dependent acoustic model and the STRAIGHT algorithm, synthesizing the voice, and adding the musical accompaniment to realize the song synthesis.
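The lyrics-to-score alignment underlying steps b and c above can be sketched as follows. This is a hedged illustration: the data layout, the one-syllable-per-note assumption and the fixed 120 BPM tempo are all assumptions for the example, not the patent's format.

```python
def midi_to_f0(note):
    """Equal-temperament pitch: MIDI note 69 (A4) = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def align_lyrics_to_score(syllables, score, seconds_per_beat=0.5):
    """Pair each lyric syllable with a score note (midi_pitch, beats)
    and derive the singing F0/duration targets toward which the HMM
    parameters would be warped. Hypothetical layout, one note per
    syllable, default tempo 120 BPM."""
    assert len(syllables) == len(score)
    plan = []
    for syl, (pitch, beats) in zip(syllables, score):
        plan.append({"syllable": syl,
                     "f0_hz": midi_to_f0(pitch),
                     "duration_s": beats * seconds_per_beat})
    return plan

# Three syllables sung on C4, D4, E4 (quarter, quarter, half note):
plan = align_lyrics_to_score(["xiao", "xing", "xing"],
                             [(60, 1.0), (62, 1.0), (64, 2.0)])
```

In the patent's pipeline, each entry would drive the duration transformation of the syllable's SP/AP/F0 trajectories before STRAIGHT synthesis.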
Further, the speech analysis and synthesis method used in synthesizing the song with the HMM-based speech synthesis system in step C is based on the STRAIGHT algorithm, with the following specific steps:
First the speaker's speech signal is input, and the fundamental frequency F0 and the spectral envelope of the speech are extracted with the STRAIGHT algorithm. The acoustic parameters are then modulated to generate a new excitation source and time-varying filter, and the speech is synthesized according to the original filter model.
Here Q denotes the set of sampling (excitation) positions in the synthesized excitation, and G(·) denotes the pitch modulation, which can map the F0 of the original speech arbitrarily to the modulated F0. An all-pass filter controls the fine pitch and the time structure of the original signal — for example, a linear phase shift proportional to frequency — so as to control the fine structure of F0. From the modulated amplitude spectrum A(S(u(w), r(t)), u(w), r(t)), the corresponding Fourier transform V(w, t_i) of the minimum-phase pulse can be computed, where A(·), u(·) and r(·) denote the modulation of the amplitude, frequency and time dimensions respectively, and q denotes frequency.
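The pitch modulation G(·) described above maps the original F0 contour to the target contour while preserving its fine structure (vibrato, microprosody). A minimal stand-in, under the assumption that G is a fixed transposition in log frequency:

```python
def modulate_f0(f0_track, semitones):
    """G(.): shift a frame-level F0 contour by a fixed interval in log
    frequency, leaving the fine structure intact. Frames with f0 == 0
    are treated as unvoiced and passed through. Illustrative stand-in
    for STRAIGHT's pitch modulation, not the actual algorithm."""
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_track]

track = [200.0, 201.5, 0.0, 199.0]        # Hz; 0.0 marks an unvoiced frame
up_one_octave = modulate_f0(track, 12)    # -> [400.0, 403.0, 0.0, 398.0]
```

Because the shift is constant in log-F0, the contour's relative wiggles are unchanged, which is what lets the melody control model impose note pitches without flattening the voice's natural micro-variation.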
An HMM-based song synthesis device, characterized by comprising:
a melody control module for establishing the melody control model of the song;
an HMM-based speaker-dependent acoustic module for establishing the speaker-dependent acoustic model oriented to song synthesis;
an HMM-based song synthesis module for synthesizing the singing voice to be synthesized.
Further, the melody control module includes:
a MIDI analysis unit for analyzing the score information extracted from the MIDI file and obtaining the corresponding music parameter information;
a prosody control unit for establishing the melody control model of the song according to the differences between speech and singing in acoustic features.
Further, the HMM-based speaker-dependent acoustic module includes:
an acoustic model unit for obtaining the acoustic model of the target speaker;
an acoustic parameter subunit for HMM-based parametric speech synthesis.
Further, the HMM-based song synthesis module includes:
a text analysis unit, which performs text analysis on the input lyrics text to obtain context-dependent labels;
an HMM model training subunit for building the HMM model library of the speech data;
a speaker adaptation subunit for normalizing and transforming the characteristic parameters of the speakers during training to obtain the adaptive model;
a speech synthesis unit for synthesizing the singing voice to be synthesized;
a song synthesis unit, which adds the musical accompaniment to the synthesized singing voice to complete the song synthesis.
The present invention has the following advantages and positive effects. The HMM-based song synthesis method and device use TTS (text-to-speech) technology, the HTS system (hidden-Markov-model-based speech synthesis) and the STRAIGHT algorithm; they establish a speaker-dependent HMM acoustic model oriented to song synthesis and a melody control model for songs, and perform speaker-adaptive training, realizing a personalized HMM-based system that converts lyrics into songs in real time. Compared with traditional singing synthesis systems, this system bases its speech analysis and synthesis on the STRAIGHT algorithm, and adds a speaker-adaptive training process in the training stage to obtain the average voice model of mixed speech. This training process reduces the influence of speaker differences in the speech database and thus improves the speech quality of the synthesized song. On the basis of the average voice model, speaker-adaptive transformation using a small amount of the speaker's corpus synthesizes a singing voice with relatively good naturalness and melodiousness. The device enriches the research content of speech synthesis and makes synthesized speech more expressive and emotional; in particular, it offers music lovers an opportunity to learn technical operations such as song production and music processing. It adds to the social resources available to people and has practical value and significance.
Brief description of the drawings
To explain the embodiments of the present invention or the technical schemes of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative work.
Fig. 1 is a system flow block diagram of an HMM-based song synthesis method of a preferred embodiment of the present invention;
Fig. 2 is a MIDI system block diagram of the preferred embodiment of the present invention;
Fig. 3 is a block diagram of the speaker-adaptive speech synthesis system of the preferred embodiment of the present invention;
Fig. 4 is a block diagram of the STRAIGHT analysis-modulation-synthesis system of the preferred embodiment of the present invention;
Fig. 5 is a structural schematic diagram of the device realizing HMM-based song synthesis in the preferred embodiment of the present invention.
Detailed description of the embodiments
The technical scheme of the present invention is described below clearly and completely in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of this invention.
As shown in Fig. 1, a preferred embodiment of the present invention discloses an HMM-based song synthesis method which uses TTS (text-to-speech) technology, the HTS system (hidden-Markov-model-based speech synthesis) and the STRAIGHT algorithm, establishes a speaker-dependent HMM acoustic model oriented to song synthesis and a melody control model for songs, and performs speaker-adaptive training, realizing a personalized HMM-based speech synthesis method that converts lyrics into songs in real time. The method comprises the following steps:
A. Analyzing the differences between speech and singing in acoustic features, and establishing the melody control model of the song.
The specific steps of analyzing the differences between speech and singing in acoustic features in step A are as follows:
a. performing spectrum analysis on the speech signal with time-domain and frequency-domain analysis methods, and performing a comparative fundamental-frequency analysis of the speech signal and the song signal;
b. extracting the required score information from the MIDI system using MIDI technology;
c. reading the melody information of the score extracted from the MIDI file and analyzing the structural features of the score file to obtain the music parameter information, which includes the channel number, note pitch, key velocity, note onset time and note duration.
Fig. 2 shows the MIDI system block diagram.
The melody control model of the song in step A includes a fundamental frequency control model and a duration control model; the fundamental frequency control model converts the discrete notes of the score into a continuous fundamental frequency curve, and the duration control model obtains the pronunciation duration of the sung notes.
B. Establishing the speaker-dependent HMM acoustic model oriented to song synthesis.
As shown in Fig. 3, establishing the speaker-dependent HMM acoustic model oriented to song synthesis in step B includes the following steps:
a. using the speaker's speech corpus, analyzing the speech data to obtain its acoustic parameters, including fundamental frequency F0, duration, spectrum SP and aperiodic index AP; and using the HMM-based speaker-adaptive training technique to train the average voice model of mixed speech;
b. using a small amount of speech data from the target speaker to be synthesized, obtaining the target speaker's adaptive acoustic model through speaker-adaptive transformation, and correcting and updating the adaptive model, so as to synthesize speech with the target speaker's timbre.
As shown in Fig. 3, training the average voice model of mixed speech through HMM-based speaker-adaptive training comprises the following steps:
a. performing speech analysis on the speaker corpus and the target speaker corpus, extracting their acoustic parameters — mel-cepstral coefficients — and computing their first-order and second-order differences;
b. with reference to the context attribute set, performing HMM model training: training the HMM models of the spectrum and fundamental frequency parameters and the multi-space-distribution hidden semi-Markov models (MSD-HSMM) of the state duration parameters;
c. using a small amount of the target speaker's speech, performing speaker-adaptive training to obtain the average voice model of the mixed speech and thereby the context-dependent MSD-HSMM models, including:
1. using the constrained maximum-likelihood linear regression (CMLLR) algorithm to represent the differences between the training speakers' speech data and the average voice with linear regression functions;
2. normalizing the differences between the training speakers with a set of linear regression equations for the state output distributions and the state duration distributions;
3. training the average voice model of the mixed speech, thereby obtaining the context-dependent MSD-HSMM models.
Using a small amount of speech data from the target speaker to be synthesized, obtaining the target speaker's adaptive acoustic model through speaker-adaptation transformation, and modifying and updating the adaptive model, comprises the following steps:
a. After speaker-adaptive training, use the HSMM-based CMLLR adaptation algorithm to compute the state output probability distributions and the mean vectors and covariance matrices of the duration probability distributions for the voice transformation. Under state i, the transformation equations for the feature vector o and the state duration d are:
b_i(o) = N(o; Aμ_i − b, AΣ_iA^T) = |A⁻¹| N(Wξ; μ_i, Σ_i)
p_i(d) = N(d; αm_i − β, ασ_i²α) = |α⁻¹| N(Xψ; m_i, σ_i²)
where ξ = [o^T, 1]^T, ψ = [d, 1]^T, μ_i is the mean of the state output distribution, m_i is the mean of the duration distribution, Σ_i is the diagonal covariance matrix, σ_i² is the variance, W = [A⁻¹, A⁻¹b] is the linear transformation matrix of the target speaker's state output probability density distribution, and X = [α⁻¹, α⁻¹β] is the transformation matrix of the state duration probability density distribution;
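The CMLLR identity above can be checked numerically. The sketch below uses hypothetical Gaussian parameters and takes W = [A⁻¹, A⁻¹b] (one consistent reading of the transform, an assumption on our part); it evaluates both sides of b_i(o) = |A⁻¹| N(Wξ; μ_i, Σ_i) and confirms they agree:

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Multivariate normal density (NumPy only, for illustration)."""
    d = len(mean)
    diff = x - mean
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / \
        np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

rng = np.random.RandomState(1)
d = 3
mu = rng.randn(d)                      # state-output mean mu_i
Sigma = np.diag(rng.rand(d) + 0.5)     # diagonal covariance Sigma_i
A = np.eye(d) + 0.1 * rng.randn(d, d)  # CMLLR rotation/scaling (hypothetical)
b = rng.randn(d)                       # CMLLR bias (hypothetical)
o = rng.randn(d)                       # observed feature vector

# Left side: Gaussian with transformed mean and covariance.
lhs = gauss_pdf(o, A @ mu - b, A @ Sigma @ A.T)

# Right side: |A^-1| * N(W xi; mu, Sigma), i.e. the observation
# mapped back into model space with W = [A^-1, A^-1 b], xi = [o^T, 1]^T.
Ainv = np.linalg.inv(A)
W = np.hstack([Ainv, (Ainv @ b)[:, None]])
xi = np.append(o, 1.0)
rhs = abs(np.linalg.det(Ainv)) * gauss_pdf(W @ xi, mu, Sigma)

print(np.isclose(lhs, rhs))  # True
```

The equality holds by the change-of-variables formula: if y ~ N(μ_i, Σ_i) and o = Ay − b, then o ~ N(Aμ_i − b, AΣ_iA^T).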
b. Through the HSMM-based adaptive transformation algorithm, the spectrum, fundamental-frequency and duration parameters of the speech data can be normalized and transformed; for adaptation data O of length T, a maximum-likelihood estimate of the transforms Λ = (W, X) can be obtained;
c. Correct and update the adaptive model of the voice with the maximum a posteriori (MAP) algorithm. For a given HSMM parameter set λ, let its forward and backward probabilities be α_t(i) and β_t(i); the generation probability κ_t^d(i) of its continuous observation sequence o_{t−d+1}…o_t under state i is then:
κ_t^d(i) = (1/P(O|λ)) Σ_{j=1, j≠i}^N α_{t−d}(j) p_i(d) [Π_{s=t−d+1}^t b_i(o_s)] β_t(i)
The MAP estimation is described as follows: the mean vectors transformed by linear regression serve as the priors, ω and τ are the MAP estimation parameters of the state output and duration distributions respectively, and the adaptive mean vectors are obtained as weighted-average MAP estimates of the transformed means.
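The patent does not reproduce the MAP update formulas themselves (they appeared only as figures in the original). A common form, shown here purely as an illustrative assumption, re-estimates an adapted mean as an occupancy-weighted average of the transformed prior mean and the maximum-likelihood estimate from the adaptation data:

```python
import numpy as np

def map_update(prior_mean, obs, gamma, tau):
    """MAP re-estimation of a Gaussian mean: a weighted average of the
    prior (linear-regression-adapted) mean and the ML estimate from the
    adaptation data, weighted by state occupancy gamma and MAP weight tau."""
    occ = gamma.sum()
    ml_mean = (gamma[:, None] * obs).sum(axis=0) / occ
    return (tau * prior_mean + occ * ml_mean) / (tau + occ)

rng = np.random.RandomState(2)
prior = np.zeros(4)              # adapted prior mean (hypothetical)
obs = rng.randn(50, 4) + 1.0     # target-speaker frames near mean 1
gamma = np.ones(50)              # state occupancies (hypothetical)
for tau in (1.0, 100.0, 10000.0):
    est = map_update(prior, obs, gamma, tau)
    print(tau, np.round(est.mean(), 2))
# Small tau trusts the adaptation data; large tau stays near the prior.
```

The weight τ plays the role of the MAP estimation parameter described above: it interpolates between the linearly-transformed average-voice statistics and the target speaker's own statistics.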
C. Synthesize the song using the HMM-based speech synthesis system.
The speech analysis and synthesis method used for synthesizing the song with the HMM-based speech synthesis system described in step C is based on the STRAIGHT algorithm.
Synthesizing the song with the HMM-based speech synthesis system described in step C comprises the following steps:
a. Analyze the input lyrics text with a text analysis tool: the text analysis program converts the given lyrics text into an acoustic label sequence containing contextual description information; the decision trees clustered during training predict the context-dependent HMM of each phone and its context, and these models are concatenated into a sentence HMM;
b. From the MIDI file, obtain the pitch and duration of each note of the lyrics, and obtain the corresponding fundamental frequency and duration from the melody control model; the spectrum SP, aperiodic index AP and fundamental frequency F0 durations of each syllable are modified using the note durations;
c. Generate the parameter sequences of spectrum SP, aperiodic index AP, duration and fundamental frequency F0 from the sentence HMM using the speaker-dependent acoustic model and the STRAIGHT algorithm, synthesize the voice, and add the musical accompaniment, thereby completing the song synthesis.
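The conversion in step b from a MIDI note to a target fundamental frequency, and from note ticks to a duration, follows standard MIDI conventions; a minimal sketch (file values are hypothetical):

```python
def midi_to_f0(note):
    """Equal-temperament pitch: MIDI note 69 = A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def ticks_to_seconds(ticks, tempo_us_per_beat, ppq):
    """Note duration from MIDI delta ticks, the tempo (microseconds per
    quarter note) and the file's pulses-per-quarter-note resolution."""
    return ticks * tempo_us_per_beat / (ppq * 1_000_000)

# A quarter note C5 at 120 BPM (500000 us/beat) in a 480-PPQ file.
print(round(midi_to_f0(72), 2))            # 523.25
print(ticks_to_seconds(480, 500000, 480))  # 0.5
```

The melody control model then smooths these discrete pitches into a continuous F0 curve and stretches the syllable's SP, AP and F0 trajectories to the note duration.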
As shown in Figure 4, during singing-voice synthesis the STRAIGHT analysis-modification-synthesis framework is used to extract the fundamental-frequency information accurately and to remove the periodic interference from the spectral envelope. The STRAIGHT-based speech analysis and synthesis method used by the HMM-based speech synthesis system in step C proceeds as follows:
First the speaker's speech signal is input, and the fundamental frequency F₀ and the spectral envelope of the speech are extracted with the STRAIGHT algorithm; the acoustic parameters are then modified to generate a new excitation source and time-varying filter, and the speech is synthesized according to the source-filter model as:
y(t) = Σ_{t_i∈Q} (1/G(f₀(t_i))) V_{t_i}(t − T(t_i))
V_{t_i}(t) = (1/2π) ∫_{−∞}^{∞} V(ω, t_i) φ(ω) e^{jωt} dω
T(t_i) = Σ_{t_k∈Q, k<i} 1/G(f₀(t_k))
where Q is the set of sampling positions of the synthesis excitation and G(·) is the pitch modulation, which allows the modulated F₀ to be matched arbitrarily against the F₀ of the original speech; the all-pass filter φ(ω) controls the fine pitch structure and the temporal structure of the original signal, e.g. a linear phase shift proportional to frequency, and is used to control the fine structure of F₀. From the modulated amplitude spectrum A(S(u(ω), r(t)), u(ω), r(t)), the Fourier transform V(ω, t_i) of the corresponding minimum-phase pulse can be computed as follows, where A(·), u(·) and r(·) denote the modulations in the amplitude, frequency and time dimensions respectively:
V(ω, t) = exp((1/2π) ∫_0^∞ h_t(q) e^{jωq} dq)
h_t(q) = 0 (q < 0); c_t(0) (q = 0); 2c_t(q) (q > 0)
c_t(q) = (1/2π) ∫_{−∞}^{∞} e^{−jωq} log A(S(u(ω), r(t)), u(ω), r(t)) dω
where q denotes the quefrency.
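The cepstral folding c_t(q) → h_t(q) → V(ω, t) can be sketched in discrete form. The following is an illustration of the minimum-phase construction on a toy even-symmetric log-amplitude spectrum, not the STRAIGHT implementation itself:

```python
import numpy as np

def minimum_phase_spectrum(log_amp):
    """Minimum-phase spectrum from a full-length, even-symmetric
    log-amplitude spectrum via the one-sided cepstrum window:
    h(0) = c(0), h(q) = 2 c(q) for q > 0, h(q) = 0 for q < 0."""
    n = len(log_amp)
    c = np.fft.ifft(log_amp).real      # real cepstrum c_t(q)
    h = np.zeros(n)
    h[0] = c[0]
    h[1:n // 2] = 2.0 * c[1:n // 2]    # fold negative quefrencies
    h[n // 2] = c[n // 2]
    return np.exp(np.fft.fft(h))       # complex minimum-phase spectrum

# Toy smooth log-amplitude envelope (hypothetical, symmetric in the DFT sense).
n = 64
w = np.arange(n)
log_amp = np.cos(2 * np.pi * w / n)
V = minimum_phase_spectrum(log_amp)
print(np.allclose(np.abs(V), np.exp(log_amp)))  # True: magnitude preserved
```

Zeroing the negative quefrencies and doubling the positive ones yields a pulse whose magnitude spectrum equals the given amplitude spectrum while its phase is the minimum-phase response, which is exactly what the h_t(q) definition above expresses.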
Corresponding to the above method, another preferred embodiment of the invention also discloses an HMM-based song synthesis apparatus. The apparatus is used to establish the HMM-based speaker-dependent acoustic model for song synthesis and the melody control model of the song, to carry out speaker-adaptive training, and, by using the STRAIGHT algorithm in an HTS (HMM-based speech synthesis) system combined with TTS (text-to-speech) technology, to realize real-time personalized conversion from lyrics to song. In implementation, the functions of the apparatus may be realized by software, by hardware, or by a combination of software and hardware.
As shown in Figure 5, the song synthesis apparatus includes a melody control module, an HMM-based speaker-dependent acoustic module, and an HMM-based song synthesis module.
The melody control module is used to establish the melody control model of the song, and includes:
a MIDI analysis unit for analyzing the score information extracted from the MIDI file and obtaining the corresponding music parameter information; and
a prosody control unit for establishing the melody control model of the song according to the differences in acoustic features between speech and song.
Through the MIDI analysis unit, the score information extracted from the MIDI file is analyzed to obtain the corresponding music parameter information; then, in the prosody control unit, the melody control model of the song is established according to the differences in acoustic features between speech and song.
The HMM-based speaker-dependent acoustic module is used to establish the speaker-dependent acoustic model for song synthesis, and includes:
an acoustic model unit for obtaining the acoustic model of the target speaker; and
an acoustic parameter subunit for HMM-based parametric speech synthesis.
The HMM-based song synthesis module is used to synthesize the singing voice to be synthesized, and includes:
a text analysis unit, which performs text analysis on the input lyrics text to obtain context-dependent labels;
an HMM training subunit, which establishes the HMM model library of the speech data: the speaker's acoustic parameters, mainly the fundamental-frequency, spectrum and duration parameters, are extracted from the speech corpus and, combined with the context labels of the corpus, used to train the statistical acoustic models; the fundamental-frequency, spectrum and duration parameters are then determined according to the context attribute set;
a speaker adaptation subunit, which normalizes and transforms the speakers' characteristic parameters during training to obtain the adaptive model: speaker-adaptive training normalizes the differences between each training speaker and the average voice model in the state output and state duration distributions, the constrained maximum-likelihood linear regression algorithm determines the average voice model of the multi-speaker mixed speech, and the adaptation data are then used to compute the mean vectors and covariance matrices of the target speaker's state output and duration probability distributions and to transform them into the target speaker's model, thereby establishing the target speaker's adaptive MSD-HSMM;
a speech synthesis unit, which synthesizes the singing voice to be synthesized: using the corrected adaptive model, the speech parameters of the input lyrics text are predicted, the acoustic parameters are extracted, and the singing voice is synthesized by the vocoder based on the STRAIGHT algorithm; and
a song synthesis unit, which adds the musical accompaniment to the synthesized singing voice to complete the song synthesis.
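The three modules of the apparatus can be pictured as the following structural sketch. All class and method names are hypothetical stubs standing in for the processing described above, not an implementation of the patent:

```python
class MelodyControlModule:
    def control(self, midi_notes):
        # MIDI analysis + prosody control: discrete (note, duration)
        # pairs become (f0 in Hz, duration in seconds) targets.
        return [(440.0 * 2 ** ((n - 69) / 12), d) for n, d in midi_notes]

class SpeakerAcousticModule:
    def adapt(self, average_model, target_frames):
        # CMLLR + MAP adaptation would run here; the stub just
        # returns the average voice model unchanged.
        return average_model

class SongSynthesisModule:
    def synthesize(self, lyrics, melody, model):
        # Text analysis -> sentence HMM -> STRAIGHT vocoding (stubbed):
        # each syllable is tagged with its melody-controlled targets.
        return [f"{syl}@{f0:.1f}Hz/{dur}s" for syl, (f0, dur) in zip(lyrics, melody)]

melody = MelodyControlModule().control([(69, 0.5), (72, 0.5)])
model = SpeakerAcousticModule().adapt("avg-voice", target_frames=None)
song = SongSynthesisModule().synthesize(["la", "la"], melody, model)
print(song)  # ['la@440.0Hz/0.5s', 'la@523.3Hz/0.5s']
```

The data flow mirrors the apparatus: the melody control module feeds pitch/duration targets, the acoustic module supplies the adapted voice model, and the synthesis module combines both with the lyrics.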
The processes described above can be completed by hardware under the control of program instructions; the program may be stored in a readable storage medium and, when executed, performs the corresponding steps of the above method.
The above is a further detailed description of the invention in combination with specific preferred embodiments, and the specific implementation of the invention cannot be considered limited to these descriptions. For those of ordinary skill in the technical field of the invention, several simple deductions or substitutions may be made without departing from the concept of the invention, and all of these shall be regarded as falling within the protection scope of the invention.

Claims (13)

1. An HMM-based song synthesis method, characterized by comprising the following steps:
A. analyzing the differences in acoustic features between speech and song, and establishing a melody control model of the song;
B. establishing an HMM-based speaker-dependent acoustic model for song synthesis;
C. synthesizing the song using an HMM-based speech synthesis system.
2. The HMM-based song synthesis method according to claim 1, characterized in that analyzing the differences in acoustic features between speech and song in step A comprises the following steps:
a. performing spectral analysis of the speech signal by time-domain and frequency-domain analysis, and comparing the fundamental frequencies of the speech signal and the singing signal;
b. extracting the required score information from the MIDI system using MIDI technology;
c. reading the melody information of the score extracted from the MIDI file and analyzing the structure of the score file, thereby obtaining the music parameter information, the music parameter information including the channel number, the note pitch, the key velocity, the note onset time and the note duration.
3. The HMM-based song synthesis method according to claim 2, characterized in that the melody control model of the song in step A includes a fundamental-frequency control model and a duration control model; the fundamental-frequency control model converts the discrete pitches of the score into a continuous fundamental-frequency curve, and the duration control model obtains the pronunciation duration of each sung note.
4. The HMM-based song synthesis method according to claim 1, characterized in that establishing the HMM-based speaker-dependent acoustic model for song synthesis in step B comprises the following steps:
a. using the speakers' voice corpora, analyzing the speech data to obtain its acoustic parameters, including the fundamental frequency F0, duration, spectrum SP and aperiodic index AP, and training an average voice model of the mixed speech by HMM-based speaker-adaptive training;
b. using a small amount of speech data from the target speaker to be synthesized, obtaining the target speaker's adaptive acoustic model through speaker-adaptation transformation, and modifying and updating the adaptive model.
5. The HMM-based song synthesis method according to claim 4, characterized in that training the average voice model of the mixed speech through HMM-based speaker-adaptive training comprises the following steps:
a. performing speech analysis on the corpora of the training speakers and the target speaker, and extracting the acoustic parameters: mel-cepstral coefficients together with their first-order and second-order differences;
b. combined with the context attribute set, carrying out HMM training to obtain the HMM models of the spectrum and fundamental-frequency parameters and of the state durations, i.e. the multi-space-distribution hidden semi-Markov model (MSD-HSMM);
c. using a small voice corpus of the target speaker, performing speaker-adaptive training to obtain the average voice model of the mixed speech and thereby the context-dependent MSD-HSMM.
6. The HMM-based song synthesis method according to claim 4, characterized in that using a small amount of speech data from the target speaker to be synthesized, obtaining the target speaker's adaptive acoustic model through speaker-adaptation transformation, and modifying and updating the adaptive model, comprises the following steps:
a. after speaker-adaptive training, using the HSMM-based CMLLR adaptation algorithm to compute the state output probability distributions and the mean vectors and covariance matrices of the duration probability distributions for the voice transformation, where under state i the transformation equations for the feature vector o and the state duration d are:
b_i(o) = N(o; Aμ_i − b, AΣ_iA^T) = |A⁻¹| N(Wξ; μ_i, Σ_i)
p_i(d) = N(d; αm_i − β, ασ_i²α) = |α⁻¹| N(Xψ; m_i, σ_i²)
where ξ = [o^T, 1]^T, ψ = [d, 1]^T, μ_i is the mean of the state output distribution, m_i is the mean of the duration distribution, Σ_i is the diagonal covariance matrix, σ_i² is the variance, W = [A⁻¹, A⁻¹b] is the linear transformation matrix of the target speaker's state output probability density distribution, and X = [α⁻¹, α⁻¹β] is the transformation matrix of the state duration probability density distribution;
b. through the HSMM-based adaptive transformation algorithm, the spectrum, fundamental-frequency and duration parameters of the speech data can be normalized and transformed; for adaptation data O of length T, a maximum-likelihood estimate of the transforms Λ = (W, X) can be obtained;
c. correcting and updating the adaptive model of the voice with the maximum a posteriori (MAP) algorithm: for a given HSMM parameter set λ, let its forward and backward probabilities be α_t(i) and β_t(i); the generation probability κ_t^d(i) of its continuous observation sequence o_{t−d+1}…o_t under state i is then:
κ_t^d(i) = (1/P(O|λ)) Σ_{j=1, j≠i}^N α_{t−d}(j) p_i(d) [Π_{s=t−d+1}^t b_i(o_s)] β_t(i)
The MAP estimation is described as follows: the mean vectors transformed by linear regression serve as the priors, ω and τ are the MAP estimation parameters of the state output and duration distributions respectively, and the adaptive mean vectors are obtained as weighted-average MAP estimates of the transformed means.
7. The HMM-based song synthesis method according to claim 1, characterized in that the speech analysis and synthesis method used for synthesizing the song with the HMM-based speech synthesis system in step C is based on the STRAIGHT algorithm.
8. The HMM-based song synthesis method according to claim 7, characterized in that synthesizing the song with the HMM-based speech synthesis system in step C comprises the following steps:
a. analyzing the input lyrics text with a text analysis tool: the text analysis program converts the given lyrics text into an acoustic label sequence containing contextual description information, the decision trees clustered during training predict the context-dependent HMM of each phone and its context, and these models are concatenated into a sentence HMM;
b. from the MIDI file, obtaining the pitch and duration of each note of the lyrics, obtaining the corresponding fundamental frequency and duration from the melody control model, and modifying the spectrum SP, aperiodic index AP and fundamental frequency F0 durations of each syllable using the note durations;
c. generating the parameter sequences of spectrum SP, aperiodic index AP, duration and fundamental frequency F0 from the sentence HMM using the speaker-dependent acoustic model and the STRAIGHT algorithm, synthesizing the voice, and adding the musical accompaniment, thereby completing the song synthesis.
9. The HMM-based song synthesis method according to claim 7 or 8, characterized in that the STRAIGHT-based speech analysis and synthesis method used by the HMM-based speech synthesis system in step C comprises the following steps:
the speaker's speech signal is first input, and the fundamental frequency F₀ and the spectral envelope of the speech are extracted with the STRAIGHT algorithm; the acoustic parameters are then modified to generate a new excitation source and time-varying filter, and the speech is synthesized according to the source-filter model as:
y(t) = Σ_{t_i∈Q} (1/G(f₀(t_i))) V_{t_i}(t − T(t_i))
V_{t_i}(t) = (1/2π) ∫_{−∞}^{∞} V(ω, t_i) φ(ω) e^{jωt} dω
T(t_i) = Σ_{t_k∈Q, k<i} 1/G(f₀(t_k))
where Q is the set of sampling positions of the synthesis excitation and G(·) is the pitch modulation, which allows the modulated F₀ to be matched arbitrarily against the F₀ of the original speech; the all-pass filter φ(ω) controls the fine pitch structure and the temporal structure of the original signal, e.g. a linear phase shift proportional to frequency, and is used to control the fine structure of F₀; from the modulated amplitude spectrum A(S(u(ω), r(t)), u(ω), r(t)), the Fourier transform V(ω, t_i) of the corresponding minimum-phase pulse can be computed as follows, where A(·), u(·) and r(·) denote the modulations in the amplitude, frequency and time dimensions respectively:
V(ω, t) = exp((1/2π) ∫_0^∞ h_t(q) e^{jωq} dq)
h_t(q) = 0 (q < 0); c_t(0) (q = 0); 2c_t(q) (q > 0)
c_t(q) = (1/2π) ∫_{−∞}^{∞} e^{−jωq} log A(S(u(ω), r(t)), u(ω), r(t)) dω
where q denotes the quefrency.
10. An HMM-based song synthesis apparatus, characterized by comprising:
a melody control module for establishing the melody control model of the song;
an HMM-based speaker-dependent acoustic module for establishing the speaker-dependent acoustic model for song synthesis; and
an HMM-based song synthesis module for synthesizing the singing voice to be synthesized.
11. The HMM-based song synthesis apparatus according to claim 10, characterized in that the melody control module includes:
a MIDI analysis unit for analyzing the score information extracted from the MIDI file and obtaining the corresponding music parameter information; and
a prosody control unit for establishing the melody control model of the song according to the differences in acoustic features between speech and song.
12. The HMM-based song synthesis apparatus according to claim 10, characterized in that the HMM-based speaker-dependent acoustic module includes:
an acoustic model unit for obtaining the acoustic model of the target speaker; and
an acoustic parameter subunit for HMM-based parametric speech synthesis.
13. The HMM-based song synthesis apparatus according to claim 10, characterized in that the HMM-based song synthesis module includes:
a text analysis unit for performing text analysis on the input lyrics text to obtain context-dependent labels;
an HMM training subunit for establishing the HMM model library of the speech data;
a speaker adaptation subunit for normalizing and transforming the speakers' characteristic parameters during training to obtain the adaptive model;
a speech synthesis unit for synthesizing the singing voice to be synthesized; and
a song synthesis unit for adding the musical accompaniment to the synthesized singing voice to complete the song synthesis.
CN201710160104.2A 2017-03-17 2017-03-17 A kind of song synthetic method and device based on HMM Pending CN106971703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710160104.2A CN106971703A (en) 2017-03-17 2017-03-17 A kind of song synthetic method and device based on HMM


Publications (1)

Publication Number Publication Date
CN106971703A true CN106971703A (en) 2017-07-21

Family

ID=59329007



Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030369A (en) * 2007-03-30 2007-09-05 清华大学 Built-in speech discriminating method based on sub-word hidden Markov model
CN101246685A (en) * 2008-03-17 2008-08-20 清华大学 Pronunciation quality evaluation method of computer auxiliary language learning system
CN101436403A (en) * 2007-11-16 2009-05-20 创新未来科技有限公司 Method and system for recognizing tone
CN101516005A (en) * 2008-02-23 2009-08-26 华为技术有限公司 Speech recognition channel selecting system, method and channel switching device
CN102982803A (en) * 2012-12-11 2013-03-20 华南师范大学 Isolated word speech recognition method based on HRSF and improved DTW algorithm
CN102982799A (en) * 2012-12-20 2013-03-20 中国科学院自动化研究所 Speech recognition optimization decoding method integrating guide probability
CN104217713A (en) * 2014-07-15 2014-12-17 西北师范大学 Tibetan-Chinese speech synthesis method and device
CN105390133A (en) * 2015-10-09 2016-03-09 西北师范大学 Tibetan TTVS system realization method
CN106128450A (en) * 2016-08-31 2016-11-16 西北师范大学 The bilingual method across language voice conversion and system thereof hidden in a kind of Chinese


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
冯欢: "基于HMM的歌词到歌声转换的研究" (Research on HMM-based lyrics-to-singing-voice conversion), CNKI China Master's Theses Full-text Database *
吴义坚 et al.: "基于HMM的可训练中文语音合成" (HMM-based trainable Chinese speech synthesis), Journal of Chinese Information Processing *
张有为 et al.: "人机自然交互" (Natural Human-Machine Interaction), 30 September 2004 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109326280A (en) * 2017-07-31 2019-02-12 科大讯飞股份有限公司 Singing synthesis method and device and electronic equipment
CN109326280B (en) * 2017-07-31 2022-10-04 科大讯飞股份有限公司 Singing synthesis method and device and electronic equipment
CN108831435A (en) * 2018-06-06 2018-11-16 安徽继远软件有限公司 A kind of emotional speech synthesizing method based on susceptible sense speaker adaptation
CN109036370A (en) * 2018-06-06 2018-12-18 安徽继远软件有限公司 A kind of speaker's voice adaptive training method
CN108831437A (en) * 2018-06-15 2018-11-16 百度在线网络技术(北京)有限公司 A kind of song generation method, device, terminal and storage medium
CN110634461A (en) * 2018-06-21 2019-12-31 卡西欧计算机株式会社 Electronic musical instrument, control method for electronic musical instrument, and storage medium
CN110634460A (en) * 2018-06-21 2019-12-31 卡西欧计算机株式会社 Electronic musical instrument, control method for electronic musical instrument, and storage medium
WO2020007148A1 (en) * 2018-07-05 2020-01-09 腾讯科技(深圳)有限公司 Audio synthesizing method, storage medium and computer equipment
CN110189741A (en) * 2018-07-05 2019-08-30 腾讯数码(天津)有限公司 Audio synthetic method, device, storage medium and computer equipment
CN109068439A (en) * 2018-07-30 2018-12-21 上海应用技术大学 A kind of light coloring control method and its control device based on MIDI theme
CN109147757A (en) * 2018-09-11 2019-01-04 广州酷狗计算机科技有限公司 Song synthetic method and device
CN109192218B (en) * 2018-09-13 2021-05-07 广州酷狗计算机科技有限公司 Method and apparatus for audio processing
CN109192218A (en) * 2018-09-13 2019-01-11 广州酷狗计算机科技有限公司 The method and apparatus of audio processing
CN109147809A (en) * 2018-09-20 2019-01-04 广州酷狗计算机科技有限公司 Acoustic signal processing method, device, terminal and storage medium
CN109801608A (en) * 2018-12-18 2019-05-24 武汉西山艺创文化有限公司 A kind of song generation method neural network based and system
WO2020140390A1 (en) * 2019-01-04 2020-07-09 平安科技(深圳)有限公司 Vibrato modeling method, device, computer apparatus and storage medium
CN110164412A (en) * 2019-04-26 2019-08-23 吉林大学珠海学院 A kind of music automatic synthesis method and system based on LSTM
CN110264984A (en) * 2019-05-13 2019-09-20 北京奇艺世纪科技有限公司 Model training method, music generating method, device and electronic equipment
CN110264984B (en) * 2019-05-13 2021-07-06 北京奇艺世纪科技有限公司 Model training method, music generation method and device and electronic equipment
CN110364140A (en) * 2019-06-11 2019-10-22 平安科技(深圳)有限公司 Training method, device, computer equipment and the storage medium of song synthetic model
CN110364140B (en) * 2019-06-11 2024-02-06 平安科技(深圳)有限公司 Singing voice synthesis model training method, singing voice synthesis model training device, computer equipment and storage medium
CN112420004A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Method and device for generating songs, electronic equipment and computer readable storage medium
CN110838286A (en) * 2019-11-19 2020-02-25 腾讯科技(深圳)有限公司 Model training method, language identification method, device and equipment
CN110838286B (en) * 2019-11-19 2024-05-03 腾讯科技(深圳)有限公司 Model training method, language identification method, device and equipment
CN111402843B (en) * 2020-03-23 2021-06-11 北京字节跳动网络技术有限公司 Rap music generation method and device, readable medium and electronic equipment
CN113506554A (en) * 2020-03-23 2021-10-15 卡西欧计算机株式会社 Electronic musical instrument and control method for electronic musical instrument
CN111445892A (en) * 2020-03-23 2020-07-24 北京字节跳动网络技术有限公司 Song generation method and device, readable medium and electronic equipment
CN111402843A (en) * 2020-03-23 2020-07-10 北京字节跳动网络技术有限公司 Rap music generation method and device, readable medium and electronic equipment
CN112037757A (en) * 2020-09-04 2020-12-04 腾讯音乐娱乐科技(深圳)有限公司 Singing voice synthesis method and device and computer readable storage medium
CN112037757B (en) * 2020-09-04 2024-03-15 腾讯音乐娱乐科技(深圳)有限公司 Singing voice synthesizing method, singing voice synthesizing equipment and computer readable storage medium
CN112309410A (en) * 2020-10-30 2021-02-02 北京有竹居网络技术有限公司 Song sound repairing method and device, electronic equipment and storage medium
CN113035163A (en) * 2021-05-11 2021-06-25 杭州网易云音乐科技有限公司 Automatic generation method and device of musical composition, storage medium and electronic equipment
CN113035163B (en) * 2021-05-11 2021-08-10 杭州网易云音乐科技有限公司 Automatic generation method and device of musical composition, storage medium and electronic equipment
CN116001664A (en) * 2022-12-12 2023-04-25 瑞声声学科技(深圳)有限公司 Somatosensory type in-vehicle reminding method, system and related equipment

Similar Documents

Publication Publication Date Title
CN106971703A (en) A kind of song synthetic method and device based on HMM
CN101308652B (en) Synthesizing method of personalized singing voice
CN101399036B (en) Device and method for conversing voice to be rap music
US6804649B2 (en) Expressivity of voice synthesis by emphasizing source signal features
CN101178896B (en) Unit selection voice synthetic method based on acoustics statistical model
Kim et al. Korean singing voice synthesis system based on an LSTM recurrent neural network
Umbert et al. Expression control in singing voice synthesis: Features, approaches, evaluation, and challenges
Hono et al. Recent development of the DNN-based singing voice synthesis system—sinsy
Hono et al. Sinsy: A deep neural network-based singing voice synthesis system
CN103915093A (en) Method and device for realizing voice singing
CN102201234A (en) Speech synthesizing method based on tone automatic tagging and prediction
CN105654942A (en) Speech synthesis method of interrogative sentence and exclamatory sentence based on statistical parameter
Cho et al. A survey on recent deep learning-driven singing voice synthesis systems
Gupta et al. Deep learning approaches in topics of singing information processing
Liu et al. Vibrato learning in multi-singer singing voice synthesis
Bonada et al. Singing voice synthesis combining excitation plus resonance and sinusoidal plus residual models
Wada et al. Sequential generation of singing f0 contours from musical note sequences based on wavenet
Chu et al. MPop600: A Mandarin popular song database with aligned audio, lyrics, and musical scores for singing voice synthesis
Bonada et al. Spectral approach to the modeling of the singing voice
Li et al. A lyrics to singing voice synthesis system with variable timbre
Pitrelli et al. Expressive speech synthesis using American English ToBI: questions and contrastive emphasis
Gu et al. Singing-voice synthesis using demi-syllable unit selection
Nose et al. A style control technique for singing voice synthesis based on multiple-regression HSMM.
Khan et al. Singing Voice Synthesis Using HMM Based TTS and MusicXML
Lee et al. A study of F0 modelling and generation with lyrics and shape characterization for singing voice synthesis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170721