CN110310621A - Singing synthesis method, apparatus, device and computer-readable storage medium - Google Patents


Info

Publication number
CN110310621A
CN110310621A (application number CN201910407538.7A)
Authority
CN
China
Prior art keywords
music score
feature
audio
singing
a cappella
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910407538.7A
Other languages
Chinese (zh)
Inventor
朱清影
程宁
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910407538.7A
Publication of CN110310621A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 Architecture of speech synthesisers
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique


Abstract

The invention discloses a singing synthesis method, apparatus, device and computer-readable storage medium. The singing synthesis method comprises the following steps: obtaining a score to be synthesized in a preset digital format; extracting the score features of the score to be synthesized; feeding the score features into a preset Gaussian mixture hidden Markov model for processing and outputting the corresponding acoustic features; and synthesizing the output acoustic features into a cappella audio corresponding to the score to be synthesized by means of a vocoder. Because the data used to train the Gaussian mixture hidden Markov model is far less than the data required by existing singing synthesis, there is no need to spend a great deal of manpower and time collecting data, which reduces the difficulty of singing synthesis.

Description

Singing synthesis method, apparatus, device and computer-readable storage medium
Technical field
The present invention relates to the field of speech processing technology, and more particularly to a singing synthesis method, apparatus, device and computer-readable storage medium.
Background technique
In recent years, singing synthesis technology has drawn continuous attention from many sectors of society. Its greatest appeal is that it allows a computer to sing a song with any melody, which gives closely related fields such as music production and entertainment an urgent expectation for progress in singing synthesis. One mainstream existing approach to singing synthesis is waveform concatenation, the core of which is to pre-record every pronunciation of a language sung at different pitches and then splice these recordings together according to the lyrics and the musical notation.
However, existing singing synthesis technology faces two major difficulties. First, waveform distortion easily occurs during concatenation, which makes the synthesized voice sound unnatural. Second, waveform concatenation depends on a very large amount of recorded data, which takes a great deal of time and manpower to collect. Together these make singing synthesis relatively difficult.
Summary of the invention
The main purpose of the present invention is to provide a singing synthesis method, apparatus, device and computer-readable storage medium, aiming to solve the technical problem in the prior art that singing synthesis is relatively difficult.
To achieve the above object, the singing synthesis method provided by the invention comprises the following steps:
obtaining a score to be synthesized in a preset digital format;
extracting the score features of the score to be synthesized;
feeding the score features into a preset Gaussian mixture hidden Markov model for processing, and outputting the corresponding acoustic features;
synthesizing the output acoustic features into a cappella audio corresponding to the score to be synthesized by means of a vocoder.
Optionally, before the step of obtaining the score to be synthesized in a preset digital format, the method further comprises:
obtaining a plurality of training samples, each training sample comprising a score containing lyrics and the a cappella audio corresponding to that score;
extracting the score features of the score and the acoustic features of the a cappella audio from each training sample;
training a Gaussian mixture hidden Markov model with the score features as the model input of the training samples and the acoustic features as the model output, to obtain the preset Gaussian mixture hidden Markov model.
Optionally, the preset digital format is the MusicXML format, and extracting the score features of the score to be synthesized comprises:
sequentially extracting the feature tags of the score to be synthesized, based on the timing of the score;
parsing the feature tags to obtain the score features.
Optionally, the feature tags comprise musical feature tags, text feature tags and syllable duration tags;
the musical feature tags cover the following score features: clef, key, time signature, tempo, pitch and note type;
the text feature tags cover the following score features: syllables and the pinyin of the lyrics text;
the syllable duration tags cover the following score feature: the voiced duration of each syllable.
Optionally, the acoustic features include the fundamental frequency, and extracting the acoustic features of the a cappella audio comprises:
sampling the a cappella audio to obtain sampled data, the sampled data comprising low-sample-rate data and high-sample-rate data;
computing the first normalized cross-correlation function values of the low-sample-rate data, and recording the first local maxima of those values within a computation period;
computing the second normalized cross-correlation function values of the high-sample-rate data, and recording the second local maxima of those values within the computation period;
computing the fundamental frequency of the sampled data from the first local maxima and the second local maxima.
The normalized cross-correlation function is computed as:

$$\phi_{i,k}=\frac{\sum_{j=m}^{m+n-1}s_j\,s_{j+k}}{\sqrt{e_m\,e_{m+k}}},\qquad e_j=\sum_{\tau=j}^{j+n-1}s_\tau^{2}$$

where $k\in[0,K-1]$; $m=iz$ with $i\in[0,M-1]$; $z=t/T$; $n=w/T$; $\phi_{i,k}$ is the normalized cross-correlation value; $s_j$, $s_{j+k}$, $s_\tau$ are sample points of the sampled data; $T$ is the sampling period of the audio data; $t$ is the duration of one audio frame; $w$ is the duration of one computation period; $M$ is the number of audio frames in one computation period; $K$ is the number of data points in the sampled data; $e_m$, $e_{m+k}$, $e_j$ are intermediate quantities; $i$ and $k$ are integers; and $K$ and $M$ are positive integers.
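As a concrete illustration, this normalized cross-correlation can be sketched in a few lines of NumPy. This is a minimal sketch in the RAPT style of pitch tracking; the function name, the test frequency and the frame length are illustrative choices, not values taken from the patent:

```python
import numpy as np

def nccf(s, m, k, n):
    """Normalized cross-correlation of the n-sample frame at m, at lag k."""
    num = np.dot(s[m:m + n], s[m + k:m + k + n])            # cross term
    e_m = np.dot(s[m:m + n], s[m:m + n])                    # frame energy e_m
    e_mk = np.dot(s[m + k:m + k + n], s[m + k:m + k + n])   # energy e_{m+k}
    return num / np.sqrt(e_m * e_mk)

# A 100 Hz sine sampled at 8 kHz repeats every 80 samples, so the NCCF
# reaches its peak (close to 1.0) at lag 80; the lag of the peak within a
# computation period is what yields the fundamental frequency.
fs, f0 = 8000, 100
s = np.sin(2 * np.pi * f0 * np.arange(fs) / fs)
print(round(nccf(s, 0, 80, 400), 3))  # → 1.0
```

In a full extractor, a coarse pass over the low-sample-rate data would shortlist candidate lags from the first local maxima, and a second pass over the high-sample-rate data would refine them, as the steps above describe.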
Optionally, the acoustic features include Mel-frequency cepstral coefficients, and extracting the acoustic features of the a cappella audio further comprises:
pre-processing the a cappella audio, the pre-processing comprising pre-emphasis, framing and windowing;
applying a Fourier transform to the pre-processed a cappella audio to obtain the spectrum of the audio, and obtaining the power spectrum of the audio from its spectrum;
processing the power spectrum of the audio with a Mel filter bank to obtain the Mel power spectrum of the audio;
performing cepstral analysis on the Mel power spectrum to obtain the Mel-frequency cepstral coefficients of the audio.
In addition, to achieve the above object, the present invention also provides a singing synthesis apparatus, comprising:
an acquisition module, configured to obtain a score to be synthesized in a preset digital format;
an extraction module, configured to extract the score features of the score to be synthesized;
a processing module, configured to feed the score features into a preset Gaussian mixture hidden Markov model for processing and output the corresponding acoustic features;
a synthesis module, configured to synthesize the output acoustic features into a cappella audio corresponding to the score to be synthesized by means of a vocoder.
Optionally, the acquisition module is further configured to obtain a plurality of training samples, each training sample comprising a score containing lyrics and the a cappella audio corresponding to that score;
the extraction module is further configured to extract the score features of the score and the acoustic features of the a cappella audio from each training sample;
the singing synthesis apparatus further comprises a training module, configured to train a Gaussian mixture hidden Markov model with the score features as the model input of the training samples and the acoustic features as the model output, to obtain the preset Gaussian mixture hidden Markov model; wherein the extraction module further comprises:
a sampling unit, configured to sample the a cappella audio to obtain sampled data, the sampled data comprising low-sample-rate data and high-sample-rate data;
a first computing unit, configured to compute the first normalized cross-correlation function values of the low-sample-rate data and record their first local maxima within a computation period;
a second computing unit, configured to compute the second normalized cross-correlation function values of the high-sample-rate data and record their second local maxima within the computation period;
a third computing unit, configured to compute the fundamental frequency of the sampled data from the first local maxima and the second local maxima.
The normalized cross-correlation function is computed as:

$$\phi_{i,k}=\frac{\sum_{j=m}^{m+n-1}s_j\,s_{j+k}}{\sqrt{e_m\,e_{m+k}}},\qquad e_j=\sum_{\tau=j}^{j+n-1}s_\tau^{2}$$

where $k\in[0,K-1]$; $m=iz$ with $i\in[0,M-1]$; $z=t/T$; $n=w/T$; $\phi_{i,k}$ is the normalized cross-correlation value; $s_j$, $s_{j+k}$, $s_\tau$ are sample points of the sampled data; $T$ is the sampling period of the audio data; $t$ is the duration of one audio frame; $w$ is the duration of one computation period; $M$ is the number of audio frames in one computation period; $K$ is the number of data points in the sampled data; $e_m$, $e_{m+k}$, $e_j$ are intermediate quantities; $i$ and $k$ are integers; and $K$ and $M$ are positive integers.
Optionally, the preset digital format is the MusicXML format, and the extraction module further comprises:
an extraction unit, configured to sequentially extract the feature tags of the score to be synthesized based on the timing of the score;
a parsing unit, configured to parse the feature tags to obtain the score features.
Optionally, the feature tags comprise musical feature tags, text feature tags and syllable duration tags;
the musical feature tags cover the following score features: clef, key, time signature, tempo, pitch and note type;
the text feature tags cover the following score features: syllables and the pinyin of the lyrics text;
the syllable duration tags cover the following score feature: the voiced duration of each syllable.
Optionally, the extraction module further comprises:
a pre-processing unit, configured to pre-process the a cappella audio, the pre-processing comprising pre-emphasis, framing and windowing;
a power spectrum acquisition unit, configured to apply a Fourier transform to the pre-processed a cappella audio to obtain the spectrum of the audio, and to obtain the power spectrum of the audio from its spectrum;
a Mel spectrum acquisition unit, configured to process the power spectrum of the audio with a Mel filter bank to obtain the Mel power spectrum of the audio;
a Mel-frequency cepstral coefficient acquisition unit, configured to perform cepstral analysis on the Mel power spectrum to obtain the Mel-frequency cepstral coefficients of the audio.
In addition, to achieve the above object, the present invention also provides a singing synthesis device, comprising a processor, a memory, and a singing synthesis program stored on the memory and executable by the processor, wherein the singing synthesis program, when executed by the processor, implements the steps of the singing synthesis method described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium having a singing synthesis program stored thereon, wherein the singing synthesis program, when executed by a processor, implements the steps of the singing synthesis method described above.
The present invention obtains a score to be synthesized in a preset digital format, extracts the score features from the score, feeds them into a preset Gaussian mixture hidden Markov model for processing to output the corresponding acoustic features, and synthesizes those acoustic features into the a cappella audio corresponding to the score by means of a vocoder. By using a trained Gaussian mixture hidden Markov model to convert score features into acoustic features and then a vocoder to convert the acoustic features into the corresponding audio, and because the data used to train the model is far less than the data required by existing singing synthesis, there is no need to spend a great deal of manpower and time collecting data, which reduces the difficulty of singing synthesis.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of the singing synthesis device involved in embodiments of the present invention;
Fig. 2 is a flow diagram of one embodiment of the singing synthesis method of the present invention;
Fig. 3 is a flow diagram of another embodiment of the singing synthesis method of the present invention;
Fig. 4 is a detailed flow diagram of one embodiment of step S20 in Fig. 2;
Fig. 5 is a detailed flow diagram of one embodiment of step S120 in Fig. 3;
Fig. 6 is a detailed flow diagram of another embodiment of step S120 in Fig. 3;
Fig. 7 is a functional block diagram of one embodiment of the singing synthesis apparatus of the present invention.
The realization of the object, the functions and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are only intended to explain the present invention, and are not intended to limit it.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the hardware structure of the singing synthesis device involved in embodiments of the present invention. In the embodiments of the present invention, the singing synthesis device may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 is used to realize the connection and communication between these components; the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface); the memory 1005 may be a high-speed RAM memory, or a non-volatile memory such as a disk memory, and may optionally also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the hardware structure shown in Fig. 1 does not constitute a limitation of the device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
Continuing to refer to Fig. 1, the memory 1005 in Fig. 1, as a computer-readable storage medium, may include an operating system, a network communication module and a singing synthesis program.
In Fig. 1, the network communication module is mainly used to connect to a server and communicate data with it, while the processor 1001 can call the singing synthesis program stored in the memory 1005 and execute the singing synthesis method provided by the embodiments of the present invention.
Referring to Fig. 2, Fig. 2 is a flow diagram of one embodiment of the singing synthesis method of the present invention. In this embodiment, the singing synthesis method includes:
Step S10: obtaining a score to be synthesized in a preset digital format;
In this embodiment, the preset digital format may be the MIDI format, an ASCII format, the MusicXML format and so on, which are not enumerated one by one here. The score to be synthesized is the score that needs to be converted into a cappella audio. Scores fall into paper scores and electronic scores. A paper score may be printed or handwritten; it can be converted to an electronic score by scanning and then processed by software to obtain a score in the preset digital format. An electronic score may be downloaded from a website or edited by the user, and can likewise be processed by software to obtain a score in the preset digital format.
Step S20: extracting the score features of the score to be synthesized;
In this embodiment, the score features generally comprise note features and lyric features, where one or more notes in the note features correspond to one lyric in the lyric features. The notes in the note features also carry timing: the notes in a score must be played in a set order to form the intended melody, and because the lyrics correspond to the notes, the lyrics also follow that set timing.
When extracting the score features of the score to be synthesized, the note features and the lyric features can be extracted together according to the timing of the note features. For example, when the score to be synthesized in the preset digital format encapsulates each note feature together with its corresponding lyric feature in a key phrase, the score features can be obtained by parsing the key phrases in the score according to their timing.
Of course, the note features and the lyric features in the score may also be extracted separately. For example, when the note features and their corresponding lyric features are each encapsulated in key phrases that share a common identifier, the note features and lyric features are extracted from the score to be synthesized according to the timing; then, when parsing the key phrases that encapsulate note features and those that encapsulate lyric features, the note features and lyric features whose key phrases share the same identifier are mapped to each other. In this way the score features of the score to be synthesized are obtained.
Step S30: feeding the score features into a preset Gaussian mixture hidden Markov model for processing, and outputting the corresponding acoustic features;
In this embodiment, the Gaussian mixture hidden Markov model (GMM-HMM) is a computational model. Before use, the GMM-HMM has gone through extensive training and can accurately capture the relationship between score features and acoustic features, which enables it to output the acoustic features corresponding to the input score features. For example, when the score features of "I Love You, China" are fed into the GMM-HMM, the model outputs the acoustic features of "I Love You, China" according to those input score features.
It is worth noting that the acoustic features output by the GMM-HMM include the pronunciation of each word in the lyrics, the tone corresponding to each word, the duration of each word, and so on.
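To make this mapping concrete, the sketch below is a toy, single-mixture stand-in for HMM-based parameter generation: each score unit is assumed to map to one model state whose Gaussian emission mean is an acoustic feature vector, and generation simply holds that mean for the state's duration. The state inventory, feature layout and duration values are invented for illustration and are not taken from the patent:

```python
import numpy as np

# Hypothetical per-state model: each score unit (here, a pinyin syllable)
# owns a Gaussian emission mean [f0 in Hz, first cepstral coefficient] and
# a state duration in frames. A real GMM-HMM would use full mixtures and
# maximum-likelihood parameter generation rather than constant state means.
STATES = {
    "wo3": {"mean": np.array([220.0, 1.2]), "frames": 3},
    "ai4": {"mean": np.array([247.0, 0.8]), "frames": 4},
}

def generate(score_units):
    """Emit one acoustic-feature frame per time step for each score unit."""
    frames = [STATES[u]["mean"]
              for u in score_units
              for _ in range((STATES[u]["frames"]))]
    return np.stack(frames)

traj = generate(["wo3", "ai4"])
print(traj.shape)  # → (7, 2)
```

The resulting trajectory is what the vocoder of step S40 would consume, one acoustic-feature vector per frame in score order.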
Step S40: synthesizing the output acoustic features into a cappella audio corresponding to the score to be synthesized by means of a vocoder.
In this embodiment, the acoustic features include the fundamental frequency and the Mel-frequency cepstral coefficients of each lyric. The vocoder synthesizes the fundamental frequency and Mel-frequency cepstral coefficients of each lyric to obtain the pronunciation corresponding to that lyric, and synthesizes the fundamental frequencies and Mel-frequency cepstral coefficients of the lyrics according to the timing of the acoustic features, thereby obtaining the a cappella audio corresponding to the score to be synthesized.
In this embodiment, the score to be synthesized in the preset digital format is obtained, its score features are extracted and fed into the preset GMM-HMM for processing so as to output the corresponding acoustic features, and the acoustic features are synthesized by the vocoder into the a cappella audio corresponding to the score. Because this embodiment uses a trained GMM-HMM to convert score features into acoustic features and then a vocoder to convert the acoustic features into the corresponding audio, and because the data used to train the GMM-HMM is far less than the data required by existing singing synthesis, there is no need to spend a great deal of manpower and time collecting data, which reduces the difficulty of singing synthesis.
Referring to Fig. 3, Fig. 3 is a flow diagram of another embodiment of the singing synthesis method of the present invention. In this embodiment, before step S10 the method further includes:
Step S110: obtaining a plurality of training samples, each training sample comprising a score containing lyrics and the a cappella audio corresponding to that score;
In this embodiment, each training sample includes a score containing lyrics and the a cappella audio corresponding to that score. The training samples are produced by one user: the user sings the different lyric-bearing scores to obtain the a cappella audio corresponding to each score, so that each lyric-bearing score and its a cappella audio form one training sample.
Training samples can be produced in many ways. For example, when the lyric-bearing score is a paper score, it can be converted to an electronic score by scanning and then processed by software into a score in the preset digital format, while the corresponding a cappella audio can be captured in real time as the user sings the score into a microphone. As another example, when the lyric-bearing score is an electronic score, it can be processed directly by software into a score in the preset digital format, and the corresponding a cappella audio may be a previously recorded a cappella recording that can be called up directly.
Step S120: extracting the score features of the score and the acoustic features of the a cappella audio from each training sample;
In this embodiment, a training sample includes a lyric-bearing score in the preset digital format and its corresponding a cappella audio. The score features comprise note features and lyric features, where one or more notes in the note features correspond to one lyric in the lyric features. The notes also carry timing: the notes in a score must be played in a set order to form the intended melody, and since the lyrics correspond to the notes, they also follow that set timing.
When extracting the score features of each training sample, the note features and lyric features can be extracted together according to the timing of the note features. For example, when each note feature is encapsulated with its corresponding lyric feature in a key phrase in the training sample, the score features are obtained by parsing the key phrases according to their timing.
Of course, the note features and lyric features in a training sample may also be extracted separately and then recombined. For example, when the note features and their corresponding lyric features are each encapsulated in key phrases that share a common identifier, the note features and lyric features are extracted according to the timing; then, when parsing the key phrases that encapsulate note features and those that encapsulate lyric features, the note features and lyric features whose key phrases share the same identifier are mapped to each other, thereby obtaining the score features of the training sample.
In the a cappella audio, each word is pronounced differently, and the pronunciation of each word is determined by its fundamental frequency and Mel-frequency cepstral coefficients. That is, the acoustic features of the a cappella audio consist of the fundamental frequency and Mel-frequency cepstral coefficients of each lyric in the score. Therefore, when extracting the acoustic features from the a cappella audio of a training sample, the fundamental frequency and Mel-frequency cepstral coefficients of each word are extracted from the audio according to the timing.
The fundamental frequency of the a cappella audio can be extracted with the average magnitude difference function, the cepstrum method, the wavelet transform and other methods, which are not enumerated here. To extract the Mel-frequency cepstral coefficients, the a cappella audio is first framed and windowed; a Fourier transform is then applied to the pre-processed audio to obtain its spectrum; the power spectrum is obtained from the spectrum; the power spectrum is processed with a Mel filter bank to obtain the Mel power spectrum; and cepstral analysis is performed on the Mel power spectrum to obtain the Mel-frequency cepstral coefficients of the audio.
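The MFCC pipeline just described (pre-emphasis, framing, windowing, FFT power spectrum, Mel filter bank, log and cepstral analysis via a DCT) can be sketched with NumPy alone. The frame sizes, filter count and coefficient count below are common defaults chosen for illustration, not values stated in the patent:

```python
import numpy as np

def mfcc(sig, fs=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: pre-emphasis, 25 ms frames with 10 ms hop,
    Hamming window, power spectrum, Mel filter bank, log, DCT-II."""
    sig = np.append(sig[0], sig[1:] - 0.97 * sig[:-1])       # pre-emphasis
    flen, hop = int(0.025 * fs), int(0.010 * fs)
    n_frames = 1 + (len(sig) - flen) // hop
    frames = np.stack([sig[i * hop:i * hop + flen] for i in range(n_frames)])
    frames = frames * np.hamming(flen)                       # windowing
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft  # power spectrum
    # triangular Mel filter bank between 0 Hz and fs/2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    hz = 700.0 * (10.0 ** (np.linspace(0, mel(fs / 2), n_mels + 2) / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    logmel = np.log(power @ fb.T + 1e-10)                    # Mel power spectrum
    # cepstral analysis: DCT-II of the log Mel spectrum, keep n_ceps coeffs
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), k + 0.5) / n_mels)
    return logmel @ dct.T

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)    # 1 s of A4
print(mfcc(tone).shape)  # → (98, 13)
```

Each row of the result is the Mel-frequency cepstral coefficient vector of one frame, which together with the per-frame fundamental frequency forms the acoustic feature sequence used for training.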
Step S130: training a Gaussian mixture hidden Markov model with the score features as the model input of the training samples and the acoustic features as the model output, to obtain the preset Gaussian mixture hidden Markov model.
In this embodiment, the input and output of the GMM-HMM are constrained during training precisely so that the model can discover the correspondence between its input and output, that is, the correspondence between each score feature and the acoustic features of that score feature. As a result, once training is complete, the GMM-HMM can output the acoustic features corresponding to any input score features.
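As a toy illustration of this training step, the sketch below fits each state's emission distribution from aligned (score unit, acoustic frame) pairs. This is the degenerate case of GMM-HMM training with a single mixture component and a known alignment, so maximum-likelihood fitting reduces to per-state means and variances and no EM iterations are needed; the unit names and feature values are invented:

```python
import numpy as np

# aligned training data: (score unit, acoustic frame [f0 in Hz, cepstral c1])
pairs = [
    ("do", [200.0, 1.0]), ("do", [204.0, 1.2]),
    ("re", [224.0, 0.9]), ("re", [228.0, 1.1]),
]

# with the alignment known and one Gaussian per state, maximum-likelihood
# "training" is just the per-state sample mean and variance of the frames
model = {}
for unit in sorted({u for u, _ in pairs}):
    X = np.array([f for u, f in pairs if u == unit])
    model[unit] = {"mean": X.mean(axis=0), "var": X.var(axis=0)}

print(model["do"]["mean"][0])  # → 202.0
```

Real GMM-HMM training would instead learn the alignment, the transition probabilities and the mixture weights jointly with Baum-Welch, but the input-output correspondence being learned is the same as in this reduced case.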
It is the refinement flow diagram of mono- embodiment of step S20 in Fig. 2 referring to Fig. 4, Fig. 4, it is in the present embodiment, described pre- If number format is MusicXML format;The step S20 includes:
Step S21: based on the timing of the music score to be synthesized, sequentially extract the feature tags of the music score to be synthesized;
In this embodiment, a music score in the MusicXML format is usually produced with software such as MuseScore or Overture, and a music score in another format can also be converted into the MusicXML format by such software. Unlike a staff or numbered musical notation, a MusicXML music score is a text generated according to the timing of the music score; the text contains multiple feature tags, and each feature tag marks one sub-feature of the music score features.
When extracting the feature tags from a MusicXML music score, the feature tags are extracted from the MusicXML text according to the timing of the music score to be synthesized, i.e., from the beginning of the text to its end. In this way, the redundant information in the MusicXML text that carries no feature tag is filtered out automatically.
Step S22: parse the feature tags to obtain the music score features.
In this embodiment, there are many types of feature tags: a feature tag may be a keyword pair, a key-symbol pair, and so on, and may have one level or multiple levels. In this application, a feature tag is a keyword pair. Each note in the music score to be synthesized, together with all information related to that note, is marked by a pair of keywords, and each sub-feature within the note and its related information is in turn marked by a different keyword pair. For example, <note> and </note> mark one note and all information related to it, <step> and </step> mark the scale step of the note, <pitch> and </pitch> mark the pitch of the note, and <lyric> and </lyric> mark the lyric corresponding to the note.
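As a rough illustration only (not the patent's claimed implementation), the keyword-pair tags described above can be parsed with a standard XML parser. The tag names <note>, <pitch>, <step>, and <lyric> follow the MusicXML convention; the score fragment itself is invented for the example:

```python
import xml.etree.ElementTree as ET

# A minimal hand-written MusicXML-style fragment (invented for illustration).
SCORE = """<measure>
  <note>
    <pitch><step>C</step><octave>4</octave></pitch>
    <duration>4</duration>
    <lyric><text>la</text></lyric>
  </note>
  <note>
    <pitch><step>E</step><octave>4</octave></pitch>
    <duration>2</duration>
    <lyric><text>li</text></lyric>
  </note>
</measure>"""

def extract_features(xml_text):
    """Walk the <note> tags in document (timing) order and pull out
    the sub-features marked by the target keyword pairs."""
    root = ET.fromstring(xml_text)
    features = []
    for note in root.iter("note"):           # each <note>...</note> pair
        features.append({
            "step": note.findtext("pitch/step"),      # scale step
            "octave": note.findtext("pitch/octave"),  # octave of the pitch
            "duration": note.findtext("duration"),    # note duration value
            "lyric": note.findtext("lyric/text"),     # lyric sung on this note
        })
    return features

print(extract_features(SCORE))
```

Because the parser walks the text from beginning to end, non-target tags are simply skipped, which matches the automatic filtering of redundant information described above.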
When parsing the feature tags, a feature tag is first identified. If the feature tag is a target feature tag, such as <note> and </note>, <step> and </step>, or <pitch> and </pitch>, the feature tag is parsed to obtain the music score feature it marks; if the feature tag is a non-target feature tag, it is not parsed.
It should be noted that a music score contains a large amount of information. To facilitate marking the features in the music score, in this application the feature tags are divided into three categories: musical feature tags, text feature tags, and syllable duration tags. The musical feature tags include the following music score features: clef, key, time signature, tempo, pitch, and note type. The text feature tags include the following music score features: syllable and the pinyin of the lyric text. The syllable duration tags include the following music score feature: syllable duration.
Referring to Fig. 5, Fig. 5 is a refined flow diagram of an embodiment of step S120 in Fig. 3. Based on the above embodiment, in this embodiment, the step S120 includes:
Step S121: sample the a cappella audio to obtain sampled data, the sampled data including low-sample-rate data and high-sample-rate data;
Step S122: calculate first normalized cross-correlation function values of the low-sample-rate data, and record the first local maximum of the first normalized cross-correlation function values within a calculation period;
Step S123: calculate second normalized cross-correlation function values of the high-sample-rate data, and record the second local maximum of the second normalized cross-correlation function values within the calculation period;
Step S124: calculate the fundamental frequency of the sampled data according to the first local maximum and the second local maximum;
Wherein, the formula for computing the normalized cross-correlation function value is as follows:
φ(i, k) = Σ_{j=m}^{m+n−1} s_j·s_{j+k} / √(e_m·e_{m+k}), with e_j = Σ_{τ=j}^{j+n−1} s_τ²;
k ∈ [0, K−1]; m = iz, i ∈ [0, M−1]; z = t/T; n = w/T;
where φ(i, k) is the normalized cross-correlation function value; s_j, s_{j+k}, and s_τ are sample points of the sampled data; T is the sampling interval of the audio data; t is the duration of one audio frame; w is the duration of one calculation period; M is the number of audio frames in one calculation period; K is the number of data points of the sampled data; m, z, τ, e_m, e_{m+k}, and e_j are intermediate parameters; i and k are integers; and K and M are positive integers.
Referring to Fig. 6, Fig. 6 is a refined flow diagram of another embodiment of step S120 in Fig. 2. Based on the above embodiment, in this embodiment, the step S120 includes:
Step S121 ': pre-processing the audio of singing opera arias, and the pretreatment includes preemphasis processing, sub-frame processing And windowing process;
In this embodiment, the pre-emphasis formula is S'_n = S_n − a·S_{n−1}, where S_n is the signal amplitude in the time domain, S_{n−1} is the signal amplitude at the previous instant corresponding to S_n, S'_n is the pre-emphasized signal amplitude in the time domain, and a is the pre-emphasis coefficient with 0.9 < a < 1.0. Pre-emphasis is a signal-processing technique that compensates the high-frequency components of the input signal at the transmitting end. As the signal rate increases, the signal suffers great loss during transmission; to obtain a reasonably good signal waveform at the receiving end, the damaged signal must be compensated. The idea of pre-emphasis is to boost the high-frequency components of the signal at the beginning of the transmission line, so as to compensate their excessive attenuation during transmission. Pre-emphasis has no effect on the noise and can therefore effectively improve the output signal-to-noise ratio. By applying pre-emphasis to the a cappella audio, the server can eliminate the interference caused by the vocal cords and lips during singing, effectively compensate the suppressed high-frequency part of the a cappella audio, highlight the formants in its high-frequency band, and strengthen its signal amplitude, which facilitates feature extraction.
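The pre-emphasis formula above translates directly into code; the coefficient 0.97 used here is one common choice inside the stated 0.9–1.0 range, not a value taken from the patent:

```python
def pre_emphasis(signal, a=0.97):
    """Apply S'_n = S_n - a * S_{n-1}.
    The first sample is kept as-is since it has no predecessor."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - a * signal[n - 1]
                          for n in range(1, len(signal))]

# A flat (purely low-frequency) input is strongly attenuated,
# which is exactly the high-pass effect described above.
print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))
```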
Framing is then applied to the pre-emphasized a cappella audio. Framing is a speech-processing technique that cuts the whole speech signal into several segments; each frame is 10–30 ms long, and generally 1/2 of the frame length is used as the frame shift. The frame shift is the overlap region between two adjacent frames, which avoids excessive variation between them. Framing divides the a cappella audio into several segments of speech data, subdividing the audio and facilitating the extraction of its features.
Windowing is then applied to the framed a cappella audio. After framing, discontinuities appear at the beginning and the end of each frame, so the more frames there are, the larger the error with respect to the original signal. Windowing solves this problem: it makes the framed training speech data continuous again and makes each frame exhibit the characteristics of a periodic function. Windowing means processing the training speech data with a window function; the Hamming window may be chosen, in which case the windowing formula is S'_n = S_n·(0.54 − 0.46·cos(2πn/(N−1))), 0 ≤ n ≤ N−1, where N is the Hamming window length, n is the time index, S_n is the signal amplitude in the time domain, and S'_n is the windowed signal amplitude in the time domain. By windowing the a cappella audio, the server makes the framed time-domain signal continuous, which facilitates the extraction of the features of the a cappella audio.
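The framing and windowing steps above can be sketched as follows. The frame sizes are illustrative: the 10–30 ms range corresponds to 80–240 samples at an assumed 8 kHz sample rate:

```python
import math

def frame_signal(signal, frame_len, frame_shift):
    """Cut the signal into overlapping frames; frame_shift = frame_len // 2
    gives the half-frame overlap described above."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_shift)]

def hamming(frame):
    """Apply w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) to one frame."""
    N = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
            for n, s in enumerate(frame)]

signal = [1.0] * 400                       # 50 ms of samples at 8 kHz
frames = frame_signal(signal, frame_len=160, frame_shift=80)  # 20 ms frames
windowed = [hamming(f) for f in frames]
# The window tapers the frame edges toward ~0.08 and peaks near 1.0
# in the middle, smoothing the frame-boundary discontinuities.
print(len(frames), windowed[0][0], max(windowed[0]))
```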
Step S122 ': Fourier transformation, the frequency for audio of singing opera arias described in acquisition are made to the pretreated audio of singing opera arias Spectrum, and according to the frequency spectrum of the audio of singing opera arias obtain described in sing opera arias the power spectrum of audio;
The fast Fourier transform (FFT) is the general name for the efficient, fast algorithms that compute the discrete Fourier transform on a computer. Using such an algorithm greatly reduces the number of multiplications required to compute a discrete Fourier transform; the more sampling points are transformed, the more significant the savings of the FFT algorithm.
In this embodiment, applying the fast Fourier transform to the pre-processed a cappella audio specifically includes the following process. First, the pre-processed a cappella audio is computed with the spectrum formula to obtain the frequency spectrum of the a cappella audio; the spectrum formula is s(k) = Σ_{n=1}^{N} s(n)·e^{−2πi·kn/N}, 1 ≤ k ≤ N, where N is the frame size, s(k) is the signal amplitude in the frequency domain, s(n) is the signal amplitude in the time domain, n is the time index, and i is the imaginary unit. Then, the obtained frequency spectrum is computed with the power-spectrum formula to obtain the power spectrum of the a cappella audio; the power-spectrum formula is P(k) = |s(k)|²/N, 1 ≤ k ≤ N, where N is the frame size and s(k) is the signal amplitude in the frequency domain. Converting the a cappella audio from its time-domain signal amplitude to its frequency-domain signal amplitude, and then obtaining the power spectrum from the frequency-domain signal amplitude, provides an important technical premise for extracting the features of the a cappella audio from its power spectrum.
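The spectrum and power-spectrum formulas above can be sketched with a plain discrete Fourier transform (an FFT library would be used in practice; the 0-based indexing here is the usual programming convention rather than the 1-based indexing of the formulas):

```python
import cmath
import math

def power_spectrum(frame):
    """P(k) = |s(k)|^2 / N, with s(k) = sum_n s(n) * exp(-2*pi*i*k*n/N)."""
    N = len(frame)
    spectrum = [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N))
                for k in range(N)]
    return [abs(s) ** 2 / N for s in spectrum]

# A pure cosine with 2 cycles per frame puts all its energy
# into frequency bin 2 and its mirror bin N-2.
N = 16
frame = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]
p = power_spectrum(frame)
print(max(range(N), key=lambda k: p[k]))   # prints 2
```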
Step S123 ': using the power spectrum for audio of singing opera arias described in the readable filter group processing of Meier, sound of singing opera arias described in acquisition The Meier power spectrum of frequency;
Processing the power spectrum of the a cappella audio with a Mel-scale filter bank performs a Mel-frequency analysis of the power spectrum, and Mel-frequency analysis is based on human auditory perception. Observation shows that the human ear acts like a filter bank and attends only to certain specific frequency components (the human sense of hearing is frequency-selective): the ear lets signals of certain frequencies pass and simply ignores the frequency signals it does not perceive. Specifically, the Mel-scale filter bank contains multiple filters that are not uniformly distributed along the frequency axis: there are many, densely distributed filters in the low-frequency region, while in the high-frequency region the filters become fewer and sparsely distributed. In other words, the Mel-scale filter bank has high resolution in the low-frequency part, consistent with the auditory characteristics of the human ear; this is also the physical meaning of the Mel scale. The frequency-domain signal is segmented by the Mel-scale filter bank so that each frequency band yields one energy value; if the number of filters is 22, the resulting Mel power spectrum of the training speech data consists of 22 energy values. Through the Mel-frequency analysis of the power spectrum, the obtained Mel power spectrum retains the frequency portions closely related to the characteristics of the human ear, and these frequency portions reflect the features of the a cappella audio well.
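A triangular Mel-scale filter bank of the kind described above can be sketched as follows. The 22-filter count echoes the example in the text, while the sample rate and FFT size are invented for the illustration:

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the Mel scale: dense at
    low frequencies, sparse at high frequencies."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [lo + i * (hi - lo) / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):         # rising edge of the triangle
            filt[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):        # falling edge of the triangle
            filt[k] = (right - k) / max(right - center, 1)
        bank.append(filt)
    return bank

def apply_filterbank(power, bank):
    """One energy value per filter -> the Mel power spectrum."""
    return [sum(p * f for p, f in zip(power, filt)) for filt in bank]

bank = mel_filterbank(n_filters=22, n_fft=512, sample_rate=16000)
mel_power = apply_filterbank([1.0] * 257, bank)
print(len(mel_power))   # 22 energy values, as in the example above
```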
Step S124 ': carrying out cepstral analysis on the Meier power spectrum, and the mel-frequency for audio of singing opera arias described in acquisition falls Spectral coefficient.
The cepstrum is the inverse Fourier transform of the logarithm of the Fourier transform spectrum of a signal; since the general Fourier spectrum is a complex spectrum, the cepstrum is also called the complex cepstrum. Through cepstral analysis on the Mel power spectrum, the features contained in the Mel power spectrum of the a cappella audio, which are of too high a dimensionality to be used directly, are converted into a cappella audio features that can be used directly in model training, namely the Mel-frequency cepstrum coefficients.
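The cepstral-analysis step can be sketched as a logarithm followed by a discrete cosine transform of the Mel power spectrum; the DCT-II form shown is the one commonly used for MFCCs, and the 22 input energies and 13 kept coefficients are illustrative choices, not values from the patent:

```python
import math

def mfcc_from_mel_power(mel_power, n_coeffs=13):
    """Log-compress the Mel energies, then apply a DCT-II to decorrelate
    them; the first n_coeffs coefficients are the MFCCs."""
    log_e = [math.log(e + 1e-12) for e in mel_power]   # avoid log(0)
    M = len(log_e)
    return [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / M)
                for m in range(M))
            for c in range(n_coeffs)]

mel_power = [1.0 + 0.5 * math.cos(2 * math.pi * m / 22) for m in range(22)]
coeffs = mfcc_from_mel_power(mel_power)
print(len(coeffs))   # 13
```

The log compresses the wide dynamic range of the energies, and the DCT concentrates the information in a few low-order coefficients, which is why the feature dimensionality drops from the filter count to n_coeffs.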
In addition, the present invention also provides a singing synthesis apparatus.
Referring to Fig. 7, Fig. 7 is a functional block diagram of an embodiment of the singing synthesis apparatus of the present invention. In this embodiment, the singing synthesis apparatus includes:
an acquisition module 100, configured to acquire a music score to be synthesized in a preset digital format;
an extraction module 200, configured to extract the music score features of the music score to be synthesized;
a processing module 300, configured to input the music score features into a preset Gaussian mixture hidden Markov model for processing, and output corresponding acoustic features;
a synthesis module 400, configured to synthesize the output acoustic features, by means of a vocoder, into a cappella audio corresponding to the music score to be synthesized.
Further, the acquisition module 100 is also configured to acquire multiple training samples, each training sample including a music score containing lyrics and the a cappella audio corresponding to the music score;
the extraction module 200 is also configured to extract the music score features of the music score from each training sample and to extract the acoustic features of the a cappella audio;
the singing synthesis apparatus further includes a training module 500, configured to train a Gaussian mixture hidden Markov model, taking the music score features as the model input of the training samples and the acoustic features as the model output of the training samples, so as to obtain the preset Gaussian mixture hidden Markov model; wherein the extraction module 200 further includes:
a sampling unit 110, configured to sample the a cappella audio to obtain sampled data, the sampled data including low-sample-rate data and high-sample-rate data;
a first calculation unit 120, configured to calculate first normalized cross-correlation function values of the low-sample-rate data, and record the first local maximum of the first normalized cross-correlation function values within a calculation period;
a second calculation unit 130, configured to calculate second normalized cross-correlation function values of the high-sample-rate data, and record the second local maximum of the second normalized cross-correlation function values within the calculation period;
a third calculation unit 140, configured to calculate the fundamental frequency of the sampled data according to the first local maximum and the second local maximum;
Wherein, the formula for computing the normalized cross-correlation function value is as follows:
φ(i, k) = Σ_{j=m}^{m+n−1} s_j·s_{j+k} / √(e_m·e_{m+k}), with e_j = Σ_{τ=j}^{j+n−1} s_τ²;
k ∈ [0, K−1]; m = iz, i ∈ [0, M−1]; z = t/T; n = w/T;
where φ(i, k) is the normalized cross-correlation function value; s_j, s_{j+k}, and s_τ are sample points of the sampled data; T is the sampling interval of the audio data; t is the duration of one audio frame; w is the duration of one calculation period; M is the number of audio frames in one calculation period; K is the number of data points of the sampled data; m, z, τ, e_m, e_{m+k}, and e_j are intermediate parameters; i and k are integers; and K and M are positive integers.
Optionally, the preset digital format is the MusicXML format; the extraction module 200 further includes:
an extraction unit 210, configured to sequentially extract the feature tags of the music score to be synthesized based on the timing of the music score to be synthesized;
a parsing unit 220, configured to parse the feature tags to obtain the music score features.
Optionally, the feature tags include musical feature tags, text feature tags, and syllable duration tags;
the musical feature tags include the following music score features: clef, key, time signature, tempo, pitch, and note type;
the text feature tags include the following music score features: syllable and the pinyin of the lyric text;
the syllable duration tags include the following music score feature: syllable duration.
Optionally, the extraction module 200 further includes:
a pre-processing unit 230, configured to pre-process the a cappella audio, the pre-processing including pre-emphasis, framing, and windowing;
a power-spectrum acquisition unit 240, configured to apply a Fourier transform to the pre-processed a cappella audio to obtain the frequency spectrum of the a cappella audio, and obtain the power spectrum of the a cappella audio from its frequency spectrum;
a Mel-power-spectrum acquisition unit 250, configured to process the power spectrum of the a cappella audio with a Mel filter bank to obtain the Mel power spectrum of the a cappella audio;
a Mel-frequency-cepstrum-coefficient acquisition unit 260, configured to perform cepstral analysis on the Mel power spectrum to obtain the Mel-frequency cepstrum coefficients of the a cappella audio.
In addition, an embodiment of the present invention also provides a computer-readable storage medium.
The computer-readable storage medium of the present invention stores a singing synthesis program which, when executed by a processor, implements the steps of the singing synthesis method described above.
The method implemented when the singing synthesis program is executed may refer to the embodiments of the singing synthesis method of the present invention, and is not repeated here.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Under the inspiration of the present invention, those skilled in the art can make many other forms without departing from the scope protected by the purpose of the present invention and the claims; any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, applied directly or indirectly in other related technical fields, falls within the protection of the present invention.

Claims (10)

1. A singing synthesis method, characterized in that the singing synthesis method comprises the following steps:
acquiring a music score to be synthesized in a preset digital format;
extracting music score features of the music score to be synthesized;
inputting the music score features into a preset Gaussian mixture hidden Markov model for processing, and outputting corresponding acoustic features;
synthesizing the output acoustic features, by means of a vocoder, into a cappella audio corresponding to the music score to be synthesized.
2. The singing synthesis method according to claim 1, characterized in that, before the step of acquiring the music score to be synthesized in the preset digital format, the method further comprises:
acquiring multiple training samples, each training sample comprising a music score containing lyrics and the a cappella audio corresponding to the music score;
extracting the music score features of the music score from each training sample and extracting the acoustic features of the a cappella audio;
training a Gaussian mixture hidden Markov model, taking the music score features as the model input of the training samples and the acoustic features as the model output of the training samples, so as to obtain the preset Gaussian mixture hidden Markov model.
3. The singing synthesis method according to claim 1, characterized in that the preset digital format is the MusicXML format; the extracting of the music score features of the music score to be synthesized comprises:
based on the timing of the music score to be synthesized, sequentially extracting the feature tags of the music score to be synthesized;
parsing the feature tags to obtain the music score features.
4. The singing synthesis method according to claim 3, characterized in that the feature tags comprise musical feature tags, text feature tags, and syllable duration tags;
the musical feature tags comprise the following music score features: clef, key, time signature, tempo, pitch, and note type;
the text feature tags comprise the following music score features: syllable and the pinyin of the lyric text;
the syllable duration tags comprise the following music score feature: syllable duration.
5. The singing synthesis method according to claim 2, characterized in that the acoustic features comprise a fundamental frequency;
the extracting of the acoustic features of the a cappella audio comprises:
sampling the a cappella audio to obtain sampled data, the sampled data comprising low-sample-rate data and high-sample-rate data;
calculating first normalized cross-correlation function values of the low-sample-rate data, and recording the first local maximum of the first normalized cross-correlation function values within a calculation period;
calculating second normalized cross-correlation function values of the high-sample-rate data, and recording the second local maximum of the second normalized cross-correlation function values within the calculation period;
calculating the fundamental frequency of the sampled data according to the first local maximum and the second local maximum;
wherein the formula for computing the normalized cross-correlation function value is as follows:
φ(i, k) = Σ_{j=m}^{m+n−1} s_j·s_{j+k} / √(e_m·e_{m+k}), with e_j = Σ_{τ=j}^{j+n−1} s_τ²;
k ∈ [0, K−1]; m = iz, i ∈ [0, M−1]; z = t/T; n = w/T;
where φ(i, k) is the normalized cross-correlation function value; s_j, s_{j+k}, and s_τ are sample points of the sampled data; T is the sampling interval of the audio data; t is the duration of one audio frame; w is the duration of one calculation period; M is the number of audio frames in one calculation period; K is the number of data points of the sampled data; m, z, τ, e_m, e_{m+k}, and e_j are intermediate parameters; i and k are integers; and K and M are positive integers.
6. The singing synthesis method according to claim 2, characterized in that the acoustic features comprise Mel-frequency cepstrum coefficients; the extracting of the acoustic features of the a cappella audio further comprises:
pre-processing the a cappella audio, the pre-processing comprising pre-emphasis, framing, and windowing;
applying a Fourier transform to the pre-processed a cappella audio to obtain the frequency spectrum of the a cappella audio, and obtaining the power spectrum of the a cappella audio from its frequency spectrum;
processing the power spectrum of the a cappella audio with a Mel filter bank to obtain the Mel power spectrum of the a cappella audio;
performing cepstral analysis on the Mel power spectrum to obtain the Mel-frequency cepstrum coefficients of the a cappella audio.
7. A singing synthesis apparatus, characterized in that the singing synthesis apparatus comprises:
an acquisition module, configured to acquire a music score to be synthesized in a preset digital format;
an extraction module, configured to extract music score features of the music score to be synthesized;
a processing module, configured to input the music score features into a preset Gaussian mixture hidden Markov model for processing, and output corresponding acoustic features;
a synthesis module, configured to synthesize the output acoustic features, by means of a vocoder, into a cappella audio corresponding to the music score to be synthesized.
8. The singing synthesis apparatus according to claim 7, characterized in that the acquisition module is also configured to acquire multiple training samples, each training sample comprising a music score containing lyrics and the a cappella audio corresponding to the music score;
the extraction module is also configured to extract the music score features of the music score from each training sample and to extract the acoustic features of the a cappella audio;
the singing synthesis apparatus further comprises a training module, configured to train a Gaussian mixture hidden Markov model, taking the music score features as the model input of the training samples and the acoustic features as the model output of the training samples, so as to obtain the preset Gaussian mixture hidden Markov model; wherein the extraction module further comprises:
a sampling unit, configured to sample the a cappella audio to obtain sampled data, the sampled data comprising low-sample-rate data and high-sample-rate data;
a first calculation unit, configured to calculate first normalized cross-correlation function values of the low-sample-rate data, and record the first local maximum of the first normalized cross-correlation function values within a calculation period;
a second calculation unit, configured to calculate second normalized cross-correlation function values of the high-sample-rate data, and record the second local maximum of the second normalized cross-correlation function values within the calculation period;
a third calculation unit, configured to calculate the fundamental frequency of the sampled data according to the first local maximum and the second local maximum;
wherein the formula for computing the normalized cross-correlation function value is as follows:
φ(i, k) = Σ_{j=m}^{m+n−1} s_j·s_{j+k} / √(e_m·e_{m+k}), with e_j = Σ_{τ=j}^{j+n−1} s_τ²;
k ∈ [0, K−1]; m = iz, i ∈ [0, M−1]; z = t/T; n = w/T;
where φ(i, k) is the normalized cross-correlation function value; s_j, s_{j+k}, and s_τ are sample points of the sampled data; T is the sampling interval of the audio data; t is the duration of one audio frame; w is the duration of one calculation period; M is the number of audio frames in one calculation period; K is the number of data points of the sampled data; m, z, τ, e_m, e_{m+k}, and e_j are intermediate parameters; i and k are integers; and K and M are positive integers.
9. A singing synthesis device, characterized in that the singing synthesis device comprises a processor, a memory, and a singing synthesis program stored on the memory and executable by the processor, wherein the singing synthesis program, when executed by the processor, implements the steps of the singing synthesis method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a singing synthesis program is stored on the computer-readable storage medium, wherein the singing synthesis program, when executed by a processor, implements the steps of the singing synthesis method according to any one of claims 1 to 6.
CN201910407538.7A 2019-05-16 2019-05-16 Sing synthetic method, device, equipment and computer readable storage medium Pending CN110310621A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910407538.7A CN110310621A (en) 2019-05-16 2019-05-16 Sing synthetic method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910407538.7A CN110310621A (en) 2019-05-16 2019-05-16 Sing synthetic method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110310621A true CN110310621A (en) 2019-10-08

Family

ID=68074737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910407538.7A Pending CN110310621A (en) 2019-05-16 2019-05-16 Sing synthetic method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110310621A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111063327A (en) * 2019-12-30 2020-04-24 咪咕文化科技有限公司 Audio processing method and device, electronic equipment and storage medium
CN111316352A (en) * 2019-12-24 2020-06-19 深圳市优必选科技股份有限公司 Speech synthesis method, apparatus, computer device and storage medium
CN111477210A (en) * 2020-04-02 2020-07-31 北京字节跳动网络技术有限公司 Speech synthesis method and device
CN111724764A (en) * 2020-06-28 2020-09-29 北京爱数智慧科技有限公司 Method and device for synthesizing music
CN112562633A (en) * 2020-11-30 2021-03-26 北京有竹居网络技术有限公司 Singing synthesis method and device, electronic equipment and storage medium
CN112634841A (en) * 2020-12-02 2021-04-09 爱荔枝科技(北京)有限公司 Guitar music automatic generation method based on voice recognition
CN113053355A (en) * 2021-03-17 2021-06-29 平安科技(深圳)有限公司 Fole human voice synthesis method, device, equipment and storage medium
WO2021218324A1 (en) * 2020-04-27 2021-11-04 北京字节跳动网络技术有限公司 Song synthesis method, device, readable medium, and electronic apparatus
CN114974183A (en) * 2022-05-16 2022-08-30 广州虎牙科技有限公司 Singing voice synthesis method, system and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3296648B2 (en) * 1993-11-30 2002-07-02 三洋電機株式会社 Method and apparatus for improving discontinuity in digital pitch conversion
CN101366080A (en) * 2006-08-15 2009-02-11 美国博通公司 Re-phasing of decoder states after packet loss
CN102521281A (en) * 2011-11-25 2012-06-27 北京师范大学 Humming computer music searching method based on longest matching subsequence algorithm
CN102598119A (en) * 2009-04-21 2012-07-18 剑桥硅无线电有限公司 Pitch estimation


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李贤: "基于统计模型的汉语歌声合成研究", 《中国博士学位论文全文数据库 信息科技辑》, pages 136 - 70 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111316352A (en) * 2019-12-24 2020-06-19 深圳市优必选科技股份有限公司 Speech synthesis method, apparatus, computer device and storage medium
CN111316352B (en) * 2019-12-24 2023-10-10 深圳市优必选科技股份有限公司 Speech synthesis method, device, computer equipment and storage medium
CN111063327A (en) * 2019-12-30 2020-04-24 咪咕文化科技有限公司 Audio processing method and device, electronic equipment and storage medium
CN111477210A (en) * 2020-04-02 2020-07-31 北京字节跳动网络技术有限公司 Speech synthesis method and device
WO2021218324A1 (en) * 2020-04-27 2021-11-04 北京字节跳动网络技术有限公司 Song synthesis method, device, readable medium, and electronic apparatus
CN111724764A (en) * 2020-06-28 2020-09-29 北京爱数智慧科技有限公司 Method and device for synthesizing music
CN111724764B (en) * 2020-06-28 2023-01-03 北京爱数智慧科技有限公司 Method and device for synthesizing music
CN112562633A (en) * 2020-11-30 2021-03-26 北京有竹居网络技术有限公司 Singing synthesis method and device, electronic equipment and storage medium
CN112634841A (en) * 2020-12-02 2021-04-09 爱荔枝科技(北京)有限公司 Guitar music automatic generation method based on voice recognition
CN112634841B (en) * 2020-12-02 2022-11-29 爱荔枝科技(北京)有限公司 Guitar music automatic generation method based on voice recognition
CN113053355A (en) * 2021-03-17 2021-06-29 平安科技(深圳)有限公司 Fole human voice synthesis method, device, equipment and storage medium
CN114974183A (en) * 2022-05-16 2022-08-30 广州虎牙科技有限公司 Singing voice synthesis method, system and computer equipment

Similar Documents

Publication Publication Date Title
CN110310621A (en) Singing synthesis method, device, equipment and computer-readable storage medium
WO2021218138A1 (en) Song synthesis method, apparatus and device, and storage medium
CN1169115C (en) Prosodic databases holding fundamental frequency templates for use in speech synthesis
CN103854646B (en) A method for automatic classification of digital audio
US8594993B2 (en) Frame mapping approach for cross-lingual voice transformation
CN109767778B (en) Bi-LSTM and WaveNet fused voice conversion method
CN107170464B (en) Voice speed changing method based on music rhythm and computing equipment
CN106128450A (en) A Chinese-English bilingual cross-lingual voice conversion method and system
CN108766409A (en) An opera synthesis method, device and computer-readable storage medium
US20230402047A1 (en) Audio processing method and apparatus, electronic device, and computer-readable storage medium
CN111081249A (en) Mode selection method, device and computer readable storage medium
TW201331930A (en) Speech synthesis method and apparatus for electronic system
CN107910005A (en) Target service locating method and device for interactive text
WO2023116243A1 (en) Data conversion method and computer storage medium
CN116913244A (en) Speech synthesis method, equipment and medium
CN113539239B (en) Voice conversion method and device, storage medium and electronic equipment
CN111724764B (en) Method and device for synthesizing music
CN109697985B (en) Voice signal processing method and device and terminal
CN113421544B (en) Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium
CN112164387A (en) Audio synthesis method and device, electronic equipment and computer-readable storage medium
JP6213217B2 (en) Speech synthesis apparatus and computer program for speech synthesis
Zhang: Mobile music recognition based on deep neural network
CN1629933B (en) Device, method and converter for speech synthesis
CN115457923B (en) Singing voice synthesis method, device, equipment and storage medium
JP2013156544A (en) Vocalization period specifying device, voice parameter generating device and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination