CN109949783A

CN109949783A - Song synthesis method and system

Info

Publication number: CN109949783A
Application number: CN201910188123.5A
Authority: CN
Inventors: 初敏; 杜斌; 杨喜鹏; 陈博; 刘亚祝; 游永彬
Original assignee: AI Speech Ltd
Current assignee: Sipic Technology Co Ltd
Priority date: 2019-01-18
Filing date: 2019-03-13
Publication date: 2019-06-28
Anticipated expiration: 2039-03-13
Also published as: CN109949783B

Abstract

The present invention discloses a song synthesis method, comprising: obtaining lyrics audio, the lyrics audio being audio of the lyrics of a song to be synthesized read aloud; obtaining a target dry vocal corresponding to the current song to be synthesized; obtaining target audio features corresponding to the target dry vocal; adjusting the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio; and combining the song audio with the corresponding background music to obtain the song. With the song synthesis method of the embodiments of the present invention, the user only needs to read the lyrics aloud, and the song the user wants to sing can be synthesized from that reading. The user needs no singing skills and no knowledge of rhythm or melody; simply reading the lyrics yields the song sung in the user's own voice.

Description

Song synthesis method and system

Technical field

The present invention relates to the technical field of speech synthesis, and in particular to a song synthesis method, system, electronic device, and storage medium.

Background technique

Song synthesis schemes currently on the market include the following. Score-based synthesis: a song is synthesized by a model trained on the musical score and a large amount of the speaker's voice data. Feature-conversion synthesis: the melody is modified by changing the pauses and durations of sounds; this scheme can only synthesize relatively simple songs, such as rap-style songs.

Of these two schemes, the first requires a large amount of user data to train the model, is costly and impractical, and produces audio with a strongly mechanical quality. The second cannot synthesize songs that are harder to sing, such as those using operatic vocal styles or vibrato.

The drawbacks of existing comparable techniques are: non-real-time song synthesis can synthesize any song, but the result sounds mechanical and model training is costly; real-time song synthesis supports only a few specific types of songs.

Summary of the invention

The embodiments of the present invention provide a song synthesis method and system to solve at least one of the above technical problems.

In a first aspect, an embodiment of the present invention provides a song synthesis method, comprising:

obtaining lyrics audio, the lyrics audio being audio of the lyrics of a song to be synthesized read aloud;

obtaining a target dry vocal corresponding to the current song to be synthesized;

obtaining target audio features corresponding to the target dry vocal;

adjusting the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio;

combining the song audio with the corresponding background music to obtain the song.

In a second aspect, an embodiment of the present invention provides a song synthesis system, comprising:

an audio obtaining program module, configured to obtain lyrics audio, the lyrics audio being audio of the lyrics of a song to be synthesized read aloud;

a dry vocal obtaining program module, configured to obtain a target dry vocal corresponding to the current song to be synthesized;

a feature obtaining program module, configured to obtain target audio features corresponding to the target dry vocal;

a feature adjustment program module, configured to adjust the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio;

an audio synthesis program module, configured to combine the song audio with the corresponding background music to obtain the song.

In a third aspect, an embodiment of the present invention provides a storage medium storing one or more programs comprising executable instructions, the executable instructions being readable and executable by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above song synthesis methods of the present invention.

In a fourth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform any of the above song synthesis methods of the present invention.

In a fifth aspect, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the above song synthesis methods.

The beneficial effects of the embodiments of the present invention are as follows. With the song synthesis method of the embodiments, the user only needs to read the lyrics aloud, and the song the user wants to sing can be synthesized from that reading. The user needs no singing skills and no knowledge of rhythm or melody; simply reading the lyrics yields the song sung in the user's own voice. Moreover, because the source data for the user's dry vocal, which reproduces the user's timbre, is the audio of the user reading the lyrics, it is only necessary to adapt that audio according to the audio features of the standard dry vocal. This simplifies the song synthesis method, lowers its technical difficulty, and greatly improves the efficiency of song synthesis.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of an embodiment of the song synthesis method of the present invention;

Fig. 2 is a functional block diagram of an embodiment of the song synthesis system of the present invention;

Fig. 3 is a flowchart of the operating method of the "read a sentence into a song" product of the present invention;

Fig. 4 is a flowchart of the song synthesis technique used in "read a sentence into a song" in the present invention;

Fig. 5 is a schematic structural diagram of an embodiment of the electronic device of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.

The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present invention may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

In the present invention, terms such as "module", "device", and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In particular, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable component, a thread of execution, a program, and/or a computer. An application or script running on a server, or the server itself, may also be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Components may also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal carrying data from one component that interacts with another component in a local system, in a distributed system, and/or with other systems across a network such as the Internet.

Finally, it should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, without necessarily requiring or implying any actual relationship or order between those entities or operations. Moreover, the terms "comprise" and "include" cover not only the listed elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

As shown in Fig. 1, a song synthesis method provided by an embodiment of the present invention comprises:

S10: obtain lyrics audio, the lyrics audio being audio of the lyrics of a song to be synthesized read aloud. For example, the lyrics audio may be audio of the user reading the lyrics of the song to be synthesized aloud, or read-aloud audio obtained by simulated synthesis from the user's historical audio data; the present invention does not limit this.

S20: obtain the target dry vocal corresponding to the current song to be synthesized. For example, the target dry vocal is obtained from a song library. The song library is built in advance and stores the dry vocals of multiple songs available for synthesis.

S30: obtain the target audio features corresponding to the target dry vocal. The target audio features are stored together with the dry vocal in the pre-built song library. They include the fundamental frequency (F0) associated with the initial and final information of each word in the target dry vocal; specifically, the F0 of the target dry vocal is extracted, and the F0 of each word in it is then obtained.

S40: adjust the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio.

S50: combine the song audio with the corresponding background music to obtain the song.
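The five steps S10 to S50 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `SongLibraryEntry` type, the `synthesize_song` function, and the injected `adjust` and `mix` callables are hypothetical names that do not appear in the patent, and the actual feature adjustment is left abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Samples = List[float]  # mono audio as float samples (assumed representation)
Features = Dict[str, Sequence[float]]


@dataclass
class SongLibraryEntry:
    """Hypothetical record stored in the pre-built song library."""
    dry_vocal: Samples          # reference dry vocal (S20)
    target_features: Features   # features stored with it, e.g. per-word F0 (S30)
    background: Samples         # background music used in S50


def synthesize_song(
    lyrics_audio: Samples,      # S10: the user's read-aloud lyrics
    song_id: str,
    library: Dict[str, SongLibraryEntry],
    adjust: Callable[[Samples, Features], Samples],
    mix: Callable[[Samples, Samples], Samples],
) -> Samples:
    entry = library[song_id]                      # S20: look up the target dry vocal
    features = entry.target_features              # S30: features stored alongside it
    song_audio = adjust(lyrics_audio, features)   # S40: adapt the user's audio
    return mix(song_audio, entry.background)      # S50: combine with background music
```

Passing `adjust` and `mix` in as callables keeps the sketch independent of any particular model or mixing strategy.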

With the song synthesis method of the embodiments of the present invention, the user only needs to read the lyrics aloud, and the song the user wants to sing can be synthesized from that reading. The user needs no singing skills and no knowledge of rhythm or melody; simply reading the lyrics yields a song of any type and any style sung in the user's own voice.

For example, on the one hand, the user selects the song to be synthesized to obtain the corresponding lyrics, and reads the lyrics aloud to produce audio data. On the other hand, the dry vocal and background music of the selected song are obtained from the pre-built song library. The audio features of the dry vocal (that is, the standard audio features) are then obtained, and the audio data of the user's reading is adjusted according to these features to obtain a user dry vocal that matches the user's timbre. Finally, the user dry vocal and the background music are combined into the song.

Moreover, because the source data for the user's dry vocal, which reproduces the user's timbre, is the audio of the user reading the lyrics, it is only necessary to adapt that audio according to the audio features of the standard dry vocal. This simplifies the song synthesis method, lowers its technical difficulty, and greatly improves the efficiency of song synthesis.

In some embodiments, the song synthesis method of the present invention further comprises: obtaining audio segmentation information of the user audio from the lyrics audio, the audio segmentation information including phoneme segmentation information and/or syllable segmentation information and/or initial/final segmentation information.

Adjusting the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio comprises: inputting the target audio features, the audio segmentation information, and the lyrics audio into an adaptive model to obtain the corresponding song audio.

For example, inputting the target audio features, the audio segmentation information, and the lyrics audio into the adaptive model to obtain the corresponding song audio comprises:

inputting the lyrics audio and the audio segmentation information into a pre-trained acoustic adaptive model to perform adaptive processing on the lyrics audio;

inputting the target audio features into a pre-trained song prosody model to obtain prosodic parameters;

adjusting the adaptively processed lyrics audio according to the prosodic parameters to obtain the corresponding song audio.

From a technical standpoint, going from a passage read by the user to a personalized song involves two major processing stages: speech recognition and speech synthesis. The latter is further divided into two parts: acoustic model adaptation, and adjustment of prosodic parameters by the prosody model. Put simply, acoustic model adaptation means collecting the user's voice data to form a trained model, so that after personalized learning the generated song has the user's timbre. The prosody model adjusts the prosodic parameters, controlling the duration and pitch of each sound so that the melody lands on the beat and sounds natural and smooth. The prosodic parameters are then combined with the spectral parameters to generate the song, producing a clip performed in the user's timbre with a melody like the original singer's. The whole process can complete in 1 to 2 seconds.
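To make the role of the prosody adjustment concrete, the sketch below computes, for each syllable, how much the spoken audio would have to be stretched in time and shifted in pitch to match the sung reference. It is an illustrative calculation only, not the patent's trained prosody model; the `Syllable` type, the `prosody_targets` function, and the one-to-one syllable alignment are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Syllable:
    duration_s: float  # length of the syllable in seconds
    f0_hz: float       # representative pitch in Hz (0 means unvoiced)


def prosody_targets(
    spoken: List[Syllable], sung: List[Syllable]
) -> List[Tuple[float, float]]:
    """Return (time_stretch_ratio, pitch_shift_semitones) per syllable."""
    if len(spoken) != len(sung):
        raise ValueError("spoken and sung syllable counts must match")
    targets = []
    for s, t in zip(spoken, sung):
        stretch = t.duration_s / s.duration_s
        # A pitch shift is only defined when both syllables are voiced.
        if s.f0_hz > 0 and t.f0_hz > 0:
            shift = 12.0 * math.log2(t.f0_hz / s.f0_hz)
        else:
            shift = 0.0
        targets.append((stretch, shift))
    return targets
```

For instance, a spoken syllable of 0.2 s at 220 Hz matched to a sung syllable of 0.4 s at 440 Hz yields a stretch ratio of 2 and a shift of 12 semitones (one octave).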

In some embodiments, after obtaining the lyrics audio, the method further comprises:

detecting whether the lyrics audio accurately corresponds to the lyrics. For example, each word the user reads in the lyrics audio is checked for correctness; in "爱我中华" (Love My China), the character "中" (zhong) may be misread as "zong", which is a misread word.

If not, the wrong words in the lyrics audio are further determined; if so, the subsequent song synthesis steps S20-S50 continue.

User audio features corresponding to the current user are determined from the lyrics audio. For example, audio features that represent the user are extracted from the lyrics audio the user read, so that the correct pronunciation can be synthesized in the user's voice or the wrong pronunciation adjusted to the correct one.

The wrong words are corrected according to the user audio features to obtain lyrics audio that accurately corresponds to the lyrics, and steps S20-S50 are then executed in sequence.

In this embodiment, when the user misreads the lyrics, the erroneous content is corrected automatically, so that song synthesis proceeds smoothly without the user having to read the lyrics again.
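One simple way to picture the check is to compare the syllables recognized from the user's reading with the syllables expected from the lyrics. The sketch below assumes a speech recognizer has already produced one pinyin syllable per character, aligned one to one with the lyrics; the function name `find_misread_words` is hypothetical and the recognizer itself is outside the sketch. The example reuses the case of "zhong" read as "zong".

```python
from typing import List, Tuple


def find_misread_words(
    lyrics: List[str], expected: List[str], recognized: List[str]
) -> List[Tuple[int, str, str, str]]:
    """Return (index, character, expected_pinyin, recognized_pinyin) per mismatch."""
    if not (len(lyrics) == len(expected) == len(recognized)):
        raise ValueError("inputs must be aligned one syllable per character")
    return [
        (i, ch, exp, rec)
        for i, (ch, exp, rec) in enumerate(zip(lyrics, expected, recognized))
        if exp != rec
    ]


errors = find_misread_words(
    ["爱", "我", "中", "华"],
    ["ai", "wo", "zhong", "hua"],
    ["ai", "wo", "zong", "hua"],
)
print(errors)  # [(2, '中', 'zhong', 'zong')]
```

An empty result means the reading matches the lyrics and synthesis can continue with S20-S50.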

In some embodiments, after obtaining the lyrics audio, the method further comprises:

presenting the identified wrong words to the user, and guiding the user to read those words aloud again individually;

obtaining correction audio of the user reading the wrong words individually;

determining the correct lyrics audio from the correction audio and the lyrics audio, and executing steps S20-S50 in sequence.

In this embodiment, when the user misreads the lyrics, the erroneous content is identified automatically and the user is guided to re-read only the erroneous part rather than the whole lyrics. This both ensures that song synthesis proceeds smoothly and improves the user experience, and it avoids the long overall synthesis time that re-reading the whole lyrics would cause.
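Combining the correction audio with the original reading can be sketched as a splice. This assumes the start and end sample indices of the misread word are already known from the segmentation information; `splice_correction` is a hypothetical helper, and a practical system would likely also smooth the joins.

```python
from typing import List


def splice_correction(
    lyrics_audio: List[float], start: int, end: int, correction: List[float]
) -> List[float]:
    """Replace samples [start, end) of the original reading with the re-read word."""
    if not 0 <= start <= end <= len(lyrics_audio):
        raise ValueError("segment boundaries fall outside the audio")
    return lyrics_audio[:start] + correction + lyrics_audio[end:]
```

The correction may be longer or shorter than the segment it replaces, so the segment boundaries of later words shift accordingly.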

In some embodiments, before obtaining the lyrics audio, the method further comprises:

obtaining user attribute information, the user attribute information including the user's gender and age;

generating a recommendation list of songs to be synthesized according to the user attribute information.

Obtaining the lyrics audio comprises:

determining the song to be synthesized according to the user's selection operation, and presenting the lyrics of that song to the user;

detecting and obtaining the lyrics audio of the user reading the lyrics of the song to be synthesized.

In this embodiment, a suitable list of songs is recommended to the user according to the user's attribute information, which helps the user quickly find songs of interest and improves the user's experience of the song synthesis operation.
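The patent does not say how the recommendation list is derived from gender and age. Purely as an assumption, the sketch below filters a catalog whose songs carry hypothetical audience tags; `CatalogSong`, `recommend_songs`, and the tag fields are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CatalogSong:
    title: str
    min_age: int
    max_age: int
    gender: Optional[str] = None  # None means suitable for any gender


def recommend_songs(catalog: List[CatalogSong], gender: str, age: int) -> List[str]:
    """Return titles whose (assumed) audience tags match the user attributes."""
    return [
        song.title
        for song in catalog
        if song.min_age <= age <= song.max_age
        and song.gender in (None, gender)
    ]
```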

As shown in Fig. 2, an embodiment of the present invention further provides a song synthesis system 200, comprising:

an audio obtaining program module 210, configured to obtain lyrics audio, the lyrics audio being audio of the lyrics of a song to be synthesized read aloud;

a dry vocal obtaining program module 220, configured to obtain a target dry vocal corresponding to the current song to be synthesized;

a feature obtaining program module 230, configured to obtain target audio features corresponding to the target dry vocal;

a feature adjustment program module 240, configured to adjust the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio;

an audio synthesis program module 250, configured to combine the song audio with the corresponding background music to obtain the song.

The song synthesis system of the above embodiment can be used to perform the song synthesis method of the embodiments of the present invention and accordingly achieves the technical effects of that method, which are not repeated here. In the embodiments of the present invention, the relevant functional modules may be implemented by a hardware processor.

The present invention proposes a real-time, high-naturalness song synthesis scheme that supports any song. The scheme is divided into an offline part and an online part.

The offline part builds the song library data. First, a professional singer records the dry vocal and the background music of each song to be made available for synthesis, and the dry vocal is segmented into the song clips used for synthesis. The initial and final information of each word in the dry vocal is annotated, the fundamental frequency (F0) of the dry vocal is extracted (for a segment of audio data), and the F0 of the initials and finals is then corrected. Some initials and finals (unvoiced) have an F0 of 0, while others (voiced) have a non-zero F0; for example, the F0 of a plosive is always 0. Because current F0 extraction tools may be inaccurate, the F0 data of the initials and finals is adjusted accordingly.
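The patent does not state how the F0 values are adjusted. One plausible reading, sketched below purely as an assumption, is to force F0 to zero on frames labelled unvoiced and to fill zero-valued dropouts on voiced frames by linear interpolation between the nearest voiced frames that carry a valid value. The function name `repair_f0` and the per-frame `voiced` flags are hypothetical.

```python
from typing import List


def repair_f0(f0: List[float], voiced: List[bool]) -> List[float]:
    """Zero F0 on unvoiced frames; interpolate zero dropouts on voiced frames."""
    if len(f0) != len(voiced):
        raise ValueError("f0 and voiced must have the same length")
    out = [v if is_v else 0.0 for v, is_v in zip(f0, voiced)]
    # Indices of voiced frames that carry a usable (non-zero) F0 value.
    good = [i for i, is_v in enumerate(voiced) if is_v and out[i] > 0]
    for i, is_v in enumerate(voiced):
        if not is_v or out[i] > 0:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is not None and right is not None:
            w = (i - left) / (right - left)
            out[i] = out[left] * (1 - w) + out[right] * w
        elif left is not None:
            out[i] = out[left]   # no later anchor: hold the last valid value
        elif right is not None:
            out[i] = out[right]  # no earlier anchor: use the next valid value
    return out
```

With this rule, a spurious 50 Hz reading on a plosive frame is reset to 0, while a missed frame in the middle of a vowel takes a value between its neighbours.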

Here, the dry vocal is the singing voice with the background music removed. It need not be obtained by recording; existing mature techniques can be used to separate the dry vocal and the background music from an existing song. For example, techniques such as echo cancellation (the method is not limited to this) can remove most of the background music, leaving the song audio, that is, the dry vocal.

Segmentation may be performed per syllable, per phoneme, or per initial/final. For example, if an audio file contains the sung phrase "爱我中华" (Love My China), the F0 of the whole phrase is obtained first, and the initial/final information of the four characters "爱", "我", "中", and "华" is then segmented manually (segmentation by syllable or by phoneme works the same way).
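Given a frame-level F0 track for a whole clip and word boundaries from the manual segmentation, the per-word F0 can be collected as sketched below. The frame indices, the `split_f0_by_word` name, and the list-of-pairs return type are illustrative assumptions; a list is used rather than a dictionary so that repeated words in the lyrics stay distinct.

```python
from typing import List, Tuple


def split_f0_by_word(
    f0: List[float], boundaries: List[Tuple[str, int, int]]
) -> List[Tuple[str, List[float]]]:
    """Pair each word with the F0 frames in its [start, end) frame range."""
    result = []
    for word, start, end in boundaries:
        if not 0 <= start <= end <= len(f0):
            raise ValueError(f"bad boundaries for {word!r}")
        result.append((word, f0[start:end]))
    return result
```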

The online part begins with a data preprocessing stage. Speech recognition is used to proofread the audio data (for example, the lyrics audio of the user reading the lyrics of the song to be synthesized), checking whether the audio data matches the lyrics, and a deep learning model trained on large-scale data is used to obtain the phoneme segmentation information of the user audio.

The audio data then undergoes processing such as noise reduction, endpoint detection, and unvoiced/voiced (UV) F0 repair. The phoneme segmentation information, the song features from the song library, and the audio data are fed together into a trained adaptive model for data matching, so that the user audio features match the song features. The adaptive model adjusts the features of the user's read-aloud audio to the melody of the song; its training data is a large amount of accurately annotated speech data together with segmentation annotations.
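Endpoint detection can be illustrated with a simple short-time energy threshold. This is a generic textbook approach offered as an assumption, not the detector used in the patent; the function name `detect_endpoints`, the frame length, and the threshold are arbitrary illustrative choices.

```python
from typing import List, Optional, Tuple


def detect_endpoints(
    samples: List[float], frame_len: int = 160, threshold: float = 1e-3
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the active region, or None if silent."""
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)  # mean energy of the frame
        if energy > threshold:
            active.append(start)
    if not active:
        return None
    # The region runs from the first active frame to the end of the last one.
    return active[0], min(active[-1] + frame_len, len(samples))
```

Leading and trailing silence outside the returned range can then be trimmed before the audio is passed on.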

The transformed audio features are post-processed by signal processing or by a big-data-driven method so that the pronunciation characteristics of the transformed audio are closer to natural human singing, and the result is finally merged with the background music.
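The final merge with the background music amounts to mixing two signals. A minimal sketch, assuming both tracks are mono float samples in the range [-1, 1] at the same sample rate; the `mix_tracks` name and the gain parameters are hypothetical:

```python
from typing import List


def mix_tracks(
    vocal: List[float],
    background: List[float],
    vocal_gain: float = 1.0,
    background_gain: float = 0.6,
) -> List[float]:
    """Sum the two tracks sample by sample and clip the result to [-1, 1]."""
    length = max(len(vocal), len(background))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0.0            # pad the shorter track
        b = background[i] if i < len(background) else 0.0
        sample = vocal_gain * v + background_gain * b
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

Hard clipping is the simplest safeguard against overflow; a real mixer would more likely normalize or limit the signal.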

Based on the present invention, a "read a sentence into a song" product feature can be designed and implemented: the user inputs audio for the lyrics text into the system, including but not limited to natural speech, synthesized speech, spliced speech, or recorded audio, and the system outputs a song synthesized in the user's timbre. The synthesized audio supports, but is not limited to, scoring incentives, song greetings shared to a friends' circle, and song voting.

The method of the present invention enables "read a sentence into a song" synthesis for any song. A signal-driven or big-data-driven speech model predicts and processes the signal points of the user audio and, while preserving the user's timbre and the semantic content, adjusts the user's voice to the pitch and syllables of the song, supporting "read a sentence into a song" synthesis for any song.

As shown in Fig. 3, the operating flow of the "read a sentence into a song" product of the present invention comprises the following steps:

Step 1: enter the program interface, fill in or select user information (including but not limited to gender and age), and generate the song library list.

Step 2: select the song to be synthesized and upload audio, including but not limited to natural speech, synthesized speech, spliced speech, or recorded audio.

Step 3: check the quality of the audio data (this function is optional); if the audio is correct, proceed to song synthesis; if it is wrong, record again.

Step 4: the user obtains the synthesized song audio and can perform operations on it, including but not limited to playing, sharing, voting on, and downloading the song.

As shown in Fig. 4, the song synthesis technique used in "read a sentence into a song" in the present invention comprises the following steps:

Step 1: load the song library resources, including but not limited to lyrics, musical scores, signal-based or general big-data models, adaptive models for different user groups, song types, and environments, and empirical tuning parameters.

Step 2: data preprocessing, including but not limited to noise reduction, speech recognition, endpoint detection, speaker characteristic analysis (gender, etc.), audio quality verification, and language model adaptation. This stage performs text proofreading of the audio data, endpoint detection of silent segments, regularization of excitation segments, and so on.

Step 3: using a signal-driven or big-data-driven speech signal model, through feature processing and prediction or through processing or prediction of signal points, convert the pitch of the user audio to the pitch and corresponding syllables of the specified song while preserving the speaker's timbre and the semantic content, so that the result better matches the actual score of the song.

Step 4: post-process the transformed audio by signal processing or by a big-data-driven method so that its pronunciation characteristics are closer to natural human singing.

It should be noted that, for simplicity of description, the foregoing method embodiments are each described as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention. In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.

In some embodiments, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing one or more programs comprising executable instructions, the executable instructions being readable and executable by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above song synthesis methods of the present invention.

In some embodiments, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the above song synthesis methods.

In some embodiments, an embodiment of the present invention further provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the song synthesis method.

In some embodiments, an embodiment of the present invention further provides a storage medium storing a computer program, wherein the program, when executed by a processor, implements the song synthesis method.

Fig. 5 is the hardware configuration signal of the electronic equipment for the execution song synthetic method that another embodiment of the application provides Figure, as shown in figure 5, the equipment includes:

One or more processors 510 and memory 520, in Fig. 5 by taking a processor 510 as an example.

The equipment for executing song synthetic method can also include: input unit 530 and output device 540.

Processor 510, memory 520, input unit 530 and output device 540 can pass through bus or other modes It connects, in Fig. 5 for being connected by bus.

Memory 520 is used as a kind of non-volatile computer readable storage medium storing program for executing, can be used for storing non-volatile software journey Sequence, non-volatile computer executable program and module, such as the corresponding program of song synthetic method in the embodiment of the present application Instruction/module.Non-volatile software program, instruction and the module that processor 510 is stored in memory 520 by operation, Thereby executing the various function application and data processing of server, i.e. realization above method embodiment song synthetic method.

Memory 520 may include storing program area and storage data area, wherein storing program area can store operation system Application program required for system, at least one function；Storage data area can be stored to be created according to using for song synthesizer Data etc..In addition, memory 520 may include high-speed random access memory, it can also include nonvolatile memory, example Such as at least one disk memory, flush memory device or other non-volatile solid state memory parts.In some embodiments, it deposits Optional reservoir 520 includes the memory remotely located relative to processor 510, these remote memories can pass through network connection To song synthesizer.The example of above-mentioned network includes but is not limited to internet, intranet, local area network, mobile radio communication And combinations thereof.

The input device 530 can receive input numeric or character information and generate signals related to user settings and function control of the song synthesis apparatus. The output device 540 may include a display device such as a display screen.

The one or more modules are stored in the memory 520 and, when executed by the one or more processors 510, perform the song synthesis method of any of the above method embodiments.

The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.

The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:

(1) Mobile communication devices: these devices are characterized by mobile communication capability, with voice and data communication as their main purpose. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capability, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it faces higher requirements for processing capability, stability, reliability, security, scalability, and manageability.

(5) Other electronic devices with data interaction functions.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes over the related art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in certain parts of an embodiment.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A song synthesis method, comprising:

2. The method according to claim 1, further comprising building a library in advance, the library storing the dry sounds of a plurality of songs to be synthesized;

wherein obtaining the target song dry sound corresponding to the current song to be synthesized comprises: obtaining, from the library, the target song dry sound corresponding to the song to be synthesized.
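By way of non-limiting illustration, the pre-built library and lookup of claim 2 can be sketched in Python as follows. The names `DrySound`, `DrySoundLibrary`, `song_id`, `audio_path`, and `f0_hz`, as well as the sample values, are hypothetical and are not taken from the application; the sketch only shows a keyed store from which the target song dry sound is retrieved.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DrySound:
    """Hypothetical record for one stored dry sound (unaccompanied vocal)."""
    song_id: str
    audio_path: str       # assumed location of the dry sound recording
    f0_hz: List[float]    # assumed example of a target audio feature (pitch)


class DrySoundLibrary:
    """Pre-built library mapping a song identifier to its dry sound."""

    def __init__(self) -> None:
        self._songs: Dict[str, DrySound] = {}

    def add(self, entry: DrySound) -> None:
        self._songs[entry.song_id] = entry

    def get(self, song_id: str) -> DrySound:
        # Fail loudly if the requested song was never added to the library.
        if song_id not in self._songs:
            raise KeyError(f"no dry sound stored for song {song_id!r}")
        return self._songs[song_id]


# Build the library in advance, then fetch the dry sound for the chosen song.
library = DrySoundLibrary()
library.add(DrySound("song-001", "dry/song-001.wav", [220.0, 246.9, 261.6]))
target = library.get("song-001")
```

In such a sketch the target audio features could be extracted once when the library is built and stored with each entry, so that synthesis only needs a lookup.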

3. The method according to claim 2, further comprising obtaining audio segmentation information of the user audio from the lyrics audio;

wherein adjusting the audio features of the lyrics audio according to the target audio features to obtain the corresponding song audio comprises:

inputting the target audio features, the audio segmentation information, and the lyrics audio into an adaptive model to obtain the corresponding song audio.

4. The method according to claim 3, wherein inputting the target audio features, the audio segmentation information, and the lyrics audio into the adaptive model to obtain the corresponding song audio comprises:

inputting the lyrics audio and the audio segmentation information into a pre-trained acoustic adaptive model to perform adaptive processing on the lyrics audio;

adjusting the adaptively processed lyrics audio according to the prosodic parameters to obtain the corresponding song audio.

5. The method according to claim 3 or 4, wherein the audio segmentation information includes phoneme segmentation information and/or syllable segmentation information and/or initial and final segmentation information.
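By way of non-limiting illustration, the audio segmentation information of claims 3 to 5 and a prosody adjustment derived from it can be sketched in Python as follows. This is not the adaptive model of the application. It assumes syllable-level segmentation, a one-to-one pairing between spoken and target units, and duration and pitch as the prosodic parameters; the names `Segment`, `Adjustment`, and `plan_adjustments` and all sample values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One unit of audio segmentation information (here: a syllable)."""
    label: str       # syllable text
    start_s: float   # start time in the audio, in seconds
    end_s: float     # end time in the audio, in seconds
    f0_hz: float     # average pitch over the unit (assumed feature)

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


@dataclass(frozen=True)
class Adjustment:
    label: str
    stretch: float          # >1 lengthens the spoken unit, <1 shortens it
    shift_semitones: float  # >0 raises the pitch, <0 lowers it


def plan_adjustments(spoken: List[Segment], target: List[Segment]) -> List[Adjustment]:
    """Pair spoken and target units one-to-one and derive per-unit changes."""
    if len(spoken) != len(target):
        raise ValueError("spoken and target must have the same number of units")
    plan = []
    for s, t in zip(spoken, target):
        stretch = t.duration_s / s.duration_s
        # 12 semitones per octave; an octave is a doubling of frequency.
        shift = 12.0 * math.log2(t.f0_hz / s.f0_hz)
        plan.append(Adjustment(s.label, stretch, shift))
    return plan


# Spoken lyrics (from the user) versus the same syllables in the dry sound.
spoken = [Segment("ni", 0.00, 0.20, 200.0), Segment("hao", 0.20, 0.50, 180.0)]
target = [Segment("ni", 0.00, 0.40, 400.0), Segment("hao", 0.40, 0.70, 180.0)]
plan = plan_adjustments(spoken, target)
```

In this example the first syllable would be stretched to twice its spoken length and raised by one octave, while the second would be left essentially unchanged.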

6. The method according to claim 1, further comprising, after obtaining the lyrics audio:

detecting whether the lyrics audio accurately corresponds to the corresponding lyrics;

if not, further determining the wrong words in the lyrics audio;

determining, from the lyrics audio, user audio features corresponding to the current user;

correcting the wrong words according to the user audio features to obtain lyrics audio that accurately corresponds to the corresponding lyrics.
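By way of non-limiting illustration, the detection of wrong words in claim 6 can be sketched in Python as follows. The application does not specify how the detection is performed. This sketch assumes that a speech recognizer has already produced a word-level transcript of the lyrics audio, and it compares that transcript with the reference lyrics using the standard-library `difflib.SequenceMatcher`. The function name `find_wrong_words` and the sample lyrics are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def find_wrong_words(
    reference: List[str], recognized: List[str]
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (operation, expected words, spoken words) for every mismatch."""
    matcher = SequenceMatcher(a=reference, b=recognized, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "equal" spans were read correctly; everything else is a mismatch.
        if tag != "equal":
            errors.append((tag, reference[i1:i2], recognized[j1:j2]))
    return errors


reference = "twinkle twinkle little star".split()
recognized = "twinkle twinkle little car".split()  # assumed recognizer output
errors = find_wrong_words(reference, recognized)
```

An empty result indicates that the lyrics audio corresponds to the lyrics; otherwise each entry identifies the expected words that would have to be corrected using the user audio features.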

7. The method according to claim 1, further comprising, before obtaining the lyrics audio:

wherein obtaining the lyrics audio comprises:

detecting and obtaining the lyrics audio of the user reading aloud the lyrics of the song to be synthesized.

8. A song synthesis system, comprising:

an audio obtaining program module, configured to obtain lyrics audio, the lyrics audio being read-aloud audio of the lyrics corresponding to a song to be synthesized;

a feature adjustment program module, configured to adjust the audio features of the lyrics audio according to the target audio features to obtain the corresponding song audio;

9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method of any one of claims 1-7.

10. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method of any one of claims 1-7.