CN109949783A - Song synthetic method and system - Google Patents


Publication number
CN109949783A
Authority
CN
China
Prior art keywords
audio
song
lyrics
user
synthesized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910188123.5A
Other languages
Chinese (zh)
Other versions
CN109949783B (en)
Inventor
初敏
杜斌
杨喜鹏
陈博
刘亚祝
游永彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd
Publication of CN109949783A
Application granted
Publication of CN109949783B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The present invention discloses a song synthesis method, comprising: obtaining lyrics audio, the lyrics audio being a recording of the lyrics of the song to be synthesized read aloud; obtaining the target dry vocal (the vocal track without accompaniment) of the song to be synthesized; obtaining target audio features corresponding to the target dry vocal; adjusting the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio; and combining the song audio with the corresponding background music to obtain the song. With the song synthesis method of the embodiments of the present invention, the user only needs to read the lyrics aloud, and the song the user wants to sing is synthesized from that reading. The user needs no singing training and no knowledge of rhythm; simply reading the lyrics aloud yields a song sung in the user's own voice.

Description

Song synthesis method and system
Technical field
The present invention relates to the field of speech synthesis technology, and in particular to a song synthesis method, system, electronic device and storage medium.
Background art
Song synthesis schemes currently on the market include the following. Score-based synthesis: a song is synthesized from the musical score and a model trained on a large amount of the speaker's voice data. Feature-conversion synthesis: the melody of the song is modified by changing the pauses and durations of the sound; this scheme can only synthesize relatively simple songs, such as rap-style songs.
Of these two schemes, the first needs a large amount of user data to train the model, is costly and impractical, and produces audio with a noticeably mechanical quality. The second cannot synthesize songs that are harder to sing, such as those with ornamented vocal turns or trills.
The shortcomings of comparable existing techniques are: non-real-time song synthesis can synthesize any song, but the result sounds mechanical and model training is costly; real-time song synthesis supports only a few specific types of song.
Summary of the invention
The embodiments of the present invention provide a song synthesis method and system that solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a song synthesis method, comprising:
obtaining lyrics audio, the lyrics audio being a recording of the lyrics of the song to be synthesized read aloud;
obtaining the target dry vocal corresponding to the song to be synthesized;
obtaining target audio features corresponding to the target dry vocal;
adjusting the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio;
combining the song audio with the corresponding background music to obtain the song.
In a second aspect, an embodiment of the present invention provides a song synthesis system, comprising:
an audio acquisition program module, configured to obtain lyrics audio, the lyrics audio being a recording of the lyrics of the song to be synthesized read aloud;
a dry vocal acquisition program module, configured to obtain the target dry vocal corresponding to the song to be synthesized;
a feature acquisition program module, configured to obtain target audio features corresponding to the target dry vocal;
a feature adjustment program module, configured to adjust the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio;
an audio synthesis program module, configured to combine the song audio with the corresponding background music to obtain the song.
In a third aspect, an embodiment of the present invention provides a storage medium storing one or more programs comprising executable instructions, the executable instructions being readable and executable by an electronic device (including but not limited to a computer, a server or a network device) to perform any of the above song synthesis methods of the present invention.
In a fourth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform any of the above song synthesis methods of the present invention.
In a fifth aspect, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the above song synthesis methods.
The beneficial effects of the embodiments of the present invention are as follows. With the song synthesis method of the embodiments, the user only needs to read the lyrics aloud, and the song the user wants to sing is synthesized from that reading. The user needs no singing training and no knowledge of rhythm; simply reading the lyrics aloud yields a song sung in the user's own voice. Moreover, because the source data for the user dry vocal that simulates the user's timbre is the audio of the user reading the lyrics, it is only necessary to adapt that audio according to the audio features of the standard dry vocal. This simplifies the song synthesis method, lowers the technical difficulty of song synthesis, and greatly improves its efficiency.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the song synthesis method of the present invention;
Fig. 2 is a functional block diagram of an embodiment of the song synthesis system of the present invention;
Fig. 3 is a flowchart of how the "read-to-sing" product of the present invention operates;
Fig. 4 is a flowchart of the song synthesis technique used in "read-to-sing" in the present invention;
Fig. 5 is a schematic structural diagram of an embodiment of the electronic device of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another.
The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In the present invention, terms such as "module", "device" and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program and/or a computer. An application or script running on a server, or the server itself, may also be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Components may also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal carrying data from one component that interacts with another component in a local system or a distributed system, and/or interacts with other systems across a network such as the Internet.
Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise" and "include" cover not only the elements listed but also other elements not expressly listed, or elements inherent to the process, method, article or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
As shown in Fig. 1, an embodiment of the present invention provides a song synthesis method, comprising: S10, obtaining lyrics audio, the lyrics audio being a recording of the lyrics of the song to be synthesized read aloud. Illustratively, this may be audio of the user reading the lyrics of the song to be synthesized aloud, or read-aloud audio synthesized from the user's historical audio data; the present invention does not limit this.
S20, obtaining the target dry vocal corresponding to the song to be synthesized. Illustratively, the target dry vocal corresponding to the song to be synthesized is obtained from a song library. The song library is built in advance and stores the dry vocals of multiple candidate songs.
S30, obtaining target audio features corresponding to the target dry vocal. The target audio features are stored in the song library when the library is built. They include the fundamental frequency of the initial and final of each character in the target dry vocal; specifically, the fundamental frequency of the target dry vocal is extracted, and from it the fundamental frequency of each character is obtained.
S40, adjusting the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio;
S50, combining the song audio with the corresponding background music to obtain the song.
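As a non-limiting illustration, the flow of steps S10 to S50 may be sketched as follows. The names `SongEntry`, `synthesize_song`, `adjust` and `mix` and the library layout are assumptions made for this sketch and do not appear in the original disclosure; the adjustment and mixing steps are passed in as functions because the disclosure does not fix a particular algorithm for them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Audio = List[float]  # mono samples; a stand-in for a real audio buffer


@dataclass
class SongEntry:
    """One pre-built song-library record (hypothetical layout)."""
    dry_vocal: Audio        # reference singing without accompaniment
    target_features: dict   # e.g. per-character F0, stored with the entry
    background: Audio       # accompaniment track


def synthesize_song(
    lyrics_audio: Audio,                      # S10: the user's reading of the lyrics
    song_id: str,
    library: Dict[str, SongEntry],
    adjust: Callable[[Audio, dict], Audio],   # S40: feature adjustment (assumed interface)
    mix: Callable[[Audio, Audio], Audio],     # S50: mixing (assumed interface)
) -> Audio:
    entry = library[song_id]                      # S20: look up the target dry vocal
    features = entry.target_features              # S30: features stored with it
    song_audio = adjust(lyrics_audio, features)   # S40: adapt the reading
    return mix(song_audio, entry.background)      # S50: add the background music
```

A caller would supply concrete `adjust` and `mix` implementations; the sketch only fixes the order in which the five steps consume one another's outputs.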
With the song synthesis method of the embodiments of the present invention, the user only needs to read the lyrics aloud, and the song the user wants to sing is synthesized from that reading. The user needs no singing training and no knowledge of rhythm; simply reading the lyrics aloud yields a song of any type and any style sung in the user's own voice.
Illustratively, on the one hand, the user selects the song to be synthesized to obtain its lyrics and reads the lyrics aloud, and the audio data is captured; on the other hand, the dry vocal and background music of the selected song are obtained from the pre-built song library. The audio features of the dry vocal (that is, the standard audio features) are then obtained, and the audio data of the user's reading is adjusted according to these features to produce a user dry vocal that matches the user's timbre. Finally, the user dry vocal is combined with the background music to form the song.
Moreover, because the source data for the user dry vocal that simulates the user's timbre is the audio of the user reading the lyrics, it is only necessary to adapt that audio according to the audio features of the standard dry vocal. This simplifies the song synthesis method, lowers the technical difficulty of song synthesis, and greatly improves its efficiency.
In some embodiments, the song synthesis method of the present invention further comprises: obtaining audio segmentation information of the user's audio from the lyrics audio, the audio segmentation information including phoneme segmentation information and/or syllable segmentation information and/or initial-and-final segmentation information;
adjusting the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio then comprises: inputting the target audio features, the audio segmentation information and the lyrics audio into an adaptive model to obtain the corresponding song audio.
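As a non-limiting illustration of initial-and-final segmentation at the text level, a pinyin syllable may be split into its initial and final as sketched below. The function name and the initials table are assumptions made for this sketch; treating "y" and "w" as initials is a common simplification, and the disclosure itself segments audio rather than text.

```python
# Two-letter initials are listed first so that "zh" is matched before "z".
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w",
]


def split_initial_final(syllable: str) -> tuple:
    """Split a toneless pinyin syllable into (initial, final).

    Syllables that start with a vowel have an empty initial.
    """
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable
```

For instance, "zhong" splits into the initial "zh" and the final "ong", while the misreading "zong" discussed in this document differs only in its initial.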
Illustratively, inputting the target audio features, the audio segmentation information and the lyrics audio into the adaptive model to obtain the corresponding song audio comprises:
inputting the lyrics audio and the audio segmentation information into a pre-trained acoustic adaptive model to perform adaptive processing on the lyrics audio;
inputting the target audio features into a pre-trained song prosody model to obtain prosodic parameters;
adjusting the adaptively processed lyrics audio according to the prosodic parameters to obtain the corresponding song audio.
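As a non-limiting illustration of what adjusting by prosodic parameters can involve, the sketch below computes, for one syllable, the time-stretch ratio and the pitch shift needed to move the read syllable onto the sung target. The function name and its parameters are assumptions made for this sketch; the disclosure obtains its prosodic parameters from a trained prosody model rather than from a closed-form calculation.

```python
import math


def prosody_adjustment(src_dur: float, src_f0: float,
                       tgt_dur: float, tgt_f0: float) -> tuple:
    """Return (stretch_ratio, pitch_shift_in_semitones) for one syllable.

    Durations are in seconds and fundamental frequencies in Hz; both
    frequencies must be positive (voiced).
    """
    stretch = tgt_dur / src_dur                      # > 1 lengthens the syllable
    semitones = 12.0 * math.log2(tgt_f0 / src_f0)    # 12 semitones per octave
    return stretch, semitones
```

A read syllable of 0.2 s at 220 Hz mapped onto a sung note of 0.4 s at 440 Hz would need to be stretched to twice its length and raised by one octave.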
From the technical standpoint, the path from a passage of speech read by the user to a personalized song is divided into two main processing stages: speech recognition and speech synthesis. The latter is further divided into two parts: acoustic model adaptation, and adjustment of prosodic parameters by the prosody model. Acoustic model adaptation, put simply, means collecting the user's voice data to form a trained model so that, after personalized learning, the generated song has the user's timbre. Adjustment of prosodic parameters by the prosody model controls the duration and pitch of each sound so that the melody stays on the beat and flows naturally. The prosodic parameters are then combined with the spectral parameters to generate the song: a segment performed in the user's timbre with a melody resembling the original recording. The whole process can complete within 1 to 2 seconds.
In some embodiments, after the lyrics audio is obtained, the method further comprises:
detecting whether the lyrics audio accurately corresponds to the lyrics. Illustratively, each character read by the user in the lyrics audio is checked for correctness; for example, the character "zhong" in the lyric "Love My China" may be misread as "zong", in which case a misread character is present.
If not, the wrong character in the lyrics audio is further determined; if so, the subsequent song synthesis steps S20-S50 continue.
User audio features corresponding to the current user are determined from the lyrics audio. Illustratively, audio features that represent the user are extracted from the lyrics audio read by the user, and the correct pronunciation is synthesized in the user's voice or the wrong pronunciation is adjusted to the correct one.
The wrong character is corrected according to the user audio features to obtain lyrics audio that accurately corresponds to the lyrics, and steps S20-S50 are then executed in order.
In this embodiment, when the user misreads the lyrics, the misread content is corrected automatically, so that song synthesis proceeds smoothly without the user having to read the lyrics again.
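As a non-limiting illustration of the detection step, the sketch below compares the syllables expected from the lyrics with the syllables returned by a speech recognizer and reports which expected positions were misread. The function name and the use of pinyin strings are assumptions made for this sketch; the disclosure does not specify how the comparison is carried out.

```python
from difflib import SequenceMatcher
from typing import List


def find_misread(expected: List[str], recognized: List[str]) -> List[int]:
    """Return indices into `expected` that were misread or skipped."""
    wrong = []
    matcher = SequenceMatcher(a=expected, b=recognized, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace": read as something else; "delete": not read at all.
        if tag in ("replace", "delete"):
            wrong.extend(range(i1, i2))
    return wrong
```

An empty result corresponds to the branch in which synthesis continues directly with steps S20-S50.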
In some embodiments, after the lyrics audio is obtained, the method further comprises:
detecting whether the lyrics audio accurately corresponds to the lyrics. Illustratively, each character read by the user in the lyrics audio is checked for correctness; for example, the character "zhong" in the lyric "Love My China" may be misread as "zong", in which case a misread character is present.
If not, the wrong character in the lyrics audio is further determined; if so, the subsequent song synthesis steps S20-S50 continue.
The identified wrong character is presented to the user, and the user is guided to read that character aloud on its own;
correction audio of the user reading the wrong character on its own is obtained;
the correct lyrics audio is determined from the correction audio and the lyrics audio, and steps S20-S50 are then executed in order.
In this embodiment, when the user misreads the lyrics, the error is identified automatically and the user is guided to re-read only the erroneous part rather than the whole lyrics. This both ensures that song synthesis proceeds smoothly and improves the user experience, and it avoids the prolonged overall synthesis time that re-reading the whole lyrics would cause.
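As a non-limiting illustration of determining the correct lyrics audio from the correction audio, the misread span may simply be replaced by the re-read samples. The function name and the sample-index interface are assumptions made for this sketch; a practical system would likely also smooth the joins, which is omitted here.

```python
from typing import List


def splice_correction(lyrics_audio: List[float], start: int, end: int,
                      correction: List[float]) -> List[float]:
    """Replace samples [start, end) of the original reading with the
    separately recorded correction, which may differ in length."""
    return lyrics_audio[:start] + correction + lyrics_audio[end:]
```

The `start` and `end` positions would come from the segmentation information of the misread character.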
In some embodiments, before the lyrics audio is obtained, the method further comprises:
obtaining user attribute information, the user attribute information including the user's gender and age;
generating a recommendation list of songs to be synthesized according to the user attribute information;
obtaining the lyrics audio then comprises:
determining the song to be synthesized according to the user's selection, and presenting the lyrics of that song to the user;
detecting and obtaining the lyrics audio of the user reading the lyrics of the song aloud.
In this embodiment, a suitable list of songs is recommended according to the user's attribute information, which helps the user find songs of interest quickly and improves the experience of the song synthesis operation.
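As a non-limiting illustration, a recommendation list may be produced by filtering the song library on the user's gender and age. The `Song` record, its fields and the filtering rule are assumptions made for this sketch; the disclosure states only that the list is generated from the user attribute information.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Song:
    title: str
    genders: FrozenSet[str]   # genders the song is tagged as suiting (hypothetical tag)
    min_age: int              # inclusive age range (hypothetical tag)
    max_age: int


def recommend(songs: List[Song], gender: str, age: int) -> List[str]:
    """Return titles whose tags match the user's gender and age."""
    return [
        s.title
        for s in songs
        if gender in s.genders and s.min_age <= age <= s.max_age
    ]
```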
As shown in Fig. 2, an embodiment of the present invention further provides a song synthesis system 200, comprising:
an audio acquisition program module 210, configured to obtain lyrics audio, the lyrics audio being a recording of the lyrics of the song to be synthesized read aloud;
a dry vocal acquisition program module 220, configured to obtain the target dry vocal corresponding to the song to be synthesized;
a feature acquisition program module 230, configured to obtain target audio features corresponding to the target dry vocal;
a feature adjustment program module 240, configured to adjust the audio features of the lyrics audio according to the target audio features to obtain corresponding song audio;
an audio synthesis program module 250, configured to combine the song audio with the corresponding background music to obtain the song.
The song synthesis system of the above embodiment may be used to perform the song synthesis method of the embodiments of the present invention and accordingly achieves the technical effects of that method, which are not repeated here. In the embodiments of the present invention, the relevant functional modules may be implemented by a hardware processor.
The present invention proposes a real-time, highly natural song synthesis scheme that supports any song. The scheme is divided into an offline part and an online part.
The offline part builds the song library data. First, a professional singer records the dry vocal and background music of each song to be synthesized, and the dry vocal is segmented to obtain the song segments used for synthesis. The initial and final of each character in the dry vocal are annotated, the fundamental frequency of the dry vocal is extracted (for a segment of audio data), and the fundamental frequency of the initials and finals is corrected. Some initials and finals are unvoiced and have a fundamental frequency of 0, while others are voiced and have a non-zero fundamental frequency; the fundamental frequency of plosives, for example, is always 0. Because current fundamental-frequency extraction tools may be inaccurate, the fundamental-frequency data of the initials and finals are adjusted.
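As a non-limiting illustration of adjusting the fundamental-frequency data, one common approach is to fill frames whose extracted value is 0 by linear interpolation between the neighbouring voiced frames. The choice of linear interpolation and the function name are assumptions made for this sketch; the disclosure states only that the data are adjusted.

```python
from typing import List


def interpolate_unvoiced(f0: List[float]) -> List[float]:
    """Fill zero (unvoiced or failed) frames of an F0 track.

    Interior gaps are linearly interpolated between the surrounding voiced
    frames; leading and trailing gaps copy the nearest voiced value.
    """
    out = list(f0)
    voiced = [i for i, value in enumerate(out) if value > 0.0]
    if not voiced:
        return out  # nothing to anchor on; leave the track unchanged
    for i in range(voiced[0]):
        out[i] = out[voiced[0]]
    for i in range(voiced[-1] + 1, len(out)):
        out[i] = out[voiced[-1]]
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for k in range(1, gap):
            out[left + k] = out[left] + (out[right] - out[left]) * k / gap
    return out
```

Whether a continuous track of this kind is wanted depends on the downstream model; a system that needs to keep plosives unvoiced would instead preserve the zeros for those segments.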
Here, the dry vocal refers to the singing voice of a song with the background music removed. The dry vocal need not be obtained only by recording; mature existing techniques may be used to separate the dry vocal and the background music from an existing song. For example, techniques such as echo cancellation (the method is not limited to this) can remove most of the background music, leaving the singing audio, that is, the dry vocal.
Segmentation may be performed per syllable, per phoneme, or per initial and final. For example, if an audio file contains the sung lyric "Love My China", the fundamental frequency of the whole phrase is obtained first, and the initial and final information of its four characters ("ai", "wo", "zhong", "hua") is then segmented manually (segmentation by syllable or by phoneme works the same way).
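As a non-limiting illustration of obtaining the fundamental frequency of each character once segmentation is available, the sketch below averages the voiced frames of a frame-level F0 track inside each character's boundaries. The function name, the frame-index interface and the use of a mean are assumptions made for this sketch.

```python
from statistics import mean
from typing import List, Optional, Tuple


def per_char_f0(
    f0_track: List[float],                  # one F0 value (Hz) per frame; 0.0 marks unvoiced
    segments: List[Tuple[str, int, int]],   # (character, start_frame, end_frame), end exclusive
) -> List[Tuple[str, Optional[float]]]:
    """Return the mean voiced F0 of each segmented character.

    A character with no voiced frames yields None.
    """
    result = []
    for char, start, end in segments:
        voiced = [f for f in f0_track[start:end] if f > 0.0]
        result.append((char, mean(voiced) if voiced else None))
    return result
```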
The online part begins with a data preprocessing stage. Speech recognition is used to check the audio data (for example, the lyrics audio of the user reading the lyrics of the song to be synthesized) and detect whether it matches the lyrics, and a deep learning model trained on big data is used to obtain the phoneme segmentation information of the user's audio.
The audio data then undergoes noise reduction, endpoint detection, UV (unvoiced/voiced) fundamental-frequency repair and similar processing. The phoneme segmentation information, the song features from the song library and the audio data are fed together into a trained adaptive model for data matching, so that the user's audio features match the song features. (The adaptive model adjusts the features of the user's read-aloud audio to fit the melody of the song; its training data consists of a large amount of accurately annotated voice data and segmentation annotations.)
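As a non-limiting illustration of endpoint detection, the sketch below marks the span of the recording whose short-time energy exceeds a threshold. The energy criterion, the function name and the parameter values are assumptions made for this sketch; the disclosure does not specify which endpoint detection method is used.

```python
from typing import List, Optional, Tuple


def detect_endpoints(samples: List[float], frame_len: int,
                     threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the active region, end exclusive.

    A frame is active when its mean squared amplitude exceeds `threshold`.
    Returns None when no frame is active.
    """
    active_starts = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            active_starts.append(i)
    if not active_starts:
        return None
    return active_starts[0], min(active_starts[-1] + frame_len, len(samples))
```

Samples outside the returned span can be treated as leading or trailing silence and trimmed before further processing.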
The transformed audio features are post-processed by signal processing or by a big-data-driven method so that the transformed audio is closer in pronunciation to natural human singing, and the result is finally merged with the background music.
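As a non-limiting illustration of the final merge with the background music, the sketch below sums the two tracks sample by sample with separate gains and clips the result to the range [-1, 1]. The function name, the gain values and the hard clipping are assumptions made for this sketch.

```python
from typing import List


def mix_with_background(vocal: List[float], background: List[float],
                        vocal_gain: float = 1.0,
                        bg_gain: float = 0.5) -> List[float]:
    """Mix two mono tracks of possibly different lengths.

    The shorter track is treated as silent past its end, and the sum is
    hard-clipped to [-1.0, 1.0].
    """
    length = max(len(vocal), len(background))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0.0
        b = background[i] if i < len(background) else 0.0
        sample = vocal_gain * v + bg_gain * b
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```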
Based on the present invention, a "read-to-sing" interaction can be designed and implemented: the user inputs audio data of the lyrics text, including but not limited to natural speech, synthesized speech, concatenated speech and recorded speech, into the system, which outputs a song synthesized in the user's timbre. The synthesized audio supports, but is not limited to, scoring incentives, song greetings shared to friends' feeds, and song voting.
The method of the present invention enables "read-to-sing" synthesis for any song. A signal-driven or big-data-driven speech model predicts and processes the sample points of the user's audio signal; while preserving the user's timbre and the semantics, the user's voice is adjusted to the pitch and syllables of the song, which supports "read-to-sing" synthesis of any song.
As shown in Fig. 3, the operating flow of the "read-to-sing" product of the present invention comprises the following steps:
Step 1: enter the program interface and fill in or select user information (including but not limited to gender and age) to generate the song library list.
Step 2: select the song to be synthesized and upload audio, including but not limited to natural speech, synthesized speech, concatenated speech and recorded speech.
Step 3: check the audio data quality (this function is optional); if the audio is correct, song synthesis proceeds, and if it is wrong, the user records again.
Step 4: the user obtains the synthesized song audio and can perform operations on it, including but not limited to playing, sharing, voting on and downloading the song.
As shown in Fig. 4, the song synthesis technique used in "read-to-sing" in the present invention comprises the following steps:
Step 1: load the song resources of the song library, including but not limited to lyrics, musical scores, signal-based or general big-data models, adaptive models for different user groups, song types and environments, and empirical tuning parameters.
Step 2: data preprocessing, including but not limited to noise reduction, speech recognition, endpoint detection, speaker characteristic analysis (gender and so on), audio quality checking and language model adaptation. This achieves text proofreading of the audio data, endpoint detection of silent segments, normalization of excitation segments, and so on.
Step 3: using a signal-driven or big-data-driven speech signal model, through feature processing and prediction or through processing or prediction of signal sample points, convert the pitch of the user's audio into the pitch and corresponding syllables of the specified song while retaining the speaker's timbre and the semantics, so that the result better matches the actual score of the song.
Step 4: post-process the transformed audio by signal processing or by a big-data-driven method so that it is closer in pronunciation to natural human singing.
It should be noted that, for simplicity of description, the foregoing method embodiments are each described as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention. In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
In some embodiments, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing one or more programs comprising executable instructions, the executable instructions being readable and executable by an electronic device (including but not limited to a computer, a server or a network device) to perform any of the above song synthesis methods of the present invention.
In some embodiments, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the above song synthesis methods.
In some embodiments, an embodiment of the present invention further provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the song synthesis method.
In some embodiments, an embodiment of the present invention further provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the song synthesis method.
Fig. 5 is a schematic diagram of the hardware structure of an electronic device for performing the song synthesis method according to another embodiment of the present application. As shown in Fig. 5, the device includes:
one or more processors 510 and a memory 520; Fig. 5 takes one processor 510 as an example.
The device for performing the song synthesis method may further include an input device 530 and an output device 540.
The processor 510, the memory 520, the input device 530, and the output device 540 may be connected by a bus or by other means; Fig. 5 takes a bus connection as an example.
The memory 520, as a non-volatile computer-readable storage medium, can store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the song synthesis method in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the server, that is, implements the song synthesis method of the above method embodiments.
The memory 520 may include a program storage area and a data storage area. The program storage area can store an operating system and an application program required by at least one function; the data storage area can store data created through use of the song synthesis apparatus. In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 520 optionally includes memory located remotely from the processor 510, and such remote memory can be connected to the song synthesis apparatus through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input device 530 can receive input numeric or character information and generate signals related to user settings and function control of the song synthesis apparatus. The output device 540 may include a display device such as a display screen.
The one or more modules are stored in the memory 520 and, when executed by the one or more processors 510, perform the song synthesis method of any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: these devices have mobile communication functions and mainly aim to provide voice and data communication. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capabilities, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, e.g., iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction functions.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes over the related art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, or a network device) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A song synthesis method, comprising:
obtaining lyrics audio, the lyrics audio being audio of the lyrics of a song to be synthesized read aloud;
obtaining a target song dry vocal corresponding to the current song to be synthesized;
obtaining target audio features corresponding to the target song dry vocal;
adjusting audio features of the lyrics audio according to the target audio features to obtain corresponding song audio; and
synthesizing the song audio with corresponding background music to obtain a song.
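The five steps of claim 1 can be read as a small pipeline. The following is a minimal Python sketch under stated assumptions, not the patented implementation: audio is represented as a plain list of mono samples in [-1.0, 1.0], the feature extraction and adjustment steps are passed in as callables because the claim does not fix how they work, and the names `synthesize_song`, `mix_with_backing`, and `backing_gain` are hypothetical.

```python
from typing import Callable, List, TypeVar

Audio = List[float]      # assumption: mono samples in [-1.0, 1.0]
Features = TypeVar("Features")


def mix_with_backing(vocal: Audio, backing: Audio, backing_gain: float = 0.5) -> Audio:
    """Sum the vocal and the (attenuated) backing track sample by sample.

    The shorter track is treated as silence past its end, and each mixed
    sample is clamped to [-1.0, 1.0].
    """
    length = max(len(vocal), len(backing))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0.0
        b = backing[i] if i < len(backing) else 0.0
        mixed.append(max(-1.0, min(1.0, v + backing_gain * b)))
    return mixed


def synthesize_song(
    lyrics_audio: Audio,
    dry_vocal: Audio,
    backing: Audio,
    extract_features: Callable[[Audio], Features],
    adjust: Callable[[Audio, Features], Audio],
) -> Audio:
    """Chain the claimed steps: features of the dry vocal drive the adjustment
    of the spoken lyrics, and the result is mixed with the background music."""
    target_features = extract_features(dry_vocal)
    song_audio = adjust(lyrics_audio, target_features)
    return mix_with_backing(song_audio, backing)
```

Passing the two model-dependent steps as parameters keeps the sketch independent of any particular feature set or adjustment technique.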
2. The method according to claim 1, further comprising building a library in advance, the library storing dry vocals of a plurality of songs to be synthesized;
wherein obtaining the target song dry vocal corresponding to the current song to be synthesized comprises: obtaining, from the library, the target song dry vocal corresponding to the song to be synthesized.
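As an illustration of the pre-built library in claim 2, the sketch below stores one entry per song and looks it up by identifier. It is only a sketch under assumptions: the claim does not specify the storage format, so the in-memory dictionary, the `DryVocalEntry` fields, and the file paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DryVocalEntry:
    song_id: str          # hypothetical unique key for a song to be synthesized
    title: str
    dry_vocal_path: str   # hypothetical location of the unaccompanied vocal
    backing_path: str     # hypothetical location of the background music


class DryVocalLibrary:
    """In-memory stand-in for the pre-built dry vocal library."""

    def __init__(self) -> None:
        self._entries: Dict[str, DryVocalEntry] = {}

    def add(self, entry: DryVocalEntry) -> None:
        self._entries[entry.song_id] = entry

    def get(self, song_id: str) -> DryVocalEntry:
        """Return the entry for the requested song, or raise KeyError."""
        try:
            return self._entries[song_id]
        except KeyError:
            raise KeyError(f"no dry vocal stored for song {song_id!r}") from None
```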
3. The method according to claim 2, further comprising obtaining audio segmentation information of the user audio according to the lyrics audio;
wherein adjusting the audio features of the lyrics audio according to the target audio features to obtain the corresponding song audio comprises:
inputting the target audio features, the audio segmentation information, and the lyrics audio into an adaptive model to obtain the corresponding song audio.
4. The method according to claim 3, wherein inputting the target audio features, the audio segmentation information, and the lyrics audio into the adaptive model to obtain the corresponding song audio comprises:
inputting the lyrics audio and the audio segmentation information into a pre-trained acoustic adaptive model to perform adaptive processing on the lyrics audio;
inputting the target audio features into a pre-trained song prosody model to obtain prosodic parameters; and
adjusting the adaptively processed lyrics audio according to the prosodic parameters to obtain the corresponding song audio.
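The last step of claim 4 can be sketched for one possible choice of prosodic parameters. The example below assumes, purely for illustration, that the prosody model outputs a target duration and target pitch per syllable and that the segmentation information gives each spoken syllable's time span and average pitch; the claim itself fixes neither. The sketch only computes the time-stretch ratio and pitch shift that a signal-processing stage would then apply; the acoustic adaptive model and the prosody model are not modeled, and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    unit: str        # syllable label from the segmentation information
    start_s: float   # start time in the spoken lyrics audio
    end_s: float     # end time in the spoken lyrics audio
    pitch_hz: float  # average spoken pitch of the segment


@dataclass
class ProsodyTarget:
    duration_s: float  # assumed output of the song prosody model
    pitch_hz: float    # assumed output of the song prosody model


def plan_adjustment(
    segments: List[Segment], targets: List[ProsodyTarget]
) -> List[Tuple[str, float, float]]:
    """Return (unit, time-stretch ratio, pitch shift in semitones) per segment."""
    if len(segments) != len(targets):
        raise ValueError("segments and targets must align one to one")
    plan = []
    for seg, tgt in zip(segments, targets):
        spoken = seg.end_s - seg.start_s
        if spoken <= 0 or seg.pitch_hz <= 0 or tgt.pitch_hz <= 0:
            raise ValueError(f"invalid duration or pitch for unit {seg.unit!r}")
        stretch = tgt.duration_s / spoken                        # >1 lengthens the syllable
        semitones = 12.0 * math.log2(tgt.pitch_hz / seg.pitch_hz)
        plan.append((seg.unit, stretch, semitones))
    return plan
```

For example, a syllable spoken in 0.25 s at 220 Hz with a target of 0.5 s at 440 Hz yields a stretch ratio of 2 and a shift of one octave (12 semitones).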
5. The method according to claim 3 or 4, wherein the audio segmentation information comprises phoneme segmentation information and/or syllable segmentation information and/or initial and final segmentation information.
6. The method according to claim 1, further comprising, after obtaining the lyrics audio:
detecting whether the lyrics audio accurately corresponds to the corresponding lyrics;
if not, further determining the misread words in the lyrics audio;
determining, according to the lyrics audio, user audio features corresponding to the current user; and
correcting the misread words according to the user audio features to obtain lyrics audio that accurately corresponds to the corresponding lyrics.
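One way to realize the detection steps of claim 6 is to align a transcript of the user's reading against the reference lyrics. The sketch below assumes that a speech recognizer has already produced a word list from the lyrics audio (the recognizer is not shown and is not specified by the claim) and uses Python's standard `difflib.SequenceMatcher` for the alignment. The function name `find_misread_words` is hypothetical, and the correction step based on user audio features is not covered.

```python
from difflib import SequenceMatcher
from typing import List


def find_misread_words(lyrics: List[str], recognized: List[str]) -> List[int]:
    """Return indices into `lyrics` of words that were misread or skipped.

    Words the user added on top of the lyrics are ignored, since they do not
    correspond to any position in the reference text.
    """
    matcher = SequenceMatcher(a=lyrics, b=recognized, autojunk=False)
    wrong: List[int] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            wrong.extend(range(i1, i2))
    return wrong
```

An empty result means the reading matches the lyrics; otherwise the returned indices identify the words that would need to be corrected.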
7. The method according to claim 1, further comprising, before obtaining the lyrics audio:
obtaining user attribute information, the user attribute information including the user's gender and age;
generating a recommendation list of songs to be synthesized according to the user attribute information;
wherein obtaining the lyrics audio comprises:
determining the song to be synthesized according to the user's selection operation, and presenting the lyrics of the song to be synthesized to the user; and
detecting and obtaining the lyrics audio of the user reading the lyrics of the song to be synthesized aloud.
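The recommendation step of claim 7 could be as simple as filtering and ranking a catalog by the user's attributes. The sketch below is a hypothetical rule-based version: the claim does not say how the list is generated, so the `CatalogSong` fields, the age-range filter, and the gender-match ranking are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CatalogSong:
    title: str
    age_range: Tuple[int, int]   # hypothetical inclusive age range the song suits
    vocal_gender: Optional[str]  # hypothetical tag; None means it suits any user


def recommend(catalog: List[CatalogSong], gender: str, age: int, limit: int = 10) -> List[str]:
    """Return up to `limit` titles, age-filtered, with gender matches first."""
    scored = []
    for song in catalog:
        low, high = song.age_range
        if not low <= age <= high:
            continue
        score = 1 if song.vocal_gender == gender else 0
        scored.append((score, song.title))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score, then title
    return [title for _score, title in scored[:limit]]
```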
8. A song synthesis system, comprising:
an audio acquisition program module, configured to obtain lyrics audio, the lyrics audio being audio of the lyrics of a song to be synthesized read aloud;
a dry vocal acquisition program module, configured to obtain a target song dry vocal corresponding to the current song to be synthesized;
a feature acquisition program module, configured to obtain target audio features corresponding to the target song dry vocal;
a feature adjustment program module, configured to adjust audio features of the lyrics audio according to the target audio features to obtain corresponding song audio; and
an audio synthesis program module, configured to synthesize the song audio with corresponding background music to obtain a song.
9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method of any one of claims 1-7.
10. A storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method of any one of claims 1-7.
CN201910188123.5A 2019-01-18 2019-03-13 Song synthesis method and system Active CN109949783B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910048355 2019-01-18
CN2019100483550 2019-01-18

Publications (2)

Publication Number Publication Date
CN109949783A true CN109949783A (en) 2019-06-28
CN109949783B CN109949783B (en) 2021-01-29

Family

ID=67009722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910188123.5A Active CN109949783B (en) 2019-01-18 2019-03-13 Song synthesis method and system

Country Status (1)

Country Link
CN (1) CN109949783B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110600034A (en) * 2019-09-12 2019-12-20 广州酷狗计算机科技有限公司 Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium
CN111488485A (en) * 2020-04-16 2020-08-04 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device
CN111554267A (en) * 2020-04-23 2020-08-18 北京字节跳动网络技术有限公司 Audio synthesis method and device, electronic equipment and computer readable medium
CN112164387A (en) * 2020-09-22 2021-01-01 腾讯音乐娱乐科技(深圳)有限公司 Audio synthesis method and device, electronic equipment and computer-readable storage medium
CN112289300A (en) * 2020-10-28 2021-01-29 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN112309351A (en) * 2019-07-31 2021-02-02 武汉Tcl集团工业研究院有限公司 Song generation method and device, intelligent terminal and storage medium
CN112331234A (en) * 2020-10-27 2021-02-05 北京百度网讯科技有限公司 Song multimedia synthesis method and device, electronic equipment and storage medium
CN112417201A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Audio information pushing method and system, electronic equipment and computer readable medium
CN112669849A (en) * 2020-12-18 2021-04-16 百度国际科技(深圳)有限公司 Method, apparatus, device and storage medium for outputting information
CN112750422A (en) * 2020-12-23 2021-05-04 出门问问(苏州)信息科技有限公司 Singing voice synthesis method, device and equipment
CN113160849A (en) * 2021-03-03 2021-07-23 腾讯音乐娱乐科技(深圳)有限公司 Singing voice synthesis method and device, electronic equipment and computer readable storage medium
CN113488007A (en) * 2021-07-07 2021-10-08 北京灵动音科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN113539215A (en) * 2020-12-29 2021-10-22 腾讯科技(深圳)有限公司 Music style conversion method, device, equipment and storage medium
CN113555001A (en) * 2021-07-23 2021-10-26 平安科技(深圳)有限公司 Singing voice synthesis method and device, computer equipment and storage medium
CN114863898A (en) * 2021-02-04 2022-08-05 广州汽车集团股份有限公司 Vehicle karaoke audio processing method and system and storage medium
CN115910002A (en) * 2023-01-06 2023-04-04 之江实验室 Audio generation method, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105788589A (en) * 2016-05-04 2016-07-20 腾讯科技(深圳)有限公司 Audio data processing method and device
CN106971704A (en) * 2017-04-27 2017-07-21 维沃移动通信有限公司 A kind of audio-frequency processing method and mobile terminal
CN108538302A (en) * 2018-03-16 2018-09-14 广州酷狗计算机科技有限公司 The method and apparatus of Composite tone
CN108877766A (en) * 2018-07-03 2018-11-23 百度在线网络技术(北京)有限公司 Song synthetic method, device, equipment and storage medium

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112309351A (en) * 2019-07-31 2021-02-02 武汉Tcl集团工业研究院有限公司 Song generation method and device, intelligent terminal and storage medium
CN112417201A (en) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 Audio information pushing method and system, electronic equipment and computer readable medium
CN110600034A (en) * 2019-09-12 2019-12-20 广州酷狗计算机科技有限公司 Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium
CN110600034B (en) * 2019-09-12 2021-12-03 广州酷狗计算机科技有限公司 Singing voice generation method, singing voice generation device, singing voice generation equipment and storage medium
CN111488485B (en) * 2020-04-16 2023-11-17 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device
CN111488485A (en) * 2020-04-16 2020-08-04 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device
CN111554267A (en) * 2020-04-23 2020-08-18 北京字节跳动网络技术有限公司 Audio synthesis method and device, electronic equipment and computer readable medium
CN112164387A (en) * 2020-09-22 2021-01-01 腾讯音乐娱乐科技(深圳)有限公司 Audio synthesis method and device, electronic equipment and computer-readable storage medium
CN112331234A (en) * 2020-10-27 2021-02-05 北京百度网讯科技有限公司 Song multimedia synthesis method and device, electronic equipment and storage medium
US20210407479A1 (en) * 2020-10-27 2021-12-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for song multimedia synthesis, electronic device and storage medium
CN112289300B (en) * 2020-10-28 2024-01-09 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
CN112289300A (en) * 2020-10-28 2021-01-29 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and device, electronic equipment and computer readable storage medium
WO2022089097A1 (en) * 2020-10-28 2022-05-05 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and apparatus, electronic device, and computer-readable storage medium
CN112669849A (en) * 2020-12-18 2021-04-16 百度国际科技(深圳)有限公司 Method, apparatus, device and storage medium for outputting information
CN112750422B (en) * 2020-12-23 2023-01-31 出门问问创新科技有限公司 Singing voice synthesis method, device and equipment
CN112750422A (en) * 2020-12-23 2021-05-04 出门问问(苏州)信息科技有限公司 Singing voice synthesis method, device and equipment
CN113539215A (en) * 2020-12-29 2021-10-22 腾讯科技(深圳)有限公司 Music style conversion method, device, equipment and storage medium
CN113539215B (en) * 2020-12-29 2024-01-12 腾讯科技(深圳)有限公司 Music style conversion method, device, equipment and storage medium
CN114863898A (en) * 2021-02-04 2022-08-05 广州汽车集团股份有限公司 Vehicle karaoke audio processing method and system and storage medium
CN113160849A (en) * 2021-03-03 2021-07-23 腾讯音乐娱乐科技(深圳)有限公司 Singing voice synthesis method and device, electronic equipment and computer readable storage medium
CN113160849B (en) * 2021-03-03 2024-05-14 腾讯音乐娱乐科技(深圳)有限公司 Singing voice synthesizing method, singing voice synthesizing device, electronic equipment and computer readable storage medium
CN113488007A (en) * 2021-07-07 2021-10-08 北京灵动音科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN113488007B (en) * 2021-07-07 2024-06-11 北京灵动音科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN113555001A (en) * 2021-07-23 2021-10-26 平安科技(深圳)有限公司 Singing voice synthesis method and device, computer equipment and storage medium
CN115910002A (en) * 2023-01-06 2023-04-04 之江实验室 Audio generation method, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN109949783B (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN109949783A (en) Song synthetic method and system
CN108962217B (en) Speech synthesis method and related equipment
CN108806655B (en) Automatic generation of songs
CN108806656B (en) Automatic generation of songs
JP5293460B2 (en) Database generating apparatus for singing synthesis and pitch curve generating apparatus
JP6004358B1 (en) Speech synthesis apparatus and speech synthesis method
CN110148427A (en) Audio-frequency processing method, device, system, storage medium, terminal and server
CN105206258A (en) Generation method and device of acoustic model as well as voice synthetic method and device
CN107516511A (en) The Text To Speech learning system of intention assessment and mood
CN105304080A (en) Speech synthesis device and speech synthesis method
US9355634B2 (en) Voice synthesis device, voice synthesis method, and recording medium having a voice synthesis program stored thereon
CN108831437A (en) A kind of song generation method, device, terminal and storage medium
JP2011028230A (en) Apparatus for creating singing synthesizing database, and pitch curve generation apparatus
CN104040618A (en) System and method for producing a more harmonious musical accompaniment and for applying a chain of effects to a musical composition
EP3975167A1 (en) Electronic musical instrument, control method for electronic musical instrument, and storage medium
CN112164379A (en) Audio file generation method, device, equipment and computer readable storage medium
US20150149178A1 (en) System and method for data-driven intonation generation
JP2017111372A (en) Voice synthesis method, voice synthesis control method, voice synthesis device, and voice synthesis controller
CN110246489A (en) Audio recognition method and system for children
CN108172211B (en) Adjustable waveform splicing system and method
Goto Singing information processing
KR20220165666A (en) Method and system for generating synthesis voice using style tag represented by natural language
CN105895079A (en) Voice data processing method and device
JP2013164609A (en) Singing synthesizing database generation device, and pitch curve generation device
JP2017097332A (en) Voice synthesizer and voice synthesizing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee before: AI SPEECH Ltd.