CN111506767A

CN111506767A - Song word filling processing method and device, electronic equipment and storage medium

Info

Publication number: CN111506767A
Application number: CN202010143514.8A
Authority: CN
Inventors: 胡昌然; 吴健
Original assignee: Beijing Smart Sound Technology Co ltd
Current assignee: Beijing Smart Sound Technology Co ltd
Priority date: 2020-03-04
Filing date: 2020-03-04
Publication date: 2020-08-07

Abstract

The disclosure relates to a processing method and device for song word filling, an electronic device and a storage medium. The method comprises: in response to a song recomposition operation, separating a musical composition consisting of a melody, and original lyrics, from the song to be processed; obtaining target lyrics according to the original lyrics and a neural network obtained according to the desired target of the recomposition operation; and obtaining a target song according to the musical composition and the target lyrics, wherein the target song and the song to be processed have similar lyric attributes in at least one of lyric structure, lyric word number and lyric vowel. Because the target lyrics are generated by the neural network and the target song is then obtained from the musical composition and the target lyrics, the disclosure provides a more efficient and more intelligent word filling mode than manual filling.

Description

Song word filling processing method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of data automation processing based on deep learning, and in particular, to a method and an apparatus for processing song word filling, an electronic device, and a storage medium.

Background

In data processing application scenarios, song word filling means that the lyrics of an existing or given song are rewritten while the musical part of the song is kept unchanged, so that a new song is produced that can be sung to the melody of the original musical composition.

Conventional song word filling relies largely on manual filling by a user. With the development of technologies such as artificial intelligence, machine learning and deep learning, adopting a word filling mode that is more efficient and more intelligent than manual filling is the future trend. However, there is no effective solution for this in the related art.

Disclosure of Invention

In view of this, the present disclosure provides a technical solution for song word filling processing.

According to an aspect of the present disclosure, there is provided a method for processing song word filling, the method including:

in response to a song recomposition operation, separating a musical composition consisting of a melody, and original lyrics, from the song to be processed;

obtaining target lyrics according to the original lyrics and a neural network obtained according to the desired target of the recomposition operation;

and obtaining a target song according to the musical composition and the target lyrics, wherein the target song and the song to be processed have similar lyric attributes in at least one of lyric structure, lyric word number and lyric vowel.

In a possible implementation manner, the desired target of the recomposition operation matches the target lyrics, and the obtaining of the target lyrics according to the original lyrics and the neural network obtained according to the desired target of the recomposition operation comprises:

extracting the similar lyric attribute for obtaining the target lyric from the original lyric;

and obtaining the target lyrics according to the extracted similar lyric attributes and the neural network.

In a possible implementation manner, the extracting the similar lyric attribute for obtaining the target lyric from the original lyric includes:

detecting whether a repetition relation exists between different sentences in the original lyrics, and extracting the lyric structure of the original lyrics according to the detection result;

counting the number of words in each sentence of the original lyrics, and extracting the lyric word number of the original lyrics according to the counting result;

and extracting the lyric vowel of the original lyrics according to the pinyin information of the last character of each sentence in the original lyrics.

In a possible implementation manner, the obtaining the target lyric according to the extracted similar lyric attribute and the neural network includes:

generating each lyric according to the lyric structure of the original lyric, the lyric word number of the original lyric, the lyric vowel of the original lyric and the neural network;

and combining the lyrics of each sentence to obtain the target lyrics.

In a possible implementation manner, before generating each lyric according to the lyric structure of the original lyric, the lyric word number of the original lyric, the lyric vowel of the original lyric, and the neural network, the method further includes:

in the case where the lyric structure of the original lyrics is of a first relation type, adopting different neural networks according to preset conditions;

the preset conditions include: whether to use a previous lyric of the original lyric as an input of a current neural network to generate a next lyric.

In a possible implementation manner, the generating each lyric according to the lyric structure of the original lyric, the lyric word number of the original lyric, the lyric vowel of the original lyric and the neural network includes:

inputting the lyric word number of the ith lyric in the original lyrics and the lyric vowel of the last character of the ith lyric into a first neural network;

outputting the ith sentence of the target lyrics through the first neural network, where i = 1;

the first neural network is composed of a first encoder and a first decoder;

inputting the lyric word number of the ith lyric in the original lyrics, the lyric vowel of the last character of the ith lyric and the (i-1)th lyric in the target lyrics into a second neural network, wherein the (i-1)th lyric in the target lyrics is generated in correspondence with the (i-1)th lyric in the original lyrics;

outputting the ith sentence of the target lyrics through the second neural network, where i > 1;

the second neural network is composed of a second encoder and a second decoder.

According to an aspect of the present disclosure, there is provided a processing apparatus for song word filling, the apparatus including:

a response unit for resolving a musical composition composed of a melody and original lyrics from the song to be processed in response to the song recomposition operation;

the lyric processing unit is used for obtaining target lyrics according to the original lyrics and a neural network obtained by the recomposing operation expected target;

and the song processing unit is used for obtaining a target song according to the music and the target lyrics, and the target song and the song to be processed have lyric attributes which are similar to at least one of a lyric structure, lyric word number and lyric vowel.

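In a possible implementation manner, the desired target of the recomposition operation matches the target lyrics, and the lyric processing unit is configured to extract, from the original lyrics, the similar lyric attributes for obtaining the target lyrics, and to obtain the target lyrics according to the extracted similar lyric attributes and the neural network.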

In a possible implementation manner, the lyric processing unit is configured to:

and combining the lyrics of each sentence to obtain the target lyrics.

In a possible implementation manner, the apparatus further includes a neural network selecting unit, configured to:

the first neural network is composed of a first encoder and a first decoder.

In a possible implementation manner, the lyric processing unit is further configured to:

the second neural network is composed of a second encoder and a second decoder.

According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform any of the methods described above.

According to an aspect of the present disclosure, there is provided a computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any of the above.

In the present disclosure, in response to a song recomposition operation, a musical composition consisting of a melody, and original lyrics, are separated from the song to be processed; target lyrics are obtained according to the original lyrics and a neural network obtained according to the desired target of the recomposition operation; and a target song is obtained according to the musical composition and the target lyrics, wherein the target song and the song to be processed have similar lyric attributes in at least one of lyric structure, lyric word number and lyric vowel. Because the target lyrics are generated by the neural network and the target song is then obtained from the musical composition and the target lyrics, the disclosure provides a more efficient and more intelligent word filling mode than manual filling.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.

Fig. 1 shows a schematic structural diagram of a Seq2Seq neural network model according to an embodiment of the present disclosure.

Fig. 2 shows a flow diagram of a processing method of song word filling according to an embodiment of the present disclosure.

Fig. 3 shows a process flow diagram of song word filling according to an embodiment of the disclosure.

Fig. 4 shows a schematic structural diagram of a neural network according to an embodiment of the present disclosure.

Fig. 5 shows a schematic diagram of generating lyrics based on a neural network according to an embodiment of the present disclosure.

Fig. 6 shows a block diagram of a processing apparatus for song word filling according to an embodiment of the present disclosure.

Fig. 7 shows a block diagram of an electronic device according to an embodiment of the disclosure.

Fig. 8 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used herein exclusively to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the group consisting of A, B and C.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.

Song word filling means that, for an existing or given song, the lyrics are rewritten while the musical part of the song is kept unchanged, so that the result is a musical composition that can be sung to the original melody. Word filling is a widely used way of creating lyrics, and many classic Chinese popular songs are cover works of this kind: for example, "Future" by the Japanese band Kiroro was covered as "Later" by Rene Liu (Liu Ruoying), and "The Most Important Thing" by the Daiji MAN band was covered as "Red Sun" by Hacken Lee. Compared with writing lyrics from scratch, recomposing a song is subject to more constraints: although a new theme may be chosen in place of the content of the original lyrics, the works before and after word filling usually have similar word counts and vowels in order to keep the original song singable.

Following rule-based and plan-based natural language generation, neural-network-based generation has become mainstream with the recent rise of big data and the development of neural network algorithms. Fig. 1 shows a schematic structural diagram of a Seq2Seq neural network model according to an embodiment of the present disclosure. As shown in Fig. 1, a neural network such as a sequence-to-sequence (Seq2Seq) network can be used in the present disclosure. Seq2Seq is a kind of recurrent neural network (RNN), and an RNN is a neural network that predicts a later state from a previous state and can memorize the previous state. Seq2Seq refers to the case in which both the input and the output of an end-to-end neural network are variable-length sequences. Seq2Seq can be implemented with an encoder combined with a decoder, and both the encoder and the decoder can be implemented with RNNs. An input sequence A is used as information to generate an output sequence B, where A and B are different variable-length sequences.

The Seq2Seq neural network can be used for tasks such as machine translation, speech recognition, question-answering robots and automatic summarization. In the present disclosure, a Seq2Seq neural network may be trained on the original lyrics and target lyrics of existing or given songs. When the trained Seq2Seq neural network is applied, the target lyrics can be obtained automatically from the network and the original lyrics, and are used to recompose the lyrics of the song while the musical part of the existing or given song is kept unchanged. In this automatic word filling mode, as distinct from manual filling, a new song that follows the melody of the original song is obtained, and the new song has lyric attributes, such as lyric structure, lyric word number and lyric vowel, similar to those of the existing or given song.

Neural networks of the present disclosure include, but are not limited to, a Seq2Seq neural network; any neural network comprising an encoder in combination with a decoder is within the scope of the present disclosure.

Fig. 2 is a flowchart of a processing method for song word filling according to an embodiment of the present disclosure. The method is applied to a processing apparatus for song word filling; for example, the processing apparatus may be deployed in a terminal device, a server or another processing device to perform automated data processing such as lyric word filling. The terminal device may be a User Equipment (UE), a mobile device, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the processing method may be implemented by a processor calling computer readable instructions stored in a memory. As shown in Fig. 2, the process includes:

Step S101: in response to a song recomposition operation, separating a musical composition consisting of a melody, and original lyrics, from the song to be processed.

Step S102: obtaining target lyrics according to the original lyrics and a neural network obtained according to the desired target of the recomposition operation.

Step S103: obtaining a target song according to the musical composition and the target lyrics, wherein the target song and the song to be processed have similar lyric attributes in at least one of lyric structure, lyric word number and lyric vowel.

With the present disclosure, the target lyrics are obtained according to the original lyrics and the neural network obtained according to the desired target of the recomposition operation, and the target song is then obtained according to the musical composition and the target lyrics. In other words, the original lyrics of any existing or given song can be recomposed fully automatically, generating new lyrics whose content differs from the original lyrics while their lyric structure, word number and vowel are consistent with the original lyrics.

In one example, in response to a song recomposition operation, the song to be subjected to word filling may be obtained by matching a song name against an established song database. The original lyrics of the song are acquired, and the lyric attributes required for generating the target lyrics, such as the lyric structure, lyric word number and lyric vowel, are extracted from the original lyrics. The desired target of the recomposition operation required to generate the target lyrics is screened out, and a neural network is trained according to that desired target, which corresponds to the target lyrics. The target lyrics are then generated sentence by sentence according to the original lyrics and the trained neural network, and the generated sentences are combined according to the extracted lyric attributes of the original lyrics, such as the lyric structure, lyric word number and lyric vowel, to finally obtain the target lyrics.

In a possible implementation, the desired target of the recomposition operation matches the target lyrics, and obtaining the target lyrics according to the original lyrics and the neural network obtained according to the desired target of the recomposition operation may include: extracting, from the original lyrics, the similar lyric attributes for obtaining the target lyrics, where the similar lyric attributes comprise at least one of a lyric structure, a lyric word number and a lyric vowel; and obtaining the target lyrics according to the extracted similar lyric attributes and the neural network.

In a possible implementation manner, the lyric structure, the lyric word number and the lyric vowel may be extracted as follows: detecting whether a repetition relation exists between different sentences in the original lyrics, and extracting the lyric structure of the original lyrics according to the detection result; counting the number of words in each sentence of the original lyrics, and extracting the lyric word number of the original lyrics according to the counting result; and extracting the lyric vowel of the original lyrics, namely the vowel of the last character of each sentence, according to the pinyin information of the last character of each sentence in the original lyrics.

In a possible implementation mode, the extracted lyric structure, the lyric word number and the lyric vowel are combined with each generated lyric, so that the final target lyric can be obtained. Namely: obtaining the target lyrics according to the extracted similar lyric attributes and the neural network, wherein the obtaining of the target lyrics comprises the following steps: generating each lyric according to the lyric structure of the original lyric, the lyric word number of the original lyric, the lyric vowel of the original lyric and the neural network; and combining the lyrics of each sentence to obtain the target lyrics.

In a possible implementation manner, before generating each lyric according to the lyric structure of the original lyric, the lyric word number of the original lyric, the lyric vowel of the original lyric, and the neural network, the method further includes: adopting different neural networks according to preset conditions when the lyric structure of the original lyric is of a first relation type (such as the original type). The preset conditions include: whether to use a previous lyric of the original lyric as an input of a current neural network to generate a next lyric.

In one example, the disclosure includes at least two neural networks, wherein a first neural network, which may be a first Seq2Seq neural network, may generate a first sentence of the target lyrics through the first Seq2Seq neural network; a second neural network, which may be a second Seq2Seq neural network, through which the remaining lyrics, other than the first sentence, of the target lyrics may be generated.

Taking the first neural network as a first Seq2Seq neural network as an example, generating each lyric according to the lyric structure of the original lyric, the lyric word number of the original lyric, the lyric vowel of the original lyric and the neural network may include: inputting the lyric word number of the ith lyric in the original lyrics and the lyric vowel of the last character of the ith lyric into the first Seq2Seq neural network, and outputting the ith lyric of the target lyrics through the first Seq2Seq neural network, where i = 1, so that the first sentence of the target lyrics is generated by the first Seq2Seq neural network. The first Seq2Seq neural network may be composed of a first encoder and a first decoder.

Taking the second neural network as a second Seq2Seq neural network as an example, generating each lyric according to the lyric structure of the original lyric, the lyric word number of the original lyric, the lyric vowel of the original lyric and the neural network may include: inputting the lyric word number of the ith lyric in the original lyrics, the lyric vowel of the last character of the ith lyric and the (i-1)th lyric in the target lyrics into the second Seq2Seq neural network, wherein the (i-1)th lyric in the target lyrics is generated in correspondence with the (i-1)th lyric in the original lyrics; and outputting the ith lyric of the target lyrics through the second Seq2Seq neural network, where i > 1, so that the remaining lyrics of the target lyrics, other than the first sentence, are generated by the second Seq2Seq neural network. The second Seq2Seq neural network may be composed of a second encoder and a second decoder.
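The division of labour between the two networks can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `stub_s1` and `stub_s2` are hypothetical stand-ins for the trained first and second Seq2Seq neural networks, the `structure` list format is invented for the example, and reusing an already generated line for a repeated line is an assumption about how the sentences are combined.

```python
def generate_target_lyrics(word_counts, vowels, structure, s1, s2):
    """Generate target lyrics sentence by sentence.

    structure[i] is ("original", None) or ("repeat", j) with j < i (0-based).
    s1(c, p) and s2(c, p, prev) stand in for the two trained networks.
    """
    target = []
    for i, (kind, j) in enumerate(structure):
        if kind == "repeat":
            # Assumption: a repeated line reuses the target line generated for line j.
            target.append(target[j])
        elif i == 0:
            # First sentence: only word number and vowel are available.
            target.append(s1(word_counts[i], vowels[i]))
        else:
            # Later sentences: also condition on the previous target line.
            target.append(s2(word_counts[i], vowels[i], target[i - 1]))
    return target


# Hypothetical stubs that only echo their constraints.
def stub_s1(c, p):
    return f"<{c}:{p}>"


def stub_s2(c, p, prev):
    return f"<{c}:{p}|after {prev}>"


example_structure = [("original", None), ("original", None), ("repeat", 0)]
example_output = generate_target_lyrics(
    [2, 2, 2], ["IAN", "IU", "IAN"], example_structure, stub_s1, stub_s2
)
```

With real models in place of the stubs, the same loop would call the first network once and the second network for every later original-type sentence.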

Application example:

Fig. 3 is a schematic diagram of a processing flow of song word filling according to an embodiment of the present disclosure. As shown in Fig. 3, the flow mainly includes: 1) a user inputs a song name, a singer name, or both, and the original lyrics of the song requiring word filling are obtained by matching the song name and/or singer name against an established song database; 2) lyric attributes required for generating the target lyrics, such as the lyric structure, lyric word number and lyric vowel, are extracted from the original lyrics through structure extraction, word number extraction and vowel extraction operations; 3) the content required for generating the target lyrics is screened out, each sentence of the target lyrics is generated sentence by sentence according to the original lyrics, and the target lyrics are obtained by combining the generated sentences according to the extracted lyric structure of the original lyrics. It should be noted that the present disclosure includes but is not limited to the application example shown in Fig. 3: the target lyrics may also be output directly, without sentence-by-sentence generation, by a previously trained neural network according to the original lyrics and lyric attributes such as the lyric structure, lyric word number and lyric vowel of the original lyrics.

The following is a description of various specific processing steps of possible implementations in this application example:

First, the original lyrics of the song are acquired.

A song database is first constructed. The database contains the original lyrics, the song name and the singer name of each song for which lyric rewriting is supported. The song database is used to retrieve the original lyrics from the song name and/or the singer name. For example, if the user inputs only a song name and no singer name, one song may be selected at random from the songs in the database that match the song name, and the original lyrics of that song are returned. As another example, if the song name and singer name input by the user cannot be found in the song database, the method returns "song not found" and reminds the user to re-enter the song name and/or the singer name.
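A minimal sketch of this lookup is given below. The in-memory list `SONG_DB`, its field names and the function `find_original_lyrics` are all hypothetical, and the song entries are placeholders; a real system would presumably query a proper database.

```python
import random

# Hypothetical song database; names, singers and lyrics are placeholders.
SONG_DB = [
    {"name": "Song A", "singer": "Singer X", "lyrics": ["line one", "line two"]},
    {"name": "Song A", "singer": "Singer Y", "lyrics": ["another line"]},
    {"name": "Song B", "singer": "Singer X", "lyrics": ["third song line"]},
]


def find_original_lyrics(name=None, singer=None, db=SONG_DB, rng=random):
    """Return the original lyrics matching the song name and/or singer name.

    Returns None when nothing matches, so the caller can report
    "song not found" and ask the user for new input.
    """
    matches = [
        song for song in db
        if (name is None or song["name"] == name)
        and (singer is None or song["singer"] == singer)
    ]
    if not matches:
        return None
    # Several songs may match; one is selected at random, as described above.
    return rng.choice(matches)["lyrics"]
```

Returning `None` rather than raising keeps the "song not found" prompt in the calling code, which is one possible design among several.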

Second, the lyric attributes are extracted from the original lyrics through structure extraction, word number extraction and vowel extraction operations.

The extracted lyric attributes may include: the structure of the lyrics, the number of words of each lyric, and the final vowel of the last word of each lyric.

1) The lyric structure refers to the repetition relationship between different sentences in the original lyrics. Starting from the second lyric of the original lyrics, the ith lyric is checked against each lyric j among the 1st to (i-1)th sentences: if lyric i is identical to some lyric j, lyric i is marked as a repeat type of lyric j; otherwise, lyric i is marked as the original type. In this way, it is determined whether each lyric is of the first relation type (such as the original type), appearing for the first time, or is a repeat of an earlier lyric, and the lyric structure of the original lyrics is extracted from the detection result.

2) The lyric word number refers to the number of words in each lyric of the original lyrics. It can be obtained directly by counting: the number of words in each sentence of the original lyrics is counted, and the lyric word number of the original lyrics is extracted from the counting result.

3) The lyric vowel refers to the final (the vowel part of the pinyin syllable) of the last character of each sentence in the original lyrics. To obtain it, the pinyin information of the last character of each lyric is first extracted, and the final in the pinyin is then taken as the vowel information; that is, the lyric vowel of the original lyrics is extracted according to the pinyin information of the last character of each sentence. For example, for a lyric whose last character means "before", the vowel of the last character is IAN; for another lyric, "love you for a long time", the vowel of the last character is IU; and so on.
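The three extraction operations above can be sketched as follows. This is an illustrative sketch only: the function names, the `structure` label format and the `char_to_pinyin` lookup table are assumptions, the sample lines are invented for the example, and treating "y" and "w" as initials is a simplification. A real system would need a full character-to-pinyin resource.

```python
# Pinyin initials, two-letter ones first so that "zh" is tried before "z".
# Treating "y" and "w" as initials is a simplifying assumption.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def pinyin_final(syllable):
    """Strip the initial from a toneless pinyin syllable; return the final in upper case."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return syllable[len(initial):].upper()
    return syllable.upper()  # syllable without an initial, such as "an"


def extract_structure(lines):
    """Mark each line as ("original", None) or ("repeat", j), j being the first identical earlier line (0-based)."""
    first_seen = {}
    structure = []
    for i, line in enumerate(lines):
        if line in first_seen:
            structure.append(("repeat", first_seen[line]))
        else:
            first_seen[line] = i
            structure.append(("original", None))
    return structure


def extract_attributes(lines, char_to_pinyin):
    """Return (lyric structure, lyric word numbers, lyric vowels) for the given lines."""
    structure = extract_structure(lines)
    word_counts = [len(line) for line in lines]  # one Chinese character counts as one word
    vowels = [pinyin_final(char_to_pinyin[line[-1]]) for line in lines]
    return structure, word_counts, vowels


# Invented sample lines and a hypothetical two-entry pinyin table.
sample_lines = ["从前", "很久", "从前"]
sample_pinyin = {"前": "qian", "久": "jiu"}
structure, word_counts, vowels = extract_attributes(sample_lines, sample_pinyin)
```

Here the third line repeats the first, so it is labelled as a repeat of line 0, and the finals IAN and IU are obtained from the pinyin of the last characters.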

Third, each lyric in the target lyrics is generated one by one according to the original lyrics, and the generated lyrics are finally combined to obtain the target lyrics.

According to the lyric structure obtained in the second step, it is known whether each sentence in the original lyrics is of the original type or of the repeat type. Then, for each lyric in the original lyrics marked as the original type, a Seq2Seq neural network may be used, taking the word number $c_i$ of the lyric, the vowel $p_i$ of the last character of the lyric and the previous rewritten lyric $g_{i-1}$ as constraints, to generate the lyric $g_i$ of the rewritten target lyrics. The calculation is shown in formula (1):

$$g_i = \begin{cases} S_1(c_i, p_i), & i = 1 \\ S_2(c_i, p_i, g_{i-1}), & i > 1 \end{cases} \quad (1)$$

Here $S_1(c_i, p_i)$ and $S_2(c_i, p_i, g_{i-1})$ denote two Seq2Seq neural networks, referred to as the first Seq2Seq neural network and the second Seq2Seq neural network respectively; they differ in whether the previous lyric is used as an input to generate the next lyric. The word number information and the vowel information of each sentence in the original lyrics can be represented by one-hot encoded vectors. One-hot encoding converts a string or number into a vector in which exactly one element is 1 and all others are 0.
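As an illustration of the one-hot representation, the sketch below encodes a word number and a vowel and concatenates the two codes. The limit `MAX_WORDS`, the small `FINALS` list and the function names are hypothetical; the actual vocabulary sizes are not given in the text.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    vector = [0] * size
    vector[index] = 1
    return vector


# Hypothetical vocabularies: a maximum line length and a short list of finals.
MAX_WORDS = 10
FINALS = ["A", "I", "U", "IAN", "IU", "ONG"]


def encode_constraints(word_count, vowel):
    """Concatenate the one-hot codes of word number c_i (1..MAX_WORDS) and vowel p_i."""
    return one_hot(word_count - 1, MAX_WORDS) + one_hot(FINALS.index(vowel), len(FINALS))


encoded = encode_constraints(3, "IAN")
```

Each of the two parts contains exactly one 1, so the concatenated vector for a three-word line ending in IAN has two non-zero entries.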

Fig. 4 is a schematic structural diagram of a neural network according to an embodiment of the present disclosure. As shown in Fig. 4, the Seq2Seq neural network, as an encoder-decoder neural network model, may use one RNN as the encoder to encode an input sequence "I love you <EOS>", where <EOS> is the terminator, and integrate the input information into a fixed-length vector; another RNN is then used as the decoder to decode the fixed-length vector, finally yielding the complete output sequence "I love you <EOS>".

In the process of training the Seq2Seq neural network, the training task may be abstracted as follows: a given sequence $x = (x_1, x_2, x_3, \ldots, x_{n1})$ is used as a training sample, and a corresponding output sequence $y = (y_1, y_2, y_3, \ldots, y_{n2})$ is expected, which may be referred to as the desired target of the lyric recomposition operation. A probability distribution $P(y_1, y_2, \ldots, y_{n2} \mid x_1, x_2, \ldots, x_{n1})$ is calculated from the training samples input into the Seq2Seq neural network and the desired target, and a loss function is obtained from the probability distribution. The Seq2Seq neural network is trained by back-propagating the loss function until the network converges, at which point training ends. The trained Seq2Seq neural network can then be used directly to respond to a lyric recomposition operation and finally generate the target lyrics.
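The text does not specify which loss function is derived from the probability distribution. A common choice, assumed here purely for illustration, is the negative log-likelihood of the desired target sequence, using the chain-rule factorization of the sequence probability into per-step probabilities $P(y_i \mid x, y_{<i})$. The function names and sample probabilities below are hypothetical.

```python
import math


def sequence_log_prob(step_probs):
    """log P(y_1..y_n2 | x) as the sum of log P(y_i | x, y_<i) over decoding steps."""
    return sum(math.log(p) for p in step_probs)


def nll_loss(step_probs):
    """Negative log-likelihood of the desired target sequence (assumed loss)."""
    return -sequence_log_prob(step_probs)


# Hypothetical per-step probabilities assigned to the desired target words.
loss = nll_loss([0.5, 0.5])
```

The loss is zero only when the model assigns probability 1 to every word of the desired target, and it grows as those probabilities shrink.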

Fig. 5 is a schematic diagram of generating lyrics based on a neural network according to an embodiment of the present disclosure. As shown in Fig. 5, in response to a lyric recomposition operation, the trained Seq2Seq neural network is used: for a lyric in the original lyrics such as "jiu duan days", whose lyric word number is 3 and whose lyric vowel is IAN, a lyric of the target lyrics such as "invisible" may be generated by the Seq2Seq neural network. Fig. 5 is only an example, and the present disclosure is not limited to it.

In the process of generating the target lyrics, two Seq2Seq neural networks may be employed. The first Seq2Seq neural network S_1(c_i, p_i), referred to as model S_1, is used only to generate the first line of lyrics: the lyric word number c_1 and the lyric vowel p_1 of the first line are concatenated as the input sequence X = (c_1, p_1). The second Seq2Seq neural network S_2(c_i, p_i, g_{i-1}), referred to as model S_2, additionally takes the previously generated line as input: the lyric word number c_i and the lyric vowel p_i of the current line and the previously generated line g_{i-1} are concatenated as the input sequence X = (c_i, p_i, g_{i-1}). After the input sequence is fed into the encoder RNN of model S_2, the hidden state h_i of each position is calculated iteratively according to the following formula (2), where h_{i-1} denotes the hidden state of the position preceding the current position and i is an integer greater than 1:

h_i = Encoder(x_i, h_{i-1}) (2)

For each element x_i in the input sequence X, applying formula (2) yields the state h_{n1} of the final position, which serves as the input to the decoder. According to formulas (3) and (4), the decoder takes h_{n1} as its initial state and starts decoding the target lyrics to be generated. In formulas (3) and (4), x_1, x_2, …, x_{n1} denotes the input sequence; y_i denotes the word expected at the current step; y_{i-1} denotes the word expected at the previous step; and h denotes the state of the decoder, with one symbol [not reproduced] indicating the state when the first word is decoded and another symbol [not reproduced] representing the state of the decoder during iteration. That is, after the decoder has processed the ith input, the resulting state and the (i+1)th word are fed into the decoder to decode the next word. The final output sequence, i.e. a line of lyrics composed of a plurality of words, is obtained by calculating the probability distribution P(y_i | x_1, x_2, …, x_{n1}, y_{<i}):

[formulas (3) and (4) not reproduced]

By means of formulas (3) and (4), at each step of decoding, the decoder can calculate the probability of the word generated at the current step according to the hidden state of the previous step and the word output at the previous step, and sample the word of the current step from the calculated probability distribution. Thus, after a plurality of words is obtained iteratively, a line of the target lyrics can be generated from those words.
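The encoder recurrence of formula (2) and the step-by-step sampling performed by the decoder can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the scalar `rnn_step` cell, its weights, the four-entry `VOCAB`, and the scoring rule in `decoder_step` are hypothetical stand-ins, not the networks of the disclosure.

```python
import math
import random

VOCAB = ["<EOS>", "a", "b", "c"]  # hypothetical toy vocabulary


def rnn_step(x, h, w_x=0.5, w_h=0.8):
    """One scalar recurrent update, standing in for Encoder(x_i, h_{i-1})."""
    return math.tanh(w_x * x + w_h * h)


def encode(xs, h0=0.0):
    """Apply the recurrence left to right and return the final state h_{n1}."""
    h = h0
    for x in xs:
        h = rnn_step(x, h)
    return h


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decoder_step(prev_word_id, h):
    """Update the decoder state from the previous word and return
    (new_state, distribution over VOCAB). The scoring rule is arbitrary."""
    h_new = rnn_step(float(prev_word_id), h)
    scores = [h_new * (k + 1) for k in range(len(VOCAB))]
    return h_new, softmax(scores)


def decode(h_final, max_len=5, seed=0):
    """Start from the encoder's final state and sample one word per step
    until <EOS> is drawn or max_len words have been produced."""
    rng = random.Random(seed)
    h, prev, words = h_final, 0, []
    for _ in range(max_len):
        h, probs = decoder_step(prev, h)
        prev = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        if VOCAB[prev] == "<EOS>":
            break
        words.append(VOCAB[prev])
    return words


line = decode(encode([1.0, 2.0, 3.0]))
```

The point of the sketch is the data flow: the encoder's last hidden state seeds the decoder, and each sampled word is fed back as the input of the next decoding step.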

When models S_1 and S_2 are trained, the models are optimized according to the following formula (5), with the criterion of maximizing the likelihood of the generated lyrics. A loss function L is obtained from the probability distribution P(y_i | x_1, x_2, …, x_{n1}, y_{<i}), where argmax refers to the operator that maximizes the likelihood:

L = argmax P(y_i | x_1, x_2, …, x_{n1}, y_{<i}) (5)
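In practice, maximizing the likelihood of a target line is commonly implemented as minimizing its negative log-likelihood, summed over decoding steps. The sketch below assumes that formulation, which the disclosure does not spell out; `step_probs` is a hypothetical list holding the probability the model assigned to each correct word y_i.

```python
import math


def sequence_log_likelihood(step_probs):
    """Sum of log P(y_i | x, y_<i) over the steps of one target line.
    By the chain rule this equals log P(y_1, ..., y_n2 | x)."""
    return sum(math.log(p) for p in step_probs)


def nll_loss(step_probs):
    """Negative log-likelihood: minimizing it maximizes the likelihood."""
    return -sequence_log_likelihood(step_probs)


# A model that gives the correct words higher probability has a lower loss.
weak = nll_loss([0.5, 0.5])
strong = nll_loss([0.9, 0.9])
```

Working in log space turns the product of per-step probabilities into a sum, which avoids numerical underflow on long lines.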

For the ith line of lyrics marked as a repeat, namely a repeat of the jth line, the rewritten target line g_j is directly copied as the new line g_i according to the following formula (6):

g_i = g_j (6)
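The line-by-line procedure described above, with model S_1 for the first line, model S_2 for later lines, and a direct copy for repeated lines, can be sketched as follows. The function name `generate_lyrics`, the encoding of the lyric structure as a list of repeat indices, and the two stub models are assumptions made for illustration only.

```python
def generate_lyrics(structure, counts, vowels, first_model, next_model):
    """Assemble the target lyrics line by line.

    structure[i] is None for a freshly generated line, or the index j < i
    of the earlier line that line i repeats (a hypothetical encoding).
    """
    target = []
    for i, repeat_of in enumerate(structure):
        if repeat_of is not None:
            # Repeated line: copy the already generated line, g_i = g_j.
            target.append(target[repeat_of])
        elif i == 0:
            # First line: only word number and vowel are available.
            target.append(first_model(counts[i], vowels[i]))
        else:
            # Later lines: also condition on the previously generated line.
            target.append(next_model(counts[i], vowels[i], target[i - 1]))
    return target


# Stub models that only record their inputs, standing in for S_1 and S_2.
def s1_stub(c, p):
    return f"S1[{c},{p}]"


def s2_stub(c, p, g_prev):
    return f"S2[{c},{p}|{g_prev}]"


lyrics = generate_lyrics(
    structure=[None, None, 0, None],
    counts=[3, 5, 3, 4],
    vowels=["IAN", "A", "IAN", "U"],
    first_model=s1_stub,
    next_model=s2_stub,
)
```

Because a repeated line is copied rather than regenerated, the repeat structure of the original lyrics is preserved exactly in the output.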

All the lines are combined to form the rewritten target lyrics, which have the same structure, word number and vowel as the original lyrics. With this application example, once the neural network has been obtained through training, new lyrics with the same lyric structure, lyric word number and lyric vowel as the original lyrics can be generated from the original lyrics of a song and the trained neural network. For training the neural network, lyric attributes such as the lyric structure, the lyric word number and the lyric vowel can be extracted from the original lyrics and used to train the Seq2Seq neural network, so that target lyrics satisfying given constraints (for example, having the same lyric structure, lyric word number and lyric vowel as the original lyrics) can be generated based on the neural network.

It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.

The above method embodiments can be combined with one another to form combined embodiments without departing from their underlying principles and logic; owing to space limitations, these combinations are not described in detail in this disclosure.

In addition, the present disclosure also provides a processing apparatus for song word filling, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any one of the processing methods for song word filling provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here for brevity.

Fig. 6 shows a block diagram of a processing apparatus for song word filling according to an embodiment of the present disclosure, and as shown in fig. 6, the processing apparatus includes: a response unit 51 for resolving a musical piece composed of a melody and original lyrics from the song to be processed in response to the song recomposition operation; a lyric processing unit 52 for obtaining target lyrics based on the original lyrics and a neural network obtained by the recomposing operation desired target; and the song processing unit 53 is configured to obtain a target song according to the music and the target lyrics, where the target song and the song to be processed have a lyric attribute similar to at least one of a lyric structure, a lyric word number and a lyric vowel.

In a possible implementation manner, the desired target of the recomposing operation matches the target lyrics, and the lyric processing unit is configured to: extract, from the original lyrics, the similar lyric attributes used for obtaining the target lyrics; and obtain the target lyrics according to the extracted similar lyric attributes and the neural network.

In a possible implementation manner, the lyric processing unit is configured to: detecting whether a repeated relation exists between different words in the original lyrics, and extracting a lyric structure of the original lyrics according to the obtained detection result; counting the word number of each lyric in the original lyrics, and extracting the word number of the lyrics of the original lyrics according to the obtained counting result; and extracting the lyric vowel of the original lyric according to the pinyin information of the last character of each sentence of the lyric in the original lyric.
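The three extraction steps (structure, word number, vowel) can be sketched as below. This is only an illustration under stated assumptions: `PINYIN` is a hypothetical five-character lookup table standing in for a full pronunciation dictionary, repeats are detected by exact string match, and `final_of` simply strips a leading initial (treating y and w as initials), which is a simplification of real pinyin rules.

```python
# Hypothetical mini pinyin table; a real system would need a full dictionary.
PINYIN = {"天": "tian", "见": "jian", "边": "bian", "花": "hua", "家": "jia"}

# Two-letter initials come first so that "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def final_of(syllable):
    """Return the final (vowel part) of a pinyin syllable in upper case."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return syllable[len(initial):].upper()
    return syllable.upper()


def extract_attributes(lines):
    """Return (structure, counts, vowels) for a list of lyric lines.

    structure[i] is None for a new line, or the index of the first earlier
    line with identical text. Raises KeyError for characters not in PINYIN.
    """
    structure, counts, vowels = [], [], []
    first_seen = {}
    for i, line in enumerate(lines):
        structure.append(first_seen.get(line))
        first_seen.setdefault(line, i)
        counts.append(len(line))                   # one character per word
        vowels.append(final_of(PINYIN[line[-1]]))  # last character only
    return structure, counts, vowels


structure, counts, vowels = extract_attributes(["天边花", "见家", "天边花"])
```

The three returned lists line up index by index, so line i of the target lyrics can be generated from `counts[i]` and `vowels[i]`, or copied when `structure[i]` is not None.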

In a possible implementation manner, the lyric processing unit is configured to: generating each lyric according to the lyric structure of the original lyric, the lyric word number of the original lyric, the lyric vowel of the original lyric and the neural network; and combining the lyrics of each sentence to obtain the target lyrics.

In a possible implementation manner, the apparatus further includes a neural network selecting unit, configured to: adopting different neural networks according to preset conditions for the condition that the lyric structure of the original lyrics is of a first relation type; the preset conditions include: whether to use a previous lyric of the original lyric as an input of a current neural network to generate a next lyric.

In a possible implementation manner, the lyric processing unit is configured to: inputting the lyric word number of the ith lyric in the original lyrics and the lyric vowel of the last lyric in the ith lyric into a first neural network; outputting an ith sentence for obtaining the target lyric through the first neural network; wherein i is 1; the first neural network is composed of a first encoder and a first decoder.

In a possible implementation manner, the lyric processing unit is further configured to: input the lyric word number of the ith lyric in the original lyrics, the lyric vowel of the last lyric in the ith lyric, and the (i-1)th lyric in the target lyrics into a second neural network, wherein the (i-1)th lyric in the target lyrics is generated correspondingly from the (i-1)th lyric in the original lyrics; and output, through the second neural network, the ith sentence used for obtaining the target lyrics; wherein i > 1, and the second neural network is composed of a second encoder and a second decoder.
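The difference between the inputs of the first and second neural networks can be made concrete with a small sketch of how the input sequence might be assembled. The token formats `<c=...>` and `<p=...>`, the trailing `<EOS>` terminator, and the function name `build_input` are hypothetical; the disclosure only states that the items are concatenated.

```python
def build_input(count, vowel, prev_line=None):
    """Concatenate the constraints, and optionally the previous target line,
    into one token sequence for the encoder.

    With prev_line=None this corresponds to the first network's input
    (word number, vowel); otherwise to the second network's input
    (word number, vowel, previous generated line).
    """
    tokens = [f"<c={count}>", f"<p={vowel}>"]
    if prev_line is not None:
        tokens.extend(prev_line)  # one token per character of the line
    tokens.append("<EOS>")
    return tokens


first_input = build_input(3, "IAN")
second_input = build_input(5, "A", prev_line="abc")
```

Keeping the constraint tokens at fixed positions at the start of the sequence is one simple design choice; it lets both networks share the same input layout apart from the optional previous line.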

In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.

Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a volatile computer readable storage medium or a non-volatile computer readable storage medium.

Embodiments of the present disclosure also provide a computer program product, which includes computer readable code, and when the computer readable code runs on a device, a processor in the device executes instructions for implementing the processing method for song word filling provided in any of the above embodiments.

The disclosed embodiments also provide another computer program product for storing computer readable instructions, which when executed, cause a computer to perform the operations of the processing method for song word filling provided by any one of the above embodiments.

The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.

An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the above method.

The electronic device may be provided as a terminal, server, or other form of device.

Fig. 7 is a block diagram illustrating an electronic device 800 in accordance with an example embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or a similar terminal.

Referring to fig. 7, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.

Fig. 8 is a block diagram illustrating an electronic device 900 in accordance with an example embodiment. For example, the electronic device 900 may be provided as a server. Referring to fig. 8, electronic device 900 includes a processing component 922, which further includes one or more processors, and memory resources, represented by memory 932, for storing instructions, such as applications, that are executable by processing component 922. The application programs stored in memory 932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 922 is configured to execute instructions to perform the above-described methods.

The electronic device 900 may further include a power supply component 926 configured to perform power management of the electronic device 900, a wired or wireless network interface 950 configured to connect the electronic device 900 to a network, and an input/output (I/O) interface 958. The electronic device 900 may operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 932, is also provided that includes computer program instructions executable by the processing component 922 of the electronic device 900 to perform the above-described method.

The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Different embodiments of the present application may be combined with each other provided that the combination is logically consistent. The description of each embodiment emphasizes particular aspects; for aspects not emphasized in one embodiment, reference may be made to the descriptions of the other embodiments.

Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method for song word filling, the method comprising:

2. The method of claim 1, wherein said adapting a desired target of operation to match said target lyrics, said obtaining target lyrics from said original lyrics and a neural network resulting from said desired target of adapting operation, comprises:

3. The method of claim 2, wherein extracting the similar lyric attributes for obtaining the target lyrics from the original lyrics comprises:

4. The method of claim 2 or 3, wherein the obtaining the target lyrics according to the extracted similar lyric attributes and the neural network comprises:

and combining the lyrics of each sentence to obtain the target lyrics.

5. The method of claim 4, wherein before generating each lyric according to a lyric structure of the original lyric, a lyric word number of the original lyric, a lyric vowel of the original lyric, and the neural network, the method further comprises:

6. The method of claim 5, wherein generating each lyric based on a lyric structure of the original lyric, a lyric word number of the original lyric, a lyric vowel of the original lyric, and the neural network comprises:

the first neural network is composed of a first encoder and a first decoder.

7. The method of claim 5, wherein generating each lyric based on a lyric structure of the original lyric, a lyric word number of the original lyric, a lyric vowel of the original lyric, and the neural network comprises:

the second neural network is composed of a second encoder and a second decoder.

8. A song word filling processing apparatus, the apparatus comprising:

9. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to: performing the method of any one of claims 1 to 7.

10. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 7.