CN112863465B - Context information-based music generation method, device and storage medium - Google Patents

Context information-based music generation method, device and storage medium

Info

Publication number
CN112863465B
CN112863465B (application CN202110107935.XA)
Authority
CN
China
Prior art keywords
sequence
music
melody
information
tuple list
Prior art date
Legal status
Active
Application number
CN202110107935.XA
Other languages
Chinese (zh)
Other versions
CN112863465A (en
Inventor
曾坤
吴尚达
朱明杰
林格
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202110107935.XA
Publication of CN112863465A
Application granted
Publication of CN112863465B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G10H 1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H 2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/101 Music Composition or musical creation; Tools or processes therefor
    • G10H 2210/111 Automatic composing, i.e. using predefined musical rules
    • G10H 2210/145 Composing rules, e.g. harmonic or musical rules, for use in automatic composition; Rule generation algorithms therefor
    • G10H 2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H 2250/311 Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention discloses a context information-based music generation method, device and storage medium. The method comprises the following steps: acquiring a music information file; converting a plurality of notes in the music information file into a first tuple list sequence; a melody generation step; extracting the tail of the first melody sequence to obtain a first tail sound sequence; inputting the first tail sound sequence into a preset context information-based neural network model to obtain a first ending sequence; a melody connection step; repeating the steps from the melody generation step through the melody connection step N times; taking the second tuple list sequence obtained in the N-th execution of the melody connection step as a final tuple list sequence; and decoding the final tuple list sequence to obtain a new music information file. Embodiments of the invention improve the practicability of music generation.

Description

Context information-based music generation method, device and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for generating music based on context information, and a storage medium.
Background
Music generation methods can be classified into rule-based systems and neural network systems according to the structure of the model and the manner in which data are processed. The main idea of a rule-based system is to compose musical works of a given style or genre from a set of pre-made parameters, typically implemented as a collection of tests or rules. The computer must satisfy these tests or rules while generating music, and finally obtains works that meet the requirements. The main idea of a neural network system is that the program itself contains little or no explicit music theory: the network automatically learns the intrinsic rules from example material provided by the user or programmer and then generates musical works similar to that material based on the self-learned rules.
Before neural networks became popular, rule-based systems were the most common solution for music generation. However, these solutions involve a large number of subjective choices that are difficult to verify, so the quality of the resulting works is often unsatisfactory.
In recent years, in order to achieve automatic generation of high-quality music, many researchers have proposed new schemes based on neural networks. With the emergence of more and more music data sets, music generation models can better learn musical style from a corpus and generate new scores. At present, neural-network-based music generation models are mainly built on recurrent neural networks, convolutional neural networks, and generative adversarial networks.
However, current neural-network-based music generation methods cannot generate chords, notes shorter than a sixteenth note, or notes in irregular rhythms, nor do they treat the ending portion of a piece specially, so the generated music lacks a distinct ending.
Disclosure of Invention
The embodiments of the invention provide a context information-based music generation method, device and storage medium, which adopt a tuple list to represent any rhythm and any chord in a music information file and finally generate an ending sequence through an ending generation model, so that the generated music has clear ending characteristics and the practicability of the generated music is improved.
A first aspect of an embodiment of the present application provides a music generating method based on context information, the method including:
acquiring a music information file; the music information file contains a plurality of notes and ordering information among the notes;
converting a plurality of notes in the music information file into a first tuple list sequence, and taking the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple contains duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information of a corresponding note;
melody generation step: generating a first melody sequence according to the initial tuple list sequence;
extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, wherein M is greater than or equal to 1;
inputting the first tail sound sequence into a preset neural network model based on the context information to obtain a first ending sequence;
melody connection step: connecting the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and taking the second tuple list sequence as the initial tuple list sequence;
repeating the steps from the melody generation step through the melody connection step N times, wherein N is greater than or equal to 1; taking the second tuple list sequence obtained in the N-th execution of the melody connection step as a final tuple list sequence;
and decoding the final tuple list sequence to obtain a new music information file.
In a possible implementation manner of the first aspect, the converting the plurality of notes in the music information file into a first tuple list sequence, and taking the first tuple list sequence as an initial tuple list sequence, further includes:
traversing all keys of the initial tuple list sequence by semitone transposition and changing the absolute pitches of the corresponding notes, to obtain an initial tuple list sequence with an enlarged data volume.
In a possible implementation manner of the first aspect, the converting the plurality of notes in the music information file into a first tuple list sequence, and taking the first tuple list sequence as an initial tuple list sequence specifically includes:
representing each note in the music information file as a corresponding tuple, each tuple containing duration information and pitch information of the corresponding note; the duration information is represented by a rational number, and the pitch information is represented by a character string.
In a possible implementation manner of the first aspect, the generating a first melody sequence according to the initial tuple list sequence specifically includes:
extracting past information and future information of the initial tuple list sequence; the past information is a forward sequence with respect to the initial tuple list sequence, and the future information is a reverse sequence with respect to the initial tuple list sequence;
and inputting the past information and the future information into a deep bidirectional LSTM network model to obtain a first melody sequence.
In a possible implementation manner of the first aspect, the neural network model based on the context information is a model based on a deep unidirectional LSTM network.
In a possible implementation manner of the first aspect, the music information file is a MIDI file.
A second aspect of the embodiments of the present application provides a music generating apparatus based on context information, including:
the music acquisition module is used for acquiring a music information file; the music information file contains a plurality of notes and ordering information among the notes;
a data encoding module for converting a plurality of notes in the music information file into a first tuple list sequence and using the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple contains duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information of a corresponding note;
a melody generating module, configured to generate a first melody sequence according to the initial tuple list sequence;
the ending generating module is used for extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, wherein M is greater than or equal to 1;
the ending generating module is further configured to input the first tail sound sequence into a preset neural network model based on context information, so as to obtain a first ending sequence;
the ending generating module is further configured to connect the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and use the second tuple list sequence as a final tuple list sequence;
and the music decoding module is used for decoding the final tuple list sequence to obtain a new music information file.
A third aspect of the embodiments of the present application provides a computer readable storage medium, including a stored computer program, where the computer readable storage medium is controlled to execute the music generating method based on the context information according to the above embodiments when the computer program runs.
Compared with the prior art, the context information-based music generation method, device and storage medium provided by the embodiments of the invention adopt a tuple list representation to encode the special rhythms and chords in the music information file, converting all of its duration and pitch information into a tuple list sequence. In each tuple, the duration of the note is represented by a positive rational number, so any duration can be represented; and since the pitch attribute is stored as a character string, multiple pitches can be filled in, which also supports the representation of chords well. An ending generation model is introduced specifically for generating endings, ensuring that the generated music has clear ending characteristics. The ending generation model guarantees that every ending sounds conclusive without relying on post-processing, which greatly reduces the dependence on the user's own composing ability and greatly improves both the practicability of music generation and the practicability of the generated music.
In addition, before the melody sequence is generated, the keys of the music in the tuple list sequence can be expanded, ensuring sufficient data under every key signature. This not only ensures that the music generation model can generate music in any key, but also greatly expands the available music data set.
Drawings
Fig. 1 is a flowchart of a music generating method based on context information according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a music generating apparatus based on context information according to an embodiment of the present invention;
FIG. 3 is an exemplary diagram of a method for representing a list of tuples according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating the operation of a melody generating module according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating the operation of an end generating module according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to fig. 1, a first aspect of an embodiment of the present invention provides a music generating method based on context information, the method including:
s10, acquiring a music information file; the music information file includes a plurality of notes and ranking information between the plurality of notes.
S11, converting a plurality of notes in the music information file into a first tuple list sequence, and taking the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple contains duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information of a corresponding note.
S12, melody generation: a first melody sequence is generated from the initial tuple list sequence.
S13, extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, where M is greater than or equal to 1.
S14, inputting the first tail sound sequence into a preset neural network model based on the context information to obtain a first ending sequence.
S15, melody connection: and connecting the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and taking the second tuple list sequence as the initial tuple list sequence.
S16, repeatedly executing the steps from the melody generation step through the melody connection step N times, wherein N is greater than or equal to 1; and taking the second tuple list sequence obtained by the N-th execution of the melody connection step as the final tuple list sequence.
S17, decoding the final tuple list sequence to obtain a new music information file.
In S16, the loop over S12 to S15 is executed N times to iterate the tuple list repeatedly; this iteration ensures, to the greatest possible extent, the overall harmony of the music recorded in the finally output music information file.
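For illustration, the flow of S10 to S17 can be sketched in Python as follows. The four callables passed in are hypothetical stand-ins for the encoding, melody generation, ending generation and decoding components described above, and the values of N and M are illustrative only:

```python
def generate_music(encode_to_tuples, generate_melody, generate_ending,
                   decode_to_midi, midi_in, midi_out, N=4, M=8):
    """Sketch of the S10-S17 flow; the four callables are hypothetical
    stand-ins for the modules described in this disclosure."""
    tuples = encode_to_tuples(midi_in)        # S10-S11: notes -> (duration, pitch) tuples
    for _ in range(N):                        # S16: repeat S12-S15 N times
        melody = generate_melody(tuples)      # S12: melody generation step
        tail = melody[-M:]                    # S13: last M tuples of the melody
        ending = generate_ending(tail)        # S14: context-based ending model
        tuples = melody + ending              # S15: melody connection step
    decode_to_midi(tuples, midi_out)          # S17: output a new MIDI file
    return tuples
```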
Compared with the prior art, the context information-based music generation method, device and storage medium provided by the embodiments of the invention adopt a tuple list representation to encode the special rhythms and chords in the music information file, converting all of its duration and pitch information into a tuple list sequence. In each tuple, the duration of the note is represented by a positive rational number, so any duration can be represented; and since the pitch attribute is stored as a character string, multiple pitches can be filled in, which also supports the representation of chords well. An ending generation model is introduced specifically for generating endings, ensuring that the generated music has clear ending characteristics. The ending generation model guarantees that every ending sounds conclusive without relying on post-processing, which greatly reduces the dependence on the user's own composing ability.
Illustratively, S11 further comprises, thereafter:
traversing all keys of the initial tuple list sequence by semitone transposition and changing the absolute pitches of the corresponding notes, to obtain an initial tuple list sequence with an enlarged data volume.
In contrast to the conventional approach of transposing all music data into a single base key, this step traverses all keys by shifting the pitch of every piece in the tuple list sequence by each semitone displacement in the range [-6, +6), so that an equal share of music data is available in every key and a sufficient amount of data provides learning material for the model. As a result, the amount of available music data can be expanded by up to 12 times compared with the original data set.
Since this data augmentation changes only the tonic and not the mode, the stylistic characteristics of the music information file are preserved. Although transposition changes the absolute pitch of the notes, neither the rhythm of the notes nor the relative pitch between notes is changed. Therefore, apart from the shift in register, the information in the music information file remains unchanged and its style is not altered.
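A minimal sketch of this augmentation, assuming the tuple pitches have already been mapped to MIDI note numbers (a simplification of the string-based pitch representation described below), might look like this:

```python
def transpose(tuples, shift):
    """Shift every pitch in a tuple list by `shift` semitones.

    `tuples` is assumed to be a list of (duration, [midi_pitch, ...]) pairs;
    rests (empty pitch lists) are left untouched.
    """
    return [(dur, [p + shift for p in pitches]) for dur, pitches in tuples]

def augment_keys(tuples):
    # Traverse all 12 keys with semitone displacements in [-6, +6); rhythm and
    # relative pitch are unchanged, so the style is preserved while the data
    # set grows by up to 12x.
    return [transpose(tuples, shift) for shift in range(-6, 6)]
```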
Illustratively, S11 specifically includes:
representing each note in the music information file as a corresponding tuple, each tuple containing duration information and pitch information of the corresponding note; the duration information is represented by a rational number, and the pitch information is represented by a character string.
Referring to fig. 3, each note in the music information file is represented as a tuple (D, P), where D is the duration of the note, expressed as a rational number, and P is the pitch of the note, represented by a string. In particular, when the note is a chord, P contains a plurality of pitches, and in this representation the chord is converted into a string containing multiple pieces of pitch information; when the note is a rest, P contains a symbol indicating that it has no pitch.
The tuple list representation can faithfully represent music information files containing special rhythm patterns and chords, and lays the foundation for a music generation method that supports these two kinds of special notes.
It should be noted that fig. 3 is a schematic example of a tuple list containing special rhythms and chords. For a conventional rhythm, the duration is expressed as a decimal; for a special rhythm pattern, the duration is expressed as a fraction. For a single tone, the pitch is represented by the corresponding note name and octave; for a chord, the pitch is converted into a numeric sequence.
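To make the (D, P) representation concrete, here is a small Python sketch; the 'rest' symbol and the '.' separator used to join chord pitches are illustrative assumptions rather than notation prescribed by the patent:

```python
from fractions import Fraction

def encode_note(duration, pitches):
    """Encode one note, chord or rest as a (D, P) tuple.

    D is the duration as a rational number (a Fraction), so triplets and other
    irregular rhythms are represented exactly. P is a string: a note name for a
    single tone, several pitches joined by '.' for a chord, and the placeholder
    'rest' when there is no pitch (all assumed conventions).
    """
    d = Fraction(duration).limit_denominator(96)
    if not pitches:
        p = "rest"
    elif len(pitches) == 1:
        p = pitches[0]                   # e.g. "C4"
    else:
        p = ".".join(pitches)            # e.g. "C4.E4.G4" for a C major triad
    return (d, p)

print(encode_note(0.5, ["C4"]))              # (Fraction(1, 2), 'C4')
print(encode_note(1 / 3, ["C4", "E4", "G4"]))  # (Fraction(1, 3), 'C4.E4.G4')
```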
Illustratively, S12 specifically includes:
extracting past information and future information of the initial tuple list sequence; the past information is a forward sequence with respect to the initial tuple list sequence and the future information is a reverse sequence with respect to the initial tuple list sequence.
And inputting the past information and the future information into a deep bidirectional LSTM network model to obtain a first melody sequence.
The first melody sequence is generated using a bidirectional structure whose basic idea is to present each sequence, in the forward and reverse directions, to two separate recurrent hidden layers that are finally connected to the same output layer. The bidirectional structure provides complete, symmetrical past and future information for every note generated, which enables more accurate prediction and generation of melody sequences.
Referring to fig. 4, the tuple list sequence L is the input layer, the melody sequence M is the output layer, and H is the hidden layer. After the input music information file is obtained, it is sampled, encoded and converted into the initial tuple list sequence L = (l_1, …, l_t, …, l_T), where T is the length of the original tuple list sequence. Then, the past information and the future information are extracted from L as the inputs of the N-layer deep bidirectional LSTM network model. Finally, the network outputs a melody sequence M = (m_1, …, m_k, …, m_K), where K is the length of the melody sequence, and passes it to the ending generation module.
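A minimal PyTorch sketch of such a deep bidirectional LSTM is given below; the tokenisation of tuples into integer ids and all layer sizes are assumptions made for illustration, not values specified in the patent:

```python
import torch
import torch.nn as nn

class MelodyGenerator(nn.Module):
    """Sketch of the bidirectional melody model (assumed sizes and tokenisation)."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True runs a forward pass (past information) and a reverse
        # pass (future information) over L and feeds both into the output layer.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, token_ids):            # token_ids: (batch, T) tuple tokens
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden)              # (batch, T, vocab) melody logits
```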
In practice, a deep bi-directional LSTM network can be seen as optimizing a differentiable error function:
E(w) = Σ_{j=1…S_train} E_j(w),
wherein S_train represents the total number of sequences in the training data and w is the weight between the network nodes. The model training aim is to minimize the cross-entropy loss function, i.e. the difference between the melody sequence M = (m_1, …, m_k, …, m_K) generated by the model and the real data M̂ = (m̂_1, …, m̂_k, …, m̂_K). For a particular input sequence j, the error function may be expressed as:
E_j(w) = -Σ_{k=1…K_j} Σ_{c=1…C_num} m̂_{k,c} · log(m_{k,c}),
wherein K_j is the length of the sequence j and C_num is the number of classifications. In each iteration, the model calculates the weight w and bias value b by the following formulas:
S_dw(r) = β · S_dw(r-1) + (1 - β) · dw²(r-1),
S_db(r) = β · S_db(r-1) + (1 - β) · db²(r-1),
w(r) = w(r-1) - α · dw(r-1) / (√S_dw(r) + ε),
b(r) = b(r-1) - α · db(r-1) / (√S_db(r) + ε).
In the above formulas, S_dw(r) and S_db(r) are the exponentially weighted averages of dw and db at the r-th iteration, β is the momentum value, α is the learning rate, and w(r) and b(r) are the updated values of the weights w and bias values b, respectively, after the r-th iteration. ε is a small number that provides numerical stability. If the validation error no longer changes noticeably after the R-th iteration, the model is considered to have converged.
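The update rule above has the form of an RMSProp-style optimizer; a NumPy sketch of a single parameter update, with illustrative values of β, α and ε, is shown below:

```python
import numpy as np

def rmsprop_step(w, dw, S_dw, beta=0.9, alpha=1e-3, eps=1e-8):
    """One update of a parameter w following the S_dw(r) and w(r) formulas above."""
    S_dw = beta * S_dw + (1.0 - beta) * dw ** 2    # exponentially weighted average of dw^2
    w = w - alpha * dw / (np.sqrt(S_dw) + eps)     # scaled gradient step
    return w, S_dw

# Example: one step on a toy weight matrix (the bias update for b(r) is analogous).
w = np.zeros((2, 2)); S_dw = np.zeros_like(w)
w, S_dw = rmsprop_step(w, dw=np.ones_like(w), S_dw=S_dw)
```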
Illustratively, the context information based neural network model is a deep unidirectional LSTM network based model.
Because no future information is available beyond the ending, the neural network model uses a deep unidirectional LSTM network to generate the ending portion so that it can be predicted accurately. As shown in fig. 5, the input of the neural network model is the tail sound sequence C = (c_1, …, c_i, …, c_I), where I is the length of the tail sound sequence. The tail sound sequence C is extracted from the end of the first melody sequence M. The output of this deep LSTM network is the ending sequence E, a set of notes of fixed duration. The neural network model then links the first melody sequence M and the first ending sequence E into a new tuple list sequence L*. If the generation has not yet ended, L* is used as the initial tuple list sequence for the next melody generation iteration; otherwise, L* is decoded, converted and output as a new music information file.
During generation, the neural network model continuously adjusts the junction between the melody sequence M and the ending sequence E to make the transition sound more natural, and the ending generator continuously re-evaluates whether the new tuple list sequence L* requires an updated ending E*. It should be noted that this non-linear creation process also corresponds to the way a human composer writes a score.
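For completeness, a matching PyTorch sketch of the unidirectional ending model and of the concatenation into L* follows; as before, the token representation and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class EndingGenerator(nn.Module):
    """Sketch of the unidirectional ending model (assumed sizes and tokenisation)."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Unidirectional: no future context exists beyond the ending being generated.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tail_tokens):          # tail_tokens: (batch, I) tail sound sequence C
        hidden, _ = self.lstm(self.embed(tail_tokens))
        return self.out(hidden)              # logits for the ending sequence E

# Forming the new tuple list sequence L* = M followed by E (token-id form):
# l_star = torch.cat([melody_tokens, ending_tokens], dim=1)
```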
Illustratively, the music information file is a MIDI file.
Referring to fig. 2, an embodiment of the present invention provides a music generating apparatus based on context information, including: the music acquisition module 20, the data encoding module 21, the melody generation module 22, the ending generation module 23, and the music decoding module 24.
A music acquisition module 20 for acquiring a music information file; the music information file contains a plurality of notes and ordering information among the notes;
a data encoding module 21 for converting a plurality of notes in the music information file into a first tuple list sequence, and taking the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple contains duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information of a corresponding note;
a melody generation module 22 for generating a first melody sequence from the initial tuple list sequence;
an ending generating module 23, configured to extract the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, wherein M is greater than or equal to 1;
the ending generating module 23 is further configured to input the first tail sound sequence into a preset neural network model based on context information, to obtain a first ending sequence;
the ending generating module 23 is further configured to connect the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and use the second tuple list sequence as a final tuple list sequence;
the music decoding module 24 is configured to decode the final tuple list sequence to obtain a new music information file.
Referring to fig. 2, the framework of the music generating apparatus of the present invention is shown. The apparatus consists of a music acquisition module 20, a data encoding module 21, a melody generation module 22, an ending generation module 23 and a music decoding module 24. The data encoding module 21 is responsible for converting an input music information file (MIDI) into a tuple list sequence; the melody generation module 22 generates a new melody sequence based on the existing tuple list sequence; the ending generation module 23 extracts the tail sound sequence from the melody sequence, outputs a matching ending sequence based on it, and combines the melody sequence with the ending sequence. If the iteration has not yet ended, the combined sequence is fed back to the melody generation module 22 as input; otherwise, it is passed to the music decoding module 24 for decoding and output in MIDI file format as a new music information file.
A third aspect of the embodiments of the present application provides a computer readable storage medium, including a stored computer program, where the computer readable storage medium is controlled to execute the music generating method based on the context information according to the above embodiments when the computer program runs.
The computer readable storage medium of the embodiments of the present invention may be a computer readable signal medium or a computer readable storage medium or any combination of the two. More specific examples (a non-exhaustive list) of the computer-readable storage medium include at least the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable storage medium may even be paper or another suitable medium on which the program is printed, since the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims (8)

1. A music generation method based on context information, comprising:
acquiring a music information file; the music information file contains a plurality of notes and ordering information among the notes;
converting a plurality of notes in the music information file into a first tuple list sequence, and taking the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple contains duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information of a corresponding note;
melody generation step: generating a first melody sequence according to the initial tuple list sequence;
extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, wherein M is greater than or equal to 1;
inputting the first tail sound sequence into a preset neural network model based on the context information to obtain a first ending sequence;
melody connection step: connecting the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and taking the second tuple list sequence as the initial tuple list sequence;
repeating the steps from the melody generation step through the melody connection step N times, wherein N is greater than or equal to 1; taking the second tuple list sequence obtained in the N-th execution of the melody connection step as a final tuple list sequence;
and decoding the final tuple list sequence to obtain a new music information file.
2. The context information based music generating method of claim 1, further comprising, after said converting a plurality of notes in said music information file into a first tuple list sequence and taking said first tuple list sequence as an initial tuple list sequence:
traversing all keys of the initial tuple list sequence by semitone transposition and changing the absolute pitches of the corresponding notes, to obtain an initial tuple list sequence with an enlarged data volume.
3. The method for generating music based on context information according to claim 1, wherein said converting a plurality of notes in said music information file into a first sequence of tuple list and using said first sequence of tuple list as an initial sequence of tuple list comprises:
representing each note in the music information file as a corresponding tuple, each tuple containing duration information and pitch information of the corresponding note; the duration information is represented by a rational number, and the pitch information is represented by a character string.
4. The method for generating music based on context information according to claim 1, wherein said generating a first melody sequence from said initial tuple list sequence comprises:
extracting past information and future information of the initial tuple list sequence; the past information is a forward sequence with respect to the initial tuple list sequence, and the future information is a reverse sequence with respect to the initial tuple list sequence;
and inputting the past information and the future information into a deep bidirectional LSTM network model to obtain a first melody sequence.
5. The method for generating music based on context information according to claim 1, wherein the neural network model based on context information is a model based on deep unidirectional LSTM network.
6. The music generating method based on context information according to claim 1, wherein the music information file is a MIDI file.
7. A music generating apparatus based on context information, comprising:
the music acquisition module is used for acquiring a music information file; the music information file contains a plurality of notes and ordering information among the notes;
a data encoding module for converting a plurality of notes in the music information file into a first tuple list sequence and using the first tuple list sequence as an initial tuple list sequence; each tuple in the first tuple list sequence corresponds to a note, and each tuple contains duration information and pitch information of the corresponding note; the pitch information includes absolute pitch information of a corresponding note;
a melody generating module, configured to generate a first melody sequence according to the initial tuple list sequence;
the ending generating module is used for extracting the tail of the first melody sequence to obtain a first tail sound sequence; the tail consists of the last M tuples of the first melody sequence, wherein M is greater than or equal to 1;
the ending generating module is further configured to input the first tail sound sequence into a preset neural network model based on context information, so as to obtain a first ending sequence;
the ending generating module is further configured to connect the first ending sequence with the first melody sequence to obtain a second tuple list sequence, and use the second tuple list sequence as a final tuple list sequence;
and the music decoding module is used for decoding the final tuple list sequence to obtain a new music information file.
8. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform a music generating method based on context information according to any of claims 1-6.
CN202110107935.XA 2021-01-27 2021-01-27 Context information-based music generation method, device and storage medium Active CN112863465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110107935.XA CN112863465B (en) 2021-01-27 2021-01-27 Context information-based music generation method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110107935.XA CN112863465B (en) 2021-01-27 2021-01-27 Context information-based music generation method, device and storage medium

Publications (2)

Publication Number Publication Date
CN112863465A CN112863465A (en) 2021-05-28
CN112863465B true CN112863465B (en) 2023-05-23

Family

ID=76009447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110107935.XA Active CN112863465B (en) 2021-01-27 2021-01-27 Context information-based music generation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN112863465B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113797541B (en) * 2021-09-06 2024-04-09 武汉指娱互动信息技术有限公司 Music game level generation method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6297439B1 (en) * 1998-08-26 2001-10-02 Canon Kabushiki Kaisha System and method for automatic music generation using a neural network architecture
JP2001337674A (en) * 2000-05-25 2001-12-07 Yamaha Corp Portable communication terminal device
JP2016099446A (en) * 2014-11-20 2016-05-30 カシオ計算機株式会社 Automatic music composition device, method, and program
CN109036355A (en) * 2018-06-29 2018-12-18 平安科技(深圳)有限公司 Automatic composing method, device, computer equipment and storage medium
KR101939001B1 (en) * 2017-12-06 2019-01-15 한국과학기술원 Method and System for Audio and Score Alignment of Music Using Neural Network-Based Automatic Music Transcription
CN109448683A (en) * 2018-11-12 2019-03-08 平安科技(深圳)有限公司 Music generating method and device neural network based
CN109448697A (en) * 2018-10-08 2019-03-08 平安科技(深圳)有限公司 Poem melody generation method, electronic device and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9195656B2 (en) * 2013-12-30 2015-11-24 Google Inc. Multilingual prosody generation


Also Published As

Publication number Publication date
CN112863465A (en) 2021-05-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant