CN112634841A - Guitar music automatic generation method based on voice recognition - Google Patents
- Publication number: CN112634841A (application CN202011392002.1A)
- Authority
- CN
- China
- Prior art keywords
- music
- chord
- guitar
- information
- playing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10H1/0025 — Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
- G10L25/24 — Speech or voice analysis techniques; the extracted parameters being the cepstrum
- G10L25/30 — Speech or voice analysis techniques; analysis using neural networks
- G10L25/90 — Pitch determination of speech signals
- G10L2025/906 — Pitch tracking
- G10H2210/056 — Musical analysis for extraction or identification of individual instrumental parts, e.g. melody, chords, bass
- G10H2210/066 — Musical analysis for pitch analysis or recognition, e.g. in polyphonic sounds
- G10H2210/111 — Automatic composing using predefined musical rules
Abstract
The invention discloses a method for automatically generating guitar tablature based on voice recognition, comprising the following steps: step 201, framing the input guitar-performance audio file to be recognized and extracting the Mel-frequency cepstral coefficient (MFCC) features of each frame; step 202, classifying the MFCC features of each frame from step 201 and obtaining the pitch and playing-mode information of each frame of the test audio from the model's output-layer labels; step 203, analyzing the chord, key, and rhythm information of the piece from the pitch information obtained in step 202; and step 204, integrating the pitch, playing-mode, chord, key, and rhythm information output in steps 202 and 203 and outputting a score file. The method generates guitar tablature automatically, replacing manual transcription by ear and markedly improving working efficiency.
Description
Technical Field
The invention belongs to the technical field of audio processing, and in particular relates to a method for generating musical scores.
Background
In the field of music scores, current research focuses mainly on music retrieval, performance evaluation, automatic composition, and similar directions. Existing work on guitar scores has concentrated on printed (image) scores: a machine-readable score is generated by recognizing the image, after which automatic playback and performance are possible.
In practice, guitar learners often encounter a performance they like for which no score of any kind is available; in that case the score must be reconstructed from the recorded sound. This process is commonly called transcription (picking out the score by ear). It requires repeated listening and comparison to determine the key, the rhythm, the chord voicings, and whether each chord is arpeggiated or strummed. A transcriber needs long ear training, a solid command of music theory, and a deep understanding of the guitar and its playing techniques; transcription is among the most advanced skills in guitar study, and it is difficult for ordinary guitar enthusiasts. If guitar tablature could be generated directly from the performed sound, it would lower the barrier for beginners and save time for professionals, who would only need to polish the automatically generated score.
The guitar is a stringed instrument: sound is produced by string vibration, and the vibration frequency determines the pitch. That frequency depends on the string's diameter and its vibrating length; the six strings have different diameters, and the fret pressed on the fingerboard sets the vibrating length, so the string-and-fret combination determines the sounded pitch. In addition, the guitar uses a standard tuning — strings one through six sound E, B, G, D, A, E in turn — so different string-and-fret combinations can produce the same pitch.
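The string/fret-to-pitch relationship above can be made concrete with a short sketch. This is illustrative only (the helper names `fret_to_midi` and `midi_to_hz` are not from the patent); it assumes standard tuning and twelve-tone equal temperament:

```python
# Open-string MIDI numbers for standard tuning: strings 1..6 = E4 B3 G3 D3 A2 E2.
OPEN_STRING_MIDI = {1: 64, 2: 59, 3: 55, 4: 50, 5: 45, 6: 40}

def fret_to_midi(string: int, fret: int) -> int:
    """Each fret raises the open-string pitch by one semitone."""
    return OPEN_STRING_MIDI[string] + fret

def midi_to_hz(midi: int) -> float:
    """Equal temperament with A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

# The same pitch can come from different string/fret combinations:
assert fret_to_midi(1, 0) == fret_to_midi(2, 5) == fret_to_midi(6, 24)  # all E4
```

This ambiguity is exactly why a transcription method must recover pitch, while the choice of string and fret remains a fingering decision.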
A chord is a group of notes satisfying fixed interval relationships: the lowest note is the root, the foundation of the chord, and the intervals between the other notes and the root determine the chord's color; together they form the chord. Chords are played in two main ways: arpeggiation (decomposition), in which the chord tones are plucked one after another, and strumming (sweeping), in which several strings are struck at once so that multiple chord tones sound together.
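The "root plus fixed intervals" definition above can be sketched in a few lines. The interval sets and helper names are standard music theory, not patent text:

```python
# Interval patterns in semitones above the root; a sketch, not exhaustive.
CHORD_INTERVALS = {
    "maj": (0, 4, 7),        # root, major third, perfect fifth
    "min": (0, 3, 7),        # root, minor third, perfect fifth
    "7":   (0, 4, 7, 10),    # dominant seventh adds a minor seventh
}
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_pitch_classes(root: str, quality: str) -> set:
    """Pitch classes (0..11) of a chord built from a root and interval pattern."""
    r = NOTE_NAMES.index(root)
    return {(r + i) % 12 for i in CHORD_INTERVALS[quality]}

assert chord_pitch_classes("C", "maj") == {0, 4, 7}   # C, E, G
assert chord_pitch_classes("A", "min") == {9, 0, 4}   # A, C, E
```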
Guitar playing divides by function into accompaniment and solo: accompaniment generally plays only chords, providing harmony under the melody, while a solo arrangement plays the melody together with the chords. In either case, transcription must first determine the pitch of every note and, for strummed passages, which chord is played and which strings are swept — all of which traditionally must be distinguished by ear.
With the development of artificial-intelligence technology, deep-learning-based audio processing has made great progress, achieving strong results in speech recognition, acoustic-scene classification, and similar tasks, and is now widely applied. The learning capacity of a neural network can replace the human ear in identifying note pitches: every sound the guitar can produce is used as training data for a neural network model, and the trained model classifies unknown sounds, determining the pitch information of the performance — the first step of transcription.
Once the pitches are known, the piece is divided into bars and the chord of each bar is determined. The chord root is usually played first and falls on the first (strong) beat of the bar; combining this with the other notes sounding in the bar yields the chord progression of the piece.
Finally the key and the rhythm are determined: the key generally follows from the chords used, while the rhythm reflects the distribution of strong and weak beats and can be derived from the energy distribution of the audio.
Disclosure of Invention
The object of the invention is to address the problems above by providing a method for automatically generating guitar tablature based on voice recognition. The technical scheme of the invention is as follows:
A method for automatically generating guitar tablature based on voice recognition, characterized by comprising a transcription process for a guitar-performance audio file to be recognized, the process comprising the following steps:
step 201, framing the input guitar-performance audio file to be recognized and extracting the Mel-frequency cepstral coefficient (MFCC) features of each frame;
step 202, classifying the MFCC features of each frame from step 201 and obtaining the pitch and playing-mode information of each frame of the test audio from the model's output-layer labels;
step 203, analyzing the chord, key, and rhythm information of the piece from the pitch information written to the score file in step 202;
and step 204, integrating the pitch, playing-mode, chord, key, and rhythm information output in steps 202 and 203, and outputting a score file.
Further, the step 203 comprises the following sub-steps:
step 2031, locating all chord root notes in the piece, computing the time intervals between them, taking the most frequent interval as the segment length and dividing the piece preliminarily into uniform time segments; comparing the notes sounding in each segment against chord templates, computing the similarity, selecting the most similar chord as the chord of that segment, and writing the chord information into the score file;
step 2032, looking up the guitar chord table for each key using the chord information obtained in step 2031, counting how many of the piece's chords belong to each key, selecting the key with the highest count as the key of the piece, and writing it into the score file;
step 2033, computing the energy of each chord root and taking the mean of all the energies as a threshold; keeping the roots whose energy exceeds the threshold, computing the time intervals between adjacent retained roots, and taking the most frequent interval as the bar length of the piece; computing the number of notes per bar from the bar duration, treating the roots as downbeats and the other notes whose energy is close to the roots' as additional strong beats, and deciding among 4/4, 3/4, and 6/8 time to obtain the rhythm information, which is written into the score file;
step 2034, computing the duration of a quarter note from the bar duration and the rhythm information to obtain the tempo of the piece, and writing the tempo into the score file.
Further, in step 202 the classification of each frame's MFCC features is computed with a deep neural network model, which is built by the following steps:
step 101, collecting guitar sounds with a recording device, the sounds comprising the plucked sound of every string at every fret and all strummed chords, and generating audio files;
step 102, labeling the collected guitar sounds, the labels being the pitch (each sound may contain one or more pitches) and the playing mode, which is either plucking or strumming;
step 103, framing the audio files from step 101 and extracting the MFCC features of each frame;
and step 104, training a deep neural network with the back-propagation algorithm on the feature files obtained in step 103 and the label files obtained in step 102 to obtain the deep neural network model.
With this method, guitar tablature can be generated automatically for performances that have no written score, replacing manual transcription; it can markedly improve professionals' working efficiency, and it lowers the difficulty of transcription so that even a beginner can complete it independently, making guitar study more convenient. Furthermore, because the method uses deep learning, training data can be added continuously to target the network's weaknesses until its discrimination exceeds that of the human ear; combined with the computing power of a computer, the method ultimately far surpasses the traditional manual approach in both quality and efficiency.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice it. The preferred embodiments below are given by way of example only; other obvious variations will occur to those skilled in the art.
For ease of understanding, recall the characteristics of guitar sound set out in the Background above: the string-and-fret combination determines the pitch, and under standard tuning different combinations can produce the same pitch; a chord is a root plus fixed intervals and is played either arpeggiated or strummed; and in both accompaniment and solo playing, transcription must determine the pitch of every note and, for strummed passages, the chord and the swept strings.
The invention uses the learning capacity of a neural network to replace the human ear in identifying note pitches: all sounds the guitar can produce serve as training data for a neural network model, and the model classifies unknown sounds to determine the pitch information of a performance — the first step of transcription. Once the pitches are known, the piece is divided into bars and each bar's chord is determined (the root is usually played first, on the strong first beat, and the remaining notes in the bar fix the chord), yielding the chord progression. Finally the key and the rhythm are determined: the key follows from the chords used, and the rhythm, which reflects the strong/weak-beat distribution, is derived from the energy distribution of the audio.
The method for automatically generating guitar tablature based on voice recognition provided by this embodiment comprises two stages executed in sequence: building the neural network model, and transcribing the guitar-performance audio file to be recognized.
The neural network model is built as follows:
Step 101: use a recording device to collect guitar sounds — the plucked sound of every string at every fret and all strummed chords — and generate audio files. The sounds may be produced with the fingers or a pick, and the audio is preferably saved in 16 kHz, 16-bit PCM format.
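The preferred recording format (16 kHz, 16-bit mono PCM) can be produced with Python's standard `wave` module. A minimal sketch — the function name and the demo tone are illustrative, not part of the patent:

```python
import math
import struct
import wave

def save_pcm16(path: str, samples, sample_rate: int = 16000) -> None:
    """Save float samples in [-1, 1] as 16-bit mono PCM at the given rate."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)          # mono
        w.setsampwidth(2)          # 16 bit = 2 bytes
        w.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        w.writeframes(frames)

# Demo: a one-second 440 Hz tone standing in for a recorded pluck.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
save_pcm16("pluck_demo.wav", tone)
```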
Step 102: label the collected guitar sounds. The labels are the pitch (each sound may contain one or more pitches) and the playing mode, which is either plucking or strumming. The output of this step is a label file in text format.
Step 103: frame the audio files from step 101 and extract the MFCC features of each frame. The output of this step is a binary feature file.
Step 104: train a deep neural network with the back-propagation algorithm on the feature file obtained in step 103 and the label file obtained in step 102, obtaining a deep neural network model. The output of this step is a binary model file.
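Step 104 can be sketched with a tiny fully connected network trained by plain back-propagation. This is a toy stand-in, not the patent's actual network: real training would use MFCC frames labeled with pitch and playing mode, whereas here synthetic 13-dimensional inputs and a two-class task keep the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 13, 32, 2          # 13 "MFCC" inputs, 2 toy classes
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_out)); b2 = np.zeros(n_out)

def forward(X):
    h = np.tanh(X @ W1 + b1)                       # hidden layer
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=1, keepdims=True))   # numerically stable softmax
    return h, e / e.sum(axis=1, keepdims=True)

X = rng.normal(size=(256, n_in))                   # fake feature frames
y = (X[:, 0] > 0).astype(int)                      # fake, learnable labels
T = np.eye(n_out)[y]                               # one-hot targets

for _ in range(300):                               # full-batch back-propagation
    h, p = forward(X)
    g_z = (p - T) / len(X)                         # dLoss/dlogits (cross-entropy)
    g_h = g_z @ W2.T * (1 - h ** 2)                # back through tanh
    W2 -= 0.5 * h.T @ g_z; b2 -= 0.5 * g_z.sum(0)
    W1 -= 0.5 * X.T @ g_h; b1 -= 0.5 * g_h.sum(0)

accuracy = (forward(X)[1].argmax(1) == y).mean()   # well above chance after training
```

In the patent's setting, the output layer would have one unit per pitch/playing-mode label, and the `argmax` over the softmax outputs is what step 202 reads back as the frame's classification.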
Building the deep neural network model thus means training a network on guitar sounds and their labels using deep-learning techniques. By converting the time-domain audio to the frequency domain and extracting the relevant features, the network learns the spectral signatures of different pitches and can therefore replace the human ear in telling pitches apart. Training data can be added continuously to target the network's weaknesses until its discrimination exceeds that of the human ear; combined with the computing power of a computer, the result is far better than the traditional manual approach in both quality and efficiency.
Transcription of the guitar-performance audio file to be recognized comprises the following steps:
Step 201: frame the input guitar-performance audio file to be recognized and extract the MFCC features of each frame. The output of this step is a binary feature file.
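The framing and MFCC extraction of step 201 can be sketched with NumPy alone. This is a simplified implementation under common assumptions (25 ms frames with a 10 ms hop at 16 kHz, a 26-filter mel bank, 13 cepstral coefficients); the patent does not fix these parameters, and a production system might use a library instead:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split audio into overlapping frames (25 ms / 10 ms hop at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc(x, sr=16000, n_coeffs=13, n_fft=512):
    frames = frame_signal(x) * np.hamming(400)                   # windowed frames
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft      # power spectrum
    log_e = np.log(power @ mel_filterbank(n_fft=n_fft, sr=sr).T + 1e-10)
    n = log_e.shape[1]
    # DCT-II over the filterbank axis; keep the first n_coeffs coefficients.
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :] * np.arange(n_coeffs)[:, None])
    return log_e @ dct.T

# Example: MFCCs of a one-second 440 Hz tone -> one 13-dim vector per frame.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
features = mfcc(tone)
```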
Step 202: classify the MFCC features of each frame using the deep neural network model generated in step 104, obtain the pitch and playing-mode information of each frame from the model's output-layer labels, create a new binary score file, and write the corresponding pitch information into it.
Step 203: analyze the chord, key, and rhythm information of the piece from the pitch information in the score file of step 202, as follows:
Step 2031: locate all chord root notes in the piece, compute the time intervals between them, take the most frequent interval as the segment length, and divide the piece preliminarily into uniform time segments; compare the notes sounding in each segment against chord templates, compute the similarity, select the most similar chord as the chord of that segment, and write the chord information into the score file.
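The template comparison in step 2031 can be sketched as cosine similarity between a segment's 12-dimensional pitch-class profile and binary chord templates. The patent does not specify the similarity measure, so cosine similarity here is an assumption, and only major/minor triads are enumerated:

```python
import numpy as np

# Binary templates: one 12-dim vector per (root, quality) pair.
TEMPLATES = {}
for root in range(12):
    maj = np.zeros(12); maj[[root, (root + 4) % 12, (root + 7) % 12]] = 1
    mnr = np.zeros(12); mnr[[root, (root + 3) % 12, (root + 7) % 12]] = 1
    TEMPLATES[(root, "maj")] = maj
    TEMPLATES[(root, "min")] = mnr

def best_chord(profile):
    """Return the (root, quality) whose template is most similar to the profile."""
    profile = np.asarray(profile, float)
    def score(t):
        return profile @ t / (np.linalg.norm(profile) * np.linalg.norm(t) + 1e-12)
    return max(TEMPLATES, key=lambda k: score(TEMPLATES[k]))

# A segment containing C, E, G matches the C major template (root class 0).
profile = np.zeros(12); profile[[0, 4, 7]] = 1
assert best_chord(profile) == (0, "maj")
```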
Step 2032: using the chord information obtained in step 2031, look up the guitar chord table for each key, count how many of the piece's chords belong to each key, select the key with the highest count as the key of the piece, and write it into the score file.
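The counting in step 2032 can be sketched by checking each detected chord against the diatonic triads of every candidate key. Restricting the sketch to major keys and to triads on degrees I–vi is a simplifying assumption, not patent text:

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def diatonic_chords(tonic: int) -> set:
    """Triads built on major-scale degrees I ii iii IV V vi."""
    steps = [0, 2, 4, 5, 7, 9]
    quality = ["maj", "min", "min", "maj", "maj", "min"]
    return {((tonic + s) % 12, q) for s, q in zip(steps, quality)}

def estimate_key(chords) -> int:
    """Key (tonic pitch class) whose chord table contains the most detected chords."""
    return max(range(12), key=lambda t: sum(c in diatonic_chords(t) for c in chords))

# C, F, G major and A minor are all diatonic to C major.
song = [(0, "maj"), (5, "maj"), (7, "maj"), (9, "min")]
assert NOTE_NAMES[estimate_key(song)] == "C"
```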
Step 2033: compute the energy of each chord root and take the mean of all the energies as a threshold; keep the roots whose energy exceeds the threshold, compute the time intervals between adjacent retained roots, and take the most frequent interval as the bar length of the piece; compute the number of notes per bar from the bar duration, treat the roots as downbeats and the other notes whose energy is close to the roots' as additional strong beats, and decide among 4/4, 3/4, and 6/8 time to obtain the rhythm information, which is written into the score file.
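The thresholding and interval voting of step 2033 can be sketched as follows. The inputs `root_times` and `root_energies` are hypothetical outputs of the earlier chord analysis, and rounding intervals to 10 ms before voting is an added simplification:

```python
from collections import Counter

def estimate_bars(root_times, root_energies):
    """Keep roots louder than the mean energy; the most frequent gap
    between surviving roots is taken as the bar length."""
    mean_e = sum(root_energies) / len(root_energies)
    strong = [t for t, e in zip(root_times, root_energies) if e > mean_e]
    gaps = [round(b - a, 2) for a, b in zip(strong, strong[1:])]
    bar_len, _ = Counter(gaps).most_common(1)[0]
    return strong, bar_len

# Loud downbeats every 2 s among quieter notes every 0.5 s.
times    = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
energies = [9.0, 1.0, 1.0, 1.0, 8.0, 1.0, 1.0, 1.0, 9.0]
strong, bar_len = estimate_bars(times, energies)
assert strong == [0.0, 2.0, 4.0] and bar_len == 2.0
```

With the bar length fixed, counting the strong beats inside each bar is what distinguishes 4/4 from 3/4 or 6/8.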
Step 2034: compute the duration of a quarter note from the bar duration and the rhythm information to obtain the tempo of the piece, and write the tempo into the score file.
Step 204: integrate the pitch, playing-mode, chord, key, and rhythm information above and output a score file in text form.
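Step 2034 is simple arithmetic: with a 2-second bar in 4/4 time, each quarter note lasts 0.5 s, i.e. 120 quarter notes per minute. A sketch (the function name is illustrative):

```python
def tempo_bpm(bar_seconds: float, beats_per_bar: int) -> float:
    """Tempo in quarter notes per minute from bar duration and meter."""
    quarter = bar_seconds / beats_per_bar   # duration of one quarter note
    return 60.0 / quarter

assert tempo_bpm(2.0, 4) == 120.0   # 2 s bar in 4/4 -> 120 BPM
```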
This method generates guitar tablature directly from the recorded performance and has the following features and effects:
1. Compared with methods that obtain a score from a printed image, the method generates tablature automatically for performances that have no written score, replacing manual transcription; it improves professionals' working efficiency and lowers the difficulty of transcription so that even a beginner can complete it independently, making guitar study more convenient.
2. A neural network model is trained on guitar sounds and their labels using deep-learning techniques; by converting the time-domain audio to the frequency domain and extracting the relevant features, the network learns the spectral signatures of different pitches and replaces the human ear in distinguishing them. Training data can be added continuously to target the network's weaknesses until its discrimination exceeds that of the human ear; combined with a computer's computing power, the result is far superior to the traditional approach in both quality and efficiency.
3. After the network has identified the pitch information of each audio frame, the chord progression, key, rhythm, tempo, and other score information are obtained by analysis, combining the pitch composition of chords with the characteristics of guitar playing. All of this is automatic, requires no manual work, and helps guitar enthusiasts learn faster and better.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments above, which merely illustrate its principles; various changes and modifications may be made without departing from its spirit and scope, and all such changes and modifications fall within the scope of the claimed invention, which is defined by the appended claims and their equivalents.
Claims (4)
1. A method for automatically generating guitar tablature based on voice recognition, characterized by comprising a transcription process for a guitar-performance audio file to be recognized, the process comprising the following steps:
step 201, framing the input guitar-performance audio file to be recognized and extracting the Mel-frequency cepstral coefficient (MFCC) features of each frame;
step 202, classifying the MFCC features of each frame from step 201 and obtaining the pitch and playing-mode information of each frame of the test audio from the model's output-layer labels;
step 203, analyzing the chord, key, and rhythm information of the piece from the pitch information written to the score file in step 202;
and step 204, integrating the pitch, playing-mode, chord, key, and rhythm information output in steps 202 and 203, and outputting a score file.
2. A method for automatic generation of guitar spectrum based on voice recognition according to claim 1, characterized by the step 203 comprising the sub-steps of:
step 2031, determining the positions of all chord root sounds in the music, calculating the time intervals, selecting the interval duration with the most frequent occurrence, preliminarily dividing the music into uniform time segments, comparing the sounds appearing in each time segment with the chord template, calculating the similarity, selecting the chord with the highest similarity as the chord of the time segment, and writing the chord information into the music score file;
step 2032, comparing the chord tables of the guitar according to the chord information obtained in step 2031, comparing the number of the neutralized strings in the music in each tune, selecting the tune with the most number as the tonality of the music, and writing the tonality in the music file;
step 2033, calculating the energy of each chord root, calculating the average value of all the energies as a threshold, selecting the root with the energy larger than the threshold, calculating the time interval between adjacent roots, and selecting the most frequently occurring time interval as the bar of the music; calculating the number of notes of each measure according to the duration of the measure, taking the root as accent, selecting notes with the energy close to that of the root from other notes as accent beats, determining which one of four beats, four beats and three beats or eight beats is selected to obtain rhythm information, and writing the rhythm information into a music score file;
step 2034, calculating the duration of a quarter note from the bar duration and the rhythm information to obtain the playing tempo of the music, and writing the tempo into the music score file.
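The template comparison described in step 2031 can be sketched as follows, assuming 12-bin pitch-class histograms, binary major/minor triad templates, and cosine similarity as the similarity measure; the patent does not fix any of these particular choices:

```python
import math

# Hypothetical sketch of step 2031's chord-template matching: the notes found
# in a time segment form a 12-dimensional pitch-class histogram, which is
# scored against binary triad templates; the best match is kept as the chord.

NOTE = {n: i for i, n in enumerate(
    ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"])}

def template(root, kind):
    """Binary 12-bin template for a major or minor triad on the given root."""
    intervals = (0, 4, 7) if kind == "maj" else (0, 3, 7)
    t = [0.0] * 12
    for iv in intervals:
        t[(NOTE[root] + iv) % 12] = 1.0
    return t

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def best_chord(pc_hist):
    """Return the (root, quality) pair whose template best matches the segment."""
    chords = [(r, k) for r in NOTE for k in ("maj", "min")]
    return max(chords, key=lambda c: cosine(pc_hist, template(*c)))

seg = [0.0] * 12
for n in ("C", "E", "G"):          # notes detected in the segment
    seg[NOTE[n]] += 1.0
print(best_chord(seg))             # -> ('C', 'maj')
```

Real segments contain passing tones and octave doublings, so the histogram is usually weighted by note energy rather than a plain count; the maximization step is the same.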
3. The guitar music automatic generation method based on voice recognition according to claim 1 or 2, wherein in step 202 the classification to which the mel cepstral coefficient feature of each frame belongs is calculated on the basis of a deep neural network model, the deep neural network model being established through the following steps:
step 101, collecting guitar playing sound information with a recording device, the sound information comprising the plucked sound of each fret on every string and all swept (strummed) chord sounds, and generating an audio file;
step 102, labeling the collected guitar sound information, the labeled content being the pitch and the playing mode;
step 103, performing a framing operation on the audio file of step 101 and extracting the mel cepstral coefficient features of each frame;
and step 104, training the deep neural network with the back-propagation algorithm, based on the feature file obtained in step 103 and the label file obtained in step 102, to obtain the deep neural network model.
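The framing operation of step 103 can be sketched as below, assuming a common 25 ms frame / 10 ms hop at 16 kHz and a Hamming window; the mel filterbank and cepstral stages that complete MFCC extraction are omitted, and none of these parameter values come from the patent:

```python
import math

# Hypothetical sketch of step 103's framing: split the recorded signal into
# overlapping, windowed frames, the unit on which MFCCs are later computed.

def frame_signal(signal, frame_len=400, hop=160):
    """Cut `signal` into overlapping frames and apply a Hamming window."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # The Hamming window tapers frame edges to reduce spectral leakage
        # before the FFT / mel-filterbank stage.
        frames.append([
            s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)))
            for i, s in enumerate(frame)
        ])
    return frames

sig = [0.0] * 1600                     # 100 ms of silence at 16 kHz
print(len(frame_signal(sig)))          # -> 8 frames
```

In practice a library routine (e.g. an MFCC function from an audio toolkit) would replace this hand-rolled loop; the sketch only makes the frame/hop bookkeeping of the claim concrete.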
4. The guitar music automatic generation method based on voice recognition according to claim 3, characterized in that the guitar playing sound information comes from fingerstyle or pick playing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011392002.1A CN112634841B (en) | 2020-12-02 | 2020-12-02 | Guitar music automatic generation method based on voice recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112634841A true CN112634841A (en) | 2021-04-09 |
CN112634841B CN112634841B (en) | 2022-11-29 |
Family
ID=75307435
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011392002.1A Active CN112634841B (en) | 2020-12-02 | 2020-12-02 | Guitar music automatic generation method based on voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112634841B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040231498A1 (en) * | 2003-02-14 | 2004-11-25 | Tao Li | Music feature extraction using wavelet coefficient histograms |
CN102723079A (en) * | 2012-06-07 | 2012-10-10 | 天津大学 | Music and chord automatic identification method based on sparse representation |
JP3185768U (en) * | 2013-05-23 | 2013-09-05 | 二郎 須澤 | Score for shakuhachi |
CN103714806A (en) * | 2014-01-07 | 2014-04-09 | 天津大学 | Chord recognition method combining SVM with enhanced PCP |
CN104992712A (en) * | 2015-07-06 | 2015-10-21 | 成都云创新科技有限公司 | Music reorganization-based music score automatic formation method |
CN106205570A (en) * | 2016-08-10 | 2016-12-07 | 秦桂芳 | A kind of music score recording method |
US10008188B1 (en) * | 2017-01-31 | 2018-06-26 | Kyocera Document Solutions Inc. | Musical score generator |
CN110310621A (en) * | 2019-05-16 | 2019-10-08 | 平安科技(深圳)有限公司 | Sing synthetic method, device, equipment and computer readable storage medium |
CN111898753A (en) * | 2020-08-05 | 2020-11-06 | 字节跳动有限公司 | Music transcription model training method, music transcription method and corresponding device |
Non-Patent Citations (1)
Title |
---|
WANG Feng et al.: "Research on Chord Recognition Based on CRFs and MPCP Features", Computer Engineering and Applications * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113707112A (en) * | 2021-08-13 | 2021-11-26 | 陕西师范大学 | Recursive jump connection deep learning music automatic generation method based on layer standardization |
CN113707112B (en) * | 2021-08-13 | 2024-05-28 | 陕西师范大学 | Automatic generation method of recursion jump connection deep learning music based on layer standardization |
CN113763913A (en) * | 2021-09-16 | 2021-12-07 | 腾讯音乐娱乐科技(深圳)有限公司 | Music score generation method, electronic device and readable storage medium |
WO2023040332A1 (en) * | 2021-09-16 | 2023-03-23 | 腾讯音乐娱乐科技(深圳)有限公司 | Method for generating musical score, electronic device, and readable storage medium |
CN113763913B (en) * | 2021-09-16 | 2024-06-18 | 腾讯音乐娱乐科技(深圳)有限公司 | Music score generating method, electronic equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Barbancho et al. | Automatic transcription of guitar chords and fingering from audio | |
Lee et al. | Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio | |
Barbedo et al. | Automatic genre classification of musical signals | |
Hung et al. | Frame-level instrument recognition by timbre and pitch | |
CN112382257B (en) | Audio processing method, device, equipment and medium | |
Lee et al. | A Unified System for Chord Transcription and Key Extraction Using Hidden Markov Models. | |
WO2013080210A1 (en) | Method for extracting representative segments from music | |
CN112634841B (en) | Guitar music automatic generation method based on voice recognition | |
JP2010054802A (en) | Unit rhythm extraction method from musical acoustic signal, musical piece structure estimation method using this method, and replacing method of percussion instrument pattern in musical acoustic signal | |
CN113192471B (en) | Musical main melody track recognition method based on neural network | |
Chordia | Segmentation and Recognition of Tabla Strokes. | |
Paulus | Signal processing methods for drum transcription and music structure analysis | |
EP2342708B1 (en) | Method for analyzing a digital music audio signal | |
Ramirez et al. | Automatic performer identification in commercial monophonic jazz performances | |
Kum et al. | Pseudo-label transfer from frame-level to note-level in a teacher-student framework for singing transcription from polyphonic music | |
Lerch | Software-based extraction of objective parameters from music performances | |
CN110134823B (en) | MIDI music genre classification method based on normalized note display Markov model | |
Setragno et al. | Feature-based characterization of violin timbre | |
Jensen et al. | Binary decision tree classification of musical sounds | |
Nichols et al. | Automatically discovering talented musicians with acoustic analysis of youtube videos | |
MA et al. | Four-way classification of tabla strokes with models adapted from Automatic Drum Transcription | |
Zhao et al. | Transfer learning for violinist identification | |
Camurri et al. | An experiment on analysis and synthesis of musical expressivity | |
Gogineni et al. | Mridangam artist identification from taniavartanam audio | |
Zhao et al. | Violinist identification using note-level timbre feature distributions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||