CN110164473B

CN110164473B - Chord arrangement detection method based on deep learning

Info

Publication number: CN110164473B
Application number: CN201910422361.8A
Authority: CN
Inventors: 朱媛媛; 郭威; 于贺
Original assignee: Jiangsu Normal University
Current assignee: Jiangsu Normal University
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2021-03-26
Anticipated expiration: 2039-05-21
Also published as: CN110164473A

Abstract

The invention discloses a chord arrangement detection method based on deep learning and relates to the technical field of chord detection. The method extracts features of the voice parts and chord tones with a deep learning algorithm, compresses the feature dimensions with principal component analysis, and uses an SVM classifier to determine whether a chord contains an error and, if so, its type. For chords with density (close/open) arrangement errors, a target detection algorithm locates the notes of the chord, the pixel distance between notes is measured from the prediction boxes of adjacent voice parts, and the pixel distance is converted into an interval degree. Errors are then judged against the chord arrangement rules, and the erroneous notes are labeled. By letting students check their own work, the method reduces teachers' workload in checking students' homework and improves learning efficiency.

Description

Chord arrangement detection method based on deep learning

Technical Field

The invention relates to the technical field of chord detection, in particular to a chord arrangement detection method based on deep learning.

Background

At present, courses for music majors are divided into practical courses and theory courses, and the theory courses, such as music theory and harmony, involve written music-theory exercises.

Harmony courses in China generally adopt four-part harmony, that is, four voice parts written on a grand staff consisting of a treble staff and a bass staff, wherein the treble staff carries two voice parts (soprano and alto) and the bass staff carries two voice parts (tenor and bass). Three or more notes stacked vertically in thirds form a chord, which is the general vertical structure of four-part harmony. In four-part writing, triads have two arrangement methods, close (dense) and open, and seventh chords have three arrangement methods, close, open and mixed.

In recent years, with the continuous development of computer technology, most colleges and universities in China have actively promoted digital and interactive teaching. However, domestic systems for music teaching and harmony writing are still immature. The existing systems for writing chords, harmony and polyphony are all staff notation programs introduced from abroad, such as Sibelius, Tonica, Overture and Finale, which can only write scores and cannot analyze, detect, judge or explain harmony. Chord arrangement detection is one of the core technologies of the detection unit of a harmony writing system for music teaching, and plays an important role in judging the result of harmony writing, that is, the vertical structure of the harmony. Research on chord arrangement detection mainly covers two aspects: the first is determining the voice part of a note from its stem direction and detecting whether voice crossing occurs; the second is detecting the chord arrangement method itself. These detection tasks are currently performed manually by teachers, and intelligent computer-based detection of chord arrangement remains unexplored in China.

Therefore, in view of the above problems, there is a need for a new chord arrangement detection method that reduces the teaching burden on teachers and improves the efficiency of students' independent learning by letting students check their own work.

Disclosure of Invention

In view of the above, the present invention discloses a chord arrangement detection method based on deep learning. Using the computing power of a computer, the method judges and classifies, through a deep learning algorithm, the voice-crossing and chord arrangement problems that may occur in chords, and labels error information and correction suggestions for chords with density arrangement problems. By letting students check their own work, it reduces teachers' workload in checking student assignments and improves learning efficiency.

The chord arrangement detection method based on deep learning provided by the invention comprises the following steps:

Step one: features of the chord tones are extracted with a convolutional neural network, the extracted feature dimensions are further compressed by principal component analysis, and the compressed chord-tone features are classified with an SVM (support vector machine) to judge whether the chord tones contain errors and to classify the error type.

Step two: prediction boxes are located for the upper three voice parts of chords with density arrangement errors, and the positions of the notes are accurately located.

Step three: the interval degrees between the upper three voice parts of the chord are measured.

Step four: the arrangement method of the written chord is detected and judged based on the chord arrangement rules and the interval degrees between adjacent parts among the upper three voice parts.

Step five: chord errors detected by the arrangement check are marked and a text description is given.

Preferably, in step two, the music score is divided into S × S subgraphs, chord detection is performed on each subgraph, and the positions of the chord notes within the subgraphs are accurately located.

Preferably, in step three, a chord voice-part detection algorithm locates the upper three voice parts of the chord, and the pixel distance between adjacent voice parts is measured from the coordinates of the voice-part prediction boxes.

Preferably, in step four, the pixel distance between voice parts is converted into an interval degree, and the chord is detected and judged based on the chord arrangement rules.

Preferably, in step five, erroneous chords are marked with red notes to distinguish them from normal black notes.

Compared with the prior art, the chord arrangement detection method based on deep learning disclosed by the invention has the advantages that:

(1) The method uses deep learning to intelligently analyze and detect chord arrangements, summarizes correct results and labels errors in harmony writing, thereby reducing the teaching burden on teachers and improving the efficiency of students' independent learning.

(2) The method avoids the errors caused by the subjectivity of manually judging score errors, and also improves the accuracy and speed of locating erroneous chords in the score.

Drawings

For a clearer explanation of the embodiments or technical solutions of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for a person skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a schematic diagram illustrating error information labeling of chords with density arrangement errors according to the present invention;

FIG. 2 is a flowchart of a chord arrangement detection method based on deep learning according to the present invention.

Detailed Description

The following provides a brief description of embodiments of the present invention with reference to the accompanying drawings. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art based on the embodiments of the present invention without any inventive work belong to the protection scope of the present invention.

Fig. 1-2 show preferred embodiments of the invention, which are each parsed in detail from different perspectives.

As shown in FIGS. 1-2, features of each voice part and chord tone, such as the shape of the notehead and the direction of the stem, are first extracted by a convolutional neural network; the feature dimensions are then compressed with a principal component analysis algorithm, and the compressed feature map is classified by an SVM (support vector machine) to judge whether the chord has a density arrangement error or a voice-crossing error. The positions of chord tones with density arrangement errors are accurately located by a target detection algorithm; prediction boxes are then located for the upper three voice parts, and the pixel distance between adjacent voice parts is measured from the coordinates of the prediction boxes; the pixel distance between adjacent voice parts is converted into an interval degree; finally, the chord is detected and judged based on the chord arrangement rules, and the error information is annotated and explained. The specific steps are as follows:

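The first step: features of the chord tones are extracted by a convolutional neural network, the extracted feature dimensions are further compressed by principal component analysis, and finally the compressed chord-tone features are classified by an SVM (support vector machine), so as to judge whether the chord tones contain errors and to classify the error type.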
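The compression stage of this step can be illustrated with a minimal sketch. It assumes the convolutional features are already available as a matrix with one row per chord image, and it uses NumPy's singular value decomposition to perform the principal component analysis. The function name `pca_compress` and the array sizes are hypothetical, and the sketch stops before the SVM classification.

```python
import numpy as np

def pca_compress(features, n_components):
    """Project row-vector features onto their top principal components.

    features: array of shape (n_samples, n_features), e.g. CNN outputs.
    Returns the compressed features, the feature mean, and the components.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components
```

The compressed rows would then be passed to a classifier; a new sample is projected with the stored `mean` and `components` before classification.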

The second step: based on a deep learning method, chords with density arrangement errors are accurately located. The music score is divided into $S \times S$ subgraphs, and the neurons of each subgraph are responsible for detecting and locating the objects that fall into that cell. Because of the particular nature of music scores, each subgraph is given at most three sliding windows (predicted objects), the number of which is denoted by $B$. The information of each sliding window consists of the coordinates $(T_x, T_y, T_w, T_h)$ and a confidence $C$, where $(T_x, T_y)$ is the bounding-box offset detected by the neural network, and $T_w$ and $T_h$ are the ratios of the width and height of the detection box to those of the input image. The confidence $C$ represents the probability that a chord exists in the detected box and is calculated as $C = P_O + P_{IOU}$, where $P_O$ is the probability that the sliding window contains a chord object and $P_{IOU}$ is the overlap area between the sliding window and the region of the detected object; if the detected sliding window contains a chord, $P_O = 1$, otherwise $P_O = 0$. Since only chords in the score are detected, the number of detection classes is $C = 1$, and the output dimensionality of the final neural network is $S \times S \times (B \times 5 + C)$.
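As a small worked illustration of the grid layout described above, the sketch below computes the output size $S \times S \times (B \times 5 + C)$ and finds the subgraph that a given pixel falls into. The values B = 3 and C = 1 follow the text; S = 7, the image size and the helper names `output_size` and `cell_of` are assumptions chosen only for the example.

```python
def output_size(s, b, c):
    """Number of values the network outputs: S x S x (B x 5 + C)."""
    return s * s * (b * 5 + c)

def cell_of(x, y, image_w, image_h, s):
    """Return (row, col) of the subgraph containing pixel (x, y)."""
    # Clamp so that points on the right or bottom border stay in the last cell.
    col = min(int(x * s / image_w), s - 1)
    row = min(int(y * s / image_h), s - 1)
    return row, col

# Example with an assumed 7 x 7 grid on a 700 x 700 pixel score image.
print(output_size(7, 3, 1))
print(cell_of(699, 350, 700, 700, 7))
```

Each sliding window contributes five values (four coordinates and one confidence), which is where the factor of 5 comes from.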

The loss function is determined by the coordinate error of the detection box, the confidence error of the sliding windows in each subgraph, and the classification error. The total error function is the sum of the coordinate error function, the confidence error function and the classification error function:

$loss_{total} = loss_{coord} + loss_{conf} + loss_{class}$

wherein $I_{ij}$ indicates whether the $j$-th sliding window in the $i$-th subgraph contains the target; $S^2$ represents the number of subgraphs to be detected in a music score; $B$ represents the number of sliding windows in each subgraph; $\lambda_{coord}$ is the weight coefficient of the coordinate error; $x_i$ and $y_i$ respectively represent the abscissa and ordinate of the center point of the sliding window in the $i$-th subgraph; $w_i$ and $h_i$ represent the width and height of the predicted box; the corresponding ground-truth quantities are the abscissa and ordinate of the center point of the real sliding window in the $i$-th subgraph, together with its width and height. To balance the coordinate prediction error against the target confidence score, $\lambda_{noobj}$ is introduced as a weight coefficient that reduces the penalty on the confidence when no target box exists in the image. $C_i$ represents the confidence of the true detection box; its predicted counterpart represents the confidence of whether the prediction in the $i$-th subgraph contains the target; and the class-probability term represents the probability of predicting class $C$ in the $i$-th subgraph.
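The individual error terms are not spelled out in this text, so the following is only a sketch under stated assumptions. It assumes the coordinate error takes the sum-of-squares form used by YOLO-style detectors, with square roots on width and height, and it assumes a weight of 5.0 for $\lambda_{coord}$. Only the final sum of the three terms is taken from the description; the function names are hypothetical.

```python
import math

def coord_loss(pred, truth, has_obj, lambda_coord=5.0):
    """Sum-of-squares coordinate error over all sliding windows.

    pred, truth: lists of (x, y, w, h) boxes, one per sliding window.
    has_obj: list of 0/1 flags playing the role of I_ij.
    lambda_coord=5.0 is an assumed value, not taken from the source.
    """
    total = 0.0
    for (x, y, w, h), (tx, ty, tw, th), flag in zip(pred, truth, has_obj):
        if not flag:
            continue  # windows without a target add no coordinate error
        total += (x - tx) ** 2 + (y - ty) ** 2
        # Square roots damp the influence of large boxes (YOLO-style assumption).
        total += (math.sqrt(w) - math.sqrt(tw)) ** 2
        total += (math.sqrt(h) - math.sqrt(th)) ** 2
    return lambda_coord * total

def total_loss(loss_coord, loss_conf, loss_class):
    """Total error as the plain sum of the three component errors."""
    return loss_coord + loss_conf + loss_class
```

A perfect prediction yields a coordinate error of zero, and windows flagged as containing no target are skipped entirely.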

The third step: crop the region in which chord tones were detected and accurately locate the upper three voice parts within it. Define the prediction box of a voice part as $(L_x, L_y, L_w, L_h)$, where $(L_x, L_y)$ is the center point of the box and $L_w$ and $L_h$ are its width and height. The center point of the upper edge is then $L_{up} = (L_x, L_y - L_h/2)$ and the center point of the lower edge is $L_{down} = (L_x, L_y + L_h/2)$, and the distance between any two notes of adjacent voice parts can be defined as:

The fourth step: based on the chord arrangement rules, the pixel distance between adjacent parts among the upper three voice parts is calculated and converted into an interval degree, the arrangement method of the written chord is then detected and judged, and it is determined whether the arrangement of the upper three voice parts of the chord in the score is correct.
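The conversion from pixel distance to interval degree can be sketched as follows. The sketch assumes both notes lie on the same staff, that the distance is measured between box centers, and that the spacing between staff lines in pixels (`staff_space_px`) is known. The rule table (thirds or fourths for close position, fifths or sixths for open position in a triad) is a common textbook convention assumed here for illustration, since the text does not list its rules. All function names are hypothetical.

```python
def edge_centers(box):
    """Return (L_up, L_down) for a box given as (L_x, L_y, L_w, L_h)."""
    lx, ly, _lw, lh = box
    return (lx, ly - lh / 2), (lx, ly + lh / 2)

def interval_degree(upper_box, lower_box, staff_space_px):
    """Convert the vertical pixel gap between two notehead boxes to an interval number."""
    pixel_gap = abs(lower_box[1] - upper_box[1])
    # One staff step (line to adjacent space) is half the line spacing.
    steps = round(pixel_gap / (staff_space_px / 2))
    return steps + 1  # a unison spans 0 steps, a third spans 2 steps

def triad_arrangement(soprano_alto, alto_tenor):
    """Classify a triad from the intervals between adjacent upper voices."""
    intervals = (soprano_alto, alto_tenor)
    if all(i in (3, 4) for i in intervals):
        return "close"
    if all(i in (5, 6) for i in intervals):
        return "open"
    return "error"
```

For example, with a line spacing of 10 pixels, two noteheads whose centers are 20 pixels apart span four staff steps and therefore form a fifth. Notes on different staves would need an extra offset for the gap between the staves, which this sketch does not model.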

The fifth step: chords that do not conform to the rules are labeled, and the reason for the error and the way to correct it are indicated. Incorrectly written notes are marked red to distinguish them from normal black notes. The user can work out a correction from the judgment result, or can optionally request a hint, in which case a text description is given above the chord. As shown in FIG. 1, when a triad is detected with an incorrect arrangement, the entire chord (four notes) is marked red, and clicking the chord displays a caption above it.
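The labeling described in this step can be represented with a very small data structure. The sketch below is illustrative only: the `NoteMark` class, the `mark_chord` function and the pitch names are hypothetical and do not come from the source.

```python
from dataclasses import dataclass

@dataclass
class NoteMark:
    pitch: str
    color: str = "black"

def mark_chord(pitches, has_error, caption=""):
    """Color every note of the chord and return the caption shown above it."""
    color = "red" if has_error else "black"
    notes = [NoteMark(p, color) for p in pitches]
    # The caption is only attached to chords that failed the arrangement check.
    return notes, (caption if has_error else "")
```

Marking the whole chord rather than a single note matches the behavior described for FIG. 1, where all four notes turn red.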

Through the five steps, whether the chord in the music score is wrong or not can be judged, and the information of the music score with the wrong chord is labeled.

In summary, the chord arrangement detection method based on deep learning disclosed by the invention intelligently analyzes and detects the chord arrangement based on the deep learning technology, summarizes and explains the correct result, and marks the errors of harmony writing, thereby reducing the teaching pressure of teachers and enhancing the autonomous learning efficiency of students. Meanwhile, the method not only solves the error caused by the subjectivity of the manual judgment of the music score error information, but also has the advantage of improving the accuracy and rapidity of judging the position of the wrong chord in the music score.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A chord arrangement detection method based on deep learning is characterized by comprising the following steps:

step one: using a convolutional neural network to extract features of the chord tones, further compressing the feature dimensions extracted by the neural network by principal component analysis, and classifying the compressed chord-tone features with an SVM (support vector machine), so as to judge whether the chord tones contain errors and to classify the error type;

step two: based on a deep learning method, locating prediction boxes for the upper three voice parts of chords with density arrangement errors; the music score is divided into $S \times S$ subgraphs, and the neurons of each subgraph are responsible for detecting and locating the objects that fall into that cell; because of the particular nature of music scores, each subgraph is given at most three sliding windows (predicted objects), the number of which is denoted by $B$; the information of each sliding window consists of the coordinates $(T_x, T_y, T_w, T_h)$ and a confidence $C$, where $(T_x, T_y)$ is the bounding-box offset detected by the neural network, and $T_w$ and $T_h$ are the ratios of the width and height of the detection box to those of the input image; the confidence $C$ represents the probability that a chord exists in the detected box and is calculated as $C = P_O + P_{IOU}$, where $P_O$ is the probability that the sliding window contains a chord object and $P_{IOU}$ is the overlap area between the sliding window and the region of the detected object; if the detected sliding window contains a chord, $P_O = 1$, otherwise $P_O = 0$; since only chords in the score are detected, the number of detection classes is $C = 1$, and the output dimensionality of the final neural network is $S \times S \times (B \times 5 + C)$;

the total loss function being

$loss_{total} = loss_{coord} + loss_{conf} + loss_{class}$;

in which the classification term represents the probability of predicting class $C$ in the $i$-th subgraph;

step three: using a chord voice-part detection algorithm to locate the upper three voice parts of the chord, and measuring the pixel distance between adjacent voice parts from the coordinates of the voice-part prediction boxes, so as to measure the interval degrees between the upper three voice parts of the chord; defining the prediction box of a voice part as $(L_x, L_y, L_w, L_h)$, where $(L_x, L_y)$ is the center point of the box and $L_w$ and $L_h$ are its width and height, the center point of the upper edge is $L_{up} = (L_x, L_y - L_h/2)$ and the center point of the lower edge is $L_{down} = (L_x, L_y + L_h/2)$, and the distance between any two notes of adjacent voice parts can be defined as:

step four: based on the chord arrangement rules, calculating the pixel distance between adjacent parts among the upper three voice parts and converting it into an interval degree, then detecting and judging the arrangement method of the written chord, and determining whether the arrangement of the upper three voice parts of the chord in the score is correct;

step five: marking the chords found to be erroneous by the arrangement check, and indicating the reason for the error and how to correct it; the erroneous chords are marked with red notes to distinguish them from normal black notes.