CN110164473B - Chord arrangement detection method based on deep learning - Google Patents


Info

Publication number
CN110164473B
Authority
CN
China
Prior art keywords
chord
frame
arrangement
errors
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910422361.8A
Other languages
Chinese (zh)
Other versions
CN110164473A (en)
Inventor
朱媛媛
郭威
于贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Normal University
Original Assignee
Jiangsu Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Normal University
Priority to CN201910422361.8A
Publication of CN110164473A
Application granted
Publication of CN110164473B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B15/00 Teaching music
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a chord arrangement detection method based on deep learning and relates to the technical field of chord detection. The method extracts features of the voice parts and chord tones with a deep learning algorithm, compresses the feature dimensions with principal component analysis (PCA), and uses an SVM classifier to decide whether a chord contains an error and, if so, of which type. For chords with density (close/open position) arrangement errors, a target detection algorithm locates the notes, the pixel distance between notes is measured from the prediction frames of adjacent voice parts and converted into interval degrees, the error is judged against the chord arrangement rules, and the erroneous notes are labeled. By letting students check their own work, the method reduces teachers' workload in checking homework and improves learning efficiency.

Description

Chord arrangement detection method based on deep learning
Technical Field
The invention relates to the technical field of chord detection, in particular to a chord arrangement detection method based on deep learning.
Background
At present, courses for music majors are divided into practical courses and theoretical courses; among the theoretical courses, music theory and harmony belong to the theory of music writing.
Harmony courses in China generally adopt four-part harmony, that is, four voice parts written on a grand staff made up of a treble staff and a bass staff. The treble staff carries two voice parts (soprano and alto) and the bass staff carries two (tenor and bass). Three or more notes stacked vertically in thirds form a chord, which is the basic vertical structure of four-part harmony. In four-part writing, triads have two arrangements, close and open, and seventh chords have three: close, open, and mixed.
In recent years, with the continuing development of computer technology, most colleges and universities in China have actively promoted digital, interactive teaching. However, domestic systems for teaching music and writing harmony remain immature. The existing tools for writing chords, harmony, and polyphony are all staff-notation programs introduced from abroad, such as Sibelius, Tonica, Overture, and Finale; they can only notate scores and cannot analyze, check, judge, or explain harmony. Chord arrangement detection is one of the core technologies of the detection unit in a harmony-writing teaching system, and it plays an important role in judging the vertical structure of written harmony. Research on chord arrangement detection covers two aspects: first, determining the voice part of each note from the direction of its stem and detecting whether voice crossing occurs; second, detecting the arrangement of the chord itself. These checks are currently performed manually by teachers, and computer-based intelligent detection of chord arrangement remains unexplored in China.
Therefore, a new chord arrangement detection method is needed that lets students check their own work, reducing teachers' workload and improving the efficiency of students' independent learning.
Disclosure of Invention
In view of the above, the present invention discloses a chord arrangement detection method based on deep learning. Using the computing power of a computer, it classifies the voice-crossing and chord-arrangement problems that may occur in chords with a deep learning algorithm, and for chords with density (close/open position) arrangement problems it labels the error information and gives correction suggestions. Letting students check their own work reduces teachers' workload in checking assignments and improves learning efficiency.
The chord arrangement detection method based on deep learning provided by the invention comprises the following steps:
Step one: extract features of the chord tones with a convolutional neural network, compress the extracted feature dimensions with principal component analysis, and classify the compressed chord-tone features with an SVM (support vector machine) to determine whether the chord tones contain errors and the error type.
Step two: locate prediction frames for the upper three voice parts of each chord with a density arrangement error, so that the note positions are located accurately.
Step three: measure the interval degrees between the upper three voice parts of the chord.
Step four: detect and judge the arrangement of the written chord based on the chord arrangement rules and the interval degrees between adjacent voice parts among the upper three.
Step five: mark the chord errors found by the arrangement check and give a text description.
Preferably, in step two, the score is divided into S × S subgraphs, chord detection is performed on each subgraph, and the positions of the chord notes within the subgraph are located accurately.
Preferably, in step three, a chord voice-part detection algorithm locates the upper three voice parts in the chord, and the pixel distance between adjacent voice parts is measured from the coordinates of their prediction frames.
Preferably, in step four, the pixel distance between voice parts is converted into an interval degree, and the chord is detected and judged against the chord arrangement rules.
Preferably, in step five, erroneous chords are marked with red notes to distinguish them from normal black notes.
Compared with the prior art, the chord arrangement detection method based on deep learning disclosed by the invention has the advantages that:
(1) The method uses deep learning to analyze and detect chord arrangements intelligently, summarizes correct results, and labels errors in harmony writing, reducing teachers' workload and improving students' independent learning efficiency.
(2) The method avoids the errors caused by the subjectivity of manually judging score errors, and it locates wrong chords in the score more accurately and quickly.
Drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating error information labeling of chords with density arrangement errors according to the present invention;
FIG. 2 is a flowchart of a chord arrangement detection method based on deep learning according to the present invention.
Detailed Description
The following briefly describes embodiments of the invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art from them without inventive work fall within the protection scope of the invention.
Figs. 1-2 show preferred embodiments of the invention, each analyzed in detail from a different perspective.
As shown in Figs. 1-2, a convolutional neural network first extracts features of each voice part and chord tone, such as the shape of the notehead and the direction of the stem. Principal component analysis then compresses the feature dimensions, and an SVM (support vector machine) classifies the compressed feature map to judge whether the chord has a density arrangement error or a voice-crossing error. A target detection algorithm accurately locates the chord tones with density arrangement errors; prediction frames are then located for their upper three voice parts, and the pixel distance between adjacent voice parts is measured from the frame coordinates. The pixel distance between adjacent voice parts is converted into an interval degree. Finally, the chord is detected and judged against the chord arrangement rules, and the error information is annotated and explained. The specific steps are as follows:
The first step: extract features of the chord tones with a convolutional neural network, compress the extracted feature dimensions with principal component analysis, and finally classify the compressed chord-tone features with an SVM (support vector machine) to determine whether the chord tones contain errors and the error type.
The second step: based on a deep learning method, chords with density arrangement errors are located accurately. The score is divided into $S \times S$ subgraphs, and the neurons of each subgraph are responsible for detecting and locating the objects that fall into that cell. Because of the particular nature of music scores, each subgraph is given at most three sliding windows for predicting objects; their number is denoted $B$. Each sliding window is described by the coordinates $(T_x, T_y, T_w, T_h)$ and a confidence $C$, where $(T_x, T_y)$ is the frame offset detected by the neural network and $T_w$ and $T_h$ are the ratios of the width and height of the detection frame to those of the input image. The confidence $C$ represents the probability that a chord exists in the detected frame and is calculated as $C = P_O + P_{IOU}$, where $P_O$ is the probability that the sliding window contains a chord object and $P_{IOU}$ is the overlap area between the sliding window and the region of the detected object. If the detected sliding window contains a chord, $P_O = 1$; otherwise $P_O = 0$. Since only chords in the score are detected, the number of detection classes is $C = 1$, and the output dimension of the final neural network is $S \times S \times (B \times 5 + C)$.
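The grid layout described in the second step can be illustrated with a short sketch. It only computes the size of the output tensor and the subgraph (cell) responsible for a given frame center; the grid size S = 7 used in the example and the helper names `output_shape` and `cell_of` are assumptions for illustration and do not come from the patent.

```python
# Illustrative sketch only: helper names and S = 7 are assumptions.

def output_shape(S, B=3, C=1):
    """Shape of the network output: S x S cells, each holding B windows
    with 5 values (T_x, T_y, T_w, T_h, confidence) plus C class scores."""
    return (S, S, B * 5 + C)


def cell_of(cx, cy, S):
    """Return (row, col) of the subgraph that contains a frame center.

    cx and cy are normalized to [0, 1] relative to the score image;
    values on the right or bottom border are clamped into the last cell.
    """
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    return row, col


if __name__ == "__main__":
    print(output_shape(7))        # grid of 7 x 7 cells, 3 windows, 1 class
    print(cell_of(0.5, 0.5, 7))   # cell responsible for the image center
```

With B = 3 windows and a single chord class, each cell carries 3 × 5 + 1 = 16 values.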
The loss function is determined by the detection-frame coordinate error, the confidence error of the sliding windows in each subgraph, and the classification error. The coordinate error function, the confidence error function, the classification error function, and the total error function of the detection frame are defined as follows:
Figure GDA0002944248610000051
Figure GDA0002944248610000052
Figure GDA0002944248610000053
$loss_{total} = loss_{coord} + loss_{conf} + loss_{class}$
where $I_{ij}$ indicates whether the $j$-th sliding window in the $i$-th subgraph contains the target; $S^2$ is the number of subgraphs to be detected in a score; $B$ is the number of sliding windows in each subgraph; $\lambda_{coord}$ is the weight coefficient of the coordinate error; $x_i$ and $y_i$ are the abscissa and ordinate of the center point of the sliding window in the $i$-th subgraph; $w_i$ and $h_i$ are the width and height of the prediction frame;
Figure GDA0002944248610000054
and
Figure GDA0002944248610000055
are, respectively, the abscissa and ordinate of the center point of the real (ground-truth) sliding window in the $i$-th subgraph, and its width and height; to balance the coordinate prediction error against the target confidence score, $\lambda_{noobj}$ is introduced as a weight coefficient that reduces the confidence penalty when no target frame exists in the image; $C_i$ is the confidence of the real detection frame;
Figure GDA0002944248610000056
is the predicted confidence that the $i$-th subgraph contains the target;
Figure GDA0002944248610000057
is the predicted probability of class $c$ in the $i$-th subgraph.
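The formula images for the three error terms are not reproduced in this text. As a rough orientation, the sketch below implements the sum-of-squares loss that is widely used in single-stage grid detectors of the YOLO type, which the symbols defined above resemble. It is an assumption, not the patent's confirmed formula: the square roots on width and height and the weight values 5.0 and 0.5 are taken from that common formulation, and the function name and data layout are hypothetical.

```python
import math

LAMBDA_COORD = 5.0   # assumed value; the patent does not state one
LAMBDA_NOOBJ = 0.5   # assumed value; the patent does not state one


def detection_loss(pred, truth, pred_cls, true_cls):
    """Return (loss_coord, loss_conf, loss_class, loss_total).

    pred[i][j]  = (x, y, w, h, conf) for window j of subgraph i
    truth[i][j] = (x, y, w, h, conf, has_obj), has_obj playing the role of I_ij
    pred_cls[i], true_cls[i] = lists of class probabilities for subgraph i
    """
    loss_coord = loss_conf = loss_class = 0.0
    for i, (p_cell, t_cell) in enumerate(zip(pred, truth)):
        cell_has_obj = False
        for (px, py, pw, ph, pc), (tx, ty, tw, th, tc, has_obj) in zip(p_cell, t_cell):
            if has_obj:
                cell_has_obj = True
                # Center error plus (square-rooted) size error, weighted.
                loss_coord += LAMBDA_COORD * ((px - tx) ** 2 + (py - ty) ** 2)
                loss_coord += LAMBDA_COORD * (
                    (math.sqrt(pw) - math.sqrt(tw)) ** 2
                    + (math.sqrt(ph) - math.sqrt(th)) ** 2
                )
                loss_conf += (pc - tc) ** 2
            else:
                # Windows without a target get a reduced confidence penalty.
                loss_conf += LAMBDA_NOOBJ * (pc - tc) ** 2
        if cell_has_obj:
            loss_class += sum(
                (pp - tp) ** 2 for pp, tp in zip(pred_cls[i], true_cls[i])
            )
    return loss_coord, loss_conf, loss_class, loss_coord + loss_conf + loss_class
```

The total is the plain sum of the three terms, matching $loss_{total} = loss_{coord} + loss_{conf} + loss_{class}$ as stated in the text.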
The third step: crop the region in which a chord tone has been detected and accurately locate the positions of its upper three voice parts. Define the prediction frame of a voice part as $(L_x, L_y, L_w, L_h)$, where $(L_x, L_y)$ is the center point of the frame and $L_w$ and $L_h$ are its width and height. The center point of the upper edge of the frame is then $L_{up} = (L_x, L_y - L_h/2)$, and the center point of the lower edge is $L_{down} = (L_x, L_y + L_h/2)$. The distance between the notes of any two voice parts can be defined as:
Figure GDA0002944248610000061
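The frame geometry of the third step can be sketched as follows. The edge midpoints follow the definitions of $L_{up}$ and $L_{down}$ given in the text; the distance formula itself is only available as an image, so the center-to-center vertical distance used here is an assumption, as are the class name `Frame` and the function name `vertical_distance`. Image coordinates are assumed to have y increasing downward.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Prediction frame (L_x, L_y, L_w, L_h) of one voice part, in pixels."""
    x: float  # center abscissa L_x
    y: float  # center ordinate L_y (y grows downward in image coordinates)
    w: float  # width L_w
    h: float  # height L_h

    def up(self):
        """Midpoint of the upper edge: (L_x, L_y - L_h / 2)."""
        return (self.x, self.y - self.h / 2)

    def down(self):
        """Midpoint of the lower edge: (L_x, L_y + L_h / 2)."""
        return (self.x, self.y + self.h / 2)


def vertical_distance(a, b):
    """Assumed distance measure: vertical gap between the frame centers."""
    return abs(a.y - b.y)


if __name__ == "__main__":
    soprano = Frame(x=10, y=50, w=12, h=10)
    alto = Frame(x=10, y=80, w=12, h=10)
    print(soprano.up(), soprano.down())
    print(vertical_distance(soprano, alto))
```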
The fourth step: based on the chord arrangement rules, calculate the pixel distance between adjacent voice parts among the upper three, convert it into interval degrees, and then detect and judge the arrangement of the written chord, that is, whether the upper three voice parts of each chord in the score are arranged correctly.
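A minimal sketch of the conversion and rule check in the fourth step is given below. The patent does not spell out the conversion or the thresholds, so everything here is an assumption: one diatonic step is taken as half the distance between adjacent staff lines (`staff_space`, assumed to be known in pixels), both notes are assumed to lie on the same staff so that pixel distance maps directly to steps, and the close/open thresholds follow a commonly taught textbook convention (adjacent upper voices a third or fourth apart in close position, a fifth up to an octave apart in open position, never more than an octave).

```python
# Illustrative sketch only: function names, staff_space, and thresholds
# are assumptions, not values taken from the patent.

def interval_degree(pixel_distance, staff_space):
    """Convert a vertical pixel distance into an interval degree.

    One diatonic step (line to adjacent space) is staff_space / 2 pixels;
    the interval degree counts both end notes, so 0 steps is a unison (1)
    and 2 steps is a third (3).
    """
    steps = round(pixel_distance / (staff_space / 2))
    return steps + 1


def classify_arrangement(soprano_alto, alto_tenor):
    """Classify the upper three voices from their two adjacent intervals."""
    degrees = (soprano_alto, alto_tenor)
    if any(d > 8 for d in degrees):
        return "too wide"   # adjacent upper voices more than an octave apart
    if all(d <= 4 for d in degrees):
        return "close"
    if all(d >= 5 for d in degrees):
        return "open"
    return "mixed"


if __name__ == "__main__":
    sa = interval_degree(10, staff_space=10)   # soprano to alto
    at = interval_degree(15, staff_space=10)   # alto to tenor
    print(sa, at, classify_arrangement(sa, at))
```

Under these assumptions, a result of "mixed" would count as an error for a triad but as a permitted arrangement for a seventh chord, in line with the arrangements listed in the Background section.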
The fifth step: chords that do not conform to the rules are labeled, and the reason for the error and how to correct it are indicated. Wrongly written notes are marked red to distinguish them from normal black notes. The user can work out a correction from the judgment result, or can optionally request a hint, in which case a text description is shown above the chord. As shown in Fig. 1, when the arrangement of a triad is detected as wrong, the whole chord (four notes) is marked red; clicking the chord displays a caption above it.
Through these five steps, it can be judged whether the chords in a score are wrong, and the score is labeled with information about the wrong chords.
In summary, the chord arrangement detection method based on deep learning disclosed by the invention uses deep learning to analyze and detect chord arrangements intelligently, summarizes and explains correct results, and labels errors in harmony writing, thereby reducing teachers' workload and improving students' independent learning efficiency. The method also avoids the errors caused by the subjectivity of manually judging score errors, and it locates wrong chords in the score more accurately and quickly.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (1)

1. A chord arrangement detection method based on deep learning is characterized by comprising the following steps:
step one: using a convolutional neural network to extract features of the chord tones, compressing the feature dimension information extracted by the neural network with a principal component analysis method, and classifying the compressed chord-tone features with an SVM (support vector machine), so as to determine whether the chord tones contain errors and the types of the errors;
step two: based on a deep learning method, locating prediction frames for the upper three voice parts of the chord with a density arrangement error; the score is divided into $S \times S$ subgraphs, and the neurons of each subgraph are responsible for detecting and locating the objects that fall into that cell; because of the particular nature of music scores, each subgraph is given at most three sliding windows for predicting objects, their number being denoted $B$; each sliding window is described by the coordinates $(T_x, T_y, T_w, T_h)$ and a confidence $C$, wherein $(T_x, T_y)$ is the frame offset detected by the neural network, and $T_w$ and $T_h$ are the ratios of the width and height of the detection frame to those of the input image; the confidence $C$ represents the probability that a chord exists in the detected frame and is calculated as $C = P_O + P_{IOU}$, wherein $P_O$ is the probability that the sliding window contains a chord object and $P_{IOU}$ is the overlap area between the sliding window and the region of the detected object; if the detected sliding window contains a chord, $P_O = 1$, otherwise $P_O = 0$; since only chords in the score are detected, the number of detection classes is $C = 1$, and the output dimension of the final neural network is $S \times S \times (B \times 5 + C)$;
the loss function is determined by the detection-frame coordinate error, the confidence error of the sliding windows in each subgraph, and the classification error; the coordinate error function, the confidence error function, the classification error function, and the total error function of the detection frame are defined as follows:
Figure FDA0002944248600000011
Figure FDA0002944248600000012
Figure FDA0002944248600000013
$loss_{total} = loss_{coord} + loss_{conf} + loss_{class}$
wherein $I_{ij}$ indicates whether the $j$-th sliding window in the $i$-th subgraph contains the target; $S^2$ is the number of subgraphs to be detected in a score; $B$ is the number of sliding windows in each subgraph; $\lambda_{coord}$ is the weight coefficient of the coordinate error; $x_i$ and $y_i$ are the abscissa and ordinate of the center point of the sliding window in the $i$-th subgraph; $w_i$ and $h_i$ are the width and height of the prediction frame;
Figure FDA0002944248600000021
and
Figure FDA0002944248600000022
are, respectively, the abscissa and ordinate of the center point of the real (ground-truth) sliding window in the $i$-th subgraph, and its width and height; to balance the coordinate prediction error against the target confidence score, $\lambda_{noobj}$ is introduced as a weight coefficient that reduces the confidence penalty when no target frame exists in the image; $C_i$ is the confidence of the real detection frame;
Figure FDA0002944248600000023
is the predicted confidence that the $i$-th subgraph contains the target;
Figure FDA0002944248600000024
is the predicted probability of class $c$ in the $i$-th subgraph;
step three: using a chord voice-part detection algorithm to locate the upper three voice parts in the chord, and measuring the pixel distance between adjacent voice parts from the coordinates of their prediction frames, so as to measure the interval degrees between the upper three voice parts of the chord; the prediction frame of a voice part is defined as $(L_x, L_y, L_w, L_h)$, wherein $(L_x, L_y)$ is the center point of the frame and $L_w$ and $L_h$ are its width and height; the center point of the upper edge of the frame is $L_{up} = (L_x, L_y - L_h/2)$ and the center point of the lower edge is $L_{down} = (L_x, L_y + L_h/2)$; the distance between the notes of any two voice parts can be defined as:
Figure FDA0002944248600000025
step four: based on the chord arrangement rules, calculating the pixel distance between adjacent voice parts among the upper three and converting it into interval degrees, and then detecting and judging the arrangement of the written chord, that is, whether the upper three voice parts of each chord in the score are arranged correctly;
step five: marking the chords found to be wrongly arranged, and indicating the reason for the error and how to correct it; the wrong chords are marked with red notes to distinguish them from normal black notes.
CN201910422361.8A 2019-05-21 2019-05-21 Chord arrangement detection method based on deep learning Expired - Fee Related CN110164473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910422361.8A CN110164473B (en) 2019-05-21 2019-05-21 Chord arrangement detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910422361.8A CN110164473B (en) 2019-05-21 2019-05-21 Chord arrangement detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN110164473A CN110164473A (en) 2019-08-23
CN110164473B true CN110164473B (en) 2021-03-26

Family

ID=67631637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910422361.8A Expired - Fee Related CN110164473B (en) 2019-05-21 2019-05-21 Chord arrangement detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN110164473B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111968452A (en) * 2020-08-21 2020-11-20 江苏师范大学 Harmony learning method and device and electronic equipment
CN112381792B (en) * 2020-11-13 2023-05-23 中国人民解放军空军工程大学 Intelligent imaging on-line detection method for radar wave-absorbing coating/electromagnetic shielding film damage based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102723079A (en) * 2012-06-07 2012-10-10 天津大学 Music and chord automatic identification method based on sparse representation
CN103714806A (en) * 2014-01-07 2014-04-09 天津大学 Chord recognition method combining SVM with enhanced PCP
CN106446952A (en) * 2016-09-28 2017-02-22 北京邮电大学 Method and apparatus for recognizing score image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4823804B2 (en) * 2006-08-09 2011-11-24 株式会社河合楽器製作所 Code name detection device and code name detection program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
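"Chord recognition based on phase space reconstruction and support vector machine"; Liu Ting; Computer & Digital Engineering; 2010-10-31; Vol. 38, No. 10; pp. 139-142 *
"Music chord recognition with a metric-learning SVM based on robust scale features"; Wang Mengmeng et al.; Journal of Signal Processing; 2017-07-31; Vol. 33, No. 7; pp. 943-952 *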

Also Published As

Publication number Publication date
CN110164473A (en) 2019-08-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210326