CN107622774B

CN107622774B - A music tempo spectrogram generation method based on matching pursuit

Info

Publication number: CN107622774B
Application number: CN201710675484.3A
Authority: CN
Inventors: 桂文明
Original assignee: Jinling Institute of Technology
Current assignee: Jinling Institute of Technology
Priority date: 2017-08-09
Filing date: 2017-08-09
Publication date: 2018-08-21
Anticipated expiration: 2037-08-09
Also published as: CN107622774A

Abstract

The present invention provides a music tempo spectrogram (tempogram) generation method based on matching pursuit, in the field of content-based music information retrieval. The method comprises the following steps: inputting a music signal, generating a note onset detection function o(n), and dividing it into frames; converting the common music tempo range into a set of frequencies; creating a corresponding parent atom for each frequency in the set; shifting each parent atom, each shift producing a new atom; assembling all parent atoms and new atoms into a redundant dictionary; and, using this dictionary, performing matching pursuit on each frame of o(n) to obtain the decomposition coefficient of each tempo, finally producing the tempo spectrogram of the music. The tempo spectrogram generated by the invention has high resolution and strong sparsity, and the tempo resolution, the shift granularity of the parent atoms, and the number of matching pursuit iterations can be set freely according to need, so that tempo spectrograms of different resolutions and sparsities can be generated.

Description

A music tempo spectrogram generation method based on matching pursuit

Technical field

The present invention relates to the field of content-based music information retrieval, and in particular to a music tempo spectrogram generation method based on matching pursuit.

Background technology

I. Concepts and application fields related to the invention

The speed at which music proceeds is its tempo. In contemporary music, tempo is usually measured in beats per minute (bpm). For example, a tempo marking of 120 indicates that the music proceeds at 120 quarter notes per minute, so each quarter note lasts 0.5 seconds. The larger the bpm value, the faster the tempo.

Tempo is closely related to the beat and rhythm of music and is one of its important features. In music information retrieval, tempo estimation means estimating the tempo of a piece from its content, starting from a file that contains the waveform of the music signal, in a format such as mp3 or wav. Tempo estimation is a challenging and important problem in its own right, and it is also groundwork for research directions such as beat perception, rhythm recognition, genre recognition, and music structure analysis. In beat perception, for example, the tempo is usually estimated first, and the beat type and beat structure are then inferred from it; in genre recognition, rhythm and tempo can serve as salient features for identifying the genre.

Tempo changes continually as music proceeds. One cause is that the composition itself calls for tempo changes; such changes are usually few within a piece, and many pieces have none. The other cause is error in playing or singing; such changes are hard to avoid and are generally present throughout the piece. Estimating tempo therefore really means estimating the tempo value at each time point. Because of phenomena such as ties and rests, tempo is ambiguous and hard to discern; and because the tempo also contains error, the tempo at each time point can in practice be regarded as a vector made up of several tempo components. The tempo at every time point of a piece can be described by a music tempo spectrogram (tempogram). Applications such as beat tracking, rhythm recognition, and genre recognition can extract useful information from the tempo spectrogram.

II. Existing techniques and process for generating the music tempo spectrogram

Generation of the tempo spectrogram is generally divided into two stages. The first stage generates the note onset detection function; a note onset is the moment at which a note in the music starts to be played or sung. Some documents, such as [1], call this stage novelty curve generation. The second stage generates the spectrogram.

The first stage mainly comprises signal transformation, feature extraction, and generation of the onset detection function. The purpose of signal transformation is to convert the music waveform from one-dimensional high-frequency data into a low-frequency representation. The signal is usually divided into frames first, and each frame is then transformed; transformation methods include the short-time Fourier transform (STFT) and the wavelet transform (WT). Feature extraction derives time-domain, frequency-domain, and time-frequency features from the low-frequency representation produced by the previous step. A typical time-domain feature is the amplitude envelope; frequency-domain features include spectral flux and frequency-domain energy; time-frequency features are mainly based on the wavelet transform or on Cohen-class time-frequency distributions. The onset detection function is generated by computing, from the features extracted for each frame, the change between consecutive frames; note onsets generally occur where the positive change between consecutive frames increases suddenly. A typical procedure for generating the onset detection function is given in document [1].

The second stage extracts periodicity from the values of the onset detection function produced by the first stage and forms the tempo spectrogram. There are currently two main methods for this stage: the autocorrelation function (ACF) method and the Fourier transform (FT) method [1].

The ACF method windows the onset detection function and computes its autocorrelation, extracts the periodicity of the note onsets as a function of lag, and converts lag into tempo to form the tempo spectrogram. Its formula is [1]:

$$A(t, l) = \frac{\sum_{n \in \mathbb{Z}} o(n)\, o(n+l)\, W(n-t)}{2N+1-l} \tag{1.1}$$

where $t$ and $n$ are discrete times, $l = 1 \ldots N$ is the lag, $o(n)$ is the note onset detection function, and $W(n)$ is a rectangular window centered at 0 with support $[-N, N]$. If $f_s$ is the sampling frequency of $o(n)$, then lag $l$ corresponds to a period of $l/f_s$, a frequency of $f_s/l$, and a tempo of $\tau = 60 f_s / l$.
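As an illustration of formula (1.1), the following minimal Python sketch evaluates the windowed autocorrelation at one time position and converts each lag to a tempo. The function name `acf_tempo_frame`, the zero-padding outside the signal, and the toy impulse-train input are assumptions made for the example and are not taken from document [1].

```python
def acf_tempo_frame(o, t, N, fs):
    """Evaluate A(t, l) of formula (1.1) for lags l = 1..N.

    Returns a dict mapping lag l -> (tempo in bpm, A(t, l)).
    Samples outside the signal are treated as zero (an assumption).
    """
    def sample(n):
        return o[n] if 0 <= n < len(o) else 0.0

    result = {}
    for lag in range(1, N + 1):
        # Rectangular window W(n - t): only n in [t - N, t + N] contributes.
        acc = sum(sample(n) * sample(n + lag) for n in range(t - N, t + N + 1))
        result[lag] = (60.0 * fs / lag, acc / (2 * N + 1 - lag))
    return result


# Toy onset curve: one impulse every 4 samples at fs = 8 Hz, i.e. 120 bpm.
fs = 8.0
o = [1.0 if n % 4 == 0 else 0.0 for n in range(64)]
frame = acf_tempo_frame(o, t=32, N=16, fs=fs)
tempo_at_lag_4, value_at_lag_4 = frame[4]
```

In this toy input every lag that is a multiple of the impulse period (4, 8, 12, 16) receives a nonzero value, while the lags in between are zero, which is the usual harmonic ambiguity of autocorrelation.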

The FT method applies a windowed Fourier transform to the onset detection function to obtain its frequency-domain characteristics, and converts frequency into tempo to form the tempo spectrogram. Its formula is:

$$F(t, \omega) = \sum_{n \in \mathbb{Z}} o(n)\, W(n-t)\, e^{-2\pi i \omega n} \tag{1.2}$$

where $t$ and $n$ are discrete times, $\omega$ is the frequency, $o(n)$ is the note onset detection function, and $W(n)$ is a Hanning window centered at 0 with support $[-N, N]$. There are currently two ways to choose $\omega$. One follows the discrete Fourier transform (DFT) approach of document [2] and discretizes $\omega > 0$ into $N$ frequency points spaced $f_s/N$ Hz apart. The other is similar to document [1]: take $\omega = \tau/60$ Hz, where $\tau \in [30, 480]$ bpm is the common tempo range, and compute the coefficient of each $\omega$ at each time point with the formula above.
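A minimal Python sketch of the second option, evaluating formula (1.2) only at the frequencies that correspond to integer tempi. It assumes that the frequency is given in Hz, so the exponent uses n/f_s as the time of sample n; the function name `fourier_tempo_frame` and the synthetic 120 bpm input are likewise assumptions for illustration.

```python
import cmath
import math


def fourier_tempo_frame(o, t, N, fs, tempi_bpm):
    """Magnitude of F(t, omega) for omega = tau / 60 Hz, one value per tempo."""
    magnitudes = []
    for tau in tempi_bpm:
        omega = tau / 60.0
        acc = 0j
        for n in range(max(0, t - N), min(len(o), t + N + 1)):
            # Hanning window centered at t with support [-N, N].
            w = 0.5 * (1.0 + math.cos(math.pi * (n - t) / N))
            # Assumption: omega is in Hz, so sample n sits at time n / fs.
            acc += o[n] * w * cmath.exp(-2j * math.pi * omega * n / fs)
        magnitudes.append(abs(acc))
    return magnitudes


fs = 43.5
# Synthetic onset curve pulsing at 2 Hz, i.e. 120 bpm.
o = [1.0 + math.cos(2.0 * math.pi * 2.0 * n / fs) for n in range(600)]
tempi = list(range(30, 481))
mags = fourier_tempo_frame(o, t=300, N=130, fs=fs, tempi_bpm=tempi)
peak_tempo = tempi[mags.index(max(mags))]
```

The coefficients next to the peak are only slightly smaller than the peak itself, because the window is about 6 seconds long; this is the band widening that the following section attributes to the FT method.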

III. Deficiencies of the prior art

The contribution of the present invention lies in the second stage of tempo spectrogram generation. To explain the deficiencies of the prior art, two concepts are introduced and described in turn: tempo resolution and tempo spectrogram sparsity. Tempo resolution borrows the notion of frequency resolution: it is expressed as the spacing between two adjacent valid points of the tempo components in the tempo spectrogram, and the larger the spacing, the poorer the resolution. Tempo spectrogram sparsity refers to the number of nonzero elements among all spectrogram coefficients: the fewer the nonzero elements, the stronger the sparsity and the better the discriminability.

1. The prior art cannot meet the tempo resolution required for common music tempi

Tempo resolution is inversely related to the spacing between two adjacent valid points in the spectrogram: the larger the spacing, the poorer the resolution, and the smaller the spacing, the better.

For the ACF method, the difference between the tempo components at two consecutive lags is $\Delta\tau = 60 f_s / l - 60 f_s / (l+1) = 60 f_s / (l(l+1))$. The tempo spacing therefore shrinks as the lag grows: the larger the lag, the higher the tempo resolution, which means that the spacing widens, and the resolution degrades, as the tempo increases. Following document [1], take $f_s = 1/0.023 = 43.5$. At $l = 51$ ($\tau = 51.2$), $\Delta\tau = 0.98$; for $l < 51$ the spacing is wider than 1 bpm; and at $l = 21$ ($\tau = 124.2$), $\Delta\tau = 5.6$, which cannot resolve the common tempo range ($\tau \in [30, 480]$), let alone the cases $l < 21$. To make the maximum of $\Delta\tau$ (at $l = 1$) smaller than 1, $f_s$ would have to be below $1/30$; that is, the frame length in first-stage framing would have to exceed 30 seconds, so the error in the note onsets would also exceed 30 seconds, which is clearly infeasible. For the ACF method, therefore, the tempo resolution is not constant; above 51 bpm the spacing is wider than 1 bpm, and the resolution needed for common tempi cannot be met.
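The figures quoted above can be checked with a few lines of Python. The sketch assumes the lag-to-tempo relation of the ACF method (tempo = 60 f_s / l) and takes the spacing as the difference between the tempi of consecutive lags; the helper names are illustrative.

```python
fs = 1 / 0.023  # sampling rate of o(n) used in the text, about 43.5 Hz


def acf_tempo(lag):
    """Tempo in bpm represented by an autocorrelation lag."""
    return 60.0 * fs / lag


def acf_spacing(lag):
    """Tempo gap in bpm between lag and lag + 1."""
    return acf_tempo(lag) - acf_tempo(lag + 1)


for lag in (51, 21, 1):
    print(f"l={lag:3d}  tempo={acf_tempo(lag):8.1f} bpm  "
          f"spacing={acf_spacing(lag):8.2f} bpm")
```

Lag 51 is the first lag at which the spacing drops below 1 bpm, which is why the text treats roughly 51 bpm as the limit of 1 bpm resolution for this sampling rate.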

For the DFT method of document [2], the tempo spacing is $60 f_s / N$ bpm. With $f_s = 1/0.023 = 43.5$, reaching a tempo resolution of 1 bpm requires $N \geq 60 f_s = 2610$, and a window of that length lasts 60 seconds. Most music currently runs under 300 seconds, so this window length is incompatible with typical piece lengths. The other way to improve tempo resolution is to reduce $f_s$, but that conflicts with the precision required of $o(n)$ and is therefore infeasible.
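The window-length requirement can be verified in the same way. This short Python check uses the rounded value f_s = 43.5 from the text; the variable names are illustrative.

```python
import math

fs = 43.5                 # sampling rate of o(n), rounded as in the text
target_spacing_bpm = 1.0  # desired tempo resolution

# DFT bins are fs / N Hz apart, i.e. 60 * fs / N bpm apart,
# so a spacing of at most 1 bpm needs N >= 60 * fs points.
n_points = math.ceil(60.0 * fs / target_spacing_bpm)
window_seconds = n_points / fs
```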

The FT method of document [1], after windowing the signal, in effect computes the coefficients of the frequencies $\omega$ that correspond to common tempi by evaluating the discrete-time Fourier transform (DTFT). This only samples $\omega$ approximately at discrete points; the actual frequency resolution is not improved.

In summary, the prior art cannot meet the tempo resolution required for common music tempi; that is, some regions of the generated tempo spectrogram will be blurred.

2. The tempo spectrograms generated by the prior art are not sparse enough

The stronger the sparsity of a tempo spectrogram, the better its discriminability. Put another way, strong sparsity means that the energy of the spectrogram is concentrated, which makes the spectrogram more useful in applications.

For the ACF method, when the tempo exceeds 51 bpm, component coefficients at normal precision (for example, 1 bpm) have to be computed by interpolation, which necessarily reduces sparsity. For the FT method, spectral leakage and the resolution problem likewise reduce the sparsity of the frequency-domain coefficients. The tempo spectrograms generated by the prior art are therefore not sparse enough, and their energy is poorly concentrated.

In summary, the tempo spectrograms generated by the prior art are deficient in both resolution and sparsity, whereas the present invention can generate tempo spectrograms with higher resolution and better sparsity. The references used in this patent are as follows:

1. P. Grosche, M. Müller, F. Kurth. Cyclic tempogram - a mid-level tempo representation for music signals [C]. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2010.

2. G. Peeters. Time variable Tempo Detection and beat Marking [C]. In: ICMC, 2005.

3. MIREX. MIREX music test data sets. http://www.music-ir.org/evaluation/MIREX/data/2006/tempo/tempo_train_2006.zip. 2017.

Summary of the invention

The purpose of the invention is to overcome the deficiencies of the prior art in tempo spectrogram resolution and sparsity by providing a music tempo spectrogram generation method based on matching pursuit.

To solve the above technical problem, the technical solution adopted by the invention is:

1. Input a music signal and generate the note onset detection function o(n);

2. Divide o(n) into frames, forming a number of frame signals;

3. Take the common tempo range and, at a chosen tempo resolution, convert the set of tempi into a set of frequencies;

4. For each frequency in the frequency set, create a corresponding parent atom;

5. Shift every parent atom at a chosen granularity, each shift step producing one atom; the atoms produced by shifting, together with the parent atom, form the atom set of the frequency corresponding to that parent atom;

6. Assemble the atom sets of all frequencies in the frequency set into a redundant dictionary;

7. Perform matching pursuit on each frame signal of o(n) with the redundant dictionary for a chosen number of iterations, producing a series of decomposition coefficients and corresponding atoms;

8. Using the relationship between the atoms in the redundant dictionary and the tempi, attribute the decomposition coefficients of each frame signal of o(n) to the coefficients of particular tempi;

9. Merge the tempo spectrum vectors of all frame signals to form the tempo spectrogram.

Beneficial effects of the present invention：

The distinguishing feature of the invention is that it generates the tempo spectrogram with a matching pursuit algorithm based on a redundant dictionary. Its advantages are the fine resolution and the sparsity of the resulting spectrogram.

The good resolution of the invention comes from the flexible configuration of atoms in the redundant dictionary: atoms of higher resolution can be generated according to the required tempo resolution to form the dictionary, which raises the resolution of the spectrogram. Figs. 2 to 4 show tempo spectrograms of one piece (train1.wav) from the test data set [3] of the Music Information Retrieval Evaluation eXchange (MIREX), generated by the autocorrelation function method (Fig. 2), the Fourier transform method (Fig. 3), and the matching pursuit method of the invention (Fig. 4). Adjacent points on the tempo axis are 1 bpm apart (571 points in total). In terms of resolution, the autocorrelation result in Fig. 2 is fine at low tempi but blurred at high tempi, where the bands widen progressively and the resolution drops markedly. In the Fourier transform result in Fig. 3 the bands are wide at both high and low tempi, and the resolution is clearly worse than that of the invention in Fig. 4 (for comparability with the autocorrelation and Fourier transform methods, the iteration count is 571).

The sparsity of the invention comes from the fact that the redundant dictionary supplies atoms highly similar to the original signal, and the matching pursuit algorithm ensures that the decomposition coefficients of these highly similar atoms are relatively large while those of dissimilar atoms are small or even zero. Comparing Figs. 2 to 4 shows that the coefficients in Fig. 4 are markedly sparser, with a much larger share of zero or near-zero coefficients.

Besides good resolution and sparsity, the tempo spectrogram generated by the invention is also flexible in application. The flexibility lies in the adjustability of the tempo resolution, the shift granularity of the parent atoms, and the number of matching pursuit iterations. The tempo resolution is set when the common tempo range is converted into the frequency set. The shift granularity is set when the atom sets are generated: the smaller the granularity, the larger the atom set and the higher the precision of the spectrogram. Figs. 8 to 10 correspond to shift granularities of 50, 20, and 5, and the comparison shows the precision increasing. The iteration count is set in the matching pursuit algorithm: the larger the count, the more coefficients are generated and the denser the spectrogram, but the ordering of the coefficients by magnitude is unchanged. Figs. 5 to 7 correspond to iteration counts of 20, 10, and 5; the spectrogram has fewer and fewer coefficients, but the larger coefficients are unchanged.

Description of the drawings

To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some of the drawings of embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of tempo spectrogram generation provided by an embodiment of the invention;

Fig. 2 is a tempo spectrogram generated by the autocorrelation function method;

Fig. 3 is a tempo spectrogram generated by the Fourier transform method;

Fig. 4 is a tempo spectrogram generated by the matching pursuit method of the invention (iteration count 571, shift granularity 2);

Fig. 5 is a tempo spectrogram generated by the matching pursuit method of the invention (iteration count 20, shift granularity 2);

Fig. 6 is a tempo spectrogram generated by the matching pursuit method of the invention (iteration count 10, shift granularity 2);

Fig. 7 is a tempo spectrogram generated by the matching pursuit method of the invention (iteration count 5, shift granularity 2);

Fig. 8 is a tempo spectrogram generated by the matching pursuit method of the invention (iteration count 20, shift granularity 50);

Fig. 9 is a tempo spectrogram generated by the matching pursuit method of the invention (iteration count 20, shift granularity 20);

Fig. 10 is a tempo spectrogram generated by the matching pursuit method of the invention (iteration count 20, shift granularity 5).

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.

Tempo resolution borrows the notion of frequency resolution: it is expressed as the spacing between two adjacent valid points of the tempo components in the tempo spectrogram, and the larger the spacing, the poorer the resolution. Tempo spectrogram sparsity refers to the number of nonzero elements among all spectrogram coefficients: the fewer the nonzero elements, the stronger the sparsity and the better the discriminability. An embodiment of the invention provides a music tempo spectrogram generation method based on matching pursuit. As shown in Fig. 1, the method comprises:

1. Input a music signal and generate the note onset detection function o(n)

The input music signal is usually a file containing a waveform, in a format such as wav or mp3. The first stage of tempo spectrogram generation comprises signal transformation, feature extraction, and generation of the onset detection function, and outputs a note onset detection function o(n) of length N, i.e. a vector. This stage can be implemented with reference to document [1].

2. Divide o(n) into frames, forming a number of frame signals

o(n) is divided into frames. Preferably, the frame length is 6 seconds (let each frame contain M points) and the hop size is about 0.2 seconds, forming a detection function matrix $X = X(m, n)$, $m \in [1 \ldots M]$, $n \in [1 \ldots N]$, with M rows and N columns.
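The framing step can be sketched in Python as follows. The sampling rate of 43.5 Hz is borrowed from the background discussion, and the function name, the rounding of frame length and hop to whole samples, and the dropping of a trailing partial frame are assumptions. Frames are stored as a list of columns, so `frames[n][m]` plays the role of X(m, n) with zero-based indices.

```python
def frame_signal(o, frame_len, hop):
    """Split o into overlapping frames of frame_len samples, hop samples apart."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(o) - frame_len
    return [o[start:start + frame_len] for start in range(0, last_start + 1, hop)]


f_o = 43.5                # assumed sampling rate of o(n), in Hz
M = round(6.0 * f_o)      # samples in a 6-second frame
hop = round(0.2 * f_o)    # samples in a hop of about 0.2 seconds

# Toy onset curve of 1000 samples with an impulse every 10 samples.
o = [1.0 if n % 10 == 0 else 0.0 for n in range(1000)]
frames = frame_signal(o, M, hop)
```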

3. Take the common tempo range $\tau \in [30, 480]$, $\tau \in \mathbb{R}$, and convert the set of tempi into a set of frequencies according to the required tempo resolution

The prior art does not let the user choose the tempo resolution of the generated tempo spectrogram. The invention allows the tempo resolution to be chosen freely: a corresponding redundant dictionary is generated, and the matching pursuit algorithm produces the corresponding tempo spectrogram. This reflects the flexibility of the invention, whose tempo resolution can be configured for the application. The tempo resolution may be a positive integer 1, 2, ... that is the same in all subintervals; it may follow the values used by the autocorrelation function method or the Fourier transform method; or the range may be divided into subintervals with a different tempo resolution in each, for example 0.25 in the most common tempo interval [80, 150] and 0.5 in the other subintervals. For ease of comparison, the embodiment uses a tempo resolution of 1 over the whole range, so that $\tau \in [30, 480]$, $\tau \in \mathbb{Z}$, is converted into the frequency set $\{ f_b \mid f_b = \tau/60,\ \tau = [30, 31, \ldots, 480],\ b = [1..B] \}$, where $b$ is the index of the frequency within the set and $B$ is the largest index.
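A minimal Python sketch of this conversion, covering both the uniform 1 bpm grid of the embodiment and the mixed-resolution example. The helper `tempo_grid` and the way the subintervals are joined at 80 and 150 bpm are assumptions.

```python
def tempo_grid(lo, hi, step):
    """Tempi from lo to hi inclusive, spaced step bpm apart."""
    count = round((hi - lo) / step)
    return [lo + i * step for i in range(count + 1)]


# Embodiment: 1 bpm resolution over [30, 480].
tempi = tempo_grid(30, 480, 1)
freqs = [tau / 60.0 for tau in tempi]   # f_b = tau / 60, in Hz
B = len(freqs)

# Mixed resolution: 0.25 bpm in [80, 150], 0.5 bpm elsewhere.
# The shared endpoints 80 and 150 are kept only once.
mixed = (tempo_grid(30, 80, 0.5)[:-1]
         + tempo_grid(80, 150, 0.25)
         + tempo_grid(150, 480, 0.5)[1:])
mixed_freqs = [tau / 60.0 for tau in mixed]
```

A finer grid enlarges the frequency set and therefore the dictionary built in the later steps, so the resolution is a direct trade-off against computation.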

4. For each frequency in the frequency set, create a corresponding parent atom

Specifically, for each frequency $f_b$ in the frequency set obtained in step 3, a cosine function of that frequency is created as the corresponding parent atom $\alpha_b$. Its length is the frame length M of o(n), and its form is $\alpha_b = \cos(2\pi f_b t)$, $t = (0 \ldots M-1)/f_o$, where $f_o$ is the sampling rate of o(n) and $t$ denotes time.
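In Python, a parent atom could be generated as below. The sampling rate of 43.5 Hz and the frame length of 261 samples (a 6-second frame at that rate) are assumed values, and `parent_atom` is an illustrative name.

```python
import math


def parent_atom(f_b, M, f_o):
    """Samples of cos(2*pi*f_b*t) at t = 0/f_o, 1/f_o, ..., (M-1)/f_o."""
    return [math.cos(2.0 * math.pi * f_b * m / f_o) for m in range(M)]


f_o = 43.5   # assumed sampling rate of o(n), in Hz
M = 261      # assumed frame length in samples
alpha = parent_atom(120 / 60.0, M, f_o)   # parent atom for 120 bpm (2 Hz)
```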

5. Shift every parent atom to the right at a chosen granularity, each shift step producing one atom; the atoms produced by shifting, together with the parent atom, form the atom set of the frequency corresponding to that parent atom

The support of the parent atom $\alpha_b$ is $[0, M-1]$, and the shift granularity $d = 1, 2, 3, \ldots$ is a positive integer. The parent atom $\alpha_b$ is shifted to the right by $d \cdot j$ ($j = 1, 2, 3, \ldots$). After the shift, the left part of the support, $[0, M - d \cdot j - 1]$, is filled with the values $\cos(-2\pi f_b t)$, $t = (M - d \cdot j \ldots 1)/f_o$. Each shift thus yields a new atom. Because the parent atom is a periodic function, the maximum shift is set to no more than one period. The parent atom $\alpha_b$ and the atoms obtained by shifting it together make up the atom set $d_b$ of that parent atom.

The adjustable shift granularity in this step is another aspect of the flexibility of the invention. The smaller the granularity, the larger the atom set and the higher the precision of the spectrogram, but the longer the computation of the whole tempo spectrogram takes. Figs. 8 to 10 correspond to shift granularities of 50, 20, and 5, and the comparison shows the precision of the spectrogram increasing. When applying the invention, the shift granularity of the parent atoms can be chosen according to the computation time and spectrogram precision required.
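The shifting step might be sketched in Python as follows. The sketch rests on one reading of the text: because the vacated samples are filled by continuing the cosine backwards in time, a right shift by s samples is taken to be the delayed cosine cos(2π f_b (m - s)/f_o). That reading, the function name `atom_set`, and the rule that shifts stay strictly below one period are assumptions.

```python
import math


def atom_set(f_b, M, f_o, d):
    """Parent atom (shift 0) plus right-shifted copies with shifts d, 2d, ...

    Shifts are limited to less than one period of the cosine (an assumption).
    """
    period = f_o / f_b                       # period of the cosine, in samples
    shifts = range(0, math.ceil(period), d)
    return [[math.cos(2.0 * math.pi * f_b * (m - s) / f_o) for m in range(M)]
            for s in shifts]


f_o, M = 43.5, 261                    # assumed sampling rate and frame length
coarse = atom_set(2.0, M, f_o, d=5)   # 120 bpm, shift granularity 5
fine = atom_set(2.0, M, f_o, d=2)     # same tempo, shift granularity 2
```

With these assumed values one period of the 2 Hz cosine spans 21.75 samples, so a granularity of 5 yields far fewer atoms than a granularity of 2, which illustrates the size and precision trade-off described above.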

6. Assemble the atom sets of all frequencies in the frequency set, obtained in step 5, into a redundant dictionary

The atom sets $d_b$ of all frequencies $f_b$ in the frequency set are assembled into one redundant dictionary $D$.
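Assembling the dictionary amounts to concatenating the atom sets while remembering which tempo each atom belongs to. In this Python sketch the `owner` list, the delayed-cosine form of the shifted atoms, the unit-norm scaling of each atom (customary for matching pursuit but not stated in the text), and the small demo parameters are all assumptions.

```python
import math


def build_dictionary(tempi_bpm, M, f_o, d):
    """Return (atoms, owner): unit-norm atoms and the tempo index b of each atom."""
    atoms, owner = [], []
    for b, tau in enumerate(tempi_bpm):
        f_b = tau / 60.0
        period = f_o / f_b
        for s in range(0, math.ceil(period), d):
            g = [math.cos(2.0 * math.pi * f_b * (m - s) / f_o) for m in range(M)]
            norm = math.sqrt(sum(v * v for v in g))
            atoms.append([v / norm for v in g])   # assumption: unit-norm atoms
            owner.append(b)                       # zero-based tempo index
    return atoms, owner


# Small demo: two tempi, f_o = 16 Hz, frames of 64 samples, shift granularity 4.
atoms, owner = build_dictionary([60, 120], M=64, f_o=16.0, d=4)
```

Keeping `owner` alongside the atoms makes the later attribution of coefficients to tempi a simple lookup.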

7. Perform matching pursuit on each frame signal of o(n) with the redundant dictionary for a chosen number of iterations, producing a series of decomposition coefficients and corresponding atoms:

For each frame signal of o(n), that is, each column $X_i$, $i \in [1..N]$, of the detection function matrix, the matching pursuit algorithm is run with the redundant dictionary $D$:

(1) Set the residual signal $y_n = X_i$ with $n = 0$, and start the loop;

(2) Compute the inner product $\langle y_n, g_j \rangle$ of every atom $g_j \in D$ of the redundant dictionary with the residual signal $y_n$; select the atom $g_k$ whose inner product has the largest absolute value as the atom matched in this iteration, and store the decomposition coefficient of the n-th iteration, $s_n = |\langle y_n, g_k \rangle|$, and the corresponding atom $g_n = g_k$;

(3) Recompute the residual signal $y_{n+1} = y_n - |\langle y_n, g_k \rangle|\, g_k$;

(4) If the iteration count, or the energy ratio of the residual signal to the original signal, reaches the required value, exit the loop; otherwise set $n = n + 1$ and continue from step (2).

Preferably, the invention terminates the loop by iteration count, which can be set according to the requirements on the tempo spectrogram, for example K = 10, 20, and so on. After the loop terminates, $s_n$, $g_n$, $n = [1 \ldots K]$, are obtained.
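The loop in steps (1) to (4), terminated by iteration count, can be sketched in Python as follows. The sketch stores the absolute inner product as the coefficient, as in the text, but it subtracts the signed projection in the residual update, which is the textbook matching pursuit update and keeps the residual energy from growing when an inner product is negative; that choice, the unit-norm atoms, and the two-atom toy dictionary are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(x, atoms, K):
    """Run K iterations of matching pursuit on frame x with unit-norm atoms.

    Returns (coeffs, picks, residual) with coeffs[n] = |<y_n, g_k>| and
    picks[n] = k, the dictionary index of the atom chosen in iteration n.
    """
    residual = list(x)
    coeffs, picks = [], []
    for _ in range(K):
        products = [dot(residual, g) for g in atoms]
        k = max(range(len(atoms)), key=lambda j: abs(products[j]))
        coeffs.append(abs(products[k]))
        picks.append(k)
        # Assumption: subtract the signed projection, not its absolute value.
        residual = [r - products[k] * gi for r, gi in zip(residual, atoms[k])]
    return coeffs, picks, residual


def unit_cosine(f, M, f_o):
    g = [math.cos(2.0 * math.pi * f * m / f_o) for m in range(M)]
    norm = math.sqrt(dot(g, g))
    return [v / norm for v in g]


# Demo: a frame built from a strong 1 Hz and a weaker 2 Hz component.
atoms = [unit_cosine(1.0, 64, 16.0), unit_cosine(2.0, 64, 16.0)]
x = [3.0 * a + 1.0 * b for a, b in zip(atoms[0], atoms[1])]
coeffs, picks, residual = matching_pursuit(x, atoms, K=2)
```

Because the two demo atoms are orthogonal over the 64-sample frame, two iterations recover both components and leave a residual that is zero up to rounding; with a genuinely redundant dictionary the residual generally shrinks more gradually.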

The redundant dictionary generated from the common tempo range supplies atoms highly similar to the original signal, and the matching pursuit algorithm ensures that the decomposition coefficients of these highly similar atoms are relatively large while those of dissimilar atoms are small or even zero, so the tempo spectrogram generated by the invention is sparser. Comparing Figs. 2 to 4 shows that the coefficients in Fig. 4 are markedly sparser, with a much larger share of zero or near-zero coefficients.

The iteration count of the matching pursuit algorithm is adjustable. The larger the count, the more coefficients are generated and the denser the spectrogram (the magnitudes of the coefficients and the order in which they are generated stay the same), but the computation time grows with the count. Figs. 5 to 7 correspond to iteration counts of 20, 10, and 5; the spectrogram has fewer and fewer coefficients. In some applications a small number of large coefficients is enough to complete the task, and the iteration count can be set to a small value so that less computation time is needed; other applications need a large number of coefficients to provide enough information, and only the iteration count has to be increased. This also reflects the flexibility of the invention in application, which the prior art lacks.

8. Using the relationship between the atoms in the redundant dictionary and the tempi, attribute the decomposition coefficients of each frame signal of o(n) to the coefficients of particular tempi

For each frame signal, a tempo spectrum vector $S_n$, $n = [1..N]$, with initial value 0 is first created; the index of each component is the tempo index $b$, $b = [1..B]$, and the value of each component is the decomposition coefficient of that tempo. Then, for each decomposition coefficient $s_n$ of the frame signal, the tempo index $b$ is found from the frequency of the corresponding atom $g_n$ in the redundant dictionary, and $s_n$ is taken as the decomposition coefficient of that tempo. If several atoms correspond to the same tempo index, their decomposition coefficients are summed and the sum is taken as the decomposition coefficient of that tempo.
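The attribution step reduces to an accumulation. In the Python sketch below, `owner[k]` is a hypothetical lookup giving the zero-based tempo index of dictionary atom k, and the input numbers are invented purely to show the summation.

```python
def tempo_spectrum(coeffs, picks, owner, B):
    """Accumulate matching pursuit coefficients into a length-B tempo vector."""
    S = [0.0] * B
    for s, k in zip(coeffs, picks):
        S[owner[k]] += s    # atoms of the same tempo add up
    return S


# Invented example: six atoms, the first four belong to tempo 0, the rest to tempo 1.
owner = [0, 0, 0, 0, 1, 1]
coeffs = [3.0, 1.0, 0.5]    # decomposition coefficients from three iterations
picks = [0, 4, 1]           # dictionary indices of the selected atoms
S_frame = tempo_spectrum(coeffs, picks, owner, B=2)
```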

9. Merge the tempo spectrum vectors of all frame signals to form the tempo spectrogram

The tempo spectrum vectors $S_n$ of all frames are assembled in frame order and merged into the tempo spectrogram $S = S(b, n)$, $b = [1..B]$, $n = [1 \ldots N]$.
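If each frame's tempo spectrum is stored as a list, merging is a transposition. This Python sketch uses made-up vectors; the function names and the nonzero count used as a sparsity measure (following the definition of sparsity given earlier in the text) are illustrative assumptions.

```python
def assemble_tempogram(frame_spectra):
    """Turn a list of N per-frame vectors (length B each) into S[b][n]."""
    B = len(frame_spectra[0])
    if any(len(spec) != B for spec in frame_spectra):
        raise ValueError("all tempo spectrum vectors must have the same length")
    return [[spec[b] for spec in frame_spectra] for b in range(B)]


def count_nonzero(S):
    """Number of nonzero coefficients; fewer means a sparser spectrogram."""
    return sum(1 for row in S for value in row if value != 0.0)


frame_spectra = [[0.0, 2.0, 0.0], [0.0, 1.5, 0.5]]   # N = 2 frames, B = 3 tempi
S = assemble_tempogram(frame_spectra)
```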

The above are only embodiments of the invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the content of the specification and drawings of the invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the invention.

Claims

1. A music tempo spectrogram generation method based on matching pursuit, comprising the following steps:

S1. Input a music signal and generate the note onset detection function o(n);

S2. Divide o(n) into frames, forming a number of frame signals:

o(n) is divided into frames; preferably, the frame length is 6 seconds, each frame containing M points, and the hop size is 0.2 seconds, forming a detection function matrix $X = X(m, n)$, $m \in [1 \ldots M]$, $n \in [1 \ldots N]$, with M rows and N columns;

S3. Take the common tempo range $\tau \in [30, 480]$, $\tau \in \mathbb{R}$, and convert the set of tempi into a set of frequencies according to the required tempo resolution:

The tempo resolution is a positive integer 1, 2, ... that is the same in all subintervals; or it follows the values used by the autocorrelation function method or the Fourier transform method; or the range is divided into subintervals with a different tempo resolution in each. When the tempo resolution is 1 over the whole range, $\tau \in [30, 480]$, $\tau \in \mathbb{Z}$, where $\mathbb{Z}$ denotes the positive integers, is converted into the frequency set $\{ f_b \mid f_b = \tau/60,\ \tau = [30, 31, \ldots, 480],\ b = [1..B] \}$, where $b$ is the index of the frequency within the set and $B$ is the largest index;

S4. For each frequency in the frequency set, create a corresponding parent atom:

For each frequency $f_b$ in the frequency set obtained in step S3, a cosine function of that frequency is created as the corresponding parent atom $\alpha_b$; its length is the frame length M of o(n), and its form is $\alpha_b = \cos(2\pi f_b t)$, $t = (0 \ldots M-1)/f_o$, where $f_o$ is the sampling rate of o(n) and $t$ denotes time;

S5. Shift every parent atom to the right at a chosen granularity, each shift step producing one atom; the atoms produced by shifting, together with the parent atom, form the atom set of the frequency corresponding to that parent atom:

The support of the parent atom $\alpha_b$ is $[0, M-1]$, and the shift granularity $d = 1, 2, 3, \ldots$ is a positive integer; the parent atom $\alpha_b$ is shifted to the right by $d \cdot j$ ($j = 1, 2, 3, \ldots$); after the shift, the left part of the support, $[0, M - d \cdot j - 1]$, is filled with the values $\cos(-2\pi f_b t)$, $t = (M - d \cdot j \ldots 1)/f_o$, so that each shift yields a new atom; because the parent atom is a periodic function, the maximum shift is set to no more than one period; the parent atom $\alpha_b$ and the atoms obtained by shifting it together make up the atom set $d_b$ of that parent atom;

S6. Assemble the atom sets of all frequencies in the frequency set, obtained in step S5, into a redundant dictionary:

The atom sets $d_b$ of all frequencies $f_b$ in the frequency set are assembled into one redundant dictionary $D$;

S7. Perform matching pursuit on each frame signal of o(n) with the redundant dictionary for a chosen number of iterations, producing a series of decomposition coefficients and corresponding atoms:

For each frame signal of o(n), that is, each column $X_i$, $i \in [1..N]$, of the detection function matrix, the matching pursuit algorithm is run with the redundant dictionary $D$:

(1) Set the residual signal $y_n = X_i$ with $n = 0$, and start the loop;

(2) Compute the inner product $\langle y_n, g_j \rangle$ of every atom $g_j \in D$ of the redundant dictionary with the residual signal $y_n$; select the atom $g_k$ whose inner product has the largest absolute value as the atom matched in this iteration, and store the decomposition coefficient of the n-th iteration, $s_n = |\langle y_n, g_k \rangle|$, and the corresponding atom $g_n = g_k$;

(3) Recompute the residual signal $y_{n+1} = y_n - |\langle y_n, g_k \rangle|\, g_k$;

(4) If the iteration count, or the energy ratio of the residual signal to the original signal, reaches the required value, exit the loop; otherwise set $n = n + 1$ and continue from step (2);

S8. Using the relationship between the atoms in the redundant dictionary and the tempi, attribute the decomposition coefficients of each frame signal of o(n) to the coefficients of particular tempi:

For each frame signal, a tempo spectrum vector $S_n$, $n = [1..N]$, with initial value 0 is first created; the index of each component is the tempo index $b$, $b = [1..B]$, and the value of each component is the decomposition coefficient of that tempo; then, for each decomposition coefficient $s_n$ of the frame signal, the tempo index $b$ is found from the frequency of the corresponding atom $g_n$ in the redundant dictionary, and $s_n$ is taken as the decomposition coefficient of that tempo; if several atoms correspond to the same tempo index, their decomposition coefficients are summed and the sum is taken as the decomposition coefficient of that tempo;

S9. Merge the tempo spectrum vectors of all frame signals to form the tempo spectrogram:

2. A music tempo spectrogram generation method based on matching pursuit, characterized in that the loop-exit condition in sub-step (4) of step S7 is termination by iteration count: the iteration count is set according to the requirements on the tempo spectrogram, i.e. a value K is assigned as the iteration count and the loop exits when the count reaches the preset value K; after the loop terminates, $s_n$, $g_n$, $n = [1 \ldots K]$, are obtained.