CN106782583B - Robust scale contour feature extraction algorithm based on nuclear norm - Google Patents

Robust scale contour feature extraction algorithm based on nuclear norm

Info

Publication number
CN106782583B
CN106782583B (application CN201611132721.3A)
Authority
CN
China
Prior art keywords: matrix, frequency, spectrum, music, algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201611132721.3A
Other languages
Chinese (zh)
Other versions
CN106782583A (en)
Inventor
李锵
王蒙蒙
关欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201611132721.3A priority Critical patent/CN106782583B/en
Publication of CN106782583A publication Critical patent/CN106782583A/en
Application granted granted Critical
Publication of CN106782583B publication Critical patent/CN106782583B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/013: Adapting to target pitch (speech or voice signal processing to modify quality or intelligibility; changing voice quality, e.g. pitch or formants, characterised by the process used)
    • G10L 21/0224: Noise filtering characterised by the method used for estimating noise; processing in the time domain
    • G10L 21/0232: Noise filtering characterised by the method used for estimating noise; processing in the frequency domain
    • G10L 25/18: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L 25/45: Speech or voice analysis techniques characterised by the type of analysis window
    • G10L 2021/0135: Voice conversion or morphing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a robust scale contour feature extraction algorithm based on the nuclear norm, comprising the following steps: step 1, converting the music signal to be input; step 2, windowing the music signal and applying the Fourier transform to obtain its time-frequency matrix, and determining initial beat points; step 3, constraining the rank of the time-frequency matrix with the nuclear norm to obtain a low-rank spectrum, while constraining the noise entries of the matrix with an ℓ1 norm, so that the low-rank decomposition of the signal spectrum and the noise removal are formulated as a convex optimization problem; step 4, in the iterative constraint process, exploiting the low-rank property of the spectrum to realize an adaptive threshold adjustment algorithm; and step 5, applying an effective dimension reduction to the time-frequency matrix to obtain 12-dimensional chord features. Compared with the prior art, the method extracts robust chord features, effectively reduces the running time of the algorithm, and can accurately recover the scale contour features of music signals of different types and styles.

Description

Robust scale contour feature extraction algorithm based on nuclear norm
Technical Field
The invention belongs to the field of audio signal analysis in a computer auditory system, and particularly relates to a scale profile feature extraction algorithm.
Background
Harmonic components are important elements of music and an important subject in the field of music information retrieval. The fundamental frequencies of an audio signal and their harmonic components are the main constituents of chords and shape the timbre of the music. In addition, the extension of the different frequency components in time is a key factor in chord progression. Intuitively, within the duration of a chord the music exhibits a certain structure in the frequency domain, namely a low-rank characteristic. Chord feature extraction belongs to audio signal analysis in a computer auditory system, a field that mainly processes the various kinds of information separated from sound signals. At the same time, the chord features of music are the basis for extracting some higher-level music information.
The mid-level features of music are pieces of information extracted from the audio signal that can represent the signal and can ultimately form part of the high-level features. In recent years, many researchers have proposed a variety of mid-level features that can characterize music, the most widely used being the Pitch Class Profile (PCP). However, because the original music signal contains vocals, drum beats, plosives and Gaussian noise, the performance of PCP features depends strongly on the type of music signal being analysed. Many improvements based on the PCP have been proposed, for example HPCP (Harmonic PCP) by Gomez and EPCP (Enhanced PCP) by Lee. These schemes vary the way frequency-domain components are extracted and then obtain superior features suited to a particular music genre.
In addition, since each chord has a certain duration, the stability of the PCP feature during this time determines the accuracy of chord identification. Many researchers have therefore proposed improved schemes based on the PCP, i.e. the chromagram, as sketched below. Fujishima assumes that a chord persists for several frames and applies sliding-window mean filtering, reducing the influence of noise and avoiding frequent chord changes; Geoffroy Peeters applies sliding-window median filtering to avoid frequent chord changes; Bello assumes that the chord is invariant within a beat and uses beat synchronization to avoid frequent chord changes.
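For illustration, a minimal sketch of the sliding-window median smoothing mentioned above, applied to a 12 x T chromagram; the window width of 9 frames and the use of SciPy are assumptions of this sketch, not values from the cited schemes:

```python
import numpy as np
from scipy.ndimage import median_filter

def smooth_chromagram(C, width=9):
    """Sliding-window median filtering of a 12 x T chromagram along the time axis.

    The filter only smooths over time (size 1 in the pitch-class dimension),
    which suppresses short spurious chord changes.
    """
    return median_filter(C, size=(1, width), mode='nearest')
```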
Most beat tracking models consist of two parts: note onset detection and extraction of the period of the onset strength curve. In either case, the fundamental purpose of onset detection is to select the peaks of an effective onset curve, which is essentially the problem of deciding whether an extreme point is a beat point.
In summary, most chord feature extraction schemes do not consider the structural property that the music signal exhibits in its spectrum; instead they rely on a few known assumptions and adopt simple processing methods to optimize the chord features.
Disclosure of Invention
Against this prior art, the invention provides a robust scale contour feature extraction algorithm based on the nuclear norm, which converts the chord feature extraction problem into a convex optimization problem and realizes an adaptive threshold algorithm by using a nuclear norm constraint and an ℓ1 norm constraint together with the low-rank characteristic exhibited by the spectrum of a chord.
The invention relates to a robust scale contour feature extraction algorithm based on a nuclear norm, which comprises the following steps:
step 1, converting the music signal to be input into standard audio with a sampling rate of 22050 Hz, 16 bit, single channel, as the reference audio signal x(n), where n is the number of data points contained in the converted audio signal;
step 2, windowing the music signal x(n) with a window function W(k), where k is the window width, to obtain the signal time-domain matrix X of size k×m, whose m-th column is X_{·,m} = x(k·m/2 : k·m/2 + k) · W(k), where m is the number of frames obtained after framing; a Fourier transform is then applied to obtain the time-frequency matrix D = F·X of the music signal, where F is the Fourier transform matrix;
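As an illustration of steps 1 and 2, the following minimal NumPy sketch builds the windowed time-domain matrix X and the magnitude time-frequency matrix D; the Hann window, the window length of 4096 samples and the half-window hop are assumptions of the sketch, not values fixed by the invention:

```python
import numpy as np

def time_frequency_matrix(x, win_len=4096, hop=2048):
    """Frame the mono signal x, window each frame, and return the magnitude spectrum D (bins x frames)."""
    window = np.hanning(win_len)                 # W(k); the Hann window is an assumption of this sketch
    n_frames = 1 + (len(x) - win_len) // hop     # m: number of frames after framing
    # Signal time-domain matrix X (win_len x n_frames): one windowed frame per column
    X = np.stack([x[m * hop : m * hop + win_len] * window
                  for m in range(n_frames)], axis=1)
    # Time-frequency matrix D = |F . X|: real FFT applied to each column
    D = np.abs(np.fft.rfft(X, axis=0))
    return D

# Usage: x is the 22050 Hz, mono, float-valued signal obtained in step 1
# D = time_frequency_matrix(x)
```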
step 3, it is assumed that the harmonic components and the noise contained in the spectrum of the audio signal are mutually independent, that is, D = A + E, where the matrix A is formed by the harmonic components of the spectrum matrix and the matrix E by its noise components; under this assumption, the recovery of the harmonic matrix A can be cast as the following convex optimization problem:

min_{A,E} ||A||_* + λ||E||_1

s.t. A + E = D

where ||·||_* denotes the nuclear norm of a matrix, i.e. the sum of its singular values, and ||·||_1 denotes the ℓ1 norm of a matrix, i.e. the sum of the absolute values of its elements; the separated matrix A is the spectrum after low-rank processing, the matrix E contains the sparse large-amplitude noise and other non-harmonic components, and the matrix D is the spectrum of the original music signal;
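The two operators that appear when solving this convex problem are element-wise soft thresholding for the ℓ1 term and singular value thresholding for the nuclear norm term; a minimal sketch of both, under the notation above:

```python
import numpy as np

def soft_threshold(M, eps):
    """Element-wise shrinkage S_eps[M]: proximal operator of eps * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - eps, 0.0)

def singular_value_threshold(M, eps):
    """Shrink the singular values of M by eps: proximal operator of eps * ||.||_*."""
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(sigma - eps, 0.0)) @ Vt
```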
step 4, in the iterative constraint process, the low-rank property of the spectrum is used to realize an adaptive threshold adjustment algorithm, specifically: initialize the singular value truncation threshold parameter μ, the parameter λ, the iteration index k = 0, the temporary matrix Y_0 = D, and E_0 as an all-zero matrix; perform the singular value decomposition

(U, Σ, V) = svd(D - E_k + Y_k / μ_k)

to obtain the singular value matrix Σ; then select twenty data points μ_k^(i), 1 ≤ i ≤ 20, at equal intervals from μ_k to 1.5·μ_k, and for each μ_k^(i) perform the inverse singular value decomposition operation

A_k^(i) = U · S_{1/μ_k^(i)}[Σ] · V^T,

where S_ε[·] shrinks the singular values by the threshold ε; since the harmonic components are distributed over only a few frequency points, compute the column variances of each A_k^(i), select the index i for which the variance is maximum, and use the corresponding μ_k^(i), which completes the adaptive threshold selection; take the matrix A_{k+1} obtained in this step and update

E_{k+1} = S_{λ/μ_k}[D - A_{k+1} + Y_k / μ_k],
Y_{k+1} = Y_k + μ_k (D - A_{k+1} - E_{k+1}),  k = k + 1,

until convergence;
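A self-contained sketch of the iteration in step 4 follows. It uses the standard inexact ALM updates together with the adaptive choice of the truncation threshold over twenty candidates in [μ_k, 1.5·μ_k]; the initial values of μ and λ, the convergence test and the function name are assumptions of the sketch, not values prescribed by the invention:

```python
import numpy as np

def shrink(M, eps):
    """Element-wise soft thresholding S_eps[M]."""
    return np.sign(M) * np.maximum(np.abs(M) - eps, 0.0)

def asp_alm(D, lam=None, mu=None, max_iter=100, tol=1e-6):
    """Adaptive-threshold ALM: split the spectrum D into a low-rank part A and a sparse part E."""
    n1, n2 = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(n1, n2))   # assumed default
    mu = mu if mu is not None else 1.25 / np.linalg.norm(D, 2)     # assumed default
    Y = D.copy()          # Y_0 = D
    E = np.zeros_like(D)  # E_0 = 0
    for _ in range(max_iter):
        U, sigma, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        # Twenty candidate thresholds from mu to 1.5*mu; keep the candidate whose
        # low-rank estimate has the largest column variance (harmonics sit on few bins).
        candidates = np.linspace(mu, 1.5 * mu, 20)
        best_A, best_var = None, -np.inf
        for mu_i in candidates:
            A_i = U @ np.diag(np.maximum(sigma - 1.0 / mu_i, 0.0)) @ Vt
            var_i = A_i.var(axis=0).max()
            if var_i > best_var:
                best_A, best_var, mu = A_i, var_i, mu_i
        A = best_A
        E = shrink(D - A + Y / mu, lam / mu)
        Y = Y + mu * (D - A - E)
        if np.linalg.norm(D - A - E, 'fro') <= tol * np.linalg.norm(D, 'fro'):
            break
    return A, E
```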
and step 5, performing an effective dimension reduction on the time-frequency matrix to obtain 12-dimensional chord features. Normally, the note A_0 is specified to have a frequency of 440 Hz and is taken as the reference frequency, and the frequencies of the other notes are obtained through

f_b = 440 · 2^(b/12),

where b is the pitch difference in semitones between the note and A_0; then, by the mapping formula

p(x) = mod( round( 12 · log2( x / f_ref ) ), 12 ),

each frequency component of the harmonic matrix A is mapped, yielding the robust scale contour feature vector, where x is the frequency value corresponding to each row of the matrix A and f_ref is the reference frequency obtained from the 440 Hz reference through the equal-temperament relation above.
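As a small worked example of the equal-temperament relations in step 5 (the 440 Hz reference is taken from the text; the helper names are illustrative):

```python
import numpy as np

F_REF = 440.0  # reference frequency from the text (note A_0 specified as 440 Hz)

def note_frequency(b):
    """Frequency of the note lying b semitones from the reference: f_b = 440 * 2**(b/12)."""
    return F_REF * 2.0 ** (b / 12.0)

def pitch_class(freq_hz):
    """Map a frequency to one of the 12 pitch classes: mod(round(12 * log2(f / f_ref)), 12)."""
    return int(np.mod(np.round(12.0 * np.log2(freq_hz / F_REF)), 12))

print(note_frequency(3))    # 3 semitones above the reference is about 523.25 Hz
print(pitch_class(523.25))  # maps back to pitch class 3
```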
Compared with the prior art, the method removes the damage done by vocals and other noise to the chord structure without destroying the original structure of the music spectrum, and extracts robust chord features; it effectively reduces the running time of the algorithm; and it can accurately recover the scale contour features of music signals of different types and styles.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a diagram of different types of chord progression;
FIG. 3 is a schematic comparison of the results of the present invention with other algorithms: 1. the original ALM algorithm; 2. the ASP-ALM algorithm; 3. the ASP algorithm.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
Step 1, music signal conversion: the music signal to be input is converted into standard audio with a sampling rate of 22050 Hz, 16 bit, single channel, which serves as the reference audio signal.
Step 2, the music signal x(n) is windowed with a window function W(k), where k is the window width, yielding the signal time-domain matrix X of size k×m, whose m-th column is X_{·,m} = x(k·m/2 : k·m/2 + k) · W(k), where m is the number of frames obtained after framing; a Fourier transform is then applied to obtain the time-frequency matrix D = F·X of the music signal, where F is the Fourier transform matrix.
step 3, spectrum low rank and noise removal: as can be seen from the spectrum, a music signal mainly contains two components: harmonic components and sparse loud noise. The harmonic components appear structurally to have a significant low rank structure; while sparse loud noise appears mainly as sparsity. Therefore, the signal spectrum is low ranked with a convex-down optimization problem and noise is removed:
Figure GDA0002202087550000041
s.t.A+E=D
wherein | · | purple*A kernel norm (kernel norm) representing a matrix, i.e., the sum of singular values of the matrix; i | · | purple wind1Represents the norm of the matrix, i.e. the sum of all non-zero elements.
The separated matrix a is the spectrum after low rank, while matrix E contains sparse loud noise and some other non-harmonic components, and D is the spectrum of the original music signal.
Step 4, PCP characteristic value extraction:
(4-1) defining a mapping matrix of the frequency spectrum to the PCP characteristics, wherein the matrix is in the form of:
Figure GDA0002202087550000051
wherein 2 pi · ωjJ is more than or equal to 0 and less than or equal to N-1 represents the frequency value represented by each frequency component in the frequency spectrum, and N represents the frequency number range obtained by the frequency spectrum; and fiAnd i is more than or equal to 1 and less than or equal to 12, the frequency values corresponding to 12 scales are represented.
Wherein the content of the first and second substances,
Figure GDA0002202087550000052
the method is a mapping function, and the function obtained according to the twelve-mean law has universality;
(4-2) obtaining a chord progression characteristic under a low rank constraint, i.e., an RPCP characteristic, by C ═ P · a.
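A minimal sketch of steps (4-1) and (4-2): build a binary 12 x N mapping matrix P that assigns each spectral bin to its pitch class, and fold the low-rank spectrum A into the RPCP feature C = P · A; the FFT-bin-to-Hz conversion and the default parameters are assumptions of the sketch:

```python
import numpy as np

def pcp_mapping_matrix(n_bins, sample_rate=22050, n_fft=4096, f_ref=440.0):
    """Binary 12 x n_bins matrix P with P[i, j] = 1 when spectral bin j belongs to pitch class i."""
    P = np.zeros((12, n_bins))
    freqs = np.arange(n_bins) * sample_rate / n_fft   # assumed Hz value of each FFT bin
    for j, f in enumerate(freqs):
        if f <= 0:
            continue  # the DC bin carries no pitch class
        i = int(np.mod(np.round(12.0 * np.log2(f / f_ref)), 12))
        P[i, j] = 1.0
    return P

def rpcp_features(A, sample_rate=22050, n_fft=4096):
    """RPCP feature C = P @ A for the recovered low-rank spectrum A (bins x frames)."""
    P = pcp_mapping_matrix(A.shape[0], sample_rate, n_fft)
    return P @ A
```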
The invention is evaluated on the automatic chord identification test database (Practice Data) of the Music Information Retrieval Evaluation eXchange (MIREX), a total of 20 music excerpts of different styles and tempos, each manually annotated with its chord types by 39 or 40 experts.
To verify the effectiveness of the algorithm, the influence on chord progression of the nuclear-norm-based robust scale contour feature algorithm is compared with that of currently popular algorithms. The smoothness of the main scale degrees during chord progression is used to quantify the different algorithms and thus to judge their influence on chord progression. The results are shown in FIG. 2. The experimental results show that, compared with the other algorithms, the proposed algorithm smooths the chord progression on the main scale degrees more effectively, so that a chord remains stable over a certain time and changes less frequently, which in turn guides chord identification over the whole song.
In addition, to verify the noise-suppression effect of the algorithm and its influence on chord identification accuracy, a template-matching algorithm is adopted and the widely used Harmonic PCP is taken as the comparison feature. The experimental results are shown in Table 1: the chord identification accuracy obtained with the proposed algorithm is improved by 9% relative to the Harmonic PCP.
TABLE 1 Comparison of average chord recognition rates for the robust PCP (RPCP) and the HPCP

Major triads  Ab    A     Bb    B     C     Db    D     Eb    E     F     Gb    G
RPCP (%)      76.1  80    76.6  69.0  76.1  71.8  80.4  72.9  79.6  77.6  73.3  63.6
HPCP (%)      73    78    63.8  66.7  71.7  69.2  78.4  64.6  71.4  61.2  68.9  63.6

Minor triads  Abm   Am    Bbm   Bm    Cm    Dbm   Dm    Ebm   Em    Fm    Gbm   Gm
RPCP (%)      84.8  74    69    63.4  88.2  87    75    43.6  65.2  80.4  76.3  66.7
HPCP (%)      72.7  73.7  67.9  58.5  74.5  85.7  73.5  41    65.2  67.9  63.2  56.4
In general, an approach to solving the nuclear-norm-constrained low-rank convex optimization problem is the Augmented Lagrange Multiplier (ALM) method, which is widely applied when the input matrix is sparse. However, as the matrix dimensions grow, the running time increases substantially.
In view of the distinctive characteristics of chord features, the invention proposes an ALM algorithm with adaptive threshold selection, denoted ASP-ALM, based on the adaptive threshold adjustment described above. The algorithm flow is as follows: initialize the singular value truncation threshold parameter μ, the parameter λ, the iteration index k = 0, the temporary matrix Y_0 = D, and E_0 as an all-zero matrix; perform the singular value decomposition

(U, Σ, V) = svd(D - E_k + Y_k / μ_k)

to obtain the singular value matrix Σ; then select twenty data points μ_k^(i), 1 ≤ i ≤ 20, at equal intervals from μ_k to 1.5·μ_k, and for each μ_k^(i) perform the inverse singular value decomposition operation

A_k^(i) = U · S_{1/μ_k^(i)}[Σ] · V^T;

since the harmonic components are distributed over only a few frequency points, compute the column variances of each A_k^(i), select the index i for which the variance is maximum, and use the corresponding μ_k^(i), which completes the adaptive threshold selection; take the matrix A_{k+1} obtained in this step and update

E_{k+1} = S_{λ/μ_k}[D - A_{k+1} + Y_k / μ_k],
Y_{k+1} = Y_k + μ_k (D - A_{k+1} - E_{k+1}),  k = k + 1,

until convergence.
The flow of the adaptive algorithm is shown in FIG. 1, where μ represents the degree of matrix recovery in the ALM algorithm. The ASP-ALM algorithm greatly reduces the time consumed by the ALM algorithm in the chord feature extraction process.
The comparison of test results is shown in FIG. 3: from the results it is clear that the time consumption is greatly reduced.

Claims (2)

1. A robust scale contour feature extraction algorithm based on a nuclear norm is characterized by comprising the following steps:
step (1), converting the music signal to be input into standard audio with a sampling rate of 22050 Hz, 16 bit, single channel, as the reference audio signal x(n), where n is the number of data points contained in the converted audio signal;
step (2), windowing the music signal x(n) with a window function W(k), where k is the window width, to obtain the signal time-domain matrix X of size k×m, whose m-th column is X_{·,m} = x(k·m/2 : k·m/2 + k) · W(k), where m is the number of frames obtained after framing; a Fourier transform is then applied to obtain the time-frequency matrix D = F·X of the music signal, where F is the Fourier transform matrix;
step (3), it is assumed that the harmonic components and the noise contained in the spectrum of the audio signal are mutually independent, that is, D = A + E, where the matrix A is formed by the harmonic components of the spectrum matrix and the matrix E by its noise components; under this assumption, the recovery of the harmonic matrix A can be cast as the following convex optimization problem:

min_{A,E} ||A||_* + λ||E||_1

s.t. A + E = D

where ||·||_* denotes the nuclear norm of a matrix, i.e. the sum of its singular values, and ||·||_1 denotes the ℓ1 norm of a matrix, i.e. the sum of the absolute values of its elements; the separated matrix A is the spectrum after low-rank processing, the matrix E contains the sparse large-amplitude noise and other non-harmonic components, and the matrix D is the spectrum of the original music signal;
step (4), in the iterative constraint process, the low-rank property of the spectrum is used to realize an adaptive threshold adjustment algorithm, specifically: initialize the singular value truncation threshold parameter μ, the parameter λ, the iteration index k = 0, the temporary matrix Y_0 = D, and E_0 as an all-zero matrix; perform the singular value decomposition

(U, Σ, V) = svd(D - E_k + Y_k / μ_k)

to obtain the singular value matrix Σ; then select twenty data points μ_k^(i), 1 ≤ i ≤ 20, at equal intervals from μ_k to 1.5·μ_k, and for each μ_k^(i) perform the inverse singular value decomposition operation

A_k^(i) = U · S_{1/μ_k^(i)}[Σ] · V^T,

where S_ε[·] shrinks the singular values by the threshold ε; since the harmonic components are distributed over only a few frequency points, compute the column variances of each A_k^(i), select the index i for which the variance is maximum, and use the corresponding μ_k^(i), which completes the adaptive threshold selection; take the matrix A_{k+1} obtained in this step and update

E_{k+1} = S_{λ/μ_k}[D - A_{k+1} + Y_k / μ_k],
Y_{k+1} = Y_k + μ_k (D - A_{k+1} - E_{k+1}),  k = k + 1,

until convergence;
step (5), performing an effective dimension reduction on the time-frequency matrix to obtain 12-dimensional chord features; normally, the note A_0 is specified to have a frequency of 440 Hz and is taken as the reference frequency, and the frequencies of the other notes are obtained through

f_b = 440 · 2^(b/12),

where b is the pitch difference in semitones between the note and A_0; then, by the mapping formula

p(x) = mod( round( 12 · log2( x / f_ref ) ), 12 ),

each frequency component of the harmonic matrix A is mapped, yielding the robust scale contour feature vector, where x is the frequency value corresponding to each row of the matrix A and f_ref is the reference frequency obtained from the 440 Hz reference through the equal-temperament relation above.
2. The robust scale contour feature extraction algorithm based on the nuclear norm as claimed in claim 1, wherein the threshold adaptive adjustment algorithm comprises the following steps:
initializing the singular value truncation threshold parameter μ, the parameter λ, the iteration index k = 0, the temporary matrix Y_0 = D, and E_0 as an all-zero matrix; performing the singular value decomposition

(U, Σ, V) = svd(D - E_k + Y_k / μ_k)

to obtain the singular value matrix Σ; then selecting twenty data points μ_k^(i), 1 ≤ i ≤ 20, at equal intervals from μ_k to 1.5·μ_k, and for each μ_k^(i) performing the inverse singular value decomposition operation

A_k^(i) = U · S_{1/μ_k^(i)}[Σ] · V^T,

where S_ε[·] shrinks the singular values by the threshold ε; since the harmonic components are distributed over only a few frequency points, computing the column variances of each A_k^(i), selecting the index i for which the variance is maximum, and using the corresponding μ_k^(i), which completes the adaptive threshold selection; taking the matrix A_{k+1} obtained in this step and updating

E_{k+1} = S_{λ/μ_k}[D - A_{k+1} + Y_k / μ_k],
Y_{k+1} = Y_k + μ_k (D - A_{k+1} - E_{k+1}),  k = k + 1,

until convergence.
CN201611132721.3A 2016-12-09 2016-12-09 Robust scale contour feature extraction algorithm based on nuclear norm Expired - Fee Related CN106782583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611132721.3A CN106782583B (en) 2016-12-09 2016-12-09 Robust scale contour feature extraction algorithm based on nuclear norm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611132721.3A CN106782583B (en) 2016-12-09 2016-12-09 Robust scale contour feature extraction algorithm based on nuclear norm

Publications (2)

Publication Number Publication Date
CN106782583A CN106782583A (en) 2017-05-31
CN106782583B true CN106782583B (en) 2020-04-28

Family

ID=58879705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611132721.3A Expired - Fee Related CN106782583B (en) 2016-12-09 2016-12-09 Robust scale contour feature extraction algorithm based on nuclear norm

Country Status (1)

Country Link
CN (1) CN106782583B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753127B (en) * 2019-03-29 2024-05-07 阿里巴巴集团控股有限公司 Music information processing and recommending method and device


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129456A (en) * 2011-03-09 2011-07-20 天津大学 Method for monitoring and automatically classifying music factions based on decorrelation sparse mapping
CN104395953A (en) * 2012-04-30 2015-03-04 诺基亚公司 Evaluation of beats, chords and downbeats from a musical audio signal
CN103714806A (en) * 2014-01-07 2014-04-09 天津大学 Chord recognition method combining SVM with enhanced PCP
CN104361611A (en) * 2014-11-18 2015-02-18 南京信息工程大学 Group sparsity robust PCA-based moving object detecting method
CN104978582A (en) * 2015-05-15 2015-10-14 苏州大学 Contour chord angle feature based identification method for blocked target
CN104867162A (en) * 2015-05-26 2015-08-26 南京信息工程大学 Motion object detection method based on multi-component robustness PCA
CN106056607A (en) * 2016-05-30 2016-10-26 天津城建大学 Monitoring image background modeling method based on robustness principal component analysis

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
Wang Feng, "Research of Chord Recognition based on MPCP", Computer and Automation Engineering 2010, The 2nd International Conference, 2010-12-31, full text *
Junqi Deng, "A Hybrid Gaussian-HMM-Deep-Learning Approach for Automatic Chord Estimation with Very Large Vocabulary", Proceedings of the 17th ISMIR Conference, 2016-08-11, full text *
K. Lee, "Automatic chord recognition from audio using enhanced pitch class profile", International Computer Music Conference (ICMC), 2006-12-31, full text *
E. Gomez, "Automatic Extraction of Tonal Metadata from Polyphonic Audio Recordings", Audio Engineering Society, 2004-12-31, full text *
Chao Gan, "Human motions segmentation by RPCA with augmented Lagrange multiplier", IEEE, 2012-12-31, full text *
T. Fujishima, "Realtime chord recognition of musical sound: A system using Common Lisp Music", International Computer Music Conference, 1999-12-31, full text *
Lin Z., "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices", ArXiv e-prints, 2010-12-31, full text *
Young Jeong, "Vocal Separation Using Extended Robust Principal Component Analysis with Schatten p/Lp-Norm and Scale Compression", 2014 IEEE International Workshop on Machine Learning for Signal Processing, 2014-09-24, full text *
Yan Zhiyong (闫志勇), "Chord recognition based on SVM and enhanced PCP features", Computer Engineering (计算机工程), 2014-07-31, Vol. 40, No. 7, full text *

Also Published As

Publication number Publication date
CN106782583A (en) 2017-05-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20200428
Termination date: 20211209