CN102623007B

CN102623007B - Audio characteristic classification method based on variable duration

Info

Publication number: CN102623007B
Application number: CN201110033410.2A
Authority: CN
Inventors: 卢敏; 窦维蓓
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2011-01-30
Filing date: 2011-01-30
Publication date: 2014-01-01
Anticipated expiration: 2031-01-30
Also published as: CN102623007A

Abstract

The invention discloses an audio feature classification method based on variable duration, in the technical field of multimedia signal processing and pattern recognition. The method comprises the following steps: taking a labeled audio sequence of known type as a training sequence; extracting short-time features of the audio signal in the training sequence to form a short-time feature vector; calculating statistical parameters of each short-time feature over a set duration to obtain the statistical feature vector corresponding to the short-time feature vector; calculating a group of such statistical feature vectors for several set durations, and forming the long-time feature vector of the training sequence from this group; training a classifier with the long-time feature vector of the training sequence; extracting the short-time features of the i-th frame of the audio signal in a test sequence and calculating the input long-time feature vector of the i-th frame of the test sequence; and feeding the input long-time feature vector of the i-th frame into the trained classifier to obtain the classification type. The method avoids the delay caused by long-time feature extraction and enables real-time classification of audio features.

Description

Audio feature classification method based on variable duration

Technical field

The invention belongs to the technical field of multimedia signal processing and pattern recognition, and in particular relates to an audio feature classification method based on variable duration.

Background technology

With the development of communication technology, digital audio processing is widely used in fields such as mobile communication, the Internet, broadcasting and personal electronics. Audio coding has gradually expanded from traditional speech coding, centered on narrowband speech, to multimedia audio coding with wider bandwidth and higher quality, and the rise of 3G and LTE places further demands on the new generation of audio codecs in terms of channel adaptability, transmission reliability and coding quality. Whether for audio coding and decoding or for sound-effect editing and production, the diversity of audio signals means that different types of signal may call for different processing techniques. For example, ITU-T G.718 and G.729.1 divide the audio signal into two coding modes, speech and music, and the later G.718-SWB adds a coding mode for audio signals containing sinusoidal components. In some applications, therefore, the audio signal first needs to be classified simply and efficiently to determine its type.

Classification relies on short-time features and long-time features extracted from the audio signal. Because audio signals are stationary only over short intervals, long-time features are usually more stable and more discriminative than short-time features, but they incur a large detection delay, which limits their use in real-time classification systems. In addition, different features may exhibit different stationary periods, so computing the long-time features of all of them over one and the same fixed duration may not be optimal.

Summary of the invention

The object of the invention is to address the problem that commonly used audio feature classification methods, which mainly rely on extracting long-time features, compromise real-time performance. The invention proposes an audio feature classification method based on variable duration: variable-duration long-time features, formed from the same statistical parameter of the same short-time feature under different durations, are used to train a classifier, and the trained classifier is then used to classify audio features.

The technical scheme of the invention is an audio feature classification method based on variable duration, characterized in that the method comprises the following steps:

Step 1: take a labeled audio sequence of known type as the training sequence;

Step 2: extract the short-time features $F_1, F_2, \ldots, F_K$ of the audio signal in the training sequence to form the short-time feature vector

$$\vec{V}_S = \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_K \end{bmatrix},$$

where K is the number of components of the short-time feature vector;

Step 3: for each short-time feature $F_k$, calculate the statistical parameters of that feature over the current frame and the preceding (n-1) frames within a set duration, where n is the total number of frames in the set duration. Each short-time feature $F_k$ corresponds to a statistical feature vector $\vec{L}_k(n)$ formed from the statistical parameters of that feature, and the short-time feature vector $\vec{V}_S$ in turn corresponds to a statistical feature vector $\vec{V}_L(n)$, where

$$\vec{V}_L(n) = \begin{bmatrix} \vec{L}_1(n) \\ \vec{L}_2(n) \\ \vdots \\ \vec{L}_K(n) \end{bmatrix}, \quad 1 \le k \le K;$$

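Step 4: choose P values $N_1, N_2, \ldots, N_P$ satisfying $N_1 < N_2 < \ldots < N_P$, set n equal to each of $N_1, N_2, \ldots, N_P$ in turn, and calculate according to step 3 the group of statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ corresponding to the short-time feature vector; this group forms the long-time feature vector of the training sequence:

$$\vec{V}_F = [\vec{V}_L^T(N_1), \vec{V}_L^T(N_2), \ldots, \vec{V}_L^T(N_P)]^T;$$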

Step 5: train a classifier with the long-time feature vector $\vec{V}_F$ of the training sequence;

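Step 6: extract the short-time features of the audio signal in the test sequence, and calculate, by the method of steps 2 and 3, the statistical feature vector $\vec{V}_L(i)$ of the i-th frame of the test sequence and the statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ of the test sequence;

Step 7: from the statistical feature vector $\vec{V}_L(i)$ of the i-th frame of the test sequence and the statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ of the test sequence, calculate the input long-time feature vector $\vec{V}_{IN}(i)$ of the i-th frame of the test sequence;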

Step 8: feed the input long-time feature vector $\vec{V}_{IN}(i)$ of the i-th frame into the classifier trained in step 5; its output is the classification type of the i-th frame.

The short-time features comprise log energy, zero-crossing rate and uniform sub-band energy distribution.

The statistical parameters of the short-time feature over the current frame and the preceding (n-1) frames comprise one or more of the maximum $MaxF_k(n)$, the minimum $MinF_k(n)$, the arithmetic mean $AvgF_k(n)$ and the variance $VarF_k(n)$ of the short-time feature over those frames.

Training a classifier with the long-time feature vector $\vec{V}_F$ of the training sequence specifically means using $\vec{V}_F$ to train a single classifier.

Alternatively, training a classifier with the long-time feature vector $\vec{V}_F$ of the training sequence specifically means using a forward feature selection method to select effective features from $\vec{V}_F$ to form an effective long-time feature vector, and using the effective long-time feature vector to train a single classifier.

Alternatively, training a classifier with the long-time feature vector $\vec{V}_F$ of the training sequence specifically means using the component vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ of $\vec{V}_F$ to train, each separately, a single classifier of the same type, and then combining these classifiers in parallel into a classifier group.

The proper vector when input of the i frame of described calculating cycle tests is long specifically utilize formula

Wherein, q=1,2, L, P-1,

in

total q,

in

total P-q.

The single classifier is an independent-feature classifier based on the normal distribution.

By training a classifier with variable-duration long-time features, formed from the same statistical parameter of the same short-time feature under different durations, and using the trained classifier to classify audio features, the invention avoids the delay caused by long-time feature extraction and achieves real-time classification of audio features.

Description of the drawings

Fig. 1 is a flow chart of the audio feature classification method based on variable duration;

Fig. 2 is a schematic diagram of training a single classifier with the long-time feature vector of the training sequence;

Fig. 3 is a schematic diagram of training a single classifier with the effective long-time feature vector formed from the effective features of the long-time feature vector of the training sequence;

Fig. 4 is a schematic diagram of training single classifiers of the same type separately with the component vectors of the long-time feature vector of the training sequence and then combining them in parallel into a classifier group;

Fig. 5 is the training sample database information table;

Fig. 6 is the test sample database information table;

Fig. 7 is the classifier performance comparison table.

Embodiment

The preferred embodiment is described in detail below with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications.

The invention is described using the classification of speech/music signals at a 32 kHz sampling rate as an example. The invention is equally applicable to classifying other types of audio signal.

Fig. 1 is a flow chart of the audio feature classification method based on variable duration. In Fig. 1, the method comprises the following steps:

Step 1: take a labeled audio sequence of known type as the training sequence.

Step 2: extract the short-time features $F_1, F_2, \ldots, F_K$ of the audio signal in the training sequence to form the short-time feature vector

$$\vec{V}_S = \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_K \end{bmatrix},$$

where K is the number of components of the short-time feature vector.

In this embodiment the audio signal is divided into frames of 40 ms each, and the short-time features calculated comprise log energy, zero-crossing rate and uniform sub-band energy distribution. In the invention, the short-time features include but are not limited to these three.

Let the samples of the i-th frame of the audio signal be x(n), n = (i-1)L, (i-1)L+1, ..., iL-1, where L is the frame length. The short-time features are then calculated as follows:

A. Log energy

$$E_1(i) = \sum_{n=(i-1)L}^{iL-1} x^2(n)$$

$$E_2(i) = \max(\log[E_1(i)], -10)$$
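As a minimal sketch of the two log-energy formulas above, the following Python function computes $E_2$ for one frame. The name `log_energy`, the use of the natural logarithm (the text does not state the base of the logarithm) and the explicit handling of an all-zero frame are assumptions made for illustration.

```python
import math


def log_energy(frame, floor=-10.0):
    """Return E2 = max(log(E1), floor), where E1 is the sum of squared samples."""
    e1 = sum(sample * sample for sample in frame)
    if e1 <= 0.0:
        # log(0) is undefined; assume the floor is meant to cover silent frames
        return floor
    return max(math.log(e1), floor)
```

With the 32 kHz sampling rate and 40 ms frames of this embodiment, each frame passed to such a function would hold L = 1280 samples.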

B. Zero-crossing rate

$$ZCR(i) = \sum_{n=(i-1)L}^{iL-1} [\operatorname{sign}(x(n) - x(n-1)) + 1] / 2$$

where sign(x) is the sign function,

$$\operatorname{sign}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$
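The zero-crossing-rate formula can be sketched in Python as follows. This follows the formula literally as printed, which takes the sign of the difference of adjacent samples; conventional zero-crossing-rate definitions compare the signs of the adjacent samples themselves, so the printed form may not be exactly what was intended. The names `sign` and `zcr_as_printed` and the `prev_sample` argument are assumptions, since the text does not say how x(n-1) is obtained at the first sample of a frame.

```python
def sign(v):
    """Sign function as defined in the text: 1, 0 or -1."""
    return (v > 0) - (v < 0)


def zcr_as_printed(frame, prev_sample=0.0):
    """Sum of [sign(x(n) - x(n-1)) + 1] / 2 over the samples of one frame.

    prev_sample stands in for x(n-1) at the first sample of the frame
    (an assumption; the boundary handling is not specified in the text).
    """
    total = 0.0
    prev = prev_sample
    for sample in frame:
        total += (sign(sample - prev) + 1) / 2
        prev = sample
    return total
```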

C. Uniform sub-band energy distribution

$$SubE(i, k) = \sum_{m=(k-1)L/2K}^{kL/2K-1} X(i, m), \quad k = 1, 2, \ldots, K$$

where X(i, m) is the magnitude spectrum of the FFT of the i-th frame of the audio signal:

$$X(i, m) = \left| \sum_{k=1}^{L} x((i-1)L + k - 1) \cdot \exp\left[-j \cdot \frac{2\pi}{L} (m-1)(k-1)\right] \right|, \quad m = 1, 2, \ldots, L$$

By the properties of the FFT of a real sequence, X(i, m) is even-symmetric about m = L/2+1, so only the first (L/2+1) values need to be retained. K is the number of uniform sub-bands; K = 16 in this embodiment.
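A minimal Python sketch of the sub-band energy distribution follows. It uses a direct DFT from the standard library rather than an FFT, which gives the same magnitudes but is slower. The function names are hypothetical, and the bin indexing is an assumption: the summation limits in the formula start at 0 for k = 1 while X(i, m) is defined for m = 1, ..., L, so the sketch uses zero-based bins 0, ..., L/2-1 split into K bands of L/(2K) bins each.

```python
import cmath


def magnitude_spectrum(frame, n_bins):
    """Magnitudes of the first n_bins DFT bins of one frame (direct DFT)."""
    length = len(frame)
    spectrum = []
    for b in range(n_bins):
        acc = sum(x * cmath.exp(-2j * cmath.pi * b * t / length)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def subband_energy(frame, K=16):
    """Sum the magnitude spectrum over K uniform sub-bands of L/(2K) bins each."""
    length = len(frame)
    if length % (2 * K) != 0:
        raise ValueError("frame length must be a multiple of 2*K")
    width = length // (2 * K)
    spectrum = magnitude_spectrum(frame, K * width)
    return [sum(spectrum[k * width:(k + 1) * width]) for k in range(K)]
```

For a 1280-sample frame and K = 16, each sub-band in this sketch covers 40 bins.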

When extracting audio features in this embodiment, the short-time feature vector of the i-th frame is

$$\vec{V}_S(i) = \begin{bmatrix} E_2(i) \\ ZCR(i) \\ SubE(i, 1) \\ \vdots \\ SubE(i, 16) \end{bmatrix}$$

Its dimension is 18. $E_2(i)$, ZCR(i), SubE(i, 1), ..., SubE(i, 16) are the components $F_1, F_2, \ldots, F_{18}$ of the short-time feature vector of the i-th frame.

Step 3: for each short-time feature $F_k$, calculate the statistical parameters of that feature over the current frame and the preceding (n-1) frames within a set duration, where n is the total number of frames in the set duration. Each short-time feature $F_k$ corresponds to a statistical feature vector $\vec{L}_k(n)$ formed from the statistical parameters of that feature, and the short-time feature vector $\vec{V}_S$ in turn corresponds to a statistical feature vector $\vec{V}_L(n)$, where

$$\vec{V}_L(n) = \begin{bmatrix} \vec{L}_1(n) \\ \vec{L}_2(n) \\ \vdots \\ \vec{L}_K(n) \end{bmatrix}, \quad 1 \le k \le K.$$

The statistical parameters of the short-time feature over the current frame and the preceding (n-1) frames comprise one or more of the maximum $MaxF_k(n)$, the minimum $MinF_k(n)$, the arithmetic mean $AvgF_k(n)$ and the variance $VarF_k(n)$ of the short-time feature over those frames. In this embodiment the maximum and the variance are selected as statistical parameters, so each short-time feature $F_k$ corresponds to a statistical feature vector $\vec{L}_k(n)$ formed from $MaxF_k(n)$ and $VarF_k(n)$. After step 2 of this embodiment there are 18 short-time features, and the statistical feature vector corresponding to each short-time feature has 2 components, so the statistical feature vector $\vec{V}_L(n)$ corresponding to the short-time feature vector $\vec{V}_S$ has 36 dimensions.
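The statistics of step 3 can be sketched in Python as below, using the maximum and the variance as in this embodiment. The name `statistical_vector`, the ordering of the output (maximum then variance for each feature) and the use of the population variance are assumptions; the text does not specify them.

```python
def statistical_vector(history, n):
    """Statistical feature vector over the current frame and the n-1 before it.

    history: short-time feature vectors, oldest first, current frame last.
    Returns [max F_1, var F_1, max F_2, var F_2, ...].
    """
    window = history[-n:]  # uses fewer frames if history is shorter than n
    result = []
    for values in zip(*window):  # one tuple of values per short-time feature
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        result.extend([max(values), variance])
    return result
```

With 18 short-time features per frame, the returned list has 36 entries, matching the dimension stated above.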

Step 4: choose P values $N_1, N_2, \ldots, N_P$ satisfying $N_1 < N_2 < \ldots < N_P$, set n equal to each of $N_1, N_2, \ldots, N_P$ in turn, and calculate according to step 3 the group of statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ corresponding to the short-time feature vector; this group forms the long-time feature vector of the training sequence

$$\vec{V}_F = [\vec{V}_L^T(N_1), \vec{V}_L^T(N_2), \ldots, \vec{V}_L^T(N_P)]^T.$$

In this embodiment, P = 3, $N_1 = 5$, $N_2 = 15$ and $N_3 = 25$, giving the group of 3 statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \vec{V}_L(N_3)$ corresponding to the short-time feature vector of the i-th frame, each of 36 dimensions. This group then forms the long-time feature vector $\vec{V}_F$ of the training sequence, whose dimension is 108.
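A short Python sketch of step 4 follows: the statistics are computed for each duration and concatenated. The helper `statistical_vector` is a hypothetical function using the maximum and the population variance, ordered as maximum then variance for each feature (both assumptions), and `long_time_vector` is likewise an illustrative name.

```python
def statistical_vector(history, n):
    """[max, variance] of each short-time feature over the last n frames."""
    window = history[-n:]
    result = []
    for values in zip(*window):
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        result.extend([max(values), variance])
    return result


def long_time_vector(history, durations=(5, 15, 25)):
    """Concatenate V_L(N_1), ..., V_L(N_P) for the current (last) frame."""
    vector = []
    for n in durations:
        vector.extend(statistical_vector(history, n))
    return vector
```

For example, with at least 25 frames of 18 short-time features each, `long_time_vector(history)` returns 3 x 36 = 108 values.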

Step 5: train a classifier with the long-time feature vector $\vec{V}_F$ of the training sequence.

Once the long-time feature vector $\vec{V}_F$ of the training sequence has been obtained, known techniques can be used to train a classifier with it.

Fig. 2 is a schematic diagram of training a single classifier with the long-time feature vector of the training sequence. In Fig. 2, the long-time feature vector $\vec{V}_F$ of the training sequence is used directly to train a single classifier.

Fig. 3 is a schematic diagram of training a single classifier with the effective long-time feature vector formed from the effective features of the long-time feature vector of the training sequence. In Fig. 3, the classifier can also be trained by using a forward feature selection method to select effective features from the long-time feature vector $\vec{V}_F$ of the training sequence to form an effective long-time feature vector, and using the effective long-time feature vector to train a single classifier.
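Forward feature selection is usually implemented as a greedy loop that repeatedly adds the one feature that most improves a score. The sketch below illustrates that general idea only; the text does not state the selection criterion used, so the `score` callback (for instance, classification accuracy on held-out data) and the function name are assumptions.

```python
def forward_select(n_features, n_keep, score):
    """Greedy forward selection of feature indices.

    score: hypothetical callable taking a list of feature indices and
    returning a number, where larger means a better feature subset.
    """
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < n_keep:
        # try each remaining feature and keep the one that scores best
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the setting of this embodiment, such a routine would be called with `n_features=108` and `n_keep=36`.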

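Fig. 4 is a schematic diagram of training single classifiers of the same type separately with the component vectors of the long-time feature vector of the training sequence and then combining them in parallel into a classifier group. In Fig. 4, the classifier can also be trained by using the component vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ of the long-time feature vector $\vec{V}_F$ of the training sequence to train, each separately, a single classifier of the same type, and then combining these classifiers in parallel into a classifier group.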

In this embodiment the single classifier is an independent-feature classifier based on the normal distribution; the invention is equally applicable to other classifiers. The classifiers are trained by the methods shown in Fig. 3 and Fig. 4. Using the forward feature selection method, 36 effective dimensions are selected from the 108 dimensions of the long-time feature vector $\vec{V}_F$ of the training sequence to form the effective long-time feature vector, which is used to train a single classifier. At the same time, classifiers of the same type are trained independently with $\vec{V}_L(N_1)$, $\vec{V}_L(N_2)$ and $\vec{V}_L(N_3)$, respectively, as the classification feature vectors.
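The "independent-feature classifier based on the normal distribution" reads like what is commonly called a Gaussian naive Bayes classifier; that identification is an interpretation, not something the text states. Under that assumption, a minimal Python sketch is given below. The class name, the variance floor of 1e-9 and the use of class frequencies as priors are all illustrative choices.

```python
import math


class GaussianIndependentClassifier:
    """Per-class, per-feature normal model with features treated as independent."""

    def fit(self, X, y):
        self.params = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            columns = list(zip(*rows))
            means = [sum(c) / len(c) for c in columns]
            # floor the variance so a constant feature does not divide by zero
            variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                         for c, m in zip(columns, means)]
            log_prior = math.log(len(rows) / len(X))
            self.params[label] = (log_prior, means, variances)
        return self

    def _log_score(self, x, label):
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))

    def predict_one(self, x):
        """Return the label with the highest log posterior score."""
        return max(self.params, key=lambda label: self._log_score(x, label))
```

A toy usage, with made-up two-dimensional data standing in for long-time feature vectors:

```python
X = [[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [5.2, 4.9]]
y = ["speech", "speech", "music", "music"]
clf = GaussianIndependentClassifier().fit(X, y)
label = clf.predict_one([0.1, 0.0])
```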

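Step 6: extract the short-time features of the audio signal in the test sequence, and calculate, by the method of steps 2 and 3, the statistical feature vector $\vec{V}_L(i)$ of the i-th frame of the test sequence and the statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ of the test sequence.

Step 7: from the statistical feature vector $\vec{V}_L(i)$ of the i-th frame of the test sequence and the statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ of the test sequence, calculate the input long-time feature vector $\vec{V}_{IN}(i)$ of the i-th frame of the test sequence.

The input long-time feature vector $\vec{V}_{IN}(i)$ of the i-th frame of the test sequence is calculated by the formula

$$\vec{V}_{IN}(i) = \begin{cases} [\vec{V}_L^T(i), \ldots, \vec{V}_L^T(i)]^T, & i < N_1 \\ [\vec{V}_L^T(N_1), \ldots, \vec{V}_L^T(N_q), \vec{V}_L^T(i), \ldots, \vec{V}_L^T(i)]^T, & N_1 < \ldots < N_q \le i < N_{q+1} < \ldots < N_P \\ \vec{V}_F, & i \ge N_P \end{cases}$$

where q = 1, 2, ..., P-1; in $[\vec{V}_L^T(N_1), \ldots, \vec{V}_L^T(N_q), \vec{V}_L^T(i), \ldots, \vec{V}_L^T(i)]^T$ there are q terms $\vec{V}_L^T(N_1), \ldots, \vec{V}_L^T(N_q)$ and P-q terms $\vec{V}_L^T(i)$.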
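The piecewise formula for the input long-time feature vector reduces to one rule per duration $N_p$: use $\vec{V}_L(N_p)$ once at least $N_p$ frames are available, and otherwise fall back to $\vec{V}_L(i)$ computed over all i frames seen so far. The Python sketch below implements that reading. It assumes i is the 1-based count of test frames available so far, and reuses a hypothetical `statistical_vector` helper built on the maximum and the population variance.

```python
def statistical_vector(history, n):
    """[max, variance] of each short-time feature over the last n frames."""
    window = history[-n:]
    result = []
    for values in zip(*window):
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        result.extend([max(values), variance])
    return result


def input_long_time_vector(history, durations=(5, 15, 25)):
    """V_IN(i), where i = len(history) test frames are available so far."""
    i = len(history)
    vector = []
    for n in durations:
        # V_L(N_p) once N_p frames exist, otherwise V_L(i) over all i frames
        vector.extend(statistical_vector(history, min(i, n)))
    return vector
```

Because the output always has the same length as the training vector, a classifier trained on $\vec{V}_F$ can be applied from the first test frame onward, without waiting for $N_P$ frames to accumulate.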

Step 8: feed the input long-time feature vector $\vec{V}_{IN}(i)$ of the i-th frame into the classifier trained in step 5; its output is the classification type of the i-th frame.

The training sample database and the test sample database in this embodiment both consist of speech sequences and music sequences, and the two databases are independent of each other. Fig. 5 is the training sample database information table and Fig. 6 is the test sample database information table. The classifiers were tested on the above test sample database, and the comparison of classifier performance is shown in Fig. 7. The comparison in Fig. 7 shows that the longer the duration of the long-time feature, the higher the classification accuracy, but also the larger the delay in detecting a type switch. By contrast, the classifier trained according to the invention performs better in both classification accuracy and the timeliness of detecting changes of audio type, and is better suited to real-time music/speech classification systems.

The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims

1. An audio feature classification method based on variable duration, characterized in that the method comprises the following steps:

Step 1: take a labeled audio sequence of known type as the training sequence;

Step 2: extract the short-time features $F_1, F_2, \ldots, F_K$ of the audio signal in the training sequence to form the short-time feature vector

$$\vec{V}_S = \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_K \end{bmatrix},$$

where K is the number of components of the short-time feature vector;

Step 3: for each short-time feature $F_k$, calculate the statistical parameters of that feature over the current frame and the preceding (n-1) frames within a set duration, where n is the total number of frames in the set duration; each short-time feature $F_k$ corresponds to a statistical feature vector $\vec{L}_k(n)$ formed from the statistical parameters of that feature, and the short-time feature vector $\vec{V}_S$ in turn corresponds to a statistical feature vector $\vec{V}_L(n)$, where

$$\vec{V}_L(n) = \begin{bmatrix} \vec{L}_1(n) \\ \vec{L}_2(n) \\ \vdots \\ \vec{L}_K(n) \end{bmatrix}, \quad 1 \le k \le K;$$

Step 4: choose P values $N_1, N_2, \ldots, N_P$ satisfying $N_1 < N_2 < \ldots < N_P$, set n equal to each of $N_1, N_2, \ldots, N_P$ in turn, and calculate according to step 3 the group of statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ corresponding to the short-time feature vector $\vec{V}_S$; this group forms the long-time feature vector of the training sequence

$$\vec{V}_F = [\vec{V}_L^T(N_1), \vec{V}_L^T(N_2), \ldots, \vec{V}_L^T(N_P)]^T;$$

Step 5: train a classifier with the long-time feature vector $\vec{V}_F$ of the training sequence;

Step 6: extract the short-time features of the audio signal in the test sequence, and calculate, by the method of steps 2 and 3, the statistical feature vector $\vec{V}_L(i)$ of the i-th frame of the test sequence and the statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ of the test sequence;

Step 7: from the statistical feature vector $\vec{V}_L(i)$ of the i-th frame of the test sequence and the statistical feature vectors $\vec{V}_L(N_1), \vec{V}_L(N_2), \ldots, \vec{V}_L(N_P)$ of the test sequence, calculate the input long-time feature vector $\vec{V}_{IN}(i)$ of the i-th frame of the test sequence;

the input long-time feature vector $\vec{V}_{IN}(i)$ of the i-th frame of the test sequence being calculated by the formula

$$\vec{V}_{IN}(i) = \begin{cases} [\vec{V}_L^T(i), \ldots, \vec{V}_L^T(i)]^T, & i < N_1 \\ [\vec{V}_L^T(N_1), \ldots, \vec{V}_L^T(N_q), \vec{V}_L^T(i), \ldots, \vec{V}_L^T(i)]^T, & N_1 < \ldots < N_q \le i < N_{q+1} < \ldots < N_P \\ \vec{V}_F, & i \ge N_P \end{cases}$$

where q = 1, 2, ..., P-1; in $[\vec{V}_L^T(N_1), \ldots, \vec{V}_L^T(N_q), \vec{V}_L^T(i), \ldots, \vec{V}_L^T(i)]^T$ there are q terms $\vec{V}_L^T(N_1), \ldots, \vec{V}_L^T(N_q)$ and P-q terms $\vec{V}_L^T(i)$;

Step 8: feed the input long-time feature vector $\vec{V}_{IN}(i)$ of the i-th frame into the classifier trained in step 5; its output is the classification type of the i-th frame.

2. The audio feature classification method based on variable duration according to claim 1, characterized in that the short-time features comprise log energy, zero-crossing rate and uniform sub-band energy distribution.

3. The audio feature classification method based on variable duration according to claim 1, characterized in that the statistical parameters of the short-time feature over the current frame and the preceding (n-1) frames comprise one or more of the maximum $MaxF_k(n)$, the minimum $MinF_k(n)$, the arithmetic mean $AvgF_k(n)$ and the variance $VarF_k(n)$ of the short-time feature over those frames.

4. The audio feature classification method based on variable duration according to claim 1, characterized in that training a classifier with the long-time feature vector $\vec{V}_F$ of the training sequence specifically means using $\vec{V}_F$ to train a single classifier.

5. The audio feature classification method based on variable duration according to claim 1, characterized in that training a classifier with the long-time feature vector $\vec{V}_F$ of the training sequence specifically means using a forward feature selection method to select effective features from $\vec{V}_F$ to form an effective long-time feature vector, and using the effective long-time feature vector to train a single classifier.

6. The audio feature classification method based on variable duration according to claim 1, characterized in that training a classifier with the long-time feature vector $\vec{V}_F$ of the training sequence specifically means using the component vectors of $\vec{V}_F$ to train, each separately, a single classifier of the same type, and then combining these classifiers in parallel into a classifier group.

7. The audio feature classification method based on variable duration according to any one of claims 4 to 6, characterized in that the single classifier is an independent-feature classifier based on the normal distribution.