CN102623007B - Audio characteristic classification method based on variable duration - Google Patents
- Publication number
- CN102623007B (application CN201110033410.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention discloses an audio feature classification method based on variable duration, in the field of multimedia signal processing and pattern recognition. The method comprises the following steps: taking labeled audio sequences of known type as training sequences; extracting short-time features of the audio signal in a training sequence to form a short-time feature vector; computing statistical parameters of each short-time feature over a set duration to obtain the statistical feature vector corresponding to the short-time feature vector; computing a group of such statistical feature vectors and forming the long-term feature vector of the training sequence from this group; training a classifier with the long-term feature vector of the training sequence; extracting the short-time features of the i-th frame of the audio signal in a test sequence and computing the input long-term feature vector of the i-th frame of the test sequence; and feeding the input long-term feature vector of the i-th frame into the trained classifier to obtain its class. The method avoids the delay caused by extracting long-term features and enables real-time classification of audio features.
Description
Technical field
The invention belongs to the field of multimedia signal processing and pattern recognition, and relates in particular to an audio feature classification method based on variable duration.
Background
With the development of communication technology, digital audio processing is widely used in fields such as mobile communications, the Internet, broadcasting and personal electronics. Audio coding has gradually expanded from traditional speech coding, focused mainly on narrowband speech, to the coding of higher-bandwidth, higher-quality multimedia audio, and the rise of 3G and LTE places still higher demands on next-generation audio codecs in terms of channel adaptability, transmission reliability and coding quality. Whether in audio coding or in sound-effect editing and production, the diversity of audio signals means that different types of audio signal may call for different processing techniques. For example, ITU-T G.718 and G.729.1 divide audio signals into two coding modes, speech and music, and the later G.718-SWB adds a coding mode for audio signals containing sinusoids. Hence, in some applications the audio signal must first be classified simply and efficiently to determine its type.
Classification may use either short-time or long-term features extracted from the audio signal. Because audio signals are stationary only over short intervals, long-term features are usually more stable and more discriminative than short-time features; their drawback is a large detection delay, which limits their use in real-time classification systems. In addition, different features may exhibit different stationarity periods, so computing the long-term features of all of them over the same fixed duration may not be optimal.
Summary of the invention
The object of the invention is to solve the problem that commonly used audio feature classification methods rely mainly on extracting long-term features, which impairs real-time performance. To this end, an audio feature classification method based on variable duration is proposed: a classifier is trained on variable-duration long-term features, formed by computing the same statistical parameters of the same short-time features over different durations, and the trained classifier is then used to classify audio features.
The technical solution of the invention is an audio feature classification method based on variable duration, characterized in that the method comprises the following steps:
Step 1: take labeled audio sequences of known type as training sequences;
Step 2: extract the short-time features $F_1, F_2, \ldots, F_K$ of the audio signal in the training sequence to form the short-time feature vector $\vec{F} = [F_1, F_2, \ldots, F_K]$, where $K$ is the number of components of the short-time feature vector;
Step 3: for each short-time feature $F_k$, compute statistical parameters of that feature over the current frame and the preceding $(n-1)$ frames within a set duration, where $n$ is the total number of frames in the set duration. The statistical parameters of each short-time feature $F_k$ form a statistical feature vector $\vec{S}_k(n)$, so that the short-time feature vector $\vec{F}$ corresponds to the statistical feature vector $\vec{S}(n) = [\vec{S}_1(n), \vec{S}_2(n), \ldots, \vec{S}_K(n)]$, $1 \le k \le K$;
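Step 4: choose $P$ values $N_1, N_2, \ldots, N_P$ satisfying $N_1 < N_2 < \cdots < N_P$; setting $n = N_1, N_2, \ldots, N_P$ in turn, compute by Step 3 the group of statistical feature vectors $\vec{S}(N_1), \vec{S}(N_2), \ldots, \vec{S}(N_P)$ corresponding to the short-time feature vector $\vec{F}$, and form from this group the long-term feature vector of the training sequence: $[\vec{S}(N_1), \vec{S}(N_2), \ldots, \vec{S}(N_P)]$;
Step 5: train a classifier with the long-term feature vector of the training sequence;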
Step 6: extract the short-time features of the audio signal in the test sequence and, by the method of Steps 2 and 3, compute the statistical feature vector of the i-th frame of the test sequence [formula] and [formula] of the test sequence;
Step 7: from the statistical feature vector of the i-th frame of the test sequence [formula] and [formula] of the test sequence, compute the input long-term feature vector of the i-th frame of the test sequence;
Step 8: feed the input long-term feature vector of the i-th frame into the classifier trained in Step 5; its output is the class of the i-th frame.
The short-time features comprise log energy, zero-crossing rate and uniform sub-band energy distribution.
The statistical parameters of the short-time feature over the current frame and the preceding $(n-1)$ frames comprise one or more of the maximum $\mathrm{MaxF}_k(n)$, the minimum $\mathrm{MinF}_k(n)$, the arithmetic mean $\mathrm{AvgF}_k(n)$ and the variance $\mathrm{VarF}_k(n)$ of the short-time feature over those frames.
Training the classifier with the long-term feature vector of the training sequence specifically comprises training a single classifier with that long-term feature vector.
Alternatively, training the classifier with the long-term feature vector of the training sequence specifically comprises using forward feature selection to select effective features from the long-term feature vector of the training sequence, forming an effective long-term feature vector, and training a single classifier with the effective long-term feature vector.
Alternatively, training the classifier with the long-term feature vector of the training sequence specifically comprises training a single classifier of the same type separately on each of its sub-vectors, i.e. the statistical feature vectors $\vec{S}(N_1), \ldots, \vec{S}(N_P)$, and combining the trained classifiers in parallel into a classifier group.
The input long-term feature vector of the i-th frame of the test sequence is computed specifically by the formula [formula]
The single classifier is an independent-feature classifier based on the normal distribution.
By training a classifier on variable-duration long-term features, formed by computing the same statistical parameters of the same short-time features over different durations, and by classifying audio features with the trained classifier, the invention avoids the delay caused by extracting long-term features and achieves real-time classification of audio features.
Brief description of the drawings
Fig. 1 is a flowchart of the audio feature classification method based on variable duration;
Fig. 2 is a schematic diagram of training a single classifier with the long-term feature vector of the training sequence;
Fig. 3 is a schematic diagram of training a single classifier with the effective long-term feature vector formed from the effective features of the training long-term feature vector;
Fig. 4 is a schematic diagram of training single classifiers of the same type separately on the sub-vectors of the training long-term feature vector and combining them in parallel into a classifier group;
Fig. 5 is the training sample database information table;
Fig. 6 is the test sample database information table;
Fig. 7 is the classifier performance comparison table.
Detailed description of the embodiments
The preferred embodiment is described in detail below with reference to the drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope or application of the invention.
The invention is described using speech/music classification of audio sampled at 32 kHz as an example; it applies equally to the classification of other types of audio signal.
Fig. 1 is a flowchart of the audio feature classification method based on variable duration. As shown in Fig. 1, the method comprises the following steps:
Step 1: take labeled audio sequences of known type as training sequences.
Step 2: extract the short-time features $F_1, F_2, \ldots, F_K$ of the audio signal in the training sequence to form the short-time feature vector $\vec{F} = [F_1, F_2, \ldots, F_K]$, where $K$ is the number of components of the short-time feature vector.
In this embodiment the audio signal is divided into 40 ms frames, and the short-time features computed are log energy, zero-crossing rate and uniform sub-band energy distribution. In general, the short-time features of the invention include but are not limited to these three.
Let the samples of the i-th frame be $x(n)$, $n = (i-1)L, (i-1)L+1, \ldots, iL-1$, where $L$ is the frame length. The short-time features are computed as follows:
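A. Log energy
$E_2(i) = \max(\log[E_1(i)], -10)$
where $E_1(i)$ is the energy of the i-th frame, so the log energy is floored at -10.
B. Zero-crossing rate
The zero-crossing rate $\mathrm{ZCR}(i)$ of the i-th frame is computed from the sign changes of $x(n)$, where $\mathrm{sign}(x)$ is the sign function.
C. Uniform sub-band energy distribution
The sub-band energies $\mathrm{SubE}(i,1), \ldots, \mathrm{SubE}(i,K)$ are computed from $X(i,m)$, the magnitude spectrum of the i-th frame after the FFT.
By the properties of the FFT of a real sequence, $X(i,m)$ is even-symmetric about $m = L/2+1$, so only the first $(L/2+1)$ values need to be retained. Here $K$ is the number of uniform sub-bands (not the dimension of the short-time feature vector); in this embodiment $K = 16$.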
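When extracting audio features in this embodiment, the short-time feature vector of the i-th frame is
$\vec{F}(i) = [E_2(i), \mathrm{ZCR}(i), \mathrm{SubE}(i,1), \ldots, \mathrm{SubE}(i,16)]$
with dimension 18; $E_2(i), \mathrm{ZCR}(i), \mathrm{SubE}(i,1), \ldots, \mathrm{SubE}(i,16)$ are respectively the components $F_1, F_2, \ldots, F_{18}$ of the short-time feature vector of the i-th frame.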
Step 3: for each short-time feature $F_k$, compute statistical parameters of that feature over the current frame and the preceding $(n-1)$ frames within a set duration, where $n$ is the total number of frames in the set duration. The statistical parameters of each short-time feature $F_k$ form a statistical feature vector $\vec{S}_k(n)$, so that the short-time feature vector $\vec{F}$ corresponds to the statistical feature vector $\vec{S}(n) = [\vec{S}_1(n), \vec{S}_2(n), \ldots, \vec{S}_K(n)]$, $1 \le k \le K$.
The statistical parameters of the short-time feature over the current frame and the preceding $(n-1)$ frames comprise one or more of the maximum $\mathrm{MaxF}_k(n)$, the minimum $\mathrm{MinF}_k(n)$, the arithmetic mean $\mathrm{AvgF}_k(n)$ and the variance $\mathrm{VarF}_k(n)$. In this embodiment, the maximum and the variance are selected as the statistical parameters, so the statistical feature vector of each short-time feature $F_k$ is $\vec{S}_k(n) = [\mathrm{MaxF}_k(n), \mathrm{VarF}_k(n)]$. After Step 2 of this embodiment there are 18 short-time features, each with a 2-dimensional statistical feature vector, so the statistical feature vector $\vec{S}(n)$ corresponding to the short-time feature vector $\vec{F}$ has 36 dimensions.
Step 4: choose $P$ values $N_1, N_2, \ldots, N_P$ satisfying $N_1 < N_2 < \cdots < N_P$; setting $n = N_1, N_2, \ldots, N_P$ in turn, compute by Step 3 the group of statistical feature vectors $\vec{S}(N_1), \vec{S}(N_2), \ldots, \vec{S}(N_P)$ corresponding to the short-time feature vector $\vec{F}$, and form from this group the long-term feature vector of the training sequence, $[\vec{S}(N_1), \vec{S}(N_2), \ldots, \vec{S}(N_P)]$.
In this embodiment, $P = 3$, $N_1 = 5$, $N_2 = 15$ and $N_3 = 25$ (with 40 ms frames, durations of 200 ms, 600 ms and 1 s), giving the group of three statistical feature vectors $\vec{S}(5)$, $\vec{S}(15)$ and $\vec{S}(25)$ of the i-th frame, each of dimension 36. These form the long-term feature vector of the training sequence, $[\vec{S}(5), \vec{S}(15), \vec{S}(25)]$, of dimension 108.
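The embodiment's dimensions (18 short-time features, a 36-dimensional $\vec{S}(n)$, a 108-dimensional long-term vector) can be checked with a short sketch that stacks $\vec{S}(5)$, $\vec{S}(15)$ and $\vec{S}(25)$. The frame-buffer layout (one short-time feature vector per frame, oldest first) and the truncated window at the start of a sequence are assumptions made for illustration.

```python
from statistics import pvariance


def stat_feature_vector(frames, n):
    """S(n) = [S_1(n), ..., S_K(n)] for the current (last) frame, with
    S_k(n) = [MaxF_k(n), VarF_k(n)] as in the embodiment."""
    window = frames[-n:]  # ASSUMPTION: fewer than n frames at the start of a sequence
    out = []
    for k in range(len(window[0])):
        values = [f[k] for f in window]
        out.extend([max(values), pvariance(values)])
    return out


def long_term_vector(frames, durations=(5, 15, 25)):
    """Concatenate S(N_1), ..., S(N_P). With 40 ms frames, 5/15/25 frames
    span 200 ms, 600 ms and 1000 ms."""
    vec = []
    for n in durations:
        vec.extend(stat_feature_vector(frames, n))
    return vec


frames = [[float(i + k) for k in range(18)] for i in range(30)]  # 30 toy frames, K = 18
print(len(stat_feature_vector(frames, 5)), len(long_term_vector(frames)))  # 36 108
```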
Step 5: after the long-term feature vector of the training sequence has been obtained, a classifier can be trained on it using known techniques.
Fig. 2 is a schematic diagram of training a single classifier with the long-term feature vector of the training sequence. As shown in Fig. 2, the classifier can be trained by using the long-term feature vector of the training sequence directly to train a single classifier.
Fig. 3 is a schematic diagram of training a single classifier with the effective long-term feature vector formed from the effective features of the training long-term feature vector. As shown in Fig. 3, the classifier can also be trained by using forward feature selection to select effective features from the long-term feature vector of the training sequence, forming an effective long-term feature vector, and training a single classifier with that effective long-term feature vector.
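Forward feature selection, as used in Fig. 3, can be sketched generically. The text does not say which criterion drives the selection, so `score` below is a hypothetical callable (for example, the validation accuracy of the classifier trained on a candidate subset), and the stopping rule (stop at a target size or when the score no longer improves) is also an assumption.

```python
from typing import Callable, List, Sequence


def forward_select(num_features: int, target: int,
                   score: Callable[[Sequence[int]], float]) -> List[int]:
    """Greedy sequential forward selection over feature indices 0..num_features-1.

    Each round adds the single index that gives the highest `score` for the
    enlarged subset; selection stops at `target` features, or earlier if the
    best candidate does not improve the score (ASSUMPTION: stopping rule).
    """
    selected: List[int] = []
    best = float("-inf")
    while len(selected) < target:
        candidates = [j for j in range(num_features) if j not in selected]
        if not candidates:
            break
        # Evaluate every one-feature extension of the current subset.
        new_best, j_best = max((score(selected + [j]), j) for j in candidates)
        if new_best <= best:
            break  # no candidate improves the criterion
        selected.append(j_best)
        best = new_best
    return selected
```

For the embodiment's reduction from 108 to 36 features, the call would be `forward_select(108, 36, score)` with whatever criterion is chosen.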
Fig. 4 is a schematic diagram of training single classifiers of the same type separately on the sub-vectors of the training long-term feature vector and combining them in parallel into a classifier group. As shown in Fig. 4, the classifier can further be trained by training a single classifier of the same type separately on each sub-vector of the long-term feature vector of the training sequence, i.e. on each statistical feature vector $\vec{S}(N_1), \ldots, \vec{S}(N_P)$, and combining the trained classifiers in parallel into a classifier group.
In this embodiment, the single classifier is an independent-feature classifier based on the normal distribution; the invention applies equally to other classifiers. The classifiers are trained by the methods shown in Fig. 3 and Fig. 4. Forward feature selection is used to select 36 effective features from the 108 features of the long-term feature vector of the training sequence, forming the effective long-term feature vector, and a single classifier is trained with it. In addition, classifiers of the same type are trained independently with $\vec{S}(5)$, $\vec{S}(15)$ and $\vec{S}(25)$ respectively as the classification feature vectors.
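An "independent-feature classifier based on the normal distribution" reads like a Gaussian naive Bayes classifier, in which each feature is modelled per class as an independent normal distribution. The sketch below implements that interpretation, plus a parallel group in the style of Fig. 4 with one member per sub-vector slice. How the members' outputs are combined is not stated here, so summing their per-class log scores is an assumption, as is the small variance floor.

```python
import math
from collections import defaultdict


class GaussianIndependentClassifier:
    """Per class, each feature is an independent normal distribution."""

    def __init__(self, var_floor=1e-6):
        self.var_floor = var_floor  # ASSUMPTION: guards against zero variance
        self.params = {}  # label -> (log prior, means, variances)

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        for label, rows in groups.items():
            d = len(rows[0])
            means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
            variances = [max(sum((r[j] - means[j]) ** 2 for r in rows) / len(rows),
                             self.var_floor) for j in range(d)]
            self.params[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def log_scores(self, x):
        # log prior + sum of per-feature normal log-densities
        scores = {}
        for label, (log_prior, means, variances) in self.params.items():
            ll = sum(-0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
                     for xj, m, v in zip(x, means, variances))
            scores[label] = log_prior + ll
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


class ParallelGroup:
    """One same-type classifier per sub-vector slice, e.g. [(0, 36), (36, 72), (72, 108)]."""

    def __init__(self, slices):
        self.slices = slices
        self.members = []

    def fit(self, X, y):
        self.members = [GaussianIndependentClassifier().fit([x[a:b] for x in X], y)
                        for a, b in self.slices]
        return self

    def predict(self, x):
        # ASSUMPTION: members are combined by summing per-class log scores.
        total = defaultdict(float)
        for (a, b), clf in zip(self.slices, self.members):
            for label, s in clf.log_scores(x[a:b]).items():
                total[label] += s
        return max(total, key=total.get)
```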
Step 6: extract the short-time features of the audio signal in the test sequence and, by the method of Steps 2 and 3, compute the statistical feature vector of the i-th frame of the test sequence [formula] and [formula] of the test sequence.
Step 7: from the statistical feature vector of the i-th frame of the test sequence [formula] and [formula] of the test sequence, compute the input long-term feature vector of the i-th frame of the test sequence.
The input long-term feature vector of the i-th frame of the test sequence is computed specifically by the formula [formula]
Step 8: feed the input long-term feature vector of the i-th frame into the classifier trained in Step 5; its output is the class of the i-th frame.
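Steps 6 to 8 produce a class label for every frame. The patent's formula for the input long-term feature vector is not reproduced in this text, so the sketch below simply assumes the test vector is assembled like the training vector from a buffer of the last $N_P$ frames, using shorter windows while fewer frames have arrived. `classify` stands for any trained classifier; the threshold rule in the usage check is only a toy stand-in.

```python
from collections import deque
from statistics import pvariance


def stat_feature_vector(window):
    # [MaxF_k, VarF_k] for every short-time feature k, as in the embodiment
    out = []
    for k in range(len(window[0])):
        values = [f[k] for f in window]
        out.extend([max(values), pvariance(values)])
    return out


def classify_stream(short_time_vectors, classify, durations=(5, 15, 25)):
    """Yield one class label per incoming frame of the test sequence."""
    buffer = deque(maxlen=max(durations))  # only the last N_P frames are needed
    for f in short_time_vectors:
        buffer.append(list(f))
        frames = list(buffer)
        vec = []
        for n in durations:
            # ASSUMPTION: use the frames available when fewer than n have arrived
            vec.extend(stat_feature_vector(frames[-n:]))
        yield classify(vec)
```

Because the buffer is bounded by $N_P$ frames and a label is emitted as soon as each frame arrives, no extra look-ahead is introduced by the longest duration.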
The training and test sample databases in this embodiment both consist of speech sequences and music sequences, and the two databases are independent of each other. Fig. 5 is the training sample database information table and Fig. 6 is the test sample database information table. The classifiers were tested on the test sample database described above, and the resulting performance comparison is shown in Fig. 7. The comparison in Fig. 7 shows that the longer the duration of the long-term feature, the higher the classification accuracy, but also the longer the delay in detecting a change of type. By contrast, the classifiers trained according to the invention perform better on both classification accuracy and timeliness in detecting changes of audio type, and are therefore better suited to real-time music/speech classification systems.
The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be defined by the scope of protection of the claims.
Claims (7)
1. An audio feature classification method based on variable duration, characterized in that the method comprises the following steps:
Step 1: take labeled audio sequences of known type as training sequences;
Step 2: extract the short-time features $F_1, F_2, \ldots, F_K$ of the audio signal in the training sequence to form the short-time feature vector $\vec{F} = [F_1, F_2, \ldots, F_K]$, where $K$ is the number of components of the short-time feature vector;
Step 3: for each short-time feature $F_k$, compute statistical parameters of that feature over the current frame and the preceding $(n-1)$ frames within a set duration, where $n$ is the total number of frames in the set duration; the statistical parameters of each short-time feature $F_k$ form a statistical feature vector $\vec{S}_k(n)$, so that the short-time feature vector $\vec{F}$ corresponds to the statistical feature vector $\vec{S}(n) = [\vec{S}_1(n), \vec{S}_2(n), \ldots, \vec{S}_K(n)]$, $1 \le k \le K$;
Step 4: choose $P$ values $N_1, N_2, \ldots, N_P$ satisfying $N_1 < N_2 < \cdots < N_P$; setting $n = N_1, N_2, \ldots, N_P$ in turn, compute by Step 3 the group of statistical feature vectors $\vec{S}(N_1), \vec{S}(N_2), \ldots, \vec{S}(N_P)$ corresponding to the short-time feature vector $\vec{F}$, and form from this group the long-term feature vector of the training sequence, $[\vec{S}(N_1), \vec{S}(N_2), \ldots, \vec{S}(N_P)]$;
Step 6: extract the short-time features of the audio signal in the test sequence and, by the method of Steps 2 and 3, compute the statistical feature vector of the i-th frame of the test sequence [formula] and [formula] of the test sequence;
Step 7: from the statistical feature vector of the i-th frame of the test sequence [formula] and [formula] of the test sequence, compute the input long-term feature vector of the i-th frame of the test sequence;
the input long-term feature vector of the i-th frame of the test sequence being computed specifically by the formula [formula]
2. The audio feature classification method based on variable duration according to claim 1, characterized in that the short-time features comprise log energy, zero-crossing rate and uniform sub-band energy distribution.
3. The audio feature classification method based on variable duration according to claim 1, characterized in that the statistical parameters of the short-time feature over the current frame and the preceding $(n-1)$ frames comprise one or more of the maximum $\mathrm{MaxF}_k(n)$, the minimum $\mathrm{MinF}_k(n)$, the arithmetic mean $\mathrm{AvgF}_k(n)$ and the variance $\mathrm{VarF}_k(n)$ of the short-time feature over those frames.
4. The audio feature classification method based on variable duration according to claim 1, characterized in that training the classifier with the long-term feature vector of the training sequence specifically comprises training a single classifier with that long-term feature vector.
5. The audio feature classification method based on variable duration according to claim 1, characterized in that training the classifier with the long-term feature vector of the training sequence specifically comprises using forward feature selection to select effective features from the long-term feature vector of the training sequence to form an effective long-term feature vector, and training a single classifier with the effective long-term feature vector.
6. The audio feature classification method based on variable duration according to claim 1, characterized in that training the classifier with the long-term feature vector of the training sequence specifically comprises training a single classifier of the same type separately on each of its sub-vectors, i.e. the statistical feature vectors $\vec{S}(N_1), \ldots, \vec{S}(N_P)$, and combining the trained classifiers in parallel into a classifier group.
7. The audio feature classification method based on variable duration according to any one of claims 4 to 6, characterized in that the single classifier is an independent-feature classifier based on the normal distribution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110033410.2A CN102623007B (en) | 2011-01-30 | 2011-01-30 | Audio characteristic classification method based on variable duration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110033410.2A CN102623007B (en) | 2011-01-30 | 2011-01-30 | Audio characteristic classification method based on variable duration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102623007A CN102623007A (en) | 2012-08-01 |
CN102623007B true CN102623007B (en) | 2014-01-01 |
Family
ID=46562887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110033410.2A Expired - Fee Related CN102623007B (en) | 2011-01-30 | 2011-01-30 | Audio characteristic classification method based on variable duration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102623007B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102968986B (en) * | 2012-11-07 | 2015-01-28 | 华南理工大学 | Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics |
CN106328152B (en) * | 2015-06-30 | 2020-01-31 | 芋头科技(杭州)有限公司 | automatic indoor noise pollution identification and monitoring system |
CN105654944B (en) * | 2015-12-30 | 2019-11-01 | 中国科学院自动化研究所 | It is a kind of merged in short-term with it is long when feature modeling ambient sound recognition methods and device |
WO2018199997A1 (en) * | 2017-04-28 | 2018-11-01 | Hewlett-Packard Development Company, L.P. | Audio classifcation with machine learning model using audio duration |
CN108305616B (en) * | 2018-01-16 | 2021-03-16 | 国家计算机网络与信息安全管理中心 | Audio scene recognition method and device based on long-time and short-time feature extraction |
CN113780180B (en) * | 2021-09-13 | 2024-06-25 | 俞加利 | Audio long-term fingerprint extraction and matching method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101067930A (en) * | 2007-06-07 | 2007-11-07 | 深圳先进技术研究院 | Intelligent audio frequency identifying system and identifying method |
CN101236742A (en) * | 2008-03-03 | 2008-08-06 | 中兴通讯股份有限公司 | Music/ non-music real-time detection method and device |
CN101364408A (en) * | 2008-10-07 | 2009-02-11 | 西安成峰科技有限公司 | Sound image combined monitoring method and system |
CN101398825A (en) * | 2007-09-29 | 2009-04-01 | 三星电子株式会社 | Rapid music assorting and searching method and device |
Non-Patent Citations (2)
Title |
---|
Cyril Joder et al. Temporal Integration for Audio Classification With Application to Musical Instrument Classification. IEEE Transactions on Audio, Speech, and Language Processing, 2009, Vol. 17, No. 1, pp. 174-186. |
Temporal Integration for Audio Classification With Application to Musical Instrument Classification; Cyril Joder et al.; IEEE Transactions on Audio, Speech, and Language Processing; 2009-01-31; Vol. 17, No. 1; pp. 174-186 * |
Also Published As
Publication number | Publication date |
---|---|
CN102623007A (en) | 2012-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102623007B (en) | Audio characteristic classification method based on variable duration | |
CN101159834B (en) | Method and system for detecting repeatable video and audio program fragment | |
CN108597498A (en) | Multi-microphone voice acquisition method and device | |
CN110827837A (en) | Whale activity audio classification method based on deep learning | |
CN102446504B (en) | Voice/Music identifying method and equipment | |
CN100580693C (en) | Advertisement detecting and recognizing method and system | |
CN104143324B (en) | A kind of musical tone recognition method | |
CN109767776B (en) | Deception voice detection method based on dense neural network | |
CN101599271A (en) | A kind of recognition methods of digital music emotion | |
CN105741835A (en) | Audio information processing method and terminal | |
CN103854646A (en) | Method for classifying digital audio automatically | |
CN103985381A (en) | Voice frequency indexing method based on parameter fusion optimized decision | |
CN112133277B (en) | Sample generation method and device | |
CN111128211B (en) | Voice separation method and device | |
Lu et al. | Self-supervised audio spatialization with correspondence classifier | |
CN102708861A (en) | Poor speech recognition method based on support vector machine | |
CN106098079A (en) | Method and device for extracting audio signal | |
CN102723079A (en) | Music and chord automatic identification method based on sparse representation | |
CN108615536A (en) | Time-frequency combination feature musical instrument assessment of acoustics system and method based on microphone array | |
CN104123949B (en) | card frame detection method and device | |
Taenzer et al. | Investigating CNN-based Instrument Family Recognition for Western Classical Music Recordings. | |
Shifas et al. | A non-causal FFTNet architecture for speech enhancement | |
CN102214219B (en) | Audio/video content retrieval system and method | |
CN105721090B (en) | A kind of detection and recognition methods of illegal f-m broadcast station | |
Valero et al. | Narrow-band autocorrelation function features for the automatic recognition of acoustic environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C14 | Grant of patent or utility model | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20140101; Termination date: 20180130 |