CN107464556A - A kind of audio scene recognition method based on sparse coding - Google Patents
A kind of audio scene recognition method based on sparse coding
- Publication number
- CN107464556A
- Authority
- CN
- China
- Prior art keywords
- atom
- audio signal
- scene
- statistical value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present invention is an audio scene recognition method based on sparse coding. The method comprises: (1) atom dictionary generation; (2) for an audio signal under test, decomposing the signal over the atom dictionary D to obtain a sparse coefficient α, and from it obtaining the audio signal statistic Rt for the target scene and the out-of-set audio signal statistic Ro; (3) comparing Rt and Ro, the scene corresponding to the larger statistic being the recognition result. The invention applies sparse decomposition theory to extract a sparse feature of the audio signal; this feature has long-term characteristics and performs well in audio scene recognition.
Description
Technical field
The invention belongs to the technical fields of network information security and multimedia search, and in particular relates to an audio scene recognition method based on sparse coding.
Background
Audio scene recognition is an application at the highest semantic level, and its uses are broad: processing audio signals at the scene level makes audio signal processing more intelligent. Scene-level audio processing is mainly useful in the following ways. For the massive volume of data on the internet, audio scene recognition can provide indexing and retrieval based on audio content, which complements modern web search engines both technically and in application. In repositories holding massive amounts of audio information, such as digital libraries and multimedia websites, it can classify and manage the data intelligently. In surveillance, it can monitor public places such as elevators and parking lots in real time and give early warning of emergencies. It can also provide audio-based information support for intelligent decision systems, and it plays an important role in fields such as autonomous driving and smart homes.
Audio scene recognition requires mapping the audio signal onto a dictionary: in x = D·a, x is the original audio signal (a column vector), D is the dictionary, and a is the representation of x over D. Popular methods for obtaining a include the Fourier transform, the wavelet transform, and PCA. The dictionaries these methods use are all set in advance, and designing a good dictionary by hand is extremely difficult, because its complexity and geometric properties vary greatly from one kind of signal to another. The requirements on the basis vectors of such a dictionary are also too strict, since they must be orthogonal; this restriction simplifies the problem but also limits the flexibility of the solution.
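To make the orthogonality constraint concrete, the following minimal sketch shows that with an orthonormal dictionary the representation is fully determined as a = Dᵀx, which leaves no freedom to look for a sparser one. The 2×2 dictionary, the test signal, and the helper names are assumptions chosen for illustration and do not come from the patent.

```python
import math

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical 2x2 orthonormal dictionary; its columns are the basis vectors.
s = 1 / math.sqrt(2)
D = [[s, s],
     [s, -s]]

x = [3.0, 1.0]                 # toy "audio" vector
a = matvec(transpose(D), x)    # the only possible representation: a = D^T x
x_rec = matvec(D, a)           # D a reproduces x up to rounding
```

An overcomplete dictionary (more atoms than feature dimensions) removes this uniqueness, which is what allows a sparse representation to be selected among many valid ones.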
Summary of the invention
The technical problem to be solved by the invention is to overcome the shortcomings of the prior art by providing an audio scene recognition method based on sparse coding.
The technical solution of the invention is an audio scene recognition method based on sparse coding, comprising the following steps:
(1) Atom dictionary generation. Training audio signal samples of the target scene are used to train an atom dictionary D1 for the target scene, and training audio signal samples from outside the target set are used to train an out-of-set atom dictionary D2. The atoms in D1 carry the characteristics of the target scene; the atoms in the out-of-set dictionary D2 do not.
The audio signals are defined as X = [x1, x2, …, xn], where the feature of each audio signal has m dimensions and λ is the regularization parameter. The atom dictionary D has k columns, each of which is an atom. Both m and k are much smaller than n, and the dictionary is redundant and overcomplete, i.e. m is less than k. The decomposition of a signal over an overcomplete, redundant dictionary is sparse.
The atom dictionary D trained from the samples X gives a sparse representation of the audio signal in each sample. The coefficients of X decomposed over D are denoted α = [α1, α2, …, αn]. Learning the atom dictionary means building a dictionary under which each sample can be represented sparsely, using the fewest atoms, as in the following formula:
$$f_n(D) = \frac{1}{n}\sum_{i=1}^{n} \min_{\alpha \in R^k} \frac{1}{2}\|x_i - D\alpha\|_2^2 + \lambda\|\alpha\|_1 \qquad (0.1)$$
(2) For an audio signal under test, the signal is decomposed over the atom dictionary D to obtain a sparse coefficient α. The atoms corresponding to the nonzero entries of this coefficient are located in the dictionary, and their class labels are counted. The audio signal statistic for the target scene is Rt [formula not reproduced] and the out-of-set audio signal statistic is Ro [formula not reproduced], where k1 is the number of atoms in the target-scene dictionary D1 and k2 is the number of atoms in the out-of-set dictionary D2.
(3) The statistics Rt and Ro are compared; the scene corresponding to the larger statistic is the recognition result.
The beneficial effect of the invention is that it applies sparse decomposition theory to extract a sparse feature of the audio signal; this feature has long-term characteristics and performs well in audio scene recognition.
Brief description of the drawings
Fig. 1 shows the audio scene recognition framework based on sparse coding.
Embodiment
The invention is described in detail below with reference to the accompanying drawing.
The method of the invention comprises the following steps:
First, atom dictionary generation. Training audio signal samples of the target scene are used to train an atom dictionary D1 for the target scene, and training audio signal samples from outside the target set are used to train an out-of-set atom dictionary D2. The atoms in D1 carry the characteristics of the target scene; the atoms in the out-of-set dictionary D2 do not. The dictionary is learned on the sample audio database with an adaptive algorithm, so the learned dictionary is adapted to the data. The audio signals are defined as X = [x1, x2, …, xn], where the feature of each audio signal has m dimensions and λ is the regularization parameter. The atom dictionary D has k columns, each of which is an atom. Both m and k are much smaller than n, and the dictionary is redundant and overcomplete, i.e. m is less than k. The decomposition of a signal over an overcomplete, redundant dictionary is sparse. The atom dictionary D trained from the samples X can sparsely represent the audio signal in each sample. The coefficients of X decomposed over D are denoted α = [α1, α2, …, αn]. Learning the atom dictionary means building a dictionary under which each sample can be represented sparsely, using the fewest atoms, as in the following formula:
$$f_n(D) = \frac{1}{n}\sum_{i=1}^{n} \min_{\alpha \in R^k} \frac{1}{2}\|x_i - D\alpha\|_2^2 + \lambda\|\alpha\|_1 \qquad (0.1)$$
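As a rough illustration of formula (0.1), the sketch below evaluates the objective for a fixed dictionary by solving each inner minimisation with ISTA (iterative soft thresholding). ISTA, the toy dictionary, the value of λ, and all function names are assumptions chosen for illustration; the patent does not state which solver is used during training.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def sparse_code_ista(x, D, lam, n_iter=500):
    """Approximately minimise 0.5*||x - D a||^2 + lam*||a||_1 over a.

    D is stored row-major as m rows of k entries, so each column is an atom.
    """
    m, k = len(D), len(D[0])
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant.
    L = sum(D[i][j] ** 2 for i in range(m) for j in range(k))
    a = [0.0] * k
    for _ in range(n_iter):
        r = [x[i] - sum(D[i][j] * a[j] for j in range(k)) for i in range(m)]
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(k)]  # D^T r
        a = [soft_threshold(a[j] + g[j] / L, lam / L) for j in range(k)]
    return a

def objective(X, D, lam):
    """f_n(D): mean over samples of the approximately minimised cost."""
    total = 0.0
    for x in X:
        a = sparse_code_ista(x, D, lam)
        r = [x[i] - sum(D[i][j] * a[j] for j in range(len(a)))
             for i in range(len(x))]
        total += 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in a)
    return total / len(X)

# Toy data (hypothetical): m = 2 feature dimensions, k = 3 atoms, so m < k.
D = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8]]
X = [[2.0, 0.0], [0.6, 0.8]]
alpha = sparse_code_ista(X[1], D, lam=0.1)
cost = objective(X, D, lam=0.1)
```

In this toy case the second sample coincides with the third atom, so the code concentrates on that single atom; a dictionary learning procedure would additionally update D to lower this cost, a step that is not shown here.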
Second, for an audio signal under test, the signal is decomposed over the atom dictionary D to obtain a sparse coefficient α. The atoms corresponding to the nonzero entries of this coefficient are located in the dictionary, and their class labels are counted. The audio signal statistic for the target scene is Rt [formula not reproduced] and the out-of-set audio signal statistic is Ro [formula not reproduced], where k1 is the number of atoms in the target-scene dictionary D1 and k2 is the number of atoms in the out-of-set dictionary D2.
Third, the statistics Rt and Ro are compared; the scene corresponding to the larger statistic is the recognition result of the model.
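The decision made in the second and third steps can be sketched as follows. The formulas for Rt and Ro are not available in this text, so the sketch assumes, purely for illustration, that D is the column-wise concatenation [D1, D2] and that each statistic is the number of nonzero coefficients on that sub-dictionary divided by its atom count (k1 or k2). The function names, the tolerance, and the tie-breaking rule are likewise hypothetical.

```python
def scene_statistics(alpha, k1, k2, tol=1e-8):
    """Count active atoms per sub-dictionary, assuming D = [D1, D2].

    ASSUMPTION: the patent's formulas for Rt and Ro are not reproduced in the
    text; normalising each count by the sub-dictionary size is only one
    plausible reading.
    """
    if len(alpha) != k1 + k2:
        raise ValueError("alpha must have k1 + k2 entries")
    active_target = sum(1 for c in alpha[:k1] if abs(c) > tol)
    active_out = sum(1 for c in alpha[k1:] if abs(c) > tol)
    return active_target / k1, active_out / k2

def recognise(alpha, k1, k2):
    """Return the label of the larger statistic (ties go to out-of-set here)."""
    r_t, r_o = scene_statistics(alpha, k1, k2)
    return "target" if r_t > r_o else "out-of-set"

# Hypothetical sparse coefficient over k1 = 3 target atoms and k2 = 4 others.
alpha = [0.0, 0.7, -0.2, 0.0, 0.0, 0.1, 0.0]
r_t, r_o = scene_statistics(alpha, 3, 4)
label = recognise(alpha, 3, 4)
```

Here two of the three target atoms and one of the four out-of-set atoms are active, so the target statistic is the larger one under the assumed normalisation.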
Solving for the sparse code of an audio signal means selecting from the atom dictionary the atoms that represent the signal well, while using as few atoms as possible. This is the sparse decomposition of the signal, and the simplest method for solving it is the MP (Matching Pursuit) algorithm.
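A minimal sketch of matching pursuit in plain Python follows, assuming unit-norm atoms and a fixed iteration budget. The function name, the stopping tolerance, and the toy dictionary are illustrative and are not taken from the patent.

```python
def matching_pursuit(x, atoms, n_nonzero, tol=1e-10):
    """Greedy MP over a list of unit-norm atoms (each atom is a list of floats).

    Returns the coefficient vector and the final residual.
    """
    residual = list(x)
    alpha = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlate every atom with the current residual.
        corr = [sum(d_i * r_i for d_i, r_i in zip(d, residual)) for d in atoms]
        # Select the atom with the largest absolute correlation.
        j = max(range(len(atoms)), key=lambda idx: abs(corr[idx]))
        if abs(corr[j]) < tol:
            break  # nothing left to explain
        alpha[j] += corr[j]
        residual = [r_i - corr[j] * d_i for r_i, d_i in zip(residual, atoms[j])]
    return alpha, residual

# Hypothetical overcomplete dictionary: three unit-norm atoms in two dimensions.
atoms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
x = [1.2, 1.6]  # twice the third atom
alpha, residual = matching_pursuit(x, atoms, n_nonzero=2)
```

Because the test signal is a multiple of one atom, the first iteration explains it almost entirely and the loop stops early, leaving a single nonzero coefficient.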
The above is only a specific example of the invention; any equivalent transformation based on the method of the invention falls within the scope of protection of the invention.
Claims (1)
1. An audio scene recognition method based on sparse coding, characterized by comprising the following steps:
(1) atom dictionary generation: training audio signal samples of the target scene are trained to obtain an atom dictionary D1 of the target scene, and training audio signal samples outside the target set are trained to obtain an out-of-set atom dictionary D2; the atoms in D1 have the characteristics of the target scene, while the atoms in the out-of-set dictionary D2 do not;
the audio signals are defined as X = [x1, x2, …, xn], where the feature of each audio signal has m dimensions and λ is the regularization parameter; the atom dictionary D has k columns, each column being an atom; m and k are much smaller than n, and the dictionary is redundant and overcomplete, i.e. m is less than k; the decomposition of a signal over the overcomplete, redundant dictionary is sparse;
the atom dictionary D obtained by training on the samples X gives a sparse representation of the audio signal in each sample; the coefficients of X decomposed over D are denoted α = [α1, α2, …, αn]; learning the atom dictionary means building a dictionary under which each sample can be sparsely represented with the fewest atoms, as in the following formula:
$$f_n(D) = \frac{1}{n}\sum_{i=1}^{n} \min_{\alpha \in R^k} \frac{1}{2}\|x_i - D\alpha\|_2^2 + \lambda\|\alpha\|_1 \qquad (0.1)$$
(2) for an audio signal under test, the signal is decomposed over the atom dictionary D to obtain a sparse coefficient α; the atoms corresponding to the nonzero entries of this coefficient are located in the dictionary, and their class labels are counted, wherein the audio signal statistic for the target scene is Rt [formula not reproduced] and the out-of-set audio signal statistic is Ro [formula not reproduced], k1 being the number of atoms in the target-scene dictionary D1 and k2 the number of atoms in the out-of-set dictionary D2;
(3) the statistics Rt and Ro are compared, and the scene corresponding to the larger statistic is the recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610387696.7A CN107464556A (en) | 2016-06-02 | 2016-06-02 | A kind of audio scene recognition method based on sparse coding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610387696.7A CN107464556A (en) | 2016-06-02 | 2016-06-02 | A kind of audio scene recognition method based on sparse coding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107464556A true CN107464556A (en) | 2017-12-12 |
Family
ID=60544792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610387696.7A Pending CN107464556A (en) | 2016-06-02 | 2016-06-02 | A kind of audio scene recognition method based on sparse coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107464556A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100257129A1 (en) * | 2009-03-11 | 2010-10-07 | Google Inc. | Audio classification for information retrieval using sparse features |
CN102723079A (en) * | 2012-06-07 | 2012-10-10 | 天津大学 | Music and chord automatic identification method based on sparse representation |
CN103413551A (en) * | 2013-07-16 | 2013-11-27 | 清华大学 | Sparse dimension reduction-based speaker identification method |
CN103473555A (en) * | 2013-08-26 | 2013-12-25 | 中国科学院自动化研究所 | Horrible video scene recognition method based on multi-view and multi-instance learning |
CN103531199A (en) * | 2013-10-11 | 2014-01-22 | 福州大学 | Ecological sound identification method on basis of rapid sparse decomposition and deep learning |
CN103594084A (en) * | 2013-10-23 | 2014-02-19 | 江苏大学 | Voice emotion recognition method and system based on joint penalty sparse representation dictionary learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhou et al. | Audio–visual segmentation | |
Yoshimura et al. | Deep learning architect: classification for architectural design through the eye of artificial intelligence | |
CN105631479B (en) | Depth convolutional network image labeling method and device based on non-equilibrium study | |
CN105022835B (en) | A kind of intelligent perception big data public safety recognition methods and system | |
Zhang et al. | Three-dimensional densely connected convolutional network for hyperspectral remote sensing image classification | |
CN105609116B (en) | A kind of automatic identifying method in speech emotional dimension region | |
Bertrand et al. | Bark and leaf fusion systems to improve automatic tree species recognition | |
CN110390952A (en) | City sound event classification method based on bicharacteristic 2-DenseNet parallel connection | |
CN110675421B (en) | Depth image collaborative segmentation method based on few labeling frames | |
CN113761259A (en) | Image processing method and device and computer equipment | |
Li et al. | Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes | |
CN116110405B (en) | Land-air conversation speaker identification method and equipment based on semi-supervised learning | |
Wang et al. | R2-trans: Fine-grained visual categorization with redundancy reduction | |
Somervuo | Time–frequency warping of spectrograms applied to bird sound analyses | |
CN113707175B (en) | Acoustic event detection system based on feature decomposition classifier and adaptive post-processing | |
López-Cifuentes et al. | Attention-based knowledge distillation in scene recognition: the impact of a dct-driven loss | |
CN112489689B (en) | Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure | |
CN114398893A (en) | Clinical data processing model training method and device based on contrast learning | |
CN103336830A (en) | Image search method based on structure semantic histogram | |
Saleem et al. | Stateful human-centered visual captioning system to aid video surveillance | |
CN105006231A (en) | Distributed large population speaker recognition method based on fuzzy clustering decision tree | |
CN102034102B (en) | Image-based significant object extraction method as well as complementary significance graph learning method and system | |
Liu et al. | 3D point cloud of single tree branches and leaves semantic segmentation based on modified PointNet network | |
CN107464556A (en) | A kind of audio scene recognition method based on sparse coding | |
CN115661739A (en) | Vineyard pest fine-grained identification method based on attribute characteristic knowledge graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20171212 |