CN107464556A - An audio scene recognition method based on sparse coding - Google Patents

An audio scene recognition method based on sparse coding


Publication number
CN107464556A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610387696.7A
Other languages
Chinese (zh)
Inventor
徐杰
陈训逊
王博
王东安
包秀国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN201610387696.7A
Publication of CN107464556A


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention is an audio scene recognition method based on sparse coding. The method comprises: (1) atom library generation; (2) for an audio signal under test, decomposing the signal over the atom library D to obtain a sparse coefficient α, and computing the audio signal statistic R_t for the target scene and the statistic R_o for out-of-set audio; (3) comparing R_t and R_o, the scene corresponding to the larger statistic being the recognition result. The invention applies sparse decomposition theory to extract a sparse feature of the audio signal. This feature has long-term characteristics and performs well in audio scene recognition.

Description

An audio scene recognition method based on sparse coding
Technical field
The invention belongs to the technical fields of network information security and multimedia search, and more particularly relates to an audio scene recognition method based on sparse coding.
Background
Audio scene recognition is an application at the highest semantic level, and its uses are broad: processing audio signals at the scene level makes audio signal processing more intelligent. The main uses of scene-level audio signal processing are as follows. For the massive volume of data on the internet, audio scene recognition can provide content-based audio indexing and retrieval, which supplements and improves modern web search engines in both technology and application. In repositories that hold massive amounts of audio information, such as digital libraries and multimedia websites, it can classify and manage the data intelligently. In surveillance, it can monitor public places such as elevators and parking lots in real time and give early warning of emergencies. It can also provide audio-based information support for intelligent decision systems, and it plays an important role in fields such as autonomous driving and smart homes.
To perform audio scene recognition and analysis, the audio signal must be mapped onto a dictionary. In x = D*a, x denotes the original audio signal (a column vector), D is the dictionary, and a is the representation of the original audio signal x over the dictionary D. Popular methods for obtaining a include the Fourier transform, the wavelet transform, and PCA. The dictionaries used by these methods are all preset, and designing a good dictionary by hand is extremely difficult, because its complexity and geometric properties vary greatly from one kind of signal to another. The requirements on the "basis" of such a dictionary are also too strict: the basis vectors must be orthogonal. This restriction simplifies the problem, but it also limits the flexibility of the solution.
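As a minimal illustration of the orthogonal case described above, and not part of the patent text, the following sketch builds a hypothetical orthonormal dictionary with NumPy and shows that the representation a is then obtained by a single matrix product. The function name `orthonormal_representation`, the dimension 8, and the random data are assumptions chosen only for illustration.

```python
import numpy as np

def orthonormal_representation(x, D):
    """Return a such that x = D @ a, assuming the columns of D are orthonormal."""
    return D.T @ x

# Hypothetical 8-dimensional example with random data.
rng = np.random.default_rng(0)
m = 8
D, _ = np.linalg.qr(rng.standard_normal((m, m)))  # Q factor has orthonormal columns
x = rng.standard_normal(m)

a = orthonormal_representation(x, D)  # analysis step
x_rec = D @ a                         # synthesis step x = D * a
print(np.allclose(x_rec, x))
```

With an overcomplete dictionary (more columns than rows) the product `D.T @ x` no longer inverts D, so a has to be found by solving a sparse approximation problem instead.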
Summary of the invention
The technical problem to be solved by the invention is to overcome the defects of the prior art by providing an audio scene recognition method based on sparse coding.
The technical solution of the invention is an audio scene recognition method based on sparse coding, comprising the following steps:
(1) Atom library generation: the training audio signal samples of the target scene are used for training to obtain an atom library D1 for the target scene, and the training audio signal samples from outside the target set are used for training to obtain an out-of-set atom library D2. The atoms in atom library D1 carry the characteristics of the target scene, whereas the atoms in the out-of-set atom library D2 do not;
The audio signal is defined as X = [x1, x2, …, xn], where each audio signal feature is m-dimensional, λ is the regularization parameter, and the atom library D has k columns, each of which is an atom. Both m and k are much smaller than n, and the atom library is redundant and overcomplete, that is, m is less than k. The decomposition of a signal over an overcomplete, redundant atom library is sparse;
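The atom library D is trained from the samples X so that each audio signal in the samples has a sparse representation. The coefficients of the decomposition of the samples X over D are written α = [α1, α2, …, αn]. Learning the atom library means building an atom library over which each sample can be represented sparsely, using as few atoms as possible, as in the following formula:
$$f_n(D) = \frac{1}{n}\sum_{i=1}^{n} \min_{\alpha \in \mathbb{R}^k} \left( \frac{1}{2}\lVert x_i - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1 \right) \qquad (0.1)$$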
(2) For an audio signal under test, the signal is decomposed over the atom library D to obtain a sparse coefficient α;
Based on the nonzero entries of this coefficient, the corresponding atoms in the atom library are found and their class labels are tallied. The audio signal statistic for the target scene is denoted R_t and the out-of-set audio signal statistic is denoted R_o; k1 is the number of atoms in the target-scene atom library D1, and k2 is the number of atoms in the out-of-set atom library D2;
(3) The statistics R_t and R_o are compared, and the scene corresponding to the larger statistic is the recognition result.
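Step (1) can be sketched in Python as follows. This is a sketch under stated assumptions, not the patent's training procedure: the text only says that the atom library is learned by an adaptive algorithm, so the alternating scheme used here (ISTA for the sparse coefficients, a least-squares atom update in the style of the method of optimal directions, then column normalization) is a swapped-in, commonly used choice. All function names, the iteration counts, and the re-seeding of unused atoms are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(X, D, lam, n_iter=200):
    """Approximately minimise 0.5*||x - D a||_2^2 + lam*||a||_1 for each column x of X."""
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A

def objective(X, D, A, lam):
    """Empirical cost: average over the n samples of the l1-regularised fit."""
    n = X.shape[1]
    return (0.5 * np.sum((X - D @ A) ** 2) + lam * np.sum(np.abs(A))) / n

def learn_atom_library(X, k, lam, n_outer=20, seed=0):
    """Alternate sparse coding and a least-squares atom update (assumed scheme)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        A = sparse_code_ista(X, D, lam)
        # Least-squares update D = X A^T (A A^T + eps I)^-1, solved via the transpose.
        G = A @ A.T + 1e-8 * np.eye(k)
        D = np.linalg.solve(G, A @ X.T).T
        # Assumption: atoms that were never used are re-seeded with random samples.
        dead = np.linalg.norm(D, axis=0) < 1e-10
        if dead.any():
            D[:, dead] = X[:, rng.integers(0, n, size=int(dead.sum()))]
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D
```

Under these assumptions, D1 would be obtained by calling `learn_atom_library` on the m-by-n matrix of target-scene features and D2 by calling it on the out-of-set features, each with k greater than m.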
The beneficial effects of the invention are as follows: the invention applies sparse decomposition theory to extract a sparse feature of the audio signal. This feature has long-term characteristics and performs well in audio scene recognition.
Brief description of the drawings
Fig. 1 shows the audio scene recognition framework based on sparse coding.
Embodiment
The invention is described in detail below with reference to the accompanying drawing.
The method of the invention comprises the following steps:
First, atom library generation. The training audio signal samples of the target scene are used to train an atom library D1 for the target scene, and the training audio signal samples from outside the target set are used to train an out-of-set atom library D2. The atoms in atom library D1 carry the characteristics of the target scene, whereas the atoms in the out-of-set atom library D2 do not. The atom library is learned on the sample audio database by an adaptive algorithm, so that the learned atom library is adapted to the data. The audio signal is defined as X = [x1, x2, …, xn], where each audio signal feature is m-dimensional, λ is the regularization parameter, and the atom library D has k columns, each of which is an atom. Both m and k are much smaller than n, and the atom library is redundant and overcomplete, that is, m is less than k. The decomposition of a signal over an overcomplete, redundant atom library is sparse. The atom library D is trained from the samples X so that each audio signal in the samples can be represented sparsely. The coefficients of the decomposition of the samples X over D are written α = [α1, α2, …, αn]. Learning the atom library means building an atom library over which each sample can be represented sparsely, using as few atoms as possible, as in the following formula:
$$f_n(D) = \frac{1}{n}\sum_{i=1}^{n} \min_{\alpha \in \mathbb{R}^k} \left( \frac{1}{2}\lVert x_i - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1 \right) \qquad (0.1)$$
Second, for an audio signal under test, the signal is decomposed over the atom library D to obtain a sparse coefficient α. Based on the nonzero entries of this coefficient, the corresponding atoms in the atom library are found and their class labels are tallied. The audio signal statistic for the target scene is denoted R_t and the out-of-set audio signal statistic is denoted R_o, where k1 is the number of atoms in the target-scene atom library D1 and k2 is the number of atoms in the out-of-set atom library D2.
Third, the statistics R_t and R_o are compared, and the scene corresponding to the larger statistic is the recognition result of the model.
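The second and third steps can be sketched as below. The defining formulas for the two statistics are not legible in this text, so the form used here is an assumption: the atom library D is taken to be the concatenation of D1 (first k1 columns) and D2 (last k2 columns), and each statistic is the fraction of atoms in the corresponding sub-library that carry a nonzero coefficient. The function names, the tolerance `tol`, and the tie-breaking rule are likewise hypothetical.

```python
import numpy as np

def scene_statistics(alpha, k1, k2, tol=1e-8):
    """Tally the active atoms by class label.

    Assumes alpha is the sparse coefficient over D = [D1, D2], so the first k1
    entries belong to target-scene atoms and the last k2 to out-of-set atoms.
    The normalisation by k1 and k2 is an assumption, not the patent's formula.
    """
    alpha = np.asarray(alpha, dtype=float)
    if alpha.shape != (k1 + k2,):
        raise ValueError("alpha must have length k1 + k2")
    active = np.abs(alpha) > tol
    r_t = active[:k1].sum() / k1
    r_o = active[k1:].sum() / k2
    return r_t, r_o

def recognise(alpha, k1, k2):
    """Return the label of the larger statistic (ties go to out-of-set by assumption)."""
    r_t, r_o = scene_statistics(alpha, k1, k2)
    return "target" if r_t > r_o else "out-of-set"

# Hypothetical coefficient vector: two active target atoms, one active out-of-set atom.
alpha = np.array([1.0, 0.0, -0.5, 0.0, 0.3, 0.0])
print(recognise(alpha, k1=3, k2=3))
```

Counting active atoms ignores coefficient magnitude; summing absolute coefficients per sub-library would be an equally plausible reading of "statistic", and the text available here does not settle which one the patent intends.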
Solving for the sparse code of an audio signal means selecting from the atom library the atoms that represent the signal well, while using as few atoms as possible. This is the sparse decomposition of the signal, and the simplest method for solving it is the MP (Matching Pursuit) algorithm.
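A minimal sketch of the MP algorithm named above, assuming that the atoms (the columns of D) have unit Euclidean norm. The function name, the iteration cap `max_iter`, and the stopping tolerance `tol` are hypothetical choices, since the patent does not specify them.

```python
import numpy as np

def matching_pursuit(x, D, max_iter=10, tol=1e-6):
    """Greedy sparse decomposition of x over D (columns assumed unit-norm).

    At each iteration the atom most correlated with the current residual is
    selected, its coefficient is accumulated, and its contribution is removed
    from the residual. Returns the coefficient vector and the final residual.
    """
    residual = np.asarray(x, dtype=float).copy()
    alpha = np.zeros(D.shape[1])
    for _ in range(max_iter):
        corr = D.T @ residual              # inner product with every atom
        j = int(np.argmax(np.abs(corr)))   # best-matching atom
        alpha[j] += corr[j]                # MP may select the same atom again
        residual -= corr[j] * D[:, j]
        if np.linalg.norm(residual) <= tol:
            break
    return alpha, residual
```

The number of nonzero entries in `alpha` is at most `max_iter`, so the cap directly limits how many atoms are used to represent the signal.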
The above is only a specific example of the invention. Any equivalent transformation based on the method of the invention falls within the scope of protection of the invention.

Claims (1)

1. An audio scene recognition method based on sparse coding, characterized by comprising the following steps:
(1) Atom library generation: the training audio signal samples of the target scene are used for training to obtain an atom library D1 for the target scene, and the training audio signal samples from outside the target set are used for training to obtain an out-of-set atom library D2. The atoms in atom library D1 carry the characteristics of the target scene, whereas the atoms in the out-of-set atom library D2 do not;
The audio signal is defined as X = [x1, x2, …, xn], where each audio signal feature is m-dimensional, λ is the regularization parameter, and the atom library D has k columns, each of which is an atom. Both m and k are much smaller than n, and the atom library is redundant and overcomplete, that is, m is less than k. The decomposition of a signal over an overcomplete, redundant atom library is sparse;
The atom library D is trained from the samples X so that each audio signal in the samples has a sparse representation. The coefficients of the decomposition of the samples X over D are written α = [α1, α2, …, αn]. Learning the atom library means building an atom library over which each sample can be represented sparsely, using as few atoms as possible, as in the following formula:
$$f_n(D) = \frac{1}{n}\sum_{i=1}^{n} \min_{\alpha \in \mathbb{R}^k} \left( \frac{1}{2}\lVert x_i - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1 \right) \qquad (0.1)$$
(2) For an audio signal under test, the signal is decomposed over the atom library D to obtain a sparse coefficient α;
Based on the nonzero entries of this coefficient, the corresponding atoms in the atom library are found and their class labels are tallied. The audio signal statistic for the target scene is denoted R_t and the out-of-set audio signal statistic is denoted R_o; k1 is the number of atoms in the target-scene atom library D1, and k2 is the number of atoms in the out-of-set atom library D2;
(3) The statistics R_t and R_o are compared, and the scene corresponding to the larger statistic is the recognition result.
CN201610387696.7A 2016-06-02 2016-06-02 An audio scene recognition method based on sparse coding Pending CN107464556A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201610387696.7A | 2016-06-02 | 2016-06-02 | An audio scene recognition method based on sparse coding

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201610387696.7A | 2016-06-02 | 2016-06-02 | An audio scene recognition method based on sparse coding

Publications (1)

Publication Number Publication Date
CN107464556A true CN107464556A (en) 2017-12-12

Family

ID=60544792

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201610387696.7A, Pending, CN107464556A (en) | An audio scene recognition method based on sparse coding | 2016-06-02 | 2016-06-02

Country Status (1)

Country Link
CN (1) CN107464556A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20100257129A1 (en) * | 2009-03-11 | 2010-10-07 | Google Inc. | Audio classification for information retrieval using sparse features
CN102723079A (en) * | 2012-06-07 | 2012-10-10 | 天津大学 | Music and chord automatic identification method based on sparse representation
CN103413551A (en) * | 2013-07-16 | 2013-11-27 | 清华大学 | Sparse dimension reduction-based speaker identification method
CN103473555A (en) * | 2013-08-26 | 2013-12-25 | 中国科学院自动化研究所 | Horrible video scene recognition method based on multi-view and multi-instance learning
CN103531199A (en) * | 2013-10-11 | 2014-01-22 | 福州大学 | Ecological sound identification method on basis of rapid sparse decomposition and deep learning
CN103594084A (en) * | 2013-10-23 | 2014-02-19 | 江苏大学 | Voice emotion recognition method and system based on joint penalty sparse representation dictionary learning


Similar Documents

Publication Publication Date Title
Zhou et al. Audio–visual segmentation
Yoshimura et al. Deep learning architect: classification for architectural design through the eye of artificial intelligence
CN105631479B (en) Depth convolutional network image labeling method and device based on non-equilibrium study
CN105022835B (en) A kind of intelligent perception big data public safety recognition methods and system
Zhang et al. Three-dimensional densely connected convolutional network for hyperspectral remote sensing image classification
CN105609116B (en) A kind of automatic identifying method in speech emotional dimension region
Bertrand et al. Bark and leaf fusion systems to improve automatic tree species recognition
CN110390952A (en) City sound event classification method based on bicharacteristic 2-DenseNet parallel connection
CN110675421B (en) Depth image collaborative segmentation method based on few labeling frames
CN113761259A (en) Image processing method and device and computer equipment
Li et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes
CN116110405B (en) Land-air conversation speaker identification method and equipment based on semi-supervised learning
Wang et al. R2-trans: Fine-grained visual categorization with redundancy reduction
Somervuo Time–frequency warping of spectrograms applied to bird sound analyses
CN113707175B (en) Acoustic event detection system based on feature decomposition classifier and adaptive post-processing
López-Cifuentes et al. Attention-based knowledge distillation in scene recognition: the impact of a dct-driven loss
CN112489689B (en) Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure
CN114398893A (en) Clinical data processing model training method and device based on contrast learning
CN103336830A (en) Image search method based on structure semantic histogram
Saleem et al. Stateful human-centered visual captioning system to aid video surveillance
CN105006231A (en) Distributed large population speaker recognition method based on fuzzy clustering decision tree
CN102034102B (en) Image-based significant object extraction method as well as complementary significance graph learning method and system
Liu et al. 3D point cloud of single tree branches and leaves semantic segmentation based on modified PointNet network
CN107464556A (en) A kind of audio scene recognition method based on sparse coding
CN115661739A (en) Vineyard pest fine-grained identification method based on attribute characteristic knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171212