CN107464556A

CN107464556A - An audio scene recognition method based on sparse coding

Info

Publication number: CN107464556A
Application number: CN201610387696.7A
Authority: CN
Inventors: 徐杰; 陈训逊; 王博; 王东安; 包秀国
Original assignee: National Computer Network and Information Security Management Center
Current assignee: National Computer Network and Information Security Management Center
Priority date: 2016-06-02
Filing date: 2016-06-02
Publication date: 2017-12-12

Abstract

The invention provides an audio scene recognition method based on sparse coding. The method comprises: (1) generating atom libraries; (2) decomposing an audio signal under test on the atom library D to obtain a sparse coefficient α, and from it obtaining the audio signal statistic R_t for the target scene and the out-of-set audio signal statistic R_o; and (3) comparing R_t and R_o, the scene corresponding to the larger statistic being the recognition result. Using sparse decomposition theory, the invention extracts a sparse feature of the audio signal; the feature captures long-term characteristics of the signal and performs well in audio scene recognition.

Description

An audio scene recognition method based on sparse coding

Technical field

The invention belongs to the technical fields of network information security and multimedia search, and in particular relates to an audio scene recognition method based on sparse coding.

Background art

Audio scene recognition is an application at the highest semantic level and has a wide range of uses. By processing audio signals at the scene level, it makes audio signal processing more intelligent. The main roles of scene-level audio signal processing are as follows. For the massive volume of data on the Internet, audio scene recognition can provide indexing and retrieval based on audio content, which complements and improves modern web search engines in both technology and application. In repositories holding massive amounts of audio information, such as digital libraries and multimedia websites, it can classify and manage the data intelligently. In surveillance, it can monitor public places such as elevators and parking lots in real time and give early warning of emergencies. It can also supply audio-based information to intelligent decision systems, and it plays an important role in fields such as autonomous driving and smart homes.

To perform audio scene recognition and analysis, the audio signal must be mapped onto a dictionary. In x = Dα, x denotes the original audio signal (a column vector), D is the dictionary, and α is the representation of the original audio signal x on the dictionary D. Popular methods for obtaining α include the Fourier transform, the wavelet transform, and PCA. The dictionaries these methods use are all fixed in advance. Designing a good dictionary by hand is very difficult, and its complexity and geometric properties vary greatly when it is used to characterize different signals. These methods also place overly strict requirements on the basis vectors of the dictionary, which must be orthogonal. This restriction simplifies the problem, but it also limits the flexibility of the solution.

Summary of the invention

The technical problem to be solved by the invention is to overcome the defects of the prior art by providing an audio scene recognition method based on sparse coding.

The technical solution of the invention is an audio scene recognition method based on sparse coding, comprising the following steps:

(1) Atom library generation. Training is performed on the training audio signal samples of the target scene to obtain an atom library D1 for the target scene, and on the training audio signal samples from outside the target set to obtain an out-of-set atom library D2. The atoms in D1 carry the characteristics of the target scene, whereas the atoms in the out-of-set atom library D2 do not.

The audio signal is defined as X = [x₁, x₂, …, x_n], where each audio feature vector is m-dimensional, λ is the regularization parameter, and the atom library D has k columns, each of which is an atom. Both m and k are much smaller than n, and the atom library is redundant and overcomplete, that is, m is less than k. The decomposition of a signal on an overcomplete, redundant atom library is sparse.

The atom library D obtained by training on the samples X gives a sparse representation of the audio signal in each sample. Let the coefficients of the decomposition of the samples X on D be α = [α₁, α₂, …, α_n]. Learning the atom library means building a library in which each sample can be represented sparsely using as few atoms as possible, as in the following formula:

$$f_n(D) = \frac{1}{n}\sum_{i=1}^{n} \min_{\alpha \in \mathbb{R}^k} \left( \frac{1}{2}\lVert x_i - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1 \right) \qquad (0.1)$$

(2) For an audio signal under test, decompose the signal on the atom library D to obtain a sparse coefficient α.

From the nonzero entries of this coefficient, find the corresponding atoms in the atom library and count the class labels of these atoms. The audio signal statistic for the target scene is R_t and the out-of-set audio signal statistic is R_o, where k1 is the number of atoms in the target-scene atom library D1 and k2 is the number of atoms in the out-of-set atom library D2.

(3) Compare the statistics R_t and R_o; the scene corresponding to the larger statistic is the recognition result.

The beneficial effects of the invention are as follows: using sparse decomposition theory, the invention extracts a sparse feature of the audio signal; the feature captures long-term characteristics of the signal and performs well in audio scene recognition.

Brief description of the drawings

Fig. 1 shows the audio scene recognition framework based on sparse coding.

Detailed description of the embodiments

The invention is described in detail below with reference to the accompanying drawing.

The method of the invention comprises the following steps:

First, atom library generation. Training on the training audio signal samples of the target scene yields an atom library D1 for the target scene, and training on the training audio signal samples from outside the target set yields an out-of-set atom library D2. The atoms in D1 carry the characteristics of the target scene, whereas the atoms in the out-of-set atom library D2 do not. The atom library is learned from the sample audio database by an adaptive algorithm, so that the learned library is adapted to the data. The audio signal is defined as X = [x₁, x₂, …, x_n], where each audio feature vector is m-dimensional, λ is the regularization parameter, and the atom library D has k columns, each of which is an atom. Both m and k are much smaller than n, and the atom library is redundant and overcomplete, that is, m is less than k. The decomposition of a signal on an overcomplete, redundant atom library is sparse. The atom library D obtained by training on the samples X gives a sparse representation of the audio signal in each sample. Let the coefficients of the decomposition of the samples X on D be α = [α₁, α₂, …, α_n]. Learning the atom library means building a library in which each sample can be represented sparsely using as few atoms as possible, as in the following formula:

$$f_n(D) = \frac{1}{n}\sum_{i=1}^{n} \min_{\alpha \in \mathbb{R}^k} \left( \frac{1}{2}\lVert x_i - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1 \right) \qquad (0.1)$$
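The cost inside formula (0.1) can be illustrated with a minimal Python sketch. It is not the patent's training procedure: it only evaluates the reconstruction term plus the λ-weighted sparsity term for codes that are already given, so the average it returns is an upper bound on f_n(D) rather than the minimized value. The function names and the representation of the atom library as a list of atoms (each a list of m numbers) are assumptions made for illustration.

```python
def sparse_coding_cost(x, atoms, alpha, lam):
    """Return 0.5 * ||x - D alpha||_2^2 + lam * ||alpha||_1 for one sample.

    x     : list of m floats (one audio feature vector)
    atoms : list of k atoms, each a list of m floats (the columns of D)
    alpha : list of k floats (coefficients of x on the atoms)
    lam   : regularization parameter lambda
    """
    # Reconstruct x as a weighted sum of atoms.
    recon = [sum(a * atom[r] for a, atom in zip(alpha, atoms))
             for r in range(len(x))]
    fit = 0.5 * sum((xi - ri) ** 2 for xi, ri in zip(x, recon))
    sparsity = lam * sum(abs(a) for a in alpha)
    return fit + sparsity


def average_cost(X, atoms, alphas, lam):
    """Average the per-sample cost over all samples for fixed codes (hypothetical helper)."""
    return sum(sparse_coding_cost(x, atoms, a, lam)
               for x, a in zip(X, alphas)) / len(X)
```

With the overcomplete library `[[1, 0], [0, 1], [1, 1]]` and the sample `[1, 1]`, both `[0, 0, 1]` and `[1, 1, 0]` reconstruct the sample exactly, but the first code uses one atom and therefore has the lower cost. This is the sense in which the λ term favors representations with few atoms.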

Second, for an audio signal under test, decompose the signal on the atom library D to obtain a sparse coefficient α. From the nonzero entries of this coefficient, find the corresponding atoms in the atom library and count the class labels of these atoms. The audio signal statistic for the target scene is R_t and the out-of-set audio signal statistic is R_o, where k1 is the number of atoms in the target-scene atom library D1 and k2 is the number of atoms in the out-of-set atom library D2.

Third, compare the statistics R_t and R_o; the scene corresponding to the larger statistic is the recognition result of the model.
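The second and third steps can be sketched in Python as follows. The exact expressions for R_t and R_o are not reproduced in this text, so the sketch makes two explicit assumptions: the test signal is coded on the concatenation of D1 and D2 (the first k1 coefficients belong to D1, the remaining k2 to D2), and each statistic is the number of nonzero coefficients in that library divided by the library size. The function name, the tolerance, and the tie-breaking rule are likewise hypothetical.

```python
def recognize_scene(alpha, k1, k2, tol=1e-8):
    """Vote with the active atoms of a sparse code.

    alpha : list of k1 + k2 coefficients; entries [0, k1) are assumed to
            belong to D1 (target scene), the rest to D2 (out-of-set).
    Returns (label, r_t, r_o).
    """
    if len(alpha) != k1 + k2:
        raise ValueError("alpha must have k1 + k2 entries")
    # Count coefficients that are nonzero (above a small tolerance).
    active_target = sum(1 for a in alpha[:k1] if abs(a) > tol)
    active_outside = sum(1 for a in alpha[k1:] if abs(a) > tol)
    # Assumed normalization: active atoms divided by library size.
    r_t = active_target / k1
    r_o = active_outside / k2
    # Assumed tie-breaking: a tie is reported as out-of-set.
    label = "target" if r_t > r_o else "out-of-set"
    return label, r_t, r_o
```

For example, with k1 = 3 and k2 = 4, a code whose nonzero entries sit at positions 0, 2, and 6 has two active target atoms and one active out-of-set atom, so under these assumptions the signal is assigned to the target scene.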

Solving the sparse coding of an audio signal means selecting from the atom library the atoms that best represent the signal, using as few atoms as possible. This is the sparse decomposition of the signal, and the simplest method for solving it is the MP (Matching Pursuit) algorithm.
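A minimal pure-Python sketch of Matching Pursuit is given here for illustration. It assumes the atoms have unit Euclidean norm; the stopping parameters `max_atoms` and `tol` and the helper `normalize` are hypothetical choices, not values taken from the patent.

```python
import math


def normalize(atom):
    """Scale an atom to unit Euclidean norm (MP assumes unit-norm atoms)."""
    norm = math.sqrt(sum(v * v for v in atom))
    return [v / norm for v in atom]


def matching_pursuit(x, atoms, max_atoms=5, tol=1e-6):
    """Greedy sparse decomposition of x on a list of unit-norm atoms.

    Returns (alpha, residual), where alpha[j] is the coefficient of
    atoms[j] and residual is what remains of x after the selected atoms
    are subtracted.
    """
    residual = list(x)
    alpha = [0.0] * len(atoms)
    for _ in range(max_atoms):
        # Correlate the current residual with every atom.
        scores = [sum(r * d for r, d in zip(residual, atom)) for atom in atoms]
        # Pick the atom with the largest absolute correlation.
        best = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        if abs(scores[best]) < tol:
            break  # nothing left that the atoms can explain
        alpha[best] += scores[best]
        # Remove the chosen atom's contribution from the residual.
        residual = [r - scores[best] * d
                    for r, d in zip(residual, atoms[best])]
    return alpha, residual
```

Each iteration selects one atom, so `max_atoms` bounds the number of iterations and therefore the number of nonzero coefficients. The nonzero positions of `alpha` are what the class-label counting step then inspects.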

The above is only a specific example of the invention; any equivalent transformation based on the method of the invention falls within the scope of protection of the invention.

Claims

1. An audio scene recognition method based on sparse coding, characterized by comprising the following steps:

$$f_n(D) = \frac{1}{n}\sum_{i=1}^{n} \min_{\alpha \in \mathbb{R}^k} \left( \frac{1}{2}\lVert x_i - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1 \right) \qquad (0.1)$$