CN108182949A - Highway abnormal audio event classification method based on deep transformation features - Google Patents
Highway abnormal audio event classification method based on deep transformation features
- Publication number
- CN108182949A (application CN201711305135.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- frequency
- audio
- depth
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses a highway abnormal audio event classification method based on deep transformation features. First, samples of highway abnormal audio events are collected and divided into a training set and a test set. Pre-emphasis, framing, and windowing are then applied to the training and test audio samples, and the two preceding and two following frames of each frame are taken to form context audio data blocks. Acoustic features are extracted from these data blocks and concatenated into feature vectors, which are fed into a deep autoencoder network to extract deep transformation features. These features are then input to a long short-term memory network classifier to recognize each class of abnormal audio event. Both the deep autoencoder feature extractor and the long short-term memory network classifier involve a training step and a testing step. The deep transformation feature used in the invention is a fusion and transformation of several conventional acoustic features; it is more discriminative and more robust, and yields better classification results for abnormal audio events in complex highway audio.
Description
Technical field
The present invention relates to the fields of audio signal processing and machine learning, and in particular to a highway abnormal audio event classification method based on deep transformation features.
Background technology

As living standards improve, the number of private cars has increased sharply, putting growing pressure on the safe and efficient operation of expressways. There is an urgent need for a method that can automatically distinguish normal events from abnormal events on the highway. Highway conditions are complex and many kinds of abnormal situations can occur; traditional methods based on video surveillance struggle to cope with such emergencies efficiently and comprehensively.

Traditional audio event classification methods mostly use a single acoustic feature such as Mel-frequency cepstral coefficients or perceptual linear prediction coefficients. Highway abnormal audio events vary greatly in loudness, with large intra-class differences and small inter-class differences, so a single traditional acoustic feature cannot effectively characterize the differences between audio events. The present invention therefore combines multiple acoustic features and uses a deep neural network to deeply fuse and transform them, aiming to aggregate the advantages of each acoustic feature, further mine their latent characteristics, and obtain deep transformation features that are more discriminative and more noise-robust.
Summary of the invention

The purpose of the present invention is to overcome the above drawbacks of the prior art by providing a highway abnormal audio event classification method based on deep transformation features, comprising the following steps. First, data are prepared: highway abnormal audio event samples are collected and divided into a training set and a test set. The samples are then pre-processed: pre-emphasis, framing, and windowing are applied to the training-set and test-set audio, and the two preceding and two following frames are taken to form context audio data blocks. Acoustic features are extracted from these data blocks, chiefly Mel filter bank features, Gabor filter bank features, and constant-Q cepstral coefficients; the three features are concatenated into an acoustic feature vector and fed into a deep autoencoder network to extract deep transformation features. These deep transformation features are then input to a long short-term memory network classifier to recognize each class of abnormal audio event. Both the deep autoencoder feature extractor and the long short-term memory network classifier involve a training step and a testing step. The deep transformation feature used in the invention is a fusion and transformation of several conventional acoustic features; it is more discriminative and more robust, and yields better classification results for abnormal audio events in complex highway audio.
The purpose of the present invention can be achieved by the following technical solution.

A highway abnormal audio event classification method based on deep transformation features comprises the following steps:

S1, data preparation: use a recording device to collect audio data containing abnormal audio events on the highway and annotate it manually, then divide the audio data into a training data set and a test data set;

S2, pre-processing: apply pre-emphasis, framing, and windowing to the training data and test data, and take the two preceding and two following frames of each frame to form context audio data blocks;

S3, acoustic feature extraction: extract acoustic features from the pre-processed audio data, including Mel filter bank features, Gabor filter bank features, and constant-Q cepstral coefficients, and concatenate the three features into one acoustic feature vector;

S4, deep transformation feature extraction: build a deep autoencoder network, feed the acoustic feature vectors into it, and determine the network parameters based on the minimum-error principle; the output of the output layer is a reconstruction of the acoustic feature vector presented to the input layer, and the output of the bottleneck layer is the deep transformation feature;

S5, abnormal audio event classification: feed the deep transformation features into the trained long short-term memory network classifier to obtain the classification results of the abnormal audio events.
Further, the data preparation in step S1 comprises the following steps:

S1.1, collect audio data with a recording device: the recording device is placed on the isolation barrier in the middle of the highway; the audio data are sampled at 16 kHz with 8-bit quantization;

S1.2, annotate the audio data: the audio data are annotated manually by at least three people; where annotations disagree, the final label is decided by majority vote;

S1.3, divide the audio data: the annotated audio data are randomly divided into a training set (80%) and a test set (20%).
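As a concrete illustration of the 80/20 split in step S1.3, the following sketch shuffles labelled clips and cuts the list. The function name and the fixed seed are illustrative assumptions added for reproducibility; they are not taken from the patent.

```python
import random

def split_dataset(samples, train_frac=0.8, seed=42):
    """Randomly split labelled audio clips into train and test sets (step S1.3).

    `samples` is any sequence of (clip, label) pairs; the seed is an
    illustrative assumption, used only to make the shuffle deterministic.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # deterministic random shuffle
    cut = int(len(items) * train_frac)  # 80% boundary
    return items[:cut], items[cut:]     # training set, test set
```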
Further, the pre-processing in step S2 comprises the following steps:

S2.1, pre-emphasis: filter the audio data with a filter whose system function is H(z):

H(z) = 1 − μz^(−1),

where μ is a constant with value 0.98;

S2.2, framing: divide the pre-emphasized audio data into frames, with a frame length of 25 milliseconds and a frame shift of 10 milliseconds;

S2.3, windowing: multiply each frame of audio data by the window function ω(n), a Hamming window:

ω(n) = 0.54 − 0.46 cos(2πn/(N−1)), 0 ≤ n ≤ N−1,

where N is the frame length in samples, N = 25 ms × 16 kHz = 400;

S2.4, construct context audio data blocks: take the two frames before and the two frames after each audio frame as its context, forming audio data blocks of 5 frames.
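The pre-processing chain of steps S2.1 through S2.4 can be sketched in a few lines of numpy. This is a minimal sketch: the function name and edge-padding of the context frames at the clip boundaries are implementation assumptions not specified by the patent.

```python
import numpy as np

def preprocess(signal, fs=16000, mu=0.98, frame_ms=25, hop_ms=10, context=2):
    """Pre-emphasis, framing, Hamming windowing, and context stacking
    (steps S2.1-S2.4). Boundary handling is an assumption."""
    # S2.1 pre-emphasis: y[n] = x[n] - mu * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - mu * signal[:-1])

    # S2.2 framing: 25 ms frames with a 10 ms shift
    frame_len = int(fs * frame_ms / 1000)   # 400 samples at 16 kHz
    hop = int(fs * hop_ms / 1000)           # 160 samples
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])

    # S2.3 windowing with a Hamming window
    frames = frames * np.hamming(frame_len)

    # S2.4 context blocks: each frame with its 2 preceding and 2 following
    # frames (edges replicated at clip boundaries — an assumption)
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    blocks = np.stack([padded[i:i + 2 * context + 1]
                       for i in range(n_frames)])
    return blocks  # shape: (n_frames, 5, frame_len)
```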
Further, the acoustic feature extraction in step S3 comprises the following steps:

S3.1, Mel filter bank feature extraction, as follows:

S3.1.1, apply the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X_t(k):

X_t(k) = Σ_{n=0}^{N−1} x_t(n) e^(−j2πnk/N), 0 ≤ k ≤ N−1;

S3.1.2, filter the linear spectrum X_t(k) with a Mel-frequency filter bank to obtain the Mel spectrum. The Mel-frequency filter bank consists of M band-pass filters H_m(k), 0 ≤ m < M, each with a triangular frequency response centred at f(m); the spacing between adjacent centre frequencies f(m) is small when m is small and widens as m increases. The transfer function of each band-pass filter is:

H_m(k) = (k − f(m−1)) / (f(m) − f(m−1)), for f(m−1) ≤ k ≤ f(m),
H_m(k) = (f(m+1) − k) / (f(m+1) − f(m)), for f(m) ≤ k ≤ f(m+1),
H_m(k) = 0, otherwise,

where 0 ≤ m < M and f(m) is defined as:

f(m) = (N/f_s) · B⁻¹( B(f_l) + m · (B(f_h) − B(f_l)) / (M+1) ),

where f_l and f_h are the lowest and highest filter frequencies, f_s is the sampling frequency, B(f) = 1125 ln(1 + f/700) maps frequency to the Mel scale, and B⁻¹ is the inverse function of B:

B⁻¹(b) = 700 (e^(b/1125) − 1).

The Mel filter bank feature F(p) of the p-th frame audio signal is:

F(p) = X_t(p) H_m(p), 0 ≤ m < M;
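The Mel filter bank construction above can be sketched as follows. The filter count of 23 follows the Gabor section of this patent; the frequency range and the use of the magnitude spectrum are assumptions, since the patent does not fix them.

```python
import numpy as np

def mel_filter_bank_features(frame, fs=16000, n_filters=23, f_low=0.0,
                             f_high=8000.0):
    """Mel filter bank feature of one windowed frame (steps S3.1.1-S3.1.2).

    Uses B(f) = 1125 ln(1 + f/700) and its inverse from the patent;
    f_low/f_high and the magnitude (not power) spectrum are assumptions.
    """
    N = len(frame)
    spectrum = np.abs(np.fft.rfft(frame))                 # |X_t(k)|

    B = lambda f: 1125.0 * np.log(1.0 + f / 700.0)        # Hz -> mel
    B_inv = lambda b: 700.0 * (np.exp(b / 1125.0) - 1.0)  # mel -> Hz

    # M+2 equally spaced points on the mel scale -> centre frequencies f(m)
    mel_points = np.linspace(B(f_low), B(f_high), n_filters + 2)
    bins = np.floor((N + 1) * B_inv(mel_points) / fs).astype(int)

    # triangular responses rising to f(m) and falling to f(m+1)
    fbank = np.zeros((n_filters, len(spectrum)))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)

    return fbank @ spectrum    # one filter-bank energy per filter
```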
S3.2, Gabor filter bank feature extraction, as follows:

S3.2.1, Gabor filters: the Gabor filter bank consists of a set of two-dimensional Gabor filters. Each filter is the product of a complex sinusoidal carrier function

s_ω(x) = exp(jωx)

along the spectral and temporal axes with a Hann envelope

h_b(x) = 0.5 − 0.5 cos(2πx/(b+1)) for −b/2 ≤ x ≤ b/2, and 0 otherwise,

so that a filter centred at (k_0, n_0) is

G(k, n) = s_{ω_k}(k − k_0) · s_{ω_n}(n − n_0) · h_{b_k}(k − k_0) · h_{b_n}(n − n_0) · e^(jφ),

where k is the frequency index, n is the frame index, k_0 is the carrier (centre) frequency, n_0 is the centre of the time frame, ω_k is the spectral modulation frequency, ω_n is the temporal modulation frequency, v_k and v_n are the numbers of half-periods of carrier oscillation under the envelope in the frequency and time dimensions (fixing the envelope widths b_k and b_n), φ is an additive global phase, and b is the frequency bandwidth;

S3.2.2, Gabor filter bank feature extraction, as follows:

S3.2.2.1, Mel spectrum transformation: apply the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X(k), then transform X(k) into the log-magnitude Mel spectrum X_m(m):

X_m(m) = log( Σ_{k=0}^{N−1} |X(k)| F(k, m) ), 0 ≤ m < M,

where N is the frame length, F(k, m) is the k-th component of the m-th Mel filter, and M is the number of Mel filters;

S3.2.2.2, filter with the Gabor filters: feed the log Mel spectral coefficients X_m(k) into the two-dimensional Gabor filters and take the real part of the filter output to obtain the Gabor filter bank feature Gabor(p) of the p-th frame:

Gabor(p) = Re( (X_m ∗ G)(p) ),

where Re(·) takes the real part, X_m(·) is the log Mel spectral coefficient, G(·) is the Gabor filter function, and ∗ denotes two-dimensional convolution;

S3.2.2.3, applying the Gabor filter bank to each Mel filter channel yields a high-dimensional representation: with 23 Mel filters and 41 Gabor filters, the Gabor filter output has 23 × 41 = 943 dimensions; subsampling this output yields the 311-dimensional Gabor filter bank feature.
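The 2-D Gabor filtering of step S3.2.2.2 can be illustrated with a small numpy sketch. Here a Gaussian envelope replaces the Hann envelope for brevity, and the filter sizes, modulation frequencies, and valid-mode correlation are all illustrative assumptions, not the patent's parameter choices.

```python
import numpy as np

def gabor_2d(omega_k, omega_n, size_k=9, size_n=9, sigma_k=2.0, sigma_n=2.0):
    """One 2-D Gabor filter: an envelope times the complex carrier
    exp(j(omega_k*k + omega_n*n)), as in s_omega(x) = exp(j*omega*x).
    A Gaussian envelope and these sizes are simplifying assumptions."""
    k = np.arange(size_k) - size_k // 2
    n = np.arange(size_n) - size_n // 2
    K, Nn = np.meshgrid(k, n, indexing="ij")
    envelope = np.exp(-(K**2) / (2 * sigma_k**2) - (Nn**2) / (2 * sigma_n**2))
    carrier = np.exp(1j * (omega_k * K + omega_n * Nn))
    return envelope * carrier

def gabor_feature(log_mel, omega_k=0.5, omega_n=0.25):
    """Filter a log-Mel spectrogram (freq x time) with one Gabor filter
    and keep the real part, per step S3.2.2.2 (valid-mode correlation)."""
    g = gabor_2d(omega_k, omega_n)
    fk, fn = g.shape
    F, T = log_mel.shape
    out = np.zeros((F - fk + 1, T - fn + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = log_mel[i:i + fk, j:j + fn]
            out[i, j] = np.real(np.sum(patch * np.conj(g)))
    return out
```

In the full feature extractor, one such filter would be built for each of the 41 (omega_k, omega_n) pairs and applied across all 23 Mel channels before subsampling.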
S3.3, constant-Q cepstral coefficient (CQCC) feature extraction, as follows:

S3.3.1, apply the constant-Q transform to the t-th frame audio signal x_t(n) to obtain the constant-Q spectrum:

X^CQ(k) = Σ_{n=0}^{N_k−1} x_t(n) a_k*(n), k = 1, 2, …, K,

where k = 1, 2, …, K is the frequency index, a_k* is the complex conjugate of the basis function a_k, N_k is the length of the window function, and ⌊·⌋ denotes rounding down. The basis functions are

a_k(n) = (1/N_k) ω(n/N_k) exp( j(2πn f_k / f_s + Φ_k) ),

where f_s is the sampling frequency, f_k = f_1 · 2^((k−1)/B) is the frequency at index k, Φ_k is the phase shift, ω(·) is the window function, f_1 is the lowest frequency, and B is the bandwidth (number of bins per octave);

S3.3.2, compute the energy spectrum |X^CQ(k)|² of the constant-Q spectrum, take its logarithm to obtain the log energy spectrum log|X^CQ(k)|², and then apply the discrete cosine transform to obtain the constant-Q cepstral coefficients CQCC(p) of the p-th frame audio signal:

CQCC(p) = Σ_{k=1}^{L} log|X^CQ(k)|² · cos( p(k − 1/2)π / L ),

where L is the maximum discrete frequency index and X^CQ(k) is the constant-Q spectrum;
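Given an already-computed constant-Q magnitude spectrum, the cepstral step S3.3.2 is a log followed by a DCT-II, which can be written out directly from the formula above. The coefficient count and the small floor constant guarding the log are assumptions.

```python
import numpy as np

def cqcc_from_cq_spectrum(X_cq, n_coeffs=20):
    """CQCC of one frame from a constant-Q magnitude spectrum (step S3.3.2):
    log energy spectrum, then a DCT-II written out from its definition.
    n_coeffs and the 1e-10 floor are illustrative assumptions."""
    log_energy = np.log(np.abs(X_cq) ** 2 + 1e-10)   # log|X_CQ(k)|^2
    L = len(log_energy)
    p = np.arange(n_coeffs)[:, None]
    k = np.arange(L)[None, :]
    # DCT-II basis: cos(pi * p * (2k + 1) / (2L)), matching cos(p(k-1/2)pi/L)
    basis = np.cos(np.pi * p * (2 * k + 1) / (2 * L))
    return basis @ log_energy
```

A flat spectrum yields (near-)zero higher coefficients, since the DCT of a constant concentrates all energy in the zeroth coefficient.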
S3.4, feature concatenation: concatenate the Mel filter bank feature, the Gabor filter bank feature, and the constant-Q cepstral coefficient feature into one acoustic feature vector: V = [F(p), Gabor(p), CQCC(p)].
Further, the deep transformation feature extraction in step S4 comprises the following steps:

S4.1, build the sub-networks of the deep autoencoder network: the deep autoencoder network consists of two parts, an encoding sub-network and a decoding sub-network; the layer where the two sub-networks meet is the bottleneck layer, and the output of the bottleneck layer is the deep transformation feature.

For a single-layer encoding sub-network:

Y_EO = f(W_in v + b_in),

where v is the input acoustic feature vector, Y_EO is the output of the encoding sub-network, W_in is the encoding sub-network weight matrix, b_in is the encoding sub-network bias vector, and f(·) is the activation function, chosen as the ReLU function:

f(x_in) = max(0, x_in),

where x_in is the input of the activation function.

For a single-layer decoding sub-network:

y = f(W_out Y_EO + b_out),

where Y_EO is the input of the decoding sub-network, W_out is the decoding sub-network weight matrix, b_out is the decoding sub-network bias vector, f(·) is again the ReLU activation function, and y is the output of the whole network.

Define the loss function:

MSE = (1/d) Σ_{i=1}^{d} (v_i − y_i)²,

where v is the acoustic feature vector, y is the output of the whole network, and d is their dimensionality.

S4.2, train the deep autoencoder network: the training objective is to make the loss function MSE as small as possible, yielding the network weight matrices and bias vectors; the extracted acoustic feature vectors are then fed into the trained deep autoencoder network to obtain the deep transformation features.
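The single-layer encoder/decoder pair and the MSE training objective of steps S4.1-S4.2 can be sketched with plain gradient descent in numpy. Layer sizes, learning rate, initialization (including a small positive decoder bias to keep the output ReLU units initially active), and the training loop are all illustrative assumptions; the patent does not fix them.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class BottleneckAutoencoder:
    """Single-layer encoder/decoder autoencoder with ReLU activations,
    trained by gradient descent on the MSE loss (steps S4.1-S4.2).
    All hyperparameters here are illustrative assumptions."""

    def __init__(self, n_in, n_bottleneck, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_bottleneck, n_in))
        self.b_in = np.zeros(n_bottleneck)
        self.W_out = rng.normal(0.0, 0.1, (n_in, n_bottleneck))
        # small positive bias keeps decoder ReLU units initially active
        self.b_out = np.full(n_in, 0.1)
        self.lr = lr

    def transform(self, v):
        """Bottleneck output Y_EO = f(W_in v + b_in): the deep
        transformation feature."""
        return relu(self.W_in @ v + self.b_in)

    def step(self, v):
        """One gradient-descent step on MSE = mean((v - y)^2)."""
        h = self.transform(v)
        y = relu(self.W_out @ h + self.b_out)   # reconstruction of v
        err = y - v                             # gradient of MSE w.r.t. y,
                                                # up to a constant factor
        dy = err * (y > 0)                      # ReLU mask, decoder layer
        dh = (self.W_out.T @ dy) * (h > 0)      # ReLU mask, encoder layer
        self.W_out -= self.lr * np.outer(dy, h)
        self.b_out -= self.lr * dy
        self.W_in -= self.lr * np.outer(dh, v)
        self.b_in -= self.lr * dh
        return float(np.mean(err ** 2))
```

After training, only `transform` is used: its output is the deep transformation feature passed on to the classifier in step S5.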
Further, the abnormal audio event classification in step S5 comprises the following steps:

S5.1, build and train the long short-term memory network. Define the network loss function:

ψ = − Σ_{k=1}^{K} z_k log y_k,

where K is the number of audio event classes, z_k is the label value of the k-th audio event class, and y_k is the output probability of the k-th audio event class; the network training objective is to minimize the loss function ψ.

S5.2, output the classification results: after the long short-term memory network classifier has been trained, feed the deep transformation features of the test-set samples into it to obtain the output probability of each audio event class; the class with the largest output probability is the decision result.
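The LSTM internals are omitted here; this sketch shows only the loss and decision rule of steps S5.1-S5.2, assuming the network's final layer produces logits passed through a softmax (an assumption — the patent only specifies the cross-entropy loss and the argmax decision).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax turning logits into class probabilities."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(z, y):
    """Network loss psi = -sum_k z_k * log(y_k) from step S5.1;
    z is the one-hot label vector, y the output probabilities."""
    return -np.sum(z * np.log(y + 1e-12))

def classify(logits):
    """Step S5.2: the class with the largest output probability wins."""
    probs = softmax(logits)
    return int(np.argmax(probs)), probs
```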
Typical highway abnormal events (collisions, emergency braking, tyre skidding, rollovers, tyre blowouts) produce various sharp, piercing sounds. Exploiting this characteristic, the present invention classifies and identifies these abnormal events on the highway through audio feature extraction and transformation, effectively compensating for the shortcomings of current video-surveillance-based methods.

Compared with the prior art, the present invention has the following advantages and effects:

1. A long short-term memory network is applied to highway abnormal audio event classification, achieving better results than traditional classifiers such as support vector machines and K-nearest neighbours.

2. Instead of a traditional single acoustic feature such as Mel-frequency cepstral coefficients or perceptual linear prediction coefficients, a combination of Mel filter bank, Gabor filter bank, and constant-Q cepstral coefficient features is used, and a deep autoencoder network fuses and transforms this combined feature, producing a deep transformation feature that more effectively characterizes the time-frequency differences between abnormal audio events and gives better classification results.
Description of the drawings
Fig. 1 is the flow chart of the highway abnormal audio event classification method based on deep transformation features disclosed in the present invention.
Specific embodiment

To make the purpose, technical solution, and advantages of the embodiments of the present invention clearer, the technical solution in the embodiments is described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment

Fig. 1 is the block diagram of one embodiment of the highway abnormal audio event classification method based on deep transformation features. It mainly comprises the following process:

S1, data preparation: use a recording device to collect audio data containing abnormal audio events on the highway and annotate it manually, then divide the audio data into a training data set and a test data set. The specific steps are:

S1.1, collect audio data with a recording device: the recording device is placed on the isolation barrier in the middle of the highway; the audio data are sampled at 16 kHz with 8-bit quantization;

S1.2, annotate the audio data: the audio data are annotated manually by at least three people; where annotations disagree, the final label is decided by majority vote;

S1.3, divide the audio data: the annotated audio data are randomly divided into a training set (80%) and a test set (20%).
S2, pre-processing: apply pre-emphasis, framing, and windowing to the training data and test data, and take the two preceding and two following frames as context. The specific steps are:

S2.1, pre-emphasis: filter the audio data with a filter whose system function is H(z):

H(z) = 1 − μz^(−1),

where μ is a constant with value 0.98;

S2.2, framing: divide the pre-emphasized audio data into frames, with a frame length of 25 milliseconds and a frame shift of 10 milliseconds;

S2.3, windowing: multiply each frame of audio data by the window function, a Hamming window ω(n):

ω(n) = 0.54 − 0.46 cos(2πn/(N−1)), 0 ≤ n ≤ N−1,

where N is the frame length in samples, N = 25 ms × 16 kHz = 400;

S2.4, construct context audio data blocks: take the two frames before and the two frames after each audio frame as its context, forming audio data blocks of 5 frames.
S3, acoustic feature extraction: extract acoustic features from the pre-processed data, chiefly Mel filter bank features, Gabor filter bank features, and constant-Q cepstral coefficients, and concatenate the three into an acoustic feature vector. The specific steps are:

S3.1, Mel filter bank feature extraction, as follows:

S3.1.1, apply the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X_t(k):

X_t(k) = Σ_{n=0}^{N−1} x_t(n) e^(−j2πnk/N), 0 ≤ k ≤ N−1;

S3.1.2, filter the linear spectrum X_t(k) with the Mel-frequency filter bank to obtain the Mel spectrum, where the Mel-frequency filter bank consists of M band-pass filters H_m(k), 0 ≤ m < M, each with a triangular frequency response centred at f(m); the spacing between adjacent centre frequencies f(m) is small when m is small and widens as m increases. The transfer function of each band-pass filter is:

H_m(k) = (k − f(m−1)) / (f(m) − f(m−1)), for f(m−1) ≤ k ≤ f(m),
H_m(k) = (f(m+1) − k) / (f(m+1) − f(m)), for f(m) ≤ k ≤ f(m+1),
H_m(k) = 0, otherwise,

where 0 ≤ m < M and f(m) is defined as:

f(m) = (N/f_s) · B⁻¹( B(f_l) + m · (B(f_h) − B(f_l)) / (M+1) ),

where f_l and f_h are the lowest and highest filter frequencies, B(f) = 1125 ln(1 + f/700), and B⁻¹ is the inverse function of B:

B⁻¹(b) = 700 (e^(b/1125) − 1).

The Mel filter bank feature F(p) of the p-th frame audio signal is:

F(p) = X_t(p) H_m(p), 0 ≤ m < M;
S3.2, Gabor filter bank feature extraction, as follows:

S3.2.1, Gabor filters: the Gabor filter bank consists of a set of two-dimensional Gabor filters, each the product of a complex sinusoidal carrier

s_ω(x) = exp(jωx)

along the spectral and temporal axes with a Hann envelope, where k is the frequency index, n is the frame index, k_0 is the carrier (centre) frequency, n_0 is the centre of the time frame, ω_k is the spectral modulation frequency, ω_n is the temporal modulation frequency, v_k and v_n are the numbers of half-periods of carrier oscillation under the envelope in the frequency and time dimensions, φ is an additive global phase, and b is the frequency bandwidth;

S3.2.2, Gabor filter bank feature extraction, as follows:

S3.2.2.1, Mel spectrum transformation: apply the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X(k), then transform X(k) into the log-magnitude Mel spectrum X_m(m):

X_m(m) = log( Σ_{k=0}^{N−1} |X(k)| F(k, m) ), 0 ≤ m < M,

where N is the frame length, F(k, m) is the k-th component of the m-th Mel filter, and M is the number of Mel filters;

S3.2.2.2, filter with the Gabor filters: feed the log Mel spectral coefficients X_m(k) into the two-dimensional Gabor filters and take the real part of the filter output to obtain the Gabor filter bank feature of the p-th frame:

Gabor(p) = Re( (X_m ∗ G)(p) ),

where Re(·) takes the real part, X_m(·) is the log Mel spectral coefficient, G(·) is the Gabor filter function, and ∗ denotes two-dimensional convolution;

S3.2.2.3, applying the Gabor filter bank to each Mel filter channel yields a high-dimensional representation: with 23 Mel filters and 41 Gabor filters, the Gabor filter output has 23 × 41 = 943 dimensions; subsampling this output yields the 311-dimensional Gabor filter bank feature.
S3.3, constant-Q cepstral coefficient feature extraction, as follows:

S3.3.1, apply the constant-Q transform to the t-th frame audio signal x_t(n) to obtain the constant-Q spectrum:

X^CQ(k) = Σ_{n=0}^{N_k−1} x_t(n) a_k*(n), k = 1, 2, …, K,

where k = 1, 2, …, K is the frequency index, a_k* is the complex conjugate of the basis function a_k, N_k is the length of the window function, and ⌊·⌋ denotes rounding down. The basis functions are

a_k(n) = (1/N_k) ω(n/N_k) exp( j(2πn f_k / f_s + Φ_k) ),

where f_s is the sampling frequency, f_k = f_1 · 2^((k−1)/B) is the frequency at index k, Φ_k is the phase shift, ω(·) is the window function, f_1 is the lowest frequency, and B is the bandwidth (number of bins per octave);

S3.3.2, compute the energy spectrum |X^CQ(k)|² of the constant-Q spectrum, take its logarithm to obtain the log energy spectrum log|X^CQ(k)|², and then apply the discrete cosine transform to obtain the constant-Q cepstral coefficients CQCC(p) of the p-th frame audio signal:

CQCC(p) = Σ_{k=1}^{L} log|X^CQ(k)|² · cos( p(k − 1/2)π / L ),

where L is the maximum discrete frequency index and X^CQ(k) is the constant-Q spectrum.

S3.4, feature concatenation: concatenate the Mel filter bank feature, the Gabor filter bank feature, and the constant-Q cepstral coefficient feature into one acoustic feature vector: V = [F(p), Gabor(p), CQCC(p)].
S4, deep transformation feature extraction: build a deep autoencoder network and feed the acoustic feature vectors into it; the output of the network is a reconstruction of the input acoustic feature vector, the network parameters are determined based on the minimum-error principle, and the output of the bottleneck layer is the deep transformation feature. The specific steps are:

S4.1, build the sub-networks of the deep autoencoder network: the deep autoencoder network consists of two parts, an encoding sub-network and a decoding sub-network; the layer where the two sub-networks meet is the bottleneck layer, and the output of the bottleneck layer is the deep transformation feature.

For a single-layer encoding sub-network:

Y_EO = f(W_in v + b_in),

where v is the input acoustic feature vector, Y_EO is the output of the encoding sub-network, W_in is the encoding sub-network weight matrix, b_in is the encoding sub-network bias vector, and f(·) is the activation function, commonly chosen as the ReLU function:

f(x_in) = max(0, x_in),

where x_in is the input of the activation function.

For a single-layer decoding sub-network:

y = f(W_out Y_EO + b_out),

where Y_EO is the input of the decoding sub-network, W_out is the decoding sub-network weight matrix, b_out is the decoding sub-network bias vector, f(·) is again the ReLU activation function, and y is the output of the whole network.

Define the loss function:

MSE = (1/d) Σ_{i=1}^{d} (v_i − y_i)²,

where v is the acoustic feature vector, y is the output of the whole network, and d is their dimensionality.

S4.2, train the deep autoencoder network: the training objective is to make the loss function MSE as small as possible, yielding the network weight matrices and bias vectors; the extracted acoustic feature vectors are then fed into the trained deep autoencoder network to obtain the deep transformation features.
S5, abnormal audio event classification: feed the deep transformation features into the trained long short-term memory network classifier to obtain the classification result of each audio event. The specific steps are:

S5.1, build and train the long short-term memory network: to obtain better classification results, the long short-term memory network is built with 400 nodes, a learning rate of 0.001, 3000 training iterations, and an unrolling depth of 10 steps; the network is trained with the back-propagation-through-time algorithm.

Define the network loss function:

ψ = − Σ_{k=1}^{K} z_k log y_k,

where K is the number of audio event classes, z_k is the label value of the k-th audio event class, and y_k is the output probability of the k-th audio event class; the network training objective is to minimize the loss function ψ.

S5.2, output the classification results: after the long short-term memory network classifier has been trained, feed the deep transformation features of the test-set samples into it to obtain the output probability of each audio event class; the class with the largest output probability is the decision result.
In conclusion, the highway abnormal audio event classification method disclosed in this embodiment first prepares the data: highway abnormal audio event samples are collected and then divided into a training set and a test set. The samples are then pre-processed: pre-emphasis, framing, and windowing are applied to the training-set and test-set audio event samples, and the two preceding and two following frames are taken to form context audio data blocks. Acoustic features are extracted from these data blocks, chiefly Mel filter bank (MFB), Gabor filter bank (GFB), and constant-Q cepstral coefficient (CQCC) features, and the three are concatenated into a feature vector. The feature vector is fed into a deep autoencoder network to extract the deep transformation feature, which is then input to a long short-term memory network (LSTMN) classifier to recognize each class of abnormal audio event. Both the deep autoencoder feature extractor and the long short-term memory network classifier involve a training step (with the training data set as input) and a testing step (with the test data set as input). The deep transformation feature used in the invention is a fusion and transformation of several conventional acoustic features; it is more discriminative and more robust, and yields better classification results for abnormal audio events in complex highway audio.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it; any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be deemed an equivalent replacement and shall fall within the protection scope of the present invention.
Claims (6)
- 1. A highway anomalous audio event classification method based on deep transformation features, characterized in that the method comprises the following steps:
S1, data preparation: collecting, with a recording device installed on a highway, audio data containing anomalous audio events, manually annotating the data, and then dividing the audio data into a training data set and a test data set;
S2, preprocessing: applying pre-emphasis, framing, and windowing to the training data and test data respectively, and taking the 2 frames before and after each frame to form a context audio data block;
S3, acoustic feature extraction: extracting acoustic features from the preprocessed audio data, including the Mel filter bank, the Gabor filter bank, and constant-Q cepstral coefficients, and concatenating the three features into one acoustic feature vector;
S4, deep transformation feature extraction: building a deep autoencoder network, feeding the acoustic feature vector into the network, and determining the network parameters on the minimum-error principle, so that the output of the output layer is a reconstruction of the acoustic feature vector presented at the input layer; the output of the bottleneck layer of the deep autoencoder network is the deep transformation feature;
S5, anomalous audio event classification: feeding the deep transformation feature into a trained long short-term memory network classifier to obtain the classification result of the anomalous audio event.
- 2. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the data preparation in step S1 comprises the following steps:
S1.1, acquiring audio data with a recording device: the recording device is placed on an isolation column in the middle of the highway; the sampling frequency of the audio data is 16 kHz and the quantization depth is 8 bits;
S1.2, annotating the audio data: three or more people manually annotate the audio data, and where annotations disagree, the final annotation is decided by majority vote;
S1.3, dividing the audio data: the annotated audio data is randomly divided into a training set and a test set, with the training set accounting for 80% and the test set for 20%.
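A minimal sketch of steps S1.2 and S1.3, using only the Python standard library (the helper names are hypothetical, not from the patent): majority-vote annotation and the random 80/20 split could look like:

```python
import random

def majority_label(labels):
    # S1.2: with three or more annotators, disagreements are settled by majority vote
    return max(set(labels), key=labels.count)

def split_dataset(samples, train_frac=0.8, seed=0):
    # S1.3: random 80% / 20% division into training and test sets
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]

label = majority_label(["crash", "crash", "horn"])   # -> "crash"
train, test = split_dataset(range(100))              # 80 and 20 samples
```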
- 3. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the preprocessing in step S2 comprises the following steps:
S2.1, pre-emphasis: the audio data is filtered with a filter whose system function is H(z), where H(z) = 1 - μz⁻¹ and μ is a constant with value 0.98;
S2.2, framing: the pre-emphasized audio data is divided into frames, with a frame length of 25 milliseconds and a frame shift of 10 milliseconds;
S2.3, windowing: the framed audio data is multiplied by a window function ω(n), where ω(n) is the Hamming window
ω(n) = 0.54 - 0.46 cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1,
where N denotes the frame length in sampling points, N = 25 ms × 16 kHz = 400;
S2.4, constructing context audio data blocks: the 2 frames before and after each audio frame are chosen as context, forming an audio frame data block of 5 frames.
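The preprocessing chain of claim 3 can be sketched in NumPy as follows (the one-second dummy signal and the edge-padding policy for the context frames are assumptions; the claim itself does not fix how boundary frames are handled):

```python
import numpy as np

FS = 16000                      # sampling frequency (claim 2: 16 kHz)
FRAME_LEN = int(0.025 * FS)     # 25 ms -> N = 400 samples
FRAME_SHIFT = int(0.010 * FS)   # 10 ms -> 160 samples
MU = 0.98                       # pre-emphasis constant

def preemphasis(x, mu=MU):
    # difference equation of H(z) = 1 - mu * z^-1
    return np.append(x[0], x[1:] - mu * x[:-1])

def frame_and_window(x):
    n_frames = 1 + (len(x) - FRAME_LEN) // FRAME_SHIFT
    win = np.hamming(FRAME_LEN)  # Hamming window omega(n)
    idx = np.arange(n_frames)[:, None] * FRAME_SHIFT + np.arange(FRAME_LEN)
    return x[idx] * win

def context_blocks(frames, c=2):
    # S2.4: 2 frames before and after each frame -> 5-frame data blocks
    padded = np.pad(frames, ((c, c), (0, 0)), mode="edge")
    return np.stack([padded[i:i + 2 * c + 1] for i in range(len(frames))])

x = np.random.randn(FS)                       # one second of dummy audio
blocks = context_blocks(frame_and_window(preemphasis(x)))
print(blocks.shape)                           # (98, 5, 400)
```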
- 4. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the acoustic feature extraction in step S3 comprises the following steps:
S3.1, Mel filter bank feature extraction, as follows:
S3.1.1, applying the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X_t(k):
X_t(k) = Σ_{n=0}^{N-1} x_t(n) e^{-j2πnk/N}, 0 ≤ k < N;
S3.1.2, filtering the linear spectrum X_t(k) with a Mel-frequency filter bank to obtain the Mel spectrum; the Mel-frequency filter bank consists of several band-pass filters H_m(k), 0 ≤ m < M, where M is the number of filters; each filter has a triangular filtering characteristic with centre frequency f(m); the spacing between adjacent centre frequencies f(m) is small for small m and grows as m increases; the transfer function of each band-pass filter is
H_m(k) = 0 for k < f(m-1),
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)) for f(m-1) ≤ k ≤ f(m),
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)) for f(m) < k ≤ f(m+1),
H_m(k) = 0 for k > f(m+1),
where 0 ≤ m < M, and f(m) is defined as
f(m) = (N / f_s) B⁻¹( B(f_l) + m (B(f_h) - B(f_l)) / (M + 1) ),
where f_l and f_h are the lowest and highest frequencies of the filter bank and B⁻¹ is the inverse function of the Mel-scale function B:
B⁻¹(b) = 700(e^{b/1125} - 1);
the Mel filter bank feature F(p) of the p-th frame audio signal is
F(p) = X_t(p) H_m(p), 0 ≤ m < M;
S3.2, Gabor filter bank feature extraction, as follows:
S3.2.1, Gabor filters: the Gabor filter bank is made up of a group of two-dimensional Gabor filters, each defined as the product of a complex sinusoidal carrier s_ω(x) = exp(jωx) and an envelope function, where k denotes the frequency index, n the frame index, k_0 the carrier frequency, n_0 the centre of the time frame, ω_k the spectral modulation frequency, ω_n the temporal modulation frequency, v_k and v_n the numbers of oscillation half-periods of the carrier in the frequency and time dimensions respectively, φ an additive global phase, and b the frequency bandwidth;
S3.2.2, Gabor filter bank feature extraction, as follows:
S3.2.2.1, Mel spectrum transformation: the discrete Fourier transform is applied to the t-th frame audio signal x_t(n) to obtain the linear spectrum X(k), which is then transformed into the log-magnitude Mel spectrum X_m(k), where N is the frame length, F(k, m) denotes the k-th component of the m-th order Mel filter bank, and M is the number of Mel filters;
S3.2.2.2, filtering with the Gabor filters: the logarithmic Mel spectral coefficients X_m(k) are fed into the two-dimensional Gabor filters and the real part of the filter output is taken, giving the Gabor filter bank feature Gabor(p) of the p-th frame:
Gabor(p) = Re( X_m(p) * G(p) ),
where Re(·) takes the real part, X_m(·) denotes the logarithmic Mel spectral coefficients, and G(·) denotes the Gabor filter function;
S3.2.2.3, applying the Gabor filter bank to each Mel filter yields a high-dimensional feature representation; with 23 Mel filters and 41 Gabor filters, the Gabor filter output has 23 × 41 = 943 dimensions, and sub-sampling the Gabor filter output yields a 311-dimensional Gabor filter bank feature;
S3.3, constant-Q cepstral coefficient feature extraction, as follows:
S3.3.1, applying the constant-Q transform to the t-th frame audio signal x_t(n) to obtain the constant-Q spectrum
X^{CQ}(k) = Σ_{n=1}^{N_k} x_t(n) a_k*(n),
where k = 1, 2, ..., K denotes the frequency index, a_k* is the conjugate of a_k, N_k denotes the window function length, and ⌊·⌋ denotes rounding down, with
a_k(n) = (1 / N_k) ω(n / N_k) exp( j(2πn f_k / f_s + φ) ), f_k = f_1 · 2^{(k-1)/B},
where f_s denotes the sampling frequency, f_k the frequency at index k, φ the phase shift, ω(·) the window function, f_1 the lowest frequency, and B the number of frequency bins per octave;
S3.3.2, computing the energy spectrum |X^{CQ}(k)|² of the constant-Q spectrum, taking its logarithm to obtain the log-energy spectrum log|X^{CQ}(k)|², and then applying the discrete cosine transform to obtain the constant-Q cepstral coefficients CQCC(p) of the p-th frame audio signal:
CQCC(p) = Σ_{k=1}^{L} log|X^{CQ}(k)|² cos( p(k - 1/2)π / L ),
where L is the maximum discrete frequency and X^{CQ}(k) is the constant-Q spectrum;
S3.4, feature concatenation: the Mel filter bank feature, the Gabor filter bank feature, and the constant-Q cepstral coefficients are concatenated into one acoustic feature vector:
V = [F(p), Gabor(p), CQCC(p)].
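A sketch of the Mel filter bank of step S3.1 in NumPy (the 512-point FFT and the 0–8000 Hz range are assumptions not stated in the claim), built on the Mel scale B(f) whose inverse B⁻¹(b) = 700(e^{b/1125} − 1) the claim gives explicitly:

```python
import numpy as np

def mel(f):          # Mel-scale function B(f)
    return 1125.0 * np.log(1.0 + f / 700.0)

def mel_inv(b):      # its inverse B^-1(b) = 700(e^{b/1125} - 1)
    return 700.0 * (np.exp(b / 1125.0) - 1.0)

def mel_filterbank(M=23, n_fft=512, fs=16000, fl=0.0, fh=8000.0):
    # centre frequencies f(m) are spaced uniformly on the Mel scale:
    # narrow spacing at low m, wider spacing as m grows
    pts = mel_inv(np.linspace(mel(fl), mel(fh), M + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    H = np.zeros((M, n_fft // 2 + 1))
    for m in range(1, M + 1):
        lo, ce, hi = bins[m - 1], bins[m], bins[m + 1]
        H[m - 1, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)  # rising edge
        H[m - 1, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)  # falling edge
    return H

frame = np.random.randn(400)                    # one windowed 25 ms frame
spectrum = np.abs(np.fft.rfft(frame, n=512))    # linear spectrum |X_t(k)|
fbank_feat = mel_filterbank() @ spectrum        # 23-dim Mel filter-bank feature F(p)
print(fbank_feat.shape)                         # (23,)
```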
- 5. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the deep transformation feature extraction in step S4 comprises the following steps:
S4.1, building the sub-networks of the deep autoencoder network: the deep autoencoder network consists of two parts, an encoding sub-network and a decoding sub-network; the overlapping part of the two sub-networks is the bottleneck layer, and the output of the bottleneck layer is the deep transformation feature;
for a single-layer encoding sub-network:
Y_EO = f(W_in v + b_in),
where v is the input acoustic feature vector, Y_EO is the output of the encoding sub-network, W_in is the weight matrix of the encoding sub-network, b_in is the bias vector of the encoding sub-network, and f(·) is the activation function, chosen as the ReLU function with expression
f(x_in) = max(0, x_in),
where x_in is the input of the activation function;
for a single-layer decoding sub-network:
y = f(W_out Y_EO + b_out),
where Y_EO is the input of the decoding sub-network, W_out is the weight matrix of the decoding sub-network, b_out is the bias vector of the decoding sub-network, f(·) is the ReLU activation function, and y is the output of the whole network;
the loss function is defined as the mean squared error
MSE = ||v - y||²,
where v is the acoustic feature vector and y is the output of the whole network;
S4.2, training the deep autoencoder network: the training objective is to make the loss function MSE as small as possible, yielding the network weight matrices and bias vectors; the extracted acoustic feature vectors are then fed into the deep autoencoder network to obtain the deep transformation features.
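A single-layer forward-pass sketch of S4.1 in NumPy (the 60-dimensional input and 16-dimensional bottleneck are hypothetical sizes; the gradient-descent weight updates of S4.2 are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_BN = 60, 16        # hypothetical: 60-dim acoustic vector, 16-dim bottleneck

W_in = rng.standard_normal((D_BN, D_IN)) * 0.1   # encoder weight matrix
b_in = np.zeros(D_BN)                            # encoder bias vector
W_out = rng.standard_normal((D_IN, D_BN)) * 0.1  # decoder weight matrix
b_out = np.zeros(D_IN)                           # decoder bias vector

def relu(x):                       # activation f(x) = max(0, x)
    return np.maximum(0.0, x)

def encode(v):                     # Y_EO = f(W_in v + b_in)
    return relu(W_in @ v + b_in)

def decode(y_eo):                  # y = f(W_out Y_EO + b_out)
    return relu(W_out @ y_eo + b_out)

def mse(v, y):                     # reconstruction loss that training minimises
    return float(np.mean((v - y) ** 2))

v = rng.standard_normal(D_IN)      # one acoustic feature vector
deep_feature = encode(v)           # bottleneck output = deep transformation feature
loss = mse(v, decode(deep_feature))
print(deep_feature.shape)          # (16,)
```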
- 6. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the anomalous audio event classification in step S5 comprises the following steps:
S5.1, building and training the long short-term memory network, with the network loss function defined as
ψ = -Σ_{k=1}^{K} z_k log(y_k),
where K is the number of audio event classes, z_k is the label value of the k-th audio event class, y_k is the output probability of the k-th class, and the network training objective is to minimize the loss function ψ;
S5.2, outputting the classification result: after the long short-term memory network classifier has been trained, the deep transformation features of the test-set samples are fed into the classifier to obtain the output probability of each audio event class; the class with the largest output probability is the decision result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711305135.9A CN108182949A (en) | 2017-12-11 | 2017-12-11 | A kind of highway anomalous audio event category method based on depth conversion feature |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108182949A true CN108182949A (en) | 2018-06-19 |
Family
ID=62545839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711305135.9A Pending CN108182949A (en) | 2017-12-11 | 2017-12-11 | A kind of highway anomalous audio event category method based on depth conversion feature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108182949A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017165551A1 (en) * | 2016-03-22 | 2017-09-28 | Sri International | Systems and methods for speech recognition in unseen and noisy channel conditions |
CN106251860A (en) * | 2016-08-09 | 2016-12-21 | 张爱英 | Unsupervised novelty audio event detection method and system towards safety-security area |
CN106952644A (en) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A kind of complex audio segmentation clustering method based on bottleneck characteristic |
CN107393554A (en) * | 2017-06-20 | 2017-11-24 | 武汉大学 | In a kind of sound scene classification merge class between standard deviation feature extracting method |
Non-Patent Citations (5)
Title |
---|
MASSIMILIANO TODISCO ET AL: "A New Feature for Automatic Speaker Verification Anti-Spoofing:Constant Q Cepstral Coefficients", 《ODYSSEY 2016-THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP》 * |
YANXIONG LI ET AL: "Using multi-stream hierarchical deep neural network to extract deep audio feature for acoustic event detection", 《MULTIMED TOOLS APPL》 * |
史秋莹: "基于深度学习和迁移学习的环境声音识别", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
金海: "基于深度神经网络的音频事件检测", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
黄丽霞等: "基于深度自编码网络语音识别噪声鲁棒性研究", 《计算机工程与应用》 * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109036382A (en) * | 2018-08-15 | 2018-12-18 | 武汉大学 | A kind of audio feature extraction methods based on KL divergence |
CN109065075A (en) * | 2018-09-26 | 2018-12-21 | 广州势必可赢网络科技有限公司 | A kind of method of speech processing, device, system and computer readable storage medium |
CN109461458A (en) * | 2018-10-26 | 2019-03-12 | 合肥工业大学 | A kind of audio method for detecting abnormality based on generation confrontation network |
CN109461458B (en) * | 2018-10-26 | 2022-09-13 | 合肥工业大学 | Audio anomaly detection method based on generation countermeasure network |
CN111354373A (en) * | 2018-12-21 | 2020-06-30 | 中国科学院声学研究所 | Audio signal classification method based on neural network intermediate layer characteristic filtering |
CN109584888A (en) * | 2019-01-16 | 2019-04-05 | 上海大学 | Whistle recognition methods based on machine learning |
CN110223715A (en) * | 2019-05-07 | 2019-09-10 | 华南理工大学 | It is a kind of based on sound event detection old solitary people man in activity estimation method |
CN110223715B (en) * | 2019-05-07 | 2021-05-25 | 华南理工大学 | Home activity estimation method for solitary old people based on sound event detection |
CN110390952A (en) * | 2019-06-21 | 2019-10-29 | 江南大学 | City sound event classification method based on bicharacteristic 2-DenseNet parallel connection |
CN110390952B (en) * | 2019-06-21 | 2021-10-22 | 江南大学 | City sound event classification method based on dual-feature 2-DenseNet parallel connection |
CN112133084A (en) * | 2019-06-25 | 2020-12-25 | 浙江吉智新能源汽车科技有限公司 | Road information sharing method, device and system |
CN112133084B (en) * | 2019-06-25 | 2022-04-12 | 浙江吉智新能源汽车科技有限公司 | Road information sharing method, device and system |
CN110718234A (en) * | 2019-09-02 | 2020-01-21 | 江苏师范大学 | Acoustic scene classification method based on semantic segmentation coding and decoding network |
CN110942766A (en) * | 2019-11-29 | 2020-03-31 | 厦门快商通科技股份有限公司 | Audio event detection method, system, mobile terminal and storage medium |
CN111613240A (en) * | 2020-05-22 | 2020-09-01 | 杭州电子科技大学 | Camouflage voice detection method based on attention mechanism and Bi-LSTM |
CN112986914A (en) * | 2021-02-10 | 2021-06-18 | 中国兵器装备集团自动化研究所 | Individual helmet and target sound source positioning and voiceprint recognition method thereof |
CN113192322A (en) * | 2021-03-19 | 2021-07-30 | 东北大学 | Expressway traffic flow counting method based on cloud edge cooperation |
CN113257283A (en) * | 2021-03-29 | 2021-08-13 | 北京字节跳动网络技术有限公司 | Audio signal processing method and device, electronic equipment and storage medium |
CN113257283B (en) * | 2021-03-29 | 2023-09-26 | 北京字节跳动网络技术有限公司 | Audio signal processing method and device, electronic equipment and storage medium |
CN113611288A (en) * | 2021-08-06 | 2021-11-05 | 南京华捷艾米软件科技有限公司 | Audio feature extraction method, device and system |
CN113920473A (en) * | 2021-10-15 | 2022-01-11 | 宿迁硅基智能科技有限公司 | Complete event determination method, storage medium and electronic device |
CN113920473B (en) * | 2021-10-15 | 2022-07-29 | 宿迁硅基智能科技有限公司 | Complete event determination method, storage medium and electronic device |
CN114863950A (en) * | 2022-07-07 | 2022-08-05 | 深圳神目信息技术有限公司 | Baby crying detection and network establishment method and system based on anomaly detection |
CN114927141A (en) * | 2022-07-19 | 2022-08-19 | 中国人民解放军海军工程大学 | Method and system for detecting abnormal underwater acoustic signals |
CN115206294A (en) * | 2022-09-16 | 2022-10-18 | 深圳比特微电子科技有限公司 | Training method, sound event detection method, device, equipment and medium |
CN116186524A (en) * | 2023-05-04 | 2023-05-30 | 天津大学 | Self-supervision machine abnormal sound detection method |
CN116186524B (en) * | 2023-05-04 | 2023-07-18 | 天津大学 | Self-supervision machine abnormal sound detection method |
CN117268796A (en) * | 2023-11-16 | 2023-12-22 | 天津大学 | Vehicle fault acoustic event detection method |
CN117268796B (en) * | 2023-11-16 | 2024-01-26 | 天津大学 | Vehicle fault acoustic event detection method |
CN118072766A (en) * | 2024-04-24 | 2024-05-24 | 南京小草交通科技有限公司 | Highway event perception system based on sound detects |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108182949A (en) | A kind of highway anomalous audio event category method based on depth conversion feature | |
CN110827837B (en) | Whale activity audio classification method based on deep learning | |
DE112017001830B4 (en) | VOICE ENHANCEMENT AND AUDIO EVENT DETECTION FOR A NON-STATIONARY NOISE ENVIRONMENT | |
CN110246510B (en) | End-to-end voice enhancement method based on RefineNet | |
CN107393542A (en) | A kind of birds species identification method based on binary channels neutral net | |
CN108922513A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN108630209B (en) | Marine organism identification method based on feature fusion and deep confidence network | |
CN101366078A (en) | Neural network classifier for separating audio sources from a monophonic audio signal | |
CN108711436A (en) | Speaker verification's system Replay Attack detection method based on high frequency and bottleneck characteristic | |
CN111128209B (en) | Speech enhancement method based on mixed masking learning target | |
CN113724712B (en) | Bird sound identification method based on multi-feature fusion and combination model | |
CN111696580B (en) | Voice detection method and device, electronic equipment and storage medium | |
CN109036470A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN108962229A (en) | A kind of target speaker's voice extraction method based on single channel, unsupervised formula | |
CN108806725A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN114495973A (en) | Special person voice separation method based on double-path self-attention mechanism | |
CN115081473A (en) | Multi-feature fusion brake noise classification and identification method | |
CN110544482A (en) | single-channel voice separation system | |
Nossier et al. | Mapping and masking targets comparison using different deep learning based speech enhancement architectures | |
Shifas et al. | A non-causal FFTNet architecture for speech enhancement | |
Wang et al. | Low pass filtering and bandwidth extension for robust anti-spoofing countermeasure against codec variabilities | |
CN110390937A (en) | A kind of across channel method for recognizing sound-groove based on ArcFace loss algorithm | |
CN104240717A (en) | Voice enhancement method based on combination of sparse code and ideal binary system mask | |
EP0658874B1 (en) | Process and circuit for producing from a speech signal with small bandwidth a speech signal with great bandwidth | |
CN113270110A (en) | ZPW-2000A track circuit transmitter and receiver fault diagnosis method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180619 |