CN108182949A - Highway abnormal audio event classification method based on deep transformation features - Google Patents

Highway abnormal audio event classification method based on deep transformation features Download PDF

Info

Publication number
CN108182949A
CN108182949A (application no. CN201711305135.9A)
Authority
CN
China
Prior art keywords
network
frequency
audio
depth
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711305135.9A
Other languages
Chinese (zh)
Inventor
李艳雄
李先苦
张聿晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201711305135.9A priority Critical patent/CN108182949A/en
Publication of CN108182949A publication Critical patent/CN108182949A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — characterised by the type of extracted parameters
    • G10L25/24 — the extracted parameters being the cepstrum
    • G10L25/27 — characterised by the analysis technique
    • G10L25/30 — using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a highway abnormal audio event classification method based on deep transformation features. Samples of abnormal highway audio events are first collected and divided into a training set and a test set. The training and test samples are then pre-emphasized, framed, and windowed, and the preceding and following 2 frames of each frame are taken to form a context audio data block. Acoustic features are extracted from each data block and spliced into a feature vector, which is fed into a deep autoencoder network to extract a deep transformation feature. The deep transformation feature is then input to a long short-term memory (LSTM) network classifier to recognize each class of abnormal audio event. Both the deep-autoencoder feature extractor and the LSTM classifier involve a training step and a testing step. The deep transformation feature used by the invention is a fusion and transformation of several conventional acoustic features; it is more discriminative and more robust, and yields better classification results when classifying abnormal audio events in complex highway audio.

Description

Highway abnormal audio event classification method based on deep transformation features
Technical field
The present invention relates to the fields of audio signal processing and machine learning, and in particular to a highway abnormal audio event classification method based on deep transformation features.
Background technology
With the improvement of living standards, the number of private cars has increased sharply, putting pressure on the safe and efficient operation of expressways. A method that can automatically distinguish normal from abnormal events on a highway is therefore urgently needed. Highway conditions are complex and many kinds of abnormal situations can occur; traditional video-surveillance-based methods find it difficult to respond comprehensively and efficiently to the various abnormal emergencies.
Traditional audio event classification methods mostly use a single acoustic feature such as mel-frequency cepstral coefficients or perceptual linear prediction coefficients. Given the characteristics of abnormal highway audio events (large background-noise variation, large intra-class differences, and small inter-class differences), a single traditional acoustic feature cannot effectively characterize the differences between audio events. The present invention therefore combines multiple acoustic features and uses a deep neural network to deeply fuse and transform them, aiming to pool the advantages of each acoustic feature and further mine its latent characteristics, obtaining a deep transformation feature that is more discriminative and more noise-robust.
Summary of the invention
The purpose of the present invention is to overcome the above drawbacks of the prior art by providing a highway abnormal audio event classification method based on deep transformation features, comprising the following steps. First, data preparation: samples of abnormal highway audio events are collected and divided into a training set and a test set. Next, preprocessing: the training and test samples are pre-emphasized, framed, and windowed, and the preceding and following 2 frames of each frame are taken to form a context audio data block. Acoustic features are then extracted from each data block, mainly mel filter bank features, Gabor filter bank features, and constant-Q cepstral coefficients; the three are spliced into an acoustic feature vector that is fed into a deep autoencoder network to extract a deep transformation feature. The deep transformation feature is then input to a long short-term memory (LSTM) network classifier to recognize each class of abnormal audio event. Both the deep-autoencoder feature extractor and the LSTM classifier involve a training step and a testing step. The deep transformation feature used by the invention is a fusion and transformation of several conventional acoustic features; it is more discriminative and more robust, and yields better classification results when classifying abnormal audio events in complex highway audio.
The purpose of the present invention can be achieved through the following technical scheme:
A highway abnormal audio event classification method based on deep transformation features, the method comprising the following steps:
S1, data preparation: audio data containing abnormal audio events are collected on a highway with recording equipment and manually annotated; the audio data are then divided into a training data set and a test data set;
S2, preprocessing: pre-emphasis, framing, and windowing are applied to the training and test data, and the preceding and following 2 frames of each frame are taken to form a context audio data block;
S3, acoustic feature extraction: mel filter bank features, Gabor filter bank features, and constant-Q cepstral coefficients are extracted from the preprocessed audio data and spliced into one acoustic feature vector;
S4, deep transformation feature extraction: a deep autoencoder network is built and the acoustic feature vector is fed into it; the network parameters are determined by the minimum-error principle, so that the output of the network's output layer reconstructs the acoustic feature vector presented at the input layer; the output of the bottleneck layer is the deep transformation feature;
S5, abnormal audio event classification: the deep transformation feature is input to the trained long short-term memory network classifier to obtain the classification result of the abnormal audio event.
Further, the data preparation in step S1 comprises the following steps:
S1.1, audio data collection: the recording equipment is mounted on the isolation columns in the middle of the highway; the audio is sampled at 16 kHz and quantized with 8 bits;
S1.2, audio data annotation: the audio data are manually annotated by three or more people; disputed labels are settled by majority vote;
S1.3, audio data division: the annotated audio data are randomly divided into a training set (80%) and a test set (20%).
Further, the preprocessing in step S2 comprises the following steps:
S2.1, pre-emphasis: the audio data are filtered with a filter whose system function H(z) is:
H(z) = 1 − μz⁻¹,
where μ is a constant with value 0.98;
S2.2, framing: the pre-emphasized audio data are divided into frames with a frame length of 25 ms and a frame shift of 10 ms;
S2.3, windowing: each frame is multiplied by the window function ω(n), a Hamming window:
ω(n) = 0.54 − 0.46 cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1,
where N is the frame length in samples, N = 25 ms × 16 kHz = 400;
S2.4, context audio data block construction: the 2 frames before and after each audio frame are taken as context, forming a 5-frame audio data block.
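The preprocessing chain of steps S2.1-S2.4 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the patent's implementation; the function and parameter names are my own, while the numeric values (0.98 pre-emphasis, 25 ms / 10 ms framing, Hamming window, 2-frame context) come from the text above.

```python
import numpy as np

def preprocess(x, fs=16000, mu=0.98, frame_ms=25, hop_ms=10, context=2):
    """Pre-emphasis, framing, Hamming windowing, and context stacking."""
    # S2.1 pre-emphasis: y[n] = x[n] - mu * x[n-1]
    y = np.append(x[0], x[1:] - mu * x[:-1])
    # S2.2 framing: 25 ms frames with a 10 ms frame shift
    N, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    n_frames = 1 + (len(y) - N) // hop
    frames = np.stack([y[i * hop : i * hop + N] for i in range(n_frames)])
    # S2.3 windowing with a Hamming window
    frames = frames * np.hamming(N)
    # S2.4 context blocks: each frame plus 2 frames on each side (5 frames total)
    blocks = [frames[i - context : i + context + 1]
              for i in range(context, n_frames - context)]
    return np.stack(blocks)

blocks = preprocess(np.random.randn(16000))  # 1 s of noise as a stand-in signal
print(blocks.shape)  # → (94, 5, 400)
```

With 16 kHz audio the frame length is indeed 400 samples, as stated in S2.3, and each data block holds 5 consecutive windowed frames.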
Further, the acoustic feature extraction in step S3 comprises the following steps:
S3.1, mel filter bank feature extraction, as follows:
S3.1.1, apply the discrete Fourier transform to the t-th frame of the audio signal x_t(n) to obtain the linear spectrum X_t(k):
X_t(k) = Σ_{n=0}^{N−1} x_t(n) e^{−j2πnk/N}, 0 ≤ k ≤ N − 1;
S3.1.2, filter the linear spectrum X_t(k) with a mel-frequency filter bank to obtain the mel spectrum. The mel filter bank consists of M band-pass filters H_m(k), 0 ≤ m < M, each with a triangular frequency response centred at f(m); the spacing between adjacent centre frequencies f(m) is small for small m and grows as m increases. The transfer function of each band-pass filter is:
H_m(k) = 0, for k < f(m − 1);
H_m(k) = (k − f(m − 1)) / (f(m) − f(m − 1)), for f(m − 1) ≤ k ≤ f(m);
H_m(k) = (f(m + 1) − k) / (f(m + 1) − f(m)), for f(m) < k ≤ f(m + 1);
H_m(k) = 0, for k > f(m + 1),
where 0 ≤ m < M, and the centre frequencies f(m) are defined as:
f(m) = (N / f_s) · B⁻¹( B(f_l) + m · (B(f_h) − B(f_l)) / (M + 1) ),
where f_l and f_h are the lowest and highest frequencies of the filter bank, f_s is the sampling frequency, B(f) = 1125 ln(1 + f/700) is the mel-scale mapping, and B⁻¹ is its inverse:
B⁻¹(b) = 700 (e^{b/1125} − 1),
The mel filter bank feature F(p) of the p-th frame of the audio signal is:
F(p) = X_t(p)H_m(p), 0 ≤ m < M;
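The triangular mel filter bank of S3.1 can be built directly from the mel-scale mapping B and its inverse B⁻¹ given above. A minimal NumPy sketch follows; the choice of 23 filters matches the number used later for the Gabor features, but the band edges (0 Hz to the 8 kHz Nyquist frequency) are my own assumptions, not stated in the patent.

```python
import numpy as np

def mel_filterbank(M=23, N=400, fs=16000, fl=0.0, fh=8000.0):
    """Triangular mel filter bank H_m(k); B(f) = 1125 ln(1 + f/700)."""
    B = lambda f: 1125.0 * np.log(1.0 + f / 700.0)
    Binv = lambda b: 700.0 * (np.exp(b / 1125.0) - 1.0)
    # M + 2 equally spaced points on the mel scale -> centre frequencies f(m)
    mel_pts = np.linspace(B(fl), B(fh), M + 2)
    bins = np.floor((N + 1) * Binv(mel_pts) / fs).astype(int)
    H = np.zeros((M, N // 2 + 1))
    for m in range(1, M + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):           # rising edge of the triangle
            H[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):           # falling edge of the triangle
            H[m - 1, k] = (r - k) / max(r - c, 1)
    return H

frame = np.random.randn(400)
spec = np.abs(np.fft.rfft(frame))       # linear spectrum |X_t(k)|
F = mel_filterbank() @ spec             # 23-dimensional filter-bank feature
print(F.shape)  # → (23,)
```

Each row of `H` is one triangular filter; multiplying the magnitude spectrum by the bank yields the per-frame feature F(p).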
S3.2, Gabor filter bank feature extraction, as follows:
S3.2.1, Gabor filters: the Gabor filter bank consists of a set of two-dimensional Gabor filters, each the product of a complex sinusoidal carrier and an envelope function; the carrier is defined as:
s_ω(x) = exp(jωx),
where k denotes the frequency index, n the frame index, k₀ the carrier frequency, n₀ the centre of the time frame, ω_k the spectral modulation frequency, ω_n the temporal modulation frequency, v_k and v_n the number of half-periods the carrier oscillates over in the spectral and temporal dimensions respectively, φ an additive global phase, and b the filter width;
S3.2.2, Gabor filter bank feature extraction, as follows:
S3.2.2.1, mel spectrum transformation: apply the discrete Fourier transform to the t-th frame x_t(n) to obtain the linear spectrum X(k), then transform X(k) into the log-magnitude mel spectrum X_m(k):
X_m(m) = log( Σ_{k=0}^{N−1} |X(k)| F(k, m) ), 0 ≤ m < M,
where N is the frame length, F(k, m) is the k-th component of the m-th mel filter, and M is the number of mel filters;
S3.2.2.2, Gabor filtering: feed the log mel spectral coefficients X_m(k) into the two-dimensional Gabor filters and take the real part of the filter output to obtain the Gabor filter bank feature Gabor(p) of the p-th frame:
Gabor(p) = Re( X_m(p) ∗ G(p) ),
where Re(·) takes the real part, X_m(·) is the log mel spectrum, G(·) is the Gabor filter function, and ∗ denotes two-dimensional convolution;
S3.2.2.3, applying the Gabor filter bank to every mel filter yields a high-dimensional representation: with 23 mel filters and 41 Gabor filters the output has 23 × 41 = 943 dimensions, which is subsampled to give a 311-dimensional Gabor filter bank feature;
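One two-dimensional Gabor filter of the kind described in S3.2.1 can be sketched as a complex carrier s_ω(x) = exp(jωx) multiplied by a smooth envelope. The patent's envelope formula is not reproduced in the text, so the Hann envelope below, the modulation frequencies, and the half-period count v are illustrative assumptions; only the carrier definition and the "take the real part" step come from the source.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_2d(omega_k=0.5, omega_n=0.25, v=3.5):
    """One 2-D Gabor filter: complex carrier exp(j(w_k*k + w_n*n)) times a
    Hann envelope spanning roughly v half-periods of the carrier (assumed)."""
    def axis(omega):
        w = max(int(np.pi * v / max(abs(omega), 1e-3)), 1)
        x = np.arange(-w, w + 1)
        env = 0.5 - 0.5 * np.cos(2 * np.pi * (x + w) / (2 * w))  # Hann window
        return x, env
    xk, ek = axis(omega_k)
    xn, en = axis(omega_n)
    carrier = np.exp(1j * (omega_k * xk[:, None] + omega_n * xn[None, :]))
    return carrier * (ek[:, None] * en[None, :])

log_mel = np.random.randn(23, 100)      # 23 mel bands x 100 frames (stand-in)
g = gabor_2d()
# S3.2.2.2: filter the log mel spectrogram and keep the real part
feat = np.real(convolve2d(log_mel, g, mode='same'))
print(feat.shape)  # → (23, 100)
```

Repeating this for a set of (ω_k, ω_n) pairs gives the 41-filter bank mentioned in S3.2.2.3; subsampling its outputs would then reduce the 943-dimensional representation to 311 dimensions.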
S3.3, constant-Q cepstral coefficient (CQCC) feature extraction, as follows:
S3.3.1, apply the constant-Q transform to the t-th frame of the audio signal x_t(n) to obtain the constant-Q spectrum:
X^CQ(k) = Σ_{n=0}^{N_k−1} x_t(n) a_k*(n),
where k = 1, 2, …, K is the frequency index, a_k* is the conjugate of the basis atom a_k, N_k is the length of the window function, and ⌊·⌋ denotes rounding down,
a_k(n) = (1/N_k) ω(n/N_k) exp( j(2πn f_k/f_s + φ_k) ),
f_k = f_1 · 2^{(k−1)/B},
where f_s is the sampling frequency, f_k the frequency at index k, φ_k the phase shift, ω(·) the window function, f_1 the lowest frequency, and B the bandwidth parameter (bins per octave);
S3.3.2, compute the power spectrum |X^CQ(k)|² of the constant-Q spectrum, take its logarithm to obtain log|X^CQ(k)|², and then apply the discrete cosine transform to obtain the constant-Q cepstral coefficients CQCC(p) of the p-th frame of the audio signal:
CQCC(p) = Σ_{k=1}^{L} log|X^CQ(k)|² cos( p(k − 1/2)π/L ),
where L is the maximum discrete frequency and X^CQ(k) is the constant-Q spectrum;
S3.4, feature splicing: the mel filter bank feature, the Gabor filter bank feature, and the constant-Q cepstral coefficient feature are spliced into one acoustic feature vector: v = [F(p), Gabor(p), CQCC(p)].
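The CQCC pipeline of S3.3 (constant-Q spectrum, log power, DCT) can be sketched with a naive, direct constant-Q transform. This is a slow illustrative implementation, not the patent's; the parameter values (f₁ = 50 Hz, B = 12, K = 60 bins, 20 coefficients) and the Hann window are assumptions I supply for the example.

```python
import numpy as np
from scipy.fft import dct

def cqcc(x, fs=16000, f1=50.0, B=12, K=60, n_coef=20):
    """Naive CQCC: direct constant-Q transform, log power spectrum, then DCT."""
    X = np.zeros(K, dtype=complex)
    for k in range(K):
        fk = f1 * 2.0 ** (k / B)                       # geometrically spaced f_k
        Nk = int(fs / (fk * (2 ** (1.0 / B) - 1)))     # window length, ~Q/f_k
        Nk = min(Nk, len(x))
        n = np.arange(Nk)
        # basis atom a_k(n): windowed complex exponential at f_k
        a = np.hanning(Nk) * np.exp(2j * np.pi * fk * n / fs) / Nk
        X[k] = np.sum(x[:Nk] * np.conj(a))             # inner product with a_k*
    log_pow = np.log(np.abs(X) ** 2 + 1e-12)           # log|X^CQ(k)|^2
    return dct(log_pow, norm='ortho')[:n_coef]         # cepstral coefficients

c = cqcc(np.random.randn(16000))
print(c.shape)  # → (20,)
```

In practice an efficient FFT-based CQT would replace the explicit loop, but the structure above mirrors the three formulas in S3.3 term by term.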
Further, the deep transformation feature extraction in step S4 comprises the following steps:
S4.1, building the sub-networks of the deep autoencoder: the deep autoencoder network consists of an encoder sub-network and a decoder sub-network; the layer where the two overlap is the bottleneck layer, and the output of the bottleneck layer is the deep transformation feature.
For a single-layer encoder sub-network:
Y_EO = f(W_in v + b_in),
where v is the input acoustic feature vector, Y_EO the encoder output, W_in the encoder weight matrix, b_in the encoder bias vector, and f(·) the activation function, chosen as the ReLU function:
f(x_in) = max(0, x_in),
where x_in is the input of the activation function.
For a single-layer decoder sub-network:
y = f(W_out Y_EO + b_out),
where Y_EO is the decoder input, W_out the decoder weight matrix, b_out the decoder bias vector, f(·) the ReLU activation function, and y the output of the whole network.
Define the loss function:
MSE = (1/D) Σ_{d=1}^{D} (v_d − y_d)²,
where v is the acoustic feature vector, y the output of the whole network, and D the feature dimension;
S4.2, training the deep autoencoder: the training objective is to make the loss MSE as small as possible, yielding the network weight matrices and bias vectors; the extracted acoustic feature vectors are then fed into the trained deep autoencoder to obtain the deep transformation features.
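A forward pass through the single-layer encoder and decoder of S4.1 can be written out directly. The sketch below shows the bottleneck output (the deep transformation feature) and the MSE loss that S4.2 minimizes; the dimensions D = 64 and H = 16 and the random initialization are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(0.0, x)    # f(x_in) = max(0, x_in)

D, H = 64, 16                          # feature dim and bottleneck size (assumed)
W_in, b_in = rng.normal(size=(H, D)) * 0.1, np.zeros(H)
W_out, b_out = rng.normal(size=(D, H)) * 0.1, np.zeros(D)

v = rng.normal(size=D)                 # acoustic feature vector
z = relu(W_in @ v + b_in)              # bottleneck output = deep transformation feature
y = relu(W_out @ z + b_out)            # reconstruction of v
mse = np.mean((v - y) ** 2)            # the loss that training drives down
print(z.shape, y.shape)  # → (16,) (64,)
```

Training would adjust W_in, b_in, W_out, b_out by gradient descent on `mse`; after training, only the encoder half is kept and `z` is used as the feature.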
Further, the abnormal audio event classification in step S5 comprises the following steps:
S5.1, building and training the long short-term memory network: define the network loss function:
ψ = − Σ_{k=1}^{K} z_k log y_k,
where K is the number of audio event classes, z_k the label value of the k-th audio event class, and y_k the output probability of the k-th audio event class; the network training objective is to minimize the loss ψ;
S5.2, outputting the classification result: after the long short-term memory network classifier has been trained, the deep transformation features of the test-set samples are fed into it to obtain the output probability of each audio event class; the class with the largest output probability is the decision.
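The decision rule of S5.1-S5.2 reduces to a softmax over the classifier's outputs, the cross-entropy loss ψ = −Σ z_k log y_k, and an argmax. The sketch below uses made-up logits standing in for an LSTM's output for K = 4 event classes; it is an illustration of the loss and decision, not the patent's network.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())            # shift for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    """psi = -sum_k z_k log y_k over the K event classes."""
    return -np.sum(z * np.log(y + 1e-12))

logits = np.array([1.2, 0.1, -0.5, 2.3])  # stand-in LSTM output, K = 4
y = softmax(logits)                        # output probabilities y_k
z = np.array([0.0, 0.0, 0.0, 1.0])         # one-hot label z_k
decision = int(np.argmax(y))               # S5.2: largest probability wins
print(decision)  # → 3
```

Minimizing ψ over the training set drives y toward the one-hot labels; at test time only the argmax is needed.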
According to the characteristics of typical highway abnormal events (collision, rollover, emergency braking, tyre skidding, tyre blowout), all of which produce sharp, loud sounds, the present invention classifies and recognizes the various abnormal events on a highway through audio feature extraction and transformation, effectively making up for the shortcomings of current video-surveillance-based methods.
Compared with the prior art, the present invention has the following advantages and effects:
1. The long short-term memory network is applied to highway abnormal audio event classification and achieves better results than traditional classifiers such as the support vector machine and k-nearest neighbours.
2. Instead of a single traditional acoustic feature such as mel-frequency cepstral coefficients or perceptual linear prediction coefficients, the method uses the combination of mel filter bank, Gabor filter bank, and constant-Q cepstral coefficient features, fused and transformed by a deep autoencoder network; the resulting deep transformation feature characterizes the time-frequency differences between abnormal audio events more effectively and yields better classification results.
Description of the drawings
Fig. 1 is a flow chart of the highway abnormal audio event classification method based on deep transformation features disclosed in the present invention.
Specific embodiment
To make the purposes, technical schemes, and advantages of the embodiments of the present invention clearer, the technical schemes in the embodiments are described fully and clearly below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative work fall within the protection scope of the present invention.
Embodiment
Fig. 1 shows the framework of one embodiment of the highway abnormal audio event classification method based on deep transformation features; it mainly comprises the following procedure:
S1, data preparation: audio data containing abnormal audio events are collected on a highway with recording equipment and manually annotated; the audio data are then divided into a training data set and a test data set. Specific steps:
S1.1, audio data collection: the recording equipment is mounted on the isolation columns in the middle of the highway; the audio is sampled at 16 kHz and quantized with 8 bits;
S1.2, audio data annotation: the audio data are manually annotated by three or more people; disputed labels are settled by majority vote;
S1.3, audio data division: the annotated audio data are randomly divided into a training set (80%) and a test set (20%).
S2, preprocessing: pre-emphasis, framing, and windowing are applied to the training and test data, and the preceding and following 2 frames are taken as context. Specific steps:
S2.1, pre-emphasis: the audio data are filtered with a filter whose system function H(z) is:
H(z) = 1 − μz⁻¹,
where μ is a constant with value 0.98;
S2.2, framing: the pre-emphasized audio data are divided into frames with a frame length of 25 ms and a frame shift of 10 ms;
S2.3, windowing: each frame is multiplied by the window function, a Hamming window ω(n):
ω(n) = 0.54 − 0.46 cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1,
where N is the frame length in samples, N = 25 ms × 16 kHz = 400;
S2.4, context audio data block construction: the 2 frames before and after each audio frame are taken as context, forming a 5-frame audio data block.
S3, acoustic feature extraction: mel filter bank features, Gabor filter bank features, and constant-Q cepstral coefficients are extracted from the preprocessed data and spliced into one acoustic feature vector. Specific steps:
S3.1, mel filter bank feature extraction, as follows:
S3.1.1, apply the discrete Fourier transform to the t-th frame of the audio signal x_t(n) to obtain the linear spectrum X_t(k):
X_t(k) = Σ_{n=0}^{N−1} x_t(n) e^{−j2πnk/N}, 0 ≤ k ≤ N − 1;
S3.1.2, filter the linear spectrum X_t(k) with a mel-frequency filter bank to obtain the mel spectrum. The mel filter bank consists of M band-pass filters H_m(k), 0 ≤ m < M, each with a triangular frequency response centred at f(m); the spacing between adjacent centre frequencies f(m) is small for small m and grows as m increases. The transfer function of each band-pass filter is:
H_m(k) = 0, for k < f(m − 1);
H_m(k) = (k − f(m − 1)) / (f(m) − f(m − 1)), for f(m − 1) ≤ k ≤ f(m);
H_m(k) = (f(m + 1) − k) / (f(m + 1) − f(m)), for f(m) < k ≤ f(m + 1);
H_m(k) = 0, for k > f(m + 1),
where 0 ≤ m < M, and the centre frequencies f(m) are defined as:
f(m) = (N / f_s) · B⁻¹( B(f_l) + m · (B(f_h) − B(f_l)) / (M + 1) ),
where f_l and f_h are the lowest and highest frequencies of the filter bank, f_s is the sampling frequency, B(f) = 1125 ln(1 + f/700) is the mel-scale mapping, and B⁻¹ is its inverse:
B⁻¹(b) = 700 (e^{b/1125} − 1),
The mel filter bank feature F(p) of the p-th frame of the audio signal is:
F(p) = X_t(p)H_m(p), 0 ≤ m < M;
S3.2, Gabor filter bank feature extraction, as follows:
S3.2.1, Gabor filters: the Gabor filter bank consists of a set of two-dimensional Gabor filters, each the product of a complex sinusoidal carrier and an envelope function; the carrier is defined as:
s_ω(x) = exp(jωx),
where k denotes the frequency index, n the frame index, k₀ the carrier frequency, n₀ the centre of the time frame, ω_k the spectral modulation frequency, ω_n the temporal modulation frequency, v_k and v_n the number of half-periods the carrier oscillates over in the spectral and temporal dimensions respectively, φ an additive global phase, and b the filter width;
S3.2.2, Gabor filter bank feature extraction, as follows:
S3.2.2.1, mel spectrum transformation: apply the discrete Fourier transform to the t-th frame x_t(n) to obtain the linear spectrum X(k), then transform X(k) into the log-magnitude mel spectrum X_m(k):
X_m(m) = log( Σ_{k=0}^{N−1} |X(k)| F(k, m) ), 0 ≤ m < M,
where N is the frame length, F(k, m) is the k-th component of the m-th mel filter, and M is the number of mel filters;
S3.2.2.2, Gabor filtering: feed the log mel spectral coefficients X_m(k) into the two-dimensional Gabor filters and take the real part of the filter output to obtain the Gabor filter bank feature of the p-th frame:
Gabor(p) = Re( X_m(p) ∗ G(p) ),
where Re(·) takes the real part, X_m(·) is the log mel spectrum, G(·) is the Gabor filter function, and ∗ denotes two-dimensional convolution;
S3.2.2.3, applying the Gabor filter bank to every mel filter yields a high-dimensional representation: with 23 mel filters and 41 Gabor filters the output has 23 × 41 = 943 dimensions, which is subsampled to give a 311-dimensional Gabor filter bank feature;
S3.3, constant-Q cepstral coefficient (CQCC) feature extraction, as follows:
S3.3.1, apply the constant-Q transform to the t-th frame of the audio signal x_t(n) to obtain the constant-Q spectrum:
X^CQ(k) = Σ_{n=0}^{N_k−1} x_t(n) a_k*(n),
where k = 1, 2, …, K is the frequency index, a_k* is the conjugate of the basis atom a_k, N_k is the length of the window function, and ⌊·⌋ denotes rounding down,
a_k(n) = (1/N_k) ω(n/N_k) exp( j(2πn f_k/f_s + φ_k) ),
f_k = f_1 · 2^{(k−1)/B},
where f_s is the sampling frequency, f_k the frequency at index k, φ_k the phase shift, ω(·) the window function, f_1 the lowest frequency, and B the bandwidth parameter (bins per octave);
S3.3.2, compute the power spectrum |X^CQ(k)|² of the constant-Q spectrum, take its logarithm to obtain log|X^CQ(k)|², and then apply the discrete cosine transform to obtain the constant-Q cepstral coefficients CQCC(p) of the p-th frame of the audio signal:
CQCC(p) = Σ_{k=1}^{L} log|X^CQ(k)|² cos( p(k − 1/2)π/L ),
where L is the maximum discrete frequency and X^CQ(k) is the constant-Q spectrum;
S3.4, feature splicing: the mel filter bank feature, the Gabor filter bank feature, and the constant-Q cepstral coefficient feature are spliced into one acoustic feature vector: v = [F(p), Gabor(p), CQCC(p)].
S4, deep transformation feature extraction: a deep autoencoder network is built and the acoustic feature vector is fed into it; the output of the network is a reconstruction of the input acoustic feature vector, the network parameters are determined by the minimum-error principle, and the output of the bottleneck layer is the deep transformation feature. Specific steps:
S4.1, building the sub-networks of the deep autoencoder: the deep autoencoder network consists of an encoder sub-network and a decoder sub-network; the layer where the two overlap is the bottleneck layer, and the output of the bottleneck layer is the deep transformation feature.
For a single-layer encoder sub-network:
Y_EO = f(W_in v + b_in),
where v is the input acoustic feature vector, Y_EO the encoder output, W_in the encoder weight matrix, b_in the encoder bias vector, and f(·) the activation function, generally chosen as the ReLU function:
f(x_in) = max(0, x_in),
where x_in is the input of the activation function.
For a single-layer decoder sub-network:
y = f(W_out Y_EO + b_out),
where Y_EO is the decoder input, W_out the decoder weight matrix, b_out the decoder bias vector, f(·) the ReLU activation function, and y the output of the whole network.
Define the loss function:
MSE = (1/D) Σ_{d=1}^{D} (v_d − y_d)²,
where v is the acoustic feature vector, y the output of the whole network, and D the feature dimension;
S4.2, training the deep autoencoder: the training objective is to make the loss MSE as small as possible, yielding the network weight matrices and bias vectors; the extracted acoustic feature vectors are then fed into the trained deep autoencoder to obtain the deep transformation features.
S5, abnormal audio event classification: the deep transformation features are input to the trained long short-term memory network classifier to obtain the classification result of each audio event. Specific steps:
S5.1, building and training the long short-term memory network: to obtain better classification results, the long short-term memory network is built with 400 nodes, a learning rate of 0.001, 3000 training iterations, and an unrolling length of 10 steps; the network is trained with the back-propagation-through-time algorithm.
Define the network loss function:
ψ = − Σ_{k=1}^{K} z_k log y_k,
where K is the number of audio event classes, z_k the label value of the k-th audio event class, and y_k the output probability of the k-th audio event class; the network training objective is to minimize the loss ψ;
S5.2, outputting the classification result: after the long short-term memory network classifier has been trained, the deep transformation features of the test-set samples are fed into it to obtain the output probability of each audio event class; the class with the largest output probability is the decision.
In conclusion, the highway abnormal audio event classification method disclosed in this embodiment first prepares the data by collecting samples of abnormal highway audio events and dividing them into a training set and a test set. It then preprocesses the samples with pre-emphasis, framing, and windowing, and takes the preceding and following 2 frames of each frame to form context audio data blocks. From each data block it extracts acoustic features, mainly mel filter bank (MFB), Gabor filter bank (GFB), and constant-Q cepstral coefficient (CQCC) features, and splices the three into a feature vector, which is input to a deep autoencoder network to extract a deep transformation feature. The deep transformation feature is then input to a long short-term memory network (LSTM) classifier to recognize each class of abnormal audio event. Both the deep-autoencoder feature extractor and the LSTM classifier involve a training step (with the training data set as input) and a testing step (with the test data set as input). The deep transformation feature used by the invention is a fusion and transformation of several conventional acoustic features; it is more discriminative and more robust, and yields better classification results when classifying abnormal audio events in complex highway audio.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims (6)

  1. A highway anomalous audio event classification method based on deep transform features, characterized in that the method comprises the following steps:
    S1, data preparation: using recording equipment, collect audio data containing anomalous audio events on a highway and annotate it manually, then divide the audio data into a training data set and a test data set;
    S2, preprocessing: apply pre-emphasis, framing, and windowing to the training data and test data respectively, and take the 2 preceding and 2 following frames to form context audio data blocks;
    S3, acoustic feature extraction: extract acoustic features from the preprocessed audio data, including Mel filter bank, Gabor filter bank, and constant Q cepstral coefficient features, and concatenate the three features into one acoustic feature vector;
    S4, deep transform feature extraction: build a deep autoencoder network, input the acoustic feature vector into it, and determine the network parameters by the minimum-error principle; the output of the output layer is a reconstruction of the acoustic feature vector fed to the input layer, and the output of the bottleneck layer is the deep transform feature;
    S5, anomalous audio event classification: input the deep transform feature into a trained long short-term memory network classifier to obtain the classification result of the anomalous audio event.
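The five steps S1-S5 can be outlined as a processing pipeline. The sketch below is illustrative only: every function is a stand-in that models data shapes rather than the real computation, and all names and dimensions (98 frames, 60-dimensional acoustic vectors, 16-dimensional bottleneck, 4 classes) are hypothetical, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for steps S2-S5; each stub only models the data shapes flowing
# between stages, not the actual signal processing or networks.
def preprocess(audio):           # S2: frames -> 5-frame context blocks
    return rng.standard_normal((98, 5, 400))

def extract_acoustic(blocks):    # S3: MFB + GFB + CQCC, concatenated per frame
    return rng.standard_normal((blocks.shape[0], 60))

def deep_transform(features):    # S4: bottleneck output of the autoencoder
    return features[:, :16]

def classify(deep_feats, n_classes=4):  # S5: LSTM classifier probabilities
    probs = rng.random((deep_feats.shape[0], n_classes))
    return probs.argmax(axis=1)         # decision = most probable class

labels = classify(deep_transform(extract_acoustic(preprocess(None))))
print(labels.shape)  # one class decision per frame
```

The point of the sketch is the data flow: per-frame context blocks become acoustic vectors, the autoencoder compresses them, and the classifier emits one decision per frame.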
  2. The highway anomalous audio event classification method based on deep transform features according to claim 1, characterized in that data preparation in step S1 comprises the following steps:
    S1.1, collect audio data with recording equipment: the equipment is placed on the isolation column in the middle of the highway; the audio data has a sampling frequency of 16 kHz and a quantization depth of 8 bits;
    S1.2, annotate the audio data: three or more people annotate the audio data manually; where annotations disagree, the final annotation is decided by majority vote;
    S1.3, divide the audio data: randomly divide the annotated audio data into a training set and a test set, the training set accounting for 80% and the test set for 20%.
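The random 80/20 split of S1.3 can be sketched as follows. This is a minimal illustration assuming the annotated samples are held in a Python list; the function name and the fixed seed are hypothetical choices, not part of the patent.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Randomly split annotated samples into training and test sets (S1.3)."""
    shuffled = samples[:]                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # reproducible random shuffle
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(list(range(100)))
print(len(train), len(test))  # 80 20
```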
  3. The highway anomalous audio event classification method based on deep transform features according to claim 1, characterized in that preprocessing in step S2 comprises the following steps:
    S2.1, pre-emphasis: filter the audio data with a filter whose system function is H(z):
    H(z) = 1 - μz⁻¹,
    where μ is a constant with value 0.98;
    S2.2, framing: divide the pre-emphasized audio data into frames, with a frame length of 25 milliseconds and a frame shift of 10 milliseconds;
    S2.3, windowing: multiply each frame of audio data by the window function ω(n), where ω(n) is a Hamming window:
    ω(n) = 0.54 - 0.46·cos(2πn/(N-1)), 0 ≤ n ≤ N-1,
    where N is the frame length in sampling points, N = 25 ms × 16 kHz = 400;
    S2.4, construct context audio data blocks: take the 2 frames before and the 2 frames after each audio frame as its context, forming a 5-frame audio data block.
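The preprocessing chain of claim 3 (pre-emphasis, 25 ms/10 ms framing, Hamming window, 5-frame context blocks) can be sketched with numpy. This is an illustrative sketch, not the patent's implementation; edge frames are padded by repetition here, an assumption the claim does not specify.

```python
import numpy as np

FS = 16000                    # sampling rate from S1.1
FRAME_LEN = int(0.025 * FS)   # 25 ms -> 400 samples (S2.2)
HOP = int(0.010 * FS)         # 10 ms -> 160 samples

def preprocess(x, mu=0.98, ctx=2):
    # S2.1: pre-emphasis with H(z) = 1 - mu * z^-1
    x = np.append(x[0], x[1:] - mu * x[:-1])
    # S2.2: slice the signal into overlapping frames
    n = 1 + (len(x) - FRAME_LEN) // HOP
    idx = np.arange(FRAME_LEN)[None, :] + HOP * np.arange(n)[:, None]
    frames = x[idx]
    # S2.3: Hamming window w(n) = 0.54 - 0.46 cos(2*pi*n/(N-1))
    frames = frames * np.hamming(FRAME_LEN)
    # S2.4: stack each frame with 2 frames of left and right context
    padded = np.pad(frames, ((ctx, ctx), (0, 0)), mode="edge")
    return np.stack([padded[i:i + n] for i in range(2 * ctx + 1)], axis=1)

blocks = preprocess(np.random.randn(FS))  # one second of test noise
print(blocks.shape)  # (n_frames, 5, 400)
```

One second of 16 kHz audio yields 98 frames, so the context blocks form a (98, 5, 400) array.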
  4. The highway anomalous audio event classification method based on deep transform features according to claim 1, characterized in that acoustic feature extraction in step S3 comprises the following steps:
    S3.1, Mel filter bank feature extraction, in the following specific steps:
    S3.1.1, apply the discrete Fourier transform to the t-th audio frame xt(n) to obtain the linear spectrum Xt(k):
    Xt(k) = Σ(n=0…N-1) xt(n)·e^(-j2πnk/N), 0 ≤ k < N;
    S3.1.2, filter the linear spectrum Xt(k) with a Mel-frequency filter bank to obtain the Mel spectrum. The Mel filter bank consists of M band-pass filters Hm(k), 0 ≤ m < M, each with a triangular frequency response centered at f(m); the spacing between adjacent centers f(m) is small when m is small and grows as m increases. The transfer function of each band-pass filter is:
    Hm(k) = 0 for k < f(m-1),
    Hm(k) = (k - f(m-1)) / (f(m) - f(m-1)) for f(m-1) ≤ k ≤ f(m),
    Hm(k) = (f(m+1) - k) / (f(m+1) - f(m)) for f(m) ≤ k ≤ f(m+1),
    Hm(k) = 0 for k > f(m+1),
    where 0 ≤ m < M and f(m) is defined as:
    f(m) = (N/fs)·B⁻¹(B(fl) + m·(B(fh) - B(fl))/(M+1)),
    where fl and fh are the lowest and highest frequencies of the filter bank and B⁻¹ is the inverse function of B:
    B⁻¹(b) = 700·(e^(b/1125) - 1);
    the Mel filter bank feature F(p) of the p-th audio frame is:
    F(p) = Xt(p)Hm(p), 0 ≤ m < M;
    S3.2, Gabor filter bank feature extraction, in the following specific steps:
    S3.2.1, Gabor filters: the Gabor filter bank consists of a set of two-dimensional Gabor filters, each the product of a complex sinusoidal carrier and an envelope function, the carrier being defined as:
    Sω(x) = exp(jωx),
    where k is the frequency index, n the frame index, k0 the carrier frequency, n0 the centre of the time frame, ωk the spectral modulation frequency, ωn the temporal modulation frequency, vk and vn the numbers of half-wave oscillations of the carrier in the frequency and time dimensions respectively, φ an additive global phase, and b the frequency bandwidth;
    S3.2.2, Gabor filter bank feature extraction, in the following specific steps:
    S3.2.2.1, Mel spectrum transformation: apply the discrete Fourier transform to the t-th audio frame xt(n) to obtain the linear spectrum X(k), then transform the linear spectrum into the log-magnitude Mel spectrum Xm(k),
    where N is the frame length, F(k, m) is the k-th component of the m-th Mel filter, and M is the number of Mel filters;
    S3.2.2.2, filter with the Gabor filters: input the log Mel spectral coefficients Xm(k) into the two-dimensional Gabor filters and take the real part of the filter output, obtaining the Gabor filter bank feature Gabor(p) of the p-th frame,
    where Re(·) denotes taking the real part, Xm(·) the log Mel spectral coefficients, and G(·) the Gabor filter function;
    S3.2.2.3, applying the Gabor filter bank to every Mel filter yields a high-dimensional representation: with 23 Mel filters and 41 Gabor filters the Gabor filter output has 23 × 41 = 943 dimensions; subsampling this output yields a 311-dimensional Gabor filter bank feature;
    S3.3, constant Q cepstral coefficient feature extraction, in the following specific steps:
    S3.3.1, apply the constant Q transform to the t-th audio frame xt(n) to obtain the constant Q spectrum XCQ(k),
    where k = 1, 2, …, K is the frequency index, ak* is the conjugate of ak, Nk is the length of the window function, and ⌊·⌋ denotes rounding down; in the definition of ak, fs is the sampling frequency, fk the frequency at index k, Φk the phase shift, ω(·) the window function, f1 the lowest frequency, and B the bandwidth;
    S3.3.2, compute the energy spectrum |XCQ(k)|² of the constant Q spectrum, take its logarithm to obtain the log energy spectrum log|XCQ(k)|², and then apply the discrete cosine transform to obtain the constant Q cepstral coefficients CQCC(p) of the p-th audio frame,
    where L is the maximum discrete frequency and XCQ(k) is the constant Q spectrum;
    S3.4, feature concatenation: concatenate the Mel filter bank feature, the Gabor filter bank feature, and the constant Q cepstral coefficients into one acoustic feature vector: V = [F(p), Gabor(p), CQCC(p)].
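Of the three feature branches in claim 4, the Mel filter bank of S3.1 is the simplest to sketch: triangular filters whose centers f(m) are equally spaced on the Mel scale B(f) (so they crowd together at low frequencies), applied to the frame's power spectrum. The sketch below is illustrative; the FFT size (512) and frequency range are assumptions not stated in the claim, and the Gabor (S3.2) and CQCC (S3.3) branches would be computed separately and concatenated as V = [F(p), Gabor(p), CQCC(p)].

```python
import numpy as np

def mel(f):       # B(f): Hz -> Mel, the inverse of B^-1 in S3.1
    return 1125.0 * np.log(1.0 + f / 700.0)

def inv_mel(b):   # B^-1(b) = 700 * (e^(b/1125) - 1), as given in the claim
    return 700.0 * (np.exp(b / 1125.0) - 1.0)

def mel_filter_bank(n_filters=23, n_fft=512, fs=16000, fl=0.0, fh=8000.0):
    """Triangular filters Hm(k) with centers f(m) equally spaced in Mel."""
    centers = inv_mel(np.linspace(mel(fl), mel(fh), n_filters + 2))
    bins = np.floor((n_fft + 1) * centers / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope
    return fb

fb = mel_filter_bank()
# power spectrum of one 400-sample frame, zero-padded to 512 points
spectrum = np.abs(np.fft.rfft(np.random.randn(400), n=512)) ** 2
mfb_feat = fb @ spectrum   # S3.1: 23-dimensional Mel filter bank feature F(p)
print(mfb_feat.shape)
```

Each row of `fb` is one triangular filter; multiplying by the power spectrum integrates the energy in that filter's band, giving one Mel filter bank coefficient per filter.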
  5. The highway anomalous audio event classification method based on deep transform features according to claim 1, characterized in that deep transform feature extraction in step S4 comprises the following steps:
    S4.1, build the sub-networks of the deep autoencoder network: the network consists of an encoding sub-network and a decoding sub-network; the layer where the two sub-networks overlap is the bottleneck layer, and the output of the bottleneck layer is the deep transform feature;
    For a single-layer encoding sub-network:
    Y_EO = f(W_in·v + b_in),
    where v is the input acoustic feature vector, Y_EO the output of the encoding sub-network, W_in its weight matrix, b_in its bias vector, and f(·) the activation function, chosen as the ReLU function:
    f(x_in) = max(0, x_in),
    where x_in is the input of the activation function;
    For a single-layer decoding sub-network:
    y = f(W_out·Y_EO + b_out),
    where Y_EO is the input of the decoding sub-network, W_out its weight matrix, b_out its bias vector, f(·) the ReLU activation function, and y the output of the whole network;
    Define the loss function:
    MSE = (1/D)·Σ(i=1…D)(v_i - y_i)²,
    where v is the acoustic feature vector, y the output of the whole network, and D their dimensionality;
    S4.2, train the deep autoencoder network: the training objective is to minimize the loss function MSE, which yields the network weight matrices and bias vectors; the extracted acoustic feature vectors are then input into the trained deep autoencoder network to obtain the deep transform features.
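The single-layer encoder/decoder of S4.1 can be sketched in numpy. This is a forward-pass sketch only (no training loop); the dimensions (60-dimensional input, 16-dimensional bottleneck) and the random initialization are hypothetical, and minimizing the MSE as in S4.2 would additionally require gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # activation f(x_in) = max(0, x_in), as chosen in S4.1
    return np.maximum(0.0, x)

class TinyAutoencoder:
    """Single-layer encoder and decoder: the bottleneck output is the
    'deep transform feature'; training would minimize mean((v - y)^2)."""
    def __init__(self, d_in, d_bottleneck):
        self.W_in = rng.standard_normal((d_bottleneck, d_in)) * 0.1
        self.b_in = np.zeros(d_bottleneck)
        self.W_out = rng.standard_normal((d_in, d_bottleneck)) * 0.1
        self.b_out = np.zeros(d_in)

    def encode(self, v):
        # Y_EO = f(W_in v + b_in): bottleneck-layer output
        return relu(self.W_in @ v + self.b_in)

    def forward(self, v):
        # y = f(W_out Y_EO + b_out): reconstruction of the input
        return relu(self.W_out @ self.encode(v) + self.b_out)

ae = TinyAutoencoder(d_in=60, d_bottleneck=16)
v = rng.standard_normal(60)               # one acoustic feature vector
deep_feature = ae.encode(v)               # the deep transform feature
mse = np.mean((v - ae.forward(v)) ** 2)   # the loss minimized in S4.2
print(deep_feature.shape, mse >= 0)
```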
  6. The highway anomalous audio event classification method based on deep transform features according to claim 1, characterized in that anomalous audio event classification in step S5 comprises the following steps:
    S5.1, build and train the long short-term memory network: define the network loss function:
    ψ = -Σ(k=1…K) z_k·log(y_k),
    where K is the number of audio event classes, z_k the label value of the k-th audio event class, and y_k the output probability of the k-th audio event class; the network training objective is to minimize the loss function ψ;
    S5.2, output the classification result: after the long short-term memory network classifier is trained, the deep transform features of the test-set samples are input into it to obtain the output probability of each audio event class; the class with the maximum output probability is the decision result.
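The loss and decision rule of claim 6 can be sketched without the recurrent network itself: the patent's classifier is an LSTM, but its training loss (cross-entropy ψ, S5.1) and its decision (argmax over output probabilities, S5.2) reduce to a few lines. The logits and one-hot label below are made-up example values.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(y_prob, z):
    # psi = -sum_k z_k * log(y_k), the training loss of S5.1
    return -np.sum(z * np.log(y_prob + 1e-12))

def classify(y_prob):
    # S5.2: the class with the largest output probability is the decision
    return int(np.argmax(y_prob))

logits = np.array([0.2, 2.5, -1.0, 0.7])  # hypothetical classifier outputs
z = np.array([0.0, 1.0, 0.0, 0.0])        # one-hot label for class 1
y = softmax(logits)
print(classify(y))  # 1
```

Minimizing ψ pushes the probability y_k of the labeled class toward 1, so at test time the argmax decision agrees with the label on well-trained examples.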
CN201711305135.9A 2017-12-11 2017-12-11 A kind of highway anomalous audio event category method based on depth conversion feature Pending CN108182949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711305135.9A CN108182949A (en) 2017-12-11 2017-12-11 A kind of highway anomalous audio event category method based on depth conversion feature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711305135.9A CN108182949A (en) 2017-12-11 2017-12-11 A kind of highway anomalous audio event category method based on depth conversion feature

Publications (1)

Publication Number Publication Date
CN108182949A true CN108182949A (en) 2018-06-19

Family

ID=62545839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711305135.9A Pending CN108182949A (en) 2017-12-11 2017-12-11 A kind of highway anomalous audio event category method based on depth conversion feature

Country Status (1)

Country Link
CN (1) CN108182949A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106251860A (en) * 2016-08-09 2016-12-21 张爱英 Unsupervised novelty audio event detection method and system towards safety-security area
CN106952644A (en) * 2017-02-24 2017-07-14 华南理工大学 A kind of complex audio segmentation clustering method based on bottleneck characteristic
WO2017165551A1 (en) * 2016-03-22 2017-09-28 Sri International Systems and methods for speech recognition in unseen and noisy channel conditions
CN107393554A (en) * 2017-06-20 2017-11-24 武汉大学 In a kind of sound scene classification merge class between standard deviation feature extracting method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MASSIMILIANO TODISCO ET AL: "A New Feature for Automatic Speaker Verification Anti-Spoofing:Constant Q Cepstral Coefficients", 《ODYSSEY 2016-THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP》 *
YANXIONG LI ET AL: "Using multi-stream hierarchical deep neural network to extract deep audio feature for acoustic event detection", 《MULTIMED TOOLS APPL》 *
SHI QIUYING: "Environmental Sound Recognition Based on Deep Learning and Transfer Learning", 《CHINA MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY》 *
JIN HAI: "Audio Event Detection Based on Deep Neural Networks", 《CHINA MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY》 *
HUANG LIXIA ET AL: "Research on Noise Robustness of Speech Recognition Based on Deep Autoencoder Networks", 《COMPUTER ENGINEERING AND APPLICATIONS》 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036382A (en) * 2018-08-15 2018-12-18 武汉大学 A kind of audio feature extraction methods based on KL divergence
CN109065075A (en) * 2018-09-26 2018-12-21 广州势必可赢网络科技有限公司 A kind of method of speech processing, device, system and computer readable storage medium
CN109461458A (en) * 2018-10-26 2019-03-12 合肥工业大学 A kind of audio method for detecting abnormality based on generation confrontation network
CN109461458B (en) * 2018-10-26 2022-09-13 合肥工业大学 Audio anomaly detection method based on generation countermeasure network
CN111354373A (en) * 2018-12-21 2020-06-30 中国科学院声学研究所 Audio signal classification method based on neural network intermediate layer characteristic filtering
CN109584888A (en) * 2019-01-16 2019-04-05 上海大学 Whistle recognition methods based on machine learning
CN110223715A (en) * 2019-05-07 2019-09-10 华南理工大学 It is a kind of based on sound event detection old solitary people man in activity estimation method
CN110223715B (en) * 2019-05-07 2021-05-25 华南理工大学 Home activity estimation method for solitary old people based on sound event detection
CN110390952A (en) * 2019-06-21 2019-10-29 江南大学 City sound event classification method based on bicharacteristic 2-DenseNet parallel connection
CN110390952B (en) * 2019-06-21 2021-10-22 江南大学 City sound event classification method based on dual-feature 2-DenseNet parallel connection
CN112133084A (en) * 2019-06-25 2020-12-25 浙江吉智新能源汽车科技有限公司 Road information sharing method, device and system
CN112133084B (en) * 2019-06-25 2022-04-12 浙江吉智新能源汽车科技有限公司 Road information sharing method, device and system
CN110718234A (en) * 2019-09-02 2020-01-21 江苏师范大学 Acoustic scene classification method based on semantic segmentation coding and decoding network
CN110942766A (en) * 2019-11-29 2020-03-31 厦门快商通科技股份有限公司 Audio event detection method, system, mobile terminal and storage medium
CN111613240A (en) * 2020-05-22 2020-09-01 杭州电子科技大学 Camouflage voice detection method based on attention mechanism and Bi-LSTM
CN112986914A (en) * 2021-02-10 2021-06-18 中国兵器装备集团自动化研究所 Individual helmet and target sound source positioning and voiceprint recognition method thereof
CN113192322A (en) * 2021-03-19 2021-07-30 东北大学 Expressway traffic flow counting method based on cloud edge cooperation
CN113257283A (en) * 2021-03-29 2021-08-13 北京字节跳动网络技术有限公司 Audio signal processing method and device, electronic equipment and storage medium
CN113257283B (en) * 2021-03-29 2023-09-26 北京字节跳动网络技术有限公司 Audio signal processing method and device, electronic equipment and storage medium
CN113611288A (en) * 2021-08-06 2021-11-05 南京华捷艾米软件科技有限公司 Audio feature extraction method, device and system
CN113920473A (en) * 2021-10-15 2022-01-11 宿迁硅基智能科技有限公司 Complete event determination method, storage medium and electronic device
CN113920473B (en) * 2021-10-15 2022-07-29 宿迁硅基智能科技有限公司 Complete event determination method, storage medium and electronic device
CN114863950A (en) * 2022-07-07 2022-08-05 深圳神目信息技术有限公司 Baby crying detection and network establishment method and system based on anomaly detection
CN114927141A (en) * 2022-07-19 2022-08-19 中国人民解放军海军工程大学 Method and system for detecting abnormal underwater acoustic signals
CN115206294A (en) * 2022-09-16 2022-10-18 深圳比特微电子科技有限公司 Training method, sound event detection method, device, equipment and medium
CN116186524A (en) * 2023-05-04 2023-05-30 天津大学 Self-supervision machine abnormal sound detection method
CN116186524B (en) * 2023-05-04 2023-07-18 天津大学 Self-supervision machine abnormal sound detection method
CN117268796A (en) * 2023-11-16 2023-12-22 天津大学 Vehicle fault acoustic event detection method
CN117268796B (en) * 2023-11-16 2024-01-26 天津大学 Vehicle fault acoustic event detection method
CN118072766A (en) * 2024-04-24 2024-05-24 南京小草交通科技有限公司 Highway event perception system based on sound detects

Similar Documents

Publication Publication Date Title
CN108182949A (en) A kind of highway anomalous audio event category method based on depth conversion feature
CN110827837B (en) Whale activity audio classification method based on deep learning
DE112017001830B4 (en) VOICE ENHANCEMENT AND AUDIO EVENT DETECTION FOR A NON-STATIONARY NOISE ENVIRONMENT
CN110246510B (en) End-to-end voice enhancement method based on RefineNet
CN107393542A (en) A kind of birds species identification method based on binary channels neutral net
CN108922513A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN108630209B (en) Marine organism identification method based on feature fusion and deep confidence network
CN101366078A (en) Neural network classifier for separating audio sources from a monophonic audio signal
CN108711436A (en) Speaker verification's system Replay Attack detection method based on high frequency and bottleneck characteristic
CN111128209B (en) Speech enhancement method based on mixed masking learning target
CN113724712B (en) Bird sound identification method based on multi-feature fusion and combination model
CN111696580B (en) Voice detection method and device, electronic equipment and storage medium
CN109036470A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN108962229A (en) A kind of target speaker's voice extraction method based on single channel, unsupervised formula
CN108806725A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN114495973A (en) Special person voice separation method based on double-path self-attention mechanism
CN115081473A (en) Multi-feature fusion brake noise classification and identification method
CN110544482A (en) single-channel voice separation system
Nossier et al. Mapping and masking targets comparison using different deep learning based speech enhancement architectures
Shifas et al. A non-causal FFTNet architecture for speech enhancement
Wang et al. Low pass filtering and bandwidth extension for robust anti-spoofing countermeasure against codec variabilities
CN110390937A (en) A kind of across channel method for recognizing sound-groove based on ArcFace loss algorithm
CN104240717A (en) Voice enhancement method based on combination of sparse code and ideal binary system mask
EP0658874B1 (en) Process and circuit for producing from a speech signal with small bandwidth a speech signal with great bandwidth
CN113270110A (en) ZPW-2000A track circuit transmitter and receiver fault diagnosis method

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180619