CN108182949A - Highway abnormal audio event classification method based on deep transformation features - Google Patents
Highway abnormal audio event classification method based on deep transformation features
- Publication number
- CN108182949A (application CN201711305135.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- frequency
- audio
- depth
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses a highway abnormal audio event classification method based on deep transformation features. First, samples of highway abnormal audio events are collected and divided into a training set and a test set. Pre-emphasis, framing, and windowing are then applied to the training and test audio samples, and the two preceding and two following frames of each frame are taken to form context audio data blocks. Acoustic features are extracted from these data blocks and concatenated into feature vectors, which are fed into a deep autoencoder network to extract deep transformation features. These features are then input to a long short-term memory network classifier to recognize each class of abnormal audio event. Both the deep autoencoder feature extractor and the long short-term memory network classifier involve a training step and a testing step. The deep transformation feature used in the invention is a fusion and transformation of several conventional acoustic features; it is more discriminative and more robust, and yields better classification results for abnormal audio events in complex highway audio.
Description
Technical field
The present invention relates to the fields of audio signal processing and machine learning, and in particular to a highway abnormal audio event classification method based on deep transformation features.
Background technology

As living standards improve, the number of private cars has increased sharply, putting growing pressure on the safe and efficient operation of expressways. There is an urgent need for a method that can automatically distinguish normal events from abnormal events on the highway. Highway conditions are complex and many kinds of abnormal situations can occur; traditional methods based on video surveillance struggle to cope with such emergencies efficiently and comprehensively.

Traditional audio event classification methods mostly use a single acoustic feature such as Mel-frequency cepstral coefficients or perceptual linear prediction coefficients. Highway abnormal audio events vary greatly in loudness, with large intra-class differences and small inter-class differences, so a single traditional acoustic feature cannot effectively characterize the differences between audio events. The present invention therefore combines multiple acoustic features and uses a deep neural network to deeply fuse and transform them, aiming to aggregate the advantages of each acoustic feature, further mine their latent characteristics, and obtain deep transformation features that are more discriminative and more noise-robust.
Summary of the invention

The purpose of the present invention is to overcome the above drawbacks of the prior art by providing a highway abnormal audio event classification method based on deep transformation features, comprising the following steps. First, data are prepared: highway abnormal audio event samples are collected and divided into a training set and a test set. The samples are then pre-processed: pre-emphasis, framing, and windowing are applied to the training-set and test-set audio, and the two preceding and two following frames are taken to form context audio data blocks. Acoustic features are extracted from these data blocks, chiefly Mel filter bank features, Gabor filter bank features, and constant-Q cepstral coefficients; the three features are concatenated into an acoustic feature vector and fed into a deep autoencoder network to extract deep transformation features. These deep transformation features are then input to a long short-term memory network classifier to recognize each class of abnormal audio event. Both the deep autoencoder feature extractor and the long short-term memory network classifier involve a training step and a testing step. The deep transformation feature used in the invention is a fusion and transformation of several conventional acoustic features; it is more discriminative and more robust, and yields better classification results for abnormal audio events in complex highway audio.
The purpose of the present invention can be achieved by the following technical solution.

A highway abnormal audio event classification method based on deep transformation features comprises the following steps:

S1, data preparation: use a recording device to collect audio data containing abnormal audio events on the highway and annotate it manually, then divide the audio data into a training data set and a test data set;

S2, pre-processing: apply pre-emphasis, framing, and windowing to the training data and test data, and take the two preceding and two following frames of each frame to form context audio data blocks;

S3, acoustic feature extraction: extract acoustic features from the pre-processed audio data, including Mel filter bank features, Gabor filter bank features, and constant-Q cepstral coefficients, and concatenate the three features into one acoustic feature vector;

S4, deep transformation feature extraction: build a deep autoencoder network, feed the acoustic feature vectors into it, and determine the network parameters based on the minimum-error principle; the output of the output layer is a reconstruction of the acoustic feature vector presented to the input layer, and the output of the bottleneck layer is the deep transformation feature;

S5, abnormal audio event classification: feed the deep transformation features into the trained long short-term memory network classifier to obtain the classification results of the abnormal audio events.
Further, the data preparation in step S1 comprises the following steps:

S1.1, collect audio data with a recording device: the recording device is placed on the isolation barrier in the middle of the highway; the audio data are sampled at 16 kHz with 8-bit quantization;

S1.2, annotate the audio data: the audio data are annotated manually by at least three people; where annotations disagree, the final label is decided by majority vote;

S1.3, divide the audio data: the annotated audio data are randomly divided into a training set (80%) and a test set (20%).
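As a concrete illustration of the 80/20 split in step S1.3, the following sketch shuffles labelled clips and cuts the list. The function name and the fixed seed are illustrative assumptions added for reproducibility; they are not taken from the patent.

```python
import random

def split_dataset(samples, train_frac=0.8, seed=42):
    """Randomly split labelled audio clips into train and test sets (step S1.3).

    `samples` is any sequence of (clip, label) pairs; the seed is an
    illustrative assumption, used only to make the shuffle deterministic.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # deterministic random shuffle
    cut = int(len(items) * train_frac)  # 80% boundary
    return items[:cut], items[cut:]     # training set, test set
```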
Further, the pre-processing in step S2 comprises the following steps:

S2.1, pre-emphasis: filter the audio data with a filter whose system function is H(z):

H(z) = 1 − μz^(−1),

where μ is a constant with value 0.98;

S2.2, framing: divide the pre-emphasized audio data into frames, with a frame length of 25 milliseconds and a frame shift of 10 milliseconds;

S2.3, windowing: multiply each frame of audio data by the window function ω(n), a Hamming window:

ω(n) = 0.54 − 0.46 cos(2πn/(N−1)), 0 ≤ n ≤ N−1,

where N is the frame length in samples, N = 25 ms × 16 kHz = 400;

S2.4, construct context audio data blocks: take the two frames before and the two frames after each audio frame as its context, forming audio data blocks of 5 frames.
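The pre-processing chain of steps S2.1 through S2.4 can be sketched in a few lines of numpy. This is a minimal sketch: the function name and edge-padding of the context frames at the clip boundaries are implementation assumptions not specified by the patent.

```python
import numpy as np

def preprocess(signal, fs=16000, mu=0.98, frame_ms=25, hop_ms=10, context=2):
    """Pre-emphasis, framing, Hamming windowing, and context stacking
    (steps S2.1-S2.4). Boundary handling is an assumption."""
    # S2.1 pre-emphasis: y[n] = x[n] - mu * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - mu * signal[:-1])

    # S2.2 framing: 25 ms frames with a 10 ms shift
    frame_len = int(fs * frame_ms / 1000)   # 400 samples at 16 kHz
    hop = int(fs * hop_ms / 1000)           # 160 samples
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])

    # S2.3 windowing with a Hamming window
    frames = frames * np.hamming(frame_len)

    # S2.4 context blocks: each frame with its 2 preceding and 2 following
    # frames (edges replicated at clip boundaries — an assumption)
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    blocks = np.stack([padded[i:i + 2 * context + 1]
                       for i in range(n_frames)])
    return blocks  # shape: (n_frames, 5, frame_len)
```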
Further, the acoustic feature extraction in step S3 comprises the following steps:

S3.1, Mel filter bank feature extraction, as follows:

S3.1.1, apply the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X_t(k):

X_t(k) = Σ_{n=0}^{N−1} x_t(n) e^(−j2πnk/N), 0 ≤ k ≤ N−1;

S3.1.2, filter the linear spectrum X_t(k) with a Mel-frequency filter bank to obtain the Mel spectrum. The Mel-frequency filter bank consists of M band-pass filters H_m(k), 0 ≤ m < M, each with a triangular frequency response centred at f(m); the spacing between adjacent centre frequencies f(m) is small when m is small and widens as m increases. The transfer function of each band-pass filter is:

H_m(k) = (k − f(m−1)) / (f(m) − f(m−1)), for f(m−1) ≤ k ≤ f(m),
H_m(k) = (f(m+1) − k) / (f(m+1) − f(m)), for f(m) ≤ k ≤ f(m+1),
H_m(k) = 0, otherwise,

where 0 ≤ m < M and f(m) is defined as:

f(m) = (N/f_s) · B⁻¹( B(f_l) + m · (B(f_h) − B(f_l)) / (M+1) ),

where f_l and f_h are the lowest and highest filter frequencies, f_s is the sampling frequency, B(f) = 1125 ln(1 + f/700) maps frequency to the Mel scale, and B⁻¹ is the inverse function of B:

B⁻¹(b) = 700 (e^(b/1125) − 1).

The Mel filter bank feature F(p) of the p-th frame audio signal is:

F(p) = X_t(p) H_m(p), 0 ≤ m < M;
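The Mel filter bank construction above can be sketched as follows. The filter count of 23 follows the Gabor section of this patent; the frequency range and the use of the magnitude spectrum are assumptions, since the patent does not fix them.

```python
import numpy as np

def mel_filter_bank_features(frame, fs=16000, n_filters=23, f_low=0.0,
                             f_high=8000.0):
    """Mel filter bank feature of one windowed frame (steps S3.1.1-S3.1.2).

    Uses B(f) = 1125 ln(1 + f/700) and its inverse from the patent;
    f_low/f_high and the magnitude (not power) spectrum are assumptions.
    """
    N = len(frame)
    spectrum = np.abs(np.fft.rfft(frame))                 # |X_t(k)|

    B = lambda f: 1125.0 * np.log(1.0 + f / 700.0)        # Hz -> mel
    B_inv = lambda b: 700.0 * (np.exp(b / 1125.0) - 1.0)  # mel -> Hz

    # M+2 equally spaced points on the mel scale -> centre frequencies f(m)
    mel_points = np.linspace(B(f_low), B(f_high), n_filters + 2)
    bins = np.floor((N + 1) * B_inv(mel_points) / fs).astype(int)

    # triangular responses rising to f(m) and falling to f(m+1)
    fbank = np.zeros((n_filters, len(spectrum)))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)

    return fbank @ spectrum    # one filter-bank energy per filter
```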
S3.2, Gabor filter bank feature extraction, as follows:

S3.2.1, Gabor filters: the Gabor filter bank consists of a set of two-dimensional Gabor filters. Each filter is the product of a complex sinusoidal carrier function

s_ω(x) = exp(jωx)

along the spectral and temporal axes with a Hann envelope

h_b(x) = 0.5 − 0.5 cos(2πx/(b+1)) for −b/2 ≤ x ≤ b/2, and 0 otherwise,

so that a filter centred at (k_0, n_0) is

G(k, n) = s_{ω_k}(k − k_0) · s_{ω_n}(n − n_0) · h_{b_k}(k − k_0) · h_{b_n}(n − n_0) · e^(jφ),

where k is the frequency index, n is the frame index, k_0 is the carrier (centre) frequency, n_0 is the centre of the time frame, ω_k is the spectral modulation frequency, ω_n is the temporal modulation frequency, v_k and v_n are the numbers of half-periods of carrier oscillation under the envelope in the frequency and time dimensions (fixing the envelope widths b_k and b_n), φ is an additive global phase, and b is the frequency bandwidth;

S3.2.2, Gabor filter bank feature extraction, as follows:

S3.2.2.1, Mel spectrum transformation: apply the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X(k), then transform X(k) into the log-magnitude Mel spectrum X_m(m):

X_m(m) = log( Σ_{k=0}^{N−1} |X(k)| F(k, m) ), 0 ≤ m < M,

where N is the frame length, F(k, m) is the k-th component of the m-th Mel filter, and M is the number of Mel filters;

S3.2.2.2, filter with the Gabor filters: feed the log Mel spectral coefficients X_m(k) into the two-dimensional Gabor filters and take the real part of the filter output to obtain the Gabor filter bank feature Gabor(p) of the p-th frame:

Gabor(p) = Re( (X_m ∗ G)(p) ),

where Re(·) takes the real part, X_m(·) is the log Mel spectral coefficient, G(·) is the Gabor filter function, and ∗ denotes two-dimensional convolution;

S3.2.2.3, applying the Gabor filter bank to each Mel filter channel yields a high-dimensional representation: with 23 Mel filters and 41 Gabor filters, the Gabor filter output has 23 × 41 = 943 dimensions; subsampling this output yields the 311-dimensional Gabor filter bank feature.
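The 2-D Gabor filtering of step S3.2.2.2 can be illustrated with a small numpy sketch. Here a Gaussian envelope replaces the Hann envelope for brevity, and the filter sizes, modulation frequencies, and valid-mode correlation are all illustrative assumptions, not the patent's parameter choices.

```python
import numpy as np

def gabor_2d(omega_k, omega_n, size_k=9, size_n=9, sigma_k=2.0, sigma_n=2.0):
    """One 2-D Gabor filter: an envelope times the complex carrier
    exp(j(omega_k*k + omega_n*n)), as in s_omega(x) = exp(j*omega*x).
    A Gaussian envelope and these sizes are simplifying assumptions."""
    k = np.arange(size_k) - size_k // 2
    n = np.arange(size_n) - size_n // 2
    K, Nn = np.meshgrid(k, n, indexing="ij")
    envelope = np.exp(-(K**2) / (2 * sigma_k**2) - (Nn**2) / (2 * sigma_n**2))
    carrier = np.exp(1j * (omega_k * K + omega_n * Nn))
    return envelope * carrier

def gabor_feature(log_mel, omega_k=0.5, omega_n=0.25):
    """Filter a log-Mel spectrogram (freq x time) with one Gabor filter
    and keep the real part, per step S3.2.2.2 (valid-mode correlation)."""
    g = gabor_2d(omega_k, omega_n)
    fk, fn = g.shape
    F, T = log_mel.shape
    out = np.zeros((F - fk + 1, T - fn + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = log_mel[i:i + fk, j:j + fn]
            out[i, j] = np.real(np.sum(patch * np.conj(g)))
    return out
```

In the full feature extractor, one such filter would be built for each of the 41 (omega_k, omega_n) pairs and applied across all 23 Mel channels before subsampling.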
S3.3, constant-Q cepstral coefficient (CQCC) feature extraction, as follows:

S3.3.1, apply the constant-Q transform to the t-th frame audio signal x_t(n) to obtain the constant-Q spectrum:

X^CQ(k) = Σ_{n=0}^{N_k−1} x_t(n) a_k*(n), k = 1, 2, …, K,

where k = 1, 2, …, K is the frequency index, a_k* is the complex conjugate of the basis function a_k, N_k is the length of the window function, and ⌊·⌋ denotes rounding down. The basis functions are

a_k(n) = (1/N_k) ω(n/N_k) exp( j(2πn f_k / f_s + Φ_k) ),

where f_s is the sampling frequency, f_k = f_1 · 2^((k−1)/B) is the frequency at index k, Φ_k is the phase shift, ω(·) is the window function, f_1 is the lowest frequency, and B is the bandwidth (number of bins per octave);

S3.3.2, compute the energy spectrum |X^CQ(k)|² of the constant-Q spectrum, take its logarithm to obtain the log energy spectrum log|X^CQ(k)|², and then apply the discrete cosine transform to obtain the constant-Q cepstral coefficients CQCC(p) of the p-th frame audio signal:

CQCC(p) = Σ_{k=1}^{L} log|X^CQ(k)|² · cos( p(k − 1/2)π / L ),

where L is the maximum discrete frequency index and X^CQ(k) is the constant-Q spectrum;
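Given an already-computed constant-Q magnitude spectrum, the cepstral step S3.3.2 is a log followed by a DCT-II, which can be written out directly from the formula above. The coefficient count and the small floor constant guarding the log are assumptions.

```python
import numpy as np

def cqcc_from_cq_spectrum(X_cq, n_coeffs=20):
    """CQCC of one frame from a constant-Q magnitude spectrum (step S3.3.2):
    log energy spectrum, then a DCT-II written out from its definition.
    n_coeffs and the 1e-10 floor are illustrative assumptions."""
    log_energy = np.log(np.abs(X_cq) ** 2 + 1e-10)   # log|X_CQ(k)|^2
    L = len(log_energy)
    p = np.arange(n_coeffs)[:, None]
    k = np.arange(L)[None, :]
    # DCT-II basis: cos(pi * p * (2k + 1) / (2L)), matching cos(p(k-1/2)pi/L)
    basis = np.cos(np.pi * p * (2 * k + 1) / (2 * L))
    return basis @ log_energy
```

A flat spectrum yields (near-)zero higher coefficients, since the DCT of a constant concentrates all energy in the zeroth coefficient.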
S3.4, feature concatenation: concatenate the Mel filter bank feature, the Gabor filter bank feature, and the constant-Q cepstral coefficient feature into one acoustic feature vector: V = [F(p), Gabor(p), CQCC(p)].
Further, the deep transformation feature extraction in step S4 comprises the following steps:

S4.1, build the sub-networks of the deep autoencoder network: the deep autoencoder network consists of two parts, an encoding sub-network and a decoding sub-network; the layer where the two sub-networks meet is the bottleneck layer, and the output of the bottleneck layer is the deep transformation feature.

For a single-layer encoding sub-network:

Y_EO = f(W_in v + b_in),

where v is the input acoustic feature vector, Y_EO is the output of the encoding sub-network, W_in is the encoding sub-network weight matrix, b_in is the encoding sub-network bias vector, and f(·) is the activation function, chosen as the ReLU function:

f(x_in) = max(0, x_in),

where x_in is the input of the activation function.

For a single-layer decoding sub-network:

y = f(W_out Y_EO + b_out),

where Y_EO is the input of the decoding sub-network, W_out is the decoding sub-network weight matrix, b_out is the decoding sub-network bias vector, f(·) is again the ReLU activation function, and y is the output of the whole network.

Define the loss function:

MSE = (1/d) Σ_{i=1}^{d} (v_i − y_i)²,

where v is the acoustic feature vector, y is the output of the whole network, and d is their dimensionality.

S4.2, train the deep autoencoder network: the training objective is to make the loss function MSE as small as possible, yielding the network weight matrices and bias vectors; the extracted acoustic feature vectors are then fed into the trained deep autoencoder network to obtain the deep transformation features.
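The single-layer encoder/decoder pair and the MSE training objective of steps S4.1-S4.2 can be sketched with plain gradient descent in numpy. Layer sizes, learning rate, initialization (including a small positive decoder bias to keep the output ReLU units initially active), and the training loop are all illustrative assumptions; the patent does not fix them.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class BottleneckAutoencoder:
    """Single-layer encoder/decoder autoencoder with ReLU activations,
    trained by gradient descent on the MSE loss (steps S4.1-S4.2).
    All hyperparameters here are illustrative assumptions."""

    def __init__(self, n_in, n_bottleneck, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_bottleneck, n_in))
        self.b_in = np.zeros(n_bottleneck)
        self.W_out = rng.normal(0.0, 0.1, (n_in, n_bottleneck))
        # small positive bias keeps decoder ReLU units initially active
        self.b_out = np.full(n_in, 0.1)
        self.lr = lr

    def transform(self, v):
        """Bottleneck output Y_EO = f(W_in v + b_in): the deep
        transformation feature."""
        return relu(self.W_in @ v + self.b_in)

    def step(self, v):
        """One gradient-descent step on MSE = mean((v - y)^2)."""
        h = self.transform(v)
        y = relu(self.W_out @ h + self.b_out)   # reconstruction of v
        err = y - v                             # gradient of MSE w.r.t. y,
                                                # up to a constant factor
        dy = err * (y > 0)                      # ReLU mask, decoder layer
        dh = (self.W_out.T @ dy) * (h > 0)      # ReLU mask, encoder layer
        self.W_out -= self.lr * np.outer(dy, h)
        self.b_out -= self.lr * dy
        self.W_in -= self.lr * np.outer(dh, v)
        self.b_in -= self.lr * dh
        return float(np.mean(err ** 2))
```

After training, only `transform` is used: its output is the deep transformation feature passed on to the classifier in step S5.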
Further, the abnormal audio event classification in step S5 comprises the following steps:

S5.1, build and train the long short-term memory network. Define the network loss function:

ψ = − Σ_{k=1}^{K} z_k log y_k,

where K is the number of audio event classes, z_k is the label value of the k-th audio event class, and y_k is the output probability of the k-th audio event class; the network training objective is to minimize the loss function ψ.

S5.2, output the classification results: after the long short-term memory network classifier has been trained, feed the deep transformation features of the test-set samples into it to obtain the output probability of each audio event class; the class with the largest output probability is the decision result.
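The LSTM internals are omitted here; this sketch shows only the loss and decision rule of steps S5.1-S5.2, assuming the network's final layer produces logits passed through a softmax (an assumption — the patent only specifies the cross-entropy loss and the argmax decision).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax turning logits into class probabilities."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(z, y):
    """Network loss psi = -sum_k z_k * log(y_k) from step S5.1;
    z is the one-hot label vector, y the output probabilities."""
    return -np.sum(z * np.log(y + 1e-12))

def classify(logits):
    """Step S5.2: the class with the largest output probability wins."""
    probs = softmax(logits)
    return int(np.argmax(probs)), probs
```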
Typical highway abnormal events (collisions, emergency braking, tyre skidding, rollovers, tyre blowouts) produce various sharp, piercing sounds. Exploiting this characteristic, the present invention classifies and identifies these abnormal events on the highway through audio feature extraction and transformation, effectively compensating for the shortcomings of current video-surveillance-based methods.

Compared with the prior art, the present invention has the following advantages and effects:

1. A long short-term memory network is applied to highway abnormal audio event classification, achieving better results than traditional classifiers such as support vector machines and K-nearest neighbours.

2. Instead of a traditional single acoustic feature such as Mel-frequency cepstral coefficients or perceptual linear prediction coefficients, a combination of Mel filter bank, Gabor filter bank, and constant-Q cepstral coefficient features is used, and a deep autoencoder network fuses and transforms this combined feature, producing a deep transformation feature that more effectively characterizes the time-frequency differences between abnormal audio events and gives better classification results.
Description of the drawings
Fig. 1 is the flow chart of the highway abnormal audio event classification method based on deep transformation features disclosed in the present invention.
Specific embodiment

To make the purpose, technical solution, and advantages of the embodiments of the present invention clearer, the technical solution in the embodiments is described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment

Fig. 1 is the block diagram of one embodiment of the highway abnormal audio event classification method based on deep transformation features. It mainly comprises the following process:

S1, data preparation: use a recording device to collect audio data containing abnormal audio events on the highway and annotate it manually, then divide the audio data into a training data set and a test data set. The specific steps are:

S1.1, collect audio data with a recording device: the recording device is placed on the isolation barrier in the middle of the highway; the audio data are sampled at 16 kHz with 8-bit quantization;

S1.2, annotate the audio data: the audio data are annotated manually by at least three people; where annotations disagree, the final label is decided by majority vote;

S1.3, divide the audio data: the annotated audio data are randomly divided into a training set (80%) and a test set (20%).
S2, pre-processing: apply pre-emphasis, framing, and windowing to the training data and test data, and take the two preceding and two following frames as context. The specific steps are:

S2.1, pre-emphasis: filter the audio data with a filter whose system function is H(z):

H(z) = 1 − μz^(−1),

where μ is a constant with value 0.98;

S2.2, framing: divide the pre-emphasized audio data into frames, with a frame length of 25 milliseconds and a frame shift of 10 milliseconds;

S2.3, windowing: multiply each frame of audio data by the window function, a Hamming window ω(n):

ω(n) = 0.54 − 0.46 cos(2πn/(N−1)), 0 ≤ n ≤ N−1,

where N is the frame length in samples, N = 25 ms × 16 kHz = 400;

S2.4, construct context audio data blocks: take the two frames before and the two frames after each audio frame as its context, forming audio data blocks of 5 frames.
S3, acoustic feature extraction: extract acoustic features from the pre-processed data, chiefly Mel filter bank features, Gabor filter bank features, and constant-Q cepstral coefficients, and concatenate the three into an acoustic feature vector. The specific steps are:

S3.1, Mel filter bank feature extraction, as follows:

S3.1.1, apply the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X_t(k):

X_t(k) = Σ_{n=0}^{N−1} x_t(n) e^(−j2πnk/N), 0 ≤ k ≤ N−1;

S3.1.2, filter the linear spectrum X_t(k) with the Mel-frequency filter bank to obtain the Mel spectrum, where the Mel-frequency filter bank consists of M band-pass filters H_m(k), 0 ≤ m < M, each with a triangular frequency response centred at f(m); the spacing between adjacent centre frequencies f(m) is small when m is small and widens as m increases. The transfer function of each band-pass filter is:

H_m(k) = (k − f(m−1)) / (f(m) − f(m−1)), for f(m−1) ≤ k ≤ f(m),
H_m(k) = (f(m+1) − k) / (f(m+1) − f(m)), for f(m) ≤ k ≤ f(m+1),
H_m(k) = 0, otherwise,

where 0 ≤ m < M and f(m) is defined as:

f(m) = (N/f_s) · B⁻¹( B(f_l) + m · (B(f_h) − B(f_l)) / (M+1) ),

where f_l and f_h are the lowest and highest filter frequencies, B(f) = 1125 ln(1 + f/700), and B⁻¹ is the inverse function of B:

B⁻¹(b) = 700 (e^(b/1125) − 1).

The Mel filter bank feature F(p) of the p-th frame audio signal is:

F(p) = X_t(p) H_m(p), 0 ≤ m < M;
S3.2, Gabor filter bank feature extraction, as follows:

S3.2.1, Gabor filters: the Gabor filter bank consists of a set of two-dimensional Gabor filters, each the product of a complex sinusoidal carrier

s_ω(x) = exp(jωx)

along the spectral and temporal axes with a Hann envelope, where k is the frequency index, n is the frame index, k_0 is the carrier (centre) frequency, n_0 is the centre of the time frame, ω_k is the spectral modulation frequency, ω_n is the temporal modulation frequency, v_k and v_n are the numbers of half-periods of carrier oscillation under the envelope in the frequency and time dimensions, φ is an additive global phase, and b is the frequency bandwidth;

S3.2.2, Gabor filter bank feature extraction, as follows:

S3.2.2.1, Mel spectrum transformation: apply the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X(k), then transform X(k) into the log-magnitude Mel spectrum X_m(m):

X_m(m) = log( Σ_{k=0}^{N−1} |X(k)| F(k, m) ), 0 ≤ m < M,

where N is the frame length, F(k, m) is the k-th component of the m-th Mel filter, and M is the number of Mel filters;

S3.2.2.2, filter with the Gabor filters: feed the log Mel spectral coefficients X_m(k) into the two-dimensional Gabor filters and take the real part of the filter output to obtain the Gabor filter bank feature of the p-th frame:

Gabor(p) = Re( (X_m ∗ G)(p) ),

where Re(·) takes the real part, X_m(·) is the log Mel spectral coefficient, G(·) is the Gabor filter function, and ∗ denotes two-dimensional convolution;

S3.2.2.3, applying the Gabor filter bank to each Mel filter channel yields a high-dimensional representation: with 23 Mel filters and 41 Gabor filters, the Gabor filter output has 23 × 41 = 943 dimensions; subsampling this output yields the 311-dimensional Gabor filter bank feature.
S3.3, constant-Q cepstral coefficient feature extraction, as follows:

S3.3.1, apply the constant-Q transform to the t-th frame audio signal x_t(n) to obtain the constant-Q spectrum:

X^CQ(k) = Σ_{n=0}^{N_k−1} x_t(n) a_k*(n), k = 1, 2, …, K,

where k = 1, 2, …, K is the frequency index, a_k* is the complex conjugate of the basis function a_k, N_k is the length of the window function, and ⌊·⌋ denotes rounding down. The basis functions are

a_k(n) = (1/N_k) ω(n/N_k) exp( j(2πn f_k / f_s + Φ_k) ),

where f_s is the sampling frequency, f_k = f_1 · 2^((k−1)/B) is the frequency at index k, Φ_k is the phase shift, ω(·) is the window function, f_1 is the lowest frequency, and B is the bandwidth (number of bins per octave);

S3.3.2, compute the energy spectrum |X^CQ(k)|² of the constant-Q spectrum, take its logarithm to obtain the log energy spectrum log|X^CQ(k)|², and then apply the discrete cosine transform to obtain the constant-Q cepstral coefficients CQCC(p) of the p-th frame audio signal:

CQCC(p) = Σ_{k=1}^{L} log|X^CQ(k)|² · cos( p(k − 1/2)π / L ),

where L is the maximum discrete frequency index and X^CQ(k) is the constant-Q spectrum.

S3.4, feature concatenation: concatenate the Mel filter bank feature, the Gabor filter bank feature, and the constant-Q cepstral coefficient feature into one acoustic feature vector: V = [F(p), Gabor(p), CQCC(p)].
S4, deep transformation feature extraction: build a deep autoencoder network and feed the acoustic feature vectors into it; the output of the network is a reconstruction of the input acoustic feature vector, the network parameters are determined based on the minimum-error principle, and the output of the bottleneck layer is the deep transformation feature. The specific steps are:

S4.1, build the sub-networks of the deep autoencoder network: the deep autoencoder network consists of two parts, an encoding sub-network and a decoding sub-network; the layer where the two sub-networks meet is the bottleneck layer, and the output of the bottleneck layer is the deep transformation feature.

For a single-layer encoding sub-network:

Y_EO = f(W_in v + b_in),

where v is the input acoustic feature vector, Y_EO is the output of the encoding sub-network, W_in is the encoding sub-network weight matrix, b_in is the encoding sub-network bias vector, and f(·) is the activation function, commonly chosen as the ReLU function:

f(x_in) = max(0, x_in),

where x_in is the input of the activation function.

For a single-layer decoding sub-network:

y = f(W_out Y_EO + b_out),

where Y_EO is the input of the decoding sub-network, W_out is the decoding sub-network weight matrix, b_out is the decoding sub-network bias vector, f(·) is again the ReLU activation function, and y is the output of the whole network.

Define the loss function:

MSE = (1/d) Σ_{i=1}^{d} (v_i − y_i)²,

where v is the acoustic feature vector, y is the output of the whole network, and d is their dimensionality.

S4.2, train the deep autoencoder network: the training objective is to make the loss function MSE as small as possible, yielding the network weight matrices and bias vectors; the extracted acoustic feature vectors are then fed into the trained deep autoencoder network to obtain the deep transformation features.
S5, abnormal audio event classification: feed the deep transformation features into the trained long short-term memory network classifier to obtain the classification result of each audio event. The specific steps are:

S5.1, build and train the long short-term memory network: to obtain better classification results, the long short-term memory network is built with 400 nodes, a learning rate of 0.001, 3000 training iterations, and an unrolling depth of 10 steps; the network is trained with the back-propagation-through-time algorithm.

Define the network loss function:

ψ = − Σ_{k=1}^{K} z_k log y_k,

where K is the number of audio event classes, z_k is the label value of the k-th audio event class, and y_k is the output probability of the k-th audio event class; the network training objective is to minimize the loss function ψ.

S5.2, output the classification results: after the long short-term memory network classifier has been trained, feed the deep transformation features of the test-set samples into it to obtain the output probability of each audio event class; the class with the largest output probability is the decision result.
In conclusion, the highway abnormal audio event classification method disclosed in this embodiment first prepares the data: highway abnormal audio event samples are collected and then divided into a training set and a test set. The samples are then pre-processed: pre-emphasis, framing, and windowing are applied to the training-set and test-set audio event samples, and the two preceding and two following frames are taken to form context audio data blocks. Acoustic features are extracted from these data blocks, chiefly Mel filter bank (MFB), Gabor filter bank (GFB), and constant-Q cepstral coefficient (CQCC) features, and the three are concatenated into a feature vector. The feature vector is fed into a deep autoencoder network to extract the deep transformation feature, which is then input to a long short-term memory network (LSTMN) classifier to recognize each class of abnormal audio event. Both the deep autoencoder feature extractor and the long short-term memory network classifier involve a training step (with the training data set as input) and a testing step (with the test data set as input). The deep transformation feature used in the invention is a fusion and transformation of several conventional acoustic features; it is more discriminative and more robust, and yields better classification results for abnormal audio events in complex highway audio.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it; any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be deemed an equivalent replacement and shall fall within the protection scope of the present invention.
Claims (6)
- 1. A highway anomalous audio event classification method based on deep transformation features, characterized in that the method comprises the following steps:
S1, data preparation: collecting, with a recording device installed on a highway, audio data containing anomalous audio events, manually annotating the data, and then dividing the audio data into a training data set and a test data set;
S2, preprocessing: applying pre-emphasis, framing, and windowing to the training data and test data respectively, and taking the 2 frames before and after each frame to form a context audio data block;
S3, acoustic feature extraction: extracting acoustic features from the preprocessed audio data, including the Mel filter bank, the Gabor filter bank, and constant-Q cepstral coefficients, and concatenating the three features into one acoustic feature vector;
S4, deep transformation feature extraction: building a deep autoencoder network, feeding the acoustic feature vector into the network, and determining the network parameters on the minimum-error principle, so that the output of the output layer is a reconstruction of the acoustic feature vector presented at the input layer; the output of the bottleneck layer of the deep autoencoder network is the deep transformation feature;
S5, anomalous audio event classification: feeding the deep transformation feature into a trained long short-term memory network classifier to obtain the classification result of the anomalous audio event.
- 2. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the data preparation in step S1 comprises the following steps:
S1.1, acquiring audio data with a recording device: the recording device is placed on an isolation column in the middle of the highway; the sampling frequency of the audio data is 16 kHz and the quantization depth is 8 bits;
S1.2, annotating the audio data: three or more people manually annotate the audio data, and where annotations disagree, the final annotation is decided by majority vote;
S1.3, dividing the audio data: the annotated audio data is randomly divided into a training set and a test set, with the training set accounting for 80% and the test set for 20%.
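A minimal sketch of steps S1.2 and S1.3, using only the Python standard library (the helper names are hypothetical, not from the patent): majority-vote annotation and the random 80/20 split could look like:

```python
import random

def majority_label(labels):
    # S1.2: with three or more annotators, disagreements are settled by majority vote
    return max(set(labels), key=labels.count)

def split_dataset(samples, train_frac=0.8, seed=0):
    # S1.3: random 80% / 20% division into training and test sets
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]

label = majority_label(["crash", "crash", "horn"])   # -> "crash"
train, test = split_dataset(range(100))              # 80 and 20 samples
```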
- 3. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the preprocessing in step S2 comprises the following steps:
S2.1, pre-emphasis: the audio data is filtered with a filter whose system function is H(z), where H(z) = 1 - μz⁻¹ and μ is a constant with value 0.98;
S2.2, framing: the pre-emphasized audio data is divided into frames, with a frame length of 25 milliseconds and a frame shift of 10 milliseconds;
S2.3, windowing: the framed audio data is multiplied by a window function ω(n), where ω(n) is the Hamming window
ω(n) = 0.54 - 0.46 cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1,
where N denotes the frame length in sampling points, N = 25 ms × 16 kHz = 400;
S2.4, constructing context audio data blocks: the 2 frames before and after each audio frame are chosen as context, forming an audio frame data block of 5 frames.
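The preprocessing chain of claim 3 can be sketched in NumPy as follows (the one-second dummy signal and the edge-padding policy for the context frames are assumptions; the claim itself does not fix how boundary frames are handled):

```python
import numpy as np

FS = 16000                      # sampling frequency (claim 2: 16 kHz)
FRAME_LEN = int(0.025 * FS)     # 25 ms -> N = 400 samples
FRAME_SHIFT = int(0.010 * FS)   # 10 ms -> 160 samples
MU = 0.98                       # pre-emphasis constant

def preemphasis(x, mu=MU):
    # difference equation of H(z) = 1 - mu * z^-1
    return np.append(x[0], x[1:] - mu * x[:-1])

def frame_and_window(x):
    n_frames = 1 + (len(x) - FRAME_LEN) // FRAME_SHIFT
    win = np.hamming(FRAME_LEN)  # Hamming window omega(n)
    idx = np.arange(n_frames)[:, None] * FRAME_SHIFT + np.arange(FRAME_LEN)
    return x[idx] * win

def context_blocks(frames, c=2):
    # S2.4: 2 frames before and after each frame -> 5-frame data blocks
    padded = np.pad(frames, ((c, c), (0, 0)), mode="edge")
    return np.stack([padded[i:i + 2 * c + 1] for i in range(len(frames))])

x = np.random.randn(FS)                       # one second of dummy audio
blocks = context_blocks(frame_and_window(preemphasis(x)))
print(blocks.shape)                           # (98, 5, 400)
```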
- 4. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the acoustic feature extraction in step S3 comprises the following steps:
S3.1, Mel filter bank feature extraction, as follows:
S3.1.1, applying the discrete Fourier transform to the t-th frame audio signal x_t(n) to obtain the linear spectrum X_t(k):
X_t(k) = Σ_{n=0}^{N-1} x_t(n) e^{-j2πnk/N}, 0 ≤ k < N;
S3.1.2, filtering the linear spectrum X_t(k) with a Mel-frequency filter bank to obtain the Mel spectrum; the Mel-frequency filter bank consists of several band-pass filters H_m(k), 0 ≤ m < M, where M is the number of filters; each filter has a triangular filtering characteristic with centre frequency f(m); the spacing between adjacent centre frequencies f(m) is small for small m and grows as m increases; the transfer function of each band-pass filter is
H_m(k) = 0 for k < f(m-1),
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)) for f(m-1) ≤ k ≤ f(m),
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)) for f(m) < k ≤ f(m+1),
H_m(k) = 0 for k > f(m+1),
where 0 ≤ m < M, and f(m) is defined as
f(m) = (N / f_s) B⁻¹( B(f_l) + m (B(f_h) - B(f_l)) / (M + 1) ),
where f_l and f_h are the lowest and highest frequencies of the filter bank and B⁻¹ is the inverse function of the Mel-scale function B:
B⁻¹(b) = 700(e^{b/1125} - 1);
the Mel filter bank feature F(p) of the p-th frame audio signal is
F(p) = X_t(p) H_m(p), 0 ≤ m < M;
S3.2, Gabor filter bank feature extraction, as follows:
S3.2.1, Gabor filters: the Gabor filter bank is made up of a group of two-dimensional Gabor filters, each defined as the product of a complex sinusoidal carrier s_ω(x) = exp(jωx) and an envelope function, where k denotes the frequency index, n the frame index, k_0 the carrier frequency, n_0 the centre of the time frame, ω_k the spectral modulation frequency, ω_n the temporal modulation frequency, v_k and v_n the numbers of oscillation half-periods of the carrier in the frequency and time dimensions respectively, φ an additive global phase, and b the frequency bandwidth;
S3.2.2, Gabor filter bank feature extraction, as follows:
S3.2.2.1, Mel spectrum transformation: the discrete Fourier transform is applied to the t-th frame audio signal x_t(n) to obtain the linear spectrum X(k), which is then transformed into the log-magnitude Mel spectrum X_m(k), where N is the frame length, F(k, m) denotes the k-th component of the m-th order Mel filter bank, and M is the number of Mel filters;
S3.2.2.2, filtering with the Gabor filters: the logarithmic Mel spectral coefficients X_m(k) are fed into the two-dimensional Gabor filters and the real part of the filter output is taken, giving the Gabor filter bank feature Gabor(p) of the p-th frame:
Gabor(p) = Re( X_m(p) * G(p) ),
where Re(·) takes the real part, X_m(·) denotes the logarithmic Mel spectral coefficients, and G(·) denotes the Gabor filter function;
S3.2.2.3, applying the Gabor filter bank to each Mel filter yields a high-dimensional feature representation; with 23 Mel filters and 41 Gabor filters, the Gabor filter output has 23 × 41 = 943 dimensions, and sub-sampling the Gabor filter output yields a 311-dimensional Gabor filter bank feature;
S3.3, constant-Q cepstral coefficient feature extraction, as follows:
S3.3.1, applying the constant-Q transform to the t-th frame audio signal x_t(n) to obtain the constant-Q spectrum
X^{CQ}(k) = Σ_{n=1}^{N_k} x_t(n) a_k*(n),
where k = 1, 2, ..., K denotes the frequency index, a_k* is the conjugate of a_k, N_k denotes the window function length, and ⌊·⌋ denotes rounding down, with
a_k(n) = (1 / N_k) ω(n / N_k) exp( j(2πn f_k / f_s + φ) ), f_k = f_1 · 2^{(k-1)/B},
where f_s denotes the sampling frequency, f_k the frequency at index k, φ the phase shift, ω(·) the window function, f_1 the lowest frequency, and B the number of frequency bins per octave;
S3.3.2, computing the energy spectrum |X^{CQ}(k)|² of the constant-Q spectrum, taking its logarithm to obtain the log-energy spectrum log|X^{CQ}(k)|², and then applying the discrete cosine transform to obtain the constant-Q cepstral coefficients CQCC(p) of the p-th frame audio signal:
CQCC(p) = Σ_{k=1}^{L} log|X^{CQ}(k)|² cos( p(k - 1/2)π / L ),
where L is the maximum discrete frequency and X^{CQ}(k) is the constant-Q spectrum;
S3.4, feature concatenation: the Mel filter bank feature, the Gabor filter bank feature, and the constant-Q cepstral coefficients are concatenated into one acoustic feature vector:
V = [F(p), Gabor(p), CQCC(p)].
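A sketch of the Mel filter bank of step S3.1 in NumPy (the 512-point FFT and the 0–8000 Hz range are assumptions not stated in the claim), built on the Mel scale B(f) whose inverse B⁻¹(b) = 700(e^{b/1125} − 1) the claim gives explicitly:

```python
import numpy as np

def mel(f):          # Mel-scale function B(f)
    return 1125.0 * np.log(1.0 + f / 700.0)

def mel_inv(b):      # its inverse B^-1(b) = 700(e^{b/1125} - 1)
    return 700.0 * (np.exp(b / 1125.0) - 1.0)

def mel_filterbank(M=23, n_fft=512, fs=16000, fl=0.0, fh=8000.0):
    # centre frequencies f(m) are spaced uniformly on the Mel scale:
    # narrow spacing at low m, wider spacing as m grows
    pts = mel_inv(np.linspace(mel(fl), mel(fh), M + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    H = np.zeros((M, n_fft // 2 + 1))
    for m in range(1, M + 1):
        lo, ce, hi = bins[m - 1], bins[m], bins[m + 1]
        H[m - 1, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)  # rising edge
        H[m - 1, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)  # falling edge
    return H

frame = np.random.randn(400)                    # one windowed 25 ms frame
spectrum = np.abs(np.fft.rfft(frame, n=512))    # linear spectrum |X_t(k)|
fbank_feat = mel_filterbank() @ spectrum        # 23-dim Mel filter-bank feature F(p)
print(fbank_feat.shape)                         # (23,)
```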
- 5. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the deep transformation feature extraction in step S4 comprises the following steps:
S4.1, building the sub-networks of the deep autoencoder network: the deep autoencoder network consists of two parts, an encoding sub-network and a decoding sub-network; the overlapping part of the two sub-networks is the bottleneck layer, and the output of the bottleneck layer is the deep transformation feature;
for a single-layer encoding sub-network:
Y_EO = f(W_in v + b_in),
where v is the input acoustic feature vector, Y_EO is the output of the encoding sub-network, W_in is the weight matrix of the encoding sub-network, b_in is the bias vector of the encoding sub-network, and f(·) is the activation function, chosen as the ReLU function with expression
f(x_in) = max(0, x_in),
where x_in is the input of the activation function;
for a single-layer decoding sub-network:
y = f(W_out Y_EO + b_out),
where Y_EO is the input of the decoding sub-network, W_out is the weight matrix of the decoding sub-network, b_out is the bias vector of the decoding sub-network, f(·) is the ReLU activation function, and y is the output of the whole network;
the loss function is defined as the mean squared error
MSE = ||v - y||²,
where v is the acoustic feature vector and y is the output of the whole network;
S4.2, training the deep autoencoder network: the training objective is to make the loss function MSE as small as possible, yielding the network weight matrices and bias vectors; the extracted acoustic feature vectors are then fed into the deep autoencoder network to obtain the deep transformation features.
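A single-layer forward-pass sketch of S4.1 in NumPy (the 60-dimensional input and 16-dimensional bottleneck are hypothetical sizes; the gradient-descent weight updates of S4.2 are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_BN = 60, 16        # hypothetical: 60-dim acoustic vector, 16-dim bottleneck

W_in = rng.standard_normal((D_BN, D_IN)) * 0.1   # encoder weight matrix
b_in = np.zeros(D_BN)                            # encoder bias vector
W_out = rng.standard_normal((D_IN, D_BN)) * 0.1  # decoder weight matrix
b_out = np.zeros(D_IN)                           # decoder bias vector

def relu(x):                       # activation f(x) = max(0, x)
    return np.maximum(0.0, x)

def encode(v):                     # Y_EO = f(W_in v + b_in)
    return relu(W_in @ v + b_in)

def decode(y_eo):                  # y = f(W_out Y_EO + b_out)
    return relu(W_out @ y_eo + b_out)

def mse(v, y):                     # reconstruction loss that training minimises
    return float(np.mean((v - y) ** 2))

v = rng.standard_normal(D_IN)      # one acoustic feature vector
deep_feature = encode(v)           # bottleneck output = deep transformation feature
loss = mse(v, decode(deep_feature))
print(deep_feature.shape)          # (16,)
```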
- 6. The highway anomalous audio event classification method based on deep transformation features according to claim 1, characterized in that the anomalous audio event classification in step S5 comprises the following steps:
S5.1, building and training the long short-term memory network, with the network loss function defined as
ψ = -Σ_{k=1}^{K} z_k log(y_k),
where K is the number of audio event classes, z_k is the label value of the k-th audio event class, y_k is the output probability of the k-th class, and the network training objective is to minimize the loss function ψ;
S5.2, outputting the classification result: after the long short-term memory network classifier has been trained, the deep transformation features of the test-set samples are fed into the classifier to obtain the output probability of each audio event class; the class with the largest output probability is the decision result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711305135.9A CN108182949A (en) | 2017-12-11 | 2017-12-11 | A kind of highway anomalous audio event category method based on depth conversion feature |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108182949A true CN108182949A (en) | 2018-06-19 |
Family
ID=62545839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711305135.9A Pending CN108182949A (en) | 2017-12-11 | 2017-12-11 | A kind of highway anomalous audio event category method based on depth conversion feature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108182949A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017165551A1 (en) * | 2016-03-22 | 2017-09-28 | Sri International | Systems and methods for speech recognition in unseen and noisy channel conditions |
CN106251860A (en) * | 2016-08-09 | 2016-12-21 | 张爱英 | Unsupervised novelty audio event detection method and system towards safety-security area |
CN106952644A (en) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A kind of complex audio segmentation clustering method based on bottleneck characteristic |
CN107393554A (en) * | 2017-06-20 | 2017-11-24 | 武汉大学 | In a kind of sound scene classification merge class between standard deviation feature extracting method |
Non-Patent Citations (5)
Title |
---|
MASSIMILIANO TODISCO ET AL: "A New Feature for Automatic Speaker Verification Anti-Spoofing:Constant Q Cepstral Coefficients", 《ODYSSEY 2016-THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP》 * |
YANXIONG LI ET AL: "Using multi-stream hierarchical deep neural network to extract deep audio feature for acoustic event detection", 《MULTIMED TOOLS APPL》 * |
史秋莹: "基于深度学习和迁移学习的环境声音识别", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
金海: "基于深度神经网络的音频事件检测", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
黄丽霞等: "基于深度自编码网络语音识别噪声鲁棒性研究", 《计算机工程与应用》 * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109036382A (en) * | 2018-08-15 | 2018-12-18 | 武汉大学 | A kind of audio feature extraction methods based on KL divergence |
CN109065075A (en) * | 2018-09-26 | 2018-12-21 | 广州势必可赢网络科技有限公司 | A kind of method of speech processing, device, system and computer readable storage medium |
CN109461458A (en) * | 2018-10-26 | 2019-03-12 | 合肥工业大学 | A kind of audio method for detecting abnormality based on generation confrontation network |
CN109461458B (en) * | 2018-10-26 | 2022-09-13 | 合肥工业大学 | Audio anomaly detection method based on generation countermeasure network |
CN111354373A (en) * | 2018-12-21 | 2020-06-30 | 中国科学院声学研究所 | Audio signal classification method based on neural network intermediate layer characteristic filtering |
CN109584888A (en) * | 2019-01-16 | 2019-04-05 | 上海大学 | Whistle recognition methods based on machine learning |
CN110223715A (en) * | 2019-05-07 | 2019-09-10 | 华南理工大学 | It is a kind of based on sound event detection old solitary people man in activity estimation method |
CN110223715B (en) * | 2019-05-07 | 2021-05-25 | 华南理工大学 | Home activity estimation method for solitary old people based on sound event detection |
CN110390952A (en) * | 2019-06-21 | 2019-10-29 | 江南大学 | City sound event classification method based on bicharacteristic 2-DenseNet parallel connection |
CN110390952B (en) * | 2019-06-21 | 2021-10-22 | 江南大学 | City sound event classification method based on dual-feature 2-DenseNet parallel connection |
CN112133084A (en) * | 2019-06-25 | 2020-12-25 | 浙江吉智新能源汽车科技有限公司 | Road information sharing method, device and system |
CN112133084B (en) * | 2019-06-25 | 2022-04-12 | 浙江吉智新能源汽车科技有限公司 | Road information sharing method, device and system |
CN110718234A (en) * | 2019-09-02 | 2020-01-21 | 江苏师范大学 | Acoustic scene classification method based on semantic segmentation coding and decoding network |
CN110942766A (en) * | 2019-11-29 | 2020-03-31 | 厦门快商通科技股份有限公司 | Audio event detection method, system, mobile terminal and storage medium |
CN111613240A (en) * | 2020-05-22 | 2020-09-01 | 杭州电子科技大学 | Camouflage voice detection method based on attention mechanism and Bi-LSTM |
CN112986914A (en) * | 2021-02-10 | 2021-06-18 | 中国兵器装备集团自动化研究所 | Individual helmet and target sound source positioning and voiceprint recognition method thereof |
CN113192322A (en) * | 2021-03-19 | 2021-07-30 | 东北大学 | Expressway traffic flow counting method based on cloud edge cooperation |
CN113257283A (en) * | 2021-03-29 | 2021-08-13 | 北京字节跳动网络技术有限公司 | Audio signal processing method and device, electronic equipment and storage medium |
CN113257283B (en) * | 2021-03-29 | 2023-09-26 | 北京字节跳动网络技术有限公司 | Audio signal processing method and device, electronic equipment and storage medium |
CN113611288A (en) * | 2021-08-06 | 2021-11-05 | 南京华捷艾米软件科技有限公司 | Audio feature extraction method, device and system |
CN113920473A (en) * | 2021-10-15 | 2022-01-11 | 宿迁硅基智能科技有限公司 | Complete event determination method, storage medium and electronic device |
CN113920473B (en) * | 2021-10-15 | 2022-07-29 | 宿迁硅基智能科技有限公司 | Complete event determination method, storage medium and electronic device |
CN114863950A (en) * | 2022-07-07 | 2022-08-05 | 深圳神目信息技术有限公司 | Baby crying detection and network establishment method and system based on anomaly detection |
CN114927141A (en) * | 2022-07-19 | 2022-08-19 | 中国人民解放军海军工程大学 | Method and system for detecting abnormal underwater acoustic signals |
CN115206294A (en) * | 2022-09-16 | 2022-10-18 | 深圳比特微电子科技有限公司 | Training method, sound event detection method, device, equipment and medium |
CN116186524A (en) * | 2023-05-04 | 2023-05-30 | 天津大学 | Self-supervision machine abnormal sound detection method |
CN116186524B (en) * | 2023-05-04 | 2023-07-18 | 天津大学 | Self-supervision machine abnormal sound detection method |
CN117268796A (en) * | 2023-11-16 | 2023-12-22 | 天津大学 | Vehicle fault acoustic event detection method |
CN117268796B (en) * | 2023-11-16 | 2024-01-26 | 天津大学 | Vehicle fault acoustic event detection method |
CN118072766A (en) * | 2024-04-24 | 2024-05-24 | 南京小草交通科技有限公司 | Highway event perception system based on sound detects |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108182949A (en) | A kind of highway anomalous audio event category method based on depth conversion feature | |
CN110827837B (en) | Whale activity audio classification method based on deep learning | |
DE112017001830B4 (en) | VOICE ENHANCEMENT AND AUDIO EVENT DETECTION FOR A NON-STATIONARY NOISE ENVIRONMENT | |
CN110246510B (en) | End-to-end voice enhancement method based on RefineNet | |
CN107393542A (en) | A kind of birds species identification method based on binary channels neutral net | |
CN108922513A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN108630209B (en) | Marine organism identification method based on feature fusion and deep confidence network | |
CN101366078A (en) | Neural network classifier for separating audio sources from a monophonic audio signal | |
CN108711436A (en) | Speaker verification's system Replay Attack detection method based on high frequency and bottleneck characteristic | |
CN111128209B (en) | Speech enhancement method based on mixed masking learning target | |
CN113724712B (en) | Bird sound identification method based on multi-feature fusion and combination model | |
CN111696580B (en) | Voice detection method and device, electronic equipment and storage medium | |
CN109036470A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN108962229A (en) | A kind of target speaker's voice extraction method based on single channel, unsupervised formula | |
CN108806725A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN114495973A (en) | Special person voice separation method based on double-path self-attention mechanism | |
CN115081473A (en) | Multi-feature fusion brake noise classification and identification method | |
CN110544482A (en) | single-channel voice separation system | |
Nossier et al. | Mapping and masking targets comparison using different deep learning based speech enhancement architectures | |
Shifas et al. | A non-causal FFTNet architecture for speech enhancement | |
Wang et al. | Low pass filtering and bandwidth extension for robust anti-spoofing countermeasure against codec variabilities | |
CN110390937A (en) | A kind of across channel method for recognizing sound-groove based on ArcFace loss algorithm | |
CN104240717A (en) | Voice enhancement method based on combination of sparse code and ideal binary system mask | |
EP0658874B1 (en) | Process and circuit for producing from a speech signal with small bandwidth a speech signal with great bandwidth | |
CN113270110A (en) | ZPW-2000A track circuit transmitter and receiver fault diagnosis method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180619 |