CN106531159B - A mobile phone source identification method based on device background noise spectral features - Google Patents
- Publication number
- CN106531159B (application CN201611129639.5A)
- Authority
- CN
- China
- Prior art keywords
- mobile phone
- background noise
- nearly
- mute section
- word bank
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/012—Comfort noise or silence coding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Artificial Intelligence (AREA)
- Telephone Function (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention discloses a mobile phone source identification method based on device background noise spectral features. Near-silent segments are estimated and extracted from each speech sample in the speech sub-library corresponding to each mobile phone, and the extracted segments are post-processed and spliced into a final near-silent segment. A test speech sub-library is then obtained for each mobile phone from all of its final near-silent segments, and a common background noise model corresponding to all mobile phones is obtained. Next, each final background noise of each mobile phone and its spectral distribution feature are obtained. A total training set and a total test set are constructed, from which a training feature space and a test feature space are derived. Finally, dimensionality reduction and normalization are applied in turn to the training and test feature spaces, a model is trained on the normalized training feature space, and the trained multi-class model classifies each near-silent segment in the total test set. The advantages are high recognition accuracy, good stability, and low computational complexity.
Description
Technical field
The present invention relates to mobile phone source identification technology, and more particularly to a mobile phone source identification method based on device background noise spectral features.
Background art
Nowadays, with the rapid development of the mobile Internet and the microchip industry, the mobile terminal is no longer merely a communication device but an indispensable part of people's lives. More and more people capture and record the scenes they see or hear with portable devices such as smartphones and tablets (PADs) rather than with professional equipment such as cameras, voice recorders, and DV (Digital Video) camcorders. However, the wide availability of digital acquisition devices and of the data they acquire brings a new problem and challenge: multimedia security. Multimedia forensics, a technology for verifying the originality, authenticity, and integrity of multimedia data, is a hot research topic in the field of information security.
Mobile phone source identification is one of the applications most closely related to multimedia forensics; it is used to verify the authenticity and reliability of the source of a digital recording file. This research direction has attracted the attention of many forensics researchers and has made significant progress in recent years. For example, Hanilci, C., Ertas, F., Ertas, T., Eskidere, O.: Recognition of brand and models of cell-phones from recorded speech signals. IEEE Trans. Inf. Forensics Security 7(2), 625-634 (2012) proposes a method that identifies the brand and model of a mobile phone from MFCC (Mel Frequency Cepstrum Coefficient) features extracted from the recording file; in a closed-set identification experiment on 14 mobile phones of different models, the recognition rate reaches 96.42%. As another example, Kotropoulos, C.: Source phone identification using sketches of features. IET Biometrics 3(2), 75-83 (2014) takes the logarithm of the speech signal spectrum of recording files obtained by different mobile phones, and then either averages it along the time axis or stacks the per-frame feature parameters and models them with a Gaussian mixture model to obtain a large feature vector, which is then mapped to a low-dimensional space for dimensionality reduction; in a source identification experiment on mobile phones of 7 brands and 21 models, the recognition rate reaches 94%.
However, most existing research on mobile phone source identification relies on classification features extracted from the speech itself, such as MFCC (Mel Frequency Cepstrum Coefficient) features, LFCC (Linear Frequency Cepstrum Coefficients) features, and short-time features. Although these features achieve satisfactory results, source identification based on features extracted from the speech itself may be disturbed by many uncertain factors, such as the speaker's gender, changes in the speaker's emotion, and the speech content. These factors affect the recognition rate and stability, so both still need further improvement.
Summary of the invention
The technical problem to be solved by the invention is to provide a mobile phone source identification method based on device background noise spectral features that has high recognition accuracy, good stability, and low computational complexity.
The technical solution adopted by the invention to solve the above technical problem is a mobile phone source identification method based on device background noise spectral features, characterized by comprising the following steps:
1. Select M mobile phones of different mainstream brands and models, and select P participants of different ages and genders. Use the M mobile phones simultaneously to record each participant reading fixed content aloud at normal speed; each mobile phone collects P recordings, so the M mobile phones collect M × P recordings in total, and each recording must last at least 3 minutes. Convert each recording collected by each mobile phone to wav format. Cut each wav recording of each mobile phone into speech segments of 3 to 10 seconds and take 10 of these segments as speech samples. The 10P speech samples of each mobile phone then constitute a speech sub-library. Here M > 1 and P ≥ 1.
2. Use an adaptive endpoint detection algorithm to estimate and extract near-silent segments from each speech sample in the speech sub-library of each mobile phone. Post-process the near-silent segments extracted from each speech sample to eliminate the residual speech they contain, obtaining multiple post-processed near-silent segments for each speech sample. Then splice the post-processed near-silent segments of each speech sample together into one final near-silent segment.
3. Among all final near-silent segments of each mobile phone, retain those whose duration is greater than or equal to 1.5 seconds; the retained final near-silent segments constitute the test speech sub-library of that mobile phone, which is used to obtain the spectral distribution features of its background noise.
4. Use an improved spectral subtraction method to suppress the ambient noise of each near-silent segment in the test speech sub-library of each mobile phone, obtaining the ambient noise model of each near-silent segment. Then obtain the common background noise model corresponding to all mobile phones; its value at the k-th frequency point is denoted BN_mean(k) [defining equation missing from the source text]. Here "| |" is the absolute value symbol; BN_m(k, n) denotes the spectral coefficient, in the short-time Fourier transform domain, at the k-th frequency point of the n-th frame in the spectrogram of the ambient noise models of all near-silent segments in the test speech sub-library of the m-th mobile phone; 1 ≤ k ≤ K, where K is the total number of frequency points of each near-silent segment [defining equation missing from the source text]; and K_fft is the number of points of the short-time Fourier transform.
5. Take the difference between each near-silent segment in the test speech sub-library of each mobile phone and the common background noise model corresponding to all mobile phones as one background noise of that mobile phone. Apply median filtering to each background noise of each mobile phone to remove the residual ambient noise, obtaining each final background noise of each mobile phone. Apply a Fourier transform to each final background noise to obtain its spectral coefficients, and take the base-10 logarithm of these spectral coefficients. Finally, average the first T frames of the log spectral coefficients of each final background noise along the time axis, and use the average as the spectral distribution feature of that final background noise. Here the number of points of the Fourier transform is K_fft, the duration of the T frames is less than or equal to 1.5 seconds, T ≥ 3, and the dimension of the spectral distribution feature of each final background noise is K.
6. Count the total number of near-silent segments in the test speech sub-library of each mobile phone, and take the smallest total as the base value. From all near-silent segments in the test speech sub-library of each mobile phone, randomly select a number of segments equal to half the base value to form the sub-training set of that mobile phone; from the remaining near-silent segments, randomly select a number of segments equal to half the base value to form its sub-test set. The sub-training sets of all mobile phones constitute a total training set, and the sub-test sets of all mobile phones constitute a total test set. The spectral distribution features of the final background noises obtained from the total training set constitute a training feature space, and those obtained from the total test set constitute a test feature space. Reduce the dimensionality of the training feature space by principal component analysis, and normalize all values in the reduced training feature space. Reduce the dimensionality of the test feature space with the mapping matrix used for the training feature space, and normalize all values in the reduced test feature space. Finally, use the SVM classification function built into Matlab to train a model on the normalized training feature space, obtaining a trained multi-class model, and use the trained multi-class model to classify each near-silent segment in the total test set.
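The balanced construction of the sub-training and sub-test sets in step 6 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `balanced_split`, the fixed random seed, and rounding "half the base value" down for odd counts are assumptions not stated in the text.

```python
import random


def balanced_split(segments_per_phone, seed=0):
    """Split each phone's near-silent segments into train and test subsets.

    segments_per_phone maps a phone label to a list of segments.
    The base value is the smallest segment count over all phones; every
    phone contributes base // 2 segments to the total training set and
    base // 2 of its remaining segments to the total test set.
    """
    rng = random.Random(seed)  # assumption: fixed seed for repeatability
    base = min(len(segs) for segs in segments_per_phone.values())
    half = base // 2  # assumption: round down when base is odd
    train, test = [], []
    for label, segs in segments_per_phone.items():
        shuffled = list(segs)
        rng.shuffle(shuffled)
        train.extend((label, s) for s in shuffled[:half])
        test.extend((label, s) for s in shuffled[half:2 * half])
    return train, test
```

Because every phone contributes the same number of segments, no class dominates the total training set, which matters for the multi-class model trained afterwards.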
The specific post-processing in step 2 of the near-silent segments extracted from each speech sample in the speech sub-library of each mobile phone is as follows: among all sampling points of the near-silent segments extracted from a speech sample, find those whose sampled value is less than 5 × Thr; each run of multiple consecutive such sampling points forms one post-processed near-silent segment, giving the multiple post-processed near-silent segments of that speech sample. Here Thr is obtained by sorting, in ascending order, the absolute values of the sampled values of all sampling points on the near-silent segments extracted from the speech sample by the adaptive endpoint detection algorithm, and averaging the first 30% to 50% of the sorted values.
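The threshold-based post-processing of near-silent (nearly mute) segments described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the comparison is made on absolute sample values, a run needs at least `min_run` consecutive samples to count as a segment, and the names `post_process` and `splice` are hypothetical.

```python
def post_process(segment, fraction=0.4, factor=5.0, min_run=2):
    """Return (Thr, runs) for one extracted near-silent segment.

    Thr is the mean of the smallest `fraction` (0.3 to 0.5 in the text)
    of the absolute sample values. Samples below factor * Thr are kept,
    and each run of consecutive kept samples forms one post-processed
    segment.
    """
    magnitudes = sorted(abs(x) for x in segment)
    count = max(1, int(len(magnitudes) * fraction))
    thr = sum(magnitudes[:count]) / count
    limit = factor * thr
    runs, current = [], []
    for x in segment:
        if abs(x) < limit:  # assumption: compare absolute values
            current.append(x)
        else:
            if len(current) >= min_run:  # assumption: minimum run length
                runs.append(current)
            current = []
    if len(current) >= min_run:
        runs.append(current)
    return thr, runs


def splice(runs):
    """Concatenate post-processed runs into one final near-silent segment."""
    return [x for run in runs for x in run]
```

Deriving Thr from the quietest samples of the segment itself makes the threshold adapt to each recording's noise level instead of relying on a fixed amplitude.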
The SVM classification function built into Matlab in step 6 uses an RBF kernel function, and the optimal values of its penalty coefficient and gamma coefficient are obtained by cross validation.
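For orientation, the RBF kernel and a cross-validation search over the penalty coefficient and gamma can be sketched as follows. The text relies on Matlab's built-in SVM function; the Python helpers here (`rbf_kernel`, `parameter_grid`, `k_fold_indices`), the powers-of-two candidate grid, and the interleaved fold assignment are assumptions for illustration only.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2) for two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def parameter_grid(c_exponents=range(-5, 16, 2),
                   gamma_exponents=range(-15, 4, 2)):
    """Candidate (penalty, gamma) pairs; the power-of-two grid is assumed."""
    return [(2.0 ** c, 2.0 ** g)
            for c in c_exponents for g in gamma_exponents]


def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs for cross validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    everything = set(range(n_samples))
    return [(sorted(everything - set(fold)), fold) for fold in folds]
```

Each candidate pair would be scored by its mean validation accuracy over the folds, and the best-scoring pair used to train the final model on the whole training feature space.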
Compared with the prior art, the advantages of the present invention are:
1) The method estimates the background noise of each mobile phone from each near-silent segment in its test speech sub-library and the common background noise model corresponding to all mobile phones, and then applies median filtering to each background noise so that the resulting final background noise contains no residual ambient noise. The spectral distribution features obtained on this basis therefore classify mobile phones better; extensive experiments show that the recognition rate of the method can reach 99.24%.
2) Most previous mobile phone source identification methods are based on information from the speech samples and are easily affected by factors such as the text content of the speech and the speaker's emotion, which makes identification less stable. The method of the invention extracts spectral distribution features and identifies the phone source from near-silent segments, so its stability is better.
3) Extracting the spectral distribution feature in the method is simple, and after dimensionality reduction of the training and test feature spaces the amount of computation is greatly reduced, so computational efficiency is high and computational complexity is low.
Brief description of the drawings
Fig. 1 is the overall implementation block diagram of the method of the invention;
Fig. 2a is the waveform of a speech sample;
Fig. 2b is a schematic diagram of detection by the existing adaptive endpoint detection algorithm on the waveform of the speech sample shown in Fig. 2a;
Fig. 2c is the waveform of the near-silent segments extracted from the speech sample shown in Fig. 2a;
Fig. 2d is the final near-silent segment obtained by post-processing and splicing the near-silent segments shown in Fig. 2c;
Fig. 3a is the spectrogram of the final background noise of the HTC D820t mobile phone;
Fig. 3b is the spectrogram of the final background noise of the Huawei Honor 7 mobile phone;
Fig. 3c is the spectrogram of the final background noise of the Apple iPhone 5 mobile phone;
Fig. 3d is the spectrogram of the final background noise of another Apple iPhone 5 mobile phone;
Fig. 3e is the spectrogram of the final background noise of the Meizu MX4 mobile phone;
Fig. 3f is the spectrogram of the final background noise of the Xiaomi 3 mobile phone;
Fig. 3g is the spectrogram of the final background noise of the OPPO OnePlus mobile phone;
Fig. 3h is the spectrogram of the final background noise of the Samsung Galaxy S5 mobile phone;
Fig. 4a is the spectrogram of the actual background noise of the iPhone 6 mobile phone;
Fig. 4b is the spectrogram of the final background noise of the iPhone 6 mobile phone obtained with the method of the invention;
Fig. 4c is a comparison of the spectrum of the actual background noise of the iPhone 6 mobile phone with the spectrum of its final background noise obtained with the method of the invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
The overall implementation block diagram of the mobile phone source identification method based on device background noise spectral features proposed by the present invention is shown in Fig. 1. The method comprises the following steps:
1. Select M mobile phones of different mainstream brands and models, and select P participants of different ages and genders. Use the M mobile phones simultaneously to record each participant reading fixed content aloud at normal speed; each mobile phone collects P recordings, so the M mobile phones collect M × P recordings in total, and each recording must last at least 3 minutes. Convert each recording collected by each mobile phone to wav format. Cut each wav recording of each mobile phone into speech segments of 3 to 10 seconds and take 10 of these segments as speech samples. The 10P speech samples of each mobile phone then constitute a speech sub-library. Here M > 1 and P ≥ 1; in this embodiment M = 24 and P = 12, for example 6 male participants of different ages and 6 female participants of different ages. In this embodiment each recording is collected in a quiet office.
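Cutting a recording into speech samples, as in step 1, can be sketched as follows. This is a minimal sketch: the recording is assumed to be already decoded into a list of samples, the 5-second clip length is one choice within the stated range of 3 to 10 seconds, and taking the first 10 clips is an assumption because the text does not say which 10 are used.

```python
def cut_samples(recording, sample_rate, clip_seconds=5.0, n_samples=10):
    """Cut a decoded recording into fixed-length clips and keep n_samples.

    recording is a sequence of audio samples and sample_rate is in Hz.
    Only complete clips are produced; a trailing partial clip is dropped.
    """
    if not 3.0 <= clip_seconds <= 10.0:
        raise ValueError("clip length must be between 3 and 10 seconds")
    clip_len = int(clip_seconds * sample_rate)
    clips = [recording[start:start + clip_len]
             for start in range(0, len(recording) - clip_len + 1, clip_len)]
    return clips[:n_samples]  # assumption: keep the first n_samples clips
```

With P participants, calling this once per recording yields the 10P speech samples that make up one phone's speech sub-library.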
2. Use an existing adaptive endpoint detection algorithm to estimate and extract near-silent segments from each speech sample in the speech sub-library of each mobile phone. Post-process the near-silent segments extracted from each speech sample to eliminate the residual speech they contain, obtaining multiple post-processed near-silent segments for each speech sample. Then splice the post-processed near-silent segments of each speech sample together into one final near-silent segment. The duration of the resulting final near-silent segment is necessarily shorter than that of the corresponding speech sample.
Near-silent segments are estimated from each speech sample first because the near-silent segments of speech consist mainly of device background noise and ambient noise, and are not contaminated by the noise caused by non-uniform acoustic-electric response, which dominates the combined noise of the speech portions. An adaptive endpoint detection algorithm is therefore used to estimate the near-silent segments, which it identifies well. However, the identified near-silent segments still contain a small amount of speech information; to eliminate this residual speech, the near-silent segments are post-processed and integrated into the final near-silent segment.
Fig. 2a shows the waveform of a speech sample; Fig. 2b shows the detection performed by the existing adaptive endpoint detection algorithm on the waveform of the speech sample shown in Fig. 2a; Fig. 2c shows the waveform of the near-silent segments extracted from the speech sample shown in Fig. 2a; and Fig. 2d shows the final near-silent segment obtained after post-processing and splicing the near-silent segments shown in Fig. 2c. Fig. 2a and Fig. 2b show that the method of the invention identifies near-silent segments well; Fig. 2c shows that the extracted near-silent segments still contain a small amount of speech information; and Fig. 2d shows that, after the post-processing in the method of the invention, the final near-silent segment contains no speech information.
In this embodiment, the specific post-processing in step 2 of the near-silent segments extracted from each speech sample in the speech sub-library of each mobile phone is as follows: among all sampling points of the near-silent segments extracted from a speech sample, find those whose sampled value is less than 5 × Thr; each run of multiple consecutive such sampling points forms one post-processed near-silent segment, giving the multiple post-processed near-silent segments of that speech sample. Here Thr is obtained by sorting, in ascending order, the absolute values of the sampled values of all sampling points on the near-silent segments extracted from the speech sample by the existing adaptive endpoint detection algorithm, and averaging the first 30% to 50% of the sorted values; in this embodiment, Thr is the average of the first 40% of the sorted values.
3. Because the final near-silent segments corresponding to the speech samples in each speech sub-library differ in length, and the feature matrix must have a consistent length when the feature space is constructed, final near-silent segments lasting 1.5 seconds or longer are retained and those shorter than 1.5 seconds are discarded. That is, among all final near-silent segments of each mobile phone (each speech sub-library corresponds to 10P final near-silent segments), those whose duration is greater than or equal to 1.5 seconds are retained, and the retained segments constitute the test speech sub-library of that mobile phone, which is used to obtain the spectral distribution features of its background noise.
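The duration filter of step 3 can be sketched as follows. The dictionary layout and the name `build_test_sublibrary` are assumptions for illustration; only the 1.5-second threshold comes from the text.

```python
def build_test_sublibrary(final_segments_per_phone, sample_rate, min_seconds=1.5):
    """Keep only final near-silent segments lasting at least min_seconds.

    final_segments_per_phone maps a phone label to its list of final
    near-silent segments, each a sequence of samples at sample_rate Hz.
    """
    min_len = int(round(min_seconds * sample_rate))
    return {
        phone: [seg for seg in segments if len(seg) >= min_len]
        for phone, segments in final_segments_per_phone.items()
    }
```

Guaranteeing at least 1.5 seconds per segment ensures that the first T frames used later for feature averaging always exist, so every feature vector has the same length.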
4. The ambient noise must be suppressed as far as possible in order to obtain the actual background noise from the final near-silent segments. Therefore, use the existing improved spectral subtraction method to suppress the ambient noise of each near-silent segment in the test speech sub-library of each mobile phone, obtaining the ambient noise model of each near-silent segment. Then obtain the common background noise model corresponding to all mobile phones; its value at the k-th frequency point is denoted BN_mean(k) [defining equation missing from the source text]. Here "| |" is the absolute value symbol; BN_m(k, n) denotes the spectral coefficient, in the short-time Fourier transform (STFT) domain, at the k-th frequency point of the n-th frame in the spectrogram of the ambient noise models of all near-silent segments in the test speech sub-library of the m-th mobile phone; 1 ≤ k ≤ K, where K is the total number of frequency points of each near-silent segment [defining equation missing from the source text]; and K_fft is the number of points of the short-time Fourier transform. In this embodiment the number of points of the short-time Fourier transform is set to 4096, and K is taken as [value missing from the source text].
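Since the defining equation for BN_mean(k) did not survive in the text, the following sketch shows one plausible reading only: the magnitude of BN_m(k, n) is averaged over the frames of each phone and then over all phones. Both the averaging order and the name `common_noise_model` are assumptions.

```python
def common_noise_model(spectrograms_per_phone):
    """Average STFT magnitudes over frames, then over phones.

    spectrograms_per_phone[m][n][k] is the complex STFT coefficient
    BN_m(k, n) of phone m at frame n and frequency point k.
    Returns a list of length K holding one value per frequency point.
    """
    n_bins = len(spectrograms_per_phone[0][0])
    model = [0.0] * n_bins
    for frames in spectrograms_per_phone:
        for k in range(n_bins):
            # Mean magnitude of frequency point k over this phone's frames.
            model[k] += sum(abs(frame[k]) for frame in frames) / len(frames)
    # Assumption: each phone is weighted equally in the final average.
    return [value / len(spectrograms_per_phone) for value in model]
```

Averaging per phone first keeps a phone with many frames from dominating the shared model.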
5. The difference between each near-silent segment in the test speech sub-library of each mobile phone and the universal ambient noise model of all mobile phones is taken as one background noise of that phone. Median filtering is then applied to each background noise of each phone to remove the residual ambient noise, giving each final background noise of each phone. Next, a Fourier transform is applied to each final background noise of each phone to obtain its spectral coefficients, and the base-10 logarithm of these spectral coefficients is taken. Finally, the log spectral coefficients of each final background noise are averaged along the time axis over the first T frames, and this average is used as the spectral distribution feature of that final background noise. Here the Fourier transform has $K_{fft}$ points, the duration of the T frames is less than or equal to 1.5 seconds, T ≥ 3, and the spectral distribution feature of each final background noise of each phone has dimension K.
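The feature computation in this step (median filtering, Fourier transform, base-10 logarithm, average over the first T frames) can be sketched with NumPy as below. This is an illustration under stated assumptions, not the patent's code: the input is taken to be a background-noise signal that has already been obtained by the subtraction, and the median window of 3 samples, the rectangular framing, the zero-padding of each frame to K_fft points, the small offset inside the logarithm and the 32 kHz toy signal are all assumptions.

```python
import numpy as np


def median_filter(signal, width=3):
    """Sliding median with an odd window; the ends are padded by repetition."""
    half = width // 2
    padded = np.pad(signal, half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(signal))])


def spectral_distribution_feature(noise, frame_len, hop, num_frames, k_fft=4096):
    """Mean log10 magnitude spectrum over the first num_frames frames.

    Returns a vector of K = k_fft // 2 + 1 values.
    """
    filtered = median_filter(np.asarray(noise, dtype=float))
    needed = (num_frames - 1) * hop + frame_len
    if len(filtered) < needed:
        raise ValueError("signal too short for the requested number of frames")
    frames = np.stack([filtered[i * hop:i * hop + frame_len]
                       for i in range(num_frames)])
    # rfft zero-pads each frame to k_fft points and keeps k_fft // 2 + 1 bins.
    magnitudes = np.abs(np.fft.rfft(frames, n=k_fft, axis=1))
    log_spectra = np.log10(magnitudes + 1e-12)  # offset avoids log10(0)
    return log_spectra.mean(axis=0)


# Toy usage: 30 ms frames with a 15 ms shift at an assumed 32 kHz sample rate.
rng = np.random.default_rng(0)
noise = rng.normal(scale=1e-3, size=20000)
feature = spectral_distribution_feature(noise, frame_len=960, hop=480,
                                        num_frames=40)
print(feature.shape)  # (2049,)
```

With 4096 transform points the feature has 2049 components, which agrees with the dimension K stated in the text.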
Fig. 3a shows the spectrogram of the final background noise of the HTC D820t, Fig. 3b that of the Huawei Honor 7, Fig. 3c that of an Apple iPhone 5, Fig. 3d that of another Apple iPhone 5, Fig. 3e that of the Meizu MX4, Fig. 3f that of the Xiaomi 3, Fig. 3g that of the OPPO OnePlus phone, and Fig. 3h that of the Samsung Galaxy S5. Figs. 3a to 3h show that the background noise spectrograms of different phone brands differ considerably. For example, the background noise energy of the Xiaomi 3 is the strongest over the whole frequency range (0 to 16 kHz), the amplitude of the Meizu MX4 background noise spectrogram fluctuates with frequency, and the spectrogram of the HTC D820t background noise drops sharply near 4000 Hz.
Fig. 4a shows the spectrogram of the actual background noise of the iPhone 6, Fig. 4b shows the spectrogram of the final background noise of the iPhone 6 obtained with the method of the present invention, and Fig. 4c compares the spectrum of the actual background noise of the iPhone 6 with that of the final background noise obtained with the method. Fig. 4c shows that the two spectra are very similar, which demonstrates that the way the method obtains the final background noise of a mobile phone is feasible and effective.
6. Count the total number of near-silent segments in the test speech sub-library of each mobile phone and take the smallest total as the base number. For each phone, randomly select a number of near-silent segments equal to half the base number from all near-silent segments in its test speech sub-library to form the sub-training set of that phone, and randomly select the same number from the remaining near-silent segments to form its sub-test set. The sub-training sets of all phones are then combined into one overall training set, and the sub-test sets of all phones into one overall test set. The spectral distribution features of the final background noises of all phones obtained from the overall training set form a training feature space, and those obtained from the overall test set form a test feature space. Principal component analysis (PCA) is then used to reduce the dimension of the training feature space, and all values in the reduced training feature space are normalized. The test feature space is reduced with the mapping matrix used for the dimension reduction of the training feature space, and all values in the reduced test feature space are normalized. Finally, the SVM classification function built into Matlab is used to train a model on the normalized training feature space, giving a trained multi-class model, which is then used to classify each near-silent segment in the overall test set.
In this specific embodiment, the SVM classification function built into Matlab in step 6 uses an RBF kernel, and the optimal values of its penalty coefficient and gamma coefficient are obtained by cross-validation.
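The base-number rule and the random split of step 6 can be sketched as follows. The function name, the dictionary layout, the fixed random seed and the use of integer division for an odd base number are assumptions made for illustration. Drawing twice the half-size without replacement and cutting the draw in two is equivalent to selecting the training half first and the test half from the remainder.

```python
import random


def split_segments(library, seed=0):
    """Split each phone's segments into sub-training and sub-test sets.

    library: dict mapping a phone label to its list of near-silent segments.
    The base number is the smallest segment count over all phones; each
    phone contributes base // 2 segments to each set, without overlap.
    """
    rng = random.Random(seed)
    base = min(len(segments) for segments in library.values())
    half = base // 2
    train, test = {}, {}
    for phone, segments in library.items():
        picked = rng.sample(range(len(segments)), 2 * half)
        train[phone] = [segments[i] for i in picked[:half]]
        test[phone] = [segments[i] for i in picked[half:]]
    return train, test


# Toy usage: the smaller library has 6 segments, so each set gets 3 per phone.
library = {"phone_a": list(range(10)), "phone_b": list(range(100, 106))}
train, test = split_segments(library)
print({phone: len(items) for phone, items in train.items()})
# {'phone_a': 3, 'phone_b': 3}
```

Using the same count for every phone keeps the classes balanced in both the overall training set and the overall test set.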
To further illustrate the feasibility and effectiveness of the method of the present invention, the method was verified experimentally.
In the experiment, a speech sub-library was built for each mobile phone so that the feasibility and effectiveness of the method could be assessed. Table 1 lists the brands and models of the 24 mobile phones used in the experiment, all of which were used to record speech. Twelve participants (6 male, 6 female) were invited to take part in the recording. Each participant read fixed content aloud at a normal speaking rate for at least 3 minutes. The recording environment was a relatively quiet office, and the recorders of the 24 phones were started and stopped simultaneously. Each phone recorded the speech of the 12 participants, each recording was divided into 5-second speech fragments, and each phone obtained 400 speech samples, which form the speech sub-library of that phone. Near-silent segments were estimated and extracted from each speech sample in the speech sub-library of each phone, and after post-processing and splicing the final near-silent segment of each speech sample was obtained. Because the near-silent segments differ in length, and the feature matrices must have a consistent length when the feature space is built, 240 near-silent segments longer than 40 frames were selected for each phone model to form the test speech sub-library used to obtain the spectral distribution feature of the background noise. When the feature space was built, the spectral distribution feature of the background noise was averaged over the first 40 frames of each near-silent segment, with a frame length of 30 milliseconds and a frame shift of 15 milliseconds.
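As a quick check that averaging over the first 40 frames respects the constraint in step 5 (T ≥ 3 and a duration of at most 1.5 seconds), the time span of overlapping frames can be computed as below. The span formula (T - 1) × shift + frame length is the usual one for overlapping frames and is assumed to apply here; the function names are illustrative.

```python
def frames_duration_ms(num_frames, frame_ms=30, shift_ms=15):
    """Time span covered by num_frames overlapping frames, in milliseconds."""
    return (num_frames - 1) * shift_ms + frame_ms


def max_frames_within(limit_ms, frame_ms=30, shift_ms=15):
    """Largest frame count whose span does not exceed limit_ms."""
    return (limit_ms - frame_ms) // shift_ms + 1


print(frames_duration_ms(40))   # 615
print(max_frames_within(1500))  # 99
```

Under these assumptions, 40 frames cover 615 ms, well inside the 1.5-second limit, and up to 99 frames would still fit.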
Table 1: Brands, models and class names of the mobile phones used in the experiment
Classification combined principal component analysis (PCA) with the SVM classification function built into Matlab. For each phone, a number of near-silent segments equal to half the base number was randomly selected from all near-silent segments in its test speech sub-library to form its sub-training set, and the same number was randomly selected from the remaining near-silent segments to form its sub-test set. The sub-training sets of all phones were then combined into one overall training set, and the sub-test sets of all phones into one overall test set. The spectral distribution features of the final background noises of all phones obtained from the overall training set formed a training feature space, and those obtained from the overall test set formed a test feature space. PCA was first used to reduce the dimension of the training feature space, and all values in the reduced training feature space were normalized. The test feature space was reduced with the mapping matrix used for the training feature space, and all values in the reduced test feature space were then normalized. Finally, the SVM classification function built into Matlab was used to train a model on the normalized training feature space, and the trained multi-class model was used to classify each near-silent segment in the overall test set.
As stated above, the number of STFT points is 4096, so the spectral distribution feature of each final background noise of each phone has dimension 2049. This dimension is too large, and the feature components may not be fully independent and uncorrelated. Because redundant spectral distribution features cannot improve recognition accuracy and may even degrade performance, PCA is used to reduce the dimension and form the best training and test feature spaces. Experiments showed that the recognition rate is highest when the reduced spectral distribution feature has 28 dimensions; in this case the penalty coefficient and gamma coefficient of the Matlab SVM classification function are 112 and 0.01, respectively.
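The dimension reduction from 2049 to 28 components, with the mapping fitted on the training features and reused for the test features, can be sketched with NumPy as follows. This is not the Matlab code used in the experiment: the SVD-based PCA, the min-max normalization applied to each feature space on its own, the function names and the random toy data are assumptions, and the SVM training step is not covered.

```python
import numpy as np


def fit_pca(train, num_components):
    """Return the training mean and the top principal directions."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:num_components].T


def project(features, mean, mapping):
    """Apply a fitted PCA mapping to a feature matrix."""
    return (features - mean) @ mapping


def min_max_normalize(values):
    """Scale every column to [0, 1]; constant columns map to 0."""
    low = values.min(axis=0)
    span = values.max(axis=0) - low
    return (values - low) / np.where(span == 0, 1, span)


# Toy stand-ins for the training and test feature spaces (2049 dimensions).
rng = np.random.default_rng(0)
train = rng.normal(size=(60, 2049))
test = rng.normal(size=(20, 2049))
mean, mapping = fit_pca(train, num_components=28)
train_reduced = min_max_normalize(project(train, mean, mapping))
test_reduced = min_max_normalize(project(test, mean, mapping))
print(train_reduced.shape, test_reduced.shape)  # (60, 28) (20, 28)
```

Fitting the mapping on the training data only keeps information from the test set out of the dimension reduction, which is what the text describes.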
Table 2 lists the recognition rates of the 24 mobile phones; the average recognition accuracy calculated from the data in Table 2 is 99.24%. The method of the present invention classifies the 24 phones well. The recognition rate of the Apple iPhone 6 is 91.67%, and its misclassifications occur mainly within the same brand: it is mistaken for the Apple iPhone 4s and the Apple iPhone 5s. Apart from the iPhone 6, the other phones all have high recognition accuracy; 18 phones reach a recognition rate of 100%, and brands such as Samsung, OPPO and Meizu are classified without error. These experimental results show that the background noise of a mobile phone can serve as a "fingerprint" of the phone for source identification. In the audio forensics field of mobile phone source identification, the background noise of a phone is a highly discriminative feature.
Table 2: Recognition rates (%) of the 24 mobile phones
Claims (3)
1. A mobile phone source identification method based on device background noise spectral features, characterized by comprising the following steps:
1. Select M mobile phones of different mainstream brands and different mainstream models, and select P participants of different ages and genders; then use the M mobile phones simultaneously to record each participant reading fixed content aloud at a normal speaking rate, so that each mobile phone collects P recordings and the M mobile phones collect M × P recordings in total, each recording being at least 3 minutes long; then convert every recording collected by each mobile phone into wav format; then divide each wav recording of each mobile phone into speech fragments of 3 to 10 seconds and take 10 speech fragments as speech samples; the 10P speech samples of each mobile phone form one speech sub-library; wherein M > 1 and P ≥ 1;
2. Use an adaptive endpoint detection algorithm to estimate and extract the near-silent segments of each speech sample in the speech sub-library of each mobile phone; then post-process the near-silent segments extracted from each speech sample to eliminate unwanted speech portions in them, obtaining the multiple post-processed near-silent segments of each speech sample in the speech sub-library of each mobile phone; then splice the multiple post-processed near-silent segments of each speech sample together into one final near-silent segment;
3. Among all final near-silent segments of each mobile phone, retain those whose duration is greater than or equal to 1.5 seconds, and form from all retained final near-silent segments the test speech sub-library used to obtain the spectral distribution feature of that mobile phone's background noise;
4. Use an improved spectral subtraction method to suppress the ambient noise of each near-silent segment in the test speech sub-library of each mobile phone, obtaining an ambient noise model for each near-silent segment in the test speech sub-library of each mobile phone; then obtain a universal ambient noise model shared by all mobile phones, whose value at the k-th frequency bin is denoted $BN_{mean}(k)$ [defining equation not reproduced in this text]; wherein the symbol "| |" denotes the absolute value, $BN_m(k,n)$ denotes the spectral coefficient in the short-time Fourier transform domain at the k-th frequency bin of the n-th frame in the spectrogram of all near-silent-segment ambient noise models in the test speech sub-library of the m-th mobile phone, 1 ≤ k ≤ K, K denotes the total number of frequency bins of each near-silent segment, $K = K_{fft}/2 + 1$, and $K_{fft}$ denotes the number of short-time Fourier transform points;
5. Take the difference between each near-silent segment in the test speech sub-library of each mobile phone and the universal ambient noise model of all mobile phones as one background noise of that mobile phone; then apply median filtering to each background noise of each mobile phone to remove the residual ambient noise, obtaining each final background noise of each mobile phone; then apply a Fourier transform to each final background noise of each mobile phone to obtain its spectral coefficients; then take the base-10 logarithm of the spectral coefficients of each final background noise of each mobile phone; finally average the log spectral coefficients of each final background noise along the time axis over the first T frames, and use this average as the spectral distribution feature of that final background noise; wherein the Fourier transform has $K_{fft}$ points, the duration of the T frames is less than or equal to 1.5 seconds, T ≥ 3, and the spectral distribution feature of each final background noise of each mobile phone has dimension K;
6. Count the total number of near-silent segments in the test speech sub-library of each mobile phone and take the smallest total as the base number; for each mobile phone, randomly select a number of near-silent segments equal to half the base number from all near-silent segments in its test speech sub-library to form its sub-training set, and randomly select the same number from the remaining near-silent segments to form its sub-test set; then combine the sub-training sets of all mobile phones into one overall training set and the sub-test sets of all mobile phones into one overall test set; then form a training feature space from the spectral distribution features of the final background noises of all mobile phones obtained from the overall training set, and a test feature space from those obtained from the overall test set; then reduce the dimension of the training feature space by principal component analysis and normalize all values in the reduced training feature space; reduce the dimension of the test feature space with the mapping matrix used for the dimension reduction of the training feature space and normalize all values in the reduced test feature space; finally use the SVM classification function built into Matlab to train a model on the normalized training feature space, obtaining a trained multi-class model, and use the trained multi-class model to classify each near-silent segment in the overall test set.
2. The mobile phone source identification method based on device background noise spectral features according to claim 1, characterized in that, in step 2, the post-processing of the near-silent segments extracted from each speech sample in the speech sub-library of each mobile phone is as follows: among all sampling points of the near-silent segments extracted from each speech sample, find all sampling points whose sample value is less than 5 × Thr; each run of consecutive such sampling points forms one post-processed near-silent segment, giving the multiple post-processed near-silent segments of each speech sample in the speech sub-library of each mobile phone; wherein Thr denotes the average of the first 30% to 50% of the sample values obtained when the absolute values of the sample values at all sampling points of the near-silent segments, extracted by the adaptive endpoint detection algorithm from each speech sample in the speech sub-library of each mobile phone, are sorted in ascending order.
3. The mobile phone source identification method based on device background noise spectral features according to claim 1 or 2, characterized in that the SVM classification function built into Matlab in step 6 uses an RBF kernel, and the optimal values of the penalty coefficient and gamma coefficient of the Matlab SVM classification function are obtained by cross-validation.
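For illustration only, the threshold-based post-processing recited in claim 2 can be sketched as follows. The choice of 40% within the stated 30% to 50% range, the comparison of absolute sample values against 5 × Thr, and the function names are assumptions, not part of the claims.

```python
def threshold(samples, fraction=0.4):
    """Mean of the smallest `fraction` of the absolute sample values (Thr)."""
    magnitudes = sorted(abs(x) for x in samples)
    count = max(1, int(len(magnitudes) * fraction))
    return sum(magnitudes[:count]) / count


def quiet_runs(samples, factor=5):
    """Split samples into runs of consecutive points below factor * Thr."""
    limit = factor * threshold(samples)
    runs, current = [], []
    for x in samples:
        if abs(x) < limit:
            current.append(x)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


# Toy near-silent segment with two louder samples (0.9 and -0.8).
samples = [0.01, -0.02, 0.01, 0.9, 0.02, -0.01, -0.8, 0.01]
print(quiet_runs(samples))
# [[0.01, -0.02, 0.01], [0.02, -0.01], [0.01]]
```

Each returned run corresponds to one post-processed near-silent segment; splicing the runs together gives the final near-silent segment of the speech sample.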
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611129639.5A CN106531159B (en) | 2016-12-09 | 2016-12-09 | Mobile phone source identification method based on device background noise spectral features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611129639.5A CN106531159B (en) | 2016-12-09 | 2016-12-09 | Mobile phone source identification method based on device background noise spectral features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106531159A CN106531159A (en) | 2017-03-22 |
CN106531159B true CN106531159B (en) | 2019-06-18 |
Family
ID=58341615
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611129639.5A Active CN106531159B (en) | Mobile phone source identification method based on device background noise spectral features | 2016-12-09 | 2016-12-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106531159B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106941008B (en) * | 2017-04-05 | 2020-11-24 | 华南理工大学 | Blind detection method for splicing and tampering of different source audios based on mute section |
CN107123419A (en) * | 2017-05-18 | 2017-09-01 | 北京大生在线科技有限公司 | The optimization method of background noise reduction in the identification of Sphinx word speeds |
CN107507626B (en) * | 2017-07-07 | 2021-02-19 | 宁波大学 | Mobile phone source identification method based on voice frequency spectrum fusion characteristics |
CN107274912B (en) * | 2017-07-13 | 2020-06-19 | 东莞理工学院 | Method for identifying equipment source of mobile phone recording |
CN108172224B (en) | 2017-12-19 | 2019-08-27 | 浙江大学 | Method based on the defence of machine learning without vocal command control voice assistant |
CN108461092B (en) * | 2018-03-07 | 2022-03-08 | 燕山大学 | Method for analyzing Parkinson's disease voice |
CN109285538B (en) * | 2018-09-19 | 2022-12-27 | 宁波大学 | Method for identifying mobile phone source in additive noise environment based on constant Q transform domain |
CN111092983B (en) * | 2019-12-25 | 2020-12-11 | 清华大学深圳国际研究生院 | Voice call echo and background noise suppression method based on sliding mode variable structure control |
CN112927680B (en) * | 2021-02-10 | 2022-06-17 | 中国工商银行股份有限公司 | Voiceprint effective voice recognition method and device based on telephone channel |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011107650A (en) * | 2009-11-20 | 2011-06-02 | Casio Computer Co Ltd | Voice feature amount calculation device, voice feature amount calculation method, voice feature amount calculation program and voice recognition device |
CN102394062B (en) * | 2011-10-26 | 2013-02-13 | 华南理工大学 | Method and system for automatically identifying voice recording equipment source |
CN106198765B (en) * | 2015-04-29 | 2019-03-15 | 中国科学院声学研究所 | A kind of acoustic signal recognition methods for Metal Crack monitoring |
CN105632516B (en) * | 2016-01-13 | 2019-07-30 | 宁波大学 | A kind of MP3 recording file source title method based on side information statistical property |
CN105845132A (en) * | 2016-03-22 | 2016-08-10 | 宁波大学 | Coding parameter statistical feature-based AAC sound recording document source identification method |
2016-12-09: CN application CN201611129639.5A, patent CN106531159B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN106531159A (en) | 2017-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||