CN106531159B - A mobile phone source identification method based on device background noise spectral features - Google Patents

A mobile phone source identification method based on device background noise spectral features

Info

Publication number
CN106531159B
Authority
CN
China
Prior art keywords
mobile phone
background noise
near-silent
silent segment
sub-library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611129639.5A
Other languages
Chinese (zh)
Other versions
CN106531159A (en)
Inventor
王让定
裴安山
严迪群
金超
徐宏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo University
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University filed Critical Ningbo University
Priority to CN201611129639.5A priority Critical patent/CN106531159B/en
Publication of CN106531159A publication Critical patent/CN106531159A/en
Application granted granted Critical
Publication of CN106531159B publication Critical patent/CN106531159B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Artificial Intelligence (AREA)
  • Telephone Function (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a mobile phone source identification method based on device background noise spectral features. Near-silent segments are estimated and extracted from each speech sample in the speech sub-library of each phone, and the extracted segments are post-processed and spliced into a final near-silent segment. A test speech sub-library is then built for each phone from its final near-silent segments, and a common background noise model shared by all phones is obtained. Next, each final background noise of each phone and its spectral distribution feature are computed. A total training set and a total test set are constructed, giving a training feature space and a test feature space. Finally, both feature spaces are reduced in dimension and normalized, a model is trained on the normalized training feature space, and the trained multi-class classification model classifies each near-silent segment in the total test set. The advantages are high recognition accuracy, good stability, and low computational complexity.

Description

A mobile phone source identification method based on device background noise spectral features
Technical field
The present invention relates to mobile phone source identification technology, and more particularly to a mobile phone source identification method based on device background noise spectral features.
Background art
Nowadays, with the rapid development of the mobile Internet and the microchip industry, the mobile terminal is no longer merely a communication device but an indispensable part of people's lives. More and more people capture the scenes they see or hear with portable devices such as smartphones and tablets rather than with professional equipment such as cameras, voice recorders, and DV (digital video) camcorders. However, the availability of large numbers of digital capture devices and captured data brings a new problem and challenge: multimedia security. As a technology for verifying the originality, authenticity, and integrity of multimedia data, multimedia forensics is a hot research topic in information security.
Mobile phone source identification is one of the applications most closely related to multimedia forensics; it is used to verify the authenticity and reliability of the source of a digital recording. This research direction has attracted many forensics researchers and has made major progress in recent years. For example, Hanilci, C., Ertas, F., Ertas, T., Eskidere, O., "Recognition of brand and models of cell-phones from recorded speech signals," IEEE Trans. Inf. Forensics Security 7(2), 625-634 (2012), proposes identifying the brand and model of a phone from MFCC (Mel-frequency cepstral coefficient) features extracted from its recordings; in a closed-set identification experiment on 14 different phone models, the recognition rate reached 96.42%. As another example, Kotropoulos, C., "Source phone identification using sketches of features," IET Biometrics 3(2): 75-83 (2014), takes the logarithm of the speech spectrum of recordings made by different phones, then either averages it along the time axis or stacks the per-frame feature parameters and models them with a Gaussian mixture model to obtain a large feature vector, which is then mapped to a low-dimensional space for dimensionality reduction; in a source identification experiment on 21 phone models from 7 brands, the recognition rate reached 94%.
However, most existing research on mobile phone source identification relies on classification features extracted from the speech itself, such as MFCC (Mel-frequency cepstral coefficient) features, LFCC (linear-frequency cepstral coefficient) features, and short-time features. Although these features achieve satisfactory results, source identification based on features extracted from the speech itself is exposed to many uncertain factors, such as the speaker's gender, changes in the speaker's emotion, and the speech content, all of which affect the recognition rate and stability. The recognition rate and stability of such methods therefore still need to be improved.
Summary of the invention
The technical problem to be solved by the invention is to provide a mobile phone source identification method based on device background noise spectral features that has high recognition accuracy, good stability, and low computational complexity.
The technical solution adopted by the invention to solve this problem is a mobile phone source identification method based on device background noise spectral features, characterized in that it comprises the following steps:
1. Select M phones of different mainstream brands and models, and select P participants of different ages and genders. Record each participant reading fixed content at a normal speaking rate using all M phones simultaneously, so that each phone collects P recordings and the M phones collect M × P recordings in total; each recording must last at least 3 minutes. Convert every recording to WAV format. Cut each WAV recording of each phone into speech clips of 3 to 10 seconds and take 10 clips as speech samples. The 10P speech samples of each phone form its speech sub-library. Here M > 1 and P ≥ 1.
2. Use an adaptive endpoint detection algorithm to estimate and extract near-silent segments from each speech sample in each phone's speech sub-library. Post-process the near-silent segments extracted from each speech sample to remove residual speech, obtaining several post-processed near-silent segments per speech sample. Then splice the post-processed near-silent segments of each speech sample into one final near-silent segment.
3. Among all final near-silent segments of each phone, keep those whose duration is at least 1.5 seconds. The retained final near-silent segments form the phone's test speech sub-library, which is used to obtain the spectral distribution feature of the background noise.
4. Use an improved spectral subtraction method to suppress the ambient noise in each near-silent segment of each phone's test speech sub-library, obtaining an ambient noise model for each near-silent segment. Then obtain the common background noise model shared by all phones; its value at the k-th frequency bin is denoted BN_mean(k) [the defining formula is missing from this text]. In that formula, "| |" denotes the absolute value; BN_m(k, n) denotes the spectral coefficient, in the short-time Fourier transform domain, at the k-th frequency bin and n-th frame of the spectrogram of all near-silent-segment ambient noise models in the test speech sub-library of the m-th phone; 1 ≤ k ≤ K, where K is the total number of frequency bins of each near-silent segment [the formula relating K to K_fft is missing from this text]; and K_fft is the number of points of the short-time Fourier transform.
5. Take the difference between each near-silent segment in each phone's test speech sub-library and the common background noise model as one background noise of that phone. Apply median filtering to each background noise to remove the ambient noise remaining in it, obtaining each final background noise of the phone. Apply the Fourier transform to each final background noise to obtain its spectral coefficients, then take the base-10 logarithm of those coefficients. Finally, average the log spectral coefficients of each final background noise over its first T frames along the time axis, and use the average as the spectral distribution feature of that final background noise. Here the Fourier transform has K_fft points, the duration of T frames is at most 1.5 seconds, T ≥ 3, and the spectral distribution feature has dimension K.
6. Count the near-silent segments in each phone's test speech sub-library and take the smallest count as the base value. From all near-silent segments in each phone's test speech sub-library, randomly select half the base value of segments to form the phone's training subset; from the remaining segments, randomly select another half the base value of segments to form the phone's test subset. The training subsets of all phones form a total training set, and the test subsets form a total test set. The spectral distribution features of the final background noises obtained from the total training set form the training feature space, and those obtained from the total test set form the test feature space. Reduce the dimension of the training feature space by principal component analysis and normalize all values in the reduced training feature space. Reduce the dimension of the test feature space with the mapping matrix obtained from the training feature space, and normalize all values in the reduced test feature space. Finally, train a model on the normalized training feature space with MATLAB's built-in SVM classification function to obtain a trained multi-class classification model, and use it to classify each near-silent segment in the total test set.
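The balanced split described in step 6 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `balanced_split`, the dictionary layout, the fixed random seed, and the use of floor division when the base value is odd are all assumptions.

```python
import random


def balanced_split(segments_by_phone, seed=0):
    """Split each phone's segments into equally sized train/test subsets.

    segments_by_phone maps a phone label to a list of segments
    (hypothetical layout). The base value is the smallest segment count
    over all phones; each phone contributes base // 2 segments to the
    training set and another base // 2 to the test set.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    base = min(len(segs) for segs in segments_by_phone.values())
    half = base // 2  # assumption: round down when base is odd
    train, test = [], []
    for phone, segs in segments_by_phone.items():
        shuffled = list(segs)
        rng.shuffle(shuffled)
        # The first half goes to training, the next half (drawn from the
        # remaining segments) goes to testing; anything left is unused.
        train.extend((phone, s) for s in shuffled[:half])
        test.extend((phone, s) for s in shuffled[half:2 * half])
    return train, test


example = {"phone_a": list(range(10)), "phone_b": list(range(6))}
train_set, test_set = balanced_split(example)
```

Because every phone contributes the same number of segments, no class dominates either set, which appears to be the purpose of using the smallest count as the base value.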
In step 2, the near-silent segments extracted from each speech sample in each phone's speech sub-library are post-processed as follows: among all sampling points of the extracted near-silent segments, find those whose sample value is less than 5 × Thr; each run of consecutive such sampling points forms one post-processed near-silent segment, giving the several post-processed near-silent segments of that speech sample. Here Thr is computed from the near-silent segments extracted by the adaptive endpoint detection algorithm: the absolute sample values of all their sampling points are sorted in ascending order, and Thr is the mean of the first 30% to 50% of them.
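The thresholding rule above can be sketched as follows. This is a hedged illustration only: the function names, the comparison on absolute sample values, and the minimum run length of two samples are assumptions, since the text says only that "consecutive multiple" sampling points form a segment.

```python
def estimate_thr(samples, fraction=0.4):
    """Mean of the smallest `fraction` of the absolute sample values.

    The text allows fractions from 0.3 to 0.5.
    """
    magnitudes = sorted(abs(s) for s in samples)
    count = max(1, int(len(magnitudes) * fraction))
    return sum(magnitudes[:count]) / count


def post_process(samples, fraction=0.4, factor=5.0, min_run=2):
    """Return the runs of consecutive samples whose magnitude is below factor * Thr."""
    limit = factor * estimate_thr(samples, fraction)
    runs, current = [], []
    for s in samples:
        if abs(s) < limit:  # assumption: compare the absolute value
            current.append(s)
        else:
            if len(current) >= min_run:  # assumption: a run needs >= 2 samples
                runs.append(current)
            current = []
    if len(current) >= min_run:
        runs.append(current)
    return runs


def splice(runs):
    """Concatenate the post-processed runs into one final near-silent segment."""
    return [s for run in runs for s in run]


# Toy integer "samples": the large values stand in for residual speech.
toy = [1, -1, 2, 90, -80, 1, 2, 50, 1]
final_segment = splice(post_process(toy))
```

A duration check on the spliced result (at least 1.5 seconds, that is, at least 1.5 times the sampling rate in samples) would then decide whether the final segment is kept.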
In step 6, MATLAB's built-in SVM classification function uses an RBF kernel, and the optimal values of its penalty coefficient and gamma coefficient are obtained by cross-validation.
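For reference, the RBF kernel is K(x, y) = exp(-gamma * ||x - y||^2), and cross-validation scores each candidate (penalty, gamma) pair on held-out folds. The sketch below shows only these building blocks in Python rather than MATLAB. The power-of-two candidate grid and the interleaved fold assignment are assumptions; the patent states neither the grid nor the number of folds.

```python
import math
from itertools import product


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def parameter_grid(c_exponents, gamma_exponents):
    """Candidate (C, gamma) pairs as powers of two (assumed search grid)."""
    return [(2.0 ** c, 2.0 ** g) for c, g in product(c_exponents, gamma_exponents)]


def kfold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold.

    Samples are assigned to folds in an interleaved fashion; shuffling
    the data beforehand is left to the caller.
    """
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, validation in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, validation


candidates = parameter_grid(range(-5, 16, 2), range(-15, 4, 2))
```

An SVM trainer would be run once per candidate and fold, and the pair with the best mean validation accuracy would be kept for the final model.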
Compared with the prior art, the advantages of the present invention are as follows:
1) The method estimates the background noise of each phone from each near-silent segment in its test speech sub-library and the common background noise model of all phones, and then median-filters each background noise so that the resulting final background noises contain no residual ambient noise. The spectral distribution features obtained on this basis therefore classify phones better; extensive experiments show that the recognition rate of the method can reach 99.24%.
2) Most previous mobile phone source identification methods rely on information in the speech itself and are easily affected by factors such as the text content of the speech samples and the speaker's emotion, which makes identification unstable. The present method extracts spectral distribution features and identifies the phone from near-silent segments, so its stability is better.
3) The spectral distribution feature extraction in the method is simple, and after the training and test feature spaces are reduced in dimension the amount of computation drops considerably, so computational efficiency is high and computational complexity is low.
Brief description of the drawings
Fig. 1 is the overall block diagram of the method of the invention;
Fig. 2a is the waveform of a speech sample;
Fig. 2b shows the detection result of the existing adaptive endpoint detection algorithm on the waveform of the speech sample in Fig. 2a;
Fig. 2c is the waveform of the near-silent segments extracted from the speech sample in Fig. 2a;
Fig. 2d is the final near-silent segment obtained by post-processing and splicing the near-silent segments in Fig. 2c;
Fig. 3a is the spectrogram of the final background noise of an HTC D820t phone;
Fig. 3b is the spectrogram of the final background noise of a Huawei Honor 7 phone;
Fig. 3c is the spectrogram of the final background noise of an iPhone 5;
Fig. 3d is the spectrogram of the final background noise of another iPhone 5;
Fig. 3e is the spectrogram of the final background noise of a Meizu MX4 phone;
Fig. 3f is the spectrogram of the final background noise of a Xiaomi Mi 3 phone;
Fig. 3g is the spectrogram of the final background noise of an OPPO OnePlus phone;
Fig. 3h is the spectrogram of the final background noise of a Samsung Galaxy S5 phone;
Fig. 4a is the spectrogram of the actual background noise of an iPhone 6;
Fig. 4b is the spectrogram of the final background noise of the iPhone 6 obtained with the method of the invention;
Fig. 4c compares the spectrum of the actual background noise of the iPhone 6 with that of the final background noise of the iPhone 6 obtained with the method of the invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiments and the drawings.
The overall block diagram of the mobile phone source identification method based on device background noise spectral features proposed by the invention is shown in Fig. 1. The method comprises the following steps:
1. Select M phones of different mainstream brands and models, and select P participants of different ages and genders. Record each participant reading fixed content at a normal speaking rate using all M phones simultaneously, so that each phone collects P recordings and the M phones collect M × P recordings in total; each recording must last at least 3 minutes. Convert every recording to WAV format. Cut each WAV recording of each phone into speech clips of 3 to 10 seconds and take 10 clips as speech samples. The 10P speech samples of each phone form its speech sub-library. Here M > 1 and P ≥ 1. In this embodiment M = 24 and P = 12, for example 6 male and 6 female participants of different ages, and every recording is made in a quiet office.
2. Use an existing adaptive endpoint detection algorithm to estimate and extract near-silent segments from each speech sample in each phone's speech sub-library. Post-process the near-silent segments extracted from each speech sample to remove residual speech, obtaining several post-processed near-silent segments per speech sample. Then splice the post-processed near-silent segments of each speech sample into one final near-silent segment; its duration is necessarily shorter than that of the corresponding speech sample.
Near-silent segments are estimated from each speech sample first because the near-silent part of a recording consists mainly of device background noise and ambient noise, and is not contaminated by the acoustic-electric response non-uniformity noise that dominates the overall noise in the speech part. An adaptive endpoint detection algorithm is therefore used to estimate the near-silent segments, which it identifies well. The identified near-silent segments still contain a small amount of speech, however, so they are post-processed to remove it and then merged into the final near-silent segment.
Fig. 2a shows the waveform of a speech sample, Fig. 2b shows the detection result of the existing adaptive endpoint detection algorithm on that waveform, Fig. 2c shows the waveform of the near-silent segments extracted from the speech sample in Fig. 2a, and Fig. 2d shows the final near-silent segment obtained by post-processing and splicing the near-silent segments in Fig. 2c. Figs. 2a and 2b show that the method identifies near-silent segments well. Fig. 2c shows that the extracted near-silent segments still contain a small amount of speech, and Fig. 2d shows that after the post-processing of the method the final near-silent segment contains no speech.
In this embodiment, the near-silent segments extracted from each speech sample in step 2 are post-processed as follows: among all sampling points of the extracted near-silent segments, find those whose sample value is less than 5 × Thr; each run of consecutive such sampling points forms one post-processed near-silent segment, giving the several post-processed near-silent segments of that speech sample. Here Thr is computed from the near-silent segments extracted by the existing adaptive endpoint detection algorithm: the absolute sample values of all their sampling points are sorted in ascending order, and Thr is the mean of the first 30% to 50% of them. In this embodiment Thr is the mean of the first 40%.
3. The final near-silent segments of the speech samples in a speech sub-library differ in length, and the feature matrices must have a consistent length when the feature space is built. Final near-silent segments of at least 1.5 seconds are therefore kept and shorter ones are discarded. That is, among all final near-silent segments of each phone (each speech sub-library yields 10P final near-silent segments), keep those whose duration is at least 1.5 seconds; the retained final near-silent segments form the phone's test speech sub-library, which is used to obtain the spectral distribution feature of the background noise.
4. To obtain the actual device background noise from the final near-silent segments, the ambient noise must be suppressed as far as possible. An existing improved spectral subtraction method is therefore used to suppress the ambient noise in each near-silent segment of each phone's test speech sub-library, obtaining an ambient noise model for each near-silent segment. Then obtain the common background noise model shared by all phones; its value at the k-th frequency bin is denoted BN_mean(k) [the defining formula is missing from this text]. In that formula, "| |" denotes the absolute value; BN_m(k, n) denotes the spectral coefficient, in the short-time Fourier transform (STFT) domain, at the k-th frequency bin and n-th frame of the spectrogram of all near-silent-segment ambient noise models in the test speech sub-library of the m-th phone; 1 ≤ k ≤ K, where K is the total number of frequency bins of each near-silent segment [the formula relating K to K_fft is missing from this text]; and K_fft is the number of points of the STFT. In this embodiment the STFT has 4096 points [the resulting value of K is missing from this text].
5. Take the difference between each near-silent segment in each phone's test speech sub-library and the common background noise model as one background noise of that phone. Apply median filtering to each background noise to remove the ambient noise remaining in it, obtaining each final background noise of the phone. Apply the Fourier transform to each final background noise to obtain its spectral coefficients, then take the base-10 logarithm of those coefficients. Finally, average the log spectral coefficients of each final background noise over its first T frames along the time axis, and use the average as the spectral distribution feature of that final background noise. Here the Fourier transform has K_fft points, the duration of T frames is at most 1.5 seconds, T ≥ 3, and the spectral distribution feature has dimension K.
Fig. 3 a gives the sound spectrograph of the final background noise of HTC D820t mobile phone, and Fig. 3 b gives 7 mobile phone of Huawei's honor Final background noise sound spectrograph, Fig. 3 c gives the sound spectrograph of the final background noise of 5 mobile phone of apple, and Fig. 3 d is provided The sound spectrograph of the final background noise of another 5 mobile phone of apple, Fig. 3 e give the final background noise of Meizu MX4 mobile phone Sound spectrograph, Fig. 3 f give the sound spectrograph of the final background noise of 3 mobile phone of millet, and Fig. 3 g gives OPPO mono- and adds the final of mobile phone The sound spectrograph of background noise, Fig. 3 h give the sound spectrograph of the final background noise of the happy generation S5 mobile phone of Samsung lid.From Fig. 3 a to figure In 3h as can be seen that the sound spectrograph of background noise of different brands mobile phone there are great differences, for example, the background of 3 mobile phone of millet The energy of noise is all strongest, the sound spectrograph of the background noise of Meizu MX4 mobile phone at all Frequency point intervals (0-16KHZ) Amplitude curve be with frequency in fluctuating change trend, the sound spectrograph of the background noise of HTC D820t mobile phone is in frequency Near 4000Hz, there is a sharp decline.
Fig. 4a shows the spectrogram of the actual background noise of an iPhone 6, Fig. 4b shows the spectrogram of the final background noise of the iPhone 6 obtained with the method of the present invention, and Fig. 4c compares the spectra of the two. Fig. 4c shows that the two spectra are very similar, which demonstrates that the way the method of the present invention obtains the final background noise of a mobile phone is feasible and effective.
6. Count the total number of near-silent segments in the test speech sub-library of each mobile phone and take the smallest total as the base number. From all near-silent segments in the test speech sub-library of each phone, randomly select half the base number of segments to form the sub-training set of that phone; from the remaining near-silent segments, randomly select another half the base number of segments to form the sub-test set of that phone. Then combine the sub-training sets of all phones into one overall training set and the sub-test sets of all phones into one overall test set. Then form a training feature space from the spectral distribution features of the final background noise of all phones obtained from the overall training set, and a test feature space from those obtained from the overall test set. Next, reduce the dimension of the training feature space with principal component analysis (PCA) and normalize all values in the reduced training feature space; reduce the dimension of the test feature space with the same mapping matrix that was used for the training feature space, and normalize all values in the reduced test feature space. Finally, use the SVM classification function built into Matlab to train a model on the normalized training feature space, obtaining a trained multi-class model, and use that model to classify each near-silent segment in the overall test set.
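The balanced split described in step 6 can be sketched as below. The function name, the dictionary layout and the fixed random seed are illustrative assumptions; the text only specifies that each phone contributes half the base number of segments to training and another half, drawn from the remainder, to testing.

```python
import numpy as np

def balanced_split(segment_counts, seed=0):
    """Return {phone: (train_idx, test_idx)} with equal sizes for every phone.

    segment_counts maps a phone name to its number of near-silent segments.
    """
    base = min(segment_counts.values())   # smallest total is the base number
    half = base // 2
    rng = np.random.default_rng(seed)
    split = {}
    for phone, count in segment_counts.items():
        order = rng.permutation(count)     # random order of segment indices
        train_idx = np.sort(order[:half])
        test_idx = np.sort(order[half:2 * half])  # drawn from the remainder
        split[phone] = (train_idx, test_idx)
    return split

# Hypothetical counts: phone_B has more segments but contributes the same number.
split = balanced_split({"phone_A": 240, "phone_B": 300})
```

Drawing the same number of segments from every phone keeps the classes balanced, so no phone dominates the training set merely because it yielded more near-silent segments.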
In this embodiment, the Matlab SVM classification function used in step 6 employs an RBF kernel, and the optimal values of its penalty coefficient and gamma coefficient are found by cross-validation.
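For orientation, the two ingredients named here can be sketched in a few lines. The RBF kernel is the standard definition K(x, z) = exp(-gamma * ||x - z||^2); the fold generator shows one common way to produce cross-validation splits. Both function names and the choice of 5 folds are assumptions, and the internals of the Matlab function are not described in the text.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Pairwise RBF kernel between the rows of X and the rows of Z."""
    sq_dist = ((X ** 2).sum(axis=1)[:, None]
               + (Z ** 2).sum(axis=1)[None, :]
               - 2.0 * X @ Z.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, folds[i]

# Each candidate (penalty, gamma) pair would be scored on every fold and
# the pair with the best mean validation accuracy kept.
K = rbf_kernel(np.array([[0.0, 0.0]]), np.array([[1.0, 0.0]]), gamma=0.5)
```

A larger gamma makes the kernel more local, while a larger penalty coefficient tolerates fewer training errors, which is why the two are usually tuned jointly.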
To further demonstrate the feasibility and effectiveness of the method of the present invention, the method was verified experimentally.
In the experiment, a speech sub-library was built for each mobile phone so that the feasibility and effectiveness of the method could be assessed. Table 1 lists the brands and models of the 24 mobile phones used, all of which were used to collect speech. Twelve participants (6 male, 6 female) were invited to take part in the collection. Each participant read fixed content aloud at a normal speaking rate for at least 3 minutes. The recording environment was a relatively quiet office, and the 24 phones started and stopped recording simultaneously. Each phone recorded the speech of all 12 participants; each recording was divided into 5-second speech segments, giving 400 speech samples per phone, which form the speech sub-library of that phone. Near-silent segments were estimated and extracted from each speech sample in the speech sub-library of each phone, then post-processed and spliced to give the final near-silent segment of each speech sample. Because the near-silent segments differ in length, and the feature matrices must have a consistent length when the feature space is built, 240 near-silent segments longer than 40 frames were selected for each phone model to form the test speech sub-library used to compute the spectral distribution features of the background noise. When the feature space was built, the spectral distribution feature of the background noise was averaged over the first 40 frames of each near-silent segment, with a frame length of 30 ms and a frame shift of 15 ms.
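The framing figures above can be checked with a little arithmetic: T overlapping frames span (T - 1) × shift + length milliseconds. The helper names below are illustrative only.

```python
def frames_duration_ms(n_frames, frame_ms=30, hop_ms=15):
    """Time span covered by n_frames overlapping frames, in milliseconds."""
    return (n_frames - 1) * hop_ms + frame_ms

def frame_count(duration_ms, frame_ms=30, hop_ms=15):
    """Number of complete frames that fit in a segment of duration_ms."""
    if duration_ms < frame_ms:
        return 0
    return 1 + (duration_ms - frame_ms) // hop_ms

print(frames_duration_ms(40))  # 615
print(frame_count(1500))       # 99
```

Forty frames therefore cover 615 ms, which satisfies the requirement in step 5 that the T frames last no more than 1.5 seconds.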
Table 1: Brands, models and class names of the mobile phones used in the experiment
Classification combines principal component analysis (PCA) with the SVM classification function built into Matlab. From all near-silent segments in the test speech sub-library of each phone, half the base number of segments are randomly selected to form the sub-training set of that phone, and from the remaining near-silent segments another half the base number are randomly selected to form the sub-test set of that phone. The sub-training sets of all phones are combined into one overall training set, and the sub-test sets of all phones into one overall test set. The spectral distribution features of the final background noise of all phones obtained from the overall training set form a training feature space, and those obtained from the overall test set form a test feature space. The training feature space is first reduced in dimension with PCA and all values in the reduced training feature space are normalized; the test feature space is reduced with the mapping matrix used for the training feature space, and all values in the reduced test feature space are then normalized. Finally, the Matlab SVM classification function is trained on the normalized training feature space, and the trained multi-class model is used to classify each near-silent segment in the overall test set.
As stated above, the short-time Fourier transform uses 4096 points, so the spectral distribution feature of each final background noise of each phone has 2049 dimensions. With so many dimensions, the components of the feature may not be fully independent and uncorrelated. Redundant feature components do not improve recognition accuracy and can even degrade performance, so PCA is used for dimensionality reduction to form the best training and test feature spaces. Experiments showed that the recognition rate is highest when the reduced spectral distribution feature has 28 dimensions; in that case the penalty coefficient and gamma coefficient of the Matlab SVM classification function are 112 and 0.01, respectively.
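The reduction from 2049 to 28 dimensions, with the test features projected through the mapping learned on the training features, can be sketched as follows. This is an illustration under assumptions: the function names, the SVD-based PCA, the min-max normalization, and the reuse of the training minimum and range for the test features are choices made here, since the text does not state which normalization is used. The random matrices merely stand in for real features.

```python
import numpy as np

def fit_pca(train, n_components):
    """Return the training mean and the (D, n_components) mapping matrix."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components].T

def fit_minmax(x):
    """Return per-column minimum and range (range 1.0 for constant columns)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2049))   # synthetic stand-in features
test = rng.normal(size=(50, 2049))

mean, mapping = fit_pca(train, 28)
train_low = (train - mean) @ mapping
test_low = (test - mean) @ mapping     # same mapping matrix as for training

lo, span = fit_minmax(train_low)
train_norm = (train_low - lo) / span
test_norm = (test_low - lo) / span     # assumption: reuse training statistics
```

Fitting the mapping on the training features only keeps information from the test set out of the model, which is why the test features are projected rather than decomposed separately.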
Table 2 lists the recognition rates of the 24 mobile phones; the average recognition accuracy calculated from the data in Table 2 is 99.24%. The method of the present invention classifies the 24 phones well. The recognition rate of the iPhone 6 is 91.67%, and its errors occur mainly within the same brand: it is misclassified as the iPhone 4s and the iPhone 5s. Apart from the iPhone 6, the other phones have high recognition accuracy; 18 phones reach 100%, and brands such as Samsung, OPPO and Meizu are classified without error. These results indicate that the background noise of a mobile phone can serve as a "fingerprint" of the phone for source identification. In audio forensics for mobile phone source identification, the background noise of the phone is a highly discriminative feature.
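Per-phone recognition rates and their average are typically read off a confusion matrix, as in the sketch below. The function name and the toy labels are illustrative and are not data from the experiment. As a plausibility check on the reported figures, if each phone contributes 120 test segments (half of the 240 selected segments), a rate of 91.67% would correspond to 110 of 120 segments classified correctly.

```python
import numpy as np

def recognition_rates(true_labels, predicted_labels, n_classes):
    """Return the confusion matrix, per-class rates (%) and their mean.

    Assumes every class occurs at least once in true_labels.
    """
    confusion = np.zeros((n_classes, n_classes), dtype=int)
    # Row = true class, column = predicted class.
    np.add.at(confusion, (true_labels, predicted_labels), 1)
    per_class = 100.0 * np.diag(confusion) / confusion.sum(axis=1)
    return confusion, per_class, per_class.mean()

# Toy data, not from the patent: 3 classes with 4 test segments each.
true_labels = np.repeat(np.arange(3), 4)
predicted_labels = true_labels.copy()
predicted_labels[0] = 1   # one class-0 segment misclassified as class 1
confusion, per_class, average = recognition_rates(true_labels, predicted_labels, 3)
```

The off-diagonal entries of the matrix show which phones are confused with each other, which is how within-brand errors such as those reported for the iPhone 6 become visible.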
Table 2: Recognition rate (%) of the 24 mobile phones

Claims (3)

1. A mobile phone source identification method based on the spectral features of device background noise, characterized by comprising the following steps:
1. Select M mobile phones of different mainstream brands and different mainstream models, and select P participants of different ages and genders; then use the M phones to simultaneously record each participant reading fixed content aloud at a normal speaking rate, so that each phone collects P recordings and the M phones collect M × P recordings in total, each recording being required to last at least 3 minutes; then convert each recording collected by each phone to WAV format; next, divide each WAV recording of each phone into speech segments of 3 to 10 seconds and take 10 speech segments as speech samples; then form a speech sub-library from the 10P speech samples of each phone; wherein M > 1 and P ≥ 1;
2. Use an adaptive endpoint detection algorithm to estimate and extract near-silent segments from each speech sample in the speech sub-library of each mobile phone; then post-process the near-silent segments extracted from each speech sample in the speech sub-library of each phone to eliminate unwanted speech components in them, obtaining multiple post-processed near-silent segments for each speech sample; then splice the post-processed near-silent segments of each speech sample together into one final near-silent segment;
3. From all final near-silent segments of each mobile phone, retain those whose duration is at least 1.5 seconds, and form from all retained final near-silent segments the test speech sub-library of that phone, which is used to compute the spectral distribution features of the background noise;
4. Use improved spectral subtraction to suppress the ambient noise of each near-silent segment in the test speech sub-library of each mobile phone, obtaining the ambient noise model of each near-silent segment in the test speech sub-library of each phone; then obtain the common ambient noise model of all mobile phones, whose value at the k-th frequency point is denoted BN_mean(k); wherein the symbol "| |" denotes the absolute value, BN_m(k,n) denotes the short-time Fourier transform coefficient at the k-th frequency point of the n-th frame in the spectrogram of the ambient noise models of all near-silent segments in the test speech sub-library of the m-th phone, 1 ≤ k ≤ K, K denotes the total number of frequency points of each near-silent segment, K = K_fft/2 + 1, and K_fft denotes the number of points of the short-time Fourier transform;
5. Take the difference between each near-silent segment in the test speech sub-library of each mobile phone and the common ambient noise model of all mobile phones as one background noise of that phone; then apply median filtering to each background noise of each phone to remove the ambient noise remaining in it, obtaining each final background noise of each phone; then apply a Fourier transform to each final background noise of each phone to obtain its spectral coefficients; then take the base-10 logarithm of the spectral coefficients of each final background noise of each phone; finally, average the log spectral coefficients of each final background noise over the first T frames along the time axis, and use the average as the spectral distribution feature of that final background noise; wherein the number of points of the Fourier transform is K_fft, the duration of the T frames is at most 1.5 seconds, T ≥ 3, and the dimension of the spectral distribution feature of each final background noise of each phone is K;
6. Count the total number of near-silent segments in the test speech sub-library of each mobile phone and take the smallest total as the base number; from all near-silent segments in the test speech sub-library of each phone, randomly select half the base number of segments to form the sub-training set of that phone, and from the remaining near-silent segments randomly select another half the base number of segments to form the sub-test set of that phone; then combine the sub-training sets of all phones into one overall training set and the sub-test sets of all phones into one overall test set; then form a training feature space from the spectral distribution features of the final background noise of all phones obtained from the overall training set, and a test feature space from those obtained from the overall test set; next, reduce the dimension of the training feature space with principal component analysis and normalize all values in the reduced training feature space; reduce the dimension of the test feature space with the mapping matrix used for the training feature space, and normalize all values in the reduced test feature space; finally, use the SVM classification function built into Matlab to train a model on the normalized training feature space, obtaining a trained multi-class model, and use the trained multi-class model to classify each near-silent segment in the overall test set.
2. The mobile phone source identification method based on the spectral features of device background noise according to claim 1, characterized in that, in step 2, the near-silent segments extracted from each speech sample in the speech sub-library of each mobile phone are post-processed as follows: among all sampling points of the near-silent segments extracted from each speech sample, find all sampling points whose sample value is less than 5 × Thr; each run of consecutive such sampling points forms one post-processed near-silent segment, giving the multiple post-processed near-silent segments of each speech sample in the speech sub-library of each phone; wherein Thr denotes the average of the first 30% to 50% of the sample values obtained when the absolute sample values of all sampling points of the near-silent segments extracted by the adaptive endpoint detection algorithm from each speech sample are sorted in ascending order.
3. The mobile phone source identification method based on the spectral features of device background noise according to claim 1 or 2, characterized in that the Matlab SVM classification function used in step 6 employs an RBF kernel, and the optimal values of the penalty coefficient and gamma coefficient of the Matlab SVM classification function are obtained by cross-validation.
CN201611129639.5A 2016-12-09 2016-12-09 A kind of mobile phone source title method based on equipment background noise spectrum signature Active CN106531159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611129639.5A CN106531159B (en) 2016-12-09 2016-12-09 A kind of mobile phone source title method based on equipment background noise spectrum signature

Publications (2)

Publication Number Publication Date
CN106531159A CN106531159A (en) 2017-03-22
CN106531159B true CN106531159B (en) 2019-06-18

Family

ID=58341615

Country Status (1)

Country Link
CN (1) CN106531159B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant