CN108053842A - Short-wave voice endpoint detection method based on image recognition - Google Patents

Short-wave voice endpoint detection method based on image recognition Download PDF

Info

Publication number
CN108053842A
CN108053842A, CN201711330638A, CN108053842B
Authority
CN
China
Prior art keywords
signal
spectrum
voice
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711330638.1A
Other languages
Chinese (zh)
Other versions
CN108053842B (en
Inventor
陈章鑫
杨孟文
司进修
黄际彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201711330638.1A priority Critical patent/CN108053842B/en
Publication of CN108053842A publication Critical patent/CN108053842A/en
Application granted granted Critical
Publication of CN108053842B publication Critical patent/CN108053842B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L25/84Detection of presence or absence of voice signals for discriminating voice from noise
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/45Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Abstract

The invention belongs to the field of speech detection, and in particular relates to a short-wave voice endpoint detection method based on image recognition. The technical scheme is: first, the data are pre-processed to improve the signal-to-noise ratio; then the data are divided into frames of a fixed length and a short-time Fourier transform is applied at the same time, so as to obtain the spectrogram; finally, the voiceprint in the spectrogram is found by image recognition, and the speech segments in the data are determined from the voiceprint distribution. After pre-processing with the method of the invention, speech has a similar signal-to-noise ratio, so no parameters need to be adjusted in the subsequent steps; the method can therefore select speech segments adaptively under different background noises.

Description

Short-wave voice endpoint detection method based on image recognition
Technical field
The invention belongs to the field of speech detection, and in particular relates to a short-wave voice endpoint detection method based on image recognition.
Background technology
Although new radio communication systems emerge continuously, the short-wave radio still receives great attention for its autonomous communication capability and wide coverage. However, short-wave communication requires the transmitted radio waves to be reflected by the ionosphere, so its noise is comparatively strong. The presence of strong background noise prevents monitoring personnel from working for long periods, so noise reduction is necessary, and the segments without speech must be muted. To prevent speech from being missed, the performance of the voice endpoint detection method is particularly important.
In traditional speech processing there are many endpoint detection methods based on different features, such as endpoint detection based on the correlation function, on the cepstral distance, on the energy-to-zero-crossing-rate ratio, and on wavelet decomposition. For different speech, these methods can select the speech segments accurately once their parameters are tuned. But in a changing environment that demands real-time communication, adjusting the endpoint detection parameters is impractical, and the traditional speech processing methods no longer apply.
The speech spectrum diagram, or spectrogram for short, studies how the short-time spectrum of speech varies with time by means of the short-time Fourier transform. The horizontal axis of the spectrogram is time and the vertical axis is frequency; the gray stripes on it represent the short-time spectrum of speech at each moment. The spectrogram reflects the dynamic spectral characteristics of the speech signal, has important practical value in speech analysis, and is known as visible speech.
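The framing and short-time Fourier transform described here can be sketched in a few lines. This is an illustrative implementation, not the patent's code; the default frame length 200 and frame shift 80 are taken from embodiment 1:

```python
import numpy as np

def spectrogram(x, frame_len=200, frame_shift=80):
    """Magnitude spectrogram via the short-time Fourier transform.

    Rows are frequency bins and columns are frames, so time runs along
    the horizontal axis and frequency along the vertical axis, matching
    the description above.
    """
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    frames = np.stack([x[k * frame_shift:k * frame_shift + frame_len] * win
                       for k in range(n_frames)])
    # one short-time spectrum per frame; transpose to (freq_bins, n_frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

For a 1000-sample signal this yields 11 frames of 101 frequency bins each.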
Summary of the invention
To address the defects of the prior art, and exploiting the unique mechanism of human phonation together with the fact that noise spectra contain no voiceprint, the present invention proposes an adaptive processing method.
The technical scheme is: first, the data are pre-processed to improve the signal-to-noise ratio; then the data are divided into frames of a fixed length and a short-time Fourier transform is applied at the same time, so as to obtain the spectrogram; finally, the voiceprint in the spectrogram is found by image recognition, and the speech segments in the data are determined from the voiceprint distribution.
A short-wave voice endpoint detection method based on image recognition, with the following specific steps:
S1. Perform speech pre-processing. Its purpose is to ensure that the voiceprint clarity of the resulting spectrograms is roughly the same, which is the precondition for effective image recognition. The concrete steps are:
S11. During acquisition of the speech signal data, the measurement system, for various reasons, introduces into the time series a linear or slowly varying trend error that shifts the zero line of the speech signal away from the baseline; the size of the offset may even change over time. This distorts the correlation function and the power spectrum of the speech when they are computed. The trend term is fitted by the least-squares method and the trend error removed;
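A minimal sketch of the least-squares trend removal in S11. The patent does not fix the order of the trend term; a first-order (linear) fit is assumed here for illustration:

```python
import numpy as np

def remove_trend(x, order=1):
    """Fit a low-order polynomial trend by least squares and subtract it."""
    n = np.arange(len(x))
    coeffs = np.polyfit(n, x, order)      # least-squares fit of the trend term
    return x - np.polyval(coeffs, n)      # signal with the trend error removed
```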
S12. Normalize the amplitude;
S13. Low-pass filter to remove noise above 3500 Hz;
S14. Enhance the speech using multitaper-spectrum spectral subtraction;
S2. Perform image recognition on the obtained spectrogram and build a struct that contains the start points and end points of the voiceprint positions in the spectrogram. Specifically:
S21. Divide the speech signal into frames and apply a short-time Fourier transform frame by frame to obtain the short-time spectra;
S22. Arrange the short-time spectra obtained in S21 in frame order to obtain the spectrogram;
S23. Identify the voiceprint in the spectrogram of S22, i.e.: convert the color spectrogram into a grayscale image; extract the image edges of the grayscale image and identify the positions of the line segments in it; assemble the obtained start points and end points of the voiceprint positions into the struct;
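S23 can be illustrated as follows. The patent locates the line segments through image edge extraction without naming a specific detector; this simplified stand-in scans each frequency row of a thresholded grayscale spectrogram for bright horizontal runs and returns the same start/end structure:

```python
import numpy as np

def find_voiceprint_segments(gray, thresh=0.5, min_len=5):
    """Locate horizontal bright line segments (voiceprint harmonics).

    gray: 2-D array (freq_bins x frames), values in [0, 1].
    Returns a list of (start_frame, end_frame) pairs, one per detected
    horizontal segment.  Scanning rows for bright runs is a simplified
    stand-in for the patent's edge-based line identification.
    """
    segments = []
    for row in gray > thresh:             # binarize each frequency row
        d = np.diff(np.concatenate(([0], row.astype(int), [0])))
        starts, ends = np.where(d == 1)[0], np.where(d == -1)[0]
        for s, e in zip(starts, ends):
            if e - s >= min_len:          # ignore very short runs (noise)
                segments.append((s, e - 1))
    return segments
```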
S3. Perform endpoint detection. Specifically:
S31. Extract from the struct of S2 the start-point position vector ST = [st1, st2, ..., sti, ..., stn] and the end-point position vector EN = [en1, en2, ..., eni, ..., enn], where sti is the i-th start position and eni the i-th end position. Sort the start-point vector ST and the end-point vector EN in ascending order;
S32. Decide which segments contain speech: only where three horizontal line segments appear together is it regarded as a voiceprint; the rest is noise. Numerically, this means that when eni > sti+2, the line segment beginning at the i-th start point is considered to lie in a speech segment;
S33. For every line segment confirmed to lie in a speech segment, search within 100 frames on either side for a further element st'i of ST; if one exists, it is included in the speech segment as well, and, with it substituted for the original sti, the search within 100 frames on either side is repeated until no element of ST remains within 100 frames on either side.
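The decision rule of S32 (eni > sti+2) and the repeated 100-frame left/right search of S33 can be sketched as follows; the set-based expansion is an illustrative reading of the step, not the patent's code:

```python
import numpy as np

def detect_speech(st, en, window=100):
    """Decide which detected line segments belong to speech.

    st, en: start/end frame positions of the line segments.
    A segment i is accepted as voiceprint when at least three segments
    overlap, i.e. en[i] > st[i + 2] (step S32).  Accepted segments are
    then grown: any other start point within `window` frames of an
    accepted one is pulled in too, repeating until none remain (S33).
    """
    st, en = np.sort(st), np.sort(en)
    accepted = set(i for i in range(len(st) - 2) if en[i] > st[i + 2])
    changed = True
    while changed:                        # repeated left/right search
        changed = False
        for i in list(accepted):
            for j in range(len(st)):
                if j not in accepted and abs(st[j] - st[i]) <= window:
                    accepted.add(j)
                    changed = True
    return sorted(accepted)
```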
Further, the speech enhancement by multitaper-spectrum spectral subtraction in S14 proceeds as follows:
Step A. Let the time series of the speech signal be x(n). Apply windowed framing to x(n) with a Hamming window of length wlen, so that the i-th frame of the speech signal is xi(m) with frame length wlen; the discrete Fourier transform of xi(m) is Xi(k) = Σ_{m=0}^{wlen-1} xi(m) e^{-j2πkm/wlen};
Step B. Take M frames before and after the centre frame i, 2M+1 frames in total, and compute from the Xi(k) of step A the average amplitude spectrum of each component, |X̄i(k)| = (1/(2M+1)) Σ_{j=-M}^{M} |Xi+j(k)|, and the phase angle θi(k) = arctan(Im Xi(k) / Re Xi(k)), where j denotes the j-th frame after the centre frame i, Im the imaginary part and Re the real part;
Step C. A multitaper estimate averages the spectra obtained by applying several mutually orthogonal data windows to the same data sequence. The multitaper spectrum is defined as Smt(f) = (1/L) Σ_{w=1}^{L} Sw(f), where L is the number of data windows and Sw is the spectrum from the w-th window, Sw(f) = |Σ_{n=0}^{N-1} aw(n) x(n) e^{-j2πfn}|²; x(n) is the data sequence, N the sequence length, and aw(n) the w-th data window. The aw(n) form a set of mutually orthogonal discrete prolate spheroidal sequences, each used to compute a direct spectrum of the same signal, and the windows satisfy Σ_n aw(n) av(n) = 0 for w ≠ v. With this multitaper definition, a multitaper spectrum estimate is computed for each framed signal xi(m);
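Step C is the standard Thomson multitaper estimate; here is a sketch using the discrete prolate spheroidal sequences from SciPy (the choice NW = 3, L = 5 is illustrative, not taken from the patent):

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrum(x, NW=3, L=5):
    """Multitaper (multi-window) power spectrum estimate.

    Averages the periodograms computed with L mutually orthogonal DPSS
    (discrete prolate spheroidal) tapers, as in step C above.
    """
    N = len(x)
    tapers = dpss(N, NW, L)               # shape (L, N), mutually orthogonal
    # one tapered periodogram per window, then average over the windows
    specs = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return specs.mean(axis=0)
```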
Step D. Smooth the multitaper power spectral density estimates to obtain the smoothed power spectral density; compute the average noise power spectral density over the leading speech-free segment; and from these compute the gain factor, where NIS denotes the number of frames occupied by the leading speech-free segment;
Step E. Obtain the amplitude spectrum after the multitaper spectral subtraction and synthesize the enhanced speech signal. Multitaper-spectrum spectral subtraction uses the leading speech-free segment to obtain the noise power; after the noise component is subtracted from the total power, the speech signal is recovered using the phase relationship. The over-subtraction coefficient determines the degree of enhancement applied to the signal, and the gain compensation factor determines the computation duration.
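The exact gain formula of steps D and E is not reproduced in this text. As an illustration, the textbook power spectral-subtraction gain with an over-subtraction coefficient and a spectral floor looks like this (the floor parameter `beta` is an assumption, not from the patent):

```python
import numpy as np

def spectral_subtraction_gain(noisy_psd, noise_psd, alpha=1.0, beta=0.01):
    """Textbook power spectral-subtraction gain (a stand-in for steps D/E).

    alpha is the over-subtraction coefficient; beta is a spectral floor
    that keeps the squared gain from going negative.
    """
    g2 = 1.0 - alpha * noise_psd / np.maximum(noisy_psd, 1e-12)
    return np.sqrt(np.maximum(g2, beta))  # amplitude-domain gain per bin
```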
Further, the over-subtraction coefficient is chosen as follows:
I. Set the initial over-subtraction coefficient to 1, and take the initial signal-to-noise ratio snr' = 0;
II. Enhance the speech with multitaper-spectrum spectral subtraction, and compute the signal-to-noise ratio snr of the processed signal;
III. If the snr of the processed signal is greater than the initial signal-to-noise ratio snr', proceed to the next step. If the snr of the processed signal is less than or equal to snr', the speech in the signal is not salient; in that case do no processing, keep the whole speech signal, and output it directly;
IV. If the snr of the processed signal is below 8 dB, increase the over-subtraction coefficient by 0.5, set snr' = snr, and repeat steps II to IV until the signal-to-noise ratio exceeds 8 dB.
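Steps I to IV form a simple iteration; here is a sketch with placeholder `enhance` and `estimate_snr` functions (both hypothetical stand-ins, since their implementations are given elsewhere in the patent):

```python
def choose_oversubtraction(signal, enhance, estimate_snr,
                           target_snr=8.0, step=0.5):
    """Iterative choice of the over-subtraction coefficient (steps I-IV).

    enhance(signal, alpha) and estimate_snr(signal) are placeholders for
    the multitaper spectral-subtraction enhancer and an SNR estimator.
    Returns (output signal, chosen over-subtraction coefficient).
    """
    alpha, prev_snr = 1.0, 0.0            # step I: initial values
    while True:
        out = enhance(signal, alpha)      # step II: enhance and measure
        snr = estimate_snr(out)
        if snr <= prev_snr:               # step III: speech not salient,
            return signal, alpha          # keep the signal unprocessed
        if snr >= target_snr:             # step IV: 8 dB reached, stop
            return out, alpha
        alpha += step                     # step IV: raise the coefficient
        prev_snr = snr
```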
The beneficial effects of the invention are:
After pre-processing with the method of the invention, speech has a similar signal-to-noise ratio, so no parameters need to be adjusted in the subsequent steps; the method can therefore select speech segments adaptively under different background noises.
Description of the drawings
Fig. 1 is a schematic diagram of the improved multitaper-spectrum spectral subtraction.
Fig. 2 is the flow chart of the speech enhancement processing.
Fig. 3 is the method for the present invention flow chart.
Fig. 4 is the time-domain plot of the speech before pre-processing in embodiment 1.
Fig. 5 is the time-domain plot of the speech after pre-processing in embodiment 1.
Fig. 6 shows the per-frame spectra of the speech in embodiment 1.
Fig. 7 is the spectrogram after grayscale processing in embodiment 1.
Fig. 8 shows the horizontal line-segment portion of the grayscale spectrogram in embodiment 1.
Fig. 9 shows the endpoint detection result on the grayscale spectrogram in embodiment 1.
Fig. 10 shows the endpoint detection result in the time domain in embodiment 1; on the left is the raw speech, on the right the pre-processed speech.
Fig. 11 is the time-domain plot of the speech before pre-processing in embodiment 2.
Fig. 12 is the time-domain plot of the speech after pre-processing in embodiment 2.
Fig. 13 shows the per-frame spectra of the speech in embodiment 2.
Fig. 14 is the spectrogram after grayscale processing in embodiment 2.
Fig. 15 shows the horizontal line-segment portion of the grayscale spectrogram in embodiment 2.
Fig. 16 shows the endpoint detection result on the grayscale spectrogram in embodiment 2.
Fig. 17 shows the endpoint detection result in the time domain in embodiment 2; on the left is the raw speech, on the right the pre-processed speech.
Specific embodiment
The present invention will be described below in conjunction with the accompanying drawings.
The method of the invention chooses the voiceprint characteristic as the feature of the sound. Owing to the unique physiological structure of human phonation, a voiceprint can be seen in the speech spectrum diagram (spectrogram). The voiceprint of human speech has salient features: in the speech segments, the energy distribution over different frequencies follows a specific pattern, appearing in the spectrogram as several horizontal, parallel lines; these lines are the voiceprint. The voiceprint reflects personal pronunciation and phoneme characteristics and is widely applied in speech recognition.
As shown in figure 3, the method for the present invention step is as follows:
S1. Perform speech pre-processing. Its purpose is to ensure that the voiceprint clarity of the resulting spectrograms is roughly the same, which is the precondition for effective image recognition. The concrete steps are:
S11. During acquisition of the speech signal data, the measurement system, for various reasons, introduces into the time series a linear or slowly varying trend error that shifts the zero line of the speech signal away from the baseline; the size of the offset may even change over time. This distorts the correlation function and the power spectrum of the speech when they are computed. The trend term is fitted by the least-squares method and the trend error removed;
S12. Normalize the amplitude;
S13. Low-pass filter to remove noise above 3500 Hz;
S14. Enhance the speech using multitaper-spectrum spectral subtraction, specifically:
Step A. Let the time series of the speech signal be x(n). Apply windowed framing to x(n) with a Hamming window of length wlen, so that the i-th frame of the speech signal is xi(m) with frame length wlen; the discrete Fourier transform of xi(m) is Xi(k) = Σ_{m=0}^{wlen-1} xi(m) e^{-j2πkm/wlen};
Step B. Take M frames before and after the centre frame i, 2M+1 frames in total, and compute from the Xi(k) of step A the average amplitude spectrum of each component, |X̄i(k)| = (1/(2M+1)) Σ_{j=-M}^{M} |Xi+j(k)|, and the phase angle θi(k) = arctan(Im Xi(k) / Re Xi(k)), where j denotes the j-th frame after the centre frame i, Im the imaginary part and Re the real part;
Step C. A multitaper estimate averages the spectra obtained by applying several mutually orthogonal data windows to the same data sequence. The multitaper spectrum is defined as Smt(f) = (1/L) Σ_{w=1}^{L} Sw(f), where L is the number of data windows and Sw is the spectrum from the w-th window, Sw(f) = |Σ_{n=0}^{N-1} aw(n) x(n) e^{-j2πfn}|²; x(n) is the data sequence, N the sequence length, and aw(n) the w-th data window. The aw(n) form a set of mutually orthogonal discrete prolate spheroidal sequences, each used to compute a direct spectrum of the same signal, and the windows satisfy Σ_n aw(n) av(n) = 0 for w ≠ v. With this multitaper definition, a multitaper spectrum estimate is computed for each framed signal xi(m);
Step D. Smooth the multitaper power spectral density estimates to obtain the smoothed power spectral density; compute the average noise power spectral density over the leading speech-free segment; and from these compute the gain factor, where NIS denotes the number of frames occupied by the leading speech-free segment;
Step E. Obtain the amplitude spectrum after the multitaper spectral subtraction and synthesize the enhanced speech signal. Multitaper-spectrum spectral subtraction uses the leading speech-free segment to obtain the noise power; after the noise component is subtracted from the total power, the speech signal is recovered using the phase relationship. The over-subtraction coefficient determines the degree of enhancement applied to the signal, and the gain compensation factor determines the computation duration;
The over-subtraction coefficient is chosen as follows:
I. Set the initial over-subtraction coefficient to 1, and take the initial signal-to-noise ratio snr' = 0;
II. Enhance the speech with multitaper-spectrum spectral subtraction, and compute the signal-to-noise ratio snr of the processed signal;
III. If the snr of the processed signal is greater than the initial signal-to-noise ratio snr', proceed to the next step. If the snr of the processed signal is less than or equal to snr', the speech in the signal is not salient; in that case do no processing, keep the whole speech signal, and output it directly;
IV. If the snr of the processed signal is below 8 dB, increase the over-subtraction coefficient by 0.5, set snr' = snr, and repeat steps II to IV until the signal-to-noise ratio exceeds 8 dB;
S2. Perform image recognition on the obtained spectrogram and build a struct that contains the start points and end points of the voiceprint positions in the spectrogram. Specifically:
S21. Divide the speech signal into frames and apply a short-time Fourier transform frame by frame to obtain the short-time spectra;
S22. Arrange the short-time spectra obtained in S21 in frame order to obtain the spectrogram;
S23. Identify the voiceprint in the spectrogram of S22, i.e.: convert the color spectrogram into a grayscale image; extract the image edges of the grayscale image and identify the positions of the line segments in it; assemble the obtained start points and end points of the voiceprint positions into the struct;
S3. Perform endpoint detection. Specifically:
S31. Extract from the struct of S2 the start-point position vector ST = [st1, st2, ..., sti, ..., stn] and the end-point position vector EN = [en1, en2, ..., eni, ..., enn], where sti is the i-th start position and eni the i-th end position. Sort the start-point vector ST and the end-point vector EN in ascending order;
S32. Decide which segments contain speech: only where three horizontal line segments appear together is it regarded as a voiceprint; the rest is noise. Numerically, this means that when eni > sti+2, the line segment beginning at the i-th start point is considered to lie in a speech segment;
S33. For every line segment confirmed to lie in a speech segment, search within 100 frames on either side for a further element st'i of ST; if one exists, it is included in the speech segment as well, and, with it substituted for the original sti, the search within 100 frames on either side is repeated until no element of ST remains within 100 frames on either side. The purpose of this arrangement is to prevent a poorly performing line-extraction function from degrading the endpoint detection performance.
Specific embodiment 1: pink-noise background
Step 1: read in the file and plot the time-domain waveform, see Fig. 4; the time-domain plot after speech pre-processing is shown in Fig. 5.
Divide the speech into frames of length 200 with a frame shift of 80; the framed data form a 200×2964 two-dimensional matrix. Each column of 200 samples (one frame) is taken as a unit and Fourier-transformed to obtain the per-frame spectra, 2964 spectra in total. Plotting them with time on the horizontal axis and frequency on the vertical axis gives Fig. 6; taking the low-frequency part (0 Hz to 3500 Hz) and applying grayscale processing gives the spectrogram, see Fig. 7. For clarity of display, Figs. 7, 8 and 9 are rotated 90 degrees clockwise.
The white parts of Fig. 7 contain parallel stripes, i.e. the voiceprint: these are the speech portions. The white portions without stripes are caused by strong noise. The horizontal line segments in the figure are selected, see Fig. 8.
The start points and end points are stored and re-sorted by horizontal position, giving the start-point vector and the end-point vector. Only where three horizontal line segments appear together is it regarded as a voiceprint; the rest is noise. Numerically this means eni > sti+2, i.e. the end position of the i-th line segment is greater than the start position of the (i+2)-th line segment, and this serves as the criterion for whether the speech contains a speech segment. To ensure that no information is missed, the search to the left and right for further possible speech segments is repeated. The result is shown in Fig. 9, and its conversion to the time domain in Fig. 10. With the method of the invention, all speech segments are detected under a pink-noise background.
Specific embodiment 2: strong-noise background
The steps are identical to those of embodiment 1; the experimental results are as follows:
It should be noted that under a strong-noise background, a considerable noise spectrum remains after the speech enhancement, as shown in Fig. 14. In the figure, the regions of parallel lines with higher energy are the speech segments; after the speech segments, because strong noise is present, little energy remains in the spectrogram and the noise spectrum appears as dots. When the line segments are identified, as in Fig. 15, part of the noise spectrum may be identified as line segments, which causes false alarms in the endpoint detection. The final detection results are shown in Figs. 16 and 17: all speech segments in the recording are identified, but some portions containing only strong noise are mistaken for speech.

Claims (3)

1. A short-wave voice endpoint detection method based on image recognition, characterized in that its specific steps are as follows:
S1. Perform speech pre-processing. Its purpose is to ensure that the voiceprint clarity of the resulting spectrograms is roughly the same, which is the precondition for effective image recognition. The concrete steps are:
S11. During acquisition of the speech signal data, the measurement system, for various reasons, introduces into the time series a linear or slowly varying trend error that shifts the zero line of the speech signal away from the baseline; the size of the offset may even change over time. This distorts the correlation function and the power spectrum of the speech when they are computed. The trend term is fitted by the least-squares method and the trend error removed;
S12. Normalize the amplitude;
S13. Low-pass filter to remove noise above 3500 Hz;
S14. Enhance the speech using multitaper-spectrum spectral subtraction;
S2. Perform image recognition on the obtained spectrogram and build a struct that contains the start points and end points of the voiceprint positions in the spectrogram. Specifically:
S21. Divide the speech signal into frames and apply a short-time Fourier transform frame by frame to obtain the short-time spectra;
S22. Arrange the short-time spectra obtained in S21 in frame order to obtain the spectrogram;
S23. Identify the voiceprint in the spectrogram of S22, i.e.: convert the color spectrogram into a grayscale image; extract the image edges of the grayscale image and identify the positions of the line segments in it; assemble the obtained start points and end points of the voiceprint positions into the struct;
S3. Perform endpoint detection. Specifically:
S31. Extract from the struct of S2 the start-point position vector ST = [st1, st2, ..., sti, ..., stn] and the end-point position vector EN = [en1, en2, ..., eni, ..., enn], where sti is the i-th start position and eni the i-th end position. Sort the start-point vector ST and the end-point vector EN in ascending order;
S32. Decide which segments contain speech: only where three horizontal line segments appear together is it regarded as a voiceprint; the rest is noise. Numerically, this means that when eni > sti+2, the line segment beginning at the i-th start point is considered to lie in a speech segment;
S33. For every line segment confirmed to lie in a speech segment, search within 100 frames on either side for a further element st'i of ST; if one exists, it is included in the speech segment as well, and, with it substituted for the original sti, the search within 100 frames on either side is repeated until no element of ST remains within 100 frames on either side.
2. The short-wave voice endpoint detection method based on image recognition according to claim 1, characterized in that the speech enhancement by multitaper-spectrum spectral subtraction in S14 proceeds as follows:
Step A. Let the time series of the speech signal be x(n). Apply windowed framing to x(n) with a Hamming window of length wlen, so that the i-th frame of the speech signal is xi(m) with frame length wlen; the discrete Fourier transform of xi(m) is Xi(k) = Σ_{m=0}^{wlen-1} xi(m) e^{-j2πkm/wlen};
Step B. Take M frames before and after the centre frame i, 2M+1 frames in total, and compute from the Xi(k) of step A the average amplitude spectrum of each component, |X̄i(k)| = (1/(2M+1)) Σ_{j=-M}^{M} |Xi+j(k)|, and the phase angle θi(k) = arctan(Im Xi(k) / Re Xi(k)), where j denotes the j-th frame after the centre frame i, Im the imaginary part and Re the real part;
Step C. A multitaper estimate averages the spectra obtained by applying several mutually orthogonal data windows to the same data sequence. The multitaper spectrum is defined as Smt(f) = (1/L) Σ_{w=1}^{L} Sw(f), where L is the number of data windows and Sw is the spectrum from the w-th window, Sw(f) = |Σ_{n=0}^{N-1} aw(n) x(n) e^{-j2πfn}|²; x(n) is the data sequence, N the sequence length, and aw(n) the w-th data window. The aw(n) form a set of mutually orthogonal discrete prolate spheroidal sequences, each used to compute a direct spectrum of the same signal, and the windows satisfy Σ_n aw(n) av(n) = 0 for w ≠ v. With this multitaper definition, a multitaper spectrum estimate is computed for each framed signal xi(m);
Step D. Smooth the multitaper power spectral density estimates to obtain the smoothed power spectral density; compute the average noise power spectral density over the leading speech-free segment; and from these compute the gain factor, where NIS denotes the number of frames occupied by the leading speech-free segment;
Step E. Obtain the amplitude spectrum after the multitaper spectral subtraction and synthesize the enhanced speech signal. Multitaper-spectrum spectral subtraction uses the leading speech-free segment to obtain the noise power; after the noise component is subtracted from the total power, the speech signal is recovered using the phase relationship. The over-subtraction coefficient determines the degree of enhancement applied to the signal, and the gain compensation factor determines the computation duration.
3. a kind of shortwave sound end detecting method based on image identification according to claim 1, it is characterised in that:
The choosing method of the subtracting coefficient excessively is as follows:
I. The initial value of the over-subtraction coefficient is 1, and the initial signal-to-noise ratio is set to snr' = 0;
II. The speech is enhanced with the multitaper spectral subtraction, and the signal-to-noise ratio snr of the processed signal is computed;
III. If the signal-to-noise ratio snr of the processed signal is greater than the initial signal-to-noise ratio snr', proceed to the next step; if it is less than or equal to snr', the speech in the signal is not significant, so no processing is performed, the entire speech signal is kept, and it is output directly;
IV. If the signal-to-noise ratio snr of the processed signal is below 8 dB, the over-subtraction coefficient is increased by 0.5, snr' is set to snr, and steps II-IV are repeated until the signal-to-noise ratio exceeds 8 dB.
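The selection loop of steps I-IV can be sketched as follows; `enhance` stands in for the claim's multitaper spectral-subtraction enhancement and `estimate_snr` for its SNR measurement, both hypothetical placeholders supplied by the caller:

```python
def choose_over_subtraction(signal, enhance, estimate_snr,
                            target_db=8.0, step=0.5):
    """Steps I-IV: start at alpha = 1 with snr' = 0; keep raising alpha
    by 0.5 while enhancement still improves the SNR, stopping once the
    SNR exceeds 8 dB. If enhancement does not improve the SNR, the
    speech is deemed insignificant and the signal is output unchanged."""
    alpha, snr_prev = 1.0, 0.0                 # step I
    while True:
        enhanced = enhance(signal, alpha)      # step II
        snr = estimate_snr(enhanced)
        if snr <= snr_prev:                    # step III: no clear speech
            return signal, None                # keep the signal, no alpha
        if snr > target_db:                    # step IV: target reached
            return enhanced, alpha
        alpha += step                          # otherwise raise alpha
        snr_prev = snr
```

Because alpha grows in fixed 0.5 steps and the loop stops as soon as the SNR stalls or passes the 8 dB target, the procedure always terminates for any monotone SNR estimate.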
CN201711330638.1A 2017-12-13 2017-12-13 Short wave voice endpoint detection method based on image recognition Active CN108053842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711330638.1A CN108053842B (en) 2017-12-13 2017-12-13 Short wave voice endpoint detection method based on image recognition

Publications (2)

Publication Number Publication Date
CN108053842A true CN108053842A (en) 2018-05-18
CN108053842B CN108053842B (en) 2021-09-14

Family

ID=62132480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711330638.1A Active CN108053842B (en) 2017-12-13 2017-12-13 Short wave voice endpoint detection method based on image recognition

Country Status (1)

Country Link
CN (1) CN108053842B (en)


Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1299126A (en) * 2001-01-16 2001-06-13 北京大学 Method for discriminating acoustic figure with base band components and sounding parameters
US20040260540A1 (en) * 2003-06-20 2004-12-23 Tong Zhang System and method for spectrogram analysis of an audio signal
US20050288923A1 (en) * 2004-06-25 2005-12-29 The Hong Kong University Of Science And Technology Speech enhancement by noise masking
US20100023327A1 (en) * 2006-11-21 2010-01-28 Iucf-Hyu (Industry-University Cooperation Foundation Hanyang University Method for improving speech signal non-linear overweighting gain in wavelet packet transform domain
CN101727905A (en) * 2009-11-27 2010-06-09 江南大学 Method for acquiring vocal print picture with refined time-frequency structure
CN102884575A (en) * 2010-04-22 2013-01-16 高通股份有限公司 Voice activity detection
CN103117066A (en) * 2013-01-17 2013-05-22 杭州电子科技大学 Low signal to noise ratio voice endpoint detection method based on time-frequency instaneous energy spectrum
CN104637497A (en) * 2015-01-16 2015-05-20 南京工程学院 Speech spectrum characteristic extracting method facing speech emotion identification
CN105489226A (en) * 2015-11-23 2016-04-13 湖北工业大学 Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup
CN105810213A (en) * 2014-12-30 2016-07-27 浙江大华技术股份有限公司 Typical abnormal sound detection method and device
CN106024010A (en) * 2016-05-19 2016-10-12 渤海大学 Speech signal dynamic characteristic extraction method based on formant curves
CN106531174A (en) * 2016-11-27 2017-03-22 福州大学 Animal sound recognition method based on wavelet packet decomposition and spectrogram features
CN106953887A (en) * 2017-01-05 2017-07-14 北京中瑞鸿程科技开发有限公司 A kind of personalized Organisation recommendations method of fine granularity radio station audio content
CN106971740A (en) * 2017-03-28 2017-07-21 吉林大学 Probability and the sound enhancement method of phase estimation are had based on voice

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KUN-CHING WANG ET AL.: "Voice Activity Detection Algorithm with Low Signal-to-Noise Ratios Based on Spectrum Entropy", 2008 Second International Symposium on Universal Communication *
SUN HAIYING: "Research on speech endpoint detection methods based on cepstral features and voiced-speech characteristics", China Master's Theses Full-text Database (Information Science and Technology) *
XIAO CHUNZHI: "A speech enhancement algorithm based on spectrogram analysis", Voice Technology *
CHEN XIANGMIN ET AL.: "A speech endpoint detection algorithm based on the spectrogram", Voice Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109346105A (en) * 2018-07-27 2019-02-15 南京理工大学 Directly display the pitch period spectrogram method of pitch period track
CN109346105B (en) * 2018-07-27 2022-04-15 南京理工大学 Pitch period spectrogram method for directly displaying pitch period track
CN110047470A (en) * 2019-04-11 2019-07-23 深圳市壹鸽科技有限公司 A kind of sound end detecting method
CN111354378A (en) * 2020-02-12 2020-06-30 北京声智科技有限公司 Voice endpoint detection method, device, equipment and computer storage medium
CN111354378B (en) * 2020-02-12 2020-11-24 北京声智科技有限公司 Voice endpoint detection method, device, equipment and computer storage medium
CN111429905A (en) * 2020-03-23 2020-07-17 北京声智科技有限公司 Voice signal processing method and device, voice intelligent elevator, medium and equipment

Also Published As

Publication number Publication date
CN108053842B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
Drugman et al. Glottal closure and opening instant detection from speech signals
CN103854662B (en) Adaptive voice detection method based on multiple domain Combined estimator
CN108735213B (en) Voice enhancement method and system based on phase compensation
CN108053842A (en) Shortwave sound end detecting method based on image identification
CN106486131B (en) A kind of method and device of speech de-noising
CN106971740B (en) Sound enhancement method based on voice existing probability and phase estimation
CN109410977B (en) Voice segment detection method based on MFCC similarity of EMD-Wavelet
CN109545188A (en) A kind of real-time voice end-point detecting method and device
CN105788603A (en) Audio identification method and system based on empirical mode decomposition
CN108899052B (en) Parkinson speech enhancement method based on multi-band spectral subtraction
US9208799B2 (en) Method and device for estimating a pattern in a signal
CN105679312B (en) The phonetic feature processing method of Application on Voiceprint Recognition under a kind of noise circumstance
CN103730126B (en) Noise suppressing method and noise silencer
CN111091833A (en) Endpoint detection method for reducing noise influence
WO2014070139A2 (en) Speech enhancement
CN114242099A (en) Speech enhancement algorithm based on improved phase spectrum compensation and full convolution neural network
CN111489763B (en) GMM model-based speaker recognition self-adaption method in complex environment
CN110808057A (en) Voice enhancement method for generating confrontation network based on constraint naive
CN107680610A (en) A kind of speech-enhancement system and method
Hsu et al. Voice activity detection based on frequency modulation of harmonics
Amehraye et al. Perceptual improvement of Wiener filtering
CN112233657A (en) Speech enhancement method based on low-frequency syllable recognition
Vetter et al. Single channel speech enhancement using principal component analysis and MDL subspace selection
Xiao et al. Inventory based speech enhancement for speaker dedicated speech communication systems
Li et al. Robust log-energy estimation and its dynamic change enhancement for in-car speech recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant