CN108053842A - Shortwave sound end detecting method based on image identification - Google Patents
- Publication number
- CN108053842A (application CN201711330638.1A)
- Authority
- CN
- China
- Prior art keywords
- signal
- spectrum
- voice
- carried out
- noise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Abstract
The invention belongs to the field of speech detection, and in particular relates to a shortwave voice endpoint detection method based on image recognition. The technical scheme is: first pre-process the data to improve the signal-to-noise ratio; then divide the data into frames of a specified length and apply a short-time Fourier transform to obtain the spectrogram; finally, use image recognition to find the voiceprint in the spectrogram and determine the speech segments in the data from the voiceprint distribution. Speech pre-processed by the method of the invention has a similar signal-to-noise ratio, so no parameters need to be adjusted in the subsequent steps; the method can therefore pick out speech segments adaptively under different background noises.
Description
Technical field
The invention belongs to the field of speech detection, and in particular concerns a shortwave voice endpoint detection method based on image recognition.
Background art
Although new radio communication systems continue to emerge, the shortwave radio still receives great attention because of its autonomous communication capability and wide coverage. However, shortwave communication requires the transmitted radio waves to be reflected by the ionosphere, so its noise is relatively strong. The strong background noise prevents monitoring personnel from working for long periods, so noise reduction is necessary, and the segments without speech must be muted. To avoid missing speech during this muting, the performance of the voice endpoint detection method becomes particularly important.
In traditional speech processing there are many endpoint detection methods based on different features, such as endpoint detection based on the correlation function, on the cepstral distance, on the energy-to-zero-crossing ratio, and on wavelet decomposition. For a given speech signal, the speech segments can be selected accurately by tuning the parameters. But in a changing environment that demands real-time communication, adjusting the endpoint detection parameters is impractical, and the traditional speech processing methods no longer apply.
The speech spectrum plot, spectrogram for short, is obtained by a short-time Fourier transform of the speech and shows how the short-time spectrum varies with time. The horizontal axis of the spectrogram is time and the vertical axis is frequency; the gray-scale stripes on it represent the short-time spectrum at each moment. Because the spectrogram reflects the dynamic spectral characteristics of the speech signal, it has important practical value in speech analysis and is called "visible speech".
Summary of the invention
To overcome the defects of the prior art, the present invention proposes an adaptive processing method based on the unique mechanism of human vocalization and on the fact that a noise spectrum contains no voiceprint.
The technical scheme is: first pre-process the data to improve the signal-to-noise ratio; then divide the data into frames of a specified length and apply a short-time Fourier transform to obtain the spectrogram; finally, use image recognition to find the voiceprint in the spectrogram and determine the speech segments in the data from the voiceprint distribution.
A shortwave voice endpoint detection method based on image recognition, with the following steps:
S1. Speech pre-processing. The purpose of pre-processing is to ensure that the voiceprint clarity of the resulting spectrograms is roughly the same, which is the precondition for effective image recognition. The concrete steps are:
S11. During acquisition of the speech signal data, the measurement system introduces, for various reasons, a linear or slowly varying trend error in the time series, which shifts the zero line of the speech signal away from the baseline, possibly by an amount that changes with time; this distorts the correlation function and the power spectrum computed from the speech. A trend term fitted by least squares is removed to eliminate the trend error;
S12. Normalize the amplitude;
S13. Low-pass filter to remove noise above 3500 Hz;
S14. Enhance the speech with spectral subtraction based on the multitaper spectrum.
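The chain S11-S13 can be sketched as follows (the multitaper spectral subtraction of S14 is detailed separately below). The first-order polynomial fit and the 4th-order Butterworth filter are assumptions: the patent specifies only a least-squares trend fit and a 3500 Hz cutoff.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(x, fs, detrend_order=1):
    """Sketch of S11-S13: detrend, normalize, low-pass at 3500 Hz.

    detrend_order and the Butterworth design are assumptions; the patent
    only requires a least-squares trend fit and a 3500 Hz cutoff.
    """
    # S11: remove a least-squares polynomial trend from the time series
    t = np.arange(len(x))
    coeffs = np.polyfit(t, x, detrend_order)
    x = x - np.polyval(coeffs, t)
    # S12: amplitude normalization
    x = x / np.max(np.abs(x))
    # S13: low-pass filter, cutoff 3500 Hz (zero-phase filtering)
    b, a = butter(4, 3500 / (fs / 2), btype="low")
    return filtfilt(b, a, x)
```

Zero-phase filtering (`filtfilt`) is used so the filter does not shift the endpoints in time, which matters for the later detection step.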
S2. Apply image recognition to the obtained spectrogram and build a structure containing the starting point and end point of each voiceprint position in the spectrogram. Specifically:
S21. Divide the speech signal into frames and apply a short-time Fourier transform frame by frame to obtain the short-time spectra;
S22. Arrange the short-time spectra of S21 in frame order to obtain the spectrogram;
S23. Identify the voiceprint in the spectrogram of S22: convert the color spectrogram to a gray-scale image; extract the image edges of the gray-scale image and identify the positions of the line segments in it; collect the resulting starting points and end points of the voiceprint positions into the structure.
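A minimal sketch of S21-S23 follows. The frame length and shift match embodiment 1; the FFT size, the dB threshold, and the run-length test that stands in for the patent's edge detection are all assumptions.

```python
import numpy as np

def spectrogram(x, wlen=200, inc=80, nfft=256):
    """S21-S22 sketch: frame the signal, take a per-frame DFT, and stack
    the magnitudes into a (frequency x time) spectrogram. wlen and inc
    follow embodiment 1; nfft is an assumption."""
    win = np.hamming(wlen)
    nframes = 1 + (len(x) - wlen) // inc
    frames = np.stack([x[i * inc : i * inc + wlen] * win for i in range(nframes)])
    return np.abs(np.fft.rfft(frames, nfft, axis=1)).T

def find_voiceprint_segments(S, thresh_db=-20, min_len=5):
    """S23 sketch: binarize the gray-scale spectrogram and return, per
    frequency row, (start, end) frame indices of bright horizontal runs.
    The threshold and minimum run length are assumptions; the patent uses
    image edge detection to locate the line segments."""
    G = 20 * np.log10(S / S.max() + 1e-12)   # gray-scale (dB) image
    mask = G > thresh_db
    segs = []
    for row in mask:
        d = np.diff(np.concatenate(([0], row.astype(int), [0])))
        starts, ends = np.where(d == 1)[0], np.where(d == -1)[0]
        segs += [(s, e) for s, e in zip(starts, ends) if e - s >= min_len]
    return segs
```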
S3. Endpoint detection. Specifically:
S31. From the structure described in S2, extract the starting-point position vector ST = [st_1, st_2, ..., st_i, ..., st_n] and the end-point position vector EN = [en_1, en_2, ..., en_i, ..., en_n], where st_i is the i-th starting-point position and en_i is the i-th end-point position. Sort ST and EN in ascending order;
S32. Decide the speech segments: a voiceprint is assumed only where three horizontal line segments coexist; the rest is noise. Numerically, when en_i > st_{i+2}, the line segment starting at the i-th point is considered to lie in a speech segment;
S33. For every line segment confirmed to be in a speech segment, search within 100 frames to the left and right for another element st'_i of ST; if one exists, it is also included in the speech segment, replacing the original st_i, and the search within 100 frames to the left and right is repeated until no element of ST remains within 100 frames on either side.
Further, the enhancement of the speech with the multitaper-spectrum spectral subtraction described in S14 proceeds as follows:
Step A. Let the time series of the speech signal be x(n). Apply windowed framing to x(n) with a Hamming window of length wlen; the i-th speech frame is x_i(m), with frame length wlen. Its discrete Fourier transform is X_i(k) = Σ_{m=0}^{wlen-1} x_i(m) e^{-j2πkm/wlen};
Step B. Taking frame i as the center and M frames on each side, 2M+1 frames in total, compute from the X_i(k) of Step A the average amplitude spectrum of each component, |X̄_i(k)| = (1/(2M+1)) Σ_{j=-M}^{M} |X_{i+j}(k)|, and the phase angle of the center frame, ∠X_i(k) = arctan(Im[X_i(k)] / Re[X_i(k)]), where i+j denotes the j-th frame after frame i, Im the imaginary part, and Re the real part;
Step C. The multitaper spectrum averages the spectral estimates obtained by applying several orthogonal data windows to the same data sequence. It is defined as S_mt(k) = (1/L) Σ_{w=1}^{L} S_w(k), where L is the number of data windows and S_w(k) = |Σ_{n=0}^{N-1} a_w(n) x(n) e^{-j2πnk/N}|² is the direct spectrum obtained with data window w; here x(n) is the data sequence, N is the sequence length, and a_w(n) is the w-th data window. The a_w(n) are a set of mutually orthogonal discrete prolate spheroidal sequences, each used to compute a direct spectrum of the same signal, the orthogonality between the data windows meaning Σ_{n=0}^{N-1} a_w(n) a_v(n) = 0 for w ≠ v. Multitaper spectral estimation with the above definition is applied to the framed signal x_i(m), i.e. S_mt^(i)(k) = (1/L) Σ_{w=1}^{L} |Σ_{m=0}^{wlen-1} a_w(m) x_i(m) e^{-j2πmk/wlen}|²;
Step D. Smooth the multitaper power spectral density estimates over adjacent frames to obtain the smoothed power spectral density; average the smoothed density over the leading non-speech frames to obtain the noise average power spectral density; and compute the gain factor from the smoothed power spectral density, the noise average power spectral density, the over-subtraction coefficient, and the gain compensation factor, where NIS denotes the number of frames occupied by the leading non-speech segment;
Step E. From the amplitude spectrum after multitaper spectral subtraction, obtained by applying the gain factor to the average amplitude spectrum of Step B, synthesize the enhanced speech signal using the phase angle of Step B. Multitaper spectral subtraction estimates the noise power from the leading non-speech segment; after the noise component is subtracted from the total power, the speech signal is recovered using the phase relationship. The over-subtraction coefficient determines the degree of enhancement applied to the signal, and the gain compensation factor determines the computation duration.
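The core of Step C can be sketched with SciPy's discrete prolate spheroidal (Slepian) sequences. The number of tapers L and the time-bandwidth product NW are assumptions; the patent only requires a set of mutually orthogonal data windows.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(frame, L=5, NW=3):
    """Sketch of the multitaper estimate of Step C: average the direct
    spectra from L mutually orthogonal Slepian (DPSS) tapers. L and NW
    are assumptions."""
    N = len(frame)
    tapers = dpss(N, NW, Kmax=L)  # L orthogonal data windows a_w(n)
    # direct spectrum per taper, then average: S_mt = (1/L) * sum_w |FFT(a_w * x)|^2
    spectra = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return spectra.mean(axis=0)
```

Averaging over orthogonal tapers reduces the variance of the per-frame spectral estimate, which is what makes the subsequent noise subtraction stable at the low SNRs of shortwave speech.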
Further, the over-subtraction coefficient is chosen as follows:
I. Set the initial over-subtraction coefficient to 1 and the initial signal-to-noise ratio snr' = 0;
II. Enhance the speech with the multitaper-spectrum spectral subtraction and compute the signal-to-noise ratio snr of the processed signal;
III. If the snr of the processed signal is greater than the initial snr', go to the next step. If it is less than or equal to snr', the speech in the signal is not significant; in that case do no processing, retain the entire speech signal, and output it directly;
IV. If the snr of the processed signal is below 8 dB, increase the over-subtraction coefficient by 0.5, set snr' = snr, and repeat steps II-IV until the signal-to-noise ratio exceeds 8 dB.
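Steps I-IV amount to a simple feedback loop, sketched below. `enhance(x, alpha)` and `snr_of(y)` are placeholders standing in for the patent's multitaper spectral subtraction and its SNR estimate.

```python
def choose_over_subtraction(x, enhance, snr_of, target_db=8.0, step=0.5):
    """Sketch of steps I-IV: start with over-subtraction factor 1 and
    raise it by 0.5 until the post-enhancement SNR reaches 8 dB or stops
    improving. enhance/snr_of are assumed callables."""
    alpha, snr_prev = 1.0, 0.0
    while True:
        y = enhance(x, alpha)
        snr = snr_of(y)
        if snr <= snr_prev:       # step III: no significant speech, pass input through
            return x, alpha
        if snr >= target_db:      # step IV stop condition: target reached
            return y, alpha
        alpha += step             # step IV: over-subtract harder and retry
        snr_prev = snr
```

The "stops improving" branch is what lets the method leave a nearly noise-free recording untouched instead of over-subtracting it.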
The beneficial effects of the invention are: speech pre-processed by the method of the invention has a similar signal-to-noise ratio, so no parameters need to be adjusted in the subsequent steps; the method can therefore pick out speech segments adaptively under different background noises.
Description of the drawings
Fig. 1 is a schematic diagram of the improved spectral subtraction based on the multitaper spectrum.
Fig. 2 is the flow chart of the speech enhancement processing.
Fig. 3 is the flow chart of the method of the invention.
Fig. 4 is the time-domain plot of the speech before pre-processing in embodiment 1.
Fig. 5 is the time-domain plot of the speech after pre-processing in embodiment 1.
Fig. 6 shows the spectrum of each speech frame in embodiment 1.
Fig. 7 is the spectrogram after gray-scale processing in embodiment 1.
Fig. 8 shows the horizontal line segments in the gray-scale spectrogram in embodiment 1.
Fig. 9 shows the endpoint detection result on the gray-scale spectrogram in embodiment 1.
Fig. 10 is the time-domain plot of the endpoint detection result in embodiment 1; the left panel shows the original speech and the right panel the pre-processed speech.
Fig. 11 is the time-domain plot of the speech before pre-processing in embodiment 2.
Fig. 12 is the time-domain plot of the speech after pre-processing in embodiment 2.
Fig. 13 shows the spectrum of each speech frame in embodiment 2.
Fig. 14 is the spectrogram after gray-scale processing in embodiment 2.
Fig. 15 shows the horizontal line segments in the gray-scale spectrogram in embodiment 2.
Fig. 16 shows the endpoint detection result on the gray-scale spectrogram in embodiment 2.
Fig. 17 is the time-domain plot of the endpoint detection result in embodiment 2; the left panel shows the original speech and the right panel the pre-processed speech.
Specific embodiments
The present invention is described below with reference to the accompanying drawings.
The method of the invention uses the voiceprint characteristic as the feature of the sound. Because of the unique physiological structure of human vocalization, a voiceprint can be seen in the speech spectrogram. The voiceprint of human speech has a distinctive feature: in a speech segment, the energy distribution over frequency follows a specific pattern, and several horizontal, parallel lines appear in the spectrogram of the speech; these lines are the voiceprint. The voiceprint embodies personal pronunciation characteristics and phoneme features and is widely used in speech recognition.
As shown in Fig. 3, the steps of the method of the invention are as follows:
S1. Speech pre-processing. The purpose of pre-processing is to ensure that the voiceprint clarity of the resulting spectrograms is roughly the same, which is the precondition for effective image recognition. The concrete steps are:
S11. During acquisition of the speech signal data, the measurement system introduces, for various reasons, a linear or slowly varying trend error in the time series, which shifts the zero line of the speech signal away from the baseline, possibly by an amount that changes with time; this distorts the correlation function and the power spectrum computed from the speech. A trend term fitted by least squares is removed to eliminate the trend error;
S12. Normalize the amplitude;
S13. Low-pass filter to remove noise above 3500 Hz;
S14. Enhance the speech with spectral subtraction based on the multitaper spectrum, specifically:
Step A. Let the time series of the speech signal be x(n). Apply windowed framing to x(n) with a Hamming window of length wlen; the i-th speech frame is x_i(m), with frame length wlen. Its discrete Fourier transform is X_i(k) = Σ_{m=0}^{wlen-1} x_i(m) e^{-j2πkm/wlen};
Step B. Taking frame i as the center and M frames on each side, 2M+1 frames in total, compute from the X_i(k) of Step A the average amplitude spectrum of each component, |X̄_i(k)| = (1/(2M+1)) Σ_{j=-M}^{M} |X_{i+j}(k)|, and the phase angle of the center frame, ∠X_i(k) = arctan(Im[X_i(k)] / Re[X_i(k)]), where i+j denotes the j-th frame after frame i, Im the imaginary part, and Re the real part;
Step C. The multitaper spectrum averages the spectral estimates obtained by applying several orthogonal data windows to the same data sequence. It is defined as S_mt(k) = (1/L) Σ_{w=1}^{L} S_w(k), where L is the number of data windows and S_w(k) = |Σ_{n=0}^{N-1} a_w(n) x(n) e^{-j2πnk/N}|² is the direct spectrum obtained with data window w; here x(n) is the data sequence, N is the sequence length, and a_w(n) is the w-th data window. The a_w(n) are a set of mutually orthogonal discrete prolate spheroidal sequences, each used to compute a direct spectrum of the same signal, the orthogonality between the data windows meaning Σ_{n=0}^{N-1} a_w(n) a_v(n) = 0 for w ≠ v. Multitaper spectral estimation with the above definition is applied to the framed signal x_i(m), i.e. S_mt^(i)(k) = (1/L) Σ_{w=1}^{L} |Σ_{m=0}^{wlen-1} a_w(m) x_i(m) e^{-j2πmk/wlen}|²;
Step D. Smooth the multitaper power spectral density estimates over adjacent frames to obtain the smoothed power spectral density; average the smoothed density over the leading non-speech frames to obtain the noise average power spectral density; and compute the gain factor from the smoothed power spectral density, the noise average power spectral density, the over-subtraction coefficient, and the gain compensation factor, where NIS denotes the number of frames occupied by the leading non-speech segment;
Step E. From the amplitude spectrum after multitaper spectral subtraction, obtained by applying the gain factor to the average amplitude spectrum of Step B, synthesize the enhanced speech signal using the phase angle of Step B. Multitaper spectral subtraction estimates the noise power from the leading non-speech segment; after the noise component is subtracted from the total power, the speech signal is recovered using the phase relationship. The over-subtraction coefficient determines the degree of enhancement applied to the signal, and the gain compensation factor determines the computation duration;
The over-subtraction coefficient is chosen as follows:
I. Set the initial over-subtraction coefficient to 1 and the initial signal-to-noise ratio snr' = 0;
II. Enhance the speech with the multitaper-spectrum spectral subtraction and compute the signal-to-noise ratio snr of the processed signal;
III. If the snr of the processed signal is greater than the initial snr', go to the next step. If it is less than or equal to snr', the speech in the signal is not significant; in that case do no processing, retain the entire speech signal, and output it directly;
IV. If the snr of the processed signal is below 8 dB, increase the over-subtraction coefficient by 0.5, set snr' = snr, and repeat steps II-IV until the signal-to-noise ratio exceeds 8 dB;
S2. Apply image recognition to the obtained spectrogram and build a structure containing the starting point and end point of each voiceprint position in the spectrogram. Specifically:
S21. Divide the speech signal into frames and apply a short-time Fourier transform frame by frame to obtain the short-time spectra;
S22. Arrange the short-time spectra of S21 in frame order to obtain the spectrogram;
S23. Identify the voiceprint in the spectrogram of S22: convert the color spectrogram to a gray-scale image; extract the image edges of the gray-scale image and identify the positions of the line segments in it; collect the resulting starting points and end points of the voiceprint positions into the structure.
S3. Endpoint detection. Specifically:
S31. From the structure described in S2, extract the starting-point position vector ST = [st_1, st_2, ..., st_i, ..., st_n] and the end-point position vector EN = [en_1, en_2, ..., en_i, ..., en_n], where st_i is the i-th starting-point position and en_i is the i-th end-point position. Sort ST and EN in ascending order;
S32. Decide the speech segments: a voiceprint is assumed only where three horizontal line segments coexist; the rest is noise. Numerically, when en_i > st_{i+2}, the line segment starting at the i-th point is considered to lie in a speech segment;
S33. For every line segment confirmed to be in a speech segment, search within 100 frames to the left and right for another element st'_i of ST; if one exists, it is also included in the speech segment, replacing the original st_i, and the search within 100 frames to the left and right is repeated until no element of ST remains within 100 frames on either side. The purpose of this arrangement is to prevent broken line segments from degrading the endpoint detection performance.
Specific embodiment 1: pink-noise background
Step 1: read in the file and draw the time-domain plot, Fig. 4; the time-domain plot after speech pre-processing is shown in Fig. 5.
Divide the speech into frames with frame length 200 and frame shift 80; the framed data form a 200×2964 two-dimensional matrix. Applying a Fourier transform to each column of 200 samples (one frame) gives the spectrum of each frame, 2964 spectra in all. Plotting them with time on the horizontal axis and frequency on the vertical axis gives the spectrum plot of Fig. 6; taking the low-frequency part (0 Hz to 3500 Hz) and applying gray-scale processing gives the spectrogram of Fig. 7. (For clarity of display, Figs. 7, 8, and 9 are rotated 90 degrees clockwise.)
In Fig. 7 the white parts contain parallel stripes, i.e. the voiceprint; these are the speech parts. The white parts without stripes are caused by strong noise. The horizontal line segments selected from the figure are shown in Fig. 8.
The starting points and end points are stored and re-sorted by position along the horizontal axis, giving the starting-point vector and the end-point vector. A voiceprint is assumed only where three horizontal line segments coexist; the rest is noise. Numerically this is expressed as en_i > st_{i+2}, i.e. the end position of the i-th line segment exceeds the start position of the (i+2)-th line segment, which serves as the criterion for judging whether there is speech. To ensure no information is missed, the search is then extended to the left and right for possible speech segments. The result is shown in Fig. 9; converted to the time domain it is shown in Fig. 10. With the method of the invention, all speech segments under a pink-noise background are detected.
Specific embodiment 2: strong-noise background
The steps are the same as in embodiment 1; the experimental results are as follows:
It should be noted that under a strong-noise background, considerable noise spectrum remains after speech enhancement, as shown in Fig. 14. In the figure, the regions containing parallel lines have higher energy and are the speech segments; outside the speech segments, because of the strong noise, the remaining energy in the spectrogram is lower and appears as a dotted noise spectrum. When line segments are identified, as in Fig. 15, part of the noise spectrum may be recognized as line segments, which can cause false alarms in endpoint detection. The final detection result is shown in Figs. 16 and 17: all speech segments in the speech are identified, but some parts containing only strong noise are mistaken for speech.
Claims (3)
1. A shortwave voice endpoint detection method based on image recognition, characterized in that its steps are as follows:
S1. Speech pre-processing. The purpose of pre-processing is to ensure that the voiceprint clarity of the resulting spectrograms is roughly the same, which is the precondition for effective image recognition. The concrete steps are:
S11. During acquisition of the speech signal data, the measurement system introduces, for various reasons, a linear or slowly varying trend error in the time series, which shifts the zero line of the speech signal away from the baseline, possibly by an amount that changes with time; this distorts the correlation function and the power spectrum computed from the speech. A trend term fitted by least squares is removed to eliminate the trend error;
S12. Normalize the amplitude;
S13. Low-pass filter to remove noise above 3500 Hz;
S14. Enhance the speech with spectral subtraction based on the multitaper spectrum;
S2. Apply image recognition to the obtained spectrogram and build a structure containing the starting point and end point of each voiceprint position in the spectrogram. Specifically:
S21. Divide the speech signal into frames and apply a short-time Fourier transform frame by frame to obtain the short-time spectra;
S22. Arrange the short-time spectra of S21 in frame order to obtain the spectrogram;
S23. Identify the voiceprint in the spectrogram of S22: convert the color spectrogram to a gray-scale image; extract the image edges of the gray-scale image and identify the positions of the line segments in it; collect the resulting starting points and end points of the voiceprint positions into the structure;
S3. Endpoint detection. Specifically:
S31. From the structure described in S2, extract the starting-point position vector ST = [st_1, st_2, ..., st_i, ..., st_n] and the end-point position vector EN = [en_1, en_2, ..., en_i, ..., en_n], where st_i is the i-th starting-point position and en_i is the i-th end-point position. Sort ST and EN in ascending order;
S32. Decide the speech segments: a voiceprint is assumed only where three horizontal line segments coexist; the rest is noise. Numerically, when en_i > st_{i+2}, the line segment starting at the i-th point is considered to lie in a speech segment;
S33. For every line segment confirmed to be in a speech segment, search within 100 frames to the left and right for another element st'_i of ST; if one exists, it is also included in the speech segment, replacing the original st_i, and the search within 100 frames to the left and right is repeated until no element of ST remains within 100 frames on either side.
2. The shortwave voice endpoint detection method based on image recognition according to claim 1, characterized in that the enhancement of the speech with the multitaper-spectrum spectral subtraction described in S14 proceeds as follows:
Step A. Let the time series of the speech signal be x(n). Apply windowed framing to x(n) with a Hamming window of length wlen; the i-th speech frame is x_i(m), with frame length wlen. Its discrete Fourier transform is X_i(k) = Σ_{m=0}^{wlen-1} x_i(m) e^{-j2πkm/wlen};
Step B. Taking frame i as the center and M frames on each side, 2M+1 frames in total, compute from the X_i(k) of Step A the average amplitude spectrum of each component, |X̄_i(k)| = (1/(2M+1)) Σ_{j=-M}^{M} |X_{i+j}(k)|, and the phase angle of the center frame, ∠X_i(k) = arctan(Im[X_i(k)] / Re[X_i(k)]), where i+j denotes the j-th frame after frame i, Im the imaginary part, and Re the real part;
Step C. The multitaper spectrum averages the spectral estimates obtained by applying several orthogonal data windows to the same data sequence. It is defined as S_mt(k) = (1/L) Σ_{w=1}^{L} S_w(k), where L is the number of data windows and S_w(k) = |Σ_{n=0}^{N-1} a_w(n) x(n) e^{-j2πnk/N}|² is the direct spectrum obtained with data window w; here x(n) is the data sequence, N is the sequence length, and a_w(n) is the w-th data window. The a_w(n) are a set of mutually orthogonal discrete prolate spheroidal sequences, each used to compute a direct spectrum of the same signal, the orthogonality between the data windows meaning Σ_{n=0}^{N-1} a_w(n) a_v(n) = 0 for w ≠ v. Multitaper spectral estimation with the above definition is applied to the framed signal x_i(m), i.e. S_mt^(i)(k) = (1/L) Σ_{w=1}^{L} |Σ_{m=0}^{wlen-1} a_w(m) x_i(m) e^{-j2πmk/wlen}|²;
Step D. Smooth the multitaper power spectral density estimates over adjacent frames to obtain the smoothed power spectral density; average the smoothed density over the leading non-speech frames to obtain the noise average power spectral density; and compute the gain factor from the smoothed power spectral density, the noise average power spectral density, the over-subtraction coefficient, and the gain compensation factor, where NIS denotes the number of frames occupied by the leading non-speech segment;
Step E. From the amplitude spectrum after multitaper spectral subtraction, obtained by applying the gain factor to the average amplitude spectrum of Step B, synthesize the enhanced speech signal using the phase angle of Step B. Multitaper spectral subtraction estimates the noise power from the leading non-speech segment; after the noise component is subtracted from the total power, the speech signal is recovered using the phase relationship. The over-subtraction coefficient determines the degree of enhancement applied to the signal, and the gain compensation factor determines the computation duration.
3. The shortwave voice endpoint detection method based on image recognition according to claim 1, characterized in that the over-subtraction coefficient is chosen as follows:
I. Set the initial over-subtraction coefficient to 1 and the initial signal-to-noise ratio snr' = 0;
II. Enhance the speech with the multitaper-spectrum spectral subtraction and compute the signal-to-noise ratio snr of the processed signal;
III. If the snr of the processed signal is greater than the initial snr', go to the next step. If it is less than or equal to snr', the speech in the signal is not significant; in that case do no processing, retain the entire speech signal, and output it directly;
IV. If the snr of the processed signal is below 8 dB, increase the over-subtraction coefficient by 0.5, set snr' = snr, and repeat steps II-IV until the signal-to-noise ratio exceeds 8 dB.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711330638.1A CN108053842B (en) | 2017-12-13 | 2017-12-13 | Short wave voice endpoint detection method based on image recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711330638.1A CN108053842B (en) | 2017-12-13 | 2017-12-13 | Short wave voice endpoint detection method based on image recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108053842A true CN108053842A (en) | 2018-05-18 |
CN108053842B CN108053842B (en) | 2021-09-14 |
Family
ID=62132480
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711330638.1A Active CN108053842B (en) | 2017-12-13 | 2017-12-13 | Short wave voice endpoint detection method based on image recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108053842B (en) |
2017-12-13 CN CN201711330638.1A patent/CN108053842B/en active Active
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1299126A (en) * | 2001-01-16 | 2001-06-13 | 北京大学 | Method for discriminating acoustic figure with base band components and sounding parameters |
US20040260540A1 (en) * | 2003-06-20 | 2004-12-23 | Tong Zhang | System and method for spectrogram analysis of an audio signal |
US20050288923A1 (en) * | 2004-06-25 | 2005-12-29 | The Hong Kong University Of Science And Technology | Speech enhancement by noise masking |
US20100023327A1 (en) * | 2006-11-21 | 2010-01-28 | Iucf-Hyu (Industry-University Cooperation Foundation Hanyang University) | Method for improving speech signal non-linear overweighting gain in wavelet packet transform domain |
CN101727905A (en) * | 2009-11-27 | 2010-06-09 | 江南大学 | Method for acquiring vocal print picture with refined time-frequency structure |
CN102884575A (en) * | 2010-04-22 | 2013-01-16 | 高通股份有限公司 | Voice activity detection |
CN103117066A (en) * | 2013-01-17 | 2013-05-22 | 杭州电子科技大学 | Low signal to noise ratio voice endpoint detection method based on time-frequency instantaneous energy spectrum |
CN105810213A (en) * | 2014-12-30 | 2016-07-27 | 浙江大华技术股份有限公司 | Typical abnormal sound detection method and device |
CN104637497A (en) * | 2015-01-16 | 2015-05-20 | 南京工程学院 | Speech spectrum characteristic extracting method facing speech emotion identification |
CN105489226A (en) * | 2015-11-23 | 2016-04-13 | 湖北工业大学 | Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup |
CN106024010A (en) * | 2016-05-19 | 2016-10-12 | 渤海大学 | Speech signal dynamic characteristic extraction method based on formant curves |
CN106531174A (en) * | 2016-11-27 | 2017-03-22 | 福州大学 | Animal sound recognition method based on wavelet packet decomposition and spectrogram features |
CN106953887A (en) * | 2017-01-05 | 2017-07-14 | 北京中瑞鸿程科技开发有限公司 | A kind of personalized Organisation recommendations method of fine granularity radio station audio content |
CN106971740A (en) * | 2017-03-28 | 2017-07-21 | 吉林大学 | Probability and the sound enhancement method of phase estimation are had based on voice |
Non-Patent Citations (4)
Title |
---|
KUN-CHING WANG ET AL.: "Voice Activity Detection Algorithm with Low Signal-to-Noise Ratios Based on Spectrum Entropy", 《2008 SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION》 * |
SUN HAIYING: "Research on Speech Endpoint Detection Methods Based on Cepstral Features and Voiced-Speech Characteristics", 《CHINA MASTERS' THESES FULL-TEXT DATABASE (INFORMATION SCIENCE AND TECHNOLOGY)》 * |
XIAO CHUNZHI: "A Speech Enhancement Algorithm Based on Spectrogram Analysis", 《SPEECH TECHNOLOGY》 * |
CHEN XIANGMIN ET AL.: "A Speech Endpoint Detection Algorithm Based on Spectrogram", 《SPEECH TECHNOLOGY》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109346105A (en) * | 2018-07-27 | 2019-02-15 | 南京理工大学 | Directly display the pitch period spectrogram method of pitch period track |
CN109346105B (en) * | 2018-07-27 | 2022-04-15 | 南京理工大学 | Pitch period spectrogram method for directly displaying pitch period track |
CN110047470A (en) * | 2019-04-11 | 2019-07-23 | 深圳市壹鸽科技有限公司 | A kind of sound end detecting method |
CN111354378A (en) * | 2020-02-12 | 2020-06-30 | 北京声智科技有限公司 | Voice endpoint detection method, device, equipment and computer storage medium |
CN111354378B (en) * | 2020-02-12 | 2020-11-24 | 北京声智科技有限公司 | Voice endpoint detection method, device, equipment and computer storage medium |
CN111429905A (en) * | 2020-03-23 | 2020-07-17 | 北京声智科技有限公司 | Voice signal processing method and device, voice intelligent elevator, medium and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108053842B (en) | 2021-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Drugman et al. | Glottal closure and opening instant detection from speech signals | |
CN103854662B (en) | Adaptive voice detection method based on multiple domain Combined estimator | |
CN108735213B (en) | Voice enhancement method and system based on phase compensation | |
CN108053842A (en) | Shortwave sound end detecting method based on image identification | |
CN106486131B (en) | A kind of method and device of speech de-noising | |
CN106971740B (en) | Sound enhancement method based on voice existing probability and phase estimation | |
CN109410977B (en) | Voice segment detection method based on MFCC similarity of EMD-Wavelet | |
CN109545188A (en) | A kind of real-time voice end-point detecting method and device | |
CN105788603A (en) | Audio identification method and system based on empirical mode decomposition | |
CN108899052B (en) | Parkinson speech enhancement method based on multi-band spectral subtraction | |
US9208799B2 (en) | Method and device for estimating a pattern in a signal | |
CN105679312B (en) | The phonetic feature processing method of Application on Voiceprint Recognition under a kind of noise circumstance | |
CN103730126B (en) | Noise suppressing method and noise silencer | |
CN111091833A (en) | Endpoint detection method for reducing noise influence | |
WO2014070139A2 (en) | Speech enhancement | |
CN114242099A (en) | Speech enhancement algorithm based on improved phase spectrum compensation and full convolution neural network | |
CN111489763B (en) | GMM model-based speaker recognition self-adaption method in complex environment | |
CN110808057A (en) | Voice enhancement method for generating confrontation network based on constraint naive | |
CN107680610A (en) | A kind of speech-enhancement system and method | |
Hsu et al. | Voice activity detection based on frequency modulation of harmonics | |
Amehraye et al. | Perceptual improvement of Wiener filtering | |
CN112233657A (en) | Speech enhancement method based on low-frequency syllable recognition | |
Vetter et al. | Single channel speech enhancement using principal component analysis and MDL subspace selection | |
Xiao et al. | Inventory based speech enhancement for speaker dedicated speech communication systems | |
Li et al. | Robust log-energy estimation and its dynamic change enhancement for in-car speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||