CN116168719A - Sound gain adjusting method and system based on context analysis - Google Patents

Sound gain adjusting method and system based on context analysis

Info

Publication number: CN116168719A
Application number: CN202211673539.4A
Authority: CN (China)
Prior art keywords: frame, gain, audio, small, sound
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 李鹏, 朱尚文, 李子豪
Current Assignee: Acosound Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Acosound Technology Co., Ltd.
Priority date: 2022-12-26 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2022-12-26
Publication date: 2023-05-26
Application filed by Acosound Technology Co., Ltd.
Priority to CN202211673539.4A
Publication of CN116168719A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the analysis technique
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use


Abstract

The invention discloses a sound gain adjustment method and system based on context analysis. The gain adjustment method comprises the following steps: S1, acquiring an audio clip to be processed from a sound and extracting a plurality of audio frames from the clip; S2, performing a Fourier transform on each audio frame and calculating its frequency-domain energy and short-time average zero-crossing rate; S3, judging whether the frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame; S4, judging whether the short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain; S5, judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.

Description

Sound gain adjusting method and system based on context analysis
Technical Field
The invention relates to the technical field of hearing aids, in particular to a sound gain adjusting method and system based on context analysis.
Background
Wide dynamic range compression (WDRC) is a technique in which the gain of a hearing aid changes in real time as the intensity of the input sound signal changes, so that the amplified sound signal falls within the reduced auditory dynamic range of a hearing-impaired patient. The gain of the wide dynamic range is variable, and the system gives appropriate gain to low-intensity sound signals. Different frequency bands may also use different compression ratios to achieve an appropriate loudness level.
The attack-release algorithm: two parameters must be determined: the attack time and the release time. These two parameters determine how quickly the algorithm responds to increases and decreases of the signal, and hence how quickly the gain changes. For a rising signal the response must be fast so that the gain keeps up with the signal and is adjusted in time when the signal reaches its peak; the attack time is therefore usually small, typically 30-200 ms. For a falling signal the release time is usually larger, typically 0.5-2.0 s, and is chosen according to the gap time of the speech.
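As an illustration only, a minimal one-pole attack-release smoother for a per-frame level track might look like the following Python sketch; the function name and the default time constants (picked from the 30-200 ms and 0.5-2.0 s ranges above) are assumptions, not part of the patent:

```python
import numpy as np

def attack_release_smooth(levels, frame_rate, attack_s=0.1, release_s=1.0):
    """Smooth a per-frame level track with separate attack/release time constants.

    levels     : per-frame linear level estimates (e.g. RMS values)
    frame_rate : frames per second
    attack_s   : time constant while the level is rising (fast, 30-200 ms)
    release_s  : time constant while the level is falling (slow, 0.5-2.0 s)
    """
    a_att = np.exp(-1.0 / (attack_s * frame_rate))   # rising-signal coefficient
    a_rel = np.exp(-1.0 / (release_s * frame_rate))  # falling-signal coefficient
    smoothed = np.empty(len(levels))
    y = float(levels[0])
    for i, x in enumerate(levels):
        coef = a_att if x > y else a_rel  # rise -> track quickly; fall -> decay slowly
        y = coef * y + (1.0 - coef) * x
        smoothed[i] = y
    return smoothed
```

A smaller attack_s lets the envelope keep up with rising signals, while the larger release_s rides over the gaps between speech bursts, matching the behaviour described above.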
Context analysis distinguishes audio frames of different components (speech, noise, and music) in a sound clip by exploiting differences in their audio features.
The implementation of the wide dynamic range compression algorithm is illustrated in Fig. 1. The speech is first divided in the frequency domain into a number of separate frequency regions, called channels. Within each channel, the gain is determined by mapping the dynamic range of normal sound into the hearing dynamic range of the hearing-impaired person, based on that person's hearing threshold map, and each channel is processed independently. The gains calculated in the different channels are then applied to the input signal in the frequency domain, and finally the sound signal is synthesized and output. The specific implementation steps are as follows:
The signal is first framed and time-frequency transformed. The time-frequency transform may use an FFT; here a WOLA (Weighted Overlap-Add) filter bank performs the transform. WOLA is an efficient implementation of time-frequency conversion and subband division; the WOLA filter bank divides the full frequency band uniformly into multiple subbands.
Channels are then formed, that is, the subbands are merged appropriately according to a rule over the whole frequency band and divided into N channels. Because the human ear's perception of frequency is nonlinear, the bandwidth of each channel differs. After the channels are divided in the frequency domain, dynamic range compression can be performed independently within each channel.
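A sketch of one possible grouping rule follows; the patent does not specify the rule, so the roughly logarithmic spacing (narrow channels at low frequencies, wide channels at high frequencies) and the default counts are assumptions:

```python
import numpy as np

def make_channels(num_subbands=64, num_channels=8):
    """Group uniform WOLA subbands into non-uniform channels, mimicking the
    ear's roughly logarithmic frequency resolution."""
    edges = np.round(np.geomspace(1, num_subbands, num_channels + 1)).astype(int)
    edges[0] = 0                 # make the first channel start at subband 0
    edges = np.unique(edges)     # drop duplicate edges at the low end
    return [range(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
```

With 64 subbands and 8 channels this yields channels one or two subbands wide at the bottom of the band and dozens of subbands wide at the top.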
The sound pressure level and gain of each channel are then calculated. Assuming channel n contains K subbands, the WOLA analysis result is expressed as:

$$X_k(m) = r_k(m) + j\,i_k(m)$$

where m is the index of the speech frame, k = 0, 1, 2, ..., K-1 is the subband index, and r_k(m) and i_k(m) are the real and imaginary parts, respectively.
The root-mean-square energy P_RMS(m) of the K subbands in the channel and the sound pressure level SPL(m) are expressed as:

$$P_{RMS}(m) = \sqrt{\frac{1}{K}\sum_{k=0}^{K-1}\left(r_k^2(m) + i_k^2(m)\right)}$$

$$SPL(m) = 20\lg\left(P_{RMS}(m)\right)$$

After the sound pressure level is obtained, the gain can be calculated. To reflect the rate of change of real speech and reduce distortion, before the gain is calculated the root-mean-square energy P_RMS(m) computed in each channel must be tracked in the time domain in a manner that matches the rate of change of the speech signal, and the gain is then calculated from the relatively gentle tracked value. Here an attack-release algorithm smooths P_RMS(m); the sound pressure level SPL(m) is computed from the smoothed energy value, and the gain is computed from it. Dynamic range compression is thus performed separately on the different frequency bands.
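The per-channel computation can be sketched as follows; the compression curve (knee, ratio, gain below the knee) is a hypothetical static WDRC characteristic added for illustration, since the text above does not fix those numbers:

```python
import numpy as np

def channel_spl(X, channel_bins):
    """SPL of one channel from complex WOLA/FFT bins X, per the formulas above:
    P_RMS(m) = sqrt(mean_k |X_k(m)|^2),  SPL(m) = 20*lg(P_RMS(m))."""
    p_rms = np.sqrt(np.mean(np.abs(X[list(channel_bins)]) ** 2))
    return 20.0 * np.log10(p_rms + 1e-12)  # small floor avoids log(0)

def wdrc_gain_db(spl_db, knee_db=45.0, ratio=2.0, gain_below_knee_db=20.0):
    """Hypothetical static WDRC curve: constant gain below the knee,
    gain shrinking with slope (1/ratio - 1) above it."""
    if spl_db <= knee_db:
        return gain_below_knee_db
    return gain_below_knee_db + (spl_db - knee_db) * (1.0 / ratio - 1.0)
```

In the full algorithm, P_RMS(m) would first be smoothed by the attack-release tracker before SPL and gain are computed, as described above.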
Existing wide dynamic range compression and speech/noise discrimination techniques have the following drawbacks when applied to hearing-aid compensation algorithms:
1. Low-level noise is left unprocessed in order to preserve the intelligibility and fluency of the audio, so the output audio has a high background-noise level.
2. For audio-frame continuity and smoothing, the prior art has no scheme for intelligent gain following based on audio-frame characteristics, leaving considerable room for improvement in speech fluency and comfort.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a sound gain adjustment method and system based on context analysis, which solve the high-background-noise problem of conventional wide dynamic range compression, achieve noise reduction and speech highlighting and enhancement of the audio signal, and further improve the fluency, comfort, and intelligibility of speech.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a method of sound gain adjustment based on context analysis, comprising:
s1, acquiring an audio fragment to be processed in sound, and extracting a plurality of audio frames in the audio fragment;
s2, carrying out Fourier transform processing on each audio frame of the plurality of extracted audio frames, and calculating the frequency domain energy and the short-time average zero-crossing rate of each audio frame after processing;
s3, judging whether the calculated frequency domain energy is smaller than a first preset threshold value, if yes, enabling the current audio frame to be a noise frame, and giving small gain to the noise frame; if not, the current audio frame is a non-noise frame;
s4, judging whether the calculated short-time average zero crossing rate is larger than a second preset threshold value, if so, the current audio frame is a voice frame; if not, the current audio frame is a noise frame, and a small gain is given to the noise frame;
s5, judging whether the frequency domain energy corresponding to the voice frame is smaller than a first preset threshold value, if yes, enabling the current voice frame to be a small voice frame, and giving a large gain to the small voice frame; if not, the current speech frame is a loud frame and gives a small gain to the loud frame.
Further, after step S5 the method further comprises:
S6, performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
Further, the gain smoothing in step S6 is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
Further, the gain smoothing in step S6 is performed by controlling the attack time and release time parameters of the audio frames.
Accordingly, a sound gain adjustment system based on context analysis is also provided, comprising:
an acquisition module for acquiring the audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
a computing module for performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
a first judging module for judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
a second judging module for judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
a third judging module for judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
Further, the system further comprises:
a processing module for performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
Further, the gain smoothing in the processing module is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
Further, the gain smoothing in the processing module is performed by controlling the attack time and release time parameters of the audio frames.
Compared with the prior art, the intelligent compression algorithm based on context analysis can label speech frames and noise frames in the audio signal, give each a different gain, and perform gain smoothing according to the ordering of the frames in the audio clip, thereby achieving noise reduction and speech highlighting on top of wide dynamic range compression.
Drawings
FIG. 1 is a flow chart of a wide dynamic range compression algorithm implementation provided in the background;
FIG. 2 is a flowchart of the sound gain adjustment method based on context analysis according to an embodiment.
Detailed Description
The following describes embodiments of the invention with reference to specific examples; other advantages and effects of the invention will be readily apparent to those skilled in the art from this disclosure. The invention may also be practiced or applied in other, different embodiments, and the details in this specification may be modified or varied in various ways without departing from the spirit and scope of the invention. It should be noted that, where there is no conflict, the following embodiments and the features in them may be combined with each other.
The invention aims to overcome the defects of the prior art and provides a sound gain adjustment method and system based on context analysis, in which an intelligent compression algorithm based on context analysis enables the hearing aid to apply different gain modes to speech and to noise.
Example 1
This embodiment provides a sound gain adjustment method based on context analysis, as shown in Fig. 2, comprising:
S1, acquiring an audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
S2, performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
S3, judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
S4, judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
S5, judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
In step S1, the audio clip to be processed is acquired from the sound, and a plurality of audio frames are extracted from it.
Audio frames are extracted by windowing: to obtain short-time audio signals, a windowing operation is performed on the audio signal. The window function slides smoothly over the audio signal, dividing it into frames. Framing may be contiguous or use overlapping segments; the overlap is called the frame shift and is typically half the window length.
This embodiment uses a Hamming window, expressed as:

$$\omega(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

where N is the window length.
After the window type is chosen, a suitable window length is selected for the audio signal. The window length determines whether amplitude variations of the audio signal can be reflected. If N is very large, i.e., on the order of several pitch periods, the window is equivalent to a very narrow low-pass filter: the short-time information changes very slowly and the details of the waveform cannot be adequately reflected. Conversely, if N is very small, i.e., on the order of one pitch period or less, the signal energy fluctuates rapidly with the fine structure of the waveform and the passband of the filter widens, so smooth short-time information cannot be obtained. The window length must therefore be chosen appropriately. The decay of the window function is essentially independent of its duration, so changing the width N changes only the bandwidth.
After the window function and the window length are determined, the windowing operation is completed by multiplying the signal by the window function.
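A minimal framing-and-windowing sketch, assuming a Hamming window and a frame shift of half the window length as described above (the window length of 256 samples is an illustrative default):

```python
import numpy as np

def frame_signal(x, win_len=256, hop=None):
    """Split signal x into overlapping Hamming-windowed frames.
    The frame shift defaults to half the window length."""
    hop = hop or win_len // 2
    n = np.arange(win_len)
    w = 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (win_len - 1))  # Hamming window
    n_frames = 1 + (len(x) - win_len) // hop   # assumes len(x) >= win_len
    return np.stack([x[i * hop:i * hop + win_len] * w for i in range(n_frames)])
```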
In step S2, a Fourier transform is performed on each of the extracted audio frames, and the frequency-domain energy and short-time average zero-crossing rate of each processed frame are calculated.
Per-frame Fourier transform: the Fourier spectrum analysis here uses short-time analysis techniques.
The short-time Fourier transform of the signal x(n) is defined as:

$$X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\,\omega(n-m)\,e^{-j\omega m}$$

where ω(n) is the window function.
In practice the discrete Fourier transform is used instead of the continuous transform, which requires extending the signal periodically: x(n)ω(n) is treated as one period of a periodic signal, and the discrete Fourier transform of that periodic signal yields a power spectrum. If the window length is L, then x(n)ω(n) has length L while the short-time autocorrelation function R_n(k) has length 2L. If x(n)ω(n) is extended with period L, aliasing occurs in the autocorrelation domain: the circular correlation of the periodic function over one period no longer equals the linear correlation R_n(k), and the resulting power spectrum is merely an undersampled version of the true power spectrum, i.e., only L sample values. If all 2L values of the power spectrum are desired, L zeros can be appended to x(n)ω(n); after periodic extension with period 2L, the circular correlation equals the linear correlation, and the discrete Fourier transform can then be applied.
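The zero-padding argument can be demonstrated with a short sketch: padding the L-point windowed frame to 2L points before the FFT makes the circular autocorrelation equal the linear one (the function name is illustrative):

```python
import numpy as np

def short_time_autocorr(frame):
    """Autocorrelation R_n(k), k = 0..L-1, of an L-point windowed frame via a
    2L-point FFT; the L appended zeros prevent aliasing in the lag domain."""
    L = len(frame)
    X = np.fft.rfft(frame, n=2 * L)              # FFT of the zero-padded frame
    r = np.fft.irfft(np.abs(X) ** 2, n=2 * L)    # inverse of the power spectrum
    return r[:L]                                 # non-negative lags only
```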
The frequency-domain energy is calculated as:

$$E = \sum_{\omega=0}^{\omega_0} |X(\omega)|^2$$

where E is the frequency-domain energy, X(ω) is the value of the Fourier transform of the current audio frame at ω, and ω_0 is half the number of sampled points.
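A direct transcription of this energy formula (assuming the frame has already been windowed) might read:

```python
import numpy as np

def frame_energy(windowed_frame):
    """Frequency-domain energy E = sum over omega = 0..omega_0 of |X(omega)|^2,
    with omega_0 taken as half the number of samples."""
    X = np.fft.fft(windowed_frame)
    omega0 = len(windowed_frame) // 2
    return float(np.sum(np.abs(X[:omega0]) ** 2))
```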
The short-time average zero-crossing rate is calculated as follows: for the signal sequence x(n) of an audio frame, adjacent sample pairs are checked for a sign change, each sign change counting as one zero crossing; a first-order difference is computed, its absolute value taken, and the result low-pass filtered:

$$Z_n = \frac{1}{2}\sum_{m=-\infty}^{\infty} \left|\,\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]\,\right|\,\omega(n-m)$$

where sgn[·] is the sign function:

$$\mathrm{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ -1, & x(n) < 0 \end{cases}$$
in a speech signal, the zero crossing rate of unvoiced sounds is high, and the zero crossing rate of voiced sounds is low. The zero crossing rate of the speech signal will change more strongly.
In step S3, whether the calculated frequency-domain energy is smaller than the first preset threshold is judged; if so, the current audio frame is a noise frame and is given a small gain; if not, it is a non-noise frame.
The frequency-domain energy of the current audio frame is calculated with the formula in step S2 and used to judge whether the frame is a noise frame: if the calculated frequency-domain energy is smaller than the first threshold, the current audio frame is a noise frame; otherwise it is a non-noise frame.
In step S4, whether the calculated short-time average zero-crossing rate is greater than the second preset threshold is judged; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain.
The short-time average zero-crossing rate of the current audio frame is calculated with the formula in step S2 and used to judge whether the frame is a speech frame: if the calculated zero-crossing rate is greater than the second threshold, the current audio frame is a speech frame; if it is smaller than the second threshold, the frame is a noise frame.
In step S5, whether the frequency-domain energy of the speech frame is smaller than the first preset threshold is judged; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
After step S4 has determined which audio frames are speech frames, to make the speech gain smoother, whether the frequency-domain energy of the current speech frame is smaller than the first preset threshold is further judged; if so, the current speech frame is treated as a small-sound frame; if not, it is a loud frame.
Through steps S3-S5, the noise frames, loud frames, and small-sound frames among the audio frames are obtained. The intelligent compression algorithm based on context analysis then gives a small gain to noise frames, a small gain to loud frames, and a large gain to small-sound frames, so that the speech signal in the hearing aid is amplified and compressed into the wearer's dynamic hearing range while noise reduction is assisted.
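The S3-S5 decision flow can be sketched as below. Note one assumption: the text reuses the "first preset threshold" in S5, which a frame reaching S5 has already exceeded in S3, so a separate, higher loudness threshold is introduced here for the small/loud split to be effective:

```python
def classify_frame(energy, zcr, noise_thresh, zcr_thresh, loud_thresh):
    """Label an audio frame per steps S3-S5 and return (label, gain policy).

    noise_thresh : 'first preset threshold' of S3 (energy below -> noise)
    zcr_thresh   : 'second preset threshold' of S4 (ZCR above -> speech)
    loud_thresh  : assumed speech-energy split for S5 (> noise_thresh)
    """
    if energy < noise_thresh:
        return "noise", "small gain"       # S3
    if zcr <= zcr_thresh:
        return "noise", "small gain"       # S4
    if energy < loud_thresh:
        return "small", "large gain"       # S5: quiet speech is boosted
    return "loud", "small gain"            # S5: loud speech is compressed
```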
The gain is assigned as follows: based on the medical diagnostic results (audiogram) of the hearing-aid wearer, combined with the wearer's individual condition (age, sex, medical history, etc.), a suitable fitting formula is selected to calculate the wearer's expected range of sound gain at each frequency, and a customized technique determines the gain scheme. The hearing aid determines the gain coefficient for sound of each frequency and level according to this scheme, discriminates frame by frame, and applies the corresponding gain.
This embodiment further comprises:
S6, performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
After the specific frame label of each audio frame in the clip (noise, loud, or small-sound) is determined, the gain-following parameters release time and attack time are adjusted according to the sequence of frame labels in the clip, so as to control how the gain follows changes between audio frames.
Specifically: when a small-sound frame follows a loud frame, gain following speeds up, i.e., the release time of the loud frame is reduced and the attack time of the small-sound frame is increased accordingly, so overall gain following is faster. When a loud frame follows a small-sound frame, gain following speeds up, i.e., the release time of the small-sound frame is reduced and the attack time of the loud frame is increased accordingly. When a noise frame follows a loud frame, gain following slows down, i.e., the release time of the loud frame and the attack time of the noise frame are both increased. When a noise frame follows a small-sound frame, gain following speeds up, i.e., the release time of the small-sound frame is reduced and the attack time of the noise frame is increased. Through this control of the gain-following parameters, smooth dynamic gain changes that accurately distinguish speech from noise are achieved, improving the fluency and comfort of the speech signal and further improving speech intelligibility. In addition, more accurate per-frame speech gain changes can be made on top of the gain-following control, giving low-level noise a low gain for a better denoising effect.
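The transition rules above can be tabulated as follows; the directions (increase/decrease) follow the text, while the scale factor is a placeholder, since the patent gives no numbers:

```python
# (previous label, current label) -> adjustment of the previous frame's release
# time and the current frame's attack time, per the rules described above.
FOLLOW_RULES = {
    ("loud",  "small"): ("decrease", "increase"),  # gain following speeds up
    ("small", "loud"):  ("decrease", "increase"),  # gain following speeds up
    ("loud",  "noise"): ("increase", "increase"),  # gain following slows down
    ("small", "noise"): ("decrease", "increase"),  # gain following speeds up
}

def adjust_following(prev_label, cur_label, release_s, attack_s, factor=1.5):
    """Scale the leading frame's release time and the following frame's attack
    time up or down by `factor` (a placeholder value) per the transition rule."""
    rule = FOLLOW_RULES.get((prev_label, cur_label))
    if rule is None:
        return release_s, attack_s          # no rule for this transition
    rel_dir, att_dir = rule
    release_s *= factor if rel_dir == "increase" else 1.0 / factor
    attack_s *= factor if att_dir == "increase" else 1.0 / factor
    return release_s, attack_s
```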
In this embodiment, an intelligent compression algorithm based on context analysis labels speech frames and noise frames in the audio signal, gives each a different gain, and performs gain smoothing according to the ordering of the frames in the audio clip, achieving noise reduction and speech highlighting on top of wide dynamic range compression.
Example 2
This embodiment provides a sound gain adjustment system based on context analysis, comprising:
an acquisition module for acquiring the audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
a computing module for performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
a first judging module for judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
a second judging module for judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
a third judging module for judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
Further, the system further comprises:
a processing module for performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
Further, the gain smoothing in the processing module is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
Further, the gain smoothing in the processing module is performed by controlling the attack time and release time parameters of the audio frames.
It should be noted that the sound gain adjustment system based on context analysis provided in this embodiment corresponds to the method of Example 1, and its details are not repeated here.
In this embodiment, an intelligent compression algorithm based on context analysis labels speech frames and noise frames in the audio signal, gives each a different gain, and performs gain smoothing according to the ordering of the frames in the audio clip, achieving noise reduction and speech highlighting on top of wide dynamic range compression.
Note that the above are only preferred embodiments of the invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the particular embodiments described here, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the concept of the invention, its scope being determined by the appended claims.

Claims (8)

1. A sound gain adjustment method based on context analysis, comprising:
S1, acquiring an audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
S2, performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
S3, judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
S4, judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
S5, judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
2. The sound gain adjustment method based on context analysis according to claim 1, wherein after step S5 the method further comprises:
S6, performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
3. The sound gain adjustment method based on context analysis according to claim 2, wherein the gain smoothing in step S6 is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
4. The sound gain adjustment method based on context analysis according to claim 2, wherein the gain smoothing in step S6 is performed by controlling the attack time and release time parameters of the audio frames.
5. A sound gain adjustment system based on context analysis, comprising:
an acquisition module for acquiring the audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
a computing module for performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
a first judging module for judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
a second judging module for judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
a third judging module for judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
6. The sound gain adjustment system based on context analysis according to claim 5, further comprising:
a processing module for performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
7. The sound gain adjustment system based on context analysis according to claim 6, wherein the gain smoothing in the processing module is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
8. The sound gain adjustment system based on context analysis according to claim 6, wherein the gain smoothing in the processing module is performed by controlling the attack time and release time parameters of the audio frames.
Application CN202211673539.4A (priority date 2022-12-26, filing date 2022-12-26): Sound gain adjusting method and system based on context analysis. Status: Pending. Published as CN116168719A (en).

Priority Applications (1)

CN202211673539.4A (published as CN116168719A, en): priority date 2022-12-26, filing date 2022-12-26, Sound gain adjusting method and system based on context analysis

Applications Claiming Priority (1)

CN202211673539.4A (published as CN116168719A, en): priority date 2022-12-26, filing date 2022-12-26, Sound gain adjusting method and system based on context analysis

Publications (1)

CN116168719A (en), published 2023-05-26

Family

ID=86417453

Family Applications (1)

CN202211673539.4A (Pending, published as CN116168719A, en): priority date 2022-12-26, filing date 2022-12-26, Sound gain adjusting method and system based on context analysis

Country Status (1)

CN: CN116168719A (en)



Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09127996A (en) * 1995-10-26 1997-05-16 Sony Corp Voice decoding method and device therefor
JP2002051392A (en) * 2000-08-01 2002-02-15 Alpine Electronics Inc In-vehicle conversation assisting device
CN101388216A (en) * 2007-09-13 2009-03-18 富士通株式会社 Sound processing device, apparatus and method for controlling gain
US20140074463A1 (en) * 2011-05-26 2014-03-13 Advanced Bionics Ag Systems and methods for improving representation by an auditory prosthesis system of audio signals having intermediate sound levels
EP2714184A1 (en) * 2011-05-26 2014-04-09 Advanced Bionics AG Systems and methods for improving representation by an auditory prosthesis system of audio signals having intermediate sound levels
CN103812462A (en) * 2012-11-15 2014-05-21 华为技术有限公司 Loudness control method and device
KR20160050186A (en) * 2014-10-28 2016-05-11 현대엠엔소프트 주식회사 Apparatus for reducing wind noise and method thereof
US9661438B1 (en) * 2015-03-26 2017-05-23 Amazon Technologies, Inc. Low latency limiter
CN108022595A (en) * 2016-10-28 2018-05-11 电信科学技术研究院 A kind of voice signal noise-reduction method and user terminal
CN106817483A (en) * 2016-12-26 2017-06-09 建荣半导体(深圳)有限公司 A kind of method and apparatus for adjusting speech volume
WO2019080553A1 (en) * 2017-10-23 2019-05-02 科大讯飞股份有限公司 Microphone array-based target voice acquisition method and device
CN107910013A (en) * 2017-11-10 2018-04-13 广东欧珀移动通信有限公司 The output processing method and device of a kind of voice signal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
茹婷婷; 谢湘: "耳语音数据库的设计与采集" [Design and acquisition of a whispered speech database], 清华大学学报(自然科学版) [Journal of Tsinghua University (Science and Technology)], no. 1 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117153192A (en) * 2023-10-30 2023-12-01 科大讯飞(苏州)科技有限公司 Audio enhancement method, device, electronic equipment and storage medium
CN117153192B (en) * 2023-10-30 2024-02-20 科大讯飞(苏州)科技有限公司 Audio enhancement method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11056130B2 (en) Speech enhancement method and apparatus, device and storage medium
Ma et al. Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions
NO20191310A1 (en) Audio amplification control using specific volume-based hearing event detection
CN109473115B (en) Digital audio signal volume equal loudness adjusting method
CN110473567A (en) Audio-frequency processing method, device and storage medium based on deep neural network
Stone et al. Quantifying the effects of fast-acting compression on the envelope of speech
US20080177539A1 (en) Method of processing voice signals
EP3074975B1 (en) Method of operating a hearing aid system and a hearing aid system
KR20060013400A (en) Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal
TR201810466T4 (en) Apparatus and method for processing an audio signal to improve speech using feature extraction.
CN105931649A (en) Ultra-low time delay audio processing method and system based on spectrum analysis
CN110085259B (en) Audio comparison method, device and equipment
DE102008031150B3 (en) Method for noise suppression and associated hearing aid
CN104867499A (en) Frequency-band-divided wiener filtering and de-noising method used for hearing aid and system thereof
CN116168719A (en) Sound gain adjusting method and system based on context analysis
Li et al. Wavelet-based nonlinear AGC method for hearing aid loudness compensation
US11445307B2 (en) Personal communication device as a hearing aid with real-time interactive user interface
JPH06208395A (en) Formant detecting device and sound processing device
Kates et al. Integrating cognitive and peripheral factors in predicting hearing-aid processing effectiveness
US10013992B2 (en) Fast computation of excitation pattern, auditory pattern and loudness
CN105869652A (en) Psychological acoustic model calculation method and device
CN110010150A (en) Auditory Perception speech characteristic parameter extracting method based on multiresolution
US20240071411A1 (en) Determining dialog quality metrics of a mixed audio signal
Tiwari et al. Sliding-band dynamic range compression for use in hearing aids
Jiang et al. Speech noise reduction algorithm in digital hearing aids based on an improved sub-band SNR estimation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination