CN116168719A - Sound gain adjusting method and system based on context analysis - Google Patents

Sound gain adjusting method and system based on context analysis

Info

Publication number: CN116168719A
Application number: CN202211673539.4A
Authority: CN (China)
Prior art keywords: frame, gain, audio, small, sound
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 李鹏, 朱尚文, 李子豪
Current Assignee: Acosound Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Acosound Technology Co., Ltd.
Priority date: 2022-12-26 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2022-12-26
Publication date: 2023-05-26
Application filed by Acosound Technology Co., Ltd.
Priority to CN202211673539.4A
Publication of CN116168719A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the analysis technique
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use


Abstract

The invention discloses a sound gain adjustment method and system based on context analysis. The gain adjustment method comprises the following steps: S1, acquiring an audio clip to be processed from a sound and extracting a plurality of audio frames from the clip; S2, performing a Fourier transform on each audio frame and calculating its frequency-domain energy and short-time average zero-crossing rate; S3, judging whether the frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame; S4, judging whether the short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain; S5, judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.

Description

Sound gain adjusting method and system based on context analysis
Technical Field
The invention relates to the technical field of hearing aids, in particular to a sound gain adjusting method and system based on context analysis.
Background
Wide dynamic range compression (WDRC) is a technique in which the gain of a hearing aid changes in real time as the intensity of the input sound signal changes, so that the amplified sound signal falls within the reduced auditory dynamic range of a hearing-impaired patient. The gain of the wide dynamic range is variable, and the system gives appropriate gain to low-intensity sound signals. Different frequency bands may also use different compression ratios to achieve an appropriate loudness level.
The attack-release algorithm: two parameters must be determined: the attack time and the release time. These two parameters determine how quickly the algorithm responds to increases and decreases of the signal, and hence how quickly the gain changes. For a rising signal the response must be fast so that the gain keeps up with the signal and is adjusted in time when the signal reaches its peak; the attack time is therefore usually small, typically 30-200 ms. For a falling signal the release time is usually larger, typically 0.5-2.0 s, and is chosen according to the gap time of the speech.
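As an illustration only, a minimal one-pole attack-release smoother for a per-frame level track might look like the following Python sketch; the function name and the default time constants (picked from the 30-200 ms and 0.5-2.0 s ranges above) are assumptions, not part of the patent:

```python
import numpy as np

def attack_release_smooth(levels, frame_rate, attack_s=0.1, release_s=1.0):
    """Smooth a per-frame level track with separate attack/release time constants.

    levels     : per-frame linear level estimates (e.g. RMS values)
    frame_rate : frames per second
    attack_s   : time constant while the level is rising (fast, 30-200 ms)
    release_s  : time constant while the level is falling (slow, 0.5-2.0 s)
    """
    a_att = np.exp(-1.0 / (attack_s * frame_rate))   # rising-signal coefficient
    a_rel = np.exp(-1.0 / (release_s * frame_rate))  # falling-signal coefficient
    smoothed = np.empty(len(levels))
    y = float(levels[0])
    for i, x in enumerate(levels):
        coef = a_att if x > y else a_rel  # rise -> track quickly; fall -> decay slowly
        y = coef * y + (1.0 - coef) * x
        smoothed[i] = y
    return smoothed
```

A smaller attack_s lets the envelope keep up with rising signals, while the larger release_s rides over the gaps between speech bursts, matching the behaviour described above.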
Context analysis distinguishes audio frames of different components (speech, noise, and music) in a sound clip by exploiting differences in their audio features.
The implementation of the wide dynamic range compression algorithm is illustrated in Fig. 1. The speech is first divided in the frequency domain into a number of separate frequency regions, called channels. Within each channel, the gain is determined by mapping the dynamic range of normal sound into the hearing dynamic range of the hearing-impaired person, based on that person's hearing threshold map, and each channel is processed independently. The gains calculated in the different channels are then applied to the input signal in the frequency domain, and finally the sound signal is synthesized and output. The specific implementation steps are as follows:
The signal is first framed and time-frequency transformed. The time-frequency transform may use an FFT; here a WOLA (Weighted Overlap-Add) filter bank performs the transform. WOLA is an efficient implementation of time-frequency conversion and subband division; the WOLA filter bank divides the full frequency band uniformly into multiple subbands.
Channels are then formed, that is, the subbands are merged appropriately according to a rule over the whole frequency band and divided into N channels. Because the human ear's perception of frequency is nonlinear, the bandwidth of each channel differs. After the channels are divided in the frequency domain, dynamic range compression can be performed independently within each channel.
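A sketch of one possible grouping rule follows; the patent does not specify the rule, so the roughly logarithmic spacing (narrow channels at low frequencies, wide channels at high frequencies) and the default counts are assumptions:

```python
import numpy as np

def make_channels(num_subbands=64, num_channels=8):
    """Group uniform WOLA subbands into non-uniform channels, mimicking the
    ear's roughly logarithmic frequency resolution."""
    edges = np.round(np.geomspace(1, num_subbands, num_channels + 1)).astype(int)
    edges[0] = 0                 # make the first channel start at subband 0
    edges = np.unique(edges)     # drop duplicate edges at the low end
    return [range(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
```

With 64 subbands and 8 channels this yields channels one or two subbands wide at the bottom of the band and dozens of subbands wide at the top.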
The sound pressure level and gain of each channel are then calculated. Assuming channel n contains K subbands, the WOLA analysis result is expressed as:

$$X_k(m) = r_k(m) + j\,i_k(m)$$

where m is the index of the speech frame, k = 0, 1, 2, ..., K-1 is the subband index, and r_k(m) and i_k(m) are the real and imaginary parts, respectively.
The root-mean-square energy P_RMS(m) of the K subbands in the channel and the sound pressure level SPL(m) are expressed as:

$$P_{RMS}(m) = \sqrt{\frac{1}{K}\sum_{k=0}^{K-1}\left(r_k^2(m) + i_k^2(m)\right)}$$

$$SPL(m) = 20\lg\left(P_{RMS}(m)\right)$$

After the sound pressure level is obtained, the gain can be calculated. To reflect the rate of change of real speech and reduce distortion, before the gain is calculated the root-mean-square energy P_RMS(m) computed in each channel must be tracked in the time domain in a manner that matches the rate of change of the speech signal, and the gain is then calculated from the relatively gentle tracked value. Here an attack-release algorithm smooths P_RMS(m); the sound pressure level SPL(m) is computed from the smoothed energy value, and the gain is computed from it. Dynamic range compression is thus performed separately on the different frequency bands.
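The per-channel computation can be sketched as follows; the compression curve (knee, ratio, gain below the knee) is a hypothetical static WDRC characteristic added for illustration, since the text above does not fix those numbers:

```python
import numpy as np

def channel_spl(X, channel_bins):
    """SPL of one channel from complex WOLA/FFT bins X, per the formulas above:
    P_RMS(m) = sqrt(mean_k |X_k(m)|^2),  SPL(m) = 20*lg(P_RMS(m))."""
    p_rms = np.sqrt(np.mean(np.abs(X[list(channel_bins)]) ** 2))
    return 20.0 * np.log10(p_rms + 1e-12)  # small floor avoids log(0)

def wdrc_gain_db(spl_db, knee_db=45.0, ratio=2.0, gain_below_knee_db=20.0):
    """Hypothetical static WDRC curve: constant gain below the knee,
    gain shrinking with slope (1/ratio - 1) above it."""
    if spl_db <= knee_db:
        return gain_below_knee_db
    return gain_below_knee_db + (spl_db - knee_db) * (1.0 / ratio - 1.0)
```

In the full algorithm, P_RMS(m) would first be smoothed by the attack-release tracker before SPL and gain are computed, as described above.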
Existing wide dynamic range compression and speech/noise discrimination techniques have the following drawbacks when applied to hearing-aid compensation algorithms:
1. Low-level noise is left unprocessed in order to preserve the intelligibility and fluency of the audio, so the output audio has a high background-noise level.
2. For audio-frame continuity and smoothing, the prior art has no scheme for intelligent gain following based on audio-frame characteristics, leaving considerable room for improvement in speech fluency and comfort.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a sound gain adjustment method and system based on context analysis, which solve the high-background-noise problem of conventional wide dynamic range compression, achieve noise reduction and speech highlighting and enhancement of the audio signal, and further improve the fluency, comfort, and intelligibility of speech.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a method of sound gain adjustment based on context analysis, comprising:
s1, acquiring an audio fragment to be processed in sound, and extracting a plurality of audio frames in the audio fragment;
s2, carrying out Fourier transform processing on each audio frame of the plurality of extracted audio frames, and calculating the frequency domain energy and the short-time average zero-crossing rate of each audio frame after processing;
s3, judging whether the calculated frequency domain energy is smaller than a first preset threshold value, if yes, enabling the current audio frame to be a noise frame, and giving small gain to the noise frame; if not, the current audio frame is a non-noise frame;
s4, judging whether the calculated short-time average zero crossing rate is larger than a second preset threshold value, if so, the current audio frame is a voice frame; if not, the current audio frame is a noise frame, and a small gain is given to the noise frame;
s5, judging whether the frequency domain energy corresponding to the voice frame is smaller than a first preset threshold value, if yes, enabling the current voice frame to be a small voice frame, and giving a large gain to the small voice frame; if not, the current speech frame is a loud frame and gives a small gain to the loud frame.
Further, after step S5 the method further comprises:
S6, performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
Further, the gain smoothing in step S6 is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
Further, the gain smoothing in step S6 is performed by controlling the attack time and release time parameters of the audio frames.
Accordingly, a sound gain adjustment system based on context analysis is also provided, comprising:
an acquisition module for acquiring the audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
a computing module for performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
a first judging module for judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
a second judging module for judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
a third judging module for judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
Further, the system further comprises:
a processing module for performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
Further, the gain smoothing in the processing module is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
Further, the gain smoothing in the processing module is performed by controlling the attack time and release time parameters of the audio frames.
Compared with the prior art, the intelligent compression algorithm based on context analysis can label speech frames and noise frames in the audio signal, give each a different gain, and perform gain smoothing according to the ordering of the frames in the audio clip, thereby achieving noise reduction and speech highlighting on top of wide dynamic range compression.
Drawings
FIG. 1 is a flow chart of a wide dynamic range compression algorithm implementation provided in the background;
FIG. 2 is a flowchart of the sound gain adjustment method based on context analysis according to an embodiment.
Detailed Description
The following describes embodiments of the invention with reference to specific examples; other advantages and effects of the invention will be readily apparent to those skilled in the art from this disclosure. The invention may also be practiced or applied in other, different embodiments, and the details in this specification may be modified or varied in various ways without departing from the spirit and scope of the invention. It should be noted that, where there is no conflict, the following embodiments and the features in them may be combined with each other.
The invention aims to overcome the defects of the prior art and provides a sound gain adjustment method and system based on context analysis, in which an intelligent compression algorithm based on context analysis enables the hearing aid to apply different gain modes to speech and to noise.
Example 1
This embodiment provides a sound gain adjustment method based on context analysis, as shown in Fig. 2, comprising:
S1, acquiring an audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
S2, performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
S3, judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
S4, judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
S5, judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
In step S1, the audio clip to be processed is acquired from the sound, and a plurality of audio frames are extracted from it.
Audio frames are extracted by windowing: to obtain short-time audio signals, a windowing operation is performed on the audio signal. The window function slides smoothly over the audio signal, dividing it into frames. Framing may be contiguous or use overlapping segments; the overlap is called the frame shift and is typically half the window length.
This embodiment uses a Hamming window, expressed as:

$$\omega(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$

where N is the window length.
After the window type is chosen, a suitable window length is selected for the audio signal. The window length determines whether amplitude variations of the audio signal can be reflected. If N is very large, i.e., on the order of several pitch periods, the window is equivalent to a very narrow low-pass filter: the short-time information changes very slowly and the details of the waveform cannot be adequately reflected. Conversely, if N is very small, i.e., on the order of one pitch period or less, the signal energy fluctuates rapidly with the fine structure of the waveform and the passband of the filter widens, so smooth short-time information cannot be obtained. The window length must therefore be chosen appropriately. The decay of the window function is essentially independent of its duration, so changing the width N changes only the bandwidth.
After the window function and the window length are determined, the windowing operation is completed by multiplying the signal by the window function.
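A minimal framing-and-windowing sketch, assuming a Hamming window and a frame shift of half the window length as described above (the window length of 256 samples is an illustrative default):

```python
import numpy as np

def frame_signal(x, win_len=256, hop=None):
    """Split signal x into overlapping Hamming-windowed frames.
    The frame shift defaults to half the window length."""
    hop = hop or win_len // 2
    n = np.arange(win_len)
    w = 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (win_len - 1))  # Hamming window
    n_frames = 1 + (len(x) - win_len) // hop   # assumes len(x) >= win_len
    return np.stack([x[i * hop:i * hop + win_len] * w for i in range(n_frames)])
```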
In step S2, a Fourier transform is performed on each of the extracted audio frames, and the frequency-domain energy and short-time average zero-crossing rate of each processed frame are calculated.
Per-frame Fourier transform: the Fourier spectrum analysis here uses short-time analysis techniques.
The short-time Fourier transform of the signal x(n) is defined as:

$$X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\,\omega(n-m)\,e^{-j\omega m}$$

where ω(n) is the window function.
In practice the discrete Fourier transform is used instead of the continuous transform, which requires extending the signal periodically: x(n)ω(n) is treated as one period of a periodic signal, and the discrete Fourier transform of that periodic signal yields a power spectrum. If the window length is L, then x(n)ω(n) has length L while the short-time autocorrelation function R_n(k) has length 2L. If x(n)ω(n) is extended with period L, aliasing occurs in the autocorrelation domain: the circular correlation of the periodic function over one period no longer equals the linear correlation R_n(k), and the resulting power spectrum is merely an undersampled version of the true power spectrum, i.e., only L sample values. If all 2L values of the power spectrum are desired, L zeros can be appended to x(n)ω(n); after periodic extension with period 2L, the circular correlation equals the linear correlation, and the discrete Fourier transform can then be applied.
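The zero-padding argument can be demonstrated with a short sketch: padding the L-point windowed frame to 2L points before the FFT makes the circular autocorrelation equal the linear one (the function name is illustrative):

```python
import numpy as np

def short_time_autocorr(frame):
    """Autocorrelation R_n(k), k = 0..L-1, of an L-point windowed frame via a
    2L-point FFT; the L appended zeros prevent aliasing in the lag domain."""
    L = len(frame)
    X = np.fft.rfft(frame, n=2 * L)              # FFT of the zero-padded frame
    r = np.fft.irfft(np.abs(X) ** 2, n=2 * L)    # inverse of the power spectrum
    return r[:L]                                 # non-negative lags only
```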
The frequency-domain energy is calculated as:

$$E = \sum_{\omega=0}^{\omega_0} |X(\omega)|^2$$

where E is the frequency-domain energy, X(ω) is the value of the Fourier transform of the current audio frame at ω, and ω_0 is half the number of sampled points.
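A direct transcription of this energy formula (assuming the frame has already been windowed) might read:

```python
import numpy as np

def frame_energy(windowed_frame):
    """Frequency-domain energy E = sum over omega = 0..omega_0 of |X(omega)|^2,
    with omega_0 taken as half the number of samples."""
    X = np.fft.fft(windowed_frame)
    omega0 = len(windowed_frame) // 2
    return float(np.sum(np.abs(X[:omega0]) ** 2))
```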
The short-time average zero-crossing rate is calculated as follows: for the signal sequence x(n) of an audio frame, adjacent sample pairs are checked for a sign change, each sign change counting as one zero crossing; a first-order difference is computed, its absolute value taken, and the result low-pass filtered:

$$Z_n = \frac{1}{2}\sum_{m=-\infty}^{\infty} \left|\,\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]\,\right|\,\omega(n-m)$$

where sgn[·] is the sign function:

$$\mathrm{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ -1, & x(n) < 0 \end{cases}$$
in a speech signal, the zero crossing rate of unvoiced sounds is high, and the zero crossing rate of voiced sounds is low. The zero crossing rate of the speech signal will change more strongly.
In step S3, whether the calculated frequency-domain energy is smaller than the first preset threshold is judged; if so, the current audio frame is a noise frame and is given a small gain; if not, it is a non-noise frame.
The frequency-domain energy of the current audio frame is calculated with the formula in step S2 and used to judge whether the frame is a noise frame: if the calculated frequency-domain energy is smaller than the first threshold, the current audio frame is a noise frame; otherwise it is a non-noise frame.
In step S4, whether the calculated short-time average zero-crossing rate is greater than the second preset threshold is judged; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain.
The short-time average zero-crossing rate of the current audio frame is calculated with the formula in step S2 and used to judge whether the frame is a speech frame: if the calculated zero-crossing rate is greater than the second threshold, the current audio frame is a speech frame; if it is smaller than the second threshold, the frame is a noise frame.
In step S5, whether the frequency-domain energy of the speech frame is smaller than the first preset threshold is judged; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
After step S4 has determined which audio frames are speech frames, to make the speech gain smoother, whether the frequency-domain energy of the current speech frame is smaller than the first preset threshold is further judged; if so, the current speech frame is treated as a small-sound frame; if not, it is a loud frame.
Through steps S3-S5, the noise frames, loud frames, and small-sound frames among the audio frames are obtained. The intelligent compression algorithm based on context analysis then gives a small gain to noise frames, a small gain to loud frames, and a large gain to small-sound frames, so that the speech signal in the hearing aid is amplified and compressed into the wearer's dynamic hearing range while noise reduction is assisted.
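The S3-S5 decision flow can be sketched as below. Note one assumption: the text reuses the "first preset threshold" in S5, which a frame reaching S5 has already exceeded in S3, so a separate, higher loudness threshold is introduced here for the small/loud split to be effective:

```python
def classify_frame(energy, zcr, noise_thresh, zcr_thresh, loud_thresh):
    """Label an audio frame per steps S3-S5 and return (label, gain policy).

    noise_thresh : 'first preset threshold' of S3 (energy below -> noise)
    zcr_thresh   : 'second preset threshold' of S4 (ZCR above -> speech)
    loud_thresh  : assumed speech-energy split for S5 (> noise_thresh)
    """
    if energy < noise_thresh:
        return "noise", "small gain"       # S3
    if zcr <= zcr_thresh:
        return "noise", "small gain"       # S4
    if energy < loud_thresh:
        return "small", "large gain"       # S5: quiet speech is boosted
    return "loud", "small gain"            # S5: loud speech is compressed
```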
The gain is assigned as follows: based on the medical diagnostic results (audiogram) of the hearing-aid wearer, combined with the wearer's individual condition (age, sex, medical history, etc.), a suitable fitting formula is selected to calculate the wearer's expected range of sound gain at each frequency, and a customized technique determines the gain scheme. The hearing aid determines the gain coefficient for sound of each frequency and level according to this scheme, discriminates frame by frame, and applies the corresponding gain.
This embodiment further comprises:
S6, performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
After the specific frame label of each audio frame in the clip (noise, loud, or small-sound) is determined, the gain-following parameters release time and attack time are adjusted according to the sequence of frame labels in the clip, so as to control how the gain follows changes between audio frames.
Specifically: when a small-sound frame follows a loud frame, gain following speeds up, i.e., the release time of the loud frame is reduced and the attack time of the small-sound frame is increased accordingly, so overall gain following is faster. When a loud frame follows a small-sound frame, gain following speeds up, i.e., the release time of the small-sound frame is reduced and the attack time of the loud frame is increased accordingly. When a noise frame follows a loud frame, gain following slows down, i.e., the release time of the loud frame and the attack time of the noise frame are both increased. When a noise frame follows a small-sound frame, gain following speeds up, i.e., the release time of the small-sound frame is reduced and the attack time of the noise frame is increased. Through this control of the gain-following parameters, smooth dynamic gain changes that accurately distinguish speech from noise are achieved, improving the fluency and comfort of the speech signal and further improving speech intelligibility. In addition, more accurate per-frame speech gain changes can be made on top of the gain-following control, giving low-level noise a low gain for a better denoising effect.
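The transition rules above can be tabulated as follows; the directions (increase/decrease) follow the text, while the scale factor is a placeholder, since the patent gives no numbers:

```python
# (previous label, current label) -> adjustment of the previous frame's release
# time and the current frame's attack time, per the rules described above.
FOLLOW_RULES = {
    ("loud",  "small"): ("decrease", "increase"),  # gain following speeds up
    ("small", "loud"):  ("decrease", "increase"),  # gain following speeds up
    ("loud",  "noise"): ("increase", "increase"),  # gain following slows down
    ("small", "noise"): ("decrease", "increase"),  # gain following speeds up
}

def adjust_following(prev_label, cur_label, release_s, attack_s, factor=1.5):
    """Scale the leading frame's release time and the following frame's attack
    time up or down by `factor` (a placeholder value) per the transition rule."""
    rule = FOLLOW_RULES.get((prev_label, cur_label))
    if rule is None:
        return release_s, attack_s          # no rule for this transition
    rel_dir, att_dir = rule
    release_s *= factor if rel_dir == "increase" else 1.0 / factor
    attack_s *= factor if att_dir == "increase" else 1.0 / factor
    return release_s, attack_s
```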
In this embodiment, an intelligent compression algorithm based on context analysis labels speech frames and noise frames in the audio signal, gives each a different gain, and performs gain smoothing according to the ordering of the frames in the audio clip, achieving noise reduction and speech highlighting on top of wide dynamic range compression.
Example 2
This embodiment provides a sound gain adjustment system based on context analysis, comprising:
an acquisition module for acquiring the audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
a computing module for performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
a first judging module for judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
a second judging module for judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
a third judging module for judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
Further, the system further comprises:
a processing module for performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
Further, the gain smoothing in the processing module is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
Further, the gain smoothing in the processing module is performed by controlling the attack time and release time parameters of the audio frames.
It should be noted that the sound gain adjustment system based on context analysis provided in this embodiment corresponds to the method of Example 1, and its details are not repeated here.
In this embodiment, an intelligent compression algorithm based on context analysis labels speech frames and noise frames in the audio signal, gives each a different gain, and performs gain smoothing according to the ordering of the frames in the audio clip, achieving noise reduction and speech highlighting on top of wide dynamic range compression.
Note that the above are only preferred embodiments of the invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the particular embodiments described here, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the concept of the invention, its scope being determined by the appended claims.

Claims (8)

1. A sound gain adjustment method based on context analysis, comprising:
S1, acquiring an audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
S2, performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
S3, judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
S4, judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
S5, judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
2. The sound gain adjustment method based on context analysis according to claim 1, wherein after step S5 the method further comprises:
S6, performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
3. The sound gain adjustment method based on context analysis according to claim 2, wherein the gain smoothing in step S6 is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
4. The sound gain adjustment method based on context analysis according to claim 2, wherein the gain smoothing in step S6 is performed by controlling the attack time and release time parameters of the audio frames.
5. A sound gain adjustment system based on context analysis, comprising:
an acquisition module for acquiring the audio clip to be processed from a sound and extracting a plurality of audio frames from the clip;
a computing module for performing a Fourier transform on each of the extracted audio frames and calculating the frequency-domain energy and the short-time average zero-crossing rate of each processed frame;
a first judging module for judging whether the calculated frequency-domain energy is smaller than a first preset threshold; if so, the current audio frame is a noise frame and is given a small gain; if not, the current audio frame is a non-noise frame;
a second judging module for judging whether the calculated short-time average zero-crossing rate is larger than a second preset threshold; if so, the current audio frame is a speech frame; if not, it is a noise frame and is given a small gain;
a third judging module for judging whether the frequency-domain energy of the speech frame is smaller than a first preset threshold; if so, the current speech frame is a small-sound frame and is given a large gain; if not, it is a loud frame and is given a small gain.
6. The sound gain adjustment system based on context analysis according to claim 5, further comprising:
a processing module for performing gain smoothing on the sequence of concatenated audio frames that form the audio clip.
7. The sound gain adjustment system based on context analysis according to claim 6, wherein the gain smoothing in the processing module is specifically: when a small-sound frame follows a loud frame, gain following speeds up; when a loud frame follows a small-sound frame, gain following speeds up; when a noise frame follows a loud frame, gain following slows down; when a noise frame follows a small-sound frame, gain following speeds up.
8. The sound gain adjustment system based on context analysis according to claim 6, wherein the gain smoothing in the processing module is performed by controlling the attack time and release time parameters of the audio frames.
Application CN202211673539.4A (priority date 2022-12-26, filing date 2022-12-26): Sound gain adjusting method and system based on context analysis. Status: Pending. Published as CN116168719A (en).

Priority Applications (1)

CN202211673539.4A (published as CN116168719A, en): priority date 2022-12-26, filing date 2022-12-26, Sound gain adjusting method and system based on context analysis

Applications Claiming Priority (1)

CN202211673539.4A (published as CN116168719A, en): priority date 2022-12-26, filing date 2022-12-26, Sound gain adjusting method and system based on context analysis

Publications (1)

CN116168719A (en), published 2023-05-26

Family

ID=86417453

Family Applications (1)

CN202211673539.4A (Pending, published as CN116168719A, en): priority date 2022-12-26, filing date 2022-12-26, Sound gain adjusting method and system based on context analysis

Country Status (1)

CN: CN116168719A (en)



Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09127996A (en) * 1995-10-26 1997-05-16 Sony Corp Voice decoding method and device therefor
JP2002051392A (en) * 2000-08-01 2002-02-15 Alpine Electronics Inc In-vehicle conversation assisting device
CN101388216A (en) * 2007-09-13 2009-03-18 富士通株式会社 Sound processing device, apparatus and method for controlling gain
US20140074463A1 (en) * 2011-05-26 2014-03-13 Advanced Bionics Ag Systems and methods for improving representation by an auditory prosthesis system of audio signals having intermediate sound levels
EP2714184A1 (en) * 2011-05-26 2014-04-09 Advanced Bionics AG Systems and methods for improving representation by an auditory prosthesis system of audio signals having intermediate sound levels
CN103812462A (en) * 2012-11-15 2014-05-21 华为技术有限公司 Loudness control method and device
KR20160050186A (en) * 2014-10-28 2016-05-11 현대엠엔소프트 주식회사 Apparatus for reducing wind noise and method thereof
US9661438B1 (en) * 2015-03-26 2017-05-23 Amazon Technologies, Inc. Low latency limiter
CN108022595A (en) * 2016-10-28 2018-05-11 电信科学技术研究院 A kind of voice signal noise-reduction method and user terminal
CN106817483A (en) * 2016-12-26 2017-06-09 建荣半导体(深圳)有限公司 A kind of method and apparatus for adjusting speech volume
WO2019080553A1 (en) * 2017-10-23 2019-05-02 科大讯飞股份有限公司 Microphone array-based target voice acquisition method and device
CN107910013A (en) * 2017-11-10 2018-04-13 广东欧珀移动通信有限公司 The output processing method and device of a kind of voice signal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
茹婷婷; 谢湘: "耳语音数据库的设计与采集" [Design and acquisition of a whispered speech database], 清华大学学报(自然科学版) [Journal of Tsinghua University (Science and Technology)], no. 1 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117153192A (en) * 2023-10-30 2023-12-01 科大讯飞(苏州)科技有限公司 Audio enhancement method, device, electronic equipment and storage medium
CN117153192B (en) * 2023-10-30 2024-02-20 科大讯飞(苏州)科技有限公司 Audio enhancement method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11056130B2 (en) Speech enhancement method and apparatus, device and storage medium
Ma et al. Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions
NO20191310A1 (en) Audio amplification control using specific volume-based hearing event detection
CN109473115B (en) Digital audio signal volume equal loudness adjusting method
CN110473567A (en) Audio-frequency processing method, device and storage medium based on deep neural network
Stone et al. Quantifying the effects of fast-acting compression on the envelope of speech
US20080177539A1 (en) Method of processing voice signals
EP3074975B1 (en) Method of operating a hearing aid system and a hearing aid system
KR20060013400A (en) Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal
TR201810466T4 (en) Apparatus and method for processing an audio signal to improve speech using feature extraction.
CN105931649A (en) Ultra-low time delay audio processing method and system based on spectrum analysis
CN110085259B (en) Audio comparison method, device and equipment
DE102008031150B3 (en) Method for noise suppression and associated hearing aid
CN104867499A (en) Frequency-band-divided wiener filtering and de-noising method used for hearing aid and system thereof
CN116168719A (en) Sound gain adjusting method and system based on context analysis
Li et al. Wavelet-based nonlinear AGC method for hearing aid loudness compensation
US11445307B2 (en) Personal communication device as a hearing aid with real-time interactive user interface
JPH06208395A (en) Formant detecting device and sound processing device
Kates et al. Integrating cognitive and peripheral factors in predicting hearing-aid processing effectiveness
US10013992B2 (en) Fast computation of excitation pattern, auditory pattern and loudness
CN105869652A (en) Psychological acoustic model calculation method and device
CN110010150A (en) Auditory Perception speech characteristic parameter extracting method based on multiresolution
US20240071411A1 (en) Determining dialog quality metrics of a mixed audio signal
Tiwari et al. Sliding-band dynamic range compression for use in hearing aids
Jiang et al. Speech noise reduction algorithm in digital hearing aids based on an improved sub-band SNR estimation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination