INTELLIGENT HEARING AID
Technical Field
[001] The present disclosure relates to intelligent hearing aids, and more particularly to an intelligent hearing aid with enhanced audio processing performance, comprising a microphone, a smartphone and an earphone operatively connected with each other.
Background Art
[002] There is a need for intelligent hearing aids with enhanced audio processing performance for the hearing-impaired elderly, and such enhanced intelligent hearing aids usually require dedicated high-end or expensive components, such as a faster CPU and a relatively high-quality audio unit. It is desirable to make use of existing and commonly available equipment of the hearing-impaired elderly to provide an enhanced hearing aid.
Brief Summary of Invention
[003] There is disclosed an intelligent hearing aid, comprising a microphone, a smartphone and an earphone operatively connected with each other, wherein the microphone is configured for receiving, recording, and transferring environmental audio data/signals to the smartphone; the smartphone is configured for carrying out noise suppression and data amplification on the audio data received from the microphone to generate and transfer enhanced audio data to the earphone; and the earphone is configured for receiving and outputting the enhanced audio data; wherein the noise suppression combines knowledge of pattern recognition, signal processing and probability theory to facilitate estimation of the noise power spectral density, and the estimation is conducted under assumptions including: (1) the speech and noise signals are both additive in the short-time Fourier domain, Y(k,l) = S(k,l) + N(k,l), where Y, S and N are the spectral coefficients of the noisy observation, the speech and the noise, k is the frequency index and l is the frame index; (2) the speech and noise signals are assumed to be zero-mean and independent of each other and presented as
E[S(k,l) N*(k,l)] = E[S(k,l)] E[N*(k,l)] = 0,
where E[·] is the statistical expectation operator; (3) the speech and noise signals are assumed to have spectral coefficients following a complex Gaussian distribution, to deduce the minimum mean-square error (MMSE) estimator,
E[|N(k,l)|² | Y(k,l)] = (1 − P(H1|Y(k,l))) · |Y(k,l)|² + P(H1|Y(k,l)) · σ_N²(k,l−1),
where P(H1|Y) is the a posteriori speech presence probability and σ_N² is the noise power; (4) the noise signal is assumed to be more stationary than the speech signal.
Brief Description of Drawings
[004] The disclosure will be described by way of example with reference to the accompanying drawings, in which:
[005] Figure 1 is a flow chart of a noise suppression algorithm of an example intelligent hearing aid of the present disclosure.
Detailed Description of Invention
[006] In order to address the needs of the hearing-impaired elderly, the applicant has designed a hearing aid app that allows them to listen simply by connecting a smartphone to an earphone. Different noise filters are devised according to the different environments in which users are located. The present disclosure adopts a warm colour tone in the user interface for easy reading by the users.
[007] After downloading and installing the app, the user needs to select a scene mode, such as Indoor mode, Outdoor mode, Spacious area mode, and the like, according to the audio environment. The environmental sound is received via the microphone and processed by the app, and the amplified clean audio then plays out from the earphone. The app supports background running, so neither the application window nor the mobile screen needs to be kept on.
[008] The hearing aid of the present disclosure combines a smartphone and a headset/earphone into an intelligent hearing aid. First, the microphone on the earphone keeps recording the audio in the environment into the smartphone operating system. Secondly, within the smartphone operating system, the input audio data is amplified and noise-suppression processed. Thirdly, the amplified, clean audio is transmitted to the earphone output to play out for hearing-impaired people.
[009] The present disclosure adopts a circular buffer and a multi-threading mechanism to improve the audio quality without negative side effects. The circular buffer separates the recording thread and the processing thread to preserve the continuity of the audio output. For implementation, multi-threading is used to divide the work between a first buffer in the processing thread and a second buffer in the recording thread.
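The producer/consumer arrangement described above can be sketched as follows. This is a minimal Python illustration rather than the Android implementation; the frame size, buffer depth, and the placeholder "amplification" step are assumptions for demonstration only:

```python
import threading
import queue

# A bounded queue stands in for the circular buffer between the two
# threads: the recording thread deposits fixed-size frames, the
# processing thread consumes them, so audio capture is never blocked
# by the (slower) noise-suppression work.
FRAME_SIZE = 256                       # samples per frame (illustrative)
audio_buffer = queue.Queue(maxsize=8)  # ring of up to 8 frames

def recording_thread(frames):
    for frame in frames:          # stand-in for microphone capture
        audio_buffer.put(frame)   # blocks only if processing lags far behind
    audio_buffer.put(None)        # sentinel: capture finished

def processing_thread(results):
    while True:
        frame = audio_buffer.get()
        if frame is None:
            break
        # placeholder for noise suppression + amplification
        results.append([2 * s for s in frame])

frames = [[i] * FRAME_SIZE for i in range(4)]
results = []
rec = threading.Thread(target=recording_thread, args=(frames,))
proc = threading.Thread(target=processing_thread, args=(results,))
rec.start(); proc.start()
rec.join(); proc.join()
```

Because the two threads share only the bounded buffer, a momentary slowdown in processing does not interrupt recording, which is the continuity property the paragraph above aims for.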
[0010] According to the present disclosure, the recording process contains two major steps: one is analog-to-digital conversion, the other is data streaming and storing. The buffer is a memory structure used in the data streaming process to temporarily hold the collected digital voice data. Once the buffer is filled with data, the data is encapsulated as a package and passed to the next process. The size of the buffer cannot be chosen arbitrarily; it should be determined by the sampling frequency of the analog-to-digital converter.
[0011] One critical issue in the implementation of digital hearing aids is latency. When people are talking, listeners most likely expect two kinds of signal: the movement of the speaker's mouth and the sound being produced. Since light travels much faster than sound, people will observe the movement before they hear the sound. However, the human brain cannot notice the time gap between the two signals if the gap is smaller than around 30 ms. In a digital hearing aid application, the buffer will not transfer any data out before it is full. This behaviour introduces delay. If the delay is long enough for the brain to identify, the user will experience a mismatch between the observed movement of the speaker's mouth and the sound heard. This defect brings the user tremendous difficulty in receiving and understanding the content of the voice.
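The relationship between buffer size, sampling frequency and buffering latency can be made concrete with a small calculation. The sampling frequency and buffer size below are assumed figures for illustration, not the app's actual settings:

```python
# Buffering latency = samples held / sampling frequency.
SAMPLE_RATE = 8000    # Hz, assumed ADC sampling frequency
BUFFER_SAMPLES = 256  # samples collected before the buffer is handed on

# 256 / 8000 s = 0.032 s, i.e. 32 ms of buffering delay alone,
# already above the ~30 ms threshold the brain can perceive.
buffer_latency_ms = BUFFER_SAMPLES / SAMPLE_RATE * 1000.0
```

This is why the buffer size cannot be chosen arbitrarily: at a fixed sampling frequency, every extra sample held in the buffer adds directly to the audio-visual gap the user perceives.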
[0012] Therefore, the present disclosure deploys a function of the AudioRecord class (getMinBufferSize) to identify the smallest buffer size that will not introduce any side effect. Further information about this function can be found at
https://developer.android.com/reference/android/media/AudioRecord.html#getMinBufferSize%28int,%20int,%20int%29 .
[0013] To improve the audio processing performance of the hearing aid app, a real-time noise suppression algorithm needs to be implemented. For this purpose, the present disclosure has modified an existing prior art noise suppression algorithm from audio-file-based to real-time processing. In addition, for better implementation of real-time noise suppression processing, the computational complexity of the original algorithm has also been reduced.
[0014] The prior art statistical-model-based algorithm cannot perform such a function. The prior art statistical-model-based algorithm allows users to estimate the noise power spectral density by fitting the input data to a certain statistical model. The estimation result cannot ensure a certain level of confidence if the input data fails to exceed a certain amount; the estimation result will be untrustworthy if not enough data is given. The algorithm therefore has to wait a relatively long time to collect enough voice data. However, the characteristics of real-time processing violate this condition: the processing latency cannot be tolerated if the algorithm needs to wait to collect a large amount of voice input data. Therefore, there is a trade-off between the accuracy of the estimation and the processing latency.
[0015] The modification of the present disclosure allows the relevant algorithm to perform these functions. To cope with the demand of real-time processing, historical data is involved in the estimation process. A short period of present audio data is attached to a long period of historical data, generated in the last iteration and stored in a register of a certain volume. During each iteration, only the data in the sliding window is involved, and the window slides a few data points to the next position after the estimation of that iteration is done. This data arrangement mechanism is inspired by signal compression techniques in the digital communication area. Not only has the duration of the estimation been curtailed, but the level of confidence of the estimation has also been raised.
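The sliding-window arrangement above can be sketched as follows, with toy sizes (6 historical samples, 2 new samples per iteration) chosen purely for illustration:

```python
# A register keeps the most recent history; each iteration appends a
# short block of new samples and discards the oldest, so the estimator
# always sees a full analysis window without waiting for fresh data.
HISTORY = 6   # samples carried over from previous iterations (toy value)
STEP = 2      # new samples arriving per iteration (toy value)

register = [0] * HISTORY   # initial (silent) history
windows = []
for block_start in range(0, 8, STEP):
    new_block = list(range(block_start, block_start + STEP))
    window = register + new_block   # full analysis window for this iteration
    windows.append(window)
    register = window[STEP:]        # slide forward by STEP samples
```

Each estimation thus runs over HISTORY + STEP samples while only STEP samples of delay are added per iteration, which is the accuracy/latency trade-off the paragraph describes.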
[0016] The comparison between the existing algorithm and the modified algorithm is as follows:
[0017] Referring to Figure 1, which is a flow chart of a noise suppression algorithm of an example intelligent hearing aid of the present disclosure, the present disclosure adopts an enhanced noise cancellation algorithm, which aims at eliminating the corruption introduced into a single-channel audio signal, mainly by non-stationary noise. It combines the knowledge of pattern recognition and signal processing to facilitate the estimation of the noise power spectral density. Probability theory was also involved in the development of this algorithm to build a theoretical noise pattern for the speech enhancement.
[0018] The estimation is conducted under several assumptions: (1) The speech and the noise signals are both additive in the short-time Fourier domain, Y(k,l) = S(k,l) + N(k,l). (2) The signals (speech and noise) are assumed to be zero-mean and independent of each other, which can be presented as
E[S(k,l) N*(k,l)] = E[S(k,l)] E[N*(k,l)] = 0,
with E[·] being the statistical expectation operator. (3) The signals are assumed to have a complex Gaussian distribution in the spectral coefficients, to deduce the minimum mean-square error (MMSE) estimator,
E[|N(k,l)|² | Y(k,l)] = (1 − P(H1|Y(k,l))) · |Y(k,l)|² + P(H1|Y(k,l)) · σ_N²(k,l−1).
(4) It is reasonable to assume the noise signal is more stationary than the speech signal.
[0019] The design case is that a microphone records the audio signal at a certain frequency and the audio sample sequence is stored in a buffer, which can contain 256 samples. Once the buffer is full, the 256 samples are forwarded to the processor. There are three main procedures in the iterative noise cancellation. The first key step is aligning the newly arrived data with the historical data reserved in a register. The noise estimation and elimination are done in the second procedure. The last step is called clean signal finalization. The clean audio signal is obtained after this final step.
[0020] Each audio sample is an 8-bit (1-byte) value. The data frame fed into the iterative noise elimination process is designed to be 2048 bytes. Therefore, a register with a 1792-byte storage size is assigned to keep the historical data. To ensure the reliability of the estimation, this algorithm combines the current data with historical data. Once a new 256-byte data stream comes in, this new data stream is aligned at the back of the data stored in the register, then multiplied by a 2048-point Hanning window coefficient sequence. That is what must be done in the first data alignment process. As one cycle of noise cancellation requires only 256 newly recorded samples, real-time processing can be achieved with an inappreciable delay.
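The data alignment step above (256 new samples appended behind 1792 historical samples, then windowed with a 2048-point Hanning window) can be sketched as:

```python
import math

FRAME = 2048         # analysis frame length, as described above
NEW = 256            # newly recorded samples per iteration
HIST = FRAME - NEW   # 1792 historical samples kept in the register

# 2048-point Hanning window coefficient sequence.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / FRAME) for n in range(FRAME)]

def assemble_frame(register, new_samples):
    """Align the new samples behind the history, then apply the window."""
    assert len(register) == HIST and len(new_samples) == NEW
    frame = register + new_samples
    windowed = [x * w for x, w in zip(frame, hann)]
    next_register = frame[NEW:]   # keep the newest 1792 samples for next time
    return windowed, next_register

# One iteration starting from a silent history:
register = [0.0] * HIST
windowed, register = assemble_frame(register, [1.0] * NEW)
```

Each call consumes only 256 fresh samples yet presents the estimator with a full 2048-point windowed frame, which is how the inappreciable per-iteration delay is achieved.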
[0021] Before the core of this algorithm runs, the iterative parameters are all set to zero for convenience, except that the a priori speech presence probability is set to 0.5, which means the algorithm assumes the mean of the a posteriori probability of speech presence is also 50%. The data stream is transformed to the frequency domain by the Fourier transform, and only the half of the frequency coefficients on the positive axis is retained as the signal power spectrum to decrease the computational complexity, since those coefficients are symmetric. The a posteriori SNR estimation is based on the new signal power spectrum and the noise power spectrum computed in the last iteration. Then, the a posteriori speech presence probability (SPP) is presented by
P(H1|Y(k,l)) = ( 1 + (1 + ξ_H1) · exp( −(|Y(k,l)|² / σ_N²(k,l−1)) · ξ_H1 / (1 + ξ_H1) ) )⁻¹,
where ξ_H1 is the a priori SNR for speech presence in our model, and |Y(k,l)|² / σ_N²(k,l−1) is the a posteriori SNR.
[0022] The present disclosure employs the spectral noise power estimation of the previous frame, σ_N²(k,l−1). The noise power estimation and the estimation of the a posteriori SPP are to some extent in inverse proportion. Therefore, underestimation of σ_N² will slow down the convergence to its true value. To avoid stagnation, the technical solution of the present disclosure recursively smooths the SPP over time, as
P̄(l) = 0.9 · P̄(l−1) + 0.1 · P(l),
and enforces the current a posteriori SPP to be lower than 0.99 if P̄(l) is larger than 0.99.
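The stagnation guard above can be sketched as a small update function (the 0.9/0.1 constants follow the recursion above):

```python
def update_spp(p, p_smoothed_prev):
    """Recursively smooth the SPP over time and clamp it to avoid stagnation.

    If the smoothed SPP has been stuck near one, the current SPP is
    forced below 0.99 so the noise estimate can keep being updated.
    """
    p_smoothed = 0.9 * p_smoothed_prev + 0.1 * p
    if p_smoothed > 0.99:
        p = min(p, 0.99)   # break persistent over-confidence in speech presence
    return p, p_smoothed

p1, ps1 = update_spp(1.0, 1.0)   # SPP saturated: gets clamped
p2, ps2 = update_spp(0.5, 0.0)   # normal operation: passes through
```

Without the clamp, an SPP stuck at one would freeze the noise power at its old value indefinitely, which is the stagnation the paragraph describes.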
[0023] The MMSE estimate of the noise periodogram under speech presence uncertainty can be computed as follows:
E[|N(k,l)|² | Y(k,l)] = (1 − P(H1|Y(k,l))) · |Y(k,l)|² + P(H1|Y(k,l)) · σ_N²(k,l−1).
Then, the estimated spectral noise power is obtained by the recursive smoothing
σ_N²(k,l) = apow · σ_N²(k,l−1) + (1 − apow) · E[|N(k,l)|² | Y(k,l)],
where apow = 0.8 is the smoothing factor. According to this estimated spectral noise power density, the estimated a priori SNR can be presented as
ξ̂(k,l) = max( |Y(k,l)|² / σ_N²(k,l) − 1, ξ_min ),
where ξ_min is a small lower-bound value.
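The two updates above can be sketched for a single frequency bin. The MMSE/smoothing steps follow the formulas given; the floored a priori SNR form and the value of ξ_min are assumptions for illustration:

```python
APOW = 0.8   # smoothing factor, as described above

def update_noise_power(y_power, p_spp, noise_prev):
    """MMSE noise periodogram estimate, then recursive smoothing."""
    mmse = (1 - p_spp) * y_power + p_spp * noise_prev
    return APOW * noise_prev + (1 - APOW) * mmse

def a_priori_snr(y_power, noise_power, xi_min=1e-3):
    """Estimated a priori SNR, floored at a small value (assumed form)."""
    return max(y_power / noise_power - 1.0, xi_min)

# With SPP = 0 (noise only), the estimate moves 20% of the way
# toward the observed power; with SPP = 1, it stays unchanged.
n_absent = update_noise_power(4.0, 0.0, 1.0)
n_present = update_noise_power(4.0, 1.0, 1.0)
```

The SPP thus acts as a soft gate: bins judged to contain speech leave the noise estimate almost untouched, while noise-only bins pull it toward the current observation.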
[0024] Therefore, the estimated spectral coefficients of the clean audio signal are given by the Wiener-type gain
Ŝ(k,l) = ( ξ̂(k,l) / (1 + ξ̂(k,l)) ) · Y(k,l).
[0025] The clean signal in real time can easily be recovered after applying the inverse Fourier transform to the complete spectral-domain coefficients of the data stream. (The negative half of the domain was excluded after the data was transformed to the frequency domain, and is restored by conjugate symmetry before the inverse transform.)
[0026] In the finalization of the clean signal, the data must be multiplied by a synthesis window to avoid distortion due to the data segmentation. The new clean data is then appended to the earlier-attained clean data, which means this new 2048-point signal is added to the earlier reserved clean signals. The first 256 samples are converted to an audio signal and removed from the signal stream. The remaining 1792 bytes of data are saved in a certain register awaiting the next iteration.
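The gain application and the overlap-add finalization above can be sketched together. Toy frame sizes are used, and the synthesis-window multiplication is omitted for brevity, so this is a simplified illustration of the mechanics rather than the full procedure:

```python
def wiener_gain(xi):
    """Spectral gain applied to each noisy coefficient, xi / (1 + xi)."""
    return xi / (1.0 + xi)

def finalize(reserved_clean, new_frame, step):
    """Overlap-add: add the new frame onto the reserved clean signal,
    emit the first `step` samples, keep the remainder (zero-padded
    back to full length) for the next iteration."""
    overlapped = [a + b for a, b in zip(reserved_clean, new_frame)]
    out = overlapped[:step]
    remainder = overlapped[step:] + [0.0] * step
    return out, remainder

# Toy iteration: frame length 8, emit 4 samples per cycle
# (the document's actual figures are 2048 and 256).
out, reserved = finalize([1.0] * 8, [1.0] * 8, step=4)
```

The emitted prefix is the only portion whose overlapping contributions are complete, which is why exactly the first block is released and the rest is carried over.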
[0027] Through all the computation described above, a relatively noise-free audio signal is generated and heard. There is evidence that this algorithm can approach the optimal solution at a fast speed.
[0028] Features set out in the claims hereto (jointly and severally where appropriate) are to form part of this disclosure and are incorporated herein by reference.
[0029] While various examples or embodiments have been described herein, it should be appreciated that they are for illustration and not for restriction of scope. It should be appreciated that portions or parts of the various example embodiments can be excerpted for combination and/or mix-and-match where appropriate to form other variants without loss of generality.