WO2020177371A1 - Environment adaptive neural network noise reduction method and system for digital hearing aids, and storage medium - Google Patents

Environment adaptive neural network noise reduction method and system for digital hearing aids, and storage medium Download PDF

Info

Publication number
WO2020177371A1
WO2020177371A1 (PCT/CN2019/117075)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
noise reduction
noise
frame
scene recognition
Prior art date
Application number
PCT/CN2019/117075
Other languages
French (fr)
Chinese (zh)
Inventor
张禄
王明江
张啟权
轩晓光
张馨
孙凤娇
Original Assignee
哈尔滨工业大学(深圳)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 哈尔滨工业大学(深圳)
Publication of WO2020177371A1 publication Critical patent/WO2020177371A1/en

Links

Images

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - characterised by the type of extracted parameters
    • G10L25/24 - the extracted parameters being the cepstrum
    • G10L25/27 - characterised by the analysis technique
    • G10L25/30 - using neural networks
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00 - Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception

Definitions

  • the present invention relates to the field of software technology, in particular to an environment adaptive neural network noise reduction method, system and storage medium for digital hearing aids.
  • at present, high-performance digital hearing aids on the market have built-in noise reduction algorithms to eliminate background-noise interference in the environment and meet the requirements of listening comfort. Because digital hearing aids must process speech in real time, the built-in noise reduction algorithms mostly use low-complexity methods such as spectral subtraction and Wiener filtering. These algorithms can only cope with simple, stationary noise interference; in complex noise environments such as low signal-to-noise ratio and transient noise their performance is poor, and the wearing experience for hearing-loss patients is unsatisfactory.
  • the invention discloses an environment-adaptive neural network noise reduction method for digital hearing aids, which uses the powerful mapping ability of deep neural networks, combined with an environment-adaptive strategy, to realize a high-performance noise reduction algorithm for complex noise environments.
  • the present invention provides an environment-adaptive neural network noise reduction method for digital hearing aids, comprising executing the following steps in sequence:
  • Preprocessing step: a noisy speech signal is received, sampled, divided into frames, and transmitted to the acoustic scene recognition module;
  • Scene recognition step: the acoustic scene recognition module identifies the current acoustic scene, then autonomously selects the corresponding neural network model in the neural network noise reduction module and forwards the signal to it;
  • Neural network noise reduction step: the neural network noise reduction model receives the classification result sent by the acoustic scene recognition module and performs targeted noise reduction on the noise of the corresponding scene.
  • the acoustic scene recognition module adopts an LSTM neural network structure with a memory function for time series.
  • the specific steps are as follows:
  • S1: Mel cepstrum coefficient features of a set dimension are extracted for each frame; S2: the LSTM neural network reads in one frame of Mel cepstrum coefficient features at a time for processing, and outputs the classification result once a set number of frames has been read.
  • the LSTM neural network structure includes an input layer, a hidden layer, and an output layer.
  • the neural units of the output layer correspond to different scene categories.
  • the LSTM neural network not only processes the current input but also combines it with the previously retained output, realizing a memory function; once the set number of frames has been accumulated, the classification result is output.
  • the LSTM neural network structure memory update principle is as follows:
  • the LSTM neural network structure combines the current frame's input feature t_n with the previously retained output h_{n-1}, and also feeds in the previous frame's state C_{n-1} for judgment, producing the current frame's output h_n and output state C_n; this iterates until the memory condition of the required number of frames is satisfied, after which a softmax transform of the final output h yields the predicted probabilities of the output layer.
  • the scene recognition step also includes calculating the loss function during training of the LSTM neural network, computed as the cross-entropy $\mathrm{Loss} = -\sum_i y_i \log \hat{y}_i$ between the correct label $y_i$ and the predicted result $\hat{y}_i$.
  • the noise reduction models in different scenarios all adopt a fully connected neural network structure, but the number of layers of the fully connected neural network structure and the number of neurons in each layer are different;
  • the noise reduction model of the fully connected neural network structure includes the following steps:
  • Training data set step: select clean speech data as the training set, then randomly mix noise data with the clean speech to obtain the required noisy training data;
  • Model parameter tuning step: use the minimum mean squared error as the cost function, then tune the model parameters according to the training-set loss and validation-set loss to obtain the required neural network structure;
  • the validation set is built by selecting clean speech data for validation and mixing it with noise data to obtain noisy validation speech;
  • the minimum mean squared error is calculated as $\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}(\hat{x}_n - x_n)^2$, where MSE denotes the mean squared error;
  • each hidden layer adopts a regularization method with a dropout rate of 0.8, and the coefficient of the L2 regularization term is set to 0.00001; during training, the Adam optimization algorithm is used for back-propagation, iterating 200 times at a learning rate of 0.0001 to achieve a good noise suppression effect.
  • the speech signal received by the microphone is sampled and divided into time-domain frames of 256 points; the sampling rate is 16000 Hz and each frame is 16 ms;
  • in step S1, a 39-dimensional Mel cepstrum coefficient feature is extracted for each frame;
  • in step S2, the LSTM neural network reads in one frame of Mel cepstrum coefficient features at a time and outputs the classification result once 100 frames have been read.
  • the present invention also discloses an environment-adaptive neural network noise reduction system for digital hearing aids, comprising a memory, a processor, and a computer program stored on the memory, the computer program being configured to implement the steps of the claimed method when invoked by the processor.
  • the present invention also discloses a computer-readable storage medium storing a computer program configured to implement the steps of the claimed method when invoked by a processor.
  • the beneficial effects of the present invention are: 1. real-time speech processing is guaranteed, since only the forward propagation of the neural network is performed and the computational load is modest; 2. the current acoustic scene is recognized and a matching neural network model is autonomously selected, so noise in different scenes receives targeted noise reduction, ensuring better speech quality and intelligibility; 3. transient noise is effectively suppressed; 4. a better noise reduction effect is achieved in low signal-to-noise-ratio environments.
  • Figure 1 is a block diagram of the environmental adaptive noise reduction algorithm of the present invention
  • Figure 2 is a diagram of the LSTM network structure of the present invention.
  • Figure 3 is a diagram of the operation mechanism of the LSTM unit of the present invention.
  • Figure 4 is a block diagram of the noise reduction model of the fully connected neural network of the present invention.
  • Fig. 5 is a graph of evaluation results of PESQ indicators of the present invention.
  • Fig. 6 is a graph of evaluation results of STOI indicators of the present invention.
  • the invention discloses an environment-adaptive neural network noise reduction method for digital hearing aids.
  • the method uses a scene recognition module as a decision-driving module and selects the corresponding neural network noise reduction model according to the acoustic scene, realizing suppression of different noise types.
  • the entire algorithm system of the present invention includes two parts, one is a scene recognition module, and the other is a neural network noise reduction module, as shown in FIG. 1.
  • Fig. 1 is an algorithm block diagram of the entire neural network noise reduction system of the present invention, which is composed of an acoustic scene recognition module and multiple noise reduction models in different scenes. After the noisy speech signal is sampled and divided into frames, it is first sent to the scene recognition module to determine the current scene type, and then sent to the corresponding neural network noise reduction model to realize the noise reduction process.
  • the core part of the whole algorithm system is the recognition module and the noise reduction module, which will be introduced in detail below:
  • the acoustic scene recognition module is designed around an LSTM (Long Short-Term Memory) neural network, which has a memory effect on time series; first, the speech signal received by the microphone is sampled and divided into time-domain frames of 256 points, with a sampling rate of 16000 Hz and 16 ms per frame; next, 39-dimensional Mel Frequency Cepstrum Coefficient (MFCC) features are extracted for each frame; the LSTM network reads in one frame of MFCC features at a time but outputs a classification result only once 100 frames have accumulated, i.e. the current environment classification is updated every 1.6 s.
  • the structure of the LSTM neural network is shown in Figure 2.
  • the number of neural units in the input layer is 39, in the recursive hidden layer 512, and in the output layer 9, corresponding to 9 scene categories: factory, street, subway station, railway station, restaurant, sports field, airplane cabin, car interior, and indoor scenes
  • the corresponding training data was downloaded from the freesound website [1], about 2 hours of audio per scene
  • the LSTM network not only processes the current input but also combines it with the previously retained output to realize memory; once 100 frames of memory have accumulated, the classification result is output.
  • the memory update mechanism of the LSTM unit is shown in Figure 3, where C_{n-1} denotes the state retained from the previous frame, f_n the output of the current frame's forget gate, u_n the output of the current frame's update gate, o_n the output of the current frame's output gate, C_n the retained state of the current frame, and h_n the output of the current frame.
  • the LSTM unit combines the current frame's input feature t_n with the previously retained output h_{n-1}, and also feeds in the previous frame's state C_{n-1} for judgment, producing the current frame's output h_n and output state C_n; this iterates until the 100-frame memory condition is satisfied, after which the final output h undergoes a Softmax (normalized exponential function) transform to obtain the predicted probabilities of the output layer.
  • the loss function for training the LSTM network is the cross-entropy of formula (11), where y_i and ŷ_i are the correct classification label and the classification result predicted by the output layer of the LSTM network, respectively.
  • the input signal with noise will be sent to different noise reduction models for frame-by-frame processing.
  • the noise reduction models in different scenarios all use a fully connected neural network structure, as shown in Figure 4.
  • the number of layers and the number of neurons per layer differ between models, depending on the noise characteristics of each scene; for example, factory noise requires 3 hidden layers for good noise reduction performance, while car-interior noise needs only 2 layers to achieve the same effect.
  • the following will take the network structure in the factory scenario as an example for detailed introduction.
  • the model is tuned according to the training-set loss and validation-set loss; for the factory noise scene, a 129-1024-1024-1024-129 network structure was finally selected, in which all hidden units use the ReLU activation function and only the output layer is linear; in addition, to improve the generalization ability of the network, each hidden layer uses a regularization method with a dropout rate of 0.8, and the coefficient of the L2 regularization term is set to 0.00001.
  • the noise reduction effect and metrics were all measured on the test set: another 400 sentences from the Aishell data set not duplicated in the training set (2 males and 2 females, 100 sentences each), mixed with the last 20% of the factory noise in NOISEX-92 at five noise pollution levels: -5 dB, 0 dB, 5 dB, 10 dB, and 15 dB.
  • transient noise such as machine knocking in the factory was well suppressed, with almost no audible residual noise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Otolaryngology (AREA)
  • Neurosurgery (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

Disclosed is an environment adaptive neural network noise reduction method for digital hearing aids. The method comprises executing the following steps in sequence: a preprocessing step, in which a noisy speech signal is received, sampled, divided into frames, and transmitted to an acoustic scene recognition module; a scene recognition step, in which the acoustic scene recognition module recognizes the current acoustic scene and autonomously selects the corresponding neural network model in a neural network noise reduction module; and a neural network noise reduction step. The beneficial effects of the method are: 1. real-time speech processing is guaranteed, with a modest computational load since only the forward propagation of the neural network is performed; 2. the current acoustic scene is recognized and different neural network models are autonomously selected to perform targeted noise reduction for different scenes, guaranteeing better speech quality and intelligibility; and 3. transient noise is effectively suppressed.

Description

Environment adaptive neural network noise reduction method, system and storage medium for digital hearing aids

Technical Field

The present invention relates to the field of software technology, and in particular to an environment adaptive neural network noise reduction method, system and storage medium for digital hearing aids.

Background Art

At present, high-performance digital hearing aids on the market have built-in noise reduction algorithms to eliminate background-noise interference in the environment and meet the requirements of listening comfort. Because digital hearing aids must process speech in real time, the built-in noise reduction algorithms mostly use low-complexity methods such as spectral subtraction and Wiener filtering. These algorithms can only cope with simple, stationary noise interference; in complex noise environments such as low signal-to-noise ratio and transient noise their performance is poor, and the wearing experience for hearing-loss patients is unsatisfactory.
Summary of the Invention

The invention discloses an environment-adaptive neural network noise reduction method for digital hearing aids, which uses the powerful mapping ability of deep neural networks, combined with an environment-adaptive strategy, to realize a high-performance noise reduction algorithm for complex noise environments.

The present invention provides an environment-adaptive neural network noise reduction method for digital hearing aids, comprising executing the following steps in sequence:

Preprocessing step: a noisy speech signal is received, sampled, divided into frames, and transmitted to the acoustic scene recognition module;

Scene recognition step: the acoustic scene recognition module identifies the current acoustic scene, then autonomously selects the corresponding neural network model in the neural network noise reduction module and forwards the signal to it;

Neural network noise reduction step: the neural network noise reduction model receives the classification result sent by the acoustic scene recognition module and performs targeted noise reduction on the noise of the corresponding scene.
As a further improvement of the present invention, in the scene recognition step the acoustic scene recognition module adopts an LSTM neural network structure with a memory effect on time series. The specific steps are as follows:

S1: Mel cepstrum coefficient features of a set dimension are extracted for each frame;

S2: the LSTM neural network reads in one frame of Mel cepstrum coefficient features at a time for processing, and outputs the classification result once a set number of frames has been read.

As a further improvement of the present invention, the LSTM neural network structure includes an input layer, a hidden layer, and an output layer; the neural units of the output layer correspond to different scene categories. The LSTM neural network not only processes the current input but also combines it with the previously retained output, realizing memory; once the set number of frames has been accumulated, the classification result is output.

As a further improvement of the present invention, the memory update principle of the LSTM neural network structure is as follows:

The LSTM neural network combines the current frame's input feature t_n with the previously retained output h_{n-1}, and also feeds in the previous frame's state C_{n-1} for judgment, producing the current frame's output h_n and output state C_n; this iterates until the memory condition of the required number of frames is satisfied, after which a softmax transform of the final output h yields the predicted probabilities of the output layer.

As a further improvement of the present invention, the scene recognition step also includes calculating the loss function during training of the LSTM neural network, according to the following formula:
$$\mathrm{Loss} = -\sum_i y_i \log \hat{y}_i$$

where $y_i$ and $\hat{y}_i$ are the correct classification label and the classification result predicted by the output layer of the LSTM network, respectively.
As a further improvement of the present invention, the noise reduction models for the different scenes all adopt a fully connected neural network structure, but differ in the number of layers and the number of neurons per layer;

The noise reduction model of the fully connected neural network structure is built by executing the following steps:

Training data set step: select clean speech data as the training set, then randomly mix noise data with the clean speech to obtain the required noisy training data;

Model parameter tuning step: use the minimum mean squared error as the cost function, then tune the model parameters according to the training-set loss and validation-set loss to obtain the required neural network structure;

During training, repeated iterations of the back-propagation algorithm achieve a good noise suppression effect;

The validation set is built by selecting clean speech data for validation and mixing it with noise data to obtain noisy validation speech;

The minimum mean squared error is calculated as follows:
$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{x}_n - x_n\right)^2$$

where MSE is the mean squared error between the network output $\hat{x}_n$ and the clean target $x_n$.
As a further improvement of the present invention, all hidden layer units use the ReLU activation function and only the output layer is linear; in addition, to improve the generalization ability of the network, each hidden layer uses a regularization method with a dropout rate of 0.8, and the coefficient of the L2 regularization term is set to 0.00001. During training, the Adam optimization algorithm is used for back-propagation, iterating 200 times at a learning rate of 0.0001 to achieve a good noise suppression effect.

As a further improvement of the present invention, in the preprocessing step the speech signal received by the microphone is sampled and divided into time-domain frames of 256 points; the sampling rate is 16000 Hz and each frame is 16 ms;

In step S1, a 39-dimensional Mel cepstrum coefficient feature is extracted for each frame;

In step S2, the LSTM neural network reads in one frame of Mel cepstrum coefficient features at a time for processing, and outputs the classification result once 100 frames have been read.

The present invention also discloses an environment-adaptive neural network noise reduction system for digital hearing aids, comprising: a memory, a processor, and a computer program stored on the memory, the computer program being configured to implement the steps of the claimed method when invoked by the processor.

The present invention also discloses a computer-readable storage medium storing a computer program configured to implement the steps of the claimed method when invoked by a processor.

The beneficial effects of the present invention are: 1. real-time speech processing is guaranteed, since only the forward propagation of the neural network is performed and the computational load is modest; 2. the current acoustic scene is recognized and a matching neural network model is autonomously selected, so noise in different scenes receives targeted noise reduction, ensuring better speech quality and intelligibility; 3. transient noise is effectively suppressed; 4. a better noise reduction effect is achieved in low signal-to-noise-ratio environments.
Description of the Drawings

Figure 1 is a block diagram of the environment-adaptive noise reduction algorithm of the present invention;

Figure 2 is a diagram of the LSTM network structure of the present invention;

Figure 3 is a diagram of the operating mechanism of the LSTM unit of the present invention;

Figure 4 is a block diagram of the fully connected neural network noise reduction model of the present invention;

Figure 5 is a graph of the PESQ evaluation results of the present invention;

Figure 6 is a graph of the STOI evaluation results of the present invention.

Detailed Description
The invention discloses an environment-adaptive neural network noise reduction method for digital hearing aids. The method uses a scene recognition module as a decision-driving module and selects the corresponding neural network noise reduction model according to the acoustic scene, realizing suppression of different noise types. The entire algorithm system of the present invention consists of two parts: a scene recognition module and a neural network noise reduction module, as shown in Figure 1.
Figure 1 is the algorithm block diagram of the entire neural network noise reduction system of the present invention, composed of an acoustic scene recognition module and multiple noise reduction models for different scenes. After the noisy speech signal is sampled and divided into frames, it is first sent to the scene recognition module to determine the current scene type and then forwarded to the corresponding neural network noise reduction model to carry out the noise reduction process. The core of the system lies in the recognition module and the noise reduction module, which are introduced in detail below.
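To make the decision-driven flow of Figure 1 concrete, here is a minimal dispatch-loop sketch in Python. It is illustrative only: the names (denoise_stream, classify, denoisers) and the buffering logic are assumptions of this sketch, not taken from the patent; only the 100-frame (1.6 s) decision interval and the frame-by-frame denoising come from the text.

```python
SCENES = ["factory", "street", "subway_station", "railway_station", "restaurant",
          "sports_field", "airplane_cabin", "car_interior", "indoor"]

def denoise_stream(frames, classify, denoisers, window=100):
    """frames: iterable of per-frame features; classify: maps `window` buffered
    frames to a scene label; denoisers: dict mapping scene label -> model."""
    buffer, scene, output = [], "indoor", []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == window:      # new scene decision every 100 frames (1.6 s)
            scene = classify(buffer)   # returns one of SCENES
            buffer = []
        output.append(denoisers[scene](frame))  # frame-by-frame noise reduction
    return output
```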
The acoustic scene recognition module is designed around an LSTM (Long Short-Term Memory) neural network, which has a memory effect on time series. First, the speech signal received by the microphone is sampled and divided into time-domain frames of 256 points, with a sampling rate of 16000 Hz and 16 ms per frame. Next, 39-dimensional Mel Frequency Cepstrum Coefficient (MFCC) features are extracted for each frame. The LSTM network reads in one frame of MFCC features at a time, but outputs a classification result only once 100 frames have accumulated; that is, the current environment classification is updated every 1.6 s.
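As a rough illustration of this front end, the sketch below assumes the librosa library; the split of the 39 dimensions into 13 static MFCCs plus delta and delta-delta coefficients, and the non-overlapping hop (hop length equal to the 256-point frame length), are common conventions assumed here rather than details given in the patent.

```python
import numpy as np
import librosa

def extract_features(wav_path):
    # 16 kHz sampling; 256-point frames, i.e. 16 ms per frame, as described above.
    y, sr = librosa.load(wav_path, sr=16000)
    # 13 static MFCCs per frame (assumed split of the 39 dimensions).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=256, hop_length=256)
    # Delta and delta-delta coefficients bring the feature to 39 dimensions.
    d1 = librosa.feature.delta(mfcc)
    d2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, d1, d2]).T   # shape: (num_frames, 39)
```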
The structure of the LSTM neural network is shown in Figure 2. The input layer has 39 neural units, the recursive hidden layer has 512, and the output layer has 9, corresponding to 9 scene categories: factory, street, subway station, railway station, restaurant, sports field, airplane cabin, car interior, and indoor scenes. The corresponding training data was downloaded from the freesound website [1], about 2 hours of audio per scene. The LSTM network not only processes the current input but also combines it with the previously retained output to realize memory; once 100 frames of memory have accumulated, the classification result is output.
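A sketch of this 39-512-9 classifier in PyTorch; the layer sizes follow the text and Figure 2, while the batching details and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SceneClassifier(nn.Module):
    """Input layer of 39 units, recurrent hidden layer of 512, 9-way output."""
    def __init__(self, n_feats=39, hidden=512, n_scenes=9):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_feats, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_scenes)

    def forward(self, x):
        # x: (batch, 100, 39) -- one decision per 100 accumulated frames (1.6 s).
        _, (h_n, _) = self.lstm(x)   # final hidden state after the 100th frame
        return self.out(h_n[-1])     # logits; softmax is applied in the loss
```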
The memory update mechanism of the LSTM unit is shown in Figure 3, where C_{n-1} denotes the state retained from the previous frame, f_n the output of the current frame's forget gate, u_n the output of the current frame's update gate, o_n the output of the current frame's output gate, C_n the retained state of the current frame, and h_n the output of the current frame. The LSTM unit combines the current frame's input feature t_n with the previously retained output h_{n-1}, and also feeds in the previous frame's state C_{n-1} for judgment, producing the current frame's output h_n and output state C_n; this iterates until the 100-frame memory condition is satisfied, after which the final output h undergoes a Softmax (normalized exponential function) transform to obtain the predicted probabilities of the output layer.
The gates and outputs are computed as follows, where $\sigma(\cdot)$ and $\tanh(\cdot)$ denote the sigmoid and hyperbolic tangent activation functions, respectively:

$$\tilde{C}_n = \tanh(W_c[h_{n-1}, x_n] + b_c) \tag{5}$$

$$f_n = \sigma(W_f[h_{n-1}, x_n] + b_f) \tag{6}$$

$$u_n = \sigma(W_u[h_{n-1}, x_n] + b_u) \tag{7}$$

$$o_n = \sigma(W_o[h_{n-1}, x_n] + b_o) \tag{8}$$

$$C_n = u_n \ast \tilde{C}_n + f_n \ast C_{n-1} \tag{9}$$

$$h_n = o_n \ast \tanh(C_n) \tag{10}$$
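Read as code, equations (5)-(10) form a single update step. The numpy sketch below mirrors them directly; the weight matrices W and biases b (keyed 'c', 'f', 'u', 'o') are placeholders standing in for learned parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_n, h_prev, C_prev, W, b):
    """One LSTM memory update following equations (5)-(10)."""
    z = np.concatenate([h_prev, x_n])        # [h_{n-1}, x_n]
    C_tilde = np.tanh(W['c'] @ z + b['c'])   # (5) candidate state
    f_n = sigmoid(W['f'] @ z + b['f'])       # (6) forget gate
    u_n = sigmoid(W['u'] @ z + b['u'])       # (7) update gate
    o_n = sigmoid(W['o'] @ z + b['o'])       # (8) output gate
    C_n = u_n * C_tilde + f_n * C_prev       # (9) retained state of the current frame
    h_n = o_n * np.tanh(C_n)                 # (10) output of the current frame
    return h_n, C_n
```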
The loss function for training the LSTM network is the cross-entropy, computed as in formula (11), where $y_i$ and $\hat{y}_i$ are the correct classification label and the classification result predicted by the output layer of the LSTM network, respectively:

$$\mathrm{Loss} = -\sum_i y_i \log \hat{y}_i \tag{11}$$
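The softmax transform of the final output and the cross-entropy of formula (11), as a small numpy sketch; the epsilon term is a numerical-stability addition of this sketch, not part of the formula.

```python
import numpy as np

def softmax_cross_entropy(h, y):
    """h: final LSTM output (length 9); y: one-hot scene label (length 9)."""
    p = np.exp(h - h.max())
    p /= p.sum()                             # softmax -> predicted probabilities
    return -np.sum(y * np.log(p + 1e-12))    # cross-entropy, formula (11)
```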
According to the classification result of the acoustic scene classification module, the input noisy audio signal is sent to the corresponding noise reduction model for frame-by-frame processing. The noise reduction models for the different scenes all use a fully connected neural network structure, as shown in Figure 4, but the number of layers and the number of neurons per layer differ, depending on the noise characteristics of each scene; for example, factory noise requires 3 hidden layers for good noise reduction performance, while car-interior noise needs only 2 layers to achieve the same effect. The network structure for the factory scene is introduced in detail below as an example.
To train the noise reduction model of the fully connected neural network (Figure 4), a sufficiently large training data set must first be prepared, which is also an important aspect of improving the network's generalization ability. We therefore selected 1200 sentences from the Aishell Chinese data set [2] (6 males and 6 females, 100 sentences each) as the clean speech data of the training set, then used the factory noise from the NOISEX-92 [3] noise database (the first 60%) as noise data and mixed it randomly with the clean speech, with the mixing signal-to-noise ratio uniformly distributed over the interval [-5, 20] dB, yielding about 25 hours of noisy training data in total. To tune the model parameters, a validation set is needed: another 400 sentences from the Aishell data set (2 males and 2 females, 100 sentences each) were selected as clean validation speech and uniformly mixed with the middle 20% of the NOISEX-92 factory noise, giving about 8 hours of noisy validation data.
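The random mixing step can be sketched as follows, assuming numpy; scaling the noise so the speech-to-noise power ratio hits a target drawn uniformly from [-5, 20] dB is standard practice, not a procedure spelled out in the text.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix noise into clean speech at the given SNR in dB."""
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]       # match lengths
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so 10*log10(p_speech / p_noise_scaled) == snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

# Training-set mixing as described: SNR drawn uniformly from [-5, 20] dB.
# noisy = mix_at_snr(clean, factory_noise, np.random.uniform(-5.0, 20.0))
```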
$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{x}_n - x_n\right)^2 \tag{12}$$
Using the minimum mean squared error (MMSE) of equation (12) as the cost function, the model parameters were tuned according to the training-set loss and validation-set loss, finally settling on the following for the factory noise scene: a 129-1024-1024-1024-129 network structure, in which all hidden units use the ReLU activation function and only the output layer is linear; in addition, to improve the generalization ability of the network, each hidden layer uses a regularization method with a dropout rate of 0.8, and the coefficient of the L2 regularization term is set to 0.00001. During training, the Adam optimization algorithm (adaptive moment estimation, an efficient gradient-based optimizer) is used for back-propagation, iterating 200 times at a learning rate of 0.0001 to achieve a good noise suppression effect. Once the model is trained, only forward propagation is needed in the hearing aid; the computational load is modest and meets real-time processing requirements. The PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility) results after noise reduction are shown in Figures 5 and 6. The noise reduction effect and metrics were all measured on the test set: another 400 sentences from the Aishell data set not duplicated in the training set (2 males and 2 females, 100 sentences each), mixed with the last 20% of the NOISEX-92 factory noise at five noise pollution levels: -5 dB, 0 dB, 5 dB, 10 dB, and 15 dB. In addition, subjective listening found that transient noise such as machine knocking in the factory was well suppressed, with almost no audible residual noise.
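The factory-scene model and training recipe as a hedged PyTorch sketch: the 129-1024-1024-1024-129 topology, ReLU hidden units, linear output, 0.8 discard rate, L2 coefficient 0.00001, and Adam at learning rate 0.0001 for 200 iterations all come from the text; interpreting the discard rate as nn.Dropout(p=0.8) and mapping the L2 term onto Adam's weight_decay are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class FactoryDenoiser(nn.Module):
    """129-1024-1024-1024-129 fully connected model for the factory scene."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(129, 1024), nn.ReLU(), nn.Dropout(p=0.8),
            nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.8),
            nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.8),
            nn.Linear(1024, 129),   # linear output layer
        )

    def forward(self, x):
        return self.net(x)

model = FactoryDenoiser()
criterion = nn.MSELoss()   # minimum mean squared error cost, equation (12)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
# for epoch in range(200): ...   # 200 iterations at learning rate 0.0001
```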
The beneficial effects of the present invention are: 1. real-time speech processing is guaranteed, since only the forward propagation of the neural network is performed and the computational load is modest; 2. the current acoustic scene is recognized and a matching neural network model is autonomously selected, so noise in different scenes receives targeted noise reduction, ensuring better speech quality and intelligibility; 3. transient noise is effectively suppressed; 4. a better noise reduction effect is achieved in low signal-to-noise-ratio environments.
The above further describes the present invention in detail in conjunction with specific preferred embodiments, but the specific implementation of the present invention should not be considered limited to these descriptions. For those of ordinary skill in the technical field to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, and these should all be regarded as falling within the protection scope of the present invention.

Claims (10)

  1. An environment-adaptive neural network noise reduction method for digital hearing aids, characterized in that it comprises executing the following steps in sequence:
    Preprocessing step: a noisy speech signal is received, sampled, divided into frames, and transmitted to the acoustic scene recognition module;
    Scene recognition step: the acoustic scene recognition module identifies the current acoustic scene, then autonomously selects the corresponding neural network model in the neural network noise reduction module and forwards the signal to it;
    Neural network noise reduction step: the neural network noise reduction model receives the classification result sent by the acoustic scene recognition module and performs targeted noise reduction on the noise of the corresponding scene.
  2. The environment-adaptive neural network noise reduction method according to claim 1, characterized in that, in the scene recognition step, the acoustic scene recognition module adopts an LSTM neural network structure with a memory effect on time series, with the following specific steps:
    S1: Mel cepstrum coefficient features of a set dimension are extracted for each frame;
    S2: the LSTM neural network reads in one frame of Mel cepstrum coefficient features at a time for processing, and outputs the classification result once a set number of frames has been read.
  3. The environment-adaptive neural network noise reduction method according to claim 2, characterized in that the LSTM neural network structure includes an input layer, a hidden layer, and an output layer; the neural units of the output layer correspond to different scene categories; the LSTM neural network not only processes the current input but also combines it with the previously retained output, realizing memory; once the accumulated memory reaches the set number of frames, the classification result is output.
  4. The environment-adaptive neural network noise reduction method according to claim 3, characterized in that the memory update principle of the LSTM neural network structure is as follows:
    The LSTM neural network combines the current frame's input feature t_n with the previously retained output h_{n-1}, and also feeds in the previous frame's state C_{n-1} for judgment, producing the current frame's output h_n and output state C_n; this iterates until the memory condition of the required number of frames is satisfied, after which a softmax transform of the final output h yields the predicted probabilities of the output layer.
  5. The environment-adaptive neural network noise reduction method according to claim 4, characterized in that the scene recognition step further comprises calculating the loss function during training of the LSTM neural network, according to the following formula:

    $$\mathrm{Loss} = -\sum_i y_i \log \hat{y}_i$$

    where $y_i$ and $\hat{y}_i$ are the correct classification label and the classification result predicted by the output layer of the LSTM network, respectively.
  6. The environment-adaptive neural network noise reduction method according to claim 1, characterized in that the noise reduction models for the different scenes all adopt a fully connected neural network structure, but differ in the number of layers and the number of neurons per layer;
    The noise reduction model of the fully connected neural network structure is built by executing the following steps:
    Training data set step: select clean speech data as the training set, then randomly mix noise data with the clean speech to obtain the required noisy training data;
    Model parameter tuning step: use the minimum mean squared error as the cost function, then tune the model parameters according to the training-set loss and validation-set loss to obtain the required neural network structure;
    During training, repeated iterations of the back-propagation algorithm achieve a good noise suppression effect;
    The validation set is built by selecting clean speech data for validation and mixing it with noise data to obtain noisy validation speech;
    The minimum mean squared error is calculated as follows:

    $$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{x}_n - x_n\right)^2$$

    where MSE is the mean squared error.
  7. The environment-adaptive neural network noise reduction method according to claim 6, characterized in that all hidden layer units use the ReLU activation function and only the output layer is linear; in addition, to improve the generalization ability of the network, each hidden layer uses a regularization method with a dropout rate of 0.8, and the coefficient of the L2 regularization term is set to 0.00001; during training, the Adam optimization algorithm is used for back-propagation, iterating 200 times at a learning rate of 0.0001 to achieve a good noise suppression effect.
  8. The environment-adaptive neural network noise reduction method according to claim 2, characterized in that, in the preprocessing step, the speech signal received by the microphone is sampled and divided into time-domain frames of 256 points; the sampling rate is 16000 Hz and each frame is 16 ms;
    In step S1, a 39-dimensional Mel cepstrum coefficient feature is extracted for each frame;
    In step S2, the LSTM neural network reads in one frame of Mel cepstrum coefficient features at a time for processing, and outputs the classification result once 100 frames have been read.
  9. An environment-adaptive neural network noise reduction system for digital hearing aids, characterized in that it comprises: a memory, a processor, and a computer program stored on the memory, the computer program being configured to implement the steps of the method of any one of claims 1-8 when invoked by the processor.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program configured to implement the steps of the method of any one of claims 1-8 when invoked by a processor.
PCT/CN2019/117075 2019-03-06 2019-11-11 Environment adaptive neural network noise reduction method and system for digital hearing aids, and storage medium WO2020177371A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910168122.4 2019-03-06
CN201910168122.4A CN109859767B (en) 2019-03-06 2019-03-06 Environment self-adaptive neural network noise reduction method, system and storage medium for digital hearing aid

Publications (1)

Publication Number Publication Date
WO2020177371A1

Family

ID=66899968

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117075 WO2020177371A1 (en) 2019-03-06 2019-11-11 Environment adaptive neural network noise reduction method and system for digital hearing aids, and storage medium

Country Status (2)

Country Link
CN (1) CN109859767B (en)
WO (1) WO2020177371A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112447183A (en) * 2020-11-16 2021-03-05 北京达佳互联信息技术有限公司 Training method and device for audio processing model, audio denoising method and device, and electronic equipment
CN113314136A (en) * 2021-05-27 2021-08-27 西安电子科技大学 Voice optimization method based on directional noise reduction and dry sound extraction technology
CN113345464A (en) * 2021-05-31 2021-09-03 平安科技(深圳)有限公司 Voice extraction method, system, device and storage medium
CN113707159A (en) * 2021-08-02 2021-11-26 南昌大学 Electric network bird-involved fault bird species identification method based on Mel language graph and deep learning
CN113823322A (en) * 2021-10-26 2021-12-21 武汉芯昌科技有限公司 Simplified and improved Transformer model-based voice recognition method
CN114626412A (en) * 2022-02-28 2022-06-14 长沙融创智胜电子科技有限公司 Multi-class target identification method and system for unattended sensor system
CN114869224A (en) * 2022-03-28 2022-08-09 浙江大学 Lung disease classification detection method based on cooperative deep learning and lung auscultation sound
US20220256294A1 (en) * 2019-05-09 2022-08-11 Sonova Ag Hearing Device System And Method For Processing Audio Signals
CN117290669A (en) * 2023-11-24 2023-12-26 之江实验室 Optical fiber temperature sensing signal noise reduction method, device and medium based on deep learning

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859767B (en) * 2019-03-06 2020-10-13 哈尔滨工业大学(深圳) Environment self-adaptive neural network noise reduction method, system and storage medium for digital hearing aid
CN110379412B (en) 2019-09-05 2022-06-17 腾讯科技(深圳)有限公司 Voice processing method and device, electronic equipment and computer readable storage medium
DE102019213809B3 (en) 2019-09-11 2020-11-26 Sivantos Pte. Ltd. Method for operating a hearing aid and hearing aid
CN110996208B (en) * 2019-12-13 2021-07-30 恒玄科技(上海)股份有限公司 Wireless earphone and noise reduction method thereof
IT201900024454A1 (en) 2019-12-18 2021-06-18 Storti Gianampellio LOW POWER SOUND DEVICE FOR NOISY ENVIRONMENTS
CN113129876B (en) * 2019-12-30 2024-05-14 Oppo广东移动通信有限公司 Network searching method, device, electronic equipment and storage medium
CN111312221B (en) * 2020-01-20 2022-07-22 宁波舜韵电子有限公司 Intelligent range hood based on voice control
CN111491245B (en) * 2020-03-13 2022-03-04 天津大学 Digital hearing aid sound field identification algorithm based on cyclic neural network and implementation method
CN111508509A (en) * 2020-04-02 2020-08-07 广东九联科技股份有限公司 Sound quality processing system and method based on deep learning
CN112565997B (en) * 2020-12-04 2022-03-22 可孚医疗科技股份有限公司 Adaptive noise reduction method and device for hearing aid, hearing aid and storage medium
CN113160789A (en) * 2021-03-05 2021-07-23 南京每深智能科技有限责任公司 Active noise reduction device and method
CN113160844A (en) * 2021-04-27 2021-07-23 山东省计算中心(国家超级计算济南中心) Speech enhancement method and system based on noise background classification
CN113259824B (en) * 2021-05-14 2021-11-30 谷芯(广州)技术有限公司 Real-time multi-channel digital hearing aid noise reduction method and system
CN113266933A (en) * 2021-05-24 2021-08-17 青岛海尔空调器有限总公司 Voice control method of air conditioner and air conditioner
CN113724726A (en) * 2021-08-18 2021-11-30 中国长江电力股份有限公司 Unit operation noise suppression processing method based on full-connection neural network
CN114245280B (en) * 2021-12-20 2023-06-23 清华大学深圳国际研究生院 Scene self-adaptive hearing aid audio enhancement system based on neural network
CN114640937B (en) * 2022-05-18 2022-09-02 深圳市听多多科技有限公司 Hearing aid function implementation method based on wearable device system and wearable device
CN114640938B (en) 2022-05-18 2022-08-23 深圳市听多多科技有限公司 Hearing aid function implementation method based on Bluetooth headset chip and Bluetooth headset
CN116367063B (en) * 2023-04-23 2023-11-14 郑州大学 Bone conduction hearing aid equipment and system based on embedded

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6453284B1 (en) * 1999-07-26 2002-09-17 Texas Tech University Health Sciences Center Multiple voice tracking system and method
CN101529929A (en) * 2006-09-05 2009-09-09 Gn瑞声达A/S A hearing aid with histogram based sound environment classification
CN104952448A (en) * 2015-05-04 2015-09-30 张爱英 Method and system for enhancing features by aid of bidirectional long-term and short-term memory recurrent neural networks
CN105611477A (en) * 2015-12-27 2016-05-25 北京工业大学 Depth and breadth neural network combined speech enhancement algorithm of digital hearing aid
CN108073856A (en) * 2016-11-14 2018-05-25 华为技术有限公司 The recognition methods of noise signal and device
CN108877823A (en) * 2018-07-27 2018-11-23 三星电子(中国)研发中心 Sound enhancement method and device
CN108962278A (en) * 2018-06-26 2018-12-07 常州工学院 A kind of hearing aid sound scene classification method
CN109410976A (en) * 2018-11-01 2019-03-01 北京工业大学 Sound enhancement method based on binaural sound sources positioning and deep learning in binaural hearing aid
CN109859767A (en) * 2019-03-06 2019-06-07 哈尔滨工业大学(深圳) A kind of environment self-adaption neural network noise-reduction method, system and storage medium for digital deaf-aid

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019014890A1 (en) * 2017-07-20 2019-01-24 大象声科(深圳)科技有限公司 Universal single channel real-time noise-reduction method
CN109378010A (en) * 2018-10-29 2019-02-22 珠海格力电器股份有限公司 Training method, the speech de-noising method and device of neural network model

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6453284B1 (en) * 1999-07-26 2002-09-17 Texas Tech University Health Sciences Center Multiple voice tracking system and method
CN101529929A (en) * 2006-09-05 2009-09-09 Gn瑞声达A/S A hearing aid with histogram based sound environment classification
CN104952448A (en) * 2015-05-04 2015-09-30 张爱英 Method and system for enhancing features by aid of bidirectional long-term and short-term memory recurrent neural networks
CN105611477A (en) * 2015-12-27 2016-05-25 北京工业大学 Depth and breadth neural network combined speech enhancement algorithm of digital hearing aid
CN108073856A (en) * 2016-11-14 2018-05-25 华为技术有限公司 The recognition methods of noise signal and device
CN108962278A (en) * 2018-06-26 2018-12-07 常州工学院 A kind of hearing aid sound scene classification method
CN108877823A (en) * 2018-07-27 2018-11-23 三星电子(中国)研发中心 Sound enhancement method and device
CN109410976A (en) * 2018-11-01 2019-03-01 北京工业大学 Sound enhancement method based on binaural sound sources positioning and deep learning in binaural hearing aid
CN109859767A (en) * 2019-03-06 2019-06-07 哈尔滨工业大学(深圳) A kind of environment self-adaption neural network noise-reduction method, system and storage medium for digital deaf-aid

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220256294A1 (en) * 2019-05-09 2022-08-11 Sonova Ag Hearing Device System And Method For Processing Audio Signals
US11832058B2 (en) * 2019-05-09 2023-11-28 Sonova Ag Hearing device system and method for processing audio signals
CN112447183A (en) * 2020-11-16 2021-03-05 北京达佳互联信息技术有限公司 Training method and device for audio processing model, audio denoising method and device, and electronic equipment
CN113314136A (en) * 2021-05-27 2021-08-27 西安电子科技大学 Voice optimization method based on directional noise reduction and dry sound extraction technology
CN113345464A (en) * 2021-05-31 2021-09-03 平安科技(深圳)有限公司 Voice extraction method, system, device and storage medium
CN113707159A (en) * 2021-08-02 2021-11-26 南昌大学 Electric network bird-involved fault bird species identification method based on Mel language graph and deep learning
CN113707159B (en) * 2021-08-02 2024-05-03 南昌大学 Power grid bird-involved fault bird species identification method based on Mel language graph and deep learning
CN113823322A (en) * 2021-10-26 2021-12-21 武汉芯昌科技有限公司 Simplified and improved Transformer model-based voice recognition method
CN114626412B (en) * 2022-02-28 2024-04-02 长沙融创智胜电子科技有限公司 Multi-class target identification method and system for unattended sensor system
CN114626412A (en) * 2022-02-28 2022-06-14 长沙融创智胜电子科技有限公司 Multi-class target identification method and system for unattended sensor system
CN114869224A (en) * 2022-03-28 2022-08-09 浙江大学 Lung disease classification detection method based on cooperative deep learning and lung auscultation sound
CN117290669B (en) * 2023-11-24 2024-02-06 之江实验室 Optical fiber temperature sensing signal noise reduction method, device and medium based on deep learning
CN117290669A (en) * 2023-11-24 2023-12-26 之江实验室 Optical fiber temperature sensing signal noise reduction method, device and medium based on deep learning

Also Published As

Publication number Publication date
CN109859767B (en) 2020-10-13
CN109859767A (en) 2019-06-07

Similar Documents

Publication Publication Date Title
WO2020177371A1 (en) Environment adaptive neural network noise reduction method and system for digital hearing aids, and storage medium
CN109841226B (en) Single-channel real-time noise reduction method based on convolution recurrent neural network
Zhao et al. Perceptually guided speech enhancement using deep neural networks
CN112735456B (en) Speech enhancement method based on DNN-CLSTM network
CN107393550B (en) Voice processing method and device
CN110867181B (en) Multi-target speech enhancement method based on SCNN and TCNN joint estimation
CN111583954B (en) Speaker independent single-channel voice separation method
CN109841206A (en) A kind of echo cancel method based on deep learning
CN110428849B (en) Voice enhancement method based on generation countermeasure network
CN106782497B (en) Intelligent voice noise reduction algorithm based on portable intelligent terminal
Tu et al. A hybrid approach to combining conventional and deep learning techniques for single-channel speech enhancement and recognition
CN113744749B (en) Speech enhancement method and system based on psychoacoustic domain weighting loss function
CN107360497B (en) Calculation method and device for estimating reverberation component
Dionelis et al. Modulation-domain Kalman filtering for monaural blind speech denoising and dereverberation
CN112259117A (en) Method for locking and extracting target sound source
CN111739562A (en) Voice activity detection method based on data selectivity and Gaussian mixture model
CN103971697B (en) Sound enhancement method based on non-local mean filtering
TWI749547B (en) Speech enhancement system based on deep learning
Kim et al. iDeepMMSE: An improved deep learning approach to MMSE speech and noise power spectrum estimation for speech enhancement.
Chen et al. Leveraging heteroscedastic uncertainty in learning complex spectral mapping for single-channel speech enhancement
Wang Research progress in speech enhancement technology
CN108573698B (en) Voice noise reduction method based on gender fusion information
CN113744725B (en) Training method of voice endpoint detection model and voice noise reduction method
CN107393559B (en) Method and device for checking voice detection result
Ni et al. Multi-channel dictionary learning speech enhancement based on power spectrum

Legal Events

121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19917642; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: PCT application non-entry in European phase (Ref document number: 19917642; Country of ref document: EP; Kind code of ref document: A1)