CN112331225B - Method and device for assisting hearing in high-noise environment

Info

Publication number: CN112331225B
Application number: CN202011159182.9A
Authority: CN
Inventors: 周宇阳
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2020-10-26
Filing date: 2020-10-26
Publication date: 2023-09-26
Anticipated expiration: 2040-10-26
Also published as: CN112331225A

Abstract

The invention provides a method and a device for assisting hearing in a high-noise environment. The method comprises the steps of obtaining noise information in the environment and establishing a noise sample database; acquiring a plurality of voice information and establishing a voice sample database; acquiring voice information exchanged by staff in a high-noise environment; processing the voice information based on a noise sample database and a voice sample database to obtain clean voice; and outputting the clean voice.

Description

Method and device for assisting hearing in high-noise environment

Technical Field

The invention relates to the technical field of voice separation, in particular to a method and a device for assisting hearing in a high-noise environment.

Background

Voice is one of the most direct ways for people to exchange information, but speech is almost always received together with interfering sounds. People who work in high-noise environments are especially affected: noise prevents the two parties from exchanging information effectively and seriously reduces working efficiency. Voice separation, used as a pre-processing scheme, is an effective way to suppress such interference.

Speech separation refers to the task of separating target speech from background interference. At present, voice separation mainly relies on methods such as computational auditory scene analysis and non-negative matrix factorization. These methods are simple to implement but have significant limitations: they apply to few scenarios, their performance drops rapidly when noise is present, they do not take voice characteristics into account and can therefore damage the voice, and they do not consider high-noise voice environments.

Therefore, the invention provides a method and a device for assisting hearing in a high-noise environment, which are used for solving the problem that staff are difficult to communicate with each other in the high-noise environment.

Disclosure of Invention

The invention provides a method and a device for assisting hearing in a high-noise environment, which are used for solving the problem that workers are difficult to communicate with each other in the high-noise environment.

A method of assisting hearing in a high noise environment, comprising:

acquiring noise information in the environment, and establishing a noise sample database;

acquiring a plurality of voice information and establishing a voice sample database;

acquiring voice information exchanged by staff in a high-noise environment;

processing the voice information based on a noise sample database and a voice sample database to obtain clean voice;

and outputting the clean voice.

As an embodiment of the present invention, the acquiring a plurality of voice information and establishing a voice sample database includes:

the voice information is subjected to analog-digital conversion to obtain digital signals of the voice information;

processing the digital signal by utilizing a fast Fourier transform technology to obtain a plurality of frequency spectrums of the voice information;

according to the frequency spectrum, obtaining the voice frequency information of each time point;

and establishing a voice sample database according to the voice frequency information of each time point.

As an embodiment of the present invention, the processing the digital signal by using a fast Fourier transform technology to obtain a frequency spectrum of several pieces of voice information includes:

calculating the frequency spectrum of a plurality of pieces of voice information by utilizing FFT:

wherein e is the base of the natural logarithm, p = 0, 1, …, M-1, and x(n) is an N-point sequence;

wherein T_p(θ) is a value in the spectrum of several pieces of voice information calculated by the FFT, α is a positive integer, and 0 ≤ θ ≤ α-1.

As an embodiment of the present invention, the processing the voice information based on the noise sample database and the voice sample database to obtain clean voice includes:

obtaining a noise frequency threshold according to the noise sample database;

performing first processing on the voice information according to the noise frequency threshold value to obtain first filtered voice information; the first processing is filtering frequency signals in the voice information, which are higher than the noise frequency threshold value;

and matching the first filtered voice information with the voice sample database, and filtering frequency signals, which have a difference larger than a preset difference value, between the first filtered voice information and a preset average value in the voice sample database, so as to obtain clean voice information.

As an embodiment of the present invention, the obtaining a noise frequency threshold according to the noise sample database includes:

calculating a noise frequency threshold:

wherein v is the noise frequency threshold, F_i is the sample frequency information range in the noise sample database, N is the number of samples in the noise sample database, π is the circle constant, k is the stiffness coefficient, and m is the mass.

As one embodiment of the present invention, the noise frequency threshold is determined according to a high noise environment;

the high noise environment is determined by noise information in the high noise environment acquired in preset time; wherein,

the high noise environment includes: traffic noise and industrial noise.

As an embodiment of the present invention, the high noise environment is determined by noise information in the high noise environment acquired in a preset time, and includes:

acquiring noise information in a high noise environment within a preset time, and obtaining a digital signal of the noise information through analog-to-digital conversion;

obtaining a noise frequency waveform according to the digital signal, and filtering an isolated waveform in the noise frequency waveform to obtain a section of continuous noise frequency waveform;

taking out the maximum value in the frequency range of the continuous noise waveform, and comparing the maximum value with noise frequency thresholds in the noise sample database to obtain the most similar noise frequency threshold;

determining a high noise environment according to the most similar noise frequency threshold;

wherein the noise frequency threshold value corresponds to the high noise environment one by one.

A device for assisting hearing in a high noise environment, comprising:

the acquisition module is used for acquiring noise information in the environment, a plurality of pieces of voice information and voice information in the environment;

the creating module is used for creating a noise sample database according to the noise information acquired by the acquiring module, obtaining a noise frequency threshold value and creating a voice sample database according to the voice information acquired by the acquiring module;

the comparison module is used for comparing the voice information in the environment with a noise frequency threshold value and determining frequency information of which the frequency of the voice information in the environment is greater than the noise frequency threshold value;

the filtering module is used for filtering frequency information with the frequency greater than a noise frequency threshold value in the environment by utilizing a filter to obtain first filtered voice information;

the matching module is used for matching the first filtering voice information with the voice sample database, and filtering frequency signals, which have a difference larger than a preset difference value, between the first filtering voice information and a preset average value in the voice sample database to obtain clean voice;

and the transmission module is used for synthesizing the clean voice into voice segments and transmitting the voice segments to a receiver.

As an embodiment of the present invention, the creation module performs the following operations:

according to the obtained noise information, an industrial noise sample database, a traffic noise sample database and a mixed noise sample database are established;

according to the acquired voice information, establishing at least 3 voice sample databases; wherein the 3-class voice sample database comprises: a sound sample database for adult men, a sound sample database for adult women, and a sound sample database for elderly men.

As an embodiment of the present invention, the matching module performs the following operations, including:

if the multi-person voice exists in the second filtering voice information, the multi-voice section in the clean voice is separated into single voice sections by utilizing a multi-person voice separation technology in a voice separation technology.

The beneficial effects of the invention are as follows: the influence of noise on voice communication is avoided, so that workers can quickly and clearly understand what the other party means even in a high-noise environment. This improves working efficiency, reduces the irritability caused by noise, strengthens hearing protection for workers in high-noise environments, and facilitates communication among workers.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:

FIG. 1 is a flow chart of a method and apparatus for assisting hearing in a high noise environment according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method and apparatus for assisting hearing in a high noise environment according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

Example 1:

as shown in fig. 1, an embodiment of the present invention provides a method for assisting hearing in a high noise environment, including:

step S101: acquiring noise information in the environment, and establishing a noise sample database;

step S102: acquiring a plurality of voice information and establishing a voice sample database;

step S103: acquiring voice information exchanged by staff in a high-noise environment;

step S104: processing the voice information based on a noise sample database and a voice sample database to obtain clean voice;

step S105: outputting the clean voice;

the working principle of the technical scheme is as follows: noise information that may occur in the working environment, such as traffic noise and industrial noise, is collected through a microphone to establish a noise sample database, and a preset noise frequency threshold is obtained by integrating the noise sample database. A plurality of pieces of voice information, including but not limited to voice information of adult males, adult females and elderly males, are collected through the microphone to establish a voice sample database. Using the FFT (fast Fourier transform) technology, the voice samples are converted from the time domain to the frequency domain to produce frequency spectra of the voice sample database, and a voice frequency average value is obtained through statistics and integration. Voice information in the environment is then acquired through the microphone and, based on the voice separation technology, first converted into an analog signal; frequency signals higher than the preset noise frequency threshold are filtered out by a filter to obtain first filtered voice information. The first filtered voice information is matched against the voice frequency average value, and frequency signals in it that differ from the voice frequency average value by more than a preset difference value are filtered out to obtain second filtered voice information. If the second filtered voice information contains multiple voices, the target voice is separated out by the voice separation technology, the multi-voice segments are converted into single voice segments, and the single voice segments are transmitted; if it contains no multiple voices, the second filtered voice information is transmitted directly;

the beneficial effects of the technical scheme are as follows: the noise information is integrated and a noise frequency threshold is preset, which eliminates the influence of high noise on voice communication a first time; matching the voice information in the environment against the mean value of the human voice information eliminates that influence a second time. The invention thus avoids the influence of noise on voice communication, so that staff can quickly and clearly understand what the other party means even in a high-noise environment, which improves working efficiency, reduces the irritability caused by noise, strengthens hearing protection for staff in high-noise environments, and facilitates communication among staff.

Example 2:

in one embodiment, the acquiring a plurality of voice information and establishing a voice sample database includes:

according to the voice frequency information of each time point, a voice sample database is established;

the working principle of the technical scheme is as follows: the acquired pieces of voice information are converted into analog signals by a microphone and then into digital signals by an ADC module; the digital signals are converted into continuous frequency spectra using the FFT technology, which realizes the conversion between the time domain and the frequency domain; the voice frequency information of each time point is obtained from the spectra, a voice sample database is established from it, and a voice frequency average value is obtained;

the beneficial effects of the technical scheme are as follows: by utilizing the fast Fourier transform technology, the plurality of pieces of voice frequency information are integrated efficiently and the voice frequency mean value is obtained, which facilitates deeper filtering of the voice information in the environment and makes communication between staff more convenient.
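The working principle of this example can be illustrated with a minimal sketch. It assumes the recordings are already digitized, takes the dominant FFT frequency of each short frame as the "voice frequency information of each time point", and averages those values to obtain the voice frequency mean. The function names, the frame length, the hop size and the synthetic test tones are hypothetical and are not taken from the patent.

```python
import numpy as np

def voice_frequency_track(signal, fs, frame_len=512, hop=256):
    """Return the dominant frequency (Hz) of each frame of a digitized signal."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    track = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        magnitude = np.abs(np.fft.rfft(frame))
        track.append(freqs[np.argmax(magnitude)])
    return np.array(track)

def build_voice_database(recordings, fs):
    """Collect the per-time-point frequency tracks and their overall mean."""
    tracks = [voice_frequency_track(r, fs) for r in recordings]
    mean_frequency = float(np.mean(np.concatenate(tracks)))
    return {"tracks": tracks, "mean_frequency": mean_frequency}

# Two synthetic tones stand in for digitized microphone recordings (assumption).
fs = 8000
t = np.arange(fs) / fs
recordings = [np.sin(2 * np.pi * 125.0 * t), np.sin(2 * np.pi * 250.0 * t)]
database = build_voice_database(recordings, fs)
```

Real speech has a much richer spectrum than a single tone, so a practical system would likely store more than one value per frame; the single dominant frequency is used here only to keep the illustration short.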

Example 3:

in one embodiment, the obtaining the spectrum of the plurality of voice information by using the fast fourier transform technology includes:

wherein T_p(θ) is a value in the spectrum of several pieces of voice information calculated by the FFT, α is a positive integer, and 0 ≤ θ ≤ α-1;

The working principle of the technical scheme is as follows: assuming that the sampling frequency of the voice signal is fs, when an N-point FFT is performed on the voice signal, the frequency interval between two adjacent points of the FFT result is fs/N; that is, the frequency represented by any point p (p = 0 to M-1) is p × fs/N. Here T_p(θ) is a value in the spectrum of several pieces of voice information calculated by the FFT, α is a positive integer, and 0 ≤ θ ≤ α-1, thereby producing a spectrum of several pieces of voice information;

the beneficial effects of the technical scheme are as follows: and the frequency spectrums of a plurality of voice information are manufactured, so that the fluctuation condition of the voice information frequency can be intuitively obtained, and the accuracy of the voice frequency mean value is improved.
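The relation between FFT points and frequencies described in this example can be checked numerically. The sketch below uses the standard discrete Fourier transform as computed by NumPy, which may differ in form from the formula of this embodiment, and maps each point p to the frequency p × fs/N. The function name, the sampling frequency and the 50 Hz test tone are illustrative assumptions.

```python
import numpy as np

def spectrum_with_frequencies(x, fs):
    """Return (frequencies in Hz, magnitudes) for the N-point FFT of x."""
    n = len(x)
    spectrum = np.fft.fft(x)
    p = np.arange(n // 2 + 1)          # non-negative frequency points only
    freqs = p * fs / n                 # point p represents p * fs / N
    return freqs, np.abs(spectrum[: n // 2 + 1])

fs = 1000                              # sampling frequency in Hz (assumed)
n = 1000                               # number of FFT points (assumed)
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 50 * t)         # a 50 Hz test tone
freqs, magnitude = spectrum_with_frequencies(x, fs)
bin_spacing = freqs[1] - freqs[0]      # equals fs / N
peak_frequency = freqs[np.argmax(magnitude)]
```

With fs = 1000 Hz and N = 1000 points, adjacent points are fs/N = 1 Hz apart, so the test tone appears at point p = 50.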

Example 4:

in one embodiment, the processing the voice information based on the noise sample database and the voice sample database to obtain clean voice includes:

obtaining a noise frequency threshold according to the noise sample database;

matching the first filtered voice information with the voice sample database, and filtering frequency signals, which have a difference larger than a preset difference value, between the first filtered voice information and a preset average value in the voice sample database to obtain clean voice information;

the working principle of the technical scheme is as follows: voice information in the environment is collected through a microphone and denoised using the voice denoising function of the voice separation technology. Frequency signals in the voice information that are higher than the preset noise frequency threshold are filtered out by a filter to obtain the first filtered voice. The first filtered voice is then compared with the voice frequency mean value, and frequency signals whose difference from the voice frequency mean value is greater than the preset difference value are filtered out. The frequency signals that remain after filtering are converted into digital signals, which constitute the clean voice information;

the beneficial effects of the technical scheme are as follows: the voice separation technology is beneficial to reducing noise in voice information, improving the definition of target voice, enabling staff to communicate more quickly and improving the working efficiency.
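The two filtering passes of this example can be sketched as follows. The patent only refers to "a filter" without naming its type, so this sketch substitutes a simple FFT mask: components above the noise frequency threshold are zeroed first, then components that lie further than the preset difference value from the voice frequency mean. The function name and all numeric values are hypothetical.

```python
import numpy as np

def two_stage_filter(signal, fs, noise_threshold_hz, voice_mean_hz, max_diff_hz):
    """Zero FFT components above the noise threshold, then those far from the voice mean."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    # First processing: drop components above the noise frequency threshold.
    first = np.where(freqs > noise_threshold_hz, 0.0, spectrum)

    # Second processing: drop components that differ from the voice
    # frequency mean by more than the preset difference value.
    second = np.where(np.abs(freqs - voice_mean_hz) > max_diff_hz, 0.0, first)

    return np.fft.irfft(second, n=len(signal))

# Synthetic example: a 200 Hz "voice" plus low and high frequency interference.
fs = 8000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 200 * t)
hum = 0.5 * np.sin(2 * np.pi * 20 * t)
whine = 0.5 * np.sin(2 * np.pi * 3000 * t)
clean = two_stage_filter(voice + hum + whine, fs,
                         noise_threshold_hz=1000.0,
                         voice_mean_hz=200.0,
                         max_diff_hz=150.0)
```

In this synthetic case the first pass removes the 3000 Hz component and the second pass removes the 20 Hz component, leaving the 200 Hz tone. Hard masking of this kind introduces artifacts on real speech, so it should be read as an illustration of the order of operations rather than as a recommended filter design.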

Example 5:

in one embodiment, the obtaining the noise frequency threshold according to the noise sample database includes:

calculating a noise frequency threshold:

wherein v is the noise frequency threshold, F_i is the sample frequency information range in the noise sample database, N is the number of samples in the noise sample database, π is the circle constant, k is the stiffness coefficient, and m is the mass;

the working principle of the technical scheme is as follows: the maximum frequency of each sample in the noise sample database is obtained, these maxima are summed and averaged, and the influence on the collected data of the sound frequency generated by the inherent vibration of the device is removed to obtain the noise frequency threshold, wherein k is the stiffness coefficient of the device, and m is the mass of the device;

the beneficial effects of the technical scheme are as follows: the accuracy of noise filtering of the device is improved.
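The formula itself is not reproduced in this text, so the following is only one reading that is consistent with the working principle above: the threshold is taken as the mean of the per-sample maximum frequencies minus the natural frequency of the device, v = (1/N) Σ max(F_i) − (1/(2π))·√(k/m). The term (1/(2π))·√(k/m) is the standard natural frequency of an undamped mass-spring system; treating "removing the influence" as a subtraction is an assumption, as are the function name and the sample values.

```python
import math

def noise_frequency_threshold(sample_ranges, stiffness, mass):
    """Mean of per-sample maximum frequencies minus the device's natural frequency.

    sample_ranges: iterable of (low_hz, high_hz) ranges, one per noise sample.
    stiffness:     stiffness coefficient k of the device, in N/m.
    mass:          mass m of the device, in kg.
    """
    sample_ranges = list(sample_ranges)
    if not sample_ranges:
        raise ValueError("the noise sample database is empty")
    mean_of_maxima = sum(max(r) for r in sample_ranges) / len(sample_ranges)
    # Natural frequency of an undamped mass-spring system, in Hz.
    natural_frequency = math.sqrt(stiffness / mass) / (2 * math.pi)
    # Assumed reading: the device's own vibration frequency is subtracted.
    return mean_of_maxima - natural_frequency

# Hypothetical database of three noise samples, each given as a frequency range.
sample_ranges = [(100.0, 900.0), (150.0, 1100.0), (80.0, 1000.0)]
# k and m are chosen so that the natural frequency is 100 Hz.
threshold = noise_frequency_threshold(
    sample_ranges, stiffness=4 * math.pi ** 2 * 100, mass=0.01
)
```

Here the mean of the maxima is 1000 Hz and the assumed natural frequency is 100 Hz, so the sketch yields a threshold of about 900 Hz.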

Example 6:

in one embodiment, the noise frequency threshold is determined from a high noise environment;

the high noise environment includes: traffic noise and industrial noise;

the working principle of the technical scheme is as follows: determining a corresponding noise frequency threshold according to the noise information obtained in the preset time of the high noise environment, and determining the high noise environment;

the beneficial effects of the technical scheme are as follows: different noise frequency thresholds are selected in different high-noise environments, so that the noise filtering precision is improved.

Example 7:

in one embodiment, the high noise environment is determined by noise information in the high noise environment acquired in a preset time, including:

wherein the noise frequency threshold value corresponds to the high noise environment one by one;

the working principle of the technical scheme is as follows: the acquired noise information is converted into a digital signal through analog-to-digital conversion, a noise frequency waveform is obtained according to the digital signal, and an isolated waveform in the noise frequency waveform is filtered to obtain a section of continuous noise frequency waveform. Wherein, an isolated waveform refers to a waveform that is discontinuous or that fluctuates greatly. Selecting the maximum value of the continuous noise frequency waveform, comparing the maximum value with all noise frequency thresholds in a noise sample database, selecting the noise frequency threshold with the most similar comparison result, and determining the current high noise environment;

the beneficial effects of the technical scheme are as follows: different noise frequency threshold values are used for filtering in different high-noise environments, so that the noise filtering precision is improved, and the auxiliary hearing effect is enhanced.
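A minimal sketch of this environment determination follows. It interprets "filtering an isolated waveform" as splitting the noise frequency track wherever consecutive values jump by more than a tolerance and keeping the longest continuous run; that interpretation, the tolerance, the function names and the threshold values are all assumptions made for illustration.

```python
import numpy as np

def longest_continuous_segment(freq_track, max_jump_hz):
    """Return the longest run whose consecutive values differ by at most max_jump_hz."""
    track = np.asarray(freq_track, dtype=float)
    if track.size == 0:
        return track
    # Indices at which a new segment starts because the jump is too large.
    breaks = np.flatnonzero(np.abs(np.diff(track)) > max_jump_hz) + 1
    segments = np.split(track, breaks)
    return max(segments, key=len)

def determine_environment(freq_track, thresholds, max_jump_hz=200.0):
    """Pick the environment whose stored threshold is closest to the segment maximum."""
    segment = longest_continuous_segment(freq_track, max_jump_hz)
    peak = float(segment.max())
    return min(thresholds, key=lambda name: abs(thresholds[name] - peak))

# Hypothetical one-to-one mapping from environment to noise frequency threshold (Hz).
thresholds = {"traffic": 1200.0, "industrial": 3000.0}
# Measured noise frequencies over the preset time, with two isolated spikes.
track = [2500, 2600, 2700, 9000, 2650, 2800, 2900, 2850, 50, 2700]
environment = determine_environment(track, thresholds)
```

In this example the isolated values 9000 and 50 are discarded, the longest continuous run peaks at 2900 Hz, and the closest stored threshold belongs to the "industrial" entry.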

Example 8:

as shown in fig. 2, an embodiment of the present invention provides a device for assisting hearing in a high noise environment, including:

step S201: the acquisition module is used for acquiring noise information in the environment, a plurality of pieces of voice information and voice information in the environment;

step S202: the creating module is used for creating a noise sample database according to the noise information acquired by the acquiring module and creating a voice sample database according to the voice information acquired by the acquiring module to obtain a preset noise frequency threshold value and a voice frequency average value;

step S203: the comparison module is used for comparing the voice information in the environment with a preset noise frequency threshold value and determining frequency information of which the frequency of the voice information in the environment is greater than the preset noise frequency threshold value;

step S204: the filtering module is used for filtering frequency information with the frequency greater than a preset noise frequency threshold value in the environment by utilizing a filter to obtain first filtered voice information;

step S205: the matching module is used for matching the first-time filtered voice information with the voice frequency average value, and filtering frequency signals, which have a difference greater than a preset difference value, from the voice frequency average value in the frequency information of the first-time filtered voice information to obtain second-time filtered voice information;

step S206: the transmission module is used for synthesizing the second filtered voice information into voice segments and transmitting the voice segments to the receiver;

Example 9:

in one embodiment, the creation module performs the following operations:

according to the acquired voice information, establishing at least 3 voice sample databases; wherein the 3-class voice sample database comprises: a sound sample database for adult men, a sound sample database for adult women, and a sound sample database for elderly men;

the working principle of the technical scheme is as follows: horn sounds, automobile roar, ship noise, mechanical noise, aerodynamic noise and electromagnetic noise are collected through a microphone, which converts the collected sound into an analog signal; the analog signal is converted into a digital signal by an ADC module, and the digital signals are integrated through simulation to obtain a traffic noise sample database, an industrial noise sample database and a mixed noise sample database. Sound information of adult men in the working environment is collected through the microphone to establish a sound sample database for adult men; sound information of adult women in the working environment is collected through the microphone to establish a sound sample database for adult women; and sound information of elderly men in the working environment is collected through the microphone to establish a sound sample database for elderly men;

the beneficial effects of the technical scheme are as follows: different preset noise information thresholds are set according to different noise information, so that the accuracy of noise filtering is improved; different voice sample databases are established for different crowds, so that the device can accurately assist the hearing of different crowds.

Example 10:

in one embodiment, the matching module performs operations comprising:

if the multi-person voice exists in the second filtering voice information, the multi-voice section in the second filtering voice information is separated into single voice sections by utilizing a multi-person voice separation technology in a voice separation technology;

the working principle of the technical scheme is as follows: separating the multi-voice section in the second filtered voice into single voice sections by utilizing a multi-voice separation technology in the voice separation technology;

the beneficial effects of the technical scheme are as follows: the definition of target voice is improved, simultaneous communication of multiple people is facilitated, and working efficiency is improved.

Example 11:

the invention also provides a method for assisting hearing in a high-noise environment, which comprises the following steps:

step S301: establishing a plurality of noise sample databases based on the first positioning information of the high noise environment, the equipment operation information in the high noise environment and the noise information in the high noise environment;

the main source of noise in a high-noise environment is the operation of the equipment running in that environment. When the equipment operation information in the high-noise environment differs, different noise sample databases are established; that is, each noise sample database takes the positioning information and the equipment operation information as its first call tag, so that the accurate noise sample database can be called;

step S302: acquiring a plurality of voice information and establishing a voice sample database;

for example: the workers in a given working environment are generally fixed or change little, so a separate voice sample database can be established in advance for each worker; a second call tag is then established according to the sound characteristics of that worker;

step S303: acquiring second positioning information of a worker, matching the second positioning information with the first positioning information, and acquiring voice information exchanged by the worker in a high-noise environment when matching;

the second positioning information is used to confirm whether the worker is in a high-noise environment. Noise sample databases are built in advance for the high-noise environments in which workers are likely to be; when the position indicated by the second positioning information does not match any first positioning information for which a noise sample database was built, the worker is not considered to be in a high-noise environment, and no voice data is acquired.

Step S304: acquiring equipment operation information of the working environment of the current staff based on the second positioning information;

and the server acquires the operation information of each device by inquiring the device where the second positioning information is located.

Step S305: calling a corresponding noise sample database based on the second positioning information and the equipment operation information of the working environment of the current staff;

and matching the second positioning information with the first positioning information in the first call tag, and matching the equipment operation information of the environment where the current staff works with the equipment operation information in the first call tag, so as to match a noise sample database corresponding to the first call tag.

Step S306: denoising the voice information based on a noise sample database to obtain denoised voice;

the advantage of this step is that preliminary denoising of the worker's current speech is achieved through the pre-established noise sample database, and using the accurately matched noise sample database improves the denoising effect.

Step S307: extracting features of the denoised voice to obtain a first sound feature; calling a corresponding voice sample database based on the first voice characteristic;

the first sound features are sound features that identify differences among persons to whom the speech belongs, including timbre, loudness, pitch, and the like.

Step S308: matching the denoising voice with the sample voice in the called voice sample database to obtain clean voice;

the matching mode is as follows: a second feature extraction is performed on the denoised voice to obtain second sound features; the similarity between these second sound features and the second sound features of each sample voice is calculated; and the sample voice with the maximum similarity is called as the clean voice. The second sound features include: short-time energy spectrum, formant frequency, amplitude spectrum, etc.

Step S309: outputting the clean voice;

the clean voice is played to the other staff through the voice playing equipment, so that the communication of the two staff under the high-noise environment is realized, the cooperation of the staff under the high-noise environment is realized, and the unexpected loss or accident caused by the transmission error of instructions or other interference under the high noise is avoided.
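The database selection in steps S301 to S305 amounts to a lookup keyed by the first call tag (positioning information plus equipment operation information). The sketch below shows one way to express it; the class name, the location and equipment labels and the database identifiers are hypothetical placeholders, and real positioning data would need a tolerance-based match rather than string equality.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class CallTag:
    """First call tag: where the database was recorded and under which equipment state."""
    location: str          # first positioning information
    equipment_state: str   # equipment operation information

# Hypothetical pre-built noise sample databases, keyed by call tag.
noise_databases: Dict[CallTag, str] = {
    CallTag("workshop-A", "press running"): "noise_db_press",
    CallTag("workshop-A", "press idle"): "noise_db_idle",
}

def select_noise_database(worker_location: str, equipment_state: str) -> Optional[str]:
    """Return the matching noise database, or None if no database applies."""
    known_locations = {tag.location for tag in noise_databases}
    if worker_location not in known_locations:
        # The worker is not treated as being in a high-noise environment,
        # so no voice information is acquired.
        return None
    return noise_databases.get(CallTag(worker_location, equipment_state))

selected = select_noise_database("workshop-A", "press running")
outside = select_noise_database("office", "press running")
```

Using a frozen dataclass makes the tag hashable, so the pair of location and equipment state can serve directly as a dictionary key.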

Under the condition that the denoising effect is not considered, the embodiment can have another feasible scheme, namely, the existing denoising mode is directly adopted for denoising, and then the denoising voice is directly matched with the sample voice in the voice sample database, so that clean voice is obtained.
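The matching of a denoised voice against sample voices can be sketched as a nearest-neighbor search over feature vectors. The patent does not name a similarity measure, so cosine similarity is used here as an assumption; the function names, the feature vectors and the sample labels are invented for illustration and do not come from the patent.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either has zero norm)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def match_clean_voice(denoised_features, sample_database):
    """Return the key of the sample voice whose features are most similar."""
    return max(
        sample_database,
        key=lambda name: cosine_similarity(denoised_features, sample_database[name]),
    )

# Hypothetical second sound features (for instance energy, formant and
# amplitude descriptors) stored for two sample utterances.
sample_database = {
    "stop the conveyor": [0.9, 0.1, 0.3],
    "start the pump": [0.1, 0.8, 0.5],
}
best = match_clean_voice([0.85, 0.15, 0.25], sample_database)
```

Because the closest sample voice is played back in place of the denoised voice, the quality of the output depends on how well the sample database covers what workers actually say.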

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A method of assisting hearing in a high noise environment, comprising:

acquiring voice information exchanged by staff in a high-noise environment;

outputting the clean voice;

wherein the processing the voice information based on the noise sample database and the voice sample database to obtain clean voice comprises:

obtaining a noise frequency threshold according to the noise sample database;

wherein, according to the noise sample database, obtaining a noise frequency threshold value includes:

calculating a noise frequency threshold:

wherein v is the noise frequency threshold, F_i is the range of sample frequency information in the noise sample database, N is the number of samples in the noise sample database, π is the circle constant, k is the stiffness coefficient of the device, and m is the mass of the device.

2. The method for assisting hearing in a high-noise environment according to claim 1, wherein the step of obtaining a plurality of pieces of voice information and creating a voice sample database comprises:

3. The method for assisting hearing in a high-noise environment according to claim 2, wherein said processing said digital signal using a fast fourier transform technique to obtain a spectrum of several pieces of human voice information comprises:

wherein T_p(θ) is a value in the spectrum of several pieces of voice information calculated by the FFT, α is a positive integer, and 0 ≤ θ ≤ α-1.

4. A method of assisting hearing in a high noise environment as set forth in claim 1, wherein,

the noise frequency threshold is determined according to a high noise environment;

the high noise environment includes: traffic noise and industrial noise.

5. The method for assisting hearing in a high noise environment according to claim 4, wherein the high noise environment is determined from noise information in the high noise environment acquired during a preset time, comprising:

6. A device for assisting hearing in a high noise environment, comprising:

the transmission module is used for synthesizing the clean voice into voice segments and transmitting the voice segments to the receiver;

the creating module obtains a noise frequency threshold according to the noise sample database, and the creating module comprises:

calculating a noise frequency threshold:

wherein v is the noise frequency threshold, F_i is the sample frequency information range in the noise sample database, N is the number of samples in the noise sample database, π is the circle constant, k is the stiffness coefficient of the device, and m is the mass of the device.

7. The hearing assistance device of claim 6, wherein the creation module performs the following operations:

8. The hearing assistance device of claim 6, wherein the matching module performs the following operations:

if the multi-person voice exists in the second filtering voice information, the multi-voice section in the clean voice is separated into single voice sections by utilizing the multi-person voice separation technology in the voice separation technology.