CN112205981B - Hearing assessment method and device based on speech intelligibility index - Google Patents
- Publication number: CN112205981B (application CN202011077820.2A)
- Authority
- CN
- China
- Prior art keywords
- speech
- hearing
- threshold
- double
- confusable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/12—Audiometering
- A61B5/121—Audiometering evaluating hearing capacity
- A61B5/123—Audiometering evaluating hearing capacity subjective methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
Abstract
The invention discloses a hearing evaluation method and device based on the speech intelligibility index, comprising the following steps: 1) establishing a functional relationship between the hearing threshold and speech recognition performance using the speech intelligibility index; 2) constructing a corpus of confusable bisyllabic word pairs from selected confusable vowel pairs and consonant pairs as the test material for speech audiometry, and measuring the band importance function (BIF) of the test material with a quick band-importance measurement method; 3) performing speech audiometry on the subject with the confusable bisyllabic word pairs constructed in step 2), then selecting the sound intensity condition that maximizes the likelihood of the test results of the confusable bisyllabic word pairs as the subject's final hearing threshold. The invention obtains stable and reliable results in non-professional environments, and its results correlate strongly with those of pure-tone audiometry, making it a feasible scheme for hearing evaluation on mobile terminals.
Description
Technical Field
The invention belongs to the technical field of hearing aids, relates to a hearing evaluation method, and particularly relates to a hearing evaluation method and equipment based on a speech intelligibility index.
Background
Hearing loss is an important public health problem. World Health Organization assessments of hearing disability across its member states indicate more than 460 million people with disabling hearing loss, and according to the second national sampling survey of disability in 2006, 27.8 million people in China have hearing loss, accounting for 34% of the total number of disabled people. Hearing loss causes communication impairment, affecting interpersonal relationships and working capacity and leading to social isolation and reduced quality of life. Studies show that elderly people with hearing loss are at higher risk of falls, dementia, depression and death than individuals without hearing loss. Children with hearing loss may show delayed speech development, with significant adverse effects on academic performance. From a socio-economic perspective, the World Health Organization estimates that hearing loss causes a worldwide loss of US$750 billion per year, including health-sector costs, educational support costs, and productivity losses. Hearing loss thus affects physical and mental health and carries a high socioeconomic cost; early discovery and intervention are needed, and hearing assessment is crucial.
The standard for diagnosing hearing loss is the hearing threshold, obtained by playing fixed-frequency pure-tone signals under controlled environment and equipment conditions and measuring the minimum sound pressure level at which a predetermined percentage of the subject's responses are correct. Hearing loss raises the threshold, manifested as an inability to detect relatively faint sounds, which is its most obvious characteristic. The hearing threshold reflects the degree of hearing loss at each tested frequency. A normal-hearing person has a binaural hearing threshold of 25 dB HL or less; a threshold above this value indicates hearing loss. A threshold of 26 to 40 dB HL is mild hearing loss, 41 to 55 dB HL moderate, 56 to 70 dB HL moderately severe, 71 to 90 dB HL severe, and above 91 dB HL profound. Disabling hearing loss means that the threshold of an adult's better ear exceeds the normal-hearing average by more than 40 dB, or by more than 30 dB in a child's better ear. For hearing loss, the hearing aid is a commonly used compensation device that improves the speech communication ability of the hearing-impaired; the hearing threshold is usually fed into a fitting formula to derive the hearing aid gain. The hearing threshold can therefore not only reflect a person's hearing status but also be applied to the hearing compensation of hearing-impaired persons.
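The grading boundaries above can be captured in a small helper. The sketch below is illustrative (function and label names are ours, using standard WHO-style grade names, not wording fixed by the patent), mapping a better-ear threshold in dB HL to a grade:

```python
def grade_hearing_loss(threshold_db_hl: float) -> str:
    """Map a hearing threshold (dB HL) to the grade boundaries quoted above."""
    if threshold_db_hl <= 25:
        return "normal"
    if threshold_db_hl <= 40:
        return "mild"
    if threshold_db_hl <= 55:
        return "moderate"
    if threshold_db_hl <= 70:
        return "moderately severe"
    if threshold_db_hl <= 90:
        return "severe"
    return "profound"
```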
Measuring the hearing threshold imposes strict requirements on the test environment and audiometric equipment; it usually requires visiting a specialized institution, with a professional administering the test. Also commonly used for adult hearing assessment are the Whispered Voice Test, the Finger Rub Test, questionnaires, and portable audiometers. Previous studies show that although these methods are more convenient than pure-tone audiometry, their accuracy still needs verification, professional personnel are still required, quantitative loss at each frequency cannot be given, and portable audiometers are costly. With the popularization of intelligent devices, audiometry methods based on mobile intelligent terminals have appeared. These methods complete the hearing evaluation on the mobile terminal and are convenient, rapid and low-cost. Some intelligent-terminal audiometry applications implement pure-tone audiometry, but because pure-tone audiometry has strict environment and equipment requirements, the measured results generally deviate.
Disclosure of Invention
Aiming at the defects of the existing method, the invention provides a hearing evaluation method and equipment based on speech intelligibility index. The result of the speech audiometry can reflect the auditory perception ability of the user, and meanwhile, the speech audiometry does not need complex guidance, so that the operation of the user is facilitated. Furthermore, speech audiometry has relatively reduced environmental and equipment requirements compared to pure tone audiometry, and thus may be a more robust hearing assessment approach.
The technical scheme of the invention is as follows:
a hearing assessment method based on speech intelligibility index comprises the following steps:
1) the Speech Intelligibility Index (SII) is used to establish the functional relationship between the hearing threshold and Speech recognition performance, and the frame diagram of the Speech Intelligibility Index is shown in fig. 2.
2) Constructing an easily-confused double-syllable word pair corpus as a test corpus for speech audiometry according to the selected easily-confused vowel pair and the selected consonant pair, and measuring a Band weight function (BIF) of the test corpus by using a Quick Band weight measurement method (qBIF) to quantify the importance degree of each Band in each easily-confused double-syllable word pair;
3) using the confusable double-syllable words constructed in the step 2) to perform speech audiometry on listeners; the listeners comprise normal hearing persons and hearing loss persons;
4) selecting the sound intensity at which the confusable bisyllabic word pair is played, using a random method and an adaptive method: the first K confusable bisyllabic word pairs are tested with intensities chosen by the random method, and subsequent tests choose intensities with the adaptive method. In the random method, the playing intensity of the confusable bisyllabic word pair is randomly selected from the candidate sound intensity conditions each time. In the adaptive method, a gradient descent method selects the sound intensity condition that maximizes the likelihood of the confusable bisyllabic word pairs as the subject's hearing threshold; the speech recognition rate of each candidate sound intensity condition is then predicted with the SII model (i.e., equation 6), and the candidate whose predicted recognition accuracy is closest to a set threshold (e.g., 0.95) is selected as the sound intensity condition for playing the next confusable bisyllabic word pair;
5) selecting the confusable double-syllable word pair of the next round of speech audiometry from the test material, and carrying out speech audiometry on listeners;
6) repeating the steps 3) to 5) until the set termination condition is met; and then selecting the sound intensity condition which enables the confusing double syllable word pair to have the maximum likelihood value of the test result by using a gradient descent method as the final hearing threshold to be tested. The termination conditions are as follows: the number of test words of confusing bisyllabic word pairs reaches a limit value.
Further, the functional relationship is PC = 1/(1 + e^(−s·(SII − W0))), where SII is the speech intelligibility index, W0 is the SII value corresponding to the speech recognition threshold, s is the slope of the logistic function, and PC denotes the speech recognition accuracy.
Further, SII = Σ_f A_f*W_f, where A_f indicates the audibility of frequency band f and W_f represents the band weight function.
Further, A_f = min{max{(E_f − T_f + 15)/30, 0}, 1}, where T_f is the pure-tone threshold at the center frequency of band f and E_f is the played sound intensity of the band-f signal in audiometry.
Further, the weight of each selected confusable bisyllabic word pair in band f is measured with the quick band-importance measurement method to obtain the band weight function W_f.
Further, the method for carrying out speech audiometry on the testee by using the confusable double syllable word comprises the following steps: firstly, selecting K confusable double-syllable word pairs to carry out K rounds of speech audiometry, playing a selected confusable double-syllable word pair in each round, and selecting the sound intensity of the confusable double-syllable word pair to be played in each round by using a random method; then, for each confusable double-syllable word pair played subsequently, the sound intensity of each round of confusable double-syllable word pair to be played is selected by a self-adaptive method.
Furthermore, in the self-adaptive method, a gradient descent method is used for selecting the sound intensity condition which enables the likelihood value of the confusing double syllable word pair to be maximum as the hearing threshold of the testee, then the speech recognition rate of each candidate sound intensity condition is predicted according to the SII model, and the sound intensity condition with the recognition accuracy rate closest to the set threshold is selected as the sound intensity condition when the next confusing double syllable word pair is played.
Furthermore, selecting an easily confused double syllable word pair each time to carry out a round of speech audiometry; in each round of speech audiometry, the same confusable double syllable word pair is tested repeatedly, and one word or syllable in the confusable double syllable word pair is played randomly during each test.
Further, the method for selecting the sound intensity condition that maximizes the likelihood of the test results of the confusable bisyllabic word pairs as the subject's final hearing threshold is as follows: M rounds of speech audiometry are set, yielding M test results, where the intensity of band f in the m-th result is E_f^m, the band weight function of the played confusable bisyllabic word pair in band f is W_f^m, the SII value corresponding to the speech recognition threshold is W_0^m, the subject's recognition result is y_m, the hearing threshold estimated from the M test results is T^M, its value at band f is T_f^M, and PC_m denotes the speech recognition accuracy obtained from the m-th recognition result. The log-likelihood function of the SII model is: LL = Σ_m (y_m*log(PC_m) + (1−y_m)*log(1−PC_m)), m = 1 to M. The likelihood function is then maximized with the gradient descent method: the derivative ∂LL/∂T_f^M of the log-likelihood function is computed, the initial threshold T_f^(0) is randomized, the hearing threshold is updated with the formula T_f^(t+1) = T_f^(t) + η*∂LL/∂T_f^(t), and the threshold T_f^(t) obtained at step t is recorded. When the above process terminates after T iterations, the threshold T_f^(T) obtained at update step T is the final hearing threshold of band f estimated from the M rounds of speech audiometry.
A hearing assessment device based on speech intelligibility index, comprising a speech intelligibility index, a test material and a hearing assessment unit; the hearing evaluation unit comprises a sound intensity selection module and a parameter estimation module; wherein
The speech intelligibility index is used for establishing a functional relation between a hearing threshold and speech recognition performance and calculating the recognition accuracy of the confusable words of the test material under the given sound intensity;
the test material is an easily-confused double-syllable word pair corpus constructed according to the selected easily-confused vowel pair and the consonant pair;
the sound intensity selection module is used for selecting the sound intensity condition of the confusable double-syllable word pair during playing;
and the parameter estimation module is used for selecting the sound intensity condition which enables the confusing double syllables to have the maximum likelihood value of the test result according to the test result as the final hearing threshold of the testee.
Compared with the prior art, the invention has the following positive effects:
the invention uses speech audiometry for hearing evaluation, does not need complex guidance, is convenient for the operation of a user, has weak dependence on environment and audiometric equipment, and is convenient to implement on an intelligent terminal.
The invention can obtain more stable and reliable results in non-professional environment, and has larger correlation with the result of pure-tone audiometry, thereby being a feasible scheme for solving the hearing evaluation of the mobile terminal.
Drawings
FIG. 1 is a block diagram of a hearing assessment method based on speech intelligibility index;
FIG. 2 is a block diagram of a speech intelligibility index;
FIG. 3 is a graph of results under computer simulation conditions;
FIG. 4 is a graph of the results of a real hearing loss test;
fig. 5 is a graph of the results of a hearing loss test using the present invention and pure tone audiometry.
Detailed Description
Specific embodiments of the present invention will be described in more detail below. Fig. 1 is a block diagram of a hearing evaluation method based on speech intelligibility index according to the present invention. The specific implementation steps of the invention comprise the establishment of the relationship between the hearing threshold and the speech recognition, the establishment of speech audiometric materials based on the confusable double syllable word pair, the parameter estimation and the sound intensity condition selection. The specific implementation process of each step is as follows:
1. establishing relationship between hearing threshold and speech recognition
The present work relates the hearing threshold to Speech recognition performance by Speech Intelligibility Index (SII), whose block diagram is shown in fig. 2.
The formula for calculating the speech intelligibility index is as follows:
SII = Σ_f A_f*W_f (1)
where f denotes a frequency band and A_f is the Audibility of band f, a value between 0 and 1; a value of 0 indicates that no speech cue can be heard, and a value of 1 indicates that all available speech cues are received by the listener. The calculation formula is as follows, where SNR_f represents the signal-to-noise ratio within band f: A_f = min{max{(SNR_f + 15)/30, 0}, 1} (2)
W_f is the Band Importance Function (BIF), which represents the importance of band f for speech understanding. As shown in FIG. 2, there are n bands; the weights of the n bands are measured for each confusable bisyllabic word pair and are referred to as the band weight function W_f, which can be measured with the quick band-importance measurement method. After the SII value is calculated, the predicted value of the speech recognition accuracy PC is obtained through a transfer function: PC = 1/(1 + e^(−s·(SII − W0))) (3), where W0 derives from the constant term of the logistic regression; its physical meaning is the SII value corresponding to a recognition accuracy of 50%, i.e., the SII value corresponding to the speech recognition threshold.
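A minimal sketch of the SII computation and transfer function described above, assuming the standard clipped-linear audibility and a logistic transfer function; the slope parameter and the default value 10.0 are our assumptions (in practice W0 and the slope would be fitted per word pair):

```python
import math

def audibility(snr_db):
    """Clipped-linear audibility: 0 at or below -15 dB SNR, 1 at or above +15 dB."""
    return min(max((snr_db + 15.0) / 30.0, 0.0), 1.0)

def sii(snr_db_per_band, bif):
    """Band-importance-weighted sum of per-band audibilities (equation 1)."""
    return sum(audibility(snr) * w for snr, w in zip(snr_db_per_band, bif))

def recognition_accuracy(sii_value, w0, slope=10.0):
    """Logistic transfer function; w0 is the SII at 50% accuracy,
    slope is an assumed shape parameter."""
    return 1.0 / (1.0 + math.exp(-slope * (sii_value - w0)))
```

With 5 octave bands whose BIF weights sum to 1, SII stays in [0, 1] and the transfer function maps it to a predicted recognition accuracy.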
For speech intelligibility index calculation for hearing-impaired persons, the SII introduces the concept of Equivalent Interference. Whereas the SNR_f used when computing the audibility of a normal-hearing person is the difference between the signal and noise spectral intensities, the signal-to-noise ratio used when computing the audibility of a hearing-impaired person is the difference between the signal and the equivalent interference spectral intensity, which is calculated as follows:
D_f = max{T_f, N_f} (4)
where T_f is the pure tone threshold at the center frequency of band f in dB SPL, and N_f is the noise spectral intensity of band f. If prediction is performed under the noise-free condition, equation (4) becomes:
D_f = T_f (5)
At this time, the relationship between the hearing threshold and speech recognition is expressed as follows:
PC = 1/(1 + e^(−s·(SII − W0))), with SII = Σ_f A_f*W_f and A_f = min{max{(E_f − T_f + 15)/30, 0}, 1} (6)
If the band weight function W_f of each confusable bisyllabic word pair is measured with the quick band-importance measurement method, a recognition result PC is obtained through a speech audiometry experiment, and the played sound intensity E_f of the band-f signal of the test stimulus is known, then the hearing threshold T_f at band f can be solved from equation (6), and finally the subject's hearing threshold T is obtained.
2. Construction of speech audiometric material based on confusable double syllable word pair
The invention constructs a speech audiometry corpus from confusable vowels and confusable consonants. First, confusable consonant pairs are paired with the same vowel, and confusable vowel pairs are paired with the same consonant, forming confusable monosyllable pairs; each confusable monosyllable pair is then combined with a second monosyllable that is identical (same pronunciation) in both words, forming a confusable bisyllabic word pair. The tones of the two words are kept the same, and only one vowel or consonant at the same position differs within each confusable bisyllabic word pair.
3. Selection of sound intensity conditions
The invention needs to collect the subject's word-pair recognition results, and each trial presents a stimulus at a different sound intensity, so selecting the sound intensity condition is an important problem. A simple strategy randomly selects a condition between −10 and 90 dB SPL for each band. With 10 dB intervals over −10 to 90 dB SPL, each band has 11 selectable sound intensity conditions; this work uses 5 octave bands, so there are 11^5 possible sound intensity conditions in total. This makes the parameter space of sound intensity conditions large, and an accurate evaluation result is difficult to obtain with few trials and little experimental data. Based on clustering the thresholds of a large number of hearing loss patients, the invention adopts a "range selection" strategy that uses ranges around the thresholds of 4 mild-to-moderate hearing loss types (shown in Table 1) as the optional intensity conditions. Specifically: the lowest sound intensity of each band is set to −10, 20 or 50 dB SPL, and sound intensity conditions are randomly selected within a 40 dB range at 10 dB intervals; for steeply sloping hearing loss, the sound intensity conditions are randomly selected within a range around the threshold, as shown in Table 2. This strategy has 9530 possible sound intensity conditions, which greatly reduces the parameter space.
TABLE 1 Hearing threshold Table for representative types of hearing loss
TABLE 2 Sound intensity Range settings
The "range selection" strategy limits the candidate sound intensity conditions, but if the selection is made randomly from the candidate sound intensity conditions each time a stimulus is played, it is difficult for the parameters to converge in a short time. The invention provides a self-adaptive method to accelerate convergence speed. The method estimates a hearing threshold based on existing data, calculates the predicted speech recognition accuracy of each candidate sound intensity condition through an SII model according to the hearing threshold, and selects the sound intensity condition with the recognition accuracy closest to 0.95. After the stimulus is played, new data is collected and the hearing threshold is updated. This process was repeated until all experiments were completed. According to the SII model, if the audibility of each band is 0.5, i.e., the sound intensity is close to the hearing threshold of the subject, the predicted average speech recognition accuracy is 0.95, and therefore a value of 0.95 is used in the adaptive method.
4. Parameter estimation
The invention estimates the listener's hearing threshold from the test results and uses this value for the next iteration. Suppose M pieces of data are collected in the experiment, where the intensity of band f in the m-th piece is E_f^m, the band weight function of the played confusable bisyllabic word pair in band f is W_f^m, the SII value corresponding to the speech recognition threshold is W_0^m, and the subject's recognition result is y_m. Suppose the hearing threshold estimated from the M pieces of data is T^M, whose value at band f is T_f^M. The log-likelihood function of the SII model is:
LL = Σ_m (y_m*log(PC_m) + (1−y_m)*log(1−PC_m)) (7)
where PC_m denotes the speech recognition accuracy obtained from the m-th recognition result (since each term is conditioned on a single piece of data, PC_m takes a value between 0 and 1). The log-likelihood function is maximized with the gradient descent method; the derivative of the log-likelihood function with respect to T_f^M is:
∂LL/∂T_f^M = Σ_m ((y_m − PC_m)/(PC_m·(1 − PC_m))) · ∂PC_m/∂T_f^M (8)
and the threshold is updated from a randomized initial value T_f^(0) by:
T_f^(t+1) = T_f^(t) + η·∂LL/∂T_f^(t) (9)
where η is the learning rate.
The iteration of equations (7) to (9) stops once the sound intensity condition maximizing the likelihood of the test results of the confusable word pairs is found. Assuming the above process terminates after T iterations, T_f^(T), the threshold obtained at step T, is the final hearing threshold of band f estimated from the M rounds of speech audiometry.
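The estimation loop above (the log-likelihood of equation (7) maximized by gradient updates on the per-band thresholds) can be sketched as follows. This is a sketch under stated assumptions: it reuses the clipped-linear audibility and logistic transfer function with an assumed slope, takes an explicit initial threshold instead of a random one for reproducibility, and uses a central-difference numerical gradient where the patent differentiates analytically:

```python
import math

def band_audibility(e_db, t_db):
    # clipped-linear audibility, with SNR_f taken as E_f - T_f (noise-free case)
    return min(max((e_db - t_db + 15.0) / 30.0, 0.0), 1.0)

def predict_pc(intensities, thresholds, bif, w0, slope=10.0):
    # SII from the current threshold estimate, then logistic transfer function
    s = sum(band_audibility(e, t) * w
            for e, t, w in zip(intensities, thresholds, bif))
    return 1.0 / (1.0 + math.exp(-slope * (s - w0)))

def log_likelihood(data, thresholds):
    # equation (7); data items are (intensities, bif, w0, y)
    ll = 0.0
    for e, bif, w0, y in data:
        pc = min(max(predict_pc(e, thresholds, bif, w0), 1e-6), 1.0 - 1e-6)
        ll += y * math.log(pc) + (1 - y) * math.log(1.0 - pc)
    return ll

def estimate_thresholds(data, init, steps=500, lr=0.5, eps=0.1):
    # ascend the log-likelihood, one numerical partial derivative per band
    t = list(init)
    for _ in range(steps):
        for f in range(len(t)):
            up, down = list(t), list(t)
            up[f] += eps
            down[f] -= eps
            grad = (log_likelihood(data, up) - log_likelihood(data, down)) / (2 * eps)
            t[f] += lr * grad
    return t
```

On synthetic trials generated from a known threshold, the returned estimate should fit the responses much better than the starting point.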
The advantages of the invention are illustrated below with reference to specific embodiments.
The method is evaluated under computer simulation conditions and on real speech audiometry results from hearing-loss subjects. The results of the method are compared with those of pure tone audiometry.
1. Hearing assessment experiment setup
To verify the accuracy and stability of the proposed hearing evaluation method under ideal conditions, a computer simulation experiment was conducted. The computer simulation experiment replaces the subject of a real experiment with a computational procedure. The recognition accuracy of each stimulus is predicted with the SII model, whose parameters are the BIF of the test material and the simulated subject's hearing threshold. The recognition accuracy predicted by the SII is taken as the probability of producing a correct answer, and a recognition result is generated at random. From the experimental data predicted by the SII model, the hearing evaluation method based on the speech intelligibility index proposed in this work evaluates the simulated subject's hearing and compares it with the simulated subject's hearing threshold, i.e., the threshold set in the SII model, to analyze the accuracy of the method. The simulation experiment for each condition can be run multiple times, with the average taken as the final result of the simulated speech audiometry.
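The simulated-subject step described above can be sketched as a single Bernoulli draw: the SII model's predicted accuracy becomes the probability of a correct answer (`predict_pc` stands for the model prediction under the simulated threshold; the names are illustrative):

```python
import random

def simulated_response(intensities, simulated_threshold, bif, w0, predict_pc,
                       rng=random):
    """Stand-in for a human subject: draw a correct (1) / incorrect (0)
    answer with the probability the SII model predicts for this stimulus."""
    pc = predict_pc(intensities, simulated_threshold, bif, w0)
    return 1 if rng.random() < pc else 0
```

Running many such draws per condition and averaging reproduces the "run multiple times and average" protocol of the simulation experiment.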
14 hearing impaired subjects participated in the hearing evaluation process, and all subjects completed speech audiometry and pure tone audiometry of both ears in a sound isolation booth environment. Under the computer simulation condition, the pure tone audiometry results of 14 hearing impairment test subjects (28 test ears) are used as the hearing threshold of the simulation test subjects, and the hearing of the simulation test subjects is evaluated by using the method provided by the invention.
2. Results of the Hearing assessment experiment
To evaluate the effectiveness of the invention, the method is used for hearing evaluation under computer simulation conditions and on real hearing-impaired subjects, respectively. Figures 3 and 4 show the average results under computer simulation conditions and for the real hearing-impaired subjects.
Under computer simulation conditions, the hearing threshold estimated from the speech audiometry results is highly correlated with the pure-tone audiometry results (r = 0.87, p < 0.01), with a mean square error of 13.65. For the real hearing-loss subjects, the method also correlates significantly with pure-tone audiometry (r = 0.54, p < 0.01), with a mean square error of 14.77. Fig. 5 is a graph of the results of a hearing impairment test using the present invention and pure tone audiometry under sound isolation booth conditions.
The experimental results show that the method yields results close to those of pure-tone audiometry. At the same time, the speech audiometry requires no complex instruction, which makes it convenient for the user. In addition, its demands on environment and equipment are lower than those of pure-tone audiometry, so speech audiometry is a feasible scheme for hearing evaluation on mobile terminals.
Although specific embodiments of the invention and the accompanying drawings have been disclosed for illustrative purposes and to provide a further understanding of the invention, those skilled in the art will appreciate that various substitutions, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosure of the preferred embodiments and the accompanying drawings.
Claims (9)
1. A hearing assessment method based on speech intelligibility index comprises the following steps:
1) establishing a functional relationship between the hearing threshold and speech recognition performance by using the speech intelligibility index;
2) constructing a confusable double-syllable word-pair corpus as the test corpus for speech audiometry from the selected confusable vowel pairs and consonant pairs, and measuring the band weight function of the test corpus with a fast band-weight measurement method;
3) using the confusable double-syllable word pairs constructed in step 2) to perform speech audiometry on the subject, and then taking the hearing threshold that maximizes the likelihood of the test results as the subject's final hearing threshold; the final hearing threshold is obtained as follows: M rounds of speech audiometry yield M test results; for the m-th test result, the sound intensity in frequency band f is I_f^m, the band weight of the played confusable double-syllable word pair in band f is W_f, the SII value corresponding to the speech recognition threshold is SII_0, the recognition result of the subject is y_m, the hearing threshold estimated from the M test results is T_M, its value at frequency band f is T_f, and PC_m denotes the speech recognition accuracy obtained from the m-th recognition result; the log-likelihood function of the speech intelligibility index SII model is LL = Σ_m [y_m·log(PC_m) + (1 − y_m)·log(1 − PC_m)], m = 1 to M; the likelihood function is then maximized using a gradient descent method: the derivative ∂LL/∂T_f of the log-likelihood with respect to T_f is computed, the initial threshold T_f^0 is randomized, the hearing threshold is updated by T_f^(t+1) = T_f^t + η·∂LL/∂T_f, and the threshold obtained at step t is recorded as T_f^t; when this process terminates after T iteration steps, the threshold T_f^T obtained after T updates is the final hearing threshold of band f estimated from the M rounds of speech audiometry.
3. The method of claim 2, wherein the speech intelligibility index SII = Σ_f A_f·W_f, wherein A_f denotes the audibility of frequency band f and W_f denotes the band weight function.
5. A method as claimed in claim 2 or 3, characterized in that the weight of each selected confusable syllable pair in frequency band f is measured by the fast band-weight measurement method to obtain the band weight function W_f.
6. The method of claim 1, wherein the method of performing speech audiometry on the subject using the confusable double-syllable word pairs constructed in step 2) comprises: first, selecting K confusable double-syllable word pairs for K rounds of speech audiometry, playing one selected pair per round, with the sound intensity of each round's pair chosen at random; then, for each subsequently played confusable double-syllable word pair, choosing the sound intensity of each round's pair by an adaptive method.
7. The method as claimed in claim 6, wherein in the adaptive method a gradient descent method is used to take the hearing threshold of the subject as the threshold that maximizes the likelihood of the confusable double-syllable word-pair results; the speech recognition rate of each candidate sound-intensity condition is predicted from the speech intelligibility index SII model, and the condition whose predicted recognition accuracy is closest to the set threshold is selected as the sound-intensity condition for playing the next confusable double-syllable word pair.
8. The method of claim 1 or 6, wherein one round of speech audiometry is performed for each selected confusable double-syllable word pair; within each round, the same confusable double-syllable word pair is tested repeatedly, and on each trial one word or syllable of the pair is played at random.
9. A hearing assessment device based on the speech intelligibility index, comprising a speech intelligibility index model, test material and a hearing evaluation unit; the hearing evaluation unit comprises a sound-intensity selection module and a parameter estimation module; wherein
The speech intelligibility index is used for establishing a functional relation between a hearing threshold and speech recognition performance and calculating the recognition accuracy of the confusable words of the test material under the given sound intensity;
the test material is a confusable double-syllable word-pair corpus constructed from the selected confusable vowel pairs and consonant pairs;
the sound intensity selection module is used for selecting the sound intensity condition of the confusable double-syllable word pair during playing;
the parameter estimation module is used for selecting the sound intensity condition which enables the confusable double syllables to have the maximum likelihood value of the test result according to the test result as the final hearing threshold of the testee; the method for obtaining the final hearing threshold comprises the following steps: setting M rounds of speech audiometry to obtain M test results, wherein the strength of the frequency band f of the mth test result isThe band weight function of playing confusing syllable doublet pairs in band f isSpeech intelligibility index SII value corresponding to speech recognition threshold isThe result of the identification of the subject is ymThe hearing threshold estimated by the M test results is TMThe hearing threshold at frequency band f isPCmRepresenting the speech recognition accuracy obtained according to the m-th recognition result; the log-likelihood function of the speech intelligibility index SII model is: LL ═ Σm(ym*log(PCm)+(1-ym)*log(1-PCm) M is 1 to M; then, the likelihood function is maximized by using a gradient descent method to obtain a log likelihood function pairDerivative of (2) Then randomizing the initial thresholdUsing formulas Updating the hearing threshold, and recording the t-th step to obtain the hearing thresholdWhen the above-mentioned process is terminated after iteration T steps, then the obtained hearing threshold is updated in T stepI.e. the final hearing threshold of the frequency band f estimated by M rounds of speech audiometry.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011077820.2A CN112205981B (en) | 2020-10-10 | 2020-10-10 | Hearing assessment method and device based on speech intelligibility index |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112205981A CN112205981A (en) | 2021-01-12 |
CN112205981B true CN112205981B (en) | 2021-09-28 |
Family
ID=74053023
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011077820.2A Active CN112205981B (en) | 2020-10-10 | 2020-10-10 | Hearing assessment method and device based on speech intelligibility index |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112205981B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113286242A (en) * | 2021-04-29 | 2021-08-20 | 佛山博智医疗科技有限公司 | Device for decomposing speech signal to modify syllable and improving definition of speech signal |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07124137A (en) * | 1993-09-03 | 1995-05-16 | Technol Res Assoc Of Medical & Welfare Apparatus | Auditory function inspecting device |
CN202179545U (en) * | 2011-07-22 | 2012-04-04 | 华南理工大学 | Auditory evoked potential audiometry apparatus based on oversampling multiple-frequency multiple-amplitude joint estimation |
CN109327785A (en) * | 2018-10-09 | 2019-02-12 | 北京大学 | A kind of hearing aid gain adaptation method and apparatus based on speech audiometry |
EP3718476A1 (en) * | 2019-04-02 | 2020-10-07 | Mimi Hearing Technologies GmbH | Systems and methods for evaluating hearing health |
Non-Patent Citations (2)
Title |
---|
Analysis of the audiological characteristics of middle-aged and elderly people of different age groups in an area of Beijing; Liu Chenqing et al.; Chinese Journal of Otology; 2018-06-15 (No. 03); full text *
Research on objective evaluation methods of speech intelligibility; Xu Yuzhuo; China Master's Theses Full-text Database, Information Science and Technology; 2015-05-01 (No. 9); full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||